Ryan Ernst created SOLR-4643:
--------------------------------

             Summary: Refactor shard handler (and factory) to make pieces more 
pluggable
                 Key: SOLR-4643
                 URL: https://issues.apache.org/jira/browse/SOLR-4643
             Project: Solr
          Issue Type: Improvement
            Reporter: Ryan Ernst


Over the past few weeks I've been trying to write my own shard handler/factory,
and it is a bit of a pain.  The pieces that I don't want to reimplement are
tightly coupled to the ones that I do.

I believe the current design is as follows:

ShardHandlerFactory - created once and shared across cores (except in some
legacy case where it is per core?).  This holds the "heavyweight" pieces, such
as the thread pool for parallelizing requests and the HTTP client.  It also
holds a SolrJ load balancer object.

ShardHandler - created per request.  It has the logic for determining whether
a request is distributed and for sending the requests in parallel (using an
executor from the parent factory object).  The knowledge of how to send a
request and parse its response is embedded within the parallelization piece
(through SolrJ code).

I've already made a couple of attempts to make this easier to plug into:
https://issues.apache.org/jira/browse/SOLR-4544
This was an attempt to reuse the code for parallelizing the requests while
still plugging in code for making them.  It sort of works, but it was just a
stopgap measure.  You still cannot format the request or parse the response
without reimplementing ShardHandler.

https://issues.apache.org/jira/browse/SOLR-4613
Here I was trying to require creating a shard handler only when the request is
distributed, instead of for every request just to find out whether it is
distributed.
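
As a rough illustration of that idea (not the SOLR-4613 patch itself), the
distributed check can live outside the handler so that a handler is only built
when it is needed.  Every name below is hypothetical, and the check is
deliberately simplified to "a non-empty shards parameter is present", which is
less than what Solr actually looks at:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types are Solr classes. */
final class LazyShardHandlerSketch {

    /** Stand-in for a per-request shard handler; counts how many were built. */
    static final class Handler {
        static int created = 0;
        Handler() { created++; }
    }

    /** Simplified assumption: distributed only if "shards" is set and non-empty. */
    static boolean isDistributed(Map<String, String> params) {
        String shards = params.get("shards");
        return shards != null && !shards.isEmpty();
    }

    /** Builds a handler only for distributed requests; returns null otherwise. */
    static Handler handlerFor(Map<String, String> params) {
        return isDistributed(params) ? new Handler() : null;
    }

    public static void main(String[] args) {
        Map<String, String> local = new HashMap<>();
        local.put("q", "solr");

        Map<String, String> distributed = new HashMap<>(local);
        distributed.put("shards", "host1:8983/solr,host2:8983/solr");

        handlerFor(local);        // no handler is constructed here
        handlerFor(distributed);  // a handler is constructed here
        System.out.println("handlers created: " + Handler.created);
    }
}
```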

At this point I thought I would create a jira to write down a proposal for how
to do this refactoring, instead of continuing with piecemeal, out-of-context
jiras.


I view this shard handler business as needing the following:
1. Something to parallelize the requests.  Few people, if any, should ever
have to replace this.  It contains the thread pool and executor service and is
global (like the shard handler factory now).

2. Something that knows how to talk to the shards.  This includes formatting 
the request and parsing the response. This could probably be per core or even 
per request handler?

3. Something to do load balancing.  This could probably live in 2, although I
could see it being separate so that the load balancer can be plugged in without
having to handle the request/response format, or vice versa.  It would contain
the HTTP client for talking to hosts, and so would probably still be global.
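
To make the three pieces concrete, here is a minimal sketch of how they might
be separated.  This is only one possible shape, not a worked-out design: the
names ParallelDispatcher, ShardProtocol and ShardTransport are invented for
illustration and do not exist in Solr, and the "transport" simply picks the
first replica instead of doing real load balancing or HTTP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical names throughout; none of these types exist in Solr. */
final class ShardPiecesSketch {

    /** Piece 2: formats the request for a shard and parses its raw response. */
    interface ShardProtocol<Q, R> {
        String formatRequest(Q query);
        R parseResponse(String rawResponse);
    }

    /** Piece 3: chooses a replica and performs the actual call. */
    interface ShardTransport {
        String send(List<String> replicas, String payload) throws Exception;
    }

    /** Piece 1: global fan-out over a shared thread pool. */
    static final class ParallelDispatcher implements AutoCloseable {
        private final ExecutorService pool;

        ParallelDispatcher(int threads) {
            pool = Executors.newFixedThreadPool(threads);
        }

        /** Sends one request per shard in parallel; results are in shard order. */
        <Q, R> List<R> dispatch(List<List<String>> shards, Q query,
                                ShardProtocol<Q, R> protocol,
                                ShardTransport transport) throws Exception {
            String payload = protocol.formatRequest(query);
            List<Future<R>> futures = new ArrayList<>();
            for (List<String> replicas : shards) {
                Callable<R> task =
                    () -> protocol.parseResponse(transport.send(replicas, payload));
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        }

        @Override
        public void close() {
            pool.shutdown();
        }
    }

    /** Wires toy implementations of pieces 2 and 3 into piece 1. */
    static List<String> demo() throws Exception {
        ShardProtocol<String, String> protocol = new ShardProtocol<String, String>() {
            @Override public String formatRequest(String query) { return "q=" + query; }
            @Override public String parseResponse(String raw) { return "parsed:" + raw; }
        };
        // Toy transport: always uses the first replica and echoes the request.
        ShardTransport firstReplica =
            (replicas, payload) -> replicas.get(0) + "/select?" + payload;

        List<List<String>> shards = Arrays.asList(
            Arrays.asList("host1:8983", "host2:8983"),
            Arrays.asList("host3:8983"));

        try (ParallelDispatcher dispatcher = new ParallelDispatcher(2)) {
            return dispatcher.dispatch(shards, "solr", protocol, firstReplica);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

With this split, swapping the request/response format means providing a
different ShardProtocol, and swapping load balancing means providing a
different ShardTransport; the dispatcher stays untouched in both cases.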

I would love to get consensus on the design of this before going off and doing 
it, and suggestions for how to break this into smaller pieces.
