Re: Exposing Solr routing to SolrJ client

Mark Miller Mon, 12 Mar 2012 07:24:44 -0700

On Mar 12, 2012, at 9:39 AM, Per Steffensen wrote:

> Mark Miller wrote:
>> Hey Per,
>> 
>> A couple things:
>> 
>> 1. Distributed realtime get is coming - I know Yonik was looking at this 
>> recently but got caught up in some other things.
>>   
>> 
> Fantastic! I believe that if the client becomes "routing aware", distributed
> realtime get is only necessary when you send more than one id (using "ids")
> in a realtime-get request, and even then the distribution (to several Solr
> servers, plus merging of their results) could happen in the client (or not,
> if you don't think that is appropriate).
>> 2. There is a SolrJ client that is aware of the cluster state - it's called
>> CloudSolrServer. You give it the ZooKeeper address rather than a node's
>> address. Currently it doesn't send directly to the leader, but this is 
>> planned
> Nice! So you plan to solve the "two hop" problem (as ElasticSearch calls it) 
> that I was mentioning! 
> http://www.elasticsearch.org/guide/reference/java-api/client.html
>>  - it's a little tricky due to lack of access to the Schema for hashing, but 
>> likely coming soon - there is a JIRA issue for it. Clients in other 
>> languages should be able to do the same thing.
>>   
>> 
> But can I do realtime-get from a SolrJ client already, then? You say that
> CloudSolrServer does not go directly to the leader yet, and if I am correct
> that realtime-get (/get) requests are not routed on the server side to the
> leader, then I will still not be able to do realtime-get using
> CloudSolrServer. Am I correct that I can't do it yet, even using
> CloudSolrServer?


Right, you can't yet, even with CloudSolrServer - but I think it will be done
soon - certainly before the 4 release anyway.
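
To make the routing idea in this thread concrete, here is a rough Java sketch
of what a "routing aware" client could do: hash each document id, map the hash
onto one of N equal ranges (one per shard), and group the ids of a multi-id
realtime-get by shard so that one request per shard can be sent and the
results merged in the client. Everything in it is an assumption made for
illustration: HashRangeRouter, shardFor and groupByShard are hypothetical
names, not SolrJ API, and CRC32 is only a stand-in for whatever hash Solr
uses internally. It also assumes the id is already available as a plain
string, which sidesteps the schema-access issue raised in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical client-side router; an illustration only, not part of SolrJ. */
final class HashRangeRouter {
    private final int numShards;

    HashRangeRouter(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        this.numShards = numShards;
    }

    /** Stand-in hash (assumption): returns a value in 0 .. 2^32 - 1. */
    static long hash(String id) {
        CRC32 crc = new CRC32();
        crc.update(id.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Splits the 32-bit hash space into numShards equal, contiguous ranges. */
    int shardFor(String id) {
        return (int) ((hash(id) * numShards) >>> 32);
    }

    /** Groups ids by owning shard, so one request per shard can be sent. */
    Map<Integer, List<String>> groupByShard(List<String> ids) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String id : ids) {
            groups.computeIfAbsent(shardFor(id), k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        HashRangeRouter router = new HashRangeRouter(4);
        // Prints a shard-number -> ids map for a three-id request.
        System.out.println(router.groupByShard(Arrays.asList("doc-1", "doc-2", "doc-3")));
    }
}
```

A real client would additionally look up the current leader for each shard
number in the cluster state kept in ZooKeeper and send each group of ids
there; that lookup is left out here because it depends on details not covered
in this thread.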

> 
> BTW, congratulations and thanks for the terrific work you guys are doing on
> Solr(Cloud)! I hope to contribute "versioning" (for optimistic locking)
> and a "unique key" feature that allows the operation to fail if the document
> already exists (instead of just automatically deleting what is already there).
>> - Mark
>> 
>> On Mar 12, 2012, at 5:26 AM, Per Steffensen wrote:
>> 
>>   
>> 
>>> Hi
>>> 
>>> I believe Solr(Cloud) does some internal routing of update requests to
>>> make sure documents are stored in the correct core/shard, as decided by
>>> Solr's internal routing algorithm (I believe it basically finds out who is
>>> the leader-shard for a given document, using shared information in ZK,
>>> info about the collection, and hash(document.id)). All nice and cool.
>>> 
>>> I also believe realtime-gets are not forwarded internally in Solr through
>>> this routing algorithm, and that it is therefore "impossible" to do
>>> realtime-gets from a client, because you don't know which core/shard to
>>> contact directly, again because you don't know the routing algorithm. If
>>> I'm wrong, a few directions on how to do realtime-gets from a client
>>> against a system of Solr servers containing many shards and collections
>>> would be very helpful. If I'm right, I think it would be very nice if the
>>> routing algorithm were somehow exposed to the client (in code reachable
>>> from SolrJ) so that you can do realtime-gets from a SolrJ-based client.
>>> Whether it should be done automatically for you, or whether the client
>>> using SolrJ explicitly needs to call some code to get info about the core
>>> to contact, is not so important for now.
>>> 
>>> Such a solution would also make it possible to get rid of another
>>> performance-related "problem": most update requests have to be
>>> transported between JVMs twice to reach their destination, first from the
>>> client to some "random" Solr server, and then from this Solr server to the
>>> Solr server holding the core involved in the update. If routing
>>> information were available to the client, it could route its updates
>>> directly to the core involved in the update (the one currently playing the
>>> role of leader-shard for the shard to which the routing algorithm maps
>>> the document).
>>> 
>>> ElasticSearch solves this problem with its "Node Client" (as opposed to
>>> the plain "Transport Client"), where a node client is basically a real
>>> node in the system that just doesn't store documents, but which has all
>>> the logic and shared information, e.g. the routing algorithm, available:
>>> http://www.elasticsearch.org/guide/reference/java-api/client.html
>>> It certainly doesn't have to be like that with Solr clients, but it would
>>> be nice if the routing logic were somehow available to SolrJ so that it
>>> can send its updates (and realtime-gets) directly to the correct
>>> destination.
>>> 
>>> Hope to get some comments on this issue.
>>> 
>>> Regards, Per Steffensen
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: 
>>> dev-unsubscr...@lucene.apache.org
>>> 
>>> For additional commands, e-mail: 
>>> dev-h...@lucene.apache.org
>> 
>> - Mark Miller
>> lucidimagination.com
> 

- Mark Miller
lucidimagination.com
