On Mar 12, 2012, at 9:39 AM, Per Steffensen wrote:

> Mark Miller wrote:
>> Hey Per,
>>
>> A couple things:
>>
>> 1. Distributed realtime get is coming - I know Yonik was looking at this
>> recently but got caught up in some other things.
>>
> Fantastic! I believe, if the client becomes "routing aware", it is only
> necessary when you are sending more than one id (using "ids") in your
> realtime-get request, and even then the distribution (sending to several
> Solr servers and merging the results from those) could happen in the client
> (or not, if you don't think that is appropriate).
>> 2. There is a SolrJ client that is aware of the cluster state - it's called
>> CloudSolrServer. You give it the ZooKeeper address rather than a node's
>> address. Currently it doesn't send directly to the leader, but this is
>> planned
> Nice! So you plan to solve the "two hop" problem (as ElasticSearch calls it)
> that I was mentioning!
> http://www.elasticsearch.org/guide/reference/java-api/client.html
>> - it's a little tricky due to lack of access to the Schema for hashing, but
>> likely coming soon - there is a JIRA issue for it. Clients in other
>> languages should be able to do the same thing.
>>
> But can I do realtime-get from a SolrJ client already, then? You say that
> CloudSolrServer does not go directly to the leader yet, and if I am correct
> when I claim that realtime-get (/get) requests are not routed to the leader
> on the server side, then I will still not be able to do realtime-get using
> CloudSolrServer. Am I correct that I can't do it yet, even using
> CloudSolrServer?

Right, you can't yet even with CloudSolrServer - but I think it will be done
soon - certainly before the 4 release anyway.

>
> BTW, congratulations and thanks for the terrific work you guys are doing on
> Solr(Cloud)! I hope to get to contribute "versioning" (for optimistic
> locking) and a "unique key" feature that allows the operation to fail if the
> document already exists (instead of just automatically deleting what is
> already there).
>> - Mark
>>
>> On Mar 12, 2012, at 5:26 AM, Per Steffensen wrote:
>>
>>> Hi
>>>
>>> I believe Solr(Cloud) is doing some internal routing of update-requests to
>>> make sure documents are stored in the correct core/shard, as decided by
>>> Solr's internal routing algorithm (I believe it basically finds out who is
>>> the leader-shard for a given document, using shared information in ZK,
>>> info about the collection and hash(document.id)). All nice and cool.
>>>
>>> I also believe realtime-gets are not forwarded internally in Solr through
>>> this routing algorithm, and that it is therefore "impossible" to do
>>> realtime-gets from a client, because you don't know which core/shard to
>>> contact directly, again because you don't know the routing algorithm. If
>>> I'm wrong, a few directions on how to do realtime-gets from a client
>>> against a system of Solr servers containing many shards and collections
>>> would be very helpful. If I'm right, I think it would be very nice if the
>>> routing algorithm was somehow exposed to the client (in code reachable
>>> from SolrJ) so that you can do realtime-gets from a SolrJ-based client.
>>> Whether it should be done automatically for you, or whether the client
>>> using SolrJ explicitly needs to call some code to get info about the core
>>> to contact, is not so important for now.
>>>
>>> Such a solution would also make it possible to get rid of another
>>> performance-related "problem": most update-requests have to be
>>> transported among JVMs twice to reach their destination.
>>> First from the client to some "random" Solr server, and then from this
>>> Solr server to the Solr server holding the core involved in the update.
>>> If routing information was available to the client, it could make sure to
>>> route its updates directly to the core involved in the update (the one
>>> currently playing the role of leader-shard for the shard to which the
>>> routing algorithm maps the document).
>>>
>>> ElasticSearch solves this problem with its "Node Client" (instead of just
>>> the "Transport Client"), where a node client is basically a real node in
>>> the system that just doesn't store documents, but which has all the logic
>>> and shared information, e.g. the routing algorithm, available -
>>> http://www.elasticsearch.org/guide/reference/java-api/client.html
>>> . It certainly doesn't have to be like that with Solr clients, but it
>>> would be nice if routing logic were somehow available to SolrJ so that it
>>> can send its updates (and realtime-gets) directly to the correct
>>> destination.
>>>
>>> Hope to get some comments on this issue.
>>>
>>> Regards, Per Steffensen
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> dev-unsubscr...@lucene.apache.org
>>>
>>> For additional commands, e-mail:
>>> dev-h...@lucene.apache.org
>>>
>>
>> - Mark Miller
>> lucidimagination.com
>>
>

- Mark Miller
lucidimagination.com
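
For readers of the archive, the client-side routing idea discussed in this thread (hash the document id, pick the leader of the matching shard, and split a multi-id realtime get into one request per shard) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class `ClientSideRouter`, the example host URLs, and the CRC32-modulo hash are hypothetical and are not Solr's actual routing algorithm, which the thread does not spell out. Only the `/get` handler and the `ids` parameter are taken from the messages above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical client-side router; NOT part of SolrJ. */
public class ClientSideRouter {

    // Assumed cluster state: one leader URL per shard, in shard order.
    // A real client would obtain this from ZooKeeper instead.
    private final List<String> shardLeaders;

    public ClientSideRouter(List<String> shardLeaders) {
        if (shardLeaders.isEmpty()) {
            throw new IllegalArgumentException("at least one shard leader is required");
        }
        this.shardLeaders = new ArrayList<>(shardLeaders);
    }

    // Stand-in hash: CRC32 of the UTF-8 id, modulo the shard count.
    // This is an illustrative assumption, not Solr's hash function.
    public static int shardIndex(String id, int numShards) {
        CRC32 crc = new CRC32();
        crc.update(id.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numShards); // getValue() is never negative
    }

    /** Leader URL that a request for this id would be sent to directly. */
    public String leaderFor(String id) {
        return shardLeaders.get(shardIndex(id, shardLeaders.size()));
    }

    /** Groups ids by leader so a multi-id realtime get needs one request per shard. */
    public Map<String, List<String>> groupByLeader(List<String> ids) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String id : ids) {
            groups.computeIfAbsent(leaderFor(id), k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical leader URLs, for illustration only.
        ClientSideRouter router = new ClientSideRouter(Arrays.asList(
                "http://host1:8983/solr/collection1",
                "http://host2:8983/solr/collection1"));
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4");
        // Print the per-shard realtime-get requests the client would issue;
        // merging the responses would then also happen in the client.
        for (Map.Entry<String, List<String>> e : router.groupByLeader(ids).entrySet()) {
            System.out.println(e.getKey() + "/get?ids=" + String.join(",", e.getValue()));
        }
    }
}
```

With this shape, a single-id realtime get goes straight to one leader, which avoids the "two hop" path described above, and only requests with several ids need to be fanned out and merged. A real client would also have to refresh its leader list when the cluster state changes, which this sketch leaves out.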