Hi Soheb,

On Wed, 26 Jan 2011 16:29 +0000, "Soheb Mahmood" <soheb.luc...@gmail.com> wrote:
> We are going to implement distributed indexing for Solr - without the
> use of SolrCloud (so it can be easily up-scaled). We have a deadline by
> February to get this done, so we need to get cracking ;)

:-)

> So far, we've had a look at the solr classes and thought about
> distributed indexing on Solr, and we have come up with these ideas:
>
> 1. We plan to modify SimplePostTool to accommodate posting to specific
> shards. We are going to add an optional system property to allow the
> user to specify a list of shards to index to Solr.
> Example of this being "java
> -Durl=http://localhost:7574/solr/collection1/update
> -Dshards=localhost:8983/solr,localhost:7574/solr -jar post.jar <list of
> XML files>"

As Yonik says, the SimplePostTool is really for testing. The shard
information must be contained within the URL and processed by an
UpdateRequestHandler (called DistributedUpdateRequestHandler?). That way,
you can embed that data in the solrconfig.xml file as an invariant or a
default, or later it can be derived from Zookeeper in SolrCloud.

> We also plan to modify server request processing to handle distributed
> indexing. We are looking at CommonsHttpSolrServer.java for ways to
> accomplish this.
>
> With all these changes, we realise that we are only modifying the Java
> version, and that other languages need to be updated to accommodate our
> changes (e.g. perl). We were wondering if there was a simple way of
> applying these changes we wrote in Java across all the other languages.

If you add this support to Solr itself, it is then the responsibility of
each client library to support it. You should only be focusing on the
Solr DistributedUpdateHandler code rather than on any client libraries
(other than the code you use as your test harness).

> 2. We are going to make an interface to handle distributed writing.
> We plan for it to sit between the Solr server and the shards - if no shards
> are specified, then the post.jar tool will work exactly the same way it
> does now. However, if the user specifies shards for post.jar, then we
> want a class that has extended our interface to kick into action.

The interface you need will be a ShardPolicy or some such. You hand it a
document and either a number of shards or a list of shards, and it tells
you which shard that document should go to. This interface then allows
for pluggable shard policies, whether a simple modulo on the document ID
(for deterministic indexing) or a simple round-robin (for random
indexing).

You'll then need to split the documents you've gathered from the post
request to the UpdateRequestHandler, and forward them to whichever
shards the ShardPolicy suggested.

> 3. We plan to test our results by acceptance testing (we run Solr and
> see if it works ourselves) and writing a test class.

Sounds great.

Upayavira
---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source
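A minimal Java sketch of the ShardPolicy idea and the splitting step, assuming each document is identified by a String ID and the shards arrive as a comma-separated list like the -Dshards example in the thread. Only the name ShardPolicy comes from the discussion; ModuloShardPolicy, RoundRobinShardPolicy, ShardSplitter and split() are hypothetical illustrations, not Solr APIs, and the actual forwarding of each group to its shard over HTTP is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical pluggable policy (not a Solr API): picks a shard index for a document ID. */
interface ShardPolicy {
    int shardFor(String docId, int numShards);
}

/** Deterministic: the same ID always maps to the same shard for a fixed shard count. */
class ModuloShardPolicy implements ShardPolicy {
    public int shardFor(String docId, int numShards) {
        // String.hashCode() is defined by the Java spec, so this is stable across JVMs;
        // floorMod keeps the result non-negative when the hash is negative.
        return Math.floorMod(docId.hashCode(), numShards);
    }
}

/** Ignores the ID and spreads documents evenly in arrival order. */
class RoundRobinShardPolicy implements ShardPolicy {
    private final AtomicInteger next = new AtomicInteger();

    public int shardFor(String docId, int numShards) {
        return Math.floorMod(next.getAndIncrement(), numShards);
    }
}

public class ShardSplitter {
    /** Groups document IDs by the shard the policy selects, preserving input order. */
    static Map<String, List<String>> split(List<String> docIds, List<String> shards,
                                           ShardPolicy policy) {
        Map<String, List<String>> byShard = new LinkedHashMap<>();
        for (String shard : shards) {
            byShard.put(shard, new ArrayList<>());
        }
        for (String id : docIds) {
            String shard = shards.get(policy.shardFor(id, shards.size()));
            byShard.get(shard).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        // Hypothetical shard list in the same form as the -Dshards value above.
        List<String> shards = Arrays.asList("localhost:8983/solr,localhost:7574/solr".split(","));
        List<String> docs = Arrays.asList("doc1", "doc2", "doc3", "doc4");
        // Each map entry would become one update request forwarded to that shard.
        System.out.println(split(docs, shards, new ModuloShardPolicy()));
        System.out.println(split(docs, shards, new RoundRobinShardPolicy()));
    }
}
```

One design note: the modulo policy lets you later route deletes or updates for an ID to the shard that holds it, but changing the number of shards remaps most documents. Round-robin balances load but gives no way to locate a document without querying every shard.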