[
https://issues.apache.org/jira/browse/SOLR-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178823#comment-13178823
]
Mark Miller commented on SOLR-3001:
-----------------------------------
I fixed a bug in this area a week or two ago (adding more than one doc per
request). I'll check it again, but the best option may be simply to retry
with an updated version.
> Documents droping when using DistributedUpdateProcessor
> -------------------------------------------------------
>
> Key: SOLR-3001
> URL: https://issues.apache.org/jira/browse/SOLR-3001
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.0
> Environment: Windows 7, Ubuntu
> Reporter: Rafał Kuć
>
> I have a problem with distributed indexing in the solrcloud branch. I've set
> up a cluster with three Solr servers and I'm using DistributedUpdateProcessor
> to do the distributed indexing. When indexing with StreamingUpdateSolrServer
> or CommonsHttpSolrServer using a queue or a list that holds more than one
> document, documents seem to be dropped. I ran some tests that tried to index
> 450k documents. When I sent the documents one by one, indexing worked
> properly and the three Solr instances held 450k documents in total. However,
> when I added documents in batches (for example with
> StreamingUpdateSolrServer and a queue of 1000), the shard I was sending the
> documents to ended up with very few documents (about 100), while the other
> shards had about 150k documents.
> Each Solr instance was started with a single core in ZooKeeper mode. An
> example solr.xml file:
> {noformat}
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true">
>   <cores defaultCoreName="collection1" adminPath="/admin/cores"
>          zkClientTimeout="10000" hostPort="8983" hostContext="solr">
>     <core shard="shard1" instanceDir="." name="collection1" />
>   </cores>
> </solr>
> {noformat}
> The solrconfig.xml file on each shard contained the following entries:
> {noformat}
> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">distrib</str>
>   </lst>
> </requestHandler>
> {noformat}
> {noformat}
> <updateRequestProcessorChain name="distrib">
>   <processor class="org.apache.solr.update.processor.DistributedUpdateProcessorFactory" />
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> {noformat}
> I found a solution, but I don't know if it is a proper one. I've modified the
> code that is responsible for handling the replicas in:
> {{private List<String> setupRequest(int hash)}} of
> {{DistributedUpdateProcessorFactory}}
> I've added the following code:
> {noformat}
> if (urls == null) {
>   urls = new ArrayList<String>(1);
>   urls.add(leaderUrl);
> } else {
>   if (!urls.contains(leaderUrl)) {
>     urls.add(leaderUrl);
>   }
> }
> {noformat}
> after:
> {noformat}
> urls = getReplicaUrls(req, collection, shardId, nodeName);
> {noformat}
> If this is the proper approach I'll be glad to provide a patch with the
> modification.
> --
> Regards
> Rafał Kuć
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
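
For readers who want to see the reporter's proposed change in isolation, here is a minimal standalone sketch of the "make sure the leader URL is in the forwarding list" logic. It is an illustration only, not the code in DistributedUpdateProcessorFactory: the class name, the method name `withLeader`, and the example URLs are hypothetical, and unlike the quoted snippet it returns a copy rather than modifying the list in place. Whether this is the right fix is not settled in this thread; the comment above suggests the underlying bug may already be fixed in a newer build.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaderUrlSketch {

    /**
     * Hypothetical helper mirroring the snippet from the report: returns the
     * replica URLs with the leader URL appended if it is missing. A null
     * replica list is treated as "no replicas".
     */
    static List<String> withLeader(List<String> replicaUrls, String leaderUrl) {
        // Copy so the caller's list is left untouched (an assumption of this
        // sketch; the quoted snippet mutates the list directly).
        List<String> urls = (replicaUrls == null)
                ? new ArrayList<String>(1)
                : new ArrayList<String>(replicaUrls);
        if (!urls.contains(leaderUrl)) {
            urls.add(leaderUrl);
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder URLs, not taken from the report.
        String leader = "http://host1:8983/solr/";

        List<String> replicas = new ArrayList<String>();
        replicas.add("http://host2:8983/solr/");

        System.out.println(withLeader(null, leader));      // no replicas known
        System.out.println(withLeader(replicas, leader));  // leader missing
        System.out.println(withLeader(withLeader(replicas, leader), leader)); // leader already present
    }
}
```

The `contains` check is what keeps the leader from being added twice, so a document would not be forwarded to the same node more than once.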