[ 
https://issues.apache.org/jira/browse/SOLR-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Sirotzke updated SOLR-9925:
-------------------------------
    Description: 
When documents are pushed to Solr in parallel, and each thread does a 
delete-by-query followed by an add for the same set of IDs, some of the replicas 
end up missing some of the child documents.  All the parent documents are 
successfully replicated.

This appears to trigger some sort of race condition, since:

* Documents are never missing from the leader.
* Documents _might_ be missing from the replicas.
* When documents are missing, both the count and the specific documents differ 
between replicas and between runs.
* It happens more easily with large documents; my test script needs a huge 
number of documents to trigger it a small number of times, whereas it happens 
~5% of the time on our dataset.
* We're currently on Solr 5.5.2, but I've also managed to trigger it on 6.3.0
* When not running anything in parallel, this doesn't occur.

Quick aside, since this is surely the first thing that will jump out:  We can't 
just do an update due to the uniqueKey/\_root\_ issue behind SOLR-5211.


> Child documents missing from replicas during parallel delete+add
> ----------------------------------------------------------------
>
>                 Key: SOLR-9925
>                 URL: https://issues.apache.org/jira/browse/SOLR-9925
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.5.2, 6.3
>         Environment: Java 1.8 (OpenJDK) on both CentOS 6.7 and Ubuntu 16.04.1
>            Reporter: Dan Sirotzke
>         Attachments: generate.py, run.sh
>
>
> When documents are pushed to Solr in parallel, and each thread does a 
> delete-by-query followed by an add for the same set of IDs, some of the 
> replicas end up missing some of the child documents.  All the parent 
> documents are successfully replicated.
> This appears to trigger some sort of race condition, since:
> * Documents are never missing from the leader.
> * Documents _might_ be missing from the replicas.
> * When documents are missing, both the count and the specific documents 
> differ between replicas and between runs.
> * It happens more easily with large documents; my test script needs a huge 
> number of documents to trigger it a small number of times, whereas it happens 
> ~5% of the time on our dataset.
> * We're currently on Solr 5.5.2, but I've also managed to trigger it on 6.3.0
> * When not running anything in parallel, this doesn't occur.
> Quick aside, since this is surely the first thing that will jump out:  We 
> can't just do an update due to the uniqueKey/\_root\_ issue behind 
> SOLR-5211.
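
For readers without the attachments to hand, the request pattern described above can be sketched roughly as follows. This is not the reporter's generate.py or run.sh; it is a minimal illustration under stated assumptions. The update URL, the collection name, the `type` field, the child ID scheme, and the choice of `_root_` as the delete query are all hypothetical. The `send` parameter is an injected callable so the pattern can be exercised without a running Solr.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint: host, port and collection name are assumptions.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def build_requests(parent_id, n_children):
    """Return the (delete, add) JSON bodies for one parent/child block."""
    # Assumed query: drop the whole block via _root_ before re-adding it.
    delete_body = {"delete": {"query": '_root_:"%s"' % parent_id}}
    # The "type" field and the child ID scheme are illustrative only.
    children = [{"id": "%s-c%d" % (parent_id, i), "type": "child"}
                for i in range(n_children)]
    add_body = [{"id": parent_id, "type": "parent",
                 "_childDocuments_": children}]
    return json.dumps(delete_body), json.dumps(add_body)


def http_send(body):
    """POST one JSON body to Solr; only meaningful against a live server."""
    req = urllib.request.Request(
        UPDATE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def replace_block(parent_id, n_children, send):
    """Delete-by-query, then add, for the same IDs (one unit of work)."""
    delete_body, add_body = build_requests(parent_id, n_children)
    send(delete_body)
    send(add_body)
    return parent_id


def run_parallel(parent_ids, n_children, send, workers=4):
    """Run replace_block for every parent ID across a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda pid: replace_block(pid, n_children, send), parent_ids))
```

Against a live cluster one would pass `http_send` as `send`, for example `run_parallel(ids, 50, http_send)`, then commit and compare per-replica results; with `workers=1` the same calls run sequentially, which is the case the report says does not lose documents.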


