[ 
https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284147#comment-16284147
 ] 

Tomás Fernández Löbbe commented on SOLR-11739:
----------------------------------------------

I thought about three options:
1. Fix the actual race condition so that duplicate async IDs are never 
accepted.
2. Fix the Overseer so that, before running each task, it checks whether one 
with the same ID has already completed.
3. Let the Overseer re-run the tasks (leave it as it is now). Maybe just add 
logging, or a way to show the error (failed tasks).

#3 can be dangerous, since the task could be something like a DELETEREPLICA. If 
the duplicate ID was caused by some broken retry logic on the client side, Solr 
could be deleting many replicas with what the client thought was a single 
command. 

#2 may be OK. The problem I see is that it gives the user inconsistent 
behavior (sometimes duplicate IDs are rejected, and sometimes not). Also, this 
would make the Overseer silently drop tasks (yes, we can add some sort of 
failure in the logs, but we can’t assume anyone is going to notice). 

#1 is the correct fix from a functional standpoint. However, I can’t think of 
a way to really fix the race condition without adding an extra write to 
ZooKeeper, which we’d have to do for every collection request with an async ID. 
And all of this is to cover a client-misuse edge case. 
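
To make the trade-off in #1 concrete, here is a minimal sketch of the 
difference between a check-then-act claim (which can race) and an atomic 
claim. It is not the patch and not Solr's actual code: the class 
AsyncIdRegistry and its methods are hypothetical, and an in-memory map stands 
in for ZooKeeper. The atomic variant plays the role of the "extra write": 
in ZooKeeper, creating a node that already exists fails, so a create on a 
per-ID node could serve the same purpose.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical illustration only; the names here are not part of Solr. */
final class AsyncIdRegistry {
    // Assumption: an in-memory map stands in for state that Solr keeps in ZooKeeper.
    private final ConcurrentMap<String, Boolean> claimed = new ConcurrentHashMap<>();

    /** Racy: two callers can both see "absent" before either records the ID. */
    boolean claimCheckThenAct(String asyncId) {
        if (claimed.containsKey(asyncId)) {
            return false;
        }
        claimed.put(asyncId, Boolean.TRUE);
        return true;
    }

    /** Atomic: for a given ID exactly one caller gets true, even under contention. */
    boolean claimAtomically(String asyncId) {
        return claimed.putIfAbsent(asyncId, Boolean.TRUE) == null;
    }
}
```

With the atomic variant, a second request that reuses an ID is rejected no 
matter how quickly it follows the first; the cost is that every request with 
an async ID pays for the claim.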

I think (and I discussed this offline with [~anshumg], he thinks this too) #1 
is the way to go. I’ll put up a patch.

> Solr can accept duplicated async IDs
> ------------------------------------
>
>                 Key: SOLR-11739
>                 URL: https://issues.apache.org/jira/browse/SOLR-11739
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Tomás Fernández Löbbe
>            Priority: Minor
>         Attachments: SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs. However, if the repeated IDs 
> are sent fast enough, a race condition in Solr will let the repeated IDs 
> through. The duplicated task is run and then silently fails to report as 
> completed because the same async ID is already in the completed map. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

