[jira] [Commented] (SOLR-9191) OverseerTaskQueue.peekTopN() fatally flawed

ASF GitHub Bot (JIRA) Tue, 07 Jun 2016 16:52:31 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319716#comment-15319716
 ]


ASF GitHub Bot commented on SOLR-9191:
--------------------------------------

Github user dragonsinth commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/41#discussion_r66172805
  
    --- Diff: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java ---
    @@ -466,6 +466,8 @@ private void markTaskComplete(String id, String asyncId)
               log.warn("Could not find and remove async call [" + asyncId + "] from the running map.");
             }
           }
    +
    +      workQueue.remove(head);
    --- End diff --
    
    @markrmiller can you think of any reason not to do this?  I don't 
    understand why removing items from the work queue currently takes an extra 
    iteration.  I think my fix unmasked a latent problem that DeleteStatusTest 
    exposes: to get that test to pass, I have to eagerly remove completed 
    items from the work queue, which seems correct to me.  I'm not sure why 
    we'd want to wait for a loop-around to `cleanUpWorkQueue()`.
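
    To make the timing question concrete, here is a minimal in-memory sketch 
    of eager versus deferred removal.  It is not the actual 
    OverseerTaskProcessor code: the class `EagerRemovalSketch`, its fields, and 
    the `eager` flag are hypothetical stand-ins, and only the names 
    `markTaskComplete`, `workQueue`, and `cleanUpWorkQueue` are borrowed from 
    the diff and comment above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model, not Solr code: compares removing a finished task
// immediately against leaving it for a later cleanup pass.
class EagerRemovalSketch {
    private final Deque<String> workQueue = new ArrayDeque<>();
    private final List<String> completed = new ArrayList<>();
    private final boolean eager; // assumption: toggles the one-line change in the diff

    EagerRemovalSketch(boolean eager) {
        this.eager = eager;
    }

    void submit(String id) {
        workQueue.add(id);
    }

    void markTaskComplete(String id) {
        completed.add(id);
        if (eager) {
            workQueue.remove(id); // eager: the item leaves the queue right away
        }
    }

    // Deferred path: runs on the next pass of the (simulated) main loop.
    void cleanUpWorkQueue() {
        for (String id : completed) {
            workQueue.remove(id); // no-op if the item was already removed eagerly
        }
        completed.clear();
    }

    int pending() {
        return workQueue.size();
    }

    // Number of items still in the work queue after completion, before any cleanup pass.
    public static int pendingRightAfterCompletion(boolean eager) {
        EagerRemovalSketch p = new EagerRemovalSketch(eager);
        p.submit("task-1");
        p.markTaskComplete("task-1");
        return p.pending();
    }

    // Number of items left once the cleanup pass has also run.
    public static int pendingAfterCleanup(boolean eager) {
        EagerRemovalSketch p = new EagerRemovalSketch(eager);
        p.submit("task-1");
        p.markTaskComplete("task-1");
        p.cleanUpWorkQueue();
        return p.pending();
    }

    public static void main(String[] args) {
        System.out.println("deferred: " + pendingRightAfterCompletion(false));
        System.out.println("eager: " + pendingRightAfterCompletion(true));
    }
}
```

    In this model both variants end up with an empty queue after the cleanup 
    pass; they differ only in the window between completion and the next loop 
    iteration, which is the window the question above is about.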


> OverseerTaskQueue.peekTopN() fatally flawed
> -------------------------------------------
>
>                 Key: SOLR-9191
>                 URL: https://issues.apache.org/jira/browse/SOLR-9191
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.4, 5.4.1, 5.5, 5.5.1, 6.0, 6.0.1
>            Reporter: Scott Blum
>            Assignee: Scott Blum
>            Priority: Blocker
>             Fix For: 5.6, 6.1, 5.5.2, 6.0.2, 6.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> We rewrote DistributedQueue in SOLR-6760 to optimize its obvious use case as 
> a FIFO.  But in doing so, we broke the assumptions in 
> OverseerTaskQueue.peekTopN().
> OverseerTaskQueue.peekTopN() filters out items you're already working on; 
> it's trying to peek for new items in the queue beyond what you already know 
> about.  But DistributedQueue (being designed as a FIFO) doesn't know about 
> the filtering; as long as it has any items in memory, it just keeps 
> returning those over and over without ever pulling new data from ZK.  This is 
> true even if the watcher has fired and marked the state as dirty.  So 
> OverseerTaskQueue gets into a state where it can never read new items in ZK 
> because DQ keeps returning the same items that OverseerTaskQueue has already 
> marked as in-progress.
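
The failure mode described above can be reproduced in a small in-memory model. This is only a sketch under stated assumptions, not the Solr implementation: `PeekTopNSketch`, `CachingFifo`, `peekElements`, and the `refetchWhenDirty` flag are hypothetical names, a `TreeMap` stands in for ZooKeeper, and refetching on a dirty flag is shown as one conceivable remedy rather than as the change made in the pull request.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical model of a filtered peek on top of a FIFO cache.
class PeekTopNSketch {
    // Stand-in for ZooKeeper: queue entries keyed by sorted node name.
    static final TreeMap<String, String> zk = new TreeMap<>();

    // Simplified FIFO cache; in the flawed mode it reloads only when empty.
    static class CachingFifo {
        final TreeMap<String, String> cache = new TreeMap<>();
        boolean dirty = true; // set when the (simulated) watcher fires

        List<String> peekElements(int n, Predicate<String> accept, boolean refetchWhenDirty) {
            if (cache.isEmpty() || (refetchWhenDirty && dirty)) {
                cache.clear();
                cache.putAll(zk);
                dirty = false;
            }
            List<String> out = new ArrayList<>();
            for (String id : cache.keySet()) {
                if (out.size() >= n) {
                    break;
                }
                if (accept.test(id)) {
                    out.add(id); // the filter is applied to cached items only
                }
            }
            return out;
        }
    }

    // Returns what the second peek sees after a new item has been added to "ZK".
    public static List<String> demo(boolean refetchWhenDirty) {
        zk.clear();
        zk.put("qn-0001", "task A");
        CachingFifo q = new CachingFifo();
        Set<String> inProgress = new HashSet<>();

        // First peek: picks up qn-0001, which the caller marks as in-progress.
        inProgress.addAll(q.peekElements(10, id -> !inProgress.contains(id), refetchWhenDirty));

        // A new item arrives and the watcher marks the cache dirty.
        zk.put("qn-0002", "task B");
        q.dirty = true;

        // Second peek: the cache still holds qn-0001, so the flawed mode never reloads.
        return q.peekElements(10, id -> !inProgress.contains(id), refetchWhenDirty);
    }

    public static void main(String[] args) {
        System.out.println("flawed: " + demo(false));           // prints "flawed: []"
        System.out.println("refetch on dirty: " + demo(true));  // prints "refetch on dirty: [qn-0002]"
    }
}
```

In the flawed mode the cache is never empty, because the in-progress item stays in it, so the filtered peek returns nothing and the new item is never seen.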



