[ 
https://issues.apache.org/jira/browse/SOLR-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319717#comment-15319717
 ] 

ASF GitHub Bot commented on SOLR-9191:
--------------------------------------

GitHub user dragonsinth commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/41#discussion_r66172870
  
    --- Diff: solr/core/src/test/org/apache/solr/cloud/DistributedQueueTest.java ---
    @@ -137,6 +136,49 @@ public void testDistributedQueueBlocking() throws Exception {
         assertNull(dq.poll());
       }
     
    +  @Test
    +  public void testPeekElements() throws Exception {
    +    String dqZNode = "/distqueue/test";
    +    byte[] data = "hello world".getBytes(UTF8);
    +
    +    DistributedQueue dq = makeDistributedQueue(dqZNode);
    +
    +    // Populate with data.
    +    dq.offer(data);
    +    dq.offer(data);
    +    dq.offer(data);
    +
    +    // Should be able to get 0, 1, 2, or 3 instantly
    +    for (int i = 0; i <= 3; ++i) {
    +      assertEquals(i, dq.peekElements(i, 0, child -> true).size());
    +    }
    +
    +    // Asking for more should return only 3.
    +    assertEquals(3, dq.peekElements(4, 0, child -> true).size());
    +
    +    // If we filter everything out, we should block for the full time.
    +    long start = System.nanoTime();
    +    assertEquals(0, dq.peekElements(4, 1000, child -> false).size());
    +    assertTrue(System.nanoTime() - start >= TimeUnit.MILLISECONDS.toNanos(500));
    +
    +    // If someone adds a new matching element while we're waiting, we should return immediately.
    +    executor.submit(() -> {
    +      try {
    +        Thread.sleep(500);
    +        dq.offer(data);
    +      } catch (Exception e) {
    +        // ignore
    +      }
    +    });
    +    start = System.nanoTime();
    +    assertEquals(1, dq.peekElements(4, 2000, child -> {
    +      // The 4th element in the queue will end with a "3".
    +      return child.endsWith("3");
    +    }).size());
    +    assertTrue(System.nanoTime() - start < TimeUnit.MILLISECONDS.toNanos(1000));
    +    assertTrue(System.nanoTime() - start >= TimeUnit.MILLISECONDS.toNanos(250));
    +  }
    +
    --- End diff --
    
    @markrmiller the new test you suggested
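
For readers without the PR open, here is a minimal sketch of the contract the test above exercises: return up to `max` matching elements right away, otherwise wait up to the timeout until a matching element shows up. `BlockingPeekSketch` is a hypothetical in-memory stand-in, not Solr's `DistributedQueue`; the `qn-%010d` naming is an assumption that imitates sequential child names so that the fourth element ends in "3", as the test comment expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in; it mimics only the behavior the test checks.
class BlockingPeekSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition changed = lock.newCondition();
  private final List<String> children = new ArrayList<>();
  private int nextSeq = 0;

  // Appends an element named like a sequential child node and wakes any waiters.
  void offer() {
    lock.lock();
    try {
      children.add(String.format("qn-%010d", nextSeq++));
      changed.signalAll();
    } finally {
      lock.unlock();
    }
  }

  // Returns up to max accepted names; waits up to waitMillis while none match.
  List<String> peekElements(int max, long waitMillis, Predicate<String> accept) {
    long remaining = TimeUnit.MILLISECONDS.toNanos(waitMillis);
    lock.lock();
    try {
      while (true) {
        List<String> out = new ArrayList<>();
        for (String name : children) {
          if (out.size() >= max) break;
          if (accept.test(name)) out.add(name);
        }
        // Done as soon as something matches or the time budget is used up.
        if (!out.isEmpty() || remaining <= 0) return out;
        try {
          // awaitNanos releases the lock while waiting and returns the time left.
          remaining = changed.awaitNanos(remaining);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return out;
        }
      }
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    BlockingPeekSketch dq = new BlockingPeekSketch();
    dq.offer();
    dq.offer();
    dq.offer();
    System.out.println(dq.peekElements(4, 0, name -> true).size()); // prints 3

    // A producer adds the fourth element while the consumer is waiting.
    new Thread(() -> {
      try {
        Thread.sleep(100);
      } catch (InterruptedException e) {
        return;
      }
      dq.offer();
    }).start();
    System.out.println(dq.peekElements(4, 2000, name -> name.endsWith("3")));
  }
}
```

The second call in `main` should come back shortly after the producer's `offer()` rather than after the full two seconds, which is the property the last three assertions in the test pin down.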


> OverseerTaskQueue.peekTopN() fatally flawed
> -------------------------------------------
>
>                 Key: SOLR-9191
>                 URL: https://issues.apache.org/jira/browse/SOLR-9191
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.4, 5.4.1, 5.5, 5.5.1, 6.0, 6.0.1
>            Reporter: Scott Blum
>            Assignee: Scott Blum
>            Priority: Blocker
>             Fix For: 5.6, 6.1, 5.5.2, 6.0.2, 6.2
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> We rewrote DistributedQueue in SOLR-6760, to optimize its obvious use case as 
> a FIFO.  But in doing so, we broke the assumptions in 
> OverseerTaskQueue.peekTopN().
> OverseerTaskQueue.peekTopN() involves filtering out items you're already 
> working on; it's trying to peek for new items in the queue beyond what you 
> already know about.  But DistributedQueue (being designed as a FIFO) doesn't 
> know about the filtering; as long as it has any items in-memory it just keeps 
> returning those over and over without ever pulling new data from ZK.  This is 
> true even if the watcher has fired and marked the state as dirty.  So 
> OverseerTaskQueue gets into a state where it can never read new items in ZK 
> because DQ keeps returning the same items that it has marked as in-progress.
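
The failure mode described above can be reproduced in a few lines. This is a rough sketch under stated assumptions, not the Solr code: `CachedPeekSketch`, `peekFlawed` and `peekFilterAware` are hypothetical names, the `remote` set stands in for the children of the queue node in ZK, and the filter-aware variant is just one plausible way to make the cache respect the caller's filter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical illustration of the caching pattern; not Solr's DistributedQueue.
class CachedPeekSketch {
  private final TreeSet<String> remote = new TreeSet<>(); // stands in for ZK children
  private final TreeSet<String> known = new TreeSet<>();  // in-memory snapshot
  private boolean dirty = true;                            // set when the "watcher" fires

  void offer(String name) {
    remote.add(name);
    dirty = true;
  }

  private void refresh() {
    known.clear();
    known.addAll(remote);
    dirty = false;
  }

  private List<String> select(int max, Predicate<String> accept) {
    List<String> out = new ArrayList<>();
    for (String name : known) {
      if (out.size() >= max) break;
      if (accept.test(name)) out.add(name);
    }
    return out;
  }

  // Flawed: refetches only when the snapshot is empty, ignoring the dirty flag.
  List<String> peekFlawed(int max, Predicate<String> accept) {
    if (known.isEmpty()) refresh();
    return select(max, accept);
  }

  // Filter-aware: refetches when the snapshot is stale and cannot satisfy the filter.
  List<String> peekFilterAware(int max, Predicate<String> accept) {
    List<String> out = select(max, accept);
    if (out.size() < max && dirty) {
      refresh();
      out = select(max, accept);
    }
    return out;
  }

  public static void main(String[] args) {
    CachedPeekSketch q = new CachedPeekSketch();
    Set<String> inProgress = new HashSet<>();
    Predicate<String> notInProgress = name -> !inProgress.contains(name);

    q.offer("qn-0");
    q.offer("qn-1");
    inProgress.addAll(q.peekFlawed(2, notInProgress)); // picks up qn-0 and qn-1

    q.offer("qn-2"); // new work arrives while the first two are in progress
    System.out.println(q.peekFlawed(2, notInProgress));      // prints []
    System.out.println(q.peekFilterAware(2, notInProgress)); // prints [qn-2]
  }
}
```

In the flawed path the snapshot is never empty while `qn-0` and `qn-1` are in progress, so `qn-2` stays invisible no matter how often the caller peeks.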



