Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

via GitHub Tue, 03 Oct 2023 03:22:32 -0700


javanna commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343870579



##########
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##########
@@ -267,7 +265,132 @@ protected LeafSlice[] slices(List<LeafReaderContext> 
leaves) {
             return slices.toArray(new LeafSlice[0]);
           }
         };
-    searcher.search(new MatchAllDocsQuery(), 10);
-    assertEquals(leaves.size(), numExecutions.get());
+    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+    assertTrue(topDocs.totalHits.value > 0);
+    if (leaves.size() <= 1) {
+      assertEquals(0, numExecutions.get());
+    } else {
+      assertEquals(leaves.size(), numExecutions.get());
+    }
+  }
+
+  /**
+   * The goal of this test is to ensure that TaskExecutor.invokeAll waits for 
all tasks/callables to
+   * finish even if one or more of them throw an Exception.
+   *
+   * <p>To make the test deterministic, a custom single threaded executor is 
used. And to ensure
+   * that TaskExecutor.invokeAll does not return early upon getting an 
Exception, two Exceptions are
+   * thrown in the underlying Query class (in the Weight#scorer method). The 
first Exception is
+   * thrown by the first call to Weight#scorer and the last Exception is 
thrown by the last call to
+   * Weight#scorer. Since TaskExecutor.invokeAll adds subsequent Exceptions to 
the first one caught
+   * as a suppressed Exception, we can check that both exceptions were thrown, 
ensuring that all
+   * TaskExecutor#invokeAll check all tasks (using future.get()) before it 
returned.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+    List<LeafReaderContext> leaves = reader.leaves();
+    IndexSearcher searcher =
+        new IndexSearcher(
+            reader,
+            task -> {
+              task.run();

Review Comment:
   I think that I mislead you with this suggestion, sorry. I would rather use a 
single threaded executor instead. That is rather slow as it executes one task 
at a time, but it's executing on a separate thread hence the `execute` call 
does not block which is more realistic, otherwise you indeed implicitly wait 
for all tasks to be completed because invokeAll first submits them and then 
waits for them all to complete.
   
   We have control over the future (because we create a FutureTask in 
TaskExecutor), but it sounds artificial to e.g. override `get` to ensure it is 
called. I would like to rather have a higher level test that verifies that no 
running tasks are left behind once the overall top-level operation returns.
   
   I have been thinking that it probably makes sense to move this test to 
TestTaskExecutor and make it less generic. `invokeAll` takes now a collection 
of callables and perhaps you can test that independently of docs and segments 
created. Possibly increasing the number of tasks is going to make the test more 
repeatable. I think we can live with the probabilistic nature of this test 
though.
   
   The following:
   
   ```
     public void testInvokeAllDoesNotLeaveTasksBehind() {
         TaskExecutor taskExecutor = new TaskExecutor(executorService);
         AtomicInteger tasks = new AtomicInteger(0);
         List<Callable<Void>> callables = new ArrayList<>();
         callables.add(() -> {
             throw new RuntimeException();
         });
         for (int i = 0; i < 99; i++) {
             callables.add(() -> {
                 tasks.incrementAndGet();
                 return null;
             });
         }
         expectThrows(RuntimeException.class, () -> 
taskExecutor.invokeAll(callables));
         assertEquals(99, tasks.get());
     }
   ```
   
   fails without the wait 903 times out of 1000 on my machine. It is also much 
simpler and does not require a custom query either.
   
   I would still consider testing suppressed exceptions and wait in two 
different methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

Reply via email to