Following Erick’s Bad 🍏 report, I looked at MultiThreadedOCPTest.test().
I found a failure of testFillWorkQueue() in the Jenkins logs (I was not
able to reproduce it locally).

This test enqueues a large number of tasks (115, more than the 100
Collection API parallel executors) to the Collection API queue for a
collection COLL_A, then waits briefly and enqueues a task for another
collection COLL_B.
It verifies that the COLL_B task (which does not require the same lock as
the COLL_A tasks) completes before the third (?) COLL_A task.

*Test failures happen for a disarmingly simple reason:* when enqueues are
slowed down enough, the first 3 tasks on COLL_A complete even before the
COLL_B task gets enqueued!

In the failed Jenkins test execution, the COLL_B task was enqueued 1275ms
after the first COLL_A task, which is longer than the 1 second the
"blocking" COLL_A task takes. That left plenty of time for a few (and
possibly all) COLL_A tasks to complete.
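
To make the timing concrete, here is a minimal sketch of the race. It is not
the actual test or the Overseer code: the class SlowEnqueueSketch and its
methods are hypothetical, the per-collection lock is approximated by one
single-threaded executor per collection, and the durations are scaled down
from the real ones (1 second blocking task, 1275ms enqueue delay).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model, not Solr code.
public class SlowEnqueueSketch {
    // Assumption: one single-threaded executor per collection stands in for
    // the per-collection lock (same-collection tasks run one at a time, in order).
    private final Map<String, ExecutorService> perCollection = new ConcurrentHashMap<>();
    private final List<String> completionOrder = new CopyOnWriteArrayList<>();

    void enqueue(String collection, String taskId, long workMillis) {
        perCollection
            .computeIfAbsent(collection, c -> Executors.newSingleThreadExecutor())
            .submit(() -> {
                try {
                    Thread.sleep(workMillis); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completionOrder.add(taskId);
            });
    }

    /** Runs the scenario and returns the task ids in completion order. */
    static List<String> run(long blockingMillis, long enqueueDelayMillis)
            throws InterruptedException {
        SlowEnqueueSketch queue = new SlowEnqueueSketch();
        queue.enqueue("COLL_A", "A0", blockingMillis); // the "blocking" task
        for (int i = 1; i < 115; i++) {
            queue.enqueue("COLL_A", "A" + i, 1);       // short follow-up tasks
        }
        Thread.sleep(enqueueDelayMillis);              // delay before the COLL_B enqueue
        queue.enqueue("COLL_B", "B0", 1);
        for (ExecutorService executor : queue.perCollection.values()) {
            executor.shutdown();
            executor.awaitTermination(30, TimeUnit.SECONDS);
        }
        return queue.completionOrder;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> fast = run(400, 20);  // COLL_B enqueued while A0 still runs
        List<String> slow = run(50, 400);  // COLL_B enqueued after A0 has finished
        System.out.println("fast enqueue: B0 completed at position " + fast.indexOf("B0"));
        System.out.println("slow enqueue: B0 completed at position " + slow.indexOf("B0"));
    }
}
```

When the enqueue delay exceeds the blocking task's duration, A0, A1 and A2
are all done before B0 is even enqueued, so a "COLL_B before the third
COLL_A task" check fails although nothing is wrong with the queue itself.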

I suggest two changes (is adding a PR to SOLR-12801
<https://issues.apache.org/jira/browse/SOLR-12801> the right way to do it?
Will somebody merge it from there?):

   - Make the “blocking” COLL_A task longer to execute (increase from 1 to
   2 seconds) to compensate for slow enqueues. Hopefully 2 seconds is
   sufficient… If it’s not, we can increase it further later.
   - Verify the COLL_B task (a 1ms task) finishes before the *first* COLL_A
   task (the long running one) and not the 3rd. This would be a better
   indication that even though the collection queue was filled with tasks
   waiting for a busy lock, a non-competing task was picked and executed
   right away.

There would still be a grey area: what if the enqueue of the COLL_B task
happened before the first COLL_A task even started to execute? If we wanted
to deal with that, we could enqueue all COLL_A tasks, make the second COLL_A
task long running (and not the first), enqueue the COLL_B task once the
first COLL_A task has completed and verify the COLL_B task completes before
the second (long running) COLL_A task. I believe that's slightly overkill
(yet easier to implement than to describe, so I can include that as well in
the PR if deemed useful).
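
For what it's worth, that variant could look roughly like the sketch below.
It is only an illustration under assumptions, not the proposed patch: the
class GreyAreaSketch and its helper are hypothetical, each collection's lock
is modelled by a single-threaded executor, and the long running task is
300ms here rather than a realistic duration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical toy model, not Solr code.
public class GreyAreaSketch {
    // Builds a task that sleeps, records its id, then optionally signals a latch.
    private static Runnable task(String id, long workMillis, List<String> order,
                                 CountDownLatch onDone) {
        return () -> {
            try {
                Thread.sleep(workMillis); // simulated work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            order.add(id);
            if (onDone != null) {
                onDone.countDown();
            }
        };
    }

    /** Runs the scenario and returns the task ids in completion order. */
    static List<String> run(long longTaskMillis) throws InterruptedException {
        // Assumption: a single-threaded executor per collection stands in for its lock.
        ExecutorService collA = Executors.newSingleThreadExecutor();
        ExecutorService collB = Executors.newSingleThreadExecutor();
        List<String> order = new CopyOnWriteArrayList<>();
        CountDownLatch firstADone = new CountDownLatch(1);

        collA.submit(task("A0", 1, order, firstADone));        // short first task
        collA.submit(task("A1", longTaskMillis, order, null)); // long running second task
        for (int i = 2; i < 115; i++) {
            collA.submit(task("A" + i, 1, order, null));
        }

        // Enqueue COLL_B only once the first COLL_A task has completed, so the
        // COLL_A lock is known to be busy with the long running task.
        firstADone.await();
        collB.submit(task("B0", 1, order, null));

        collA.shutdown();
        collB.shutdown();
        collA.awaitTermination(30, TimeUnit.SECONDS);
        collB.awaitTermination(30, TimeUnit.SECONDS);
        return order;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> order = run(300);
        System.out.println("B0 before A1: " + (order.indexOf("B0") < order.indexOf("A1")));
    }
}
```

The latch removes the dependency on how fast the first enqueues are; the
only remaining timing assumption is that the long running task outlasts the
enqueue and execution of the COLL_B task.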

Ilan
