Following Erick’s Bad 🍏 report, I looked at MultiThreadedOCPTest.test(). I've found a failure in testFillWorkQueue() in Jenkins logs (not able to reproduce locally).
This test enqueues a large number of tasks (115, more than the 100 Collection API parallel executors) to the Collection API queue for a collection COLL_A, then waits for a short delay and enqueues a task for another collection, COLL_B. It verifies that the COLL_B task (which does not require the same lock as the COLL_A tasks) completes before the third (?) COLL_A task.

*Test failures happen for a disarmingly simple reason:* when enqueues are slowed down enough, the first 3 tasks on COLL_A complete even before the COLL_B task gets enqueued! In the failed Jenkins test execution, the COLL_B task was enqueued 1275ms after the first COLL_A task, leaving plenty of time for a few (and possibly all) COLL_A tasks to complete.

I suggest two changes (is adding a PR to SOLR-12801 <https://issues.apache.org/jira/browse/SOLR-12801> the right way to do it? Will somebody merge it from there?):

- Make the “blocking” COLL_A task longer to execute (increase from 1 to 2 seconds) to compensate for slow enqueues. Hopefully 2 seconds is sufficient… If it’s not, we can increase it further later.
- Verify that the COLL_B task (a 1ms task) finishes before the *first* COLL_A task (the long-running one) and not the 3rd. This would be a better indication that even though the collection queue was filled with tasks waiting for a busy lock, a non-competing task was picked and executed right away.

There would still be a grey area: what if the COLL_B task was enqueued before the first COLL_A task even started to execute? If we wanted to deal with that, we could enqueue all COLL_A tasks, make the second COLL_A task the long-running one (instead of the first), enqueue the COLL_B task once the first COLL_A task has completed, and verify that the COLL_B task completes before the second (long-running) COLL_A task. I believe that's slightly overkill (yet easier to implement than to describe, so I can include it in the PR as well if deemed useful).

Ilan
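P.S. To make the timing argument concrete, here is a rough back-of-the-envelope model in Java. It is not the actual test code: the class and method names are made up for illustration, and it assumes that COLL_A tasks run strictly one at a time starting at t=0, that every COLL_A task after the blocking one takes 1ms (my guess, not something I checked), and that the COLL_B task starts as soon as it is enqueued.

```java
// Hypothetical timing model, NOT the real MultiThreadedOCPTest code.
final class FillWorkQueueTimingModel {

    /**
     * Counts the COLL_A tasks that have completed strictly before the COLL_B
     * task completes.
     *
     * Assumptions (mine): COLL_A tasks run serially from t=0 because they
     * share a lock; the first one takes blockingMs and every later one takes
     * shortTaskMs; COLL_B starts the moment it is enqueued and also takes
     * shortTaskMs.
     */
    public static int collATasksDoneBeforeCollB(
            int collATaskCount, long blockingMs, long shortTaskMs, long collBEnqueueMs) {
        long collBDoneAt = collBEnqueueMs + shortTaskMs;
        long clock = 0;
        int done = 0;
        for (int i = 0; i < collATaskCount; i++) {
            // The first COLL_A task is the long "blocking" one.
            clock += (i == 0) ? blockingMs : shortTaskMs;
            if (clock >= collBDoneAt) {
                break; // this COLL_A task finishes no earlier than COLL_B
            }
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        // Failed Jenkins run: 1s blocking task, COLL_B enqueued after 1275ms.
        System.out.println(collATasksDoneBeforeCollB(115, 1000, 1, 1275));
        // Proposed change: 2s blocking task, same slow enqueue.
        System.out.println(collATasksDoneBeforeCollB(115, 2000, 1, 1275));
    }
}
```

Under these assumptions, the current check (COLL_B before the third COLL_A task) fails whenever the returned count is 3 or more, while the proposed check (COLL_B before the first COLL_A task) requires the count to be 0. With a 2 second blocking task, the proposed check holds for any COLL_B enqueue delay below roughly 2 seconds.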