DIdn’t mean to imply that the test _should_ be serialized, rather that I’ve inadvertently done so when something shouldn’t be…
Have fun! > On May 30, 2020, at 6:23 PM, Ilan Ginzburg <ilans...@gmail.com> wrote: > > Erick, > > Serializing isn't really an option here, we want to test that execution order > is not submission order... > > I believe if we wanted to verify the property that this test seems to assert: > "when there are more tasks than the number of executor threads and all are > blocked on a single lock, then a task enqueued afterwards and requiring > another available lock will be run right away", then maybe a unit test of > OverseerTaskProcessor/OverseerCollectionConfigSetProcessor would be > appropriate. But doing so would require refactoring: the run() method of > OverseerTaskProcessor is 200 lines. > > I'll likely go with an "overkill" option. It will still not be free from > timing issues, but hopefully less vulnerable. And I'll try to identify when > test preconditions do not hold, in order to fail with an informative message > such as "slow task finished execution too soon, cannot test running tasks in > parallel". > > Will create a new Jira. > > Thanks for your week-end feedback, > Ilan > > On Sat, May 30, 2020 at 11:26 PM Erick Erickson <erickerick...@gmail.com> > wrote: > Ilan: > > Please raise a new JIRA and attach any fixes there, a Git PR or patch > whichever you prefer. Your analysis will get lost in the noise for > SOLR-12801. And thanks for digging! We usually title them something like > “harden MultiThreadedOCPTest” or “fix failing MultiThreadedOCPTest” or > similar. > > I haven’t really looked at the code you’re analyzing, it’s the weekend after > all ;). So ignore this question if it makes no sense. Anyway, I’m always > suspicious of using delays for just this kind of reason; depending on the > hardware something may fail sometimes. Is there any way to make the > sequencing more of a positive lock? Unfortunately that can be tricky, I’ve > wound up doing things like serializing all tasks which…er…isn’t a good fix > ;). But it sounds like your “overkill” section is along those lines? > > Up to you. > > Meanwhile, I’ve started beasting that test on my spare machine. If you’re > curious about what that is: > https://gist.github.com/markrmiller/dbdb792216dc98b018ad > > But don’t worry about that part. The point is I can run the test multiple > times, with N running in parallel, hopefully making my machine exhibit the > problem sometime over the night. If I can get it to fail before but not after > your fix, it provides some reassurance that your fix is working. Not totally > certain of course. Otherwise, we’ll just commit your fixes and see if Hoss’ > rollups stop showing it. > > Thanks again! > Erick > > > > > On May 30, 2020, at 1:42 PM, Ilan Ginzburg <ilans...@gmail.com> wrote: > > > > MultiThreadedOCPTest > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org