Re: Why MultiThreadedOCPTest (sometimes) fails

Erick Erickson Sat, 30 May 2020 15:46:57 -0700

DIdn’t mean to imply that the test _should_ be serialized, rather that I’ve 
inadvertently done so when something shouldn’t be…


Have fun!

> On May 30, 2020, at 6:23 PM, Ilan Ginzburg <ilans...@gmail.com> wrote:
> 
> Erick,
> 
> Serializing isn't really an option here, we want to test that execution order 
> is not submission order...
> 
> I believe if we wanted to verify the property that this test seems to assert: 
> "when there are more tasks than the number of executor threads and all are 
> blocked on a single lock, then a task enqueued afterwards and requiring 
> another available lock will be run right away", then maybe a unit test of 
> OverseerTaskProcessor/OverseerCollectionConfigSetProcessor would be 
> appropriate. But doing so would require refactoring: the run() method of 
> OverseerTaskProcessor is 200 lines.
> 
> I'll likely go with an "overkill" option. It will still not be free from 
> timing issues, but hopefully less vulnerable. And I'll try to identify when 
> test preconditions do not hold, in order to fail with an informative message 
> such as "slow task finished execution too soon, cannot test running tasks in 
> parallel".
> 
> Will create a new Jira.
> 
> Thanks for your week-end feedback,
> Ilan
> 
> On Sat, May 30, 2020 at 11:26 PM Erick Erickson <erickerick...@gmail.com> 
> wrote:
> Ilan:
> 
> Please raise a new JIRA and attach any fixes there, a Git PR or patch 
> whichever you prefer. Your analysis will get lost in the noise for 
> SOLR-12801. And thanks for digging! We usually title them something like 
> “harden MultiThreadedOCPTest” or “fix failing MultiThreadedOCPTest” or 
> similar.
> 
> I haven’t really looked at the code you’re analyzing, it’s the weekend after 
> all ;). So ignore this question if it makes no sense. Anyway, I’m always 
> suspicious of using delays for just this kind of reason; depending on the 
> hardware something may fail sometimes. Is there any way to make the 
> sequencing more of a positive lock? Unfortunately that can be tricky, I’ve 
> wound up doing things like serializing all tasks which…er…isn’t a good fix 
> ;). But it sounds like your “overkill” section is along those lines? 
> 
> Up to you.
> 
> Meanwhile, I’ve started beasting that test on my spare machine. If you’re 
> curious about what that is: 
> https://gist.github.com/markrmiller/dbdb792216dc98b018ad
> 
> But don’t worry about that part. The point is I can run the test multiple 
> times, with N running in parallel, hopefully making my machine exhibit the 
> problem sometime over the night. If I can get it to fail before but not after 
> your fix, it provides some reassurance that your fix is working. Not totally 
> certain of course. Otherwise, we’ll just commit your fixes and see if Hoss’ 
> rollups stop showing it.
> 
> Thanks again!
> Erick
> 
> 
> 
> > On May 30, 2020, at 1:42 PM, Ilan Ginzburg <ilans...@gmail.com> wrote:
> > 
> > MultiThreadedOCPTest
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Why MultiThreadedOCPTest (sometimes) fails

Reply via email to