I'd like some discussion about the problem outlined in SOLR-14861

Erick Erickson Wed, 16 Sep 2020 05:15:47 -0700

All:

The test framework, and perhaps all of Solr, has a disorderly shutdown process. 
I’ve seen at least one case where this is responsible for “bogus” test 
failures, bogus in the sense that the test failed with unreleased objects 
because of a race condition. The short form is that our test harness can call 
CoreContainer.shutdown() directly, and it did so while reload() operations 
were in flight and had already gotten past the check of 
CoreContainer.isShutdown(). The reload() thread is then time-sliced out, the 
shutdown() thread gets partway through, and the reload() thread picks up 
again, but by now CoreContainer is partly shut down and things go wonky.
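
To make that interleaving concrete, here is a minimal sketch of the 
check-then-act race. Everything in it is hypothetical: Container is a 
stand-in and not Solr's CoreContainer, and the latches exist only to force 
the unlucky scheduling every time so the outcome is reproducible.

```java
import java.util.concurrent.CountDownLatch;

final class CheckThenActRace {
  /** Hypothetical stand-in for a container; NOT Solr's CoreContainer. */
  static final class Container {
    volatile boolean isShutDown = false;
    volatile Object resource = new Object();

    void shutdown() {
      isShutDown = true;
      resource = null; // released during shutdown
    }
  }

  /** Returns true if the reload thread found the resource already released. */
  static boolean reloadSawReleasedResource() {
    Container c = new Container();
    CountDownLatch pastCheck = new CountDownLatch(1);
    CountDownLatch shutdownDone = new CountDownLatch(1);
    boolean[] sawReleased = new boolean[1];

    Thread reload = new Thread(() -> {
      if (c.isShutDown) {
        return; // the check passes: shutdown has not started yet
      }
      pastCheck.countDown();
      try {
        shutdownDone.await(); // simulates being time-sliced out after the check
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      sawReleased[0] = (c.resource == null); // acts on a container that is gone
    });

    reload.start();
    try {
      pastCheck.await();        // reload is now past the isShutDown check
      c.shutdown();             // shutdown runs in the gap
      shutdownDone.countDown(); // let reload resume
      reload.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return sawReleased[0];
  }
}
```

The flag check alone cannot help here, because nothing stops shutdown() from 
running between the check and the work that depends on it.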


The focus on CoreContainer.isShutdown() is just for illustration; it is 
something of a legacy problem, since the test harness manipulates things at 
this level.

Looking through the code, I also see a number of places outside 
CoreContainer that check the isShutdown flag in CoreContainer, so the problem 
is more widespread than just CoreContainer.

Don’t look at the patch on that JIRA. The more I think about it, the more it 
looks like a totally bad approach.

Generically, we need a mechanism so that, when we shut Solr down, we

1> stop any new requests from being processed. IMO they should be rejected 
immediately.
2> wait for all in-flight operations to complete. This could get tricky if one 
of the operations is, say, optimize.
3> then shut down.
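
One way the three steps above could be sketched is a small "gate" that every 
operation passes through. This is only an illustration under stated 
assumptions: ShutdownGate, tryEnter(), exit() and closeAndAwait() are 
hypothetical names, not existing Solr APIs, and how such a gate would be 
wired into CoreContainer or the request path is left open.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical shutdown gate; not part of Solr. */
final class ShutdownGate {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition drained = lock.newCondition();
  private int inFlight = 0;
  private boolean closing = false;

  /** Step 1: admit an operation, or reject it once shutdown has begun. */
  boolean tryEnter() {
    lock.lock();
    try {
      if (closing) {
        return false;
      }
      inFlight++; // the check and the registration happen under one lock
      return true;
    } finally {
      lock.unlock();
    }
  }

  /** Must be called (in a finally block) for every successful tryEnter(). */
  void exit() {
    lock.lock();
    try {
      if (inFlight <= 0) {
        throw new IllegalStateException("exit() without matching tryEnter()");
      }
      inFlight--;
      if (inFlight == 0) {
        drained.signalAll();
      }
    } finally {
      lock.unlock();
    }
  }

  /**
   * Steps 1 and 2: stop admitting new operations, then wait for in-flight
   * ones. Returns true if everything drained, false on timeout or interrupt.
   */
  boolean closeAndAwait(long timeout, TimeUnit unit) {
    long nanos = unit.toNanos(timeout);
    lock.lock();
    try {
      closing = true;
      while (inFlight > 0) {
        if (nanos <= 0L) {
          return false;
        }
        nanos = drained.awaitNanos(nanos);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      lock.unlock();
    }
  }
}
```

An operation such as reload() would call tryEnter(), return immediately if it 
gets false, and otherwise do its work and call exit() in a finally block. The 
shutdown path would call closeAndAwait() first and only then do step 3. The 
timeout is there for the long-running case such as optimize; what to do when 
the wait times out (keep waiting, or interrupt the operation) is a policy 
question this sketch does not answer.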

Then perhaps rework the test harness to use that mechanism rather than call 
CoreContainer.shutdown() directly.

That said, I don’t have a clue how to make that happen.

Erick