Flaky TestCircuitBreaker.testResponseWithCBTiming

Alex Deparvu Mon, 03 Jul 2023 11:03:04 -0700

Hi,

I am looking at the flaky TestCircuitBreaker.testResponseWithCBTiming test.


Failure stats
  Class: org.apache.solr.util.TestCircuitBreaker
  Method: testResponseWithCBTiming
  Failures: 6.04% (20 / 331)


There is already a ticket tracking this, SOLR-15819, which contains an
initial analysis but unfortunately has not moved in a while. I will be
building on top of that, trying to raise some visibility and ideally
reduce this noise.

The trouble with this flaky test is that it sets up a memory circuit
breaker with a 75% threshold and fires off a number of queries without
apparently verifying anything:
public void testResponseWithCBTiming() {
    assertQ(   /* list of queries */ );
}

The way I am reading this: the test is not actually verifying anything
and, as can already be seen, it is very fragile to external factors
(concurrent tests on the same JVM eating the available heap), so it fails
frequently.
I am proposing to simply remove it. Alternatively, we could leave it in but
bump the threshold to 95%.
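
To illustrate why a heap-based threshold is sensitive to whatever else is
running in the same JVM, here is a minimal sketch. It is not the Solr
circuit breaker code: the class HeapThresholdSketch and the helper
isTripped are hypothetical names, and the only assumption is that the
breaker compares used heap against a percentage of the maximum heap.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical sketch, not the Solr implementation.
class HeapThresholdSketch {

    // Returns true when used heap is strictly above thresholdPct percent of max heap.
    static boolean isTripped(long usedBytes, long maxBytes, int thresholdPct) {
        return usedBytes * 100.0 / maxBytes > thresholdPct;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long used = heap.getUsed();
        // getMax() returns -1 when the maximum is undefined; fall back to committed.
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();

        // The same heap reading can trip a 75% threshold while staying under 95%.
        System.out.println("used heap %: " + (used * 100.0 / max));
        System.out.println("tripped at 75%: " + isTripped(used, max, 75));
        System.out.println("tripped at 95%: " + isTripped(used, max, 95));
    }
}
```

The heap reading is shared by everything in the JVM, so a concurrent test
that allocates heavily changes the outcome even though the queries under
test are identical. Raising the threshold only narrows that window; it
does not remove it.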

Going purely by the test name, I am seeing a 'query response timing'
aspect. If this was the intention behind it (to check the overhead of the
CBs), we could move this test over to the benchmark module, where we could
compare CB vs non-CB mode numbers.

Please chime in with your thoughts, and/or let me know if I missed anything.

best,
alex
