gharris1727 commented on PR #15080:
URL: https://github.com/apache/kafka/pull/15080#issuecomment-1875639893

   > just invoking methods in order is not enough to trigger the deadlock. I 
believe it is possible to reliably reproduce the deadlock with two countdown 
latches, one countdown in the ConfigBackingStore#snapshot and another in 
ConfigBackingStore#putTaskConfigs. This requires a mock for the config backing 
store. If you have a better idea I am happy to analyze it.
   
   Yeah I understand. I think the cost of deterministically reproducing the 
deadlock is too high. I did it in #8259 because I didn't know what 
synchronization was missing and needed a repro case to debug.
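
   For reference, the two-latch approach from the quote can be sketched in 
isolation. This is only an illustration of the technique, not the Connect code: 
`LatchDeadlockSketch`, `herderLock` and `storeLock` are hypothetical stand-ins, 
and the real lock ordering in the herder and the config backing store is 
assumed rather than reproduced. The timed `tryLock` stands in for a blocking 
acquire so that the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LatchDeadlockSketch {
    // Hypothetical stand-ins for the two locks involved; not Kafka classes.
    private final ReentrantLock herderLock = new ReentrantLock();
    private final ReentrantLock storeLock = new ReentrantLock();

    /** Returns how many of the two threads could not take their second lock. */
    public int run() {
        CountDownLatch inSnapshot = new CountDownLatch(1);       // "snapshot" side holds its lock
        CountDownLatch inPutTaskConfigs = new CountDownLatch(1); // "putTaskConfigs" side holds its lock
        CountDownLatch bothAttempted = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        Thread a = new Thread(() -> acquireBoth(
                herderLock, storeLock, inSnapshot, inPutTaskConfigs, bothAttempted, blocked));
        Thread b = new Thread(() -> acquireBoth(
                storeLock, herderLock, inPutTaskConfigs, inSnapshot, bothAttempted, blocked));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    private static void acquireBoth(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch mine, CountDownLatch theirs,
                                    CountDownLatch bothAttempted, AtomicInteger blocked) {
        first.lock();
        try {
            mine.countDown();   // signal: this thread now holds its first lock
            theirs.await();     // wait until the other thread holds its first lock
            // With a blocking lock() this is where the deadlock would occur.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep holding the first lock until both attempts have finished,
            // so neither attempt can succeed because the other side gave up early.
            bothAttempted.countDown();
            bothAttempted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

   The latches only force the interleaving; the cost mentioned above comes from 
having to inject them into the real code paths, which is what needs the mock.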
   
   I would be satisfied with a test that non-deterministically reproduces the 
deadlock but is less brittle and includes fewer mocks. Currently we only have 
two connectors calling task reconfiguration (mirror checkpoint and source) and 
one test in the DistributedHerder. There is zero coverage in StandaloneHerder, 
which is part of why we never found this bug :)
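
   As a rough idea of what a non-deterministic test could look like, here is a 
minimal sketch that runs two actions concurrently for a number of rounds and 
reports whether any round failed to finish in time. `ConcurrentCallSketch` and 
`completesWithoutHanging` are hypothetical names, and the two `Runnable`s are 
placeholders for whatever herder calls the real test would make; nothing here 
is taken from the existing test suite.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ConcurrentCallSketch {
    /**
     * Runs both actions concurrently for the given number of rounds and
     * returns true only if every round finishes within the timeout.
     * Exceptions thrown by the actions are not inspected in this sketch.
     */
    public static boolean completesWithoutHanging(Runnable first, Runnable second,
                                                  int rounds, long timeoutMs) {
        ExecutorService executor = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a stuck thread must not keep the JVM alive
            return t;
        });
        try {
            for (int i = 0; i < rounds; i++) {
                List<Callable<Object>> calls =
                        Arrays.asList(Executors.callable(first), Executors.callable(second));
                // invokeAll cancels any task that has not completed by the timeout.
                List<Future<Object>> results =
                        executor.invokeAll(calls, timeoutMs, TimeUnit.MILLISECONDS);
                for (Future<Object> result : results) {
                    if (result.isCancelled()) {
                        return false; // at least one action did not finish in time
                    }
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

   A test built this way can only catch the deadlock when the scheduler happens 
to produce the bad interleaving, so more rounds raise the odds without making 
the result deterministic.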
   
   > Unrelated to this PR's issue, it may be that the wait operations in 
StandaloneHerderTest are by mistake 1000 seconds instead of milliseconds. Isn't 
it?
   
   Yeah that timeout seems a bit absurd, but if there aren't deadlocks in the 
test it should never incur that timeout. It looks like the test suite is very 
well behaved in practice, so I'm inclined to keep it as-is: 
https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P90D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.container=*StandaloneHerderTest&tests.sortField=FLAKY

