gharris1727 commented on PR #15080: URL: https://github.com/apache/kafka/pull/15080#issuecomment-1875639893
> just invoking methods in order is not enough to trigger the deadlock. I believe it is possible to reliably reproduce the deadlock with two countdown latches, one countdown in the ConfigBackingStore#snapshot and another in ConfigBackingStore#putTaskConfigs. This requires a mock for the config backing store. If you have a better idea I am happy to analyze it.

Yeah I understand. I think the cost of deterministically reproducing the deadlock is too high. I did it in #8259 because I didn't know what synchronization was missing and needed a repro case to debug. I would be satisfied with a test which non-deterministically reproduces the deadlock but is less brittle and includes less mocks. Currently we only have two connectors calling task reconfiguration (mirror checkpoint and source) and one test in the DistributedHerder. There is zero coverage in StandaloneHerder, which is part of why we never found this bug :)

> Unrelated to this PR's issue, it may be that the wait operations in StandaloneHerderTest are by mistake 1000 seconds instead of milliseconds. Isn't it?

Yeah that timeout seems a bit absurd, but if there aren't deadlocks in the test it should never incur that timeout. It looks like the test suite is very well behaved in practice, so I'm inclined to keep it as-is:
https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P90D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.container=*StandaloneHerderTest&tests.sortField=FLAKY
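
For readers unfamiliar with the two-latch technique mentioned in the quoted text, here is a minimal, generic sketch of how two `CountDownLatch` instances can force a lock-order deadlock deterministically. It does not use any Kafka classes: `LatchDeadlockSketch`, `worker`, `reproducesDeadlock`, `lockA`, and `lockB` are hypothetical names chosen for illustration, and the two locks merely stand in for whatever synchronization the real herder and config backing store use. The 500 ms join timeout is likewise an arbitrary assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LatchDeadlockSketch {

    /** Takes `first`, waits until the peer holds its own lock, then requests `second`. */
    static Thread worker(ReentrantLock first, ReentrantLock second,
                         CountDownLatch iHoldMine, CountDownLatch peerHoldsTheirs) {
        Thread t = new Thread(() -> {
            try {
                first.lockInterruptibly();
                try {
                    iHoldMine.countDown();      // announce that the first lock is held
                    peerHoldsTheirs.await();    // wait until the peer holds the other lock
                    second.lockInterruptibly(); // opposite lock order per thread: blocks
                    second.unlock();
                } finally {
                    first.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // interrupted by the detector: give up
            }
        });
        t.setDaemon(true);
        return t;
    }

    /** Returns true if both threads are still blocked after the timeout. */
    static boolean reproducesDeadlock() {
        ReentrantLock lockA = new ReentrantLock(); // hypothetical stand-in lock
        ReentrantLock lockB = new ReentrantLock(); // hypothetical stand-in lock
        CountDownLatch aHeld = new CountDownLatch(1);
        CountDownLatch bHeld = new CountDownLatch(1);

        Thread t1 = worker(lockA, lockB, aHeld, bHeld);
        Thread t2 = worker(lockB, lockA, bHeld, aHeld);
        t1.start();
        t2.start();
        try {
            t1.join(500);
            t2.join(500);
            boolean stuck = t1.isAlive() && t2.isAlive();
            // Interruptible lock acquisition lets the sketch clean up after itself.
            t1.interrupt();
            t2.interrupt();
            t1.join();
            t2.join();
            return stuck;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock reproduced: " + reproducesDeadlock());
    }
}
```

The latches remove the timing dependence: neither thread requests its second lock until the other is known to hold it. That determinism is also what makes this style of test brittle in a real code base, since the latches have to be injected at exact points through mocks.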