Re: [PR] KAFKA-16051: Fixed deadlock in StandaloneHerder [kafka]

via GitHub Wed, 03 Jan 2024 08:28:54 -0800


gharris1727 commented on PR #15080:
URL: https://github.com/apache/kafka/pull/15080#issuecomment-1875639893

> Just invoking methods in order is not enough to trigger the deadlock. I
believe it is possible to reliably reproduce the deadlock with two
CountDownLatches, one counted down in ConfigBackingStore#snapshot and the other in
ConfigBackingStore#putTaskConfigs. This requires a mock for the config backing
store. If you have a better idea, I am happy to analyze it.
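For reference, the two-latch technique described above can be sketched generically. This is a minimal, hypothetical illustration, not the Kafka code path: two ReentrantLocks stand in for the herder's lock and the config backing store's lock, and each thread counts down its own latch once it holds its first lock, then waits for the other thread's latch. `tryLock` with a timeout replaces the blocking acquire so the sketch reports the deadlock instead of hanging. The class and method names are made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class LatchDeadlockRepro {
    /** Returns how many of the two threads could not take their second lock (2 means deadlock). */
    static int reproduce() {
        // Hypothetical stand-ins for the herder's lock and the config store's lock.
        ReentrantLock herderLock = new ReentrantLock();
        ReentrantLock storeLock = new ReentrantLock();
        // One latch per thread, counted down once that thread holds its first lock.
        CountDownLatch herderHeld = new CountDownLatch(1);
        CountDownLatch storeHeld = new CountDownLatch(1);
        // Keeps both first locks held until both second-lock attempts have finished.
        CountDownLatch attemptsDone = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        Thread a = new Thread(() -> lockBoth(herderLock, herderHeld, storeHeld, storeLock, attemptsDone, blocked));
        Thread b = new Thread(() -> lockBoth(storeLock, storeHeld, herderHeld, herderLock, attemptsDone, blocked));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    private static void lockBoth(ReentrantLock first, CountDownLatch holdingFirst,
                                 CountDownLatch otherHoldingFirst, ReentrantLock second,
                                 CountDownLatch attemptsDone, AtomicInteger blocked) {
        first.lock();
        try {
            holdingFirst.countDown();
            otherHoldingFirst.await(); // from here on, both threads hold their first lock
            // A plain second.lock() would hang forever here; tryLock reports the cycle instead.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            attemptsDone.countDown();
            attemptsDone.await(); // do not release the first lock until the other attempt is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

`LatchDeadlockRepro.reproduce()` should return 2 on every run. Without the latches, the two threads would only occasionally interleave this way, which matches the observation that invoking the methods in order is not enough.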

Yeah, I understand. I think the cost of deterministically reproducing the
deadlock is too high. I did it in #8259 because I didn't know what
synchronization was missing and needed a repro case to debug.

I would be satisfied with a test that reproduces the deadlock
non-deterministically but is less brittle and uses fewer mocks. Currently the
only coverage for task reconfiguration comes from two connectors that trigger it
(mirror checkpoint and mirror source) and one test for the DistributedHerder.
There is zero coverage for StandaloneHerder, which is part of why we never found
this bug :)
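One possible shape for such a non-deterministic test is a stress loop: run two operations concurrently many times and treat a round that does not finish within a bounded timeout as a probable deadlock. This is a minimal sketch under stated assumptions; `DeadlockStress` and `completesWithoutDeadlock` are hypothetical names, and the two `Runnable`s stand in for whatever herder calls race in practice (for example, a task reconfiguration request and a config snapshot). It is not the test from this PR.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class DeadlockStress {
    /**
     * Runs opA and opB concurrently `iterations` times. Returns false if any round
     * fails to finish within timeoutMs milliseconds, which most likely means a deadlock.
     */
    static boolean completesWithoutDeadlock(Runnable opA, Runnable opB, int iterations, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // deadlocked threads must not keep the JVM alive
            return t;
        });
        try {
            for (int i = 0; i < iterations; i++) {
                Future<?> a = pool.submit(opA);
                Future<?> b = pool.submit(opB);
                try {
                    a.get(timeoutMs, TimeUnit.MILLISECONDS);
                    b.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    return false; // a round got stuck
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause()); // the operation itself failed
                }
            }
            return true;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, two `Runnable`s that both lock a `herder` object and then a `store` object in the same order should make `completesWithoutDeadlock(reconfigure, snapshot, 1000, 5000)` return true. Reversing the order in one of them makes a `false` result likely but not guaranteed, which is the trade-off of a less brittle, mock-free test.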

> Unrelated to this PR's issue: could it be that the wait operations in
StandaloneHerderTest use 1000 seconds by mistake, instead of 1000 milliseconds?

Yeah, that timeout seems a bit absurd, but if the test has no deadlocks it
should never hit that timeout. The test suite looks very well behaved in
practice, so I'm inclined to keep it as-is:
https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P90D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.container=*StandaloneHerderTest&tests.sortField=FLAKY
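The reasoning that a generous timeout costs nothing on a passing run can be checked directly: a timed await returns as soon as its condition holds, so the timeout is only paid when something is actually stuck. A minimal sketch with a plain `CountDownLatch`; `TimeoutCost` and `millisSpentWaiting` are hypothetical names and do not come from StandaloneHerderTest.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class TimeoutCost {
    /** Awaits an already-released latch with the given timeout and returns the wall time spent, in ms. */
    static long millisSpentWaiting(long timeout, TimeUnit unit) {
        CountDownLatch done = new CountDownLatch(1);
        done.countDown(); // the operation under test has already finished
        long start = System.nanoTime();
        try {
            if (!done.await(timeout, unit)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

`millisSpentWaiting(1000, TimeUnit.SECONDS)` should return close to 0, even though `TimeUnit.SECONDS.toMillis(1000)` is 1,000,000 ms (about 16.7 minutes). That worst case is only reached if the test genuinely hangs.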

--
This is an automated message from the Apache Git Service.
