[ https://issues.apache.org/jira/browse/CASSANDRA-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109647#comment-17109647 ]
Ekaterina Dimitrova edited comment on CASSANDRA-15685 at 5/17/20, 9:20 PM:
---------------------------------------------------------------------------

After a couple of hundred more runs of this test (my gut feeling told me I was missing something), it was confirmed that the lossy notifications are not the primary issue with this test. In some cases, even when we catch the success/error notifications and the "success" and "wasConsistent" flags are properly set, the PreviewRepair still reports that the Incremental Repair is still running.

{code:java}
[junit-timeout] java.lang.RuntimeException: Repair session 82ff3420-9737-11ea-b32d-7fa12d874715 for range [(-1,9223372036854775805], (9223372036854775805,-1]] failed with error An incremental repair with session id 82eff1e0-9737-11ea-b32d-7fa12d874715 finished during this preview repair runtime
{code}

It turns out that receiving the notification doesn't always mean that the rest of the nodes have already been informed about the completion.

I can easily increase the delay before the preview repair starts. But [~dcapwell] and I were considering opening a ticket, as other parts of the code or tools might rely only on the notifications to detect completion. This is worth checking.

Also, tomorrow I am going to check in detail how we can improve this test so that it relies not on timing but probably on some metadata.

was (Author: e.dimitrova):

After a couple of hundred more runs of this test (my gut feeling told me I was missing something), it was confirmed that the lossy notifications are not the primary issue with this test. The thing is that even when we catch the success/error notifications and the "success" and "wasConsistent" flags are properly set, the PreviewRepair still reports that the Incremental Repair is still running.

{code:java}
[junit-timeout] java.lang.RuntimeException: Repair session 82ff3420-9737-11ea-b32d-7fa12d874715 for range [(-1,9223372036854775805], (9223372036854775805,-1]] failed with error An incremental repair with session id 82eff1e0-9737-11ea-b32d-7fa12d874715 finished during this preview repair runtime
{code}

It turns out that receiving the notification doesn't always mean that the rest of the nodes have already been informed about the completion.

I can easily increase the delay before the preview repair starts. But [~dcapwell] and I were considering opening a ticket, as other parts of the code or tools might rely only on the notifications to detect completion. This is worth checking.

Also, tomorrow I am going to check in detail how we can improve this test so that it relies not on timing but probably on some metadata.

> flaky testWithMismatchingPending - org.apache.cassandra.distributed.test.PreviewRepairTest
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15685
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest
>            Reporter: Kevin Gallardo
>            Assignee: Ekaterina Dimitrova
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0-alpha
>
>         Attachments: log-CASSANDRA-15685.txt, output
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Observed in:
> https://app.circleci.com/pipelines/github/newkek/cassandra/34/workflows/1c6b157d-13c3-48a9-85fb-9fe8c153256b/jobs/191/tests
> Failure:
> {noformat}
> testWithMismatchingPending - org.apache.cassandra.distributed.test.PreviewRepairTest
> junit.framework.AssertionFailedError
> 	at org.apache.cassandra.distributed.test.PreviewRepairTest.testWithMismatchingPending(PreviewRepairTest.java:97)
> {noformat}
> [Circle CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FCASSANDRA-15685]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
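
For illustration only: the comment's idea of making the test depend on observable state rather than on a fixed delay could be sketched as a small polling helper with a deadline. This is not code from the Cassandra test or from any patch on this ticket. The class name AwaitCondition, the method await, and the flag standing in for "every node has recorded the incremental repair as finished" are all hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Cassandra: polls a condition until it
// holds or a deadline passes, instead of sleeping for a fixed amount of time.
final class AwaitCondition {

    // Returns true as soon as the condition holds, false on timeout or interrupt.
    static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline reached without the condition holding
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: this flag simulates "all nodes have been informed about
        // the completion"; a real test would query node state instead.
        AtomicBoolean allNodesInformed = new AtomicBoolean(false);

        Thread simulatedPropagation = new Thread(() -> {
            try {
                Thread.sleep(100); // completion reaches the other nodes a bit later
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            allNodesInformed.set(true);
        });
        simulatedPropagation.start();

        boolean ready = await(allNodesInformed::get, 2000, 20);
        System.out.println("safe to start preview repair: " + ready);
    }
}
```

With a helper of this shape, a test would start the preview repair only after await returns true, and fail with a clear message on timeout. Which metadata to poll on each node is exactly the open question raised in the comment, so the condition is left as a placeholder here.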