[ https://issues.apache.org/jira/browse/CASSANDRA-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770022#comment-17770022 ]
Andres de la Peña commented on CASSANDRA-18816:
-----------------------------------------------

The new {{ConcurrentIrWithPreviewFuzzTest}} introduced by this patch is ~6% flaky in both 5.0 and trunk:
* https://app.circleci.com/pipelines/github/adelapena/cassandra/3222/workflows/ecfca708-f183-429e-80e5-b2bfea8d25a0/jobs/80292/tests
* https://app.circleci.com/pipelines/github/adelapena/cassandra/3221/workflows/bb777ac0-6263-4d6e-aa54-35d6928e1e9b/jobs/80294
{code}
junit.framework.AssertionFailedError: Property error detected:
Seed = 3695691971125975155
Examples = 2
Pure = false
Error: property test did not complete within PT1M
Values:
	at accord.utils.Property$Common.checkWithTimeout(Property.java:115)
	at accord.utils.Property$SingleBuilder.check(Property.java:223)
	at accord.utils.Property$ForBuilder.check(Property.java:124)
	at org.apache.cassandra.repair.ConcurrentIrWithPreviewFuzzTest.concurrentIrWithPreview(ConcurrentIrWithPreviewFuzzTest.java:46)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}
I don't see any repeated runs in the CI results above. Were they run?

I have opened CASSANDRA-18890 to deal with it.
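Why repeated runs matter for a ~6% flaky test can be sketched with a little arithmetic. Assuming each run fails independently with probability p (a simplification, since real flakiness is often correlated with machine load), the chance that n repeated runs surface at least one failure is 1 - (1 - p)^n. The class and method names below are illustrative only.

```java
// Illustrative sketch: odds that repeated runs expose an intermittently failing test,
// assuming independent runs with a fixed per-run failure probability p.
public class FlakinessOdds {

    // Probability of at least one failure in n independent runs: 1 - (1 - p)^n.
    static double probAtLeastOneFailure(double p, int n) {
        return 1.0 - Math.pow(1.0 - p, n);
    }

    public static void main(String[] args) {
        double p = 0.06; // the ~6% failure rate reported in the comment
        for (int n : new int[] {1, 10, 50, 100}) {
            System.out.printf("n=%d runs -> P(at least one failure) = %.4f%n",
                    n, probAtLeastOneFailure(p, n));
        }
    }
}
```

Under that assumption, 50 repeated runs would catch the flakiness roughly 95% of the time, while a single run catches it only about 6% of the time.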
> Add support for repair coordinator to retry messages that timeout
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-18816
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18816
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Repair
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 5.0-alpha2
>
>          Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Now that CASSANDRA-15399 is in, most of the repair messages have a state that
> they can check against to make message delivery idempotent, allowing the
> coordinator to retry such messages; a few of the most critical messages to
> retry are: PREPARE_MSG, VALIDATION_REQ, VALIDATION_RSP, SYNC_REQ, and
> SYNC_RSP.
> With this I propose making the coordinator able to retry these key messages
> to try and make repair more resilient to ephemeral issues.
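The idea in the description (retries are safe because the receiver can check per-message state and make delivery idempotent) can be illustrated with a small, self-contained sketch. This is not Cassandra's implementation: the classes `IdempotentParticipant` and `LossyLink` and the method `sendWithRetry` are hypothetical, timeouts are simulated by dropping responses rather than waiting on a real clock, and the request ids only borrow the message names from the issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of coordinator-side retries over an idempotent handler.
// None of these class or method names exist in Cassandra; timeouts are simulated.
public class RetrySketch {

    // Participant that makes delivery idempotent: it remembers the result for each
    // request id, so a retried (duplicate) request returns the stored result
    // instead of repeating the work.
    static final class IdempotentParticipant {
        private final Map<String, String> results = new HashMap<>();
        int workDone = 0; // number of times the real work actually ran

        String handle(String requestId) {
            return results.computeIfAbsent(requestId, id -> {
                workDone++;
                return "ACK:" + id;
            });
        }
    }

    // Link that always delivers the request but loses the first `drops` responses,
    // which the coordinator observes as a timeout (Optional.empty()).
    static final class LossyLink {
        private final IdempotentParticipant participant;
        private int dropsRemaining;

        LossyLink(IdempotentParticipant participant, int drops) {
            this.participant = participant;
            this.dropsRemaining = drops;
        }

        Optional<String> send(String requestId) {
            String response = participant.handle(requestId);
            if (dropsRemaining > 0) {
                dropsRemaining--;
                return Optional.empty(); // response lost: looks like a timeout
            }
            return Optional.of(response);
        }
    }

    // Coordinator: resend the same request id until a response arrives
    // or the attempt budget is exhausted.
    static Optional<String> sendWithRetry(LossyLink link, String requestId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> response = link.send(requestId);
            if (response.isPresent()) {
                return response;
            }
        }
        return Optional.empty(); // still timing out after maxAttempts
    }

    public static void main(String[] args) {
        IdempotentParticipant participant = new IdempotentParticipant();
        LossyLink link = new LossyLink(participant, 2); // first two responses are lost
        Optional<String> result = sendWithRetry(link, "VALIDATION_REQ-1", 3);
        System.out.println(result.orElse("timed out") + ", work done " + participant.workDone + " time(s)");
    }
}
```

With two lost responses and a budget of three attempts, the third attempt succeeds while the participant performs the underlying work only once; that single execution is the property that per-message state is meant to guarantee before a coordinator can safely resend.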