[ 
https://issues.apache.org/jira/browse/CASSANDRA-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-16094:
----------------------------------------
    Test and Documentation Plan: new jvm dtest, cci run
                         Status: Patch Available  (was: Open)

This failure looks to be caused by a gossip shutdown race.

During {{drain()}} we send {{GOSSIP_SHUTDOWN}} messages to all live 
endpoints, which marks the node down on all other nodes. Sometimes a node 
receives a {{GossipDigestAck}} from the shutting-down node after the 
{{GOSSIP_SHUTDOWN}} message. It then sends an {{ECHO_REQ}} to the 
shutting-down node, which replies, and the node gets marked as UP again.

In this case the mutation that is supposed to go only to node1 gets queued 
up and applied when the node comes back, so we get no digest mismatch and no 
repair data tracking warning.

[Patch|https://github.com/krummas/cassandra/commits/marcuse/16094] to avoid 
replying to an {{ECHO_REQ}} if we are shutting down.
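
For illustration only, the message ordering described above can be modelled 
with a small toy simulation. This is a sketch, not the code in the patch: the 
{{Node}} class, its handler names, the {{run}} helper, and the 
{{skip_echo_when_shutting_down}} switch are all hypothetical, and the node 
names are arbitrary. It assumes only what is described above: a late 
{{GossipDigestAck}} triggers an {{ECHO_REQ}}, and a reply marks the peer UP.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of one node's gossip view; not Cassandra's actual classes."""
    name: str
    shutting_down: bool = False
    # Hypothetical switch standing in for the behaviour the patch introduces.
    skip_echo_when_shutting_down: bool = False
    live: set = field(default_factory=set)

    def on_gossip_shutdown(self, peer):
        # GOSSIP_SHUTDOWN from a draining peer: mark it down.
        self.live.discard(peer.name)

    def on_gossip_digest_ack(self, peer):
        # An ack from a peer we consider down triggers an ECHO_REQ;
        # the peer is marked UP again only if it answers.
        if peer.name not in self.live and peer.on_echo_req():
            self.live.add(peer.name)

    def on_echo_req(self):
        # Returns True if an echo response is sent back.
        if self.shutting_down and self.skip_echo_when_shutting_down:
            return False
        return True


def run(patched):
    """Replay the racy ordering; return True if the draining node ends up UP."""
    observer = Node("node1")
    draining = Node("node2", skip_echo_when_shutting_down=patched)
    observer.live.add(draining.name)

    draining.shutting_down = True            # drain() begins
    observer.on_gossip_shutdown(draining)    # observer marks the node down
    observer.on_gossip_digest_ack(draining)  # late ack arrives afterwards
    return draining.name in observer.live
```

Under these assumptions, {{run(patched=False)}} leaves the draining node 
marked UP on the observer, while {{run(patched=True)}} leaves it down.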

[cci|https://app.circleci.com/pipelines/github/krummas/cassandra/575/workflows/242a0b75-e3a1-4e29-84d1-3353a32d4096]

> Flaky Test: TestIncRepair.test_repaired_tracking_with_mismatching_replicas
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16094
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16094
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/python
>            Reporter: Caleb Rackliffe
>            Assignee: Marcus Eriksson
>            Priority: Normal
>              Labels: dtest, incremental_repair, repair
>             Fix For: 4.0-beta
>
>
> We have two recent failures for this test on trunk: 
> 1.) 
> https://app.circleci.com/pipelines/github/maedhroz/cassandra/102/workflows/37ed8dab-9da4-4730-a883-20b7a99d88b4/jobs/518/tests
>  (CASSANDRA-15909)
> 2.) 
> https://app.circleci.com/pipelines/github/jolynch/cassandra/6/workflows/41e080e0-d7ff-4256-899e-b4010c6ef5ab/jobs/716/tests
>  (CASSANDRA-15379)
> The test expects there to be mismatches and then read repair executed on a 
> following SELECT, but either those mismatches aren’t there, read repair isn’t 
> happening, or both.


