[ https://issues.apache.org/jira/browse/CASSANDRA-10143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Yeschenko updated CASSANDRA-10143:
------------------------------------------
    Fix Version/s:     (was: 3.0.x)
                   3.1

> Apparent counter overcount during certain network partitions
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-10143
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10143
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.1.x, 2.2.x, 3.1
>
>
> This issue is reproducible in this
> [Jepsen Test|https://github.com/riptano/jepsen/blob/f45f5320db608d48de2c02c871aecc4910f4d963/cassandra/test/cassandra/counter_test.clj#L16].
> The test starts a five-node cluster and issues increments by one against a
> single counter. It then checks that the counter is in the range [OKed
> increments, OKed increments + Write Timeouts] at each read. Increments are
> issued at CL.ONE and reads at CL.ALL. Throughout the test, network failures
> are induced that create halved network partitions. A halved network partition
> randomly splits the cluster into one group of three connected nodes and one
> group of two connected nodes.
>
> This test started failing; bisects showed that it was actually a test change
> that caused this failure. When the network partitions are induced in a cycle
> of 15s healthy/45s partitioned or 20s healthy/45s partitioned, the test
> fails. When network partitions are induced in a cycle of 15s healthy/60s
> partitioned, 20s healthy/45s partitioned, or 20s healthy/60s partitioned, the
> test passes.
>
> There is nothing unusual in the logs of the nodes for the failed tests. The
> results are very reproducible.
>
> One noticeable trend is that more reads seem to get serviced during the
> failed tests.
>
> Most testing has been done in 2.1.8 - the same issue appears to be present in
> 2.2/3.0/trunk, but I haven't spent as much time reproducing.
>
> Ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
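
For reference, the range check described in the report can be sketched roughly as follows. This is a simplified illustration in Python, not the code of the linked Jepsen test (which is written in Clojure). The history format and the name check_history are assumptions made for this sketch, and it treats the history as strictly sequential, ignoring operations that are still in flight when a read is issued.

```python
def check_history(history):
    """Return the reads that fall outside [acked, acked + timeouts].

    `history` is a hypothetical, strictly ordered list of tuples:
      ("add", "ok")       increment acknowledged by the cluster
      ("add", "timeout")  increment timed out, so it may or may not have applied
      ("add", "fail")     increment definitely not applied
      ("read", value)     counter value returned by a read
    """
    acked = 0      # increments known to have been applied
    timeouts = 0   # increments with an unknown outcome
    violations = []

    for index, (kind, outcome) in enumerate(history):
        if kind == "add":
            if outcome == "ok":
                acked += 1
            elif outcome == "timeout":
                timeouts += 1
            # A definite failure changes neither bound.
        elif kind == "read":
            lower, upper = acked, acked + timeouts
            if not lower <= outcome <= upper:
                # Record position, observed value, and the allowed range.
                violations.append((index, outcome, lower, upper))

    return violations
```

Under these assumptions, a history of one OKed increment, one timed-out increment, a read of 2, another OKed increment, and a read of 4 reports only the final read: at that point the allowed range is [2, 3], so a value of 4 is an overcount of the kind this issue describes.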