[jira] [Commented] (CASSANDRA-3858) expose "propagation delay" metric in JMX
[ https://issues.apache.org/jira/browse/CASSANDRA-3858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242307#comment-16242307 ]

George commented on CASSANDRA-3858:
---

I'm surprised there's not great interest for such a feature. Propagation delays must be a valid concern. What am I missing?

> expose "propagation delay" metric in JMX
>
> Key: CASSANDRA-3858
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3858
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Peter Schuller
> Priority: Minor
>
> My idea is to augment the gossip protocol to contain timestamps. We wouldn't
> use the timestamps for anything "important", but we could use them to allow
> each node to expose a number which is the number of milliseconds (or seconds)
> "old" information is about nodes that are "the oldest" and also alive.
> When nodes go down you'd see spikes, but for most cases where nodes live,
> this information should give you a pretty good idea of how fast gossip
> information is propagating through the cluster, assuming you keep your clocks
> in synch.
> It should be a good thing to have graphed, and to have alerts on.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
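A rough sketch of the metric proposed above, not taken from any Cassandra patch. It assumes each gossip entry carries a wall-clock timestamp set by the originating node; `PeerInfo`, `lastUpdateMillis` and `propagationDelayMillis` are hypothetical names, and the number is only meaningful if clocks are kept in sync, as the description notes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; this is not Cassandra's gossip API. */
final class PropagationDelay {
    /** What the local node currently knows about one peer (illustrative fields). */
    static final class PeerInfo {
        final long lastUpdateMillis; // assumed: timestamp carried in the gossip state
        final boolean alive;         // assumed: liveness as seen by the local node

        PeerInfo(long lastUpdateMillis, boolean alive) {
            this.lastUpdateMillis = lastUpdateMillis;
            this.alive = alive;
        }
    }

    /**
     * Age in milliseconds of the oldest information held about any live peer.
     * Dead peers are skipped; the result is 0 when there are no live peers
     * or when every timestamp lies in the future (clock skew).
     */
    static long propagationDelayMillis(Map<String, PeerInfo> peers, long nowMillis) {
        long maxAge = 0;
        for (PeerInfo peer : peers.values()) {
            if (!peer.alive) {
                continue;
            }
            maxAge = Math.max(maxAge, nowMillis - peer.lastUpdateMillis);
        }
        return maxAge;
    }

    public static void main(String[] args) {
        Map<String, PeerInfo> peers = new HashMap<>();
        long now = System.currentTimeMillis();
        peers.put("node-a", new PeerInfo(now - 300, true));
        peers.put("node-b", new PeerInfo(now - 700, true));
        peers.put("node-c", new PeerInfo(now - 60_000, false)); // down, ignored
        System.out.println(propagationDelayMillis(peers, now));
    }
}
```

A value like this could be returned from a getter on a JMX MBean so that it can be graphed and alerted on; the registration is left out here.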
[jira] [Comment Edited] (CASSANDRA-10041) timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive
[ https://issues.apache.org/jira/browse/CASSANDRA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696364#comment-14696364 ] Sylvestor George edited comment on CASSANDRA-10041 at 8/14/15 9:09 PM: --- I am able to reproduce the exception as per the given specification. {code} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:54) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 21 common frames omitted {code} was (Author: sylvestor88): I am able to reproduce the exception as per the given specification. Also, the application stops updating counters in the other table after the exception occurs. {code} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:54) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 
21 common frames omitted {code} timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive -- Key: CASSANDRA-10041 URL: https://issues.apache.org/jira/browse/CASSANDRA-10041 Project: Cassandra Issue Type: Bug Components: Core Environment: centos 6.6 server, java version 1.8.0_45, cassandra 2.1.8, 3 machines, keyspace with replication factor 3 Reporter: Anton Lebedevich Assignee: Sylvestor George Fix For: 2.1.x Test scenario is: kill -9 one node, wait 60 seconds, start it back, wait till it becomes available, wait 120 seconds (during that time all 3 nodes are up), repeat with the next node. Application reads from one table and updates counters in another table with consistency QUORUM. When one node out of 3 is killed application logs this exception for several seconds: {noformat} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:204) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:195) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [io.netty.netty-codec-4.0.27.Final.jar:4.0.27.Final] ... 13 common frames omitted {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
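For reference, the arithmetic behind "QUORUM with 2 of 3 nodes alive": a quorum for replication factor RF is floor(RF / 2) + 1, so with replication factor 3 two live replicas are enough. The sketch below only illustrates that arithmetic; the class and method names are hypothetical, and it does not explain why the exception reports consistency ONE.

```java
/** Hypothetical helper; illustrates the quorum arithmetic only. */
final class QuorumMath {
    /** Replicas required for QUORUM at the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1; // integer division gives floor(RF / 2)
    }

    /** True if the live replicas can satisfy the required number of acknowledgements. */
    static boolean satisfiable(int required, int aliveReplicas) {
        return aliveReplicas >= required;
    }

    public static void main(String[] args) {
        int rf = 3;
        int alive = 2; // one of three nodes killed, as in the test scenario
        System.out.println("QUORUM needs " + quorum(rf) + " of " + rf);
        System.out.println("satisfiable with " + alive + " alive: " + satisfiable(quorum(rf), alive));
    }
}
```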
[jira] [Issue Comment Deleted] (CASSANDRA-10041) timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive
[ https://issues.apache.org/jira/browse/CASSANDRA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvestor George updated CASSANDRA-10041: - Comment: was deleted (was: I have been able to reproduce the ReadTimeoutException for the above given environment. However I couldn't produce any WriteTimeoutException. The Application reads from one table and updates to another table with Consistency QUORUM {code}Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 21 common frames omitted{code} ) timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive -- Key: CASSANDRA-10041 URL: https://issues.apache.org/jira/browse/CASSANDRA-10041 Project: Cassandra Issue Type: Bug Components: Core Environment: centos 6.6 server, java version 1.8.0_45, cassandra 2.1.8, 3 machines, keyspace with replication factor 3 Reporter: Anton Lebedevich Assignee: Sylvestor George Test scenario is: kill -9 one node, wait 60 seconds, start it back, wait till it becomes available, wait 120 seconds (during that time all 3 nodes are up), repeat with the next node. Application reads from one table and updates counters in another table with consistency QUORUM. 
When one node out of 3 is killed application logs this exception for several seconds: {noformat} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:204) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:195) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [io.netty.netty-codec-4.0.27.Final.jar:4.0.27.Final] ... 13 common frames omitted {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10041) timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive
[ https://issues.apache.org/jira/browse/CASSANDRA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695958#comment-14695958 ] Sylvestor George edited comment on CASSANDRA-10041 at 8/13/15 9:44 PM: --- I have been able to reproduce the ReadTimeoutException for the above given environment. However I couldn't produce any WriteTimeoutException. The Application reads from one table and updates to another table with Consistency QUORUM {code}Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 21 common frames omitted{code} was (Author: sylvestor88): I have been able to reproduce the ReadTimeoutException for the above given environment. However I couldn't produce any WriteTimeoutException. 
{code}Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 21 common frames omitted{code} timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive -- Key: CASSANDRA-10041 URL: https://issues.apache.org/jira/browse/CASSANDRA-10041 Project: Cassandra Issue Type: Bug Components: Core Environment: centos 6.6 server, java version 1.8.0_45, cassandra 2.1.8, 3 machines, keyspace with replication factor 3 Reporter: Anton Lebedevich Assignee: Sylvestor George Test scenario is: kill -9 one node, wait 60 seconds, start it back, wait till it becomes available, wait 120 seconds (during that time all 3 nodes are up), repeat with the next node. Application reads from one table and updates counters in another table with consistency QUORUM. 
When one node out of 3 is killed application logs this exception for several seconds: {noformat} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:204) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:195) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [io.netty.netty-codec-4.0.27.Final.jar:4.0.27.Final] ... 13 common frames omitted {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10041) timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive
[ https://issues.apache.org/jira/browse/CASSANDRA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695958#comment-14695958 ] Sylvestor George edited comment on CASSANDRA-10041 at 8/13/15 9:16 PM: --- I have been able to reproduce the ReadTimeoutException for the above given environment. However I couldn't produce any WriteTimeoutException. {code}Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 21 common frames omitted{code} was (Author: sylvestor88): I have been able to reproduce the ReadTimeoutException for the above given environment. However I couldn't produce any WriteTimeoutException. {code}Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 
21 common frames omitted{code} timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive -- Key: CASSANDRA-10041 URL: https://issues.apache.org/jira/browse/CASSANDRA-10041 Project: Cassandra Issue Type: Bug Components: Core Environment: centos 6.6 server, java version 1.8.0_45, cassandra 2.1.8, 3 machines, keyspace with replication factor 3 Reporter: Anton Lebedevich Assignee: Sylvestor George Test scenario is: kill -9 one node, wait 60 seconds, start it back, wait till it becomes available, wait 120 seconds (during that time all 3 nodes are up), repeat with the next node. Application reads from one table and updates counters in another table with consistency QUORUM. When one node out of 3 is killed application logs this exception for several seconds: {noformat} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:204) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:195) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [io.netty.netty-codec-4.0.27.Final.jar:4.0.27.Final] ... 13 common frames omitted {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10041) timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive
[ https://issues.apache.org/jira/browse/CASSANDRA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695958#comment-14695958 ] Sylvestor George commented on CASSANDRA-10041: -- I have been able to reproduce the ReadTimeoutException for the above given environment. However I couldn't produce any WriteTimeoutException. {code}Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 21 common frames omitted{code} timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive -- Key: CASSANDRA-10041 URL: https://issues.apache.org/jira/browse/CASSANDRA-10041 Project: Cassandra Issue Type: Bug Components: Core Environment: centos 6.6 server, java version 1.8.0_45, cassandra 2.1.8, 3 machines, keyspace with replication factor 3 Reporter: Anton Lebedevich Assignee: Sylvestor George Test scenario is: kill -9 one node, wait 60 seconds, start it back, wait till it becomes available, wait 120 seconds (during that time all 3 nodes are up), repeat with the next node. Application reads from one table and updates counters in another table with consistency QUORUM. 
When one node out of 3 is killed application logs this exception for several seconds: {noformat} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:204) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:195) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [io.netty.netty-codec-4.0.27.Final.jar:4.0.27.Final] ... 13 common frames omitted {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10041) timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive
[ https://issues.apache.org/jira/browse/CASSANDRA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696364#comment-14696364 ] Sylvestor George commented on CASSANDRA-10041: -- I am able to reproduce the exception as per the given specification. Also, the application stops updating counters in the other table after the exception occurs. {code} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:54) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:34) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:182) ~[cassandra-driver-core-2.0.9.2.jar:na] at com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66) ~[cassandra-driver-core-2.0.9.2.jar:na] ... 21 common frames omitted {code} timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive -- Key: CASSANDRA-10041 URL: https://issues.apache.org/jira/browse/CASSANDRA-10041 Project: Cassandra Issue Type: Bug Components: Core Environment: centos 6.6 server, java version 1.8.0_45, cassandra 2.1.8, 3 machines, keyspace with replication factor 3 Reporter: Anton Lebedevich Assignee: Sylvestor George Test scenario is: kill -9 one node, wait 60 seconds, start it back, wait till it becomes available, wait 120 seconds (during that time all 3 nodes are up), repeat with the next node. Application reads from one table and updates counters in another table with consistency QUORUM. 
When one node out of 3 is killed application logs this exception for several seconds: {noformat} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:204) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:195) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [io.netty.netty-codec-4.0.27.Final.jar:4.0.27.Final] ... 13 common frames omitted {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-10021) Losing writes in a single-node cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvestor George reassigned CASSANDRA-10021: Assignee: Sylvestor George Losing writes in a single-node cluster -- Key: CASSANDRA-10021 URL: https://issues.apache.org/jira/browse/CASSANDRA-10021 Project: Cassandra Issue Type: Bug Environment: Docker images Reporter: Jeremy Schlatter Assignee: Sylvestor George Attachments: cpp-repro.zip, go-repro.zip I am able to reliably reproduce write losses in the following circumstances: * Set up a single-node cluster. * Create keyspace with SimpleStrategy, replication_factor = 1. * Create a table with a float field. * Send an UPDATE command to set the float value on a row. * After the command returns, immediately send another UPDATE to set the float value to something _smaller_ than the first value. * The second UPDATE is sometimes lost. Reproduction code attached. There are two implementations: one in Go and one in C++. They do the same thing -- I implemented both to rule out a bug in the client library. For both cases, you can reproduce by doing the following: 1. docker run --name repro-cassandra --rm cassandra:2.0.14 (or any other Cassandra version) 2. Download and unzip one of the zip files, and change to its directory. 3. docker build -t repro . 4. docker run --link repro-cassandra:cassandra --rm repro The reproduction code will repeatedly run two UPDATEs followed by a SELECT until it detects a lost write, and then print: Lost a write. Got 20.50, want 10.50 This may be fixed in 2.1.8 because I have not been able to reproduce it in that version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
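The attached Go and C++ programs are not reproduced in this thread. As a rough illustration of the loop described above (two UPDATEs followed by a SELECT), here is a hypothetical Java rendition. `FloatStore` and `InMemoryStore` are assumptions made for the sketch; a real run would back `FloatStore` with a driver session against the Docker container, and the in-memory stand-in never loses a write.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the check loop; not the attached reproduction code. */
final class LostWriteCheck {
    /** Assumed abstraction over an UPDATE and a SELECT of one float column. */
    interface FloatStore {
        void update(String key, float value);
        float select(String key);
    }

    /** In-memory stand-in so the loop can run without a cluster. */
    static final class InMemoryStore implements FloatStore {
        private final Map<String, Float> rows = new HashMap<>();

        public void update(String key, float value) {
            rows.put(key, value);
        }

        public float select(String key) {
            Float value = rows.get(key);
            return value == null ? Float.NaN : value;
        }
    }

    /**
     * Writes a larger value, then a smaller one, then reads the row back.
     * Returns the iteration at which the second write was missing,
     * or -1 if no lost write was seen within maxIterations.
     */
    static int firstLostWrite(FloatStore store, int maxIterations) {
        final float first = 20.5f;
        final float second = 10.5f;
        for (int i = 0; i < maxIterations; i++) {
            store.update("row", first);
            store.update("row", second);
            float got = store.select("row");
            if (got != second) {
                System.out.printf("Lost a write. Got %.2f, want %.2f%n", got, second);
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstLostWrite(new InMemoryStore(), 1000));
    }
}
```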
[jira] [Assigned] (CASSANDRA-10041) timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive
[ https://issues.apache.org/jira/browse/CASSANDRA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvestor George reassigned CASSANDRA-10041: Assignee: Sylvestor George timeout during write query at consistency ONE when updating counter at consistency QUORUM and 2 of 3 nodes alive -- Key: CASSANDRA-10041 URL: https://issues.apache.org/jira/browse/CASSANDRA-10041 Project: Cassandra Issue Type: Bug Components: Core Environment: centos 6.6 server, java version 1.8.0_45, cassandra 2.1.8, 3 machines, keyspace with replication factor 3 Reporter: Anton Lebedevich Assignee: Sylvestor George Test scenario is: kill -9 one node, wait 60 seconds, start it back, wait till it becomes available, wait 120 seconds (during that time all 3 nodes are up), repeat with the next node. Application reads from one table and updates counters in another table with consistency QUORUM. When one node out of 3 is killed application logs this exception for several seconds: {noformat} Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:57) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:204) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:195) ~[com.datastax.cassandra.cassandra-driver-core-2.1.6.jar:na] at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [io.netty.netty-codec-4.0.27.Final.jar:4.0.27.Final] ... 13 common frames omitted {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10021) Losing writes in a single-node cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-10021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692458#comment-14692458 ] Sylvestor George commented on CASSANDRA-10021: -- I was able to reproduce the issue with the attached Go file. The update also fails if the new float value is larger than the previous float value. It does not happen on every update, but it is quite frequent. I tested with C* versions 2.1.6 and 2.1.7. The issue does not occur with C* version 2.1.8. Losing writes in a single-node cluster -- Key: CASSANDRA-10021 URL: https://issues.apache.org/jira/browse/CASSANDRA-10021 Project: Cassandra Issue Type: Bug Environment: Docker images Reporter: Jeremy Schlatter Assignee: Sylvestor George Attachments: cpp-repro.zip, go-repro.zip I am able to reliably reproduce write losses in the following circumstances: * Set up a single-node cluster. * Create keyspace with SimpleStrategy, replication_factor = 1. * Create a table with a float field. * Send an UPDATE command to set the float value on a row. * After the command returns, immediately send another UPDATE to set the float value to something _smaller_ than the first value. * The second UPDATE is sometimes lost. Reproduction code attached. There are two implementations: one in Go and one in C++. They do the same thing -- I implemented both to rule out a bug in the client library. For both cases, you can reproduce by doing the following: 1. docker run --name repro-cassandra --rm cassandra:2.0.14 (or any other Cassandra version) 2. Download and unzip one of the zip files, and change to its directory. 3. docker build -t repro . 4. docker run --link repro-cassandra:cassandra --rm repro The reproduction code will repeatedly run two UPDATEs followed by a SELECT until it detects a lost write, and then print: Lost a write. Got 20.50, want 10.50 This may be fixed in 2.1.8 because I have not been able to reproduce it in that version.
[jira] [Commented] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601839#comment-14601839 ] Sylvestor George commented on CASSANDRA-9293: - Yes. It looks at the logs from the previous test and checks whether any new LEAK DETECTED errors were logged for that test class. If a leak is detected, the added test fails. I will look further into detecting the error from the files you mentioned above. Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Sylvestor George Labels: test Attachments: 9293.txt We shouldn't depend on dtests to inform us of these problems (which have error log monitoring) - they should be caught by unit tests, which may also cover different failure conditions (besides being faster). There are a couple of ways we could do this, but probably the easiest is to add a static flag that is set to true if we ever see a leak (in Ref), and to just assert that this is false at the end of every test. [~enigmacurry] is this something TE can help with? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvestor George updated CASSANDRA-9293: Attachment: (was: 9293.txt) Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Sylvestor George Labels: test We shouldn't depend on dtests to inform us of these problems (which have error log monitoring) - they should be caught by unit tests, which may also cover different failure conditions (besides being faster). There are a couple of ways we could do this, but probably the easiest is to add a static flag that is set to true if we ever see a leak (in Ref), and to just assert that this is false at the end of every test. [~enigmacurry] is this something TE can help with? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
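The static-flag idea from the issue description could look roughly like the sketch below. This is an assumption-laden illustration, not code from Cassandra's `Ref` class: `LeakFlag`, `reportLeak` and `assertNoLeak` are hypothetical names, and the call to `reportLeak` would have to be placed wherever the leak is actually logged.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a process-wide "leak seen" flag. */
final class LeakFlag {
    private static final AtomicBoolean LEAK_SEEN = new AtomicBoolean(false);

    private LeakFlag() {
    }

    /** Assumed to be called from the code path that logs LEAK DETECTED. */
    static void reportLeak() {
        LEAK_SEEN.set(true);
    }

    static boolean leakSeen() {
        return LEAK_SEEN.get();
    }

    /** Intended for the end of every test: fails if any leak was reported. */
    static void assertNoLeak() {
        if (LEAK_SEEN.get()) {
            throw new AssertionError("LEAK DETECTED was reported during this test run");
        }
    }

    /** Clears the flag, for example before the next test class starts. */
    static void reset() {
        LEAK_SEEN.set(false);
    }
}
```

In a JUnit 4 suite, `assertNoLeak()` would typically be called from an `@After` or `@AfterClass` method; that wiring is left out so the sketch has no test-framework dependency.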
[jira] [Updated] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvestor George updated CASSANDRA-9293: Attachment: 9293.txt Hi Ariel, I have implemented one of the possible solutions which first counts leaks detected in the logs @BeforeClass. I then added another @Test which counts and compares the leaks with the before counts and fails if the count is more after. The changes are in the attached file 9293.txt Please let me know if this looks like a good solution. Sylvestor Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Sylvestor George Labels: test Attachments: 9293.txt We shouldn't depend on dtests to inform us of these problems (which have error log monitoring) - they should be caught by unit tests, which may also cover different failure conditions (besides being faster). There are a couple of ways we could do this, but probably the easiest is to add a static flag that is set to true if we ever see a leak (in Ref), and to just assert that this is false at the end of every test. [~enigmacurry] is this something TE can help with? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
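The attached 9293.txt is not shown in this thread. A minimal sketch of the approach described in the comment, under the assumption that leaks appear in a log file as lines containing "LEAK DETECTED": count the marker before the tests run, count again afterwards, and fail if the number went up. `LeakLogCheck` and its methods are hypothetical names, and the log path is left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical sketch; not the contents of the attached patch. */
final class LeakLogCheck {
    static final String MARKER = "LEAK DETECTED";

    private LeakLogCheck() {
    }

    /** Number of log lines containing the marker; 0 if the log does not exist yet. */
    static long countLeaks(Path log) throws IOException {
        if (!Files.exists(log)) {
            return 0;
        }
        try (Stream<String> lines = Files.lines(log)) {
            return lines.filter(line -> line.contains(MARKER)).count();
        }
    }

    /** Fails if the log now contains more marker lines than were counted before. */
    static void assertNoNewLeaks(Path log, long countBefore) throws IOException {
        long countAfter = countLeaks(log);
        if (countAfter > countBefore) {
            throw new AssertionError(
                (countAfter - countBefore) + " new " + MARKER + " line(s) in " + log);
        }
    }
}
```

With JUnit 4, `countLeaks` would be called from a `@BeforeClass` method and `assertNoNewLeaks` from the extra `@Test` (or an `@AfterClass`) method. Note that the comparison depends on the log being flushed before the second count.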
[jira] [Comment Edited] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601576#comment-14601576 ] Sylvestor George edited comment on CASSANDRA-9293 at 6/25/15 6:32 PM: -- I have implemented one of the possible solutions which first counts leaks detected in the logs @BeforeClass. I then added another @Test which counts and compares the leaks with the before counts and fails if the count is more after. The changes are in the attached file 9293.txt Please let me know if this looks like a good solution. Sylvestor was (Author: sylvestor88): Hi Ariel, I have implemented one of the possible solutions which first counts leaks detected in the logs @BeforeClass. I then added another @Test which counts and compares the leaks with the before counts and fails if the count is more after. The changes are in the attached file 9293.txt Please let me know if this looks like a good solution. Sylvestor Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Sylvestor George Labels: test Attachments: 9293.txt
[jira] [Comment Edited] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14601576#comment-14601576 ] Sylvestor George edited comment on CASSANDRA-9293 at 6/25/15 6:33 PM: -- I have implemented one of the possible solutions which first counts leak detected in the logs @BeforeClass. I then added another @Test which counts and compares the leaks with the before counts and fails if the count is more after. The changes are in the attached file 9293.txt Please let me know if this looks like a good solution. Sylvestor was (Author: sylvestor88): I have implemented one of the possible solutions which first counts leaks detected in the logs @BeforeClass. I then added another @Test which counts and compares the leaks with the before counts and fails if the count is more after. The changes are in the attached file 9293.txt Please let me know if this looks like a good solution. Sylvestor Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Sylvestor George Labels: test Attachments: 9293.txt
[jira] [Commented] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582671#comment-14582671 ] Sylvestor George commented on CASSANDRA-9293: - This will be done post 9528 which will Improve log output from unit tests. The output recorded from each unit test can be used to check whether that unit test had any leaks. Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Sylvestor George Labels: test Attachments: 9293.txt
[jira] [Commented] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579696#comment-14579696 ] Sylvestor George commented on CASSANDRA-9293: - The only way I can think of is to include the 2 functions in all the test files. In that case, the unit tests within that file will be checked for leak detection. Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Philip Thompson Labels: test Attachments: 9293.txt
[jira] [Comment Edited] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579622#comment-14579622 ] Sylvestor George edited comment on CASSANDRA-9293 at 6/9/15 10:02 PM: -- Hi, I have added 2 functions to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if I am on the right track or is there an better alternate to incorporate this. Thanks, Sylvestor George was (Author: sylvestor88): Hi, 2 functions are added to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if this solution works or else any suggestions to improve this process. Thanks, Sylvestor George Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Philip Thompson Labels: test Attachments: 9293.txt
[jira] [Comment Edited] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579612#comment-14579612 ] Sylvestor George edited comment on CASSANDRA-9293 at 6/9/15 10:02 PM: -- Hi, I have added 2 functions to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if I am on the right track or is there an better alternate to incorporate this. Thanks, Sylvestor George was (Author: sylvestor88): Hi, 2 functions are added to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if I am on the right track or is there an better alternate to incorporate this. Thanks, Sylvestor George Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Philip Thompson Labels: test Attachments: 9293.txt
[jira] [Updated] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvestor George updated CASSANDRA-9293: Attachment: 9293.txt Hi, 2 functions are added to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if this solution works or else any suggestions to improve this process. Thanks, Sylvestor George Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Philip Thompson Labels: test Attachments: 9293.txt
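The attachment 9293.txt is not reproduced in this thread, so the following is only a guess at the shape of the "2 functions" approach described above, not the attached patch. It assumes the test log is a plain text file in which each leak produces a line containing the text LEAK DETECTED; the class and method names are hypothetical. One function counts leak lines, and the other compares the before and after counts and fails with the message quoted in the comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper; an illustration of before/after leak counting. */
final class LeakLogCounter {
    static final String MARKER = "LEAK DETECTED";

    private LeakLogCounter() {}

    /** Counts the lines that contain the leak marker. */
    static long countLeaksInLines(Iterable<String> lines) {
        long count = 0;
        for (String line : lines) {
            if (line.contains(MARKER)) {
                count++;
            }
        }
        return count;
    }

    /** Counts leak lines in a log file; a missing file counts as zero leaks. */
    static long countLeaksInFile(Path logFile) {
        if (!Files.exists(logFile)) {
            return 0;
        }
        try {
            List<String> lines = Files.readAllLines(logFile, StandardCharsets.UTF_8);
            return countLeaksInLines(lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fails if the log gained leak lines between the two counts. */
    static void failIfNewLeaks(long before, long after) {
        if (after > before) {
            throw new AssertionError("Leak Detected for this test.");
        }
    }
}
```

In a test class, the before count would be taken in a setup hook and the comparison made in a teardown hook. A scheme like this depends on the log being flushed before the second count and on tests not running concurrently against the same log file.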
[jira] [Issue Comment Deleted] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvestor George updated CASSANDRA-9293: Comment: was deleted (was: Hi, I have added 2 functions to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if I am on the right track or is there an better alternate to incorporate this. Thanks, Sylvestor George) Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Philip Thompson Labels: test Attachments: 9293.txt
[jira] [Comment Edited] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579612#comment-14579612 ] Sylvestor George edited comment on CASSANDRA-9293 at 6/9/15 10:07 PM: -- Hi, I have added 2 functions to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if I am on the right track or is there a better alternative to incorporate this. Thanks, Sylvestor George was (Author: sylvestor88): Hi, I have added 2 functions to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if I am on the right track or is there is a better alternative to incorporate this. Thanks, Sylvestor George Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Philip Thompson Labels: test Attachments: 9293.txt
[jira] [Comment Edited] (CASSANDRA-9293) Unit tests should fail if any LEAK DETECTED errors are printed
[ https://issues.apache.org/jira/browse/CASSANDRA-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14579612#comment-14579612 ] Sylvestor George edited comment on CASSANDRA-9293 at 6/9/15 10:07 PM: -- Hi, I have added 2 functions to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if I am on the right track or is there is a better alternative to incorporate this. Thanks, Sylvestor George was (Author: sylvestor88): Hi, I have added 2 functions to check before and after count of leak detection errors for every unit test. If the after count is more than the before count for a specific unit test, the test fails with message 'Leak Detected for this test.' Please let me know if I am on the right track or is there an better alternate to incorporate this. Thanks, Sylvestor George Unit tests should fail if any LEAK DETECTED errors are printed -- Key: CASSANDRA-9293 URL: https://issues.apache.org/jira/browse/CASSANDRA-9293 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Philip Thompson Labels: test Attachments: 9293.txt