[jira] [Resolved] (CASSANDRA-14918) multiget_slice returning inconsistent results when performed with CL higher than ONE
[ https://issues.apache.org/jira/browse/CASSANDRA-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Harvey resolved CASSANDRA-14918.
--
Resolution: Duplicate

> multiget_slice returning inconsistent results when performed with CL higher than ONE
>
> Key: CASSANDRA-14918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14918
> Project: Cassandra
> Issue Type: Bug
> Components: Coordination
> Environment: 9 node ring, cassandra 3.11.2, RF of 3. Ring is not upgraded from any previous version.
> Reporter: Jason Harvey
> Priority: Major
>
> I'm on cassandra 3.11.2. On a number of CFs I'm observing that multiget_slice sometimes returns inconsistent, partially-empty results for the exact same request, despite the underlying data not changing. This behaviour is only observed when I perform the multiget with a CL higher than `ONE` - all `ONE` requests work as expected.
> I was able to create a test table in a lab environment and, after fiddling with the data enough, was able to repro. I was unable to perform a very basic repro with only a few rows present. To repro, I inserted a couple million rows, deleted a subset of those rows, and then performed a multiget_slice on a list of partitions which included living and deleted partitions. The result is that sometimes, when performing a multiget on this data, I get a thrift struct back with partition info, but no column names or values - the thrift LIST that is generated contains 0 elements. If I issue this exact same request 5 times, I might get the appropriate data back once or twice. I have verified on the wire that the request being made is identical - only the results are different.
> The repro case described above is rather meandering, so I'm working to break it down into as simple a case as I can. It is unclear whether deletions need to occur to reproduce this.
> Edit: Just confirmed I'm observing this behaviour on multiple distinct 3.11.2 rings.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713569#comment-16713569 ] Dinesh Joshi commented on CASSANDRA-14554:
--
[~benedict] thanks, I have assigned the ticket to you. I can review it, and [~Stefania] please feel free to add yourself as a reviewer too.

> LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
>
> Key: CASSANDRA-14554
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Dinesh Joshi
> Assignee: Benedict
> Priority: Major
> Fix For: 4.0, 3.0.x, 3.11.x
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter this exception -
> {quote}java.util.ConcurrentModificationException: null
> at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
> at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
> at java.lang.Iterable.forEach(Iterable.java:74)
> at org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
> at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
> at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
> at org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
> at org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifecycleTransaction}} and share it between threads -
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context inside {{CassandraIncomingFile}}, which is an {{IncomingStreamMessage}}. This is being deserialized in parallel.
> {{LifecycleTransaction}} is not meant to be used in a multi-threaded context, and this leads to streaming failures due to object sharing. On trunk, this object is shared across all threads that transfer sstables in parallel for the given {{TableId}} in a {{StreamSession}}. There are two options to solve this: make {{LifecycleTransaction}} and the associated objects thread safe, or scope the transaction to a single {{CassandraIncomingFile}}. The consequence of the latter option is that if we experience a streaming failure, we may have redundant SSTables on disk. This is ok, as compaction should clean these up. A third option is to synchronize access in the streaming infrastructure.
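The failure in the stack trace above - one thread iterating a {{LinkedHashMap}} (via {{forEach}} in {{LogReplicaSet.maybeCreateReplica}}) while another adds entries through {{trackNew}} - can be sketched deterministically. The class and record names below are illustrative, not Cassandra's actual types; the cross-thread race is simulated in a single thread by mutating the map mid-iteration, which trips the same fail-fast check:

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

public class CmeDemo {
    // Simulates the race deterministically in one thread: mutating the map
    // mid-iteration triggers the same fail-fast check that fires when a second
    // thread calls trackNew() while another iterates the replica map.
    static boolean failsFast() {
        Map<String, String> logRecords = new LinkedHashMap<>();
        logRecords.put("sstable-1", "ADD");
        logRecords.put("sstable-2", "ADD");
        try {
            for (String key : logRecords.keySet()) {
                // stands in for a concurrent trackNew() adding a new record
                logRecords.put("sstable-3", "ADD");
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails fast: " + failsFast());
    }
}
```

The remedies mirror the options listed in the description: guard every access with a common lock, or give each consumer its own transaction so no map is ever shared.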
[jira] [Updated] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14554:
--
Reviewers: Dinesh Joshi
[jira] [Updated] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14554:
--
Reviewer: (was: Benedict)
[jira] [Assigned] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi reassigned CASSANDRA-14554:
--
Assignee: Benedict (was: Dinesh Joshi)
[jira] [Updated] (CASSANDRA-14922) In JVM dtests need to clean up after instance shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-14922:
--
Attachment: AllThreadsStopped.png
            ClassLoadersRetaining.png
            OnlyThreeRootsLeft.png

> In JVM dtests need to clean up after instance shutdown
>
> Key: CASSANDRA-14922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14922
> Project: Cassandra
> Issue Type: Bug
> Components: Testing
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Minor
> Attachments: AllThreadsStopped.png, ClassLoadersRetaining.png, Leaking_Metrics_On_Shutdown.png, OnlyThreeRootsLeft.png
>
> Currently the unit tests are failing on CircleCI ([example one|https://circleci.com/gh/jolynch/cassandra/300#tests/containers/1], [example two|https://circleci.com/gh/rustyrazorblade/cassandra/44#tests/containers/1]) because we use a small container (medium) for unit tests by default, and the in-JVM dtests are leaking a few hundred megabytes of memory per test right now. This is not a big deal yet: the dtest runs with the larger containers continue to function fine, as does local testing, because the number of in-JVM dtests is not yet high enough to cause a problem with more than 2GB of available heap. However, we should fix the memory leak so that going forward we can add more in-JVM dtests without worry.
> I've been working with [~ifesdjeen] to debug, and the issue appears to be unreleased Table/Keyspace metrics (screenshot showing the leak attached). I believe that we have a few potential issues that are leading to the leaks:
> 1. The [{{Instance::shutdown}}|https://github.com/apache/cassandra/blob/f22fec927de7ac29120c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/Instance.java#L328-L354] method is not successfully cleaning up all the metrics created by the {{CassandraMetricsRegistry}}
> 2. The [{{TestCluster::close}}|https://github.com/apache/cassandra/blob/f22fec927de7ac29120c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/TestCluster.java#L283] method is not waiting for all the instances to finish shutting down and cleaning up before continuing on
> 3. I'm not sure if this is an issue assuming we clear all metrics, but [{{TableMetrics::release}}|https://github.com/apache/cassandra/blob/4ae229f5cd270c2b43475b3f752a7b228de260ea/src/java/org/apache/cassandra/metrics/TableMetrics.java#L951] does not release all the metric references (which could leak them)
> I am working on a patch which shuts down everything and ensures that we do not leak memory.
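As a rough sketch of what fixing point 1 involves: releasing every metric an instance registered can be done by removing all registry entries under that instance's name prefix, rather than relying on a hand-maintained list that can fall out of sync (the hazard in point 3). Everything below is hypothetical - a plain map stands in for {{CassandraMetricsRegistry}}, and the names are made up:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MetricsCleanup {
    // Stand-in for a global metrics registry keyed by "instance.keyspace.table.metric"
    // style names (hypothetical; not the CassandraMetricsRegistry API).
    static final Map<String, Object> REGISTRY = new ConcurrentHashMap<>();

    static void register(String name) {
        REGISTRY.put(name, new Object());
    }

    // Remove everything registered under the instance's prefix, so shutdown
    // cannot miss a metric that an explicit release list forgot to enumerate.
    static int releaseByPrefix(String prefix) {
        int before = REGISTRY.size();
        REGISTRY.keySet().removeIf(name -> name.startsWith(prefix));
        return before - REGISTRY.size();
    }

    public static void main(String[] args) {
        register("node1.ks.tbl.ReadLatency");
        register("node1.ks.tbl.WriteLatency");
        register("node2.ks.tbl.ReadLatency");
        int released = releaseByPrefix("node1.");
        // All of node1's metrics are gone; node2's survive.
        System.out.println("released=" + released + " remaining=" + REGISTRY.size());
    }
}
```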
[jira] [Commented] (CASSANDRA-14922) In JVM dtests need to clean up after instance shutdown
[ https://issues.apache.org/jira/browse/CASSANDRA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713487#comment-16713487 ] Joseph Lynch commented on CASSANDRA-14922:
--
Alright, I think I'm narrowing in on this. I've managed to get all the threads to die over in [a branch|https://github.com/jolynch/cassandra/tree/CASSANDRA-14922], but we're still leaking all the static state through the {{InstanceClassLoader}}s. I think I've narrowed it down to just three remaining references (and I _think_ only one of them is a strong reference); details attached in the screenshots. We basically just need to kill that last strong reference, and I believe the whole {{InstanceClassLoader}} should become collectible at that point (even with all the static state, the self-references to the classloaders should be ok, since they'll be cut off at the root, I think).
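The "cut the last strong reference at the root" reasoning can be checked with a {{WeakReference}}: once no strong chain reaches an object, the whole graph hanging off it becomes collectible, self-references included. This is an illustrative sketch only - a byte array stands in for the {{InstanceClassLoader}} and its static state, and the GC loop is a heuristic, since collection timing is JVM-dependent:

```java
import java.lang.ref.WeakReference;

public class RootLeak {
    // Simulates the last strong GC root found in the heap dump: while it is set,
    // everything reachable from it cannot be collected.
    static Object lastStrongRoot;

    static boolean collectedAfterCut() throws InterruptedException {
        Object staticState = new byte[1 << 20]; // stand-in for leaked static state
        lastStrongRoot = staticState;
        WeakReference<Object> tracker = new WeakReference<>(staticState);
        staticState = null;

        System.gc();
        if (tracker.get() == null)
            return false; // must still be reachable through the root here

        lastStrongRoot = null; // cut the last strong reference at the root
        for (int i = 0; i < 100 && tracker.get() != null; i++) {
            System.gc();
            Thread.sleep(10);
        }
        return tracker.get() == null; // whole graph is now collectible
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("collectible after cutting the root: " + collectedAfterCut());
    }
}
```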
[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713138#comment-16713138 ] Benedict commented on CASSANDRA-14554:
--
Thanks. I had a suspicion it might have simply been migrated, but I wasn't quite sure where from. Looks like it goes all the way back to 2016; I don't think [~yukim] is around anymore? I guess we'll just leave it be; I'm not terribly thrilled at the asymmetry between fully synchronising {{abort}} (and hence accesses to {{sstables}}) and only synchronising {{finishTransaction}} in {{finish}} - though this *should* be fine, the asymmetry suggests maybe there's something we're missing. Anyway, it certainly maintains the prior behaviour.
[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713129#comment-16713129 ] Blake Eggleston commented on CASSANDRA-14554:
--
[~benedict] I'm afraid I don't have a great explanation, other than that's just how it was in StreamReceiveTask before the refactor.
[jira] [Commented] (CASSANDRA-14871) Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser
[ https://issues.apache.org/jira/browse/CASSANDRA-14871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713122#comment-16713122 ] Blake Eggleston commented on CASSANDRA-14871:
--
Thanks for taking a look, Robert.

bq. I think this lock has no "real" effect - I mean, the method just returns the reference to topology. But we should make topology volatile. WDYT?

Agreed, fixed.

bq. Not sure how this could actually help. It's definitely fine as a safety net though. WDYT about replacing it with assert type != null : "Parsing '" + str + "' yielded null, which is a bug"; right before the synchronized.

Right, it's just to quickly catch future bugs. I'd prefer to keep it as is, if that's ok with you. Right now it's verifying the entire un-cached case, up to and including the caching of the type. Regarding using {{assert}}, I avoid it outside of tests because asserts can be disabled, and {{Preconditions}} and {{Verify}} do a better job of communicating what you're checking (imo).

> Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser
>
> Key: CASSANDRA-14871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14871
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Robert Stupp
> Assignee: Robert Stupp
> Priority: Critical
> Fix For: 4.0, 3.0.x, 3.11.x
>
> There are a couple of places in the code base that do not respect that j.u.HashMap and related classes are not thread safe, and some parts rely on internals of the HM implementation, which can change.
> We have observed failures like {{NullPointerException}} and {{ConcurrentModificationException}} as well as wrong behavior.
> Affected areas in the code base:
> * {{SizeTieredCompactionStrategy}}
> * {{DateTieredCompactionStrategy}}
> * {{TimeWindowCompactionStrategy}}
> * {{TokenMetadata.Topology}}
> * {{TypeParser}}
> * streaming / concurrent access to {{LifecycleTransaction}} (handled in CASSANDRA-14554)
> While the patches for the compaction strategies + {{TypeParser}} are pretty straightforward, the patch for {{TokenMetadata.Topology}} requires it to be made immutable.
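The {{TypeParser}} pattern under discussion - a parse cache guarded against races, with a runtime null check instead of {{assert}} - can be sketched as follows. The class and parse logic are hypothetical; {{Objects.requireNonNull}} from the JDK stands in for Guava's {{Preconditions}}/{{Verify}}, and makes the same point: unlike asserts, which are off unless the JVM runs with -ea, the check always fires.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class TypeCache {
    // Thread-safe parse cache: computeIfAbsent performs the check-then-cache
    // step atomically, instead of racing on a plain HashMap.
    static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    static String parse(String str) {
        return CACHE.computeIfAbsent(str, s -> {
            String type = doParse(s);
            // Runtime check rather than `assert`: it still fires in production,
            // where asserts are disabled by default.
            return Objects.requireNonNull(type,
                    () -> "Parsing '" + s + "' yielded null, which is a bug");
        });
    }

    // Hypothetical stand-in for the real parsing logic.
    private static String doParse(String s) {
        return s.trim().toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(parse("  UTF8Type "));
        System.out.println(parse("  UTF8Type ")); // second call hits the cache
    }
}
```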
[jira] [Updated] (CASSANDRA-14921) thrift_test.py failures on 3.0 and 3.x branches
[ https://issues.apache.org/jira/browse/CASSANDRA-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-14921:
--
Reviewers: Stefan Podkowinski

> thrift_test.py failures on 3.0 and 3.x branches
>
> Key: CASSANDRA-14921
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14921
> Project: Cassandra
> Issue Type: Bug
> Components: Testing
> Reporter: Sam Tunnicliffe
> Assignee: Sam Tunnicliffe
> Priority: Major
> Fix For: 3.0.18, 3.11.4
>
> {{putget_test::TestPutGet::test_wide_slice}} fails on CircleCI since the docker image was updated for CASSANDRA-14713. The reason for this is that the {{fastbinary}} extension used by {{TBinaryProtocolAccelerated}} is not compatible with thrift 0.10.0 (according to [this bug report against Pycassa|https://github.com/pycassa/pycassa/issues/245]). The offending binary is present in the filesystem of the [current docker image|https://hub.docker.com/r/spod/cassandra-testing-ubuntu18-java11/], but wasn't in [the previous image|https://hub.docker.com/r/kjellman/cassandra-test/], which meant that thrift would silently fall back to the standard protocol implementation.
> As this is the only test which uses {{TBinaryProtocolAccelerated}}, it's easy enough to switch it to {{TBinaryProtocol}}, which also fixes things. We might want to consider removing the binary next time the image is updated, though (cc [~spo...@gmail.com]).
[jira] [Updated] (CASSANDRA-14921) thrift_test.py failures on 3.0 and 3.x branches
[ https://issues.apache.org/jira/browse/CASSANDRA-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-14921:
--
Resolution: Fixed
Fix Version/s: (was: 3.11.x)
               (was: 3.0.x)
               3.11.4
               3.0.18
Status: Resolved (was: Patch Available)

Thanks, committed as {{3d069f713ef2bd61a32863b29a6f160b74ecd89c}}

The binary that causes the issue is: {{/home/cassandra/env/lib/python3.6/site-packages/thrift/protocol/fastbinary.cpython-36m-x86_64-linux-gnu.so}}
cassandra-dtest git commit: Remove use of TBinaryProtocolAccelerated
Repository: cassandra-dtest
Updated Branches: refs/heads/master 325ef3fa0 -> 3d069f713

Remove use of TBinaryProtocolAccelerated

Patch by Sam Tunnicliffe; reviewed by Stefan Podkowinski for CASSANDRA-14921

Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/3d069f71
Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/3d069f71
Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/3d069f71

Branch: refs/heads/master
Commit: 3d069f713ef2bd61a32863b29a6f160b74ecd89c
Parents: 325ef3f
Author: Sam Tunnicliffe
Authored: Thu Nov 29 17:36:05 2018 +
Committer: Sam Tunnicliffe
Committed: Fri Dec 7 15:43:00 2018 +
--
 putget_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/3d069f71/putget_test.py
--
diff --git a/putget_test.py b/putget_test.py
index 8d22f85..bdb2cb1 100644
--- a/putget_test.py
+++ b/putget_test.py
@@ -217,7 +217,7 @@ class ThriftConnection(object):
         socket = TSocket.TSocket(host, port)
         self.transport = TTransport.TFramedTransport(socket)
-        protocol = TBinaryProtocol.TBinaryProtocolAccelerated(self.transport)
+        protocol = TBinaryProtocol.TBinaryProtocol(self.transport)
         self.client = self.Cassandra.Client(protocol)
         socket.open()
[jira] [Commented] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712973#comment-16712973 ] C. Scott Andreas commented on CASSANDRA-14812: -- Thanks! Reassuring to see that CQL is not impacted here. > Multiget Thrift query returns null records after digest mismatch > > > Key: CASSANDRA-14812 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14812 > Project: Cassandra > Issue Type: Bug > Components: Coordination, Core >Reporter: Sivukhin Nikita >Assignee: Benedict >Priority: Critical > Fix For: 3.0.18 > > Attachments: repro_script.py, requirements.txt, small_repro_script.py > > > It seems that in Cassandra 3.0.0 a nasty bug was introduced in the {{multiget}} > Thrift query processing logic. When one tries to read data from several > partitions with a single {{multiget}} query and a {{DigestMismatch}} exception > is raised during query processing, the request coordinator prematurely > terminates the response stream right at the point where the first > {{DigestMismatch}} error occurs. This leads to a situation where clients > "do not see" some data contained in the database. > We managed to reproduce this bug in all versions of Cassandra starting with > v3.0.0. The pre-release version 3.0.0-rc2 works correctly. It looks like the > [refactoring of the iterator transformation > hierarchy|https://github.com/apache/cassandra/commit/609497471441273367013c09a1e0e1c990726ec7] > related to CASSANDRA-9975 triggers the incorrect behaviour. > When the concatenated iterator is returned from > [StorageProxy.fetchRows(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/service/StorageProxy.java#L1770], > Cassandra starts to consume this combined iterator. Because of the > {{DigestMismatch}} exception, some elements of this combined iterator contain an > additional {{ThriftCounter}} that was added during > [DataResolver.resolve(...)|https://github.com/apache/cassandra/blob/ee9e06b5a75c0be954694b191ea4170456015b98/src/java/org/apache/cassandra/service/reads/DataResolver.java#L120] > execution. While consuming the iterator for many partitions, Cassandra calls the > [BaseIterator.tryGetMoreContents(...)|https://github.com/apache/cassandra/blob/a05785d82c621c9cd04d8a064c38fd2012ef981c/src/java/org/apache/cassandra/db/transform/BaseIterator.java#L115] > method, which must switch from one partition iterator to another when the former > is exhausted. At that point all Transformations contained in the next iterator > are applied to the combined BaseIterator that enumerates the partition sequence, > which is wrong. This behaviour causes the BaseIterator to stop enumeration after > it fully consumes the partition with the {{DigestMismatch}} error, because that > partition's iterator carries an additional {{ThriftCounter}} data limit. > The attachment contains the python2 script [^small_repro_script.py] that > reproduces this bug within a 3-node ccmlib-controlled cluster. There is also an > extended version of this script - [^repro_script.py] - that contains more > logging information and provides the ability to test behaviour for many > Cassandra versions (to run all test cases from repro_script.py you can call > {{python -m unittest2 -v repro_script.ThriftMultigetTestCase}}). All the > necessary dependencies are listed in [^requirements.txt]. > > This bug is critical in our production environment because we cannot permit > any skipped data. > Any ideas about a patch for this issue?
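The failure mode described above can be modelled in a few lines of Python. This is a toy sketch only - the names ({{Limit}}, {{concat_buggy}}, {{concat_fixed}}) are invented for illustration and do not correspond to the actual Java classes: a per-partition counting transformation leaks onto the combined stream, so once the counter attached to the digest-mismatched partition is exhausted, every later partition appears empty.

```python
class Limit:
    """Toy stand-in for a per-partition counting transformation
    (the ThriftCounter added while resolving a digest mismatch)."""
    def __init__(self, n):
        self.remaining = n

    def apply(self, item):
        if self.remaining <= 0:
            return None  # counter exhausted: signals "stop"
        self.remaining -= 1
        return item

def concat_buggy(sources):
    # Bug shape: transformations picked up from each source iterator
    # accumulate on the *combined* stream, so a limit meant for one
    # partition caps everything that follows it.
    active = []
    for it, limit in sources:
        if limit is not None:
            active.append(limit)
        for item in it:
            for t in active:
                item = t.apply(item)
                if item is None:
                    return  # enumeration stops for ALL remaining partitions
            yield item

def concat_fixed(sources):
    # Correct shape: each transformation applies only to its own source.
    for it, limit in sources:
        for item in it:
            if limit is not None:
                item = limit.apply(item)
                if item is None:
                    break  # exhausts this partition only
            yield item
```

With a first partition {{[1, 2, 3]}} carrying a counter of 3 (as after a digest mismatch) followed by a second partition {{[4, 5, 6]}}, the buggy concatenation stops after the first partition and the second one appears empty, mirroring the partially-empty Thrift results reported here; the fixed version yields all six rows.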
[jira] [Commented] (CASSANDRA-14482) ZSTD Compressor support in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712962#comment-16712962 ] Sushma A Devendrappa commented on CASSANDRA-14482: -- Thank you [~djoshi3] > ZSTD Compressor support in Cassandra > > > Key: CASSANDRA-14482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14482 > Project: Cassandra > Issue Type: New Feature > Components: Compression, Libraries >Reporter: Sushma A Devendrappa >Assignee: Sushma A Devendrappa >Priority: Major > Labels: performance, pull-request-available > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/
[jira] [Comment Edited] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712923#comment-16712923 ] Benedict edited comment on CASSANDRA-14812 at 12/7/18 2:35 PM: --- I have pushed a minimal patch [here|https://github.com/belliottsmith/cassandra/tree/14812]. I intend to follow up with a separate back-port of the in-jvm distributed tests to include a unit test. Serendipitously, this issue does not appear to affect CQL queries: the kinds of limit we apply to CQL queries would not fall prey to it. was (Author: benedict): I have pushed a minimal patch [here|https://github.com/belliottsmith/cassandra/tree/14812]. I intend to follow up with a separate back-port of the in-jvm distributed tests to include a unit test. > Multiget Thrift query returns null records after digest mismatch > > > Key: CASSANDRA-14812 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14812 > Project: Cassandra > Issue Type: Bug > Components: Coordination, Core >Reporter: Sivukhin Nikita >Assignee: Benedict >Priority: Critical > Fix For: 3.0.18 > > Attachments: repro_script.py, requirements.txt, small_repro_script.py
[jira] [Updated] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14812: - Labels: (was: bug) Reproduced In: 3.11.3, 3.11.2, 3.11.1, 3.11.0 (was: 3.11.0, 3.11.1, 3.11.2, 3.11.3) Status: Patch Available (was: Open) I have pushed a minimal patch [here|https://github.com/belliottsmith/cassandra/tree/14812]. I intend to follow up with a separate back-port of the in-jvm distributed tests to include a unit test. > Multiget Thrift query returns null records after digest mismatch > > > Key: CASSANDRA-14812 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14812 > Project: Cassandra > Issue Type: Bug > Components: Coordination, Core >Reporter: Sivukhin Nikita >Assignee: Benedict >Priority: Critical > Fix For: 3.0.18 > > Attachments: repro_script.py, requirements.txt, small_repro_script.py
[jira] [Updated] (CASSANDRA-14812) Multiget Thrift query returns null records after digest mismatch
[ https://issues.apache.org/jira/browse/CASSANDRA-14812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14812: - Summary: Multiget Thrift query returns null records after digest mismatch (was: Multiget Thrift query processor skips records in case of digest mismatch) > Multiget Thrift query returns null records after digest mismatch > > > Key: CASSANDRA-14812 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14812 > Project: Cassandra > Issue Type: Bug > Components: Coordination, Core >Reporter: Sivukhin Nikita >Assignee: Benedict >Priority: Critical > Labels: bug > Fix For: 3.0.18 > > Attachments: repro_script.py, requirements.txt, small_repro_script.py
[jira] [Commented] (CASSANDRA-14806) CircleCI workflow improvements and Java 11 support
[ https://issues.apache.org/jira/browse/CASSANDRA-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712884#comment-16712884 ] Stefan Podkowinski commented on CASSANDRA-14806: Let's just get rid of the pre-commit hook completely then. How about just adding the following steps to the instructions and handling things manually?
Switching between low/high resource configurations:
* {{cp config.yml.HIGHRES config.yml}} # use LOWRES to revert
Updating the original config:
* Make your changes to config-2_1.yml (keep this for lowres settings)
* Generate a valid config using the circleci cli tool: {{circleci config process config-2_1.yml > config.yml.LOWRES}}
* Generate the version for highres settings: {{patch -o config-2_1.yml.HIGHRES config-2_1.yml config-2_1.yml.high_res.patch && circleci config process config-2_1.yml.HIGHRES > config.yml.HIGHRES && rm config-2_1.yml.HIGHRES}}
* Copy either HIGHRES or LOWRES to config.yml to make the changes effective
> CircleCI workflow improvements and Java 11 support > -- > > Key: CASSANDRA-14806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14806 > Project: Cassandra > Issue Type: Improvement > Components: Build, Testing >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Major > > The current CircleCI config could use some cleanup and improvements. First of > all, the config has been made more modular by using the new CircleCI 2.1 > executors and command elements. Based on CASSANDRA-14713, there's now also a > Java 11 executor that will allow running tests under Java 11. The {{build}} > step will be done using Java 11 in all cases, so we can catch any regressions > for that and also test the Java 11 multi-jar artifact during dtests, that > we'd also create during the release process.
> The job workflow has now also been changed to make use of the [manual job > approval|https://circleci.com/docs/2.0/workflows/#holding-a-workflow-for-a-manual-approval] > feature, which allows running dtest jobs only on request instead of > automatically with every commit. The Java8 unit tests still run automatically, > but that could also be easily changed if needed. See this [example > workflow|https://circleci.com/workflow-run/be25579d-3cbb-4258-9e19-b1f571873850], > where the start_ jobs are triggers that need manual approval before the > actual jobs run.
[jira] [Updated] (CASSANDRA-9167) Improve bloom-filter false-positive-ratio
[ https://issues.apache.org/jira/browse/CASSANDRA-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-9167: Status: Open (was: Patch Available) It's quite old. No pressure here. > Improve bloom-filter false-positive-ratio > - > > Key: CASSANDRA-9167 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9167 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Minor > Labels: perfomance > > {{org.apache.cassandra.utils.BloomCalculations}} performs some table lookups > to calculate the bloom filter specification (size, # of hashes). Using the > exact maths for that computation brings a better false-positive-ratio (the > maths usually returns higher numbers for hash-counts). > TL;DR increasing the number of hash-rounds brings a nice improvement. > Ultimately it's a trade-off between CPU and I/O.
> ||false-positive-chance||elements||capacity||hash count new||false-positive-ratio new||hash count current||false-positive-ratio current||improvement||
> |0.1|1|50048|3|0.0848|3|0.0848|0|
> |0.1|10|500032|3|0.09203|3|0.09203|0|
> |0.1|100|564|3|0.0919|3|0.0919|0|
> |0.1|1000|5064|3|0.09182|3|0.09182|0|
> |0.1|1|50064|3|0.091874|3|0.091874|0|
> |0.01|1|100032|7|0.0092|5|0.0107|0.1630434783|
> |0.01|10|164|7|0.00818|5|0.00931|0.1381418093|
> |0.01|100|1064|7|0.008072|5|0.009405|0.1651387512|
> |0.01|1000|10064|7|0.008174|5|0.009375|0.146929288|
> |0.01|1|100064|7|0.008197|5|0.009428|0.150176894|
> |0.001|1|150080|10|0.0008|7|0.001|0.25|
> |0.001|10|1500032|10|0.0006|7|0.00094|0.57|
> |0.001|100|1564|10|0.000717|7|0.000991|0.3821478382|
> |0.001|1000|15064|10|0.000743|7|0.000992|0.33512786|
> |0.001|1|150064|10|0.000741|7|0.001002|0.3522267206|
> |0.0001|1|200064|13|0|10|0.0002|#DIV/0!|
> |0.0001|10|264|13|0.4|10|0.0001|1.5|
> |0.0001|100|2064|13|0.75|10|0.91|0.21|
> |0.0001|1000|20064|13|0.69|10|0.87|0.2608695652|
> |0.0001|1|200064|13|0.68|10|0.9|0.3235294118|
> If we decide to allow more hash-rounds, it could be nicely back-ported even > to 2.0 without affecting existing sstables.
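The "exact maths" referred to above are the standard bloom-filter formulas. A small Python sketch (not Cassandra's actual {{BloomCalculations}} code) shows how computing the optimal hash count directly, rather than via a lookup table, yields the higher hash counts and lower false-positive ratios seen in the table:

```python
import math

def optimal_hash_count(bits_per_element: float) -> int:
    # The false-positive rate of a bloom filter with m bits and n elements
    # is minimized at k = ln(2) * (m/n) hash rounds.
    return max(1, round(math.log(2) * bits_per_element))

def false_positive_rate(bits_per_element: float, hash_count: int) -> float:
    # Standard approximation: p = (1 - e^(-k * n/m))^k
    k = hash_count
    return (1.0 - math.exp(-k / bits_per_element)) ** k
```

At roughly 10 bits per element (the 0.01 false-positive-chance rows), the exact maths give k = 7 with p ≈ 0.0082, versus p ≈ 0.0094 with the 5 hash rounds the table lookup picks - consistent with the ~15% improvements reported in those rows.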
[jira] [Commented] (CASSANDRA-14871) Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser
[ https://issues.apache.org/jira/browse/CASSANDRA-14871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712822#comment-16712822 ] Robert Stupp commented on CASSANDRA-14871: -- A few comments: * I think [this lock|https://github.com/bdeggleston/cassandra/commit/7eeec9be03be6d326432fc715e9dce4b173acdf4#diff-b9ead760fa9628889810dd64e6507d9cR1279] has no "real" effect - I mean, the method just returns the reference to {{topology}}. But we should make {{topology}} {{volatile}}. WDYT? * The builder-approach for {{Topology}} is nice! * Not sure how [this|https://github.com/bdeggleston/cassandra/commit/0a8f3909098a233bca42d651b8242b288bb2c557#diff-052bdc412f1a356a3fb5409de51dceb5R108] could actually help. It's definitely fine as a safety net though. WDYT about replacing it with {{assert type != null : "Parsing '" + str + "' yielded null, which is a bug";}} right before the {{synchronized}}? * +1 on the other changes! > Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser > --- > > Key: CASSANDRA-14871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14871 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Critical > Fix For: 4.0, 3.0.x, 3.11.x > > > There are a couple of places in the code base that do not account for the fact > that j.u.HashMap and related classes are not thread-safe, and some parts rely on > internals of the HashMap implementation, which can change. > We have observed failures like {{NullPointerException}} and > {{ConcurrentModificationException}} as well as wrong behaviour. > Affected areas in the code base: > * {{SizeTieredCompactionStrategy}} > * {{DateTieredCompactionStrategy}} > * {{TimeWindowCompactionStrategy}} > * {{TokenMetadata.Topology}} > * {{TypeParser}} > * streaming / concurrent access to {{LifecycleTransaction}} (handled in > CASSANDRA-14554) > While the patches for the compaction strategies + {{TypeParser}} are pretty > straightforward, the patch for {{TokenMetadata.Topology}} requires it to be > made immutable.
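The hazard class described above is easy to demonstrate. A short Python analogue (Python's dict fails fast much like Java's HashMap, raising {{RuntimeError}} instead of {{ConcurrentModificationException}}) also illustrates the snapshot/immutable style of fix chosen for {{Topology}}; the names and values are invented for illustration:

```python
def mutate_during_iteration():
    # Structural modification of a hash map while iterating it fails fast,
    # analogous to Java's ConcurrentModificationException.
    strategies = {"stcs": 1, "dtcs": 2, "twcs": 3}
    try:
        for name in strategies:
            if strategies[name] == 2:
                del strategies[name]  # modifies the map mid-iteration
        return "no error"
    except RuntimeError:
        return "RuntimeError"

def mutate_via_snapshot():
    # The immutable-snapshot approach: iterate over a copy of the keys,
    # so no modification can invalidate the live iterator.
    strategies = {"stcs": 1, "dtcs": 2, "twcs": 3}
    for name in list(strategies):
        if strategies[name] == 2:
            del strategies[name]
    return sorted(strategies)
```

The same reasoning motivates making {{TokenMetadata.Topology}} immutable: readers always see a complete, unchanging snapshot, and writers publish a fresh copy.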
[jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
[ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712803#comment-16712803 ] Benedict commented on CASSANDRA-14554: -- ||3.0||3.11||trunk|| |[patch|https://github.com/belliottsmith/cassandra/tree/14554-3.0]|[patch|https://github.com/belliottsmith/cassandra/tree/14554-3.11]|[patch|https://github.com/belliottsmith/cassandra/tree/14554-trunk]| |[CI|https://circleci.com/workflow-run/3e615cf7-a985-448b-9ba0-6d52f9b87eec]|[CI|https://circleci.com/workflow-run/a74915a5-ea21-49ed-ab0a-389964b606f4]|[CI|https://circleci.com/workflow-run/64915295-ab06-4946-b736-85b799765003]| AFAICT the CI failures are related to CASSANDRA-14921. There was a [brief single unit test failure|https://circleci.com/gh/belliottsmith/cassandra/1032#tests/containers/2] for one run, but probably environment or timing related. This patch is simply a slightly modified and ported version of Stefania's patch above. As discussed, this doesn't necessarily solve the problem perfectly, but it is close enough that it's worth applying now and worrying about that later. [~bdeggleston]: I would appreciate it if you could take a quick look at [this|https://github.com/belliottsmith/cassandra/commit/fd54c420da81ee6ebfa1d29f45b8edc4922c8bdb#diff-374d64d7ac810fe7be021a2ef356c071R216], as it's not clear to me why there is a separate synchronised {{finishTransaction}} method. Was there anticipated to be some kind of potential deadlock? 
> LifecycleTransaction encounters ConcurrentModificationException when used in > multi-threaded context > --- > > Key: CASSANDRA-14554 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14554 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Fix For: 4.0, 3.0.x, 3.11.x > > > When LifecycleTransaction is used in a multi-threaded context, we encounter > this exception - > {quote}java.util.ConcurrentModificationException: null > at > java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) > at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) > at java.lang.Iterable.forEach(Iterable.java:74) > at > org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78) > at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320) > at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285) > at > org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136) > at > org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529) > {quote} > During streaming we create a reference to a {{LifeCycleTransaction}} and > share it between threads - > [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156] > This is used in a multi-threaded context inside {{CassandraIncomingFile}} > which is an {{IncomingStreamMessage}}. This is being deserialized in parallel. > {{LifecycleTransaction}} is not meant to be used in a multi-threaded context > and this leads to streaming failures due to object sharing. On trunk, this > object is shared across all threads that transfer sstables in parallel for > the given {{TableId}} in a {{StreamSession}}. 
There are two options to solve > this: make {{LifecycleTransaction}} and the associated objects thread-safe, or > scope the transaction to a single {{CassandraIncomingFile}}. The consequence > of the latter option is that if we experience a streaming failure we may have > redundant SSTables on disk. This is ok, as compaction should clean them up. A > third option is to synchronize access in the streaming infrastructure.
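The third option, synchronizing access in the streaming layer, can be sketched in a few lines. This is a toy Python model with invented names ({{LogTransaction}} here is a stand-in, not the real class): wrap the shared transaction so every call from the parallel stream-deserialization threads is serialized behind one lock.

```python
import threading

class LogTransaction:
    """Toy stand-in for LifecycleTransaction: records tracked sstables."""
    def __init__(self):
        self.records = []

    def track_new(self, sstable):
        self.records.append(sstable)

class SynchronizedTransaction:
    # Guard every call into the shared transaction with a single lock,
    # so concurrent trackNew() calls can no longer corrupt its state.
    def __init__(self, txn):
        self._txn = txn
        self._lock = threading.Lock()

    def track_new(self, sstable):
        with self._lock:
            self._txn.track_new(sstable)

def stream_files(n_threads=8, files_per_thread=100):
    # Model of a StreamSession: many threads track sstables into one
    # shared transaction, as CassandraIncomingFile deserialization does.
    txn = SynchronizedTransaction(LogTransaction())
    threads = [
        threading.Thread(
            target=lambda t=t: [txn.track_new(f"sst-{t}-{i}")
                                for i in range(files_per_thread)])
        for t in range(n_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return len(txn._txn.records)
```

With the lock in place, all 800 tracked files survive regardless of interleaving; the per-file-scoped alternative avoids the lock entirely at the cost of possible redundant sstables after a failed stream.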
[jira] [Commented] (CASSANDRA-14871) Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser
[ https://issues.apache.org/jira/browse/CASSANDRA-14871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712792#comment-16712792 ] Robert Stupp commented on CASSANDRA-14871: -- [~bdeggleston], thank you! Will take a look. > Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser > --- > > Key: CASSANDRA-14871 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14871 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Robert Stupp >Assignee: Robert Stupp >Priority: Critical > Fix For: 4.0, 3.0.x, 3.11.x
[jira] [Commented] (CASSANDRA-14921) thrift_test.py failures on 3.0 and 3.x branches
[ https://issues.apache.org/jira/browse/CASSANDRA-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712729#comment-16712729 ] Stefan Podkowinski commented on CASSANDRA-14921: +1 on the patch. Do you have the path of the binary that should be removed? > thrift_test.py failures on 3.0 and 3.x branches > --- > > Key: CASSANDRA-14921 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14921 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Major > Fix For: 3.0.x, 3.11.x > > > {{putget_test::TestPutGet::test_wide_slice}} fails on CircleCI since the > docker image was updated for CASSANDRA-14713. The reason for this is that the > {{fastbinary}} extension used by {{TBinaryProtocolAccelerated}} is not > compatible with thrift 0.10.0 (according to [this bug report against > Pycassa|https://github.com/pycassa/pycassa/issues/245]). The offending binary > is present in the filesystem of the [current docker > image|https://hub.docker.com/r/spod/cassandra-testing-ubuntu18-java11/], but > wasn't in [the previous image > |https://hub.docker.com/r/kjellman/cassandra-test/], which meant that thrift > would fall back to the standard protocol implementation (silently). > As this is the only test which uses {{TBinaryProtocolAccelerated}} it's easy > enough to switch it to {{TBinaryProtocol}}, which also fixes things. We might > want to consider removing the binary next time the image is updated though (cc > [~spo...@gmail.com]).
[jira] [Created] (CASSANDRA-14923) Java8 forEach cannot be used in UDF
Lapo Luchini created CASSANDRA-14923: Summary: Java8 forEach cannot be used in UDF Key: CASSANDRA-14923 URL: https://issues.apache.org/jira/browse/CASSANDRA-14923 Project: Cassandra Issue Type: Bug Environment: FreeBSD 11.2, Cassandra 3.11.3, OpenJDK 8.181.13 Reporter: Lapo Luchini I get the following error:
{noformat}
cqlsh:test> CREATE OR REPLACE FUNCTION sumMap (state map<text, int>, val map<text, int>)
        ... RETURNS NULL ON NULL INPUT
        ... RETURNS map<text, int>
        ... LANGUAGE java AS
        ... $$
        ...   val.forEach((k, v) -> {
        ...     Integer cur = state.get(k);
        ...     state.put(k, (cur == null) ? v : cur + v);
        ...   });
        ...   return state;
        ... $$;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Java source compilation failed:
Line 2: The type java.util.function.BiConsumer cannot be resolved. It is indirectly referenced from required .class files
Line 2: The method forEach(BiConsumer) from the type Map refers to the missing type BiConsumer
Line 2: The target type of this expression must be a functional interface
"
{noformat}
On the other hand, this compiles correctly:
{noformat}
CREATE OR REPLACE FUNCTION sumMap (state map<text, int>, val map<text, int>)
RETURNS NULL ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS
$$
  for (Map.Entry<String, Integer> e : val.entrySet()) {
    String k = e.getKey();
    Integer v = e.getValue();
    Integer cur = state.get(k);
    state.put(k, (cur == null) ? v : cur + v);
  }
  return state;
$$;
{noformat}
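The merge logic of the working entrySet-based UDF body can be exercised outside the UDF sandbox as plain Java; the class and method names below are illustrative, not part of Cassandra:

```java
import java.util.HashMap;
import java.util.Map;

public class SumMapDemo {
    // Same merge logic as the entrySet-based UDF body above: for every key
    // in val, add its value to the matching entry in state (or insert it).
    static Map<String, Integer> sumMap(Map<String, Integer> state,
                                       Map<String, Integer> val) {
        for (Map.Entry<String, Integer> e : val.entrySet()) {
            Integer cur = state.get(e.getKey());
            state.put(e.getKey(),
                      cur == null ? e.getValue() : cur + e.getValue());
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Integer> state = new HashMap<>();
        state.put("a", 1);
        Map<String, Integer> val = new HashMap<>();
        val.put("a", 2);
        val.put("b", 5);
        Map<String, Integer> out = sumMap(state, val);
        System.out.println(out.get("a") + " " + out.get("b")); // prints "3 5"
    }
}
```

Run as ordinary Java this compiles and works on JDK 8, which localizes the bug to the UDF compiler's classpath (it cannot resolve {{java.util.function.BiConsumer}}), not to the lambda itself.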
[jira] [Commented] (CASSANDRA-14921) thrift_test.py failures on 3.0 and 3.x branches
[ https://issues.apache.org/jira/browse/CASSANDRA-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712597#comment-16712597 ] Sam Tunnicliffe commented on CASSANDRA-14921: - bq. Yes, pycassa isn't python 3 compatible and we needed new thrift bindings to get python 3 support for the remaining thrift tests we still have. Jeff Jirsa did this work. from: [this comment | https://issues.apache.org/jira/browse/CASSANDRA-14134?focusedCommentId=16314023&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16314023] on CASSANDRA-14134. > thrift_test.py failures on 3.0 and 3.x branches > --- > > Key: CASSANDRA-14921 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14921 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Major > Fix For: 3.0.x, 3.11.x > > > {{putget_test::TestPutGet::test_wide_slice}} fails on CircleCI since the > docker image was updated for CASSANDRA-14713. The reason for this is that the > {{fastbinary}} extension used by {{TBinaryProtocolAccelerated}} is not > compatible with thrift 0.10.0 (according to [this bug report against > Pycassa|https://github.com/pycassa/pycassa/issues/245]). The offending binary > is present in the filesystem of the [current docker > image|https://hub.docker.com/r/spod/cassandra-testing-ubuntu18-java11/], but > wasn't in [the previous image > |https://hub.docker.com/r/kjellman/cassandra-test/], which meant that thrift > would fall back to the standard protocol implementation (silently). > As this is the only test which uses {{TBinaryProtocolAccelerated}} it's easy > enough to switch it to {{TBinaryProtocol}}, which also fixes things. We might > want to consider removing the binary next time the image is updated though (cc > [~spo...@gmail.com]). 
[jira] [Commented] (CASSANDRA-14921) thrift_test.py failures on 3.0 and 3.x branches
[ https://issues.apache.org/jira/browse/CASSANDRA-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712594#comment-16712594 ] Sam Tunnicliffe commented on CASSANDRA-14921: - bq. Can't we just downgrade to 0.9.3 in dtest's requirements.txt? I seem to recall that there is some other incompatibility between thrift 0.9 and python3, which was why the dependency got bumped in CASSANDRA-14134. I didn't make that change though, so I'll have to check. Fixing the issue on our side is fine with me, hence the patch. I just wanted to note the presence of the problematic binary so we can maybe remove it next time we have to do some maintenance on the docker image. > thrift_test.py failures on 3.0 and 3.x branches > --- > > Key: CASSANDRA-14921 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14921 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Major > Fix For: 3.0.x, 3.11.x > > > {{putget_test::TestPutGet::test_wide_slice}} fails on CircleCI since the > docker image was updated for CASSANDRA-14713. The reason for this is that the > {{fastbinary}} extension used by {{TBinaryProtocolAccelerated}} is not > compatible with thrift 0.10.0 (according to [this bug report against > Pycassa|https://github.com/pycassa/pycassa/issues/245]). The offending binary > is present in the filesystem of the [current docker > image|https://hub.docker.com/r/spod/cassandra-testing-ubuntu18-java11/], but > wasn't in [the previous image > |https://hub.docker.com/r/kjellman/cassandra-test/], which meant that thrift > would fall back to the standard protocol implementation (silently). > As this is the only test which uses {{TBinaryProtocolAccelerated}} it's easy > enough to switch it to {{TBinaryProtocol}}, which also fixes things. We might > want to consider removing the binary next time the image is updated though (cc > [~spo...@gmail.com]). 