[jira] [Commented] (CASSANDRA-15313) Fix flaky - ChecksummingTransformerTest - org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055548#comment-17055548 ] David Capwell commented on CASSANDRA-15313: --- bq. Reverting CASSANDRA-1 fixes corruptionCausesFailure with seed 71671740653044L for me Sorry, I don't fully follow; can you elaborate, [~spod]? There are three issues with the test/feature (I may have missed something; this is from memory). 1) Corruption can cause lz4 to crash the JVM. This was fixed in CASSANDRA-15556 by using the "safe" methods rather than the "fast" ones. 2) A corrupted lz4 stream may not fail and may produce output != input; this is still an issue, and the tests fail periodically because of it. 3) The generators generated too much garbage. CASSANDRA-1 switched to fixed memory and switched from strings (whose charset depends on the test environment, since no charset is specified) to raw bytes. Because the generated data changed, the seeds that failed before no longer fail, and the seeds that fail now did not fail with the old generators; both generators can reproduce #2, just with different seeds. > Fix flaky - ChecksummingTransformerTest - > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest > --- > > Key: CASSANDRA-15313 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15313 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.0-alpha > > Attachments: CASSANDRA-15313-hack.patch > > > During recent runs, this test appears to be flaky. 
> Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/459#tests/containers/94] > corruptionCausesFailure-compression - > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest > {code:java} > java.lang.OutOfMemoryError: GC overhead limit exceeded > at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at org.quicktheories.impl.Precursor.<init>(Precursor.java:17) > at > org.quicktheories.impl.ConcreteDetachedSource.<init>(ConcreteDetachedSource.java:8) > at > org.quicktheories.impl.ConcreteDetachedSource.detach(ConcreteDetachedSource.java:23) > at org.quicktheories.generators.Retry.generate(CodePoints.java:51) > at > org.quicktheories.generators.Generate.lambda$intArrays$10(Generate.java:190) > at > org.quicktheories.generators.Generate$$Lambda$17/1847008471.generate(Unknown > Source) > at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$mix$10(Gen.java:184) > at org.quicktheories.core.Gen$$Lambda$45/802243390.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$flatMap$5(Gen.java:93) > at org.quicktheories.core.Gen$$Lambda$48/363509958.generate(Unknown > Source) > at > org.quicktheories.dsl.TheoryBuilder4.lambda$prgnToTuple$12(TheoryBuilder4.java:188) > at > org.quicktheories.dsl.TheoryBuilder4$$Lambda$40/2003496028.generate(Unknown > Source) > at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255) > at org.quicktheories.core.FilteredGenerator.generate(Gen.java:225) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at 
org.quicktheories.impl.Core.generate(Core.java:150) > at org.quicktheories.impl.Core.shrink(Core.java:103) > at org.quicktheories.impl.Core.run(Core.java:39) > at org.quicktheories.impl.TheoryRunner.check(TheoryRunner.java:35) > at org.quicktheories.dsl.TheoryBuilder4.check(TheoryBuilder4.java:150) > at > org.quicktheories.dsl.TheoryBuilder4.checkAssert(TheoryBuilder4.java:162) > at > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest.corruptionCausesFailure(ChecksummingTransformerTest.java:87) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
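Issue #2 above (a corrupted lz4 stream may decompress without error into output != input) is the reason the frame transformer pairs compression with a checksum: decompression alone cannot be trusted to detect corruption. Below is a minimal stdlib-only sketch of checksum-verified framing. The layout (an 8-byte CRC32 followed by the payload) and the class name are illustrative assumptions, not Cassandra's actual frame format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChecksumFrameSketch {
    // Build a frame: [crc32 of payload][payload]. Illustrative layout only.
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(8 + payload.length);
        buf.putLong(crc.getValue()).put(payload);
        return buf.array();
    }

    // Verify on receipt: recompute the checksum over the payload bytes and
    // compare with the transmitted value. This catches corruption even when
    // the decompressor would have "succeeded" and returned wrong bytes.
    static boolean verify(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        long expected = buf.getLong();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue() == expected;
    }

    public static void main(String[] args) {
        byte[] good = frame("hello cassandra".getBytes());
        byte[] bad = good.clone();
        bad[12] ^= 0x40;                  // flip one bit inside the payload
        System.out.println(verify(good)); // true
        System.out.println(verify(bad));  // false: corruption detected
        if (!verify(good) || verify(bad)) throw new AssertionError();
    }
}
```

The sketch skips compression entirely; in the real transformer the checksum is what turns a silently wrong decompression (issue #2) into a hard failure.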
[jira] [Comment Edited] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055533#comment-17055533 ] Yifan Cai edited comment on CASSANDRA-15338 at 3/10/20, 2:55 AM: - I was not able to reproduce the test failure when simply running it on my laptop. However, it reproduces easily in a Docker container with limited CPUs (e.g., 2). After multiple runs, the observation was that the test only failed when testing with large messages, which indicated the failures were probably related to {{LargeMessageDelivery}}. Here is what I think happens. # When the {{inbound}} has just opened and the first message is queued into the {{outbound}}, the handshake happens and execution is deferred until the connection is established (executeAgain). # Since enqueue is non-blocking, the next line, {{unsafeRunOnDelivery}}, runs immediately. The effect is that the runnable gets registered but is not run yet. # The connection is established, so we {{executeAgain()}}. Because the runnable {{stopAndRun}} is present, and the {{inProgress}} flag is still false at this point, the test runs the runnable, which counts down {{deliveryDone}} unexpectedly. # Delivery proceeds to flush the message. In {{LargeMessageDelivery}} the flush is asynchronous, so a race can happen: ## the inbound has received the message (and counted down receiveDone), while ## {{LargeMessageDelivery}} is still polling for the flush to complete and has not yet released capacity. Therefore, the assertion on pendingCount fails. Two places in the test flow are (or can go) wrong: steps 3 and 4. Regarding step 3, the runnable {{stopAndRun}} should not be registered while the connection is being established. In production, is there a case where a {{stopAndRun}} is registered this early? Probably not. Regarding step 4, the {{outbound}} has no knowledge of whether the {{inbound}} has received any message. 
The test should register the runnable {{stopAndRun}} at the message handler to count down {{deliveryDone}}; that way the runnable correctly waits for the current delivery to complete before it runs. The PR is here: https://github.com/apache/cassandra/pull/466 As mentioned, I reproduced this in Docker. Here is a bundle one can simply download and run: [^CASS-15338-Docker.zip] It runs {{ConnectionTest}} repeatedly until it fails. I have included the patch within the image too. To reproduce, run {code:bash} bash build_and_run.sh {code} To see the runs with the patch, run {code:bash} bash build_and_run.sh patched {code}
[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-15338: -- Test and Documentation Plan: unit test Status: Patch Available (was: Open) > Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest > --- > > Key: CASSANDRA-15338 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15338 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Attachments: CASS-15338-Docker.zip > > Time Spent: 10m > Remaining Estimate: 0h > > Example failure: > [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1] > > {code:java} > Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest): FAILED > expected:<0> but was:<1> > junit.framework.AssertionFailedError: expected:<0> but was:<1> > at > org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625) > at > org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258) > at > org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231) > at > org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code} > > Looking closer at > org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that > the run method is called before > org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to > a test race condition where the CountDownLatch completes before executing -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-15338: -- Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Discovered By: Unit Test Severity: Low Assignee: Yifan Cai Status: Open (was: Triage Needed) > Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest > --- > > Key: CASSANDRA-15338 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15338 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Yifan Cai >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Attachments: CASS-15338-Docker.zip > > Time Spent: 10m > Remaining Estimate: 0h > > Example failure: > [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1] > > {code:java} > Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest): FAILED > expected:<0> but was:<1> > junit.framework.AssertionFailedError: expected:<0> but was:<1> > at > org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625) > at > org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258) > at > org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231) > at > org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code} > > Looking closer at > org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that > the run method is called before > org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to > a test race condition where the CountDownLatch completes before executing -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055533#comment-17055533 ] Yifan Cai commented on CASSANDRA-15338: --- I was not able to reproduce the test failure when simply running it on my laptop. However, it reproduces easily in a Docker container with limited CPUs (e.g., 2). After multiple runs, the observation was that the test only failed when testing with large messages, which indicated the failures were probably related to {{LargeMessageDelivery}}. Here is what I think happens. # When the {{inbound}} has just opened and the first message is queued into the {{outbound}}, the handshake happens and execution is deferred until the connection is established (executeAgain). # Since enqueue is non-blocking, the next line, {{unsafeRunOnDelivery}}, runs immediately. The effect is that the runnable gets registered but is not run yet. # The connection is established, so we {{executeAgain()}}. Because the runnable {{stopAndRun}} is present, and the {{inProgress}} flag is still false at this point, the test runs the runnable, which counts down {{deliveryDone}} unexpectedly. # Delivery proceeds to flush the message. In {{LargeMessageDelivery}} the flush is asynchronous, so a race can happen: ## the inbound has received the message (and counted down receiveDone), while ## {{LargeMessageDelivery}} is still polling for the flush to complete and has not yet released capacity. Therefore, the assertion on pendingCount fails. Two places in the test flow are (or can go) wrong: steps 3 and 4. Regarding step 3, the runnable {{stopAndRun}} should not be registered while the connection is being established. In production, is there a case where a {{stopAndRun}} is registered this early? Probably not. Regarding step 4, the {{outbound}} has no knowledge of whether the {{inbound}} has received any message. 
The test should register the runnable {{stopAndRun}} at the message handler to count down {{deliveryDone}}; that way the runnable correctly waits for the current delivery to complete before it runs. The PR is here: https://github.com/apache/cassandra/pull/466 As mentioned, I reproduced this in Docker. Here is a bundle one can simply download and run: [^CASS-15338-Docker.zip] It runs {{ConnectionTest}} repeatedly until it fails. I have included the patch within the image too. To reproduce, run {code:bash} bash build_and_run.sh {code} To see the runs with the patch, run {code:bash} bash build_and_run.sh patched {code} > Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest > --- > > Key: CASSANDRA-15338 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15338 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Attachments: CASS-15338-Docker.zip > > Time Spent: 10m > Remaining Estimate: 0h > > Example failure: > [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1] > > {code:java} > Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest): FAILED > expected:<0> but was:<1> > junit.framework.AssertionFailedError: expected:<0> but was:<1> > at > org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625) > at > org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258) > at > org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231) > at > org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code} > > Looking closer at > org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that > the run method is called before > org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to > a test race condition where the CountDownLatch completes before executing
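The race described in the comment (a pre-registered {{stopAndRun}} counting down {{deliveryDone}} before the flush has released capacity) can be sketched with stdlib primitives. This is a deliberately simplified model of the test's latch and pending-count interaction, not the actual OutboundConnection code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchRaceSketch {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger pending = new AtomicInteger(1);     // one message queued
        CountDownLatch deliveryDone = new CountDownLatch(1);

        // The runnable the test registers via unsafeRunOnDelivery.
        Runnable stopAndRun = deliveryDone::countDown;

        // Buggy ordering (step 3): the runnable fires while the connection is
        // still being established, before the message has been flushed.
        stopAndRun.run();
        deliveryDone.await();  // returns immediately: latch already counted down
        // Capacity has not been released yet, so the test's assertion on
        // pendingCount (expected 0) fails.
        System.out.println("pendingCount seen by the test: " + pending.get()); // 1

        // Fixed ordering: complete the delivery first, then run the runnable.
        pending.decrementAndGet();  // message flushed, capacity released
        stopAndRun.run();           // deliveryDone now reflects a finished delivery
        System.out.println("pendingCount after the fix: " + pending.get()); // 0
        if (pending.get() != 0) throw new AssertionError();
    }
}
```

Registering the countdown at the message handler, as the patch does, corresponds to the "fixed ordering" branch: the latch only trips once delivery has actually completed.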
[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Cai updated CASSANDRA-15338: -- Attachment: CASS-15338-Docker.zip > Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest > --- > > Key: CASSANDRA-15338 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15338 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Attachments: CASS-15338-Docker.zip > > Time Spent: 10m > Remaining Estimate: 0h > > Example failure: > [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1] > > {code:java} > Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest): FAILED > expected:<0> but was:<1> > junit.framework.AssertionFailedError: expected:<0> but was:<1> > at > org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625) > at > org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258) > at > org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231) > at > org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code} > > Looking closer at > org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that > the run method is called before > org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to > a test race condition where the CountDownLatch completes before executing -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-15338: --- Labels: pull-request-available (was: ) > Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest > --- > > Key: CASSANDRA-15338 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15338 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > > Example failure: > [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1] > > {code:java} > Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest): FAILED > expected:<0> but was:<1> > junit.framework.AssertionFailedError: expected:<0> but was:<1> > at > org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625) > at > org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258) > at > org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231) > at > org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code} > > Looking closer at > org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that > the run method is called before > org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to > a test race condition where the CountDownLatch completes before executing -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15620) Add "unleveled sstables" table metric
[ https://issues.apache.org/jira/browse/CASSANDRA-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink updated CASSANDRA-15620: -- Reviewers: Chris Lohfink (was: Chris Lohfink) Status: Review In Progress (was: Patch Available) > Add "unleveled sstables" table metric > - > > Key: CASSANDRA-15620 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15620 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Normal > > The number of unleveled sstables is an important indicator that deserves to > be a dedicated table metric on its own. This will also add a global gauge > that is convenient to query.
[jira] [Updated] (CASSANDRA-15620) Add "unleveled sstables" table metric
[ https://issues.apache.org/jira/browse/CASSANDRA-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink updated CASSANDRA-15620: -- Status: Ready to Commit (was: Review In Progress) > Add "unleveled sstables" table metric > - > > Key: CASSANDRA-15620 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15620 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Normal > > The number of unleveled sstables is an important indicator that deserves to > be a dedicated table metric on its own. This will also add a global gauge > that is convenient to query. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15620) Add "unleveled sstables" table metric
[ https://issues.apache.org/jira/browse/CASSANDRA-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink updated CASSANDRA-15620: -- Test and Documentation Plan: na Status: Patch Available (was: Open) > Add "unleveled sstables" table metric > - > > Key: CASSANDRA-15620 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15620 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Normal > > The number of unleveled sstables is an important indicator that deserves to > be a dedicated table metric on its own. This will also add a global gauge > that is convenient to query. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15620) Add "unleveled sstables" table metric
[ https://issues.apache.org/jira/browse/CASSANDRA-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055509#comment-17055509 ] Chris Lohfink commented on CASSANDRA-15620: --- +1 on the code. I also spot-checked with LCS and STCS, and it worked great. Thanks! > Add "unleveled sstables" table metric > - > > Key: CASSANDRA-15620 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15620 > Project: Cassandra > Issue Type: Improvement > Components: Observability/Metrics >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Normal > > The number of unleveled sstables is an important indicator that deserves to > be a dedicated table metric on its own. This will also add a global gauge > that is convenient to query.
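The ticket's shape (a per-table gauge plus a global aggregate "convenient to query") can be sketched with the standard library alone. Cassandra's real table metrics go through the Dropwizard Metrics registry; the names and structure below are hypothetical illustrations, not the actual TableMetrics API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntSupplier;

public class UnleveledGaugeSketch {
    // Hypothetical per-table gauge registry: table name -> a supplier that
    // reports how many of that table's sstables are still unleveled (in L0).
    static final Map<String, IntSupplier> perTable = new ConcurrentHashMap<>();

    // Global gauge: sums the per-table gauges so a single query answers
    // "how many unleveled sstables are there cluster-wide on this node?".
    static int globalUnleveled() {
        return perTable.values().stream().mapToInt(IntSupplier::getAsInt).sum();
    }

    public static void main(String[] args) {
        perTable.put("ks1.t1", () -> 3); // e.g. three sstables still in L0
        perTable.put("ks1.t2", () -> 0);
        System.out.println(globalUnleveled()); // 3
        if (globalUnleveled() != 3) throw new AssertionError();
    }
}
```

The suppliers are evaluated lazily on each read, which is the usual gauge contract: the metric reflects the current state rather than a cached count.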
[jira] [Updated] (CASSANDRA-15627) sstable not in the corresponding level in the leveled manifest
[ https://issues.apache.org/jira/browse/CASSANDRA-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15627: -- Bug Category: Parent values: Correctness(12982)Level 1 values: API / Semantic Implementation(12988) Complexity: Normal Discovered By: Workload Replay Severity: Normal Status: Open (was: Triage Needed) > sstable not in the corresponding level in the leveled manifest > -- > > Key: CASSANDRA-15627 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15627 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/Compaction/LCS >Reporter: David Capwell >Priority: Normal > > I get the following warning logs when running smoke tests > bq. Live sstable > /cassandra/d1/data/ks/table-cce7c54b5abf3f369bb7659a74e9e963/mf-71-big-Data.db > from level 0 is not on corresponding level in the leveled manifest. This is > not a problem per se, but may indicate an orphaned sstable due to a failed > compaction not cleaned up properly. > There are no other warning logs and no error logs; so compaction doesn’t have > anything saying there was a failure. > Schema > {code} > CREATE TABLE ks.table ( > pk1 ascii, > pk2 bigint, > ck1 ascii, > ck2 ascii, > ck3 ascii, > v1 int, > v2 ascii, > PRIMARY KEY ((pk1,pk2), ck1, ck2, ck3) > ) WITH comment = 'test table' > AND gc_grace_seconds = 1 > AND memtable_flush_period_in_ms = 100 > AND compression = {'class': 'LZ4Compressor'} > AND compaction = {'class': 'LeveledCompactionStrategy', > 'only_purge_repaired_tombstones': true} > AND CLUSTERING ORDER BY (ck1 DESC,ck2 ASC,ck3 DESC); > {code} > test > * run simulated queries for 30 minutes > * run incremental repair in a loop (once one completes run the next) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15627) sstable not in the corresponding level in the leveled manifest
David Capwell created CASSANDRA-15627: - Summary: sstable not in the corresponding level in the leveled manifest Key: CASSANDRA-15627 URL: https://issues.apache.org/jira/browse/CASSANDRA-15627 Project: Cassandra Issue Type: Bug Components: Local/Compaction, Local/Compaction/LCS Reporter: David Capwell I get the following warning logs when running smoke tests bq. Live sstable /cassandra/d1/data/ks/table-cce7c54b5abf3f369bb7659a74e9e963/mf-71-big-Data.db from level 0 is not on corresponding level in the leveled manifest. This is not a problem per se, but may indicate an orphaned sstable due to a failed compaction not cleaned up properly. There are no other warning logs and no error logs; so compaction doesn’t have anything saying there was a failure. Schema {code} CREATE TABLE ks.table ( pk1 ascii, pk2 bigint, ck1 ascii, ck2 ascii, ck3 ascii, v1 int, v2 ascii, PRIMARY KEY ((pk1,pk2), ck1, ck2, ck3) ) WITH comment = 'test table' AND gc_grace_seconds = 1 AND memtable_flush_period_in_ms = 100 AND compression = {'class': 'LZ4Compressor'} AND compaction = {'class': 'LeveledCompactionStrategy', 'only_purge_repaired_tombstones': true} AND CLUSTERING ORDER BY (ck1 DESC,ck2 ASC,ck3 DESC); {code} test * run simulated queries for 30 minutes * run incremental repair in a loop (once one completes run the next) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
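The warning quoted above comes from a consistency check between what is live on disk and what the leveled manifest believes. A small stdlib-only sketch of that invariant (a simplified hypothetical model, not the actual LeveledManifest code):

```java
import java.util.List;
import java.util.Map;

public class ManifestCheckSketch {
    // Hypothetical model: an sstable records its level in its own metadata,
    // while the leveled manifest tracks which sstables it placed on each level.
    static boolean onCorrespondingLevel(String name, int metadataLevel,
                                        Map<Integer, List<String>> manifest) {
        return manifest.getOrDefault(metadataLevel, List.of()).contains(name);
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> manifest =
                Map.of(0, List.of("mf-70-big"), 1, List.of("mf-65-big"));
        // A live sstable on disk that the manifest does not know about, e.g.
        // left behind by a failed compaction: the check fails, and the
        // "not on corresponding level in the leveled manifest" warning is logged.
        boolean ok = onCorrespondingLevel("mf-71-big", 0, manifest);
        System.out.println(ok); // false
        if (ok) throw new AssertionError();
    }
}
```

As the warning text itself notes, a mismatch is not necessarily a correctness problem; it may just mean an orphaned sstable was never cleaned up.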
[jira] [Commented] (CASSANDRA-12510) Disallow decommission when number of replicas will drop below configured RF
[ https://issues.apache.org/jira/browse/CASSANDRA-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055361#comment-17055361 ] Stefan Miklosovic commented on CASSANDRA-12510: --- Isn't the same logic applicable to _drain_? > Disallow decommission when number of replicas will drop below configured RF > --- > > Key: CASSANDRA-12510 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12510 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Streaming and Messaging > Environment: C* version 3.3 >Reporter: Atin Sood >Assignee: Kurt Greaves >Priority: Low > Labels: lhf > Fix For: 4.0 > > Attachments: 12510-3.x-v2.patch, 12510-3.x.patch > > > Steps to replicate: > - Create a 3 node cluster in DC1 and create a keyspace test_keyspace with > table test_table with replication strategy NetworkTopologyStrategy, DC1=3. > Populate some data into this table. > - Add 5 more nodes to this cluster, but in DC2. Do not alter the > keyspace to add the new DC2 to replication (this is intentional and the > reason why the bug shows up). So desc keyspace should still list > NetworkTopologyStrategy with DC1=3 as RF. > - As expected, this will now be an 8 node cluster with 3 nodes in DC1 and 5 in > DC2. > - Now start decommissioning the nodes in DC1. Note that the decommission runs > fine on all 3 nodes, but since the new nodes are in DC2 and the RF for the > keyspace is restricted to DC1, the new 5 nodes won't get any data. > - You will now end up with a 5 node cluster that has no data from the > decommissioned 3 nodes, hence ending up in data loss. > I do understand that this problem could have been avoided if we had performed an > ALTER statement to add DC2 replication before adding the 5 nodes. But the fact > that decommission ran fine on the 3 nodes in DC1 without complaining that > there were no nodes to stream its data to seems a little discomforting. 
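The guard the ticket asks for can be sketched as a pre-decommission check: refuse when the leaving node's datacenter would drop below the keyspace's configured RF for that DC. The helper below is a hypothetical illustration of the logic, not the actual StorageService code:

```java
import java.util.Map;

public class DecommissionGuardSketch {
    // Hypothetical check: safe only if, after this node leaves, its DC still
    // has at least as many nodes as the keyspace's configured RF for that DC.
    // DCs absent from the RF map (like DC2 in the report) impose no constraint.
    static boolean safeToDecommission(Map<String, Integer> nodesPerDc,
                                      Map<String, Integer> rfPerDc,
                                      String leavingDc) {
        int remaining = nodesPerDc.getOrDefault(leavingDc, 0) - 1;
        return remaining >= rfPerDc.getOrDefault(leavingDc, 0);
    }

    public static void main(String[] args) {
        // The scenario from the report: DC1 has 3 nodes with RF 3; DC2 was
        // never added to the keyspace's replication settings.
        Map<String, Integer> nodes = Map.of("DC1", 3, "DC2", 5);
        Map<String, Integer> rf = Map.of("DC1", 3);
        System.out.println(safeToDecommission(nodes, rf, "DC1")); // false: 2 < RF 3
        System.out.println(safeToDecommission(nodes, rf, "DC2")); // true: DC2 has no RF
        if (safeToDecommission(nodes, rf, "DC1")) throw new AssertionError();
        if (!safeToDecommission(nodes, rf, "DC2")) throw new AssertionError();
    }
}
```

Under this check, the first DC1 decommission in the reported scenario would already be refused, surfacing the missing ALTER before any data is lost.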
[cassandra-sidecar] branch master updated: Ninja fix changelog
This is an automated email from the ASF dual-hosted git repository. rustyrazorblade pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/cassandra-sidecar.git The following commit(s) were added to refs/heads/master by this push: new 2c5f484 Ninja fix changelog 2c5f484 is described below commit 2c5f4841479d5ff80a21540ec4e2fa5344a52251 Author: Jon Haddad AuthorDate: Mon Mar 9 11:11:17 2020 -0700 Ninja fix changelog --- CHANGES.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/CHANGES.txt b/CHANGES.txt index 49d7800..00defa6 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,5 +1,6 @@ 1.0.0 - + * Build and Test with both Java 8 & 11 in Circle CI (CASSANDRA-15611) * Upgraded Gradle and replaced FindBugs with SpotBugs (CASSANDRA-15610) * Improving local HealthCheckTest reliability (CASSANDRA-15615) * Read sidecar.yaml from sidecar.config System Property instead of classpath (CASSANDRA-15288) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra-sidecar] branch master updated: Improving CircleCI build reliability
This is an automated email from the ASF dual-hosted git repository. rustyrazorblade pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/cassandra-sidecar.git The following commit(s) were added to refs/heads/master by this push: new 595fea7 Improving CircleCI build reliability 595fea7 is described below commit 595fea7d97f0d87ac9b9a1510379a6faa6a29abf Author: Jon Haddad AuthorDate: Wed Mar 4 13:56:46 2020 -0800 Improving CircleCI build reliability Switched to Circle machine image - docker has issues with networking in tests Fix storing of test results Updated readme with Java 11 Upgrade vertx Wait for vertx server startup before sending requests Update simulacron to latest bug fix version added spotbugs exclude config to avoid incorrect NPE error on java 11 Configure CircleCi to run tests with Java 11 Patch by Jon Haddad; Reviewed by Dinesh Joshi for CASSANDRA-15611 --- .circleci/config.yml | 89 -- README.md | 11 ++- build.gradle | 17 +++-- .../sidecar/HealthServiceIntegrationTest.java | 1 + src/main/resources/spotbugs-exclude.xml| 14 .../sidecar/AbstractHealthServiceTest.java | 10 ++- 6 files changed, 110 insertions(+), 32 deletions(-) diff --git a/.circleci/config.yml b/.circleci/config.yml index 8ab909d..690b4a6 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -2,42 +2,87 @@ # # Check https://circleci.com/docs/2.0/language-java/ for more details # -version: 2 -jobs: - build: -docker: - - image: circleci/openjdk:8-jdk +version: 2.1 +# need to reuse the same base environment for several tests +aliases: + base_job: _job +machine: + image: ubuntu-1604:201903-01 working_directory: ~/repo - environment: TERM: dumb +# we might modify this in the future to accept a parameter for the java package to install +commands: + install_java: +description: "Installs Java 8 using AdoptOpenJDK" +parameters: + version: +type: string + steps: - - checkout + - run: wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | 
sudo apt-key add - + - run: sudo add-apt-repository --yes https://adoptopenjdk.jfrog.io/adoptopenjdk/deb/ + - run: sudo apt-get update + - run: sudo apt-get install -y << parameters.version>> + + install_common: +description: "Installs common software and certificates" +steps: + - run: sudo apt-get update + - run: sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common - # Download and cache dependencies - - restore_cache: - keys: -- v1-dependencies-{{ checksum "build.gradle" }} -# fallback to using the latest cache if no exact match is found -- v1-dependencies- +jobs: + java8: +<<: *base_job - - run: ./gradlew dependencies +steps: + - checkout + - install_common + + - install_java: + version: adoptopenjdk-8-hotspot - - save_cache: - paths: -- ~/.gradle - key: v1-dependencies-{{ checksum "build.gradle" }} + - run: sudo update-java-alternatives -s adoptopenjdk-8-hotspot-amd64 && java -version # make sure it builds with build steps like swagger docs and dist - - run: ./gradlew build + - run: ./gradlew build --stacktrace + + - store_artifacts: + path: build/reports + destination: test-reports + + - store_test_results: + path: ~/repo/build/test-results/ + + java11: +<<: *base_job +steps: + - checkout + - install_common - # run tests! 
- - run: ./gradlew check + - install_java: + version: adoptopenjdk-11-hotspot + + - run: sudo update-java-alternatives -s adoptopenjdk-11-hotspot-amd64 && java -version + + - run: ./gradlew build --stacktrace - store_artifacts: path: build/reports destination: test-reports + - store_test_results: - path: build/reports \ No newline at end of file + path: ~/repo/build/test-results/ + +workflows: + version: 2 + + test_java_8: +jobs: + - java8 + + test_java_11: +jobs: + - java11 \ No newline at end of file diff --git a/README.md b/README.md index 327948b..f0e29b9 100644 --- a/README.md +++ b/README.md @@ -7,8 +7,8 @@ For more information, see [the Apache Cassandra web site](http://cassandra.apach Requirements - 1. Java >= 1.8 (OpenJDK or Oracle) - 2. Apache Cassandra 4.0 + 1. Java >= 1.8 (OpenJDK or Oracle), or Java 11 + 2. Apache Cassandra 4.0. We depend on virtual tables which is a 4.0 only feature. Getting started --- @@ -20,6 +20,13 @@ Apache Cassandra running on
[jira] [Updated] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI
[ https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jon Haddad updated CASSANDRA-15611: --- Fix Version/s: 4.0-alpha Source Control Link: https://github.com/apache/cassandra-sidecar/commit/595fea7d97f0d87ac9b9a1510379a6faa6a29abf Resolution: Fixed Status: Resolved (was: Ready to Commit) > Build and Test with both Java 8 & 11 in Circle CI > - > > Key: CASSANDRA-15611 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15611 > Project: Cassandra > Issue Type: Improvement > Components: Sidecar >Reporter: Jon Haddad >Assignee: Jon Haddad >Priority: Normal > Fix For: 4.0-alpha > > > We currently only build and test with Java 8. We should ensure Java 11 is > fully supported for both builds and testing in CircleCI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI
[ https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-15611: - Status: Ready to Commit (was: Review In Progress) +1 LGTM! > Build and Test with both Java 8 & 11 in Circle CI > - > > Key: CASSANDRA-15611 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15611 > Project: Cassandra > Issue Type: Improvement > Components: Sidecar >Reporter: Jon Haddad >Assignee: Jon Haddad >Priority: Normal > > We currently only build and test with Java 8. We should ensure Java 11 is > fully supported for both builds and testing in CircleCI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI
[ https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-15611: - Reviewers: Dinesh Joshi (was: Dinesh Joshi) Status: Review In Progress (was: Patch Available) > Build and Test with both Java 8 & 11 in Circle CI > - > > Key: CASSANDRA-15611 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15611 > Project: Cassandra > Issue Type: Improvement > Components: Sidecar >Reporter: Jon Haddad >Assignee: Jon Haddad >Priority: Normal > > We currently only build and test with Java 8. We should ensure Java 11 is > fully supported for both builds and testing in CircleCI.
[jira] [Updated] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys
[ https://issues.apache.org/jira/browse/CASSANDRA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-14365: --- Fix Version/s: 4.x 3.11.x 3.0.x 2.2.x > Commit log replay failure for static columns with collections in clustering > keys > > > Key: CASSANDRA-14365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14365 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Vincent White >Assignee: Vincent White >Priority: Normal > Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x > > > In the old storage engine, static cells with a collection as part of the > clustering key fail to validate because a 0 byte collection (like in the cell > name of a static cell) isn't valid. > To reproduce: > 1. > {code:java} > CREATE TABLE test.x ( > id int, > id2 frozen>, > st int static, > PRIMARY KEY (id, id2) > ); > INSERT INTO test.x (id, st) VALUES (1, 2); > {code} > 2. > Kill the cassandra process > 3. > Restart cassandra to replay the commitlog > Outcome: > {noformat} > ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting > due to error while processing commit log during initialization. > org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: > Unexpected error deserializing mutation; saved to > /tmp/mutation3825739904516830950dat. This may be caused by replaying a > mutation against a table with the same name but incompatible schema. 
> Exception follows: org.apache.cassandra.serializers.MarshalException: Not > enough bytes to read a set > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638) > [main/:na] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:565) > [main/:na] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:517) > [main/:na] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:397) > [main/:na] > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143) > [main/:na] > at > org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:181) > [main/:na] > at > org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:161) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:284) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:642) > [main/:na] > {noformat} > I haven't investigated if there are other more subtle issues caused by these > cells failing to validate in other places in the code, but I believe the fix for > this is to check for 0 byte length collections and accept them as valid as we > do with other types. > I haven't had a chance for any extensive testing but this naive patch seems > to have the desired effect. > ||Patch|| > |[2.2 > PoC|https://github.com/vincewhite/cassandra/commits/zero_length_collection]|
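The fix Vincent proposes — accepting zero-length collections as valid — can be sketched as follows. The method and framing are assumptions for illustration, not the committed patch; the real check lives in the collection serializers:

```java
// Hedged sketch of the proposed fix, not the committed patch: treat a
// zero-length buffer as a valid (empty) serialized collection instead of
// failing validation, which is what a static cell's name carries.
import java.nio.ByteBuffer;

public class EmptyCollectionValidation {
    // Assumed stand-in for a collection serializer's validate(): a serialized
    // set begins with a 4-byte element count.
    static void validateSet(ByteBuffer bytes) {
        if (bytes.remaining() == 0)
            return; // the fix: a 0-byte collection is accepted as empty
        if (bytes.remaining() < 4)
            throw new IllegalArgumentException("Not enough bytes to read a set");
        // ... element-by-element validation would follow here
    }

    public static void main(String[] args) {
        validateSet(ByteBuffer.allocate(0)); // previously threw during replay; now passes
        boolean threw = false;
        try { validateSet(ByteBuffer.allocate(2)); }
        catch (IllegalArgumentException e) { threw = true; }
        if (!threw) throw new AssertionError("a truncated set should still fail validation");
        System.out.println("ok");
    }
}
```

Genuinely truncated collections still fail, so the relaxation only covers the empty-value case that commit log replay hits.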
[jira] [Commented] (CASSANDRA-15388) Add compaction allocation measurement test to support compaction gc optimization.
[ https://issues.apache.org/jira/browse/CASSANDRA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055216#comment-17055216 ] David Capwell commented on CASSANDRA-15388: --- bq. This is not meant to be in a state where it can be plugged into our ci process. Sure, would be good for this to evolve over time but not a blocker for this. New changes are fine, only nits really left (though would prefer isAgentLoaded since logs are too dense it's easy to miss) +1 > Add compaction allocation measurement test to support compaction gc > optimization. > -- > > Key: CASSANDRA-15388 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15388 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > This adds a test that is able to quickly and accurately measure the effect of > potential gc optimizations against a wide range of (synthetic) compaction > workloads. This test accurately measures allocation rates from 16 workloads > in less than 2 minutes. > This test uses google’s {{java-allocation-instrumenter}} agent to measure the > workloads. Measurements using this agent are very accurate and pretty > repeatable from run to run, with most variance being negligible (1-2 bytes > per partition), although workloads with larger but fewer partitions vary a > bit more (still less than 0.03%). > The thinking behind this patch is that with compaction, we’re generally > interested in the memory allocated per partition, since garbage scales more > or less linearly with the number of partitions compacted. So measuring > allocation from a small number of partitions that otherwise represent real > world use cases is a good enough approximation. > In addition to helping with compaction optimizations, this test could be used > as a template for future optimization work.
This pattern could also be used > to set allocation limits on workloads/operations and fail CI if the > allocation behavior changes past some threshold.
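The ticket's test uses Google's {{java-allocation-instrumenter}} agent for byte-accurate measurement. As a rough, agent-free approximation of the same per-partition idea, HotSpot exposes per-thread allocation counters through {{com.sun.management.ThreadMXBean}} (the 1 KiB "partitions" below are placeholders, not Cassandra workloads):

```java
// Agent-free approximation of the measurement pattern: read the calling
// thread's allocation counter before and after the work, then divide by the
// number of partitions processed (the quantity the ticket says scales linearly).
import java.lang.management.ManagementFactory;

public class AllocationProbe {
    public static void main(String[] args) {
        // HotSpot-specific: the platform ThreadMXBean implements the
        // com.sun.management extension with getThreadAllocatedBytes().
        com.sun.management.ThreadMXBean bean =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();

        long before = bean.getThreadAllocatedBytes(tid);
        byte[][] partitions = new byte[16][];
        for (int i = 0; i < partitions.length; i++)
            partitions[i] = new byte[1024]; // stand-in for compacting one partition
        long after = bean.getThreadAllocatedBytes(tid);

        long perPartition = (after - before) / partitions.length;
        if (perPartition < 1024)
            throw new AssertionError("expected at least 1 KiB allocated per partition");
        System.out.println("ok");
    }
}
```

Unlike the instrumenter agent, this counter includes JIT and measurement noise, which is why the ticket's approach is the one that gets 1-2 byte repeatability.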
[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055189#comment-17055189 ] Benjamin Lerer commented on CASSANDRA-15557: Sorry, there is an issue with the patch as pointed out by [~jasonstack] in CASSANDRA-15303. The timestamp needs to be set during the {{execution}} phase and not during the {{prepare}} one. Otherwise, if the statement is prepared by the user, it will reuse the same timestamp every time it is executed. > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Time Spent: 10m > Remaining Estimate: 0h > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code}
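Benjamin's point — resolve the timestamp at execution, never at prepare — can be illustrated in miniature. The names below are invented for illustration and this is not Cassandra code; it only shows why a timestamp captured at prepare time is frozen into the cached prepared statement:

```java
// Hedged illustration of the bug: capturing the timestamp when the statement
// is *prepared* freezes it for every later execution of the cached statement,
// whereas a supplier resolved per execution always advances.
import java.util.function.LongSupplier;

public class PreparedTimestamp {
    static long nowMicros() { return System.nanoTime() / 1000; }

    public static void main(String[] args) throws InterruptedException {
        // Wrong: timestamp fixed at prepare time, then reused via the statement cache.
        final long preparedAt = nowMicros();
        LongSupplier brokenStatement = () -> preparedAt;

        // Right: timestamp resolved at each execution.
        LongSupplier fixedStatement = PreparedTimestamp::nowMicros;

        long broken1 = brokenStatement.getAsLong();
        long fixed1 = fixedStatement.getAsLong();
        Thread.sleep(5); // simulate time passing between two executions

        if (brokenStatement.getAsLong() != broken1)
            throw new AssertionError("prepare-time timestamp should never change");
        if (fixedStatement.getAsLong() <= fixed1)
            throw new AssertionError("execution-time timestamp should advance");
        System.out.println("ok");
    }
}
```

With the frozen variant, a later ADD COLUMN carrying the stale timestamp can lose to data written in between, which is exactly the flake the AlterTest exposes.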
[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15557: --- Status: Patch Available (was: Ready to Commit) > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Time Spent: 10m > Remaining Estimate: 0h > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-15557: -- Reviewers: Benjamin Lerer, Aleksey Yeschenko (was: Aleksey Yeschenko, Benjamin Lerer) Benjamin Lerer, Aleksey Yeschenko (was: Benjamin Lerer) Status: Review In Progress (was: Patch Available) > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Time Spent: 10m > Remaining Estimate: 0h > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055175#comment-17055175 ] Aleksey Yeschenko commented on CASSANDRA-15557: --- LGTM as well. > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Time Spent: 10m > Remaining Estimate: 0h > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-15557: -- Status: Ready to Commit (was: Review In Progress) > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Time Spent: 10m > Remaining Estimate: 0h > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055165#comment-17055165 ] David Capwell commented on CASSANDRA-15564: --- [~ifesdjeen] [~jasonstack] I have replied or made changes based on the feedback; please review > Refactor repair coordinator so errors are consistent > > > Key: CASSANDRA-15564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15564 > Project: Cassandra > Issue Type: Sub-task > Components: Consistency/Repair >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 11h 10m > Remaining Estimate: 0h > > This is to split the change in CASSANDRA-15399 so the refactor is isolated > out. > Currently the repair coordinator special cases the exit cases at each call > site; this makes it so that errors can be inconsistent and there are cases > where proper completion isn't done (proper notifications, and forgetting to > update ActiveRepairService). > [Circle > CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]
[jira] [Commented] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth
[ https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055162#comment-17055162 ] Benedict Elliott Smith commented on CASSANDRA-15369: I think _probably_ it is preferable to generate fake row deletions where possible, since their semantics are much better than range tombstones. If the user is lucky, they might never see a range tombstone. Since it's anyway impossible today to deal with range tombstones, we need a separate effort there, and so it's probably reasonable to leave unsolved for now the cases that _require_ fake RTs. We will either need to guarantee RTs are replicated as inserted (without any subdivisions we currently produce) or that they are only accounted for in digest via non-RT data (since otherwise there seems no possible way to ensure a consistent digest for a response). Either way, it's probably better to do our best to avoid the scenario altogether, and use row deletions wherever possible. > Fake row deletions and range tombstones, causing digest mismatch and sstable > growth > --- > > Key: CASSANDRA-15369 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15369 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/Memtable, Local/SSTable >Reporter: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0, 3.0.x, 3.11.x > > > As assessed in CASSANDRA-15363, we generate fake row deletions and fake > tombstone markers under various circumstances: > * If we perform a clustering key query (or select a compact column): > * Serving from a {{Memtable}}, we will generate fake row deletions > * Serving from an sstable, we will generate fake row tombstone markers > * If we perform a slice query, we will generate only fake row tombstone > markers for any range tombstone that begins or ends outside of the limit of > the requested slice > * If we perform a multi-slice or IN query, this will occur for each > slice/clustering > Unfortunately, these 
different behaviours can lead to very different data > stored in sstables until a full repair is run. When we read-repair, we only > send these fake deletions or range tombstones. A fake row deletion, > clustering RT and slice RT, each produces a different digest. So for each > single point lookup we can produce a digest mismatch twice, and until a full > repair is run we can encounter an unlimited number of digest mismatches > across different overlapping queries. > Relatedly, this seems a more problematic variant of our atomicity failures > caused by our monotonic reads, since RTs can have an atomic effect across (up > to) the entire partition, whereas the propagation may happen on an > arbitrarily small portion. If the RT exists on only one node, this could > plausibly lead to fairly problematic scenario if that node fails before the > range can be repaired. > At the very least, this behaviour can lead to an almost unlimited amount of > extraneous data being stored until the range is repaired and compaction > happens to overwrite the sub-range RTs and row deletions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055146#comment-17055146 ] Alex Petrov commented on CASSANDRA-15564: - [~dcapwell] thank you for the patch! The change looks good overall. I've added several small comments on github. As discussed offline, we also need to change the initialization order to make sure outbound message sending is wrapping fake messaging and not vice versa. I'm wondering whether we should stick to {{runInbound}} in the builder, or we should switch to {{filters().inbound()}} or something similar, where {{filters()}} would return some interface that has {{inbound}} and {{outbound}}. This could even leave most of the things more or less the same implementation-wise. Should we add a test that ensures the order (in other words, any message first goes through the outbound, and only then through the inbound filter)? Also, it might make sense to test both in- and out-bound filters in {{testMessageMatching}}, wdyt? > Refactor repair coordinator so errors are consistent > > > Key: CASSANDRA-15564 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15564 > Project: Cassandra > Issue Type: Sub-task > Components: Consistency/Repair >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 11h 10m > Remaining Estimate: 0h > > This is to split the change in CASSANDRA-15399 so the refactor is isolated > out. > Currently the repair coordinator special cases the exit cases at each call > site; this makes it so that errors can be inconsistent and there are cases > where proper completion isn't done (proper notifications, and forgetting to > update ActiveRepairService). 
> [Circle > CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]
[jira] [Updated] (CASSANDRA-15303) drop column statement should not initialize timestamp because of statement cache
[ https://issues.apache.org/jira/browse/CASSANDRA-15303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15303: --- Reviewers: Benjamin Lerer > drop column statement should not initialize timestamp because of statement > cache > > > Key: CASSANDRA-15303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15303 > Project: Cassandra > Issue Type: Bug > Components: CQL/Interpreter >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.x > > > When executing drop-column query without timestamp, > {{AlterTableStatement#Raw}} initializes a default timestamp and then the > prepared statement is cached. The same timestamp will be reused for the same > drop-column query. (related to CASSANDRA-13426) > > The fix is to use NULL timestamp to indicate: using statement execution time > instead. > > patch: > [https://github.com/jasonstack/cassandra/commits/fix-drop-column-timestamp] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth
[ https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055129#comment-17055129 ] ZhaoYang edited comment on CASSANDRA-15369 at 3/9/20, 4:11 PM: --- bq. initially addressing only the differing ways we create fake deletions do you mean by unifying the tombstone creation from memtable/ sstable/slice-query to only range tombstone markers? was (Author: jasonstack): bq. initially addressing only the differing ways we create fake deletions do you mean by unifying the tombstone creation from memtable/ sstable/slice-query to only row tombstone markers? > Fake row deletions and range tombstones, causing digest mismatch and sstable > growth > --- > > Key: CASSANDRA-15369 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15369 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/Memtable, Local/SSTable >Reporter: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0, 3.0.x, 3.11.x > > > As assessed in CASSANDRA-15363, we generate fake row deletions and fake > tombstone markers under various circumstances: > * If we perform a clustering key query (or select a compact column): > * Serving from a {{Memtable}}, we will generate fake row deletions > * Serving from an sstable, we will generate fake row tombstone markers > * If we perform a slice query, we will generate only fake row tombstone > markers for any range tombstone that begins or ends outside of the limit of > the requested slice > * If we perform a multi-slice or IN query, this will occur for each > slice/clustering > Unfortunately, these different behaviours can lead to very different data > stored in sstables until a full repair is run. When we read-repair, we only > send these fake deletions or range tombstones. A fake row deletion, > clustering RT and slice RT, each produces a different digest. 
So for each > single point lookup we can produce a digest mismatch twice, and until a full > repair is run we can encounter an unlimited number of digest mismatches > across different overlapping queries. > Relatedly, this seems a more problematic variant of our atomicity failures > caused by our monotonic reads, since RTs can have an atomic effect across (up > to) the entire partition, whereas the propagation may happen on an > arbitrarily small portion. If the RT exists on only one node, this could > plausibly lead to fairly problematic scenario if that node fails before the > range can be repaired. > At the very least, this behaviour can lead to an almost unlimited amount of > extraneous data being stored until the range is repaired and compaction > happens to overwrite the sub-range RTs and row deletions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth
[ https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055129#comment-17055129 ] ZhaoYang commented on CASSANDRA-15369: -- bq. initially addressing only the differing ways we create fake deletions do you mean by unifying the tombstone creation from memtable/ sstable/slice-query to only row tombstone markers? > Fake row deletions and range tombstones, causing digest mismatch and sstable > growth > --- > > Key: CASSANDRA-15369 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15369 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/Memtable, Local/SSTable >Reporter: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0, 3.0.x, 3.11.x > > > As assessed in CASSANDRA-15363, we generate fake row deletions and fake > tombstone markers under various circumstances: > * If we perform a clustering key query (or select a compact column): > * Serving from a {{Memtable}}, we will generate fake row deletions > * Serving from an sstable, we will generate fake row tombstone markers > * If we perform a slice query, we will generate only fake row tombstone > markers for any range tombstone that begins or ends outside of the limit of > the requested slice > * If we perform a multi-slice or IN query, this will occur for each > slice/clustering > Unfortunately, these different behaviours can lead to very different data > stored in sstables until a full repair is run. When we read-repair, we only > send these fake deletions or range tombstones. A fake row deletion, > clustering RT and slice RT, each produces a different digest. So for each > single point lookup we can produce a digest mismatch twice, and until a full > repair is run we can encounter an unlimited number of digest mismatches > across different overlapping queries. 
> Relatedly, this seems a more problematic variant of our atomicity failures > caused by our monotonic reads, since RTs can have an atomic effect across (up > to) the entire partition, whereas the propagation may happen on an > arbitrarily small portion. If the RT exists on only one node, this could > plausibly lead to a fairly problematic scenario if that node fails before the > range can be repaired. > At the very least, this behaviour can lead to an almost unlimited amount of > extraneous data being stored until the range is repaired and compaction > happens to overwrite the sub-range RTs and row deletions.
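The digest-mismatch mechanism described above can be sketched in miniature: if two replicas encode the same logical deletion differently (say, a fake row deletion vs. a pair of fake range-tombstone markers), any byte-level digest over the responses will differ. The encodings below are invented purely for illustration; they are not Cassandra's real serialization format.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class DigestMismatchSketch {
    // Hypothetical byte encodings of the "same" logical deletion: a fake row
    // deletion vs. a pair of fake range-tombstone markers covering one row.
    // These layouts are invented for illustration, not Cassandra's real format.
    static byte[] digest(byte[] encoded) {
        try {
            return MessageDigest.getInstance("MD5").digest(encoded);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] rowDeletion  = {1 /* kind: row tombstone */, 42 /* clustering */};
        byte[] rangeMarkers = {2 /* kind: RT open */, 42, 3 /* kind: RT close */, 42};
        // Logically equivalent deletions, but the responses hash differently,
        // so every read comparing the two replicas reports a digest mismatch.
        System.out.println(Arrays.equals(digest(rowDeletion), digest(rangeMarkers))); // false
    }
}
```

Until a full repair converges the replicas onto one representation, each such read can trigger read-repair again, which is the sstable-growth effect described in the ticket.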
[jira] [Commented] (CASSANDRA-15313) Fix flaky - ChecksummingTransformerTest - org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055118#comment-17055118 ] Stefan Podkowinski commented on CASSANDRA-15313: Reverting CASSANDRA-1 fixes corruptionCausesFailure with seed 71671740653044L for me. But semantics for generators change with that as well, so I'm not 100% sure its the actual cause. > Fix flaky - ChecksummingTransformerTest - > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest > --- > > Key: CASSANDRA-15313 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15313 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.0-alpha > > Attachments: CASSANDRA-15313-hack.patch > > > During the recent runs, this test appears to be flaky. > Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/459#tests/containers/94] > corruptionCausesFailure-compression - > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest > {code:java} > java.lang.OutOfMemoryError: GC overhead limit exceeded > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at org.quicktheories.impl.Precursor.(Precursor.java:17) > at > org.quicktheories.impl.ConcreteDetachedSource.(ConcreteDetachedSource.java:8) > at > org.quicktheories.impl.ConcreteDetachedSource.detach(ConcreteDetachedSource.java:23) > at org.quicktheories.generators.Retry.generate(CodePoints.java:51) > at > org.quicktheories.generators.Generate.lambda$intArrays$10(Generate.java:190) > at > org.quicktheories.generators.Generate$$Lambda$17/1847008471.generate(Unknown > Source) > at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at 
org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$mix$10(Gen.java:184) > at org.quicktheories.core.Gen$$Lambda$45/802243390.generate(Unknown > Source) > at org.quicktheories.core.Gen.lambda$flatMap$5(Gen.java:93) > at org.quicktheories.core.Gen$$Lambda$48/363509958.generate(Unknown > Source) > at > org.quicktheories.dsl.TheoryBuilder4.lambda$prgnToTuple$12(TheoryBuilder4.java:188) > at > org.quicktheories.dsl.TheoryBuilder4$$Lambda$40/2003496028.generate(Unknown > Source) > at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255) > at org.quicktheories.core.FilteredGenerator.generate(Gen.java:225) > at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36) > at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown > Source) > at org.quicktheories.impl.Core.generate(Core.java:150) > at org.quicktheories.impl.Core.shrink(Core.java:103) > at org.quicktheories.impl.Core.run(Core.java:39) > at org.quicktheories.impl.TheoryRunner.check(TheoryRunner.java:35) > at org.quicktheories.dsl.TheoryBuilder4.check(TheoryBuilder4.java:150) > at > org.quicktheories.dsl.TheoryBuilder4.checkAssert(TheoryBuilder4.java:162) > at > org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest.corruptionCausesFailure(ChecksummingTransformerTest.java:87) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
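For background on why the transformer checksums frames at all: a codec such as lz4 may silently accept a corrupted stream and emit output != input (issue #2 above), whereas even a single flipped bit is guaranteed to change a CRC32. A minimal stdlib sketch of that property — this is not the actual ChecksummingTransformer code:

```java
import java.util.zip.CRC32;

public class ChecksumSketch {
    // CRC32 over a frame body; the real transformer stores this alongside
    // the (possibly compressed) payload and validates it on the way in.
    static long crc(byte[] payload) {
        CRC32 c = new CRC32();
        c.update(payload, 0, payload.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] frame = "compressed frame body".getBytes();
        long expected = crc(frame);
        frame[3] ^= 0x01;                 // single-bit corruption
        // A decompressor may silently accept the corrupted bytes, but a CRC
        // detects every single-bit error, so the frame is rejected up front.
        System.out.println(crc(frame) != expected); // true
    }
}
```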
[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15557: --- Status: Ready to Commit (was: Review In Progress)
[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15557: --- Test and Documentation Plan: The patch is a fix for a flaky test. Status: Patch Available (was: In Progress)
[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Lerer updated CASSANDRA-15557: --- Reviewers: Benjamin Lerer (was: Benjamin Lerer) Status: Review In Progress (was: Patch Available)
[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055110#comment-17055110 ] Benjamin Lerer commented on CASSANDRA-15557: The patch looks good to me.
[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055108#comment-17055108 ] Chandrasekhar Thumuluru commented on CASSANDRA-15397: - {quote} I'm not sure if assuming long will be a good idea. {quote} I meant in the context of generics and about the performance. I'll make necessary changes, compare it again and post the results. > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. > -- > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement > Components: Local/SSTable >Reporter: Chandrasekhar Thumuluru >Assignee: Chandrasekhar Thumuluru >Priority: Low > Labels: pull-request-available > Attachments: 90p_100k_sstables_with_1000_searches.png, > 90p_1million_sstables_with_1000_searches.png, > 90p_250k_sstables_with_1000_searches.png, > 90p_500k_sstables_with_1000_searches.png, > 90p_750k_sstables_with_1000_searches.png, > 95p_1_SSTable_with_5000_Searches.png, > 95p_100k_sstables_with_1000_searches.png, > 95p_15000_SSTable_with_5000_Searches.png, > 95p_1million_sstables_with_1000_searches.png, > 95p_2_SSTable_with_5000_Searches.png, > 95p_25000_SSTable_with_5000_Searches.png, > 95p_250k_sstables_with_1000_searches.png, > 95p_3_SSTable_with_5000_Searches.png, > 95p_5000_SSTable_with_5000_Searches.png, > 95p_500k_sstables_with_1000_searches.png, > 95p_750k_sstables_with_1000_searches.png, > 99p_1_SSTable_with_5000_Searches.png, > 99p_100k_sstables_with_1000_searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_1million_sstables_with_1000_searches.png, > 99p_2_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_250k_sstables_with_1000_searches.png, > 99p_3_SSTable_with_5000_Searches.png, > 99p_5000_SSTable_with_5000_Searches.png, > 99p_500k_sstables_with_1000_searches.png, > 99p_750k_sstables_with_1000_searches.png, IntervalList.java, > 
IntervalListWithElimination.java, IntervalTreeSimplified.java, > Mean_1_SSTable_with_5000_Searches.png, > Mean_100k_sstables_with_1000_searches.png, > Mean_15000_SSTable_with_5000_Searches.png, > Mean_1million_sstables_with_1000_searches.png, > Mean_2_SSTable_with_5000_Searches.png, > Mean_25000_SSTable_with_5000_Searches.png, > Mean_250k_sstables_with_1000_searches.png, > Mean_3_SSTable_with_5000_Searches.png, > Mean_5000_SSTable_with_5000_Searches.png, > Mean_500k_sstables_with_1000_searches.png, > Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, > replace_intervaltree_with_intervallist.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. > * Elimination using Binary Search (idea is to eliminate intervals using start > and end points of search interval). > Based on the tests I ran, I noticed Binary Search based elimination almost > always performs similar to IntervalTree or out performs IntervalTree based > search. The cost of IntervalTree construction is also substantial and > produces lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage. 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search took in nanos. > PS: > # For the purpose of test, I simplified the IntervalTree by removing the data > portion of the interval. 
Modified the template version (Java generics) to a > specialized version. > # I used the code from Cassandra version _3.11_. > # Time in the graph is in nanos. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
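The elimination idea from the description can be sketched as follows: with intervals sorted by start point, one binary search over the start points discards every interval that begins after the search interval ends, and a linear walk filters the survivors by end point. This is a from-scratch sketch of the idea with hypothetical names, not the attached IntervalListWithElimination.java:

```java
import java.util.ArrayList;
import java.util.List;

public class IntervalElimination {
    // byStart: intervals {start, end}, sorted ascending by start.
    // Returns intervals overlapping [qStart, qEnd].
    static List<long[]> overlapping(long[][] byStart, long qStart, long qEnd) {
        // Binary search for the first index whose start > qEnd; everything
        // from there on begins after the query ends and cannot overlap.
        int lo = 0, hi = byStart.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (byStart[mid][0] <= qEnd) lo = mid + 1; else hi = mid;
        }
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < lo; i++)            // cache-friendly linear walk
            if (byStart[i][1] >= qStart) out.add(byStart[i]);
        return out;
    }

    public static void main(String[] args) {
        long[][] sstables = { {0, 4}, {2, 9}, {5, 6}, {10, 12} }; // sorted by start
        // {10,12} is eliminated by the binary search, {0,4} by the end check.
        System.out.println(overlapping(sstables, 5, 9).size()); // 2
    }
}
```

A mirrored binary search over intervals sorted by end point could eliminate from the other side as well; the simple variant above already shows why a flat sorted array beats pointer-chasing through a tree for this access pattern.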
[jira] [Comment Edited] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055108#comment-17055108 ] Chandrasekhar Thumuluru edited comment on CASSANDRA-15397 at 3/9/20, 3:50 PM: -- {quote} I'm not sure if assuming long will be a good idea. {quote} I meant in the context of generics and not about the performance. I'll make necessary changes, compare it again and post the results. was (Author: cthumuluru): {quote} I'm not sure if assuming long will be a good idea. {quote} I meant in the context of generics and about the performance. I'll make necessary changes, compare it again and post the results.
[jira] [Updated] (CASSANDRA-15601) Ensure repaired data tracking reads a consistent amount of data across replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-15601: Reviewers: Aleksey Yeschenko > Ensure repaired data tracking reads a consistent amount of data across > replicas > --- > > Key: CASSANDRA-15601 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15601 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 4.0-alpha > > > When generating a digest for repaired data tracking, the amount of repaired > data that needs to be read may depend on the unrepaired data on the replica. > As this may vary between replicas, digest mismatches can be reported even > though the repaired data may actually be in sync. > For example, two replicas, A & B and a table like > {code} > CREATE TABLE t (pk int, ck int, PRIMARY KEY (pk, ck)) WITH CLUSTERING ORDER > BY ck DESC; > Unrepaired > === > Instance A > (0, 5) > Instance B > (0, 6) > (0, 5) > Repaired (Both A & B) > = > (0, 4) > (0, 3) > (0, 2) > (0, 1) > (0, 0) > SELECT * FROM tbl WHERE pk = 0 LIMIT 3; > {code} > Instance A would read (0, 5) from the unrepaired set and (0, 4) (0, 3) from > the repaired set. > Instance B would read (0, 6) (0, 5) from its unrepaired set and just (0, 4) > from repaired data. > Unrepaired row/range/partition tombstones shadowing repaired data and present > on some replicas but not others will have the opposite effect, with more > repaired data being read in comparison. > To fix this, when repaired data tracking is in effect each replica needs to > overread during a full data read. Replicas should read up to {{LIMIT}} (i.e. > the {{DataLimit}} of the {{ReadCommand}}) from the repaired set, regardless > of how much is read from the unrepaired data. At the point where that amount > of repaired data has been read, replica should stop updating the digest. 
So > if unrepaired tombstones cause more than {{LIMIT}} repaired data to be read, > the digest is only calculated over the first {{LIMIT}}-worth of repaired data.
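The proposed overread-and-cap behaviour can be sketched as: each replica feeds at most {{LIMIT}} repaired rows into its digest regardless of how many it had to read, so replicas that overread by different amounts still agree. Rows here are plain strings and the digest is MD5 purely for illustration; real rows and repaired-data digests are far richer.

```java
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

public class RepairedDigestSketch {
    // Digest covers at most `limit` repaired rows, no matter how many extra
    // repaired rows unrepaired-tombstone shadowing forced the replica to read.
    static byte[] repairedDigest(List<String> repairedRows, int limit) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            int counted = 0;
            for (String row : repairedRows) {
                if (counted++ == limit) break;   // stop updating the digest at LIMIT
                md.update(row.getBytes());
            }
            return md.digest();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Instance A read 2 repaired rows; instance B overread 4. Capped at
        // LIMIT=2, both digests cover the same prefix and match.
        byte[] a = repairedDigest(List.of("(0,4)", "(0,3)"), 2);
        byte[] b = repairedDigest(List.of("(0,4)", "(0,3)", "(0,2)", "(0,1)"), 2);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```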
[jira] [Commented] (CASSANDRA-15566) Repair coordinator can hang under some cases
[ https://issues.apache.org/jira/browse/CASSANDRA-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055063#comment-17055063 ] David Capwell commented on CASSANDRA-15566: --- bq. C* 4.0 code is quite new to me... Me too :) One of the best ways to start is testing; we need more tests to show where repair needs improvement. When I joined this project I asked operators for their top pain points with repair (all were from 2.1), and as I write tests I see 4.0 has the same issues. More tests which show new areas would be great! I think your 5 classifications are good, though 1/2 can merge; our networking is lossy (not a bad thing; under load it's crash or drop). I would love a smoke test which runs user/operator tasks constantly under "load" (we should be able to artificially lower resources). This test would also help show whether the different subsystems work well or need improvement. About participant crashing, I added a jvm dtest which shows this is handled, assuming the failure detector detects it (restarting a node also fails the repair). About detection and abort, I agree it should be external for now. Any/all things the external tools need must be identified and tested to show they work (for example, does aborting a repair work?). > Repair coordinator can hang under some cases > > > Key: CASSANDRA-15566 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15566 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Repair >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > Repair coordination makes a few assumptions about message delivery which > cause it to hang forever when those assumptions don't hold true: fire and > forget will not get rejected (a participant has an issue and rejects the > message), and a very delayed message will one day be seen (messaging can be > dropped under load or when the failure detector thinks a node is bad but is just > GCing).
> Given this and the desire to have better observability with repair (see > CASSANDRA-15399), coordination should be changed into a request/response > pattern (with retries) and polling (validation status and MerkleTree > sending). This would allow the coordinator to detect changes in state (it knew the participant was working on validation, but no longer knows about > the validation task), and to be able to recover from ephemeral issues.
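The request/response-with-retries pattern proposed above, reduced to a toy synchronous loop (all names hypothetical; the real coordination is asynchronous and deadline-based): instead of fire-and-forget, the coordinator re-asks the participant for status until it answers or a retry budget is exhausted, so a lost message can no longer hang it forever.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class RepairPollSketch {
    // request.get() stands in for one network round trip; an empty Optional
    // models a dropped or unanswered message.
    static <T> Optional<T> pollUntil(Supplier<Optional<T>> request, int attempts) {
        for (int i = 0; i < attempts; i++) {
            Optional<T> reply = request.get();
            if (reply.isPresent()) return reply; // participant answered
        }
        return Optional.empty();                 // give up: fail the repair, don't hang
    }

    public static void main(String[] args) {
        int[] drops = {2};                       // first two "messages" are dropped
        Optional<String> status = pollUntil(
            () -> drops[0]-- > 0 ? Optional.empty() : Optional.of("VALIDATING"), 5);
        System.out.println(status.orElse("FAILED")); // VALIDATING
    }
}
```

The key property is the second return: when the retry budget runs out, the coordinator reaches a terminal state instead of waiting on a message that will never arrive.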
[jira] [Assigned] (CASSANDRA-14587) TrueDiskSpaceUsed overcounts snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova reassigned CASSANDRA-14587: --- Assignee: (was: Ekaterina Dimitrova) > TrueDiskSpaceUsed overcounts snapshots > -- > > Key: CASSANDRA-14587 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14587 > Project: Cassandra > Issue Type: Improvement > Components: Tool/nodetool > Environment: Debian 8 > Cassandra 3.11.2 >Reporter: Elliott Sims >Priority: Low > > Running 'nodetool listsnapshots' seems to overcount "TrueDiskSpaceUsed" under > some circumstances. Specifically when there's a large number of snapshots. > I suspect that it's not deduplicating space used when multiple snapshots > share sstables that are not part of the current table. > Results of "nodetool listsnapshots": > Total TrueDiskSpaceUsed: 396.11 MiB > Results of "du -hcs" on the table's directory: > 18M total > This is 50+ snapshots (every minute) run with "-t -sf > --column-family " > The results of a "du -hcs -L "TrueDiskSpaceUsed" > I have only tested against 3.11.2, but have no reason to believe it's unique > to that version or even 3.x. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
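The suspected fix is deduplication: snapshots are hard links to live sstables, so summing file sizes per snapshot counts a shared sstable once per snapshot, while deduplicating on the filesystem's unique file key (device+inode on POSIX) counts each on-disk file once. A sketch under the assumption of a POSIX filesystem that reports file keys; this is not the actual nodetool code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SnapshotSizeSketch {
    // Sum sizes, counting hard-linked duplicates only once via fileKey().
    // Note: fileKey() may be null on filesystems that don't support it.
    static long dedupedSize(List<Path> files) throws IOException {
        Set<Object> seen = new HashSet<>();
        long total = 0;
        for (Path p : files) {
            Object key = Files.readAttributes(p, BasicFileAttributes.class).fileKey();
            if (seen.add(key)) total += Files.size(p);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snap");
        Path sstable = Files.write(dir.resolve("big-Data.db"), new byte[1024]);
        Path snap1 = Files.createLink(dir.resolve("snap1-Data.db"), sstable);
        Path snap2 = Files.createLink(dir.resolve("snap2-Data.db"), sstable);
        // Naive per-snapshot summing would report 2048 here.
        System.out.println(dedupedSize(List.of(snap1, snap2))); // 1024
    }
}
```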
[jira] [Comment Edited] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys
[ https://issues.apache.org/jira/browse/CASSANDRA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984658#comment-16984658 ] Michael Semb Wever edited comment on CASSANDRA-14365 at 3/9/20, 2:33 PM: - With new tests… (test against trunk also needed a rewrite bc {{`TableMetadata.Builder`}} ||branch||circleci||jenkins pipeline|| |[cassandra_2.2_14365|https://github.com/apache/cassandra/compare/cassandra-2.2...thelastpickle:mck/cassandra-2.2_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-2.2_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/40/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/40]| |[cassandra_3.0_14365|https://github.com/apache/cassandra/compare/cassandra-3.0...thelastpickle:mck/cassandra-3.0_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.0_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/41/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/41]| |[cassandra_3.11_14365|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.11_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/42/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/42]| |[trunk_14365|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/43/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/43]| was (Author: michaelsembwever): With new tests… (test against trunk also needed a rewrite bc 
{{`TableMetadata.Builder`}} ||branch||circleci||jenkins pipeline|| |[cassandra_3.0_14365|https://github.com/apache/cassandra/compare/cassandra-3.0...thelastpickle:mck/cassandra-3.0_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.0_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/41/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/41]| |[cassandra_3.11_14365|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.11_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/42/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/42]| |[trunk_14365|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/43/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/43]| > Commit log replay failure for static columns with collections in clustering > keys > > > Key: CASSANDRA-14365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14365 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Vincent White >Assignee: Vincent White >Priority: Normal > > In the old storage engine, static cells with a collection as part of the > clustering key fail to validate because a 0 byte collection (like in the cell > name of a static cell) isn't valid. > To reproduce: > 1. > {code:java} > CREATE TABLE test.x ( > id int, > id2 frozen>, > st int static, > PRIMARY KEY (id, id2) > ); > INSERT INTO test.x (id, st) VALUES (1, 2); > {code} > 2. > Kill the cassandra process > 3. 
> Restart cassandra to replay the commitlog > Outcome: > {noformat} > ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting > due to error while processing commit log during initialization. > org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: > Unexpected error deserializing mutation; saved to > /tmp/mutation3825739904516830950dat. This may be caused by replaying a > mutation against a table with the same name but incompatible schema. > Exception follows: org.apache.cassandra.serializers.MarshalException: Not > enough bytes to read a set > at > org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638) > [main/:na] > at >
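A toy illustration of the failure mode (the real 2.x/3.0 serialization is more involved, and the layout below is invented): a serialized set is read as an element count followed by elements, so the 0-byte value the old storage engine produces for a static cell's clustering collection cannot even supply the count — hence "Not enough bytes to read a set" during replay.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class SetSerializerSketch {
    // Illustrative layout: 32-bit element count, then length-prefixed elements.
    static List<byte[]> readSet(ByteBuffer in) {
        if (in.remaining() < 4)
            throw new IllegalArgumentException("Not enough bytes to read a set");
        int n = in.getInt();
        List<byte[]> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            byte[] element = new byte[in.getInt()];
            in.get(element);
            out.add(element);
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            readSet(ByteBuffer.allocate(0));   // the 0-byte static-cell value
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Validation at write time accepts the static insert, but replay re-validates the cell name and trips over the empty collection component.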
[jira] [Comment Edited] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys
[ https://issues.apache.org/jira/browse/CASSANDRA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984658#comment-16984658 ] Michael Semb Wever edited comment on CASSANDRA-14365 at 3/9/20, 2:32 PM: - With new tests… (test against trunk also needed a rewrite bc {{`TableMetadata.Builder`}} ||branch||circleci||jenkins pipeline|| |[cassandra_3.0_14365|https://github.com/apache/cassandra/compare/cassandra-3.0...thelastpickle:mck/cassandra-3.0_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.0_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/41/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/41]| |[cassandra_3.11_14365|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.11_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/42/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/42]| |[trunk_14365|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/43/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/43]| was (Author: michaelsembwever): With new tests… (test against trunk also needed a rewrite bc {{`TableMetadata.Builder`}} ||branch||circleci||asf jenkins tests||asf jenkins dtests|| 
|[cassandra-2.2_14365|https://github.com/apache/cassandra/compare/cassandra-2.2...thelastpickle:mck/cassandra-2.2_14365]|[circleci|https://circleci.com/workflow-run/d500cc5f-1d87-4beb-815e-9931f8e84d95]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/29//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/29/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/703//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/703]| |[cassandra-3.0_14365|https://github.com/apache/cassandra/compare/cassandra-3.0...thelastpickle:mck/cassandra-3.0_14365]|[circleci|https://circleci.com/workflow-run/747730de-573a-4e80-98f0-4defa14db909]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/33//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/33/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/706//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/706]| |[cassandra-3.11_14365|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_14365]|[circleci|https://circleci.com/workflow-run/86ca8a61-5cc2-40db-84a4-1210cf44f285]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/34//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/34/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/707//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/707]| 
|[trunk_14365|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_14365]|[circleci|https://circleci.com/workflow-run/a034a6b1-a7d7-43cd-b1ab-14769799b30e]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/35//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/35/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/707//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/707]| > Commit log replay failure for static columns with collections in clustering > keys > > > Key: CASSANDRA-14365 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14365 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Core >Reporter: Vincent White >Assignee: Vincent White >Priority: Normal > > In the old storage engine, static cells with a collection as part of the > clustering key fail to validate because a 0 byte collection (like in the cell > name of a static cell) isn't valid. > To reproduce: > 1. > {code:java} > CREATE TABLE test.x ( > id int, > id2 frozen>, > st int static, > PRIMARY KEY (id, id2) > ); > INSERT INTO test.x (id, st) VALUES (1, 2); > {code} > 2. > Kill the cassandra process > 3. >
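The zero-byte-collection failure described in the ticket can be sketched as follows. This is a simplified illustration, not Cassandra's actual SetSerializer: the assumption is only that a serialized set begins with its element count, so an empty buffer (like the collection component in the cell name of a static cell under the old storage engine) cannot be deserialized.

```java
import java.nio.ByteBuffer;

public class EmptySetBufferSketch {
    // Simplified stand-in for collection deserialization: the element count
    // is read first, so fewer than 4 remaining bytes cannot be a valid set.
    static int readCollectionSize(ByteBuffer input) {
        if (input.remaining() < 4)
            throw new IllegalArgumentException("Not enough bytes to read a set");
        return input.getInt(input.position());
    }

    public static void main(String[] args) {
        ByteBuffer staticCellName = ByteBuffer.allocate(0); // 0-byte collection component
        try {
            readCollectionSize(staticCellName);
        } catch (IllegalArgumentException e) {
            // Mirrors the MarshalException seen during commit log replay.
            System.out.println(e.getMessage());
        }
    }
}
```

This is why replay fails only after a restart: the static cell validates its clustering components against the collection type, and the empty buffer trips the size check.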
[jira] [Comment Edited] (CASSANDRA-15543) flaky test org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement
[ https://issues.apache.org/jira/browse/CASSANDRA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054990#comment-17054990 ] Kevin Gallardo edited comment on CASSANDRA-15543 at 3/9/20, 2:08 PM: - Sounds good, thanks, hope you had a good weekend :) As a summary, in any case, I believe passing an immutable copy of the {{failureReasonByEndpoint}} map to the constructor of Read/WriteFailureException would reduce the chances of the {{number of failures}} and the failure messages being inconsistent. In addition to that, there's the remaining question of the behavior of ReadCallback when failures happen (do we fail fast? or do we wait for all responses to come back/timeout?). Depending on the outcome of that, the test that is flaky at the moment would need to be adjusted to expect 1 *or* 2 failures in the response. was (Author: newkek): Sounds good. As a summary, in any case, I believe passing an immutable copy of the {{failureReasonByEndpoint}} map to the constructor of Read/WriteFailureException would reduce the chances of the {{number of failures}} and the failure messages being inconsistent. In addition to that, there's the remaining question of the behavior of ReadCallback when failures happen (do we fail fast? or do we wait for all responses to come back/timeout?). Depending on the outcome of that, the test that is flaky at the moment would need to be adjusted to expect 1 *or* 2 failures in the response. 
> flaky test > org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement > --- > > Key: CASSANDRA-15543 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15543 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: David Capwell >Assignee: Kevin Gallardo >Priority: Normal > Fix For: 4.0-alpha > > > This fails infrequently, last seen failure was on java 8 > {code} > junit.framework.AssertionFailedError > at > org.apache.cassandra.distributed.test.DistributedReadWritePathTest.readWithSchemaDisagreement(DistributedReadWritePathTest.java:276) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
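The defensive-copy suggestion above can be sketched with illustrative types (the real map is keyed by endpoint with RequestFailureReason values; names here are placeholders, not Cassandra's API). Snapshotting the concurrently-mutated failure map at construction time keeps the reported failure count and the per-endpoint messages consistent:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class FailureExceptionSketch extends RuntimeException {
    private final Map<String, String> failureReasonByEndpoint;

    FailureExceptionSketch(Map<String, String> liveFailures) {
        // Copy first, then wrap: later mutations of liveFailures are invisible here.
        this.failureReasonByEndpoint = Collections.unmodifiableMap(new HashMap<>(liveFailures));
    }

    int failureCount() {
        return failureReasonByEndpoint.size();
    }

    Map<String, String> failures() {
        return failureReasonByEndpoint;
    }
}
```

With this shape, a late-arriving failure response that mutates the live map after the exception is built can no longer make the message disagree with the count.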
[jira] [Updated] (CASSANDRA-15626) Need microsecond precision for dropped columns so we can avoid timestamp issues
[ https://issues.apache.org/jira/browse/CASSANDRA-15626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Svihla updated CASSANDRA-15626: Description: In CASSANDRA-15557 the fix for the flaky test is reimplementing the logic from CASSANDRA-12997, which was removed as part of CASSANDRA-13426. However, since dropped columns are stored at millisecond precision instead of microsecond precision and ClientState.getTimestamp adds microseconds on each call, we will lose the precision on save and some writes that should be dropped could reappear. Note that views are affected as well: [https://github.com/apache/cassandra/blob/cb83fbff479bb90e9abeaade9e0f8843634c974d/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L712-L716] was: In CASSANDRA-15557 the fix for the flaky test was reimplementing the logic from CASSANDRA-12997, which was removed as part of CASSANDRA-13426. However, since dropped columns are stored at millisecond precision instead of microsecond precision and ClientState.getTimestamp adds microseconds on each call, we will lose the precision on save and some writes that should be dropped could reappear. 
Note that views are affected as well: [https://github.com/apache/cassandra/blob/cb83fbff479bb90e9abeaade9e0f8843634c974d/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L712-L716] > Need microsecond precision for dropped columns so we can avoid timestamp > issues > --- > > Key: CASSANDRA-15626 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15626 > Project: Cassandra > Issue Type: Improvement > Components: Local/SSTable >Reporter: Ryan Svihla >Priority: Normal > > In CASSANDRA-15557 the fix for the flaky test is reimplementing the logic > from CASSANDRA-12997 which was removed as part of CASSANDRA-13426 > However, since dropped columns are stored at a millisecond precision instead > of a microsecond precision and ClientState.getTimestamp adds microseconds on > each call we will lose the precision on save and some writes that should be > dropped could reappear. > Note views affected as well > > [https://github.com/apache/cassandra/blob/cb83fbff479bb90e9abeaade9e0f8843634c974d/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L712-L716] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
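The precision loss described above can be worked through numerically. This is a minimal sketch with illustrative method names, not Cassandra's schema code: a drop recorded at microsecond 1583753957613999 is persisted at millisecond precision, and after reload a write stamped 1583753957613001 (made before the drop) is no longer shadowed by it.

```java
public class DroppedColumnPrecisionSketch {
    // Persisting at millisecond precision truncates the last three digits.
    static long persistMillis(long droppedTimeMicros) {
        return droppedTimeMicros / 1000;
    }

    // On reload, the millisecond value is widened back to microseconds.
    static long reloadMicros(long persistedMillis) {
        return persistedMillis * 1000;
    }

    public static void main(String[] args) {
        long dropMicros = 1583753957613999L;                     // actual drop time
        long reloaded = reloadMicros(persistMillis(dropMicros)); // 1583753957613000
        long writeMicros = 1583753957613001L;                    // write made before the drop
        boolean shadowedBefore = writeMicros <= dropMicros;      // true
        boolean shadowedAfter = writeMicros <= reloaded;         // false -> write reappears
        System.out.println(shadowedBefore + " " + shadowedAfter); // prints "true false"
    }
}
```

The three truncated digits are exactly the microsecond component that ClientState.getTimestamp uses to keep timestamps unique, which is why the round trip can flip the shadowing comparison.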
[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054989#comment-17054989 ] Ryan Svihla commented on CASSANDRA-15557: - New [PR|https://github.com/apache/cassandra/pull/465] Note: I think this also happens to fix this behavior in CASSANDRA-15303 and made a new Jira for the issue this causes with dropped columns and precision https://issues.apache.org/jira/browse/CASSANDRA-15626 > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Time Spent: 10m > Remaining Estimate: 0h > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-15557: --- Labels: pull-request-available (was: ) > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15595) Many errors of "java.lang.AssertionError: Illegal bounds"
[ https://issues.apache.org/jira/browse/CASSANDRA-15595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054938#comment-17054938 ] Roy Burstein commented on CASSANDRA-15595: -- [~brandon.williams] - can you direct us what info you need in order to debug this issue ? > Many errors of "java.lang.AssertionError: Illegal bounds" > - > > Key: CASSANDRA-15595 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15595 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Yakir Gibraltar >Priority: Normal > Fix For: 3.11.7 > > > Hi, i'm running cassandra 3.11.6 and getting on all hosts many errors of: > {code} > ERROR [ReadStage-6] 2020-02-24 13:53:34,528 > AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread > Thread[ReadStage-6,5,main] > java.lang.AssertionError: Illegal bounds [-2102982480..-2102982472); size: > 2761628520 > at org.apache.cassandra.io.util.Memory.checkBounds(Memory.java:345) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at org.apache.cassandra.io.util.Memory.getLong(Memory.java:254) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:234) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.io.util.CompressedChunkReader$Standard.readChunk(CompressedChunkReader.java:114) > ~[apache-cassandra-3.11.6.ja > r:3.11.6] > at org.apache.cassandra.cache.ChunkCache.load(ChunkCache.java:158) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at org.apache.cassandra.cache.ChunkCache.load(ChunkCache.java:39) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalLoadingCache.lambda$new$0(BoundedLocalCache.java:2949) > ~[caffeine-2.2.6.jar:na] > at > com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$15(BoundedLocalCache.java:1807) > ~[caffeine-2.2.6.jar:na] > at > 
java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853) > ~[na:1.8.0-zing_19.12.102.0] > at > com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:1805) > ~[caffeine-2.2.6.jar:na] > at > com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1788) > ~[caffeine-2.2.6.jar:na] > at > com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:97) > ~[caffeine-2.2.6.jar:na] > at > com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:66) > ~[caffeine-2.2.6.jar:na] > at > org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:236) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:214) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:65) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:207) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.io.util.FileHandle.createReader(FileHandle.java:150) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(SSTableReader.java:1807) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.db.columniterator.AbstractSSTableIterator.(AbstractSSTableIterator.java:103) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.db.columniterator.SSTableIterator.(SSTableIterator.java:49) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:72) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:65) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > 
org.apache.cassandra.db.StorageHook$1.makeRowIterator(StorageHook.java:100) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndSSTablesInTimestampOrder(SinglePartitionReadCommand.java:982) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:693) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at > org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:670) > ~[apache-cassandra-3.11.6.jar:3.11.6] > at >
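One reading of the trace above (an assumption, not confirmed in the ticket) is a 32-bit overflow: an offset of 2,191,984,816 bytes into the 2,761,628,520-byte Memory region does not fit in a signed int and wraps to -2,102,982,480, the exact lower bound in the assertion (the upper bound is 8 bytes later, matching getLong).

```java
public class IllegalBoundsSketch {
    // Narrowing a valid long offset to int reproduces the negative bound.
    static int narrow(long offset) {
        return (int) offset;
    }

    public static void main(String[] args) {
        long regionSize = 2761628520L; // size from the assertion message
        long offset = 2191984816L;     // a valid offset into that region
        System.out.println(offset + 8 <= regionSize); // true: the 8-byte read is in range
        System.out.println(narrow(offset));           // -2102982480: the reported bound
    }
}
```

If this reading is right, the fix would be keeping the offset arithmetic in long throughout the compressed-chunk lookup path rather than letting any intermediate value narrow to int.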
[jira] [Commented] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054911#comment-17054911 ] Aleksandr Sorokoumov commented on CASSANDRA-14801: -- Thank you for a comprehensive response, [~benedict]! I am quite new to C* 4.0 code, so it will take me some time to ramp up. If anyone is planning to work on this issue in the next 1-2 weeks, it probably makes sense for me to work on something else. Otherwise, I'd be happy to contribute. In the latter case, in the next couple of days, I plan to read up on how pending ranges are calculated and what changes since 3.11 introduced the bug. Then I'll write a test case that reproduces the issue. > calculatePendingRanges no longer safe for multiple adjacent range movements > --- > > Key: CASSANDRA-14801 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14801 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination, Legacy/Distributed Metadata >Reporter: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0, 4.0-beta > > > Correctness depended upon the narrowing to a {{Set}}, > which we no longer do - we maintain a collection of all {{Replica}}. Our > {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result > contain the same endpoint multiple times; and our {{EndpointsForToken}} > obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, > resulting in cluster-wide failures for writes to the affected token ranges > for the duration of the range movement. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
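The regression the ticket describes can be illustrated with a toy model (the names below are placeholders, not the real Replica/RangesAtEndpoint API): narrowing to a set of endpoints used to collapse adjacent range movements into one entry, while a collection of full replicas keeps one entry per range, so the same endpoint can now appear more than once.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PendingRangesSketch {
    // Toy replica: an endpoint plus the token range it is gaining.
    record Replica(String endpoint, String range) {}

    static int distinctEndpoints(List<Replica> pending) {
        Set<String> endpoints = new HashSet<>();
        for (Replica r : pending)
            endpoints.add(r.endpoint());
        return endpoints.size();
    }

    public static void main(String[] args) {
        // Two adjacent range movements land on the same endpoint.
        List<Replica> pending = List.of(
                new Replica("10.0.0.1", "(0,100]"),
                new Replica("10.0.0.1", "(100,200]"));
        // The old Set narrowing deduplicated to 1 endpoint; the replica
        // collection keeps both entries, which downstream construction of
        // EndpointsForToken may reject.
        System.out.println(pending.size() + " " + distinctEndpoints(pending)); // prints "2 1"
    }
}
```

The ticket's failure mode corresponds to the "2" case: a collection with a repeated endpoint reaching code that assumes one entry per endpoint.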
[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054908#comment-17054908 ] Ryan Svihla commented on CASSANDRA-15557: - So looking at the alter schema logic more: [https://github.com/apache/cassandra/blob/08b2192da0eb6deddcd8f79cd180d069442223ae/src/java/org/apache/cassandra/cql3/statements/schema/AlterTableStatement.java#L398] and [https://github.com/apache/cassandra/blob/08b2192da0eb6deddcd8f79cd180d069442223ae/src/java/org/apache/cassandra/cql3/statements/schema/AlterTableStatement.java#L411-L426] it does seem (naively) reasonable to have it use the ClientState's getTimestamp() method in AlterTableStatement, since the ClientState is already there, but I'm sure I'm missing lots of background. I'll wait for more experienced people to weigh in. > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Fix For: 4.0-alpha > > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
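For context on the suggestion above: ClientState.getTimestamp() returns a microsecond timestamp guaranteed to be strictly increasing across calls on that state. A rough sketch of that behavior (simplified from the real implementation, which should be checked in the Cassandra source) looks like:

```java
import java.util.concurrent.atomic.AtomicLong;

public class MonotonicMicrosSketch {
    private final AtomicLong lastMicros = new AtomicLong(Long.MIN_VALUE);

    // Returns wall-clock microseconds, bumped by at least 1 over the previous
    // result so that successive statements never share a timestamp.
    public long getTimestamp() {
        while (true) {
            long now = System.currentTimeMillis() * 1000;
            long last = lastMicros.get();
            long next = now > last ? now : last + 1;
            if (lastMicros.compareAndSet(last, next))
                return next;
        }
    }
}
```

The strict increase is exactly what gets lost if one code path records microseconds from this source while another truncates to milliseconds, which is the interaction CASSANDRA-15626 tracks.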
[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054887#comment-17054887 ] Ryan Svihla commented on CASSANDRA-15557: - So digging into the actual failure, this took a few tries as wrapping logging, or flushing sstables seemed to make it hard to reproduce, I've confirmed it's time based errors at least in this case: row ts: {{1583753957613001 }} dropped time: {{1583753957613000}} {{[junit-timeout] Testcase: testDropListAndAddListWithSameName(org.apache.cassandra.cql3.validation.operations.AlterTest): FAILED }} {{[junit-timeout] Dropped column: \{java.nio.HeapByteBuffer[pos=0 lim=12 cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row timestamp: 1583753957613001 }} {{[junit-timeout] junit.framework.AssertionFailedError: Dropped column: \{java.nio.HeapByteBuffer[pos=0 lim=12 cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row timestamp: 1583753957613001}} {{[junit-timeout] at org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102) }} {{[junit-timeout] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) }} {{[junit-timeout] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) }} {{[junit-timeout] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) }} {{[junit-timeout] Caused by: java.lang.AssertionError: Invalid value for row 0 column 2 (mycollection of type list), expected but got <[first element]> }} {{[junit-timeout] at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)}} {{[junit-timeout] at org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:98) }} {{[junit-timeout]}} {{[junit-timeout]}} > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > 
testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Fix For: 4.0-alpha > > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
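The failure above hinges on a one-microsecond gap: the row timestamp (...001) is strictly greater than the recorded drop time (...000), so whether the re-added list column still sees the old value flips on sub-microsecond timing between the INSERT and the drop. A minimal sketch of that visibility rule — the method name and exact comparison here are assumptions for illustration, not Cassandra's actual code:

```java
// Hedged sketch (hypothetical names) of the rule that makes this test
// time-sensitive: a cell survives a column drop only if its write timestamp
// is strictly greater than the recorded drop time, both in microseconds.
public class DroppedColumnVisibility {
    static boolean isVisible(long cellTimestampMicros, long droppedTimeMicros) {
        return cellTimestampMicros > droppedTimeMicros;
    }

    public static void main(String[] args) {
        long droppedTime = 1583753957613000L; // drop time from the failing run
        long rowTs = 1583753957613001L;       // row timestamp, 1 microsecond later
        // Visible: the old list value resurfaces, which is what the assertion caught.
        System.out.println(isVisible(rowTs, droppedTime));
        // A row stamped in the very same microsecond would instead be hidden.
        System.out.println(isVisible(droppedTime, droppedTime));
    }
}
```

With microsecond clock resolution, a write and a drop issued back-to-back can land on either side of the strict comparison, which is consistent with the test failing only intermittently.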
[jira] [Comment Edited] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName
[ https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054887#comment-17054887 ] Ryan Svihla edited comment on CASSANDRA-15557 at 3/9/20, 11:56 AM: --- So digging into the actual failure, this took a few tries as wrapping logging, or flushing sstables seemed to make it hard to reproduce, I've confirmed it's time based errors at least in this case: row ts: 1583753957613001 dropped time: {{1583753957613000}} {{[junit-timeout] Testcase: testDropListAndAddListWithSameName(org.apache.cassandra.cql3.validation.operations.AlterTest): FAILED }} {{[junit-timeout] Dropped column: {java.nio.HeapByteBuffer[pos=0 lim=12 cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row timestamp: 1583753957613001 }} {{[junit-timeout] junit.framework.AssertionFailedError: Dropped column: {java.nio.HeapByteBuffer[pos=0 lim=12 cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row timestamp: 1583753957613001}} {{[junit-timeout] at org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102) }} {{[junit-timeout] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) }} {{[junit-timeout] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) }} {{[junit-timeout] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) }} {{[junit-timeout] Caused by: java.lang.AssertionError: Invalid value for row 0 column 2 (mycollection of type list), expected but got <[first element]> }} {{[junit-timeout] at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)}} {{[junit-timeout] at org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:98) }} > Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest > testDropListAndAddListWithSameName > --- > > Key: CASSANDRA-15557 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15557 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: Ryan Svihla >Priority: Normal > Fix
For: 4.0-alpha > > > https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests > {code} > junit.framework.AssertionFailedError: Invalid value for row 0 column 2 > (mycollection of type list), expected but got <[first element]> > at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070) > at > org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail:
[jira] [Commented] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054832#comment-17054832 ] Benedict Elliott Smith commented on CASSANDRA-14801: Nobody is actively working on it, but this is one of the most deceptively complex tickets that need to be accomplished before 4.0 is released. I can see you work at DataStax, so perhaps you have the time and skill to dedicate to this, but please be confident before you address it, and be willing to wait a while for a sufficient review. The class in which the change is needed has had numerous bugs (and in fact has inherent conceptual bugs wrt range movements that are mostly out of scope to address here), so a great deal of care is needed. Ideally this ticket would attempt to address some of the ugliness that permitted the bug, and _certainly_ needs to be accompanied by a sophisticated-ish randomised correctness test. > calculatePendingRanges no longer safe for multiple adjacent range movements > --- > > Key: CASSANDRA-14801 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14801 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination, Legacy/Distributed Metadata >Reporter: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0, 4.0-beta > > > Correctness depended upon the narrowing to a {{Set}}, > which we no longer do - we maintain a collection of all {{Replica}}. Our > {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result > contain the same endpoint multiple times; and our > {{EndpointsForToken}} > obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, > resulting in cluster-wide failures for writes to the affected token ranges > for the duration of the range movement. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
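The regression in the description can be illustrated with a toy model. The {{Replica}} shape and the narrowing step below are simplifications for illustration, not Cassandra's actual types: with two adjacent range movements, the same endpoint can appear twice in the pending collection, and it was previously only the narrowing to a set of endpoints that hid the duplicate.

```java
import java.util.*;

// Toy model (hypothetical types) of the regression: collecting pending
// ranges as a list of Replica objects keeps one entry per range movement,
// so an endpoint involved in two adjacent movements appears twice.
public class PendingRangesSketch {
    static class Replica {
        final String endpoint;
        final boolean full; // full vs transient replica, irrelevant to dedup
        Replica(String endpoint, boolean full) { this.endpoint = endpoint; this.full = full; }
    }

    // The old behaviour: narrow replicas down to their distinct endpoints.
    static Set<String> narrowToEndpoints(List<Replica> replicas) {
        Set<String> endpoints = new HashSet<>();
        for (Replica r : replicas) endpoints.add(r.endpoint);
        return endpoints;
    }

    public static void main(String[] args) {
        List<Replica> pending = Arrays.asList(
            new Replica("10.0.0.1", true),   // pending for one range movement
            new Replica("10.0.0.1", false)); // same node again, adjacent movement
        System.out.println(pending.size());                    // duplicate is kept
        System.out.println(narrowToEndpoints(pending).size()); // narrowing hid it
    }
}
```

Downstream code that assumes at most one entry per endpoint (as {{EndpointsForToken}} construction reportedly does) then fails when handed the un-narrowed collection.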
[jira] [Commented] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054830#comment-17054830 ] Aleksandr Sorokoumov commented on CASSANDRA-14801: -- Is anyone working on this ticket? If not, I would like to work on it. > calculatePendingRanges no longer safe for multiple adjacent range movements > --- > > Key: CASSANDRA-14801 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14801 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination, Legacy/Distributed Metadata >Reporter: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0, 4.0-beta > > > Correctness depended upon the narrowing to a {{Set}}, > which we no longer do - we maintain a collection of all {{Replica}}. Our > {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result > contain the same endpoint multiple times; and our {{EndpointsForToken}} > obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, > resulting in cluster-wide failures for writes to the affected token ranges > for the duration of the range movement. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15625) Nodetool toppartitions error
[ https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio reassigned CASSANDRA-15625: --- Assignee: Alex Lumpov (was: Antonio) > Nodetool toppartitions error > > > Key: CASSANDRA-15625 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15625 > Project: Cassandra > Issue Type: Bug >Reporter: Antonio >Assignee: Alex Lumpov >Priority: Normal > > c* version :3.0.15 > here's my test table: > CREATE TABLE app300.test ( > a bigint PRIMARY KEY, > b text, > c text > ) > INSERT INTO app300.test(a ,b, c ) VALUES (50, 'test1', 'test1'); > when i use topartition :nodetool toppartitions app300 test 50,get error > error: Expected 8 or 0 byte long (1048576) > -- StackTrace -- > org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long > (1048576) > at > org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42) > at > org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) > at > org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > but when i flush this table, topartition can work > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-15625) Nodetool toppartitions error
[ https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio reassigned CASSANDRA-15625: --- Assignee: Antonio > Nodetool toppartitions error > > > Key: CASSANDRA-15625 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15625 > Project: Cassandra > Issue Type: Bug >Reporter: Antonio >Assignee: Antonio >Priority: Normal > > c* version :3.0.15 > here's my test table: > CREATE TABLE app300.test ( > a bigint PRIMARY KEY, > b text, > c text > ) > INSERT INTO app300.test(a ,b, c ) VALUES (50, 'test1', 'test1'); > when i use topartition :nodetool toppartitions app300 test 50,get error > error: Expected 8 or 0 byte long (1048576) > -- StackTrace -- > org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long > (1048576) > at > org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42) > at > org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) > at > org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > but when i flush this table, topartition can work > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15625) Nodetool toppartitions error
[ https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antonio updated CASSANDRA-15625: Description: c* version :3.0.15 here's my test table: CREATE TABLE app300.test ( a bigint PRIMARY KEY, b text, c text ) INSERT INTO app300.test(a ,b, c ) VALUES (50, 'test1', 'test1'); when i use topartition :nodetool toppartitions app300 test 50,get error error: Expected 8 or 0 byte long (1048576) -- StackTrace -- org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long (1048576) at org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42) at org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) at org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) but when i flush this table, topartition can work > Nodetool toppartitions error > > > Key: CASSANDRA-15625 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15625 > Project: Cassandra > Issue Type: Bug >Reporter: Antonio >Priority: Normal > > c* version :3.0.15 > here's my test table: > CREATE TABLE app300.test ( > a bigint PRIMARY KEY, > b text, > c text > ) > INSERT INTO app300.test(a ,b, c ) VALUES (50, 'test1', 'test1'); > when i use topartition :nodetool toppartitions app300 test 50,get error > error: Expected 8 or 0 byte long (1048576) > -- StackTrace -- > org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long > (1048576) > at > org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42) > at > org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) > at > org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > but when i flush this table, topartition can work > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15625) Nodetool toppartitions error
Antonio created CASSANDRA-15625: --- Summary: Nodetool toppartitions error Key: CASSANDRA-15625 URL: https://issues.apache.org/jira/browse/CASSANDRA-15625 Project: Cassandra Issue Type: Bug Reporter: Antonio c* version :3.0.15 here's my test table: CREATE TABLE app300.test ( a bigint PRIMARY KEY, b text, c text ) INSERT INTO app300.test(a ,b, c ) VALUES (50, 'test1', 'test1'); when i use topartition :nodetool toppartitions app300 test 50,get error error: Expected 8 or 0 byte long (1048576) -- StackTrace -- org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long (1048576) at org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42) at org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128) at org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) but when i flush this table, topartition can work -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
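The stack trace pinpoints the check that trips: a {{bigint}} key must serialize to exactly 8 bytes (or be empty), but before a flush the sampler evidently hands {{getString()}} a buffer of 1048576 bytes. A sketch of that validation, simplified from the behaviour visible in the trace (the method and exception here are approximations of {{LongSerializer.validate}}, not a copy of it):

```java
import java.nio.ByteBuffer;

// Simplified model of the validation seen in the stack trace: a serialized
// bigint must be exactly 8 bytes (or 0 bytes for an absent value). Any other
// size — such as the 1048576-byte buffer the sampler apparently returns
// before a flush — is rejected before the key can be rendered as a string.
public class LongValidationSketch {
    static void validate(ByteBuffer bytes) {
        if (bytes.remaining() != 8 && bytes.remaining() != 0)
            throw new IllegalArgumentException(
                "Expected 8 or 0 byte long (" + bytes.remaining() + ")");
    }

    public static void main(String[] args) {
        validate(ByteBuffer.allocate(8)); // a real bigint key passes
        try {
            validate(ByteBuffer.allocate(1048576)); // what the sampler returned
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

That flushing the table makes {{toppartitions}} work suggests the bad buffer originates in the memtable sampling path rather than in the key itself.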
[jira] [Commented] (CASSANDRA-15623) When running CQLSH with STDIN input, exit with error status code if script fails
[ https://issues.apache.org/jira/browse/CASSANDRA-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054803#comment-17054803 ] Jacob Becker commented on CASSANDRA-15623: -- [~jrwest], after taking a look at the code, I would say yes, I will provide a patch. I believe I can manage within a few days. As for the exit code being 1 or 2, it is debatable indeed as, AFAIK, there is no (generic) specification in this regard. What is truly important is that it is not 0 (and it is not), so I wasn't sure if the subject is even worth a new ticket. I personally can live just fine with 2, I mentioned it only because, from my experience, anything above 1 usually has some underlying reason (ideally - explained in documentation); from what I can tell, there is no such reason in this case (especially considering the script *never* exits with 1) and no mention in the documentation. > When running CQLSH with STDIN input, exit with error status code if script > fails > > > Key: CASSANDRA-15623 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15623 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Tools >Reporter: Jacob Becker >Priority: Normal > > Assuming CASSANDRA-6344 is in place for years and considering that scripts > submitted with the `-e` option behave in a similar fashion, it is very > surprising that scripts submitted to STDIN (i.e. piped in) always exit with a > zero code, regardless of errors. I believe this should be fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14517) Short read protection can cause partial updates to be read
[ https://issues.apache.org/jira/browse/CASSANDRA-14517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054790#comment-17054790 ] ZhaoYang commented on CASSANDRA-14517: -- This sounds like "repeatable read" issue.. I don't think Cassandra ever provides any read isolation level.. [~bdeggleston] do you think this issue blocks 4.0 release? should we move it into backlog for future reference? > Short read protection can cause partial updates to be read > -- > > Key: CASSANDRA-14517 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14517 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > If a read is performed in two parts due to short read protection, and the > data being read is written to between reads, the coordinator will return a > partial update. Specifically, this will occur if a single partition batch > updates clustering values on both sides of the SRP break, or if a range > tombstone is written that deletes data on both sides of the break. At the > coordinator level, this breaks the expectation that updates to a partition > are atomic, and that you can’t see partial updates. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15624) Avoid lazy initializing shut down instances when trying to send them messages
Marcus Eriksson created CASSANDRA-15624: --- Summary: Avoid lazy initializing shut down instances when trying to send them messages Key: CASSANDRA-15624 URL: https://issues.apache.org/jira/browse/CASSANDRA-15624 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson We currently use {{to.broadcastAddressAndPort()}} when figuring out if we should send a message to an instance; if that instance has been shut down, it will get re-initialized but not started up, which makes the tests fail. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054782#comment-17054782 ] Benedict Elliott Smith edited comment on CASSANDRA-15397 at 3/9/20, 9:18 AM: - bq. I couldn't do so since the code uses a generic that's comparable The {{IntervalTree}} is used in precisely one place in the codebase, so it would be possible to hardcode to this use case for improved performance. bq. I'm not sure if assuming long will be a good idea I would be very surprised if it is not significantly faster. Particularly in tests that correctly account for memory latency (i.e. ensure the data is not entirely held in CPU cache before the test begins). was (Author: benedict): bq. I couldn't do so since the code uses a generic that's comparable The {{IntervalTree}} is used in precisely one place in the codebase, so it would be possible to hardcode to this use case for improved performance. bq. I'm not sure if assuming long will be a good idea I would be very surprised if it is not significantly faster. > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. 
> -- > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement > Components: Local/SSTable >Reporter: Chandrasekhar Thumuluru >Assignee: Chandrasekhar Thumuluru >Priority: Low > Labels: pull-request-available > Attachments: 90p_100k_sstables_with_1000_searches.png, > 90p_1million_sstables_with_1000_searches.png, > 90p_250k_sstables_with_1000_searches.png, > 90p_500k_sstables_with_1000_searches.png, > 90p_750k_sstables_with_1000_searches.png, > 95p_1_SSTable_with_5000_Searches.png, > 95p_100k_sstables_with_1000_searches.png, > 95p_15000_SSTable_with_5000_Searches.png, > 95p_1million_sstables_with_1000_searches.png, > 95p_2_SSTable_with_5000_Searches.png, > 95p_25000_SSTable_with_5000_Searches.png, > 95p_250k_sstables_with_1000_searches.png, > 95p_3_SSTable_with_5000_Searches.png, > 95p_5000_SSTable_with_5000_Searches.png, > 95p_500k_sstables_with_1000_searches.png, > 95p_750k_sstables_with_1000_searches.png, > 99p_1_SSTable_with_5000_Searches.png, > 99p_100k_sstables_with_1000_searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_1million_sstables_with_1000_searches.png, > 99p_2_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_250k_sstables_with_1000_searches.png, > 99p_3_SSTable_with_5000_Searches.png, > 99p_5000_SSTable_with_5000_Searches.png, > 99p_500k_sstables_with_1000_searches.png, > 99p_750k_sstables_with_1000_searches.png, IntervalList.java, > IntervalListWithElimination.java, IntervalTreeSimplified.java, > Mean_1_SSTable_with_5000_Searches.png, > Mean_100k_sstables_with_1000_searches.png, > Mean_15000_SSTable_with_5000_Searches.png, > Mean_1million_sstables_with_1000_searches.png, > Mean_2_SSTable_with_5000_Searches.png, > Mean_25000_SSTable_with_5000_Searches.png, > Mean_250k_sstables_with_1000_searches.png, > Mean_3_SSTable_with_5000_Searches.png, > Mean_5000_SSTable_with_5000_Searches.png, > Mean_500k_sstables_with_1000_searches.png, > 
Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, > replace_intervaltree_with_intervallist.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. > * Elimination using Binary Search (idea is to eliminate intervals using start > and end points of search interval). > Based on the tests I ran, I noticed Binary Search based elimination almost > always performs similar to IntervalTree or out performs IntervalTree based > search. The cost of IntervalTree construction is also substantial and > produces lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage. 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search took in nanos. > PS: > # For the purpose of test, I simplified the IntervalTree by removing the data > portion of the
[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054782#comment-17054782 ] Benedict Elliott Smith commented on CASSANDRA-15397: bq. I couldn't do so since the code uses a generic that's comparable The {{IntervalTree}} is used in precisely one place in the codebase, so it would be possible to hardcode to this use case for improved performance. bq. I'm not sure if assuming long will be a good idea I would be very surprised if it is not significantly faster. > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. > -- > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement > Components: Local/SSTable >Reporter: Chandrasekhar Thumuluru >Assignee: Chandrasekhar Thumuluru >Priority: Low > Labels: pull-request-available > Attachments: 90p_100k_sstables_with_1000_searches.png, > 90p_1million_sstables_with_1000_searches.png, > 90p_250k_sstables_with_1000_searches.png, > 90p_500k_sstables_with_1000_searches.png, > 90p_750k_sstables_with_1000_searches.png, > 95p_1_SSTable_with_5000_Searches.png, > 95p_100k_sstables_with_1000_searches.png, > 95p_15000_SSTable_with_5000_Searches.png, > 95p_1million_sstables_with_1000_searches.png, > 95p_2_SSTable_with_5000_Searches.png, > 95p_25000_SSTable_with_5000_Searches.png, > 95p_250k_sstables_with_1000_searches.png, > 95p_3_SSTable_with_5000_Searches.png, > 95p_5000_SSTable_with_5000_Searches.png, > 95p_500k_sstables_with_1000_searches.png, > 95p_750k_sstables_with_1000_searches.png, > 99p_1_SSTable_with_5000_Searches.png, > 99p_100k_sstables_with_1000_searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_1million_sstables_with_1000_searches.png, > 99p_2_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_250k_sstables_with_1000_searches.png, > 99p_3_SSTable_with_5000_Searches.png, > 
99p_5000_SSTable_with_5000_Searches.png, > 99p_500k_sstables_with_1000_searches.png, > 99p_750k_sstables_with_1000_searches.png, IntervalList.java, > IntervalListWithElimination.java, IntervalTreeSimplified.java, > Mean_1_SSTable_with_5000_Searches.png, > Mean_100k_sstables_with_1000_searches.png, > Mean_15000_SSTable_with_5000_Searches.png, > Mean_1million_sstables_with_1000_searches.png, > Mean_2_SSTable_with_5000_Searches.png, > Mean_25000_SSTable_with_5000_Searches.png, > Mean_250k_sstables_with_1000_searches.png, > Mean_3_SSTable_with_5000_Searches.png, > Mean_5000_SSTable_with_5000_Searches.png, > Mean_500k_sstables_with_1000_searches.png, > Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, > replace_intervaltree_with_intervallist.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. > * Elimination using Binary Search (idea is to eliminate intervals using start > and end points of search interval). > Based on the tests I ran, I noticed Binary Search based elimination almost > always performs similar to IntervalTree or out performs IntervalTree based > search. The cost of IntervalTree construction is also substantial and > produces lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage. 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search took in nanos. 
> PS: > # For the purpose of test, I simplified the IntervalTree by removing the data > portion of the interval. Modified the template version (Java generics) to a > specialized version. > # I used the code from Cassandra version _3.11_. > # Time in the graph is in nanos. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
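The "elimination using binary search" variant being benchmarked can be sketched as follows. This is a simplified model under assumed semantics — closed intervals, with start points and end points each kept in a sorted array — not the attached patch:

```java
// Sketch (assumed semantics, not the attached patch) of eliminating
// non-overlapping intervals with binary search instead of an interval tree.
// An interval [s, e] overlaps the query [qs, qe] iff s <= qe and e >= qs,
// so two binary searches over flat sorted arrays bound the candidate set.
public class IntervalElimination {
    // number of values in sorted[] that are <= key
    static int countLE(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // number of values in sorted[] that are < key
    static int countLess(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Overlap count for [qs, qe] (qs <= qe): everything except intervals that
    // start after qe or end before qs; those two excluded sets are disjoint.
    static int overlapCount(long[] startsSorted, long[] endsSorted, long qs, long qe) {
        return countLE(startsSorted, qe) - countLess(endsSorted, qs);
    }

    public static void main(String[] args) {
        long[] starts = {1, 4, 10}; // intervals [1,3], [4,8], [10,12]
        long[] ends = {3, 8, 12};
        System.out.println(overlapCount(starts, ends, 5, 11)); // [4,8] and [10,12]
    }
}
```

Flat sorted long arrays are contiguous and cache-friendly, and rebuilding them on SSTable changes is a copy-and-sort rather than a tree construction, which is consistent with the garbage and construction-cost observations above.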
[jira] [Commented] (CASSANDRA-15566) Repair coordinator can hang under some cases
[ https://issues.apache.org/jira/browse/CASSANDRA-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054748#comment-17054748 ] ZhaoYang commented on CASSANDRA-15566: -- Thanks for the update. I don't have any concrete implementation details in mind yet; the C* 4.0 code is quite new to me. Based on my understanding of 3.x, the main reasons for repair hanging are:
# request/response messages get dropped if they exceed the expiration time, which is 10s.
# internode connections are closed, clearing all queued messages, due to network or gossip status changes.
# a participant crashed.
# a failure response was not sent to the coordinator in {{RepairMessageVerbHandler.doVerb()}} in case of an unknown exception; currently it only handles dropped tables.
# a participant is indeed making progress, but validation is very slow due to disk IO throttling.
For problems #1-2, I am thinking of making repair messages idempotent so the sender can periodically resend a message until it gets a reply. For problem #3, make sure the repair manager responds to endpoint status changes (e.g. up/down/remove) if it doesn't already. For problem #4, make sure all exceptions are caught and responded to with a failure; we need to add some failure injections to dtests. For problem #5, as you suggested in CASSANDRA-15399, the coordinator should be able to check a participant's in-memory virtual table to determine whether it is making progress. In order to make repair great again, I think it's important to be able to identify hung repairs automatically (even with some false positives) and abort them via nodetool, because I don't expect repair operations to be run by operators manually. In production, repair should be managed by an automation tool, like a repair service or Reaper, which will abort and retry hung repairs. This can probably be done in CASSANDRA-15399 or a separate ticket.
> Repair coordinator can hang under some cases > > > Key: CASSANDRA-15566 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15566 > Project: Cassandra > Issue Type: Improvement > Components: Consistency/Repair >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 4.0-beta > > > Repair coordination makes a few assumptions about message delivery which > cause it to hang forever when those assumptions don’t hold true: fire and > forget will not get rejected (participate has an issue and rejects the > message), and a very delayed message will one day be seen (messaging can be > dropped under load or when failure detector thinks a node is bad but is just > GCing). > Given this and the desire to have better observability with repair (see > CASSANDRA-15399), coordination should be changed into a request/response > pattern (with retries) and polling (validation status and MerkleTree > sending). This would allow the coordinator to detect changes in state (it > was known participate was working on validation, but it no longer knows about > the validation task), and to be able to recover from ephemeral issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
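The request/response-with-retries pattern proposed for the dropped-message cases (problems #1-2 above) can be sketched abstractly. All names here are hypothetical, not Cassandra APIs; the point is only the shape of the change: a dropped message degrades to a retry instead of a hang, and exhausted retries surface as an explicit failure.

```java
// Abstract sketch (hypothetical names) of replacing fire-and-forget repair
// messages with an idempotent request/response loop: resend until acked or
// until retries are exhausted, then report failure instead of hanging.
public class RetryingSend {
    interface Channel {
        boolean send(String msg); // true = ack received from the participant
    }

    static boolean sendWithRetries(Channel ch, String msg, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ch.send(msg)) // idempotent message: safe to deliver twice
                return true;
        }
        return false; // caller can now fail the repair session explicitly
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        Channel flaky = msg -> ++calls[0] >= 3; // drops the first two attempts
        System.out.println(sendWithRetries(flaky, "VALIDATION_REQ", 5));
        System.out.println(calls[0]);
    }
}
```

Idempotence is what makes the resend loop safe: if an ack (rather than the request) was lost, the participant simply reprocesses or ignores the duplicate, and the coordinator still converges on a definite success-or-failure outcome.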