[jira] [Commented] (CASSANDRA-15313) Fix flaky - ChecksummingTransformerTest - org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest

2020-03-09 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055548#comment-17055548
 ] 

David Capwell commented on CASSANDRA-15313:
---

bq. Reverting CASSANDRA-1 fixes corruptionCausesFailure with seed 
71671740653044L for me

Sorry, I don't fully follow; can you elaborate, [~spod]? There are 3 issues with 
the test/feature (I may have missed something, going off memory).

1) Corruption can cause lz4 to crash the JVM. This was fixed in 
CASSANDRA-15556 by using the "safe" methods rather than the "fast" ones (see 
the sketch after this list).
2) A corrupted lz4 stream may not fail, and may produce output != input; this 
is still an issue and the tests fail periodically because of it.
3) The generators generated too much garbage. CASSANDRA-1 switched to fixed 
memory and switched from strings (whose charset depends on the test 
environment, since the test doesn't define which charset to use) to raw bytes. 
Given that the generated data changed, the seeds which failed before no longer 
fail, and the seeds that fail now did not fail with the old generators; both 
generators were able to reproduce #2, just with different seeds.
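
For reference, here is a minimal sketch of the safe-vs-fast distinction from 
#1 (and the silent-corruption failure mode from #2), using the lz4-java API 
directly; this is illustrative only, not the CASSANDRA-15556 patch:

{code:java}
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Exception;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4SafeDecompressor;

public class Lz4SafeVsFast
{
    public static void main(String[] args)
    {
        LZ4Factory factory = LZ4Factory.fastestInstance();
        LZ4Compressor compressor = factory.fastCompressor();
        byte[] input = new byte[1024]; // all zeros, compresses well
        byte[] compressed = compressor.compress(input);

        compressed[0] ^= 0x55; // simulate frame corruption

        // The "fast" decompressor trusts its arguments and, in the native and
        // unsafe implementations, can read past the source buffer on corrupt
        // input (issue #1, a potential JVM crash). The "safe" variant bounds
        // checks the source and throws LZ4Exception instead.
        LZ4SafeDecompressor safe = factory.safeDecompressor();
        byte[] restored = new byte[input.length];
        try
        {
            int len = safe.decompress(compressed, 0, compressed.length,
                                      restored, 0, restored.length);
            // Issue #2: corruption may still decompress "successfully" but
            // yield output != input, so the test must compare the contents.
            System.out.println("decompressed " + len + " bytes");
        }
        catch (LZ4Exception e)
        {
            System.out.println("corruption detected: " + e);
        }
    }
}
{code}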

> Fix flaky - ChecksummingTransformerTest - 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
> ---
>
> Key: CASSANDRA-15313
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15313
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15313-hack.patch
>
>
> During the recent runs, this test appears to be flaky.
> Example failure: 
> [https://circleci.com/gh/vinaykumarchella/cassandra/459#tests/containers/94]
> corruptionCausesFailure-compression - 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
> {code:java}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at org.quicktheories.impl.Precursor.<init>(Precursor.java:17)
>   at 
> org.quicktheories.impl.ConcreteDetachedSource.<init>(ConcreteDetachedSource.java:8)
>   at 
> org.quicktheories.impl.ConcreteDetachedSource.detach(ConcreteDetachedSource.java:23)
>   at org.quicktheories.generators.Retry.generate(CodePoints.java:51)
>   at 
> org.quicktheories.generators.Generate.lambda$intArrays$10(Generate.java:190)
>   at 
> org.quicktheories.generators.Generate$$Lambda$17/1847008471.generate(Unknown 
> Source)
>   at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$mix$10(Gen.java:184)
>   at org.quicktheories.core.Gen$$Lambda$45/802243390.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$flatMap$5(Gen.java:93)
>   at org.quicktheories.core.Gen$$Lambda$48/363509958.generate(Unknown 
> Source)
>   at 
> org.quicktheories.dsl.TheoryBuilder4.lambda$prgnToTuple$12(TheoryBuilder4.java:188)
>   at 
> org.quicktheories.dsl.TheoryBuilder4$$Lambda$40/2003496028.generate(Unknown 
> Source)
>   at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255)
>   at org.quicktheories.core.FilteredGenerator.generate(Gen.java:225)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.impl.Core.generate(Core.java:150)
>   at org.quicktheories.impl.Core.shrink(Core.java:103)
>   at org.quicktheories.impl.Core.run(Core.java:39)
>   at org.quicktheories.impl.TheoryRunner.check(TheoryRunner.java:35)
>   at org.quicktheories.dsl.TheoryBuilder4.check(TheoryBuilder4.java:150)
>   at 
> org.quicktheories.dsl.TheoryBuilder4.checkAssert(TheoryBuilder4.java:162)
>   at 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest.corruptionCausesFailure(ChecksummingTransformerTest.java:87)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-03-09 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055533#comment-17055533
 ] 

Yifan Cai edited comment on CASSANDRA-15338 at 3/10/20, 2:55 AM:
-

I was not able to reproduce the test failure by simply running it on my 
laptop. 
 
However, it can easily be reproduced when running in a docker container with 
limited CPUs (e.g., 2). 
 
After multiple runs, the observation was that the test only failed when 
testing with large messages, indicating that the failures were probably 
related to {{LargeMessageDelivery}}. 
 
The following is what I think happened. 
# When the {{inbound}} has just opened and the first message gets queued into 
the {{outbound}}, the handshake happens and execution is deferred until the 
connection is established ({{executeAgain}}). 
# Since enqueue is non-blocking, the next line, {{unsafeRunOnDelivery}}, runs 
immediately. The effect is that the runnable gets registered, but not run yet. 
# The connection is established, so we {{executeAgain()}}. Because the 
runnable {{stopAndRun}} is present and, at this point, the {{inProgress}} flag 
is still false, the runnable is run, which counts down {{deliveryDone}} 
unexpectedly. 
# Delivery proceeds to flush the message. In {{LargeMessageDelivery}}, the 
flush is async, so a race condition can happen:
   ## the inbound has received the message (and counted down 
{{receiveDone}}), while
   ## {{LargeMessageDelivery}} is still polling for the completion of the 
flush, and so has not yet released capacity. 

Therefore, the assertion on the pendingCount failed. 
 
There are 2 places in the test flow that are (or can go) wrong: see step 3 and 
step 4. 

Regarding step 3, the runnable {{stopAndRun}} should not be registered while 
the connection is being established. In production, is there a case where a 
{{stopAndRun}} would be registered this early? Probably not.

Regarding step 4, the {{outbound}} has no knowledge of whether the {{inbound}} 
has received any message. The test should register the runnable {{stopAndRun}} 
at the message handler to count down {{deliveryDone}}. That way the runnable 
correctly waits for the current delivery to complete before it runs. 
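
To make the race concrete, here is a small self-contained model of steps 1-4; 
the names ({{stopAndRun}}, {{inProgress}}, {{deliveryDone}}) mirror the 
discussion above, but this is a simplification, not the actual 
{{OutboundConnection}} code:

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class StopAndRunRace
{
    private final AtomicReference<Runnable> stopAndRun = new AtomicReference<>();
    // Stays false here because no delivery has started yet, which is
    // exactly the window the test hits.
    private volatile boolean inProgress = false;

    void unsafeRunOnDelivery(Runnable r) { stopAndRun.set(r); }

    // Called once the connection is established. A runnable registered before
    // any delivery has started (inProgress still false) runs immediately.
    void executeAgain()
    {
        Runnable r = stopAndRun.getAndSet(null);
        if (r != null && !inProgress)
            r.run(); // premature: nothing has actually been delivered yet
    }

    public static void main(String[] args)
    {
        StopAndRunRace delivery = new StopAndRunRace();
        CountDownLatch deliveryDone = new CountDownLatch(1);

        // The test registers the countdown right after the non-blocking
        // enqueue...
        delivery.unsafeRunOnDelivery(deliveryDone::countDown);
        // ...then the handshake completes and executeAgain() fires the
        // runnable before the message has been flushed.
        delivery.executeAgain();

        // The latch is already at 0, so an assertion gated on deliveryDone
        // can observe a connection that is still mid-flush.
        System.out.println("deliveryDone count: " + deliveryDone.getCount());
    }
}
{code}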
 
PR is here: https://github.com/apache/cassandra/pull/466

As mentioned, I reproduced this using Docker. Here is a bundle that one can 
simply download and run: [^CASS-15338-Docker.zip]. It runs {{ConnectionTest}} 
repeatedly until it fails.
I have included the patch within the image too. 

To reproduce, run
{code:bash}
bash build_and_run.sh
{code}

To see the runs with the patch, run
{code:bash}
bash build_and_run.sh patched
{code}



[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-03-09 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-15338:
--
Test and Documentation Plan: unit test
 Status: Patch Available  (was: Open)

> Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
> ---
>
> Key: CASSANDRA-15338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15338
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: CASS-15338-Docker.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example failure: 
> [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1]
>   
> {code:java}
> Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest):  FAILED
>  expected:<0> but was:<1>
>  junit.framework.AssertionFailedError: expected:<0> but was:<1>
>    at 
> org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625)
>    at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258)
>    at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231)
>    at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code}
>   
>  Looking closer at 
> org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that 
> the run method is called before 
> org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to 
> a test race condition where the CountDownLatch completes before executing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-03-09 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-15338:
--
 Bug Category: Parent values: Correctness(12982)Level 1 values: Test 
Failure(12990)
   Complexity: Normal
Discovered By: Unit Test
 Severity: Low
 Assignee: Yifan Cai
   Status: Open  (was: Triage Needed)

> Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
> ---
>
> Key: CASSANDRA-15338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15338
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: CASS-15338-Docker.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example failure: 
> [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1]
>   
> {code:java}
> Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest):  FAILED
>  expected:<0> but was:<1>
>  junit.framework.AssertionFailedError: expected:<0> but was:<1>
>    at 
> org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625)
>    at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258)
>    at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231)
>    at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code}
>   
>  Looking closer at 
> org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that 
> the run method is called before 
> org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to 
> a test race condition where the CountDownLatch completes before executing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-03-09 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055533#comment-17055533
 ] 

Yifan Cai commented on CASSANDRA-15338:
---

I was not able to reproduce the test failure by simply running it on my 
laptop. 
 
However, it can easily be reproduced when running in a docker container with 
limited CPUs (e.g., 2). 
 
After multiple runs, the observation was that the test only failed when 
testing with large messages, indicating that the failures were probably 
related to {{LargeMessageDelivery}}. 
 
The following is what I think happened. 
# When the {{inbound}} has just opened and the first message gets queued into 
the {{outbound}}, the handshake happens and execution is deferred until the 
connection is established ({{executeAgain}}). 
# Since enqueue is non-blocking, the next line, {{unsafeRunOnDelivery}}, runs 
immediately. The effect is that the runnable gets registered, but not run yet. 
# The connection is established, so we {{executeAgain()}}. Because the 
runnable {{stopAndRun}} is present and, at this point, the {{inProgress}} flag 
is still false, the runnable is run, which counts down {{deliveryDone}} 
unexpectedly. 
# Delivery proceeds to flush the message. In {{LargeMessageDelivery}}, the 
flush is async, so a race condition can happen:
   ## the inbound has received the message (and counted down 
{{receiveDone}}), while
   ## {{LargeMessageDelivery}} is still polling for the completion of the 
flush, and so has not yet released capacity. 

Therefore, the assertion on the pendingCount failed. 
 
There are 2 places in the test flow that are (or can go) wrong: see step 3 and 
step 4. 

Regarding step 3, the runnable {{stopAndRun}} should not be registered while 
the connection is being established. In production, is there a case where a 
{{stopAndRun}} would be registered this early? Probably not.

Regarding step 4, the {{outbound}} has no knowledge of whether the {{inbound}} 
has received any message. The test should register the runnable {{stopAndRun}} 
at the message handler to count down {{deliveryDone}}. That way the runnable 
correctly waits for the current delivery to complete before it runs. 
 
PR is here: https://github.com/apache/cassandra/pull/466

As mentioned, I reproduced this using Docker. Here is a bundle that one can 
simply download and run: [^CASS-15338-Docker.zip]. It runs {{ConnectionTest}} 
repeatedly until it fails.
I have included the patch within the image too. 

To reproduce, run
{code:bash}
bash build_and_run.sh
{code}

To see the runs with the patch, run
{code:bash}
bash build_and_run.sh patched
{code}

> Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
> ---
>
> Key: CASSANDRA-15338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15338
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: CASS-15338-Docker.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example failure: 
> [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1]
>   
> {code:java}
> Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest):  FAILED
>  expected:<0> but was:<1>
>  junit.framework.AssertionFailedError: expected:<0> but was:<1>
>    at 
> org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625)
>    at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258)
>    at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231)
>    at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code}
>   
>  Looking closer at 
> org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that 
> the run method is called before 
> org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to 
> a test race condition where the CountDownLatch completes before executing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-03-09 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-15338:
--
Attachment: CASS-15338-Docker.zip

> Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
> ---
>
> Key: CASSANDRA-15338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15338
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: CASS-15338-Docker.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example failure: 
> [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1]
>   
> {code:java}
> Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest):  FAILED
>  expected:<0> but was:<1>
>  junit.framework.AssertionFailedError: expected:<0> but was:<1>
>    at 
> org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625)
>    at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258)
>    at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231)
>    at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code}
>   
>  Looking closer at 
> org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that 
> the run method is called before 
> org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to 
> a test race condition where the CountDownLatch completes before executing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15338:
---
Labels: pull-request-available  (was: )

> Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
> ---
>
> Key: CASSANDRA-15338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15338
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>
> Example failure: 
> [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1]
>   
> {code:java}
> Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest):  FAILED
>  expected:<0> but was:<1>
>  junit.framework.AssertionFailedError: expected:<0> but was:<1>
>    at 
> org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625)
>    at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258)
>    at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231)
>    at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code}
>   
>  Looking closer at 
> org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that 
> the run method is called before 
> org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to 
> a test race condition where the CountDownLatch completes before executing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15620) Add "unleveled sstables" table metric

2020-03-09 Thread Chris Lohfink (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-15620:
--
Reviewers: Chris Lohfink
   Status: Review In Progress  (was: Patch Available)

> Add "unleveled sstables" table metric
> -
>
> Key: CASSANDRA-15620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15620
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Normal
>
> The number of unleveled sstables is an important indicator that deserves to 
> be a dedicated table metric on its own. This will also add a global gauge 
> that is convenient to query.
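
As a rough illustration of what registering such a gauge could look like with 
the Dropwizard metrics library that Cassandra uses (the metric name and the 
level-0 lookup below are invented for the example, not the actual patch):

{code:java}
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class UnleveledSSTablesGauge
{
    public static void main(String[] args)
    {
        MetricRegistry registry = new MetricRegistry();

        // Stand-in for asking the leveled manifest how many sstables sit in L0.
        Gauge<Integer> unleveled = () -> countLevel0SSTables();

        // Registered per table; a global gauge would aggregate across tables.
        registry.register("Table.UnleveledSSTables.ks.tbl", unleveled);

        System.out.println(registry.getGauges()
                                   .get("Table.UnleveledSSTables.ks.tbl")
                                   .getValue());
    }

    static int countLevel0SSTables()
    {
        return 3; // placeholder value
    }
}
{code}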



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15620) Add "unleveled sstables" table metric

2020-03-09 Thread Chris Lohfink (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-15620:
--
Status: Ready to Commit  (was: Review In Progress)

> Add "unleveled sstables" table metric
> -
>
> Key: CASSANDRA-15620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15620
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Normal
>
> The number of unleveled sstables is an important indicator that deserves to 
> be a dedicated table metric on its own. This will also add a global gauge 
> that is convenient to query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15620) Add "unleveled sstables" table metric

2020-03-09 Thread Chris Lohfink (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-15620:
--
Test and Documentation Plan: na
 Status: Patch Available  (was: Open)

> Add "unleveled sstables" table metric
> -
>
> Key: CASSANDRA-15620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15620
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Normal
>
> The number of unleveled sstables is an important indicator that deserves to 
> be a dedicated table metric on its own. This will also add a global gauge 
> that is convenient to query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15620) Add "unleveled sstables" table metric

2020-03-09 Thread Chris Lohfink (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055509#comment-17055509
 ] 

Chris Lohfink commented on CASSANDRA-15620:
---

+1 on the code; just spot-checked with LCS and STCS as well and it worked great, thanks!

> Add "unleveled sstables" table metric
> -
>
> Key: CASSANDRA-15620
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15620
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability/Metrics
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
>Priority: Normal
>
> The number of unleveled sstables is an important indicator that deserves to 
> be a dedicated table metric on its own. This will also add a global gauge 
> that is convenient to query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15627) sstable not in the corresponding level in the leveled manifest

2020-03-09 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15627:
--
 Bug Category: Parent values: Correctness(12982)Level 1 values: API / 
Semantic Implementation(12988)
   Complexity: Normal
Discovered By: Workload Replay
 Severity: Normal
   Status: Open  (was: Triage Needed)

> sstable not in the corresponding level in the leveled manifest
> --
>
> Key: CASSANDRA-15627
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15627
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Local/Compaction/LCS
>Reporter: David Capwell
>Priority: Normal
>
> I get the following warning logs when running smoke tests
> bq. Live sstable 
> /cassandra/d1/data/ks/table-cce7c54b5abf3f369bb7659a74e9e963/mf-71-big-Data.db
>  from level 0 is not on corresponding level in the leveled manifest. This is 
> not a problem per se, but may indicate an orphaned sstable due to a failed 
> compaction not cleaned up properly.
> There are no other warning logs and no error logs, so compaction doesn’t log 
> anything indicating a failure.
> Schema
> {code}
> CREATE TABLE ks.table (
>   pk1 ascii,
>   pk2 bigint,
>   ck1 ascii,
>   ck2 ascii,
>   ck3 ascii,
>   v1 int,
>   v2 ascii, 
>   PRIMARY KEY ((pk1,pk2), ck1, ck2, ck3)
> ) WITH comment = 'test table'
>   AND gc_grace_seconds = 1
>   AND memtable_flush_period_in_ms = 100
>   AND compression = {'class': 'LZ4Compressor'}
>   AND compaction = {'class': 'LeveledCompactionStrategy', 
> 'only_purge_repaired_tombstones': true}
>   AND CLUSTERING ORDER BY (ck1 DESC,ck2 ASC,ck3 DESC);
> {code}
> test
> * run simulated queries for 30 minutes
> * run incremental repair in a loop (once one completes run the next)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15627) sstable not in the corresponding level in the leveled manifest

2020-03-09 Thread David Capwell (Jira)
David Capwell created CASSANDRA-15627:
-

 Summary: sstable not in the corresponding level in the leveled 
manifest
 Key: CASSANDRA-15627
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15627
 Project: Cassandra
  Issue Type: Bug
  Components: Local/Compaction, Local/Compaction/LCS
Reporter: David Capwell


I get the following warning logs when running smoke tests

bq. Live sstable 
/cassandra/d1/data/ks/table-cce7c54b5abf3f369bb7659a74e9e963/mf-71-big-Data.db 
from level 0 is not on corresponding level in the leveled manifest. This is not 
a problem per se, but may indicate an orphaned sstable due to a failed 
compaction not cleaned up properly.

There are no other warning logs and no error logs, so compaction doesn’t log 
anything indicating a failure.

Schema

{code}
CREATE TABLE ks.table (
  pk1 ascii,
  pk2 bigint,
  ck1 ascii,
  ck2 ascii,
  ck3 ascii,
  v1 int,
  v2 ascii, 
  PRIMARY KEY ((pk1,pk2), ck1, ck2, ck3)
) WITH comment = 'test table'
  AND gc_grace_seconds = 1
  AND memtable_flush_period_in_ms = 100
  AND compression = {'class': 'LZ4Compressor'}
  AND compaction = {'class': 'LeveledCompactionStrategy', 
'only_purge_repaired_tombstones': true}
  AND CLUSTERING ORDER BY (ck1 DESC,ck2 ASC,ck3 DESC);
{code}

test
* run simulated queries for 30 minutes
* run incremental repair in a loop (once one completes run the next)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12510) Disallow decommission when number of replicas will drop below configured RF

2020-03-09 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055361#comment-17055361
 ] 

Stefan Miklosovic commented on CASSANDRA-12510:
---

Isn't the same logic applicable to _drain_?

> Disallow decommission when number of replicas will drop below configured RF
> ---
>
> Key: CASSANDRA-12510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12510
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Streaming and Messaging
> Environment: C* version 3.3
>Reporter: Atin Sood
>Assignee: Kurt Greaves
>Priority: Low
>  Labels: lhf
> Fix For: 4.0
>
> Attachments: 12510-3.x-v2.patch, 12510-3.x.patch
>
>
> Steps to replicate:
> - Create a 3 node cluster in DC1 and create a keyspace test_keyspace with a 
> table test_table using replication strategy NetworkTopologyStrategy, DC1=3. 
> Populate some data into this table.
> - Add 5 more nodes to this cluster, but in DC2. Do not alter the keyspace to 
> add the new DC2 to replication (this is intentional and the reason why the 
> bug shows up). So desc keyspace should still list NetworkTopologyStrategy 
> with DC1=3 as the RF.
> - As expected, this will now be an 8 node cluster with 3 nodes in DC1 and 5 
> in DC2.
> - Now start decommissioning the nodes in DC1. Note that the decommission 
> runs fine on all 3 nodes, but since the new nodes are in DC2 and the RF for 
> the keyspace is restricted to DC1, the new 5 nodes won't get any data.
> - You will end up with a 5 node cluster which has no data from the 3 
> decommissioned nodes, hence ending in data loss.
> I do understand that this problem could have been avoided if we had 
> performed an alter statement adding DC2 replication before adding the 5 
> nodes. But the fact that decommission ran fine on the 3 nodes in DC1, 
> without complaining that there were no nodes to stream its data to, seems a 
> little discomforting. 
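
For reference, the alter statement mentioned in the last paragraph would be 
along these lines (keyspace name and RF taken from the steps above; adjust as 
needed):

{code}
ALTER KEYSPACE test_keyspace
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
{code}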



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra-sidecar] branch master updated: Ninja fix changelog

2020-03-09 Thread rustyrazorblade
This is an automated email from the ASF dual-hosted git repository.

rustyrazorblade pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-sidecar.git


The following commit(s) were added to refs/heads/master by this push:
 new 2c5f484  Ninja fix changelog
2c5f484 is described below

commit 2c5f4841479d5ff80a21540ec4e2fa5344a52251
Author: Jon Haddad 
AuthorDate: Mon Mar 9 11:11:17 2020 -0700

Ninja fix changelog
---
 CHANGES.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGES.txt b/CHANGES.txt
index 49d7800..00defa6 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,5 +1,6 @@
 1.0.0
 -
+ * Build and Test with both Java 8 & 11 in Circle CI (CASSANDRA-15611)
  * Upgraded Gradle and replaced FindBugs with SpotBugs (CASSANDRA-15610)
  * Improving local HealthCheckTest reliability (CASSANDRA-15615)
  * Read sidecar.yaml from sidecar.config System Property instead of classpath 
(CASSANDRA-15288)


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra-sidecar] branch master updated: Improving CircleCI build reliability

2020-03-09 Thread rustyrazorblade
This is an automated email from the ASF dual-hosted git repository.

rustyrazorblade pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-sidecar.git


The following commit(s) were added to refs/heads/master by this push:
 new 595fea7  Improving CircleCI build reliability
595fea7 is described below

commit 595fea7d97f0d87ac9b9a1510379a6faa6a29abf
Author: Jon Haddad 
AuthorDate: Wed Mar 4 13:56:46 2020 -0800

Improving CircleCI build reliability

Switched to Circle machine image - docker has issues with networking in 
tests
Fix storing of test results
Updated readme with Java 11
Upgrade vertx
Wait for vertx server startup before sending requests
Update simulacron to latest bug fix version
added spotbugs exclude config to avoid incorrect NPE error on java 11
Configure CircleCi to run tests with Java 11

Patch by Jon Haddad; Reviewed by Dinesh Joshi for CASSANDRA-15611
---
 .circleci/config.yml   | 89 --
 README.md  | 11 ++-
 build.gradle   | 17 +++--
 .../sidecar/HealthServiceIntegrationTest.java  |  1 +
 src/main/resources/spotbugs-exclude.xml| 14 
 .../sidecar/AbstractHealthServiceTest.java | 10 ++-
 6 files changed, 110 insertions(+), 32 deletions(-)

diff --git a/.circleci/config.yml b/.circleci/config.yml
index 8ab909d..690b4a6 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -2,42 +2,87 @@
 #
 # Check https://circleci.com/docs/2.0/language-java/ for more details
 #
-version: 2
-jobs:
-  build:
-docker:
-  - image: circleci/openjdk:8-jdk
+version: 2.1
 
+# need to reuse the same base environment for several tests
+aliases:
+  base_job: &base_job
+    machine:
+      image: ubuntu-1604:201903-01
     working_directory: ~/repo
-
     environment:
       TERM: dumb
 
+# we might modify this in the future to accept a parameter for the java package to install
+commands:
+  install_java:
+description: "Installs Java 8 using AdoptOpenJDK"
+parameters:
+  version:
+type: string
+
 steps:
-  - checkout
+  - run: wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | sudo apt-key add -
+  - run: sudo add-apt-repository --yes https://adoptopenjdk.jfrog.io/adoptopenjdk/deb/
+  - run: sudo apt-get update
+  - run: sudo apt-get install -y << parameters.version >>
+
+  install_common:
+description: "Installs common software and certificates"
+steps:
+  - run: sudo apt-get update
+  - run: sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
 
-  # Download and cache dependencies
-  - restore_cache:
-  keys:
-- v1-dependencies-{{ checksum "build.gradle" }}
-# fallback to using the latest cache if no exact match is found
-- v1-dependencies-
+jobs:
+  java8:
+<<: *base_job
 
-  - run: ./gradlew dependencies
+steps:
+  - checkout
+  - install_common
+  
+  - install_java:
+  version: adoptopenjdk-8-hotspot
 
-  - save_cache:
-  paths:
-- ~/.gradle
-  key: v1-dependencies-{{ checksum "build.gradle" }}
+  - run: sudo update-java-alternatives -s adoptopenjdk-8-hotspot-amd64 && java -version
 
   # make sure it builds with build steps like swagger docs and dist
-  - run: ./gradlew build
+  - run: ./gradlew build --stacktrace
+
+  - store_artifacts:
+  path: build/reports
+  destination: test-reports
+
+  - store_test_results:
+  path: ~/repo/build/test-results/
+
+  java11:
+<<: *base_job
+steps:
+  - checkout
+  - install_common
 
-  # run tests!
-  - run: ./gradlew check
+  - install_java:
+  version: adoptopenjdk-11-hotspot
+
+  - run: sudo update-java-alternatives -s adoptopenjdk-11-hotspot-amd64 && java -version
+
+  - run: ./gradlew build --stacktrace
 
   - store_artifacts:
   path: build/reports
   destination: test-reports
+
   - store_test_results:
-  path: build/reports
\ No newline at end of file
+  path: ~/repo/build/test-results/
+
+workflows:
+  version: 2
+
+  test_java_8:
+jobs:
+  - java8
+
+  test_java_11:
+jobs:
+  - java11
\ No newline at end of file
diff --git a/README.md b/README.md
index 327948b..f0e29b9 100644
--- a/README.md
+++ b/README.md
@@ -7,8 +7,8 @@ For more information, see [the Apache Cassandra web 
site](http://cassandra.apach
 
 Requirements
 
-  1. Java >= 1.8 (OpenJDK or Oracle)
-  2. Apache Cassandra 4.0
+  1. Java >= 1.8 (OpenJDK or Oracle), or Java 11
+  2. Apache Cassandra 4.0.  We depend on virtual tables which is a 4.0 only feature.
 
 Getting started
 ---
@@ -20,6 +20,13 @@ Apache Cassandra running on 

[jira] [Updated] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI

2020-03-09 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-15611:
---
  Fix Version/s: 4.0-alpha
Source Control Link: 
https://github.com/apache/cassandra-sidecar/commit/595fea7d97f0d87ac9b9a1510379a6faa6a29abf
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Build and Test with both Java 8 & 11 in Circle CI
> -
>
> Key: CASSANDRA-15611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15611
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> We currently only build and test with Java 8.  We should ensure Java 11 is 
> fully supported for both builds and testing in CircleCI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI

2020-03-09 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15611:
-
Status: Ready to Commit  (was: Review In Progress)

+1 LGTM!

> Build and Test with both Java 8 & 11 in Circle CI
> -
>
> Key: CASSANDRA-15611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15611
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> We currently only build and test with Java 8.  We should ensure Java 11 is 
> fully supported for both builds and testing in CircleCI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI

2020-03-09 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15611:
-
Reviewers: Dinesh Joshi
   Status: Review In Progress  (was: Patch Available)

> Build and Test with both Java 8 & 11 in Circle CI
> -
>
> Key: CASSANDRA-15611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15611
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> We currently only build and test with Java 8.  We should ensure Java 11 is 
> fully supported for both builds and testing in CircleCI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys

2020-03-09 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-14365:
---
Fix Version/s: 4.x
   3.11.x
   3.0.x
   2.2.x

> Commit log replay failure for static columns with collections in clustering 
> keys
> 
>
> Key: CASSANDRA-14365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14365
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> In the old storage engine, static cells with a collection as part of the 
> clustering key fail to validate because a 0 byte collection (like in the cell 
> name of a static cell) isn't valid.
> To reproduce:
> 1.
> {code:java}
> CREATE TABLE test.x (
> id int,
> id2 frozen>,
> st int static,
> PRIMARY KEY (id, id2)
> );
> INSERT INTO test.x (id, st) VALUES (1, 2);
> {code}
> 2.
>  Kill the cassandra process
> 3.
>  Restart cassandra to replay the commitlog
> Outcome:
> {noformat}
> ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting 
> due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
> Unexpected error deserializing mutation; saved to 
> /tmp/mutation3825739904516830950dat.  This may be caused by replaying a 
> mutation against a table with the same name but incompatible schema.  
> Exception follows: org.apache.cassandra.serializers.MarshalException: Not 
> enough bytes to read a set
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
>  [main/:na]
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:565)
>  [main/:na]
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:517)
>  [main/:na]
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:397)
>  [main/:na]
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:143)
>  [main/:na]
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:181) 
> [main/:na]
> at 
> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:161) 
> [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:284) 
> [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:533)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:642) 
> [main/:na]
> {noformat}
> I haven't investigated whether there are other, more subtle issues caused by 
> these cells failing to validate in other places in the code, but I believe 
> the fix for this is to check for 0 byte length collections and accept them 
> as valid, as we do with other types.
> I haven't had a chance for any extensive testing, but this naive patch seems 
> to have the desired effect. 
> ||Patch||
> |[2.2 PoC|https://github.com/vincewhite/cassandra/commits/zero_length_collection]|
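
A sketch of the kind of check the description proposes, accepting a zero-byte 
buffer as an empty collection; the class and method here are invented for 
illustration and are not the linked PoC patch:

{code:java}
import java.nio.ByteBuffer;

public class CollectionValidationSketch
{
    static void validateSet(ByteBuffer bytes)
    {
        // Proposed behaviour: a zero-length value (e.g. the collection
        // component of a static cell's name) is treated as valid and empty.
        if (bytes == null || !bytes.hasRemaining())
            return;

        if (bytes.remaining() < 4)
            throw new IllegalArgumentException("Not enough bytes to read a set");

        int elements = bytes.duplicate().getInt();
        System.out.println("would validate " + elements + " elements");
    }

    public static void main(String[] args)
    {
        validateSet(ByteBuffer.allocate(0)); // no longer throws
        System.out.println("empty set accepted");
    }
}
{code}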



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15388) Add compaction allocation measurement test to support compaction gc optimization.

2020-03-09 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055216#comment-17055216
 ] 

David Capwell commented on CASSANDRA-15388:
---

bq. This is not meant to be in a state where it can be plugged into our ci 
process.

Sure, it would be good for this to evolve over time, but that's not a blocker 
here.

The new changes are fine; only nits really left (though I would prefer 
isAgentLoaded, since the logs are too dense and it's easy to miss).

+1

> Add compaction allocation measurement test to support compaction gc 
> optimization. 
> --
>
> Key: CASSANDRA-15388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15388
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local/Compaction
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> This adds a test that is able to quickly and accurately measure the effect of 
> potential gc optimizations against a wide range of (synthetic) compaction 
> workloads. This test accurately measures allocation rates from 16 workloads 
> in less than 2 minutes.
> This test uses google’s {{java-allocation-instrumenter}} agent to measure the 
> workloads. Measurements using this agent are very accurate and pretty 
> repeatable from run to run, with most variance being negligible (1-2 bytes 
> per partition), although workloads with larger but fewer partitions vary a 
> bit more (still less than 0.03%).
> The thinking behind this patch is that with compaction, we’re generally 
> interested in the memory allocated per partition, since garbage scales more 
> or less linearly with the number of partitions compacted. So measuring 
> allocation from a small number of partitions that otherwise represent real 
> world use cases is a good enough approximation.
> In addition to helping with compaction optimizations, this test could be used 
> as a template for future optimization work. This pattern could also be used 
> to set allocation limits on workloads/operations and fail CI if the 
> allocation behavior changes past some threshold. 
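
For readers unfamiliar with the agent, a minimal sketch of measuring 
per-partition allocation with {{java-allocation-instrumenter}} might look like 
this (the JVM must be started with the agent's -javaagent flag; the workload 
is a placeholder, not the test from the patch):

{code:java}
import com.google.monitoring.runtime.instrumentation.AllocationRecorder;
import com.google.monitoring.runtime.instrumentation.Sampler;
import java.util.concurrent.atomic.AtomicLong;

public class AllocationMeter
{
    static final int PARTITIONS = 1000; // workload size, for illustration

    public static void main(String[] args)
    {
        AtomicLong allocated = new AtomicLong();
        // Invoked by the agent for every allocation; without -javaagent it
        // never fires and the measurement is simply zero.
        Sampler sampler = (count, desc, newObj, size) -> allocated.addAndGet(size);

        AllocationRecorder.addSampler(sampler);
        try
        {
            runWorkload();
        }
        finally
        {
            AllocationRecorder.removeSampler(sampler);
        }

        System.out.printf("bytes allocated per partition: %.1f%n",
                          allocated.get() / (double) PARTITIONS);
    }

    static long runWorkload()
    {
        long sink = 0;
        for (int i = 0; i < PARTITIONS; i++)
        {
            byte[] partition = new byte[1024]; // stand-in for one partition
            sink += partition.length;
        }
        return sink;
    }
}
{code}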



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055189#comment-17055189
 ] 

Benjamin Lerer commented on CASSANDRA-15557:


Sorry, there is an issue with the patch, as pointed out by [~jasonstack] in 
CASSANDRA-15303. The timestamp needs to be set during the {{execution}} phase 
and not during the {{prepare}} one. Otherwise, if the statement is prepared by 
the user, it will reuse the same timestamp every time it is executed. 
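
To illustrate the distinction (a toy model, not the actual statement code): 
capturing the timestamp at prepare time freezes it across executions, while 
computing it at execution time does not.

{code:java}
import java.util.function.LongSupplier;

public class TimestampCapture
{
    // Broken: the timestamp is fixed when the statement is "prepared",
    // so every execution reuses the same value.
    static LongSupplier prepareBroken()
    {
        long ts = System.currentTimeMillis();
        return () -> ts;
    }

    // Correct: the timestamp is computed on each execution.
    static LongSupplier prepareFixed()
    {
        return System::currentTimeMillis;
    }

    public static void main(String[] args) throws InterruptedException
    {
        LongSupplier broken = prepareBroken();
        LongSupplier fixed = prepareFixed();
        long b1 = broken.getAsLong();
        long f1 = fixed.getAsLong();
        Thread.sleep(10);
        System.out.println("broken reuses timestamp: " + (broken.getAsLong() == b1)); // true
        System.out.println("fixed reuses timestamp:  " + (fixed.getAsLong() == f1));  // false
    }
}
{code}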

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15557:
---
Status: Patch Available  (was: Ready to Commit)

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-15557:
--
Reviewers: Benjamin Lerer, Aleksey Yeschenko  (was: Benjamin Lerer)
   Status: Review In Progress  (was: Patch Available)

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055175#comment-17055175
 ] 

Aleksey Yeschenko commented on CASSANDRA-15557:
---

LGTM as well.

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-15557:
--
Status: Ready to Commit  (was: Review In Progress)

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent

2020-03-09 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055165#comment-17055165
 ] 

David Capwell commented on CASSANDRA-15564:
---

[~ifesdjeen] [~jasonstack] I have replied or made changes based on the feedback; please review.

> Refactor repair coordinator so errors are consistent
> 
>
> Key: CASSANDRA-15564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15564
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Consistency/Repair
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> This is to split the change in CASSANDRA-15399 so the refactor is isolated 
> out.
> Currently the repair coordinator special cases the exit cases at each call 
> site; this makes it so that errors can be inconsistent and there are cases 
> where proper complete isn't done (proper notifications, and forgetting to 
> update ActiveRepairService).
> [Circle 
> CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth

2020-03-09 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055162#comment-17055162
 ] 

Benedict Elliott Smith commented on CASSANDRA-15369:


I think _probably_ it is preferable to generate fake row deletions where 
possible, since their semantics are much better than those of range 
tombstones.  If the user is lucky, they might never see a range tombstone.

Since it's in any case impossible today to deal with range tombstones, we need 
a separate effort there, so it's probably reasonable to leave unsolved for now 
the cases that _require_ fake RTs.  We will either need to guarantee that RTs 
are replicated as inserted (without any of the subdivisions we currently 
produce), or that they are only accounted for in the digest via non-RT data 
(since otherwise there seems to be no possible way to ensure a consistent 
digest for a response).  Either way, it's probably better to do our best to 
avoid the scenario altogether, and use row deletions wherever possible.

> Fake row deletions and range tombstones, causing digest mismatch and sstable 
> growth
> ---
>
> Key: CASSANDRA-15369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>  ** Serving from a {{Memtable}}, we will generate fake row deletions
>  ** Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone 
> markers for any range tombstone that begins or ends outside of the limit of 
> the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each 
> slice/clustering
> Unfortunately, these different behaviours can lead to very different data 
> stored in sstables until a full repair is run.  When we read-repair, we only 
> send these fake deletions or range tombstones.  A fake row deletion, a 
> clustering RT, and a slice RT each produce a different digest.  So for each 
> single point lookup we can produce a digest mismatch twice, and until a full 
> repair is run we can encounter an unlimited number of digest mismatches 
> across different overlapping queries.
> Relatedly, this seems a more problematic variant of our atomicity failures 
> caused by our monotonic reads, since RTs can have an atomic effect across (up 
> to) the entire partition, whereas the propagation may happen on an 
> arbitrarily small portion.  If the RT exists on only one node, this could 
> plausibly lead to a fairly problematic scenario if that node fails before the 
> range can be repaired. 
> At the very least, this behaviour can lead to an almost unlimited amount of 
> extraneous data being stored until the range is repaired and compaction 
> happens to overwrite the sub-range RTs and row deletions.






[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent

2020-03-09 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055146#comment-17055146
 ] 

Alex Petrov commented on CASSANDRA-15564:
-

[~dcapwell] thank you for the patch! The change looks good overall. I've added 
several small comments on github. As discussed offline, we also need to change 
the initialization order to make sure outbound message sending is wrapping fake 
messaging and not vice versa.

I'm wondering whether we should stick to {{runInbound}} in the builder, or 
switch to {{filters().inbound()}} or something similar, where {{filters()}} 
would return some interface that has {{inbound}} and {{outbound}} (see the 
sketch after this comment). This could even leave most things more or less the 
same implementation-wise.

Should we add a test that ensures the order (in other words, that any message 
first goes through the outbound filter, and only then through the inbound 
filter)?

Also, it might make sense to test both in- and out-bound filters in 
{{testMessageMatching}}, wdyt?
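
For illustration, a minimal sketch of that {{filters()}} shape; every name 
here is hypothetical, not the existing in-jvm dtest API:

{code:java}
// Hypothetical sketch only: one entry point exposing both directions, so the
// inbound and outbound sides can share most of the current implementation.
public interface MessageFilters
{
    Direction inbound();
    Direction outbound();

    interface Direction
    {
        // Register a filter; the handle lets a test turn it off again.
        Registration messagesMatching(Matcher matcher);
    }

    interface Matcher
    {
        boolean matches(int fromNode, int toNode, Object message);
    }

    interface Registration
    {
        void off();
    }
}
{code}

A test for the ordering question could then register one matcher per direction 
and assert that the outbound one fires first.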

> Refactor repair coordinator so errors are consistent
> 
>
> Key: CASSANDRA-15564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15564
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Consistency/Repair
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> This is to split the change in CASSANDRA-15399 so the refactor is isolated 
> out.
> Currently the repair coordinator special cases the exit cases at each call 
> site; this makes it so that errors can be inconsistent and there are cases 
> where proper complete isn't done (proper notifications, and forgetting to 
> update ActiveRepairService).
> [Circle 
> CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]






[jira] [Updated] (CASSANDRA-15303) drop column statement should not initialize timestamp because of statement cache

2020-03-09 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15303:
---
Reviewers: Benjamin Lerer

> drop column statement should not initialize timestamp because of statement 
> cache
> 
>
> Key: CASSANDRA-15303
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15303
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL/Interpreter
>Reporter: ZhaoYang
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.x
>
>
> When executing a drop-column query without a timestamp, 
> {{AlterTableStatement#Raw}} initializes a default timestamp and then the 
> prepared statement is cached. The same timestamp will be reused for the same 
> drop-column query.  (related to CASSANDRA-13426)
>  
> The fix is to use a NULL timestamp to indicate: use the statement execution 
> time instead.
>  
> patch: 
> [https://github.com/jasonstack/cassandra/commits/fix-drop-column-timestamp]
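
For illustration, the caching bug and the NULL-timestamp fix described above 
can be sketched with hypothetical stand-in code (this is not the patch):

{code:java}
// Hypothetical sketch: a prepared statement that caches a timestamp vs. one
// that defers resolution to execution time.
final class DropColumnStatementSketch
{
    // null is the sentinel for "no USING TIMESTAMP supplied": resolve the
    // timestamp at execution time rather than at prepare time.
    private final Long explicitTimestampMicros;

    DropColumnStatementSketch(Long explicitTimestampMicros)
    {
        this.explicitTimestampMicros = explicitTimestampMicros;
    }

    long timestampForExecution(long executionTimeMicros)
    {
        // Buggy variant: the parser resolves a default timestamp at prepare
        // time, the statement is cached, and every later execution of the
        // same query text silently reuses that first timestamp.
        return explicitTimestampMicros != null ? explicitTimestampMicros
                                               : executionTimeMicros;
    }
}
{code}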






[jira] [Comment Edited] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth

2020-03-09 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055129#comment-17055129
 ] 

ZhaoYang edited comment on CASSANDRA-15369 at 3/9/20, 4:11 PM:
---

bq.  initially addressing only the differing ways we create fake deletions

do you mean unifying the tombstone creation from memtable/sstable/slice-query 
to only range tombstone markers?


was (Author: jasonstack):
bq.  initially addressing only the differing ways we create fake deletions

do you mean unifying the tombstone creation from memtable/sstable/slice-query 
to only row tombstone markers?

> Fake row deletions and range tombstones, causing digest mismatch and sstable 
> growth
> ---
>
> Key: CASSANDRA-15369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>  ** Serving from a {{Memtable}}, we will generate fake row deletions
>  ** Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone 
> markers for any range tombstone that begins or ends outside of the limit of 
> the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each 
> slice/clustering
> Unfortunately, these different behaviours can lead to very different data 
> stored in sstables until a full repair is run.  When we read-repair, we only 
> send these fake deletions or range tombstones.  A fake row deletion, a 
> clustering RT, and a slice RT each produce a different digest.  So for each 
> single point lookup we can produce a digest mismatch twice, and until a full 
> repair is run we can encounter an unlimited number of digest mismatches 
> across different overlapping queries.
> Relatedly, this seems a more problematic variant of our atomicity failures 
> caused by our monotonic reads, since RTs can have an atomic effect across (up 
> to) the entire partition, whereas the propagation may happen on an 
> arbitrarily small portion.  If the RT exists on only one node, this could 
> plausibly lead to a fairly problematic scenario if that node fails before the 
> range can be repaired. 
> At the very least, this behaviour can lead to an almost unlimited amount of 
> extraneous data being stored until the range is repaired and compaction 
> happens to overwrite the sub-range RTs and row deletions.






[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15557:
---
Status: Patch Available  (was: Ready to Commit)

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}






[jira] [Commented] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth

2020-03-09 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055129#comment-17055129
 ] 

ZhaoYang commented on CASSANDRA-15369:
--

bq.  initially addressing only the differing ways we create fake deletions

do you mean unifying the tombstone creation from memtable/sstable/slice-query 
to only row tombstone markers?

> Fake row deletions and range tombstones, causing digest mismatch and sstable 
> growth
> ---
>
> Key: CASSANDRA-15369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>  ** Serving from a {{Memtable}}, we will generate fake row deletions
>  ** Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone 
> markers for any range tombstone that begins or ends outside of the limit of 
> the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each 
> slice/clustering
> Unfortunately, these different behaviours can lead to very different data 
> stored in sstables until a full repair is run.  When we read-repair, we only 
> send these fake deletions or range tombstones.  A fake row deletion, a 
> clustering RT, and a slice RT each produce a different digest.  So for each 
> single point lookup we can produce a digest mismatch twice, and until a full 
> repair is run we can encounter an unlimited number of digest mismatches 
> across different overlapping queries.
> Relatedly, this seems a more problematic variant of our atomicity failures 
> caused by our monotonic reads, since RTs can have an atomic effect across (up 
> to) the entire partition, whereas the propagation may happen on an 
> arbitrarily small portion.  If the RT exists on only one node, this could 
> plausibly lead to a fairly problematic scenario if that node fails before the 
> range can be repaired. 
> At the very least, this behaviour can lead to an almost unlimited amount of 
> extraneous data being stored until the range is repaired and compaction 
> happens to overwrite the sub-range RTs and row deletions.






[jira] [Commented] (CASSANDRA-15313) Fix flaky - ChecksummingTransformerTest - org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest

2020-03-09 Thread Stefan Podkowinski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055118#comment-17055118
 ] 

Stefan Podkowinski commented on CASSANDRA-15313:


Reverting CASSANDRA-1 fixes corruptionCausesFailure with seed 
71671740653044L for me. But the semantics of the generators change with that 
as well, so I'm not 100% sure it's the actual cause.

> Fix flaky - ChecksummingTransformerTest - 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
> ---
>
> Key: CASSANDRA-15313
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15313
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15313-hack.patch
>
>
> During the recent runs, this test appears to be flaky.
> Example failure: 
> [https://circleci.com/gh/vinaykumarchella/cassandra/459#tests/containers/94]
> corruptionCausesFailure-compression - 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
> {code:java}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at org.quicktheories.impl.Precursor.(Precursor.java:17)
>   at 
> org.quicktheories.impl.ConcreteDetachedSource.(ConcreteDetachedSource.java:8)
>   at 
> org.quicktheories.impl.ConcreteDetachedSource.detach(ConcreteDetachedSource.java:23)
>   at org.quicktheories.generators.Retry.generate(CodePoints.java:51)
>   at 
> org.quicktheories.generators.Generate.lambda$intArrays$10(Generate.java:190)
>   at 
> org.quicktheories.generators.Generate$$Lambda$17/1847008471.generate(Unknown 
> Source)
>   at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$mix$10(Gen.java:184)
>   at org.quicktheories.core.Gen$$Lambda$45/802243390.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$flatMap$5(Gen.java:93)
>   at org.quicktheories.core.Gen$$Lambda$48/363509958.generate(Unknown 
> Source)
>   at 
> org.quicktheories.dsl.TheoryBuilder4.lambda$prgnToTuple$12(TheoryBuilder4.java:188)
>   at 
> org.quicktheories.dsl.TheoryBuilder4$$Lambda$40/2003496028.generate(Unknown 
> Source)
>   at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255)
>   at org.quicktheories.core.FilteredGenerator.generate(Gen.java:225)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.impl.Core.generate(Core.java:150)
>   at org.quicktheories.impl.Core.shrink(Core.java:103)
>   at org.quicktheories.impl.Core.run(Core.java:39)
>   at org.quicktheories.impl.TheoryRunner.check(TheoryRunner.java:35)
>   at org.quicktheories.dsl.TheoryBuilder4.check(TheoryBuilder4.java:150)
>   at 
> org.quicktheories.dsl.TheoryBuilder4.checkAssert(TheoryBuilder4.java:162)
>   at 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest.corruptionCausesFailure(ChecksummingTransformerTest.java:87)
> {code}






[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15557:
---
Status: Ready to Commit  (was: Review In Progress)

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}






[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15557:
---
Test and Documentation Plan: The patch is a fix for a flaky test.
 Status: Patch Available  (was: In Progress)

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}






[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15557:
---
Reviewers: Benjamin Lerer  (was: Benjamin Lerer)
   Status: Review In Progress  (was: Patch Available)

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}






[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055110#comment-17055110
 ] 

Benjamin Lerer commented on CASSANDRA-15557:


The patch looks good to me.

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}






[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-09 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055108#comment-17055108
 ] 

Chandrasekhar Thumuluru commented on CASSANDRA-15397:
-

{quote}
I'm not sure if assuming long will be a good idea.
{quote}
I meant in the context of generics and about the performance.  

I'll make necessary changes, compare it again and post the results. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>Reporter: Chandrasekhar Thumuluru
>Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated; they are 
> recreated each time a mutation is required. This can be an issue during 
> repairs; in fact, we noticed such issues during repair. 
> Since lists are cache-friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (the idea is to eliminate intervals using 
> the start and end points of the search interval). 
> Based on the tests I ran, I noticed that Binary Search based elimination 
> almost always performs similarly to IntervalTree or outperforms 
> IntervalTree-based search. The cost of IntervalTree construction is also 
> substantial and produces a lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval, with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage: 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took, in nanos. 
> PS: 
> # For the purpose of the test, I simplified the IntervalTree by removing the 
> data portion of the interval, and modified the template version (Java 
> generics) to a specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 
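
To make the elimination idea concrete, here is a minimal sketch under the same 
simplification as the attached files (intervals stripped of their data 
portion); it is illustrative only, not the attached patch:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Intervals are kept in flat arrays sorted by start point. One binary search
// cuts off every interval whose start lies past the search end; the remaining
// prefix is walked linearly (cache-friendly) and filtered on its end point.
final class IntervalListSketch
{
    private final long[] starts; // sorted ascending
    private final long[] ends;   // ends[i] pairs with starts[i]

    IntervalListSketch(long[] starts, long[] ends)
    {
        this.starts = starts;
        this.ends = ends;
    }

    List<Integer> overlapping(long searchStart, long searchEnd)
    {
        List<Integer> hits = new ArrayList<>();
        int limit = upperBound(starts, searchEnd); // first index with starts[i] > searchEnd
        for (int i = 0; i < limit; i++)
            if (ends[i] >= searchStart) // overlap iff starts[i] <= searchEnd && ends[i] >= searchStart
                hits.add(i);
        return hits;
    }

    private static int upperBound(long[] a, long key)
    {
        int lo = 0, hi = a.length;
        while (lo < hi)
        {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
{code}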






[jira] [Comment Edited] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-09 Thread Chandrasekhar Thumuluru (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055108#comment-17055108
 ] 

Chandrasekhar Thumuluru edited comment on CASSANDRA-15397 at 3/9/20, 3:50 PM:
--

{quote}
I'm not sure if assuming long will be a good idea.
{quote}
I meant in the context of generics and not about the performance.  I'll make 
necessary changes, compare it again and post the results. 


was (Author: cthumuluru):
{quote}
I'm not sure if assuming long will be a good idea.
{quote}
I meant in the context of generics and about the performance.  

I'll make necessary changes, compare it again and post the results. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>Reporter: Chandrasekhar Thumuluru
>Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated; they are 
> recreated each time a mutation is required. This can be an issue during 
> repairs; in fact, we noticed such issues during repair. 
> Since lists are cache-friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (the idea is to eliminate intervals using 
> the start and end points of the search interval). 
> Based on the tests I ran, I noticed that Binary Search based elimination 
> almost always performs similarly to IntervalTree or outperforms 
> IntervalTree-based search. The cost of IntervalTree construction is also 
> substantial and produces a lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval, with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage: 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took, in nanos. 
> PS: 
> # For the purpose of the test, I simplified the IntervalTree by removing the 
> data portion of the interval, and modified the template version (Java 
> generics) to a specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 




[jira] [Updated] (CASSANDRA-15601) Ensure repaired data tracking reads a consistent amount of data across replicas

2020-03-09 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-15601:

Reviewers: Aleksey Yeschenko

> Ensure repaired data tracking reads a consistent amount of data across 
> replicas
> ---
>
> Key: CASSANDRA-15601
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15601
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> When generating a digest for repaired data tracking, the amount of repaired 
> data that needs to be read may depend on the unrepaired data on the replica. 
> As this may vary between replicas, digest mismatches can be reported even 
> though the repaired data may actually be in sync.
> For example, two replicas, A & B and a table like
> {code}
> CREATE TABLE t (pk int, ck int, PRIMARY KEY (pk, ck)) WITH CLUSTERING ORDER 
> BY (ck DESC); 
> Unrepaired
> ===
> Instance A
> (0, 5)
> Instance B
> (0, 6)
> (0, 5)
> Repaired (Both A & B)
> =
> (0, 4)
> (0, 3)
> (0, 2)
> (0, 1)
> (0, 0)
> SELECT * FROM t WHERE pk = 0 LIMIT 3;
> {code}
> Instance A would read (0, 5) from the unrepaired set and (0, 4) (0, 3) from 
> the repaired set. 
>  Instance B would read (0, 6) (0, 5) from its unrepaired set and just (0, 4) 
> from repaired data.
> Unrepaired row/range/partition tombstones shadowing repaired data and present 
> on some replicas but not others will have the opposite effect, with more 
> repaired data being read in comparison.
>  To fix this, when repaired data tracking is in effect each replica needs to 
> overread during a full data read. Replicas should read up to {{LIMIT}} (i.e. 
> the {{DataLimit}} of the {{ReadCommand}}) from the repaired set, regardless 
> of how much is read from the unrepaired data. At the point where that amount 
> of repaired data has been read, replica should stop updating the digest. So 
> if unrepaired tombstones cause more than {{LIMIT}} repaired data to be read, 
> the digest is only calculated over the first {{LIMIT}}-worth of repaired data.
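
For illustration, a minimal sketch of that rule with hypothetical types (not 
the real iterators or digest):

{code:java}
import java.util.Iterator;

// Hypothetical sketch: cap the repaired-data digest at LIMIT rows, however
// many rows the unrepaired set contributed to the merged result.
final class RepairedDigestSketch
{
    interface Row {}
    interface Digest { void update(Row row); }

    static void digestRepaired(Iterator<Row> repaired, Digest digest, int limit)
    {
        int counted = 0;
        while (repaired.hasNext() && counted < limit)
        {
            digest.update(repaired.next());
            counted++;
        }
        // Repaired rows past 'limit' (read only because unrepaired tombstones
        // shadowed earlier rows) are deliberately excluded, so replicas with
        // different unrepaired data still digest the same repaired prefix.
    }
}
{code}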






[jira] [Commented] (CASSANDRA-15566) Repair coordinator can hang under some cases

2020-03-09 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055063#comment-17055063
 ] 

David Capwell commented on CASSANDRA-15566:
---

bq. C* 4.0 code is quite new to me...

Me too :)

One of the best ways to start is testing; we need more tests to show where 
repair needs improvement. When I joined this project I asked operators about 
their top pain points with repair (all were from 2.1), and as I write tests I 
see that 4.0 has the same issues.  More tests which show new problem areas 
would be great!

I think your 5 classifications are good, though 1 and 2 can merge; our 
networking is lossy (not a bad thing; under load it's either crash or drop). 
I would love a smoke test which constantly runs user/operator tasks under 
"load" (we should be able to artificially lower resources). This test would 
also help show whether the different subsystems work well or need improvement.

About a participant crashing, I added a jvm dtest which shows this is handled, 
assuming the failure detector detects it (restarting a node also fails the 
repair).

About detection and abort, I agree it should be external for now. Anything the 
external tools need must be identified and tested to show it works (for 
example, does aborting a repair work?). 

> Repair coordinator can hang under some cases
> 
>
> Key: CASSANDRA-15566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15566
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Repair coordination makes a few assumptions about message delivery which 
> cause it to hang forever when those assumptions don't hold true: that a 
> fire-and-forget message will not get rejected (a participant has an issue 
> and rejects the message), and that a very delayed message will one day be 
> seen (messages can be dropped under load, or when the failure detector 
> thinks a node is bad but it is just GCing).
> Given this, and the desire to have better observability for repair (see 
> CASSANDRA-15399), coordination should be changed into a request/response 
> pattern (with retries) and polling (for validation status and MerkleTree 
> sending).  This would allow the coordinator to detect changes in state (it 
> knew a participant was working on validation, but the participant no longer 
> knows about the validation task), and to be able to recover from ephemeral issues.
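
For illustration, a minimal sketch of the request/response-plus-polling shape; 
all names are hypothetical, not the real repair classes:

{code:java}
// Hypothetical sketch: the coordinator polls a participant's validation status
// over request/response instead of waiting forever on a fire-and-forget message.
final class ValidationPollerSketch
{
    enum Status { UNKNOWN, RUNNING, COMPLETED, FAILED }

    interface Participant
    {
        Status validationStatus(); // request/response, retried by the caller
    }

    static boolean awaitValidation(Participant participant, long backoffMillis)
        throws InterruptedException
    {
        while (true)
        {
            switch (participant.validationStatus())
            {
                case COMPLETED:
                    return true;
                case FAILED:
                    return false;
                case UNKNOWN:
                    // The participant no longer knows about a task it accepted
                    // (e.g. it restarted): fail fast instead of hanging forever.
                    return false;
                case RUNNING:
                    Thread.sleep(backoffMillis); // still in progress; poll again
            }
        }
    }
}
{code}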






[jira] [Assigned] (CASSANDRA-14587) TrueDiskSpaceUsed overcounts snapshots

2020-03-09 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova reassigned CASSANDRA-14587:
---

Assignee: (was: Ekaterina Dimitrova)

> TrueDiskSpaceUsed overcounts snapshots
> --
>
> Key: CASSANDRA-14587
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14587
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/nodetool
> Environment: Debian 8
> Cassandra 3.11.2
>Reporter: Elliott Sims
>Priority: Low
>
> Running 'nodetool listsnapshots' seems to overcount "TrueDiskSpaceUsed" under 
> some circumstances.  Specifically when there's a large number of snapshots.  
> I suspect that it's not deduplicating space used when multiple snapshots 
> share sstables that are not part of the current table.
> Results of "nodetool listsnapshots":
> Total TrueDiskSpaceUsed: 396.11 MiB
> Results of "du -hcs" on the table's directory:
> 18M    total
> This is 50+ snapshots (every minute) run with "-t  -sf 
> --column-family  "
> The results of a "du -hcs -L  "TrueDiskSpaceUsed"
> I have only tested against 3.11.2, but have no reason to believe it's unique 
> to that version or even 3.x.






[jira] [Comment Edited] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys

2020-03-09 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984658#comment-16984658
 ] 

Michael Semb Wever edited comment on CASSANDRA-14365 at 3/9/20, 2:33 PM:
-

With new tests… (the test against trunk also needed a rewrite because of 
{{TableMetadata.Builder}}):

||branch||circleci||jenkins pipeline||
|[cassandra_2.2_14365|https://github.com/apache/cassandra/compare/cassandra-2.2...thelastpickle:mck/cassandra-2.2_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-2.2_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/40/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/40]|
|[cassandra_3.0_14365|https://github.com/apache/cassandra/compare/cassandra-3.0...thelastpickle:mck/cassandra-3.0_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.0_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/41/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/41]|
|[cassandra_3.11_14365|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.11_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/42/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/42]|
|[trunk_14365|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/43/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/43]|


was (Author: michaelsembwever):
With new tests… (the test against trunk also needed a rewrite because of 
{{TableMetadata.Builder}}):

||branch||circleci||jenkins pipeline||
|[cassandra_3.0_14365|https://github.com/apache/cassandra/compare/cassandra-3.0...thelastpickle:mck/cassandra-3.0_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.0_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/41/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/41]|
|[cassandra_3.11_14365|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.11_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/42/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/42]|
|[trunk_14365|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/43/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/43]|

> Commit log replay failure for static columns with collections in clustering 
> keys
> 
>
> Key: CASSANDRA-14365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14365
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> In the old storage engine, static cells with a collection as part of the 
> clustering key fail to validate because a 0 byte collection (like in the cell 
> name of a static cell) isn't valid.
> To reproduce:
> 1.
> {code:java}
> CREATE TABLE test.x (
> id int,
> id2 frozen>,
> st int static,
> PRIMARY KEY (id, id2)
> );
> INSERT INTO test.x (id, st) VALUES (1, 2);
> {code}
> 2.
>  Kill the cassandra process
> 3.
>  Restart cassandra to replay the commitlog
> Outcome:
> {noformat}
> ERROR [main] 2018-04-05 04:58:23,741 JVMStabilityInspector.java:99 - Exiting 
> due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: 
> Unexpected error deserializing mutation; saved to 
> /tmp/mutation3825739904516830950dat.  This may be caused by replaying a 
> mutation against a table with the same name but incompatible schema.  
> Exception follows: org.apache.cassandra.serializers.MarshalException: Not 
> enough bytes to read a set
> at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:638)
>  [main/:na]
> at 
> 

[jira] [Comment Edited] (CASSANDRA-14365) Commit log replay failure for static columns with collections in clustering keys

2020-03-09 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984658#comment-16984658
 ] 

Michael Semb Wever edited comment on CASSANDRA-14365 at 3/9/20, 2:32 PM:
-

With new tests… (the test against trunk also needed a rewrite because of 
{{TableMetadata.Builder}}):

||branch||circleci||jenkins pipeline||
|[cassandra_3.0_14365|https://github.com/apache/cassandra/compare/cassandra-3.0...thelastpickle:mck/cassandra-3.0_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.0_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/41/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/41]|
|[cassandra_3.11_14365|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fcassandra-3.11_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/42/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/42]|
|[trunk_14365|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_14365]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk_14365]|[!https://builds.apache.org/job/Cassandra-devbranch/43/badge/icon!|https://builds.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/43]|


was (Author: michaelsembwever):
With new tests… (the test against trunk also needed a rewrite because of 
{{TableMetadata.Builder}}):

||branch||circleci||asf jenkins tests||asf jenkins dtests||
|[cassandra-2.2_14365|https://github.com/apache/cassandra/compare/cassandra-2.2...thelastpickle:mck/cassandra-2.2_14365]|[circleci|https://circleci.com/workflow-run/d500cc5f-1d87-4beb-815e-9931f8e84d95]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/29//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/29/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/703//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/703]|
|[cassandra-3.0_14365|https://github.com/apache/cassandra/compare/cassandra-3.0...thelastpickle:mck/cassandra-3.0_14365]|[circleci|https://circleci.com/workflow-run/747730de-573a-4e80-98f0-4defa14db909]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/33//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/33/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/706//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/706]|
|[cassandra-3.11_14365|https://github.com/apache/cassandra/compare/cassandra-3.11...thelastpickle:mck/cassandra-3.11_14365]|[circleci|https://circleci.com/workflow-run/86ca8a61-5cc2-40db-84a4-1210cf44f285]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/34//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/34/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/707//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/707]|
|[trunk_14365|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_14365]|[circleci|https://circleci.com/workflow-run/a034a6b1-a7d7-43cd-b1ab-14769799b30e]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/35//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-pipeline/35/]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/707//badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/707]|

> Commit log replay failure for static columns with collections in clustering 
> keys
> 
>
> Key: CASSANDRA-14365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14365
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Normal
>
> In the old storage engine, static cells with a collection as part of the 
> clustering key fail to validate because a 0 byte collection (like in the cell 
> name of a static cell) isn't valid.
> To reproduce:
> 1.
> {code:java}
> CREATE TABLE test.x (
> id int,
> id2 frozen>,
> st int static,
> PRIMARY KEY (id, id2)
> );
> INSERT INTO test.x (id, st) VALUES (1, 2);
> {code}
> 2.
>  Kill the cassandra process
> 3.
>  

[jira] [Comment Edited] (CASSANDRA-15543) flaky test org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement

2020-03-09 Thread Kevin Gallardo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054990#comment-17054990
 ] 

Kevin Gallardo edited comment on CASSANDRA-15543 at 3/9/20, 2:08 PM:
-

Sounds good, thanks, hope you had a good weekend :)

As a summary, 

In any case, I believe passing an immutable copy of the 
{{failureReasonByEndpoint}} map to the constructor of 
Read/WriteFailureException would reduce the chances for the {{number of 
failures}} and the failure messages to be inconsistent.

In addition to that, there's the remaining question of the behavior of 
ReadCallback when failures happen (do we fail fast? or do we wait for all 
responses to come back/timeout?). Depending on the outcome of that, the test 
that is flaky at the moment would need to be adjusted to expect 1 *or* 2 
failures in the response.
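
For illustration, a minimal sketch of the snapshot idea with hypothetical 
stand-in types (the real constructor takes the endpoint-to-reason map; only 
the copy matters here):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.google.common.collect.ImmutableMap;

// Hypothetical sketch: responses may still be arriving while the exception is
// built, so freeze the map first; the count and contents then always agree.
final class FailureSnapshotSketch
{
    private final Map<String, String> failureReasonByEndpoint = new ConcurrentHashMap<>();

    void onFailure(String endpoint, String reason)
    {
        failureReasonByEndpoint.put(endpoint, reason);
    }

    RuntimeException buildFailureException()
    {
        Map<String, String> snapshot = ImmutableMap.copyOf(failureReasonByEndpoint);
        return new RuntimeException(snapshot.size() + " failure(s): " + snapshot);
    }
}
{code}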


was (Author: newkek):
Sounds good.

As a summary, 

In any case, I believe passing an immutable copy of the 
{{failureReasonByEndpoint}} map to the constructor of 
Read/WriteFailureException would reduce the chances for the {{number of 
failures}} and the failure messages to be inconsistent.

In addition to that, there's the remaining question of the behavior of 
ReadCallback when failures happen (do we fail fast? or do we wait for all 
responses to come back/timeout?). Depending on the outcome of that, the test 
that is flaky at the moment would need to be adjusted to expect 1 *or* 2 
failures in the response.

> flaky test 
> org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement
> ---
>
> Key: CASSANDRA-15543
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15543
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: Kevin Gallardo
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> This fails infrequently, last seen failure was on java 8
> {code}
> junit.framework.AssertionFailedError
>   at 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest.readWithSchemaDisagreement(DistributedReadWritePathTest.java:276)
> {code}






[jira] [Comment Edited] (CASSANDRA-15543) flaky test org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement

2020-03-09 Thread Kevin Gallardo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054990#comment-17054990
 ] 

Kevin Gallardo edited comment on CASSANDRA-15543 at 3/9/20, 2:07 PM:
-

Sounds good.

As a summary, 

In any case, I believe passing an immutable copy of the 
{{failureReasonByEndpoint}} map to the constructor of 
Read/WriteFailureException would reduce the chances for the {{number of 
failures}} and the failure messages to be inconsistent.

In addition to that, there's the remaining question of the behavior of 
ReadCallback when failures happen (do we fail fast? or do we wait for all 
responses to come back/timeout?). Depending on the outcome of that, the test 
that is flaky at the moment would need to be adjusted to expect 1 *or* 2 
failures in the response.


was (Author: newkek):
Sounds good.

As a summary, 

In any case, I believe passing an immutable copy of the 
{{failureReasonByEndpoint}} map to the constructor of 
Read/WriteFailureException would reduce the chances for the {{number of 
failures}} and the failure messages to be inconsistent.

In addition to that, there's the remaining question of the behavior of 
ReadCallback when failures happen (do we fail fast? or do we wait for all 
responses to come back/timeout?). Depending on the outcome of that the test 
that is flaky at the moment would need to be adjusted to expect 1 *or* 2 
failures in the response.

> flaky test 
> org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement
> ---
>
> Key: CASSANDRA-15543
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15543
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: Kevin Gallardo
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> This fails infrequently, last seen failure was on java 8
> {code}
> junit.framework.AssertionFailedError
>   at 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest.readWithSchemaDisagreement(DistributedReadWritePathTest.java:276)
> {code}






[jira] [Commented] (CASSANDRA-15543) flaky test org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement

2020-03-09 Thread Kevin Gallardo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054990#comment-17054990
 ] 

Kevin Gallardo commented on CASSANDRA-15543:


Sounds good.

As a summary, 

In any case, I believe passing an immutable copy of the 
{{failureReasonByEndpoint}} map to the constructor of 
Read/WriteFailureException would reduce the chances for the {{number of 
failures}} and the failure messages to be inconsistent.

In addition to that, there's the remaining question of the behavior of 
ReadCallback when failures happen (do we fail fast? or do we wait for all 
responses to come back/timeout?). Depending on the outcome of that the test 
that is flaky at the moment would need to be adjusted to expect 1 *or* 2 
failures in the response.

> flaky test 
> org.apache.cassandra.distributed.test.SimpleReadWriteTest.readWithSchemaDisagreement
> ---
>
> Key: CASSANDRA-15543
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15543
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: Kevin Gallardo
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> This fails infrequently, last seen failure was on java 8
> {code}
> junit.framework.AssertionFailedError
>   at 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest.readWithSchemaDisagreement(DistributedReadWritePathTest.java:276)
> {code}






[jira] [Updated] (CASSANDRA-15626) Need microsecond precision for dropped columns so we can avoid timestamp issues

2020-03-09 Thread Ryan Svihla (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Svihla updated CASSANDRA-15626:

Description: 
In CASSANDRA-15557 the fix for the flaky test reimplements the logic from 
CASSANDRA-12997, which was removed as part of CASSANDRA-13426.

However, since dropped columns are stored at millisecond precision instead of 
microsecond precision, and ClientState.getTimestamp adds microseconds on each 
call, we lose precision on save and some writes that should be dropped could 
reappear.

Note that views are affected as well:
 
[https://github.com/apache/cassandra/blob/cb83fbff479bb90e9abeaade9e0f8843634c974d/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L712-L716]
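
A small worked example of the precision loss (the row timestamp is the one 
from the CASSANDRA-15557 failure; the drop timestamp one microsecond later is 
assumed for illustration; the rest is plain arithmetic):

{code:java}
public class DroppedColumnPrecisionDemo
{
    public static void main(String[] args)
    {
        long writeMicros = 1583753957613001L;        // row ts from ClientState.getTimestamp()
        long dropMicros  = 1583753957613002L;        // drop issued strictly after the write

        long storedDropMillis   = dropMicros / 1000; // persisted at millisecond precision
        long reloadedDropMicros = storedDropMillis * 1000;

        // After reload the drop time sorts *before* the write it should mask,
        // so the "dropped" write reappears.
        System.out.println(reloadedDropMicros);                // 1583753957613000
        System.out.println(reloadedDropMicros < writeMicros);  // true
    }
}
{code}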

  was:
In CASSANDRA-15557 the fix for the flaky test was reimplementing the logic from 
CASSANDRA-12997  which was removed as part of CASSANDRA-13426

However, since dropped columns are stored at a millisecond precision instead of 
a microsecond precision and ClientState.getTimestamp adds microseconds on each 
call we will lose the precision on save and some writes that should be dropped 
could reappear.

Note views affected as well
[https://github.com/apache/cassandra/blob/cb83fbff479bb90e9abeaade9e0f8843634c974d/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L712-L716]


> Need microsecond precision for dropped columns so we can avoid timestamp 
> issues
> ---
>
> Key: CASSANDRA-15626
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15626
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>Reporter: Ryan Svihla
>Priority: Normal
>
> In CASSANDRA-15557 the fix for the flaky test reimplements the logic 
> from CASSANDRA-12997, which was removed as part of CASSANDRA-13426.
> However, since dropped columns are stored at millisecond precision instead 
> of microsecond precision, and ClientState.getTimestamp adds microseconds on 
> each call, we lose precision on save and some writes that should be 
> dropped could reappear.
> Note that views are affected as well:
>  
> [https://github.com/apache/cassandra/blob/cb83fbff479bb90e9abeaade9e0f8843634c974d/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L712-L716]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Ryan Svihla (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054989#comment-17054989
 ] 

Ryan Svihla commented on CASSANDRA-15557:
-

New [PR|https://github.com/apache/cassandra/pull/465] 

Note: I think this also fixes the behavior described in CASSANDRA-15303. I've 
filed a new Jira for the dropped-column precision issue this causes: 
https://issues.apache.org/jira/browse/CASSANDRA-15626

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15626) Need microsecond precision for dropped columns so we can avoid timestamp issues

2020-03-09 Thread Ryan Svihla (Jira)
Ryan Svihla created CASSANDRA-15626:
---

 Summary: Need microsecond precision for dropped columns so we can 
avoid timestamp issues
 Key: CASSANDRA-15626
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15626
 Project: Cassandra
  Issue Type: Improvement
  Components: Local/SSTable
Reporter: Ryan Svihla


In CASSANDRA-15557 the fix for the flaky test reimplements the logic from 
CASSANDRA-12997, which was removed as part of CASSANDRA-13426.

However, since dropped columns are stored at millisecond precision instead of 
microsecond precision, and ClientState.getTimestamp adds microseconds on each 
call, we lose precision on save and some writes that should be dropped could 
reappear.

Note that views are affected as well:
[https://github.com/apache/cassandra/blob/cb83fbff479bb90e9abeaade9e0f8843634c974d/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L712-L716]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15557:
---
Labels: pull-request-available  (was: )

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15595) Many errors of "java.lang.AssertionError: Illegal bounds"

2020-03-09 Thread Roy Burstein (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054938#comment-17054938
 ] 

Roy Burstein commented on CASSANDRA-15595:
--

[~brandon.williams] - can you tell us what info you need in order to debug 
this issue?

> Many errors of "java.lang.AssertionError: Illegal bounds"
> -
>
> Key: CASSANDRA-15595
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15595
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Yakir Gibraltar
>Priority: Normal
> Fix For: 3.11.7
>
>
> Hi, I'm running Cassandra 3.11.6 and getting many errors like the following on all hosts:
> {code}
> ERROR [ReadStage-6] 2020-02-24 13:53:34,528 
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread 
> Thread[ReadStage-6,5,main]
> java.lang.AssertionError: Illegal bounds [-2102982480..-2102982472); size: 
> 2761628520
> at org.apache.cassandra.io.util.Memory.checkBounds(Memory.java:345) 
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at org.apache.cassandra.io.util.Memory.getLong(Memory.java:254) 
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:234)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.io.util.CompressedChunkReader$Standard.readChunk(CompressedChunkReader.java:114)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at org.apache.cassandra.cache.ChunkCache.load(ChunkCache.java:158) 
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at org.apache.cassandra.cache.ChunkCache.load(ChunkCache.java:39) 
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalLoadingCache.lambda$new$0(BoundedLocalCache.java:2949)
>  ~[caffeine-2.2.6.jar:na]
> at 
> com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$15(BoundedLocalCache.java:1807)
>  ~[caffeine-2.2.6.jar:na]
> at 
> java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853) 
> ~[na:1.8.0-zing_19.12.102.0]
> at 
> com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:1805)
>  ~[caffeine-2.2.6.jar:na]
> at 
> com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:1788)
>  ~[caffeine-2.2.6.jar:na]
> at 
> com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:97)
>  ~[caffeine-2.2.6.jar:na]
> at 
> com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:66)
>  ~[caffeine-2.2.6.jar:na]
> at 
> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:236)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.cache.ChunkCache$CachingRebufferer.rebuffer(ChunkCache.java:214)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:65)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:207)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.io.util.FileHandle.createReader(FileHandle.java:150) 
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(SSTableReader.java:1807)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.db.columniterator.AbstractSSTableIterator.(AbstractSSTableIterator.java:103)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.db.columniterator.SSTableIterator.(SSTableIterator.java:49)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:72)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:65)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.db.StorageHook$1.makeRowIterator(StorageHook.java:100) 
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndSSTablesInTimestampOrder(SinglePartitionReadCommand.java:982)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:693)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:670)
>  ~[apache-cassandra-3.11.6.jar:3.11.6]
> at 
> 


[jira] [Commented] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements

2020-03-09 Thread Aleksandr Sorokoumov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054911#comment-17054911
 ] 

Aleksandr Sorokoumov commented on CASSANDRA-14801:
--

Thank you for the comprehensive response, [~benedict]! I am quite new to the 
C* 4.0 code, so it will take me some time to ramp up. If anyone plans to work 
on this issue in the next 1-2 weeks, it probably makes sense for me to work on 
something else; otherwise, I'd be happy to contribute.

In the latter case, over the next couple of days I plan to read up on how 
pending ranges are calculated and which changes since 3.11 introduced the bug. 
Then I'll write a test case that reproduces the issue.

> calculatePendingRanges no longer safe for multiple adjacent range movements
> ---
>
> Key: CASSANDRA-14801
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14801
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination, Legacy/Distributed Metadata
>Reporter: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> Correctness depended upon the narrowing to a {{Set}}, 
> which we no longer do - we maintain a collection of all {{Replica}}.  Our 
> {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result 
> contain the same endpoint multiple times; and our {{EndpointsForToken}} 
> obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, 
> resulting in cluster-wide failures for writes to the affected token ranges 
> for the duration of the range movement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Ryan Svihla (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054908#comment-17054908
 ] 

Ryan Svihla commented on CASSANDRA-15557:
-

So looking at the alter schema logic more:
[https://github.com/apache/cassandra/blob/08b2192da0eb6deddcd8f79cd180d069442223ae/src/java/org/apache/cassandra/cql3/statements/schema/AlterTableStatement.java#L398]

and 
[https://github.com/apache/cassandra/blob/08b2192da0eb6deddcd8f79cd180d069442223ae/src/java/org/apache/cassandra/cql3/statements/schema/AlterTableStatement.java#L411-L426]

it does seem (naively) reasonable to have AlterTableStatement use the 
ClientState's getTimestamp() method, since the ClientState is already there, 
but I'm sure I'm missing lots of background.

I'll wait for more experienced people to weigh in.
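
For concreteness, a rough sketch of the shape this could take (hypothetical: 
the types and plumbing below are stand-ins, not the real schema code; only 
ClientState.getTimestamp() is real):

{code:java}
final class DropColumnTimestampSketch
{
    interface TimestampSource { long getTimestamp(); } // stands in for ClientState

    static final class DroppedColumnSketch
    {
        final String column;
        final long droppedTimeMicros;

        DroppedColumnSketch(String column, long droppedTimeMicros)
        {
            this.column = column;
            this.droppedTimeMicros = droppedTimeMicros;
        }
    }

    // Stamping the drop with the same microsecond clock that stamps writes
    // avoids the millisecond truncation that lets a just-written row outlive
    // the drop.
    static DroppedColumnSketch recordDrop(TimestampSource state, String column)
    {
        return new DroppedColumnSketch(column, state.getTimestamp());
    }
}
{code}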

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Ryan Svihla (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054887#comment-17054887
 ] 

Ryan Svihla commented on CASSANDRA-15557:
-

Digging into the actual failure took a few tries, as adding logging or 
flushing sstables seemed to make it hard to reproduce, but I've confirmed it's 
a time-based error, at least in this case:

row ts: {{1583753957613001}}

dropped time: {{1583753957613000}}

(The dropped time is persisted at millisecond precision, so it re-expands to 
{{1583753957613000}} and lands one microsecond before the row timestamp; the 
row therefore survives the drop.)

{{[junit-timeout] Testcase: 
testDropListAndAddListWithSameName(org.apache.cassandra.cql3.validation.operations.AlterTest):
 FAILED }}
{{[junit-timeout] Dropped column: \{java.nio.HeapByteBuffer[pos=0 lim=12 
cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row 
timestamp: 1583753957613001 }}
{{[junit-timeout] junit.framework.AssertionFailedError: Dropped column: 
\{java.nio.HeapByteBuffer[pos=0 lim=12 
cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row 
timestamp: 1583753957613001}}
{{[junit-timeout] at 
org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
 }}
{{[junit-timeout] at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
}}
{{[junit-timeout] at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 }}
{{[junit-timeout] at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 }}
{{[junit-timeout] Caused by: java.lang.AssertionError: Invalid value for row 0 
column 2 (mycollection of type list), expected  but got <[first 
element]> }}
{{[junit-timeout] at 
org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)}}
{{[junit-timeout] at 
org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:98)
 }}
{{[junit-timeout]}}
{{[junit-timeout]}}

 

 

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15557) Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest testDropListAndAddListWithSameName

2020-03-09 Thread Ryan Svihla (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054887#comment-17054887
 ] 

Ryan Svihla edited comment on CASSANDRA-15557 at 3/9/20, 11:56 AM:
---

Digging into the actual failure took a few tries, as adding logging or 
flushing sstables seemed to make it hard to reproduce, but I've confirmed it's 
a time-based error, at least in this case:

row ts: {{1583753957613001}}

dropped time: {{1583753957613000}}

{{[junit-timeout] Testcase: 
testDropListAndAddListWithSameName(org.apache.cassandra.cql3.validation.operations.AlterTest):
 FAILED }}
 {{[junit-timeout] Dropped column: {java.nio.HeapByteBuffer[pos=0 lim=12 
cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row 
timestamp: 1583753957613001 }}
 {{[junit-timeout] junit.framework.AssertionFailedError: Dropped column: 
{java.nio.HeapByteBuffer[pos=0 lim=12 
cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row 
timestamp: 1583753957613001}}
 {{[junit-timeout] at 
org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
 }}
 {{[junit-timeout] at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
}}
 {{[junit-timeout] at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 }}
 {{[junit-timeout] at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 }}
 {{[junit-timeout] Caused by: java.lang.AssertionError: Invalid value for row 0 
column 2 (mycollection of type list), expected  but got <[first 
element]> }}
 {{[junit-timeout] at 
org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)}}
 {{[junit-timeout] at 
org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:98)
 }}
 {{[junit-timeout]}}
 {{[junit-timeout]}}

 

 


was (Author: rssvihla):
So digging into the actual failure, this took a few tries as wrapping logging, 
or flushing sstables seemed to make it hard to reproduce, I've confirmed it's 
time based errors at least in this case:

row ts:             {{1583753957613001 }}

dropped time: {{1583753957613000}}

{{[junit-timeout] Testcase: 
testDropListAndAddListWithSameName(org.apache.cassandra.cql3.validation.operations.AlterTest):
 FAILED }}
{{[junit-timeout] Dropped column: \{java.nio.HeapByteBuffer[pos=0 lim=12 
cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row 
timestamp: 1583753957613001 }}
{{[junit-timeout] junit.framework.AssertionFailedError: Dropped column: 
\{java.nio.HeapByteBuffer[pos=0 lim=12 
cap=12]=DroppedColumn{column=mycollection, droppedTime=1583753957613000}} Row 
timestamp: 1583753957613001}}
{{[junit-timeout] at 
org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:102)
 }}
{{[junit-timeout] at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
}}
{{[junit-timeout] at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 }}
{{[junit-timeout] at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 }}
{{[junit-timeout] Caused by: java.lang.AssertionError: Invalid value for row 0 
column 2 (mycollection of type list), expected  but got <[first 
element]> }}
{{[junit-timeout] at 
org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)}}
{{[junit-timeout] at 
org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:98)
 }}
{{[junit-timeout]}}
{{[junit-timeout]}}

 

 

> Fix flaky test org.apache.cassandra.cql3.validation.operations.AlterTest 
> testDropListAndAddListWithSameName
> ---
>
> Key: CASSANDRA-15557
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15557
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Ryan Svihla
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> https://app.circleci.com/jobs/github/dcapwell/cassandra/482/tests
> {code}
> junit.framework.AssertionFailedError: Invalid value for row 0 column 2 
> (mycollection of type list), expected  but got <[first element]>
>   at org.apache.cassandra.cql3.CQLTester.assertRows(CQLTester.java:1070)
>   at 
> org.apache.cassandra.cql3.validation.operations.AlterTest.testDropListAndAddListWithSameName(AlterTest.java:91)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements

2020-03-09 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054832#comment-17054832
 ] 

Benedict Elliott Smith commented on CASSANDRA-14801:


Nobody is actively working on it, but this is one of the most deceptively 
complex tickets that need to be completed before 4.0 is released.  I can see 
you work at DataStax, so perhaps you have the time and skill to dedicate to 
this, but please be confident before you take it on, and be willing to wait a 
while for a sufficient review.  The class in which the change is needed has 
had numerous bugs (and in fact has inherent conceptual bugs wrt range 
movements that are mostly out of scope to address here), so a great deal of 
care is needed.  Ideally this ticket would also address some of the ugliness 
that permitted the bug, and it _certainly_ needs to be accompanied by a 
sophisticated-ish randomised correctness test.
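
To illustrate the invariant that broke (plain strings stand in for {{Replica}} 
and {{InetAddressAndPort}}; this is not the real API):

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PendingEndpointsSketch
{
    public static void main(String[] args)
    {
        // two adjacent range movements both make node3 pending for this token
        List<String> pendingFromMovements = Arrays.asList("node3", "node4", "node3");

        // pre-4.0 behaviour: narrowing to a Set deduplicated implicitly
        Set<String> narrowed = new LinkedHashSet<>(pendingFromMovements);
        System.out.println("narrowed: " + narrowed); // [node3, node4]

        // 4.0-style: the duplicate Replica survives, and building
        // EndpointsForToken enforces uniqueness -- modelled here as the same
        // check, which is what fails during the range movement
        Set<String> seen = new HashSet<>();
        for (String endpoint : pendingFromMovements)
            if (!seen.add(endpoint))
                System.out.println("builder would reject duplicate: " + endpoint);
    }
}
{code}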

> calculatePendingRanges no longer safe for multiple adjacent range movements
> ---
>
> Key: CASSANDRA-14801
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14801
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination, Legacy/Distributed Metadata
>Reporter: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> Correctness depended upon the narrowing to a {{Set}}, 
> which we no longer do - we maintain a collection of all {{Replica}}.  Our 
> {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result 
> contain the same endpoint multiple times; and our {{EndpointsForToken}} 
> obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, 
> resulting in cluster-wide failures for writes to the affected token ranges 
> for the duration of the range movement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements

2020-03-09 Thread Aleksandr Sorokoumov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054830#comment-17054830
 ] 

Aleksandr Sorokoumov commented on CASSANDRA-14801:
--

Is anyone working on this ticket? If not, I would like to work on it.

> calculatePendingRanges no longer safe for multiple adjacent range movements
> ---
>
> Key: CASSANDRA-14801
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14801
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination, Legacy/Distributed Metadata
>Reporter: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> Correctness depended upon the narrowing to a {{Set}}, 
> which we no longer do - we maintain a collection of all {{Replica}}.  Our 
> {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result 
> contain the same endpoint multiple times; and our {{EndpointsForToken}} 
> obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, 
> resulting in cluster-wide failures for writes to the affected token ranges 
> for the duration of the range movement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15625) Nodetool toppartitions error

2020-03-09 Thread Antonio (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antonio reassigned CASSANDRA-15625:
---

Assignee: Alex Lumpov  (was: Antonio)

> Nodetool toppartitions error
> 
>
> Key: CASSANDRA-15625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15625
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Antonio
>Assignee: Alex Lumpov
>Priority: Normal
>
> C* version: 3.0.15
> here's my test table:
> CREATE TABLE app300.test (
>  a bigint PRIMARY KEY,
>  b text,
>  c text
> )
> INSERT INTO app300.test (a, b, c) VALUES (50, 'test1', 'test1');
> When I run nodetool toppartitions app300 test 50, I get this error:
> error: Expected 8 or 0 byte long (1048576)
> -- StackTrace --
> org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long 
> (1048576)
>   at 
> org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42)
>   at 
> org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> But after I flush this table, toppartitions works.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15625) Nodetool toppartitions error

2020-03-09 Thread Antonio (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antonio reassigned CASSANDRA-15625:
---

Assignee: Antonio

> Nodetool toppartitions error
> 
>
> Key: CASSANDRA-15625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15625
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Antonio
>Assignee: Antonio
>Priority: Normal
>
> C* version: 3.0.15
> here's my test table:
> CREATE TABLE app300.test (
>  a bigint PRIMARY KEY,
>  b text,
>  c text
> )
> INSERT INTO app300.test (a, b, c) VALUES (50, 'test1', 'test1');
> When I run nodetool toppartitions app300 test 50, I get this error:
> error: Expected 8 or 0 byte long (1048576)
> -- StackTrace --
> org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long 
> (1048576)
>   at 
> org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42)
>   at 
> org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> But after I flush this table, toppartitions works.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15625) Nodetool toppartitions error

2020-03-09 Thread Antonio (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antonio updated CASSANDRA-15625:

Description: 
C* version: 3.0.15

here's my test table:
CREATE TABLE app300.test (
 a bigint PRIMARY KEY,
 b text,
 c text
)

INSERT INTO app300.test (a, b, c) VALUES (50, 'test1', 'test1');

When I run nodetool toppartitions app300 test 50, I get this error:

error: Expected 8 or 0 byte long (1048576)
-- StackTrace --
org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long 
(1048576)
at 
org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42)
at 
org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
at 
org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)




But after I flush this table, toppartitions works.
 

  was:
c* version :3.0.15

here's my test table:
CREATE TABLE app300.test (
 a bigint PRIMARY KEY,
 b text,
 c text
)

INSERT INTO app300.test(a ,b, c ) VALUES (50, 'test1', 'test1');

when i use topartition :nodetool  toppartitions app300 test 50,get error

error: Expected 8 or 0 byte long (1048576)
-- StackTrace --
org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long 
(1048576)
at 
org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42)
at 
org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
at 
org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)


 but when i flush this table, topartition can work 
 


> Nodetool toppartitions error
> 
>
> Key: CASSANDRA-15625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15625
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Antonio
>Priority: Normal
>
> C* version: 3.0.15
> here's my test table:
> CREATE TABLE app300.test (
>  a bigint PRIMARY KEY,
>  b text,
>  c text
> )
> INSERT INTO app300.test (a, b, c) VALUES (50, 'test1', 'test1');
> When I run nodetool toppartitions app300 test 50, I get this error:
> error: Expected 8 or 0 byte long (1048576)
> -- StackTrace --
> org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long 
> (1048576)
>   at 
> org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42)
>   at 
> org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> But after I flush this table, toppartitions works.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15625) Nodetool toppartitions error

2020-03-09 Thread Antonio (Jira)
Antonio created CASSANDRA-15625:
---

 Summary: Nodetool toppartitions error
 Key: CASSANDRA-15625
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15625
 Project: Cassandra
  Issue Type: Bug
Reporter: Antonio


C* version: 3.0.15

here's my test table:
CREATE TABLE app300.test (
 a bigint PRIMARY KEY,
 b text,
 c text
)

INSERT INTO app300.test (a, b, c) VALUES (50, 'test1', 'test1');

When I run nodetool toppartitions app300 test 50, I get this error:

error: Expected 8 or 0 byte long (1048576)
-- StackTrace --
org.apache.cassandra.serializers.MarshalException: Expected 8 or 0 byte long 
(1048576)
at 
org.apache.cassandra.serializers.LongSerializer.validate(LongSerializer.java:42)
at 
org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:128)
at 
org.apache.cassandra.db.ColumnFamilyStore.finishLocalSampling(ColumnFamilyStore.java:1579)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)


But after I flush this table, toppartitions works.
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15623) When running CQLSH with STDIN input, exit with error status code if script fails

2020-03-09 Thread Jacob Becker (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054803#comment-17054803
 ] 

Jacob Becker commented on CASSANDRA-15623:
--

[~jrwest], after taking a look at the code, I would say yes, I will provide a 
patch. I believe I can manage it within a few days.

As for the exit code being 1 or 2, it is indeed debatable, as AFAIK there is 
no generic specification in this regard. What is truly important is that it is 
not 0 (and it is not), so I wasn't sure the subject was even worth a new 
ticket. I personally can live just fine with 2; I mentioned it only because, 
in my experience, anything above 1 usually has some underlying reason (ideally 
explained in the documentation). From what I can tell, there is no such reason 
in this case (especially considering the script *never* exits with 1), and no 
mention of it in the documentation.

> When running CQLSH with STDIN input, exit with error status code if script 
> fails
> 
>
> Key: CASSANDRA-15623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15623
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Tools
>Reporter: Jacob Becker
>Priority: Normal
>
> Given that CASSANDRA-6344 has been in place for years, and considering that 
> scripts submitted with the `-e` option behave in a similar fashion, it is 
> very surprising that scripts submitted via STDIN (i.e. piped in) always exit 
> with a zero code, regardless of errors. I believe this should be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14517) Short read protection can cause partial updates to be read

2020-03-09 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054790#comment-17054790
 ] 

ZhaoYang commented on CASSANDRA-14517:
--

This sounds like a "repeatable read" issue; I don't think Cassandra has ever 
provided any read isolation level.

[~bdeggleston], do you think this issue blocks the 4.0 release? Should we move 
it into the backlog for future reference?

> Short read protection can cause partial updates to be read
> --
>
> Key: CASSANDRA-14517
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14517
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Blake Eggleston
>Priority: Normal
> Fix For: 4.0
>
>
> If a read is performed in two parts due to short read protection, and the 
> data being read is written to between reads, the coordinator will return a 
> partial update. Specifically, this will occur if a single partition batch 
> updates clustering values on both sides of the SRP break, or if a range 
> tombstone is written that deletes data on both sides of the break. At the 
> coordinator level, this breaks the expectation that updates to a partition 
> are atomic, and that you can’t see partial updates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15624) Avoid lazy initializing shut down instances when trying to send them messages

2020-03-09 Thread Marcus Eriksson (Jira)
Marcus Eriksson created CASSANDRA-15624:
---

 Summary: Avoid lazy initializing shut down instances when trying 
to send them messages
 Key: CASSANDRA-15624
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15624
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson


We currently use {{to.broadcastAddressAndPort()}} when figuring out whether we 
should send a message to an instance; if that instance has been shut down, it 
will get re-initialized but not started up, which makes the tests fail.
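
A sketch of the hazard and the fix direction, with made-up types (the real 
in-jvm dtest API differs):

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class LazyInstanceSketch
{
    static final class Instance
    {
        final String broadcastAddress;
        Instance(String broadcastAddress) { this.broadcastAddress = broadcastAddress; }
    }

    // calling get() on these re-creates an instance that was shut down
    private final Map<Integer, Supplier<Instance>> lazyInstances = new HashMap<>();

    // metadata kept outside the instance lifecycle, so it survives shutdown
    private final Map<Integer, String> addressByNode = new HashMap<>();

    boolean isFor(int node, String targetAddress)
    {
        // BAD (the bug): lazyInstances.get(node).get().broadcastAddress would
        // lazily re-initialise a node that was deliberately shut down.
        // OK: answer the "is this message for node X?" question from the
        // metadata map instead.
        return targetAddress.equals(addressByNode.get(node));
    }
}
{code}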



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-09 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054782#comment-17054782
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15397 at 3/9/20, 9:18 AM:
-

bq. I couldn't do so since the code uses a generic that's comparable

The {{IntervalTree}} is used in precisely one place in the codebase, so it 
would be possible to hardcode to this use case for improved performance.

bq.  I'm not sure if assuming long will be a good idea

I would be very surprised if it is not significantly faster.  Particularly in 
tests that correctly account for memory latency (i.e. ensure the data is not 
entirely held in CPU cache before the test begins).


was (Author: benedict):
bq. I couldn't do so since the code uses a generic that's comparable

The {{IntervalTree}} is used in precisely one place in the codebase, so it 
would be possible to hardcode to this use case for improved performance.

bq.  I'm not sure if assuming long will be a good idea

I would be very surprised if it is not significantly faster.

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>Reporter: Chandrasekhar Thumuluru
>Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similarly to IntervalTree based search, or outperforms it. 
> The cost of IntervalTree construction is also substantial, and it produces a 
> lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of the test, I simplified the IntervalTree by removing the data 
> portion of the 

[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.

2020-03-09 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054782#comment-17054782
 ] 

Benedict Elliott Smith commented on CASSANDRA-15397:


bq. I couldn't do so since the code uses a generic that's comparable

The {{IntervalTree}} is used in precisely one place in the codebase, so it 
would be possible to hardcode to this use case for improved performance.

bq.  I'm not sure if assuming long will be a good idea

I would be very surprised if it is not significantly faster.
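
To sketch the binary-search elimination idea with specialised {{long}} 
endpoints (an illustration of the approach discussed above, not the attached 
patch):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IntervalEliminationSketch
{
    // intervals sorted by min; mins[i] and maxs[i] belong to interval i
    final long[] mins;
    final long[] maxs;

    IntervalEliminationSketch(long[] sortedMins, long[] maxsInSameOrder)
    {
        this.mins = sortedMins;
        this.maxs = maxsInSameOrder;
    }

    // indices of all intervals overlapping [searchMin, searchMax]
    List<Integer> search(long searchMin, long searchMax)
    {
        // binary search eliminates every interval whose min lies beyond the
        // search end: only a prefix of the min-sorted array can overlap
        int hi = Arrays.binarySearch(mins, searchMax);
        if (hi < 0)
            hi = -hi - 1; // insertion point: first min > searchMax
        else
        {
            while (hi + 1 < mins.length && mins[hi + 1] == searchMax) hi++;
            hi++;         // one past the last min == searchMax
        }

        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < hi; i++) // cache-friendly linear walk of survivors
            if (maxs[i] >= searchMin)
                result.add(i);
        return result;
    }

    public static void main(String[] args)
    {
        IntervalEliminationSketch s = new IntervalEliminationSketch(
            new long[] { 0, 5, 10 }, new long[] { 4, 20, 12 });
        System.out.println(s.search(11, 15)); // [1, 2]
    }
}
{code}

Hardcoding {{long}} endpoints also avoids the boxing and virtual 
{{compareTo}} calls of the generic version, which is the specialisation being 
suggested.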

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> --
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/SSTable
>Reporter: Chandrasekhar Thumuluru
>Assignee: Chandrasekhar Thumuluru
>Priority: Low
>  Labels: pull-request-available
> Attachments: 90p_100k_sstables_with_1000_searches.png, 
> 90p_1million_sstables_with_1000_searches.png, 
> 90p_250k_sstables_with_1000_searches.png, 
> 90p_500k_sstables_with_1000_searches.png, 
> 90p_750k_sstables_with_1000_searches.png, 
> 95p_1_SSTable_with_5000_Searches.png, 
> 95p_100k_sstables_with_1000_searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_1million_sstables_with_1000_searches.png, 
> 95p_2_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_250k_sstables_with_1000_searches.png, 
> 95p_3_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 95p_500k_sstables_with_1000_searches.png, 
> 95p_750k_sstables_with_1000_searches.png, 
> 99p_1_SSTable_with_5000_Searches.png, 
> 99p_100k_sstables_with_1000_searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_1million_sstables_with_1000_searches.png, 
> 99p_2_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_250k_sstables_with_1000_searches.png, 
> 99p_3_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, 
> 99p_500k_sstables_with_1000_searches.png, 
> 99p_750k_sstables_with_1000_searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_1_SSTable_with_5000_Searches.png, 
> Mean_100k_sstables_with_1000_searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_1million_sstables_with_1000_searches.png, 
> Mean_2_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_250k_sstables_with_1000_searches.png, 
> Mean_3_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, 
> Mean_500k_sstables_with_1000_searches.png, 
> Mean_750k_sstables_with_1000_searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with the 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similarly to IntervalTree based search, or outperforms it. 
> The cost of IntervalTree construction is also substantial, and it produces a 
> lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of the test, I simplified the IntervalTree by removing the 
> data portion of the interval, and modified the template version (Java 
> generics) to a specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15566) Repair coordinator can hang under some cases

2020-03-09 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054748#comment-17054748
 ] 

ZhaoYang commented on CASSANDRA-15566:
--

Thanks for the update.

I don't have any concrete implementation details in mind yet; the C* 4.0 code 
is quite new to me.

Based on my understanding of 3.x, the main reasons for repair hanging are:
 # request/response messages get dropped if they exceed the expiration time, 
which is 10s.
 # internode connections are closed, and all queued messages cleared, due to 
network or gossip status changes.
 # a participant crashed.
 # a failure response was not sent to the coordinator in 
{{RepairMessageVerbHandler.doVerb()}} in case of an unknown exception; 
currently it only handles dropped tables.
 # a participant is indeed making progress, but very slowly during validation 
because of disk IO throttling.

For problems #1-2, I am thinking of making repair messages idempotent, with 
the sender periodically resending a message until it gets a reply (see the 
sketch below).

For problem #3, make sure the repair manager responds to endpoint status 
changes (e.g. up/down/remove) if it doesn't do so already.

For problem #4, make sure all exceptions are caught and responded to with a 
failure; we need to add some failure injections to dtests.

For problem #5, as you suggested in CASSANDRA-15399, the coordinator should be 
able to check participants' in-memory virtual tables to determine whether they 
are making progress.

In order to make repair great again, I think it's important to be able to 
identify hung repairs automatically (even with some false positives) and abort 
them via nodetool, because I don't expect repair operations to be run by 
operators manually. In production, repair should be managed by an automation 
tool, like a repair service or Reaper, which will abort and retry hung 
repairs. It can probably be done in CASSANDRA-15399 or a separate ticket.
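
A minimal sketch of the idempotent-resend idea for problems #1-2 (made-up 
types; real repair messaging is far more involved):

{code:java}
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ResendSketch
{
    interface Transport { void send(UUID messageId, String payload); }

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final Set<UUID> acked = ConcurrentHashMap.newKeySet();

    // sender side: keep resending until the participant replies; a real
    // version would keep the ScheduledFuture and cancel it once acked
    void sendWithRetry(Transport transport, UUID messageId, String payload)
    {
        scheduler.scheduleWithFixedDelay(() -> {
            if (!acked.contains(messageId))
                transport.send(messageId, payload); // safe: handler is idempotent
        }, 0, 10, TimeUnit.SECONDS);
    }

    void onAck(UUID messageId) { acked.add(messageId); }

    // participant side: deduplicate by message id so replays are harmless
    private final Set<UUID> processed = ConcurrentHashMap.newKeySet();

    void handle(UUID messageId, Runnable action, Runnable replyAck)
    {
        if (processed.add(messageId))
            action.run();   // first delivery: do the work
        replyAck.run();     // always re-ack, even for duplicates
    }
}
{code}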

> Repair coordinator can hang under some cases
> 
>
> Key: CASSANDRA-15566
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15566
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Repair coordination makes a few assumptions about message delivery which 
> cause it to hang forever when those assumptions don't hold true: that a 
> fire-and-forget message will not get rejected (a participant may hit an 
> issue and reject the message), and that a very delayed message will one day 
> be seen (messages can be dropped under load, or when the failure detector 
> thinks a node is bad but it is just GCing).
> Given this, and the desire to have better observability in repair (see 
> CASSANDRA-15399), coordination should be changed into a request/response 
> pattern (with retries) and polling (for validation status and MerkleTree 
> sending).  This would allow the coordinator to detect changes in state (it 
> knew a participant was working on validation, but no longer knows about the 
> validation task), and to be able to recover from ephemeral issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org