date:20201014

[jira] [Updated] (CASSANDRA-15229) Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed Chunks

2020-10-14 Thread Caleb Rackliffe (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-15229:

Status: Ready to Commit  (was: Review In Progress)

> Segregate Network and Chunk Cache BufferPools and Recirculate Partially Freed 
> Chunks
> 
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}} instances, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.
> -
> Since CASSANDRA-5863, the chunk cache has been implemented on top of the buffer 
> pool. When a local pool is full, one of its chunks is evicted and is only put 
> back into the global pool once all buffers in the evicted chunk have been 
> released. But because of the chunk cache, buffers can be held for a long period 
> of time, preventing an evicted chunk from being recycled even though most of 
> the space in it is free.
> Two things need to be improved:
> 1. An evicted chunk with free space should be recycled to the global pool, even 
> if it's not fully free. This is doable in 4.0.
> 2. Reduce fragmentation caused by differing buffer sizes. With #1, partially 
> freed chunks will be available for allocation, but the "holes" in a partially 
> freed chunk have different sizes. We should consider allocating a fixed 
> buffer size, which is unlikely to fit in 4.0.
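
The recirculation idea described above can be sketched roughly as follows. This is only an illustration in Python, not the Cassandra BufferPool code: the names Chunk, RecirculatingPool, evict and acquire are hypothetical, and the chunk is assumed to be split into equal-sized slots (the fixed buffer size mentioned in point 2), which avoids differently sized holes.

```python
from collections import deque


class Chunk:
    """A chunk split into equal-sized slots; tracks which slots are in use."""

    def __init__(self, slots):
        self.slots = slots
        self.used = set()

    def free_slots(self):
        return self.slots - len(self.used)

    def allocate(self):
        # Return the index of the lowest free slot, or None if the chunk is full.
        for i in range(self.slots):
            if i not in self.used:
                self.used.add(i)
                return i
        return None

    def release(self, slot):
        self.used.discard(slot)


class RecirculatingPool:
    """Hypothetical global pool: prefers fully free chunks, then partially free ones."""

    def __init__(self, chunks, slots_per_chunk):
        self.fully_free = deque(Chunk(slots_per_chunk) for _ in range(chunks))
        self.partially_free = deque()

    def evict(self, chunk):
        # Called when a local pool gives up a chunk.
        if chunk.free_slots() == chunk.slots:
            self.fully_free.append(chunk)
        elif chunk.free_slots() > 0:
            # Recirculate even though some buffers are still held elsewhere.
            self.partially_free.append(chunk)
        # A completely full chunk is not queued in this sketch; real code
        # would have to track it until one of its buffers is released.

    def acquire(self):
        if self.fully_free:
            return self.fully_free.popleft()
        if self.partially_free:
            return self.partially_free.popleft()
        return None
```

With equal-sized slots, any free slot in a recirculated chunk can serve any request, which is the motivation for point 2; with variable buffer sizes the pool would also have to match request sizes against the holes.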



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-16114) Fix tests CQLTester.assertLastSchemaChange causes ClassCastException

2020-10-14 Thread Berenguer Blasi (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214442#comment-17214442
 ] 

Berenguer Blasi commented on CASSANDRA-16114:
-

[~cedric.nabaa] are you still planning on working on this one?

> Fix tests CQLTester.assertLastSchemaChange causes ClassCastException
> 
>
> Key: CASSANDRA-16114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16114
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Cedric Nabaa
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Build: 
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/494/workflows/b3765545-7b09-48dd-85ff-830c4f348329/jobs/2681
> {code}
> java.lang.ClassCastException: 
> org.apache.cassandra.transport.messages.ResultMessage$Void cannot be cast to 
> org.apache.cassandra.transport.messages.ResultMessage$SchemaChange
>   at 
> org.apache.cassandra.cql3.CQLTester.assertLastSchemaChange(CQLTester.java:916)
>   at 
> org.apache.cassandra.cql3.validation.entities.UFTest.testSchemaChange(UFTest.java:94)
> {code}




[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL

2020-10-14 Thread Berenguer Blasi (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214434#comment-17214434
 ] 

Berenguer Blasi commented on CASSANDRA-15996:
-

The only 2 things that came to my mind are:
- On node start, instead of relying on the patient cql connection, let's add 
flags to wait for the binary protocol and other startup stuff to complete i.e . 
But this is just a stab in the dark based on previous experience fixing tests, 
just in case there is some esoteric race at startup.
- {{NoSpamLogger}} has some shuffling of instances around that _maybe_ has a 
concurrency hole; _maybe_ I am just imagining things. I have to look at it for 
a bit longer to make up my mind. In any case I didn't see how that 
could have an effect in this particular case, where usage is pretty 
straightforward and not multithreaded. So I am at a loss here so far as well.



> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> ---
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and 
> > CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and 
> CAP_NOWARN policy
> E   assert []
> {code}




[jira] [Updated] (CASSANDRA-16152) In-JVM dtest - modify schema with stopped nodes and use yaml fragments for config

2020-10-14 Thread Dinesh Joshi (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-16152:
-
Reviewers: Alex Petrov, David Capwell, Dinesh Joshi, Yifan Cai  (was: David 
Capwell, Dinesh Joshi, Yifan Cai)

> In-JVM dtest - modify schema with stopped nodes and use yaml fragments for 
> config
> -
>
> Key: CASSANDRA-16152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16152
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>
> Some convenience improvements to in-JVM dtest that are useful across versions 
> that I needed while working on CASSANDRA-16144
> * Add support for changing schema with stopped nodes.
> * Make it simpler to modify nested configuration items by specifying Yaml 
> fragments 




[jira] [Commented] (CASSANDRA-16180) 4.0 quality testing: Coordination

2020-10-14 Thread Benedict Elliott Smith (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214286#comment-17214286
 ] 

Benedict Elliott Smith commented on CASSANDRA-16180:


{quote}I'd also propose that we leave Paxos/CAS out of scope for this issue.
{quote}
Yes, that's probably best - there's a related ticket where Sylvain and I have 
both expanded Paxos test coverage anyway, and besides this I think it is better 
to wait until post 4.0 (shortly after which I hope the Paxos landscape will 
materially improve in the project)

> 4.0 quality testing: Coordination
> -
>
> Key: CASSANDRA-16180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16180
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on coordination.
> I think that the main reference dtest for this is 
> [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.




[jira] [Created] (CASSANDRA-16211) Improve job metadata queries exception handling in cassandra-diff

2020-10-14 Thread Yifan Cai (Jira)

Yifan Cai created CASSANDRA-16211:
-

 Summary: Improve job metadata queries exception handling in 
cassandra-diff
 Key: CASSANDRA-16211
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16211
 Project: Cassandra
  Issue Type: Improvement
  Components: Tool/diff
Reporter: Yifan Cai
Assignee: Yifan Cai


The job metadata tracks the progress of the diff job. Sometimes a job can fail 
because a progress update query fails. 
The progress update queries fall into 2 groups: critical and trivial. 
When a query fails to update a trivial status (e.g. ProgressTracker), we would 
usually prefer to continue the job and just log the failure. 
When a query fails to update a critical status (e.g. JobLifeCycle), we can 
apply a client-side retry strategy (e.g. exponential backoff) in addition to 
the retry policy.
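
As a rough illustration of the two handling strategies, here is a minimal Python sketch. It is not cassandra-diff code: update_trivial, update_critical and their parameters are hypothetical names, and the attempt count and delays are arbitrary assumptions.

```python
import logging
import time

log = logging.getLogger("diff-job")


def update_trivial(update):
    """Best-effort update: log the failure and let the job continue."""
    try:
        update()
        return True
    except Exception as exc:
        log.warning("Ignoring failed progress update: %s", exc)
        return False


def update_critical(update, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Must-succeed update: retry with exponential backoff, then give up."""
    for attempt in range(max_attempts):
        try:
            return update()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the job fail
            # The delay doubles on each attempt: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
```

The sleep function is injectable so the backoff schedule can be checked without real waiting. A production version would probably also add jitter and catch only the specific query failures that are worth retrying.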




[jira] [Commented] (CASSANDRA-15241) Virtual table to expose current running queries

2020-10-14 Thread Chris Lohfink (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214206#comment-17214206
 ] 

Chris Lohfink commented on CASSANDRA-15241:
---

I gave a talk on why I feel it's necessary last year at ApacheCon. That said, 
it's super late in the release and it's a pretty big patch (mostly just code 
making Mutations and Messages human readable), so I understand it not going in. 
Review feedback has been addressed, I believe, so waiting on that.

> Virtual table to expose current running queries
> ---
>
> Key: CASSANDRA-15241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15241
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Virtual Tables
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
> Fix For: 4.0
>
>
> Expose current running queries and their duration.
> {code}cqlsh> select * from system_views.queries;
>  thread_id                    | duration_micros | task
> ------------------------------+-----------------+------
>  Native-Transport-Requests-17 |            6325 | QUERY select * from system_views.queries; [pageSize = 100]
>   Native-Transport-Requests-4 |           14681 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>   Native-Transport-Requests-6 |           14678 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>                  ReadStage-10 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-13 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-14 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-19 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-20 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-22 |            7279 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-23 |            4716 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-5 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-7 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-8 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000{code}




[jira] [Commented] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833

2020-10-14 Thread Jordan West (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214205#comment-17214205
 ] 

Jordan West commented on CASSANDRA-16148:
-

Committed. Thanks. 

> Test failures caused by merging CASSANDRA-15833
> ---
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta3
>
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: 
> https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an 
> issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running 
> without {{Feature.GOSSIP}}




[jira] [Updated] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833

2020-10-14 Thread Jordan West (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-16148:

  Fix Version/s: 4.0-beta3
  Since Version: 4.0-beta3
Source Control Link: 
https://github.com/jrwest/cassandra/commit/9a40e8079baff6f499229535a4af75be97f9a3b9
  
https://github.com/apache/cassandra/commit/06bc316c89053067d162da3f118b43a62dcf0854
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Test failures caused by merging CASSANDRA-15833
> ---
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta3
>
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: 
> https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an 
> issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running 
> without {{Feature.GOSSIP}}




[jira] [Updated] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833

2020-10-14 Thread Jordan West (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-16148:

Status: Ready to Commit  (was: Changes Suggested)

> Test failures caused by merging CASSANDRA-15833
> ---
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: 
> https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an 
> issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running 
> without {{Feature.GOSSIP}}




[cassandra] branch trunk updated (3a05ed3 -> 06bc316)

2020-10-14 Thread jwest

This is an automated email from the ASF dual-hosted git repository.

jwest pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 3a05ed3  Follow-up: fix test failures caused by 16207.
 add 9a40e80  upgrade to dtest-api 0.0.6
 add 7b3a15d  Merge branch 'cassandra-2.2' into cassandra-3.0
 add a6c2224  Merge branch 'cassandra-3.0' into cassandra-3.11
 add 06bc316  Merge branch 'cassandra-3.11' into trunk

No new revisions were added by this update.

Summary of changes:
 build.xml  |   2 +-
 src/java/org/apache/cassandra/gms/Gossiper.java|  14 +--
 .../cassandra/utils/ExpiringMemoizingSupplier.java | 132 +
 .../impl/DelegatingInvokableInstance.java  |   6 +
 .../cassandra/distributed/impl/Instance.java   |  16 ++-
 .../cassandra/distributed/test/ReadRepairTest.java |  15 +++
 .../org/apache/cassandra/gms/GossiperTest.java |   8 +-
 7 files changed, 180 insertions(+), 13 deletions(-)
 create mode 100644 
src/java/org/apache/cassandra/utils/ExpiringMemoizingSupplier.java



[cassandra] branch cassandra-3.11 updated (d3f7bdf -> a6c2224)

2020-10-14 Thread jwest

This is an automated email from the ASF dual-hosted git repository.

jwest pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from d3f7bdf  Merge branch 'cassandra-3.0' into cassandra-3.11
 add 9a40e80  upgrade to dtest-api 0.0.6
 add 7b3a15d  Merge branch 'cassandra-2.2' into cassandra-3.0
 add a6c2224  Merge branch 'cassandra-3.0' into cassandra-3.11

No new revisions were added by this update.

Summary of changes:
 build.xml   | 2 +-
 .../cassandra/distributed/impl/DelegatingInvokableInstance.java | 6 ++
 .../distributed/org/apache/cassandra/distributed/impl/Instance.java | 5 +
 3 files changed, 12 insertions(+), 1 deletion(-)



[cassandra] branch cassandra-2.2 updated (521a6e2 -> 9a40e80)

2020-10-14 Thread jwest

This is an automated email from the ASF dual-hosted git repository.

jwest pushed a change to branch cassandra-2.2
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 521a6e2  Fixed a NullPointerException when calling nodetool 
enablethrift
 add 9a40e80  upgrade to dtest-api 0.0.6

No new revisions were added by this update.

Summary of changes:
 build.xml   | 2 +-
 .../cassandra/distributed/impl/DelegatingInvokableInstance.java | 6 ++
 .../distributed/org/apache/cassandra/distributed/impl/Instance.java | 5 +
 3 files changed, 12 insertions(+), 1 deletion(-)



[cassandra] branch cassandra-3.0 updated (6eeca9d -> 7b3a15d)

2020-10-14 Thread jwest

This is an automated email from the ASF dual-hosted git repository.

jwest pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 6eeca9d  Fix NPE when calling broadcast address on unintialized node
 add 9a40e80  upgrade to dtest-api 0.0.6
 add 7b3a15d  Merge branch 'cassandra-2.2' into cassandra-3.0

No new revisions were added by this update.

Summary of changes:
 build.xml  | 2 +-
 .../cassandra/distributed/impl/DelegatingInvokableInstance.java| 7 +++
 .../org/apache/cassandra/distributed/impl/Instance.java| 5 +
 3 files changed, 13 insertions(+), 1 deletion(-)



[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL

2020-10-14 Thread Adam Holmberg (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214127#comment-17214127
 ] 

Adam Holmberg commented on CASSANDRA-15996:
---

[~Bereng] I noticed that too. I've been staring at NoSpamLogger for a bit and 
haven't seen a way that it should fail in this way with a single request in 
flight. What did you have in mind for an edge case? 

I looked a bit at the logs from the other failure and noticed one anomaly. I'm 
not sure how it could be related, but I noticed that the server never emits the 
"Startup complete" message. We only have one example of this.

The logs from the test run on this ticket have expired out of Circle. I was 
coming here to ask [~dcapwell] or anyone else whether they have other examples 
of this failing where the log files are still retained.

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> ---
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and 
> > CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and 
> CAP_NOWARN policy
> E   assert []
> {code}




[jira] [Updated] (CASSANDRA-16209) Log Warning Rather than Verbose Trace when Preview Repair Validation Conflicts with Incremental Repair

2020-10-14 Thread Marcus Eriksson (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-16209:

Reviewers: Marcus Eriksson

> Log Warning Rather than Verbose Trace when Preview Repair Validation 
> Conflicts with Incremental Repair
> --
>
> Key: CASSANDRA-16209
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16209
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a preview repair on repaired data identifies which SSTables to validate, 
> it might come across an SSTable that's still pending for an in-progress 
> incremental repair session. It makes sense that we immediately fail the 
> preview repair in that case, but the resulting error and verbose stack trace 
> in the logs is a bit too severe a reaction. We should downgrade this to a 
> simple warning message.
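
The proposed change in logging behaviour could look roughly like this. It is a Python sketch only, not the Cassandra repair code: PendingRepairConflict and run_preview_validation are hypothetical names introduced for illustration.

```python
import logging

log = logging.getLogger("repair")


class PendingRepairConflict(Exception):
    """Hypothetical: an SSTable is still pending for an incremental repair session."""


def run_preview_validation(validate):
    """Run a validation step; report an expected conflict as a one-line warning."""
    try:
        validate()
        return True
    except PendingRepairConflict as exc:
        # Expected condition: fail the preview repair, but log a single
        # warning line instead of an error with a full stack trace.
        log.warning("Preview repair validation failed: %s", exc)
        return False
```

Any other exception still propagates with its stack trace, so only the known, expected conflict is downgraded.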




[jira] [Comment Edited] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833

2020-10-14 Thread David Capwell (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214074#comment-17214074
 ] 

David Capwell edited comment on CASSANDRA-16148 at 10/14/20, 5:17 PM:
--

I feel that 
https://app.circleci.com/pipelines/github/jrwest/cassandra/71/workflows/a6356d72-d33c-449d-8561-332ec190910c/jobs/885
 is because you didn't rebase... I added a log line to all branches to detect 
when we complete startup, and it looks like it times out after 10m since it 
never sees that log line.

Confirmed, https://github.com/jrwest/cassandra/commits/jwest/16148 doesn't have 
the commit which checks for the log.
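
The kind of startup check described here (wait for a log line, give up after a timeout) can be sketched as below. This is a generic Python illustration, not the dtest code: wait_for_log_line and its parameters are hypothetical, and the marker text is whatever line the branch actually logs.

```python
import time


def wait_for_log_line(read_lines, marker, timeout=600.0, poll=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll read_lines() until a line contains marker; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if any(marker in line for line in read_lines()):
            return True
        if clock() >= deadline:
            # The marker never showed up, e.g. because the branch under
            # test does not contain the commit that logs it.
            return False
        sleep(poll)
```

The default timeout of 600 seconds mirrors the 10m mentioned above. If the branch never logs the marker, the check can only fail by timing out, which matches the behaviour seen in the linked job.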


was (Author: dcapwell):
I feel that 
https://app.circleci.com/pipelines/github/jrwest/cassandra/71/workflows/a6356d72-d33c-449d-8561-332ec190910c/jobs/885
 is because you didn't rebase... I added a log line to all branches to detect 
when we complete startup, and it looks like it times out after 10m since it 
never sees that log line.

> Test failures caused by merging CASSANDRA-15833
> ---
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: 
> https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an 
> issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running 
> without {{Feature.GOSSIP}}




[jira] [Commented] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833

2020-10-14 Thread David Capwell (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214075#comment-17214075
 ] 

David Capwell commented on CASSANDRA-16148:
---

+1 from me

> Test failures caused by merging CASSANDRA-15833
> ---
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: 
> https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an 
> issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running 
> without {{Feature.GOSSIP}}




[jira] [Commented] (CASSANDRA-16148) Test failures caused by merging CASSANDRA-15833

2020-10-14 Thread David Capwell (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214074#comment-17214074
 ] 

David Capwell commented on CASSANDRA-16148:
---

I feel that 
https://app.circleci.com/pipelines/github/jrwest/cassandra/71/workflows/a6356d72-d33c-449d-8561-332ec190910c/jobs/885
 is because you didn't rebase... I added a log line to all branches to detect 
when we complete startup, and it looks like it times out after 10m since it 
never sees that log line.

> Test failures caused by merging CASSANDRA-15833
> ---
>
> Key: CASSANDRA-16148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16148
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jordan West
>Assignee: Jordan West
>Priority: Normal
>
> Three issues were caused by merging CASSANDRA-15833:
> 1. `GossiperTest#testHaveAnyVersion3Nodes` was failing on trunk: 
> https://app.circleci.com/pipelines/github/jrwest/cassandra/53/workflows/95f9f401-1ef8-4b8d-9c64-3703d9669d95/jobs/771
> 2. python dtest ReadRepairTest#test_atomic_writes[blocking] was failing
> 3. In-jvm dtests being worked on as part of CASSANDRA-15977 uncovered an 
> issue with how CASSANDRA-15833 changes interacted with in-jvm dtests running 
> without {{Feature.GOSSIP}}




[jira] [Commented] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-14 Thread David Capwell (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214046#comment-17214046
 ] 

David Capwell commented on CASSANDRA-15935:
---

Since Aleksey is on board with Action, I will back off and not argue the point.

> Improve machinery for testing consistency in presence of range movements
> 
>
> Key: CASSANDRA-15935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15935
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> Currently, we can test range movements only by adding and bootstrapping a new 
> node. This is both inefficient and insufficient for large-scale tests. We 
> need a way to dynamically change ring ownership over the lifetime of a 
> cluster, with the flexibility to change the gossip status of a node from the 
> perspective of other participants, adding and removing nodes from other 
> nodes' views on demand.




[jira] [Comment Edited] (CASSANDRA-16091) rpc server gets wrongly initialized with rpc_enabled:false

2020-10-14 Thread Dmitrii Saprykin (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214043#comment-17214043
 ] 

Dmitrii Saprykin edited comment on CASSANDRA-16091 at 10/14/20, 4:45 PM:
-

Is this issue fixed by CASSANDRA-16127 ?


was (Author: saprykin):
Is this issue fixed by CASSANDRA-16124 ?

> rpc server gets wrongly initialized with rpc_enabled:false
> --
>
> Key: CASSANDRA-16091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16091
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Tom van der Woerdt
>Assignee: David Capwell
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
>
> After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception 
> is thrown:
> {code:java}
>  java.lang.RuntimeException: Client SSL is not supported for non-blocking 
> sockets (hsha). Please remove client ssl from the configuration.
>   at 
> org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74)
>   at 
> org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55)
>   at 
> org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.<init>(ThriftServer.java:128)
>   at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55)
>   at 
> org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713)
>   at 
> org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643)
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768)
> {code}
> No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in 
> both versions.
> I created this Jira issue because clearly something changed between 3.11.7 
> and 3.11.8. Maybe intentional, maybe not. Changing `rpc_server_type` (which 
> is not clearly documented to be about Thrift only) from `hsha` to `sync` does 
> resolve the issue, as expected, but this does seem like a regression, hence 
> the Jira issue.
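
The behaviour the reporter expects can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual fix: the function name `thrift_startup_error` and the dictionary-based config are hypothetical, while the keys mirror the `cassandra.yaml` settings named in the report (`rpc_enabled`, `rpc_server_type`) plus `client_encryption_options.enabled`. The point is that the hsha/client-SSL check should only run when Thrift is actually enabled.

```python
def thrift_startup_error(config):
    """Return an error message if the Thrift server cannot start, else None.

    Hypothetical check: the hsha + client SSL combination is validated only
    when rpc_enabled is true, so a disabled Thrift server never blocks startup.
    """
    if not config.get("rpc_enabled", False):
        # Thrift is switched off, so rpc_server_type is irrelevant.
        return None
    ssl_on = config.get("client_encryption_options", {}).get("enabled", False)
    if config.get("rpc_server_type", "sync") == "hsha" and ssl_on:
        return ("Client SSL is not supported for non-blocking sockets (hsha). "
                "Please remove client ssl from the configuration.")
    return None
```

With `rpc_enabled: false`, the sketch returns `None` regardless of `rpc_server_type`, which matches the 3.11.7 behaviour described above.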




[jira] [Commented] (CASSANDRA-16091) rpc server gets wrongly initialized with rpc_enabled:false

2020-10-14 Thread Dmitrii Saprykin (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214043#comment-17214043
 ] 

Dmitrii Saprykin commented on CASSANDRA-16091:
--

Is this issue fixed by CASSANDRA-16124 ?

> rpc server gets wrongly initialized with rpc_enabled:false
> --
>
> Key: CASSANDRA-16091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16091
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Tom van der Woerdt
>Assignee: David Capwell
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
>
> After upgrading to Cassandra 3.11.8, Cassandra no longer starts. An exception 
> is thrown:
> {code:java}
>  java.lang.RuntimeException: Client SSL is not supported for non-blocking 
> sockets (hsha). Please remove client ssl from the configuration.
>   at 
> org.apache.cassandra.thrift.THsHaDisruptorServer$Factory.buildTServer(THsHaDisruptorServer.java:74)
>   at 
> org.apache.cassandra.thrift.TServerCustomFactory.buildTServer(TServerCustomFactory.java:55)
>   at 
> org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.<init>(ThriftServer.java:128)
>   at org.apache.cassandra.thrift.ThriftServer.start(ThriftServer.java:55)
>   at 
> org.apache.cassandra.service.CassandraDaemon.startNativeTransport(CassandraDaemon.java:713)
>   at 
> org.apache.cassandra.service.CassandraDaemon.start(CassandraDaemon.java:538)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:643)
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:768)
> {code}
> No configuration changed between 3.11.7 and 3.11.8. rpc_enabled is false in 
> both versions.
> I created this Jira issue because clearly something changed between 3.11.7 
> and 3.11.8. Maybe intentional, maybe not. Changing `rpc_server_type` (which 
> is not clearly documented to be about Thrift only) from `hsha` to `sync` does 
> resolve the issue, as expected, but this does seem like a regression, hence 
> the Jira issue.




[jira] [Commented] (CASSANDRA-16180) 4.0 quality testing: Coordination

2020-10-14 Thread Caleb Rackliffe (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214034#comment-17214034
 ] 

Caleb Rackliffe commented on CASSANDRA-16180:
-

I'd also propose that we leave Paxos/CAS out of scope for this issue. CC 
[~benedict]

> 4.0 quality testing: Coordination
> -
>
> Key: CASSANDRA-16180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16180
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on coordination.
> I think that the main reference dtest for this is 
> [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.




[jira] [Commented] (CASSANDRA-16181) 4.0 quality testing: Replication

2020-10-14 Thread Caleb Rackliffe (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214032#comment-17214032
 ] 

Caleb Rackliffe commented on CASSANDRA-16181:
-

See 
https://issues.apache.org/jira/browse/CASSANDRA-16180?focusedCommentId=17214031=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17214031,
 but focusing on hints and the write path.

> 4.0 quality testing: Replication
> 
>
> Key: CASSANDRA-16181
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16181
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on replication.
> I think that the main reference dtest for this is 
> [replication_test.py|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.




[jira] [Commented] (CASSANDRA-16180) 4.0 quality testing: Coordination

2020-10-14 Thread Caleb Rackliffe (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214031#comment-17214031
 ] 

Caleb Rackliffe commented on CASSANDRA-16180:
-

[~adelapena] [~bdeggleston] I know the main focus of this issue is testing, but 
I want to propose something that bleeds into our documentation and code 
organization along the way. {{StorageProxy}} is one of the most critical 
classes in the entire project, but it is almost exactly 3000 lines of code and 
has zero class-level JavaDoc. We should break it up into its major constituent 
parts (hints, Paxos, point reads, range reads, etc.) and consider testing those 
constituent parts in isolation. (There is a {{StorageProxyTest}}, but it's 
really just a test for some utilities that also happen to be jammed into 
{{StorageProxy}}.)

We don't have to boil the ocean either. You, [~jasonstack], and I know that SAI 
is already likely going to pull the range read logic out of {{StorageProxy}}, 
so pulling that forward (again, assuming we have reasonable tests to avoid 
risk) along w/ point reads could be a good first step. (That also corresponds 
pretty closely to this Jira in particular.)

> 4.0 quality testing: Coordination
> -
>
> Key: CASSANDRA-16180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16180
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on coordination.
> I think that the main reference dtest for this is 
> [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.




[jira] [Updated] (CASSANDRA-16181) 4.0 quality testing: Replication

2020-10-14 Thread Caleb Rackliffe (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-16181:

Reviewers: Caleb Rackliffe

> 4.0 quality testing: Replication
> 
>
> Key: CASSANDRA-16181
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16181
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on replication.
> I think that the main reference dtest for this is 
> [replication_test.py|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.




[jira] [Commented] (CASSANDRA-16181) 4.0 quality testing: Replication

2020-10-14 Thread Caleb Rackliffe (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214025#comment-17214025
 ] 

Caleb Rackliffe commented on CASSANDRA-16181:
-

[~adelapena] Do you think we should cover hints/hinted handoff here? In some 
sense, this and CASSANDRA-16180 are both about coordinator hardening, but this 
one is on the write side, and that one the read side.

> 4.0 quality testing: Replication
> 
>
> Key: CASSANDRA-16181
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16181
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on replication.
> I think that the main reference dtest for this is 
> [replication_test.py|https://github.com/apache/cassandra-dtest/blob/master/replication_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.




[jira] [Updated] (CASSANDRA-16180) 4.0 quality testing: Coordination

2020-10-14 Thread Caleb Rackliffe (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caleb Rackliffe updated CASSANDRA-16180:

Reviewers: Caleb Rackliffe

> 4.0 quality testing: Coordination
> -
>
> Key: CASSANDRA-16180
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16180
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/unit
>Reporter: Andres de la Peña
>Assignee: Andres de la Peña
>Priority: Normal
> Fix For: 4.0
>
>
> This is a subtask of CASSANDRA-15579 focusing on coordination.
> I think that the main reference dtest for this is 
> [consistency_test.py|https://github.com/apache/cassandra-dtest/blob/master/consistency_test.py].
>  We should identify which other tests cover this and identify what should be 
> extended, similarly to what has been done with CASSANDRA-15977.




[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool

2020-10-14 Thread Alex Petrov (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214013#comment-17214013
 ] 

Alex Petrov commented on CASSANDRA-16057:
-

+1

> Should update in-jvm dtest to expose stdout and stderr for nodetool
> ---
>
> Key: CASSANDRA-16057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16057
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Many nodetool commands output to stdout or stderr so running nodetool using 
> in-jvm dtest should expose that to tests.




[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-14 Thread Ekaterina Dimitrova (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Fix Version/s: (was: 4.0-beta3)

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x
>
>
> Unit test failure: TestRepairDataSystemTable.repair_table_test (vnodes). One 
> random failure was reported, which points to a race condition that still has to be located. 




[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-14 Thread Alex Petrov (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16207:

Status: Resolved  (was: Open)

Thank you [~marcuse]! Committed a follow-up to 
[3a05ed3ce15ab4dcd5f13b9b56c18c0198c0e203|https://github.com/apache/cassandra/commit/3a05ed3ce15ab4dcd5f13b9b56c18c0198c0e203]

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 3.0.23, 3.11.9, 4.0-beta3
>
>
> When trying to run upgrades, we sometimes call broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is 
> null
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at 
> org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>  
>   at 
> org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>  
>   at 
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) 
> ~[dtest-3.0.19.jar:?]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
>  
> {code}




[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool

2020-10-14 Thread Yifan Cai (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214000#comment-17214000
 ] 

Yifan Cai commented on CASSANDRA-16057:
---

Good catch! 

It turns out that the command I previously used to list {{(System.out | System.err)}} usage was wrong. The "-u" is unnecessary. 
I have fixed the 3.11 branch. 

{code:java}
08:14:09 in cassandra on b/CASSANDRA-16057-3.11 
➜ egrep -r 'System.out|System.err' src/java/org/apache/cassandra/tools | awk {'print $1'} | sort | uniq 
src/java/org/apache/cassandra/tools/AbstractJmxClient.java:
src/java/org/apache/cassandra/tools/BulkLoader.java:
src/java/org/apache/cassandra/tools/GetVersion.java:
src/java/org/apache/cassandra/tools/LoaderOptions.java:
src/java/org/apache/cassandra/tools/NodeProbe.java:
src/java/org/apache/cassandra/tools/Output.java:
src/java/org/apache/cassandra/tools/SSTableExpiredBlockers.java:
src/java/org/apache/cassandra/tools/SSTableExport.java:
src/java/org/apache/cassandra/tools/SSTableLevelResetter.java:
src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java:
src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java:
src/java/org/apache/cassandra/tools/SSTableRepairedAtSetter.java:
src/java/org/apache/cassandra/tools/StandaloneSSTableUtil.java:
src/java/org/apache/cassandra/tools/StandaloneScrubber.java:
src/java/org/apache/cassandra/tools/StandaloneSplitter.java:
src/java/org/apache/cassandra/tools/StandaloneUpgrader.java:
src/java/org/apache/cassandra/tools/StandaloneVerifier.java:
src/java/org/apache/cassandra/tools/Util.java:
src/java/org/apache/cassandra/tools/nodetool/formatter/TableBuilder.java:

08:15:16 in cassandra on b/CASSANDRA-16057-3.0 
➜ egrep -r 'System.out|System.err' src/java/org/apache/cassandra/tools | awk {'print $1'} | sort | uniq 
src/java/org/apache/cassandra/tools/AbstractJmxClient.java:
src/java/org/apache/cassandra/tools/BulkLoader.java:
src/java/org/apache/cassandra/tools/GetVersion.java:
src/java/org/apache/cassandra/tools/NodeProbe.java:
src/java/org/apache/cassandra/tools/Output.java:
src/java/org/apache/cassandra/tools/SSTableExpiredBlockers.java:
src/java/org/apache/cassandra/tools/SSTableExport.java:
src/java/org/apache/cassandra/tools/SSTableLevelResetter.java:
src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java:
src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java:
src/java/org/apache/cassandra/tools/SSTableRepairedAtSetter.java:
src/java/org/apache/cassandra/tools/StandaloneSSTableUtil.java:
src/java/org/apache/cassandra/tools/StandaloneScrubber.java:
src/java/org/apache/cassandra/tools/StandaloneSplitter.java:
src/java/org/apache/cassandra/tools/StandaloneUpgrader.java:
src/java/org/apache/cassandra/tools/StandaloneVerifier.java:
src/java/org/apache/cassandra/tools/Util.java:

08:15:26 in cassandra on b/CASSANDRA-16057-2.2
➜ egrep -r 'System.out|System.err' src/java/org/apache/cassandra/tools | awk {'print $1'} | sort | uniq 
src/java/org/apache/cassandra/tools/AbstractJmxClient.java:
src/java/org/apache/cassandra/tools/BulkLoader.java:
src/java/org/apache/cassandra/tools/GetVersion.java:
src/java/org/apache/cassandra/tools/NodeProbe.java:
src/java/org/apache/cassandra/tools/Output.java:
src/java/org/apache/cassandra/tools/SSTableExpiredBlockers.java:
src/java/org/apache/cassandra/tools/SSTableExport.java:
src/java/org/apache/cassandra/tools/SSTableImport.java:
src/java/org/apache/cassandra/tools/SSTableLevelResetter.java:
src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java:
src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java:
src/java/org/apache/cassandra/tools/SSTableRepairedAtSetter.java:
src/java/org/apache/cassandra/tools/StandaloneScrubber.java:
src/java/org/apache/cassandra/tools/StandaloneSplitter.java:
src/java/org/apache/cassandra/tools/StandaloneUpgrader.java:
src/java/org/apache/cassandra/tools/StandaloneVerifier.java:
src/java/org/apache/cassandra/tools/Util.java:
{code}
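
For readers who want to reproduce the listing above without the shell pipeline, here is a rough Python equivalent. It is a sketch rather than part of the patch: the function name `files_using_std_streams` is made up for illustration, and unlike the egrep pattern it escapes the dots, so it matches only the literal `System.out` and `System.err`.

```python
import re
from pathlib import Path

# Literal match for System.out / System.err (dots escaped, unlike the egrep pattern).
PATTERN = re.compile(r"System\.out|System\.err")


def files_using_std_streams(root):
    """Return sorted, de-duplicated paths under root whose contents match PATTERN."""
    matches = set()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Ignore undecodable bytes so binary files do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if PATTERN.search(text):
            matches.add(path.as_posix())
    return sorted(matches)


if __name__ == "__main__":
    for name in files_using_std_streams("src/java/org/apache/cassandra/tools"):
        print(name)
```

Running it from the root of a Cassandra checkout should print one path per matching file, comparable to the `sort | uniq` output above (without the trailing colon that `awk` keeps from the grep output).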


> Should update in-jvm dtest to expose stdout and stderr for nodetool
> ---
>
> Key: CASSANDRA-16057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16057
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Many nodetool commands output to stdout or stderr so running nodetool using 
> in-jvm dtest should expose that to tests.




[cassandra] branch trunk updated (6098762 -> 3a05ed3)

2020-10-14 Thread ifesdjeen

This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 6098762  Fail truncation requests when they fail on a replica
 add 3a05ed3  Follow-up: fix test failures caused by 16207.

No new revisions were added by this update.

Summary of changes:
 .../apache/cassandra/distributed/impl/Instance.java | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)



[jira] [Commented] (CASSANDRA-16057) Should update in-jvm dtest to expose stdout and stderr for nodetool

2020-10-14 Thread Alex Petrov (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213989#comment-17213989
 ] 

Alex Petrov commented on CASSANDRA-16057:
-

Code looks good. The only thing is that in 3.11 we still use {{System.out}} in 
{{ViewBuildStatus.java}} and {{Info.java}}. 

> Should update in-jvm dtest to expose stdout and stderr for nodetool
> ---
>
> Key: CASSANDRA-16057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16057
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: NA
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Many nodetool commands output to stdout or stderr so running nodetool using 
> in-jvm dtest should expose that to tests.




[jira] [Commented] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-14 Thread Marcus Eriksson (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213979#comment-17213979
 ] 

Marcus Eriksson commented on CASSANDRA-16207:
-

+1

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 3.0.23, 3.11.9, 4.0-beta3
>
>
> When trying to run upgrades, we sometimes call broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is 
> null
>   at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at 
> org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>  
>   at 
> org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>  
>   at 
> org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) 
> ~[dtest-3.0.19.jar:?]
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>  
>   at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
>  
> {code}




[jira] [Updated] (CASSANDRA-16208) Fail truncation requests when they fail on replica

2020-10-14 Thread Brandon Williams (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16208:
-
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/609876275738589fdfb9a3e20cb2f594aa404037
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> Fail truncation requests when they fail on replica
> --
>
> Key: CASSANDRA-16208
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16208
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta3
>
>





[jira] [Updated] (CASSANDRA-16208) Fail truncation requests when they fail on replica

2020-10-14 Thread Brandon Williams (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16208:
-
Status: Ready to Commit  (was: Review In Progress)

> Fail truncation requests when they fail on replica
> --
>
> Key: CASSANDRA-16208
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16208
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta3
>
>





[cassandra-dtest] branch master updated: Add test_truncate_failure

2020-10-14 Thread brandonwilliams

brandonwilliams pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/master by this push:
 new 8cb6bd2  Add test_truncate_failure
8cb6bd2 is described below

commit 8cb6bd23e62c4d3b4e208d3909361d6812182bc6
Author: Ekaterina Dimitrova 
AuthorDate: Thu Oct 8 09:23:00 2020 -0400

Add test_truncate_failure

Patch by Ekaterina Dimitrova, reviewed by brandonwilliams for
CASSANDRA-16208
---
 byteman/truncate_fail.btm |  8 
 cql_test.py   | 33 +
 2 files changed, 41 insertions(+)

diff --git a/byteman/truncate_fail.btm b/byteman/truncate_fail.btm
new file mode 100644
index 000..fa9caba
--- /dev/null
+++ b/byteman/truncate_fail.btm
@@ -0,0 +1,8 @@
+RULE Throw during truncate operation
+CLASS org.apache.cassandra.db.ColumnFamilyStore
+METHOD truncateBlocking()
+AT ENTRY
+IF TRUE
+DO
+   throw new RuntimeException("Dummy failure");
+ENDRULE
\ No newline at end of file
diff --git a/cql_test.py b/cql_test.py
index eced21d..dde7b7d 100644
--- a/cql_test.py
+++ b/cql_test.py
@@ -1,4 +1,5 @@
 import itertools
+import re
 import struct
 import time
 import pytest
@@ -764,6 +765,38 @@ class TestMiscellaneousCQL(CQLTester):
 [2, None, 2, None],
 [3, None, 3, None]])
 
+    @since("4.0")
+    def test_truncate_failure(self):
+        """
+        @jira_ticket CASSANDRA-16208
+        Tests that if a TRUNCATE query fails on some replica, the coordinator will immediately return an error to the
+        client instead of waiting to time out because it couldn't get the necessary number of success acks.
+        """
+        cluster = self.cluster
+        cluster.populate(3, install_byteman=True).start()
+        node1, _, node3 = cluster.nodelist()
+        node3.byteman_submit(['./byteman/truncate_fail.btm'])
+
+        session = self.patient_exclusive_cql_connection(node1)
+        create_ks(session, 'ks', 3)
+
+        logger.debug("Creating data table")
+        session.execute("CREATE TABLE data (id int PRIMARY KEY, data text)")
+        session.execute("UPDATE data SET data = 'Awesome' WHERE id = 1")
+
+        self.fixture_dtest_setup.ignore_log_patterns = ['Dummy failure']
+        logger.debug("Truncating data table (error expected)")
+
+        thrown = False
+        exception = None
+        try:
+            session.execute("TRUNCATE data")
+        except Exception as e:
+            exception = e
+            thrown = True
+
+        assert thrown, "No exception has been thrown"
+        assert re.search("Truncate failed on replica /127.0.0.3", str(exception)) is not None
 
 @since('3.2')
 class AbortedQueryTester(CQLTester):
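
The behaviour this test checks (the coordinator reports a replica failure right away instead of waiting for a timeout) can be sketched as a small fail-fast response handler. This is an illustrative Python model only, not Cassandra's `TruncateResponseHandler`: the class name `FailFastResponseHandler`, the exception `TruncateFailed`, and the method names are all hypothetical.

```python
import threading


class TruncateFailed(Exception):
    """Raised when a replica reports that its truncate failed (hypothetical name)."""


class FailFastResponseHandler:
    """Wait for `expected` successful acks, but stop waiting at the first failure."""

    def __init__(self, expected):
        self._remaining = expected
        self._failed_replica = None
        self._lock = threading.Lock()
        self._done = threading.Event()

    def on_response(self, replica, success):
        """Record one replica response and wake the waiter if the outcome is decided."""
        with self._lock:
            if success:
                self._remaining -= 1
            elif self._failed_replica is None:
                self._failed_replica = replica
            if self._remaining <= 0 or self._failed_replica is not None:
                self._done.set()

    def get(self, timeout):
        """Block until all acks arrive, a failure arrives, or `timeout` seconds pass."""
        if not self._done.wait(timeout):
            raise TimeoutError("Truncate timed out")
        if self._failed_replica is not None:
            raise TruncateFailed("Truncate failed on replica " + self._failed_replica)
```

The design choice worth noting is that a failure response sets the same event as the final success, so the waiter returns as soon as the outcome is known rather than running out the full timeout.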



[cassandra] branch trunk updated: Fail truncation requests when they fail on a replica

2020-10-14 Thread brandonwilliams

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 6098762  Fail truncation requests when they fail on a replica
6098762 is described below

commit 609876275738589fdfb9a3e20cb2f594aa404037
Author: Ekaterina Dimitrova 
AuthorDate: Mon Oct 12 18:11:51 2020 -0400

Fail truncation requests when they fail on a replica

Patch by Ekaterina Dimitrova, reviewed by brandonwilliams for
CASSANDRA-16208
---
 CHANGES.txt|  1 +
 .../apache/cassandra/db/TruncateVerbHandler.java   | 24 +--
 .../cassandra/service/TruncateResponseHandler.java | 27 ++
 3 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index a7701c7..fe3fef8 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-beta3
+ * Fail truncation requests when they fail on a replica (CASSANDRA-16208)
  * Move compact storage validation earlier in startup process (CASSANDRA-16063)
  * Fix ByteBufferAccessor cast exceptions are thrown when trying to query a virtual table (CASSANDRA-16155)
  * Consolidate node liveness check for forced repair (CASSANDRA-16113)
diff --git a/src/java/org/apache/cassandra/db/TruncateVerbHandler.java b/src/java/org/apache/cassandra/db/TruncateVerbHandler.java
index c605d1f..0d71464 100644
--- a/src/java/org/apache/cassandra/db/TruncateVerbHandler.java
+++ b/src/java/org/apache/cassandra/db/TruncateVerbHandler.java
@@ -34,31 +34,31 @@ public class TruncateVerbHandler implements IVerbHandler
 
     public void doVerb(Message message)
     {
-        TruncateRequest t = message.payload;
-        Tracing.trace("Applying truncation of {}.{}", t.keyspace, t.table);
+        TruncateRequest truncation = message.payload;
+        Tracing.trace("Applying truncation of {}.{}", truncation.keyspace, truncation.table);
         try
         {
-            ColumnFamilyStore cfs = Keyspace.open(t.keyspace).getColumnFamilyStore(t.table);
+            ColumnFamilyStore cfs = Keyspace.open(truncation.keyspace).getColumnFamilyStore(truncation.table);
             cfs.truncateBlocking();
         }
-        catch (Exception e)
+        catch (Throwable throwable)
         {
-            logger.error("Error in truncation", e);
-            respondError(t, message);
+            logger.error("Error in truncation", throwable);
+            respondError(truncation, message);
 
-            if (FSError.findNested(e) != null)
-                throw FSError.findNested(e);
+            if (FSError.findNested(throwable) != null)
+                throw FSError.findNested(throwable);
         }
         Tracing.trace("Enqueuing response to truncate operation to {}", message.from());
 
-        TruncateResponse response = new TruncateResponse(t.keyspace, t.table, true);
-        logger.trace("{} applied.  Enqueuing response to {}@{} ", t, message.id(), message.from());
+        TruncateResponse response = new TruncateResponse(truncation.keyspace, truncation.table, true);
+        logger.trace("{} applied.  Enqueuing response to {}@{} ", truncation, message.id(), message.from());
         MessagingService.instance().send(message.responseWith(response), message.from());
     }
 
-    private static void respondError(TruncateRequest t, Message truncateRequestMessage)
+    private static void respondError(TruncateRequest truncation, Message truncateRequestMessage)
     {
-        TruncateResponse response = new TruncateResponse(t.keyspace, t.table, false);
+        TruncateResponse response = new TruncateResponse(truncation.keyspace, truncation.table, false);
         MessagingService.instance().send(truncateRequestMessage.responseWith(response), truncateRequestMessage.from());
     }
 }
diff --git a/src/java/org/apache/cassandra/service/TruncateResponseHandler.java 
b/src/java/org/apache/cassandra/service/TruncateResponseHandler.java
index bcd7426..c2651e6 100644
--- a/src/java/org/apache/cassandra/service/TruncateResponseHandler.java
+++ b/src/java/org/apache/cassandra/service/TruncateResponseHandler.java
@@ -17,6 +17,7 @@
  */
 package org.apache.cassandra.service;
 
+import java.net.InetAddress;
 import java.util.concurrent.TimeoutException;
 import java.util.concurrent.atomic.AtomicInteger;
 
@@ -24,19 +25,22 @@ import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.TruncateResponse;
+import org.apache.cassandra.exceptions.TruncateException;
 import org.apache.cassandra.net.RequestCallback;
 import org.apache.cassandra.net.Message;
 import org.apache.cassandra.utils.concurrent.SimpleCondition;
 
 import static java.util.concurrent.TimeUnit.NANOSECONDS;
 
-public class TruncateResponseHandler implements RequestCallback

[jira] [Comment Edited] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-14 Thread Alex Petrov (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213973#comment-17213973
 ] 

Alex Petrov edited comment on CASSANDRA-16207 at 10/14/20, 2:52 PM:


This patch caused several test failures. Follow-up/fix: 
|[patch|https://github.com/apache/cassandra/pull/777]|[CI|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=CASSANDRA-16207-followup]|


was (Author: ifesdjeen):
This patch caused several test failures. 

|[patch|https://github.com/apache/cassandra/pull/777]|[CI|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=CASSANDRA-16207-followup]|

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 3.0.23, 3.11.9, 4.0-beta3
>
>
> When trying to run upgrades, sometimes we’re calling broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is null
>   at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>   at org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>   at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) ~[dtest-3.0.19.jar:?]
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
> {code}
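
The failure mode in the trace above can be illustrated with a minimal sketch, assuming a wrapper whose delegate stays null until the node has started. The names InstanceWrapper, start and broadcastAddressIfStarted are hypothetical and are not part of the dtest API; only the exception message is taken from the trace.

```java
import java.util.Optional;

// Hypothetical stand-in for a wrapper whose delegate is null until startup.
final class InstanceWrapper
{
    private volatile String delegateAddress; // null while uninitialised or shut down

    void start(String address)
    {
        delegateAddress = address;
    }

    // Mirrors the trace: fail loudly when there is no delegate.
    String broadcastAddress()
    {
        String address = delegateAddress;
        if (address == null)
            throw new IllegalStateException("Can't use shut down instances, delegate is null");
        return address;
    }

    // Guarded variant: a caller such as a message filter can skip nodes that have not started.
    Optional<String> broadcastAddressIfStarted()
    {
        return Optional.ofNullable(delegateAddress);
    }
}
```

A caller that may run before startup would use the guarded variant and treat an empty result as "node not ready" instead of letting the exception propagate. Whether the actual fix takes this shape is not stated in the ticket.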



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-14 Thread Alex Petrov (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213973#comment-17213973
 ] 

Alex Petrov commented on CASSANDRA-16207:
-

This patch caused several test failures. 

|[patch|https://github.com/apache/cassandra/pull/777]|[CI|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=CASSANDRA-16207-followup]|

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 3.0.23, 3.11.9, 4.0-beta3
>
>
> When trying to run upgrades, sometimes we’re calling broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is null
>   at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>   at org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>   at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) ~[dtest-3.0.19.jar:?]
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
> {code}




[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-14 Thread Alex Petrov (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16207:

Status: Open  (was: Resolved)

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 3.0.23, 3.11.9, 4.0-beta3
>
>
> When trying to run upgrades, sometimes we’re calling broadcast address on 
> an uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is null
>   at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>   at org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>   at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) ~[dtest-3.0.19.jar:?]
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
> {code}




[jira] [Comment Edited] (CASSANDRA-16209) Log Warning Rather than Verbose Trace when Preview Repair Validation Conflicts with Incremental Repair

2020-10-14 Thread Caleb Rackliffe (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213480#comment-17213480
 ] 

Caleb Rackliffe edited comment on CASSANDRA-16209 at 10/14/20, 2:38 PM:


[patch|https://github.com/apache/cassandra/pull/776]
[CircleCI|https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-16209]

Note: The failures in the first round of tests look mostly related to 
CASSANDRA-16148


was (Author: maedhroz):
[patch|https://github.com/apache/cassandra/pull/776]
[CircleCI|https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-16209]

> Log Warning Rather than Verbose Trace when Preview Repair Validation 
> Conflicts with Incremental Repair
> --
>
> Key: CASSANDRA-16209
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16209
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Caleb Rackliffe
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a preview repair on repaired data identifies which SSTables to validate, 
> it might come across an SSTable that's still pending for an in-progress 
> incremental repair session. It makes sense that we immediately fail the 
> preview repair in that case, but the resulting error and verbose stack trace 
> in the logs is a bit too severe a reaction. We should downgrade this to a 
> simple warning message.
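
A minimal sketch of the proposed downgrade, assuming the expected conflict can be told apart by its exception type. PendingRepairConflictException and logFailure are hypothetical names, and java.util.logging is used only to keep the example self-contained (Cassandra itself logs through slf4j).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class PreviewRepairLogging
{
    private static final Logger logger = Logger.getLogger("PreviewRepairLogging");

    // Hypothetical marker for "SSTable is still pending an incremental repair".
    static final class PendingRepairConflictException extends RuntimeException
    {
        PendingRepairConflictException(String message)
        {
            super(message);
        }
    }

    // Returns the level that was used, so the choice is easy to check.
    static Level logFailure(Throwable failure)
    {
        if (failure instanceof PendingRepairConflictException)
        {
            // Expected conflict: one warning line, no stack trace.
            logger.log(Level.WARNING, "Preview repair failed: {0}", failure.getMessage());
            return Level.WARNING;
        }
        // Anything else keeps the full error with its stack trace.
        logger.log(Level.SEVERE, "Preview repair failed", failure);
        return Level.SEVERE;
    }
}
```

The preview repair still fails immediately in both branches; only the severity and verbosity of the log entry differ.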




[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-14 Thread Ekaterina Dimitrova (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Description: Unit Test failure: TestRepairDataSystemTable.repair_table_test 
(vnodes) - one random failure was reported which pointed to a race condition to 
be spotted.   (was: Unit Test failure: 
TestRepairDataSystemTable.repair_table_test (vnodes) - one failure there was 
reported which pointed to a race condition to be spotted. )

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta3
>
>
> Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one 
> random failure was reported which pointed to a race condition to be spotted. 
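
The ticket gives no detail on the race beyond its title, so the following is only a sketch of what "synchronize store/clear" could look like: both operations take the same lock, so a clear cannot interleave with a concurrent create-and-store. KeyspaceRegistry, open and clear are hypothetical names, not the actual Keyspace or Schema code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class KeyspaceRegistry
{
    private final Map<String, Object> instances = new HashMap<>();

    // Store: the instance is created and published under the same lock as clear().
    synchronized Object open(String name, Function<String, Object> loader)
    {
        return instances.computeIfAbsent(name, loader);
    }

    // Clear: removes and returns the stored instance, or null if none was stored.
    synchronized Object clear(String name)
    {
        return instances.remove(name);
    }

    synchronized int size()
    {
        return instances.size();
    }
}
```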




[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-14 Thread Ekaterina Dimitrova (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Fix Version/s: 4.0-beta3
   3.11.x

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta3
>
>
> Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one 
> failure there was reported which pointed to a race condition to be spotted. 




[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-14 Thread Ekaterina Dimitrova (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

Description: Unit Test failure: TestRepairDataSystemTable.repair_table_test 
(vnodes) - one failure there was reported which pointed to a race condition to 
be spotted. 

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
>
> Unit Test failure: TestRepairDataSystemTable.repair_table_test (vnodes) - one 
> failure there was reported which pointed to a race condition to be spotted. 




[jira] [Updated] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-14 Thread Ekaterina Dimitrova (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16210:

 Bug Category: Parent values: Correctness(12982)
   Complexity: Normal
  Component/s: Cluster/Schema
Discovered By: Unit Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Synchronize Keyspace instance store/clear
> -
>
> Key: CASSANDRA-16210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
>





[jira] [Created] (CASSANDRA-16210) Synchronize Keyspace instance store/clear

2020-10-14 Thread Ekaterina Dimitrova (Jira)

Ekaterina Dimitrova created CASSANDRA-16210:
---

 Summary: Synchronize Keyspace instance store/clear
 Key: CASSANDRA-16210
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16210
 Project: Cassandra
  Issue Type: Bug
Reporter: Ekaterina Dimitrova
Assignee: Ekaterina Dimitrova







[jira] [Commented] (CASSANDRA-16197) Upgrade the metrics version

2020-10-14 Thread Benjamin Lerer (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213941#comment-17213941
 ] 

Benjamin Lerer commented on CASSANDRA-16197:


It is probably easier and cleaner to open a new one at that time if we have the 
need for it. 

> Upgrade the metrics version
> ---
>
> Key: CASSANDRA-16197
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16197
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies
>Reporter: Benjamin Lerer
>Assignee: Benjamin Lerer
>Priority: Normal
>
> The current metrics version used by Cassandra is 3.1.5 which was not compiled 
> and targeted for the JDK 8 
> (https://metrics.dropwizard.io/4.1.2/about/release-notes.html). 
> There are several bug fixes that would also be interesting to get.
>   




[jira] [Commented] (CASSANDRA-15581) 4.0 quality testing: Compaction

2020-10-14 Thread Marcus Eriksson (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213926#comment-17213926
 ] 

Marcus Eriksson commented on CASSANDRA-15581:
-

I think this task should focus mostly on the mechanics of picking sstables for 
compaction, not the actual merging of sstables (though, that will of course 
also be tested by anything we do here). What [~paulo] defined above would be a 
good start

Major compaction changes were CASSANDRA-6696 and CASSANDRA-7019

* Run all tests with different amounts of data directories (1/5/20)
* Run all tests with different compaction strategies (LCS/STCS/TWCS)
* Run LCS tests with {{single_sstable_uplevel}} on/off - CASSANDRA-12526
* Bootstrap/decom/replace, make sure disk usage is balanced on new + old nodes
* Heavy compaction load + range movements
* Heavy compaction load + ALTER .. WITH compaction = ..
* Heavy compaction load + incremental repair / anticompaction
* Test large node upgrades with several data directories (3.0 -> 4.0 probably 
most interesting here)
* Test `nodetool garbagecollect` with large datasets and many tombstones.
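
As a rough illustration of how the first three bullets multiply out, here is a minimal sketch that enumerates the combinations. CompactionTestMatrix and its label format are hypothetical; this is not an existing test harness.

```java
import java.util.ArrayList;
import java.util.List;

final class CompactionTestMatrix
{
    static List<String> combinations()
    {
        int[] dataDirectories = {1, 5, 20};
        String[] strategies = {"LCS", "STCS", "TWCS"};
        List<String> result = new ArrayList<>();
        for (int dirs : dataDirectories)
        {
            for (String strategy : strategies)
            {
                String base = "dirs=" + dirs + " strategy=" + strategy;
                if (strategy.equals("LCS"))
                {
                    // LCS runs are doubled: single_sstable_uplevel on and off.
                    result.add(base + " single_sstable_uplevel=true");
                    result.add(base + " single_sstable_uplevel=false");
                }
                else
                {
                    result.add(base);
                }
            }
        }
        return result;
    }
}
```

With three directory counts and three strategies, one of which is run twice, this yields twelve configurations before any of the heavier load scenarios are layered on top.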

> 4.0 quality testing: Compaction
> ---
>
> Key: CASSANDRA-15581
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15581
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest/python
>Reporter: Josh McKenzie
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Reference [doc from 
> NGCC|https://docs.google.com/document/d/1uhUOp7wpE9ZXNDgxoCZHejHt5SO4Qw1dArZqqsJccyQ/edit#]
>  for context.
> *Shepherd: Marcus Eriksson*
> Alongside the local and distributed read/write paths, we'll also want to 
> validate compaction. CASSANDRA-6696 introduced substantial 
> changes/improvements that require testing (esp. JBOD).




[jira] [Commented] (CASSANDRA-15241) Virtual table to expose current running queries

2020-10-14 Thread Josh McKenzie (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213919#comment-17213919
 ] 

Josh McKenzie commented on CASSANDRA-15241:
---

[~clohfink] - It's a new feature so we wouldn't put it in 4.0 right? I don't 
*think* this is one of the ones we discussed on the ML/slack about straddling 
the freeze (inferring from dates on the ticket here). Feel free to correct me 
if I'm wrong on that though; we've been talking a lot lately.

> Virtual table to expose current running queries
> ---
>
> Key: CASSANDRA-15241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15241
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Virtual Tables
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
> Fix For: 4.0
>
>
> Expose current running queries and their duration.
> {code}cqlsh> select * from system_views.queries;
>  thread_id                    | duration_micros | task
> ------------------------------+-----------------+------
>  Native-Transport-Requests-17 |            6325 | QUERY select * from system_views.queries; [pageSize = 100]
>   Native-Transport-Requests-4 |           14681 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>   Native-Transport-Requests-6 |           14678 | EXECUTE f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE
>                  ReadStage-10 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-13 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-14 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-19 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-20 |           11861 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-22 |            7279 | SELECT * FROM basic.wide1 LIMIT 5000
>                  ReadStage-23 |            4716 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-5 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-7 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000
>                   ReadStage-8 |           16535 | SELECT * FROM basic.wide1 LIMIT 5000{code}




[jira] [Updated] (CASSANDRA-16151) Package tools/bin scripts as executable

2020-10-14 Thread Paulo Motta (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-16151:

Bug Category: Parent values: Packaging(13660)Level 1 values: Source 
Distribution(13661)  (was: Parent values: Code(13163)Level 1 values: Bug - 
Unclear Impact(13164))

> Package tools/bin scripts as executable
> ---
>
> Key: CASSANDRA-16151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16151
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Angelo Polo
>Assignee: Angelo Polo
>Priority: Normal
>  Labels: patch
> Fix For: 4.0-beta, 3.11.9
>
> Attachments: 3.11-Package-tools-bin-scripts-as-executable.patch, 
> trunk-Package-tools-bin-scripts-as-executable.patch
>
>
> The tools/bin scripts aren't packaged as executable in the source 
> distributions, though in the repository the scripts have the right bits.
> This causes, on 3.11.8 for example, the tests in 
> org.apache.cassandra.cql3.EmptyValuesTest to fail:
> {{java.io.IOException: Cannot run program "tools/bin/sstabledump": error=13, 
> Permission denied}}
> {{[junit-timeout] junit.framework.AssertionFailedError: java.io.IOException}}
> {{[junit-timeout]         at 
> org.apache.cassandra.cql3.EmptyValuesTest.verify(EmptyValuesTest.java:85)}}
> {{[junit-timeout]         at 
> org.apache.cassandra.cql3.EmptyValuesTest.verifyJsonInsert(EmptyValuesTest.java:112)}}
> See attached patch of build.xml for the trunk and cassandra-3.11 branches.




[jira] [Commented] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-14 Thread Aleksey Yeschenko (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213879#comment-17213879
 ] 

Aleksey Yeschenko commented on CASSANDRA-15935:
---

The difference isn't huge, and I myself don't have a *strong* preference 
either, but my weak preference goes to the more Java-y, {{Action}} route.

> Improve machinery for testing consistency in presence of range movements
> 
>
> Key: CASSANDRA-15935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15935
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> Currently, we can test range movements only by adding and bootstrapping a new 
> node. This is both inefficient and insufficient for large-scale tests. We 
> need a possibility to dynamically change ring ownership over the lifetime of 
> cluster, with a flexibility to changing gossip status of the node from 
> perspective of other participants, adding and removing nodes from other 
> nodes' views on demand.




[jira] [Comment Edited] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-14 Thread Alex Petrov (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213799#comment-17213799
 ] 

Alex Petrov edited comment on CASSANDRA-15935 at 10/14/20, 10:17 AM:
-

Moving a 
[conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334]
 about {{Action}} vs static method here. 

bq. in your examples there are no real differences between run and forEach, so 
I rather have forEach only.

You're right there are no real differences between {{run}} and {{forEach}}. 
However, I had several reasons to use interface implementations, which are:

# {{Action}} is an atomic unit of logic, unlike a static method. You can 
immediately see all things related to a specific action, reuse, and move them 
at your discretion. Using static methods will quickly get out of hand when we 
have more sophisticated actions. 
# Separation of input arguments (for example "disseminate gossip state of the 
node X") and `target`, making `target` explicit and common for all cases. In 
some cases, we can even reduce amount of work we're doing, and do it once in a 
constructor. For example, 
[here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144],
 we create gossip state that gets disseminated by getting applied to each 
action. Contrast this with 
[this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143],
 where we either have to change to use instance collection as input, or 
re-create distributed state each time.
# Main idea behind the `Action` was to create reusable pieces of logic you can 
apply to cluster or nodes. For now, logic is simple, like "pull schema from X, 
and then bootstrap", but we can reuse similar sequences of steps in: 
*  (a) Harry, where we can schedule different actions against different sets of 
nodes, producing reliable results
* (b) In upgrade test, where you'll be able to run named actions against 
instances despite the fact they have different versions. 
To do (a) and (b) with static methods, we'll have to _still_ implement some 
interface. 
# We can use static code analysis to find all Actions in the code.
# We can chain actions, too:
{code}
cluster.run(asList(pullSchemaFrom(cluster.get(1)),
                   bootstrap()),
            newInstance.config().num());
{code}

I've used {{Action}} from the beginning with this intention. Everyone I asked 
has no strong preference towards one or the other, and it's the same with me: aside 
from the above arguments, the difference is purely syntactic. Both approaches have 
equivalent semantics.
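
To make the argument above concrete, here is a minimal sketch of the pattern, assuming a simplified single-method {{Action}} interface. SketchNode, Actions and all method bodies are hypothetical stand-ins, not the in-JVM dtest API; only the names pullSchemaFrom, bootstrap and run come from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cluster node; it records what was applied to it.
final class SketchNode
{
    final int num;
    final List<String> log = new ArrayList<>();

    SketchNode(int num)
    {
        this.num = num;
    }
}

// Hypothetical shape of an action: one unit of logic applied to an explicit target.
interface Action
{
    void accept(SketchNode target);
}

final class Actions
{
    // The input argument (the source node) is captured once when the action is built.
    static Action pullSchemaFrom(SketchNode source)
    {
        String step = "pull schema from node" + source.num;
        return target -> target.log.add(step);
    }

    static Action bootstrap()
    {
        return target -> target.log.add("bootstrap");
    }

    // Chains actions by applying them in order to the same target.
    static void run(List<Action> actions, SketchNode target)
    {
        for (Action action : actions)
            action.accept(target);
    }
}
```

Because each action is a value, the same list can be scheduled against different targets, which is the reuse that points 3 (a) and (b) rely on. A static-method version would need an equivalent interface before it could be passed around the same way.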


was (Author: ifesdjeen):
Moving a 
[conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334]
 about {{Action}} vs static method here. 

bq. in your examples there are no real differences between run and forEach, so 
I rather have forEach only.

You're right there are no real differences between {{run}} and {{forEach}}. 
However, I had several reasons to use interface implementations, which are:

1. {{Action}} is an atomic unit of logic, unlike a static method. You can 
immediately see all things related to a specific action, reuse, and move them 
at your discretion. Using static methods will quickly get out of hand when we 
have more sophisticated actions. 
2. Separation of input arguments (for example "disseminate gossip state of the 
node X") and `target`, making `target` explicit and common for all cases. In 
some cases, we can even reduce amount of work we're doing, and do it once in a 
constructor. For example, 
[here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144],
 we create gossip state that gets disseminated by getting applied to each 
action. Contrast this with 
[this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143],
 where we either have to change to use instance collection as input, or 
re-create distributed state each time.
3. Main idea behind the `Action` was to create reusable pieces of logic you can 
apply to cluster or nodes. For now, logic is simple, like "pull schema from X, 
and then bootstrap", but we can reuse similar sequences of steps in: 
   a. Harry, where we can schedule different actions against different sets of 
nodes, producing reliable results
   b. In upgrade test, where you'll be able to run named actions against 
instances despite the fact they have different versions. 
To do (a) and (b) with static methods, we'll have to _still_ implement some

[jira] [Updated] (CASSANDRA-16151) Package tools/bin scripts as executable

2020-10-14 Thread Jira



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-16151:
--
Reviewers: Andres de la Peña

> Package tools/bin scripts as executable
> ---
>
> Key: CASSANDRA-16151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16151
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Angelo Polo
>Assignee: Angelo Polo
>Priority: Normal
>  Labels: patch
> Fix For: 4.0-beta, 3.11.9
>
> Attachments: 3.11-Package-tools-bin-scripts-as-executable.patch, 
> trunk-Package-tools-bin-scripts-as-executable.patch
>
>
> The tools/bin scripts aren't packaged as executable in the source 
> distributions, though in the repository the scripts have the right bits.
> This causes, on 3.11.8 for example, the tests in 
> org.apache.cassandra.cql3.EmptyValuesTest to fail:
> {{java.io.IOException: Cannot run program "tools/bin/sstabledump": error=13, 
> Permission denied}}
> {{[junit-timeout] junit.framework.AssertionFailedError: java.io.IOException}}
> {{[junit-timeout]         at 
> org.apache.cassandra.cql3.EmptyValuesTest.verify(EmptyValuesTest.java:85)}}
> {{[junit-timeout]         at 
> org.apache.cassandra.cql3.EmptyValuesTest.verifyJsonInsert(EmptyValuesTest.java:112)}}
> See attached patch of build.xml for the trunk and cassandra-3.11 branches.




[jira] [Comment Edited] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-14 Thread Alex Petrov (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213799#comment-17213799
 ] 

Alex Petrov edited comment on CASSANDRA-15935 at 10/14/20, 10:16 AM:
-

Moving a 
[conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334]
 about {{Action}} vs static method here. 

bq. in your examples there are no real differences between run and forEach, so 
I rather have forEach only.

You're right there are no real differences between {{run}} and {{forEach}}. 
However, I had several reasons to use interface implementations, which are:

1. {{Action}} is an atomic unit of logic, unlike a static method. You can 
immediately see all things related to a specific action, reuse, and move them 
at your discretion. Using static methods will quickly get out of hand when we 
have more sophisticated actions. 
2. Separation of input arguments (for example "disseminate gossip state of the 
node X") and `target`, making `target` explicit and common for all cases. In 
some cases, we can even reduce amount of work we're doing, and do it once in a 
constructor. For example, 
[here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144],
 we create gossip state that gets disseminated by getting applied to each 
action. Contrast this with 
[this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143],
 where we either have to change to use instance collection as input, or 
re-create distributed state each time.
3. The main idea behind `Action` was to create reusable pieces of logic you 
can apply to a cluster or to nodes. For now, the logic is simple, like "pull 
schema from X, and then bootstrap", but we can reuse similar sequences of 
steps in: 
   a. Harry, where we can schedule different actions against different sets of 
nodes, producing reliable results
   b. In upgrade tests, where you'll be able to run named actions against 
instances despite the fact that they have different versions. 
To do (a) and (b) with static methods, we'll _still_ have to implement some 
interface. 
4. We can use static code analysis to find all Actions in the code.
5. We can chain actions, too:
{code}
cluster.run(asList(pullSchemaFrom(cluster.get(1)),
 bootstrap()),
newInstance.config().num());
{code}

I've used {{Action}} from the beginning with these intentions. Everyone I asked 
has no strong preference towards one or the other, and it's the same with me: 
aside from the above arguments, the difference is purely syntactic. Both 
approaches have equivalent semantics.
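
Point 2 above (doing the work once in a constructor instead of once per target) can be sketched as follows. This is a minimal Python model for illustration only, not the Java in-JVM dtest API: the names `Action`, `DisseminateGossipState`, `disseminate_static`, and `run` are hypothetical, and the "gossip state" is just a string.

```python
from abc import ABC, abstractmethod

build_count = {"action": 0, "static": 0}  # how often the state gets built


class Action(ABC):
    """Hypothetical model: a reusable unit of logic applied to one target."""

    @abstractmethod
    def accept(self, target: str) -> str:
        ...


class DisseminateGossipState(Action):
    """Input argument (source) is fixed in the constructor; target stays explicit."""

    def __init__(self, source: str):
        build_count["action"] += 1  # state is built once, up front
        self.state = f"gossip-state-of-{source}"

    def accept(self, target: str) -> str:
        return f"{target} <- {self.state}"


def disseminate_static(source: str, target: str) -> str:
    """Static-method style: the state is re-created on every call."""
    build_count["static"] += 1
    return f"{target} <- gossip-state-of-{source}"


def run(actions, targets):
    """Apply every action, in order, to every target."""
    return [action.accept(t) for t in targets for action in actions]


targets = ["node2", "node3"]
results = run([DisseminateGossipState("node1")], targets)
static_results = [disseminate_static("node1", t) for t in targets]
```

Both styles produce the same results, which matches the remark that the semantics are equivalent; the only difference in this model is that the object form builds its state once while the static form builds it per target.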


was (Author: ifesdjeen):
Moving a 
[conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334]
 about {{Action}} vs static method here. 

bq. in your examples there are no real differences between run and forEach, so 
I rather have forEach only.

You're right that there are no real differences between {{run}} and 
{{forEach}}. However, I had several reasons to use interface implementations, 
which are:

1. {{Action}} is an atomic unit of logic, unlike a static method. You can 
immediately see all things related to a specific action, reuse them, and move 
them at your discretion. Using static methods will quickly get out of hand 
when we have more sophisticated actions. 
2. Separation of input arguments (for example "disseminate gossip state of the 
node X") and `target`, making `target` explicit and common for all cases.
3. The main idea behind `Action` was to create reusable pieces of logic you 
can apply to a cluster or to nodes. For now, the logic is simple, like "pull 
schema from X, and then bootstrap", but we can reuse similar sequences of 
steps in: 
   a. Harry, where we can schedule different actions against different sets of 
nodes, producing reliable results
   b. In upgrade tests, where you'll be able to run named actions against 
instances despite the fact that they have different versions. 
To do (a) and (b) with static methods, we'll _still_ have to implement some 
interface. 
4. In some cases, we can just reduce the amount of work we're doing. For 
example, 
[here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144],
 we create gossip state that gets disseminated by getting applied to each 
action. Contrast this with 
[this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143],
 where we either have to change to use instance collection as input, or 
re-create distributed state each time.
5. We can use static code

[jira] [Commented] (CASSANDRA-15935) Improve machinery for testing consistency in presence of range movements

2020-10-14 Thread Alex Petrov (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213799#comment-17213799
 ] 

Alex Petrov commented on CASSANDRA-15935:
-

Moving a 
[conversation|https://github.com/apache/cassandra/pull/759#discussion_r503611334]
 about {{Action}} vs static method here. 

bq. in your examples there are no real differences between run and forEach, so 
I rather have forEach only.

You're right that there are no real differences between {{run}} and 
{{forEach}}. However, I had several reasons to use interface implementations, 
which are:

1. {{Action}} is an atomic unit of logic, unlike a static method. You can 
immediately see all things related to a specific action, reuse them, and move 
them at your discretion. Using static methods will quickly get out of hand 
when we have more sophisticated actions. 
2. Separation of input arguments (for example "disseminate gossip state of the 
node X") and `target`, making `target` explicit and common for all cases.
3. The main idea behind `Action` was to create reusable pieces of logic you 
can apply to a cluster or to nodes. For now, the logic is simple, like "pull 
schema from X, and then bootstrap", but we can reuse similar sequences of 
steps in: 
   a. Harry, where we can schedule different actions against different sets of 
nodes, producing reliable results
   b. In upgrade tests, where you'll be able to run named actions against 
instances despite the fact that they have different versions. 
To do (a) and (b) with static methods, we'll _still_ have to implement some 
interface. 
4. In some cases, we can just reduce the amount of work we're doing. For 
example, 
[here|https://github.com/ifesdjeen/cassandra/blob/96eab42347f51bd32a22875b85f4acf6cd9785d4/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L136-L144],
 we create gossip state that gets disseminated by getting applied to each 
action. Contrast this with 
[this|https://github.com/dcapwell/cassandra/blob/935f78b101484f6dfee473fd5375a31761f02b39/test/distributed/org/apache/cassandra/distributed/action/GossipHelper.java#L122-L143],
 where we either have to change to use instance collection as input, or 
re-create distributed state each time.
5. We can use static code analysis to find all Actions in the code.
6. We can chain actions, too:
{code}
cluster.run(asList(pullSchemaFrom(cluster.get(1)),
 bootstrap()),
newInstance.config().num());
{code}

I've used {{Action}} from the beginning with these intentions. Everyone I asked 
has no strong preference towards one or the other, and it's the same with me: 
aside from the above arguments, the difference is purely syntactic. Both 
approaches have equivalent semantics.

> Improve machinery for testing consistency in presence of range movements
> 
>
> Key: CASSANDRA-15935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15935
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> Currently, we can test range movements only by adding and bootstrapping a new 
> node. This is both inefficient and insufficient for large-scale tests. We 
> need a way to dynamically change ring ownership over the lifetime of a 
> cluster, with the flexibility to change the gossip status of a node from the 
> perspective of other participants, adding and removing nodes from other 
> nodes' views on demand.




[jira] [Commented] (CASSANDRA-16196) Fix flaky test test_disk_balance_after_boundary_change_lcs - disk_balance_test.TestDiskBalance

2020-10-14 Thread Berenguer Blasi (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213787#comment-17213787
 ] 

Berenguer Blasi commented on CASSANDRA-16196:
-

SGTM and there's no byteman either I can think of to catch pending deletes... 
:shrug:

> Fix flaky test test_disk_balance_after_boundary_change_lcs - 
> disk_balance_test.TestDiskBalance
> --
>
> Key: CASSANDRA-16196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16196
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: node2-debug-end.log
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/622/workflows/adcd463c-156a-43c7-a9bc-7f3e4938dbe8/jobs/3514
> {code}
> error_message = '' if 'error_message' not in kwargs else kwargs['error_message']
> assert vmin > vmax * (1.0 - error) or vmin == vmax, \
> >   "values not within {:.2f}% of the max: {} ({})".format(error * 100, args, error_message)
> E   AssertionError: values not within 10.00% of the max: (8022760, 9192165, 4575645, 9235566, 9091014) (node2)
> tools/assertions.py:206: AssertionError
> {code}
> Marking as distinct issue after chat in CASSANDRA-14030
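
To make the failure above concrete, the quoted check can be replayed on the reported values. This is a small sketch, assuming `vmin` and `vmax` are the minimum and maximum of the five values and that `error` is 0.10 (as the "10.00%" in the message suggests); the helper name `within_bound_of_max` is made up for illustration.

```python
def within_bound_of_max(values, error):
    """Same condition as the quoted assert: smallest value must exceed max * (1 - error)."""
    vmin, vmax = min(values), max(values)
    return vmin > vmax * (1.0 - error) or vmin == vmax


# Values copied from the AssertionError message for node2
sizes = (8022760, 9192165, 4575645, 9235566, 9091014)

threshold = max(sizes) * (1.0 - 0.10)      # lowest value that would still pass
balanced = within_bound_of_max(sizes, 0.10)
shortfall = 1.0 - min(sizes) / max(sizes)  # how far the smallest value is below the max
```

Under these assumptions the smallest value (4575645) is roughly half of the largest (9235566), far outside the 10% tolerance, so the check fails by a wide margin rather than by a rounding-level difference.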




[jira] [Updated] (CASSANDRA-16196) Fix flaky test test_disk_balance_after_boundary_change_lcs - disk_balance_test.TestDiskBalance

2020-10-14 Thread Berenguer Blasi (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-16196:

Reviewers: Berenguer Blasi, Brandon Williams  (was: Brandon Williams)

> Fix flaky test test_disk_balance_after_boundary_change_lcs - 
> disk_balance_test.TestDiskBalance
> --
>
> Key: CASSANDRA-16196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16196
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: node2-debug-end.log
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/622/workflows/adcd463c-156a-43c7-a9bc-7f3e4938dbe8/jobs/3514
> {code}
> error_message = '' if 'error_message' not in kwargs else kwargs['error_message']
> assert vmin > vmax * (1.0 - error) or vmin == vmax, \
> >   "values not within {:.2f}% of the max: {} ({})".format(error * 100, args, error_message)
> E   AssertionError: values not within 10.00% of the max: (8022760, 9192165, 4575645, 9235566, 9091014) (node2)
> tools/assertions.py:206: AssertionError
> {code}
> Marking as distinct issue after chat in CASSANDRA-14030




[jira] [Updated] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-10-14 Thread Michael Semb Wever (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16201:
---
Reviewers: Michael Semb Wever

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated object arrays of 20K elements 
> each, resulting in a shallow heap of 80K each, although there is only one 
> element in the array.
> This sort of pre-allocation is causing a lot of memory pressure.
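
The 20K-elements-to-80K figure can be checked with a little arithmetic. The sketch below assumes 4-byte references (compressed object pointers), a 16-byte array header, and 8-byte object alignment; these are typical 64-bit HotSpot values, not numbers taken from the heap dump, and the helper name is made up.

```python
REFERENCE_BYTES = 4      # assumption: compressed object pointers
ARRAY_HEADER_BYTES = 16  # assumption: typical 64-bit HotSpot array header
ALIGNMENT = 8            # assumption: objects are padded to 8-byte boundaries


def shallow_array_bytes(length):
    """Approximate shallow size of an Object[] with the given length."""
    raw = ARRAY_HEADER_BYTES + length * REFERENCE_BYTES
    return (raw + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


preallocated = shallow_array_bytes(20_000)  # array sized for 20K rows
needed = shallow_array_bytes(1)             # array sized for the single row present
wasted_per_array = preallocated - needed
```

With these assumptions each pre-allocated array costs about 80 KB of shallow heap where a few dozen bytes would do, which is consistent with the 80K figure reported above.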




[jira] [Updated] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-14 Thread Alex Petrov (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16157:

  Fix Version/s: 4.0-beta3
  Since Version: 4.0-beta1
Source Control Link: 
https://github.com/apache/cassandra/commit/5be83b6a72695253c552535d2b826209f144cc63
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed to trunk with 
[5be83b6a72695253c552535d2b826209f144cc63|https://github.com/apache/cassandra/commit/5be83b6a72695253c552535d2b826209f144cc63]

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-beta3
>
>
> When trying to upgrade 3.0 to 4.0, we often run into a problem if an older 
> node serves as the coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>   at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>   at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>   ... 7 more
> {code}




[cassandra] branch trunk updated: Fix NPEs when 3.0 messages get re-serialized for filtering on 4.0 nodes in in-JVM dtests.

2020-10-14 Thread ifesdjeen

This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 5be83b6  Fix NPEs when 3.0 messages get re-serialized for filtering on 
4.0 nodes in in-JVM dtests.
5be83b6 is described below

commit 5be83b6a72695253c552535d2b826209f144cc63
Author: Alex Petrov 
AuthorDate: Thu Oct 1 17:00:12 2020 +0200

Fix NPEs when 3.0 messages get re-serialized for filtering on 4.0 nodes in 
in-JVM dtests.

Patch by Alex Petrov; reviewed by Yifan Cai and David Capwell for 
CASSANDRA-16157
---
 .../cassandra/distributed/impl/Instance.java   | 14 +-
 .../cassandra/distributed/impl/MessageImpl.java| 13 -
 .../cassandra/distributed/upgrade/UpgradeTest.java | 22 ++
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git 
a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java 
b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
index 6ad0712..47e2b32 100644
--- a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
@@ -83,6 +83,7 @@ import org.apache.cassandra.gms.Gossiper;
 import org.apache.cassandra.gms.VersionedValue;
 import org.apache.cassandra.hints.HintsService;
 import org.apache.cassandra.index.SecondaryIndexManager;
+import org.apache.cassandra.io.IVersionedAsymmetricSerializer;
 import org.apache.cassandra.io.sstable.IndexSummaryManager;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.util.DataInputBuffer;
@@ -91,6 +92,7 @@ import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.locator.InetAddressAndPort;
 import org.apache.cassandra.net.Message;
 import org.apache.cassandra.net.MessagingService;
+import org.apache.cassandra.net.NoPayload;
 import org.apache.cassandra.net.Verb;
 import org.apache.cassandra.schema.Schema;
 import org.apache.cassandra.schema.SchemaConstants;
@@ -110,6 +112,7 @@ import org.apache.cassandra.tools.NodeTool;
 import org.apache.cassandra.tracing.TraceState;
 import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.transport.messages.ResultMessage;
+import org.apache.cassandra.utils.ByteArrayUtil;
 import org.apache.cassandra.utils.DiagnosticSnapshotService;
 import org.apache.cassandra.utils.ExecutorUtils;
 import org.apache.cassandra.utils.FBUtilities;
@@ -285,9 +288,18 @@ public class Instance extends IsolatedExecutor implements 
IInvokableInstance
 
 private static IMessage serializeMessage(InetAddressAndPort from, 
InetAddressAndPort to, Message messageOut)
 {
+int version = MessagingService.instance().versions.get(to);
+if (messageOut.verb().serializer() == 
((IVersionedAsymmetricSerializer) NoPayload.serializer) || messageOut.payload 
== null)
+{
+return new MessageImpl(messageOut.verb().id,
+   ByteArrayUtil.EMPTY_BYTE_ARRAY,
+   messageOut.id(),
+   version,
+   fromCassandraInetAddressAndPort(from));
+}
+
 try (DataOutputBuffer out = new DataOutputBuffer(1024))
 {
-int version = MessagingService.instance().versions.get(to);
 Message.serializer.serialize(messageOut, out, version);
 byte[] bytes = out.toByteArray();
 if (messageOut.serializedSize(version) != bytes.length)
diff --git 
a/test/distributed/org/apache/cassandra/distributed/impl/MessageImpl.java 
b/test/distributed/org/apache/cassandra/distributed/impl/MessageImpl.java
index ebc31b1..607e890 100644
--- a/test/distributed/org/apache/cassandra/distributed/impl/MessageImpl.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/MessageImpl.java
@@ -21,7 +21,7 @@ package org.apache.cassandra.distributed.impl;
 import java.net.InetSocketAddress;
 
 import org.apache.cassandra.distributed.api.IMessage;
-import org.apache.cassandra.distributed.shared.NetworkTopology;
+import org.apache.cassandra.utils.ByteArrayUtil;
 
 // a container for simplifying the method signature for per-instance message 
handling/delivery
 public class MessageImpl implements IMessage
@@ -65,5 +65,16 @@ public class MessageImpl implements IMessage
 {
 return from;
 }
+
+public String toString()
+{
+return "MessageImpl{" +
+   "verb=" + verb +
+   ", bytes=" + ByteArrayUtil.bytesToHex(bytes) +
+   ", id=" + id +
+   ", version=" + version +
+   ", from=" + from +
+   '}';
+}
 }
 
diff --git 
a/test/distributed/org/apache/cassandra/distributed/upgrade/UpgradeTest.java

[jira] [Commented] (CASSANDRA-16157) RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade

2020-10-14 Thread Alex Petrov (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213752#comment-17213752
 ] 

Alex Petrov commented on CASSANDRA-16157:
-

[~yifanc] I've added {{toString}} to message. Thank you for reviewing!

> RTE during re-serialization for message filtering during 3.0 -> 4.0 upgrade
> ---
>
> Key: CASSANDRA-16157
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16157
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
>
> When trying to upgrade 3.0 to 4.0, we often run into a problem if an older 
> node serves as the coordinator:
> {code}
> java.lang.RuntimeException: Can not deserialize message org.apache.cassandra.distributed.impl.MessageImpl@4c46aead
>   at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:299) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.distributed.impl.Instance.lambda$receiveMessage$7(Instance.java:315) ~[dtest-4.0-beta3.jar:?]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
>   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [dtest-4.0-beta3.jar:?]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
> Caused by: java.io.EOFException
>   at org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserializeHeaderPost40(Message.java:694) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserializePost40(Message.java:765) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.net.Message$Serializer.deserialize(Message.java:625) ~[dtest-4.0-beta3.jar:?]
>   at org.apache.cassandra.distributed.impl.Instance.deserializeMessage(Instance.java:295) ~[dtest-4.0-beta3.jar:?]
>   ... 7 more
> {code}




[jira] [Commented] (CASSANDRA-15996) Fix flaky python dtest test_expiration_overflow_policy_capnowarn - ttl_test.TestTTL

2020-10-14 Thread Berenguer Blasi (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213740#comment-17213740
 ] 

Berenguer Blasi commented on CASSANDRA-15996:
-

I have been focusing on this one today and I want to share my findings. Here is 
the stdout from David's test for the record:

{noformat}
AssertionError: Log message should be print for CAP and CAP_NOWARN policy 
assert []
self = 

@since('2.1')
def test_expiration_overflow_policy_cap(self):
>   self._base_expiration_overflow_policy_test(default_ttl=False, 
> policy='CAP')

ttl_test.py:343: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = , default_ttl = False
policy = 'CAP'

def _base_expiration_overflow_policy_test(self, default_ttl, policy):
"""
Checks that expiration date overflow policy is correctly applied
@jira_ticket CASSANDRA-14092
"""
MAX_TTL = 20 * 365 * 24 * 60 * 60  # 20 years in seconds
default_time_to_live = MAX_TTL if default_ttl else None
self.prepare(default_time_to_live=default_time_to_live)

# Restart node with expiration_date_overflow_policy
self.cluster.stop()

self.cluster.start(jvm_args=['-Dcassandra.expiration_date_overflow_policy={}'.format(policy)])
self.session1 = self.patient_cql_connection(self.cluster.nodelist()[0])
self.session1.execute("USE ks;")

# Try to insert data, should only fail if policy is REJECT
query = 'INSERT INTO ttl_table (key, col1) VALUES (%d, %d)' % (1, 1)
if not default_time_to_live:
query = query + "USING TTL %d" % (MAX_TTL)
try:
result = self.session1.execute_async(query + ";")
result.result()
if policy == 'REJECT':
self.fail("should throw InvalidRequest")
if self.cluster.version() >= '3.0':  # client warn only on 3.0+
if policy == 'CAP':
logger.debug("Warning is {}", result.warnings[0])
assert 'exceeds maximum supported expiration' in 
result.warnings[0], 'Warning not found'
else:
assert not result.warnings, "There should be no warnings"

except InvalidRequest as e:
if policy != 'REJECT':
self.fail("should not throw InvalidRequest")

self.cluster.flush()
# Data should be present unless policy is reject
assert_row_count(self.session1, 'ttl_table', 0 if policy == 'REJECT' 
else 1)

# Check that warning is always logged, unless policy is REJECT
if policy != 'REJECT':
node1 = self.cluster.nodelist()[0]
prefix = 'default ' if default_ttl else ''
warning = node1.grep_log("Request on table {}.{} with {}ttl of {} 
seconds exceeds maximum supported expiration"
 .format('ks', 'ttl_table', prefix, 
MAX_TTL))
>   assert warning, 'Log message should be print for CAP and CAP_NOWARN 
> policy'
E   AssertionError: Log message should be print for CAP and CAP_NOWARN 
policy
E   assert []

ttl_test.py:410: AssertionError
{noformat}

As we can see from the code above, the test is being called with policy 
'CAP'. Following the test code, we make it through to line 392, where we 
[check|https://github.com/apache/cassandra-dtest/blob/master/ttl_test.py#L392] 
that there was indeed a client warning. So the TTL 'business logic' is 
happening and it's correct. The only missing piece is the message being 
logged, which falls on {{NoSpamLogger}}'s shoulders. I can only think of some 
edge case in {{NoSpamLogger}} where it fails to log, which would explain why 
it happens so seldom, why it hasn't been repro'ed so far, and why I didn't 
manage to repro it either, even on a thinned-down machine.
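
As background for why this insert reaches the overflow policy at all, the arithmetic behind the test's 20-year TTL can be sketched as below. It assumes the expiration time is kept as signed 32-bit seconds since the epoch (the limitation that CASSANDRA-14092, referenced in the test docstring, deals with); the function name and the fixed date are illustrative only.

```python
from datetime import datetime, timezone

MAX_TTL = 20 * 365 * 24 * 60 * 60  # 20 years in seconds, as in the quoted test
INT32_MAX = 2**31 - 1              # assumption: expiration stored as signed 32-bit seconds


def overflows(now, ttl_seconds):
    """True if now + ttl no longer fits into a signed 32-bit number of seconds."""
    return int(now.timestamp()) + ttl_seconds > INT32_MAX


# Date of this email thread, used instead of the current time so the result is stable
email_date = datetime(2020, 10, 14, tzinfo=timezone.utc)
needs_policy = overflows(email_date, MAX_TTL)
limit = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
```

Under that assumption, a 20-year TTL written in October 2020 lands in 2040, past the January 2038 limit, so the CAP policy has something to cap and a warning is expected both on the client and in the node's log.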

> Fix flaky python dtest test_expiration_overflow_policy_capnowarn - 
> ttl_test.TestTTL
> ---
>
> Key: CASSANDRA-15996
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15996
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/python
>Reporter: David Capwell
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 3.11.x, 4.0-beta
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/361/workflows/3a42fa45-1f60-4c95-86a4-15a6773e384e/jobs/1860
> {code}
> >   assert warning, 'Log message should be print for CAP and 
> > CAP_NOWARN policy'
> E   AssertionError: Log message should be print for CAP and 
> CAP_NOWARN policy
> E   assert []
> {code}




[jira] [Updated] (CASSANDRA-16207) NPE when calling broadcast address on unintialized node

2020-10-14 Thread Alex Petrov (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16207:

  Fix Version/s: 4.0-beta3
 3.11.9
 3.0.23
  Since Version: 3.0.21
Source Control Link: 
https://github.com/apache/cassandra/commit/6eeca9d6cc482417fd4564302baa349ed76fd7ec
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed to 3.0 with [6eeca9d6cc482417fd4564302baa349ed76fd7ec 
|https://github.com/apache/cassandra/commit/6eeca9d6cc482417fd4564302baa349ed76fd7ec]
 and merged to 
[3.11|https://github.com/apache/cassandra/commit/d3f7bdfe017cd236779cbac0b788ab8a3c619278]
 and 
[trunk|https://github.com/apache/cassandra/commit/83033075d334997298dc6937dc64067de76a3077].

> NPE when calling broadcast address on unintialized node
> ---
>
> Key: CASSANDRA-16207
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16207
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest/java
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 3.0.23, 3.11.9, 4.0-beta3
>
>
> When trying to run upgrades, we sometimes call broadcast address on an 
> uninitialised new node:
> {code}
> java.lang.IllegalStateException: Can't use shut down instances, delegate is null
>   at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.delegate(AbstractCluster.java:163)
>   at org.apache.cassandra.distributed.impl.DelegatingInvokableInstance.broadcastAddress(DelegatingInvokableInstance.java:53)
>   at org.apache.cassandra.distributed.impl.Instance$2.allowIncomingMessage(Instance.java:278)
>   at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:1031) ~[dtest-3.0.19.jar:?]
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:213)
>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:182)
>   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)




[cassandra] branch cassandra-3.11 updated (45982f5 -> d3f7bdf)

2020-10-14 Thread ifesdjeen

This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 45982f5  Merge branch 'cassandra-3.0' into cassandra-3.11
 add 6eeca9d  Fix NPE when calling broadcast address on unintialized node
 add d3f7bdf  Merge branch 'cassandra-3.0' into cassandra-3.11

No new revisions were added by this update.

Summary of changes:
 .../cassandra/distributed/upgrade/UpgradeTest.java | 24 ++
 1 file changed, 24 insertions(+)



[cassandra] branch trunk updated (d890b7a -> 8303307)

2020-10-14 Thread ifesdjeen

This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from d890b7a  Merge branch 'cassandra-3.11' into trunk
 add 6eeca9d  Fix NPE when calling broadcast address on unintialized node
 add d3f7bdf  Merge branch 'cassandra-3.0' into cassandra-3.11
 add 8303307  Merge branch 'cassandra-3.11' into trunk

No new revisions were added by this update.

Summary of changes:
 .../cassandra/distributed/upgrade/UpgradeTest.java | 25 ++
 1 file changed, 25 insertions(+)



[cassandra] branch cassandra-3.0 updated (0700dfa -> 6eeca9d)

2020-10-14 Thread ifesdjeen

This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 0700dfa  Check SSTables for latest version before dropping compact 
storage
 add 6eeca9d  Fix NPE when calling broadcast address on unintialized node

No new revisions were added by this update.

Summary of changes:
 .../cassandra/distributed/upgrade/UpgradeTest.java | 24 ++
 1 file changed, 24 insertions(+)



[jira] [Commented] (CASSANDRA-16163) Rename master branches to trunk in all repositories

2020-10-14 Thread Michael Semb Wever (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213710#comment-17213710
 ] 

Michael Semb Wever commented on CASSANDRA-16163:


Instructions

1. In a git clone, create the trunk branch (as a rename of the master)
{code}
git branch -m master trunk
git branch --unset-upstream
git push -u origin  trunk
{code}

2. Open an INFRA ticket, asking for the upstream default branch to change
3. Inform developers of change.

> Rename master branches to trunk in all repositories
> ---
>
> Key: CASSANDRA-16163
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16163
> Project: Cassandra
>  Issue Type: Task
>  Components: Build
>Reporter: Michael Semb Wever
>Priority: Normal
>
> Applies to the following repositories
> * cassandra-builds
> * cassandra-website
> * cassandra-dtest
> * cassandra-sidecar
> * cassandra-diff
> * cassandra-in-jvm-dtest-api
> * cassandra-harry
> This was discussed in 
> https://lists.apache.org/thread.html/r54db4cd870d2d665060d5fb50d925843be4b4d54dc64f3d21f04c367%40%3Cdev.cassandra.apache.org%3E
> The general preference there was trunk over main, so to match the cassandra 
> repository.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CASSANDRA-16168) Rename master branch to trunk in cassandra-diff

2020-10-14 Thread Michael Semb Wever (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16168:
---
Change Category: Semantic
 Complexity: Normal
Component/s: Build
 Status: Open  (was: Triage Needed)

> Rename master branch to trunk in cassandra-diff
> ---
>
> Key: CASSANDRA-16168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16168
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
>





[cassandra-diff] branch trunk created (now 4c9bc4f)

2020-10-14 Thread mck

mck pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-diff.git.


  at 4c9bc4f  Allow optional query retry

No new revisions were added by this update.



[jira] [Commented] (CASSANDRA-16164) Rename master branch to trunk in cassandra-builds

2020-10-14 Thread Michael Semb Wever (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213704#comment-17213704
 ] 

Michael Semb Wever commented on CASSANDRA-16164:


Waiting on INFRA-20982

> Rename master branch to trunk in cassandra-builds
> -
>
> Key: CASSANDRA-16164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16164
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
>





[cassandra-builds] branch trunk created (now 5e17c9b)

2020-10-14 Thread mck

mck pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git.


  at 5e17c9b  Reduce CCM heap settings to match those in circleci, and 
limit docker containers to 15g memory (and disable swapping) (INFRA-20107)

No new revisions were added by this update.



[jira] [Updated] (CASSANDRA-16164) Rename master branch to trunk in cassandra-builds

2020-10-14 Thread Michael Semb Wever (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16164:
---
Change Category: Semantic
 Complexity: Normal
Component/s: Build
 Status: Open  (was: Triage Needed)

> Rename master branch to trunk in cassandra-builds
> -
>
> Key: CASSANDRA-16164
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16164
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Michael Semb Wever
>Assignee: Michael Semb Wever
>Priority: Normal
>





[jira] [Commented] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth

2020-10-14 Thread Marcus Eriksson (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213681#comment-17213681
 ] 

Marcus Eriksson commented on CASSANDRA-15369:
-

Looks good in general; two concerns:
* Performance of {{SinglePartitionReadCommand#reduceFilter}} is much worse now 
(a silly local laptop benchmark shows queries being 15% slower). The reason 
seems to be that we use 
{{try (UnfilteredRowIterator iterator = result.unfilteredIterator(columnFilter(), filter.getSlices(metadata()), false))}}. 
I think we can just replace that with 
{{try (UnfilteredRowIterator iterator = result.unfilteredIterator(columnFilter(), clusterings, false))}}?
* {{AbstractBTreePartition#getRow}} looks like it is missing the fix from 
CASSANDRA-15363; the {{row == null}} case should probably be:
{code}
// this means our partition level deletion supersedes all other deletions
// and we don't have to keep the row deletions
if (activeDeletion == partitionDeletion)
    return null;
// no need to check activeDeletion.isLive here - if anything supersedes
// the partitionDeletion it must be non-live
return BTreeRow.emptyDeletedRow(clustering, Row.Deletion.regular(activeDeletion));
{code}

> Fake row deletions and range tombstones, causing digest mismatch and sstable 
> growth
> ---
>
> Key: CASSANDRA-15369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>  * Serving from a {{Memtable}}, we will generate fake row deletions
>  * Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone 
> markers for any range tombstone that begins or ends outside of the limit of 
> the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each 
> slice/clustering
> Unfortunately, these different behaviours can lead to very different data 
> stored in sstables until a full repair is run.  When we read-repair, we only 
> send these fake deletions or range tombstones.  A fake row deletion, 
> clustering RT and slice RT, each produces a different digest.  So for each 
> single point lookup we can produce a digest mismatch twice, and until a full 
> repair is run we can encounter an unlimited number of digest mismatches 
> across different overlapping queries.
> Relatedly, this seems a more problematic variant of our atomicity failures 
> caused by our monotonic reads, since RTs can have an atomic effect across (up 
> to) the entire partition, whereas the propagation may happen on an 
> arbitrarily small portion.  If the RT exists on only one node, this could 
> plausibly lead to fairly problematic scenario if that node fails before the 
> range can be repaired. 
> At the very least, this behaviour can lead to an almost unlimited amount of 
> extraneous data being stored until the range is repaired and compaction 
> happens to overwrite the sub-range RTs and row deletions.




[jira] [Updated] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-10-14 Thread Michael Semb Wever (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16201:
---
Since Version: 3.0 alpha 1

> Reduce amount of allocations during batch statement execution
> -
>
> Key: CASSANDRA-16201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16201
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> In a Cas 2.1 / 3.0 / 3.11 / 4.0b2 comparison test with the same load profile, 
> we see 4.0b2 going OOM from time to time. According to a heap dump, we have 
> multiple NTR threads in a 3-digit MB range.
> This is likely related to object array pre-allocations at the size of 
> {{BatchUpdatesCollector.updatedRows}} per {{BTree}} although there is always 
> only 1 {{BTreeRow}} in the {{BTree}}.
>  !screenshot-1.png|width=100%! 
> So it seems we have many, many pre-allocated 20K-element object arrays, 
> each with a shallow heap of 80K, although there is only one element 
> in the array.
> This sort of pre-allocation is causing a lot of memory pressure.
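
To make the sizes above concrete, here is a rough back-of-the-envelope sketch in Java. It is not Cassandra code: the class ObjectArrayFootprint and its constants are hypothetical, and it assumes a 64-bit HotSpot JVM with compressed oops (16-byte array header, 4-byte references, 8-byte object alignment). Under those assumptions a 20,000-slot object array costs about 80K of shallow heap no matter how many slots are actually used, which is consistent with the figure quoted above.

```java
// Hypothetical helper, not part of Cassandra: estimates the shallow size of an Object[].
final class ObjectArrayFootprint {
    // Assumptions (not taken from the ticket): 64-bit HotSpot with compressed oops,
    // i.e. a 16-byte array header, 4-byte references and 8-byte object alignment.
    static final long ARRAY_HEADER_BYTES = 16;
    static final long REFERENCE_BYTES = 4;
    static final long ALIGNMENT_BYTES = 8;

    // Shallow size in bytes of an Object[] with the given number of slots.
    static long shallowBytes(int length) {
        long raw = ARRAY_HEADER_BYTES + REFERENCE_BYTES * (long) length;
        // Round up to the next multiple of the alignment.
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    public static void main(String[] args) {
        // Compare the pre-allocated array from the heap dump with a right-sized one.
        System.out.println("20,000 slots: " + shallowBytes(20_000) + " bytes");
        System.out.println("1 slot:       " + shallowBytes(1) + " bytes");
    }
}
```

If compressed oops are disabled (for example on very large heaps), references are 8 bytes wide and the waste per array roughly doubles.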




[jira] [Updated] (CASSANDRA-16201) Reduce amount of allocations during batch statement execution

2020-10-14 Thread Michael Semb Wever (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-16201:
---
Fix Version/s: 3.0.x





[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-14 Thread Michael Semb Wever (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Fix Version/s: (was: 4.0.x)
   (was: 3.11.x)
Since Version: 3.0 alpha 1

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x
>
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png, 
> jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6-node load-test cluster, we have been running a certain 
> production-like workload on 2.1.18 constantly and sufficiently. After 
> upgrading one node to 3.0.18 (the remaining 5 are still on 2.1.18 since we 
> saw the regression described below), 3.0.18 is showing increased CPU usage, 
> increased GC, high mutation stage pending tasks, dropped mutation messages ...
> Some spec. All 6 nodes equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> Following dashboard shows highlighted areas (CPU, suspension) with metrics 
> for all 6 nodes and the one outlier being the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name             Active   Pending   Completed   Blocked   All Time Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage            256     81824  3360532756         0                  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage          0         0           0         0                  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                  0         0    62862266         0                  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage       0         0  2176659856         0                  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage            0         0           0         0                  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage       0         0           0         0                  0
> ...
> {noformat}
> Judging at a high level from a 15-minute JFR session for each of 3.0.18 and 
> 2.1.18 (the latter on a different node), it looks like the code path 
> underneath {{BatchMessage.execute}} is producing ~10x more on-heap 
> allocations in 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> The zipped JFRs exceed the 60MB limit for attaching directly to the ticket. 
> I can upload them if another destination is available.




[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-14 Thread Michael Semb Wever (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Resolution: Duplicate  (was: Fixed)
Status: Resolved  (was: Open)

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png, 
> jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>




[jira] [Updated] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-14 Thread Michael Semb Wever (Jira)



 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15430:
---
Status: Open  (was: Resolved)

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png, 
> jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>




[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-10-14 Thread Michael Semb Wever (Jira)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213675#comment-17213675
 ] 

Michael Semb Wever commented on CASSANDRA-15430:


[~tsteinmaurer], [~marcuse] and I plan to address the issues raised here under 
CASSANDRA-16201 as well.

For the record, for 3.0 CASSANDRA-16201 also needs to:
1. add initialCapacity to BTree$Builder, 
2. make sure initialCapacity is sane, 
3. add an initialCapacity to MultiCBuilder 
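
The three items above could look roughly like the following Java sketch. It is only an illustration under stated assumptions: CappedArrayBuilder, saneCapacity and the upper bound of 16 are hypothetical and do not reflect the actual BTree$Builder or MultiCBuilder code. It shows a builder that accepts an initial capacity, clamps it to a sane range, and grows on demand.

```java
import java.util.Arrays;

// Hypothetical stand-in for a builder such as BTree$Builder; not Cassandra code.
final class CappedArrayBuilder {
    // Assumed upper bound for the first allocation; a real value would need tuning.
    static final int MAX_INITIAL_CAPACITY = 16;

    private Object[] values;
    private int count;

    // The caller passes a size hint; the builder never trusts it blindly.
    CappedArrayBuilder(int initialCapacity) {
        values = new Object[saneCapacity(initialCapacity)];
    }

    // Clamp the requested capacity into the range [1, MAX_INITIAL_CAPACITY].
    static int saneCapacity(int requested) {
        return Math.max(1, Math.min(requested, MAX_INITIAL_CAPACITY));
    }

    // Append a value, doubling the backing array only when it is full.
    CappedArrayBuilder add(Object value) {
        if (count == values.length)
            values = Arrays.copyOf(values, values.length * 2);
        values[count++] = value;
        return this;
    }

    // Number of slots currently allocated, used or not.
    int allocatedSlots() {
        return values.length;
    }

    // Return a right-sized copy holding only the values that were added.
    Object[] build() {
        return Arrays.copyOf(values, count);
    }

    public static void main(String[] args) {
        // A hint of 20,000 for a single row no longer pre-allocates 20,000 slots.
        CappedArrayBuilder builder = new CappedArrayBuilder(20_000).add("row");
        System.out.println("allocated slots: " + builder.allocatedSlots());
        System.out.println("built length:    " + builder.build().length);
    }
}
```

The design choice sketched here is to treat the hint as an upper estimate rather than a guarantee: a small first allocation plus doubling keeps the common single-row case cheap while still handling large batches.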

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
> Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, 
> jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, 
> jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png, 
> jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png, 
> screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>



