[jira] [Comment Edited] (CASSANDRA-15081) LegacyLayout does not have same behavior as 2.x when handling unknown column names
[ https://issues.apache.org/jira/browse/CASSANDRA-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968120#comment-16968120 ]

Michael Semb Wever edited comment on CASSANDRA-15081 at 11/6/19 6:37 AM:

Thanks [~cam1982]. I believe you're correct. But it needs to be checked.

There's an upgrade dtest relevant to this, I will check it out and get back to you.

was (Author: michaelsembwever):
Thanks [~cam1982]. I believe you're correct. But it needs to be checked.

I believe there's an upgrade dtest relevant. Will check it out and get back to you.

> LegacyLayout does not have same behavior as 2.x when handling unknown column names
>
> Key: CASSANDRA-15081
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15081
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Cameron Zemek
> Priority: High
> Labels: patch, pull-request-available
> Attachments: 15081.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Due to a bug I haven't been able to reproduce the production cluster had unknown column names. To replicate the issue for this test I did the following:
> {noformat}
> $ ccm create -v 2.1.19 -n 1 -s bug
> $ cat > schema.cql << 'EOF'
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
> CREATE TABLE test.unknowntest (id int primary key, payload text, "paylo!d" text);
> EOF
> $ ccm node1 cqlsh -f schema.cql
> $ export CASSANDRA_INCLUDE=~/.ccm/bug/node1/bin/cassandra.in.sh
> $ cat > bug.json << 'EOF'
> [
> {"key": "1",
>  "cells": [["","",1554432501209207],
>            ["paylo!d","hello world",1554432501209207],
>            ["payload","hello world",1554432501209207]]}
> ]
> EOF
> $ ~/.ccm/repository/2.1.19/tools/bin/json2sstable -K test -c unknowntest ~/bug.json ~/.ccm/bug/node1/data0/test/unknowntest-/test-unknowntest-ka-1-Data.db
> {noformat}
> Then test the behavior of unknown columns in 2.1:
> {noformat}
> $ ccm stop
> $ ccm create -v 2.1.19 -n 1 -s bug2_1_19
> $ cat > schema2.cql << 'EOF'
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
> CREATE TABLE test.unknowntest (id int primary key, payload text);
> EOF
> $ ccm node1 cqlsh -f schema2.cql
> $ ccm stop
> $ cp ~/.ccm/bug/node1/data0/test/unknowntest-/test-unknowntest-ka-1-* ~/.ccm/bug2_1_19/node1/data0/test/unknowntest-/
> $ ccm start
> $ ccm node1 cqlsh
> Connected to bug2_1_19 at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 2.1.19 | CQL spec 3.2.1 | Native protocol v3]
> Use HELP for help.
> cqlsh> select * from test.unknowntest where id = 1;
>  id | payload
> ----+-------------
>   1 | hello world
> (1 rows)
> {noformat}
> Compared to 3.11.4 which did the following:
> {noformat}
> $ ccm stop
> $ ccm create -v 3.11.4 -n 1 -s bug3_11_4
> $ ccm node1 cqlsh -f schema2.cql
> $ ccm stop
> $ cp ~/.ccm/bug/node1/data0/test/unknowntest-/test-unknowntest-ka-1-* ~/.ccm/bug3_11_4/node1/data0/test/unknowntest-/
> $ ccm start
> $ ccm node1 cqlsh
> Connected to bug3_11_4 at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
> cqlsh> select * from test.unknowntest where id = 1;
> ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
> {noformat}
> In the logs this resulted in an IllegalStateException from LegacyLayout line 1127
> The expected behavior would be to ignore the column and return results the same as in 2.1

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
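The difference between the two behaviors reported above can be illustrated with a small, self-contained sketch. This is not the code in LegacyLayout and not the attached 15081.patch; the `Cell` type and the `decodeStrict` / `decodeLenient` helpers are hypothetical names introduced only to contrast "throw on an unknown column" (modeled on the IllegalStateException seen in the 3.11.4 logs) with "ignore the column" (the result observed on 2.1.19).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnknownColumnSketch {
    /** Hypothetical stand-in for a decoded legacy cell: column name plus value. */
    public static final class Cell {
        public final String column;
        public final String value;

        public Cell(String column, String value) {
            this.column = column;
            this.value = value;
        }
    }

    /** Strict decoding: fails on the first cell whose column is not in the schema. */
    public static List<Cell> decodeStrict(List<Cell> cells, Set<String> schemaColumns) {
        for (Cell c : cells) {
            if (!schemaColumns.contains(c.column)) {
                throw new IllegalStateException("Unknown column: " + c.column);
            }
        }
        return new ArrayList<>(cells);
    }

    /** Lenient decoding: silently drops cells whose column is not in the schema. */
    public static List<Cell> decodeLenient(List<Cell> cells, Set<String> schemaColumns) {
        List<Cell> known = new ArrayList<>();
        for (Cell c : cells) {
            if (schemaColumns.contains(c.column)) {
                known.add(c);
            }
        }
        return known;
    }

    public static void main(String[] args) {
        // Schema from schema2.cql in the report: only id and payload are defined.
        Set<String> schema = new HashSet<>(Arrays.asList("id", "payload"));
        // Named cells from bug.json in the report (the row marker cell is left out).
        List<Cell> cells = Arrays.asList(
                new Cell("paylo!d", "hello world"),
                new Cell("payload", "hello world"));

        for (Cell c : decodeLenient(cells, schema)) {
            System.out.println(c.column + " = " + c.value);
        }
    }
}
```

In this sketch the lenient path keeps only the `payload` cell, which matches the single row returned by 2.1.19 in the report, while the strict path rejects the whole read.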
[jira] [Commented] (CASSANDRA-15081) LegacyLayout does not have same behavior as 2.x when handling unknown column names
[ https://issues.apache.org/jira/browse/CASSANDRA-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968120#comment-16968120 ]

Michael Semb Wever commented on CASSANDRA-15081:

Thanks [~cam1982]. I believe you're correct. But it needs to be checked.

I believe there's an upgrade dtest relevant. Will check it out and get back to you.
[jira] [Commented] (CASSANDRA-15081) LegacyLayout does not have same behavior as 2.x when handling unknown column names
[ https://issues.apache.org/jira/browse/CASSANDRA-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968072#comment-16968072 ]

Cameron Zemek commented on CASSANDRA-15081:

[~mck] no input. As far as I could tell, CASSANDRA-13939 shouldn't be affected by this, but to be honest I didn't fully understand that issue. I mentioned it just in case it might be, so that someone more knowledgeable could interject if they see an issue. The unit tests passed, so I'm hoping that means I haven't broken anything elsewhere.
[jira] [Comment Edited] (CASSANDRA-15349) Add “Going away” message to the client protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968011#comment-16968011 ]

Chris Lohfink edited comment on CASSANDRA-15349 at 11/6/19 1:47 AM:

Would it be possible to send a CQL level event on the connection itself (i.e. a new opcode, 0x11)? On larger clusters it can take 10s to propagate a gossip event. Perhaps even a "request to close" event or something on the connection itself, letting the client itself disconnect instead of just having it stop sending requests to the node. Then the server can just forcibly shut down (like it does now) after some time, or move on if all connections have left.

was (Author: cnlwsu):
Would it be possible to send a CQL level event on the connection itself? On larger clusters it can take 10s to propagate a gossip event. Perhaps even a "request to close" event or something on the connection itself and let the client itself disconnect instead of just stop sending requests to it. Then the server can just forcibly shut down (like it does now) after some time or move on if all connections are left.

> Add “Going away” message to the client protocol
>
> Key: CASSANDRA-15349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15349
> Project: Cassandra
> Issue Type: New Feature
> Components: Messaging/Client
> Reporter: Alex Petrov
> Priority: Normal
> Labels: client-impacting
>
> Add “Going away” message that allows node to announce its shutdown and let clients gracefully shutdown and not attempt further requests.
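To make the "new opcode 0x11" suggestion concrete, here is a rough sketch of what a server-pushed frame header could look like. Only the opcode value 0x11 comes from the comment above. The name `GOING_AWAY`, the empty body, and the choice of native protocol v4 framing (a 9-byte header of version, flags, stream id, opcode and body length, with stream id -1 as used for server-pushed events) are assumptions for illustration; no such message exists in the protocol as discussed here.

```java
import java.nio.ByteBuffer;

public class GoingAwayFrameSketch {
    /** Length of a native protocol v4 frame header: version, flags, stream, opcode, body length. */
    public static final int HEADER_LENGTH = 9;
    /** Version byte of a v4 response frame (direction bit set). */
    public static final byte RESPONSE_V4 = (byte) 0x84;
    /** Hypothetical opcode for the proposed message; 0x11 is the value floated in the comment. */
    public static final byte OPCODE_GOING_AWAY = 0x11;
    /** Stream id used for frames the server pushes without a matching request. */
    public static final short EVENT_STREAM_ID = -1;

    /** Encodes a frame header; ByteBuffer is big-endian by default, as the protocol requires. */
    public static byte[] encodeHeader(byte opcode, short streamId, int bodyLength) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH);
        buf.put(RESPONSE_V4);
        buf.put((byte) 0x00); // no flags set
        buf.putShort(streamId);
        buf.put(opcode);
        buf.putInt(bodyLength);
        return buf.array();
    }

    /** Client-side check: does this header announce that the node is going away? */
    public static boolean isGoingAway(byte[] header) {
        return header.length >= HEADER_LENGTH && header[4] == OPCODE_GOING_AWAY;
    }

    public static void main(String[] args) {
        byte[] header = encodeHeader(OPCODE_GOING_AWAY, EVENT_STREAM_ID, 0);
        StringBuilder hex = new StringBuilder();
        for (byte b : header) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());
        System.out.println("going away: " + isGoingAway(header));
    }
}
```

A client that recognises the opcode could stop issuing new requests on that connection and close it once in-flight requests finish, which is the "let the client itself disconnect" behavior the comment describes.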
[jira] [Commented] (CASSANDRA-15349) Add “Going away” message to the client protocol
[ https://issues.apache.org/jira/browse/CASSANDRA-15349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968011#comment-16968011 ]

Chris Lohfink commented on CASSANDRA-15349:

Would it be possible to send a CQL level event on the connection itself? On larger clusters it can take 10s to propagate a gossip event. Perhaps even a "request to close" event or something on the connection itself and let the client itself disconnect instead of just stop sending requests to it. Then the server can just forcibly shut down (like it does now) after some time or move on if all connections are left.
[jira] [Updated] (CASSANDRA-15399) Add ability to track state in repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated CASSANDRA-15399:

Labels: pull-request-available (was: )

> Add ability to track state in repair
>
> Key: CASSANDRA-15399
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15399
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Repair
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Labels: pull-request-available
>
> To enhance the visibility in repair, we should add in-memory objects that can be exposed via JMX and virtual tables to show the state of the coordinator, and validations (leaving sync out for now).
> These objects should expose the timing (create, start, complete), current state (enum specific to the entity), and progress estimate (% complete); along with any entity specific information useful.
> To help with growth, ActiveRepairService should periodically cleanup completed state after a configurable interval.
[jira] [Updated] (CASSANDRA-15399) Add ability to track state in repair
[ https://issues.apache.org/jira/browse/CASSANDRA-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Capwell updated CASSANDRA-15399:

Change Category: Operability
Complexity: Normal
Status: Open (was: Triage Needed)
[jira] [Created] (CASSANDRA-15399) Add ability to track state in repair
David Capwell created CASSANDRA-15399:

Summary: Add ability to track state in repair
Key: CASSANDRA-15399
URL: https://issues.apache.org/jira/browse/CASSANDRA-15399
Project: Cassandra
Issue Type: Improvement
Components: Consistency/Repair
Reporter: David Capwell
Assignee: David Capwell

To enhance the visibility in repair, we should add in-memory objects that can be exposed via JMX and virtual tables to show the state of the coordinator, and validations (leaving sync out for now).

These objects should expose the timing (create, start, complete), current state (enum specific to the entity), and progress estimate (% complete); along with any entity specific information useful.

To help with growth, ActiveRepairService should periodically cleanup completed state after a configurable interval.
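The kind of in-memory state object the issue describes (timing, current state, progress estimate, plus periodic cleanup of completed entries) could be sketched roughly as follows. All names here (`EntityState`, `Phase`, `Registry`, `cleanup`) are hypothetical and are not taken from ActiveRepairService or from the pull request; the issue asks for an enum specific to each entity, whereas this sketch uses one generic enum to stay short. Time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class RepairStateSketch {
    /** Generic lifecycle; a real design would likely use an enum per entity type. */
    public enum Phase { CREATED, STARTED, COMPLETED }

    /** Tracks timing and progress of one entity (e.g. a coordinator or a validation). */
    public static final class EntityState {
        private final long createdAtMillis;
        private final long totalUnits;
        private long startedAtMillis = -1;
        private long completedAtMillis = -1;
        private long doneUnits;

        public EntityState(long nowMillis, long totalUnits) {
            this.createdAtMillis = nowMillis;
            this.totalUnits = totalUnits;
        }

        public long createdAtMillis() { return createdAtMillis; }

        public synchronized void start(long nowMillis) { startedAtMillis = nowMillis; }

        public synchronized void advance(long units) {
            doneUnits = Math.min(totalUnits, doneUnits + units);
        }

        public synchronized void complete(long nowMillis) {
            doneUnits = totalUnits;
            completedAtMillis = nowMillis;
        }

        public synchronized Phase phase() {
            if (completedAtMillis >= 0) return Phase.COMPLETED;
            return startedAtMillis >= 0 ? Phase.STARTED : Phase.CREATED;
        }

        /** Progress estimate in percent, based on completed work units. */
        public synchronized double progressPercent() {
            if (totalUnits == 0) return completedAtMillis >= 0 ? 100.0 : 0.0;
            return 100.0 * doneUnits / totalUnits;
        }

        synchronized boolean expired(long nowMillis, long retentionMillis) {
            return completedAtMillis >= 0 && nowMillis - completedAtMillis > retentionMillis;
        }
    }

    /** Holds states by id and drops completed ones after a configurable interval. */
    public static final class Registry {
        private final Map<UUID, EntityState> states = new ConcurrentHashMap<>();
        private final long retentionMillis;

        public Registry(long retentionMillis) { this.retentionMillis = retentionMillis; }

        public void register(UUID id, EntityState state) { states.put(id, state); }

        public int size() { return states.size(); }

        /** Removes completed states older than the retention interval; returns how many were removed. */
        public int cleanup(long nowMillis) {
            int before = states.size();
            states.values().removeIf(s -> s.expired(nowMillis, retentionMillis));
            return before - states.size();
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry(100);
        EntityState validation = new EntityState(0, 4);
        registry.register(UUID.randomUUID(), validation);
        validation.start(10);
        validation.advance(1);
        System.out.println(validation.phase() + " " + validation.progressPercent() + "%");
    }
}
```

In this sketch a scheduled task would call `cleanup` with the current time; exposing the same objects through JMX or virtual tables is left out.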
[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967802#comment-16967802 ]

Benedict Elliott Smith commented on CASSANDRA-15397:

It's true that the costs of constructing a new {{IntervalTree}} are non-trivial, and it isn't necessarily reasonable to assume that it occurs sufficiently infrequently to not matter either. The lookup cost is not a terribly significant cost to worry about, but reducing construction cost could be a win for some users, and this modification might improve that.

If we really cared about construction costs, it would be possible to introduce an immutable but updatable {{IntervalTree}} instead of building it from scratch every time. But that's likely to be a lot more work.

It's worth noting that the {{OverlapIterator}} we already have in tree is very similar in principle, I assume (but with different assumptions about usage), though I haven't had a chance to look at your proposal yet.

> IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
>
> Key: CASSANDRA-15397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/SSTable
> Reporter: Chandrasekhar Thumuluru
> Assignee: Chandrasekhar Thumuluru
> Priority: Low
> Attachments: 95p_1_SSTable_with_5000_Searches.png, 95p_15000_SSTable_with_5000_Searches.png, 95p_2_SSTable_with_5000_Searches.png, 95p_25000_SSTable_with_5000_Searches.png, 95p_3_SSTable_with_5000_Searches.png, 95p_5000_SSTable_with_5000_Searches.png, 99p_1_SSTable_with_5000_Searches.png, 99p_15000_SSTable_with_5000_Searches.png, 99p_2_SSTable_with_5000_Searches.png, 99p_25000_SSTable_with_5000_Searches.png, 99p_3_SSTable_with_5000_Searches.png, 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, IntervalListWithElimination.java, IntervalTreeSimplified.java, Mean_1_SSTable_with_5000_Searches.png, Mean_15000_SSTable_with_5000_Searches.png, Mean_2_SSTable_with_5000_Searches.png, Mean_25000_SSTable_with_5000_Searches.png, Mean_3_SSTable_with_5000_Searches.png, Mean_5000_SSTable_with_5000_Searches.png
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with search interval. In Cassandra, IntervalTrees are not mutated. They are recreated each time a mutation is required. This can be an issue during repairs. In fact we noticed such issues during repair.
> Since lists are cache friendly compared to linked lists and trees, I decided to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start and end points of search interval).
> Based on the tests I ran, I noticed Binary Search based elimination almost always performs similarly to IntervalTree or outperforms IntervalTree based search. The cost of IntervalTree construction is also substantial and produces a lot of garbage during repairs.
> I ran the tests using random intervals to build the tree/lists and another randomly generated search interval with 5000 iterations. I'm attaching all the relevant graphs. The x-axis in the graphs is the search interval coverage. 10p means the search interval covered 10% of the intervals. The y-axis is the time the search took in nanos.
> PS:
> # For the purpose of test, I simplified the IntervalTree by removing the data portion of the interval. Modified the template version (Java generics) to a specialized version.
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos.
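As a rough illustration of the "elimination using binary search" idea in the description, the sketch below keeps closed intervals in an array sorted by start point, uses a binary search to discard every interval that starts after the search interval ends, and then filters the remaining candidates by end point. This is a simplified reading of the idea and is not the attached IntervalListWithElimination.java; the class and method names are made up for the example, and it eliminates on the start point only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IntervalListSketch {
    /** Closed intervals as {start, end} pairs, sorted by start. */
    private final long[][] byStart;

    public IntervalListSketch(long[][] intervals) {
        byStart = new long[intervals.length][];
        for (int i = 0; i < intervals.length; i++) {
            byStart[i] = intervals[i].clone();
        }
        Arrays.sort(byStart, (a, b) -> Long.compare(a[0], b[0]));
    }

    /** Returns the index of the first interval whose start is greater than point. */
    private int firstStartAfter(long point) {
        int lo = 0;
        int hi = byStart.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (byStart[mid][0] <= point) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Finds all intervals overlapping the closed search interval [queryStart, queryEnd]. */
    public List<long[]> search(long queryStart, long queryEnd) {
        List<long[]> result = new ArrayList<>();
        // Everything from this index onward starts after the query ends, so it cannot overlap.
        int limit = firstStartAfter(queryEnd);
        for (int i = 0; i < limit; i++) {
            // Remaining candidates overlap only if they end at or after the query start.
            if (byStart[i][1] >= queryStart) {
                result.add(byStart[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IntervalListSketch list = new IntervalListSketch(
                new long[][] {{8, 10}, {1, 3}, {15, 18}, {2, 6}});
        for (long[] iv : list.search(4, 9)) {
            System.out.println(Arrays.toString(iv));
        }
    }
}
```

Building the list costs one sort of a flat array, which is the construction-cost advantage the description points at; the trade-off is that the second filter is still a linear scan over the candidates that survive the binary search.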
[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967732#comment-16967732 ]

Chandrasekhar Thumuluru commented on CASSANDRA-15397:

Sure [~benedict]. I can make the changes and update the ticket with GitHub links. As you can see, I simplified the IntervalTree implementation for comparison purposes. I'll make the final changes with tests and push them to my fork by the weekend. I completely agree with you that it's not a pressing change, but given the construction cost and the immutable nature of IntervalTree usage, I felt it's worth a shot.
[jira] [Commented] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967715#comment-16967715 ]

Benedict Elliott Smith commented on CASSANDRA-15397:

Hi [~cthumuluru], this sounds like a plausible optimisation (without having thought about it much myself yet). Unfortunately it's not a very _pressing_ optimisation, but I will try to find time within the next couple of weeks to give you some feedback.

If possible, we generally prefer links to GitHub branches. Could you push your fork with these changes somewhere to look at?
[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict Elliott Smith updated CASSANDRA-15397:

Change Category: Performance
Complexity: Normal
Component/s: Local/SSTable
Reviewers: Benedict Elliott Smith
Priority: Low (was: Normal)
Status: Open (was: Triage Needed)
[jira] [Assigned] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith reassigned CASSANDRA-15397: -- Assignee: Chandrasekhar Thumuluru > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. > -- > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement >Reporter: Chandrasekhar Thumuluru >Assignee: Chandrasekhar Thumuluru >Priority: Normal > Attachments: 95p_1_SSTable_with_5000_Searches.png, > 95p_15000_SSTable_with_5000_Searches.png, > 95p_2_SSTable_with_5000_Searches.png, > 95p_25000_SSTable_with_5000_Searches.png, > 95p_3_SSTable_with_5000_Searches.png, > 95p_5000_SSTable_with_5000_Searches.png, > 99p_1_SSTable_with_5000_Searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_2_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_3_SSTable_with_5000_Searches.png, > 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, > IntervalListWithElimination.java, IntervalTreeSimplified.java, > Mean_1_SSTable_with_5000_Searches.png, > Mean_15000_SSTable_with_5000_Searches.png, > Mean_2_SSTable_with_5000_Searches.png, > Mean_25000_SSTable_with_5000_Searches.png, > Mean_3_SSTable_with_5000_Searches.png, > Mean_5000_SSTable_with_5000_Searches.png > > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. > * Elimination using Binary Search (idea is to eliminate intervals using start > and end points of search interval). 
> Based on the tests I ran, Binary Search based elimination almost > always performs similarly to, or outperforms, IntervalTree based > search. The cost of IntervalTree construction is also substantial and > produces a lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval, with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage; 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search took in nanoseconds. > PS: > # For the purpose of the test, I simplified the IntervalTree by removing the data > portion of the interval and changed the template version (Java generics) to a > specialized version. > # I used the code from Cassandra version _3.11_. > # Time in the graphs is in nanoseconds.
[jira] [Comment Edited] (CASSANDRA-15390) Avoid unnecessary collection/iterator allocations during btree construction
[ https://issues.apache.org/jira/browse/CASSANDRA-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967687#comment-16967687 ] Benedict Elliott Smith edited comment on CASSANDRA-15390 at 11/5/19 5:16 PM: - I like the pattern. It might be worth putting in some explanatory comments, about why these functions exist? I'm not thrilled by the name "Getter" since that usually means a member function that returns a value, but I don't have a much better suggestion. Perhaps {{IterateFunction}}? Could also drop "get" as a prefix of the method name. I wonder if there is any value in introducing a bulk {{nextAt}} method that can fetch into an array, for the leaf building mode. We could fetch them all via {{arrayCopy}}, then loop over the array to invoke the {{UpdateFunction}} (conditionally on it not being no-op, even, since we seem to test this anyway already). was (Author: benedict): I like the pattern. It might be worth putting in some explanatory comments, about why these functions exist? I'm not thrilled by the name "Getter" since that usually means a member function that returns a value, but I don't have a much better suggestion. Perhaps {{IterateFunction}}? Could also drop "get" as a prefix of the method name. I wonder if there is any value in introducing a bulk {{nextAt}} method that can fetch into an array, for the leaf building mode. We could fetch them all via arrayCopy, then loop over the array to invoke the {{UpdateFunction}} (conditionally on it not being no-op, even, since we seem to test this anyway already). 
> Avoid unnecessary collection/iterator allocations during btree construction > --- > > Key: CASSANDRA-15390 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15390 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > A heavily used btree builder path does a lot of unnecessary conversions to > and from collections and iterators. Adding dedicated support for Object[] > reduces compaction garbage by up to 8.3% -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15390) Avoid unnecessary collection/iterator allocations during btree construction
[ https://issues.apache.org/jira/browse/CASSANDRA-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967687#comment-16967687 ] Benedict Elliott Smith commented on CASSANDRA-15390: I like the pattern. It might be worth putting in some explanatory comments, about why these functions exist? I'm not thrilled by the name "Getter" since that usually means a member function that returns a value, but I don't have a much better suggestion. Perhaps {{IterateFunction}}? Could also drop "get" as a prefix of the method name. I wonder if there is any value in introducing a bulk {{nextAt}} method that can fetch into an array, for the leaf building mode. We could fetch them all via arrayCopy, then loop over the array to invoke the {{UpdateFunction}} (conditionally on it not being no-op, even, since we seem to test this anyway already). > Avoid unnecessary collection/iterator allocations during btree construction > --- > > Key: CASSANDRA-15390 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15390 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > A heavily used btree builder path does a lot of unnecessary conversions to > and from collections and iterators. Adding dedicated support for Object[] > reduces compaction garbage by up to 8.3% -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15390) Avoid unnecessary collection/iterator allocations during btree construction
[ https://issues.apache.org/jira/browse/CASSANDRA-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith updated CASSANDRA-15390: --- Reviewers: Benedict Elliott Smith, Benedict Elliott Smith (was: Benedict Elliott Smith) Benedict Elliott Smith, Benedict Elliott Smith Status: Review In Progress (was: Patch Available) > Avoid unnecessary collection/iterator allocations during btree construction > --- > > Key: CASSANDRA-15390 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15390 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > A heavily used btree builder path does a lot of unnecessary conversions to > and from collections and iterators. Adding dedicated support for Object[] > reduces compaction garbage by up to 8.3% -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15390) Avoid unnecessary collection/iterator allocations during btree construction
[ https://issues.apache.org/jira/browse/CASSANDRA-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith updated CASSANDRA-15390: --- Status: Changes Suggested (was: Review In Progress) > Avoid unnecessary collection/iterator allocations during btree construction > --- > > Key: CASSANDRA-15390 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15390 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > A heavily used btree builder path does a lot of unnecessary conversions to > and from collections and iterators. Adding dedicated support for Object[] > reduces compaction garbage by up to 8.3% -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14773) Overflow of 32-bit integer during compaction.
[ https://issues.apache.org/jira/browse/CASSANDRA-14773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith updated CASSANDRA-14773: --- Reviewers: (was: Benedict Elliott Smith) > Overflow of 32-bit integer during compaction. > - > > Key: CASSANDRA-14773 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14773 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction >Reporter: Vladimir Bukhtoyarov >Assignee: Vladimir Bukhtoyarov >Priority: Urgent > Fix For: 4.0, 4.0-beta > > > In scope of CASSANDRA-13444 the compaction was significantly improved from a > CPU and memory perspective. However, this improvement introduced a bug in > rounding. When rounding an expiration time that is close to > *Cell.MAX_DELETION_TIME* (which is just *Integer.MAX_VALUE*), a math overflow > happens (because in scope of -CASSANDRA-13444- the data type for point was > changed from Long to Integer in order to reduce memory footprint). As a result, > point becomes negative and acts as silent poison for internal structures of > StreamingTombstoneHistogramBuilder like *DistanceHolder* and *DataHolder*. > Then, depending on the point intervals: > * The TombstoneHistogram produces wrong values when the interval of points is > less than binSize; this is not critical. > * Compaction crashes with ArrayIndexOutOfBoundsException if the number of point > intervals is greater than binSize; this case is very critical. > > This is the pull request [https://github.com/apache/cassandra/pull/273] that > reproduces the issue and provides the fix.
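The overflow described above is easy to reproduce in isolation. The following is a minimal sketch, not the code from the pull request: the class and method names are hypothetical, MAX_DELETION_TIME is a local stand-in for Cell.MAX_DELETION_TIME (which the report says equals Integer.MAX_VALUE), and saturating at that maximum is just one plausible guard, assumed here for illustration. It rounds a point up to the next multiple of a rounding interval, once with plain int arithmetic and once using a long intermediate.

```java
// Hypothetical sketch of rounding a 32-bit point up to a multiple of roundSeconds.
final class RoundingOverflowSketch {
    // Stand-in for Cell.MAX_DELETION_TIME; the report states it is Integer.MAX_VALUE.
    static final int MAX_DELETION_TIME = Integer.MAX_VALUE;

    // Naive int arithmetic: wraps to a negative value when point is near Integer.MAX_VALUE.
    static int roundUpNaive(int point, int roundSeconds) {
        int d = point % roundSeconds;
        return d > 0 ? point + (roundSeconds - d) : point;
    }

    // One possible guard (an assumption, not the actual fix): compute in long, then saturate.
    static int roundUpSaturating(int point, int roundSeconds) {
        long d = point % roundSeconds;
        long rounded = d > 0 ? (long) point + (roundSeconds - d) : point;
        return (int) Math.min(rounded, MAX_DELETION_TIME);
    }

    public static void main(String[] args) {
        int nearMax = Integer.MAX_VALUE - 1;
        System.out.println("naive:      " + roundUpNaive(nearMax, 60));
        System.out.println("saturating: " + roundUpSaturating(nearMax, 60));
    }
}
```

With a point of Integer.MAX_VALUE - 1 and a 60-second rounding interval, the naive version returns a negative number, which is the kind of "silent poison" value the report describes.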
> > The stacktrace when running(on codebase without fix) > *testMathOverflowDuringRoundingOfLargeTimestamp* without -ea JVM flag > {noformat} > java.lang.ArrayIndexOutOfBoundsException > at java.lang.System.arraycopy(Native Method) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$DistanceHolder.add(StreamingTombstoneHistogramBuilder.java:208) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushValue(StreamingTombstoneHistogramBuilder.java:140) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$$Lambda$1/1967205423.consume(Unknown > Source) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder$Spool.forEach(StreamingTombstoneHistogramBuilder.java:574) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.flushHistogram(StreamingTombstoneHistogramBuilder.java:124) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilder.build(StreamingTombstoneHistogramBuilder.java:184) > at > org.apache.cassandra.utils.streamhist.StreamingTombstoneHistogramBuilderTest.testMathOverflowDuringRoundingOfLargeTimestamp(StreamingTombstoneHistogramBuilderTest.java:183) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41) > at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) > at org.junit.runners.ParentRunner.run(ParentRunner.java:220) > at org.junit.runner.JUnitCore.run(JUnitCore.java:159) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {noformat} > > The stacktrace when running(on codebase without fix) >
[jira] [Updated] (CASSANDRA-14779) Changing EndpointSnitch via JMX has problems
[ https://issues.apache.org/jira/browse/CASSANDRA-14779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith updated CASSANDRA-14779: --- Reviewers: (was: Benedict Elliott Smith) > Changing EndpointSnitch via JMX has problems > > > Key: CASSANDRA-14779 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14779 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Observability/JMX >Reporter: Benedict Elliott Smith >Assignee: Ian Cleasby >Priority: Low > Fix For: 4.x > > > The snitch can be set via StorageService over JMX, for what reason I’m > unsure. If this were to happen, we might encounter the following problems: > * If the effective local DC were to change, we would not update it. Perhaps > changing the local DC of a node should be rejected and cause it to fail, but > presently, it would simply result in our disagreeing with the snitch. > * During the transition, routing of queries might be broken, as we fetch the > snitch multiple times in different locations when deciding where to route our > query and writes. It’s not clear what the outcome of a discordant view of the > ring would be. > Probably, changing this information in a live cluster is dangerous and we > should actually reject any effective changes to rack, or DC for any node. But > presently we don’t seem to corroborate that this information remains the > same. We don’t seem to perform any cluster wide confirmation that this data > is consistent, generally, which perhaps we should also consider. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
[ https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith updated CASSANDRA-15358: --- Bug Category: Parent values: Availability(12983) Level 1 values: Response Crash(12991) Complexity: Normal Discovered By: User Report Severity: Normal Status: Open (was: Triage Needed) > Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue > > > Key: CASSANDRA-15358 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15358 > Project: Cassandra > Issue Type: Bug > Components: Test/benchmark >Reporter: Santhosh Kumar Ramalingam >Assignee: Benedict Elliott Smith >Priority: Normal > Labels: 4.0, alpha > > Hitting a bug with the Cassandra 4 alpha version. The same bug occurs with > different versions of Java (8, 11 & 12) [~benedict] > > Stack trace: > {code:java} > INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for > CQL clients on /1.3.0.6:9042 (unencrypted)... > WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 > - CassandraRoleManager skipped default role setup: some nodes were not ready > INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 > - Setup task failed with error, rescheduling > WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - > 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message > of type PING_REQ whose timeout expired before reaching the network > WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 > - CassandraRoleManager skipped default role setup: some nodes were not ready > INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 > - Setup task failed with error, rescheduling > INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - > 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect > 
io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) > failed: Connection refused: /1.3.0.2:7000 > Caused by: java.net.ConnectException: finishConnect(..) failed: Connection > refused > at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124) > at io.netty.channel.unix.Socket.finishConnect(Socket.java:243) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524) > at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414) > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) > at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - > 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of > type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network > INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 > - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect > > ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 > InboundMessageHandler.java:657 - > 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception > caught while processing inbound messages; terminating connection > java.lang.IllegalArgumentException: initialBuffer is not a direct buffer. 
> at io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59) > at > org.apache.cassandra.net.BufferPoolAllocator$Wrapped.(BufferPoolAllocator.java:95) > at > org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178) > at > io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53) > at > io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) > at >
[jira] [Commented] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
[ https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967677#comment-16967677 ] Santhosh Kumar Ramalingam commented on CASSANDRA-15358: --- [~benedict] Our Spark test was run on the version taken from trunk. > Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue > > > Key: CASSANDRA-15358 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15358 > Project: Cassandra > Issue Type: Bug > Components: Test/benchmark >Reporter: Santhosh Kumar Ramalingam >Assignee: Benedict Elliott Smith >Priority: Normal > Labels: 4.0, alpha > > Hitting a bug with the Cassandra 4 alpha version. The same bug occurs with > different versions of Java (8, 11 & 12) [~benedict] > > Stack trace: > {code:java} > INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for > CQL clients on /1.3.0.6:9042 (unencrypted)... > WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 > - CassandraRoleManager skipped default role setup: some nodes were not ready > INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 > - Setup task failed with error, rescheduling > WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - > 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message > of type PING_REQ whose timeout expired before reaching the network > WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 > - CassandraRoleManager skipped default role setup: some nodes were not ready > INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 > - Setup task failed with error, rescheduling > INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - > 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect > io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) 
> failed: Connection refused: /1.3.0.2:7000 > Caused by: java.net.ConnectException: finishConnect(..) failed: Connection > refused > at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124) > at io.netty.channel.unix.Socket.finishConnect(Socket.java:243) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524) > at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414) > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) > at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - > 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of > type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network > INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 > - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect > > ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 > InboundMessageHandler.java:657 - > 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception > caught while processing inbound messages; terminating connection > java.lang.IllegalArgumentException: initialBuffer is not a direct buffer. 
> at io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59) > at > org.apache.cassandra.net.BufferPoolAllocator$Wrapped.(BufferPoolAllocator.java:95) > at > org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187) > at > io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178) > at > io.netty.channel.unix.PreferredDirectByteBufAllocator.ioBuffer(PreferredDirectByteBufAllocator.java:53) > at > io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) > at > io.netty.channel.epoll.EpollRecvByteAllocatorHandle.allocate(EpollRecvByteAllocatorHandle.java:75) > at >
[jira] [Commented] (CASSANDRA-15241) Virtual table to expose current running queries
[ https://issues.apache.org/jira/browse/CASSANDRA-15241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967672#comment-16967672 ] Benedict Elliott Smith commented on CASSANDRA-15241: [~clohfink] sorry, for some reason in my mind this was still in your queue, not mine. I will get it reviewed for you this week. > Virtual table to expose current running queries > --- > > Key: CASSANDRA-15241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15241 > Project: Cassandra > Issue Type: New Feature > Components: Feature/Virtual Tables >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > > Expose current running queries and their duration. > {code}cqlsh> select * from system_views.queries; > thread_id| duration_micros | task > --+-+- > Native-Transport-Requests-17 |6325 | QUERY > select * from system_views.queries; [pageSize = 100] > Native-Transport-Requests-4 | 14681 | EXECUTE > f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE > Native-Transport-Requests-6 | 14678 | EXECUTE > f4115f91190d4acf09e452637f1f2444 with 0 values at consistency LOCAL_ONE > ReadStage-10 | 16535 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-13 | 16535 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-14 | 16535 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-19 | 11861 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-20 | 11861 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-22 |7279 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-23 |4716 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-5 | 16535 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-7 | 16535 | >SELECT * FROM basic.wide1 LIMIT 5000 > ReadStage-8 | 16535 | >SELECT * FROM basic.wide1 LIMIT 5000{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15389) Minimize BTree iterator allocations
[ https://issues.apache.org/jira/browse/CASSANDRA-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967627#comment-16967627 ] Benedict Elliott Smith commented on CASSANDRA-15389: Just collecting here some comments I made on GitHub: h3. Rows.collectStats * Could simply increment the long directly by 0x and 1, respectively, without unpacking * The saturation checks seem to be of limited value after the loop terminates, and should perhaps be done on each increment? It would throw if we had an overflow of 2B to 4B, but not 4B to 6B. Not sure how likely either of these things are. * The right-shift to extract should probably be unsigned (though unimportant if we haven't overflowed) h3. SerializationHeader Not sure if this would be an improvement or not, but {{FullBTreeSearchIterator}} has a rewind method, and this could be hoisted into {{SearchIterator}} to make it reusable. It's not clear if this would be faster than consulting a {{HashMap}}, particularly with the new {{LeafBTreeSearchIterator}} that uses {{binarySearch}} without any optimisation for the case where we are looking up the same set of values in sequence, however {{FullBTreeSearchIterator}} would have no indirect memory accesses for the common case of all (or most) columns being visited, and this could also be propagated to {{LeafBTreeSearchIterator}}. It would mean fewer indirect memory accesses. h3. BTreeRow * {{hasComplex}} doesn't need to use an iterator at all - we can simply search for the first complex cell using {{BTree.find}} and the {{Cell}} equivalent of {{Columns.findFirstComplexIdx}} - however it looks like this method isn't even used, so we could simply remove it entirely. * {{hasComplexDeletion}} could use the same logic to determine the {{firstComplexIdx}}, and instead of providing a {{StopCondition}} we could provide {{(firstComplexIdx, size)}} as the bounds to accumulate over. 
These two changes would remove the need for a direction parameter on the accumulate function, and for the {{StopCondition}}, which I think would make for an easier to understand API (and an easier to parse implementation). > Minimize BTree iterator allocations > --- > > Key: CASSANDRA-15389 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15389 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > Allocations of BTree iterators contribute a large amount of garbage to the > compaction and read paths. > This patch removes most btree iterator allocations on hot paths by: > • using Row#apply where appropriate on frequently called methods > (Row#digest, Row#validateData) > • adding a BTree accumulate method. Like the apply method, this method walks > the btree with a function that takes and returns a long argument; this > eliminates iterator allocations without adding helper object allocations > (BTreeRow#hasComplex, BTreeRow#hasInvalidDeletions, BTreeRow#dataSize, > BTreeRow#unsharedHeapSizeExcludingData, Rows#collectStats, > UnfilteredSerializer#serializedRowBodySize) as well as eliminating the > allocation of helper objects in places where apply was used previously^[1]^. > • creating a map of columns in SerializationHeader; this lets us avoid > allocating a btree search iterator for each row we serialize. > These optimizations reduce garbage created during compaction by up to 13.5% > > [1] the memory test does measure memory allocated by lambdas capturing objects
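The accumulate pattern described in CASSANDRA-15389 can be illustrated with a simplified sketch. The assumptions are significant: this walks a flat array rather than a real BTree, and the names AccumulateSketch and LongAccumulator are hypothetical rather than the identifiers used in the patch. The point it shows is that threading a primitive long through a function avoids both an iterator and a mutable helper object for carrying the running result.

```java
// Hypothetical sketch: fold over elements with a long-in, long-out function.
final class AccumulateSketch {
    // Illustrative functional interface; takes the element and the running value.
    interface LongAccumulator<T> {
        long apply(T value, long accumulated);
    }

    // Walks the values by index, passing the running long through fn on each step.
    static <T> long accumulate(T[] values, LongAccumulator<? super T> fn, long initial) {
        long acc = initial;
        for (int i = 0; i < values.length; i++) {
            acc = fn.apply(values[i], acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        String[] cells = { "ab", "cde", "" };
        // The lambda captures nothing, so it needs no per-call helper object.
        long dataSize = accumulate(cells, (s, acc) -> acc + s.length(), 0L);
        System.out.println(dataSize);
    }
}
```

A lambda that captures local state would typically be allocated per evaluation, which appears to be the concern behind footnote [1] in the description.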
[jira] [Updated] (CASSANDRA-15394) Remove list iterators
[ https://issues.apache.org/jira/browse/CASSANDRA-15394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith updated CASSANDRA-15394: --- Status: Changes Suggested (was: Review In Progress) > Remove list iterators > - > > Key: CASSANDRA-15394 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15394 > Project: Cassandra > Issue Type: Sub-task > Components: Local/Compaction >Reporter: Blake Eggleston >Assignee: Blake Eggleston >Priority: Normal > Fix For: 4.0 > > > We allocate list iterators in several places in hot paths. This converts them > to get by index. This provides a ~4% improvement in relevant workloads.
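The kind of change CASSANDRA-15394 describes, replacing list iteration with access by index, looks roughly like the sketch below. The class and method names are hypothetical and the example is not taken from the patch; it only contrasts the two loop shapes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting iterator-based and index-based list traversal.
final class IndexedLoopSketch {
    // The enhanced for loop obtains an Iterator from the list on every call.
    static long sumWithIterator(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v;
        }
        return sum;
    }

    // Index-based traversal: no Iterator object is requested from the list.
    static long sumByIndex(List<Integer> values) {
        long sum = 0;
        for (int i = 0, size = values.size(); i < size; i++) {
            sum += values.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(List.of(1, 2, 3));
        System.out.println(sumWithIterator(values) + " " + sumByIndex(values));
    }
}
```

Index-based access is only a reasonable substitute for lists with constant-time get, such as ArrayList; on a LinkedList each get(i) is linear, so the rewrite would make things worse there.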
[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandrasekhar Thumuluru updated CASSANDRA-15397: Attachment: Mean_3_SSTable_with_5000_Searches.png Mean_25000_SSTable_with_5000_Searches.png Mean_2_SSTable_with_5000_Searches.png Mean_15000_SSTable_with_5000_Searches.png Mean_1_SSTable_with_5000_Searches.png Mean_5000_SSTable_with_5000_Searches.png > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. > -- > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement >Reporter: Chandrasekhar Thumuluru >Priority: Normal > Attachments: 95p_1_SSTable_with_5000_Searches.png, > 95p_15000_SSTable_with_5000_Searches.png, > 95p_2_SSTable_with_5000_Searches.png, > 95p_25000_SSTable_with_5000_Searches.png, > 95p_3_SSTable_with_5000_Searches.png, > 95p_5000_SSTable_with_5000_Searches.png, > 99p_1_SSTable_with_5000_Searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_2_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_3_SSTable_with_5000_Searches.png, > 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, > IntervalListWithElimination.java, IntervalTreeSimplified.java, > Mean_1_SSTable_with_5000_Searches.png, > Mean_15000_SSTable_with_5000_Searches.png, > Mean_2_SSTable_with_5000_Searches.png, > Mean_25000_SSTable_with_5000_Searches.png, > Mean_3_SSTable_with_5000_Searches.png, > Mean_5000_SSTable_with_5000_Searches.png > > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. 
> * Elimination using Binary Search (the idea is to eliminate intervals using the start > and end points of the search interval). > Based on the tests I ran, I noticed Binary Search based elimination almost > always performs similarly to IntervalTree or outperforms IntervalTree-based > search. The cost of IntervalTree construction is also substantial and > produces a lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage. 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search took in nanoseconds. > PS: > # For the purpose of the test, I simplified the IntervalTree by removing the data > portion of the interval. I also modified the template version (Java generics) to a > specialized version. > # I used the code from Cassandra version _3.11_. > # Time in the graph is in nanoseconds.
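One way to read "elimination using binary search" is sketched below. This is an assumption about the general technique, not the contents of the attached IntervalListWithElimination.java, which is not shown in this thread. The class name `IntervalListSketch` and its methods are hypothetical. Intervals are closed ranges [start, end] kept in arrays sorted by start; a binary search on the query's end point eliminates every interval that starts after the query ends, and the remaining prefix is scanned linearly against the query's start point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class IntervalListSketch {
    private final long[] starts; // sorted ascending
    private final long[] ends;   // ends[i] belongs to starts[i]

    IntervalListSketch(long[][] intervals) {
        long[][] sorted = intervals.clone(); // shallow copy; caller's ordering is untouched
        Arrays.sort(sorted, Comparator.comparingLong((long[] iv) -> iv[0]));
        starts = new long[sorted.length];
        ends = new long[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            starts[i] = sorted[i][0];
            ends[i] = sorted[i][1];
        }
    }

    /** Binary search: index of the first interval whose start is strictly greater than key. */
    private int firstStartAfter(long key) {
        int lo = 0, hi = starts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    /** Returns all stored intervals overlapping the closed query range [qStart, qEnd]. */
    List<long[]> search(long qStart, long qEnd) {
        List<long[]> hits = new ArrayList<>();
        int limit = firstStartAfter(qEnd); // everything from limit onward starts too late
        for (int i = 0; i < limit; i++) {
            if (ends[i] >= qStart)         // interval has not ended before the query begins
                hits.add(new long[] { starts[i], ends[i] });
        }
        return hits;
    }
}
```

The two flat `long[]` arrays keep the data contiguous, which is the cache-friendliness argument made in the ticket. In this simplified form only the upper bound is found by binary search; the check against the query's start is still a linear scan of the surviving prefix.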
[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandrasekhar Thumuluru updated CASSANDRA-15397: Attachment: 95p_5000_SSTable_with_5000_Searches.png 95p_1_SSTable_with_5000_Searches.png 95p_15000_SSTable_with_5000_Searches.png 95p_2_SSTable_with_5000_Searches.png 95p_25000_SSTable_with_5000_Searches.png 95p_3_SSTable_with_5000_Searches.png > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. > -- > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement >Reporter: Chandrasekhar Thumuluru >Priority: Normal > Attachments: 95p_1_SSTable_with_5000_Searches.png, > 95p_15000_SSTable_with_5000_Searches.png, > 95p_2_SSTable_with_5000_Searches.png, > 95p_25000_SSTable_with_5000_Searches.png, > 95p_3_SSTable_with_5000_Searches.png, > 95p_5000_SSTable_with_5000_Searches.png, > 99p_1_SSTable_with_5000_Searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_2_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_3_SSTable_with_5000_Searches.png, > 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, > IntervalListWithElimination.java, IntervalTreeSimplified.java > > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. > * Elimination using Binary Search (idea is to eliminate intervals using start > and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost > always performs similarly to IntervalTree or outperforms IntervalTree-based > search. The cost of IntervalTree construction is also substantial and > produces a lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage. 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search took in nanoseconds. > PS: > # For the purpose of the test, I simplified the IntervalTree by removing the data > portion of the interval. I also modified the template version (Java generics) to a > specialized version. > # I used the code from Cassandra version _3.11_. > # Time in the graph is in nanoseconds.
[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandrasekhar Thumuluru updated CASSANDRA-15397: Description: Cassandra uses IntervalTrees to identify the SSTables that overlap with search interval. In Cassandra, IntervalTrees are not mutated. They are recreated each time a mutation is required. This can be an issue during repairs. In fact we noticed such issues during repair. Since lists are cache friendly compared to linked lists and trees, I decided to compare the search performance with: * Linear Walk. * Elimination using Binary Search (idea is to eliminate intervals using start and end points of search interval). Based on the tests I ran, I noticed Binary Search based elimination almost always performs similar to IntervalTree or out performs IntervalTree based search. The cost of IntervalTree construction is also substantial and produces lot of garbage during repairs. I ran the tests using random intervals to build the tree/lists and another randomly generated search interval with 5000 iterations. I'm attaching all the relevant graphs. The x-axis in the graphs is the search interval coverage. 10p means the search interval covered 10% of the intervals. The y-axis is the time the search took in nanos. PS: # For the purpose of test, I simplified the IntervalTree by removing the data portion of the interval. Modified the template version (Java generics) to a specialized version. # I used the code from Cassandra version _3.11_. # Time in the graph is in nanos. was: Cassandra uses IntervalTrees to identify the SSTables that overlap with search interval. In Cassandra, IntervalTrees are not mutated. They are recreated each time a mutation is required. This can be an issue during repairs. In fact we noticed such issues during repair. Since lists are cache friendly compared to linked lists and trees, I decided to compare the search performance with: * Linear Walk. 
* Elimination using Binary Search (idea is to eliminate intervals using start and end points of search interval). Based on the tests I ran, I noticed Binary Search based elimination almost always performs similar to IntervalTree or out performs IntervalTree based search. The cost of IntervalTree construction is also substantial and produces lot of garbage during repairs. I ran the tests using random intervals to build the tree/lists and another randomly generated search interval with 5000 iterations. I'm attaching all the relevant graphs. The x-axis in the graphs is the search interval coverage. 10p means the search interval covered 10% of the intervals. The y-axis is the time the search took in nanos. PS: # For the purpose of test, I simplified the IntervalTree code by making it non-generic and removing the data portion of the interval. # I used the code from Cassandra version _3.11_. # Time in the graph is in nanos. > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. > -- > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement >Reporter: Chandrasekhar Thumuluru >Priority: Normal > Attachments: 99p_1_SSTable_with_5000_Searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_2_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_3_SSTable_with_5000_Searches.png, > 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, > IntervalListWithElimination.java, IntervalTreeSimplified.java > > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. 
> * Elimination using Binary Search (idea is to eliminate intervals using start > and end points of search interval). > Based on the tests I ran, I noticed Binary Search based elimination almost > always performs similar to IntervalTree or out performs IntervalTree based > search. The cost of IntervalTree construction is also substantial and > produces lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage. 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search
[jira] [Commented] (CASSANDRA-15358) Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue
[ https://issues.apache.org/jira/browse/CASSANDRA-15358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967610#comment-16967610 ] Benedict Elliott Smith commented on CASSANDRA-15358: Could you try running the following branch? [15358|https://github.geo.apple.com/belliottsmith/cassandra/tree/15358] It is based on trunk, so let me know if you would prefer it rebased to {{4.0-alpha1}} or {{4.0-alpha2}}, though I don't believe a great deal has changed since. Given the log statements, there's a good chance this is the problem. Even though I still think it's nearly impossible for a read-only buffer to be created and returned to the pool, I cannot find another more plausible cause. If this doesn't fix the problem, I'll see if I can put together a debug build that can maybe report where this {{ByteBuffer}} materialises from. > Cassandra alpha 4 testing - Nodes crashing due to bufferpool allocator issue > > > Key: CASSANDRA-15358 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15358 > Project: Cassandra > Issue Type: Bug > Components: Test/benchmark >Reporter: Santhosh Kumar Ramalingam >Assignee: Benedict Elliott Smith >Priority: Normal > Labels: 4.0, alpha > > Hitting a bug with the Cassandra 4 alpha version. The same bug is reproduced with > different versions of Java (8, 11 & 12) [~benedict] > > Stack trace: > {code:java} > INFO [main] 2019-10-11 16:07:12,024 Server.java:164 - Starting listening for > CQL clients on /1.3.0.6:9042 (unencrypted)... 
> WARN [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:343 > - CassandraRoleManager skipped default role setup: some nodes were not ready > INFO [OptionalTasks:1] 2019-10-11 16:07:13,961 CassandraRoleManager.java:369 > - Setup task failed with error, rescheduling > WARN [Messaging-EventLoop-3-2] 2019-10-11 16:07:22,038 NoSpamLogger.java:94 - > 10.3x.4x.5x:7000->1.3.0.5:7000-LARGE_MESSAGES-[no-channel] dropping message > of type PING_REQ whose timeout expired before reaching the network > WARN [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:343 > - CassandraRoleManager skipped default role setup: some nodes were not ready > INFO [OptionalTasks:1] 2019-10-11 16:07:23,963 CassandraRoleManager.java:369 > - Setup task failed with error, rescheduling > INFO [Messaging-EventLoop-3-6] 2019-10-11 16:07:32,759 NoSpamLogger.java:91 - > 10.3x.4x.5x:7000->1.3.0.2:7000-URGENT_MESSAGES-[no-channel] failed to connect > io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) > failed: Connection refused: /1.3.0.2:7000 > Caused by: java.net.ConnectException: finishConnect(..) 
failed: Connection > refused > at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124) > at io.netty.channel.unix.Socket.finishConnect(Socket.java:243) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:667) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:644) > at > io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:524) > at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:414) > at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326) > at > io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) > at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > WARN [Messaging-EventLoop-3-3] 2019-10-11 16:11:32,639 NoSpamLogger.java:94 - > 1.3.4.6:7000->1.3.4.5:7000-URGENT_MESSAGES-[no-channel] dropping message of > type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network > INFO [Messaging-EventLoop-3-18] 2019-10-11 16:11:33,077 NoSpamLogger.java:91 > - 1.3.4.5:7000->1.3.4.4:7000-URGENT_MESSAGES-[no-channel] failed to connect > > ERROR [Messaging-EventLoop-3-11] 2019-10-10 01:34:34,407 > InboundMessageHandler.java:657 - > 1.3.4.5:7000->1.3.4.8:7000-LARGE_MESSAGES-0b7d09cd unexpected exception > caught while processing inbound messages; terminating connection > java.lang.IllegalArgumentException: initialBuffer is not a direct buffer. 
> at io.netty.buffer.UnpooledDirectByteBuf.(UnpooledDirectByteBuf.java:87) > at > io.netty.buffer.UnpooledUnsafeDirectByteBuf.(UnpooledUnsafeDirectByteBuf.java:59) > at > org.apache.cassandra.net.BufferPoolAllocator$Wrapped.(BufferPoolAllocator.java:95) > at > org.apache.cassandra.net.BufferPoolAllocator.newDirectBuffer(BufferPoolAllocator.java:56) > at >
[jira] [Updated] (CASSANDRA-15397) IntervalTree performance comparison with Linear Walk and Binary Search based Elimination.
[ https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandrasekhar Thumuluru updated CASSANDRA-15397: Description: Cassandra uses IntervalTrees to identify the SSTables that overlap with search interval. In Cassandra, IntervalTrees are not mutated. They are recreated each time a mutation is required. This can be an issue during repairs. In fact we noticed such issues during repair. Since lists are cache friendly compared to linked lists and trees, I decided to compare the search performance with: * Linear Walk. * Elimination using Binary Search (idea is to eliminate intervals using start and end points of search interval). Based on the tests I ran, I noticed Binary Search based elimination almost always performs similar to IntervalTree or out performs IntervalTree based search. The cost of IntervalTree construction is also substantial and produces lot of garbage during repairs. I ran the tests using random intervals to build the tree/lists and another randomly generated search interval with 5000 iterations. I'm attaching all the relevant graphs. The x-axis in the graphs is the search interval coverage. 10p means the search interval covered 10% of the intervals. The y-axis is the time the search took in nanos. PS: # For the purpose of test, I simplified the IntervalTree code by making it non-generic and removing the data portion of the interval. # I used the code from Cassandra version _3.11_. # Time in the graph is in nanos. was: Cassandra uses IntervalTrees to identify the SSTables that overlap with search interval. In Cassandra, IntervalTrees are not mutated. They are recreated each time a mutation is required. This can be an issue during repairs. In fact we noticed such issues during repair. Since lists are cache friendly compared to linked lists and trees, I decided to compare the search performance with: * Linear Walk. 
* Elimination using Binary Search (idea is to eliminate intervals using start and end points of search interval). Based on the tests I ran, I noticed Binary Search based elimination almost always performs similar to IntervalTree performance or out performs IntervalTree based search. I ran the tests using random intervals to build the tree/lists and another randomly generated search interval with 5000 iterations. I'm attaching all the relevant graphs. PS: # For the purpose of test, I simplified the IntervalTree code by making it non-generic and removing the data portion of the interval. # I used the code from Cassandra version _3.11_. # Time in the graph is in nanos. > IntervalTree performance comparison with Linear Walk and Binary Search based > Elimination. > -- > > Key: CASSANDRA-15397 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15397 > Project: Cassandra > Issue Type: Improvement >Reporter: Chandrasekhar Thumuluru >Priority: Normal > Attachments: 99p_1_SSTable_with_5000_Searches.png, > 99p_15000_SSTable_with_5000_Searches.png, > 99p_2_SSTable_with_5000_Searches.png, > 99p_25000_SSTable_with_5000_Searches.png, > 99p_3_SSTable_with_5000_Searches.png, > 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, > IntervalListWithElimination.java, IntervalTreeSimplified.java > > > Cassandra uses IntervalTrees to identify the SSTables that overlap with > search interval. In Cassandra, IntervalTrees are not mutated. They are > recreated each time a mutation is required. This can be an issue during > repairs. In fact we noticed such issues during repair. > Since lists are cache friendly compared to linked lists and trees, I decided > to compare the search performance with: > * Linear Walk. > * Elimination using Binary Search (idea is to eliminate intervals using start > and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost > always performs similar to IntervalTree or out performs IntervalTree based > search. The cost of IntervalTree construction is also substantial and > produces lot of garbage during repairs. > I ran the tests using random intervals to build the tree/lists and another > randomly generated search interval with 5000 iterations. I'm attaching all > the relevant graphs. The x-axis in the graphs is the search interval > coverage. 10p means the search interval covered 10% of the intervals. The > y-axis is the time the search took in nanos. > PS: > # For the purpose of test, I simplified the IntervalTree code by making it > non-generic and removing the data portion of the interval. > # I used the code from Cassandra version _3.11_. > # Time in the graph is in nanos. -- This message was sent by Atlassian Jira
[jira] [Commented] (CASSANDRA-14731) Transient Write Metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967580#comment-16967580 ] Abdul Aziz Ali commented on CASSANDRA-14731: Thanks [~benedict], I'll try to send a patch or a PR in the next week or so. > Transient Write Metrics > --- > > Key: CASSANDRA-14731 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14731 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core, Observability/Metrics >Reporter: Benedict Elliott Smith >Assignee: Abdul Aziz Ali >Priority: Low > Labels: metrics, transient-replication > Fix For: 4.x > > > While we record the number of attempted transient writes, we do not record how > successful these were. > Also, we do not count transient writes that happen due to the failure > detector. While these are distinct from those writes that happen > ‘speculatively’ due to slow responses, there’s a strong chance they will be > the most common form of transient write. It might be worth having separate > metrics.
[jira] [Created] (CASSANDRA-15398) TBD (minor, boring)
Aleksey Yeschenko created CASSANDRA-15398: - Summary: TBD (minor, boring) Key: CASSANDRA-15398 URL: https://issues.apache.org/jira/browse/CASSANDRA-15398 Project: Cassandra Issue Type: Bug Components: Cluster/Schema Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko
[jira] [Comment Edited] (CASSANDRA-14731) Transient Write Metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967544#comment-16967544 ] Benedict Elliott Smith edited comment on CASSANDRA-14731 at 11/5/19 1:54 PM: - Hi [~abdulazizali]. Sure, you're welcome to take this ticket. It's a while since I thought about this, so I will have to acclimatise to the context again, but there appear to be two proposals in this ticket: 1) tracking those transient writes we perform _because of the failure detector,_ and these are going to happen in {{sendToHintedReplicas}}; 2) tracking success of a transient write, particularly for achieving the requested consistency level, and that would happen in {{AbstractWriteResponseHandler}} but, in retrospect, I'm not entirely sure what this would look like. was (Author: benedict): Hi [~abdulazizali]. Sure, you're welcome to take this ticket. It's a while since I thought about this, so I will have to acclimatise to the context again, but there appear to be two proposals in this ticket: 1) tracking those transient writes we perform _because of the failure detector,_ and these are going to happen in {{sendToHintedReplicas}}. If we want to track success of transient writes for achieving quorum, that would happen in {{AbstractWriteResponseHandler}} but, in retrospect, I'm not entirely sure what this would look like. > Transient Write Metrics > --- > > Key: CASSANDRA-14731 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14731 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core, Observability/Metrics >Reporter: Benedict Elliott Smith >Assignee: Abdul Aziz Ali >Priority: Low > Labels: metrics, transient-replication > Fix For: 4.x > > > While we record the number of attempt transient writes, we do not record how > successful these were. > Also, we do not count transient writes that happen due to the failure > detector. 
While these are distinct from those writes that happen > ‘speculatively’ due to slow responses, there’s a strong chance they will be > the most common form of transient write. It might be worth having separate > metrics. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
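The two proposals in the comment above (counting transient writes sent because of the failure detector separately from speculative ones, and recording how many of them succeed) could be modelled with separate counters, as in this rough sketch. All names here (`TransientWriteCountersSketch`, `Cause`, `markAttempted`, `markSucceeded`) are hypothetical and are not part of Cassandra's metrics code. Where such counters would be updated, for example in {{sendToHintedReplicas}} or {{AbstractWriteResponseHandler}}, is left open, as it is in the ticket.

```java
import java.util.concurrent.atomic.LongAdder;

final class TransientWriteCountersSketch {
    /** Hypothetical labels for why a transient write was sent. */
    enum Cause { FAILURE_DETECTOR, SPECULATIVE }

    // One counter per cause, for attempts and for successes.
    private final LongAdder[] attempted = newAdders();
    private final LongAdder[] succeeded = newAdders();

    private static LongAdder[] newAdders() {
        LongAdder[] adders = new LongAdder[Cause.values().length];
        for (int i = 0; i < adders.length; i++)
            adders[i] = new LongAdder();
        return adders;
    }

    void markAttempted(Cause cause) {
        attempted[cause.ordinal()].increment();
    }

    void markSucceeded(Cause cause) {
        succeeded[cause.ordinal()].increment();
    }

    long attempted(Cause cause) {
        return attempted[cause.ordinal()].sum();
    }

    long succeeded(Cause cause) {
        return succeeded[cause.ordinal()].sum();
    }
}
```

`LongAdder` is used because it stays cheap when many threads increment the same counter concurrently; a real implementation would more likely register meters with the existing metrics registry.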
[jira] [Commented] (CASSANDRA-14731) Transient Write Metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967544#comment-16967544 ] Benedict Elliott Smith commented on CASSANDRA-14731: Hi [~abdulazizali]. Sure, you're welcome to take this ticket. It's a while since I thought about this, so I will have to acclimatise to the context again, but there appear to be two proposals in this ticket: 1) tracking those transient writes we perform _because of the failure detector,_ and these are going to happen in {{sendToHintedReplicas}}. If we want to track success of transient writes for achieving quorum, that would happen in {{AbstractWriteResponseHandler}} but, in retrospect, I'm not entirely sure what this would look like. > Transient Write Metrics > --- > > Key: CASSANDRA-14731 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14731 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core, Observability/Metrics >Reporter: Benedict Elliott Smith >Assignee: Abdul Aziz Ali >Priority: Low > Labels: metrics, transient-replication > Fix For: 4.x > > > While we record the number of attempt transient writes, we do not record how > successful these were. > Also, we do not count transient writes that happen due to the failure > detector. While these are distinct from those writes that happen > ‘speculatively’ due to slow responses, there’s a strong chance they will be > the most common form of transient write. It might be worth having separate > metrics. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14731) Transient Write Metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith updated CASSANDRA-14731: --- Description: While we record the number of attempt transient writes, we do not record how successful these were. Also, we do not count transient writes that happen due to the failure detector. While these are distinct from those writes that happen ‘speculatively’ due to slow responses, there’s a strong chance they will be the most common form of transient write. It might be worth having separate metrics. was: While we record the number of attempt transient writes, we do not record how successful these were. Also, we do not count transient writes that happen due to the failure detector. Possibly, these While these are distinct from those writes that happen ‘speculatively’ due to slow responses, there’s a strong chance they will be the most common form of transient write. It might be worth having separate > Transient Write Metrics > --- > > Key: CASSANDRA-14731 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14731 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core, Observability/Metrics >Reporter: Benedict Elliott Smith >Assignee: Abdul Aziz Ali >Priority: Low > Labels: metrics, transient-replication > Fix For: 4.x > > > While we record the number of attempt transient writes, we do not record how > successful these were. > Also, we do not count transient writes that happen due to the failure > detector. While these are distinct from those writes that happen > ‘speculatively’ due to slow responses, there’s a strong chance they will be > the most common form of transient write. It might be worth having separate > metrics. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14731) Transient Write Metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-14731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict Elliott Smith reassigned CASSANDRA-14731: -- Assignee: Abdul Aziz Ali > Transient Write Metrics > --- > > Key: CASSANDRA-14731 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14731 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core, Observability/Metrics >Reporter: Benedict Elliott Smith >Assignee: Abdul Aziz Ali >Priority: Low > Labels: metrics, transient-replication > Fix For: 4.x > > > While we record the number of attempt transient writes, we do not record how > successful these were. > Also, we do not count transient writes that happen due to the failure > detector. Possibly, these While these are distinct from those writes that > happen ‘speculatively’ due to slow responses, there’s a strong chance they > will be the most common form of transient write. It might be worth having > separate -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org