[jira] [Commented] (CASSANDRA-15177) Reloading of auth caches happens on the calling thread

2019-06-24 Thread Sam Tunnicliffe (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871004#comment-16871004
 ] 

Sam Tunnicliffe commented on CASSANDRA-15177:
-

[~ben.manes] can't we just supply a different executor here: 
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/AuthCache.java#L221]
 ? 

> Reloading of auth caches happens on the calling thread
> --
>
> Key: CASSANDRA-15177
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15177
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Authorization
>Reporter: Sam Tunnicliffe
>Priority: Normal
>
> When Guava caches were replaced by their Caffeine equivalents in 
> CASSANDRA-10855, the async reloading of stale AuthCache entries was lost due 
> to the use of {{MoreExecutors.directExecutor()}} to provide the delegate 
> executor. Under normal conditions, we can expect these operations to be 
> relatively expensive, and in failure scenarios where replicas for the auth 
> data are DOWN this will greatly increase latency, so they shouldn’t be done 
> on threads servicing requests.
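
A minimal sketch of the threading difference described above, using only java.util.concurrent rather than Caffeine or AuthCache itself. The class name, the method name, and the "auth-cache-reload" thread name are made up for illustration; `Runnable::run` stands in for a direct executor such as {{MoreExecutors.directExecutor()}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ReloadExecutorSketch {
    /** Runs a stand-in "reload" task on the executor and returns the name of the thread that ran it. */
    static String threadUsedForReload(Executor executor) throws Exception {
        CompletableFuture<String> ranOn = new CompletableFuture<>();
        executor.execute(() -> ranOn.complete(Thread.currentThread().getName()));
        return ranOn.get();
    }

    public static void main(String[] args) throws Exception {
        // A direct executor runs the task inline, i.e. on the calling (request) thread.
        Executor direct = Runnable::run;
        // A dedicated executor moves the work off the calling thread.
        ExecutorService background = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "auth-cache-reload"); // hypothetical thread name
            t.setDaemon(true);
            return t;
        });
        try {
            System.out.println("caller:     " + Thread.currentThread().getName());
            System.out.println("direct:     " + threadUsedForReload(direct));
            System.out.println("background: " + threadUsedForReload(background));
        } finally {
            background.shutdown();
        }
    }
}
```

With the direct executor the reported thread is the caller itself, which is the behaviour this ticket objects to; with the dedicated executor the work lands on a separate thread.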



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15178) Skipping illegal legacy cells can break reverse iteration of indexed partitions

2019-06-24 Thread Marcus Eriksson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-15178:

Reviewers: Marcus Eriksson
   Status: Review In Progress  (was: Patch Available)

> Skipping illegal legacy cells can break reverse iteration of indexed 
> partitions
> ---
>
> Key: CASSANDRA-15178
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15178
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Local Write-Read Paths
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
>
> The fix for CASSANDRA-15086 interacts badly with the accounting of bytes read 
> from disk when indexed partitions are read in reverse. The skipped columns 
> can cause the tracking of where CQL rows span index block boundaries to be 
> incorrectly calculated, leading to rows being missing from read results.






[jira] [Created] (CASSANDRA-15180) SASI: Cant retrieve the result when the condition is Mixed query condition when one conditon is "=" for a int type and other condition is ">" for a string type

2019-06-24 Thread wu taiyin (JIRA)
wu taiyin created CASSANDRA-15180:
-

 Summary: SASI: Cant retrieve the result when the condition is 
Mixed query condition when one conditon is "=" for a int type and other 
condition is ">" for a string type
 Key: CASSANDRA-15180
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15180
 Project: Cassandra
  Issue Type: Bug
Reporter: wu taiyin


I have a SASI scenario and I am not sure whether this is a bug.
I created two SASI indexes on two columns of one table; one column is of type
int and the other is of type text.
When I query on these two indexed columns with an inequality condition, no
results are returned.
The schema and the CQL queries are as follows:

1: Create a table and a SASI index on each of the two columns:
cassandra@cqlsh:app105> CREATE TABLE app105.complextestff(
 ... a int,
 ... b int,
 ... c int,
 ... d text,
 ... PRIMARY KEY (a,b)
 ... );
cassandra@cqlsh:app105> CREATE CUSTOM INDEX sasi_105_findtest4_c ON 
app105.complextestff (c) USING 'org.apache.cassandra.index.sasi.SASIIndex';
cassandra@cqlsh:app105> CREATE CUSTOM INDEX sasi_105_findtest4_d ON 
app105.complextestff (d) USING 'org.apache.cassandra.index.sasi.SASIIndex';

2: Insert some data into this table:
cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,1,1,'1');
cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,2,1,'1');
cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,3,1,'2');
cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,4,1,'2');
cassandra@cqlsh:app105> select * from complextestff ;
 a | b | c | d
---+---+---+---
 1 | 1 | 1 | 1
 1 | 2 | 1 | 1
 1 | 3 | 1 | 2
 1 | 4 | 1 | 2


3: Query with conditions. When every indexed column in the query uses "=", the
result is retrieved successfully:
cassandra@cqlsh:app105> select * from complextestff WHERE a=1 and c=1;
 a | b | c | d
---+---+---+---
 1 | 1 | 1 | 1
 1 | 2 | 1 | 1
 1 | 3 | 1 | 2
 1 | 4 | 1 | 2


4: But when I apply the ">" operator (">=" in the query below) to an indexed
column of type text, no rows are returned:
cassandra@cqlsh:app105> select * from complextestff WHERE a=1 and c=1 and d >= 
'1' ALLOW FILTERING;
 a | b | c | d
---+---+---+---
(0 rows)
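
For reference, a brute-force check of the predicate against the four inserted rows. This is only a sketch: it assumes text values compare lexicographically (as `String.compareTo` does for these ASCII values), and the class and method names are invented for illustration; it is not SASI code.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpectedRowsSketch {
    /** One row of complextestff: (a, b, c, d). */
    static final class Row {
        final int a, b, c;
        final String d;
        Row(int a, int b, int c, String d) { this.a = a; this.b = b; this.c = c; this.d = d; }
    }

    /** The four rows inserted in step 2. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1, 1, 1, "1"));
        rows.add(new Row(1, 2, 1, "1"));
        rows.add(new Row(1, 3, 1, "2"));
        rows.add(new Row(1, 4, 1, "2"));
        return rows;
    }

    /** Applies a = 1 AND c = 1 AND d >= '1' by scanning every row. */
    static List<Row> filter(List<Row> rows) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.a == 1 && r.c == 1 && r.d.compareTo("1") >= 0) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(sampleRows()).size() + " rows match");
    }
}
```

Under that assumption all four rows satisfy the condition, so an empty result from the indexed query is unexpected.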






[jira] [Commented] (CASSANDRA-15177) Reloading of auth caches happens on the calling thread

2019-06-24 Thread Ben Manes (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871194#comment-16871194
 ] 

Ben Manes commented on CASSANDRA-15177:
---

Oh, of course. There was a request at some point not to do that, so I
mentioned the trick in case you preferred the mixed behavior.







[jira] [Updated] (CASSANDRA-15180) SASI: Cant retrieve the result when the condition is Mixed query condition when one conditon is "=" for a int type and other condition is ">" for a string type

2019-06-24 Thread wu taiyin (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wu taiyin updated CASSANDRA-15180:
--
Description: 
I have a SASI scenario and I am not sure whether this is a bug.
I created two SASI indexes on two columns of one table; one column is of type
int and the other is of type text.
When I query on these two indexed columns with an inequality condition, no
results are returned.
The schema and the CQL queries are as follows:

1: Create a table and a SASI index on each of the two columns:
 cassandra@cqlsh:app105> CREATE TABLE app105.complextestff(
 ... a int,
 ... b int,
 ... c int,
 ... d text,
 ... PRIMARY KEY (a,b)
 ... );
 cassandra@cqlsh:app105> CREATE CUSTOM INDEX sasi_105_findtest4_c ON 
app105.complextestff (c) USING 'org.apache.cassandra.index.sasi.SASIIndex';
 cassandra@cqlsh:app105> CREATE CUSTOM INDEX sasi_105_findtest4_d ON 
app105.complextestff (d) USING 'org.apache.cassandra.index.sasi.SASIIndex';

2: Insert some data into this table:
 cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,1,1,'1');
 cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,2,1,'1');
 cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,3,1,'2');
 cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,4,1,'2');
 cassandra@cqlsh:app105> select * from complextestff ;
 a | b | c | d
 ---+---+---+---
 1 | 1 | 1 | 1
 1 | 2 | 1 | 1
 1 | 3 | 1 | 2
 1 | 4 | 1 | 2

3: Query with conditions. When every indexed column in the query uses "=", the
result is retrieved successfully:
 cassandra@cqlsh:app105> select * from complextestff WHERE a=1 and c=1;
 a | b | c | d
 ---+---+---+---
 1 | 1 | 1 | 1
 1 | 2 | 1 | 1
 1 | 3 | 1 | 2
 1 | 4 | 1 | 2

4: But when I apply the ">" operator (">=" in the query below) to an indexed
column of type text, no rows are returned:
 cassandra@cqlsh:app105> select * from complextestff WHERE a=1 and c=1 and d >= 
'1' ALLOW FILTERING;
 a | b | c | d
 ---+---+---+---
 (0 rows)




[jira] [Commented] (CASSANDRA-15180) SASI: Cant retrieve the result when the condition is Mixed query condition when one conditon is "=" for a int type and other condition is ">" for a string type

2019-06-24 Thread wu taiyin (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871201#comment-16871201
 ] 

wu taiyin commented on CASSANDRA-15180:
---

I also tried a regular Cassandra secondary index, and with it the same query
condition does return the rows:

I look forward to your reply, thanks.

CREATE TABLE app105.complextestff (
 a int,
 b int,
 c int,
 d text,
 PRIMARY KEY (a, b)
) WITH CLUSTERING ORDER BY (b ASC)
 AND bloom_filter_fp_chance = 0.01
 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
 AND comment = ''
 AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}
 AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
 AND crc_check_chance = 1.0
 AND dclocal_read_repair_chance = 0.1
 AND default_time_to_live = 0
 AND gc_grace_seconds = 864000
 AND max_index_interval = 2048
 AND memtable_flush_period_in_ms = 0
 AND min_index_interval = 128
 AND read_repair_chance = 0.0
 AND speculative_retry = '99PERCENTILE';
CREATE INDEX sasi_105_findtest4_c ON app105.complextestff (c);
CREATE INDEX sasi_105_findtest4_d ON app105.complextestff (d);

cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,1,1,'1');
cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,2,1,'1');
cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,3,1,'2');
cassandra@cqlsh:app105> INSERT INTO complextestff(a , b , c , d ) VALUES ( 
1,4,1,'2');
cassandra@cqlsh:app105> select * from complextestff ;

a | b | c | d
---+---+---+---
 1 | 1 | 1 | 1
 1 | 2 | 1 | 1
 1 | 3 | 1 | 2
 1 | 4 | 1 | 2

(4 rows)
cassandra@cqlsh:app105> select * from complextestff WHERE a=1 and c=1 and d >= 
'1' ALLOW FILTERING;

a | b | c | d
---+---+---+---
 1 | 1 | 1 | 1
 1 | 2 | 1 | 1
 1 | 3 | 1 | 2
 1 | 4 | 1 | 2







[jira] [Updated] (CASSANDRA-15178) Skipping illegal legacy cells can break reverse iteration of indexed partitions

2019-06-24 Thread Marcus Eriksson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-15178:

Status: Ready to Commit  (was: Review In Progress)

+1







[jira] [Updated] (CASSANDRA-14772) Fix issues in audit / full query log interactions

2019-06-24 Thread Marcus Eriksson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-14772:

Reviewers: Aleksey Yeschenko, Per Otterström, Vinay Chella  (was: Aleksey 
Yeschenko, Vinay Chella)

> Fix issues in audit / full query log interactions
> -
>
> Key: CASSANDRA-14772
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14772
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL, Legacy/Tools
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 4.0
>
>
> There are some problems with the audit + full query log code that need to be 
> resolved before 4.0 is released:
> * Fix performance regression in FQL that makes it less usable than it should 
> be.
> * move full query log specific code to a separate package 
> * do some audit log class renames (I keep reading {{BinLogAuditLogger}} vs 
> {{BinAuditLogger}} wrong for example)
> * avoid parsing the CQL queries twice in {{QueryMessage}} when audit log is 
> enabled.
> * add a new tool to dump audit logs (ie, let fqltool be full query log 
> specific). fqltool crashes when pointed to them.






[jira] [Commented] (CASSANDRA-14772) Fix issues in audit / full query log interactions

2019-06-24 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871397#comment-16871397
 ] 

Marcus Eriksson commented on CASSANDRA-14772:
-

Thanks for having a look, [~eperott]. I pushed up fixes for your comments,

and the tests have run here: 
https://circleci.com/workflow-run/58e385bb-0298-4136-b58c-3dd83643e774







[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15175:
-
Attachment: trunk_vs_30x_summary.png

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> trunk_14400cRPS-14400cWPS.svg, trunk_187000cRPS-14400cWPS.svg, 
> trunk_187kcRPS_14kcWPS.png, trunk_22000cRPS-14400cWPS-jdk.svg, 
> trunk_22000cRPS-14400cWPS-openssl.svg, trunk_220kcRPS_14kcWPS.png, 
> trunk_252kcRPS-14kcWPS.png, trunk_93500cRPS-14400cWPS.svg, 
> trunk_vs_30x_125kcRPS_14kcWPS.png, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19 @d7d00036|
> |Candidate|trunk @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Assigned] (CASSANDRA-14764) Evaluate 12 Node Breaking Point, compression=none, encryption=none, coalescing=off

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch reassigned CASSANDRA-14764:


Assignee: Vinay Chella

> Evaluate 12 Node Breaking Point, compression=none, encryption=none, 
> coalescing=off
> --
>
> Key: CASSANDRA-14764
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14764
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Legacy/Streaming and Messaging
>Reporter: Joseph Lynch
>Assignee: Vinay Chella
>Priority: Normal
> Attachments: i-03341e1c52de6ea3e-after-queue-change.svg, 
> i-07cd92e844d66d801-after-queue-bound.svg, i-07cd92e844d66d801-hint-play.svg, 
> i-07cd92e844d66d801-uninlined-with-jvm-methods.svg, ttop.txt
>
>
> *Setup:*
>  * Cassandra: 12 (2*6) node i3.xlarge AWS instance (4 cpu cores, 30GB ram) 
> running cassandra trunk off of jasobrown/14503 jdd7ec5a2 (Jason's patched 
> internode messaging branch) vs the same footprint running 3.0.17
>  * Two datacenters with 100ms latency between them
>  * No compression, encryption, or coalescing turned on
> *Test #1:*
> ndbench sent 1.5k QPS at a coordinator level to one datacenter (RF=3*2 = 6 so 
> 3k global replica QPS) of 4kb single partition BATCH mutations at LOCAL_ONE. 
> This represents about 250 QPS per coordinator in the first datacenter or 60 
> QPS per core. The goal was to observe P99 write and read latencies under 
> various QPS.
> *Result:*
> The good news is since the CASSANDRA-14503 changes, instead of keeping the 
> mutations on heap we put the message into hints instead and don't run out of 
> memory. The bad news is that the {{MessagingService-NettyOutbound-Thread's}} 
> would occasionally enter a degraded state where they would just spin on a 
> core. I've attached flame graphs showing the CPU state as [~jasobrown] 
> applied fixes to the {{OutboundMessagingConnection}} class.
>  *Follow Ups:*
> [~jasobrown] has committed a number of fixes onto his 
> {{jasobrown/14503-collab}} branch including:
> 1. Limiting the amount of time spent dequeuing messages if they are expired 
> (previously, if messages entered the queue faster than we could dequeue them, 
> we'd just loop infinitely on the consumer side)
> 2. Don't call {{dequeueMessages}} from within {{dequeueMessages}} created 
> callbacks.
> We're continuing to use CPU flamegraphs to figure out where we're looping and 
> fixing bugs as we find them.
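
The first follow-up (bounding the time spent dropping expired messages) can be sketched roughly as below. This is not the code from {{OutboundMessagingConnection}} or the {{jasobrown/14503-collab}} branch; the `Message` class, the `dropExpired` method, and the nanosecond budget parameter are all hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class BoundedExpiryDrainSketch {
    /** Hypothetical message that only carries its expiry time in nanoseconds. */
    static final class Message {
        final long expiresAtNanos;
        Message(long expiresAtNanos) { this.expiresAtNanos = expiresAtNanos; }
        boolean isExpired(long nowNanos) { return nowNanos - expiresAtNanos >= 0; }
    }

    /**
     * Drops expired messages from the head of the queue, but gives up once
     * maxNanos have elapsed, so a producer that enqueues faster than we can
     * dequeue cannot keep the consumer looping here forever.
     * Returns the number of messages dropped.
     */
    static int dropExpired(Queue<Message> queue, long maxNanos) {
        long start = System.nanoTime();
        int dropped = 0;
        Message head;
        while ((head = queue.peek()) != null) {
            long now = System.nanoTime();
            if (now - start >= maxNanos) break; // time budget exhausted
            if (!head.isExpired(now)) break;    // head is still live, stop dropping
            queue.poll();
            dropped++;
        }
        return dropped;
    }

    public static void main(String[] args) {
        Queue<Message> queue = new ArrayDeque<>();
        long now = System.nanoTime();
        queue.add(new Message(now - 1_000_000L));         // already expired
        queue.add(new Message(now + 3_600_000_000_000L)); // expires in about an hour
        System.out.println("dropped " + dropExpired(queue, 1_000_000_000L)
                + ", remaining " + queue.size());
    }
}
```

The design point is only that the loop has two exits: a live message at the head, or an exhausted time budget.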






[jira] [Created] (CASSANDRA-15181) Ensure Nodes can Start and Stop

2019-06-24 Thread Joseph Lynch (JIRA)
Joseph Lynch created CASSANDRA-15181:


 Summary: Ensure Nodes can Start and Stop
 Key: CASSANDRA-15181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15181
 Project: Cassandra
  Issue Type: Sub-task
  Components: Legacy/Streaming and Messaging, Test/benchmark
Reporter: Joseph Lynch
Assignee: Vinay Chella


Let's load a cluster up with data and start killing nodes. We can do hard 
failures (node terminations) and soft failures (process kills). We plan to 
observe the following:

* Can nodes successfully bootstrap?
* How long does it take to bootstrap?
* What are the effects of TLS on and off (e.g. on stream time)?
* Are hints properly played after a node restart?
* Do nodes properly shut down and start back up?






[jira] [Updated] (CASSANDRA-15181) Ensure Nodes can Start and Stop

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15181:
-
 Complexity: Normal
   Priority: High  (was: Normal)
Change Category: Operability
 Status: Open  (was: Triage Needed)

> Ensure Nodes can Start and Stop
> ---
>
> Key: CASSANDRA-15181
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15181
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Legacy/Streaming and Messaging, Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Vinay Chella
>Priority: High
>
> Let's load a cluster up with data and start killing nodes. We can do hard 
> failures (node terminations) and soft failures (process kills). We plan to 
> observe the following:
> * Can nodes successfully bootstrap?
> * How long does it take to bootstrap?
> * What are the effects of TLS on and off (e.g. on stream time)?
> * Are hints properly played after a node restart?
> * Do nodes properly shut down and start back up?






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871656#comment-16871656
 ] 

Joseph Lynch commented on CASSANDRA-15175:
--

I have completed the {{LOCAL_ONE}} scaling test. I have summarized the test in 
the following graph:
 !trunk_vs_30x_summary.png!

As we can see, even with the extra TLS CPU requirements, trunk was able to 
significantly outperform the status quo 3.0.x cluster across the load spectrum 
for this consistency level.

I am proceeding with other consistency levels and gathering additional data.

So far I have noticed the following issues during these tests which I will 
gather more data on and follow up with in other tickets (and edit here with 
ticket numbers once I have them):
 # JDK Netty TLS appears significantly more CPU intensive than the previous 
Java Sockets implementation. [~norman] is taking a look from the Netty side and 
we can follow up and make sure we're not creating improperly (looking at the 
flamegraphs it looks like we may have a buffer sizing issue)
 # When a node was terminated and replaced, the new node appeared to sit for a 
very long time waiting for schema pulls to complete (I think it was waiting on 
the node it was replacing but I haven't fully debugged this).
 # Nodetool netstats doesn't report progress properly for the file count 
(percent, single file, and size still seem right; this is probably 
CASSANDRA-14192)
 # When we re-load NTS keyspaces from disk we throw warnings about "Ignoring 
Unrecognized strategy option" for datacenters that we are not in
 # After a node shuts down there is a burst of re-connections on the urgent 
port prior to actual shutdown (I _think_ this is pre-existing and I'm just 
noticing it because of the new logging)

Also while setting up the {{LOCAL_QUORUM}} test I found the following trying to 
understand why I was seeing a higher number of blocking read repairs on the 
trunk cluster than the 30x cluster:
 # When I stop and start nodes, it appears that hints may not always playback. 
In particular the high blocking read repairs were coming from neighbors of the 
node I had restarted a few times to test tcnative openssl integration. I 
checked the neighbor's hints directories and sure enough there were pending 
hints there that were not playing at all (they had been there for over 8 hours 
and still not played).
 # Repair appears to fail on the default system_traces when run with {{-full}} 
and {{-os}}:
{noformat}
cass-perf-trunk-14746--useast1c-i-00a32889835534b75:~$ nodetool repair -os 
-full -local
[2019-06-23 23:29:30,210] Starting repair command #1 
(bfbc7ba0-960e-11e9-b238-77fd1c2e9b1c), repairing keyspace perftest with repair 
options (parallelism: parallel, primary range: false, incremental: false, job 
threads: 1, ColumnFamilies: [], dataCenters: [us-east-1], hosts: [], 
previewKind: NONE, # of ranges: 6, pull repair: false, force repair: false, 
optimise streams: true)
[2019-06-23 23:52:08,248] Repair session c0573500-960e-11e9-b238-77fd1c2e9b1c 
for range [(384307168575030403,384307170010857891], 
(192153585909716729,384307168575030403]] finished (progress: 10%)
[2019-06-23 23:52:26,393] Repair session c0307320-960e-11e9-b238-77fd1c2e9b1c 
for range [(1808575567,192153584473889241], 
(192153584473889241,192153585909716729]] finished (progress: 20%)
[2019-06-23 23:52:28,298] Repair session c059f420-960e-11e9-b238-77fd1c2e9b1c 
for range [(576460752676171565,576460754111999053], 
(384307170010857891,576460752676171565]] finished (progress: 30%)
[2019-06-23 23:52:28,302] Repair completed successfully
[2019-06-23 23:52:28,310] Repair command #1 finished in 22 minutes 58 seconds
[2019-06-23 23:52:28,331] Replication factor is 1. No repair is needed for 
keyspace 'system_auth'
[2019-06-23 23:52:28,350] Starting repair command #2 
(f52c1c70-9611-11e9-b238-77fd1c2e9b1c), repairing keyspace system_traces with 
repair options (parallelism: parallel, primary range: false, incremental: 
false, job threads: 1, ColumnFamilies: [], dataCenters: [us-east-1], hosts: [], 
previewKind: NONE, # of ranges: 2, pull repair: false, force repair: false, 
optimise streams: true)
[2019-06-23 23:52:28,351] Repair command #2 failed with error Endpoints can not 
be empty
[2019-06-23 23:52:28,351] Repair command #2 finished with error
error: Repair job has failed with the error message: [2019-06-23 23:52:28,351] 
Repair command #2 failed with error Endpoints can not be empty. Check the logs 
on the repair participants for further details
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: 
[2019-06-23 23:52:28,351] Repair command #2 failed with error Endpoints can not 
be empty. Check the logs on the repair participants for further details
at 
org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:122)
at 
org.apache.cassandra.utils.progress.jmx.JM

[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871684#comment-16871684
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

[~jolynch] one question... when using JDK TLS do you see any errors at all, or 
do you just see more CPU usage and that's it?

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> trunk_14400cRPS-14400cWPS.svg, trunk_187000cRPS-14400cWPS.svg, 
> trunk_187kcRPS_14kcWPS.png, trunk_22000cRPS-14400cWPS-jdk.svg, 
> trunk_22000cRPS-14400cWPS-openssl.svg, trunk_220kcRPS_14kcWPS.png, 
> trunk_252kcRPS-14kcWPS.png, trunk_93500cRPS-14400cWPS.svg, 
> trunk_vs_30x_125kcRPS_14kcWPS.png, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19 @d7d00036|
> |Candidate|trunk @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15175:
-
Attachment: trunk_vs_30x_LQ_14kcRPS_14kcWPS.png




[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15175:
-
Attachment: trunk_LQ_14400cRPS-14400cWPS.svg




[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15175:
-
Attachment: odd_netty_jdk_tls_cpu_usage.png




[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15175:
-
Attachment: ShortbufferExceptions.png




[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871714#comment-16871714
 ] 

Joseph Lynch commented on CASSANDRA-15175:
--

{quote}
[~jolynch] one question... when using JDK TLS do you see any errors at all, or 
do you just see more CPU usage and that's it?
{quote}

I don't see any errors in our logs, but we are spending more CPU handling 
{{ShortBufferExceptions}} internally in Netty than may make sense. I took the 
following screenshots from the [^trunk_LQ_14400cRPS-14400cWPS.svg] flamegraph.
 !odd_netty_jdk_tls_cpu_usage.png!

!ShortbufferExceptions.png!

Other than the flamegraph and degraded latency in {{LOCAL_QUORUM}} mode (where 
C* nodes actually have to talk to each other through the internode messaging 
framework), things appear about the same (no errors that I can see).




[jira] [Comment Edited] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871656#comment-16871656
 ] 

Joseph Lynch edited comment on CASSANDRA-15175 at 6/24/19 7:23 PM:
---

I have completed the {{LOCAL_ONE}} scaling test. I have summarized the test in 
the following graph:
 !trunk_vs_30x_summary.png!

As we can see, even with the extra TLS CPU requirements, trunk was able to 
significantly outperform the status quo 3.0.x cluster across the load spectrum 
for this consistency level.

I am proceeding with other consistency levels and gathering additional data.

So far I have noticed the following issues during these tests which I will 
gather more data on and follow up with in other tickets (and edit here with 
ticket numbers once I have them):
 # JDK Netty TLS appears significantly more CPU intensive than the previous 
Java Sockets implementation. [~norman] is taking a look from the Netty side and 
we can follow up and make sure we're not creating improperly (looking at the 
flamegraphs it looks like we may have a buffer sizing issue)
 # When a node was terminated and replaced, the new node appeared to sit for a 
very long time waiting for schema pulls to complete (I think it was waiting on 
the node it was replacing but I haven't fully debugged this).
 # Nodetool netstats doesn't report progress properly for the file count 
(percent, single file, and size still seem right; this is probably 
CASSANDRA-14192)
 # When we re-load NTS keyspaces from disk we throw warnings about "Ignoring 
Unrecognized strategy option" for datacenters that we are not in
 # After a node shuts down there is a burst of re-connections on the urgent 
port prior to actual shutdown (I _think_ this is pre-existing and I'm just 
noticing it because of the new logging)

Also while setting up the {{LOCAL_QUORUM}} test I found the following trying to 
understand why I was seeing a higher number of blocking read repairs on the 
trunk cluster than the 30x cluster:
 # When I stop and start nodes, it appears that hints may not always play back. 
In particular the high blocking read repairs were coming from neighbors of the 
node I had restarted a few times to test tcnative openssl integration. I 
checked the neighbor's hints directories and sure enough there were pending 
hints there that were not playing at all (they had been there for over 8 hours 
and still not played).
 # -Repair appears to fail on the default system_traces when run with {{-full}} 
and {{-os}}- (Edit: this is operator error, we shouldn't pass -local to a 
SimpleStrategy keyspace)
{noformat}
cass-perf-trunk-14746--useast1c-i-00a32889835534b75:~$ nodetool repair -os 
-full -local
[2019-06-23 23:29:30,210] Starting repair command #1 
(bfbc7ba0-960e-11e9-b238-77fd1c2e9b1c), repairing keyspace perftest with repair 
options (parallelism: parallel, primary range: false, incremental: false, job 
threads: 1, ColumnFamilies: [], dataCenters: [us-east-1], hosts: [], 
previewKind: NONE, # of ranges: 6, pull repair: false, force repair: false, 
optimise streams: true)
[2019-06-23 23:52:08,248] Repair session c0573500-960e-11e9-b238-77fd1c2e9b1c 
for range [(384307168575030403,384307170010857891], 
(192153585909716729,384307168575030403]] finished (progress: 10%)
[2019-06-23 23:52:26,393] Repair session c0307320-960e-11e9-b238-77fd1c2e9b1c 
for range [(1808575567,192153584473889241], 
(192153584473889241,192153585909716729]] finished (progress: 20%)
[2019-06-23 23:52:28,298] Repair session c059f420-960e-11e9-b238-77fd1c2e9b1c 
for range [(576460752676171565,576460754111999053], 
(384307170010857891,576460752676171565]] finished (progress: 30%)
[2019-06-23 23:52:28,302] Repair completed successfully
[2019-06-23 23:52:28,310] Repair command #1 finished in 22 minutes 58 seconds
[2019-06-23 23:52:28,331] Replication factor is 1. No repair is needed for 
keyspace 'system_auth'
[2019-06-23 23:52:28,350] Starting repair command #2 
(f52c1c70-9611-11e9-b238-77fd1c2e9b1c), repairing keyspace system_traces with 
repair options (parallelism: parallel, primary range: false, incremental: 
false, job threads: 1, ColumnFamilies: [], dataCenters: [us-east-1], hosts: [], 
previewKind: NONE, # of ranges: 2, pull repair: false, force repair: false, 
optimise streams: true)
[2019-06-23 23:52:28,351] Repair command #2 failed with error Endpoints can not 
be empty
[2019-06-23 23:52:28,351] Repair command #2 finished with error
error: Repair job has failed with the error message: [2019-06-23 23:52:28,351] 
Repair command #2 failed with error Endpoints can not be empty. Check the logs 
on the repair participants for further details
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: 
[2019-06-23 23:52:28,351] Repair command #2 failed with error Endpoints can not 
be empty. Check the logs on the repair participants for further details
 

[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871727#comment-16871727
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

[~jolynch] yeah that is why I ask... From the OpenJDK code it seems like 
`ShortBufferException` should really "never happen". That is why I ask about 
errors.
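
For reference, at the javax.crypto level a ShortBufferException means the caller handed Cipher an output buffer smaller than the result it has to write. The sketch below reproduces that with plain JCE calls only. It is not the Netty or OpenJDK TLS code path, and the class name ShortBufferDemo and its helper methods are hypothetical, made up for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.ShortBufferException;
import javax.crypto.spec.GCMParameterSpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical demo class; not part of Cassandra, Netty, or the JDK.
public class ShortBufferDemo {

    // Builds an AES-GCM cipher in encrypt mode with a throwaway key and IV.
    public static Cipher newGcmCipher() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            SecretKey key = kg.generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Output size the cipher reports for a plaintext of the given length
    // (ciphertext plus authentication tag).
    public static int requiredSize(int plaintextLen) {
        return newGcmCipher().getOutputSize(plaintextLen);
    }

    // Returns true if encrypting plaintextLen bytes into an output array of
    // outLen bytes is rejected with ShortBufferException.
    public static boolean overflows(int plaintextLen, int outLen) {
        Cipher c = newGcmCipher();
        byte[] in = new byte[plaintextLen];
        byte[] out = new byte[outLen];
        try {
            c.doFinal(in, 0, in.length, out, 0);
            return false;
        } catch (ShortBufferException e) {
            return true;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int needed = requiredSize(16);
        System.out.println("output size reported for 16 bytes: " + needed);
        System.out.println("16-byte output buffer overflows: " + overflows(16, 16));
        System.out.println("correctly sized buffer overflows: " + overflows(16, needed));
    }
}
```

Sizing the output from getOutputSize avoids the exception entirely, which is why seeing it on a hot path suggests a buffer sizing problem rather than a normal condition.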




[jira] [Commented] (CASSANDRA-10190) Python 3 support for cqlsh

2019-06-24 Thread Stefan Podkowinski (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871740#comment-16871740
 ] 

Stefan Podkowinski commented on CASSANDRA-10190:


I was just trying to catch up, so please correct me if I'm wrong. So we're 
1) adding Python 3 support for cqlshlib and tests, 2) fixing tests in 
cqlshlib/test but not actually running them on CI (that will be done in 
CASSANDRA-14990), and 3) fixing and enabling the remaining cqlsh dtest 
leftovers from CASSANDRA-14298 that depend on cqlshlib.

Current branches are 
https://github.com/ptbannister/cassandra/tree/10190-rebase-20190609
https://github.com/ptbannister/cassandra-dtest/commits/cqlshlib6-rebase-20190322



> Python 3 support for cqlsh
> --
>
> Key: CASSANDRA-10190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Tools
>Reporter: Andrew Pennebaker
>Assignee: Patrick Bannister
>Priority: Normal
>  Labels: cqlsh
> Attachments: coverage_notes.txt
>
>
> Users who operate in a Python 3 environment may have trouble launching cqlsh. 
> Could we please update cqlsh's syntax to run in Python 3?
> As a workaround, users can setup pyenv, and cd to a directory with a 
> .python-version containing "2.7". But it would be nice if cqlsh supported 
> modern Python versions out of the box.






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871774#comment-16871774
 ] 

Joseph Lynch commented on CASSANDRA-15175:
--

[~norman] yeah sadly I don't see any exceptions in the C* code that correlate 
with that exception, but I can try enabling more verbose logging on particular 
netty modules if you think it will help? Also fwiw I think that the 3.0 branch 
in this test is using {{TLS_RSA_WITH_AES_128_CBC_SHA}} as the default cipher 
instead of {{TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384}}. I don't really know 
what I'm talking about when it comes to TLS cipher suites but it appears from 
my reading of https://bugs.openjdk.java.net/browse/JDK-8046943 that {{GCM}} is 
very slow in Java 8 (apparently fixed in Java 9). That might explain why we're 
spending so much CPU time in GaloisCountMode (which I assume is GCM). I can try 
using {{TLS_RSA_WITH_AES_256_CBC_SHA}} with both as a fair test?
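
To see which suites a given JVM would offer by default, and what is left once GCM suites are excluded, a minimal sketch using only the standard JSSE API could look like the following. The class name CipherSuiteCheck and its methods are hypothetical, and matching on "_GCM_" in the suite name is a naming heuristic, not an official classification.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of Cassandra or Netty.
public class CipherSuiteCheck {

    // Name-based heuristic: standard suite names for AES-GCM contain "_GCM_".
    public static boolean isGcm(String suite) {
        return suite.contains("_GCM_");
    }

    // Keeps only the non-GCM suites, preserving the original preference order.
    public static List<String> withoutGcm(List<String> suites) {
        List<String> kept = new ArrayList<>();
        for (String s : suites) {
            if (!isGcm(s)) {
                kept.add(s);
            }
        }
        return kept;
    }

    // Cipher suites this JVM enables by default, in preference order.
    public static List<String> jvmDefaults() {
        try {
            SSLParameters p = SSLContext.getDefault().getDefaultSSLParameters();
            return Arrays.asList(p.getCipherSuites());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : jvmDefaults()) {
            System.out.println((isGcm(s) ? "GCM      " : "non-GCM  ") + s);
        }
    }
}
```

Running main on both the 3.0 and trunk hosts shows whether the two JVMs would prefer the same suite; pinning the same explicit suite on both clusters removes that variable from the comparison.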




[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871778#comment-16871778
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

[~jolynch] Yes please use a non GCM cipher and report back :)




[jira] [Comment Edited] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871778#comment-16871778
 ] 

Norman Maurer edited comment on CASSANDRA-15175 at 6/24/19 9:10 PM:


[~jolynch] Yes, please use a non-GCM cipher and report back :) And please ensure 
you use the same ciphers when comparing 3.x vs trunk, as otherwise there is 
really no way to compare these at all (from my understanding you may be using 
different ciphers).


was (Author: norman):
[~jolynch] Yes please use a non GCM cipher and report back :)







[jira] [Comment Edited] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871656#comment-16871656
 ] 

Joseph Lynch edited comment on CASSANDRA-15175 at 6/24/19 11:22 PM:


I have completed the {{LOCAL_ONE}} scaling test. I have summarized the test in 
the following graph:
 !trunk_vs_30x_summary.png!

As we can see, even with the extra TLS CPU requirements, trunk was able to 
significantly outperform the status quo 3.0.x cluster across the load spectrum 
for this consistency level.

I am proceeding with other consistency levels and gathering additional data.

So far I have noticed the following issues during these tests which I will 
gather more data on and follow up with in other tickets (and edit here with 
ticket numbers once I have them):
 # JDK Netty TLS appears significantly more CPU intensive than the previous 
Java Sockets implementation. [~norman] is taking a look from the Netty side and 
we can follow up and make sure we're not creating improperly (looking at the 
flamegraphs it looks like we may have a buffer sizing issue)
 # When a node was terminated and replaced, the new node appeared to sit for a 
very long time waiting for schema pulls to complete (I think it was waiting on 
the node it was replacing but I haven't fully debugged this).
 # Nodetool netstats doesn't report progress properly for the file count 
(percent, single file, and size still seem right; this is probably 
CASSANDRA-14192).
 # When we re-load NTS keyspaces from disk we throw warnings about "Ignoring 
Unrecognized strategy option" for datacenters that we are not in.
 # After a node shuts down there is a burst of re-connections on the urgent 
port prior to actual shutdown (I _think_ this is pre-existing and I'm just 
noticing it because of the new logging)

Also, while setting up the {{LOCAL_QUORUM}} test, I found the following while 
trying to understand why I was seeing a higher number of blocking read repairs 
on the trunk cluster than on the 30x cluster:
 # -When I stop and start nodes, it appears that hints may not always play back. 
In particular the high blocking read repairs were coming from neighbors of the 
node I had restarted a few times to test tcnative openssl integration. I 
checked the neighbors' hints directories and sure enough there were pending 
hints there that were not playing at all (they had been there for over 8 hours 
and still not played).- (Edit: This is a bad default. The default 
hinted_handoff_throttle_in_kb is 1024 but it is divided by the number of nodes 
in the cluster. In this case the cluster size of 192 meant we were playing 
hints at a rate of ~5 KB/s, which meant if we were down for even a few minutes 
we would essentially lose those mutations before the 24 hour hint expiry window)
 # -Repair appears to fail on the default system_traces when run with {{-full}} 
and {{-os}}- (Edit: this is operator error, we shouldn't pass -local to a 
SimpleStrategy keyspace)
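
The hint throttling arithmetic from the first item above can be sketched as follows. This is a rough model rather than Cassandra's own code: it assumes, as the comment states, that the configured hinted_handoff_throttle_in_kb is divided evenly by the cluster size, and it reads the throttle as kilobytes per second. The 1 GiB backlog is a made-up figure for illustration, and both function names are hypothetical.

```python
def effective_throttle_kb_per_s(throttle_in_kb: float, cluster_size: int) -> float:
    """Configured throttle divided by the cluster size (assumed model from the comment)."""
    return throttle_in_kb / cluster_size


def replay_hours(backlog_kb: float, rate_kb_per_s: float) -> float:
    """Hours needed to replay a hint backlog at a fixed rate."""
    return backlog_kb / rate_kb_per_s / 3600


rate = effective_throttle_kb_per_s(1024, 192)  # default throttle, 192 nodes
hours = replay_hours(1024 * 1024, rate)        # hypothetical 1 GiB hint backlog
print(f"{rate:.2f} KB/s, {hours:.1f} hours to replay")
```

Under these assumptions the effective rate is a little over 5 KB/s, and a 1 GiB backlog would take longer to replay than the 24 hour hint expiry window.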
{noformat}
cass-perf-trunk-14746--useast1c-i-00a32889835534b75:~$ nodetool repair -os 
-full -local
[2019-06-23 23:29:30,210] Starting repair command #1 
(bfbc7ba0-960e-11e9-b238-77fd1c2e9b1c), repairing keyspace perftest with repair 
options (parallelism: parallel, primary range: false, incremental: false, job 
threads: 1, ColumnFamilies: [], dataCenters: [us-east-1], hosts: [], 
previewKind: NONE, # of ranges: 6, pull repair: false, force repair: false, 
optimise streams: true)
[2019-06-23 23:52:08,248] Repair session c0573500-960e-11e9-b238-77fd1c2e9b1c 
for range [(384307168575030403,384307170010857891], 
(192153585909716729,384307168575030403]] finished (progress: 10%)
[2019-06-23 23:52:26,393] Repair session c0307320-960e-11e9-b238-77fd1c2e9b1c 
for range [(1808575567,192153584473889241], 
(192153584473889241,192153585909716729]] finished (progress: 20%)
[2019-06-23 23:52:28,298] Repair session c059f420-960e-11e9-b238-77fd1c2e9b1c 
for range [(576460752676171565,576460754111999053], 
(384307170010857891,576460752676171565]] finished (progress: 30%)
[2019-06-23 23:52:28,302] Repair completed successfully
[2019-06-23 23:52:28,310] Repair command #1 finished in 22 minutes 58 seconds
[2019-06-23 23:52:28,331] Replication factor is 1. No repair is needed for 
keyspace 'system_auth'
[2019-06-23 23:52:28,350] Starting repair command #2 
(f52c1c70-9611-11e9-b238-77fd1c2e9b1c), repairing keyspace system_traces with 
repair options (parallelism: parallel, primary range: false, incremental: 
false, job threads: 1, ColumnFamilies: [], dataCenters: [us-east-1], hosts: [], 
previewKind: NONE, # of ranges: 2, pull repair: false, force repair: false, 
optimise streams: true)
[2019-06-23 23:52:28,351] Repair command #2 failed with error Endpoints can not 
be empty
[2019-06-23 23:52:28,351] Repair command #2 finished with error
error: Repair job has failed with the error message: [2019-06-23 23:52:28,351] 
Repair command #2

[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15175:
-
Attachment: 30x_LQ_21600cRPS-14400cWPS.svg
trunk_LQ_21600cRPS-14400cWPS.svg

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> 30x_LQ_21600cRPS-14400cWPS.svg, ShortbufferExceptions.png, 
> odd_netty_jdk_tls_cpu_usage.png, trunk_14400cRPS-14400cWPS.svg, 
> trunk_187000cRPS-14400cWPS.svg, trunk_187kcRPS_14kcWPS.png, 
> trunk_22000cRPS-14400cWPS-jdk.svg, trunk_22000cRPS-14400cWPS-openssl.svg, 
> trunk_220kcRPS_14kcWPS.png, trunk_252kcRPS-14kcWPS.png, 
> trunk_93500cRPS-14400cWPS.svg, trunk_LQ_14400cRPS-14400cWPS.svg, 
> trunk_LQ_21600cRPS-14400cWPS.svg, trunk_vs_30x_125kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kRPS_14kcWPS_load.png, trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_14kcRPS_14kcWPS.png, trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19 @d7d00036|
> |Candidate|trunk @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871894#comment-16871894
 ] 

Joseph Lynch commented on CASSANDRA-15175:
--

I switched to the same cipher that 3.0 is running and saw on-CPU time drop to 
12.9% (compared to 3.0's 8.5%). This is a significant improvement but still 
not quite equal. Interestingly, with that improvement average latency is now 
on par with 3.0 in the local quorum test. 

 [^trunk_LQ_21600cRPS-14400cWPS.svg]  [^30x_LQ_21600cRPS-14400cWPS.svg] 

I'm going to finish off this round of jdk TLS testing and then switch to 
tcnative tomorrow and test that.







[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15175:
-
Attachment: trunk_vs_30x_LQ_21kcRPS_14kcWPS.png

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> 30x_LQ_21600cRPS-14400cWPS.svg, ShortbufferExceptions.png, 
> odd_netty_jdk_tls_cpu_usage.png, trunk_14400cRPS-14400cWPS.svg, 
> trunk_187000cRPS-14400cWPS.svg, trunk_187kcRPS_14kcWPS.png, 
> trunk_22000cRPS-14400cWPS-jdk.svg, trunk_22000cRPS-14400cWPS-openssl.svg, 
> trunk_220kcRPS_14kcWPS.png, trunk_252kcRPS-14kcWPS.png, 
> trunk_93500cRPS-14400cWPS.svg, trunk_LQ_14400cRPS-14400cWPS.svg, 
> trunk_LQ_21600cRPS-14400cWPS.svg, trunk_vs_30x_125kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kRPS_14kcWPS_load.png, trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_14kcRPS_14kcWPS.png, trunk_vs_30x_LQ_21kcRPS_14kcWPS.png, 
> trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19 @d7d00036|
> |Candidate|trunk @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Updated] (CASSANDRA-8099) Refactor and modernize the storage engine

2019-06-24 Thread Sumin Byeon (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumin Byeon updated CASSANDRA-8099:
---
Description: 
The current storage engine (which for this ticket I'll loosely define as "the 
code implementing the read/write path") is suffering from old age. One of the 
main problems is that the only structure it deals with is the cell, which 
completely ignores the higher-level CQL structure that groups cells into 
(CQL) rows.

This leads to many inefficiencies, like the fact that during reads we have to 
group cells multiple times (to count on the replica, then to count on the 
coordinator, then to produce the CQL resultset) because we forget about the 
grouping right away each time (so lots of useless cell name comparisons in 
particular). But beyond inefficiencies, having to manually recreate the CQL 
structure every time we need it for something is hindering new features and 
makes the code more complex than it should be.

Said storage engine also has tons of technical debt. To pick an example, the 
fact that during range queries we update {{SliceQueryFilter.count}} is pretty 
hacky and error prone. Or the overly complex ways {{AbstractQueryPager}} has to 
go into to simply "remove the last query result".

So I want to bite the bullet and modernize this storage engine. I propose to do 
2 main things:
 # Make the storage engine more aware of the CQL structure. In practice, 
instead of having partitions be a simple iterable map of cells, it should be an 
iterable list of rows (each being itself composed of per-column cells, though 
obviously not exactly the same kind of cell we have today).
 # Make the engine more iterative. What I mean here is that in the read path, 
we end up reading all cells in memory (we put them in a ColumnFamily object), 
but there is really no reason to. If instead we were working with iterators all 
the way through, we could get to a point where we're basically transferring 
data from disk to the network, and we should be able to reduce GC substantially.
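
The two proposals above can be illustrated with a small sketch. It is written in Python rather than Cassandra's Java and does not reflect the actual patch; the cell layout and the names cells_from_disk and iter_rows are hypothetical. The idea shown is only that a sorted stream of cells can be grouped into rows once, lazily, without first collecting every cell in memory.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical cell layout: (row key, column name, value), sorted by row key.
Cell = Tuple[str, str, int]


def cells_from_disk() -> Iterator[Cell]:
    """Stand-in for an on-disk reader that yields cells one at a time."""
    yield ("row1", "a", 1)
    yield ("row1", "b", 2)
    yield ("row2", "a", 3)


def iter_rows(cells: Iterable[Cell]) -> Iterator[Tuple[str, Dict[str, int]]]:
    """Group consecutive cells into rows, holding only one row in memory at a time."""
    for row_key, group in groupby(cells, key=itemgetter(0)):
        yield row_key, {column: value for _, column, value in group}


for key, columns in iter_rows(cells_from_disk()):
    print(key, columns)
```

Because both functions are generators, a consumer such as a network writer could pull one row at a time instead of waiting for the whole partition to be read.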

Please note that such a refactor should provide some performance improvements 
right off the bat, but that is not its primary goal. Its primary goal is to 
simplify the storage engine and add abstractions that are better suited to 
further optimizations.

  was:
The current storage engine (which for this ticket I'll loosely define as "the 
code implementing the read/write path") is suffering from old age. One of the 
main problem is that the only structure it deals with is the cell, which 
completely ignores the more high level CQL structure that groups cell into 
(CQL) rows.

This leads to many inefficiencies, like the fact that during a reads we have to 
group cells multiple times (to count on replica, then to count on the 
coordinator, then to produce the CQL resultset) because we forget about the 
grouping right away each time (so lots of useless cell names comparisons in 
particular). But outside inefficiencies, having to manually recreate the CQL 
structure every time we need it for something is hindering new features and 
makes the code more complex that it should be.

Said storage engine also has tons of technical debt. To pick an example, the 
fact that during range queries we update {{SliceQueryFilter.count}} is pretty 
hacky and error prone. Or the overly complex ways {{AbstractQueryPager}} has to 
go into to simply "remove the last query result".

So I want to bite the bullet and modernize this storage engine. I propose to do 
2 main things:
# Make the storage engine more aware of the CQL structure. In practice, instead 
of having partitions be a simple iterable map of cells, it should be an 
iterable list of row (each being itself composed of per-column cells, though 
obviously not exactly the same kind of cell we have today).
# Make the engine more iterative. What I mean here is that in the read path, we 
end up reading all cells in memory (we put them in a ColumnFamily object), but 
there is really no reason to. If instead we were working with iterators all the 
way through, we could get to a point where we're basically transferring data 
from disk to the network, and we should be able to reduce GC substantially.

Please note that such refactor should provide some performance improvements 
right off the bat but it's not it's primary goal either. It's primary goal is 
to simplify the storage engine and adds abstraction that are better suited to 
further optimizations.


> Refactor and modernize the storage engine
> -
>
> Key: CASSANDRA-8099
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0 alpha 1
>
> Attachments: 

[jira] [Created] (CASSANDRA-15182) cqlsh utf_8.py, line 16, in decode return codecs.utf_8_decode(input, errors, True):1:'ascii' codec can't encode character u'\u9ed1' in position 60: ordina

2019-06-24 Thread gloCalHelp.com (JIRA)
gloCalHelp.com created CASSANDRA-15182:
--

 Summary: cqlsh utf_8.py, line 16, in decode return 
codecs.utf_8_decode(input, errors, True):1:'ascii' codec can't encode 
character u'\u9ed1' in position 60: ordinal not in range(128)
 Key: CASSANDRA-15182
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15182
 Project: Cassandra
  Issue Type: Bug
  Components: CQL/Interpreter
Reporter: gloCalHelp.com


I use cqlsh 5.0.1 with Cassandra 3.11.3 and Python 2.7.13 on CentOS 6.9.

When I run this CQL command: bin/cqlsh hadoop4 -u dba -p LinJiaXin858 --debug  
-e "INSERT INTO HYGL_JCSJ.hyjg_ods_yy_gps_novar3 
(clcph,dwsj,bc,blbs,cjbzh,ckryid,clid,clmc,ddfx,ddrq,fwj,gd,gdjd,gdwd,jsdlc,jszjl,jxzjl,sjid,sjsfzh,sjxm,sssd,xlmc)
 VALUES 
('黑A00888D','2019-06-2509:57:19',0,,'',,,'379-7038',1434,'2019-06-25',275,0,126723690,45726990
 ,796.0,2205,746,'null','null','null',0,'379');"

I get the following error message:

Using CQL driver: 
Using connect timeout: 5 seconds
Using 'utf-8' encoding
Using ssl: False
Traceback (most recent call last):
 File "/home/cassandra/cas3.11.3/bin/cqlsh.py", line 926, in onecmd
 self.handle_statement(st, statementtext)
 File "/home/cassandra/cas3.11.3/bin/cqlsh.py", line 966, in handle_statement
 return self.perform_statement(cqlruleset.cql_extract_orig(tokens, srcstr))
 File "/home/cassandra/cas3.11.3/bin/cqlsh.py", line 1000, in perform_statement
 success, future = self.perform_simple_statement(stmt)
 File "/home/cassandra/cas3.11.3/bin/cqlsh.py", line 1053, in 
perform_simple_statement
 self.printerr(unicode(err.__class__.__name__) + u": " + 
err.message.decode(encoding='utf-8'))
 File "/usr/local/python27/lib/python2.7/encodings/utf_8.py", line 16, in decode
 return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u9ed1' in position 
60: ordinal not in range(128)

 

This issue seems different from the SELECT command issue in 
https://issues.apache.org/jira/browse/CASSANDRA-10875 

and from the other approach of adding "-*- coding: utf-8 -*-" at the head of 
cqlsh.py. Can anyone help me with this quickly?
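
The traceback above is consistent with a known Python 2 behaviour: calling .decode() on a value that is already a unicode object makes the interpreter first encode it with the ASCII codec, which fails on non-ASCII characters such as u'\u9ed1'. The sketch below emulates that step in Python 3 and shows one possible guard. It is an illustration only, not a patch to cqlsh; the helper names py2_style_decode and to_text are hypothetical, and whether err.message is really a unicode object here is an assumption drawn from the traceback.

```python
def py2_style_decode(message, encoding="utf-8"):
    """Emulate Python 2's unicode.decode(): text is implicitly ASCII-encoded first."""
    if isinstance(message, str):
        # Raises UnicodeEncodeError for non-ASCII text, as in the traceback.
        message = message.encode("ascii")
    return message.decode(encoding)


def to_text(message, encoding="utf-8"):
    """Decode only when given bytes; pass text through unchanged."""
    if isinstance(message, bytes):
        return message.decode(encoding, errors="replace")
    return message


try:
    py2_style_decode("黑A00888D")
except UnicodeEncodeError as exc:
    print("emulated failure:", exc.reason)

print(to_text("黑A00888D"))
print(to_text("黑A00888D".encode("utf-8")))
```

The guard works for both byte strings and text, so an error message containing non-ASCII characters can be printed either way.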

 






svn commit: r1862034 - /cassandra/site/publish/download/index.html

2019-06-24 Thread mshuler
Author: mshuler
Date: Tue Jun 25 05:49:41 2019
New Revision: 1862034

URL: http://svn.apache.org/viewvc?rev=1862034&view=rev
Log:
Add line breaks to code block, because (╯°□°)╯︵ ┻━┻

Attempted markdown line breaks:
- 2 spaces at end of lines
- 2 backslashes at end of lines
- ~~~ delimiters
- add  to end of lines
- various random code{} css changes
..none of which worked as expected.
What actually is the way to get line breaks in a code block?

Modified:
cassandra/site/publish/download/index.html

Modified: cassandra/site/publish/download/index.html
URL: 
http://svn.apache.org/viewvc/cassandra/site/publish/download/index.html?rev=1862034&r1=1862033&r2=1862034&view=diff
==
--- cassandra/site/publish/download/index.html (original)
+++ cassandra/site/publish/download/index.html Tue Jun 25 05:49:41 2019
@@ -190,12 +190,12 @@ configuration changes.
   Add the Apache repository of Cassandra to /etc/yum.repos.d/cassandra.repo, for example 
for the latest 3.11 version:
 
 
-[cassandra]
-name=Apache Cassandra
-baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
-gpgcheck=1
-repo_gpgcheck=1
-gpgkey=https://www.apache.org/dist/cassandra/KEYS
+[cassandra]
+name=Apache Cassandra
+baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
+gpgcheck=1
+repo_gpgcheck=1
+gpgkey=https://www.apache.org/dist/cassandra/KEYS
 
 
 


