[jira] [Commented] (CASSANDRA-18808) netty-handler vulnerability: CVE-2023-4586

2023-09-28 Thread Norman Maurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770135#comment-17770135
 ] 

Norman Maurer commented on CASSANDRA-18808:
---

Sorry, I didn't have time yet, but it's on my to-do list.

> netty-handler vulnerability: CVE-2023-4586
> --
>
> Key: CASSANDRA-18808
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18808
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> This is failing OWASP:
> {noformat}
> Dependency-Check Failure:
> One or more dependencies were identified with vulnerabilities that have a 
> CVSS score greater than or equal to '1.0': 
> netty-handler-4.1.96.Final.jar: CVE-2023-4586
> {noformat}






[jira] [Commented] (CASSANDRA-18808) netty-handler vulnerability: CVE-2023-4586

2023-09-19 Thread Norman Maurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766866#comment-17766866
 ] 

Norman Maurer commented on CASSANDRA-18808:
---

OK, will do; tomorrow at the latest.

> netty-handler vulnerability: CVE-2023-4586
> --
>
> Key: CASSANDRA-18808
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18808
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> This is failing OWASP:
> {noformat}
> Dependency-Check Failure:
> One or more dependencies were identified with vulnerabilities that have a 
> CVSS score greater than or equal to '1.0': 
> netty-handler-4.1.96.Final.jar: CVE-2023-4586
> {noformat}






[jira] [Commented] (CASSANDRA-18808) netty-handler vulnerability: CVE-2023-4586

2023-09-19 Thread Norman Maurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766864#comment-17766864
 ] 

Norman Maurer commented on CASSANDRA-18808:
---

I can also verify and open a PR if needed... just let me know.

> netty-handler vulnerability: CVE-2023-4586
> --
>
> Key: CASSANDRA-18808
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18808
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> This is failing OWASP:
> {noformat}
> Dependency-Check Failure:
> One or more dependencies were identified with vulnerabilities that have a 
> CVSS score greater than or equal to '1.0': 
> netty-handler-4.1.96.Final.jar: CVE-2023-4586
> {noformat}






[jira] [Commented] (CASSANDRA-18808) netty-handler vulnerability: CVE-2023-4586

2023-09-19 Thread Norman Maurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766849#comment-17766849
 ] 

Norman Maurer commented on CASSANDRA-18808:
---

Netty does not enable hostname verification by default; you need to enable it 
yourself. If you have already done so, there is nothing you need to do.
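For readers who want to turn this on: a minimal sketch of enabling hostname 
verification on a client-side {{SSLEngine}} created by Netty, using the 
standard JSSE endpoint-identification parameter. The class, method, and 
variable names are illustrative, not from Cassandra's codebase:

{code}
import io.netty.buffer.ByteBufAllocator;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslHandler;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

final class VerifyingTls
{
    // Create an engine for the expected peer, then opt in to hostname
    // verification before wrapping it in an SslHandler.
    static SslHandler newVerifyingSslHandler(SslContext sslContext, ByteBufAllocator alloc,
                                             String peerHost, int peerPort)
    {
        SSLEngine engine = sslContext.newEngine(alloc, peerHost, peerPort);
        SSLParameters params = engine.getSSLParameters();
        // "HTTPS" selects RFC 2818-style hostname verification in JSSE.
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return new SslHandler(engine);
    }
}
{code}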

> netty-handler vulnerability: CVE-2023-4586
> --
>
> Key: CASSANDRA-18808
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18808
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Brandon Williams
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> This is failing OWASP:
> {noformat}
> Dependency-Check Failure:
> One or more dependencies were identified with vulnerabilities that have a 
> CVSS score greater than or equal to '1.0': 
> netty-handler-4.1.96.Final.jar: CVE-2023-4586
> {noformat}






[jira] [Commented] (CASSANDRA-18148) netty-all vulnerability: CVE-2022-41881

2023-01-12 Thread Norman Maurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17676508#comment-17676508
 ] 

Norman Maurer commented on CASSANDRA-18148:
---

Your assumption is correct... Cassandra is not affected, as it does not use 
the decoder in question.

> netty-all vulnerability: CVE-2022-41881
> ---
>
> Key: CASSANDRA-18148
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18148
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: Brandon Williams
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 4.x
>
>
> This is showing in the OWASP scan.






[jira] [Commented] (CASSANDRA-16277) 'SSLEngine closed already' exception on failed outbound connection

2020-11-17 Thread Norman Maurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234313#comment-17234313
 ] 

Norman Maurer commented on CASSANDRA-16277:
---

LGTM as well +1

> 'SSLEngine closed already' exception on failed outbound connection
> --
>
> Key: CASSANDRA-16277
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16277
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Normal
>
> Occasionally Netty will invoke 
> {{OutboundConnectionInitiator#exceptionCaught()}} handler to process an 
> exception of the following kind:
> {code}
> io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: 
> Connection reset by peer
> {code}
> When we invoke {{ctx.close()}} later in that method, the listener, set up in 
> {{channelActive()}}, might be
> failed with an {{SSLException("SSLEngine closed already")}} by Netty, and 
> {{exceptionCaught()}} will be invoked
> once again, this time to handle the {{SSLException}} triggered by 
> {{ctx.close()}}.
> The exception at this stage is benign, and we shouldn't be double-logging the 
> failure to connect.
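
A minimal sketch of the idea under review here (invented handler name, not the 
committed patch): remember that the connection failure was already reported, 
and treat the follow-up {{SSLException}} triggered by {{ctx.close()}} as 
benign:

{code}
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import javax.net.ssl.SSLException;

final class OutboundFailureHandler extends ChannelInboundHandlerAdapter
{
    private boolean failureLogged;

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause)
    {
        // Log only the first failure; the "SSLEngine closed already"
        // SSLException raised while tearing down the channel is benign.
        if (failureLogged && cause instanceof SSLException)
            return;
        failureLogged = true;
        System.err.println("Failed to connect: " + cause);
        ctx.close();
    }
}
{code}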






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-27 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874475#comment-16874475
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

[~jolynch] So is it fair to say that there is nothing for me to investigate 
on the Netty side for now?

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> 30x_LQ_21600cRPS-14400cWPS.svg, ShortbufferExceptions.png, 
> odd_netty_jdk_tls_cpu_usage.png, trunk_14400cRPS-14400cWPS.svg, 
> trunk_187000cRPS-14400cWPS.svg, trunk_187kcRPS_14kcWPS.png, 
> trunk_22000cRPS-14400cWPS-jdk.svg, trunk_22000cRPS-14400cWPS-openssl.svg, 
> trunk_220kcRPS_14kcWPS.png, trunk_252kcRPS-14kcWPS.png, 
> trunk_93500cRPS-14400cWPS.svg, trunk_LQ_14400cRPS-14400cWPS.svg, 
> trunk_LQ_21600cRPS-14400cWPS.svg, trunk_vs_30x_125kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kRPS_14kcWPS_load.png, trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_14kcRPS_14kcWPS.png, trunk_vs_30x_LQ_21kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_64kcRPS_14kcWPS.png, trunk_vs_30x_LQ_jdk_summary.png, 
> trunk_vs_30x_LQ_openssl_21kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_tcnative_summary.png, trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19
> @d7d00036|
> |Candidate|trunk
> @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Comment Edited] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871778#comment-16871778
 ] 

Norman Maurer edited comment on CASSANDRA-15175 at 6/24/19 9:10 PM:


[~jolynch] Yes, please use a non-GCM cipher and report back :) And please 
ensure you use the same ciphers when comparing 3.x vs. trunk, as otherwise 
there is really no way to compare the two at all (from my understanding, you 
may be using different ciphers).


was (Author: norman):
[~jolynch] Yes, please use a non-GCM cipher and report back :)
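
For benchmarkers following along, a sketch of pinning one explicit non-GCM 
suite with Netty's {{SslContextBuilder}} so both branches negotiate identical 
ciphers. The class name, file paths, and suite choice are placeholders, not 
Cassandra's actual server_encryption_options plumbing:

{code}
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;
import java.io.File;
import java.util.Arrays;

final class PinnedTlsContext
{
    // Pin a single CBC (non-GCM) suite so 3.x and trunk runs are comparable.
    static SslContext forServer(File certChain, File privateKey) throws Exception
    {
        return SslContextBuilder.forServer(certChain, privateKey)
                                .sslProvider(SslProvider.JDK)
                                .ciphers(Arrays.asList("TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256"))
                                .build();
    }
}
{code}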

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, ShortbufferExceptions.png, 
> odd_netty_jdk_tls_cpu_usage.png, trunk_14400cRPS-14400cWPS.svg, 
> trunk_187000cRPS-14400cWPS.svg, trunk_187kcRPS_14kcWPS.png, 
> trunk_22000cRPS-14400cWPS-jdk.svg, trunk_22000cRPS-14400cWPS-openssl.svg, 
> trunk_220kcRPS_14kcWPS.png, trunk_252kcRPS-14kcWPS.png, 
> trunk_93500cRPS-14400cWPS.svg, trunk_LQ_14400cRPS-14400cWPS.svg, 
> trunk_vs_30x_125kcRPS_14kcWPS.png, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_14kcRPS_14kcWPS.png, trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19
> @d7d00036|
> |Candidate|trunk
> @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871778#comment-16871778
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

[~jolynch] Yes, please use a non-GCM cipher and report back :)

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, ShortbufferExceptions.png, 
> odd_netty_jdk_tls_cpu_usage.png, trunk_14400cRPS-14400cWPS.svg, 
> trunk_187000cRPS-14400cWPS.svg, trunk_187kcRPS_14kcWPS.png, 
> trunk_22000cRPS-14400cWPS-jdk.svg, trunk_22000cRPS-14400cWPS-openssl.svg, 
> trunk_220kcRPS_14kcWPS.png, trunk_252kcRPS-14kcWPS.png, 
> trunk_93500cRPS-14400cWPS.svg, trunk_LQ_14400cRPS-14400cWPS.svg, 
> trunk_vs_30x_125kcRPS_14kcWPS.png, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_14kcRPS_14kcWPS.png, trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19
> @d7d00036|
> |Candidate|trunk
> @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871727#comment-16871727
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

[~jolynch] Yeah, that is why I asked... From the OpenJDK code it seems like 
`ShortBufferException` should really "never happen", which is why I asked 
about errors.

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, ShortbufferExceptions.png, 
> odd_netty_jdk_tls_cpu_usage.png, trunk_14400cRPS-14400cWPS.svg, 
> trunk_187000cRPS-14400cWPS.svg, trunk_187kcRPS_14kcWPS.png, 
> trunk_22000cRPS-14400cWPS-jdk.svg, trunk_22000cRPS-14400cWPS-openssl.svg, 
> trunk_220kcRPS_14kcWPS.png, trunk_252kcRPS-14kcWPS.png, 
> trunk_93500cRPS-14400cWPS.svg, trunk_LQ_14400cRPS-14400cWPS.svg, 
> trunk_vs_30x_125kcRPS_14kcWPS.png, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_14kcRPS_14kcWPS.png, trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19
> @d7d00036|
> |Candidate|trunk
> @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-24 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871684#comment-16871684
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

[~jolynch] One question... when using JDK TLS, do you see any errors at all, 
or do you just see more CPU usage and that's it?

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> trunk_14400cRPS-14400cWPS.svg, trunk_187000cRPS-14400cWPS.svg, 
> trunk_187kcRPS_14kcWPS.png, trunk_22000cRPS-14400cWPS-jdk.svg, 
> trunk_22000cRPS-14400cWPS-openssl.svg, trunk_220kcRPS_14kcWPS.png, 
> trunk_252kcRPS-14kcWPS.png, trunk_93500cRPS-14400cWPS.svg, 
> trunk_vs_30x_125kcRPS_14kcWPS.png, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19
> @d7d00036|
> |Candidate|trunk
> @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-22 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870444#comment-16870444
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

Also, please include the cipher that is used.

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> trunk_14400cRPS-14400cWPS.svg, trunk_187000cRPS-14400cWPS.svg, 
> trunk_93500cRPS-14400cWPS.svg, trunk_vs_30x_125kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kRPS_14kcWPS_load.png, trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> Test setup at (reproduced below)
> [https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19
> @d7d00036|
> |Candidate|trunk
> @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-22 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870256#comment-16870256
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

Can you provide the full JDK version as well?

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> trunk_14400cRPS-14400cWPS.svg, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_24kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS_load.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.






[jira] [Commented] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-06-22 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870255#comment-16870255
 ] 

Norman Maurer commented on CASSANDRA-15175:
---

I will have a look.

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> trunk_14400cRPS-14400cWPS.svg, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_24kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS_load.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.






[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2018-08-21 Thread Norman Maurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587629#comment-16587629
 ] 

Norman Maurer commented on CASSANDRA-13651:
---

I am not a committer here, but I am the Netty project lead, so from the point 
of view of Netty usage I am also +1 on this.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Major
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
> timeout: 0x   
> ▒
>0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
> timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code behind all that: we schedule a lot of 
> {{Message::Flusher}} tasks with a deadline of 10 usec (5 per message, I think) but 
> netty+epoll only 

[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-23 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374138#comment-16374138
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

Just a general comment: the Recycler only makes sense to use if creating the 
object is considered very expensive and/or if you create and destroy a lot of 
these very frequently, which usually means thousands per second. So if this is 
not the case here, I think it is completely reasonable to not use the Recycler 
at all... As I really have no idea about the use-case, I am just leaving this 
here as a general comment :)

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can't say if this has any other side effects etc., but I doubt it.






[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-14 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363974#comment-16363974
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

[~tsteinmaurer] Yeah, thanks, I see... So this looks more like misuse to me 
than a Netty bug. Cassandra may also want to consider configuring the 
`Recycler` with a more sane capacity for this use-case (via the constructor).
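
A sketch of what that constructor-based cap can look like with the 4.0-era 
{{Recycler}} API used in the quoted Cassandra code; {{PooledBuilder}} and the 
capacity of 256 are invented for illustration:

{code}
import io.netty.util.Recycler;

final class PooledBuilder
{
    // Bound the per-thread pool so recycled objects cannot pile up unbounded.
    private static final Recycler<PooledBuilder> RECYCLER = new Recycler<PooledBuilder>(256)
    {
        @Override
        protected PooledBuilder newObject(Handle handle)
        {
            return new PooledBuilder(handle);
        }
    };

    private final Recycler.Handle handle;

    private PooledBuilder(Recycler.Handle handle)
    {
        this.handle = handle;
    }

    static PooledBuilder take()
    {
        return RECYCLER.get();
    }

    void recycle()
    {
        // Reset per-use state here, then return the object to the pool.
        RECYCLER.recycle(this, handle);
    }
}
{code}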

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can't say if this has any other side effects etc., but I doubt it.






[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-13 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363597#comment-16363597
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

[~jay.zhuang] I would be interested in what is contained in the `Stack` itself.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can't say if this has any other side effects etc., but I doubt it.






[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-01 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348723#comment-16348723
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

[~tsteinmaurer] What about a heap dump? Is this something you could provide?

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can't say if this has any other side effects etc., but I doubt it.






[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-01 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348181#comment-16348181
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

Yeah, 4.0.55.Final should have the "fix" as well:

[https://github.com/netty/netty/commit/b386ee3eaf35abd5072992d626de6ae2ccadc6d9#diff-23eafd00fcd66829f8cce343b26c236a]

That said, maybe there are other issues. Would it be possible to share a 
heap dump?

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can't say if this has any other side effects etc., but I doubt it.






[jira] [Commented] (CASSANDRA-13789) Reduce memory copies and object creations when acting on ByteBufs

2017-09-28 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183709#comment-16183709
 ] 

Norman Maurer commented on CASSANDRA-13789:
---

[~jasobrown] [~Stefania] My expectation was that callers of these methods 
ensure that there will be no "concurrent" access to the underlying object, and 
duplicate it if necessary. If this is not true, then yes, this needs to be 
changed. Just let me know and I will take care of it (happy to write a patch).
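
For reference, a tiny sketch of the duplication being described: 
{{ByteBuf.duplicate()}} shares the underlying bytes but gives the caller 
independent reader/writer indices, so index-mutating reads do not disturb the 
original buffer:

{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class DuplicateDemo
{
    public static void main(String[] args)
    {
        ByteBuf original = Unpooled.copiedBuffer(new byte[] { 1, 2, 3, 4 });
        ByteBuf dup = original.duplicate(); // shared content, separate indices
        dup.readByte();                     // advances only dup's readerIndex
        System.out.println(original.readerIndex()); // prints 0
    }
}
{code}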

> Reduce memory copies and object creations when acting on  ByteBufs
> --
>
> Key: CASSANDRA-13789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13789
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Norman Maurer
>Assignee: Norman Maurer
> Fix For: 4.0
>
> Attachments: 
> 0001-CBUtil.sizeOfLongString-encodes-String-to-byte-to-ca.patch, 
> 0001-Reduce-memory-copies-and-object-creations-when-actin.patch
>
>
> There are multiple "low-hanging fruits" when it comes to reducing memory 
> copies and object allocations when acting on ByteBufs.






[jira] [Commented] (CASSANDRA-13789) Reduce memory copies and object creations when acting on ByteBufs

2017-09-07 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157554#comment-16157554
 ] 

Norman Maurer commented on CASSANDRA-13789:
---

[~jasobrown] Doh! I guess I need to buy the beer now.

> Reduce memory copies and object creations when acting on  ByteBufs
> --
>
> Key: CASSANDRA-13789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13789
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Norman Maurer
>Assignee: Norman Maurer
> Fix For: 4.0
>
> Attachments: 
> 0001-CBUtil.sizeOfLongString-encodes-String-to-byte-to-ca.patch, 
> 0001-Reduce-memory-copies-and-object-creations-when-actin.patch
>
>
> There are multiple "low-hanging fruits" when it comes to reducing memory 
> copies and object allocations when acting on ByteBufs.






[jira] [Updated] (CASSANDRA-13789) Reduce memory copies and object creations when acting on ByteBufs

2017-08-24 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-13789:
--
Attachment: 0001-CBUtil.sizeOfLongString-encodes-String-to-byte-to-ca.patch

And another one...

> Reduce memory copies and object creations when acting on  ByteBufs
> --
>
> Key: CASSANDRA-13789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13789
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Norman Maurer
>Assignee: Norman Maurer
> Attachments: 
> 0001-CBUtil.sizeOfLongString-encodes-String-to-byte-to-ca.patch, 
> 0001-Reduce-memory-copies-and-object-creations-when-actin.patch
>
>
> There are multiple "low-hanging fruits" when it comes to reducing memory 
> copies and object allocations when acting on ByteBufs.






[jira] [Commented] (CASSANDRA-13789) Reduce memory copies and object creations when acting on ByteBufs

2017-08-24 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139768#comment-16139768
 ] 

Norman Maurer commented on CASSANDRA-13789:
---

[~jasobrown] let me know if you need anything else.

> Reduce memory copies and object creations when acting on  ByteBufs
> --
>
> Key: CASSANDRA-13789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13789
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Norman Maurer
>Assignee: Norman Maurer
> Attachments: 
> 0001-Reduce-memory-copies-and-object-creations-when-actin.patch
>
>
> There are multiple "low-hanging fruits" when it comes to reducing memory 
> copies and object allocations when acting on ByteBufs.






[jira] [Updated] (CASSANDRA-13789) Reduce memory copies and object creations when acting on ByteBufs

2017-08-23 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-13789:
--
Assignee: Norman Maurer
  Status: Patch Available  (was: Open)

> Reduce memory copies and object creations when acting on  ByteBufs
> --
>
> Key: CASSANDRA-13789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13789
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Norman Maurer
>Assignee: Norman Maurer
> Attachments: 
> 0001-Reduce-memory-copies-and-object-creations-when-actin.patch
>
>
> There are multiple "low-hanging fruits" when it comes to reducing memory 
> copies and object allocations when acting on ByteBufs.






[jira] [Created] (CASSANDRA-13789) Reduce memory copies and object creations when acting on ByteBufs

2017-08-23 Thread Norman Maurer (JIRA)
Norman Maurer created CASSANDRA-13789:
-

 Summary: Reduce memory copies and object creations when acting on  
ByteBufs
 Key: CASSANDRA-13789
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13789
 Project: Cassandra
  Issue Type: Improvement
Reporter: Norman Maurer
 Attachments: 
0001-Reduce-memory-copies-and-object-creations-when-actin.patch

There are multiple "low-hanging fruits" when it comes to reducing memory 
copies and object allocations when acting on ByteBufs.
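
One illustrative flavor of such a fix (not necessarily the attached patch): 
computing the encoded size of a string without first materializing a temporary 
{{byte[]}}, e.g. with {{ByteBufUtil.utf8Bytes}} from newer Netty 4.1 releases. 
The class name and the 4-byte length prefix here are assumptions for the sake 
of the example:

{code}
import io.netty.buffer.ByteBufUtil;
import java.nio.charset.StandardCharsets;

final class SizeOfString
{
    // Allocates a throwaway byte[] just to learn the encoded length.
    static int sizeWithCopy(String s)
    {
        return 4 + s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Walks the characters and computes the UTF-8 length with no allocation.
    static int sizeWithoutCopy(String s)
    {
        return 4 + ByteBufUtil.utf8Bytes(s);
    }
}
{code}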






[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-08-21 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134918#comment-16134918
 ] 

Norman Maurer commented on CASSANDRA-13649:
---

It's up to you guys which versions the patch should be applied to; I just 
wanted to mention that this fix is very low-risk in terms of Netty itself, as 
it just moves logic to an extra ChannelHandler. That's all.

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
>Assignee: Norman Maurer
>  Labels: patch
> Attachments: 
> 0001-CASSANDRA-13649-Ensure-all-exceptions-are-correctly-.patch, 
> test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 






[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-18 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133438#comment-16133438
 ] 

Norman Maurer commented on CASSANDRA-13651:
---

[~iksaif] FYI, we will merge the change to use timerfd today, so it will be 
part of the next Netty release. That said, I think what you suggested 
(changing the Cassandra code) may make more sense in general.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
> timeout: 0x   
> ▒
>0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
> timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code behind all that: we schedule a lot of 
> {{Message::Flusher}} with 

[jira] [Updated] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-08-18 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-13649:
--
Attachment: 0001-CASSANDRA-13649-Ensure-all-exceptions-are-correctly-.patch

This patch should fix the problem and should go in all active Cassandra trees.

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
>Assignee: Norman Maurer
>  Labels: patch
> Attachments: 
> 0001-CASSANDRA-13649-Ensure-all-exceptions-are-correctly-.patch, 
> test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-08-18 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-13649:
--
  Labels: patch  (was: )
Assignee: Norman Maurer
  Status: Patch Available  (was: Open)

To ensure we handle all exceptions in the Netty pipeline that are produced by 
either the Channel itself or the ChannelHandlers in the pipeline, we need to 
make sure the handler doing so does not use the RequestThreadPoolExecutor, as 
it does not enforce strict per-Channel ordering of events.
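
For illustration, a minimal sketch of the idea (the handler name is made up; 
this is not the attached patch): a terminal exception handler added without a 
separate executor, so exceptionCaught() stays on the channel's event loop and 
per-channel ordering is preserved.

{code}
import io.netty.channel.ChannelHandler.Sharable;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Hypothetical sketch: added without an EventExecutorGroup, so
// exceptionCaught() runs on the channel's event loop.
@Sharable
final class ExceptionTailHandler extends ChannelInboundHandlerAdapter
{
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause)
    {
        // Handle (or log) the error so it never reaches the pipeline's
        // TailContext, then close the channel.
        ctx.close();
    }
}
// usage: pipeline.addLast("exceptionTail", new ExceptionTailHandler());
{code}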

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
>Assignee: Norman Maurer
>  Labels: patch
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-08-18 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131914#comment-16131914
 ] 

Norman Maurer commented on CASSANDRA-13649:
---

Actually I think it is because of how you do the event dispatching in 
Cassandra... working on a patch, stay tuned.

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-08-14 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125612#comment-16125612
 ] 

Norman Maurer commented on CASSANDRA-13649:
---

And this only happens with the native epoll transport, not with the nio 
transport?
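
For reference, a sketch of how the two transports are typically selected when 
comparing behaviour (sketch only, not Cassandra's actual bootstrap code):

{code}
import io.netty.channel.EventLoopGroup;
import io.netty.channel.epoll.Epoll;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

// Fall back to the portable nio transport when native epoll is not
// available, e.g. to check whether a problem is epoll-specific.
boolean useEpoll = Epoll.isAvailable();
EventLoopGroup group = useEpoll ? new EpollEventLoopGroup()
                                : new NioEventLoopGroup();
Class<?> channelClass = useEpoll ? EpollServerSocketChannel.class
                                 : NioServerSocketChannel.class;
{code}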

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-08-09 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119785#comment-16119785
 ] 

Norman Maurer commented on CASSANDRA-13649:
---

Is it possible that you have an eventExecutor set here:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/Server.java#L329

And if so, can you show me the implementation of it?
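
For context, a sketch of the difference in question (FrameDecoder and 
MessageDispatcher are made-up names, not Cassandra's actual handlers):

{code}
import io.netty.channel.ChannelPipeline;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

EventExecutorGroup group = new DefaultEventExecutorGroup(16);

// Added without a group: callbacks run on the channel's event loop.
pipeline.addLast("frameDecoder", new FrameDecoder());

// Added with a group: callbacks -- including exceptionCaught() -- run
// on 'group' instead of the event loop.
pipeline.addLast(group, "dispatcher", new MessageDispatcher());
{code}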

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-08-09 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119753#comment-16119753
 ] 

Norman Maurer commented on CASSANDRA-13649:
---

[~spo...@gmail.com] yes, how handler(...) and childHandler(...) are handled is 
now more consistent. Can you give me a link to the code where you set up the 
handlers that produce these exceptions?
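
To make the distinction concrete, a sketch (MessageHandler is a made-up name): 
handler(...) attaches to the listening server channel, while childHandler(...) 
attaches to every accepted connection, so an exception surfaces in whichever 
pipeline the failing handler lives in.

{code}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.logging.LoggingHandler;

ServerBootstrap b = new ServerBootstrap();
// Pipeline of the listening (server) channel:
b.handler(new LoggingHandler());
// Pipeline of every accepted (child) channel:
b.childHandler(new ChannelInitializer<SocketChannel>()
{
    @Override
    protected void initChannel(SocketChannel ch)
    {
        ch.pipeline().addLast(new MessageHandler()); // made-up handler
    }
});
{code}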

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13649) Uncaught exceptions in Netty pipeline

2017-08-04 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114698#comment-16114698
 ] 

Norman Maurer commented on CASSANDRA-13649:
---

[~spo...@gmail.com] sorry, I am a bit busy at the moment, but I will check in 
the next few days and come back to you.

> Uncaught exceptions in Netty pipeline
> -
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging, Testing
>Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some netty related errors in trunk in [some of the dtest 
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
>  Just want to make sure that we don't have to change anything related to the 
> exception handling in our pipeline and that this isn't a netty issue. 
> Actually if this causes flakiness but is otherwise harmless, we should do 
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN  [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN  [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151 
> - An exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed: 
> Connection reset by peer
>   at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown 
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() 
> failed}} error also causes tests to fail for 3.0 and 3.11. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-02 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111027#comment-16111027
 ] 

Norman Maurer commented on CASSANDRA-13651:
---

Also, for the record: Scott Mitchell (another core Netty dev) just created a 
PR in Netty to support microsecond timeouts when using the native epoll 
transport:

https://github.com/netty/netty/pull/7042

That said, I still need to review it in detail, and it's not merged yet.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>    |
>    --8.61%--0x7fca6117bdac
>             0x7fca60459804
>             epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>    0x7fca6117b830
>    0x7fca60459804
>    epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code of all that: We schedule a lot of 
> 

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-01 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110335#comment-16110335
 ] 

Norman Maurer commented on CASSANDRA-13651:
---

Sorry for the late response... didn't see the mention :(

So yes, Netty uses `epoll_wait`, which only supports millisecond resolution, 
so anything smaller than that will just cause `epoll_wait(...)` to be called 
with a `0`, i.e. a non-blocking check of ready fds. What we could do in our 
native transport implementation is make use of `timerfd` [1] to schedule 
timeouts, but that would only work when using the native epoll transport, not 
with the nio transport (which works on all OSes). So I think what you really 
want is to have timeouts of >= 1ms.
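
To illustrate the truncation with made-up numbers (this is not Netty's exact 
code, just the effect of the integer millisecond conversion):

{code}
// A 0.5 ms scheduled delay truncates to a 0 ms epoll timeout:
long delayNanos = 500_000L;                                // 0.5 ms
int epollTimeoutMillis = (int) (delayNanos / 1_000_000L);  // == 0
// epoll_wait(epfd, events, maxevents, 0) returns immediately, so
// re-scheduling such sub-millisecond timeouts in a loop degrades into
// busy-polling; with timeouts >= 1 ms the call actually blocks.
{code}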

Comments and ideas welcome :)


> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>    |
>    --8.61%--0x7fca6117bdac
>             0x7fca60459804
>             epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>    0x7fca6117b830

[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-03-27 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943172#comment-15943172
 ] 

Norman Maurer commented on CASSANDRA-8457:
--

[~jasobrown] let me know if I should review the code again.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2017-02-13 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863760#comment-15863760
 ] 

Norman Maurer commented on CASSANDRA-8457:
--

[~jasobrown] whooot :) I will do another review as well, just to double-check 
again.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13114) 3.0.x: update netty

2017-01-25 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838465#comment-15838465
 ] 

Norman Maurer commented on CASSANDRA-13114:
---

[~spo...@gmail.com] nope, just upgrade to 4.0.43. FTW ;) That said, we will 
release 4.0.44.Final at the end of this week, or next week at the latest.

> 3.0.x: update netty
> ---
>
> Key: CASSANDRA-13114
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13114
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom van der Woerdt
> Attachments: 13114_netty-4.0.43_2.x-3.0.patch, 
> 13114_netty-4.0.43_3.11.patch
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-12032 updated netty for 
> Cassandra 3.8, but this wasn't backported. Netty 4.0.23, which ships with 
> Cassandra 3.0.x, has some serious bugs around memory handling for SSL 
> connections.
> It would be nice if both were updated to 4.0.42, a version released this year.
> 4.0.23 makes it impossible for me to run SSL, because nodes run out of memory 
> every ~30 minutes. This was fixed in 4.0.27.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10993) Make read and write requests paths fully non-blocking, eliminate related stages

2016-07-10 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370184#comment-15370184
 ] 

Norman Maurer commented on CASSANDRA-10993:
---

[~thobbs] so, all good? If not, just ping me...

> Make read and write requests paths fully non-blocking, eliminate related 
> stages
> ---
>
> Key: CASSANDRA-10993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10993
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Coordination, Local Write-Read Paths
>Reporter: Aleksey Yeschenko
>Assignee: Tyler Hobbs
> Fix For: 3.x
>
>
> Building on work done by [~tjake] (CASSANDRA-10528), [~slebresne] 
> (CASSANDRA-5239), and others, convert read and write request paths to be 
> fully non-blocking, to enable the eventual transition from SEDA to TPC 
> (CASSANDRA-10989)
> Eliminate {{MUTATION}}, {{COUNTER_MUTATION}}, {{VIEW_MUTATION}}, {{READ}}, 
> and {{READ_REPAIR}} stages, move read and write execution directly to Netty 
> context.
> For lack of decent async I/O options on Linux, we’ll still have to retain an 
> extra thread pool for serving read requests for data not residing in our page 
> cache (CASSANDRA-5863), however.
> Implementation-wise, we only have two options available to us: explicit FSMs 
> and chained futures. Fibers would be the third, and easiest option, but 
> aren’t feasible in Java without resorting to direct bytecode manipulation 
> (ourselves or using [quasar|https://github.com/puniverse/quasar]).
> I have seen 4 implementations based on chained futures/promises now - three 
> in Java and one in C++ - and I’m not convinced that it’s the optimal (or 
> sane) choice for representing our complex logic - think 2i quorum read 
> requests with timeouts at all levels, read repair (blocking and 
> non-blocking), and speculative retries in the mix, {{SERIAL}} reads and 
> writes.
> I’m currently leaning towards an implementation based on explicit FSMs, and 
> intend to provide a prototype - soonish - for comparison with 
> {{CompletableFuture}}-like variants.
> Either way the transition is a relatively boring straightforward refactoring.
> There are, however, some extension points on both write and read paths that 
> we do not control:
> - authorisation implementations will have to be non-blocking. We have control 
> over built-in ones, but for any custom implementation we will have to execute 
> them in a separate thread pool
> - 2i hooks on the write path will need to be non-blocking
> - any trigger implementations will not be allowed to block
> - UDFs and UDAs
> We are further limited by API compatibility restrictions in the 3.x line, 
> forbidding us to alter, or add any non-{{default}} interface methods to those 
> extension points, so these pose a problem.
> Depending on logistics, expecting to get this done in time for 3.4 or 3.6 
> feature release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-07-10 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370179#comment-15370179
 ] 

Norman Maurer commented on CASSANDRA-8457:
--

[~jasobrown] I did a first review round and left some comments. All in all it 
looks solid from a Netty perspective; I can't say a lot about the Cassandra 
perspective here :) Let me know if you have any questions etc.

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10993) Make read and write requests paths fully non-blocking, eliminate related stages

2016-07-08 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368142#comment-15368142
 ] 

Norman Maurer commented on CASSANDRA-10993:
---

[~thobbs] can you explain why you cannot just add the tasks to Netty's 
EventLoop directly? Is it because you do not want to wake it up, but rather 
have them run the next time the EventLoop runs?
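
For reference, a sketch of the two dispatch options being discussed ('task' 
and 'channel' are placeholders):

{code}
import java.util.concurrent.TimeUnit;

// execute() enqueues the task and, if necessary, wakes the event loop
// out of its blocking epoll_wait()/select() so the task runs as soon
// as possible:
channel.eventLoop().execute(task);

// schedule() defers it by a fixed delay instead:
channel.eventLoop().schedule(task, 1, TimeUnit.MILLISECONDS);
{code}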

> Make read and write requests paths fully non-blocking, eliminate related 
> stages
> ---
>
> Key: CASSANDRA-10993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10993
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Coordination, Local Write-Read Paths
>Reporter: Aleksey Yeschenko
>Assignee: Tyler Hobbs
> Fix For: 3.x
>
>
> Building on work done by [~tjake] (CASSANDRA-10528), [~slebresne] 
> (CASSANDRA-5239), and others, convert read and write request paths to be 
> fully non-blocking, to enable the eventual transition from SEDA to TPC 
> (CASSANDRA-10989)
> Eliminate {{MUTATION}}, {{COUNTER_MUTATION}}, {{VIEW_MUTATION}}, {{READ}}, 
> and {{READ_REPAIR}} stages, move read and write execution directly to Netty 
> context.
> For lack of decent async I/O options on Linux, we’ll still have to retain an 
> extra thread pool for serving read requests for data not residing in our page 
> cache (CASSANDRA-5863), however.
> Implementation-wise, we only have two options available to us: explicit FSMs 
> and chained futures. Fibers would be the third, and easiest option, but 
> aren’t feasible in Java without resorting to direct bytecode manipulation 
> (ourselves or using [quasar|https://github.com/puniverse/quasar]).
> I have seen 4 implementations based on chained futures/promises now - three 
> in Java and one in C++ - and I’m not convinced that it’s the optimal (or 
> sane) choice for representing our complex logic - think 2i quorum read 
> requests with timeouts at all levels, read repair (blocking and 
> non-blocking), and speculative retries in the mix, {{SERIAL}} reads and 
> writes.
> I’m currently leaning towards an implementation based on explicit FSMs, and 
> intend to provide a prototype - soonish - for comparison with 
> {{CompletableFuture}}-like variants.
> Either way the transition is a relatively boring straightforward refactoring.
> There are, however, some extension points on both write and read paths that 
> we do not control:
> - authorisation implementations will have to be non-blocking. We have control 
> over built-in ones, but for any custom implementation we will have to execute 
> them in a separate thread pool
> - 2i hooks on the write path will need to be non-blocking
> - any trigger implementations will not be allowed to block
> - UDFs and UDAs
> We are further limited by API compatibility restrictions in the 3.x line, 
> forbidding us to alter, or add any non-{{default}} interface methods to those 
> extension points, so these pose a problem.
> Depending on logistics, expecting to get this done in time for 3.4 or 3.6 
> feature release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10993) Make read and write requests paths fully non-blocking, eliminate related stages

2016-07-07 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366364#comment-15366364
 ] 

Norman Maurer commented on CASSANDRA-10993:
---

[~thobbs] basically we made it final to guard ourselves against users 
depending on methods that we may want to remove later on. Can you explain a 
bit what you are trying to do, or show some code, so I can better understand 
the use case?

> Make read and write requests paths fully non-blocking, eliminate related 
> stages
> ---
>
> Key: CASSANDRA-10993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10993
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Coordination, Local Write-Read Paths
>Reporter: Aleksey Yeschenko
>Assignee: Tyler Hobbs
> Fix For: 3.x
>
>
> Building on work done by [~tjake] (CASSANDRA-10528), [~slebresne] 
> (CASSANDRA-5239), and others, convert read and write request paths to be 
> fully non-blocking, to enable the eventual transition from SEDA to TPC 
> (CASSANDRA-10989)
> Eliminate {{MUTATION}}, {{COUNTER_MUTATION}}, {{VIEW_MUTATION}}, {{READ}}, 
> and {{READ_REPAIR}} stages, move read and write execution directly to Netty 
> context.
> For lack of decent async I/O options on Linux, we’ll still have to retain an 
> extra thread pool for serving read requests for data not residing in our page 
> cache (CASSANDRA-5863), however.
> Implementation-wise, we only have two options available to us: explicit FSMs 
> and chained futures. Fibers would be the third, and easiest option, but 
> aren’t feasible in Java without resorting to direct bytecode manipulation 
> (ourselves or using [quasar|https://github.com/puniverse/quasar]).
> I have seen 4 implementations based on chained futures/promises now - three 
> in Java and one in C++ - and I’m not convinced that it’s the optimal (or 
> sane) choice for representing our complex logic - think 2i quorum read 
> requests with timeouts at all levels, read repair (blocking and 
> non-blocking), and speculative retries in the mix, {{SERIAL}} reads and 
> writes.
> I’m currently leaning towards an implementation based on explicit FSMs, and 
> intend to provide a prototype - soonish - for comparison with 
> {{CompletableFuture}}-like variants.
> Either way the transition is a relatively boring straightforward refactoring.
> There are, however, some extension points on both write and read paths that 
> we do not control:
> - authorisation implementations will have to be non-blocking. We have control 
> over built-in ones, but for any custom implementation we will have to execute 
> them in a separate thread pool
> - 2i hooks on the write path will need to be non-blocking
> - any trigger implementations will not be allowed to block
> - UDFs and UDAs
> We are further limited by API compatibility restrictions in the 3.x line, 
> forbidding us to alter, or add any non-{{default}} interface methods to those 
> extension points, so these pose a problem.
> Depending on logistics, expecting to get this done in time for 3.4 or 3.6 
> feature release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10735) Support netty openssl (netty-tcnative) for client encryption

2016-06-20 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15339194#comment-15339194
 ] 

Norman Maurer commented on CASSANDRA-10735:
---

Sorry for the delay, but the good news is that I think I now have everything 
needed implemented locally... Stay tuned for getting it all merged into Netty. 
Once it's in, I will look into adding it to Cassandra itself.

Performance FTW ;)
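
As a rough sketch of what using netty-tcnative looks like on the server side 
(not the eventual Cassandra patch; certChainFile and privateKeyFile are 
assumed inputs):

{code}
import java.io.File;
import io.netty.handler.ssl.OpenSsl;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;

// Requires the platform-specific netty-tcnative jar on the classpath;
// fall back to the JDK provider when OpenSSL is not available.
SslContext sslCtx = SslContextBuilder
        .forServer(certChainFile, privateKeyFile)
        .sslProvider(OpenSsl.isAvailable() ? SslProvider.OPENSSL
                                           : SslProvider.JDK)
        .build();
{code}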

> Support netty openssl (netty-tcnative) for client encryption
> 
>
> Key: CASSANDRA-10735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10735
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Andy Tolbert
>Assignee: Norman Maurer
> Fix For: 3.x
>
> Attachments: netty-ssl-trunk.tgz, nettyssl-bench.tgz, 
> nettysslbench.png, nettysslbench_small.png, sslbench12-03.png
>
>
> The java-driver recently added support for using netty openssl via 
> [netty-tcnative|http://netty.io/wiki/forked-tomcat-native.html] in 
> [JAVA-841|https://datastax-oss.atlassian.net/browse/JAVA-841], this shows a 
> very measured improvement (numbers incoming on that ticket).   It seems 
> likely that this can offer improvement if implemented C* side as well.
> Since netty-tcnative has platform specific requirements, this should not be 
> made the default, but rather be an option that one can use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11921) Upgrade to Netty 4.1 + PR5314

2016-05-30 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306538#comment-15306538
 ] 

Norman Maurer commented on CASSANDRA-11921:
---

[~snazy] no... the only thing to consider is that 4.1.x uses the 
PooledByteBufAllocator by default while 4.0.x does not. Also, as a side note, 
I will merge the PR into 4.0 as well.
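
A small sketch of how an application can pin the allocator explicitly, so 
behaviour does not silently change with the 4.0 -> 4.1 default switch:

{code}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelOption;

ServerBootstrap bootstrap = new ServerBootstrap();
// Opt in to pooling regardless of the Netty version's default:
bootstrap.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
// (io.netty.buffer.UnpooledByteBufAllocator.DEFAULT would keep the old
// 4.0.x unpooled behaviour instead.)
{code}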

> Upgrade to Netty 4.1 + PR5314
> -
>
> Key: CASSANDRA-11921
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11921
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 3.x
>
>
> Netty [PR5314|https://github.com/netty/netty/pull/5314] works around 
> {{Bits.reserveMemory}}+{{Cleaner}} and introduces an independent off-heap 
> memory pool.
> Requirement for CASSANDRA-11870
> Local tests of Netty4.1+PR5314 against trunk were running fine.
> Any incompatibilities or else to consider when upgrading from Netty 4.0 to 
> 4.1?
> /cc [~norman] ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-27 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304464#comment-15304464
 ] 

Norman Maurer edited comment on CASSANDRA-11818 at 5/27/16 6:14 PM:


[~snazy] Sorry for the late response (busy as always :( )... I wonder if this 
would be something that may be helpful for you in terms of Netty: 
https://github.com/netty/netty/pull/5314


was (Author: norman):
Sorry for the late response (busy as always :( )... I wonder if this would be 
something that may be helpful for you in terms of Netty: 
https://github.com/netty/netty/pull/5314

> C* does neither recover nor trigger stability inspector on direct memory OOM
> 
>
> Key: CASSANDRA-11818
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
> Attachments: 11818-direct-mem-unpooled.png, 11818-direct-mem.png, 
> oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads 
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x1e02351b, 
> L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>   at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) 
> ~[na:1.8.0_92]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
> ~[na:1.8.0_92]
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:105)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:349)
>  ~[main/:na]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:314)
>  ~[main/:na]
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:619)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:676)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:612)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher$Flusher.run(Message.java:445)
>  ~[main/:na]
>   at 
> io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
> {code}
> The situation does not get better when the load driver is stopped.
> I can reproduce this scenario at will. Managed to get histogram, stack traces 
> and heap dump. Already increased {{-XX:MaxDirectMemorySize}} to {{2g}}.

[jira] [Commented] (CASSANDRA-11818) C* does neither recover nor trigger stability inspector on direct memory OOM

2016-05-27 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304464#comment-15304464
 ] 

Norman Maurer commented on CASSANDRA-11818:
---

Sorry for the late response (busy as always :( )... I wonder if this would be 
something that may be helpful for you in terms of Netty: 
https://github.com/netty/netty/pull/5314

> C* does neither recover nor trigger stability inspector on direct memory OOM
> 
>
> Key: CASSANDRA-11818
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11818
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Robert Stupp
> Attachments: 11818-direct-mem-unpooled.png, 11818-direct-mem.png, 
> oom-histo-live.txt, oom-stack.txt
>
>
> The following stack trace is not caught by {{JVMStabilityInspector}}.
> Situation was caused by a load test with a lot of parallel writes and reads 
> against a single node.
> {code}
> ERROR [SharedPool-Worker-1] 2016-05-17 18:38:44,187 Message.java:611 - 
> Unexpected exception during request; channel = [id: 0x1e02351b, 
> L:/127.0.0.1:9042 - R:/127.0.0.1:51087]
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_92]
>   at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) 
> ~[na:1.8.0_92]
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) 
> ~[na:1.8.0_92]
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:672) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:234) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:218) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:270)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:177)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:105)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:349)
>  ~[main/:na]
>   at 
> org.apache.cassandra.transport.Message$ProtocolEncoder.encode(Message.java:314)
>  ~[main/:na]
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:619)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:676)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:612)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher$Flusher.run(Message.java:445)
>  ~[main/:na]
>   at 
> io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:374) 
> ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.36.Final.jar:4.0.36.Final]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
> {code}
> The situation does not get better when the load driver is stopped.
> I can reproduce this scenario at will. Managed to get histogram, stack traces 
> and heap dump. Already increased {{-XX:MaxDirectMemorySize}} to {{2g}}.
> A {{nodetool flush}} causes the daemon to exit (as that direct-memory OOM is 
> caught by {{JVMStabilityInspector}}).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11749) CQLSH gets SSL exception following a COPY FROM

2016-05-26 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303499#comment-15303499
 ] 

Norman Maurer commented on CASSANDRA-11749:
---

[~Stefania] no problems... Sorry that it took me so long :(

> CQLSH gets SSL exception following a COPY FROM
> --
>
> Key: CASSANDRA-11749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11749
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 2.1.x
>
> Attachments: driver_debug.txt, stdout.txt.zip, 
> stdout_single_process.txt.zip
>
>
> When running Cassandra and cqlsh with SSL, the following command occasionally 
> results in the exception below:
> {code}
> cqlsh --ssl -f kv.cql
> {code}
> {code}
> ERROR [SharedPool-Worker-2] 2016-05-11 12:41:03,583 Message.java:538 - 
> Unexpected exception during request; channel = [id: 0xeb75e05d, 
> /127.0.0.1:51083 => /127.0.0.1:9042]
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: bad 
> record MAC
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:280)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:722)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: javax.net.ssl.SSLException: bad record MAC
> at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:981) 
> ~[na:1.8.0_91]
> at 
> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) 
> ~[na:1.8.0_91]
> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.8.0_91]
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:982) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:908) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:854) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> ... 10 common frames omitted
> Caused by: javax.crypto.BadPaddingException: bad record MAC
> at sun.security.ssl.InputRecord.decrypt(InputRecord.java:219) 
> ~[na:1.8.0_91]
> at 
> sun.security.ssl.EngineInputRecord.decrypt(EngineInputRecord.java:177) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:974) 
> ~[na:1.8.0_91]
> ... 17 common frames omitted
> {code}
> where
> {code}
> cat kv.cql 
> create keyspace if not exists cvs_copy_ks with replication = {'class': 
> 'SimpleStrategy', 'replication_factor':1};
> create table if not exists cvs_copy_ks.kv (key int primary key, value text);
> truncate cvs_copy_ks.kv;
> copy cvs_copy_ks.kv (key, value) from 'kv.csv' with header='true';
> select * from cvs_copy_ks.kv;
> drop keyspace cvs_copy_ks;
> stefi@cuoricina:~/git/cstar/cassandra$ cat kv.c
> kv.cql  kv.csv  
> cat kv.csv 
> key,value
> 1,'a'
> 2,'b'
> 3,'c'
> {code}
> The COPY FROM succeeds, however the following select does not. 
> The easiest way to 

[jira] [Commented] (CASSANDRA-11749) CQLSH gets SSL exception following a COPY FROM

2016-05-26 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302311#comment-15302311
 ] 

Norman Maurer commented on CASSANDRA-11749:
---

[~jjordan] Only very briefly, but my suspicion so far is that it is actually a 
race in cqlsh where two threads write to the same connection concurrently: one 
writes, and the second starts to write as well before the first is complete. 
Could this be possible? This would also explain why a sleep may "work around" 
this.

> CQLSH gets SSL exception following a COPY FROM
> --
>
> Key: CASSANDRA-11749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11749
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 2.1.x
>
> Attachments: stdout.txt.zip, stdout_single_process.txt.zip
>
>
> When running Cassandra and cqlsh with SSL, the following command occasionally 
> results in the exception below:
> {code}
> cqlsh --ssl -f kv.cql
> {code}
> {code}
> ERROR [SharedPool-Worker-2] 2016-05-11 12:41:03,583 Message.java:538 - 
> Unexpected exception during request; channel = [id: 0xeb75e05d, 
> /127.0.0.1:51083 => /127.0.0.1:9042]
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: bad 
> record MAC
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:280)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:722)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: javax.net.ssl.SSLException: bad record MAC
> at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:981) 
> ~[na:1.8.0_91]
> at 
> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) 
> ~[na:1.8.0_91]
> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.8.0_91]
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:982) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:908) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:854) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> ... 10 common frames omitted
> Caused by: javax.crypto.BadPaddingException: bad record MAC
> at sun.security.ssl.InputRecord.decrypt(InputRecord.java:219) 
> ~[na:1.8.0_91]
> at 
> sun.security.ssl.EngineInputRecord.decrypt(EngineInputRecord.java:177) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:974) 
> ~[na:1.8.0_91]
> ... 17 common frames omitted
> {code}
> where
> {code}
> cat kv.cql 
> create keyspace if not exists cvs_copy_ks with replication = {'class': 
> 'SimpleStrategy', 'replication_factor':1};
> create table if not exists cvs_copy_ks.kv (key int primary key, value text);
> truncate cvs_copy_ks.kv;
> copy cvs_copy_ks.kv (key, value) from 'kv.csv' with header='true';
> select * from 

[jira] [Commented] (CASSANDRA-11749) CQLSH gets SSL exception following a COPY FROM

2016-05-13 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282697#comment-15282697
 ] 

Norman Maurer commented on CASSANDRA-11749:
---

[~Stefania] if you can give me a "step-by-step" way to reproduce this on my 
laptop, I will have a look and try to figure out what is wrong

> CQLSH gets SSL exception following a COPY FROM
> --
>
> Key: CASSANDRA-11749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11749
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 2.1.x
>
> Attachments: stdout.txt.zip, stdout_single_process.txt.zip
>
>
> When running Cassandra and cqlsh with SSL, the following command occasionally 
> results in the exception below:
> {code}
> cqlsh --ssl -f kv.cql
> {code}
> {code}
> ERROR [SharedPool-Worker-2] 2016-05-11 12:41:03,583 Message.java:538 - 
> Unexpected exception during request; channel = [id: 0xeb75e05d, 
> /127.0.0.1:51083 => /127.0.0.1:9042]
> io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: bad 
> record MAC
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:280)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:722)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: javax.net.ssl.SSLException: bad record MAC
> at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:981) 
> ~[na:1.8.0_91]
> at 
> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) 
> ~[na:1.8.0_91]
> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.8.0_91]
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:982) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:908) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:854) 
> ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at 
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
>  ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> ... 10 common frames omitted
> Caused by: javax.crypto.BadPaddingException: bad record MAC
> at sun.security.ssl.InputRecord.decrypt(InputRecord.java:219) 
> ~[na:1.8.0_91]
> at 
> sun.security.ssl.EngineInputRecord.decrypt(EngineInputRecord.java:177) 
> ~[na:1.8.0_91]
> at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:974) 
> ~[na:1.8.0_91]
> ... 17 common frames omitted
> {code}
> where
> {code}
> cat kv.cql 
> create keyspace if not exists cvs_copy_ks with replication = {'class': 
> 'SimpleStrategy', 'replication_factor':1};
> create table if not exists cvs_copy_ks.kv (key int primary key, value text);
> truncate cvs_copy_ks.kv;
> copy cvs_copy_ks.kv (key, value) from 'kv.csv' with header='true';
> select * from cvs_copy_ks.kv;
> drop keyspace cvs_copy_ks;
> stefi@cuoricina:~/git/cstar/cassandra$ cat kv.c
> kv.cql  kv.csv  
> cat kv.csv 
> key,value
> 1,'a'
> 2,'b'
> 3,'c'
> {code}
> The COPY FROM succeeds, however 

[jira] [Commented] (CASSANDRA-10735) Support netty openssl (netty-tcnative) for client encryption

2016-03-31 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220135#comment-15220135
 ] 

Norman Maurer commented on CASSANDRA-10735:
---

[~iamaleksey] you are fast, buddy ;)

> Support netty openssl (netty-tcnative) for client encryption
> 
>
> Key: CASSANDRA-10735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10735
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Andy Tolbert
>Assignee: Norman Maurer
> Fix For: 3.x
>
> Attachments: netty-ssl-trunk.tgz, nettyssl-bench.tgz, 
> nettysslbench.png, nettysslbench_small.png, sslbench12-03.png
>
>
> The java-driver recently added support for using netty openssl via 
> [netty-tcnative|http://netty.io/wiki/forked-tomcat-native.html] in 
> [JAVA-841|https://datastax-oss.atlassian.net/browse/JAVA-841], which shows a 
> very measurable improvement (numbers incoming on that ticket). It seems 
> likely that this can offer an improvement if implemented on the C* side as well.
> Since netty-tcnative has platform specific requirements, this should not be 
> made the default, but rather be an option that one can use.
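
For context, a minimal sketch of how a Netty server can opt into the tcnative/OpenSSL provider only when the native library is actually loadable, falling back to the JDK provider otherwise (class name and certificate/key paths are placeholders, not Cassandra's actual code):

{code}
import io.netty.handler.ssl.OpenSsl;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;

import java.io.File;

final class SslContextFactory {
    static SslContext forServer(File certChain, File privateKey) throws Exception {
        // netty-tcnative is platform specific, so only use OPENSSL when the
        // native library could be loaded; otherwise fall back to the JDK.
        SslProvider provider = OpenSsl.isAvailable() ? SslProvider.OPENSSL
                                                     : SslProvider.JDK;
        return SslContextBuilder.forServer(certChain, privateKey)
                                .sslProvider(provider)
                                .build();
    }
}
{code}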



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10735) Support netty openssl (netty-tcnative) for client encryption

2016-03-31 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220125#comment-15220125
 ] 

Norman Maurer commented on CASSANDRA-10735:
---

Actually, you may want to assign this to me? ;)

> Support netty openssl (netty-tcnative) for client encryption
> 
>
> Key: CASSANDRA-10735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10735
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Andy Tolbert
>Assignee: Aleksey Yeschenko
> Fix For: 3.x
>
> Attachments: netty-ssl-trunk.tgz, nettyssl-bench.tgz, 
> nettysslbench.png, nettysslbench_small.png, sslbench12-03.png
>
>
> The java-driver recently added support for using netty openssl via 
> [netty-tcnative|http://netty.io/wiki/forked-tomcat-native.html] in 
> [JAVA-841|https://datastax-oss.atlassian.net/browse/JAVA-841], which shows a 
> very measurable improvement (numbers incoming on that ticket). It seems 
> likely that this can offer an improvement if implemented on the C* side as well.
> Since netty-tcnative has platform specific requirements, this should not be 
> made the default, but rather be an option that one can use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8457) nio MessagingService

2016-02-10 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141370#comment-15141370
 ] 

Norman Maurer commented on CASSANDRA-8457:
--

[~jasobrown] you know where to find me ;)

> nio MessagingService
> 
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jonathan Ellis
>Assignee: Jason Brown
>Priority: Minor
>  Labels: performance
> Fix For: 3.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11047) native protocol will not bind ipv6

2016-01-29 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124053#comment-15124053
 ] 

Norman Maurer commented on CASSANDRA-11047:
---

[~brandon.williams] netty 4.0.34.Final was released, which has a fix for it. So 
I think it's up to you guys now to upgrade :) 

> native protocol will not bind ipv6
> --
>
> Key: CASSANDRA-11047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11047
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Brandon Williams
>Assignee: Norman Maurer
> Fix For: 2.1.x, 2.2.x, 3.x
>
>
> When you set rpc_address to 0.0.0.0 it should bind every interface.  Of 
> course for ipv6 you have to comment out -Djava.net.preferIPv4Stack=true from 
> cassandra-env.sh, however this will not make the native protocol bind on 
> ipv6, only thrift:
> {noformat}
> tcp6   0  0 :::9160 :::*LISTEN
>   13488/java  
> tcp6   0  0 0.0.0.0:9042:::*LISTEN
>   13488/java  
> # telnet ::1 9160
> Trying ::1...
> Connected to ::1.
> Escape character is '^]'.
> ^]
> telnet> quit
> Connection closed.
> # telnet ::1 9042
> Trying ::1...
> telnet: Unable to connect to remote host: Connection refused
> {noformat}
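
For reference, a minimal sketch of the behaviour the report expects, using Netty's NIO transport (the bug itself concerned the epoll/native path; names here are illustrative, not Cassandra's actual code): binding the IPv6 wildcard address on a dual-stack host should accept both v4 and v6 clients.

{code}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

import java.net.InetSocketAddress;

final class WildcardBind {
    public static void main(String[] args) throws Exception {
        EventLoopGroup group = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap()
                    .group(group)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) { /* pipeline setup */ }
                    });
            // "::" is the IPv6 wildcard. On a dual-stack host, and without
            // -Djava.net.preferIPv4Stack=true, this accepts v4 and v6 clients.
            b.bind(new InetSocketAddress("::", 9042)).sync()
             .channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
{code}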



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11047) native protocol will not bind ipv6

2016-01-27 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118835#comment-15118835
 ] 

Norman Maurer commented on CASSANDRA-11047:
---

[~brandon.williams] I just proposed a fix for netty :

https://github.com/netty/netty/pull/4770


> native protocol will not bind ipv6
> --
>
> Key: CASSANDRA-11047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11047
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Brandon Williams
>Assignee: Norman Maurer
> Fix For: 2.1.x, 2.2.x, 3.x
>
>
> When you set rpc_address to 0.0.0.0 it should bind every interface.  Of 
> course for ipv6 you have to comment out -Djava.net.preferIPv4Stack=true from 
> cassandra-env.sh, however this will not make the native protocol bind on 
> ipv6, only thrift:
> {noformat}
> tcp6   0  0 :::9160 :::*LISTEN
>   13488/java  
> tcp6   0  0 0.0.0.0:9042:::*LISTEN
>   13488/java  
> # telnet ::1 9160
> Trying ::1...
> Connected to ::1.
> Escape character is '^]'.
> ^]
> telnet> quit
> Connection closed.
> # telnet ::1 9042
> Trying ::1...
> telnet: Unable to connect to remote host: Connection refused
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11047) native protocol will not bind ipv6

2016-01-22 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15112652#comment-15112652
 ] 

Norman Maurer commented on CASSANDRA-11047:
---

will check next week.

> native protocol will not bind ipv6
> --
>
> Key: CASSANDRA-11047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11047
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Brandon Williams
>Assignee: Norman Maurer
> Fix For: 2.1.x, 2.2.x, 3.x
>
>
> When you set rpc_address to 0.0.0.0 it should bind every interface.  Of 
> course for ipv6 you have to comment out -Djava.net.preferIPv4Stack=true from 
> cassandra-env.sh, however this will not make the native protocol bind on 
> ipv6, only thrift:
> {noformat}
> tcp6   0  0 :::9160 :::*LISTEN
>   13488/java  
> tcp6   0  0 0.0.0.0:9042:::*LISTEN
>   13488/java  
> # telnet ::1 9160
> Trying ::1...
> Connected to ::1.
> Escape character is '^]'.
> ^]
> telnet> quit
> Connection closed.
> # telnet ::1 9042
> Trying ::1...
> telnet: Unable to connect to remote host: Connection refused
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11047) native protocol will not bind ipv6

2016-01-21 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111030#comment-15111030
 ] 

Norman Maurer commented on CASSANDRA-11047:
---

[~brandon.williams] are you using the native (epoll) transport or just NIO? 
What Netty version?

> native protocol will not bind ipv6
> --
>
> Key: CASSANDRA-11047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11047
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Brandon Williams
> Fix For: 2.1.x, 2.2.x, 3.x
>
>
> When you set rpc_address to 0.0.0.0 it should bind every interface.  Of 
> course for ipv6 you have to comment out -Djava.net.preferIPv4Stack=true from 
> cassandra-env.sh, however this will not make the native protocol bind on 
> ipv6, only thrift:
> {noformat}
> tcp6   0  0 :::9160 :::*LISTEN
>   13488/java  
> tcp6   0  0 0.0.0.0:9042:::*LISTEN
>   13488/java  
> # telnet ::1 9160
> Trying ::1...
> Connected to ::1.
> Escape character is '^]'.
> ^]
> telnet> quit
> Connection closed.
> # telnet ::1 9042
> Trying ::1...
> telnet: Unable to connect to remote host: Connection refused
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11047) native protocol will not bind ipv6

2016-01-21 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer reassigned CASSANDRA-11047:
-

Assignee: Norman Maurer

> native protocol will not bind ipv6
> --
>
> Key: CASSANDRA-11047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11047
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
>Reporter: Brandon Williams
>Assignee: Norman Maurer
> Fix For: 2.1.x, 2.2.x, 3.x
>
>
> When you set rpc_address to 0.0.0.0 it should bind every interface.  Of 
> course for ipv6 you have to comment out -Djava.net.preferIPv4Stack=true from 
> cassandra-env.sh, however this will not make the native protocol bind on 
> ipv6, only thrift:
> {noformat}
> tcp6   0  0 :::9160 :::*LISTEN
>   13488/java  
> tcp6   0  0 0.0.0.0:9042:::*LISTEN
>   13488/java  
> # telnet ::1 9160
> Trying ::1...
> Connected to ::1.
> Escape character is '^]'.
> ^]
> telnet> quit
> Connection closed.
> # telnet ::1 9042
> Trying ::1...
> telnet: Unable to connect to remote host: Connection refused
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10559) Support encrypted and plain traffic on the same port

2015-10-28 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-10559:
--
Attachment: 0001-CASSANDRA-8803-Allow-to-serve-plain-and-encrypted-2.1.patch

> Support encrypted and plain traffic on the same port
> 
>
> Key: CASSANDRA-10559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10559
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Norman Maurer
>Assignee: Norman Maurer
> Fix For: 3.0.0
>
> Attachments: 
> 0001-CASSANDRA-8803-Allow-to-serve-plain-and-encrypted-2.1.patch, 
> 0001-CASSANDRA-8803-Allow-to-serve-plain-and-encrypted-na.patch
>
>
> To be able to migrate clusters in a rolling way from plain to encrypted 
> traffic it would be very helpful if we could have Cassandra accept both on 
> the same port. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10559) Support encrypted and plain traffic on the same port

2015-10-21 Thread Norman Maurer (JIRA)
Norman Maurer created CASSANDRA-10559:
-

 Summary: Support encrypted and plain traffic on the same port
 Key: CASSANDRA-10559
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10559
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Norman Maurer
Assignee: Norman Maurer


To be able to migrate clusters in a rolling way from plain to encrypted traffic 
it would be very helpful if we could have Cassandra accept both on the same 
port. 





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8803) Implement transitional mode in C* that will accept both encrypted and non-encrypted client traffic

2015-10-21 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14966487#comment-14966487
 ] 

Norman Maurer commented on CASSANDRA-8803:
--

[~brandon.williams] sorry for the delay... Here we go 
https://issues.apache.org/jira/browse/CASSANDRA-10559

> Implement transitional mode in C* that will accept both encrypted and 
> non-encrypted client traffic
> --
>
> Key: CASSANDRA-8803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8803
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Vishy Kasar
>
> We have some non-secure clusters taking live traffic in production from 
> active clients. We want to enable client to node encryption on these 
> clusters. Once we set the client_encryption_options enabled to true in yaml 
> and bounce a cassandra node in the ring, the existing clients that do not do 
> SSL will fail to connect to that node.
> There does not seem to be a good way to roll this change without taking an 
> outage. Can we implement a transitional mode in C* that will accept both 
> encrypted and non-encrypted client traffic? We would enable this during 
> transition and turn it off after both server and client start talking SSL. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10559) Support encrypted and plain traffic on the same port

2015-10-21 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-10559:
--
Attachment: 0001-CASSANDRA-8803-Allow-to-serve-plain-and-encrypted-na.patch

> Support encrypted and plain traffic on the same port
> 
>
> Key: CASSANDRA-10559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10559
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Norman Maurer
>Assignee: Norman Maurer
> Attachments: 
> 0001-CASSANDRA-8803-Allow-to-serve-plain-and-encrypted-na.patch
>
>
> To be able to migrate clusters in a rolling way from plain to encrypted 
> traffic it would be very helpful if we could have Cassandra accept both on 
> the same port. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8803) Implement transitional mode in C* that will accept both encrypted and non-encrypted client traffic

2015-09-25 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907779#comment-14907779
 ] 

Norman Maurer commented on CASSANDRA-8803:
--

[~brandon.williams] against which branch should I do the patch?

> Implement transitional mode in C* that will accept both encrypted and 
> non-encrypted client traffic
> --
>
> Key: CASSANDRA-8803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8803
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Vishy Kasar
>
> We have some non-secure clusters taking live traffic in production from 
> active clients. We want to enable client to node encryption on these 
> clusters. Once we set the client_encryption_options enabled to true in yaml 
> and bounce a cassandra node in the ring, the existing clients that do not do 
> SSL will fail to connect to that node.
> There does not seem to be a good way to roll this change without taking an 
> outage. Can we implement a transitional mode in C* that will accept both 
> encrypted and non-encrypted client traffic? We would enable this during 
> transition and turn it off after both server and client start talking SSL. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8803) Implement transitional mode in C* that will accept both encrypted and non-encrypted client traffic

2015-09-24 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906156#comment-14906156
 ] 

Norman Maurer commented on CASSANDRA-8803:
--

[~brandon.williams] I have a patch here that I would like to submit to allow 
serving SSL and non-SSL on the same port without the need for STARTTLS etc. This 
will make things a lot easier. Should I just reopen this issue and attach the 
patch here, or what?
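
A minimal sketch of the usual technique behind such a patch (illustrative names, not the actual patch): peek at the first byte of a new connection and only then decide whether to insert an SslHandler. A TLS handshake record starts with content type 0x16; production code typically inspects the full five-byte record header, but one byte is enough to show the idea.

{code}
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;
import io.netty.handler.ssl.SslContext;

import java.util.List;

final class OptionalTlsDetector extends ByteToMessageDecoder {
    private final SslContext sslCtx;

    OptionalTlsDetector(SslContext sslCtx) { this.sslCtx = sslCtx; }

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        if (in.readableBytes() < 1) {
            return; // not enough data yet, wait for the first byte
        }
        if (in.getUnsignedByte(in.readerIndex()) == 0x16) {
            // Looks like a TLS handshake record: add the SslHandler.
            ctx.pipeline().addAfter(ctx.name(), "ssl", sslCtx.newHandler(ctx.alloc()));
        }
        // Plain or encrypted, the detector's job is done; removing it
        // re-delivers the buffered bytes to the updated pipeline.
        ctx.pipeline().remove(this);
    }
}
{code}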

> Implement transitional mode in C* that will accept both encrypted and 
> non-encrypted client traffic
> --
>
> Key: CASSANDRA-8803
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8803
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Vishy Kasar
>
> We have some non-secure clusters taking live traffic in production from 
> active clients. We want to enable client to node encryption on these 
> clusters. Once we set the client_encryption_options enabled to true in yaml 
> and bounce a cassandra node in the ring, the existing clients that do not do 
> SSL will fail to connect to that node.
> There does not seem to be a good way to roll this change without taking an 
> outage. Can we implement a transitional mode in C* that will accept both 
> encrypted and non-encrypted client traffic? We would enable this during 
> transition and turn it off after both server and client start talking SSL. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8803) Implement transitional mode in C* that will accept both encrypted and non-encrypted client traffic

2015-07-21 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634665#comment-14634665
 ] 

Norman Maurer commented on CASSANDRA-8803:
--

[~brandon.williams] you can assign this to me if you like... I will work on a 
patch

 Implement transitional mode in C* that will accept both encrypted and 
 non-encrypted client traffic
 --

 Key: CASSANDRA-8803
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8803
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Vishy Kasar
 Fix For: 2.0.x


 We have some non-secure clusters taking live traffic in production from 
 active clients. We want to enable client to node encryption on these 
 clusters. Once we set the client_encryption_options enabled to true in yaml 
 and bounce a cassandra node in the ring, the existing clients that do not do 
 SSL will fail to connect to that node.
 There does not seem to be a good way to roll this change without taking an 
 outage. Can we implement a transitional mode in C* that will accept both 
 encrypted and non-encrypted client traffic? We would enable this during 
 transition and turn it off after both server and client start talking SSL. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9558) Cassandra-stress regression in 2.2

2015-06-19 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593780#comment-14593780
 ] 

Norman Maurer commented on CASSANDRA-9558:
--

Sorry for being late to the party, but it somehow got lost in my inbox :(

So from a netty standpoint you are right: flushing from outside the EventLoop 
is pretty expensive, as it will need to wake up the selector if it is not 
already awake and processing stuff. 

So the best thing you can do is either always write / flush etc. from within the 
EventLoop, or try to minimize the flushes from outside the EventLoop. That said, 
if you point me to the place in your code where you do the flush and the other 
stuff, I'm happy to have a look and see if I can give you some ideas on how to 
improve. 

Just let me know!
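
A minimal sketch of the two patterns described above (hypothetical helper, not cassandra-stress code): per-message writeAndFlush from an application thread forces a selector wakeup per message, while hopping onto the channel's EventLoop lets several writes share one flush.

{code}
import io.netty.channel.Channel;

final class EventLoopWrites {
    // Called from an application thread, outside the event loop: every
    // writeAndFlush may have to wake the selector, which is the costly part.
    static void writePerMessage(Channel ch, Object msg) {
        ch.writeAndFlush(msg);
    }

    // Hop onto the channel's event loop once; writes issued inside the loop
    // need no wakeup, and the whole batch shares a single flush.
    static void writeBatched(Channel ch, Object[] batch) {
        ch.eventLoop().execute(() -> {
            for (Object msg : batch) {
                ch.write(msg);
            }
            ch.flush();
        });
    }
}
{code}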

 Cassandra-stress regression in 2.2
 --

 Key: CASSANDRA-9558
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9558
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
 Fix For: 2.2.0 rc2

 Attachments: 2.1.log, 2.2.log, CASSANDRA-9558-2.patch, 
 CASSANDRA-9558-ProtocolV2.patch, atolber-CASSANDRA-9558-stress.tgz, 
 atolber-trunk-driver-coalescing-disabled.txt, 
 stress-2.1-java-driver-2.0.9.2.log, stress-2.1-java-driver-2.2+PATCH.log, 
 stress-2.1-java-driver-2.2.log, stress-2.2-java-driver-2.2+PATCH.log, 
 stress-2.2-java-driver-2.2.log


 We are seeing some regression in performance when using cassandra-stress 2.2. 
 You can see the difference at this url:
 http://riptano.github.io/cassandra_performance/graph_v5/graph.html?stats=stress_regression.json&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=108.57&ymin=0&ymax=168147.1
 The cassandra version of the cluster doesn't seem to have any impact. 
 //cc [~tjake] [~benedict]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-03-05 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-8086:
-
Attachment: 
0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-2.1.patch

Patch against 2.1

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-2.1.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final-v2.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-03-04 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347404#comment-14347404
 ] 

Norman Maurer commented on CASSANDRA-8086:
--

Will get you a patch ASAP.

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final-v2.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-03-03 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-8086:
-
Comment: was deleted

(was: you are right, sigh... fixing now)

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-03-03 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345293#comment-14345293
 ] 

Norman Maurer commented on CASSANDRA-8086:
--

you are right, sigh... fixing now

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-03-03 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345294#comment-14345294
 ] 

Norman Maurer commented on CASSANDRA-8086:
--

you are right, sigh... fixing now

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-03-03 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345314#comment-14345314
 ] 

Norman Maurer commented on CASSANDRA-8086:
--

Addressed the comment and uploaded a new patch

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final-v2.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-03-03 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-8086:
-
Attachment: 
0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final-v2.patch

Addressed the comment... 

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final-v2.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-02-28 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-8086:
-
Attachment: 
0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch

This is the patch with the comments addressed. It basically does not add the 
handler to the pipeline unless explicitly enabled, so it's effectively dead code 
and should expose no risk.
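
A minimal sketch of that wiring (illustrative names, not the patch's actual classes): the counting handler is shared across connections, and it is only installed when a limit is configured, so a disabled feature leaves the pipeline untouched.

{code}
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;

import java.util.concurrent.atomic.AtomicLong;

@ChannelHandler.Sharable
final class ConnectionLimitHandler extends ChannelInboundHandlerAdapter {
    private final AtomicLong open = new AtomicLong();
    private final long max;

    ConnectionLimitHandler(long max) { this.max = max; }

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        if (open.incrementAndGet() > max) {
            ctx.close();  // over the limit: reject; channelInactive decrements
            return;
        }
        super.channelActive(ctx);
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        open.decrementAndGet();
        super.channelInactive(ctx);
    }
}

final class LimitingInitializer extends ChannelInitializer<SocketChannel> {
    private final ConnectionLimitHandler limiter;

    LimitingInitializer(long max) {
        // null when disabled: the handler then never enters any pipeline.
        this.limiter = max > 0 ? new ConnectionLimitHandler(max) : null;
    }

    @Override
    protected void initChannel(SocketChannel ch) {
        if (limiter != null) {
            ch.pipeline().addFirst("connectionLimit", limiter);
        }
        // ... rest of the pipeline setup ...
    }
}
{code}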

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-final.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-02-18 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325853#comment-14325853
 ] 

Norman Maurer commented on CASSANDRA-8086:
--

Fair enough... let me adjust the patch to only add the handler to the pipeline 
if enabled.

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.4

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-02-02 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-8086:
-
Attachment: 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch

Latest patch with the ability to limit per source IP or to limit in general

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.3

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2015-02-02 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-8086:
-
Attachment: 
0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch

Patch which applies on the cassandra-2.0 branch

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Fix For: 2.1.3

 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c-2.0.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.patch, 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attacks from clients. Do we have any 
 knobs to control the maximum number of connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-6198) Distinguish streaming traffic at network level

2014-12-08 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-6198:
-
Attachment: 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con-v2.txt

This includes changes to address Brandon's comment.
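
As the patch file name indicates, the heart of the change is small; a minimal sketch (helper name is illustrative): setting IPTOS_THROUGHPUT on the streaming socket marks the IP header's TOS field, which network teams can then match on without needing a separate port.

{code}
import java.net.Socket;
import java.net.SocketException;

final class StreamingSocketConfig {
    // RFC 1349 IPTOS_THROUGHPUT bit. It ends up in the IP header's TOS/DSCP
    // field, so WAN gear can classify streaming traffic at the network level.
    private static final int IPTOS_THROUGHPUT = 0x08;

    static void markAsStreaming(Socket socket) throws SocketException {
        socket.setTrafficClass(IPTOS_THROUGHPUT);
    }
}
{code}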

 Distinguish streaming traffic at network level
 --

 Key: CASSANDRA-6198
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6198
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Norman Maurer
Priority: Minor
 Fix For: 2.1.3

 Attachments: 
 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con-v2.txt, 
 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt


 It would be nice to have some information in the TCP packet which network 
 teams can inspect to distinguish between streaming traffic and other organic 
 cassandra traffic. This is very useful for monitoring WAN traffic. 
 Here are some solutions:
 1) Use a different port for streaming. 
 2) Add some IP header. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6198) Distinguish streaming traffic at network level

2014-12-08 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237726#comment-14237726
 ] 

Norman Maurer commented on CASSANDRA-6198:
--

[~brandon.williams] sorry for the delay, just addressed your comment and 
uploaded a new version of the patch with the change included.

 Distinguish streaming traffic at network level
 --

 Key: CASSANDRA-6198
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6198
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Norman Maurer
Priority: Minor
 Fix For: 2.1.3

 Attachments: 
 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con-v2.txt, 
 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt


 It would be nice to have some information in the TCP packet which network 
 teams can inspect to distinguish between streaming traffic and other organic 
 cassandra traffic. This is very useful for monitoring WAN traffic. 
 Here are some solutions:
 1) Use a different port for streaming. 
 2) Add some IP header. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6198) Distinguish streaming traffic at network level

2014-11-26 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226587#comment-14226587
 ] 

Norman Maurer commented on CASSANDRA-6198:
--

Agreed, Boolean.getBoolean(...) would be better. Should I adjust the patch?
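
For reference on that suggestion: Boolean.getBoolean(name) reads a JVM system property and returns true only when that property is set to "true"; it is not a string-to-boolean parser. A minimal sketch (the property name here is hypothetical, not the one the patch uses):

{code}
final class TosFlag {
    // True only when the JVM was started with -Dcassandra.iptos_throughput=true
    static final boolean ENABLED = Boolean.getBoolean("cassandra.iptos_throughput");

    public static void main(String[] args) {
        System.out.println(ENABLED);
    }
}
{code}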

 Distinguish streaming traffic at network level
 --

 Key: CASSANDRA-6198
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6198
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Norman Maurer
Priority: Minor
 Fix For: 2.1.3

 Attachments: 
 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt


 It would be nice to have some information in the TCP packet which network 
 teams can inspect to distinguish between streaming traffic and other organic 
 cassandra traffic. This is very useful for monitoring WAN traffic. 
 Here are some solutions:
 1) Use a different port for streaming. 
 2) Add some IP header. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-6198) Distinguish streaming traffic at network level

2014-11-21 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-6198:
-
Attachment: 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt

 Distinguish streaming traffic at network level
 --

 Key: CASSANDRA-6198
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6198
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Norman Maurer
Priority: Minor
 Attachments: 
 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt


 It would be nice to have some information in the TCP packet which network 
 teams can inspect to distinguish between streaming traffic and other organic 
 cassandra traffic. This is very useful for monitoring WAN traffic. 
 Here are some solutions:
 1) Use a different port for streaming. 
 2) Add some IP header. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-6198) Distinguish streaming traffic at network level

2014-11-21 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-6198:
-
Attachment: (was: 
0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt)

 Distinguish streaming traffic at network level
 --

 Key: CASSANDRA-6198
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6198
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Norman Maurer
Priority: Minor

 It would be nice to have some information in the TCP packet which network 
 teams can inspect to distinguish between streaming traffic and other organic 
 cassandra traffic. This is very useful for monitoring WAN traffic. 
 Here are some solutions:
 1) Use a different port for streaming. 
 2) Add some IP header. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-6198) Distinguish streaming traffic at network level

2014-11-21 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-6198:
-
Attachment: 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt

 Distinguish streaming traffic at network level
 --

 Key: CASSANDRA-6198
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6198
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Norman Maurer
Priority: Minor
 Attachments: 
 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt


 It would be nice to have some information in the TCP packet which network 
 teams can inspect to distinguish between streaming traffic and other organic 
 cassandra traffic. This is very useful for monitoring WAN traffic. 
 Here are some solutions:
 1) Use a different port for streaming. 
 2) Add some IP header. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6198) Distinguish streaming traffic at network level

2014-11-21 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220968#comment-14220968
 ] 

Norman Maurer commented on CASSANDRA-6198:
--

Please review...

 Distinguish streaming traffic at network level
 --

 Key: CASSANDRA-6198
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6198
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Norman Maurer
Priority: Minor
 Attachments: 
 0001-CASSANDRA-6198-Set-IPTOS_THROUGHPUT-on-streaming-con.txt


 It would be nice to have some information in the TCP packet which network 
 teams can inspect to distinguish between streaming traffic and other organic 
 cassandra traffic. This is very useful for monitoring WAN traffic. 
 Here are some solutions:
 1) Use a different port for streaming. 
 2) Add some IP header. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8086) Cassandra should have ability to limit the number of native connections

2014-11-21 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-8086:
-
Attachment: 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt

Please review... This is against 2.1

 Cassandra should have ability to limit the number of native connections
 ---

 Key: CASSANDRA-8086
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8086
 Project: Cassandra
  Issue Type: Bug
Reporter: Vishy Kasar
Assignee: Norman Maurer
 Attachments: 
 0001-CASSANDRA-8086-Allow-to-limit-the-number-of-native-c.txt


 We have a production cluster with 72 instances spread across 2 DCs. We have a 
 large number ( ~ 40,000 ) of clients hitting this cluster. Client normally 
 connects to 4 cassandra instances. Some event (we think it is a schema change 
 on server side) triggered the client to establish connections to all 
 cassandra instances of local DC. This brought the server to its knees. The 
 client connections failed and client attempted re-connections. 
 Cassandra should protect itself from such attack from client. Do we have any 
 knobs to control the number of max connections? If not, we need to add that 
 knob.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-21 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105211#comment-14105211
 ] 

Norman Maurer commented on CASSANDRA-7743:
--

[~benedict] so no netty issue at all?

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1 rc6


 During a long running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example of stacktrace from system.log :
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-node cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests started to fail and we saw the previous 
 stacktraces in the system.log file.
 The output from Linux ‘free’ and ‘meminfo’ suggests that there is still memory 
 available.
 {code}
 $ free -m
               total       used       free     shared    buffers     cached
 Mem:          3702       3532        169          0        161        854
 -/+ buffers/cache:       2516       1185
 Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh:
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.240.98.27    925.17 KB  256     100.0%            41314169-eff5-465f-85ea-d501fd8f9c5e  RAC1
 UN  10.240.137.253  1.1 MB     256     100.0%            c706f5f9-c5f3-4d5e-95e9-a8903823827e  RAC1
 UN  10.240.72.183   896.57 KB  256     100.0%            15735c4d-98d4-4ea4-a305-7ab2d92f65fc  RAC1
 $ echo 

[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-14 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096625#comment-14096625
 ] 

Norman Maurer commented on CASSANDRA-7743:
--

[~benedict] hmm... it should always get returned to the pool that it was 
allocated from. Could you provide me with an easy way to reproduce?

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1.0


 During a long-running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example stacktrace from system.log:
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-node cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests started to fail and we saw the previous 
 stacktraces in the system.log file.
 The output from Linux ‘free’ and ‘meminfo’ suggests that there is still memory 
 available.
 {code}
 $ free -m
               total       used       free     shared    buffers     cached
 Mem:          3702       3532        169          0        161        854
 -/+ buffers/cache:       2516       1185
 Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh:
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.240.98.27    925.17 KB  256     100.0%            41314169-eff5-465f-85ea-d501fd8f9c5e  RAC1
 UN  10.240.137.253  1.1 MB     256     100.0%            c706f5f9-c5f3-4d5e-95e9-a8903823827e  RAC1
 UN  

[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-14 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096634#comment-14096634
 ] 

Norman Maurer commented on CASSANDRA-7743:
--

[~benedict] Yeah, it adds to the cache of the releasing thread, that is right... 
I thought you were talking about returning to the pool.

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1.0


 During a long-running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example stacktrace from system.log:
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-node cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests started to fail and we saw the previous 
 stacktraces in the system.log file.
 The output from Linux ‘free’ and ‘meminfo’ suggests that there is still memory 
 available.
 {code}
 $ free -m
               total       used       free     shared    buffers     cached
 Mem:          3702       3532        169          0        161        854
 -/+ buffers/cache:       2516       1185
 Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh:
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.240.98.27    925.17 KB  256     100.0%            41314169-eff5-465f-85ea-d501fd8f9c5e  RAC1
 UN  10.240.137.253  1.1 MB     256     100.0%            c706f5f9-c5f3-4d5e-95e9-a8903823827e  RAC1
 UN  10.240.72.183   896.57 KB  

[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test

2014-08-14 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096647#comment-14096647
 ] 

Norman Maurer commented on CASSANDRA-7743:
--

[~benedict] well, it will be released after a while if not used. But I think for 
your use-case it would be best to disable the cache, which can be done via the 
PooledByteBufAllocator constructor: just pass in 0 for int tinyCacheSize, int 
smallCacheSize and int normalCacheSize.
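
For illustration, a sketch of that constructor call with the caches disabled;
the arena and page parameters below are plausible defaults rather than tuned
values:

{code}
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;

public final class NoCacheAllocator {
    // Passing 0 for tinyCacheSize, smallCacheSize and normalCacheSize
    // disables the per-thread buffer caches entirely.
    public static ByteBufAllocator create() {
        return new PooledByteBufAllocator(
                true,  // preferDirect: hand out direct buffers by default
                2,     // nHeapArena (illustrative)
                2,     // nDirectArena (illustrative)
                8192,  // pageSize
                11,    // maxOrder -> chunkSize = 8192 << 11 = 16 MiB
                0,     // tinyCacheSize: disable the tiny cache
                0,     // smallCacheSize: disable the small cache
                0);    // normalCacheSize: disable the normal cache
    }
}
{code}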

 Possible C* OOM issue during long running test
 --

 Key: CASSANDRA-7743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Assignee: Benedict
 Fix For: 2.1.0


 During a long-running test, we ended up with a lot of 
 java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra 
 instances.
 Here is an example stacktrace from system.log:
 {code}
 ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - 
 Unexpected exception during request
 java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
 at java.nio.DirectByteBuffer.init(DirectByteBuffer.java:123) 
 ~[na:1.7.0_25]
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) 
 ~[na:1.7.0_25]
 at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) 
 ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
  ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
 at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
 {code}
 The test consisted of a 3-node cluster of n1-standard-1 GCE instances (1 
 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance 
 running the test.
 After ~2.5 days, several requests started to fail and we saw the previous 
 stacktraces in the system.log file.
 The output from Linux ‘free’ and ‘meminfo’ suggests that there is still memory 
 available.
 {code}
 $ free -m
               total       used       free     shared    buffers     cached
 Mem:          3702       3532        169          0        161        854
 -/+ buffers/cache:       2516       1185
 Swap:            0          0          0
 $ head -n 4 /proc/meminfo
 MemTotal:3791292 kB
 MemFree:  173568 kB
 Buffers:  165608 kB
 Cached:   874752 kB
 {code}
 These errors do not affect all the queries we run. The cluster is still 
 responsive but is unable to display tracing information using cqlsh:
 {code}
 $ ./bin/nodetool --host 10.240.137.253 status duration_test
 Datacenter: DC1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  10.240.98.27    925.17 KB  256     100.0%
 

[jira] [Commented] (CASSANDRA-7695) Inserting the same row in parallel causes bad data to be returned to the client

2014-08-13 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095316#comment-14095316
 ] 

Norman Maurer commented on CASSANDRA-7695:
--

Hey guys, I finally found the root cause of the problem and fixed it in Netty. 
That said, I think it is also possible to see the same problem when using 
non-unsafe ByteBufs (if you are lucky enough). This problem was easier to 
reproduce on OSX, as for it to happen you need to produce a series of 
incomplete/complete writes to trigger it, and this is easier with the stock OSX 
network configuration.

The issue and fix can be found here:
https://github.com/netty/netty/issues/2761

 Inserting the same row in parallel causes bad data to be returned to the 
 client
 ---

 Key: CASSANDRA-7695
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7695
 Project: Cassandra
  Issue Type: Bug
 Environment: Linux 3.12.21, JVM 1.7u60
 Cassandra server 2.1.0 RC 5
 Cassandra datastax client version 2.1.0RC1
Reporter: Johan Bjork
Assignee: T Jake Luciani
Priority: Blocker
  Labels: qa-resolved
 Fix For: 2.1.0

 Attachments: 7695-workaround.txt, PutFailureRepro.java, 
 bad-data-tid43-get, bad-data-tid43-put


 Running the attached test program against a Cassandra 2.1 server results in 
 scrambled data returned by the SELECT statement. Running it against the latest 
 stable release works fine.
 Attached:
 * Program that reproduces the failure
 * Example output files from the mentioned test program with the scrambled output.
 Failure mode:
 The value returned by 'get' is scrambled; the size is correct, but some bytes 
 have shifted locations in the returned buffer.
 Cluster info:
 For the test we set up a single Cassandra node using the stock configuration 
 file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-7695) Inserting the same row in parallel causes bad data to be returned to the client

2014-08-13 Thread Norman Maurer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norman Maurer updated CASSANDRA-7695:
-

Attachment: 0001-CASSANDRA-7695-Workaround-Netty-bug-by-not-use-Compo.patch

This diff works around the bug while still allowing the use of unsafe etc.
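
Judging by the patch file name (not using CompositeByteBuf), the general shape
of such a workaround might look like the sketch below: copy header and body
into one freshly allocated buffer instead of composing them. This is only an
illustration under that assumption, not the attached diff:

{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;

public final class FrameCopy {
    // Merge two buffers into a single contiguous buffer, avoiding
    // CompositeByteBuf and therefore the partial-write path that
    // triggered the corruption.
    public static ByteBuf merge(ByteBufAllocator alloc, ByteBuf header, ByteBuf body) {
        ByteBuf merged = alloc.buffer(header.readableBytes() + body.readableBytes());
        merged.writeBytes(header);
        merged.writeBytes(body);
        header.release();
        body.release();
        return merged;
    }
}
{code}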

 Inserting the same row in parallel causes bad data to be returned to the 
 client
 ---

 Key: CASSANDRA-7695
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7695
 Project: Cassandra
  Issue Type: Bug
 Environment: Linux 3.12.21, JVM 1.7u60
 Cassandra server 2.1.0 RC 5
 Cassandra datastax client version 2.1.0RC1
Reporter: Johan Bjork
Assignee: T Jake Luciani
Priority: Blocker
  Labels: qa-resolved
 Fix For: 2.1.0

 Attachments: 
 0001-CASSANDRA-7695-Workaround-Netty-bug-by-not-use-Compo.patch, 
 7695-workaround.txt, PutFailureRepro.java, bad-data-tid43-get, 
 bad-data-tid43-put


 Running the attached test program against a Cassandra 2.1 server results in 
 scrambled data returned by the SELECT statement. Running it against the latest 
 stable release works fine.
 Attached:
 * Program that reproduces the failure
 * Example output files from the mentioned test program with the scrambled output.
 Failure mode:
 The value returned by 'get' is scrambled; the size is correct, but some bytes 
 have shifted locations in the returned buffer.
 Cluster info:
 For the test we set up a single Cassandra node using the stock configuration 
 file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7695) Inserting the same row in parallel causes bad data to be returned to the client

2014-08-13 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095526#comment-14095526
 ] 

Norman Maurer commented on CASSANDRA-7695:
--

[~tjake] yay :)

 Inserting the same row in parallel causes bad data to be returned to the 
 client
 ---

 Key: CASSANDRA-7695
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7695
 Project: Cassandra
  Issue Type: Bug
 Environment: Linux 3.12.21, JVM 1.7u60
 Cassandra server 2.1.0 RC 5
 Cassandra datastax client version 2.1.0RC1
Reporter: Johan Bjork
Assignee: T Jake Luciani
Priority: Blocker
  Labels: qa-resolved
 Fix For: 2.1.0

 Attachments: 
 0001-CASSANDRA-7695-Workaround-Netty-bug-by-not-use-Compo.patch, 
 7695-workaround.txt, PutFailureRepro.java, bad-data-tid43-get, 
 bad-data-tid43-put


 Running the attached test program against a Cassandra 2.1 server results in 
 scrambled data returned by the SELECT statement. Running it against the latest 
 stable release works fine.
 Attached:
 * Program that reproduces the failure
 * Example output files from the mentioned test program with the scrambled output.
 Failure mode:
 The value returned by 'get' is scrambled; the size is correct, but some bytes 
 have shifted locations in the returned buffer.
 Cluster info:
 For the test we set up a single Cassandra node using the stock configuration 
 file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6861) Optimise our Netty 4 integration

2014-05-13 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992883#comment-13992883
 ] 

Norman Maurer commented on CASSANDRA-6861:
--

We will most likely ship our OpenSslEngine in the next release of Netty. That 
said, it only supports server-side usage atm.

 Optimise our Netty 4 integration
 

 Key: CASSANDRA-6861
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6861
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 2.1 rc1


 Now that we've upgraded to Netty 4, we're generating a lot of garbage that could 
 be avoided, so we should probably stop that. It should be reasonably easy to 
 hook into Netty's pooled buffers, returning them to the pool once a given 
 message is completed.
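
A rough sketch of the pattern being described, assuming messages are encoded
into buffers taken from the channel's pooled allocator and released once they
have been fully handled; the class and method names are illustrative:

{code}
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.util.ReferenceCountUtil;

public final class PooledFrames {
    // Allocate the outgoing frame from the channel's (pooled) allocator
    // instead of creating fresh heap garbage per message.
    public static ByteBuf encodeFrame(ChannelHandlerContext ctx, byte[] payload) {
        ByteBuf buf = ctx.alloc().buffer(payload.length);
        buf.writeBytes(payload);
        return buf; // ownership passes to the next handler
    }

    // Once the message is completed, release it so the underlying memory
    // goes back to the pool rather than to the garbage collector.
    public static void afterUse(Object msg) {
        ReferenceCountUtil.release(msg);
    }
}
{code}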



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6861) Optimise our Netty 4 integration

2014-05-11 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993380#comment-13993380
 ] 

Norman Maurer commented on CASSANDRA-6861:
--

Opened a PR with some improvements:
https://github.com/apache/cassandra/pull/35

 Optimise our Netty 4 integration
 

 Key: CASSANDRA-6861
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6861
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 2.1 rc1


 Now that we've upgraded to Netty 4, we're generating a lot of garbage that could 
 be avoided, so we should probably stop that. It should be reasonably easy to 
 hook into Netty's pooled buffers, returning them to the pool once a given 
 message is completed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6236) Update native protocol server to Netty 4

2014-03-31 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13955528#comment-13955528
 ] 

Norman Maurer commented on CASSANDRA-6236:
--

There is now even a recorded video for this:

https://www.youtube.com/watch?v=_GRIyCMNGGI

Anyway, what do you guys think about having me do the heavy work and submit a 
patch?

 Update native protocol server to Netty 4
 

 Key: CASSANDRA-6236
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6236
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Benedict
Priority: Minor
 Fix For: 2.1 beta2


 We should switch to Netty 4 at some point, since it's the future.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6236) Update native protocol server to Netty 4

2014-03-31 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13955542#comment-13955542
 ] 

Norman Maurer commented on CASSANDRA-6236:
--

lol... OK, tell me which branch etc. to start from and I will get back to you 
guys in the next few days ;)

 Update native protocol server to Netty 4
 

 Key: CASSANDRA-6236
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6236
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Benedict
Priority: Minor
 Fix For: 2.1 beta2


 We should switch to Netty 4 at some point, since it's the future.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6235) Improve native protocol server latency

2013-11-04 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13812866#comment-13812866
 ] 

Norman Maurer commented on CASSANDRA-6235:
--

You are still using 3.x, right?

 Improve native protocol server latency
 --

 Key: CASSANDRA-6235
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6235
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
 Attachments: NPTester.java


 The tl;dr is that the native protocol server seems to add some non-negligible 
 latency to operations compared to the thrift server. And the added latency 
 seems to lie within Netty's internals as far as I can tell. I'm not sure what 
 to tweak to try to reduce that.
 The test I ran is simple: it's {{stress -t 1 -L3}}, the Cassandra stress test 
 for insertions with just 1 thread and using CQL-over-thrift (to make things 
 more comparable). What I'm interested in is the average latency. Also, 
 because I don't care about testing the storage engine or even CQL processing, 
 I've disabled the processing of statements: all queries just return an empty 
 result set right away (there's no parsing of the query in particular). The 
 resulting branch is at 
 https://github.com/pcmanus/cassandra/commits/latency-testing (note that 
 there's a trivial patch to have stress show the latency in microseconds).
 With that branch (single node), I get with thrift ~62μs of average latency. 
 That number is actually fairly stable across runs (not doing any real 
 processing helps having consistent performance here).
 For the native protocol, I wanted to eliminate the possibility that the 
 DataStax Java driver was the bottleneck, so I wrote a very simple class 
 (NPTester.java, attached) that emulates the stress test above but with the 
 native protocol. It's not excessively pretty but it's simple (no dependencies, 
 compiles with javac NPTester.java) and it tries to minimize the client-side 
 overhead. It's just a basic loop that writes query frames (serializing them 
 largely manually) and reads the result back. And it measures the latency as 
 close to the socket as possible. Unless I've done something really wrong, it 
 should have less client-side overhead than what stress has.
 With that tester, the average latency I get is ~140μs. This is more than 
 twice that of thrift.
 To try to understand where that additional latency was spent, I 
 instrumented the Frame coder/decoder to record latencies (last commit of 
 the latency-test branch above): it records how long it takes to decode, 
 execute and re-encode the query. The latency for that is ~35μs (as with the 
 other numbers above, this is pretty consistent over runs). Given that my ping 
 on localhost is 30μs, this suggests that Netty spends ~70μs more than the 
 thrift server somewhere while reading and/or writing data on the wire. I've 
 tried profiling it with YourKit but I didn't see anything obvious, so I'm not 
 sure what the problem is, but it sure would be nice to get on par (or at 
 least much closer) with thrift on such a simple test.
 I'll note that if I run the same tests without disabling actual query 
 processing, the tests have a bit more variability, but for thrift I get 
 ~220-230μs latency on average while the NPTester gets ~290-300μs. In other 
 words, there still seems to be that 70μs overhead for the native protocol. 
 Which in that case is still a 30% slowdown. I'll also note that test 
 comparisons with more threads (using the java driver this time) also show the 
 native protocol being slightly slower than thrift (~5-10% slower), and while 
 there might be inefficiencies in the java driver, I'm growing more and more 
 convinced that at least part of it is due to the latency issue described 
 above.
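
To make the measurement idea concrete, here is a sketch of timing one 
request/response round trip as close to the socket as possible; the fixed-size 
framing is a placeholder rather than the real native-protocol encoding, and 
the port is assumed:

{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;

public class LatencyProbe {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("127.0.0.1", 9042)) {
            socket.setTcpNoDelay(true); // avoid Nagle skewing the numbers
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            DataInputStream in = new DataInputStream(socket.getInputStream());

            byte[] request = new byte[64];   // placeholder query frame
            byte[] response = new byte[64];  // placeholder result frame

            long start = System.nanoTime();  // measure right at the socket
            out.write(request);
            out.flush();
            in.readFully(response);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("round trip: " + micros + " us");
        }
    }
}
{code}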



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-2478) Custom CQL protocol/transport

2012-07-05 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407219#comment-13407219
 ] 

Norman Maurer commented on CASSANDRA-2478:
--

@Yuki I think everything is OK... Let me do a final review cycle again today. 

 Custom CQL protocol/transport
 -

 Key: CASSANDRA-2478
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2478
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Eric Evans
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: cql
 Attachments: cql_binary_protocol, cql_binary_protocol-v2


 A custom wire protocol would give us the flexibility to optimize for our 
 specific use-cases, and eliminate a troublesome dependency (I'm referring to 
 Thrift, but none of the others would be significantly better).  Additionally, 
 RPC is a bad fit here, and we'd do better to move in the direction of something 
 that natively supports streaming.
 I don't think this is as daunting as it might seem initially.  Utilizing an 
 existing server framework like Netty, combined with some copy-and-paste of 
 bits from other FLOSS projects would probably get us 80% of the way there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2478) Custom CQL protocol/transport

2012-07-05 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13407260#comment-13407260
 ] 

Norman Maurer commented on CASSANDRA-2478:
--

We released 3.5.2.Final today... not sure if you used it ;)

 Custom CQL protocol/transport
 -

 Key: CASSANDRA-2478
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2478
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Eric Evans
Assignee: Sylvain Lebresne
Priority: Minor
  Labels: cql
 Fix For: 1.2

 Attachments: cql_binary_protocol, cql_binary_protocol-v2


 A custom wire protocol would give us the flexibility to optimize for our 
 specific use-cases, and eliminate a troublesome dependency (I'm referring to 
 Thrift, but none of the others would be significantly better).  Additionally, 
 RPC is a bad fit here, and we'd do better to move in the direction of something 
 that natively supports streaming.
 I don't think this is as daunting as it might seem initially.  Utilizing an 
 existing server framework like Netty, combined with some copy-and-paste of 
 bits from other FLOSS projects would probably get us 80% of the way there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



