[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-08 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506462#comment-16506462
 ] 

Jay Zhuang commented on CASSANDRA-13929:


Thanks [~jasobrown] again for the review. Committed as 
[{{ed5f834}}|https://github.com/apache/cassandra/commit/ed5f8347ef0c7175cd96e59bc8bfaf3ed1f4697a].

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.3
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-08 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505984#comment-16505984
 ] 

Jason Brown commented on CASSANDRA-13929:
-

[~jay.zhuang] +1

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.3
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-07 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505023#comment-16505023
 ] 

Jay Zhuang commented on CASSANDRA-13929:


Thanks [~jasobrown] for the review and fix.

Yes, null out the {{values}} array before reusing is a good practice. +1 on 
that. Seems the dTest is failed for 3.11 branch, but I don't think it's caused 
by this patch. Please let me know if I could commit.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.3
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-07 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504890#comment-16504890
 ] 

Jason Brown commented on CASSANDRA-13929:
-

[~tsteinmaurer] ok, not a problem. As long as you don't see streaming getting 
obviously slower (1 hour -> 2 hours, for example), I'll take that as a data 
point.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.3
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-07 Thread Thomas Steinmaurer (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504873#comment-16504873
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


[~jasobrown], sorry I don't have any streaming benchmarks before/after.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.3
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-07 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504842#comment-16504842
 ] 

Jason Brown commented on CASSANDRA-13929:
-

running [~jay.zhuang]'s patch with a few minor cleanups:

||3.11||trunk||
|[branch|https://github.com/jasobrown/cassandra/tree/13929-3.11]|[branch|https://github.com/jasobrown/cassandra/tree/13929-trunk]|
|[utests & 
dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/13929-3.11]|[utests
 & 
dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/13929-trunk]|
||


> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.3
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-07 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504733#comment-16504733
 ] 

Jason Brown commented on CASSANDRA-13929:
-

[~tsteinmaurer] if you've been running with some version of this patch (where 
recycling is not used), do you see any degradation in streaming?

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.3
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-07 Thread Jason Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504704#comment-16504704
 ] 

Jason Brown commented on CASSANDRA-13929:
-

On the whole, this looks pretty good. I'm running tests locally now, and will 
run in circleci shortly. 

[~jay.zhuang] There's one thing I'm wondering: in {{#reuse{}}}, we do not 
explicitly null out the {{values}} array, like we used to do in {{#cleanup()}}:

{code}
Arrays.fill(values, null);
{code}

While every access of {{values}} seems safe, and uses the {{count}} to ensure 
proper sizing of any built {{Object[]}}, I'd like to be over-cautious and null 
out the array to prevent the possibility of data leak wherein a reuse of 
Builder erroneously gets extra data from the last use. This error doesn't seem 
like it's possible in the current code, but I'd like to future-proof this - I'd 
be a complete nightmare to debug. 

wdyt?



> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.3
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-06-05 Thread Cyril Scetbon (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502546#comment-16502546
 ] 

Cyril Scetbon commented on CASSANDRA-13929:
---

Hey guys, any news on that issue ?

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-03-04 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385610#comment-16385610
 ] 

Jay Zhuang commented on CASSANDRA-13929:


Hi [~tjake], are you interested in reviewing the patch? The trunk uTest failure 
is because of CASSANDRA-14119.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-23 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374200#comment-16374200
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


The following does not include the latest patches from Feb 22, but shows last 
30d on a single node (m4.2xlarge, Xmx12G, CMS) out of our 9 node loadtest 
environment including various tests/patches we have applied.
 !cassandra_heapcpu_memleak_patching_test_30d.png|width=1280!
 * Blue line => AVG heap utilization
 * Orange line => AVG CPU utilization (not really related as usually compaction 
is overlaying anything else most likely)

Following timelines in the chart:
||Timeframe||Deployment||Comment/Result||
|Jan 25 - Feb 1|Cassandra 3.11 public + Netty 4.0.55|(!) Heap utilization 
increase|
|Feb 1 - Feb 6|Cassandra 3.11 public + Netty 4.0.55 + limiting Netty capacity 
per Thread|(!) Heap utilization increase|
|Feb 6 - Feb 14|Cassandra 3.11 public + Netty 4.0.55 + my recycleHandle = null 
patch|(/) Heap utilization stable|
|Feb 14 - Feb 23|Cassandra 3.11 public + Netty 4.0.55 + *without* recycleHandle 
= null patch + first [~jay.zhuang] patch from Feb 13|(/) Heap utilization 
stable, but slightly increased to previous|

Very high-level (although from the field) compared to [~jay.zhuang] tests and 
benchmarks, but possibly useful for a decision process, hopefully being 
included in 3.11.3. Thanks guys!

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> cassandra_heapcpu_memleak_patching_test_30d.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-23 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374138#comment-16374138
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

Just a general comment The recycler only makes sense to use if creating the 
object is considered very expensive and or if you create / destroy a lot of 
these very frequently. Which means usually thousands per second.  So if this is 
not the case here I think it completely reasonable to not use the Recycler at 
all... As I have no idea really about the use-case I am just leave this here as 
general comment :)

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-22 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373845#comment-16373845
 ] 

Jay Zhuang commented on CASSANDRA-13929:


I tested the change with {{LongStreamingTest}}, seems no impact with/without 
{{Recycler}}:
{noformat}
With Recycler
[junit] ERROR [main] 2018-02-22 16:57:56,189 SubstituteLogger.java:250 - 
Writer finished after 22 seconds
[junit] ERROR [main] 2018-02-22 16:58:16,480 SubstituteLogger.java:250 - 
Finished Streaming in 20.29 seconds: 23.66 Mb/sec
[junit] ERROR [main] 2018-02-22 16:58:35,921 SubstituteLogger.java:250 - 
Finished Streaming in 19.44 seconds: 24.69 Mb/sec
[junit] ERROR [main] 2018-02-22 16:59:25,719 SubstituteLogger.java:250 - 
Finished Compacting in 49.80 seconds: 19.44 Mb/sec

No Recycler
[junit] ERROR [main] 2018-02-22 16:50:48,255 SubstituteLogger.java:250 - 
Writer finished after 22 seconds
[junit] ERROR [main] 2018-02-22 16:51:08,209 SubstituteLogger.java:250 - 
Finished Streaming in 19.95 seconds: 24.06 Mb/sec
[junit] ERROR [main] 2018-02-22 16:51:27,624 SubstituteLogger.java:250 - 
Finished Streaming in 19.41 seconds: 24.72 Mb/sec
[junit] ERROR [main] 2018-02-22 16:52:16,900 SubstituteLogger.java:250 - 
Finished Compacting in 49.28 seconds: 19.48 Mb/sec
{noformat}

Here is the patch, please review:
| Branch | uTest |
| [13929-3.11|https://github.com/cooldoger/cassandra/tree/13929-3.11] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/13929-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/13929-3.11]
 |
| [13929-trunk|https://github.com/cooldoger/cassandra/tree/13929-trunk] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/13929-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/13929-trunk]
 |

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-22 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373549#comment-16373549
 ] 

T Jake Luciani commented on CASSANDRA-13929:


I'd prefer to reduce the array size vs removing but you have been at this for 
so long I'm starting to think it's not worth keeping around.  I'd like to 
re-setup the test I had to check the improvement of CASSANDRA-9766 and see how 
much of a impact it has.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-22 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373186#comment-16373186
 ] 

Jay Zhuang commented on CASSANDRA-13929:


Hi [~tjake], any suggestion on that? Should we remove {{Recycler}} or just 
reduce the large builder array size?

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-19 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369517#comment-16369517
 ] 

Jay Zhuang commented on CASSANDRA-13929:


Seems like the performance is better without {{Recycler}}. Here is microbench 
test result to build a BTree with and without {{Recycler}} 
([13929-3.11-perf|https://github.com/cooldoger/cassandra/tree/13929-3.11-perf]) 
(The score is operation per ms, higher is better)
{noformat}
 [java] Benchmark  (dataSize)  (treeBuilder)   
Mode  Cnt  Score   Error   Units
 [java] BTreeBuildBench.buildTreeTest   1  treeBuilderRecycleAdd  
thrpt6  23112.102 ? 17522.471  ops/ms
 [java] BTreeBuildBench.buildTreeTest   1 treeBuilderAdd  
thrpt6  46275.541 ? 60422.458  ops/ms
 [java] BTreeBuildBench.buildTreeTest   2  treeBuilderRecycleAdd  
thrpt6  23588.176 ? 16372.260  ops/ms
 [java] BTreeBuildBench.buildTreeTest   2 treeBuilderAdd  
thrpt6  42838.298 ? 25339.870  ops/ms
 [java] BTreeBuildBench.buildTreeTest   5  treeBuilderRecycleAdd  
thrpt6  24358.111 ? 24339.382  ops/ms
 [java] BTreeBuildBench.buildTreeTest   5 treeBuilderAdd  
thrpt6  60074.551 ? 47329.418  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  treeBuilderRecycleAdd  
thrpt6  21412.578 ?  6072.160  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10 treeBuilderAdd  
thrpt6  50862.304 ? 30597.546  ops/ms
 [java] BTreeBuildBench.buildTreeTest  20  treeBuilderRecycleAdd  
thrpt6  15871.754 ?  5036.739  ops/ms
 [java] BTreeBuildBench.buildTreeTest  20 treeBuilderAdd  
thrpt6  33699.725 ? 10857.366  ops/ms
 [java] BTreeBuildBench.buildTreeTest  40  treeBuilderRecycleAdd  
thrpt6   5168.225 ?  1212.571  ops/ms
 [java] BTreeBuildBench.buildTreeTest  40 treeBuilderAdd  
thrpt6   8806.838 ?  7736.485  ops/ms
 [java] BTreeBuildBench.buildTreeTest 100  treeBuilderRecycleAdd  
thrpt6   2114.218 ?   639.589  ops/ms
 [java] BTreeBuildBench.buildTreeTest 100 treeBuilderAdd  
thrpt6   3213.333 ?   486.126  ops/ms
 [java] BTreeBuildBench.buildTreeTest1000  treeBuilderRecycleAdd  
thrpt6335.523 ?   101.230  ops/ms
 [java] BTreeBuildBench.buildTreeTest1000 treeBuilderAdd  
thrpt6386.678 ?   333.534  ops/ms
 [java] BTreeBuildBench.buildTreeTest   1  treeBuilderRecycleAdd  
thrpt6 35.644 ?32.171  ops/ms
 [java] BTreeBuildBench.buildTreeTest   1 treeBuilderAdd  
thrpt6 44.250 ? 8.180  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10  treeBuilderRecycleAdd  
thrpt6  3.073 ? 2.165  ops/ms
 [java] BTreeBuildBench.buildTreeTest  10 treeBuilderAdd  
thrpt6  4.651 ? 4.137  ops/ms
{noformat}
And I think the performance gain for CASSANDRA-9766 is not because of the 
{{Recycler}} for BTree builder.

With {{Recycler}}, it does reduce the GC by reserving the memory. But for P95 
and sometime P99, {{noRecycler}} is still better. Here is the test result with 
percentiles (score is time per operation, so smaller is better):
{noformat}
 [java] Benchmark(dataSize) 
 (treeBuilder)Mode   Cnt  Score   Error  Units
 [java] BTreeBuildBench.buildTreeTest 1  
treeBuilderRecycleAdd  sample   9244739   1435.033 ?   111.857  ns/op
 [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.00 1  
treeBuilderRecycleAdd  sample   92.000  ns/op
 [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.50 1  
treeBuilderRecycleAdd  sample  638.000  ns/op
 [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.90 1  
treeBuilderRecycleAdd  sample 1688.000  ns/op
 [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.95 1  
treeBuilderRecycleAdd  sample 2292.000  ns/op
 [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.99 1  
treeBuilderRecycleAdd  sample 4544.000  ns/op
 [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.9991  
treeBuilderRecycleAdd  sample13792.000  ns/op
 [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.   1  
treeBuilderRecycleAdd  sample   124233.984  ns/op
 [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p1.00 1  
treeBuilderRecycleAdd  sample104202240.000  ns/op
 [java] BTreeBuildBench.buildTreeTes

[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-14 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363974#comment-16363974
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

[~tsteinmaurer] yeah thanks I see... So looks more like a misusage for me then 
a netty bug. Cassandra may also consider to configure the `Recycler` with a 
more sane default value for this use-case (via the constructor).

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-14 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363884#comment-16363884
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


[~norman], a concrete example for a single Recycler$Stack instance:
!memleak_heapdump_recyclerstack.png|width=300!

Looks like we have a lot of largish 32K object arrays, which seem to be empty 
as shallow heap = retained heap for this object arrays. So, a quick look on the 
patch of [~jay.zhuang], this might help. [~jjirsa], will re-patch a single node 
and report back after a considerable uptime.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png, memleak_heapdump_recyclerstack.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-13 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363597#comment-16363597
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

[~jay.zhuang] I would be interested what is contained in the `Stack` itself

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-13 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363596#comment-16363596
 ] 

Jeff Jirsa commented on CASSANDRA-13929:


[~tsteinmaurer] - available to test?

Patch looks straight forward to me, but any volunteers to review? Given its 
impact, probably deserves a more thorough review than I have time for. 




> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-12 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361890#comment-16361890
 ] 

Jay Zhuang commented on CASSANDRA-13929:


After increasing the heap size from 1G to 4G heap size, it could support 80 
such requests at the same time 
({{[large_in_test.py:12|https://github.com/cooldoger/cassandra-dtest/blob/13929/large_in_test.py#L12]}}),
 then the {{$Stack}} uses about 70% heap:
 !dtest_example_80_request.png! 

Tried the patch, it did fix the problem:
 !dtest_example_80_request_fix.png! 

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, 
> dtest_example_80_request.png, dtest_example_80_request_fix.png, 
> dtest_example_heap.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-12 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361873#comment-16361873
 ] 

Jay Zhuang commented on CASSANDRA-13929:


I think it's just caching the largest tree ever built 
({{[BTree.java:907|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTree.java#L907]}})
 and getting bigger and bigger.

Here is an example with a large number of {{IN}} in a CQL, which causes high 
heap usage for {{$Stack}} :
{noformat}
session.execute_async("select id from test.users where id in (%s)" % 
','.join(map(str, range(10
{noformat}
Here is a dtest to reproduce that: 
{{[large_in_test.py|https://github.com/cooldoger/cassandra-dtest/blob/13929/large_in_test.py]}}
{{$Stack}} uses 25% 10M heap: {{10M = 10 (thread) * 100K (element) * 8 (obj 
size)}}
!dtest_example_heap.png! 
We could increase heap size and then increase thread number, element number 
gradually to make {{$Stack}} to 80%+ heap usage. And I'm sure there're other 
use cases could build large BTree. Seems the {{$Stack}} is unable to release 
until the {{Thread}} is released and we don't that for {{ThreadPoolExecutor}}.

I'd like to propose the following patch which cleans large BTree builder ({{> 
1k elements}}) while recycling:
| Branch | uTest |
| [13929-3.11|https://github.com/cooldoger/cassandra/tree/13929-3.11] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/13929-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/13929-3.11]
 |


> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, dtest_example_heap.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-04 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352117#comment-16352117
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


I have attached a new heap utilization chart for our 9 node loadtest 
environment.
 !cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png|width=300! 

The marked node is running with the following configuration:
* Cassandra 3.11.1 public release
* Netty 4.0.55
* -Dio.netty.recycler.maxCapacityPerThread=1024

This still results in a slow increase in AVG heap utilization

The other 8 nodes are running with 3.11.2 built from source with my 
recycleHandle "nulling" patch.

[~norman], I need to internally check if we need to do that via some sort of 
NDA then. Will report back. Thanks.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png, 
> cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-01 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348723#comment-16348723
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

[~tsteinmaurer] what about a heap dump ? Is this something you could provide ?

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-01 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348576#comment-16348576
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


[~jay.zhuang], will let the node (1 out of 9) - previously, manually upgraded 
to Netty 4.0.55 - running with your mentioned Netty JVM option over the 
weekend. Thanks.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-02-01 Thread Norman Maurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348181#comment-16348181
 ] 

Norman Maurer commented on CASSANDRA-13929:
---

yeah 4.0.55.Final should have the "fix" as well:

 

[https://github.com/netty/netty/commit/b386ee3eaf35abd5072992d626de6ae2ccadc6d9#diff-23eafd00fcd66829f8cce343b26c236a]

 

That said maybe there are other issues. Would it be possible to share a 
heap-dump ?

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-01-31 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347760#comment-16347760
 ] 

Jay Zhuang commented on CASSANDRA-13929:


Hi [~tsteinmaurer], as [~tjake] suggested, would you please try to reduce the 
Recycler Capacity Per Thread 
([Recycler.java:64|https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/Recycler.java#L64]),
 just restart the node with JVM option:
{noformat}
-Dio.netty.recycler.maxCapacityPerThread=1024
{noformat}



> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-01-31 Thread Nate McCall (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347629#comment-16347629
 ] 

Nate McCall commented on CASSANDRA-13929:
-

If [~norman] is around, would be great to get his input on whether this might 
be a leak or tuning issue with recycler. Thanks for your patience and 
willingness to test this out [~tsteinmaurer]. 

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-01-31 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346779#comment-16346779
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


[~jjirsa], had 3.11.1 with Netty 4.0.55 on one node in parallel with 3.11.1 
incl. "nulling patch" and Netty 4.0.44 on other nodes running over the weekend. 
Unpatched combo with Netty 4.0.55 unfortunately still shows an AVG heap 
utilization increase over 72hrs, while the patched show a constant AVG heap 
usage.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-01-25 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340152#comment-16340152
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


Makes sense. Restarted the single node with Netty 4.0.55 now.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-01-25 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340146#comment-16340146
 ] 

Jeff Jirsa commented on CASSANDRA-13929:


h3. 
[netty-4.0.55|https://github.com/netty/netty/releases/tag/netty-4.0.55.Final] 
may be a better test, since 3.11.1 is running netty 4.0.44 now, 4.0.55 would be 
on the same major and appears to have many of the similar fixes to Recycler.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-01-25 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340134#comment-16340134
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


[~jjirsa], thanks for the follow-up. Looks like a simple netty jar file 
replacement works, at least Cassandra starts up without any exceptions. I have 
now 1 of 9 nodes running with 3.11.1 unpatched + Netty 4.1.20.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-01-25 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339863#comment-16339863
 ] 

Jeff Jirsa commented on CASSANDRA-13929:


I chatted offline with Scott (who is far more familiar with the netty project 
than I am), and he noted that they've fixed a few recent bugs in that area of 
the code that COULD resolve leaks. I'm not confident this is actually a leak, 
vs just the recycler working as intended (but needing to be tuned), but perhaps 
we can consider bumping netty anyway (at least in trunk this is an easy 
decision, but it'd be interesting to find if netty 4.1.20 fixes the behavior 
you see in unpatched 3.11.1 [~tsteinmaurer] ).

 

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2018-01-22 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334780#comment-16334780
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


[~mshuler], my patch would look like as mentioned in the description, but I'm 
pretty sure it will get rejected, cause [~tjake] mentioned that this all has 
been added to cache something, especially useful for streaming, if I remember 
correctly. Due to lack of knowledge about the inner workings here, I can't 
provide anything more useful. Our loadtest environment is running a locally 
patched build with the nulling approach.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-12-11 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286047#comment-16286047
 ] 

Michael Shuler commented on CASSANDRA-13929:


I set the fix version to {{3.11.x}} to indicate this is intended for the 3.11 
series for you.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Fix For: 3.11.x
>
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-12-11 Thread Michael Shuler (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286044#comment-16286044
 ] 

Michael Shuler commented on CASSANDRA-13929:


You may have some better success getting eyes on this JIRA by attaching a 
proper patch for the 3.11 branch (and a separate trunk patch, if merge up isn't 
clean). You can do this with a simple text file attachment of the diff, or 
linking to a github branch.

Bonus points for adding a test that reproduces the problem/fix, as well as 
getting test run through circleci.

This JIRA can be assigned to yourself as the author and you can set the status 
to "Patch Available" when there's an actual patch/branch to review. Then you 
can possibly seek out a reviewer on the dev@ mailing list, if it's urgent.

It looks like there are currently 115 Patch Available tickets, so it may still 
take some time, but getting this set up for someone else to review would be a 
great step in getting the process rolling further than just a comment.
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20status%20%3D%20%22Patch%20Available%22

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-12-11 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285908#comment-16285908
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


Yet another ping after 2 months of silence and the issue still being 
unassigned. Is this something which will be handled in the 3.11 series? Thanks!

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-10-13 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203473#comment-16203473
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


Stable for a week now in our 9 node (m4.xlarge) loadtest cluster from a heap 
perspective with entirely disabling the recycler cache.

Any ideas if we can expect something in context of this ticket for 3.11.2? 
Thanks!



> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-10-03 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190865#comment-16190865
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


Ok, if this sort of caching is entirely only useful during streaming, this 
somehow explains why I do not see any difference between the nodes here, cause 
they process regular business and no repair, bootstrapping etc, thus I can't 
comment on the performance gain with the cache (possibly this has been tested 
in CASSANDRA-9766 anyway).

If the cache is useful for streaming, it still would definitely make sense 
IMHO, to make the max capacity configurable via a system property, 
cassandra.yaml or whatever place is preferred here, cause ~ 1,8G heap usage for 
this sort of cache at -XmX8G does not make sense.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-10-03 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190006#comment-16190006
 ] 

T Jake Luciani commented on CASSANDRA-13929:


This was part of CASSANDRA-9766 and addresses one of the main allocation 
culprits during streaming.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-10-03 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189956#comment-16189956
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


I can try a different max value, but what is supposed to be cached here and 
what area should be suffering without the cache? The reason why I'm asking is, 
that in our 9 node cluster, a single node is patched with the discussed change. 
I don't see any difference in CPU usage, GC, request latency etc., thus 
potentially looking at the wrong metrics.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-10-03 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189717#comment-16189717
 ] 

T Jake Luciani commented on CASSANDRA-13929:


The default for recycler is 32k instances per thread. So perhaps change this to 
8192 per thread and see if that makes a difference.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-10-03 Thread Thomas Steinmaurer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189626#comment-16189626
 ] 

Thomas Steinmaurer commented on CASSANDRA-13929:


[~tjake]: Thanks for the feedback about invalidating the cache. Not sure what 
actually is cached here, but without nulling the reference, I do see e.g. ~ 
770K BTree$Builder instances on the heap. Infinite caching still sounds like a 
memory leak to me, but that's nitpicking now. :-)

Thanks again.

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory

2017-10-03 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189587#comment-16189587
 ] 

T Jake Luciani commented on CASSANDRA-13929:


The recycler is meant to cache objects for reuse. By nulling the handler you 
are effectively invalidating the cache every time. 

I don't think this is a leak but perhaps we should limit this cache to hold 
less items. (see the recycler constructor)

> BTree$Builder / io.netty.util.Recycler$Stack leaking memory
> ---
>
> Key: CASSANDRA-13929
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13929
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Thomas Steinmaurer
> Attachments: cassandra_3.11.0_min_memory_utilization.jpg, 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png, 
> cassandra_3.11.1_mat_dominator_classes.png, 
> cassandra_3.11.1_snapshot_heaputilization.png
>
>
> Different to CASSANDRA-13754, there seems to be another memory leak in 
> 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack.
> * heap utilization increase after upgrading to 3.11.0 => 
> cassandra_3.11.0_min_memory_utilization.jpg
> * No difference after upgrading to 3.11.1 (snapshot build) => 
> cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing 
> CASSANDRA-13754, more visible now
> * MAT shows io.netty.util.Recycler$Stack as top contributing class => 
> cassandra_3.11.1_mat_dominator_classes.png
> * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart 
> after ~ 72 hours
> Verified the following fix, namely explicitly unreferencing the 
> _recycleHandle_ member (making it non-final). In 
> _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_
> {code}
> public void recycle()
> {
> if (recycleHandle != null)
> {
> this.cleanup();
> builderRecycler.recycle(this, recycleHandle);
> recycleHandle = null; // ADDED
> }
> }
> {code}
> Patched a single node in our loadtest cluster with this change and after ~ 10 
> hours uptime, no sign of the previously offending class in MAT anymore => 
> cassandra_3.11.1_mat_dominator_classes_FIXED.png
> Can' say if this has any other side effects etc., but I doubt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org