[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506462#comment-16506462 ] Jay Zhuang commented on CASSANDRA-13929: Thanks [~jasobrown] again for the review. Committed as [{{ed5f834}}|https://github.com/apache/cassandra/commit/ed5f8347ef0c7175cd96e59bc8bfaf3ed1f4697a]. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505984#comment-16505984 ] Jason Brown commented on CASSANDRA-13929: - [~jay.zhuang] +1 > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505023#comment-16505023 ] Jay Zhuang commented on CASSANDRA-13929: Thanks [~jasobrown] for the review and fix. Yes, null out the {{values}} array before reusing is a good practice. +1 on that. Seems the dTest is failed for 3.11 branch, but I don't think it's caused by this patch. Please let me know if I could commit. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504890#comment-16504890 ] Jason Brown commented on CASSANDRA-13929: - [~tsteinmaurer] ok, not a problem. As long as you don't see streaming getting obviously slower (1 hour -> 2 hours, for example), I'll take that as a data point. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504873#comment-16504873 ] Thomas Steinmaurer commented on CASSANDRA-13929: [~jasobrown], sorry I don't have any streaming benchmarks before/after. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504842#comment-16504842 ] Jason Brown commented on CASSANDRA-13929: - running [~jay.zhuang]'s patch with a few minor cleanups: ||3.11||trunk|| |[branch|https://github.com/jasobrown/cassandra/tree/13929-3.11]|[branch|https://github.com/jasobrown/cassandra/tree/13929-trunk]| |[utests & dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/13929-3.11]|[utests & dtests|https://circleci.com/gh/jasobrown/workflows/cassandra/tree/13929-trunk]| || > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504733#comment-16504733 ] Jason Brown commented on CASSANDRA-13929: - [~tsteinmaurer] if you've been running with some version of this patch (where recycling is not used), do you see any degradation in streaming? > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504704#comment-16504704 ] Jason Brown commented on CASSANDRA-13929: - On the whole, this looks pretty good. I'm running tests locally now, and will run in circleci shortly. [~jay.zhuang] There's one thing I'm wondering: in {{#reuse{}}}, we do not explicitly null out the {{values}} array, like we used to do in {{#cleanup()}}: {code} Arrays.fill(values, null); {code} While every access of {{values}} seems safe, and uses the {{count}} to ensure proper sizing of any built {{Object[]}}, I'd like to be over-cautious and null out the array to prevent the possibility of data leak wherein a reuse of Builder erroneously gets extra data from the last use. This error doesn't seem like it's possible in the current code, but I'd like to future-proof this - I'd be a complete nightmare to debug. wdyt? > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.3 > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502546#comment-16502546 ] Cyril Scetbon commented on CASSANDRA-13929: --- Hey guys, any news on that issue ? > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385610#comment-16385610 ] Jay Zhuang commented on CASSANDRA-13929: Hi [~tjake], are you interested in reviewing the patch? The trunk uTest failure is because of CASSANDRA-14119. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374200#comment-16374200 ] Thomas Steinmaurer commented on CASSANDRA-13929: The following does not include the latest patches from Feb 22, but shows last 30d on a single node (m4.2xlarge, Xmx12G, CMS) out of our 9 node loadtest environment including various tests/patches we have applied. !cassandra_heapcpu_memleak_patching_test_30d.png|width=1280! * Blue line => AVG heap utilization * Orange line => AVG CPU utilization (not really related as usually compaction is overlaying anything else most likely) Following timelines in the chart: ||Timeframe||Deployment||Comment/Result|| |Jan 25 - Feb 1|Cassandra 3.11 public + Netty 4.0.55|(!) Heap utilization increase| |Feb 1 - Feb 6|Cassandra 3.11 public + Netty 4.0.55 + limiting Netty capacity per Thread|(!) Heap utilization increase| |Feb 6 - Feb 14|Cassandra 3.11 public + Netty 4.0.55 + my recycleHandle = null patch|(/) Heap utilization stable| |Feb 14 - Feb 23|Cassandra 3.11 public + Netty 4.0.55 + *without* recycleHandle = null patch + first [~jay.zhuang] patch from Feb 13|(/) Heap utilization stable, but slightly increased to previous| Very high-level (although from the field) compared to [~jay.zhuang] tests and benchmarks, but possibly useful for a decision process, hopefully being included in 3.11.3. Thanks guys! > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > cassandra_heapcpu_memleak_patching_test_30d.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374138#comment-16374138 ] Norman Maurer commented on CASSANDRA-13929: --- Just a general comment The recycler only makes sense to use if creating the object is considered very expensive and or if you create / destroy a lot of these very frequently. Which means usually thousands per second. So if this is not the case here I think it completely reasonable to not use the Recycler at all... As I have no idea really about the use-case I am just leave this here as general comment :) > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373845#comment-16373845 ] Jay Zhuang commented on CASSANDRA-13929: I tested the change with {{LongStreamingTest}}, seems no impact with/without {{Recycler}}: {noformat} With Recycler [junit] ERROR [main] 2018-02-22 16:57:56,189 SubstituteLogger.java:250 - Writer finished after 22 seconds [junit] ERROR [main] 2018-02-22 16:58:16,480 SubstituteLogger.java:250 - Finished Streaming in 20.29 seconds: 23.66 Mb/sec [junit] ERROR [main] 2018-02-22 16:58:35,921 SubstituteLogger.java:250 - Finished Streaming in 19.44 seconds: 24.69 Mb/sec [junit] ERROR [main] 2018-02-22 16:59:25,719 SubstituteLogger.java:250 - Finished Compacting in 49.80 seconds: 19.44 Mb/sec No Recycler [junit] ERROR [main] 2018-02-22 16:50:48,255 SubstituteLogger.java:250 - Writer finished after 22 seconds [junit] ERROR [main] 2018-02-22 16:51:08,209 SubstituteLogger.java:250 - Finished Streaming in 19.95 seconds: 24.06 Mb/sec [junit] ERROR [main] 2018-02-22 16:51:27,624 SubstituteLogger.java:250 - Finished Streaming in 19.41 seconds: 24.72 Mb/sec [junit] ERROR [main] 2018-02-22 16:52:16,900 SubstituteLogger.java:250 - Finished Compacting in 49.28 seconds: 19.48 Mb/sec {noformat} Here is the patch, please review: | Branch | uTest | | [13929-3.11|https://github.com/cooldoger/cassandra/tree/13929-3.11] | [!https://circleci.com/gh/cooldoger/cassandra/tree/13929-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/13929-3.11] | | [13929-trunk|https://github.com/cooldoger/cassandra/tree/13929-trunk] | [!https://circleci.com/gh/cooldoger/cassandra/tree/13929-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/13929-trunk] | > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373549#comment-16373549 ] T Jake Luciani commented on CASSANDRA-13929: I'd prefer to reduce the array size vs removing but you have been at this for so long I'm starting to think it's not worth keeping around. I'd like to re-setup the test I had to check the improvement of CASSANDRA-9766 and see how much of a impact it has. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373186#comment-16373186 ] Jay Zhuang commented on CASSANDRA-13929: Hi [~tjake], any suggestion on that? Should we remove {{Recycler}} or just reduce the large builder array size? > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369517#comment-16369517 ] Jay Zhuang commented on CASSANDRA-13929: Seems like the performance is better without {{Recycler}}. Here is microbench test result to build a BTree with and without {{Recycler}} ([13929-3.11-perf|https://github.com/cooldoger/cassandra/tree/13929-3.11-perf]) (The score is operation per ms, higher is better) {noformat} [java] Benchmark (dataSize) (treeBuilder) Mode Cnt Score Error Units [java] BTreeBuildBench.buildTreeTest 1 treeBuilderRecycleAdd thrpt6 23112.102 ? 17522.471 ops/ms [java] BTreeBuildBench.buildTreeTest 1 treeBuilderAdd thrpt6 46275.541 ? 60422.458 ops/ms [java] BTreeBuildBench.buildTreeTest 2 treeBuilderRecycleAdd thrpt6 23588.176 ? 16372.260 ops/ms [java] BTreeBuildBench.buildTreeTest 2 treeBuilderAdd thrpt6 42838.298 ? 25339.870 ops/ms [java] BTreeBuildBench.buildTreeTest 5 treeBuilderRecycleAdd thrpt6 24358.111 ? 24339.382 ops/ms [java] BTreeBuildBench.buildTreeTest 5 treeBuilderAdd thrpt6 60074.551 ? 47329.418 ops/ms [java] BTreeBuildBench.buildTreeTest 10 treeBuilderRecycleAdd thrpt6 21412.578 ? 6072.160 ops/ms [java] BTreeBuildBench.buildTreeTest 10 treeBuilderAdd thrpt6 50862.304 ? 30597.546 ops/ms [java] BTreeBuildBench.buildTreeTest 20 treeBuilderRecycleAdd thrpt6 15871.754 ? 5036.739 ops/ms [java] BTreeBuildBench.buildTreeTest 20 treeBuilderAdd thrpt6 33699.725 ? 10857.366 ops/ms [java] BTreeBuildBench.buildTreeTest 40 treeBuilderRecycleAdd thrpt6 5168.225 ? 1212.571 ops/ms [java] BTreeBuildBench.buildTreeTest 40 treeBuilderAdd thrpt6 8806.838 ? 7736.485 ops/ms [java] BTreeBuildBench.buildTreeTest 100 treeBuilderRecycleAdd thrpt6 2114.218 ? 639.589 ops/ms [java] BTreeBuildBench.buildTreeTest 100 treeBuilderAdd thrpt6 3213.333 ? 486.126 ops/ms [java] BTreeBuildBench.buildTreeTest1000 treeBuilderRecycleAdd thrpt6335.523 ? 101.230 ops/ms [java] BTreeBuildBench.buildTreeTest1000 treeBuilderAdd thrpt6386.678 ? 333.534 ops/ms [java] BTreeBuildBench.buildTreeTest 1 treeBuilderRecycleAdd thrpt6 35.644 ?32.171 ops/ms [java] BTreeBuildBench.buildTreeTest 1 treeBuilderAdd thrpt6 44.250 ? 8.180 ops/ms [java] BTreeBuildBench.buildTreeTest 10 treeBuilderRecycleAdd thrpt6 3.073 ? 2.165 ops/ms [java] BTreeBuildBench.buildTreeTest 10 treeBuilderAdd thrpt6 4.651 ? 4.137 ops/ms {noformat} And I think the performance gain for CASSANDRA-9766 is not because of the {{Recycler}} for BTree builder. With {{Recycler}}, it does reduce the GC by reserving the memory. But for P95 and sometime P99, {{noRecycler}} is still better. Here is the test result with percentiles (score is time per operation, so smaller is better): {noformat} [java] Benchmark(dataSize) (treeBuilder)Mode Cnt Score Error Units [java] BTreeBuildBench.buildTreeTest 1 treeBuilderRecycleAdd sample 9244739 1435.033 ? 111.857 ns/op [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.00 1 treeBuilderRecycleAdd sample 92.000 ns/op [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.50 1 treeBuilderRecycleAdd sample 638.000 ns/op [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.90 1 treeBuilderRecycleAdd sample 1688.000 ns/op [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.95 1 treeBuilderRecycleAdd sample 2292.000 ns/op [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.99 1 treeBuilderRecycleAdd sample 4544.000 ns/op [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0.9991 treeBuilderRecycleAdd sample13792.000 ns/op [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p0. 1 treeBuilderRecycleAdd sample 124233.984 ns/op [java] BTreeBuildBench.buildTreeTest:buildTreeTest?p1.00 1 treeBuilderRecycleAdd sample104202240.000 ns/op [java] BTreeBuildBench.buildTreeTes
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363974#comment-16363974 ] Norman Maurer commented on CASSANDRA-13929: --- [~tsteinmaurer] yeah thanks I see... So looks more like a misusage for me then a netty bug. Cassandra may also consider to configure the `Recycler` with a more sane default value for this use-case (via the constructor). > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363884#comment-16363884 ] Thomas Steinmaurer commented on CASSANDRA-13929: [~norman], a concrete example for a single Recycler$Stack instance: !memleak_heapdump_recyclerstack.png|width=300! Looks like we have a lot of largish 32K object arrays, which seem to be empty as shallow heap = retained heap for this object arrays. So, a quick look on the patch of [~jay.zhuang], this might help. [~jjirsa], will re-patch a single node and report back after a considerable uptime. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png, memleak_heapdump_recyclerstack.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363597#comment-16363597 ] Norman Maurer commented on CASSANDRA-13929: --- [~jay.zhuang] I would be interested what is contained in the `Stack` itself > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363596#comment-16363596 ] Jeff Jirsa commented on CASSANDRA-13929: [~tsteinmaurer] - available to test? Patch looks straight forward to me, but any volunteers to review? Given its impact, probably deserves a more thorough review than I have time for. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361890#comment-16361890 ] Jay Zhuang commented on CASSANDRA-13929: After increasing the heap size from 1G to 4G heap size, it could support 80 such requests at the same time ({{[large_in_test.py:12|https://github.com/cooldoger/cassandra-dtest/blob/13929/large_in_test.py#L12]}}), then the {{$Stack}} uses about 70% heap: !dtest_example_80_request.png! Tried the patch, it did fix the problem: !dtest_example_80_request_fix.png! > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, > dtest_example_80_request.png, dtest_example_80_request_fix.png, > dtest_example_heap.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361873#comment-16361873 ] Jay Zhuang commented on CASSANDRA-13929: I think it's just caching the largest tree ever built ({{[BTree.java:907|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/BTree.java#L907]}}) and getting bigger and bigger. Here is an example with a large number of {{IN}} in a CQL, which causes high heap usage for {{$Stack}} : {noformat} session.execute_async("select id from test.users where id in (%s)" % ','.join(map(str, range(10 {noformat} Here is a dtest to reproduce that: {{[large_in_test.py|https://github.com/cooldoger/cassandra-dtest/blob/13929/large_in_test.py]}} {{$Stack}} uses 25% 10M heap: {{10M = 10 (thread) * 100K (element) * 8 (obj size)}} !dtest_example_heap.png! We could increase heap size and then increase thread number, element number gradually to make {{$Stack}} to 80%+ heap usage. And I'm sure there're other use cases could build large BTree. Seems the {{$Stack}} is unable to release until the {{Thread}} is released and we don't that for {{ThreadPoolExecutor}}. I'd like to propose the following patch which cleans large BTree builder ({{> 1k elements}}) while recycling: | Branch | uTest | | [13929-3.11|https://github.com/cooldoger/cassandra/tree/13929-3.11] | [!https://circleci.com/gh/cooldoger/cassandra/tree/13929-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/13929-3.11] | > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png, dtest_example_heap.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352117#comment-16352117 ] Thomas Steinmaurer commented on CASSANDRA-13929: I have attached a new heap utilization chart for our 9 node loadtest environment. !cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png|width=300! The marked node is running with the following configuration: * Cassandra 3.11.1 public release * Netty 4.0.55 * -Dio.netty.recycler.maxCapacityPerThread=1024 This still results in a slow increase in AVG heap utilization The other 8 nodes are running with 3.11.2 built from source with my recycleHandle "nulling" patch. [~norman], I need to internally check if we need to do that via some sort of NDA then. Will report back. Thanks. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png, > cassandra_3.11.1_vs_3.11.2recyclernullingpatch.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348723#comment-16348723 ] Norman Maurer commented on CASSANDRA-13929: --- [~tsteinmaurer] what about a heap dump ? Is this something you could provide ? > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348576#comment-16348576 ] Thomas Steinmaurer commented on CASSANDRA-13929: [~jay.zhuang], will let the node (1 out of 9) - previously, manually upgraded to Netty 4.0.55 - running with your mentioned Netty JVM option over the weekend. Thanks. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348181#comment-16348181 ] Norman Maurer commented on CASSANDRA-13929: --- yeah 4.0.55.Final should have the "fix" as well: [https://github.com/netty/netty/commit/b386ee3eaf35abd5072992d626de6ae2ccadc6d9#diff-23eafd00fcd66829f8cce343b26c236a] That said maybe there are other issues. Would it be possible to share a heap-dump ? > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347760#comment-16347760 ] Jay Zhuang commented on CASSANDRA-13929: Hi [~tsteinmaurer], as [~tjake] suggested, would you please try to reduce the Recycler Capacity Per Thread ([Recycler.java:64|https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/Recycler.java#L64]), just restart the node with JVM option: {noformat} -Dio.netty.recycler.maxCapacityPerThread=1024 {noformat} > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347629#comment-16347629 ] Nate McCall commented on CASSANDRA-13929: - If [~norman] is around, would be great to get his input on whether this might be a leak or tuning issue with recycler. Thanks for your patience and willingness to test this out [~tsteinmaurer]. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346779#comment-16346779 ] Thomas Steinmaurer commented on CASSANDRA-13929: [~jjirsa], had 3.11.1 with Netty 4.0.55 on one node in parallel with 3.11.1 incl. "nulling patch" and Netty 4.0.44 on other nodes running over the weekend. Unpatched combo with Netty 4.0.55 unfortunately still shows an AVG heap utilization increase over 72hrs, while the patched show a constant AVG heap usage. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340152#comment-16340152 ] Thomas Steinmaurer commented on CASSANDRA-13929: Makes sense. Restarted the single node with Netty 4.0.55 now. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340146#comment-16340146 ] Jeff Jirsa commented on CASSANDRA-13929: h3. [netty-4.0.55|https://github.com/netty/netty/releases/tag/netty-4.0.55.Final] may be a better test, since 3.11.1 is running netty 4.0.44 now, 4.0.55 would be on the same major and appears to have many of the similar fixes to Recycler. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340134#comment-16340134 ] Thomas Steinmaurer commented on CASSANDRA-13929: [~jjirsa], thanks for the follow-up. Looks like a simple netty jar file replacement works, at least Cassandra starts up without any exceptions. I have now 1 of 9 nodes running with 3.11.1 unpatched + Netty 4.1.20. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16339863#comment-16339863 ] Jeff Jirsa commented on CASSANDRA-13929: I chatted offline with Scott (who is far more familiar with the netty project than I am), and he noted that they've fixed a few recent bugs in that area of the code that COULD resolve leaks. I'm not confident this is actually a leak, vs just the recycler working as intended (but needing to be tuned), but perhaps we can consider bumping netty anyway (at least in trunk this is an easy decision, but it'd be interesting to find if netty 4.1.20 fixes the behavior you see in unpatched 3.11.1 [~tsteinmaurer] ). > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334780#comment-16334780 ] Thomas Steinmaurer commented on CASSANDRA-13929: [~mshuler], my patch would look like as mentioned in the description, but I'm pretty sure it will get rejected, cause [~tjake] mentioned that this all has been added to cache something, especially useful for streaming, if I remember correctly. Due to lack of knowledge about the inner workings here, I can't provide anything more useful. Our loadtest environment is running a locally patched build with the nulling approach. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer >Priority: Major > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286047#comment-16286047 ] Michael Shuler commented on CASSANDRA-13929: I set the fix version to {{3.11.x}} to indicate this is intended for the 3.11 series for you. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Fix For: 3.11.x > > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286044#comment-16286044 ] Michael Shuler commented on CASSANDRA-13929: You may have some better success getting eyes on this JIRA by attaching a proper patch for the 3.11 branch (and a separate trunk patch, if merge up isn't clean). You can do this with a simple text file attachment of the diff, or linking to a github branch. Bonus points for adding a test that reproduces the problem/fix, as well as getting test run through circleci. This JIRA can be assigned to yourself as the author and you can set the status to "Patch Available" when there's an actual patch/branch to review. Then you can possibly seek out a reviewer on the dev@ mailing list, if it's urgent. It looks like there are currently 115 Patch Available tickets, so it may still take some time, but getting this set up for someone else to review would be a great step in getting the process rolling further than just a comment. https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20status%20%3D%20%22Patch%20Available%22 > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285908#comment-16285908 ] Thomas Steinmaurer commented on CASSANDRA-13929: Yet another ping after 2 months of silence and the issue still being unassigned. Is this something which will be handled in the 3.11 series? Thanks! > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203473#comment-16203473 ] Thomas Steinmaurer commented on CASSANDRA-13929: Stable for a week now in our 9 node (m4.xlarge) loadtest cluster from a heap perspective with entirely disabling the recycler cache. Any ideas if we can expect something in context of this ticket for 3.11.2? Thanks! > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_NORECYCLE_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190865#comment-16190865 ] Thomas Steinmaurer commented on CASSANDRA-13929: Ok, if this sort of caching is entirely only useful during streaming, this somehow explains why I do not see any difference between the nodes here, cause they process regular business and no repair, bootstrapping etc, thus I can't comment on the performance gain with the cache (possibly this has been tested in CASSANDRA-9766 anyway). If the cache is useful for streaming, it still would definitely make sense IMHO, to make the max capacity configurable via a system property, cassandra.yaml or whatever place is preferred here, cause ~ 1,8G heap usage for this sort of cache at -XmX8G does not make sense. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190006#comment-16190006 ] T Jake Luciani commented on CASSANDRA-13929: This was part of CASSANDRA-9766 and addresses one of the main allocation culprits during streaming. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189956#comment-16189956 ] Thomas Steinmaurer commented on CASSANDRA-13929: I can try a different max value, but what is supposed to be cached here and what area should be suffering without the cache? The reason why I'm asking is, that in our 9 node cluster, a single node is patched with the discussed change. I don't see any difference in CPU usage, GC, request latency etc., thus potentially looking at the wrong metrics. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189717#comment-16189717 ] T Jake Luciani commented on CASSANDRA-13929: The default for recycler is 32k instances per thread. So perhaps change this to 8192 per thread and see if that makes a difference. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189626#comment-16189626 ] Thomas Steinmaurer commented on CASSANDRA-13929: [~tjake]: Thanks for the feedback about invalidating the cache. Not sure what actually is cached here, but without nulling the reference, I do see e.g. ~ 770K BTree$Builder instances on the heap. Infinite caching still sounds like a memory leak to me, but that's nitpicking now. :-) Thanks again. > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13929) BTree$Builder / io.netty.util.Recycler$Stack leaking memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189587#comment-16189587 ] T Jake Luciani commented on CASSANDRA-13929: The recycler is meant to cache objects for reuse. By nulling the handler you are effectively invalidating the cache every time. I don't think this is a leak but perhaps we should limit this cache to hold less items. (see the recycler constructor) > BTree$Builder / io.netty.util.Recycler$Stack leaking memory > --- > > Key: CASSANDRA-13929 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13929 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Thomas Steinmaurer > Attachments: cassandra_3.11.0_min_memory_utilization.jpg, > cassandra_3.11.1_mat_dominator_classes_FIXED.png, > cassandra_3.11.1_mat_dominator_classes.png, > cassandra_3.11.1_snapshot_heaputilization.png > > > Different to CASSANDRA-13754, there seems to be another memory leak in > 3.11.0+ in BTree$Builder / io.netty.util.Recycler$Stack. > * heap utilization increase after upgrading to 3.11.0 => > cassandra_3.11.0_min_memory_utilization.jpg > * No difference after upgrading to 3.11.1 (snapshot build) => > cassandra_3.11.1_snapshot_heaputilization.png; thus most likely after fixing > CASSANDRA-13754, more visible now > * MAT shows io.netty.util.Recycler$Stack as top contributing class => > cassandra_3.11.1_mat_dominator_classes.png > * With -Xmx8G (CMS) and our load pattern, we have to do a rolling restart > after ~ 72 hours > Verified the following fix, namely explicitly unreferencing the > _recycleHandle_ member (making it non-final). In > _org.apache.cassandra.utils.btree.BTree.Builder.recycle()_ > {code} > public void recycle() > { > if (recycleHandle != null) > { > this.cleanup(); > builderRecycler.recycle(this, recycleHandle); > recycleHandle = null; // ADDED > } > } > {code} > Patched a single node in our loadtest cluster with this change and after ~ 10 > hours uptime, no sign of the previously offending class in MAT anymore => > cassandra_3.11.1_mat_dominator_classes_FIXED.png > Can' say if this has any other side effects etc., but I doubt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org