[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881762#comment-16881762 ]
Sumanth Pasupuleti edited comment on CASSANDRA-15013 at 7/10/19 6:32 AM:
-------------------------------------------------------------------------

Performance tests were run against two C* clusters, one running the latest trunk and one running the latest trunk plus the 15013 patch. Two NDBench clusters, configured alike to emit similar traffic, were set up to drive load at each of the C* clusters. Each C* cluster is a single region with six i3.8xl nodes, and each NDBench cluster has 450 nodes.

Following is the analysis of the perf run:
# No blocked threadpool in patch, vs blocked threadpool in trunk
!perftest_blockedthreads.png!
# Similar writeops
!perftest_writeops.png!
# Patch does more readops vs trunk
!perftest_readops.png!
# Comparable read and write latencies (99th and avg)
!perftest_readlatency_99th.png!
!perftest_readlatency_avg.png!
!perftest_writelatency_99th.png!
!perftest_writelatency_avg.png!
# Comparable CPU usage
!perftest_cpu_usage.png!
# Comparable heap usage
!perftest_heap_usage.png!
# Connections count (~1000 connections per C* node)
!perftest_connections_count.png!

> Message Flusher queue can grow unbounded, potentially running JVM out of memory
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0, 3.0.x, 3.11.x
>
>         Attachments: BlockedEpollEventLoopFromHeapDump.png,
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png,
> heap dump showing each ImmediateFlusher taking upto 600MB.png,
> perftest_blockedthreads.png, perftest_connections_count.png,
> perftest_cpu_usage.png, perftest_heap_usage.png,
> perftest_readlatency_99th.png, perftest_readlatency_avg.png,
> perftest_readops.png, perftest_writelatency_99th.png,
> perftest_writelatency_avg.png, perftest_writeops.png
>
>
> This is a follow-up ticket to CASSANDRA-14855 to make the Flusher queue
> bounded. In the current state, items are added to the queue without any
> check on the queue size and without checking the isWritable state of the
> netty outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.
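
The two missing checks named in the ticket description (queue size and channel writability) can be illustrated with a minimal sketch. This is not the CASSANDRA-15013 patch and does not reflect Cassandra's actual Flusher code. The class name `BoundedFlusherQueue`, the byte-based limit, and the `BooleanSupplier` standing in for netty's `Channel.isWritable()` are all assumptions made for illustration.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a bounded flush queue; not Cassandra's implementation. */
final class BoundedFlusherQueue {
    private final Queue<byte[]> queued = new ConcurrentLinkedQueue<>();
    private final AtomicLong queuedBytes = new AtomicLong();
    private final long maxQueuedBytes;
    // Assumption: stands in for netty's Channel.isWritable() so the sketch has no dependencies.
    private final BooleanSupplier channelWritable;

    BoundedFlusherQueue(long maxQueuedBytes, BooleanSupplier channelWritable) {
        this.maxQueuedBytes = maxQueuedBytes;
        this.channelWritable = channelWritable;
    }

    /**
     * Tries to enqueue a response. Returns false when the channel is not
     * writable or the byte limit would be exceeded, so the caller can apply
     * backpressure or shed load instead of letting the queue grow.
     */
    boolean offer(byte[] response) {
        if (!channelWritable.getAsBoolean()) {
            return false;
        }
        long size = response.length;
        // Reserve the bytes first, then roll back if the limit is exceeded.
        if (queuedBytes.addAndGet(size) > maxQueuedBytes) {
            queuedBytes.addAndGet(-size);
            return false;
        }
        queued.add(response);
        return true;
    }

    /** Removes the next item to flush and releases its reserved bytes; null if empty. */
    byte[] poll() {
        byte[] item = queued.poll();
        if (item != null) {
            queuedBytes.addAndGet(-item.length);
        }
        return item;
    }

    long queuedBytes() {
        return queuedBytes.get();
    }
}
```

In this sketch a rejected `offer` only returns false. Whether the right response is to pause reads from the client, drop the request, or return an overloaded error is a separate design decision that the sketch leaves to the caller.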