[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768865#comment-16768865 ]
Sumanth Pasupuleti edited comment on CASSANDRA-15013 at 2/15/19 2:03 AM:
-------------------------------------------------------------------------

[~benedict] Your theory seems to be spot on (I have all the evidence supporting it from the heap dumps and thread dumps now).

* Evidence of requestExecutor queue full (indicated by taskPermit), and all 128 workers busy (indicated by workPermit)
[ !RequestExecutorQueueFull.png|thumbnail! ]
* Evidence of blocked epollEventLoopGroup threads (from heap)
[ [^BlockedEpollEventLoopFromHeapDump.png] ]
* Evidence of blocked epollEventLoopGroup threads (from thread dump)
[ [^BlockedEpollEventLoopFromThreadDump.png] ]

> Message Flusher queue can grow unbounded, potentially running JVM out of
> memory
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Major
>             Fix For: 4.0, 3.0.x, 3.11.x
>
>         Attachments: BlockedEpollEventLoopFromHeapDump.png,
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap
> dump showing each ImmediateFlusher taking upto 600MB.png,
> image-2019-02-14-17-59-50-794.png
>
>
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue
> bounded, since, in the current state, items
> get added to the queue without any check on the queue size, and without
> checking the isWritable state of the netty outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.
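
To illustrate the two checks the description says are missing, here is a minimal sketch of a bounded queue that refuses new items when it is full or when the channel reports it is not writable. It is not the Cassandra Flusher and not the fix attached to this ticket: the class name BoundedFlushQueue, its methods, and the BooleanSupplier standing in for netty's Channel.isWritable() are all hypothetical, and what the caller should do on rejection (block, shed load, stop reading from the socket) is deliberately left open.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; none of these names come from Cassandra.
final class BoundedFlushQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final int capacity;
    // Stand-in for netty's Channel.isWritable(); supplied by the caller.
    private final BooleanSupplier channelWritable;

    BoundedFlushQueue(int capacity, BooleanSupplier channelWritable) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.channelWritable = channelWritable;
    }

    // Enqueues the item and returns true, or returns false without
    // enqueueing when the queue is full or the channel is not writable.
    synchronized boolean offer(T item) {
        if (queue.size() >= capacity || !channelWritable.getAsBoolean()) {
            return false;
        }
        queue.add(item);
        return true;
    }

    // Removes and returns everything currently queued, in insertion order.
    synchronized List<T> drain() {
        List<T> batch = new ArrayList<>(queue);
        queue.clear();
        return batch;
    }

    synchronized int size() {
        return queue.size();
    }
}
```

With a capacity of 2, the first two offer() calls succeed and the third returns false, so memory held by the queue stays bounded by the capacity instead of growing with the backlog.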