[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843339#comment-16843339 ]
Sumanth Pasupuleti commented on CASSANDRA-15013: ------------------------------------------------ Thanks for the additional feedback. I've made following further changes to the patch: * Removed requestsProcessed metric. * Added BackpressureDeployed metric to indicate how many times server attempted to apply backpressure. * EndpointPayloadTracker manages requestPayloadInFlightPerEndpoint, and an external consumer of EndpointPayloadTracker calls a static get that internally does tryRef * EndpointAndGlobal.release returns ABOVE_LIMIT or BELOW_LIMIT EndpointPayloadTracker is the only accessor of globalRequestPayloadInFlight, however, it did not seem to make semantic sense of moving globalRequestPayloadInFlight as a member of EndpointPayloadTracker. Regarding starting/stopping all channels for an endpoint at once, I agree it would make the system better respect the limits than otherwise letting each channel cross once (incase of backpressure mode). However, since each channel's connection properties maybe different (THROW_ON_OVERLOAD vs BACKPRESSURE), we will then have to check the property and make a decision accordingly. I am inclining towards punting this for now. Let me know what you think. Regarding per-client limit, I see it as a good logical extension to this patch, and I would like to tackle it as a separate ticket. Also, as an additional ticket, I would like to explore extending this concurrency limitting patch to go beyond the current parameter of incoming payload which works well for writes, but not necessarily for reads, maybe considering the response payload and/or no. of concurrent tasks in flight. > Message Flusher queue can grow unbounded, potentially running JVM out of > memory > ------------------------------------------------------------------------------- > > Key: CASSANDRA-15013 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15013 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Client > Reporter: Sumanth Pasupuleti > Assignee: Sumanth Pasupuleti > Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 3.0.x, 3.11.x > > Attachments: BlockedEpollEventLoopFromHeapDump.png, > BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap > dump showing each ImmediateFlusher taking upto 600MB.png > > > This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue > bounded, since, in the current state, items get added to the queue without > any checks on queue size, nor with any checks on netty outbound buffer to > check the isWritable state. > We are seeing this issue hit our production 3.0 clusters quite often. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org