[ https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145048#comment-17145048 ]
Sergio Bossa commented on CASSANDRA-15700:
------------------------------------------

[~aleksey] apologies for this late reply. I've pushed the recommended changes, please have a look when you have a moment.

> Performance regression on internode messaging
> ---------------------------------------------
>
>                 Key: CASSANDRA-15700
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Internode
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0-beta
>
>         Attachments: Oss40patchedvsOss311.png, Oss40vsOss311.png, oss40.gc, oss40_nogc.tar.xz, oss40_system.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [~jasonstack] and I have been investigating a performance regression affecting 4.0 during a 3-node, RF 3 write throughput test with a timeseries-like workload, as shown in this plot, where blue is 3.11 and orange is 4.0:
> !Oss40vsOss311.png|width=389,height=214!
> It's been a bit of a long investigation, but two clues ended up standing out:
> 1) An abnormal number of expired messages on 4.0 (as shown in the attached system log), while 3.11 has almost none.
> 2) Abnormal GC activity (as shown in the attached gc log).
> It turns out the two are related, as the [on expired callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462] creates a huge number of strings in the {{id()}} call.
> The next question is what causes all those message expirations; we thoroughly reviewed the internode messaging code, and the only issue we could find so far is related to the "batch pruning" calls [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L81] and [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L188]: it _seems_ too much time is spent on those, causing the event loop to fall behind in processing the rest of the messages, which then end up expired. This is supported by the analysis of the collapsed stacks (after fixing the GC issue):
> {noformat}
> (tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
> org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
> org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
> org/apache/cassandra/net/PrunableArrayQueue:prune 1621
> org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
> {noformat}
> Those are the top 5 sampled calls from {{EventLoopDelivery#doRun()}}, which spends half of its time pruning.
> But only a tiny portion of that pruning time is spent actually expiring:
> {noformat}
> (tprint (top-aggregated-calls oss40nogc "OutboundMessageQueue:pruneInternalQueueWithLock" 5))
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1900
> org/apache/cassandra/net/PrunableArrayQueue:prune 1894
> org/apache/cassandra/net/OutboundMessageQueue$1Pruner:onPruned 147
> org/apache/cassandra/net/OutboundConnection$$Lambda$444/740904487:accept 147
> org/apache/cassandra/net/OutboundConnection:onExpired 147
> {noformat}
> And indeed, the {{PrunableArrayQueue:prune()}} self time is dominant:
> {noformat}
> (tprint (top-self-calls oss40nogc "PrunableArrayQueue:prune" 5))
> org/apache/cassandra/net/PrunableArrayQueue:prune 1718
> org/apache/cassandra/net/OutboundConnection:releaseCapacity 27
> java/util/concurrent/ConcurrentHashMap:replaceNode 19
> java/util/concurrent/ConcurrentLinkedQueue:offer 16
> java/util/concurrent/LinkedBlockingQueue:offer 15
> {noformat}
> That said, before proceeding with a PR to fix those issues, I'd like to understand: what's the reason to prune so often, rather than just when polling messages during delivery? If there's a reason I'm missing, let's talk about how to optimize pruning; otherwise, let's get rid of it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
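The GC pressure from the expiration callback described in the quoted report can be illustrated with a small, hypothetical sketch (the `Message` and `ExpiredCallback` classes below are illustrative stand-ins, not the actual Cassandra code): formatting the id string eagerly on every expiration allocates a fresh `String` per message even when nothing consumes it, while handing the consumer a `Supplier` defers the allocation off the hot path.

```java
import java.util.function.Supplier;

// Illustrative sketch only: these are NOT the real Cassandra classes.
final class Message {
    final long id;
    Message(long id) { this.id = id; }

    // Like the id() call in the report, this builds a new String each time.
    String id() { return "msg-" + id; }
}

final class ExpiredCallback {
    static long eagerCalls = 0;

    // Eager: one String allocated per expired message, even if it is
    // never logged or otherwise observed.
    static void onExpiredEager(Message m) {
        String s = m.id();   // fresh allocation on every expiration
        eagerCalls++;
    }

    // Lazy alternative: the String is only built if/when get() is
    // actually invoked (e.g. when the relevant log level is enabled).
    static Supplier<String> onExpiredLazy(Message m) {
        return m::id;
    }
}
```

Under heavy expiration the eager path is exactly the "huge number of strings" the report attributes to the abnormal GC activity.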
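The batch-pruning cost the profile points at can be sketched with a simplified stand-in for `PrunableArrayQueue` (the `PruningQueue` and `Msg` names below are made up for illustration, not the real implementation): each prune pass walks every queued element under the lock and invokes a callback per expired entry, so the pass costs O(n) regardless of how few messages actually expired, which is consistent with `prune()` self time dominating while `onExpired` accounts for only a small fraction of samples.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.function.Consumer;

// Simplified stand-in for PrunableArrayQueue: not the real implementation.
final class PruningQueue {
    record Msg(long id, long expiresAtNanos) {}

    private final ArrayDeque<Msg> queue = new ArrayDeque<>();

    void offer(Msg m) { queue.offer(m); }

    int size() { return queue.size(); }

    // Walks every element and returns how many were pruned. Each call
    // is O(n) even when nothing (or almost nothing) has expired, which
    // is why frequent prune passes on the event loop fall behind.
    int prune(long nowNanos, Consumer<Msg> onPruned) {
        int pruned = 0;
        Iterator<Msg> it = queue.iterator();
        while (it.hasNext()) {
            Msg m = it.next();
            if (m.expiresAtNanos() <= nowNanos) {
                it.remove();        // drop the expired entry
                onPruned.accept(m); // per-message expiry callback
                pruned++;
            }
        }
        return pruned;
    }
}
```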
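The alternative raised at the end of the report, expiring messages only when they are polled during delivery, might look roughly like this (again a simplified model with invented names, not a proposed patch): `poll()` skips over expired entries as it encounters them, so expiry work is amortized into delivery instead of being paid in separate full-queue prune passes.

```java
import java.util.ArrayDeque;

// Sketch of prune-at-poll: expired entries are discarded lazily as the
// consumer polls. Simplified model, not the actual Cassandra change.
final class LazyExpiryQueue {
    record Msg(long id, long expiresAtNanos) {}

    private final ArrayDeque<Msg> queue = new ArrayDeque<>();
    private long expiredCount = 0;

    void offer(Msg m) { queue.offer(m); }

    long expiredCount() { return expiredCount; }

    // Returns the next live message, silently dropping any expired ones
    // encountered on the way; null when the queue is drained.
    Msg poll(long nowNanos) {
        Msg m;
        while ((m = queue.poll()) != null) {
            if (m.expiresAtNanos() > nowNanos)
                return m;         // still live: deliver it
            expiredCount++;       // expired: drop and keep scanning
        }
        return null;
    }
}
```

The trade-off (and a plausible reason the separate prune passes exist) is that a lazily expired queue can hold dead entries, and the memory and capacity they pin, until delivery happens to reach them.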