[ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213235#comment-17213235 ]
Michael Semb Wever edited comment on CASSANDRA-15430 at 10/13/20, 5:10 PM:
---------------------------------------------------------------------------

{quote}What I don't understand yet (or perhaps not looked closely enough) is, how MultiCBuilder.build() could benefit from that, cause it won't call BTreeSet.builder with any sort of initialCapacity information, thus falling back to the default Object[] size of 16.{quote}

Yup, taking a look at it now, you are correct [~tsteinmaurer]. {{MultiCBuilder}}'s {{Object[]}} allocations won't be reduced by the {{initialCapacity}} fix from 13929 alone.

Between 3.0 and 3.11
- https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/MultiCBuilder.java#L272
- https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/MultiCBuilder.java#L262

this was fixed by CASSANDRA-10409 in 3.2 (commit: [958aa7c9|https://github.com/apache/cassandra/commit/958aa7c959cb6c2bca4ed62d78974e83a5371787]). That patch is a lot more than just using {{initialCapacity}}, so I wouldn't recommend back-porting it; I'd rather recommend you focus on an upgrade to 3.11 to get around this issue.

The other hotspot of {{Object[]}} allocation in {{MultiCBuilder}} is {{addElementToAll(..)}}. An {{initialCapacity}} parameter to {{MultiCBuilder.create(..)}}, which is called [here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictionSet.java#L163], would help there. But again, that looks like it would only be a hack, given that it was done properly in 10409 [here|https://github.com/apache/cassandra/commit/958aa7c959cb6c2bca4ed62d78974e83a5371787#diff-cf2ba879f46192374c74759aa522cb5e98afb5015605f07091231ec8943e8c5fR69-R73].
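To illustrate why the missing {{initialCapacity}} information matters: a builder backed by an {{Object[]}} that starts at a default size (16 in the discussion above) has to allocate a larger array and copy whenever it outgrows the current one, while a builder told the expected size up front allocates once. The following Java sketch is a hypothetical, simplified model of that effect only; {{SimpleArrayBuilder}}, {{InitialCapacityDemo}} and the doubling growth policy are assumptions for illustration, not Cassandra's {{BTreeSet.builder}} or {{MultiCBuilder}} code.

```java
import java.util.Arrays;

// Hypothetical model of an Object[]-backed builder; NOT Cassandra's BTreeSet.builder or MultiCBuilder.
final class SimpleArrayBuilder {
    static final int DEFAULT_CAPACITY = 16; // default size mentioned in the ticket discussion

    private Object[] values;
    private int size;
    private int arrayAllocations; // number of Object[] instances created so far

    SimpleArrayBuilder() {
        this(DEFAULT_CAPACITY);
    }

    SimpleArrayBuilder(int initialCapacity) {
        values = new Object[Math.max(1, initialCapacity)];
        arrayAllocations = 1;
    }

    SimpleArrayBuilder add(Object value) {
        if (size == values.length) {
            // Assumed growth policy: double the array, which allocates a new Object[] and copies.
            values = Arrays.copyOf(values, values.length * 2);
            arrayAllocations++;
        }
        values[size++] = value;
        return this;
    }

    Object[] build() {
        // Trim to the exact size; this is one more allocation unless the array is already full.
        if (size != values.length) {
            values = Arrays.copyOf(values, size);
            arrayAllocations++;
        }
        return values;
    }

    int arrayAllocations() {
        return arrayAllocations;
    }
}

final class InitialCapacityDemo {
    // Counts Object[] allocations for building `elements` items, with or without a capacity hint.
    static int allocationsFor(int elements, Integer initialCapacity) {
        SimpleArrayBuilder builder = initialCapacity == null
                ? new SimpleArrayBuilder()
                : new SimpleArrayBuilder(initialCapacity);
        for (int i = 0; i < elements; i++) {
            builder.add(i);
        }
        builder.build();
        return builder.arrayAllocations();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("default capacity: " + allocationsFor(n, null)); // prints "default capacity: 8"
        System.out.println("presized: " + allocationsFor(n, n));            // prints "presized: 1"
    }
}
```

In this model, building 1,000 elements from the default size creates 8 arrays (the initial 16-slot array, six doublings up to 1,024 slots, and a final trim), while the presized builder creates one. How much this matters in Cassandra depends on the real number of elements per builder, which the sketch does not model.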
> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15430
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Other
>            Reporter: Thomas Steinmaurer
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x
>
>         Attachments: dashboard.png, jfr_allocations.png, jfr_jmc_2-1.png, jfr_jmc_2-1_obj.png, jfr_jmc_3-0.png, jfr_jmc_3-0_obj.png, jfr_jmc_3-0_obj_obj_alloc.png, jfr_jmc_3-11.png, jfr_jmc_3-11_obj.png, jfr_jmc_4-0-b2.png, jfr_jmc_4-0-b2_obj.png, mutation_stage.png, screenshot-1.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> In a 6-node load-test cluster, we have been running a certain production-like workload constantly and sufficiently with 2.1.18. After upgrading one node to 3.0.18 (the remaining 5 are still on 2.1.18, since we have seen the sort of regression described below), 3.0.18 shows increased CPU usage, increased GC, a high number of pending mutation stage tasks, dropped mutation messages, ...
>
> Some specs; all 6 nodes are equally sized:
> * Bare metal, 32 physical cores, 512G RAM
> * Xmx31G, G1, max pause millis = 2000ms
> * cassandra.yaml basically unchanged, thus the same settings with regard to number of threads, compaction throttling, etc.
>
> The following dashboard highlights areas (CPU, suspension) with metrics for all 6 nodes; the one outlier is the node upgraded to Cassandra 3.0.18.
> !dashboard.png|width=1280!
>
> Additionally, we see a large increase in pending tasks in the mutation stage after the upgrade:
> !mutation_stage.png!
>
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name                  Active   Pending      Completed   Blocked  All Time Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage                 256     81824     3360532756         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage               0         0              0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                       0         0       62862266         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage            0         0     2176659856         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage                 0         0              0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage            0         0              0         0                 0
> ...
> {noformat}
>
> Judging from a 15-minute JFR session on both 3.0.18 and 2.1.18 (each on a different node), at a high level it looks like the code path underneath {{BatchMessage.execute}} produces ~10x more on-heap allocations in 3.0.18 than in 2.1.18.
> !jfr_allocations.png!
> Left => 3.0.18
> Right => 2.1.18
>
> The zipped JFRs exceed the 60MB limit for attaching directly to the ticket. I can upload them if another destination is available.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)