[ https://issues.apache.org/jira/browse/CASSANDRA-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433637#comment-16433637 ]
Jürgen Albersdorfer edited comment on CASSANDRA-14239 at 4/11/18 9:45 AM:
--------------------------------------------------------------------------
I changed
{code:java}
disk_optimization_strategy: ssd
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
{code}
Streaming was much faster and produced less CPU pressure than before:
{code:java}
-dsk/total- ---system-- ----total-cpu-usage---- --io/total- -net/total-
 read  writ|  int   csw |usr sys idl wai hiq siq| read  writ| recv  send
9830B   31M|  48k  7751 | 67   2  31   0   0   1|0.20  85.8 |  30M  380k
    0   28M|  51k  7838 | 65   2  32   0   0   1|   0  80.9 |  33M  511k
  32k   35M|  54k  9024 | 66   2  31   0   0   1|0.60   102 |  37M  540k
    0   28M|  41k  7072 | 62   2  36   0   0   1|   0  78.1 |  26M  265k
1638B   25M|  41k  6606 | 62   1  36   0   0   0|0.10  67.6 |  25M  110k
1638B   26M|  41k  7251 | 57   1  41   0   0   0|0.10  69.9 |  27M  138k
 819B   24M|  40k  6129 | 56   1  42   0   0   1|0.20  61.5 |  25M  127k
    0   25M|  38k  7273 | 56   1  42   0   0   0|   0  66.9 |  26M  162k
1024k   24M|  35k  6501 | 56   1  42   0   0   0|25.2  62.8 |  25M  128k
    0   24M|  37k  7238 | 56   1  42   0   0   0|   0  62.6 |  26M  164k
    0   24M|  35k  6349 | 56   1  42   0   0   0|   0  63.5 |  25M  145k
 410B   26M|  40k  6979 | 56   2  42   0   0   0|0.10  73.1 |  28M  341k
    0   28M|  41k  7042 | 56   1  42   0   0   0|   0  70.8 |  30M  350k
2048B   31M|  44k  7334 | 56   2  42   0   0   0|0.20  85.4 |  32M  347k
    0   31M|  46k  6515 | 56   1  42   0   0   1|   0  86.0 |  33M  383k
    0   30M|  47k  7572 | 56   1  42   0   0   1|   0  82.3 |  33M  466k
7373B   31M|  41k  5742 | 56   1  42   0   0   0|0.20  84.3 |  30M  319k
    0   30M|  43k  7146 | 56   2  42   0   0   1|   0  87.4 |  28M  423k
{code}
When `Received complete` was reported for all Nodes, bootstrap didn't finish, and I can observe
* a stalled number of `Completed` MutationStage tasks,
* while the `Pending` MutationStage count seems to skyrocket.
* The rest of it looks fine to me :(
{code:java}
nodetool tpstats
Pool Name                        Active   Pending    Completed   Blocked  All time blocked
ReadStage                             0         0            0         0                 0
MiscStage                             0         0            0         0                 0
CompactionExecutor                    2         7           53         0                 0
MutationStage                       128   5722021    593964000         0                 0
MemtableReclaimMemory                 0         0         2194         0                 0
PendingRangeCalculator                0         0           19         0                 0
GossipStage                           0         0        25736         0                 0
SecondaryIndexManagement              0         0            0         0                 0
HintsDispatcher                       0         0            0         0                 0
RequestResponseStage                  0         0       167108         0                 0
ReadRepairStage                       0         0            0         0                 0
CounterMutationStage                  0         0            0         0                 0
MigrationStage                        0         0           40         0                 0
MemtablePostFlush                     1        11         2344         0                 0
PerDiskMemtableFlushWriter_0          0         0         2194         0                 0
ValidationExecutor                    0         0            0         0                 0
Sampler                               0         0            0         0                 0
MemtableFlushWriter                   2        11         2194         0                 0
InternalResponseStage                 0         0           31         0                 0
ViewMutationStage                     0         0            0         0                 0
AntiEntropyStage                      0         0            0         0                 0
CacheCleanupExecutor                  0         0            0         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
{code}
*Why does `MutationStage` now hang (busy)?*
*Meanwhile:*
* the SlabPoolCleaner thread uses a single logical CPU at 100% permanently,
* G1 Old Gen increases linearly over time and goes far beyond 50 GB,
* see the attached [^gc.log.201804111141.zip], analyzed at [gceasy.io|http://gceasy.io/diamondgc-report.jsp?oTxnId_value=5c97d52f-1d06-4d28-8ab7-dd9bd58311b7].
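One way to pin down whether the MutationStage has really stopped making progress (rather than just being slow) is to sample its Pending and Completed counters over JMX while the node is bootstrapping. The sketch below is a minimal, hypothetical example and not part of the ticket: it assumes the node's default JMX port 7199 and the Cassandra 3.11 thread-pool metric MBeans (org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=PendingTasks/CompletedTasks); verify both against your own node before relying on it.
{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch: poll MutationStage Pending/Completed once per second.
// Port 7199 and the metric MBean names are assumptions, not taken from the ticket.
public class MutationStageWatcher {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName pending = new ObjectName("org.apache.cassandra.metrics:"
                    + "type=ThreadPools,path=request,scope=MutationStage,name=PendingTasks");
            ObjectName completed = new ObjectName("org.apache.cassandra.metrics:"
                    + "type=ThreadPools,path=request,scope=MutationStage,name=CompletedTasks");
            long lastCompleted = -1;
            for (int i = 0; i < 120; i++) {
                long p = ((Number) mbs.getAttribute(pending, "Value")).longValue();
                long c = ((Number) mbs.getAttribute(completed, "Value")).longValue();
                // A flat Completed counter combined with a growing Pending count
                // means the stage is wedged, not merely backlogged.
                System.out.printf("pending=%d completed=%d%s%n",
                        p, c, c == lastCompleted ? "  <-- no progress" : "");
                lastCompleted = c;
                Thread.sleep(1000L);
            }
        } finally {
            connector.close();
        }
    }
}
{code}
If Completed stays flat while Pending keeps climbing, that would match the stalled `Completed` count in the tpstats output above and the permanently busy SlabPoolCleaner thread.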
> OutOfMemoryError when bootstrapping with less than 100GB RAM
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-14239
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14239
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Details of the bootstrapping Node
> * ProLiant BL460c G7
> * 56GB RAM
> * 2x 146GB 10K HDD (one dedicated to the Commitlog, one to Data, Hints and saved_caches)
> * CentOS 7.4 on SD-Card
> * /tmp and /var/log on tmpfs
> * Oracle JDK 1.8.0_151
> * Cassandra 3.11.1
> Cluster
> * 10 existing Nodes (Up and Normal)
>            Reporter: Jürgen Albersdorfer
>            Priority: Major
>         Attachments: Objects-by-class.csv, Objects-with-biggest-retained-size.csv, cassandra-env.sh, cassandra.yaml, gc.log.0.current.zip, gc.log.201804111141.zip, jvm.options, jvm_opts.txt, stack-traces.txt
>
> Hi, I face an issue when bootstrapping a Node having less than 100GB RAM on our 10-Node C* 3.11.1 Cluster.
> During bootstrap, when I watch the cassandra.log, I observe a growth in JVM Heap Old Gen which no longer gets significantly freed up.
> I know that the JVM collects Old Gen only when really needed. I can see collections, but there is always a remainder which seems to grow forever without ever getting freed.
> After the Node has successfully joined the Cluster, I can remove the extra RAM I gave it for bootstrapping without any further effect.
> It feels like Cassandra will not forget about a single byte streamed over the network during bootstrapping - which would be a memory leak and a major problem, too.
> I was able to produce a HeapDumpOnOutOfMemoryError from a 56GB Node (40 GB assigned JVM Heap). The YourKit Profiler shows a huge amount of memory allocated for org.apache.cassandra.db.Memtable (22 GB), org.apache.cassandra.db.rows.BufferCell (19 GB) and java.nio.HeapByteBuffer (11 GB).
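For the Old Gen growth described in the report, a quick way to see whether the pool really retains more memory after every collection (rather than G1 simply deferring work) is to watch its post-GC occupancy over JMX. The following is a minimal sketch under two assumptions not stated in the ticket's attachments: the node exposes JMX on the default port 7199, and it runs G1 so the pool is named "G1 Old Gen". A steadily climbing "after last GC" value would corroborate the leak-like behaviour; the attached gc.log files contain the same information, this only gives a live view while the bootstrap is running.
{code:java}
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch: print G1 Old Gen occupancy as measured right after the
// last collection of that pool. Port 7199 and the pool name are assumptions.
public class OldGenWatcher {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName oldGen = new ObjectName("java.lang:type=MemoryPool,name=G1 Old Gen");
            for (int i = 0; i < 360; i++) {   // one sample every 10 s, for about an hour
                CompositeData cd = (CompositeData) mbs.getAttribute(oldGen, "CollectionUsage");
                if (cd != null) {
                    // CollectionUsage is the pool's usage right after the last GC that
                    // collected it; if it keeps climbing, Old Gen retains more each cycle.
                    MemoryUsage afterGc = MemoryUsage.from(cd);
                    System.out.printf("old gen after last GC: %,d MB used / %,d MB committed%n",
                            afterGc.getUsed() >> 20, afterGc.getCommitted() >> 20);
                }
                Thread.sleep(10_000L);
            }
        } finally {
            connector.close();
        }
    }
}
{code}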