[ https://issues.apache.org/jira/browse/CASSANDRA-14239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433637#comment-16433637 ]
Jürgen Albersdorfer edited comment on CASSANDRA-14239 at 4/11/18 9:45 AM:
--------------------------------------------------------------------------
I changed
{code:java}
disk_optimization_strategy: ssd
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
{code}
Streaming was much faster and produced less CPU pressure than before:
{code:java}
-dsk/total- ---system-- ----total-cpu-usage---- --io/total- -net/total-
 read  writ|  int   csw |usr sys idl wai hiq siq| read  writ| recv  send
9830B   31M|  48k  7751 | 67   2  31   0   0   1|0.20  85.8 |  30M  380k
    0   28M|  51k  7838 | 65   2  32   0   0   1|   0  80.9 |  33M  511k
  32k   35M|  54k  9024 | 66   2  31   0   0   1|0.60   102 |  37M  540k
    0   28M|  41k  7072 | 62   2  36   0   0   1|   0  78.1 |  26M  265k
1638B   25M|  41k  6606 | 62   1  36   0   0   0|0.10  67.6 |  25M  110k
1638B   26M|  41k  7251 | 57   1  41   0   0   0|0.10  69.9 |  27M  138k
 819B   24M|  40k  6129 | 56   1  42   0   0   1|0.20  61.5 |  25M  127k
    0   25M|  38k  7273 | 56   1  42   0   0   0|   0  66.9 |  26M  162k
1024k   24M|  35k  6501 | 56   1  42   0   0   0|25.2  62.8 |  25M  128k
    0   24M|  37k  7238 | 56   1  42   0   0   0|   0  62.6 |  26M  164k
    0   24M|  35k  6349 | 56   1  42   0   0   0|   0  63.5 |  25M  145k
 410B   26M|  40k  6979 | 56   2  42   0   0   0|0.10  73.1 |  28M  341k
    0   28M|  41k  7042 | 56   1  42   0   0   0|   0  70.8 |  30M  350k
2048B   31M|  44k  7334 | 56   2  42   0   0   0|0.20  85.4 |  32M  347k
    0   31M|  46k  6515 | 56   1  42   0   0   1|   0  86.0 |  33M  383k
    0   30M|  47k  7572 | 56   1  42   0   0   1|   0  82.3 |  33M  466k
7373B   31M|  41k  5742 | 56   1  42   0   0   0|0.20  84.3 |  30M  319k
    0   30M|  43k  7146 | 56   2  42   0   0   1|   0  87.4 |  28M  423k
{code}
When `Received complete` was reported for all Nodes, bootstrap didn't finish, and I can observe
* a stalled number of `Completed` MutationStage tasks,
* while the `Pending` MutationStage count seems to skyrocket.
* The rest of it looks fine to me :(
{code:java}
nodetool tpstats
Pool Name                        Active   Pending    Completed   Blocked  All time blocked
ReadStage                             0         0            0         0                 0
MiscStage                             0         0            0         0                 0
CompactionExecutor                    2         7           53         0                 0
MutationStage                       128   5722021    593964000         0                 0
MemtableReclaimMemory                 0         0         2194         0                 0
PendingRangeCalculator                0         0           19         0                 0
GossipStage                           0         0        25736         0                 0
SecondaryIndexManagement              0         0            0         0                 0
HintsDispatcher                       0         0            0         0                 0
RequestResponseStage                  0         0       167108         0                 0
ReadRepairStage                       0         0            0         0                 0
CounterMutationStage                  0         0            0         0                 0
MigrationStage                        0         0           40         0                 0
MemtablePostFlush                     1        11         2344         0                 0
PerDiskMemtableFlushWriter_0          0         0         2194         0                 0
ValidationExecutor                    0         0            0         0                 0
Sampler                               0         0            0         0                 0
MemtableFlushWriter                   2        11         2194         0                 0
InternalResponseStage                 0         0           31         0                 0
ViewMutationStage                     0         0            0         0                 0
AntiEntropyStage                      0         0            0         0                 0
CacheCleanupExecutor                  0         0            0         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     0
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
{code}
*Why does `MutationStage` now hang (busy)?*
*Meanwhile:*
* the SlabPoolCleaner thread uses a single logical CPU at 100% permanently,
* G1 Old Gen increases linearly over time and goes far beyond 50 GB,
* see the attached [^gc.log.201804111141.zip], analyzed at [gceasy.io|http://gceasy.io/diamondgc-report.jsp?oTxnId_value=5c97d52f-1d06-4d28-8ab7-dd9bd58311b7].
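One way to pin down whether the MutationStage has really stopped making progress (rather than just being slow) is to sample its Pending and Completed counters over JMX while the node is bootstrapping. The sketch below is a minimal, hypothetical example and not part of the ticket: it assumes the node's default JMX port 7199 and the Cassandra 3.11 thread-pool metric MBeans (org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=PendingTasks/CompletedTasks); verify both against your own node before relying on it.
{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch: poll MutationStage Pending/Completed once per second.
// Port 7199 and the metric MBean names are assumptions, not taken from the ticket.
public class MutationStageWatcher {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName pending = new ObjectName("org.apache.cassandra.metrics:"
                    + "type=ThreadPools,path=request,scope=MutationStage,name=PendingTasks");
            ObjectName completed = new ObjectName("org.apache.cassandra.metrics:"
                    + "type=ThreadPools,path=request,scope=MutationStage,name=CompletedTasks");
            long lastCompleted = -1;
            for (int i = 0; i < 120; i++) {
                long p = ((Number) mbs.getAttribute(pending, "Value")).longValue();
                long c = ((Number) mbs.getAttribute(completed, "Value")).longValue();
                // A flat Completed counter combined with a growing Pending count
                // means the stage is wedged, not merely backlogged.
                System.out.printf("pending=%d completed=%d%s%n",
                        p, c, c == lastCompleted ? "  <-- no progress" : "");
                lastCompleted = c;
                Thread.sleep(1000L);
            }
        } finally {
            connector.close();
        }
    }
}
{code}
If Completed stays flat while Pending keeps climbing, that would match the stalled `Completed` count in the tpstats output above and the permanently busy SlabPoolCleaner thread.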
> OutOfMemoryError when bootstrapping with less than 100GB RAM
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-14239
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14239
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Details of the bootstrapping Node
> * ProLiant BL460c G7
> * 56GB RAM
> * 2x 146GB 10K HDD (one dedicated to the Commitlog, one to Data, Hints and saved_caches)
> * CentOS 7.4 on SD-Card
> * /tmp and /var/log on tmpfs
> * Oracle JDK 1.8.0_151
> * Cassandra 3.11.1
> Cluster
> * 10 existing Nodes (Up and Normal)
>            Reporter: Jürgen Albersdorfer
>            Priority: Major
>         Attachments: Objects-by-class.csv, Objects-with-biggest-retained-size.csv, cassandra-env.sh, cassandra.yaml, gc.log.0.current.zip, gc.log.201804111141.zip, jvm.options, jvm_opts.txt, stack-traces.txt
>
> Hi, I face an issue when bootstrapping a Node having less than 100GB RAM on our 10-Node C* 3.11.1 Cluster.
> During bootstrap, when I watch the cassandra.log, I observe a growth in JVM Heap Old Gen which no longer gets significantly freed up.
> I know that the JVM collects Old Gen only when really needed. I can see collections, but there is always a remainder which seems to grow forever without ever getting freed.
> After the Node has successfully joined the Cluster, I can remove the extra RAM I gave it for bootstrapping without any further effect.
> It feels like Cassandra will not forget about a single byte streamed over the network during bootstrapping - which would be a memory leak and a major problem, too.
> I was able to produce a HeapDumpOnOutOfMemoryError from a 56GB Node (40 GB assigned JVM Heap). The YourKit Profiler shows a huge amount of memory allocated for org.apache.cassandra.db.Memtable (22 GB), org.apache.cassandra.db.rows.BufferCell (19 GB) and java.nio.HeapByteBuffer (11 GB).
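For the Old Gen growth described in the report, a quick way to see whether the pool really retains more memory after every collection (rather than G1 simply deferring work) is to watch its post-GC occupancy over JMX. The following is a minimal sketch under two assumptions not stated in the ticket's attachments: the node exposes JMX on the default port 7199, and it runs G1 so the pool is named "G1 Old Gen". A steadily climbing "after last GC" value would corroborate the leak-like behaviour; the attached gc.log files contain the same information, this only gives a live view while the bootstrap is running.
{code:java}
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch: print G1 Old Gen occupancy as measured right after the
// last collection of that pool. Port 7199 and the pool name are assumptions.
public class OldGenWatcher {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName oldGen = new ObjectName("java.lang:type=MemoryPool,name=G1 Old Gen");
            for (int i = 0; i < 360; i++) {   // one sample every 10 s, for about an hour
                CompositeData cd = (CompositeData) mbs.getAttribute(oldGen, "CollectionUsage");
                if (cd != null) {
                    // CollectionUsage is the pool's usage right after the last GC that
                    // collected it; if it keeps climbing, Old Gen retains more each cycle.
                    MemoryUsage afterGc = MemoryUsage.from(cd);
                    System.out.printf("old gen after last GC: %,d MB used / %,d MB committed%n",
                            afterGc.getUsed() >> 20, afterGc.getCommitted() >> 20);
                }
                Thread.sleep(10_000L);
            }
        } finally {
            connector.close();
        }
    }
}
{code}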