[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Mikhail Stepura (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959148#comment-14959148
 ] 

Mikhail Stepura commented on CASSANDRA-10449:
-

I would love to get hold of a heap dump for that OOM. At the very least, we 
could figure out what's consuming the heap.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.
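
To put the GCInspector line quoted above in perspective, here is a minimal sketch that parses it and converts the figures into friendlier units. The regular expression and the helper name `summarize_gc_line` are illustrative assumptions, not part of Cassandra, and the pattern only targets this one log format:

```python
import re

# The GCInspector line from the report, with the e-mail line wrap removed.
GC_LINE = (
    "INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - "
    "G1 Old Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;"
)

# Hypothetical pattern for this one log format; other collectors log differently.
GC_PATTERN = re.compile(r"GC in (\d+)ms\.\s+G1 Old Gen: (\d+) -> (\d+);")


def summarize_gc_line(line):
    """Return (pause_minutes, reclaimed_bytes, reclaimed_fraction)."""
    match = GC_PATTERN.search(line)
    if match is None:
        raise ValueError("not a G1 Old Generation GCInspector line")
    pause_ms, before, after = (int(g) for g in match.groups())
    reclaimed = before - after
    return pause_ms / 60000, reclaimed, reclaimed / before


minutes, reclaimed, fraction = summarize_gc_line(GC_LINE)
print(f"pause: {minutes:.1f} min")
print(f"reclaimed: {reclaimed / 2**20:.0f} MiB ({fraction:.2%} of old gen)")
```

By this arithmetic the pause lasted roughly 26 minutes and reclaimed well under 1% of the old generation, which would be consistent with a heap full of live objects rather than collectable garbage.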



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

I discovered that an index on one of the tables has a wide row, and I'm 
assuming that to be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.
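
As a rough illustration of how skewed those partition sizes are, the sketch below converts the three byte counts to binary units and computes the ratio of the largest partition to the mean. The helper `human_size` and the max/mean ratio as a skew indicator are assumptions made for this example, not something reported by Cassandra:

```python
# Values copied from the statistics quoted above (bytes).
PARTITION_BYTES = {
    "minimum": 125,
    "maximum": 10299432635,
    "mean": 253692309,
}


def human_size(num_bytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit that fits, or at GiB as the largest unit.
        if size < 1024 or unit == "GiB":
            return f"{size:.2f} {unit}"
        size /= 1024


for name, value in PARTITION_BYTES.items():
    print(f"{name:>7}: {human_size(value)}")

# Ratio of the largest partition to the mean: a rough skew indicator.
skew = PARTITION_BYTES["maximum"] / PARTITION_BYTES["mean"]
print(f"max/mean ratio: {skew:.1f}")
```

The largest index partition comes out at a little under 10 GiB, about 40 times the mean.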



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959217#comment-14959217
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

OK [~mishail], I will re-run with heap dumps enabled (we had them turned off 
for some reason) and post the result.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-12 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954097#comment-14954097
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Per [~zznate]'s suggestion, I also tried setting compaction throughput to 0 
(unthrottled), with no effect. He also suggested taking one of the large 
sstables and running sstableloader on it.  I will do that tomorrow.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-09 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950528#comment-14950528
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

After making numerous GC tweaks, including going back to default CMS settings, 
symptoms remain the same.  Would appreciate any additional pointers, as I'm 
grasping at straws now.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-08 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948566#comment-14948566
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Unfortunately, increasing streaming_socket_timeout_in_ms and 
memtable_flush_writers resulted in an OOM again instead of a hang.  It seems 
to hang when it gets to the larger sstables (30GB+).  I will poke around some 
more today.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-07 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947793#comment-14947793
 ] 

Yuki Morishita commented on CASSANDRA-10449:


There are a couple of things going on.

{code}
ERROR [StreamReceiveTask:29] 2015-10-05 14:46:17,090 CassandraDaemon.java:223 - 
Exception in thread Thread[StreamReceiveTask:29,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: 
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
{code}

When rebuilding the secondary index after receiving files, the bootstrapping 
node is hitting a TombstoneOverwhelmingException.
This can make streaming hang, as the receiving task never completes.
Streaming should tolerate a secondary index build failure: instead of failing 
the entire stream session, it should just warn the user and go on, so that the 
user can manually trigger a secondary index rebuild later.
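
The behaviour proposed above (warn and continue instead of failing the session) could look roughly like the following sketch. It is not Cassandra's actual streaming code; `finish_receive_task`, `rebuild_index`, and `IndexBuildError` are hypothetical names chosen purely for illustration:

```python
import logging

logger = logging.getLogger("stream_receive_sketch")


class IndexBuildError(Exception):
    """Hypothetical stand-in for a failure such as TombstoneOverwhelmingException."""


def finish_receive_task(index_names, rebuild_index):
    """Rebuild each secondary index; collect failures instead of aborting.

    rebuild_index is a caller-supplied callable (hypothetical) that raises
    IndexBuildError when a rebuild fails. Returns the names of the indexes
    that still need a manual rebuild.
    """
    failed = []
    for name in index_names:
        try:
            rebuild_index(name)
        except IndexBuildError as exc:
            # Warn and move on so the receive task can still complete.
            logger.warning(
                "index %s failed to build (%s); rebuild it manually later",
                name,
                exc,
            )
            failed.append(name)
    return failed


def flaky_rebuild(name):
    # Simulated rebuild: one index fails, the others succeed.
    if name == "wide_idx":
        raise IndexBuildError("too many tombstones")


needs_rebuild = finish_receive_task(["idx_a", "wide_idx", "idx_b"], flaky_rebuild)
print(needs_rebuild)
```

The point of returning the failed names is that the receive task always completes, and the operator is left with an explicit list of indexes to rebuild by hand.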

I'm not sure the above relates to the OOM. From StatusLogger, the FlushWriter 
tasks are growing, and that is the cause of the OOM.
If you can capture a stack trace using jstack, that would be a great help.

I will create a separate JIRA for the former.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-07 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947486#comment-14947486
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

I am going to try again after increasing streaming_socket_timeout_in_ms and 
memtable_flush_writers.  I had not touched these values, so it's possible that 
was hurting me.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-07 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946624#comment-14946624
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

I increased max heap to 96GB and tried again.  Now nodetool netstats shows 
that progress has ground to a halt:

9pm:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
/52.1.155.147 (using /10.239.209.15)
Receiving 139 files, 36548040412 bytes total. Already received 139 
files, 36548040412 bytes total
/52.2.9.34 (using /10.239.209.17)
Receiving 171 files, 6431853 bytes total. Already received 171 
files, 6431853 bytes total
/52.0.152.88 (using /10.239.209.44)
Receiving 147 files, 78458709168 bytes total. Already received 79 
files, 55003961646 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db
 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
/52.2.0.164 (using /10.239.209.16)
Receiving 141 files, 36700837768 bytes total. Already received 141 
files, 36700837768 bytes total
/54.152.177.161 (using /10.239.209.93)
/54.172.174.48 (using /10.239.209.49)
Receiving 176 files, 79676288976 bytes total. Already received 98 
files, 55932809644 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db
 174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
/52.2.75.82 (using /10.239.208.88)
/54.165.111.69 (using /10.239.209.47)
Receiving 170 files, 85920995638 bytes total. Already received 94 
files, 54985226700 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db
 4875660361/22821083384 bytes(21%) received from idx:0/54.165.111.69
/52.6.136.30 (using /10.239.209.45)
Receiving 174 files, 87064163973 bytes total. Already received 91 
files, 53930233899 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db
 17064156850/25823860172 bytes(66%) received from idx:0/52.6.136.30
/52.7.14.201 (using /10.239.209.46)
Receiving 164 files, 46351636573 bytes total. Already received 164 
files, 46351636573 bytes total
/52.2.30.66 (using /10.239.209.18)
Receiving 158 files, 62899520151 bytes total. Already received 158 
files, 62899520151 bytes total
/54.175.138.33 (using /10.239.209.96)
/54.88.44.178 (using /10.239.209.91)
/52.2.109.194 (using /10.239.208.89)
/54.172.81.117 (using /10.239.209.95)
/54.172.103.46 (using /10.239.209.48)
Receiving 164 files, 48771232182 bytes total. Already received 164 
files, 48771232182 bytes total
/54.164.172.164 (using /10.239.209.94)
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Commandsn/a19 56
Responses   n/a 0   35515795
{noformat}
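
One way to compare the two netstats snapshots in this message mechanically is sketched below. It parses the per-peer "Receiving ... Already received ..." summaries and flags peers whose received byte count did not change between snapshots. The regular expressions and the function names are assumptions written against the output shown here, not a documented nodetool format. For the two peers sampled, the 6am figures quoted in this message are identical to the 9pm ones, so the same sample stands in for both snapshots:

```python
import re

# Two session summaries from the 9pm snapshot, with e-mail line wraps removed.
SNAPSHOT = """\
/52.1.155.147 (using /10.239.209.15)
Receiving 139 files, 36548040412 bytes total. Already received 139 files, 36548040412 bytes total
/52.0.152.88 (using /10.239.209.44)
Receiving 147 files, 78458709168 bytes total. Already received 79 files, 55003961646 bytes total
"""

PEER = re.compile(r"^(/\S+) \(using /\S+\)$")
RECEIVING = re.compile(
    r"^Receiving (\d+) files, (\d+) bytes total\. "
    r"Already received (\d+) files, (\d+) bytes total$"
)


def received_bytes_by_peer(text):
    """Map each peer address to (bytes_received, bytes_total)."""
    progress = {}
    peer = None
    for line in text.splitlines():
        line = line.strip()
        peer_match = PEER.match(line)
        if peer_match:
            peer = peer_match.group(1)
            continue
        recv_match = RECEIVING.match(line)
        if recv_match and peer is not None:
            _, total, _, received = (int(g) for g in recv_match.groups())
            progress[peer] = (received, total)
    return progress


def stalled_peers(earlier, later):
    """Peers that are incomplete and received no new bytes between snapshots."""
    return sorted(
        peer
        for peer, (received, total) in later.items()
        if received < total and earlier.get(peer, (None, None))[0] == received
    )


progress = received_bytes_by_peer(SNAPSHOT)
for peer, (received, total) in progress.items():
    print(f"{peer}: {received / total:.1%}")
print(stalled_peers(progress, progress))
```

Only sessions that are still incomplete are reported, so peers that already delivered all of their files are ignored.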

6am:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
/52.1.155.147 (using /10.239.209.15)
Receiving 139 files, 36548040412 bytes total. Already received 139 
files, 36548040412 bytes total
/52.2.9.34 (using /10.239.209.17)
Receiving 171 files, 6431853 bytes total. Already received 171 
files, 6431853 bytes total
/52.0.152.88 (using /10.239.209.44)
Receiving 147 files, 78458709168 bytes total. Already received 79 
files, 55003961646 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db
 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
/52.2.0.164 (using /10.239.209.16)
Receiving 141 files, 36700837768 bytes total. Already received 141 
files, 36700837768 bytes total
/54.152.177.161 (using /10.239.209.93)
/54.172.174.48 (using /10.239.209.49)
Receiving 176 files, 79676288976 bytes total. Already received 98 
files, 55932809644 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db
 174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
/52.2.75.82 (using /10.239.208.88)
/54.165.111.69 (using /10.239.209.47)
Receiving 170 files, 85920995638 bytes total. Already received 94 
files, 54985226700 bytes total


[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-07 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946842#comment-14946842
 ] 

Paulo Motta commented on CASSANDRA-10449:
-

What GC parameters are you using, and what was your original heap size before 
increasing it to 48G? Have you tried using CMS with smaller heap sizes and/or 
larger young generation sizes? What are your streaming_socket_timeout_in_ms and 
memtable_flush_writers settings?

PS: I'm not a GC expert, just trying to understand a bit more about GC behavior 
in Cassandra.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-06 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945149#comment-14945149
 ] 

Philip Thompson commented on CASSANDRA-10449:
-

[~yukim], do you want to look at this as well?



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-05 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1497#comment-1497
 ] 

Philip Thompson commented on CASSANDRA-10449:
-

Can you attach a system.log from a node that fails to bootstrap? 
