[jira] [Commented] (CASSANDRA-13523) StreamReceiveTask: java.lang.OutOfMemoryError: Map failed

2017-05-11 Matthew O'Riordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006236#comment-16006236
 ] 

Matthew O'Riordan commented on CASSANDRA-13523:
---

Note: rerunning the same repair caused Cassandra to exit again.

> StreamReceiveTask: java.lang.OutOfMemoryError: Map failed
> -
>
> Key: CASSANDRA-13523
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13523
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Ubuntu 14.04.5 LTS, Docker version 1.9.1, run as a 
> container, 4 core server with 16GB memory.
>Reporter: Matthew O'Riordan
>  Labels: bug, crash
> Fix For: 2.1.13
>
>
> During a nodetool repair -par on one of our keyspaces, Cassandra crashed due 
> to what seems like memory exhaustion within the JVM.  The machine itself had 
> plenty of available memory at the time and did not appear to be under any 
> significant load.
> In the system log, before the crash, the following was logged:
> {code}
> ...
> INFO  [AntiEntropySessions:55] 2017-05-10 18:18:20,627  RepairJob.java:163 - 
> [repair #0330beb0-35ad-11e7-b7e4-091830ac5256] requesting merkle trees for 
> stats_day_aggregates (to [/54.162.66.114, /54.236.226.76, /52.221.228.170, 
> /54.154.35.144, /54.154.96.213, /52.221.217.27])
> INFO  [ValidationExecutor:54] 2017-05-10 18:18:20,628  
> ColumnFamilyStore.java:905 - Enqueuing flush of stats_day_aggregates: 7018 
> (0%) on-heap, 0 (0%) off-heap
> INFO  [MemtableFlushWriter:13608] 2017-05-10 18:18:20,628  Memtable.java:347 
> - Writing Memtable-stats_day_aggregates@3469792(2.734KiB serialized bytes, 14 
> ops, 0%/0% of on/off-heap limit)
> INFO  [MemtableFlushWriter:13608] 2017-05-10 18:18:20,629  Memtable.java:382 
> - Completed flushing 
> /var/lib/cassandra/data/ably_production_0_stats/stats_day_aggregates-b6e29201e3d111e5bbf3091830ac5256/ably_production_0_stats-stats_day_aggregates-tmp-ka-43026-Data.db
>  (0.000KiB) for commitlog position ReplayPosition(segmentId=1491420635101, 
> position=24224955)
> INFO  [StreamReceiveTask:638] 2017-05-10 18:18:21,220  
> StreamResultFuture.java:180 - [Stream #0008cac1-35ad-11e7-b7e4-091830ac5256] 
> Session with /52.203.21.193 is complete
> INFO  [StreamReceiveTask:638] 2017-05-10 18:18:21,220  
> StreamResultFuture.java:212 - [Stream #0008cac1-35ad-11e7-b7e4-091830ac5256] 
> All sessions completed
> INFO  [StreamReceiveTask:638] 2017-05-10 18:18:21,221  
> StreamingRepairTask.java:96 - [repair #fe7e3320-35ac-11e7-b7e4-091830ac5256] 
> streaming task succeed, returning response to /52.221.217.27
> INFO  [CqlSlowLog-Writer-thread-0] 2017-05-10 18:18:26,230  
> CqlSlowLogWriter.java:151 - Recording statements with duration of 4844 in 
> slow log
> INFO  [Service Thread] 2017-05-10 18:18:26,233  GCInspector.java:258 - G1 Old 
> Generation GC in 4781ms.  G1 Eden Space: 131072 -> 0; G1 Old Gen: 
> 2774539816 -> 1830851216; G1 Survivor Space: 37748736 -> 0; 
> INFO  [AntiEntropyStage:1] 2017-05-10 18:18:26,237  RepairSession.java:171 - 
> [repair #0330beb0-35ad-11e7-b7e4-091830ac5256] Received merkle tree for 
> stats_day_aggregates from /54.162.66.114
> INFO  [StreamConnectionEstablisher:1] 2017-05-10 18:18:26,239  
> StreamCoordinator.java:209 - [Stream #0293bb60-35ad-11e7-b7e4-091830ac5256, 
> ID#0] Beginning stream session with /54.154.226.20
> INFO  [AntiEntropyStage:1] 2017-05-10 18:18:26,294  RepairSession.java:171 - 
> [repair #0330beb0-35ad-11e7-b7e4-091830ac5256] Received merkle tree for 
> stats_day_aggregates from /54.236.226.76
> INFO  [Service Thread] 2017-05-10 18:18:26,298  StatusLogger.java:51 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [AntiEntropyStage:1] 2017-05-10 18:18:26,344  RepairSession.java:171 - 
> [repair #0330beb0-35ad-11e7-b7e4-091830ac5256] Received merkle tree for 
> stats_day_aggregates from /54.154.35.144
> INFO  [CqlSlowLog-Writer-thread-0] 2017-05-10 18:18:26,344  
> CqlSlowLogWriter.java:151 - Recording statements with duration of 5035 in 
> slow log
> WARN  [GossipTasks:1] 2017-05-10 18:18:26,344  FailureDetector.java:258 - Not 
> marking nodes down due to local pause of 5109502584 > 50
> INFO  [AntiEntropyStage:1] 2017-05-10 18:18:26,344  RepairSession.java:171 - 
> [repair #0330beb0-35ad-11e7-b7e4-091830ac5256] Received merkle tree for 
> stats_day_aggregates from /54.154.96.213
> INFO  [AntiEntropyStage:1] 2017-05-10 18:18:26,344  RepairSession.java:171 - 
> [repair #0330beb0-35ad-11e7-b7e4-091830ac5256] Received merkle tree for 
> stats_day_aggregates from /52.221.228.170
> INFO  [Service Thread] 2017-05-10 18:18:26,345  StatusLogger.java:66 - 
> MutationStage 0 0 1199223432 0

[jira] [Commented] (CASSANDRA-13523) StreamReceiveTask: java.lang.OutOfMemoryError: Map failed

2017-05-11 Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006620#comment-16006620
 ] 

Chris Lohfink commented on CASSANDRA-13523:
---

This isn't really a bug so much as an operational/config issue. Repairs and 
streaming are pretty memory intensive and will use all of your heap and then some.

You should check your limits ({{ulimit -a}}) and the maximum number of memory-mapped 
areas per process ({{cat /proc/sys/vm/max_map_count}}), but most likely "The system is 
out of physical RAM or swap space". You should also check your JVM options and Docker 
settings to make sure your system can handle the load you set it up with. Maybe 
decrease the heap size so that what you give the JVM matches what your physical 
system actually has.
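
One way to act on the {{max_map_count}} suggestion is to compare the number of memory mappings the Cassandra process already holds with the kernel limit; the JDK reports a failed {{mmap}} as {{OutOfMemoryError: Map failed}} even when plenty of heap is free. A minimal sketch, assuming a Linux host where {{/proc}} is readable for the Cassandra PID (from inside the container, or from the host using the host-side PID); the helper names are hypothetical:

```python
import os


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return the kernel's per-process limit on memory mappings (vm.max_map_count)."""
    with open(path) as f:
        return int(f.read().strip())


def count_mappings(maps_text):
    """Each non-empty line of /proc/<pid>/maps describes one mapping."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(pid):
    """Return (mappings in use, limit) for a process such as the Cassandra JVM."""
    with open(os.path.join("/proc", str(pid), "maps")) as f:
        used = count_mappings(f.read())
    return used, read_max_map_count()
```

For example, {{used, limit = mapping_headroom(<cassandra pid>)}}; reading another process's maps requires the same user or root. A count close to the limit while a repair is streaming would point at the kernel limit rather than at the heap.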

That swap use isn't a good thing either, FWIW.

[jira] [Commented] (CASSANDRA-13523) StreamReceiveTask: java.lang.OutOfMemoryError: Map failed

2017-05-11 Matthew O'Riordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006700#comment-16006700
 ] 

Matthew O'Riordan commented on CASSANDRA-13523:
---

> Should check your limits ({{ulimit -a}})

They are all maxed out, i.e. we are not hitting file handle limits.

> Check {{cat /proc/sys/vm/max_map_count}}

Sure, but any advice on what that should be?  We've not tweaked that to date.

> but most likely "The system is out of physical RAM or swap space". should 
> also check your jvm options and docker settings to make sure your system can 
> handle load you set it up with. Maybe decrease heap size so your physical 
> system can match what your giving the jvm.

I'm not following you. I thought the JVM was running out of memory?

> That swap use isnt a good thing also fwiw.

There pretty much isn't any. With 100 swap operations, the swap file is effectively 
not being used; when the system actually uses swap, it very quickly climbs into 
the 1,000s or 10,000s of operations. As far as I can tell from the graphs at 
https://dl.dropboxusercontent.com/u/1575409/Ably/logs/2017-05-10-cassandra-crash/ap-southeast-1/Voila_Capture%202017-05-11_09-08-53_am.png,
no swap was actually being used.
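
To back up the graph with a point-in-time figure, swap in use can be read directly from {{/proc/meminfo}} as {{SwapTotal}} minus {{SwapFree}} (both in kB). A minimal sketch, assuming a Linux host; the function names are hypothetical:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo lines such as 'SwapTotal:  2097148 kB' into {key: value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # value in kB (or a plain count)
    return values


def swap_used_kb(text):
    """Swap currently in use: SwapTotal - SwapFree, in kB."""
    info = parse_meminfo(text)
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)
```

For example, {{swap_used_kb(open("/proc/meminfo").read())}} on the host. This measures occupancy, not activity; the cumulative {{pswpin}}/{{pswpout}} page counters in {{/proc/vmstat}}, sampled twice, give a rate that is roughly comparable to the graphed swap operations.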


[jira] [Commented] (CASSANDRA-13523) StreamReceiveTask: java.lang.OutOfMemoryError: Map failed

2017-05-13 ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009583#comment-16009583
 ] 

ZhaoYang commented on CASSANDRA-13523:
--

Are you using a 32-bit JVM? Could you share your JVM settings, etc.?
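
One quick way to answer the 32-bit question without touching the running node is to inspect the ELF class of the Java launcher: byte 4 of an ELF header ({{EI_CLASS}}) is 1 for 32-bit and 2 for 64-bit binaries. A minimal sketch, assuming the launcher path {{/usr/bin/java}} from the DSE command line resolves to an ELF binary (not a wrapper script) that matches the JVM it loads; running {{java -version}} and looking for "64-Bit Server VM" is the more direct check. The function names are hypothetical:

```python
def elf_bitness(header):
    """Return 32 or 64 for an ELF binary, given at least its first 5 bytes.

    Byte 4 of the ELF identification (EI_CLASS) is 1 for ELFCLASS32, 2 for ELFCLASS64.
    """
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF binary")
    ei_class = header[4]
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError("unknown EI_CLASS value %d" % ei_class)


def java_bitness(path="/usr/bin/java"):
    """Read the launcher's ELF header; open() follows symlinks to the real binary."""
    with open(path, "rb") as f:
        return elf_bitness(f.read(5))
```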

[jira] [Commented] (CASSANDRA-13523) StreamReceiveTask: java.lang.OutOfMemoryError: Map failed

2017-05-14 Matthew O'Riordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009692#comment-16009692
 ] 

Matthew O'Riordan commented on CASSANDRA-13523:
---

> are you using 32-bit jvm ? could you share you jvm setting, etc.

No, I very much doubt we are using a 32-bit JVM, unless that was done by mistake. 
Is there an easy way to share all the JVM settings, i.e. in the logs or via a 
command? I am not quite sure what you're after, but I'm happy to provide whatever 
you need, of course. Thanks.
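
On Linux, the full command line of a running process, including every JVM flag, is exposed in {{/proc/<pid>/cmdline}} as NUL-separated strings. A minimal sketch that extracts just the JVM options and skips the (very long) classpath, assuming the JVM was started as {{java [options] -cp <classpath> <main class> ...}}; the function names are hypothetical:

```python
def parse_cmdline(raw):
    """/proc/<pid>/cmdline holds the argv strings separated (and terminated) by NUL bytes."""
    return [arg.decode() for arg in raw.split(b"\0") if arg]


def jvm_options(args):
    """Return the JVM flags from argv, skipping the classpath value and stopping at the main class."""
    opts = []
    skip_next = False
    for arg in args[1:]:  # args[0] is the java executable itself
        if skip_next:
            skip_next = False
            continue
        if arg in ("-cp", "-classpath"):
            skip_next = True  # the next entry is the classpath, not a flag
            continue
        if not arg.startswith("-"):
            break  # first bare argument is the main class; the rest belongs to the application
        opts.append(arg)
    return opts
```

For example, {{print("\n".join(jvm_options(parse_cmdline(open("/proc/%d/cmdline" % pid, "rb").read()))))}} with {{pid}} set to Cassandra's PID. {{ps -ww -o args -p <pid>}} shows the same information unfiltered.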

[jira] [Commented] (CASSANDRA-13523) StreamReceiveTask: java.lang.OutOfMemoryError: Map failed

2017-05-15 Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010591#comment-16010591
 ] 

Chris Lohfink commented on CASSANDRA-13523:
---

The first line in the GC logs provides a lot of the details relevant to this. I 
suspect you are either exceeding one of the kernel limits (i.e. the number of 
memory mappings) or giving the JVM more memory than is actually available to the 
Docker container/system.

bq. Sure, but any advice on what that should be? We've not tweaked that to date.

Follow 
http://docs.datastax.com/en/landing_page/doc/landing_page/recommendedSettings.html
as a good starting point for kernel options, e.g.
{{vm.max_map_count = 1048575}}
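
The second hypothesis can be checked with simple arithmetic: whatever is left after the {{-Xmx}} heap is all that remains for off-heap structures, thread stacks, the page cache and memory-mapped SSTables. A minimal sketch, assuming the JVM arguments are available as a list (for example from {{/proc/<pid>/cmdline}}); the helper names are hypothetical, and the example figures ({{-Ddse.system_memory_in_mb=16048}} and {{-Xmx8024M}}) are the ones reported later in this thread:

```python
import re

# -Xmx<n>[k|m|g]; a bare number is a byte count.
_XMX = re.compile(r"^-Xmx(\d+)([kKmMgG]?)$")
_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def xmx_mb(args):
    """Return the -Xmx heap size in MB, or None if absent (the last -Xmx flag wins)."""
    heap = None
    for arg in args:
        match = _XMX.match(arg)
        if match:
            heap = int(match.group(1)) * _TO_MB[match.group(2).lower()]
    return heap


def memory_outside_heap_mb(system_mb, args):
    """MB left for everything that is not Java heap: off-heap, stacks, page cache, mmapped files."""
    heap = xmx_mb(args)
    if heap is None:
        raise ValueError("no -Xmx flag found in JVM arguments")
    return system_mb - heap
```

With the settings posted below, {{memory_outside_heap_mb(16048, ["-Xms8024M", "-Xmx8024M"])}} leaves 8024 MB outside the heap; whether that is enough depends on the off-heap and mmap usage during repair, which this arithmetic does not measure.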

[jira] [Commented] (CASSANDRA-13523) StreamReceiveTask: java.lang.OutOfMemoryError: Map failed

2017-05-25 Matthew O'Riordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024951#comment-16024951
 ] 

Matthew O'Riordan commented on CASSANDRA-13523:
---

Chris, apologies for my delay on this.

The GC details are as follows:

{code}
INFO  13:51:35  G1 Young Generation GC in 310ms.  G1 Eden Space: 367001600 -> 0; G1 Old Gen: 184442160 -> 244320056;
INFO  13:51:35  Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
Total time for which application threads were stopped: 0.0008716 seconds, Stopping threads took: 0.604 seconds
INFO  13:51:35  MutationStage                     1         0          72774         0                 0
INFO  13:51:35  RequestResponseStage              0         0              0         0                 0
INFO  13:51:35  ReadRepairStage                   0         0              0         0                 0
INFO  13:51:35  CounterMutationStage              0         0              0         0                 0
INFO  13:51:35  ReadStage                         0         0              0         0                 0
INFO  13:51:35  MiscStage                         0         0              0         0                 0
INFO  13:51:35  GossipStage                       0         0              0         0                 0
INFO  13:51:35  CacheCleanupExecutor              0         0              0         0                 0
INFO  13:51:35  InternalResponseStage             0         0              0         0                 0
INFO  13:51:35  CommitLogArchiver                 0         0              0         0                 0
INFO  13:51:35  CompactionExecutor                0         0              1         0                 0
INFO  13:51:35  ValidationExecutor                0         0              0         0                 0
INFO  13:51:35  MigrationStage                    0         0              0         0                 0
INFO  13:51:35  AntiEntropyStage                  0         0              0         0                 0
INFO  13:51:35  Sampler                           0         0              0         0                 0
INFO  13:51:35  MemtableFlushWriter               0         0              1         0                 0
INFO  13:51:35  MemtablePostFlush                 0         0              3         0                 0
INFO  13:51:35  MemtableReclaimMemory             0         0              1         0                 0
{code}
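
The GCInspector summary lines above (and the 4781ms G1 Old Generation collection in the original report) are easier to compare once each pool's before/after figures are turned into a delta. A minimal sketch, assuming the exact line format shown in this thread and G1 pool names only; the function names are hypothetical:

```python
import re

# "<G1 pool>: <bytes before> -> <bytes after>;" as printed by GCInspector in this thread.
_POOL = re.compile(r"(G1 [A-Za-z ]+?): (\d+) -> (\d+);")
_PAUSE = re.compile(r"GC in (\d+)ms")


def gc_pause_ms(line):
    """Return the pause length in ms from a GCInspector summary line, or None."""
    match = _PAUSE.search(line)
    return int(match.group(1)) if match else None


def gc_pool_deltas(line):
    """Return {pool name: bytes after - bytes before}; negative means memory was freed."""
    return {name: int(after) - int(before)
            for name, before, after in _POOL.findall(line)}
```

Applied to the Old Generation line from the original report, this shows roughly 900 MiB (943688600 bytes) reclaimed from {{G1 Old Gen}} during a 4781 ms pause.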

Note that the container does not have an explicit memory limit and thus has 
access to the entire system memory. The instance has 16GB of RAM, and the 
Cassandra JVM is launched with the following settings:

{code}
/usr/bin/java -Ddse.system_memory_in_mb=16048 
-Dcassandra.config.loader=com.datastax.bdp.config.DseConfigurationLoader -ea 
-javaagent:/usr/share/dse/cassandra/lib/jamm-0.3.0.jar 
-XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 -Xms8024M -Xmx8024M -XX:+HeapDumpOnOutOfMemoryError 
-Xss256k -XX:StringTableSize=103 -XX:+UseG1GC -XX:MaxGCPauseMillis=500 
-XX:G1RSetUpdatingPauseTimePercent=5 -XX:+UseCondCardMark 
-XX:+PrintGCApplicationStoppedTime -Djava.net.preferIPv4Stack=true 
-Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.rmi.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=true 
-Dcom.sun.management.jmxremote.password.file=/etc/dse/cassandra/conf/jmx.password
 -Dcom.sun.management.jmxremote.access.file=/etc/dse/cassandra/conf/jmx.access 
-Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra 
-Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra.pid 
-Dcassandra-foreground=yes -cp 
/etc/dse/cassandra/ably-cassandra-auth.jar::/usr/share/dse/dse-core-4.8.6.jar:/usr/share/dse/dse-hadoop-4.8.6.jar:/usr/share/dse/dse-hive-4.8.6.jar:/usr/share/dse/dse-search-4.8.6.jar:/usr/share/dse/dse-spark-4.8.6.jar:/usr/share/dse/dse-sqoop-4.8.6.jar:/usr/share/dse/common/HdrHistogram-1.2.1.1.jar:/usr/share/dse/common/antlr-2.7.7.jar:/usr/share/dse/common/antlr-3.2.jar:/usr/share/dse/common/antlr-runtime-3.2.jar:/usr/share/dse/common/aopalliance-1.0.jar:/usr/share/dse/common/api-asn1-api-1.0.0-M24.jar:/usr/share/dse/common/api-asn1-ber-1.0.0-M24.jar:/usr/share/dse/common/api-i18n-1.0.0-M24.jar:/usr/share/dse/common/api-ldap-client-api-1.0.0-M24.jar:/usr/share/dse/common/api-ldap-codec-core-1.0.0-M24.jar:/usr/share/dse/common/api-ldap-codec-standalone-1.0.0-M24.jar:/usr/share/dse/common/api-ldap-extras-codec-1.0.0-M24.jar:/usr/share/dse/common/api-ldap-extras-codec-api-1.0.0-M24.jar:/usr/share/dse/common/api-ldap-model-1.0.0-M24.jar:/usr/share/dse/common/api-ldap-net-mina-1.0.0-M24.jar:/usr/share/dse/common/api-util-1.0.0-M24.jar:/usr/share/dse/common/asm-5.0.3.jar:/usr/share/dse/common/commons-beanutils-1.