[jira] [Commented] (CASSANDRA-6225) GCInspector should not wait after ConcurrentMarkSweep GC to flush memtables and reduce cache size

2013-10-22 Thread Billow Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801887#comment-13801887
 ] 

Billow Gao commented on CASSANDRA-6225:
---

We tried setting
{code}
flush_largest_memtables_at: 1.0
reduce_cache_sizes_at: 1.0
{code}

Heap usage remained at 100% in Cassandra 1.2.9. The node simply hung there 
after we had run the stress write test for a while.

> GCInspector should not wait after ConcurrentMarkSweep GC to flush memtables 
> and reduce cache size
> -
>
> Key: CASSANDRA-6225
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6225
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 1.2.9, SunOS, Java 7
>Reporter: Billow Gao
>
> In GCInspector.logGCResults, Cassandra does not flush memtables or reduce 
> cache sizes until a ConcurrentMarkSweep GC has occurred. This causes a long 
> pause on the service, and other nodes can mark the node as DEAD.
> In our stress test, we used 64 concurrent threads to write data to 
> Cassandra. Heap usage grew quickly and reached the maximum.
> We saw several ConcurrentMarkSweep GCs that freed very little memory until a 
> memtable flush was triggered. The other nodes marked the node as DOWN 
> whenever a GC took more than 20 seconds.
> {code}
> INFO [ScheduledTasks:1] 2013-10-18 15:42:36,176 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 27481 ms for 1 collections, 5229917848 used; max is 6358564864
> INFO [ScheduledTasks:1] 2013-10-18 15:43:14,013 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 27729 ms for 1 collections, 5381504752 used; max is 6358564864
> INFO [ScheduledTasks:1] 2013-10-18 15:43:50,565 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 29867 ms for 1 collections, 5479631256 used; max is 6358564864
> INFO [ScheduledTasks:1] 2013-10-18 15:44:23,457 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 28166 ms for 1 collections, 5545752344 used; max is 6358564864
> INFO [ScheduledTasks:1] 2013-10-18 15:44:58,290 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 29377 ms for 2 collections, 5343255456 used; max is 6358564864
> {code}
> {code}
> INFO [GossipTasks:1] 2013-10-18 15:42:29,004 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> INFO [GossipTasks:1] 2013-10-18 15:43:06,901 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> INFO [GossipTasks:1] 2013-10-18 15:44:18,254 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> INFO [GossipTasks:1] 2013-10-18 15:44:48,507 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> INFO [GossipTasks:1] 2013-10-18 15:45:32,375 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
> {code}
> We found two workarounds for the long pause that results in a DOWN status.
> 1. We reduced the maximum heap size to 3G. The behavior is the same, but GC 
> was faster (under 20 seconds), so no nodes were marked as DOWN.
> 2. We ran a cron job on the Cassandra server that periodically calls 
> nodetool -h localhost flush.
> Flushing after a full GC just makes things worse and wastes the time spent 
> on GC. On a heavily loaded system, several full GCs can occur before a flush 
> finishes (a flush may take more than 30 seconds).
> Ideally, GCInspector should have better logic for deciding when to flush 
> memtables:
> 1. Flush memtables and reduce cache sizes when heap usage reaches a 
> threshold (lower than the full GC threshold).
> 2. Prevent overly frequent flushes by remembering the time of the last flush.
> If we flush before a full GC, the full GC will release the memory occupied 
> by the memtables, which reduces heap usage considerably. Otherwise, full GCs 
> will run again and again until a flush has finished.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (CASSANDRA-6225) GCInspector should not wait after ConcurrentMarkSweep GC to flush memtables and reduce cache size

2013-10-21 Thread Billow Gao (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billow Gao updated CASSANDRA-6225:
--

Environment: Cassandra 1.2.9, SunOS, Java 7  (was: Cassandra 1.2.9, 
Cassandra 1.2.10)






[jira] [Created] (CASSANDRA-6225) GCInspector should not wait after ConcurrentMarkSweep GC to flush memtables and reduce cache size

2013-10-21 Thread Billow Gao (JIRA)
Billow Gao created CASSANDRA-6225:
-

 Summary: GCInspector should not wait after ConcurrentMarkSweep GC 
to flush memtables and reduce cache size
 Key: CASSANDRA-6225
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6225
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 1.2.9, Cassandra 1.2.10
Reporter: Billow Gao


In GCInspector.logGCResults, Cassandra does not flush memtables or reduce 
cache sizes until a ConcurrentMarkSweep GC has occurred. This causes a long 
pause on the service, and other nodes can mark the node as DEAD.

In our stress test, we used 64 concurrent threads to write data to 
Cassandra. Heap usage grew quickly and reached the maximum.
We saw several ConcurrentMarkSweep GCs that freed very little memory until a 
memtable flush was triggered. The other nodes marked the node as DOWN whenever 
a GC took more than 20 seconds.
{code}
INFO [ScheduledTasks:1] 2013-10-18 15:42:36,176 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 27481 ms for 1 collections, 5229917848 used; max is 6358564864
INFO [ScheduledTasks:1] 2013-10-18 15:43:14,013 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 27729 ms for 1 collections, 5381504752 used; max is 6358564864
INFO [ScheduledTasks:1] 2013-10-18 15:43:50,565 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 29867 ms for 1 collections, 5479631256 used; max is 6358564864
INFO [ScheduledTasks:1] 2013-10-18 15:44:23,457 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 28166 ms for 1 collections, 5545752344 used; max is 6358564864
INFO [ScheduledTasks:1] 2013-10-18 15:44:58,290 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 29377 ms for 2 collections, 5343255456 used; max is 6358564864
{code}
{code}
INFO [GossipTasks:1] 2013-10-18 15:42:29,004 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
INFO [GossipTasks:1] 2013-10-18 15:43:06,901 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
INFO [GossipTasks:1] 2013-10-18 15:44:18,254 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
INFO [GossipTasks:1] 2013-10-18 15:44:48,507 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
INFO [GossipTasks:1] 2013-10-18 15:45:32,375 Gossiper.java (line 803) InetAddress /1.2.3.4 is now DOWN
{code}
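
To put the GC entries above in perspective, the "used" and "max" figures can be turned into a heap-occupancy fraction. The following is a minimal sketch and not part of Cassandra: the class GcLogLine and its field names are hypothetical, and the regular expression is only an assumption based on the log lines quoted in this issue. It also assumes "used" and "max" are reported in the same unit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the GCInspector lines quoted in this issue. */
final class GcLogLine {
    // Pattern inferred from the quoted log lines only; other versions may differ.
    private static final Pattern GC_LINE = Pattern.compile(
        "GC for (\\w+): (\\d+) ms for (\\d+) collections, (\\d+) used; max is (\\d+)");

    final String collector;
    final long durationMillis;
    final int collections;
    final long used;
    final long max;

    private GcLogLine(String collector, long durationMillis, int collections,
                      long used, long max) {
        this.collector = collector;
        this.durationMillis = durationMillis;
        this.collections = collections;
        this.used = used;
        this.max = max;
    }

    /** Returns the parsed entry, or null if the line is not a GC entry. */
    static GcLogLine parse(String line) {
        Matcher m = GC_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcLogLine(m.group(1), Long.parseLong(m.group(2)),
            Integer.parseInt(m.group(3)), Long.parseLong(m.group(4)),
            Long.parseLong(m.group(5)));
    }

    /** Fraction of the maximum heap that was reported as used. */
    double heapFraction() {
        return (double) used / max;
    }

    public static void main(String[] args) {
        String line = "INFO [ScheduledTasks:1] 2013-10-18 15:42:36,176 "
            + "GCInspector.java (line 119) GC for ConcurrentMarkSweep: 27481 ms "
            + "for 1 collections, 5229917848 used; max is 6358564864";
        GcLogLine entry = parse(line);
        System.out.printf("%s took %d ms, heap fraction %.3f%n",
            entry.collector, entry.durationMillis, entry.heapFraction());
    }
}
```

For the first entry this works out to roughly 82% of the heap still in use when the entry was logged, even though the collection took more than 27 seconds.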

We found two workarounds for the long pause that results in a DOWN status.
1. We reduced the maximum heap size to 3G. The behavior is the same, but GC 
was faster (under 20 seconds), so no nodes were marked as DOWN.

2. We ran a cron job on the Cassandra server that periodically calls 
nodetool -h localhost flush.

Flushing after a full GC just makes things worse and wastes the time spent on 
GC. On a heavily loaded system, several full GCs can occur before a flush 
finishes (a flush may take more than 30 seconds).

Ideally, GCInspector should have better logic for deciding when to flush 
memtables:
1. Flush memtables and reduce cache sizes when heap usage reaches a threshold 
(lower than the full GC threshold).
2. Prevent overly frequent flushes by remembering the time of the last flush.

If we flush before a full GC, the full GC will release the memory occupied by 
the memtables, which reduces heap usage considerably. Otherwise, full GCs will 
run again and again until a flush has finished.
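
The two-point proposal above (flush at a heap threshold, and rate-limit flushes by remembering the last run) could be sketched roughly as follows. This is only an illustration under stated assumptions, not GCInspector's actual code and not a proposed patch: the class HeapFlushTrigger, its parameters, and the 0.6 threshold and 60-second interval used in main are all hypothetical, and the flush itself is represented by a plain Runnable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a threshold-based flush trigger with a cooldown. */
final class HeapFlushTrigger {
    private static final long NEVER = Long.MIN_VALUE;

    private final double flushThreshold;   // fraction of max heap, assumed value
    private final long minIntervalMillis;  // minimum time between two flushes
    private final Runnable flushAction;    // stands in for the real flush
    private final AtomicLong lastFlushMillis = new AtomicLong(NEVER);

    HeapFlushTrigger(double flushThreshold, long minIntervalMillis, Runnable flushAction) {
        this.flushThreshold = flushThreshold;
        this.minIntervalMillis = minIntervalMillis;
        this.flushAction = flushAction;
    }

    /**
     * Runs the flush action if heap usage is at or above the threshold and the
     * previous flush was long enough ago. Returns true if a flush was started.
     */
    boolean check(long usedBytes, long maxBytes, long nowMillis) {
        if (maxBytes <= 0) {
            return false; // max heap undefined, nothing to compare against
        }
        double usage = (double) usedBytes / maxBytes;
        if (usage < flushThreshold) {
            return false;
        }
        long last = lastFlushMillis.get();
        if (last != NEVER && nowMillis - last < minIntervalMillis) {
            return false; // still inside the cooldown window
        }
        if (!lastFlushMillis.compareAndSet(last, nowMillis)) {
            return false; // another thread triggered a flush first
        }
        flushAction.run();
        return true;
    }

    public static void main(String[] args) {
        HeapFlushTrigger trigger = new HeapFlushTrigger(
            0.6, 60_000L, () -> System.out.println("flush requested"));
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        boolean fired = trigger.check(
            heap.getUsed(), heap.getMax(), System.currentTimeMillis());
        System.out.println("flush triggered: " + fired);
    }
}
```

Calling check periodically, rather than only after a ConcurrentMarkSweep collection has been observed, is what would let the flush start before the heap fills up. The threshold would need to sit below the occupancy at which a full GC starts.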




