[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-29 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9914:
---
Fix Version/s: (was: 2.1.x)

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
Assignee: Marcus Eriksson
 Attachments: cass_high_cpu.png, high_pending_compactions.txt


 We have a 3-node test cluster (initially running 2.1.8) with *zero traffic*
 and about 10GB of data on each node.  It's showing millions of pending
 compaction tasks (but no actual work in progress), and the CPUs are pegged on
 all three nodes.  The task count goes down rapidly, but then jumps back up
 again seconds later.  All tables are set to STCS.  The issue persists after
 restart, but takes a few minutes before it becomes a problem.  SSTable counts
 are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about
 every 2-3 mins.

 This started happening after bulk loading some old data.  We started seeing
 very long GC pauses (sometimes 30 min or more) that would bring down the
 nodes.  We then truncated this table, which resulted in the current behavior.
 We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637,
 but we observed the same behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Attachment: high_pending_compactions.txt

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
 Attachments: high_pending_compactions.txt







[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Description: 
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again seconds later.  All tables are 
set to STCS.  The issue persists after restart, but takes a few minutes before 
it becomes a problem.  SSTable counts are below 10 for every table.  We're also 
seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with 
[CASSANDRA-9637|https://issues.apache.org/jira/browse/CASSANDRA-9637], but we 
observed the same behavior.

  was:
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again.  All tables are set to STCS.  
The issue persists after restart, but takes a few minutes before it becomes a 
problem.  SSTable counts are below 10 for every table.  We're also seeing 20s 
Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with 
[CASSANDRA-9637|https://issues.apache.org/jira/browse/CASSANDRA-9637], but we 
observed the same behavior.


 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland






[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Description: 
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again seconds later.  All tables are 
set to STCS.  The issue persists after restart, but takes a few minutes before 
it becomes a problem.  SSTable counts are below 10 for every table.  We're also 
seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, but we 
observed the same behavior.

  was:
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again seconds later.  All tables are 
set to STCS.  The issue persists after restart, but takes a few minutes before 
it becomes a problem.  SSTable counts are below 10 for every table.  We're also 
seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with 
[CASSANDRA-9637|https://issues.apache.org/jira/browse/CASSANDRA-9637], but we 
observed the same behavior.


 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland






[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Attachment: cass_high_cpu.png

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
 Attachments: cass_high_cpu.png, high_pending_compactions.txt







[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Description: 
We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* and 
about 10GB of data on each node.  It's showing millions of pending compaction 
tasks (but no actual work in progress), and the CPUs are pegged on all three 
nodes.  The task count goes down rapidly, but then jumps back up again seconds 
later.  All tables are set to STCS.  The issue persists after restart, but 
takes a few minutes before it becomes a problem.  SSTable counts are below 10 
for every table.  We're also seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, but we 
observed the same behavior.

  was:
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again seconds later.  All tables are 
set to STCS.  The issue persists after restart, but takes a few minutes before 
it becomes a problem.  SSTable counts are below 10 for every table.  We're also 
seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, but we 
observed the same behavior.


 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
 Attachments: cass_high_cpu.png, high_pending_compactions.txt







[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-9914:
---
 Assignee: Marcus Eriksson
Fix Version/s: 2.1.x

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
Assignee: Marcus Eriksson
 Fix For: 2.1.x

 Attachments: cass_high_cpu.png, high_pending_compactions.txt




