[jira] [Updated] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-09 Philip Thompson (JIRA)

 [ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Thompson updated CASSANDRA-9146:
---
Priority: Major  (was: Blocker)

 Ever Growing Secondary Index sstables after every Repair
 

 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj

 Cluster has reached a state where every repair -pr operation on the CF results 
 in numerous tiny sstables being flushed to disk. Most of the sstables belong to 
 secondary indexes. Due to the thousands of sstables, reads have started timing 
 out. Even though compaction begins for one of the secondary indexes, the 
 sstable count after repair remains very high (thousands). Every repair adds 
 thousands of sstables.
 Problems:
 1. Why are bursts of tiny secondary index sstables flushed during repair? What 
 is triggering the frequent/premature flushing of secondary index sstables (more 
 than a hundred in every burst)? At most we see a single ParNew GC pause of 
 200ms. (A log-grep sketch for checking the flush trigger follows this list.)
 2. Why is auto-compaction not compacting all the sstables? Is it related to the 
 coldness issue (CASSANDRA-8885), where compaction doesn't work even though 
 cold_reads_to_omit=0 by default? 
 If coldness is the issue, we are stuck in an infinite loop: reads would 
 trigger compaction, but reads time out because the sstable count is in the 
 thousands.
 3. What's the way out if we face this issue in production? (A sketch of a 
 possible mitigation follows the Configuration list below.)
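 (Not part of the original report: a minimal sketch for checking what forces 
 the index flushes, assuming the stock 2.0 log format where each flush is 
 logged as "Enqueuing flush of Memtable-<cf>@...", that index memtables are 
 named <cf>.<index>, and that system.log lives at the default path.)

   # Count flushes of the index memtables during the repair window
   grep -c 'Enqueuing flush of Memtable-ks1cf1.ks1cf1Idx' /var/log/cassandra/system.log

   # Show the context around each flush (repair/streaming vs. memtable pressure)
   grep -B2 'Enqueuing flush of Memtable-ks1cf1.ks1cf1Idx1' /var/log/cassandra/system.log | tail -30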
 Is this issue fixed in the latest production release, 2.0.13? It looks similar 
 to CASSANDRA-8641, but that fix went only into 2.1.3; I think it should be 
 applied to the 2.0 branch too. 
 Configuration:
 Compaction Strategy: STCS
 memtable_flush_writers=4
 memtable_flush_queue_size=4
 in_memory_compaction_limit_in_mb=32
 concurrent_compactors=12
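 (Not part of the original report: a sketch of the mitigation we would try if 
 the coldness theory holds, on one node first. cold_reads_to_omit is a 
 per-table STCS option rather than a yaml setting; the <cf>.<index> naming for 
 rebuild_index is an assumption and may differ by version.)

   # Tell STCS to consider cold sstables for compaction on the base table
   echo "ALTER TABLE test_ks1.ks1cf1 WITH compaction =
     {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.0};" | cqlsh

   # Rewrite the bloated index sstables by rebuilding the affected indexes
   nodetool rebuild_index test_ks1 ks1cf1 ks1cf1.ks1cf1Idx1 ks1cf1.ks1cf1Idx2

   # Blunt fallback: force a major compaction of the base CF
   nodetool compact test_ks1 ks1cf1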



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-09 Anuj (JIRA)

 [ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anuj updated CASSANDRA-9146:

Attachment: sstables.txt
system-modified.log

Please find attached the logs:
1. system-modified.log = system logs
2. sstables.txt = listing of the sstables in the ks1cf1 column family of the 
test_ks1 keyspace

Repair -pr was run on the node on three occasions, each time creating numerous 
sstables every second:
2015-04-09 09:14:36 to 2015-04-09 12:07:28
2015-04-09 14:34 (stopped at 15:07)
2015-04-09 15:11

Only 42 sstables exist for ks1cf1Idx3, as it was compacting regularly; the 
other two indexes, ks1cf1Idx1 and ks1cf1Idx2, have 8932 sstables.
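(Not part of the comment: a sketch for reproducing the per-index counts from 
the attached sstables.txt, assuming 2.0-style file names of the form 
<ks>-<cf>.<index>-<version>-<gen>-Data.db.)

  # Per-index sstable counts from the attached listing
  for idx in Idx1 Idx2 Idx3; do
    printf 'ks1cf1%s: ' "$idx"
    grep -c "ks1cf1\.ks1cf1${idx}-.*-Data\.db" sstables.txt
  done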

 Ever Growing Secondary Index sstables after every Repair
 

 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj
 Attachments: sstables.txt, system-modified.log

