[jira] [Commented] (CASSANDRA-8150) Revaluate Default JVM tuning parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526919#comment-14526919 ] Anuj commented on CASSANDRA-8150:
-
We have a write-heavy workload and used to face promotion failures/long GC pauses with Cassandra 2.0.x. I am not into the code yet, but I think that memtable- and compaction-related objects have mid-life, so a write-heavy workload is not well suited to generational collection with the default settings. So we tuned the JVM to make sure that a minimum of objects are promoted to the old generation, and achieved great success with that:

MAX_HEAP_SIZE=12G
HEAP_NEWSIZE=3G
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=20
-XX:CMSInitiatingOccupancyFraction=70
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000"
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

We also think that the default memtable_total_space_in_mb of 1/4 heap is too much for write-heavy loads. By default, the young generation is also 1/4 heap. We reduced memtable space to 1000 MB in order to make sure that memtable-related objects don't stay in memory for too long. Combining this with SurvivorRatio=2 and MaxTenuringThreshold=20 did the job well. GC was very consistent and no full GC was observed.

Environment: 3-node cluster, each node with 24 cores, 64 GB RAM, and SSDs in RAID 5. We are making around 12k writes/sec into 5 CFs (one with 4 secondary indexes) and 2300 reads/sec on each node of the 3-node cluster.
2 CFs have wide rows with a maximum of around 100 MB of data per row.

Revaluate Default JVM tuning parameters
---
Key: CASSANDRA-8150
URL: https://issues.apache.org/jira/browse/CASSANDRA-8150
Project: Cassandra
Issue Type: Improvement
Components: Config
Reporter: Matt Stump
Assignee: Ryan McGuire
Attachments: upload.png

It's been found that the old Twitter recommendation of 100m per core up to 800m is harmful and should no longer be used. Instead, the formula should be 1/3 or 1/4 of max heap, with a cap of 2G. 1/3 vs. 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess, 1/3 is probably better for releases greater than 2.1.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
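The 1/3-or-1/4-with-a-2G-cap rule proposed in the ticket is easy to sanity-check numerically. A minimal shell sketch; `new_gen_mb` is a hypothetical helper for illustration, not code from cassandra-env.sh:

```shell
#!/bin/sh
# Sketch of the proposed sizing rule:
#   HEAP_NEWSIZE = min(MAX_HEAP / divisor, 2G), divisor 3 or 4 (debated).
new_gen_mb() {
    max_heap_mb=$1   # total heap in MB
    frac=$2          # divisor: 3 or 4
    new=$((max_heap_mb / frac))
    cap=2048         # 2G ceiling from the ticket
    [ "$new" -gt "$cap" ] && new=$cap
    echo "$new"
}

new_gen_mb 12288 4   # 12G heap, 1/4 rule: capped at 2048
new_gen_mb 4096 3    # 4G heap, 1/3 rule: 1365
```

Under either divisor, a 12G heap like the one in the comment above hits the 2G cap, which is noticeably smaller than the HEAP_NEWSIZE=3G that commenter chose.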
[jira] [Comment Edited] (CASSANDRA-8150) Revaluate Default JVM tuning parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526919#comment-14526919 ] Anuj edited comment on CASSANDRA-8150 at 5/4/15 5:48 PM: the comment above was edited only to correct the memtable option name (the cassandra.yaml setting is memtable_total_space_in_mb); the rest of the text is unchanged.
[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503307#comment-14503307 ] Anuj commented on CASSANDRA-9146:
-
Yes, we use vnodes. We haven't changed cold_reads_to_omit.

Ever Growing Secondary Index sstables after every Repair
Key: CASSANDRA-9146
URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Anuj
Attachments: sstables.txt, system-modified.log

The cluster has reached a state where every repair -pr operation on a CF results in numerous tiny sstables being flushed to disk. Most of the sstables belong to secondary indexes. Due to thousands of sstables, reads have started timing out. Even though compaction begins for one of the secondary indexes, the sstable count after repair remains very high (thousands), and every repair adds thousands more. Problems:
1. Why are bursts of tiny secondary index sstables flushed during repair? What is triggering the frequent/premature flushing of secondary index sstables (more than a hundred in every burst)? At most we see one ParNew GC pause over 200ms.
2. Why is auto-compaction not compacting all sstables? Is it related to the coldness issue (CASSANDRA-8885), where compaction doesn't work even though cold_reads_to_omit=0 by default? If coldness is the issue, we are stuck in an infinite loop: reads would trigger compaction, but reads time out because the sstable count is in the thousands.
3. What's the way out if we face this issue in production? Is this issue fixed in the latest production release, 2.0.13? The issue looks similar to CASSANDRA-8641, but that is fixed only in 2.1.3. I think it should be fixed in the 2.0 branch too.
Configuration:
Compaction Strategy: STCS
memtable_flush_writers=4
memtable_flush_queue_size=4
in_memory_compaction_limit_in_mb=32
concurrent_compactors=12
[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503028#comment-14503028 ] Anuj commented on CASSANDRA-9146:
-
Marcus, are you looking into this?
[jira] [Created] (CASSANDRA-9177) sstablesplit should provide option to generate sstables of different size
Anuj created CASSANDRA-9177:
---
Summary: sstablesplit should provide option to generate sstables of different size
Key: CASSANDRA-9177
URL: https://issues.apache.org/jira/browse/CASSANDRA-9177
Project: Cassandra
Issue Type: Improvement
Components: Tools
Reporter: Anuj

Currently, sstablesplit generates sstables of the same size. As soon as Cassandra is restarted after running sstablesplit, these sstables are compacted back into a single huge sstable by STCS. sstablesplit should therefore provide an option to split a single sstable into sstables of different sizes.
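One way the requested option could behave, sketched under the assumption that sstables whose sizes fall outside STCS's default bucketing window (roughly 0.5x-1.5x of a bucket's average) are not grouped for compaction. `split_plan_mb` is a hypothetical helper illustrating the idea, not an existing sstablesplit flag:

```shell
#!/bin/sh
# Emit a series of target split sizes (MB) that grows 4x per step, so
# each resulting sstable lands in a different STCS size bucket and is
# not immediately re-compacted into one table. Illustrative only.
split_plan_mb() {
    total_mb=$1
    size=50                      # smallest split, in MB (arbitrary choice)
    while [ "$total_mb" -gt 0 ]; do
        [ "$size" -gt "$total_mb" ] && size=$total_mb
        echo "$size"
        total_mb=$((total_mb - size))
        size=$((size * 4))       # next target: 4x larger, outside the
                                 # default 0.5x-1.5x bucket window
    done
}

split_plan_mb 1000   # a 1000 MB sstable -> splits of 50, 200, 750 MB
```

The existing tool's single `-s/--size` target produces equal-sized outputs, which is exactly the case STCS merges back together; a plan like the above keeps the splits in distinct buckets.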
[jira] [Created] (CASSANDRA-9174) sstablesplit does not splits Secondary Index CFs
Anuj created CASSANDRA-9174:
---
Summary: sstablesplit does not split Secondary Index CFs
Key: CASSANDRA-9174
URL: https://issues.apache.org/jira/browse/CASSANDRA-9174
Project: Cassandra
Issue Type: Bug
Components: Tools
Environment: Cassandra 2.0.3
Reporter: Anuj
Priority: Minor

When you run sstablesplit on a CF, its Secondary Index CFs are not split. If you run sstablesplit on a Secondary Index CF's Data.db file separately, it fails with the message "Unknown keyspace/cf pair".
[jira] [Created] (CASSANDRA-9175) Major Compaction should provide an option to compact Secondary Index CFs
Anuj created CASSANDRA-9175:
---
Summary: Major Compaction should provide an option to compact Secondary Index CFs
Key: CASSANDRA-9175
URL: https://issues.apache.org/jira/browse/CASSANDRA-9175
Project: Cassandra
Issue Type: Wish
Components: Tools
Reporter: Anuj
Priority: Minor

Major compaction on a CF does not compact its Secondary Index CFs. It should have an option to compact all Secondary Index CFs too.
[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487372#comment-14487372 ] Anuj commented on CASSANDRA-9146:
-
Yes, 2.0.3.
[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487423#comment-14487423 ] Anuj commented on CASSANDRA-8938:
-
I think counting it as one read would make the read latency in cfstats misleading, as a range scan may return numerous rows and is generally slower. What about having a separate range scan count and latency? The range scan count could equal the number of rows read in the scan. I think if a range scan reads several rows from an sstable, it should affect hotness proportionately. Cassandra should not worry about the type of workload: data is being read either way, and compaction will be useful whether it's analytics or OLTP.

Full Row Scan does not count towards Reads
--
Key: CASSANDRA-8938
URL: https://issues.apache.org/jira/browse/CASSANDRA-8938
Project: Cassandra
Issue Type: Bug
Components: API, Core, Tools
Environment: Unix, Cassandra 2.0.3
Reporter: Amit Singh Chowdhery
Assignee: Marcus Eriksson
Priority: Minor
Labels: none

When a CQL SELECT statement is executed with a WHERE clause, Read Count is incremented in cfstats for the column family. But when a full row scan is done using a SELECT statement without a WHERE clause, Read Count is not incremented. Similarly, when using Size Tiered Compaction, if we do a full row scan using a Hector RangeSlicesQuery, Read Count is not incremented in cfstats; Cassandra still considers all sstables cold and does not trigger compaction for them. If we fire a MultigetSliceQuery, Read Count is incremented and the sstables become hot, triggering compaction of these sstables.
Expected Behavior:
1. Read Count must be incremented by the number of rows read during a full row scan done using a CQL SELECT statement or Hector RangeSlicesQuery.
2. Size Tiered compaction must consider all sstables hot after a full row scan.
[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487406#comment-14487406 ] Anuj commented on CASSANDRA-9146:
-
Yes, please analyze, as we are facing it in 2.0.3. We may need this fix in the 2.0 branch. Upgrading would take some time, and the scenario is not easily reproducible, but once it occurs in a cluster the burst of sstables grows with every repair. We need to understand why this premature flushing is happening. What's the workaround until we upgrade? If it's the coldness issue that prevents auto-compaction, is there any advice for making all sstables hot (we tried with reads, but without success)? Range scans don't contribute to hotness; that's another open issue I logged some time back.
[jira] [Comment Edited] (CASSANDRA-8938) Full Row Scan does not count towards Reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487439#comment-14487439 ] Anuj edited comment on CASSANDRA-8938 at 4/9/15 2:45 PM:
-
In our case we do some reporting using full table scans in off hours, when resources are free. We don't have an analytics system like Spark, and if sstables are not compacted it affects reporting performance badly. So I think Cassandra should be unbiased: if data is read actively, we gain performance in both workloads. Compaction has a cost and will impact transactional resources temporarily, but subsequent analytics hits would be much faster, and those hits would put less load on Cassandra's resources if the tables are compacted. Compaction can be throttled anyway.
[jira] [Updated] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anuj updated CASSANDRA-9146:
Attachment: sstables.txt
            system-modified.log
Please find attached the logs:
1. system-modified.log = system logs
2. sstables.txt = listing of sstables in the ks1cf1 column family in the test_ks1 keyspace
Repair -pr was run on the node three times, each run creating numerous sstables every second:
2015-04-09 09:14:36 to 2015-04-09 12:07:28
2015-04-09 14:34 (stopped at 15:07)
2015-04-09 15:11
While only 42 sstables exist for ks1cf1Idx3, since it was compacting regularly, the other two indexes, ks1cf1Idx1 and ks1cf1Idx2, have 8932 sstables.
[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487624#comment-14487624 ] Anuj commented on CASSANDRA-8938:
-
Agree!
[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487439#comment-14487439 ] Anuj commented on CASSANDRA-8938:
-
In our case we do some reporting using full table scans in off hours, when resources are free. We don't have an analytics system like Spark, and if sstables are not compacted it affects reporting performance badly. So I think Cassandra should be unbiased: if data is read actively, we gain performance in both workloads. Compaction has a cost and will impact transactional resources temporarily, but subsequent analytics hits would be much faster, and those hits would put less load on Cassandra's resources if the tables are compacted.
[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair
[ https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487385#comment-14487385 ] Anuj commented on CASSANDRA-9146:
-
Small correction: at most one GC pause greater than 200ms PER MINUTE.
[jira] [Created] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair
Anuj created CASSANDRA-9146:
---
Summary: Ever Growing Secondary Index sstables after every Repair
Key: CASSANDRA-9146
URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Anuj
Priority: Blocker
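The growth described in this ticket shows up as per-CF sstable counts in the data directory, like the 8932-vs-42 split in the attached sstables.txt. A hedged sketch: `count_sstables` is a hypothetical helper, and the filename pattern (`<prefix>-<version>-<generation>-Data.db`, where the prefix is `ks-cf` or `ks-cf.index`) is assumed from the 2.0.x on-disk layout; adjust it for your install:

```shell
#!/bin/sh
# Count live sstables per (index) column family by grouping *-Data.db
# files on the name prefix before the version/generation fields, then
# sort so the exploding CFs come out on top. Illustrative only.
count_sstables() {
    data_dir=$1
    find "$data_dir" -name '*-Data.db' |
        awk -F/ '{ n = $NF
                   sub(/-[a-z]+-[0-9]+-Data\.db$/, "", n)  # strip "-jb-123-Data.db"
                   c[n]++ }
                 END { for (k in c) print c[k], k }' |
        sort -rn
}
```

Running this before and after a `repair -pr` would make the thousands-of-tiny-index-sstables burst directly visible per index CF.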
[jira] [Commented] (CASSANDRA-8641) Repair causes a large number of tiny SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484693#comment-14484693 ] Anuj commented on CASSANDRA-8641:
-
We faced the same issue in 2.0.3 when we ran repair under write load; we had substantial data to repair. We are planning to upgrade to 2.0.13 soon. Is the bug fixed in 2.0.13?

Repair causes a large number of tiny SSTables
-
Key: CASSANDRA-8641
URL: https://issues.apache.org/jira/browse/CASSANDRA-8641
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Ubuntu 14.04
Reporter: Flavien Charlon
Fix For: 2.1.3

I have a 3-node cluster with RF = 3, quad core and 32 GB of RAM. I am running 2.1.2 with all the default settings. I'm seeing some strange behavior during incremental repair (under write load). Taking the example of one particular column family: before running an incremental repair, I have about 13 SSTables. After finishing the incremental repair, I have over 114,000 SSTables.
{noformat}
Table: customers
SSTable count: 114688
Space used (live): 97203707290
Space used (total): 99175455072
Space used by snapshots (total): 0
SSTable Compression Ratio: 0.28281112416526505
Memtable cell count: 0
Memtable data size: 0
Memtable switch count: 1069
Local read count: 0
Local read latency: NaN ms
Local write count: 11548705
Local write latency: 0.030 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.0
Bloom filter space used: 144145152
Compacted partition minimum bytes: 311
Compacted partition maximum bytes: 1996099046
Compacted partition mean bytes: 3419
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0.0
{noformat}
Looking at the logs during the repair, it seems Cassandra is struggling to compact minuscule memtables (often just a few kilobytes):
{noformat}
INFO [CompactionExecutor:337] 2015-01-17 01:44:27,011 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-228341,]. 8,332 bytes to 6,547 (~78% of original) in 80,476ms = 0.78MB/s. 32 total partitions merged to 32. Partition merge counts were {1:32, }
INFO [CompactionExecutor:337] 2015-01-17 01:45:35,519 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229348,]. 8,384 bytes to 6,563 (~78% of original) in 6,880ms = 0.000910MB/s. 32 total partitions merged to 32. Partition merge counts were {1:32, }
INFO [CompactionExecutor:339] 2015-01-17 01:47:46,475 CompactionTask.java:251 - Compacted 32 sstables to [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229351,]. 8,423 bytes to 6,401 (~75% of original) in 10,416ms = 0.000586MB/s. 32 total partitions merged to 32. Partition merge counts were {1:32, }
{noformat}
Here is an excerpt of the system logs showing the abnormal flushing:
{noformat}
INFO [AntiEntropyStage:1] 2015-01-17 15:28:43,807 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 634484 (0%) on-heap, 2599489 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:06,823 ColumnFamilyStore.java:840 - Enqueuing flush of levels: 129504 (0%) on-heap, 222168 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:07,940 ColumnFamilyStore.java:840 - Enqueuing flush of chain: 4508 (0%) on-heap, 6880 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:08,124 ColumnFamilyStore.java:840 - Enqueuing flush of invoices: 1469772 (0%) on-heap, 2542675 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:09,471 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 809844 (0%) on-heap, 3364728 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:24,368 ColumnFamilyStore.java:840 - Enqueuing flush of levels: 28212 (0%) on-heap, 44220 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:24,822 ColumnFamilyStore.java:840 - Enqueuing flush of chain: 860 (0%) on-heap, 1130 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:24,985 ColumnFamilyStore.java:840 - Enqueuing flush of invoices: 334480 (0%) on-heap, 568959 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:27,375 ColumnFamilyStore.java:840 - Enqueuing flush of customers: 221568 (0%) on-heap, 929962 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:35,755 ColumnFamilyStore.java:840 - Enqueuing flush of invoices: 7916 (0%) on-heap, 11080 (0%) off-heap
INFO [AntiEntropyStage:1] 2015-01-17 15:29:36,239
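The throughput figures in those compaction lines appear to be derived from the compacted output size divided by wall-clock time (an inference from the numbers above, not from the code). A quick check of the arithmetic:

```shell
# Recompute the logged compaction rate as output_bytes / elapsed time,
# with 1 MB = 1048576 bytes (assumption about how the rate is derived).
mb_per_sec() {
  awk -v bytes="$1" -v ms="$2" 'BEGIN { printf "%.6f\n", bytes / 1048576 / (ms / 1000) }'
}
mb_per_sec 6563 6880    # prints 0.000910, matching the second log entry
mb_per_sec 6401 10416   # prints 0.000586, matching the third
```

The first entry's printed "0.78MB/s" does not fit this formula given 80,476 ms and may have been garbled in transit; either way, rates below 0.001 MB/s confirm these compactions are pathologically small.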
[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386793#comment-14386793 ] Anuj commented on CASSANDRA-8938: - Even though a single-partition read and a range scan are technically different, from an application point of view they are both just reads. I feel that scans should also make sstables HOT and eligible for STCS. Regarding nodetool cfstats: if Read Count and Read Latency do not include scans, don't you think we should have stats for scan count and latency?

Full Row Scan does not count towards Reads -- Key: CASSANDRA-8938 URL: https://issues.apache.org/jira/browse/CASSANDRA-8938 Project: Cassandra Issue Type: Bug Components: API, Core, Tools Environment: Unix, Cassandra 2.0.3 Reporter: Amit Singh Chowdhery Assignee: Marcus Eriksson Priority: Minor Labels: none

When a CQL SELECT statement is executed with a WHERE clause, Read Count is incremented in cfstats for the column family, but when a full row scan is done using a SELECT statement without a WHERE clause, Read Count is not incremented. Similarly, when using Size Tiered Compaction, if we do a full row scan using Hector RangeSlicesQuery, Read Count is not incremented in cfstats; Cassandra still considers all sstables cold and does not trigger compaction for them. If we fire a MultigetSliceQuery, Read Count is incremented and the sstables become hot, triggering compaction of these sstables.

Expected Behavior:
1. Read Count must be incremented by the number of rows read during a full row scan done using a CQL SELECT statement or Hector RangeSlicesQuery.
2. Size Tiered compaction must consider all sstables as hot after a full row scan.
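One way to demonstrate the report is to compare Read Count before and after each query type. The cluster commands below are an ops sketch (keyspace `ks` and table `t` are placeholder names); the small helper at the end just diffs two captured counts:

```shell
# Sketch (requires a live cluster; ks/t are hypothetical names):
#   nodetool cfstats ks.t | grep -i 'read count'   # capture the count
#   cqlsh -e "SELECT * FROM ks.t WHERE id = 1;"    # single-partition read
#   cqlsh -e "SELECT * FROM ks.t;"                 # full scan / range query
#   nodetool cfstats ks.t | grep -i 'read count'   # capture again
# Diff two captured counts:
read_count_delta() {
  awk -v before="$1" -v after="$2" 'BEGIN { print after - before }'
}
read_count_delta 11548705 11548705   # prints 0: the range scan was not counted
```

Per the report, the delta stays 0 after the full scan but increments after the single-partition read.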
[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads
[ https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358306#comment-14358306 ] Anuj commented on CASSANDRA-8938: - Yes, we mean a full row scan (a SELECT query without a WHERE clause). Even if a full row scan reads all sstables, it should be counted as reads, and all sstables should be marked hot and available for the next compaction. There is only one Read Count in cfstats; we are not talking about latency. We think that after a row scan the read count must be incremented and STCS should pick these sstables for compaction, since data has been read from them.

Full Row Scan does not count towards Reads -- Key: CASSANDRA-8938 URL: https://issues.apache.org/jira/browse/CASSANDRA-8938 Project: Cassandra Issue Type: Bug Components: API, Core, Tools Environment: Unix, Cassandra 2.0.3 Reporter: Amit Singh Chowdhery Priority: Minor Labels: none
[jira] [Created] (CASSANDRA-8885) Cold sstable count increases indefinitely in STCS
Anuj created CASSANDRA-8885: --- Summary: Cold sstable count increases indefinitely in STCS Key: CASSANDRA-8885 URL: https://issues.apache.org/jira/browse/CASSANDRA-8885 Project: Cassandra Issue Type: Bug Components: Core Environment: Cassandra 2.0.3 Reporter: Anuj Priority: Minor

This issue concerns the compaction of cold sstables. Compaction Strategy for the CF: Size Tiered (Min Threshold=4, Max Threshold=8)

Scenario: Run a heavy write load (with NO reads) on all 3 nodes of the Cassandra cluster. After some time, compactions stop and the sstable count increases continuously to large numbers (90+). As writes continue, small sstables of similar size are added every few minutes. When writes stop and Cassandra is idle, compaction doesn't happen automatically and the sstable count stays very high. When we start reads on the sstables, compaction is automatically triggered and the sstable count keeps decreasing to reasonable levels.

We think this behaviour is unexpected for the following reasons:
1. As per the documentation (https://issues.apache.org/jira/browse/CASSANDRA-6109 and http://www.datastax.com/dev/blog/optimizations-around-cold-sstables), cold_reads_to_omit is disabled by default in 2.0.3. We are using 2.0.3 and have not enabled cold_reads_to_omit explicitly, yet cold sstables are not getting compacted. Coldness should not come into the picture by default in 2.0.3. Note: compactionstats shows 0 pending tasks while compaction is not happening.
2. In our scenario, we have heavy writes (say, for loading data) followed by reads. By the time reads start, there is a huge number of sstables to compact, and significant compaction must be done before read performance is restored.
3. Even when Cassandra is kept idle after writes, compaction doesn't happen. Don't you think compaction should not look only at coldness? It should also consider the load on the server and use idle time for compaction when possible. Cold sstables can be compacted together.
This way we will have far fewer compactions to do if cold sstables turn hot.
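If cold sstables are the culprit, two workarounds are commonly suggested: disable the coldness heuristic for the table, or trigger a major compaction by hand. A hedged sketch (keyspace `ks` and table `t` are placeholder names; `cold_reads_to_omit` is an STCS compaction subproperty in the 2.0/2.1 line). Both commands need a live cluster, so this is a config fragment rather than something runnable here:

```shell
# Option 1: tell STCS to ignore coldness entirely for this table.
# Note the ALTER replaces the whole compaction map, so the class is repeated.
cqlsh -e "ALTER TABLE ks.t WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'cold_reads_to_omit': 0.0 };"

# Option 2: force a major compaction of the accumulated sstables.
nodetool compact ks t
```

A major compaction under STCS produces one large sstable, which has its own drawbacks, so option 1 is usually tried first.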
[jira] [Commented] (CASSANDRA-8382) Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298995#comment-14298995 ] Anuj commented on CASSANDRA-8382: - Thanks

Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation --- Key: CASSANDRA-8382 URL: https://issues.apache.org/jira/browse/CASSANDRA-8382 Project: Cassandra Issue Type: Bug Components: Documentation website Environment: Red Hat Linux, Cassandra 2.0.3 Reporter: Anuj

Use Case: We have a Geo-Red setup with 2 DCs (DC1 and DC2) having 3 nodes each. The listen address and seeds of all nodes are public IPs, while the rpc addresses are private IPs. Now we want to decommission DC2 and change the public IPs in the listen address/seeds of the DC1 nodes to private IPs, as it will be a single-DC setup.

Issue: Cassandra doesn't provide any standard procedure for changing the IP address of nodes in a cluster. We can bring down nodes one by one, change their IP address, and perform the procedure mentioned in "Replacing a Dead Node" at http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html by passing the public IP of the node in the replace_address option. But the procedure recommends that you must set the auto_bootstrap option to true, and we don't want any bootstrap and data streaming to happen as the data is already on the nodes.

So, our question is: what's the standard procedure for changing the IP address of Cassandra nodes while making sure that no data streaming occurs and the gossip state is not corrupted? We are using vnodes.
[jira] [Commented] (CASSANDRA-8382) Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298993#comment-14298993 ] Anuj commented on CASSANDRA-8382: - Just bouncing C* didn't remove the old IPs. We finally cleared the gossip state on all nodes by adding the following line at the end of cassandra-env.sh: JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false" This solved our issue. I hope the procedure is correct and we need not clear gossip info by other methods, e.g. sudo rm -r /var/lib/cassandra/data/system/peers/* or executing Gossiper.unsafeAssassinateEndpoints(ip_address) via JConsole, as mentioned at http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_gossip_purge.html

Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation --- Key: CASSANDRA-8382 URL: https://issues.apache.org/jira/browse/CASSANDRA-8382 Project: Cassandra Issue Type: Bug Components: Documentation website Environment: Red Hat Linux, Cassandra 2.0.3 Reporter: Anuj
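The workaround described in the comment above amounts to a rolling restart with the ring-state flag set. A sketch of one node's cycle (the service command assumes a package install, which is an assumption; adjust to your deployment):

```shell
# 1. Add the flag at the end of conf/cassandra-env.sh
#    (quotes keep the expanded value as one assignment):
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"

# 2. Restart the node so it rebuilds its ring view from live gossip
#    instead of the locally saved state:
#      sudo service cassandra restart        # package install (assumption)

# 3. Verify with `nodetool status` that the stale endpoints are gone, then
#    remove the flag so later restarts load the saved ring state again.
```

Repeating this node by node keeps the cluster serving traffic throughout.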
[jira] [Commented] (CASSANDRA-8479) Timeout Exception on Node Failure in Remote Data Center
[ https://issues.apache.org/jira/browse/CASSANDRA-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278685#comment-14278685 ] Anuj commented on CASSANDRA-8479: - You are correct. These logs are from 2.0.3. As suggested in CASSANDRA-8352, we upgraded to 2.0.11 and tested the issue; the same issue was observed there. We are using read_repair_chance=1, dclocal_read_repair_chance=0. When we set read_repair_chance=0, killing a node in the local or remote DC didn't lead to any read failures :) We need your help in understanding the following points:

1. We are using strong consistency, i.e. LOCAL_QUORUM for both reads and writes. So even if one of the replicas has an obsolete value, we will read the latest value the next time we read the data. Does that mean read_repair_chance=1 is not required when LOCAL_QUORUM is used for both reads and writes? Should we set read_repair_chance=0, which would give us better performance without sacrificing consistency? What is your recommendation?

2. We are writing to Cassandra at high speed. Is that the reason we are getting digest mismatches during read repair? And is that when Cassandra goes for CL.ALL, irrespective of the fact that we are using CL.LOCAL_QUORUM?

3. I think read repair is comparing digests from replicas in the remote DC also? Isn't that a performance hit? We are currently using Cassandra in Active-Passive mode, so updating the remote DC quickly is not our priority. What's recommended? I tried setting dclocal_read_repair_chance=1 and read_repair_chance=0 in order to make sure that read repairs are only executed within the DC, and I noticed that killing a local node didn't cause any read failures. Does that mean the digest mismatch problem occurs with the node in the remote DC rather than with the digest of the third local node, which didn't participate in the LOCAL_QUORUM read?
4. Documentation at http://www.datastax.com/docs/1.1/configuration/storage_configuration says that read_repair_chance specifies the probability with which read repairs should be invoked on non-quorum reads. What is the significance of non-quorum here? We are using LOCAL_QUORUM and read repair still comes into the picture.

Yes, we misunderstood Tracing. Now that you have identified the issue, do you still need Tracing?

Timeout Exception on Node Failure in Remote Data Center --- Key: CASSANDRA-8479 URL: https://issues.apache.org/jira/browse/CASSANDRA-8479 Project: Cassandra Issue Type: Bug Components: API, Core, Tools Environment: Unix, Cassandra 2.0.11 Reporter: Amit Singh Chowdhery Assignee: Sam Tunnicliffe Priority: Minor Attachments: TRACE_LOGS.zip

Issue Faced: We have a Geo-red setup with 2 Data centers having 3 nodes each. When we bring a single Cassandra node down in DC2 by kill -9 Cassandra-pid, reads fail on DC1 with TimedOutException for a brief amount of time (~15-20 sec). Reference: A ticket has already been opened/resolved; the link is provided below: https://issues.apache.org/jira/browse/CASSANDRA-8352 Activity Done as per Resolution Provided: Upgraded to Cassandra 2.0.11. We have two 3 node clusters in two different DCs, and if one or more of the nodes go down in one Data Center, ~5-10% traffic failure is observed on the other. CL: LOCAL_QUORUM RF=3
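If the conclusion holds that LOCAL_QUORUM on both reads and writes already guarantees reading the latest value, the per-table knobs can be set so background read repair never crosses DCs. A hedged sketch via cqlsh (keyspace `ks` and table `t` are placeholders, and the 0.1 value is illustrative, not a recommendation from this thread); this needs a live cluster, so it is a config fragment:

```shell
# Disable cross-DC read repair; keep a small probability of DC-local repair.
cqlsh -e "ALTER TABLE ks.t
  WITH read_repair_chance = 0.0
  AND dclocal_read_repair_chance = 0.1;"
```

With read_repair_chance = 0, a digest mismatch on a remote replica can no longer escalate a LOCAL_QUORUM read, which matches the observation above that read failures stopped.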
[jira] [Commented] (CASSANDRA-8479) Timeout Exception on Node Failure in Remote Data Center
[ https://issues.apache.org/jira/browse/CASSANDRA-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270893#comment-14270893 ] Anuj commented on CASSANDRA-8479: - I have attached TRACE level logs. You can find multiple ReadTimeoutException in System.log.3 . Once we killed Cassandra on one of the nodes in DC2, around 7 read requests failed for around 17 seconds on DC1 and then everything was back to normal. We need to understand why these reads failed when we are using LOCAL_QUORUM in our application. Also, in another Cassandra log file System.log.2, we saw java.nio.file.NoSuchFileException. We got Hector's HTimeoutException in our application logs during these 17 seconds. Stack Trace from application logs: com.ericsson.rm.service.voucher.InternalServerException: Internal server error, me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException() at com.ericsson.rm.voucher.traffic.reservation.cassandra.CassandraReservation.getReservationSlice(CassandraReservation.java:552) ~[na:na] at com.ericsson.rm.voucher.traffic.reservation.cassandra.CassandraReservation.lookup(CassandraReservation.java:499) ~[na:na] at com.ericsson.rm.voucher.traffic.VoucherTraffic.getReservedOrPendingVoucher(VoucherTraffic.java:764) ~[na:na] at com.ericsson.rm.voucher.traffic.VoucherTraffic.commit(VoucherTraffic.java:686) ~[na:na] ... 6 common frames omitted Caused by: com.ericsson.rm.service.cassandra.xa.ConnectionException: me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException() at com.ericsson.rm.cassandra.xa.keyspace.row.KeyedRowQuery.execute(KeyedRowQuery.java:93) ~[na:na] at com.ericsson.rm.voucher.traffic.reservation.cassandra.CassandraReservation.getReservationSlice(CassandraReservation.java:548) ~[na:na] ... 
9 common frames omitted Caused by: me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException() at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42) ~[na:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:286) ~[na:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:269) ~[na:na] at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104) ~[na:na] at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258) ~[na:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:132) ~[na:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:290) ~[na:na] at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53) ~[na:na] at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49) ~[na:na] at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) ~[na:na] at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:101) ~[na:na] at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48) ~[na:na] at com.ericsson.rm.cassandra.xa.keyspace.row.KeyedRowQuery.execute(KeyedRowQuery.java:77) ~[na:na] ... 
10 common frames omitted Caused by: org.apache.cassandra.thrift.TimedOutException: null at org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.read(Cassandra.java:11504) ~[na:na] at org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.read(Cassandra.java:11453) ~[na:na] at org.apache.cassandra.thrift.Cassandra$get_slice_result.read(Cassandra.java:11379) ~[na:na] at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) ~[na:na] at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:653) ~[na:na] at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:637) ~[na:na] at me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:274) ~[na:na] ... 21 common frames omitted Please have a look at https://issues.apache.org/jira/browse/CASSANDRA-8352 for more details about the issue. Timeout Exception on Node Failure in Remote Data Center --- Key: CASSANDRA-8479 URL: https://issues.apache.org/jira/browse/CASSANDRA-8479 Project: Cassandra Issue Type: Bug Components: API, Core, Tools Environment: Unix, Cassandra 2.0.11 Reporter: Amit Singh
[jira] [Updated] (CASSANDRA-8479) Timeout Exception on Node Failure in Remote Data Center
[ https://issues.apache.org/jira/browse/CASSANDRA-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anuj updated CASSANDRA-8479: Attachment: TRACE_LOGS.zip Trace level logs for the issue. Please see ReadTimeoutException in System.log.3.

Timeout Exception on Node Failure in Remote Data Center --- Key: CASSANDRA-8479 URL: https://issues.apache.org/jira/browse/CASSANDRA-8479 Project: Cassandra Issue Type: Bug Components: API, Core, Tools Environment: Unix, Cassandra 2.0.11 Reporter: Amit Singh Chowdhery Assignee: Ryan McGuire Priority: Minor Attachments: TRACE_LOGS.zip
[jira] [Commented] (CASSANDRA-8382) Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation
[ https://issues.apache.org/jira/browse/CASSANDRA-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259892#comment-14259892 ] Anuj commented on CASSANDRA-8382: - As suggested by you, we restarted all Cassandra nodes with new IPs, but nodetool still shows the nodes with old IPs. After restarting the 3 node cluster with new IPs, we saw a total of 6 nodes in the Cassandra cluster: 3 nodes with new IPs and 3 nodes with old IPs (with ? against them). As the data on the 3 node cluster hasn't changed at all, we don't want to remove the nodes with old IPs; removing each node takes about 12 hrs in our case and is expensive. We want a procedure for changing IP addresses where we need not remove the nodes with old IPs manually. Also, the procedure should not cause any downtime. Please suggest.

Nodetool output after restarting the 3 node cluster with new IPs (10.64.8.90, 10.44.172.45 and 10.44.172.34 are old IPs, whereas 192.168.0.4, 192.168.0.5 and 192.168.0.6 are new IPs):

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns   Host ID                               Rack
DN  10.64.8.90    ?          256     17.2%  90dea63f-793b-40be-abf6-cb4a09a27477  RAC1
UN  192.168.0.4   212.98 KB  256     17.3%  64f51130-ae48-4fe1-84ec-70a7e3f8fb3e  RAC1
DN  10.44.172.45  ?          256     17.9%  9f4d33a3-1d17-4f78-b37e-de272fea72dd  RAC1
UN  192.168.0.6   194.32 KB  256     16.4%  d83e6e06-690b-4733-a134-ebd721da5ecd  RAC1
DN  10.44.172.34  ?          256     16.2%  6d3cf257-501f-4d0a-b894-ef9bbf3647ee  RAC1
UN  192.168.0.5   194.3 KB   256     15.0%  34db82f6-5176-4d72-8446-2d9c607d0453  RAC1

Procedure followed for changing IPs:
1. Stop all Cassandra nodes
2. Update cassandra.yaml with seeds and listen_address as per new IPs
3. Update cassandra-topology.properties with new IPs
4. Restart Cassandra nodes.
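After a restart with -Dcassandra.load_ring_state=false (the fix that eventually worked in this ticket), the ring can be checked for leftover DN entries. A small helper parsing captured `nodetool status` text; the sample rows are illustrative:

```shell
# Count Down ("DN") rows in nodetool-status-style output on stdin.
down_nodes() {
  awk '$1 == "DN" { n++ } END { print n + 0 }'
}

status='DN  10.64.8.90   ?          256  17.2%  90dea63f-793b-40be-abf6-cb4a09a27477  RAC1
UN  192.168.0.4  212.98 KB  256  17.3%  64f51130-ae48-4fe1-84ec-70a7e3f8fb3e  RAC1'
printf '%s\n' "$status" | down_nodes   # prints 1
```

A result of 0 after the rolling restart confirms the stale IPs were purged without running removenode on each of them.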
[jira] [Created] (CASSANDRA-8382) Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation
Anuj created CASSANDRA-8382: --- Summary: Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation Key: CASSANDRA-8382 URL: https://issues.apache.org/jira/browse/CASSANDRA-8382 Project: Cassandra Issue Type: Bug Components: Documentation website Environment: Red Hat Linux, Cassandra 2.0.3 Reporter: Anuj