[jira] [Commented] (CASSANDRA-8150) Revaluate Default JVM tuning parameters

2015-05-04 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526919#comment-14526919
 ] 

Anuj commented on CASSANDRA-8150:
-

We have a write-heavy workload and used to face promotion failures/long GC 
pauses with Cassandra 2.0.x. I haven't looked into the code yet, but I think 
memtable and compaction related objects are mid-lived, so a write-heavy workload 
is not well suited to the default generational collection settings. So we tuned 
the JVM to make sure that as few objects as possible are promoted to the old 
generation, and achieved great success with that:

MAX_HEAP_SIZE=12G
HEAP_NEWSIZE=3G
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=20
-XX:CMSInitiatingOccupancyFraction=70
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000"
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

We also think that the default total_space_in_mb (1/4 of the heap) is too much 
for write-heavy loads. By default, the young gen is also 1/4 of the heap. We 
reduced the memtable space to 1000 MB in order to make sure that memtable 
related objects don't stay in memory for too long. Combining this with 
SurvivorRatio=2 and MaxTenuringThreshold=20 did the job well. GC was very 
consistent and no full GC was observed.
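A minimal sketch (not part of the original comment) of how the memtable cap described above could be applied on a 2.0.x node; the cassandra.yaml property name memtable_total_space_in_mb, the file paths and the service name are assumptions, so verify them against your install first:

CONF=/etc/cassandra/conf                      # assumed config directory
# Cap the memtable heap budget at 1000 MB (assumes the property is not already set in the file).
echo "memtable_total_space_in_mb: 1000" | sudo tee -a "$CONF/cassandra.yaml"
# Optional: GC logging, to confirm that promotion failures and full GCs are gone.
echo 'JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/cassandra/gc.log"' | sudo tee -a "$CONF/cassandra-env.sh"
sudo service cassandra restart                # one node at a time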

Environment: 3-node cluster, each node with 24 cores, 64 GB RAM and SSDs in 
RAID 5.

We are doing around 12k writes/sec across 5 CFs (one with 4 secondary indexes) 
and 2,300 reads/sec on each node of the 3-node cluster. 2 CFs have wide rows 
with a maximum of around 100 MB of data per row.

 Revaluate Default JVM tuning parameters
 ---

 Key: CASSANDRA-8150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8150
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Matt Stump
Assignee: Ryan McGuire
 Attachments: upload.png


 It's been found that the old Twitter recommendation of 100m per core up to 
 800m is harmful and should no longer be used.
 Instead the formula used should be 1/3 or 1/4 of max heap, with a max of 2G. 1/3 
 or 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess, 
 1/3 is probably better for releases greater than 2.1.
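As a quick worked example of the proposed rule (a sketch only; the 12 GB heap is just an illustrative value):

# new gen = min(max_heap / 4, 2G); with a 12 GB heap, 1/4 is 3072 MB, so the 2G cap wins.
MAX_HEAP_MB=12288
NEW_GEN_MB=$(( MAX_HEAP_MB / 4 ))
if [ "$NEW_GEN_MB" -gt 2048 ]; then NEW_GEN_MB=2048; fi
echo "HEAP_NEWSIZE=${NEW_GEN_MB}M"            # -> HEAP_NEWSIZE=2048M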



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8150) Revaluate Default JVM tuning parameters

2015-05-04 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526919#comment-14526919
 ] 

Anuj edited comment on CASSANDRA-8150 at 5/4/15 5:48 PM:
-

We have a write-heavy workload and used to face promotion failures/long GC 
pauses with Cassandra 2.0.x. I haven't looked into the code yet, but I think 
memtable and compaction related objects are mid-lived, so a write-heavy workload 
is not well suited to the default generational collection settings. So we tuned 
the JVM to make sure that as few objects as possible are promoted to the old 
generation, and achieved great success with that:

MAX_HEAP_SIZE=12G
HEAP_NEWSIZE=3G
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=20
-XX:CMSInitiatingOccupancyFraction=70
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000"
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

We also think that the default total_memtable_space_in_mb (1/4 of the heap) is 
too much for write-heavy loads. By default, the young gen is also 1/4 of the 
heap. We reduced the memtable space to 1000 MB in order to make sure that 
memtable related objects don't stay in memory for too long. Combining this with 
SurvivorRatio=2 and MaxTenuringThreshold=20 did the job well. GC was very 
consistent and no full GC was observed.

Environment: 3-node cluster, each node with 24 cores, 64 GB RAM and SSDs in 
RAID 5.

We are doing around 12k writes/sec across 5 CFs (one with 4 secondary indexes) 
and 2,300 reads/sec on each node of the 3-node cluster. 2 CFs have wide rows 
with a maximum of around 100 MB of data per row.


was (Author: eanujwa):
We have a write-heavy workload and used to face promotion failures/long GC 
pauses with Cassandra 2.0.x. I haven't looked into the code yet, but I think 
memtable and compaction related objects are mid-lived, so a write-heavy workload 
is not well suited to the default generational collection settings. So we tuned 
the JVM to make sure that as few objects as possible are promoted to the old 
generation, and achieved great success with that:

MAX_HEAP_SIZE=12G
HEAP_NEWSIZE=3G
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=20
-XX:CMSInitiatingOccupancyFraction=70
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000"
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

We also think that the default total_space_in_mb (1/4 of the heap) is too much 
for write-heavy loads. By default, the young gen is also 1/4 of the heap. We 
reduced the memtable space to 1000 MB in order to make sure that memtable 
related objects don't stay in memory for too long. Combining this with 
SurvivorRatio=2 and MaxTenuringThreshold=20 did the job well. GC was very 
consistent and no full GC was observed.

Environment: 3-node cluster, each node with 24 cores, 64 GB RAM and SSDs in 
RAID 5.

We are doing around 12k writes/sec across 5 CFs (one with 4 secondary indexes) 
and 2,300 reads/sec on each node of the 3-node cluster. 2 CFs have wide rows 
with a maximum of around 100 MB of data per row.

 Revaluate Default JVM tuning parameters
 ---

 Key: CASSANDRA-8150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8150
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Matt Stump
Assignee: Ryan McGuire
 Attachments: upload.png


 It's been found that the old Twitter recommendation of 100m per core up to 
 800m is harmful and should no longer be used.
 Instead the formula used should be 1/3 or 1/4 of max heap, with a max of 2G. 1/3 
 or 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess, 
 1/3 is probably better for releases greater than 2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-20 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503307#comment-14503307
 ] 

Anuj commented on CASSANDRA-9146:
-

Yes, we use vnodes. We haven't changed cold_reads_to_omit.

 Ever Growing Secondary Index sstables after every Repair
 

 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj
 Attachments: sstables.txt, system-modified.log


 The cluster has reached a state where every repair -pr operation on a CF 
 results in numerous tiny sstables being flushed to disk. Most of the sstables 
 belong to secondary indexes. Due to thousands of sstables, reads have started 
 timing out. Even though compaction begins for one of the secondary indexes, 
 the sstable count after repair remains very high (thousands). Every repair 
 adds thousands of sstables.
 Problems:
 1. Why are bursts of tiny secondary index sstables flushed during repair? What 
 is triggering the frequent/premature flushes of secondary index sstables (more 
 than a hundred in every burst)? At most we see one ParNew GC pause of ~200 ms.
 2. Why is auto-compaction not compacting all sstables? Is it related to the 
 coldness issue (CASSANDRA-8885) where compaction doesn't work even though 
 cold_reads_to_omit=0 by default? 
 If coldness is the issue, we are stuck in an infinite loop: reads would 
 trigger compaction, but reads time out because the sstable count is in the thousands.
 3. What's the way out if we face this issue in production?
 Is this issue fixed in the latest production release, 2.0.13? The issue looks 
 similar to CASSANDRA-8641, but that fix is only in 2.1.3. I think it should be 
 fixed in the 2.0 branch too. 
 Configuration:
 Compaction Strategy: STCS
 memtable_flush_writers=4
 memtable_flush_queue_size=4
 in_memory_compaction_limit_in_mb=32
 concurrent_compactors=12



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-20 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503028#comment-14503028
 ] 

Anuj commented on CASSANDRA-9146:
-

Marcus, are you looking into this?

 Ever Growing Secondary Index sstables after every Repair
 

 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj
 Attachments: sstables.txt, system-modified.log


 The cluster has reached a state where every repair -pr operation on a CF 
 results in numerous tiny sstables being flushed to disk. Most of the sstables 
 belong to secondary indexes. Due to thousands of sstables, reads have started 
 timing out. Even though compaction begins for one of the secondary indexes, 
 the sstable count after repair remains very high (thousands). Every repair 
 adds thousands of sstables.
 Problems:
 1. Why are bursts of tiny secondary index sstables flushed during repair? What 
 is triggering the frequent/premature flushes of secondary index sstables (more 
 than a hundred in every burst)? At most we see one ParNew GC pause of ~200 ms.
 2. Why is auto-compaction not compacting all sstables? Is it related to the 
 coldness issue (CASSANDRA-8885) where compaction doesn't work even though 
 cold_reads_to_omit=0 by default? 
 If coldness is the issue, we are stuck in an infinite loop: reads would 
 trigger compaction, but reads time out because the sstable count is in the thousands.
 3. What's the way out if we face this issue in production?
 Is this issue fixed in the latest production release, 2.0.13? The issue looks 
 similar to CASSANDRA-8641, but that fix is only in 2.1.3. I think it should be 
 fixed in the 2.0 branch too. 
 Configuration:
 Compaction Strategy: STCS
 memtable_flush_writers=4
 memtable_flush_queue_size=4
 in_memory_compaction_limit_in_mb=32
 concurrent_compactors=12



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9177) sstablesplit should provide option to generate sstables of different size

2015-04-12 Thread Anuj (JIRA)
Anuj created CASSANDRA-9177:
---

 Summary: sstablesplit should provide option to generate sstables 
of different size
 Key: CASSANDRA-9177
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9177
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Anuj


Currently, sstablesplit generates sstables of the same size. As soon as 
Cassandra is started after running sstablesplit, these sstables are compacted 
back into a single huge sstable by STCS. 

sstablesplit should therefore provide an option to split a single sstable into 
sstables of different sizes.
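To make the interaction with STCS concrete, here is a hedged sketch of today's behaviour (file path, keyspace/CF names and the service name are hypothetical; the node must be stopped before running the tool):

sudo service cassandra stop
# Every output sstable comes out at the same target size (50 MB here), so on
# restart STCS sees >= min_threshold similarly sized sstables and merges them
# straight back into one large sstable.
sstablesplit --no-snapshot -s 50 /var/lib/cassandra/data/ks1/cf1/ks1-cf1-jb-1234-Data.db
sudo service cassandra start

The requested enhancement is an option that emits a range of sizes instead; no flag name for it is implied above.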



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9174) sstablesplit does not splits Secondary Index CFs

2015-04-12 Thread Anuj (JIRA)
Anuj created CASSANDRA-9174:
---

 Summary: sstablesplit does not splits Secondary Index CFs
 Key: CASSANDRA-9174
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9174
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
 Environment: Cassandra 2.0.3
Reporter: Anuj
Priority: Minor


When you run sstablesplit on a CF, its secondary index CFs are not split. If 
you run sstablesplit on a secondary index CF's Data.db file separately, it 
fails with the message "Unknown keyspace/cf pair".
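A hedged repro sketch (keyspace, CF, index and file names are hypothetical; the file naming follows the usual 2.0.x keyspace-cf[.index]-version-generation pattern):

# Splitting the parent CF leaves the index CF's sstables untouched:
sstablesplit -s 50 /var/lib/cassandra/data/ks1/cf1/ks1-cf1-jb-10-Data.db
# Pointing the tool directly at a secondary index sstable fails:
sstablesplit -s 50 /var/lib/cassandra/data/ks1/cf1/ks1-cf1.cf1_idx1-jb-3-Data.db
# -> "Unknown keyspace/cf pair"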



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9175) Major Compaction should provide an option to compact Secondary Index CFs

2015-04-12 Thread Anuj (JIRA)
Anuj created CASSANDRA-9175:
---

 Summary: Major Compaction should provide an option to compact 
Secondary Index CFs
 Key: CASSANDRA-9175
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9175
 Project: Cassandra
  Issue Type: Wish
  Components: Tools
Reporter: Anuj
Priority: Minor


Major compaction on a CF does not compact its secondary index CFs. It should 
have an option to compact all secondary index CFs too.
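For context, a hedged sketch of the current behaviour (keyspace/CF names hypothetical):

# A user-triggered major compaction today only touches the parent CF; the
# secondary index CFs under it keep their sstable counts unchanged.
nodetool compact ks1 cf1

The wish is for an additional switch on this command that also majors the CF's index CFs; no such flag exists today and none is implied here.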



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-09 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487372#comment-14487372
 ] 

Anuj commented on CASSANDRA-9146:
-

Yes, 2.0.3.

 Ever Growing Secondary Index sstables after every Repair
 

 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj

 The cluster has reached a state where every repair -pr operation on a CF 
 results in numerous tiny sstables being flushed to disk. Most of the sstables 
 belong to secondary indexes. Due to thousands of sstables, reads have started 
 timing out. Even though compaction begins for one of the secondary indexes, 
 the sstable count after repair remains very high (thousands). Every repair 
 adds thousands of sstables.
 Problems:
 1. Why are bursts of tiny secondary index sstables flushed during repair? What 
 is triggering the frequent/premature flushes of secondary index sstables (more 
 than a hundred in every burst)? At most we see one ParNew GC pause of ~200 ms.
 2. Why is auto-compaction not compacting all sstables? Is it related to the 
 coldness issue (CASSANDRA-8885) where compaction doesn't work even though 
 cold_reads_to_omit=0 by default? 
 If coldness is the issue, we are stuck in an infinite loop: reads would 
 trigger compaction, but reads time out because the sstable count is in the thousands.
 3. What's the way out if we face this issue in production?
 Is this issue fixed in the latest production release, 2.0.13? The issue looks 
 similar to CASSANDRA-8641, but that fix is only in 2.1.3. I think it should be 
 fixed in the 2.0 branch too. 
 Configuration:
 Compaction Strategy: STCS
 memtable_flush_writers=4
 memtable_flush_queue_size=4
 in_memory_compaction_limit_in_mb=32
 concurrent_compactors=12



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads

2015-04-09 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487423#comment-14487423
 ] 

Anuj commented on CASSANDRA-8938:
-

I think counting it as one read would make the read latency in cfstats 
misleading, as a range scan may return numerous rows and is generally slower. 
What about having a separate range scan count and latency? The range scan count 
could be equal to the number of rows read in the scan. I think if a range scan 
reads several rows from an sstable, it should impact hotness proportionately. 
Cassandra should not worry about the type of workload: data is being read, and 
compaction will be useful whether it's analytics or OLTP.

 Full Row Scan does not count towards Reads
 --

 Key: CASSANDRA-8938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8938
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.3
Reporter: Amit Singh Chowdhery
Assignee: Marcus Eriksson
Priority: Minor
  Labels: none

 When a CQL SELECT statement is executed with a WHERE clause, Read Count is 
 incremented in cfstats for the column family. But when a full row scan is 
 done using a SELECT statement without a WHERE clause, Read Count is not 
 incremented. 
 Similarly, when using Size Tiered Compaction, if we do a full row scan using 
 Hector's RangeSlicesQuery, Read Count is not incremented in cfstats; Cassandra 
 still considers all sstables cold and does not trigger compaction for 
 them. If we fire a MultigetSliceQuery, Read Count is incremented and the 
 sstables become hot, triggering compaction of these sstables. 
 Expected behavior:
 1. Read Count must be incremented by the number of rows read during a full row 
 scan done using a CQL SELECT statement or Hector's RangeSlicesQuery.
 2. Size Tiered Compaction must consider all sstables hot after a full row 
 scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-09 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487406#comment-14487406
 ] 

Anuj commented on CASSANDRA-9146:
-

Yes, please analyze, as we are facing it in 2.0.3. We may need this fix in the 
2.0 branch. Upgrading would take some time and the scenario is not easily 
reproducible, but once it occurs in a cluster this burst of sstables grows with 
every repair. We need to understand why this premature flushing is happening. 
What's the workaround until we upgrade? If it's the coldness issue that 
prevents auto compaction, is there any advice for making all sstables hot (we 
tried with reads but without success)? Range scans don't contribute to 
hotness; that's another open issue I logged some time back.

 Ever Growing Secondary Index sstables after every Repair
 

 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj

 The cluster has reached a state where every repair -pr operation on a CF 
 results in numerous tiny sstables being flushed to disk. Most of the sstables 
 belong to secondary indexes. Due to thousands of sstables, reads have started 
 timing out. Even though compaction begins for one of the secondary indexes, 
 the sstable count after repair remains very high (thousands). Every repair 
 adds thousands of sstables.
 Problems:
 1. Why are bursts of tiny secondary index sstables flushed during repair? What 
 is triggering the frequent/premature flushes of secondary index sstables (more 
 than a hundred in every burst)? At most we see one ParNew GC pause of ~200 ms.
 2. Why is auto-compaction not compacting all sstables? Is it related to the 
 coldness issue (CASSANDRA-8885) where compaction doesn't work even though 
 cold_reads_to_omit=0 by default? 
 If coldness is the issue, we are stuck in an infinite loop: reads would 
 trigger compaction, but reads time out because the sstable count is in the thousands.
 3. What's the way out if we face this issue in production?
 Is this issue fixed in the latest production release, 2.0.13? The issue looks 
 similar to CASSANDRA-8641, but that fix is only in 2.1.3. I think it should be 
 fixed in the 2.0 branch too. 
 Configuration:
 Compaction Strategy: STCS
 memtable_flush_writers=4
 memtable_flush_queue_size=4
 in_memory_compaction_limit_in_mb=32
 concurrent_compactors=12



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8938) Full Row Scan does not count towards Reads

2015-04-09 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487439#comment-14487439
 ] 

Anuj edited comment on CASSANDRA-8938 at 4/9/15 2:45 PM:
-

In our case we do some reporting using full table scans in off hours when 
resources are free. We don't have an analytics system like Spark. If sstables 
are not compacted, it affects reporting performance badly. So I think Cassandra 
should be unbiased: if data is read actively, we gain performance in both 
workloads. Compaction has a cost and will impact transactional resources 
temporarily, but further analytics hits would be much faster, and those hits 
will put less load on Cassandra resources if the tables are compacted. 
Compaction can be throttled anyway.


was (Author: eanujwa):
In our case we do some reporting using full table scans in off hours when 
resources are free. We don't have an analytics system like Spark. If sstables 
are not compacted, it affects reporting performance badly. So I think Cassandra 
should be unbiased: if data is read actively, we gain performance in both 
workloads. Compaction has a cost and will impact transactional resources 
temporarily, but further analytics hits would be much faster, and those hits 
will put less load on Cassandra resources if the tables are compacted. 

 Full Row Scan does not count towards Reads
 --

 Key: CASSANDRA-8938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8938
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.3
Reporter: Amit Singh Chowdhery
Assignee: Marcus Eriksson
Priority: Minor
  Labels: none

 When a CQL SELECT statement is executed with a WHERE clause, Read Count is 
 incremented in cfstats for the column family. But when a full row scan is 
 done using a SELECT statement without a WHERE clause, Read Count is not 
 incremented. 
 Similarly, when using Size Tiered Compaction, if we do a full row scan using 
 Hector's RangeSlicesQuery, Read Count is not incremented in cfstats; Cassandra 
 still considers all sstables cold and does not trigger compaction for 
 them. If we fire a MultigetSliceQuery, Read Count is incremented and the 
 sstables become hot, triggering compaction of these sstables. 
 Expected behavior:
 1. Read Count must be incremented by the number of rows read during a full row 
 scan done using a CQL SELECT statement or Hector's RangeSlicesQuery.
 2. Size Tiered Compaction must consider all sstables hot after a full row 
 scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-09 Thread Anuj (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj updated CASSANDRA-9146:

Attachment: sstables.txt
system-modified.log

Please find the logs attached:
1. system-modified.log = system logs
2. sstables.txt = listing of the sstables in the ks1cf1 column family in the 
test_ks1 keyspace

Repair -pr was run on the node on 3 occasions, each time creating numerous 
sstables every second:
2015-04-09 09:14:36 to 2015-04-09 12:07:28
2015-04-09 14:34 (stopped at 15:07)
2015-04-09 15:11 

Only 42 sstables exist for ks1cf1Idx3, as it was compacting regularly; the 
other two indexes, ks1cf1Idx1 and ks1cf1Idx2, have 8932 sstables.
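A hedged sketch of how such per-index counts can be pulled from the data directory (the directory layout and file-name pattern are assumptions for 2.0.x):

DATA=/var/lib/cassandra/data/test_ks1/ks1cf1      # assumed data directory
for idx in ks1cf1Idx1 ks1cf1Idx2 ks1cf1Idx3; do
  echo -n "$idx: "
  ls "$DATA" | grep -c "\.${idx}-.*-Data\.db"     # index sstables carry the index name
done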

 Ever Growing Secondary Index sstables after every Repair
 

 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj
 Attachments: sstables.txt, system-modified.log


 The cluster has reached a state where every repair -pr operation on a CF 
 results in numerous tiny sstables being flushed to disk. Most of the sstables 
 belong to secondary indexes. Due to thousands of sstables, reads have started 
 timing out. Even though compaction begins for one of the secondary indexes, 
 the sstable count after repair remains very high (thousands). Every repair 
 adds thousands of sstables.
 Problems:
 1. Why are bursts of tiny secondary index sstables flushed during repair? What 
 is triggering the frequent/premature flushes of secondary index sstables (more 
 than a hundred in every burst)? At most we see one ParNew GC pause of ~200 ms.
 2. Why is auto-compaction not compacting all sstables? Is it related to the 
 coldness issue (CASSANDRA-8885) where compaction doesn't work even though 
 cold_reads_to_omit=0 by default? 
 If coldness is the issue, we are stuck in an infinite loop: reads would 
 trigger compaction, but reads time out because the sstable count is in the thousands.
 3. What's the way out if we face this issue in production?
 Is this issue fixed in the latest production release, 2.0.13? The issue looks 
 similar to CASSANDRA-8641, but that fix is only in 2.1.3. I think it should be 
 fixed in the 2.0 branch too. 
 Configuration:
 Compaction Strategy: STCS
 memtable_flush_writers=4
 memtable_flush_queue_size=4
 in_memory_compaction_limit_in_mb=32
 concurrent_compactors=12



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads

2015-04-09 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487624#comment-14487624
 ] 

Anuj commented on CASSANDRA-8938:
-

Agreed!

 Full Row Scan does not count towards Reads
 --

 Key: CASSANDRA-8938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8938
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.3
Reporter: Amit Singh Chowdhery
Assignee: Marcus Eriksson
Priority: Minor
  Labels: none

 When a CQL SELECT statement is executed with a WHERE clause, Read Count is 
 incremented in cfstats for the column family. But when a full row scan is 
 done using a SELECT statement without a WHERE clause, Read Count is not 
 incremented. 
 Similarly, when using Size Tiered Compaction, if we do a full row scan using 
 Hector's RangeSlicesQuery, Read Count is not incremented in cfstats; Cassandra 
 still considers all sstables cold and does not trigger compaction for 
 them. If we fire a MultigetSliceQuery, Read Count is incremented and the 
 sstables become hot, triggering compaction of these sstables. 
 Expected behavior:
 1. Read Count must be incremented by the number of rows read during a full row 
 scan done using a CQL SELECT statement or Hector's RangeSlicesQuery.
 2. Size Tiered Compaction must consider all sstables hot after a full row 
 scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads

2015-04-09 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487439#comment-14487439
 ] 

Anuj commented on CASSANDRA-8938:
-

In our case we do some reporting using full table scans in off hours when 
resources are free. We don't have an analytics system like Spark. If sstables 
are not compacted, it affects reporting performance badly. So I think Cassandra 
should be unbiased: if data is read actively, we gain performance in both 
workloads. Compaction has a cost and will impact transactional resources 
temporarily, but further analytics hits would be much faster, and those hits 
will put less load on Cassandra resources if the tables are compacted. 

 Full Row Scan does not count towards Reads
 --

 Key: CASSANDRA-8938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8938
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.3
Reporter: Amit Singh Chowdhery
Assignee: Marcus Eriksson
Priority: Minor
  Labels: none

 When a CQL SELECT statement is executed with a WHERE clause, Read Count is 
 incremented in cfstats for the column family. But when a full row scan is 
 done using a SELECT statement without a WHERE clause, Read Count is not 
 incremented. 
 Similarly, when using Size Tiered Compaction, if we do a full row scan using 
 Hector's RangeSlicesQuery, Read Count is not incremented in cfstats; Cassandra 
 still considers all sstables cold and does not trigger compaction for 
 them. If we fire a MultigetSliceQuery, Read Count is incremented and the 
 sstables become hot, triggering compaction of these sstables. 
 Expected behavior:
 1. Read Count must be incremented by the number of rows read during a full row 
 scan done using a CQL SELECT statement or Hector's RangeSlicesQuery.
 2. Size Tiered Compaction must consider all sstables hot after a full row 
 scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-09 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487385#comment-14487385
 ] 

Anuj commented on CASSANDRA-9146:
-

Small correction: at most one GC pause greater than 200 ms PER MINUTE.

 Ever Growing Secondary Index sstables after every Repair
 

 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj

 The cluster has reached a state where every repair -pr operation on a CF 
 results in numerous tiny sstables being flushed to disk. Most of the sstables 
 belong to secondary indexes. Due to thousands of sstables, reads have started 
 timing out. Even though compaction begins for one of the secondary indexes, 
 the sstable count after repair remains very high (thousands). Every repair 
 adds thousands of sstables.
 Problems:
 1. Why are bursts of tiny secondary index sstables flushed during repair? What 
 is triggering the frequent/premature flushes of secondary index sstables (more 
 than a hundred in every burst)? At most we see one ParNew GC pause of ~200 ms.
 2. Why is auto-compaction not compacting all sstables? Is it related to the 
 coldness issue (CASSANDRA-8885) where compaction doesn't work even though 
 cold_reads_to_omit=0 by default? 
 If coldness is the issue, we are stuck in an infinite loop: reads would 
 trigger compaction, but reads time out because the sstable count is in the thousands.
 3. What's the way out if we face this issue in production?
 Is this issue fixed in the latest production release, 2.0.13? The issue looks 
 similar to CASSANDRA-8641, but that fix is only in 2.1.3. I think it should be 
 fixed in the 2.0 branch too. 
 Configuration:
 Compaction Strategy: STCS
 memtable_flush_writers=4
 memtable_flush_queue_size=4
 in_memory_compaction_limit_in_mb=32
 concurrent_compactors=12



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9146) Ever Growing Secondary Index sstables after every Repair

2015-04-09 Thread Anuj (JIRA)
Anuj created CASSANDRA-9146:
---

 Summary: Ever Growing Secondary Index sstables after every Repair
 Key: CASSANDRA-9146
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9146
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anuj
Priority: Blocker


The cluster has reached a state where every repair -pr operation on a CF results 
in numerous tiny sstables being flushed to disk. Most of the sstables belong to 
secondary indexes. Due to thousands of sstables, reads have started timing out. 
Even though compaction begins for one of the secondary indexes, the sstable 
count after repair remains very high (thousands). Every repair adds thousands of 
sstables.

Problems:
1. Why are bursts of tiny secondary index sstables flushed during repair? What is 
triggering the frequent/premature flushes of secondary index sstables (more than 
a hundred in every burst)? At most we see one ParNew GC pause of ~200 ms.

2. Why is auto-compaction not compacting all sstables? Is it related to the 
coldness issue (CASSANDRA-8885) where compaction doesn't work even though 
cold_reads_to_omit=0 by default? 
   If coldness is the issue, we are stuck in an infinite loop: reads would trigger 
compaction, but reads time out because the sstable count is in the thousands.
3. What's the way out if we face this issue in production?

Is this issue fixed in the latest production release, 2.0.13? The issue looks 
similar to CASSANDRA-8641, but that fix is only in 2.1.3. I think it should be 
fixed in the 2.0 branch too. 

Configuration:
Compaction Strategy: STCS
memtable_flush_writers=4
memtable_flush_queue_size=4
in_memory_compaction_limit_in_mb=32
concurrent_compactors=12



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8641) Repair causes a large number of tiny SSTables

2015-04-07 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484693#comment-14484693
 ] 

Anuj commented on CASSANDRA-8641:
-

We faced the same issue in 2.0.3 when we ran repair under write load. We had 
substantial data to repair. We are planning to upgrade to 2.0.13 soon. Is the 
bug fixed in 2.0.13?

 Repair causes a large number of tiny SSTables
 -

 Key: CASSANDRA-8641
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8641
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04
Reporter: Flavien Charlon
 Fix For: 2.1.3


 I have a 3-node cluster with RF = 3, quad core and 32 GB of RAM. I am 
 running 2.1.2 with all the default settings. I'm seeing some strange 
 behaviors during incremental repair (under write load).
 Taking the example of one particular column family, before running an 
 incremental repair, I have about 13 SSTables. After finishing the incremental 
 repair, I have over 114000 SSTables.
 {noformat}
 Table: customers
 SSTable count: 114688
 Space used (live): 97203707290
 Space used (total): 99175455072
 Space used by snapshots (total): 0
 SSTable Compression Ratio: 0.28281112416526505
 Memtable cell count: 0
 Memtable data size: 0
 Memtable switch count: 1069
 Local read count: 0
 Local read latency: NaN ms
 Local write count: 11548705
 Local write latency: 0.030 ms
 Pending flushes: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 144145152
 Compacted partition minimum bytes: 311
 Compacted partition maximum bytes: 1996099046
 Compacted partition mean bytes: 3419
 Average live cells per slice (last five minutes): 0.0
 Maximum live cells per slice (last five minutes): 0.0
 Average tombstones per slice (last five minutes): 0.0
 Maximum tombstones per slice (last five minutes): 0.0
 {noformat}
 Looking at the logs during the repair, it seems Cassandra is struggling to 
 compact minuscule memtables (often just a few kilobytes):
 {noformat}
 INFO  [CompactionExecutor:337] 2015-01-17 01:44:27,011 
 CompactionTask.java:251 - Compacted 32 sstables to 
 [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-228341,].
   8,332 bytes to 6,547 (~78% of original) in 80,476ms = 0.78MB/s.  32 
 total partitions merged to 32.  Partition merge counts were {1:32, }
 INFO  [CompactionExecutor:337] 2015-01-17 01:45:35,519 
 CompactionTask.java:251 - Compacted 32 sstables to 
 [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229348,].
   8,384 bytes to 6,563 (~78% of original) in 6,880ms = 0.000910MB/s.  32 
 total partitions merged to 32.  Partition merge counts were {1:32, }
 INFO  [CompactionExecutor:339] 2015-01-17 01:47:46,475 
 CompactionTask.java:251 - Compacted 32 sstables to 
 [/mnt/data/cassandra/data/business/customers-d9d42d209ccc11e48ca54553c90a9d45/business-customers-ka-229351,].
   8,423 bytes to 6,401 (~75% of original) in 10,416ms = 0.000586MB/s.  32 
 total partitions merged to 32.  Partition merge counts were {1:32, }
 {noformat}
  
 Here is an excerpt of the system logs showing the abnormal flushing:
 {noformat}
 INFO  [AntiEntropyStage:1] 2015-01-17 15:28:43,807 ColumnFamilyStore.java:840 
 - Enqueuing flush of customers: 634484 (0%) on-heap, 2599489 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:06,823 ColumnFamilyStore.java:840 
 - Enqueuing flush of levels: 129504 (0%) on-heap, 222168 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:07,940 ColumnFamilyStore.java:840 
 - Enqueuing flush of chain: 4508 (0%) on-heap, 6880 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:08,124 ColumnFamilyStore.java:840 
 - Enqueuing flush of invoices: 1469772 (0%) on-heap, 2542675 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:09,471 ColumnFamilyStore.java:840 
 - Enqueuing flush of customers: 809844 (0%) on-heap, 3364728 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:24,368 ColumnFamilyStore.java:840 
 - Enqueuing flush of levels: 28212 (0%) on-heap, 44220 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:24,822 ColumnFamilyStore.java:840 
 - Enqueuing flush of chain: 860 (0%) on-heap, 1130 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:24,985 ColumnFamilyStore.java:840 
 - Enqueuing flush of invoices: 334480 (0%) on-heap, 568959 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:27,375 ColumnFamilyStore.java:840 
 - Enqueuing flush of customers: 221568 (0%) on-heap, 929962 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:35,755 ColumnFamilyStore.java:840 
 - Enqueuing flush of invoices: 7916 (0%) on-heap, 11080 (0%) off-heap
 INFO  [AntiEntropyStage:1] 2015-01-17 15:29:36,239 

[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads

2015-03-30 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386793#comment-14386793
 ] 

Anuj commented on CASSANDRA-8938:
-

Even though a single-partition read and a range scan are technically different, 
from the application's point of view they are just reads. I feel that scans 
should also make sstables HOT and make them eligible for STCS. 
Regarding nodetool cfstats, if Read Count and Read Latency do not include 
scans, don't you think we should have stats for scan count and latency?

 Full Row Scan does not count towards Reads
 --

 Key: CASSANDRA-8938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8938
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.3
Reporter: Amit Singh Chowdhery
Assignee: Marcus Eriksson
Priority: Minor
  Labels: none

 When a CQL SELECT statement is executed with a WHERE clause, Read Count is 
 incremented in cfstats for the column family. But when a full row scan is 
 done using a SELECT statement without a WHERE clause, Read Count is not 
 incremented. 
 Similarly, when using Size Tiered Compaction, if we do a full row scan using 
 Hector's RangeSlicesQuery, Read Count is not incremented in cfstats; Cassandra 
 still considers all sstables cold and does not trigger compaction for 
 them. If we fire a MultigetSliceQuery, Read Count is incremented and the 
 sstables become hot, triggering compaction of these sstables. 
 Expected behavior:
 1. Read Count must be incremented by the number of rows read during a full row 
 scan done using a CQL SELECT statement or Hector's RangeSlicesQuery.
 2. Size Tiered Compaction must consider all sstables hot after a full row 
 scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8938) Full Row Scan does not count towards Reads

2015-03-12 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358306#comment-14358306
 ] 

Anuj commented on CASSANDRA-8938:
-

Yes, we mean a full row scan (a SELECT query without a WHERE clause). Even if a 
full row scan reads all sstables, it should be counted as reads, and all 
sstables should be marked hot and available for the next compaction. 

There is only one Read Count when you run cfstats; we are not talking about 
latency.

We think that after a row scan the read count must be incremented, and STCS 
should pick these sstables for compaction, as data has been read from them.  

 Full Row Scan does not count towards Reads
 --

 Key: CASSANDRA-8938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8938
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.3
Reporter: Amit Singh Chowdhery
Priority: Minor
  Labels: none

 When a CQL SELECT statement is executed with a WHERE clause, Read Count is 
 incremented in cfstats for the column family. But when a full row scan is 
 done using a SELECT statement without a WHERE clause, Read Count is not 
 incremented. 
 Similarly, when using Size Tiered Compaction, if we do a full row scan using 
 Hector's RangeSlicesQuery, Read Count is not incremented in cfstats; Cassandra 
 still considers all sstables cold and does not trigger compaction for 
 them. If we fire a MultigetSliceQuery, Read Count is incremented and the 
 sstables become hot, triggering compaction of these sstables. 
 Expected behavior:
 1. Read Count must be incremented by the number of rows read during a full row 
 scan done using a CQL SELECT statement or Hector's RangeSlicesQuery.
 2. Size Tiered Compaction must consider all sstables hot after a full row 
 scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8885) Cold sstable count increases indefinitely in STCS

2015-03-02 Thread Anuj (JIRA)
Anuj created CASSANDRA-8885:
---

 Summary: Cold sstable count increases indefinitely in STCS
 Key: CASSANDRA-8885
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8885
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.0.3
Reporter: Anuj
Priority: Minor


This issue is regarding compaction of cold sstables.
Compaction strategy for the CF: Size Tiered (min_threshold=4, max_threshold=8)

Scenario: run a heavy write load (with NO reads) on all 3 nodes of the Cassandra 
cluster. After some time, compactions stop and the sstable count increases 
continuously to large numbers (90+). As writes continue, small sstables of 
similar size are added every few minutes. When writes stop and Cassandra is 
idle, compaction doesn't happen automatically and the sstable count is still 
very high. When we start reads on the sstables, compaction is automatically 
triggered and the sstable count keeps decreasing to reasonable levels.

We think this behaviour is unexpected for the following reasons:

1. As per the documentation 
(https://issues.apache.org/jira/browse/CASSANDRA-6109 and 
http://www.datastax.com/dev/blog/optimizations-around-cold-sstables), 
cold_reads_to_omit is disabled by default in 2.0.3. We are using 2.0.3 and have 
not enabled cold_reads_to_omit explicitly. Still, cold sstables are not getting 
compacted. Coldness should not come into the picture by default in 2.0.3 (a 
workaround sketch follows after this list).
Note: compactionstats shows 0 pending tasks while the compaction process is 
not happening.

2. In our scenario, we have heavy writes (say, for loading data) 
followed by reads. By the time reads start, there is a huge number of sstables 
to compact. Significant compaction needs to be done before read performance is 
restored. 

3. Even when Cassandra is kept idle after writes, compaction doesn't 
happen. Don't you think compaction should not just look at coldness? It should 
also consider the load on the server and utilize idle time for compaction when 
possible. Cold sstables can be compacted together. This way we will have much 
less compaction to do if cold sstables turn hot.
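As a possible workaround sketch (not from the original report), cold_reads_to_omit can be pinned explicitly so coldness can never gate STCS. The compaction sub-option name comes from the CASSANDRA-6109 work; the keyspace/table name and thresholds are hypothetical, and the cqlsh -e flag is assumed, so verify both against your version first:

cqlsh -e "ALTER TABLE ks1.cf1 WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'min_threshold': 4, 'max_threshold': 8,
  'cold_reads_to_omit': 0.0 };"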



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8382) Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation

2015-01-30 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298995#comment-14298995
 ] 

Anuj commented on CASSANDRA-8382:
-

Thanks

 Procedure to Change IP Address without Data streaming is Missing in Cassandra 
 Documentation
 ---

 Key: CASSANDRA-8382
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8382
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
 Environment: Red Hat Linux , Cassandra 2.0.3
Reporter: Anuj

 Use Case: 
 We have a Geo-Red setup with 2 DCs (DC1 and DC2) having 3 nodes each. Listen 
 address and seeds of all nodes are Public IPs while rpc addresses are private 
 IPs.  Now, we want to decommission a DC2 and change public IPs in listen 
 address/seeds of DC1 nodes to private IPs as it will be a single DC setup.
 Issue: 
 Cassandra doesn’t provide any standard procedure for changing IP address of 
 nodes in a cluster. We can bring down nodes, one by one, change their IP 
 address and perform the procedure mentioned in “ Replacing a Dead Node” at 
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
   by specifying the public IP of the node in the replace_address option. But the 
 procedure recommends that you must set the auto_bootstrap option to true. We 
 don't want any bootstrap or data streaming to happen as the data is already 
 there on the nodes. So, our question is: what's the standard procedure for 
 changing the IP address of Cassandra nodes while making sure that no data 
 streaming occurs and the gossip state is not corrupted?
 We are using vnodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8382) Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation

2015-01-30 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298993#comment-14298993
 ] 

Anuj commented on CASSANDRA-8382:
-

Just bouncing C* didn't remove the old IPs. We finally cleared the gossip state 
on all nodes by adding the following line at the end of cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"

This solved our issue. I hope the procedure is correct and we need not clear 
the gossip info by other methods, e.g. sudo rm -r 
/var/lib/cassandra/data/system/peers/* or executing 
Gossiper.unsafeAssassinateEndpoints(ip_address) via JConsole, as mentioned at 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_gossip_purge.html
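A minimal per-node sketch of that workaround (config path and service name are assumptions; the flag itself is the one quoted above):

echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"' | sudo tee -a /etc/cassandra/conf/cassandra-env.sh
sudo service cassandra restart      # repeat node by node
nodetool status                     # the old IPs should no longer be listed
# Remove the flag again once the ring looks clean, so future restarts load the
# saved ring state as usual.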

 Procedure to Change IP Address without Data streaming is Missing in Cassandra 
 Documentation
 ---

 Key: CASSANDRA-8382
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8382
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
 Environment: Red Hat Linux , Cassandra 2.0.3
Reporter: Anuj

 Use Case: 
 We have a Geo-Red setup with 2 DCs (DC1 and DC2) having 3 nodes each. Listen 
 address and seeds of all nodes are Public IPs while rpc addresses are private 
 IPs.  Now, we want to decommission a DC2 and change public IPs in listen 
 address/seeds of DC1 nodes to private IPs as it will be a single DC setup.
 Issue: 
 Cassandra doesn’t provide any standard procedure for changing IP address of 
 nodes in a cluster. We can bring down nodes, one by one, change their IP 
 address and perform the procedure mentioned in “ Replacing a Dead Node” at 
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
   by specifying the public IP of the node in the replace_address option. But the 
 procedure recommends that you must set the auto_bootstrap option to true. We 
 don't want any bootstrap or data streaming to happen as the data is already 
 there on the nodes. So, our question is: what's the standard procedure for 
 changing the IP address of Cassandra nodes while making sure that no data 
 streaming occurs and the gossip state is not corrupted?
 We are using vnodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8479) Timeout Exception on Node Failure in Remote Data Center

2015-01-15 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278685#comment-14278685
 ] 

Anuj commented on CASSANDRA-8479:
-

You are correct. These logs are from 2.0.3. As suggested in CASSANDRA-8352, we 
upgraded to 2.0.11 and retested the issue. The same issue was observed there.

We are using read_repair_chance=1 and dclocal_read_repair_chance=0. When we set 
read_repair_chance=0, killing a node in the local or the remote DC didn't lead 
to any read failures :) We need your help in understanding the following points:

1. We are using strong consistency, i.e. LOCAL_QUORUM for reads and writes. So 
even if one of the replicas has an obsolete value, we will read the latest 
value the next time we read the data. Does that mean read_repair_chance=1 is 
not required when LOCAL_QUORUM is used for both reads and writes? Should we 
instead set read_repair_chance=0, which would give us better performance 
without sacrificing consistency? What is your recommendation?

2. We are writing to Cassandra at high rates. Is that the reason we are getting 
digest mismatches during read repair? And is that when Cassandra goes for 
CL.ALL irrespective of the fact that we are using CL.LOCAL_QUORUM?

3. I think read repair also compares digests from replicas in the remote DC? 
Isn't that a performance hit? We are currently using Cassandra in 
active-passive mode, so updating the remote DC quickly is not our priority. 
What's recommended? I tried setting dclocal_read_repair_chance=1 and 
read_repair_chance=0 in order to make sure that read repairs are only executed 
within the DC, and I noticed that killing a local node didn't cause any read 
failures. Does that mean the digest mismatch problem occurs with a node in the 
remote DC rather than with the digest of the third node which didn't 
participate in the LOCAL_QUORUM read?

4. The documentation at 
http://www.datastax.com/docs/1.1/configuration/storage_configuration says that 
read_repair_chance specifies the probability with which read repairs should be 
invoked on non-quorum reads. What is the significance of non-quorum here? 
We are using LOCAL_QUORUM and read repair still comes into the picture.

Yes, we misunderstood tracing. Now that you have identified the issue, do you 
still need tracing?
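For reference, the settings being discussed expressed as a schema change (the table name is hypothetical, the cqlsh -e flag is assumed, and the values simply mirror the read_repair_chance=0 experiment above; this is a sketch, not a recommendation):

# Disables probabilistic read repair entirely for this table.
cqlsh -e "ALTER TABLE ks1.cf1 WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.0;"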

 


 Timeout Exception on Node Failure in Remote Data Center
 ---

 Key: CASSANDRA-8479
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8479
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.11
Reporter: Amit Singh Chowdhery
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: TRACE_LOGS.zip


 Issue Faced :
 We have a Geo-red setup with 2 data centers having 3 nodes each. When we 
 bring a single Cassandra node down in DC2 with kill -9 cassandra-pid, 
 reads fail in DC1 with TimedOutException for a brief amount of time (~15-20 
 sec).
 Reference :
 Already a ticket has been opened/resolved and link is provided below :
 https://issues.apache.org/jira/browse/CASSANDRA-8352
 Activity Done as per Resolution Provided :
 Upgraded to Cassandra 2.0.11 .
 We have two 3 node clusters in two different DCs and if one or more of the 
 nodes go down in one Data Center , ~5-10% traffic failure is observed on the 
 other.
 CL: LOCAL_QUORUM
 RF=3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8479) Timeout Exception on Node Failure in Remote Data Center

2015-01-09 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270893#comment-14270893
 ] 

Anuj commented on CASSANDRA-8479:
-

I have attached TRACE level logs. You can find multiple ReadTimeoutExceptions 
in System.log.3. Once we killed Cassandra on one of the nodes in DC2, around 7 
read requests failed over roughly 17 seconds in DC1 and then everything went 
back to normal. We need to understand why these reads failed when we are using 
LOCAL_QUORUM in our application. Also, in another Cassandra log file, 
System.log.2, we saw java.nio.file.NoSuchFileException. 

We got Hector's HTimedOutException in our application logs during these 17 
seconds. 
Stack trace from application logs:
com.ericsson.rm.service.voucher.InternalServerException: Internal server error, 
me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at 
com.ericsson.rm.voucher.traffic.reservation.cassandra.CassandraReservation.getReservationSlice(CassandraReservation.java:552)
 ~[na:na]
at 
com.ericsson.rm.voucher.traffic.reservation.cassandra.CassandraReservation.lookup(CassandraReservation.java:499)
 ~[na:na]
at 
com.ericsson.rm.voucher.traffic.VoucherTraffic.getReservedOrPendingVoucher(VoucherTraffic.java:764)
 ~[na:na]
at 
com.ericsson.rm.voucher.traffic.VoucherTraffic.commit(VoucherTraffic.java:686) 
~[na:na]
... 6 common frames omitted
Caused by: com.ericsson.rm.service.cassandra.xa.ConnectionException: 
me.prettyprint.hector.api.exceptions.HTimedOutException: TimedOutException()
at 
com.ericsson.rm.cassandra.xa.keyspace.row.KeyedRowQuery.execute(KeyedRowQuery.java:93)
 ~[na:na]
at 
com.ericsson.rm.voucher.traffic.reservation.cassandra.CassandraReservation.getReservationSlice(CassandraReservation.java:548)
 ~[na:na]
... 9 common frames omitted
Caused by: me.prettyprint.hector.api.exceptions.HTimedOutException: 
TimedOutException()
at 
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
 ~[na:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:286)
 ~[na:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:269)
 ~[na:na]
at 
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104)
 ~[na:na]
at 
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
 ~[na:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:132)
 ~[na:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:290)
 ~[na:na]
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
 ~[na:na]
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
 ~[na:na]
at 
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
 ~[na:na]
at 
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:101)
 ~[na:na]
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
 ~[na:na]
at 
com.ericsson.rm.cassandra.xa.keyspace.row.KeyedRowQuery.execute(KeyedRowQuery.java:77)
 ~[na:na]
... 10 common frames omitted
Caused by: org.apache.cassandra.thrift.TimedOutException: null
at 
org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.read(Cassandra.java:11504)
 ~[na:na]
at 
org.apache.cassandra.thrift.Cassandra$get_slice_result$get_slice_resultStandardScheme.read(Cassandra.java:11453)
 ~[na:na]
at 
org.apache.cassandra.thrift.Cassandra$get_slice_result.read(Cassandra.java:11379)
 ~[na:na]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78) 
~[na:na]
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:653) 
~[na:na]
at 
org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:637) 
~[na:na]
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:274)
 ~[na:na]
... 21 common frames omitted

Please have a look at https://issues.apache.org/jira/browse/CASSANDRA-8352 for 
more details about the issue.




 Timeout Exception on Node Failure in Remote Data Center
 ---

 Key: CASSANDRA-8479
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8479
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.11
Reporter: Amit Singh 

[jira] [Updated] (CASSANDRA-8479) Timeout Exception on Node Failure in Remote Data Center

2015-01-09 Thread Anuj (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj updated CASSANDRA-8479:

Attachment: TRACE_LOGS.zip

Trace level logs for the issue. Please see ReadTimeoutException in System.log.3.

 Timeout Exception on Node Failure in Remote Data Center
 ---

 Key: CASSANDRA-8479
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8479
 Project: Cassandra
  Issue Type: Bug
  Components: API, Core, Tools
 Environment: Unix, Cassandra 2.0.11
Reporter: Amit Singh Chowdhery
Assignee: Ryan McGuire
Priority: Minor
 Attachments: TRACE_LOGS.zip


 Issue Faced :
 We have a Geo-red setup with 2 data centers having 3 nodes each. When we 
 bring a single Cassandra node down in DC2 with kill -9 cassandra-pid, 
 reads fail in DC1 with TimedOutException for a brief amount of time (~15-20 
 sec).
 Reference :
 Already a ticket has been opened/resolved and link is provided below :
 https://issues.apache.org/jira/browse/CASSANDRA-8352
 Activity Done as per Resolution Provided :
 Upgraded to Cassandra 2.0.11 .
 We have two 3 node clusters in two different DCs and if one or more of the 
 nodes go down in one Data Center , ~5-10% traffic failure is observed on the 
 other.
 CL: LOCAL_QUORUM
 RF=3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8382) Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation

2014-12-28 Thread Anuj (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259892#comment-14259892
 ] 

Anuj commented on CASSANDRA-8382:
-

As suggested, we restarted all Cassandra nodes with new IPs, but nodetool 
still shows the nodes with old IPs. After restarting the 3-node cluster with 
new IPs, we saw a total of 6 nodes in the Cassandra cluster: 3 nodes with new 
IPs and 3 nodes with old IPs (with ? against them). As the data on the 3-node 
cluster hasn't changed at all, we don't want to remove these nodes with old 
IPs. Removing each node takes about 12 hours in our case and is expensive. We 
want a procedure for changing IP addresses where we need not remove the nodes 
with old IPs manually. Also, the procedure should not cause any downtime. 
Please suggest.

Nodetool output after restarting the 3-node cluster with new IPs.
Here 10.64.8.90, 10.44.172.45 and 10.44.172.34 are old IPs, whereas 
192.168.0.4, 192.168.0.5 and 192.168.0.6 are new IPs.

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns   Host ID                               Rack
DN  10.64.8.90    ?          256     17.2%  90dea63f-793b-40be-abf6-cb4a09a27477  RAC1
UN  192.168.0.4   212.98 KB  256     17.3%  64f51130-ae48-4fe1-84ec-70a7e3f8fb3e  RAC1
DN  10.44.172.45  ?          256     17.9%  9f4d33a3-1d17-4f78-b37e-de272fea72dd  RAC1
UN  192.168.0.6   194.32 KB  256     16.4%  d83e6e06-690b-4733-a134-ebd721da5ecd  RAC1
DN  10.44.172.34  ?          256     16.2%  6d3cf257-501f-4d0a-b894-ef9bbf3647ee  RAC1
UN  192.168.0.5   194.3 KB   256     15.0%  34db82f6-5176-4d72-8446-2d9c607d0453  RAC1

Procedure followed for changing IPs (a per-node sketch follows the list):
1. Stop all Cassandra nodes
2. Update cassandra.yaml with the seeds and listen_address set to the new IPs
3. Update cassandra-topology.properties with the new IPs
4. Restart the Cassandra nodes
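A minimal per-node sketch of those steps (paths, service name and the sed edits are assumptions; the IPs are the new ones from the nodetool output above):

# On each node in turn:
sudo service cassandra stop
sudo sed -i 's/^listen_address:.*/listen_address: 192.168.0.4/' /etc/cassandra/conf/cassandra.yaml    # this node's new IP
sudo sed -i 's/- seeds:.*/- seeds: "192.168.0.4,192.168.0.5,192.168.0.6"/' /etc/cassandra/conf/cassandra.yaml
# also update cassandra-topology.properties so the new IPs map to DC1/RAC1
sudo service cassandra start
nodetool status                     # verify the node comes back under its new IP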



 Procedure to Change IP Address without Data streaming is Missing in Cassandra 
 Documentation
 ---

 Key: CASSANDRA-8382
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8382
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
 Environment: Red Hat Linux , Cassandra 2.0.3
Reporter: Anuj

 Use Case: 
 We have a Geo-Red setup with 2 DCs (DC1 and DC2) having 3 nodes each. Listen 
 address and seeds of all nodes are Public IPs while rpc addresses are private 
 IPs.  Now, we want to decommission a DC2 and change public IPs in listen 
 address/seeds of DC1 nodes to private IPs as it will be a single DC setup.
 Issue: 
 Cassandra doesn’t provide any standard procedure for changing IP address of 
 nodes in a cluster. We can bring down nodes, one by one, change their IP 
 address and perform the procedure mentioned in “ Replacing a Dead Node” at 
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
   by specifying the public IP of the node in the replace_address option. But the 
 procedure recommends that you must set the auto_bootstrap option to true. We 
 don't want any bootstrap or data streaming to happen as the data is already 
 there on the nodes. So, our question is: what's the standard procedure for 
 changing the IP address of Cassandra nodes while making sure that no data 
 streaming occurs and the gossip state is not corrupted?
 We are using vnodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8382) Procedure to Change IP Address without Data streaming is Missing in Cassandra Documentation

2014-11-26 Thread Anuj (JIRA)
Anuj created CASSANDRA-8382:
---

 Summary: Procedure to Change IP Address without Data streaming is 
Missing in Cassandra Documentation
 Key: CASSANDRA-8382
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8382
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
 Environment: Red Hat Linux , Cassandra 2.0.3
Reporter: Anuj


Use Case: 
We have a Geo-Red setup with 2 DCs (DC1 and DC2) having 3 nodes each. Listen 
address and seeds of all nodes are Public IPs while rpc addresses are private 
IPs.  Now, we want to decommission a DC2 and change public IPs in listen 
address/seeds of DC1 nodes to private IPs as it will be a single DC setup.

Issue: 
Cassandra doesn’t provide any standard procedure for changing IP address of 
nodes in a cluster. We can bring down nodes, one by one, change their IP 
address and perform the procedure mentioned in “ Replacing a Dead Node” at 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
  by specifying the public IP of the node in the replace_address option. But the procedure 
recommends that you must set the auto_bootstrap option to true. We don’t want 
any bootstrap or data streaming to happen as the data is already there on the nodes. 
So, our question is: what’s the standard procedure for changing the IP address of 
Cassandra nodes while making sure that no data streaming occurs and the gossip 
state is not corrupted?

We are using vnodes.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)