Git Push Summary
Updated Tags: refs/tags/2.0.5-tentative [deleted] 0191b359f
Git Push Summary
Updated Tags: refs/tags/2.0.5-tentative [created] b71372146
[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit
[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889330#comment-13889330 ]

Marcus Eriksson commented on CASSANDRA-6364:
--------------------------------------------

About the ignore case, let's hard-code something for now - rate limit at one log error message per second, perhaps?

I don't think we should default to 'ignore' in Config.java - if someone does a minor upgrade, they most likely won't check NEWS or update their config files to add the new parameter.

The shipped config in cassandra.yaml looks wrong; it should be commit_failure_policy, not disk_failure_policy, I guess.

There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit
---------------------------------------------------------------------------------------------------------------------------------

                Key: CASSANDRA-6364
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6364
            Project: Cassandra
         Issue Type: Improvement
         Components: Core
        Environment: JBOD, single dedicated commit disk
           Reporter: J. Ryan Earl
           Assignee: Benedict
            Fix For: 2.0.5

We're doing fault testing on a pre-production Cassandra cluster. One of the tests was to simulate failure of the commit volume/disk, which in our case is on a dedicated disk. We expected failure of the commit volume to be handled somehow, but what we found was that no action was taken by Cassandra when the commit volume failed. We simulated this simply by pulling the physical disk that backed the commit volume, which resulted in filesystem I/O errors on the mount point. What then happened was that the Cassandra heap filled up to the point that it was spending 90% of its time doing garbage collection. No errors were logged in regards to the failed commit volume. Gossip on other nodes in the cluster eventually flagged the node as down. Gossip on the local node showed itself as up, and all other nodes as down. The most serious problem was that connections to the coordinator on this node became very slow due to the ongoing GC, as I assume uncommitted writes piled up on the JVM heap.

What we believe should have happened is that Cassandra should have caught the I/O error and exited with a useful log message, or otherwise done some sort of useful cleanup. Otherwise the node goes into a sort of zombie state, spending most of its time in GC, and thus slowing down any transactions that happen to use the coordinator on said node. A limit on in-memory, unflushed writes before refusing requests may also work. The point being, something should be done to handle the commit volume dying, as doing nothing ends up affecting the entire cluster. I should note, we are using: disk_failure_policy: best_effort

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
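The policy dispatch being debated here (die vs. stop vs. ignore on a commit-disk error) can be pictured in a few lines. This is an illustrative Python model under assumed names, not Cassandra's actual Config/CommitLog code:

```python
import logging

log = logging.getLogger("commitlog")

def handle_commit_disk_error(policy: str, error: OSError) -> str:
    """Dispatch on a hypothetical commit_failure_policy value.
    Returns the action taken, so callers (and tests) can observe it."""
    if policy == "die":
        # Most drastic: log and terminate the process (a real daemon
        # would call sys.exit(1) or similar here).
        log.error("Commit disk failure: %s; exiting", error)
        return "exit"
    elif policy == "stop":
        # Stop accepting traffic but keep the JVM alive for inspection.
        log.error("Commit disk failure: %s; shutting down transports", error)
        return "stop"
    elif policy == "ignore":
        # Keep running; the log message itself should be rate-limited
        # (see the rate-limit discussion in the comments).
        log.warning("Commit disk failure: %s; ignoring per policy", error)
        return "continue"
    raise ValueError(f"unknown commit_failure_policy: {policy}")
```

The trade-off discussed in the thread is exactly which of these should be the default when the operator has not set anything.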
[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit
[ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889343#comment-13889343 ]

Benedict commented on CASSANDRA-6364:
-------------------------------------

bq. I don't think we should default to 'ignore' in Config.java

Well, I wasn't too sure about this. On the one hand, switching the default to stop means we could over-cautiously kill users' hosts unexpectedly, maybe resulting in interruption of service (especially, say, for our users running on SAN, as much as that is strongly discouraged). Whereas switching to ignore means we may not be durable. Neither is a great default, but both are better than before. I'm comfortable with both, so if you feel strongly it should be stop, I'll happily switch it. Perhaps I lean slightly in favour of it too, but it depends on whether the user favours durability over availability, really, so there doesn't seem to be a single correct answer to me. Note that the default disk_failure_policy is also ignore, and the prior behaviour was closest to ignore, so introducing a default that results in a failing node is somewhat unprecedented for disk failure.

bq. The shipped config in cassandra.yaml looks wrong, should be commit_failure_policy, not disk_failure_policy I guess

Right, looks like I didn't update the first or last lines I copy-pasted. Thanks.

bq. About the ignore case, lets hard code something for now - rate limit at one log error message per second perhaps?

If we're just rate limiting the log messages, I'd say one per minute might be better. But I'm not sure having the threads spin trying to make progress is useful. The PCLES, for instance, will just start burning one core until it can successfully sync, assuming it doesn't actually have to wait each time to encounter the error. I'm tempted to have a 1s pause after an error, during which we just sleep the erroring thread. Another issue that slightly concerns me is what happens if the CLES sync() starts failing, but the append and the CLA don't.

With ignore this could potentially result in us mapping in and allocating huge amounts of disk space, but not being able to sync or clear it. This might either result in lots of swapping, and/or us exceeding our max log space goal by a large margin. Since we never guarantee to keep to this, I'm not sure how much of a problem it would be, but an error down to ACLs that stops us syncing one file might potentially end up eating huge quantities of commit disk space. I'm tempted to have the CLA thread block once it hits twice its goal max space (or maybe introduce a second config parameter for a hard maximum). But I'm also tempted to leave these changes for the 2.1 branch, since it's a fairly specific failure case, and what we have is a big improvement over the current state of affairs.
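The rate limiting discussed above ("one log error message per second" or "one per minute") is a small, self-contained mechanism. A minimal Python sketch under assumed names - this is not Cassandra's actual logging code, just the idea:

```python
import time

class RateLimitedLogger:
    """Emit at most one message per `interval` seconds; drop the rest.
    The injectable clock makes the behaviour testable without sleeping."""

    def __init__(self, interval: float, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = float("-inf")  # so the first message always passes
        self.emitted = 0

    def error(self, msg: str) -> bool:
        """Return True if the message was emitted, False if suppressed."""
        now = self.clock()
        if now - self._last >= self.interval:
            self._last = now
            self.emitted += 1
            print(f"ERROR: {msg}")  # stand-in for a real logger call
            return True
        return False
```

The separate suggestion of a 1s sleep after an error addresses a different problem (a spinning sync thread burning a core), and would live in the erroring thread itself rather than in the logger.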
[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit
[ https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889346#comment-13889346 ]

Dmitry Shohov commented on CASSANDRA-6628:
------------------------------------------

I checked your patch and the JVM doesn't crash. I also did some performance tests. I don't know the usage pattern for the comparator, but my simple tests show that my changes to the comparer would make it slower than the pure Java implementation :) I fully agree that it's better to use the Java implementation on Solaris sparcv9 than to change the Unsafe implementation.

Cassandra crashes on Solaris sparcv9 using java 64bit
-----------------------------------------------------

                Key: CASSANDRA-6628
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6628
            Project: Cassandra
         Issue Type: Bug
         Components: Core
        Environment: checked 1.2.x line and 2.0.x
           Reporter: Dmitry Shohov
           Assignee: Dmitry Shohov
            Fix For: 2.0.5
        Attachments: solaris_unsafe_fix.patch, tmp.patch

When running Cassandra 2.0.4 (and other versions) on Solaris with 64-bit Java, the JVM crashes. The issue is described once in CASSANDRA-4646 but was closed as invalid. The reason for this crash is memory-alignment-related problems and incorrect sun.misc.Unsafe usage. If you look into DirectByteBuffer in the JDK, you will see that it checks os.arch before using getLong methods. I have a patch which checks os.arch and, if it is not one of the known architectures, reads longs and ints byte by byte. Although the patch fixes the problem in Cassandra, it will still crash without similar fixes in the lz4 library. I already provided the patch for the Unsafe usage in lz4.
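The "reads longs and ints byte by byte" fallback mentioned in the description is the standard portable technique for architectures that fault on unaligned wide loads. A Python sketch of the idea (the actual patch is Java; names here are illustrative):

```python
def read_long_unaligned(buf: bytes, offset: int) -> int:
    """Assemble a big-endian 64-bit value one byte at a time, the portable
    fallback for aligned-access-only architectures like sparcv9, where a
    single 8-byte load at an arbitrary offset would trap."""
    value = 0
    for i in range(8):
        value = (value << 8) | buf[offset + i]
    # Reinterpret as signed 64-bit, matching what Java's getLong yields.
    if value >= 1 << 63:
        value -= 1 << 64
    return value
```

This is slower than a single aligned load, which is consistent with the benchmark result reported in the comment above; the crash avoidance, not speed, is the point of the fallback.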
[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit
[ https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889349#comment-13889349 ]

Benedict commented on CASSANDRA-6628:
-------------------------------------

bq. my changes to comparer would make it slower than pure java implementation

This isn't very surprising given what they were doing, but it's always good to have the confirmation :-) It should be possible to make a special Unsafe comparer tailored for sparcv9 (and any other aligned-access-only architectures) that is quite a bit faster, in the manner I mention above, but it's not something we're likely to consider a priority in the near future. As always, feel free to have a crack at it yourself and submit a patch; I'd be more than happy to review.
[jira] [Commented] (CASSANDRA-6628) Cassandra crashes on Solaris sparcv9 using java 64bit
[ https://issues.apache.org/jira/browse/CASSANDRA-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889356#comment-13889356 ]

Benedict commented on CASSANDRA-6628:
-------------------------------------

This patch is ready for commit.
[jira] [Created] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family
Sergio Bossa created CASSANDRA-6645:
------------------------------------

            Summary: upgradesstables causes NPE for secondary indexes without an underlying column family
                Key: CASSANDRA-6645
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6645
            Project: Cassandra
         Issue Type: Bug
         Components: Core
           Reporter: Sergio Bossa
           Assignee: Sergio Bossa

SecondaryIndex#getIndexCfs is allowed by contract to return null if the index is not backed by a column family, but this causes an NPE, as StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do not check for null values.
[jira] [Updated] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family
[ https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Bossa updated CASSANDRA-6645:
------------------------------------

    Attachment: CASSANDRA-6645.patch
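The fix for this class of NPE amounts to tolerating the null that getIndexCfs is contractually allowed to return. A minimal Python sketch of the pattern (hypothetical data shapes, not the actual StorageService code):

```python
def valid_column_families(indexes):
    """Yield only the backing column families that actually exist.
    getIndexCfs may legitimately return None for an index that is not
    backed by a column family; skip those instead of dereferencing them
    (dereferencing None is the NPE reported in this issue)."""
    for index in indexes:
        cfs = index.get("index_cfs")  # stand-in for SecondaryIndex#getIndexCfs
        if cfs is None:
            continue
        yield cfs
```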
Git Push Summary
Updated Tags: refs/tags/cassandra-1.2.14 [created] f9bef16ae
Git Push Summary
Updated Tags: refs/tags/1.2.14-tentative [deleted] 6a9314408
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494 ]

Marcus Eriksson commented on CASSANDRA-5351:
--------------------------------------------

More complete version now pushed to https://github.com/krummas/cassandra/tree/marcuse/5351

Lots of testing required, but I think it is mostly 'feature-complete'. The repair flow is now:

# The repair coordinator sends out Prepare messages to all neighbours.
# All involved parties figure out what sstables should be included in the repair: if it is a full repair, all sstables are included; otherwise only the ones with repairedAt set to 0. Note that we don't do any locking of the sstables - if they are gone when we do anticompaction, that is fine, we will repair them next round.
# The repair coordinator prepares itself, waits until all neighbours have prepared, and sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step 2.
# The coordinator waits for replies and then sends AnticompactionRequests to all nodes. If we are doing a full repair, we simply skip anticompaction.

Notes:
* SSTables are tagged with repairedAt timestamps; compactions keep min(repairedAt) of the included sstables.
* nodetool repair defaults to the old behaviour. Use --incremental to use the new repairs.
* Anticompaction:
** Splits an sstable into 2 new ones: one with all keys that were in the repaired ranges, and one with the unrepaired data.
** If the repaired ranges cover the entire sstable, we just rewrite the sstable metadata. This means that the optimal way to run incremental repairs is to not do partitioner range repairs etc.
* Compaction:
** LCS:
*** We always first check if there are any unrepaired sstables to do STCS on; if there are, we do that. The reasoning is that new data (which needs compaction) is unrepaired.
*** We keep all sstables in the LeveledManifest, then filter out the unrepaired ones when getting compaction candidates etc.
** STCS:
*** Major compaction is done by taking the biggest set of sstables - so for a total major compaction, you will need to run nodetool compact twice.
*** Minors work the same way: the biggest set of sstables will be compacted.
* Streaming: a streamed sstable keeps its repairedAt time.
* BulkLoader: loaded sstables are unrepaired.
* Scrub: sets repairedAt to UNREPAIRED - since we can drop rows during scrub, the new sstable is not repaired.
* Upgradesstables: keeps the repaired status.

Avoid repairing already-repaired data by default
------------------------------------------------

                Key: CASSANDRA-5351
                URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
            Project: Cassandra
         Issue Type: Task
         Components: Core
           Reporter: Jonathan Ellis
           Assignee: Lyuben Todorov
             Labels: repair
            Fix For: 2.1
        Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log

Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is that compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones.
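The two key bookkeeping rules above - compaction keeps min(repairedAt) of its inputs, and incremental repair only considers sstables with repairedAt == 0 - can be sketched compactly. An illustrative Python model (field and function names are assumptions, not Cassandra's actual classes):

```python
from dataclasses import dataclass

UNREPAIRED = 0  # repairedAt == 0 marks an sstable as never repaired

@dataclass
class SSTable:
    name: str
    repaired_at: int  # timestamp of last successful repair, 0 if none

def compact(sstables):
    """The merged output keeps min(repairedAt): if any input is
    unrepaired, the result is unrepaired, so repaired and unrepaired
    data never silently merge into a 'repaired' sstable."""
    return SSTable("+".join(s.name for s in sstables),
                   min(s.repaired_at for s in sstables))

def repair_candidates(sstables, incremental=True):
    """Incremental repair only builds merkle trees over unrepaired
    sstables; a full repair includes everything."""
    if not incremental:
        return list(sstables)
    return [s for s in sstables if s.repaired_at == UNREPAIRED]
```

This also shows why sstables are segregated by repaired status in practice: compacting a repaired sstable with an unrepaired one would reset the output to unrepaired and throw away repair progress.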
[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494 ]

Marcus Eriksson edited comment on CASSANDRA-5351 at 2/3/14 2:18 PM (list formatting corrected; the body is otherwise identical to the comment above).
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889498#comment-13889498 ]

Marcus Eriksson commented on CASSANDRA-5351:
--------------------------------------------

If it's unclear: the purpose of the Prepare step is that we want to do anticompaction once over all ranges involved in the repair. If we did not do that, we would have to anticompact 3 times for a nodetool repair with RF=3 (and 3*256 times with 256 vnodes, I think).
[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494 ]

Marcus Eriksson edited comment on CASSANDRA-5351 at 2/3/14 2:30 PM (the Scrub note now reads "since we can drop rows during scrub new sstable is not repaired"; the body is otherwise identical to the comment above).
[jira] [Commented] (CASSANDRA-6622) Streaming session failures during node replace using replace_address
[ https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889546#comment-13889546 ]

Brandon Williams commented on CASSANDRA-6622:
---------------------------------------------

Can you attach logs from both the replacing node and the node that is failing the stream session?

Streaming session failures during node replace using replace_address
--------------------------------------------------------------------

                Key: CASSANDRA-6622
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6622
            Project: Cassandra
         Issue Type: Bug
        Environment: RHEL6, cassandra-2.0.4
           Reporter: Ravi Prasad
           Assignee: Brandon Williams
        Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 6622-2.0.txt

When using replace_address, the Gossiper ApplicationState is set to hibernate, which is a down state. We are seeing that the peer nodes receive the streaming plan request even before the Gossiper on them marks the replacing node as dead. As a result, streaming on the peer nodes convicts the replacing node by closing the stream handler. I think making the StorageService thread on the replacing node sleep for BROADCAST_INTERVAL before bootstrapping would avoid this scenario.
Relevant logs from the peer node (note that the Gossiper on the peer node marks the replacing node as down 2 seconds after the streaming init request):
{noformat}
 INFO [STREAM-INIT-/x.x.x.x:46436] 2014-01-26 20:42:24,388 StreamResultFuture.java (line 116) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Received streaming plan for Bootstrap
 INFO [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete
 WARN [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed
 INFO [GossipStage:1] 2014-01-26 20:42:25,242 Gossiper.java (line 850) InetAddress /x.x.x.x is now DOWN
ERROR [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,766 StreamSession.java (line 410) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
        at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:175)
        at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
        at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358)
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293)
        at java.lang.Thread.run(Thread.java:722)
 INFO [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete
 WARN [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed
{noformat}
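The mitigation proposed in the description - pausing up to one BROADCAST_INTERVAL before bootstrapping so that peers' gossipers can observe the replacing node's hibernate state - is essentially a bounded wait on a predicate. A hedged Python sketch (function and parameter names are illustrative, not Cassandra's API):

```python
import time

def wait_for_gossip_settle(is_state_visible, broadcast_interval: float,
                           poll: float = 0.1, clock=time.monotonic,
                           sleep=time.sleep) -> bool:
    """Wait up to `broadcast_interval` seconds for `is_state_visible()`
    to report that peers have seen our gossip state; return True if it
    did, False if we timed out. Injectable clock/sleep keep it testable."""
    deadline = clock() + broadcast_interval
    while clock() < deadline:
        if is_state_visible():
            return True
        sleep(poll)
    return False
```

Whether Cassandra should poll a condition or simply sleep unconditionally is part of what this ticket debates; the sketch only illustrates the timing relationship between gossip propagation and the start of streaming.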
[jira] [Updated] (CASSANDRA-4851) CQL3: improve support for paginating over composites
[ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-4851: Attachment: 4851.txt Attaching a rather simple patch for this. The patch implements the tuple/vector syntax described above (so {{WHERE (c1, c2) > (1, 0)}} typically), as that's the easier and imo the most natural syntax anyway when you want to do such a thing. CQL3: improve support for paginating over composites Key: CASSANDRA-4851 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Priority: Minor Fix For: 2.0.6 Attachments: 4851.txt Consider the following table: {noformat} CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) ) {noformat} with the following data: {noformat} k | c1 | c2 0 | 0 | 0 0 | 0 | 1 0 | 1 | 0 0 | 1 | 1 {noformat} Currently, CQL3 allows slicing over either c1 or c2: {noformat} SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2 {noformat} but you cannot express a query that returns the last 3 records. Indeed, for that you would need a query like, say: {noformat} SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0) {noformat} but we don't support that. This can make it hard to paginate over, say, all records for {{k = 0}} (I'm saying can because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do). For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will handle that and is probably the best solution. However, there may be cases where the pagination is, say, user (as in, the user of your application) triggered. I note that one solution would be to add OR support, at least in cases like the one above. 
That's definitely doable, but on the other hand we won't be able to support full-blown OR, so it may not be very natural that we support seemingly random combinations of OR and not others. Another solution would be to allow the following syntax: {noformat} SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0) {noformat} which would literally mean that you want records where the values of c1 and c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL stores have that; it's a fairly natural thing to have imo), but it would be much simpler to implement and probably simpler to use too. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
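The tuple semantics proposed in this ticket can be sketched outside Cassandra. The following standalone Java snippet (illustrative names, not Cassandra internals) filters rows by lexicographic tuple order, which is what a `(c1, c2) > (0, 0)` relation asks the server to do:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch of the lexicographic tuple comparison behind a relation like
 * WHERE (c1, c2) > (0, 0): keep rows whose clustering columns, taken as
 * a tuple, sort strictly after the given bound.
 */
public class TupleSlice {
    /** Lexicographic compare of two equal-length int tuples. */
    public static int compareTuples(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    }

    /** Rows of (c1, c2) strictly greater than the bound, in order. */
    public static List<int[]> after(List<int[]> rows, int[] bound) {
        return rows.stream()
                   .filter(r -> compareTuples(r, bound) > 0)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // the four rows of the example table, as (c1, c2) pairs
        List<int[]> rows = Arrays.asList(
            new int[]{0, 0}, new int[]{0, 1}, new int[]{1, 0}, new int[]{1, 1});
        // (c1, c2) > (0, 0) keeps the last three rows
        System.out.println(after(rows, new int[]{0, 0}).size()); // 3
    }
}
```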
[jira] [Commented] (CASSANDRA-4851) CQL3: improve support for paginating over composites
[ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889582#comment-13889582 ] Sylvain Lebresne commented on CASSANDRA-4851: - For info, I pushed a dtest too: https://github.com/riptano/cassandra-dtest/commit/da2fb8451b465299c095b320fbfc83c90467a49b CQL3: improve support for paginating over composites Key: CASSANDRA-4851 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0.6 Attachments: 4851.txt Consider the following table: {noformat} CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) ) {noformat} with the following data: {noformat} k | c1 | c2 0 | 0 | 0 0 | 0 | 1 0 | 1 | 0 0 | 1 | 1 {noformat} Currently, CQL3 allows slicing over either c1 or c2: {noformat} SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2 {noformat} but you cannot express a query that returns the last 3 records. Indeed, for that you would need a query like, say: {noformat} SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0) {noformat} but we don't support that. This can make it hard to paginate over, say, all records for {{k = 0}} (I'm saying can because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do). For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will handle that and is probably the best solution. However, there may be cases where the pagination is, say, user (as in, the user of your application) triggered. I note that one solution would be to add OR support, at least in cases like the one above. That's definitely doable, but on the other hand we won't be able to support full-blown OR, so it may not be very natural that we support seemingly random combinations of OR and not others. 
Another solution would be to allow the following syntax: {noformat} SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0) {noformat} which would literally mean that you want records where the values of c1 and c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL stores have that; it's a fairly natural thing to have imo), but it would be much simpler to implement and probably simpler to use too. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5323) Revisit disabled dtests
[ https://issues.apache.org/jira/browse/CASSANDRA-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889590#comment-13889590 ] Michael Shuler commented on CASSANDRA-5323: --- *All* dtests have been running on the cassandra-2.0 branch for a few days, without hanging up the entire test run :) Revisit disabled dtests --- Key: CASSANDRA-5323 URL: https://issues.apache.org/jira/browse/CASSANDRA-5323 Project: Cassandra Issue Type: Test Reporter: Ryan McGuire Assignee: Michael Shuler The following dtests are disabled in buildbot; if they can be re-enabled, great, and if they can't, can they be fixed? upgrade|decommission|sstable_gen|global_row|putget_2dc|cql3_insert -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6640) Improve custom 2i performance and abstraction
[ https://issues.apache.org/jira/browse/CASSANDRA-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889600#comment-13889600 ] Sam Tunnicliffe commented on CASSANDRA-6640: Second patch LGTM +1 Improve custom 2i performance and abstraction - Key: CASSANDRA-6640 URL: https://issues.apache.org/jira/browse/CASSANDRA-6640 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Miguel Angel Fernandez Diaz Assignee: Miguel Angel Fernandez Diaz Labels: patch, performance Fix For: 2.1 Attachments: 6640.diff, 6640v2.diff With the current implementation, the update method in SecondaryIndexManager forces an insert and a delete of a cell. That happens because we assume that we need the value of the old cell in order to locate the cell we are updating in our custom secondary index implementation. However, depending on the implementation, insert and delete operations could have much worse performance than a simple update. Moreover, if our custom secondary index doesn't use inverted indexes, we don't really need the old cell information; the key information is enough. Therefore, a good solution would be to make the update method more abstract. Thus, the update method for PerColumnSecondaryIndex would also receive the old cell information, and from that point we could decide whether we must carry out the delete+insert operation or just an update operation. I attach a patch that implements this solution. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
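The abstraction being reviewed here can be modeled with a small toy (not Cassandra's API): the manager calls one `update(key, oldValue, newValue)` hook whose default falls back to insert plus delete, while an index that never needs the old value overrides `update` with a single cheap write:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy base index: default update mirrors insert + delete of the old entry. */
abstract class ToyIndex {
    final Map<String, String> data = new HashMap<>();
    int writes = 0;

    void insert(String key, String value) { data.put(key, value); writes++; }
    void delete(String key, String oldValue) { writes++; } // old value locates the entry in a real inverted index

    void update(String key, String oldValue, String newValue) {
        insert(key, newValue);
        delete(key, oldValue);
    }
}

/** Relies on the inherited two-step update path. */
class InvertedToyIndex extends ToyIndex { }

/** Key-addressed index: the old value is irrelevant, one put suffices. */
class DirectToyIndex extends ToyIndex {
    @Override
    void update(String key, String oldValue, String newValue) {
        data.put(key, newValue);
        writes++;
    }
}

public class IndexUpdateDemo {
    public static void main(String[] args) {
        ToyIndex a = new InvertedToyIndex();
        ToyIndex b = new DirectToyIndex();
        a.update("row1", "old", "new");
        b.update("row1", "old", "new");
        System.out.println(a.writes + " vs " + b.writes); // 2 vs 1
    }
}
```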
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889646#comment-13889646 ] Yuki Morishita commented on CASSANDRA-5351: --- [~krummas] Still reviewing your patch, but I see the following problems: * Sequential (snapshot) repair is now the default, so to use this feature users need to run 'nodetool repair -par -inc', since I don't see incremental repair code in the snapshot repair code path. * Performing anti-compaction in the repair thread sequentially for all parent sessions seems problematic performance-wise. And for STCS major compaction, I prefer not to change the current behavior; dropping the compacted SSTable to UNREPAIRED is fine, I think. I think it would be a surprise for users otherwise (even though major compaction is not recommended). I'll look deeper, but that's what I have currently. Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
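The core selection step of incremental repair, as described in the ticket, can be sketched as follows. This is an illustration, not Cassandra's implementation; here an sstable with `repairedAt == 0` stands for "not yet repaired":

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch: an incremental repair only builds Merkle trees over sstables
 * that are not yet marked repaired, skipping already-repaired data.
 */
public class IncrementalRepair {
    static final class SSTable {
        final String name;
        final long repairedAt; // 0 == unrepaired (assumed convention)
        SSTable(String name, long repairedAt) { this.name = name; this.repairedAt = repairedAt; }
    }

    /** Only unrepaired sstables participate in the next repair session. */
    public static List<String> candidates(List<SSTable> all) {
        return all.stream()
                  .filter(s -> s.repairedAt == 0)
                  .map(s -> s.name)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SSTable> tables = Arrays.asList(
            new SSTable("a", 1391400000000L), // repaired in a previous session
            new SSTable("b", 0),
            new SSTable("c", 0));
        System.out.println(candidates(tables)); // [b, c]
    }
}
```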
git commit: Improve custom 2i performance and abstraction Patch by Miguel Angel Fernandez Diaz, reviewed by Sam Tunnicliffe for CASSANDRA-6640
Updated Branches: refs/heads/trunk aa29b6af6 - fc91071c0 Improve custom 2i performance and abstraction Patch by Miguel Angel Fernandez Diaz, reviewed by Sam Tunnicliffe for CASSANDRA-6640 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fc91071c Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fc91071c Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fc91071c Branch: refs/heads/trunk Commit: fc91071c01c33500774de83944bf5f937397c089 Parents: aa29b6a Author: Brandon Williams brandonwilli...@apache.org Authored: Mon Feb 3 11:33:37 2014 -0600 Committer: Brandon Williams brandonwilli...@apache.org Committed: Mon Feb 3 11:33:37 2014 -0600 -- .../db/index/AbstractSimplePerColumnSecondaryIndex.java | 7 +-- .../apache/cassandra/db/index/PerColumnSecondaryIndex.java| 3 ++- .../org/apache/cassandra/db/index/SecondaryIndexManager.java | 7 +++ test/unit/org/apache/cassandra/db/RangeTombstoneTest.java | 2 +- .../org/apache/cassandra/db/SecondaryIndexCellSizeTest.java | 2 +- 5 files changed, 12 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java -- diff --git a/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java b/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java index 5987d7a..e2a6608 100644 --- a/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java +++ b/src/java/org/apache/cassandra/db/index/AbstractSimplePerColumnSecondaryIndex.java @@ -135,9 +135,12 @@ public abstract class AbstractSimplePerColumnSecondaryIndex extends PerColumnSec indexCfs.apply(valueKey, cfi, SecondaryIndexManager.nullUpdater, opGroup, null); } -public void update(ByteBuffer rowKey, Cell col, OpOrder.Group opGroup) -{ +public void update(ByteBuffer rowKey, Cell oldCol, Cell col, OpOrder.Group 
opGroup) +{ +// insert the new value before removing the old one, so we never have a period +// where the row is invisible to both queries (the opposite seems preferable); see CASSANDRA-5540 insert(rowKey, col, opGroup); +delete(rowKey, oldCol, opGroup); } public void removeIndex(ByteBuffer columnName) http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java -- diff --git a/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java b/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java index e094c4c..79087d2 100644 --- a/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java +++ b/src/java/org/apache/cassandra/db/index/PerColumnSecondaryIndex.java @@ -49,9 +49,10 @@ public abstract class PerColumnSecondaryIndex extends SecondaryIndex * update a column from the index * * @param rowKey the underlying row key which is indexed + * @param oldCol the previous column info * @param col all the column info */ -public abstract void update(ByteBuffer rowKey, Cell col, OpOrder.Group opGroup); +public abstract void update(ByteBuffer rowKey, Cell oldCol, Cell col, OpOrder.Group opGroup); public String getNameForSystemKeyspace(ByteBuffer column) { http://git-wip-us.apache.org/repos/asf/cassandra/blob/fc91071c/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java -- diff --git a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java index 66e549d..946e3be 100644 --- a/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java +++ b/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java @@ -676,11 +676,10 @@ public class SecondaryIndexManager { if (index instanceof PerColumnSecondaryIndex) { -// insert the new value before removing the old one, so we never have a period -// where the row is invisible to both queries (the opposite seems preferable); see 
CASSANDRA-5540 if (!cell.isMarkedForDelete(System.currentTimeMillis())) -((PerColumnSecondaryIndex) index).insert(key.key, cell, opGroup); -((PerColumnSecondaryIndex) index).delete(key.key, oldCell, opGroup); +((PerColumnSecondaryIndex) index).update(key.key, oldCell, cell,
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889667#comment-13889667 ] Marcus Eriksson commented on CASSANDRA-5351: [~yukim] thanks for the comments; bq. I don't see incremental repair codes in snapshot repair code path. right.. snapshot repairs... will look at them tomorrow bq. STCS major compaction, I prefer not to change current behavior Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. Maybe we could make major compaction do 2 separate compactions? Ending up with 2 sstables should be fine for users right? bq. Performing anti-compaction in repair thread sequentially for all parent sessions seems problematic performance-wise. I will move anticompaction out of the repair thread as well Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
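The "2 separate compactions" idea floated in this comment can be sketched as grouping the inputs by repaired status, so a major compaction would produce up to two output sstables instead of demoting everything to UNREPAIRED. Illustrative names only, not Cassandra's compaction code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: split a major compaction into one task per repaired status. */
public class MajorCompactionSplit {
    static final class SSTable {
        final String name;
        final boolean repaired;
        SSTable(String name, boolean repaired) { this.name = name; this.repaired = repaired; }
    }

    /** Partition the inputs into repaired and unrepaired compaction groups. */
    public static Map<Boolean, List<SSTable>> compactionGroups(List<SSTable> all) {
        return all.stream().collect(Collectors.partitioningBy(s -> s.repaired));
    }

    public static void main(String[] args) {
        List<SSTable> tables = Arrays.asList(
            new SSTable("a", true), new SSTable("b", false), new SSTable("c", true));
        Map<Boolean, List<SSTable>> groups = compactionGroups(tables);
        // compacting each group separately preserves repaired status
        System.out.println(groups.get(true).size() + " repaired, "
                         + groups.get(false).size() + " unrepaired"); // 2 repaired, 1 unrepaired
    }
}
```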
[jira] [Created] (CASSANDRA-6646) Disk Failure Policy ignores CorruptBlockException
sankalp kohli created CASSANDRA-6646: Summary: Disk Failure Policy ignores CorruptBlockException Key: CASSANDRA-6646 URL: https://issues.apache.org/jira/browse/CASSANDRA-6646 Project: Cassandra Issue Type: Improvement Reporter: sankalp kohli Priority: Minor If Cassandra is using compression and has a bad drive or sstable, it will throw a CorruptBlockException. The disk failure policy only works if the error is an FSError and does not apply to IOExceptions like this. We need to handle such exceptions better, as they cause nodes to stop responding to the coordinator, causing the client to time out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
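The gap the ticket describes, corruption surfacing as a plain IOException that bypasses the failure policy, can be modeled in miniature. The class names below imitate Cassandra's (`CorruptBlockException`, `FSReadError`) but the code is a standalone illustration, not the project's actual error-handling API:

```java
/** Sketch: funnel block-corruption errors through a disk failure policy
 * handler instead of letting them escape as raw IOExceptions. */
public class FailurePolicyDemo {
    static class CorruptBlockException extends java.io.IOException {
        CorruptBlockException(String m) { super(m); }
    }

    static class FSReadError extends RuntimeException {
        FSReadError(Throwable cause) { super(cause); }
    }

    static int blocksHandledByPolicy = 0;

    /** Stand-in for the configured disk_failure_policy reacting to the error. */
    static void handleFSError(FSReadError e) { blocksHandledByPolicy++; }

    /** Wrap a block read so corruption reaches the policy handler. */
    public static byte[] readBlock(boolean corrupt) {
        try {
            if (corrupt) throw new CorruptBlockException("bad checksum");
            return new byte[]{1, 2, 3};
        } catch (java.io.IOException e) {
            handleFSError(new FSReadError(e)); // the policy now sees the corruption
            return new byte[0];
        }
    }

    public static void main(String[] args) {
        readBlock(true);
        System.out.println(blocksHandledByPolicy); // 1
    }
}
```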
[jira] [Commented] (CASSANDRA-6590) Gossip does not heal after a temporary partition at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889826#comment-13889826 ] Vijay commented on CASSANDRA-6590: -- Sorry was shooting a different message during the startup, fixed and pushed to https://github.com/Vijay2win/cassandra/tree/6590-v3. Thanks! Gossip does not heal after a temporary partition at startup --- Key: CASSANDRA-6590 URL: https://issues.apache.org/jira/browse/CASSANDRA-6590 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Vijay Fix For: 2.0.6 Attachments: 0001-CASSANDRA-6590.patch, 0001-logging-for-6590.patch, 6590_disable_echo.txt See CASSANDRA-6571 for background. If a node is partitioned on startup when the echo command is sent, but then the partition heals, the halves of the partition will never mark each other up despite being able to communicate. This stems from CASSANDRA-3533. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
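The failure mode in this ticket, an echo sent exactly once at startup that is lost to a partition, can be contrasted with a retrying scheme in a toy simulation. This is not Vijay's patch, just a model of why a one-shot echo never heals while a per-round retry does:

```java
/** Toy model: a node is only marked alive after an echo round-trip;
 * retrying every gossip round heals once the partition does. */
public class EchoRetry {
    private static int attemptsUsed;

    /** Simulated echo: fails for the first failuresBeforeSuccess attempts. */
    private static boolean sendEcho(int failuresBeforeSuccess) {
        attemptsUsed++;
        return attemptsUsed > failuresBeforeSuccess;
    }

    /** Retry the echo each round until acked; -1 means never marked up. */
    public static int roundsUntilAlive(int failuresBeforeSuccess, int maxRounds) {
        attemptsUsed = 0;
        for (int round = 1; round <= maxRounds; round++)
            if (sendEcho(failuresBeforeSuccess))
                return round;
        return -1;
    }

    public static void main(String[] args) {
        // partition drops 2 echoes, then heals: a retrying node recovers in round 3
        System.out.println(roundsUntilAlive(2, 10)); // 3
        // a single-shot echo (maxRounds == 1) never marks the peer up
        System.out.println(roundsUntilAlive(2, 1));  // -1
    }
}
```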
git commit: Let scrub optionally skip broken counter partitions
Updated Branches: refs/heads/cassandra-2.0 b71372146 - 728c4fa9b Let scrub optionally skip broken counter partitions patch by Tyler Hobbs; reviewed by Aleksey Yeschenko for CASSANDRA-5930 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/728c4fa9 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/728c4fa9 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/728c4fa9 Branch: refs/heads/cassandra-2.0 Commit: 728c4fa9bf2b2c11dbc61c8e5536b1542abc1ccb Parents: b713721 Author: Aleksey Yeschenko alek...@apache.org Authored: Mon Feb 3 23:01:31 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Mon Feb 3 23:01:31 2014 +0300 -- CHANGES.txt | 4 + NEWS.txt| 12 ++- .../apache/cassandra/db/ColumnFamilyStore.java | 4 +- .../db/compaction/CompactionManager.java| 12 +-- .../cassandra/db/compaction/Scrubber.java | 37 ++--- .../cassandra/service/StorageService.java | 4 +- .../cassandra/service/StorageServiceMBean.java | 2 +- .../org/apache/cassandra/tools/NodeCmd.java | 6 +- .../org/apache/cassandra/tools/NodeProbe.java | 4 +- .../cassandra/tools/StandaloneScrubber.java | 6 +- .../apache/cassandra/tools/NodeToolHelp.yaml| 6 +- .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++-- 12 files changed, 140 insertions(+), 38 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 13b4c5b..a1a58a3 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,3 +1,7 @@ +2.0.6 + * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) + + 2.0.5 * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609) * Add ks.cf names to tombstone logging (CASSANDRA-6597) http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/NEWS.txt -- diff --git a/NEWS.txt b/NEWS.txt index 92446c8..b21fbaa 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -14,11 +14,21 @@ restore snapshots created with the previous 
major version using the provided 'sstableupgrade' tool. +2.0.6 += + +New features + +- Scrub can now optionally skip corrupt counter partitions. Please note + that this will lead to the loss of all the counter updates in the skipped + partition. See the --skip-corrupted option. + + 2.0.5 = New features - + - Batchlog replay can be, and is throttled by default now. See batchlog_replay_throttle_in_kb setting in cassandra.yaml. http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java -- diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java index 8750026..38d87db 100644 --- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java +++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java @@ -1115,12 +1115,12 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean CompactionManager.instance.performCleanup(ColumnFamilyStore.this, renewer); } -public void scrub(boolean disableSnapshot) throws ExecutionException, InterruptedException +public void scrub(boolean disableSnapshot, boolean skipCorrupted) throws ExecutionException, InterruptedException { // skip snapshot creation during scrub, SEE JIRA 5891 if(!disableSnapshot) snapshotWithoutFlush("pre-scrub-" + System.currentTimeMillis()); -CompactionManager.instance.performScrub(ColumnFamilyStore.this); +CompactionManager.instance.performScrub(ColumnFamilyStore.this, skipCorrupted); } public void sstablesRewrite(boolean excludeCurrentVersion) throws ExecutionException, InterruptedException http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java index 168ee02..48900c8 100644 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java 
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@ -227,13 +227,13 @@ public class CompactionManager implements CompactionManagerMBean executor.submit(runnable).get(); } -public void
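The `skipCorrupted` flag this commit threads through scrub can be reduced to a minimal model: on a corrupt partition, either abort the scrub or drop that partition and continue, losing its data as the NEWS.txt entry warns. The sketch below is illustrative, not the Scrubber implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Minimal model of scrub's skipCorrupted behaviour. */
public class ScrubSketch {
    /** Returns how many partitions survive the scrub. */
    public static int scrub(int[] partitions, Set<Integer> corrupt, boolean skipCorrupted) {
        int kept = 0;
        for (int p : partitions) {
            if (corrupt.contains(p)) {
                if (!skipCorrupted)
                    throw new IllegalStateException("corrupt partition " + p);
                continue; // skipped: that partition's updates are lost
            }
            kept++;
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<Integer> corrupt = new HashSet<>(Arrays.asList(1));
        // skipCorrupted == true: drop partition 1, keep the other
        System.out.println(scrub(new int[]{0, 1}, corrupt, true)); // 1
        // skipCorrupted == false: the scrub aborts on the corrupt partition
        try {
            scrub(new int[]{0, 1}, corrupt, false);
        } catch (IllegalStateException expected) {
            System.out.println("scrub aborted: " + expected.getMessage());
        }
    }
}
```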
[jira] [Updated] (CASSANDRA-6622) Streaming session failures during node replace using replace_address
[ https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prasad updated CASSANDRA-6622: --- Attachment: logs.tgz Streaming session failures during node replace using replace_address Key: CASSANDRA-6622 URL: https://issues.apache.org/jira/browse/CASSANDRA-6622 Project: Cassandra Issue Type: Bug Environment: RHEL6, cassandra-2.0.4 Reporter: Ravi Prasad Assignee: Brandon Williams Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 6622-2.0.txt, logs.tgz When using replace_address, Gossiper ApplicationState is set to hibernate, which is a down state. We are seeing that the peer nodes are seeing streaming plan request even before the Gossiper on them marks the replacing node as dead. As a result, streaming on peer nodes convicts the replacing node by closing the stream handler. I think, making the StorageService thread on the replacing node, sleep for BROADCAST_INTERVAL before bootstrapping, would avoid this scenario. 
Relevant logs from peer node (see that the Gossiper on peer node mark the replacing node as down, 2 secs after the streaming init request): {noformat} INFO [STREAM-INIT-/x.x.x.x:46436] 2014-01-26 20:42:24,388 StreamResultFuture.java (line 116) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Received streaming plan for Bootstrap INFO [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete WARN [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed INFO [GossipStage:1] 2014-01-26 20:42:25,242 Gossiper.java (line 850) InetAddress /x.x.x.x is now DOWN ERROR [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,766 StreamSession.java (line 410) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Streaming error occurred java.lang.RuntimeException: Outgoing stream handler has been closed at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:175) at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436) at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358) at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293) at java.lang.Thread.run(Thread.java:722) INFO [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete WARN [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6622) Streaming session failures during node replace using replace_address
[ https://issues.apache.org/jira/browse/CASSANDRA-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889856#comment-13889856 ] Ravi Prasad commented on CASSANDRA-6622: In attached logs, .72 was the replacing node, .73 is where the streaming session failed. I had trace logging turned on in .73 for org.apache.cassandra.gms. Looks like, it is FailureDetector is convicting. I have to mention that this was with '0001-don-t-signal-restart-of-dead-states.txt' applied on cassandra-2.0.4. Streaming session failures during node replace using replace_address Key: CASSANDRA-6622 URL: https://issues.apache.org/jira/browse/CASSANDRA-6622 Project: Cassandra Issue Type: Bug Environment: RHEL6, cassandra-2.0.4 Reporter: Ravi Prasad Assignee: Brandon Williams Attachments: 0001-don-t-signal-restart-of-dead-states.txt, 6622-2.0.txt, logs.tgz When using replace_address, Gossiper ApplicationState is set to hibernate, which is a down state. We are seeing that the peer nodes are seeing streaming plan request even before the Gossiper on them marks the replacing node as dead. As a result, streaming on peer nodes convicts the replacing node by closing the stream handler. I think, making the StorageService thread on the replacing node, sleep for BROADCAST_INTERVAL before bootstrapping, would avoid this scenario. 
Relevant logs from peer node (see that the Gossiper on peer node mark the replacing node as down, 2 secs after the streaming init request): {noformat} INFO [STREAM-INIT-/x.x.x.x:46436] 2014-01-26 20:42:24,388 StreamResultFuture.java (line 116) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Received streaming plan for Bootstrap INFO [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete WARN [GossipTasks:1] 2014-01-26 20:42:25,240 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed INFO [GossipStage:1] 2014-01-26 20:42:25,242 Gossiper.java (line 850) InetAddress /x.x.x.x is now DOWN ERROR [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,766 StreamSession.java (line 410) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Streaming error occurred java.lang.RuntimeException: Outgoing stream handler has been closed at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:175) at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436) at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358) at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293) at java.lang.Thread.run(Thread.java:722) INFO [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 181) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Session with /x.x.x.x is complete WARN [STREAM-IN-/x.x.x.x] 2014-01-26 20:42:25,768 StreamResultFuture.java (line 210) [Stream #5c6cd940-86ca-11e3-90a0-411b913c0e88] Stream failed {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[2/4] Merge branch 'cassandra-2.0' into trunk
http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/test/unit/org/apache/cassandra/db/ScrubTest.java -- diff --cc test/unit/org/apache/cassandra/db/ScrubTest.java index 38c8b62,08dd435..d8ab9ff --- a/test/unit/org/apache/cassandra/db/ScrubTest.java +++ b/test/unit/org/apache/cassandra/db/ScrubTest.java @@@ -39,9 -41,9 +41,11 @@@ import org.apache.cassandra.db.compacti import org.apache.cassandra.exceptions.ConfigurationException; import org.apache.cassandra.db.columniterator.IdentityQueryFilter; import org.apache.cassandra.db.compaction.CompactionManager; ++import org.apache.cassandra.exceptions.WriteTimeoutException; import org.apache.cassandra.io.sstable.*; import org.apache.cassandra.utils.ByteBufferUtil; ++import static org.apache.cassandra.Util.cellname; import static org.apache.cassandra.Util.column; import static org.junit.Assert.assertEquals; import static org.junit.Assert.fail; @@@ -76,6 -79,53 +81,53 @@@ public class ScrubTest extends SchemaLo } @Test -public void testScrubCorruptedCounterRow() throws IOException, InterruptedException, ExecutionException ++public void testScrubCorruptedCounterRow() throws IOException, InterruptedException, ExecutionException, WriteTimeoutException + { + CompactionManager.instance.disableAutoCompaction(); + Keyspace keyspace = Keyspace.open(KEYSPACE); + ColumnFamilyStore cfs = keyspace.getColumnFamilyStore(COUNTER_CF); + cfs.clearUnsafe(); + + fillCounterCF(cfs, 2); + + List<Row> rows = cfs.getRangeSlice(Util.range("", ""), null, new IdentityQueryFilter(), 1000); + assertEquals(2, rows.size()); + + SSTableReader sstable = cfs.getSSTables().iterator().next(); + + // overwrite one row with garbage + long row0Start = sstable.getPosition(RowPosition.forKey(ByteBufferUtil.bytes("0"), sstable.partitioner), SSTableReader.Operator.EQ).position; + long row1Start = sstable.getPosition(RowPosition.forKey(ByteBufferUtil.bytes("1"), sstable.partitioner), SSTableReader.Operator.EQ).position; + long startPosition = row0Start < row1Start ? 
row0Start : row1Start; + long endPosition = row0Start < row1Start ? row1Start : row0Start; + + RandomAccessFile file = new RandomAccessFile(sstable.getFilename(), "rw"); + file.seek(startPosition); + file.writeBytes(StringUtils.repeat('z', (int) (endPosition - startPosition))); + file.close(); + + // with skipCorrupted == false, the scrub is expected to fail + Scrubber scrubber = new Scrubber(cfs, sstable, false); + try + { + scrubber.scrub(); + fail("Expected a CorruptSSTableException to be thrown"); + } + catch (IOError err) {} + + // with skipCorrupted == true, the corrupt row will be skipped + scrubber = new Scrubber(cfs, sstable, true); + scrubber.scrub(); + scrubber.close(); + cfs.replaceCompactedSSTables(Collections.singletonList(sstable), Collections.singletonList(scrubber.getNewSSTable()), OperationType.SCRUB); + assertEquals(1, cfs.getSSTables().size()); + + // verify that we can read all of the rows, and there is now one less row + rows = cfs.getRangeSlice(Util.range("", ""), null, new IdentityQueryFilter(), 1000); + assertEquals(1, rows.size()); + } + + @Test public void testScrubDeletedRow() throws IOException, ExecutionException, InterruptedException, ConfigurationException { CompactionManager.instance.disableAutoCompaction(); @@@ -207,4 -256,20 +258,20 @@@ cfs.forceBlockingFlush(); } + -protected void fillCounterCF(ColumnFamilyStore cfs, int rowsPerSSTable) throws ExecutionException, InterruptedException, IOException ++protected void fillCounterCF(ColumnFamilyStore cfs, int rowsPerSSTable) throws ExecutionException, InterruptedException, IOException, WriteTimeoutException + { + for (int i = 0; i < rowsPerSSTable; i++) + { + String key = String.valueOf(i); + ColumnFamily cf = TreeMapBackedSortedColumns.factory.create(KEYSPACE, COUNTER_CF); -RowMutation rm = new RowMutation(KEYSPACE, ByteBufferUtil.bytes(key), cf); -rm.addCounter(COUNTER_CF, ByteBufferUtil.bytes("Column1"), 100); ++Mutation rm = new Mutation(KEYSPACE, ByteBufferUtil.bytes(key), cf); 
++rm.addCounter(COUNTER_CF, cellname("Column1"), 100); + CounterMutation cm = new CounterMutation(rm, ConsistencyLevel.ONE); + cm.apply(); + } + + cfs.forceBlockingFlush(); + } + -} +}
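The corruption trick the test above relies on, seeking to a row's start offset and overwriting the byte range with 'z' characters, works on any file. Here is a standalone version of that step (names and the temp-file setup are illustrative):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

/** Overwrite a byte range of a file in place, as the scrub test does. */
public class CorruptRange {
    public static void corrupt(File f, long start, long end) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(start);
            for (long i = start; i < end; i++)
                raf.write('z'); // garbage byte, like StringUtils.repeat('z', n)
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sstable", ".db");
        Files.write(f.toPath(), "0123456789".getBytes());
        corrupt(f, 2, 6); // clobber offsets [2, 6)
        System.out.println(new String(Files.readAllBytes(f.toPath()))); // 01zzzz6789
        f.delete();
    }
}
```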
[1/4] git commit: Let scrub optionally skip broken counter partitions
Updated Branches: refs/heads/trunk fc91071c0 -> 63f110b5e Let scrub optionally skip broken counter partitions patch by Tyler Hobbs; reviewed by Aleksey Yeschenko for CASSANDRA-5930 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/728c4fa9 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/728c4fa9 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/728c4fa9 Branch: refs/heads/trunk Commit: 728c4fa9bf2b2c11dbc61c8e5536b1542abc1ccb Parents: b713721 Author: Aleksey Yeschenko alek...@apache.org Authored: Mon Feb 3 23:01:31 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Mon Feb 3 23:01:31 2014 +0300 -- CHANGES.txt | 4 + NEWS.txt| 12 ++- .../apache/cassandra/db/ColumnFamilyStore.java | 4 +- .../db/compaction/CompactionManager.java| 12 +-- .../cassandra/db/compaction/Scrubber.java | 37 ++--- .../cassandra/service/StorageService.java | 4 +- .../cassandra/service/StorageServiceMBean.java | 2 +- .../org/apache/cassandra/tools/NodeCmd.java | 6 +- .../org/apache/cassandra/tools/NodeProbe.java | 4 +- .../cassandra/tools/StandaloneScrubber.java | 6 +- .../apache/cassandra/tools/NodeToolHelp.yaml| 6 +- .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++-- 12 files changed, 140 insertions(+), 38 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 13b4c5b..a1a58a3 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,3 +1,7 @@ +2.0.6 + * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) + + 2.0.5 * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609) * Add ks.cf names to tombstone logging (CASSANDRA-6597) http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/NEWS.txt -- diff --git a/NEWS.txt b/NEWS.txt index 92446c8..b21fbaa 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -14,11 +14,21 @@ restore snapshots created with the previous major version
using the provided 'sstableupgrade' tool. +2.0.6 += = +New features + +- Scrub can now optionally skip corrupt counter partitions. Please note + that this will lead to the loss of all the counter updates in the skipped + partition. See the --skip-corrupted option. + + 2.0.5 = New features - + - Batchlog replay can be, and is throttled by default now. See batchlog_replay_throttle_in_kb setting in cassandra.yaml. http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java -- diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java index 8750026..38d87db 100644 --- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java +++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java @@ -1115,12 +1115,12 @@ public class ColumnFamilyStore implements ColumnFamilyStoreMBean CompactionManager.instance.performCleanup(ColumnFamilyStore.this, renewer); } -public void scrub(boolean disableSnapshot) throws ExecutionException, InterruptedException +public void scrub(boolean disableSnapshot, boolean skipCorrupted) throws ExecutionException, InterruptedException { // skip snapshot creation during scrub, SEE JIRA 5891 if(!disableSnapshot) snapshotWithoutFlush("pre-scrub-" + System.currentTimeMillis()); -CompactionManager.instance.performScrub(ColumnFamilyStore.this); +CompactionManager.instance.performScrub(ColumnFamilyStore.this, skipCorrupted); } public void sstablesRewrite(boolean excludeCurrentVersion) throws ExecutionException, InterruptedException http://git-wip-us.apache.org/repos/asf/cassandra/blob/728c4fa9/src/java/org/apache/cassandra/db/compaction/CompactionManager.java -- diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java index 168ee02..48900c8 100644 --- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java +++
b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java @@ -227,13 +227,13 @@ public class CompactionManager implements CompactionManagerMBean executor.submit(runnable).get(); } -public void performScrub(ColumnFamilyStore
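The skipCorrupted flag threaded through performScrub() and the surrounding signatures selects between two failure policies when a corrupt partition is encountered: abort the scrub, or count the partition, drop it, and continue. A minimal sketch of that control flow, using hypothetical names rather than Cassandra's actual Scrubber internals:

```java
import java.util.ArrayList;
import java.util.List;

public class SkipCorruptedSketch
{
    // Stand-in for reading partition i; throws when the partition is corrupt.
    interface PartitionSource { String next(int i); }

    // Scrub n partitions. With skipCorrupted, corrupt partitions are counted
    // and dropped; without it, the first corruption aborts the whole scrub.
    static List<String> scrub(PartitionSource src, int n, boolean skipCorrupted)
    {
        List<String> good = new ArrayList<>();
        int skipped = 0;
        for (int i = 0; i < n; i++)
        {
            try
            {
                good.add(src.next(i));
            }
            catch (IllegalStateException corruption)
            {
                if (!skipCorrupted)
                    throw corruption; // abort: surface the corruption to the caller
                skipped++;            // skip-corrupted mode: drop the partition, keep going
            }
        }
        System.out.println("scrubbed " + good.size() + ", skipped " + skipped);
        return good;
    }

    public static void main(String[] args)
    {
        PartitionSource src = i ->
        {
            if (i == 1) throw new IllegalStateException("corrupt partition " + i);
            return "row" + i;
        };
        System.out.println(scrub(src, 3, true)); // prints [row0, row2]
    }
}
```

This is why the NEWS entry warns about data loss: in skip mode the corrupt partition's counter updates are simply gone from the rewritten sstable.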
[4/4] git commit: Merge branch 'cassandra-2.0' into trunk
Merge branch 'cassandra-2.0' into trunk Conflicts: CHANGES.txt src/java/org/apache/cassandra/tools/NodeCmd.java src/resources/org/apache/cassandra/tools/NodeToolHelp.yaml Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/63f110b5 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/63f110b5 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/63f110b5 Branch: refs/heads/trunk Commit: 63f110b5e058217c1d7e3d178b367b918ca2f856 Parents: fc91071 728c4fa Author: Aleksey Yeschenko alek...@apache.org Authored: Mon Feb 3 23:32:23 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Mon Feb 3 23:32:23 2014 +0300 -- CHANGES.txt | 4 + NEWS.txt| 12 ++- .../apache/cassandra/db/ColumnFamilyStore.java | 4 +- .../db/compaction/CompactionManager.java| 12 +-- .../cassandra/db/compaction/Scrubber.java | 37 ++--- .../cassandra/service/StorageService.java | 4 +- .../cassandra/service/StorageServiceMBean.java | 2 +- .../org/apache/cassandra/tools/NodeProbe.java | 4 +- .../org/apache/cassandra/tools/NodeTool.java| 11 ++- .../cassandra/tools/StandaloneScrubber.java | 6 +- .../unit/org/apache/cassandra/db/ScrubTest.java | 81 ++-- 11 files changed, 141 insertions(+), 36 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/CHANGES.txt -- diff --cc CHANGES.txt index 6ca163a,a1a58a3..f9da65c --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,32 -1,7 +1,36 @@@ +2.1 + * add listsnapshots command to nodetool (CASSANDRA-5742) + * Introduce AtomicBTreeColumns (CASSANDRA-6271) + * Multithreaded commitlog (CASSANDRA-3578) + * allocate fixed index summary memory pool and resample cold index summaries + to use less memory (CASSANDRA-5519) + * Removed multithreaded compaction (CASSANDRA-6142) + * Parallelize fetching rows for low-cardinality indexes (CASSANDRA-1337) + * change logging from log4j to logback (CASSANDRA-5883) + * switch to LZ4 compression for internode communication 
(CASSANDRA-5887) + * Stop using Thrift-generated Index* classes internally (CASSANDRA-5971) + * Remove 1.2 network compatibility code (CASSANDRA-5960) + * Remove leveled json manifest migration code (CASSANDRA-5996) + * Remove CFDefinition (CASSANDRA-6253) + * Use AtomicIntegerFieldUpdater in RefCountedMemory (CASSANDRA-6278) + * User-defined types for CQL3 (CASSANDRA-5590) + * Use of o.a.c.metrics in nodetool (CASSANDRA-5871, 6406) + * Batch read from OTC's queue and cleanup (CASSANDRA-1632) + * Secondary index support for collections (CASSANDRA-4511, 6383) + * SSTable metadata(Stats.db) format change (CASSANDRA-6356) + * Push composites support in the storage engine + (CASSANDRA-5417, CASSANDRA-6520) + * Add snapshot space used to cfstats (CASSANDRA-6231) + * Add cardinality estimator for key count estimation (CASSANDRA-5906) + * CF id is changed to be non-deterministic. Data dir/key cache are created + uniquely for CF id (CASSANDRA-5202) + * New counters implementation (CASSANDRA-6504) + + + 2.0.6 + * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) + + 2.0.5 * Reduce garbage generated by bloom filter lookups (CASSANDRA-6609) * Add ks.cf names to tombstone logging (CASSANDRA-6597) http://git-wip-us.apache.org/repos/asf/cassandra/blob/63f110b5/NEWS.txt -- diff --cc NEWS.txt index 72b898e,b21fbaa..185f60c --- a/NEWS.txt +++ b/NEWS.txt @@@ -13,37 -13,17 +13,47 @@@ restore snapshots created with the prev 'sstableloader' tool. You can upgrade the file format of your snapshots using the provided 'sstableupgrade' tool. +2.1 +=== + +New features + + - SSTable data directory name is slightly changed. Each directory will + have hex string appended after CF name, e.g. + ks/cf-5be396077b811e3a3ab9dc4b9ac088d/ + This hex string part represents unique ColumnFamily ID. + Note that existing directories are used as is, so only newly created + directories after upgrade have new directory name format. 
+ - Saved key cache files also have ColumnFamily ID in their file name. + +Upgrading +- + - Rolling upgrades from anything pre-2.0.5 is not supported. + - For leveled compaction users, 2.0 must be atleast started before + upgrading to 2.1 due to the fact that the old JSON leveled + manifest is migrated into the sstable metadata files on startup + in 2.0 and this code is gone from 2.1. + - For size-tiered compaction users, Cassandra now defaults to ignoring +
[Cassandra Wiki] Trivial Update of ThirdPartySupport by AlekseyYeschenko
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The ThirdPartySupport page has been changed by AlekseyYeschenko: https://wiki.apache.org/cassandra/ThirdPartySupport?action=diff&rev1=42&rev2=43 Companies providing support for Apache Cassandra are not endorsed by the Apache Software Foundation, although some of these companies employ [[Committers]] to the Apache project. - Companies that employ Apache Cassandra [[Committers]]: + == Companies that employ Apache Cassandra Committers: == {{http://www.datastax.com/wp-content/themes/datastax-custom/images/logo.png}} [[http://datastax.com|Datastax]], the commercial leader in Apache Cassandra™, offers products and services that make it easy for customers to build, deploy and operate elastically scalable and cloud-optimized applications and data services. [[http://datastax.com|DataStax]] has over 100 customers, including leaders such as Netflix, Cisco, Rackspace, HP, Constant Contact and [[http://www.datastax.com/cassandrausers|more]], and spanning verticals including web, financial services, telecommunications, logistics and government. - Other companies: + == Other companies: == {{http://www.acunu.com/uploads/1/1/5/5/11559475/1335714080.png}} [[http://www.acunu.com|Acunu]] are world experts in Apache Cassandra and beyond. Some of the most challenging Cassandra deployments already rely on Acunu's technology, training and support. With a focus on real-time applications, Acunu makes it easy to build Cassandra based real-time Big Data solutions that derive instant answers from event streams and deliver fresh insight
[jira] [Commented] (CASSANDRA-6572) Workload recording / playback
[ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889939#comment-13889939 ] Lyuben Todorov commented on CASSANDRA-6572: --- I think it would be easier for users to understand what's going on if we record the CQL query string in QP#processStatement and pass it to a function in SP (so the majority of the work can be done in SP but still allow us to capture the CQL string, which is easy to understand), and then save that to a system table (not thought about the name yet) along with the timestamp of execution. This will give us a good starting point. Workload recording / playback - Key: CASSANDRA-6572 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Lyuben Todorov Fix For: 2.0.6 Write sample mode gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
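The recording scheme proposed in the comment above — capture the raw CQL string at the query-processor boundary and persist it with its execution timestamp, then replay the statements in order — can be sketched as follows. All names here are hypothetical illustrations, not the eventual Cassandra API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class QueryLogSketch
{
    // One recorded query: the raw CQL text plus when it ran.
    static final class Entry
    {
        final long timestampMillis;
        final String cql;
        Entry(long timestampMillis, String cql) { this.timestampMillis = timestampMillis; this.cql = cql; }
    }

    private final List<Entry> log = new ArrayList<>();

    // Called from the query-processing path before execution is handed off;
    // in the real system this row would go to a system table, not a list.
    void record(String cql)
    {
        log.add(new Entry(System.currentTimeMillis(), cql));
    }

    // Replay hands each recorded statement, in recorded order, to an executor.
    void replay(Consumer<String> executor)
    {
        for (Entry e : log)
            executor.accept(e.cql);
    }

    public static void main(String[] args)
    {
        QueryLogSketch rec = new QueryLogSketch();
        rec.record("SELECT * FROM ks.cf WHERE k = 1");
        rec.record("INSERT INTO ks.cf (k, v) VALUES (2, 'x')");
        rec.replay(System.out::println); // prints the two statements in order
    }
}
```

Recording the raw string (rather than the parsed statement) is what makes the log human-readable, which is the point Lyuben is arguing for.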
[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a trace
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889940#comment-13889940 ] Robert Coli commented on CASSANDRA-6510: This bug seems to have the implication that no ConsistencyLevel has had its supposed meaning for the duration of the bug, because there is no guarantee that the acknowledged-to-the-client local write actually succeeds? Is that correct? If so, this issue seems quite fundamental and serious; why did automated testing not surface it? Is there now a test which covers this case? What is the since for this issue? Looks like at least 1.2.0? Don't drop local mutations without a trace -- Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a trace. SP.insertLocal() should be using LocalMutationRunnable instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a trace
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889949#comment-13889949 ] Jonathan Ellis commented on CASSANDRA-6510: --- Nope, that's not the implication. You can easily see from the code that {{responseHandler.response}} only gets called after {{rm.apply}}. That is, no write is acknowledge if it hasn't actually been applied. Don't drop local mutations without a trace -- Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a trace. SP.insertLocal() should be using LocalMutationRunnable instead. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6510) Don't drop local mutations without a hint
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-6510: - Description: SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. was:SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a trace. SP.insertLocal() should be using LocalMutationRunnable instead. Summary: Don't drop local mutations without a hint (was: Don't drop local mutations without a trace) Don't drop local mutations without a hint - Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6510) Don't drop local mutations without a hint
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889968#comment-13889968 ] Robert Coli commented on CASSANDRA-6510: Thanks for the clarification. Others who look to JIRA to understand impact will appreciate not having to try to deduce it from reading the patch. What is the since for this issue? Looks like at least 1.2.0? Don't drop local mutations without a hint - Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
git commit: Unscarify 6510 CHANGES
Updated Branches: refs/heads/cassandra-1.2 3f9875c7f -> 814a91209 Unscarify 6510 CHANGES Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120 Branch: refs/heads/cassandra-1.2 Commit: 814a91209418206f791eda1cebc83262c9e225f0 Parents: 3f9875c Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:00:29 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:00:29 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 68bed3b..981f977 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -5,7 +5,7 @@ * Allow executing CREATE statements multiple times (CASSANDRA-6471) * Don't send confusing info with timeouts (CASSANDRA-6491) * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) + * Don't drop local mutations without a hint (CASSANDRA-6510) * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) * Validate SliceRange start and finish lengths (CASSANDRA-6521) * fsync compression metadata (CASSANDRA-6531)
[1/3] git commit: Unscarify 6510 CHANGES
Updated Branches: refs/heads/trunk 63f110b5e -> 78f71420c Unscarify 6510 CHANGES Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120 Branch: refs/heads/trunk Commit: 814a91209418206f791eda1cebc83262c9e225f0 Parents: 3f9875c Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:00:29 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:00:29 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 68bed3b..981f977 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -5,7 +5,7 @@ * Allow executing CREATE statements multiple times (CASSANDRA-6471) * Don't send confusing info with timeouts (CASSANDRA-6491) * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) + * Don't drop local mutations without a hint (CASSANDRA-6510) * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) * Validate SliceRange start and finish lengths (CASSANDRA-6521) * fsync compression metadata (CASSANDRA-6531)
[2/3] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0
Merge branch 'cassandra-1.2' into cassandra-2.0 Conflicts: CHANGES.txt Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/066d00ba Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/066d00ba Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/066d00ba Branch: refs/heads/trunk Commit: 066d00ba5183a3a37b962334d0442edaaf9bebc8 Parents: 728c4fa 814a912 Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:01:42 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:01:42 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/066d00ba/CHANGES.txt -- diff --cc CHANGES.txt index a1a58a3,981f977..4440942 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -77,51 -45,9 +77,51 @@@ Merged from 1.2 (CASSANDRA-6413) * (Hadoop) add describe_local_ring (CASSANDRA-6268) * Fix handling of concurrent directory creation failure (CASSANDRA-6459) + * Allow executing CREATE statements multiple times (CASSANDRA-6471) + * Don't send confusing info with timeouts (CASSANDRA-6491) + * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) ++ * Don't drop local mutations without a hint (CASSANDRA-6510) + * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) + * Validate SliceRange start and finish lengths (CASSANDRA-6521) -1.2.12 +2.0.3 + * Fix FD leak on slice read path (CASSANDRA-6275) + * Cancel read meter task when closing SSTR (CASSANDRA-6358) + * free off-heap IndexSummary during bulk (CASSANDRA-6359) + * Recover from IOException in accept() thread (CASSANDRA-6349) + * Improve Gossip tolerance of abnormally slow tasks (CASSANDRA-6338) + * Fix trying to hint timed out counter writes (CASSANDRA-6322) + * Allow restoring specific columnfamilies from archived CL (CASSANDRA-4809) + * Avoid flushing 
compaction_history after each operation (CASSANDRA-6287) + * Fix repair assertion error when tombstones expire (CASSANDRA-6277) + * Skip loading corrupt key cache (CASSANDRA-6260) + * Fixes for compacting larger-than-memory rows (CASSANDRA-6274) + * Compact hottest sstables first and optionally omit coldest from + compaction entirely (CASSANDRA-6109) + * Fix modifying column_metadata from thrift (CASSANDRA-6182) + * cqlsh: fix LIST USERS output (CASSANDRA-6242) + * Add IRequestSink interface (CASSANDRA-6248) + * Update memtable size while flushing (CASSANDRA-6249) + * Provide hooks around CQL2/CQL3 statement execution (CASSANDRA-6252) + * Require Permission.SELECT for CAS updates (CASSANDRA-6247) + * New CQL-aware SSTableWriter (CASSANDRA-5894) + * Reject CAS operation when the protocol v1 is used (CASSANDRA-6270) + * Correctly throw error when frame too large (CASSANDRA-5981) + * Fix serialization bug in PagedRange with 2ndary indexes (CASSANDRA-6299) + * Fix CQL3 table validation in Thrift (CASSANDRA-6140) + * Fix bug missing results with IN clauses (CASSANDRA-6327) + * Fix paging with reversed slices (CASSANDRA-6343) + * Set minTimestamp correctly to be able to drop expired sstables (CASSANDRA-6337) + * Support NaN and Infinity as float literals (CASSANDRA-6003) + * Remove RF from nodetool ring output (CASSANDRA-6289) + * Fix attempting to flush empty rows (CASSANDRA-6374) + * Fix potential out of bounds exception when paging (CASSANDRA-6333) +Merged from 1.2: + * Optimize FD phi calculation (CASSANDRA-6386) + * Improve initial FD phi estimate when starting up (CASSANDRA-6385) + * Don't list CQL3 table in CLI describe even if named explicitely + (CASSANDRA-5750) * Invalidate row cache when dropping CF (CASSANDRA-6351) * add non-jamm path for cached statements (CASSANDRA-6293) * (Hadoop) Require CFRR batchSize to be at least 2 (CASSANDRA-6114)
[1/2] git commit: Unscarify 6510 CHANGES
Updated Branches: refs/heads/cassandra-2.0 728c4fa9b -> 066d00ba5 Unscarify 6510 CHANGES Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/814a9120 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/814a9120 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/814a9120 Branch: refs/heads/cassandra-2.0 Commit: 814a91209418206f791eda1cebc83262c9e225f0 Parents: 3f9875c Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:00:29 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:00:29 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/814a9120/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 68bed3b..981f977 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -5,7 +5,7 @@ * Allow executing CREATE statements multiple times (CASSANDRA-6471) * Don't send confusing info with timeouts (CASSANDRA-6491) * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) + * Don't drop local mutations without a hint (CASSANDRA-6510) * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) * Validate SliceRange start and finish lengths (CASSANDRA-6521) * fsync compression metadata (CASSANDRA-6531)
[3/3] git commit: Merge branch 'cassandra-2.0' into trunk
Merge branch 'cassandra-2.0' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/78f71420 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/78f71420 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/78f71420 Branch: refs/heads/trunk Commit: 78f71420c33f588dcbb82bcbd689bb4aad6dd92f Parents: 63f110b 066d00b Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:02:08 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:02:08 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/78f71420/CHANGES.txt --
[2/2] git commit: Merge branch 'cassandra-1.2' into cassandra-2.0
Merge branch 'cassandra-1.2' into cassandra-2.0 Conflicts: CHANGES.txt Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/066d00ba Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/066d00ba Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/066d00ba Branch: refs/heads/cassandra-2.0 Commit: 066d00ba5183a3a37b962334d0442edaaf9bebc8 Parents: 728c4fa 814a912 Author: Aleksey Yeschenko alek...@apache.org Authored: Tue Feb 4 01:01:42 2014 +0300 Committer: Aleksey Yeschenko alek...@apache.org Committed: Tue Feb 4 01:01:42 2014 +0300 -- CHANGES.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/066d00ba/CHANGES.txt -- diff --cc CHANGES.txt index a1a58a3,981f977..4440942 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -77,51 -45,9 +77,51 @@@ Merged from 1.2 (CASSANDRA-6413) * (Hadoop) add describe_local_ring (CASSANDRA-6268) * Fix handling of concurrent directory creation failure (CASSANDRA-6459) + * Allow executing CREATE statements multiple times (CASSANDRA-6471) + * Don't send confusing info with timeouts (CASSANDRA-6491) + * Don't resubmit counter mutation runnables internally (CASSANDRA-6427) - * Don't drop local mutations without a trace (CASSANDRA-6510) ++ * Don't drop local mutations without a hint (CASSANDRA-6510) + * Don't allow null max_hint_window_in_ms (CASSANDRA-6419) + * Validate SliceRange start and finish lengths (CASSANDRA-6521) -1.2.12 +2.0.3 + * Fix FD leak on slice read path (CASSANDRA-6275) + * Cancel read meter task when closing SSTR (CASSANDRA-6358) + * free off-heap IndexSummary during bulk (CASSANDRA-6359) + * Recover from IOException in accept() thread (CASSANDRA-6349) + * Improve Gossip tolerance of abnormally slow tasks (CASSANDRA-6338) + * Fix trying to hint timed out counter writes (CASSANDRA-6322) + * Allow restoring specific columnfamilies from archived CL (CASSANDRA-4809) + * Avoid flushing 
compaction_history after each operation (CASSANDRA-6287) + * Fix repair assertion error when tombstones expire (CASSANDRA-6277) + * Skip loading corrupt key cache (CASSANDRA-6260) + * Fixes for compacting larger-than-memory rows (CASSANDRA-6274) + * Compact hottest sstables first and optionally omit coldest from + compaction entirely (CASSANDRA-6109) + * Fix modifying column_metadata from thrift (CASSANDRA-6182) + * cqlsh: fix LIST USERS output (CASSANDRA-6242) + * Add IRequestSink interface (CASSANDRA-6248) + * Update memtable size while flushing (CASSANDRA-6249) + * Provide hooks around CQL2/CQL3 statement execution (CASSANDRA-6252) + * Require Permission.SELECT for CAS updates (CASSANDRA-6247) + * New CQL-aware SSTableWriter (CASSANDRA-5894) + * Reject CAS operation when the protocol v1 is used (CASSANDRA-6270) + * Correctly throw error when frame too large (CASSANDRA-5981) + * Fix serialization bug in PagedRange with 2ndary indexes (CASSANDRA-6299) + * Fix CQL3 table validation in Thrift (CASSANDRA-6140) + * Fix bug missing results with IN clauses (CASSANDRA-6327) + * Fix paging with reversed slices (CASSANDRA-6343) + * Set minTimestamp correctly to be able to drop expired sstables (CASSANDRA-6337) + * Support NaN and Infinity as float literals (CASSANDRA-6003) + * Remove RF from nodetool ring output (CASSANDRA-6289) + * Fix attempting to flush empty rows (CASSANDRA-6374) + * Fix potential out of bounds exception when paging (CASSANDRA-6333) +Merged from 1.2: + * Optimize FD phi calculation (CASSANDRA-6386) + * Improve initial FD phi estimate when starting up (CASSANDRA-6385) + * Don't list CQL3 table in CLI describe even if named explicitely + (CASSANDRA-5750) * Invalidate row cache when dropping CF (CASSANDRA-6351) * add non-jamm path for cached statements (CASSANDRA-6293) * (Hadoop) Require CFRR batchSize to be at least 2 (CASSANDRA-6114)
[jira] [Comment Edited] (CASSANDRA-6510) Don't drop local mutations without a hint
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13889949#comment-13889949 ] Jonathan Ellis edited comment on CASSANDRA-6510 at 2/3/14 10:01 PM: Nope, that's not the implication. You can see from the code that {{responseHandler.response}} only gets called after {{rm.apply}}. That is, no write is acknowledged if it hasn't actually been applied. was (Author: jbellis): Nope, that's not the implication. You can easily see from the code that {{responseHandler.response}} only gets called after {{rm.apply}}. That is, no write is acknowledge if it hasn't actually been applied. Don't drop local mutations without a hint - Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5863) Create a Decompressed Chunk [block] Cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889997#comment-13889997 ] Pavel Yaskevich commented on CASSANDRA-5863: Just to keep everybody updated: I didn't forget about this, although I got distracted by multiple things coming simultaneously. I tried multiple ways of using the existing cache, none of which yielded good performance. As Jake mentioned previously, the hard part is tracking hotness of the sections plus low-cost lookup for already-decompressed chunks. I have one more idea for how to make it work; will keep you posted... Create a Decompressed Chunk [block] Cache - Key: CASSANDRA-5863 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863 Project: Cassandra Issue Type: New Feature Components: Core Reporter: T Jake Luciani Assignee: Pavel Yaskevich Labels: performance Fix For: 2.1 Currently, for every read, the CRAR reads each compressed chunk into a byte[], sends it to ICompressor, gets back another byte[] and verifies a checksum. This process is where the majority of time is spent in a read request. Before compression, we would have zero-copy of data and could respond directly from the page-cache. It would be useful to have some kind of Chunk cache that could speed up this process for hot data. Initially this could be a off heap cache but it would be great to put these decompressed chunks onto a SSD so the hot data lives on a fast disk similar to https://github.com/facebook/flashcache. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
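The read path described in the ticket (read compressed chunk, decompress, verify checksum, discard) suggests caching decompressed chunks keyed by (file, chunk offset), so hot chunks skip decompression entirely. A minimal on-heap LRU sketch of that idea — purely illustrative, since the design under discussion targets off-heap or SSD-backed storage and must also track section hotness:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ChunkCacheSketch
{
    // Cache key: which file, and which chunk (by offset) within it.
    static final class Key
    {
        final String file;
        final long chunkOffset;
        Key(String file, long chunkOffset) { this.file = file; this.chunkOffset = chunkOffset; }
        @Override public boolean equals(Object o)
        {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return chunkOffset == k.chunkOffset && file.equals(k.file);
        }
        @Override public int hashCode() { return file.hashCode() * 31 + Long.hashCode(chunkOffset); }
    }

    private final int capacity;
    // An access-ordered LinkedHashMap gives LRU eviction in a few lines.
    private final LinkedHashMap<Key, byte[]> cache;

    ChunkCacheSketch(int capacity)
    {
        this.capacity = capacity;
        this.cache = new LinkedHashMap<Key, byte[]>(16, 0.75f, true)
        {
            @Override protected boolean removeEldestEntry(Map.Entry<Key, byte[]> eldest)
            {
                return size() > ChunkCacheSketch.this.capacity;
            }
        };
    }

    // On a hit, decompression is skipped entirely; on a miss, the supplied
    // decompression runs once and the result is remembered.
    synchronized byte[] get(String file, long offset, Supplier<byte[]> decompress)
    {
        return cache.computeIfAbsent(new Key(file, offset), k -> decompress.get());
    }

    synchronized int size() { return cache.size(); }

    public static void main(String[] args)
    {
        ChunkCacheSketch cache = new ChunkCacheSketch(2);
        byte[] a = cache.get("Data.db", 0, () -> new byte[]{1, 2, 3}); // miss: "decompresses"
        byte[] b = cache.get("Data.db", 0, () -> { throw new AssertionError("should not decompress on a hit"); });
        System.out.println(a == b); // prints true: second read served from cache
    }
}
```

The tricky part the comment alludes to is exactly what this sketch glosses over: a global synchronized LRU would be a contention point on the read path, and on-heap byte[] values create GC pressure, which is why the ticket leans toward off-heap or flashcache-style designs.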
[jira] [Commented] (CASSANDRA-5863) Create a Decompressed Chunk [block] Cache
[ https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890008#comment-13890008 ] Jonathan Ellis commented on CASSANDRA-5863: --- Thanks for the update. Create a Decompressed Chunk [block] Cache - Key: CASSANDRA-5863 URL: https://issues.apache.org/jira/browse/CASSANDRA-5863 Project: Cassandra Issue Type: New Feature Components: Core Reporter: T Jake Luciani Assignee: Pavel Yaskevich Labels: performance Fix For: 2.1 Currently, for every read, the CRAR reads each compressed chunk into a byte[], sends it to ICompressor, gets back another byte[] and verifies a checksum. This process is where the majority of time is spent in a read request. Before compression, we would have zero-copy of data and could respond directly from the page-cache. It would be useful to have some kind of Chunk cache that could speed up this process for hot data. Initially this could be an off-heap cache, but it would be great to put these decompressed chunks onto an SSD so the hot data lives on a fast disk, similar to https://github.com/facebook/flashcache.
[jira] [Commented] (CASSANDRA-6572) Workload recording / playback
[ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890012#comment-13890012 ] Jonathan Ellis commented on CASSANDRA-6572: --- Why split the work across two classes instead of doing it all in QP? Workload recording / playback - Key: CASSANDRA-6572 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Lyuben Todorov Fix For: 2.0.6 Write sample mode gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6510) Don't drop local mutations without a hint
[ https://issues.apache.org/jira/browse/CASSANDRA-6510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-6510: - Since Version: 1.2.1 Don't drop local mutations without a hint - Key: CASSANDRA-6510 URL: https://issues.apache.org/jira/browse/CASSANDRA-6510 Project: Cassandra Issue Type: Bug Reporter: Aleksey Yeschenko Assignee: Aleksey Yeschenko Fix For: 1.2.14, 2.0.4 Attachments: 6510.txt SP.insertLocal() uses a regular DroppableRunnable, thus timed out local mutations get dropped without leaving a hint. SP.insertLocal() should be using LocalMutationRunnable instead. Note: hints are the context here, not consistency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-4851) CQL3: improve support for paginating over composites
[ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890409#comment-13890409 ] Jonathan Ellis commented on CASSANDRA-4851: --- Agreed that this syntax is convenient. CQL3: improve support for paginating over composites Key: CASSANDRA-4851 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851 Project: Cassandra Issue Type: Improvement Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0.6 Attachments: 4851.txt Consider the following table: {noformat} CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) ) {noformat} with the following data: {noformat} k | c1 | c2 0 | 0 | 0 0 | 0 | 1 0 | 1 | 0 0 | 1 | 1 {noformat} Currently, CQL3 allows slicing over either c1 or c2: {noformat} SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2 {noformat} but you cannot express a query that returns the last 3 records. Indeed, for that you would need a query like, say: {noformat} SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0) {noformat} but we don't support that. This can make it hard to paginate over, say, all records for {{k = 0}} (I'm saying "can" because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do). For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will handle that and is probably the best solution. However, there may be cases where the pagination is, say, user (as in, the user of your application) triggered. I note that one solution would be to add OR support, at least in cases like the one above. That's definitely doable, but on the other hand, we won't be able to support full-blown OR, so it may not be very natural to support seemingly random combinations of OR and not others.
Another solution would be to allow the following syntax: {noformat} SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0) {noformat} which would literally mean that you want records where the values of c1 and c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL stores have that; it's a fairly natural thing to have, imo), but would be much simpler to implement and probably to use too.
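For illustration, the lexicographic ordering that {{(c1, c2) > (0, 0)}} would express can be written out directly. This is just a sketch of the comparison semantics, not CQL's server-side implementation:

```java
public class TupleCompare {
    /** True iff (a1, a2) > (b1, b2) in lexicographic (clustering) order. */
    public static boolean greater(int a1, int a2, int b1, int b2) {
        if (a1 != b1)
            return a1 > b1; // first component decides when it differs
        return a2 > b2;     // otherwise fall through to the second
    }
}
```

Applied to the sample data, (0, 1), (1, 0) and (1, 1) all compare greater than (0, 0) — exactly the three last records the description wants.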
[jira] [Updated] (CASSANDRA-4851) CQL3: improve support for paginating over composites
[ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-4851: -- Reviewer: Aleksey Yeschenko Component/s: API CQL3: improve support for paginating over composites Key: CASSANDRA-4851 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851 Project: Cassandra Issue Type: Improvement Components: API Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Fix For: 2.0.6 Attachments: 4851.txt Consider the following table: {noformat} CREATE TABLE test ( k int, c1 int, c2 int, PRIMARY KEY (k, c1, c2) ) {noformat} with the following data: {noformat} k | c1 | c2 0 | 0 | 0 0 | 0 | 1 0 | 1 | 0 0 | 1 | 1 {noformat} Currently, CQL3 allows slicing over either c1 or c2: {noformat} SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2 SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2 {noformat} but you cannot express a query that returns the last 3 records. Indeed, for that you would need a query like, say: {noformat} SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c1 > 0) {noformat} but we don't support that. This can make it hard to paginate over, say, all records for {{k = 0}} (I'm saying "can" because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do). For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will handle that and is probably the best solution. However, there may be cases where the pagination is, say, user (as in, the user of your application) triggered. I note that one solution would be to add OR support, at least in cases like the one above. That's definitely doable, but on the other hand, we won't be able to support full-blown OR, so it may not be very natural to support seemingly random combinations of OR and not others.
Another solution would be to allow the following syntax: {noformat} SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0) {noformat} which would literally mean that you want records where the values of c1 and c2, taken as a tuple, are lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL stores have that; it's a fairly natural thing to have, imo), but would be much simpler to implement and probably to use too.
[jira] [Commented] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family
[ https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890411#comment-13890411 ] Jonathan Ellis commented on CASSANDRA-6645: --- On the first hunk, should {{indexManager.getIndexForColumn(expression.column_name)}} even be including non-CF indexes? upgradesstables causes NPE for secondary indexes without an underlying column family Key: CASSANDRA-6645 URL: https://issues.apache.org/jira/browse/CASSANDRA-6645 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sergio Bossa Assignee: Sergio Bossa Fix For: 2.0.6 Attachments: CASSANDRA-6645.patch SecondaryIndex#getIndexCfs is allowed to return null by contract, if the index is not backed by a column family, but this causes an NPE as StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do not check for null values. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
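The missing null check described here can be shown in miniature. The interface and names below are illustrative stand-ins for {{SecondaryIndex#getIndexCfs}} and the callers in {{StorageService}}, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

public class IndexCfsFilter {
    /** Minimal stand-in for SecondaryIndex: getIndexCfs() may return null by contract. */
    public interface Index {
        String getIndexCfs();
    }

    /** Collect index-backing CFs, skipping indexes that have none. */
    public static List<String> validColumnFamilies(List<Index> indexes) {
        List<String> cfs = new ArrayList<>();
        for (Index idx : indexes) {
            String cf = idx.getIndexCfs();
            if (cf != null) // without this check, upgradesstables-style iteration NPEs
                cfs.add(cf);
        }
        return cfs;
    }

    public static List<String> demo() {
        Index withoutCf = () -> null;     // index not backed by a column family
        Index withCf = () -> "idx_cf";    // index backed by a column family
        return validColumnFamilies(List.of(withCf, withoutCf));
    }
}
```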
[jira] [Updated] (CASSANDRA-6645) upgradesstables causes NPE for secondary indexes without an underlying column family
[ https://issues.apache.org/jira/browse/CASSANDRA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6645: -- Reviewer: Jonathan Ellis Fix Version/s: 2.0.6 upgradesstables causes NPE for secondary indexes without an underlying column family Key: CASSANDRA-6645 URL: https://issues.apache.org/jira/browse/CASSANDRA-6645 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sergio Bossa Assignee: Sergio Bossa Fix For: 2.0.6 Attachments: CASSANDRA-6645.patch SecondaryIndex#getIndexCfs is allowed to return null by contract, if the index is not backed by a column family, but this causes an NPE as StorageService#getValidColumnFamilies and StorageService#upgradeSSTables do not check for null values. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[3/3] git commit: merge from 2.0
merge from 2.0 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b25a63a8 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b25a63a8 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b25a63a8 Branch: refs/heads/trunk Commit: b25a63a81d22e409e607ca28c39e20604332cb5d Parents: 78f7142 039e9b9 Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 3 23:51:16 2014 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 3 23:51:16 2014 -0600 -- CHANGES.txt | 2 + .../cassandra/config/DatabaseDescriptor.java| 20 ++- .../org/apache/cassandra/io/util/Memory.java| 123 ++- .../cassandra/service/CassandraDaemon.java | 2 +- .../cassandra/utils/FastByteComparisons.java| 6 + 5 files changed, 145 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/b25a63a8/CHANGES.txt -- diff --cc CHANGES.txt index 6a4c507,b1fade1..28278c4 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,33 -1,6 +1,35 @@@ +2.1 + * add listsnapshots command to nodetool (CASSANDRA-5742) + * Introduce AtomicBTreeColumns (CASSANDRA-6271) + * Multithreaded commitlog (CASSANDRA-3578) + * allocate fixed index summary memory pool and resample cold index summaries + to use less memory (CASSANDRA-5519) + * Removed multithreaded compaction (CASSANDRA-6142) + * Parallelize fetching rows for low-cardinality indexes (CASSANDRA-1337) + * change logging from log4j to logback (CASSANDRA-5883) + * switch to LZ4 compression for internode communication (CASSANDRA-5887) + * Stop using Thrift-generated Index* classes internally (CASSANDRA-5971) + * Remove 1.2 network compatibility code (CASSANDRA-5960) + * Remove leveled json manifest migration code (CASSANDRA-5996) + * Remove CFDefinition (CASSANDRA-6253) + * Use AtomicIntegerFieldUpdater in RefCountedMemory (CASSANDRA-6278) + * User-defined types for CQL3 (CASSANDRA-5590) + * Use of o.a.c.metrics in nodetool (CASSANDRA-5871, 6406) + * Batch 
read from OTC's queue and cleanup (CASSANDRA-1632) + * Secondary index support for collections (CASSANDRA-4511, 6383) + * SSTable metadata(Stats.db) format change (CASSANDRA-6356) + * Push composites support in the storage engine + (CASSANDRA-5417, CASSANDRA-6520) + * Add snapshot space used to cfstats (CASSANDRA-6231) + * Add cardinality estimator for key count estimation (CASSANDRA-5906) + * CF id is changed to be non-deterministic. Data dir/key cache are created + uniquely for CF id (CASSANDRA-5202) + * New counters implementation (CASSANDRA-6504) + + 2.0.6 + * Fix direct Memory on architectures that do not support unaligned long access +(CASSANDRA-6628) * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) http://git-wip-us.apache.org/repos/asf/cassandra/blob/b25a63a8/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --cc src/java/org/apache/cassandra/config/DatabaseDescriptor.java index 2793237,bd5db69..378fa8a --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@@ -177,9 -177,9 +177,9 @@@ public class DatabaseDescripto /* evaluate the DiskAccessMode Config directive, which also affects indexAccessMode selection */ if (conf.disk_access_mode == Config.DiskAccessMode.auto) { - conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; + conf.disk_access_mode = hasLargeAddressSpace() ?
Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; indexAccessMode = conf.disk_access_mode; -logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode ); +logger.info("DiskAccessMode 'auto' determined to be {}, indexAccessMode is {}", conf.disk_access_mode, indexAccessMode); } else if (conf.disk_access_mode == Config.DiskAccessMode.mmap_index_only) { @@@ -1384,8 -1324,19 +1384,24 @@@ } } +public static int getIndexSummaryResizeIntervalInMinutes() +{ +return conf.index_summary_resize_interval_in_minutes; +} ++ + public static boolean hasLargeAddressSpace() + { + // currently we just check if it's a 64bit arch, but really we only care if the address space is large + String datamodel = System.getProperty("sun.arch.data.model"); + if (datamodel != null) + { + switch (datamodel) +
[2/3] git commit: Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628
Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/039e9b9a Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/039e9b9a Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/039e9b9a Branch: refs/heads/trunk Commit: 039e9b9a18cbe78091231a4538b6d428deacc771 Parents: 066d00b Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 3 23:50:08 2014 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 3 23:50:08 2014 -0600 -- CHANGES.txt | 2 + .../cassandra/config/DatabaseDescriptor.java| 20 ++- .../org/apache/cassandra/io/util/Memory.java| 123 ++- .../cassandra/service/CassandraDaemon.java | 2 +- .../cassandra/utils/FastByteComparisons.java| 6 + 5 files changed, 145 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 4440942..b1fade1 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,6 @@ 2.0.6 + * Fix direct Memory on architectures that do not support unaligned long access + (CASSANDRA-6628) * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java index 44d9d3a..bd5db69 100644 --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@ -172,12 +172,12 @@ public class DatabaseDescriptor } if (conf.commitlog_total_space_in_mb == null) -conf.commitlog_total_space_in_mb = System.getProperty("os.arch").contains("64") ?
1024 : 32; +conf.commitlog_total_space_in_mb = hasLargeAddressSpace() ? 1024 : 32; /* evaluate the DiskAccessMode Config directive, which also affects indexAccessMode selection */ if (conf.disk_access_mode == Config.DiskAccessMode.auto) { -conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; +conf.disk_access_mode = hasLargeAddressSpace() ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; indexAccessMode = conf.disk_access_mode; logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode ); } @@ -1323,4 +1323,20 @@ public class DatabaseDescriptor throw new RuntimeException(e); } } + +public static boolean hasLargeAddressSpace() +{ +// currently we just check if it's a 64bit arch, but really we only care if the address space is large +String datamodel = System.getProperty("sun.arch.data.model"); +if (datamodel != null) +{ +switch (datamodel) +{ +case "64": return true; +case "32": return false; +} +} +String arch = System.getProperty("os.arch"); +return arch.contains("64") || arch.contains("sparcv9"); +} } http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/io/util/Memory.java -- diff --git a/src/java/org/apache/cassandra/io/util/Memory.java b/src/java/org/apache/cassandra/io/util/Memory.java index f276190..263205b 100644 --- a/src/java/org/apache/cassandra/io/util/Memory.java +++ b/src/java/org/apache/cassandra/io/util/Memory.java @@ -17,9 +17,10 @@ */ package org.apache.cassandra.io.util; -import sun.misc.Unsafe; +import java.nio.ByteOrder; import org.apache.cassandra.config.DatabaseDescriptor; +import sun.misc.Unsafe; /** * An off-heap region of memory that must be manually free'd when no longer needed.
@@ -30,6 +31,16 @@ public class Memory private static final IAllocator allocator = DatabaseDescriptor.getoffHeapMemoryAllocator(); private static final long BYTE_ARRAY_BASE_OFFSET = unsafe.arrayBaseOffset(byte[].class); +private static final boolean bigEndian = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN); +private static final boolean unaligned; + +static +{ +String arch = System.getProperty("os.arch"); +unaligned =
[1/3] git commit: Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628
Updated Branches: refs/heads/cassandra-2.0 066d00ba5 - 039e9b9a1 refs/heads/trunk 78f71420c - b25a63a81 Fix direct Memory on architectures that do not support unaligned long access patch by Dmitry Shohov and Benedict Elliott Smith for CASSANDRA-6628 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/039e9b9a Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/039e9b9a Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/039e9b9a Branch: refs/heads/cassandra-2.0 Commit: 039e9b9a18cbe78091231a4538b6d428deacc771 Parents: 066d00b Author: Jonathan Ellis jbel...@apache.org Authored: Mon Feb 3 23:50:08 2014 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Mon Feb 3 23:50:08 2014 -0600 -- CHANGES.txt | 2 + .../cassandra/config/DatabaseDescriptor.java| 20 ++- .../org/apache/cassandra/io/util/Memory.java| 123 ++- .../cassandra/service/CassandraDaemon.java | 2 +- .../cassandra/utils/FastByteComparisons.java| 6 + 5 files changed, 145 insertions(+), 8 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 4440942..b1fade1 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,6 @@ 2.0.6 + * Fix direct Memory on architectures that do not support unaligned long access + (CASSANDRA-6628) * Let scrub optionally skip broken counter partitions (CASSANDRA-5930) http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java -- diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java index 44d9d3a..bd5db69 100644 --- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java +++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java @@ -172,12 +172,12 @@ public class DatabaseDescriptor } if (conf.commitlog_total_space_in_mb == null) 
-conf.commitlog_total_space_in_mb = System.getProperty("os.arch").contains("64") ? 1024 : 32; +conf.commitlog_total_space_in_mb = hasLargeAddressSpace() ? 1024 : 32; /* evaluate the DiskAccessMode Config directive, which also affects indexAccessMode selection */ if (conf.disk_access_mode == Config.DiskAccessMode.auto) { -conf.disk_access_mode = System.getProperty("os.arch").contains("64") ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; +conf.disk_access_mode = hasLargeAddressSpace() ? Config.DiskAccessMode.mmap : Config.DiskAccessMode.standard; indexAccessMode = conf.disk_access_mode; logger.info("DiskAccessMode 'auto' determined to be " + conf.disk_access_mode + ", indexAccessMode is " + indexAccessMode ); } @@ -1323,4 +1323,20 @@ public class DatabaseDescriptor throw new RuntimeException(e); } } + +public static boolean hasLargeAddressSpace() +{ +// currently we just check if it's a 64bit arch, but really we only care if the address space is large +String datamodel = System.getProperty("sun.arch.data.model"); +if (datamodel != null) +{ +switch (datamodel) +{ +case "64": return true; +case "32": return false; +} +} +String arch = System.getProperty("os.arch"); +return arch.contains("64") || arch.contains("sparcv9"); +} } http://git-wip-us.apache.org/repos/asf/cassandra/blob/039e9b9a/src/java/org/apache/cassandra/io/util/Memory.java -- diff --git a/src/java/org/apache/cassandra/io/util/Memory.java b/src/java/org/apache/cassandra/io/util/Memory.java index f276190..263205b 100644 --- a/src/java/org/apache/cassandra/io/util/Memory.java +++ b/src/java/org/apache/cassandra/io/util/Memory.java @@ -17,9 +17,10 @@ */ package org.apache.cassandra.io.util; -import sun.misc.Unsafe; +import java.nio.ByteOrder; import org.apache.cassandra.config.DatabaseDescriptor; +import sun.misc.Unsafe; /** * An off-heap region of memory that must be manually free'd when no longer needed.
@@ -30,6 +31,16 @@ public class Memory private static final IAllocator allocator = DatabaseDescriptor.getoffHeapMemoryAllocator(); private static final long BYTE_ARRAY_BASE_OFFSET = unsafe.arrayBaseOffset(byte[].class); +private static final boolean bigEndian = ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN); +private static final
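For reference, the address-space detection introduced by the patch above can be restated as a self-contained, compilable sketch. It is parameterized for testability; the real method reads the {{sun.arch.data.model}} and {{os.arch}} system properties directly:

```java
public class AddressSpace {
    /** Parameterized version of the patch's 64-bit check, for illustration. */
    public static boolean hasLargeAddressSpace(String datamodel, String arch) {
        // sun.arch.data.model, when present, states the data-model width directly
        if (datamodel != null) {
            switch (datamodel) {
                case "64": return true;
                case "32": return false;
            }
        }
        // otherwise fall back to os.arch heuristics
        return arch.contains("64") || arch.contains("sparcv9");
    }
}
```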
[jira] [Commented] (CASSANDRA-6572) Workload recording / playback
[ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890421#comment-13890421 ] Lyuben Todorov commented on CASSANDRA-6572: --- I thought it would make more sense to have this kind of functionality in StorageProxy, but keeping it simple by coding only in QP makes sense, and it will be better if someone else wishes to extend the functionality of the workload recording. Workload recording / playback - Key: CASSANDRA-6572 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Lyuben Todorov Fix For: 2.0.6 Write sample mode gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well.
[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890422#comment-13890422 ] Jonathan Ellis commented on CASSANDRA-5263: --- Why don't we just pick the depth that covers the appropriate number of partitions/hashes? 2**16 = 64K, 2**17 = 128K, etc. Allow Merkle tree maximum depth to be configurable -- Key: CASSANDRA-5263 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263 Project: Cassandra Issue Type: Improvement Components: Config Affects Versions: 1.1.9 Reporter: Ahmed Bashir Assignee: Minh Do Currently, the maximum depth allowed for Merkle trees is hardcoded as 15. This value should be configurable, just like phi_convict_threshold and other properties. Given a cluster with nodes responsible for a large number of row keys, Merkle tree comparisons can result in a large amount of unnecessary row keys being streamed. Empirical testing indicates that reasonable changes to this depth (18, 20, etc) don't affect the Merkle tree generation and differencing timings all that much, and they can significantly reduce the amount of data being streamed during repair.
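Jonathan's suggestion amounts to choosing the smallest depth whose leaf count (2^depth) covers the estimated number of partitions, still subject to a cap. A minimal sketch, with a hypothetical helper name:

```java
public class MerkleDepth {
    /** Smallest depth whose 2^depth leaves cover the partition estimate, capped at maxDepth. */
    public static int depthFor(long estimatedPartitions, int maxDepth) {
        int depth = 0;
        while (depth < maxDepth && (1L << depth) < estimatedPartitions)
            depth++; // 2**16 = 64K, 2**17 = 128K, ...
        return depth;
    }
}
```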
[jira] [Updated] (CASSANDRA-6472) Node hangs when Drop Keyspace / Table is executed
[ https://issues.apache.org/jira/browse/CASSANDRA-6472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6472: -- Priority: Minor (was: Major) Assignee: Mikhail Stepura (was: Benedict) [~mishail] can you shed any light on the cqlsh hang? Node hangs when Drop Keyspace / Table is executed - Key: CASSANDRA-6472 URL: https://issues.apache.org/jira/browse/CASSANDRA-6472 Project: Cassandra Issue Type: Bug Components: Core Reporter: amorton Assignee: Mikhail Stepura Priority: Minor Fix For: 2.1 from http://www.mail-archive.com/user@cassandra.apache.org/msg33566.html CommitLogSegmentManager.flushDataFrom() returns a FutureTask to wait on the flushes, but the task is not started in flushDataFrom(). The CLSM manager thread does not use the result, and forceRecycleAll (eventually called when making schema mods) does not start it, so it hangs when calling get(). Plan to patch so flushDataFrom() returns a Future.
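The hang is easy to reproduce in miniature: a {{FutureTask}} that is constructed but never run will block forever in {{get()}}. The sketch below is illustrative, not the actual CommitLogSegmentManager code:

```java
import java.util.concurrent.FutureTask;

public class FlushFutureSketch {
    /** Mirrors the bug: the FutureTask is returned but never started. */
    public static FutureTask<String> flushDataFrom() {
        return new FutureTask<>(() -> "flushed");
    }

    /** The fix in miniature: make sure the task runs, so get() can complete. */
    public static boolean flushAndWait() {
        FutureTask<String> task = flushDataFrom();
        task.run(); // without this, task.get() blocks forever
        return task.isDone();
    }
}
```

A caller that does `flushDataFrom().get()` without anyone invoking `run()` reproduces exactly the forceRecycleAll hang described above.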
[jira] [Resolved] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-5549. --- Resolution: Fixed Reviewer: Jonathan Ellis Remove Table.switchLock --- Key: CASSANDRA-5549 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 Project: Cassandra Issue Type: Bug Reporter: Jonathan Ellis Assignee: Benedict Labels: performance Fix For: 2.1 Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path. ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers and writers of the lock (in Cassandra, memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-5549) Remove Table.switchLock
[ https://issues.apache.org/jira/browse/CASSANDRA-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-5549: -- Issue Type: Improvement (was: Bug) Remove Table.switchLock --- Key: CASSANDRA-5549 URL: https://issues.apache.org/jira/browse/CASSANDRA-5549 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Benedict Labels: performance Fix For: 2.1 Attachments: 5549-removed-switchlock.png, 5549-sunnyvale.png As discussed in CASSANDRA-5422, Table.switchLock is a bottleneck on the write path. ReentrantReadWriteLock is not lightweight, even if there is no contention per se between readers and writers of the lock (in Cassandra, memtable updates and switches). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (CASSANDRA-6594) CqlRecordWriter marked final
[ https://issues.apache.org/jira/browse/CASSANDRA-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-6594. --- Resolution: Fixed Fix Version/s: 2.1 Reviewer: Jonathan Ellis Assignee: Luca Rosellini SGTM; committed CqlRecordWriter marked final Key: CASSANDRA-6594 URL: https://issues.apache.org/jira/browse/CASSANDRA-6594 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Luca Rosellini Assignee: Luca Rosellini Labels: CQL3, HADOOP Fix For: 2.1 Attachments: CqlRecordWriter.diff We have a use case in which we need a custom implementation of CqlRecordWriter. It would be nice to have an extensible version of it (it would save us the pain of replicating upstream changes). See attached patch.
git commit: make CqlRecordWriter extensible
Updated Branches: refs/heads/trunk b25a63a81 -> 0842681e2 make CqlRecordWriter extensible Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/0842681e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/0842681e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/0842681e Branch: refs/heads/trunk Commit: 0842681e214229b5d83283574911d70b5b050586 Parents: b25a63a Author: Jonathan Ellis jbel...@apache.org Authored: Tue Feb 4 00:07:30 2014 -0600 Committer: Jonathan Ellis jbel...@apache.org Committed: Tue Feb 4 00:07:30 2014 -0600 -- .../apache/cassandra/hadoop/cql3/CqlRecordWriter.java | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/0842681e/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java -- diff --git a/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java b/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java index 27d1c70..e354ad6 100644 --- a/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java +++ b/src/java/org/apache/cassandra/hadoop/cql3/CqlRecordWriter.java @@ -59,21 +59,21 @@ import org.apache.thrift.transport.TTransport; * * @see CqlOutputFormat */ -final class CqlRecordWriter extends AbstractColumnFamilyRecordWriter<Map<String, ByteBuffer>, List<ByteBuffer>> +class CqlRecordWriter extends AbstractColumnFamilyRecordWriter<Map<String, ByteBuffer>, List<ByteBuffer>> { private static final Logger logger = LoggerFactory.getLogger(CqlRecordWriter.class); // handles for clients for each range running in the threadpool -private final Map<Range, RangeClient> clients; +protected final Map<Range, RangeClient> clients; // host to prepared statement id mappings -private ConcurrentHashMap<Cassandra.Client, Integer> preparedStatements = new ConcurrentHashMap<Cassandra.Client, Integer>(); +protected final ConcurrentHashMap<Cassandra.Client, Integer> preparedStatements = new
ConcurrentHashMap<Cassandra.Client, Integer>(); -private final String cql; +protected final String cql; -private AbstractType<?> keyValidator; -private String [] partitionKeyColumns; -private List<String> clusterColumns; +protected AbstractType<?> keyValidator; +protected String [] partitionKeyColumns; +protected List<String> clusterColumns; /** * Upon construction, obtain the map that this writer will use to collect
[jira] [Commented] (CASSANDRA-6157) Selectively Disable hinted handoff for a data center
[ https://issues.apache.org/jira/browse/CASSANDRA-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890427#comment-13890427 ] Lyuben Todorov commented on CASSANDRA-6157: --- [~kohlisankalp] Still working on this? Selectively Disable hinted handoff for a data center Key: CASSANDRA-6157 URL: https://issues.apache.org/jira/browse/CASSANDRA-6157 Project: Cassandra Issue Type: Improvement Components: Core Reporter: sankalp kohli Assignee: sankalp kohli Priority: Minor Fix For: 2.0.6 Attachments: trunk-6157-v2.diff, trunk-6157-v3.diff, trunk-6157-v4.diff, trunk-6157.txt Cassandra supports disabling hints or reducing the window for hints. It would be helpful to have a switch which stops hints to a down data center but continues hints to other DCs. This is helpful during data center failover, as hints would put more unnecessary pressure on the DC taking double traffic. Also, since Cassandra is now under reduced redundancy, we don't want to disable hints within the DC.
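The requested switch boils down to consulting a per-DC disable list before writing a hint. A minimal sketch with illustrative names, not the patch's actual API:

```java
import java.util.Set;

public class DcHintFilter {
    /** Hint a replica only if hinted handoff is not disabled for its data center. */
    public static boolean shouldHint(Set<String> disabledDcs, String targetDc) {
        return !disabledDcs.contains(targetDc);
    }
}
```

With the failed DC in the disabled set, local-DC replicas keep receiving hints while the DC taking over double traffic is spared.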
[jira] [Updated] (CASSANDRA-6568) sstables incorrectly getting marked as not live
[ https://issues.apache.org/jira/browse/CASSANDRA-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6568: -- Fix Version/s: 2.0.6 1.2.15 sstables incorrectly getting marked as not live - Key: CASSANDRA-6568 URL: https://issues.apache.org/jira/browse/CASSANDRA-6568 Project: Cassandra Issue Type: Bug Components: Core Environment: 1.2.12 with several 1.2.13 patches Reporter: Chris Burroughs Assignee: Marcus Eriksson Fix For: 1.2.15, 2.0.6 {noformat} -rw-rw-r-- 14 cassandra cassandra 1.4G Nov 25 19:46 /data/sstables/data/ks/cf/ks-cf-ic-402383-Data.db -rw-rw-r-- 14 cassandra cassandra 13G Nov 26 00:04 /data/sstables/data/ks/cf/ks-cf-ic-402430-Data.db -rw-rw-r-- 14 cassandra cassandra 13G Nov 26 05:03 /data/sstables/data/ks/cf/ks-cf-ic-405231-Data.db -rw-rw-r-- 31 cassandra cassandra 21G Nov 26 08:38 /data/sstables/data/ks/cf/ks-cf-ic-405232-Data.db -rw-rw-r-- 2 cassandra cassandra 2.6G Dec 3 13:44 /data/sstables/data/ks/cf/ks-cf-ic-434662-Data.db -rw-rw-r-- 14 cassandra cassandra 1.5G Dec 5 09:05 /data/sstables/data/ks/cf/ks-cf-ic-438698-Data.db -rw-rw-r-- 2 cassandra cassandra 3.1G Dec 6 12:10 /data/sstables/data/ks/cf/ks-cf-ic-440983-Data.db -rw-rw-r-- 2 cassandra cassandra 96M Dec 8 01:52 /data/sstables/data/ks/cf/ks-cf-ic-444041-Data.db -rw-rw-r-- 2 cassandra cassandra 3.3G Dec 9 16:37 /data/sstables/data/ks/cf/ks-cf-ic-451116-Data.db -rw-rw-r-- 2 cassandra cassandra 876M Dec 10 11:23 /data/sstables/data/ks/cf/ks-cf-ic-453552-Data.db -rw-rw-r-- 2 cassandra cassandra 891M Dec 11 03:21 /data/sstables/data/ks/cf/ks-cf-ic-454518-Data.db -rw-rw-r-- 2 cassandra cassandra 102M Dec 11 12:27 /data/sstables/data/ks/cf/ks-cf-ic-455429-Data.db -rw-rw-r-- 2 cassandra cassandra 906M Dec 11 23:54 /data/sstables/data/ks/cf/ks-cf-ic-455533-Data.db -rw-rw-r-- 1 cassandra cassandra 214M Dec 12 05:02 /data/sstables/data/ks/cf/ks-cf-ic-456426-Data.db -rw-rw-r-- 1 cassandra cassandra 203M Dec 12 10:49 
/data/sstables/data/ks/cf/ks-cf-ic-456879-Data.db -rw-rw-r-- 1 cassandra cassandra 49M Dec 12 12:03 /data/sstables/data/ks/cf/ks-cf-ic-456963-Data.db -rw-rw-r-- 18 cassandra cassandra 20G Dec 25 01:09 /data/sstables/data/ks/cf/ks-cf-ic-507770-Data.db -rw-rw-r-- 3 cassandra cassandra 12G Jan 8 04:22 /data/sstables/data/ks/cf/ks-cf-ic-567100-Data.db -rw-rw-r-- 3 cassandra cassandra 957M Jan 8 22:51 /data/sstables/data/ks/cf/ks-cf-ic-569015-Data.db -rw-rw-r-- 2 cassandra cassandra 923M Jan 9 17:04 /data/sstables/data/ks/cf/ks-cf-ic-571303-Data.db -rw-rw-r-- 1 cassandra cassandra 821M Jan 10 08:20 /data/sstables/data/ks/cf/ks-cf-ic-574642-Data.db -rw-rw-r-- 1 cassandra cassandra 18M Jan 10 08:48 /data/sstables/data/ks/cf/ks-cf-ic-574723-Data.db {noformat} I tried to do a user-defined compaction on sstables from November and got "it is not an active sstable". The live sstable count from JMX was about 7, while on disk there were over 20; live vs. total size showed about a 50 GiB difference. Forcing a gc from jconsole had no effect. However, restarting the node resulted in live sstables/bytes *increasing* to match what was on disk, and user compaction could then compact the November sstables. This cluster was last restarted in mid December. I'm not sure what effect "not live" had on other operations of the cluster. From the logs it seems that the files were sent at least at some point as part of repair, but I don't know if they were being used for read requests or not. Because the problem that got me looking in the first place was poor performance, I suspect they were used for reads (and the reads were slow because so many sstables were being read). I presume, based on their age at the least, that they were being excluded from compaction. I'm not aware of any isLive() or getRefCount() to programmatically confirm which nodes have this problem.
In this cluster almost all columns have a 14-day TTL; based on the number of nodes with November sstables, it appears to be occurring on a significant fraction of the nodes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6557) CommitLogSegment may be duplicated in unlikely race scenario
[ https://issues.apache.org/jira/browse/CASSANDRA-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-6557: -- Priority: Minor (was: Major) CommitLogSegment may be duplicated in unlikely race scenario Key: CASSANDRA-6557 URL: https://issues.apache.org/jira/browse/CASSANDRA-6557 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.1 Reporter: Benedict Priority: Minor Fix For: 2.1 In the unlikely event that the thread that switched to a new CLS has not finished executing the cleanup of its switch by the time the CLS has finished being used, it is possible for the same segment to be 'switched' in again. This would be benign except that it is added to the activeSegments queue a second time also, which would permit it to be recycled twice, creating two different CLS objects in memory pointing to the same CLS on disk, after which all bets are off. The issue is highly unlikely to occur, but highly unlikely means it will probably happen eventually. I've fixed this based on my patch for CASSANDRA-5549, using the NonBlockingQueue I introduce there to simplify the logic and make it more obviously correct. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
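One way to make the double-switch race benign is to guard recycling with a compare-and-set, so that of two racing threads only one can enqueue the segment for reuse. This is a sketch of the general technique only, not Benedict's actual NonBlockingQueue-based patch; `Segment` and its members are illustrative stand-ins.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: a CAS flag ensures a segment is recycled exactly once,
// so the duplicate "switch" described in the ticket becomes a no-op instead
// of putting the same segment on the active queue twice.
public class Segment
{
    private final AtomicBoolean recycled = new AtomicBoolean(false);
    final AtomicInteger recycles = new AtomicInteger(); // stands in for activeSegments.add(this)

    public boolean tryRecycle()
    {
        // Only the thread that wins the CAS performs the recycle.
        if (!recycled.compareAndSet(false, true))
            return false; // lost the race: another thread already recycled this segment
        recycles.incrementAndGet();
        return true;
    }

    public static void main(String[] args) throws InterruptedException
    {
        Segment seg = new Segment();
        Thread t1 = new Thread(seg::tryRecycle);
        Thread t2 = new Thread(seg::tryRecycle);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(seg.recycles.get()); // always 1, never 2
    }
}
```

Without the flag, both threads could add the segment, producing the two-in-memory-objects-one-on-disk state the ticket warns about.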
[jira] [Commented] (CASSANDRA-6572) Workload recording / playback
[ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890436#comment-13890436 ] Jonathan Ellis commented on CASSANDRA-6572: --- Do you have any strong feelings here [~iamaleksey] on the QP/SP divide? Workload recording / playback - Key: CASSANDRA-6572 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Lyuben Todorov Fix For: 2.0.6 Write sample mode gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890438#comment-13890438 ] Yuki Morishita commented on CASSANDRA-5351: --- bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. That's what I meant. Current major compaction produces one SSTable, and I think changing that behavior might confuse users. My opinion is to keep it as is. Additional review comments: * Does PrepareMessage need to carry around dataCenters? Only the coordinator sends out messages, so I think you can drop it (also from ParentRepairSession). * Using the CF ID is preferred over the keyspace name/CF name pair. * PrepareMessage is sent per CF, but that can produce a lot of round trips. Isn't one message per replica node enough? * I think we need cleanup for parentRepairSessions when something bad happens; otherwise the ParentRepairSession in the map keeps references to SSTables. I just worked on the first one above and the commit is here (on top of your branch): https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361 Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired.
So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890438#comment-13890438 ] Yuki Morishita edited comment on CASSANDRA-5351 at 2/4/14 6:20 AM: --- bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. That's what I meant. Current major compaction produces one SSTable and I think changing that behavior would confuse users, maybe. My opinion is to keep it as is . Additional review comments: * Does PrepareMessage needs to carry around dataCenters? Only coordinator sends out messages so I think you can drop it(also from ParentRepairSession). * CF ID is preferred to use over Keyspace name/CF name pair. * PrepareMessage is sent per CF but it can produce a lot of round trip. Isn't one message per replica node enough? * I think we need clean up for parentRepairSessions when something bad happened. Otherwise ParentRepairSession in the map keep reference to SSTables. I just worked on the first one above and the commit is here(on top of your branch): https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361 was (Author: yukim): bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. That's what I meant. Current major compaction produces one SSTable and I think changing that behavior would confuse users, maybe. My opinion is to keep it as is, but . Additional review comments: * Does PrepareMessage needs to carry around dataCenters? Only coordinator sends out messages so I think you can drop it(also from ParentRepairSession). * CF ID is preferred to use over Keyspace name/CF name pair. * PrepareMessage is sent per CF but it can produce a lot of round trip. Isn't one message per replica node enough? * I think we need clean up for parentRepairSessions when something bad happened. 
Otherwise ParentRepairSession in the map keep reference to SSTables. I just worked on the first one above and the commit is here(on top of your branch): https://github.com/yukim/cassandra/commit/7c65e532dd69f9f4c1ea2d3fdf0401ed70291361 Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5351) Avoid repairing already-repaired data by default
[ https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890447#comment-13890447 ] Jonathan Ellis commented on CASSANDRA-5351: --- bq. Dropping sstable to UNREPAIRED during major compaction means that all repaired data status is cleared for the node. Maybe we could make major compaction do 2 separate compactions? Ending up with 2 sstables should be fine for users right? I think this is a better approach than stomping on the repair information. People major compact to free up disk space or improve read performance; either way, having a small amount of data in an unrepaired sandbox should be acceptable. (If I am wrong, we can add a utility to clear repaired flags, or add a flag to compact to treat everything as unrepaired... but I'd rather not add this complexity unless we see a clear demand for it.) Avoid repairing already-repaired data by default Key: CASSANDRA-5351 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Lyuben Todorov Labels: repair Fix For: 2.1 Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 5351_nodetool.log Repair has always built its merkle tree from all the data in a columnfamily, which is guaranteed to work but is inefficient. We can improve this by remembering which sstables have already been successfully repaired, and only repairing sstables new since the last repair. (This automatically makes CASSANDRA-3362 much less of a problem too.) The tricky part is, compaction will (if not taught otherwise) mix repaired data together with non-repaired. So we should segregate unrepaired sstables from the repaired ones. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
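The "two separate compactions" idea from the comment above reduces to a partition step before compacting: split the candidates by repaired status and compact each group on its own, so a major compaction yields up to two sstables instead of stomping on repair state. A minimal sketch, assuming sstable metadata records a repairedAt timestamp with 0 meaning unrepaired, as this ticket proposes; `SSTable` and `splitByRepairStatus` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

public class MajorCompactionSplit
{
    // Hypothetical stand-in for an sstable with repair metadata.
    public static class SSTable
    {
        final long repairedAt; // 0 means unrepaired
        public SSTable(long repairedAt) { this.repairedAt = repairedAt; }
        public boolean isRepaired() { return repairedAt != 0; }
    }

    // Returns {repaired, unrepaired}; each group would then be compacted
    // independently, keeping repaired data segregated from unrepaired.
    public static List<List<SSTable>> splitByRepairStatus(List<SSTable> input)
    {
        List<SSTable> repaired = new ArrayList<>();
        List<SSTable> unrepaired = new ArrayList<>();
        for (SSTable s : input)
            (s.isRepaired() ? repaired : unrepaired).add(s);
        List<List<SSTable>> groups = new ArrayList<>();
        groups.add(repaired);
        groups.add(unrepaired);
        return groups;
    }
}
```

As Jonathan notes, ending up with a small unrepaired sstable alongside the repaired one should be acceptable for the disk-space and read-performance motivations behind major compaction.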
[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890449#comment-13890449 ] Minh Do commented on CASSANDRA-5263: If I understand correctly, are you saying that if N is the total number of rows in all SSTables on a node for a given token range, then depth = log N with log base 2? This works if a node does not hold too many rows. Can we safely assume that a node does not hold more than 2^24 rows (or 16.7M rows)? For that many rows we would need to build a Merkle tree of depth 24, which requires about 1.6G of heap. Beyond this number, I would say we run into heap allocation issues. I was thinking earlier that depth 20 was the maximum allowable depth, and I worked my way down from there to compute lower-depth trees. Allow Merkle tree maximum depth to be configurable -- Key: CASSANDRA-5263 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263 Project: Cassandra Issue Type: Improvement Components: Config Affects Versions: 1.1.9 Reporter: Ahmed Bashir Assignee: Minh Do Currently, the maximum depth allowed for Merkle trees is hardcoded as 15. This value should be configurable, just like phi_convict_threshold and other properties. Given a cluster with nodes responsible for a large number of row keys, Merkle tree comparisons can result in a large amount of unnecessary row keys being streamed. Empirical testing indicates that reasonable changes to this depth (18, 20, etc) don't affect the Merkle tree generation and differencing timings all that much, and they can significantly reduce the amount of data being streamed during repair. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5263) Allow Merkle tree maximum depth to be configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13890451#comment-13890451 ] Jonathan Ellis commented on CASSANDRA-5263: --- No, we can't assume that, but capping it at 20 is certainly better than capping it at 16 as it does now. Allow Merkle tree maximum depth to be configurable -- Key: CASSANDRA-5263 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263 Project: Cassandra Issue Type: Improvement Components: Config Affects Versions: 1.1.9 Reporter: Ahmed Bashir Assignee: Minh Do Currently, the maximum depth allowed for Merkle trees is hardcoded as 15. This value should be configurable, just like phi_convict_threshold and other properties. Given a cluster with nodes responsible for a large number of row keys, Merkle tree comparisons can result in a large amount of unnecessary row keys being streamed. Empirical testing indicates that reasonable changes to this depth (18, 20, etc) don't affect the Merkle tree generation and differencing timings all that much, and they can significantly reduce the amount of data being streamed during repair. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
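The heap figure in Minh Do's comment checks out on a back-of-envelope basis: a full binary hash tree of depth d has 2^(d+1) - 1 nodes, and at roughly 48 bytes per tree node (an assumed figure, not a measured one), depth 24 lands near 1.6 GB. A quick sketch of the arithmetic:

```java
public class MerkleSize
{
    // A full binary tree of the given depth has 2^(depth+1) - 1 nodes.
    public static long nodeCount(int depth)
    {
        return (1L << (depth + 1)) - 1;
    }

    // Rough heap estimate; bytesPerNode (~48) is an assumption covering the
    // object header, child references, and hash payload.
    public static long estimateBytes(int depth, long bytesPerNode)
    {
        return nodeCount(depth) * bytesPerNode;
    }

    // depth = ceil(log2(rows)): one leaf per row.
    public static int depthForRows(long rows)
    {
        return 64 - Long.numberOfLeadingZeros(rows - 1);
    }

    public static void main(String[] args)
    {
        System.out.println(estimateBytes(24, 48)); // ~1.6e9 bytes for depth 24
        System.out.println(depthForRows(1L << 24)); // 2^24 rows -> depth 24
    }
}
```

The same arithmetic shows why depth 20 is a far safer cap: 2^21 - 1 nodes at 48 bytes is only about 100 MB.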