[jira] [Commented] (CASSANDRA-3674) add nodetool explicitgc
[ https://issues.apache.org/jira/browse/CASSANDRA-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176110#comment-13176110 ] Peter Schuller commented on CASSANDRA-3674:
---
My argument is that for casual and/or new users, they are running Cassandra, not the JVM, and they are not JVM experts (also, along those lines the heap usage printout of 'nodetool info' is duplication, *and* even misleading because it encourages people to interpret it in ways that don't really reflect reality).

I see what you're saying; I just think it's a convenient thing to have. It's not that *I* want it, but I'd like to be able to tell people to use it and assume they have it available out of the box. Not too fussed about it though :)

add nodetool explicitgc --- Key: CASSANDRA-3674 URL: https://issues.apache.org/jira/browse/CASSANDRA-3674 Project: Cassandra Issue Type: Improvement Reporter: Peter Schuller Assignee: Peter Schuller Priority: Minor Attachments: CASSANDRA-3674-trunk.txt

So that you can easily ask people to run nodetool explicitgc and paste the results. I'll file a separate JIRA suggesting that we ship with -XX:+ExplicitGCInvokesConcurrent by default.
[jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176126#comment-13176126 ] Pavel Yaskevich commented on CASSANDRA-3623:
CASSANDRA-3610 needs a rebase to be applied on the latest trunk. Also, I took a look at the doc you have attached and it looks like the test for 1 is broken, because the stress command line shows that you used -S 3000 instead of 1.

{noformat}
Compressed Reads: *10,000* columnSize:
[vijay_tcasstest@vijay_tcass--1a-i-2801d94a ~]$ java -Xms2G -Xmx2G -Xmn1G -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -jar Stress.jar -p 7102 -d 10.87.81.75 -n 50 -S *3000* -I SnappyCompressor -o read
{noformat}

Also, as I mentioned before, you test on different nodes of a working cluster, so there are side factors that could be affecting the test results. Can you please explain why testing performance on a working cluster is a good idea?

use MMapedBuffer in CompressedSegmentedFile.getSegment -- Key: CASSANDRA-3623 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.1 Reporter: Vijay Assignee: Vijay Labels: compression Fix For: 1.1 Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 0001-MMaped-Compression-segmented-file-v3.patch, 0001-MMaped-Compression-segmented-file.patch, 0002-tests-for-MMaped-Compression-segmented-file-v2.patch, 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx, MMappedIO-Performance.docx

CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem to use the MMap, hence the higher CPU on the nodes and higher latencies on reads. This ticket is to implement the TODO mentioned in CompressedRandomAccessReader:

// TODO refactor this to separate concept of buffer to avoid lots of read() syscalls and compression buffer

but I think a separate class for the Buffer will be better.
[jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176221#comment-13176221 ] Vijay commented on CASSANDRA-3623:
--
bq. Also I took a look at the doc you have attached and it looks like test for 1 is broken because stress command line shows that you use -S 3000 instead of 1.

I will fix it.

bq. Also as I mentioned before - you test on the different nodes on the working cluster, there are side factors that could be affecting test results. Can you please explain why testing performance on the working cluster is a good idea?

How do you know it is a working cluster? They are individual machines, isolated without any network access to any other machine. There isn't anything being shared between those machines (they are VMs on different servers than any whose results I have ever published). I created this test setup separately just to have a clean environment with a cold cache (the other option is to reset the mmap, which I don't want to do). I know you have your doubts, but I am not that bad ;)
[jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176226#comment-13176226 ] Pavel Yaskevich commented on CASSANDRA-3623:
I ask because you mentioned previously that you had done tests on a 12 node cluster. Testing results on the cloud depend on your neighbours, which is why I/O can differ dramatically, as it does in your tests. Let's settle with CASSANDRA-3611 (and CASSANDRA-3610) and I will test it again on a real machine.
[jira] [Updated] (CASSANDRA-3673) Allow reduced consistency in sstableloader utility
[ https://issues.apache.org/jira/browse/CASSANDRA-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-3673:
--
Component/s: (was: Core) Tools
Priority: Minor (was: Major)
Affects Version/s: (was: 1.0.6) (was: 0.8.0)
Fix Version/s: (was: 1.0.7)
Issue Type: New Feature (was: Bug)
Summary: Allow reduced consistency in sstableloader utility (was: Issues in sstableloader utility)

1) For 1.1 we've updated the loader to not become a gossip peer (CASSANDRA-3045), but for 1.0.x that's part of how it works. In the meantime, do not use nodetool removetoken; just let it expire normally when it's done.

2) The loader is doing you a favor here; it's a lot cheaper to get the data onto all the machines in the first place than to put it on only some and repair later. But I suppose it's reasonable to have an option to reduce the effective ConsistencyLevel here.

Allow reduced consistency in sstableloader utility -- Key: CASSANDRA-3673 URL: https://issues.apache.org/jira/browse/CASSANDRA-3673 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: Samarth Gahire Priority: Minor Labels: cassandra, clustering, performance, sstableloader Original Estimate: 72h Remaining Estimate: 72h

Below are some of the issues I have been facing while using the sstableloader utility in cassandra-0.8.2:

1) We have configured the sstableloader on a different machine. Since we have loaded sstables from this machine, it has become a part of the cluster, and except at loading time it is always unreachable in describe cluster.
a) As it is unreachable, whenever I change the schema it says this node is unreachable (but that's ok, as the schema change is reflected on the other nodes).
b) The main problem is that when I tried to remove the node from the cluster using nodetool removetoken, the process gets stuck saying "RemovalStatus: Removing token (62676456546693435176060154681903071729). Waiting for replication confirmation from [cassandra-1/(10.10.01.10)]" (this is the IP of the loader machine). As the loader is part of the cluster, Cassandra tries to stream the data from the loader machine and cannot stream.
So instead of making the loader machine a permanent part of the cluster, can we make it only temporarily part of the cluster?

2) When any of the nodes is down or unreachable, a thrift-based client like pelops can still insert data into the Cassandra cluster. But this is not the case with the sstableloader: it does not work (does not stream) when any of the nodes in the cluster is down or unreachable.
[jira] [Commented] (CASSANDRA-3611) Make checksum on a compressed blocks optional
[ https://issues.apache.org/jira/browse/CASSANDRA-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176133#comment-13176133 ] Pavel Yaskevich commented on CASSANDRA-3611:

{code}
if (FBUtilities.threadLocalRandom().nextDouble() > metadata.parameters.crcChance)
{code}

So when you have 1.0 in your parameters you will never get the checksum checked (because nextDouble() is exclusive of 1.0d); on the other hand, with 0.0 you will check the checksum every time. Shouldn't it work the other way around?

Make checksum on a compressed blocks optional - Key: CASSANDRA-3611 URL: https://issues.apache.org/jira/browse/CASSANDRA-3611 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.1 Reporter: Vijay Assignee: Vijay Priority: Minor Labels: compression Fix For: 1.1 Attachments: 0001-crc-check-chance-v2.patch, 0001-crc-check-chance.patch

Currently every uncompressed block is run against the checksum algo; there is CPU overhead in doing so... We might want to make it configurable/optional for some use cases which might not require a checksum all the time.
[jira] [Commented] (CASSANDRA-3674) add nodetool explicitgc
[ https://issues.apache.org/jira/browse/CASSANDRA-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176168#comment-13176168 ] Radim Kolar commented on CASSANDRA-3674:
If this command reports the size of live objects on the heap after GC then it is somewhat useful, because after triggering a GC in jconsole no size is printed when the GC finishes and you need to wait some time until the heap graphs are refreshed.
[jira] [Issue Comment Edited] (CASSANDRA-3611) Make checksum on a compressed blocks optional
[ https://issues.apache.org/jira/browse/CASSANDRA-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176133#comment-13176133 ] Pavel Yaskevich edited comment on CASSANDRA-3611 at 12/27/11 10:34 AM:
---
{code}
if (FBUtilities.threadLocalRandom().nextDouble() > metadata.parameters.crcChance)
{code}

So when you have 1.0 in your parameters you will never get the checksum checked (because nextDouble() is exclusive of 1.0d); on the other hand, with 0.0 you will check the checksum every time. Shouldn't it work the other way around? I also think that we should add a check in CompressionParameters that the chance is between 0.0 and 1.0.

was (Author: xedin):
{code}
if (FBUtilities.threadLocalRandom().nextDouble() > metadata.parameters.crcChance)
{code}

So when you have 1.0 in your parameters you will never get the checksum checked (because nextDouble() is exclusive of 1.0d); on the other hand, with 0.0 you will check the checksum every time. Shouldn't it work the other way around?
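For illustration, a minimal sketch of the inverted check being suggested, with the 0.0-1.0 bounds check added up front. Names and structure are illustrative only, not the attached patch:

{code}
import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch: verify the checksum with probability crcChance, so 1.0 always
// verifies and 0.0 never does, and reject out-of-range values. Illustrative names,
// not the ones from the attached 0001-crc-check-chance patches.
public class CrcCheckChanceSketch
{
    private final double crcChance;

    public CrcCheckChanceSketch(double crcChance)
    {
        if (crcChance < 0.0 || crcChance > 1.0)
            throw new IllegalArgumentException("crc_check_chance must be between 0.0 and 1.0");
        this.crcChance = crcChance;
    }

    public boolean shouldVerifyChecksum()
    {
        // nextDouble() is in [0.0, 1.0), so this is always true for 1.0 and never true for 0.0
        return ThreadLocalRandom.current().nextDouble() < crcChance;
    }
}
{code}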
[jira] [Updated] (CASSANDRA-3610) Checksum improvement for CompressedRandomAccessReader
[ https://issues.apache.org/jira/browse/CASSANDRA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-3610:
-
Attachment: 0001-use-pure-java-CRC32-v3.patch

rebased

Checksum improvement for CompressedRandomAccessReader - Key: CASSANDRA-3610 URL: https://issues.apache.org/jira/browse/CASSANDRA-3610 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.1 Environment: JVM Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.1 Attachments: 0001-use-pure-java-CRC32-v2.patch, 0001-use-pure-java-CRC32-v3.patch, 0001-use-pure-java-CRC32.patch, TestCrc32Performance.java

When compression is on, we currently see checksumming taking up about 40% more CPU than the snappy library. It looks like Hadoop solved this by implementing their own checksum; we can either use it or implement something like that. http://images.slidesharecdn.com/1toddlipconyanpeichen-cloudera-hadoopandperformance-final-10132228-phpapp01-slide-15-768.jpg?1321043717 In our test env it provided a 50% improvement over the native implementation, which uses JNI to call the OS.
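For context, a rough, self-contained way one might measure the kind of checksum throughput being compared here, using only java.util.zip.CRC32. This is illustrative only; it is neither the attached patch, TestCrc32Performance.java, nor Hadoop's pure-Java CRC32:

{code}
import java.util.Random;
import java.util.zip.CRC32;

// Micro-benchmark sketch: measures java.util.zip.CRC32 throughput over a 64 KB buffer.
public class Crc32ThroughputSketch
{
    public static void main(String[] args)
    {
        byte[] buffer = new byte[64 * 1024];
        new Random(42).nextBytes(buffer);

        CRC32 crc = new CRC32();
        int iterations = 100000;

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++)
        {
            crc.reset();
            crc.update(buffer, 0, buffer.length);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        double mbPerSecond = iterations * (buffer.length / (1024.0 * 1024.0)) / seconds;

        System.out.printf("CRC32: %.1f MB/s (checksum of last run: %d)%n", mbPerSecond, crc.getValue());
    }
}
{code}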
[jira] [Commented] (CASSANDRA-3674) add nodetool explicitgc
[ https://issues.apache.org/jira/browse/CASSANDRA-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176249#comment-13176249 ] Peter Schuller commented on CASSANDRA-3674:
---
It prints the heap usage after trying to obtain it as soon as possible after the explicit GC completes. I did however just realize that I haven't tested whether the explicit gc invocation is blocking in the case of -XX:+ExplicitGCInvokesConcurrent.
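For illustration, a minimal sketch of what such a server-side handler could do using the standard platform MemoryMXBean. This is an assumption for illustration only, not the code from the attached CASSANDRA-3674-trunk.txt:

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: trigger a GC through the platform MemoryMXBean and report heap usage
// immediately afterwards.
public class ExplicitGcSketch
{
    public static String explicitGc()
    {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // effectively System.gc(); may return early with -XX:+ExplicitGCInvokesConcurrent
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return String.format("Heap after GC: %.1f MB used of %.1f MB max",
                             heap.getUsed() / (1024.0 * 1024.0),
                             heap.getMax() / (1024.0 * 1024.0));
    }

    public static void main(String[] args)
    {
        System.out.println(explicitGc());
    }
}
{code}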
[Cassandra Wiki] Update of LargeDataSetConsiderations by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The LargeDataSetConsiderations page has been changed by PeterSchuller: http://wiki.apache.org/cassandra/LargeDataSetConsiderations?action=diff&rev1=19&rev2=20

 * Cassandra will read through sstable index files on start-up, doing what is known as index sampling. This is used to keep a subset (currently and by default, 1 out of 100) of keys and their on-disk location in the index, in memory. See [[ArchitectureInternals]]. This means that the larger the index files are, the longer it takes to perform this sampling. Thus, for very large indexes (typically when you have a very large number of keys) the index sampling on start-up may be a significant issue.
 * A negative side-effect of a large row-cache is start-up time. The periodic saving of the row cache information only saves the keys that are cached; the data has to be pre-fetched on start-up. On a large data set, this is probably going to be seek-bound and the time it takes to warm up the row cache will be linear with respect to the row cache size (assuming sufficiently large amounts of data that the seek-bound I/O is not subject to optimization by disks).
 * Potential future improvement: [[https://issues.apache.org/jira/browse/CASSANDRA-1625|CASSANDRA-1625]].
+ * The total number of rows per node correlates directly with the size of bloom filters and sampled index entries. Expect the base memory requirement of a node to increase linearly with the number of keys (assuming the average row key size remains constant).
+ * You can decrease the memory use due to index sampling by changing the index sampling interval in cassandra.yaml
+ * You should soon be able to tweak the bloom filter sizes too once [[https://issues.apache.org/jira/browse/CASSANDRA-3497|CASSANDRA-3497]] is done
[Cassandra Wiki] Update of LargeDataSetConsiderations by PeterSchuller
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The LargeDataSetConsiderations page has been changed by PeterSchuller: http://wiki.apache.org/cassandra/LargeDataSetConsiderations?action=diff&rev1=20&rev2=21

 * Cassandra will read through sstable index files on start-up, doing what is known as index sampling. This is used to keep a subset (currently and by default, 1 out of 100) of keys and their on-disk location in the index, in memory. See [[ArchitectureInternals]]. This means that the larger the index files are, the longer it takes to perform this sampling. Thus, for very large indexes (typically when you have a very large number of keys) the index sampling on start-up may be a significant issue.
 * A negative side-effect of a large row-cache is start-up time. The periodic saving of the row cache information only saves the keys that are cached; the data has to be pre-fetched on start-up. On a large data set, this is probably going to be seek-bound and the time it takes to warm up the row cache will be linear with respect to the row cache size (assuming sufficiently large amounts of data that the seek-bound I/O is not subject to optimization by disks).
 * Potential future improvement: [[https://issues.apache.org/jira/browse/CASSANDRA-1625|CASSANDRA-1625]].
- * The total number of rows per node correlates directly with the size of bloom filters and sampled index entries. Expect the base memory requirement of a node to increase linearly with the number of keys (assuming the average row key size remains constant).
+ * The total number of rows per node correlates directly with the size of bloom filters and sampled index entries. Expect the base memory requirement of a node to increase linearly with the number of keys (assuming the average row key size remains constant). If you are not using caching at all (e.g. you are doing analysis type workloads), expect these two to be the two biggest consumers of memory.
 * You can decrease the memory use due to index sampling by changing the index sampling interval in cassandra.yaml
 * You should soon be able to tweak the bloom filter sizes too once [[https://issues.apache.org/jira/browse/CASSANDRA-3497|CASSANDRA-3497]] is done
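For a rough sense of the bloom filter share of that memory, here is a back-of-the-envelope sketch using the standard formula bits/key ~= -ln(p) / (ln 2)^2. This is illustrative only; Cassandra's actual sizing is per sstable and rounds to bucket sizes, so real usage differs:

{{{
// Back-of-the-envelope estimate of bloom filter memory from key count and target
// false-positive chance. Illustrative only, not Cassandra's sizing code.
public class BloomFilterMemoryEstimate
{
    public static double estimateMegabytes(long keysPerNode, double falsePositiveChance)
    {
        double bitsPerKey = -Math.log(falsePositiveChance) / (Math.log(2) * Math.log(2));
        return bitsPerKey * keysPerNode / 8.0 / (1024.0 * 1024.0);
    }

    public static void main(String[] args)
    {
        // e.g. 500 million keys at a 1% false-positive chance -> roughly 570 MB
        System.out.printf("~%.0f MB of bloom filter%n", estimateMegabytes(500000000L, 0.01));
    }
}
}}}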
[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way
[ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita updated CASSANDRA-3497:
--
Attachment: 0001-give-default-val-to-fp_chance.patch

Radim, Thanks for the report. The problem is that the new bloom_filter_fp_chance in the avro interface definition does not have a proper default. I attached a patch to fix it.

BloomFilter FP ratio should be configurable or size-restricted some other way - Key: CASSANDRA-3497 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Brandon Williams Assignee: Yuki Morishita Priority: Minor Fix For: 1.0.7 Attachments: 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt

When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but you end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them. It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.
svn commit: r1224981 - /cassandra/branches/cassandra-1.0/src/avro/internode.genavro
Author: jbellis Date: Tue Dec 27 19:12:39 2011 New Revision: 1224981

URL: http://svn.apache.org/viewvc?rev=1224981&view=rev
Log: make avro bloom_filter_fp_chance default to null
patch by yukim; reviewed by jbellis for CASSANDRA-3497

Modified: cassandra/branches/cassandra-1.0/src/avro/internode.genavro

Modified: cassandra/branches/cassandra-1.0/src/avro/internode.genavro
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-1.0/src/avro/internode.genavro?rev=1224981&r1=1224980&r2=1224981&view=diff
==
--- cassandra/branches/cassandra-1.0/src/avro/internode.genavro (original)
+++ cassandra/branches/cassandra-1.0/src/avro/internode.genavro Tue Dec 27 19:12:39 2011
@@ -71,7 +71,7 @@ protocol InterNode {
     union { null, string } compaction_strategy = null;
     union { null, map<string> } compaction_strategy_options = null;
     union { null, map<string> } compression_options = null;
-    union { double, null } bloom_filter_fp_chance;
+    union { null, double } bloom_filter_fp_chance = null;
 }
 @aliases([org.apache.cassandra.config.avro.KsDef])
[jira] [Resolved] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way
[ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-3497.
---
Resolution: Fixed

committed
[jira] [Commented] (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices
[ https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176291#comment-13176291 ] Jonathan Ellis commented on CASSANDRA-1600:
---
What do we gain from typedefing List<IndexExpression> to FilterClause? (I note this was part of Stu and my original attempts back in April but I don't remember a good reason for that.)

{noformat}
+/*
+ * XXX: If the range requested is a token range, we'll have to start at the beginning (and stop at the end) of
+ * the indexed row unfortunately (which will be inefficient), because we have no way to intuit the smallest
+ * possible key having a given token. A fix would be to actually store the token along with the key in the
+ * indexed row.
+ */
{noformat}

This is fine since there's no reason to be searching by token unless you're doing an exhaustive scan, i.e. a m/r job.

{noformat}
+ rows.addAll(RangeSliceVerbHandler.executeLocally(command));
{noformat}

Another place the original patches failed... we should avoid this because it means we're now allowing one range scan per thrift client instead of one per read stage thread, and it bypasses the "drop hopeless requests" overcapacity protection built in there. Look at SP.LocalReadRunnable for how to do this safely. The simplest fix would be to just continue routing all range scans over MessagingService.

Nit: I'd remove this comment

{code}
+// Mostly just a typedef
{code}

since class definitions that hardcode a specific version of a generic type are an antipattern, but this is necessary to mix in the CloseableIterator interface.

Merge get_indexed_slices with get_range_slices -- Key: CASSANDRA-1600 URL: https://issues.apache.org/jira/browse/CASSANDRA-1600 Project: Cassandra Issue Type: New Feature Components: API Reporter: Stu Hood Assignee: Sylvain Lebresne Fix For: 1.1 Attachments: 0001-Add-optional-FilterClause-to-KeyRange-and-support-do-v2.patch, 0001-Add-optional-FilterClause-to-KeyRange-and-support-doin.txt, 0001-Add-optional-FilterClause-to-KeyRange-v3.patch, 0002-allow-get_range_slices-to-apply-filter-to-a-sequenti-v2.patch, 0002-allow-get_range_slices-to-apply-filter-to-a-sequential.txt, 0002-thrift-generated-code-changes-v3.patch, 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v3.patch, 0004-Update-cql-to-not-use-deprecated-index-scan-v3.patch

From a comment on 1157:
{quote}
IndexClause only has a start key for get_indexed_slices, but it would seem that the reasoning behind using 'KeyRange' for get_range_slices applies there as well, since if you know the range you care about in the primary index, you don't want to continue scanning until you exhaust 'count' (or the cluster). Since it would appear that get_indexed_slices would benefit from a KeyRange, why not smash get_(range|indexed)_slices together, and make IndexClause an optional field on KeyRange?
{quote}
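As a hypothetical illustration of the "typedef" subclassing being discussed (none of these names are taken from the patch):

{code}
import java.util.ArrayList;
import java.util.Iterator;

// A class whose only purpose is to pin a generic type is usually an antipattern,
// since a List<IndexExpression> field or variable would do.
class IndexExpression {}   // stand-in for the thrift-generated type

class FilterClause extends ArrayList<IndexExpression> {}

// The pattern is only defensible when the subclass also has to mix in another
// interface, e.g. something like a CloseableIterator as mentioned above.
interface CloseableIterator<T> extends Iterator<T>
{
    void close();
}
{code}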
[jira] [Updated] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-3623:
-
Attachment: (was: CRC+MMapIO.xlsx)
[jira] [Updated] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-3623:
-
Attachment: (was: MMappedIO-Performance.docx)
[jira] [Updated] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-3623:
-
Attachment: CRC+MMapIO.xlsx
MMappedIO-Performance.docx

Done, 1) fixed the data for 10K 2) rebased 3610 Thanks!
svn commit: r1224998 - in /cassandra/trunk: ./ contrib/ doc/cql/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/avro/ src/java/org/apache/cassandra/cql/ src/java/org/apache/cassandra/db/ s
Author: jbellis Date: Tue Dec 27 20:03:37 2011 New Revision: 1224998 URL: http://svn.apache.org/viewvc?rev=1224998view=rev Log: merge from 1.0 Modified: cassandra/trunk/ (props changed) cassandra/trunk/CHANGES.txt cassandra/trunk/contrib/ (props changed) cassandra/trunk/doc/cql/CQL.textile cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java (props changed) cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java (props changed) cassandra/trunk/src/avro/internode.genavro cassandra/trunk/src/java/org/apache/cassandra/cql/Cql.g cassandra/trunk/src/java/org/apache/cassandra/cql/CreateColumnFamilyStatement.java cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java cassandra/trunk/src/java/org/apache/cassandra/db/index/SecondaryIndex.java cassandra/trunk/src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java cassandra/trunk/src/java/org/apache/cassandra/db/index/keys/KeysIndex.java Propchange: cassandra/trunk/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Dec 27 20:03:37 2011 @@ -4,7 +4,7 @@ /cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1198724,1198726-1206097,1206099-1220925,1220927-1222440 /cassandra/branches/cassandra-0.8.0:1125021-1130369 /cassandra/branches/cassandra-0.8.1:1101014-1125018 -/cassandra/branches/cassandra-1.0:1167085-1222743 +/cassandra/branches/cassandra-1.0:1167085-1224997 /cassandra/branches/cassandra-1.0.0:1167104-1167229,1167232-1181093,1181741,1181816,1181820,1182951,1183243 /cassandra/branches/cassandra-1.0.5:1208016 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689 Modified: cassandra/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1224998r1=1224997r2=1224998view=diff == --- cassandra/trunk/CHANGES.txt (original) +++ cassandra/trunk/CHANGES.txt Tue Dec 27 20:03:37 2011 @@ -45,11 +45,13 @@ * Avoid creating empty and non cleaned writer during compaction (CASSANDRA-3616) * stop thrift service in shutdown hook so we can quiesce MessagingService (CASSANDRA-3335) + * (CQL) compaction_strategy_options and compression_parameters for + CREATE COLUMNFAMILY statement (CASSANDRA-3374) Merged from 0.8: * avoid logging (harmless) exception when GC takes 1ms (CASSANDRA-3656) * prevent new nodes from thinking down nodes are up forever (CASSANDRA-3626) * Flush non-cfs backed secondary indexes (CASSANDRA-3659) - + * Secondary Indexes should report memory consumption (CASSANDRA-3155) 1.0.6 * (CQL) fix cqlsh support for replicate_on_write (CASSANDRA-3596) Propchange: cassandra/trunk/contrib/ -- --- svn:mergeinfo (original) +++ svn:mergeinfo Tue Dec 27 20:03:37 2011 @@ -4,7 +4,7 @@ /cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1198724,1198726-1206097,1206099-1220925,1220927-1222440 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369 /cassandra/branches/cassandra-0.8.1/contrib:1101014-1125018 -/cassandra/branches/cassandra-1.0/contrib:1167085-1222743 +/cassandra/branches/cassandra-1.0/contrib:1167085-1224997 /cassandra/branches/cassandra-1.0.0/contrib:1167104-1167229,1167232-1181093,1181741,1181816,1181820,1182951,1183243 /cassandra/branches/cassandra-1.0.5/contrib:1208016 
/cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689 Modified: cassandra/trunk/doc/cql/CQL.textile URL: http://svn.apache.org/viewvc/cassandra/trunk/doc/cql/CQL.textile?rev=1224998r1=1224997r2=1224998view=diff == --- cassandra/trunk/doc/cql/CQL.textile (original) +++ cassandra/trunk/doc/cql/CQL.textile Tue Dec 27 20:03:37 2011 @@ -488,9 +488,14 @@ bc(syntax). createColumnFamilyStatement ::= CREATE COLUMNFAMILY name ( term storageType PRIMARY KEY ( , term storageType )* ) - ( WITH identifier = cfOptionVal - ( AND identifier = cfOptionVal )* )? + ( WITH optionName = cfOptionVal + ( AND optionName = cfOptionVal )* )? ; +optionName ::= identifier + | optionName : identifier + | optionName : integer + ; + cfOptionVal
svn commit: r1225001 [2/2] - in /cassandra/trunk: lib/ lib/licenses/ src/java/org/apache/cassandra/db/ test/unit/org/apache/cassandra/db/
Modified: cassandra/trunk/src/java/org/apache/cassandra/db/TreeMapBackedSortedColumns.java URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/db/TreeMapBackedSortedColumns.java?rev=1225001r1=1225000r2=1225001view=diff == --- cassandra/trunk/src/java/org/apache/cassandra/db/TreeMapBackedSortedColumns.java (original) +++ cassandra/trunk/src/java/org/apache/cassandra/db/TreeMapBackedSortedColumns.java Tue Dec 27 20:17:17 2011 @@ -24,11 +24,15 @@ import java.util.SortedMap; import java.util.SortedSet; import java.util.TreeMap; +import com.google.common.base.Function; + import org.apache.cassandra.db.marshal.AbstractType; import org.apache.cassandra.utils.Allocator; -public class TreeMapBackedSortedColumns extends TreeMapByteBuffer, IColumn implements ISortedColumns +public class TreeMapBackedSortedColumns extends AbstractThreadUnsafeSortedColumns implements ISortedColumns { +private final TreeMapByteBuffer, IColumn map; + public static final ISortedColumns.Factory factory = new Factory() { public ISortedColumns create(AbstractType? comparator, boolean insertReversed) @@ -49,17 +53,17 @@ public class TreeMapBackedSortedColumns public AbstractType? getComparator() { -return (AbstractType)comparator(); +return (AbstractType)map.comparator(); } private TreeMapBackedSortedColumns(AbstractType? comparator) { -super(comparator); +this.map = new TreeMapByteBuffer, IColumn(comparator); } private TreeMapBackedSortedColumns(SortedMapByteBuffer, IColumn columns) { -super(columns); +this.map = new TreeMapByteBuffer, IColumn(columns); } public ISortedColumns.Factory getFactory() @@ -69,7 +73,7 @@ public class TreeMapBackedSortedColumns public ISortedColumns cloneMe() { -return new TreeMapBackedSortedColumns(this); +return new TreeMapBackedSortedColumns(map); } public boolean isInsertReversed() @@ -88,7 +92,7 @@ public class TreeMapBackedSortedColumns // but TreeMap lacks putAbsent. Rather than split it into a get, then put check, we do it as follows, // which saves the extra get in the no-conflict case [for both normal and super columns], // in exchange for a re-put in the SuperColumn case. -IColumn oldColumn = put(name, column); +IColumn oldColumn = map.put(name, column); if (oldColumn != null) { if (oldColumn instanceof SuperColumn) @@ -98,13 +102,13 @@ public class TreeMapBackedSortedColumns // add the new one to the old, then place old back in the Map, rather than copy the old contents // into the new Map entry. ((SuperColumn) oldColumn).putColumn((SuperColumn)column, allocator); -put(name, oldColumn); +map.put(name, oldColumn); } else { // calculate reconciled col from old (existing) col and new col IColumn reconciledColumn = column.reconcile(oldColumn, allocator); -put(name, reconciledColumn); +map.put(name, reconciledColumn); } } } @@ -112,10 +116,10 @@ public class TreeMapBackedSortedColumns /** * We need to go through each column in the column container and resolve it before adding */ -public void addAll(ISortedColumns cm, Allocator allocator) +protected void addAllColumns(ISortedColumns cm, Allocator allocator, FunctionIColumn, IColumn transformation) { for (IColumn column : cm.getSortedColumns()) -addColumn(column, allocator); +addColumn(transformation.apply(column), allocator); } public boolean replace(IColumn oldColumn, IColumn newColumn) @@ -127,15 +131,15 @@ public class TreeMapBackedSortedColumns // column or the column was not equal to oldColumn (to be coherent // with other implementation). 
We optimize for the common case where // oldColumn do is present though. -IColumn previous = put(oldColumn.name(), newColumn); +IColumn previous = map.put(oldColumn.name(), newColumn); if (previous == null) { -remove(oldColumn.name()); +map.remove(oldColumn.name()); return false; } if (!previous.equals(oldColumn)) { -put(oldColumn.name(), previous); +map.put(oldColumn.name(), previous); return false; } return true; @@ -143,37 +147,42 @@ public class TreeMapBackedSortedColumns public IColumn getColumn(ByteBuffer name) { -return get(name); +return map.get(name); } public void removeColumn(ByteBuffer name) { -remove(name); +
[jira] [Commented] (CASSANDRA-2893) Add row-level isolation
[ https://issues.apache.org/jira/browse/CASSANDRA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176309#comment-13176309 ] Jonathan Ellis commented on CASSANDRA-2893:
---
Committed with the Functions.identity change. Leaving open for potential performance enhancements.

Add row-level isolation --- Key: CASSANDRA-2893 URL: https://issues.apache.org/jira/browse/CASSANDRA-2893 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Assignee: Sylvain Lebresne Priority: Minor Fix For: 1.1 Attachments: 0001-Move-deletion-infos-into-ISortedColumns-v2.patch, 0001-Move-deletion-infos-into-ISortedColumns.patch, 0002-Make-memtable-use-CF.addAll-v2.patch, 0002-Make-memtable-use-CF.addAll.patch, 0003-Add-AtomicSortedColumn-and-snapTree-v2.patch, 0003-Add-AtomicSortedColumn-and-snapTree.patch, latency-plain.svg, latency.svg, snaptree-0.1-SNAPSHOT.jar

This could be done using the atomic ConcurrentMap operations from the Memtable and something like http://code.google.com/p/pcollections/ to replace the ConcurrentSkipListMap in ThreadSafeSortedColumns. The trick is that pcollections does not provide a SortedMap, so we probably need to write our own. Googling [persistent sortedmap] I found http://code.google.com/p/actord/source/browse/trunk/actord/src/main/scala/ff/collection (in scala) and http://clojure.org/data_structures#Data Structures-Maps.
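Regarding the Functions.identity change mentioned in the comment above: for illustration, a simplified sketch of how an addAll() can delegate to the transformation-taking addAllColumns(...) seen in the r1225001 diff, assuming Guava's Functions.identity() and a reduced version of that signature. This is not the exact committed code:

{code}
import com.google.common.base.Function;
import com.google.common.base.Functions;

// Sketch: the plain addAll case passes the identity function as the transformation.
public abstract class SortedColumnsSketch
{
    public interface IColumn {} // stand-in for org.apache.cassandra.db.IColumn

    protected abstract void addAllColumns(Iterable<IColumn> columns,
                                          Function<IColumn, IColumn> transformation);

    public void addAll(Iterable<IColumn> columns)
    {
        // no transformation needed when simply merging in another container
        addAllColumns(columns, Functions.<IColumn>identity());
    }
}
{code}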
[jira] [Created] (CASSANDRA-3676) Add snaptree dependency to maven central and update pom
Add snaptree dependency to maven central and update pom --- Key: CASSANDRA-3676 URL: https://issues.apache.org/jira/browse/CASSANDRA-3676 Project: Cassandra Issue Type: Sub-task Reporter: T Jake Luciani Assignee: Stephen Connolly Fix For: 1.1

Snaptree dependency needs to be added to maven before we can release 1.1
[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way
[ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176313#comment-13176313 ] Radim Kolar commented on CASSANDRA-3497:
The FP ratio is not displayed in the output of the cli: show schema, describe;
svn commit: r1225002 - in /cassandra/trunk: CHANGES.txt NEWS.txt
Author: jbellis Date: Tue Dec 27 20:26:48 2011 New Revision: 1225002

URL: http://svn.apache.org/viewvc?rev=1225002&view=rev
Log: update CHANGES, NEWS

Modified: cassandra/trunk/CHANGES.txt cassandra/trunk/NEWS.txt

Modified: cassandra/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1225002&r1=1225001&r2=1225002&view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Tue Dec 27 20:26:48 2011
@@ -1,4 +1,5 @@
 1.1-dev
+ * add row-level isolation via SnapTree (CASSANDRA-2893)
  * Optimize key count estimation when opening sstable on startup (CASSANDRA-2988)
  * multi-dc replication optimization supporting CL ONE (CASSANDRA-3577)

Modified: cassandra/trunk/NEWS.txt
URL: http://svn.apache.org/viewvc/cassandra/trunk/NEWS.txt?rev=1225002&r1=1225001&r2=1225002&view=diff
==
--- cassandra/trunk/NEWS.txt (original)
+++ cassandra/trunk/NEWS.txt Tue Dec 27 20:26:48 2011
@@ -35,6 +35,14 @@ Upgrading
     and row_cache_{size_in_mb, save_period} in conf/cassandra.yaml are
     used instead of per-ColumnFamily options.

+Features
+
+- Cassandra 1.1 adds row-level isolation. Multi-column updates to
+  a single row have always been *atomic* (either all will be applied,
+  or none) thanks to the CommitLog, but until 1.1 they were not *isolated*
+  -- a reader may see mixed old and new values while the update happens.
+
+
 1.0.6
 =
[jira] [Commented] (CASSANDRA-3676) Add snaptree dependency to maven central and update pom
[ https://issues.apache.org/jira/browse/CASSANDRA-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176314#comment-13176314 ] Jonathan Ellis commented on CASSANDRA-3676:
---
https://github.com/nbronson/snaptree
[jira] [Created] (CASSANDRA-3677) NPE during HH delivery when gossip turned off on target
NPE during HH delivery when gossip turned off on target --- Key: CASSANDRA-3677 URL: https://issues.apache.org/jira/browse/CASSANDRA-3677 Project: Cassandra Issue Type: Bug Affects Versions: 1.0.6 Reporter: Radim Kolar Priority: Trivial

Probably not an important bug:

ERROR [OptionalTasks:1] 2011-12-27 21:44:25,342 AbstractCassandraDaemon.java (line 138) Fatal exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at org.cliffc.high_scale_lib.NonBlockingHashMap.hash(NonBlockingHashMap.java:113)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:553)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfMatch(NonBlockingHashMap.java:348)
at org.cliffc.high_scale_lib.NonBlockingHashMap.putIfAbsent(NonBlockingHashMap.java:319)
at org.cliffc.high_scale_lib.NonBlockingHashSet.add(NonBlockingHashSet.java:32)
at org.apache.cassandra.db.HintedHandOffManager.scheduleHintDelivery(HintedHandOffManager.java:371)
at org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:356)
at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:84)
at org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:119)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
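As a hypothetical illustration of the failure mode (not the actual fix): the non-blocking set used for pending deliveries rejects null keys, so if the endpoint resolved for a hint row is null (e.g. the target is unknown to gossip), it would have to be skipped before being queued:

{code}
import java.net.InetAddress;
import java.util.Set;

// Hypothetical null-guard sketch; names and structure are illustrative only.
public class HintDeliverySketch
{
    private final Set<InetAddress> queuedDeliveries;

    public HintDeliverySketch(Set<InetAddress> queuedDeliveries)
    {
        this.queuedDeliveries = queuedDeliveries;
    }

    public void scheduleHintDelivery(InetAddress endpoint)
    {
        if (endpoint == null)
            return; // adding null to the non-blocking set throws NullPointerException

        if (queuedDeliveries.add(endpoint))
        {
            // ... submit a delivery task for this endpoint
        }
    }
}
{code}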
[jira] [Commented] (CASSANDRA-3658) Fix smallish problems find by FindBugs
[ https://issues.apache.org/jira/browse/CASSANDRA-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176320#comment-13176320 ] Nick Bailey commented on CASSANDRA-3658:
This breaks a bunch of JMX stuff. A fair amount of JMX methods return Token objects, so they need to be serializable. I plan on doing CASSANDRA-2805 for 1.1, but JMX will be broken in trunk until I get that done, unless that specific patch is reverted.

Fix smallish problems find by FindBugs -- Key: CASSANDRA-3658 URL: https://issues.apache.org/jira/browse/CASSANDRA-3658 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: fingbugs Fix For: 1.1 Attachments: 0001-Respect-Future-semantic.patch, 0002-Avoid-race-when-reloading-snitch-file.patch, 0003-use-static-inner-class-when-possible.patch, 0004-Remove-dead-code.patch, 0005-Protect-against-signed-byte-extension.patch, 0006-Add-hashCode-method-when-equals-is-overriden.patch, 0007-Inverse-argument-of-compare-instead-of-negating-to-a.patch, 0008-stop-pretending-Token-is-Serializable-LocalToken-is-.patch, 0009-remove-useless-assert-that-is-always-true.patch, 0010-Add-equals-and-hashCode-to-Expiring-column.patch

I've just run (the newly released) FindBugs 2 out of curiosity. Attaching a number of patches related to issues raised by it. There is nothing major at all, so all patches are against trunk. I've tried to keep each issue to its own patch with a self-describing title. It is far from covering all FindBugs alerts, but it's a picky tool so I've tried to address only what felt at least vaguely useful. Those are still mostly nits (only patch 2 is probably an actual bug).
svn commit: r1225014 - in /cassandra/trunk/src/java/org/apache/cassandra/dht: LocalToken.java Token.java
Author: jbellis Date: Tue Dec 27 21:01:57 2011 New Revision: 1225014

URL: http://svn.apache.org/viewvc?rev=1225014&view=rev
Log: make Token serializable again for JMX

Modified: cassandra/trunk/src/java/org/apache/cassandra/dht/LocalToken.java cassandra/trunk/src/java/org/apache/cassandra/dht/Token.java

Modified: cassandra/trunk/src/java/org/apache/cassandra/dht/LocalToken.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/dht/LocalToken.java?rev=1225014&r1=1225013&r2=1225014&view=diff
==
--- cassandra/trunk/src/java/org/apache/cassandra/dht/LocalToken.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/dht/LocalToken.java Tue Dec 27 21:01:57 2011
@@ -24,6 +24,8 @@ import org.apache.cassandra.db.marshal.A
 public class LocalToken extends Token<ByteBuffer>
 {
+    static final long serialVersionUID = 8437543776403014875L;
+
     private final AbstractType comparator;

     public LocalToken(AbstractType comparator, ByteBuffer token)

Modified: cassandra/trunk/src/java/org/apache/cassandra/dht/Token.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/dht/Token.java?rev=1225014&r1=1225013&r2=1225014&view=diff
==
--- cassandra/trunk/src/java/org/apache/cassandra/dht/Token.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/dht/Token.java Tue Dec 27 21:01:57 2011
@@ -30,7 +30,7 @@ import org.apache.cassandra.io.ISerializ
 import org.apache.cassandra.service.StorageService;
 import org.apache.cassandra.utils.ByteBufferUtil;

-public abstract class Token<T> implements RingPosition<Token<T>>
+public abstract class Token<T> implements RingPosition<Token<T>>, Serializable
 {
     private static final long serialVersionUID = 1L;
[jira] [Commented] (CASSANDRA-3658) Fix smallish problems find by FindBugs
[ https://issues.apache.org/jira/browse/CASSANDRA-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176324#comment-13176324 ] Jonathan Ellis commented on CASSANDRA-3658:
---
reverted 0008 for now
[jira] [Commented] (CASSANDRA-3507) Proposal: separate cqlsh from CQL drivers
[ https://issues.apache.org/jira/browse/CASSANDRA-3507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176341#comment-13176341 ] paul cannon commented on CASSANDRA-3507: bq. Is it possible for the ASF contributors to vote on code that isn't in the official tree, like, say, a particular tag of the python CQL driver at Apache Extras? If we can distribute the drivers in the same official repository, most of these problems go away. I read through all the rules I can find, and I see nothing prohibiting us from voting on and releasing specific source/binary artifacts of the various cql drivers alongside c*, as long as they follow the ASF licensing restrictions. http://www.apache.org/dev/release.html#distribute-other-artifacts seems the most apropos. So, I propose that we call a vote for a Cassandra project release of cassandra-dbapi2, alias python-cql, once I get the ASF licensing stuff sorted in it, and tag and post its 1.0.7 version. Then we can put the python-cql debs in the official debian repository, and everything is happy. Proposal: separate cqlsh from CQL drivers - Key: CASSANDRA-3507 URL: https://issues.apache.org/jira/browse/CASSANDRA-3507 Project: Cassandra Issue Type: Improvement Components: Packaging, Tools Affects Versions: 1.0.3 Environment: Debian-based systems Reporter: paul cannon Assignee: paul cannon Priority: Minor Labels: cql, cqlsh Fix For: 1.1 Whereas: * It has been shown to be very desirable to decouple the release cycles of Cassandra from the various client CQL drivers, and * It is also desirable to include a good interactive CQL client with releases of Cassandra, and * It is not desirable for Cassandra releases to depend on 3rd-party software which is neither bundled with Cassandra nor readily available for every target platform, but * Any good interactive CQL client will require a CQL driver; Therefore, be it resolved that: * cqlsh will not use an official or supported CQL driver, but will include its own private CQL driver, not intended for use by anything else, and * the Cassandra project will still recommend installing and using a proper CQL driver for client software. To ease maintenance, the private CQL driver included with cqlsh may very well be created by copying the python CQL driver from one directory into another, but the user shouldn't rely on this. Maybe we even ought to take some minor steps to discourage its use for other purposes. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[Cassandra Wiki] Update of NodeTool by JanneJalkanen
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The NodeTool page has been changed by JanneJalkanen: http://wiki.apache.org/cassandra/NodeTool?action=diff&rev1=21&rev2=22 Comment: setcompactionthroughput documented == Scrub == Cassandra v0.7.1 and v0.7.2 shipped with a bug that caused incorrect row-level bloom filters to be generated when compacting sstables generated with earlier versions. This would manifest in IOExceptions during column name-based queries. v0.7.3 provides nodetool scrub to rebuild sstables with correct bloom filters, with no data lost. (If your cluster was never on 0.7.0 or earlier, you don't have to worry about this.) Note that nodetool scrub will snapshot your data files before rebuilding, just in case.
- == upgradesstables ==
+ == Upgradesstables ==
While scrub does rebuild your sstables, it will also discard data it deems broken and create a snapshot, which you have to remove manually. If you just wish to rebuild your sstables without all that jazz, then use nodetool upgradesstables. This is useful e.g. when you are upgrading your server, or changing compression options. upgradesstables is available from Cassandra 1.0.4 onwards.
+
+ == Setcompactionthroughput ==
+
+ As of Cassandra 1.0, the amount of resources that compactions can use can be easily controlled using a single value: the compaction throughput, which is expressed in Megabytes/second. You can (and probably should) specify this in your cassandra.yaml file, but in some cases it can be very beneficial to change it live using nodetool.
+
+ For example, in [[http://www.slideshare.net/edwardcapriolo/m6d-cassandrapresentation|this presentation]] Edward Capriolo explains how their company throttles compaction during the day so that I/O is mostly reserved for serving requests, whereas during the night they allocate more capacity for running compactions. This can be accomplished through e.g. a simple cron script:
+
+ {{{
+ # Script increases compaction throughput to 999 MB/s (i.e. nearly unlimited) for 00-06.
+ #
+ # turn into Mr. Batch at night
+ 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
+ # turn back into Dr. Realtime for the day
+ 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
+ }}}
+
+ Setting the compaction throughput to zero disables compaction entirely. This may be useful if you e.g. wish to avoid compaction I/O during extremely busy periods, but it is not a good idea to leave compaction disabled for a long period, since you will end up with a large number of very small sstables, which will start to slow down your reads. == Cfhistograms ==
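nodetool setcompactionthroughput is a thin wrapper around a JMX operation on the StorageService MBean, so the same throttling can be scripted from any JMX client. The sketch below assumes the operation name setCompactionThroughputMbPerSec and the default JMX port 7199; both are assumptions to verify (e.g. with jconsole) against your Cassandra version.
{{{
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of adjusting compaction throughput over JMX directly, for cases
// where cron + nodetool is not convenient. The MBean name and operation
// (setCompactionThroughputMbPerSec) are assumptions based on what nodetool
// wraps; check them with jconsole first.
public class ThrottleCompaction
{
    public static void main(String[] args) throws Exception
    {
        String host = args.length > 0 ? args[0] : "localhost";
        int throughputMbPerSec = args.length > 1 ? Integer.parseInt(args[1]) : 16;

        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Invoke the operation generically so this compiles without the
            // Cassandra jars on the classpath.
            connection.invoke(name, "setCompactionThroughputMbPerSec",
                              new Object[]{ throughputMbPerSec },
                              new String[]{ "int" });
        }
        finally
        {
            connector.close();
        }
    }
}
}}}
Run it like `java ThrottleCompaction 10.42.134.229 999` to mirror the night-time cron entry above.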
[jira] [Updated] (CASSANDRA-3611) Make checksum on compressed blocks optional
[ https://issues.apache.org/jira/browse/CASSANDRA-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-3611: - Attachment: 0001-crc-check-chance-v3.patch Done, Thanks! Make checksum on compressed blocks optional - Key: CASSANDRA-3611 URL: https://issues.apache.org/jira/browse/CASSANDRA-3611 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.1 Reporter: Vijay Assignee: Vijay Priority: Minor Labels: compression Fix For: 1.1 Attachments: 0001-crc-check-chance-v2.patch, 0001-crc-check-chance-v3.patch, 0001-crc-check-chance.patch Currently every uncompressed block is run against the checksum algorithm, and there is CPU overhead in doing so... We might want to make it configurable/optional for use cases which might not require checksumming all the time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
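The patch name "crc-check-chance" suggests the knob ended up as a probability of verifying rather than a simple on/off switch. A hedged sketch of that idea (class and field names here are illustrative, not the patch):
{noformat}
import java.io.IOException;
import java.util.Random;
import java.util.zip.CRC32;

// Illustrative sketch of the "crc check chance" idea: instead of verifying the
// checksum of every chunk, verify only a configurable fraction of reads.
public class ProbabilisticChecksum
{
    private final double crcCheckChance;   // 1.0 = always verify, 0.0 = never
    private final Random random = new Random();

    public ProbabilisticChecksum(double crcCheckChance)
    {
        this.crcCheckChance = crcCheckChance;
    }

    public void maybeVerify(byte[] chunk, long expectedCrc) throws IOException
    {
        if (random.nextDouble() >= crcCheckChance)
            return; // skip the check for this read, saving the CPU cost

        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        if (crc.getValue() != expectedCrc)
            throw new IOException("corrupt chunk: checksum mismatch");
    }
}
{noformat}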
[jira] [Commented] (CASSANDRA-3583) Add rebuild index JMX command
[ https://issues.apache.org/jira/browse/CASSANDRA-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176352#comment-13176352 ] Vijay commented on CASSANDRA-3583: -- Hi Jonathan,
bq. I think you also want to clear the Built flag on the index(es) or the rebuild will be incomplete if you cancel or restart partway through.
In the current patch we don't need to, as the new indexes will be in new SSTables, and if someone stops partway through it won't be any worse than it was earlier... Otherwise we could clear the flag when we start and set it again at the end, but that way clients might notice some indexes temporarily missing. Agree?
Add rebuild index JMX command --- Key: CASSANDRA-3583 URL: https://issues.apache.org/jira/browse/CASSANDRA-3583 Project: Cassandra Issue Type: New Feature Components: Core, Tools Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Fix For: 1.1 Attachments: 0001-3583.patch CASSANDRA-1740 allows aborting an index build, but there is no way to re-attempt the build without restarting the server. We've also had requests to allow rebuilding an index that *has* been built, so it would be nice to kill two birds with one stone here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
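A minimal, self-contained sketch of the trade-off being discussed: clearing the "built" flag for the duration of a rebuild versus leaving it set. The class and method names are invented for illustration and are not from the attached patch.
{noformat}
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch (not the Cassandra patch): clearing the "built" flag for
// the whole rebuild makes an interrupted rebuild visible to clients, while
// leaving it set means clients never see the index disappear, at the cost of
// briefly advertising an index that is being rebuilt.
public class IndexRebuildFlag
{
    private final AtomicBoolean built = new AtomicBoolean(true);

    // Conservative variant: advertise "not built" until the rebuild finishes.
    public void rebuildClearingFlag(Runnable rebuildTask)
    {
        built.set(false);
        rebuildTask.run();      // if interrupted here, the flag stays false
        built.set(true);
    }

    // Variant in the current patch's spirit: leave the flag alone, since a
    // partial rebuild is no worse than the pre-existing index.
    public void rebuildKeepingFlag(Runnable rebuildTask)
    {
        rebuildTask.run();
    }

    public boolean isBuilt()
    {
        return built.get();
    }
}
{noformat}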
[jira] [Updated] (CASSANDRA-3631) While sleeping for RING_DELAY, bootstrapping nodes do not show as joining in the ring (or at all)
[ https://issues.apache.org/jira/browse/CASSANDRA-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-3631: - Attachment: 0001-add-initializing-status-in-nodetool-for-3631.patch Hi Brandon, Let me know if the attached patch is sufficient... It looks like the following.
Address        DC       Rack  Status  State   Load       Owns    Token
                                                                  143633478586163499463326301508681906517
10.123.42.165  us-east  1a    Down    Init    ?          ?       ?
10.42.134.229  us-east  1a    Up      Normal  1.74 GB    35.40%  33724529808132598296109669138912087817
10.93.19.6     us-east  1a    Down    Normal  1.78 GB    38.69%  99546828780538918038465713665698202555
10.93.74.164   us-east  1a    Up      Normal  120.37 MB  2.53%   103845090001698524309715695190561870103
10.123.59.26   us-east  1a    Up      Normal  1.15 GB    23.39%  143633478586163499463326301508681906517
While sleeping for RING_DELAY, bootstrapping nodes do not show as joining in the ring (or at all) - Key: CASSANDRA-3631 URL: https://issues.apache.org/jira/browse/CASSANDRA-3631 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.0 Reporter: Brandon Williams Assignee: Vijay Priority: Minor Fix For: 1.0.7 Attachments: 0001-add-initializing-status-in-nodetool-for-3631.patch As the title says, the nodes do not show in the ring until they are actually in the token selection/streaming phase. This appears due to CASSANDRA-957, but now can be further exacerbated by longer sleep times for CASSANDRA-3629. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3112) Make repair fail when an unexpected error occurs
[ https://issues.apache.org/jira/browse/CASSANDRA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176420#comment-13176420 ] Vijay commented on CASSANDRA-3112: --
bq. But do you know what the reason is for it making no progress? Because unless we know what can cause it, we're not sure what to fix.
It is usually in the streaming phase; I think adding a SoTimeout might fix it... but it is so random that I couldn't reproduce it in my tests, though I'm definitely seeing it in production.
bq. How can we lose messages? Isn't TCP supposed to avoid this?
Once you send the message, the other node might get restarted (without running validation or starting anything) or the socket can get reset. Actually, I think when I posted this message it was because of CASSANDRA-3577. There is nothing like hints or a retry for the messages sent for repairs. I understand this isn't in the scope of this ticket, but I still think there should be a way to orchestrate repairs with slightly more complicated logic, and I will try to do some parts of it in the other ticket.
Make repair fail when an unexpected error occurs Key: CASSANDRA-3112 URL: https://issues.apache.org/jira/browse/CASSANDRA-3112 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 1.1 Attachments: 0003-Report-streaming-errors-back-to-repair-v4.patch, 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch CASSANDRA-2433 makes it so that nodetool repair will fail if a node participating in repair dies before completing its part of the repair. This handles most of the situations where repair was previously hanging, but repair can still hang if an unexpected error occurs during either the merkle tree creation (say, an on-disk corruption triggers an IOError) or during streaming (though I'm not sure what could make streaming fail outside of 'one of the nodes died', besides a bug). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
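For reference, the "SoTimeout" mentioned above is the standard java.net socket read timeout. A hedged sketch of the idea (not Cassandra's streaming code; host, port, and timeouts are placeholders) showing how a stalled peer becomes an exception instead of a hang:
{noformat}
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// With a read timeout set, a peer that silently stops responding causes a
// SocketTimeoutException that the caller can turn into a failed (rather than
// hung) repair/streaming session.
public class TimedRead
{
    public static void main(String[] args) throws IOException
    {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress("10.0.0.1", 7000), 10000);
        socket.setSoTimeout(60000); // fail reads that stall for over a minute

        try (DataInputStream in = new DataInputStream(socket.getInputStream()))
        {
            int header = in.readInt();
            System.out.println("read header " + header);
        }
        catch (SocketTimeoutException e)
        {
            // Surface the stall instead of blocking forever.
            System.err.println("peer stopped responding: " + e.getMessage());
        }
        finally
        {
            socket.close();
        }
    }
}
{noformat}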
[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176455#comment-13176455 ] Matt Stump commented on CASSANDRA-2474: --- I wanted to bring this up because it hasn't been mentioned yet, and it's currently a topic of discussion on the hector-users list: for query results of composite columns, are you going to deserialize the column name or leave it as an opaque blob? For the current implementation of composite columns in Hector, the type information for dynamic composites is encoded in the name, but that information is lacking for the static variety. My understanding is that the type information is only stored at the CFDef level as the type alias, and could possibly be cached to aid in deserialization, but that seems like a bit of a hack. CQL support for compound columns Key: CASSANDRA-2474 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 Project: Cassandra Issue Type: New Feature Components: API, Core Reporter: Eric Evans Assignee: Pavel Yaskevich Labels: cql Fix For: 1.1 Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg For the most part, this boils down to supporting the specification of compound column names (the CQL syntax is colon-delimited terms), and then teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
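As background for the opaque-blob question: to my understanding, a static composite column name is encoded as a sequence of components, each consisting of a 2-byte big-endian length, the component bytes, and a trailing end-of-component byte. The sketch below splits such a name into raw components; treat it as a reading aid under that assumption, not a reference decoder. Without the CFDef type aliases the components remain opaque byte strings, which is exactly the issue raised.
{noformat}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Splits a (presumed) static composite column name into its raw components:
// per component, a 2-byte unsigned length, the bytes, then one
// end-of-component byte. The components themselves stay untyped.
public class CompositeNameReader
{
    public static List<ByteBuffer> split(ByteBuffer name)
    {
        List<ByteBuffer> components = new ArrayList<ByteBuffer>();
        ByteBuffer buffer = name.duplicate();
        while (buffer.remaining() > 0)
        {
            int length = buffer.getShort() & 0xFFFF;   // unsigned 16-bit length
            ByteBuffer component = buffer.slice();
            component.limit(length);
            components.add(component);
            buffer.position(buffer.position() + length);
            buffer.get();                              // skip end-of-component byte
        }
        return components;
    }
}
{noformat}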
[jira] [Created] (CASSANDRA-3678) New Pluggable Compaction to handle Capped Rows / Super Columns
New Pluggable Compaction to handle Capped Rows / Super Columns -- Key: CASSANDRA-3678 URL: https://issues.apache.org/jira/browse/CASSANDRA-3678 Project: Cassandra Issue Type: New Feature Components: API, Contrib, Core Environment: ALL Reporter: Praveen Baratam Now that Pluggable Compaction is released, it's feasible to implement a CompactionStrategy that handles Capped (limited in size) Rows or SuperColumns in a ColumnFamily. This feature was requested many times on the mailing lists by many people, including me. http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Use-Case-scenario-Keeping-a-window-of-data-online-analytics-td4694907.html The above thread was quoted in Cassandra - Use Cases too. Reading and interpreting many conversations about this issue, I could infer that it was discussed in two flavors: 1. Enforcing Max Columns per Row/SC 2. Sliding Time Window The memtable/SSTable approach of Cassandra is often cited as a limiting factor for a clean implementation. In my view, the above-mentioned SSTable approach could mean some trade-offs and clever engineering, but it is still doable. This feature is not intended to offer a drop-in replacement for specialized tools like RRDTool, jRobin, etc., but to decrease the overhead of retrofitting such functionality into Cassandra and to find an approach that achieves the principal purpose of discarding obsolete data while stretching only as far as necessary. This ticket is to discuss ideas and implementation details of such a compaction strategy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
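A self-contained sketch of the two flavors named above, applied while merging a single row during compaction: a sliding time window (drop columns older than a retention period) and a per-row cap (keep only the N newest columns). The Column type and method names are invented for illustration; this is not a real CompactionStrategy and does not use Cassandra's internal compaction API.
{noformat}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Conceptual sketch only: filter a row's columns by age, then cap the count.
public class CappedRowSketch
{
    public static class Column
    {
        final String name;
        final long timestampMicros;

        Column(String name, long timestampMicros)
        {
            this.name = name;
            this.timestampMicros = timestampMicros;
        }
    }

    public static List<Column> compactRow(List<Column> columns, long nowMicros,
                                           long windowMicros, int maxColumns)
    {
        List<Column> kept = new ArrayList<Column>();
        for (Column c : columns)
        {
            // Sliding time window: discard anything older than the window.
            if (nowMicros - c.timestampMicros <= windowMicros)
                kept.add(c);
        }
        // Per-row cap: keep only the newest maxColumns entries.
        kept.sort(Comparator.comparingLong((Column c) -> c.timestampMicros).reversed());
        return kept.size() <= maxColumns
               ? kept
               : new ArrayList<Column>(kept.subList(0, maxColumns));
    }
}
{noformat}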
[jira] [Updated] (CASSANDRA-3678) New Pluggable Compaction to handle Capped Rows / Super Columns
[ https://issues.apache.org/jira/browse/CASSANDRA-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Praveen Baratam updated CASSANDRA-3678: --- Description: Now that Pluggable Compaction is released, it's feasible to implement a CompactionStrategy that handles Capped (limited in size) Rows or SuperColumns in a ColumnFamily. This feature was requested many times on the mailing lists by many people, including me. http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Use-Case-scenario-Keeping-a-window-of-data-online-analytics-td4694907.html The above thread was quoted in Cassandra - Use Cases too. Reading and interpreting many conversations about this issue, I could infer that it was discussed in two flavors: 1. Enforcing Max Columns per Row/SC 2. Sliding Time Window The memtable/SSTable approach of Cassandra is often cited as a limiting factor for a clean implementation. In my view, the above-mentioned SSTable approach could mean some trade-offs and clever engineering, but it is still doable. This feature is not intended to offer a drop-in replacement for specialized tools like RRDTool, jRobin, etc., but to decrease the overhead of retrofitting such functionality into Cassandra and to find an approach that achieves the principal purpose of discarding obsolete data while stretching only as far as necessary. was: Now that Pluggable Compaction is released, it's feasible to implement a CompactionStrategy that handles Capped (limited in size) Rows or SuperColumns in a ColumnFamily. This feature was requested many times on the mailing lists by many people, including me. http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Use-Case-scenario-Keeping-a-window-of-data-online-analytics-td4694907.html The above thread was quoted in Cassandra - Use Cases too. Reading and interpreting many conversations about this issue, I could infer that it was discussed in two flavors: 1. Enforcing Max Columns per Row/SC 2. Sliding Time Window The memtable/SSTable approach of Cassandra is often cited as a limiting factor for a clean implementation. In my view, the above-mentioned SSTable approach could mean some trade-offs and clever engineering, but it is still doable. This feature is not intended to offer a drop-in replacement for specialized tools like RRDTool, jRobin, etc., but to decrease the overhead of retrofitting such functionality into Cassandra and to find an approach that achieves the principal purpose of discarding obsolete data while stretching only as far as necessary. This ticket is to discuss ideas and implementation details of such a compaction strategy. New Pluggable Compaction to handle Capped Rows / Super Columns -- Key: CASSANDRA-3678 URL: https://issues.apache.org/jira/browse/CASSANDRA-3678 Project: Cassandra Issue Type: New Feature Components: API, Contrib, Core Environment: ALL Reporter: Praveen Baratam Labels: features Original Estimate: 672h Remaining Estimate: 672h Now that Pluggable Compaction is released, it's feasible to implement a CompactionStrategy that handles Capped (limited in size) Rows or SuperColumns in a ColumnFamily. This feature was requested many times on the mailing lists by many people, including me. http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Use-Case-scenario-Keeping-a-window-of-data-online-analytics-td4694907.html The above thread was quoted in Cassandra - Use Cases too. Reading and interpreting many conversations about this issue, I could infer that it was discussed in two flavors: 1. Enforcing Max Columns per Row/SC 2. Sliding Time Window The memtable/SSTable approach of Cassandra is often cited as a limiting factor for a clean implementation. In my view, the above-mentioned SSTable approach could mean some trade-offs and clever engineering, but it is still doable. This feature is not intended to offer a drop-in replacement for specialized tools like RRDTool, jRobin, etc., but to decrease the overhead of retrofitting such functionality into Cassandra and to find an approach that achieves the principal purpose of discarding obsolete data while stretching only as far as necessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira