[jira] [Issue Comment Edited] (CASSANDRA-4138) Add varint encoding to Serializing Cache
[ https://issues.apache.org/jira/browse/CASSANDRA-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255193#comment-13255193 ]

Vijay edited comment on CASSANDRA-4138 at 4/16/12 11:57 PM:

Hi Pavel, the attached patch has the recommended changes, except:
{quote}
I think the DBConstants class should now be changed to only share sizeof(type) methods and become something like DBConstants.{native, vint}.sizeof(type)
{quote}
I will mark it private once the parent ticket is complete (Messaging and SSTable formats); currently it is called from other places too.

was (Author: vijay2...@yahoo.com):
Hi Pavel, the attached patch has the recommended changes, except:
{comment}
I think the DBConstants class should now be changed to only share sizeof(type) methods and become something like DBConstants.{native, vint}.sizeof(type)
{comment}
I will mark it private once the parent ticket is complete (Messaging and SSTable formats); currently it is called from other places too.

Add varint encoding to Serializing Cache

Key: CASSANDRA-4138
URL: https://issues.apache.org/jira/browse/CASSANDRA-4138
Project: Cassandra
Issue Type: Sub-task
Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Fix For: 1.2
Attachments: 0001-CASSANDRA-4138-Take1.patch, 0001-CASSANDRA-4138-V2.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-4138) Add varint encoding to Serializing Cache
[ https://issues.apache.org/jira/browse/CASSANDRA-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255193#comment-13255193 ]

Vijay edited comment on CASSANDRA-4138 at 4/16/12 11:58 PM:

Hi Pavel, the attached patch has the recommended changes, except: I think the DBConstants class should now be changed to only share sizeof(type) methods and become something like DBConstants.{native, vint}.sizeof(type). I will mark it private once the parent ticket is complete (Messaging and SSTable formats); currently it is called from other places too.

was (Author: vijay2...@yahoo.com):
Hi Pavel, the attached patch has the recommended changes, except:
{quote}
I think the DBConstants class should now be changed to only share sizeof(type) methods and become something like DBConstants.{native, vint}.sizeof(type)
{quote}
I will mark it private once the parent ticket is complete (Messaging and SSTable formats); currently it is called from other places too.

Add varint encoding to Serializing Cache

Key: CASSANDRA-4138
URL: https://issues.apache.org/jira/browse/CASSANDRA-4138
Project: Cassandra
Issue Type: Sub-task
Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Fix For: 1.2
Attachments: 0001-CASSANDRA-4138-Take1.patch, 0001-CASSANDRA-4138-V2.patch
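The comments above discuss varint encoding for the serializing cache. As a rough illustration of why variable-length integers save memory versus DataInputStream's fixed-width fields, here is a minimal sketch of a 7-bits-per-byte encoding; this is an assumed scheme for illustration, not necessarily the codec in the attached patches.

```java
import java.io.ByteArrayOutputStream;

public class VintSketch
{
    // Encode a non-negative long using 7 data bits per byte; the high bit
    // of each byte marks "more bytes follow".
    static byte[] encode(long value)
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0)
        {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args)
    {
        // A small value costs 1 byte instead of a fixed 8.
        System.out.println(encode(127L).length);  // 1
        System.out.println(encode(300L).length);  // 2
    }
}
```

Most column and row lengths in a cache entry are small, which is where the ~10% saving reported in the comments below would come from.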
[jira] [Issue Comment Edited] (CASSANDRA-4140) Build stress classes in a location that allows tools/stress/bin/stress to find them
[ https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253918#comment-13253918 ]

Vijay edited comment on CASSANDRA-4140 at 4/14/12 1:06 AM:

Done!

was (Author: vijay2...@yahoo.com):
Done and tested!

Build stress classes in a location that allows tools/stress/bin/stress to find them

Key: CASSANDRA-4140
URL: https://issues.apache.org/jira/browse/CASSANDRA-4140
Project: Cassandra
Issue Type: Improvement
Components: Tools
Affects Versions: 1.2
Reporter: Nick Bailey
Assignee: Vijay
Priority: Trivial
Fix For: 1.2
Attachments: 0001-CASSANDRA-4140-v2.patch, 0001-CASSANDRA-4140.patch

Right now it's hard to run stress from a checkout of trunk. You need to run 'ant artifacts' and then run the stress tool from the generated artifacts. A discussion on IRC came up with the proposal to move stress to the main jar, put the stress/stressd bash scripts in bin/, and drop the tools directory altogether. It will be easier for users to find that way and will make running stress from a checkout much easier.
[jira] [Issue Comment Edited] (CASSANDRA-2635) make cache skipping optional
[ https://issues.apache.org/jira/browse/CASSANDRA-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252775#comment-13252775 ]

Vijay edited comment on CASSANDRA-2635 at 4/12/12 7:49 PM:

+1 for the rest of the patch; how about the following name and comments?
{code}
# The following setting populates the page cache on memtable flush and compaction
# WARNING: Enable this setting only node's data-size can fit in memory.
populate_cache_on_flush: false
{code}

was (Author: vijay2...@yahoo.com):
+1 for the rest of the patch; how about the following name and comments?
# The following setting populates the page cache on memtable flush and compaction
# WARNING: Enable this setting only node's data-size can fit in memory.
populate_cache_on_flush: false

make cache skipping optional

Key: CASSANDRA-2635
URL: https://issues.apache.org/jira/browse/CASSANDRA-2635
Project: Cassandra
Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Harish Doddi
Priority: Minor
Attachments: CASSANDRA-2635-075.txt, CASSANDRA-2635-trunk-1.txt, CASSANDRA-2635-trunk.txt

We've applied this patch locally in order to turn off page skipping; not completely, but only for compaction/repair situations where it can be directly detrimental, in the sense of causing data to become cold even though your entire data set fits in memory. It's better than completely disabling DONTNEED, because the cache skipping does make sense and has no relevant (that I can see) detrimental effects in some cases, like when dumping caches. The patch is against 0.7.5 right now, but if the change is desired I can make a patch for trunk.

Also, the name of the configuration option is dubious, since saying 'false' does not actually turn it off completely. I wasn't able to figure out a short name that conveyed the functionality, however. A related concern, as discussed in CASSANDRA-1902, is that the cache skipping isn't fsync'ing and so won't work reliably on writes. If the feature is to be retained, that's something to fix in a different ticket. A question is also whether to keep the default of true or change it to false. I'm leaning toward false, since it's detrimental in the easy cases with little data. In big cases with lots of data, people will have to think and tweak anyway, so better to put the burden on that end.
[jira] [Issue Comment Edited] (CASSANDRA-2635) make cache skipping optional
[ https://issues.apache.org/jira/browse/CASSANDRA-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252775#comment-13252775 ]

Vijay edited comment on CASSANDRA-2635 at 4/12/12 7:52 PM:

+1 for the rest of the patch; how about the following name and comments?
{code}
# The following setting populates the page cache on memtable flush and compaction
# WARNING: Enable this setting only when the node's data fits in memory.
populate_cache_on_flush: false
{code}

was (Author: vijay2...@yahoo.com):
+1 for the rest of the patch; how about the following name and comments?
{code}
# The following setting populates the page cache on memtable flush and compaction
# WARNING: Enable this setting only node's data-size can fit in memory.
populate_cache_on_flush: false
{code}

make cache skipping optional

Key: CASSANDRA-2635
URL: https://issues.apache.org/jira/browse/CASSANDRA-2635
Project: Cassandra
Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Harish Doddi
Priority: Minor
Attachments: CASSANDRA-2635-075.txt, CASSANDRA-2635-trunk-1.txt, CASSANDRA-2635-trunk.txt

We've applied this patch locally in order to turn off page skipping; not completely, but only for compaction/repair situations where it can be directly detrimental, in the sense of causing data to become cold even though your entire data set fits in memory. It's better than completely disabling DONTNEED, because the cache skipping does make sense and has no relevant (that I can see) detrimental effects in some cases, like when dumping caches. The patch is against 0.7.5 right now, but if the change is desired I can make a patch for trunk. Also, the name of the configuration option is dubious, since saying 'false' does not actually turn it off completely. I wasn't able to figure out a short name that conveyed the functionality, however.

A related concern, as discussed in CASSANDRA-1902, is that the cache skipping isn't fsync'ing and so won't work reliably on writes. If the feature is to be retained, that's something to fix in a different ticket. A question is also whether to keep the default of true or change it to false. I'm leaning toward false, since it's detrimental in the easy cases with little data. In big cases with lots of data, people will have to think and tweak anyway, so better to put the burden on that end.
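The proposed flag's semantics can be paraphrased as a small decision rule: skip (drop) the page cache for flush/compaction writes unless the operator opts in to populating it. The class and method below are illustrative assumptions, not Cassandra's actual flush path.

```java
// Sketch of the populate_cache_on_flush flag's intent (assumed names).
public class CacheSkipSketch
{
    // When false (the proposed default), pages written during flush/compaction
    // are dropped from the OS page cache (e.g. via a DONTNEED advice call).
    static boolean populateCacheOnFlush = false;

    static boolean shouldDropFromPageCache(boolean isFlushOrCompaction)
    {
        return isFlushOrCompaction && !populateCacheOnFlush;
    }

    public static void main(String[] args)
    {
        System.out.println(shouldDropFromPageCache(true));  // true with the default
    }
}
```

With the default of false, a node whose data set exceeds memory avoids evicting hot reads with freshly compacted cold data, which is the trade-off the WARNING comment is about.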
[jira] [Issue Comment Edited] (CASSANDRA-4138) Add varint encoding to Serializing Cache
[ https://issues.apache.org/jira/browse/CASSANDRA-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253028#comment-13253028 ]

Vijay edited comment on CASSANDRA-4138 at 4/13/12 1:04 AM:

The attached patch is the first attempt to add varint encoding to Cassandra. It saves us around 10% of the memory compared to the normal DataInputStream (based on a simple test via the Stress Tool). Once this gets committed I will work on the rest of the pieces.

was (Author: vijay2...@yahoo.com):
The attached patch is the first attempt to add varint encoding to Cassandra. It saves us around 10% of the memory compared to the normal DataInputStream. Once this gets committed I will work on the rest of the pieces.

Add varint encoding to Serializing Cache

Key: CASSANDRA-4138
URL: https://issues.apache.org/jira/browse/CASSANDRA-4138
Project: Cassandra
Issue Type: Sub-task
Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Fix For: 1.2
Attachments: 0001-CASSANDRA-4138-Take1.patch
[jira] [Issue Comment Edited] (CASSANDRA-4100) Make scrub and cleanup operations throttled
[ https://issues.apache.org/jira/browse/CASSANDRA-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246370#comment-13246370 ]

Vijay edited comment on CASSANDRA-4100 at 4/4/12 3:41 PM:

"I think the Throttle object in CompactionController should be non-static since compactions may run in parallel."

Exactly, that's why static is better. Parallel compaction is not a problem per se (ParallelCompactionIterable.getReduced() will take care of it); the problem is compactions running one after the other (lots of small compactions). Let me know if everything else is OK; I will rebase to 1.0.10 and move away from static (I am OK either way), if needed. Thanks!

was (Author: vijay2...@yahoo.com):
"I think the Throttle object in CompactionController should be non-static since compactions may run in parallel."

Exactly, that's why static is better. Parallel compaction is not a problem per se (ParallelCompactionIterable.getReduced() will take care of it); the problem is compactions running one after the other (lots of small compactions). Let me know if everything else is OK; I will rebase to 1.0.10, if needed. Thanks!

Make scrub and cleanup operations throttled

Key: CASSANDRA-4100
URL: https://issues.apache.org/jira/browse/CASSANDRA-4100
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Labels: compaction
Fix For: 1.0.10
Attachments: 0001-CASSANDRA-4100.patch

Scrub and cleanup operations are not throttled; it would be nice to throttle them, since otherwise we are likely to run into IO issues while running them on a live cluster.
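The arithmetic behind this kind of byte-rate throttling can be sketched as a pure function: given bytes processed, a target rate, and elapsed time, compute how long to sleep to stay at or below the target. This is back-of-envelope illustration, not Cassandra's actual Throttle class.

```java
// Sketch of throttle arithmetic (assumed shape, not Cassandra's Throttle API).
public class ThrottleSketch
{
    // How long to sleep so that 'bytes' processed over 'elapsedMillis'
    // averages out to at most 'bytesPerSecond'.
    static long sleepMillis(long bytes, long bytesPerSecond, long elapsedMillis)
    {
        long expectedMillis = bytes * 1000 / bytesPerSecond;
        return Math.max(0, expectedMillis - elapsedMillis);
    }

    public static void main(String[] args)
    {
        // 2 MB processed in 500 ms at a 1 MB/s target -> sleep 1500 ms.
        System.out.println(sleepMillis(2_000_000, 1_000_000, 500));  // 1500
    }
}
```

Making the throttle state static (shared), as argued in the comment, means many small back-to-back compactions draw from one rate budget instead of each getting the full rate.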
[jira] [Issue Comment Edited] (CASSANDRA-3997) Make SerializingCache Memory Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244479#comment-13244479 ]

Vijay edited comment on CASSANDRA-3997 at 4/2/12 7:20 PM:

Segfaults happen in multiple places (opening a file, accessing malloc, while calling free, and in a lot of unrelated cases)... Unless we open the JDK source code and figure out how it is structured, it is hard to say when exactly it can fail (let me know if you want to take a look at the hs_err*.log). On the bright side, at least we can isolate this by calling via JNI, and we don't see the issue when loading JEMalloc via LD_LIBRARY_PATH. In v2 I removed the synchronization; I have also attached it here (please note the yaml setting is not included, just to hide it for now). Thanks! Note: the jemalloc 2.2.5 release works fine, and so does the git/dev branch.

was (Author: vijay2...@yahoo.com):
Segfaults happen in multiple places (opening a file, accessing malloc, while calling free, and in a lot of unrelated cases)... Unless we open the JDK source code and figure out how it is structured, it is hard to say when exactly it can fail (let me know if you want to take a look at the hs_err*.log). On the bright side, at least we can isolate this by calling via JNI. In v2 I removed the synchronization; I have also attached it here (please note the yaml setting is not included, just to hide it for now). Thanks! Note: the jemalloc 2.2.5 release works fine, and so does the git/dev branch.

Make SerializingCache Memory Pluggable

Key: CASSANDRA-3997
URL: https://issues.apache.org/jira/browse/CASSANDRA-3997
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Labels: cache
Fix For: 1.2
Attachments: 0001-CASSANDRA-3997-v2.patch, 0001-CASSANDRA-3997.patch, jna.zip

The serializing cache uses native malloc and free; by making FM pluggable, users will have a choice of gcc malloc, TCMalloc or JEMalloc as needed. Initial tests show less fragmentation in JEMalloc, but the only issue is that both TCMalloc and JEMalloc are kind of single-threaded (at least, they crash in my test otherwise).
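A pluggable memory backend for the cache usually reduces to a small allocator interface that implementations (glibc malloc, TCMalloc, JEMalloc) plug into. The interface and names below are illustrative assumptions, not the API in the attached patches; a real implementation would delegate to native malloc/free via JNA or JNI.

```java
// Hypothetical shape of a pluggable allocator (assumed names).
public class AllocatorSketch
{
    interface IAllocator
    {
        long allocate(long size);       // returns a native address ("peer")
        void free(long peer, long size);
    }

    // Stand-in implementation that only tracks outstanding bytes; a real one
    // would call into malloc/free or je_malloc/je_free.
    static class CountingAllocator implements IAllocator
    {
        long outstanding;
        public long allocate(long size) { outstanding += size; return size; }
        public void free(long peer, long size) { outstanding -= size; }
    }

    public static void main(String[] args)
    {
        CountingAllocator a = new CountingAllocator();
        long peer = a.allocate(4096);
        a.free(peer, 4096);
        System.out.println(a.outstanding);  // 0
    }
}
```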
[jira] [Issue Comment Edited] (CASSANDRA-4099) IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress
[ https://issues.apache.org/jira/browse/CASSANDRA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240813#comment-13240813 ]

Vijay edited comment on CASSANDRA-4099 at 3/28/12 11:07 PM:

Thanks Brandon. CASSANDRA-4101 looks like a better solution, but it is not only Streaming that sets the version; Gossip or any communication sets it, as the following does:
{code}
from = msg.getFrom(); // why? see CASSANDRA-4099
if (version > MessagingService.current_version)
{
    // save the endpoint so gossip will reconnect to it
    Gossiper.instance.addSavedEndpoint(from);
    logger.info("Received " + (isStream ? "streaming " : "") + "connection from newer protocol version. Ignoring");
}
else if (msg != null)
{
    Gossiper.instance.setVersion(from, version);
    logger.debug("set version for {} to {}", from, version);
}
{code}

was (Author: vijay2...@yahoo.com):
Thanks Brandon. CASSANDRA-4101 looks like a better solution, but it is not only Streaming that sets the version; Gossip or any communication sets it, as the following does:
code
from = msg.getFrom(); // why? see CASSANDRA-4099
if (version > MessagingService.current_version)
{
    // save the endpoint so gossip will reconnect to it
    Gossiper.instance.addSavedEndpoint(from);
    logger.info("Received " + (isStream ? "streaming " : "") + "connection from newer protocol version. Ignoring");
}
else if (msg != null)
{
    Gossiper.instance.setVersion(from, version);
    logger.debug("set version for {} to {}", from, version);
}
/code

IncomingTCPConnection recognizes from by doing socket.getInetAddress() instead of BroadCastAddress

Key: CASSANDRA-4099
URL: https://issues.apache.org/jira/browse/CASSANDRA-4099
Project: Cassandra
Issue Type: Bug
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Attachments: 0001-CASSANDRA-4099.patch

Change this.from = socket.getInetAddress() to understand the broadcast IP. The problem is that we don't know the broadcast address until the first packet is received; this ticket is to work around the problem until the first packet is read.
[jira] [Issue Comment Edited] (CASSANDRA-3997) Make SerializingCache Memory Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237722#comment-13237722 ]

Vijay edited comment on CASSANDRA-3997 at 3/24/12 11:15 PM:

Have an update. Jason Evans says: "LD_PRELOAD'ing jemalloc should be okay as long as the JVM doesn't statically link a different malloc implementation. I expect that if it isn't safe, you'll experience crashes quite early on, so give it a try and see what happens." I have also confirmed that Unsafe isn't statically linked to native malloc by adding a printf in the malloc C code which counts the number of times it is called. Looks like PRELOAD is a better option. I am running a long-running test and will close this ticket once it is successful. Thanks!

was (Author: vijay2...@yahoo.com):
Have an update. Jason Evan's says: "LD_PRELOAD'ing jemalloc should be okay as long as the JVM doesn't statically link a different malloc implementation. I expect that if it isn't safe, you'll experience crashes quite early on, so give it a try and see what happens." I have also confirmed that Unsafe isn't statically linked to native malloc by adding a printf in the malloc C code which counts the number of times it is called. Looks like PRELOAD is a better option. I am running a long-running test and will close this ticket once it is successful. Thanks!

Make SerializingCache Memory Pluggable

Key: CASSANDRA-3997
URL: https://issues.apache.org/jira/browse/CASSANDRA-3997
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Labels: cache
Fix For: 1.2
Attachments: 0001-CASSANDRA-3997.patch, jna.zip

The serializing cache uses native malloc and free; by making FM pluggable, users will have a choice of gcc malloc, TCMalloc or JEMalloc as needed. Initial tests show less fragmentation in JEMalloc, but the only issue is that both TCMalloc and JEMalloc are kind of single-threaded (at least, they crash in my test otherwise).
[jira] [Issue Comment Edited] (CASSANDRA-3690) Streaming CommitLog backup
[ https://issues.apache.org/jira/browse/CASSANDRA-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225656#comment-13225656 ]

Vijay edited comment on CASSANDRA-3690 at 3/8/12 11:12 PM:

Hi Jonathan, the attached patch does exactly what we discussed here. It's almost the same as PostgreSQL :) In addition, we can start the node with -Dcassandra.join_ring=false and then use JMX to restore files one by one. Please let me know.

was (Author: vijay2...@yahoo.com):
Hi Jonathan, the attached patch does exactly what we discussed here. It's almost the same as Postgress :) In addition, we can start the node with -Dcassandra.join_ring=false and then use JMX to restore files one by one. Please let me know.

Streaming CommitLog backup

Key: CASSANDRA-3690
URL: https://issues.apache.org/jira/browse/CASSANDRA-3690
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Fix For: 1.1.1
Attachments: 0001-CASSANDRA-3690-v2.patch, 0001-Make-commitlog-recycle-configurable.patch, 0002-support-commit-log-listener.patch, 0003-helper-jmx-methods.patch, 0004-external-commitlog-with-sockets.patch, 0005-cmmiting-comments-to-yaml.patch

Problems with the current SST backups:
1) The current backup doesn't allow us to restore to a point in time (within an SST).
2) The current SST implementation needs the backup to read from the filesystem, and hence causes additional IO on the normal operational disks.
3) In 1.0 we removed the flush interval and size settings that trigger a flush per CF; for some use cases with fewer writes, it becomes increasingly difficult to time it right.
4) Use cases which need external (non-Cassandra) BI need the data at regular intervals rather than waiting for longer or unpredictable intervals.

Disadvantages of the new solution:
1) Overhead in processing the mutations during the recovery phase.
2) A more complicated solution than just copying the file to the archive.

Additional advantages: online and offline restore; close-to-live incremental backup.

Note: If the listener agent gets restarted, it is the agent's responsibility to stream the files that were missed or incomplete.

There are 3 options in the initial implementation:
1) Backup - once a socket is connected, we switch the commit log and send new updates via the socket.
2) Stream - takes the absolute path of a file, reads the file, and sends its updates via the socket.
3) Restore - receives the serialized bytes and applies the mutations.

Side note (not related to this patch as such): the agent which will take the incremental backup is planned to be open sourced soon (name: Priam).
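The three listener operations above can be sketched as a small dispatch; the enum and method below are illustrative assumptions, not the patch's actual API.

```java
// Illustrative dispatch over the three listener operations described above.
public class CommitLogListenerSketch
{
    enum Op { BACKUP, STREAM, RESTORE }

    static String describe(Op op)
    {
        switch (op)
        {
            case BACKUP:  return "switch the commit log and send new updates over the socket";
            case STREAM:  return "read the named segment file and send its updates over the socket";
            case RESTORE: return "deserialize the received bytes and apply the mutations";
            default:      throw new AssertionError(op);
        }
    }

    public static void main(String[] args)
    {
        for (Op op : Op.values())
            System.out.println(op + ": " + describe(op));
    }
}
```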
[jira] [Issue Comment Edited] (CASSANDRA-3997) Make SerializingCache Memory Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224533#comment-13224533 ]

Vijay edited comment on CASSANDRA-3997 at 3/7/12 5:40 PM:

Ohhh, sorry for the confusion.

JEMalloc's case: malloc/free should be done by any ONE thread at a time. The test had 100 threads doing malloc/free, but only one will actually malloc/free at a time, and the time taken shows the raw speed.

TCMalloc's case: only one thread should be doing malloc and free. (Even after this it was crashing randomly because of illegal memory access, hence I said JEMalloc hasn't crashed.) The test code does exactly the above.

The implementation should deal with this and avoid contending for malloc and free with multiple threads. Once we deal with that, it works well.

was (Author: vijay2...@yahoo.com):
Ohhh, sorry for the confusion.

JEMalloc's case: malloc/free should only be done by any one thread at a time. The test had 100 threads doing malloc/free, but only one will actually malloc/free at a time, and the time taken shows the raw speed.

TCMalloc's case: one thread should be doing malloc and free. (Even making this single-threaded, it was crashing randomly because of illegal memory access errors, hence I said JEMalloc hasn't crashed.) The test code does exactly the above.

The implementation should deal with this and avoid contending for malloc and free with multiple threads. Once we deal with that, it works well.

Make SerializingCache Memory Pluggable

Key: CASSANDRA-3997
URL: https://issues.apache.org/jira/browse/CASSANDRA-3997
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Labels: cache
Fix For: 1.2
Attachments: jna.zip

The serializing cache uses native malloc and free; by making FM pluggable, users will have a choice of gcc malloc, TCMalloc or JEMalloc as needed. Initial tests show less fragmentation in JEMalloc, but the only issue is that both TCMalloc and JEMalloc are kind of single-threaded (at least, they crash in my test otherwise).
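Funneling all malloc/free calls through a single lock, as the comment above describes, can be sketched as follows; the stub "native" call is a placeholder for a real JNI/JNA binding, and the names are assumptions.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongUnaryOperator;

// Sketch of serializing allocator calls so only one thread is ever inside
// the native malloc/free at a time (assumed shape, not the patch's code).
public class SerializedAllocSketch
{
    private static final ReentrantLock lock = new ReentrantLock();

    static long guarded(LongUnaryOperator nativeCall, long arg)
    {
        lock.lock();
        try
        {
            return nativeCall.applyAsLong(arg);  // one thread in here at a time
        }
        finally
        {
            lock.unlock();
        }
    }

    public static void main(String[] args)
    {
        // Stub "malloc" standing in for a native binding.
        long fakeAddress = guarded(size -> 0x1000L, 64);
        System.out.println(Long.toHexString(fakeAddress));  // 1000
    }
}
```

The cost of this design is exactly the contention the comment warns about: the lock makes allocation correct with these allocators, at the price of serializing all allocating threads.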
[jira] [Issue Comment Edited] (CASSANDRA-3997) Make SerializingCache Memory Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222039#comment-13222039 ]

Vijay edited comment on CASSANDRA-3997 at 3/4/12 10:26 PM:

Attached are the test classes used for the test. Results on CentOS:
nowiki
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.MallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26049380 45638840 0 169116 996172
-/+ buffers/cache: 24884092 46804128
Swap: 0 0 0
Starting Test!
Total bytes read: 101422934016
Time taken: 25407
total used free shared buffers cached
Mem: 71688220 31981924 39706296 0 169116 996312
-/+ buffers/cache: 30816496 40871724
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=/usr/local/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.TCMallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26054620 45633600 0 169128 996228
-/+ buffers/cache: 24889264 46798956
Swap: 0 0 0
Starting Test!
Total bytes read: 101304894464
Time taken: 46387
total used free shared buffers cached
Mem: 71688220 28535136 43153084 0 169128 996436
-/+ buffers/cache: 27369572 44318648
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=~/jemalloc-2.2.5/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=~/jemalloc-2.2.5/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.JEMallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26060604 45627616 0 169128 996300
-/+ buffers/cache: 24895176 46793044
Swap: 0 0 0
Starting Test!
Total bytes read: 101321734144
Time taken: 29937
total used free shared buffers cached
Mem: 71688220 28472436 43215784 0 169128 996440
-/+ buffers/cache: 27306868 44381352
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$
/nowiki

was (Author: vijay2...@yahoo.com):
Attached are the test classes used for the test. Results on CentOS:
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.MallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26049380 45638840 0 169116 996172
-/+ buffers/cache: 24884092 46804128
Swap: 0 0 0
Starting Test!
Total bytes read: 101422934016
Time taken: 25407
total used free shared buffers cached
Mem: 71688220 31981924 39706296 0 169116 996312
-/+ buffers/cache: 30816496 40871724
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=/usr/local/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.TCMallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26054620 45633600 0 169128 996228
-/+ buffers/cache: 24889264 46798956
Swap: 0 0 0
Starting Test!
Total bytes read: 101304894464
Time taken: 46387
total used free shared buffers cached
Mem: 71688220 28535136 43153084 0 169128 996436
-/+ buffers/cache: 27369572 44318648
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=~/jemalloc-2.2.5/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=~/jemalloc-2.2.5/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.JEMallocAllocator 5 200
total used free
[jira] [Issue Comment Edited] (CASSANDRA-3997) Make SerializingCache Memory Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222039#comment-13222039 ]

Vijay edited comment on CASSANDRA-3997 at 3/4/12 10:28 PM:

Attached are the test classes used for the test. Results on CentOS:
{noformat}
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.MallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26049380 45638840 0 169116 996172
-/+ buffers/cache: 24884092 46804128
Swap: 0 0 0
Starting Test!
Total bytes read: 101422934016
Time taken: 25407
total used free shared buffers cached
Mem: 71688220 31981924 39706296 0 169116 996312
-/+ buffers/cache: 30816496 40871724
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=/usr/local/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.TCMallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26054620 45633600 0 169128 996228
-/+ buffers/cache: 24889264 46798956
Swap: 0 0 0
Starting Test!
Total bytes read: 101304894464
Time taken: 46387
total used free shared buffers cached
Mem: 71688220 28535136 43153084 0 169128 996436
-/+ buffers/cache: 27369572 44318648
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=~/jemalloc-2.2.5/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=~/jemalloc-2.2.5/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.JEMallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26060604 45627616 0 169128 996300
-/+ buffers/cache: 24895176 46793044
Swap: 0 0 0
Starting Test!
Total bytes read: 101321734144
Time taken: 29937
total used free shared buffers cached
Mem: 71688220 28472436 43215784 0 169128 996440
-/+ buffers/cache: 27306868 44381352
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$
{noformat}

was (Author: vijay2...@yahoo.com):
Attached are the test classes used for the test. Results on CentOS:
nowiki
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.MallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26049380 45638840 0 169116 996172
-/+ buffers/cache: 24884092 46804128
Swap: 0 0 0
Starting Test!
Total bytes read: 101422934016
Time taken: 25407
total used free shared buffers cached
Mem: 71688220 31981924 39706296 0 169116 996312
-/+ buffers/cache: 30816496 40871724
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=/usr/local/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.TCMallocAllocator 5 200
total used free shared buffers cached
Mem: 71688220 26054620 45633600 0 169128 996228
-/+ buffers/cache: 24889264 46798956
Swap: 0 0 0
Starting Test!
Total bytes read: 101304894464
Time taken: 46387
total used free shared buffers cached
Mem: 71688220 28535136 43153084 0 169128 996436
-/+ buffers/cache: 27369572 44318648
Swap: 0 0 0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=~/jemalloc-2.2.5/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=~/jemalloc-2.2.5/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.JEMallocAllocator 5 200
total
[jira] [Issue Comment Edited] (CASSANDRA-3997) Make SerializingCache Memory Pluggable
[ https://issues.apache.org/jira/browse/CASSANDRA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13222039#comment-13222039 ] Vijay edited comment on CASSANDRA-3997 at 3/4/12 10:30 PM:
---
Attached are the test classes used for the test. Results on CentOS:
{noformat}
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.MallocAllocator 5 200
             total       used       free     shared    buffers     cached
Mem:      71688220   26049380   45638840          0     169116     996172
-/+ buffers/cache:   24884092   46804128
Swap:            0          0          0
Starting Test!
Total bytes read: 101422934016 Time taken: 25407
             total       used       free     shared    buffers     cached
Mem:      71688220   31981924   39706296          0     169116     996312
-/+ buffers/cache:   30816496   40871724
Swap:            0          0          0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=/usr/local/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=/usr/local/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.TCMallocAllocator 5 200
             total       used       free     shared    buffers     cached
Mem:      71688220   26054620   45633600          0     169128     996228
-/+ buffers/cache:   24889264   46798956
Swap:            0          0          0
Starting Test!
Total bytes read: 101304894464 Time taken: 46387
             total       used       free     shared    buffers     cached
Mem:      71688220   28535136   43153084          0     169128     996436
-/+ buffers/cache:   27369572   44318648
Swap:            0          0          0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ export LD_LIBRARY_PATH=~/jemalloc-2.2.5/lib/
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$ /etc/alternatives/jre_1.7.0/bin/java -Djava.library.path=~/jemalloc-2.2.5/lib/ -cp jna.jar:/apps/nfcassandra_server/lib/*:. com.sun.jna.JEMallocAllocator 5 200
             total       used       free     shared    buffers     cached
Mem:      71688220   26060604   45627616          0     169128     996300
-/+ buffers/cache:   24895176   46793044
Swap:            0          0          0
Starting Test!
Total bytes read: 101321734144 Time taken: 29937
             total       used       free     shared    buffers     cached
Mem:      71688220   28472436   43215784          0     169128     996440
-/+ buffers/cache:   27306868   44381352
Swap:            0          0          0
ending Test!
[vijay_tcasstest@vijay_tcass-i-a91ee8cd ~]$
{noformat}
The test shows around 4 GB of savings. Each run read roughly 101 GB (for example, 101321734144 bytes in the jemalloc run). The test uses a CLHM to hold on to the objects and releases them when the capacity (5K) is reached.
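The methodology described above (hold up to 5K allocations in a CLHM-style bounded map, release them once capacity is reached, and compare wall-clock time across allocators) can be sketched in plain Java. This is a simplified stand-in for the attached test classes, not a reproduction of them: it uses a bounded LinkedHashMap instead of a ConcurrentLinkedHashMap and ByteBuffer.allocateDirect instead of JNA-backed malloc/tcmalloc/jemalloc, so absolute numbers will differ.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

public class AllocatorChurnTest {
    static final int CAPACITY = 5_000;   // evict once this many buffers are live, as in the test
    static final int BUFFER_SIZE = 200;  // bytes per allocation, matching the "5 200" arguments

    // Bounded map standing in for the ConcurrentLinkedHashMap used in the ticket.
    static Map<Integer, ByteBuffer> newBoundedCache() {
        return new LinkedHashMap<Integer, ByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, ByteBuffer> eldest) {
                return size() > CAPACITY; // evicted buffer becomes unreachable and is freed later
            }
        };
    }

    // Returns total bytes allocated, so callers can verify the workload size.
    static long churn(int iterations) {
        Map<Integer, ByteBuffer> cache = newBoundedCache();
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            cache.put(i, ByteBuffer.allocateDirect(BUFFER_SIZE));
            total += BUFFER_SIZE;
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long bytes = churn(100_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Total bytes allocated: " + bytes + " Time taken: " + millis);
    }
}
```

Swapping the `allocateDirect` line for a native allocation call (for example, JNA's `Native.malloc`) is what distinguishes the malloc/tcmalloc/jemalloc runs above.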
[jira] [Issue Comment Edited] (CASSANDRA-3853) lower impact on old-gen promotion of slow nodes or connections
[ https://issues.apache.org/jira/browse/CASSANDRA-3853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211501#comment-13211501 ] Vijay edited comment on CASSANDRA-3853 at 2/19/12 7:33 PM:
---
Another thing we can do: once the coordinator has executed/sent the query to the remote node, it can drop its references to the command, avoiding promotion of those objects. We don't use this command for retries and the like (that is done by the client), so we don't need to hold these references. Once the references to the query are dropped, even with an rpc timeout of 10 seconds we hold references to only a few objects. Bonus: if we convert the table name and column family name to byte buffers and share references, we will save some memory there too.

lower impact on old-gen promotion of slow nodes or connections
--
Key: CASSANDRA-3853
URL: https://issues.apache.org/jira/browse/CASSANDRA-3853
Project: Cassandra
Issue Type: Improvement
Reporter: Peter Schuller
Assignee: Peter Schuller

Cassandra has the unfortunate behavior that when things are slow (nodes overloaded, etc.) there is a tendency for cascading failure if the system is overall under high load. This is generally true of most systems, but one way in which it is worse than desired is the way we queue up things between stages and outgoing requests.
First off, I use the following premises:
* The node is not running Azul ;)
* The total cost of ownership (in terms of allocation+collection) of an object that dies in old-gen is *much* higher than that of an object that dies in young gen.
* When CMS fails (concurrent mode failure or promotion failure), the resulting full GC is *serial*, does not use all cores, and is a stop-the-world pause.

Here is how this very effectively leads to cascading failure of the "fallen and can't get up" kind:
* Some node has a problem and is slow, even if just for a little while.
* Other nodes, especially neighbors in the replica set, start queueing up outgoing requests to the node for {{rpc_timeout}} milliseconds.
* You have a high (let's say write) throughput of 50 thousand or so requests per second per node.
* Because you want writes to be highly available and you are okay with high latency, you have an {{rpc_timeout}} of 60 seconds.
* The total amount of memory used for 60 * 50 000 requests is freaking high.
* The young gen GC pauses happen *much* more frequently than every 60 seconds.
* The result is that when a node goes down, other nodes in the replica set start *massively* increasing their promotion rate into old gen. A cluster whose nodes are normally completely fine, with slow nice promotion into old-gen, will now exhibit vastly different behavior than normal: while the total allocation rate doesn't change (or not very much, perhaps a little if clients are doing re-tries), the promotion rate into old-gen increases massively.
* This increases the total cost of ownership, and thus demand for CPU resources.
* You will *very* easily see CMS' sweeping phase not stand a chance to sweep up fast enough to keep up with the incoming request rate, even with a hugely inflated heap (CMS sweeping is not parallel, even though marking is).
* This leads to promotion failure/conc mode failure, and you fall into full GC.
* But now, your full GC is effectively stealing CPU resources, since you are forcing all cores but one to be completely idle on your system.
* Once you come out of GC, you have a huge backlog of work to do and get bombarded by other nodes that thought it was a good idea to retain 30 seconds worth of messages in *their* heap. So you're now being instantly shot down again by your neighbors, falling into the next full GC cycle even more easily than originally.
* Meanwhile, the fact that you are in full GC is causing your neighbors to enter the same predicament.

The solution to this in production is to rapidly restart all nodes in the replica set. Doing a live-change of RPC timeouts to something very very low might also do the trick. This is a specific instance of the overall problem that we
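The reference-dropping idea from the comment above can be sketched as follows. All names here (`CoordinatorSketch`, `PendingRequest`) are hypothetical illustrations, not Cassandra classes: the point is that once the serialized command has been written to the wire, the coordinator nulls the large payload reference so those bytes can die in young gen, and only a small tracking entry survives until the rpc timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

public class CoordinatorSketch {
    static final class PendingRequest {
        final long expiresAtMillis;                              // rpc_timeout deadline
        final AtomicReference<byte[]> payload = new AtomicReference<>();
        PendingRequest(byte[] serialized, long timeoutMillis) {
            payload.set(serialized);
            expiresAtMillis = System.currentTimeMillis() + timeoutMillis;
        }
    }

    final Map<Long, PendingRequest> inFlight = new ConcurrentHashMap<>();

    long send(long id, byte[] serializedCommand, long timeoutMillis) {
        PendingRequest p = new PendingRequest(serializedCommand, timeoutMillis);
        inFlight.put(id, p);
        // ... write serializedCommand to the socket here ...
        p.payload.set(null); // drop the large payload; only the small tracking entry survives
        return id;
    }

    boolean payloadDropped(long id) {
        PendingRequest p = inFlight.get(id);
        return p != null && p.payload.get() == null;
    }
}
```

With a 10-second timeout and a high write rate, the surviving per-request footprint is the small `PendingRequest` rather than the whole serialized command.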
[jira] [Issue Comment Edited] (CASSANDRA-3412) make nodetool ring ownership smarter
[ https://issues.apache.org/jira/browse/CASSANDRA-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210055#comment-13210055 ] Vijay edited comment on CASSANDRA-3412 at 2/17/12 5:02 AM:
---
I tried to make this generic enough; attached is a simple patch to display what we discussed. If we cannot figure out a keyspace to display, the default is the same as today.

Default:
{noformat}
Warning: Output contains ownership information which does not include replication factor.
Warning: Use nodetool ring keyspace to specify a keyspace.
Address         DC       Rack  Status  State   Load      Owns     Token
                                                                  141784319550391026443072753096942836216
107.21.183.168  us-east  1c    Up      Normal  38.57 KB  27.78%   18904575940052136859076367081351254013
79.125.30.58    eu-west  1c    Up      Normal  36.44 KB  22.22%   56713727820156410577229101239000783353
50.16.117.152   us-east  1c    Up      Normal  52.03 KB  11.11%   75618303760208547436305468319979289255
50.19.163.142   us-east  1c    Up      Normal  51.59 KB  33.33%   132332031580364958013534569558607324497
46.51.157.33    eu-west  1c    Up      Normal  31.64 KB  5.56%    141784319550391026443072753096942836216
{noformat}

Effective nt ring ('org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options={us-east:2,eu-west:1}):
{noformat}
[vijay_tcasstest@vijay_tcass-i-a6643ac3 ~]$ nt ring
Address         DC       Rack  Status  State   Load      Effective-Ownership  Token
                                                                              141784319550391026443072753096942836216
107.21.183.168  us-east  1c    Up      Normal  27.23 KB  66.67%               18904575940052136859076367081351254013
79.125.30.58    eu-west  1c    Up      Normal  31.51 KB  50.00%               56713727820156410577229101239000783353
50.16.117.152   us-east  1c    Up      Normal  47.1 KB   66.67%               75618303760208547436305468319979289255
50.19.163.142   us-east  1c    Up      Normal  42.52 KB  66.67%               132332031580364958013534569558607324497
46.51.157.33    eu-west  1c    Up      Normal  36.32 KB  50.00%               141784319550391026443072753096942836216
{noformat}
[jira] [Issue Comment Edited] (CASSANDRA-3772) Evaluate Murmur3-based partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207436#comment-13207436 ] Vijay edited comment on CASSANDRA-3772 at 2/14/12 2:26 AM:
---
If CASSANDRA-2975 gets committed you should be able to use that. Edit: you can use the MurmurHash.hash3_x64_128 function from 2975.

Evaluate Murmur3-based partitioner
--
Key: CASSANDRA-3772
URL: https://issues.apache.org/jira/browse/CASSANDRA-3772
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jonathan Ellis
Assignee: Dave Brosius
Fix For: 1.2
Attachments: try_murmur3.diff

MD5 is a relatively heavyweight hash to use when we don't need cryptographic qualities, just a good output distribution. Let's see how much overhead we can save by using Murmur3 instead.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
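The suggestion is to derive the token from MurmurHash.hash3_x64_128 (CASSANDRA-2975). As a compact, self-contained illustration of the same idea (a cheap non-cryptographic hash replacing MD5), the sketch below uses only Murmur3's well-known 64-bit finalizer, fmix64; a real partitioner would use the full 128-bit hash over the key bytes.

```java
public class Murmur3TokenSketch {
    // The 64-bit finalizer (fmix64) from MurmurHash3, used here as a stand-in
    // for the full hash3_x64_128 from CASSANDRA-2975.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Token for a row key: mix the key bytes into one 64-bit value.
    static long token(byte[] key) {
        long h = 0;
        for (byte b : key) h = fmix64(h ^ (b & 0xffL));
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(token("row1".getBytes())));
    }
}
```

The partitioner contract only needs the token to be deterministic and well distributed, which is why a fast mixing function is enough and MD5's cryptographic strength is wasted here.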
[jira] [Issue Comment Edited] (CASSANDRA-1956) Convert row cache to row+filter cache
[ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204329#comment-13204329 ] Vijay edited comment on CASSANDRA-1956 at 2/9/12 7:30 AM:
--
This patch is not complete yet; I just wanted to show it and see what you guys think. It is something like a block cache: it caches blocks of columns, where the user can choose the block size. If the query falls within a block, we are good; we just pull that block into memory. Otherwise we scan through the blocks and fetch the required ones. Updates can also scan through the blocks and update them. The good part is that this should have a lower memory footprint than a query cache while still solving the problems we are discussing in this ticket. It does not support super columns, and I don't plan to add that. Let me know, thanks! Again, there are more logic/cases to be handled; this is just a prototype for now.

Convert row cache to row+filter cache
--
Key: CASSANDRA-1956
URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Stu Hood
Assignee: Vijay
Priority: Minor
Fix For: 1.2
Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch

Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this use case would be perfectly supported by a cache that stored only columns matching the filter. Possible implementations:
* (copout) Cache a single filter per row, and leave the cache key as is
* Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overhead
* Change the cache key to rowkey+filterid: basically ideal, but you need a secondary index to look up cache entries by rowkey so that you can keep them in sync with the memtable
* others?
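The "rowkey+filterid" option from the list above can be sketched as a composite cache key. The class below is a hypothetical illustration, not code from the attached patches; the secondary index needed to find all entries for a row key (for memtable invalidation) is out of scope here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "rowkey + filterid" cache key.
public class FilterCacheKey {
    final byte[] rowKey;
    final int filterId; // identifies the slice/name filter that produced the cached columns

    FilterCacheKey(byte[] rowKey, int filterId) {
        this.rowKey = rowKey;
        this.filterId = filterId;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof FilterCacheKey)) return false;
        FilterCacheKey k = (FilterCacheKey) o;
        return filterId == k.filterId && Arrays.equals(rowKey, k.rowKey);
    }

    @Override public int hashCode() {
        // Arrays.hashCode gives content-based hashing for the key bytes.
        return 31 * Arrays.hashCode(rowKey) + filterId;
    }

    public static void main(String[] args) {
        Map<FilterCacheKey, String> cache = new HashMap<>();
        cache.put(new FilterCacheKey("row1".getBytes(), 7), "cached columns");
        System.out.println(cache.get(new FilterCacheKey("row1".getBytes(), 7)));
    }
}
```

Content-based equals/hashCode matters here: two lookups for the same row and filter must hit the same entry even though the key byte arrays are distinct objects.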
[jira] [Issue Comment Edited] (CASSANDRA-3690) Streaming CommitLog backup
[ https://issues.apache.org/jira/browse/CASSANDRA-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191899#comment-13191899 ] Vijay edited comment on CASSANDRA-3690 at 1/24/12 5:43 AM:
---
0001 = adds a configuration so we can avoid recycling, in case someone wants to copy the files across to another location, like an archive log
0002 = adds CommitLogListener; an implementation can receive the updates to the commit logs
0003 = helper JMX methods in case the user wants to query the active commit logs
0004 = this can go into the tools folder; we don't need to commit it to the core

Streaming CommitLog backup
--
Key: CASSANDRA-3690
URL: https://issues.apache.org/jira/browse/CASSANDRA-3690
Project: Cassandra
Issue Type: Bug
Components: Tools
Affects Versions: 1.1
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Fix For: 1.1
Attachments: 0001-Make-commitlog-recycle-configurable.patch, 0002-support-commit-log-listener.patch, 0003-helper-jmx-methods.patch, 0004-external-commitlog-with-sockets.patch, 0005-cmmiting-comments-to-yaml.patch

Problems with the current SSTable backups:
1) The current backup doesn't allow us to restore to a point in time (within an SSTable)
2) The current SSTable implementation needs the backup to read from the filesystem, and hence causes additional IO on the normal operational disks
3) In 1.0 we removed the flush interval and the size at which a flush is triggered per CF; for some use cases with few writes it becomes increasingly difficult to time it right
4) Use cases which need external (non-Cassandra) BI want the data at regular intervals rather than waiting for longer or unpredictable intervals

Disadvantages of the new solution:
1) Overhead in processing the mutations during the recovery phase
2) A more complicated solution than just copying the file to the archive

Additional advantages: online and offline restore; close to live incremental backup.

Note: If the listener agent gets restarted, it is the agent's responsibility to stream the files missed or incomplete. There are 3 options in the initial implementation:
1) Backup - once a socket is connected, we will switch the commit log and send new updates via the socket.
2) Stream - takes the absolute path of the file, reads the file, and sends the updates via the socket.
3) Restore - gets the serialized bytes and applies the mutation.

Side note (not related to this patch as such): the agent which will take incremental backups is planned to be open sourced soon (name: Priam).
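A listener hook of the sort patch 0002 adds could look roughly like the sketch below. The interface and class names are hypothetical, not the API from the attached patches; the shape is simply "notify registered listeners as commit log bytes are appended and as segments close", which is what a streaming-backup agent would consume.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a commit-log listener registration point.
public class CommitLogNotifier {
    interface CommitLogListener {
        void onSegmentWritten(String path, long offset, int length); // new mutation bytes appended
        void onSegmentClosed(String path);                           // segment full; safe to archive
    }

    private final List<CommitLogListener> listeners = new ArrayList<>();

    void register(CommitLogListener l) { listeners.add(l); }

    // Called by the commit log append path.
    void append(String path, long offset, int length) {
        for (CommitLogListener l : listeners) l.onSegmentWritten(path, offset, length);
    }

    // Called when a segment is retired (instead of being recycled, per patch 0001).
    void close(String path) {
        for (CommitLogListener l : listeners) l.onSegmentClosed(path);
    }
}
```

A socket-streaming agent would implement the listener, ship each (offset, length) range to the remote side, and use onSegmentClosed as its archive/rotation signal.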
[jira] [Issue Comment Edited] (CASSANDRA-3723) Include await for the queues in tpstats
[ https://issues.apache.org/jira/browse/CASSANDRA-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187169#comment-13187169 ] Vijay edited comment on CASSANDRA-3723 at 1/16/12 8:42 PM:
---
{quote}latency = await + this task's processing time, right?{quote}
Currently it is just the processing time. What I am proposing is to change the latency number to something like await (await in the queue + processing time), if that makes sense.
{quote}not sure why we'd remove the latency though{quote}
I am not saying we have to remove it; it's just an enhancement, instead of adding one more metric to glance at :)

Include await for the queues in tpstats
--
Key: CASSANDRA-3723
URL: https://issues.apache.org/jira/browse/CASSANDRA-3723
Project: Cassandra
Issue Type: Bug
Components: Tools
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor

Something similar to iostat's await. There is additional overhead, and I'm not sure if we have to make an exception for this, but I think it is a huge plus while troubleshooting. await: the average time (in milliseconds) for requests issued to be served, including the time spent by the requests in the queue and the time spent servicing them. Or we can have a simple average of the time spent in the queue before being served.
[jira] [Issue Comment Edited] (CASSANDRA-3723) Include await for the queues in tpstats
[ https://issues.apache.org/jira/browse/CASSANDRA-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187169#comment-13187169 ] Vijay edited comment on CASSANDRA-3723 at 1/16/12 8:43 PM:
---
{quote}latency = await + this task's processing time, right?{quote}
Currently it is just the processing time. What I am proposing is to change the latency number to something like await (wait time in the queue + processing time), if that makes sense.
{quote}not sure why we'd remove the latency though{quote}
I am not saying we have to remove it; it's just an enhancement, instead of adding one more metric to glance at :)

Include await for the queues in tpstats
--
Key: CASSANDRA-3723
URL: https://issues.apache.org/jira/browse/CASSANDRA-3723
Project: Cassandra
Issue Type: Bug
Components: Tools
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Priority: Minor

Something similar to iostat's await. There is additional overhead, and I'm not sure if we have to make an exception for this, but I think it is a huge plus while troubleshooting. await: the average time (in milliseconds) for requests issued to be served, including the time spent by the requests in the queue and the time spent servicing them. Or we can have a simple average of the time spent in the queue before being served.
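The proposed metric is easy to state in code: await is the time from enqueue to completion, i.e. queue wait plus the service time that tpstats currently reports as latency. A minimal sketch, with timestamps supplied by the caller:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the proposed tpstats metric: await = time in queue + service time.
public class AwaitMetric {
    static long awaitMillis(long enqueuedNanos, long dequeuedNanos, long completedNanos) {
        long queued = dequeuedNanos - enqueuedNanos;   // time spent waiting in the stage's queue
        long service = completedNanos - dequeuedNanos; // what tpstats currently calls latency
        return TimeUnit.NANOSECONDS.toMillis(queued + service);
    }

    public static void main(String[] args) {
        long t0 = 0;
        long t1 = TimeUnit.MILLISECONDS.toNanos(40); // dequeued after 40ms in queue
        long t2 = TimeUnit.MILLISECONDS.toNanos(45); // completed after 5ms of service
        System.out.println("await=" + awaitMillis(t0, t1, t2) + "ms");
    }
}
```

The example makes the disagreement in the comment concrete: the current number would report 5ms, while await would report 45ms because 40ms were spent queued.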
[jira] [Issue Comment Edited] (CASSANDRA-3590) Use multiple connection to share the OutboutTCPConnection
[ https://issues.apache.org/jira/browse/CASSANDRA-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187466#comment-13187466 ] Vijay edited comment on CASSANDRA-3590 at 1/17/12 6:06 AM:
---
Finally had a chance to do this benchmark.
Configuration: M2.4xl (AWS)
Traffic between: US and EU
OpenJDK, CentOS 5.6
Three tests were done where the active queue is the limiting factor for the traffic going across the nodes. Latency is the metric we are trying to measure in this test (with 1 connection the latency is high, because of the delay over the public internet in an AWS multi-region setup). Code for the benchmark is attached to this ticket.
Server A (US): java -jar Listener.jar 7103
Server B (EU): java -jar RunTest.jar 1 107.22.50.61 7103 500
Server C (US): java -jar Listener.jar 7103
Server D (EU): java -jar RunTest.jar 2 107.22.50.61 7103 500
Data is collected at a 1-second interval (please see the code for details). The code for IncomingTcpConnection and OutboundTcpConnection was modified a little to work independently of other Cassandra services (please see the code for details).

Use multiple connection to share the OutboutTCPConnection
--
Key: CASSANDRA-3590
URL: https://issues.apache.org/jira/browse/CASSANDRA-3590
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 1.1
Reporter: Vijay
Assignee: Vijay
Priority: Minor
Fix For: 1.2
Attachments: TCPTest.xlsx, TCPTest.zip

Currently there is one connection from any given host to another host in the cluster. The problems with this are:
1) It can become a bottleneck in some cases where the latencies are higher.
2) When a connection is dropped we also drop the queue and recreate a new one, so messages can be lost (currently hints will take care of it, and clients can also retry).
By making the number of connections configurable, and by making the queue common to those connections, the above 2 issues can be resolved.
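The ticket's proposal (a configurable number of connections all draining one shared queue, so a dropped connection no longer takes its private backlog with it) can be sketched with plain threads and a BlockingQueue. Socket writes are stubbed out here; this is an assumption-laden illustration of the queuing structure, not the patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// N sender threads ("connections") consuming ONE shared outbound queue.
public class SharedOutboundQueue {
    static int drainWithSenders(BlockingQueue<String> queue, int connections, int messages)
            throws InterruptedException {
        AtomicInteger sent = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(messages);
        for (int c = 0; c < connections; c++) {
            Thread sender = new Thread(() -> {
                try {
                    while (true) {
                        queue.take();          // stand-in for writing to this connection's socket
                        sent.incrementAndGet();
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    // shutdown: daemon thread exits with the JVM
                }
            });
            sender.setDaemon(true);
            sender.start();
        }
        for (int i = 0; i < messages; i++) queue.put("msg-" + i);
        done.await();                          // every queued message was sent by some connection
        return sent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(drainWithSenders(new LinkedBlockingQueue<>(), 2, 1000));
    }
}
```

Because the queue is shared, losing one sender leaves the remaining senders able to deliver everything still queued, which is exactly the message-loss improvement the description calls out.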
[jira] [Issue Comment Edited] (CASSANDRA-3695) Hibernating nodes that die never go away
[ https://issues.apache.org/jira/browse/CASSANDRA-3695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13179827#comment-13179827 ] Vijay edited comment on CASSANDRA-3695 at 1/4/12 8:08 PM:
--
Or can we make AVeryLongTime more of a configuration that applies only if the nodes don't have any state (fat clients won't have state, currently), so that nodes in the dead state can be removed much more often, something like an hour or so since we last got their gossip?

Hibernating nodes that die never go away
--
Key: CASSANDRA-3695
URL: https://issues.apache.org/jira/browse/CASSANDRA-3695
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Brandon Williams

Title says it all. We should be able to monitor these via the gossip heartbeat like other nodes, but it's tricky since it's a dead state.
[jira] [Issue Comment Edited] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176071#comment-13176071 ] Vijay edited comment on CASSANDRA-3623 at 12/27/11 3:22 AM:
---
Alright, I think I found the missing piece:
1) Please reapply v2 from CASSANDRA-3611 (which also depends on CASSANDRA-3610)
2) Please reapply v3, which has the mark() (this seems to be used by range slice, and the stress tool exercises it)
3) Please set the CRC chance to 0.0 via an update statement; we need to do this before the SSTables are created, otherwise it won't take effect (the update statements I used are in the attached *.doc). You might not see any difference if it is not set, because that is a big bottleneck.
4) I used the Sun JDK for the test
The test results are attached; let me know in case of any questions. The performance seems to be better. I used the stress test so we are on the same page, and as the column size or the range of columns fetched increases, the performance gets better (rebuffers).

use MMapedBuffer in CompressedSegmentedFile.getSegment
--
Key: CASSANDRA-3623
URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 1.1
Reporter: Vijay
Assignee: Vijay
Labels: compression
Fix For: 1.1
Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 0001-MMaped-Compression-segmented-file-v3.patch, 0001-MMaped-Compression-segmented-file.patch, 0002-tests-for-MMaped-Compression-segmented-file-v2.patch, 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx, MMappedIO-Performance.docx

CompressedSegmentedFile.getSegment seems to open a new file and doesn't use the mmap, hence higher CPU on the nodes and higher read latencies. This ticket is to implement the TODO mentioned in CompressedRandomAccessReader:
// TODO refactor this to separate concept of buffer to avoid lots of read() syscalls and compression buffer
but I think a separate class for the buffer will be better.
[jira] [Issue Comment Edited] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176071#comment-13176071 ] Vijay edited comment on CASSANDRA-3623 at 12/27/11 3:23 AM: Alright, I think I found the missing pieces:
1) Please reapply v2 from CASSANDRA-3611 (which also depends on CASSANDRA-3610).
2) Please reapply v3, which has the mark() (this seems to be used by range slice, and the stress tool exercises it).
3) Please set the CRC chance to 0.0 via an update statement - we need to do this before the SSTables are created, otherwise it won't take effect (the update statements I used are in the *.doc attached). You might not see any difference if it is not set, because that's a big bottleneck.
4) I used the Sun JDK for the test.
The test results are attached; let me know in case of any questions... the performance seems to be better. I used the stress tool so we are on the same page, and as the column size or the range of columns to be fetched increases, the performance gets better (rebuffers).
was (Author: vijay2...@yahoo.com): Alright i think i found the the missing peace: 1) Plz reapply v2 from CASSANDRA-3611 (which also depends on CASSANDRA-3610) 2) Plz reapply v3 which has the mark() (this seem to be used by range slice and Stress tool does it). 3) Plz set the CRC chance to 0.0 by update chance - We need to do this before the SST's are created otherwise it wont take into effect. (update statements i used is in the *.doc attached) You might not see any diffrence if it is not set, because thats a big bottleneck. 4) I used SunJDK for the test. The Test Results are attached, let me know in case of any questions... the performance seem to be better. I Used stress test so we are in the same page, and when the Column size or the range of columns to be fetched increases the performance gets better (rebuffers) use MMapedBuffer in CompressedSegmentedFile.getSegment -- Key: CASSANDRA-3623 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.1 Reporter: Vijay Assignee: Vijay Labels: compression Fix For: 1.1 Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 0001-MMaped-Compression-segmented-file-v3.patch, 0001-MMaped-Compression-segmented-file.patch, 0002-tests-for-MMaped-Compression-segmented-file-v2.patch, 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx, MMappedIO-Performance.docx CompressedSegmentedFile.getSegment seems to open a new file and doesn't use the mmap, hence higher CPU on the nodes and higher latencies on reads. This ticket is to implement the TODO mentioned in CompressedRandomAccessReader: // TODO refactor this to separate concept of buffer to avoid lots of read() syscalls and compression buffer - but I think a separate class for the buffer will be better.
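The ticket's idea - serving a compressed segment from an mmap'ed buffer instead of re-opening the file and issuing read() syscalls per access - can be sketched as below. This uses java.util.zip rather than snappy, and every name here is invented for illustration; it is not Cassandra's actual CompressedMappedFileDataInput, just the mmap-then-decompress shape:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class MappedCompressedRead {
    // Compress one chunk with Deflater (stand-in for snappy).
    static byte[] compress(byte[] in) {
        Deflater d = new Deflater();
        d.setInput(in);
        d.finish();
        byte[] buf = new byte[in.length + 64];
        int n = d.deflate(buf);
        d.end();
        return Arrays.copyOf(buf, n);
    }

    static byte[] readChunkMapped(Path file, int uncompressedLength) throws Exception {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // One mmap of the region; subsequent reads are memory copies,
            // not per-chunk read() syscalls against a freshly opened file.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] compressed = new byte[(int) ch.size()];
            map.get(compressed);
            Inflater inf = new Inflater();
            inf.setInput(compressed);
            byte[] out = new byte[uncompressedLength];
            inf.inflate(out);
            inf.end();
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] chunk = "hello mmap".getBytes(StandardCharsets.UTF_8);
        Path f = Files.createTempFile("chunk", ".db");
        Files.write(f, compress(chunk));
        System.out.println(new String(readChunkMapped(f, chunk.length), StandardCharsets.UTF_8));
    }
}
```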
[jira] [Issue Comment Edited] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
[ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175595#comment-13175595 ] Vijay edited comment on CASSANDRA-3623 at 12/23/11 10:30 PM: - Hot methods before the patch (trunk, without any patch):
||Excl. User CPU sec.||%||Name||
|1480.474|100.00|Total|
|756.717|51.11|crc32|
|387.767|26.19|static@0x54999 (snappy-1.0.4.1-libsnappyjava.so)|
|54.814|3.70|org.apache.cassandra.io.compress.CompressedRandomAccessReader.init(java.lang.String, org.apache.cassandra.io.compress.CompressionMetadata, boolean)|
|46.676|3.15|org.apache.cassandra.io.util.RandomAccessReader.init(java.io.File, int, boolean)|
|45.697|3.09|Copy::pd_disjoint_words(HeapWord*, HeapWord*, unsigned long)|
|39.417|2.66|memcpy|
|36.931|2.49|static@0xd8e9 (libpthread-2.5.so)|
|23.272|1.57|CompactibleFreeListSpace::block_size(const HeapWord*) const|
|22.766|1.54|SpinPause|
|12.593|0.85|BlockOffsetArrayNonContigSpace::block_start_unsafe(const void*) const|
|9.304|0.63|CardTableModRefBSForCTRS::card_will_be_scanned(signed char)|
|8.468|0.57|CardTableModRefBS::non_clean_card_iterate_work(MemRegion, MemRegionClosure*, bool)|
|8.051|0.54|ParallelTaskTerminator::offer_termination(TerminatorTerminator*)|
|5.400|0.36|madvise|
|4.619|0.31|CardTableModRefBS::process_chunk_boundaries(Space*, DirtyCardToOopClosure*, MemRegion, MemRegion, signed char**, unsigned long, unsigned long)|
|1.584|0.11|CardTableModRefBS::dirty_card_range_after_reset(MemRegion, bool, int)|
|1.551|0.10|SweepClosure::do_blk_careful(HeapWord*)|
Hot methods after the patch:
||sec.||%||Name||
|537.681|100.00|Total|
|529.719|98.52|static@0x54999 (snappy-1.0.4.1-libsnappyjava.so)|
|4.168|0.78|memcpy|
|0.143|0.03|Unknown|
|0.121|0.02|send|
|0.121|0.02|sun.misc.Unsafe.park(boolean, long)|
|0.110|0.02|sun.misc.Unsafe.unpark(java.lang.Object)|
|0.088|0.02|Interpreter|
|0.077|0.01|org.apache.cassandra.utils.EstimatedHistogram.max()|
|0.077|0.01|recv|
|0.066|0.01|SpinPause|
|0.055|0.01|org.apache.cassandra.utils.EstimatedHistogram.mean()|
|0.044|0.01|java.lang.Object.wait(long)|
|0.044|0.01|org.apache.cassandra.utils.EstimatedHistogram.min()|
|0.044|0.01|__pthread_cond_signal|
|0.044|0.01|vtable stub|
|0.033|0.01|java.lang.Object.notify()|
|0.033|0.01|java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable)|
|0.033|0.01|org.apache.cassandra.io.compress.CompressedMappedFileDataInput.read()|
|0.033|0.01|PhaseLive::compute(unsigned)|
|0.033|0.01|poll|
|0.022|0.00|Arena::contains(const void*) const|
|0.022|0.00|CompactibleFreeListSpace::free() const|
|0.022|0.00|I2C/C2I adapters|
|0.022|0.00|IndexSetIterator::advance_and_next()|
|0.022|0.00|java.lang.Class.forName0(java.lang.String, boolean, java.lang.ClassLoader)|
|0.022|0.00|java.lang.Long.getChars(long, int, char[])|
|0.022|0.00|java.nio.Bits.swap(int)|
Before this patch, response times (with CRC chance set to 0):
Epoch Rds/s RdLat Wrts/s WrtLat %user %sys %idle %iowait %steal md0r/s w/s rMB/s wMB/s NetRxKb NetTxKb Percentiles ReadWrite Compacts
1324587443 15 186.305 00.000 27.85 0.0271.83 0.24 0.053.890.000.120.0041 45 99th 545.791 ms 95th 454.826 ms 99th 0.00 ms95th 0.00 msPen/0
1324587455 15 1142.712 00.000 39.55 0.1357.61 2.50 0.21118.30 0.302.200.0034 36 99th 8409.007 ms95th 8409.007 ms99th 0.00 ms95th 0.00 msPen/0
1324587467 10 171.808 00.000 23.83 0.0476.05 0.04 0.054.800.000.140.00127 33 99th 454.826 ms 95th 315.852 ms 99th 0.00 ms95th 0.00 msPen/0
1324587478 10 182.775 00.000 20.43 0.0479.47 0.01 0.051.600.400.040.0030 37 99th 379.022 ms 95th 379.022 ms 99th 0.00 ms95th 0.00 msPen/0
1324587490 13 190.893 00.000 27.58 0.0372.20 0.14 0.063.200.500.090.0039 42 99th 545.791 ms 95th 379.022 ms 99th 0.00 ms95th 0.00 msPen/0
1324587503 28 358.719 00.000 52.24 0.0846.20 1.40 0.09159.40 0.003.160.00196 71 99th 3379.391 ms95th 943.127 ms 99th 0.00 ms95th 0.00 msPen/0
1324587517 13 194.281 00.000 16.68 0.0283.23 0.04 0.022.400.300.070.0038 41 99th 785.939 ms 95th 545.791 ms 99th 0.00 ms95th 0.00 msPen/0
1324587535 36 662.410 00.000 58.34 0.08
[jira] [Issue Comment Edited] (CASSANDRA-3610) Checksum improvement for CompressedRandomAccessReader
[ https://issues.apache.org/jira/browse/CASSANDRA-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175035#comment-13175035 ] Vijay edited comment on CASSANDRA-3610 at 12/22/11 8:30 PM: Ooops pasted the wrong data the above data is without any Heap settings and on Open JDK.. Plz see the below :) /usr/java/latest/jre/bin/java -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8G -Xmx8G -Xmn2G -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -jar TestCRC32Performance.jar ||bytes||PureJava MB/sec||Native MB/sec||Random PureJava MB/sec||Native MB/sec|| | 1 |121.124|11.866 | | 2 |161.981|23.851 | | 4 |204.718|45.486 | | 8 |297.229|76.296 | | 16|379.268|117.326| | 32|440.153|157.711| | 64|468.143|193.304|| PureJava |0-64 |272.921 MB/sec|| Native|0-64 |145.289 MB/sec| | 128 |500.006|219.657|| PureJava |0-128 |367.816 MB/sec|| Native|0-128 |186.861 MB/sec| | 256 |511.572|234.052|| PureJava |0-256 |432.433 MB/sec|| Native|0-256 |214.047 MB/sec| | 512 |517.550|242.634|| PureJava |0-512 |474.074 MB/sec|| Native|0-512 |231.047 MB/sec| | 1024 |516.994|246.424|| PureJava |0-1024 |498.055 MB/sec|| Native|0-1024 |241.056 MB/sec| | 2048 |518.095|248.529|| PureJava |0-2048 |509.960 MB/sec|| Native|0-2048 |245.683 MB/sec| | 4096 |522.002|249.755|| PureJava |0-4096 |518.226 MB/sec|| Native|0-4096 |248.062 MB/sec| | 8192 |522.795|250.316|| PureJava |0-8192 |520.326 MB/sec|| Native|0-8192 |249.519 MB/sec| | 16384 |522.521|250.484|| PureJava |0-16384 |522.480 MB/sec|| Native|0-16384|250.002 MB/sec| | 32768 |521.098|250.604|| PureJava |0-32768 |520.349 MB/sec|| Native|0-32768|250.494 MB/sec| | 65536 |520.973|250.837|| PureJava |0-65536 |520.392 MB/sec|| Native|0-65536|249.063 MB/sec| | 131072|510.129|248.949|| PureJava |0-131072 |516.246 MB/sec|| Native|0-131072 |249.535 MB/sec| | 
262144|513.534|249.506|| PureJava |0-262144 |514.407 MB/sec|| Native|0-262144 |250.617 MB/sec| | 524288|519.554|250.696|| PureJava |0-524288 |520.402 MB/sec|| Native|0-524288 |251.048 MB/sec| | 1048576 |519.559|250.557|| PureJava |0-1048576 |520.403 MB/sec|| Native|0-1048576 |250.734 MB/sec| | 2097152 |519.259|250.456|| PureJava |0-2097152 |519.337 MB/sec|| Native|0-2097152 |250.299 MB/sec| | 4194304 |518.649|250.470|| PureJava |0-4194304 |518.495 MB/sec|| Native|0-4194304 |250.523 MB/sec| | 8388608 |501.986|248.044|| PureJava |0-8388608 |509.521 MB/sec|| Native|0-8388608 |248.626 MB/sec| | 16777216 |508.201|247.587|| PureJava |0-16777216 |505.258 MB/sec|| Native|0-16777216 |249.558 MB/sec| [vijay_tcasstest@vijay_tcass--1a-i-aad629c8 ~]$ /usr/java/latest/jre/bin/java -version java version 1.6.0_27 Java(TM) SE Runtime Environment (build 1.6.0_27-b07) Java HotSpot(TM) 64-Bit Server VM (build 20.2-b06, mixed mode) [vijay_tcasstest@vijay_tcass--1a-i-aad629c8 ~]$ was (Author: vijay2...@yahoo.com): Ooops pasted the wrong data the above data is without any Heap settings hence GC becomes a bottleneck... Plz see the below :) /usr/java/latest/jre/bin/java -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8G -Xmx8G -Xmn2G -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -jar TestCRC32Performance.jar ||bytes||PureJava MB/sec||Native MB/sec||Random PureJava MB/sec||Native MB/sec|| | 1 |121.124|11.866 | | 2 |161.981|23.851 | | 4 |204.718|45.486 | | 8 |297.229|76.296 | | 16|379.268|117.326| | 32|440.153|157.711| | 64|468.143|193.304|| PureJava |0-64 |272.921 MB/sec|| Native|0-64 |145.289 MB/sec| | 128 |500.006|219.657|| PureJava |0-128 |367.816 MB/sec||
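The throughput shape in the table above (small buffers pay a fixed per-call cost, so MB/sec climbs with buffer size) can be probed with a minimal harness. This is not the author's TestCRC32Performance.jar - that jar also compares a pure-Java CRC32 against a native one, while this sketch only times java.util.zip.CRC32, so the class name and methodology here are assumptions:

```java
import java.util.Random;
import java.util.zip.CRC32;

public class Crc32Probe {
    // Checksum `iterations` buffers of `bufSize` bytes and report MB/sec.
    static double mbPerSec(int bufSize, int iterations) {
        byte[] buf = new byte[bufSize];
        new Random(42).nextBytes(buf);
        CRC32 crc = new CRC32();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            crc.reset();
            crc.update(buf, 0, buf.length);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (double) bufSize * iterations / (1024.0 * 1024.0) / seconds;
    }

    public static void main(String[] args) {
        // Throughput generally rises with buffer size, matching the table above.
        for (int size = 64; size <= 65536; size *= 16)
            System.out.printf("%6d bytes: %.1f MB/sec%n", size, mbPerSec(size, 20000));
    }
}
```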
[jira] [Issue Comment Edited] (CASSANDRA-3112) Make repair fail when an unexpected error occurs
[ https://issues.apache.org/jira/browse/CASSANDRA-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13161327#comment-13161327 ] Vijay edited comment on CASSANDRA-3112 at 12/2/11 12:22 AM: Hi Sylvain, I have seen the following issues with repairs, especially in AWS multi-DC deployments...
1) The stream session or the stream doesn't make any progress (read timeout/RPC timeout - a socket timeout might help).
2) Validation compaction completed and the result tree is sent, but not received.
3) The repair request is sent, but the receiving node didn't receive it.
4) When we have a big repair which runs for hours, it would be better to retry the failed part rather than doing a full retry.
Do you think it is worth addressing this in a separate ticket? Otherwise I will close CASSANDRA-3487.
was (Author: vijay2...@yahoo.com): Hi Sylvain, I have seen the following issues in the Repairs specially in AWS Multi DC deployments... 1) Stream session or the stream doesn't have any progress (Read Timeout/rpc timeout - Socket timeout might help) 2) Validation compaction completed but the result tree is sent but not received? 3) Repair request is sent but the receiving node didn't receive it? 4) When we have a big repair which runs for hours it will be better to retry the failed part rather than full retry. Do you think it is worth to address this in a separate ticket? else i will close CASSANDRA-3487. Make repair fail when an unexpected error occurs Key: CASSANDRA-3112 URL: https://issues.apache.org/jira/browse/CASSANDRA-3112 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Minor Labels: repair Fix For: 1.0.6 Attachments: 0003-Report-streaming-errors-back-to-repair-v4.patch, 0004-Reports-validation-compaction-errors-back-to-repair-v4.patch CASSANDRA-2433 makes it so that nodetool repair will fail if a node participating in repair dies before completing its part of the repair. This handles most of the situations where repair was previously hanging, but repair can still hang if an unexpected error occurs during either the merkle tree creation (say an on-disk corruption triggers an IOError) or during streaming (though I'm not sure what could make streaming fail other than 'one of the nodes died', besides a bug).
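Point 4 of the comment above - retry only the failed ranges of a long repair instead of rerunning the whole thing - can be sketched like this. All names are invented for illustration and this is not the actual repair code path:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class PerRangeRetry {
    // Attempt every range; collect failures and retry only those,
    // up to maxRetries extra rounds. Returns any still-failing ranges.
    static List<String> repair(List<String> ranges, Predicate<String> attempt, int maxRetries) {
        List<String> pending = new ArrayList<>(ranges);
        for (int tries = 0; tries <= maxRetries && !pending.isEmpty(); tries++) {
            List<String> failed = new ArrayList<>();
            for (String range : pending)
                if (!attempt.test(range))
                    failed.add(range);   // only failed ranges carry over
            pending = failed;
        }
        return pending;
    }

    public static void main(String[] args) {
        // Simulate a flaky range that fails on its first attempt only.
        Set<String> failedOnce = new HashSet<>();
        Predicate<String> attempt = r -> !r.equals("(0,100]") || !failedOnce.add(r);
        List<String> unrepaired = repair(List.of("(0,100]", "(100,200]"), attempt, 2);
        System.out.println(unrepaired.isEmpty()); // prints "true"
    }
}
```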
[jira] [Issue Comment Edited] (CASSANDRA-1740) Nodetool commands to query and stop compaction, repair, cleanup and scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156810#comment-13156810 ] Vijay edited comment on CASSANDRA-1740 at 11/24/11 5:27 PM: Yeah I did try it and saw the exception, I was not sure who is wrapping it again with RTE. To be more clear: The RTE additional wrap is because WrappableRunnable which catches all Exceptions just to wrap it to a RTE (plz check http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/RuntimeException.java). Hope this helps :) was (Author: vijay2...@yahoo.com): Yeah I did try it and saw the exception, I was not sure who is wrapping it again with rte. Sent from my iPhone Nodetool commands to query and stop compaction, repair, cleanup and scrub - Key: CASSANDRA-1740 URL: https://issues.apache.org/jira/browse/CASSANDRA-1740 Project: Cassandra Issue Type: Improvement Components: Tools Reporter: Chip Salzenberg Assignee: Vijay Priority: Minor Labels: compaction Fix For: 1.0.4 Attachments: 0001-Patch-to-Stop-compactions-v2.patch, 0001-Patch-to-Stop-compactions-v3.patch, 0001-Patch-to-Stop-compactions-v4.patch, 0001-Patch-to-Stop-compactions-v5.patch, 0001-Patch-to-Stop-compactions.patch, CASSANDRA-1740.patch Original Estimate: 24h Remaining Estimate: 24h The only way to stop compaction, repair, cleanup, or scrub in progress is to stop and restart the entire Cassandra server. Please provide nodetool commands to query whether such things are running, and stop them if they are. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
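The extra RTE layer described in the comment above - a runnable catching checked exceptions and rethrowing them wrapped in a RuntimeException - can be reproduced minimally. This is a generic illustration of the wrapping pattern, not Cassandra's actual WrappableRunnable class:

```java
public class WrapDemo {
    static void runMayThrow() throws Exception {
        throw new Exception("original failure");
    }

    public static void main(String[] args) {
        try {
            try {
                runMayThrow();
            } catch (Exception e) {
                // What WrappableRunnable-style code does: wrap and rethrow,
                // so the caller sees a RuntimeException instead.
                throw new RuntimeException(e);
            }
        } catch (RuntimeException rte) {
            // The original exception survives as the cause.
            System.out.println(rte.getCause().getMessage()); // prints "original failure"
        }
    }
}
```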
[jira] [Issue Comment Edited] (CASSANDRA-3502) Repair of one CF streams data of the other CF's
[ https://issues.apache.org/jira/browse/CASSANDRA-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13152274#comment-13152274 ] Vijay edited comment on CASSANDRA-3502 at 11/17/11 7:39 PM: I was actually looking at the 0.8.8 patch and thinking of fixing it. It seems it is fixed in 1.0.0, thanks! was (Author: vijay2...@yahoo.com): Was actually looking at 0.8.8 patch and thinking of fixing it. seems like it is fixed in 1.0.0 Repair of one CF streams data of the other CF's --- Key: CASSANDRA-3502 URL: https://issues.apache.org/jira/browse/CASSANDRA-3502 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1 Environment: JVM Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.1 Currently, a manual repair on a CF with inconsistent data will stream those ranges for all the CFs in the keyspace. StreamIn.requestRanges() just takes the table name as an argument and requests inconsistent ranges for all the CFs within a keyspace. The expected behaviour is to stream only that CF's ranges. This triggers a lot more compaction of the CFs which are not inconsistent, and a lot more traffic between the nodes.
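The fix the ticket asks for amounts to carrying the column families being repaired in the range request, so the sender streams only their SSTables rather than every CF in the keyspace. The sketch below is illustrative only; the signatures are invented and do not match StreamIn's actual API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class RangeRequest {
    final String keyspace;
    final Collection<String> columnFamilies;  // restrict streaming to these CFs
    final Collection<String> ranges;

    RangeRequest(String keyspace, Collection<String> cfs, Collection<String> ranges) {
        this.keyspace = keyspace;
        this.columnFamilies = cfs;
        this.ranges = ranges;
    }

    // Only SSTables belonging to the requested CFs are considered for streaming.
    List<String> filesToStream(Map<String, List<String>> sstablesByCf) {
        List<String> out = new ArrayList<>();
        for (String cf : columnFamilies)
            out.addAll(sstablesByCf.getOrDefault(cf, List.of()));
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byCf = Map.of(
                "Users", List.of("Users-1-Data.db"),
                "Events", List.of("Events-1-Data.db"));
        RangeRequest req = new RangeRequest("Keyspace1", List.of("Users"), List.of("(0,100]"));
        System.out.println(req.filesToStream(byCf)); // prints "[Users-1-Data.db]"
    }
}
```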