Re: Repair fails with java.io.IOError: java.io.EOFException
If they are and repair has completed, use nodetool cleanup to remove the data the node is no longer responsible for. See bootstrap section above.

I've seen that said a few times, so allow me to correct it: cleanup is useless after a repair. 'nodetool cleanup' removes rows the node is no longer responsible for, and is thus useful only after operations that change the range a node is responsible for (bootstrap, move, decommission). After a repair, you will need compaction to kick in to see your disk usage come back to normal.

--
Sylvain

Hope that helps.

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 12:44, Sameer Farooqui wrote:

Looks like the repair finished successfully the second time. However, the cluster is still severely unbalanced. I was hoping the repair would balance the nodes. We're using the random partitioner. One node has 900GB and others have 128GB, 191GB, 129GB, 257GB, etc. The 900GB and the 646GB nodes are just insanely high. Not sure why, or how to troubleshoot.

On Fri, Jul 22, 2011 at 1:28 PM, Sameer Farooqui cassandral...@gmail.com wrote:

I don't see a JVM crash log (hs_err_pid[pid].log) in ~/brisk/resources/cassandra/bin or /tmp. So maybe the JVM didn't crash? We're running a pretty up-to-date Sun Java:

ubuntu@ip-10-2-x-x:/tmp$ java -version
java version 1.6.0_24
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

I'm going to restart the repair process in a few more hours. If there are any additional debug or troubleshooting logs you'd like me to enable first, please let me know.

- Sameer

On Thu, Jul 21, 2011 at 5:31 PM, Jonathan Ellis jbel...@gmail.com wrote:

Did you check for a JVM crash log? You should make sure you're running the latest Sun JVM; older versions, and OpenJDK in particular, are prone to segfaulting.

On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui cassandral...@gmail.com wrote:

We are starting Cassandra with "brisk cassandra", so as a stand-alone process, not a service. The syslog on the node doesn't show anything regarding the Cassandra Java process around the time the last entries were made in the Cassandra system.log (2011-07-21 13:01:51):

Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v debian-sa1 /dev/null debian-sa1 1 1)
Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v debian-sa1 /dev/null debian-sa1 1 1)
Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v debian-sa1 /dev/null debian-sa1 1 1)
Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source = /proc/kmsg started.
Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software=rsyslogd swVersion=4.2.0 x-pid=663 x-info=http://www.rsyslog.com] (re)start

The last thing in the Cassandra log before "INFO Logging initialized" is:

INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144

I can start repair again, but am worried that it will crash Cassandra again, so I want to turn on any debugging or helpful logs to diagnose the crash if it happens again.

- Sameer

On Thu, Jul 21, 2011 at 4:30 PM, aaron morton aa...@thelastpickle.com wrote:

The default init.d script will direct std out/err to that file; how are you starting brisk / cassandra? Check the syslog and other logs in /var/log to see if the OS killed Cassandra. Also, what was the last thing in the cassandra log before "INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging initialised"?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:

Hey Aaron, I don't have any output.log files in that folder:

ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
system.log     system.log.11  system.log.4  system.log.7
system.log.1   system.log.2   system.log.5  system.log.8
system.log.10  system.log.3   system.log.6  system.log.9

On Thu, Jul 21, 2011 at 3:40 PM, aaron morton aa...@thelastpickle.com wrote:

Check /var/log/cassandra/output.log (assuming the default init scripts)

A
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:

Hmm. Just looked at the log more closely. So, what actually happened is that while repair was running on this specific node, the Cassandra Java process terminated itself. The last entries in the log are:

INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is
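For anyone else chasing a JVM that dies silently like this, here is a minimal sketch of the checks suggested in this thread; the paths assume a Debian/Ubuntu layout like the one above:

ls /var/log/cassandra/output.log                    # stdout/stderr, only if started via the default init.d script
ls /tmp/hs_err_pid*.log ~/hs_err_pid*.log           # HotSpot crash logs land in the JVM's working dir or /tmp
grep -i -E 'killed process|out of memory' /var/log/syslog /var/log/kern.log   # did the kernel OOM killer fire?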
Re: Cassandra 0.7.8 and 0.8.1 fail when major compaction on 37GB database
Have you tried some of the ideas about reducing the memory pressure? How many CFs + secondary indexes do you have?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 17:10, lebron james wrote:

I have only 4GB on the server, so I give the JVM 3GB of heap, but this doesn't help; Cassandra still falls over when I launch a major compaction on the 37GB database.
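Without knowing the CF layout, one hedged sketch of the 0.8-era cassandra.yaml knobs that reduce compaction memory pressure on a small box; the values shown are illustrative, not recommendations from the thread:

# cassandra.yaml excerpts (defaults noted in comments)
in_memory_compaction_limit_in_mb: 16   # rows larger than this compact via a slower two-pass disk path (default 64)
flush_largest_memtables_at: 0.60       # flush the biggest memtables once heap use passes this fraction (default 0.75)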
Re: Repair fails with java.io.IOError: java.io.EOFException
Was guessing something like a token move may have happened in the past. Good suggestion to also kick off a major compaction; I've seen that make a big difference even for apps that do no deletes but do overwrite data.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 19:00, Sylvain Lebresne wrote:

I've seen that said a few times, so allow me to correct it: cleanup is useless after a repair. 'nodetool cleanup' removes rows the node is no longer responsible for, and is thus useful only after operations that change the range a node is responsible for (bootstrap, move, decommission). After a repair, you will need compaction to kick in to see your disk usage come back to normal.

--
Sylvain
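To make Sylvain's distinction concrete, a minimal sketch of which nodetool command applies after which operation; the host and keyspace names are illustrative, not from the thread:

# after an operation that changed the node's token ranges (bootstrap, move, decommission):
bin/nodetool -h 10.0.0.1 cleanup              # drops rows the node no longer owns

# after a repair, to reclaim disk taken by obsolete/merged SSTables:
bin/nodetool -h 10.0.0.1 compact MyKeyspace   # major compaction; space frees once old files are GC'd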
cassandra server disk full
Could the cassandra server come back to normal without a restart after clearing the disk?

Best Regards
Donna Li

----- Original Message -----
From: Ryan King [mailto:r...@twitter.com]
Sent: 26 July 2011 1:53
To: user@cassandra.apache.org
Subject: Re: cassandra server disk full

Actually I was wrong -- our patch will disable gossip and thrift but leave the process running: https://issues.apache.org/jira/browse/CASSANDRA-2118

If people are interested in that, I can make sure it's up to date with our latest version.

-ryan

On Mon, Jul 25, 2011 at 10:07 AM, Ryan King r...@twitter.com wrote:

We have a patch somewhere that will kill the node on IOErrors, since those tend to be of the class that are unrecoverable.

-ryan

On Thu, Jul 7, 2011 at 8:02 PM, Jonathan Ellis jbel...@gmail.com wrote:

Yeah, ideally it should probably die or drop into read-only mode if it runs out of space. (https://issues.apache.org/jira/browse/CASSANDRA-809) Unfortunately, dealing with disk-full conditions tends to be a low priority for many people because it's relatively easy to avoid with decent monitoring, but if it's critical for you, we'd welcome the assistance.

On Thu, Jul 7, 2011 at 8:34 PM, Donna Li donna...@utstar.com wrote:

All:
When one of the cassandra servers' disk is full, the cluster can not work normally, even after I make space. I must reboot the server whose disk was full before the cluster can work normally.

Best Regards
Donna li

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Cassandra 0.7.8 and 0.8.1 fail when major compaction on 37GB database
I have only one CF with one UTF8 column and no indexes. Each column always holds 1 byte of data, and the keys are 16-byte strings.
Re: Capacity Planning
See the Edward Capriolo (media6degrees) talk, "Real World Capacity Planning: Cassandra on Blades and Big Iron", at http://www.datastax.com/events/cassandrasf2011/presentations

Open-ended questions like this are really hard to answer. It's a lot easier for people if you provide some context: some idea of the data, the expected load, or what the app does.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 19:51, CASSANDRA learner wrote:

Can you guys please explain how to do capacity planning in Cassandra?
Re: Stress test using Java-based stress utility
Thank you everyone, it is working fine. I was watching the jconsole behavior... can you tell me where exactly I can find RecentHitRates?

Tuning for Optimal Caching: they have given an example of that here:
http://www.datastax.com/docs/0.8/operations/cache_tuning#configuring-key-and-row-caches

In my jconsole, within the MBeans tab, I am unable to find RecentHitRates. Also, what is the value of long[36] and long[90]? From the jconsole attributes, how can I find the performance of Cassandra while stress testing?

Thank You

On 26 July 2011 14:33, aaron morton aa...@thelastpickle.com wrote:

It's in the source distribution under tools/stress; see the instructions in the README file, and then look at the command line help (bin/stress --help).

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26 Jul 2011, at 19:40, CASSANDRA learner wrote:

Hi, I too want to know what this stress tool does. What is the usage of this tool? Please explain.

On Fri, Jul 22, 2011 at 6:39 PM, Jonathan Ellis jbel...@gmail.com wrote:

What does nodetool ring say?

On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee nilabja.baner...@gmail.com wrote:

Hi All,

I am following this link http://www.datastax.com/docs/0.7/utilities/stress_java for a stress test. I am getting this notification after running this command (xxx.xxx.xxx.xx = my ip):

contrib/stress/bin/stress -d xxx.xxx.xxx.xx

Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
Operation [44] retried 10 times - error inserting key 044 ((UnavailableException))
Operation [49] retried 10 times - error inserting key 049 ((UnavailableException))
Operation [7] retried 10 times - error inserting key 007 ((UnavailableException))
Operation [6] retried 10 times - error inserting key 006 ((UnavailableException))

Any idea why I am getting these things?

Thank You

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
cassandra server disk full
I mean that the best outcome would be for a full-disk server not to influence the service of the cluster, and for the server to come back to work automatically after the disk is cleaned.

Best Regards
Donna li

----- Original Message -----
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 26 July 2011 6:25
To: user@cassandra.apache.org
Subject: Re: cassandra server disk full

If the commit log or data disk is full, it's not possible for the server to process any writes; the best it could do is serve reads. But reads may result in a write due to read repair, and the server will also need to do some app logging, so IMHO it's really down / dead. You should free space and restart the cassandra service. Restarting a cassandra service should be something your installation can handle. Is there something else I'm missing here?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25 Jul 2011, at 20:06, Donna Li wrote:

All:
Could anyone help me?

Best Regards
Donna li

----- Original Message -----
From: Donna Li [mailto:donna...@utstar.com]
Sent: 22 July 2011 11:23
To: user@cassandra.apache.org
Subject: cassandra server disk full

All:
Is there an easy way to fix the bug by changing the server's code?

Best Regards
Donna li

----- Original Message -----
From: Donna Li [mailto:donna...@utstar.com]
Sent: 8 July 2011 11:29
To: user@cassandra.apache.org
Subject: cassandra server disk full

Is CASSANDRA-809 resolved, or can any other patch resolve the problem? Is there any way to avoid rebooting the cassandra server? Thanks!

Best Regards
Donna li
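Since the thread's consensus is that decent monitoring is the practical defense, here is a minimal cron-able disk check; the paths, threshold, and alert address are assumptions, not from the thread:

#!/bin/sh
# warn before the Cassandra data or commitlog volumes fill up
THRESHOLD=80
for mount in /var/lib/cassandra/data /var/lib/cassandra/commitlog; do
  used=$(df -P "$mount" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
  if [ "$used" -gt "$THRESHOLD" ]; then
    echo "$mount is ${used}% full" | mail -s "cassandra disk alert: $(hostname)" ops@example.com
  fi
done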
[RELEASE] Apache Cassandra 0.8.2 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 0.8.2.

Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here:

http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

http://cassandra.apache.org/download/

This version is primarily a bug fix release[1] (in particular, it fixes a regression in 0.8.1 that caused hinted handoff not to be delivered correctly), and upgrading is highly encouraged. However, please always pay attention to the release notes[2] before upgrading. If you encounter any problem, let us know[3].

Have fun!

[1]: http://goo.gl/z61nT (CHANGES.txt)
[2]: http://goo.gl/Swjk5 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: do I need to add more nodes? minor compaction eat all IO
On Mon, Jul 25, 2011 at 6:41 PM, aaron morton aa...@thelastpickle.com wrote:

There are no hard and fast rules to add new nodes, but here are two guidelines: 1) Single node load is getting too high; rule of thumb is 300GB is probably too high.

What is that rule of thumb based on? I would guess that working set size would matter more than absolute size. Why isn't that the case?

Jim
Re: Stress test using Java-based stress utility
cassandra.db.Caches

On Tue, Jul 26, 2011 at 2:11 AM, Nilabja Banerjee nilabja.baner...@gmail.com wrote:

Thank you everyone, it is working fine. I was watching the jconsole behavior... can you tell me where exactly I can find RecentHitRates? [...]

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
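If digging the Caches MBeans out of jconsole is awkward, the same numbers are also exposed through nodetool; in 0.8-era builds, cfstats prints per-column-family key and row cache hit-rate lines. A quick sketch (host is illustrative):

# recent key/row cache hit rates per column family, without jconsole
bin/nodetool -h 127.0.0.1 cfstats | egrep 'Column Family|hit rate'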
Re: sstableloader throws storage_port error
If I have Cassandra already running on my machine, how do I configure sstableloader to run on a different IP (127.0.0.2)? Also, does that mean that in order to use sstableloader on the same machine as a running Cassandra node, I have to have two NIC cards?

I looked around for any info about how to configure and run sstableloader, but other than what the cmdline spits out, I can't find anything. Are there any examples or best practices? Is it designed to be run on a machine that isn't running a cassandra node?

On Mon, Jul 25, 2011 at 8:24 PM, Jonathan Ellis jbel...@gmail.com wrote:

sstableloader uses gossip to discover the Cassandra ring, so you'll need to run it on a different IP (127.0.0.2 is fine).

On Mon, Jul 25, 2011 at 2:41 PM, John Conwell j...@iamjohn.me wrote:

I'm trying to figure out how to use the sstableloader tool. For my test I have a single-node cassandra instance running on my local machine. I have cassandra running, and validate this by connecting to it with cassandra-cli. I run sstableloader using the following command:

bin/sstableloader /Users/someuser/cassandra/mykeyspace

and I get the following error:

org.apache.cassandra.config.ConfigurationException: localhost/127.0.0.1:7000 is in use by another process. Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services

I've played around with different ports, but nothing works. Is it because I'm trying to run sstableloader on the same machine that cassandra is running on? It would be odd, I think, but I can't think of another reason I would get that error.

Thanks,
John

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

--
Thanks,
John C
Slow Reads
Hello All,

I am doing some read tests on Cassandra on a single node, but they are turning out to be very slow. Here is the data model in detail.

I am using a super column family. The database has 970 rows in total, each row has 620901 super columns, and each super column has 2 columns. Total data in the database is around 45GB.

I am trying to retrieve the data of a particular super column (trying to pull the row key associated with the super column, and the column values within the super column). It is taking 2.5 secs with the Java code and 4.7 secs with the Python code. Here is the Python code:

result = col_fam.get_range(start='', finish='', columns=None,
                           column_start='', column_finish='',
                           column_reversed=False, column_count=2,
                           row_count=None, include_timestamp=False,
                           super_column='23',
                           read_consistency_level=None, buffer_size=None)

This is very slow compared to MySQL. I am not sure what's going wrong here. Could someone let me know if there is any problem with my model? Any help in this regard is highly appreciated.

Thank you.

Regards,
Priyanka
Re: Slow Reads
I believe it's because it needs to read the whole row to get to your super column. You might have to reconsider your model.

On 26 Jul 2011 17:39, Priyanka priya...@gmail.com wrote:

Hello All, I am doing some read tests on Cassandra on a single node, but they are turning out to be very slow. [...]
Re: Slow Reads
It doesn't read the entire row, but it does read a section of the row from disk... How big is each supercolumn? If you re-read the data, does the query time get faster?

On Tue, Jul 26, 2011 at 11:59 AM, Philippe watche...@gmail.com wrote:

I believe it's because it needs to read the whole row to get to your super column. You might have to reconsider your model. [...]

--
http://twitter.com/tjake
Re: Slow Reads
On Tue, Jul 26, 2011 at 5:39 PM, Priyanka priya...@gmail.com wrote:

I am doing some read tests on Cassandra on a single node, but they are turning out to be very slow. [...]

What are you trying to query exactly? All the rows, or only one? Because I'm no expert in pycassa, but if I read this code and the pycassa code correctly, this request will query 1024 rows upfront and return an iterator that will eventually read all the rows in the database if you iterate over it.
Re: Cassandra start/stop scripts
I do it the same way...

On Tue, Jul 26, 2011 at 1:07 PM, mcasandra wrote:

I need to write a cassandra start/stop script. Currently I run "cassandra" to start and "kill -9" to stop. Is this the best way? kill -9 doesn't sound right :) Wondering how others do it.
Re: Slow Reads
Thanks Philippe. I have a question here... I am specifying the required super column. Does it still need to read the entire row? Or is it because I am listing all the slices and then going into each slice and picking the data for the required super column?

SlicePredicate slicePredicate = new SlicePredicate();
SliceRange sliceRange = new SliceRange();
sliceRange.setStart(new byte[] {});
sliceRange.setFinish(new byte[] {});
slicePredicate.setSlice_range(sliceRange);

ColumnParent columnParent = new ColumnParent(COLUMN_FAMILY);
KeyRange keyRange = new KeyRange();
keyRange.start_key = ByteBuffer.wrap(lastkey.getBytes());
keyRange.end_key = ByteBuffer.wrap("".getBytes());

List<KeySlice> slices = client.get_range_slices(columnParent, slicePredicate, keyRange, ConsistencyLevel.ONE);

Then I loop over the slices, list the super columns, and set the name of the super column and look for that. Am I missing something here?

On Tue, Jul 26, 2011 at 11:59 AM, Philippe watche...@gmail.com wrote:

I believe it's because it needs to read the whole row to get to your super column. You might have to reconsider your model. [...]
Re: Slow Reads
Each supercolumn has two columns, and each column has only one byte. Re-reading is a bit faster, but not significantly.

On Tue, Jul 26, 2011 at 12:49 PM, Jake Luciani jak...@gmail.com wrote:

It doesn't read the entire row, but it does read a section of the row from disk... How big is each supercolumn? If you re-read the data, does the query time get faster? [...]
Re: Slow Reads
This is how my data looks:

"rowkey1": {
    "supercol1": { "col1": T, "col2": C }
    "supercol2": { "col1": C, "col2": T }
    "supercol3": { "col1": C, "col2": T }
}
"rowkey2": {
    "supercol1": { "col1": A, "col2": A }
    "supercol2": { "col1": A, "col2": T }
    "supercol3": { "col1": C, "col2": T }
}

Each row has 620901 super columns, and 2 columns in each super column. The names of the super columns are the same for all rows, but the data in each super column is different. I am trying to get the data of a particular super column, which is spread across all the rows but with different data. So yes, it is getting data from all rows. Please suggest a better way to do this. Thank you.

The output of my query will be (say, for supercol1):

rowkey1,T,C
rowkey2,A,A
Re: sstableloader throws storage_port error
After much research and experimentation, I figured out how to get sstableloader running on the same machine as a live cassandra node instance. The key, as Jonathan stated, is to configure sstableloader to use a different IP address than the one the running cassandra instance is using.

To do this, I ran this command, which created the loopback alias for 127.0.0.2:

sudo ifconfig lo0 alias 127.0.0.2

Now you can have cassandra configured to listen on 127.0.0.1, and sstableloader configured to listen on 127.0.0.2. By the way, to remove this IP address, run:

sudo ifconfig lo0 -alias 127.0.0.2

But that's not all. Because sstableloader reads the cassandra.yaml file to get the gossip IP address, you need to make a copy of the cassandra install directory (or at least the bin and conf folders): basically, one folder with the yaml configured for Cassandra, the other folder with the yaml configured for sstableloader.

Hope this helps people. I've written an in-depth description of how to do all this, and can post it if people want, but I'm not sure of the etiquette of posting blog links on the email list.

Thanks,
John

On Tue, Jul 26, 2011 at 7:40 AM, John Conwell j...@iamjohn.me wrote:

If I have Cassandra already running on my machine, how do I configure sstableloader to run on a different IP (127.0.0.2)? [...]

--
Thanks,
John C
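Pulling John's steps together, a minimal end-to-end sketch; the install paths are illustrative, and the lo0 alias syntax is the OS X/BSD form (on Linux it would be something like "ip addr add 127.0.0.2/8 dev lo"):

sudo ifconfig lo0 alias 127.0.0.2                  # second loopback IP for the loader
cp -r /opt/apache-cassandra /opt/cassandra-loader  # separate copy of bin/ and conf/
# edit /opt/cassandra-loader/conf/cassandra.yaml so that listen_address: 127.0.0.2
/opt/cassandra-loader/bin/sstableloader /Users/someuser/cassandra/mykeyspace
sudo ifconfig lo0 -alias 127.0.0.2                 # clean up afterwards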
Re: Cassandra start/stop scripts
Did you install via a package or tarball binaries? Packages allow you to run cassandra as a service with:

sudo service cassandra start|stop

But if you are running via tarballs, then yes, running a kill command against Cassandra is the way to do it, since Cassandra runs in crash-only mode. Running kill <pid> would work, however.

Thanks,

Joaquin Casares
DataStax
Software Engineer/Support

On Tue, Jul 26, 2011 at 12:19 PM, Priyanka priya...@gmail.com wrote:

I do it the same way...

On Tue, Jul 26, 2011 at 1:07 PM, mcasandra wrote:

I need to write a cassandra start/stop script. Currently I run "cassandra" to start and "kill -9" to stop. Is this the best way? kill -9 doesn't sound right :) Wondering how others do it.
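For tarball installs, a hedged sketch of a slightly tidier stop than a bare kill -9; the pid discovery method and the drain step are suggestions, not from the thread:

pid=$(pgrep -f CassandraDaemon)   # find the Cassandra JVM
bin/nodetool -h localhost drain   # optional: flush memtables and stop accepting writes first
kill "$pid"                       # plain SIGTERM; the crash-only design makes this safe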
Re: Predictable low RW latency, SLABS and STW GC
Restarting the service will drop all the memory-mapped caches; cassandra caches are saved / persistent, and you can also use memcached if you want.

Well, the OS won't evict everything from the page cache just because the last process to map the files exits. That said, since restarts tend to have secondary effects on caches, like streaming through all the bloom filter and index files, restarts are certainly detrimental to the page cache. Also, you may still see some eviction (even if it doesn't *necessarily* happen) depending on circumstances (particularly if not running with numactl set to interleave).

--
/ Peter Schuller (@scode on twitter)
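For the NUMA point at the end, a one-line sketch of starting Cassandra with interleaved memory allocation; this assumes numactl is installed, and the launcher path is illustrative:

numactl --interleave=all bin/cassandra   # spread JVM heap pages across NUMA nodes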
Re: Repair fails with java.io.IOError: java.io.EOFException
Thanks for the info guys. I'm running compaction on the two very highly loaded nodes now, in hopes of the data volume going down. But I'm skeptical, because I don't see how it got so unbalanced in the first place (all nodes were up while the writes were being injected). I should have an update tomorrow on whether compaction rebalanced the nodes.

The tokens are evenly distributed across the ring:

Address       DC   Rack  Status  State   Load       Owns    Token
                                                            148873535527910577765226390751398592512
10.192.143.x  DC1  RAC1  Up      Normal  643.42 GB  12.50%  0
10.192.171.x  DC1  RAC1  Up      Normal  128.96 GB  6.25%   21267647932558653966460912964485513216
10.210.95.x   DC1  RAC1  Up      Normal  128.34 GB  12.50%  42535295865117307932921825928971026432
10.211.19.x   DC1  RAC1  Up      Normal  128.55 GB  6.25%   63802943797675961899382738893456539648
10.68.58.x    DC1  RAC2  Up      Normal  643.05 GB  12.50%  85070591730234615865843651857942052864
10.110.31.x   DC1  RAC2  Up      Normal  128.84 GB  6.25%   106338239662793269832304564822427566080
10.96.58.x    DC1  RAC2  Up      Normal  128.11 GB  12.50%  127605887595351923798765477786913079296
10.210.195.x  DC1  RAC2  Up      Normal  129.33 GB  6.25%   148873535527910577765226390751398592512
10.114.138.x  DC2  RAC1  Up      Normal  258.04 GB  6.25%   10633823966279326983230456482242756608
10.203.79.x   DC2  RAC1  Up      Normal  257.14 GB  6.25%   53169119831396634916152282411213783040
10.242.209.x  DC2  RAC1  Up      Normal  256.58 GB  6.25%   95704415696513942849074108340184809472
10.38.25.x    DC2  RAC1  Up      Normal  257.08 GB  6.25%   138239711561631250781995934269155835904

On Tue, Jul 26, 2011 at 1:59 AM, aaron morton aa...@thelastpickle.com wrote:

Was guessing something like a token move may have happened in the past. Good suggestion to also kick off a major compaction; I've seen that make a big difference even for apps that do no deletes but do overwrite data. [...]
Re: Cassandra start/stop scripts
Check out the rpm packages for Cassandra; they have init.d scripts that work very nicely. There are debs as well for Ubuntu.

Sent from my iPhone

On Jul 27, 2011, at 3:19, Priyanka priya...@gmail.com wrote:

I do it the same way... [...]
Internal error processing get during bootstrap
Hello,

I'm evaluating cassandra for use in my system. I could add approximately 16 million items using a single node. I'm using libcassandra (I can find my way through its code when I need to) to connect to it, and I already have some infrastructure for handling and adding those items (I was using Tokyo Cabinet before).

I couldn't find much documentation regarding how to make a cluster, but it seemed simple enough. At cassandra server A (10.0.0.2) I had seeds: localhost. At server B (10.0.0.3) I configured seeds: 10.0.0.2 and auto_bootstrap: true. Then I created a keyspace and a few column families in it. I immediately began to add items, and to get all these "Internal error processing get" errors. I found it quite odd; I thought it had to do with the load I was putting in, seeing that a few small tests had worked before.

I spent quite some time debugging, when I finally decided to write this e-mail. I wanted to double check stuff, so I ran nodetool to see if everything was right. To my surprise, only one of the nodes was showing up. It took a little while for the other one to show up as Joining and then as Normal. After I waited that period, I was able to insert items into the cluster with no errors at all.

Is that expected behaviour? What is the recommended way to set up a cluster? Should it be done manually: setting up the machines, creating all keyspaces and column families, then checking nodetool and waiting for it to get stable?

On a side note, sometimes I get "Default TException" (that seems to happen when the machine is under a heavier load than usual); commonly, retrying the read or insert right after works fine. Is that what's supposed to happen? Perhaps I should raise some timeout somewhere?

This is what ./bin/nodetool -h localhost ring reports:

Address   DC           Rack   Status  State   Load     Owns    Token
                                                               119105113551249187083945476614048008053
10.0.0.3  datacenter1  rack1  Up      Normal  3.43 GB  65.90%  61078635599166706937511052402724559481
10.0.0.2  datacenter1  rack1  Up      Normal  1.77 GB  34.10%  119105113551249187083945476614048008053

It's still adding stuff. I have no idea why B owns so many more keys than A.

I'm sorry if what I'm asking is trivial, but I have been having a hard time finding documentation. I've found a lot of outdated stuff, which was frustrating. I hope you guys have the time to help me out or -- if not -- I hope you can give me good reading material.

Thank you,
Rafael
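On the setup question, the practical check is the one Rafael stumbled onto: watch the ring and hold off on writes until the joining node reaches Normal. A minimal sketch (the host is illustrative):

# poll the ring every 10s; wait for the new node to go Joining -> Normal before writing
watch -n 10 './bin/nodetool -h 10.0.0.2 ring'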
Cassandra allocation unit size
Hi, what allocation unit size should I use to format a hard drive for Cassandra, using Ubuntu Server and a SAN (Storage Area Network)?
Re: Cassandra allocation unit size
You should see this thread about Cassandra with a SAN: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-on-iSCSI-td5945217.html On Tue, Jul 26, 2011 at 4:38 PM, Andres Rodriguez Contreras anrocoubu...@gmail.com wrote: Hi, which is the allocation unit size to format a hard drive to use Cassandra, usingUbuntu server and a SAN (Storage Area Network). -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Recovering from a multi-node cluster failure caused by OOM on repairs
Hi,

I thought I'd share the following with this mailing list, as a number of other users seem to have had similar problems. We have the following set-up:

OS: CentOS 5.5
RAM: 16GB
JVM heap size: 8GB (also tested with 14GB)
Cassandra version: 0.7.6-2 (also tested with 0.7.7)
Oracle JDK version: 1.6.0_26
Number of nodes: 5
Load per node: ~40GB
Replication factor: 3
Number of requests/day: 2.5 Million (95% inserts)
Total net insert data/day: 1GB
Default TTL for most of the data: 10 days

This set-up had been operating successfully for a few months; however, recently we started seeing multi-node failures, usually triggered by a repair, but occasionally also under normal operation. A repair on node 3, 4 or 5 would always cause the cluster as a whole to fail, whereas nodes 1 and 2 completed their repair cycles successfully. These failures would usually result in 2 or 3 nodes becoming unresponsive and dropping out of the cluster, causing client failure rates to spike up to ~10%. We normally operate with a failure rate of 0.1%.

The relevant log entries showed complete heap memory exhaustion within 1 minute (see the log lines below, from when we experimented with a larger heap size of 14GB). Also of interest was a number of huge SliceQueryFilter collections running concurrently on the nodes in question (see log lines below).

The way we ended up recovering from this situation was as follows. Remember, these steps were taken to get an unstable cluster back under control, so you might want to revert some of the changes once the cluster is stable again.

1. Set disk_access_mode: standard in cassandra.yaml. This allowed us to prevent the JVM blowing out the hard limit of 8GB via large mmaps. Heap size was set to 8GB (RAM/2), so the JVM was never using more than 8GB total. mlockall didn't seem to make a difference for our particular problem.

2. Turn off all row and key caches via cassandra-cli, e.g.:

update column family Example with rows_cached=0;
update column family Example with keys_cached=0;

We were seeing compacted row maximum sizes of ~800MB from cfstats; that's why we turned them all off. Again, we saw a significant drop in the actual memory used from the available maximum of 8GB. Obviously this will affect reads, but as 95% of our requests are inserts, it didn't matter so much for us.

3. Bootstrap the problematic node (as sketched in the commands below):
- Kill Cassandra
- Change auto_bootstrap: true in cassandra.yaml, and remove the node's own IP address from the list of seeds (important)
- Delete all data directories (i.e. commit log, data, saved caches)
- Start Cassandra
- Wait for the bootstrap to finish (see the log / nodetool)
- Change auto_bootstrap: false
- (Run repair)

The first bootstrap completed very quickly, so we decided to bootstrap every node in the cluster (not just the problematic ones). This resulted in some data loss. The next time, we will follow each bootstrap with a repair before bootstrapping and repairing the next node, to minimize data loss.

After this procedure, the cluster was operating normally again. We now run a continuous rolling repair, followed by a (major) compaction and a manual garbage collection. As the repairs are required anyway, we decided to run them all the time in a continuous fashion; that way, potential problems can be identified earlier. The major compaction followed by a manual GC allows us to keep the disk usage low on each node. The manual GC is necessary because unused files on disk are only really deleted when the reference is garbage collected inside the JVM (a restart would achieve the same).
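A hedged sketch of the per-node re-bootstrap cycle Teijo describes; the data paths and the pgrep pattern are assumptions for a typical install, not from the post:

# 1. stop the node
kill $(pgrep -f CassandraDaemon)
# 2. edit cassandra.yaml: auto_bootstrap: true, and remove this node's own IP from seeds
# 3. wipe local state
rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
# 4. restart and wait for the bootstrap to finish (watch the log and nodetool ring)
bin/cassandra
# 5. afterwards: set auto_bootstrap: false again, then run a repair
bin/nodetool -h localhost repair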
We also collected some statistics in regards to the duration of some of the operations:

cleanup/compact: ~1 min/GB
repair: ~2-3 min/GB
bootstrap: ~1 min/GB

This means that if you have a node with 60GB of data, it will take ~1hr to compact and ~2-3hrs to repair. Therefore, it is advisable to keep the data per node below ~120GB. We achieve this by using an aggressive TTL on most of our writes.

Cheers,
Teijo

Here are the relevant log entries showing the OOM conditions:

[2011-07-21 11:12:11,059] INFO: GC for ParNew: 1141 ms, 509843976 reclaimed leaving 1469443752 used; max is 14675869696 (ScheduledTasks:1 GCInspector.java:128)
[2011-07-21 11:12:15,226] INFO: GC for ParNew: 1149 ms, 564409392 reclaimed leaving 2247228920 used; max is 14675869696 (ScheduledTasks:1 GCInspector.java:128)
...
[2011-07-21 11:12:55,062] INFO: GC for ParNew: 1110 ms, 564365792 reclaimed leaving 12901974704 used; max is 14675869696 (ScheduledTasks:1 GCInspector.java:128)

[2011-07-21 10:57:23,548] DEBUG: collecting 4354206 of 2147483647: 940657e5b3b0d759eb4a14a7228ae365:false:41@1311102443362542 (ReadStage:27 SliceQueryFilter.java:123)
Re: Stress test using Java-based stress utility
Thank you Jonathan.. :)

On 26 July 2011 20:08, Jonathan Ellis jbel...@gmail.com wrote:

cassandra.db.Caches [...]