Re: Backup solution
Thank you. I have a high-bandwidth connection. But that also means that regular repairs on the backup data-center will take a long time.

2013/3/14 Jabbar Azam aja...@gmail.com

Hello,

If the live data centre disappears, restoring the data from the backup is going to take ages, especially if the data is going from one data centre to another, unless you have a high-bandwidth connection between data centres or you have a small amount of data.

Jabbar Azam

On 14 Mar 2013 14:31, Rene Kochen rene.koc...@schange.com wrote:

Hi all,

Is the following a good backup solution?

Create two data-centers:

- A live data-center with multiple nodes (commodity hardware). Clients connect to this cluster with LOCAL_QUORUM.
- A backup data-center with 1 node (with fast SSDs). Clients do not connect to this cluster. The cluster is only used for creating and storing snapshots.

Advantages:

- No snapshots and bulk network I/O (transferring snapshots) needed on the live cluster.
- Clients are not slowed down because writes to the backup data-center are async.
- On the backup cluster snapshots are made on a regular basis. This again does not affect the live cluster.
- The backup cluster does not need to process client requests/reads, so we need fewer machines for the backup cluster than for the live cluster.

Are there any disadvantages with this approach?

Thanks!
Re: cql query not giving any result.
Hi,

Is it possible in Cassandra to create multiple columns with the same name? In this particular scenario I have two columns with the same name, key: the first one is the row key and the second one is a column name.

Thanks and Regards
Kuldeep

On Fri, Mar 15, 2013 at 4:05 PM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote:

Hi,

The following CQL query is not returning any result:

cqlsh:KunderaExamples> select * from DOCTOR where key='kuldeep';

I have enabled secondary indexes on both columns. A screenshot is attached.

Please help.

--
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199
Re: cql query not giving any result.
Here is a list of keywords and whether or not the words are reserved. A reserved keyword cannot be used as an identifier unless you enclose the word in double quotation marks. Non-reserved keywords have a specific meaning in certain context but can be used as an identifier outside this context. http://www.datastax.com/docs/1.2/cql_cli/cql_lexicon#cql-keywords On Fri, Mar 15, 2013 at 6:43 PM, Kuldeep Mishra kuld.cs.mis...@gmail.comwrote: Hi, Is it possible in Cassandra to make multiple column with same name ?, like in this particular scenario I have two column with same name as key, first one is rowkey and second on is column name . Thanks and Regards Kuldeep On Fri, Mar 15, 2013 at 4:05 PM, Kuldeep Mishra kuld.cs.mis...@gmail.comwrote: Hi , Following cql query not returning any result cqlsh:KunderaExamples select * from DOCTOR where key='kuldeep'; I have enabled secondary indexes on both column. Screen shot is attached Please help -- Thanks and Regards Kuldeep Kumar Mishra +919540965199 -- Thanks and Regards Kuldeep Kumar Mishra +919540965199
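A minimal sketch of the quoting rule described above, in Python. The helper name quote_identifier and the tiny RESERVED_SAMPLE set are assumptions made for illustration; the authoritative keyword list is the one at the URL in the message. The sketch also quotes any name containing upper-case letters, on the understanding that unquoted CQL3 identifiers are case-insensitive while quoted ones keep their case.

```python
import re

# Illustrative subset only (an assumption); see the CQL keyword list for the real set.
RESERVED_SAMPLE = {"select", "from", "where"}

# Lower-case names made of letters, digits and underscores are safe unquoted.
_UNQUOTED = re.compile(r"[a-z][a-z0-9_]*$")


def quote_identifier(name, reserved=RESERVED_SAMPLE):
    """Return name unchanged if it is safe unquoted, else wrap it in double quotes."""
    if _UNQUOTED.match(name) and name not in reserved:
        return name
    # An embedded double quote is escaped by doubling it.
    return '"' + name.replace('"', '""') + '"'


if __name__ == "__main__":
    for ident in ("age", "select", "DOCTOR"):
        print(ident, "->", quote_identifier(ident))
```

Whether a given word such as key needs quoting depends on the keyword list for the Cassandra version in use, so the reserved set is passed in rather than hard-coded.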
Re: cql query not giving any result.
On Fri, Mar 15, 2013 at 11:43 AM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote:

Hi, Is it possible in Cassandra to create multiple columns with the same name? In this particular scenario I have two columns with the same name, key: the first one is the row key and the second one is a column name.

No, it shouldn't be possible, and that is your problem. How did you create that table?

--
Sylvain

Thanks and Regards
Kuldeep

On Fri, Mar 15, 2013 at 4:05 PM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote:

Hi,

The following CQL query is not returning any result:

cqlsh:KunderaExamples> select * from DOCTOR where key='kuldeep';

I have enabled secondary indexes on both columns. A screenshot is attached.

Please help.

--
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199
Re: cql query not giving any result.
Hi Sylvain,

I created it using the Thrift client. Here is the column family creation script:

    // Connection setup for the client is not shown in this message.
    Cassandra.Client client;

    CfDef user_Def = new CfDef();
    user_Def.name = "DOCTOR";
    user_Def.keyspace = "KunderaExamples";
    user_Def.setComparator_type("UTF8Type");
    user_Def.setDefault_validation_class("UTF8Type");
    user_Def.setKey_validation_class("UTF8Type");

    // Two indexed columns; the first one is named KEY.
    ColumnDef key = new ColumnDef(ByteBuffer.wrap("KEY".getBytes()), "UTF8Type");
    key.index_type = IndexType.KEYS;
    ColumnDef age = new ColumnDef(ByteBuffer.wrap("AGE".getBytes()), "UTF8Type");
    age.index_type = IndexType.KEYS;
    user_Def.addToColumn_metadata(key);
    user_Def.addToColumn_metadata(age);

    client.set_keyspace("KunderaExamples");
    client.system_add_column_family(user_Def);

Thanks
KK

On Fri, Mar 15, 2013 at 4:24 PM, Sylvain Lebresne sylv...@datastax.com wrote:

On Fri, Mar 15, 2013 at 11:43 AM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote:

Hi, Is it possible in Cassandra to create multiple columns with the same name? In this particular scenario I have two columns with the same name, key: the first one is the row key and the second one is a column name.

No, it shouldn't be possible, and that is your problem. How did you create that table?

--
Sylvain

On Fri, Mar 15, 2013 at 4:05 PM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote:

Hi,

The following CQL query is not returning any result:

cqlsh:KunderaExamples> select * from DOCTOR where key='kuldeep';

I have enabled secondary indexes on both columns. A screenshot is attached.

Please help.

--
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199
Re: cql query not giving any result.
Ok. So it's a case when, CQL returns rowkey value as key and there is also column present with name as key. Sounds like a bug? -Vivek On Fri, Mar 15, 2013 at 5:17 PM, Kuldeep Mishra kuld.cs.mis...@gmail.comwrote: Hi Sylvain, I created it using thrift client, here is column family creation script, Cassandra.Client client; CfDef user_Def = new CfDef(); user_Def.name = DOCTOR; user_Def.keyspace = KunderaExamples; user_Def.setComparator_type(UTF8Type); user_Def.setDefault_validation_class(UTF8Type); user_Def.setKey_validation_class(UTF8Type); ColumnDef key = new ColumnDef(ByteBuffer.wrap(KEY.getBytes()), UTF8Type); key.index_type = IndexType.KEYS; ColumnDef age = new ColumnDef(ByteBuffer.wrap(AGE.getBytes()), UTF8Type); age.index_type = IndexType.KEYS; user_Def.addToColumn_metadata(key); user_Def.addToColumn_metadata(age); client.set_keyspace(KunderaExamples); client.system_add_column_family(user_Def); Thanks KK On Fri, Mar 15, 2013 at 4:24 PM, Sylvain Lebresne sylv...@datastax.comwrote: On Fri, Mar 15, 2013 at 11:43 AM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote: Hi, Is it possible in Cassandra to make multiple column with same name ?, like in this particular scenario I have two column with same name as key, first one is rowkey and second on is column name . No, it shouldn't be possible and that is your problem. How did you created that table? -- Sylvain Thanks and Regards Kuldeep On Fri, Mar 15, 2013 at 4:05 PM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote: Hi , Following cql query not returning any result cqlsh:KunderaExamples select * from DOCTOR where key='kuldeep'; I have enabled secondary indexes on both column. Screen shot is attached Please help -- Thanks and Regards Kuldeep Kumar Mishra +919540965199 -- Thanks and Regards Kuldeep Kumar Mishra +919540965199 -- Thanks and Regards Kuldeep Kumar Mishra +919540965199
Re: Backup solution
On Fri, Mar 15, 2013 at 3:12 AM, Rene Kochen rene.koc...@emea.schange.com wrote:

Thank you. I have a high-bandwidth connection. But that also means that regular repairs on the backup data-center will take a long time.

Honestly, at this point I don't think anyone can give you good, fact-based feedback, because so far you haven't given us any facts. Like:

1. How big is the data set?
2. How many nodes are in your primary DC?
3. How many transactions/sec is your primary DC doing?
4. What are your uptime SLAs?
5. Just how fast is "high bandwidth"? How much latency?

Anyway, will it work? Possibly. What are the disadvantages? Well, it depends on a bunch of things you haven't told us.

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
carpe diem quam minimum credula postero
Re: Backup solution
You can consider using a WAN optimization appliance such as a Riverbed Steelhead to significantly speed up your transfers, though that will cost. It is a common approach to speed up inter-datacenter transfers. Steelheads for the AWS EC2 cloud are also available. (Disclaimer: I used to write software for the physical and AWS Steelheads.) Philip On Mar 15, 2013, at 9:22 AM, Aaron Turner synfina...@gmail.com wrote: On Fri, Mar 15, 2013 at 3:12 AM, Rene Kochen rene.koc...@emea.schange.com wrote: Thank you. I have a high bandwidth connection. But that also means that regular repairs on the backup data-center will take a long time. Honestly, at this point I don't think anyone can provide you any good feedback based on facts because so far you haven't given us any facts. Like: 1. How big of a data set? 2. How many nodes in your primary DC? 3. How many transactions/sec is your primary DC doing? 4. What are your uptime SLA's? 5. Just how fast is high bandwidth How much latency? Anyways, will it work? Possibly. What are the disadvantages? Well it depends on a bunch of things you haven't told us. -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: cassandra 1.2.2 build generates slightly different than one on website
Very weird. I went back and tried to reproduce it after cleaning all changes from git. I am not sure how that got deleted, nor how I ended up with wordcount *.class files, since I am not doing any map/reduce or anything. Oh well, I must have made a mistake somewhere.

Thanks,
Dean

From: Sylvain Lebresne sylv...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Friday, March 15, 2013 11:51 AM
To: user@cassandra.apache.org
Subject: Re: cassandra 1.2.2 build generates slightly different than one on website

I suspect you are doing something wrong, because both the released archive (I just checked) and the tag (as shown here: https://github.com/apache/cassandra/blob/cassandra-1.2.2/bin/cassandra.in.sh) have the cassandra.in.sh file.

--
Sylvain

On Fri, Mar 15, 2013 at 6:11 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

On git, I checked out tag 1.2.2 and built it, and then ran tar -xvf on the bin distro, but it

1. Has extra *.class files in the apache-cassandra-1.2.2-SNAPSHOT/bin directory
2. Is missing cassandra.in.sh, so it would not actually start properly?

The second one took me a while to figure out. This makes me unsure whether the tag actually matches what was released.

Thanks,
Dean
Re: Can't replace dead node
I removed Priam and get the same picture. What I do is add two lines to cassandra-env.sh and start Cassandra:

JVM_OPTS="$JVM_OPTS -Dcassandra.initial_token=aaba"
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_token=aaba"

Then I can successfully run the ring command:

Note: Ownership information does not include topology, please specify a keyspace.
Address         DC       Rack  Status  State   Load       Owns    Token
                                                                  Token(bytes[aaba])
10.28.241.14    us-east  1a    Up      Normal  251.96 GB  33.33%  Token(bytes[0010])
10.240.119.230  us-east  1b    Up      Normal  252.48 GB  33.33%  Token(bytes[5565])
10.147.174.27   us-east  1c    Up      Normal  11.26 KB   33.33%  Token(bytes[aaba])

It shows the current node as part of the ring, but it is empty. In the data directory I can see only the system keyspace. There are no errors in the log file. It just doesn't stream data from other nodes. I can launch 1.1.6 but not 1.1.7 or higher. Any ideas what was changed in 1.1.7?

Thank you,
Andrey

INFO [main] 2013-03-15 18:20:45,303 AbstractCassandraDaemon.java (line 101) Logging initialized
INFO [main] 2013-03-15 18:20:45,309 AbstractCassandraDaemon.java (line 122) JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_35
INFO [main] 2013-03-15 18:20:45,310 AbstractCassandraDaemon.java (line 123) Heap size: 1931476992/1931476992
INFO [main] 2013-03-15 18:20:45,311 AbstractCassandraDaemon.java (line 124) Classpath:
/opt/apache-cassandra-1.1.10/bin/../conf:/opt/apache-cassandra-1.1.10/bin/../build/classes/main:/opt/apache-cassandra-1.1.10/bin/../build/classes/thrift:/opt/apache-cassandra-1.1.10/bin/../lib/antlr-3.2.jar:/opt/apache-cassandra-1.1.10/bin/../lib/apache-cassandra-1.1.10.jar:/opt/apache-cassandra-1.1.10/bin/../lib/apache-cassandra-clientutil-1.1.10.jar:/opt/apache-cassandra-1.1.10/bin/../lib/apache-cassandra-thrift-1.1.10.jar:/opt/apache-cassandra-1.1.10/bin/../lib/avro-1.4.0-fixes.jar:/opt/apache-cassandra-1.1.10/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/apache-cassandra-1.1.10/bin/../lib/commons-cli-1.1.jar:/opt/apache-cassandra-1.1.10/bin/../lib/commons-codec-1.2.jar:/opt/apache-cassandra-1.1.10/bin/../lib/commons-lang-2.4.jar:/opt/apache-cassandra-1.1.10/bin/../lib/compress-lzf-0.8.4.jar:/opt/apache-cassandra-1.1.10/bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:/opt/apache-cassandra-1.1.10/bin/../lib/guava-r08.jar:/opt/apache-cassandra-1.1.10/bin/../lib/high-scale-lib-1.1.2.jar:/opt/apache-cassandra-1.1.10/bin/../lib/jackson-core-asl-1.9.2.jar:/opt/apache-cassandra-1.1.10/bin/../lib/jackson-mapper-asl-1.9.2.jar:/opt/apache-cassandra-1.1.10/bin/../lib/jamm-0.2.5.jar:/opt/apache-cassandra-1.1.10/bin/../lib/jline-0.9.94.jar:/opt/apache-cassandra-1.1.10/bin/../lib/jna.jar:/opt/apache-cassandra-1.1.10/bin/../lib/json-simple-1.1.jar:/opt/apache-cassandra-1.1.10/bin/../lib/libthrift-0.7.0.jar:/opt/apache-cassandra-1.1.10/bin/../lib/log4j-1.2.16.jar:/opt/apache-cassandra-1.1.10/bin/../lib/metrics-core-2.0.3.jar:/opt/apache-cassandra-1.1.10/bin/../lib/priam.jar:/opt/apache-cassandra-1.1.10/bin/../lib/servlet-api-2.5-20081211.jar:/opt/apache-cassandra-1.1.10/bin/../lib/slf4j-api-1.6.1.jar:/opt/apache-cassandra-1.1.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/apache-cassandra-1.1.10/bin/../lib/snakeyaml-1.6.jar:/opt/apache-cassandra-1.1.10/bin/../lib/snappy-java-1.0.4.1.jar:/opt/apache-cassandra-1.1.10/bin/../lib/snaptree-0.1.jar:/opt/apache-cassandra-1.1.10/bin/../lib/ja
mm-0.2.5.jar INFO [main] 2013-03-15 18:20:47,406 CLibrary.java (line 111) JNA mlockall successful INFO [main] 2013-03-15 18:20:47,419 DatabaseDescriptor.java (line 123) Loading settings from file:/opt/apache-cassandra-1.1.10/conf/cassandra.yaml INFO [main] 2013-03-15 18:20:47,840 DatabaseDescriptor.java (line 182) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap INFO [main] 2013-03-15 18:20:47,853 DatabaseDescriptor.java (line 246) Global memtable threshold is enabled at 614MB INFO [main] 2013-03-15 18:20:47,879 Ec2Snitch.java (line 66) EC2Snitch using region: us-east, zone: 1c. INFO [main] 2013-03-15 18:20:48,359 CacheService.java (line 96) Initializing key cache with capacity of 92 MBs. INFO [main] 2013-03-15 18:20:48,376 CacheService.java (line 107) Scheduling key cache save to each 14400 seconds (going to save all keys). INFO [main] 2013-03-15 18:20:48,377 CacheService.java (line 121) Initializing row cache with capacity of 0 MBs and provider org.apache.cassandra.cache.SerializingCacheProvider INFO [main] 2013-03-15 18:20:48,384 CacheService.java (line 133) Scheduling row cache save to each 0 seconds (going to save all keys). INFO [main] 2013-03-15 18:20:48,661 DatabaseDescriptor.java (line 509) Couldn't detect any schema definitions in local storage.
Re: HintedHandoff IOError?
JMX ended up just with lots more IOErrors. Did a rolling restart of the cluster and removed the HH family in the mean time. That seemed to do the trick. Thanks! /Janne On Mar 14, 2013, at 06:58 , aaron morton aa...@thelastpickle.com wrote: What is the sanctioned way of removing hints? rm -f HintsColumnFamily*? Truncate from CLI? There is a JMX command to do it for a particular node. But if you just want to remove all of them, stop and delete the files. the only one with zero size are the -tmp- files. It seems odd… Temp files are created during compaction and flushing sstables. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 11/03/2013, at 11:19 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote: Oops, forgot to mention that, did I… Cass 1.1.10. What is the sanctioned way of removing hints? rm -f HintsColumnFamily*? Truncate from CLI? This is ls -l of my /system/HintsColumnFamily/ btw - the only one with zero size are the -tmp- files. 
It seems odd… -rw-rw-r-- 1 ubuntu ubuntu 86373144 Jan 26 21:39 system-HintsColumnFamily-hf-11-Data.db -rw-rw-r-- 1 ubuntu ubuntu 80 Jan 26 21:39 system-HintsColumnFamily-hf-11-Digest.sha1 -rw-rw-r-- 1 ubuntu ubuntu 976 Jan 26 21:39 system-HintsColumnFamily-hf-11-Filter.db -rw-rw-r-- 1 ubuntu ubuntu 11 Jan 26 21:39 system-HintsColumnFamily-hf-11-Index.db -rw-rw-r-- 1 ubuntu ubuntu 4348 Jan 26 21:39 system-HintsColumnFamily-hf-11-Statistics.db -rw-rw-r-- 1 ubuntu ubuntu 569 Feb 27 08:33 system-HintsColumnFamily-hf-23-Data.db -rw-rw-r-- 1 ubuntu ubuntu 80 Feb 27 08:33 system-HintsColumnFamily-hf-23-Digest.sha1 -rw-rw-r-- 1 ubuntu ubuntu 1936 Feb 27 08:33 system-HintsColumnFamily-hf-23-Filter.db -rw-rw-r-- 1 ubuntu ubuntu 11 Feb 27 08:33 system-HintsColumnFamily-hf-23-Index.db -rw-rw-r-- 1 ubuntu ubuntu 4356 Feb 27 08:33 system-HintsColumnFamily-hf-23-Statistics.db -rw-rw-r-- 1 ubuntu ubuntu 5500155 Feb 27 08:57 system-HintsColumnFamily-hf-24-Data.db -rw-rw-r-- 1 ubuntu ubuntu 80 Feb 27 08:57 system-HintsColumnFamily-hf-24-Digest.sha1 -rw-rw-r-- 1 ubuntu ubuntu 16 Feb 27 08:57 system-HintsColumnFamily-hf-24-Filter.db -rw-rw-r-- 1 ubuntu ubuntu 26 Feb 27 08:57 system-HintsColumnFamily-hf-24-Index.db -rw-rw-r-- 1 ubuntu ubuntu 4340 Feb 27 08:57 system-HintsColumnFamily-hf-24-Statistics.db -rw-rw-r-- 1 ubuntu ubuntu0 Feb 27 08:57 system-HintsColumnFamily-tmp-hf-25-Data.db -rw-rw-r-- 1 ubuntu ubuntu0 Feb 27 08:57 system-HintsColumnFamily-tmp-hf-25-Index.db /Janne On Mar 12, 2013, at 08:07 , aaron morton aa...@thelastpickle.com wrote: What version of cassandra are you using? I would stop each node and delete the hints. If it happens again I could either indicate a failing disk or a bug. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 11/03/2013, at 2:13 PM, Robert Coli robert.d.a.c...@gmail.com wrote: On Mon, Mar 11, 2013 at 7:05 AM, Janne Jalkanen janne.jalka...@ecyrd.com wrote: I keep seeing these in my log. 
Three-node cluster, one node is working fine, but two other nodes have increased latencies and these in the error logs (might of course be unrelated). No obvious GC pressure, no disk errors that I can see. Ubuntu 12.04 on EC2, Java 7. Repair is run regularly.

My questions: 1) should I worry, 2) what might be going on, and 3) is there any way to get rid of these? Can I just blow my HintedHandoff table to smithereens?

http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/io/sstable/IndexHelper.java

    public static Filter defreezeBloomFilter(FileDataInput file, long maxSize, boolean useOldBuffer) throws IOException
    {
        int size = file.readInt();
        if (size > maxSize || size <= 0)
            throw new EOFException("bloom filter claims to be " + size + " bytes, longer than entire row size " + maxSize);
        ByteBuffer bytes = file.readBytes(size);

Based on the above, I would suspect either a zero-byte -Filter.db file or a corrupt one.

Probably worry a little bit, but only a little bit unless your cluster is RF=1.

=Rob
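The size check in the quoted excerpt can be sketched in Python as follows. This is only an illustration of the same guard (a length prefix that is zero, negative, or larger than the allowed maximum is rejected before anything is read); the function name read_length_prefixed is an assumption, and the sketch is not Cassandra's actual file format handling.

```python
import io
import struct


def read_length_prefixed(stream, max_size):
    """Read a 4-byte big-endian signed length, validate it, then read the payload."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream ended before the length prefix")
    (size,) = struct.unpack(">i", header)
    # Same guard as the excerpt: reject non-positive or oversized claims.
    if size > max_size or size <= 0:
        raise EOFError(f"payload claims to be {size} bytes, limit is {max_size}")
    payload = stream.read(size)
    if len(payload) < size:
        raise EOFError("stream ended before the payload was complete")
    return payload


if __name__ == "__main__":
    good = io.BytesIO(struct.pack(">i", 3) + b"abc")
    print(read_length_prefixed(good, max_size=10))
```

A zero-length or truncated input fails at the first or second check, which is the kind of failure a zero-byte or corrupt file would be expected to trigger.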
secondary index problem
We have a CF with an indexed column 'type', but we get incomplete results when we query that CF for all rows matching 'type'. We can find the missing rows if we query by key.

* We are seeing this on a small, single-node 1.2.2 instance with few rows.
* We use Thrift execute_cql_query; no CL is specified.
* None of repair, restart, compact, or scrub helped.

Finally, nodetool rebuild_index fixed it.

Is index rebuild something we need to do periodically? How often? Is there a way to know when it needs to be done? Do we have to run rebuild on all nodes?

We had not noticed this before 1.2.

Regards,
- Brett
Secondary Indexes
We need to provide search capability based on a field that is a bitmap combination of 18 possible values. We want to use secondary indexes to improve performance. One possible solution is to create a named column for each value and have a secondary index on each of the 18 columns.

Questions we have are:

- Will that result in Cassandra creating 18 new column families, one for each index?
- If a given column is not specified in any rows, will Cassandra still create an index column family?
- The documentation says that indexes are rebuilt with every Cassandra restart. Why is that needed? What does the rebuild do? Does it read the whole column family into memory at once?
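To make the "one named column per value" idea concrete, here is a small Python sketch that expands an 18-bit bitmap into the columns that would be written for a row. The column names (flag_0 to flag_17) and the stored value "1" are assumptions for illustration only; the message does not say how the columns would be named.

```python
FLAG_COUNT = 18  # the field combines 18 possible values


def bitmap_to_columns(bitmap, prefix="flag_"):
    """Return a dict of column name -> value for every bit set in the bitmap."""
    if not 0 <= bitmap < (1 << FLAG_COUNT):
        raise ValueError(f"bitmap must fit in {FLAG_COUNT} bits")
    # Only set bits produce a column; unset flags are simply absent from the row.
    return {f"{prefix}{i}": "1" for i in range(FLAG_COUNT) if bitmap & (1 << i)}


if __name__ == "__main__":
    print(bitmap_to_columns(0b101))
```

With this layout, a search for rows carrying value i becomes an equality lookup on the indexed column for that bit, and rows that lack the flag have no such column at all.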
Re: secondary index problem
This could be either of the following bugs (which might be the same thing). I get it too every time I recycle a node on 1.1.10. https://issues.apache.org/jira/browse/CASSANDRA-4973 or https://issues.apache.org/jira/browse/CASSANDRA-4785 /Janne On Mar 15, 2013, at 23:24 , Brett Tinling btinl...@lacunasystems.com wrote: We have a CF with an indexed column 'type', but we get incomplete results when we query that CF for all rows matching 'type'. We can find the missing rows if we query by key. * we are seeing this on a small, single node, 1.2.2 instance with few rows. * we use thrift execute_cql_query, no CL is specified * none of repair, restart, compact, scrub helped Finally, nodetool rebuild_index fixed it. Is index rebuild something we need to do periodically? How often? Is there a way to know when it needs to be done? Do we have to run rebuild on all nodes? We have not noticed this until 1.2 Regards, - Brett
java.lang.OutOfMemoryError: unable to create new native thread
I have a Cassandra node that is going down frequently with 'java.lang.OutOfMemoryError: unable to create new native thread'. It's a 16 GB VM, of which 4 GB is set as Xmx, and there are no other processes running on the VM. About 300 clients connect to this node on average. I have no indication from vmstat/sar that the VM has used more memory or is memory hungry, so this doesn't look like a memory issue to me. I'd appreciate any pointers.

System specs: 2 CPU, 16 GB, RHEL 6.2

Thank you.
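One thing that may be worth checking, as an assumption rather than a diagnosis: this particular OutOfMemoryError is raised when the JVM fails to create an OS thread, which can happen because of per-user process/thread limits or thread stack size rather than Java heap. A minimal Python sketch for printing the relevant limits follows; the function name thread_related_limits is hypothetical, and the script reports the limits of the user who runs it, so it would need to be run as the user that starts Cassandra.

```python
import resource


def thread_related_limits():
    """Return (soft, hard) limits that can affect native thread creation."""
    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    limits = {}
    for name in ("RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS"):
        # Not every platform defines every limit, so skip the missing ones.
        if hasattr(resource, name):
            soft, hard = resource.getrlimit(getattr(resource, name))
            limits[name] = (fmt(soft), fmt(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in thread_related_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

On Linux, threads count against the process limit (RLIMIT_NPROC), so a low soft limit combined with a few hundred client connections is one possible explanation to rule out.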
Cassandra Compression and Wide Rows
Hey Guys, I remember reading somewhere that C* compression is not very effective when most of the CFs are in wide-row format, and that some folks turn compression off and use disk-level compression as a workaround. Considering that wide rows with composites are first-class citizens in CQL3, is this still the case? Have there been any improvements on this? Thanks, Drew
Waiting on read repair?
I've got a couple of questions related to issues I'm encountering using Cassandra under a heavy write load:

1. With a ConsistencyLevel of QUORUM, does FBUtilities.waitForFutures() wait for read repair to complete before returning?
2. When read repair applies a mutation, it needs to obtain a lock for the associated memtable. Does compaction obtain this same lock? (I'm asking because I've seen read repair spend a few seconds stalling in org.apache.cassandra.db.Table.apply.)

Thanks,
Jasdeep
Re: Backup solution
On Fri, Mar 15, 2013 at 10:35 AM, Rene Kochen rene.koc...@emea.schange.com wrote:

Hi Aaron,

We have many deployments, but typically:

- Live cluster of six nodes, replication factor = 3.
- A node processes more reads than writes (approximately 100 get_slices per second, narrow rows).
- Data per node is about 50 to 100 GBytes.
- We should recover within 4 hours.

The idea is to put the backup cluster close to the live cluster with a gigabit connection only for Cassandra.

100 reads/sec/node doesn't sound like a lot to me... And 100G/node is far below the recommended limit. Sounds to me like you've possibly over-specced your cluster (not a bad thing, just an observation). Of course, if your data set is growing, then...

That said, I wouldn't consider a single node in a 2nd DC receiving updates via Cassandra a backup. That's because a bug in Cassandra which corrupts your data, or a user accidentally doing the wrong thing (like issuing deletes they shouldn't), gets replicated to all your nodes, including the one in the other DC.

A real backup would be to take snapshots on the nodes and then copy them off the cluster.

I'd say replication is good if you want a hot standby for a disaster recovery site so you can quickly recover from a hardware fault. Especially if you have a 4hr SLA, how are you going to get your primary DC back up after a fire, earthquake, etc. in 4 hours? Heck, a switch failure might knock you out for 4 hours, depending on how quickly you can swap another one in and how recent your config backups are. Better to have a DR site with a smaller set of nodes with the data ready to go. Maybe they won't be as fast, but hopefully you can make sure the most important queries are handled. But for that, I would probably go with something more than just a single node in the DR DC.

One thing to remember is that compactions will limit the feasible single-node size to something smaller than you can potentially allocate disk space for. I.e., just because you can build a 4TB disk array doesn't mean you can have a single Cassandra node with 4TB of data. Typically, people around here seem to recommend ~400GB, but that depends on hardware.

Honestly, for the price of a single computer you could test this pretty easily. That's what I'd do.

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
carpe diem quam minimum credula postero
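As a rough back-of-the-envelope check on the figures quoted in this thread (six nodes, up to 100 GB per node, a gigabit link, a 4-hour recovery target), the raw transfer time can be estimated with a few lines of Python. The efficiency parameter is an assumption standing in for protocol overhead and contention, and the estimate ignores everything that happens after the bytes arrive (streaming, compaction, index rebuilds), so it is a lower bound only.

```python
def transfer_hours(data_gb, link_gbit_per_s, efficiency=1.0):
    """Hours needed to move data_gb gigabytes over a link of the given speed."""
    bits = data_gb * 1e9 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    total_gb = 6 * 100  # six nodes at the upper figure of 100 GB each
    print(f"line rate:      {transfer_hours(total_gb, 1.0):.2f} h")
    print(f"50% efficiency: {transfer_hours(total_gb, 1.0, efficiency=0.5):.2f} h")
```

At full line rate 600 GB takes 4800 seconds, about 1.33 hours, and at an assumed 50% efficiency about 2.67 hours, which leaves limited headroom inside a 4-hour window once the non-transfer work is added.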
RE: java.lang.OutOfMemoryError: unable to create new native thread
I think I figured out where the issue is. I will keep you posted soon. From: as...@outlook.com To: user@cassandra.apache.org Subject: java.lang.OutOfMemoryError: unable to create new native thread Date: Fri, 15 Mar 2013 17:54:25 -0500 I have a Cassandra node that is going down frequently with 'java.lang.OutOfMemoryError: unable to create new native thread. Its a 16GB VM out of which 4GB is set as Xmx and there are no other process running on the VM. I have about 300 clients connecting to this node on an average. I have no indication from vmstats/SAR that my VM has used more memory or is memory hungry. Doesn't indicate a memory issue to me. Appreciate any pointers to this. System Specs: 2CPU16GBRHEL 6.2 Thank you.
Re: Incompatible Gossip 1.1.6 to 1.2.1 Upgrade?
Thank you very much Aaron. I recall from the logs of this upgraded node to 1.2.2 reported seeing others as dead. Brandon suggested in https://issues.apache.org/jira/browse/CASSANDRA-5332 that I should at least upgrade from 1.1.7. So, I decided to try upgrading to 1.1.10 first before upgrading to 1.2.2. I am in the middle of troubleshooting some other issues I had with that upgrade (posted separately), once I am done, I will give your suggestion a try. On Mon, Mar 11, 2013 at 10:34 PM, aaron morton aa...@thelastpickle.comwrote: Is this just a display bug in nodetool or this upgraded node really sees the other ones as dead? Is the 1.2.2 node which is see all the others as down processing requests ? Is it showing the others as down in the log ? I'm not really sure what's happening. But you can try starting the 1.2.2 node with the -Dcassandra.load_ring_state=false parameter, append it at the bottom of the cassandra-env.sh file. It will force the node to get the ring state from the others. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 8/03/2013, at 10:24 PM, Arya Goudarzi gouda...@gmail.com wrote: OK. I upgraded one node from 1.1.6 to 1.2.2 today. Despite some new problems that I had and I posted them in a separate email, this issue still exists but now it is only on 1.2.2 node. This means that the nodes running 1.1.6 see all other nodes including 1.2.2 as Up. Here is the ring and gossip from nodes with 1.1.6 for example. 
Bold denotes upgraded node: Address DC RackStatus State Load Effective-Ownership Token 141784319550391026443072753098378663700 XX.180.36us-east 1b Up Normal 49.47 GB 25.00% 1808575600 XX.231.121 us-east 1c Up Normal 47.08 GB 25.00% 7089215977519551322153637656637080005 XX.177.177 us-east 1d Up Normal 33.64 GB 25.00% 14178431955039102644307275311465584410 XX.7.148us-east 1b Up Normal 41.27 GB 25.00% 42535295865117307932921825930779602030 XX.20.9 us-east 1c Up Normal 38.51 GB 25.00% 49624511842636859255075463585608106435 XX.86.255us-east 1d Up Normal 34.78 GB 25.00% 56713727820156410577229101240436610840 XX.63.230us-east 1b Up Normal 38.11 GB 25.00% 85070591730234615865843651859750628460 XX.163.36 us-east 1c Up Normal 44.25 GB 25.00% 92159807707754167187997289514579132865 XX.31.234us-east 1d Up Normal 44.66 GB 25.00% 99249023685273718510150927169407637270 XX.132.169 us-east 1b Up Normal 44.2 GB 25.00% 127605887595351923798765477788721654890 XX.71.63 us-east 1c Up Normal 38.74 GB 25.00% 134695103572871475120919115443550159295 XX.197.209 us-east 1d Up Normal 41.5 GB 25.00% 141784319550391026443072753098378663700 /XX.71.63 RACK:1c SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575 LOAD:4.1598705272E10 DC:us-east INTERNAL_IP:XX.194.92 STATUS:NORMAL,134695103572871475120919115443550159295 RPC_ADDRESS:XX.194.92 RELEASE_VERSION:1.1.6 /XX.86.255 RACK:1d SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575 LOAD:3.734334162E10 DC:us-east INTERNAL_IP:XX.6.195 STATUS:NORMAL,56713727820156410577229101240436610840 RPC_ADDRESS:XX.6.195 RELEASE_VERSION:1.1.6 /XX.7.148 RACK:1b SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575 LOAD:4.4316975808E10 DC:us-east INTERNAL_IP:XX.47.250 STATUS:NORMAL,42535295865117307932921825930779602030 RPC_ADDRESS:XX.47.250 RELEASE_VERSION:1.1.6 /XX.63.230 RACK:1b SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575 LOAD:4.0918593305E10 DC:us-east INTERNAL_IP:XX.89.127 STATUS:NORMAL,85070591730234615865843651859750628460 RPC_ADDRESS:XX.89.127 RELEASE_VERSION:1.1.6 /XX.132.169 RACK:1b 
SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575 LOAD:4.745883458E10 DC:us-east INTERNAL_IP:XX.94.161 STATUS:NORMAL,127605887595351923798765477788721654890 RPC_ADDRESS:XX.94.161 RELEASE_VERSION:1.1.6 /XX.180.36 RACK:1b SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575 LOAD:5.311963027E10 DC:us-east INTERNAL_IP:XX.123.112 STATUS:NORMAL,1808575600 RPC_ADDRESS:XX.123.112 RELEASE_VERSION:1.1.6 /XX.163.36 RACK:1c SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575 LOAD:4.7516755022E10 DC:us-east INTERNAL_IP:XX.163.180 STATUS:NORMAL,92159807707754167187997289514579132865 RPC_ADDRESS:XX.163.180 RELEASE_VERSION:1.1.6 /XX.31.234 RACK:1d SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575 LOAD:4.7954372912E10
Re: cql query not giving any result.
Any suggestions? -Vivek On Fri, Mar 15, 2013 at 5:20 PM, Vivek Mishra mishra.v...@gmail.com wrote: Ok. So it's a case when, CQL returns rowkey value as key and there is also column present with name as key. Sounds like a bug? -Vivek On Fri, Mar 15, 2013 at 5:17 PM, Kuldeep Mishra kuld.cs.mis...@gmail.comwrote: Hi Sylvain, I created it using thrift client, here is column family creation script, Cassandra.Client client; CfDef user_Def = new CfDef(); user_Def.name = DOCTOR; user_Def.keyspace = KunderaExamples; user_Def.setComparator_type(UTF8Type); user_Def.setDefault_validation_class(UTF8Type); user_Def.setKey_validation_class(UTF8Type); ColumnDef key = new ColumnDef(ByteBuffer.wrap(KEY.getBytes()), UTF8Type); key.index_type = IndexType.KEYS; ColumnDef age = new ColumnDef(ByteBuffer.wrap(AGE.getBytes()), UTF8Type); age.index_type = IndexType.KEYS; user_Def.addToColumn_metadata(key); user_Def.addToColumn_metadata(age); client.set_keyspace(KunderaExamples); client.system_add_column_family(user_Def); Thanks KK On Fri, Mar 15, 2013 at 4:24 PM, Sylvain Lebresne sylv...@datastax.comwrote: On Fri, Mar 15, 2013 at 11:43 AM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote: Hi, Is it possible in Cassandra to make multiple column with same name ?, like in this particular scenario I have two column with same name as key, first one is rowkey and second on is column name . No, it shouldn't be possible and that is your problem. How did you created that table? -- Sylvain Thanks and Regards Kuldeep On Fri, Mar 15, 2013 at 4:05 PM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote: Hi , Following cql query not returning any result cqlsh:KunderaExamples select * from DOCTOR where key='kuldeep'; I have enabled secondary indexes on both column. Screen shot is attached Please help -- Thanks and Regards Kuldeep Kumar Mishra +919540965199 -- Thanks and Regards Kuldeep Kumar Mishra +919540965199 -- Thanks and Regards Kuldeep Kumar Mishra +919540965199
Re: hinted handoff disabling trade-offs
Thanks Aaron. I am using CL=ONE and read_repair_chance=0. The part I'm wondering about is what happens to the internal Cassandra writes if hinted handoffs are disabled. I think I understand what it means for application-level data, but I'm not entirely sure what it could mean for Cassandra internals.

My cluster is under heavy write load. I'm considering disabling hinted handoffs so the nodes recover quicker in case compactions begin to back up.

On Wed, Mar 6, 2013 at 2:06 AM, aaron morton aa...@thelastpickle.com wrote:

The advantage of HH is that it reduces the probability of a DigestMismatch when using CL ONE. A DigestMismatch means the read has to run a second time before returning to the client.

> - No risk of hinted-handoffs building up
> - No risk of hinted-handoffs flooding a node that just came up

See the yaml config settings for the max hint window and the throttling.

> Can anyone suggest any other factors that I'm missing here? Specifically reasons not to do this.

If you are doing this for performance, first make sure your data model is efficient, that you are doing the most efficient reads (see my presentation here http://www.datastax.com/events/cassandrasummit2012/presentations), and that your caching is bang on. Then consider whether you can tune the CL, and whether your client is token aware so that it directs traffic to a node that has the data.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 4/03/2013, at 9:19 PM, Michael Kjellman mkjell...@barracuda.com wrote:

Also, if you have enough hints being created that it's significantly impacting your heap, I have a feeling things are going to get out of sync very quickly.

On Mar 4, 2013, at 9:17 PM, Wz1975 wz1...@yahoo.com wrote:

Why do you think disabling hinted handoff will improve memory usage?

Thanks.
-Wei

Sent from my Samsung smartphone on ATT

Original message
Subject: Re: hinted handoff disabling trade-offs
From: Michael Kjellman mkjell...@barracuda.com
To: user@cassandra.apache.org user@cassandra.apache.org
CC:

Repair is slow.

On Mar 4, 2013, at 8:07 PM, Matt Kap matvey1...@gmail.com wrote:

I am looking to get a second opinion about disabling hinted handoffs. I have an application that can tolerate a fair amount of inconsistency (advertising domain), and so I'm weighing the pros and cons of hinted handoffs. I'm running Cassandra 1.0, looking to upgrade to 1.1 soon.

Pros of disabling hinted handoffs:
- Reduces heap
- Improves GC performance
- No risk of hinted-handoffs building up
- No risk of hinted-handoffs flooding a node that just came up

Cons:
- Some writes can be lost, at least until repair runs

Can anyone suggest any other factors that I'm missing here? Specifically reasons not to do this.

Cheers!
-Matt

Copy, by Barracuda, helps you store, protect, and share all your amazing things. Start today: www.copy.com.

-- www.calcmachine.com - easy online calculator.
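
For readers looking for the yaml settings Aaron refers to (the hinted handoff switch, the max hint window and the throttling), a fragment of cassandra.yaml along the following lines is where they live. This is a sketch only: it assumes the option names used by the 1.0/1.1 line discussed in this thread, the values are illustrative rather than recommendations, and option names differ between releases, so the cassandra.yaml shipped with the installed version is the authority.

```yaml
# Sketch of hinted handoff settings in cassandra.yaml (1.0/1.1-era names assumed).
# Values are illustrative only; check the file shipped with your version.

# Set to false to stop coordinators from storing hints for down replicas.
# Writes missed by a down replica then stay missing until repair runs.
hinted_handoff_enabled: true

# How long a node may be down while hints are still collected for it.
# After this window no further hints are stored for that node.
max_hint_window_in_ms: 3600000   # one hour, illustrative

# Pause after delivering each hint, which limits how quickly a node
# that has just come back up is sent its backlog.
hinted_handoff_throttle_delay_in_ms: 50   # illustrative
```

A shorter hint window and a larger throttle delay address the "building up" and "flooding" concerns from the original message while keeping hints enabled, which may be a gentler first step than turning hinted handoff off.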