Re: Cassandra Client Recommendation
Hi Techy,

We are using Astyanax with Cassandra 1.2.4.

Benefits:
* It is easy to configure and use.
* Good wiki
* Maintained by Netflix
* Has a solution for storing big files (more than 15MB)
* Has a solution for reading all rows efficiently

Problems:
* It consumes more memory

2013/4/16 Techy Teck comptechge...@gmail.com

Hello, I have recently started working with the Cassandra database. Now I am in the process of evaluating which Cassandra client I should go forward with. I am mainly interested in these three:
1) Astyanax client
2) New Datastax client that uses the binary protocol
3) Pelops client
Can anyone provide some thoughts on this? Some advantages and disadvantages of these three would be a great start for me. Keep in mind, we are running Cassandra 1.2.2 in our production environment. Thanks for the help.

-- Everton Lima Aleixo
Bachelor in Computer Science from UFG
Master's student in Computer Science at UFG
Programmer at LUPA
Re: Cassandra Client Recommendation
Thanks Everton for the suggestion. A couple of questions:
1) Does the Astyanax client have any problems with previous versions of Cassandra?
2) You said one problem is that it consumes more memory. Can you elaborate on that slightly? What do you mean by that?
3) Does Astyanax support async capabilities?
Re: Cassandra Client Recommendation
1) Does the Astyanax client have any problems with previous versions of Cassandra?
We have used it with 1.1.8, but for that version we do not use the latest version of Astyanax. I think that for Cassandra 1.2.* the latest version of Astyanax will work.

2) You said one problem is that it consumes more memory. Can you elaborate on that slightly? What do you mean by that?
In our tests, the process's memory usage was higher with Astyanax than when using the raw TBinaryProtocol (cassandra-all.jar) directly. So you need more memory for your process.

3) Does Astyanax support async capabilities?
What would be an example of an async capability?
-- Everton Lima Aleixo
Bachelor in Computer Science from UFG
Master's student in Computer Science at UFG
Programmer at LUPA
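Regarding the "async capability" question above: it usually means the client returns a Future immediately instead of blocking the calling thread until the network call completes. A minimal JDK-only sketch of the idea (syncWrite here is a hypothetical stand-in for a blocking client call, not Astyanax's actual API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncSketch {
    // Simulates a client call that blocks on network I/O.
    static String syncWrite(String key) {
        return "wrote:" + key;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Async version: submit the blocking call and get a Future back immediately.
        Future<String> result = pool.submit(() -> syncWrite("row1"));
        // The caller is free to do other work here, then collect the result.
        System.out.println(result.get()); // prints "wrote:row1"
        pool.shutdown();
    }
}
```

A callback-based API would go one step further by not requiring the `get()` at all, but the Future shape is the usual first step.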
RE: Cassandra Client Recommendation
Hi,

We are using Cassandra 1.6 at this moment. We started with Hector, because it is the first recommendation you will find in a simple Google search for Java clients for Cassandra. When we started to have non-dynamic column families, which can be managed using CQL, we moved to Astyanax because:
- It is easy to understand the code, even for people who have never worked with Cassandra.
- The CQL implementation offers more capabilities.
- Astyanax is prepared to use CQL 3; with Hector we experienced some problems (probably our fault, but with Astyanax everything worked from the beginning).
- Astyanax allows the use of compound primary keys.

In the next months we are going to replace Hector with Astyanax entirely, but at this moment we are using both:
- Astyanax for CQL.
- Hector for dynamic column families.
Re: Reduce Cassandra GC
You're right, it's probably hard. I should have provided more data. I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the log indicates that JNA is working, please correct me if I'm wrong:
CLibrary.java (line 111) JNA mlockall successful
Total amount of RAM is 4GB. My description of data size was very bad, sorry about that. Data set size is 12.3 GB per node, compressed. Heap size is 998.44MB according to nodetool info. Key cache is 49MB according to nodetool info. Row cache size is 0 bytes according to nodetool info. Max new heap is 205MB according to the "Par Eden Space" memory pool max in jconsole. Memtable is left at default, which should give it 333MB according to the documentation (uncertain where I can verify this). Our production cluster seems similar to your dev cluster, so possibly increasing the heap to 2GB might help our issues. I am still interested in getting rough estimates of how much heap will be needed as data grows. Other than empirical studies, how would I go about getting such estimates?

2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com

How could one provide any help without any knowledge about your cluster, node and environment settings? 40GB was calculated from 2 nodes with RF=2 (each has 100% of the data range): 2.4-2.5M rows * 6 cols * 3kB as a minimum, without compression and any overhead (sstables, bloom filters and indexes). With ParNew GC times such as yours, even if it is a swapping issue, I can only say that the heap size is too small. Check heap and new heap sizes, memtable and cache sizes. Are you on Linux? Is JNA installed and used? What is the total amount of RAM? Just for a DEV environment we use 3 virtual machines with 4GB RAM and a 2GB heap without any GC issues, with amounts of data from 0 to 16GB compressed on each node. Memtable space sized to 100MB, New Heap 400MB.
Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
Take a ride with Adform's Rich Media Suite http://vimeo.com/adform/richmedia

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
Sent: Tuesday, April 16, 2013 12:52
To: user@cassandra.apache.org
Subject: Re: Reduce Cassandra GC

How do you calculate the heap / data size ratio? Is this a linear ratio? Each node has slightly more than 12 GB right now though.

2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com

For 40GB of data, 1GB of heap is too low.
From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
Sent: Tuesday, April 16, 2013 10:47
To: user@cassandra.apache.org
Subject: Reduce Cassandra GC

Hi, We have a small production cluster with two nodes. The load on the nodes is very small, around 20 reads/sec and about the same for writes. There are around 2.5 million keys in the cluster and an RF of 2. About 2.4 million of the rows are skinny (6 columns) and around 3kB
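Viktor's 40GB figure can be sanity-checked with a quick back-of-the-envelope calculation. This is only a rough sketch: the real on-disk size adds sstable, index and bloom-filter overhead and subtracts whatever compression saves.

```java
public class DataSizeEstimate {
    // Rough minimum data size: rows * columns * bytes-per-column.
    static long estimateBytes(long rows, long cols, long bytesPerCol) {
        return rows * cols * bytesPerCol;
    }

    public static void main(String[] args) {
        // ~2.4M skinny rows, 6 columns of ~3kB each; each of the two
        // nodes already holds 100% of the range at RF=2.
        long bytes = estimateBytes(2_400_000L, 6, 3_000L);
        System.out.println(bytes / 1_000_000_000L + " GB minimum per node");
        // roughly the 40GB quoted in the thread
    }
}
```

The same multiplication gives a first-order way to estimate how heap pressure scales as row count grows, before doing empirical measurements.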
Key-Token mapping in cassandra
We would like to map multiple keys to a single token in Cassandra. I believe this should be possible now with CASSANDRA-1034. For example:
Key1 -- 123/IMAGE
Key2 -- 123/DOCUMENTS
Key3 -- 123/MULTIMEDIA
I would like all keys with the prefix 123 to be mapped to a single token. Is this possible? Which Partitioner should I most likely extend, writing my own, to achieve the desired result?
-- Ravi
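The core of such a partitioner would be to compute the token from only the part of the key before the '/' separator, so all keys sharing a prefix land on the same token. A self-contained illustration of that hashing rule follows; the class and method names are hypothetical (in real Cassandra you would implement IPartitioner, and CRC32 here is just a stand-in for a real hash function):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class PrefixTokenSketch {
    // Token is derived from the key prefix before '/', so "123/IMAGE"
    // and "123/DOCUMENTS" map to the same token.
    static long tokenFor(String key) {
        int slash = key.indexOf('/');
        String prefix = (slash >= 0) ? key.substring(0, slash) : key;
        CRC32 crc = new CRC32(); // stand-in for a production hash like Murmur
        crc.update(prefix.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println(tokenFor("123/IMAGE") == tokenFor("123/DOCUMENTS")); // true
        System.out.println(tokenFor("123/IMAGE") == tokenFor("456/IMAGE"));     // false
    }
}
```

Note the trade-off: deliberately colliding tokens concentrates all rows of one prefix on one replica set, which helps locality but can create hot spots.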
InvalidRequestException: Start key's token sorts after end token
Hi, I am getting an exception when I run Hadoop with Cassandra, as follows:
WARN org.apache.hadoop.mapred.Child (main): Error running child
java.lang.RuntimeException: InvalidRequestException(why:Start key's token sorts after end token)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:453)
I don't know exactly what this message means or how to solve the problem. I am using Priam to manage my Cassandra cluster over Elastic MapReduce on Amazon. Any hint helps.
Thanks, Andre
Re: InvalidRequestException: Start key's token sorts after end token
I literally just replied to your Stack Overflow comment, then saw this email. I need the whole stack trace. My guess is the ColumnFamily is configured for one sort method while map/reduce is using another, or something like that when querying, but that's just a guess.
Dean
Re: InvalidRequestException: Start key's token sorts after end token
Dean, sorry, but I saw your comments on Stack Overflow (http://stackoverflow.com/questions/16041727/operationtimeoutexception-cassandra-cluster-aws-emr) just after I sent this message. I think you may be right about the sort method, but Priam sets the Cassandra partitioner to RandomPartitioner, and maybe the correct one would be Murmur3Partitioner when we use Hadoop (I am not sure either). If that is true I have a problem, because I can't change the partitioner with Priam (I think it only works with RandomPartitioner).
Andre
Getting error while inserting data in cassandra table using Java with JDBC
Hi, When I try to insert data into a table using Java with JDBC, I get the error:
InvalidRequestException(why:cannot parse 'Jo' as hex bytes)
My insert query is:
insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10);
This insert query runs successfully from the CQLSH command prompt, but not from the code. The query I used to create the table in CQLSH is:
CREATE TABLE temp (
  id bigint PRIMARY KEY,
  dt_stamp timestamp,
  name text,
  url_id bigint,
  value text
) WITH bloom_filter_fp_chance=0.01
  AND caching='KEYS_ONLY'
  AND comment=''
  AND dclocal_read_repair_chance=0.00
  AND gc_grace_seconds=864000
  AND read_repair_chance=0.10
  AND replicate_on_write='true'
  AND populate_io_cache_on_flush='false'
  AND compaction={'class': 'SizeTieredCompactionStrategy'}
  AND compression={'sstable_compression': 'SnappyCompressor'};
I guess the problem may be because of an undefined key_validation_class, default_validation_class, comparator, etc. Is there any way to define these attributes using CQLSH? I have already tried the ASSUME command but it has not resolved the problem. I am a beginner in Cassandra and need your guidance.
-- Thanks and Regards, Himanshu Joshi
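A likely explanation (an assumption, since the connection details aren't shown): the table was created with CQL3, but the JDBC connection is speaking an older CQL version, where a column the driver has no metadata for defaults to the bytes type, so the string literal 'Jo' must be valid hexadecimal. 'Jo' contains non-hex characters, which is exactly what the parser rejects; '4a6f' (the hex encoding of "Jo") would pass. A quick JDK-only check of that rule:

```java
public class HexCheck {
    // A bytes literal must be valid hexadecimal: only [0-9a-fA-F],
    // and an even number of digits (each byte is two hex chars).
    static boolean isHexLiteral(String s) {
        if (s.length() % 2 != 0) return false;
        for (char c : s.toCharArray()) {
            if (Character.digit(c, 16) < 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isHexLiteral("Jo"));   // false -> "cannot parse 'Jo' as hex bytes"
        System.out.println(isHexLiteral("4a6f")); // true  -> the hex encoding of "Jo"
    }
}
```

If this is the cause, the fix is to make the JDBC connection use CQL3 so the driver sees the table's real column types; check your driver's documentation for how to select the CQL version, rather than hex-encoding values by hand.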
Re: InvalidRequestException: Start key's token sorts after end token
What's the stack trace you see? At the time, I was thinking column scan, not row scan: perhaps your code or Priam's code was doing a column slice within a row set, and the columns are sorted by Integer while Priam is passing in UTF8, or vice versa. I.e., do we know if this is a column sorting issue or a row one?
Dean
looking at making astyanax asynchronous but cassandra-thrift-1.1.1 doesn't look right
Is cassandra-thrift-1.1.1.jar the generated code? I see a send() and recv(), but I don't see a send(Callback cb), which is typical of truly asynchronous platforms. I.e., I obviously don't know when to call recv() myself if I am trying to make Astyanax truly asynchronous. The reason I ask is that we have a 100k-row upload that takes around 30 seconds with 20 synchronous threads, and by simulation we predict this would be done in 3 seconds with an async API, as our threads would not get held up like they do now. I guess we can try to crank it up to 100 threads to get it running a bit faster for now :( :(
Thanks, Dean
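Until a true callback API is available, one workaround for the upload described above is to fake asynchrony client-side: hand the blocking writes to a large thread pool and bound the number in flight with a semaphore, so the submitting thread never stalls on network latency and memory stays bounded. A minimal JDK-only sketch (doBlockingWrite is a hypothetical stand-in for the real thrift/Astyanax call):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledUploader {
    static final AtomicInteger written = new AtomicInteger();

    // Stand-in for the blocking client write.
    static void doBlockingWrite(int row) {
        written.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(100);
        Semaphore inFlight = new Semaphore(200); // cap queued writes for back-pressure
        int rows = 100_000;
        for (int i = 0; i < rows; i++) {
            final int row = i;
            inFlight.acquire(); // blocks only if the pool falls 200 writes behind
            pool.submit(() -> {
                try { doBlockingWrite(row); }
                finally { inFlight.release(); }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(written.get() + " rows written");
    }
}
```

This still burns a thread per in-flight request, so it won't match a real callback-based transport, but it removes the submitter from the critical path.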
Re: How to stop Cassandra and then restart it in windows?
Hello, Can anyone provide any help on this? Thanks in advance.
Raihan Jamal

On Tue, Apr 16, 2013 at 6:50 PM, Raihan Jamal jamalrai...@gmail.com wrote:

Hello, I installed a single-node cluster on my local dev box, which is running Windows 7, and it was working fine. For some reason I needed to restart my desktop, and after that, whenever I do this at the command prompt, it always gives me the below exception:
S:\Apache Cassandra\apache-cassandra-1.2.3\bin> cassandra -f
Starting Cassandra Server
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 7199; nested exception is: java.net.BindException: Address already in use: JVM_Bind
Meaning the port is being used somewhere. I have made some changes in the cassandra.yaml file, so I need to shut down the Cassandra server and then restart it again. Can anybody help me with this? Thanks for the help.
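On Windows the usual fix is to find the old Cassandra JVM still holding JMX port 7199 and kill it: `netstat -ano | findstr 7199` shows the owning PID, then `taskkill /PID <pid> /F` stops it (or end the java.exe process in Task Manager). A small JDK-only sketch of the same check, useful to verify the port is actually free before starting the server again:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    // Returns true if nothing is currently bound to the port
    // (i.e. it is safe to start Cassandra's JMX agent on it).
    static boolean isFree(int port) {
        try (ServerSocket s = new ServerSocket(port)) {
            return true;  // bind succeeded, so the port was free
        } catch (IOException e) {
            return false; // something (an old JVM?) still holds it
        }
    }

    public static void main(String[] args) {
        System.out.println(isFree(7199)
                ? "port 7199 free, ok to start"
                : "port 7199 busy: netstat -ano | findstr 7199, then taskkill /PID <pid> /F");
    }
}
```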
Re: Added extra column as composite key while creation counter column family
On Tue, Apr 16, 2013 at 10:29 PM, Kuldeep Mishra kuld.cs.mis...@gmail.com wrote:
cassandra 1.2.0. Is it a bug in 1.2.0?
While I can't speak to this specific issue, 1.2.0 has meaningful known issues. I suggest upgrading to 1.2.3(/4) ASAP.
=Rob
Re: Thrift message length exceeded
That was our first thought. Using Maven's dependency tree info, we verified that we're using the expected (Cass 1.2.3) jars:
$ mvn dependency:tree | grep thrift
[INFO] | +- org.apache.thrift:libthrift:jar:0.7.0:compile
[INFO] | \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
I've also dumped the final command run by the Hadoop we use (CDH3u5) and verified it's not sneaking thrift in on us.

On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com wrote:

Can you confirm that you are using the same thrift version that ships with 1.2.3?
Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote:

A bump to say I found this http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded so others are seeing similar behavior. From what I can see of org.apache.cassandra.hadoop, nothing has changed since 1.1.5, when we didn't see such things, but it sure looks like there's a bug that has slipped in (or been uncovered) somewhere. I'll try to narrow it down to a dataset and code that can reproduce it.

On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com wrote:

We are using Astyanax in production, but I cut back to just Hadoop and Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem. We do have some extremely large rows, but we went from everything working with 1.1.5 to almost everything carping with 1.2.3. Something has changed. Perhaps we were doing something wrong earlier that 1.2.3 exposed, but surprises are never welcome in production.

On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote:

I also saw this when upgrading from C* 1.0 to 1.2.2, and from Hector 0.6 to 0.8. Turns out the Thrift message really was too long. The mystery to me: why no complaints in previous versions? Were some checks added in Thrift or Hector?
-----Original Message-----
From: Lanny Ripple [mailto:la...@spotright.com]
Sent: Tuesday, April 09, 2013 6:17 PM
To: user@cassandra.apache.org
Subject: Thrift message length exceeded

Hello, We have recently upgraded to Cass 1.2.3 from Cass 1.1.5. We ran sstableupgrades and got the ring on its feet, and we are now seeing a new issue. When we run MapReduce jobs against practically any table we find the following errors:

2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 106
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: org.apache.thrift.TException: Message length exceeded: 106
at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
at org.apache.cassandra.thrift.Column.read(Column.java:528)
at
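If the cause is genuinely oversized thrift messages (e.g. the extremely large rows mentioned above), the relevant knobs in 1.2-era cassandra.yaml are the framed transport and message length limits. The snippet below is only a sketch with illustrative values; verify the exact option names and defaults against your own cassandra.yaml before changing anything, and keep the max message length larger than the frame size:

```yaml
# cassandra.yaml (1.2.x) - thrift message size limits (illustrative values)
thrift_framed_transport_size_in_mb: 60
thrift_max_message_length_in_mb: 64
```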
Re: MySQL Cluster performing faster than Cassandra cluster on single table
How many threads / processes do you have performing the writes? How big are the mutations? Where are you measuring the latency?

Look at nodetool cfhistograms to see the time it takes for a single node to perform a write. Look at nodetool proxyhistograms to see the end-to-end request latency from the coordinator. (The number on the left is microseconds for both.) Generally Cassandra does well with more clients.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 2:56 PM, Jabbar Azam aja...@gmail.com wrote:

MySQL Cluster also has the index in RAM, so with lots of rows the RAM becomes a limiting factor. That's what my colleague found, and hence why we're sticking with Cassandra.

On 16 Apr 2013 21:05, horschi hors...@gmail.com wrote:

Ah, I see, that makes sense. Have you got a source for the storing of hundreds of gigabytes? And does Cassandra not store anything in memory?

It stores bloom filters and index samples in memory. But they are much smaller than the actual data, and they can be configured.

Yeah, my dataset is small at the moment - perhaps I should have chosen something larger for the work I'm doing (university dissertation), however it is far too late to change now!

On paper mysql-cluster looks great. But in daily use it's not as nice as Cassandra (where you have machines dying, networks splitting, etc.).

cheers, Christian
Re: differences between DataStax Community Edition and Cassandra package
It's the same as the Apache version, but DSC comes with samples and the free version of OpsCenter.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 6:36 PM, Francisco Trujillo f.truji...@genetwister.nl wrote:

Hi everyone, Probably this question has been asked by someone in the past. We are using Apache Cassandra 1.6 now and we are planning to update the version. Datastax provides their own Cassandra package called "Datastax Community Edition". I know that the Datastax package has some tools to manage the cluster, like visual interfaces, but is there some important difference in the database itself if we compare it with the same Apache Cassandra that we can download from http://cassandra.apache.org/? Thanks for your help in advance.
Re: Cassandra Client Recommendation
One note on the native binary protocol: AFAIK it's still considered beta in 1.2.
Also, +1 for Astyanax.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
Re: Reduce Cassandra GC
INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600

This does not say that the heap is full. ParNew is GC activity for the new heap, which is typically a smaller part of the overall heap. It sounds like you are running with defaults for the memory config, which is generally a good idea. But 4GB total memory for a node is on the small side.

Try some changes: edit the cassandra-env.sh file and change
MAX_HEAP_SIZE=2G
HEAP_NEWSIZE=400M

You may also want to try:
MAX_HEAP_SIZE=2G
HEAP_NEWSIZE=800M
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"

The size of the new heap generally depends on the number of cores available; see the comments in the -env file. An older discussion about memory use (note that in 1.2 the bloom filters, and compression data, are off heap now):
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

Hope that helps.
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
Our production cluster seems similar to your dev cluster, so increasing the heap to 2GB might possibly help our issues. I am still interested in getting rough estimates of how much heap will be needed as data grows. Other than empirical studies, how would I go about getting such estimates? 2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com How could one provide any help without any knowledge about your cluster, node and environment settings? 40GB was calculated from 2 nodes with RF=2 (each has 100% data range), 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any overhead (sstable, bloom filters and indexes). With ParNew GC time such as yours, even if it is a swapping issue, I could only say that the heap size is too small. Check Heap and New Heap sizes, memtable and cache sizes. Are you on Linux? Is JNA installed and used? What is the total amount of RAM? Just for a DEV environment we use 3 virtual machines with 4GB RAM and a 2GB heap without any GC issue, with amounts of data from 0 to 16GB compressed on each node. Memtable space sized to 100MB, New Heap 400MB. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Follow us on Twitter: @adforminsider Take a ride with Adform's Rich Media Suite Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. 
From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com] Sent: Tuesday, April 16, 2013 12:52 To: user@cassandra.apache.org Subject: Re: Reduce Cassandra GC How do you calculate the heap / data size ratio? Is this a linear ratio? Each node has slightly more than 12 GB right now though. 2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com For 40GB of data, 1GB of heap is too low. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Follow us on Twitter: @adforminsider Take a ride with Adform's Rich Media Suite
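For readers skimming the thread, Aaron's suggested cassandra-env.sh edits laid out as a block. The two variants are alternatives, not cumulative, and the values are his starting points for a 4 GB node rather than universal defaults:

```
# cassandra-env.sh -- first attempt: larger total and new heap
MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="400M"

# second attempt (instead of the first), if ParNew pauses persist:
# a bigger new generation, with objects promoted to the old gen sooner
MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="800M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"
```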
Re: Key-Token mapping in cassandra
CASSANDRA-1034 That ticket is about removing an assumption which was not correct. I would like all keys with 123 as prefix to be mapped to a single token. Why? It's not possible, nor desirable IMHO. Tokens are used to identify a single row internally. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 17/04/2013, at 11:25 PM, Ravikumar Govindarajan ravikumar.govindara...@gmail.com wrote: We would like to map multiple keys to a single token in cassandra. I believe this should be possible now with CASSANDRA-1034 Ex: Key1 -- 123/IMAGE Key2 -- 123/DOCUMENTS Key3 -- 123/MULTIMEDIA I would like all keys with 123 as prefix to be mapped to a single token. Is this possible? Which Partitioner should I most likely extend and write my own to achieve the desired result? -- Ravi
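To see why a shared prefix cannot map to one token, here is a minimal sketch (a rough model for illustration, not Cassandra's actual code) of how RandomPartitioner derives a token from a row key. Because the token is an MD5 hash of the whole key, keys with a common prefix scatter across the ring:

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    """Roughly what RandomPartitioner does: token = abs(MD5(key)), into 0..2**127."""
    h = int.from_bytes(hashlib.md5(key).digest(), "big", signed=True)
    return abs(h) % (2 ** 127)

t1 = random_partitioner_token(b"123/IMAGE")
t2 = random_partitioner_token(b"123/DOCUMENTS")
t3 = random_partitioner_token(b"123/MULTIMEDIA")
# The three prefixed keys land on three unrelated tokens, so a custom
# partitioner mapping them to one token would collapse distinct rows.
assert len({t1, t2, t3}) == 3
```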
Re: Getting error while inserting data in cassandra table using Java with JDBC
What version are you using? And what JDBC driver? Sounds like the driver is not converting the value to bytes for you. I guess the problem may be because of undefined key_validation_class, default_validation_class, comparator etc. If you are using CQL these are not relevant. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/04/2013, at 1:31 AM, himanshu.joshi himanshu.jo...@orkash.com wrote: Hi, When I am trying to insert the data into a table using Java with JDBC, I am getting the error InvalidRequestException(why:cannot parse 'Jo' as hex bytes) My insert query is: insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10); This insert query runs successfully from the CQLSH command prompt but not from the code. The query I used to create the table in CQLSH is: CREATE TABLE temp ( id bigint PRIMARY KEY, dt_stamp timestamp, name text, url_id bigint, value text ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND dclocal_read_repair_chance=0.00 AND gc_grace_seconds=864000 AND read_repair_chance=0.10 AND replicate_on_write='true' AND populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; I guess the problem may be because of undefined key_validation_class, default_validation_class and comparator etc. Is there any way to define these attributes using CQLSH? I have already tried the ASSUME command but it has not resolved the problem. I am a beginner in Cassandra and need your guidance. -- Thanks & Regards, Himanshu Joshi
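The error suggests the driver is passing the literal string where the Thrift layer expects hex-encoded bytes. As a rough illustration of the mismatch (not the driver's actual code path), the value 'Jo' only parses in its hex form:

```python
def to_hex_literal(s: str) -> str:
    """Hex-encode a string the way a bytes-expecting column wants it on the wire."""
    return s.encode("utf-8").hex()

assert to_hex_literal("Jo") == "4a6f"   # 'J' = 0x4a, 'o' = 0x6f
# "cannot parse 'Jo' as hex bytes": 'J' and 'o' are not a valid hex pair,
# so the raw string fails where the encoded form '4a6f' would parse.
int("4a6f", 16)  # parses fine as hex
```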
Multi datacenter setup question
Hello, My test setup consists of two datacenters, DC1 and DC2. DC2 has an offset of 10, as you can see in the following ring command output. I have two questions: 1) Let's say in this case I insert a key at DC2 and its token is, let's say, 85070591730234615865843651857942052874; in this case will it be owned by DC2 and then replicated on DC1? I.e. who owns it? 2) Notice that the Owns distribution is not even; is this something I should be worrying about? I am using Cassandra 1.0.12. Following is the ring command output:

Address   DC   Rack   Status  State   Load       Owns    Token
                                                         85070591730234615865843651857942052874
10.0.0.1  DC1  RAC-1  Up      Normal  101.73 KB  50.00%  0
10.0.0.2  DC2  RAC-1  Up      Normal  92.55 KB   0.00%   10
10.0.0.3  DC1  RAC-1  Up      Normal  115.09 KB  50.00%  85070591730234615865843651857942052864
10.0.0.4  DC2  RAC-1  Up      Normal  101.62 KB  0.00%   85070591730234615865843651857942052874
Using an EC2 cluster from the outside.
I have a working 3 node cluster in a single ec2 region and I need to hit it from our datacenter. As you'd expect, the client gets the internal addresses of the nodes back. Someone on irc mentioned using the public IP for rpc and binding that address to the box. I see that mentioned in an old list mail but I don't get exactly how this is supposed to work. I could really use either a link to something with explicit directions or a detailed explanation. Should cassandra use the public IPs for everything -- listen, b'cast, and rpc? What should cassandra.yaml look like? Is the idea to use the public addresses for cassandra but route the requests between nodes over the lan using nat? Any help or suggestion is appreciated.
Re: Cassandra Client Recommendation
Thanks Aaron for the suggestion. I am not sure I was able to understand the one node thing you mentioned on the native binary protocol? Can you please elaborate on that? On Wed, Apr 17, 2013 at 11:21 AM, aaron morton aa...@thelastpickle.comwrote: One node on the native binary protocol, AFAIK it's still considered beta in 1.2 Also +1 for Astyanax Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 17/04/2013, at 6:50 PM, Francisco Trujillo f.truji...@genetwister.nl wrote: Hi We are using Cassandra 1.6 at this moment. We started with Hector, because it is the first recommendation you find in a simple Google search for Java clients for Cassandra. But when we started to have non-dynamic column families, which can be managed using CQL, we moved to Astyanax because: - It is easy to understand the code, even for people who have never worked with Cassandra. - The CQL implementation offers more capabilities. - Astyanax is prepared for CQL 3, and with Hector we experienced some problems (probably our fault, but with Astyanax everything worked from the beginning). - Astyanax allows compound primary keys. In the next months we are going to replace Hector with Astyanax entirely, but at this moment we are using both: - Astyanax for CQL. - Hector for dynamic column families. ...
Re: InvalidRequestException: Start key's token sorts after end token
Is your Hadoop task supplying both a start and finish key for the slice? You probably only want the start. Provide the full call stack and the code in your hadoop task. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/04/2013, at 1:34 AM, Hiller, Dean dean.hil...@nrel.gov wrote: What's the stack trace you see? At the time, I was thinking column scan not row scan, as perhaps your code or Priam's code was doing a column slice within a row set and the columns are sorted by Integer while Priam is passing in UTF8, or vice-versa. I.e. do we know if this is a column sorting issue or a row one? Dean From: Andre Tavares andre...@gmail.com Reply-To: user@cassandra.apache.org Date: Wednesday, April 17, 2013 7:09 AM To: user@cassandra.apache.org Subject: Re: InvalidRequestException: Start key's token sorts after end token Dean, sorry, but I saw your comments on Stackoverflow (http://stackoverflow.com/questions/16041727/operationtimeoutexception-cassandra-cluster-aws-emr ) just after I sent this message ... and I think you may be right about the sort method, but Priam sets the Cassandra partitioner to RandomPartitioner, and maybe the correct one could be Murmur3Partitioner when we use Hadoop (I am not sure either) ... if that is true I have a problem because I can't change the partitioner with Priam (I think it only works with RandomPartitioner) ... Andre 2013/4/17 Hiller, Dean dean.hil...@nrel.gov I literally just replied to your stackoverflow comment, then saw this email. I need the whole stack trace. My guess is the ColFamily is configured for one sort method where map/reduce is using another or something when querying, but that's just a guess. 
Dean From: Andre Tavares andre...@gmail.com Reply-To: user@cassandra.apache.org Date: Wednesday, April 17, 2013 6:47 AM To: user@cassandra.apache.org Subject: InvalidRequestException: Start key's token sorts after end token know what exactly this message means a
Re: looking at making astyanax asynchronous but cassandra-thrift-1.1.1 doesn't look right
Here's an example I did in python a long time ago http://www.mail-archive.com/user@cassandra.apache.org/msg04775.html Call send() then select on the file handle; when it's ready to read, call recv(). Or just add more threads on your side :) Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/04/2013, at 2:50 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Is cassandra-thrift-1.1.1.jar the generated code? I see a send() and recv() but I don't see a send(Callback cb) that is typical of true asynchronous platforms. I.e. I obviously don't know when to call recv myself if I am trying to make astyanax truly asynchronous. The reason I ask is we have a 100k row upload that with 20 synchronous threads takes around 30 seconds, and from simulation we predict this would be done in 3 seconds with an async api, as our threads would not get held up like they do now. I guess we can try to crank it up to 100 threads to get it running a bit faster for now :( :(. Thanks, Dean
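The send()/select()/recv() pattern Aaron describes can be sketched like this, simulated over a local socket pair rather than real Thrift calls (the peer echoing an uppercased "reply" stands in for the server):

```python
import select
import socket

def async_roundtrip(request: bytes) -> bytes:
    """send(), then select() on the fd, then recv() -- no thread blocked waiting."""
    client, peer = socket.socketpair()
    client.sendall(request)          # analogue of thrift send_*(): fire and return
    # ... the caller is free to issue more requests or do other work here ...
    peer.sendall(peer.recv(1024).upper())  # the fake server produces a reply
    readable, _, _ = select.select([client], [], [], 1.0)  # wait until readable
    if not readable:
        raise TimeoutError("no reply within 1s")
    return client.recv(1024)         # analogue of recv_*(): data is already there

print(async_roundtrip(b"get_slice"))  # b'GET_SLICE'
```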
Re: differences between DataStax Community Edition and Cassandra package
On Wed, Apr 17, 2013 at 11:19 AM, aaron morton aa...@thelastpickle.comwrote: It's the same as the Apache version, but DSC comes with samples and the free version of Ops Centre. DSE also comes with Solr special sauce and CDFS. =Rob
Re: How to stop Cassandra and then restart it in windows?
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 7199; nested exception is: java.net.BindException: Address already in use: JVM_Bind The process is already running. Is it installed as a service, and was it automatically started when the system started? Either shut it down using the service management, or find the process (however you do that in Windows) and kill it. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/04/2013, at 4:26 AM, Raihan Jamal jamalrai...@gmail.com wrote: Hello, Can anyone provide any help on this? Thanks in advance. Raihan Jamal On Tue, Apr 16, 2013 at 6:50 PM, Raihan Jamal jamalrai...@gmail.com wrote: Hello, I installed a single node cluster on my local dev box, which is running Windows 7, and it was working fine. For some reason I needed to restart my desktop, and after that, whenever I do this at the command prompt, it always gives me the below exception: S:\Apache Cassandra\apache-cassandra-1.2.3\bin>cassandra -f Starting Cassandra Server Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 7199; nested exception is: java.net.BindException: Address already in use: JVM_Bind Meaning the port is being used somewhere. I have made some changes in the cassandra.yaml file, so I need to shut down the Cassandra server and then restart it again. Can anybody help me with this? Thanks for the help.
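On Windows, finding and killing the process holding port 7199 (the JMX port) looks roughly like the following, from an elevated command prompt. The PID and the service name "cassandra" are illustrative; check what your installer actually registered:

```
:: find the PID listening on the JMX port
netstat -ano | findstr :7199

:: kill it (replace 1234 with the PID from the previous command) ...
taskkill /PID 1234 /F

:: ... or, if Cassandra was installed as a service, stop the service instead
net stop cassandra
```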
Re: Using an EC2 cluster from the outside.
On Wed, Apr 17, 2013 at 12:07 PM, maillis...@gmail.com wrote: I have a working 3 node cluster in a single ec2 region and I need to hit it from our datacenter. ... Google EC2MultiRegionSnitch. =Rob
Re: Thrift message length exceeded
Can you reproduce this in a simple way ? Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote: That was our first thought. Using maven's dependency tree info we verified that we're using the expected (cass 1.2.3) jars $ mvn dependency:tree | grep thrift [INFO] | +- org.apache.thrift:libthrift:jar:0.7.0:compile [INFO] | \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile I've also dumped the final command run by the hadoop we use (CDH3u5) and verified it's not sneaking thrift in on us. On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com wrote: Can you confirm that you are using the same thrift version that ships with 1.2.3 ? Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote: A bump to say I found this http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded so others are seeing similar behavior. From what I can see of org.apache.cassandra.hadoop nothing has changed since 1.1.5, when we didn't see such things, but it sure looks like there's a bug that's slipped in (or been uncovered) somewhere. I'll try to narrow down to a dataset and code that can reproduce. On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com wrote: We are using Astyanax in production but I cut back to just Hadoop and Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem. We do have some extremely large rows but we went from everything working with 1.1.5 to almost everything carping with 1.2.3. Something has changed. Perhaps we were doing something wrong earlier that 1.2.3 exposed, but surprises are never welcome in production. 
On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote: I also saw this when upgrading from C* 1.0 to 1.2.2, and from Hector 0.6 to 0.8. Turns out the Thrift message really was too long. The mystery to me: Why no complaints in previous versions? Were some checks added in Thrift or Hector? -Original Message- From: Lanny Ripple [mailto:la...@spotright.com] Sent: Tuesday, April 09, 2013 6:17 PM To: user@cassandra.apache.org Subject: Thrift message length exceeded Hello, We have recently upgraded to Cass 1.2.3 from Cass 1.1.5. We ran sstableupgrades and got the ring on its feet and we are now seeing a new issue. When we run MapReduce jobs against practically any table we find the following errors:

2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 106
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at
Re: Multi datacenter setup question
1) Let's say in this case I insert a key at DC2 and its token is, let's say, 85070591730234615865843651857942052874; in this case will it be owned by DC2 and then replicated on DC1? I.e. who owns it? We don't think in terms of owning the token. The token range in the local DC that contains the token is used to find the first replica for the row. The same process is used to find the replicas in the remote DCs. 2) Notice that the Owns distribution is not even, is this something I should be worrying about? No. I think that's changed in the newer versions. I am using Cassandra 1.0.12. Please use version 1.1 or 1.2. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/04/2013, at 7:03 AM, More, Sandeep R sandeep.r.m...@intel.com wrote: ...
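Aaron's point, that the ring is searched per DC so each DC has its own first replica, can be sketched like this. This is a toy model of per-DC placement built from Sandeep's ring output, not Cassandra's actual NetworkTopologyStrategy code:

```python
from bisect import bisect_left

# (token, datacenter, address) from the ring output in the question
RING = [
    (0, "DC1", "10.0.0.1"),
    (10, "DC2", "10.0.0.2"),
    (85070591730234615865843651857942052864, "DC1", "10.0.0.3"),
    (85070591730234615865843651857942052874, "DC2", "10.0.0.4"),
]

def first_replica(row_token: int, dc: str) -> str:
    """First node at or after row_token on the ring restricted to one DC (wrapping)."""
    tokens = sorted(t for t, d, _ in RING if d == dc)
    addr = {t: a for t, d, a in RING if d == dc}
    i = bisect_left(tokens, row_token)
    return addr[tokens[i % len(tokens)]]

T = 85070591730234615865843651857942052874
print(first_replica(T, "DC2"))  # 10.0.0.4 -- holds the row in DC2
print(first_replica(T, "DC1"))  # 10.0.0.1 -- the DC1 search wraps around to token 0
```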
Re: Cassandra Client Recommendation
Was a typo, should have been "One note on". Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 18/04/2013, at 7:23 AM, Techy Teck comptechge...@gmail.com wrote: Thanks Aaron for the suggestion. I am not sure I was able to understand the one node thing you mentioned on the native binary protocol? Can you please elaborate on that? ...
How to make compaction run faster?
Hi Team, I have high write traffic to my Cassandra cluster, and I experience a very high number of pending compactions. As the writes continue, the pending compactions keep increasing. Even when I stop my writes, it takes several hours to finish the pending compactions. My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below 20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9. How can I increase the compaction rate so it will run a bit faster to match my write speed? Your inputs are appreciated. Thanks, Jay
Re: How to make compaction run faster?
three things: 1) compaction throughput is fairly low (yaml / nodetool) 2) concurrent compactions is fairly low (yaml) 3) multithreaded compaction might be off in your version Try raising these things. Otherwise consider option 4. 4) $$$: RAID, RAM & CPU On Wed, Apr 17, 2013 at 4:01 PM, Jay Svc jaytechg...@gmail.com wrote: ...
Re: How to make compaction run faster?
:D Jay, check whether your disk utilization allows you to change the configuration the way Edward suggests. iostat -xkcd 1 will show you how much of your disk(s) are in use. On Wed, Apr 17, 2013 at 5:26 PM, Edward Capriolo edlinuxg...@gmail.com wrote: ...
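The knobs Edward lists, as they appear in 1.1-era configuration. The values shown are illustrative starting points, not recommendations; raise them only if iostat shows disk headroom:

```
# cassandra.yaml
#   compaction_throughput_mb_per_sec: 64   # default 16; 0 disables throttling
#   concurrent_compactors: 4               # defaults roughly to the core count
#   multithreaded_compaction: true         # off by default

# or raise the throttle at runtime, without a restart:
nodetool -h localhost setcompactionthroughput 64
```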
Re: Thrift message length exceeded
It's slow going finding the time to do so, but I'm working on that. We do have another table that has one or sometimes two columns per row. We can run jobs on it without issue. I looked through the org.apache.cassandra.hadoop code and don't see anything that's really changed since 1.1.5 (which was also using thrift-0.7), so it's something of a puzzler what's going on. On Apr 17, 2013, at 2:47 PM, aaron morton aa...@thelastpickle.com wrote: Can you reproduce this in a simple way ? ...
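If the cause really were message size rather than a framing bug (and the reported length of 106 looks too small for a genuine size overflow), the 1.2-era limits live in cassandra.yaml. Defaults shown below; raising them is a workaround, not a fix, if a bug is involved:

```
# cassandra.yaml (1.2 defaults)
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
```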
Re: Using an EC2 cluster from the outside.
Depending on your client, disable automatic client discovery and just specify a list of all your nodes in your client configuration. For more details check out http://xzheng.net/blogs/problem-when-connecting-to-cassandra-with-ruby/ . Obviously this deals specifically with a Ruby client, but it should be applicable to others.

Cheers
Ben

Instaclustr | www.instaclustr.com | @instaclustr

On 18/04/2013, at 5:43 AM, Robert Coli rc...@eventbrite.com wrote:

On Wed, Apr 17, 2013 at 12:07 PM, maillis...@gmail.com wrote:

I have a working 3-node cluster in a single EC2 region and I need to hit it from our datacenter. As you'd expect, the client gets the internal addresses of the nodes back. Someone on IRC mentioned using the public IP for RPC and binding that address to the box. I see that mentioned in an old list mail, but I don't get exactly how this is supposed to work. I could really use either a link to something with explicit directions or a detailed explanation. Should Cassandra use the public IPs for everything -- listen, b'cast, and rpc? What should cassandra.yaml look like? Is the idea to use the public addresses for Cassandra but route the requests between nodes over the LAN using NAT? Any help or suggestion is appreciated.

Google EC2MultiRegionSnitch.

=Rob
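To make Rob's pointer concrete, the EC2MultiRegionSnitch approach boils down to a cassandra.yaml along these lines. A sketch only, with placeholder addresses; verify against the snitch documentation for your version:

```yaml
# cassandra.yaml -- sketch for reaching an EC2 cluster from outside (addresses hypothetical)
listen_address: 10.0.0.5           # private IP, used for traffic inside the region
broadcast_address: 54.12.34.56     # public/elastic IP, what external clients and other regions see
rpc_address: 0.0.0.0               # accept client (Thrift) connections on all interfaces
endpoint_snitch: Ec2MultiRegionSnitch
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "54.12.34.56"   # seeds listed by public IP
```

Ec2MultiRegionSnitch is designed to advertise the public IP while still routing intra-region traffic over private addresses; the storage and RPC ports also have to be opened to your datacenter in the security group.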
[no subject]
I run Cassandra on a single Win 8 machine for development needs. Everything has been working fine for several months, but just today I saw this error message in the Cassandra logs, and all host pools were marked down.

ERROR 08:40:42,684 Error occurred during processing of message.
java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
    at java.lang.String.checkBounds(String.java:397)
    at java.lang.String.<init>(String.java:442)
    at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

After restarting the server, everything worked fine again. I am curious to know what this is related to. Could it be caused by my application writing corrupted data?
Failed shuffle
I had a situation earlier where my shuffle failed after a hard disk drive filled up. I went through and disabled shuffle on the machines while trying to get the situation resolved. Now, while I can re-enable shuffle on the machines, when trying to do an ls I get a timeout. Looking at the cassandra-shuffle code, it is trying to execute this query:

SELECT token_bytes,requested_at FROM system.range_xfers

which is throwing the following error in my logs:

java.lang.AssertionError: [min(-1),max(-219851097003960625)]
    at org.apache.cassandra.dht.Bounds.<init>(Bounds.java:41)
    at org.apache.cassandra.dht.Bounds.<init>(Bounds.java:34)
    at org.apache.cassandra.dht.Bounds.withNewRight(Bounds.java:121)
    at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1172)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:132)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:62)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:132)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:143)
    at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1726)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4074)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4062)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)

This causes me two major issues. First, I can't restart my dead node, because it ends up with a concurrency exception while trying to find relocating tokens during StorageService initialization. Second, I can't clear the moves, because nothing is able to read what is in that range_xfers table (at least, I was also not able to read it through cqlsh). I thought I could recreate the table, but system is a restricted keyspace and it looks like I can't drop and recreate that table, and CQL requires a key for delete... and you can't get the key without getting an error. Is there something simple I can do that I'm just missing right now? Right now I can't restart nodes because of this, nor successfully add new nodes to my ring.
Re: Getting error while inserting data in cassandra table using Java with JDBC
On 04/18/2013 12:06 AM, aaron morton wrote:

What version are you using? And what JDBC driver? Sounds like the driver is not converting the value to bytes for you.

> I guess the problem may be because of undefined key_validation_class, default_validation_class, comparator etc.

If you are using CQL these are not relevant.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 1:31 AM, himanshu.joshi himanshu.jo...@orkash.com wrote:

Hi,

When I am trying to insert data into a table using Java with JDBC, I am getting the error InvalidRequestException(why:cannot parse 'Jo' as hex bytes). My insert query is:

insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10);

This insert query runs successfully from the CQLSH command prompt, but not from the code. The query I used to create the table in CQLSH is:

CREATE TABLE temp (
  id bigint PRIMARY KEY,
  dt_stamp timestamp,
  name text,
  url_id bigint,
  value text
) WITH bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

I guess the problem may be because of undefined key_validation_class, default_validation_class, comparator etc. Is there any way to define these attributes using CQLSH? I have already tried the ASSUME command, but it has not resolved the problem either. I am a beginner in Cassandra and need your guidance.

--
Thanks & Regards,
Himanshu Joshi

Hi Aaron,

The problem is resolved now, as I upgraded the JDBC driver to version 1.2.2. Earlier I was using JDBC version 1.1.6 with Cassandra 1.2.2. Thanks for your guidance.

--
Thanks & Regards,
Himanshu Joshi
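A side note on what "cannot parse 'Jo' as hex bytes" means: when a column's validator is unknown to an old Thrift-era driver, it falls back to treating the value as raw bytes, which must be supplied as a hex literal rather than a quoted string. A minimal sketch of that encoding (the hexEncode helper is mine for illustration, not part of any driver):

```java
import java.nio.charset.StandardCharsets;

public class HexLiteral {
    // Hex-encode a string's UTF-8 bytes -- the form a bytes-typed column expects.
    static String hexEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'J' = 0x4a, 'o' = 0x6f, so 'Jo' as a bytes literal is 4a6f
        System.out.println(hexEncode("Jo"));
    }
}
```

The string 'Jo' is not valid hex (the letters J and o are outside 0-9a-f), hence the parse error; a driver that knows the column is text, as the upgraded 1.2.2 driver evidently does, converts the value itself.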
Re: Repair hangs on 1.1.4
Hi Aaron,

Thank you for your feedback. I have also installed DataStax OpsCenter, and it shows no repair progress. Previously, repair progress was always shown in OpsCenter, and once it reached 100% the repair was also complete on the nodes; but now a repair is in progress on the node and OpsCenter shows nothing. Secondly, please find the netstats and compactionstats results below:

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost netstats
Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name    Active   Pending   Completed
Commands     n/a      0         5327870
Responses    n/a      0         163271943

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost compactionstats
pending tasks: 0
Active compaction remaining time : n/a

Regards,
Adeel Akbar

Quoting aaron morton aa...@thelastpickle.com:

The errors from Hints are not concerned with repair. Increasing the rpc_timeout may help with those. If it's logging about 0 hints you may be seeing this: https://issues.apache.org/jira/browse/CASSANDRA-5068

How did repair hang? Check for progress with nodetool compactionstats and nodetool netstats.

Cheers
- Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 3:01 AM, Alexis Rodríguez arodrig...@inconcertcc.com wrote:

Adeel,

It may be a problem on the remote node; could you check the system.log? Also you might want to check rpc_timeout_in_ms on both nodes; maybe an increase in this parameter helps.

On Fri, Apr 12, 2013 at 9:17 AM, adeel.ak...@panasiangroup.com wrote:

Hi,

I have started repair with -pr on a newly added node, and this node is in another data center. I have a 5MB internet connection and configured setstreamthroughput 1.
After some time the repair hangs, and the following message is found in the logs:

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
Address      DC    Rack   Status  State    Load      Effective-Ownership  Token
                                                                          169417178424467235000914166253263322299
10.0.0.3     DC1   RAC1   Up      Normal   93.26 GB  66.67%               0
10.0.0.4     DC1   RAC1   Up      Normal   89.1 GB   66.67%               56713727820156410577229101238628035242
10.0.0.15    DC1   RAC1   Up      Normal   72.87 GB  66.67%               113427455640312821154458202477256070484
10.40.1.103  DC2   RAC1   Up      Normal   48.59 GB  100.00%              169417178424467235000914166253263322299

INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 372) Timed out replaying hints to /10.40.1.103; aborting further deliveries
INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 390) Finished hinted handoff of 0 rows to endpoint /10.40.1.103

Why are we getting this message, and how can I prevent repair from hitting this error?

Regards,
Adeel Akbar