Re: 15 seconds to increment 17k keys?
Assuming you have replicate_on_write enabled (which you almost certainly do for counters), you have to do a read on a write for each increment. This means counter increments, even if all your data set fits in cache, are significantly slower than normal column inserts. I would say ~1k increments per second is about right, although you can probably do some tuning to improve this. I've also found that the pycassa client uses significant amounts of CPU, so be careful you are not CPU bound on the client. -- Richard Low Acunu | http://www.acunu.com | @acunu On Thu, Sep 1, 2011 at 2:31 AM, Yang tedd...@gmail.com wrote: 1ms per add operation is the general order of magnitude I have seen with my tests. On Wed, Aug 31, 2011 at 6:04 PM, Ian Danforth idanfo...@numenta.com wrote: All, I've got a 4 node cluster (ec2 m1.large instances, replication = 3) that has one primary counter type column family, that has one column in the family. There are millions of rows. Each operation consists of doing a batch_insert through pycassa, which increments ~17k keys. A majority of these keys are new in each batch. Each operation is taking up to 15 seconds. For our system this is a significant bottleneck. Does anyone know if this write speed is expected? Thanks in advance, Ian
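A minimal sketch of how the increments might be streamed in smaller batches from pycassa instead of one ~17k-key batch_insert (assumptions: pycassa 1.x, where batch inserts against a counter column family are applied as increments; the pool, column family, and key names below are illustrative):

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['10.0.0.57:9160'])
    counts = pycassa.ColumnFamily(pool, 'Counts')   # assumed counter CF
    keys_to_bump = ['key%d' % i for i in range(17000)]

    # Flush every 500 mutations rather than sending ~17k keys in one call;
    # smaller batches bound per-request latency and spread the
    # read-before-write cost of counter replication across many requests.
    with counts.batch(queue_size=500) as b:
        for key in keys_to_bump:
            b.insert(key, {'count': 1})   # on a counter CF this adds 1

This does not reduce the total work (each increment still pays the replicate-on-write read), but it avoids one 15-second request and makes it easier to see whether the client or the cluster is the bottleneck.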
Re: RF=1 w/ hadoop jobs
On Thu, 2011-08-18 at 08:54 +0200, Patrik Modesto wrote: But there is another problem with Hadoop-Cassandra: if there is no node available for a range of keys, it fails with a RuntimeError. For example, with a keyspace at RF=1 and a node down, all MapReduce tasks fail. CASSANDRA-2388 is related but not the same. Before 0.8.4 the behaviour was that if the local cassandra node didn't have the split's data, the tasktracker would connect to another cassandra node where the split's data could be found. So even before 0.8.4, with RF=1 a downed node would make your hadoop job fail. That said, I've reopened CASSANDRA-2388 (and reverted the code locally) because the new behaviour in 0.8.4 leads to abysmal tasktracker throughput (for me, task allocation doesn't seem to honour data-locality according to split.getLocations()). I've reworked my previous patch that addressed this issue, and now there are ConfigHelper methods to enable/disable ignoring unavailable ranges. It's available here: http://pastebin.com/hhrr8m9P (for version 0.7.8) I'm interested in this patch and see its usefulness, but no one will act until you attach it to an issue. (I think a new issue is appropriate here.) ~mck
Column index limitations: total number of indexes per row?
Hi All, I have indexed a number of columns in a row (i.e. 25 columns) to perform Indexed_slice queries. If I am not mistaken, there is some limit to the number of indexes one may create per row/keyspace? I am trying to get up to 6000 columns indexed, per row, in 2.5 million rows. So I will be looking at 6000 x 2.5million indexes. Yep, that's 6,250,000 indexes. Would that be tantamount to an atomic bomb riding full speed out of my test server? I have read the strategies of building column families as indexes. I find it quite helpful, and a good solution for indexing. But can the above scenario ever be achieved? (I haven't had time to try that yet.) Regards to ALL! -- Renato da Silveira Senior Developer www.indabamobile.co.za
Fwd: Column index limitations: total number of indexes per row? OOPS :/
HAHA, finger trouble on the below line --- So I will be looking at 6000 x 2.5million indexes. Yep, that's 6,250,000 indexes. The product is actually meant to be 15,000,000,000, so that's 15 billion indexes - sho! Apologies :) Original Message Subject: Column index limitations: total number of indexes per row? Date: Thu, 01 Sep 2011 14:17:01 +0200 From: Renato Bacelar da Silveira renat...@indabamobile.co.za Reply-To: user@cassandra.apache.org To: user@cassandra.apache.org Hi All, I have indexed a number of columns in a row (i.e. 25 columns) to perform Indexed_slice queries. If I am not mistaken, there is some limit to the number of indexes one may create per row/keyspace? I am trying to get up to 6000 columns indexed, per row, in 2.5 million rows. So I will be looking at 6000 x 2.5million indexes. Yep, that's 6,250,000 indexes. Would that be tantamount to an atomic bomb riding full speed out of my test server? I have read the strategies of building column families as indexes. I find it quite helpful, and a good solution for indexing. But can the above scenario ever be achieved? (I haven't had time to try that yet.) Regards to ALL! -- Renato da Silveira Senior Developer www.indabamobile.co.za
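For reference, the product can be checked in one line:

    # 6000 indexed columns across 2.5 million rows:
    print(6000 * 2500000)   # 15000000000, i.e. 15 billion index entries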
Re: Unable to link C library (for jna.jar) on 0.7.5
On Wed, Aug 31, 2011 at 11:38 PM, Eric Czech e...@nextbigsound.com wrote: I'm running cassandra 0.7.5 on about 20 RHEL 5 (24 GB RAM) machines and I'm having issues with snapshots, json sstable conversions, and various nodetool commands due to memory errors and the lack of the native access C libraries. I tried putting jna.jar on the classpath but I'm still seeing warnings in the log files like "CLibrary.java (line 65) Unable to link C library. Native methods will be disabled." Based on this warning, it looks like the .jar file is actually on the classpath but the native access libraries still aren't being used. Where did you get this jar? My guess is that the native code in that jar isn't compatible with your system. -- Eric Evans Acunu | http://www.acunu.com | @acunu
Re: Unable to link C library (for jna.jar) on 0.7.5
I got it here: https://nodeload.github.com/twall/jna/tarball/master Is there some other version or distribution of jna that I should be using? The version I have is 3.3.0. On Thu, Sep 1, 2011 at 8:49 AM, Eric Evans eev...@acunu.com wrote: On Wed, Aug 31, 2011 at 11:38 PM, Eric Czech e...@nextbigsound.com wrote: I'm running cassandra 0.7.5 on about 20 RHEL 5 (24 GB RAM) machines and I'm having issues with snapshots, json sstable conversions, and various nodetool commands due to memory errors and the lack of the native access C libraries. I tried putting jna.jar on the classpath but I'm still seeing warnings in the log files like "CLibrary.java (line 65) Unable to link C library. Native methods will be disabled." Based on this warning, it looks like the .jar file is actually on the classpath but the native access libraries still aren't being used. Where did you get this jar? My guess is that the native code in that jar isn't compatible with your system. -- Eric Evans Acunu | http://www.acunu.com | @acunu
RE: Unable to link C library (for jna.jar) on 0.7.5
Have you installed 'jna'? On RHEL (6 at least) it should be possible using the default yum repos. You need both the native code and the JAR in Cassandra's classpath, from what I understand. Dan From: eczec...@gmail.com [mailto:eczec...@gmail.com] On Behalf Of Eric Czech Sent: September-01-11 11:13 To: user@cassandra.apache.org Subject: Re: Unable to link C library (for jna.jar) on 0.7.5 I got it here: https://nodeload.github.com/twall/jna/tarball/master Is there some other version or distribution of jna that I should be using? The version I have is 3.3.0. On Thu, Sep 1, 2011 at 8:49 AM, Eric Evans eev...@acunu.com wrote: On Wed, Aug 31, 2011 at 11:38 PM, Eric Czech e...@nextbigsound.com wrote: I'm running cassandra 0.7.5 on about 20 RHEL 5 (24 GB RAM) machines and I'm having issues with snapshots, json sstable conversions, and various nodetool commands due to memory errors and the lack of the native access C libraries. I tried putting jna.jar on the classpath but I'm still seeing warnings in the log files like "CLibrary.java (line 65) Unable to link C library. Native methods will be disabled." Based on this warning, it looks like the .jar file is actually on the classpath but the native access libraries still aren't being used. Where did you get this jar? My guess is that the native code in that jar isn't compatible with your system. -- Eric Evans Acunu | http://www.acunu.com | @acunu
Re: Unable to link C library (for jna.jar) on 0.7.5
On Thu, Sep 1, 2011 at 10:13 AM, Eric Czech e...@nextbigsound.com wrote: I got it here : https://nodeload.github.com/twall/jna/tarball/master Is there some other version or distribution of jna that I should be using? The version I have is 3.3.0. As Dan mentions in another email, if you can install it from an RPM that was built for RHEL 5, then try that first (make sure to add it to the classpath though). -- Eric Evans Acunu | http://www.acunu.com | @acunu
Re: java.io.IOException: Could not get input splits
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044, fixed for 0.8.5 On Thu, Sep 1, 2011 at 10:54 AM, Jian Fang jian.fang.subscr...@gmail.com wrote: Hi, I upgraded Cassandra from 0.8.2 to 0.8.4 and ran a hadoop job to read data from Cassandra, but got the following errors:

    11/09/01 11:42:46 INFO hadoop.SalesRankLoader: Start Cassandra reader...
    Exception in thread "main" java.io.IOException: Could not get input splits
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:157)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
        at com.barnesandnoble.hadoop.SalesRankLoader.run(SalesRankLoader.java:359)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.barnesandnoble.hadoop.SalesRankLoader.main(SalesRankLoader.java:408)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: protocol = socket host = null
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:153)
        ... 12 more
    Caused by: java.lang.IllegalArgumentException: protocol = socket host = null
        at sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:358)
        at java.net.Socket.connect(Socket.java:529)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
        at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.createConnection(ColumnFamilyInputFormat.java:243)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:217)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:70)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:190)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:175)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

The code used to work on 0.8.2, and it is really strange to see the "host = null".
My code is very similar to the word count example:

    logger.info("Start Cassandra reader...");
    Job job2 = new Job(getConf(), "SalesRankCassandraReader");
    job2.setJarByClass(SalesRankLoader.class);
    job2.setMapperClass(CassandraReaderMapper.class);
    job2.setReducerClass(CassandraToFilesystem.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job2, new Path(outPath));
    job2.setInputFormatClass(ColumnFamilyInputFormat.class);
    ConfigHelper.setRpcPort(job2.getConfiguration(), "9260");
    ConfigHelper.setInitialAddress(job2.getConfiguration(), "dnjsrcha02");
    ConfigHelper.setPartitioner(job2.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
    ConfigHelper.setInputColumnFamily(job2.getConfiguration(), KEYSPACE, columnFamily);
    // ConfigHelper.setInputSplitSize(job2.getConfiguration(), 5000);
    ConfigHelper.setRangeBatchSize(job2.getConfiguration(), batchSize);
    SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
    ConfigHelper.setInputSlicePredicate(job2.getConfiguration(), predicate);
    job2.waitForCompletion(true);

The Cassandra cluster includes 6 nodes and I am pretty sure they work fine. Please help. Thanks, John -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: java.io.IOException: Could not get input splits
Thanks. How soon will 0.8.5 be out? Is there any 0.8.5 snapshot version available? On Thu, Sep 1, 2011 at 11:57 AM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044, fixed for 0.8.5 On Thu, Sep 1, 2011 at 10:54 AM, Jian Fang jian.fang.subscr...@gmail.com wrote: Hi, I upgraded Cassandra from 0.8.2 to 0.8.4 and ran a hadoop job to read data from Cassandra, but got the following errors:

    11/09/01 11:42:46 INFO hadoop.SalesRankLoader: Start Cassandra reader...
    Exception in thread "main" java.io.IOException: Could not get input splits
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:157)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
        at com.barnesandnoble.hadoop.SalesRankLoader.run(SalesRankLoader.java:359)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.barnesandnoble.hadoop.SalesRankLoader.main(SalesRankLoader.java:408)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: protocol = socket host = null
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:153)
        ... 12 more
    Caused by: java.lang.IllegalArgumentException: protocol = socket host = null
        at sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:358)
        at java.net.Socket.connect(Socket.java:529)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
        at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.createConnection(ColumnFamilyInputFormat.java:243)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:217)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:70)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:190)
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:175)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

The code used to work on 0.8.2, and it is really strange to see the "host = null". My code is very similar to the word count example:

    logger.info("Start Cassandra reader...");
    Job job2 = new Job(getConf(), "SalesRankCassandraReader");
    job2.setJarByClass(SalesRankLoader.class);
    job2.setMapperClass(CassandraReaderMapper.class);
    job2.setReducerClass(CassandraToFilesystem.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job2, new Path(outPath));
    job2.setInputFormatClass(ColumnFamilyInputFormat.class);
    ConfigHelper.setRpcPort(job2.getConfiguration(), "9260");
    ConfigHelper.setInitialAddress(job2.getConfiguration(), "dnjsrcha02");
    ConfigHelper.setPartitioner(job2.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
    ConfigHelper.setInputColumnFamily(job2.getConfiguration(), KEYSPACE, columnFamily);
    // ConfigHelper.setInputSplitSize(job2.getConfiguration(), 5000);
    ConfigHelper.setRangeBatchSize(job2.getConfiguration(), batchSize);
    SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
    ConfigHelper.setInputSlicePredicate(job2.getConfiguration(), predicate);
    job2.waitForCompletion(true);

The Cassandra cluster includes 6 nodes and I am
Re: 15 seconds to increment 17k keys?
Does this scale with multiples of the replication factor or directly with number of nodes? Or more succinctly, to double the writes per second into the cluster how many more nodes would I need? (Thanks for the note on pycassa, I've checked and it's not the limiting factor) Ian On Thu, Sep 1, 2011 at 3:36 AM, Richard Low r...@acunu.com wrote: Assuming you have replicate_on_write enabled (which you almost certainly do for counters), you have to do a read on a write for each increment. This means counter increments, even if all your data set fits in cache, are significantly slower than normal column inserts. I would say ~1k increments per second is about right, although you can probably do some tuning to improve this. I've also found that the pycassa client uses significant amounts of CPU, so be careful you are not CPU bound on the client. -- Richard Low Acunu | http://www.acunu.com | @acunu On Thu, Sep 1, 2011 at 2:31 AM, Yang tedd...@gmail.com wrote: 1ms per add operation is the general order of magnitude I have seen with my tests. On Wed, Aug 31, 2011 at 6:04 PM, Ian Danforth idanfo...@numenta.com wrote: All, I've got a 4 node cluster (ec2 m1.large instances, replication = 3) that has one primary counter type column family, that has one column in the family. There are millions of rows. Each operation consists of doing a batch_insert through pycassa, which increments ~17k keys. A majority of these keys are new in each batch. Each operation is taking up to 15 seconds. For our system this is a significant bottleneck. Does anyone know if this write speed is expected? Thanks in advance, Ian
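Back-of-envelope arithmetic on the scaling question, using the figures quoted in this thread (assumption: replicate-on-write is the bottleneck, and total replica work, increments/sec times RF, is spread evenly across nodes):

    increments_per_sec = 17000 / 15.0        # ~1.1k/sec observed
    rf, nodes = 3, 4
    replica_ops_per_node = increments_per_sec * rf / nodes   # ~850/sec/node
    # At a fixed RF, cluster throughput scales with node count, not with
    # multiples of RF: to double increments/sec, roughly double the nodes
    # (8 instead of 4), which keeps per-node replica work constant.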
Re: Trying to understand QUORUM and Strategies
Thanks Evgeniy, We encountered this exception with the following settings:

    <bean id="consistencyLevelPolicy" class="me.prettyprint.cassandra.model.ConfigurableConsistencyLevel">
        <property name="defaultReadConsistencyLevel" value="LOCAL_QUORUM"/>
        <property name="defaultWriteConsistencyLevel" value="LOCAL_QUORUM"/>
    </bean>

    Caused by: InvalidRequestException(why: consistency level LOCAL_QUORUM not compatible with replication strategy (org.apache.cassandra.locator.SimpleStrategy))
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19045)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
        at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
        at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
        at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)
        at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)
        at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
        at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
        at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)

Which is why I raised this email originally. It is probable that we have not configured the system correctly; I just need to find out what it is I'm missing. Anthony On Wed, Aug 31, 2011 at 2:59 PM, Evgeniy Ryabitskiy evgeniy.ryabits...@wikimart.ru wrote: Hi, Actually you can use the LOCAL_QUORUM and EACH_QUORUM policies everywhere on DEV/QA/Prod. It would even be better for integration tests to use the same consistency level as production. For production with multiple DCs you usually need to choose between two common solutions: Geographical Distribution or Disaster Recovery. See: http://www.datastax.com/docs/0.8/operations/datacenter LOCAL_QUORUM and EACH_QUORUM for DEV/QA/Prod by example:

    create keyspace KeyspaceDEV
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options=[{datacenter1:1}];

    create keyspace KeyspaceQA
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options=[{datacenter1:2}];

    create keyspace KeyspaceProd
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options=[{datacenter1:3, datacenter2:3}];

Be careful(!!!): usually the default name of the DC in a new cluster is datacenter1, but cassandra-cli uses the default name DC1 (a small mismatch/bug, maybe). Evgeny.
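LOCAL_QUORUM and EACH_QUORUM are rejected against SimpleStrategy because they are defined in terms of data centers, which only NetworkTopologyStrategy knows about; that is exactly the InvalidRequestException above. For completeness, a sketch of creating such a keyspace via pycassa's SystemManager rather than the cli (the server address and the 'datacenter1' DC name are assumptions, and argument names vary slightly between pycassa versions):

    from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

    sys_mgr = SystemManager('localhost:9160')
    # RF 3 in the data center assumed to be named 'datacenter1':
    sys_mgr.create_keyspace('KeyspaceProd',
                            replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
                            strategy_options={'datacenter1': '3'})
    sys_mgr.close()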
Re: cassandra-cli describe / dump command
Yes: the cli's 'show schema' command, in 0.8.4+. On Thu, Sep 1, 2011 at 12:52 PM, J T jt4websi...@googlemail.com wrote: Hi, I'm probably being blind, but I can't see any way to dump the schema definition (and the data in it, for that matter) of a cluster in order to capture the current schema in a script file for subsequent replaying into a different environment. For example, say I have a DEV env and wanted to create a script containing the cli commands to create that schema in a UAT env. In my case, I have a cassandra schema I've been tweaking / upgrading over the last 2 years and I can't see any easy way to capture the schema definition. Is such a thing on the cards for cassandra-cli ? JT -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: hw requirements
Sorry about the unclear naming scheme. I meant that if I want to index on a few columns simultaneously, I create a new column with the catenated values of these. On 8/31/2011 3:10 PM, Anthony Ikeda wrote: Sorry to fork this topic, but for composite indexes do you mean as strings or as Composite()? I only ask because we have started using Composite as row keys and column names to replace the use of concatenated strings, mainly for lookup purposes. Anthony On Wed, Aug 31, 2011 at 10:27 AM, Maxim Potekhin potek...@bnl.gov wrote: Plenty of comments in this thread already, and I agree with those saying it depends. From my experience, a cluster with 18 spindles total could not match the performance and throughput of our primary Oracle server, which had 108 spindles. After we upgraded to SSD, things have definitely changed for the better for Cassandra. Another thing is that if you plan to implement composite indexes by catenating column values into additional columns, that constitutes a write, hence you'll need CPU. So watch out. On 8/29/2011 9:15 AM, Helder Oliveira wrote: Hello guys, What is the typical profile of a cassandra server? Are SSDs an option? Does Cassandra need better CPU or lots of memory? Are SATA II disks OK? I am making some tests, and I started evaluating the possible hardware. If someone already has conclusions about it, please share :D Thanks a lot.
Replicate On Write behavior
I'm curious... digging through the source, it looks like replicate on write triggers a read of the entire row, and not just the columns/supercolumns that are affected by the counter update. Is this the case? It would certainly explain why my inserts/sec decay over time and why the average insert latency increases over time. The strange thing is that I'm not seeing disk read IO increase over that same period, but that might be due to the OS buffer cache... On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that normal? I'm using RandomPartitioner...

    Address    DC           Rack   Status  State   Load       Owns    Token
                                                                      136112946768375385385349842972707284580
    10.0.0.57  datacenter1  rack1  Up      Normal  2.26 GB    20.00%  0
    10.0.0.56  datacenter1  rack1  Up      Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
    10.0.0.55  datacenter1  rack1  Up      Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
    10.0.0.54  datacenter1  rack1  Up      Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
    10.0.0.72  datacenter1  rack1  Up      Normal  383.25 MB  20.00%  136112946768375385385349842972707284580

The nodes with ReplicateOnWrites are the 3 in the middle. The first node and last node both have a count of 0. This is a clean cluster, and I've been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 hours. The last time this test ran, it went all the way down to 500 inserts/sec before I killed it.
Re: Updates lost
Are you running on windows? If the default timestamp is just using time.time()*1e6 you will get the same timestamp twice if the calls are close together. time.time() on windows is only millisecond resolution. I don't use pycassa, but in the Thrift api wrapper I created for our python code I implemented the following function for getting timestamps:

    def GetTimeInMicroSec():
        """Returns the current time in microseconds; the returned value
        always increases with each call.
        :return: Current time in microseconds
        """
        newTime = long(time.time()*1e6)
        try:
            if GetTimeInMicroSec.lastTime >= newTime:
                newTime = GetTimeInMicroSec.lastTime + 1
        except AttributeError:
            pass
        GetTimeInMicroSec.lastTime = newTime
        return newTime

On 08/29/2011 04:56 PM, Peter Schuller wrote: If the client sleeps for a few ms at each loop, the success rate increases. At 15 ms, the script always succeeds so far. Interestingly, the problem seems to be sensitive to alphabetical order. Updating the value from 'aaa' to 'bbb' never has a problem. No pause needed. Is it possible the version of pycassa you're using does not guarantee that successive queries use non-identical and monotonically increasing timestamps? I'm just speculating, but if that is the case and two requests are sent with the same timestamp (due to resolution being lower than the time it takes between calls), the tie breaking would be the column value, which jives with the fact that you're saying it seems to depend on the value. (I haven't checked current nor past versions of pycassa to determine if this is plausible. Just speculating.)
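If pycassa's default timestamps do turn out to be the culprit, the same guard can be applied from the client by passing an explicit timestamp (a sketch assuming the GetTimeInMicroSec() above and illustrative keyspace/CF names; pycassa's ColumnFamily.insert() accepts a timestamp argument):

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1')
    cf = pycassa.ColumnFamily(pool, 'Standard1')

    # Strictly increasing timestamps mean back-to-back updates to the same
    # column can never tie; ties are broken by comparing values, which is
    # what would make the 'aaa' -> 'bbb' case look order-sensitive.
    cf.insert('row1', {'col': 'aaa'}, timestamp=GetTimeInMicroSec())
    cf.insert('row1', {'col': 'bbb'}, timestamp=GetTimeInMicroSec())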
Re: Replicate On Write behavior
When Cassandra reads, the entire CF is always read together; only at the hand-over to the client does the pruning happen. On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne dha...@gmx.3crowd.com wrote: I'm curious... digging through the source, it looks like replicate on write triggers a read of the entire row, and not just the columns/supercolumns that are affected by the counter update. Is this the case? It would certainly explain why my inserts/sec decay over time and why the average insert latency increases over time. The strange thing is that I'm not seeing disk read IO increase over that same period, but that might be due to the OS buffer cache... On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that normal? I'm using RandomPartitioner...

    Address    DC           Rack   Status  State   Load       Owns    Token
                                                                      136112946768375385385349842972707284580
    10.0.0.57  datacenter1  rack1  Up      Normal  2.26 GB    20.00%  0
    10.0.0.56  datacenter1  rack1  Up      Normal  2.47 GB    20.00%  34028236692093846346337460743176821145
    10.0.0.55  datacenter1  rack1  Up      Normal  2.52 GB    20.00%  68056473384187692692674921486353642290
    10.0.0.54  datacenter1  rack1  Up      Normal  950.97 MB  20.00%  102084710076281539039012382229530463435
    10.0.0.72  datacenter1  rack1  Up      Normal  383.25 MB  20.00%  136112946768375385385349842972707284580

The nodes with ReplicateOnWrites are the 3 in the middle. The first node and last node both have a count of 0. This is a clean cluster, and I've been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 hours. The last time this test ran, it went all the way down to 500 inserts/sec before I killed it.
Re: Replicate On Write behavior
Yeah, I believe that Yang has a typo in his post: a CF is not read in one go, but a row is. As for the scalability of having all the columns read at once, I do not believe it was ever meant to be. All the columns in a row are stored together, on the same set of machines. This means that if you have very large rows, you can have an unbalanced cluster, but it also allows reads of several columns out of a row to be more efficient, since they are all together on the same machine (no need to gather results from several machines) and should read quickly since they are all together on disk. - Original Message - From: Ian Danforth idanfo...@numenta.com To: user@cassandra.apache.org Sent: Thursday, September 1, 2011 4:35:33 PM Subject: Re: Replicate On Write behavior I'm not sure I understand the scalability of this approach. A given column family can be HUGE with millions of rows and columns. In my cluster I have a single column family that accounts for 90GB of load on each node. Not only that but column family is distributed over the entire ring. Clearly I'm misunderstanding something. Ian On Thu, Sep 1, 2011 at 1:17 PM, Yang tedd...@gmail.com wrote: when Cassandra reads, the entire CF is always read together, only at the hand-over to client does the pruning happens On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne dha...@gmx.3crowd.com wrote: I'm curious... digging through the source, it looks like replicate on write triggers a read of the entire row, and not just the columns/supercolumns that are affected by the counter update. Is this the case? It would certainly explain why my inserts/sec decay over time and why the average insert latency increases over time. The strange thing is that I'm not seeing disk read IO increase over that same period, but that might be due to the OS buffer cache... On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that normal? I'm using RandomPartitioner... Address DC Rack Status State Load Owns Token 136112946768375385385349842972707284580 10.0.0.57 datacenter1 rack1 Up Normal 2.26 GB 20.00% 0 10.0.0.56 datacenter1 rack1 Up Normal 2.47 GB 20.00% 34028236692093846346337460743176821145 10.0.0.55 datacenter1 rack1 Up Normal 2.52 GB 20.00% 68056473384187692692674921486353642290 10.0.0.54 datacenter1 rack1 Up Normal 950.97 MB 20.00% 102084710076281539039012382229530463435 10.0.0.72 datacenter1 rack1 Up Normal 383.25 MB 20.00% 136112946768375385385349842972707284580 The nodes with ReplicateOnWrites are the 3 in the middle. The first node and last node both have a count of 0. This is a clean cluster, and I've been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 hours. The last time this test ran, it went all the way down to 500 inserts/sec before I killed it.
Re: Replicate On Write behavior
Sorry, I meant row, not CF. If you look in the code, db.cf is just basically a set of columns. On Sep 1, 2011 1:36 PM, Ian Danforth idanfo...@numenta.com wrote: I'm not sure I understand the scalability of this approach. A given column family can be HUGE with millions of rows and columns. In my cluster I have a single column family that accounts for 90GB of load on each node. Not only that but column family is distributed over the entire ring. Clearly I'm misunderstanding something. Ian On Thu, Sep 1, 2011 at 1:17 PM, Yang tedd...@gmail.com wrote: when Cassandra reads, the entire CF is always read together, only at the hand-over to client does the pruning happens On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne dha...@gmx.3crowd.com wrote: I'm curious... digging through the source, it looks like replicate on write triggers a read of the entire row, and not just the columns/supercolumns that are affected by the counter update. Is this the case? It would certainly explain why my inserts/sec decay over time and why the average insert latency increases over time. The strange thing is that I'm not seeing disk read IO increase over that same period, but that might be due to the OS buffer cache... On another note, on a 5-node cluster, I'm only seeing 3 nodes with ReplicateOnWrite Completed tasks in nodetool tpstats output. Is that normal? I'm using RandomPartitioner... Address DC RackStatus State Load OwnsToken 136112946768375385385349842972707284580 10.0.0.57datacenter1 rack1 Up Normal 2.26 GB 20.00% 0 10.0.0.56datacenter1 rack1 Up Normal 2.47 GB 20.00% 34028236692093846346337460743176821145 10.0.0.55datacenter1 rack1 Up Normal 2.52 GB 20.00% 68056473384187692692674921486353642290 10.0.0.54datacenter1 rack1 Up Normal 950.97 MB 20.00% 102084710076281539039012382229530463435 10.0.0.72datacenter1 rack1 Up Normal 383.25 MB 20.00% 136112946768375385385349842972707284580 The nodes with ReplicateOnWrites are the 3 in the middle. The first node and last node both have a count of 0. This is a clean cluster, and I've been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12 hours. The last time this test ran, it went all the way down to 500 inserts/sec before I killed it.
Bulk loader: Got an unknow host from describe_ring
Hello, I'm trying to import data from one Cassandra cluster to another. The old cluster is using ports 7000 and 9160 and the new cluster is using 7001 and 9161. I ran nodetool -h localhost snapshot on a node on the old cluster. I then downloaded apache-cassandra-0.8.4-bin.tar.gz, edited conf/cassandra.yaml appropriately for the new cluster, exported CASSANDRA_INCLUDE=bin/cassandra.sh.in (so it doesn't try to use the installed, running Cassandra). When I run sstableloader, I can see that it's connecting to the new cluster (by tailing the logs), but after a few seconds, it gives the error: Got an unknow host from describe_ring After banging my head for a while, I scp'ed the snapshotted data to a node on the *new* cluster. I again downloaded apache-cassandra-0.8.4-bin.tar.gz and configured cassandra.yaml there appropriately (with listen_address: 127.0.0.1 so as to not conflict with the Cassandra already running on the node). Running sstableloader resulted in the same error. nodetool -h localhost ring shows a healthy cluster. Running that command works both locally and remotely. I can connect to the cluster using cassandra-cli both locally and remotely as well. Any ideas? Thanks for the help.
Re: Bulk loader: Got an unknow host from describe_ring
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044, fixed for 0.8.5 On Thu, Sep 1, 2011 at 4:27 PM, Christopher Bottaro cjbott...@onespot.com wrote: Hello, I'm trying to import data from one Cassandra cluster to another. The old cluster is using ports 7000 and 9160 and the new cluster is using 7001 and 9161. I ran nodetool -h localhost snapshot on a node on the old cluster. I then downloaded apache-cassandra-0.8.4-bin.tar.gz, edited conf/cassandra.yaml appropriately for the new cluster, exported CASSANDRA_INCLUDE=bin/cassandra.sh.in (so it doesn't try to use the installed, running Cassandra). When I run sstableloader, I can see that it's connecting to the new cluster (by tailing the logs), but after a few seconds, it gives the error: Got an unknow host from describe_ring After banging my head for a while, I scp'ed the snapshotted data to a node on the *new* cluster. I again downloaded apache-cassandra-0.8.4-bin.tar.gz and configured cassandra.yaml there appropriately (with listen_address: 127.0.0.1 so as to not conflict with the Cassandra already running on the node). Running sstableloader resulted in the same error. nodetool -h localhost ring shows a healthy cluster. Running that command works both locally and remotely. I can connect to the cluster using cassandra-cli both locally and remotely as well. Any ideas? Thanks for the help. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
RE: Removal of old data files
Yes, I see files with names like Orders-g-6517-Compacted. However, all of those files have a size of 0. From Monday to Thursday we have 5642 files for -Data.db, -Filter.db and -Statistics.db, and only 128 -Compacted files, all of size 0. Is this normal, or are we doing something wrong? yuki From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Thursday, August 25, 2011 6:13 PM To: user@cassandra.apache.org Subject: Re: Removal of old data files If cassandra does not have enough disk space to create a new file it will provoke a JVM GC, which should result in compacted SSTables that are no longer needed being deleted. Otherwise they are deleted at some time in the future. Compacted SSTables have a file written out with a "compacted" extension. Do you see compacted sstables in the data directory? Cheers. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 26/08/2011, at 2:29 AM, yuki watanabe wrote: We are using Cassandra 0.8.0 with an 8 node ring and only one CF. Every column has a TTL of 86400 (24 hours). We also set 'GC grace seconds' to 43200 (12 hours). We have to store a massive amount of data for one day now, and eventually for five days if we get more disk space. Even for one day, we run out of disk space on a busy day. We run the nodetool compact command at night or as necessary, then we run GC from jconsole. We observed that GC did remove files, but not necessarily the oldest ones. Data files from more than 36 hours ago, and quite often from three days ago, are still there. Is this behavior expected, or do we need to adjust some other parameters? Yuki Watanabe
Re: Updates lost
Well, on Windows Vista and below (haven't checked on 7), System.currentTimeMillis only has around 10ms granularity. That is, for any 10ms period you get the same value. I develop on Windows and I'd get sporadic integration test failures due to this. On Thu, Sep 1, 2011 at 8:31 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: Are you running on windows? If the default timestamp is just using time.time()*1e6 you will get the same timestamp twice if the calls are close together. time.time() on windows is only millisecond resolution. I don't use pycassa, but in the Thrift api wrapper I created for our python code I implemented the following function for getting timestamps:

    def GetTimeInMicroSec():
        """Returns the current time in microseconds; the returned value
        always increases with each call.
        :return: Current time in microseconds
        """
        newTime = long(time.time()*1e6)
        try:
            if GetTimeInMicroSec.lastTime >= newTime:
                newTime = GetTimeInMicroSec.lastTime + 1
        except AttributeError:
            pass
        GetTimeInMicroSec.lastTime = newTime
        return newTime

On 08/29/2011 04:56 PM, Peter Schuller wrote: If the client sleeps for a few ms at each loop, the success rate increases. At 15 ms, the script always succeeds so far. Interestingly, the problem seems to be sensitive to alphabetical order. Updating the value from 'aaa' to 'bbb' never has a problem. No pause needed. Is it possible the version of pycassa you're using does not guarantee that successive queries use non-identical and monotonically increasing timestamps? I'm just speculating, but if that is the case and two requests are sent with the same timestamp (due to resolution being lower than the time it takes between calls), the tie breaking would be the column value, which jives with the fact that you're saying it seems to depend on the value. (I haven't checked current nor past versions of pycassa to determine if this is plausible. Just speculating.) -- - Paul Loy p...@keteracel.com http://uk.linkedin.com/in/paulloy
Fun with Heap Dump ...
All, I need help interpreting the results of my investigation. I'm encountering this error: "Unable to reduce heap usage since there are no dirty column families." My heap sits near max and occasionally OOMs (4GB heap). Following Mr. Ellis's instructions here: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Memory-Usage-During-Read-td6338635.html I set the heap down to 1GB, restarted the node, watched the memory climb in jconsole and waited for a heap dump. (Of course the first time I tried this I got a permission denied error on writing out the dump, and had to restart C* as root, but anyway ...) Below you'll find a screen grab of the heap dump analysis: http://screencast.com/t/U6IYzloe2McP Here is what I see in cassandra.log just prior to OOM:

    [ec2-user@ip-10-86-223-245 ~]$ tail -f /var/log/cassandra/cassandra.log
    INFO 22:37:11,193 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
    INFO 22:37:11,194 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
    INFO 22:37:11,195 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
    INFO 22:37:11,196 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
    INFO 22:37:11,212 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
    INFO 22:38:32,485 Opening /cassandra/data/Keyspace1/TwitterTest-g-5852
    INFO 22:38:33,253 Opening /cassandra/data/Keyspace1/TwitterTest-g-5502
    INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643
    INFO 22:38:35,653 Opening /cassandra/data/Keyspace1/TwitterTest-g-6117
    INFO 22:38:35,699 Opening /cassandra/data/Keyspace1/TwitterTest-g-1376
    [Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
    ...etc...

From my org.apache.cassandra.db MBean I get an estimate of 596,504,576 keys. What I'd really like to know is:
1. What operation is C* performing during lines like "INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643"? (I think this is an SSTable it's extracting an index for this column family from.)
2. Has my CF index outgrown memory?
3. If so, is there a way to relate # CFs, # columns, # rows to index size?
I need to know how many keys I can store before I need more memory, or more nodes. Thanks in advance. I've been getting a lot of help from the list and I really appreciate it! Ian
Re: Fun with Heap Dump ...
On Thu, Sep 1, 2011 at 6:54 PM, Ian Danforth idanfo...@numenta.com wrote: 1. What operation is C* performing during lines like these: INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643 (I think this is an SSTable it's extracting an index for this column family from) Right. 2. Has my CF index outgrown memory? Yes. 3. If so is there a way to relate # CF, # Columns, # Rows to index size? Only the row index sample is kept in memory. So it's the product of key size * row count / index_interval. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
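Plugging the numbers from this thread into that formula, as a rough sketch (the 20-byte average key size is an assumption, index_interval of 128 is the cassandra.yaml default, and per-entry Java object overhead is not included):

    row_count = 596504576        # keys reported by the MBean
    key_size = 20                # bytes; assumed average
    index_interval = 128         # cassandra.yaml default
    sample_bytes = row_count / index_interval * key_size
    print(sample_bytes / (1024.0 * 1024))   # ~89 MB of raw key data

Actual heap usage will be several times the raw figure once object overhead per sampled entry is counted; raising index_interval shrinks it proportionally at the cost of slightly slower key lookups.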
Re: Fun with Heap Dump ...
Awesome, thanks for the quick response! Ian On Thu, Sep 1, 2011 at 5:27 PM, Jonathan Ellis jbel...@gmail.com wrote: On Thu, Sep 1, 2011 at 6:54 PM, Ian Danforth idanfo...@numenta.com wrote: 1. What operation is C* performing during lines like these: INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643 (I think this is an SSTable it's extracting an index for this column family from) Right. 2. Has my CF index outgrown memory? Yes. 3. If so is there a way to relate # CF, # Columns, # Rows to index size? Only the row index sample is kept in memory. So it's the product of key size * row count / index_interval. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Limiting ColumnSlice range in second composite value
My column name is a Composite(TimeUUIDType, UTF8Type) and I can query across the TimeUUIDs correctly, but now I want to also range across the UTF8 component. Is this possible?

    UUID start = uuidForDate(new Date(1979, 1, 1));
    UUID end = uuidForDate(new Date(Long.MAX_VALUE));
    String startState = "";
    String endState = "";
    if (desiredState != null) {
        mLog.debug("Restricting state to [" + desiredState.getValue() + "]");
        startState = desiredState.getValue();
        endState = desiredState.getValue().concat("_");
    }
    Composite startComp = new Composite(start, startState);
    Composite endComp = new Composite(end, endState);
    query.setRange(startComp, endComp, true, count);

So far I'm not seeing any effect setting my endState String value. Anthony
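For what it's worth, a composite slice is a single contiguous range under component-by-component comparison, so the second component can only constrain the two endpoints, not every TimeUUID in between. Python tuples sort the same way, which makes the behaviour easy to see in a sketch (integers stand in for the TimeUUID component):

    cols = sorted([(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a')])
    start, end = (1, 'b'), (3, 'a')
    print([c for c in cols if start <= c <= end])
    # -> [(1, 'b'), (2, 'a'), (2, 'b'), (3, 'a')]
    # (2, 'a') is included even though 'a' sorts before the start's 'b':
    # the state filter only bites at the range endpoints.

To filter on state across the whole time range you would need a different layout, e.g. putting the state component first in the composite, or keeping one row (or column family) per state.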