Re: 15 seconds to increment 17k keys?

2011-09-01 Thread Richard Low
Assuming you have replicate_on_write enabled (which you almost
certainly do for counters), you have to do a read on a write for each
increment.  This means counter increments, even if all your data set
fits in cache, are significantly slower than normal column inserts.  I
would say ~1k increments per second is about right, although you can
probably do some tuning to improve this.

I've also found that the pycassa client uses significant amounts of
CPU, so be careful you are not CPU bound on the client.
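
A quick way to see where the time goes is to time a batch of increments from the
client and watch the client's CPU at the same time. A rough pycassa sketch (the
keyspace and counter CF names below are made up, not from this thread):

    import time
    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    counts = pycassa.ColumnFamily(pool, 'Counts')   # a counter column family

    keys = ['key%d' % i for i in range(17000)]
    start = time.time()
    for key in keys:
        counts.add(key, 'hits', 1)   # each increment is a read-then-write on the server
    elapsed = time.time() - start
    print '%d increments in %.1fs (~%.0f/s)' % (len(keys), elapsed, len(keys) / elapsed)

If the client process is pegged at 100% CPU while this runs, the bottleneck is the
client rather than the cluster.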

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu

On Thu, Sep 1, 2011 at 2:31 AM, Yang tedd...@gmail.com wrote:
 1ms per add operation is the general order of magnitude I have seen with my
 tests.


 On Wed, Aug 31, 2011 at 6:04 PM, Ian Danforth idanfo...@numenta.com wrote:

 All,

  I've got a 4 node cluster (ec2 m1.large instances, replication = 3)
 that has one primary counter type column family, that has one column
 in the family. There are millions of rows. Each operation consists of
 doing a batch_insert through pycassa, which increments ~17k keys. A
 majority of these keys are new in each batch.

  Each operation is taking up to 15 seconds. For our system this is a
 significant bottleneck.

  Does anyone know if this write speed is expected?

 Thanks in advance,

  Ian




Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Mck
On Thu, 2011-08-18 at 08:54 +0200, Patrik Modesto wrote:
 But there is the another problem with Hadoop-Cassandra, if there is no
 node available for a range of keys, it fails on RuntimeError. For
 example having a keyspace with RF=1 and a node is down all MapReduce
 tasks fail. 

CASSANDRA-2388 is related but not the same.

Before 0.8.4 the behaviour was that if the local cassandra node didn't have
the split's data, the tasktracker would connect to another cassandra node
where the split's data could be found.

So even on 0.8.4, with RF=1 your hadoop job would fail.

Although I've reopened CASSANDRA-2388 (and reverted the code locally)
because the new behaviour in 0.8.4 leads to abysmal tasktracker
throughput (for me task allocation doesn't seem to honour data-locality
according to split.getLocations()).

 I've reworked my previous patch, that was addressing this
 issue and now there are ConfigHelper methods for enable/disable
 ignoring unavailable ranges.
 It's available here: http://pastebin.com/hhrr8m9P (for version 0.7.8) 

I'm interested in this patch and see its usefulness, but no one will act
until you attach it to an issue. (I think a new issue is appropriate
here.)

~mck



Column index limitations: total number of indexes per row?

2011-09-01 Thread Renato Bacelar da Silveira


Hi All

I have indexed a number of columns in a ROW, i.e. 25 columns, to perform
indexed_slice queries.


If I am not mistaken, there is some limit to the number of indexes one 
may create per row/keyspace?


I am trying to get up to 6000 columns indexed, per row, in 2.5 million rows.

So I will be looking at 6000 x 2.5million indexes. Yep, that's 6,250,000 
indexes.


Would that be tantamount to an atomic bomb riding full speed out of my 
test server?


I have read the strategies of building Column families as indexes. I find
it quite helpful, and a good solution for indexing. But can the above
scenario ever be achieved? (haven't had time to try that yet).
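
For comparison, the "column family as index" approach mentioned above keeps one
wide row per indexed value instead of millions of built-in index entries. A rough
pycassa sketch (the CF names 'Items' and 'ItemsByTag' are made up for illustration):

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    items = pycassa.ColumnFamily(pool, 'Items')
    items_by_tag = pycassa.ColumnFamily(pool, 'ItemsByTag')   # the manual index CF

    def insert_item(item_id, columns, tag):
        items.insert(item_id, columns)
        # index row: row key = indexed value, column name = target row key
        items_by_tag.insert(tag, {item_id: ''})

    def item_ids_for_tag(tag, count=1000):
        # one wide-row slice instead of a get_indexed_slices call
        return items_by_tag.get(tag, column_count=count).keys()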

Regards to ALL!
--
Renato da Silveira
Senior Developer
www.indabamobile.co.za


Fwd: Column index limitations: total number of indexes per row? OOPS :/

2011-09-01 Thread Renato Bacelar da Silveira

HAHA, finger trouble on the line below ---

So I will be looking at 6000 x 2.5million indexes. Yep, that's 6,250,000 
indexes.


The sum is actually meant to be 15,000,000 - so that's 15 million
indexes - sho!


Apologies :)


 Original Message 
Subject:Column index limitations: total number of indexes per row?
Date:   Thu, 01 Sep 2011 14:17:01 +0200
From:   Renato Bacelar da Silveira renat...@indabamobile.co.za
Reply-To:   user@cassandra.apache.org
To: user@cassandra.apache.org




Hi All

I have indexed a number of columns in a ROW, i.e. 25 columns, to perform
indexed_slice queries.


If I am not mistaken, there is some limit to the number of indexes one 
may create per row/keyspace?


I am trying to get up to 6000 columns indexed, per row, in 2.5 million rows.

So I will be looking at 6000 x 2.5 million indexes. Yep, that's
6,250,000 indexes.


Would that be tantamount to an atomic bomb riding full speed out of my 
test server?


I have read the strategies of building Column families as indexes. I find
it quite helpful, and a good solution for indexing. But can the above
scenario ever be achieved? (haven't had time to try that yet).

Regards to ALL!
--
Renato da Silveira
Senior Developer
www.indabamobile.co.za


Re: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Eric Evans
On Wed, Aug 31, 2011 at 11:38 PM, Eric Czech e...@nextbigsound.com wrote:
 I'm running cassandra 0.7.5 on about 20 RHEL 5 (24 GB RAM) machines and I'm
 having issues with snapshots, json sstable conversions, and various nodetool
 commands due to memory errors and the lack of the native access C libraries.
  I tried putting jna.jar on the classpath but I'm still seeing warnings in
 the log files like "CLibrary.java (line 65) Unable to link C library. Native
 methods will be disabled."  Based on this warning, it looks like the .jar
 file is actually on the classpath but the native access libraries still
 aren't being used.

Where did you get this jar?  My guess is that the native code in that
jar isn't compatible with your system.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Eric Czech
I got it here : https://nodeload.github.com/twall/jna/tarball/master

Is there some other version or distribution of jna that I should be using?
 The version I have is 3.3.0.

On Thu, Sep 1, 2011 at 8:49 AM, Eric Evans eev...@acunu.com wrote:

 On Wed, Aug 31, 2011 at 11:38 PM, Eric Czech e...@nextbigsound.com
 wrote:
  I'm running cassandra 0.7.5 on about 20 RHEL 5 (24 GB RAM) machines and
 I'm
  having issues with snapshots, json sstable conversions, and various
 nodetool
  commands due to memory errors and the lack of the native access C
 libraries.
   I tried putting jna.jar on the classpath but I'm still seeing warnings
 in
  the log files like CLibrary.java (line 65) Unable to link C library.
 Native
  methods will be disabled..  Based on this warning, It looks like the
 .jar
  file is actually on the classpath but the native access libraries still
  aren't being used.

 Where did you get this jar?  My guess is that the native code in that
 jar isn't compatible with your system.

 --
 Eric Evans
 Acunu | http://www.acunu.com | @acunu



RE: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Dan Hendry
Have you installed 'jna'? On RHEL (6 at least) it should be possible using
the default yum repos. You need the native code and the JAR in Cassandra's
classpath from what I understand.

 

Dan

 

From: eczec...@gmail.com [mailto:eczec...@gmail.com] On Behalf Of Eric Czech
Sent: September-01-11 11:13
To: user@cassandra.apache.org
Subject: Re: Unable to link C library (for jna.jar) on 0.7.5

 

I got it here : https://nodeload.github.com/twall/jna/tarball/master

 

Is there some other version or distribution of jna that I should be using?
The version I have is 3.3.0.

 

On Thu, Sep 1, 2011 at 8:49 AM, Eric Evans eev...@acunu.com wrote:

On Wed, Aug 31, 2011 at 11:38 PM, Eric Czech e...@nextbigsound.com wrote:
 I'm running cassandra 0.7.5 on about 20 RHEL 5 (24 GB RAM) machines and
I'm
 having issues with snapshots, json sstable conversions, and various
nodetool
 commands due to memory errors and the lack of the native access C
libraries.
  I tried putting jna.jar on the classpath but I'm still seeing warnings in
 the log files like CLibrary.java (line 65) Unable to link C library.
Native
 methods will be disabled..  Based on this warning, It looks like the .jar
 file is actually on the classpath but the native access libraries still
 aren't being used.

Where did you get this jar?  My guess is that the native code in that
jar isn't compatible with your system.

--
Eric Evans
Acunu | http://www.acunu.com | @acunu

 




Re: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Eric Evans
On Thu, Sep 1, 2011 at 10:13 AM, Eric Czech e...@nextbigsound.com wrote:
 I got it here : https://nodeload.github.com/twall/jna/tarball/master
 Is there some other version or distribution of jna that I should be using?
  The version I have is 3.3.0.

As Dan mentions in another email, if you can install it from an RPM
that was built for RHEL 5, then try that first (make sure to add it to
the classpath though).


-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: java.io.IOException: Could not get input splits

2011-09-01 Thread Jonathan Ellis
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044,
fixed for 0.8.5

On Thu, Sep 1, 2011 at 10:54 AM, Jian Fang
jian.fang.subscr...@gmail.com wrote:
 Hi,

 I upgraded Cassandra from 0.8.2 to 0.8.4 and run a hadoop job to read data
 from Cassandra, but
 got the following errors:

 11/09/01 11:42:46 INFO hadoop.SalesRankLoader: Start Cassandra reader...
 Exception in thread main java.io.IOException: Could not get input splits
     at
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:157)
     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
     at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
     at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
     at
 com.barnesandnoble.hadoop.SalesRankLoader.run(SalesRankLoader.java:359)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at
 com.barnesandnoble.hadoop.SalesRankLoader.main(SalesRankLoader.java:408)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.util.concurrent.ExecutionException:
 java.lang.IllegalArgumentException: protocol = socket host = null
     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
     at
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:153)
     ... 12 more
 Caused by: java.lang.IllegalArgumentException: protocol = socket host = null
     at
 sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:358)
     at java.net.Socket.connect(Socket.java:529)
     at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
     at
 org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
     at
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.createConnection(ColumnFamilyInputFormat.java:243)
     at
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:217)
     at
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:70)
     at
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:190)
     at
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:175)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)

 The code used to work for 0.8.2 and it is really strange to see the host =
 null. My code is very similar to the word count example,

     logger.info("Start Cassandra reader...");
     Job job2 = new Job(getConf(), "SalesRankCassandraReader");
     job2.setJarByClass(SalesRankLoader.class);
     job2.setMapperClass(CassandraReaderMapper.class);
     job2.setReducerClass(CassandraToFilesystem.class);
     job2.setOutputKeyClass(Text.class);
     job2.setOutputValueClass(IntWritable.class);
     job2.setMapOutputKeyClass(Text.class);
     job2.setMapOutputValueClass(IntWritable.class);
     FileOutputFormat.setOutputPath(job2, new Path(outPath));

     job2.setInputFormatClass(ColumnFamilyInputFormat.class);

     ConfigHelper.setRpcPort(job2.getConfiguration(), "9260");
     ConfigHelper.setInitialAddress(job2.getConfiguration(), "dnjsrcha02");
     ConfigHelper.setPartitioner(job2.getConfiguration(),
         "org.apache.cassandra.dht.RandomPartitioner");
     ConfigHelper.setInputColumnFamily(job2.getConfiguration(), KEYSPACE,
         columnFamily);
     // ConfigHelper.setInputSplitSize(job2.getConfiguration(), 5000);
     ConfigHelper.setRangeBatchSize(job2.getConfiguration(), batchSize);
     SlicePredicate predicate = new SlicePredicate()
         .setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
     ConfigHelper.setInputSlicePredicate(job2.getConfiguration(), predicate);

     job2.waitForCompletion(true);

 The Cassandra cluster includes 6 nodes and I am pretty sure they work fine.

 Please help.

 Thanks,

 John






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: java.io.IOException: Could not get input splits

2011-09-01 Thread Jian Fang
Thanks. How soon will 0.8.5 be out? Is there any 0.8.5 snapshot version
available?

On Thu, Sep 1, 2011 at 11:57 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044,
 fixed for 0.8.5

 On Thu, Sep 1, 2011 at 10:54 AM, Jian Fang
 jian.fang.subscr...@gmail.com wrote:
  Hi,
 
  I upgraded Cassandra from 0.8.2 to 0.8.4 and run a hadoop job to read
 data
  from Cassandra, but
  got the following errors:
 
  11/09/01 11:42:46 INFO hadoop.SalesRankLoader: Start Cassandra reader...
  Exception in thread main java.io.IOException: Could not get input
 splits
  at
 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:157)
  at
 org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
  at
  org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
  at
  com.barnesandnoble.hadoop.SalesRankLoader.run(SalesRankLoader.java:359)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at
  com.barnesandnoble.hadoop.SalesRankLoader.main(SalesRankLoader.java:408)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
  Caused by: java.util.concurrent.ExecutionException:
  java.lang.IllegalArgumentException: protocol = socket host = null
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
  at java.util.concurrent.FutureTask.get(FutureTask.java:83)
  at
 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:153)
  ... 12 more
  Caused by: java.lang.IllegalArgumentException: protocol = socket host =
 null
  at
  sun.net.spi.DefaultProxySelector.select(DefaultProxySelector.java:151)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:358)
  at java.net.Socket.connect(Socket.java:529)
  at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
  at
 
 org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
  at
 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.createConnection(ColumnFamilyInputFormat.java:243)
  at
 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSubSplits(ColumnFamilyInputFormat.java:217)
  at
 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.access$200(ColumnFamilyInputFormat.java:70)
  at
 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:190)
  at
 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat$SplitCallable.call(ColumnFamilyInputFormat.java:175)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
 
  The code used to work for 0.8.2 and it is really strange to see the host
 =
  null. My code is very similar to the word count example,
 
  logger.info(Start Cassandra reader...);
  Job job2 = new Job(getConf(), SalesRankCassandraReader);
  job2.setJarByClass(SalesRankLoader.class);
  job2.setMapperClass(CassandraReaderMapper.class);
  job2.setReducerClass(CassandraToFilesystem.class);
  job2.setOutputKeyClass(Text.class);
  job2.setOutputValueClass(IntWritable.class);
  job2.setMapOutputKeyClass(Text.class);
  job2.setMapOutputValueClass(IntWritable.class);
  FileOutputFormat.setOutputPath(job2, new Path(outPath));
 
  job2.setInputFormatClass(ColumnFamilyInputFormat.class);
 
  ConfigHelper.setRpcPort(job2.getConfiguration(), 9260);
  ConfigHelper.setInitialAddress(job2.getConfiguration(),
  dnjsrcha02);
  ConfigHelper.setPartitioner(job2.getConfiguration(),
  org.apache.cassandra.dht.RandomPartitioner);
  ConfigHelper.setInputColumnFamily(job2.getConfiguration(),
 KEYSPACE,
  columnFamily);
  //ConfigHelper.setInputSplitSize(job2.getConfiguration(), 5000);
  ConfigHelper.setRangeBatchSize(job2.getConfiguration(),
 batchSize);
  SlicePredicate predicate = new
 
 SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
  ConfigHelper.setInputSlicePredicate(job2.getConfiguration(),
  predicate);
 
  job2.waitForCompletion(true);
 
  The Cassandra cluster includes 6 nodes and I am 

Re: 15 seconds to increment 17k keys?

2011-09-01 Thread Ian Danforth
Does this scale with multiples of the replication factor or directly
with number of nodes? Or more succinctly, to double the writes per
second into the cluster how many more nodes would I need? (Thanks for
the note on pycassa, I've checked and it's not the limiting factor)

Ian

On Thu, Sep 1, 2011 at 3:36 AM, Richard Low r...@acunu.com wrote:
 Assuming you have replicate_on_write enabled (which you almost
 certainly do for counters), you have to do a read on a write for each
 increment.  This means counter increments, even if all your data set
 fits in cache, are significantly slower than normal column inserts.  I
 would say ~1k increments per second is about right, although you can
 probably do some tuning to improve this.

 I've also found that the pycassa client uses significant amounts of
 CPU, so be careful you are not CPU bound on the client.

 --
 Richard Low
 Acunu | http://www.acunu.com | @acunu

 On Thu, Sep 1, 2011 at 2:31 AM, Yang tedd...@gmail.com wrote:
 1ms per add operation is the general order of magnitude I have seen with my
 tests.


 On Wed, Aug 31, 2011 at 6:04 PM, Ian Danforth idanfo...@numenta.com wrote:

 All,

  I've got a 4 node cluster (ec2 m1.large instances, replication = 3)
 that has one primary counter type column family, that has one column
 in the family. There are millions of rows. Each operation consists of
 doing a batch_insert through pycassa, which increments ~17k keys. A
 majority of these keys are new in each batch.

  Each operation is taking up to 15 seconds. For our system this is a
 significant bottleneck.

  Does anyone know if this write speed is expected?

 Thanks in advance,

  Ian





Re: Trying to understand QUORUM and Strategies

2011-09-01 Thread Anthony Ikeda
Thanks Evgeniy,

We encountered this exception with the following settings:

<bean id="consistencyLevelPolicy"
      class="me.prettyprint.cassandra.model.ConfigurableConsistencyLevel">

  <property name="defaultReadConsistencyLevel" value="LOCAL_QUORUM" />

  <property name="defaultWriteConsistencyLevel" value="LOCAL_QUORUM" />

</bean>

Caused by: InvalidRequestException(why:consistency level LOCAL_QUORUM not
compatible with replication strategy
(org.apache.cassandra.locator.SimpleStrategy))

at
org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19045)

at
org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)

at
org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)

at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)

at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)

at
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)

at
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232)

at
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)

at
me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)

at
me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)

at
me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)

at
me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)

at
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)

at
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)

at
me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)

Which is why I raised this email originally. It is probable that we have not
configured the system correctly; I just need to find out what it is I'm
missing.
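
For reference, LOCAL_QUORUM requires NetworkTopologyStrategy (as in the examples
quoted below), so the usual fix is to move the keyspace off SimpleStrategy. A rough
sketch with pycassa's SystemManager, purely for illustration (the keyspace and DC
names are assumptions, worth double-checking against the pycassa version in use;
the cassandra-cli "update keyspace ... with placement_strategy = ..." command does
the same thing):

    from pycassa.system_manager import SystemManager, NETWORK_TOPOLOGY_STRATEGY

    sys_mgr = SystemManager('localhost:9160')
    # switch the keyspace to NetworkTopologyStrategy with RF=3 in one DC
    sys_mgr.alter_keyspace('MyKeyspace',
                           replication_strategy=NETWORK_TOPOLOGY_STRATEGY,
                           strategy_options={'datacenter1': '3'})
    sys_mgr.close()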

Anthony

On Wed, Aug 31, 2011 at 2:59 PM, Evgeniy Ryabitskiy 
evgeniy.ryabits...@wikimart.ru wrote:

 Hi
 Actually you can use LOCAL_QUORUM and EACH_QUORUM policy everywhere on
 DEV/QA/Prod.
 It would even be better for integration tests to use the same consistency level
 as on production.

 For production with multiple DCs you usually need to choose between 2 common
 solutions: Geographical Distribution or Disaster Recovery.
 See: http://www.datastax.com/docs/0.8/operations/datacenter

  LOCAL_QUORUM and EACH_QUORUM for DEV/QA/Prod by examples:

 create keyspace KeyspaceDEV
 with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy'
 and strategy_options=[{datacenter1:1}];

 create keyspace KeyspaceQA
 with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy'
 and strategy_options=[{datacenter1:2}];

 create keyspace KeyspaceProd
 with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy'
 and strategy_options=[{datacenter1:3, datacenter2:3}];


 Be careful(!!!): usually the default name of the DC in a new cluster is
 datacenter1, but cassandra-cli uses the default name DC1 (some small
 mismatch/bug, maybe).

 Evgeny.



Re: cassandra-cli describe / dump command

2011-09-01 Thread Jonathan Ellis
yes, cli show schema in 0.8.4+

On Thu, Sep 1, 2011 at 12:52 PM, J T jt4websi...@googlemail.com wrote:
 Hi,

 I'm probably being blind .. but I can't see any way to dump the schema
 definition (and the data in it for that matter)  of a cluster in order to
 capture the current schema in a script file for subsequent replaying in to a
 different environment.

 For example, say I have a DEV env and wanted to create a script containing
 the cli commands to create that schema in a UAT env.

 In my case, I have a cassandra schema I've been tweaking / upgrading over
 the last 2 years and I can't see any easy way to capture the schema
 definition.

 Is such a thing on the cards for cassandra-cli ?

 JT




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: hw requirements

2011-09-01 Thread Maxim Potekhin
Sorry about the unclear naming scheme. I meant that if I want to index on a
few columns simultaneously, I create a new column with the catenated values
of these.
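
For illustration only (this is not Maxim's code), the pattern looks roughly like
this in pycassa, with a made-up 'Users' CF indexed on city and age together:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    users = pycassa.ColumnFamily(pool, 'Users')

    def insert_user(user_id, city, age):
        row = {'city': city, 'age': str(age)}
        # extra column with the catenated values; a secondary index on 'city_age'
        # then stands in for a composite index on (city, age)
        row['city_age'] = '%s:%s' % (city, age)
        users.insert(user_id, row)   # the extra column is an extra write, hence extra CPU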

On 8/31/2011 3:10 PM, Anthony Ikeda wrote:
Sorry to fork this topic, but with composite indexes do you mean as 
strings or as Composite()? I only ask because we have started using 
the Composite as row keys and column names to replace the use of 
concatenated strings, mainly for lookup purposes.


Anthony


On Wed, Aug 31, 2011 at 10:27 AM, Maxim Potekhin potek...@bnl.gov wrote:


Plenty of comments in this thread already, and I agree with those
saying
it depends. From my experience, a cluster with 18 spindles total
could not match the performance and throughput of our primary
Oracle server which had 108 spindles. After we upgraded to SSD,
things have definitely changed for the better, for Cassandra.

Another thing is that if you plan to implement composite indexes by
catenating column values into additional columns, that would
constitute
a write hence you'll need CPU. So watch out.



On 8/29/2011 9:15 AM, Helder Oliveira wrote:

Hello guys,

What is the typical hardware profile of a cassandra server?
Are SSDs an option?
Does cassandra need better CPU or lots of memory?
Are SATA II disks ok?

I am making some tests, and i started evaluating the possible
hardware.

If someone already has conclusions about it, please share :D

Thanks a lot.







Replicate On Write behavior

2011-09-01 Thread David Hawthorne
I'm curious... digging through the source, it looks like replicate on write 
triggers a read of the entire row, and not just the columns/supercolumns that 
are affected by the counter update.  Is this the case?  It would certainly 
explain why my inserts/sec decay over time and why the average insert latency 
increases over time.  The strange thing is that I'm not seeing disk read IO 
increase over that same period, but that might be due to the OS buffer cache...

On another note, on a 5-node cluster, I'm only seeing 3 nodes with 
ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that normal?  
I'm using RandomPartitioner...

Address         DC          Rack        Status State   Load        Owns    Token

136112946768375385385349842972707284580 
10.0.0.57datacenter1 rack1   Up Normal  2.26 GB 20.00%  0   

10.0.0.56datacenter1 rack1   Up Normal  2.47 GB 20.00%  
34028236692093846346337460743176821145  
10.0.0.55datacenter1 rack1   Up Normal  2.52 GB 20.00%  
68056473384187692692674921486353642290  
10.0.0.54datacenter1 rack1   Up Normal  950.97 MB   20.00%  
102084710076281539039012382229530463435 
10.0.0.72datacenter1 rack1   Up Normal  383.25 MB   20.00%  
136112946768375385385349842972707284580 

The nodes with ReplicateOnWrites are the 3 in the middle.  The first node and 
last node both have a count of 0.  This is a clean cluster, and I've been doing 
3k ... 2.5k (decaying performance) inserts/sec for the last 12 hours.  The last 
time this test ran, it went all the way down to 500 inserts/sec before I killed 
it.

Re: Updates lost

2011-09-01 Thread Jeremiah Jordan
Are you running on Windows?  If the default timestamp is just using 
time.time()*1e6 you will get the same timestamp twice if the calls are 
close together.  time.time() on Windows only has millisecond resolution.  
I don't use pycassa, but in the Thrift api wrapper I created for our 
python code I implemented the following function for getting timestamps:


import time

def GetTimeInMicroSec():
    """
    Returns the current time in microseconds; the returned value always
    increases with each call.

    :return: Current time in microseconds
    """
    newTime = long(time.time()*1e6)
    try:
        if GetTimeInMicroSec.lastTime >= newTime:
            newTime = GetTimeInMicroSec.lastTime + 1
    except AttributeError:
        # first call: lastTime has not been set yet
        pass
    GetTimeInMicroSec.lastTime = newTime
    return newTime
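
pycassa's insert() also accepts an explicit timestamp argument (worth verifying for
the version in use), so a helper like the one above can be wired in directly. A
sketch with made-up keyspace/CF names:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'Standard1')

    # force successive writes onto strictly increasing timestamps, using the
    # GetTimeInMicroSec() helper above, so the second update can never lose
    # the timestamp tie-break to the first
    cf.insert('key1', {'col': 'aaa'}, timestamp=GetTimeInMicroSec())
    cf.insert('key1', {'col': 'bbb'}, timestamp=GetTimeInMicroSec())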


On 08/29/2011 04:56 PM, Peter Schuller wrote:

If the client sleeps for a few ms at each loop, the success rate
increases. At 15 ms, the script always succeeds so far. Interestingly,
the problem seems to be sensitive to alphabetical order. Updating the
value from 'aaa' to 'bbb' never has problem. No pause needed.

Is it possible the version of pycassa you're using does not guarantee
that successive queries use non-identical and monotonically increasing
timestamps? I'm just speculating, but if that is the case and two
requests are sent with the same timestamp (due to resolution being
lower than the time it takes between calls), the tie breaking would be
the column value which jives with the fact that you're saying it seems
to depend on the value.

(I haven't checked current nor past versions of pycassa to determine
if this is plausible. Just speculating.)



Re: Replicate On Write behavior

2011-09-01 Thread Yang
when Cassandra reads, the entire CF is always read together; only at the
hand-over to the client does the pruning happen

On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne dha...@gmx.3crowd.comwrote:

 I'm curious... digging through the source, it looks like replicate on write
 triggers a read of the entire row, and not just the columns/supercolumns
 that are affected by the counter update.  Is this the case?  It would
 certainly explain why my inserts/sec decay over time and why the average
 insert latency increases over time.  The strange thing is that I'm not
 seeing disk read IO increase over that same period, but that might be due to
 the OS buffer cache...

 On another note, on a 5-node cluster, I'm only seeing 3 nodes with
 ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that
 normal?  I'm using RandomPartitioner...

 Address DC  RackStatus State   LoadOwns
Token

  136112946768375385385349842972707284580
 10.0.0.57datacenter1 rack1   Up Normal  2.26 GB 20.00%
  0
 10.0.0.56datacenter1 rack1   Up Normal  2.47 GB 20.00%
  34028236692093846346337460743176821145
 10.0.0.55datacenter1 rack1   Up Normal  2.52 GB 20.00%
  68056473384187692692674921486353642290
 10.0.0.54datacenter1 rack1   Up Normal  950.97 MB   20.00%
  102084710076281539039012382229530463435
 10.0.0.72datacenter1 rack1   Up Normal  383.25 MB   20.00%
  136112946768375385385349842972707284580

 The nodes with ReplicateOnWrites are the 3 in the middle.  The first node
 and last node both have a count of 0.  This is a clean cluster, and I've
 been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12
 hours.  The last time this test ran, it went all the way down to 500
 inserts/sec before I killed it.


Re: Replicate On Write behavior

2011-09-01 Thread Konstantin Naryshkin
Yeah, I believe that Yang has a typo in his post. A CF is not read in one go, a 
row is. As for the scalability of having all the columns being read at once, I 
do not believe that it was ever meant to be. All the columns in a row are 
stored together, on the same set of machines. This means that if you have very 
large rows, you can have an unbalanced cluster, but it also allows reads of 
several columns out of a row to be more efficient since they are all together 
on the same machine (no need to gather results from several machines) and 
should read quickly since they are all together on disk.

- Original Message -
From: Ian Danforth idanfo...@numenta.com
To: user@cassandra.apache.org
Sent: Thursday, September 1, 2011 4:35:33 PM
Subject: Re: Replicate On Write behavior

I'm not sure I understand the scalability of this approach. A given
column family can be HUGE with millions of rows and columns. In my
cluster I have a single column family that accounts for 90GB of load
on each node. Not only that but column family is distributed over the
entire ring.

Clearly I'm misunderstanding something.

Ian

On Thu, Sep 1, 2011 at 1:17 PM, Yang tedd...@gmail.com wrote:
 when Cassandra reads, the entire CF is always read together, only at the
 hand-over to client does the pruning happens

 On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne dha...@gmx.3crowd.com
 wrote:

 I'm curious... digging through the source, it looks like replicate on
 write triggers a read of the entire row, and not just the
 columns/supercolumns that are affected by the counter update.  Is this the
 case?  It would certainly explain why my inserts/sec decay over time and why
 the average insert latency increases over time.  The strange thing is that
 I'm not seeing disk read IO increase over that same period, but that might
 be due to the OS buffer cache...

 On another note, on a 5-node cluster, I'm only seeing 3 nodes with
 ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that
 normal?  I'm using RandomPartitioner...

 Address         DC          Rack        Status State   Load
  Owns    Token

  136112946768375385385349842972707284580
 10.0.0.57    datacenter1 rack1       Up     Normal  2.26 GB         20.00%
  0
 10.0.0.56    datacenter1 rack1       Up     Normal  2.47 GB         20.00%
  34028236692093846346337460743176821145
 10.0.0.55    datacenter1 rack1       Up     Normal  2.52 GB         20.00%
  68056473384187692692674921486353642290
 10.0.0.54    datacenter1 rack1       Up     Normal  950.97 MB       20.00%
  102084710076281539039012382229530463435
 10.0.0.72    datacenter1 rack1       Up     Normal  383.25 MB       20.00%
  136112946768375385385349842972707284580

 The nodes with ReplicateOnWrites are the 3 in the middle.  The first node
 and last node both have a count of 0.  This is a clean cluster, and I've
 been doing 3k ... 2.5k (decaying performance) inserts/sec for the last 12
 hours.  The last time this test ran, it went all the way down to 500
 inserts/sec before I killed it.



Re: Replicate On Write behavior

2011-09-01 Thread Yang
sorry, I meant CF x row

if you look in the code, db.cf is basically just a set of columns
On Sep 1, 2011 1:36 PM, Ian Danforth idanfo...@numenta.com wrote:
 I'm not sure I understand the scalability of this approach. A given
 column family can be HUGE with millions of rows and columns. In my
 cluster I have a single column family that accounts for 90GB of load
 on each node. Not only that but column family is distributed over the
 entire ring.

 Clearly I'm misunderstanding something.

 Ian

 On Thu, Sep 1, 2011 at 1:17 PM, Yang tedd...@gmail.com wrote:
 when Cassandra reads, the entire CF is always read together, only at the
 hand-over to client does the pruning happens

 On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne dha...@gmx.3crowd.com
 wrote:

 I'm curious... digging through the source, it looks like replicate on
 write triggers a read of the entire row, and not just the
 columns/supercolumns that are affected by the counter update.  Is this
the
 case?  It would certainly explain why my inserts/sec decay over time and
why
 the average insert latency increases over time.  The strange thing is
that
 I'm not seeing disk read IO increase over that same period, but that
might
 be due to the OS buffer cache...

 On another note, on a 5-node cluster, I'm only seeing 3 nodes with
 ReplicateOnWrite Completed tasks in nodetool tpstats output.  Is that
 normal?  I'm using RandomPartitioner...

 Address DC  RackStatus State   Load
  OwnsToken

  136112946768375385385349842972707284580
 10.0.0.57datacenter1 rack1   Up Normal  2.26 GB
20.00%
  0
 10.0.0.56datacenter1 rack1   Up Normal  2.47 GB
20.00%
  34028236692093846346337460743176821145
 10.0.0.55datacenter1 rack1   Up Normal  2.52 GB
20.00%
  68056473384187692692674921486353642290
 10.0.0.54datacenter1 rack1   Up Normal  950.97 MB
20.00%
  102084710076281539039012382229530463435
 10.0.0.72datacenter1 rack1   Up Normal  383.25 MB
20.00%
  136112946768375385385349842972707284580

 The nodes with ReplicateOnWrites are the 3 in the middle.  The first
node
 and last node both have a count of 0.  This is a clean cluster, and I've
 been doing 3k ... 2.5k (decaying performance) inserts/sec for the last
12
 hours.  The last time this test ran, it went all the way down to 500
 inserts/sec before I killed it.



Bulk loader: Got an unknow host from describe_ring

2011-09-01 Thread Christopher Bottaro
Hello,

I'm trying to import data from one Cassandra cluster to another.  The old
cluster is using ports 7000 and 9160 and the new cluster is using 7001 and
9161.  I ran nodetool -h localhost snapshot on a node on the old cluster.
 I then downloaded apache-cassandra-0.8.4-bin.tar.gz, edited
conf/cassandra.yaml appropriately for the new cluster, exported
CASSANDRA_INCLUDE=bin/cassandra.sh.in (so it doesn't try to use the
installed, running Cassandra).

When I run sstableloader, I can see that it's connecting to the new cluster
(by tailing the logs), but after a few seconds, it gives the error: Got an
unknow host from describe_ring

After banging my head for a while, I scp'ed the snapshotted data to a node
on the *new* cluster.  I again downloaded apache-cassandra-0.8.4-bin.tar.gz
and configured cassandra.yaml there appropriately (with listen_address:
127.0.0.1 so as to not conflict with the Cassandra already running on the
node).

Running sstableloader resulted in the same error.

nodetool -h localhost ring shows a healthy cluster.  Running that command
works both locally and remotely.  I can connect to the cluster using
cassandra-cli both locally and remotely as well.

Any ideas?  Thanks for the help.


Re: Bulk loader: Got an unknow host from describe_ring

2011-09-01 Thread Jonathan Ellis
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044,
fixed for 0.8.5

On Thu, Sep 1, 2011 at 4:27 PM, Christopher Bottaro
cjbott...@onespot.com wrote:
 Hello,
 I'm trying to import data from one Cassandra cluster to another.  The old
 cluster is using ports 7000 and 9160 and the new cluster is using 7001 and
 9161.  I ran nodetool -h localhost snapshot on a node on the old cluster.
  I then downloaded apache-cassandra-0.8.4-bin.tar.gz, edited
 conf/cassandra.yaml appropriately for the new cluster, exported
 CASSANDRA_INCLUDE=bin/cassandra.sh.in (so it doesn't try to use the
 installed, running Cassandra).
 When I run sstableloader, I can see that it's connecting to the new cluster
 (by tailing the logs), but after a few seconds, it gives the error: Got an
 unknow host from describe_ring
 After banging my head for a while, I scp'ed the snapshotted data to a node
 on the *new* cluster.  I again downloaded apache-cassandra-0.8.4-bin.tar.gz
 and configured cassandra.yaml there appropriately (with listen_address:
 127.0.0.1 so as to not conflict with the Cassandra already running on the
 node).
 Running sstableloader resulted in the same error.
 nodetool -h localhost ring shows a healthy cluster.  Running that command
 works both locally and remotely.  I can connect to the cluster using
 cassandra-cli both locally and remotely as well.
 Any ideas?  Thanks for the help.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


RE: Removal of old data files

2011-09-01 Thread hiroyuki.watanabe
Yes, I see files with names like
Orders-g-6517-Compacted

However, all of those files have a size of 0.

From Monday to Thursday we have 5642 files for -Data.db, -Filter.db
and -Statistics.db, and only 128 -Compacted files,
and all of the -Compacted files have a size of 0.

Is this normal, or are we doing something wrong?


yuki



From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, August 25, 2011 6:13 PM
To: user@cassandra.apache.org
Subject: Re: Removal of old data files

If cassandra does not have enough disk space to create a new file it will 
provoke a JVM GC, which should result in compacted SSTables that are no longer 
needed being deleted. Otherwise they are deleted at some time in the future.

Compacted SSTables have a file written out with a "Compacted" extension.

Do you see compacted sstables in the data directory?

Cheers.

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 26/08/2011, at 2:29 AM, yuki watanabe wrote:


We are using Cassandra 0.8.0 with 8 node ring and only one CF.
Every column has TTL of 86400 (24 hours). we also set 'GC grace second' to 43200
(12 hours).  We have to store massive amount of data for one day now and 
eventually for five days if we get more disk space.
Even for one day, we do run out disk space in a busy day.

We run the nodetool compact command at night or as necessary, then we run GC from 
jconsole. We observed that GC did remove files, but not necessarily the oldest ones.
Data files from more than 36 hours ago, and quite often three days ago, are still 
there.

Is this behavior expected, or do we need to adjust some other parameters?


Yuki Watanabe




Re: Updates lost

2011-09-01 Thread Paul Loy
Well, on Windows Vista and below (I haven't checked on 7),
System.currentTimeMillis only has around 10ms granularity. That is, for any
10ms period you get the same value. I develop on Windows and I'd get
sporadic integration test failures due to this.

On Thu, Sep 1, 2011 at 8:31 PM, Jeremiah Jordan 
jeremiah.jor...@morningstar.com wrote:

 Are you running on windows?  If the default timestamp is just using
 time.time()*1e6 you will get the same timestamp twice if the code is close
 together.  time.time() on windows is only millisecond resolution.  I don't
 use pycassa, but in the Thrift api wrapper I created for our python code I
 implemented the following function for getting timestamps:

  def GetTimeInMicroSec():
      """
      Returns the current time in microseconds; the returned value always
      increases with each call.

      :return: Current time in microseconds
      """
      newTime = long(time.time()*1e6)
      try:
          if GetTimeInMicroSec.lastTime >= newTime:
              newTime = GetTimeInMicroSec.lastTime + 1
      except AttributeError:
          pass
      GetTimeInMicroSec.lastTime = newTime
      return newTime


 On 08/29/2011 04:56 PM, Peter Schuller wrote:

 If the client sleeps for a few ms at each loop, the success rate
 increases. At 15 ms, the script always succeeds so far. Interestingly,
 the problem seems to be sensitive to alphabetical order. Updating the
 value from 'aaa' to 'bbb' never has problem. No pause needed.

 Is it possible the version of pycassa you're using does not guarantee
 that successive queries use non-identical and monotonically increasing
 timestamps? I'm just speculating, but if that is the case and two
 requests are sent with the same timestamp (due to resolution being
 lower than the time it takes between calls), the tie breaking would be
 the column value which jives with the fact that you're saying it seems
 to depend on the value.

 (I haven't checked current nor past versions of pycassa to determine
 if this is plausible. Just speculating.)




-- 
-
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy


Fun with Heap Dump ...

2011-09-01 Thread Ian Danforth
All,

 I need help interpreting the results of my investigation. I'm
encountering this error: "Unable to reduce heap usage since there are
no dirty column families". My heap sits near max and occasionally
OOMs (4 GB heap).

Following Mr. Ellis's instructions here:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Memory-Usage-During-Read-td6338635.html

I set the heap down to 1GB, restarted the node, watched the memory
climb in jconsole and waited for a heap dump. (Of course the first
time I tried this I got a permission denied error on writing out the
dump, and had to restart C* as root, but anyway ...)

Below you'll find a screen grab of the heap dump analysis.

http://screencast.com/t/U6IYzloe2McP

Here is what I see in cassandra.log just prior to OOM:

[ec2-user@ip-10-86-223-245 ~]$ tail -f /var/log/cassandra/cassandra.log
 INFO 22:37:11,193 Removing compacted SSTable files (see
http://wiki.apache.org/cassandra/MemtableSSTable)
 INFO 22:37:11,194 Removing compacted SSTable files (see
http://wiki.apache.org/cassandra/MemtableSSTable)
 INFO 22:37:11,195 Removing compacted SSTable files (see
http://wiki.apache.org/cassandra/MemtableSSTable)
 INFO 22:37:11,196 Removing compacted SSTable files (see
http://wiki.apache.org/cassandra/MemtableSSTable)
 INFO 22:37:11,212 Removing compacted SSTable files (see
http://wiki.apache.org/cassandra/MemtableSSTable)
 INFO 22:38:32,485 Opening /cassandra/data/Keyspace1/TwitterTest-g-5852
 INFO 22:38:33,253 Opening /cassandra/data/Keyspace1/TwitterTest-g-5502
 INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643
 INFO 22:38:35,653 Opening /cassandra/data/Keyspace1/TwitterTest-g-6117
 INFO 22:38:35,699 Opening /cassandra/data/Keyspace1/TwitterTest-g-1376
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor27]
...etc...

From my org.apache.cassandra.db MBean I get an estimate of 596,504,576 keys.


-

What I'd really like to know is:

1. What operation is C* performing during lines like these:

 INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643

(I think this is an SSTable it's extracting an index for this column
family from)

2. Has my CF index outgrown memory?

3. If so is there a way to relate # CF, # Columns, # Rows to index size?

I need to know how many keys I can store before I need more memory, or
need more nodes.


Thanks in advance. I've been getting a lot of help from the list and I
really appreciate it!

Ian


Re: Fun with Heap Dump ...

2011-09-01 Thread Jonathan Ellis
On Thu, Sep 1, 2011 at 6:54 PM, Ian Danforth idanfo...@numenta.com wrote:
 1. What operation is C* performing during lines like these:

  INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643

 (I think this is an SSTable it's extracting an index for this column
 family from)

Right.

 2. Has my CF index outgrown memory?

Yes.

 3. If so is there a way to relate # CF, # Columns, # Rows to index size?

Only the row index sample is kept in memory.  So it's the product of
key size * row count / index_interval.
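
Plugging in the numbers from the original post (the average key size here is an
assumption, not a measured value):

    row_count      = 596504576   # key estimate from the o.a.c.db MBean above
    avg_key_bytes  = 32          # assumed average row key size
    index_interval = 128         # cassandra.yaml default

    samples      = row_count / index_interval   # ~4.7M sampled keys
    sample_bytes = samples * avg_key_bytes      # ~142 MB of raw key data
    print 'samples: %d, raw key bytes: %.0f MB' % (samples, sample_bytes / 1024.0 ** 2)

JVM object overhead per sampled entry is typically several times the raw key bytes,
so the resident cost of the sample can run to many hundreds of MB at this row count.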

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Fun with Heap Dump ...

2011-09-01 Thread Ian Danforth
Awesome, thanks for the quick response!

Ian

On Thu, Sep 1, 2011 at 5:27 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Thu, Sep 1, 2011 at 6:54 PM, Ian Danforth idanfo...@numenta.com wrote:
 1. What operation is C* performing during lines like these:

  INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643

 (I think this is an SSTable it's extracting an index for this column
 family from)

 Right.

 2. Has my CF index outgrown memory?

 Yes.

 3. If so is there a way to relate # CF, # Columns, # Rows to index size?

 Only the row index sample is kept in memory.  So it's the product of
 key size * row count / index_interval.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Limiting ColumnSlice range in second composite value

2011-09-01 Thread Anthony Ikeda
My Column name is of Composite(TimeUUIDType, UTF8Type) and I can query
across the TimeUUIDs correctly, but now I want to also range across the UTF8
component. Is this possible?

UUID start = uuidForDate(new Date(1979, 1, 1));
UUID end = uuidForDate(new Date(Long.MAX_VALUE));

String startState = "";
String endState = "";

if (desiredState != null) {
    mLog.debug("Restricting state to [" + desiredState.getValue() + "]");
    startState = desiredState.getValue();
    endState = desiredState.getValue().concat("_");
}

Composite startComp = new Composite(start, startState);
Composite endComp = new Composite(end, endState);

query.setRange(startComp, endComp, true, count);


So far I'm not seeing any effect from setting my endState String value.


Anthony