Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Patrik Modesto
On Wed, Jan 26, 2011 at 08:58, Mck m...@apache.org wrote:
 You are correct that microseconds would be better but for the test it
 doesn't matter that much.

 Have you tried? I'm very new to Cassandra as well, and always uncertain
 as to what to expect...

IMHO it's a matter of use-case. In my use-case there is no possibility
of two (or more) processes writing/updating the same key, so
milliseconds are fine for me.

BTW how to get current time in microseconds in Java?

 As far as moving the clone(..) into ColumnFamilyRecordWriter.write(..),
 won't this hurt performance? Normally I would _always_ agree that a
 defensive copy of an array/collection argument be stored, but has this
 intentionally not been done (or should it be) because of large reduce jobs
 (millions of records) and the performance impact here?

The size of the queue is set by ColumnFamilyOutputFormat.QUEUE_SIZE and
computed at runtime, defaulting to 32 *
Runtime.getRuntime().availableProcessors().
So the queue is not too large, and I'd say performance shouldn't suffer.
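
The default described above can be sketched as follows (a minimal illustration; in the real code the value is read from the Hadoop Configuration under the ColumnFamilyOutputFormat.QUEUE_SIZE key, which this sketch omits):

```java
// Sketch of the runtime default for the record writer's queue size:
// 32 slots per available processor, unless overridden in the job config.
public class QueueSizeDefault {
    public static void main(String[] args) {
        int queueSize = 32 * Runtime.getRuntime().availableProcessors();
        // At least one processor is always reported, so the default is >= 32.
        System.out.println(queueSize >= 32);
    }
}
```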

 --
Patrik


Re: client threads locked up - JIRA ISSUE 1594

2011-01-26 Thread Arijit Mukherjee
I'm using the jars packed in Hector 0.6.0-19 (the one compatible with
Cassandra 0.6.*). I wanted to use hector, but for some reason I
haven't been able to do so yet. What I'm doing is a POC kind of thing,
and only if it works out properly, we'll go on to build on it.

The reason I asked this question in the first place was a high idle
CPU percentage. I'm currently doing the POC on an 8-core machine. I
have 8 client threads inserting data into Cassandra. But most of the
time I see 40-45% user time, 15-20% system time, and the rest idle
- for each core, even if I change the number of client threads
(increase or decrease). Then I used jstack on my Java application, and
the result matched JIRA issue 1594 exactly.

I was just wondering whether this could be the reason for the idle CPU,
because the same application was earlier tried with Lucene (to store
the indices) and we had about 90% CPU utilization. We've replaced
Lucene with Cassandra (to store the index and inverted index); the
CPU utilization is down, and the total time required went up five-fold
(for the same data set). We tried Cassandra directly because apparently
Lucandra is 10% slower than Cassandra...

Arijit

On 25 January 2011 20:30, Nate McCall n...@riptano.com wrote:
 What version of the Thrift API are you using?

 (In general, you should use an existing client library rather than
 rolling your own - I recommend Hector:
 https://github.com/rantav/hector).

 On Tue, Jan 25, 2011 at 12:38 AM, Arijit Mukherjee ariji...@gmail.com wrote:
 I'm using Cassandra 0.6.8. I'm not using Hector - it's just raw thrift APIs.

 Arijit

 On 21 January 2011 22:13, Nate McCall n...@riptano.com wrote:
 What versions of Cassandra and Hector? The versions mentioned on this
 ticket are both several releases behind.

 On Fri, Jan 21, 2011 at 3:53 AM, Arijit Mukherjee ariji...@gmail.com 
 wrote:
 Hi All

 I'm facing the same issue as this one mentioned here -
 https://issues.apache.org/jira/browse/CASSANDRA-1594

 Is there any solution or work-around for this?

 Regards
 Arijit


 --
 And when the night is cloudy,
 There is still a light that shines on me,
 Shine on until tomorrow, let it be.





 --
 And when the night is cloudy,
 There is still a light that shines on me,
 Shine on until tomorrow, let it be.





-- 
And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be.


Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Mck
On Wed, 2011-01-26 at 12:13 +0100, Patrik Modesto wrote:
 BTW how to get current time in microseconds in Java?

I'm using HFactory.clock() (from hector).
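
For those without Hector on the classpath, a minimal sketch of one common approach: derive microsecond timestamps from System.currentTimeMillis() (which only ticks in milliseconds) and force them to be strictly increasing within the JVM. This is an illustration, not Hector's implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Monotonic microsecond-ish timestamps built on the millisecond clock.
// Concurrent callers in the same JVM always get distinct, increasing values;
// this says nothing about clocks on other nodes.
public final class MicrosecondClock {
    private static final AtomicLong last = new AtomicLong();

    public static long timestamp() {
        while (true) {
            long now = System.currentTimeMillis() * 1000; // millis -> micros
            long prev = last.get();
            long next = (now > prev) ? now : prev + 1;    // force monotonicity
            if (last.compareAndSet(prev, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) {
        long a = timestamp();
        long b = timestamp();
        System.out.println(b > a); // successive calls are strictly increasing
    }
}
```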

  As far as moving the clone(..) into ColumnFamilyRecordWriter.write(..)
  won't this hurt performance? 
 
 The size of the queue is computed at runtime:
 ColumnFamilyOutputFormat.QUEUE_SIZE, 32 *
 Runtime.getRuntime().availableProcessors()
 So the queue is not too large so I'd say the performance shouldn't get hurt. 

This is only the default.
I'm running w/ 8. Testing has given this the best throughput for me
when processing 25+ million rows...

In the end it is still 25+ million .clone(..) calls. 

 The key isn't the only potential live byte[]. You also have names and
 values in all the columns (and supercolumns) for all the mutations.

Now make that over a billion .clone(..) calls... :-(

byte[] copies are relatively quick and cheap; still, I am seeing a
degradation in m/r reduce performance with cloning of keys.
It's not that you don't have my vote here; I'm just stating my
uncertainty about what the correct API should be.

~mck




Re: Files not deleted after compaction and GCed

2011-01-26 Thread Ching-Cheng Chen
It's a bug.

In SSTableDeletingReference, it tries this operation

components.remove(Component.DATA);

before

SSTable.delete(desc, components);

However, components is a reference to the set created inside SSTable by

this.components = Collections.unmodifiableSet(dataComponents);

As you can see, the remove operation cannot succeed on that components object.

If I add a try block around components.remove(Component.DATA) and print the
exception, I get this:

java.lang.UnsupportedOperationException
at java.util.Collections$UnmodifiableCollection.remove(Unknown
Source)
at
org.apache.cassandra.io.sstable.SSTableDeletingReference$CleanupTask.run(SSTableDeletingReference.java:103)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
Source)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
at
org.apache.cassandra.concurrent.RetryingScheduledThreadPoolExecutor$LoggingScheduledFuture.run(RetryingScheduledThreadPoolExecutor.java:81)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
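
The failure is easy to reproduce in isolation. A minimal sketch, with strings standing in for the Component values:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Removing from a set wrapped with Collections.unmodifiableSet always throws
// UnsupportedOperationException, regardless of whether the element is present.
public class UnmodifiableSetDemo {
    public static void main(String[] args) {
        Set<String> dataComponents =
                new HashSet<String>(Arrays.asList("Data.db", "Index.db", "Filter.db"));
        Set<String> components = Collections.unmodifiableSet(dataComponents);
        try {
            components.remove("Data.db"); // mirrors components.remove(Component.DATA)
            System.out.println("removed");
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException");
        }
    }
}
```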

Regards,

Chen

On Tue, Jan 25, 2011 at 4:21 PM, Jonathan Ellis jbel...@gmail.com wrote:

 the other component types are deleted by this line:

SSTable.delete(desc, components);

 On Tue, Jan 25, 2011 at 3:11 PM, Ching-Cheng Chen
 cc...@evidentsoftware.com wrote:
  Nope, no exception at all.
  But if the same class
  (org.apache.cassandra.io.sstable.SSTableDeletingReference) is responsible
  for delete other files, then that's not right.
  I checked the source code for SSTableDeletingReference, doesn't looks
 like
  it will delete other files type.
  Regards,
  Chen
 
  On Tue, Jan 25, 2011 at 4:05 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  No, that is not expected.  All the sstable components are removed in
  the same method; did you check the log for exceptions?
 
  On Tue, Jan 25, 2011 at 2:58 PM, Ching-Cheng Chen
  cc...@evidentsoftware.com wrote:
   Using cassandra 0.7.0
   The class org.apache.cassandra.io.sstable.SSTableDeletingReference
 only
   remove the -Data.db file, but leave the xxx-Compacted,
   xxx-Filter.db,
   xxx-Index.db and xxx-Statistics.db intact.
   And that's the behavior I saw.I ran manual compact then trigger a
 GC
   from jconsole.   The Data.db file got removed but not the others.
   Is this the expected behavior?
   Regards,
   Chen
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: the java client problem

2011-01-26 Thread Ashish
click on the loadSchema() button in right panel :)

2011/1/26 Raoyixuan (Shandy) raoyix...@huawei.com

 I found loadSchemaFromYAML via jconsole. How do I load the schema?



 *From:* Ashish [mailto:paliwalash...@gmail.com]
 *Sent:* Friday, January 21, 2011 8:10 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: the java client problem



 check cassandra-install-dir/conf/cassandra.yaml



 start cassandra

 connect via jconsole

 find MBeans - org.apache.cassandra.db - StorageService
 (http://wiki.apache.org/cassandra/StorageService) - Operations -
 loadSchemaFromYAML



 load the schema

 and then try the example again.



 HTH

 ashish



 2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com

 Which schema is it?

 *From:* Ashish [mailto:paliwalash...@gmail.com]
 *Sent:* Friday, January 21, 2011 7:57 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: the java client problem



 you are missing the column family in your keyspace.



 If you are using the default definitions of schema shipped with cassandra,
 ensure to load the schema from JMX.



 thanks

 ashish

 2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com

 I exec the code as below by hector client:



 package com.riptano.cassandra.hector.example;

 import me.prettyprint.cassandra.serializers.StringSerializer;
 import me.prettyprint.hector.api.Cluster;
 import me.prettyprint.hector.api.Keyspace;
 import me.prettyprint.hector.api.beans.HColumn;
 import me.prettyprint.hector.api.exceptions.HectorException;
 import me.prettyprint.hector.api.factory.HFactory;
 import me.prettyprint.hector.api.mutation.Mutator;
 import me.prettyprint.hector.api.query.ColumnQuery;
 import me.prettyprint.hector.api.query.QueryResult;

 public class InsertSingleColumn {
     private static StringSerializer stringSerializer = StringSerializer.get();

     public static void main(String[] args) throws Exception {
         Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "*.*.*.*:9160");
         Keyspace keyspaceOperator = HFactory.createKeyspace("Shandy", cluster);

         try {
             Mutator<String> mutator =
                     HFactory.createMutator(keyspaceOperator, StringSerializer.get());
             mutator.insert("jsmith", "Standard1",
                     HFactory.createStringColumn("first", "John"));

             ColumnQuery<String, String, String> columnQuery =
                     HFactory.createStringColumnQuery(keyspaceOperator);
             columnQuery.setColumnFamily("Standard1").setKey("jsmith").setName("first");
             QueryResult<HColumn<String, String>> result = columnQuery.execute();

             System.out.println("Read HColumn from cassandra: " + result.get());
             System.out.println("Verify on CLI with:  get Keyspace1.Standard1['jsmith'] ");
         } catch (HectorException e) {
             e.printStackTrace();
         }
         cluster.getConnectionManager().shutdown();
     }
 }



 And it shows the error:

 me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily Standard1)
   at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
   at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89)
   at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:142)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
   at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:149)
   at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:146)
   at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
   at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:65)
   at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:146)
   at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:55)
   at com.riptano.cassandra.hector.example.InsertSingleColumn.main(InsertSingleColumn.java:21)
 Caused by: InvalidRequestException(why:unconfigured columnfamily Standard1)
   at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16477)
   at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
   at

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Jonathan Ellis
On Tue, Jan 25, 2011 at 12:09 PM, Mick Semb Wever m...@apache.org wrote:
 Well your key is a mutable Text object, so i can see some possibility
 depending on how hadoop uses these objects.

Yes, that's it exactly.  We recently fixed a bug in the demo
word_count program for this. Now we do
ByteBuffer.wrap(Arrays.copyOf(text.getBytes(), text.getLength())).
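
A minimal sketch of the hazard, with a hypothetical MutableKey class standing in for Hadoop's org.apache.hadoop.io.Text (which reuses its backing buffer between records):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hadoop reuses its key object between calls, so wrapping the backing array
// without copying means an already-queued key silently changes when the next
// record is read. Copying first (the word_count fix) keeps the value stable.
public class TextReuseDemo {
    // Stand-in for Text: grows its buffer only when needed, otherwise reuses it.
    static class MutableKey {
        private byte[] bytes = new byte[0];
        private int length;
        void set(String s) {
            byte[] b = s.getBytes();
            if (bytes.length < b.length) bytes = new byte[b.length];
            System.arraycopy(b, 0, bytes, 0, b.length);
            length = b.length;
        }
        byte[] getBytes() { return bytes; }
        int getLength() { return length; }
    }

    public static void main(String[] args) {
        MutableKey text = new MutableKey();
        text.set("key1");
        ByteBuffer shared = ByteBuffer.wrap(text.getBytes());      // aliases the buffer
        ByteBuffer copied = ByteBuffer.wrap(
                Arrays.copyOf(text.getBytes(), text.getLength())); // defensive copy
        text.set("key2");                          // framework reuses the object
        System.out.println(new String(shared.array())); // corrupted: now "key2"
        System.out.println(new String(copied.array())); // still "key1"
    }
}
```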

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem

I was moving a node and at some point it started streaming data to 2 other
nodes. Later, that node keeled over and let's assume I can't fix it for the
next 3 days and just want to move tokens on the remaining three to even out
and see if I can live with it.

But I can't do that! The node that was on the receiving end of the stream
refuses to move, because it's still receiving.

What do I do?

Maxim



Re: Files not deleted after compaction and GCed

2011-01-26 Thread Jonathan Ellis
Thanks for tracking that down!  Created
https://issues.apache.org/jira/browse/CASSANDRA-2059 to fix.

On Wed, Jan 26, 2011 at 8:17 AM, Ching-Cheng Chen
cc...@evidentsoftware.com wrote:
 It's a bug.
 In SSTableDeletingReference, it try this operation
 components.remove(Component.DATA);
 before
 STable.delete(desc, components);
 However, the components was reference to the components object which was
 created inside SSTable by
 this.components = Collections.unmodifiableSet(dataComponents);
 As you can see, you can't try the remove operation on that componets object.
 If I add a try block and output exception around the
 components.remove(Component.DATA), I got this.
 java.lang.UnsupportedOperationException
         at java.util.Collections$UnmodifiableCollection.remove(Unknown
 Source)
         at
 org.apache.cassandra.io.sstable.SSTableDeletingReference$CleanupTask.run(SSTableDeletingReference.java:103)
         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
 Source)
         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
         at java.util.concurrent.FutureTask.run(Unknown Source)
         at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
 Source)
         at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
         at
 org.apache.cassandra.concurrent.RetryingScheduledThreadPoolExecutor$LoggingScheduledFuture.run(RetryingScheduledThreadPoolExecutor.java:81)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)
         at java.lang.Thread.run(Unknown Source)
 Regards,
 Chen
 On Tue, Jan 25, 2011 at 4:21 PM, Jonathan Ellis jbel...@gmail.com wrote:

 the other component types are deleted by this line:

            SSTable.delete(desc, components);

 On Tue, Jan 25, 2011 at 3:11 PM, Ching-Cheng Chen
 cc...@evidentsoftware.com wrote:
  Nope, no exception at all.
  But if the same class
  (org.apache.cassandra.io.sstable.SSTableDeletingReference) is
  responsible
  for delete other files, then that's not right.
  I checked the source code for SSTableDeletingReference, doesn't looks
  like
  it will delete other files type.
  Regards,
  Chen
 
  On Tue, Jan 25, 2011 at 4:05 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  No, that is not expected.  All the sstable components are removed in
  the same method; did you check the log for exceptions?
 
  On Tue, Jan 25, 2011 at 2:58 PM, Ching-Cheng Chen
  cc...@evidentsoftware.com wrote:
   Using cassandra 0.7.0
   The class org.apache.cassandra.io.sstable.SSTableDeletingReference
   only
   remove the -Data.db file, but leave the xxx-Compacted,
   xxx-Filter.db,
   xxx-Index.db and xxx-Statistics.db intact.
   And that's the behavior I saw.    I ran manual compact then trigger a
   GC
   from jconsole.   The Data.db file got removed but not the others.
   Is this the expected behavior?
   Regards,
   Chen
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Files not deleted after compaction and GCed

2011-01-26 Thread Jonathan Ellis
Patch submitted.

One thing I still don't understand is why
RetryingScheduledThreadPoolExecutor isn't firing the
DefaultUncaughtExceptionHandler, which should have logged that
exception.

On Wed, Jan 26, 2011 at 9:41 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Thanks for tracking that down!  Created
 https://issues.apache.org/jira/browse/CASSANDRA-2059 to fix.

 On Wed, Jan 26, 2011 at 8:17 AM, Ching-Cheng Chen
 cc...@evidentsoftware.com wrote:
 It's a bug.
 In SSTableDeletingReference, it try this operation
 components.remove(Component.DATA);
 before
 STable.delete(desc, components);
 However, the components was reference to the components object which was
 created inside SSTable by
 this.components = Collections.unmodifiableSet(dataComponents);
 As you can see, you can't try the remove operation on that componets object.
 If I add a try block and output exception around the
 components.remove(Component.DATA), I got this.
 java.lang.UnsupportedOperationException
         at java.util.Collections$UnmodifiableCollection.remove(Unknown
 Source)
         at
 org.apache.cassandra.io.sstable.SSTableDeletingReference$CleanupTask.run(SSTableDeletingReference.java:103)
         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
 Source)
         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
         at java.util.concurrent.FutureTask.run(Unknown Source)
         at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
 Source)
         at
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
         at
 org.apache.cassandra.concurrent.RetryingScheduledThreadPoolExecutor$LoggingScheduledFuture.run(RetryingScheduledThreadPoolExecutor.java:81)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)
         at java.lang.Thread.run(Unknown Source)
 Regards,
 Chen
 On Tue, Jan 25, 2011 at 4:21 PM, Jonathan Ellis jbel...@gmail.com wrote:

 the other component types are deleted by this line:

            SSTable.delete(desc, components);

 On Tue, Jan 25, 2011 at 3:11 PM, Ching-Cheng Chen
 cc...@evidentsoftware.com wrote:
  Nope, no exception at all.
  But if the same class
  (org.apache.cassandra.io.sstable.SSTableDeletingReference) is
  responsible
  for delete other files, then that's not right.
  I checked the source code for SSTableDeletingReference, doesn't looks
  like
  it will delete other files type.
  Regards,
  Chen
 
  On Tue, Jan 25, 2011 at 4:05 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
 
  No, that is not expected.  All the sstable components are removed in
  the same method; did you check the log for exceptions?
 
  On Tue, Jan 25, 2011 at 2:58 PM, Ching-Cheng Chen
  cc...@evidentsoftware.com wrote:
   Using cassandra 0.7.0
   The class org.apache.cassandra.io.sstable.SSTableDeletingReference
   only
   remove the -Data.db file, but leave the xxx-Compacted,
   xxx-Filter.db,
   xxx-Index.db and xxx-Statistics.db intact.
   And that's the behavior I saw.    I ran manual compact then trigger a
   GC
   from jconsole.   The Data.db file got removed but not the others.
   Is this the expected behavior?
   Regards,
   Chen
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Files not deleted after compaction and GCed

2011-01-26 Thread Ching-Cheng Chen
I think this might be what's happening.

Since you are using ScheduledThreadPoolExecutor.schedule(), the exception
was swallowed by the FutureTask.

You have to call get() on the ScheduledFuture; it will throw an
ExecutionException if any exception occurred in run().
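
A minimal sketch of that behavior (plain java.util.concurrent, nothing Cassandra-specific): the scheduled task's exception never reaches an uncaught-exception handler, and only surfaces, wrapped, when get() is called on the returned future:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A task submitted via schedule() runs inside a FutureTask, which captures any
// thrown exception instead of letting it propagate to the worker thread.
public class SwallowedExceptionDemo {
    public static void main(String[] args) throws Exception {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        ScheduledFuture<?> future = executor.schedule(new Runnable() {
            public void run() {
                throw new UnsupportedOperationException("boom");
            }
        }, 10, TimeUnit.MILLISECONDS);
        try {
            future.get(); // rethrows the task's exception, wrapped
        } catch (ExecutionException e) {
            System.out.println("cause: " + e.getCause().getClass().getSimpleName());
        }
        executor.shutdown();
    }
}
```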

Regards,

Chen

On Wed, Jan 26, 2011 at 10:50 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Patch submitted.

 One thing I still don't understand is why
 RetryingScheduledThreadPoolExecutor isn't firing the
 DefaultUncaughtExceptionHandler, which should have logged that
 exception.

 On Wed, Jan 26, 2011 at 9:41 AM, Jonathan Ellis jbel...@gmail.com wrote:
  Thanks for tracking that down!  Created
  https://issues.apache.org/jira/browse/CASSANDRA-2059 to fix.
 
  On Wed, Jan 26, 2011 at 8:17 AM, Ching-Cheng Chen
  cc...@evidentsoftware.com wrote:
  It's a bug.
  In SSTableDeletingReference, it try this operation
  components.remove(Component.DATA);
  before
  STable.delete(desc, components);
  However, the components was reference to the components object which was
  created inside SSTable by
  this.components = Collections.unmodifiableSet(dataComponents);
  As you can see, you can't try the remove operation on that componets
 object.
  If I add a try block and output exception around the
  components.remove(Component.DATA), I got this.
  java.lang.UnsupportedOperationException
  at java.util.Collections$UnmodifiableCollection.remove(Unknown
  Source)
  at
 
 org.apache.cassandra.io.sstable.SSTableDeletingReference$CleanupTask.run(SSTableDeletingReference.java:103)
  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
  Source)
  at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
  at java.util.concurrent.FutureTask.run(Unknown Source)
  at
 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown
  Source)
  at
 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
  Source)
  at
 
 org.apache.cassandra.concurrent.RetryingScheduledThreadPoolExecutor$LoggingScheduledFuture.run(RetryingScheduledThreadPoolExecutor.java:81)
  at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
  Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
  Source)
  at java.lang.Thread.run(Unknown Source)
  Regards,
  Chen
  On Tue, Jan 25, 2011 at 4:21 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  the other component types are deleted by this line:
 
 SSTable.delete(desc, components);
 
  On Tue, Jan 25, 2011 at 3:11 PM, Ching-Cheng Chen
  cc...@evidentsoftware.com wrote:
   Nope, no exception at all.
   But if the same class
   (org.apache.cassandra.io.sstable.SSTableDeletingReference) is
   responsible
   for delete other files, then that's not right.
   I checked the source code for SSTableDeletingReference, doesn't looks
   like
   it will delete other files type.
   Regards,
   Chen
  
   On Tue, Jan 25, 2011 at 4:05 PM, Jonathan Ellis jbel...@gmail.com
   wrote:
  
   No, that is not expected.  All the sstable components are removed in
   the same method; did you check the log for exceptions?
  
   On Tue, Jan 25, 2011 at 2:58 PM, Ching-Cheng Chen
   cc...@evidentsoftware.com wrote:
Using cassandra 0.7.0
The class org.apache.cassandra.io.sstable.SSTableDeletingReference
only
remove the -Data.db file, but leave the xxx-Compacted,
xxx-Filter.db,
xxx-Index.db and xxx-Statistics.db intact.
And that's the behavior I saw.I ran manual compact then
 trigger a
GC
from jconsole.   The Data.db file got removed but not the others.
Is this the expected behavior?
Regards,
Chen
  
  
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder of DataStax, the source for professional Cassandra
 support
   http://www.datastax.com
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Stress test inconsistencies

2011-01-26 Thread Oleg Proudnikov
Hi All,

I was able to run contrib/stress at a very impressive throughput. A
single-threaded client was able to pump 2,000 inserts per second with 0.4 ms
latency; a multithreaded client, 7,000 inserts per second with 7 ms latency.

Thank you very much for your help!

Oleg




Re: Stress test inconsistencies

2011-01-26 Thread Jonathan Shook
Would you share with us the changes you made, or problems you found?

On Wed, Jan 26, 2011 at 10:41 AM, Oleg Proudnikov ol...@cloudorange.com wrote:
 Hi All,

 I was able to run contrib/stress at a very impressive throughput. Single
 threaded client was able to pump 2,000 inserts per second with 0.4 ms latency.
 Multithreaded client was able to pump 7,000 inserts per second with 7ms 
 latency.

 Thank you very much for your help!

 Oleg





RE: Repair on single CF not working (0.7)

2011-01-26 Thread Dan Hendry
After some bad experiences in the past using non-release versions, I am a
little hesitant. Which nodes would the new code have to be deployed to in
order to test? If it is just one of the three, I might be willing if I need
to repair again.

 

Dan

 

From: Brandon Williams [mailto:dri...@gmail.com] 
Sent: January-24-11 19:19
To: user@cassandra.apache.org
Subject: Re: Repair on single CF not working (0.7)

 

 

On Mon, Jan 24, 2011 at 4:15 PM, Dan Hendry dan.hendry.j...@gmail.com
wrote:

I am trying to repair a single CF using nodetool. It seems like the request
to limit the repair to one CF is not being respected. Here is my current
situation:

-  Run nodetool repair  KEYSPACE   CF_A on node 3

-  Validation compaction runs on nodes 2,3,4 for CF_A only
(expected)

-  Node 3 streams SSTables from CF_A only to nodes 2 and 4
(expected)

-  Nodes 2 and 4 stream SSTables from ALL column families in the
keyspace to node 2 (VERY unexpected)

-  Node 2 runs out of disk space before SSTable rebuild for all cfs
can complete.

 

Presumably this is a bug (?). This is the first time in quite a while that I
have run a repair (I don't perform deletes, just use expiring columns). I
have included pertinent log entries below:

 

 

Log entry from node 3 after running out of disk space:

 

INFO [Thread-3462] 2011-01-24 16:00:40,658 StreamInSession.java (line 124)
Streaming of file
/var/lib/cassandra/data/kikmetrics/UserEventsByEvent-e-1313-Data.db/(0,28295156514)
 progress=15267921920/28295156514 - 53% from
org.apache.cassandra.streaming.StreamInSession@8df7e0c failed: requesting a
retry.

INFO [Thread-3423] 2011-01-24 16:00:40,658 StreamInSession.java (line 124)
Streaming of file
/var/lib/cassandra/data/kikmetrics/UserEventsByEvent-e-1348-Data.db/(29530620905,58066315307)
 progress=18898300928/28535694402 - 66% from
org.apache.cassandra.streaming.StreamInSession@8df7e0c failed: requesting a
retry.

 

Can you test this against the 0.7 branch?  There were some bugs fixed in
StreamInSession that may be related
(https://issues.apache.org/jira/browse/CASSANDRA-1992)

 

-Brandon




Re: Stress test inconsistencies

2011-01-26 Thread Oleg Proudnikov
I returned to periodic commit log fsync.


Jonathan Shook jshook at gmail.com writes:

 
 Would you share with us the changes you made, or problems you found?
 




Problems with Set on Bytes Type - New Installation

2011-01-26 Thread David Quattlebaum
I have set up a new installation of Cassandra (0.7.0) and have it running
with no problems.

 

Using CLI I added a new keyspace, and column family.

 

When I set a value for a column I get "Value inserted".

 

However, when I get the column value back it is a number, even though the
column family is of BytesType:

Keyspace: XXX:

  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy

Replication Factor: 1

  Column Families:

ColumnFamily: Y

  Columns sorted by: org.apache.cassandra.db.marshal.BytesType

  Row cache size / save period: 0.0/0

  Key cache size / save period: 20.0/3600

  Memtable thresholds: 0.0703125/15/60

  GC grace seconds: 864000

  Compaction min/max thresholds: 4/32

  Read repair chance: 1.0

  Built indexes: []

 

 

Anyone else had this happen?

Did I just miss something stupid?

I have not had any issues with earlier versions of Cassandra.

 

 

David Q

 

 

David Quattlebaum

MedProcure, LLC

www.medprocure.com

(864)482-2018 - Support

(864)482-2019 - Direct

 



My new nemesis: EOFException (0.7.0)

2011-01-26 Thread Dan Hendry
I am having yet another issue on one of my Cassandra nodes. Last night, one
of my nodes ran out of memory and crashed after flooding the logs with the
same type of errors I am seeing below. After restarting, they are popping up
again. My workaround has been to drop the consistency from ALL to ONE for the
query which seems to be causing this problem, so my service using Cassandra
starts working again, but it's a terrible solution at best. Is there any
thought as to what the root cause of this issue is, or on how to fix it?

 

These errors seem to pop up when reading from the same column family I have
been having other problems with. Recently, I drained the node and shut down,
deleted all on-disk files for this column family, then ran a repair (which
caused the node to run out of disk space, as I have detailed in a previous
email). Could the repair somehow have corrupted the node's new data? Why does
this error appear on only one node, given the data is now nearly guaranteed
to have come from another replica? I have triple-checked that all nodes are
running the same release version of 0.7.

 

Are there any suggestions for tools to check over a system's hardware (Ubuntu
10.04)? SMART info for the disk shows nothing alarming and there is nothing
in /var/log/messages.

 

 

ERROR [ReadStage:10] 2011-01-26 12:15:59,607 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.EOFException
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
    at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
    at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
    at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
    at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
    at org.apache.cassandra.db.Table.getRow(Table.java:384)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
    at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
    ... 22 more

ERROR [ReadStage:10] 2011-01-26 12:15:59,608 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[ReadStage:10,5,main]
java.lang.RuntimeException: java.io.EOFException
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at

RE: Probelms with Set on Byte type New Installation

2011-01-26 Thread Bill Speirs
I'm very new (2 days) to Cassandra, but what does the output look like?

Total shot in the dark: if the number is less than 256, would it not
look the same as bytes or as a number?

Hope that in some way helps...

Bill-

From: David Quattlebaum [mailto:dquat...@medprocure.com]
Sent: Wednesday, January 26, 2011 3:25 PM
To: user@cassandra.apache.org
Subject: Probelms with Set on Byte type New Installation

I have set up a new installation of Cassandra, and have it running
with no problems (0.7.0)

Using CLI I added a new keyspace, and column family.

When I set a value for a column I get “Value Inserted”

However, when I get the column value it is a number, even though the
Column Family is of Bytes Type:
Keyspace: XXX:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Factor: 1
  Column Families:
ColumnFamily: Y
  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/3600
  Memtable thresholds: 0.0703125/15/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: []


Anyone else had this happen?
Did I just miss something stupid?
I have not had any issues with earlier versions of Cassandra.


David Q


David Quattlebaum
MedProcure, LLC
www.medprocure.com
(864)482-2018 - Support
(864)482-2019 - Direct


RE: Probelms with Set on Byte type New Installation

2011-01-26 Thread David Quattlebaum
Nope, I should be getting back the String values that were inserted:

[default@TestKeyspace] get custparent['David'];
=> (column=4164647265737331, value=333038204279205061737320313233, timestamp=1296071732281000)
=> (column=43697479, value=53656e656361, timestamp=129607174731)
=> (column=4e616d65, value=546f6d7320466163696c697479, timestamp=1296071708189000)
=> (column=506f7374616c436f6465, value=3239363738, timestamp=1296071774549000)
=> (column=537461746550726f76, value=5343, timestamp=1296071760213000)
Returned 5 results.

Values should be Name and Address Values.

-David Q

-Original Message-
From: Bill Speirs [mailto:bill.spe...@gmail.com] 
Sent: Wednesday, January 26, 2011 3:45 PM
To: user@cassandra.apache.org
Subject: RE: Probelms with Set on Byte type New Installation

I'm very (2 days) new to Cassandra, but what does the output look like?

Total shot in the dark, if the number is less than 256 would it not
look the same as bytes or a number?

Hope that in some way helps...

Bill-

From: David Quattlebaum [mailto:dquat...@medprocure.com]
Sent: Wednesday, January 26, 2011 3:25 PM
To: user@cassandra.apache.org
Subject: Probelms with Set on Byte type New Installation

I have set up a new installation of Cassandra, and have it running
with no problems (0.7.0)

Using CLI I added a new keyspace, and column family.

When I set a value for a column I get Value Inserted

However, when I get the column value it is a number, even though the
Column Family is of Bytes Type:
Keyspace: XXX:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Factor: 1
  Column Families:
ColumnFamily: Y
  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/3600
  Memtable thresholds: 0.0703125/15/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: []


Anyone else had this happen?
Did I just miss something stupid?
I have not had any issues with earlier versions of Cassandra.


David Q


David Quattlebaum
MedProcure, LLC
www.medprocure.com
(864)482-2018 - Support
(864)482-2019 - Direct


Schema Design

2011-01-26 Thread Bill Speirs
I'm looking to use Cassandra to store log messages from various
systems. A log message only has a message (UTF8Type) and a date/time.
My thought is to create a column family for each system. The row key
will be a TimeUUIDType. Each row will have 7 columns: year, month,
day, hour, minute, second, and message. I then have indexes set up for
each of the date/time columns.

I was hoping this would allow me to answer queries like: what are all
the log messages that were generated between X and Y? The problem is
that I can ONLY use the equals operator on these column values. For
example, issuing: get system_x where month > 1; gives me this
error: No indexed columns present in index clause with operator EQ.
The equals operator works as expected though: get system_x where month
= 1;

What schema would allow me to get date ranges?

Thanks in advance...

Bill-

* ColumnFamily description *
ColumnFamily: system_x_msg
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/3600
  Memtable thresholds: 1.1671875/249/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
proj_1_msg.7365636f6e64, proj_1_msg.79656172]
  Column Metadata:
Column Name: year (year)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: month (month)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: second (second)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: minute (minute)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: hour (hour)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: day (day)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS


Re: Probelms with Set on Byte type New Installation

2011-01-26 Thread Bill Speirs
Why would you expect strings? You stated that your comparator is
BytesType. If you set the default_validation_class then you can
specify what types the values should be returned as:

[default@Devel] create column family david with comparator=BytesType
and default_validation_class=UTF8Type;
2dabf0fb-298f-11e0-b177-e700f669bcfc
[default@Devel] set david['david']['test'] = 'test';
Value inserted.
[default@Devel] get david['david'];
=> (column=74657374, value=test, timestamp=129607562467)
Returned 1 results.

Now the column name is returned as the ASCII characters for 'test' and
the value is returned as a string because of the
default_validation_class. That seems to make sense to me. If in the
next column you want to store a number, you must store it as such and
return it as such:

[default@Devel] set david['david']['id'] = 37;
Value inserted.
[default@Devel] get david['david']['id'] as integer;
=> (column=6964, value=13111, timestamp=129607582033)

That didn't work, because it was inserted as a string, not a
number/integer. However, if you specify that it's a number on the way
in, it will return properly:

[default@Devel] set david['david']['id'] = integer(37);
Value inserted.
[default@Devel] get david['david']['id'] as integer;
=> (column=6964, value=37, timestamp=1296075929082000)
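To make the byte-level behavior concrete, here is a plain-Java sketch (no Cassandra code at all; the class and helper names are mine, purely for illustration) of roughly what the CLI is doing: BytesType column names display as hex, and "as integer" reinterprets the raw bytes as a big-endian integer. That is exactly why the string '37' came back as 13111 above: its two ASCII bytes are 0x33 0x37, and 0x3337 = 13111.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class BytesDemo {
    // Hex-encode a byte array, the way the CLI displays BytesType column names.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    // Reinterpret raw bytes as a big-endian integer, roughly what "as integer" does.
    static BigInteger asInteger(byte[] bytes) {
        return new BigInteger(1, bytes);
    }

    public static void main(String[] args) {
        byte[] test = "test".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(test));            // 74657374

        byte[] thirtySeven = "37".getBytes(StandardCharsets.US_ASCII);
        // '3' = 0x33, '7' = 0x37, so the bytes read as 0x3337
        System.out.println(asInteger(thirtySeven)); // 13111
    }
}
```

The same decoding explains David's earlier output: 4164647265737331 is just the hex of the ASCII string "Address1".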

Hope that helps... again, I'm new to this so maybe I'm not
understanding your question.

Bill-

On Wed, Jan 26, 2011 at 3:50 PM, David Quattlebaum
dquat...@medprocure.com wrote:
 Nope, I should be getting back the String values that were inserted:

 [default@TestKeyspace] get custparent['David'];
 = (column=4164647265737331, value=333038204279205061737320313233,
 timestamp=1296071732281000)
 = (column=43697479, value=53656e656361, timestamp=129607174731)
 = (column=4e616d65, value=546f6d7320466163696c697479,
 timestamp=1296071708189000)
 = (column=506f7374616c436f6465, value=3239363738,
 timestamp=1296071774549000)
 = (column=537461746550726f76, value=5343, timestamp=1296071760213000)
 Returned 5 results.

 Values should be Name and Address Values.

 -David Q

 -Original Message-
 From: Bill Speirs [mailto:bill.spe...@gmail.com]
 Sent: Wednesday, January 26, 2011 3:45 PM
 To: user@cassandra.apache.org
 Subject: RE: Probelms with Set on Byte type New Installation

 I'm very (2 days) new to Cassandra, but what does the output look like?

 Total shot in the dark, if the number is less than 256 would it not
 look the same as bytes or a number?

 Hope that in some way helps...

 Bill-

 From: David Quattlebaum [mailto:dquat...@medprocure.com]
 Sent: Wednesday, January 26, 2011 3:25 PM
 To: user@cassandra.apache.org
 Subject: Probelms with Set on Byte type New Installation

 I have set up a new installation of Cassandra, and have it running
 with no problems (0.7.0)

 Using CLI I added a new keyspace, and column family.

 When I set a value for a column I get Value Inserted

 However, when I get the column value it is a number, even though the
 Column Family is of Bytes Type:
 Keyspace: XXX:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: Y
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period: 0.0/0
      Key cache size / save period: 20.0/3600
      Memtable thresholds: 0.0703125/15/60
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: []


 Anyone else had this happen?
 Did I just miss something stupid?
 I have not had any issues with earlier versions of Cassandra.


 David Q


 David Quattlebaum
 MedProcure, LLC
 www.medprocure.com
 (864)482-2018 - Support
 (864)482-2019 - Direct



Re: Schema Design

2011-01-26 Thread David McNelis
I would say in that case you might want to try a single column family
where the row key is the system name.

Then, you could name your columns with the timestamp. When retrieving
information from the data store you can then, in your slice request,
specify your start column as X and your end column as Y.

Then you can use the stored column name to know when an event occurred.
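The semantics of that slice can be sketched in plain Java (this is not the Thrift/Hector API, just a TreeMap standing in for one row's sorted columns; all names here are mine): with timestamps as column names under a numeric comparator, a slice from X to Y is simply a sorted-range lookup, and X and Y do not need to be names of columns that actually exist.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class SliceSketch {
    // One row's columns: column name = event timestamp (millis), value = message.
    static final TreeMap<Long, String> row = new TreeMap<>();
    static {
        row.put(1000L, "boot");
        row.put(2000L, "login");
        row.put(3000L, "error");
        row.put(9000L, "shutdown");
    }

    // Simulate a slice request: every column with start <= name <= end.
    static SortedMap<Long, String> slice(long start, long end) {
        return row.subMap(start, true, end, true);
    }

    public static void main(String[] args) {
        // Start/end are not existing column names; the range is still honored.
        System.out.println(slice(1500L, 8000L)); // {2000=login, 3000=error}
    }
}
```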

On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote:

 I'm looking to use Cassandra to store log messages from various
 systems. A log message only has a message (UTF8Type) and a data/time.
 My thought is to create a column family for each system. The row key
 will be a TimeUUIDType. Each row will have 7 columns: year, month,
 day, hour, minute, second, and message. I then have indexes setup for
 each of the date/time columns.

 I was hoping this would allow me to answer queries like: What are all
 the log messages that were generated between X  Y? The problem is
 that I can ONLY use the equals operator on these column values. For
 example, I cannot issuing: get system_x where month  1; gives me this
 error: No indexed columns present in index clause with operator EQ.
 The equals operator works as expected though: get system_x where month
 = 1;

 What schema would allow me to get date ranges?

 Thanks in advance...

 Bill-

 * ColumnFamily description *
ColumnFamily: system_x_msg
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/3600
  Memtable thresholds: 1.1671875/249/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
 proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
 proj_1_msg.7365636f6e64, proj_1_msg.79656172]
  Column Metadata:
Column Name: year (year)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: month (month)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: second (second)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: minute (minute)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: hour (hour)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: day (day)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS




-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*


Re: Schema Design

2011-01-26 Thread buddhasystem

Having separate columns for Year, Month, etc. seems redundant. It's far more
efficient to keep, say, the UTC time in POSIX format (basically an integer).
It's easy to convert back and forth.
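The back-and-forth conversion is a one-liner in Java (a minimal sketch; the class name and date pattern are mine, and I'm pinning the formatter to UTC explicitly so the round trip is unambiguous):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class PosixTime {
    static final SimpleDateFormat FMT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    static { FMT.setTimeZone(TimeZone.getTimeZone("UTC")); }

    // Human-readable UTC date -> POSIX seconds (the single integer you store).
    static long toPosix(String utc) {
        try {
            return FMT.parse(utc).getTime() / 1000L;
        } catch (java.text.ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // POSIX seconds -> human-readable UTC date.
    static String fromPosix(long seconds) {
        return FMT.format(new Date(seconds * 1000L));
    }

    public static void main(String[] args) {
        long t = toPosix("2011-01-26 12:15:59");
        System.out.println(t);
        System.out.println(fromPosix(t)); // round-trips to the same string
    }
}
```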

If you want to get a range of dates, you might use the Order Preserving
Partitioner, and sort out which system logged what on the client side.
Read up on the consequences of using OPP.

Whether to shard data per system depends on how many systems you have. If
more than a few, don't do that; there are memory considerations.

Cheers

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964227.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem

Bump. I still don't know what the best thing to do is; please help.
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964231.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Schema Design

2011-01-26 Thread Bill Speirs
I like this approach, but I have 2 questions:

1) What are the implications of continually adding columns to a single
row? I'm unsure how Cassandra handles growth here. I realize you can have
a virtually unlimited number of columns, but what are the implications
of growing the number of columns over time?

2) Maybe it's just a restriction of the CLI, but how do I issue a
slice request? Also, what if the start (or end) columns don't exist? I'm
guessing it's smart enough to get the columns in that range.

Thanks!

Bill-

On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
dmcne...@agentisenergy.com wrote:
 I would say in that case you might want  to try a  single column family
 where the key to the column is the system name.
 Then, you could name your columns as the timestamp.  Then when retrieving
 information from the data store you can can, in your slice request, specify
 your start column as  X and end  column as Y.
 Then you can use the stored column name to know when an event  occurred.

 On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote:

 I'm looking to use Cassandra to store log messages from various
 systems. A log message only has a message (UTF8Type) and a data/time.
 My thought is to create a column family for each system. The row key
 will be a TimeUUIDType. Each row will have 7 columns: year, month,
 day, hour, minute, second, and message. I then have indexes setup for
 each of the date/time columns.

 I was hoping this would allow me to answer queries like: What are all
 the log messages that were generated between X  Y? The problem is
 that I can ONLY use the equals operator on these column values. For
 example, I cannot issuing: get system_x where month  1; gives me this
 error: No indexed columns present in index clause with operator EQ.
 The equals operator works as expected though: get system_x where month
 = 1;

 What schema would allow me to get date ranges?

 Thanks in advance...

 Bill-

 * ColumnFamily description *
    ColumnFamily: system_x_msg
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 0.0/0
      Key cache size / save period: 20.0/3600
      Memtable thresholds: 1.1671875/249/60
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
 proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
 proj_1_msg.7365636f6e64, proj_1_msg.79656172]
      Column Metadata:
        Column Name: year (year)
          Validation Class: org.apache.cassandra.db.marshal.IntegerType
          Index Type: KEYS
        Column Name: month (month)
          Validation Class: org.apache.cassandra.db.marshal.IntegerType
          Index Type: KEYS
        Column Name: second (second)
          Validation Class: org.apache.cassandra.db.marshal.IntegerType
          Index Type: KEYS
        Column Name: minute (minute)
          Validation Class: org.apache.cassandra.db.marshal.IntegerType
          Index Type: KEYS
        Column Name: hour (hour)
          Validation Class: org.apache.cassandra.db.marshal.IntegerType
          Index Type: KEYS
        Column Name: day (day)
          Validation Class: org.apache.cassandra.db.marshal.IntegerType
          Index Type: KEYS



 --
 David McNelis
 Lead Software Engineer
 Agentis Energy
 www.agentisenergy.com
 o: 630.359.6395
 c: 219.384.5143
 A Smart Grid technology company focused on helping consumers of energy
 control an often under-managed resource.




RE: Probelms with Set on Byte type New Installation

2011-01-26 Thread David Quattlebaum
Bill, 

You are absolutely correct; I must not have set the default_validation_class
when I added the column family.

That's what I get for continuing to work late into the night!

Thanks,

DQ Less stupid next time

-Original Message-
From: Bill Speirs [mailto:bill.spe...@gmail.com] 
Sent: Wednesday, January 26, 2011 4:09 PM
To: user@cassandra.apache.org
Subject: Re: Probelms with Set on Byte type New Installation

Why would you expect strings? You stated that your comparator is
BytesType. If you set the default_validation_class then you can
specify what types the values should be returned as:

[default@Devel] create column family david with comparator=BytesType
and default_validation_class=UTF8Type;
2dabf0fb-298f-11e0-b177-e700f669bcfc
[default@Devel] set david['david']['test'] = 'test';
Value inserted.
[default@Devel] get david['david'];
= (column=74657374, value=test, timestamp=129607562467)
Returned 1 results.

Now the column name is returned as the ASCII characters for 'test' and
the value is returned as a string because of the
default_validation_class. That seems to make sense to me. If in the
next column you want to sore a number you must store it as such and
return it as such:

[default@Devel] set david['david']['id'] = 37;
Value inserted.
[default@Devel] get david['david']['id'] as integer;
= (column=6964, value=13111, timestamp=129607582033)

Didn't work because it was inserted as a string not a number/integer.
However, if you specify it's a number on the way it, it will return
properly:

[default@Devel] set david['david']['id'] = integer(37);
Value inserted.
[default@Devel] get david['david']['id'] as integer;
= (column=6964, value=37, timestamp=1296075929082000)

Hope that helps... again, I'm new to this so maybe I'm not
understanding your question.

Bill-

On Wed, Jan 26, 2011 at 3:50 PM, David Quattlebaum
dquat...@medprocure.com wrote:
 Nope, I should be getting back the String values that were inserted:

 [default@TestKeyspace] get custparent['David'];
 = (column=4164647265737331, value=333038204279205061737320313233,
 timestamp=1296071732281000)
 = (column=43697479, value=53656e656361, timestamp=129607174731)
 = (column=4e616d65, value=546f6d7320466163696c697479,
 timestamp=1296071708189000)
 = (column=506f7374616c436f6465, value=3239363738,
 timestamp=1296071774549000)
 = (column=537461746550726f76, value=5343, timestamp=1296071760213000)
 Returned 5 results.

 Values should be Name and Address Values.

 -David Q

 -Original Message-
 From: Bill Speirs [mailto:bill.spe...@gmail.com]
 Sent: Wednesday, January 26, 2011 3:45 PM
 To: user@cassandra.apache.org
 Subject: RE: Probelms with Set on Byte type New Installation

 I'm very (2 days) new to Cassandra, but what does the output look like?

 Total shot in the dark, if the number is less than 256 would it not
 look the same as bytes or a number?

 Hope that in some way helps...

 Bill-

 From: David Quattlebaum [mailto:dquat...@medprocure.com]
 Sent: Wednesday, January 26, 2011 3:25 PM
 To: user@cassandra.apache.org
 Subject: Probelms with Set on Byte type New Installation

 I have set up a new installation of Cassandra, and have it running
 with no problems (0.7.0)

 Using CLI I added a new keyspace, and column family.

 When I set a value for a column I get Value Inserted

 However, when I get the column value it is a number, even though the
 Column Family is of Bytes Type:
 Keyspace: XXX:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: Y
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period: 0.0/0
      Key cache size / save period: 20.0/3600
      Memtable thresholds: 0.0703125/15/60
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: []


 Anyone else had this happen?
 Did I just miss something stupid?
 I have not had any issues with earlier versions of Cassandra.


 David Q


 David Quattlebaum
 MedProcure, LLC
 www.medprocure.com
 (864)482-2018 - Support
 (864)482-2019 - Direct



Re: Schema Design

2011-01-26 Thread Bill Speirs
I have a basic understanding of OPP... if most of my messages come
within a single hour, then a few nodes could be storing all of my
values, right?

You totally lost me on whether to shard data as per system... Is my
schema (one column family per system, and row keys as TimeUUIDType)
sharding by system? I thought -- probably incorrectly -- that the row
keys are used in the sharding process, not the column families.

Thanks...

Bill-

On Wed, Jan 26, 2011 at 4:17 PM, buddhasystem potek...@bnl.gov wrote:

 Having separate columns for Year, Month etc seems redundant. It's tons more
 efficient to keep say UTC time in POSIX format (basically integer). It's
 easy to convert back and forth.

 If you want to get a range of dates, in that case you might use Order
 Preserving Partitioner, and sort out which systems logged later in client.
 Read up on consequences of using OPP.

 Whether to shard data as per system depends on how many you have. If more
 than a few, don't do that, there are memory considerations.

 Cheers

 Maxim

 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964227.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Probelms with Set on Byte type New Installation

2011-01-26 Thread Bill Speirs
No worries... it forced me to setup an env to test my understanding.
I'm still trying to learn/understand.

Bill-

On Wed, Jan 26, 2011 at 4:23 PM, David Quattlebaum
dquat...@medprocure.com wrote:
 Bill,

 You are absolutely correct, I must not have set the default_validation_class 
 when I added the column family.

 Thanks what I get for continuing to work late into the night!

 Thanks,

 DQ     Less stupid next time

 -Original Message-
 From: Bill Speirs [mailto:bill.spe...@gmail.com]
 Sent: Wednesday, January 26, 2011 4:09 PM
 To: user@cassandra.apache.org
 Subject: Re: Probelms with Set on Byte type New Installation

 Why would you expect strings? You stated that your comparator is
 BytesType. If you set the default_validation_class then you can
 specify what types the values should be returned as:

 [default@Devel] create column family david with comparator=BytesType
 and default_validation_class=UTF8Type;
 2dabf0fb-298f-11e0-b177-e700f669bcfc
 [default@Devel] set david['david']['test'] = 'test';
 Value inserted.
 [default@Devel] get david['david'];
 = (column=74657374, value=test, timestamp=129607562467)
 Returned 1 results.

 Now the column name is returned as the ASCII characters for 'test' and
 the value is returned as a string because of the
 default_validation_class. That seems to make sense to me. If in the
 next column you want to sore a number you must store it as such and
 return it as such:

 [default@Devel] set david['david']['id'] = 37;
 Value inserted.
 [default@Devel] get david['david']['id'] as integer;
 = (column=6964, value=13111, timestamp=129607582033)

 Didn't work because it was inserted as a string not a number/integer.
 However, if you specify it's a number on the way it, it will return
 properly:

 [default@Devel] set david['david']['id'] = integer(37);
 Value inserted.
 [default@Devel] get david['david']['id'] as integer;
 = (column=6964, value=37, timestamp=1296075929082000)

 Hope that helps... again, I'm new to this so maybe I'm not
 understanding your question.

 Bill-

 On Wed, Jan 26, 2011 at 3:50 PM, David Quattlebaum
 dquat...@medprocure.com wrote:
 Nope, I should be getting back the String values that were inserted:

 [default@TestKeyspace] get custparent['David'];
 = (column=4164647265737331, value=333038204279205061737320313233,
 timestamp=1296071732281000)
 = (column=43697479, value=53656e656361, timestamp=129607174731)
 = (column=4e616d65, value=546f6d7320466163696c697479,
 timestamp=1296071708189000)
 = (column=506f7374616c436f6465, value=3239363738,
 timestamp=1296071774549000)
 = (column=537461746550726f76, value=5343, timestamp=1296071760213000)
 Returned 5 results.

 Values should be Name and Address Values.

 -David Q

 -Original Message-
 From: Bill Speirs [mailto:bill.spe...@gmail.com]
 Sent: Wednesday, January 26, 2011 3:45 PM
 To: user@cassandra.apache.org
 Subject: RE: Probelms with Set on Byte type New Installation

 I'm very (2 days) new to Cassandra, but what does the output look like?

 Total shot in the dark, if the number is less than 256 would it not
 look the same as bytes or a number?

 Hope that in some way helps...

 Bill-

 From: David Quattlebaum [mailto:dquat...@medprocure.com]
 Sent: Wednesday, January 26, 2011 3:25 PM
 To: user@cassandra.apache.org
 Subject: Probelms with Set on Byte type New Installation

 I have set up a new installation of Cassandra, and have it running
 with no problems (0.7.0)

 Using CLI I added a new keyspace, and column family.

 When I set a value for a column I get Value Inserted

 However, when I get the column value it is a number, even though the
 Column Family is of Bytes Type:
 Keyspace: XXX:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: Y
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period: 0.0/0
      Key cache size / save period: 20.0/3600
      Memtable thresholds: 0.0703125/15/60
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: []


 Anyone else had this happen?
 Did I just miss something stupid?
 I have not had any issues with earlier versions of Cassandra.


 David Q


 David Quattlebaum
 MedProcure, LLC
 www.medprocure.com
 (864)482-2018 - Support
 (864)482-2019 - Direct




Re: Schema Design

2011-01-26 Thread David McNelis
My CLI knowledge sucks so far, so I'll leave that to others... I'm doing
most of my reading/writing through a Thrift client (Hector/Java based).

As for the implications: as of the latest version of Cassandra there is no
theoretical limit to the number of columns that a particular row can hold.
Over time you've got a couple of different options. If you're concerned
that you'll end up with too many columns to manage, then you'd probably want
to start thinking about a long-term warehousing strategy for your older
records that involves expiring columns older than X from your Cassandra
cluster. But for the most part you shouldn't *need* to do that.
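What "expiring columns older than X" amounts to can be sketched in plain Java (again a TreeMap standing in for one row's timestamp-named columns; the class and method names are mine, and in a real cluster you would issue deletions, or use column TTLs where available, rather than mutate a map):

```java
import java.util.TreeMap;

public class PruneSketch {
    // One row's columns: name = timestamp, value = log record.
    static final TreeMap<Long, String> row = new TreeMap<>();
    static {
        row.put(100L, "old record");
        row.put(200L, "another old record");
        row.put(300L, "recent record");
    }

    // Drop every column whose timestamp-name is strictly older than cutoff;
    // returns how many columns were removed.
    static int pruneOlderThan(long cutoff) {
        int before = row.size();
        row.headMap(cutoff, false).clear(); // headMap view holds keys < cutoff
        return before - row.size();
    }

    public static void main(String[] args) {
        System.out.println(pruneOlderThan(300L)); // 2
        System.out.println(row);                  // {300=recent record}
    }
}
```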



On Wed, Jan 26, 2011 at 3:23 PM, Bill Speirs bill.spe...@gmail.com wrote:

 I like this approach, but I have 2 questions:

 1) what is the implications of continually adding columns to a single
 row? I'm unsure how Cassandra is able to grow. I realize you can have
 a virtually infinite number of columns, but what are the implications
 of growing the number of columns over time?

 2) maybe it's just a restriction of the CLI, but how do I do issue a
 slice request? Also, what if start (or end) columns don't exist? I'm
 guessing it's smart enough to get the columns in that range.

 Thanks!

 Bill-

 On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
 dmcne...@agentisenergy.com wrote:
  I would say in that case you might want to try a single column family
  where the key to the column is the system name.
  Then, you could name your columns as the timestamp. Then when retrieving
  information from the data store you can, in your slice request, specify
  your start column as X and your end column as Y.
  Then you can use the stored column name to know when an event occurred.
 
  On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com
 wrote:
 
  I'm looking to use Cassandra to store log messages from various
  systems. A log message only has a message (UTF8Type) and a date/time.
  My thought is to create a column family for each system. The row key
  will be a TimeUUIDType. Each row will have 7 columns: year, month,
  day, hour, minute, second, and message. I then have indexes setup for
  each of the date/time columns.
 
  I was hoping this would allow me to answer queries like: What are all
  the log messages that were generated between X and Y? The problem is
  that I can ONLY use the equals operator on these column values. For
  example, issuing: get system_x where month > 1; gives me this
  error: No indexed columns present in index clause with operator EQ.
  The equals operator works as expected though: get system_x where month
  = 1;
 
  What schema would allow me to get date ranges?
 
  Thanks in advance...
 
  Bill-
 
  * ColumnFamily description *
 ColumnFamily: system_x_msg
   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
   Row cache size / save period: 0.0/0
   Key cache size / save period: 20.0/3600
   Memtable thresholds: 1.1671875/249/60
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
  proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
  proj_1_msg.7365636f6e64, proj_1_msg.79656172]
   Column Metadata:
 Column Name: year (year)
   Validation Class: org.apache.cassandra.db.marshal.IntegerType
   Index Type: KEYS
 Column Name: month (month)
   Validation Class: org.apache.cassandra.db.marshal.IntegerType
   Index Type: KEYS
 Column Name: second (second)
   Validation Class: org.apache.cassandra.db.marshal.IntegerType
   Index Type: KEYS
 Column Name: minute (minute)
   Validation Class: org.apache.cassandra.db.marshal.IntegerType
   Index Type: KEYS
 Column Name: hour (hour)
   Validation Class: org.apache.cassandra.db.marshal.IntegerType
   Index Type: KEYS
 Column Name: day (day)
   Validation Class: org.apache.cassandra.db.marshal.IntegerType
   Index Type: KEYS
 
 
 
  --
  David McNelis
  Lead Software Engineer
  Agentis Energy
  www.agentisenergy.com
  o: 630.359.6395
  c: 219.384.5143
  A Smart Grid technology company focused on helping consumers of energy
  control an often under-managed resource.
 
 




-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*


Re: Schema Design

2011-01-26 Thread Nick Santini
One thing you can do is create one CF, then as the row key use the
application name + timestamp; with that you can do your range query using
OPP, then store whatever you want in the row.

The problem would be if one app generates far more logs than the others.
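Under an order-preserving partitioner, row keys sort lexicographically, so the timestamp part of such a composite key should be zero-padded. A minimal sketch in plain Java (the key layout and names are illustrative assumptions, not anything Cassandra mandates):

```java
public class OppKeys {
    // Zero-pad the timestamp so lexicographic order (what an
    // order-preserving partitioner compares) matches chronological
    // order within one application.
    static String rowKey(String app, long posixSeconds) {
        return String.format("%s:%013d", app, posixSeconds);
    }

    public static void main(String[] args) {
        String a = rowKey("billing", 1296078185L);
        String b = rowKey("billing", 1296078190L);
        System.out.println(a);                   // billing:0001296078185
        System.out.println(a.compareTo(b) < 0);  // true: later time sorts later
    }
}
```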

Nicolas Santini


On Thu, Jan 27, 2011 at 10:26 AM, Bill Speirs bill.spe...@gmail.com wrote:

 I have a basic understanding of OPP... if most of my messages come
 within a single hour then a few nodes could be storing all of my
 values, right?

 You totally lost me on whether to shard data as per system... Is my
 schema (one column family per system, and row keys as TimeUUIDType)
 sharding by system? I thought -- probably incorrectly -- that the row
 keys are used in the sharding process, not column families.

 Thanks...

 Bill-

 On Wed, Jan 26, 2011 at 4:17 PM, buddhasystem potek...@bnl.gov wrote:
 
  Having separate columns for Year, Month etc. seems redundant. It's tons more
  efficient to keep, say, UTC time in POSIX format (basically an integer). It's
  easy to convert back and forth.
 
  If you want to get a range of dates, in that case you might use Order
  Preserving Partitioner, and sort out which systems logged later in
 client.
  Read up on consequences of using OPP.
 
  Whether to shard data as per system depends on how many you have. If more
  than a few, don't do that, there are memory considerations.
 
  Cheers
 
  Maxim
 
  --
  View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964227.html
  Sent from the cassandra-u...@incubator.apache.org mailing list archive
 at Nabble.com.
 



Re: Schema Design

2011-01-26 Thread buddhasystem

I used the term sharding a bit frivolously. Sorry. It's just that splitting
semantically homogeneous data among CFs doesn't scale too well, as each CF
is allocated a piece of memory on the server.
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964326.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: Schema Design

2011-01-26 Thread Shu Zhang
Each row can have a maximum of 2 billion columns, which a logging system will 
probably hit eventually.

More importantly, you'll only have 1 row per set of system logs. Every row is 
stored on the same machine(s), which means you'll definitely not be able to 
distribute your load very well.

From: Bill Speirs [bill.spe...@gmail.com]
Sent: Wednesday, January 26, 2011 1:23 PM
To: user@cassandra.apache.org
Subject: Re: Schema Design

I like this approach, but I have 2 questions:

1) what are the implications of continually adding columns to a single
row? I'm unsure how Cassandra is able to grow. I realize you can have
a virtually infinite number of columns, but what are the implications
of growing the number of columns over time?

2) maybe it's just a restriction of the CLI, but how do I issue a
slice request? Also, what if start (or end) columns don't exist? I'm
guessing it's smart enough to get the columns in that range.

Thanks!

Bill-

On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
dmcne...@agentisenergy.com wrote:
 I would say in that case you might want to try a single column family
 where the key to the column is the system name.
 Then, you could name your columns as the timestamp. Then when retrieving
 information from the data store you can, in your slice request, specify
 your start column as X and your end column as Y.
 Then you can use the stored column name to know when an event occurred.

 On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote:

 I'm looking to use Cassandra to store log messages from various
 systems. A log message only has a message (UTF8Type) and a date/time.
 My thought is to create a column family for each system. The row key
 will be a TimeUUIDType. Each row will have 7 columns: year, month,
 day, hour, minute, second, and message. I then have indexes setup for
 each of the date/time columns.

 I was hoping this would allow me to answer queries like: What are all
 the log messages that were generated between X and Y? The problem is
 that I can ONLY use the equals operator on these column values. For
 example, issuing: get system_x where month > 1; gives me this
 error: No indexed columns present in index clause with operator EQ.
 The equals operator works as expected though: get system_x where month
 = 1;

 What schema would allow me to get date ranges?

 Thanks in advance...

 Bill-

 * ColumnFamily description *
ColumnFamily: system_x_msg
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/3600
  Memtable thresholds: 1.1671875/249/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
 proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
 proj_1_msg.7365636f6e64, proj_1_msg.79656172]
  Column Metadata:
Column Name: year (year)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: month (month)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: second (second)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: minute (minute)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: hour (hour)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: day (day)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS



 --
 David McNelis
 Lead Software Engineer
 Agentis Energy
 www.agentisenergy.com
 o: 630.359.6395
 c: 219.384.5143
 A Smart Grid technology company focused on helping consumers of energy
 control an often under-managed resource.




Re: Node going down when streaming data, what next?

2011-01-26 Thread Dan Hendry
When this has happened to me, restarting the node you are trying to
move works. I can't remember the exact conditions, but I have also had
to restart all nodes in the cluster simultaneously once or twice as
well.

I would love to know if there is a better way of doing it.

On Wednesday, January 26, 2011, buddhasystem potek...@bnl.gov wrote:

 Bump. I still don't know what the best thing to do is, please help.
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964231.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem

Hello,

from what I know, you don't really have to restart simultaneously,
although of course you don't want to wait.

I finally decided to use the removetoken command to actually scratch out the
sickly node from the cluster. I'll bootstrap it later when it's fixed.


-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964804.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem

Sorry if this sounds silly, but I can't get my brain around this one: if all
nodes contain replicas, why does the cluster stream data every time I move
or remove a token? If the data is already there, what needs to be streamed?

Thanks
Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-does-cassandra-stream-data-when-moving-tokens-tp5964839p5964839.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Schema Design

2011-01-26 Thread William R Speirs
It makes sense that the single row for a system (with a growing number of 
columns) will reside on a single machine.


With that in mind, here is my updated schema:

- A single column family for all the messages. The row keys will be the TimeUUID 
of the message with the following columns: date/time (in UTC POSIX), system 
name/id (with an index for fast/easy gets), the actual message payload.


- A column family for each system. The row keys will be UTC POSIX time with 1 
second (maybe 1 minute) bucketing, and the column names will be the TimeUUID of 
any messages that were logged during that time bucket.
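The bucketing above amounts to rounding a POSIX timestamp down to the start of its bucket, which then serves as the row key. A minimal sketch in plain Java (class/method names and the sample timestamp are illustrative assumptions):

```java
public class BucketKey {
    // Round a POSIX timestamp (in seconds) down to the start of its
    // 1-minute bucket; the result would serve as the row key.
    static long minuteBucket(long posixSeconds) {
        return posixSeconds - (posixSeconds % 60);
    }

    public static void main(String[] args) {
        System.out.println(minuteBucket(1296078185L)); // prints 1296078180
    }
}
```

For 1-second bucketing the timestamp is already the key; switching granularity only changes the modulus.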


My only hesitation with this design is that buddhasystem warned that each column 
family is allocated a piece of memory on the server. I'm not sure what the 
implications of this are and/or if this would be a problem if I had a number 
of systems on the order of hundreds.


Thanks...

Bill-

On 01/26/2011 06:51 PM, Shu Zhang wrote:

Each row can have a maximum of 2 billion columns, which a logging system will 
probably hit eventually.

More importantly, you'll only have 1 row per set of system logs. Every row is 
stored on the same machine(s), which means you'll definitely not be able to 
distribute your load very well.

From: Bill Speirs [bill.spe...@gmail.com]
Sent: Wednesday, January 26, 2011 1:23 PM
To: user@cassandra.apache.org
Subject: Re: Schema Design

I like this approach, but I have 2 questions:

1) what are the implications of continually adding columns to a single
row? I'm unsure how Cassandra is able to grow. I realize you can have
a virtually infinite number of columns, but what are the implications
of growing the number of columns over time?

2) maybe it's just a restriction of the CLI, but how do I issue a
slice request? Also, what if start (or end) columns don't exist? I'm
guessing it's smart enough to get the columns in that range.

Thanks!

Bill-

On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
dmcne...@agentisenergy.com  wrote:

I would say in that case you might want to try a single column family
where the key to the column is the system name.
Then, you could name your columns as the timestamp. Then when retrieving
information from the data store you can, in your slice request, specify
your start column as X and your end column as Y.
Then you can use the stored column name to know when an event occurred.

On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote:


I'm looking to use Cassandra to store log messages from various
systems. A log message only has a message (UTF8Type) and a date/time.
My thought is to create a column family for each system. The row key
will be a TimeUUIDType. Each row will have 7 columns: year, month,
day, hour, minute, second, and message. I then have indexes setup for
each of the date/time columns.

I was hoping this would allow me to answer queries like: What are all
the log messages that were generated between X and Y? The problem is
that I can ONLY use the equals operator on these column values. For
example, issuing: get system_x where month > 1; gives me this
error: No indexed columns present in index clause with operator EQ.
The equals operator works as expected though: get system_x where month
= 1;

What schema would allow me to get date ranges?

Thanks in advance...

Bill-

* ColumnFamily description *
ColumnFamily: system_x_msg
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/3600
  Memtable thresholds: 1.1671875/249/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572,
proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468,
proj_1_msg.7365636f6e64, proj_1_msg.79656172]
  Column Metadata:
Column Name: year (year)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: month (month)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: second (second)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: minute (minute)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: hour (hour)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS
Column Name: day (day)
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Type: KEYS




--
David McNelis
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143
A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.




Re: Schema Design

2011-01-26 Thread William R Speirs

Ah, sweet... thanks for the link!

Bill-

On 01/26/2011 08:20 PM, buddhasystem wrote:


Bill, it's all explained here:

http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size

Watch the number of CFs and the memtable sizes.

In my experience, this all matters.


RE: Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem

Thanks, I'll look at the configuration again.

In the meantime, I can't move the first node in the ring (after I removed
the previous node's token) -- it throws an exception and says data is being
streamed to it -- however, this is not what netstats says! Weirdness
continues...

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-does-cassandra-stream-data-when-moving-tokens-tp5964839p5964883.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


[no subject]

2011-01-26 Thread Geoffry Roberts
-- 
Geoffry Roberts


Re: Schema Design

2011-01-26 Thread Wangpei (Peter)
I am also working on a system storing logs from hundreds of systems.
In my scenario, most queries will look like this: let's look at the login logs (category 
EQ) of that proxy (host EQ) between this Monday and Wednesday (time range).
My data model is like this:
. only 1 CF; that's enough for this scenario.
. group the logs from each host and day into one row. The key format is 
hostname.category.date
. store each log entry as a super column; the super column name is the TimeUUID of the 
log, with each attribute as a column.

Then this query can be done as 3 GETs; no need to do a key range scan.
Then I can use RP instead of OPP. If I use OPP, I have to worry about load 
balancing myself. I hate that.
However, if I need to do a time range access, I can still use a column slice.

An additional benefit is that I can clean old logs very easily. We only store logs 
for 1 year. Just deleting by keys does this job well.

I think storing all logs for a host in a single row is not a good choice, for 2 
reasons:
1, too few keys, so your data will not distribute well.
2, the data under a key will always increase, so Cassandra has to do more SSTable 
compaction.

-----Original Message-----
From: William R Speirs [mailto:bill.spe...@gmail.com] 
Sent: January 27, 2011 9:15
To: user@cassandra.apache.org
Subject: Re: Schema Design

It makes sense that the single row for a system (with a growing number of 
columns) will reside on a single machine.

With that in mind, here is my updated schema:

- A single column family for all the messages. The row keys will be the 
TimeUUID 
of the message with the following columns: date/time (in UTC POSIX), system 
name/id (with an index for fast/easy gets), the actual message payload.

- A column family for each system. The row keys will be UTC POSIX time with 1 
second (maybe 1 minute) bucketing, and the column names will be the TimeUUID of 
any messages that were logged during that time bucket.

My only hesitation with this design is that buddhasystem warned that each 
column 
family is allocated a piece of memory on the server. I'm not sure what the 
implications of this are and/or if this would be a problem if I had a number 
of systems on the order of hundreds.

Thanks...

Bill-

On 01/26/2011 06:51 PM, Shu Zhang wrote:
 Each row can have a maximum of 2 billion columns, which a logging system will 
 probably hit eventually.

 More importantly, you'll only have 1 row per set of system logs. Every row is 
 stored on the same machine(s), which means you'll definitely not be able 
 to distribute your load very well.
 
 From: Bill Speirs [bill.spe...@gmail.com]
 Sent: Wednesday, January 26, 2011 1:23 PM
 To: user@cassandra.apache.org
 Subject: Re: Schema Design

 I like this approach, but I have 2 questions:

 1) what are the implications of continually adding columns to a single
 row? I'm unsure how Cassandra is able to grow. I realize you can have
 a virtually infinite number of columns, but what are the implications
 of growing the number of columns over time?

 2) maybe it's just a restriction of the CLI, but how do I issue a
 slice request? Also, what if start (or end) columns don't exist? I'm
 guessing it's smart enough to get the columns in that range.

 Thanks!

 Bill-

 On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
 dmcne...@agentisenergy.com  wrote:
 I would say in that case you might want to try a single column family
 where the key to the column is the system name.
 Then, you could name your columns as the timestamp. Then when retrieving
 information from the data store you can, in your slice request, specify
 your start column as X and your end column as Y.
 Then you can use the stored column name to know when an event occurred.

 On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote:

 I'm looking to use Cassandra to store log messages from various
 systems. A log message only has a message (UTF8Type) and a date/time.
 My thought is to create a column family for each system. The row key
 will be a TimeUUIDType. Each row will have 7 columns: year, month,
 day, hour, minute, second, and message. I then have indexes setup for
 each of the date/time columns.

 I was hoping this would allow me to answer queries like: What are all
 the log messages that were generated between X and Y? The problem is
 that I can ONLY use the equals operator on these column values. For
 example, issuing: get system_x where month > 1; gives me this
 error: No indexed columns present in index clause with operator EQ.
 The equals operator works as expected though: get system_x where month
 = 1;

 What schema would allow me to get date ranges?

 Thanks in advance...

 Bill-

 * ColumnFamily description *
 ColumnFamily: system_x_msg
   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
   Row cache size / save period: 0.0/0
   Key cache size / save period: 20.0/3600
   Memtable thresholds: 1.1671875/249/60
   GC grace seconds: 864000
   Compaction min/max 

Re: the java client problem

2011-01-26 Thread Ashish
I have no clue about this error... look into the log files. They might reveal
something.

Can anyone else help here?

2011/1/27 Raoyixuan (Shandy) raoyix...@huawei.com

  It shows an error; I put it in the attachment.



 *From:* Ashish [mailto:paliwalash...@gmail.com]
 *Sent:* Wednesday, January 26, 2011 10:31 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: the java client problem



 click on the loadSchema() button in the right panel :)

 2011/1/26 Raoyixuan (Shandy) raoyix...@huawei.com

 I had found loadSchemaFromYAML via jconsole. How do I load the schema?



 *From:* Ashish [mailto:paliwalash...@gmail.com]
 *Sent:* Friday, January 21, 2011 8:10 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: the java client problem



 check cassandra-install-dir/conf/cassandra.yaml



 start cassandra

 connect via jconsole

 find MBeans - org.apache.cassandra.db - StorageService
 (http://wiki.apache.org/cassandra/StorageService) -
 Operations - loadSchemaFromYAML



 load the schema

 and then try the example again.



 HTH

 ashish



 2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com

 Which schema is it?

 *From:* Ashish [mailto:paliwalash...@gmail.com]
 *Sent:* Friday, January 21, 2011 7:57 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: the java client problem



 you are missing the column family in your keyspace.



 If you are using the default definitions of schema shipped with cassandra,
 ensure to load the schema from JMX.



 thanks

 ashish

 2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com

 I exec the code as below by hector client:



 package com.riptano.cassandra.hector.example;

 import me.prettyprint.cassandra.serializers.StringSerializer;
 import me.prettyprint.hector.api.Cluster;
 import me.prettyprint.hector.api.Keyspace;
 import me.prettyprint.hector.api.beans.HColumn;
 import me.prettyprint.hector.api.exceptions.HectorException;
 import me.prettyprint.hector.api.factory.HFactory;
 import me.prettyprint.hector.api.mutation.Mutator;
 import me.prettyprint.hector.api.query.ColumnQuery;
 import me.prettyprint.hector.api.query.QueryResult;

 public class InsertSingleColumn {
     private static StringSerializer stringSerializer = StringSerializer.get();

     public static void main(String[] args) throws Exception {
         Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "*.*.*.*:9160");

         Keyspace keyspaceOperator = HFactory.createKeyspace("Shandy", cluster);

         try {
             Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, StringSerializer.get());
             mutator.insert("jsmith", "Standard1", HFactory.createStringColumn("first", "John"));

             ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspaceOperator);
             columnQuery.setColumnFamily("Standard1").setKey("jsmith").setName("first");
             QueryResult<HColumn<String, String>> result = columnQuery.execute();

             System.out.println("Read HColumn from cassandra: " + result.get());
             System.out.println("Verify on CLI with:  get Keyspace1.Standard1['jsmith']");
         } catch (HectorException e) {
             e.printStackTrace();
         }
         cluster.getConnectionManager().shutdown();
     }
 }



 And it shows the error :



 me.prettyprint.hector.api.exceptions.HInvalidRequestException:
 InvalidRequestException(why:unconfigured columnfamily Standard1)

   at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
   at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89)
   at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:142)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
   at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:149)
   at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:146)
   at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
   at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:65)
   at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:146)
   at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:55)
   at 

repair cause large number of SSTABLEs

2011-01-26 Thread B. Todd Burruss
i ran out of file handles on the repairing node after doing nodetool 
repair - strange as i have never had this issue until using 0.7.0 (but i 
should say that i have not truly tested 0.7.0 until now.)  up'ed the 
number of file handles, removed data, restarted nodes, then restarted my 
test.  waited a little while.  i have two keyspaces on the cluster, so i 
checked the number of SSTABLES in one of them before nodetool repair 
and i see 36 data.db files, spread over 11 column families.  very 
reasonable.


after running nodetool repair i have over 900 data.db files, 
immediately!  now after waiting several hours i have over 1500 data.db 
files.  out of these i have 95 compacted files


lsof reporting 803 files in use by cassandra for the Queues keyspace ...

[cassandra@kv-app02 ~]$ /usr/sbin/lsof -p 32645|grep Data.db|grep -c Queues
803

.. this doesn't sound right to me.  checking the server log i see a lot 
of these messages:


ERROR [RequestResponseStage:14] 2011-01-26 17:00:29,493 
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor

java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.ArrayList.fastRemove(ArrayList.java:441)
at java.util.ArrayList.remove(ArrayList.java:424)
at 
com.google.common.collect.AbstractMultimap.remove(AbstractMultimap.java:219)
at 
com.google.common.collect.ArrayListMultimap.remove(ArrayListMultimap.java:60)
at 
org.apache.cassandra.net.MessagingService.responseReceivedFrom(MessagingService.java:436)
at 
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:40)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:619)


and a lot of these:

ERROR [ReadStage:809] 2011-01-26 21:48:01,047 
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor

java.lang.ArrayIndexOutOfBoundsException
ERROR [ReadStage:809] 2011-01-26 21:48:01,047 
AbstractCassandraDaemon.java (line 91) Fatal exception in thread 
Thread[ReadStage:809,5,main]

java.lang.ArrayIndexOutOfBoundsException

and some more like this:
ERROR [ReadStage:15] 2011-01-26 20:59:14,695 
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor

java.lang.ArrayIndexOutOfBoundsException: 6
at 
org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
at 
org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
at 
org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
at 
org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:98)
at 
org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:95)
at 
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:334)
at 
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
at 
org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)

at org.apache.cassandra.db.Table.getRow(Table.java:384)
at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
at 
org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:619)



RE: the java client problem

2011-01-26 Thread Raoyixuan (Shandy)
I have solved this problem.

I created the column family first, and then it worked.

From: Ashish [mailto:paliwalash...@gmail.com]
Sent: Thursday, January 27, 2011 1:16 PM
To: user@cassandra.apache.org
Subject: Re: the java client problem

I have no clue about this error... look into the log files. They might reveal 
something.

Can anyone else help here?
2011/1/27 Raoyixuan (Shandy) raoyix...@huawei.com
It shows an error; I put it in the attachment.

From: Ashish [mailto:paliwalash...@gmail.com]
Sent: Wednesday, January 26, 2011 10:31 PM

To: user@cassandra.apache.org
Subject: Re: the java client problem

click on the loadSchema() button in the right panel :)
2011/1/26 Raoyixuan (Shandy) raoyix...@huawei.com
I had found loadSchemaFromYAML via jconsole. How do I load the schema?

From: Ashish [mailto:paliwalash...@gmail.com]
Sent: Friday, January 21, 2011 8:10 PM
To: user@cassandra.apache.org
Subject: Re: the java client problem

check cassandra-install-dir/conf/cassandra.yaml

start cassandra
connect via jconsole
find MBeans -> org.apache.cassandra.db -> StorageService 
(http://wiki.apache.org/cassandra/StorageService) -> Operations 
-> loadSchemaFromYAML

load the schema
and then try the example again.
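The JConsole steps above can also be scripted. Below is a minimal sketch that invokes the same MBean operation over a plain JMX connection; the host, port, and the `org.apache.cassandra.db:type=StorageService` MBean name are assumptions about a default 0.7-era setup, so adjust them to your own configuration:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LoadSchema {
    // Build the standard RMI-based JMX service URL for a host/port.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Invoke the same operation the JConsole button calls.
    static void loadSchemaFromYaml(String host, int port) throws Exception {
        JMXConnector connector =
                JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl(host, port)));
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService =
                    new ObjectName("org.apache.cassandra.db:type=StorageService");
            // loadSchemaFromYAML takes no arguments.
            mbs.invoke(storageService, "loadSchemaFromYAML",
                       new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            loadSchemaFromYaml(args[0], Integer.parseInt(args[1]));
        } else {
            // No connection attempted without explicit host/port.
            System.out.println(jmxUrl("localhost", 8080));
        }
    }
}
```

Note this operation is meant as a one-time bootstrap; running it against a node that already has a schema defined is not something I would expect to be safe.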

HTH
ashish

2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com
Which schema is it?
From: Ashish [mailto:paliwalash...@gmail.com]
Sent: Friday, January 21, 2011 7:57 PM
To: user@cassandra.apache.org
Subject: Re: the java client problem

you are missing the column family in your keyspace.

If you are using the default definitions of schema shipped with cassandra, 
ensure to load the schema from JMX.

thanks
ashish
2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com
I exec the code as below by hector client:

package com.riptano.cassandra.hector.example;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class InsertSingleColumn {
private static StringSerializer stringSerializer = StringSerializer.get();

public static void main(String[] args) throws Exception {
Cluster cluster = HFactory.getOrCreateCluster("TestCluster", 
"*.*.*.*:9160");

Keyspace keyspaceOperator = HFactory.createKeyspace("Shandy", cluster);

try {
Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, 
StringSerializer.get());
mutator.insert("jsmith", "Standard1", 
HFactory.createStringColumn("first", "John"));

ColumnQuery<String, String, String> columnQuery = 
HFactory.createStringColumnQuery(keyspaceOperator);

columnQuery.setColumnFamily("Standard1").setKey("jsmith").setName("first");
QueryResult<HColumn<String, String>> result = columnQuery.execute();

System.out.println("Read HColumn from cassandra: " + result.get());
System.out.println("Verify on CLI with:  get 
Keyspace1.Standard1['jsmith'] ");

} catch (HectorException e) {
e.printStackTrace();
}
cluster.getConnectionManager().shutdown();
}

}

And it shows the error:

me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily Standard1)
  at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
  at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
  at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
  at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89)
  at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:142)
  at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
  at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
  at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
  at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:149)
  at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:146)
  at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
  at

Generating tokens for Cassandra cluster with ByteOrderedPartitioner

2011-01-26 Thread Matthew Tovbin
Hey,

Can anyone suggest how to manually generate tokens for a Cassandra 0.7.0
cluster while the ByteOrderedPartitioner is being used?

Thanks in advance.

-- 
Best regards,
 Matthew Tovbin.
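One rough approach, offered as a sketch rather than an official recipe, is to space the tokens evenly across the raw byte space. This assumes your keys are uniformly distributed over the full byte range, which is rarely true with ByteOrderedPartitioner, so treat the token width (16 bytes here) and the even spacing as assumptions to adjust for your actual key distribution:

```java
import java.math.BigInteger;

public class BopTokens {
    // Evenly spaced hex tokens over a widthBytes-wide key space:
    // token i = floor(2^(8*widthBytes) * i / numNodes).
    static String[] tokens(int numNodes, int widthBytes) {
        BigInteger space = BigInteger.ONE.shiftLeft(8 * widthBytes);
        String[] out = new String[numNodes];
        for (int i = 0; i < numNodes; i++) {
            BigInteger t = space.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(numNodes));
            // Zero-pad to the full token width so tokens sort bytewise.
            out[i] = String.format("%0" + (2 * widthBytes) + "x", t);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokens(4, 16)) {
            System.out.println(t);
        }
    }
}
```

For a 4-node ring this prints tokens starting with 00, 40, 80 and c0. If your keys are skewed (e.g. all ASCII), evenly spaced byte tokens will leave most of the ring empty, which is exactly why BOP token selection usually has to be derived from a sample of real keys.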


Using Cassandra for storing large objects

2011-01-26 Thread Narendra Sharma
Is anyone using Cassandra to store a large number (millions) of large (mostly
immutable) objects (200KB-5MB each)? I would like to understand the
experience in general, considering that Cassandra is not considered a good
fit for large objects. https://issues.apache.org/jira/browse/CASSANDRA-265


Thanks,
Naren
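The workaround usually discussed for objects in this size range is to split each blob into fixed-size chunks and store each chunk as its own column under the object's row key. The helpers below are a sketch of that idea only; the chunk size and the column-naming scheme are assumptions, not an established API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlobChunker {
    // Split a blob into chunks of at most chunkSize bytes,
    // e.g. to store each as a column named "chunk-0000", "chunk-0001", ...
    static List<byte[]> split(byte[] blob, int chunkSize) {
        List<byte[]> chunks = new ArrayList<byte[]>();
        for (int off = 0; off < blob.length; off += chunkSize) {
            chunks.add(Arrays.copyOfRange(blob, off,
                    Math.min(off + chunkSize, blob.length)));
        }
        return chunks;
    }

    // Reassemble the chunks, in order, into the original blob.
    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        byte[] out = new byte[total];
        int off = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, off, c.length);
            off += c.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] blob = new byte[1 << 20]; // stand-in for a ~1MB object
        List<byte[]> chunks = split(blob, 256 * 1024);
        System.out.println(chunks.size() + " chunks, round-trip ok: "
                + Arrays.equals(join(chunks), blob));
    }
}
```

Keeping chunks well under the thrift message and memtable limits, and reading them as a column slice, is the usual motivation; whether the extra round-trips are acceptable depends on your access pattern.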