Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse
On Wed, Jan 26, 2011 at 08:58, Mck m...@apache.org wrote: You are correct that microseconds would be better, but for the test it doesn't matter that much. Have you tried? I'm very new to cassandra as well, and always uncertain as to what to expect... IMHO it's a matter of use-case. In my use-case there is no possibility for two (or more) processes to write/update the same key, so milliseconds are fine for me. BTW how to get the current time in microseconds in Java? As far as moving the clone(..) into ColumnFamilyRecordWriter.write(..), won't this hurt performance? Normally I would _always_ agree that a defensive copy of an array/collection argument be stored, but has this intentionally not been done (or should it) because of large reduce jobs (millions of records) and the performance impact here? The size of the queue is computed at runtime: ColumnFamilyOutputFormat.QUEUE_SIZE, 32 * Runtime.getRuntime().availableProcessors(). So the queue is not too large, so I'd say the performance shouldn't get hurt. -- Patrik
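On the microseconds question: pre-Java-8 JDKs have no microsecond-precision wall clock (System.nanoTime() measures elapsed time, not time-of-day), so client libraries typically scale milliseconds and add a counter to keep timestamps unique. A minimal sketch of that idea (the class and method names here are made up for illustration):

```java
public class Microseconds {
    private static long lastMicros = 0;

    // Millisecond clock scaled to microseconds; a counter breaks ties so
    // two calls within the same millisecond still get distinct timestamps.
    public static synchronized long currentTimeMicros() {
        long micros = System.currentTimeMillis() * 1000;
        if (micros <= lastMicros) {
            micros = lastMicros + 1;
        }
        lastMicros = micros;
        return micros;
    }

    public static void main(String[] args) {
        long a = currentTimeMicros();
        long b = currentTimeMicros();
        System.out.println(b > a); // prints true, even within one millisecond
    }
}
```

This mirrors what clock implementations in client libraries such as Hector do; the important property for Cassandra is monotonically increasing, unique timestamps, not true microsecond accuracy.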
Re: client threads locked up - JIRA ISSUE 1594
I'm using the jars packed in Hector 0.6.0-19 (the one compatible with Cassandra 0.6.*). I wanted to use Hector, but for some reason I haven't been able to do so yet. What I'm doing is a POC kind of thing, and only if it works out properly will we go on to build on it. The reason I asked this question in the first place was a high idle CPU percentage. I'm currently doing the POC on an 8-core machine. I have 8 client threads inserting data into Cassandra. But most of the time I can see 40-45% user time, 15-20% system time and the rest idle - for each core, even if I change the number of client threads (increase or decrease). Then I used jstack on my Java application, and the result was exactly like JIRA issue 1594. I was just wondering whether this can be the reason for the idle CPU. The same application was earlier tried with Lucene (to store the indices) and we had about 90% CPU utilization. We've replaced Lucene with Cassandra (to store the index and inverted index), and while the CPU utilization is down, the total time required went up fivefold (for the same data set). We tried Cassandra directly as apparently Lucandra is 10% slower than Cassandra... Arijit On 25 January 2011 20:30, Nate McCall n...@riptano.com wrote: What version of the Thrift API are you using? (In general, you should use an existing client library rather than rolling your own - I recommend Hector: https://github.com/rantav/hector). On Tue, Jan 25, 2011 at 12:38 AM, Arijit Mukherjee ariji...@gmail.com wrote: I'm using Cassandra 0.6.8. I'm not using Hector - it's just raw thrift APIs. Arijit On 21 January 2011 22:13, Nate McCall n...@riptano.com wrote: What versions of Cassandra and Hector? The versions mentioned on this ticket are both several releases behind.
On Fri, Jan 21, 2011 at 3:53 AM, Arijit Mukherjee ariji...@gmail.com wrote: Hi All I'm facing the same issue as the one mentioned here - https://issues.apache.org/jira/browse/CASSANDRA-1594 Is there any solution or work-around for this? Regards Arijit -- And when the night is cloudy, There is still a light that shines on me, Shine on until tomorrow, let it be.
Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse
On Wed, 2011-01-26 at 12:13 +0100, Patrik Modesto wrote: BTW how to get the current time in microseconds in Java? I'm using HFactory.clock() (from hector). As far as moving the clone(..) into ColumnFamilyRecordWriter.write(..), won't this hurt performance? The size of the queue is computed at runtime: ColumnFamilyOutputFormat.QUEUE_SIZE, 32 * Runtime.getRuntime().availableProcessors(). So the queue is not too large, so I'd say the performance shouldn't get hurt. This is only the default. I'm running w/ 8. Testing has given this the best throughput for me when processing 25+ million rows... In the end it is still 25+ million .clone(..) calls. The key isn't the only potential live byte[]. You also have names and values in all the columns (and supercolumns) for all the mutations. Now make that over a billion .clone(..) calls... :-( byte[] copies are relatively quick and cheap, still I am seeing a performance degradation in m/r reduce performance with cloning of keys. It's not that you don't have my vote here, I'm just stating my uncertainty on what the correct API should be. ~mck
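The reuse hazard being debated can be shown without Hadoop or Cassandra at all: if a caller hands a buffer to an asynchronous writer and then reuses it, every queued mutation ends up aliasing the final value. A toy illustration (the list stands in for ColumnFamilyRecordWriter's internal queue; none of this is Cassandra code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReuseHazard {
    public static void main(String[] args) {
        List<byte[]> queued = new ArrayList<byte[]>();
        byte[] key = new byte[1];
        for (byte i = 0; i < 3; i++) {
            key[0] = i;           // caller reuses the same buffer each iteration
            queued.add(key);      // no defensive copy: all entries alias one array
            // queued.add(key.clone()); // the defensive copy under discussion
        }
        // All three queued "keys" now hold the final value, not 0, 1, 2.
        System.out.println(Arrays.toString(queued.get(0))); // prints [2]
    }
}
```

This is also why the cost question is real: the safe variant pays one allocation and copy per queued value, which is what adds up over 25+ million rows.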
Re: Files not deleted after compaction and GCed
It's a bug. In SSTableDeletingReference, it tries this operation: components.remove(Component.DATA) before SSTable.delete(desc, components); However, components is a reference to the object that was created inside SSTable by this.components = Collections.unmodifiableSet(dataComponents); As you can see, you can't call remove on that components object. If I add a try block and output the exception around the components.remove(Component.DATA), I get this: java.lang.UnsupportedOperationException at java.util.Collections$UnmodifiableCollection.remove(Unknown Source) at org.apache.cassandra.io.sstable.SSTableDeletingReference$CleanupTask.run(SSTableDeletingReference.java:103) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) at org.apache.cassandra.concurrent.RetryingScheduledThreadPoolExecutor$LoggingScheduledFuture.run(RetryingScheduledThreadPoolExecutor.java:81) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Regards, Chen On Tue, Jan 25, 2011 at 4:21 PM, Jonathan Ellis jbel...@gmail.com wrote: the other component types are deleted by this line: SSTable.delete(desc, components); On Tue, Jan 25, 2011 at 3:11 PM, Ching-Cheng Chen cc...@evidentsoftware.com wrote: Nope, no exception at all. But if the same class (org.apache.cassandra.io.sstable.SSTableDeletingReference) is responsible for deleting the other files, then that's not right. I checked the source code for SSTableDeletingReference; it doesn't look like it will delete the other file types.
Regards, Chen On Tue, Jan 25, 2011 at 4:05 PM, Jonathan Ellis jbel...@gmail.com wrote: No, that is not expected. All the sstable components are removed in the same method; did you check the log for exceptions? On Tue, Jan 25, 2011 at 2:58 PM, Ching-Cheng Chen cc...@evidentsoftware.com wrote: Using cassandra 0.7.0. The class org.apache.cassandra.io.sstable.SSTableDeletingReference only removes the -Data.db file, but leaves the xxx-Compacted, xxx-Filter.db, xxx-Index.db and xxx-Statistics.db intact. And that's the behavior I saw. I ran a manual compact, then triggered a GC from jconsole. The Data.db file got removed but not the others. Is this the expected behavior? Regards, Chen -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
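Chen's diagnosis is easy to reproduce in isolation: Collections.unmodifiableSet returns a read-only view whose mutator methods all throw, so the remove has to be applied to a fresh copy instead. A minimal demonstration (the copy-then-remove fix shown is an assumption about the obvious remedy, not necessarily what the Cassandra patch does):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class UnmodifiableRemove {
    public static void main(String[] args) {
        // Mirrors this.components = Collections.unmodifiableSet(dataComponents)
        Set<String> components = Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList("Data", "Filter", "Index")));
        try {
            components.remove("Data"); // throws: the view forbids mutation
        } catch (UnsupportedOperationException e) {
            System.out.println("UnsupportedOperationException, as in the stack trace");
        }
        // One obvious fix: copy first, then mutate the copy.
        Set<String> copy = new HashSet<String>(components);
        copy.remove("Data");
        System.out.println(copy.contains("Data")); // prints false
    }
}
```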
Re: the java client problem
click on the loadSchema() button in right panel :) 2011/1/26 Raoyixuan (Shandy) raoyix...@huawei.com I had found the loadSchemaFromYAML by jconsole. How to load the schema? From: Ashish [mailto:paliwalash...@gmail.com] Sent: Friday, January 21, 2011 8:10 PM To: user@cassandra.apache.org Subject: Re: the java client problem check cassandra-install-dir/conf/cassandra.yaml, start cassandra, connect via jconsole, find MBeans - org.apache.cassandra.db - StorageService - Operations - loadSchemaFromYAML, load the schema and then try the example again. HTH ashish 2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com Which schema is it? From: Ashish [mailto:paliwalash...@gmail.com] Sent: Friday, January 21, 2011 7:57 PM To: user@cassandra.apache.org Subject: Re: the java client problem you are missing the column family in your keyspace. If you are using the default definitions of schema shipped with cassandra, ensure to load the schema from JMX. thanks ashish 2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com I exec the code as below by hector client:

package com.riptano.cassandra.hector.example;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class InsertSingleColumn {
    private static StringSerializer stringSerializer = StringSerializer.get();

    public static void main(String[] args) throws Exception {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "*.*.*.*:9160");
        Keyspace keyspaceOperator = HFactory.createKeyspace("Shandy", cluster);
        try {
            Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, StringSerializer.get());
            mutator.insert("jsmith", "Standard1", HFactory.createStringColumn("first", "John"));

            ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspaceOperator);
            columnQuery.setColumnFamily("Standard1").setKey("jsmith").setName("first");
            QueryResult<HColumn<String, String>> result = columnQuery.execute();

            System.out.println("Read HColumn from cassandra: " + result.get());
            System.out.println("Verify on CLI with: get Keyspace1.Standard1['jsmith']");
        } catch (HectorException e) {
            e.printStackTrace();
        }
        cluster.getConnectionManager().shutdown();
    }
}

And it shows the error:

me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily Standard1)
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:142)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
    at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:149)
    at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:146)
    at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:65)
    at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:146)
    at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:55)
    at com.riptano.cassandra.hector.example.InsertSingleColumn.main(InsertSingleColumn.java:21)
Caused by: InvalidRequestException(why:unconfigured columnfamily Standard1)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16477)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
    at
Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse
On Tue, Jan 25, 2011 at 12:09 PM, Mick Semb Wever m...@apache.org wrote: Well your key is a mutable Text object, so i can see some possibility depending on how hadoop uses these objects. Yes, that's it exactly. We recently fixed a bug in the demo word_count program for this. Now we do ByteBuffer.wrap(Arrays.copyOf(text.getBytes(), text.getLength())). -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
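The fix Jonathan describes works because Hadoop reuses the Text object's backing array between records, and that array can also be longer than the valid region; Arrays.copyOf(bytes, length) detaches exactly the live bytes before they are wrapped. A sketch of the difference, with a plain byte[] standing in for Text so it runs without Hadoop:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CopyBeforeWrap {
    public static void main(String[] args) {
        // Stand-in for Hadoop's reused Text: a backing array longer than
        // the valid region, overwritten between records.
        byte[] backing = "keyA__".getBytes();
        int length = 4; // only the first 4 bytes are the current key

        ByteBuffer aliased = ByteBuffer.wrap(backing);                        // before the fix
        ByteBuffer copied = ByteBuffer.wrap(Arrays.copyOf(backing, length));  // the fix

        // Hadoop hands the next record into the same buffer...
        System.arraycopy("keyB".getBytes(), 0, backing, 0, 4);

        System.out.println(new String(aliased.array(), 0, 4)); // prints keyB - mutated!
        System.out.println(new String(copied.array()));        // prints keyA - safe copy
    }
}
```

Note that copying to getLength() also avoids a second, subtler bug: wrapping getBytes() directly would include stale trailing bytes past the end of the current value.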
Node going down when streaming data, what next?
I was moving a node and at some point it started streaming data to 2 other nodes. Later, that node keeled over; let's assume I can't fix it for the next 3 days and just want to move tokens on the remaining three to even out and see if I can live with it. But I can't do that! The node that was on the receiving end of the stream refuses to move, because it's still receiving. What do I do? Maxim
Re: Files not deleted after compaction and GCed
Thanks for tracking that down! Created https://issues.apache.org/jira/browse/CASSANDRA-2059 to fix. On Wed, Jan 26, 2011 at 8:17 AM, Ching-Cheng Chen cc...@evidentsoftware.com wrote: It's a bug. In SSTableDeletingReference, it tries components.remove(Component.DATA) before SSTable.delete(desc, components). However, components is a reference to the object created inside SSTable by this.components = Collections.unmodifiableSet(dataComponents), so the remove operation throws. [snip quoted stack trace and earlier replies]
Re: Files not deleted after compaction and GCed
Patch submitted. One thing I still don't understand is why RetryingScheduledThreadPoolExecutor isn't firing the DefaultUncaughtExceptionHandler, which should have logged that exception. On Wed, Jan 26, 2011 at 9:41 AM, Jonathan Ellis jbel...@gmail.com wrote: Thanks for tracking that down! Created https://issues.apache.org/jira/browse/CASSANDRA-2059 to fix. [snip quoted thread]
Re: Files not deleted after compaction and GCed
I think this might be what's happening. Since you are using ScheduledThreadPoolExecutor.schedule(), the exception was swallowed by the FutureTask. You would have to call get() on the ScheduledFuture, and you would get an ExecutionException if any exception occurred in run(). Regards, Chen On Wed, Jan 26, 2011 at 10:50 AM, Jonathan Ellis jbel...@gmail.com wrote: Patch submitted. One thing I still don't understand is why RetryingScheduledThreadPoolExecutor isn't firing the DefaultUncaughtExceptionHandler, which should have logged that exception. [snip quoted thread]
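Chen's explanation matches how java.util.concurrent behaves in general: a task submitted through schedule() is wrapped in a FutureTask, which captures the throwable instead of letting it reach the thread's UncaughtExceptionHandler; it only resurfaces from get(). A standalone sketch (not Cassandra code):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SwallowedException {
    public static void main(String[] args) throws Exception {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

        // The task throws, but schedule() wraps it in a FutureTask, so the
        // exception never reaches the thread's UncaughtExceptionHandler
        // and nothing gets logged here.
        ScheduledFuture<?> future = executor.schedule((Runnable) () -> {
            throw new UnsupportedOperationException("boom");
        }, 0, TimeUnit.MILLISECONDS);

        try {
            future.get(); // the only place the swallowed exception resurfaces
        } catch (ExecutionException e) {
            System.out.println("caught: " + e.getCause());
        }
        executor.shutdown();
    }
}
```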
Re: Stress test inconsistencies
Hi All, I was able to run contrib/stress at a very impressive throughput. Single threaded client was able to pump 2,000 inserts per second with 0.4 ms latency. Multithreaded client was able to pump 7,000 inserts per second with 7ms latency. Thank you very much for your help! Oleg
Re: Stress test inconsistencies
Would you share with us the changes you made, or problems you found? On Wed, Jan 26, 2011 at 10:41 AM, Oleg Proudnikov ol...@cloudorange.com wrote: Hi All, I was able to run contrib/stress at a very impressive throughput. Single threaded client was able to pump 2,000 inserts per second with 0.4 ms latency. Multithreaded client was able to pump 7,000 inserts per second with 7ms latency. Thank you very much for your help! Oleg
RE: Repair on single CF not working (0.7)
After some bad experiences in the past using non-release versions, I am a little hesitant. Which nodes would the new code have to be deployed to in order to test? If it is just one of the three, I might be willing if I need to repair again. Dan From: Brandon Williams [mailto:dri...@gmail.com] Sent: January-24-11 19:19 To: user@cassandra.apache.org Subject: Re: Repair on single CF not working (0.7) On Mon, Jan 24, 2011 at 4:15 PM, Dan Hendry dan.hendry.j...@gmail.com wrote: I am trying to repair a single CF using nodetool. It seems like the request to limit the repair to one CF is not being respected. Here is my current situation: - Run nodetool repair KEYSPACE CF_A on node 3 - Validation compaction runs on nodes 2,3,4 for CF_A only (expected) - Node 3 streams SSTables from CF_A only to nodes 2 and 4 (expected) - Nodes 2 and 4 stream SSTables from ALL column families in the keyspace to node 2 (VERY unexpected) - Node 2 runs out of disk space before the SSTable rebuild for all cfs can complete. Presumably this is a bug (?). This is the first time in quite a while that I have run a repair (I don't perform deletes, just use expiring columns). I have included pertinent log entries below: Log entry from node 3 after running out of disk space: INFO [Thread-3462] 2011-01-24 16:00:40,658 StreamInSession.java (line 124) Streaming of file /var/lib/cassandra/data/kikmetrics/UserEventsByEvent-e-1313-Data.db/(0,28295156514) progress=15267921920/28295156514 - 53% from org.apache.cassandra.streaming.StreamInSession@8df7e0c failed: requesting a retry. INFO [Thread-3423] 2011-01-24 16:00:40,658 StreamInSession.java (line 124) Streaming of file /var/lib/cassandra/data/kikmetrics/UserEventsByEvent-e-1348-Data.db/(29530620905,58066315307) progress=18898300928/28535694402 - 66% from org.apache.cassandra.streaming.StreamInSession@8df7e0c failed: requesting a retry. Can you test this against the 0.7 branch?
There were some bugs fixed in StreamInSession that may be related (https://issues.apache.org/jira/browse/CASSANDRA-1992) -Brandon
Re: Stress test inconsistencies
I returned to periodic commit log fsync. Jonathan Shook jshook at gmail.com writes: Would you share with us the changes you made, or problems you found?
Problems with Set on Byte type New Installation
I have set up a new installation of Cassandra, and have it running with no problems (0.7.0). Using CLI I added a new keyspace and column family. When I set a value for a column I get "Value Inserted". However, when I get the column value it is a number, even though the Column Family is of Bytes Type:

Keyspace: XXX:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Replication Factor: 1
  Column Families:
    ColumnFamily: Y
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period: 0.0/0
      Key cache size / save period: 20.0/3600
      Memtable thresholds: 0.0703125/15/60
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: []

Anyone else had this happen? Did I just miss something stupid? I have not had any issues with earlier versions of Cassandra. David Q David Quattlebaum MedProcure, LLC www.medprocure.com (864)482-2018 - Support (864)482-2019 - Direct
My new nemesis: EOFException (0.7.0)
I am having yet another issue on one of my Cassandra nodes. Last night, one of my nodes ran out of memory and crashed after flooding the logs with the same type of errors I am seeing below. After restarting, they are popping up again. My workaround has been to drop the consistency from ALL to ONE for the query which seems to be causing this problem, so my service using Cassandra starts working again, but it's a terrible solution at best. Is there any thought as to the root cause of this issue, or on how to fix it? These errors seem to pop up when reading from the same column family I have been having other problems with. Recently, I drained the node, shut it down, deleted all on-disk files for this column family, then ran a repair (which caused the node to run out of disk space, as I have detailed in a previous email). Could the repair somehow have corrupted the node's new data? Why is this error only appearing on one node, given the data now is nearly guaranteed to have come from another replica? I have triple-checked and all nodes are running the same release version of 0.7. Are there any suggestions for tools to check over a system's hardware (Ubuntu 10.04)? SMART info for the disk shows nothing alarming and there is nothing in /var/log/messages.
ERROR [ReadStage:10] 2011-01-26 12:15:59,607 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.EOFException
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
    at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
    at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
    at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
    at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
    at org.apache.cassandra.db.Table.getRow(Table.java:384)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
    at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:180)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:119)
    ... 22 more
ERROR [ReadStage:10] 2011-01-26 12:15:59,608 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[ReadStage:10,5,main]
java.lang.RuntimeException: java.io.EOFException
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:124)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:47)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at
RE: Problems with Set on Byte type New Installation
I'm very (2 days) new to Cassandra, but what does the output look like? Total shot in the dark, if the number is less than 256 would it not look the same as bytes or a number? Hope that in some way helps... Bill- From: David Quattlebaum [mailto:dquat...@medprocure.com] Sent: Wednesday, January 26, 2011 3:25 PM To: user@cassandra.apache.org Subject: Probelms with Set on Byte type New Installation I have set up a new installation of Cassandra, and have it running with no problems (0.7.0) Using CLI I added a new keyspace, and column family. When I set a value for a column I get “Value Inserted” However, when I get the column value it is a number, even though the Column Family is of Bytes Type: Keyspace: XXX: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Replication Factor: 1 Column Families: ColumnFamily: Y Columns sorted by: org.apache.cassandra.db.marshal.BytesType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 0.0703125/15/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [] Anyone else had this happen? Did I just miss something stupid? I have not had any issues with earlier versions of Cassandra. David Q David Quattlebaum MedProcure, LLC www.medprocure.com (864)482-2018 - Support (864)482-2019 - Direct
RE: Problems with Set on Byte type New Installation
Nope, I should be getting back the String values that were inserted: [default@TestKeyspace] get custparent['David']; = (column=4164647265737331, value=333038204279205061737320313233, timestamp=1296071732281000) = (column=43697479, value=53656e656361, timestamp=129607174731) = (column=4e616d65, value=546f6d7320466163696c697479, timestamp=1296071708189000) = (column=506f7374616c436f6465, value=3239363738, timestamp=1296071774549000) = (column=537461746550726f76, value=5343, timestamp=1296071760213000) Returned 5 results. Values should be Name and Address Values. -David Q -Original Message- From: Bill Speirs [mailto:bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 3:45 PM To: user@cassandra.apache.org Subject: RE: Probelms with Set on Byte type New Installation I'm very (2 days) new to Cassandra, but what does the output look like? Total shot in the dark, if the number is less than 256 would it not look the same as bytes or a number? Hope that in some way helps... Bill- From: David Quattlebaum [mailto:dquat...@medprocure.com] Sent: Wednesday, January 26, 2011 3:25 PM To: user@cassandra.apache.org Subject: Probelms with Set on Byte type New Installation I have set up a new installation of Cassandra, and have it running with no problems (0.7.0) Using CLI I added a new keyspace, and column family. When I set a value for a column I get Value Inserted However, when I get the column value it is a number, even though the Column Family is of Bytes Type: Keyspace: XXX: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Replication Factor: 1 Column Families: ColumnFamily: Y Columns sorted by: org.apache.cassandra.db.marshal.BytesType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 0.0703125/15/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [] Anyone else had this happen? Did I just miss something stupid? 
I have not had any issues with earlier versions of Cassandra. David Q David Quattlebaum MedProcure, LLC www.medprocure.com (864)482-2018 - Support (864)482-2019 - Direct
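Those hex strings really are the inserted text, byte for byte; decoding the pairs as ASCII confirms it (the class and method names below are illustrative, not from the thread):

```java
// The CLI renders BytesType columns as hex. Decoding the pairs as ASCII
// shows the original strings are intact, e.g. column 4e616d65 is "Name"
// and value 546f6d7320466163696c697479 is "Toms Facility".
public class HexDecode {
    public static String decode(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            // each two hex digits are one ASCII byte
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("4e616d65"));                   // Name
        System.out.println(decode("546f6d7320466163696c697479")); // Toms Facility
        System.out.println(decode("5343"));                       // SC
    }
}
```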
Schema Design
I'm looking to use Cassandra to store log messages from various systems. A log message only has a message (UTF8Type) and a date/time. My thought is to create a column family for each system. The row key will be a TimeUUIDType. Each row will have 7 columns: year, month, day, hour, minute, second, and message. I then have indexes set up for each of the date/time columns. I was hoping this would allow me to answer queries like: What are all the log messages that were generated between X and Y? The problem is that I can ONLY use the equals operator on these column values. For example, issuing: get system_x where month > 1; gives me this error: No indexed columns present in index clause with operator EQ. The equals operator works as expected though: get system_x where month = 1; What schema would allow me to get date ranges? Thanks in advance... Bill- * ColumnFamily description * ColumnFamily: system_x_msg Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 1.1671875/249/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572, proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468, proj_1_msg.7365636f6e64, proj_1_msg.79656172] Column Metadata: Column Name: year (year) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: month (month) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: second (second) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: minute (minute) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: hour (hour) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: day (day) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS
Re: Problems with Set on Byte type New Installation
Why would you expect strings? You stated that your comparator is BytesType. If you set the default_validation_class then you can specify what types the values should be returned as: [default@Devel] create column family david with comparator=BytesType and default_validation_class=UTF8Type; 2dabf0fb-298f-11e0-b177-e700f669bcfc [default@Devel] set david['david']['test'] = 'test'; Value inserted. [default@Devel] get david['david']; = (column=74657374, value=test, timestamp=129607562467) Returned 1 results. Now the column name is returned as the ASCII characters for 'test' and the value is returned as a string because of the default_validation_class. That seems to make sense to me. If in the next column you want to store a number you must store it as such and return it as such: [default@Devel] set david['david']['id'] = 37; Value inserted. [default@Devel] get david['david']['id'] as integer; = (column=6964, value=13111, timestamp=129607582033) Didn't work because it was inserted as a string, not a number/integer. However, if you specify it's a number on the way in, it will return properly: [default@Devel] set david['david']['id'] = integer(37); Value inserted. [default@Devel] get david['david']['id'] as integer; = (column=6964, value=37, timestamp=1296075929082000) Hope that helps... again, I'm new to this so maybe I'm not understanding your question. Bill- On Wed, Jan 26, 2011 at 3:50 PM, David Quattlebaum dquat...@medprocure.com wrote: Nope, I should be getting back the String values that were inserted: [default@TestKeyspace] get custparent['David']; = (column=4164647265737331, value=333038204279205061737320313233, timestamp=1296071732281000) = (column=43697479, value=53656e656361, timestamp=129607174731) = (column=4e616d65, value=546f6d7320466163696c697479, timestamp=1296071708189000) = (column=506f7374616c436f6465, value=3239363738, timestamp=1296071774549000) = (column=537461746550726f76, value=5343, timestamp=1296071760213000) Returned 5 results. 
Values should be Name and Address Values. -David Q -Original Message- From: Bill Speirs [mailto:bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 3:45 PM To: user@cassandra.apache.org Subject: RE: Probelms with Set on Byte type New Installation I'm very (2 days) new to Cassandra, but what does the output look like? Total shot in the dark, if the number is less than 256 would it not look the same as bytes or a number? Hope that in some way helps... Bill- From: David Quattlebaum [mailto:dquat...@medprocure.com] Sent: Wednesday, January 26, 2011 3:25 PM To: user@cassandra.apache.org Subject: Probelms with Set on Byte type New Installation I have set up a new installation of Cassandra, and have it running with no problems (0.7.0) Using CLI I added a new keyspace, and column family. When I set a value for a column I get Value Inserted However, when I get the column value it is a number, even though the Column Family is of Bytes Type: Keyspace: XXX: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Replication Factor: 1 Column Families: ColumnFamily: Y Columns sorted by: org.apache.cassandra.db.marshal.BytesType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 0.0703125/15/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [] Anyone else had this happen? Did I just miss something stupid? I have not had any issues with earlier versions of Cassandra. David Q David Quattlebaum MedProcure, LLC www.medprocure.com (864)482-2018 - Support (864)482-2019 - Direct
Re: Schema Design
I would say in that case you might want to try a single column family where the row key is the system name. Then, you could name your columns as the timestamp. Then when retrieving information from the data store you can, in your slice request, specify your start column as X and end column as Y. Then you can use the stored column name to know when an event occurred. On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote: I'm looking to use Cassandra to store log messages from various systems. A log message only has a message (UTF8Type) and a data/time. My thought is to create a column family for each system. The row key will be a TimeUUIDType. Each row will have 7 columns: year, month, day, hour, minute, second, and message. I then have indexes setup for each of the date/time columns. I was hoping this would allow me to answer queries like: What are all the log messages that were generated between X Y? The problem is that I can ONLY use the equals operator on these column values. For example, I cannot issuing: get system_x where month 1; gives me this error: No indexed columns present in index clause with operator EQ. The equals operator works as expected though: get system_x where month = 1; What schema would allow me to get date ranges? Thanks in advance... 
Bill- * ColumnFamily description * ColumnFamily: system_x_msg Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 1.1671875/249/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572, proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468, proj_1_msg.7365636f6e64, proj_1_msg.79656172] Column Metadata: Column Name: year (year) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: month (month) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: second (second) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: minute (minute) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: hour (hour) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: day (day) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS -- *David McNelis* Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 *A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.*
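The layout David describes (one row per system, column names as timestamps, slice between X and Y) can be sketched in plain Java; a TreeMap stands in for Cassandra's sorted columns here, and the class and helper names are illustrative assumptions, not an actual client API:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the proposed layout: one row per system, column name = timestamp,
// column value = log message. A TreeMap mimics Cassandra's sorted columns;
// a real slice request (e.g. Thrift get_slice, or a Hector range slice)
// returns the same contiguous range of columns.
public class LogSliceSketch {
    public static SortedMap<Long, String> slice(SortedMap<Long, String> row,
                                                long start, long end) {
        // subMap mirrors a column slice: start inclusive, end exclusive here
        return row.subMap(start, end);
    }

    public static void main(String[] args) {
        SortedMap<Long, String> systemX = new TreeMap<>();
        systemX.put(1296071708189L, "service started");
        systemX.put(1296071732281L, "request handled");
        systemX.put(1296071774549L, "service stopped");
        // fetch everything between two timestamps
        System.out.println(slice(systemX, 1296071700000L, 1296071750000L));
    }
}
```

Because the columns are kept sorted by name, the start/end bounds do not have to exist as columns; the slice simply returns whatever falls in the range.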
Re: Schema Design
Having separate columns for Year, Month etc seems redundant. It's tons more efficient to keep, say, UTC time in POSIX format (basically an integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use the Order Preserving Partitioner, and sort out which systems logged later in the client. Read up on the consequences of using OPP. Whether to shard data as per system depends on how many you have. If more than a few, don't do that, there are memory considerations. Cheers Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964227.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
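Maxim's point, one UTC POSIX timestamp instead of six year/month/day/hour/minute/second columns, is a few lines of standard Java (the class name is illustrative):

```java
import java.util.Calendar;
import java.util.TimeZone;

// Collapse year/month/day/hour/minute/second into a single UTC POSIX
// timestamp, which sorts naturally and converts back and forth easily.
public class PosixTime {
    public static long toPosixSeconds(int year, int month, int day,
                                      int hour, int minute, int second) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear(); // zero out milliseconds and any current-time fields
        cal.set(year, month - 1, day, hour, minute, second); // Calendar months are 0-based
        return cal.getTimeInMillis() / 1000L;
    }

    public static void main(String[] args) {
        // 2011-01-26 12:15:59 UTC as a single sortable long
        System.out.println(toPosixSeconds(2011, 1, 26, 12, 15, 59));
    }
}
```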
Re: Node going down when streaming data, what next?
Bump. I still don't know what is the best thing to do, plz help. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964231.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Schema Design
I like this approach, but I have 2 questions: 1) what are the implications of continually adding columns to a single row? I'm unsure how Cassandra is able to grow. I realize you can have a virtually infinite number of columns, but what are the implications of growing the number of columns over time? 2) maybe it's just a restriction of the CLI, but how do I issue a slice request? Also, what if start (or end) columns don't exist? I'm guessing it's smart enough to get the columns in that range. Thanks! Bill- On Wed, Jan 26, 2011 at 4:12 PM, David McNelis dmcne...@agentisenergy.com wrote: I would say in that case you might want to try a single column family where the key to the column is the system name. Then, you could name your columns as the timestamp. Then when retrieving information from the data store you can can, in your slice request, specify your start column as X and end column as Y. Then you can use the stored column name to know when an event occurred. On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote: I'm looking to use Cassandra to store log messages from various systems. A log message only has a message (UTF8Type) and a data/time. My thought is to create a column family for each system. The row key will be a TimeUUIDType. Each row will have 7 columns: year, month, day, hour, minute, second, and message. I then have indexes setup for each of the date/time columns. I was hoping this would allow me to answer queries like: What are all the log messages that were generated between X Y? The problem is that I can ONLY use the equals operator on these column values. For example, I cannot issuing: get system_x where month 1; gives me this error: No indexed columns present in index clause with operator EQ. The equals operator works as expected though: get system_x where month = 1; What schema would allow me to get date ranges? Thanks in advance... 
Bill- * ColumnFamily description * ColumnFamily: system_x_msg Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 1.1671875/249/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572, proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468, proj_1_msg.7365636f6e64, proj_1_msg.79656172] Column Metadata: Column Name: year (year) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: month (month) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: second (second) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: minute (minute) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: hour (hour) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: day (day) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS -- David McNelis Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.
RE: Problems with Set on Byte type New Installation
Bill, You are absolutely correct, I must not have set the default_validation_class when I added the column family. That's what I get for continuing to work late into the night! Thanks, DQ Less stupid next time -Original Message- From: Bill Speirs [mailto:bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 4:09 PM To: user@cassandra.apache.org Subject: Re: Probelms with Set on Byte type New Installation Why would you expect strings? You stated that your comparator is BytesType. If you set the default_validation_class then you can specify what types the values should be returned as: [default@Devel] create column family david with comparator=BytesType and default_validation_class=UTF8Type; 2dabf0fb-298f-11e0-b177-e700f669bcfc [default@Devel] set david['david']['test'] = 'test'; Value inserted. [default@Devel] get david['david']; = (column=74657374, value=test, timestamp=129607562467) Returned 1 results. Now the column name is returned as the ASCII characters for 'test' and the value is returned as a string because of the default_validation_class. That seems to make sense to me. If in the next column you want to sore a number you must store it as such and return it as such: [default@Devel] set david['david']['id'] = 37; Value inserted. [default@Devel] get david['david']['id'] as integer; = (column=6964, value=13111, timestamp=129607582033) Didn't work because it was inserted as a string not a number/integer. However, if you specify it's a number on the way it, it will return properly: [default@Devel] set david['david']['id'] = integer(37); Value inserted. [default@Devel] get david['david']['id'] as integer; = (column=6964, value=37, timestamp=1296075929082000) Hope that helps... again, I'm new to this so maybe I'm not understanding your question. 
Bill- On Wed, Jan 26, 2011 at 3:50 PM, David Quattlebaum dquat...@medprocure.com wrote: Nope, I should be getting back the String values that were inserted: [default@TestKeyspace] get custparent['David']; = (column=4164647265737331, value=333038204279205061737320313233, timestamp=1296071732281000) = (column=43697479, value=53656e656361, timestamp=129607174731) = (column=4e616d65, value=546f6d7320466163696c697479, timestamp=1296071708189000) = (column=506f7374616c436f6465, value=3239363738, timestamp=1296071774549000) = (column=537461746550726f76, value=5343, timestamp=1296071760213000) Returned 5 results. Values should be Name and Address Values. -David Q -Original Message- From: Bill Speirs [mailto:bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 3:45 PM To: user@cassandra.apache.org Subject: RE: Probelms with Set on Byte type New Installation I'm very (2 days) new to Cassandra, but what does the output look like? Total shot in the dark, if the number is less than 256 would it not look the same as bytes or a number? Hope that in some way helps... Bill- From: David Quattlebaum [mailto:dquat...@medprocure.com] Sent: Wednesday, January 26, 2011 3:25 PM To: user@cassandra.apache.org Subject: Probelms with Set on Byte type New Installation I have set up a new installation of Cassandra, and have it running with no problems (0.7.0) Using CLI I added a new keyspace, and column family. 
When I set a value for a column I get Value Inserted However, when I get the column value it is a number, even though the Column Family is of Bytes Type: Keyspace: XXX: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Replication Factor: 1 Column Families: ColumnFamily: Y Columns sorted by: org.apache.cassandra.db.marshal.BytesType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 0.0703125/15/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [] Anyone else had this happen? Did I just miss something stupid? I have not had any issues with earlier versions of Cassandra. David Q David Quattlebaum MedProcure, LLC www.medprocure.com (864)482-2018 - Support (864)482-2019 - Direct
Re: Schema Design
I have a basic understanding of OPP... if most of my messages come within a single hour then a few nodes could be storing all of my values, right? You totally lost me on, whether to shard data as per system... Is my schema (one column family per system, and row keys as TimeUUIDType) sharding by system? I thought -- probably incorrectly -- that the row keys are used in the sharding process, not column families. Thanks... Bill- On Wed, Jan 26, 2011 at 4:17 PM, buddhasystem potek...@bnl.gov wrote: Having separate columns for Year, Month etc seems redundant. It's tons more efficient to keep say UTC time in POSIX format (basically integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use Order Preserving Partitioner, and sort out which systems logged later in client. Read up on consequences of using OPP. Whether to shard data as per system depends on how many you have. If more than a few, don't do that, there are memory considerations. Cheers Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964227.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Problems with Set on Byte type New Installation
No worries... it forced me to setup an env to test my understanding. I'm still trying to learn/understand. Bill- On Wed, Jan 26, 2011 at 4:23 PM, David Quattlebaum dquat...@medprocure.com wrote: Bill, You are absolutely correct, I must not have set the default_validation_class when I added the column family. Thanks what I get for continuing to work late into the night! Thanks, DQ Less stupid next time -Original Message- From: Bill Speirs [mailto:bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 4:09 PM To: user@cassandra.apache.org Subject: Re: Probelms with Set on Byte type New Installation Why would you expect strings? You stated that your comparator is BytesType. If you set the default_validation_class then you can specify what types the values should be returned as: [default@Devel] create column family david with comparator=BytesType and default_validation_class=UTF8Type; 2dabf0fb-298f-11e0-b177-e700f669bcfc [default@Devel] set david['david']['test'] = 'test'; Value inserted. [default@Devel] get david['david']; = (column=74657374, value=test, timestamp=129607562467) Returned 1 results. Now the column name is returned as the ASCII characters for 'test' and the value is returned as a string because of the default_validation_class. That seems to make sense to me. If in the next column you want to sore a number you must store it as such and return it as such: [default@Devel] set david['david']['id'] = 37; Value inserted. [default@Devel] get david['david']['id'] as integer; = (column=6964, value=13111, timestamp=129607582033) Didn't work because it was inserted as a string not a number/integer. However, if you specify it's a number on the way it, it will return properly: [default@Devel] set david['david']['id'] = integer(37); Value inserted. [default@Devel] get david['david']['id'] as integer; = (column=6964, value=37, timestamp=1296075929082000) Hope that helps... again, I'm new to this so maybe I'm not understanding your question. 
Bill- On Wed, Jan 26, 2011 at 3:50 PM, David Quattlebaum dquat...@medprocure.com wrote: Nope, I should be getting back the String values that were inserted: [default@TestKeyspace] get custparent['David']; = (column=4164647265737331, value=333038204279205061737320313233, timestamp=1296071732281000) = (column=43697479, value=53656e656361, timestamp=129607174731) = (column=4e616d65, value=546f6d7320466163696c697479, timestamp=1296071708189000) = (column=506f7374616c436f6465, value=3239363738, timestamp=1296071774549000) = (column=537461746550726f76, value=5343, timestamp=1296071760213000) Returned 5 results. Values should be Name and Address Values. -David Q -Original Message- From: Bill Speirs [mailto:bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 3:45 PM To: user@cassandra.apache.org Subject: RE: Probelms with Set on Byte type New Installation I'm very (2 days) new to Cassandra, but what does the output look like? Total shot in the dark, if the number is less than 256 would it not look the same as bytes or a number? Hope that in some way helps... Bill- From: David Quattlebaum [mailto:dquat...@medprocure.com] Sent: Wednesday, January 26, 2011 3:25 PM To: user@cassandra.apache.org Subject: Probelms with Set on Byte type New Installation I have set up a new installation of Cassandra, and have it running with no problems (0.7.0) Using CLI I added a new keyspace, and column family. 
When I set a value for a column I get Value Inserted However, when I get the column value it is a number, even though the Column Family is of Bytes Type: Keyspace: XXX: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Replication Factor: 1 Column Families: ColumnFamily: Y Columns sorted by: org.apache.cassandra.db.marshal.BytesType Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 0.0703125/15/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [] Anyone else had this happen? Did I just miss something stupid? I have not had any issues with earlier versions of Cassandra. David Q David Quattlebaum MedProcure, LLC www.medprocure.com (864)482-2018 - Support (864)482-2019 - Direct
Re: Schema Design
My cli knowledge sucks so far, so I'll leave that to others. I'm doing most of my reading/writing through a thrift client (hector/java based). As for the implications, as of the latest version of Cassandra there is no theoretical limit to the number of columns that a particular row can hold. Over time you've got a couple of different options. If you're concerned that you end up with too many columns to manage then you'd probably want to start thinking about a warehousing strategy long-term for your older records that involves expiring columns that are older than X in your Cassandra cluster. But for the most part you shouldn't *need* to do that. On Wed, Jan 26, 2011 at 3:23 PM, Bill Speirs bill.spe...@gmail.com wrote: I like this approach, but I have 2 questions: 1) what is the implications of continually adding columns to a single row? I'm unsure how Cassandra is able to grow. I realize you can have a virtually infinite number of columns, but what are the implications of growing the number of columns over time? 2) maybe it's just a restriction of the CLI, but how do I do issue a slice request? Also, what if start (or end) columns don't exist? I'm guessing it's smart enough to get the columns in that range. Thanks! Bill- On Wed, Jan 26, 2011 at 4:12 PM, David McNelis dmcne...@agentisenergy.com wrote: I would say in that case you might want to try a single column family where the key to the column is the system name. Then, you could name your columns as the timestamp. Then when retrieving information from the data store you can can, in your slice request, specify your start column as X and end column as Y. Then you can use the stored column name to know when an event occurred. On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote: I'm looking to use Cassandra to store log messages from various systems. A log message only has a message (UTF8Type) and a data/time. My thought is to create a column family for each system. 
The row key will be a TimeUUIDType. Each row will have 7 columns: year, month, day, hour, minute, second, and message. I then have indexes setup for each of the date/time columns. I was hoping this would allow me to answer queries like: What are all the log messages that were generated between X Y? The problem is that I can ONLY use the equals operator on these column values. For example, I cannot issuing: get system_x where month 1; gives me this error: No indexed columns present in index clause with operator EQ. The equals operator works as expected though: get system_x where month = 1; What schema would allow me to get date ranges? Thanks in advance... Bill- * ColumnFamily description * ColumnFamily: system_x_msg Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period: 0.0/0 Key cache size / save period: 20.0/3600 Memtable thresholds: 1.1671875/249/60 GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572, proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468, proj_1_msg.7365636f6e64, proj_1_msg.79656172] Column Metadata: Column Name: year (year) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: month (month) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: second (second) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: minute (minute) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: hour (hour) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS Column Name: day (day) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS -- David McNelis Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 A Smart Grid technology company focused on helping consumers of energy control an often under-managed 
resource. -- *David McNelis* Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 *A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.*
Re: Schema Design
One thing you can do is create one CF, then as the row key use the application name + timestamp; with that you can do your range query using OPP, then store whatever you want in the row. The problem would be if one app generates far more logs than the others. Nicolas Santini On Thu, Jan 27, 2011 at 10:26 AM, Bill Speirs bill.spe...@gmail.com wrote: I have a basic understanding of OPP... if most of my messages come within a single hour then a few nodes could be storing all of my values, right? You totally lost me on, whether to shard data as per system... Is my schema (one column family per system, and row keys as TimeUUIDType) sharding by system? I thought -- probably incorrectly -- that the row keys are used in the sharding process, not column families. Thanks... Bill- On Wed, Jan 26, 2011 at 4:17 PM, buddhasystem potek...@bnl.gov wrote: Having separate columns for Year, Month etc seems redundant. It's tons more efficient to keep say UTC time in POSIX format (basically integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use Order Preserving Partitioner, and sort out which systems logged later in client. Read up on consequences of using OPP. Whether to shard data as per system depends on how many you have. If more than a few, don't do that, there are memory considerations. Cheers Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964227.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
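A sketch of Nicolas's key scheme: if the timestamp part is zero-padded, lexical key order under an order-preserving partitioner matches chronological order, so range queries over keys work. The exact key format (separator, padding width) is an assumption here, not from the thread:

```java
// Row key = application name + timestamp. Zero-padding the timestamp keeps
// lexical order == chronological order, which is what an order-preserving
// partitioner (OPP) needs for key-range queries to return time ranges.
public class OppKey {
    public static String rowKey(String app, long posixSeconds) {
        // 12 digits covers POSIX-second timestamps well past the year 2100
        return String.format("%s:%012d", app, posixSeconds);
    }

    public static void main(String[] args) {
        String earlier = rowKey("system_x", 1296044159L);
        String later   = rowKey("system_x", 1296044200L);
        System.out.println(earlier);
        System.out.println(earlier.compareTo(later) < 0); // earlier key sorts first
    }
}
```

Without the padding, "app:9" would sort after "app:10" and the range query would return the wrong rows.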
Re: Schema Design
I used the term sharding a bit frivolously. Sorry. It's just that splitting semantically homogeneous data among CFs doesn't scale too well, as each CF is allocated a piece of memory on the server. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Design-tp5964167p5964326.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
RE: Schema Design
Each row can have a maximum of 2 billion columns, which a logging system will probably hit eventually. More importantly, you'll only have 1 row per set of system logs. Every row is stored on the same machine(s), which means you'll definitely not be able to distribute your load very well. From: Bill Speirs [bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 1:23 PM To: user@cassandra.apache.org Subject: Re: Schema Design I like this approach, but I have 2 questions: 1) what are the implications of continually adding columns to a single row? I'm unsure how Cassandra is able to grow. I realize you can have a virtually infinite number of columns, but what are the implications of growing the number of columns over time? 2) maybe it's just a restriction of the CLI, but how do I issue a slice request? Also, what if the start (or end) columns don't exist? I'm guessing it's smart enough to get the columns in that range. Thanks! Bill- On Wed, Jan 26, 2011 at 4:12 PM, David McNelis dmcne...@agentisenergy.com wrote: I would say in that case you might want to try a single column family where the key to the column is the system name. Then, you could name your columns as the timestamp. Then when retrieving information from the data store you can, in your slice request, specify your start column as X and end column as Y. Then you can use the stored column name to know when an event occurred. On Wed, Jan 26, 2011 at 2:56 PM, Bill Speirs bill.spe...@gmail.com wrote: I'm looking to use Cassandra to store log messages from various systems. A log message only has a message (UTF8Type) and a date/time. My thought is to create a column family for each system. The row key will be a TimeUUIDType. Each row will have 7 columns: year, month, day, hour, minute, second, and message. I then have indexes set up for each of the date/time columns. I was hoping this would allow me to answer queries like: What are all the log messages that were generated between X and Y?
The problem is that I can ONLY use the equals operator on these column values. For example, issuing: get system_x where month > 1; gives me this error: No indexed columns present in index clause with operator EQ. The equals operator works as expected though: get system_x where month = 1; What schema would allow me to get date ranges? Thanks in advance... Bill-

* ColumnFamily description *
ColumnFamily: system_x_msg
Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
Row cache size / save period: 0.0/0
Key cache size / save period: 20.0/3600
Memtable thresholds: 1.1671875/249/60
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
Built indexes: [proj_1_msg.646179, proj_1_msg.686f7572, proj_1_msg.6d696e757465, proj_1_msg.6d6f6e7468, proj_1_msg.7365636f6e64, proj_1_msg.79656172]
Column Metadata:
Column Name: year (year) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS
Column Name: month (month) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS
Column Name: second (second) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS
Column Name: minute (minute) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS
Column Name: hour (hour) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS
Column Name: day (day) Validation Class: org.apache.cassandra.db.marshal.IntegerType Index Type: KEYS

-- David McNelis Lead Software Engineer Agentis Energy www.agentisenergy.com o: 630.359.6395 c: 219.384.5143 A Smart Grid technology company focused on helping consumers of energy control an often under-managed resource.
Re: Node going down when streaming data, what next?
When this has happened to me, restarting the node you are trying to move works. I can't remember the exact conditions, but I have also had to restart all nodes in the cluster simultaneously once or twice as well. I would love to know if there is a better way of doing it. On Wednesday, January 26, 2011, buddhasystem potek...@bnl.gov wrote: Bump. I still don't know what is the best thing to do, plz help. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964231.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node going down when streaming data, what next?
Hello, from what I know, you don't really have to restart simultaneously, although of course you don't want to wait. I finally decided to use the removetoken command to actually scratch out the sickly node from the cluster. I'll bootstrap it later when it's fixed. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964804.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Why does cassandra stream data when moving tokens?
Sorry if this sounds silly, but I can't get my brain around this one: if all nodes contain replicas, why does the cluster stream data every time I move or remove a token? If the data is already there, what needs to be streamed? Thanks Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-does-cassandra-stream-data-when-moving-tokens-tp5964839p5964839.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Schema Design
It makes sense that the single row for a system (with a growing number of columns) will reside on a single machine. With that in mind, here is my updated schema: - A single column family for all the messages. The row keys will be the TimeUUID of the message with the following columns: date/time (in UTC POSIX), system name/id (with an index for fast/easy gets), and the actual message payload. - A column family for each system. The row keys will be UTC POSIX time with 1 second (maybe 1 minute) bucketing, and the column names will be the TimeUUID of any messages that were logged during that time bucket. My only hesitation with this design is that buddhasystem warned that each column family is allocated a piece of memory on the server. I'm not sure what the implications of this are and/or if this would be a problem if I had a number of systems on the order of hundreds. Thanks... Bill- On 01/26/2011 06:51 PM, Shu Zhang wrote: Each row can have a maximum of 2 billion columns, which a logging system will probably hit eventually. More importantly, you'll only have 1 row per set of system logs. Every row is stored on the same machine(s), which means you'll definitely not be able to distribute your load very well. From: Bill Speirs [bill.spe...@gmail.com] Sent: Wednesday, January 26, 2011 1:23 PM To: user@cassandra.apache.org Subject: Re: Schema Design I like this approach, but I have 2 questions: 1) what are the implications of continually adding columns to a single row? I'm unsure how Cassandra is able to grow. I realize you can have a virtually infinite number of columns, but what are the implications of growing the number of columns over time? 2) maybe it's just a restriction of the CLI, but how do I issue a slice request? Also, what if the start (or end) columns don't exist? I'm guessing it's smart enough to get the columns in that range. Thanks!
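The second column family in Bill's updated schema (rows keyed by a UTC POSIX time bucket, columns named by TimeUUID) needs one small piece of client-side logic: computing which bucket a message falls into. A minimal sketch, with the class name `TimeBucket` being hypothetical:

```java
// Hypothetical bucketing helper for the row-key-per-time-bucket scheme:
// a message logged at epochMillis lands in the row whose key is the
// bucket's start time in POSIX seconds.
public class TimeBucket {
    public static long bucketKey(long epochMillis, int bucketSeconds) {
        long epochSeconds = epochMillis / 1000L;
        // Round down to the start of the bucket.
        return (epochSeconds / bucketSeconds) * bucketSeconds;
    }
}
```

A range query over a time window then becomes a multi-get over the bucket keys between the window's start and end, with a column slice inside the first and last buckets.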
Re: Schema Design
Ah, sweet... thanks for the link! Bill- On 01/26/2011 08:20 PM, buddhasystem wrote: Bill, it's all explained here: http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size . Watch the number of CFs and the memtable sizes. In my experience, this all matters.
RE: Why does cassandra stream data when moving tokens?
Thanks, I'll look at the configuration again. In the meantime, I can't move the first node in the ring (after I removed the previous node's token) -- it throws an exception and says data is being streamed to it -- however, this is not what netstats says! Weirdness continues... Maxim -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-does-cassandra-stream-data-when-moving-tokens-tp5964839p5964883.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Schema Design
I am also working on a system storing logs from hundreds of systems. In my scenario, most queries look like this: let's look at the login logs (category EQ) of that proxy (host EQ) between this Monday and Wednesday (time range). My data model is like this: only 1 CF; that's enough for this scenario. Group the logs from each host and day into one row; the key format is hostname.category.date. Store each log entry as a super column; the super column name is the TimeUUID of the log, and each attribute is a column. Then this query can be done as 3 GETs, with no need to do a key range scan, so I can use RP instead of OPP. If I used OPP, I would have to worry about load balancing myself. I hate that. However, if I need to do a time range access, I can still use a column slice. An additional benefit is that I can clean out old logs very easily. We only store logs for 1 year, and just deleting by key does this job well. I think storing all logs for a host in a single row is not a good choice, for 2 reasons: 1, too few keys, so your data will not distribute well. 2, the data under a key always grows, so Cassandra has to do more SSTable compaction. -----Original Message----- From: William R Speirs [mailto:bill.spe...@gmail.com] Sent: Thursday, January 27, 2011 9:15 AM To: user@cassandra.apache.org Subject: Re: Schema Design It makes sense that the single row for a system (with a growing number of columns) will reside on a single machine. With that in mind, here is my updated schema: - A single column family for all the messages. The row keys will be the TimeUUID of the message with the following columns: date/time (in UTC POSIX), system name/id (with an index for fast/easy gets), and the actual message payload. - A column family for each system. The row keys will be UTC POSIX time with 1 second (maybe 1 minute) bucketing, and the column names will be the TimeUUID of any messages that were logged during that time bucket. My only hesitation with this design is that buddhasystem warned that each column family is allocated a piece of memory on the server.
I'm not sure what the implications of this are and/or if this would be a problem if I had a number of systems on the order of hundreds. Thanks... Bill-
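The hostname.category.date row-key scheme described above turns a multi-day range query into one direct GET per day, which is what lets the RandomPartitioner be kept while load stays balanced. A sketch of the key generation (class and method names are hypothetical):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the hostname.category.date row-key scheme:
// a query spanning N days becomes N direct GETs by key, so no key
// range scan (and no OPP) is needed.
public class LogKeys {
    public static List<String> keysForRange(String host, String category,
                                            LocalDate from, LocalDate to) {
        List<String> keys = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            // e.g. "proxy1.login.2011-01-24"
            keys.add(host + "." + category + "." + d);
        }
        return keys;
    }
}
```

Expiring old data is equally simple under this scheme: delete the rows whose date component is older than the retention window.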
Re: the java client problem
I have no clue about this error... look into the log files. They might reveal something. Anyone else can help here? 2011/1/27 Raoyixuan (Shandy) raoyix...@huawei.com It shows an error, I put it in the attachment From: Ashish [mailto:paliwalash...@gmail.com] Sent: Wednesday, January 26, 2011 10:31 PM To: user@cassandra.apache.org Subject: Re: the java client problem click on the loadSchema() button in the right panel :) 2011/1/26 Raoyixuan (Shandy) raoyix...@huawei.com I had found loadSchemaFromYAML by jconsole. How do I load the schema? From: Ashish [mailto:paliwalash...@gmail.com] Sent: Friday, January 21, 2011 8:10 PM To: user@cassandra.apache.org Subject: Re: the java client problem check cassandra-install-dir/conf/cassandra.yaml, start cassandra, connect via jconsole, find MBeans - org.apache.cassandra.db - StorageService (http://wiki.apache.org/cassandra/StorageService) - Operations - loadSchemaFromYAML, load the schema, and then try the example again. HTH ashish 2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com Which schema is it? From: Ashish [mailto:paliwalash...@gmail.com] Sent: Friday, January 21, 2011 7:57 PM To: user@cassandra.apache.org Subject: Re: the java client problem you are missing the column family in your keyspace. If you are using the default definitions of the schema shipped with cassandra, ensure to load the schema from JMX.
thanks ashish 2011/1/21 raoyixuan (Shandy) raoyix...@huawei.com I exec the code as below by hector client:

package com.riptano.cassandra.hector.example;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class InsertSingleColumn {
    private static StringSerializer stringSerializer = StringSerializer.get();

    public static void main(String[] args) throws Exception {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "*.*.*.*:9160");
        Keyspace keyspaceOperator = HFactory.createKeyspace("Shandy", cluster);
        try {
            Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, StringSerializer.get());
            mutator.insert("jsmith", "Standard1", HFactory.createStringColumn("first", "John"));
            ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspaceOperator);
            columnQuery.setColumnFamily("Standard1").setKey("jsmith").setName("first");
            QueryResult<HColumn<String, String>> result = columnQuery.execute();
            System.out.println("Read HColumn from cassandra: " + result.get());
            System.out.println("Verify on CLI with: get Keyspace1.Standard1['jsmith']");
        } catch (HectorException e) {
            e.printStackTrace();
        }
        cluster.getConnectionManager().shutdown();
    }
}

And it shows the error:

me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily Standard1)
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:88)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:89)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:142)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:100)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:106)
    at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:149)
    at me.prettyprint.cassandra.model.MutatorImpl$2.doInKeyspace(MutatorImpl.java:146)
    at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:65)
    at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:146)
    at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:55)
    at
repair cause large number of SSTABLEs
i ran out of file handles on the repairing node after doing nodetool repair - strange as i have never had this issue until using 0.7.0 (but i should say that i have not truly tested 0.7.0 until now.) upped the number of file handles, removed data, restarted nodes, then restarted my test. waited a little while. i have two keyspaces on the cluster, so i checked the number of SSTables in one of them before nodetool repair and i see 36 Data.db files, spread over 11 column families. very reasonable. after running nodetool repair i have over 900 Data.db files, immediately! now after waiting several hours i have over 1500 Data.db files. out of these i have 95 compacted files. lsof reports 803 files in use by cassandra for the Queues keyspace ...

[cassandra@kv-app02 ~]$ /usr/sbin/lsof -p 32645|grep Data.db|grep -c Queues
803

this doesn't sound right to me. checking the server log i see a lot of these messages:

ERROR [RequestResponseStage:14] 2011-01-26 17:00:29,493 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.fastRemove(ArrayList.java:441)
    at java.util.ArrayList.remove(ArrayList.java:424)
    at com.google.common.collect.AbstractMultimap.remove(AbstractMultimap.java:219)
    at com.google.common.collect.ArrayListMultimap.remove(ArrayListMultimap.java:60)
    at org.apache.cassandra.net.MessagingService.responseReceivedFrom(MessagingService.java:436)
    at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:40)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

and a lot of these:

ERROR [ReadStage:809] 2011-01-26 21:48:01,047 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.ArrayIndexOutOfBoundsException
ERROR [ReadStage:809] 2011-01-26 21:48:01,047 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[ReadStage:809,5,main]
java.lang.ArrayIndexOutOfBoundsException

and some more like this:

ERROR [ReadStage:15] 2011-01-26 20:59:14,695 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.lang.ArrayIndexOutOfBoundsException: 6
    at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
    at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
    at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
    at org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:98)
    at org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:95)
    at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:334)
    at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
    at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
    at org.apache.cassandra.db.Table.getRow(Table.java:384)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
RE: the java client problem
I had solved this problem. I created the column family first, and then it worked. From: Ashish [mailto:paliwalash...@gmail.com] Sent: Thursday, January 27, 2011 1:16 PM To: user@cassandra.apache.org Subject: Re: the java client problem I have no clue about this error... look into the log files. They might reveal something. Anyone else can help here?
Generating tokens for Cassandra cluster with ByteOrderedPartitioner
Hey, Can anyone suggest how to manually generate tokens for a Cassandra 0.7.0 cluster while the ByteOrderedPartitioner is being used? Thanks in advance. -- Best regards, Matthew Tovbin.
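One way people answer this is to compute evenly spaced tokens over a fixed-width hex key space. The sketch below (not an official tool; class and method names are assumptions) assumes 16-byte keys. Note the caveat: with ByteOrderedPartitioner, evenly spaced tokens only balance load if your keys themselves are uniformly distributed over that space; otherwise you should pick tokens from your actual key distribution.

```java
import java.math.BigInteger;

// Sketch: N evenly spaced initial tokens for ByteOrderedPartitioner,
// expressed as fixed-width (32 hex digit) strings over a 16-byte key space.
public class BopTokens {
    public static String token(int i, int nodeCount) {
        BigInteger space = BigInteger.ONE.shiftLeft(128);        // 2^128 keys
        BigInteger t = space.multiply(BigInteger.valueOf(i))
                            .divide(BigInteger.valueOf(nodeCount));
        return String.format("%032x", t);                        // zero-padded hex
    }
}
```

For a 4-node cluster this yields tokens at 0, 1/4, 1/2 and 3/4 of the key space.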
Using Cassandra for storing large objects
Anyone using Cassandra for storing a large number (millions) of large (mostly immutable) objects (200KB-5MB each)? I would like to understand the experience in general, considering that Cassandra is not considered a good fit for large objects. https://issues.apache.org/jira/browse/CASSANDRA-265 Thanks, Naren
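A common workaround for large objects is to split each blob into fixed-size chunks stored as separate columns and reassemble them on read. A minimal sketch (the class name and the 64KB chunk size are assumptions, not recommendations):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: chunk a blob into fixed-size pieces (one column per chunk)
// and reassemble on read, so no single column holds a multi-MB value.
public class BlobChunker {
    public static final int CHUNK_SIZE = 64 * 1024; // assumed chunk size

    public static List<byte[]> split(byte[] blob) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < blob.length; off += CHUNK_SIZE) {
            int end = Math.min(off + CHUNK_SIZE, blob.length);
            chunks.add(Arrays.copyOfRange(blob, off, end));
        }
        return chunks;
    }

    public static byte[] join(List<byte[]> chunks) {
        int total = chunks.stream().mapToInt(c -> c.length).sum();
        byte[] out = new byte[total];
        int off = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, off, c.length);
            off += c.length;
        }
        return out;
    }
}
```

Chunking keeps individual column values small (which helps memtable and compaction behavior) at the cost of a multi-column read per object.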