Re: how to implement a client with off-heap memory
The Thrift client is just auto-generated code; if you really wanted to, you could change or override it to modify the SerDe when it pulls things off the wire. Not sure if this does what you are looking for: https://issues.apache.org/jira/browse/CASSANDRA-2478

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 29/10/2012, at 4:59 PM, Manu Zhang owenzhang1...@gmail.com wrote: Hi all, I've been writing a client on the Cassandra Thrift API. The client reads almost 1 GB of data into the JVM heap, so its performance suffers from GC pauses. To reduce latency, I'm currently thinking about holding the data in off-heap memory (just like the row cache does) and managing it myself. The problem is that with the Thrift API I read all the data as List<KeySlice> directly into the heap. Is there a workaround? Any other suggestions would also be appreciated. Thanks!
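As a rough illustration of the off-heap approach Manu describes (parking the payload bytes in direct buffers so they stay out of the GC-managed heap), here is a minimal Java sketch. The class and method names are illustrative, not part of any Cassandra or Thrift API, and note that a copy from heap to off-heap still happens on each put:

    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;

    // Minimal off-heap value store: only the small ByteBuffer handles live
    // on-heap; the payload bytes live in direct (off-heap) memory.
    public class OffHeapStore {
        private final Map<String, ByteBuffer> index = new HashMap<String, ByteBuffer>();

        public void put(String key, byte[] value) {
            // allocateDirect places the buffer outside the Java heap
            ByteBuffer buf = ByteBuffer.allocateDirect(value.length);
            buf.put(value);
            buf.flip();
            index.put(key, buf);
        }

        public byte[] get(String key) {
            ByteBuffer buf = index.get(key);
            if (buf == null) return null;
            byte[] out = new byte[buf.remaining()];
            buf.duplicate().get(out); // duplicate() leaves the stored position intact
            return out;
        }
    }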
Re: High bandwidth usage between datacenters for cluster
Outbound messages for other DCs are grouped, and a single instance is sent to a single node in the remote DC. The remote node then forwards the message on to the other recipients in its DC. All remote DC nodes will, however, reply directly to the coordinator.

"Normally this isn't an issue for us, but at times we are writing approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic across the WAN to all the Cassandra DR servers." Can you break the traffic down by port and direction?

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 28/10/2012, at 12:18 PM, Bryce Godfrey bryce.godf...@azaleos.com wrote: Network topology with the topology file filled out is already the configuration we are using.

From: sankalp kohli [mailto:kohlisank...@gmail.com] Sent: Thursday, October 25, 2012 11:55 AM To: user@cassandra.apache.org Subject: Re: High bandwidth usage between datacenters for cluster

Use placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and also fill in the topology.properties file. This will tell Cassandra that you have two DCs. You can verify that by looking at the output of the ring command. If your DCs are set up properly, only one request will go over the WAN, though the responses from all nodes in the other DC will go over the WAN.

On Thu, Oct 25, 2012 at 10:44 AM, Bryce Godfrey bryce.godf...@azaleos.com wrote: We have a 5 node cluster, with a matching 5 nodes for DR in another data center. With a replication factor of 3, does the node I send a write to attempt to send it to the 3 servers in the DR also? Or does it send it to one and let it replicate locally in the DR environment to save bandwidth across the WAN? Normally this isn't an issue for us, but at times we are writing approximately 1MB a sec of data, and seeing a corresponding 3MB of traffic across the WAN to all the Cassandra DR servers. If my assumptions are right, is this configurable somehow, for writing to one node and letting it do local replication? We are on 1.1.5. Thanks
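A minimal sketch of the two pieces sankalp mentions, with illustrative node addresses, DC/rack names, and keyspace name: the keyspace definition (cassandra-cli) and the topology file.

    create keyspace MyKeyspace
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options = {DC1 : 3, DC2 : 3};

    # cassandra-topology.properties (one line per node, plus a default)
    10.1.0.1=DC1:RAC1
    10.1.0.2=DC1:RAC1
    10.2.0.1=DC2:RAC1
    10.2.0.2=DC2:RAC1
    default=DC1:RAC1

With this in place, nodetool ring should show each node tagged with its DC and rack, and a write coordinated in DC1 sends a single copy over the WAN to one node in DC2, which then forwards it locally.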
Re: Roadmap/Changelog?
For committed changes: https://github.com/apache/cassandra/blob/trunk/CHANGES.txt
For interesting changes per release: https://github.com/apache/cassandra/blob/trunk/NEWS.txt
For the roadmap: https://issues.apache.org/jira/browse/CASSANDRA#selectedTab=com.atlassian.jira.plugin.system.project%3Aroadmap-panel

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 28/10/2012, at 11:56 AM, Timmy Turner timm.t...@gmail.com wrote: Hi everyone, I wrote a library/extension for Cassandra 0.8 a while back and would like to update it to the current version now, but I can't really find any articles on what has changed in Cassandra. I read the changelog, but those points are too detailed, and it's hard to determine what impact they really have on the functionality. The last things I remember are that CQL v3 was scheduled for 1.1 and that supercolumns would be removed and replaced by compound columns (and included in CQL). Has that already happened? It would also be interesting to know whether there is any kind of roadmap for Cassandra, for new features or functionality that may be introduced in upcoming versions, or features that may be removed in future versions. Thanks!
Re: compression
Hi! Thanks Aaron! Today I restarted Cassandra on that node and ran scrub again; now it is fine. I am worried, though, that if I decide to change another CF to use compression I will hit that issue again. Any clue how to avoid it? Thanks.

*Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956

On Wed, Sep 26, 2012 at 3:40 AM, aaron morton aa...@thelastpickle.com wrote: Check the logs on nodes 2 and 3 to see if the scrub started. The logs on node 1 will be a good help with that. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 24/09/2012, at 10:31 PM, Tamar Fraenkel ta...@tok-media.com wrote: Hi! I ran

    UPDATE COLUMN FAMILY cf_name WITH compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};

I then ran on all my nodes (3):

    sudo nodetool -h localhost scrub tok cf_name

I have replication factor 3. The size of the data on disk was cut in half on the first node, and in JMX I can see that the compression ratio is indeed 0.46. But on nodes 2 and 3 nothing happened: in JMX I can see that the compression ratio is 0 and the size of the files on disk stayed the same. In the CLI:

    ColumnFamily: cf_name
      Key Validation Class: org.apache.cassandra.db.marshal.UUIDType
      Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
      Row cache size / save period in seconds / keys to save : 0.0/0/all
      Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
      Key cache size / save period in seconds: 20.0/14400
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Bloom Filter FP chance: default
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        chunk_length_kb: 64
        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Can anyone help? Thanks

*Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956

On Mon, Sep 24, 2012 at 8:37 AM, Tamar Fraenkel ta...@tok-media.com wrote: Thanks all, that helps. Will start with one or two CFs and let you know the effect.

On Sun, Sep 23, 2012 at 8:21 PM, Hiller, Dean dean.hil...@nrel.gov wrote: As well, your unlimited column names may all have the same prefix, right? Like accounts.rowkey56, accounts.rowkey78, etc., so the "accounts" part gets a ton of compression then. Later, Dean

From: Tyler Hobbs ty...@datastax.com To: user@cassandra.apache.org Date: Sunday, September 23, 2012 11:46 AM Subject: Re: compression

...column metadata, you're still likely to get a reasonable amount of compression. This is especially true if there is some amount of repetition in the column names, values, or TTLs in wide rows. Compression will almost always be beneficial unless you're already somehow CPU bound or are using large column values that are high in entropy, such as pre-compressed or encrypted data.
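Pulling the thread together, the sequence that eventually worked for Tamar was: change the schema once, rebuild the SSTables on every node, then verify via JMX. A sketch of that sequence, using the keyspace tok and CF cf_name from the thread:

    UPDATE COLUMN FAMILY cf_name WITH compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};

    # then, on each node in turn (restart the node first if scrub appears to do nothing):
    sudo nodetool -h localhost scrub tok cf_name

    # verify: the CompressionRatio attribute for the column family under
    # org.apache.cassandra.db in JMX should move from 0 to roughly 0.4-0.5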
Re: Hinted Handoff storage inflation
"With both data centers functional, the test takes just a few minutes to run; with one data center down, 15x the amount of time." Could you provide the numbers? It's easier to get a feel for how the throughput is dropping. Does the latency reported by nodetool cfstats change? I'm also interested to know how long hints were collected for. Each coordinator will be writing three hints, which will slow down the other writes it needs to do.

"but I found that the storage overhead was the same regardless of the size of the batch mutation (i.e., 5 vs 25 mutations made no difference)." Batch size makes no difference. Each row mutation is treated as an individual command; the batch is simply a way to reduce network calls.

"Each write is new data only (no overwrites). Each mutation adds a row to one column family with a column containing ~100 bytes of data and a new row to another column family with a SuperColumn containing 2x17KiB payloads." I cannot remember anyone raising this sort of issue about HH before. It may be that no one has looked at how that level of hints is handled. Could you reproduce the problem with a smaller test case?

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 27/10/2012, at 7:56 AM, Mattias Larsson mlars...@yahoo-inc.com wrote: On Oct 24, 2012, at 6:05 PM, aaron morton wrote: Hints store the columns, row key, KS name, and CF id(s) for each mutation to each node, whereas an executed mutation will store the most recent columns collated with others under the same row key. So depending on the type of mutation, hints will take up more space. The worst case would be lots of overwrites. After that, writing a small amount of data to many rows would result in a lot of the serialised space being devoted to row keys, KS name, and CF id. 16GB is a lot though. What was the write workload like?

Each write is new data only (no overwrites). Each mutation adds a row to one column family with a column containing about ~100 bytes of data and a new row to another column family with a SuperColumn containing 2x17KiB payloads. These are sent in batches with several in them, but I found that the storage overhead was the same regardless of the size of the batch mutation (i.e., 5 vs 25 mutations made no difference). A total of 1,000,000 mutations like these are sent over the duration of the test.

You can get an estimate of the number of keys in the Hints CF using nodetool cfstats. Some metrics in JMX will also tell you how many hints are stored.

"This has a huge impact on write performance as well." Yup. Hints are added to the same mutation thread pool as normal mutations. They are processed async to the mutation request, but they still take resources to store. You can adjust how long hints are collected for with max_hint_window_in_ms in the yaml file. How long did the test run for?

With both data centers functional, the test takes just a few minutes to run; with one data center down, 15x the amount of time. /dml
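The hint window Aaron mentions is set in cassandra.yaml. A sketch with illustrative values (both settings exist in the 1.1 yaml):

    # cassandra.yaml
    hinted_handoff_enabled: true
    # stop collecting hints for a dead node after it has been down this long:
    max_hint_window_in_ms: 3600000   # one hour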
Re: compression
I have no clue. I never did it, even though I am planning to do so.

1 - Did you just spend a month with your cluster in an unstable state? Did you have any issues during this time related to the transitional state of your cluster?

I am currently storing counters with: row = objectId, column name = date#event, data = counter (date format 20121029).

2 - Is it a good idea to compress this kind of data?

I am looking at using composite columns.

3 - What are the benefits of using a column name like CompositeType(UTF8Type, UTF8Type) over a simple UTF8 column with event and date separated by a '#', as I am doing right now?

4 - Would compression be a good idea in this case?

Thanks for your help on any of these 4 points :). Alain

2012/10/29 Tamar Fraenkel ta...@tok-media.com
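For question 3, a sketch of what the composite-comparator version of Alain's CF could look like in cassandra-cli (the CF name is illustrative). The practical benefit over "event#date" strings is that each component keeps its own type and sort order, and slice queries on the first component need no string parsing:

    create column family event_counters
      with key_validation_class = UTF8Type
      and comparator = 'CompositeType(UTF8Type,UTF8Type)'
      and default_validation_class = CounterColumnType;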
CQL3: Unknown property 'comparator'?
Does CQL3 not allow dynamic columns (column names) any more?
Re: CQL3: Unknown property 'comparator'?
CQL3 does absolutely allow dynamic column families, but does it differently from CQL2. See http://www.datastax.com/dev/blog/cql3-for-cassandra-experts. -- Sylvain On Mon, Oct 29, 2012 at 12:34 PM, Timmy Turner timm.t...@gmail.com wrote: Does CQL3 not allow dynamic columns (column names) any more?
Re: CQL3: Unknown property 'comparator'?
Thank you! That article clears up a lot of my confusion about the changes between CQL 2 and 3, since I was wondering how to access and manipulate CompositeType/DynamicCompositeType columns through CQL. So does this mean that in CQL 3 an explicit schema is absolutely mandatory? It's now impossible (within CQL) to add new (non-primary-key) columns for individual rows implicitly with DML queries (insert/update)?

2012/10/29 Sylvain Lebresne sylv...@datastax.com:
Re: ColumnFamilyInputFormat - error when column name is UUID
Answering myself: it seems we can't have non-type-1 UUIDs in column names. I used the UTF8 comparator and saved my UUIDs as strings, and it worked.

2012/10/29 Marcelo Elias Del Valle mvall...@gmail.com

Hello, I am using ColumnFamilyInputFormat the same way it's described in this example: https://github.com/apache/cassandra/blob/trunk/examples/hadoop_word_count/src/WordCount.java#L215

I have been able to successfully process data in Cassandra using Hadoop. However, as this solution doesn't let me filter which data in Cassandra I want to process, I decided to create a query column family listing the data I want to process in Hadoop. This column family is as follows: row key: MM, column name: UUID (user ID), column value: timestamp (last processed date).

The problem is, when I run Hadoop, I get the exception below. Is there any limitation on having UUIDs as column names? I am generating my user IDs with java.util.UUID.randomUUID() for now. I could change the method later, but only type 1 UUIDs are 16 bytes long, aren't they?

    java.lang.RuntimeException: InvalidRequestException(why:UUIDs must be exactly 16 bytes)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:391)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:397)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:323)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:188)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
    Caused by: InvalidRequestException(why:UUIDs must be exactly 16 bytes)
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12254)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:683)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:667)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:356)
    ... 11 more

Best regards, -- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Re: ColumnFamilyInputFormat - error when column name is UUID
Marcelo, when I've had this problem it was usually because the UUID value being passed to Cassandra did not correspond to an exact UUID value. For that I used UUID.randomUUID() a lot (to generate a valid UUID) and UUID.fromString("081f4500-047e-401c-8c0b-a41fefd099d7") to turn a String into a valid UUID. Since we have two keyspaces in Cassandra (dmp_input - Astyanax, and dmp - PlayOrm), these frameworks may treat UUID keys differently (in the implementation we did), so I think the solution you found is valid. (Sorry for not spotting the problem earlier, if this is indeed your case...) Regards, André

2012/10/29 Marcelo Elias Del Valle mvall...@gmail.com
Re: ColumnFamilyInputFormat - error when column name is UUID
Hmm, this raises the question of which UUID libraries others are using. I know this one generates type 1 UUIDs with two longs, so it is 16 bytes: http://johannburkard.de/software/uuid/

Thanks, Dean

From: Marcelo Elias Del Valle mvall...@gmail.com Reply-To: user@cassandra.apache.org Date: Monday, October 29, 2012 1:17 PM To: user@cassandra.apache.org Subject: Re: ColumnFamilyInputFormat - error when column name is UUID

Answering myself: it seems we can't have non-type-1 UUIDs in column names. I used the UTF8 comparator and saved my UUIDs as strings, and it worked.
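To make the version distinction concrete, here is a minimal hand-rolled Java sketch of a version 1 (time-based) UUID next to java.util.UUID.randomUUID(), which produces version 4. This is for illustration only; a real library (such as the one Dean links) handles the clock sequence and node bits far more carefully:

    import java.security.SecureRandom;
    import java.util.UUID;

    public class TimeUuidSketch {
        // milliseconds between the UUID epoch (1582-10-15) and the Unix epoch
        private static final long UUID_EPOCH_OFFSET_MS = 12219292800000L;
        private static final SecureRandom RANDOM = new SecureRandom();

        public static UUID timeBased() {
            long ts = (System.currentTimeMillis() + UUID_EPOCH_OFFSET_MS) * 10000L; // 100ns units
            long msb = (ts << 32)                        // time_low
                     | ((ts >>> 16) & 0xFFFF0000L)       // time_mid
                     | 0x1000L                           // version 1
                     | ((ts >>> 48) & 0x0FFFL);          // time_hi
            long lsb = 0x8000000000000000L               // IETF variant
                     | ((long) RANDOM.nextInt(1 << 14) << 48)      // clock sequence
                     | (RANDOM.nextLong() & 0x0000FFFFFFFFFFFFL);  // random node field
            return new UUID(msb, lsb);
        }

        public static void main(String[] args) {
            System.out.println(UUID.randomUUID().version()); // 4 -> rejected by TimeUUIDType
            System.out.println(timeBased().version());      // 1 -> accepted
        }
    }

Both versions are 16 bytes; a TimeUUIDType comparator additionally requires version 1 so it can sort by the embedded timestamp.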
Re: Simulating a failed node
Thanks, extremely helpful. The key bit was that I wasn't flushing the old keyspace before re-running the stress test, so I was stuck at RF = 1 from a previous run despite passing RF = 2 to the stress tool.

On Sun, Oct 28, 2012 at 2:49 AM, Peter Schuller peter.schul...@infidyne.com wrote:

"Operation [158320] retried 10 times - error inserting key 0158320 ((UnavailableException))"

This means that at the point where the Thrift request to write data was handled, the coordinator node (the one your client is connected to) believed that, among the replicas responsible for the key, too many were down to satisfy the consistency level. The most likely causes are that you're in fact not using RF 2 (e.g., the RF is really 1 for the keyspace you're inserting into), or that you're in fact not using ONE.

"I'm sure my naive setup is flawed in some way, but what I was hoping for was when the node went down it would fail to write to the downed node and instead write to one of the other nodes in the cluster. So question is why are writes failing even after a retry?"

It might be that the stress client doesn't pool connections. Writes always go to all responsible replicas that are up, and when enough return (according to the consistency level), the insert succeeds. If replicas fail to respond you may get a TimeoutException. UnavailableException means it didn't even try, because it didn't have enough replicas to write to. (Note though: reads are a bit of a different story, and if you want to test behavior when nodes go down I suggest including that. See CASSANDRA-2540 and CASSANDRA-3927.)

-- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
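A quick way to verify the replication factor before re-running the stress test is a check in cassandra-cli; the keyspace name and exact output shape below are illustrative and vary slightly by version:

    $ cassandra-cli -h localhost
    [default@unknown] show keyspaces;
    Keyspace: Keyspace1:
      Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
      Options: [replication_factor:2]

If replication_factor still reads 1, the keyspace was created by an earlier run and needs to be dropped before the new RF takes effect.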
Re: ColumnFamilyInputFormat - error when column name is UUID
Dean, are type 1 UUIDs the best ones to use if I want to avoid conflicts? I saw this page: http://en.wikipedia.org/wiki/Universally_unique_identifier Is the only problem with type 1 UUIDs that they are not opaque? I know there is one kind of UUID that can generate two equal values if you generate them in the same millisecond, but I guess I was confusing them... Best regards, Marcelo Valle.

2012/10/29 Hiller, Dean dean.hil...@nrel.gov

-- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Re: ColumnFamilyInputFormat - error when column name is UUID
Err... Guess you replied in Portuguese to the list :D

2012/10/29 Andre Tavares andre...@gmail.com

-- Marcelo Elias Del Valle http://mvalle.com - @mvallebr
Re: Benefits of adding nodes to the cluster
This is how Cassandra scales: more nodes mean more capacity and higher throughput. Thank you, Andrey

On Mon, Oct 29, 2012 at 2:57 PM, Roshan codeva...@gmail.com wrote: Hi All, This may be a silly question, but what kind of benefits can we get by adding new nodes to the cluster? One is presumably high availability. Any others? /Roshan
RE: Hinted Handoff runs every ten minutes
I'm running 1.1.5; the bug says it's fixed in 1.0.9/1.1.0. How can I check to see why it keeps running HintedHandoff? Steve -Original Message- From: Brandon Williams [mailto:dri...@gmail.com] Sent: Wednesday, October 24, 2012 4:56 AM To: user@cassandra.apache.org Subject: Re: Hinted Handoff runs every ten minutes On Sun, Oct 21, 2012 at 6:44 PM, aaron morton aa...@thelastpickle.com wrote: I *think* this may be ghost rows which have not being compacted. You would be correct in the case of 1.0.8: https://issues.apache.org/jira/browse/CASSANDRA-3955 -Brandon
Re: Hinted Handoff runs every ten minutes
On 29.10.2012 23:24, Stephen Pierce wrote: "I'm running 1.1.5; the bug says it's fixed in 1.0.9/1.1.0. How can I check to see why it keeps running HintedHandoff?"

You have tombstones in system.HintsColumnFamily. Use the list command in cassandra-cli to check.
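A sketch of that check in cassandra-cli; if rows for remote endpoints keep showing up here (even as empty, tombstoned rows awaiting compaction), the node will keep scheduling handoff:

    $ cassandra-cli -h localhost
    [default@unknown] use system;
    [default@system] list HintsColumnFamily;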
idea drive layout - 4 drives + RAID question
For a server with 4 drive slots only, I'm thinking either:

- OS (1 drive)
- Commit log (1 drive)
- Data (2 drives, software RAID 0)

or:

- OS + data (3 drives, software RAID 0)
- Commit log (1 drive)

or something else? Also, if I can spare the wasted storage, would RAID 10 for the Cassandra data improve read performance and have no effect on write performance? Thank you!
Re: compression
"Any clue how to avoid it?" Not really sure what went wrong. Diagnosing that sort of problem usually takes access to the running node and time to poke around and see what it does in response to various things. Rebooting works for Windows 95, and Cassandra is not that different.

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 29/10/2012, at 9:12 PM, Tamar Fraenkel ta...@tok-media.com wrote:
Re: idea drive layout - 4 drives + RAID question
I'm not sure the RAID 0 gets you anything other than headaches should one of the drives fail. You can already distribute the individual Cassandra column families across different drives by setting up symlinks to the individual folders.

2012/10/30 Ran User ranuse...@gmail.com:
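A sketch of the symlink approach Timmy describes (all paths, keyspace, and CF names here are illustrative; Cassandra 1.1 keeps one directory per column family):

    # stop the node, move one CF's data to the second disk, link it back
    sudo service cassandra stop
    sudo mv /var/lib/cassandra/data/MyKeyspace/MyBigCF /mnt/disk2/MyBigCF
    sudo ln -s /mnt/disk2/MyBigCF /var/lib/cassandra/data/MyKeyspace/MyBigCF
    sudo service cassandra start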
Re: CQL3: Unknown property 'comparator'?
More background: http://www.datastax.com/dev/blog/thrift-to-cql3

"So does this mean that in CQL 3 an explicit schema is absolutely mandatory?" Not really; it sort of depends on your view. Let's say this is a schema-free CF definition in the CLI:

    create column family clicks
      with key_validation_class = UTF8Type
      and comparator = DateType
      and default_validation_class = UTF8Type;

It could be used for wide rows with lots of columns, where the name is a date. As the article above says, this CQL 3 DDL is equivalent:

    CREATE TABLE clicks (
      key text,
      column1 timestamp,
      value text,
      PRIMARY KEY (key, column1)
    ) WITH COMPACT STORAGE;

This creates a single row inside C*, where the column name is a date. The difference is that CQL 3 pivots this one storage-engine row into multiple CQL 3 rows (see the article). So far so good. Let's add some schema:

    CREATE TABLE clicks (
      user_id text,
      click_time timestamp,
      click_url text,
      PRIMARY KEY (user_id, click_time)
    ) WITH COMPACT STORAGE;

That's functionally the same but has some more schema in it. It tells CQL 3 that the label to use for the name of a column is click_time; previously the label was column1.

"It's now impossible (within CQL) to add new (non-primary-key) columns for individual rows implicitly with DML queries (insert/update)?" Is your use case covered in the article above?

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 30/10/2012, at 2:31 AM, Timmy Turner timm.t...@gmail.com wrote:
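To make the pivot concrete, a small sketch against the second clicks table above (the values are made up): each INSERT adds what Thrift would call a new column in the same storage row, and a range slice becomes an ordinary WHERE clause.

    INSERT INTO clicks (user_id, click_time, click_url)
      VALUES ('user42', '2012-10-30 09:15:00', 'http://example.com/a');
    INSERT INTO clicks (user_id, click_time, click_url)
      VALUES ('user42', '2012-10-30 09:16:30', 'http://example.com/b');

    SELECT click_time, click_url FROM clicks
      WHERE user_id = 'user42' AND click_time > '2012-10-30 09:00:00';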
Re: idea drive layout - 4 drives + RAID question
I was hoping to achieve approximately 2x IO (write and read) performance via RAID 0 (by accepting a higher MTBF). Do you believe the performance gains of RAID 0 are much lower, and/or not worth the increased server failure rate? From my understanding, RAID 10 would achieve the read performance benefits of RAID 0, but not the write benefits. I'm also considering RAID 10 to maximize server IO performance. Currently, we're working with 1 CF. Thank you

On Mon, Oct 29, 2012 at 11:51 PM, Timmy Turner timm.t...@gmail.com wrote:
Re: idea drive layout - 4 drives + RAID question
Have you considered running RAID 10 for the data drives to improve MTBF? On one hand Cassandra is handling redundancy itself; on the other hand, reducing the frequency of dealing with failed nodes is attractive if it's cheap (switching RAID levels to 10). We have no experience with software RAID (we have always used hardware RAID with BBU). I'm assuming software RAID 1 or 10 (the mirroring part) is inherently reliable (perhaps minus some edge cases).

On Tue, Oct 30, 2012 at 1:07 AM, Tupshin Harper tups...@tupshin.com wrote: I would generally recommend 1 drive for OS and commit log and a 3-drive RAID 0 for data. The RAID does give you a good performance benefit, and it can be convenient to have the OS on a separate drive for configuration ease and better MTBF. -Tupshin

On Oct 29, 2012 8:56 PM, Ran User ranuse...@gmail.com wrote:
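For reference, a sketch of building either array with mdadm (the device names are assumptions; adjust to your hardware):

    # RAID 0 across three data disks:
    sudo mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
    # or RAID 10 across four disks (half the usable space; survives one disk per mirror pair):
    sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    sudo mkfs.ext4 /dev/md0
    sudo mount /dev/md0 /var/lib/cassandra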
Throughput decreases as latency increases with YCSB
Hi, I'm currently benchmarking Cassandra and have encountered some interesting behavior. As I increase the number of client threads (and connections), latency increases as expected, but at some point throughput actually decreases. I've seen a few posts about this online, with no clear resolution:

"If we move to higher thread counts, throughput does not increase and even decreases. Do you have any idea why this is happening and possibly suggestions how to scale throughput to much higher numbers?" [1]

"If you want to increase throughput, try increasing the number of clients. Of course, it doesn't mean that throughput will always increase. My observation was that it will increase, and after a certain number of clients throughput decreases again." [2]

You can see a graph of the behavior I'm experiencing here: https://dl.dropbox.com/u/34647904/cassandra-lat-thru.pdf

I'm using YCSB on EC2, with one m1.large instance to drive client load and one m1.large instance for a single Cassandra node, with maximum connections set to 1024 and Cassandra's files on RAID 0 ephemeral storage. The problem occurs whether commitlog sync is batch or periodic, with HSHA and sync on, and with a variety of heap-size settings. As far as I can tell, this isn't due to GC, and nodetool tpstats isn't showing any dropped requests or even serious queuing.

Any thoughts? My guess is that this reflects some sort of overhead due to the extra connections, perhaps something due to context switching?

Thanks, Peter

[1] http://mail-archives.apache.org/mod_mbox/cassandra-user/201102.mbox/%3C12ECB704F2665F40A9C09018C73D95AEC92A8F3618@IE2RD2XVS011.red002.local%3E
[2] http://grokbase.com/t/cassandra/user/127h25p3hy/cassandra-evaluation-benchmarking-throughput-not-scaling-as-expected-neither-latency-showing-good-numbers#20120718x3cpg6enq250gbjg19ns14678g
[3] Example Bash script: https://gist.github.com/3978273
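A condensed sketch of the kind of thread-count sweep described above (the full script is in [3]; the host address, workload file, and binding name here are assumptions matching a 2012-era YCSB build):

    #!/bin/bash
    # run the same YCSB workload at increasing client thread counts
    for t in 1 2 4 8 16 32 64 128 256; do
      bin/ycsb run cassandra-10 -P workloads/workloada \
        -p hosts=10.0.0.5 -threads $t > "run-${t}.log"
    done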
Re: Throughput decreases as latency increases with YCSB
"I'm using YCSB on EC2 with one m1.large instance to drive client load" To add, I don't believe this is due to YCSB. I've done a fair bit of client-side profiling, and neither the client CPU nor the client NIC (nor the server NIC) is a bottleneck. I'll also add that this dataset fits in memory. Thanks! Peter