CQL3 PreparedStatement - parameterized timestamp
Is there a way to re-use a prepared statement with different USING TIMESTAMP values?

BEGIN BATCH USING TIMESTAMP
  INSERT INTO Foo (a, b, c) VALUES (?, ?, ?)
  ...
APPLY BATCH;

Once bound, or while binding the prepared statement to specific values, I'd like to set the timestamp value. Putting a question mark in for the timestamp failed as expected, and I don't see a method on the DataStax Java driver's BoundStatement for setting it. Thanks in advance. /Mike Sample
Re: CQL3 PreparedStatement - parameterized timestamp
Not yet: https://issues.apache.org/jira/browse/CASSANDRA-4450 -- Sylvain
Re: CQL3 PreparedStatement - parameterized timestamp
Thanks Sylvain. I should have scanned Jira first. Glad to see it's on the to-do list.
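Until CASSANDRA-4450 adds a bindable timestamp, the value has to go into the statement text itself. For what it's worth, a small sketch (Python; the table and batch text mirror the example above, and the helper name is made up) of generating the microsecond value that USING TIMESTAMP expects:

```python
import time

def write_timestamp_micros():
    # CQL's USING TIMESTAMP takes microseconds since the Unix epoch.
    return int(time.time() * 1_000_000)

# The timestamp cannot be a bind marker yet (CASSANDRA-4450), so it has
# to be interpolated into the (re-prepared or non-prepared) batch text:
batch = ("BEGIN BATCH USING TIMESTAMP %d "
         "INSERT INTO Foo (a, b, c) VALUES (?, ?, ?) "
         "APPLY BATCH;" % write_timestamp_micros())
```

The obvious downside is losing the benefit of preparing once, which is exactly what the ticket is about.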
Estimating write throughput with LeveledCompactionStrategy
Dear Community, could anyone please give me a hand with understanding what I am missing while trying to model how LeveledCompactionStrategy works: https://docs.google.com/spreadsheet/ccc?key=0AvNacZ0w52BydDQ3N2ZPSks2OHR1dlFmMVV4d1E2eEE#gid=0

Logs mostly contain entries like this:

INFO [CompactionExecutor:2235] 2013-02-06 02:32:29,758 CompactionTask.java (line 221) Compacted to [chunks-hf-285962-Data.db,chunks-hf-285963-Data.db,chunks-hf-285964-Data.db,chunks-hf-285965-Data.db,chunks-hf-285966-Data.db,chunks-hf-285967-Data.db,chunks-hf-285968-Data.db,chunks-hf-285969-Data.db,chunks-hf-285970-Data.db,chunks-hf-285971-Data.db,chunks-hf-285972-Data.db,chunks-hf-285973-Data.db,chunks-hf-285974-Data.db,chunks-hf-285975-Data.db,chunks-hf-285976-Data.db,chunks-hf-285977-Data.db,chunks-hf-285978-Data.db,chunks-hf-285979-Data.db,chunks-hf-285980-Data.db,]. 2,255,863,073 to 1,908,460,931 (~84% of original) bytes for 36,868 keys at 14.965795MB/s. Time: 121,614ms.

The spreadsheet is therefore parameterized with a throughput of 15 MB/s and a survivor ratio of 0.9.

1) The projected result differs from what I actually observe - what am I missing?
2) Are there any per-node write-throughput metrics with LCS that anyone could share?

Thank you very much in advance, Ivan
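For comparison, here is the kind of back-of-the-envelope model I'd sanity-check the spreadsheet against (a sketch in Python; the 10x level fanout and the 100MB sstable size are standard LCS parameters, the 15 MB/s figure comes from the log line above, and the write-amplification bound is deliberately crude):

```python
# Rough model of LeveledCompactionStrategy level capacities.
# Assumption: each level L holds up to 10^L * sstable_size bytes
# (level fanout of 10, as in the LCS design).

def level_capacity_bytes(level, sstable_size_mb=100):
    """Max data a level can hold, in bytes."""
    return (10 ** level) * sstable_size_mb * 1024 * 1024

def levels_needed(total_bytes, sstable_size_mb=100):
    """Smallest number of levels whose combined capacity covers total_bytes."""
    level, covered = 1, 0
    while covered < total_bytes:
        covered += level_capacity_bytes(level, sstable_size_mb)
        level += 1
    return level - 1

# Each byte is rewritten roughly once per level it passes through, so a
# crude lower bound on compaction I/O for ingesting `total` bytes is
# levels * total, at the observed ~15 MB/s compaction throughput:
total = 100 * 1024**3          # say, 100 GB of data per node
n_levels = levels_needed(total)
io_bytes = n_levels * total
hours = io_bytes / (15 * 1024**2) / 3600
print(n_levels, round(hours, 1))  # -> 3 levels, ~5.7 hours of compaction I/O
```

Real behavior diverges from this (overlap between levels, concurrent compactors, the survivor ratio), which may account for part of the gap between the spreadsheet and observation.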
RE: DataModel Question
1) Version is 1.2. 2) DynamicComposites: I read somewhere that they are not recommended? 3) Good point. I need to think about that one.

From: Tamar Fraenkel [mailto:ta...@tok-media.com] Sent: 06 February 2013 00:50 To: user@cassandra.apache.org Subject: Re: DataModel Question

Hi! I have a couple of questions regarding your model:
1. What Cassandra version are you using? I am still working with 1.0 and this seems to make sense, but 1.2 gives you much more power, I think.
2. Maybe I don't understand your model, but I think you need DynamicComposite columns, as user columns differ in number of components and maybe type.
3. How do you associate the SMS or MMS with the user you are chatting with? Is it done by a separate CF?
Thanks, Tamar

Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956

On Wed, Feb 6, 2013 at 8:23 AM, Vivek Mishra mishra.v...@gmail.com wrote: Avoid super columns. If you need sorted, wide rows then go for composite columns. -Vivek

On Wed, Feb 6, 2013 at 7:09 AM, Kanwar Sangha kan...@mavenir.com wrote: Hi - We are designing Cassandra-based storage for the following use cases:
*Store SMS messages
*Store MMS messages
*Store chat history

What would be the ideal way to design the data model for this kind of application? I am thinking along these lines:

Row key: composite key [PhoneNum : Day]
*Example: 19876543456:05022013

Dynamic column families:
*Composite column key for SMS [SMS:MessageId:TimeUUID]
*Composite column key for MMS [MMS:MessageId:TimeUUID]
*Composite column key for the user I am chatting with [UserId:198765432345] - this can have multiple values since each chat conversation can have many messages. Should this be a super column?

198:05022013 SMS::ttt SMS:xxx12:ttt MMS::ttt :19 198:05022013 1987888:05022013

Thanks, Kanwar
Re: Pycassa vs YCSB results.
On Tue, 2013-02-05 at 13:51 -0500, Edward Capriolo wrote: Without stating the obvious, if you are interested in scale, then why pick python?

I would (kind of) agree with this point. If you absolutely need performance here then Python isn't the right choice. If, however, you are currently working with Python and the question was just "why is pycassa not as fast as YCSB, and can I make it faster?", then I'd say the fact it was only a constant factor of 2 slower shows it's perfectly possible to stick with Python. FYI, the setup I'm using to write data to Cassandra is based around a series of 0mq applications - the actual loader is in Python, but the filtering steps before that are (partly) in C. Tim
RE: unbalanced ring
Thanks Aaron. I ran the cassandra-shuffle job and did a rebuild and compact on each of the nodes.

[root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
Datacenter: 28
==
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
-- Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN 10.28.205.125  1.7 GB     255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
UN 10.28.205.126  591.44 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
UN 10.28.205.127  112.28 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

So this is a little better. At least node 3 has some content, but they are still far from balanced. If I understand this correctly, this is the distribution I would expect if the tokens were set at 15/5/1 rather than equal. As configured, I would expect roughly equal amounts of data on each node. Is that right? Do you have any suggestions for what I can look at to get there? I have about 11M rows of data in this keyspace and none of them are exceptionally long - the data was pulled from Oracle and didn't include any BLOBs, etc.

Stephen Thompson Wells Fargo Corporation Internet Authentication Fraud Prevention 704.427.3137 (W) | 704.807.3431 (C)

This message may contain confidential and/or privileged information, and is intended for the use of the addressee only. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation.

From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, February 05, 2013 3:41 PM To: user@cassandra.apache.org Subject: Re: unbalanced ring

Use nodetool status with vnodes: http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes The different load can be caused by rack affinity; are all the nodes in the same rack?
Another simple check: have you created some very big rows?

Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 6/02/2013, at 8:40 AM, stephen.m.thomp...@wellsfargo.com wrote:

So I have three nodes in a ring in one data center. My configuration has num_tokens: 256 set and initial_token commented out. When I look at the ring, it shows me all of the token ranges of course, and basically identical data for each range on each node. Here is the Cliff's Notes version of what I see:

[root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring
Datacenter: 28
==
Replicas: 1
Address        Rack  Status  State   Load      Owns     Token
                                                        9187343239835811839
10.28.205.125  205   Up      Normal  2.85 GB   33.69%   -3026347817059713363
10.28.205.125  205   Up      Normal  2.85 GB   33.69%   -3026276684526453414
10.28.205.125  205   Up      Normal  2.85 GB   33.69%   -3026205551993193465
(etc)
10.28.205.126  205   Up      Normal  1.15 GB   100.00%  -9187343239835811840
10.28.205.126  205   Up      Normal  1.15 GB   100.00%  -9151314442816847872
10.28.205.126  205   Up      Normal  1.15 GB   100.00%  -9115285645797883904
(etc)
10.28.205.127  205   Up      Normal  69.13 KB  66.30%   -9223372036854775808
10.28.205.127  205   Up      Normal  69.13 KB  66.30%   36028797018963967
10.28.205.127  205   Up      Normal  69.13 KB  66.30%   72057594037927935
(etc)

So at this point I have a number of questions. The biggest question is about Load. Why does the .125 node have 2.85 GB, .126 have 1.15 GB, and .127 only 0.69 GB? These boxes are all comparable and all configured identically. partitioner: org.apache.cassandra.dht.Murmur3Partitioner. I'm sorry to ask so many questions - I'm having a hard time finding documentation that explains this stuff. Stephen
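As a quick sanity check on the Owns (effective) numbers: with vnodes, each node's ownership is just the total fraction of the token ring covered by its tokens. A sketch of that calculation (Python, with made-up tokens rather than the cluster's actual 256-per-node set):

```python
# Compute per-node ownership from (token -> node) assignments on a ring,
# the way `nodetool ring` does for RF=1: each token owns the range from
# the previous token (exclusive) up to itself (inclusive).
RING_SIZE = 2 ** 64  # Murmur3Partitioner token space

def ownership(token_to_node):
    """token_to_node: dict of token -> node name. Returns node -> fraction owned."""
    tokens = sorted(token_to_node)
    owns = {}
    # Pair each token with its predecessor, wrapping around the ring.
    for prev, cur in zip([tokens[-1]] + tokens[:-1], tokens):
        span = (cur - prev) % RING_SIZE or RING_SIZE
        node = token_to_node[cur]
        owns[node] = owns.get(node, 0) + span / RING_SIZE
    return owns

# Hypothetical 2-node example: evenly spaced tokens -> 50/50 ownership.
print(ownership({0: "A", 2**63: "B"}))  # each node owns 0.5
```

With equal token counts and random token assignment the fractions should come out roughly equal, which is why the 33.7% / 99.9% / 66.4% spread above looks suspicious (and why those three figures summing to well over 100% suggests RF > 1 is in play for the "effective" numbers).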
Re: Why do Datastax docs recommend Java 6?
Oracle already did this once - it was called JRockit :) http://www.oracle.com/technetwork/middleware/jrockit/overview/index.html Typically Oracle acquires the technology and then the bits are merged into the standard JVM.

On Wed, Feb 6, 2013 at 2:13 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: I would prefer Oracle to own Azul's Zing JVM over any other (GC), to provide it for free for anyone :)

Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider Take a ride with Adform's Rich Media Suite http://vimeo.com/adform/richmedia

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

From: jef...@gmail.com [mailto:jef...@gmail.com] Sent: Wednesday, February 06, 2013 02:23 To: user@cassandra.apache.org Subject: Re: Why do Datastax docs recommend Java 6?

Oracle now owns the Sun HotSpot team, which is inarguably the highest-powered Java VM team in the world. It's still really the epicenter of all Java VM development.
Sent from my Verizon Wireless BlackBerry

From: Ilya Grebnov i...@metricshub.com Date: Tue, 5 Feb 2013 14:09:33 -0800 To: user@cassandra.apache.org Subject: RE: Why do Datastax docs recommend Java 6?

Also, what is the particular reason to use the Oracle JDK over OpenJDK? Sorry, I could not find this information online. Thanks, Ilya

From: Michael Kjellman [mailto:mkjell...@barracuda.com] Sent: Tuesday, February 05, 2013 7:29 AM To: user@cassandra.apache.org Subject: Re: Why do Datastax docs recommend Java 6?

There have been tons of threads/convos on this. In the early days of Java 7 it was pretty unstable and there was pretty much no convincing reason to use Java 7 over Java 6. Now that Java 7 has stabilized and Java 6 is EOL, it's a reasonable decision to use Java 7, and we do it in production with no issues to speak of. That being said, there was one potential situation we've seen as a community where bootstrapping a new node was using 3x more CPU and getting significantly less throughput. However, reproducing this consistently never happened, AFAIK. I think once more people use Java 7 in production and prove it doesn't cause any additional bugs/performance issues, Datastax will update their docs. For now I'd say it's a safe bet to use Java 7 with vanilla C* 1.2.1. I hope this helps! Best, Michael

From: Baron Schwartz ba...@xaprb.com Reply-To: user@cassandra.apache.org Date: Tuesday, February 5, 2013 7:21 AM To: user@cassandra.apache.org Subject: Why do Datastax docs recommend Java 6?

The Datastax docs repeatedly say (e.g. http://www.datastax.com/docs/1.2/install/install_jre) that Java 7 is not recommended, but they don't say why. It would be helpful to know this. Does anyone know?
The same documentation is referenced from the Cassandra wiki, for example http://wiki.apache.org/cassandra/GettingStarted - Baron
Cassandra 1.1.8 timeouts on clients
I've gotten timeouts on clients when using Cassandra 1.1.8 in a cluster of 12 nodes, but I don't see the same behavior when using Cassandra 1.0.10. So, to do a controlled experiment, I tried the following:

1. Started with Cassandra 1.0.10 and ran our test tools against it to build a database.
2. Ran the workload to ensure no timeout problems were seen. Stopped the load.
3. Upgraded only 2 of the 12 nodes in the cluster to 1.1.8. Ran scrub afterwards, as the documentation states, to convert sstables to the 1.1 format and to fix level-manifest problems.
4. Started the load back up.
5. After some time, started seeing timeouts on the client for requests that go to the 1.1.8 nodes (i.e. requests sent to those nodes as the coordinator node).

There appears to be a pattern in these timeouts: a large burst of them occurs every 10 minutes (on the 10-minute boundaries of the hour, like 10:10:XX, 10:20:YY, 10:30:ZZ, etc.). All clients see the timeouts from those two 1.1.8 nodes at the same exact time. The workload is not I/O bound at this point, and requests are not being dropped either, based on nodetool tpstats output. I don't see hinted handoff messages either, though I believe that happens every 10 minutes. The key cache size is set to 2.7GB and the memtable size is 1/3 of the heap (2.7GB). The key cache memory usage is the same as on 1.0.10, based on a heap size calculator. There are no GC pauses or any heap-pressure messages in the logs. This is with Java 1.6.0_38. Does anyone know of periodic tasks in Cassandra 1.1 that run every 10 minutes and could explain this problem, or have any other ideas? Thanks
Re: Why do Datastax docs recommend Java 6?
Anyone have first-hand experience with the Zing JVM, which is claimed to be pauseless? How do they charge - per CPU? Thanks, -Wei
Re: Estimating write throughput with LeveledCompactionStrategy
I have been struggling with LCS myself. I have observed that higher-level compactions (from level 4 to 5) involve many more SSTables than compactions at lower levels; one compaction can take an hour or more. By the way, did you set your SSTable size to 100MB? Thanks. -Wei
Re: Cassandra 1.1.8 timeouts on clients
I may have found a trigger that is causing these problems. Has anyone seen these compaction problems in 1.1? I did run scrub on all my 1.0 data to convert it to 1.1 and fix level-manifest problems before I started running 1.1.

1st node:

ERROR [CompactionExecutor:281] 2013-02-06 23:56:16,183 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:281,1,main]
java.io.IOError: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:98)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112)
    ... 21 more

2nd node:

ERROR [CompactionExecutor:266] 2013-02-06 23:51:35,181 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:266,1,main]
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at
Re: Operation Consideration with Counter Column Families
> Thanks Aaron, so will there only be one value for each counter column per sstable, just like regular columns?
Yes.
> For some reason I was under the impression that Cassandra keeps a log of all the increments, not the actual value.
Not as far as I understand.

Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On Feb 5, 2013, at 12:36 PM, aaron morton aa...@thelastpickle.com wrote (replying to Drew Kutcharian d...@venarc.com):
> Are there any specific operational considerations one should make when using counter column families?
Performance, as they incur a read and a write. There were some issues with overcounts in log replay (see the changes.txt).
> How are counter column families stored on disk?
Same as regular CFs.
> How do they affect compaction?
Not at all.
Re: DataModel Question
2) DynamicComposites: I read somewhere that they are not recommended?

You probably won't need them. Your current model will not sort messages by the time they arrive in a day; the sort order will be based on message type and message ID. I'm assuming you want to order messages, so put the time UUID at the start of the composite columns. If you often want to get the most recent messages, use a reverse comparator. You could probably also have wider rows if you want to - not sure how many messages kids send a day, but you may get by with weekly partitions.

The CLI model could be:

row_key: phone_number : day
column: time_uuid : message_id : message_type

You could also pack extra data using JSON, ProtoBuffers, etc. and store more than just the message in the column value. If you are using CQL 3, consider this:

create table messages (
    phone_number text,
    day timestamp,
    message_sequence timeuuid, -- your timestamp
    message_id int,
    message_type text,
    message_body text,
    PRIMARY KEY ( (phone_number, day), message_sequence, message_id )
);

(phone_number, day) is the partition key, the same as the Thrift row key. message_sequence, message_id are the grouping (clustering) columns; all instances will be grouped / ordered by these columns. Hope that helps.

- Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 7/02/2013, at 1:47 AM, Kanwar Sangha kan...@mavenir.com wrote: 1) Version is 1.2. 2) DynamicComposites: I read somewhere that they are not recommended? 3) Good point. I need to think about that one.
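Aaron's ordering point is easy to see with plain tuples, since composite columns compare component by component the same way. A sketch (Python, with made-up (type, message_id, time) values standing in for the columns):

```python
# Composite columns sort component-by-component, like Python tuples.
# Made-up (type, message_id, time) values standing in for the columns.
msgs = [("SMS", 12, 3), ("MMS", 7, 1), ("SMS", 5, 2)]

# Original layout [type : message_id : time]: groups by type, not time.
by_type_first = sorted(msgs)

# Aaron's suggestion [time : message_id : type]: chronological order.
by_time_first = sorted((t, mid, typ) for typ, mid, t in msgs)

print(by_type_first)   # all MMS columns sort before all SMS columns
print(by_time_first)   # strictly by arrival time: 1, 2, 3
```

With the time component first, a slice of the row reads messages in the order they arrived, which is what the reverse comparator then lets you walk from newest to oldest.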
RE: DataModel Question
Thanks Aaron ! My use case is modeled like skype which stores IM + SMS + MMS in one conversation. I need to have the following functionality - *When I go offline and come online again, I need to retrieve all pending messages from all my conversations. *I should be able to select a contact and view the 'history' of the messages (last 7 days, last 14 days, last 21 days...) *If I log in to a different device, I should be able to synch at least a few days of messages. *One conversation can have multiple participants. *Support full synch or delta synch based on number of messages/history. I guess this makes the data model span across many CFs ? From: aaron morton [mailto:aa...@thelastpickle.com] Sent: 06 February 2013 22:20 To: user@cassandra.apache.org Subject: Re: DataModel Question 2) DynamicComposites : I read somewhere that they are not recommended ? You probably wont need them. Your current model will not sort message by the time they arrive in a day. The sort order will be based on Message type and the message ID. I'm assuming you want to order messages, so put the time uuid at the start of the composite columns. If you often want to get the most recent messages use a reverse comparator. You could probably also have wider rows if you want to, not sure how many messages kids send a day but you may get by with weekly partitions. The CLI model could be: row_key: phone_number : day column: time_uuid : message_id : message_type You could also pack extra data used JSON, ProtoBuffers etc and store more that just the message in the column value. If you use using CQL 3 consider this: create table messages ( phone_numbertext, day timestamp, message_sequence timeuuid, # your timestamp message_id integer, message_type text, message_bodytext ) with PRIMARY KEY ( (phone_number, day), message_sequence, message_id) (phone_number, day) is the partition key, same the thrift row key. 
message_sequence, message_id are the clustering columns; all instances will be grouped / ordered by these columns. Hope that helps. - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 7/02/2013, at 1:47 AM, Kanwar Sangha kan...@mavenir.com wrote: 1) Version is 1.2 2) DynamicComposites: I read somewhere that they are not recommended? 3) Good point. I need to think about that one. From: Tamar Fraenkel [mailto:ta...@tok-media.com] Sent: 06 February 2013 00:50 To: user@cassandra.apache.org Subject: Re: DataModel Question Hi! I have a couple of questions regarding your model: 1. What Cassandra version are you using? I am still working with 1.0 and this seems to make sense, but I think 1.2 gives you much more power.
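Under Aaron's CQL 3 table, Kanwar's "history for the last N days" and "most recent messages" requirements map to straightforward queries. A sketch, assuming the messages table from the thread; the phone number and date values are made up, and because day is part of the partition key, a 7-day history means one query per day-partition (or an IN on the day column):

```
-- One partition per (phone_number, day); to cover the last 7 days,
-- issue one query per day, or use IN on the day column.
SELECT message_id, message_type, message_body
FROM messages
WHERE phone_number = '19876543456'
  AND day = '2013-02-05';

-- Most recent messages first within a day's partition
-- (cheap if the clustering order is reversed, per the
-- reverse-comparator advice above).
SELECT message_id, message_type, message_body
FROM messages
WHERE phone_number = '19876543456'
  AND day = '2013-02-05'
ORDER BY message_sequence DESC
LIMIT 100;
```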
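Aaron's advice to put the time UUID at the start of the composite column (and to use a reverse comparator for newest-first reads) rests on the fact that version-1 UUIDs embed a timestamp, and Cassandra's TimeUUIDType orders by that embedded time rather than by raw bytes. A small Python sketch, purely illustrative and not part of the original thread, shows the ordering property:

```python
import time
import uuid

# Version-1 UUIDs embed a 60-bit timestamp: the count of
# 100-nanosecond intervals since 1582-10-15. Cassandra's
# TimeUUIDType sorts columns by this embedded time.
earlier = uuid.uuid1()
time.sleep(0.01)  # ensure a later clock reading
later = uuid.uuid1()

# Sorting by the embedded timestamp recovers arrival order,
# which is why the timeuuid goes first in the composite column.
columns = sorted([later, earlier], key=lambda u: u.time)
assert columns == [earlier, later]

# A reverse comparator is just the descending version of the same sort.
newest_first = sorted([earlier, later], key=lambda u: u.time, reverse=True)
assert newest_first == [later, earlier]
```

The same property is why a plain byte-wise comparator would not work: random-looking UUID bytes do not sort chronologically, but the extracted timestamp does.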
Netflix/Astyanax Client for Cassandra
Hi, Has anyone used the Netflix Astyanax Java client library for Cassandra? I have used Hector before and would like to evaluate Astyanax. Not sure how it is accepted in the Cassandra community. Any issues with it, or advantages? The API looks very clean and simple compared to Hector. Has anyone used it in production except Netflix themselves? Thanks, LCassa
Re: Netflix/Astyanax Client for Cassandra
It's a really great library and definitely recommended by me and many who are reading this. And if you are just starting out on 1.2.1 with C*, you might also want to evaluate https://github.com/datastax/java-driver and the new binary protocol. Best, michael
Re: Netflix/Astyanax Client for Cassandra
Astyanax is not working with Cassandra 1.2.1. Only java-driver is working very well with both Cassandra 1.2 and 1.2.1. Cheers, Gabi
Re: Netflix/Astyanax Client for Cassandra
Kundera 2.3 is also upgraded for Cassandra 1.2 (except the CQL binary protocol). -Vivek