CQL3 PreparedStatement - parameterized timestamp

2013-02-06 Thread Mike Sample
Is there a way to re-use a prepared statement with different USING
TIMESTAMP values?

BEGIN BATCH USING timestamp
INSERT INTO Foo (a,b,c) values (?,?,?)
...
APPLY BATCH;

Once bound or while binding the prepared statement to specific values, I'd
like to set the timestamp value.

Putting a question mark in for timestamp failed as expected and I don't see
a method on the DataStax java driver BoundStatement for setting it.

Thanks in advance.

/Mike Sample


Re: CQL3 PreparedStatement - parameterized timestamp

2013-02-06 Thread Sylvain Lebresne
Not yet: https://issues.apache.org/jira/browse/CASSANDRA-4450

--
Sylvain





Re: CQL3 PreparedStatement - parameterized timestamp

2013-02-06 Thread Mike Sample
Thanks Sylvain.  I should have scanned Jira first.  Glad to see it's on the
todo list.







Estimating write throughput with LeveledCompactionStrategy

2013-02-06 Thread Ивaн Cобoлeв
Dear Community,

Could anyone please give me a hand with understanding what I am
missing while trying to model how LeveledCompactionStrategy works:
https://docs.google.com/spreadsheet/ccc?key=0AvNacZ0w52BydDQ3N2ZPSks2OHR1dlFmMVV4d1E2eEE#gid=0

Logs mostly contain something like this:
 INFO [CompactionExecutor:2235] 2013-02-06 02:32:29,758
CompactionTask.java (line 221) Compacted to
[chunks-hf-285962-Data.db,chunks-hf-285963-Data.db,chunks-hf-285964-Data.db,chunks-hf-285965-Data.db,chunks-hf-285966-Data.db,chunks-hf-285967-Data.db,chunks-hf-285968-Data.db,chunks-hf-285969-Data.db,chunks-hf-285970-Data.db,chunks-hf-285971-Data.db,chunks-hf-285972-Data.db,chunks-hf-285973-Data.db,chunks-hf-285974-Data.db,chunks-hf-285975-Data.db,chunks-hf-285976-Data.db,chunks-hf-285977-Data.db,chunks-hf-285978-Data.db,chunks-hf-285979-Data.db,chunks-hf-285980-Data.db,].
 2,255,863,073 to 1,908,460,931 (~84% of original) bytes for 36,868
keys at 14.965795MB/s.  Time: 121,614ms.

Thus the spreadsheet is parameterized with a throughput of 15 MB/s and a
survivor ratio of 0.9.

1) Projected result actually differs from what I observe - what am I missing?
2) Are there any metrics on write throughput with LCS per node anyone
could possibly share?

Thank you very much in advance,
Ivan
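For what it's worth, the figures in that log line are internally consistent and can be reproduced from the raw byte counts; a quick sketch (assuming, as the log output suggests, 1 MB = 2^20 bytes and that the rate is output bytes over wall-clock time):

```python
# Reproduce the compaction figures from the byte counts in the log line.
input_bytes  = 2_255_863_073   # "2,255,863,073 to ..."
output_bytes = 1_908_460_931   # "... 1,908,460,931"
duration_ms  = 121_614         # "Time: 121,614ms"

ratio = output_bytes / input_bytes                   # "~84% of original"
rate  = output_bytes / (duration_ms / 1000) / 2**20  # "14.965795MB/s"

print(f"{ratio:.1%} of original, {rate:.2f} MB/s")
```

Note the reported MB/s matches the *output* size divided by time, so a model fed the input rate instead would come out roughly 18% high for this particular compaction.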


RE: DataModel Question

2013-02-06 Thread Kanwar Sangha
1)  Version is 1.2

2)  DynamicComposites: I read somewhere that they are not recommended?

3)  Good point. I need to think about that one.



From: Tamar Fraenkel [mailto:ta...@tok-media.com]
Sent: 06 February 2013 00:50
To: user@cassandra.apache.org
Subject: Re: DataModel Question

Hi!
I have a couple of questions regarding your model:
 1. What Cassandra version are you using? I am still working with 1.0 and this 
seems to make sense, but 1.2 gives you much more power I think.
 2. Maybe I don't understand your model, but I think you need DynamicComposite 
columns, as user columns are different in number of components and maybe type.
 3. How do you associate between the SMS or MMS and the user you are chatting 
with? Is it done by a separate CF?
Thanks,
Tamar


Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956



On Wed, Feb 6, 2013 at 8:23 AM, Vivek Mishra mishra.v...@gmail.com wrote:
Avoid super columns. If you need Sorted, wide rows then go for Composite 
columns.

-Vivek

On Wed, Feb 6, 2013 at 7:09 AM, Kanwar Sangha kan...@mavenir.com wrote:
Hi - We are designing a Cassandra-based storage for the following use cases:

* Store SMS messages
* Store MMS messages
* Store chat history

What would be the ideal way to design the data model for this kind of 
application? I am thinking on these lines ..

Row-Key :  Composite key [ PhoneNum : Day]

* Example:   19876543456:05022013

Dynamic Column Families

* Composite column key for SMS [SMS:MessageId:TimeUUID]
* Composite column key for MMS [MMS:MessageId:TimeUUID]
* Composite column key for user I am chatting with [UserId:198765432345] 
  - This can have multiple values since each chat conv can have many 
  messages. Should this be a super column ?


Example rows (row key, followed by its columns):

198:05022013        SMS::ttt   SMS:xxx12:ttt   MMS::ttt   :19
198:05022013
1987888:05022013
Thanks,
Kanwar





Re: Pycassa vs YCSB results.

2013-02-06 Thread Tim Wintle
On Tue, 2013-02-05 at 13:51 -0500, Edward Capriolo wrote:
 Without stating the obvious, if you are interested in scale, then why
 pick python?.

I would (kind of) agree with this point..

If you absolutely need performance here then python isn't the right
choice.

If, however, you are currently working with Python and the question was
just "why is pycassa not as fast as YCSB; can I make it faster?" then I'd
say the fact that it was only a constant factor of 2 slower shows it's
perfectly possible to stick with Python.


FYI the setup I'm using to write data to Cassandra is based around a
series of 0mq applications - the actual loader is in python, but
filtering steps before that are (partly) in C.

Tim



RE: unbalanced ring

2013-02-06 Thread Stephen.M.Thompson
Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and compact on 
each of the nodes.

[root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
Datacenter: 28
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens  Owns (effective)  Host ID  
 Rack
UN  10.28.205.125 1.7 GB 255 33.7% 
3daab184-61f0-49a0-b076-863f10bc8c6c  205
UN  10.28.205.126 591.44 MB  256 99.9% 
55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
UN  10.28.205.127 112.28 MB  257 66.4% 
d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

So this is a little better.  At least node 3 has some content, but they are 
still far from balanced.  If I understand this correctly, this is the 
distribution I would expect if the tokens were set at 15/5/1 rather than 
equal.  As configured, I would expect roughly equal amounts of data on each 
node. Is that right?  Do you have any suggestions for what I can look at to 
get there?

I have about 11M rows of data in this keyspace and none of them are 
exceptionally long ... it's data pulled from Oracle and didn't include any 
BLOB, etc.
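Primary-range ownership itself is just token arithmetic over the Murmur3 ring of -2^63 .. 2^63-1, which makes the Owns column easy to sanity-check. A toy sketch, ignoring replication and racks (the nodes and tokens below are made up, not this cluster's):

```python
RING = 2**64  # Murmur3Partitioner token space: -2**63 .. 2**63 - 1

def ownership(tokens_by_node):
    """Fraction of the ring each node primarily owns.

    Each token owns the range from the previous token on the ring
    (exclusive) up to itself (inclusive); the ring wraps around.
    """
    ring = sorted((t, n) for n, ts in tokens_by_node.items() for t in ts)
    owns = dict.fromkeys(tokens_by_node, 0)
    for (prev, _), (tok, node) in zip([ring[-1]] + ring, ring):
        owns[node] += (tok - prev) % RING
    return {n: owned / RING for n, owned in owns.items()}

# Made-up 3-node cluster with 4 vnode tokens each:
tokens = {
    "a": [-2**63, -2**61, 2**62, 2**62 + 2**60],
    "b": [-2**62, 0, 2**61, 2**63 - 1],
    "c": [-2**61 - 5, -2**60, 2**60, 2**62 - 7],
}
shares = ownership(tokens)
assert abs(sum(shares.values()) - 1.0) < 1e-9  # shares cover the whole ring
```

With enough randomly assigned vnode tokens per node the shares converge toward equal; the "effective" column in nodetool status additionally folds in replica placement.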

Stephen Thompson
Wells Fargo Corporation
Internet Authentication & Fraud Prevention
704.427.3137 (W) | 704.807.3431 (C)

This message may contain confidential and/or privileged information, and is 
intended for the use of the addressee only. If you are not the addressee or 
authorized to receive this for the addressee, you must not use, copy, disclose, 
or take any action based on this message or any information herein. If you have 
received this message in error, please advise the sender immediately by reply 
e-mail and delete this message. Thank you for your cooperation.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, February 05, 2013 3:41 PM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring

Use nodetool status with vnodes 
http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes

The different load can be caused by rack affinity, are all the nodes in the 
same rack ? Another simple check is have you created some very big rows?
Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/02/2013, at 8:40 AM, stephen.m.thomp...@wellsfargo.com wrote:


So I have three nodes in a ring in one data center.  My configuration has 
num_tokens: 256 set and initial_token commented out.  When I look at the ring, 
it shows me all of the token ranges of course, and basically identical data for 
each range on each node.  Here is the Cliff's Notes version of what I see:

[root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring

Datacenter: 28
==
Replicas: 1

Address         Rack  Status  State   Load     Owns    Token
                                                       9187343239835811839
10.28.205.125   205 Up Normal  2.85 GB 33.69%  
-3026347817059713363
10.28.205.125   205 Up Normal  2.85 GB 33.69%  
-3026276684526453414
10.28.205.125   205 Up Normal  2.85 GB 33.69%  
-3026205551993193465
  (etc)
10.28.205.126   205 Up Normal  1.15 GB 100.00% 
-9187343239835811840
10.28.205.126   205 Up Normal  1.15 GB 100.00% 
-9151314442816847872
10.28.205.126   205 Up Normal  1.15 GB 100.00% 
-9115285645797883904
  (etc)
10.28.205.127   205 Up Normal  69.13 KB   66.30%  
-9223372036854775808
10.28.205.127   205 Up Normal  69.13 KB   66.30%  
36028797018963967
10.28.205.127   205 Up Normal  69.13 KB   66.30%  
72057594037927935
  (etc)

So at this point I have a number of questions.   The biggest question is of 
Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127 has only 
0.69 GB?  These boxes are all comparable and all configured identically.

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

I'm sorry to ask so many questions - I'm having a hard time finding 
documentation that explains this stuff.

Stephen



Re: Why do Datastax docs recommend Java 6?

2013-02-06 Thread Edward Capriolo
Oracle already did this once; it was called JRockit :)
http://www.oracle.com/technetwork/middleware/jrockit/overview/index.html

Typically Oracle acquires the technology and then the bits are merged with
the standard JVM.

On Wed, Feb 6, 2013 at 2:13 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

  I would prefer Oracle to own Azul's Zing JVM over any other (GC) and
 provide it for free for anyone :)


Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.

   *From:* jef...@gmail.com [mailto:jef...@gmail.com]
 *Sent:* Wednesday, February 06, 2013 02:23
 *To:* user@cassandra.apache.org
 *Subject:* Re: Why do Datastax docs recommend Java 6?



 Oracle now owns the Sun HotSpot team, which is inarguably the highest
 powered Java VM team in the world. It's still really the epicenter of all
 Java VM development.

 Sent from my Verizon Wireless BlackBerry
  --

 *From: *Ilya Grebnov i...@metricshub.com

 *Date: *Tue, 5 Feb 2013 14:09:33 -0800

 *To: *user@cassandra.apache.org

 *ReplyTo: *user@cassandra.apache.org

 *Subject: *RE: Why do Datastax docs recommend Java 6?



 Also, what is particular reason to use Oracle JDK over Open JDK? Sorry, I
 could not find this information online.



 Thanks,

 Ilya

 *From:* Michael Kjellman [mailto:mkjell...@barracuda.com]

 *Sent:* Tuesday, February 05, 2013 7:29 AM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Why do Datastax docs recommend Java 6?



 There have been tons of threads/convos on this.



 In the early days of Java 7 it was pretty unstable and there was pretty
 much no convincing reason to use Java 7 over Java 6.



 Now that Java 7 has stabilized and Java 6 is EOL it's a reasonable
 decision to use Java 7 and we do it in production with no issues to speak
 of.



 That being said, there was one potential situation we've seen as a
 community where bootstrapping a new node was using 3x more CPU and getting
 significantly less throughput. However, reproducing this consistently never
 happened AFAIK.



 I think once more people use Java 7 in production and prove it doesn't
 cause any additional bugs/performance issues, Datastax will update their
 docs. Until then I'd say it's a safe bet to use Java 7 with vanilla C*
 1.2.1. I hope this helps!



 Best,

 Michael



 *From: *Baron Schwartz ba...@xaprb.com
 *Reply-To: *user@cassandra.apache.org user@cassandra.apache.org
 *Date: *Tuesday, February 5, 2013 7:21 AM
 *To: *user@cassandra.apache.org user@cassandra.apache.org
 *Subject: *Why do Datastax docs recommend Java 6?



 The Datastax docs repeatedly say (e.g.
 http://www.datastax.com/docs/1.2/install/install_jre) that Java 7 is not
 recommended, but they don't say why. It would be helpful to know this. Does
 anyone know?



 The same documentation is referenced from the Cassandra wiki, for example,
 http://wiki.apache.org/cassandra/GettingStarted



 - Baron


Cassandra 1.1.8 timeouts on clients

2013-02-06 Thread Terry Cumaranatunge
I've gotten timeouts on clients when using Cassandra 1.1.8 in a cluster of
12 nodes, but I don't see the same behavior when using Cassandra 1.0.10.
So, to do a controlled experiment, the following was tried:

1. Started with Cassandra 1.0.10 and ran our test tools against it to build
a database
2. Ran the workload to ensure no timeout problems were seen. Stopped the load
3. Upgraded only 2 of the 12 nodes in the cluster to 1.1.8. Ran scrub
afterwards, as the documentation states, to convert sstables to the 1.1
format and to fix level-manifest problems
4. Started the load back up
5. After some time, started seeing timeouts on the client for requests that
go to the 1.1.8 nodes (i.e. requests sent to those nodes as the coordinator
node)

There appears to be a pattern in these timeouts in that a large burst of
them occur every 10 minutes (on the 10 minute boundary of the hour, like
10:10:XX, 10:20:YY, 10:30:ZZ etc.). All clients see the timeouts from those
two 1.1.8 nodes at the same exact time. The workload is not I/O bound at
this point and requests are not being dropped either based on tpstat
output. I don't see hinted handoff messages either as I believe that
happens every 10 minutes. Key cache size is set to 2.7GB and memtable size
is 1/3 of heap (2.7GB). The key cache memory usage is same as 1.0.10 based
on heap size calculator. There are no GC pauses or any type of heap
pressure messages in the logs. This is with Java 1.6.0.38.

Does anyone know of some periodic tasks in Cassandra 1.1 that happens every
10 minutes that could explain this problem or have any ideas?

Thanks
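One way to make the every-10-minutes claim concrete is to histogram the client-side timeout timestamps by minute-of-hour; spikes at 10, 20, 30, ... point at something running on a 10-minute schedule. A sketch with made-up timestamps:

```python
from collections import Counter
from datetime import datetime

def minute_of_hour_histogram(timestamps):
    """Count events per minute-of-hour; a 10-minute-periodic task shows
    up as spikes at minutes 0, 10, 20, ..."""
    return Counter(ts.minute for ts in timestamps)

# Made-up timeouts clustered just after the 10-minute boundaries:
events = [datetime(2013, 2, 6, 10, m, s)
          for m in (10, 10, 20, 20, 30) for s in (5, 15)]
hist = minute_of_hour_histogram(events)
spikes = {m for m, n in hist.items() if n >= 3}
assert spikes == {10, 20}
```

The same bucketing applied to Cassandra's own log timestamps (compactions, flushes, hint delivery) would show which server-side task lines up with the client spikes.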


Re: Why do Datastax docs recommend Java 6?

2013-02-06 Thread Wei Zhu
Anyone have first-hand experience with the Zing JVM, which is claimed to be
pauseless? How do they charge, per CPU?

Thanks,
-Wei

Re: Estimating write throughput with LeveledCompactionStrategy

2013-02-06 Thread Wei Zhu
I have been struggling with LCS myself. I observed that higher-level 
compactions (from level 4 to 5) involve many more SSTables than compactions 
at lower levels. One compaction could take an hour or more. By the way, did 
you set your SSTable size to be 100M?

Thanks.
-Wei 




Re: Cassandra 1.1.8 timeouts on clients

2013-02-06 Thread Terry Cumaranatunge
I may have found a trigger that is causing these problems. Anyone seen
these compaction problems in 1.1? I did run scrub on all my 1.0 data to
convert it to 1.1 and fix level-manifest problems before I started running
1.1.

1 node:
ERROR [CompactionExecutor:281] 2013-02-06 23:56:16,183
AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Comp
actionExecutor:281,1,main]
java.io.IOError:
org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid
column name length 0
at
org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
at
org.apache.cassandra.db.compaction.PrecompactedRow.&lt;init&gt;(PrecompactedRow.java:99)
at
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
at
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
at
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
at
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
at
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
at
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at
org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException:
invalid column name length 0
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:98)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
at
org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234
)
at
org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112)
... 21 more

2nd node:
ERROR [CompactionExecutor:266] 2013-02-06 23:51:35,181
AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Comp
actionExecutor:266,1,main]
java.io.IOError: java.io.EOFException
at
org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
at
org.apache.cassandra.db.compaction.PrecompactedRow.&lt;init&gt;(PrecompactedRow.java:99)
at
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
at
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
at
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
at
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
at
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
at
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at
org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source)
at 

Re: Operation Consideration with Counter Column Families

2013-02-06 Thread aaron morton
 Thanks Aaron, so will there only be one value for each counter column per 
 sstable just like regular columns?
Yes. 

  For some reason I was under the impression that Cassandra keeps a log of all 
 the increments not the actual value.
Not as far as I understand. 
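To be concrete about what is stored: each counter column holds a set of per-replica shards of the form (replica id, clock, count), and a read reconciles shards rather than replaying increments. A loose toy model of that reconciliation (it ignores the local/remote shard distinction the real implementation makes; the data is made up):

```python
# Toy model of counter reconciliation: the shard with the highest clock
# wins per replica, and the counter value is the sum over replicas.

def merge_shards(*shard_sets):
    """Merge (node_id, clock, count) shard sets from several sstables."""
    best = {}
    for shards in shard_sets:
        for node, clock, count in shards:
            if node not in best or clock > best[node][0]:
                best[node] = (clock, count)
    return sum(count for clock, count in best.values())

sstable_a = [("n1", 3, 10), ("n2", 1, 4)]
sstable_b = [("n1", 5, 12), ("n3", 2, 7)]  # newer n1 shard supersedes
assert merge_shards(sstable_a, sstable_b) == 12 + 4 + 7
```

This is also why compaction of counters is unremarkable: merging two sstables just merges shard sets, exactly as a read does.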

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/02/2013, at 11:15 AM, Drew Kutcharian d...@venarc.com wrote:

 Thanks Aaron, so will there only be one value for each counter column per 
 sstable just like regular columns? For some reason I was under the impression 
 that Cassandra keeps a log of all the increments not the actual value.
 
 
 On Feb 5, 2013, at 12:36 PM, aaron morton aa...@thelastpickle.com wrote:
 
  Are there any specific operational considerations one should make when 
  using counter column families?
 Performance, as they incur a read and a write. 
 There were some issues with overcounts in log replay (see the changes.txt). 
  
  How are counter column families stored on disk? 
 Same as regular CF's. 
 
  How do they affect compaction?
 None.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 6/02/2013, at 7:47 AM, Drew Kutcharian d...@venarc.com wrote:
 
 Hey Guys,
 
  Are there any specific operational considerations one should make when 
  using counter column families? How are counter column families stored on 
  disk? How do they affect compaction?
 
 -- Drew
 
 
 



Re: DataModel Question

2013-02-06 Thread aaron morton
 2)  DynamicComposites: I read somewhere that they are not recommended?
You probably won't need them. 

Your current model will not sort messages by the time they arrive in a day. The 
sort order will be based on the message type and the message ID. 

I'm assuming you want to order messages, so put the time uuid at the start of 
the composite columns. If you often want to get the most recent messages, use a 
reverse comparator. 

You could probably also have wider rows if you want to; not sure how many 
messages kids send a day, but you may get by with weekly partitions. 

The CLI model could be:
row_key: phone_number : day
column: time_uuid : message_id : message_type 

You could also pack extra data using JSON, ProtoBuffers etc. and store more 
than just the message in the column value. 

If you are using CQL 3, consider this:

create table messages (
    phone_number     text,
    day              timestamp,
    message_sequence timeuuid,  -- your timestamp
    message_id       int,
    message_type     text,
    message_body     text,
    PRIMARY KEY ((phone_number, day), message_sequence, message_id)
);

(phone_number, day) is the partition key, the same as the thrift row key. 

message_sequence and message_id are the clustering columns; all instances will 
be grouped / ordered by these columns. 

Hope that helps. 
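The ordering point is easy to see with plain tuple sorting, since composite columns compare component by component and the first component dominates (the integers below stand in for timeuuids):

```python
# Tuples stand in for composite column names.
msgs = [("SMS", 7, 1002), ("MMS", 2, 1001), ("SMS", 1, 1003)]

# Original model [type : id : timeuuid] groups by message type, not time:
by_type_first = sorted(msgs)

# Suggested model [timeuuid : id : type] is chronological within the row:
by_time_first = sorted((t, i, typ) for typ, i, t in msgs)

assert [m[0] for m in by_time_first] == [1001, 1002, 1003]
```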



-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com




RE: DataModel Question

2013-02-06 Thread Kanwar Sangha
Thanks Aaron !

My use case is modeled like skype which stores IM + SMS + MMS in one 
conversation.

I need to have the following functionality -


* When I go offline and come online again, I need to retrieve all 
pending messages from all my conversations.

* I should be able to select a contact and view the 'history' of the 
messages (last 7 days, last 14 days, last 21 days...).

* If I log in to a different device, I should be able to sync at least 
a few days of messages.

* One conversation can have multiple participants.

* Support full sync or delta sync based on the number of messages/history.

I guess this makes the data model span across many CFs ?
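The day-bucketed row keys proposed earlier fit the "last N days of history" requirement well: the client computes one key per day and issues a multi-get across those partitions. A minimal, driver-agnostic sketch in plain Python (the function name is illustrative, and the key format assumes the DDMMYYYY reading of the example key 19876543456:05022013):

```python
from datetime import date, timedelta

def partition_keys(phone_number: str, days_back: int, today: date) -> list:
    """Build the PhoneNum:Day row keys covering the last `days_back` days.

    Day format is DDMMYYYY, matching the example key 19876543456:05022013.
    """
    keys = []
    for i in range(days_back):
        d = today - timedelta(days=i)
        keys.append("%s:%s" % (phone_number, d.strftime("%d%m%Y")))
    return keys

keys = partition_keys("19876543456", 3, date(2013, 2, 5))
# → ['19876543456:05022013', '19876543456:04022013', '19876543456:03022013']
```

The same helper serves the "sync a few days on a new device" case: widen `days_back` and fetch the extra partitions.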




From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: 06 February 2013 22:20
To: user@cassandra.apache.org
Subject: Re: DataModel Question

2)  DynamicComposites : I read somewhere that they are not recommended ?
You probably wont need them.

Your current model will not sort messages by the time they arrive within a day. 
The sort order will be based on the message type and the message ID.

I'm assuming you want to order messages, so put the time uuid at the start of 
the composite columns. If you often want to get the most recent messages, use a 
reverse comparator.

You could probably also use wider rows if you want to; not sure how many 
messages kids send a day, but you may get by with weekly partitions.

The CLI model could be:
row_key: phone_number : day
column: time_uuid : message_id : message_type

You could also pack extra data using JSON, Protocol Buffers, etc. and store more 
than just the message in the column value.

If you are using CQL 3, consider this:

create table messages (
    phone_number text,
    day timestamp,
    message_sequence timeuuid,  -- your timestamp
    message_id int,
    message_type text,
    message_body text,
    PRIMARY KEY ( (phone_number, day), message_sequence, message_id )
);

(phone_number, day) is the partition key, the same as the Thrift row key.

message_sequence and message_id are the clustering columns; all rows within a 
partition will be grouped / ordered by these columns.
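To see why the order of the composite components matters, here is a small stand-alone sketch (plain Python, no Cassandra needed) that mimics the component-by-component comparator with tuple sorting; the message values are made up for illustration:

```python
# Cassandra compares composite columns component by component,
# which behaves exactly like Python tuple ordering.
msgs = [
    ("SMS", 2, 1000),  # (message_type, message_id, arrival_time)
    ("MMS", 1, 500),
    ("SMS", 1, 2000),
]

# Original layout, type first: all MMS sort before all SMS, not time order.
by_type_first = sorted(msgs)

# Time-first layout (message_sequence leading): true arrival order.
by_time_first = sorted((t, mid, typ) for typ, mid, t in msgs)

print(by_type_first)  # [('MMS', 1, 500), ('SMS', 1, 2000), ('SMS', 2, 1000)]
print(by_time_first)  # [(500, 1, 'MMS'), (1000, 2, 'SMS'), (2000, 1, 'SMS')]
```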

Hope that helps.



-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/02/2013, at 1:47 AM, Kanwar Sangha kan...@mavenir.com wrote:


1)  Version is 1.2
2)  DynamicComposites : I read somewhere that they are not recommended ?
3)  Good point. I need to think about that one.



From: Tamar Fraenkel [mailto:ta...@tok-media.com]
Sent: 06 February 2013 00:50
To: user@cassandra.apache.org
Subject: Re: DataModel Question

Hi!
I have couple of questions regarding your model:
 1. What Cassandra version are you using? I am still working with 1.0 and this 
seems to make sense, but 1.2 gives you much more power, I think.
 2. Maybe I don't understand your model, but I think you need DynamicComposite 
columns, as user columns differ in number of components and maybe type.
 3. How do you associate the SMS or MMS with the user you are chatting with? 
Is it done by a separate CF?
Thanks,
Tamar


Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956



On Wed, Feb 6, 2013 at 8:23 AM, Vivek Mishra mishra.v...@gmail.com wrote:
Avoid super columns. If you need Sorted, wide rows then go for Composite 
columns.

-Vivek

On Wed, Feb 6, 2013 at 7:09 AM, Kanwar Sangha kan...@mavenir.com wrote:
Hi -  We are designing a Cassandra based storage for the following use cases-


* Store SMS messages
* Store MMS messages
* Store Chat history

What would be the ideal way to design the data model for this kind of 
application? I am thinking along these lines ..

Row-Key :  Composite key [ PhoneNum : Day]


* Example:   19876543456:05022013

Dynamic Column Families


* Composite column key for SMS [SMS:MessageId:TimeUUID]

* Composite column key for MMS [MMS:MessageId:TimeUUID]

* Composite column key for the user I am chatting with [UserId:198765432345] 
- this can have multiple values, since each chat conversation can have many 
messages. Should this be a super column?


Row key            Columns
198:05022013       SMS::ttt, SMS:xxx12:ttt, MMS::ttt, :19
1987888:05022013   ...
Thanks,
Kanwar






Netflix/Astynax Client for Cassandra

2013-02-06 Thread Cassa L
Hi,
 Has anyone used the Netflix/Astyanax java client library for Cassandra? I have
used Hector before and would like to evaluate Astyanax. Not sure how it is
accepted in the Cassandra community. Any issues with it, or advantages? The API
looks very clean and simple compared to Hector. Has anyone used it in
production except Netflix themselves?

Thanks
LCassa


Re: Netflix/Astynax Client for Cassandra

2013-02-06 Thread Michael Kjellman
It's a really great library and definitely recommended by me and many who are 
reading this.

And if you are just starting out on 1.2.1 with C* you might also want to 
evaluate https://github.com/datastax/java-driver and the new binary protocol.

Best,
michael

From: Cassa L lcas...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, February 6, 2013 10:13 PM
To: user@cassandra.apache.org
Subject: Netflix/Astynax Client for Cassandra

Hi,
 Has anyone used the Netflix/Astyanax java client library for Cassandra? I have used 
Hector before and would like to evaluate Astyanax. Not sure how it is accepted 
in the Cassandra community. Any issues with it, or advantages? The API looks very clean 
and simple compared to Hector. Has anyone used it in production except Netflix 
themselves?

Thanks
LCassa


Re: Netflix/Astynax Client for Cassandra

2013-02-06 Thread Gabriel Ciuloaica
Astyanax is not working with Cassandra 1.2.1. Only the java-driver is 
working well with both Cassandra 1.2 and 1.2.1.


Cheers,
Gabi
On 2/7/13 8:16 AM, Michael Kjellman wrote:
It's a really great library and definitely recommended by me and many 
who are reading this.


And if you are just starting out on 1.2.1 with C* you might also want 
to evaluate https://github.com/datastax/java-driver and the new binary 
protocol.


Best,
michael

From: Cassa L lcas...@gmail.com
Reply-To: user@cassandra.apache.org

Date: Wednesday, February 6, 2013 10:13 PM
To: user@cassandra.apache.org

Subject: Netflix/Astynax Client for Cassandra

Hi,
 Has anyone used the Netflix/Astyanax java client library for Cassandra? I 
have used Hector before and would like to evaluate Astyanax. Not sure 
how it is accepted in the Cassandra community. Any issues with it, or 
advantages? The API looks very clean and simple compared to Hector. Has 
anyone used it in production except Netflix themselves?


Thanks
LCassa




Re: Netflix/Astynax Client for Cassandra

2013-02-06 Thread Vivek Mishra
Kundera 2.3 is also upgraded for Cassandra 1.2 (except the CQL binary protocol).

-Vivek

On Thu, Feb 7, 2013 at 11:50 AM, Gabriel Ciuloaica gciuloa...@gmail.comwrote:

  Astyanax is not working with Cassandra 1.2.1. Only the java-driver is
 working well with both Cassandra 1.2 and 1.2.1.

 Cheers,
 Gabi

 On 2/7/13 8:16 AM, Michael Kjellman wrote:

 It's a really great library and definitely recommended by me and many who
 are reading this.

  And if you are just starting out on 1.2.1 with C* you might also want to
 evaluate https://github.com/datastax/java-driver and the new binary
 protocol.

  Best,
 michael

  From: Cassa L lcas...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Wednesday, February 6, 2013 10:13 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Netflix/Astynax Client for Cassandra

   Hi,
  Has anyone used the Netflix/Astyanax java client library for Cassandra? I have
 used Hector before and would like to evaluate Astyanax. Not sure how it is
 accepted in the Cassandra community. Any issues with it, or advantages? The API
 looks very clean and simple compared to Hector. Has anyone used it in
 production except Netflix themselves?

 Thanks
 LCassa