Why does Cassandra depend so much on the client's local timestamp?

2013-10-01 Thread Jason Tang
The following case may be logically correct for Cassandra, but it is difficult for the user.
Let's say:

Cassandra consistency level: write all, read one
replication_factor:3

For one record, rowkey:001, column:status

Client 1, insert value for rowkey 001, status:True, timestamp 11:00:05
Client 2, slice query: gets the value True for rowkey 001, at 11:00:00
Client 2, update value for rowkey 001, status:False, timestamp 11:00:02

So the client update sequence is True then False; although the update
requests come from different nodes, the sequence is logically ordered.

But the result is rowkey:001, column:status, value: True

So why does Cassandra depend so heavily on the client's local time? Why not
use the server's local time instead of the client's?

Because I am using consistency level ALL for writes with replication_factor 3,
all 3 nodes see the updates in the correct sequence (True then False), so
they could give the correct final result.

If for some reason it must depend strongly on the operation's timestamp, then
the query operation also needs a timestamp; then Client 2 would not see the
value True, which (by its own clock) happens in the future.

So either using a server-side timestamp, or providing a consistent view by
attaching a timestamp to queries, would be more consistent.

Otherwise, the consistency of Cassandra is very weak.
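For what it's worth, CQL does let a client state the intended order explicitly.
A minimal sketch (table and values are hypothetical): each writer attaches its
own timestamp, and last-write-wins resolution compares only those numbers, so
the logically later update wins regardless of clock skew:

    -- hypothetical table: rowkey text PRIMARY KEY, status text
    -- Client 1's write, tagged with an explicit microsecond timestamp
    UPDATE records USING TIMESTAMP 1380625205000000
        SET status = 'True' WHERE rowkey = '001';

    -- Client 2's logically later write must carry a larger timestamp to win
    UPDATE records USING TIMESTAMP 1380625205000001
        SET status = 'False' WHERE rowkey = '001';

If clients cannot coordinate such timestamps, the clocks themselves have to be
synchronized (e.g. via NTP); Cassandra does not order writes by arrival.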


Fw: How to log the details of the updated data locally

2013-10-01 Thread sathiya prabhu


Hi all,



In a Cassandra cluster, once a write/update has succeeded locally on a
particular instance, I want to log the updated data and its timestamps
separately in a file. In which class would it be most appropriate to do this?

To the extent I have explored the codebase, it seems possible to do that in the
Keyspace class (apply method) in the db package. But I don't know how to
retrieve the timestamp details from the mutation object. Is the Keyspace class
appropriate for my purpose? If yes, please kindly give me some ideas on
retrieving the timestamp details from the mutation object (RowMutation).

Any help is appreciated. Looking forward to your kind replies. Thanks in
advance.

Thank you.

default_time_to_live

2013-10-01 Thread Pieter Callewaert
Hi,

We are starting up a new cluster with Cassandra 2.0.0 and one of the features 
we were interested in was Per-CF TTL 
(https://issues.apache.org/jira/browse/CASSANDRA-3974)
I didn't find any commands in CQL to set this value, so I've used the following:

   UPDATE system.schema_columnfamilies SET default_time_to_live = 
10 WHERE keyspace_name = 'testschema' AND columnfamily_name = 'hol';

Confirming it is set:

cqlsh:testschema> select default_time_to_live from system.schema_columnfamilies
where keyspace_name = 'testschema' and columnfamily_name = 'hol';

 default_time_to_live
----------------------
                   10

Then I insert some dummy data, but it never expires...
Using the ttl command I get this:

cqlsh:testschema> select ttl(coverage) from hol;

 ttl(coverage)
---------------
          null

Am I doing something wrong? Or is this a bug?

Kind regards,

   Pieter Callewaert
   Web & IT engineer

   Web:   www.be-mobile.be
   Email: pieter.callewa...@be-mobile.be
   Tel:   +32 9 330 51 80




Re: Segmentation fault when trying to store into cassandra...

2013-10-01 Thread Krishna Chaitanya
OpenJDK was the problem. I updated to the latest Sun JDK and the issue was
fixed! Thanks...


On Mon, Sep 30, 2013 at 7:30 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 Java version issue?
 Using sun jdk or open jdk?

 -Vivek


 On Tue, Oct 1, 2013 at 6:16 AM, Krishna Chaitanya 
 bnsk1990r...@gmail.com wrote:

 Hello,
    I modified a network probe which collects network packets to
 store them into Cassandra. So there are many packets coming in; I
 capture the packets in the program and store them into Cassandra. I am
 using the libQtCassandra library. The program is crashing with a segmentation
 fault as soon as I run it. Can someone help with what could be going wrong
 here? Could there be a problem with the row/column keys, or is it some
 configuration parameter, or the speed at which the packets are coming? I am
 not able to figure it out. Thank you.

 --
 Regards,
 BNSK





-- 
Regards,
BNSK


paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
i have a table like the following:

CREATE TABLE log (
mykey timeuuid,
type text,
msg text,
primary key(mykey, type)
);

I want to page through all the results from the table using

select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;

(where xxx is 0 for the first query, and for the next one is the time of the
mykey (timeuuid) from the last query result)

But I seem to get random results.

#1
Does the above logic make sense for timeuuid-type pagination?

#2
When we use token in the WHERE clause, are the results returned sorted?
e.g
where token(k) > token(4) AND token(k) < token(10) limit 3

k=5, k=6, k=7
or
k=7, k=5, k=9

?

I see lots of articles using LIMIT to achieve a page size, but if the result is
not sorted, is it possible to miss items?


thanks


nodetool cfhistograms refresh

2013-10-01 Thread Rene Kochen
Quick question.

I am using Cassandra 1.0.11

When is nodetool cfhistograms output reset? I know that data is collected
during read requests. But I am wondering if it is data since the beginning
(start of Cassandra) or if it is reset periodically?

Thanks!

Rene


Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jan Algermissen
Jimmy,

On 01.10.2013, at 17:26, Jimmy Lin y2klyf+w...@gmail.com wrote:

 i have a table like the following:
  
 CREATE TABLE log (
 mykey timeuuid,
 type text,
 msg text,
 primary key(mykey, type)
 );
  
 I want to page through all the results from the table using

Have you considered the new built-in paging support:

http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0

Jan

  
  select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;
  


 (where xxx is 0 for the first query, and next one to be the time of the 
 mykey(timeuuid) from the last query result)
  
 But i seem to get random result.
  
 #1
 is the above logic make sense for timeuuid type pagination?
  
 #2
 when we use token in the where clase, is the result return sorted?
 e.g
  where token(k) > token(4) AND token(k) < token(10) limit 3
  
 k=5, k=6, k=7
 or
 k=7, k=5, k=9
  
 ?
 
 I see lot of article use LIMIT to achieve page size, but if the result is not 
 sorted, then it is possible to miss item?
  
  
 thanks
  
  



Re: paging through a table with timeuuid primary key

2013-10-01 Thread David Ward
2.0 has a lot of really exciting stuff, unfortunately 2.0 has a lot of
really exciting stuff that may increase the risk of updating to 2.0 just
yet.


On Tue, Oct 1, 2013 at 9:30 AM, Jan Algermissen
jan.algermis...@nordsc.com wrote:

 Jimmy,

 On 01.10.2013, at 17:26, Jimmy Lin y2klyf+w...@gmail.com wrote:

  i have a table like the following:
 
  CREATE TABLE log (
  mykey timeuuid,
  type text,
  msg text,
  primary key(mykey, type)
  );
 
  I want to page through all the results from the table using

 Have you considered the new built-in paging support:

 http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0

 Jan

 
  select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;
 


  (where xxx is 0 for the first query, and next one to be the time of the
 mykey(timeuuid) from the last query result)
 
  But i seem to get random result.
 
  #1
  is the above logic make sense for timeuuid type pagination?
 
  #2
  when we use token in the where clase, is the result return sorted?
  e.g
  where token(k) > token(4) AND token(k) < token(10) limit 3
 
  k=5, k=6, k=7
  or
  k=7, k=5, k=9
 
  ?
 
  I see lot of article use LIMIT to achieve page size, but if the result
 is not sorted, then it is possible to miss item?
 
 
  thanks
 
 




Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
Unfortunately, I have to stick with 1.2 for a while.

So I am looking for the old-fashioned way to do the pagination correctly.

I think I followed most of the articles on how to page through a table, but
maybe I have some silly gap that doesn't give me the correct behavior, or is it
that timeuuid doesn't work with the token function?



On Tue, Oct 1, 2013 at 8:57 AM, David Ward da...@shareablee.com wrote:

 2.0 has a lot of really exciting stuff, unfortunately 2.0 has a lot of
 really exciting stuff that may increase the risk of updating to 2.0 just
 yet.


 On Tue, Oct 1, 2013 at 9:30 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:

 Jimmy,

 On 01.10.2013, at 17:26, Jimmy Lin y2klyf+w...@gmail.com wrote:

  i have a table like the following:
 
  CREATE TABLE log (
  mykey timeuuid,
  type text,
  msg text,
  primary key(mykey, type)
  );
 
  I want to page through all the results from the table using

 Have you considered the new built-in paging support:

 http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0

 Jan

 
  select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;
 


  (where xxx is 0 for the first query, and next one to be the time of the
 mykey(timeuuid) from the last query result)
 
  But i seem to get random result.
 
  #1
  is the above logic make sense for timeuuid type pagination?
 
  #2
  when we use token in the where clase, is the result return sorted?
  e.g
  where token(k) > token(4) AND token(k) < token(10) limit 3
 
  k=5, k=6, k=7
  or
  k=7, k=5, k=9
 
  ?
 
  I see lot of article use LIMIT to achieve page size, but if the result
 is not sorted, then it is possible to miss item?
 
 
  thanks
 
 





Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jan Algermissen
Maybe you are hitting the problem that your 'pages' can get truncated in the 
middle of a wide row.

See 
https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/lHQ3wKAZgM4/DnlXT4IzqsQJ

Jan



On 01.10.2013, at 18:12, Jimmy Lin y2klyf+w...@gmail.com wrote:

 unfortunately, i have to stick with 1.2 for now for a while.
  
 So I am looking for the old fashion way to do the pagination correctly.
  
 I think i follow most of the articles on how to paging through a table, but 
 maybe have some silly gap that don't give me the correct behavior or it is 
 timeuuid not working for token function?
  
 
 
 On Tue, Oct 1, 2013 at 8:57 AM, David Ward da...@shareablee.com wrote:
 2.0 has a lot of really exciting stuff, unfortunately 2.0 has a lot of really 
 exciting stuff that may increase the risk of updating to 2.0 just yet.
 
 
 On Tue, Oct 1, 2013 at 9:30 AM, Jan Algermissen jan.algermis...@nordsc.com 
 wrote:
 Jimmy,
 
 On 01.10.2013, at 17:26, Jimmy Lin y2klyf+w...@gmail.com wrote:
 
  i have a table like the following:
 
  CREATE TABLE log (
  mykey timeuuid,
  type text,
  msg text,
  primary key(mykey, type)
  );
 
  I want to page through all the results from the table using
 
 Have you considered the new built-in paging support:
 
 http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
 
 Jan
 
 
  select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;
 
 
 
  (where xxx is 0 for the first query, and next one to be the time of the 
  mykey(timeuuid) from the last query result)
 
  But i seem to get random result.
 
  #1
  is the above logic make sense for timeuuid type pagination?
 
  #2
  when we use token in the where clase, is the result return sorted?
  e.g
  where token(k) > token(4) AND token(k) < token(10) limit 3
 
  k=5, k=6, k=7
  or
  k=7, k=5, k=9
 
  ?
 
  I see lot of article use LIMIT to achieve page size, but if the result is 
  not sorted, then it is possible to miss item?
 
 
  thanks
 
 
 
 
 



Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
Thanks, yeah, I am aware of that and have already taken care of it.

 I also just found a similar thread back in June:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201306.mbox/%3ccakkz8q2no6oucbwnveomn_ymxfh0nkpqvtym55jmvwa2qwx...@mail.gmail.com%3E

Someone was saying:


Long story short, using non-equal condition on the partition key (i.e. the
first part of your primary key) is generally not advised. Or to put it
another way, the use of the byte ordering partitioner is discouraged. But
if you still want to use the ordering partitioner and do range queries on
the partition key, do not use a timeuuid, because the ordering that the
partitioner enforce will not be one that is meaningful (due to the timeuuid
layout).







So one can't use token on a timeuuid key?





On Tue, Oct 1, 2013 at 9:18 AM, Jan Algermissen
jan.algermis...@nordsc.com wrote:

 Maybe you are hitting the problem that your 'pages' can get truncated in
 the middle of a wide row.

 See
 https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/lHQ3wKAZgM4/DnlXT4IzqsQJ

 Jan



 On 01.10.2013, at 18:12, Jimmy Lin y2klyf+w...@gmail.com wrote:

  unfortunately, i have to stick with 1.2 for now for a while.
 
  So I am looking for the old fashion way to do the pagination correctly.
 
  I think i follow most of the articles on how to paging through a table,
 but maybe have some silly gap that don't give me the correct behavior or it
 is timeuuid not working for token function?
 
 
 
  On Tue, Oct 1, 2013 at 8:57 AM, David Ward da...@shareablee.com wrote:
  2.0 has a lot of really exciting stuff, unfortunately 2.0 has a lot of
 really exciting stuff that may increase the risk of updating to 2.0 just
 yet.
 
 
  On Tue, Oct 1, 2013 at 9:30 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Jimmy,
 
  On 01.10.2013, at 17:26, Jimmy Lin y2klyf+w...@gmail.com wrote:
 
   i have a table like the following:
  
   CREATE TABLE log (
   mykey timeuuid,
   type text,
   msg text,
   primary key(mykey, type)
   );
  
   I want to page through all the results from the table using
 
  Have you considered the new built-in paging support:
 
 
 http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
 
  Jan
 
  
   select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;
  
 
 
   (where xxx is 0 for the first query, and next one to be the time of
 the mykey(timeuuid) from the last query result)
  
   But i seem to get random result.
  
   #1
   is the above logic make sense for timeuuid type pagination?
  
   #2
   when we use token in the where clase, is the result return sorted?
   e.g
   where token(k) > token(4) AND token(k) < token(10) limit 3
  
   k=5, k=6, k=7
   or
   k=7, k=5, k=9
  
   ?
  
   I see lot of article use LIMIT to achieve page size, but if the result
 is not sorted, then it is possible to miss item?
  
  
   thanks
  
  
 
 
 




Re: nodetool cfhistograms refresh

2013-10-01 Thread Richard Low
On 1 October 2013 16:21, Rene Kochen rene.koc...@schange.com wrote:

 Quick question.

 I am using Cassandra 1.0.11

 When is nodetool cfhistograms output reset? I know that data is collected
 during read requests. But I am wondering if it is data since the beginning
 (start of Cassandra) or if it is reset periodically?


It is reset on node restart and on each call to nodetool cfhistograms.

Richard.


Re: default_time_to_live

2013-10-01 Thread Sylvain Lebresne
You're not supposed to change the table settings by modifying
system.schema_columnfamilies as this will skip proper propagation of the
change. Instead, you're supposed to do an ALTER TABLE, so something like:
  ALTER TABLE hol WITH default_time_to_live=10;

That being said, if you restart the node on which you've made the update,
the change should be picked up and propagated to all nodes. Still not a
bad idea to do the ALTER TABLE to make sure everything is set right.
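
For instance (a minimal check, assuming the table from your report):

    ALTER TABLE hol WITH default_time_to_live = 10;
    -- rows written after the ALTER should report a TTL counting down from 10
    SELECT ttl(coverage) FROM hol;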

--
Sylvain


On Tue, Oct 1, 2013 at 10:50 AM, Pieter Callewaert 
pieter.callewa...@be-mobile.be wrote:

 Hi,

 We are starting up a new cluster with Cassandra 2.0.0 and one of the
 features we were interested in was Per-CF TTL (
 https://issues.apache.org/jira/browse/CASSANDRA-3974)

 I didn't find any commands in CQL to set this value, so I've used the
 following:

    UPDATE system.schema_columnfamilies SET
 default_time_to_live = 10 WHERE keyspace_name = 'testschema' AND
 columnfamily_name = 'hol';

 Confirming it is set:

 cqlsh:testschema> select default_time_to_live from
 system.schema_columnfamilies where keyspace_name = 'testschema' and
 columnfamily_name = 'hol';

  default_time_to_live
 ----------------------
                    10

 Then I insert some dummy data, but it never expires...
 Using the ttl command I get this:

 cqlsh:testschema> select ttl(coverage) from hol;

  ttl(coverage)
 ---------------
           null

 Am I doing something wrong? Or is this a bug?

 Kind regards,

    Pieter Callewaert
    Web & IT engineer

    Web:   www.be-mobile.be
    Email: pieter.callewa...@be-mobile.be
    Tel:   +32 9 330 51 80


Maintaining counter column consistency

2013-10-01 Thread Ben Hood
Hi,

We're maintaining a bunch of application-specific counters that are
incremented on a per-event basis, just after the event has been
inserted.

Given that they can get out of sync, we were wondering if there
are any best practices or just plain real world experience for
handling the consistency of these counters?

The application could tolerate an inconsistency for a while, so I'm
not sure that the cost of any full-on ACID semantics (should they
actually be possible in Cassandra) would be justified.

So the first inclination was to issue the increment after the insert
and hope for the best. Then at some later point, we would run a
reconciliation on the underlying data in the column family and compare
this with the counter values. Obviously you can only do this once a
counter column has gone cold - i.e. it wouldn't make sense to
reconcile something that could still get incremented.

Does it make sense to put the insert and increment in a CQL batch?

Does anybody have any high level advice for this design deliberation?

Cheers,

Ben
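
One constraint worth noting (a hedged sketch; table names are hypothetical): in
CQL, counter updates go in their own COUNTER batch and cannot be mixed with
regular inserts in the same batch, so the event insert and its increments
cannot be grouped atomically this way:

    -- hypothetical counter table
    CREATE TABLE event_counters (
        event_type text PRIMARY KEY,
        total counter
    );

    -- counter mutations batch together, but separately from the event insert
    BEGIN COUNTER BATCH
        UPDATE event_counters SET total = total + 1 WHERE event_type = 'click';
        UPDATE event_counters SET total = total + 1 WHERE event_type = 'all';
    APPLY BATCH;

So increment-after-insert plus the periodic reconciliation described above is a
reasonable shape for this design.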


Rollback question regarding system metadata change

2013-10-01 Thread Christopher Wirt
Moving back to 1.2.10. What is the procedure to roll back from 2.0.1?

 

Changes in the system schema seem to make this quite difficult. 

 

We have:

DC1 - 10 x 1.2.10

DC2 - 4 x 1.2.10

DC3 - 3 x 2.0.1 - ran this for a couple days and have decided to roll back

 

In my efforts I've now completely taken DC3 offline and already tried:

bootstrapping from empty data dirs.

Using the pre-migration system table directories to get the old schema back. I
think this just gets overwritten by the newer schema.

 

So, I've dug in a bit and it looks like schema_columns now stores all
'columns', not just the value (non-primary) 'columns', and this is causing me
some issues for my rollback: 1.2.10 does not expect or handle a null in
the component_index field.

 

So when starting up 1.2.10 with the new schema modified by 2.0.1 I get this
exception on loading my first CF. 

 

'did' happens to be the primary key of the first column family we load,
which contains a null in the component index field.

 

ERROR [main] 2013-10-01 14:55:19,071 CassandraDaemon.java (line 463)
Exception encountered during startup

org.apache.cassandra.db.marshal.MarshalException: unable to make int from 'did'
        at org.apache.cassandra.db.marshal.Int32Type.fromString(Int32Type.java:87)
        at org.apache.cassandra.db.marshal.AbstractCompositeType.fromString(AbstractCompositeType.java:261)
        at org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:230)
        at org.apache.cassandra.config.CFMetaData.addColumnDefinitionSchema(CFMetaData.java:1522)
        at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1454)
        at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:306)
        at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:287)
        at org.apache.cassandra.db.DefsTable.loadFromTable(DefsTable.java:154)
        at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:571)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:253)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:446)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:489)
Caused by: java.lang.NumberFormatException: For input string: did
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)
        at java.lang.Integer.parseInt(Integer.java:527)
        at org.apache.cassandra.db.marshal.Int32Type.fromString(Int32Type.java:83)

 

Is there any easy way back from this?

 

My idea is to modify the schema_columns table to be as it was prior to the
migration, i.e. delete all rows for columns which are part of a primary key.

For obvious reasons I'm a little scared to do this.

 

Can anyone think of anything else I'd better watch out for?

 

Also FYI, we're rolling back due to issues we've experienced with the new
HsHa server; after some thorough testing on our side we intend to move
back to 2.*.

(I was being a bit cowboy-ish due to pressure to improve performance.)

 

Thanks,

Chris

 

 

 

 

 



Re: Rollback question regarding system metadata change

2013-10-01 Thread Robert Coli
On Tue, Oct 1, 2013 at 10:51 AM, Christopher Wirt chris.w...@struq.com wrote:

 Moving back to 1.2.10. What is the procedure roll back from 2.0.1?


 Changes in the system schema seem to make this quite difficult. 

 ...

First, unfortunately downgrade is not, in my understanding, a supported
operation.

In my efforts I’ve now completely taken DC3 offline and already tried:


Are you seeing the new schema on the 1.2.10 nodes?

If the schema has been modified globally via gossip, I don't understand how
your 1.2.10 nodes are currently working. It seems like there may be a risk
of them not working if you restart them?

If the schema changes have been seen globally, the safest solution is to
dump, drop, and re-create your schema. This is likely to require a downtime
across all DCs. You could probably do this semi-online by doing the
(theoretical) clone-a-keyspace-with-hard-links operation, but that would
require application changes/synchronization etc...

=Rob


Re: PendingTasks: What does it mean inside Cassandra?

2013-10-01 Thread Robert Coli
On Wed, Aug 28, 2013 at 5:47 AM, Girish Kumar girishkuma...@gmail.com wrote:

 When high number of Pending tasks are showing up what it means to
 cassandra? What are the reasons for high number pending tasks?  Does that
 mean Cassandra is overloaded ?


It means that tasks have been queued to run inside of a thread pool, but
are not currently running. A given task moves from Pending to Active state,
and then Completed. Blocked means "I attempted to move something from
Pending to Active, but was unable to."

http://www.datastax.com/docs/1.0/operations/monitoring

Cassandra maintains distinct thread pools for different stages of
execution. Each of these thread pools provide statistics on the number of
tasks that are active, pending and completed. Watching trends on these
pools for increases in the pending tasks column is an excellent indicator
of the need to add additional capacity.


=Rob


RE: default_time_to_live

2013-10-01 Thread Pieter Callewaert
Thanks, it works perfectly with ALTER TABLE. Stupid of me not to have thought
of this. Maybe I overlooked it, but perhaps this should be added to the docs.
Really a great feature!

Kind regards,


   Pieter Callewaert
   Web & IT engineer

   Web:   www.be-mobile.be
   Email: pieter.callewa...@be-mobile.be
   Tel:   +32 9 330 51 80



From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: dinsdag 1 oktober 2013 19:10
To: user@cassandra.apache.org
Subject: Re: default_time_to_live

You're not supposed to change the table settings by modifying 
system.schema_columnfamilies as this will skip proper propagation of the 
change. Instead, you're supposed to do an ALTER TABLE, so something like:
  ALTER TABLE hol WITH default_time_to_live=10;

That being said, if you restart the node on which you've made the update, the 
change should be picked up and propagated to all nodes. Still not a bad idea 
to do the ALTER TABLE to make sure everything is set right.

--
Sylvain

On Tue, Oct 1, 2013 at 10:50 AM, Pieter Callewaert 
pieter.callewa...@be-mobile.be wrote:
Hi,

We are starting up a new cluster with Cassandra 2.0.0 and one of the features 
we were interested in was Per-CF TTL 
(https://issues.apache.org/jira/browse/CASSANDRA-3974)
I didn't find any commands in CQL to set this value, so I've used the following:

   UPDATE system.schema_columnfamilies SET default_time_to_live = 
10 WHERE keyspace_name = 'testschema' AND columnfamily_name = 'hol';

Confirming it is set:

cqlsh:testschema> select default_time_to_live from system.schema_columnfamilies
where keyspace_name = 'testschema' and columnfamily_name = 'hol';

 default_time_to_live
----------------------
                   10

Then I insert some dummy data, but it never expires...
Using the ttl command I get this:

cqlsh:testschema> select ttl(coverage) from hol;

 ttl(coverage)
---------------
          null

Am I doing something wrong? Or is this a bug?

Kind regards,

   Pieter Callewaert
   Web & IT engineer

   Web:   www.be-mobile.be
   Email: pieter.callewa...@be-mobile.be
   Tel:   +32 9 330 51 80





Re: nodetool cfhistograms refresh

2013-10-01 Thread Rene Kochen
If I look at Read Latency I see indeed that they are reset during two runs
of cfhistograms. However, Row Size and Column Count keep the values.
When are they re-evaluated?

Thanks!

Rene


2013/10/1 Richard Low rich...@wentnet.com

 On 1 October 2013 16:21, Rene Kochen rene.koc...@schange.com wrote:

 Quick question.

 I am using Cassandra 1.0.11

 When is nodetool cfhistograms output reset? I know that data is collected
 during read requests. But I am wondering if it is data since the beginning
 (start of Cassandra) or if it is reset periodically?


 It is reset on node restart and on each call to nodetool cfhistograms.

 Richard.



Re: nodetool cfhistograms refresh

2013-10-01 Thread Tyler Hobbs
On Tue, Oct 1, 2013 at 2:34 PM, Rene Kochen rene.koc...@emea.schange.com wrote:

 However, Row Size and Column Count keep the values. When are they
 re-evaluated?


They are re-evaluated during compaction.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: nodetool cfhistograms refresh

2013-10-01 Thread Rene Kochen
Thanks!

Does that mean that cfhistograms scans all Statistics.db files in order to
populate the Row Size and Column Count values?


2013/10/1 Tyler Hobbs ty...@datastax.com


 On Tue, Oct 1, 2013 at 2:34 PM, Rene Kochen 
 rene.koc...@emea.schange.comwrote:

 However, Row Size and Column Count keep the values. When are they
 re-evaluated?


 They are re-evaluated during compaction.


 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: Rollback question regarding system metadata change

2013-10-01 Thread Chris Wirt
Yep, they still work. They don't actually have any of the new system CFs
created for 2.0 (paxos, etc.), but they do have new rows in the
schema_columns table preventing startup and bootstrapping of new
nodes.

If I drop the keyspace and recreate it quickly, I'm guessing that will
cause quite a large mess for a couple of minutes while I recreate the
schema and load the sstables.
But anyway, the actions to do this would be:
- drop schema (won't actually delete data?)
- create schema (will create all the metadata and leave my data
directories alone?)
- on each node run nodetool refresh (will load my existing data?)
Sound realistic?









On 1 October 2013 19:40, Robert Coli rc...@eventbrite.com wrote:
 On Tue, Oct 1, 2013 at 10:51 AM, Christopher Wirt chris.w...@struq.com
 wrote:

 Moving back to 1.2.10. What is the procedure roll back from 2.0.1?



 Changes in the system schema seem to make this quite difficult.

 ...

 First, unfortunately downgrade is not, in my understanding, a supported
 operation.

 In my efforts I’ve now completely taken DC3 offline and already tried:


 Are you seeing the new schema on the 1.2.10 nodes?

 If the schema has been modified globally via gossip, I don't understand how
 your 1.2.10 nodes are currently working. It seems like there may be a risk
 of them not working if you restart them?

 If the schema changes have been seen globally, the safest solution is to
 dump, drop, and re-create your schema. This is likely to require a downtime
 across all DCs. You could probably do this semi-online by doing the
 (theoretical) clone-a-keyspace-with-hard-links operation, but that would
 require application changes/synchronization etc...

 =Rob


Re: nodetool cfhistograms refresh

2013-10-01 Thread Tyler Hobbs
On Tue, Oct 1, 2013 at 3:52 PM, Rene Kochen rene.koc...@emea.schange.com wrote:

 Does that mean that cfhistograms scans all Statistics.db files in order to
 populate the Row Size and Column Count values?


On startup, yes.  After that, it should be updated as new SSTables are
created.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Rollback question regarding system metadata change

2013-10-01 Thread Robert Coli
On Tue, Oct 1, 2013 at 3:45 PM, Chris Wirt chris.w...@struq.com wrote:

 Yep, they still work. They don't actually have any of the new system CFs
 created for 2.0 (paxos, etc.), but they do have new rows in the
 schema_columns table preventing startup and bootstrapping of new
 nodes.


It *may* be least risky to manually remove these rows and then restart DC3.
But unfortunately without really diving into the code, I can't make any
statement about what effects it might have.


 But anyway, actions to do this would be:
 - drop schema (wont actually delete data?)


What actually happens is that you automatically create a snapshot in the
snapshots dir when you drop, so you would have to move (or (better) hard
link) those files back into place.


 - create schema (will create all the metadata and leave my data
 directories alone?)
 - on each node run nodetool refresh (will load my existing data?)


Right. Refresh will rename all SSTables while opening them.

As an alternative to refresh, you can restart the node; Cassandra loads
whatever files it finds in the data dir at startup.

=Rob


Re: Unbalanced ring mystery multi-DC issue with 1.1.11

2013-10-01 Thread Aaron Morton
Check the logs for messages about nodes going up and down, and also look at the
MessagingService MBean for timeouts. If the node in DC2 times out replying to
DC1, the DC1 node will store a hint.

Also when hints are stored they are TTL'd to the gc_grace_seconds for the CF 
(IIRC). If that's low the hints may not have been delivered. 

I am not aware of any specific tracking for failed hints other than log messages.

A

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 28/09/2013, at 12:01 AM, Oleg Dulin oleg.du...@gmail.com wrote:

 Here is some more information.
 
 I am running full repair on one of the nodes and I am observing strange 
 behavior.
 
 Both DCs were up during the data load. But repair is reporting a lot of 
 out-of-sync data. Why would that be ? Is there a way for me to tell that WAN 
 may be dropping hinted handoff traffic ?
 
 Regards,
 Oleg
 
 On 2013-09-27 10:35:34 +, Oleg Dulin said:
 
 Wanted to add one more thing:
 I can also tell that the numbers are not consistent across DCs this way -- I
 have a column family with really wide rows (a couple million columns).
 DC1 reports higher column counts than DC2. DC2 only becomes consistent after 
 I do the command a couple of times and trigger a read-repair. But why would 
 nodetool repair logs show that everything is in sync ?
 Regards,
 Oleg
 On 2013-09-27 10:23:45 +, Oleg Dulin said:
 Consider this output from nodetool ring:
 Address  DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                    127605887595351923798765477786913079396
 dc1.5    DC1  RAC1  Up      Normal  32.07 GB  50.00%               0
 dc2.100  DC2  RAC1  Up      Normal  8.21 GB   50.00%               100
 dc1.6    DC1  RAC1  Up      Normal  32.82 GB  50.00%               42535295865117307932921825928971026432
 dc2.101  DC2  RAC1  Up      Normal  12.41 GB  50.00%               42535295865117307932921825928971026532
 dc1.7    DC1  RAC1  Up      Normal  28.37 GB  50.00%               85070591730234615865843651857942052864
 dc2.102  DC2  RAC1  Up      Normal  12.27 GB  50.00%               85070591730234615865843651857942052964
 dc1.8    DC1  RAC1  Up      Normal  27.34 GB  50.00%               127605887595351923798765477786913079296
 dc2.103  DC2  RAC1  Up      Normal  13.46 GB  50.00%               127605887595351923798765477786913079396
 I concealed IPs and DC names for confidentiality.
 All of the data loading was happening against DC1 at a pretty brisk rate, 
 of, say, 200K writes per minute.
 Note how my tokens are offset by 100. Shouldn't that mean that load on each 
 node should be roughly identical ? In DC1 it is roughly around 30 G on each 
 node. In DC2 it is almost 1/3rd of the nearest DC1 node by token range.
 To verify that the nodes are in sync, I ran nodetool -h localhost repair 
 MyKeySpace --partitioner-range on each node in DC2. Watching the logs, I 
 see that the repair went really quick and all column families are in sync!
 I need help making sense of this. Is this because DC1 is not fully 
 compacted ? Is it because DC2 is not fully synced and I am not checking 
 correctly ? How can I tell whether replication is still in progress (note, I
 started my load yesterday at 9:50am)?
 
 
 -- 
 Regards,
 Oleg Dulin
 http://www.olegdulin.com
 
 



Re: changing the primary key type of a table

2013-10-01 Thread Aaron Morton
 is there any downside using text as primary key? any performance impact on 
 the partition ? 
Nope. 

 There is no way to alter a table's primary key with a cql command; that is
 what I have read, so migrating to a new table seems to be the only way.
Yup.

 Is there any good recommendation on how to do this? cqlsh COPY or json2sstable
 seems like it will result in the same datatype as before.
* create the new table
* backfill data to it while your app is writing to both the old and the new
* stop writing to the old
* drop the old. 
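
A hedged sketch of the shape of that migration (all names hypothetical; the old
table is keyed by uuid, the new one by text):

    -- new table, keyed by text instead of uuid
    CREATE TABLE things_by_text_key (
        id   text PRIMARY KEY,
        data text
    );

    -- backfill: for each old row, rewrite the uuid key as text
    INSERT INTO things_by_text_key (id, data)
    VALUES ('8a9f1c2e-0b5e-11e3-8ffd-0800200c9a66', 'payload');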


Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 28/09/2013, at 4:05 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 hi,
 we have a table whose primary key is of uuid type. Now we have decided that we need
 to use the text type, as it is more flexible for our application.
  
 #1
 is there any downside using text as primary key? any performance impact on 
 the partition ? 
  
 #2
 There is no way to alter a table's primary key with a cql command; that is
 what I have read, so migrating to a new table seems to be the only way.
 Is there any good recommendation on how to do this? cqlsh COPY or json2sstable
 seems like it will result in the same datatype as before.
  
 thanks
  



Re: What is the best way to install upgrade Cassandra on Ubuntu ?

2013-10-01 Thread Aaron Morton
 Does DSC include other things like Opscenter by default ? 
Not sure, I've normally installed it with an existing cluster.

 Would it be possible to remove any of these installations while keeping the
 data intact & easily switch to the other, I mean switching from the DSC package
 to the Apache one or vice versa ?
Yes. 
Same code, same data. 

A

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 30/09/2013, at 9:58 PM, Ertio Lew ertio...@gmail.com wrote:

 Thanks Aaron! 
 
 Does DSC include other things like Opscenter by default ? I installed DSC on
 Linux, but Opscenter wasn't installed there; but when I tried on Windows it was
 installed along with JRE & Python, using the Windows installer.
 
 Would it be possible to remove any of these installations while keeping the
 data intact & easily switch to the other, I mean switching from the DSC package
 to the Apache one or vice versa ?
 
 
 On Mon, Sep 30, 2013 at 1:10 PM, Aaron Morton aa...@thelastpickle.com wrote:
 I am not sure if I should use datastax's DSC or official Debian packages 
 from Cassandra. How do I choose between them for a production server ?
 They are technically the same. 
 The DSC update will come out a little after the Apache release, and I _think_ 
 they release for every Apache release.
 
  1.  when I upgrade to a newer version, would that retain my previous 
 configurations so that I don't need to configure everything again ? 
 
 Yes if you select that when doing the package install. 
 
 2.  would that smoothly replace the previous installation by itself ?
 
 
 Yes
 
 3.  what's the way (kindly, if you can tell the command) to upgrade ?
 
 
 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#upgrade/upgradeC_c.html#concept_ds_yqj_5xr_ck
 
 4. when should I prefer datastax's dsc to that ? (I need to install for 
 production env.)
 
 Above
 
 Hope that helps. 
 
 
 -
 Aaron Morton
 New Zealand
 @aaronmorton
 
 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com
 
 On 27/09/2013, at 11:01 PM, Ertio Lew ertio...@gmail.com wrote:
 
 I am not sure if I should use datastax's DSC or official Debian packages 
 from Cassandra. How do I choose between them for a production server ?
 
 
 
 On Fri, Sep 27, 2013 at 11:02 AM, Ertio Lew ertio...@gmail.com wrote:
 
  Could you please clarify that:
 1.  when I upgrade to a newer version, would that retain my previous 
 configurations so that I don't need to configure everything again ? 
 2.  would that smoothly replace the previous installation by itself ?
 3.  what's the way (kindly, if you can tell the command) to upgrade ?
 4. when should I prefer datastax's dsc to that ? (I need to install for 
 production env.)
 
 
 On Fri, Sep 27, 2013 at 12:50 AM, Robert Coli rc...@eventbrite.com wrote:
 On Thu, Sep 26, 2013 at 12:05 PM, Ertio Lew ertio...@gmail.com wrote:
 How do you install Cassandra on Ubuntu & later how do you upgrade the
 installation on the node when an update has arrived ? Do you simply download
 & replace the latest tar.gz, untar it to replace the older cassandra files?
 How do you do it ? How does this upgrade process differ for a major version
 upgrade, like say switching from the 1.2 series to the 2.0 series ?
 
 Use the deb packages. To upgrade, install the new package. Only upgrade a
 single major version, and be sure to consult NEWS.txt for any upgrade
 caveats.
 
 Also be aware of this sub-optimal behavior of the debian packages :
 
 https://issues.apache.org/jira/browse/CASSANDRA-2356
 
 =Rob
 
 
 
 
 



Re: 2.0.1 counter replicate on write error

2013-10-01 Thread Aaron Morton
 Thanks Aaron, I’ve added to the ticket. We were not running on TRACE logging.
Thanks. 

The only workaround I can think of is using nodetool scrub. That will read the
-Data.db file and re-write it and the other components.

Remember to snapshot first for roll back. 


Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 30/09/2013, at 10:43 PM, Christopher Wirt chris.w...@struq.com wrote:

 Thanks Aaron, I’ve added to the ticket. We were not running on TRACE logging.
  
 From: Aaron Morton [mailto:aa...@thelastpickle.com] 
 Sent: 30 September 2013 08:37
 To: user@cassandra.apache.org
 Subject: Re: 2.0.1 counter replicate on write error
  
 ERROR [ReplicateOnWriteStage:19] 2013-09-27 10:17:14,778 CassandraDaemon.java 
 (line 185) Exception in thread Thread[ReplicateOnWriteStage:19,5,main]
 java.lang.AssertionError: DecoratedKey(-1754949563326053382, 
 a414b0c07f0547f8a75410555716ced6) != DecoratedKey(-1754949563326053382, 
 aeadcec8184445d4ab631ef4250927d0) in 
 /disk3/cassandra/data/struqrealtime/counters/struqrealtime-counters-jb-831953-Data.db
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:114)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:62)
  
 When reading from an SSTable the position returned from the -Index.db / KEYS 
 cache pointed to a row in the -Data.db component that was for a different 
 row. 
  
 DecoratedKey(-1754949563326053382, aeadcec8184445d4ab631ef4250927d0)
 Was what we were searching for
  
 DecoratedKey(-1754949563326053382, a414b0c07f0547f8a75410555716ced6)
 Is what was found in the data component. 
  
 The first part is the token (M3 hash); the second is the key. It looks like a
 collision, but it could also be a bug somewhere else.
  
 Code in SSTableReader.getPosition() points to
 https://issues.apache.org/jira/browse/CASSANDRA-4687 and adds an assertion
 that is only triggered if TRACE logging is running. Can you add to the 4687
 ticket and update the thread ?
  
 Cheers
  
 -
 Aaron Morton
 New Zealand
 @aaronmorton
  
 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com
  
 On 27/09/2013, at 10:50 PM, Christopher Wirt chris.w...@struq.com wrote:
 
 
 Hello,
  
 I’ve started to see a slightly worrying error appear in our logs
 occasionally. We’re writing at 400 qps per machine and I only see this appear
 every 5-10 minutes.

 It seems to have started when I switched us to using the hsha thrift server this
 morning. We’ve been running 2.0.1 off the sync thrift server since
 yesterday without seeing this error. But it might not be related.
  
 There are some machines in another DC still running 1.2.10.
  
 Anyone seen this before? Have any insight?
  
 ERROR [ReplicateOnWriteStage:19] 2013-09-27 10:17:14,778 CassandraDaemon.java 
 (line 185) Exception in thread Thread[ReplicateOnWriteStage:19,5,main]
 java.lang.AssertionError: DecoratedKey(-1754949563326053382, 
 a414b0c07f0547f8a75410555716ced6) != DecoratedKey(-1754949563326053382, 
 aeadcec8184445d4ab631ef4250927d0) in 
 /disk3/cassandra/data/struqrealtime/counters/struqrealtime-counters-jb-831953-Data.db
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:114)
 at 
 org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:62)
 at 
 org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:87)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
 at 
 org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:249)
 at 
 org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1468)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1294)
 at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:332)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:55)
 at 
 org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:100)
 at 
 org.apache.cassandra.service.StorageProxy$8$1.runMayThrow(StorageProxy.java:1107)
 at 
 org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1897)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)



Re: Refactoring old project

2013-10-01 Thread Aaron Morton
I would try:

Comments CF:
row_key: (thing_type : thing_id ) where thing_type is city etc
column_name: (comment_id (reversed)) where comment_id is a timeuuid
column_value: the comment. 

You will need to be wary of very wide rows. 

It's a pretty simple model for CQL 3 as well:

CREATE TABLE comments (
    thing_type  text,
    thing_id    bigint,
    comment_id  timeuuid,
    body        text,
    user        text,
    PRIMARY KEY ( (thing_type, thing_id), comment_id )
) WITH CLUSTERING ORDER BY (comment_id DESC);

or

CREATE TABLE comments (
    thing_type  text,
    thing_id    bigint,
    created_at  timestamp,
    user        text,
    comment_id  bigint,
    body        text,
    PRIMARY KEY ( (thing_type, thing_id), created_at, user )
);
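
For example (the thing_id value is made up): with the first table, "latest X
comments of a company" becomes a single-partition read in clustering order:

    SELECT * FROM comments
    WHERE thing_type = 'company' AND thing_id = 12345
    ORDER BY comment_id DESC LIMIT 20;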


Hope that helps. 

 
-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 28/09/2013, at 1:24 AM, cbert...@libero.it wrote:

 Hi all, in my very old Cassandra schema (started with 0.6 -- so without
 secondary indexes -- and now on 1.0.6) I have a rating & review platform with
 about 1 million reviews. The core of the application is the review that a user
 can leave about a company. At the time I created many CFs: Comments,
 UserComments, CompanyComments, CityComments -- and I used timeuuid to keep
 data sorted in the way I needed (UserComments/CompanyComments/CityComments did
 not keep real comments but just a reference [id] to the comment table).
 
 Since I need comments to be sorted by date, what would be the best way to 
 write it again using cassandra 2.0?
 Obviously all these CF will merge into one. What I would need is to perform 
 query likes ...
 
 Get latest X comments in a specific city
 Get latest X comments of a company
 Get latest X comments of a user
 
 I can't sort client side because, even if for a user/company I can have up to
 200 reviews, for a city I can have 50,000 and more comments.
 I know that Murmur3 is the suggested partitioner, but I wonder if this is not a
 case for using the order-preserving one.
 
 A row entry would be something like
 
 CommentID (RowKey) -- companyId -- userId -- text - vote - city
 
 Another idea is to use a composite key made by (city, commentid) so I would 
 have all comments sorted by city for free and could perform client-side 
 sorting 
 for user/company comments. Am I missing something? 
 
 TIA,
 Carlo
 
 



Re: temporal solution to CASSANDRA-5543: BUILD FAILED at gen-cql2-grammar target

2013-10-01 Thread Aaron Morton
 In the build file,  I see that cassandra uses the jar lib at the ${build.lib} 
  folder, in this case antlr-3.2.jar
my bad. 

• ant generate-cql-html
• ant maven-ant-tasks-init

and then execute the ant default target:

• ant
Does it work if you use 

ant clean
ant build
ant artifacts 

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 30/09/2013, at 11:10 PM, Miguel Angel Martin junquera 
mianmarjun.mailingl...@gmail.com wrote:

 hi:
 
 
 
 does that mean that antlr-3.2.jar is not the correct version?
 
 what is the correct version?
 
 In the build file,  I see that cassandra uses the jar lib at the ${build.lib} 
  folder, in this case antlr-3.2.jar
 
 
 ...
  
 <target name="gen-cql2-grammar" depends="check-gen-cql2-grammar" unless="cql2current">
   <echo>Building Grammar ${build.src.java}/org/apache/cassandra/cql/Cql.g ...</echo>
   <java classname="org.antlr.Tool"
         classpath="${build.lib}/antlr-3.2.jar"
         fork="true"
         failonerror="true">
     <arg value="${build.src.java}/org/apache/cassandra/cql/Cql.g" />
     <arg value="-fo" />
     <arg value="${build.src.gen-java}/org/apache/cassandra/cql/" />
   </java>
 </target>
 
 ...
 
 
 
 thanks in advance
 
 
 
 Miguel Angel Martín Junquera
 Analyst Engineer.
 miguelangel.mar...@brainsins.com
 
 
 
 2013/9/30 Aaron Morton aa...@thelastpickle.com
 It's an error in the antlr compilation, check the antlr versions.
 
 Cheers
 
 -
 Aaron Morton
 New Zealand
 @aaronmorton
 
 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com
 
 On 27/09/2013, at 11:53 PM, Miguel Angel Martin junquera 
 mianmarjun.mailingl...@gmail.com wrote:
 
 
 
 
 
  hi all:
 
  Environment
 
• apache-cassandra-2.0.1-src
• EC2
• Linux version 3.2.30-49.59.amzn1.x86_64 
  (mockbuild@gobi-build-31003) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3)
 
 When I try to build apache-cassandra-2.0.1-src on an EC2 Red Hat AMI, I have
 this error at the gen-cql2-grammar target:
 
 
  gen-cql2-grammar:
   [echo] Building Grammar 
  /home/ec2-user/apache-cassandra/src/java/org/apache/cassandra/cql/Cql.g
 
  ….
 
[java] warning(209): 
  /home/ec2-user/apache-cassandra/src/java/org/apache/cassandra/cql/Cql.g:638:1:
   Multiple token rules can match input such as '0'..'9': INTEGER, FLOAT, 
  UUID
   [java]
   [java] As a result, token(s) FLOAT,UUID were disabled for that input
   [java] warning(209): 
  /home/ec2-user/apache-cassandra/src/java/org/apache/cassandra/cql/Cql.g:634:1:
   Multiple token rules can match input such as 'I': K_INSERT, K_IN, 
  K_INDEX, K_INTO, IDENT, COMPIDENT
   [java]
   [java] As a result, token(s) K_IN,K_INDEX,K_INTO,IDENT,COMPIDENT were 
  disabled for that input
   [java] warning(209): 
  /home/ec2-user/apache-cassandra/src/java/org/apache/cassandra/cql/Cql.g:634:1:
   Multiple token rules can match input such as {'R', 'r'}: K_REVERSED, 
  IDENT, COMPIDENT
   [java]
   [java] As a result, token(s) IDENT,COMPIDENT were disabled for that 
  input
   [java] warning(209): 
  /home/ec2-user/apache-cassandra/src/java/org/apache/cassandra/cql/Cql.g:634:1:
   Multiple token rules can match input such as 'T': K_LEVEL, K_TRUNCATE, 
  K_COLUMNFAMILY, K_TIMESTAMP, K_TTL, K_TYPE, IDENT, COMPIDENT
   [java]
   [java] As a result, token(s) 
  K_TRUNCATE,K_COLUMNFAMILY,K_TIMESTAMP,K_TTL,K_TYPE,IDENT,COMPIDENT were 
  disabled for that input
   [java] error(208): 
  /home/ec2-user/apache-cassandra/src/java/org/apache/cassandra/cql/Cql.g:654:1:
   The following token definitions can never be matched because prior tokens 
  match the same input: 
  T__93,T__94,T__97,T__98,T__101,T__105,T__107,K_WITH,K_USING,K_USE,K_FIRST,K_COUNT,K_SET,K_APPLY,K_BATCH,K_TRUNCATE,K_IN,K_CREATE,K_KEYSPACE,K_COLUMNFAMILY,K_INDEX,K_ON,K_DROP,K_INTO,K_TIMESTAMP,K_TTL,K_ALTER,K_ADD,K_TYPE,RANGEOP,FLOAT,COMPIDENT,UUID,MULTILINE_COMMENT
 
  BUILD FAILED
  /home/ec2-user/apache-cassandra/build.xml:218: Java returned: 1
 
 
 
 
 
 
 If I execute these targets in the following order first:

• ant generate-cql-html
• ant maven-ant-tasks-init

 and then execute the ant default target:

• ant
 
 
 the project builds successfully.
 
  Regards.
 
 
 
 
 
  note:
 
 I do not have this issue on my Mac.
 
 
 
 
 
 
  Miguel Angel Martín Junquera
  Analyst Engineer.
  miguelangel.mar...@brainsins.com
 
 
 



Re: Cassandra Summit EU 2013

2013-10-01 Thread Aaron Morton
I'll be there :)

* Conducting training with DataStax on the 16th and 18th. (Tickets are still available for
the 18th: http://www.datastax.com/cassandraeurope2013)
* Participating in the panel discussion for Cassandra London with Tim Moreton
(Acunu), Patrick McFadin (DataStax), and Al Tobey (DataStax) on the 16th:
http://www.meetup.com/Cassandra-London/events/142497992/
* Speaking at the conference on the 17th.

Hopefully I'll get to speak to lots of people outside of these times.

Cheers
Aaron


-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 1/10/2013, at 2:12 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:

 For those in the Europe area, there will be a Cassandra Summit EU 2013 in
 London in the month of October. On 17 October there will be the main
 conference sessions, and on the 16th and 18th there will be Cassandra workshops.
 
 http://www.datastax.com/cassandraeurope2013
 
 The speakers have been announced and the presentation abstracts are all on 
 there.  Like always, the presentations will be recorded and posted on Planet 
 Cassandra, but it's great to meet and interact with people in the community - 
 in my opinion that's the best part of any conference.
 
 Anyway, just wanted to make sure people knew.
 
 Cheers,
 
 Jeremy



Cassandra Heap Size for data more than 1 TB

2013-10-01 Thread srmore
Does anyone know what would roughly be the heap size for cassandra with 1TB
of data ? We started with about 200 G and now on one of the nodes we are
already on 1 TB. We were using 8G of heap and that served us well up until
we reached 700 G where we started seeing failures and nodes flipping.

With 1 TB of data the node refuses to come back due to lack of memory.
Needless to say, repairs and compactions take a lot of time. We upped the
heap from 8 G to 12 G and suddenly everything started moving rapidly, i.e.
the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we
started seeing the same symptoms as we were seeing with 8 G.

So my question is how do I determine what is the optimal size of heap for
data around 1 TB ?

Following are some of my JVM settings

-Xms8G
-Xmx8G
-Xmn800m
-XX:NewSize=1200M
-XX:MaxTenuringThreshold=2
-XX:SurvivorRatio=4

Thanks !


Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
OK, found the problem.

I was using something like:
select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;

Instead I should simply use:
select * from log where token(mykey) > token(key_from_last_result) limit
100;


The fake timeuuid, although it represents the time from the last key, doesn't do
anything useful for the token function. The argument to token should
really be the actual key value.
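
In sketch form (placeholder uuid for the last key seen on the previous page):

    -- first page
    SELECT * FROM log LIMIT 100;

    -- next page: pass the actual mykey from the last row, not a synthetic timeuuid
    SELECT * FROM log
    WHERE token(mykey) > token(8a9f1c2e-0b5e-11e3-8ffd-0800200c9a66)
    LIMIT 100;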


On Tue, Oct 1, 2013 at 9:32 AM, Jimmy Lin y2klyf+w...@gmail.com wrote:

 thanks, yea i am aware of that, and have already taken care.

  I just also found out a similar thread back in June

 http://mail-archives.apache.org/mod_mbox/cassandra-user/201306.mbox/%3ccakkz8q2no6oucbwnveomn_ymxfh0nkpqvtym55jmvwa2qwx...@mail.gmail.com%3E

 Someone was saying:


 Long story short, using non-equal condition on the partition key (i.e. the
 first part of your primary key) is generally not advised. Or to put it
 another way, the use of the byte ordering partitioner is discouraged. But
 if you still want to use the ordering partitioner and do range queries on
 the partition key, do not use a timeuuid, because the ordering that the
 partitioner enforce will not be one that is meaningful (due to the timeuuid
 layout).







 So one can't use token on a timeuuid key?





 On Tue, Oct 1, 2013 at 9:18 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:

 Maybe you are hitting the problem that your 'pages' can get truncated in
 the middle of a wide row.

 See
 https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/lHQ3wKAZgM4/DnlXT4IzqsQJ

 Jan



 On 01.10.2013, at 18:12, Jimmy Lin y2klyf+w...@gmail.com wrote:

  unfortunately, i have to stick with 1.2 for now for a while.
 
  So I am looking for the old fashion way to do the pagination correctly.
 
  I think i follow most of the articles on how to paging through a table,
 but maybe have some silly gap that don't give me the correct behavior or it
 is timeuuid not working for token function?
 
 
 
  On Tue, Oct 1, 2013 at 8:57 AM, David Ward da...@shareablee.com
 wrote:
  2.0 has a lot of really exciting stuff, unfortunately 2.0 has a lot of
 really exciting stuff that may increase the risk of updating to 2.0 just
 yet.
 
 
  On Tue, Oct 1, 2013 at 9:30 AM, Jan Algermissen 
 jan.algermis...@nordsc.com wrote:
  Jimmy,
 
  On 01.10.2013, at 17:26, Jimmy Lin y2klyf+w...@gmail.com wrote:
 
   i have a table like the following:
  
   CREATE TABLE log (
   mykey timeuuid,
   type text,
   msg text,
   primary key(mykey, type)
   );
  
   I want to page through all the results from the table using
 
  Have you considered the new built-in paging support:
 
 
 http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
 
  Jan
 
  
    select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;
  
 
 
   (where xxx is 0 for the first query, and next one to be the time of
 the mykey(timeuuid) from the last query result)
  
   But i seem to get random result.
  
   #1
   is the above logic make sense for timeuuid type pagination?
  
   #2
   when we use token in the where clase, is the result return sorted?
   e.g
    where token(k) > token(4) AND token(k) < token(10) limit 3
  
   k=5, k=6, k=7
   or
   k=7, k=5, k=9
  
   ?
  
   I see lot of article use LIMIT to achieve page size, but if the
 result is not sorted, then it is possible to miss item?
  
  
   thanks
  
  
 
 
 





Re: Cassandra Heap Size for data more than 1 TB

2013-10-01 Thread Mohit Anchlia
Which Cassandra version are you on? Essentially, heap size is a function of
the number of keys and the metadata. In Cassandra 1.2 a lot of the metadata,
like bloom filters, was moved off heap.
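
If you are stuck below 1.2 for now, one hedged lever (assuming the option
exists in your release; keyspace and table names are hypothetical) is shrinking
the on-heap bloom filters by raising their false-positive chance:

    -- trades some extra disk reads for a smaller on-heap bloom filter
    ALTER TABLE mykeyspace.mytable WITH bloom_filter_fp_chance = 0.1;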

On Tue, Oct 1, 2013 at 9:34 PM, srmore comom...@gmail.com wrote:

 Does anyone know what would roughly be the heap size for cassandra with
 1TB of data ? We started with about 200 G and now on one of the nodes we
 are already on 1 TB. We were using 8G of heap and that served us well up
 until we reached 700 G where we started seeing failures and nodes flipping.

 With 1 TB of data the node refuses to come back due to lack of memory.
 needless to say repairs and compactions takes a lot of time. We upped the
 heap from 8 G to 12 G and suddenly everything started moving rapidly i.e.
 the repair tasks and the compaction tasks. But soon (in about 9-10 hrs) we
 started seeing the same symptoms as we were seeing with 8 G.

 So my question is how do I determine what is the optimal size of heap for
 data around 1 TB ?

 Following are some of my JVM settings

 -Xms8G
 -Xmx8G
 -Xmn800m
 -XX:NewSize=1200M
 -XX:MaxTenuringThreshold=2
 -XX:SurvivorRatio=4

 Thanks !