Re: Upgrade from 2.0.9 to 2.1.3

2015-03-07 Thread Fredrik Larsson Stigbäck
Thanks for the advice.
/Fredrik

 On 7 Mar 2015, at 02:25, graham sanderson gra...@vast.com wrote:
 
 A note for anyone who, accidentally or otherwise, ends up on 2.1.3 in a
 situation where they cannot downgrade: feel free to look at
 
 https://github.com/vast-engineering/cassandra/tree/vast-cassandra-2.1.3
 
 We sometimes build custom versions that incorporate as many of the important
 patches as we reasonably can, which we need in order to run a newer C*
 environment successfully.
 
 Obviously use at your own risk, blah blah… basically, the install procedure
 would be to replace the main Cassandra jar on a 2.1.3 node while it is down.
 
 On Mar 6, 2015, at 3:15 PM, Robert Coli rc...@eventbrite.com wrote:
 
 On Fri, Mar 6, 2015 at 6:25 AM, graham sanderson gra...@vast.com wrote:
 I would definitely wait for at least 2.1.4
 
 +1
 
 https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
 
 =Rob
  
 



Re: cassandra node jvm stall intermittently

2015-03-07 Thread Jason Wee
Hi Jan, thanks for your time; the questions and my answers are below.


   - How many nodes do you have on the ring?
   12

   - What is the activity when this occurs - reads / writes / compactions?
   This cluster has a lot of writes and reads. During off-peak periods,
   OpsCenter shows cluster writes at about 5k/sec and reads at about 1k/sec;
   during peak periods, writes can reach 22k/sec and reads about 10k/sec. This
   particular node hangs constantly, regardless of whether it is peak or
   off-peak, or whether compaction is running.

   - Is there anything that is unique about this node that makes it
   different from the other nodes?
   Our nodes are identical in terms of operating system (CentOS 6) and
   Cassandra configuration settings. Other than that, there are no
   resource-intensive applications running on the Cassandra nodes.


   - Is this a periodic occurrence OR a single occurrence - I am trying to
   determine a pattern about when this shows up.
   It happens *all the time*; in fact, it is happening now.

   - What is the load distribution across the ring (ie: is this node carrying
   more load than the others)?
   As of this moment:

   Address  DC       Rack  Status  State   Load       Owns   Token
                                                             155962751505430129087380028406227096910
   node1    us-east  1e    Up      Normal  498.66 GB  8.33%  0
   node2    us-east  1e    Up      Normal  503.36 GB  8.33%  14178431955039102644307275309657008810
   node3    us-east  1e    Up      Normal  492.08 GB  8.33%  28356863910078205288614550619314017619
   node4    us-east  1e    Up      Normal  499.54 GB  8.33%  42535295865117307932921825928971026430
   node5    us-east  1e    Up      Normal  523.76 GB  8.33%  56713727820156407428984779325531226109
   node6    us-east  1e    Up      Normal  515.36 GB  8.33%  70892159775195513221536376548285044050
   node7    us-east  1e    Up      Normal  588.93 GB  8.33%  85070591730234615865843651857942052860
   node8    us-east  1e    Up      Normal  498.51 GB  8.33%  99249023685273718510150927167599061670
   node9    us-east  1e    Up      Normal  531.81 GB  8.33%  113427455640312814857969558651062452221
   node10   us-east  1e    Up      Normal  501.85 GB  8.33%  127605887595351923798765477786913079290
   node11   us-east  1e    Up      Normal  501.13 GB  8.33%  141784319550391026443072753096570088100
   node12   us-east  1e    Up      Normal  508.45 GB  8.33%  155962751505430129087380028406227096910

   The problem node is node5. In this ring output it is among the
   higher-loaded nodes in the ring, but that is unlikely to be the cause.


Jason

On Sat, Mar 7, 2015 at 3:35 PM, Jan cne...@yahoo.com wrote:

 Hi Jason;

 The single node showing the anomaly is a hint that the problem is probably
 local to a node (as you suspected).

- How many nodes do you have on the ring?
- What is the activity when this occurs - reads / writes / compactions?
- Is there anything that is unique about this node that makes it
different from the other nodes?
- Is this a periodic occurrence OR a single occurrence - I am trying
to determine a pattern about when this shows up.
- What is the load distribution across the ring (ie: is this node carrying
more load than the others)?


 The system.log should have more info about it.

 hope this helps
 Jan/





   On Friday, March 6, 2015 4:50 AM, Jason Wee peich...@gmail.com wrote:


 Well, StatusLogger.java output started showing up in the Cassandra
 system.log, and MessagingService.java also reported some stages (e.g. read,
 mutation) being dropped.

 It's strange that it only happens on this node; this type of message does not
 show up in the other nodes' log files at the same time...

 Jason

 On Thu, Mar 5, 2015 at 4:26 AM, Jan cne...@yahoo.com wrote:

 Hi Jason;

 What's in the log files at the moment jstat shows 100%?
 What is the activity on the cluster and the node at that specific point in
 time (reads / writes / joins etc.)?

 Jan/


   On Wednesday, March 4, 2015 5:59 AM, Jason Wee peich...@gmail.com
 wrote:


 Hi, our Cassandra nodes use Java 7 update 72. We ran jstat on one of the
 nodes and noticed some strange behaviour, as shown in the output below.
 Any idea why the Eden space stays at the same value (e.g. 100%, or 18.02%)
 for several seconds at a time? We suspect this stalling causes timeouts in
 our cluster.

 Any idea what happened, what went wrong, and what could cause this?


 $ jstat -gcutil 32276 1s

     S0     S1      E      O      P    YGC     YGCT  FGC    FGCT      GCT
   0.00   5.78  91.21  70.94  60.07   2657   73.437    4   0.056   73.493
   0.00   5.78 100.00  70.94  60.07   2657   73.437    4   0.056   73.493
   0.00   5.78 100.00  70.94  60.07   2657   73.437    4   0.056   73.493
   0.00   5.78 100.00  70.94  60.07   2657   73.437    4   0.056   73.493
   0.00   5.78 100.00  70.94  60.07   2657   73.437    4   0.056   73.493
   0.00   5.78 100.00  70.94  60.07   

Re: cassandra node jvm stall intermittently

2015-03-07 Thread Jason Wee
Hey Ali, we're running 1.0.8.

On Sat, Mar 7, 2015 at 5:20 PM, Ali Akhtar ali.rac...@gmail.com wrote:

 What version are you running?


Re: cassandra node jvm stall intermittently

2015-03-07 Thread Ali Akhtar
What version are you running?


Re: Does DateTieredCompactionStrategy work with a compound clustering key?

2015-03-07 Thread mck

 I believe that DateTieredCompactionStrategy would work for PRIMARY
 KEY (timeblock, timestamp) -- but does it also work for PRIMARY KEY
 (timeblock, timestamp, hash)?


Yes.

 (sure you don't want to be using a timeuuid instead?)

~mck
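
For reference, a minimal sketch of such a table with DTCS enabled; the PRIMARY KEY is the one from the question, while the table name, the payload column, and the other options are illustrative assumptions, not from the thread:

    CREATE TABLE events (
        timeblock text,
        timestamp timestamp,
        hash      text,
        payload   blob,                       -- illustrative value column
        PRIMARY KEY (timeblock, timestamp, hash)
    ) WITH CLUSTERING ORDER BY (timestamp DESC, hash ASC)
      AND compaction = {'class': 'DateTieredCompactionStrategy'};

DTCS groups SSTables by the write time of the cells they contain rather than by clustering columns, which is why the extra hash component does not change how it behaves.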


Re: best practices for time-series data with massive amounts of records

2015-03-07 Thread Eric Stevens
It's probably quite rare for queries over extremely large time series data
to touch the whole data set.  Instead there's almost always a "between X
and Y dates" aspect to nearly every real-time query you might have against
a table like this (with the exception of most-recent-N events).

Because of this, time bucketing can be an effective strategy, though until
you understand your data better, it's hard to know how large (or small) to
make your buckets.  Because of *that*, I recommend using a timestamp data
type for your bucketing strategy - this gives you the advantage of being
able to reduce your bucket sizes while keeping your at-rest data mostly
still quite accessible.
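
A minimal sketch of that kind of bucketed table (all names here are illustrative, not from the thread); the bucket is simply the event time floored to the chosen interval and forms part of the partition key:

    CREATE TABLE clicks_by_user (
        user_id text,
        bucket  timestamp,   -- event time floored to the bucket size, e.g. the hour
        ts      timeuuid,
        url     text,
        PRIMARY KEY ((user_id, bucket), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC);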

What I mean is that if you change your bucketing strategy from day to hour,
when you are querying across that changed time period, you can iterate at
the finer granularity buckets (hour), and you'll pick up the coarser
granularity (day) automatically for all but the earliest bucket (which is
easy to correct for when you're flooring your start bucket).  In the
coarser time period, most reads are partition key misses, which are
extremely inexpensive in Cassandra.
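
Against the sketch above, one slice of a "between X and Y" read might look like the following (bucket and bound values are illustrative): the client floors X to the current hourly bucket size and issues one such query per bucket up to Y, and buckets that only ever existed under the old daily scheme simply return no rows, which is the cheap partition-key miss described above.

    -- the 13:00 hourly bucket's slice of a read for 13:20 .. 13:40
    SELECT ts, url
      FROM clicks_by_user
     WHERE user_id = 'u1'
       AND bucket  = '2015-03-07 13:00:00'
       AND ts      > maxTimeuuid('2015-03-07 13:20:00')
       AND ts      < minTimeuuid('2015-03-07 13:40:00');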

If you do need most-recent-N queries for broad ranges and you expect to
have some users whose clickrate is dramatically less frequent than your
bucket interval (making iterating over buckets inefficient), you can keep a
separate counter table with PK of ((user_id), bucket) in which you count
new events.  Now you can identify the exact set of buckets you need to read
to satisfy the query no matter what the user's click volume is (so very low
volume users have at most N partition keys queried, higher volume users
query fewer partition keys).
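
A sketch of that counter table (again, illustrative names apart from the PK given above), bumped once per event written; a most-recent-N read scans this one partition first to find the newest buckets that together hold at least N events, then reads only those buckets:

    CREATE TABLE events_per_bucket (
        user_id text,
        bucket  timestamp,
        events  counter,
        PRIMARY KEY ((user_id), bucket)
    );

    UPDATE events_per_bucket SET events = events + 1
     WHERE user_id = 'u1' AND bucket = '2015-03-07 13:00:00';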

On Fri, Mar 6, 2015 at 4:06 PM, graham sanderson gra...@vast.com wrote:

 Note that using static column(s) for the “head” value, with trailing TTLed
 values behind it, is something we’re considering. Note this is especially
 nice if your head state includes, say, a map which is updated by small
 deltas (individual keys).

 We have not yet studied the effect of static columns on say DTCS


 On Mar 6, 2015, at 4:42 PM, Clint Kelly clint.ke...@gmail.com wrote:

 Hi all,

 Thanks for the responses, this was very helpful.

 I don't know yet what the distribution of clicks and users will be, but I
 expect to see a few users with an enormous amount of interactions and most
 users having very few.  The idea of doing some additional manual
 partitioning, and then maintaining another table that contains the head
 partition for each user makes sense, although it would add additional
 latency when we want to get say the most recent 1000 interactions for a
 given user (which is something that we have to do sometimes for
 applications with tight SLAs).
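
 A minimal sketch of that extra lookup (illustrative names): a single-row-per-user table whose one read tells you which bucket to start the most-recent-N scan from, which is where the additional latency comes in:

     CREATE TABLE user_head_bucket (
         user_id text PRIMARY KEY,
         head    timestamp   -- the bucket currently being written for this user
     );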

 FWIW I doubt that any users will have so many interactions that they
 exceed what we could reasonably put in a row, but I wanted to have a
 strategy to deal with this.

 Having a nice design pattern in Cassandra for maintaining a row with the
 N-most-recent interactions would also solve this reasonably well, but I
 don't know of any way to implement that without running batch jobs that
 periodically clean out data (which might be okay).

 Best regards,
 Clint




 On Tue, Mar 3, 2015 at 8:10 AM, mck m...@apache.org wrote:


  Here partition is a random number from 0 to (N*M)
  where N=nodes in cluster, and M=arbitrary number.


 Hopefully it was obvious, but here (unless you've got hot partitions),
 you don't need N.
 ~mck
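
 A minimal sketch of that synthetic-partition idea (illustrative names), following the point above that the bucket count only needs to be some arbitrary M rather than scale with node count: each write picks a random shard in [0, M), and a most-recent-N read fans out across all M partitions for the user:

     CREATE TABLE clicks_sharded (
         user_id text,
         shard   int,        -- random value in [0, M), chosen at write time
         ts      timeuuid,
         url     text,
         PRIMARY KEY ((user_id, shard), ts)
     ) WITH CLUSTERING ORDER BY (ts DESC);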