Re: Upgrade from 2.0.9 to 2.1.3
Thanks for the advice. /Fredrik

On 7 Mar 2015, at 02:25, graham sanderson gra...@vast.com wrote:
Note for anyone who accidentally (or otherwise) ends up on 2.1.3 in a situation where they cannot downgrade: feel free to look at https://github.com/vast-engineering/cassandra/tree/vast-cassandra-2.1.3. We sometimes make custom versions incorporating as many important patches as we reasonably can, which we need in order to run a newer C* environment successfully. Obviously use at your own risk, blah blah… Basically the install procedure is to replace the main Cassandra jar on a 2.1.3 node while it is down.

On Mar 6, 2015, at 3:15 PM, Robert Coli rc...@eventbrite.com wrote:
On Fri, Mar 6, 2015 at 6:25 AM, graham sanderson gra...@vast.com wrote:
I would definitely wait for at least 2.1.4
+1
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
=Rob
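For anyone unfamiliar with that procedure, a minimal sketch of the jar swap on one node is below. The paths assume a typical package install (a tarball install keeps the jar under lib/ instead), and the patched jar name is hypothetical; adjust for your environment.

# Assumed paths for a package install; the patched jar name is hypothetical.
nodetool drain                       # flush memtables before shutting the node down
sudo service cassandra stop
# keep a copy of the stock jar, then drop in the patched build
sudo cp /usr/share/cassandra/apache-cassandra-2.1.3.jar \
        /usr/share/cassandra/apache-cassandra-2.1.3.jar.orig
sudo cp apache-cassandra-2.1.3-vast.jar \
        /usr/share/cassandra/apache-cassandra-2.1.3.jar
sudo service cassandra start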
Re: cassandra node jvm stall intermittently
Hi Jan, thanks for taking the time to prepare the questions; answers below.

- How many nodes do you have on the ring? 12

- What is the activity when this occurs - reads / writes / compactions? This cluster has a lot of writes and reads. In the off-peak period OpsCenter shows cluster writes at about 5k/sec and reads at about 1k/sec; during the peak period writes can reach 22k/sec and reads about 10k/sec. This particular node hangs constantly, irrespective of whether it is peak or off-peak, or whether compaction is running.

- Is there anything that is unique about this node that makes it different from the other nodes? Our nodes are identical in terms of operating system (CentOS 6) and Cassandra configuration settings. Other than Cassandra, there are no other resource-intensive applications running on the nodes.

- Is this a periodic occurrence OR a single occurrence? It *always* happens and, in fact, it is happening now.

- What is the load distribution on the ring (i.e. is this node carrying more load than the others)? As of this moment:

Address  DC       Rack  Status  State   Load       Owns    Token
                                                           155962751505430129087380028406227096910
node1    us-east  1e    Up      Normal  498.66 GB  8.33%   0
node2    us-east  1e    Up      Normal  503.36 GB  8.33%   14178431955039102644307275309657008810
node3    us-east  1e    Up      Normal  492.08 GB  8.33%   28356863910078205288614550619314017619
node4    us-east  1e    Up      Normal  499.54 GB  8.33%   42535295865117307932921825928971026430
node5    us-east  1e    Up      Normal  523.76 GB  8.33%   56713727820156407428984779325531226109
node6    us-east  1e    Up      Normal  515.36 GB  8.33%   70892159775195513221536376548285044050
node7    us-east  1e    Up      Normal  588.93 GB  8.33%   85070591730234615865843651857942052860
node8    us-east  1e    Up      Normal  498.51 GB  8.33%   99249023685273718510150927167599061670
node9    us-east  1e    Up      Normal  531.81 GB  8.33%   113427455640312814857969558651062452221
node10   us-east  1e    Up      Normal  501.85 GB  8.33%   127605887595351923798765477786913079290
node11   us-east  1e    Up      Normal  501.13 GB  8.33%   141784319550391026443072753096570088100
node12   us-east  1e    Up      Normal  508.45 GB  8.33%   155962751505430129087380028406227096910

The affected node is node5. In this ring output it is one of the more heavily loaded nodes, but that is unlikely to be the cause.

Jason

On Sat, Mar 7, 2015 at 3:35 PM, Jan cne...@yahoo.com wrote:
Hi Jason;
The single node showing the anomaly is a hint that the problem is probably local to that node (as you suspected).
- How many nodes do you have on the ring?
- What is the activity when this occurs - reads / writes / compactions?
- Is there anything that is unique about this node that makes it different from the other nodes?
- Is this a periodic occurrence OR a single occurrence? I am trying to determine a pattern for when this shows up.
- What is the load distribution on the ring (i.e. is this node carrying more load than the others)?
The system.log should have more info about it.
Hope this helps
Jan/

On Friday, March 6, 2015 4:50 AM, Jason Wee peich...@gmail.com wrote:
Well, StatusLogger.java messages have started showing up in the Cassandra system.log, and MessagingService.java also shows some stages (e.g. read, mutation) being dropped. It's strange that this only happens on this node; the same messages do not show up in the other nodes' log files at the same time...
Jason

On Thu, Mar 5, 2015 at 4:26 AM, Jan cne...@yahoo.com wrote:
Hi Jason;
What's in the log files at the moment jstat shows 100%?
What is the activity on the cluster / on the node at that specific point in time (reads / writes / joins etc.)?
Jan/

On Wednesday, March 4, 2015 5:59 AM, Jason Wee peich...@gmail.com wrote:
Hi, our Cassandra nodes are running Java 7 update 72. We ran jstat on one of the nodes and noticed some strange behaviour, as shown by the output below. Any idea why the Eden space stays the same for a few seconds at a time, e.g. at 100% or at 18.02%? We suspect such stalling causes timeouts in our cluster. Any idea what happened, what went wrong and what could cause this?

$ jstat -gcutil 32276 1s
0.00 5.78 91.21 70.94 60.07 2657 73.437 40.056 73.493
0.00 5.78 100.00 70.94 60.07 2657 73.437 40.056 73.493
0.00 5.78 100.00 70.94 60.07 2657 73.437 40.056 73.493
0.00 5.78 100.00 70.94 60.07 2657 73.437 40.056 73.493
0.00 5.78 100.00 70.94 60.07 2657 73.437 40.056 73.493
0.00 5.78 100.00 70.94 60.07
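For reference when reading the samples above, the same jstat invocation can be made to repeat its column header (the -h flag takes a line count; the PID shown is the one from the original command). On Java 7, -gcutil reports survivor, Eden, old-gen and permgen utilisation plus GC counts and times:

$ jstat -gcutil -h10 32276 1s
  S0     S1     E      O      P     YGC    YGCT    FGC    FGCT     GCT

The E column is the Eden utilisation the question refers to; Eden sitting at 100.00 across consecutive one-second samples while the young GC count (YGC) stays unchanged is exactly the stall being described.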
Re: cassandra node jvm stall intermittently
hey Ali, 1.0.8

On Sat, Mar 7, 2015 at 5:20 PM, Ali Akhtar ali.rac...@gmail.com wrote:
What version are you running?
Re: cassandra node jvm stall intermittently
What version are you running?
Re: Does DateTieredCompactionStrategy work with a compound clustering key?
I believe that DateTieredCompactionStrategy would work for PRIMARY KEY (timeblock, timestamp) -- but does it also work for PRIMARY KEY (timeblock, timestamp, hash)?

Yes. (Are you sure you don't want to be using a timeuuid instead?)
~mck
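For concreteness, a minimal CQL sketch of the table shape being asked about (the table name, column types and payload column are illustrative, not from the thread); the compaction strategy is configured per table and, as answered above, works regardless of how many clustering columns follow the timestamp:

CREATE TABLE events (
    timeblock text,
    timestamp timestamp,
    hash      text,
    payload   blob,
    PRIMARY KEY (timeblock, timestamp, hash)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'};

Following mck's suggestion, the timestamp and hash columns could also be collapsed into a single timeuuid clustering column, which keeps rows unique without the extra hash.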
Re: best practices for time-series data with massive amounts of records
It's probably quite rare for queries over extremely large time-series data to touch the whole set of data. Instead, there's almost always a "between X and Y dates" aspect to nearly every real-time query you might run against a table like this (with the exception of most-recent-N events). Because of this, time bucketing can be an effective strategy, though until you understand your data better, it's hard to know how large (or small) to make your buckets.

Because of *that*, I recommend using the timestamp data type for your bucketing strategy - this gives you the advantage of being able to reduce your bucket sizes while keeping your at-rest data mostly still quite accessible. What I mean is that if you change your bucketing strategy from day to hour, when you are querying across that changed time period, you can iterate at the finer-granularity buckets (hour), and you'll pick up the coarser granularity (day) automatically for all but the earliest bucket (which is easy to correct for when you're flooring your start bucket). In the coarser time period, most reads are partition key misses, which are extremely inexpensive in Cassandra.

If you do need most-recent-N queries for broad ranges and you expect to have some users whose click rate is dramatically less frequent than your bucket interval (making iterating over buckets inefficient), you can keep a separate counter table with a PK of ((user_id), bucket) in which you count new events. Now you can identify the exact set of buckets you need to read to satisfy the query no matter what the user's click volume is (so very-low-volume users have at most N partition keys queried, and higher-volume users query fewer partition keys).

On Fri, Mar 6, 2015 at 4:06 PM, graham sanderson gra...@vast.com wrote:
Note that using static column(s) for the “head” value, with trailing TTLed values behind, is something we’re considering. Note this is especially nice if your head state includes, say, a map which is updated by small deltas (individual keys). We have not yet studied the effect of static columns on, say, DTCS.

On Mar 6, 2015, at 4:42 PM, Clint Kelly clint.ke...@gmail.com wrote:
Hi all,
Thanks for the responses, this was very helpful. I don't know yet what the distribution of clicks and users will be, but I expect to see a few users with an enormous number of interactions and most users having very few. The idea of doing some additional manual partitioning, and then maintaining another table that contains the head partition for each user, makes sense, although it would add additional latency when we want to get, say, the most recent 1000 interactions for a given user (which is something that we have to do sometimes for applications with tight SLAs). FWIW I doubt that any users will have so many interactions that they exceed what we could reasonably put in a row, but I wanted to have a strategy to deal with this. Having a nice design pattern in Cassandra for maintaining a row with the N most recent interactions would also solve this reasonably well, but I don't know of any way to implement that without running batch jobs that periodically clean out old data (which might be okay).
Best regards,
Clint

On Tue, Mar 3, 2015 at 8:10 AM, mck m...@apache.org wrote:
Here partition is a random digit from 0 to (N*M) where N=nodes in cluster, and M=arbitrary number.
Hopefully it was obvious, but here (unless you've got hot partitions), you don't need N.
~mck
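Pulling the pieces of this thread together, here is a rough CQL sketch of the bucketed events table plus the per-bucket counter table described above. All table and column names are illustrative, and a timestamp-typed day bucket is assumed:

CREATE TABLE user_events (
    user_id   text,
    bucket    timestamp,           -- event time floored to the day (or hour, if you shrink buckets later)
    event_ts  timeuuid,
    payload   blob,
    PRIMARY KEY ((user_id, bucket), event_ts)
) WITH CLUSTERING ORDER BY (event_ts DESC);

-- Per-bucket event counts, so most-recent-N queries only touch
-- the buckets that actually contain data for a given user.
CREATE TABLE user_event_counts (
    user_id  text,
    bucket   timestamp,
    events   counter,
    PRIMARY KEY (user_id, bucket)
);

To serve a "most recent 1000 interactions" query, read the user's counter partition, walk buckets from newest to oldest until the running total reaches 1000, then page through just those user_events partitions.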