In-place vnode conversion possible?

2014-12-16 Thread Jonas Borgström
Hi,

I know that adding a new vnode-enabled DC is the recommended method to
convert an existing cluster to vnodes, and that the cassandra-shuffle
utility has been removed.

That said, I've done some testing and it appears to be possible to
perform an in-place conversion as long as all nodes contain all data (3
nodes and replication factor 3, for example), like this:

for each node:
- nodetool -h localhost disablegossip (Not sure if this is needed)

- cqlsh localhost
  UPDATE system.local SET tokens=$NEWTOKENS WHERE key='local';

- nodetool -h localhost disablethrift (Not sure if this is needed)
- nodetool -h localhost drain
- service cassandra restart

The following Python snippet was used to generate $NEWTOKENS for
each node (RandomPartitioner):
"""
import random
print str([str(x) for x in sorted(random.randint(0,2**127-1) for x in
range(256))]).replace('[', '{').replace(']', '}')
"""

I've tested this in a test cluster and it seems to work just fine.

Has anyone else done anything similar?

Or is manually changing tokens a bad idea, and will something horrible
hit me down the line?

Test cluster configuration
--
Cassandra version: 1.2.19
Number of nodes: 3
Keyspace: NetworkTopologyStrategy:  {DC1: 1, DC2:1, DC3: 1}

/ Jonas





Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hello all,

I have read a lot about Cassandra, including about key-value pairs,
partition keys, clustering keys, etc.
Does the "key" mentioned in "key-value pair" refer to the same thing as the
partition key, or are they different?

CREATE TABLE corpus.bigram_time_category_ordered_frequency (
id bigint,
word1 varchar,
word2 varchar,
year int,
category varchar,
frequency int,
PRIMARY KEY((year, category),frequency,word1,word2));


In this schema, I know (year, category) is the compound partition key and
frequency is the clustering key. What is the key here?


Thank You!

-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: Understanding what is key and partition key

2014-12-16 Thread Jack Krupansky
Correction: year and category form a “composite partition key”.

frequency, word1, and word2 are “clustering columns”.

The combination of a partition key with clustering columns is a “compound 
primary key”.

Every CQL row will have a partition key by definition, and may optionally have 
clustering columns.

“The key” should just be a synonym for “primary key”, although sometimes people 
are loosely speaking about “the partition” (which should be “the partition 
key”) rather than the CQL “row”.
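
To make the terminology concrete against the posted schema, here is a minimal
sketch (assuming the DataStax Python driver and a locally reachable node):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('corpus')

# Partition key (year, category): identifies the one partition that owns the rows.
rows = session.execute(
    "SELECT * FROM bigram_time_category_ordered_frequency "
    "WHERE year = %s AND category = %s", (2014, 'N'))

# Clustering columns (frequency, word1, word2): order rows within that partition
# and can narrow the read further.
top = session.execute(
    "SELECT * FROM bigram_time_category_ordered_frequency "
    "WHERE year = %s AND category = %s AND frequency = %s", (2014, 'N', 1))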

-- Jack Krupansky

From: Chamila Wijayarathna 
Sent: Tuesday, December 16, 2014 8:03 AM
To: user@cassandra.apache.org 
Subject: Understanding what is key and partition key

Hello all,  

I have read a lot about Cassandra and I read about key-value pairs, partition 
keys, clustering keys, etc.. 
Is key mentioned in key-value pair and partition key refers to same or are they 
different?

CREATE TABLE corpus.bigram_time_category_ordered_frequency (
id bigint,
word1 varchar,
word2 varchar,
year int,
category varchar,
frequency int,
PRIMARY KEY((year, category),frequency,word1,word2)
);
In this schema, I know (year, category) is the compound partition key and 
frequency is the clustering key. What is the key here?


Thank You! 


-- 

Chamila Dilshan Wijayarathna,
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hi Jack,

So what will be the keys and values of the following CF instance?

year | category | frequency | word1| word2   | id
--+--+---+--+-+---
 2014 |N | 1 |සියළුම | යුද්ධ |   664
 2014 |N | 1 |එච් |   කාණ්ඩය | 12526
 2014 |N | 1 |ගජබා | සුපර්ක්‍රොස් | 25779
 2014 |N | 1 |  බී|   කාණ්ඩය | 12505

Thank You!

On Tue, Dec 16, 2014 at 6:45 PM, Jack Krupansky 
wrote:
>
>   Correction: year and category form a “composite partition key”.
>
> frequency, word1, and word2 are “clustering columns”.
>
> The combination of a partition key with clustering columns is a “compound
> primary key”.
>
> Every CQL row will have a partition key by definition, and may optionally
> have clustering columns.
>
> “The key” should just be a synonym for “primary key”, although sometimes
> people are loosely speaking about “the partition” (which should be “the
> partition key”) rather than the CQL “row”.
>
> -- Jack Krupansky
>
>  *From:* Chamila Wijayarathna 
> *Sent:* Tuesday, December 16, 2014 8:03 AM
> *To:* user@cassandra.apache.org
> *Subject:* Understanding what is key and partition key
>
>  Hello all,
>
> I have read a lot about Cassandra and I read about key-value pairs,
> partition keys, clustering keys, etc..
> Is key mentioned in key-value pair and partition key refers to same or are
> they different?
>
>
> CREATE TABLE corpus.bigram_time_category_ordered_frequency (
> id bigint,
> word1 varchar,
> word2 varchar,
> year int,
> category varchar,
> frequency int,
> PRIMARY KEY((year, category),frequency,word1,word2));
>
>
> In this schema, I know (year, category) is the compound partition key and
> frequency is the clustering key. What is the key here?
>
>
> Thank You!
>
> --
> *Chamila Dilshan Wijayarathna,*
> SMIEEE, SMIESL,
> Undergraduate,
> Department of Computer Science and Engineering,
> University of Moratuwa.
>


-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: Understanding what is key and partition key

2014-12-16 Thread Jens Rantil
For the first row, the key is: (2014, N, 1, සියළුම, යුද්ධ) and the value-part 
is (664).




Cheers,

Jens


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook Linkedin Twitter

On Tue, Dec 16, 2014 at 2:25 PM, Chamila Wijayarathna
 wrote:

> Hi Jack,
> So what will be the keys and values of the following CF instance?
> year | category | frequency | word1| word2   | id
> --+--+---+--+-+---
>  2014 |N | 1 |සියළුම | යුද්ධ |   664
>  2014 |N | 1 |එච් |   කාණ්ඩය | 12526
>  2014 |N | 1 |ගජබා | සුපර්ක්‍රොස් | 25779
>  2014 |N | 1 |  බී|   කාණ්ඩය | 12505
> Thank You!
> On Tue, Dec 16, 2014 at 6:45 PM, Jack Krupansky 
> wrote:
>>
>>   Correction: year and category form a “composite partition key”.
>>
>> frequency, word1, and word2 are “clustering columns”.
>>
>> The combination of a partition key with clustering columns is a “compound
>> primary key”.
>>
>> Every CQL row will have a partition key by definition, and may optionally
>> have clustering columns.
>>
>> “The key” should just be a synonym for “primary key”, although sometimes
>> people are loosely speaking about “the partition” (which should be “the
>> partition key”) rather than the CQL “row”.
>>
>> -- Jack Krupansky
>>
>>  *From:* Chamila Wijayarathna 
>> *Sent:* Tuesday, December 16, 2014 8:03 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Understanding what is key and partition key
>>
>>  Hello all,
>>
>> I have read a lot about Cassandra and I read about key-value pairs,
>> partition keys, clustering keys, etc..
>> Is key mentioned in key-value pair and partition key refers to same or are
>> they different?
>>
>>
>> CREATE TABLE corpus.bigram_time_category_ordered_frequency (
>> id bigint,
>> word1 varchar,
>> word2 varchar,
>> year int,
>> category varchar,
>> frequency int,
>> PRIMARY KEY((year, category),frequency,word1,word2));
>>
>>
>> In this schema, I know (year, category) is the compound partition key and
>> frequency is the clustering key. What is the key here?
>>
>>
>> Thank You!
>>
>> --
>> *Chamila Dilshan Wijayarathna,*
>> SMIEEE, SMIESL,
>> Undergraduate,
>> Department of Computer Science and Engineering,
>> University of Moratuwa.
>>
> -- 
> *Chamila Dilshan Wijayarathna,*
> SMIEEE, SMIESL,
> Undergraduate,
> Department of Computer Science and Engineering,
> University of Moratuwa.

Re: Understanding what is key and partition key

2014-12-16 Thread Chamila Wijayarathna
Hi Jens,

Thank You!

On Tue, Dec 16, 2014 at 7:03 PM, Jens Rantil  wrote:
>
> For the first row, the key is: (2014, N, 1, සියළුම, යුද්ධ) and the
> value-part is (664).
>
> Cheers,
> Jens
>
> ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se
> Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter
>
>
> On Tue, Dec 16, 2014 at 2:25 PM, Chamila Wijayarathna <
> cdwijayarat...@gmail.com> wrote:
>
>> Hi Jack,
>>
>> So what will be the keys and values of the following CF instance?
>>
>>  year | category | frequency | word1| word2   | id
>> --+--+---+--+-+---
>>  2014 |N | 1 |සියළුම | යුද්ධ |   664
>>  2014 |N | 1 |එච් |   කාණ්ඩය | 12526
>>  2014 |N | 1 |ගජබා | සුපර්ක්‍රොස් | 25779
>>  2014 |N | 1 |  බී|   කාණ්ඩය | 12505
>>
>> Thank You!
>>
>> On Tue, Dec 16, 2014 at 6:45 PM, Jack Krupansky 
>> wrote:
>>>
>>>   Correction: year and category form a “composite partition key”.
>>>
>>> frequency, word1, and word2 are “clustering columns”.
>>>
>>> The combination of a partition key with clustering columns is a
>>> “compound primary key”.
>>>
>>> Every CQL row will have a partition key by definition, and may
>>> optionally have clustering columns.
>>>
>>> “The key” should just be a synonym for “primary key”, although sometimes
>>> people are loosely speaking about “the partition” (which should be “the
>>> partition key”) rather than the CQL “row”.
>>>
>>> -- Jack Krupansky
>>>
>>>  *From:* Chamila Wijayarathna 
>>>  *Sent:* Tuesday, December 16, 2014 8:03 AM
>>>  *To:* user@cassandra.apache.org
>>>  *Subject:* Understanding what is key and partition key
>>>
>>>   Hello all,
>>>
>>> I have read a lot about Cassandra and I read about key-value pairs,
>>> partition keys, clustering keys, etc..
>>> Is key mentioned in key-value pair and partition key refers to same or
>>> are they different?
>>>
>>>
>>> CREATE TABLE corpus.bigram_time_category_ordered_frequency (
>>> id bigint,
>>> word1 varchar,
>>> word2 varchar,
>>> year int,
>>> category varchar,
>>> frequency int,
>>> PRIMARY KEY((year, category),frequency,word1,word2));
>>>
>>>
>>> In this schema, I know (year, category) is the compound partition key
>>> and frequency is the clustering key. What is the key here?
>>>
>>>
>>> Thank You!
>>>
>>> --
>>> *Chamila Dilshan Wijayarathna,*
>>> SMIEEE, SMIESL,
>>> Undergraduate,
>>> Department of Computer Science and Engineering,
>>> University of Moratuwa.
>>>
>>
>>
>> --
>> *Chamila Dilshan Wijayarathna,*
>> SMIEEE, SMIESL,
>> Undergraduate,
>> Department of Computer Science and Engineering,
>> University of Moratuwa.
>>
>
>

-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Defining DataSet.json for cassandra-unit testing

2014-12-16 Thread Chamila Wijayarathna
Hello all,

I am trying to test my application using cassandra-unit with the following
schema and data given below.

CREATE TABLE corpus.bigram_time_category_ordered_frequency (
id bigint,
word1 varchar,
word2 varchar,
year int,
category varchar,
frequency int,
PRIMARY KEY((year, category),frequency,word1,word2));

year | category | frequency | word1| word2   | id
--+--+---+--+-+---
 2014 |N | 1 |සියළුම | යුද්ධ |   664
 2014 |N | 1 |එච් |   කාණ්ඩය | 12526
 2014 |N | 1 |ගජබා | සුපර්ක්‍රොස් | 25779
 2014 |N | 1 |  බී|   කාණ්ඩය | 12505

Since this has a compound primary key, I am not clear on how to define
dataset.json [1] for this CF. Can somebody help me with how to do that?

Thank You!

1.
https://github.com/jsevellec/cassandra-unit/wiki/What-can-you-set-into-a-dataSet

-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: batch_size_warn_threshold_in_kb

2014-12-16 Thread Eric Stevens
> You are, of course, free to use batches in your application

I'm not looking to justify the use of batches, I'm looking for the path
forward that will give us the Best Results™ both near and long term, for
some definition of Best (which would be a balance of client throughput and
cluster pressure).  If individual writes are best for us, that's what I
want to do.  If batches are best for us, that's what I want to do.

I'm just struggling that I'm not able to reproduce your advice
experimentally, and it's not just a few percent difference, it's 5x to 8x
difference.  It's really difficult for me to adopt advice blindly when it
differs from my own observations by such a substantial amount.  That means
something is wrong either with my observations or with the advice, and I
would really like to know which.  I'm not trying to be argumentative or
push for a particular approach, I'm trying to resolve an inconsistency.


RE your questions: I'm sorry this turns into a wall of text; simple
questions about parallelism and distributed systems can rarely be
adequately answered in just a few words.  I'm trying to be open and
transparent about my testing approach because I want to find out where the
disconnect is here.  At the same time I'm trying to bridge the knowledge
gap, since I'm working with a parallelism toolset with which you're not
familiar, and that could obviously have a substantial impact on the
results.  Hopefully someone else in the community familiar with Scala will
notice this and confirm that I'm not making a fundamental mistake.


1) My original runs were in EC2, driven by a different server than the
Cassandra cluster, but in the same AZ as one of the Cassandra
servers (typical 3-AZ setup for Cassandra).  All four instances (3x C*, 1x
test driver) were i2.2xl, so they have gigabit networking between them.


2) The system was under some moderate other load, this is our test cluster
that takes a steady stream of simulated data to provide other developers
with something to work against.  That load is quite constant and doesn't
work these servers particularly hard - only a few thousand records per
second typically.  Load averages between 1 and 3 most of the time.

Unfortunately I haven't been successful getting cassandra-stress talking to
this cluster because of SSL configuration (it doesn't seem to actually pay
attention to the -ts and -tspw command line flags).  I can find out if our ops
guys would be ok with turning off SSL for a while, but that would break
our other applications using the same cluster and may block our other
engineers as a result.  So it has farther-reaching implications than just
being something I can happily turn on or off at whim.

I'm curious how you would expect the performance of my stress tool to
differ when the cluster was being overworked - could you explain what you
anticipate the change in results to look like?  I.e. would single-writes
remain about constant for performance while batches would degrade in
performance?


3) Well, I specifically attempt to control for this by testing three
different concurrency models, which I named "parallel," "scatter,"
and "traverse" (just aliases to make it easier to control the driver).  You
can see the code for the different approaches here - they are pretty
similar to each other, but it probably takes some knowledge of how
concurrency works in Scala to really appreciate the differences:
https://gist.github.com/MightyE/1c98912fca104f6138fc/a7db68e72f99ac1215fcfb096d69391ee285c080#file-testsuite-L181-L203

I know you're not a Scala guy, so I'll explain roughly what they do, but
the point is that I'm trying hard to control for just having chosen a bad
concurrency model:

scatter -> Take all of the Statements and call executeAsync() on them as
*fast* as the Session will let me.  This is the Unintelligent Brute Force
approach, and it's definitely not how I would model a typical production
application as it doesn't attempt to respond to system pressure at all, and
it's trying to gobble up as many resources as it can.  Use the Scala
Futures system to combine the set of async calls into a single Future
that completes when all the futures returned from executeAsync() have
completed.

traverse -> Give all of the Statements to the Scala Futures system and tell
it to call executeAsync() on them all at the rate that it thinks is
appropriate.  This would be much closer to my recommendation on how to
model a production application, because in a real application, there's more
than a single class of work to be done, and the Futures system schedules
both this work and other work intelligently and configurably.  It gives us
a single awaitable Future that completes when it has finished all of its
work and all of the async calls have been completed.  You guys are using
Netty for your native protocol, and Netty offers true event-driven
concurrency, which gets along famously well with Scala's Futures system.

parallel -> Use a Scala Parallel collection
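
(For readers who don't know Scala: a rough sketch of the "scatter" and the
bounded-concurrency ideas using the DataStax Python driver - this is not the
author's test code, and 'session' plus a list of 'statements' are assumed to
already exist.)

from itertools import islice

# "scatter": fire every executeAsync() immediately, then wait for all of them.
futures = [session.execute_async(stmt) for stmt in statements]
for f in futures:
    f.result()  # blocks until that write has completed (raises on error)

# "traverse"-like: keep only a bounded window of requests in flight at a time.
statement_iter = iter(statements)
while True:
    window = [session.execute_async(s) for s in islice(statement_iter, 128)]  # 128 is arbitrary
    if not window:
        break
    for f in window:
        f.result()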

does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Howdy all,

Our use of cassandra unfortunately makes use of lots of deletes.  Yes, I
know that C* is not well suited to this kind of workload, but that's where
we are, and before I go looking for an entirely new data layer I would
rather explore whether C* could be tuned to work well for us.

However, deletions are never driven by users in our app - deletions always
occur by backend processes to "clean up" data after it has been processed,
and thus they do not need to be 100% available.  So this made me think,
what if I did the following?

   - gc_grace_seconds = 0, which ensures that tombstones are never created
   - replication factor = 3
   - for writes that are inserts, consistency = QUORUM, which ensures that
   writes can proceed even if 1 replica is slow/down
   - for deletes, consistency = ALL, which ensures that when we delete a
   record it disappears entirely (no need for tombstones)
   - for reads, consistency = QUORUM
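
In driver terms, the plan above would look roughly like this (a minimal
sketch, assuming the DataStax Python driver; keyspace, table, and column
names are placeholders):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('myks')

# gc_grace_seconds = 0 on the table holding the short-lived data
session.execute("ALTER TABLE events WITH gc_grace_seconds = 0")

insert = SimpleStatement("INSERT INTO events (id, v) VALUES (%s, %s)",
                         consistency_level=ConsistencyLevel.QUORUM)
delete = SimpleStatement("DELETE FROM events WHERE id = %s",
                         consistency_level=ConsistencyLevel.ALL)
select = SimpleStatement("SELECT v FROM events WHERE id = %s",
                         consistency_level=ConsistencyLevel.QUORUM)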

Also, I should clarify that our data is essentially append only, so I don't
need to worry about inconsistencies created by partial updates (e.g. value
gets changed on one machine but not another).  Sometimes there will be
duplicate writes, but I think that should be fine since the value is always
identical.

Any red flags with this approach?  Has anyone tried it and have experiences
to share?  Also, I *think* that this means that I don't need to run
repairs, which from an ops perspective is great.

Thanks, as always,
- Ian


Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Eric Stevens
No, deletes are always written as a tombstone no matter the consistency.
This is because data at rest is written to sstables which are immutable
once written. The tombstone marks that a record in another sstable is now
deleted, and so a read of that value should be treated as if it doesn't
exist.

When sstables are later compacted, several sstables are merged into one and
any overlapping values between the tables are condensed into one. Values
which have a tombstone can be excluded from the new sstable. GC grace
period indicates how long a tombstone should be kept after all underlying
values have been compacted away, so that the deleted value can't be
resurrected if a node that still knew that value rejoins the cluster.
On Dec 16, 2014 8:23 AM, "Ian Rose"  wrote:

> Howdy all,
>
> Our use of cassandra unfortunately makes use of lots of deletes.  Yes, I
> know that C* is not well suited to this kind of workload, but that's where
> we are, and before I go looking for an entirely new data layer I would
> rather explore whether C* could be tuned to work well for us.
>
> However, deletions are never driven by users in our app - deletions always
> occur by backend processes to "clean up" data after it has been processed,
> and thus they do not need to be 100% available.  So this made me think,
> what if I did the following?
>
>- gc_grace_seconds = 0, which ensures that tombstones are never created
>- replication factor = 3
>- for writes that are inserts, consistency = QUORUM, which ensures
>that writes can proceed even if 1 replica is slow/down
>- for deletes, consistency = ALL, which ensures that when we delete a
>record it disappears entirely (no need for tombstones)
>- for reads, consistency = QUORUM
>
> Also, I should clarify that our data essentially append only, so I don't
> need to worry about inconsistencies created by partial updates (e.g. value
> gets changed on one machine but not another).  Sometimes there will be
> duplicate writes, but I think that should be fine since the value is always
> identical.
>
> Any red flags with this approach?  Has anyone tried it and have
> experiences to share?  Also, I *think* that this means that I don't need to
> run repairs, which from an ops perspective is great.
>
> Thanks, as always,
> - Ian
>
>


Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Robert Wille
Tombstones have to be created. The SSTables are immutable, so the data cannot 
be deleted. Therefore, a tombstone is required. The value you deleted will be 
physically removed during compaction.

My workload sounds similar to yours in some respects, and I was able to get C* 
working for me. I have large chunks of data which I periodically replace. I 
write the new data, update a reference, and then delete the old data. I 
designed my schema to be tombstone-friendly, and C* works great. For some of my 
tables I am able to delete entire partitions. Because of the reference that I 
updated, I never try to access the old data, and therefore the tombstones for 
these partitions are never read. The old data simply has to wait for 
compaction. Other tables require deleting records within partitions. These 
tombstones do get read, so there are performance implications. I was able to 
design my schema so that no partition ever has more than a few tombstones (one 
for each generation of deleted data, which is usually no more than one).
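
(A minimal sketch of that "write new, repoint, delete old" pattern, with
hypothetical table and column names, assuming the DataStax Python driver.)

import uuid
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('myks')

# Hypothetical schema: chunk_data holds one partition per generation of data,
# and chunk_ref points readers at the generation they should use.
session.execute("""CREATE TABLE IF NOT EXISTS chunk_data (
    chunk_id uuid, seq int, payload text,
    PRIMARY KEY ((chunk_id), seq))""")
session.execute("""CREATE TABLE IF NOT EXISTS chunk_ref (
    name text PRIMARY KEY, current_chunk uuid)""")

# 1. Write the replacement data under a brand-new partition key.
new_chunk = uuid.uuid4()
for seq, payload in enumerate(['row one', 'row two']):   # stand-in data
    session.execute("INSERT INTO chunk_data (chunk_id, seq, payload) VALUES (%s, %s, %s)",
                    (new_chunk, seq, payload))

# 2. Repoint the reference, remembering the old generation (if any).
old_chunk = None
for row in session.execute("SELECT current_chunk FROM chunk_ref WHERE name = %s", ('doc',)):
    old_chunk = row.current_chunk
session.execute("UPDATE chunk_ref SET current_chunk = %s WHERE name = %s", (new_chunk, 'doc'))

# 3. Delete the whole old partition. Nothing points at it any more, so its
#    tombstone is never read and simply waits out compaction.
if old_chunk is not None:
    session.execute("DELETE FROM chunk_data WHERE chunk_id = %s", (old_chunk,))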

Hope this helps.

Robert

On Dec 16, 2014, at 8:22 AM, Ian Rose  wrote:

Howdy all,

Our use of cassandra unfortunately makes use of lots of deletes.  Yes, I know 
that C* is not well suited to this kind of workload, but that's where we are, 
and before I go looking for an entirely new data layer I would rather explore 
whether C* could be tuned to work well for us.

However, deletions are never driven by users in our app - deletions always 
occur by backend processes to "clean up" data after it has been processed, and 
thus they do not need to be 100% available.  So this made me think, what if I 
did the following?

  *   gc_grace_seconds = 0, which ensures that tombstones are never created
  *   replication factor = 3
  *   for writes that are inserts, consistency = QUORUM, which ensures that 
writes can proceed even if 1 replica is slow/down
  *   for deletes, consistency = ALL, which ensures that when we delete a 
record it disappears entirely (no need for tombstones)
  *   for reads, consistency = QUORUM

Also, I should clarify that our data essentially append only, so I don't need 
to worry about inconsistencies created by partial updates (e.g. value gets 
changed on one machine but not another).  Sometimes there will be duplicate 
writes, but I think that should be fine since the value is always identical.

Any red flags with this approach?  Has anyone tried it and have experiences to 
share?  Also, I *think* that this means that I don't need to run repairs, which 
from an ops perspective is great.

Thanks, as always,
- Ian




Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Ah, makes sense.  Thanks for the explanations!

- Ian


On Tue, Dec 16, 2014 at 10:53 AM, Robert Wille  wrote:
>
>  Tombstones have to be created. The SSTables are immutable, so the data
> cannot be deleted. Therefore, a tombstone is required. The value you
> deleted will be physically removed during compaction.
>
>  My workload sounds similar to yours in some respects, and I was able to
> get C* working for me. I have large chunks of data which I periodically
> replace. I write the new data, update a reference, and then delete the old
> data. I designed my schema to be tombstone-friendly, and C* works great.
> For some of my tables I am able to delete entire partitions. Because of the
> reference that I updated, I never try to access the old data, and therefore
> the tombstones for these partitions are never read. The old data simply has
> to wait for compaction. Other tables require deleting records within
> partitions. These tombstones do get read, so there are performance
> implications. I was able to design my schema so that no partition ever has
> more than a few tombstones (one for each generation of deleted data, which
> is usually no more than one).
>
>  Hope this helps.
>
>  Robert
>
>  On Dec 16, 2014, at 8:22 AM, Ian Rose  wrote:
>
>  Howdy all,
>
>  Our use of cassandra unfortunately makes use of lots of deletes.  Yes, I
> know that C* is not well suited to this kind of workload, but that's where
> we are, and before I go looking for an entirely new data layer I would
> rather explore whether C* could be tuned to work well for us.
>
>  However, deletions are never driven by users in our app - deletions
> always occur by backend processes to "clean up" data after it has been
> processed, and thus they do not need to be 100% available.  So this made me
> think, what if I did the following?
>
>- gc_grace_seconds = 0, which ensures that tombstones are never created
>- replication factor = 3
>- for writes that are inserts, consistency = QUORUM, which ensures
>that writes can proceed even if 1 replica is slow/down
>- for deletes, consistency = ALL, which ensures that when we delete a
>record it disappears entirely (no need for tombstones)
>- for reads, consistency = QUORUM
>
> Also, I should clarify that our data essentially append only, so I don't
> need to worry about inconsistencies created by partial updates (e.g. value
> gets changed on one machine but not another).  Sometimes there will be
> duplicate writes, but I think that should be fine since the value is always
> identical.
>
>  Any red flags with this approach?  Has anyone tried it and have
> experiences to share?  Also, I *think* that this means that I don't need to
> run repairs, which from an ops perspective is great.
>
>  Thanks, as always,
> - Ian
>
>
>


Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Jack Krupansky
When you say “no need for tombstones”, did you actually read that somewhere or 
were you just speculating? If the former, where exactly?

-- Jack Krupansky

From: Ian Rose 
Sent: Tuesday, December 16, 2014 10:22 AM
To: user 
Subject: does consistency=ALL for deletes obviate the need for tombstones?

Howdy all, 

Our use of cassandra unfortunately makes use of lots of deletes.  Yes, I know 
that C* is not well suited to this kind of workload, but that's where we are, 
and before I go looking for an entirely new data layer I would rather explore 
whether C* could be tuned to work well for us.

However, deletions are never driven by users in our app - deletions always 
occur by backend processes to "clean up" data after it has been processed, and 
thus they do not need to be 100% available.  So this made me think, what if I 
did the following?
  - gc_grace_seconds = 0, which ensures that tombstones are never created
  - replication factor = 3
  - for writes that are inserts, consistency = QUORUM, which ensures that
writes can proceed even if 1 replica is slow/down
  - for deletes, consistency = ALL, which ensures that when we delete a
record it disappears entirely (no need for tombstones)
  - for reads, consistency = QUORUM
Also, I should clarify that our data essentially append only, so I don't need 
to worry about inconsistencies created by partial updates (e.g. value gets 
changed on one machine but not another).  Sometimes there will be duplicate 
writes, but I think that should be fine since the value is always identical.

Any red flags with this approach?  Has anyone tried it and have experiences to 
share?  Also, I *think* that this means that I don't need to run repairs, which 
from an ops perspective is great.

Thanks, as always,
- Ian


Re: Hinted handoff not working

2014-12-16 Thread Robert Wille
Nope. I added millions of records and several GB to the cluster while one node 
was down, and then ran "nodetool flush system hints" on a couple of nodes that 
were up, and system/hints has less than 200K in it.

Here’s the relevant part of "nodetool cfstats system.hints":

Keyspace: system
    Read Count: 28572
    Read Latency: 0.01806502869942601 ms.
    Write Count: 351
    Write Latency: 0.04547008547008547 ms.
    Pending Tasks: 0
        Table: hints
        SSTable count: 1
        Space used (live), bytes: 7446
        Space used (total), bytes: 80062
        SSTable Compression Ratio: 0.2651441528992549
        Number of keys (estimate): 128
        Memtable cell count: 1
        Memtable data size, bytes: 1740

The hints are definitely not being stored.

Robert

On Dec 14, 2014, at 11:44 PM, Jens Rantil  wrote:

Hi Robert ,

Maybe you need to flush your memtables to actually see the disk usage increase? 
This applies to both hosts.

Cheers,
Jens




On Sun, Dec 14, 2014 at 3:52 PM, Robert Wille  wrote:

I have a cluster with RF=3. If I shut down one node, add a bunch of data to the 
cluster, I don’t see a bunch of records added to system.hints. Also, du of 
/var/lib/cassandra/data/system/hints of the nodes that are up shows that hints 
aren’t being stored. When I start the down node, its data doesn’t grow until I 
run repair, which then takes a really long time because it is significantly out 
of date. Is there some magic setting I cannot find in the documentation to 
enable hinted handoff? I’m running 2.0.11. Any insights would be greatly 
appreciated.

Thanks

Robert





Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Hi Jonathan,

QUORUM = (sum_of_replication_factors / 2) + 1, so for us Quorum = (2/2) + 1 = 2.

Default CL is ONE and RF=2 with two nodes in the cluster. (I am a little
confused: what is my read CL and what is my WRITE CL?)

So, does it mean that for every WRITE it will write to both nodes?

And for every READ, will it read from both nodes and give the result back to
the client?

Will DOWNGRADERETRYPOLICY downgrade the CL if a node is down?

Regards

Neha

On Wed, Dec 10, 2014 at 1:00 PM, Jonathan Haddad  wrote:
>
> I did a presentation on diagnosing performance problems in production at
> the US & Euro summits, in which I covered quite a few tools & preventative
> measures you should know when running a production cluster.  You may find
> it useful:
> http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
>
> On ops center - I recommend it.  It gives you a nice dashboard.  I don't
> think it's completely comprehensive (but no tool really is) but it gets you
> 90% of the way there.
>
> It's a good idea to run repairs, especially if you're doing deletes or
> querying at CL=ONE.  I assume you're not using quorum, because on RF=2
> that's the same as CL=ALL.
>
> I recommend at least RF=3 because if you lose 1 server, you're on the edge
> of data loss.
>
>
> On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi 
> wrote:
>
>> Hi,
>> We have Two Node Cluster Configuration in production with RF=2.
>>
>> Which means that the data is written in both the clusters and it's
>> running for about a month now and has good amount of data.
>>
>> Questions?
>> 1. What are the best practices for maintenance?
>> 2. Is OPScenter required to be installed or I can manage with nodetool
>> utility?
>> 3. Is is necessary to run repair weekly?
>>
>> thanks
>> regards
>> Neha
>>
>


Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Jason Kania
Hi,
I have been having a few exchanges with contributors to the project around what 
is possible with Cassandra, and a common response that comes up when I describe 
functionality as broken or missing is that I am not modelling my data 
correctly. Unfortunately, I cannot seem to find comprehensive documentation on 
modelling with Cassandra. In particular, I am finding myself modelling by 
restriction rather than what I would like to do.

Does such documentation exist? If not, is there any effort to create such 
documentation? The DataStax documentation on data modelling is far too weak to 
be meaningful.

In particular, I am caught because:
1) I want to search on a specific column to make updates to it after further 
processing; i.e., I don't know its value on first insert
2) If I want to search on a column, it has to be part of the primary key
3) If a column is part of the primary key, it cannot be edited, so I have a 
circular dependency
Thanks,
Jason


Re: Cassandra Maintenance Best practices

2014-12-16 Thread Ryan Svihla
CL QUORUM with RF=2 is equivalent to ALL: writes will require
acknowledgement from both nodes, and reads will be from both nodes.

CL ONE will write to both replicas, but return success as soon as the first
one responds; reads will be from one node (the load balancing strategy
determines which one).

FWIW I've come around to dislike downgrading retry policy. I now feel like
if I'm using downgrading, I'm effectively going to be using that downgraded
policy most of the time under server stress, so in practice that reduced
consistency is the effective consistency I'm asking for from my writes and
reads.
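
(For reference, roughly how those pieces are wired up on the client side - a
sketch assuming the DataStax Python driver; the original discussion may well
be using the Java driver.)

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.policies import DowngradingConsistencyRetryPolicy

# With RF=2, QUORUM needs both replicas, so it behaves like ALL.
cluster = Cluster(['127.0.0.1'],
                  default_retry_policy=DowngradingConsistencyRetryPolicy())
session = cluster.connect('myks')
session.default_consistency_level = ConsistencyLevel.QUORUM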



On Tue, Dec 16, 2014 at 10:50 AM, Neha Trivedi 
wrote:
>
> Hi Jonathan,QUORUM = (sum_of_replication_factors / 2) + 1, For us Quorum
> = (2/2) +1 = 2.
>
> Default CL is ONE and RF=2 with Two Nodes in the cluster.(I am little
> confused, what is my read CL and what is my WRITE CL?)
>
> So, does it mean that for every WRITE it will write in both the nodes?
>
> and For every READ, it will read from both nodes and give back to client?
>
> DOWNGRADERETRYPOLICY will downgrade the CL if a node is down?
>
> Regards
>
> Neha
>
> On Wed, Dec 10, 2014 at 1:00 PM, Jonathan Haddad 
> wrote:
>>
>> I did a presentation on diagnosing performance problems in production at
>> the US & Euro summits, in which I covered quite a few tools & preventative
>> measures you should know when running a production cluster.  You may find
>> it useful:
>> http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
>>
>> On ops center - I recommend it.  It gives you a nice dashboard.  I don't
>> think it's completely comprehensive (but no tool really is) but it gets you
>> 90% of the way there.
>>
>> It's a good idea to run repairs, especially if you're doing deletes or
>> querying at CL=ONE.  I assume you're not using quorum, because on RF=2
>> that's the same as CL=ALL.
>>
>> I recommend at least RF=3 because if you lose 1 server, you're on the
>> edge of data loss.
>>
>>
>> On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi 
>> wrote:
>>
>>> Hi,
>>> We have Two Node Cluster Configuration in production with RF=2.
>>>
>>> Which means that the data is written in both the clusters and it's
>>> running for about a month now and has good amount of data.
>>>
>>> Questions?
>>> 1. What are the best practices for maintenance?
>>> 2. Is OPScenter required to be installed or I can manage with nodetool
>>> utility?
>>> 3. Is is necessary to run repair weekly?
>>>
>>> thanks
>>> regards
>>> Neha
>>>
>>

-- 

Ryan Svihla

Solution Architect


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Thanks Ryan.
So, as Jonathan recommended, we should have RF=3 with three nodes.
So Quorum = 2, so CL = 2 (or I need the CL to be set to two), and I will not
need the downgrading retry policy in case one of my nodes goes down.

I can dynamically add a new node to my cluster.
Can I change my RF to 3 dynamically without affecting my nodes?

regards
Neha

On Tue, Dec 16, 2014 at 10:32 PM, Ryan Svihla  wrote:
>
>
> CL quorum with RF2 is equivalent to ALL, writes will require
> acknowledgement from both nodes, and reads will be from both nodes.
>
> CL one will write to both replicas, but return success as soon as the
> first one responds, read will be from one node ( load balancing strategy
> determines which one).
>
> FWIW I've come around to dislike downgrading retry policy. I now feel like
> if I'm using downgrading, I'm effectively going to be using that downgraded
> policy most of the time under server stress, so in practice that reduced
> consistency is the effective consistency I'm asking for from my writes and
> reads.
>
>
>
> On Tue, Dec 16, 2014 at 10:50 AM, Neha Trivedi 
> wrote:
>>
>> Hi Jonathan,QUORUM = (sum_of_replication_factors / 2) + 1, For us Quorum
>> = (2/2) +1 = 2.
>>
>> Default CL is ONE and RF=2 with Two Nodes in the cluster.(I am little
>> confused, what is my read CL and what is my WRITE CL?)
>>
>> So, does it mean that for every WRITE it will write in both the nodes?
>>
>> and For every READ, it will read from both nodes and give back to client?
>>
>> DOWNGRADERETRYPOLICY will downgrade the CL if a node is down?
>>
>> Regards
>>
>> Neha
>>
>> On Wed, Dec 10, 2014 at 1:00 PM, Jonathan Haddad 
>> wrote:
>>>
>>> I did a presentation on diagnosing performance problems in production at
>>> the US & Euro summits, in which I covered quite a few tools & preventative
>>> measures you should know when running a production cluster.  You may find
>>> it useful:
>>> http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
>>>
>>> On ops center - I recommend it.  It gives you a nice dashboard.  I don't
>>> think it's completely comprehensive (but no tool really is) but it gets you
>>> 90% of the way there.
>>>
>>> It's a good idea to run repairs, especially if you're doing deletes or
>>> querying at CL=ONE.  I assume you're not using quorum, because on RF=2
>>> that's the same as CL=ALL.
>>>
>>> I recommend at least RF=3 because if you lose 1 server, you're on the
>>> edge of data loss.
>>>
>>>
>>> On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi 
>>> wrote:
>>>
 Hi,
 We have Two Node Cluster Configuration in production with RF=2.

 Which means that the data is written in both the clusters and it's
 running for about a month now and has good amount of data.

 Questions?
 1. What are the best practices for maintenance?
 2. Is OPScenter required to be installed or I can manage with nodetool
 utility?
 3. Is is necessary to run repair weekly?

 thanks
 regards
 Neha

>>>
>
> --
>
> [image: datastax_logo.png] 
>
> Ryan Svihla
>
> Solution Architect
>
> [image: twitter.png]  [image: linkedin.png]
> 
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
>


Re: Cassandra Maintenance Best practices

2014-12-16 Thread Ryan Svihla
You'll have to run repair, and that will involve some load and streaming,
but this is a normal use case for Cassandra, and your cluster should be
sized load-wise to allow repair and bootstrapping of new nodes; otherwise,
when you're overwhelmed you won't be able to add more nodes easily.

If you need to reduce the cost of streaming to the existing cluster, just
set streaming throughput on your existing nodes to a lower number like 50
or 25.
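
(Concretely, the RF change and throttling might look like this - a sketch with
a placeholder keyspace name; the nodetool commands are run from the shell on
each node.)

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()

# Bump the keyspace to RF=3 (use NetworkTopologyStrategy with per-DC counts
# if you run multiple datacenters).
session.execute("""
    ALTER KEYSPACE myks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# Then, on each node, from the shell:
#   nodetool setstreamthroughput 50   # throttle streaming while the cluster is busy
#   nodetool repair myks              # build out the new replicas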

On Tue, Dec 16, 2014 at 11:10 AM, Neha Trivedi 
wrote:
>
> Thanks Ryan.
> So, as Jonathan recommended, we should have RF=3 with Three nodes.
> So Quorum = 2 so, CL= 2 (or I need the CL to be set to two) and I will not
> need the  downgrading retry policy, in case if my one node goes down.
>
> I can dynamically add a New node to my Cluster.
> Can I change my RF to 3, dynamically without affecting my nodes ?
>
> regards
> Neha
>
> On Tue, Dec 16, 2014 at 10:32 PM, Ryan Svihla 
> wrote:
>>
>>
>> CL quorum with RF2 is equivalent to ALL, writes will require
>> acknowledgement from both nodes, and reads will be from both nodes.
>>
>> CL one will write to both replicas, but return success as soon as the
>> first one responds, read will be from one node ( load balancing strategy
>> determines which one).
>>
>> FWIW I've come around to dislike downgrading retry policy. I now feel
>> like if I'm using downgrading, I'm effectively going to be using that
>> downgraded policy most of the time under server stress, so in practice that
>> reduced consistency is the effective consistency I'm asking for from my
>> writes and reads.
>>
>>
>>
>> On Tue, Dec 16, 2014 at 10:50 AM, Neha Trivedi 
>> wrote:
>>>
>>> Hi Jonathan,QUORUM = (sum_of_replication_factors / 2) + 1, For us
>>> Quorum = (2/2) +1 = 2.
>>>
>>> Default CL is ONE and RF=2 with Two Nodes in the cluster.(I am little
>>> confused, what is my read CL and what is my WRITE CL?)
>>>
>>> So, does it mean that for every WRITE it will write in both the nodes?
>>>
>>> and For every READ, it will read from both nodes and give back to client?
>>>
>>> DOWNGRADERETRYPOLICY will downgrade the CL if a node is down?
>>>
>>> Regards
>>>
>>> Neha
>>>
>>> On Wed, Dec 10, 2014 at 1:00 PM, Jonathan Haddad 
>>> wrote:

 I did a presentation on diagnosing performance problems in production
 at the US & Euro summits, in which I covered quite a few tools &
 preventative measures you should know when running a production cluster.
 You may find it useful:
 http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/

 On ops center - I recommend it.  It gives you a nice dashboard.  I
 don't think it's completely comprehensive (but no tool really is) but it
 gets you 90% of the way there.

 It's a good idea to run repairs, especially if you're doing deletes or
 querying at CL=ONE.  I assume you're not using quorum, because on RF=2
 that's the same as CL=ALL.

 I recommend at least RF=3 because if you lose 1 server, you're on the
 edge of data loss.


 On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi 
 wrote:

> Hi,
> We have Two Node Cluster Configuration in production with RF=2.
>
> Which means that the data is written in both the clusters and it's
> running for about a month now and has good amount of data.
>
> Questions?
> 1. What are the best practices for maintenance?
> 2. Is OPScenter required to be installed or I can manage with nodetool
> utility?
> 3. Is is necessary to run repair weekly?
>
> thanks
> regards
> Neha
>

>>
>> --
>>
>> [image: datastax_logo.png] 
>>
>> Ryan Svihla
>>
>> Solution Architect
>>
>> [image: twitter.png]  [image: linkedin.png]
>> 
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world’s most innovative enterprises.
>> Datastax is built to be agile, always-on, and predictably scalable to any
>> size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the worlds
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>>

-- 

Ryan Svihla

Solution Architect


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: Cassandra Maintenance Best practices

2014-12-16 Thread Neha Trivedi
Thanks Ryan. We will get a new node and add it to the cluster. I will mail
if I have any questions regarding the same.

On Tue, Dec 16, 2014 at 10:52 PM, Ryan Svihla  wrote:
>
> you'll have to run repair and that will involve some load and streaming,
> but this is a normal use case for cassandra..and your cluster should be
> sized load wise to allow repair, and bootstrapping of new nodes..otherwise
> when you're over whelmed you won't be able to add more nodes easily.
>
> If you need to reduce the cost of streaming to the existing cluster, just
> set streaming throughput on your existing nodes to a lower number like 50
> or 25.
>
> On Tue, Dec 16, 2014 at 11:10 AM, Neha Trivedi 
> wrote:
>>
>> Thanks Ryan.
>> So, as Jonathan recommended, we should have RF=3 with Three nodes.
>> So Quorum = 2 so, CL= 2 (or I need the CL to be set to two) and I will
>> not need the  downgrading retry policy, in case if my one node goes down.
>>
>> I can dynamically add a New node to my Cluster.
>> Can I change my RF to 3, dynamically without affecting my nodes ?
>>
>> regards
>> Neha
>>
>> On Tue, Dec 16, 2014 at 10:32 PM, Ryan Svihla 
>> wrote:
>>>
>>>
>>> CL quorum with RF2 is equivalent to ALL, writes will require
>>> acknowledgement from both nodes, and reads will be from both nodes.
>>>
>>> CL one will write to both replicas, but return success as soon as the
>>> first one responds, read will be from one node ( load balancing strategy
>>> determines which one).
>>>
>>> FWIW I've come around to dislike downgrading retry policy. I now feel
>>> like if I'm using downgrading, I'm effectively going to be using that
>>> downgraded policy most of the time under server stress, so in practice that
>>> reduced consistency is the effective consistency I'm asking for from my
>>> writes and reads.
>>>
>>>
>>>
>>> On Tue, Dec 16, 2014 at 10:50 AM, Neha Trivedi 
>>> wrote:

 Hi Jonathan,QUORUM = (sum_of_replication_factors / 2) + 1, For us
 Quorum = (2/2) +1 = 2.

 Default CL is ONE and RF=2 with Two Nodes in the cluster.(I am little
 confused, what is my read CL and what is my WRITE CL?)

 So, does it mean that for every WRITE it will write in both the nodes?

 and For every READ, it will read from both nodes and give back to
 client?

 DOWNGRADERETRYPOLICY will downgrade the CL if a node is down?

 Regards

 Neha

 On Wed, Dec 10, 2014 at 1:00 PM, Jonathan Haddad 
 wrote:
>
> I did a presentation on diagnosing performance problems in production
> at the US & Euro summits, in which I covered quite a few tools &
> preventative measures you should know when running a production cluster.
> You may find it useful:
> http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
>
> On ops center - I recommend it.  It gives you a nice dashboard.  I
> don't think it's completely comprehensive (but no tool really is) but it
> gets you 90% of the way there.
>
> It's a good idea to run repairs, especially if you're doing deletes or
> querying at CL=ONE.  I assume you're not using quorum, because on RF=2
> that's the same as CL=ALL.
>
> I recommend at least RF=3 because if you lose 1 server, you're on the
> edge of data loss.
>
>
> On Tue Dec 09 2014 at 7:19:32 PM Neha Trivedi 
> wrote:
>
>> Hi,
>> We have Two Node Cluster Configuration in production with RF=2.
>>
>> Which means that the data is written in both the clusters and it's
>> running for about a month now and has good amount of data.
>>
>> Questions?
>> 1. What are the best practices for maintenance?
>> 2. Is OPScenter required to be installed or I can manage with
>> nodetool utility?
>> 3. Is is necessary to run repair weekly?
>>
>> thanks
>> regards
>> Neha
>>
>
>>>
>>> --
>>>
>>> [image: datastax_logo.png] 
>>>
>>> Ryan Svihla
>>>
>>> Solution Architect
>>>
>>> [image: twitter.png]  [image: linkedin.png]
>>> 
>>>
>>> DataStax is the fastest, most scalable distributed database technology,
>>> delivering Apache Cassandra to the world’s most innovative enterprises.
>>> Datastax is built to be agile, always-on, and predictably scalable to any
>>> size. With more than 500 customers in 45 countries, DataStax is the
>>> database technology and transactional backbone of choice for the worlds
>>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>>
>>>
>
> --
>
> [image: datastax_logo.png] 
>
> Ryan Svihla
>
> Solution Architect
>
> [image: twitter.png]  [image: linkedin.png]
> 
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s 

Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Ryan Svihla
Data modeling a distributed application could be a book unto itself.
However, I will add that modeling by restriction is basically the entire
thought process in Cassandra data modeling: since it's a distributed hash
table, a core aspect of that sort of application is that you need to be able
to quickly locate which server owns the data you want in the cluster (which
is what the partition key provides).

In specific response to your questions:
1) As long as you know the primary key and the column name, this just works;
I'm not sure what the problem is.
2) Yes, the partition key tells you which server owns the data; otherwise
you'd have to scan all servers to find what you're asking for.
3) I'm not sure I understand this.

To summarize, all modeling can be understood when you embrace these ideas:


   1. Querying a single server will be faster than querying many servers.
   2. Multiple tables with the same data but with different partition keys
   are much easier to scale than a single table for which you have to scan the
   whole cluster to get your answer.


If you accept this, you've basically got the key principle down... most
other ideas are extensions of it; some nuances include dealing with
tombstones, partition size, and ordering, and I can answer any more specifics.
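
As a small illustration of point 2, and of the "search on a column you update
later" question: a sketch with hypothetical tables, assuming the DataStax
Python driver. The same data goes into one table per lookup pattern, and the
non-key column is updated by primary key.

from datetime import datetime
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('myks')

# One table per way you need to look the data up (hypothetical names).
session.execute("""CREATE TABLE IF NOT EXISTS readings_by_sensor (
    sensor_id text, ts timestamp, status text,
    PRIMARY KEY ((sensor_id), ts))""")
session.execute("""CREATE TABLE IF NOT EXISTS readings_by_status (
    status text, sensor_id text, ts timestamp,
    PRIMARY KEY ((status), sensor_id, ts))""")

ts = datetime(2014, 12, 16)

# First insert: the real status isn't known yet, so write a placeholder.
session.execute("INSERT INTO readings_by_sensor (sensor_id, ts, status) VALUES (%s, %s, %s)",
                ('s1', ts, 'pending'))
session.execute("INSERT INTO readings_by_status (status, sensor_id, ts) VALUES (%s, %s, %s)",
                ('pending', 's1', ts))

# Later, after processing, update by primary key. In readings_by_sensor, status
# is a plain column and can simply be overwritten; in readings_by_status it is
# the partition key, so the "update" is an insert of the new row plus a delete
# of the old one.
session.execute("UPDATE readings_by_sensor SET status = %s WHERE sensor_id = %s AND ts = %s",
                ('processed', 's1', ts))
session.execute("INSERT INTO readings_by_status (status, sensor_id, ts) VALUES (%s, %s, %s)",
                ('processed', 's1', ts))
session.execute("DELETE FROM readings_by_status WHERE status = %s AND sensor_id = %s AND ts = %s",
                ('pending', 's1', ts))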

I've been meaning to write a series of blog posts on this, but as I stated,
it's almost a book unto itself. Data modeling a distributed application
requires a fundamental rethink of all the assumptions we've been taught for
master/slave style databases.


On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania  wrote:
>
> Hi,
>
> I have been having a few exchanges with contributors to the project around
> what is possible with Cassandra and a common response that comes up when I
> describe functionality as broken or missing is that I am not modelling my
> data correctly. Unfortunately, I cannot seem to find comprehensive
> documentation on modelling with Cassandra. In particular, I am finding
> myself modelling by restriction rather than what I would like to do.
>
> Does such documentations exist? If not, is there any effort to create such
> documentation?The DataStax documentation on data modelling is far too weak
> to be meaningful.
>
> In particular, I am caught because:
>
> 1) I want to search on a specific column to make updates to it after
> further processing; ie I don't know its value on first insert
> 2) If I want to search on a column, it has to be part of the primary key
> 3) If a column is part of the primary key, it cannot be edited so I have a
> circular dependency
>
> Thanks,
>
> Jason
>


-- 

Ryan Svihla

Solution Architect


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: Defining DataSet.json for cassandra-unit testing

2014-12-16 Thread Ryan Svihla
I'd ask the author of cassandra-unit. I've not personally used that project.

On Tue, Dec 16, 2014 at 8:00 AM, Chamila Wijayarathna <
cdwijayarat...@gmail.com> wrote:
>
> Hello all,
>
> I am trying to test my application using cassandra-unit with following
> schema and data given below.
>
> CREATE TABLE corpus.bigram_time_category_ordered_frequency (
> id bigint,
> word1 varchar,
> word2 varchar,
> year int,
> category varchar,
> frequency int,
> PRIMARY KEY((year, category),frequency,word1,word2));
>
> year | category | frequency | word1| word2   | id
> --+--+---+--+-+---
>  2014 |N | 1 |සියළුම | යුද්ධ |   664
>  2014 |N | 1 |එච් |   කාණ්ඩය | 12526
>  2014 |N | 1 |ගජබා | සුපර්ක්‍රොස් | 25779
>  2014 |N | 1 |  බී|   කාණ්ඩය | 12505
>
> Since this has a compound primary key, I am not clear with how to define
> dataset.json [1] for this CF. Can somebody help me on how to do that?
>
> Thank You!
>
> 1.
> https://github.com/jsevellec/cassandra-unit/wiki/What-can-you-set-into-a-dataSet
>
> --
> *Chamila Dilshan Wijayarathna,*
> SMIEEE, SMIESL,
> Undergraduate,
> Department of Computer Science and Engineering,
> University of Moratuwa.
>


-- 

Ryan Svihla

Solution Architect


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: Changing replication factor of Cassandra cluster

2014-12-16 Thread Ryan Svihla
Repair's performance is going to vary heavily based on a large number of factors;
hours for 1 node to finish is within the range of what I see in the wild. Again,
there are so many factors that it's impossible to speculate on whether that is
good or bad for your cluster. Factors that matter include:

   1. speed of disk io
   2. amount of ram and cpu on each node
   3. network interface speed
   4. is this multidc or not
   5. are vnodes enabled or not
   6. what are the jvm tunings
   7. compaction settings
   8. current load on the cluster
   9. streaming settings

Suffice it to say, improving repair performance is a full-on tuning
exercise. Note that your current operation is going to be worse than
traditional repair, as you're streaming copies of data around and not just
doing normal Merkle tree work.

Restoring from backup to a new cluster (including how to handle token
ranges) is discussed in detail here
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html


On Mon, Dec 15, 2014 at 4:14 PM, Pranay Agarwal 
wrote:
>
> Hi All,
>
>
> I have 20 nodes cassandra cluster with 500gb of data and replication
> factor of 1. I increased the replication factor to 3 and ran nodetool
> repair on each node one by one as the docs says. But it takes hours for 1
> node to finish repair. Is that normal or am I doing something wrong?
>
> Also, I took backup of cassandra data on each node. How do I restore the
> graph in a new cluster of nodes using the backup? Do I have to have the
> tokens range backed up as well?
>
> -Pranay
>


-- 

Ryan Svihla

Solution Architect


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.


Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Jason Kania
Ryan,
Thanks for the response. It offers a bit more clarity.
I think a series of blog posts with good real-world examples would go a long 
way toward increasing the usability of Cassandra. Right now I find the process 
is like going through a minefield, because I only discover what is not possible 
after trying something that I would find logical and failing.

For my specific questions, the problem is that since searching is only possible 
on columns in the primary key and the primary key cannot be updated, I am not 
sure what the appropriate solution is when data exists that needs to be 
searched and then updated. What is the preferable approach to this? Is the 
expectation to maintain a series of tables, one for each stage of data 
manipulation, each with its own primary key?
Thanks,
Jason
  From: Ryan Svihla 
 To: user@cassandra.apache.org 
 Sent: Tuesday, December 16, 2014 12:36 PM
 Subject: Re: Comprehensive documentation on Cassandra Data modelling
   
Data Modeling a distributed application could be a book unto itself. However, I 
will add, modeling by restriction is basically the entire thought process in 
Cassandra data modeling since it's a distributed hash table and a core aspect 
of that sort of application is you need to be able to quickly locate which 
server owns the data you want in the cluster (which is provided by the 
partition key).

in specific response to your questions
1) as long as you know the primary key and the column name this just works. I'm 
not sure what the problem is
2) Yes, the partition key tells you which server owns the data, otherwise you'd 
have to scan all servers to find what you're asking for.
3) I'm not sure I understand this.

To summarize, all modeling can be understood when you embrace the idea that :

   
   - Querying a single server will be faster than querying many servers
   - Multiple tables with the same data but with different partition keys are 
much easier to scale than a single table for which you have to scan the whole 
cluster to get your answer. 

If you accept this, you've basically got the key principle down... most other 
ideas are extensions of it; the nuances include dealing with tombstones, 
partition size, and ordering, and I can answer any more specifics. 

I've been meaning to write a series of blog posts on this, but as I stated, 
it's almost a book unto itself. Data modeling a distributed application 
requires a fundamental rethink of all the assumptions we've been taught for 
master/slave style databases.




On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania  wrote:
Hi,
I have been having a few exchanges with contributors to the project around what 
is possible with Cassandra and a common response that comes up when I describe 
functionality as broken or missing is that I am not modelling my data 
correctly. Unfortunately, I cannot seem to find comprehensive documentation on 
modelling with Cassandra. In particular, I am finding myself modelling by 
restriction rather than what I would like to do.

Does such documentation exist? If not, is there any effort to create such 
documentation? The DataStax documentation on data modelling is far too weak to 
be meaningful.

In particular, I am caught because:
1) I want to search on a specific column to make updates to it after further 
processing; ie I don't know its value on first insert
2) If I want to search on a column, it has to be part of the primary key
3) If a column is part of the primary key, it cannot be edited, so I have a 
circular dependency
Thanks,
Jason



-- 
Ryan Svihla
Solution Architect, DataStax

Re: Comprehensive documentation on Cassandra Data modelling

2014-12-16 Thread Ryan Svihla
There is a lot of stuff out there, and the best thing you can do today is
watch Patrick McFadden's series. This was what I used before I started
at DataStax. Planet Cassandra has a data modeling playlist of videos you
can watch
https://www.youtube.com/playlist?list=PLqcm6qE9lgKJoSWKYWHWhrVupRbS8mmDA
including the McFadden videos I mentioned.

Finally, you hit a key point: a series of tables is the normal approach to
most data modeling. You model your tables around the queries you need, and with
the exception of the nuance I referred to in the last email, this one
concept will get you through 80% of use cases fine.
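
To make the "same data, multiple tables" idea concrete, here is a rough sketch
using the Java driver; the keyspace, table, and column names are invented for
illustration and aren't from your schema:

import com.datastax.driver.core.*;
import java.util.Date;
import java.util.UUID;

public class QueryTablesSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        session.execute("CREATE KEYSPACE IF NOT EXISTS modeling_demo "
                + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");

        // One table per query pattern: frames looked up by media id, and by day.
        session.execute("CREATE TABLE IF NOT EXISTS modeling_demo.frames_by_media ("
                + "media_id uuid, frame_time timestamp, data text, "
                + "PRIMARY KEY (media_id, frame_time))");
        session.execute("CREATE TABLE IF NOT EXISTS modeling_demo.frames_by_day ("
                + "day text, frame_time timestamp, media_id uuid, data text, "
                + "PRIMARY KEY (day, frame_time, media_id))");

        PreparedStatement byMedia = session.prepare(
                "INSERT INTO modeling_demo.frames_by_media (media_id, frame_time, data) VALUES (?, ?, ?)");
        PreparedStatement byDay = session.prepare(
                "INSERT INTO modeling_demo.frames_by_day (day, frame_time, media_id, data) VALUES (?, ?, ?, ?)");

        // The same logical record is written to both tables, so each read is a
        // single-partition query instead of a cluster-wide scan.
        UUID mediaId = UUID.randomUUID();
        Date now = new Date();
        session.execute(byMedia.bind(mediaId, now, "frame payload"));
        session.execute(byDay.bind("2014-12-16", now, mediaId, "frame payload"));

        cluster.close();
    }
}

The trade-off is write amplification: every record is written twice, in exchange
for each query being a single-partition read.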

On Tue, Dec 16, 2014 at 12:01 PM, Jason Kania  wrote:
>
> Ryan,
>
> Thanks for the response. It offers a bit more clarity.
>
> I think a series of blog posts with good real world examples would go a
> long way to increasing usability of Cassandra. Right now I find the process
> like going through a mine field because I only discover what is not
> possible after trying something that I would find logical and failing.
>
> For my specific questions, the problem is that since searching is only
> possible on columns in the primary key and the primary key cannot be
> updated, I am not sure what the appropriate solution is when data exists
> that needs to be searched and then updated. What is the preferrable
> approach to this? Is the expectation to maintain a series of tables, one
> for each stage of data manipulation with its own primary key?
>
> Thanks,
>
> Jason
>
>   --
>  *From:* Ryan Svihla 
> *To:* user@cassandra.apache.org
> *Sent:* Tuesday, December 16, 2014 12:36 PM
> *Subject:* Re: Comprehensive documentation on Cassandra Data modelling
>
> Data Modeling a distributed application could be a book unto itself.
> However, I will add, modeling by restriction is basically the entire
> thought process in Cassandra data modeling since it's a distributed hash
> table and a core aspect of that sort of application is you need to be able
> to quickly locate which server owns the data you want in the cluster (which
> is provided by the partition key).
>
> in specific response to your questions
> 1) as long as you know the primary key and the column name this just
> works. I'm not sure what the problem is
> 2) Yes, the partition key tells you which server owns the data, otherwise
> you'd have to scan all servers to find what you're asking for.
> 3) I'm not sure I understand this.
>
> To summarize, all modeling can be understood when you embrace the idea
> that :
>
>
>1. Querying a single server will be faster than querying many servers
>2. Multiple tables with the same data but with different partition
>keys is much easier to scale that a single table that you have to scan the
>whole cluster for your answer.
>
>
> If you accept this, you've basically got the key principle down...most
> other ideas are extensions of this, some nuance includes dealing with
> tombstones, partition size and order. and I can answer any more specifics.
>
> I've been meaning to write a series of blog posts on this, but as I
> stated, it's almost a book unto itself. Data modeling a distributed
> application requires a fundamental rethink of all the assumptions we've
> been taught for master/slave style databases.
>
>
>
>
> On Tue, Dec 16, 2014 at 10:46 AM, Jason Kania 
> wrote:
>
> Hi,
>
> I have been having a few exchanges with contributors to the project around
> what is possible with Cassandra and a common response that comes up when I
> describe functionality as broken or missing is that I am not modelling my
> data correctly. Unfortunately, I cannot seem to find comprehensive
> documentation on modelling with Cassandra. In particular, I am finding
> myself modelling by restriction rather than what I would like to do.
>
> Does such documentations exist? If not, is there any effort to create such
> documentation?The DataStax documentation on data modelling is far too weak
> to be meaningful.
>
> In particular, I am caught because:
>
> 1) I want to search on a specific column to make updates to it after
> further processing; ie I don't know its value on first insert
> 2) If I want to search on a column, it has to be part of the primary key
> 3) If a column is part of the primary key, it cannot be edited so I have a
> circular dependency
>
> Thanks,
>
> Jason
>
>
>
> --
> Ryan Svihla
> Solution Architect, DataStax

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
I was speculating.  From the responses above, it now appears to me that
tombstones serve (at least) 2 distinct roles:

1. When reading within a single cassandra instance, they mark a new version
of a value (that value being "deleted").  Without this, the prior version
would be the most recent and so reads would still return the last value
even after it was deleted.

2. They can resolve discrepancies when a client read receives conflicting
answers from Cassandra nodes (e.g. where one of the nodes is out of date
because it never saw the delete command).

So in the above I was only referring to #2, without realizing the role they
play in #1.
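
As an aside, the consistency split from my plan (quoted below) is just a
per-statement setting in the driver. A rough sketch with the Java driver,
using a made-up keyspace and table:

import com.datastax.driver.core.*;

public class DeleteAtAllSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
        session.execute("CREATE TABLE IF NOT EXISTS demo.work_items (id int PRIMARY KEY, payload text)");

        // Inserts only wait for a quorum of the 3 replicas.
        Statement insert = new SimpleStatement(
                "INSERT INTO demo.work_items (id, payload) VALUES (1, 'todo')")
                .setConsistencyLevel(ConsistencyLevel.QUORUM);

        // Deletes wait for every replica; they throw UnavailableException or
        // WriteTimeoutException whenever any replica is down or slow, which is
        // the price of running with gc_grace_seconds = 0.
        Statement delete = new SimpleStatement("DELETE FROM demo.work_items WHERE id = 1")
                .setConsistencyLevel(ConsistencyLevel.ALL);

        session.execute(insert);
        session.execute(delete);
        cluster.close();
    }
}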

- Ian




On Tue, Dec 16, 2014 at 11:12 AM, Jack Krupansky 
wrote:
>
>   When you say “no need for tombstones”, did you actually read that
> somewhere or were you just speculating? If the former, where exactly?
>
> -- Jack Krupansky
>
>  *From:* Ian Rose 
> *Sent:* Tuesday, December 16, 2014 10:22 AM
> *To:* user 
> *Subject:* does consistency=ALL for deletes obviate the need for
> tombstones?
>
>  Howdy all,
>
> Our use of cassandra unfortunately makes use of lots of deletes.  Yes, I
> know that C* is not well suited to this kind of workload, but that's where
> we are, and before I go looking for an entirely new data layer I would
> rather explore whether C* could be tuned to work well for us.
>
> However, deletions are never driven by users in our app - deletions always
> occur by backend processes to "clean up" data after it has been processed,
> and thus they do not need to be 100% available.  So this made me think,
> what if I did the following?
>
>- gc_grace_seconds = 0, which ensures that tombstones are never
>created
>- replication factor = 3
>- for writes that are inserts, consistency = QUORUM, which ensures
>that writes can proceed even if 1 replica is slow/down
>- for deletes, consistency = ALL, which ensures that when we delete a
>record it disappears entirely (no need for tombstones)
>- for reads, consistency = QUORUM
>
> Also, I should clarify that our data essentially append only, so I don't
> need to worry about inconsistencies created by partial updates (e.g. value
> gets changed on one machine but not another).  Sometimes there will be
> duplicate writes, but I think that should be fine since the value is always
> identical.
>
> Any red flags with this approach?  Has anyone tried it and have
> experiences to share?  Also, I *think* that this means that I don't need to
> run repairs, which from an ops perspective is great.
>
> Thanks, as always,
> - Ian
>
>


100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I have a three node cluster that has been sitting at a load of 4 (for each
node), 100% CPU utilization (although 92% nice) for the last 12 hours,
ever since some significant writes finished. I'm trying to determine what
tuning I should be doing to get it out of this state. The debug log is just
an endless series of:

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
8000634880

iostat shows virtually no I/O.

Compaction may enter into this, but I don't really know what to make of
the compaction stats since they never change:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 10
  compaction typekeyspace   table   completed
total  unit  progress
   Compaction   mediamedia_tracks_raw   271651482
563615497 bytes48.20%
   Compaction   mediamedia_tracks_raw30308910
  21676695677 bytes 0.14%
   Compaction   mediamedia_tracks_raw  1198384080
   1815603161 bytes66.00%
Active compaction remaining time :   0h22m24s

5 minutes later:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 9
  compaction typekeyspace   table   completed
total  unit  progress
   Compaction   mediamedia_tracks_raw   271651482
563615497 bytes48.20%
   Compaction   mediamedia_tracks_raw30308910
  21676695677 bytes 0.14%
   Compaction   mediamedia_tracks_raw  1198384080
   1815603161 bytes66.00%
Active compaction remaining time :   0h22m24s

Sure, the pending tasks went down by one, but the rest is identical.
media_tracks_raw likely has a bunch of tombstones (I can't figure out how to
get stats on that).

Is this behavior something that indicates that I need more heap or a larger new
generation? Should I be manually running compaction on tables with lots of
tombstones?

Any suggestions or places to educate myself better on performance tuning
would be appreciated.

arne


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Jonathan Lacefield
Hello,

  What version of Cassandra are you running?

  If it's 2.0, we recently experienced something similar with 8447 [1],
which 8485 [2] should hopefully resolve.

  Please note that 8447 is not related to tombstones.  Tombstone processing
can put a lot of pressure on the heap as well. Why do you think you have a
lot of tombstones in that one particular table?

  [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
  [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

Jonathan

Jonathan Lacefield
Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen  wrote:
>
> I have a three node cluster that has been sitting at a load of 4 (for each
> node), 100% CPI utilization (although 92% nice) for that last 12 hours,
> ever since some significant writes finished. I'm trying to determine what
> tuning I should be doing to get it out of this state. The debug log is just
> an endless series of:
>
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
> 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
> 8000634880
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
> 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
> 8000634880
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
> 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
> 8000634880
>
> iostat shows virtually no I/O.
>
> Compaction may enter into this, but i don't really know what to make of
> compaction stats since they never change:
>
> [root@cassandra-37919c3a ~]# nodetool compactionstats
> pending tasks: 10
>   compaction typekeyspace   table   completed
>   total  unit  progress
>Compaction   mediamedia_tracks_raw   271651482
>   563615497 bytes48.20%
>Compaction   mediamedia_tracks_raw30308910
> 21676695677 bytes 0.14%
>Compaction   mediamedia_tracks_raw  1198384080
>  1815603161 bytes66.00%
> Active compaction remaining time :   0h22m24s
>
> 5 minutes later:
>
> [root@cassandra-37919c3a ~]# nodetool compactionstats
> pending tasks: 9
>   compaction typekeyspace   table   completed
>   total  unit  progress
>Compaction   mediamedia_tracks_raw   271651482
>   563615497 bytes48.20%
>Compaction   mediamedia_tracks_raw30308910
> 21676695677 bytes 0.14%
>Compaction   mediamedia_tracks_raw  1198384080
>  1815603161 bytes66.00%
> Active compaction remaining time :   0h22m24s
>
> Sure the pending tasks went down by one, but the rest is identical.
> media_tracks_raw likely has a bunch of tombstones (can't figure out how to
> get stats on that).
>
> Is this behavior something that indicates that i need more Heap, larger
> new generation? Should I be manually running compaction on tables with lots
> of tombstones?
>
> Any suggestions or places to educate myself better on performance tuning
> would be appreciated.
>
> arne
>


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What's heap usage at?

On Tue, Dec 16, 2014 at 1:04 PM, Arne Claassen  wrote:
>
> I have a three node cluster that has been sitting at a load of 4 (for each
> node), 100% CPI utilization (although 92% nice) for that last 12 hours,
> ever since some significant writes finished. I'm trying to determine what
> tuning I should be doing to get it out of this state. The debug log is just
> an endless series of:
>
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
> 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
> 8000634880
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
> 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
> 8000634880
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
> 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
> 8000634880
>
> iostat shows virtually no I/O.
>
> Compaction may enter into this, but i don't really know what to make of
> compaction stats since they never change:
>
> [root@cassandra-37919c3a ~]# nodetool compactionstats
> pending tasks: 10
>   compaction typekeyspace   table   completed
>   total  unit  progress
>Compaction   mediamedia_tracks_raw   271651482
>   563615497 bytes48.20%
>Compaction   mediamedia_tracks_raw30308910
> 21676695677 bytes 0.14%
>Compaction   mediamedia_tracks_raw  1198384080
>  1815603161 bytes66.00%
> Active compaction remaining time :   0h22m24s
>
> 5 minutes later:
>
> [root@cassandra-37919c3a ~]# nodetool compactionstats
> pending tasks: 9
>   compaction typekeyspace   table   completed
>   total  unit  progress
>Compaction   mediamedia_tracks_raw   271651482
>   563615497 bytes48.20%
>Compaction   mediamedia_tracks_raw30308910
> 21676695677 bytes 0.14%
>Compaction   mediamedia_tracks_raw  1198384080
>  1815603161 bytes66.00%
> Active compaction remaining time :   0h22m24s
>
> Sure the pending tasks went down by one, but the rest is identical.
> media_tracks_raw likely has a bunch of tombstones (can't figure out how to
> get stats on that).
>
> Is this behavior something that indicates that i need more Heap, larger
> new generation? Should I be manually running compaction on tables with lots
> of tombstones?
>
> Any suggestions or places to educate myself better on performance tuning
> would be appreciated.
>
> arne
>


-- 

Ryan Svihla
Solution Architect, DataStax


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I'm running 2.0.10.

The data is all time series data and, as we change our pipeline, we've been
periodically reprocessing the data sources, which causes each time
series to be overwritten, i.e. every row per partition key is deleted and
re-written, so I assume I've been collecting a bunch of tombstones.

Also, I assumed the presence of the ever-present and never-completing compaction
entries was an artifact of tombstoning, but I fully admit that is conjecture
based on the ~20 blog posts and Stack Overflow questions I've
surveyed.

I doubled the Heap on one node and it changed nothing regarding the load or
the ParNew log statements. New Generation Usage is 50%, Eden itself is 56%.

Anything else i should look at and report, let me know.

On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
jlacefi...@datastax.com> wrote:
>
> Hello,
>
>   What version of Cassandra are you running?
>
>   If it's 2.0, we recently experienced something similar with 8447 [1],
> which 8485 [2] should hopefully resolve.
>
>   Please note that 8447 is not related to tombstones.  Tombstone
> processing can put a lot of pressure on the heap as well. Why do you think
> you have a lot of tombstones in that one particular table?
>
>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>
> Jonathan
>
> Jonathan Lacefield
> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>
> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen  wrote:
>>
>> I have a three node cluster that has been sitting at a load of 4 (for
>> each node), 100% CPI utilization (although 92% nice) for that last 12
>> hours, ever since some significant writes finished. I'm trying to determine
>> what tuning I should be doing to get it out of this state. The debug log is
>> just an endless series of:
>>
>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
>> 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
>> 8000634880
>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
>> 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
>> 8000634880
>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
>> 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
>> 8000634880
>>
>> iostat shows virtually no I/O.
>>
>> Compaction may enter into this, but i don't really know what to make of
>> compaction stats since they never change:
>>
>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>> pending tasks: 10
>>   compaction typekeyspace   table   completed
>>   total  unit  progress
>>Compaction   mediamedia_tracks_raw   271651482
>>   563615497 bytes48.20%
>>Compaction   mediamedia_tracks_raw30308910
>> 21676695677 bytes 0.14%
>>Compaction   mediamedia_tracks_raw  1198384080
>>  1815603161 bytes66.00%
>> Active compaction remaining time :   0h22m24s
>>
>> 5 minutes later:
>>
>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>> pending tasks: 9
>>   compaction typekeyspace   table   completed
>>   total  unit  progress
>>Compaction   mediamedia_tracks_raw   271651482
>>   563615497 bytes48.20%
>>Compaction   mediamedia_tracks_raw30308910
>> 21676695677 bytes 0.14%
>>Compaction   mediamedia_tracks_raw  1198384080
>>  1815603161 bytes66.00%
>> Active compaction remaining time :   0h22m24s
>>
>> Sure the pending tasks went down by one, but the rest is identical.
>> media_tracks_raw likely has a bunch of tombstones (can't figure out how to
>> get stats on that).
>>
>> Is this behavior something that indicates that i need more Heap, larger
>> new generation? Should I be manually running compaction on tables with lots
>> of tombstones?
>>
>> Any suggestions or places to educate myself better on performance tuning
>> would be appreciated.
>>
>> arne
>>
>


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What's CPU, RAM, Storage layer, and data density per node? Exact heap
settings would be nice. In the logs look for TombstoneOverflowingException


On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen  wrote:
>
> I'm running 2.0.10.
>
> The data is all time series data and as we change our pipeline, we've been
> periodically been reprocessing the data sources, which causes each time
> series to be overwritten, i.e. every row per partition key is deleted and
> re-written, so I assume i've been collecting a bunch of tombstones.
>
> Also, the presence of the ever present and never completing compaction
> types, i assumed were an artifact of tombstoning, but i fully admit to
> conjecture based on about ~20 blog posts and stackoverflow questions i've
> surveyed.
>
> I doubled the Heap on one node and it changed nothing regarding the load
> or the ParNew log statements. New Generation Usage is 50%, Eden itself is
> 56%.
>
> Anything else i should look at and report, let me know.
>
> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
> jlacefi...@datastax.com> wrote:
>>
>> Hello,
>>
>>   What version of Cassandra are you running?
>>
>>   If it's 2.0, we recently experienced something similar with 8447 [1],
>> which 8485 [2] should hopefully resolve.
>>
>>   Please note that 8447 is not related to tombstones.  Tombstone
>> processing can put a lot of pressure on the heap as well. Why do you think
>> you have a lot of tombstones in that one particular table?
>>
>>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>>
>> Jonathan
>>
>> Jonathan Lacefield
>> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>>
>> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen  wrote:
>>>
>>> I have a three node cluster that has been sitting at a load of 4 (for
>>> each node), 100% CPI utilization (although 92% nice) for that last 12
>>> hours, ever since some significant writes finished. I'm trying to determine
>>> what tuning I should be doing to get it out of this state. The debug log is
>>> just an endless series of:
>>>
>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
>>> 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
>>> 8000634880
>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
>>> 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
>>> 8000634880
>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
>>> 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
>>> 8000634880
>>>
>>> iostat shows virtually no I/O.
>>>
>>> Compaction may enter into this, but i don't really know what to make of
>>> compaction stats since they never change:
>>>
>>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>>> pending tasks: 10
>>>   compaction typekeyspace   table
>>> completed   total  unit  progress
>>>Compaction   mediamedia_tracks_raw
>>> 271651482   563615497 bytes48.20%
>>>Compaction   mediamedia_tracks_raw
>>>  30308910 21676695677 bytes 0.14%
>>>Compaction   mediamedia_tracks_raw
>>>  1198384080  1815603161 bytes66.00%
>>> Active compaction remaining time :   0h22m24s
>>>
>>> 5 minutes later:
>>>
>>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>>> pending tasks: 9
>>>   compaction typekeyspace   table
>>> completed   total  unit  progress
>>>Compaction   mediamedia_tracks_raw
>>> 271651482   563615497 bytes48.20%
>>>Compaction   mediamedia_tracks_raw
>>>  30308910 21676695677 bytes 0.14%
>>>Compaction   mediamedia_tracks_raw
>>>  1198384080  1815603161 bytes66.00%
>>> Active compaction remaining time :   0h22m24s
>>>
>>> Sure the pending tasks went down by one, but the rest is identical.
>>> media_tracks_raw likely has a bunch of tombstones (can't figure out how to
>>> get stats on that).
>>>
>>> Is this behavior something that indicates that i need more Heap, larger
>>> new generation? Should I be manually running compaction on tables with lots
>>> of tombstones?
>>>
>>> Any suggestions or places to educate myself better on performance tuning
>>> would be appreciated.
>>>
>>> arne
>>>
>>

-- 

Ryan Svihla
Solution Architect, DataStax


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
AWS r3.xlarge, 30GB RAM, but only using a heap of 10GB and a 2GB new gen, because we
might go c3.2xlarge instead if CPU is more important than RAM.
Storage is optimized EBS SSD (but iostat shows no real IO going on)
Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.

On the node where I raised the heap from 6GB to 10GB, the utilization has
dropped to 46% nice now, but the ParNew log messages still continue at the
same pace. I'm gonna up the heap to 20GB for a bit and see if that brings the
nice CPU further down.

No TombstoneOverflowingExceptions.

On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla  wrote:
>
> What's CPU, RAM, Storage layer, and data density per node? Exact heap
> settings would be nice. In the logs look for TombstoneOverflowingException
>
>
> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen  wrote:
>>
>> I'm running 2.0.10.
>>
>> The data is all time series data and as we change our pipeline, we've
>> been periodically been reprocessing the data sources, which causes each
>> time series to be overwritten, i.e. every row per partition key is deleted
>> and re-written, so I assume i've been collecting a bunch of tombstones.
>>
>> Also, the presence of the ever present and never completing compaction
>> types, i assumed were an artifact of tombstoning, but i fully admit to
>> conjecture based on about ~20 blog posts and stackoverflow questions i've
>> surveyed.
>>
>> I doubled the Heap on one node and it changed nothing regarding the load
>> or the ParNew log statements. New Generation Usage is 50%, Eden itself is
>> 56%.
>>
>> Anything else i should look at and report, let me know.
>>
>> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
>> jlacefi...@datastax.com> wrote:
>>>
>>> Hello,
>>>
>>>   What version of Cassandra are you running?
>>>
>>>   If it's 2.0, we recently experienced something similar with 8447 [1],
>>> which 8485 [2] should hopefully resolve.
>>>
>>>   Please note that 8447 is not related to tombstones.  Tombstone
>>> processing can put a lot of pressure on the heap as well. Why do you think
>>> you have a lot of tombstones in that one particular table?
>>>
>>>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>>>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>>>
>>> Jonathan
>>>
>>> Jonathan Lacefield
>>> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>>>
>>> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen 
>>> wrote:

 I have a three node cluster that has been sitting at a load of 4 (for
 each node), 100% CPI utilization (although 92% nice) for that last 12
 hours, ever since some significant writes finished. I'm trying to determine
 what tuning I should be doing to get it out of this state. The debug log is
 just an endless series of:

 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line
 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is
 8000634880
 DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line
 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is
 8000634880

 iostat shows virtually no I/O.

 Compaction may enter into this, but i don't really know what to make of
 compaction stats since they never change:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 10
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw
  1198384080  1815603161 bytes66.00%
 Active compaction remaining time :   0h22m24s

 5 minutes later:

 [root@cassandra-37919c3a ~]# nodetool compactionstats
 pending tasks: 9
   compaction typekeyspace   table
 completed   total  unit  progress
Compaction   mediamedia_tracks_raw
 271651482   563615497 bytes48.20%
Compaction   mediamedia_tracks_raw
  30308910 21676695677 bytes 0.14%
Compaction   mediamedia_tracks_raw
  1198384080  1815603161 bytes66.00%
 Active c

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Sorry, I meant a 15GB heap on the one machine that has less nice CPU% now.
The others are at 6GB.

On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen  wrote:
>
> AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
> might go c3.2xlarge instead if CPU is more important than RAM
> Storage is optimized EBS SSD (but iostat shows no real IO going on)
> Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.
>
> The node on which I set the Heap to 10GB from 6GB the utlilization has
> dropped to 46%nice now, but the ParNew log messages still continue at the
> same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
> nice CPU further down.
>
> No TombstoneOverflowingExceptions.
>
> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla 
> wrote:
>>
>> What's CPU, RAM, Storage layer, and data density per node? Exact heap
>> settings would be nice. In the logs look for TombstoneOverflowingException
>>
>>
>> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen  wrote:
>>>
>>> I'm running 2.0.10.
>>>
>>> The data is all time series data and as we change our pipeline, we've
>>> been periodically been reprocessing the data sources, which causes each
>>> time series to be overwritten, i.e. every row per partition key is deleted
>>> and re-written, so I assume i've been collecting a bunch of tombstones.
>>>
>>> Also, the presence of the ever present and never completing compaction
>>> types, i assumed were an artifact of tombstoning, but i fully admit to
>>> conjecture based on about ~20 blog posts and stackoverflow questions i've
>>> surveyed.
>>>
>>> I doubled the Heap on one node and it changed nothing regarding the load
>>> or the ParNew log statements. New Generation Usage is 50%, Eden itself is
>>> 56%.
>>>
>>> Anything else i should look at and report, let me know.
>>>
>>> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
>>> jlacefi...@datastax.com> wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447 [1],
 which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan

 Jonathan Lacefield
 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com

 On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen 
 wrote:
>
> I have a three node cluster that has been sitting at a load of 4 (for
> each node), 100% CPI utilization (although 92% nice) for that last 12
> hours, ever since some significant writes finished. I'm trying to 
> determine
> what tuning I should be doing to get it out of this state. The debug log 
> is
> just an endless series of:
>
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java
> (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max
> is 8000634880
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java
> (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max
> is 8000634880
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java
> (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used;
> max is 8000634880
>
> iostat shows virtually no I/O.
>
> Compaction may enter into this, but i don't really know what to make
> of compaction stats since they never change:
>
> [root@cassandra-37919c3a ~]# nodetool compactionstats
> pending tasks: 10
>   compaction typekeyspace   table
> completed   total  unit  progress
>Compaction   mediamedia_tracks_raw
> 271651482   563615497 bytes48.20%
>Compaction   mediamedia_tracks_raw
>  30308910 21676695677 bytes 0.14%
>Compaction   mediamedia_tracks_raw
>  1198384080  1815603161 bytes66.00%
> Active compaction remaining time :   0h22m24s
>
> 5 minutes later:
>
> [root@cassandra-37919c3a ~]# nodetool compactionstats
> pending tasks: 9
>   compaction typekeyspace   table
> completed   total  unit  progress
>Compaction   mediamedia_tracks_raw
>

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Changed the 15GB node to a 25GB heap and the nice CPU is down to ~20% now.
I checked my dev cluster to see if the ParNew log entries are just par for
the course, but I'm not seeing them there. However, both clusters have the
following every 30 seconds:

DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line
165) Started replayAllFailedBatches
DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
ColumnFamilyStore.java (line 866) forceFlush requested but everything is
clean in batchlog
DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line
200) Finished replayAllFailedBatches

Is that just routine scheduled house-keeping or a sign of something else?

On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen  wrote:
>
> Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
> The others are 6GB
>
> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen  wrote:
>>
>> AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
>> might go c3.2xlarge instead if CPU is more important than RAM
>> Storage is optimized EBS SSD (but iostat shows no real IO going on)
>> Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.
>>
>> The node on which I set the Heap to 10GB from 6GB the utlilization has
>> dropped to 46%nice now, but the ParNew log messages still continue at the
>> same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
>> nice CPU further down.
>>
>> No TombstoneOverflowingExceptions.
>>
>> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla 
>> wrote:
>>>
>>> What's CPU, RAM, Storage layer, and data density per node? Exact heap
>>> settings would be nice. In the logs look for TombstoneOverflowingException
>>>
>>>
>>> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen 
>>> wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline, we've
 been periodically been reprocessing the data sources, which causes each
 time series to be overwritten, i.e. every row per partition key is deleted
 and re-written, so I assume i've been collecting a bunch of tombstones.

 Also, the presence of the ever present and never completing compaction
 types, i assumed were an artifact of tombstoning, but i fully admit to
 conjecture based on about ~20 blog posts and stackoverflow questions i've
 surveyed.

 I doubled the Heap on one node and it changed nothing regarding the
 load or the ParNew log statements. New Generation Usage is 50%, Eden itself
 is 56%.

 Anything else i should look at and report, let me know.

 On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
 jlacefi...@datastax.com> wrote:
>
> Hello,
>
>   What version of Cassandra are you running?
>
>   If it's 2.0, we recently experienced something similar with 8447
> [1], which 8485 [2] should hopefully resolve.
>
>   Please note that 8447 is not related to tombstones.  Tombstone
> processing can put a lot of pressure on the heap as well. Why do you think
> you have a lot of tombstones in that one particular table?
>
>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>
> Jonathan
>
> Jonathan Lacefield
> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>
> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen 
> wrote:
>>
>> I have a three node cluster that has been sitting at a load of 4 (for
>> each node), 100% CPI utilization (although 92% nice) for that last 12
>> hours, ever since some significant writes finished. I'm trying to 
>> determine
>> what tuning I should be doing to get it out of this state. The debug log 
>> is
>> just an endless series of:
>>
>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java
>> (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max
>> is 8000634880
>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java
>> (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max
>> is 8000634880
>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java
>> (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used;
>> max is 8000634880
>>
>> iostat shows virtually no I/O.
>>
>> Compaction may enter into this, but i don't really know what to make
>> of compaction stats since they never change:
>>
>> [root@cassandra-37919c3a ~]# nodetool compactionsta

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So a heap of that size without some tuning will create a number of problems
(high CPU usage being one of them). I suggest either an 8GB heap and 400MB parnew
(which I'd only set that low for that low a CPU count), or attempt the
tunings indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150

On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen  wrote:
>
> Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
> Checked my dev cluster to see if the ParNew log entries are just par for
> the course, but not seeing them there. However, both have the following
> every 30 seconds:
>
> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line
> 165) Started replayAllFailedBatches
> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
> ColumnFamilyStore.java (line 866) forceFlush requested but everything is
> clean in batchlog
> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line
> 200) Finished replayAllFailedBatches
>
> Is that just routine scheduled house-keeping or a sign of something else?
>
> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen  wrote:
>>
>> Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
>> The others are 6GB
>>
>> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen 
>> wrote:
>>>
>>> AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
>>> might go c3.2xlarge instead if CPU is more important than RAM
>>> Storage is optimized EBS SSD (but iostat shows no real IO going on)
>>> Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.
>>>
>>> The node on which I set the Heap to 10GB from 6GB the utlilization has
>>> dropped to 46%nice now, but the ParNew log messages still continue at the
>>> same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
>>> nice CPU further down.
>>>
>>> No TombstoneOverflowingExceptions.
>>>
>>> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla 
>>> wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact heap
 settings would be nice. In the logs look for TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen 
 wrote:
>
> I'm running 2.0.10.
>
> The data is all time series data and as we change our pipeline, we've
> been periodically been reprocessing the data sources, which causes each
> time series to be overwritten, i.e. every row per partition key is deleted
> and re-written, so I assume i've been collecting a bunch of tombstones.
>
> Also, the presence of the ever present and never completing compaction
> types, i assumed were an artifact of tombstoning, but i fully admit to
> conjecture based on about ~20 blog posts and stackoverflow questions i've
> surveyed.
>
> I doubled the Heap on one node and it changed nothing regarding the
> load or the ParNew log statements. New Generation Usage is 50%, Eden 
> itself
> is 56%.
>
> Anything else i should look at and report, let me know.
>
> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
> jlacefi...@datastax.com> wrote:
>>
>> Hello,
>>
>>   What version of Cassandra are you running?
>>
>>   If it's 2.0, we recently experienced something similar with 8447
>> [1], which 8485 [2] should hopefully resolve.
>>
>>   Please note that 8447 is not related to tombstones.  Tombstone
>> processing can put a lot of pressure on the heap as well. Why do you 
>> think
>> you have a lot of tombstones in that one particular table?
>>
>>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>>
>> Jonathan
>>
>> Jonathan Lacefield
>> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>>
>> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen 
>> wrote:
>>>
>>> I have a three node cluster that has been sitting at a load of 4
>>> (for each node), 100% CPI utilization (although 92% nice) for that last 
>>> 12
>>> hours, ever since some significant writes finished. I'm trying to 
>>> determine
>>> what tuning I should be doing to get it out of this state. The debug 
>>> log is
>>> just an endless series of:
>>>
>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java
>>> (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; 
>>> max
>>> is 8000634880
>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java
>>> (lin

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Also, based on the replayed batches... are you using batches to load data?

On Tue, Dec 16, 2014 at 3:12 PM, Ryan Svihla  wrote:
>
> So heap of that size without some tuning will create a number of problems
> (high cpu usage one of them), I suggest either 8GB heap and 400mb parnew
> (which I'd only set that low for that low cpu count) , or attempt the
> tunings as indicated in
> https://issues.apache.org/jira/browse/CASSANDRA-8150
>
> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen  wrote:
>>
>> Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
>> Checked my dev cluster to see if the ParNew log entries are just par for
>> the course, but not seeing them there. However, both have the following
>> every 30 seconds:
>>
>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
>> (line 165) Started replayAllFailedBatches
>> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
>> ColumnFamilyStore.java (line 866) forceFlush requested but everything is
>> clean in batchlog
>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
>> (line 200) Finished replayAllFailedBatches
>>
>> Is that just routine scheduled house-keeping or a sign of something else?
>>
>> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen 
>> wrote:
>>>
>>> Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
>>> The others are 6GB
>>>
>>> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen 
>>> wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
 might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization has
 dropped to 46%nice now, but the ParNew log messages still continue at the
 same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
 nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla 
 wrote:
>
> What's CPU, RAM, Storage layer, and data density per node? Exact heap
> settings would be nice. In the logs look for TombstoneOverflowingException
>
>
> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen 
> wrote:
>>
>> I'm running 2.0.10.
>>
>> The data is all time series data and as we change our pipeline, we've
>> been periodically been reprocessing the data sources, which causes each
>> time series to be overwritten, i.e. every row per partition key is 
>> deleted
>> and re-written, so I assume i've been collecting a bunch of tombstones.
>>
>> Also, the presence of the ever present and never completing
>> compaction types, i assumed were an artifact of tombstoning, but i fully
>> admit to conjecture based on about ~20 blog posts and stackoverflow
>> questions i've surveyed.
>>
>> I doubled the Heap on one node and it changed nothing regarding the
>> load or the ParNew log statements. New Generation Usage is 50%, Eden 
>> itself
>> is 56%.
>>
>> Anything else i should look at and report, let me know.
>>
>> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
>> jlacefi...@datastax.com> wrote:
>>>
>>> Hello,
>>>
>>>   What version of Cassandra are you running?
>>>
>>>   If it's 2.0, we recently experienced something similar with 8447
>>> [1], which 8485 [2] should hopefully resolve.
>>>
>>>   Please note that 8447 is not related to tombstones.  Tombstone
>>> processing can put a lot of pressure on the heap as well. Why do you 
>>> think
>>> you have a lot of tombstones in that one particular table?
>>>
>>>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>>>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>>>
>>> Jonathan
>>>
>>> Jonathan Lacefield
>>> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>>>
>>> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen 
>>> wrote:

 I have a three node cluster that has been sitting at a load of 4
 (for each node), 100% CPI utilization (although 92% nice) for that 
 last 12
 hours, ever since some significant writes finished. I'm trying to 
 determine
 what tuning I should be doing to get it out of this state. The debug 
 log is
 just an endless series of:


Best Time Series insert strategy

2014-12-16 Thread Arne Claassen
I have a time series table consisting of frame information for media. The
table is partitioned on the media ID and uses time and some other frame-level
keys as clustering keys, i.e. all frames for one piece of media are
really one column family "row", even though it is represented in CQL as an
ordered series of frame data. The size of these sets varies from 5k to 200k
"rows" per media, and they are always inserted at one time and available in
memory in ordered form. I'm currently fanning the inserts out via async
calls, using a queue to cap the max parallelism (set to 100 right now).

For some of the larger sets (50k and above) I sometimes get the following
exception:

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
timeout during write query at consistency ONE (1 replica were required but
only 0 acknowledged the write)
at
com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at com.datastax.driver.core.Responses$Error.asException(Responses.java:93)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:237)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:402)
~[com.datastax.cassandra.cassandra-driver-core-2.1.1.jar:na]


I've tried reducing the max parallelism and increasing the timeout
threshold, but once the cluster gets humming from a bunch of inserts, even
going as low as 10 in parallel doesn't seem to completely avoid those
exceptions.

I realize that fanning out just means that previously ordered data is now
arriving at random nodes in random order and has to get to the nodes owning
the partition key and be re-ordered as it arrives, which seems less than
ideal. However, the parallelism approach does increase
insert speed almost linearly, except for those timeouts.

I'm wondering what the best approach would be. The scenarios I can think of
are:

1) Retry and back off on Timeout Exceptions, but keep the fan out approach.

Seems like a good approach unless the timeout really is just a warning that
I'm overloading things (a rough sketch of this follows after option 4 below).

2) Switch to BATCH inserts

Would this be better, since the data would go to only a single node and be
inserted in ordered form? And would this even alleviate the timeouts, since now
giant batches need to be acknowledged by the replicas?

3) Go to consistency ANY.

The docs seem to imply that TimeoutException isn't really a failure, just a
heads-up. I don't really care about waiting for all replicas to be up to
date on these inserts anyhow, but is it really safe, or am I looking at
replicas drifting out of sync?

4) Figure out how to tune my cluster better and change nothing on the client
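
For reference, here's roughly what I mean by option 1. This is a simplified
sketch rather than my actual code: the keyspace, table, and column names are
made up, and a Semaphore stands in for the queue I mentioned (DataStax Java
driver 2.1 async API):

import com.datastax.driver.core.*;
import com.datastax.driver.core.exceptions.WriteTimeoutException;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import java.util.Date;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ThrottledFrameWriter {
    static final int MAX_IN_FLIGHT = 100;          // the "max parallelism" knob
    static final int MAX_RETRIES = 3;
    static final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);
    // Retries/callbacks run here rather than on the driver's I/O threads.
    static final ExecutorService callbackPool = Executors.newFixedThreadPool(2);
    static Session session;
    static PreparedStatement insert;

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        session = cluster.connect();
        session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
        session.execute("CREATE TABLE IF NOT EXISTS demo.frames ("
                + "media_id uuid, frame_time timestamp, data text, "
                + "PRIMARY KEY (media_id, frame_time))");
        insert = session.prepare(
                "INSERT INTO demo.frames (media_id, frame_time, data) VALUES (?, ?, ?)");

        UUID mediaId = UUID.randomUUID();
        for (int i = 0; i < 50000; i++) {           // one ordered set of frames
            inFlight.acquire();                     // blocks once 100 writes are pending
            submit(insert.bind(mediaId, new Date(i), "frame " + i), 0);
        }
        inFlight.acquire(MAX_IN_FLIGHT);            // drain all outstanding writes
        callbackPool.shutdown();
        cluster.close();
    }

    static void submit(final Statement stmt, final int attempt) {
        Futures.addCallback(session.executeAsync(stmt), new FutureCallback<ResultSet>() {
            public void onSuccess(ResultSet rs) { inFlight.release(); }
            public void onFailure(Throwable t) {
                if (t instanceof WriteTimeoutException && attempt < MAX_RETRIES) {
                    try { Thread.sleep(100L << attempt); }   // crude exponential back-off
                    catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    submit(stmt, attempt + 1);               // retry, keep the permit
                } else {
                    inFlight.release();                      // give up after retries (or non-timeout error)
                }
            }
        }, callbackPool);
    }
}

For option 2, the driver's BatchStatement (UNLOGGED) could replace the individual
statements, but from what I've read that only helps if each batch stays within a
single partition; batches that span partitions just put more pressure on the
coordinator.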

thanks,
arne


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
The starting configuration I had, which is still running on two of the
nodes, was a 6GB heap and 1024MB parnew, which is close to what you are
suggesting, and those have been pegged at load 4 for over 12 hours with
hardly any read or write traffic. I will set one to 8GB/400MB and see if
its load changes.

On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla  wrote:
>
> So heap of that size without some tuning will create a number of problems
> (high cpu usage one of them), I suggest either 8GB heap and 400mb parnew
> (which I'd only set that low for that low cpu count) , or attempt the
> tunings as indicated in
> https://issues.apache.org/jira/browse/CASSANDRA-8150
>
> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen  wrote:
>>
>> Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
>> Checked my dev cluster to see if the ParNew log entries are just par for
>> the course, but not seeing them there. However, both have the following
>> every 30 seconds:
>>
>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
>> (line 165) Started replayAllFailedBatches
>> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
>> ColumnFamilyStore.java (line 866) forceFlush requested but everything is
>> clean in batchlog
>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
>> (line 200) Finished replayAllFailedBatches
>>
>> Is that just routine scheduled house-keeping or a sign of something else?
>>
>> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen 
>> wrote:
>>>
>>> Sorry, I meant 15GB heap on the one machine that has less nice CPU% now.
>>> The others are 6GB
>>>
>>> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen 
>>> wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
 might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.

 The node on which I set the Heap to 10GB from 6GB the utlilization has
 dropped to 46%nice now, but the ParNew log messages still continue at the
 same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings that
 nice CPU further down.

 No TombstoneOverflowingExceptions.

 On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla 
 wrote:
>
> What's CPU, RAM, Storage layer, and data density per node? Exact heap
> settings would be nice. In the logs look for TombstoneOverflowingException
>
>
> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen 
> wrote:
>>
>> I'm running 2.0.10.
>>
>> The data is all time series data and as we change our pipeline, we've
>> been periodically been reprocessing the data sources, which causes each
>> time series to be overwritten, i.e. every row per partition key is 
>> deleted
>> and re-written, so I assume i've been collecting a bunch of tombstones.
>>
>> Also, the presence of the ever present and never completing
>> compaction types, i assumed were an artifact of tombstoning, but i fully
>> admit to conjecture based on about ~20 blog posts and stackoverflow
>> questions i've surveyed.
>>
>> I doubled the Heap on one node and it changed nothing regarding the
>> load or the ParNew log statements. New Generation Usage is 50%, Eden 
>> itself
>> is 56%.
>>
>> Anything else i should look at and report, let me know.
>>
>> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
>> jlacefi...@datastax.com> wrote:
>>>
>>> Hello,
>>>
>>>   What version of Cassandra are you running?
>>>
>>>   If it's 2.0, we recently experienced something similar with 8447
>>> [1], which 8485 [2] should hopefully resolve.
>>>
>>>   Please note that 8447 is not related to tombstones.  Tombstone
>>> processing can put a lot of pressure on the heap as well. Why do you 
>>> think
>>> you have a lot of tombstones in that one particular table?
>>>
>>>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>>>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>>>
>>> Jonathan
>>>
>>>
>>> Jonathan Lacefield
>>>
>>> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>>>
>>>
>>> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen 
>>> wrote:

 I have a three node cluster that has been sitting at a load of 4
 (for each node), 100% CPI utilization (although 92% nice) for that 
 last 12

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough
to run Cassandra well in, especially if you're going full bore on loads.
However, you may just flat out be CPU bound on your write throughput: how
many TPS and what size writes do you have? Also, what is your widest row?

Final question: what is compaction throughput at?
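
(For reference, compaction throughput is the compaction_throughput_mb_per_sec
setting in cassandra.yaml, 16 MB/s by default, and it can be adjusted on a live
node; a minimal sketch, with 32 MB/s as an arbitrary example value:

    nodetool setcompactionthroughput 32    # MB/s; 0 removes the throttle entirely

If it was never changed at runtime, the current value is whatever cassandra.yaml says.)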


On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen  wrote:
>
> The starting configuration I had, which is still running on two of the
> nodes, was 6GB Heap, 1024MB parnew which is close to what you are
> suggesting and those have been pegged at load 4 for the over 12 hours with
> hardly and read or write traffic. I will set one to 8GB/400MB and see if
> its load changes.
>
> On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla  wrote:
>
>> So heap of that size without some tuning will create a number of problems
>> (high cpu usage one of them), I suggest either 8GB heap and 400mb parnew
>> (which I'd only set that low for that low cpu count) , or attempt the
>> tunings as indicated in
>> https://issues.apache.org/jira/browse/CASSANDRA-8150
>>
>> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen  wrote:
>>>
>>> Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now.
>>> Checked my dev cluster to see if the ParNew log entries are just par for
>>> the course, but not seeing them there. However, both have the following
>>> every 30 seconds:
>>>
>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
>>> (line 165) Started replayAllFailedBatches
>>> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
>>> ColumnFamilyStore.java (line 866) forceFlush requested but everything is
>>> clean in batchlog
>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
>>> (line 200) Finished replayAllFailedBatches
>>>
>>> Is that just routine scheduled house-keeping or a sign of something else?
>>>
>>> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen 
>>> wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU%
 now. The others are 6GB

 On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen 
 wrote:
>
> AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because we
> might go c3.2xlarge instead if CPU is more important than RAM
> Storage is optimized EBS SSD (but iostat shows no real IO going on)
> Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.
>
> The node on which I set the Heap to 10GB from 6GB the utlilization has
> dropped to 46%nice now, but the ParNew log messages still continue at the
> same pace. I'm gonna up the HEAP to 20GB for a bit, see if that brings 
> that
> nice CPU further down.
>
> No TombstoneOverflowingExceptions.
>
> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla 
> wrote:
>>
>> What's CPU, RAM, Storage layer, and data density per node? Exact heap
>> settings would be nice. In the logs look for 
>> TombstoneOverflowingException
>>
>>
>> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen 
>> wrote:
>>>
>>> I'm running 2.0.10.
>>>
>>> The data is all time series data and as we change our pipeline,
>>> we've been periodically been reprocessing the data sources, which causes
>>> each time series to be overwritten, i.e. every row per partition key is
>>> deleted and re-written, so I assume i've been collecting a bunch of
>>> tombstones.
>>>
>>> Also, the presence of the ever present and never completing
>>> compaction types, i assumed were an artifact of tombstoning, but i fully
>>> admit to conjecture based on about ~20 blog posts and stackoverflow
>>> questions i've surveyed.
>>>
>>> I doubled the Heap on one node and it changed nothing regarding the
>>> load or the ParNew log statements. New Generation Usage is 50%, Eden 
>>> itself
>>> is 56%.
>>>
>>> Anything else i should look at and report, let me know.
>>>
>>> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <
>>> jlacefi...@datastax.com> wrote:

 Hello,

   What version of Cassandra are you running?

   If it's 2.0, we recently experienced something similar with 8447
 [1], which 8485 [2] should hopefully resolve.

   Please note that 8447 is not related to tombstones.  Tombstone
 processing can put a lot of pressure on the heap as well. Why do you 
 think
 you have a lot of tombstones in that one particular table?

   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485

 Jonathan


 Jonathan Lacefield

 Solution Architect | (404) 822 3487 | jlacefi...@datastax.com


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Actually, I'm not sure why the machine was originally configured with a 6GB heap,
since we even started it on an r3.large with 15GB.

Re: Batches

Not using batches. I actually have that as a separate question on the list.
Currently I fan out async single inserts and I'm wondering if batches are
better since my data is inherently inserted in blocks of ordered rows for a
single partition key.


Re: Traffic

There isn't all that much traffic. Inserts come in as blocks per partition
key, but a block can be 5k-200k rows for that partition key. Each of these
rows is less than 100k: it's small, lots of ordered rows. It's frame and
sub-frame information for media, and the rows for one piece of media are
inserted at once (one piece of media = one partition key).

For the last 12 hours, during which the load on all these machines has been stuck,
there's been virtually no traffic at all. This is the nodes basically
sitting idle, except that they had a load of 4 each.

BTW, how do you determine the widest row, or for that matter the number of
tombstones in a row?

thanks,
arne

On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla  wrote:
>
> So 1024 is still a good 2.5 times what I'm suggesting, 6GB is hardly
> enough to run Cassandra well in, especially if you're going full bore on
> loads. However, you maybe just flat out be CPU bound on your write
> throughput, how many TPS and what size writes do you have? Also what is
> your widest row?
>
> Final question what is compaction throughput at?
>
>
> On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen  wrote:
>>
>> The starting configuration I had, which is still running on two of the
>> nodes, was 6GB Heap, 1024MB parnew which is close to what you are
>> suggesting and those have been pegged at load 4 for the over 12 hours with
>> hardly and read or write traffic. I will set one to 8GB/400MB and see if
>> its load changes.
>>
>> On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla 
>> wrote:
>>
>>> So heap of that size without some tuning will create a number of
>>> problems (high cpu usage one of them), I suggest either 8GB heap and 400mb
>>> parnew (which I'd only set that low for that low cpu count) , or attempt
>>> the tunings as indicated in
>>> https://issues.apache.org/jira/browse/CASSANDRA-8150
>>>
>>> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen 
>>> wrote:

 Changed the 15GB node to 25GB heap and the nice CPU is down to ~20%
 now. Checked my dev cluster to see if the ParNew log entries are just par
 for the course, but not seeing them there. However, both have the following
 every 30 seconds:

 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
 (line 165) Started replayAllFailedBatches
 DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
 ColumnFamilyStore.java (line 866) forceFlush requested but everything is
 clean in batchlog
 DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
 (line 200) Finished replayAllFailedBatches

 Is that just routine scheduled house-keeping or a sign of something
 else?

 On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen 
 wrote:
>
> Sorry, I meant 15GB heap on the one machine that has less nice CPU%
> now. The others are 6GB
>
> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen 
> wrote:
>>
>> AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because
>> we might go c3.2xlarge instead if CPU is more important than RAM
>> Storage is optimized EBS SSD (but iostat shows no real IO going on)
>> Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.
>>
>> The node on which I set the Heap to 10GB from 6GB the utlilization
>> has dropped to 46%nice now, but the ParNew log messages still continue at
>> the same pace. I'm gonna up the HEAP to 20GB for a bit, see if that 
>> brings
>> that nice CPU further down.
>>
>> No TombstoneOverflowingExceptions.
>>
>> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla 
>> wrote:
>>>
>>> What's CPU, RAM, Storage layer, and data density per node? Exact
>>> heap settings would be nice. In the logs look for
>>> TombstoneOverflowingException
>>>
>>>
>>> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen 
>>> wrote:

 I'm running 2.0.10.

 The data is all time series data and as we change our pipeline,
 we've been periodically been reprocessing the data sources, which 
 causes
 each time series to be overwritten, i.e. every row per partition key is
 deleted and re-written, so I assume i've been collecting a bunch of
 tombstones.

 Also, the presence of the ever present and never completing
 compaction types, i assumed were an artifact of tombstoning, but i 
 fully
 admit to conjecture based on about ~20 blog posts and stackoverflow
 questions i've surveyed.

 I doubled the Heap on one node and it changed nothin

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Can you define what "virtually no traffic" is? Sorry to be repetitive about
that, but I've worked on a lot of clusters in the past year and people have
wildly different ideas of what that means.

Unlogged batches on the same partition key are definitely a performance
optimization. Typically async is much faster and easier on the cluster when
you're using multi-partition-key batches.

nodetool cfhistograms <keyspace> <table>
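
For example, assuming the keyspace is media and the table is media_tracks_raw
as mentioned later in this thread:

    nodetool cfhistograms media media_tracks_raw   # row/partition size and cell count histograms
    nodetool cfstats media                         # per-table stats, including max compacted row size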

On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen  wrote:
>
> Actually not sure why the machine was originally configured at 6GB since
> we even started it on an r3.large with 15GB.
>
> Re: Batches
>
> Not using batches. I actually have that as a separate question on the
> list. Currently I fan out async single inserts and I'm wondering if batches
> are better since my data is inherently inserted in blocks of ordered rows
> for a single partition key.
>
>
> Re: Traffic
>
> There isn't all that much traffic. Inserts come in as blocks per partition
> key, but then can be 5k-200k rows for that partition key. Each of these
> rows is less than 100k. It's small, lots of ordered rows. It's frame and
> sub-frame information for media. and rows for one piece of media is
> inserted at once (the partition key).
>
> For the last 12 hours, where the load on all these machine has been stuck
> there's been virtually no traffic at all. This is the nodes basically
> sitting idle, except that they had  load of 4 each.
>
> BTW, how do you determine widest row or for that matter number of
> tombstones in a row?
>
> thanks,
> arne
>
> On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla  wrote:
>>
>> So 1024 is still a good 2.5 times what I'm suggesting, 6GB is hardly
>> enough to run Cassandra well in, especially if you're going full bore on
>> loads. However, you maybe just flat out be CPU bound on your write
>> throughput, how many TPS and what size writes do you have? Also what is
>> your widest row?
>>
>> Final question what is compaction throughput at?
>>
>>
>> On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen  wrote:
>>>
>>> The starting configuration I had, which is still running on two of the
>>> nodes, was 6GB Heap, 1024MB parnew which is close to what you are
>>> suggesting and those have been pegged at load 4 for the over 12 hours with
>>> hardly and read or write traffic. I will set one to 8GB/400MB and see if
>>> its load changes.
>>>
>>> On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla 
>>> wrote:
>>>
 So heap of that size without some tuning will create a number of
 problems (high cpu usage one of them), I suggest either 8GB heap and 400mb
 parnew (which I'd only set that low for that low cpu count) , or attempt
 the tunings as indicated in
 https://issues.apache.org/jira/browse/CASSANDRA-8150

 On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen 
 wrote:
>
> Changed the 15GB node to 25GB heap and the nice CPU is down to ~20%
> now. Checked my dev cluster to see if the ParNew log entries are just par
> for the course, but not seeing them there. However, both have the 
> following
> every 30 seconds:
>
> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
> (line 165) Started replayAllFailedBatches
> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
> ColumnFamilyStore.java (line 866) forceFlush requested but everything is
> clean in batchlog
> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
> (line 200) Finished replayAllFailedBatches
>
> Is that just routine scheduled house-keeping or a sign of something
> else?
>
> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen 
> wrote:
>>
>> Sorry, I meant 15GB heap on the one machine that has less nice CPU%
>> now. The others are 6GB
>>
>> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen 
>> wrote:
>>>
>>> AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because
>>> we might go c3.2xlarge instead if CPU is more important than RAM
>>> Storage is optimized EBS SSD (but iostat shows no real IO going on)
>>> Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.
>>>
>>> The node on which I set the Heap to 10GB from 6GB the utlilization
>>> has dropped to 46%nice now, but the ParNew log messages still continue 
>>> at
>>> the same pace. I'm gonna up the HEAP to 20GB for a bit, see if that 
>>> brings
>>> that nice CPU further down.
>>>
>>> No TombstoneOverflowingExceptions.
>>>
>>> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla 
>>> wrote:

 What's CPU, RAM, Storage layer, and data density per node? Exact
 heap settings would be nice. In the logs look for
 TombstoneOverflowingException


 On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen 
 wrote:
>
> I'm running 2.0.10.
>
> The data is all time series data and as we change our pipeline,
>

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
No problem with the follow-up questions. I'm on a crash course here trying
to understand what makes C* tick, so I appreciate all feedback.

We reprocessed all media (1200 partition keys) last night, where partition
keys had somewhere between 4k and 200k "rows". After that completed, no
traffic went to the cluster at all for ~8 hours, and throughout today we may
get a couple (less than 10) queries per second and maybe 3-4 write batches
per hour.

I assume the last value in the Partition Size histogram is the largest row:

20924300 bytes: 79
25109160 bytes: 57

The majority seems clustered around 20 bytes.

I will look at switching my inserts to unlogged batches since they are
always for one partition key.

On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla  wrote:
>
> Can you define what is "virtual no traffic" sorry to be repetitive about
> that, but I've worked on a lot of clusters in the past year and people have
> wildly different ideas what that means.
>
> unlogged batches of the same partition key are definitely a performance
> optimization. Typically async is much faster and easier on the cluster when
> you're using multip partition key batches.
>
> nodetool cfhistograms  
>
> On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen  wrote:
>>
>> Actually not sure why the machine was originally configured at 6GB since
>> we even started it on an r3.large with 15GB.
>>
>> Re: Batches
>>
>> Not using batches. I actually have that as a separate question on the
>> list. Currently I fan out async single inserts and I'm wondering if batches
>> are better since my data is inherently inserted in blocks of ordered rows
>> for a single partition key.
>>
>>
>> Re: Traffic
>>
>> There isn't all that much traffic. Inserts come in as blocks per
>> partition key, but then can be 5k-200k rows for that partition key. Each of
>> these rows is less than 100k. It's small, lots of ordered rows. It's frame
>> and sub-frame information for media. and rows for one piece of media is
>> inserted at once (the partition key).
>>
>> For the last 12 hours, where the load on all these machine has been stuck
>> there's been virtually no traffic at all. This is the nodes basically
>> sitting idle, except that they had  load of 4 each.
>>
>> BTW, how do you determine widest row or for that matter number of
>> tombstones in a row?
>>
>> thanks,
>> arne
>>
>> On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla 
>> wrote:
>>>
>>> So 1024 is still a good 2.5 times what I'm suggesting, 6GB is hardly
>>> enough to run Cassandra well in, especially if you're going full bore on
>>> loads. However, you maybe just flat out be CPU bound on your write
>>> throughput, how many TPS and what size writes do you have? Also what is
>>> your widest row?
>>>
>>> Final question what is compaction throughput at?
>>>
>>>
>>> On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen 
>>> wrote:

 The starting configuration I had, which is still running on two of the
 nodes, was 6GB Heap, 1024MB parnew which is close to what you are
 suggesting and those have been pegged at load 4 for the over 12 hours with
 hardly and read or write traffic. I will set one to 8GB/400MB and see if
 its load changes.

 On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla 
 wrote:

> So heap of that size without some tuning will create a number of
> problems (high cpu usage one of them), I suggest either 8GB heap and 400mb
> parnew (which I'd only set that low for that low cpu count) , or attempt
> the tunings as indicated in
> https://issues.apache.org/jira/browse/CASSANDRA-8150
>
> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen 
> wrote:
>>
>> Changed the 15GB node to 25GB heap and the nice CPU is down to ~20%
>> now. Checked my dev cluster to see if the ParNew log entries are just par
>> for the course, but not seeing them there. However, both have the 
>> following
>> every 30 seconds:
>>
>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
>> (line 165) Started replayAllFailedBatches
>> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
>> ColumnFamilyStore.java (line 866) forceFlush requested but everything is
>> clean in batchlog
>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
>> (line 200) Finished replayAllFailedBatches
>>
>> Is that just routine scheduled house-keeping or a sign of something
>> else?
>>
>> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen 
>> wrote:
>>>
>>> Sorry, I meant 15GB heap on the one machine that has less nice CPU%
>>> now. The others are 6GB
>>>
>>> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen 
>>> wrote:

 AWS r3.xlarge, 30GB, but only using a Heap of 10GB, new 2GB because
 we might go c3.2xlarge instead if CPU is more important than RAM
 Storage is optimized EBS SSD (but iostat shows no real IO going on)
 Each node only has 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
OK, based on those numbers I have a theory...

Can you show me nodetool tpstats for all 3 nodes?

On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen  wrote:
>
> No problem with the follow up questions. I'm on a crash course here trying
> to understand what makes C* tick so I appreciate all feedback.
>
> We reprocessed all media (1200 partition keys) last night where partition
> keys had somewhere between 4k and 200k "rows". After that completed, no
> traffic went to cluster at all for ~8 hours and throughout today, we may
> get a couple (less than 10) queries per second and maybe 3-4 write batches
> per hour.
>
> I assume the last value in the Partition Size histogram is the largest row:
>
> 20924300 bytes: 79
> 25109160 bytes: 57
>
> The majority seems clustered around 20 bytes.
>
> I will look at switching my inserts to unlogged batches since they are
> always for one partition key.
>
> On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla  wrote:
>>
>> Can you define what is "virtual no traffic" sorry to be repetitive about
>> that, but I've worked on a lot of clusters in the past year and people have
>> wildly different ideas what that means.
>>
>> unlogged batches of the same partition key are definitely a performance
>> optimization. Typically async is much faster and easier on the cluster when
>> you're using multip partition key batches.
>>
>> nodetool cfhistograms  
>>
>> On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen  wrote:
>>>
>>> Actually not sure why the machine was originally configured at 6GB since
>>> we even started it on an r3.large with 15GB.
>>>
>>> Re: Batches
>>>
>>> Not using batches. I actually have that as a separate question on the
>>> list. Currently I fan out async single inserts and I'm wondering if batches
>>> are better since my data is inherently inserted in blocks of ordered rows
>>> for a single partition key.
>>>
>>>
>>> Re: Traffic
>>>
>>> There isn't all that much traffic. Inserts come in as blocks per
>>> partition key, but then can be 5k-200k rows for that partition key. Each of
>>> these rows is less than 100k. It's small, lots of ordered rows. It's frame
>>> and sub-frame information for media. and rows for one piece of media is
>>> inserted at once (the partition key).
>>>
>>> For the last 12 hours, where the load on all these machine has been
>>> stuck there's been virtually no traffic at all. This is the nodes basically
>>> sitting idle, except that they had  load of 4 each.
>>>
>>> BTW, how do you determine widest row or for that matter number of
>>> tombstones in a row?
>>>
>>> thanks,
>>> arne
>>>
>>> On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla 
>>> wrote:

 So 1024 is still a good 2.5 times what I'm suggesting, 6GB is hardly
 enough to run Cassandra well in, especially if you're going full bore on
 loads. However, you maybe just flat out be CPU bound on your write
 throughput, how many TPS and what size writes do you have? Also what is
 your widest row?

 Final question what is compaction throughput at?


 On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen 
 wrote:
>
> The starting configuration I had, which is still running on two of the
> nodes, was 6GB Heap, 1024MB parnew which is close to what you are
> suggesting and those have been pegged at load 4 for the over 12 hours with
> hardly and read or write traffic. I will set one to 8GB/400MB and see if
> its load changes.
>
> On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla 
> wrote:
>
>> So heap of that size without some tuning will create a number of
>> problems (high cpu usage one of them), I suggest either 8GB heap and 
>> 400mb
>> parnew (which I'd only set that low for that low cpu count) , or attempt
>> the tunings as indicated in
>> https://issues.apache.org/jira/browse/CASSANDRA-8150
>>
>> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen 
>> wrote:
>>>
>>> Changed the 15GB node to 25GB heap and the nice CPU is down to ~20%
>>> now. Checked my dev cluster to see if the ParNew log entries are just 
>>> par
>>> for the course, but not seeing them there. However, both have the 
>>> following
>>> every 30 seconds:
>>>
>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java
>>> (line 165) Started replayAllFailedBatches
>>> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899
>>> ColumnFamilyStore.java (line 866) forceFlush requested but everything is
>>> clean in batchlog
>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java
>>> (line 200) Finished replayAllFailedBatches
>>>
>>> Is that just routine scheduled house-keeping or a sign of something
>>> else?
>>>
>>> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen 
>>> wrote:

 Sorry, I meant 15GB heap on the one machine that has less nice CPU%
 now. The others are 6GB

 On Tue, Dec 16, 2014 at 12:

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Of course QA decided to start a test batch (still relatively low traffic),
so I hope it doesn't throw the tpstats off too much

Node 1:
Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage 0 0   13804928 0
0
ReadStage 0 0  10975 0
0
RequestResponseStage  0 07725378 0
0
ReadRepairStage   0 0   1247 0
0
ReplicateOnWriteStage 0 0  0 0
0
MiscStage 0 0  0 0
0
HintedHandoff 1 1 50 0
0
FlushWriter   0 0306 0
   31
MemoryMeter   0 0719 0
0
GossipStage   0 0 286505 0
0
CacheCleanupExecutor  0 0  0 0
0
InternalResponseStage 0 0  0 0
0
CompactionExecutor414159 0
0
ValidationExecutor0 0  0 0
0
MigrationStage0 0  0 0
0
commitlog_archiver0 0  0 0
0
AntiEntropyStage  0 0  0 0
0
PendingRangeCalculator0 0 11 0
0
MemtablePostFlusher   0 0   1781 0
0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION391041
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0

Node 2:
Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage 0 0 997042 0
0
ReadStage 0 0   2623 0
0
RequestResponseStage  0 0 706650 0
0
ReadRepairStage   0 0275 0
0
ReplicateOnWriteStage 0 0  0 0
0
MiscStage 0 0  0 0
0
HintedHandoff 2 2 12 0
0
FlushWriter   0 0 37 0
4
MemoryMeter   0 0 70 0
0
GossipStage   0 0  14927 0
0
CacheCleanupExecutor  0 0  0 0
0
InternalResponseStage 0 0  0 0
0
CompactionExecutor4 7 94 0
0
ValidationExecutor0 0  0 0
0
MigrationStage0 0  0 0
0
commitlog_archiver0 0  0 0
0
AntiEntropyStage  0 0  0 0
0
PendingRangeCalculator0 0  3 0
0
MemtablePostFlusher   0 0114 0
0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION 0
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0

Node 3:
Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage 0 01539324 0
0
ReadStage 0 0   2571 0
0
RequestResponseStage  0 0 373300 0
0
ReadRepairStage   0 0325 0
0
ReplicateOnWriteStage 0 0  0 0
0
MiscStage 0 0  0 0
0
HintedHandoff 1 1 21 0
0
FlushWriter   0 0 38 0
5
MemoryMeter   0 0 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So you've got some blocked flush writers, but you have an incredibly large
number of dropped mutations. Are you using secondary indexes, and if so, how
many? What is your flush queue set to?
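
(The flush queue is the memtable_flush_queue_size setting in cassandra.yaml; a
sketch of the relevant knobs, with the queue size shown at its stock 2.0 default:

    # cassandra.yaml
    memtable_flush_queue_size: 4     # default; a full queue shows up as blocked FlushWriter tasks
    # memtable_flush_writers: 1      # consider raising on fast storage
)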

On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen  wrote:
>
> Of course QA decided to start a test batch (still relatively low traffic),
> so I hope it doesn't throw the tpstats off too much
>
> Node 1:
> Pool NameActive   Pending  Completed   Blocked
>  All time blocked
> MutationStage 0 0   13804928 0
> 0
> ReadStage 0 0  10975 0
> 0
> RequestResponseStage  0 07725378 0
> 0
> ReadRepairStage   0 0   1247 0
> 0
> ReplicateOnWriteStage 0 0  0 0
> 0
> MiscStage 0 0  0 0
> 0
> HintedHandoff 1 1 50 0
> 0
> FlushWriter   0 0306 0
>31
> MemoryMeter   0 0719 0
> 0
> GossipStage   0 0 286505 0
> 0
> CacheCleanupExecutor  0 0  0 0
> 0
> InternalResponseStage 0 0  0 0
> 0
> CompactionExecutor414159 0
> 0
> ValidationExecutor0 0  0 0
> 0
> MigrationStage0 0  0 0
> 0
> commitlog_archiver0 0  0 0
> 0
> AntiEntropyStage  0 0  0 0
> 0
> PendingRangeCalculator0 0 11 0
> 0
> MemtablePostFlusher   0 0   1781 0
> 0
>
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION391041
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
>
> Node 2:
> Pool NameActive   Pending  Completed   Blocked
>  All time blocked
> MutationStage 0 0 997042 0
> 0
> ReadStage 0 0   2623 0
> 0
> RequestResponseStage  0 0 706650 0
> 0
> ReadRepairStage   0 0275 0
> 0
> ReplicateOnWriteStage 0 0  0 0
> 0
> MiscStage 0 0  0 0
> 0
> HintedHandoff 2 2 12 0
> 0
> FlushWriter   0 0 37 0
> 4
> MemoryMeter   0 0 70 0
> 0
> GossipStage   0 0  14927 0
> 0
> CacheCleanupExecutor  0 0  0 0
> 0
> InternalResponseStage 0 0  0 0
> 0
> CompactionExecutor4 7 94 0
> 0
> ValidationExecutor0 0  0 0
> 0
> MigrationStage0 0  0 0
> 0
> commitlog_archiver0 0  0 0
> 0
> AntiEntropyStage  0 0  0 0
> 0
> PendingRangeCalculator0 0  3 0
> 0
> MemtablePostFlusher   0 0114 0
> 0
>
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION 0
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
>
> Node 3:
> Pool NameActive   Pending  Completed   Blocked
>  All time blocked
> MutationStage 0 01539324 0
> 0
> ReadStage 0 0   2571 0
> 0
> RequestResponseStage  0 0 373300 0
> 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Not using any secondary indices, and memtable_flush_queue_size is the default 4.

But let me tell you how data is "mutated" right now; maybe that will give you
some insight into how this is happening.

Basically the frame data table has the following primary key: PRIMARY KEY 
((id), trackid, "timestamp")

Generally data is inserted once, so day-to-day writes are all new rows.
However, when our process for generating analytics for these rows changes, we
run the media back through again, causing overwrites.

Up until last night, this was just a new insert because the PK never changed, so
it was always a 1-to-1 overwrite of every row.

Last night was the first time that a change went in where the PK could
actually change, so now the process is always: DELETE by partition key, insert
all rows for the partition key, repeat.
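
A sketch of what that flow looks like in CQL, with the re-insert grouped into an
unlogged batch (as suggested earlier in the thread) since every statement targets
the same partition; non-key frame columns are omitted and values would be bound
through the driver:

    DELETE FROM media_tracks_raw WHERE id = ?;

    BEGIN UNLOGGED BATCH
      INSERT INTO media_tracks_raw (id, trackid, "timestamp") VALUES (?, ?, ?);
      INSERT INTO media_tracks_raw (id, trackid, "timestamp") VALUES (?, ?, ?);
      -- ... one insert per row of the partition, presumably chunked into several
      -- such batches rather than one giant one for the 200k-row partitions
    APPLY BATCH;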

We have two tables that have similar frame data projections, and some other
aggregates with a much smaller row count per partition key.

hope that helps,
arne

On Dec 16, 2014, at 2:46 PM, Ryan Svihla  wrote:

> so you've got some blocked flush writers but you have a incredibly large 
> number of dropped mutations, are you using secondary indexes? and if so how 
> many? what is your flush queue set to?
> 
> On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen  wrote:
> Of course QA decided to start a test batch (still relatively low traffic), so 
> I hope it doesn't throw the tpstats off too much
> 
> Node 1:
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   13804928 0
>  0
> ReadStage 0 0  10975 0
>  0
> RequestResponseStage  0 07725378 0
>  0
> ReadRepairStage   0 0   1247 0
>  0
> ReplicateOnWriteStage 0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 1 1 50 0
>  0
> FlushWriter   0 0306 0
> 31
> MemoryMeter   0 0719 0
>  0
> GossipStage   0 0 286505 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CompactionExecutor414159 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0  0 0
>  0
> commitlog_archiver0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0 11 0
>  0
> MemtablePostFlusher   0 0   1781 0
>  0
> 
> Message type   Dropped
> READ 0
> RANGE_SLICE  0
> _TRACE   0
> MUTATION391041
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE 0
> PAGED_RANGE  0
> READ_REPAIR  0
> 
> Node 2:
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0 997042 0
>  0
> ReadStage 0 0   2623 0
>  0
> RequestResponseStage  0 0 706650 0
>  0
> ReadRepairStage   0 0275 0
>  0
> ReplicateOnWriteStage 0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 2 2 12 0
>  0
> FlushWriter   0 0 37 0
>  4
> MemoryMeter   0 0 70 0
>  0
> GossipStage   0 0  14927 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CompactionExecutor4 7 94 0
>  0
> ValidationExecutor 

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So a delete is really another write for gc_grace_seconds (default 10 days),
and if you get enough tombstones it can make managing your cluster a challenge
as is. Open up cqlsh, turn on tracing and try a few queries: how many
tombstones are scanned for a given query? It's possible the heap problems
you're seeing are actually happening on the query side and not on the
ingest side. The severity of this depends on driver and Cassandra version,
but older drivers and versions of Cassandra could easily overload the heap with
expensive selects; when layered over tombstones, it certainly becomes a
possibility that this is your root cause.
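
A minimal sketch of that check in cqlsh, using a query from this thread (the
tombstone counts show up in trace lines along the lines of "Read N live and M
tombstoned cells"):

    TRACING ON;
    SELECT * FROM media.media_tracks_raw
     WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3 LIMIT 100;
    TRACING OFF;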

Now this will primarily create more load on compaction, and depending on
your Cassandra version there may be some other issue at work, but something
I can tell you is that every time I see 1 dropped mutation I see a cluster that
was overloaded enough that it had to shed load. If I see 200k, I see a
cluster/configuration/hardware that is badly overloaded.

I suggest the following

   - trace some of the queries used in prod
   - monitor your ingest rate, see at what levels you run into issues
   (GCInspector log messages, dropped mutations, etc)
   - heap configuration we mentioned earlier: go ahead and monitor heap
   usage; if it hits 75% repeatedly, this is an indication of heavy load
   - monitor dropped mutations: any dropped mutation is evidence of an
   overloaded server; again, the root cause can be many other problems that are
   solvable with current hardware, and LOTS of people run with nodes with
   similar configuration.


On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen  wrote:
>
> Not using any secondary indicies and memtable_flush_queue_size is the
> default 4.
>
> But let me tell you how data is "mutated" right now, maybe that will give
> you an insight on how this is happening
>
> Basically the frame data table has the following primary key: PRIMARY KEY
> ((id), trackid, "timestamp")
>
> Generally data is inserted once. So day to day writes are all new rows.
> However, when out process for generating analytics for these rows changes,
> we run the media back through again, causing overwrites.
>
> Up until last night, this was just a new insert because the PK never
> changed so it was always 1-to-1 overwrite of every row.
>
> Last night was the first time that a new change went in where the PK could
> actually change so now the process is always, DELETE by partition key,
> insert all rows for partition key, repeat.
>
> We two tables that have similar frame data projections and some other
> aggregates with much smaller row count per partition key.
>
> hope that helps,
> arne
>
> On Dec 16, 2014, at 2:46 PM, Ryan Svihla  wrote:
>
> so you've got some blocked flush writers but you have a incredibly large
> number of dropped mutations, are you using secondary indexes? and if so how
> many? what is your flush queue set to?
>
> On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen  wrote:
>>
>> Of course QA decided to start a test batch (still relatively low
>> traffic), so I hope it doesn't throw the tpstats off too much
>>
>> Node 1:
>> Pool NameActive   Pending  Completed   Blocked
>>  All time blocked
>> MutationStage 0 0   13804928 0
>>   0
>> ReadStage 0 0  10975 0
>>   0
>> RequestResponseStage  0 07725378 0
>>   0
>> ReadRepairStage   0 0   1247 0
>>   0
>> ReplicateOnWriteStage 0 0  0 0
>>   0
>> MiscStage 0 0  0 0
>>   0
>> HintedHandoff 1 1 50 0
>>   0
>> FlushWriter   0 0306 0
>>  31
>> MemoryMeter   0 0719 0
>>   0
>> GossipStage   0 0 286505 0
>>   0
>> CacheCleanupExecutor  0 0  0 0
>>   0
>> InternalResponseStage 0 0  0 0
>>   0
>> CompactionExecutor414159 0
>>   0
>> ValidationExecutor0 0  0 0
>>   0
>> MigrationStage0 0  0 0
>>   0
>> commitlog_archiver0 0  0 0
>>   0
>> AntiEntropyStage  0 0  0 0
>>   0
>> PendingRangeCalculator0 0 11 0
>>   0
>> MemtablePostFlusher   0 0   1781 0
>>   0
>>
>> Message type   Dropped
>> READ  

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I just did a wide set of selects and ran across no tombstones. But while on the
subject of gc_grace_seconds, is there any reason, on a small cluster, not to set it
to something low like a single day? It seems like 10 days is only needed for large
clusters undergoing long partition splits, or am I misunderstanding
gc_grace_seconds?
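
If it does turn out to be safe for this cluster, lowering it is just a table
property; a sketch, assuming one day and assuming any repairs always complete
well inside that window:

    ALTER TABLE media.media_tracks_raw WITH gc_grace_seconds = 86400;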

Now, given all that, does any of this explain a high load when the cluster is
idle? Is it compaction catching up, and would manually forced compaction alleviate
that?

thanks,
arne

On Dec 16, 2014, at 3:28 PM, Ryan Svihla  wrote:

> so a delete is really another write for gc_grace_seconds (default 10 days), 
> if you get enough tombstones it can make managing your cluster a challenge as 
> is. open up cqlsh, turn on tracing and try a few queries..how many tombstones 
> are scanned for a given query? It's possible the heap problems you're seeing 
> are actually happening on the query side and not on the ingest side, the 
> severity of this depends on driver and cassandra version, but older drivers 
> and versions of cassandra could easily overload heap with expensive selects, 
> when layered over tombstones it's certainly becomes a possibility this is 
> your root cause.
> 
> Now this will primarily create more load on compaction and depending on your 
> cassandra version there maybe some other issue at work, but something I can 
> tell you is every time I see 1 dropped mutation I see a cluster that was 
> overloaded enough it had to shed load. If I see 200k I see a 
> cluster/configuration/hardware that is badly overloaded.
> 
> I suggest the following
> trace some of the queries used in prod
> monitor your ingest rate, see at what levels you run into issues (GCInspector 
> log messages, dropped mutations, etc)
> heap configuration we mentioned earlier..go ahead and monitor heap usage, if 
> it hits 75% repeated this is an indication of heavy load
> monitor dropped mutations..any dropped mutation is evidence of an overloaded 
> server, again the root cause can be many other problems that are solvable 
> with current hardware, and LOTS of people runs with nodes with similar 
> configuration.
> 
> On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen  wrote:
> Not using any secondary indicies and memtable_flush_queue_size is the default 
> 4.
> 
> But let me tell you how data is "mutated" right now, maybe that will give you 
> an insight on how this is happening
> 
> Basically the frame data table has the following primary key: PRIMARY KEY 
> ((id), trackid, "timestamp")
> 
> Generally data is inserted once. So day to day writes are all new rows.
> However, when out process for generating analytics for these rows changes, we 
> run the media back through again, causing overwrites.
> 
> Up until last night, this was just a new insert because the PK never changed 
> so it was always 1-to-1 overwrite of every row.
> 
> Last night was the first time that a new change went in where the PK could 
> actually change so now the process is always, DELETE by partition key, insert 
> all rows for partition key, repeat.
> 
> We two tables that have similar frame data projections and some other 
> aggregates with much smaller row count per partition key.
> 
> hope that helps,
> arne
> 
> On Dec 16, 2014, at 2:46 PM, Ryan Svihla  wrote:
> 
>> so you've got some blocked flush writers but you have a incredibly large 
>> number of dropped mutations, are you using secondary indexes? and if so how 
>> many? what is your flush queue set to?
>> 
>> On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen  wrote:
>> Of course QA decided to start a test batch (still relatively low traffic), 
>> so I hope it doesn't throw the tpstats off too much
>> 
>> Node 1:
>> Pool NameActive   Pending  Completed   Blocked  All 
>> time blocked
>> MutationStage 0 0   13804928 0   
>>   0
>> ReadStage 0 0  10975 0   
>>   0
>> RequestResponseStage  0 07725378 0   
>>   0
>> ReadRepairStage   0 0   1247 0   
>>   0
>> ReplicateOnWriteStage 0 0  0 0   
>>   0
>> MiscStage 0 0  0 0   
>>   0
>> HintedHandoff 1 1 50 0   
>>   0
>> FlushWriter   0 0306 0   
>>  31
>> MemoryMeter   0 0719 0   
>>   0
>> GossipStage   0 0 286505 0   
>>   0
>> CacheCleanupExecutor  0 0  0 0   
>>   0
>> InternalResponseStage 0 0  0 0   
>>   0
>> CompactionExecutor4

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Manual forced compactions create more problems than they solve. If you have
no evidence of tombstones in your selects (which seems odd; can you share
some of the tracing output?), then I'm not sure what it would solve for you.

Compaction running could explain a high load. Log messages with ERROR,
WARN, or GCInspector are all meaningful there; I suggest searching JIRA for your
version to see if there are any interesting bugs.



On Tue, Dec 16, 2014 at 6:14 PM, Arne Claassen  wrote:
>
> I just did a wide set of selects and ran across no tombstones. But while
> on the subject of gc_grace_seconds, any reason, on a small cluster not to
> set it to something low like a single day. It seems like 10 days is only
> need to large clusters undergoing long partition splits, or am i
> misunderstanding gc_grace_seconds.
>
> Now, given all that, does any of this explain a high load when the cluster
> is idle? Is it compaction catching up and would manual forced compaction
> alleviate that?
>
> thanks,
> arne
>
>
> On Dec 16, 2014, at 3:28 PM, Ryan Svihla  wrote:
>
> so a delete is really another write for gc_grace_seconds (default 10
> days), if you get enough tombstones it can make managing your cluster a
> challenge as is. open up cqlsh, turn on tracing and try a few queries..how
> many tombstones are scanned for a given query? It's possible the heap
> problems you're seeing are actually happening on the query side and not on
> the ingest side, the severity of this depends on driver and cassandra
> version, but older drivers and versions of cassandra could easily overload
> heap with expensive selects, when layered over tombstones it's certainly
> becomes a possibility this is your root cause.
>
> Now this will primarily create more load on compaction and depending on
> your cassandra version there maybe some other issue at work, but something
> I can tell you is every time I see 1 dropped mutation I see a cluster that
> was overloaded enough it had to shed load. If I see 200k I see a
> cluster/configuration/hardware that is badly overloaded.
>
> I suggest the following
>
>- trace some of the queries used in prod
>- monitor your ingest rate, see at what levels you run into issues
>(GCInspector log messages, dropped mutations, etc)
>- heap configuration we mentioned earlier..go ahead and monitor heap
>usage, if it hits 75% repeated this is an indication of heavy load
>- monitor dropped mutations..any dropped mutation is evidence of an
>overloaded server, again the root cause can be many other problems that are
>solvable with current hardware, and LOTS of people runs with nodes with
>similar configuration.
>
>
> On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen  wrote:
>>
>> Not using any secondary indicies and memtable_flush_queue_size is the
>> default 4.
>>
>> But let me tell you how data is "mutated" right now, maybe that will give
>> you an insight on how this is happening
>>
>> Basically the frame data table has the following primary key: PRIMARY KEY
>> ((id), trackid, "timestamp")
>>
>> Generally data is inserted once. So day to day writes are all new rows.
>> However, when out process for generating analytics for these rows
>> changes, we run the media back through again, causing overwrites.
>>
>> Up until last night, this was just a new insert because the PK never
>> changed so it was always 1-to-1 overwrite of every row.
>>
>> Last night was the first time that a new change went in where the PK
>> could actually change so now the process is always, DELETE by partition
>> key, insert all rows for partition key, repeat.
>>
>> We two tables that have similar frame data projections and some other
>> aggregates with much smaller row count per partition key.
>>
>> hope that helps,
>> arne
>>
>> On Dec 16, 2014, at 2:46 PM, Ryan Svihla  wrote:
>>
>> so you've got some blocked flush writers but you have a incredibly large
>> number of dropped mutations, are you using secondary indexes? and if so how
>> many? what is your flush queue set to?
>>
>> On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen  wrote:
>>>
>>> Of course QA decided to start a test batch (still relatively low
>>> traffic), so I hope it doesn't throw the tpstats off too much
>>>
>>> Node 1:
>>> Pool NameActive   Pending  Completed   Blocked
>>>  All time blocked
>>> MutationStage 0 0   13804928 0
>>>   0
>>> ReadStage 0 0  10975 0
>>>   0
>>> RequestResponseStage  0 07725378 0
>>>   0
>>> ReadRepairStage   0 0   1247 0
>>>   0
>>> ReplicateOnWriteStage 0 0  0 0
>>>   0
>>> MiscStage 0 0  0 0
>>>   0
>>> HintedHandoff 1 1 50 0
>>> 

Questions about bootstrapping and compactions during bootstrapping

2014-12-16 Thread Donald Smith
Looking at the output of "nodetool netstats" I see that the bootstrapping node is
pulling from only two of the nine nodes currently in the datacenter. That
surprises me: I'd think the vnodes it pulls from would be randomly spread
across the existing nodes. We're using Cassandra 2.0.11 with 256 vnodes each.

I also notice that while bootstrapping, the node is quite busy doing
compactions. There are over 1000 pending compactions on the new node and it's
not finished bootstrapping. I'd think those would be unnecessary, since the
other nodes in the data center have zero pending compactions. Perhaps the
compactions explain why running "du -hs /var/lib/cassandra/data" on the new
node shows more disk space usage than on the old nodes.

Is it reasonable to do "nodetool disableautocompaction" on the bootstrapping 
node? Should that be the default???
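
For reference, a sketch of what that experiment would look like on the joining
node (re-enabling once the join finishes):

    nodetool disableautocompaction      # on the bootstrapping node
    # ... wait for the node to finish joining, then:
    nodetool enableautocompaction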

If I start bootstrapping one node, it's not yet in the cluster but it decides
which token ranges it owns and requests streams for that data. If I then try
to bootstrap a SECOND node concurrently, it will take over ownership of some
token ranges from the first node. Will the first node then adjust what data it
streams?

It seems to me the cassandra server needs to keep track of both the OLD token 
ranges and vnodes and the NEW ones.  I'm not convinced that running two 
bootstraps concurrently (starting the second one after several minutes of 
delay) is safe.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com




Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
That's just the thing. There is nothing in the logs except the constant ParNew 
collections like

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC 
for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888

But the load is staying continuously high.

There's always some compaction running on just that one table, media_tracks_raw,
and those values rarely change (certainly the remaining time is meaningless):

pending tasks: 17
   compaction type   keyspace   table              completed    total         unit    progress
   Compaction        media      media_tracks_raw   444294932    1310653468    bytes   33.90%
   Compaction        media      media_tracks_raw   131931354    3411631999    bytes    3.87%
   Compaction        media      media_tracks_raw   30308970     23097672194   bytes    0.13%
   Compaction        media      media_tracks_raw   899216961    1815591081    bytes   49.53%
Active compaction remaining time :   0h27m56s

Here's a sample of a query trace:

 activity                                                                                          | timestamp    | source        | source_elapsed
---------------------------------------------------------------------------------------------------+--------------+---------------+----------------
 execute_cql3_query                                                                                | 00:11:46,612 | 10.140.22.236 |              0
 Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100;  | 00:11:46,612 | 10.140.22.236 |             47
 Preparing statement                                                                               | 00:11:46,612 | 10.140.22.236 |            234
 Sending message to /10.140.21.54                                                                  | 00:11:46,619 | 10.140.22.236 |           7190
 Message received from /10.140.22.236                                                              | 00:11:46,622 | 10.140.21.54  |             12
 Executing single-partition query on media_tracks_raw                                              | 00:11:46,644 | 10.140.21.54  |          21971
 Acquiring sstable references                                                                      | 00:11:46,644 | 10.140.21.54  |          22029
 Merging memtable tombstones                                                                       | 00:11:46,644 | 10.140.21.54  |          22131
 Bloom filter allows skipping sstable 1395                                                         | 00:11:46,644 | 10.140.21.54  |          22245
 Bloom filter allows skipping sstable 1394                                                         | 00:11:46,644 | 10.140.21.54  |          22279
 Bloom filter allows skipping sstable 1391                                                         | 00:11:46,644 | 10.140.21.54  |          22293
 Bloom filter allows skipping sstable 1381                                                         | 00:11:46,644 | 10.140.21.54  |          22304
 Bloom filter allows skipping sstable 1376                                                         | 00:11:46,644 | 10.140.21.54  |          22317
 Bloom filter allows skipping sstable 1368                                                         | 00:11:46,644 | 10.140.21.54  |          22328
 Bloom filter allows skipping sstable 1365                                                         | 00:11:46,644 | 10.140.21.54  |          22340
 Bloom filter allows skipping sstable 1351                                                         | 00:11:46,644 | 10.140.21.54  |          22352
 Bloom filter allows skipping sstable 1367                                                         | 00:11:46,644 | 10.140.21.54  |          22363
 Bloom filter allows skipping sstable 1380                                                         | 00:11:46,644 | 10.140.21.54  |          22374
 Bloom filter allows skipping sstable 1343                                                         | 00:11:46,644 | 10.140.21.54  |          22386
 Bloom filter allows skipping sstable 1342                                                         | 00:11:46,644 | 10.140.21.54  |          22397
 Bloom filter allows skipping sstable 1334                                                         | 00:11:46,644 | 10.140.21.54  |          22408
 Bloom filter allows skipping sstable 1377                                                         | 00:11:46,644 | 10.140.21.54  |          22429
 Bloom filter allows skipping sstable 1330                                                         | 00:11:46,644 | 10.140.21.54  |          22441
 Bloom filter allows skipping sstable 1329                                                         | 00:11:46,644 | 10.140.21.54  |          22452
 Bloom

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What version of Cassandra?
On Dec 16, 2014 6:36 PM, "Arne Claassen"  wrote:

> That's just the thing. There is nothing in the logs except the constant
> ParNew collections like
>
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line
> 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is
> 8000634888
>
> But the load is staying continuously high.
>
> There's always some compaction on just that one table, media_tracks_raw
> going on and those values rarely changed (certainly the remaining time is
> meaningless)
>
> pending tasks: 17
>   compaction typekeyspace   table   completed
>   total  unit  progress
>Compaction   mediamedia_tracks_raw   444294932
>  1310653468 bytes33.90%
>Compaction   mediamedia_tracks_raw   131931354
>  3411631999 bytes 3.87%
>Compaction   mediamedia_tracks_raw30308970
> 23097672194 bytes 0.13%
>Compaction   mediamedia_tracks_raw   899216961
>  1815591081 bytes49.53%
> Active compaction remaining time :   0h27m56s
>
> Here's a sample of a query trace:
>
>  activity
> | timestamp| source| source_elapsed
>
> --+--+---+
>
>  execute_cql3_query | 00:11:46,612 | 10.140.22.236 |  0
>  Parsing select * from media_tracks_raw where id
> =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 |
> 10.140.22.236 | 47
>
> Preparing statement | 00:11:46,612 | 10.140.22.236 |234
>  Sending
> message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 |   7190
>  Message
> received from /10.140.22.236 | 00:11:46,622 |  10.140.21.54 |
> 12
>  Executing single-partition
> query on media_tracks_raw | 00:11:46,644 |  10.140.21.54 |  21971
>
>  Acquiring sstable references | 00:11:46,644 |  10.140.21.54 |
>  22029
>
> Merging memtable tombstones | 00:11:46,644 |  10.140.21.54 |  22131
> Bloom filter
> allows skipping sstable 1395 | 00:11:46,644 |  10.140.21.54 |  22245
> Bloom filter
> allows skipping sstable 1394 | 00:11:46,644 |  10.140.21.54 |  22279
> Bloom filter
> allows skipping sstable 1391 | 00:11:46,644 |  10.140.21.54 |  22293
> Bloom filter
> allows skipping sstable 1381 | 00:11:46,644 |  10.140.21.54 |  22304
> Bloom filter
> allows skipping sstable 1376 | 00:11:46,644 |  10.140.21.54 |  22317
> Bloom filter
> allows skipping sstable 1368 | 00:11:46,644 |  10.140.21.54 |  22328
> Bloom filter
> allows skipping sstable 1365 | 00:11:46,644 |  10.140.21.54 |  22340
> Bloom filter
> allows skipping sstable 1351 | 00:11:46,644 |  10.140.21.54 |  22352
> Bloom filter
> allows skipping sstable 1367 | 00:11:46,644 |  10.140.21.54 |  22363
> Bloom filter
> allows skipping sstable 1380 | 00:11:46,644 |  10.140.21.54 |  22374
> Bloom filter
> allows skipping sstable 1343 | 00:11:46,644 |  10.140.21.54 |  22386
> Bloom filter
> allows skipping sstable 1342 | 00:11:46,644 |  10.140.21.54 |  22397
> Bloom filter
> allows skipping sstable 1334 | 00:11:46,644 |  10.140.21.54 |  22408
> Bloom filter
> allows skipping sstable 1377 | 00:11:46,644 |  10.140.21.54 |  22429
> Bloom filter
> allows skipping sstable 1330 | 00:11:46,644 |  10.140.21.54 |  22441
> Bloom filter
> allows skipping sstable 1329 | 00:11:46,644 |  10.140.21.54 |  22452
> Bloom filter
> allows skipping sstable 1328 | 00:11:46,644 |  10.140.21.54 |  22463
> Bloom filter
> allows

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Cassandra 2.0.10 and Datastax Java Driver 2.1.1

On Dec 16, 2014, at 4:48 PM, Ryan Svihla  wrote:

> What version of Cassandra?
> 
> On Dec 16, 2014 6:36 PM, "Arne Claassen"  wrote:
> That's just the thing. There is nothing in the logs except the constant 
> ParNew collections like
> 
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) 
> GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888
> 
> But the load is staying continuously high.
> 
> There's always some compaction on just that one table, media_tracks_raw going 
> on and those values rarely changed (certainly the remaining time is 
> meaningless)
> 
> pending tasks: 17
>   compaction typekeyspace   table   completed 
>   total  unit  progress
>Compaction   mediamedia_tracks_raw   444294932 
>  1310653468 bytes33.90%
>Compaction   mediamedia_tracks_raw   131931354 
>  3411631999 bytes 3.87%
>Compaction   mediamedia_tracks_raw30308970 
> 23097672194 bytes 0.13%
>Compaction   mediamedia_tracks_raw   899216961 
>  1815591081 bytes49.53%
> Active compaction remaining time :   0h27m56s
> 
> Here's a sample of a query trace:
> 
>  activity | timestamp | source | source_elapsed
> ----------+-----------+--------+---------------
>  execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
>  Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
>  Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
>  Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
>  Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
>  Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
>  Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
>  Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
>  Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
>  Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
>  Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
>  Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
>  Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
>  Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
>  Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
>  Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
>  Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
>  Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
>  Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
>  Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
>  Bloom filter allows skipping sstable 1334 | 00:11:46,644 | 10.140.21.54 | 22408
>  Bloom filter allows skipping sstable 1377 | 00:11:46,644 | 10.140.21.54 | 22429
>  
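
As a quick sanity check on the GCInspector line quoted above, the numbers
themselves can be worked through with a few lines of Python (a rough
sketch: the figures are copied from the quoted DEBUG line, the heap size
is taken to be the "max is" value, and the variable names are ours):

"""
# Figures from the quoted GCInspector line; variable names are ours.
pause_ms, collections = 166, 10
used_bytes, max_bytes = 4400928736, 8000634888

print("avg ParNew pause: %.1f ms" % (pause_ms / float(collections)))  # ~16.6 ms
print("heap used: %.0f%% of %.1f GB" % (100.0 * used_bytes / max_bytes,
                                        max_bytes / 1024.0 ** 3))     # ~55% of ~7.5 GB
"""

Pauses of roughly 17 ms and a heap only a little over half full look
unremarkable on their own, so the GC line by itself may not explain the
sustained load.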

[Consistency on cqlsh command prompt]

2014-12-16 Thread nitin padalia
Hi,

When I set the consistency to QUORUM on the cqlsh command line, it says
the consistency is set to QUORUM.

cqlsh:testdb> CONSISTENCY QUORUM ;
Consistency level set to QUORUM.

However, when I check it back using the CONSISTENCY command on the
prompt, it says the consistency is 4, whereas it should be 2 since my
replication factor for the keyspace is 3.
cqlsh:testdb> CONSISTENCY ;
Current consistency level is 4.

Isn't QUORUM consistency calculated as (replication_factor / 2) + 1,
where replication_factor / 2 is rounded down?

If so, why is the consistency displayed as 4 when it should be 2:
(3 / 2 = 1.5, rounded down to 1) + 1 = 2?
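
For reference, the quorum arithmetic described above can be checked with a
few lines of Python (a small sketch; quorum() is our own helper, not part
of any driver):

"""
def quorum(replication_factor):
    # QUORUM = (replication_factor / 2, rounded down) + 1
    return replication_factor // 2 + 1

for rf in (1, 2, 3, 4, 5):
    print("RF=%d -> quorum=%d" % (rf, quorum(rf)))  # RF=3 -> quorum=2
"""

Assuming that formula, RF 3 does give a quorum of 2. The 4 that cqlsh
prints back is most likely the numeric code used for the QUORUM level
itself (QUORUM is 4 in the native protocol's consistency enum), not a
replica count, so the two numbers answer different questions.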

I am using Cassandra version 2.1.2, cqlsh 5.0.1, and CQL spec 3.2.0.


Thanks in advance!
Nitin Padalia


Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Jens Rantil
Maybe checking which thread(s) are busy would hint at what's going on? (see
http://www.boxjar.com/using-top-and-jstack-to-find-the-java-thread-that-is-hogging-the-cpu/).
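
For reference, a minimal sketch of the matching step that article
describes, assuming a thread id read from "top -H -p <pid>" on Linux:
jstack labels each thread with a hex "nid", which is just the OS thread id
in hexadecimal, so converting the decimal TID is what lets you line the
two outputs up (the helper name is ours):

"""
# Convert a decimal thread id (from `top -H -p <pid>`) to the hex "nid"
# string that appears in jstack output, e.g. "... nid=0x2f1a ...".
def tid_to_nid(tid):
    return hex(tid)

print(tid_to_nid(12058))  # -> 0x2f1a
"""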

On Wed, Dec 17, 2014 at 1:51 AM, Arne Claassen  wrote:

> Cassandra 2.0.10 and Datastax Java Driver 2.1.1
> On Dec 16, 2014, at 4:48 PM, Ryan Svihla  wrote:
>> What version of Cassandra?
>> 
>> On Dec 16, 2014 6:36 PM, "Arne Claassen"  wrote:
>> That's just the thing. There is nothing in the logs except the constant 
>> ParNew collections like
>> 
>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) 
>> GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888
>> 
>> But the load is staying continuously high.
>> 
>> There's always some compaction going on for just that one table,
>> media_tracks_raw, and those values rarely change (the remaining-time
>> estimate is certainly meaningless).
>> 
>> pending tasks: 17
>>    compaction type   keyspace   table              completed   total         unit    progress
>>    Compaction        media      media_tracks_raw   444294932   1310653468    bytes   33.90%
>>    Compaction        media      media_tracks_raw   131931354   3411631999    bytes    3.87%
>>    Compaction        media      media_tracks_raw   30308970    23097672194   bytes    0.13%
>>    Compaction        media      media_tracks_raw   899216961   1815591081    bytes   49.53%
>> Active compaction remaining time :   0h27m56s
>> 
>> Here's a sample of a query trace:
>> 
>>  activity | timestamp | source | source_elapsed
>> ----------+-----------+--------+---------------
>>  execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
>>  Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
>>  Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
>>  Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
>>  Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
>>  Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
>>  Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
>>  Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
>>  Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
>>  Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
>>  Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
>>  Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
>>  Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
>>  Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
>>  Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
>>  Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
>>  Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
>>  Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
>>  Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
>>  Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
>>