Re: Bottleneck for small inserts?

2017-05-25 Thread Eric Pederson
Due to a cut and paste error those flamegraphs were a recording of the
whole system, not just Cassandra. Throughput is approximately 30k
rows/sec.

Here's the graphs with just the Cassandra PID:

   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars2.svg


And here's graphs during a cqlsh COPY FROM to the same table, using real
data, MAXBATCHSIZE=2. Throughput is good at approximately 110k rows/sec.

   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_cars_batch2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_cars_batch2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_cars_batch2.svg




-- Eric

On Thu, May 25, 2017 at 6:44 PM, Eric Pederson  wrote:

> Totally understood :)
>
> I forgot to mention - I set the /proc/irq/*/smp_affinity mask to include
> all of the CPUs.  Actually most of them were set that way already (for
> example, ,) - it might be because irqbalanced is
> running.  But for some reason the interrupts are all being handled on CPU 0
> anyway.
>
> I see this in /var/log/dmesg on the machines:
>
>>
>> Your BIOS has requested that x2apic be disabled.
>> This will leave your machine vulnerable to irq-injection attacks.
>> Use 'intremap=no_x2apic_optout' to override BIOS request.
>> Enabled IRQ remapping in xapic mode
>> x2apic not enabled, IRQ remapping is in xapic mode
>
>
> In a reply to one of the comments, he says:
>
>
> When IO-APIC configured to spread interrupts among all cores, it can
>> handle up to eight cores. If you have more than eight cores, kernel will
>> not configure IO-APIC to spread interrupts. Thus the trick I described in
>> the article will not work.
>> Otherwise it may be caused by buggy BIOS or even buggy hardware.
>
>
> I'm not sure if either of them is relevant to my situation.
>
>
> Thanks!
>
>
>
>
>
> -- Eric
>
> On Thu, May 25, 2017 at 4:16 PM, Jonathan Haddad 
> wrote:
>
>> You shouldn't need a kernel recompile.  Check out the section "Simple
>> solution for the problem" in http://www.alexonlinux.com/
>> smp-affinity-and-proper-interrupt-handling-in-linux.  You can balance
>> your requests across up to 8 CPUs.
>>
>> I'll check out the flame graphs in a little bit - in the middle of
>> something and my brain doesn't multitask well :)
>>
>> On Thu, May 25, 2017 at 1:06 PM Eric Pederson  wrote:
>>
>>> Hi Jonathan -
>>>
>>> It looks like these machines are configured to use CPU 0 for all I/O
>>> interrupts.  I don't think I'm going to get the OK to compile a new kernel
>>> for them to balance the interrupts across CPUs, but to mitigate the problem
>>> I taskset the Cassandra process to run on all CPU except 0.  It didn't
>>> change the performance though.  Let me know if you think it's crucial that
>>> we balance the interrupts across CPUs and I can try to lobby for a new
>>> kernel.
>>>
>>> Here are flamegraphs from each node from a cassandra-stress ingest into
>>> a table representative of what we are going to be using.  This table
>>> is also roughly 200 bytes, with 64 columns and the primary key (date,
>>> sequence_number).  Cassandra-stress was run on 3 separate client
>>> machines.  Using cassandra-stress to write to this table I see the same
>>> thing: neither disk, CPU, nor network is fully utilized.
>>>
>>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars.svg
>>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars.svg
>>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars.svg
>>>
>>> Re: GC: In the stress run with the parameters above, two of the three
>>> nodes log zero or one GCInspectors.  On the other hand, the 3rd machine
>>> logs a GCInspector every 5 seconds or so, 300-500ms each time.  I found
>>> out that the 3rd machine actually has different specs than the other two.
>>> It's an older box with the same RAM but fewer CPUs (32 instead of 48), a
>>> slower SSD and slower memory.   The Cassandra configuration is exactly the
>>> same.   I tried running Cassandra with only 32 CPUs on the newer boxes to
>>> see if that would cause them to GC pause more, but it didn't.
>>>
>>> On a separate topic - for this cassandra-stress run I reduced the batch
>>> size to 2 in order to keep the logs clean.  That also reduced the
>>> throughput from around 100k rows/second to 32k rows/sec.  I've been doing
>>> ingestion tests using cassandra-stress, cqlsh COPY FROM and a custom
>>> C++ application.  In most of the tests that I've been doing I've been using
>>> a batch size of around 20 

Re: Bottleneck for small inserts?

2017-05-25 Thread Eric Pederson
Totally understood :)

I forgot to mention - I set the /proc/irq/*/smp_affinity mask to include
all of the CPUs.  Actually most of them were set that way already (for
example, ,) - it might be because irqbalanced is running.
But for some reason the interrupts are all being handled on CPU 0 anyway.
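The smp_affinity values mentioned above are hex CPU bitmasks (bit N set means CPU N may handle the interrupt). A minimal sketch of that arithmetic, with helper names that are mine rather than from any kernel tool:

```python
def cpus_to_mask(cpus):
    """Build an smp_affinity-style hex bitmask (bit N = CPU N enabled)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def mask_to_cpus(mask_hex):
    """Inverse: list the CPU numbers enabled in a hex bitmask.
    /proc/irq/*/smp_affinity may group the hex digits with commas."""
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]

# All 48 CPUs enabled:
print(cpus_to_mask(range(48)))       # -> ffffffffffff
# Everything except CPU 0:
print(cpus_to_mask(range(1, 48)))    # -> fffffffffffe
print(mask_to_cpus("3"))             # -> [0, 1]
```

Writing such a mask to /proc/irq/N/smp_affinity is the manual equivalent of what irqbalance does; whether the hardware honors it is a separate question, as the dmesg output below suggests.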

I see this in /var/log/dmesg on the machines:

>
> Your BIOS has requested that x2apic be disabled.
> This will leave your machine vulnerable to irq-injection attacks.
> Use 'intremap=no_x2apic_optout' to override BIOS request.
> Enabled IRQ remapping in xapic mode
> x2apic not enabled, IRQ remapping is in xapic mode


In a reply to one of the comments, he says:


When IO-APIC configured to spread interrupts among all cores, it can handle
> up to eight cores. If you have more than eight cores, kernel will not
> configure IO-APIC to spread interrupts. Thus the trick I described in the
> article will not work.
> Otherwise it may be caused by buggy BIOS or even buggy hardware.


I'm not sure if either of them is relevant to my situation.


Thanks!





-- Eric

On Thu, May 25, 2017 at 4:16 PM, Jonathan Haddad  wrote:

> You shouldn't need a kernel recompile.  Check out the section "Simple
> solution for the problem" in http://www.alexonlinux.com/
> smp-affinity-and-proper-interrupt-handling-in-linux.  You can balance
> your requests across up to 8 CPUs.
>
> I'll check out the flame graphs in a little bit - in the middle of
> something and my brain doesn't multitask well :)
>
> On Thu, May 25, 2017 at 1:06 PM Eric Pederson  wrote:
>
>> Hi Jonathan -
>>
>> It looks like these machines are configured to use CPU 0 for all I/O
>> interrupts.  I don't think I'm going to get the OK to compile a new kernel
>> for them to balance the interrupts across CPUs, but to mitigate the problem
>> I taskset the Cassandra process to run on all CPU except 0.  It didn't
>> change the performance though.  Let me know if you think it's crucial that
>> we balance the interrupts across CPUs and I can try to lobby for a new
>> kernel.
>>
>> Here are flamegraphs from each node from a cassandra-stress ingest into
>> a table representative of what we are going to be using.  This table
>> is also roughly 200 bytes, with 64 columns and the primary key (date,
>> sequence_number).  Cassandra-stress was run on 3 separate client
>> machines.  Using cassandra-stress to write to this table I see the same
>> thing: neither disk, CPU, nor network is fully utilized.
>>
>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars.svg
>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars.svg
>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars.svg
>>
>> Re: GC: In the stress run with the parameters above, two of the three
>> nodes log zero or one GCInspectors.  On the other hand, the 3rd machine
>> logs a GCInspector every 5 seconds or so, 300-500ms each time.  I found
>> out that the 3rd machine actually has different specs than the other two.
>> It's an older box with the same RAM but fewer CPUs (32 instead of 48), a
>> slower SSD and slower memory.   The Cassandra configuration is exactly the
>> same.   I tried running Cassandra with only 32 CPUs on the newer boxes to
>> see if that would cause them to GC pause more, but it didn't.
>>
>> On a separate topic - for this cassandra-stress run I reduced the batch
>> size to 2 in order to keep the logs clean.  That also reduced the
>> throughput from around 100k rows/second to 32k rows/sec.  I've been doing
>> ingestion tests using cassandra-stress, cqlsh COPY FROM and a custom C++
>> application.  In most of the tests that I've been doing I've been using a
>> batch size of around 20 (unlogged, all batch rows have the same partition
>> key).  However, it fills the logs with batch size warnings.  I was going to
>> raise the batch warning size but the docs scared me away from doing that.
>> Given that we're using unlogged/same partition batches is it safe to raise
>> the batch size warning limit?   Actually cqlsh COPY FROM has very good
>> throughput using a small batch size, but I can't get that same throughput
>> in cassandra-stress or my C++ app with a batch size of 2.
>>
>> Thanks!
>>
>>
>>
>> -- Eric
>>
>> On Mon, May 22, 2017 at 5:08 PM, Jonathan Haddad 
>> wrote:
>>
>>> How many CPUs are you using for interrupts?  http://www.alexonlinux.com/
>>> smp-affinity-and-proper-interrupt-handling-in-linux
>>>
>>> Have you tried making a flame graph to see where Cassandra is spending
>>> its time? 

Re: Effect of frequent mutations / memtable

2017-05-25 Thread Thakrar, Jayesh
That's because ZooKeeper is purpose-built for this kind of usage.
Its asynchronous nature makes it easier to program against - e.g. you can
create "watchers" with callbacks that fire when ephemeral nodes
die/disappear (due to servers crashing).
It also reduces the "check-in" and "polling" cycle overhead.
Furthermore, zk does not have the "overhead" of the other things that
Cassandra does.

Honestly I am not familiar with Paxos and stuff, so can't speak to it.



On 5/25/17, 3:40 PM, "Jan Algermissen"  wrote:

Hi Jayesh,


On 25 May 2017, at 18:31, Thakrar, Jayesh wrote:

> Hi Jan,
>
> I would suggest looking at using Zookeeper for such a usecase.

thanks - yes, it is an alternative.

Out of curiosity: since both ZK and C* implement Paxos to enable this
kind of thing, why do you think ZooKeeper would be a better fit?

Jan

>
> See http://zookeeper.apache.org/doc/trunk/recipes.html for some 
> examples.
>
> Zookeeper is used for such purposes in Apache HBase (active master), 
> Apache Kafka (active controller), Apache Hadoop, etc.
>
> Look for the "Leader Election" usecase.
> Examples
> http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/
> https://www.tutorialspoint.com/zookeeper/zookeeper_leader_election.htm
>
> It's more/new work, but should be an elegant solution.
>
> Hope that helps.
> Jayesh
>
> On 5/25/17, 9:19 AM, "Jan Algermissen"  
> wrote:
>
> Hi,
>
> I am using updates to a column with a TTL to represent a lock. The
> owning process keeps updating the lock's TTL as long as it is running.
> If the process crashes, the lock will timeout and be deleted. Then
> another process can take over.
>
> I have used this pattern very successfully over years with TTLs on the
> order of tens of seconds.
>
> Now I have a use case in mind that would require much smaller TTLs,
> e.g. one or two seconds, and I am worried about the increased number of
> mutations and the possible effect on SSTables.
>
> However: I'd assume these frequent updates to a cell would mostly happen
> in the memtable, resulting in only occasional manifestation in SSTables.
>
> Is that assumption correct and if so, what config parameters 
> should I
> tweak to keep the memtable from being flushed for longer periods 
> of
> time?
>
>
> Jan




Re: Effect of frequent mutations / memtable

2017-05-25 Thread Jan Algermissen

Hi Jayesh,


On 25 May 2017, at 18:31, Thakrar, Jayesh wrote:


Hi Jan,

I would suggest looking at using Zookeeper for such a usecase.


thanks - yes, it is an alternative.

Out of curiosity: since both ZK and C* implement Paxos to enable this
kind of thing, why do you think ZooKeeper would be a better fit?


Jan



See http://zookeeper.apache.org/doc/trunk/recipes.html for some 
examples.


Zookeeper is used for such purposes in Apache HBase (active master), 
Apache Kafka (active controller), Apache Hadoop, etc.


Look for the "Leader Election" usecase.
Examples
http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/
https://www.tutorialspoint.com/zookeeper/zookeeper_leader_election.htm

It's more/new work, but should be an elegant solution.

Hope that helps.
Jayesh

On 5/25/17, 9:19 AM, "Jan Algermissen"  
wrote:


Hi,

I am using updates to a column with a TTL to represent a lock. The
owning process keeps updating the lock's TTL as long as it is running.

If the process crashes, the lock will timeout and be deleted. Then
another process can take over.

I have used this pattern very successfully over years with TTLs on
the order of tens of seconds.

Now I have a use case in mind that would require much smaller TTLs,
e.g. one or two seconds, and I am worried about the increased number of
mutations and the possible effect on SSTables.

However: I'd assume these frequent updates to a cell would mostly happen
in the memtable, resulting in only occasional manifestation in SSTables.


Is that assumption correct, and if so, what config parameters should I
tweak to keep the memtable from being flushed for longer periods of time?


Jan
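The TTL-lease pattern described above can be sketched in a few lines. This is an in-memory stand-in for the Cassandra column with a TTL (all names are hypothetical and the clock is injected so the example is deterministic), not production code:

```python
import time

class TtlLock:
    """Sketch of the TTL-lease pattern: the owner keeps renewing the
    lease; if it crashes, the lease expires and another process can
    take over - mirroring a Cassandra column written with USING TTL."""
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.owner = None
        self.expires_at = 0.0

    def try_acquire(self, who):
        # The lock is free if never taken or if the previous lease expired.
        now = self.clock()
        if self.owner is None or now >= self.expires_at:
            self.owner, self.expires_at = who, now + self.ttl
        return self.owner == who

    def renew(self, who):
        """Owner keeps updating the TTL while it is running."""
        if self.owner == who and self.clock() < self.expires_at:
            self.expires_at = self.clock() + self.ttl
            return True
        return False

# Simulated clock so the sequence below is reproducible:
t = [0.0]
lock = TtlLock(ttl_seconds=10, clock=lambda: t[0])
assert lock.try_acquire("A")          # A takes the lock
t[0] = 5;  assert lock.renew("A")     # A renews before expiry
t[0] = 20; assert not lock.renew("A") # A crashed / lease expired
assert lock.try_acquire("B")          # B takes over
```

The worry raised above translates here to how often renew() fires: with a 1-2 second TTL the renewal rate, and hence the mutation rate on that cell, goes up by an order of magnitude.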


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Bottleneck for small inserts?

2017-05-25 Thread Jonathan Haddad
You shouldn't need a kernel recompile.  Check out the section "Simple
solution for the problem" in
http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux.
You can balance your requests across up to 8 CPUs.

I'll check out the flame graphs in a little bit - in the middle of
something and my brain doesn't multitask well :)

On Thu, May 25, 2017 at 1:06 PM Eric Pederson  wrote:

> Hi Jonathan -
>
> It looks like these machines are configured to use CPU 0 for all I/O
> interrupts.  I don't think I'm going to get the OK to compile a new kernel
> for them to balance the interrupts across CPUs, but to mitigate the problem
> I taskset the Cassandra process to run on all CPU except 0.  It didn't
> change the performance though.  Let me know if you think it's crucial that
> we balance the interrupts across CPUs and I can try to lobby for a new
> kernel.
>
> Here are flamegraphs from each node from a cassandra-stress ingest into a
> table representative of what we are going to be using.  This table is
> also roughly 200 bytes, with 64 columns and the primary key (date,
> sequence_number).  Cassandra-stress was run on 3 separate client
> machines.  Using cassandra-stress to write to this table I see the same
> thing: neither disk, CPU, nor network is fully utilized.
>
>-
>
> http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars.svg
>-
>
> http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars.svg
>-
>
> http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars.svg
>
> Re: GC: In the stress run with the parameters above, two of the three
> nodes log zero or one GCInspectors.  On the other hand, the 3rd machine
> logs a GCInspector every 5 seconds or so, 300-500ms each time.  I found
> out that the 3rd machine actually has different specs than the other two.
> It's an older box with the same RAM but fewer CPUs (32 instead of 48), a
> slower SSD and slower memory.   The Cassandra configuration is exactly the
> same.   I tried running Cassandra with only 32 CPUs on the newer boxes to
> see if that would cause them to GC pause more, but it didn't.
>
> On a separate topic - for this cassandra-stress run I reduced the batch
> size to 2 in order to keep the logs clean.  That also reduced the
> throughput from around 100k rows/second to 32k rows/sec.  I've been doing
> ingestion tests using cassandra-stress, cqlsh COPY FROM and a custom C++
> application.  In most of the tests that I've been doing I've been using a
> batch size of around 20 (unlogged, all batch rows have the same partition
> key).  However, it fills the logs with batch size warnings.  I was going to
> raise the batch warning size but the docs scared me away from doing that.
> Given that we're using unlogged/same partition batches is it safe to raise
> the batch size warning limit?   Actually cqlsh COPY FROM has very good
> throughput using a small batch size, but I can't get that same throughput
> in cassandra-stress or my C++ app with a batch size of 2.
>
> Thanks!
>
>
>
> -- Eric
>
> On Mon, May 22, 2017 at 5:08 PM, Jonathan Haddad 
> wrote:
>
>> How many CPUs are you using for interrupts?
>> http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
>>
>> Have you tried making a flame graph to see where Cassandra is spending
>> its time?
>> http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
>>
>> Are you tracking GC pauses?
>>
>> Jon
>>
>> On Mon, May 22, 2017 at 2:03 PM Eric Pederson  wrote:
>>
>>> Hi all:
>>>
>>> I'm new to Cassandra and I'm doing some performance testing.  One of the
>>> things that I'm testing is ingestion throughput.  My server setup is:
>>>
>>>- 3 node cluster
>>>- SSD data (both commit log and sstables are on the same disk)
>>>- 64 GB RAM per server
>>>- 48 cores per server
>>>- Cassandra 3.0.11
>>>- 48 Gb heap using G1GC
>>>- 1 Gbps NICs
>>>
>>> Since I'm using SSD I've tried tuning the following (one at a time) but
>>> none seemed to make a lot of difference:
>>>
>>>- concurrent_writes=384
>>>- memtable_flush_writers=8
>>>- concurrent_compactors=8
>>>
>>> I am currently doing ingestion tests sending data from 3 clients on the
>>> same subnet.  I am using cassandra-stress to do some ingestion testing.
>>> The tests are using CL=ONE and RF=2.
>>>
>>> Using cassandra-stress (3.10) I am able to saturate the disk using a
>>> large enough column size and the standard five column cassandra-stress
>>> schema.  For example, -col size=fixed(400) will saturate the disk and
>>> compactions will start falling behind.
>>>
>>> One of our main tables has a row size that is approximately 200 bytes,
>>> across 64 columns.  When ingesting this table I don't see any resource
>>> saturation.  Disk utilization is around 10-15% per iostat.  Incoming
>>> network traffic on the servers is around 

Re: Bottleneck for small inserts?

2017-05-25 Thread Eric Pederson
Hi Jonathan -

It looks like these machines are configured to use CPU 0 for all I/O
interrupts.  I don't think I'm going to get the OK to compile a new kernel
for them to balance the interrupts across CPUs, but to mitigate the problem
I taskset the Cassandra process to run on all CPU except 0.  It didn't
change the performance though.  Let me know if you think it's crucial that
we balance the interrupts across CPUs and I can try to lobby for a new
kernel.

Here are flamegraphs from each node from a cassandra-stress ingest into a
table representative of what we are going to be using.  This table is
also roughly 200 bytes, with 64 columns and the primary key (date,
sequence_number).  Cassandra-stress was run on 3 separate client machines.
Using cassandra-stress to write to this table I see the same thing: neither
disk, CPU, nor network is fully utilized.

   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars.svg

Re: GC: In the stress run with the parameters above, two of the three nodes
log zero or one GCInspectors.  On the other hand, the 3rd machine logs a
GCInspector every 5 seconds or so, 300-500ms each time.  I found out that
the 3rd machine actually has different specs than the other two.  It's an
older box with the same RAM but fewer CPUs (32 instead of 48), a slower SSD
and slower memory.   The Cassandra configuration is exactly the same.   I
tried running Cassandra with only 32 CPUs on the newer boxes to see if that
would cause them to GC pause more, but it didn't.

On a separate topic - for this cassandra-stress run I reduced the batch
size to 2 in order to keep the logs clean.  That also reduced the
throughput from around 100k rows/second to 32k rows/sec.  I've been doing
ingestion tests using cassandra-stress, cqlsh COPY FROM and a custom C++
application.  In most of the tests that I've been doing I've been using a
batch size of around 20 (unlogged, all batch rows have the same partition
key).  However, it fills the logs with batch size warnings.  I was going to
raise the batch warning size but the docs scared me away from doing that.
Given that we're using unlogged/same partition batches is it safe to raise
the batch size warning limit?   Actually cqlsh COPY FROM has very good
throughput using a small batch size, but I can't get that same throughput
in cassandra-stress or my C++ app with a batch size of 2.
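One way to picture "a batch size of around 20 (unlogged, all batch rows have the same partition key)" is the grouping step below. This is illustrative Python, not the actual C++ application or driver code, and the field names are hypothetical:

```python
from itertools import islice

def partition_batches(rows, key, max_batch=20):
    """Group rows by partition key, then chunk each group into batches
    of at most max_batch rows. Each emitted batch touches a single
    partition, which is the case where UNLOGGED batches are cheap."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    for pk, group in groups.items():
        it = iter(group)
        while True:
            batch = list(islice(it, max_batch))
            if not batch:
                break
            yield pk, batch

# 45 rows for one (date) partition -> batches of 20, 20, 5:
rows = [{"date": "2017-05-25", "seq": i} for i in range(45)]
batches = list(partition_batches(rows, key=lambda r: r["date"]))
print([len(b) for _, b in batches])  # -> [20, 20, 5]
```

Each emitted batch would then be sent as one unlogged BATCH statement; since every row in it shares a partition key, the coordinator does not have to fan the batch out across replicas for different partitions.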

Thanks!



-- Eric

On Mon, May 22, 2017 at 5:08 PM, Jonathan Haddad  wrote:

> How many CPUs are you using for interrupts?  http://www.alexonlinux.com/
> smp-affinity-and-proper-interrupt-handling-in-linux
>
> Have you tried making a flame graph to see where Cassandra is spending its
> time? http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
>
> Are you tracking GC pauses?
>
> Jon
>
> On Mon, May 22, 2017 at 2:03 PM Eric Pederson  wrote:
>
>> Hi all:
>>
>> I'm new to Cassandra and I'm doing some performance testing.  One of the
>> things that I'm testing is ingestion throughput.  My server setup is:
>>
>>- 3 node cluster
>>- SSD data (both commit log and sstables are on the same disk)
>>- 64 GB RAM per server
>>- 48 cores per server
>>- Cassandra 3.0.11
>>- 48 Gb heap using G1GC
>>- 1 Gbps NICs
>>
>> Since I'm using SSD I've tried tuning the following (one at a time) but
>> none seemed to make a lot of difference:
>>
>>- concurrent_writes=384
>>- memtable_flush_writers=8
>>- concurrent_compactors=8
>>
>> I am currently doing ingestion tests sending data from 3 clients on the
>> same subnet.  I am using cassandra-stress to do some ingestion testing.
>> The tests are using CL=ONE and RF=2.
>>
>> Using cassandra-stress (3.10) I am able to saturate the disk using a
>> large enough column size and the standard five column cassandra-stress
>> schema.  For example, -col size=fixed(400) will saturate the disk and
>> compactions will start falling behind.
>>
>> One of our main tables has a row size that is approximately 200 bytes,
>> across 64 columns.  When ingesting this table I don't see any resource
>> saturation.  Disk utilization is around 10-15% per iostat.  Incoming
>> network traffic on the servers is around 100-300 Mbps.  CPU utilization is
>> around 20-70%.  nodetool tpstats shows mostly zeros with occasional
>> spikes around 500 in MutationStage.
>>
>> The stress run does 10,000,000 inserts per client, each with a separate
>> range of partition IDs.  The run with 200 byte rows takes about 4 minutes,
>> with mean Latency 4.5ms, Total GC time of 21 secs, Avg GC time 173 ms.
>>
>> The overall performance is good - around 120k rows/sec ingested.  But I'm
>> curious to know where the bottleneck is.  There's no resource saturation and
>> nodetool tpstats shows only 

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread preetika tyagi
I agree that for such a small dataset, Cassandra is obviously not needed.
However, this is purely an experimental setup through which I'm trying to
understand how and exactly when a memtable flush is triggered. As I mentioned
in my post, I read the documentation and tweaked the parameters accordingly
so that I should never hit a memtable flush, but it is still happening. As far
as the setup is concerned, I'm just using 1 node, running Cassandra with the
"cassandra -R" option, and then running some queries to insert some dummy
data.

I use the schema from CASSANDRA_HOME/tools/cqlstress-insanity-example.yaml
and add "durable_writes=false" in the keyspace_definition.

@Daemeon - The previous post led to this post, but since I was unaware of
the memtable flush and assumed it wasn't happening, the previous
post was about something else (throughput/latency etc.). This post is
explicitly about exactly when the memtable is being dumped to disk. I didn't
want to confuse two different goals; that's why I posted a new one.
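The flush-threshold arithmetic cited from the cassandra.yaml documentation later in this thread can be written out directly. This is a sketch of the documented rule only; as the thread itself shows, other pressures (notably commit log growth) can force a flush well before this size is reached:

```python
def memtable_flush_trigger_mb(heap_space_mb, offheap_space_mb, cleanup_threshold):
    """Approximate size at which a memtable flush is scheduled, per the
    cited docs: (heap space + off-heap space) * memtable_cleanup_threshold."""
    return (heap_space_mb + offheap_space_mb) * cleanup_threshold

# The settings discussed here: 4G heap, so each space defaults to
# 1/4 of heap = 1000MB, with memtable_cleanup_threshold: 0.50
print(memtable_flush_trigger_mb(1000, 1000, 0.50))  # -> 1000.0
```

By this rule alone, ~300MB of writes should sit well under the 1000MB trigger, which is why the observed flushes point at a different cause than the cleanup threshold.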

On Thu, May 25, 2017 at 10:38 AM, Avi Kivity  wrote:

> It doesn't have to fit in memory. If your key distribution has strong
> temporal locality, then a larger memtable that can coalesce overwrites
> greatly reduces the disk I/O load for the memtable flush and subsequent
> compactions. Of course, I have no idea if this is what the OP had in mind.
>
>
> On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
>
> Sorry for the confusion.  That was for the OP.  I wrote it quickly right
> after waking up.
>
> What I'm asking is why does the OP want to keep his data in the memtable
> exclusively?  If the goal is to "make reads fast", then just turn on row
> caching.
>
> If there's so little data that it fits in memory (300MB), and there aren't
> going to be any writes past the initial small dataset, why use Cassandra?
> It sounds like the wrong tool for this job.  Sounds like something that
> could easily be stored in S3 and loaded in memory when the app is fired up.
>
>
> On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:
>
>> Not sure whether you're asking me or the original poster, but the more
>> times data gets overwritten in a memtable, the less it has to be compacted
>> later on (and even without overwrites, larger memtables result in less
>> compaction).
>>
>> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>>
>> Why do you think keeping your data in the memtable is what you need to
>> do?
>> On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:
>>
>>> Then it doesn't have to (it still may, for other reasons).
>>>
>>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>>
>>> What if the commit log is disabled?
>>>
>>> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>>>
 Cassandra has to flush the memtable occasionally, or the commit log
 grows without bounds.

 On 05/25/2017 03:42 AM, preetika tyagi wrote:

 Hi,

 I'm running Cassandra with a very small dataset so that the data can
 exist on memtable only. Below are my configurations:

 In jvm.options:

 -Xms4G
 -Xmx4G

 In cassandra.yaml,

 memtable_cleanup_threshold: 0.50
 memtable_allocation_type: heap_buffers

 As per the documentation in cassandra.yaml, the
 *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will each
 be set to 1/4 of the heap size, i.e. 1000MB

 According to the documentation here
 (http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
 the memtable flush will trigger if the total size of the memtable(s) goes
 beyond (1000+1000)*0.50=1000MB.

 Now if I perform several write requests which result in almost ~300MB
 of data, the memtable still gets flushed (I see sstables being created
 on the file system, Data.db etc.) and I don't understand why.

 Could anyone explain this behavior and point out if I'm missing
 something here?

 Thanks,

 Preetika



>>>
>>
>


Re: How do you do automatic restacking of AWS instance for cassandra?

2017-05-25 Thread daemeon reiydelle
What is restacking?





Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Thu, May 25, 2017 at 10:24 AM, Surbhi Gupta 
wrote:

> Hi,
>
> Wanted to understand, how do you do automatic restacking of cassandra
> nodes on AWS?
>
> Thanks
> Surbhi
>


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
It doesn't have to fit in memory. If your key distribution has strong 
temporal locality, then a larger memtable that can coalesce overwrites 
greatly reduces the disk I/O load for the memtable flush and subsequent 
compactions. Of course, I have no idea if this is what the OP had in mind.


On 05/25/2017 07:14 PM, Jonathan Haddad wrote:
Sorry for the confusion.  That was for the OP.  I wrote it quickly 
right after waking up.


What I'm asking is why does the OP want to keep his data in the 
memtable exclusively?  If the goal is to "make reads fast", then just 
turn on row caching.


If there's so little data that it fits in memory (300MB), and there 
aren't going to be any writes past the initial small dataset, why use 
Cassandra?  It sounds like the wrong tool for this job.  Sounds like 
something that could easily be stored in S3 and loaded in memory when 
the app is fired up.


On Thu, May 25, 2017 at 8:06 AM Avi Kivity wrote:


Not sure whether you're asking me or the original poster, but the
more times data gets overwritten in a memtable, the less it has to
be compacted later on (and even without overwrites, larger
memtables result in less compaction).


On 05/25/2017 05:59 PM, Jonathan Haddad wrote:

Why do you think keeping your data in the memtable is what you
need to do?
On Thu, May 25, 2017 at 7:16 AM Avi Kivity wrote:

Then it doesn't have to (it still may, for other reasons).


On 05/25/2017 05:11 PM, preetika tyagi wrote:

What if the commit log is disabled?

On May 25, 2017 4:31 AM, "Avi Kivity" wrote:

Cassandra has to flush the memtable occasionally, or the
commit log grows without bounds.


On 05/25/2017 03:42 AM, preetika tyagi wrote:

Hi,

I'm running Cassandra with a very small dataset so that
the data can exist on memtable only. Below are my
configurations:

In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml, the
/memtable_heap_space_in_mb/ and
/memtable_offheap_space_in_mb/ will each be set to 1/4 of the heap
size, i.e. 1000MB

According to the documentation here
(http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
the memtable flush will trigger if the total size of
the memtable(s) goes beyond (1000+1000)*0.50=1000MB.

Now if I perform several write requests which result
in almost ~300MB of data, the memtable still gets
flushed (I see sstables being created on the file
system, Data.db etc.) and I don't understand why.

Could anyone explain this behavior and point out if I'm
missing something here?

Thanks,

Preetika











How do you do automatic restacking of AWS instance for cassandra?

2017-05-25 Thread Surbhi Gupta
Hi,

Wanted to understand, how do you do automatic restacking of cassandra nodes
on AWS?

Thanks
Surbhi


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread daemeon reiydelle
This sounds exactly like a previous post that ended when I asked the person
to document the number of nodes and the EC2 instance type and size. I
suspected a single-node system. So the poster reposts? Hmm.

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 25, 2017 9:14 AM, "Jonathan Haddad"  wrote:

Sorry for the confusion.  That was for the OP.  I wrote it quickly right
after waking up.

What I'm asking is why does the OP want to keep his data in the memtable
exclusively?  If the goal is to "make reads fast", then just turn on row
caching.

If there's so little data that it fits in memory (300MB), and there aren't
going to be any writes past the initial small dataset, why use Cassandra?
It sounds like the wrong tool for this job.  Sounds like something that
could easily be stored in S3 and loaded in memory when the app is fired up.


On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:

> Not sure whether you're asking me or the original poster, but the more
> times data gets overwritten in a memtable, the less it has to be compacted
> later on (and even without overwrites, larger memtables result in less
> compaction).
>
> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>
> Why do you think keeping your data in the memtable is what you need to
> do?
> On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:
>
>> Then it doesn't have to (it still may, for other reasons).
>>
>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>
>> What if the commit log is disabled?
>>
>> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>>
>>> Cassandra has to flush the memtable occasionally, or the commit log
>>> grows without bounds.
>>>
>>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>>
>>> Hi,
>>>
>>> I'm running Cassandra with a very small dataset so that the data can
>>> exist on memtable only. Below are my configurations:
>>>
>>> In jvm.options:
>>>
>>> -Xms4G
>>> -Xmx4G
>>>
>>> In cassandra.yaml,
>>>
>>> memtable_cleanup_threshold: 0.50
>>> memtable_allocation_type: heap_buffers
>>>
>>> As per the documentation in cassandra.yaml, the
>>> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will each be
>>> set to 1/4 of the heap size, i.e. 1000MB
>>>
>>> According to the documentation here (http://docs.datastax.com/en/
>>> cassandra/3.0/cassandra/configuration/configCassandra_
>>> yaml.html#configCassandra_yaml__memtable_cleanup_threshold), the
>>> memtable flush will trigger if the total size of the memtable(s) goes beyond
>>> (1000+1000)*0.50=1000MB.
>>>
>>> Now if I perform several write requests which result in almost ~300MB
>>> of data, the memtable still gets flushed, since I see sstables being created
>>> on the file system (Data.db etc.), and I don't understand why.
>>>
>>> Could anyone explain this behavior and point out if I'm missing
>>> something here?
>>>
>>> Thanks,
>>>
>>> Preetika
>>>
>>>
>>>
>>
>


Re: Effect of frequent mutations / memtable

2017-05-25 Thread Thakrar, Jayesh
Hi Jan,

I would suggest looking at using Zookeeper for such a usecase.

See http://zookeeper.apache.org/doc/trunk/recipes.html for some examples.

Zookeeper is used for such purposes in Apache HBase (active master), Apache 
Kafka (active controller), Apache Hadoop, etc.

Look for the "Leader Election" usecase.
Examples
http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/
https://www.tutorialspoint.com/zookeeper/zookeeper_leader_election.htm

It's more (new) work, but it should be an elegant solution.

Hope that helps.
Jayesh

On 5/25/17, 9:19 AM, "Jan Algermissen"  wrote:

Hi,

I am using updates to a column with a TTL to represent a lock. The
owning process keeps updating the lock's TTL as long as it is running. 
If the process crashes, the lock will timeout and be deleted. Then 
another process can take over.

I have used this pattern very successfully over years with TTLs in the 
order of tens of seconds.

Now I have a use case in mind that would require much smaller TTLs, e.g.
one or two seconds, and I am worried about the increased number of
mutations and the possible effect on SSTables.

However: I'd assume these frequent updates on a cell to mostly happen in 
the memtable resulting in only occasional manifestation in SSTables.

Is that assumption correct and if so, what config parameters should I 
tweak to keep the memtable from being flushed for longer periods of 
time?


Jan





Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Jonathan Haddad
Sorry for the confusion.  That was for the OP.  I wrote it quickly right
after waking up.

What I'm asking is why does the OP want to keep his data in the memtable
exclusively?  If the goal is to "make reads fast", then just turn on row
caching.

If there's so little data that it fits in memory (300MB), and there aren't
going to be any writes past the initial small dataset, why use Cassandra?
It sounds like the wrong tool for this job.  Sounds like something that
could easily be stored in S3 and loaded in memory when the app is fired up.
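
For completeness, row caching as suggested here is enabled per table; a
minimal sketch, assuming a hypothetical keyspace/table and that
row_cache_size_in_mb has been set above 0 in cassandra.yaml (it defaults to
0, i.e. disabled):

```sql
-- Hypothetical table name; cache partition keys and all rows per partition.
-- Requires row_cache_size_in_mb > 0 in cassandra.yaml to take effect.
ALTER TABLE my_ks.my_table
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};
```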


On Thu, May 25, 2017 at 8:06 AM Avi Kivity  wrote:

> Not sure whether you're asking me or the original poster, but the more
> times data gets overwritten in a memtable, the less it has to be compacted
> later on (and even without overwrites, larger memtables result in less
> compaction).
>
> On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
>
> Why do you think keeping your data in the memtable is what you need to
> do?
> On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:
>
>> Then it doesn't have to (it still may, for other reasons).
>>
>> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>>
>> What if the commit log is disabled?
>>
>> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>>
>>> Cassandra has to flush the memtable occasionally, or the commit log
>>> grows without bounds.
>>>
>>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>>
>>> Hi,
>>>
>>> I'm running Cassandra with a very small dataset so that the data can
>>> exist on memtable only. Below are my configurations:
>>>
>>> In jvm.options:
>>>
>>> -Xms4G
>>> -Xmx4G
>>>
>>> In cassandra.yaml,
>>>
>>> memtable_cleanup_threshold: 0.50
>>> memtable_allocation_type: heap_buffers
>>>
>>> As per the documentation in cassandra.yaml, the
>>> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will each be
>>> set to 1/4 of the heap size, i.e. 1000MB
>>>
>>> According to the documentation here (
>>> http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
>>> the memtable flush will trigger if the total size of the memtable(s) goes beyond
>>> (1000+1000)*0.50=1000MB.
>>>
>>> Now if I perform several write requests which result in almost ~300MB
>>> of data, the memtable still gets flushed, since I see sstables being created
>>> on the file system (Data.db etc.), and I don't understand why.
>>>
>>> Could anyone explain this behavior and point out if I'm missing
>>> something here?
>>>
>>> Thanks,
>>>
>>> Preetika
>>>
>>>
>>>
>>
>


Partition range incremental repairs

2017-05-25 Thread Chris Stokesmore
Hi,

We are running a 7-node Cassandra 2.2.8 cluster, RF=3, and had been running
repairs with the -pr option, via a cron job that runs on each node once per
week.

We changed that because some advice on the Cassandra IRC channel said it would
cause more anticompaction, and
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html
says: "Performing partitioner range repairs by using the -pr option is
generally considered a good choice for doing manual repairs. However, this
option cannot be used with incremental repairs (default for Cassandra 2.2 and
later)."

The only problem is that our -pr repairs were taking about 8 hours, and the
non-pr repairs are now taking 24+ hours. I guess this makes sense: repairing
1/7 of the data increased to 3/7. But I was hoping to see a speed-up after the
first loop through the cluster, as each repair will be marking much more data
as repaired, right?


Is running -pr with incremental repairs really that bad? 
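
For reference, the two schedules being compared look roughly like this
(illustrative nodetool invocations for 2.2; the keyspace name is made up):

```shell
# Old cron job: full partitioner-range repair; each node repairs only its
# primary ranges, i.e. 1/7 of the data on a 7-node cluster.
nodetool repair -full -pr my_keyspace

# New approach: incremental repair (the 2.2+ default), without -pr; repairs
# every range the node replicates, i.e. RF/N = 3/7 of the data with RF=3.
nodetool repair my_keyspace
```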

Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
Not sure whether you're asking me or the original poster, but the more 
times data gets overwritten in a memtable, the less it has to be 
compacted later on (and even without overwrites, larger memtables result 
in less compaction).



On 05/25/2017 05:59 PM, Jonathan Haddad wrote:
Why do you think keeping your data in the memtable is what you need
to do?
On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:


Then it doesn't have to (it still may, for other reasons).


On 05/25/2017 05:11 PM, preetika tyagi wrote:

What if the commit log is disabled?

On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:

Cassandra has to flush the memtable occasionally, or the
commit log grows without bounds.


On 05/25/2017 03:42 AM, preetika tyagi wrote:

Hi,

I'm running Cassandra with a very small dataset so that the
data can exist on memtable only. Below are my configurations:

In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml,
memtable_heap_space_in_mb and
memtable_offheap_space_in_mb will each be set to 1/4 of
the heap size, i.e. 1000MB

According to the documentation here

(http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
the memtable flush will trigger if the total size of the
memtable(s) goes beyond (1000+1000)*0.50=1000MB.

Now if I perform several write requests which result in
almost ~300MB of data, the memtable still gets flushed, since
I see sstables being created on the file system (Data.db etc.),
and I don't understand why.

Could anyone explain this behavior and point out if I'm
missing something here?

Thanks,

Preetika


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Jonathan Haddad
Why do you think keeping your data in the memtable is what you need to do?
On Thu, May 25, 2017 at 7:16 AM Avi Kivity  wrote:

> Then it doesn't have to (it still may, for other reasons).
>
> On 05/25/2017 05:11 PM, preetika tyagi wrote:
>
> What if the commit log is disabled?
>
> On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:
>
>> Cassandra has to flush the memtable occasionally, or the commit log grows
>> without bounds.
>>
>> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>>
>> Hi,
>>
>> I'm running Cassandra with a very small dataset so that the data can
>> exist on memtable only. Below are my configurations:
>>
>> In jvm.options:
>>
>> -Xms4G
>> -Xmx4G
>>
>> In cassandra.yaml,
>>
>> memtable_cleanup_threshold: 0.50
>> memtable_allocation_type: heap_buffers
>>
>> As per the documentation in cassandra.yaml, the
>> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will each be
>> set to 1/4 of the heap size, i.e. 1000MB
>>
>> According to the documentation here (
>> http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
>> the memtable flush will trigger if the total size of the memtable(s) goes beyond
>> (1000+1000)*0.50=1000MB.
>>
>> Now if I perform several write requests which result in almost ~300MB of
>> data, the memtable still gets flushed, since I see sstables being created on
>> the file system (Data.db etc.), and I don't understand why.
>>
>> Could anyone explain this behavior and point out if I'm missing something
>> here?
>>
>> Thanks,
>>
>> Preetika
>>
>>
>>
>


Effect of frequent mutations / memtable

2017-05-25 Thread Jan Algermissen

Hi,

I am using updates to a column with a TTL to represent a lock. The
owning process keeps updating the lock's TTL as long as it is running. 
If the process crashes, the lock will timeout and be deleted. Then 
another process can take over.


I have used this pattern very successfully over years with TTLs in the 
order of tens of seconds.


Now I have a use case in mind that would require much smaller TTLs, e.g.
one or two seconds, and I am worried about the increased number of
mutations and the possible effect on SSTables.


However: I'd assume these frequent updates on a cell to mostly happen in 
the memtable resulting in only occasional manifestation in SSTables.


Is that assumption correct and if so, what config parameters should I 
tweak to keep the memtable from being flushed for longer periods of 
time?



Jan
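
A minimal sketch of the TTL-lock pattern described above (hypothetical
schema, not the poster's actual one). In Cassandra the statements would be
something like `INSERT INTO locks (name, owner) VALUES (?, ?) IF NOT EXISTS
USING TTL 30;` to acquire and `UPDATE locks USING TTL 30 SET owner = ? WHERE
name = ? IF owner = ?;` as the heartbeat; below, an in-memory dict stands in
for the table so the acquire/renew/expire logic is visible:

```python
import time

class TtlLocks:
    """In-memory stand-in for a Cassandra table of TTL-based locks."""

    def __init__(self):
        self._locks = {}  # name -> (owner, expiry timestamp)

    def acquire(self, name, owner, ttl, now=None):
        """Take the lock if it is absent or its TTL has expired."""
        now = time.time() if now is None else now
        current = self._locks.get(name)
        if current is None or current[1] <= now:
            self._locks[name] = (owner, now + ttl)
            return True
        return False

    def renew(self, name, owner, ttl, now=None):
        """Heartbeat: extend the TTL, but only while we still hold the lock."""
        now = time.time() if now is None else now
        current = self._locks.get(name)
        if current is not None and current[0] == owner and current[1] > now:
            self._locks[name] = (owner, now + ttl)
            return True
        return False
```

If the owner crashes and stops renewing, the entry simply expires and another
process's acquire succeeds, which mirrors the TTL-expiry behaviour in
Cassandra.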

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity

Then it doesn't have to (it still may, for other reasons).


On 05/25/2017 05:11 PM, preetika tyagi wrote:

What if the commit log is disabled?

On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:


Cassandra has to flush the memtable occasionally, or the commit
log grows without bounds.


On 05/25/2017 03:42 AM, preetika tyagi wrote:

Hi,

I'm running Cassandra with a very small dataset so that the data
can exist on memtable only. Below are my configurations:

In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml,
memtable_heap_space_in_mb and memtable_offheap_space_in_mb will
each be set to 1/4 of the heap size, i.e. 1000MB

According to the documentation here
(http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
the memtable flush will trigger if the total size of the memtable(s)
goes beyond (1000+1000)*0.50=1000MB.

Now if I perform several write requests which result in almost
~300MB of data, the memtable still gets flushed, since I see
sstables being created on the file system (Data.db etc.), and I don't
understand why.

Could anyone explain this behavior and point out if I'm missing
something here?

Thanks,

Preetika







Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread preetika tyagi
What if the commit log is disabled?

On May 25, 2017 4:31 AM, "Avi Kivity"  wrote:

> Cassandra has to flush the memtable occasionally, or the commit log grows
> without bounds.
>
> On 05/25/2017 03:42 AM, preetika tyagi wrote:
>
> Hi,
>
> I'm running Cassandra with a very small dataset so that the data can
> exist on memtable only. Below are my configurations:
>
> In jvm.options:
>
> -Xms4G
> -Xmx4G
>
> In cassandra.yaml,
>
> memtable_cleanup_threshold: 0.50
> memtable_allocation_type: heap_buffers
>
> As per the documentation in cassandra.yaml, the
> *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb* will each be
> set to 1/4 of the heap size, i.e. 1000MB
>
> According to the documentation here (http://docs.datastax.com/en/
> cassandra/3.0/cassandra/configuration/configCassandra_
> yaml.html#configCassandra_yaml__memtable_cleanup_threshold), the memtable
> flush will trigger if the total size of the memtable(s) goes beyond
> (1000+1000)*0.50=1000MB.
>
> Now if I perform several write requests which result in almost ~300MB of
> data, the memtable still gets flushed, since I see sstables being created on
> the file system (Data.db etc.), and I don't understand why.
>
> Could anyone explain this behavior and point out if I'm missing something
> here?
>
> Thanks,
>
> Preetika
>
>
>


Re: How to avoid flush if the data can fit into memtable

2017-05-25 Thread Avi Kivity
Cassandra has to flush the memtable occasionally, or the commit log 
grows without bounds.



On 05/25/2017 03:42 AM, preetika tyagi wrote:

Hi,

I'm running Cassandra with a very small dataset so that the data can 
exist on memtable only. Below are my configurations:


In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml,
memtable_heap_space_in_mb and memtable_offheap_space_in_mb will each
be set to 1/4 of the heap size, i.e. 1000MB


According to the documentation here
(http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
the memtable flush will trigger if the total size of the memtable(s) goes
beyond (1000+1000)*0.50=1000MB.
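
The arithmetic above can be sketched as follows, using the rounded values
from the post (a 4GB heap taken as 1000MB per quarter). Note this is only the
memtable-space trigger; flushes can also be forced by other limits, such as
total commit log space:

```python
# Flush-trigger arithmetic from the post above (values as rounded there).
memtable_heap_space_in_mb = 1000     # ~1/4 of a 4GB heap
memtable_offheap_space_in_mb = 1000  # same default for the off-heap pool
memtable_cleanup_threshold = 0.50

flush_trigger_mb = (memtable_heap_space_in_mb
                    + memtable_offheap_space_in_mb) * memtable_cleanup_threshold
print(flush_trigger_mb)  # 1000.0
```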


Now if I perform several write requests which result in almost ~300MB
of data, the memtable still gets flushed, since I see sstables being
created on the file system (Data.db etc.), and I don't understand why.


Could anyone explain this behavior and point out if I'm missing 
something here?


Thanks,

Preetika