Re: underutilized servers

2021-03-06 Thread Bowen Song

Hi Erick,


Please allow me to disagree on this. A node dropping reads and writes 
doesn't always mean the disk is the bottleneck. I have seen the same 
behaviour when a node had excessive STW GCs and lots of timeouts, and 
I have also seen writes get dropped because the size of the mutation 
exceeded half of the commit log segment size. I'd like to keep an open 
mind until the diagnosis is supported by evidence, so we don't end up 
wasting time (and money) trying to fix an issue that doesn't exist in 
the first place.
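
To make the commit log case concrete, here is a minimal sketch of the
size check. It assumes a commit log segment size of 32 MiB (please check
cassandra.yaml on your own nodes) and that the mutation limit is half of
one segment, as described above. The function names are made up for
illustration and are not part of Cassandra.

```python
# Illustrative only: mirrors the "half of the commit log segment" rule.
# 32 MiB is an assumed segment size, not a value read from a real node.
COMMITLOG_SEGMENT_SIZE_MB = 32


def max_mutation_size_bytes(segment_size_mb: int = COMMITLOG_SEGMENT_SIZE_MB) -> int:
    """Largest mutation the commit log accepts: half of one segment."""
    return segment_size_mb * 1024 * 1024 // 2


def would_be_rejected(mutation_size_bytes: int,
                      segment_size_mb: int = COMMITLOG_SEGMENT_SIZE_MB) -> bool:
    """True if a mutation of this size exceeds the limit."""
    return mutation_size_bytes > max_mutation_size_bytes(segment_size_mb)


if __name__ == "__main__":
    print(max_mutation_size_bytes())               # limit in bytes
    print(would_be_rejected(20 * 1024 * 1024))     # a 20 MiB mutation
    print(would_be_rejected(1 * 1024 * 1024))      # a 1 MiB mutation
```

A write rejected this way has nothing to do with disk throughput, which
is why I would check for it before buying faster disks.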



Cheers,

Bowen

On 05/03/2021 23:09, Erick Ramirez wrote:
The tpstats you posted show that the node is dropping reads and writes, 
which means that your disk can't keep up with the load, i.e. your 
disk is the bottleneck. If you haven't already, place data and 
commitlog on separate disks so they're not competing for the same IO 
bandwidth. Note that it's OK to have them on the same disk/volume if 
you have NVMe SSDs since it's a lot more difficult to saturate them.


The challenge with monitoring is that typically it's only checking 
disk stats every 5 minutes (for example). But your app traffic is 
bursty in nature, so stats averaged out over a period of time are 
irrelevant because the only thing that matters is what the disk IO is 
at the time you hit peak loads.
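
A small sketch of why an averaged figure can hide a burst. The samples
below are synthetic numbers invented for illustration (one utilisation
reading per second over a 5-minute window), not measurements from any
real node.

```python
# Synthetic per-second disk utilisation (%) over one 5-minute window.
# All numbers are invented for illustration.
from statistics import mean

WINDOW_SECONDS = 300
BURST_SECONDS = 15

samples = [10.0] * WINDOW_SECONDS
for second in range(100, 100 + BURST_SECONDS):
    samples[second] = 100.0          # disk saturated during the burst

average = mean(samples)
peak = max(samples)
saturated = sum(1 for s in samples if s >= 100.0)

print(f"5-minute average: {average:.1f}%")
print(f"peak:             {peak:.1f}%")
print(f"seconds at 100%:  {saturated}")
```

The average looks harmless while the disk was fully busy for 15 seconds,
which is the period in which requests would be dropped.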


The dropped reads and mutations tell you the node is overloaded. 
Provided your nodes are configured correctly, the only way out of this 
situation is to correctly size your cluster and add more nodes -- your 
cluster needs to be sized for peak loads, not average throughput. Cheers!


Re: underutilized servers

2021-03-06 Thread Bowen Song

Hi Attila,


Addressing your data modelling issue is definitely important, and this 
alone may be enough to solve all the issues you have with Cassandra.


 * "Since these are VMs, is there any chance they are competing for
   resources on the same physical host?"
   We are splitting the physical hardware into 2 VMs - and resources
   (cpu cores, disks, ram) all assigned in a dedicated fashion to the
   VMs without intersection

How do you split? Having the number of cores in all VMs sum to the total 
number of physical CPU cores is not enough, because context switches and 
possible thread contention will waste CPU cycles. Since you have also 
said 8-12% CPU time is spent in sys mode, I think it warrants an 
investigation.


Also, do you expose physical disks to the VM or use disk image files? 
Disk image files can be slow, especially for high IOPS random reads.


Personally, I wouldn't recommend running a database on a VM other than 
for dev/testing/etc. purposes. If possible, you should try to add a node 
running on a bare metal server of a similar spec to the VM, and see if 
there's any noticeable performance difference between this bare metal 
node and the VM nodes.



 * The bandwidth limit is 1Gbit/sec (so roughly 120 MB/sec) BUT it is the
   limit of the physical host - so our 2 VMs are competing here. Possible
   that the Cassandra VM has ~50-70% of it...

A 50-70% utilization of a 1 Gbps network interface on average doesn't 
sound good at all. That's over 60 MB/s of network traffic constantly. Can 
you investigate why this is happening? Do you really read/write that 
much? Or is it something else?
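
For reference, the arithmetic behind the "over 60 MB/s" figure, as a
minimal sketch. It uses decimal units (1 Gbps = 10^9 bits per second)
and ignores protocol overhead, so treat the results as upper bounds.

```python
# Convert a link speed and a utilisation fraction into MB/s.
# Decimal units throughout: 1 Gbps = 1e9 bits/s, 1 MB = 1e6 bytes.

def link_capacity_mb_per_s(gbps: float) -> float:
    """Raw capacity of a link in megabytes per second."""
    return gbps * 1e9 / 8 / 1e6


def traffic_mb_per_s(gbps: float, utilisation: float) -> float:
    """Traffic in MB/s at a given utilisation (0.0 - 1.0)."""
    return link_capacity_mb_per_s(gbps) * utilisation


capacity = link_capacity_mb_per_s(1.0)
low = traffic_mb_per_s(1.0, 0.5)
high = traffic_mb_per_s(1.0, 0.7)

print(f"capacity: {capacity:.1f} MB/s")
print(f"50% utilisation: {low:.1f} MB/s")
print(f"70% utilisation: {high:.1f} MB/s")
```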



 * "nodetool tpstats"
   whooa I never used it, we definitely need some learning here to even
   understand the output... :-) But I copy that here to the bottom ...
   maybe clearly shows something to someone who can read it...

I noticed that you are using counters in Cassandra. I have to say that I 
haven't had a good experience with Cassandra counters. An article which 
I read recently may convince you to get rid of them. I also don't think 
counters are something the Cassandra developers are focused on, because 
things like CASSANDRA-6506 have been sitting there for many years.


Use your database software for its strengths, not its weaknesses. 
You have Cassandra, but you don't have to use every feature in 
Cassandra. Sometimes another technology may be more suitable for 
something that Cassandra can do but isn't very good at.



Cheers,

Bowen

On 05/03/2021 18:37, Attila Wind wrote:


Thanks for the answers @Sean and @Bowen !!!

First of all, this article describes something very similar to what we 
are experiencing - let me share it:

https://www.senticore.com/overcoming-cassandra-write-performance-problems/
We are studying it now.

Furthermore

  * yes, we have some level of unbalanced data which needs to be
    improved - this is on our backlog so should be done
  * and yes, we do see clearly that this unbalanced data is slowing
    down everything in Cassandra (there is proof of it in our
    Prometheus+Grafana based monitoring)
  * we will definitely do this optimization now (luckily we have a plan
    already)

@Sean:

  * "Since these are VMs, is there any chance they are competing for
    resources on the same physical host?"
    We are splitting the physical hardware into 2 VMs - and resources
    (cpu cores, disks, ram) are all assigned in a dedicated fashion to
    the VMs without intersection
    BUT!!
    You are right... There is one thing we are sharing: network
    bandwidth... and that one certainly does not show up in the
    "iowait" part. We will definitely analyze further in this
    direction, because as far as I can see from the monitoring,
    yeppp, we might be hitting the wall here
  * consistency level: we are using LOCAL_ONE
  * "Does the app use prepared statements that are only prepared once
    per app invocation?"
    Yes and yes :-)
  * "Any LWT/"if exists" in your code?"
    No. We go with RF=2, so we cannot even use this (as LWT goes with
    QUORUM and in our case this would mean we could not tolerate
    losing a node... not good... so no)

@Bowen:

  * The bandwidth limit is 1Gbit/sec (so roughly 120 MB/sec) BUT it is
    the limit of the physical host - so our 2 VMs are competing here.
    Possible that the Cassandra VM has ~50-70% of it...
  * The CPU's "system" value shows 8-12%
  * "nodetool tpstats"
    whooa I never used it, we definitely need some learning here to
    even understand the output... :-) But I copy it here at the
    bottom ... maybe it clearly shows something to someone who can
    read it...

so, "nodetool tpstats" from one of the nodes

Pool Name              Active   Pending   Completed   Blocked   All time blocked
ReadStage                   0         0     6248406         0                  0
CompactionExecutor          0         0      168525         0                  0
Mutatio

Re: underutilized servers

2021-03-06 Thread Attila Wind

Thanks Bowen,

 * "How do you split?"
   It is challenging to answer briefly, but let me try: the physical
   host has cores with idx 0 - 11 (6 physical and 6 virtual, in pairs:
   0,6 belong together, then 1,7, then 2,8 and so on)
   What we do is that in the virt-install command we use --cpu
   host-passthrough --cpuset={{virtinst_cpu_set}} --vcpus=6
   where {{virtinst_cpu_set}} is
   - 0,6,1,7,2,8 - for CassandraVM
   - 3,9,4,10,5,11 - for the other VM
   (we split the physical host into 2 VMs)

 * "do you expose physical disks to the VM or use disk image files"
   No images. The physical host has 2 spinning disks and 1 SSD drive.
   CassandraVM is explicitly assigned 1 of the spinning disks, and it
   also gets a partition of the SSD (which is used for commit
   logs only, so that is separated from the data)

 * "A 50-70% utilization of a 1 Gbps network interface on average
   doesn't sound good at all."
   Yes, this is weird... Especially because e.g. if we bring down a
   node, the other 2 nodes (we go with RF=2) are producing ~600 MB of
   hint files / minute
   And assuming the hint files are basically the saved "network traffic"
   while the node is down, this would still just give 10 MB/sec ...
   OK, these are just the replicated updates, there are also reads, and
   of course the App layer is also reading, but even with that in mind
   it does not add up... So we will try to do further analysis here
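
Two quick sanity checks on the numbers in the list above, as a rough
sketch. It assumes that logical CPU i and i + 6 are siblings on the same
physical core (as described for this host) and that the hint volume is
in megabytes; the helper names are made up for illustration. On a real
Linux host the sibling pairs can be read from
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

```python
# Illustrative checks only; the topology and units are assumptions
# taken from the description above, not read from a real host.

PHYSICAL_CORES = 6  # assumption: logical CPU i and i + 6 share a physical core


def parse_cpuset(text):
    """Turn a --cpuset value such as "0,6,1,7" into a set of logical CPU ids."""
    return {int(cpu) for cpu in text.split(",")}


def physical_cores(cpuset):
    """Map logical CPU ids to the physical cores they sit on."""
    return {cpu % PHYSICAL_CORES for cpu in cpuset}


cassandra_vm = parse_cpuset("0,6,1,7,2,8")
other_vm = parse_cpuset("3,9,4,10,5,11")

# The pinning is clean if the VMs share neither logical CPUs nor physical cores.
shared_cpus = cassandra_vm & other_vm
shared_cores = physical_cores(cassandra_vm) & physical_cores(other_vm)
print("shared logical CPUs:  ", sorted(shared_cpus))
print("shared physical cores:", sorted(shared_cores))

# Hint volume: ~600 MB per minute, assuming the figure is in megabytes.
hints_mb_per_s = 600 / 60
# Lower end of the observed traffic: 50% of a 1 Gbit/s link (125 MB/s).
observed_low_mb_per_s = 0.5 * 125
print("hints:", hints_mb_per_s, "MB/s, observed at least:",
      observed_low_mb_per_s, "MB/s")
```

So the pinning itself does not overlap, and the replicated writes alone
explain only a fraction of the observed network traffic.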

Thanks also for the article regarding the Counter tables!
Actually, we have known for a while that there are "interesting" things 
going on around the Counter tables; it is surprising how difficult it is 
to find info regarding this topic...
I personally tried to look around here several times and always just 
got the same information over and over in posts...


Moving away from counters would not be bad, especially because of the 
difficulties around DELETEing them (we also feel it); however, I do not 
see any obvious migration strategy here...
But maybe let me ask this in a separate question. That might make more 
sense... :-)


Thanks again - and thanks to others as well

It looks like mastering "nodetool tpstats" and the Cassandra thread 
pools would be worth some time... :-)



Attila Wind

http://www.linkedin.com/in/attilaw 
Mobile: +49 176 43556932


On 06.03.2021 13:03, Bowen Song wrote:



moving away from Counters - strategy?

2021-03-06 Thread Attila Wind

Hi guys,

We do use Counter tables a lot because in our app we have several things 
to count (business logic)


The more time we work with Cassandra, the more often we hear: "you 
should not use counter tables because ..."
Yes, we also feel here and there that the trade-off is too restrictive - 
what hurts us nowadays is that deleting counters does not seem to be 
that simple... We also miss the TTL possibility a lot.


But I have to confess I do not see an obvious migration strategy here...
What bothers me, for example, is concurrency and the wrong results it 
can lead to. Namely:

If I want to achieve the equivalent of "UPDATE table SET mycounter = 
mycounter + x WHERE ..." with a traditional table (with an int column), 
I need to do this:
1. read the value of "mycounter"
2. add x to the value I read (in memory)
3. update mycounter = new value

Needless to say, if there is a race condition, with ThreadA and ThreadB 
executing the above sequence at about the same time, then the mycounter 
value will be wrong...
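
To illustrate the lost update, here is a tiny deterministic sketch of
that interleaving. A plain dict stands in for the table, and the read
and write helpers are hypothetical; this is not driver code.

```python
# A plain dict stands in for the table row; this is not driver code.
table = {"mycounter": 10}


def read():
    """Step 1: read the current value."""
    return table["mycounter"]


def write(value):
    """Step 3: store the new value, overwriting whatever is there."""
    table["mycounter"] = value


# Interleaving: both threads read before either of them writes.
a_seen = read()        # ThreadA, step 1
b_seen = read()        # ThreadB, step 1
write(a_seen + 5)      # ThreadA, steps 2-3 with x = 5
write(b_seen + 3)      # ThreadB, steps 2-3 with x = 3

# Expected 10 + 5 + 3 = 18, but ThreadA's increment has been overwritten.
print(table["mycounter"])
```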


I started to wonder: how do you solve this problem?
Is anyone aware of any nice post/article regarding migration strategy - 
stepping away from counters?


thanks!


--
Attila Wind

http://www.linkedin.com/in/attilaw 
Mobile: +49 176 43556932




Re: moving away from Counters - strategy?

2021-03-06 Thread Jeff Jirsa

You can do this with conditional (CAS) updates - update ... set c=y if c=x

Requires serial writes and serial reads, so a bit more expensive, but allows 
TTL.
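
A minimal sketch of the retry loop this implies on the client side. The
FakeTable class is an in-memory stand-in that only emulates the
semantics of "update ... set c=y if c=x"; it is an assumption for
illustration and not the Cassandra driver API. With a real cluster the
conditional statement reports whether it was applied, and the client
retries when it was not.

```python
import threading


class FakeTable:
    """In-memory stand-in for one row; NOT the Cassandra driver."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()   # emulates the atomicity of a CAS write

    def read(self):
        with self._lock:
            return self._value

    def update_if(self, expected, new):
        """Emulates: update ... set c=new if c=expected. Returns True if applied."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(table, x):
    """Read, compute, conditionally write; retry until applied. Returns retries."""
    retries = 0
    while True:
        current = table.read()
        if table.update_if(current, current + x):
            return retries
        retries += 1


def run(threads=8, increments=200):
    """Run concurrent increments of 1 and return the final value."""
    table = FakeTable(0)

    def worker():
        for _ in range(increments):
            increment(table, 1)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return table.read()


if __name__ == "__main__":
    print(run())   # equals threads * increments, no lost updates
```

The final value is always correct, at the price of a retry whenever
another writer got in between the read and the conditional write.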


> On Mar 6, 2021, at 8:03 AM, Attila Wind  wrote:


Re: moving away from Counters - strategy?

2021-03-06 Thread Attila Wind
Ahh forgot to mention we have RF=2, sorry!

LWT requires RF >= 3, otherwise we cannot tolerate losing a node (because
LOCAL_QUORUM is at work in the background, which you cannot really
change AFAIK...)

Or am I wrong?
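
The arithmetic behind this concern, as a small sketch: a quorum is
floor(RF / 2) + 1 replicas, so the number of replicas that may be down
while a quorum is still reachable is RF minus the quorum size. The
function names are made up for illustration.

```python
def quorum(rf: int) -> int:
    """Replicas needed for a quorum at replication factor rf."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return rf - quorum(rf)


for rf in (2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```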

Plus, in a highly concurrent setup where many writers hit the same PK,
this optimistic locking approach would end up in lots of retries, I'm
afraid, eventually making this strategy much more expensive.

Or am I wrong here too?


Cheers
Attila

On Sun, 7 Mar 2021, 05:20 Jeff Jirsa,  wrote:
