Re: moving from 4.0-alpha4 to 4.0.1

2021-10-09 Thread Attila Wind

Thank you Paulo for your detailed answer!
I was not monitoring NEWS.txt in the Git repo so far, but that file 
definitely has the info I was looking for.


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 09.10.2021 at 15:07, Paulo Motta wrote:

Hi Attila,

Minor version upgrades are generally fine to do in-place, unless 
otherwise specified in NEWS.txt 
<https://github.com/apache/cassandra/blob/cassandra-4.0.1/NEWS.txt> 
for the specific versions you're upgrading. Cassandra is designed with 
this goal in mind, and potentially disruptive changes can only be 
introduced in major versions, which require a little more care during 
the upgrade process.


It's definitely safe to do an in-place, one-node-at-a-time upgrade for 
minor versions in the same major series (i.e. 4.0-alpha to 4.0.1). 
Nevertheless, it doesn't hurt to take a global snapshot "just in case", 
so you can roll back if you run into an unexpected issue, but this is 
just extra safety and not strictly required.


Unfortunately there's no official upgrade guide yet; this is something 
the community is working on providing soon, but you can find some 
unofficial ones with a quick Google search.


Major upgrades are also designed to be harmless, but a little more 
preparation is required to ensure a smooth ride due to potentially 
non-compatible changes. I've written an upgrade guide some time ago 
which can be useful to prepare for a major upgrade, but it also applies 
to minor upgrades to ensure extra safety during the process: 
http://monkeys.chaordic.com.br/operation/2014/04/11/zero-downtime-cassandra-upgrade.html


Cheers and good luck!

Paulo

On Sat, 9 Oct 2021 at 06:56, Attila Wind wrote:


Hi all,

I have 2 quick questions

1. We have a cluster running 4.0-alpha4. Now 4.0.1 is out and
obviously it would make a lot of sense to switch to this version.
Does anyone know if we can do it simply "in place"? I mean, we just
upgrade the software and restart? Or would it not work / would it be
dangerous due to some storage layer incompatibilities or other
risk factors? So better to run a (usual) data migration process..?

2. Actually the above brought up a more generic question: is the
community maintaining any kind of guide/readme/whatever one can
use to find answers to similar questions? As a user I see the
changelog and that's cool, but that is not providing obvious
answers (of course). So I mean some sort of migration hints/guide.

thanks!

-- 
Attila Wind


http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




moving from 4.0-alpha4 to 4.0.1

2021-10-09 Thread Attila Wind

Hi all,

I have 2 quick questions

1. We have a cluster running 4.0-alpha4. Now 4.0.1 is out and obviously 
it would make a lot of sense to switch to this version. Does anyone know 
if we can do it simply "in place"? I mean, we just upgrade the software 
and restart? Or would it not work / would it be dangerous due to some 
storage layer incompatibilities or other risk factors? So better to run 
a (usual) data migration process..?


2. Actually the above brought up a more generic question: is the 
community maintaining any kind of guide/readme/whatever one can use to 
find answers to similar questions? As a user I see the changelog and 
that's cool, but that is not providing obvious answers (of course). So I 
mean some sort of migration hints/guide.


thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: Integration tests - is anything wrong with Cassandra beta4 docker??

2021-03-10 Thread Attila Wind


Yes this definitely explains the (much) longer startup - however I'm still not 
sure about the stability issues... Anyway, thanks Erik for the info!


Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 10.03.2021 at 14:26, Erik Merkle wrote:
I opened a ticket when beta4 was released, but no Docker image was 
available. It turned out the reason the image wasn't published is that 
the tests DockerHub uses to verify the builds were timing out, so it 
may be a similar issue you are running into. The ticket is here:


https://github.com/docker-library/cassandra/issues/221#issuecomment-761187461


DockerHub ended up altering some of the default Cassandra settings for 
their verification tests only, to allow the tests to pass. For your 
integration tests, you may want to do something similar.


On Wed, Mar 10, 2021 at 5:08 AM Attila Wind  
wrote:


Hi Guys,

We are using dockerized Cassandra to run our integration tests.
So far we were using the 4.0-beta2 docker image

(https://hub.docker.com/layers/cassandra/library/cassandra/4.0-beta2/images/sha256-77aa30c8e82f0e761d1825ef7eb3adc34d063b009a634714af978268b71225a4?context=explore)
Recently I tried to switch to the 4.0-beta4 docker but noticed a
few problems...

  * the image starts much, much slower for me (4x more time to come up
and be connectable)
  * unlike the beta2 version, beta4 barely survives all of our test
cases... typically it gets stuck and fails with timeouts at
around 60-80% of completed test cases

Anyone similar / related experiences maybe?

thanks

-- 
Attila Wind


http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




--
Erik Merkle
e. erik.mer...@datastax.com
w. www.datastax.com



Integration tests - is anything wrong with Cassandra beta4 docker??

2021-03-10 Thread Attila Wind

Hi Guys,

We are using dockerized Cassandra to run our integration tests.
So far we were using the 4.0-beta2 docker image 
(https://hub.docker.com/layers/cassandra/library/cassandra/4.0-beta2/images/sha256-77aa30c8e82f0e761d1825ef7eb3adc34d063b009a634714af978268b71225a4?context=explore)
Recently I tried to switch to the 4.0-beta4 docker but noticed a few 
problems...


 * the image starts much, much slower for me (4x more time to come up and
   be connectable)
 * unlike the beta2 version, beta4 barely survives all of our test
   cases... typically it gets stuck and fails with timeouts at around
   60-80% of completed test cases

Anyone similar / related experiences maybe?

thanks

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: moving away from Counters - strategy?

2021-03-06 Thread Attila Wind
Ahh forgot to mention we have RF=2, sorry!

LWT requires RF >= 3, otherwise we cannot tolerate losing a node (because
LOCAL_QUORUM is working in the background, which you cannot really
change AFAIK...)

Or am I wrong?

Plus, in a highly concurrent setup writing the same PK, this optimistic
locking fashion would end up in lots of retries, I'm afraid - eventually
making this strategy much more expensive.

Or am I wrong here too?
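
(For reference, a minimal CQL sketch of the conditional-update pattern
discussed below - the table and column names are hypothetical, and the
retry loop around the [applied] check lives in the application:)

-- hypothetical non-counter replacement for a counter table
CREATE TABLE IF NOT EXISTS page_stats (
    page_id   uuid PRIMARY KEY,
    hit_count int
);

-- 1. read the current value, remember it as :old
SELECT hit_count FROM page_stats WHERE page_id = :page_id;

-- 2. conditional (LWT) write: only applied if nobody changed it meanwhile;
--    if [applied] = false in the result, re-read and retry
UPDATE page_stats
   SET hit_count = :old_plus_x
 WHERE page_id = :page_id
    IF hit_count = :old;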


Cheers
Attila

On Sun, 7 Mar 2021, 05:20 Jeff Jirsa,  wrote:

>
> You can do this with conditional (CAS) updates - update ... set c=y if c=x
>
> Requires serial writes and serial reads, so a bit more expensive, but
> allows TTL.
>
>
> On Mar 6, 2021, at 8:03 AM, Attila Wind  wrote:
>
> 
>
> Hi guys,
>
> We do use Counter tables a lot because in our app we have several things
> to count (business logic)
>
> The more time we work with Cassandra, the more we keep hearing: "you
> should not use counter tables because ..."
> Yes, we also feel here and there that the trade-off is too restrictive -
> for us what hurts nowadays is that deleting counters seems not that
> simple... Also we miss the TTL possibility a lot.
>
> But I have to confess I do not see an obvious migration strategy here...
> What bothers me e.g.: concurrency, and wrong results thanks to that
> namely
>
> If I want to achieve what "UPDATE table SET mycounter = mycounter +
> x WHERE ..." does
> with a traditional table (with an int column), I need to do this:
> 1. read the value of "mycounter"
> 2. add x to the value I read (in memory)
> 3. update mycounter = new value
>
> Needless to say, if I have a race condition - so ThreadA and ThreadB are
> executing the above sequence at ~ the same time - then the mycounter value will
> be wrong...
>
> I started to wonder: how do you solve this problem?
> Is anyone aware of any nice post/article regarding migration strategy -
> stepping away from counters?
>
> thanks!
>
>
> --
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +49 176 43556932
>
>
>


moving away from Counters - strategy?

2021-03-06 Thread Attila Wind

Hi guys,

We do use Counter tables a lot because in our app we have several things 
to count (business logic)


The more time we work with Cassandra, the more we keep hearing: "you 
should not use counter tables because ..."
Yes, we also feel here and there that the trade-off is too restrictive - 
for us what hurts nowadays is that deleting counters seems not that 
simple... Also we miss the TTL possibility a lot.


But I have to confess I do not see an obvious migration strategy here...
What bothers me is e.g. concurrency, and wrong results because of it -
namely:

If I want to achieve what "UPDATE table SET mycounter = mycounter 
+ x WHERE ..." does

with a traditional table (with an int column), I need to do this:
1. read the value of "mycounter"
2. add x to the value I read (in memory)
3. update mycounter = new value

Needless to say, if I have a race condition - so ThreadA and ThreadB 
are executing the above sequence at ~ the same time - then the mycounter 
value will be wrong...


I started to wonder: how do you solve this problem?
Is anyone aware of any nice post/article regarding migration strategy - 
stepping away from counters?


thanks!


--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: underutilized servers

2021-03-06 Thread Attila Wind

Thanks Bowen,

 * "How do you split?"
   challenging to answer short, but let me try: physical host has cores
   from idx 0 - 11 (6 physical and 6 virtual in pairs - they are in
   pairs as 0,6 belongs together, then 1,7 and then 2,8 and so on)
   What we do is that in the virt-install command we use --cpu
   host-passthrough --cpuset={{virtinst_cpu_set}} --vcpus=6
   where {{virtinst_cpu_set}} is
   - 0,6,1,7,2,8 - for CassandraVM
   - 3,9,4,10,5,11 - for the other VM
   (we split the physical host into 2 VMs)

 * "do you expose physical disks to the VM or use disk image files"
   no images, physical host has 2 spinning disks and 1 SSD drive
   CassandraVM gets assigned explicitly 1 of the spinning disks and she
   also gets assigned a partition of the SSD (which is used for commit
   logs only so that is separated from the data)

 * "A 50-70% utilization of a 1 Gbps network interface on average
   doesn't sound good at all."
   Yes, this is weird... Especially because e.g. if we bring down a
   node, the other 2 nodes (we go with RF=2) are producing ~600Mb hints
   files / minute
    And assuming hint files are basically the saved "network traffic"
    while the node is down, this would still just give 10Mb/sec ...
    OK, these are just the replicated updates, there are also reads and of
    course the App layer is also reading, but even with that in mind it does
    not add up... So we will try to do further analysis here

Thanks also for the article regarding the Counter tables!
Actually we have known for a while that there are "interesting" things 
going on around the Counter tables - it is surprising how difficult it is 
to find info regarding this topic...
I personally tried to look around several times and always just kept 
getting the same information in posts...


Moving away from counters would not be bad, especially because of the 
difficulties around DELETEing them (we also feel it), however I do not 
see any obvious migration strategy here...
But maybe let me ask this as a separate question. Might make more 
sense... :-)


Thanks again - and thanks to others as well

It looks like mastering "nodetool tpstats" and the Cassandra thread pools 
would be worth some time... :-)



Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 06.03.2021 at 13:03, Bowen Song wrote:


Hi Attila,


Addressing your data modelling issue is definitely important, and this 
alone may be enough to solve all the issues you have with Cassandra.


  * "Since these are VMs, is there any chance they are competing for
resources on the same physical host?"
We are splitting the physical hardware into 2 VMs - and resources
(cpu cores, disks, ram) all assigned in a dedicated fashion to the
VMs without intersection

How do you split? Having the number of cores in all VMs sum to the total 
number of physical CPU cores is not enough, because context switches and 
possible thread contention will waste CPU cycles. Since you have also 
said 8-12% of CPU time is spent in sys mode, I think it warrants an 
investigation.


Also, do you expose physical disks to the VM or use disk image files? 
Disk image files can be slow, especially for high IOPS random reads.


Personally, I wouldn't recommend running a database on a VM other than 
for dev/testing/etc. purposes. If possible, you should try to add a 
node running on a bare metal server of a similar spec as the VM, and 
see if there are any noticeable performance differences between this 
bare metal node and the VM nodes.



  * The bandwidth limit is 1Gbit/sec (so ~120MB/sec) BUT it is the
limit of the physical host - so our 2 VMs are competing here. Possibly
the Cassandra VM has ~50-70% of it...

A 50-70% utilization of a 1 Gbps network interface on average doesn't 
sound good at all. That is over 60MB/s of network traffic constantly. Can 
you investigate why this is happening? Do you really read/write that 
much? Or is it something else?



  * "nodetool tpstats"
whooa, I never used it, we definitely need some learning here to
even understand the output... :-) But I'll copy it here at the
bottom ... maybe it clearly shows something to someone who can read it...

I noticed that you are using counters in Cassandra. I have to say that 
I haven't had a good experience with Cassandra counters. An article 
<https://ably.com/blog/cassandra-counter-columns-nice-in-theory-hazardous-in-practice> 
which I read recently may convince you to get rid of it. I also don't 
think counter is something the Cassandra developers are focused on, 
because things like CASSANDRA-6506 
<https://issues.apache.org/jira/browse/CASSANDRA-6506> have been 
sitting there for many years.


Use your database software for its strengths, not its weaknesses. 
You have Cassandra, but you don't have to use every feature in 
Cassandra. Sometimes ano

Re: underutilized servers

2021-03-05 Thread Attila Wind
[tail of a truncated "nodetool tpstats" paste - the remaining message types
(..._REQ, REPLICATION_DONE_REQ, PAXOS_PROPOSE_RSP) show all zeros]



Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 05.03.2021 at 17:45, Bowen Song wrote:


Based on my personal experience, the combination of slow read queries 
and low CPU usage is often an indicator of bad table schema design 
(e.g.: large partitions) or bad query (e.g. without partition key). 
Check the Cassandra logs first, is there any long stop-the-world GC? 
tombstone warning? anything else that's out of ordinary? Check the 
output from "nodetool tpstats", is there any pending or blocked tasks? 
Which thread pool(s) are they in? Is there a high number of dropped 
messages? If you can't find anything useful from the Cassandra server 
logs and "nodetool tpstats", try to get a few slow queries from your 
application's log, and run them manually in the cqlsh. Are the results 
very large? How long do they take?



Regarding some of your observations:

> CPU load is around 20-25% - so we have lots of spare capacity

Is it a case of very few threads each using nearly 100% of a CPU core? If so, 
what are those threads? (I find the ttop command from the sjk tool 
<https://github.com/aragozin/jvm-tools> very helpful)


> network load is around 50% of the full available bandwidth

This sounds alarming to me. May I ask what's the full available 
bandwidth? Do you have a lot of CPU time spent in sys (vs user) mode?



On 05/03/2021 14:48, Attila Wind wrote:


Hi guys,

I have a DevOps related question - hope someone here could give some 
ideas/pointers...


We are running a 3 nodes Cassandra cluster
Recently we realized we do have performance issues. And based on the 
investigation we did, it seems our bottleneck is the Cassandra 
cluster. The application layer is waiting a lot for Cassandra ops. So 
queries are running slow on the Cassandra side, however according to our 
monitoring it looks like the Cassandra servers still have lots of free 
resources...


The Cassandra machines are virtual machines (we do own the physical 
hosts too) built with kvm - with 6 CPU cores (3 physical) and 32GB 
RAM dedicated to it.
We are using Ubuntu Linux 18.04 distro - everywhere the same version 
(the physical and virtual host)

We are running Cassandra 4.0-alpha4

What we see is

  * CPU load is around 20-25% - so we have lots of spare capacity
  * iowait is around 2-5% - so disk bandwidth should be fine
  * network load is around 50% of the full available bandwidth
  * loadavg is max around 4 - 4.5 but typically around 3 (because of
the CPU count, 6 would represent 100% load)

and still, query performance is slow ... and we do not understand 
what could be holding Cassandra back from fully utilizing the server resources...


We are clearly missing something!
Anyone any idea / tip?

thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




underutilized servers

2021-03-05 Thread Attila Wind

Hi guys,

I have a DevOps related question - hope someone here could give some 
ideas/pointers...


We are running a 3 nodes Cassandra cluster
Recently we realized we do have performance issues. And based on the 
investigation we did, it seems our bottleneck is the Cassandra cluster. 
The application layer is waiting a lot for Cassandra ops. So queries are 
running slow on the Cassandra side, however according to our monitoring it 
looks like the Cassandra servers still have lots of free resources...


The Cassandra machines are virtual machines (we do own the physical 
hosts too) built with kvm - with 6 CPU cores (3 physical) and 32GB RAM 
dedicated to it.
We are using Ubuntu Linux 18.04 distro - everywhere the same version 
(the physical and virtual host)

We are running Cassandra 4.0-alpha4

What we see is

 * CPU load is around 20-25% - so we have lots of spare capacity
 * iowait is around 2-5% - so disk bandwidth should be fine
 * network load is around 50% of the full available bandwidth
 * loadavg is max around 4 - 4.5 but typically around 3 (because of the
   CPU count, 6 would represent 100% load)

and still, query performance is slow ... and we do not understand what 
could be holding Cassandra back from fully utilizing the server resources...


We are clearly missing something!
Anyone any idea / tip?

thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: strange behavior of counter tables after losing a node

2021-01-27 Thread Attila Wind
Thanks Elliott, yepp! This is exactly what we also figured out as a next 
step. Upgrade our TEST env to that so we can re-evaluate the test we did.

Makes 100% sense

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 27.01.2021 at 10:18, Elliott Sims wrote:
To start with, maybe update to beta4.  There's an absolutely massive 
list of fixes since alpha4.  I don't think the alphas are expected to 
be in a usable/low-bug state necessarily, whereas beta4 is approaching 
RC status.


On Tue, Jan 26, 2021, 10:44 PM Attila Wind  wrote:

Hey All,

I'm coming back on my own question (see below) as this has
happened again to us 2 days later so we took the time to further
analyse this issue. I'd like to share our experiences and the
workaround which we figured out too.

So to just quickly sum up the most important details again:

  * we have a 3 nodes cluster - Cassandra 4-alpha4 and RF=2 - in
one DC
  * we are using ONE consistency level in all queries
  * if we lose one node from the cluster then
  o non-counter table writes are fine, remaining 2 nodes
taking over everything
  o but counter table writes start to fail with exception
"com.datastax.driver.core.exceptions.WriteTimeoutException:
Cassandra timeout during COUNTER write query at
consistency ONE (1 replica were required but only 0
acknowledged the write)"
  o the two remaining nodes are both producing hints files for
the fallen one
  * just a note: counter_write_request_timeout_in_ms = 1,
write_request_timeout_in_ms = 5000 in our cassandra.yaml

To test this a bit further, we did the following:

  * we shut down one of the nodes normally
In this case we do not have the above behavior - everything
happens as it should, no failures on counter table writes
so this is good
  * we reproduced the issue in our TEST env by hard-killing one of
the nodes instead of normal shutdown (simulating a hardware
failure as we had in PROD)
Bingo, issue starts immediately!

Based on the above observations, the "normal shutdown - no problem"
case gave us an idea - so now we have a workaround for how to get
the cluster back into a working state in case we lose a node
permanently (or for a long time at least)

 1. (in our case) we stop the App to stop all Cassandra operations
 2. stop all remaining nodes in the cluster normally
 3. restart them normally

This way the remaining nodes realize the failed node is down and
they jump into the expected processing - everything works,
including counter table writes

If anyone has any idea what to check / change / do in our cluster
I'm all ears! :-)

thanks

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 22.01.2021 at 07:35, Attila Wind wrote:


Hey guys,

Yesterday we had an outage after we have lost a node and we saw
such a behavior we can not explain.

Our data schema has both counter and normal tables. And we have
replicationFactor = 2 and consistency level LOCAL_ONE (explicitly
set)

What we saw:
After a node went down the updates of the counter tables slowed
down. A lot! These updates normally take only a few millisecs but
now started to take 30-60 seconds(!)
At the same time the write ops against non-counter tables did not
show any difference. The app log was silent in a sense of errors.
So the queries - including the counter table updates - were not
failing (otherwise we see exceptions coming from DAO layer
originating from Cassandra driver) at all.
One more thing: only those updates suffered from the above huuuge
wait time where the lost node was involved (due to partition
key). Other updates just went fine

The whole stuff looks like Cassandra internally started to wait -
a lot - for the lost node. Updates finally succeeded without
failure - at least for the App (the client)

Did anyone ever experience similar behavior?
What could be an explanation for the above?

Some more details: the App is implemented in Java 8, we are using
Datastax driver 3.7.1 and server cluster is running on Cassandra
4.0 alpha 4. Cluster size is 3 nodes.

Any feedback is appreciated! :-)

thanks

-- 
Attila Wind


http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: strange behavior of counter tables after losing a node

2021-01-26 Thread Attila Wind

Hey All,

I'm coming back on my own question (see below) as this has happened 
again to us 2 days later so we took the time to further analyse this 
issue. I'd like to share our experiences and the workaround which we 
figured out too.


So to just quickly sum up the most important details again:

 * we have a 3 nodes cluster - Cassandra 4-alpha4 and RF=2 - in one DC
 * we are using ONE consistency level in all queries
 * if we lose one node from the cluster then
 o non-counter table writes are fine, remaining 2 nodes taking over
   everything
 o but counter table writes start to fail with exception
   "com.datastax.driver.core.exceptions.WriteTimeoutException:
   Cassandra timeout during COUNTER write query at consistency ONE
   (1 replica were required but only 0 acknowledged the write)"
 o the two remaining nodes are both producing hints files for the
   fallen one
 * just a note: counter_write_request_timeout_in_ms = 1,
   write_request_timeout_in_ms = 5000 in our cassandra.yaml

To test this a bit further, we did the following:

 * we shut down one of the nodes normally
   In this case we do not have the above behavior - everything happens
   as it should, no failures on counter table writes
   so this is good
 * we reproduced the issue in our TEST env by hard-killing one of the
   nodes instead of normal shutdown (simulating a hardware failure as
   we had in PROD)
   Bingo, issue starts immediately!

Based on the above observations, the "normal shutdown - no problem" case 
gave us an idea - so now we have a workaround for how to get the cluster 
back into a working state in case we lose a node permanently (or 
for a long time at least)


1. (in our case) we stop the App to stop all Cassandra operations
2. stop all remaining nodes in the cluster normally
3. restart them normally

This way the remaining nodes realize the failed node is down and they 
jump into the expected processing - everything works, including 
counter table writes


If anyone has any idea what to check / change / do in our cluster I'm 
all ears! :-)


thanks

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 22.01.2021 at 07:35, Attila Wind wrote:


Hey guys,

Yesterday we had an outage after we have lost a node and we saw such a 
behavior we can not explain.


Our data schema has both counter and normal tables. And we have 
replicationFactor = 2 and consistency level LOCAL_ONE (explicitly set)


What we saw:
After a node went down the updates of the counter tables slowed down. 
A lot! These updates normally take only a few millisecs but now 
started to take 30-60 seconds(!)
At the same time the write ops against non-counter tables did not show 
any difference. The app log was silent in a sense of errors. So the 
queries - including the counter table updates - were not failing 
(otherwise we see exceptions coming from DAO layer originating from 
Cassandra driver) at all.
One more thing: only those updates suffered from the above huuuge wait 
time where the lost node was involved (due to partition key). Other 
updates just went fine


The whole stuff looks like Cassandra internally started to wait - a 
lot - for the lost node. Updates finally succeeded without failure - 
at least for the App (the client)


Did anyone ever experience similar behavior?
What could be an explanation for the above?

Some more details: the App is implemented in Java 8, we are using 
Datastax driver 3.7.1 and server cluster is running on Cassandra 4.0 
alpha 4. Cluster size is 3 nodes.


Any feedback is appreciated! :-)

thanks

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




strange behavior of counter tables after losing a node

2021-01-21 Thread Attila Wind

Hey guys,

Yesterday we had an outage after we have lost a node and we saw such a 
behavior we can not explain.


Our data schema has both counter and normal tables. And we have 
replicationFactor = 2 and consistency level LOCAL_ONE (explicitly set)


What we saw:
After a node went down the updates of the counter tables slowed down. A 
lot! These updates normally take only a few millisecs but now started to 
take 30-60 seconds(!)
At the same time the write ops against non-counter tables did not show 
any difference. The app log was silent in a sense of errors. So the 
queries - including the counter table updates - were not failing 
(otherwise we see exceptions coming from DAO layer originating from 
Cassandra driver) at all.
One more thing: only those updates suffered from the above huuuge wait 
time where the lost node was involved (due to partition key). Other 
updates just went fine


The whole stuff looks like Cassandra internally started to wait - a lot 
- for the lost node. Updates finally succeeded without failure - at 
least for the App (the client)


Did anyone ever experience similar behavior?
What could be an explanation for the above?

Some more details: the App is implemented in Java 8, we are using 
Datastax driver 3.7.1 and server cluster is running on Cassandra 4.0 
alpha 4. Cluster size is 3 nodes.


Any feedback is appreciated! :-)

thanks

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: Cassandra timeout during read query

2020-10-27 Thread Attila Wind

Hey Deepak,

"Are you suggesting to reduce the fetchSize (right now fetchSize is 
5000) for this query?"


Definitely yes! If you went with only 1000, that would give a 5x better 
chance to the concrete Cassandra node(s) executing your 
query to finish pulling together the records (page) in time - thus helping 
you avoid the timeout issue.
Based on our measurements, smaller page sizes do not add much to 
the overall query time at all - but they help Cassandra a lot to eventually 
fulfill the full request, as she can also do much better load balancing while 
you are iterating over your result set.

I would give it a try - same tactics helped a lot on our side

I also recommend trying to optimize your data in parallel with the above 
- if possible and if there is room for improvement.
All I wrote earlier counts a lot. You also need to take care of data 
cleanup strategies in your tables to keep the amount of data managed 
somehow. A TTL-based approach e.g. is the best if you ask me, especially if 
you have a huge data set.


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 27.10.2020 at 20:07, Deepak Sharma wrote:

Hi Attlila,

We did have larger partitions, which are now below the 100MB threshold 
after we ran nodetool repair. And now we do see that most of the time 
query runs are succeeding, but there is a small percentage of 
query runs which are still failing.


Regarding your comment ```considered with your fetchSize together 
(driver setting on the query level)```, can you elaborate more on it? 
Are you suggesting to reduce the fetchSize (right now fetchSize is 
5000) for this query?


Also, we are trying to use the prefetch feature as well, but it is not 
helping either. Following is the code:


Iterator<Row> iter = resultSet.iterator();
while (iter.hasNext()) {
  if (resultSet.getAvailableWithoutFetching() <= fetchSize &&
      !resultSet.isFullyFetched()) {
    // kick off fetching the next page in the background
    resultSet.fetchMoreResults();
  }
  Row row = iter.next();
  ...
}

Thanks,
Deepak

On Sat, Sep 19, 2020 at 6:56 PM Deepak Sharma 
<sharma.dee...@salesforce.com> wrote:


Thanks Attila and Aaron for the response. These are great
insights. I will check and get back to you in case I have any
questions.

Best,
Deepak

On Tue, Sep 15, 2020 at 4:33 AM Attila Wind
 wrote:

Hi Deepak,

Aaron is right - in order to be able to help (better) you
need to share those details

That 5 secs timeout comes from the coordinator node I think -
see cassandra.yaml "read_request_timeout_in_ms" setting - that
is influencing this

But it does not matter too much... The point is that none of
the replicas could complete your query within those 5 secs.
And this is a clear indication that something is slow with your
query.
Maybe 4) is a bit less important here, or I would make
it a bit more precise: consider it together with your fetchSize
(a driver setting at the query level)

From experience, one reason a query which used to
work starts to no longer work could be a growing amount of data.
And a possible "wide cluster" problem.
Do you have monitoring on the Cassandra machines? What does
iowait show? (for us it is a clear indication when things like this
start happening)

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 14.09.2020 at 18:36, Aaron Ploetz wrote:

Deepak,

Can you reply with:

1) The query you are trying to run.
2) The table definition (PRIMARY KEY, specifically).
3) Maybe a little description of what the table is designed
to do.
4) How much data you're expecting returned (both # of rows
and data size).

Thanks,

Aaron


On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma
<sharma.dee...@salesforce.com.invalid> wrote:

Hi There,

We are running into a strange issue in our Cassandra
Cluster where one specific query is failing with
following error:

Cassandra timeout during read query at consistency QUORUM
(3 responses were required but only 0 replica responded)

This is not a typical query read timeout that we know for
sure. This error is getting spit out within 5 seconds and
the query timeout we have set is around 30 seconds

Can we know what is happening here and how can we
reproduce this in our local environment?

Thanks,
Deepak



best pointers to learn Cassandra maintenance

2020-10-08 Thread Attila Wind

Hey Guys,

We have already started to feel that, however awesome Cassandra performance is 
in the beginning, over time -

- as more and more data is present in the tables,
- more and more deletes create tombstones,
- the cluster here and there gets not that well balanced -
performance can drop quickly and significantly...

After ~1 year of learning curve we had to realize that from time to time we 
run into things like "running repairs", "running compactions", 
understanding tombstones (row and range), TTLs, etc. etc., which become 
critical as data is growing.
But on the other hand we also often see lots of warnings... Like "if you 
start Cassandra Reaper you cannot stop doing that" ...


I feel a bit confused now, and so far I have never run into an article which 
really deeply explains: why?

Why this? Why that? Why not this?

So I think the time has come for us in the team to start focusing on 
these topics now. Invest time into better understanding. Really learn what 
"repair" means, and all the consequences of it, etc.


So
Does anyone have any "you must read it" recommendations around these 
"long term maintenance" topics?
I mean really well explained blog post(s), article(s), book(s). Not some 
"half done" or "I quickly wrote a post because it was too long ago when 
I blogged something..." things  :-)


Good pointers would be appreciated!

thanks

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




data modeling qu: use a Map datatype, or just simple rows... ?

2020-09-18 Thread Attila Wind

Hey guys,

I'm curious about your experiences regarding a data modeling question we 
are facing.
At the moment we see 2 major different approaches in terms of how to 
build the tables.
But I've been googling around for days with no luck finding any useful 
material explaining to me how a Map (as a collection datatype) works on 
the storage engine, and what could surprise us later if we use it. So I 
decided to ask this question... (If someone has some nice pointers here, 
maybe that is also much appreciated!)


So
*To describe the problem* in a simplified form

 * Imagine you have users (everyone is identified with a UUID),
 * and we want to answer a simple question: "have we seen this guy before?"
 * we "just" want to be able to answer this question for a limited time
   - let's say for 3 months
 * but... there are lots and lots of users we run into... many
   millions / each day...
 * and only ~15-20% of them are returning users - so many guys we
   might see just once

We are thinking about something like a big big Map, in a form of
    userId => lastSeenTimestamp

Obviously if we would have something like that then answering the above 
question is simply:

    if(map.get(userId) != null)  => TRUE - we have seen the guy before

Regarding the 2 major modelling approaches I mentioned above

*Approach 1*
Just simply use a table, something like this

CREATE TABLE IF NOT EXISTS users (
    user_id      varchar,
    last_seen    int,        -- a UNIX timestamp is enough, that's why int

    PRIMARY KEY (user_id)
)
WITH default_time_to_live = <3 months of seconds>;
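
For completeness, a sketch of the write/read path for Approach 1 (the bind 
markers are just placeholders; rows expire via the table's default_time_to_live):

-- write path: a blind upsert, no read-before-write needed
INSERT INTO users (user_id, last_seen) VALUES (:user_id, :now);

-- read path: "have we seen this guy before?"
SELECT last_seen FROM users WHERE user_id = :user_id;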

*Approach 2*
To not produce that many rows, "cluster" the guys a bit together 
(into 1 row), so
introduce a hashing function over the userId, producing a value btw [0; 1]

and go with a table like

CREATE TABLE IF NOT EXISTS users (
    user_id_hash    int,
    users_seen      map<varchar, int>,    -- this is a userId => last timestamp map

    PRIMARY KEY (user_id_hash)
)
WITH default_time_to_live = <3 months of seconds>;    -- yes, it's clearly not a good enough way ...
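
One note on the TTL concern with Approach 2: in CQL a TTL can also be given 
per statement, and for a map it then applies to the individual entries written 
by that statement. A minimal sketch (assuming the Approach 2 table above; the 
bind markers are placeholders):

-- each userId entry expires ~3 months after its own last update
UPDATE users USING TTL 7776000      -- 90 days in seconds
   SET users_seen[:user_id] = :last_seen
 WHERE user_id_hash = :user_id_hash;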



In theory:

 * on a WRITE path both representation gives us a way to do the write
   without the need of read
 * even the READ path is pretty efficient in both cases
 * Approach2 is worse definitely when we come to the cleanup - "remove
   info if older than 3 month"
 * Approach2 might affect the balance of the cluster more - that's clear
   (however not that much, due to the "law of large numbers" and really
   enough random factors)

And what we are struggling with is: what do you think - 
*which approach would be better over time?* So which will slow down the 
cluster less, considering compaction etc. etc.?


As far as we can see the real question is:

which hurts more?

 * many more rows, but very small rows (regarding data size), or
 * far fewer rows, but much bigger rows (regarding data size)

?

Any thoughts, comments, pointers to some related case studies, articles, 
etc is highly appreciated!! :-)


thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: Cassandra timeout during read query

2020-09-15 Thread Attila Wind

Hi Deepak,

Aaron is right - in order to be able to help (better) you need to share 
those details


That 5 secs timeout comes from the coordinator node I think - see 
cassandra.yaml "read_request_timeout_in_ms" setting - that is 
influencing this


But it does not matter too much... The point is that none of the 
replicas could complete your query within those 5 secs. And this is a 
clear indication that something is slow with your query.
Maybe 4) is a bit less important here, or I would make it a bit more 
precise: consider it together with your fetchSize (a driver setting at the 
query level)


From experience, one reason a query which used to work 
starts to no longer work could be a growing amount of data. And a possible 
"wide cluster" problem.
Do you have monitoring on the Cassandra machines? What does iowait show? 
(for us it is a clear indication when things like this start happening)


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 14.09.2020 at 18:36, Aaron Ploetz wrote:

Deepak,

Can you reply with:

1) The query you are trying to run.
2) The table definition (PRIMARY KEY, specifically).
3) Maybe a little description of what the table is designed to do.
4) How much data you're expecting returned (both # of rows and data size).

Thanks,

Aaron


On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma 
 wrote:


Hi There,

We are running into a strange issue in our Cassandra Cluster where
one specific query is failing with following error:

Cassandra timeout during read query at consistency QUORUM (3
responses were required but only 0 replica responded)

This is not a typical query read timeout that we know for sure.
This error is getting spit out within 5 seconds and the query
timeout we have set is around 30 seconds

Can we know what is happening here and how can we reproduce this
in our local environment?

Thanks,
Deepak



best setup of tombstones cleanup over a "wide" table (was: efficient delete over a "wide" table?)

2020-09-05 Thread Attila Wind
Thank you guys for the answers - I expected this but wanted to verify 
(who knows how smart Cassandra can be in the background! :-) )


@Jeff: unfortunately the records we will pick up for deletion are not 
necessarily "neighbours" in terms of creation time, so forming 
contiguous ranges cannot be done...


Just one more question left in this case...
As this way we will have lots of row tombstones generated over this 
"wide" table:
what would be your recommended table setup here (in terms of 
gc_grace_seconds, compaction, compression, etc. etc.)? Currently we have 
the default setup for everything, which I believe should be fine-tuned a bit 
better


FYI: this table has ~500k new UUID keyed rows every day in each partition...

thanks a lot!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 04.09.2020 at 16:33, Jeff Jirsa wrote:


As someone else pointed out, it's the same number of tombstones. Doing 
distinct queries gives you a bit more flexibility to retry if one 
fails, but multiple in one command avoids some contention on the 
memtable partition objects.


If you happen to be using type 1 uuids (timeuuid) AND you're 
deleting contiguous ranges, you could do a DELETE ... WHERE uuid >= ? 
AND uuid <= ?


This would trade lots of tombstones for a single range tombstone, but 
may not match your model.
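
A minimal sketch of that range-delete pattern, assuming a hypothetical table 
where the timeuuid is a clustering column under the partition key:

CREATE TABLE IF NOT EXISTS events (
    customer_id varchar,
    event_id    timeuuid,
    payload     text,
    PRIMARY KEY (customer_id, event_id)
);

-- one range tombstone instead of N row tombstones
DELETE FROM events
 WHERE customer_id = :customer_id
   AND event_id >= :oldest_to_delete
   AND event_id <= :newest_to_delete;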





On Sep 3, 2020, at 11:57 PM, Attila Wind  wrote:



Hi C* gurus,

I'm looking for the best strategy to delete records from a "wide" table.
"wide" means the table stores records which have a UUID-style id 
element of the key - within each partition


So yes, it's not the partitioning key... The partitioning key is 
actually kind of a customerId at the moment, and actually I'm not even 
sure this is the right model for this table... Given the fact that the 
number of customerIds <<< the number of UUIDs, probably not.
But let's exclude this for a moment maybe and come back to my main 
question!


So the question:
when I delete records from this table, given the fact I can and I 
will delete in "batch fashion" (imagine kind of a scheduled job which 
collects - let's say - 1000 records) every time I do deletes...


Would there be a difference (in terms of generated tombstones) if I 
would


a) issue delete one-by-one like
DELETE FROM ... WHERE ... uuid = 'a'
DELETE FROM ... WHERE ... uuid = 'b'
...
DELETE FROM ... WHERE ... uuid = 'z'

or

b) issue delete in a group fashion like
DELETE FROM ... WHERE ... uuid in ('a', 'b', ... 'z')

?

or is there any other way to efficiently delete which I am missing here?

thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




efficient delete over a "wide" table?

2020-09-03 Thread Attila Wind

Hi C* gurus,

I'm looking for the best strategy to delete records from a "wide" table.
"wide" means the table stores records which have a UUID-style id element 
of the key - within each partition


So yes, it's not the partitioning key... The partitioning key is actually 
kind of a customerId at the moment, and actually I'm not even sure this 
is the right model for this table... Given the fact that the number of 
customerIds <<< the number of UUIDs, probably not.
But let's exclude this for a moment maybe and come back to my main 
question!


So the question:
when I delete records from this table, given the fact I can and I will 
delete in "batch fashion" (imagine kind of a scheduled job which 
collects - let's say - 1000 records) every time I do deletes...


Would there be a difference (in terms of generated tombstones) if I would

a) issue delete one-by-one like
DELETE FROM ... WHERE ... uuid = 'a'
DELETE FROM ... WHERE ... uuid = 'b'
...
DELETE FROM ... WHERE ... uuid = 'z'

or

b) issue delete in a group fashion like
DELETE FROM ... WHERE ... uuid in ('a', 'b', ... 'z')

?

or is there any other way to efficiently delete which I am missing here?

thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: tombstones - however there are no deletes

2020-08-21 Thread Attila Wind

right! silly me (regarding "can't have null for clustering column") :-)

OK code is modified, we stopped using NULL on that column. In a few days 
we will see if this was the cause.


Thanks for the useful info everyone! Helped a lot!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 21.08.2020 at 11:04, Alex Ott wrote:
inserting null for any column will generate a tombstone (and you 
can't have null for a clustering column, except the case when it's an empty 
partition with a static column).
If you're really inserting new data, not overwriting existing data 
- use UNSET instead of null
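
To illustrate that with a minimal sketch (the table and column names are 
hypothetical):

CREATE TABLE IF NOT EXISTS visits (
    visit_id   uuid PRIMARY KEY,
    user_agent varchar,
    score      int
);

-- writes a tombstone cell for "score", even though nothing is deleted:
INSERT INTO visits (visit_id, user_agent, score) VALUES (:id, :ua, null);

-- no tombstone: just leave the column out of the statement
-- (or bind it as UNSET at the driver level instead of binding null):
INSERT INTO visits (visit_id, user_agent) VALUES (:id, :ua);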


On Fri, Aug 21, 2020 at 10:45 AM Attila Wind  
wrote:


Thanks a lot! I will process every pointers you gave - appreciated!

1. we do have collection column in that table but that is (we have
only 1 column) a frozen Map - so I guess "Tombstones are also
implicitly created any time you insert or update a row which has
an (unfrozen) collection column: list<>, map<> or set<>.  This has
to be done in order to ensure the new write replaces any existing
collection entries." does not really apply here

2. "Isn’t it so that explicitly setting a column to NULL also
result in a tombstone"
Is this true for all columns? or just clustering key cols?
Because if for all cols (which would make sense maybe to me more)
then we found the possible reason.. :-)
As we do have an Integer column there which is actually NULL often
(and so far in all cases)


 Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 21.08.2020 at 09:49, Oleksandr Shulgin wrote:

On Fri, Aug 21, 2020 at 9:43 AM Tobias Eriksson
<tobias.eriks...@qvantel.com> wrote:

Isn’t it so that explicitly setting a column to NULL also
result in a tombstone


True, thanks for pointing that out!

Then as mentioned the use of list,set,map can also result in
tombstones

See

https://www.instaclustr.com/cassandra-collections-hidden-tombstones-and-how-to-avoid-them/


And A. Ott has already mentioned both these possible reasons :-)

--
Alex




--
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)


Re: tombstones - however there are no deletes

2020-08-21 Thread Attila Wind

Thanks a lot! I will process every pointers you gave - appreciated!

1. we do have collection column in that table but that is (we have only 
1 column) a frozen Map - so I guess "Tombstones are also implicitly 
created any time you insert or update a row which has an (unfrozen) 
collection column: list<>, map<> or set<>.  This has to be done in order 
to ensure the new write replaces any existing collection entries." does 
not really apply here


2. "Isn’t it so that explicitly setting a column to NULL also result in 
a tombstone"

Is this true for all columns? or just clustering key cols?
Because if for all cols (which would make sense maybe to me more) then 
we found the possible reason.. :-)
As we do have an Integer column there which is actually NULL often (and 
so far in all cases)



 Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 21.08.2020 at 09:49, Oleksandr Shulgin wrote:
On Fri, Aug 21, 2020 at 9:43 AM Tobias Eriksson 
<tobias.eriks...@qvantel.com> wrote:


Isn’t it so that explicitly setting a column to NULL also result
in a tombstone


True, thanks for pointing that out!

Then as mentioned the use of list,set,map can also result in
tombstones

See

https://www.instaclustr.com/cassandra-collections-hidden-tombstones-and-how-to-avoid-them/


And A. Ott has already mentioned both these possible reasons :-)

--
Alex



tombstones - however there are no deletes

2020-08-20 Thread Attila Wind

Hi Cassandra Gurus,

Recently I captured a very interesting warning in the logs saying

2020-08-19 08:08:32.492 
[cassandra-client-keytiles_data_webhits-nio-worker-2] WARN 
com.datastax.driver.core.RequestHandler - Query '[3 bound values] select 
* from visit_session_by_start_time_v4 where container_id=? and 
first_action_time_frame_id >= ? and first_action_time_frame_id <= ?;' 
generated server side warning(s): *Read 6628 live rows and 6628 tombstone 
cells* for query SELECT * FROM 
keytiles_data_webhits.visit_session_by_start_time_v4 WHERE container_id 
= 5YzsPfE2Gcu8sd-76626 AND first_action_time_frame_id > 443837 AND 
first_action_time_frame_id <= 443670 AND user_agent_type > 
browser-mobile AND unique_webclient_id > 
045d1683-c702-48bd-9d2b-dcf1ca87ac7c AND first_action_ts > 1597815766 
LIMIT 6628 (see tombstone_warn_threshold)

What makes this interesting to me is the fact that we never issue any kind 
of deletes - not even row-level deletes - against this table for now.
So I'm wondering what can result in tombstone creation in Cassandra - 
apart from explicit DELETE queries and TTL setup...


My suspicion is (but I'm not sure) that going with a "select *" 
read strategy, then calculating everything in-memory, and eventually writing 
back with kinda "update *" queries to Cassandra in this table (so not 
updating just a few columns but everything) can lead to these... Can it?
I tried to search around this symptom but was not successful - so I 
decided to ask you guys, maybe someone can give us a pointer...


Some more info:

 * the table does not have TTL set - this mechanism is turned off
 * the LIMIT param in upper query comes from paging size
 * we are using Cassandra4 alpha3
 * we also have a few similarly built tables where we follow the above
   described "update *" policy on write path - however those tables are
   counter tables... when we mass-read them into memory we also go with
   "select *" logic reading up tons of rows. The point is we never saw
   such a warning for these counter tables however we are handling them
   same fashion... ok counter tables work differently but still
   interesting to me why those never generated things like this

thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: relation btw LWTs and RF

2020-06-26 Thread Attila Wind

Thank you!

The 2nd link you sent is a very, very good description! I recommend it for 
others too (who might run into this question via mail archive search 
later...)


In my opinion it explains the entire problem space regarding how LWTs 
work, while also putting them into the context of "consistency 
level" / the different phases of LWT very well.
Yesterday I was searching / reading at least 15 different articles + 
docs - none of them answered my questions entirely (and I just had more and 
more questions as I progressed with reading) - this one is a nice one!


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 26.06.2020 at 08:15, Erick Ramirez wrote:
You are correct. Lightweight transactions perform a read-before-write 
[1]. The read phase is performed with a serial consistency which 
requires a quorum of nodes in the local DC (LOCAL_SERIAL) or across 
the whole cluster (SERIAL) [2].


Quorum of 2 nodes is 2 nodes so RF=2 cannot tolerate a node outage. 
Cheers!


[1] 
https://www.datastax.com/blog/2019/04/lightweight-transactions-datastax-enterprise
[2] 
https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__table-write-consistency




Re: relation btw LWTs and RF

2020-06-25 Thread Attila Wind

Ah yeah forgot to mention - I am using Cassandra 4.0-alpha4

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932


On 26.06.2020 at 08:06, Attila Wind wrote:


Hey guys,

Recently I ran into an interesting situation (by trying to add an 
optimistic locking strategy to one of the tables),
which eventually led me to the following observation. Can you confirm 
(or dispute) that this is correct when I am saying:


"It is not possible to use conditional queries with ReplicationFactor 
= 2 with tolerating 1 node is down (out of that 2 replicas)"


?

Thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




relation btw LWTs and RF

2020-06-25 Thread Attila Wind

Hey guys,

Recently I ran into an interesting situation (by trying to add an 
optimistic locking strategy to one of the tables),
which eventually led me to the following observation. Can you confirm 
(or dispute) that this is correct when I am saying:


"It is not possible to use conditional queries with ReplicationFactor = 
2 with tolerating 1 node is down (out of that 2 replicas)"


?

Thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932




Re: IN OPERATOR VS BATCH QUERY

2020-02-20 Thread Attila Wind
Hi Sergio,

AFAIK you use batches when you want to get an "all or nothing" approach from
Cassandra - so turning multiple statements into one atomic operation.

One very typical use case for this is when you have denormalized data in
multiple tables (optimized for different queries) but you need to modify
all of them the same way as they were just one entity.

This means that if any of your delete statements failed for whatever
reason, then all of your delete statements would be rolled back.

I think you dont want that overhead here for sure...

We are not there yet with our development but we will need similar
"cleanup" functionality soon.
I was also thinking about the IN operator for similar cases but I am
curious if anyone here has a better idea...
Why does the IN operator blow up the coordinator? I do not entirely get
it...

Thanks
Attila

On Fri, 21 Feb 2020 at 3:44, Sergio wrote:

> The current approach is delete from key_value where id = whatever and it
> is performed asynchronously from the client.
> I was thinking to reduce at least the network round-trips between client
> and coordinator with that Batch approach. :)
>
> In any case, I would test it it will improve or not. So when do you use
> batch then?
>
> Best,
>
> Sergio
>
> On Thu, Feb 20, 2020, 6:18 PM Erick Ramirez 
> wrote:
>
>> Batches aren't really meant for optimisation in the same way as RDBMS. If
>> anything, it will just put pressure on the coordinator having to fire off
>> multiple requests to lots of replicas. The IN operator falls into the same
>> category and I personally wouldn't use it with more than 2 or 3 partitions
>> because then the coordinator will suffer from the same problem.
>>
>> If it were me, I'd just issue single-partition deletes and throttle it to
>> a "reasonable" throughput that your cluster can handle. The word
>> "reasonable" is in quotes because only you can determine that magic number
>> for your cluster through testing. Cheers!
>>
>


Re: Counter table in Cassandra

2019-05-29 Thread Attila Wind
Hi Garvit,

I can not answer your main question but when I read your lines one thing
was popping up constantly: "why do you ask this?"

So what is the background of this question? Do you see anything smelly?

Actually
a) I always assumed that naturally there are of course lots of in-parallel
activities (writes) against any tables, including counters. So of course
there is a race condition, and probably threads, yes

b) Cassandra do not have isolated transactions so of course in a complex
flow (using multiple tables) there is no business data consistency
guarantee for sure

c) as long as you are doing just +/- ops, it is a mathematical fact that
the execution order of writes is not really important. Repeating a +1 increase 5
times will result in a counter higher by 5...

Please share your background I am interested in it!

Cheers
Attila

On Wed, 29 May 2019 at 2:34, Garvit Sharma wrote:

> Hi,
>
> I am using counter tables in Cassandra and I want to understand how
> concurrent updates to a counter table are handled in Cassandra.
>
> More than one thread is responsible for updating the counter for a
> partition key. Multiple threads can also update the counter for the same
> key.
>
> When more than one thread updates the counter for the same key, how does
> Cassandra handle the race condition?
>
> UPDATE cycling.popular_count
>  SET popularity = popularity + 1
>  WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
>
>
> Are there overheads of using counter tables?
> Are there alternatives to counter tables?
>
> Thanks,
> --
>
> Garvit Sharma
> github.com/garvitlnmiit/
>
> No Body is a Scholar by birth, its only hard work and strong determination
> that makes him master.
>


Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Attila Wind

Hi Shalom,

Thanks for your notes! So you also experienced this thing... fine

Then maybe the best rules to follow are these:
a) never(!) run a query with "ALLOW FILTERING" on a Production cluster
b) if you need these queries, build a test cluster (somehow) and mirror 
the data (somehow) OR add denormalized tables (write + code complexity 
overhead) to fulfill those queries - see the sketch below
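
As an illustration of option b), a minimal sketch (hypothetical tables,
not from this thread) of replacing a filtering query with a purpose-built
denormalized table:

    -- instead of:
    --   SELECT * FROM app.users WHERE country = 'DE' ALLOW FILTERING;
    -- write the data into a second table keyed for that exact query:
    CREATE TABLE app.users_by_country (
        country text,
        user_id uuid,
        name    text,
        PRIMARY KEY (country, user_id)
    );

    SELECT * FROM app.users_by_country WHERE country = 'DE';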


Can we agree on this one maybe as a "good to follow" policy?

In our case, luckily, users = developers, always. So I can expect them to 
be aware of the consequences of a particular query.
We also have the data fully mirrored into a test cluster, so running 
those queries on the test system is possible.
Plus, if for whatever reason we really do need to run such a query in 
Prod, I can simply instruct them to try the query on the test system 
first.


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 28. 8:59, shalom sagges wrote:

Hi Attila,

I'm definitely no guru, but I've experienced several cases where 
people at my company used allow filtering and caused major performance 
issues.
As data size increases, the impact will be stronger. If you have large 
partitions, performance will decrease.
GC can be affected. And if GC stops the world for too long, too many 
times, you will feel it.


I sincerely believe the best way would be to educate the users and 
remodel the data. Perhaps you need to denormalize your tables or at 
least use secondary indices (I prefer to keep it as simple as possible 
and denormalize).
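
For completeness, a minimal sketch (hypothetical table and column names) 
of the secondary-index alternative mentioned above; it works best when the 
indexed column is queried together with a known partition key, so the 
lookup stays on that partition's replicas:

    CREATE TABLE app.users_by_org (
        org_id  uuid,
        user_id uuid,
        country text,
        PRIMARY KEY (org_id, user_id)
    );
    CREATE INDEX users_country_idx ON app.users_by_org (country);

    -- partition key + indexed column: no ALLOW FILTERING needed
    SELECT * FROM app.users_by_org
     WHERE org_id = 9a0e3bd0-1e2f-4c5a-9b7e-0d6f1a2b3c4d
       AND country = 'DE';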
If it's a cluster for analytics, perhaps you need to build a 
designated cluster only for that so if something does break or get too 
pressured, normal activities wouldn't be affected, but there are pros 
and cons for that idea too.


Hope this helps.

Regards,


On Tue, May 28, 2019 at 9:43 AM Attila Wind  
wrote:


Hi Gurus,

Looks like we stopped this thread. However I would be very curious about
answers regarding b) ...

Anyone any comments on that?
I do see this as a potential production outage risk now...
Especially as we are planning to run analysis queries by hand
exactly like that over the cluster...

thanks!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 11:42, shalom sagges wrote:

a) Interesting... But only in case you do not provide
partitioning key right? (so IN() is for partitioning key?)

I think you should ask yourself a different question. Why am I
using ALLOW FILTERING in the first place? What happens if I
remove it from the query?
I prefer to denormalize the data to multiple tables or at least
create an index on the requested column (preferably queried
together with a known partition key).

b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if
servers are busy with the request seriously becoming
non-responsive...?

I think it can justify the unresponsiveness. When using ALLOW
FILTERING, you are doing something like a full table scan in a
relational database.

There is a lot of information on the internet regarding this
subject such as

https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

Hope this helps.

    Regards,


On Thu, May 23, 2019 at 7:33 AM Attila Wind
 <mailto:attilaw@swf.technology> wrote:

Hi,

"When you run a query with allow filtering, Cassandra doesn't
know where the data is located, so it has to go node by node,
searching for the requested data."

a) Interesting... But only in case you do not provide
partitioning key right? (so IN() is for partitioning key?)

b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if
servers are busy with the request seriously becoming
non-responsive...?

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 0:37, shalom sagges wrote:

Hi Vsevolod,

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of
them, as per connection consistency/table replication
settings, in case.
When you run a query with allow filtering, Cassandra doesn't
know where the data is located, so it has to go node by
node, searching for the requested data.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?
I'm not familiar with such a flag. In my case, I just try to
educate the R&D teams.

Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov
mail

Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-27 Thread Attila Wind

Hi Gurus,

Looks like we stopped this thread. However I would be very curious about 
answers regarding b) ...


Anyone any comments on that?
I do see this as a potential production outage risk now... Especially as 
we are planning to run analysis queries by hand exactly like that over 
the cluster...


thanks!

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 11:42, shalom sagges wrote:
a) Interesting... But only in case you do not provide partitioning key 
right? (so IN() is for partitioning key?)


I think you should ask yourself a different question. Why am I using 
ALLOW FILTERING in the first place? What happens if I remove it from 
the query?
I prefer to denormalize the data to multiple tables or at least create 
an index on the requested column (preferably queried together with a 
known partition key).


b) Still does not explain or justify "all 8 nodes to halt and 
unresponsiveness to external requests" behavior... Even if servers are 
busy with the request seriously becoming non-responsive...?


I think it can justify the unresponsiveness. When using ALLOW 
FILTERING, you are doing something like a full table scan in a 
relational database.


There is a lot of information on the internet regarding this subject 
such as 
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/


Hope this helps.

Regards,


On Thu, May 23, 2019 at 7:33 AM Attila Wind  
wrote:


Hi,

"When you run a query with allow filtering, Cassandra doesn't know
where the data is located, so it has to go node by node, searching
for the requested data."

a) Interesting... But only in case you do not provide partitioning
key right? (so IN() is for partitioning key?)

b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if servers
are busy with the request seriously becoming non-responsive...?

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 0:37, shalom sagges wrote:

Hi Vsevolod,

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of them,
as per connection consistency/table replication settings, in case.
When you run a query with allow filtering, Cassandra doesn't know
where the data is located, so it has to go node by node,
searching for the requested data.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?
I'm not familiar with such a flag. In my case, I just try to
educate the R&D teams.

Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov
mailto:vsfilare...@gmail.com>> wrote:

Hello everyone,

We have an 8 node C* cluster with large volume of unbalanced
data. Usual per-partition selects work somewhat fine, and are
processed by limited number of nodes, but if user issues
SELECT WHERE IN () ALLOW FILTERING, such command stalls all 8
nodes to halt and unresponsiveness to external requests while
disk IO jumps to 100% across whole cluster. In several
minutes all nodes seem to finish processing the request and
cluster goes back to being responsive. Replication level
across whole data is 3.

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of
them, as per connection consistency/table replication
settings, in case.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?

Thank you all very much in advance,
Vsevolod Filaretov.



Re: CassKop : a Cassandra operator for Kubernetes developed by Orange

2019-05-25 Thread Attila Wind
Maybe my understanding is wrong and I am not really a "deployment guru", 
but it looks to me like


Orange (https://github.com/Orange-OpenSource/cassandra-k8s-operator, 1 
contributor and 1 commit for now on 2019-05-24)
and sky-uk/cassandra-operator 
(https://github.com/sky-uk/cassandra-operator , it's in alpha phase and 
not recommended for production, 3 contributors, 24 commits between 
2019-03-25 and 2019-05-21, 32 issues)
are developing something I could use in my OWN(!) Kubernetes based 
solution (even on premise if I want or whatever)

They are both open source. Right?

While
Datastax and Instaclustr are commercial players and offer the solution 
only in a way tightly coupled to their cloud offerings
(I just took a quick look at Instaclustr but could not even figure out 
the pricing info for this service... probably I am lame... or not? :-))


So this looks like a nice competition to me...
What am I missing?

ps.: maybe the Orange and sky-uk/cassandra-operator guys should 
cooperate..?? Others are clearly building a business around it


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 24. 20:36, John Sanda wrote:

There is also
https://github.com/sky-uk/cassandra-operator

On Fri, May 24, 2019 at 2:34 PM Rahul Singh 
mailto:rahul.xavier.si...@gmail.com>> 
wrote:


Fantastic! Now there are three teams making k8s operators for C*:
Datastax, Instaclustr, and now Orange.

rahul.xavier.si...@gmail.com <mailto:rahul.xavier.si...@gmail.com>

http://cassandra.link

I'm speaking at #DataStaxAccelerate, the world’s premiere
#ApacheCassandra conference, and I want to see you there! Use my
code Singh50 for 50% off your registration.
www.datastax.com/accelerate <http://www.datastax.com/accelerate>


On Fri, May 24, 2019 at 9:07 AM Jean-Armel Luce
mailto:jaluc...@gmail.com>> wrote:

Hi folks,

We are excited to announce that CassKop, a Cassandra operator
for Kubernetes developed by Orange teams, is now ready for
Beta testing.

CassKop works as a usual K8S controller (reconcile the real
state with a desired state) and automates the Cassandra
operations through JMX. All the operations are launched by
calling standard K8S APIs (kubectl apply …) or by using a K8S
plugin (kubectl casskop …).

CassKop is developed in GO, based on CoreOS operator-sdk
framework.
Main features already available :
- deploying a rack aware cluster (or AZ aware cluster)
- scaling up & down (including cleanups)
- setting and modifying configuration parameters (C* and JVM
parameters)
- adding / removing a datacenter in Cassandra (all datacenters
must be in the same region)
- rebuilding nodes
- removing node or replacing node (in case of hardware failure)
- upgrading C* or Java versions (including upgradesstables)
- monitoring (using Prometheus/Grafana)
- ...

By using local and persistent volumes, it is possible to
handle failures or stop/start nodes for maintenance operations
with no transfer of data between nodes.
Moreover, we can deploy cassandra-reaper in K8S and use it for
scheduling repair sessions.
For now, we can deploy a C* cluster only as a mono-region
cluster. We will work during the next weeks to be able to
deploy a C* cluster as a multi regions cluster.

Still in the roadmap :
- Network encryption
- Monitoring (exporting logs and metrics)
- backup & restore
- multi-regions support

We'd be interested to hear you try this and let us know what
you think!

Please read the description and installation instructions on
https://github.com/Orange-OpenSource/cassandra-k8s-operator.
For a quick start, you can also follow this step by step guide
:

https://orange-opensource.github.io/cassandra-k8s-operator/index.html?slides=Slides-CassKop-demo.md#1


The CassKop Team

--

- John


Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-23 Thread Attila Wind

Hi again,

so staying with a) for a second...
"Why am I using ALLOW FILTERING in the first place?"
Fully agreed! To put it this way: as a reviewer I never want to see the 
string "allow filtering" in any select issued by production code. I 
clearly consider it an indicator of a wrong DB design.
Still! There are use cases - and if I am not mistaken the original 
question was about that - when, for whatever reason, PEOPLE run such 
selects manually. E.g. where we use Cassandra we have cases like this for 
analysis purposes. So I think this is a valid use case. And once we have 
found a valid use case, the question stands. Right? So back to the 
question: "But only in case you do not provide partitioning key right?" - 
I assume the answer is yes, right? :-)


b) "I think it can justify the unresponsiveness. When using ALLOW 
FILTERING, you are doing something like a full table scan in a 
relational database"
I get it. Sure. But is Cassandra so "single threaded" that if a node is 
running one(!) big, expensive query it becomes fully unresponsive? I 
doubt it...
That's what I meant by saying "does not explain or justify". From my 
perspective I definitely consider this kind of unresponsiveness an 
abnormal state ...


cheers

Attila


On 23.05.2019 11:42 AM, shalom sagges wrote:
a) Interesting... But only in case you do not provide partitioning key 
right? (so IN() is for partitioning key?)


I think you should ask yourself a different question. Why am I using 
ALLOW FILTERING in the first place? What happens if I remove it from 
the query?
I prefer to denormalize the data to multiple tables or at least create 
an index on the requested column (preferably queried together with a 
known partition key).


b) Still does not explain or justify "all 8 nodes to halt and 
unresponsiveness to external requests" behavior... Even if servers are 
busy with the request seriously becoming non-responsive...?


I think it can justify the unresponsiveness. When using ALLOW 
FILTERING, you are doing something like a full table scan in a 
relational database.


There is a lot of information on the internet regarding this subject 
such as 
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/


Hope this helps.

Regards,


On Thu, May 23, 2019 at 7:33 AM Attila Wind  
wrote:


Hi,

"When you run a query with allow filtering, Cassandra doesn't know
where the data is located, so it has to go node by node, searching
for the requested data."

a) Interesting... But only in case you do not provide partitioning
key right? (so IN() is for partitioning key?)

b) Still does not explain or justify "all 8 nodes to halt and
unresponsiveness to external requests" behavior... Even if servers
are busy with the request seriously becoming non-responsive...?

cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 0:37, shalom sagges wrote:

Hi Vsevolod,

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of them,
as per connection consistency/table replication settings, in case.
When you run a query with allow filtering, Cassandra doesn't know
where the data is located, so it has to go node by node,
searching for the requested data.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?
I'm not familiar with such a flag. In my case, I just try to
educate the R&D teams.

Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov
mailto:vsfilare...@gmail.com>> wrote:

Hello everyone,

We have an 8 node C* cluster with large volume of unbalanced
data. Usual per-partition selects work somewhat fine, and are
processed by limited number of nodes, but if user issues
SELECT WHERE IN () ALLOW FILTERING, such command stalls all 8
nodes to halt and unresponsiveness to external requests while
disk IO jumps to 100% across whole cluster. In several
minutes all nodes seem to finish processing the request and
cluster goes back to being responsive. Replication level
across whole data is 3.

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of
them, as per connection consistency/table replication
settings, in case.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?

Thank you all very much in advance,
Vsevolod Filaretov.



Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-22 Thread Attila Wind

Hi,

"When you run a query with allow filtering, Cassandra doesn't know where 
the data is located, so it has to go node by node, searching for the 
requested data."


a) Interesting... But only in case you do not provide partitioning key 
right? (so IN() is for partitioning key?)


b) Still does not explain or justify "all 8 nodes to halt and 
unresponsiveness to external requests" behavior... Even if servers are 
busy with the request seriously becoming non-responsive...?


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355


On 2019. 05. 23. 0:37, shalom sagges wrote:

Hi Vsevolod,

1) Why such behavior? I thought any given SELECT request is handled by 
a limited subset of C* nodes and not by all of them, as per connection 
consistency/table replication settings, in case.
When you run a query with allow filtering, Cassandra doesn't know 
where the data is located, so it has to go node by node, searching for 
the requested data.
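
As a small sketch (hypothetical table, not from this thread) of the 
difference: the first query is routed only to the replicas owning that 
partition, while the second has to be fanned out so every node scans its 
own data:

    CREATE TABLE app.events (
        user_id uuid,
        ts      timestamp,
        kind    text,
        PRIMARY KEY (user_id, ts)
    );

    -- partition key given: only the replicas of this partition are involved
    SELECT * FROM app.events
     WHERE user_id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;

    -- no partition key: a cluster-wide scan, hence the disk IO spike everywhere
    SELECT * FROM app.events WHERE kind = 'login' ALLOW FILTERING;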


2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
I'm not familiar with such a flag. In my case, I just try to educate 
the R&D teams.


Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov 
mailto:vsfilare...@gmail.com>> wrote:


Hello everyone,

We have an 8 node C* cluster with large volume of unbalanced data.
Usual per-partition selects work somewhat fine, and are processed
by limited number of nodes, but if user issues SELECT WHERE IN ()
ALLOW FILTERING, such command stalls all 8 nodes to halt and
unresponsiveness to external requests while disk IO jumps to 100%
across whole cluster. In several minutes all nodes seem to finish
processing the request and cluster goes back to being responsive.
Replication level across whole data is 3.

1) Why such behavior? I thought any given SELECT request is
handled by a limited subset of C* nodes and not by all of them, as
per connection consistency/table replication settings, in case.

2) Is it possible to forbid ALLOW FILTERING flag for given
users/groups?

Thank you all very much in advance,
Vsevolod Filaretov.