Re: moving from 4.0-alpha4 to 4.0.1
Thank you Paulo for your detailed answer! I was not monitoring NEWS.txt in the Git repo so far, but that file definitely has the info I was looking for.

cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 09.10.2021 15:07, Paulo Motta wrote:

Hi Attila,

Minor version upgrades are generally fine to do in-place, unless otherwise specified in NEWS.txt <https://github.com/apache/cassandra/blob/cassandra-4.0.1/NEWS.txt> for the specific versions you're upgrading. Cassandra is designed with this goal in mind, and potentially disruptive changes can only be introduced in major versions, which require a little more care during the upgrade process.

It's definitely safe to do an in-place, one-node-at-a-time upgrade for minor versions in the same major series (i.e. 4.0-alpha to 4.0.1). Nevertheless it doesn't hurt to take a global snapshot "just in case", so you can roll back if you run into an unexpected issue, but this is just extra safety and not strictly required.

Unfortunately there's no official upgrade guide yet - this is something the community is working on providing soon - but you can find some unofficial ones with a quick Google search. Major upgrades are also designed to be harmless, but a little more preparation is required to ensure a smooth ride due to potentially non-compatible changes. I've written an upgrade guide some time ago which can be useful in preparing for a major upgrade, but it can apply to minor upgrades as well, to ensure extra safety during the process: http://monkeys.chaordic.com.br/operation/2014/04/11/zero-downtime-cassandra-upgrade.html

Cheers and good luck!

Paulo

On Sat, Oct 9, 2021 at 06:56, Attila Wind wrote:

Hi all,

I have 2 quick questions:

1. We have a cluster running 4.0-alpha4. Now 4.0.1 is out, and obviously it would make lots of sense to switch to this version. Does anyone know if we can do it simply "in place"? I mean, we just upgrade the software and restart? Or would it not work / would it be dangerous due to some storage-layer incompatibilities or other risk factors, so better to run a (usual) data migration process..?

2. Actually the above brought up a more generic question: is the community maintaining any kind of guide/readme/whatever one can use to find answers to similar questions? As a user I see the changelog and that's cool, but it does not provide obvious answers (of course). So I mean some sort of migration hints/guide.

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
moving from 4.0-alpha4 to 4.0.1
Hi all,

I have 2 quick questions:

1. We have a cluster running 4.0-alpha4. Now 4.0.1 is out, and obviously it would make lots of sense to switch to this version. Does anyone know if we can do it simply "in place"? I mean, we just upgrade the software and restart? Or would it not work / would it be dangerous due to some storage-layer incompatibilities or other risk factors, so better to run a (usual) data migration process..?

2. Actually the above brought up a more generic question: is the community maintaining any kind of guide/readme/whatever one can use to find answers to similar questions? As a user I see the changelog and that's cool, but it does not provide obvious answers (of course). So I mean some sort of migration hints/guide.

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: Integration tests - is anything wrong with Cassandra beta4 docker??
Yes, this definitely explains the (much) longer startup - however I'm still not sure about the stability issues... Anyway, thanks Erik for the info!

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 10.03.2021 14:26, Erik Merkle wrote:

I opened a ticket when beta4 was released, but no Docker image was available. It turned out the reason the image wasn't published is that the tests DockerHub uses to verify the builds were timing out, so it may be a similar issue you are running into. The ticket is here: https://github.com/docker-library/cassandra/issues/221#issuecomment-761187461

DockerHub ended up altering some of the default Cassandra settings for their verification tests only, to allow the tests to pass. For your integration tests, you may want to do something similar.

On Wed, Mar 10, 2021 at 5:08 AM Attila Wind wrote:

Hi Guys,

We are using dockerized Cassandra to run our integration tests. So far we were using the 4.0-beta2 docker image (https://hub.docker.com/layers/cassandra/library/cassandra/4.0-beta2/images/sha256-77aa30c8e82f0e761d1825ef7eb3adc34d063b009a634714af978268b71225a4?context=explore)

Recently I tried to switch to the 4.0-beta4 docker but noticed a few problems...

* the image starts much, much slower for me (4x more time to come up so I can connect to it)
* unlike the beta2 version, beta4 barely survives all of our test cases... typically it gets stuck and fails with timeouts at around 60-80% of completed test cases

Anyone with similar / related experiences maybe?

thanks

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

--
Erik Merkle
e. erik.mer...@datastax.com
w. www.datastax.com
Integration tests - is anything wrong with Cassandra beta4 docker??
Hi Guys,

We are using dockerized Cassandra to run our integration tests. So far we were using the 4.0-beta2 docker image (https://hub.docker.com/layers/cassandra/library/cassandra/4.0-beta2/images/sha256-77aa30c8e82f0e761d1825ef7eb3adc34d063b009a634714af978268b71225a4?context=explore)

Recently I tried to switch to the 4.0-beta4 docker but noticed a few problems...

* the image starts much, much slower for me (4x more time to come up so I can connect to it)
* unlike the beta2 version, beta4 barely survives all of our test cases... typically it gets stuck and fails with timeouts at around 60-80% of completed test cases

Anyone with similar / related experiences maybe?

thanks

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: moving away from Counters - strategy?
Ahh, forgot to mention: we have RF=2, sorry! LWT requires RF >= 3, otherwise we can not tolerate losing a node (because LOCAL_QUORUM is working in the background, which you can not really change AFAIK...) Or am I wrong?

Plus, in a highly concurrent setup writing the same PK, this optimistic-locking fashion would end up in lots of retries I'm afraid, eventually making this strategy much more expensive. Or am I wrong here too?

Cheers
Attila

On Sun, 7 Mar 2021, 05:20 Jeff Jirsa, wrote:

> You can do this with conditional (CAS) updates - update ... set c=y if c=x
>
> Requires serial writes and serial reads, so a bit more expensive, but allows TTL.
>
> On Mar 6, 2021, at 8:03 AM, Attila Wind wrote:
>
> Hi guys,
>
> We do use Counter tables a lot because in our app we have several things to count (business logic).
>
> The more time we work with Cassandra, the more we keep hearing: "you should not use counter tables because ..." Yes, we also feel here and there that the trade-off is too restrictive - what hurts us nowadays is that deleting counters seems not that simple... We also miss the TTL possibility a lot.
>
> But I have to confess I do not see an obvious migration strategy here... What bothers me is e.g. concurrency, and wrong results because of that. Namely: if I want to fulfill what "UPDATE table SET mycounter = mycounter + x WHERE ..." does with a traditional table (with an int column), I need to do this:
>
> 1. read the value of "mycounter"
> 2. add x to the value I read (in memory)
> 3. update mycounter = new value
>
> Needless to say, if I have a race condition, so ThreadA and ThreadB are executing the above sequence at ~ the same time, then the mycounter value will be wrong...
>
> I started to wonder: how do you solve this problem? Is anyone aware of any nice post/article regarding migration strategy - stepping away from counters?
>
> thanks!
>
> --
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +49 176 43556932
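[To make the trade-off concrete: a minimal sketch of the read-modify-CAS loop Jeff describes, written against the DataStax Java driver 3.x API used elsewhere in this digest. The keyspace, table and column names (my_ks.counts, pk, c) are made-up placeholders, not anything from the thread.]

```java
import com.datastax.driver.core.*;

public class CasCounterSketch {

    /**
     * Adds x to column c of row pk using a conditional (CAS) update.
     * Assumes the row already exists; a first INSERT ... IF NOT EXISTS
     * would be needed to bootstrap it. Returns false when maxRetries
     * consecutive CAS attempts lost the race.
     */
    static boolean add(Session session, String pk, long x, int maxRetries) {
        // In real code, prepare these once, not per call.
        PreparedStatement read = session.prepare(
                "SELECT c FROM my_ks.counts WHERE pk = ?");
        PreparedStatement cas = session.prepare(
                "UPDATE my_ks.counts SET c = ? WHERE pk = ? IF c = ?");

        for (int attempt = 0; attempt < maxRetries; attempt++) {
            long current = session.execute(read.bind(pk)).one().getLong("c");
            ResultSet rs = session.execute(cas.bind(current + x, pk, current));
            if (rs.wasApplied()) {
                return true;   // our expected value still matched
            }
            // Another writer changed c in between -> re-read and retry.
        }
        return false;
    }
}
```

[This is exactly where the retry concern above bites: under heavy contention on the same PK, wasApplied() comes back false often, and every retry costs another Paxos round plus a read.]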
moving away from Counters - strategy?
Hi guys,

We do use Counter tables a lot because in our app we have several things to count (business logic).

The more time we work with Cassandra, the more we keep hearing: "you should not use counter tables because ..." Yes, we also feel here and there that the trade-off is too restrictive - what hurts us nowadays is that deleting counters seems not that simple... We also miss the TTL possibility a lot.

But I have to confess I do not see an obvious migration strategy here... What bothers me is e.g. concurrency, and wrong results because of that. Namely: if I want to fulfill what "UPDATE table SET mycounter = mycounter + x WHERE ..." does with a traditional table (with an int column), I need to do this:

1. read the value of "mycounter"
2. add x to the value I read (in memory)
3. update mycounter = new value

Needless to say, if I have a race condition, so ThreadA and ThreadB are executing the above sequence at ~ the same time, then the mycounter value will be wrong...

I started to wonder: how do you solve this problem? Is anyone aware of any nice post/article regarding migration strategy - stepping away from counters?

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: underutilized servers
Thanks Bowen,

* "How do you split?"

Challenging to answer briefly, but let me try: the physical host has cores with idx 0 - 11 (6 physical and 6 virtual, in pairs - 0 and 6 belong together, then 1 and 7, then 2 and 8 and so on). What we do is that in the virt-install command we use

--cpu host-passthrough --cpuset={{virtinst_cpu_set}} --vcpus=6

where {{virtinst_cpu_set}} is
- 0,6,1,7,2,8 - for the CassandraVM
- 3,9,4,10,5,11 - for the other VM (we split the physical host into 2 VMs)

* "do you expose physical disks to the VM or use disk image files"

No images. The physical host has 2 spinning disks and 1 SSD drive. The CassandraVM gets explicitly assigned 1 of the spinning disks, and she also gets assigned a partition of the SSD (which is used for commit logs only, so that is separated from the data).

* "A 50-70% utilization of a 1 Gbps network interface on average doesn't sound good at all."

Yes, this is weird... Especially because e.g. if we bring down a node, the other 2 nodes (we go with RF=2) produce ~600Mb of hints files / minute. And assuming hint files are basically the saved "network traffic" while the node is down, this would still just give 10Mb/sec... OK, these are just the replicated updates; there are also reads, and of course the App layer is also reading, but even with that in mind it does not add up... So we will try to do further analysis here.

Thanks for the article regarding the Counter tables too! Actually we have known for a while that there are "interesting" things going on around the Counter tables; it is surprising how difficult it is to find info regarding this topic... I personally tried to look around here several times and always just got the same information in posts... Moving away from counters would not be bad, especially because of the difficulties around DELETEing them (we also feel it), however I do not see any obvious migration strategy here... But maybe let me ask this in a separate question. Might make more sense... :-)

Thanks again - and thanks to the others as well. It looks like mastering "nodetool tpstats" and the Cassandra thread pools would be worth some time... :-)

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 06.03.2021 13:03, Bowen Song wrote:

Hi Attila,

Addressing your data modelling issue is definitely important, and this alone may be enough to solve all the issues you have with Cassandra.

* "Since these are VMs, is there any chance they are competing for resources on the same physical host?" We are splitting the physical hardware into 2 VMs - and resources (cpu cores, disks, ram) are all assigned in a dedicated fashion to the VMs, without intersection.

How do you split? Making the number of cores in all VMs sum to the total number of physical CPU cores is not enough, because context switches and possible thread contentions will waste CPU cycles. Since you have also said 8-12% of CPU time is spent in sys mode, I think it warrants an investigation. Also, do you expose physical disks to the VM or use disk image files? Disk image files can be slow, especially for high-IOPS random reads. Personally, I won't recommend running a database on a VM other than for dev/testing/etc. purposes. If possible, you should try to add a node running on a bare-metal server of a similar spec as the VM, and see if there's any noticeable performance difference between this bare-metal node and the VM nodes.

* The bandwidth limit is 1Gbit/sec (so 120Mb/sec) BUT it is the limit of the physical host - so our 2 VMs are competing here. Possibly the Cassandra VM has ~50-70% of it...

A 50-70% utilization of a 1 Gbps network interface on average doesn't sound good at all. That's over 60MB/s of network traffic, constantly. Can you investigate why this is happening? Do you really read/write that much? Or is it something else?

* "nodetool tpstats" whooa, I never used it, we definitely need some learning here to even understand the output... :-) But I copy it here at the bottom... maybe it clearly shows something to someone who can read it...

I noticed that you are using counters in Cassandra. I have to say that I haven't had a good experience with Cassandra counters. An article <https://ably.com/blog/cassandra-counter-columns-nice-in-theory-hazardous-in-practice> which I read recently may convince you to get rid of them. I also don't think counters are something the Cassandra developers are focused on, because things like CASSANDRA-6506 <https://issues.apache.org/jira/browse/CASSANDRA-6506> have been sitting there for many years. Use your database software for its strengths, not its weaknesses. You have Cassandra, but you don't have to use every feature in Cassandra. Sometimes ano
Re: underutilized servers
[truncated nodetool tpstats output - the remaining pools (..._REQ, REPLICATION_DONE_REQ, PAXOS_PROPOSE_RSP) are all zeros]

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 05.03.2021 17:45, Bowen Song wrote:

Based on my personal experience, the combination of slow read queries and low CPU usage is often an indicator of bad table schema design (e.g. large partitions) or a bad query (e.g. without a partition key).

Check the Cassandra logs first: is there any long stop-the-world GC? Tombstone warnings? Anything else that's out of the ordinary? Check the output from "nodetool tpstats": are there any pending or blocked tasks? Which thread pool(s) are they in? Is there a high number of dropped messages?

If you can't find anything useful in the Cassandra server logs and "nodetool tpstats", try to get a few slow queries from your application's log, and run them manually in cqlsh. Are the results very large? How long do they take?

Regarding some of your observations:

> CPU load is around 20-25% - so we have lots of spare capacity

Is it a case of very few threads each using nearly 100% of a CPU core? If so, what are those threads? (I find the ttop command from the sjk tool <https://github.com/aragozin/jvm-tools> very helpful)

> network load is around 50% of the full available bandwidth

This sounds alarming to me. May I ask what's the full available bandwidth? Do you have a lot of CPU time spent in sys (vs user) mode?

On 05/03/2021 14:48, Attila Wind wrote:

Hi guys,

I have a DevOps-related question - hope someone here could give some ideas/pointers...

We are running a 3-node Cassandra cluster. Recently we realized we have performance issues, and based on the investigation we did, it seems our bottleneck is the Cassandra cluster: the application layer is waiting a lot for Cassandra ops. So queries are running slow on the Cassandra side, however according to our monitoring it looks like the Cassandra servers still have lots of free resources...

The Cassandra machines are virtual machines (we own the physical hosts too) built with kvm - with 6 CPU cores (3 physical) and 32GB RAM dedicated to them. We are using the Ubuntu Linux 18.04 distro - everywhere the same version (the physical and virtual hosts). We are running Cassandra 4.0-alpha4.

What we see is:

* CPU load is around 20-25% - so we have lots of spare capacity
* iowait is around 2-5% - so disk bandwidth should be fine
* network load is around 50% of the full available bandwidth
* loadavg is max around 4 - 4.5 but typically around 3 (because of the cpu count, 6 would represent 100% load)

and still, query performance is slow... and we do not understand what could hold Cassandra back from fully utilizing the server resources... We are clearly missing something!

Anyone any idea / tip?

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
underutilized servers
Hi guys,

I have a DevOps-related question - hope someone here could give some ideas/pointers...

We are running a 3-node Cassandra cluster. Recently we realized we have performance issues, and based on the investigation we did, it seems our bottleneck is the Cassandra cluster: the application layer is waiting a lot for Cassandra ops. So queries are running slow on the Cassandra side, however according to our monitoring it looks like the Cassandra servers still have lots of free resources...

The Cassandra machines are virtual machines (we own the physical hosts too) built with kvm - with 6 CPU cores (3 physical) and 32GB RAM dedicated to them. We are using the Ubuntu Linux 18.04 distro - everywhere the same version (the physical and virtual hosts). We are running Cassandra 4.0-alpha4.

What we see is:

* CPU load is around 20-25% - so we have lots of spare capacity
* iowait is around 2-5% - so disk bandwidth should be fine
* network load is around 50% of the full available bandwidth
* loadavg is max around 4 - 4.5 but typically around 3 (because of the cpu count, 6 would represent 100% load)

and still, query performance is slow... and we do not understand what could hold Cassandra back from fully utilizing the server resources... We are clearly missing something!

Anyone any idea / tip?

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: strange behavior of counter tables after losing a node
Thanks Elliott, yepp! This is exactly what we also figured out as a next step: upgrade our TEST env to that so we can re-evaluate the test we did. Makes 100% sense.

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 27.01.2021 10:18, Elliott Sims wrote:

To start with, maybe update to beta4. There's an absolutely massive list of fixes since alpha4. I don't think the alphas are expected to be in a usable/low-bug state necessarily, whereas beta4 is approaching RC status.

On Tue, Jan 26, 2021, 10:44 PM Attila Wind wrote:

Hey All,

I'm coming back on my own question (see below) as this has happened to us again 2 days later, so we took the time to further analyse this issue. I'd like to share our experiences and the workaround we figured out too.

To quickly sum up the most important details again:

* we have a 3-node cluster - Cassandra 4-alpha4 and RF=2 - in one DC
* we are using ONE consistency level in all queries
* if we lose one node from the cluster, then
  o non-counter table writes are fine, the remaining 2 nodes take over everything
  o but counter table writes start to fail with the exception "com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during COUNTER write query at consistency ONE (1 replica were required but only 0 acknowledged the write)"
  o the two remaining nodes are both producing hints files for the fallen one
* just a note: counter_write_request_timeout_in_ms = 1, write_request_timeout_in_ms = 5000 in our cassandra.yaml

To test this a bit further, we did the following:

* we shut down one of the nodes normally. In this case we do not get the above behavior - everything happens as it should, no failures on counter table writes, so this is good
* we reproduced the issue in our TEST env by hard-killing one of the nodes instead of a normal shutdown (simulating a hardware failure like we had in PROD). Bingo, the issue starts immediately!

Based on the above observations, the "normal shutdown - no problem" case gave us an idea - so now we have a workaround for getting the cluster back into a working state in case we lose a node permanently (or for a long time at least):

1. (in our case) we stop the App to stop all Cassandra operations
2. stop all remaining nodes in the cluster normally
3. restart them normally

This way the remaining nodes realize the failed node is down and they jump into the expected processing - everything works, including counter table writes.

If anyone has any idea what to check / change / do in our cluster, I'm all ears! :-)

thanks

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 22.01.2021 07:35, Attila Wind wrote:

Hey guys,

Yesterday we had an outage after we lost a node, and we saw behavior we can not explain. Our data schema has both counter and normal tables. And we have replicationFactor = 2 and consistency level LOCAL_ONE (explicitly set).

What we saw: after a node went down, the updates of the counter tables slowed down. A lot! These updates normally take only a few millisecs but now started to take 30-60 seconds(!). At the same time, the write ops against non-counter tables did not show any difference. The app log was silent in the sense of errors. So the queries - including the counter table updates - were not failing at all (otherwise we would see exceptions coming from the DAO layer, originating from the Cassandra driver).

One more thing: only those updates suffered from the above huuuge wait time where the lost node was involved (due to the partition key). Other updates just went fine.

The whole thing looks like Cassandra internally started to wait - a lot - for the lost node. Updates finally succeeded without failure - at least for the App (the client).

Did anyone ever experience similar behavior? What could be an explanation for the above?

Some more details: the App is implemented in Java 8, we are using the Datastax driver 3.7.1, and the server cluster is running Cassandra 4.0 alpha 4. Cluster size is 3 nodes.

Any feedback is appreciated! :-)

thanks

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: strange behavior of counter tables after losing a node
Hey All,

I'm coming back on my own question (see below) as this has happened to us again 2 days later, so we took the time to further analyse this issue. I'd like to share our experiences and the workaround we figured out too.

To quickly sum up the most important details again:

* we have a 3-node cluster - Cassandra 4-alpha4 and RF=2 - in one DC
* we are using ONE consistency level in all queries
* if we lose one node from the cluster, then
  o non-counter table writes are fine, the remaining 2 nodes take over everything
  o but counter table writes start to fail with the exception "com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during COUNTER write query at consistency ONE (1 replica were required but only 0 acknowledged the write)"
  o the two remaining nodes are both producing hints files for the fallen one
* just a note: counter_write_request_timeout_in_ms = 1, write_request_timeout_in_ms = 5000 in our cassandra.yaml

To test this a bit further, we did the following:

* we shut down one of the nodes normally. In this case we do not get the above behavior - everything happens as it should, no failures on counter table writes, so this is good
* we reproduced the issue in our TEST env by hard-killing one of the nodes instead of a normal shutdown (simulating a hardware failure like we had in PROD). Bingo, the issue starts immediately!

Based on the above observations, the "normal shutdown - no problem" case gave us an idea - so now we have a workaround for getting the cluster back into a working state in case we lose a node permanently (or for a long time at least):

1. (in our case) we stop the App to stop all Cassandra operations
2. stop all remaining nodes in the cluster normally
3. restart them normally

This way the remaining nodes realize the failed node is down and they jump into the expected processing - everything works, including counter table writes.

If anyone has any idea what to check / change / do in our cluster, I'm all ears! :-)

thanks

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 22.01.2021 07:35, Attila Wind wrote:

Hey guys,

Yesterday we had an outage after we lost a node, and we saw behavior we can not explain. Our data schema has both counter and normal tables. And we have replicationFactor = 2 and consistency level LOCAL_ONE (explicitly set).

What we saw: after a node went down, the updates of the counter tables slowed down. A lot! These updates normally take only a few millisecs but now started to take 30-60 seconds(!). At the same time, the write ops against non-counter tables did not show any difference. The app log was silent in the sense of errors. So the queries - including the counter table updates - were not failing at all (otherwise we would see exceptions coming from the DAO layer, originating from the Cassandra driver).

One more thing: only those updates suffered from the above huuuge wait time where the lost node was involved (due to the partition key). Other updates just went fine.

The whole thing looks like Cassandra internally started to wait - a lot - for the lost node. Updates finally succeeded without failure - at least for the App (the client).

Did anyone ever experience similar behavior? What could be an explanation for the above?

Some more details: the App is implemented in Java 8, we are using the Datastax driver 3.7.1, and the server cluster is running Cassandra 4.0 alpha 4. Cluster size is 3 nodes.

Any feedback is appreciated! :-)

thanks

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
strange behavior of counter tables after losing a node
Hey guys,

Yesterday we had an outage after we lost a node, and we saw behavior we can not explain. Our data schema has both counter and normal tables. And we have replicationFactor = 2 and consistency level LOCAL_ONE (explicitly set).

What we saw: after a node went down, the updates of the counter tables slowed down. A lot! These updates normally take only a few millisecs but now started to take 30-60 seconds(!). At the same time, the write ops against non-counter tables did not show any difference. The app log was silent in the sense of errors. So the queries - including the counter table updates - were not failing at all (otherwise we would see exceptions coming from the DAO layer, originating from the Cassandra driver).

One more thing: only those updates suffered from the above huuuge wait time where the lost node was involved (due to the partition key). Other updates just went fine.

The whole thing looks like Cassandra internally started to wait - a lot - for the lost node. Updates finally succeeded without failure - at least for the App (the client).

Did anyone ever experience similar behavior? What could be an explanation for the above?

Some more details: the App is implemented in Java 8, we are using the Datastax driver 3.7.1, and the server cluster is running Cassandra 4.0 alpha 4. Cluster size is 3 nodes.

Any feedback is appreciated! :-)

thanks

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: Cassandra timeout during read query
Hey Deepak, "Are you suggesting to reduce the fetchSize (right now fetchSize is 5000) for this query?" Definitely yes! If you would go with 1000 only that would give 5x more chance to the concrete Cassandra node/nodes which is/are executing your query to finish in time pulling together the records (page) - thus helps you to avoid the timeout issue. Based on our measurements smaller page sizes does not add too much to the overall query time at all - but helps Cassandra a lot to eventually fulfill the full request as she can do much better load balancing too as you are iterating over your result set. I would give it a try - same tactics helped a lot on our side I also recommend to try to optimize your data in parallel with the above - if possible and there is space for improvement. All I wrote earlier counts a lot. You need to also take care of data cleanup strategies in your tables to keep the amount of data managed somehow. TTL based approach e.g. is the best if you ask me especially if you have huge data set. cheers Attila Wind http://www.linkedin.com/in/attilaw Mobile: +49 176 43556932 27.10.2020 20:07 keltezéssel, Deepak Sharma írta: Hi Attlila, We did have larger partitions which are now below 100MB threshold after we ran nodetool repair. And now we do see most of the time, query runs are running successfully but there is a small percentage of query runs which are still failing. Regarding your comment ```considered with your fetchSize together (driver setting on the query level)```, can you elaborate more on it? Are you suggesting to reduce the fetchSize (right now fetchSize is 5000) for this query? Also, we are trying to use prefetch feature as well but it is also not helping. Following is the code: Iterator iter = resultSet.iterator(); while (iter.hasNext()) { if (resultSet.getAvailableWithoutFetching() <= fetchSize && !resultSet.isFullyFetched()) { resultSet.fetchMoreResults(); } Row row = iter.next(); . } Thanks, Deepak On Sat, Sep 19, 2020 at 6:56 PM Deepak Sharma mailto:sharma.dee...@salesforce.com>> wrote: Thanks Attila and Aaron for the response. These are great insights. I will check and get back to you in case I have any questions. Best, Deepak On Tue, Sep 15, 2020 at 4:33 AM Attila Wind wrote: Hi Deepak, Aaron has right - in order being able to help (better) you need to share those details That 5 secs timeout comes from the coordinator node I think - see cassandra.yaml "read_request_timeout_in_ms" setting - that is influencing this But it does not matter too much... The point is that none of the replicas could completed your query within that 5 secs. And this is a clean indication of something is slow with your query. Maybe 4) is a bit less important here, or I would a bit make it more precise: considered with your fetchSize together (driver setting on the query level) By experience one reason could be if the query which used to works starts not to work any longer is growing number of data. And a possible "wide cluster" problem. Do you have monitoring on the Cassandra machines? What does iowait show? (for us when things like this will start happening is a clean indication) cheers Attila Wind http://www.linkedin.com/in/attilaw Mobile: +49 176 43556932 14.09.2020 18:36 keltezéssel, Aaron Ploetz írta: Deepak, Can you reply with: 1) The query you are trying to run. 2) The table definition (PRIMARY KEY, specifically). 3) Maybe a little description of what the table is designed to do. 4) How much data you're expecting returned (both # of rows and data size). 
Thanks, Aaron On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma <mailto:sharma.dee...@salesforce.com.invalid> wrote: Hi There, We are running into a strange issue in our Cassandra Cluster where one specific query is failing with following error: Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 0 replica responded) This is not a typical query read timeout that we know for sure. This error is getting spit out within 5 seconds and the query timeout we have set is around 30 seconds Can we know what is happening here and how can we reproduce this in our local environment? Thanks, Deepak
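[For reference, a minimal sketch of what lowering the fetchSize looks like with the DataStax Java driver 3.x used in this thread; the contact point, keyspace and table names are made up for illustration.]

```java
import com.datastax.driver.core.*;

public class FetchSizeSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // 1000 instead of 5000: each page asks the replicas for 5x less work
        // per round-trip, so each page is far less likely to hit the
        // coordinator's read timeout.
        Statement stmt = new SimpleStatement(
                "SELECT * FROM my_ks.my_table WHERE pk = ?", "some-key");
        stmt.setFetchSize(1000);

        for (Row row : session.execute(stmt)) {
            // the driver fetches the next page transparently as you iterate
        }
        cluster.close();
    }
}
```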
best pointers to learn Cassandra maintenance
Hey Guys,

We have already started to feel that however awesome Cassandra performance is in the beginning, over time - as
- more and more data is present in the tables,
- more and more deletes create tombstones,
- the cluster gets here and there not that well balanced,
performance can drop quickly and significantly...

After ~1 year of learning curve we had to realize that from time to time we run into things like "running repairs", "running compactions", understanding tombstones (row and range), TTLs, etc. etc. - all of this becomes critical as data is growing. But on the other hand we also often see lots of warnings, like "if you start Cassandra Reaper you can not stop doing that"... I feel a bit confused now, and so far I never ran into an article which really deeply explains: why? Why this? Why that? Why not this?

So I think the time has come for us in the team to start focusing on these topics now. Invest time into better understanding. Really learn what "repair" means, and all its consequences, etc.

So: does anyone have any "you must read it" recommendations around these "long term maintenance" topics? I mean really well explained blog post(s), article(s), book(s). Not some "half done" or "I quickly write a post because it was too long ago when I blogged something..." things :-)

Good pointers would be appreciated!

thanks

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
data modeling qu: use a Map datatype, or just simple rows... ?
Hey guys,

I'm curious about your experiences regarding a data modeling question we are facing. At the moment we see 2 major different approaches in terms of how to build the tables. But I've been googling around for days already, with no luck finding any useful material explaining how a Map (as a collection datatype) works on the storage engine, and what could surprise us later if we ... So I decided to ask this question... (If someone has some nice pointers here, that is also much appreciated!)

So, to describe the problem in a simplified form:

* Imagine you have users (everyone is identified with a UUID),
* and we want to answer a simple question: "have we seen this guy before?"
* we "just" want to be able to answer this question for a limited time - let's say for 3 months
* but... there are lots and lots of users we run into... many millions each day...
* and only ~15-20% of them are returning users - so there are many guys we might see just once

We are thinking about something like a big big Map, in the form of userId => lastSeenTimestamp. Obviously, if we had something like that, then answering the above question is simply: if(map.get(userId) != null) => TRUE - we have seen the guy before.

Regarding the 2 major modelling approaches I mentioned above:

Approach 1 - just simply use a table, something like this:

CREATE TABLE IF NOT EXISTS users (
    user_id varchar,
    last_seen int,    -- a UNIX timestamp is enough, that's why int
    PRIMARY KEY (user_id)
) WITH default_time_to_live = <3 months of seconds>;

Approach 2 - to not produce that many rows, "cluster" the guys a bit together (into 1 row), so introduce a hashing function over the userId, producing a value between [0; 1], and go with a table like:

CREATE TABLE IF NOT EXISTS users (
    user_id_hash int,
    users_seen map<varchar, int>,    -- this is a userId => last timestamp map
    PRIMARY KEY (user_id_hash)
) WITH default_time_to_live = <3 months of seconds>;    -- yes, it's clearly not a good enough way ...

In theory:

* on the WRITE path both representations give us a way to do the write without the need of a read
* even the READ path is pretty efficient in both cases
* Approach 2 is definitely worse when we come to the cleanup - "remove info if older than 3 months"
* Approach 2 might affect the balance of the cluster more - that's clear (however not that much, due to the "law of large numbers" and enough random factors)

And what we are struggling with is: what do you think, which approach would be better over time? So which will slow down the cluster less, considering compaction etc. etc. As far as we can see, the real question is: which hurts more?

* many more rows, but very small rows (regarding data size), or
* many fewer rows, but much bigger rows (regarding data size)?

Any thoughts, comments, pointers to related case studies, articles, etc. are highly appreciated!! :-)

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
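[A minimal sketch of how Approach 1 plays out at the driver level (DataStax Java driver 3.x, which other threads in this digest use); it just restates the read/write paths described above, assuming the Approach 1 schema.]

```java
import com.datastax.driver.core.*;

public class SeenUsersSketch {
    private final Session session;
    private final PreparedStatement touch;
    private final PreparedStatement lookup;

    SeenUsersSketch(Session session) {
        this.session = session;
        // INSERT is an upsert in Cassandra; the table-level
        // default_time_to_live (3 months) expires stale users for free.
        this.touch = session.prepare(
                "INSERT INTO users (user_id, last_seen) VALUES (?, ?)");
        this.lookup = session.prepare(
                "SELECT last_seen FROM users WHERE user_id = ?");
    }

    /** READ path: "have we seen this guy before?" - one single-partition read. */
    boolean seenBefore(String userId) {
        return session.execute(lookup.bind(userId)).one() != null;
    }

    /** WRITE path: no prior read needed - just overwrite last_seen. */
    void markSeen(String userId, int unixTimestamp) {
        session.execute(touch.bind(userId, unixTimestamp));
    }
}
```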
Re: Cassandra timeout during read query
Hi Deepak,

Aaron is right - in order to be able to help (better) you need to share those details.

That 5 secs timeout comes from the coordinator node I think - see the cassandra.yaml "read_request_timeout_in_ms" setting - that is influencing this. But it does not matter too much... The point is that none of the replicas could complete your query within those 5 secs. And this is a clean indication that something is slow with your query.

Maybe 4) is a bit less important here, or I would make it a bit more precise: it should be considered together with your fetchSize (driver setting on the query level).

From experience, one reason a query which used to work starts to fail could be a growing amount of data. And a possible "wide cluster" problem.

Do you have monitoring on the Cassandra machines? What does iowait show? (for us, when things like this start happening, that is a clean indication)

cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 14.09.2020 18:36, Aaron Ploetz wrote:

Deepak,

Can you reply with:

1) The query you are trying to run.
2) The table definition (PRIMARY KEY, specifically).
3) Maybe a little description of what the table is designed to do.
4) How much data you're expecting returned (both # of rows and data size).

Thanks,
Aaron

On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma wrote:

Hi There,

We are running into a strange issue in our Cassandra cluster where one specific query is failing with the following error:

Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 0 replica responded)

This is not a typical query read timeout, that we know for sure. This error is getting spit out within 5 seconds, and the query timeout we have set is around 30 seconds.

Can we know what is happening here and how we can reproduce this in our local environment?

Thanks,
Deepak
best setup of tombstones cleanup over a "wide" table (was: efficient delete over a "wide" table?)
Thank you guys for the answers - I expected this but wanted to verify (who knows how smart Cassandra can be in the background! :-) )

@Jeff: unfortunately the records we will pick up for delete are not necessarily "neighbours" in terms of creation time, so forming contiguous ranges can not be done...

Just one more question left in this case... As this way we will have lots of row tombstones generated over this "wide" table, what would be your recommended table setup here (in terms of gc_grace_seconds, compaction, compression, etc. etc.)? Currently we have the default setup for everything, which I believe should be fine-tuned a bit better.

FYI: this table gets ~500k new UUID-keyed rows every day in each partition...

thanks a lot!

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 04.09.2020 16:33, Jeff Jirsa wrote:

As someone else pointed out, it's the same number of tombstones. Doing distinct queries gives you a bit more flexibility to retry if one fails, but multiple in one command avoids some contention on the memtable partition objects.

If you happen to be using type-1 uuids (timeuuid) AND you're deleting contiguous ranges, you could do a

DELETE ... WHERE uuid >= ? AND uuid <= ?

This would trade lots of tombstones for a single range tombstone, but may not match your model.

On Sep 3, 2020, at 11:57 PM, Attila Wind wrote:

Hi C* gurus,

I'm looking for the best strategy to delete records from a "wide" table. "wide" means the table stores records which have a UUID-style id element of the key - within each partition. So yes, it's not the partitioning key... The partitioning key is actually kind of a customerId at the moment, and actually I'm not even sure this is the right model for this table... Given the fact that the number of customerIds <<< the number of UUIDs, probably not. But let's exclude this for a moment and come back to my main question!

So the question: when I delete records from this table - given the fact I can and I will delete in a "batch fashion" (imagine kind of a scheduled job which collects, let's say, 1000 records) every time I do deletes - would there be a difference (in terms of generated tombstones) if I would

a) issue deletes one-by-one, like

DELETE FROM ... WHERE ... uuid = 'a'
DELETE FROM ... WHERE ... uuid = 'b'
...
DELETE FROM ... WHERE ... uuid = 'z'

or b) issue deletes in a group fashion, like

DELETE FROM ... WHERE ... uuid in ('a', 'b', ... 'z')

? Or is there any other way to efficiently delete which I am missing here?

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
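[For completeness, a minimal sketch of the timeuuid range delete Jeff suggests, using the driver 3.x UUIDs helper; the table (my_ks.events) and its layout are hypothetical, and - as noted above - this only works when the rows to purge form a contiguous creation-time range.]

```java
import com.datastax.driver.core.*;
import com.datastax.driver.core.utils.UUIDs;
import java.util.UUID;

public class RangePurgeSketch {
    /**
     * Deletes every row of one partition whose timeuuid clustering key
     * falls between two timestamps - one range tombstone instead of
     * thousands of row tombstones.
     */
    static void purge(Session session, String customerId,
                      long fromMillis, long toMillis) {
        PreparedStatement purge = session.prepare(
                "DELETE FROM my_ks.events WHERE customer_id = ? "
              + "AND uuid >= ? AND uuid <= ?");
        UUID from = UUIDs.startOf(fromMillis); // smallest timeuuid at fromMillis
        UUID to   = UUIDs.endOf(toMillis);     // largest timeuuid at toMillis
        session.execute(purge.bind(customerId, from, to));
    }
}
```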
efficient delete over a "wide" table?
Hi C* gurus,

I'm looking for the best strategy to delete records from a "wide" table. "wide" means the table stores records which have a UUID-style id element of the key - within each partition. So yes, it's not the partitioning key... The partitioning key is actually kind of a customerId at the moment, and actually I'm not even sure this is the right model for this table... Given the fact that the number of customerIds <<< the number of UUIDs, probably not. But let's exclude this for a moment and come back to my main question!

So the question: when I delete records from this table - given the fact I can and I will delete in a "batch fashion" (imagine kind of a scheduled job which collects, let's say, 1000 records) every time I do deletes - would there be a difference (in terms of generated tombstones) if I would

a) issue deletes one-by-one, like

DELETE FROM ... WHERE ... uuid = 'a'
DELETE FROM ... WHERE ... uuid = 'b'
...
DELETE FROM ... WHERE ... uuid = 'z'

or b) issue deletes in a group fashion, like

DELETE FROM ... WHERE ... uuid in ('a', 'b', ... 'z')

? Or is there any other way to efficiently delete which I am missing here?

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: tombstones - however there are no deletes
Right! Silly me (regarding "can't have null for clustering column") :-) OK, the code is modified, we stopped using NULL on that column. In a few days we will see if this was the cause.

Thanks for the useful info everyone! Helped a lot!

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 21.08.2020 11:04, Alex Ott wrote:

Inserting null for any column will generate a tombstone (and you can't have null for a clustering column, except the case when it's an empty partition with a static column). If you're really inserting new data, not overwriting existing data - use UNSET instead of null.

On Fri, Aug 21, 2020 at 10:45 AM Attila Wind wrote:

Thanks a lot! I will process every pointer you gave - appreciated!

1. We do have a collection column in that table, but that is a frozen Map (we have only 1 such column) - so I guess "Tombstones are also implicitly created any time you insert or update a row which has an (unfrozen) collection column: list<>, map<> or set<>. This has to be done in order to ensure the new write replaces any existing collection entries." does not really apply here.

2. "Isn't it so that explicitly setting a column to NULL also results in a tombstone" - is this true for all columns? Or just clustering key cols? Because if for all cols (which maybe would make more sense to me), then we have found the possible reason.. :-) As we do have an Integer column there which is actually NULL often (and so far in all cases).

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 21.08.2020 09:49, Oleksandr Shulgin wrote:

On Fri, Aug 21, 2020 at 9:43 AM Tobias Eriksson wrote:

Isn't it so that explicitly setting a column to NULL also results in a tombstone

True, thanks for pointing that out!

Then as mentioned, the use of list, set, map can also result in tombstones. See https://www.instaclustr.com/cassandra-collections-hidden-tombstones-and-how-to-avoid-them/

And A. Ott has already mentioned both these possible reasons :-)

--
Alex

--
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
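[A minimal sketch of Alex's "UNSET instead of null" advice with the DataStax Java driver 3.x (protocol v4); the table and column names are invented for illustration.]

```java
import com.datastax.driver.core.*;

public class UnsetSketch {
    static void insertWithoutTombstone(Session session, String pk, int a) {
        PreparedStatement ps = session.prepare(
                "INSERT INTO my_ks.t (pk, a, maybe_b) VALUES (?, ?, ?)");

        // Binding null for maybe_b would write a tombstone cell.
        // Leaving the variable unbound sends UNSET instead: the column
        // is simply not written at all.
        BoundStatement bound = ps.bind()
                .setString("pk", pk)
                .setInt("a", a);
        // If a value was bound earlier and must be cleared again:
        bound.unset("maybe_b");

        session.execute(bound);
    }
}
```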
Re: tombstones - however there are no deletes
Thanks a lot! I will process every pointer you gave - appreciated!

1. We do have a collection column in that table, but that is a frozen Map (we have only 1 such column) - so I guess "Tombstones are also implicitly created any time you insert or update a row which has an (unfrozen) collection column: list<>, map<> or set<>. This has to be done in order to ensure the new write replaces any existing collection entries." does not really apply here.

2. "Isn't it so that explicitly setting a column to NULL also results in a tombstone" - is this true for all columns? Or just clustering key cols? Because if for all cols (which maybe would make more sense to me), then we have found the possible reason.. :-) As we do have an Integer column there which is actually NULL often (and so far in all cases).

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 21.08.2020 09:49, Oleksandr Shulgin wrote:

On Fri, Aug 21, 2020 at 9:43 AM Tobias Eriksson wrote:

Isn't it so that explicitly setting a column to NULL also results in a tombstone

True, thanks for pointing that out!

Then as mentioned, the use of list, set, map can also result in tombstones. See https://www.instaclustr.com/cassandra-collections-hidden-tombstones-and-how-to-avoid-them/

And A. Ott has already mentioned both these possible reasons :-)

--
Alex
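[To make the frozen-vs-unfrozen distinction in point 1 concrete - a sketch with hypothetical tables: fully rewriting an unfrozen collection implicitly deletes the old entries first (that is the hidden tombstone the Instaclustr article describes), while a frozen collection is a single cell that gets plainly overwritten.]

```java
import com.datastax.driver.core.Session;

public class CollectionDdlSketch {
    static void createTables(Session session) {
        // Unfrozen map: every full re-write (INSERT, or UPDATE ... SET m = {...})
        // first drops the old entries with a range tombstone, then writes
        // the new ones - tombstones with no DELETE in sight.
        session.execute("CREATE TABLE IF NOT EXISTS my_ks.t_unfrozen "
                + "(pk text PRIMARY KEY, m map<text, int>)");

        // Frozen map: the whole map is serialized into one cell, so
        // overwriting it is a plain cell overwrite - no tombstone involved.
        session.execute("CREATE TABLE IF NOT EXISTS my_ks.t_frozen "
                + "(pk text PRIMARY KEY, m frozen<map<text, int>>)");
    }
}
```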
tombstones - however there are no deletes
Hi Cassandra Gurus,

Recently I captured a very interesting warning in the logs, saying:

2020-08-19 08:08:32.492 [cassandra-client-keytiles_data_webhits-nio-worker-2] WARN com.datastax.driver.core.RequestHandler - Query '[3 bound values] select * from visit_session_by_start_time_v4 where container_id=? and first_action_time_frame_id >= ? and first_action_time_frame_id <= ?;' generated server side warning(s): Read 6628 live rows and 6628 tombstone cells for query SELECT * FROM keytiles_data_webhits.visit_session_by_start_time_v4 WHERE container_id = 5YzsPfE2Gcu8sd-76626 AND first_action_time_frame_id > 443837 AND first_action_time_frame_id <= 443670 AND user_agent_type > browser-mobile AND unique_webclient_id > 045d1683-c702-48bd-9d2b-dcf1ca87ac7c AND first_action_ts > 1597815766 LIMIT 6628 (see tombstone_warn_threshold)

What makes this interesting to me is the fact that we never issue any kind of deletes - not even row-level ones - against this table for now. So I'm wondering what can result in tombstone creation in Cassandra, apart from explicit DELETE queries and TTL setup...

My suspicion is (but I'm not sure) that our "select *" read strategy - then calculating everything in-memory, eventually writing back with kinda "update *" queries to Cassandra in this table (so not updating just a few columns but everything) - can lead to these... Can it?

I tried to search around this symptom but was not successful - so I decided to ask you guys, maybe someone can give us a pointer...

Some more info:

* the table does not have TTL set - this mechanism is turned off
* the LIMIT param in the upper query comes from the paging size
* we are using Cassandra 4 alpha3
* we also have a few similarly built tables where we follow the above described "update *" policy on the write path - however those tables are counter tables... when we mass-read them into memory we also go with "select *" logic, reading up tons of rows. The point is, we never saw such a warning for these counter tables, although we are handling them in the same fashion... OK, counter tables work differently, but it is still interesting to me why those never generated things like this

thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: relation btw LWTs and RF
Thank you! The 2nd link you sent is a very, very good description! I recommend it for others too (who might run into this question via mail archive search later...). In my opinion it explains the entire problem space regarding how LWTs work, while also putting them into the context of "consistency level" / the different phases of an LWT very well.

Yesterday I was searching / reading at least 15 different articles + docs - none of them answered my questions entirely (and I just had more and more questions as I progressed with reading) - this one is a nice one!

cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 26.06.2020 08:15, Erick Ramirez wrote:

You are correct. Lightweight transactions perform a read-before-write [1]. The read phase is performed with a serial consistency which requires a quorum of nodes in the local DC (LOCAL_SERIAL) or across the whole cluster (SERIAL) [2]. A quorum of 2 nodes is 2 nodes, so RF=2 cannot tolerate a node outage. Cheers!

[1] https://www.datastax.com/blog/2019/04/lightweight-transactions-datastax-enterprise
[2] https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__table-write-consistency
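[A minimal sketch of where the two consistency levels of an LWT are set with the DataStax Java driver 3.x (hypothetical table/columns): the serial level governs the Paxos read phase Erick describes, and with RF=2 its quorum of 2 is exactly what a single node outage breaks.]

```java
import com.datastax.driver.core.*;

public class LwtConsistencySketch {
    static boolean bumpVersion(Session session, String pk, int expected) {
        Statement cas = new SimpleStatement(
                "UPDATE my_ks.t SET version = ? WHERE pk = ? IF version = ?",
                expected + 1, pk, expected)
                // Paxos phase: quorum of replicas (2 of 2 when RF=2!)
                .setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL)
                // commit/learn phase of the write
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);

        return session.execute(cas).wasApplied();
    }
}
```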
Re: relation btw LWTs and RF
Ah yeah, forgot to mention - I am using Cassandra 4.0-alpha4.

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

On 26.06.2020 08:06, Attila Wind wrote:

Hey guys,

Recently I ran into an interesting situation (by trying to add an optimistic locking strategy to one of the tables), which eventually led me to the following observation. Can you confirm (or argue against) this being correct when I say: "It is not possible to use conditional queries with ReplicationFactor = 2 while tolerating 1 node being down (out of those 2 replicas)"?

Thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
relation btw LWTs and RF
Hey guys,

Recently I ran into an interesting situation (by trying to add an optimistic locking strategy to one of the tables), which eventually led me to the following observation. Can you confirm (or argue against) this being correct when I say: "It is not possible to use conditional queries with ReplicationFactor = 2 while tolerating 1 node being down (out of those 2 replicas)"?

Thanks!

--
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932
Re: IN OPERATOR VS BATCH QUERY
Hi Sergio,

AFAIK you use batches when you want to get an "all or nothing" approach from Cassandra - so turning multiple statements into one atomic operation. One very typical use case for this is when you have denormalized data in multiple tables (optimized for different queries) but you need to modify all of them the same way, as if they were just one entity. This means that if any of your delete statements failed for whatever reason, then all of your delete statements would be rolled back. I think you don't want that overhead here, for sure...

We are not there yet with our development, but we will need similar "cleanup" functionality soon. I was also thinking about the IN operator for similar cases, but I am curious if anyone here has a better idea...

Why does the IN operator blow up the coordinator? I do not entirely get it...

Thanks
Attila

On Fri, Feb 21, 2020 at 3:44, Sergio wrote:

> The current approach is delete from key_value where id = whatever, and it is performed asynchronously from the client. I was thinking to at least reduce the network round-trips between client and coordinator with that batch approach. :)
>
> In any case, I would test whether it improves things or not. So when do you use batch then?
>
> Best,
>
> Sergio
>
> On Thu, Feb 20, 2020, 6:18 PM Erick Ramirez wrote:
>
>> Batches aren't really meant for optimisation in the same way as in an RDBMS. If anything, it will just put pressure on the coordinator having to fire off multiple requests to lots of replicas. The IN operator falls into the same category, and I personally wouldn't use it with more than 2 or 3 partitions, because then the coordinator will suffer from the same problem.
>>
>> If it were me, I'd just issue single-partition deletes and throttle them to a "reasonable" throughput that your cluster can handle. The word "reasonable" is in quotes because only you can determine that magic number for your cluster through testing. Cheers!
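[A minimal sketch of the throttled single-partition deletes Erick recommends, using the driver 3.x async API (the driver ships Guava, used here for the completion callback); the table comes from Sergio's example, while the in-flight cap of 32 is an arbitrary placeholder to be found by testing.]

```java
import com.datastax.driver.core.*;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.List;
import java.util.concurrent.Semaphore;

public class ThrottledDeleteSketch {
    private static final int MAX_IN_FLIGHT = 32; // placeholder - tune by testing

    static void deleteAll(Session session, List<String> ids) throws InterruptedException {
        PreparedStatement del = session.prepare(
                "DELETE FROM key_value WHERE id = ?");
        Semaphore permits = new Semaphore(MAX_IN_FLIGHT);

        for (String id : ids) {
            permits.acquire(); // blocks once MAX_IN_FLIGHT deletes are in flight
            ResultSetFuture f = session.executeAsync(del.bind(id));
            // release the permit when the delete completes (success or failure)
            f.addListener(permits::release, MoreExecutors.directExecutor());
        }
        permits.acquire(MAX_IN_FLIGHT); // drain: wait for the last in-flight deletes
    }
}
```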
Re: Counter table in Cassandra
Hi Garvit,

I can not answer your main question, but when I read your lines one thing kept popping up: "why do you ask this?" So what is the background of this question? Do you see anything smelly?

Actually:

a) I always assumed that naturally there are of course lots of in-parallel activities (writes) against any table, including counters. So of course there are race conditions, and probably racing threads too.

b) Cassandra does not have isolated transactions, so of course in a complex flow (using multiple tables) there is no business-data consistency guarantee for sure.

c) as long as you are doing just +/- ops, it is a mathematical fact that the execution order of the writes is not really important: repeating a +1 increase 5 times will result in a counter higher by 5, whatever the order...

Please share your background, I am interested in it!

Cheers
Attila

On Wed, May 29, 2019 at 2:34, Garvit Sharma wrote:

> Hi,
>
> I am using counter tables in Cassandra and I want to understand how concurrent updates to a counter table are handled in Cassandra.
>
> There is more than one thread responsible for updating the counter for a partition key. Multiple threads can also update the counter for the same key.
>
> In the case when more than one thread updates the counter for the same key, how does Cassandra handle the race condition?
>
> UPDATE cycling.popular_count
> SET popularity = popularity + 1
> WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
>
> Are there overheads of using counter tables?
> Are there alternatives to counter tables?
>
> Thanks,
> --
>
> Garvit Sharma
> github.com/garvitlnmiit/
>
> No Body is a Scholar by birth, its only hard work and strong determination that makes him master.
Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?
Hi Shalom, Thanks for your notes! So you also experienced this thing... fine Then maybe the best rules to follow are these: a) never(!) run a query "ALLOW FILTERING" on a Production cluster b) if you need these queries build a test cluster (somehow) and mirror the data (somehow) OR add denormalized tables (write + code complexity overhead) to fulfill those queries Can we agree on this one maybe as a "good to follow" policy? In our case luckily users = developers always. So I can expect them being aware of the consequences of a particular query. We also have test data fully mirrored into a test cluster. So running those queries on test system is possible. Plus If for whatever reason we really really need to run such a query in Prod I can simply instruct them test query like this in the test system for sure cheers Attila Wind http://www.linkedin.com/in/attilaw Mobile: +36 31 7811355 On 2019. 05. 28. 8:59, shalom sagges wrote: Hi Attila, I'm definitely no guru, but I've experienced several cases where people at my company used allow filtering and caused major performance issues. As data size increases, the impact will be stronger. If you have large partitions, performance will decrease. GC can be affected. And if GC stops the world too long for too many times, you will feel it. I sincerely believe the best way would be to educate the users and remodel the data. Perhaps you need to denormalize your tables or at least use secondary indices (I prefer to keep it as simple as possible and denormalize). If it's a cluster for analytics, perhaps you need to build a designated cluster only for that so if something does break or get too pressured, normal activities wouldn't be affected, but there are pros and cons for that idea too. Hope this helps. Regards, On Tue, May 28, 2019 at 9:43 AM Attila Wind wrote: Hi Gurus, Looks we stopped this thread. However I would be very much curious answers regarding b) ... Anyone any comments on that? I do see this as a potential production outage risk now... Especially as we are planning to run analysis queries by hand exactly like that over the cluster... thanks! Attila Wind http://www.linkedin.com/in/attilaw Mobile: +36 31 7811355 On 2019. 05. 23. 11:42, shalom sagges wrote: a) Interesting... But only in case you do not provide partitioning key right? (so IN() is for partitioning key?) I think you should ask yourself a different question. Why am I using ALLOW FILTERING in the first place? What happens if I remove it from the query? I prefer to denormalize the data to multiple tables or at least create an index on the requested column (preferably queried together with a known partition key). b) Still does not explain or justify "all 8 nodes to halt and unresponsiveness to external requests" behavior... Even if servers are busy with the request seriously becoming non-responsive...? I think it can justify the unresponsiveness. When using ALLOW FILTERING, you are doing something like a full table scan in a relational database. There is a lot of information on the internet regarding this subject such as https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/ Hope this helps. Regards, On Thu, May 23, 2019 at 7:33 AM Attila Wind <mailto:attilaw@swf.technology> wrote: Hi, "When you run a query with allow filtering, Cassandra doesn't know where the data is located, so it has to go node by node, searching for the requested data." a) Interesting... But only in case you do not provide partitioning key right? (so IN() is for partitioning key?) 
b) Still does not explain or justify the "all 8 nodes to a halt and unresponsive to external requests" behavior... Even if the servers are busy with the request, do they seriously become non-responsive...?

cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 0:37, shalom sagges wrote:
Hi Vsevolod,
1) Why such behavior? I thought any given SELECT request is handled by a limited subset of C* nodes and not by all of them, as per the connection consistency / table replication settings, as the case may be.
When you run a query with allow filtering, Cassandra doesn't know where the data is located, so it has to go node by node, searching for the requested data.
2) Is it possible to forbid the ALLOW FILTERING flag for given users/groups?
I'm not familiar with such a flag. In my case, I just try to educate the R&D teams.
Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov <vsfilare...@gmail.com> wrote:
Hello everyone,
We have an 8 node C* cluster with a large volume of unbalanced data. Usual per-partition selects work somewhat fine and are processed by a limited number of nodes, but if a user issues a SELECT WHERE IN () ALLOW FILTERING, such a command brings all 8 nodes to a halt and makes them unresponsive to external requests, while disk IO jumps to 100% across the whole cluster. In several minutes all nodes seem to finish processing the request and the cluster goes back to being responsive. The replication factor across all data is 3.
1) Why such behavior? I thought any given SELECT request is handled by a limited subset of C* nodes and not by all of them, as per the connection consistency / table replication settings, as the case may be.
2) Is it possible to forbid the ALLOW FILTERING flag for given users/groups?
Thank you all very much in advance,
Vsevolod Filaretov.
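As a rough illustration of the denormalization option mentioned in point b) and in Shalom's reply above (a sketch only; the keyspace, table and column names below are made up for the example, not taken from this thread): instead of filtering a non-key column with ALLOW FILTERING, the same data is written into a second table whose partition key is the column you want to query by.

    -- Hypothetical base table (keyspace "app" assumed to exist):
    -- events are looked up by event_id only.
    CREATE TABLE IF NOT EXISTS app.events (
        event_id   uuid,
        user_id    uuid,
        created_at timestamp,
        payload    text,
        PRIMARY KEY (event_id)
    );

    -- Anti-pattern: filtering on a non-key column forces a cluster-wide scan.
    -- SELECT * FROM app.events
    --   WHERE user_id = 123e4567-e89b-12d3-a456-426614174000 ALLOW FILTERING;

    -- Denormalized query table: the same rows, partitioned by user_id,
    -- so the lookup only touches the replicas of one partition.
    CREATE TABLE IF NOT EXISTS app.events_by_user (
        user_id    uuid,
        created_at timestamp,
        event_id   uuid,
        payload    text,
        PRIMARY KEY (user_id, created_at, event_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, event_id ASC);

    -- The application writes to both tables; the read becomes:
    SELECT * FROM app.events_by_user
    WHERE user_id = 123e4567-e89b-12d3-a456-426614174000;

The write path gains a second INSERT (or a logged batch if atomicity matters), which is the "write + code complexity overhead" mentioned above, but the read stays on the replicas of a single partition.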
Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?
Hi Gurus,

Looks like we stopped this thread. However, I would still be very curious about answers regarding b)... Anyone any comments on that?
I do see this as a potential production outage risk now... especially as we are planning to run analysis queries by hand exactly like that over the cluster...
thanks!

Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 11:42, shalom sagges wrote:
a) Interesting... But only in case you do not provide the partitioning key, right? (so the IN() is for the partitioning key?)
I think you should ask yourself a different question: why am I using ALLOW FILTERING in the first place? What happens if I remove it from the query? I prefer to denormalize the data into multiple tables or at least create an index on the requested column (preferably queried together with a known partition key).
b) Still does not explain or justify the "all 8 nodes to a halt and unresponsive to external requests" behavior... Even if the servers are busy with the request, do they seriously become non-responsive...?
I think it can justify the unresponsiveness. When using ALLOW FILTERING, you are doing something like a full table scan in a relational database. There is a lot of information on the internet regarding this subject, such as https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/
Hope this helps.
Regards,

On Thu, May 23, 2019 at 7:33 AM Attila Wind wrote:
Hi,
"When you run a query with allow filtering, Cassandra doesn't know where the data is located, so it has to go node by node, searching for the requested data."
a) Interesting... But only in case you do not provide the partitioning key, right? (so the IN() is for the partitioning key?)
b) Still does not explain or justify the "all 8 nodes to a halt and unresponsive to external requests" behavior... Even if the servers are busy with the request, do they seriously become non-responsive...?

cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 0:37, shalom sagges wrote:
Hi Vsevolod,
1) Why such behavior? I thought any given SELECT request is handled by a limited subset of C* nodes and not by all of them, as per the connection consistency / table replication settings, as the case may be.
When you run a query with allow filtering, Cassandra doesn't know where the data is located, so it has to go node by node, searching for the requested data.
2) Is it possible to forbid the ALLOW FILTERING flag for given users/groups?
I'm not familiar with such a flag. In my case, I just try to educate the R&D teams.
Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov <vsfilare...@gmail.com> wrote:
Hello everyone,
We have an 8 node C* cluster with a large volume of unbalanced data. Usual per-partition selects work somewhat fine and are processed by a limited number of nodes, but if a user issues a SELECT WHERE IN () ALLOW FILTERING, such a command brings all 8 nodes to a halt and makes them unresponsive to external requests, while disk IO jumps to 100% across the whole cluster. In several minutes all nodes seem to finish processing the request and the cluster goes back to being responsive. The replication factor across all data is 3.
1) Why such behavior? I thought any given SELECT request is handled by a limited subset of C* nodes and not by all of them, as per the connection consistency / table replication settings, as the case may be.
2) Is it possible to forbid the ALLOW FILTERING flag for given users/groups?
Thank you all very much in advance,
Vsevolod Filaretov.
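For the "at least create an index on the requested column" suggestion in the quoted reply above, a minimal sketch could look like this (keyspace, table and column names are invented for illustration; a secondary index mainly helps when the query also restricts a partition key, otherwise it is still a cluster-wide scatter/gather):

    -- Hypothetical table (keyspace "shop" assumed to exist), for illustration only.
    CREATE TABLE IF NOT EXISTS shop.orders (
        customer_id uuid,
        order_id    timeuuid,
        status      text,
        total       decimal,
        PRIMARY KEY (customer_id, order_id)
    );

    -- Secondary index on the column that keeps showing up in filters.
    CREATE INDEX IF NOT EXISTS orders_status_idx ON shop.orders (status);

    -- With the index AND the partition key restricted, no ALLOW FILTERING is
    -- needed and the read stays on the replicas of a single partition:
    SELECT * FROM shop.orders
    WHERE customer_id = 123e4567-e89b-12d3-a456-426614174000 AND status = 'OPEN';

    -- Without the partition key the indexed query still fans out to many
    -- nodes, so it should stay out of latency-sensitive production paths.
    SELECT * FROM shop.orders WHERE status = 'OPEN';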
Re: CassKop: a Cassandra operator for Kubernetes developed by Orange
Maybe my understanding is wrong and I am not really a "deployment guru", but it looks to me like Orange (https://github.com/Orange-OpenSource/cassandra-k8s-operator, 1 contributor and 1 commit as of 2019-05-24) and sky-uk/cassandra-operator (https://github.com/sky-uk/cassandra-operator, in alpha phase and not recommended for production, 3 contributors, 24 commits between 2019-03-25 and 2019-05-21, 32 issues) are developing something I could use in my OWN(!) Kubernetes-based solution (even on premises if I want, or whatever). They are both open source. Right?
While DataStax and Instaclustr are commercial players and offer the solution tightly coupled with their cloud services only (I just took a quick look at Instaclustr but could not even figure out pricing info for this service... probably I am lame... or not? :-))
So this looks like nice competition to me... What do I miss?
P.S.: maybe the Orange and sky-uk/cassandra-operator folks should cooperate..?? Others are clearly building a business around it.

cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 24. 20:36, John Sanda wrote:
There is also https://github.com/sky-uk/cassandra-operator

On Fri, May 24, 2019 at 2:34 PM Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
Fantastic! Now there are three teams making k8s operators for C*: DataStax, Instaclustr, and now Orange.
rahul.xavier.si...@gmail.com
http://cassandra.link
I'm speaking at #DataStaxAccelerate, the world's premier #ApacheCassandra conference, and I want to see you there! Use my code Singh50 for 50% off your registration. www.datastax.com/accelerate

On Fri, May 24, 2019 at 9:07 AM Jean-Armel Luce <jaluc...@gmail.com> wrote:
Hi folks,
We are excited to announce that CassKop, a Cassandra operator for Kubernetes developed by Orange teams, is now ready for beta testing.
CassKop works as a usual K8S controller (reconciling the real state with a desired state) and automates Cassandra operations through JMX. All the operations are launched by calling standard K8S APIs (kubectl apply ...) or by using a K8S plugin (kubectl casskop ...). CassKop is developed in Go, based on the CoreOS operator-sdk framework.
Main features already available:
- deploying a rack-aware cluster (or AZ-aware cluster)
- scaling up & down (including cleanups)
- setting and modifying configuration parameters (C* and JVM parameters)
- adding / removing a datacenter in Cassandra (all datacenters must be in the same region)
- rebuilding nodes
- removing a node or replacing a node (in case of hardware failure)
- upgrading C* or Java versions (including upgradesstables)
- monitoring (using Prometheus/Grafana)
- ...
By using local and persistent volumes, it is possible to handle failures or stop/start nodes for maintenance operations with no transfer of data between nodes. Moreover, we can deploy cassandra-reaper in K8S and use it for scheduling repair sessions.
For now, we can deploy a C* cluster only as a mono-region cluster. We will work over the next weeks on being able to deploy a C* cluster as a multi-region cluster.
Still on the roadmap:
- network encryption
- monitoring (exporting logs and metrics)
- backup & restore
- multi-region support
We'd be interested to have you try this and let us know what you think! Please read the description and installation instructions at https://github.com/Orange-OpenSource/cassandra-k8s-operator.
For a quick start, you can also follow this step-by-step guide: https://orange-opensource.github.io/cassandra-k8s-operator/index.html?slides=Slides-CassKop-demo.md#1

The CassKop Team

--
- John
Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?
Hi again,

So, staying with a) for a second...
"Why am I using ALLOW FILTERING in the first place?" Fully agreed! To put it this way: as a reviewer I never want to see the string "allow filtering" in any select issued by production code. I clearly consider it an indicator of a wrong DB design.
Still! There are use cases - and if I am not mistaken the original question was around that - when for whatever reason PERSONS are running such selects manually. E.g. where we use Cassandra we have cases exactly like this: manual selects for analysis purposes. So I think this is a valid use case. And once we have found a valid use case, the question stands. Right?
So back to the question: "But only in case you do not provide the partitioning key, right?" - I assume the answer is yes, right? :-) (see the small CQL illustration after this message)

b) "I think it can justify the unresponsiveness. When using ALLOW FILTERING, you are doing something like a full table scan in a relational database"
I get it. Sure. But is Cassandra so "single-threaded" that if a node is running one(!) big, expensive query it becomes fully unresponsive? I doubt it... That's what I meant by saying "does not explain or justify". From my perspective I definitely consider this kind of unresponsiveness an abnormal state...

cheers
Attila

On 23.05.2019 11:42 AM, shalom sagges wrote:
a) Interesting... But only in case you do not provide the partitioning key, right? (so the IN() is for the partitioning key?)
I think you should ask yourself a different question: why am I using ALLOW FILTERING in the first place? What happens if I remove it from the query? I prefer to denormalize the data into multiple tables or at least create an index on the requested column (preferably queried together with a known partition key).
b) Still does not explain or justify the "all 8 nodes to a halt and unresponsive to external requests" behavior... Even if the servers are busy with the request, do they seriously become non-responsive...?
I think it can justify the unresponsiveness. When using ALLOW FILTERING, you are doing something like a full table scan in a relational database. There is a lot of information on the internet regarding this subject, such as https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/
Hope this helps.
Regards,

On Thu, May 23, 2019 at 7:33 AM Attila Wind wrote:
Hi,
"When you run a query with allow filtering, Cassandra doesn't know where the data is located, so it has to go node by node, searching for the requested data."
a) Interesting... But only in case you do not provide the partitioning key, right? (so the IN() is for the partitioning key?)
b) Still does not explain or justify the "all 8 nodes to a halt and unresponsive to external requests" behavior... Even if the servers are busy with the request, do they seriously become non-responsive...?

cheers
Attila Wind
http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355

On 2019. 05. 23. 0:37, shalom sagges wrote:
Hi Vsevolod,
1) Why such behavior? I thought any given SELECT request is handled by a limited subset of C* nodes and not by all of them, as per the connection consistency / table replication settings, as the case may be.
When you run a query with allow filtering, Cassandra doesn't know where the data is located, so it has to go node by node, searching for the requested data.
2) Is it possible to forbid the ALLOW FILTERING flag for given users/groups?
I'm not familiar with such a flag. In my case, I just try to educate the R&D teams.
Regards,

On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov <vsfilare...@gmail.com> wrote:
Hello everyone,
We have an 8 node C* cluster with a large volume of unbalanced data. Usual per-partition selects work somewhat fine and are processed by a limited number of nodes, but if a user issues a SELECT WHERE IN () ALLOW FILTERING, such a command brings all 8 nodes to a halt and makes them unresponsive to external requests, while disk IO jumps to 100% across the whole cluster. In several minutes all nodes seem to finish processing the request and the cluster goes back to being responsive. The replication factor across all data is 3.
1) Why such behavior? I thought any given SELECT request is handled by a limited subset of C* nodes and not by all of them, as per the connection consistency / table replication settings, as the case may be.
2) Is it possible to forbid the ALLOW FILTERING flag for given users/groups?
Thank you all very much in advance,
Vsevolod Filaretov.
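To illustrate the distinction behind question a) above, here is a small sketch with a hypothetical table (the real schema is not shown in this thread): an IN() restricted to the partition key is routed only to the replicas owning the listed partitions, while a predicate on a non-key column needs ALLOW FILTERING and makes the coordinator scan token ranges across the whole cluster.

    -- Hypothetical table (keyspace "demo" assumed to exist), illustration only.
    CREATE TABLE IF NOT EXISTS demo.readings (
        sensor_id text,
        ts        timestamp,
        value     double,
        status    text,
        PRIMARY KEY (sensor_id, ts)
    );

    -- IN() on the partition key: no ALLOW FILTERING needed; the coordinator
    -- only contacts the replicas that own the listed partitions.
    SELECT * FROM demo.readings WHERE sensor_id IN ('s-01', 's-02');

    -- Predicate on a non-key column: rejected without ALLOW FILTERING, and
    -- with it Cassandra walks token ranges across the whole cluster.
    SELECT * FROM demo.readings WHERE status = 'ERROR' ALLOW FILTERING;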
Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?
Hi, "When you run a query with allow filtering, Cassandra doesn't know where the data is located, so it has to go node by node, searching for the requested data." a) Interesting... But only in case you do not provide partitioning key right? (so IN() is for partitioning key?) b) Still does not explain or justify "all 8 nodes to halt and unresponsiveness to external requests" behavior... Even if servers are busy with the request seriously becoming non-responsive...? cheers Attila Wind http://www.linkedin.com/in/attilaw Mobile: +36 31 7811355 On 2019. 05. 23. 0:37, shalom sagges wrote: Hi Vsevolod, 1) Why such behavior? I thought any given SELECT request is handled by a limited subset of C* nodes and not by all of them, as per connection consistency/table replication settings, in case. When you run a query with allow filtering, Cassandra doesn't know where the data is located, so it has to go node by node, searching for the requested data. 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups? I'm not familiar with such a flag. In my case, I just try to educate the R&D teams. Regards, On Wed, May 22, 2019 at 5:01 PM Vsevolod Filaretov mailto:vsfilare...@gmail.com>> wrote: Hello everyone, We have an 8 node C* cluster with large volume of unbalanced data. Usual per-partition selects work somewhat fine, and are processed by limited number of nodes, but if user issues SELECT WHERE IN () ALLOW FILTERING, such command stalls all 8 nodes to halt and unresponsiveness to external requests while disk IO jumps to 100% across whole cluster. In several minutes all nodes seem to finish ptocessing the request and cluster goes back to being responsive. Replication level across whole data is 3. 1) Why such behavior? I thought any given SELECT request is handled by a limited subset of C* nodes and not by all of them, as per connection consistency/table replication settings, in case. 2) Is it possible to forbid ALLOW FILTERING flag for given users/groups? Thank you all very much in advance, Vsevolod Filaretov.