RE: [EXTERNAL] Re: IN OPERATOR VS BATCH QUERY

2020-02-21 Thread Durity, Sean R
Batches are for atomicity, not performance.

I would do single deletes with a prepared statement. An IN clause causes extra 
work for the coordinator because multiple partitions are affected, so the 
coordinator has to coordinate all the nodes involved in those writes (potentially 
the whole cluster). Availability and performance are compromised by 
multi-partition operations. I do not allow them.
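To make the fan-out concrete, here is a toy Python model. It is an assumption-heavy sketch, not Cassandra's real partitioner: nodes and keys are hashed onto a ring with MD5, and each key is owned by the first node clockwise plus the next RF-1 nodes, roughly like SimpleStrategy. ToyRing, token() and the node names are hypothetical. A single-partition delete waits only on that key's RF replicas, while one IN over ten keys makes a single coordinator reach the union of all their replica sets.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Toy partitioner: first 4 bytes of MD5 as an unsigned ring position.
    # (Hypothetical; Cassandra's default is Murmur3 with signed 64-bit tokens.)
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class ToyRing:
    """Hypothetical ring: one token per node, SimpleStrategy-like placement."""

    def __init__(self, nodes, rf=3):
        self.rf = rf
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        # Owner is the first node at or after the key's token (wrapping around),
        # plus the next rf - 1 nodes clockwise.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return {self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)}


ring = ToyRing([f"node{i}" for i in range(12)], rf=3)

# DELETE ... WHERE key = 'A': the coordinator waits on one replica set.
print(len(ring.replicas("A")))  # 3

# DELETE ... WHERE key IN (10 keys): one coordinator must reach every
# partition's replicas, i.e. the union of all their replica sets.
keys = [chr(ord("A") + i) for i in range(10)]
touched = set().union(*(ring.replicas(k) for k in keys))
print(len(touched))
```

Real clusters use Murmur3 tokens and usually vnodes, so exact node counts differ, but the set of nodes one coordinator depends on grows with the length of the IN list, which is the cost described above.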

Also, setting a TTL at insert (or update) time is a much better solution than 
large purge strategies. As someone who spent a month wrangling hundreds of 
billions of deletes, I am an ardent preacher of TTL at design time.
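A minimal sketch of TTL at design time, assuming the thread's key_value table with hypothetical id and value columns. CQL takes the TTL in whole seconds via USING TTL, which can be a bind marker in a prepared statement, and Cassandra rejects TTLs above 20 years. The helper name ttl_seconds is made up for illustration.

```python
from datetime import timedelta

# Cassandra rejects TTLs above 20 years.
MAX_TTL_SECONDS = 20 * 365 * 24 * 60 * 60  # 630,720,000

# Hypothetical columns on the thread's key_value table; the TTL is bound
# per write like any other prepared-statement parameter.
INSERT_WITH_TTL = "INSERT INTO key_value (id, value) VALUES (?, ?) USING TTL ?"


def ttl_seconds(retention: timedelta) -> int:
    """Convert a retention period into the whole seconds USING TTL expects."""
    seconds = int(retention.total_seconds())
    if not 0 < seconds <= MAX_TTL_SECONDS:
        raise ValueError(f"TTL must be between 1 and {MAX_TTL_SECONDS} seconds")
    return seconds


print(ttl_seconds(timedelta(days=30)))  # 2592000
```

If every row in the table should expire after the same period, a table-level default is an alternative to binding the TTL per write, e.g. ALTER TABLE key_value WITH default_time_to_live = 2592000;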

Sean Durity






Re: IN OPERATOR VS BATCH QUERY

2020-02-20 Thread Attila Wind
Hi Sergio,

AFAIK you use batches when you want to get "all or nothing" approach from
Cassandra. So turning multiple statements into one atomic operation.

One very typical use case for this is when you have denormalized data in
multiple tables (optimized for different queries) but you need to modify
all of them the same way, as if they were just one entity.

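This means that if any of your delete statements failed for whatever
reason, the batch would not be left half-applied. (Strictly speaking,
Cassandra does not roll back: a logged batch is first written to the
batchlog, which guarantees that once any statement is applied, all of them
eventually will be.)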

I think you don't want that overhead here, for sure...
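For the denormalized-table case, a logged batch (the default kind) groups the deletes of one entity across tables. This sketch only builds the CQL text; the table names users_by_id and users_by_email and the helper logged_batch are hypothetical. Each statement targets a different table and therefore a different partition, which is exactly the coordinator and batchlog work that buys atomicity, and that a batch of unrelated key deletes would pay for without any benefit.

```python
def logged_batch(statements):
    """Wrap CQL statements in BEGIN BATCH ... APPLY BATCH (logged by default)."""
    body = "\n".join(f"  {s.rstrip(';')};" for s in statements)
    return f"BEGIN BATCH\n{body}\nAPPLY BATCH;"


# Hypothetical denormalized tables holding the same user, keyed two ways.
cql = logged_batch([
    "DELETE FROM users_by_id WHERE id = ?",
    "DELETE FROM users_by_email WHERE email = ?",
])
print(cql)
```

Using BEGIN UNLOGGED BATCH instead would skip the batchlog (and give up the atomicity guarantee), but the coordinator would still have to coordinate every partition in the batch.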

We are not there yet with our development, but we will need similar
"cleanup" functionality soon.
I was also thinking about the IN operator for similar cases, but I am
curious whether anyone here has a better idea...
Why does the IN operator blow up the coordinator? I do not entirely get
it...

Thanks
Attila



Re: IN OPERATOR VS BATCH QUERY

2020-02-20 Thread Sergio
The current approach is DELETE FROM key_value WHERE id = whatever, performed
asynchronously from the client.
I was hoping the batch approach would at least reduce the network
round-trips between the client and the coordinator. :)

In any case, I would test whether it improves things or not. So when do you
use batches, then?

Best,

Sergio



Re: IN OPERATOR VS BATCH QUERY

2020-02-20 Thread Erick Ramirez
Batches aren't really meant for optimisation in the same way as in an RDBMS. If
anything, it will just put pressure on the coordinator having to fire off
multiple requests to lots of replicas. The IN operator falls into the same
category and I personally wouldn't use it with more than 2 or 3 partitions
because then the coordinator will suffer from the same problem.

If it were me, I'd just issue single-partition deletes and throttle it to a
"reasonable" throughput that your cluster can handle. The word "reasonable"
is in quotes because only you can determine that magic number for your
cluster through testing. Cheers!
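A minimal sketch of throttled single-partition deletes using Python's asyncio. delete_key is a hypothetical placeholder for your driver's asynchronous execution of a prepared DELETE FROM key_value WHERE id = ?; here it only sleeps. An asyncio.Semaphore caps how many deletes are in flight, and max_in_flight is the "reasonable" number you would find through testing.

```python
import asyncio

deleted = []  # records which keys the placeholder "deleted"


async def delete_key(key: str) -> None:
    # Hypothetical placeholder: swap in your driver's async execute of a
    # prepared "DELETE FROM key_value WHERE id = ?". The sleep fakes latency.
    await asyncio.sleep(0.001)
    deleted.append(key)


async def throttled_deletes(keys, max_in_flight=32):
    """Run one single-partition delete per key, never more than
    max_in_flight at once. Returns the peak concurrency observed."""
    limiter = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def one(key):
        nonlocal in_flight, peak
        async with limiter:  # waits here while max_in_flight deletes are pending
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await delete_key(key)
            finally:
                in_flight -= 1

    await asyncio.gather(*(one(k) for k in keys))
    return peak


peak = asyncio.run(throttled_deletes([f"key{i}" for i in range(1000)], max_in_flight=32))
print(len(deleted), peak)
```

Capping concurrency rather than raw requests per second keeps the load bounded even when the cluster slows down, because a new delete starts only when an earlier one completes. For very large key sets you would feed keys in chunks instead of creating every task up front.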


IN OPERATOR VS BATCH QUERY

2020-02-20 Thread Sergio Bilello
Hi guys!

Let's say we have a key-value schema.

The goal is to delete the keys in batches without overloading the cluster, as 
efficiently as possible.

I would like to know whether it is better to run a query such as

DELETE FROM KEY_VALUE_COLUMN_FAMILY WHERE KEY IN ('A','B','C');

with at most 10 keys in the IN clause,

OR

to handle it with a Cassandra batch query. In particular, I was looking at 
https://docs.spring.io/spring-data/cassandra/docs/current/api/org/springframework/data/cassandra/core/ReactiveCassandraBatchOperations.html#delete-java.lang.Iterable-

Thanks,

Sergio



