Number of values in IN clause for clustering column

2018-08-19 Thread Vova Shelgunov
Hi All,

Let's imagine that I have the following schema:

CREATE TABLE IF NOT EXISTS history_data
(
discriminator uuid,
a bigint,
b bigint,
date date,
data custom_type,
PRIMARY KEY ((discriminator, a, b), date)
) WITH CLUSTERING ORDER BY (date DESC);

I want to delete the data from this table for a single (a, b) pair and
multiple dates:

DELETE FROM history_data WHERE discriminator = ... and a = 1 and b = 4
and date in :list_of_dates;

How many dates can I pass in a single delete query without any
performance issues?


I plan to split list_of_dates into multiple buckets and then send all the
resulting DELETE queries in a single unlogged batch.
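The bucketing plan above can be sketched in Python as follows. This is a minimal illustration, not a recommendation on bucket size: the names `buckets`, `delete_params` and `DELETE_CQL` are hypothetical, the bucket size is an arbitrary parameter, and the statement assumes a single `date IN ?` marker that binds a whole list. Because every DELETE targets the same partition (discriminator, a, b), the resulting unlogged batch stays within one partition.

```python
from datetime import date, timedelta
from typing import Iterator, List, Sequence, Tuple

# Hypothetical statement: one `date IN ?` marker binds a whole list of dates.
DELETE_CQL = (
    "DELETE FROM history_data "
    "WHERE discriminator = ? AND a = ? AND b = ? AND date IN ?"
)


def buckets(values: Sequence[date], size: int) -> Iterator[List[date]]:
    """Yield consecutive chunks of at most `size` values, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(values), size):
        yield list(values[start:start + size])


def delete_params(discriminator, a: int, b: int,
                  dates: Sequence[date], size: int) -> List[Tuple]:
    """Bind parameters for one DELETE per bucket; all share one partition key."""
    return [(discriminator, a, b, chunk) for chunk in buckets(dates, size)]


if __name__ == "__main__":
    first = date(2018, 8, 1)
    dates = [first + timedelta(days=i) for i in range(10)]
    # "hypothetical-uuid" stands in for a real uuid discriminator.
    for params in delete_params("hypothetical-uuid", 1, 4, dates, size=4):
        print(len(params[3]), params[3][0], params[3][-1])
```

With the DataStax Python driver (assumed here, not mentioned in the thread), each tuple could be bound to a prepared `DELETE_CQL` and added to a `BatchStatement(batch_type=BatchType.UNLOGGED)` with `batch.add(prepared, params)` before a single `session.execute(batch)`.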


Thanks,

Uladzimir


Re: Repair daily refreshed table

2018-08-19 Thread Maxim Parkachov
Hi Rahul,

I cannot afford to delete and then load, as this would make the record
unavailable for a while; that's why I'm upserting with a TTL of today() + 7
days, as I mentioned in my original question. At the moment I have no issues
with either loading or access times. My question is whether I should repair
such a table at all and, if so, before or after the load (or whether it
doesn't matter).
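In CQL, `USING TTL` takes a number of seconds counted from the moment of the write. A Python sketch of two ways to derive that value for the daily upsert: a fixed 7-day TTL, and one hypothetical reading of "today() + 7 days" in which every row of a load expires at 00:00 UTC seven days after the load date. The function names and the UTC day boundary are assumptions, not taken from the thread.

```python
from datetime import datetime, time, timedelta, timezone


def fixed_ttl_seconds(days: int = 7) -> int:
    """TTL counted from the moment of the write, e.g. 604800 for 7 days."""
    return int(timedelta(days=days).total_seconds())


def ttl_until_day_boundary(now: datetime, days: int = 7) -> int:
    """Seconds from `now` until 00:00 UTC on (UTC date of `now`) + `days`.

    Hypothetical reading of "today() + 7 days": every row written during
    the same daily load expires at the same instant.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    expiry = datetime.combine(now_utc.date() + timedelta(days=days),
                              time(0), tzinfo=timezone.utc)
    return int((expiry - now_utc).total_seconds())


if __name__ == "__main__":
    load_time = datetime(2018, 8, 19, 6, 0, tzinfo=timezone.utc)
    print(fixed_ttl_seconds())                # seconds in 7 days
    print(ttl_until_day_boundary(load_time))  # shorter: the load ran at 06:00
```

Either value would then be bound into a statement such as `INSERT INTO ... VALUES (...) USING TTL ?`. The fixed TTL makes each row expire exactly 7 days after its own write; the day-boundary variant aligns expiry across a whole load.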

Thanks,
Maxim.

On Sun, Aug 19, 2018 at 8:52 AM Rahul Singh 
wrote:

> If you want to be certain that replicas acknowledge receipt of the data,
> you could use ALL (every replica) or EACH_QUORUM (a quorum in each DC, if
> you have multiple DCs), but you must really need high consistency to do that.
>
> You should avoid deliberately creating tombstones if possible: they make
> reads slower, because reads have to account for them until they are
> compacted / garbage collected out.
>
> Tombstones are created when data is either deleted or set to null. When
> data is written with a TTL, the actual delete does not happen until the TTL
> has expired.
>
> When you say you are overwriting, do you mean deleting and then loading?
> That’s the only way you should see tombstones, unless you are setting
> nulls.
>
> Rahul
> On Aug 18, 2018, 11:16 PM -0700, Maxim Parkachov wrote:
>
> Hi Rahul,
>
> I'm already using LOCAL_QUORUM in the batch process, which runs every day.
> As far as I understand, because I'm overwriting the whole table with a new
> TTL, the process creates tons of tombstones, and those concern me more.
>
> Regards,
> Maxim.
>
> On Sun, Aug 19, 2018 at 3:02 AM Rahul Singh 
> wrote:
>
>> Are you loading with a batch process? How frequent is the data ingest,
>> and does it have to be very fast? If it is not too frequent and can be a
>> little slower, you may consider a higher consistency level to ensure the
>> data is on the replicas.
>>
>> Rahul
>> On Aug 18, 2018, 2:29 AM -0700, Maxim Parkachov wrote:
>>
>> Hi community,
>>
>> I'm currently puzzled by the following challenge. I have a CF with a
>> 7-day TTL on all rows. Every day a process loads the current data with a
>> 7-day TTL, so records that have not appeared in any of the last 7 daily
>> loads expire. The number of such expired records is very small (< 1%). I
>> have a daily repair process, which takes a considerable amount of time and
>> resources, followed by a snapshot. Obviously I'm only concerned with the
>> most recently loaded data. Basically, my question is: should I run repair
>> before the load, after the load, or do I not need to repair such a table
>> at all?
>>
>> Regards,
>> Maxim.
>>
>>
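The consistency levels discussed in this thread (LOCAL_QUORUM, EACH_QUORUM, ALL) differ in how many replica acknowledgements the coordinator waits for; the write itself is sent to all replicas either way. A small Python sketch of that arithmetic, assuming a per-DC replication factor map such as `{"dc1": 3, "dc2": 3}` (hypothetical DC names), the usual quorum formula floor(RF / 2) + 1, and covering only a few levels:

```python
from typing import Dict


def quorum(rf: int) -> int:
    """Majority of `rf` replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def required_acks(level: str, rf_per_dc: Dict[str, int], local_dc: str) -> int:
    """Total replica acknowledgements a write waits for at `level`.

    The write is still sent to every replica; this is only how many
    responses the coordinator needs before it reports success.
    """
    total = sum(rf_per_dc.values())
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    if level == "QUORUM":
        # Majority of all replicas across every DC.
        return quorum(total)
    if level == "EACH_QUORUM":
        # A quorum is required in every DC separately; this returns the sum.
        return sum(quorum(rf) for rf in rf_per_dc.values())
    if level == "ALL":
        return total
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    rf = {"dc1": 3, "dc2": 3}  # hypothetical DC names and replication factors
    for level in ("LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
        print(level, required_acks(level, rf, local_dc="dc1"))
```

For two DCs with RF 3 each, this gives 2 for LOCAL_QUORUM, 4 for QUORUM, 4 for EACH_QUORUM (2 in each DC) and 6 for ALL. QUORUM and EACH_QUORUM happen to match in total here, but only EACH_QUORUM forces both DCs to contribute.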