Re: Deleting old items during compaction (WAS: Deleting old items)

2013-02-18 Thread Alain RODRIGUEZ
"However the old rows will not be purged from disk unless all fragments of
the row are involved in a compaction process. So it may take some time to
purge from disk, depending on the workload. "

http://wiki.apache.org/cassandra/Counters

The doc says: "Counter removal is intrinsically limited. For instance, if
you issue very quickly the sequence "increment, remove, increment" it is
possible for the removal to be lost (if for some reason the remove happens
to be the last received messages). Hence, removal of counters is provided
for definitive removal only, that is when the deleted counter is not
increment afterwards. This holds for row deletion too: if you delete a row
of counters, incrementing any counter in that row (that existed before the
deletion) will result in an undetermined behavior. Note that if you need to
reset a counter, one option (that is unfortunately not concurrent safe)
could be to read its *value* and add *-value*."

Just wanted to add that we experienced it. While data is being purged from
disk, we couldn't write anything in that row. I mean, we weren't able to
create any new column.

I just wanted to let you know in case it could help.
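
To illustrate why the "read its value and add -value" reset quoted above is not concurrent safe, here is a minimal sketch. It uses a toy in-memory counter, not a Cassandra client; `ToyCounter` and `reset_interleaved` are hypothetical names made up for this example, and the interleaving is forced by hand rather than produced by real concurrency.

```python
class ToyCounter:
    """Toy in-memory counter; it only models add(), not a real counter column."""

    def __init__(self):
        self.value = 0

    def add(self, delta):
        self.value += delta


def reset_interleaved(counter, concurrent_increment):
    """Reset via read + add(-value), with another writer's increment
    landing between the read and the write."""
    observed = counter.value            # step 1: read the current value
    counter.add(concurrent_increment)   # another client increments here
    counter.add(-observed)              # step 2: add -value from the stale read
    return counter.value


c = ToyCounter()
c.add(10)
leftover = reset_interleaved(c, concurrent_increment=3)
# The counter does not end at 0: the concurrent increment survives the reset.
```

Whether the surviving increment is wanted depends on whether it was meant to count before or after the reset, which is exactly what the resetting client cannot know.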



2013/2/18 aaron morton 

> Sorry, missed the Counters part.
>
> You are probably interested in this one
> https://issues.apache.org/jira/browse/CASSANDRA-5228
>
> Add your need to ticket to help it along. IMHO if you have write once,
> read many time series data the SSTables are effectively doing horizontal
> partitioning for you. So been able to "drop a partition" would make life
> easier.
>
> If you can delete entire row then the deletes have less impact than per
> column. However the old rows will not be purged from disk unless all
> fragments of the row are involved in a compaction process. So it may take
> some time to purge from disk, depending on the workload.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/02/2013, at 10:43 AM, Ilya Grebnov  wrote:
>
> According to https://issues.apache.org/jira/browse/CASSANDRA-2103 There
> is no support for time to live (TTL) on counter columns. Did I miss
> something?
>
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Sunday, February 17, 2013 9:16 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items during compaction (WAS: Deleting old
> items)
>
> That's what the TTL does.
>
> Manually delete all the older data now, then start using TTL.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/02/2013, at 11:08 PM, Ilya Grebnov  wrote:
>
>
> 
> Hi,
>  
> We looking for solution for same problem. We have a wide column family
> with counters and we want to delete old data like 1 months old. One of
> potential ideas was to implement hook in compaction code and drop column
> which we don’t need. Is this a viable option?
>  
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Tuesday, February 12, 2013 9:01 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items
>  
>
> So is it possible to delete all the data inserted in some CF between 2
> dates or data older than 1 month ?
>
> No. 
>  
> You need to issue row level deletes. If you don't know the row key you'll
> need to do range scans to locate them. 
>  
> If you are deleting parts of wide rows consider reducing the
> min_compaction_level_threshold on the CF to 2
>  
> Cheers
>  
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
>
>
>
> 
> Hi,
>  
> I would like to know if there is a way to delete old/unused data easily ?
>  
> I know about TTL but there are 2 limitations of TTL:
>  
> - AFAIK, there is no TTL on counter columns
> - TTL need to be defined at write time, so it's too late for data already
> inserted.
>  
> I also could use a standard "delete" but it seems inappropriate for such a
> massive.
>  
> In some cases, I don't know the row key and would like to delete all the
> rows starting by, let's say, "1050#..." 
>  
> Even better, I understood that columns are always inserted in C* with
> (name, value, timestamp). So is it possible to delete all the data inserted
> in some CF between 2 dates or data older than 1 month ?
>  
> Alain
>  
>
>
>


Re: Deleting old items during compaction (WAS: Deleting old items)

2013-02-18 Thread aaron morton
Sorry, missed the Counters part.

You are probably interested in this one 
https://issues.apache.org/jira/browse/CASSANDRA-5228

Add your need to the ticket to help it along. IMHO, if you have write-once,
read-many time series data, the SSTables are effectively doing horizontal
partitioning for you. So being able to "drop a partition" would make life
easier.

If you can delete entire rows then the deletes have less impact than per-column deletes.
However the old rows will not be purged from disk unless all fragments of the 
row are involved in a compaction process. So it may take some time to purge 
from disk, depending on the workload. 
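
A minimal sketch of the "all fragments must be involved" condition, assuming each SSTable is modeled simply as a set of row keys. The function name `purgeable_rows` and its arguments are hypothetical, and the model ignores gc_grace_seconds and everything else a real compaction checks.

```python
def purgeable_rows(compacting, others, deleted_rows):
    """Return the deleted row keys that this compaction could drop entirely.

    compacting:   list of sets of row keys, one set per SSTable being compacted
    others:       list of sets of row keys for SSTables left out of the compaction
    deleted_rows: set of row keys that carry a row-level delete
    """
    in_compaction = set().union(*compacting)
    outside = set().union(*others)
    # A deleted row can only be purged if no fragment of it lives in an
    # SSTable that is not part of this compaction.
    return {k for k in deleted_rows if k in in_compaction and k not in outside}


sstable_1 = {"a", "b"}
sstable_2 = {"b", "c"}
sstable_3 = {"c"}

# Rows "b" and "c" were deleted; only SSTables 1 and 2 are compacted together.
result = purgeable_rows([sstable_1, sstable_2], [sstable_3], {"b", "c"})
# "b" can go, "c" has to wait because a fragment still sits in sstable_3.
```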

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 10:43 AM, Ilya Grebnov  wrote:

> According to https://issues.apache.org/jira/browse/CASSANDRA-2103 There is no 
> support for time to live (TTL) on counter columns. Did I miss something?
>  
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Sunday, February 17, 2013 9:16 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items during compaction (WAS: Deleting old items)
>  
> That's what the TTL does. 
>  
> Manually delete all the older data now, then start using TTL. 
>  
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 13/02/2013, at 11:08 PM, Ilya Grebnov  wrote:
> 
> 
> Hi,
>  
> We looking for solution for same problem. We have a wide column family with 
> counters and we want to delete old data like 1 months old. One of potential 
> ideas was to implement hook in compaction code and drop column which we don’t 
> need. Is this a viable option?
>  
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Tuesday, February 12, 2013 9:01 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items
>  
> So is it possible to delete all the data inserted in some CF between 2 dates 
> or data older than 1 month ?
> No. 
>  
> You need to issue row level deletes. If you don't know the row key you'll 
> need to do range scans to locate them. 
>  
> If you are deleting parts of wide rows consider reducing the 
> min_compaction_level_threshold on the CF to 2
>  
> Cheers
>  
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
> 
> 
> 
> Hi,
>  
> I would like to know if there is a way to delete old/unused data easily ?
>  
> I know about TTL but there are 2 limitations of TTL:
>  
> - AFAIK, there is no TTL on counter columns
> - TTL need to be defined at write time, so it's too late for data already 
> inserted.
>  
> I also could use a standard "delete" but it seems inappropriate for such a 
> massive.
>  
> In some cases, I don't know the row key and would like to delete all the rows 
> starting by, let's say, "1050#..." 
>  
> Even better, I understood that columns are always inserted in C* with (name, 
> value, timestamp). So is it possible to delete all the data inserted in some 
> CF between 2 dates or data older than 1 month ?
>  
> Alain
>  



RE: Deleting old items during compaction (WAS: Deleting old items)

2013-02-17 Thread Ilya Grebnov
According to https://issues.apache.org/jira/browse/CASSANDRA-2103, there is
no support for time to live (TTL) on counter columns. Did I miss something?

 

Thanks,

Ilya

From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Sunday, February 17, 2013 9:16 AM
To: user@cassandra.apache.org
Subject: Re: Deleting old items during compaction (WAS: Deleting old items)

 

That's what the TTL does. 

 

Manually delete all the older data now, then start using TTL. 

 

Cheers

 

-

Aaron Morton

Freelance Cassandra Developer

New Zealand

 

@aaronmorton

http://www.thelastpickle.com

 

On 13/02/2013, at 11:08 PM, Ilya Grebnov  wrote:





Hi,

 

We looking for solution for same problem. We have a wide column family with
counters and we want to delete old data like 1 months old. One of potential
ideas was to implement hook in compaction code and drop column which we
don't need. Is this a viable option?

 

Thanks,

Ilya

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, February 12, 2013 9:01 AM
To: user@cassandra.apache.org
Subject: Re: Deleting old items

 

So is it possible to delete all the data inserted in some CF between 2 dates
or data older than 1 month ?

No. 

 

You need to issue row level deletes. If you don't know the row key you'll
need to do range scans to locate them. 

 

If you are deleting parts of wide rows consider reducing the
min_compaction_level_threshold on the CF to 2

 

Cheers

 

 

-

Aaron Morton

Freelance Cassandra Developer

New Zealand

 

@aaronmorton

http://www.thelastpickle.com

 

On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:






Hi,

 

I would like to know if there is a way to delete old/unused data easily ?

 

I know about TTL but there are 2 limitations of TTL:

 

- AFAIK, there is no TTL on counter columns

- TTL need to be defined at write time, so it's too late for data already
inserted.

 

I also could use a standard "delete" but it seems inappropriate for such a
massive.

 

In some cases, I don't know the row key and would like to delete all the
rows starting by, let's say, "1050#..." 

 

Even better, I understood that columns are always inserted in C* with (name,
value, timestamp). So is it possible to delete all the data inserted in some
CF between 2 dates or data older than 1 month ?

 

Alain

 

 



Re: Deleting old items

2013-02-17 Thread aaron morton
I'll email the docs people. 

I believe they are saying "use compaction throttling rather than this" not 
"this does nothing"

Although I used this in the last month on a machine with very little RAM to
limit compaction memory use.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/02/2013, at 7:05 AM, Alain RODRIGUEZ  wrote:

> "Can you point to the docs."
> 
> http://www.datastax.com/docs/1.1/configuration/storage_configuration#max-compaction-threshold
> 
> And thanks about the rest of your answers, once again ;-).
> 
> Alain
> 
> 
> 2013/2/16 aaron morton 
>>  Is that a feature that could possibly be developed one day ?
> No. 
> Timestamps are essentially internal implementation used to resolve different 
> values for the same column. 
> 
>> With "min_compaction_level_threshold" did you mean 
>> "min_compaction_threshold"  ? If so, why should I do that, what are the 
>> advantage/inconvenient of reducing this value ?
> 
> Yes, min_compaction_threshold, my bad. 
> If you have a wide row and delete a lot of values you will end up with a lot 
> of tombstones. These may dramatically reduce the read performance until they 
> are purged. Reducing the compaction threshold makes compaction happen more 
> frequently. 
> 
>> Looking at the doc I saw that: "max_compaction_threshold: Ignored in 
>> Cassandra 1.1 and later.". How to ensure that I'll always keep a small 
>> amount of SSTables then ?
> AFAIK it's not. 
> There may be some confusion about the location of the settings in CLI vs CQL. 
> Can you point to the docs. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 13/02/2013, at 10:14 PM, Alain RODRIGUEZ  wrote:
> 
>> Hi Aaron, once again thanks for this answer.
>>> "So is it possible to delete all the data inserted in some CF between 2 
>>> dates or data older than 1 month ?"
>> "No. "
>> 
>> Why is there no way of deleting or getting data using the internal timestamp 
>> stored alongside of any inserted column (as described here: 
>> http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns) ? Is 
>> that a feature that could possibly be developed one day ? It could be useful 
>> to perform delete of old data or to bring to a dev cluster just the last 
>> week of data for example.
>> 
>> With "min_compaction_level_threshold" did you mean 
>> "min_compaction_threshold"  ? If so, why should I do that, what are the 
>> advantage/inconvenient of reducing this value ?
>> 
>> Looking at the doc I saw that: "max_compaction_threshold: Ignored in 
>> Cassandra 1.1 and later.". How to ensure that I'll always keep a small 
>> amount of SSTables then ? Why is this deprecated ?
>> 
>> Alain
>> 
>> 
>> 2013/2/12 aaron morton 
>>> So is it possible to delete all the data inserted in some CF between 2 
>>> dates or data older than 1 month ?
>> No. 
>> 
>> You need to issue row level deletes. If you don't know the row key you'll 
>> need to do range scans to locate them. 
>> 
>> If you are deleting parts of wide rows consider reducing the 
>> min_compaction_level_threshold on the CF to 2
>> 
>> Cheers
>> 
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
>> 
>>> Hi,
>>> 
>>> I would like to know if there is a way to delete old/unused data easily ?
>>> 
>>> I know about TTL but there are 2 limitations of TTL:
>>> 
>>> - AFAIK, there is no TTL on counter columns
>>> - TTL need to be defined at write time, so it's too late for data already 
>>> inserted.
>>> 
>>> I also could use a standard "delete" but it seems inappropriate for such a 
>>> massive.
>>> 
>>> In some cases, I don't know the row key and would like to delete all the 
>>> rows starting by, let's say, "1050#..." 
>>> 
>>> Even better, I understood that columns are always inserted in C* with 
>>> (name, value, timestamp). So is it possible to delete all the data inserted 
>>> in some CF between 2 dates or data older than 1 month ?
>>> 
>>> Alain
>> 
>> 
> 
> 



Re: Deleting old items during compaction (WAS: Deleting old items)

2013-02-17 Thread aaron morton
That's what the TTL does. 

Manually delete all the older data now, then start using TTL. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/02/2013, at 11:08 PM, Ilya Grebnov  wrote:

> Hi,
>  
> We looking for solution for same problem. We have a wide column family with 
> counters and we want to delete old data like 1 months old. One of potential 
> ideas was to implement hook in compaction code and drop column which we don’t 
> need. Is this a viable option?
>  
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Tuesday, February 12, 2013 9:01 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items
>  
> So is it possible to delete all the data inserted in some CF between 2 dates 
> or data older than 1 month ?
> No. 
>  
> You need to issue row level deletes. If you don't know the row key you'll 
> need to do range scans to locate them. 
>  
> If you are deleting parts of wide rows consider reducing the 
> min_compaction_level_threshold on the CF to 2
>  
> Cheers
>  
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
> 
> 
> Hi,
>  
> I would like to know if there is a way to delete old/unused data easily ?
>  
> I know about TTL but there are 2 limitations of TTL:
>  
> - AFAIK, there is no TTL on counter columns
> - TTL need to be defined at write time, so it's too late for data already 
> inserted.
>  
> I also could use a standard "delete" but it seems inappropriate for such a 
> massive.
>  
> In some cases, I don't know the row key and would like to delete all the rows 
> starting by, let's say, "1050#..." 
>  
> Even better, I understood that columns are always inserted in C* with (name, 
> value, timestamp). So is it possible to delete all the data inserted in some 
> CF between 2 dates or data older than 1 month ?
>  
> Alain
>  



Re: Deleting old items

2013-02-16 Thread Alain RODRIGUEZ
"Can you point to the docs."

http://www.datastax.com/docs/1.1/configuration/storage_configuration#max-compaction-threshold

And thanks for the rest of your answers, once again ;-).

Alain


2013/2/16 aaron morton 

>  Is that a feature that could possibly be developed one day ?
>
> No.
> Timestamps are essentially internal implementation used to resolve
> different values for the same column.
>
> With "min_compaction_level_threshold" did you mean "
> min_compaction_threshold"  ? If so, why should I do that, what are the
> advantage/inconvenient of reducing this value ?
>
> Yes, min_compaction_threshold, my bad.
> If you have a wide row and delete a lot of values you will end up with a
> lot of tombstones. These may dramatically reduce the read performance until
> they are purged. Reducing the compaction threshold makes compaction happen
> more frequently.
>
> Looking at the doc I saw that: "max_compaction_threshold: Ignored in
> Cassandra 1.1 and later.". How to ensure that I'll always keep a small
> amount of SSTables then ?
>
> AFAIK it's not.
> There may be some confusion about the location of the settings in CLI vs
> CQL.
> Can you point to the docs.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/02/2013, at 10:14 PM, Alain RODRIGUEZ  wrote:
>
> Hi Aaron, once again thanks for this answer.
>
> "So is it possible to delete all the data inserted in some CF between 2
> dates or data older than 1 month ?"
>
> "No. "
>
> Why is there no way of deleting or getting data using the internal
> timestamp stored alongside of any inserted column (as described here:
> http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns) ? Is
> that a feature that could possibly be developed one day ? It could
> be useful to perform delete of old data or to bring to a dev cluster just
> the last week of data for example.
>
> With "min_compaction_level_threshold" did you mean "
> min_compaction_threshold"  ? If so, why should I do that, what are the
> advantage/inconvenient of reducing this value ?
>
> Looking at the doc I saw that: "max_compaction_threshold: Ignored in
> Cassandra 1.1 and later.". How to ensure that I'll always keep a small
> amount of SSTables then ? Why is this deprecated ?
>
> Alain
>
>
> 2013/2/12 aaron morton 
>
>> So is it possible to delete all the data inserted in some CF between 2
>> dates or data older than 1 month ?
>>
>> No.
>>
>> You need to issue row level deletes. If you don't know the row key you'll
>> need to do range scans to locate them.
>>
>> If you are deleting parts of wide rows consider reducing the
>> min_compaction_level_threshold on the CF to 2
>>
>> Cheers
>>
>>
>>-
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
>>
>> Hi,
>>
>> I would like to know if there is a way to delete old/unused data easily ?
>>
>> I know about TTL but there are 2 limitations of TTL:
>>
>> - AFAIK, there is no TTL on counter columns
>> - TTL need to be defined at write time, so it's too late for data already
>> inserted.
>>
>> I also could use a standard "delete" but it seems inappropriate for such
>> a massive.
>>
>> In some cases, I don't know the row key and would like to delete all the
>> rows starting by, let's say, "1050#..."
>>
>> Even better, I understood that columns are always inserted in C* with
>> (name, value, timestamp). So is it possible to delete all the data inserted
>> in some CF between 2 dates or data older than 1 month ?
>>
>> Alain
>>
>>
>>
>
>


Re: Deleting old items

2013-02-16 Thread aaron morton
>  Is that a feature that could possibly be developed one day ?
No. 
Timestamps are essentially an internal implementation detail used to resolve
different values for the same column.
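
As a rough sketch of what "resolve different values for the same column" means, the highest timestamp wins. The `Column` type and `reconcile` function below are hypothetical names for illustration, not Cassandra internals, and the tie-breaking here (first argument wins) is a simplification.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: bytes
    timestamp: int  # supplied by the writer, e.g. microseconds since epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the surviving version of one column: the highest timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    # Simplification: on equal timestamps the first argument is kept.
    return a if a.timestamp >= b.timestamp else b


old = Column("hits", b"41", timestamp=1000)
new = Column("hits", b"42", timestamp=2000)
winner = reconcile(old, new)
```

The timestamp only orders versions of a single column against each other, which is why it does not double as a queryable "inserted at" field.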

> With "min_compaction_level_threshold" did you mean "min_compaction_threshold" 
>  ? If so, why should I do that, what are the advantage/inconvenient of 
> reducing this value ?
Yes, min_compaction_threshold, my bad. 
If you have a wide row and delete a lot of values you will end up with a lot of 
tombstones. These may dramatically reduce the read performance until they are 
purged. Reducing the compaction threshold makes compaction happen more 
frequently. 
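
To give a feel for "makes compaction happen more frequently", here is a toy model, assuming every flush produces one SSTable and that SSTables of the same size class are merged as soon as `min_threshold` of them exist. `count_compactions` is a hypothetical helper; real size-tiered compaction buckets by actual size and is more involved.

```python
def count_compactions(flushes, min_threshold):
    """Count merges in a toy size-tiered model.

    tiers[i] is the number of SSTables in size class i. When a class holds
    min_threshold SSTables they are merged into one SSTable of class i + 1.
    """
    if min_threshold < 2:
        raise ValueError("min_threshold must be at least 2")
    tiers = {}
    compactions = 0
    for _ in range(flushes):
        tiers[0] = tiers.get(0, 0) + 1       # a flush adds one small SSTable
        level = 0
        while tiers.get(level, 0) >= min_threshold:
            tiers[level] -= min_threshold    # merge the whole bucket
            tiers[level + 1] = tiers.get(level + 1, 0) + 1
            compactions += 1
            level += 1
    return compactions


with_default = count_compactions(flushes=16, min_threshold=4)
with_two = count_compactions(flushes=16, min_threshold=2)
```

In this model 16 flushes trigger 5 merges with a threshold of 4 and 15 merges with a threshold of 2, so tombstones get more chances to be purged at the cost of more compaction I/O.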

> Looking at the doc I saw that: "max_compaction_threshold: Ignored in 
> Cassandra 1.1 and later.". How to ensure that I'll always keep a small amount 
> of SSTables then ?
AFAIK it's not. 
There may be some confusion about the location of the settings in CLI vs CQL. 
Can you point me to the docs? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/02/2013, at 10:14 PM, Alain RODRIGUEZ  wrote:

> Hi Aaron, once again thanks for this answer.
>> "So is it possible to delete all the data inserted in some CF between 2 
>> dates or data older than 1 month ?"
> "No. "
> 
> Why is there no way of deleting or getting data using the internal timestamp 
> stored alongside of any inserted column (as described here: 
> http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns) ? Is 
> that a feature that could possibly be developed one day ? It could be useful 
> to perform delete of old data or to bring to a dev cluster just the last week 
> of data for example.
> 
> With "min_compaction_level_threshold" did you mean "min_compaction_threshold" 
>  ? If so, why should I do that, what are the advantage/inconvenient of 
> reducing this value ?
> 
> Looking at the doc I saw that: "max_compaction_threshold: Ignored in 
> Cassandra 1.1 and later.". How to ensure that I'll always keep a small amount 
> of SSTables then ? Why is this deprecated ?
> 
> Alain
> 
> 
> 2013/2/12 aaron morton 
>> So is it possible to delete all the data inserted in some CF between 2 dates 
>> or data older than 1 month ?
> No. 
> 
> You need to issue row level deletes. If you don't know the row key you'll 
> need to do range scans to locate them. 
> 
> If you are deleting parts of wide rows consider reducing the 
> min_compaction_level_threshold on the CF to 2
> 
> Cheers
> 
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
> 
>> Hi,
>> 
>> I would like to know if there is a way to delete old/unused data easily ?
>> 
>> I know about TTL but there are 2 limitations of TTL:
>> 
>> - AFAIK, there is no TTL on counter columns
>> - TTL need to be defined at write time, so it's too late for data already 
>> inserted.
>> 
>> I also could use a standard "delete" but it seems inappropriate for such a 
>> massive.
>> 
>> In some cases, I don't know the row key and would like to delete all the 
>> rows starting by, let's say, "1050#..." 
>> 
>> Even better, I understood that columns are always inserted in C* with (name, 
>> value, timestamp). So is it possible to delete all the data inserted in some 
>> CF between 2 dates or data older than 1 month ?
>> 
>> Alain
> 
> 



Deleting old items during compaction (WAS: Deleting old items)

2013-02-13 Thread Ilya Grebnov
Hi,

We are looking for a solution to the same problem. We have a wide column
family with counters and we want to delete old data, for example data more
than 1 month old. One potential idea was to implement a hook in the
compaction code and drop the columns we don't need. Is this a viable option?

Thanks,
Ilya

From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Tuesday, February 12, 2013 9:01 AM
To: user@cassandra.apache.org
Subject: Re: Deleting old items

 

So is it possible to delete all the data inserted in some CF between 2 dates
or data older than 1 month ?

No. 

 

You need to issue row level deletes. If you don't know the row key you'll
need to do range scans to locate them. 

 

If you are deleting parts of wide rows consider reducing the
min_compaction_level_threshold on the CF to 2

 

Cheers

 

 

-

Aaron Morton

Freelance Cassandra Developer

New Zealand

 

@aaronmorton

http://www.thelastpickle.com

 

On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:





Hi,

 

I would like to know if there is a way to delete old/unused data easily ?

 

I know about TTL but there are 2 limitations of TTL:

 

- AFAIK, there is no TTL on counter columns

- TTL need to be defined at write time, so it's too late for data already
inserted.

 

I also could use a standard "delete" but it seems inappropriate for such a
massive.

 

In some cases, I don't know the row key and would like to delete all the
rows starting by, let's say, "1050#..." 

 

Even better, I understood that columns are always inserted in C* with (name,
value, timestamp). So is it possible to delete all the data inserted in some
CF between 2 dates or data older than 1 month ?

 

Alain

 



Re: Deleting old items

2013-02-13 Thread Alain RODRIGUEZ
Hi Aaron, once again thanks for this answer.

"So is it possible to delete all the data inserted in some CF between 2
dates or data older than 1 month ?"

"No. "

Why is there no way of deleting or getting data using the internal
timestamp stored alongside of any inserted column (as described here:
http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns) ? Is
that a feature that could possibly be developed one day? It could
be useful to delete old data or to bring just the last week of data to a
dev cluster, for example.

With "min_compaction_level_threshold" did you mean "min_compaction_threshold"?
If so, why should I do that, and what are the advantages and drawbacks of
reducing this value?

Looking at the doc I saw that: "max_compaction_threshold: Ignored in
Cassandra 1.1 and later.". How can I ensure that I always keep a small
number of SSTables, then? Why is this deprecated?

Alain


2013/2/12 aaron morton 

> So is it possible to delete all the data inserted in some CF between 2
> dates or data older than 1 month ?
>
> No.
>
> You need to issue row level deletes. If you don't know the row key you'll
> need to do range scans to locate them.
>
> If you are deleting parts of wide rows consider reducing the
> min_compaction_level_threshold on the CF to 2
>
> Cheers
>
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
>
> Hi,
>
> I would like to know if there is a way to delete old/unused data easily ?
>
> I know about TTL but there are 2 limitations of TTL:
>
> - AFAIK, there is no TTL on counter columns
> - TTL need to be defined at write time, so it's too late for data already
> inserted.
>
> I also could use a standard "delete" but it seems inappropriate for such a
> massive.
>
> In some cases, I don't know the row key and would like to delete all the
> rows starting by, let's say, "1050#..."
>
> Even better, I understood that columns are always inserted in C* with
> (name, value, timestamp). So is it possible to delete all the data inserted
> in some CF between 2 dates or data older than 1 month ?
>
> Alain
>
>
>


Re: Deleting old items

2013-02-12 Thread aaron morton
> So is it possible to delete all the data inserted in some CF between 2 dates 
> or data older than 1 month ?
No. 

You need to issue row level deletes. If you don't know the row key you'll need 
to do range scans to locate them. 
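
A minimal sketch of the scan-then-delete approach, assuming the client can iterate over all row keys and delete one row at a time. `all_keys` and `delete_row` are hypothetical stand-ins for whatever the client library provides; no real client API is shown. With a hash-based partitioner the keys do not come back in key order, so finding a prefix such as "1050#" means looking at every key.

```python
def keys_with_prefix(all_keys, prefix):
    """Yield the row keys from a full scan that start with the given prefix."""
    for key in all_keys:
        if key.startswith(prefix):
            yield key


def delete_rows_with_prefix(all_keys, prefix, delete_row):
    """Issue one row-level delete per matching key; return how many were sent."""
    deleted = 0
    for key in keys_with_prefix(all_keys, prefix):
        delete_row(key)
        deleted += 1
    return deleted


# Stand-in for a column family: a dict keyed by row key.
store = {"1050#a": 1, "2001#b": 2, "1050#c": 3, "10500": 4}
count = delete_rows_with_prefix(list(store), "1050#", store.pop)
```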

If you are deleting parts of wide rows consider reducing the 
min_compaction_level_threshold on the CF to 2

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:

> Hi,
> 
> I would like to know if there is a way to delete old/unused data easily ?
> 
> I know about TTL but there are 2 limitations of TTL:
> 
> - AFAIK, there is no TTL on counter columns
> - TTL need to be defined at write time, so it's too late for data already 
> inserted.
> 
> I also could use a standard "delete" but it seems inappropriate for such a 
> massive.
> 
> In some cases, I don't know the row key and would like to delete all the rows 
> starting by, let's say, "1050#..." 
> 
> Even better, I understood that columns are always inserted in C* with (name, 
> value, timestamp). So is it possible to delete all the data inserted in some 
> CF between 2 dates or data older than 1 month ?
> 
> Alain