Re: Tombstones not getting purged

2019-06-20 Thread Léo FERLIN SUTTON
Thank you for the information!

Re: Tombstones not getting purged

2019-06-20 Thread Alexander Dejanovski
Léo,

if a major compaction isn't a viable option, you can give the Instaclustr
sstable tools a go to target the partitions with the most tombstones:
https://github.com/instaclustr/cassandra-sstable-tools/tree/cassandra-2.2#ic-purge

It generates a report like this:

Summary:

+---------+---------+
|         | Size    |
+---------+---------+
| Disk    |  1.9 GB |
| Reclaim | 11.7 MB |
+---------+---------+


Largest reclaimable partitions:

+--------------+--------+---------+-----------------+
| Key          | Size   | Reclaim | Generations     |
+--------------+--------+---------+-----------------+
| 001.2.340862 | 3.2 kB |  3.2 kB | [534, 438, 498] |
| 001.2.946243 | 2.9 kB |  2.8 kB | [534, 434, 384] |
| 001.1.527557 | 2.8 kB |  2.7 kB | [534, 519, 394] |
| 001.2.181797 | 2.6 kB |  2.6 kB | [534, 424, 343] |
| 001.3.475853 | 2.7 kB |    28 B | [524, 462]      |
| 001.0.159704 | 2.7 kB |    28 B | [440, 247]      |
| 001.1.311372 | 2.6 kB |    28 B | [424, 458]      |
| 001.0.756293 | 2.6 kB |    28 B | [428, 358]      |
| 001.2.681009 | 2.5 kB |    28 B | [440, 241]      |
| 001.2.474773 | 2.5 kB |    28 B | [524, 484]      |
| 001.2.974571 | 2.5 kB |    28 B | [386, 517]      |
| 001.0.143176 | 2.5 kB |    28 B | [518, 368]      |
| 001.1.185198 | 2.5 kB |    28 B | [517, 386]      |
| 001.3.503517 | 2.5 kB |    28 B | [426, 346]      |
| 001.1.847384 | 2.5 kB |    28 B | [436, 396]      |
| 001.0.949269 | 2.5 kB |    28 B | [516, 356]      |
| 001.0.756763 | 2.5 kB |    28 B | [440, 249]      |
| 001.3.973808 | 2.5 kB |    28 B | [517, 386]      |
| 001.0.312718 | 2.4 kB |    28 B | [524, 467]      |
| 001.3.632066 | 2.4 kB |    28 B | [432, 377]      |
| 001.1.946590 | 2.4 kB |    28 B | [519, 389]      |
| 001.1.798591 | 2.4 kB |    28 B | [434, 388]      |
| 001.3.953922 | 2.4 kB |    28 B | [432, 375]      |
| 001.2.585518 | 2.4 kB |    28 B | [432, 375]      |
| 001.3.284942 | 2.4 kB |    28 B | [376, 432]      |
+--------------+--------+---------+-----------------+

Once you've identified these partitions, you can run a compaction on the
SSTables that contain them (identified using "nodetool getsstables").
Note that user-defined compactions are only available for STCS.
Also, ic-purge performs a compaction without writing anything to disk (it
should look like a validation compaction), so it is rightfully described in
the docs as an "intensive process" (though no more so than a repair).
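
For what it's worth, here is a rough sketch of that workflow. The keyspace/table
names and file path reuse the placeholders from earlier in this thread, the
partition key comes from the sample report above, and the JMX call assumes
jmxterm is available and JMX is listening on the default port 7199:

```
# Find which SSTables hold one of the partitions reported by ic-purge
nodetool getsstables stats tablename 001.2.340862

# Trigger a user-defined compaction on those SSTables via the CompactionManager MBean
echo "run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction /var/lib/cassandra/data/stats/tablename/md-167913-big-Data.db,/var/lib/cassandra/data/stats/tablename/md-147916-big-Data.db" | java -jar jmxterm-1.0.2-uber.jar -l localhost:7199 -n
```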

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Tombstones not getting purged

2019-06-20 Thread Alexander Dejanovski
My bad on the date formatting: it should have been %Y/%m/%d, otherwise the
SSTables aren't ordered properly.
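
For reference, here is a lightly cleaned-up version of the same loop with the
corrected format string (same sstablemetadata flags as before):

```
for f in *Data.db; do meta=$(sstablemetadata -gc_grace_seconds 259200 "$f"); echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" " -f3 | cut -c 1-10) '+%Y/%m/%d %H:%M:%S') $(date --date=@$(echo "$meta" | grep Minimum\ time | cut -d" " -f3 | cut -c 1-10) '+%Y/%m/%d %H:%M:%S') $(echo "$meta" | grep droppable) $(ls -lh "$f"); done | sort
```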

You have 2 SSTables that claim to cover timestamps from 1940 to 2262, which
is weird.
Aside from that, you have big overlaps all over the SSTables, so that's
probably why your tombstones are sticking around.

Your best shot here will be a major compaction of that table, since it
doesn't seem so big. Remember to use the --split-output flag on the
compaction command to avoid ending up with a single SSTable after that.
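
A minimal sketch of that command, using the keyspace/table placeholders from
this thread:

```
# Major compaction of a single table; --split-output (-s) splits the result
# into several SSTables instead of producing one huge file
nodetool compact --split-output stats tablename
```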

Cheers,

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Tombstones not getting purged

2019-06-20 Thread Léo FERLIN SUTTON
On Thu, Jun 20, 2019 at 7:37 AM Alexander Dejanovski 
wrote:

> Hi Leo,
>
> The overlapping SSTables are indeed the most probable cause as suggested
> by Jeff.
> Do you know if the tombstone compactions actually triggered? (did the
> SSTables name change?)
>

Hello!

I believe they have changed. I do not remember the exact sstable names, but
the "last modified" timestamps have changed recently for these sstables.


> Could you run the following command to list SSTables and provide us the
> output? It will display both their timestamp ranges along with the
> estimated droppable tombstones ratio.
>
>
> for f in *Data.db; do meta=$(sstablemetadata -gc_grace_seconds 259200 $f);
> echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" "  -f3|
> cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(date --date=@$(echo "$meta" | grep
> Minimum\ time | cut -d" "  -f3| cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(echo
> "$meta" | grep droppable) $(ls -lh $f); done | sort
>

Here are the results:

```
04/01/2019 22:53:12 03/06/2018 16:46:13 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 16G Apr 13 14:35 md-147916-big-Data.db
04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 218M Jun 20 05:57 md-167948-big-Data.db
04/11/2262 23:47:16 10/09/1940 19:13:17 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:57 md-167942-big-Data.db
05/01/2019 08:03:24 03/06/2018 16:46:13 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 4.6G May 1 08:39 md-152253-big-Data.db
05/09/2018 06:35:03 03/06/2018 16:46:07 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 30G Apr 13 22:09 md-147948-big-Data.db
05/21/2019 05:28:01 03/06/2018 16:46:16 Estimated droppable tombstones:
0.45150604672159905 -rw-r--r-- 1 cassandra cassandra 1.1G Jun 20 05:55
md-167943-big-Data.db
05/22/2019 11:54:33 03/06/2018 16:46:16 Estimated droppable tombstones:
0.30826566640798975 -rw-r--r-- 1 cassandra cassandra 7.6G Jun 20 04:35
md-167913-big-Data.db
06/13/2019 00:02:40 03/06/2018 16:46:08 Estimated droppable tombstones:
0.20980847354256815 -rw-r--r-- 1 cassandra cassandra 6.9G Jun 20 04:51
md-167917-big-Data.db
06/17/2019 05:56:12 06/16/2019 20:33:52 Estimated droppable tombstones:
0.6114260192855792 -rw-r--r-- 1 cassandra cassandra 257M Jun 20 05:29
md-167938-big-Data.db
06/18/2019 11:21:55 03/06/2018 17:48:22 Estimated droppable tombstones:
0.18655813086540254 -rw-r--r-- 1 cassandra cassandra 2.2G Jun 20 05:52
md-167940-big-Data.db
06/19/2019 16:53:04 06/18/2019 11:22:04 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 425M Jun 19 17:08 md-167782-big-Data.db
06/20/2019 04:17:22 06/19/2019 16:53:04 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 146M Jun 20 04:18 md-167921-big-Data.db
06/20/2019 05:50:23 06/20/2019 04:17:32 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 42M Jun 20 05:56 md-167946-big-Data.db
06/20/2019 05:56:03 06/20/2019 05:50:32 Estimated droppable tombstones: 0.0
-rw-r--r-- 2 cassandra cassandra 4.8M Jun 20 05:56 md-167947-big-Data.db
07/03/2018 17:26:54 03/06/2018 16:46:07 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 27G Apr 13 17:45 md-147919-big-Data.db
09/09/2018 18:55:23 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 30G Apr 13 18:57 md-147926-big-Data.db
11/30/2018 11:52:33 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 14G Apr 13 13:53 md-147908-big-Data.db
12/20/2018 07:30:03 03/06/2018 16:46:08 Estimated droppable tombstones: 0.0
-rw-r--r-- 1 cassandra cassandra 9.3G Apr 13 13:28 md-147906-big-Data.db
```

You could also check the min and max tokens in each SSTable (not sure if
> you get that info from 3.0 sstablemetadata) so that you can detect the
> SSTables that overlap on token ranges with the ones that carry the
> tombstones, and have earlier timestamps. This way you'll be able to trigger
> manual compactions, targeting those specific SSTables.
>

I have checked and I don't believe the info is available in the 3.0.X
version of sstablemetadata :(
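
One workaround I can think of (assuming sstabledump and its -e flag are
available in our 3.0.x build) would be to enumerate the partition keys stored
in the tombstone-heavy SSTables and check whether they also appear in the
older ones, e.g.:

```
# Print only the partition keys contained in an SSTable
# (path reuses the placeholder from earlier in this thread)
sstabledump -e /var/lib/cassandra/data/stats/tablename/md-167913-big-Data.db
```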


> The rule for a tombstone to be purged is that there is no SSTable outside
> the compaction that would possibly contain the partition and that would
> have older timestamps.
>
Is there a way to log these checks and decisions made by the compaction
thread?
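Would something like raising the compaction package's log level at runtime
surface them? (Just a guess on my side, I'm not sure these purge decisions are
logged at all.)

```
# Bump compaction logging to DEBUG at runtime (not persisted across restarts)
nodetool setlogginglevel org.apache.cassandra.db.compaction DEBUG
```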


> Is this a followup on your previous issue where you were trying to perform
> a major compaction on an LCS table?
>

In a way, yes.

We are trying to reclaim the disk space used up by tombstoned data across the
cluster (on more than one table). We have recently started to purge old data
from our Cassandra cluster, and since disk space isn't cheap on cloud
providers, we want to be sure the data correctly expires and the disk space is
actually reclaimed!

The major compaction on the LCS table was one of our unsuccessful attempts
(too long and too much disk space 

Re: Tombstones not getting purged

2019-06-19 Thread Alexander Dejanovski
Hi Leo,

The overlapping SSTables are indeed the most probable cause as suggested by
Jeff.
Do you know if the tombstone compactions actually triggered? (did the
SSTables name change?)

Could you run the following command to list SSTables and provide us the
output? It will display both their timestamp ranges along with the
estimated droppable tombstones ratio.


for f in *Data.db; do meta=$(sstablemetadata -gc_grace_seconds 259200 $f);
echo $(date --date=@$(echo "$meta" | grep Maximum\ time | cut -d" "  -f3|
cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(date --date=@$(echo "$meta" | grep
Minimum\ time | cut -d" "  -f3| cut -c 1-10) '+%m/%d/%Y %H:%M:%S') $(echo
"$meta" | grep droppable) $(ls -lh $f); done | sort


It will allow you to see the timestamp ranges of the SSTables. You could also
check the min and max tokens in each SSTable (not sure if you get that info
from 3.0 sstablemetadata) so that you can detect the SSTables that overlap
on token ranges with the ones that carry the tombstones, and have earlier
timestamps. This way you'll be able to trigger manual compactions,
targeting those specific SSTables.
The rule for a tombstone to be purged is that there is no SSTable outside
the compaction that would possibly contain the partition and that would
have older timestamps.

Is this a followup on your previous issue where you were trying to perform
a major compaction on an LCS table?


-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Tombstones not getting purged

2019-06-19 Thread Jeff Jirsa
Probably overlapping sstables

Which compaction strategy?


> On Jun 19, 2019, at 9:51 PM, Léo FERLIN SUTTON  
> wrote:
> 
> I have used the following command to check if I had droppable tombstones :
> `/usr/bin/sstablemetadata --gc_grace_seconds 259200 
> /var/lib/cassandra/data/stats/tablename/md-sstablename-big-Data.db`
> 
> I checked every sstable in a loop and had 4 sstables with droppable 
> tombstones : 
> 
> ```
> Estimated droppable tombstones: 0.1558453651124074
> Estimated droppable tombstones: 0.20980847354256815
> Estimated droppable tombstones: 0.30826566640798975
> Estimated droppable tombstones: 0.45150604672159905
> ```
> 
> I changed my compaction configuration this morning (via JMX) to force a 
> tombstone compaction. These are my settings on this node :
> 
> ```
> {
> "max_threshold":"32",
> "min_threshold":"4",
> "unchecked_tombstone_compaction":"true",
> "tombstone_threshold":"0.1",
> "class":"org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy"
> }
> ```
> The threshold is lower than the droppable tombstone ratio of these sstables,
> and I expected the setting `unchecked_tombstone_compaction=true` to force
> Cassandra to run a "tombstone compaction", yet about 24h later all the
> tombstones are still there.
> 
> ## About the cluster : 
> 
> The compaction backlog is clear and here are our cassandra settings : 
> 
> Cassandra 3.0.18
> concurrent_compactors: 4
> compaction_throughput_mb_per_sec: 150
> sstable_preemptive_open_interval_in_mb: 50
> memtable_flush_writers: 4
> 
> 
> Any idea what I might be missing ?
> 
> Regards,
> 
> Leo
