[ 
https://issues.apache.org/jira/browse/CASSANDRA-18042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638619#comment-17638619
 ] 

Michael Semb Wever commented on CASSANDRA-18042:
------------------------------------------------

bq. If you use a zero TTL with Time Window Compaction, the number of SStables 
on disk will grow infinitely. 

It is easy to delete old sstables off disk. Many operators know how to do this, 
and want to take this approach because the TTL is unknown (e.g. based on 
storage costs and not time).

The majority of TTL use-cases I've seen set the TTL on writes and not in the 
table schema. A fail-by-default guardrail would be an unpleasant surprise for 
them. Setting the TTL only on writes is perfectly acceptable as well (e.g. 
variable TTL).

bq. it may surprise users who convert a table from Size Tiered to TWC when they 
usually, but not always, insert a row level TTL.

Data written while the table was still using STCS could well have no TTL. The 
guardrail cannot solve this.


bq. My understanding is that TWC will never clean up a SSTable if even a single 
row has a zero TTL.  

Not true. Tombstone compactions will still occur. There are TWCS options for 
tuning this.
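
As a rough illustration of the kind of tuning meant here, the sketch below 
builds an ALTER TABLE statement that sets the tombstone-related compaction 
subproperties on a TWCS table. The helper name, the table name and the chosen 
values are assumptions made up for this example; suitable values depend on the 
workload.

```python
def twcs_alter_statement(table, window_unit="DAYS", window_size=1,
                         tombstone_threshold=0.2,
                         tombstone_compaction_interval=86400,
                         unchecked_tombstone_compaction=True):
    """Build a CQL ALTER TABLE statement for TWCS (hypothetical helper).

    The defaults are illustrative assumptions, not recommendations.
    """
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": window_unit,
        "compaction_window_size": str(window_size),
        # Ratio of droppable tombstones that makes an SSTable a candidate
        # for a single-SSTable tombstone compaction.
        "tombstone_threshold": str(tombstone_threshold),
        # Minimum SSTable age, in seconds, before it is considered.
        "tombstone_compaction_interval": str(tombstone_compaction_interval),
        # Allow tombstone compactions without the overlap pre-check.
        "unchecked_tombstone_compaction":
            str(unchecked_tombstone_compaction).lower(),
    }
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


# "ks.events" is a made-up table name.
print(twcs_alter_statement("ks.events"))
```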

bq. This is what we have seen with users in production who have configured TWC 
and used a zero TTL. 

This is the purpose of implementing the fail guardrail. But there are good 
reasons why it should not be the default.



> Implement a guardrail for not having zero default ttl on tables with TWCS
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18042
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18042
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Guardrails, Legacy/Core
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.x
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> A user was surprised that his data had not started to expire after 90 days 
> on his TWCS table. He noticed that default_time_to_live on the table was set 
> to 0 (by accident on his side) and that inserts were using TTL = 0 too.
> It is questionable why it is possible to create a table with TWCS and allow 
> a user to set default_time_to_live to zero.
> On the other hand, I would argue that having default_time_to_live set to 0 on 
> TWCS does not necessarily mean that such a combination is illegal. People 
> very often use a non-zero default TTL with TWCS to their advantage, so that 
> tables are compacted away nicely. However, that does not mean that they 
> could not use TWCS with 0. But I have yet to see a use-case where TWCS was 
> used and the default TTL was set to 0 on purpose. Merely looking into the 
> Cassandra codebase, there are only cases where this parameter is not 0.
> There are three approaches:
> 1) just reject such statements (for CreateTable and AlterTable statements) 
> where default_time_to_live = 0
> 2) Implement a guardrail for 1) so it can be enabled / disabled on demand
> 3) Leave the possibility to set default_time_to_live to 0 on a table, but 
> add a guardrail for UpdateStatement that can reject queries against tables 
> whose default_time_to_live is zero when the TTL on that update statement is 
> set to 0 too.
> I would be careful about making the current configuration illegal because of 
> backward compatibility. For that reason 2) makes the most sense to me.
> Maybe implementing 3) would make sense as well. There might be a table which 
> has its default TTL set to 0 because it expects the user to supply a TTL on 
> every write. However, as this is not currently enforced anywhere, a client 
> might still insert with a TTL of 0, even by accident.
> POC for 2) is here 
> https://github.com/instaclustr/cassandra/commit/0b4dcc3d3deeffa393c02a3b80e27482007f9579
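
To make approaches 2) and 3) from the issue description concrete, here is a 
minimal sketch of the two checks in Python. It is not the linked POC and not 
Cassandra's guardrail API: every name in it (Verdict, check_schema, 
check_write, warn_enabled, fail_enabled) is hypothetical. It assumes that a 
write without an explicit TTL falls back to the table default, and it keeps 
"fail" off by default, in line with the comment above.

```python
from enum import Enum

TWCS = "TimeWindowCompactionStrategy"


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    FAIL = "fail"


def _verdict(violates, warn_enabled, fail_enabled):
    # Fail takes precedence over warn when both are switched on.
    if not violates:
        return Verdict.ALLOW
    if fail_enabled:
        return Verdict.FAIL
    return Verdict.WARN if warn_enabled else Verdict.ALLOW


def check_schema(compaction_class, default_ttl,
                 warn_enabled=True, fail_enabled=False):
    """Approach 2: guard CREATE/ALTER TABLE with TWCS and a zero default TTL."""
    violates = compaction_class.endswith(TWCS) and default_ttl == 0
    return _verdict(violates, warn_enabled, fail_enabled)


def check_write(compaction_class, default_ttl, write_ttl=None,
                warn_enabled=True, fail_enabled=False):
    """Approach 3: guard writes where the table default and the write TTL
    are both zero. A missing write TTL (None) is treated as zero here,
    which is an assumption of this sketch."""
    ttl = 0 if write_ttl is None else write_ttl
    violates = (compaction_class.endswith(TWCS)
                and default_ttl == 0 and ttl == 0)
    return _verdict(violates, warn_enabled, fail_enabled)


print(check_schema(TWCS, 0))                         # warn-only default
print(check_write(TWCS, 0, 3600, fail_enabled=True)) # per-write TTL passes
```

With these defaults, a table that relies on per-write TTLs keeps working, and 
operators who want the stricter behaviour opt in by enabling the fail flag.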



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

