[ 
https://issues.apache.org/jira/browse/CASSANDRA-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haebin Na updated CASSANDRA-7115:
---------------------------------

    Description: 
We need a better way to expire columns than per-column TTLs.

If you set a 6-month TTL on a column in a frequently updated (or deleted; yes, 
this is an anti-pattern) wide row, the expired column is unlikely to actually 
be removed, since the row will be highly fragmented across SSTables.

To solve this problem, I suggest partitioning a column family (table) using 
the column key (column1) as the partition key.

This would behave like a set of column families (tables) that share the same 
structure, each covering a certain range of column keys. In other words, a row 
is deterministically fragmented by column key.

If you use a timestamp-like column key, you could easily truncate a specific 
partition (a sub-table or CF covering a specific range) once it is older than 
a certain age, without worrying about zombie tombstones.
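
To make the idea concrete, here is a rough sketch of how a timestamp column 
key could map to a quarterly sub-table, and how to decide which sub-tables are 
old enough to truncate. The names quarter_bucket, bucket_end and 
droppable_buckets are hypothetical and only illustrate the proposal; nothing 
here is an existing Cassandra API, and the quarterly granularity is just one 
possible choice.

```python
from datetime import datetime, timedelta, timezone


def quarter_bucket(ts: datetime) -> str:
    """Map a timestamp column key to a quarterly bucket label, e.g. '2014Q2'."""
    quarter = (ts.month - 1) // 3 + 1
    return f"{ts.year}Q{quarter}"


def bucket_end(label: str) -> datetime:
    """Return the exclusive upper bound (UTC) of the range a bucket covers."""
    year, quarter = int(label[:4]), int(label[5])
    if quarter == 4:
        return datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(year, quarter * 3 + 1, 1, tzinfo=timezone.utc)


def droppable_buckets(labels, now: datetime, max_age: timedelta):
    """Buckets in which even the newest possible column is older than max_age."""
    cutoff = now - max_age
    return sorted(label for label in labels if bucket_end(label) <= cutoff)
```

A bucket is only reported as droppable once its whole range has aged out, so 
truncating it never removes a column that is still within the retention 
period. The cost is that data can live up to one bucket width longer than the 
nominal age.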

Having many column families is not optimal, yet even a small set, such as 
biyearly or quarterly partitions, could be a whole lot more efficient than 
TTLed columns.

What do you think?




  was:
We need a better solution to expire columns than TTLed columns.

If you set TTL 6 months for a column in a frequently updated(deleted, yes, this 
is anti-pattern) wide row, it is not likely to be deleted since the row would 
be highly fragmented.

In order to solve the problem above, I suggest partitioning column family 
(table) with column key (column1) as partition key.

It is like a set of column families (tables) which share the same structure and 
cover certain range of columns per CF. This means that a row is 
deterministically fragmented by column key.

If you use timestamp like column key, then you would be able to truncate 
specific partition (a sub-table or CF with specific range) if it is older than 
certain age easily without worrying about zombie tombstones. 

It is not optimal to have many column families, yet even with small set like by 
biyearly or quarterly, we could achieve whole lot more efficient than TTLed 
columns.

What do you think?




        Summary: Column Family (Table) partitioning with column keys as 
partition keys (Sorta TTLed Table)  (was: Partitioned Column Family (Table) 
based on Column Keys (Sorta TTLed Table))

> Column Family (Table) partitioning with column keys as partition keys (Sorta 
> TTLed Table)
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7115
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7115
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Haebin Na
>            Priority: Minor
>              Labels: features



--
This message was sent by Atlassian JIRA
(v6.2#6252)
