[ 
https://issues.apache.org/jira/browse/CASSANDRA-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344463#comment-15344463
 ] 

Sylvain Lebresne commented on CASSANDRA-9779:
---------------------------------------------

Shouldn't our first step be to validate that the optimization we have in mind
actually makes a meaningful difference (without having to bend a benchmark too
hard to show benefits)? It seems clear to me that this will add complexity from
the user's point of view. It's a new concept that will either have good
footshooting potential (if we just trust the user to only insert, without
checking it) or be annoying to use (if we force all columns to be set every
time). So it sounds to me like we would need to demonstrate fairly big
performance benefits for it to be worth doing (keep in mind that once we add
such a thing, we can't easily remove it, even if the improvement becomes
obsolete).
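For illustration only, the "force all columns every time" option amounts to a statement-level check like the hypothetical Java sketch below. The class, method and column names are assumptions, not Cassandra code. Such a check needs no read because it only compares the statement against the schema. That is also its limit: it cannot tell whether the row already exists, so it does not actually enforce "inserts only".

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical statement-level validation for a table declared WITH INSERTS ONLY.
// It only compares the INSERT's columns against the schema, so no read is needed.
final class AllColumnsCheckSketch {

    // Returns the regular columns the INSERT leaves unset; an empty set means "accept".
    static Set<String> missingColumns(Set<String> regularColumns, Map<String, ?> insertedValues) {
        Set<String> missing = new HashSet<>(regularColumns);
        missing.removeAll(insertedValues.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical schema: two regular (non-primary-key) columns.
        Set<String> schema = Set.of("event_type", "payload");

        // Every column is set, so the statement is accepted.
        System.out.println(missingColumns(schema, Map.of("event_type", "click", "payload", "{}")).isEmpty()); // true

        // The user must always supply "payload", even when it is not relevant.
        System.out.println(missingColumns(schema, Map.of("event_type", "click"))); // [payload]
    }
}
```

Rejecting a second INSERT for an existing key would instead require reading existing data before every write, which a check like this avoids.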

tl;dr, I don't love the whole idea, as I think it adds complexity from the user's
point of view (don't get me wrong, if we could validate this at insert time,
I'd be a lot more of a fan, but we can't), and I'm wondering whether, given DTCS
and the other optimizations we have internally, this really brings that much to
the table.

> Append-only optimization
> ------------------------
>
>                 Key: CASSANDRA-9779
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9779
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: Jonathan Ellis
>             Fix For: 3.x
>
>
> Many common workloads are append-only: that is, they insert new rows but do 
> not update existing ones.  However, Cassandra has no way to infer this and so 
> it must treat all tables as if they may experience updates in the future.
> If we added syntax to tell Cassandra about this ({{WITH INSERTS ONLY}} for 
> instance) then we could do a number of optimizations:
> - Compaction would only need to worry about defragmenting partitions, not 
> rows.  We could default to DTCS or similar.
> - CollationController could stop scanning sstables as soon as it finds a 
> matching row.
> - Most importantly, materialized views wouldn't need to worry about deleting 
> prior values, which would eliminate the majority of the MV overhead.
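The CollationController bullet, and the footshooting concern raised in the comment, can be illustrated with a toy Java model. Everything here is an assumption: each "sstable" is a map from row key to a column fragment, sstables are ordered newest first, and newer data simply wins. Real Cassandra reconciles cells by timestamp and handles tombstones, and this sketch ignores both. The normal read merges fragments from every sstable. The append-only read stops at the first sstable that contains the row. The two agree while the contract holds, and also when an overwrite sets every column. A partial update silently loses data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical, not Cassandra's API): an sstable maps row key -> (column -> value).
final class AppendOnlyReadSketch {

    // Normal read: merge fragments from every sstable; the newest value for each column wins.
    static Map<String, String> mergedRead(List<Map<String, Map<String, String>>> newestFirst, String key) {
        Map<String, String> row = new HashMap<>();
        for (Map<String, Map<String, String>> sstable : newestFirst) {
            Map<String, String> fragment = sstable.get(key);
            if (fragment == null) continue;
            // putIfAbsent keeps the value already taken from a newer sstable.
            fragment.forEach(row::putIfAbsent);
        }
        return row;
    }

    // Append-only read: assume the first sstable holding the row holds all of it, and stop there.
    static Map<String, String> appendOnlyRead(List<Map<String, Map<String, String>>> newestFirst, String key) {
        for (Map<String, Map<String, String>> sstable : newestFirst) {
            Map<String, String> fragment = sstable.get(key);
            if (fragment != null) return new HashMap<>(fragment);
        }
        return new HashMap<>();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> older = new HashMap<>();
        older.put("row1", Map.of("a", "1", "b", "2"));
        Map<String, Map<String, String>> newer = new HashMap<>();
        newer.put("row2", Map.of("a", "3", "b", "4"));

        // Contract respected: both reads agree, but the append-only read may skip sstables.
        List<Map<String, Map<String, String>>> tables = List.of(newer, older);
        System.out.println(mergedRead(tables, "row1").equals(appendOnlyRead(tables, "row1"))); // true

        // Overwrite that sets every column: the early exit is still correct.
        Map<String, Map<String, String>> fullRewrite = new HashMap<>();
        fullRewrite.put("row1", Map.of("a", "5", "b", "6"));
        List<Map<String, Map<String, String>>> rewritten = List.of(fullRewrite, newer, older);
        System.out.println(mergedRead(rewritten, "row1").equals(appendOnlyRead(rewritten, "row1"))); // true

        // Contract violated by a partial update: the early exit drops column "a".
        Map<String, Map<String, String>> partial = new HashMap<>();
        partial.put("row1", Map.of("b", "9"));
        List<Map<String, Map<String, String>>> violated = List.of(partial, newer, older);
        System.out.println(mergedRead(violated, "row1"));     // {a=1, b=9}
        System.out.println(appendOnlyRead(violated, "row1")); // {b=9}
    }
}
```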



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
