[ https://issues.apache.org/jira/browse/CASSANDRA-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344463#comment-15344463 ]
Sylvain Lebresne commented on CASSANDRA-9779:
---------------------------------------------

Shouldn't our first step be to validate that the optimization we have in mind actually makes a meaningful difference (without having to bend a benchmark too hard to show benefits)? It seems clear to me that this will add complexity from the user's point of view: it's a new concept that will either have good footshooting potential (if we were to just trust the user to insert only, without checking it) or be annoying to use (if we force all columns every time). So it sounds to me like we would need to demonstrate fairly big performance benefits for this to be worth doing (keep in mind that once we add such a thing, we can't easily remove it, even if the improvement becomes obsolete).

tl;dr, I don't love the whole idea, as I think it adds complexity from the user's point of view (don't get me wrong, if we could validate this at insert time, I'd be a lot more of a fan, but we can't), and I'm wondering whether, given DTCS and the other optimizations we have internally, this really brings that much to the table.

> Append-only optimization
> ------------------------
>
> Key: CASSANDRA-9779
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9779
> Project: Cassandra
> Issue Type: New Feature
> Components: CQL
> Reporter: Jonathan Ellis
> Fix For: 3.x
>
>
> Many common workloads are append-only: that is, they insert new rows but do
> not update existing ones. However, Cassandra has no way to infer this and so
> it must treat all tables as if they may experience updates in the future.
> If we added syntax to tell Cassandra about this ({{WITH INSERTS ONLY}} for
> instance) then we could do a number of optimizations:
> - Compaction would only need to worry about defragmenting partitions, not
> rows. We could default to DTCS or similar.
> - CollationController could stop scanning sstables as soon as it finds a
> matching row
> - Most importantly, materialized views wouldn't need to worry about deleting
> prior values, which would eliminate the majority of the MV overhead

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
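
For readers following the thread, the CollationController bullet in the quoted description and the footshooting concern in the comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra's read path: the dict-based "sstables", the newest-first ordering, and the function names `read_general` and `read_insert_only` are all hypothetical and exist purely for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (assumption): an "sstable" maps a row key to
# {column: (write_timestamp, value)}. Lists of sstables are ordered newest first.
SSTable = Dict[str, Dict[str, Tuple[int, str]]]
Row = Dict[str, str]


def read_general(sstables: List[SSTable], key: str) -> Tuple[Optional[Row], int]:
    """Table that may see updates: consult every sstable, keep the newest cell per column."""
    merged: Dict[str, Tuple[int, str]] = {}
    scanned = 0
    for table in sstables:
        scanned += 1
        for column, (ts, value) in table.get(key, {}).items():
            if column not in merged or ts > merged[column][0]:
                merged[column] = (ts, value)
    row = {c: v for c, (_, v) in merged.items()} if merged else None
    return row, scanned


def read_insert_only(sstables: List[SSTable], key: str) -> Tuple[Optional[Row], int]:
    """Hypothetical insert-only table: stop at the first sstable that contains the row."""
    scanned = 0
    for table in sstables:
        scanned += 1
        if key in table:
            return {c: v for c, (_, v) in table[key].items()}, scanned
    return None, scanned


# Each row is written exactly once, with all of its columns: the early exit is safe.
sstables: List[SSTable] = [
    {"sensor-1": {"temp": (30, "21.5"), "unit": (30, "C")}},
    {"sensor-2": {"temp": (20, "19.0"), "unit": (20, "C")}},
    {"sensor-3": {"temp": (10, "22.1"), "unit": (10, "C")}},
]

# The user breaks the promise: column "b" of row "k" is updated later on its own.
violated: List[SSTable] = [
    {"k": {"b": (2, "new-b")}},
    {"k": {"a": (1, "a"), "b": (1, "old-b")}},
]

if __name__ == "__main__":
    print(read_general(sstables, "sensor-1"))
    print(read_insert_only(sstables, "sensor-1"))
    print(read_general(violated, "k"))
    print(read_insert_only(violated, "k"))
```

In this model the early-exit read touches one sstable instead of three for `sensor-1` and returns the same row. On the `violated` data it silently drops column `a`, which is the kind of unchecked misuse the comment worries about, and forcing every insert to carry all columns is the corresponding usability cost.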