[jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine

Sylvain Lebresne (JIRA) Thu, 18 Jun 2015 06:59:59 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591825#comment-14591825
 ]


Sylvain Lebresne commented on CASSANDRA-8099:
---------------------------------------------

bq. why not introduce a special kind of RTM, of type BOUNDARY

Because that doesn't really fit the part about reusing "slice bounds without 
adding a new concept". An RT is a slice of deletion, and I think that's rather 
clean. I strongly suspect that either not reusing slices for RTs, adding a 
{{BOUNDARY}} concept to {{Slice.BOUND}} (which doesn't really make sense for 
slices per se), or some other hack to work around this will make things more 
confusing/complex, and is thus not worth it for a minor optimization of a 
situation that is probably pretty rare in practice. Feel free to give the 
changes a shot though if you're convinced otherwise, and we can have a more 
informed discussion on the result.
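
To make the trade-off concrete, here is a minimal sketch (not the actual patch) 
of an RT modelled as a slice with a start and an end bound, flattened into a 
stream of open/close markers. All names ({{RangeTombstoneSketch}}, {{Marker}}, 
{{Kind}}) and the use of plain integers as clustering positions are 
hypothetical, and the input is assumed to be sorted and non-overlapping. It 
only shows the case under discussion: two adjacent RTs emit a close and an open 
marker at the same position, which is the pair a {{BOUNDARY}} type would merge.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeTombstoneSketch {
    // Hypothetical stand-ins for illustration; not Cassandra's actual classes.
    public enum Kind { OPEN, CLOSE }

    public static final class Marker {
        public final Kind kind;
        public final int position;      // simplified clustering position
        public final long deletionTime;

        public Marker(Kind kind, int position, long deletionTime) {
            this.kind = kind;
            this.position = position;
            this.deletionTime = deletionTime;
        }
    }

    // A range tombstone is just a slice (start, end) plus a deletion time.
    public static final class RangeTombstone {
        public final int start;
        public final int end;
        public final long deletionTime;

        public RangeTombstone(int start, int end, long deletionTime) {
            this.start = start;
            this.end = end;
            this.deletionTime = deletionTime;
        }
    }

    // Emit one OPEN and one CLOSE marker per tombstone, in input order.
    public static List<Marker> flatten(List<RangeTombstone> tombstones) {
        List<Marker> out = new ArrayList<>();
        for (RangeTombstone rt : tombstones) {
            out.add(new Marker(Kind.OPEN, rt.start, rt.deletionTime));
            out.add(new Marker(Kind.CLOSE, rt.end, rt.deletionTime));
        }
        return out;
    }

    // Count CLOSE/OPEN pairs sharing a position: each is one extra marker
    // that a dedicated boundary type would save.
    public static int sharedPositions(List<Marker> markers) {
        int shared = 0;
        for (int i = 1; i < markers.size(); i++) {
            Marker prev = markers.get(i - 1);
            Marker cur = markers.get(i);
            if (prev.kind == Kind.CLOSE && cur.kind == Kind.OPEN
                    && prev.position == cur.position) {
                shared++;
            }
        }
        return shared;
    }
}
```

With two adjacent RTs covering positions 1 to 5 and 5 to 9, {{flatten}} yields 
four markers and {{sharedPositions}} reports one mergeable pair, so the saving 
is one marker per pair of adjacent tombstones.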

bq. Do we have test coverage of these?

We have generally poor coverage of range tombstone usage (though I've modified 
Branimir's test to test the reverse case too, so we have one test for that now 
at least). We have some basic tests, but nothing fancy enough. I've created 
CASSANDRA-9617 to improve this.

bq. I would like to suggest we relabel the marker types to (something like) 
{{UPPER/LOWER}}

We can, though due to the point above that means also renaming it for slices in 
general. That is not crazy per se, in that we do scan slices from end to start 
for reverse queries, so the two are equivalent in that respect; it's just that 
it's a departure from the existing nomenclature (which I don't personally mind).

> Refactor and modernize the storage engine
> -----------------------------------------
>
>                 Key: CASSANDRA-8099
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0 beta 1
>
>         Attachments: 8099-nit
>
>
> The current storage engine (which for this ticket I'll loosely define as "the 
> code implementing the read/write path") is suffering from old age. One of the 
> main problems is that the only structure it deals with is the cell, which 
> completely ignores the higher-level CQL structure that groups cells into 
> (CQL) rows.
> This leads to many inefficiencies, like the fact that during a read we have 
> to group cells multiple times (to count on the replica, then to count on the 
> coordinator, then to produce the CQL resultset) because we forget about the 
> grouping right away each time (so lots of useless cell name comparisons in 
> particular). But beyond inefficiencies, having to manually recreate the CQL 
> structure every time we need it for something is hindering new features and 
> makes the code more complex than it should be.
> Said storage engine also has tons of technical debt. To pick an example, the 
> fact that during range queries we update {{SliceQueryFilter.count}} is pretty 
> hacky and error prone. Or the overly complex lengths {{AbstractQueryPager}} 
> has to go to simply to "remove the last query result".
> So I want to bite the bullet and modernize this storage engine. I propose to 
> do 2 main things:
> # Make the storage engine more aware of the CQL structure. In practice, 
> instead of having partitions be a simple iterable map of cells, it should be 
> an iterable list of rows (each being itself composed of per-column cells, 
> though obviously not exactly the same kind of cell we have today).
> # Make the engine more iterative. What I mean here is that in the read path, 
> we end up reading all cells in memory (we put them in a ColumnFamily object), 
> but there is really no reason to. If instead we were working with iterators 
> all the way through, we could get to a point where we're basically 
> transferring data from disk to the network, and we should be able to reduce 
> GC substantially.
> Please note that such a refactor should provide some performance improvements 
> right off the bat, but that's not its primary goal either. Its primary goal is 
> to simplify the storage engine and add abstractions that are better suited to 
> further optimizations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
