[ 
https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361275#comment-14361275
 ] 

Benedict commented on CASSANDRA-8099:
-------------------------------------

Like I said, it's a shame; I lament not having longer to criticise the less 
optimal decisions. That's not at all to suggest they will cumulatively sabotage 
this patch to worse than the status quo. But the bar for improvement is much 
higher once a round of changes goes in (not least because of the effort of 
maintaining compatibility each time, but also because it has to be justified 
afresh, and be worth the risk, argumentation, redevelopment, etc.), and so we 
will find ourselves settling more readily than had we considered our options 
more carefully up front, especially when there are so many aspects to discuss. 
I don't think there is much to be done about it now, though, given the time 
constraints, and we will simply have to do our best.

Anyway, I'll try to properly digest the patch over the next week or so, so I 
can give some actual concrete feedback. On the whole I _do_ think it is a huge 
step forward (well, perhaps not the naming :)). I just wish we weren't rushing 
this part after waiting so long for it, and that we had at least discussed some 
of the more concrete aspects of the design in advance.

The concern I have about the scope being too large to vet effectively is 
somewhat uncorrelated, but I don't have a good answer for that either. My 
experience is that review's capacity for finding problems doesn't scale 
linearly with the scope and complexity of a patch, and I don't think we've ever 
had a patch as large as this (it's basically a whole version jump on its own). 
Of course, if you're planning to break 3.0 just to make me feel better about 
breaking 2.1, I'm cool with that :)

> Refactor and modernize the storage engine
> -----------------------------------------
>
>                 Key: CASSANDRA-8099
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0
>
>         Attachments: 8099-nit
>
>
> The current storage engine (which for this ticket I'll loosely define as "the 
> code implementing the read/write path") is suffering from old age. One of the 
> main problems is that the only structure it deals with is the cell, which 
> completely ignores the higher-level CQL structure that groups cells into 
> (CQL) rows.
> This leads to many inefficiencies, like the fact that during a read we have 
> to group cells multiple times (to count on the replica, then to count on the 
> coordinator, then to produce the CQL resultset) because we forget about the 
> grouping right away each time (so lots of useless cell name comparisons in 
> particular). But beyond the inefficiencies, having to manually recreate the CQL 
> structure every time we need it for something is hindering new features and 
> makes the code more complex than it should be.
> Said storage engine also has tons of technical debt. To pick an example, the 
> fact that during range queries we update {{SliceQueryFilter.count}} is pretty 
> hacky and error prone. Another is the overly complex lengths {{AbstractQueryPager}} has 
> to go to just to "remove the last query result".
> So I want to bite the bullet and modernize this storage engine. I propose to 
> do 2 main things:
> # Make the storage engine more aware of the CQL structure. In practice, 
> instead of having partitions be a simple iterable map of cells, it should be 
> an iterable list of rows (each being itself composed of per-column cells, 
> though obviously not exactly the same kind of cell we have today).
> # Make the engine more iterative. What I mean here is that in the read path, 
> we end up reading all cells in memory (we put them in a ColumnFamily object), 
> but there is really no reason to. If instead we were working with iterators 
> all the way through, we could get to a point where we're basically 
> transferring data from disk to the network, and we should be able to reduce 
> GC substantially.
> Please note that such a refactor should provide some performance improvements 
> right off the bat but it's not its primary goal either. Its primary goal is 
> to simplify the storage engine and add abstractions that are better suited to 
> further optimizations.
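
For readers skimming the quoted description, the two proposals (group cells into rows once, and stay iterator-based throughout) might be sketched roughly as follows. This is only an illustration under simplifying assumptions: every name here (RowIteratorSketch, Cell, Row, groupIntoRows, limit) is hypothetical and not taken from the patch, and cells are assumed to arrive already sorted by clustering key.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch; none of these types come from the CASSANDRA-8099 patch.
public class RowIteratorSketch {

    // A flat cell, roughly what a cell-only engine sees: clustering key, column, value.
    static final class Cell {
        final String clustering, column, value;
        Cell(String clustering, String column, String value) {
            this.clustering = clustering;
            this.column = column;
            this.value = value;
        }
    }

    // A CQL-style row: one clustering key plus its per-column values.
    static final class Row {
        final String clustering;
        final Map<String, String> columns = new LinkedHashMap<>();
        Row(String clustering) { this.clustering = clustering; }
    }

    // Lazily groups a clustering-ordered stream of cells into rows.
    // Only the row being built is held in memory, never the whole partition.
    static Iterator<Row> groupIntoRows(final Iterator<Cell> cells) {
        return new Iterator<Row>() {
            private Cell pending = cells.hasNext() ? cells.next() : null;

            public boolean hasNext() { return pending != null; }

            public Row next() {
                if (pending == null) throw new NoSuchElementException();
                Row row = new Row(pending.clustering);
                // Consume cells until the clustering key changes.
                while (pending != null && pending.clustering.equals(row.clustering)) {
                    row.columns.put(pending.column, pending.value);
                    pending = cells.hasNext() ? cells.next() : null;
                }
                return row;
            }
        };
    }

    // Applies a row limit by wrapping the iterator, so counting happens on rows
    // and no grouping work is done for rows past the limit.
    static Iterator<Row> limit(final Iterator<Row> rows, final int max) {
        return new Iterator<Row>() {
            private int returned = 0;

            public boolean hasNext() { return returned < max && rows.hasNext(); }

            public Row next() {
                if (!hasNext()) throw new NoSuchElementException();
                returned++;
                return rows.next();
            }
        };
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("a", "x", "1"), new Cell("a", "y", "2"),
            new Cell("b", "x", "3"), new Cell("c", "x", "4"));
        Iterator<Row> rows = limit(groupIntoRows(cells.iterator()), 2);
        while (rows.hasNext()) {
            Row r = rows.next();
            System.out.println(r.clustering + " -> " + r.columns);
        }
    }
}
```

The point of the sketch is that the limit counts rows rather than cells, and that each stage pulls from the previous one on demand, which is the property the description hopes will reduce allocation on the read path.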



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
