[ 
https://issues.apache.org/jira/browse/CASSANDRA-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831386#comment-13831386
 ] 

Benedict commented on CASSANDRA-5417:
-------------------------------------

bq. Bug notwithstanding, we only intern names that are part of the 
column_metadata in thrift parlance

Good point, I misread this at the start of processing the patch, should have 
confirmed once I'd got a proper handle on it.

bq. Well, actually no. It is a name-of-a-cell. ...  the naming would be a lot 
more sensible if the Column class was actually named Cell, and we plan on 
changing that...

Well, it seems to me the nomenclature is a little confused because we're 
overloading the name with data. This is parcelled up in the Prefix business 
too. A Simple cell name isn't really prefixed/clustered (by the terminology of 
CQL), and is simply a name-of-a-cell, but when we're mungung data into the name 
to support our clustering behaviour it becomes (to me) a 
(part-of-a-)cell-with-a-name. These were simply suggestions for maybe 
clarifying though, after struggling to grasp it initially. Since I understand 
what's going on now, I won't lose much sleep over the naming, and I can now see 
where you're coming from. I certainly agree that it's used to identify a cell, 
so I'll leave it be.

Sparse in particular threw me, as I can't see a reason for the moniker, and in 
normal English usage it means more than 'not dense', but I won't lose any sleep 
over it either.

bq. There is no meaningful inheritance relations between the cell names 
implementations

This was definitely my most tentative suggestion, and I suggested it only for 
simplifying the class naming and because it didn't seem *too* ugly. Definitely 
happy to ignore it.

bq. Besides, in this particular case, thrift queries will call toByteBuffer() 
all the time without needing the duplication so leaving it to the caller does 
save some

Fair enough - seemed we had the opposite problem with new BBs created in 
Composite*, but since duplicate() will be almost free in almost all cases, it 
was a bit of a non-issue in the first place. Happy to stick with the current 
conventions.

bq. Definitively not the most clear comment but as far as I can tell it's not 
incomplete.

Well, e.g., it says we can select = 'a AND >= 'a' both by using <'a'><0>. I 
assume the intention is to use the latter in a range, with an inclusive lower 
bound and exclusive upper bound. But it isn't clear.

> Push composites support in the storage engine
> ---------------------------------------------
>
>                 Key: CASSANDRA-5417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5417
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: performance
>             Fix For: 2.1
>
>
> CompositeType happens to be very useful and is now widely used: CQL3 heavily 
> rely on it, and super columns are now using it too internally. Besides, 
> CompositeType has been advised as a replacement of super columns on the 
> thrift side for a while, so it's safe to assume that it's generally used 
> there too.
> CompositeType has initially been introduced as just another AbstractType.  
> Meaning that the storage engine has no nothing whatsoever of composites 
> being, well, composite. This has the following drawbacks:
> * Because internally a composite value is handled as just a ByteBuffer, we 
> end up doing a lot of extra work. Typically, each time we compare 2 composite 
> value, we end up "deserializing" the components (which, while it doesn't copy 
> data per-se because we just slice the global ByteBuffer, still waste some cpu 
> cycles and allocate a bunch of ByteBuffer objects). And since compare can be 
> called *a lot*, this is likely not negligible.
> * This make CQL3 code uglier than necessary. Basically, CQL3 makes extensive 
> use of composites, and since it gets backs ByteBuffer from the internal 
> columns, it always have to check if it's actually a compositeType or not, and 
> then split it and pick the different parts it needs. It's only an API 
> problem, but having things exposed as composites directly would definitively 
> make thinks cleaner. In particular, in most cases, CQL3 don't care whether it 
> has a composite with only one component or a non-really-composite value, but 
> we still always distinguishes both cases.  Lastly, if we do expose composites 
> more directly internally, it's not a lot more work to "internalize" better 
> the different parts of the cell name that CQL3 uses (what's the clustering 
> key, what's the actuall CQL3 column name, what's the collection element), 
> making things cleaner. Last but not least, there is currently a bunch of 
> places where methods take a ByteBuffer as argument and it's hard to know 
> whether it expects a cell name or a CQL3 column name. This is pretty error 
> prone.
> * It makes it hard (or impossible) to do a number of performance 
> improvements.  Consider CASSANDRA-4175, I'm not really sure how you can do it 
> properly (in memory) if cell names are just ByteBuffer (since CQL3 column 
> names are just one of the component in general). But we also miss 
> oportunities of sharing prefixes. If we were able to share prefixes of 
> composite names in memory we would 1) lower the memory footprint and 2) 
> potentially speed-up comparison (of the prefixes) by checking reference 
> equality first (also, doing prefix sharing on-disk, which is a separate 
> concern btw, might be easier to do if we do prefix sharing in memory).
> So I suggest pushing CompositeType support inside the storage engine. What I 
> mean by that concretely would be change the internal {{Column.name}} from 
> ByteBuffer to some CellName type. A CellName would API-wise just be a list of 
> ByteBuffer. But in practice, we'd have a specific CellName implementation for 
> not-really-composite names, and the truly composite implementation will allow 
> some prefix sharing. From an external API however, nothing would change, we 
> would pack the composite as usual before sending it back to the client, but 
> at least internally, comparison won't have to deserialize the components 
> every time, and CQL3 code will be cleaner.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to