[ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772023#comment-13772023
 ] 

Sylvain Lebresne commented on CASSANDRA-4175:
---------------------------------------------

I'm pretty sure we'd need CASSANDRA-5417 to make that doable (in fact, that's 
one of my original motivations for doing CASSANDRA-5417). Namely, we don't want 
a cell name/id map, we want a cql3 column/id map, otherwise this loses most of 
its interest. And we can't do a cql3 column/id map if we store cell names as 
opaque byte buffers.

To be more precise, I don't deny that a cell name/id map could be a start and 
would in fact serve some use cases, but I'm a bit reluctant to implement that 
knowing that we want to change to a cql3 column/id map sooner rather than 
later. I suspect it'll be a lot easier to do "the right thing" to start with 
than to do a cell name/id map and then have a painful time switching to a 
cql3 column/id one without breaking backward compatibility.

Besides, I also suspect there are a bunch of refactorings in CASSANDRA-5417 
that would be needed here as well, so working on both separately without 
coordination is likely to be frustrating and a duplication of effort.

Anyway, I do plan on getting back to CASSANDRA-5417 asap (though it's unlikely 
to be as soon as next week), so maybe we can hold off a bit on that one until 
then? If I've made no progress on CASSANDRA-5417 in, say, a month or two, and 
people really want this, we can re-evaluate?
                
> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>             Fix For: 2.1
>
>
> We spend a lot of memory on column names, both transiently (during reads) and 
> more permanently (in the row cache).  Compression mitigates this on disk but 
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too 
> via very high allocation rates in the young generation, hence more GC 
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined 
> before they are inserted, we could create a map of (say) 32-bit int column 
> ids to names, and use that internally right up until we return a resultset to 
> the client.
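
For illustration only, here is a minimal sketch of the kind of column name/id 
map the description above has in mind. The class and method names 
(ColumnIdMap, define, idOf, nameOf) are hypothetical and are not taken from 
the Cassandra code base; the sketch also assumes names are plain strings and 
that ids are simply assigned in definition order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a column name/id map; not Cassandra's actual code.
final class ColumnIdMap {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private final List<String> namesById = new ArrayList<>();

    // Called when a column is defined in the schema. Returns a 32-bit id;
    // defining the same name again returns the id assigned the first time.
    synchronized int define(String name) {
        Integer existing = idsByName.get(name);
        if (existing != null) {
            return existing;
        }
        int id = namesById.size();
        namesById.add(name);
        idsByName.put(name, id);
        return id;
    }

    // Used internally instead of carrying the full name around.
    synchronized int idOf(String name) {
        Integer id = idsByName.get(name);
        if (id == null) {
            throw new IllegalArgumentException("undefined column: " + name);
        }
        return id;
    }

    // Used only at the edge, when building the resultset for the client.
    synchronized String nameOf(int id) {
        if (id < 0 || id >= namesById.size()) {
            throw new IllegalArgumentException("unknown column id: " + id);
        }
        return namesById.get(id);
    }
}
```

The point of the sketch is that a name has to be defined before it can be 
looked up, which is the guarantee the description attributes to CQL3, and that 
the id is translated back to a name only when results leave the server.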

