[ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810299#comment-13810299
 ] 

Benedict commented on CASSANDRA-4175:
-------------------------------------

I think a transient (per launch, per node) map could be a big win from a CPU 
pov on its own. Assuming we convert back via a single array lookup, the extra 
indirection cost is unlikely to be measurable. If we were also to precompute 
the comparisons of the ByteBuffer names, we would definitely save 
O(name.length()) operations per task, and could potentially switch to 
counting sort and save O(m.n.lg(n)) [where n is the number of columns 
involved in an operation, and m is the length of the column names] for CFs 
with, say, < 100 columns.
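
To make the idea above concrete, here is a minimal sketch, not Cassandra 
code. The class ColumnIdMap and all of its methods are hypothetical names. It 
assumes every column name is known when the map is built, and it uses 
ByteBuffer.compareTo (signed byte order) as a stand-in for the CF's real 
comparator. Ids are assigned in sorted name order, so comparing two ids 
replaces comparing two names, and sorting n ids becomes a counting sort in 
O(n + k) for k defined names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of a transient (per launch, per node) name/id map.
final class ColumnIdMap {
    // id -> name: converting back is a single array lookup.
    private final ByteBuffer[] namesById;
    // name -> id: only consulted when a name first enters the node.
    private final Map<ByteBuffer, Integer> idsByName = new HashMap<>();

    ColumnIdMap(List<ByteBuffer> definedNames) {
        // Sort once up front so that id order equals name order.
        // Assumption: ByteBuffer.compareTo stands in for the CF comparator.
        TreeSet<ByteBuffer> sorted = new TreeSet<>(definedNames);
        namesById = sorted.toArray(new ByteBuffer[0]);
        for (int id = 0; id < namesById.length; id++)
            idsByName.put(namesById[id], id);
    }

    int size() { return namesById.length; }

    int idOf(ByteBuffer name) {
        Integer id = idsByName.get(name);
        if (id == null)
            throw new IllegalArgumentException("undefined column name");
        return id;
    }

    ByteBuffer nameOf(int id) {
        // duplicate() so callers cannot move the shared buffer's position.
        return namesById[id].duplicate();
    }

    // Counting sort over ids: O(n + k), with no name comparisons at all.
    int[] sortIds(int[] ids) {
        int[] counts = new int[namesById.length];
        for (int id : ids)
            counts[id]++;
        int[] out = new int[ids.length];
        int pos = 0;
        for (int id = 0; id < counts.length; id++)
            for (int c = 0; c < counts[id]; c++)
                out[pos++] = id;
        return out;
    }

    // Convenience helper for building names in examples.
    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

A map built this way would have to be rebuilt (or carry a separate rank 
array) whenever a new column name is defined, since a new name can land in the 
middle of the sorted order.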

It could potentially be implemented by abstracting Column to allow different 
sources of name(), so that CFs with large numbers of column names, or TimeUUID 
comparators, etc. can remain with the current implementation. Obviously, care 
would need to be taken not to break the native protocol...
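
One way the abstraction could look, as a rough sketch only: the types 
NamedColumn, DirectNameColumn and IdNameColumn below are hypothetical and do 
not correspond to Cassandra's actual Column classes. One variant holds the 
name bytes directly (the current behaviour), the other holds an int id plus a 
reference to a shared id -> name table.

```java
import java.nio.ByteBuffer;

// Hypothetical abstraction: callers only ever ask for name().
interface NamedColumn {
    ByteBuffer name();
}

// Current-style column: carries its own name bytes. Suitable for CFs with
// many distinct names or with comparators such as TimeUUID.
final class DirectNameColumn implements NamedColumn {
    private final ByteBuffer name;

    DirectNameColumn(ByteBuffer name) {
        this.name = name;
    }

    @Override
    public ByteBuffer name() {
        return name.duplicate();
    }
}

// Id-backed column: stores a small int and resolves the name through a
// shared table, so converting back is a single array lookup.
final class IdNameColumn implements NamedColumn {
    private final int id;
    private final ByteBuffer[] namesById; // shared, assumed immutable

    IdNameColumn(int id, ByteBuffer[] namesById) {
        this.id = id;
        this.namesById = namesById;
    }

    int id() {
        return id;
    }

    @Override
    public ByteBuffer name() {
        return namesById[id].duplicate();
    }
}
```

Code that serializes results for the client would keep calling name() and 
see the same bytes from either variant, which is the property that would 
leave the native protocol untouched.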


> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>              Labels: performance
>             Fix For: 2.1
>
>
> We spend a lot of memory on column names, both transiently (during reads) and 
> more permanently (in the row cache).  Compression mitigates this on disk but 
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too 
> via very high allocation rates in the young generation, hence more GC 
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined 
> before they are inserted, we could create a map of (say) 32-bit int column 
> ids to names, and use that internally right up until we return a resultset 
> to the client.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
