[jira] [Commented] (CASSANDRA-4175) Reduce memory, disk space, and cpu usage with a column name/id map

Robert Stupp (JIRA) Fri, 18 Jul 2014 09:42:30 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066498#comment-14066498
 ]


Robert Stupp commented on CASSANDRA-4175:
-----------------------------------------

My five cents ;) Sorry if I repeat some things, I didn't read everything...

Such an enum/map of _column-id_ to _column-name_ should also include UDT 
field names.

The id generator for the _column-id_ could be per-keyspace (maybe something 
like a _next-column-id_ field per keyspace).
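
A minimal sketch of what such a per-keyspace id generator might look like. 
{{ColumnIdRegistry}}, {{idFor}} and {{nameFor}} are hypothetical names for 
illustration only, not existing Cassandra classes or methods, and the sketch 
ignores persistence and how the mapping would be propagated between nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-keyspace registry; one instance would exist per keyspace. */
final class ColumnIdRegistry {
    // Plays the role of the "next-column-id" field of the keyspace.
    private final AtomicInteger nextColumnId = new AtomicInteger(0);
    private final Map<String, Integer> idByName = new ConcurrentHashMap<>();
    private final Map<Integer, String> nameById = new ConcurrentHashMap<>();

    /** Returns the id already assigned to the name, or assigns the next free id. */
    int idFor(String name) {
        return idByName.computeIfAbsent(name, n -> {
            int id = nextColumnId.getAndIncrement();
            nameById.put(id, n);
            return id;
        });
    }

    /** Resolves an id back to its name, e.g. when building a result set. */
    String nameFor(int id) {
        String name = nameById.get(id);
        if (name == null) {
            throw new IllegalArgumentException("unknown column id " + id);
        }
        return name;
    }
}
```

Under this assumption, column names and UDT field names of one keyspace 
would simply share the same registry instance.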

I guess a typical column name is 10-15 chars long.
So the savings on heap and off-heap make that enum/map worth implementing: such 
a typical column name {{String}} occupies about 60 bytes on heap, an {{int}} 
just 4. It also takes pressure off the GC.
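
The rough arithmetic behind that figure can be sketched as follows. The 
layout numbers are assumptions: a 64-bit HotSpot JVM with compressed oops, 
8-byte object alignment, and a {{String}} backed by a {{char[]}} with one 
reference field and one {{int}} hash field. Other JVMs and versions will 
differ, and {{StringHeapEstimate}} is a hypothetical helper.

```java
/** Back-of-the-envelope heap estimate for a String under the stated assumptions. */
final class StringHeapEstimate {
    // Objects are assumed to be padded to a multiple of 8 bytes.
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    static long estimate(int chars) {
        // String object: 12-byte header + 4-byte char[] reference + 4-byte hash.
        long stringObject = align8(12 + 4 + 4);
        // char[]: 12-byte header + 4-byte length + 2 bytes per char.
        long charArray = align8(12 + 4 + 2L * chars);
        return stringObject + charArray;
    }
}
```

With these assumptions a 10-char name comes to 64 bytes and a 15-char name 
to 72 bytes, which is in line with the "about 60 bytes" guess above.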

Savings could also occur on the wire (between nodes), in the commit log and in 
data files. If the _column-id_ is globally unique per KS, sstable files remain 
portable between nodes (are they portable?).

It might also save bandwidth when serializing result sets back to the client 
(if all clients are required to know that id-name mapping).

> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>              Labels: performance
>             Fix For: 3.0
>
>
> We spend a lot of memory on column names, both transiently (during reads) and 
> more permanently (in the row cache).  Compression mitigates this on disk but 
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too 
> via very high allocation rates in the young generation, hence more GC 
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined 
> before they are inserted, we could create a map of (say) 32-bit int column 
> id, to names, and use that internally right up until we return a resultset to 
> the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
