[ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698710#comment-13698710 ]
Terje Marthinussen commented on CASSANDRA-4175:
-----------------------------------------------

I should maybe add that 1 and 2 above do not exclude each other but rather complement each other.

#1 is a manual map and could allow things like a prefix map such as '$201212', which would map all such prefixes to an id.

#2 is an auto map. It may require #1 if we want to allow the user to give "hints" for substring maps, such as '$(201\d\d\d)' to map all year+month-like strings starting with 201 to a mapping entry. This would only be a hint: sampling of the number of entries should decide what actually gets mapped, to avoid running out of memory.

I am a bit unsure whether advanced features like substrings would ever be used; maybe they should only be implemented separately, as some sort of substring detection. As this can be a bit processing intensive, substring statistics (top substrings) could be detected and maintained node-wide during compaction and given as hints to the serializer later.

> Reduce memory, disk space, and cpu usage with a column name/id map
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>             Fix For: 2.1
>
>
> We spend a lot of memory on column names, both transiently (during reads) and
> more permanently (in the row cache). Compression mitigates this on disk but
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too
> via very high allocation rates in the young generation, hence more GC
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined
> before they are inserted, we could create a map of (say) 32-bit int column
> id, to names, and use that internally right up until we return a resultset to
> the client.
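
To make the manual map (#1) and the sampled auto map (#2) a bit more concrete, here is a minimal sketch in Java of how both could sit on top of a single name/id map. This is only an illustration, not Cassandra code: the class ColumnNameIdMap, its methods, and the maxEntries and minSamples parameters are all hypothetical names made up for this example, and prefix or substring hints are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a column name/id map; not Cassandra's actual implementation. */
final class ColumnNameIdMap {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private final List<String> namesById = new ArrayList<>();
    // Sighting counts for names that are not mapped yet (assumption: a real
    // implementation would bound or periodically reset this table as well).
    private final Map<String, Integer> sampleCounts = new HashMap<>();
    private final int maxEntries; // hard cap so the map cannot exhaust memory
    private final int minSamples; // sightings required before auto-mapping

    ColumnNameIdMap(int maxEntries, int minSamples) {
        this.maxEntries = maxEntries;
        this.minSamples = minSamples;
    }

    /** Manual map (#1): assign an id explicitly. Returns -1 if the map is full. */
    int register(String name) {
        Integer existing = idsByName.get(name);
        if (existing != null) {
            return existing;
        }
        if (namesById.size() >= maxEntries) {
            return -1;
        }
        int id = namesById.size(); // ids are dense 32-bit ints starting at 0
        namesById.add(name);
        idsByName.put(name, id);
        return id;
    }

    /** Auto map (#2): map a name only once it has been seen minSamples times. */
    int observe(String name) {
        Integer existing = idsByName.get(name);
        if (existing != null) {
            return existing;
        }
        if (namesById.size() >= maxEntries) {
            return -1; // full: stop sampling, caller keeps using the raw name
        }
        int seen = sampleCounts.merge(name, 1, Integer::sum);
        if (seen < minSamples) {
            return -1;
        }
        sampleCounts.remove(name);
        return register(name);
    }

    /** Returns the id for a mapped name, or -1 if the name is not mapped. */
    int idOf(String name) {
        return idsByName.getOrDefault(name, -1);
    }

    /** Resolves an id back to its name, e.g. just before returning a resultset. */
    String nameOf(int id) {
        return namesById.get(id);
    }
}
```

With new ColumnNameIdMap(2, 3), register("user_id") returns 0 right away, while observe("ts") returns -1 twice and then 1 on the third sighting. After that the map is full, so any further name stays unmapped and would have to be written out as a plain string.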