[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map

Edward Capriolo (JIRA) Fri, 07 Jun 2013 12:25:27 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678336#comment-13678336
 ]


Edward Capriolo commented on CASSANDRA-4175:
--------------------------------------------

https://issues.apache.org/jira/browse/CASSANDRA-44
https://issues.apache.org/jira/browse/CASSANDRA-45

If we are going to use ZooKeeper, why not do what was suggested in CASSANDRA-44?
Move all the schema to ZooKeeper. Then there are no schema consistency issues at 
all.

We can continue to add stuff to ZooKeeper until Cassandra becomes a poor man's 
HBase. CAS, atomic counters, row locks, let's do it! 

Can someone point me to some real-world examples of how large the average column 
name is and how much this optimization will help? I am not sure I follow how 
this helps.

I am looking at http://thelastpickle.com/2013/01/11/primary-keys-in-cql/

{quote}
RowKey: 3:201302
=> (column=2013-02-20 10\:58\:45+1300:, value=, timestamp=1357869161380000)
=> (column=2013-02-20 10\:58\:45+1300:is_dam_dirty_apes, value=01, timestamp=1357869161380000)
=> (column=2013-02-20 10\:58\:45+1300:pressure, value=00001ed2, timestamp=1357869161380000)
=> (column=2013-02-20 10\:58\:45+1300:temperature, value=0000001f, timestamp=1357869161380000)
{quote}

In this example the column names are '2013-02-20 10\:58\:45+1300:',
'2013-02-20 10\:58\:45+1300:is_dam_dirty_apes',
'2013-02-20 10\:58\:45+1300:pressure', and
'2013-02-20 10\:58\:45+1300:temperature'.
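
To put a rough number on the question, here is a minimal sketch that compares
the total length of the four column names above with what they would cost if
only the trailing CQL column name were swapped for a 32-bit id. The function
name and the assumption that the timestamp prefix cannot be mapped are mine,
and the cli display strings are only a proxy for the real on-disk encoding.

```python
# Display strings taken from the cli output quoted above (escapes removed).
# ASSUMPTION: string length is used as a rough proxy for stored size; the
# actual composite encoding in Cassandra is different.
CLUSTERING_PREFIX = "2013-02-20 10:58:45+1300"
CQL_COLUMNS = ["", "is_dam_dirty_apes", "pressure", "temperature"]
ID_BYTES = 4  # the 32-bit column id proposed in the issue description


def name_savings(prefix, columns, id_bytes=ID_BYTES):
    """Return (bytes_before, bytes_after) for a hypothetical id substitution.

    Only the trailing CQL column name is replaced by an id; the per-row
    prefix is kept verbatim because it differs for every logical row.
    """
    # +1 accounts for the ':' separator shown in the cli output.
    before = sum(len(prefix) + 1 + len(col) for col in columns)
    # The empty marker column has no name to replace, so it costs no id bytes.
    after = sum(len(prefix) + 1 + (id_bytes if col else 0) for col in columns)
    return before, after


before, after = name_savings(CLUSTERING_PREFIX, CQL_COLUMNS)
print(f"before={before} bytes, after={after} bytes")
```

Under these assumptions most of each name is the per-row prefix, which an id
map of column names does not touch.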

How are we going to build caches of this? Are we also thinking of some new 
format other than sstables?
                
> Reduce memory (and disk) space requirements with a column name/id map
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-4175
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>             Fix For: 2.1
>
>
> We spend a lot of memory on column names, both transiently (during reads) and 
> more permanently (in the row cache).  Compression mitigates this on disk but 
> not on the heap.
> The overhead is significant for typical small column values, e.g., ints.
> Even though we intern once we get to the memtable, this affects writes too 
> via very high allocation rates in the young generation, hence more GC 
> activity.
> Now that CQL3 provides us some guarantees that column names must be defined 
> before they are inserted, we could create a map of (say) 32-bit int column 
> id, to names, and use that internally right up until we return a resultset to 
> the client.
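
For readers following the thread, the name/id map in the issue description
could be pictured roughly as below. This is only an illustrative sketch: the
class and method names are hypothetical, it is not taken from any Cassandra
patch, and it ignores how ids would be agreed on across nodes.

```python
class ColumnIdMap:
    """Hypothetical bidirectional map between column names and small int ids."""

    MAX_ID = 2**31 - 1  # keep ids within a signed 32-bit int

    def __init__(self):
        self._ids = {}    # name -> id
        self._names = []  # id -> name (the id is the list index)

    def define(self, name):
        """Register a column name (as schema definition would) and return its id."""
        if name in self._ids:
            return self._ids[name]
        col_id = len(self._names)
        if col_id > self.MAX_ID:
            raise OverflowError("column id space exhausted")
        self._ids[name] = col_id
        self._names.append(name)
        return col_id

    def id_of(self, name):
        """Look up the id for a previously defined name; raises KeyError if unknown."""
        return self._ids[name]

    def name_of(self, col_id):
        """Translate an id back to its name, e.g. when building a result set."""
        return self._names[col_id]


columns = ColumnIdMap()
pressure_id = columns.define("pressure")
temperature_id = columns.define("temperature")
print(pressure_id, columns.name_of(temperature_id))
```

Internally only the ids would be carried around, and names would be resolved
again just before results go back to the client.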
