[ https://issues.apache.org/jira/browse/CASSANDRA-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090440#comment-13090440 ]

Yang Yang commented on CASSANDRA-3073:
--------------------------------------

Jonathan:

I looked at the SlabAllocator code, and I think it does create a problem for 
update-oriented workloads: as you said, its "live size" is essentially the 
throughput. Thus even with a single counter, memory consumption grows quickly 
and you have to flush out many tiny SSTables. Could we add a config option to 
choose between the SlabAllocator and the native allocator? In the counter 
case, the disadvantage of many tiny SSTables seems worse than possible 
fragmentation.
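
To make that first point concrete, here is a minimal, hedged sketch of a 
bump-pointer region allocator of the kind I understand the slab approach to 
be. This is not the actual org.apache.cassandra.utils.SlabAllocator; the 
class name, the 1 MB region size, and the accounting field are assumptions 
made only for illustration:

import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

public class RegionAllocatorSketch
{
    // Assumed region size, for illustration only.
    private static final int REGION_SIZE = 1 << 20; // 1 MB

    private ByteBuffer currentRegion = ByteBuffer.allocate(REGION_SIZE);
    // This counter only ever goes up: there is no free(), so it tracks
    // throughput rather than the data that is actually still live.
    private final AtomicLong bytesAllocated = new AtomicLong();

    public synchronized ByteBuffer allocate(int size)
    {
        if (currentRegion.remaining() < size)
        {
            // The old region stays reachable as long as any column still
            // points into it; superseded columns in it are pure internal
            // fragmentation.
            currentRegion = ByteBuffer.allocate(REGION_SIZE);
        }
        ByteBuffer dup = currentRegion.duplicate();
        dup.limit(dup.position() + size);
        ByteBuffer slice = dup.slice();
        currentRegion.position(currentRegion.position() + size);
        bytesAllocated.addAndGet(size);
        return slice;
    }

    public long bytesAllocated()
    {
        return bytesAllocated.get();
    }

    public static void main(String[] args)
    {
        RegionAllocatorSketch allocator = new RegionAllocatorSketch();
        // Overwrite the "same" 8-byte counter column a million times.
        for (int i = 0; i < 1_000_000; i++)
            allocator.allocate(8);
        // Only ~8 bytes are live, but ~8 MB have been handed out and accounted.
        System.out.println("bytes accounted: " + allocator.bytesAllocated());
    }
}

Running main() here accounts for roughly 8 MB even though only one 8-byte 
column is ever live, which is exactly the behaviour that forces frequent 
flushes of tiny memtables under an overwrite-heavy workload.
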

Also, could you please educate me a bit more on how the current SlabAllocator 
code works? (It's quite different from standard slab allocation as described 
in textbooks, since it has no free().) I thought that by the time a minor GC 
happens, some old columns in the Region have already been superseded and are 
useless, so the Region already has internal fragmentation; the Region is then 
promoted to the old generation, so the old gen still carries that internal 
fragmentation (a lot of it, in the case of counters). Or am I missing 
something?

Thanks
Yang

> liveSize() calculation is wrong in case of overwrite
> ----------------------------------------------------
>
>                 Key: CASSANDRA-3073
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3073
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Yang Yang
>            Priority: Minor
>         Attachments: 
> 0001-liveSize-is-different-from-throughput-particularly-w.patch
>
>
> Currently liveSize() is the sum of currentThroughput.
> This definition is wrong if most of the operations are overwrites, or 
> counter updates (which are essentially overwrites).
> For example, the following should always keep a single entry in the db, 
> with one row, one cf, and one column, and so should have a size of only 
> about 100 bytes.
> connect localhost/9160;  
> create keyspace blah;
> use blah;
> create column family cf2 with memtable_throughput=1024 and 
> memtable_operations=10000  ;
> Set memtable_total_space_in_mb: 20 in cassandra.yaml
> to make the error appear faster (the same issue still appears with the 
> default value).
> Then we use a simple pycassa script:
> >>> pool = pycassa.connect('blah')
> >>> mycf = pycassa.ColumnFamily(pool,"cf2");
> >>> for x in range(1,10000000) :
> ...     xx = mycf.insert('key1',{'col1':"{}".format(x)})
> ... 
> You will see SSTables being generated that are only a few KB in size, even 
> though we set the CF options to produce large SSTables.
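
To illustrate the accounting problem in the quoted description, here is a 
hedged, standalone sketch. This is not Cassandra's Memtable code; the class 
and field names are invented for illustration only. It contrasts summing 
per-operation throughput with overwrite-aware accounting that only adds the 
size delta over the value being replaced:

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class LiveSizeSketch
{
    // One row / one column family, keyed by column name, for illustration.
    private final Map<String, byte[]> columns = new HashMap<>();
    private long throughputSum = 0; // sum of per-operation throughput (what the report says liveSize() returns)
    private long deltaSize = 0;     // size of the data actually retained after overwrites

    public void insert(String column, String value)
    {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        byte[] replaced = columns.put(column, bytes);

        throughputSum += bytes.length;                                        // grows on every write
        deltaSize += bytes.length - (replaced == null ? 0 : replaced.length); // grows only by the net change
    }

    public static void main(String[] args)
    {
        LiveSizeSketch memtable = new LiveSizeSketch();
        // Mirror the pycassa loop above: keep overwriting the same column.
        for (int x = 1; x < 10_000; x++)
            memtable.insert("col1", Integer.toString(x));

        System.out.println("throughput-based size: " + memtable.throughputSum); // tens of kilobytes
        System.out.println("overwrite-aware size:  " + memtable.deltaSize);     // a few bytes
    }
}

Under the overwrite loop, the throughput-based number keeps growing and 
triggers flushes, while the delta-based number stays at a few bytes, which 
matches the tiny SSTables observed.
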
