Throughput and RAM
Based on my tuning work with C* over the last few days, I think I have reached the following insights. Maybe someone can confirm whether they make sense:

- The more heap I give to Cassandra (up to the GC tipping point of ~8GB), the more writes it can accumulate in memtables before doing IO.
- The more writes are accumulated in memtables, the closer the IO gets to the maximum possible IO throughput, because there will be fewer writes of larger sstables.
- So in a sense, C* is designed to maximize IO write efficiency by pre-organizing write queries in memory. The more memory, the better this organization works (caveat: GC).
- Cassandra takes this eagerness for consuming writes and organizing them in memory to such an extreme that any given node will rather die than stop consuming writes.

In particular, I am looking for confirmation of the last point.

Jan
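The second point can be illustrated with a toy simulation (my own sketch of the general mechanism, not Cassandra code): the same write load produces fewer, larger flushes when the in-memory buffer is bigger, which is what makes the disk IO more sequential and efficient.

```python
# Toy model: buffer writes in an in-memory "memtable" and record a "flush"
# (one sstable written to disk) each time the memtable hits its size limit.
# run_writes and memtable_limit are hypothetical names for illustration.

def run_writes(num_writes, memtable_limit):
    """Return the sizes of the flushes produced by num_writes inserts."""
    memtable = {}
    flushes = []  # each entry is the size of one flushed "sstable"
    for i in range(num_writes):
        memtable[f"key{i}"] = f"value{i}"
        if len(memtable) >= memtable_limit:
            flushes.append(len(memtable))
            memtable = {}
    if memtable:                      # final partial flush, if any
        flushes.append(len(memtable))
    return flushes

small = run_writes(10_000, memtable_limit=100)    # little RAM: many small flushes
large = run_writes(10_000, memtable_limit=2_000)  # more RAM: few large flushes
print(len(small), len(large))
```

With the small limit the 10,000 writes turn into 100 small sstables; with the larger limit, only 5 large ones.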
Re: Throughput and RAM
On Tue, Sep 10, 2013 at 2:30 AM, Jan Algermissen jan.algermis...@nordsc.com wrote:

> So in a sense, C* is designed to maximize IO write efficiency by pre-organizing write queries in memory. The more memory, the better the organization works (caveat GC).

http://en.wikipedia.org/wiki/Log-structured_merge-tree

"The LSM-tree is a hybrid data structure. It is composed of two tree-like structures (http://en.wikipedia.org/wiki/Tree_(data_structure)), known as the C0 and C1 components. C0 is smaller and entirely resident in memory, whereas C1 is resident on disk. New records are inserted into the memory-resident C0 component. If the insertion causes the C0 component to exceed a certain size threshold, a contiguous segment of entries is removed from C0 and merged into C1 on disk. The performance characteristics of LSM-trees stem from the fact that each component is tuned to the characteristics of its underlying storage medium, and that data is efficiently migrated across media in rolling batches, using an algorithm reminiscent of merge sort (http://en.wikipedia.org/wiki/Merge_sort)."

> Cassandra takes this eagerness for consuming writes and organizing the writes in memory to such an extreme, that any given node will rather die than stop consuming writes.

Perhaps more simply: RAM is faster than disk, and Cassandra does not prevent a given node from writing to RAM faster than it can flush to disk?

=Rob
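The C0/C1 mechanics quoted above can be sketched in a few lines of Python (a toy model under my own assumptions, not Cassandra's actual implementation): writes land in an in-memory map, and when it exceeds a threshold its sorted entries are merged into the sorted on-disk component with a merge-sort-style sequential pass.

```python
# Toy LSM tree: c0 is the memory-resident component, c1 the "disk-resident"
# one (modelled here as a sorted list of (key, value) pairs). TinyLSM and
# C0_LIMIT are hypothetical names for illustration.

C0_LIMIT = 4  # threshold at which C0 is merged into C1

class TinyLSM:
    def __init__(self):
        self.c0 = {}   # memory-resident component
        self.c1 = []   # disk-resident component: sorted (key, value) list

    def put(self, key, value):
        self.c0[key] = value
        if len(self.c0) > C0_LIMIT:
            self._merge_into_c1()

    def _merge_into_c1(self):
        # Two-pointer merge of two sorted runs, as in merge sort;
        # a newer C0 entry supersedes a C1 entry with the same key.
        c0_run = sorted(self.c0.items())
        merged, i, j = [], 0, 0
        while i < len(c0_run) and j < len(self.c1):
            if c0_run[i][0] < self.c1[j][0]:
                merged.append(c0_run[i]); i += 1
            elif c0_run[i][0] > self.c1[j][0]:
                merged.append(self.c1[j]); j += 1
            else:
                merged.append(c0_run[i]); i += 1; j += 1
        merged.extend(c0_run[i:])
        merged.extend(self.c1[j:])
        self.c1 = merged
        self.c0 = {}

    def get(self, key):
        # Check memory first, then the sorted on-disk run.
        if key in self.c0:
            return self.c0[key]
        return next((v for k, v in self.c1 if k == key), None)

lsm = TinyLSM()
for n in range(10):
    lsm.put(f"k{n:02d}", n)
```

Note that `put` never blocks: C0 accepts writes at memory speed regardless of how large C1 has grown, which is the "writing to RAM faster than it can flush" behaviour in a nutshell.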
Re: Throughput and RAM
On 10.09.2013, at 19:37, Robert Coli rc...@eventbrite.com wrote:

> Cassandra does not prevent a given node from writing to RAM faster than it can flush to disk?

Yes, that is what I meant. What remains unclear to me is what the operational strategy is for handling an increase in writes, or peaks. It seems to be: wait until nodes die, then add capacity.

I guess what I am looking for is the switch so that *I* can tell C* not to write more to RAM than it is able to flush.

I have a hunch that coordinators pile up incoming requests and that the memory used by them causes the node to stop flushing completely. I tried to reduce RPC connections and/or reduce write timeouts, but both had no effect.

Can anybody provide a direction in which to look? This image (http://twitpic.com/dcwlmn) shows the typical situation for me, no matter what switches I work with. There is always this segment of an arc which shows the increasing unflushed memtables.

Jan
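For reference, the closest knobs to such a "switch" that I am aware of in a 2013-era (1.2.x) cassandra.yaml are the memtable space cap and the heap emergency valves. A hedged fragment (option names and defaults vary by version, so check the yaml shipped with your release):

```yaml
# Cap total memtable space; lower values force earlier, smaller flushes
# instead of letting unflushed memtables grow with the incoming write rate.
memtable_total_space_in_mb: 1024

# Emergency valves: when heap usage crosses these fractions, Cassandra
# flushes the largest memtables and shrinks caches to relieve pressure.
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
```

These bound how much unflushed data a node holds, but they do not throttle clients; nothing here makes the node refuse writes it cannot flush.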