Aleksey Yeschenko created CASSANDRA-6945:
--------------------------------------------

             Summary: Calculate liveRatio on per-memtable basis, not per-CF
                 Key: CASSANDRA-6945
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6945
             Project: Cassandra
          Issue Type: Bug
            Reporter: Aleksey Yeschenko
            Assignee: Aleksey Yeschenko


Currently we recalculate the live ratio every time the number of write ops 
to the CF doubles, not the number of write ops to an individual memtable. The 
value itself is also CF-bound, not memtable-bound. This causes several issues:

1. Depending on what stage the current memtable is in, the calculated live 
ratio can vary *a lot*
2. That calculated live ratio can then stay in place for quite a while 
- the longer the C* process has been up, the longer it stays incorrect
3. Incorrect live ratio means inefficient MeteredFlusher - flushing less or 
more often than needed, picking bad candidates for flushing, etc.
4. Incorrect live ratio means incorrect size returned to the metrics consumers
5. Compaction strategies that rely on memtable size estimation are affected
6. All of the above is slightly amplified by the fact that all the memtables 
pending flush would also use that one incorrect value

Depending on the stage the current memtable is in at the moment of live ratio 
recalculation, the calculated value can be *extremely* wrong (say, a 
recently created, fresh memtable would have a much higher than average live 
ratio).

The suggested fix is to bind live ratio to individual memtables, not column 
families as a whole, with some optimizations to make recalculations run less 
often by inheriting the previous memtable's stats.
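
To illustrate the idea only, here is a minimal sketch of a memtable that owns 
its live ratio, recalculates it when its own write op count doubles, and starts 
from its predecessor's stats. Everything in it is an assumption made for 
illustration: MemtableSketch, DEFAULT_LIVE_RATIO, the "half of the previous op 
count" starting threshold and the measure callback are hypothetical names and 
choices, not existing Cassandra code and not the actual patch.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical sketch; none of these names come from the Cassandra code base. */
final class MemtableSketch {
    // Assumed starting estimate used when there is no previous memtable.
    static final double DEFAULT_LIVE_RATIO = 10.0;

    private double liveRatio;   // heap bytes per serialized byte, owned by this memtable
    private long writeOps = 0;  // writes applied to this memtable only
    private long nextRecalcAt;  // write op count that triggers the next measurement

    MemtableSketch(MemtableSketch previous) {
        if (previous == null) {
            liveRatio = DEFAULT_LIVE_RATIO;
            nextRecalcAt = 1;
        } else {
            // Inherit the previous memtable's stats so the early, least
            // representative measurements can be skipped (assumed heuristic).
            liveRatio = previous.liveRatio;
            nextRecalcAt = Math.max(1, previous.writeOps / 2);
        }
    }

    /** Records one write; measure is only invoked when a recalculation is due. */
    void recordWrite(DoubleSupplier measure) {
        writeOps++;
        if (writeOps >= nextRecalcAt) {
            liveRatio = measure.getAsDouble();
            nextRecalcAt = writeOps * 2; // next recalculation at the next doubling
        }
    }

    /** Estimated heap size for the given serialized size, using this memtable's ratio. */
    long estimatedLiveSize(long serializedBytes) {
        return (long) (serializedBytes * liveRatio);
    }

    double liveRatio() { return liveRatio; }

    long writeOps() { return writeOps; }
}
```

With this shape, a memtable pending flush keeps the ratio that was measured 
against its own contents, and a fresh memtable uses the inherited value until 
it has taken enough writes of its own to be measured.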



--
This message was sent by Atlassian JIRA
(v6.2#6252)
