Hello,

Could you please tell me whether I understand this problem correctly...
Example behavior 1:
* create a table
* perform 10 operations on one cell: insert it, override it (given that the number of versions is configured to 1), override it again, ..., override it
* after the memstore holding these edits is flushed, all 10 of them get written to the HFiles

Ideally, in this situation only one edit should be written (the resulting value of the cell). I.e. only the "current visible state" of the memstore should be flushed, as opposed to flushing every edit that was also recorded in the HLog. This would have a lot of benefits (e.g. less data to flush -> possibly less frequent flushes -> less frequent compactions, etc.), especially in particular use-cases (like counters, or updating some "aggregated values").

As I understand it (please correct me if I'm wrong), the problem is that this is not easy to do, mainly because:
1) it adds a resource-management burden (flushing a large memstore isn't cheap);
2) compacting the memstore may add a lot of unnecessary overhead (so in some cases there would be no actual benefit from it), and may make flushing much slower, which can cause a lot of issues;
3) the edits flushed from the memstore and the HLog edits should be kept in sync, because we want the flush process to be reliable: if it fails in the middle, we should be able to restore the state from the HLog. Keeping the memstore and the HLog in sync during compaction (and we would need partial compaction of some of the older data in the memstore) is difficult;
4) anything else?

Especially the 3rd point: am I getting it right?

Thanks,
Alex Baranau
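P.S. The following minimal Python sketch makes "Example behavior 1" concrete by contrasting the two flush strategies. It is a simplified model, not HBase code: `Edit`, `flush_all` and `flush_visible` are hypothetical names. A cell is reduced to (row, column), and each edit carries a timestamp and a value.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, simplified model of a memstore edit (not HBase's KeyValue):
# (row, column, timestamp, value). Deletes and equal timestamps are ignored.
Edit = Tuple[str, str, int, bytes]


def flush_all(memstore: List[Edit]) -> List[Edit]:
    """Behaviour described above: every edit reaches the flushed file,
    sorted by cell and then newest timestamp first."""
    return sorted(memstore, key=lambda e: (e[0], e[1], -e[2]))


def flush_visible(memstore: List[Edit], max_versions: int) -> List[Edit]:
    """Desired behaviour: keep only the newest max_versions edits per cell,
    i.e. the "current visible state" of the memstore."""
    by_cell: Dict[Tuple[str, str], List[Edit]] = defaultdict(list)
    for edit in memstore:
        by_cell[(edit[0], edit[1])].append(edit)
    out: List[Edit] = []
    for cell in sorted(by_cell):
        newest_first = sorted(by_cell[cell], key=lambda e: -e[2])
        out.extend(newest_first[:max_versions])
    return out


if __name__ == "__main__":
    # One insert followed by 9 overrides of the same cell (timestamps 1..10).
    memstore = [("row1", "cf:counter", ts, str(ts).encode()) for ts in range(1, 11)]
    print(len(flush_all(memstore)))                   # 10
    print(flush_visible(memstore, max_versions=1))    # [('row1', 'cf:counter', 10, b'10')]
```

The sketch deliberately leaves out deletes, identical timestamps and the HLog. It only shows what "flushing the current visible state" would mean for the contents of the flushed file.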