Not sure whether there has already been a similar discussion on this; sorry for 
re-raising it if so.

1. Data inconsistency between master and peer clusters:
  a). write a keyvalue KV1 at a specific coordinate (row, cf, col, ts) with 
value V1 to master cluster A; since there is an active scanner during the 
flush, KV1's memstoreTS is not set to 0 in the resultant hfile F1
  b). write KV1 once again with the same coordinate (row, cf, col, ts) but with 
a different value V2; with no active scanner during this flush, KV1's 
memstoreTS is set to 0 in the resultant hfile F2
  c). the two versions of KV1 are replicated to the peer cluster serially; with 
no active scanner during flushing there, they are flushed to two different 
hfiles, both with memstoreTS=0

  Now, a client reading KV1 from the master cluster will find the value is V1 
(since its memstoreTS is larger), while a client reading KV1 from the peer 
cluster will find the value is V2 (since the memstoreTS values are equal but 
the latter hfile's seqID is larger).
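
To make that tie-breaking concrete, here is a minimal, self-contained Java 
sketch of the read-time rule as described above. It is only a model, not 
actual HBase code: CellModel, its fields, and winner() are hypothetical names 
I made up for illustration.

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Minimal model (not actual HBase code) of the read-time tie-breaking
// described above for cells with identical (row, cf, col, ts).
final class CellModel {
    final String value;
    final long memstoreTS; // mvcc number carried into the hfile; 0 once zeroed
    final long fileSeqId;  // sequence id of the hfile the cell lives in

    CellModel(String value, long memstoreTS, long fileSeqId) {
        this.value = value;
        this.memstoreTS = memstoreTS;
        this.fileSeqId = fileSeqId;
    }

    // The winner is the cell with the larger memstoreTS; when the memstoreTS
    // values are equal, the cell from the hfile with the larger seqID wins.
    static CellModel winner(List<CellModel> candidates) {
        return candidates.stream()
                .max(Comparator.comparingLong((CellModel c) -> c.memstoreTS)
                        .thenComparingLong(c -> c.fileSeqId))
                .get();
    }

    public static void main(String[] args) {
        // Master cluster: V1 kept its memstoreTS (active scanner at flush time),
        // V2 was zeroed -> the read returns V1.
        CellModel master = winner(Arrays.asList(
                new CellModel("V1", 7, 1),    // F1, memstoreTS preserved
                new CellModel("V2", 0, 2)));  // F2, memstoreTS zeroed

        // Peer cluster: both replicated cells were flushed with memstoreTS=0,
        // so the hfile with the larger seqID decides -> the read returns V2.
        CellModel peer = winner(Arrays.asList(
                new CellModel("V1", 0, 1),
                new CellModel("V2", 0, 2)));

        System.out.println("master reads " + master.value + ", peer reads " + peer.value);
    }
}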

2. Data inconsistency across different time phases:
  a). write a keyvalue KV1 at a specific coordinate (row, cf, col, ts) with 
value V1 to master cluster A; since there is an active scanner during the 
flush, KV1's memstoreTS is not set to 0 in the resultant hfile F1
  b). write KV1 once again with the same coordinate (row, cf, col, ts) but with 
a different value V2; with no active scanner during this flush, KV1's 
memstoreTS is set to 0 in the resultant hfile F2

  Reading KV1 now returns V1 (since its memstoreTS is larger).

  c). after a while, a compaction that includes F1 (but not F2) occurs, and 
KV1's memstoreTS is set to 0 since there is no active scanner

  Reading KV1 now returns V2 (since the memstoreTS values are equal but the 
latter hfile's seqID is larger).
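
Continuing with the same hypothetical CellModel sketch from scenario 1, the 
before/after-compaction flip can be shown like this. Again, this only models 
the behaviour described in this post, not HBase internals, and it assumes the 
compacted output keeps F1's seqID.

import java.util.Arrays;

class CompactionFlipDemo {
    public static void main(String[] args) {
        // Before compaction: F1 still carries V1's original memstoreTS -> V1 wins.
        CellModel before = CellModel.winner(Arrays.asList(
                new CellModel("V1", 7, 1),    // F1, memstoreTS preserved
                new CellModel("V2", 0, 2)));  // F2, memstoreTS zeroed at flush

        // After a compaction that rewrites F1 (but not F2) with no active scanner:
        // V1's memstoreTS is reset to 0, so the seqID comparison decides -> V2 wins.
        CellModel after = CellModel.winner(Arrays.asList(
                new CellModel("V1", 0, 1),    // compacted output of F1, memstoreTS zeroed
                new CellModel("V2", 0, 2)));  // F2 unchanged

        System.out.println("before compaction: " + before.value
                + ", after compaction: " + after.value);
    }
}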

Keeping mvcc untouched during a keyvalue's whole lifecycle (through 
flush/compaction, as well as failover/HLog replay) would avoid both kinds of 
data inconsistency described above. Any opinions?
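
For what it's worth, replaying scenario 1 through the same CellModel sketch 
with memstoreTS left untouched gives the same answer in both places. This is 
only an illustration under two assumptions: memstoreTS is never zeroed at 
flush or compaction, and the peer assigns its own mvcc numbers to replicated 
edits in arrival order.

import java.util.Arrays;

class KeepMvccDemo {
    public static void main(String[] args) {
        // Master cluster: V2 was written after V1, so its (untouched) memstoreTS
        // is larger -> the read returns V2.
        CellModel master = CellModel.winner(Arrays.asList(
                new CellModel("V1", 7, 1),
                new CellModel("V2", 9, 2)));

        // Peer cluster: the replicated edits get the peer's own mvcc numbers,
        // but V2 is still applied after V1 -> the read also returns V2.
        CellModel peer = CellModel.winner(Arrays.asList(
                new CellModel("V1", 3, 1),
                new CellModel("V2", 5, 2)));

        System.out.println("master reads " + master.value + ", peer reads " + peer.value);
    }
}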
