Problem in rereplication algorithm

Ivan Kelly Wed, 05 Mar 2014 05:17:31 -0800

Hi folks,

We've come across a problem in autorecovery which I've been banging
my head against for the last day, so I decided to open it up to
everyone to see if anyone can spot a clearer solution.


The problem was observed in production, and while it doesn't cause
data loss, it does appear to the admin as if entries have been lost.

= Problem scenario =

You have a ledger L1. It has one segment, starting at entry 0, with
quorum 2 and ensemble 3. This segment is on bookies B1, B2 & B3, so
the metadata looks like:

0: B1, B2, B3

No data has been written to the ledger.

B3 crashes. The auditor notes that L1 contains a segment with B3, so
it schedules the ledger to be checked. A recovery worker opens the
ledger without fencing. The recovery worker sees that the segment is
still open and that the lastAddConfirmed is less than the segment
start id, so it reads forward. Ultimately it gets a lastAddConfirmed
which is less than the segment start id, as all bookies in the quorum
[B1,B2] respond with NoSuchEntry for entry 0. The recovery worker
therefore sees that there are no underreplicated fragments, so there's
nothing to recover. So far, so good.

But now consider what happens if B2 also crashes. L1 will be scheduled
to be checked again. A recovery worker will try to open with fencing.
It won't be able to reach all quorums, as [B2, B3] is now unavailable,
so the open will fail.
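
To make the quorum argument concrete, here is a rough sketch in
Python. It assumes entries are striped round-robin across the
ensemble, so that entry e goes to the quorum-size bookies starting at
position e mod ensemble-size. The function names are made up for
illustration and are not part of the BookKeeper API.

```python
def write_quorum(entry_id, ensemble, quorum_size):
    """Bookies that would store entry_id, assuming round-robin striping."""
    n = len(ensemble)
    return [ensemble[(entry_id + i) % n] for i in range(quorum_size)]


def all_quorums(ensemble, quorum_size):
    """Every distinct quorum; the pattern repeats every len(ensemble) entries."""
    return [write_quorum(e, ensemble, quorum_size) for e in range(len(ensemble))]


def unreachable_quorums(ensemble, quorum_size, down):
    """Quorums in which every bookie is down, so nothing can be read from them."""
    return [q for q in all_quorums(ensemble, quorum_size)
            if all(bookie in down for bookie in q)]


ensemble = ["B1", "B2", "B3"]

# Only B3 down: every quorum still has at least one live bookie.
after_b3 = unreachable_quorums(ensemble, 2, down={"B3"})

# B2 and B3 down: the quorum [B2, B3] has no live bookie at all.
after_b2_b3 = unreachable_quorums(ensemble, 2, down={"B2", "B3"})

print(after_b3)      # []
print(after_b2_b3)   # [['B2', 'B3']]
```

With only B3 down, no quorum is fully unreachable, and entry 0's
quorum [B1, B2] can answer in full. With B2 and B3 both down, nothing
can be learned from [B2, B3], which is why the fencing open cannot
complete.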

As a result, the underreplicated node for L1 hangs around forever.

I have some ideas for a fix, but none is straightforward, so I'd like
to hear other opinions first.

-Ivan
