In your case, you have already lost a quorum, so any action here may cause data loss. If you really want to address it, provide a tool that lets the admin force-close the ledger, knowing that data may be lost.
- Sijie

On Wed, Mar 5, 2014 at 10:01 AM, Ivan Kelly <[email protected]> wrote:

> It failed during the open, but in the readLastAddConfirmed step, not in
> recovery. Recovery never ran because the ledger was being opened without
> fencing.
>
> -Ivan
>
> On Wed, Mar 05, 2014 at 02:50:26PM +0000, Rakesh R wrote:
> > Hi Ivan,
> >
> > I suspect the following happened in your environment.
> >
> > During fencing, the ReplicationWorker (RW) hits the exception
> > "org.apache.bookkeeper.client.BKException$BKLedgerRecoveryException"
> > because the ledger did not get success responses from all quorums. The
> > RW then retries fencing again and again, and the cycle never ends,
> > doesn't it?
> >
> > If that is the case, I think graceful fencing will be difficult, and we
> > may need to find another way of handling this case.
> >
> > -Rakesh
> >
> > -----Original Message-----
> > From: Ivan Kelly [mailto:[email protected]]
> > Sent: 05 March 2014 18:45
> > To: [email protected]
> > Subject: Problem in rereplication algorithm
> >
> > Hi folks,
> >
> > We've come across a problem in autorecovery that I've been banging my
> > head against for the last day, so I decided to open it up to everyone to
> > see if a solution is any clearer.
> >
> > The problem was observed in production. While it doesn't cause data
> > loss, it does appear to the admin as if entries have been lost.
> >
> > = Problem scenario =
> >
> > You have a ledger L1 with one segment: quorum 2, ensemble 3, starting at
> > entry 0. The segment is on bookies B1, B2 and B3, so the metadata looks
> > like
> >
> >     0: B1, B2, B3
> >
> > No data has been written to the ledger.
> >
> > B3 crashes. The auditor notes that L1 contains a segment with B3, so it
> > schedules the ledger to be checked. A recovery worker opens the ledger
> > without fencing. It sees that the segment is still open and that
> > lastAddConfirmed is less than the segment start id, so it reads forward.
> > Ultimately it gets a lastAddConfirmed that is still less than the
> > segment start id, because all bookies in the quorum [B1, B2] respond
> > with NoSuchEntry for entry 0. The recovery worker therefore finds no
> > underreplicated fragments, so there is nothing to recover. So far, so
> > good.
> >
> > But now suppose B2 crashes. L1 is scheduled to be checked again. A
> > recovery worker tries to open it with fencing. It can't reach all
> > quorums, since [B2, B3] is now unavailable, so the open fails.
> >
> > As a result, the underreplicated node for L1 hangs around forever.
> >
> > I have some ideas for a fix, but none is straightforward, so I'd like to
> > hear other opinions first.
> >
> > -Ivan
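The scenario in Ivan's original message can be made concrete with a small model. This is a minimal Java sketch under stated assumptions, not BookKeeper's implementation. It assumes round-robin striping, where entry i is stored on ensemble positions i through i+quorum-1 modulo the ensemble size. It also assumes a simplified coverage rule: a lastAddConfirmed read is trusted only if every write quorum has at least one bookie that answers. It uses -1 to mean "no entry confirmed". The class and method names (QuorumCoverageSketch, writeQuorums, covered, hasEntriesToCheck) are hypothetical and are not part of the BookKeeper API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class QuorumCoverageSketch {

    // Hypothetical helper. Assumes round-robin striping: entry i goes to ensemble
    // positions i, i+1, ..., i+writeQuorum-1 (mod ensemble size), so the distinct
    // write quorums are the windows starting at each ensemble position.
    static List<List<String>> writeQuorums(List<String> ensemble, int writeQuorum) {
        List<List<String>> quorums = new ArrayList<>();
        int n = ensemble.size();
        for (int start = 0; start < n; start++) {
            List<String> q = new ArrayList<>();
            for (int k = 0; k < writeQuorum; k++) {
                q.add(ensemble.get((start + k) % n));
            }
            quorums.add(q);
        }
        return quorums;
    }

    // Hypothetical, simplified coverage rule: the read is trusted only if every
    // write quorum contains at least one bookie that is still responding.
    static boolean covered(List<List<String>> quorums, Set<String> live) {
        for (List<String> q : quorums) {
            boolean anyLive = false;
            for (String bookie : q) {
                if (live.contains(bookie)) {
                    anyLive = true;
                    break;
                }
            }
            if (!anyLive) {
                return false; // this write quorum is entirely unreachable
            }
        }
        return true;
    }

    // A segment has entries to check only if lastAddConfirmed reaches its start id.
    // In this sketch, -1 is assumed to mean "nothing confirmed".
    static boolean hasEntriesToCheck(long segmentStart, long lastAddConfirmed) {
        return lastAddConfirmed >= segmentStart;
    }

    public static void main(String[] args) {
        List<List<String>> quorums = writeQuorums(List.of("B1", "B2", "B3"), 2);
        System.out.println("write quorums: " + quorums);
        // prints "write quorums: [[B1, B2], [B2, B3], [B3, B1]]"

        // B3 has crashed: every write quorum still has a live member.
        System.out.println("B3 down, covered: " + covered(quorums, Set.of("B1", "B2")));
        // prints "B3 down, covered: true"

        // Nothing was written, so lastAddConfirmed (-1) is below the segment start (0).
        System.out.println("entries to check: " + hasEntriesToCheck(0, -1));
        // prints "entries to check: false"

        // B2 crashes too: quorum [B2, B3] has no live member, so the read cannot complete.
        System.out.println("B2 and B3 down, covered: " + covered(quorums, Set.of("B1")));
        // prints "B2 and B3 down, covered: false"
    }
}
```

Under this model, the empty segment yields nothing to re-replicate while only B3 is down. Once B2 is also down, [B2, B3] has no live member, so the lastAddConfirmed read cannot be trusted, the open fails, and L1 stays marked as underreplicated.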
