>>> I already pointed out. the admin should be aware of potential data loss. so
>>> no confidence.

From an HDFS shared storage perspective, data loss is not acceptable.

>> the postponing is already there, since the ledger couldn't be opened and
>> fenced.

Yeah Sijie, you are right, it will postpone to the next cycle. AFAIK the AutoRecovery feature will keep on trying to open it again and again, so this cycle will never end. It is a kind of hanging too.

-Rakesh

-----Original Message-----
From: Sijie Guo [mailto:[email protected]]
Sent: 06 March 2014 13:50
To: [email protected]
Subject: Re: Problem in rereplication algorithm

On Wed, Mar 5, 2014 at 9:16 PM, Rakesh R <[email protected]> wrote:

> If the failure is more than the tolerated failures, it would not be
> safe to go ahead with any cleanup.
> For ex, quorum size is 2 and say failed 2 bookies out of 3, according
> to me for this ledger allowed failure is only 1.
>
> Also, please someone tell me, how the admin will get the confidence to
> safely do any cleanups.

I already pointed out. the admin should be aware of potential data loss. so no confidence.

> IMHO postponing the recovery would be safe.
>

the postponing is already there, since the ledger couldn't be opened and fenced.

>
> -Rakesh
>
> -----Original Message-----
> From: Uma Maheswara Rao G [mailto:[email protected]]
> Sent: 06 March 2014 10:05
> To: [email protected]
> Subject: Re: Problem in rereplication algorithm
>
> >As Sijie pointed out, we lost quorum, so the ledger is not good any longer.
> Because we might not be able to detect such cases automatically, I was
> wondering if we need to manually delete it.
>
> Yes. As Sijie and Flavio pointed out, how about providing a tool to
> clean such ledgers.
> At the same time I agree, we have to think some automatic way to
> detect it as we claim the feature as Auto.
>

at any time, if the quorum requirement is broken, we shouldn't do any auto things. leave it to human.

>
> or shall we delay such quorum failure ledgers replication cycle
> incrementally by somehow tracking time in underreplication ledger
> nodes?
> [ I am not very sure on this, we have to think more]
>
> Regards,
> Uma
>
> On Thu, Mar 6, 2014 at 7:35 AM, Flavio Junqueira <[email protected]> wrote:
>
> > I'm not sure what the desirable outcome is here. When you say that
> > the underreplicated L1 node hangs around forever, does it mean that
> > we keep trying to create new replicas?
>

The hang means that the ledger couldn't be opened and fenced.

> >
> > As Sijie pointed out, we lost quorum, so the ledger is not good any longer.
> > Because we might not be able to detect such cases automatically, I
> > was wondering if we need to manually delete it.
> >
> > -Flavio
> >
> > -----Original Message-----
> > From: Ivan Kelly [mailto:[email protected]]
> > Sent: Wednesday, March 5, 2014 5:15 AM
> > To: [email protected]
> > Subject: Problem in rereplication algorithm
> >
> > Hi folks,
> >
> > We've come across a problem in autorecovery, which I've been banging
> > my head against for the last day so I decided to open it up to
> > everyone to see if a solution is any clearer.
> >
> > The problem was observed in production, and while it doesn't cause
> > data loss, it does appear to the admin as if entries have been lost.
> >
> > = Problem scenario =
> >
> > You have a ledger L1. There is one segment in the ledger with quorum
> > 2, ensemble 3 starting at entry 0. This segment is on the bookies B1,
> > B2 & B3. So metadata looks like
> >
> > 0: B1, B2, B3
> >
> > No data has been written to the ledger.
> >
> > B3 crashes. The auditor notes that L1 contains a segment with B3, so
> > schedules the ledger to be checked. A recovery worker opens the
> > ledger without fencing. The recovery worker sees that the segment is
> > still open and that the lastAddConfirmed is less than the segment
> > start id, so it reads forward. Ultimately it gets a lastAddConfirmed
> > which is less than the segment start id, as all bookies in the
> > quorum [B1,B2] respond with NoSuchEntry for entry 0.
> > So the recovery
> > worker sees that there are no underreplicated fragments, so there's
> > nothing to recover. So far, so good.
> >
> > But now consider if B2 crashes. L1 will be scheduled to be checked
> > again. A recovery worker will try to open with fencing. It won't be
> > able to reach all quorums; [B2, B3] is now unavailable. Open will
> > fail.
> >
> > As a result, the underreplicated node for L1 hangs around forever.
> >
> > I have some ideas for a fix, but none is straightforward, so I'd
> > like to hear other opinions first.
> >
> > -Ivan
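
For reference, the failure in Ivan's scenario can be reproduced with a small model. This is only a sketch in Python, not BookKeeper code. It assumes that write quorums are formed by striping round-robin over the ensemble (so ensemble B1, B2, B3 with quorum 2 gives [B1,B2], [B2,B3], [B3,B1]) and that an open with fencing needs at least one live bookie in every such quorum. The function names are hypothetical.

```python
def write_quorums(ensemble, quorum_size):
    """Assumed layout: one quorum per starting position, striped round-robin."""
    n = len(ensemble)
    return [
        [ensemble[(start + k) % n] for k in range(quorum_size)]
        for start in range(n)
    ]


def unreachable_quorums(ensemble, quorum_size, crashed):
    """Quorums in which every bookie has crashed, so nobody can answer for them."""
    return [
        quorum
        for quorum in write_quorums(ensemble, quorum_size)
        if all(bookie in crashed for bookie in quorum)
    ]


def can_open_with_fencing(ensemble, quorum_size, crashed):
    """Simplified rule: fencing succeeds only if no quorum is fully unavailable."""
    return not unreachable_quorums(ensemble, quorum_size, crashed)


ensemble = ["B1", "B2", "B3"]

# Only B3 down: every quorum still has a live bookie.
print(can_open_with_fencing(ensemble, 2, {"B3"}))        # True

# B2 and B3 down: quorum [B2, B3] is gone, so the open fails.
print(unreachable_quorums(ensemble, 2, {"B2", "B3"}))    # [['B2', 'B3']]
print(can_open_with_fencing(ensemble, 2, {"B2", "B3"}))  # False
```

Under these assumptions the second open can never succeed while B2 and B3 stay down, which is why the underreplicated node for L1 is retried on every cycle.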
