I don't think it's a good idea to leave the tail to replication.
This could create the perception of data loss, and it is more pronounced
with a larger WQ and a bigger disparity between WQ and AQ.
If we determine the LLAC based on having 'a copy' that was never
acknowledged to the client, and that bookie goes down (or crashes and
burns) before the replication worker gets a chance to run, it gives the
illusion of data loss. Moreover, we have no way to distinguish real data
loss from this scenario, where we never acknowledged the client.
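
To make the WQ/AQ gap concrete, here is a minimal sketch (mine, not from
the thread; the ZooKeeper address and payloads are placeholders) of an
e3w3a2 ledger, where an entry can sit on a single bookie without ever
having been acknowledged to the client:

    import org.apache.bookkeeper.client.BookKeeper;
    import org.apache.bookkeeper.client.LedgerHandle;

    public class AckGapExample {
        public static void main(String[] args) throws Exception {
            // Assumes a ZooKeeper ensemble at localhost:2181.
            BookKeeper bk = new BookKeeper("localhost:2181");
            // ensemble=3, write quorum=3, ack quorum=2: each entry is sent
            // to 3 bookies, but the add completes once any 2 acknowledge.
            LedgerHandle lh = bk.createLedger(3, 3, 2,
                    BookKeeper.DigestType.CRC32, "passwd".getBytes());
            long entryId = lh.addEntry("payload".getBytes());
            // Entries above the LAC can include ones that reached only a
            // single bookie and were never acknowledged to the client. If
            // recovery extends the ledger to such an entry and that lone
            // bookie then dies before re-replication, the entry is gone,
            // and after the fact we cannot tell that apart from the loss
            // of acknowledged data.
            System.out.println("entry " + entryId
                    + ", LAC " + lh.getLastAddConfirmed());
            lh.close();
            bk.close();
        }
    }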


On Mon, Aug 6, 2018 at 12:32 AM, Sijie Guo <guosi...@gmail.com> wrote:

> On Mon, Aug 6, 2018 at 12:08 AM Ivan Kelly <iv...@apache.org> wrote:
>
> > >> Recovery operates on a few seconds of data (from the last LAC written
> > >> to the end of the ledger, call this LLAC).
> > >
> > > the data during this duration can be very large if the traffic of the
> > > ledger is large. That has been observed in Twitter's production. So
> > > when we are talking about "a few seconds of data", we can't assume the
> > > amount of data is small. That said, the recovery can take more time
> > > than
> >
> > Yes, it can be large, but still it is only a few seconds' worth of
> > data. It is the amount of data that can be transmitted in the period
> > of one roundtrip, as the next roundtrip will update the LAC.
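
To put rough numbers on that (my own illustration, not from the thread):
the LAC is advertised on subsequent adds, so the unconfirmed tail is
roughly write throughput times LAC lag. At 100 MB/s with a 20 ms
roundtrip that is about 2 MB; if the writer sustains that rate for a few
seconds without the LAC advancing, it is hundreds of MB. Small relative
to the whole ledger, but not small in absolute terms.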
>
>
> > I didn't mean to imply the data was small in absolute terms. I was
> > implying that it was small in comparison to the overall size of the
> > ledger.
>
>
> > > what we can expect, so if we don't handle failures during recovery,
> > > how are we able to ensure we have enough copies of the data during
> > > recovery?
> >
> > Consider an e3w3a2 ledger; there are two cases where you can lose a
> > bookie during recovery.
> >
> > Case one: one bookie is lost. You can still recover, as ack=2 is
> > still achievable.
> > Case two: two bookies are lost. You can't recover, but the ledger is
> > unavailable anyhow, since any entry in the ledger may only have been
> > replicated to 2 bookies.
> >
> > However, with e3w3a3 I guess you wouldn't be able to recover at all,
> > and we have to handle that case.
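
To make the case analysis concrete, a tiny hypothetical helper (mine,
not a BookKeeper API) for whether recovery writes can still reach the
ack quorum after some bookies in the ensemble have failed:

    // Hypothetical: can recovery writes still reach the ack quorum
    // after `failed` bookies in the ensemble are lost?
    static boolean recoverable(int writeQuorum, int ackQuorum, int failed) {
        return writeQuorum - failed >= ackQuorum;
    }
    // e3w3a2: recoverable(3, 2, 1) == true,  recoverable(3, 2, 2) == false
    // e3w3a3: recoverable(3, 3, 1) == false, so any single failure blocks
    // recovery unless the ensemble can be changed first.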
> >
> > > I am not sure "make ledger metadata immutable" == "getting rid of
> > > merging ledger metadata", because I don't think these are the same
> > > thing. Making ledger metadata immutable will make the code much
> > > clearer and simpler, because the ledger metadata is immutable.
> > > Getting rid of merging ledger metadata is a different thing; making
> > > ledger metadata immutable will help make merging ledger metadata on
> > > conflicts clearer.
> >
> > I wouldn't call it merging in this case.
>
>
> That's fine.
>
>
> > Merging implies taking two valid pieces of metadata and deriving
> > another usable, valid metadata from them.
> > What happens with immutable metadata is that you take one valid
> > metadata and apply operations to it. So in the failure-during-recovery
> > case, we would have a list of AddEnsemble operations which we apply
> > when we try to close.
> >
> > In theory this is perfectly valid and clean. It can just look messy in
> > the code, due to how the PendingAddOp reaches back into the ledger
> > handle to get the current ensemble.
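
For what it's worth, an illustrative sketch (not the actual BookKeeper
classes) of the "one valid metadata plus a list of operations" idea
described above, where each operation yields a new immutable value
instead of mutating or merging:

    import java.util.ArrayList;
    import java.util.List;

    final class ImmutableMetadataSketch {
        private final List<List<String>> ensembles; // bookies per ensemble

        ImmutableMetadataSketch(List<List<String>> ensembles) {
            this.ensembles = List.copyOf(ensembles);
        }

        // An AddEnsemble "operation": returns a new metadata value; the
        // original is untouched, so conflicting copies never need merging.
        ImmutableMetadataSketch addEnsemble(List<String> ensemble) {
            List<List<String>> next = new ArrayList<>(ensembles);
            next.add(List.copyOf(ensemble));
            return new ImmutableMetadataSketch(next);
        }
    }

During recovery, bookie replacements would accumulate as such operations
and be applied to the last known-good metadata when attempting the close.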
> >
>
> That's okay, since it is a reality we have to face anyway. But the most
> important thing is that we can't get rid of ensemble changes during
> ledger recovery.
>
>
> >
> > So, in conclusion, I will keep the handling.
>
>
> Thank you.
>
>
> > In any case, these
> > changes are all still blocked on
> > https://github.com/apache/bookkeeper/pull/1577.
> >
> > -Ivan
> >
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi
