My ack quorum is 1. Please let me explain my confusion:

1. When one bookie is down, as you said, why can some ledgers be replicated successfully while others cannot?
2. In the code below from PendingReadLacOp, I don't see any code related to ack quorum when reading the LAC.

    public void initiate() {
        // Send a readLac request to every bookie in the current ensemble.
        for (int i = 0; i < currentEnsemble.size(); i++) {
            bookieClient.readLac(currentEnsemble.get(i), lh.ledgerId, this, i);
        }
    }

------------------ Original Message ------------------
From: "dev" <jvanligh...@splunk.com.INVALID>;
Sent: September 14, 2021 (Tuesday) 8:49 PM
To: "dev"<dev@bookkeeper.apache.org>;
Subject: Re: AutoRecovery failed replicate ledger , because, it would read lac from failed bookie

An LAC read will fail in this way if Ack Quorum or more bookies respond with
any other than OK, NoSuchEntry, NoSuchLedger. What is your ack quorum? If it
is just 1 (not a good setting), then a single bookie being down will make
the LAC read fail this way. If your ack quorum is 2, then 2 bookies being
down will cause it etc.

Jack

On Tue, Sep 14, 2021 at 1:17 PM zhangao <gaozhangmin...@qq.com.invalid> wrote:

> [ External sender. Exercise caution. ]
>
> As title, When bookie is lost, the ledger which state is open cannot
> replicated because of reading lac from failed bookie.
> it would failed read lac from failed bookie, because it cannot be
> connected.
>
> How bookkeeper auto recovery deal with open ledger in failed bookie ?
>
> I don't know if it's a bug or not.
>
> The error log:
>
> 12:29:57.072 [main-EventThread] INFO
> org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve
> x.x.x.x:3181, bookie is unknown
> org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException:
> Bookie handle is not available
>
> 12:29:57.072 [main-EventThread] ERROR
> org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
> x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
> org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
> Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not
> running
>
> 12:29:57.078 [BookKeeperClientWorker-OrderedExecutor-29-0] INFO
> org.apache.bookkeeper.client.PendingReadLacOp - While readLac ledger: 96789
> did not hear success responses from all of ensemble
>
> 12:29:57.078 [ReplicationWorker] INFO
> org.apache.bookkeeper.replication.ReplicationWorker - BKReadException while
> rereplicating ledger 96789. Enough Bookies might not have available So, no
> harm to continue
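
As a rough illustration of the rule Jack describes in the quoted reply (an LAC read fails once ack-quorum-many bookies answer with something other than OK, NoSuchEntry, or NoSuchLedger), here is a minimal Java sketch. It is not BookKeeper's code: the class LacQuorumSketch, the Response enum, and lacReadFails are hypothetical names made up for this example, and it only counts responses rather than modelling how PendingReadLacOp actually tracks them.

```java
import java.util.List;

// Hypothetical, self-contained sketch of the quorum rule stated in the thread.
// None of these names come from BookKeeper itself.
class LacQuorumSketch {

    // Simplified stand-ins for per-bookie response codes (assumption).
    enum Response { OK, NO_SUCH_ENTRY, NO_SUCH_LEDGER, BOOKIE_UNAVAILABLE }

    // Responses that, per the thread, do not count against the LAC read.
    static boolean isBenign(Response r) {
        return r == Response.OK
                || r == Response.NO_SUCH_ENTRY
                || r == Response.NO_SUCH_LEDGER;
    }

    // The read fails if ackQuorum or more bookies gave a non-benign response.
    static boolean lacReadFails(List<Response> responses, int ackQuorum) {
        long bad = responses.stream().filter(r -> !isBenign(r)).count();
        return bad >= ackQuorum;
    }

    public static void main(String[] args) {
        // Ensemble of 3 bookies with exactly one of them down.
        List<Response> oneDown = List.of(
                Response.OK, Response.OK, Response.BOOKIE_UNAVAILABLE);

        System.out.println(lacReadFails(oneDown, 1)); // prints "true"
        System.out.println(lacReadFails(oneDown, 2)); // prints "false"
    }
}
```

With a 3-bookie ensemble and one bookie down, the sketch reports a failure for ack quorum 1 but not for ack quorum 2, which matches the description in the reply.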