Hi,

You can use the "recover" command instead:
- Switch the bookie to read-only (via the REST API)
- bin/bookkeeper shell recover ..

The recover command also has a flag to delete the cookie in ZK.
As an additional benefit, this way you can decommission a bookie with
ledgers created with write quorum = 1.

HTH.

On Sun, Mar 26, 2023 at 9:27 PM Hang Chen <chenh...@apache.org> wrote:

> Hi guys, I found that BookKeeper decommission may be blocked by
> ledgers that cannot be replicated.
>
> Current bookie decommission process:
> - Step 1: Use the command `bin/bookkeeper shell listunderreplicated`
>   to check whether any ledgers are in the under-replicated state.
> - Step 2: After all the ledgers have been replicated, stop the
>   bookie and use the command `bin/bookkeeper shell decommissionbookie
>   -bookieid <bookieaddress>` to trigger decommission.
> - Step 3: Wait for all the ledgers to be replicated, after which the
>   bookie decommission process will complete.
>
> However, there is a bug in the decommissioning process.
>
> In Step 1, ledgers are marked as under-replicated by the following
> checks:
> - Auditor checks for lost bookies: triggered in two cases: a) one
>   bookie is lost, after `lostBookieRecoveryDelay`; b) periodically,
>   every `auditorPeriodicBookieCheckInterval`. The default is 24 hours.
> - Auditor checks all ledgers: triggered every
>   `auditorPeriodicCheckInterval`. The default is 7 days. It checks
>   every ledger's fragments with the following steps:
>   - For every fragment, calculate the entries to read according to
>     `auditorLedgerVerificationPercentage`. The default is `0`, which
>     means only the first and last entries of the fragment are checked.
>   - Read those entries from all the bookies in the ensemble list. If
>     any read fails, mark the ledger as under-replicated.
>
> When we use the `bin/bookkeeper shell listunderreplicated` command to
> check whether some ledgers are under-replicated, it only shows the
> ledgers that were missing replicas before the last check.
> The lost bookie check may have run up to 24 hours ago, and the
> all-ledgers check up to seven days ago. Ledgers that lost replicas
> between the last check and now are not marked. Suppose we set
> EnsembleSize=3, WriteQuorumSize=2, and AckQuorumSize=1, and
> decommission one bookie with the current decommission process. In that
> case, some ledgers may become impossible to replicate because their
> only available replica is on the decommissioned bookie.
>
> Moreover, the Auditor's all-ledgers check only reads the first and
> last entries of each fragment. If the bookie disabled journal writes
> and some entries in a fragment are lost, but the first and last
> entries still exist, the checker won't detect it.
>
> ### Options
> There are two options to tune the decommissioning process.
>
> 1. Trigger a check of all ledgers before Step 1. It has the following
>    disadvantages:
>    - It costs a lot of resources.
>    - By default it only checks the first and last entries of each
>      fragment, so it can't cover all the entries.
>
> 2. Turn the bookie into read-only mode instead of shutting it down
>    before using the `bin/bookkeeper shell decommissionbookie -bookieid
>    <bookieaddress>` command to trigger decommission. When replicating
>    ledgers located on the decommissioning bookie, the ledgers can be
>    replicated successfully if one replica is available.
>
> I suggest choosing the second option to tune the current bookie
> decommission process. Do you have any suggestions?
>
> Thanks,
> Hang

--
Andrey Yegorov
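
For reference, the sampling gap described in the quoted message (with `auditorLedgerVerificationPercentage` at `0`, only the first and last entries of a fragment are read) can be illustrated with a small toy model. This is only a sketch: the function names are hypothetical, nothing here is BookKeeper code, and the way a non-zero percentage picks extra entries is an assumption made for illustration.

```python
# Toy model only: these names are hypothetical and are not BookKeeper APIs.

def entries_to_check(first_entry, last_entry, percentage):
    """Return the entry ids a fragment check would read in this model.

    percentage == 0 is modelled as "first and last entries only", as the
    thread describes. For higher values we ASSUME evenly spaced extra
    entries; the real selection logic may differ.
    """
    ids = {first_entry, last_entry}
    total = last_entry - first_entry + 1
    extra = total * percentage // 100
    if extra > 0:
        step = max(1, total // extra)
        ids.update(range(first_entry, last_entry + 1, step))
    return sorted(ids)


def fragment_looks_healthy(stored_entries, first_entry, last_entry, percentage):
    """True if every sampled entry is present in the stored set."""
    sampled = entries_to_check(first_entry, last_entry, percentage)
    return all(entry in stored_entries for entry in sampled)


# A fragment covering entries 0..9 whose bookie lost entries 4 and 5.
stored = set(range(10)) - {4, 5}

# With percentage 0 only entries 0 and 9 are read, so the loss is missed.
print(fragment_looks_healthy(stored, 0, 9, 0))
# With percentage 100 every entry is read, so the loss is detected.
print(fragment_looks_healthy(stored, 0, 9, 100))
```

In this model the first call reports the fragment as healthy and the second does not, which is the case the quoted message warns about: entries lost in the middle of a fragment go unnoticed while the first and last entries are still readable.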