Hi,

You can use the "recover" command instead:

1. Switch the bookie to read-only (via the REST API).
2. Run bin/bookkeeper shell recover ..

The recover command also has a flag to delete the cookie in ZK.
As an additional benefit, because the bookie stays up and readable,
this way you can decommission a bookie that holds ledgers created
with write quorum = 1.
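
A minimal sketch of those two steps in Python, in case it helps. Several details here are my assumptions and should be checked against your BookKeeper version: that the bookie's HTTP server is enabled on port 8000, that the read-only endpoint is PUT /api/v1/bookie/state/readonly with a {"readOnly": true} body, and that the cookie flag is spelled -deleteCookie. The function names are made up for illustration.

```python
import json
import subprocess
import urllib.request


def readonly_request(http_host, http_port=8000):
    """Build the PUT request asking a bookie to switch to read-only.

    Assumption: the endpoint path and JSON body match your BookKeeper
    version's admin REST API; the port is whatever httpServerPort is set to.
    """
    url = f"http://{http_host}:{http_port}/api/v1/bookie/state/readonly"
    body = json.dumps({"readOnly": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def recover_command(bookie_id, delete_cookie=False):
    """Build the argv for `bin/bookkeeper shell recover`.

    Assumption: -deleteCookie is the flag that removes the cookie in ZK.
    """
    cmd = ["bin/bookkeeper", "shell", "recover"]
    if delete_cookie:
        cmd.append("-deleteCookie")
    cmd.append(bookie_id)
    return cmd


def decommission_via_recover(http_host, bookie_id, delete_cookie=False):
    """Switch the bookie to read-only, then run recover against it."""
    # Step 1: read-only, so the bookie keeps serving reads during recovery.
    with urllib.request.urlopen(readonly_request(http_host), timeout=10) as resp:
        resp.read()
    # Step 2: re-replicate its ledgers; raises if the command fails.
    subprocess.run(recover_command(bookie_id, delete_cookie), check=True)
```

Nothing runs until decommission_via_recover() is called, so the two builder functions can be inspected first to confirm the URL and command line look right for your cluster.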

HTH.

On Sun, Mar 26, 2023 at 9:27 PM Hang Chen <chenh...@apache.org> wrote:

> Hi guys, I found that BookKeeper decommissioning may be blocked by
> ledgers that cannot be replicated.
>
> The current bookie decommission process:
>   - Step 1: Use the command `bin/bookkeeper shell listunderreplicated`
> to check whether there are some ledgers in the under-replicated state
>   - Step 2: After all the ledgers are fully replicated, stop the
> bookie and use the command `bin/bookkeeper shell decommissionbookie
> -bookieid <bookieaddress>` to trigger decommission
>   - Step 3: Wait for all the ledgers to be replicated and the bookie
> decommission process will complete
>
> However, there is a bug in the decommissioning process.
>
> In Step 1, ledgers are marked as under-replicated by the following
> checks:
>   - Auditor lost-bookie check: triggered in two cases: a) a bookie is
> lost, after `lostBookieRecoveryDelay`; b) periodically, every
> `auditorPeriodicBookieCheckInterval` (default: 24 hours).
>   - Auditor all-ledgers check: triggered every
> `auditorPeriodicCheckInterval` (default: 7 days). It checks every
> ledger's fragments with the following steps:
>     - For every fragment, calculate the entries to read according to
> `auditorLedgerVerificationPercentage`. The default is `0`, which means
> only the first and last entries of the fragment are checked.
>     - Read those entries from all the bookies in the fragment's
> ensemble. If any read fails, mark the ledger as under-replicated.
>
>
> When we use the `bin/bookkeeper shell listunderreplicated` command to
> check whether any ledgers are under-replicated, the result only covers
> replicas found missing by the last check. The last lost-bookie check
> may have run up to 24 hours ago, and the last all-ledgers check up to
> seven days ago. Replicas lost between the last check and now are not
> marked. Suppose we set EnsembleSize=3, WriteQuorumSize=2, and
> AckQuorumSize=1, and decommission one bookie with the current
> decommission process. In that case, some ledgers may not be
> replicable because their only available replica is on the
> decommissioned bookie.
>
> Moreover, the Auditor's all-ledgers check only checks the first and
> last entries of each fragment. If the bookie has journal writes
> disabled and some entries in a fragment are lost, but the first and
> last entries still exist, the checker won't find the loss.
>
> ### Options
> There are two options to tune the decommissioning process.
>
> 1. Trigger a check of all ledgers before Step 1. It has the following
> disadvantages.
>    - It will cost a lot of resources
>    - By default, it only checks the first and last entries of each
> fragment, so the check can't cover all the entries
>
>  2. Turn the bookie into read-only mode instead of shutting it down
> before using the `bin/bookkeeper shell decommissionbookie -bookieid
> <bookieaddress>` command to trigger decommission. When replicating
> ledgers located on the decommissioned bookie, the ledgers can be
> replicated successfully as long as one replica is available, even if
> it is the one on the read-only bookie.
>
> I suggest choosing the second option to tune the current bookie
> decommission process. Do you have any suggestions?
>
> Thanks,
> Hang
>
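
To illustrate the point in the quoted message about the default check: with `auditorLedgerVerificationPercentage` at 0, only the first and last entries of a fragment are read, so a loss confined to the middle goes unnoticed. The toy model below is only a sketch of that sampling rule, not the Auditor's actual code, and the function names are made up for illustration.

```python
def sampled_entries(first_entry, last_entry):
    """Entry ids probed for one fragment under the default setting (0%)."""
    return {first_entry, last_entry}


def check_detects_loss(first_entry, last_entry, lost_entries):
    """True if at least one lost entry is among the probed entries."""
    return bool(sampled_entries(first_entry, last_entry) & set(lost_entries))


# One bookie's copy of a fragment covering entries 0..99.
fragment = (0, 99)

# Entries 10..20 are gone, but both ends survive: the check passes.
print(check_detects_loss(*fragment, lost_entries=range(10, 21)))  # False

# The last entry is gone: the check notices.
print(check_detects_loss(*fragment, lost_entries=[99]))  # True
```

This is why an empty `listunderreplicated` result does not by itself prove the other replicas are complete, and why keeping the bookie readable during decommission is the safer route.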


-- 
Andrey Yegorov
