Hi Enrico, I'm glad to. Do you have any suggestions for it?

Best regards,
Yang Yang

On Mon, Aug 29, 2022 at 3:45 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> Yang,
>
> On Sat, Aug 27, 2022, 11:05 Yang Yang <fantaps...@gmail.com> wrote:
>
> > For the short term, you can check whether the `recover` command helps
> > you:
> > https://bookkeeper.apache.org/docs/reference/cli#bookkeeper-shell-recover
> >
> > For the long term, I have proposed a solution that puts the bookie to be
> > decommissioned into a `draining` state and lets the autorecovery
> > mechanism replicate its ledgers. Please take a look and see whether it
> > covers your use case:
> > https://lists.apache.org/thread/1l9kzb1l0vok105gj2ody3g8nyv7s9l8
>
> Would you be able to continue that proposal?
>
> Enrico
>
> > Best regards,
> > Yang Yang
> >
> > On Thu, Aug 25, 2022 at 6:32 PM steven lu <lushiji2...@gmail.com> wrote:
> >
> > > I think this feature is somewhat custom rather than generic, and it
> > > carries some risks:
> > > 1. If you intend to take node A offline (already taken out of service)
> > >    but mistakenly specify node B, this feature will take node B offline
> > >    directly, which is likely to cause production failures.
> > > 2. If the node being taken offline suddenly starts receiving traffic,
> > >    how should that be handled? This can easily lead to lost replicas in
> > >    the cluster.
> > >
> > > Could you explain how these two problems would be avoided?
> > >
> > > lordcheng10 <1572139...@qq.com.invalid> wrote on Thu, Aug 25, 2022 at 17:12:
> > >
> > > > Hi BookKeeper Community,
> > > >
> > > > This is a BP discussion on "Support non-stop bookie data migration
> > > > and bookie offline".
> > > > The issue can be found at:
> > > > https://github.com/apache/bookkeeper/issues/3456
> > > >
> > > > I have copied the content here for convenience; any suggestions are
> > > > welcome and appreciated.
> > > >
> > > > ### Motivation
> > > > The current steps to take a bookie offline are:
> > > > 1. Log on to the bookie node and check whether there are
> > > >    under-replicated ledgers. If there are, the decommission command
> > > >    will force them to be replicated:
> > > >    bin/bookkeeper shell listunderreplicated
> > > > 2. Stop the bookie:
> > > >    bin/bookkeeper-daemon.sh stop bookie
> > > > 3. Run the decommission command. If you are logged on to the node you
> > > >    wish to decommission, you don't need to provide -bookieid. If you
> > > >    are running the decommission command for the target bookie from
> > > >    another bookie node, pass the target bookie id with -bookieid:
> > > >    bin/bookkeeper shell decommissionbookie
> > > >    or
> > > >    bin/bookkeeper shell decommissionbookie -bookieid <target bookieid>
> > > > 4. Validate that there are no ledgers left on the decommissioned
> > > >    bookie:
> > > >    bin/bookkeeper shell listledgers -bookieid <target bookieid>
> > > >
> > > > With the current solution, we have to stop the bookie first, run the
> > > > decommission command, and wait for the migration of that bookie's
> > > > ledgers to complete.
> > > >
> > > > This makes taking a bookie node offline very time-consuming. When we
> > > > need to take many bookie nodes offline, the total time this solution
> > > > takes is not acceptable.
> > > >
> > > > Therefore, we need a solution that can migrate data without stopping
> > > > the bookie, so that bookie nodes can be taken offline in batches.
> > > >
> > > > ### Proposal
> > > > To solve this problem, we propose a way to replicate data without
> > > > stopping the bookie. The process is as follows:
> > > > 1. Submit the bookie nodes to be taken offline.
> > > > 2. Traverse each ledger on the offline bookies, and persist these
> > > >    ledgers, together with the corresponding offline bookie nodes, to
> > > >    the ZooKeeper path ledgers/offline_ledgers/ledgerId.
> > > > 3. Get a ledger to be migrated.
> > > > 4. Traverse all fragments of the ledger, and select the fragments
> > > >    that have a replica on an offline bookie.
> > > > 5. Copy the data of each selected fragment.
> > > > 6. Once the ledger's fragments have been copied, delete the
> > > >    corresponding ledgers/offline_ledgers/ledgerId node.
> > > > 7. When every ledgerId node under ledgers/offline_ledgers has been
> > > >    deleted, the data has been migrated, and the bookies can be
> > > >    stopped and taken offline in batches.
> > > >
> > > > To achieve this, we need to implement two things:
> > > > 1. A command that submits the bookies to be taken offline and their
> > > >    ledgers, for example:
> > > >    bin/bookkeeper shell decommissionbookie -offline_bookieids bookieId1,bookieId2,bookieId3,...bookieIdN
> > > >    This command writes every ledger on the offline bookie nodes to
> > > >    ZooKeeper, for example:
> > > >    put ledgers/offline_ledgers/ledgerId bookieId1,bookieId2,...bookieIdN
> > > > 2. A ReassignLedgerWorker class that performs the actual ledger
> > > >    replication: it takes a ledger from the ZooKeeper path
> > > >    ledgers/offline_ledgers, selects all fragments of that ledger that
> > > >    contain an offline bookieId, and then copies those fragments.
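To make the proposed flow concrete, here is a minimal single-process Java sketch of the two pieces the proposal asks for: the submit step and one ReassignLedgerWorker step. It is not BookKeeper code. `Fragment`, `OfflineMigrationSketch`, `submitOfflineBookies`, `reassignOne`, and `pickTarget` are hypothetical names. The in-memory maps stand in, by assumption, for the ZooKeeper path `ledgers/offline_ledgers/<ledgerId>` and for ledger metadata. The actual entry copy is left as a comment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of a ledger fragment: entries starting at firstEntry,
// stored on an ensemble of bookies (mutable so the sketch can reassign replicas).
final class Fragment {
    final long firstEntry;
    final List<String> ensemble;

    Fragment(long firstEntry, List<String> ensemble) {
        this.firstEntry = firstEntry;
        this.ensemble = new ArrayList<>(ensemble);
    }
}

final class OfflineMigrationSketch {
    // In-memory stand-in for ledgers/offline_ledgers/<ledgerId> in ZooKeeper:
    // ledgerId -> offline bookies that hold a replica of that ledger.
    final Map<Long, Set<String>> offlineLedgers = new TreeMap<>();
    // In-memory stand-in for ledger metadata: ledgerId -> fragments.
    final Map<Long, List<Fragment>> ledgers;
    // Every bookie submitted for decommissioning; never chosen as a copy target.
    final Set<String> allOffline = new HashSet<>();

    OfflineMigrationSketch(Map<Long, List<Fragment>> ledgers) {
        this.ledgers = ledgers;
    }

    // Submit step: record each ledger that has a fragment on an offline bookie.
    void submitOfflineBookies(Set<String> offlineBookies) {
        allOffline.addAll(offlineBookies);
        for (Map.Entry<Long, List<Fragment>> e : ledgers.entrySet()) {
            Set<String> hit = new HashSet<>();
            for (Fragment f : e.getValue()) {
                for (String bookie : f.ensemble) {
                    if (offlineBookies.contains(bookie)) {
                        hit.add(bookie);
                    }
                }
            }
            if (!hit.isEmpty()) {
                offlineLedgers.merge(e.getKey(), hit, (a, b) -> { a.addAll(b); return a; });
            }
        }
    }

    // One worker step: take a pending ledger, move every replica that sits on
    // an offline bookie to a live target, then delete the pending record.
    // Returns false once nothing is left to migrate.
    boolean reassignOne(List<String> liveBookies) {
        if (offlineLedgers.isEmpty()) {
            return false;
        }
        long ledgerId = offlineLedgers.keySet().iterator().next();
        Set<String> offline = offlineLedgers.get(ledgerId);
        for (Fragment f : ledgers.get(ledgerId)) {
            for (int i = 0; i < f.ensemble.size(); i++) {
                if (offline.contains(f.ensemble.get(i))) {
                    String target = pickTarget(f.ensemble, liveBookies);
                    // A real worker would copy this fragment's entries to target here.
                    f.ensemble.set(i, target);
                }
            }
        }
        offlineLedgers.remove(ledgerId);
        return true;
    }

    // Pick the first live bookie that is not being decommissioned and is not
    // already in this fragment's ensemble.
    private String pickTarget(List<String> ensemble, List<String> liveBookies) {
        for (String candidate : liveBookies) {
            if (!allOffline.contains(candidate) && !ensemble.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no eligible target bookie");
    }

    public static void main(String[] args) {
        Map<Long, List<Fragment>> ledgers = new TreeMap<>();
        ledgers.put(1L, List.of(new Fragment(0, List.of("b1", "b2")),
                                new Fragment(100, List.of("b2", "b3"))));
        ledgers.put(2L, List.of(new Fragment(0, List.of("b3", "b4"))));
        OfflineMigrationSketch sketch = new OfflineMigrationSketch(ledgers);
        sketch.submitOfflineBookies(Set.of("b1"));
        while (sketch.reassignOne(List.of("b1", "b4", "b5"))) {
            // keep draining until the stand-in for ledgers/offline_ledgers is empty
        }
        System.out.println(ledgers.get(1L).get(0).ensemble); // prints [b4, b2]
    }
}
```

The sketch keeps a ledger's pending record until every affected fragment of that ledger has been handled. That is what makes "no children left under ledgers/offline_ledgers" a safe signal to stop the bookies. A real worker would also have to coordinate concurrent workers and update ledger metadata atomically. This sketch ignores both.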