Hi Enrico,

I'm glad to. Do you have any suggestions for it?

Best regards,
Yang Yang


On Mon, Aug 29, 2022 at 3:45 AM Enrico Olivelli <eolive...@gmail.com> wrote:

> Yang,
>
> On Sat, Aug 27, 2022, 11:05 Yang Yang <fantaps...@gmail.com> wrote:
>
> > For the short term, you can see if the `recover` command is able to help
> > you:
> >
> > https://bookkeeper.apache.org/docs/reference/cli#bookkeeper-shell-recover
> >
> > For the long term, I have proposed a solution that marks the bookie to be
> > decommissioned as `draining` and lets the autorecovery mechanism
> > replicate the ledgers. Please take a look and see if it could solve your
> > use case:
> > https://lists.apache.org/thread/1l9kzb1l0vok105gj2ody3g8nyv7s9l8
>
>
> Would you be able to continue that proposal?
>
> Enrico
>
>
>
> >
> > Best regards,
> > Yang Yang
> >
> >
> > On Thu, Aug 25, 2022 at 6:32 PM steven lu <lushiji2...@gmail.com> wrote:
> >
> > > I think this feature is somewhat custom and not very generic, and there
> > > are risks:
> > > 1. Suppose you want to take node A offline (its data has already been
> > > migrated) but specify node B by mistake. This feature would take node B
> > > offline directly, which is likely to cause production failures.
> > > 2. If the node being taken offline suddenly receives traffic, how should
> > > that be handled? It could easily lead to the loss of cluster replicas.
> > >
> > > How can these two problems be avoided? Please help explain.
> > >
> > > On Thu, Aug 25, 2022 at 5:12 PM lordcheng10 <1572139...@qq.com.invalid> wrote:
> > >
> > > > Hi Bookkeeper Community,
> > > >
> > > >
> > > > This is a BP discussion on: Support non-stop bookie data migration
> > > > and bookie offline.
> > > > The issue can be found at:
> > > > https://github.com/apache/bookkeeper/issues/3456
> > > >
> > > >
> > > > I copy the content here for convenience. Any suggestions are welcome
> > > > and appreciated.
> > > >
> > > >
> > > >
> > > >
> > > > ### Motivation
> > > > Steps to take a bookie offline:
> > > > 1. Log on to the bookie node and check whether there are
> > > > underreplicated ledgers. If there are, the decommission command will
> > > > force them to be replicated: $ bin/bookkeeper shell listunderreplicated
> > > > 2. Stop the bookie: $ bin/bookkeeper-daemon.sh stop bookie
> > > > 3. Run the decommission command. If you have logged on to the node you
> > > > wish to decommission, you don't need to provide -bookieid. If you are
> > > > running the decommission command for the target bookie node from
> > > > another bookie node, you should pass the target bookie id in the
> > > > -bookieid argument: $ bin/bookkeeper shell decommissionbookie or
> > > > $ bin/bookkeeper shell decommissionbookie -bookieid <target bookieid>
> > > > 4. Validate that there are no ledgers on the decommissioned bookie:
> > > > $ bin/bookkeeper shell listledgers -bookieid <target bookieid>
> > > >
> > > >
> > > > With the current bookie offline procedure, we need to stop the bookie
> > > > first, execute the decommission command, and wait for the ledger
> > > > migration on the bookie to complete.
> > > >
> > > >
> > > > Taking a bookie node offline this way is very time-consuming. When we
> > > > need to take many bookie nodes offline, the time this procedure takes
> > > > is not acceptable.
> > > >
> > > >
> > > > Therefore, we need a solution that can migrate data without stopping
> > > > the bookie, so that bookie nodes can be taken offline in batches.
> > > >
> > > >
> > > > ### Proposal
> > > > To solve this problem, we propose a solution in which data can be
> > > > replicated without stopping the bookie.
> > > > The process is as follows:
> > > > 1. Submit the bookie nodes to be taken offline;
> > > > 2. Traverse each ledger on the offline bookies, and persist these
> > > > ledgers and the corresponding offline bookie nodes to the zookeeper
> > > > directory: ledgers/offline_ledgers/ledgerId;
> > > > 3. Get a ledger to be migrated;
> > > > 4. Traverse all fragments of the ledger, and select the fragments
> > > > that have a copy on an offline bookie;
> > > > 5. Copy the data of each such fragment;
> > > > 6. When the fragments of a ledger have been copied, delete the
> > > > corresponding ledgers/offline_ledgers/ledgerId;
> > > > 7. When all ledgerId directories under ledgers/offline_ledgers have
> > > > been deleted, the data has been migrated, and you can stop the bookies
> > > > in batches and take them offline.
> > > >
> > > >
> > > > To achieve our goal, we need to do two things:
> > > > 1. Implement a command to submit the bookies to be taken offline and
> > > > the corresponding ledgers, for example:
> > > > bin/bookkeeper shell decommissionbookie -offline_bookieids
> > > > bookieId1,bookieId2,bookieId3,...bookieIdN
> > > >   This command will write all ledgers on the offline bookie nodes to
> > > > the zookeeper directory, for example: put
> > > > ledgers/offline_ledgers/ledgerId bookieId1,bookieId2,...bookieIdN;
> > > > 2. Design a ReassignLedgerWorker class to perform the actual ledger
> > > > replication:
> > > >   This class will obtain a ledger from the zookeeper directory
> > > > ledgers/offline_ledgers for replication.
> > > >   It will first select all fragments of the ledger that contain an
> > > > offline bookieId, then copy these fragments;
> > >
> >
>
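
For reference, the flow described under "Proposal" in the quoted BP could be
sketched roughly as below. This is only an illustrative, in-memory model and
not BookKeeper code: FakeMetadataStore stands in for the zookeeper tree, and
Fragment, submit_offline_bookies, reassign_ledgers and migration_done are
hypothetical names. "Copying" a fragment is reduced to swapping the offline
bookie for a live one in the fragment's ensemble, which is an assumption
about what the ReassignLedgerWorker would end up doing.

```python
from dataclasses import dataclass
from typing import Dict, List

OFFLINE_ROOT = "ledgers/offline_ledgers"  # path taken from the proposal


@dataclass
class Fragment:
    """Hypothetical model of a ledger fragment and the bookies storing it."""
    first_entry: int
    last_entry: int
    ensemble: List[str]


class FakeMetadataStore:
    """In-memory stand-in for the zookeeper tree (not a real client)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, List[str]] = {}

    def put(self, path: str, value: List[str]) -> None:
        self.nodes[path] = value

    def get(self, path: str) -> List[str]:
        return self.nodes[path]

    def delete(self, path: str) -> None:
        del self.nodes[path]

    def children(self, prefix: str) -> List[str]:
        return sorted(p for p in self.nodes if p.startswith(prefix + "/"))


def submit_offline_bookies(store, ledgers, offline_bookies) -> None:
    """Record each ledger that has a fragment on one of the offline bookies."""
    offline = set(offline_bookies)
    for ledger_id, fragments in ledgers.items():
        affected = sorted({b for f in fragments for b in f.ensemble if b in offline})
        if affected:
            store.put(f"{OFFLINE_ROOT}/{ledger_id}", affected)


def reassign_ledgers(store, ledgers, live_bookies) -> None:
    """Worker pass: re-home affected fragments, then delete the ledger marker."""
    for path in store.children(OFFLINE_ROOT):
        ledger_id = int(path.rsplit("/", 1)[1])
        offline = set(store.get(path))
        for frag in ledgers[ledger_id]:
            if offline.isdisjoint(frag.ensemble):
                continue  # fragment has no copy on an offline bookie
            candidates = [b for b in live_bookies
                          if b not in frag.ensemble and b not in offline]
            new_ensemble = []
            for bookie in frag.ensemble:
                if bookie in offline:
                    if not candidates:
                        raise RuntimeError("not enough live bookies to re-home fragment")
                    new_ensemble.append(candidates.pop(0))  # simulated copy target
                else:
                    new_ensemble.append(bookie)
            frag.ensemble = new_ensemble
        store.delete(path)  # this ledger is done


def migration_done(store) -> bool:
    """True once no ledger markers remain under the offline root."""
    return not store.children(OFFLINE_ROOT)
```

In this model, the operator would call submit_offline_bookies once, let the
worker run, and only stop the bookies after migration_done returns True. A
real implementation would also have to deal with the concerns raised earlier
in the thread, such as new writes landing on a bookie that is being drained.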
