Getting rid of the order constraint would be very cool for garbage
collection. +1 for your proposal.
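
For reference, the unordered Scan-And-Compare you describe could be
sketched roughly like this (a hypothetical illustration only, not the
actual LedgerManager API; `ledgersToGc` and its arguments are made-up
names):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ScanAndCompareGc {
    // Unordered Scan-And-Compare: no order guarantee is needed from the
    // metadata scan, matching the proposal in this thread.
    static Set<Long> ledgersToGc(Set<Long> localSnapshot, Iterable<Long> metadataIds) {
        // Step 1: copy the local snapshot into candidate list "L".
        Set<Long> candidates = new HashSet<>(localSnapshot);
        // Step 2: drop every id seen in the metadata scan, in any order.
        for (Long id : metadataIds) {
            candidates.remove(id);
        }
        // Step 3: whatever remains exists only locally and can be GC'd.
        return candidates;
    }

    public static void main(String[] args) {
        Set<Long> local = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        // The metadata scan may return ids in any order.
        Set<Long> toGc = ledgersToGc(local, Arrays.asList(3L, 1L));
        System.out.println(toGc); // ledgers 2 and 4 are GC candidates
    }
}
```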

It would be nice to have some benchmark results for these two different
solutions once you have generated the patch.
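
As a rough starting point, something like the probe below could compare
the heap cost of the two approaches (a sketch only; the million-entry
loop is just a stand-in for building the snapshot copy, and a real
benchmark should use JMH or heap dumps rather than `Runtime` arithmetic):

```java
public class MemoryProbe {
    // Crude heap-usage probe; System.gc() is only a hint to the JVM,
    // so numbers from this are approximate at best.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // best effort only
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // Stand-in for building the snapshot copy of N ledger ids.
        java.util.Set<Long> snapshot = new java.util.HashSet<>();
        for (long i = 0; i < 1_000_000L; i++) {
            snapshot.add(i);
        }
        long after = usedHeapBytes();
        System.out.println("snapshot entries: " + snapshot.size()
                + ", approx extra heap bytes: " + (after - before));
    }
}
```

The same probe, run around the BFS over the radix tree instead of the
snapshot copy, would give the other side of the comparison.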

-Sijie


On Mon, Mar 18, 2013 at 8:18 PM, Jiannan Wang <[email protected]> wrote:

> Hi Sijie:
>
> As for your proposal, you are just moving the memory issue into the
> snapshot. For now, we should not modify the local snapshot, otherwise it
> would modify the whole active ledgers map. So if you want to modify the
> snapshot, you need to construct another map from it to run your algorithm,
> which is an additional map. So you need some experiments to show that a
> BFS consumes more memory than your proposal does.
> ------------------
>
>    Thanks for pointing out the potential memory issue. Yes, I need a
> snapshot copy to run the algorithm, but I think that is acceptable compared
> to a BFS on the radix tree: the snapshot copy has the same number of
> entries as the local active ledgers (so tens of megabytes at most), while a
> BFS has to keep all the nodes of one tree level in memory, which can be
> very large in the worst case because the tree also contains ledgers from
> other bookies.
>
>
> Actually, I am thinking we should adopt the improved GC, since it exposes a
> GC interface that allows different implementations for different GC
> algorithms. That would let you have a different GC for the radix ledger
> manager. What is your opinion?
> ------------------
>
>
>    Of course, I would also prefer the improved GC.
>    But I would like to point out that, without the order requirement when
> scanning all ledger ids, many things become simpler:
>       * We don't even need a radix tree to maintain ledger metadata; a
> hierarchical hash tree is enough (just as topic metadata management
> does).
>       * It is easier to handle 64-bit ledger id backward compatibility for
> MSLedgerManager: currently we format the ledger id as a fixed-length
> (10-digit) string to allow an ordered scan; once 64-bit ledger ids are
> used we need to enlarge the fixed length, and backward compatibility with
> old ledger ids becomes a problem.
>    I don't know whether there is an urgent requirement for an ordered scan
> of ledger ids; if not, I would suggest removing it.
>
>
>
> Regards,
> Jiannan
>
>
> On 3/19/13 2:03 AM, "Sijie Guo" <[email protected]> wrote:
>
> >Jiannan:
> >
> >As for your proposal, you are just moving the memory issue into the
> >snapshot. For now, we should not modify the local snapshot, otherwise it
> >would modify the whole active ledgers map. So if you want to modify the
> >snapshot, you need to construct another map from it to run your
> >algorithm, which is an additional map. So you need some experiments to
> >show that a BFS consumes more memory than your proposal does.
> >
> >Actually, I am thinking we should adopt the improved GC, since it exposes
> >a GC interface that allows different implementations for different GC
> >algorithms. That would let you have a different GC for the radix ledger
> >manager. What is your opinion?
> >
> >-Sijie
> >
> >
> >On Mon, Mar 18, 2013 at 3:14 AM, Jiannan Wang <[email protected]>
> >wrote:
> >
> >> Hi there,
> >>    I'm preparing the patch for the new 64-bit ZooKeeper-based ledger
> >>manager
> >> with a radix tree (BOOKKEEPER-553), but I have run into a problem.
> >>
> >>    A LedgerManager implementation needs to implement a method called
> >> "getLedgerRanges", which scans all ledger ids in order from the metadata
> >> storage to do the local Scan-And-Compare GC. However, for a radix tree
> >>ledger
> >> manager implementation, scanning all ledgers in order (in a BFS way) may
> >>require
> >> a large amount of memory.
> >>
> >>    After digging into the Scan-And-Compare GC, I find it is not
> >> necessary to impose any order requirement. What the Scan-And-Compare GC
> >>approach
> >> wants to know is the set of ledger ids that exist on the local bookie
> >> server but not in the metadata storage. So we can do it another way:
> >>    1. Get a snapshot of the local ledger id list "L".
> >>    2. Get all the ledger ids from the metadata storage and remove them
> >>from
> >> list "L" (here we do not require the metadata storage to return ledger
> >>ids
> >> with any order guarantee).
> >>    3. After step 2 has looped over all ledger ids in the metadata
> >> storage, GC the ledger ids remaining in list "L".
> >>    With this, we no longer require an "ORDER SCAN";
> >> LedgerManager#asyncProcessLedgers is enough to do the job.
> >>
> >>    Of course, this implementation has one drawback: GC can only take
> >> place after iterating over all ledger ids in the metadata storage. But I
> >> just don't think we need a specific order guarantee for Scan-And-Compare
> >>GC;
> >> there are already some other, better, improved GC approaches.
> >>
> >> - Jiannan
> >>
> >>
>
>
