Getting rid of order constraints would be very cool for garbage collection. +1 for your proposal.
It would be nice to have some benchmark results for these two different solutions after you generate the patch.

-Sijie

On Mon, Mar 18, 2013 at 8:18 PM, Jiannan Wang <[email protected]> wrote:
> Hi Sijie,
>
> > For your proposal, you are just moving the memory issue into the
> > snapshot. For now we should not modify the local snapshot, otherwise
> > it would modify the whole active ledgers map. So if you want to modify
> > the snapshot, you need to construct another map from this snapshot to
> > run your algorithm, which is an additional map. So you need some
> > experiments to show that a BFS consumes more memory than your proposal.
> ------------------
>
> Thanks for pointing out the potential memory issue. Yes, I need a
> snapshot copy to run the algorithm. But I think that is acceptable
> compared to a BFS over the radix tree: the number of entries in the
> snapshot copy is the same as the number of local active ledgers (so tens
> of megabytes at most), while a BFS has to hold all the nodes of one tree
> level, which can be very large in the worst case because the tree also
> contains ledgers from other bookies.
>
> > Actually I am thinking we should adopt the improved GC, since it
> > exposes a GC interface that allows different implementations for
> > different GC algorithms. That would let you have a different GC for
> > the radix ledger manager. What is your opinion?
> ------------------
>
> Of course, I would also prefer the improved GC.
> But I would like to point out that without the order requirement when
> scanning all ledger ids, many things become simpler:
> * We don't even need a radix tree to maintain ledger metadata; a
> hierarchical hash tree is enough (just as topic metadata management
> does).
> * It is easier to handle 64-bit ledger id backward compatibility for
> MSLedgerManager: currently we format the ledger id as a fixed-length
> (10-digit) string to make the ordered scan work. Once a 64-bit ledger id
> needs more digits, we have to enlarge the fixed length, and backward
> compatibility with old ledger ids becomes a problem.
> I don't know if there is an urgent requirement for an ordered scan of
> ledger ids; if not, I would suggest removing it.
>
> Regards,
> Jiannan
>
> On 3/19/13 2:03 AM, "Sijie Guo" <[email protected]> wrote:
>
> > Jiannan,
> >
> > For your proposal, you are just moving the memory issue into the
> > snapshot. For now we should not modify the local snapshot, otherwise
> > it would modify the whole active ledgers map. So if you want to modify
> > the snapshot, you need to construct another map from this snapshot to
> > run your algorithm, which is an additional map. So you need some
> > experiments to show that a BFS consumes more memory than your proposal.
> >
> > Actually I am thinking we should adopt the improved GC, since it
> > exposes a GC interface that allows different implementations for
> > different GC algorithms. That would let you have a different GC for
> > the radix ledger manager. What is your opinion?
> >
> > -Sijie
> >
> > On Mon, Mar 18, 2013 at 3:14 AM, Jiannan Wang <[email protected]>
> > wrote:
> >
> > > Hi there,
> > > I'm preparing the patch for the new 64-bit zookeeper-based ledger
> > > manager with a radix tree (BOOKKEEPER-553), but I hit a problem.
> > >
> > > A LedgerManager implementation needs to implement a method called
> > > "getLedgerRanges", which scans all ledger ids in order from the
> > > metadata storage to do the local Scan-And-Compare GC. However, for a
> > > radix tree ledger manager implementation, scanning all ledgers in
> > > order (in a BFS way) may require a lot of memory.
> > >
> > > After digging into the Scan-And-Compare GC, I find it is not
> > > necessary to impose any order requirement. What the Scan-And-Compare
> > > GC approach wants to know is which ledger ids exist on the local
> > > bookie server but not in the metadata storage. So we can do it
> > > another way:
> > > 1. Get a snapshot of the local ledger id list "L".
> > > 2. Get all the ledger ids from the metadata storage and remove them
> > > from list "L" (here we do not require the metadata storage to return
> > > ledger id ranges in any particular order).
> > > 3. After step 2 finishes looping over all ledger ids in the metadata
> > > storage, GC the ledger ids remaining in list "L".
> > > With this, we no longer require an "ORDER SCAN";
> > > LedgerManager#asyncProcessLedgers is enough to do the job.
> > >
> > > Of course, this implementation has one drawback: the GC process can
> > > only take place after iterating over all ledger ids in the metadata
> > > storage. But I just don't think we need a specific order guarantee
> > > for Scan-And-Compare GC; there are already some other, better
> > > improved GC approaches.
> > >
> > > - Jiannan
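The order-free Scan-And-Compare sweep Jiannan proposes (snapshot the local ids, subtract every id the metadata store reports in any order, collect what remains) can be sketched in a few lines. This is an illustrative standalone sketch, not BookKeeper's actual GC code; the class and method names here are made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ScanAndCompareGc {

    // Step 1: the caller passes a snapshot of the ledger ids stored on the
    //         local bookie.
    // Step 2: remove every id that still exists in the metadata store, in
    //         whatever order the store happens to return them.
    // Step 3: whatever survives the sweep exists locally but not in the
    //         metadata store, so it is safe to garbage-collect.
    static Set<Long> collectDeletable(Set<Long> localLedgers,
                                      Iterable<Long> metadataLedgers) {
        Set<Long> candidates = new HashSet<>(localLedgers); // snapshot copy "L"
        for (long id : metadataLedgers) {
            candidates.remove(id);                          // order-insensitive
        }
        return candidates;                                  // GC these ledgers
    }

    public static void main(String[] args) {
        Set<Long> local = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        // The metadata store returns ids in arbitrary order; ledger 3 has
        // been deleted from metadata but still lives on this bookie.
        Iterable<Long> metadata = Arrays.asList(4L, 1L, 2L);
        System.out.println(collectDeletable(local, metadata)); // prints [3]
    }
}
```

Note the correctness argument in the mail holds regardless of iteration order, which is exactly why `asyncProcessLedgers` (an unordered traversal) suffices; the cost, as Jiannan concedes, is that nothing can be collected until the full metadata pass completes.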
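The backward-compatibility trouble with fixed-width id formatting mentioned in the thread can also be seen concretely. This is a hypothetical illustration, assuming a zero-padded 10-digit key format as described in the mail (the exact key layout in MSLedgerManager may differ): once an id needs more than 10 digits, lexicographic key order stops matching numeric id order, which breaks an ordered scan over a mix of old and new keys.

```java
public class FixedWidthIdOrder {
    public static void main(String[] args) {
        // Old-style key: ledger id zero-padded to a fixed 10 digits, so
        // lexicographic order on keys equals numeric order on ids.
        String oldKey = String.format("%010d", 9_999_999_999L);  // "9999999999"
        // A 64-bit id needing 11 digits cannot fit the old fixed width.
        String newKey = Long.toString(10_000_000_000L);          // "10000000000"
        // Numerically 10000000000 > 9999999999, but as strings an ordered
        // scan would visit newKey first, because '1' sorts before '9'.
        System.out.println(newKey.compareTo(oldKey) < 0);        // prints true
    }
}
```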
