+1 for merging to branch-2 (2.6)
> On Dec 8, 2021, at 6:04 PM, 张铎 <palomino...@gmail.com> wrote:
>
> I think here we just want this to be backported to 2.x, not 2.5.x.
>
> So thanks Andrew for the quick action.
>
> +1 on merging HBASE-26067 to master and backporting to branch-2 (2.6.0).
>
> Thanks.
>
> Andrew Purtell <apurt...@apache.org> wrote on Thu, Dec 9, 2021 at 08:45:
>
>> I concur with Nick, but let me help here by branching 2.5 today. It was
>> always going to be somewhat arbitrary a point.
>>
>>> On Wed, Dec 8, 2021 at 3:09 PM Nick Dimiduk <ndimi...@apache.org> wrote:
>>>
>>> Based solely on the comments made to this thread, I would recommend
>>> against a merge to branch-2, given that we are very close to 2.5. The
>>> points about existing gaps seem like things we're not ready to publish
>>> in the impending minor release. Once we have a branch-2.5, this
>>> particular concern of mine will be alleviated.
>>>
>>> Thanks,
>>> Nick
>>>
>>>> On Wed, Dec 8, 2021 at 1:37 PM Josh Elser <els...@apache.org> wrote:
>>>>
>>>> I was going to wait for some other folks to chime in, but I guess I can
>>>> be the next one :)
>>>>
>>>> Duo, Wellington, and Szabolcs have been doing some excellent work on
>>>> the storefile tracking (SFT) to a degree that I never expected to see.
>>>> I remember some of the original "Filesystem re-do" issues on Jira. The
>>>> idea was exceptional, but the result seemed unreachable.
>>>>
>>>> These devs, building on the success of what Zach/Stephen first talked
>>>> about in HBASE-24749, came up with what I think is an excellent step
>>>> forward. I've yet to break it via my own testing, but do acknowledge
>>>> that there's always more work to be done.
>>>>
>>>> I think this is at a reasonable place to merge this back into the
>>>> "mainline" branches from the feature branch (HBASE-26067). I believe
>>>> this is ready because:
>>>>
>>>> 1. The feature is completely opt-in (HBase works the same way by
>>>>    default)
>>>> 2. There is an API to migrate tables into the new SFT implementation
>>>> 3. There is also an API to migrate tables back to the default
>>>>    implementation
>>>>
>>>> Some gaps still exist around bulk loading, documentation, snapshots,
>>>> and recovery tooling, but these are being worked on. In the context of
>>>> S3, this makes a significantly more compelling offering of HBase by
>>>> removing the complexity of HBOSS. For HBase in all installations, I
>>>> think SFT makes for a significantly more "deterministic" way of
>>>> managing regions/files.
>>>>
>>>> +1 from me to merge HBASE-26067 into master and branch-2
>>>>
>>>> - Josh
>>>>
>>>> On 12/7/21 10:31 AM, Wellington Chevreuil wrote:
>>>>> Hello everyone,
>>>>>
>>>>> We have been making progress on the alternative way of tracking store
>>>>> files originally proposed by Duo in HBASE-26067.
>>>>>
>>>>> To briefly summarize it for those not following it, this feature
>>>>> introduces an abstraction layer to track store files still
>>>>> used/needed by store engines, allowing for plugging different
>>>>> approaches of identifying store files required by the given store.
>>>>> The design doc describing it in more detail is available here
>>>>> <https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit#heading=h.calrs3kn4d8s>.
>>>>>
>>>>> Our main goal within this feature is to avoid the need for using temp
>>>>> files and renames when creating new hfiles (whenever flushing,
>>>>> compacting, splitting/merging or snapshotting). This is made possible
>>>>> by the pluggable tracker implementation labeled "FILE". The current
>>>>> behavior using temp dirs and renames would still be the default
>>>>> approach (labeled "DEFAULT").
>>>>>
>>>>> This "renameless" approach is appealing for deployments using Amazon
>>>>> S3 Object store file system, where the lack of atomic rename
>>>>> operations imposed the necessity of an additional layer of locking
>>>>> (HBOSS), which, combined with the s3a rename operation, can add
>>>>> performance overhead.
>>>>>
>>>>> Some test runs on my employer's infrastructure have shown promising
>>>>> results. A pure insertion YCSB run has shown ~6% performance gain on
>>>>> the client writes. A snapshot clone of a table with hundreds of
>>>>> regions completes in half the time. There are also improvements in
>>>>> compaction, split and merge times.
>>>>>
>>>>> Talking with Duo Zhang and Josh Elser in the HBASE-26067 jira, we
>>>>> feel optimistic that the current implementation is in a good state to
>>>>> get merged into master branch, but it would be nice to hear other
>>>>> opinions about it, before we effectively commit it. Looking forward
>>>>> to hearing some thoughts/concerns you might have.
>>>>>
>>>>> Kind regards,
>>>>> Wellington.
>>
>> --
>> Best regards,
>> Andrew
>>
>> Words like orphans lost among the crosstalk, meaning torn from truth's
>> decrepit hands
>> - A23, Crosstalk
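
For readers who have not followed HBASE-26067, the difference between the
"DEFAULT" and "FILE" trackers described in Wellington's original message can
be modelled roughly as below. This is only a conceptual sketch in Python, not
HBase code: the class names, the `.tmp` directory, the `.filelist` manifest
and its JSON format are all hypothetical, and the thread does not describe
how the real FILE tracker persists its list. The sketch just shows why a
tracked file list removes the need for a rename when publishing a new hfile.

```python
import json
import os


class DefaultTracker:
    """Models the "DEFAULT" flow: write under a temp dir, then rename into place."""

    def __init__(self, store_dir):
        self.store_dir = store_dir
        self.tmp_dir = os.path.join(store_dir, ".tmp")  # hypothetical temp dir name
        os.makedirs(self.tmp_dir, exist_ok=True)

    def commit(self, name, data):
        tmp_path = os.path.join(self.tmp_dir, name)
        with open(tmp_path, "wb") as f:
            f.write(data)
        # The rename is what publishes the file to readers.
        os.rename(tmp_path, os.path.join(self.store_dir, name))

    def load(self):
        # Every regular file directly in the store directory counts as live.
        return sorted(
            n for n in os.listdir(self.store_dir)
            if os.path.isfile(os.path.join(self.store_dir, n))
        )


class FileTracker:
    """Models the "FILE" flow: write in place, then record the name in a manifest."""

    MANIFEST = ".filelist"  # hypothetical manifest name and format

    def __init__(self, store_dir):
        self.store_dir = store_dir
        os.makedirs(store_dir, exist_ok=True)

    def _manifest_path(self):
        return os.path.join(self.store_dir, self.MANIFEST)

    def load(self):
        # Only files named in the manifest are live; anything else in the
        # directory (for example a half-written file) is ignored.
        try:
            with open(self._manifest_path()) as f:
                return sorted(json.load(f))
        except FileNotFoundError:
            return []

    def commit(self, name, data):
        # No temp dir and no rename: the file is written at its final path.
        with open(os.path.join(self.store_dir, name), "wb") as f:
            f.write(data)
        tracked = set(self.load())
        tracked.add(name)
        # Simplification: the manifest is overwritten in place.
        with open(self._manifest_path(), "w") as f:
            json.dump(sorted(tracked), f)
```

Both classes expose the same `commit`/`load` pair, which is the "pluggable"
part of the idea: calling `FileTracker(path).commit("hfile-1", b"...")`
followed by `load()` returns only the files that were committed, even if
other files are present in the directory.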