Re: [DISCUSS] Merge HBASE-26067 branch into master, and backport it to base 2 branches

Andrew Purtell Wed, 08 Dec 2021 18:07:45 -0800

+1 for merging to branch-2 (2.6)


> On Dec 8, 2021, at 6:04 PM, 张铎 <[email protected]> wrote:
> 
> I think here we just want this to be backported to 2.x, not 2.5.x.
> 
> So thanks Andrew for the quick action.
> 
> +1 on merging HBASE-26067 to master and backporting to branch-2(2.6.0).
> 
> Thanks.
> 
> Andrew Purtell <[email protected]> wrote on Thu, Dec 9, 2021 at 08:45:
> 
>> I concur with Nick, but let me help here by branching 2.5 today. It was
>> always going to be somewhat arbitrary a point.
>> 
>>> On Wed, Dec 8, 2021 at 3:09 PM Nick Dimiduk <[email protected]> wrote:
>>> 
>>> Based solely on the comments made to this thread, I would recommend
>>> against a merge to branch-2, given that we are very close to 2.5. The
>>> points about existing gaps seem like things we're not ready to publish
>>> in the impending minor release. Once we have a branch-2.5, this
>>> particular concern of mine will be alleviated.
>>> 
>>> Thanks,
>>> Nick
>>> 
>>>> On Wed, Dec 8, 2021 at 1:37 PM Josh Elser <[email protected]> wrote:
>>> 
>>>> I was going to wait for some other folks to chime in, but I guess I can
>>>> be the next one :)
>>>> 
>>>> Duo, Wellington, and Szabolcs have been doing some excellent work on
>>>> the storefile tracking (SFT) to a degree that I never expected to see.
>>>> I remember some of the original "Filesystem re-do" issues on Jira. The
>>>> idea was exceptional, but the result seemed unreachable.
>>>> 
>>>> These devs, building on the success of what Zach/Stephen first talked
>>>> about in HBASE-24749, came up with what I think is an excellent step
>>>> forward. I've yet to break it via my own testing, but do acknowledge
>>>> that there's always more work to be done.
>>>> 
>>>> I think this is at a reasonable place to merge this back into the
>>>> "mainline" branches from the feature branch (HBASE-26067). I believe
>>>> this is ready because:
>>>> 
>>>> 1. The feature is completely opt-in (HBase works the same way by default)
>>>> 2. There is an API to migrate tables into the new SFT implementation
>>>> 3. There is also an API to migrate tables back to the default implementation
>>>> Some gaps still exist around bulk loading, documentation, snapshots,
>>>> and recovery tooling, but these are being worked on. In the context of
>>>> S3, this makes HBase a significantly more compelling offering by
>>>> removing the complexity of HBOSS. For HBase in all installations, I
>>>> think SFT makes for a significantly more "deterministic" way of
>>>> managing regions/files.
>>>> 
>>>> +1 from me to merge HBASE-26067 into master and branch-2
>>>> 
>>>> - Josh
>>>> 
>>>> On 12/7/21 10:31 AM, Wellington Chevreuil wrote:
>>>>> Hello everyone,
>>>>> 
>>>>> We have been making progress on the alternative way of tracking store
>>>>> files originally proposed by Duo in HBASE-26067.
>>>>> 
>>>>> To briefly summarize it for those not following it, this feature
>>>>> introduces an abstraction layer that tracks the store files still
>>>>> used/needed by store engines, so that different approaches for
>>>>> identifying the store files required by a given store can be plugged
>>>>> in. The design doc describing it in more detail is available here:
>>>>> https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit#heading=h.calrs3kn4d8s
>>>>> 
>>>>> Our main goal with this feature is to avoid the need for temp files
>>>>> and renames when creating new hfiles (whenever flushing, compacting,
>>>>> splitting/merging or snapshotting). This is made possible by the
>>>>> pluggable tracker implementation labeled "FILE". The current behavior,
>>>>> using temp dirs and renames, would remain the default approach
>>>>> (labeled "DEFAULT").
>>>>> 
>>>>> This "renameless" approach is appealing for deployments using the
>>>>> Amazon S3 object store as the file system. There, the lack of an
>>>>> atomic rename operation makes an additional locking layer (HBOSS)
>>>>> necessary, and that layer, combined with the s3a rename operation,
>>>>> can add performance overhead.
>>>>> 
>>>>> Some test runs on my employer's infrastructure have shown promising
>>>>> results. A pure-insert YCSB run showed a ~6% performance gain in
>>>>> client writes. A snapshot clone of a table with hundreds of regions
>>>>> completed in half the time. There were also improvements in
>>>>> compaction, split, and merge times.
>>>>> 
>>>>> Talking with Duo Zhang and Josh Elser in the HBASE-26067 Jira, we feel
>>>>> optimistic that the current implementation is in a good state to be
>>>>> merged into the master branch, but it would be nice to hear other
>>>>> opinions before we actually commit it. Looking forward to hearing any
>>>>> thoughts/concerns you might have.
>>>>> 
>>>>> Kind regards,
>>>>> Wellington.
>>>>> 
>>>> 
>>> 
>> 
>> 
>> --
>> Best regards,
>> Andrew
>> 
>> Words like orphans lost among the crosstalk, meaning torn from truth's
>> decrepit hands
>>   - A23, Crosstalk
>>
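For readers not following the design doc, here is a minimal, hypothetical Java sketch of the pluggable-tracker idea Wellington describes above. It contrasts a rename-based commit in the style of "DEFAULT" with a renameless commit in the style of "FILE". All names here (ObjectStoreDir, StoreFileTracker, RenameBasedTracker, FileListTracker) are illustrative assumptions. They do not reflect HBase's actual storefiletracker classes or signatures. The object store is simulated in memory, and an S3-style rename is modeled as a counted copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of a store directory on an
// S3-like object store. Illustrative only; not HBase code.
final class ObjectStoreDir {
    final Map<String, byte[]> objects = new HashMap<>();
    // Counts object copies: on S3-like stores a "rename" is a copy plus a delete.
    int copyOps = 0;

    void put(String path, byte[] data) {
        objects.put(path, data);
    }

    void rename(String src, String dst) {
        byte[] data = objects.remove(src);
        if (data == null) {
            throw new IllegalStateException("no such object: " + src);
        }
        copyOps++;
        objects.put(dst, data);
    }
}

// Hypothetical tracker abstraction: decides which hfiles a store treats as live.
interface StoreFileTracker {
    // Writes a new hfile (e.g. flush or compaction output) and makes it live.
    void commitNewFile(String name, byte[] data);

    // Returns the names of the hfiles the store should read.
    Set<String> listTrackedFiles();
}

// Models the "DEFAULT" behavior: write under a temp dir, then rename into place.
final class RenameBasedTracker implements StoreFileTracker {
    private final ObjectStoreDir dir;

    RenameBasedTracker(ObjectStoreDir dir) {
        this.dir = dir;
    }

    @Override
    public void commitNewFile(String name, byte[] data) {
        dir.put(".tmp/" + name, data);
        dir.rename(".tmp/" + name, "store/" + name);
    }

    @Override
    public Set<String> listTrackedFiles() {
        // Anything under store/ counts as live, so visibility hinges on the rename.
        Set<String> live = new HashSet<>();
        for (String path : dir.objects.keySet()) {
            if (path.startsWith("store/")) {
                live.add(path.substring("store/".length()));
            }
        }
        return live;
    }
}

// Models the "FILE" idea: write the hfile in place, then record it in a list.
// A real tracker would persist this list; here it is only kept in memory.
final class FileListTracker implements StoreFileTracker {
    private final ObjectStoreDir dir;
    private final Set<String> trackedFiles = new HashSet<>();

    FileListTracker(ObjectStoreDir dir) {
        this.dir = dir;
    }

    @Override
    public void commitNewFile(String name, byte[] data) {
        dir.put("store/" + name, data); // no temp file, no rename
        trackedFiles.add(name);         // the file becomes live only once recorded
    }

    @Override
    public Set<String> listTrackedFiles() {
        // Unrecorded objects under store/ (e.g. from a failed compaction) are ignored.
        return Collections.unmodifiableSet(new HashSet<>(trackedFiles));
    }
}
```

In this sketch, committing one file through RenameBasedTracker costs one copy. FileListTracker costs none, and a stray object written under store/ is not reported as live. That mirrors, in toy form, why the tracker list and not the directory listing becomes the source of truth.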
