I was going to wait for some other folks to chime in, but I guess I can
be the next one :)
Duo, Wellington, and Szabolcs have been doing some excellent work on the
storefile tracking (SFT) to a degree that I never expected to see. I
remember some of the original "Filesystem re-do" issues on Jira. The
idea was exceptional, but the result seemed unreachable.
These devs, building on the success of what Zach/Stephen first talked
about in HBASE-24749, came up with what I think is an excellent step
forward. I've yet to break it in my own testing, but I do acknowledge
that there's always more work to be done.
I think this is at a reasonable point to merge from the feature branch
(HBASE-26067) back into the "mainline" branches. I believe this is
ready because:
1. The feature is completely opt-in (HBase works the same way by default)
2. There is an API to migrate tables into the new SFT implementation
3. There is also an API to migrate tables back to the default implementation
Some gaps still exist around bulk loading, documentation, snapshots, and
recovery tooling, but these are being worked on. In the context of S3,
this makes HBase a significantly more compelling offering by removing
the complexity of HBOSS. For HBase in all installations, I think SFT
makes for a significantly more "deterministic" way of managing
regions/files.
+1 from me to merge HBASE-26067 into master and branch-2
- Josh
On 12/7/21 10:31 AM, Wellington Chevreuil wrote:
Hello everyone,
We have been making progress on the alternative way of tracking store files
originally proposed by Duo in HBASE-26067.
To briefly summarize it for those not following it, this feature introduces
an abstraction layer to track store files still used/needed by store
engines, allowing for plugging different approaches of identifying store
files required by the given store. The design doc describing it in more
detail is available here:
<https://docs.google.com/document/d/16Nr1Fn3VaXuz1g1FTiME-bnGR3qVK5B-raXshOkDLcY/edit#heading=h.calrs3kn4d8s>
Our main goal within this feature is to avoid the need for using temp files
and renames when creating new hfiles (whenever flushing, compacting,
splitting/merging or snapshotting). This is made possible by the pluggable
tracker implementation labeled "FILE". The current behavior using temp dirs
and renames would still be the default approach (labeled "DEFAULT").
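
As a rough illustration of the difference between the two approaches,
here is a toy Python sketch against a local directory. It is not HBase
code: the function names and the ".tracked" manifest file are
hypothetical, and the real "FILE" tracker may differ in how it stores
and updates its list of files.

```python
import os

MANIFEST = ".tracked"  # hypothetical name for the tracker's file list


def flush_default(store_dir, name, data):
    """DEFAULT-style: write under a temp dir, then rename into place."""
    tmp_dir = os.path.join(store_dir, ".tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    tmp_path = os.path.join(tmp_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
    # The rename is the commit point; it must be atomic to be safe.
    os.rename(tmp_path, os.path.join(store_dir, name))


def list_default(store_dir):
    """Every regular file in the store dir counts as a live store file."""
    return sorted(
        n for n in os.listdir(store_dir)
        if os.path.isfile(os.path.join(store_dir, n))
    )


def flush_tracked(store_dir, name, data):
    """FILE-style: write directly into the store dir, then record it."""
    with open(os.path.join(store_dir, name), "wb") as f:
        f.write(data)
    # Appending to the manifest is the commit point; no rename is needed.
    with open(os.path.join(store_dir, MANIFEST), "a") as m:
        m.write(name + "\n")


def list_tracked(store_dir):
    """Only files named in the manifest count; strays are ignored."""
    path = os.path.join(store_dir, MANIFEST)
    if not os.path.exists(path):
        return []
    with open(path) as m:
        return sorted(line.strip() for line in m if line.strip())
```

In this sketch, a partially written file left behind by a crash is
invisible to list_tracked() because it never reached the manifest,
which is the role the rename plays in the default approach.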
This "renameless" approach is appealing for deployments on the Amazon S3
object store, where the lack of atomic rename operations requires an
additional layer of locking (HBOSS), which, combined with the s3a rename
operation, can add performance overhead.
Some test runs on my employer's infrastructure have shown promising
results. A pure-insert YCSB run showed a ~6% performance gain on client
writes. A snapshot clone of a table with hundreds of regions completes in
half the time. There are also improvements in compaction, split, and merge
times.
After talking with Duo Zhang and Josh Elser in the HBASE-26067 Jira, we feel
optimistic that the current implementation is in a good state to be merged
into the master branch, but it would be nice to hear other opinions before
we actually commit it. Looking forward to hearing any thoughts/concerns
you might have.
Kind regards,
Wellington.