"HBase doesn't want to add Ozone as a dependency" sounds to me like HBase having resistance against the people proposing it, or against Ozone itself.
Anyway, doesn't ViewDistributedFileSystem solve this Ozone problem? I remember Uma pursuing that to solve exactly these problems. Pulling the core HDFS APIs up honestly looks like a naive approach; there is some work around reflection to make DistCp with snapshots work with Ozone (https://issues.apache.org/jira/browse/HDFS-16911), and the HBase folks could have used that as well.

Just my thoughts on solving the problem, which I feel can be easily solved by writing a util class in HBase with some reflection logic...

-Ayush

> On 20-Mar-2023, at 9:54 PM, Wei-Chiu Chuang <weic...@apache.org> wrote:
>
> Thank you. Makes sense to me. Yes, as part of this effort we are going to need contract tests.
>
>> On Fri, Mar 17, 2023 at 3:52 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:
>>
>> 1. I think a new interface would be good, as FileContext could do the same thing.
>> 2. Using PathCapabilities probes should still be mandatory, as for FileContext it would depend on the back end.
>> 3. Whoever does this gets to specify what the API does and write the contract tests. Saying "just do what HDFS does" isn't enough, as it's not always clear the HDFS team know how much of that behaviour is intentional (rename, anyone?).
>>
>> For any new API (a better rename, a better delete, ...) I would normally insist on making it cloud friendly, with an extensible builder API and an emphasis on asynchronous IO. However, this is existing code and does target HDFS and Ozone, so pulling the existing APIs up into a new interface seems the right thing to do here.
>>
>> I have a WiP project to do a shim library offering the new FS APIs to older Hadoop releases by way of reflection, so that we can get new APIs taken up across projects where we cannot choreograph version updates across the entire stack (hello Parquet, Spark, ...). My goal is to actually make this a Hadoop-managed project, with its own release schedule.
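A minimal, self-contained sketch of the reflection approach discussed in this thread (Ayush's util class, Steve's shim) might look like the following. The names `RecoverLeaseShim`, `tryInvoke`, and `FakeDfs` are illustrative only; `FakeDfs` stands in for `DistributedFileSystem` so the mechanism can run without a Hadoop dependency:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Optional;

/**
 * Illustrative reflection util: invoke an FS-specific method (e.g.
 * recoverLease) only if the concrete FileSystem class actually
 * declares it. All names here are hypothetical, not from any project.
 */
public final class RecoverLeaseShim {

    /** Invoke {@code methodName(arg)} on {@code target} if such a method exists. */
    public static Optional<Object> tryInvoke(Object target, String methodName, Object arg) {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(methodName)
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0].isInstance(arg)) {
                try {
                    return Optional.ofNullable(m.invoke(target, arg));
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException(methodName + " failed", e);
                }
            }
        }
        return Optional.empty(); // method absent: caller falls back or fails cleanly
    }

    /** Stand-in for DistributedFileSystem, purely to demonstrate the mechanism. */
    public static class FakeDfs {
        public boolean recoverLease(String path) { return true; }
    }

    public static void main(String[] args) {
        Object dfs = new FakeDfs();
        Object plain = new Object(); // an "FS" without recoverLease
        System.out.println(tryInvoke(dfs, "recoverLease", "/hbase/wal").orElse("unsupported"));   // true
        System.out.println(tryInvoke(plain, "recoverLease", "/hbase/wal").orElse("unsupported")); // unsupported
    }
}
```

This keeps HBase free of a compile-time dependency on hdfs-client or ozone-client, at the cost of losing compile-time type checking on the reflected call.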
>> You could add an equivalent of the new interface in there, which would then use reflection behind the scenes to invoke the underlying HDFS methods when the FS client has them.
>>
>> https://github.com/steveloughran/fs-api-shim
>>
>> I've just added the vector IO API there; the next step is to copy over a lot of the contract tests from hadoop-common and apply them through the shim, to Hadoop 3.2 and 3.3.0-3.3.5. That testing against many backends is actually as tricky as the reflection itself. However, without this library it is going to take a long, long time for the open source applications to pick up the higher-performance, cloud-ready APIs. Yes, those of us who can build the entire stack can do it, but that gradually adds more divergence from the open source libraries, reduces the test coverage overall, and only increases maintenance costs over time.
>>
>> steve
>>
>>> On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <weic...@apache.org> wrote:
>>>
>>> Hi,
>>>
>>> Stephen and I are working on a project to make HBase run on Ozone.
>>>
>>> HBase, born out of the Hadoop project, depends on a number of HDFS-specific APIs, including recoverLease() and isInSafeMode(). The HBase community [1] strongly voiced that they don't want the project to have a direct dependency on additional FS implementations, due to dependency and vulnerability management concerns.
>>>
>>> To make this project successful, we're exploring options to push these APIs up into the FileSystem abstraction. Eventually, it would make HBase FS-implementation agnostic, and perhaps enable HBase to support other storage systems in the future.
>>>
>>> We'd use the PathCapabilities API to probe whether the underlying FS implementation supports these APIs, and would then invoke the corresponding FileSystem APIs. This is straightforward, but FileSystem would become bloated.
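The probe-then-invoke pattern described in the thread can be sketched as follows. The tiny `Fs` interface and the capability key are local stand-ins so the sketch runs on its own: the real FileSystem implements Hadoop's PathCapabilities interface (`hasPathCapability(path, capability)`), and the actual capability string would be defined as part of this work:

```java
import java.io.IOException;

/**
 * Sketch of "probe the FS for a capability, then invoke the API".
 * The Fs interface and capability key below are hypothetical stand-ins
 * for Hadoop's FileSystem/PathCapabilities, used so this compiles and
 * runs without a Hadoop dependency.
 */
public class CapabilityProbeSketch {

    /** Hypothetical capability key, in the style of existing fs.capability.* keys. */
    public static final String CAPABILITY_RECOVER_LEASE = "fs.capability.recoverlease";

    /** Minimal stand-in for FileSystem + PathCapabilities. */
    public interface Fs {
        boolean hasPathCapability(String path, String capability) throws IOException;
        boolean recoverLease(String path) throws IOException;
    }

    /** Recover the lease if the FS advertises support; otherwise report unsupported. */
    public static String recoverIfSupported(Fs fs, String path) throws IOException {
        if (!fs.hasPathCapability(path, CAPABILITY_RECOVER_LEASE)) {
            return "unsupported"; // e.g. an object store with no lease concept
        }
        return fs.recoverLease(path) ? "recovered" : "pending";
    }
}
```

The probe keeps callers honest on stores that lack the semantics, which is why Steve argues the PathCapabilities check should stay mandatory even if a shared interface is introduced.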
>>> Another option is to create a "RecoverableFileSystem" interface, and have both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone) implement it. This way the impact on the Hadoop project and the FileSystem abstraction is even smaller.
>>>
>>> Thoughts?
>>>
>>> [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
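For illustration, the proposed "RecoverableFileSystem" interface might look roughly like this. The signatures are assumptions modelled on the two DistributedFileSystem methods named in the thread (recoverLease() and isInSafeMode()); a real version would take a Hadoop Path, and `Demo` is a toy stand-in for an implementing FileSystem:

```java
import java.io.IOException;

/**
 * Hedged sketch of the "RecoverableFileSystem" idea: a small interface
 * that DistributedFileSystem and RootedOzoneFileSystem could both
 * implement, so HBase only depends on the interface. Signatures are
 * assumptions, not a committed API.
 */
public interface RecoverableFileSystem {

    /** Begin lease recovery; true once the file is closed and consistent. */
    boolean recoverLease(String path) throws IOException;

    /** Whether the store is currently in safe (read-only) mode. */
    boolean isInSafeMode() throws IOException;

    /** Toy implementation standing in for a real FileSystem subclass. */
    class Demo implements RecoverableFileSystem {
        @Override public boolean recoverLease(String path) { return true; }
        @Override public boolean isInSafeMode() { return false; }
    }
}
```

HBase would then test `fs instanceof RecoverableFileSystem` and downcast, so neither hdfs-client nor ozone-client is needed at compile time and other stores can opt in later.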