‘HBase doesn’t want to add Ozone as a dependency’ sounds to me like HBase 
having resistance against the people proposing it, or against Ozone itself.

Anyway, doesn’t ViewDistributedFileSystem solve this Ozone problem? I 
remember Uma pursuing that precisely to solve these problems.

Pulling the core HDFS APIs up honestly looks like a naive approach. There is 
some reflection-based work to make DistCp with snapshots work with Ozone; the 
HBase folks could have used that as 
well (https://issues.apache.org/jira/browse/HDFS-16911).

Just my thoughts on the problem, which I feel can easily be solved by 
writing a util class in HBase with some reflection logic…
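A minimal sketch of the reflection pattern such a util class could use. All names here are hypothetical stand-ins: the real target would be DistributedFileSystem#recoverLease (which takes an org.apache.hadoop.fs.Path), but since Hadoop classes aren't on the classpath here, FakeDfs plays that role and the path is a plain String.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a filesystem exposing recoverLease(); in reality
// the target would be DistributedFileSystem or RootedOzoneFileSystem, and the
// argument an org.apache.hadoop.fs.Path rather than a String.
class FakeDfs {
    public boolean recoverLease(String path) {
        return true; // pretend the lease was recovered immediately
    }
}

public class LeaseRecoveryUtil {
    // Look up recoverLease at runtime, so the caller needs no compile-time
    // dependency on the concrete filesystem class.
    static boolean tryRecoverLease(Object fs, String path) {
        try {
            Method m = fs.getClass().getMethod("recoverLease", String.class);
            return (Boolean) m.invoke(fs, path);
        } catch (ReflectiveOperationException e) {
            return false; // FS has no recoverLease; caller must fall back
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRecoverLease(new FakeDfs(), "/hbase/wal"));
        System.out.println(tryRecoverLease(new Object(), "/hbase/wal"));
    }
}
```

The point is that the lookup failure path doubles as the capability probe: a filesystem without the method simply reports false, so no instanceof check against an HDFS class is ever needed.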


-Ayush

> On 20-Mar-2023, at 9:54 PM, Wei-Chiu Chuang <weic...@apache.org> wrote:
> 
> Thank you. Makes sense to me. Yes, as part of this effort we are going to
> need contract tests.
> 
>> On Fri, Mar 17, 2023 at 3:52 AM Steve Loughran <ste...@cloudera.com.invalid>
>> wrote:
>> 
>>   1. I think a new interface would be good as FileContext could do the
>>   same thing
>>   2. using PathCapabilities probes should still be mandatory as for
>>   FileContext it would depend on the back end
>>   3. Whoever does this gets to specify what the API does and write the
>>   contract tests. Saying "just do what HDFS does" isn't enough, as it's
>>   not always clear even to the HDFS team how much of that behaviour is
>>   intentional (rename, anyone?).
>> 
>> 
>> For any new API (a better rename, a better delete,...) I would normally
>> insist on making it cloud friendly, with an extensible builder API and an
>> emphasis on asynchronous IO. However this is existing code and does target
>> HDFS and Ozone; pulling the existing APIs up into a new interface seems the
>> right thing to do here.
>> 
>> I have a WiP project to do a shim library to offer new FS APIs to older
>> Hadoop releases by way of reflection, so that we can get new APIs taken up
>> across projects where we cannot choreograph version updates across the
>> entire stack. (hello parquet, spark,...). My goal is to actually make this
>> a Hadoop managed project, with its own release schedule. You could add an
>> equivalent of the new interface in here, which would then use reflection
>> behind-the-scenes to invoke the underlying HDFS methods when the FS client
>> has them.
>> 
>> https://github.com/steveloughran/fs-api-shim
>> 
>> I've just added the vector IO API there; the next step is to copy over a
>> lot of the contract tests from hadoop-common and apply them through the
>> shim, to Hadoop 3.2 and 3.3.0-3.3.5. That testing against many backends is
>> actually as tricky as the reflection itself. However, without this library
>> it is going to take a long, long time for the open source applications to
>> pick up the higher-performance, cloud-ready APIs. Yes, those of us who can
>> build the
>> entire stack can do it, but that gradually adds more divergence from the
>> open source libraries, reduces the test coverage overall and only increases
>> maintenance costs over time.
>> 
>> steve
>> 
>>> On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <weic...@apache.org> wrote:
>>> 
>>> Hi,
>>> 
>>> Stephen and I are working on a project to make HBase run on Ozone.
>>> 
>>> HBase, born out of the Hadoop project, depends on a number of HDFS
>> specific
>>> APIs, including recoverLease() and isInSafeMode(). The HBase community
>> [1]
>>> strongly voiced that they don't want the project to have direct
>> dependency
>>> on additional FS implementations due to dependency and vulnerability
>>> management concerns.
>>> 
>>> To make this project successful, we're exploring options, to push up
>> these
>>> APIs to the FileSystem abstraction. Eventually, it would make HBase FS
>>> implementation agnostic, and perhaps enable HBase to support other
>> storage
>>> systems in the future.
>>> 
>>> We'd use the PathCapabilities API to probe if the underlying FS
>>> implementation supports these APIs, and would then invoke the
>> corresponding
>>> FileSystem APIs. This is straightforward but the FileSystem would become
>>> bloated.
>>> 
>>> Another option is to create a "RecoverableFileSystem" interface, and have
>>> both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone)
>>> implement it. This way the impact to the Hadoop project and the FileSystem
>>> abstraction is even smaller.
>>> 
>>> Thoughts?
>>> 
>>> [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
>>> 
>> 
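For reference, a rough sketch of the "RecoverableFileSystem" option from the quoted thread. Everything here is hypothetical: the real interface would live in hadoop-common, take org.apache.hadoop.fs.Path, and be implemented by DistributedFileSystem and RootedOzoneFileSystem; String and the in-memory class stand in for those.

```java
import java.io.IOException;

// Hypothetical shape of the proposed interface; String stands in for
// org.apache.hadoop.fs.Path, which is not available in this sketch.
interface RecoverableFileSystem {
    // Start lease recovery; true once the file is closed and consistent.
    boolean recoverLease(String path) throws IOException;
    // Mirrors DistributedFileSystem#isInSafeMode().
    boolean isInSafeMode() throws IOException;
}

// Toy implementation: HBase would code against the interface and never
// reference a concrete HDFS or Ozone class directly.
class InMemoryRecoverableFs implements RecoverableFileSystem {
    public boolean recoverLease(String path) { return true; }
    public boolean isInSafeMode() { return false; }
}

public class RecoverableFsDemo {
    // The caller probes via instanceof here (in real Hadoop this would be a
    // PathCapabilities probe, as discussed above) before invoking the APIs.
    static boolean safeToWrite(Object fs, String path) {
        if (!(fs instanceof RecoverableFileSystem)) {
            return false;
        }
        RecoverableFileSystem rfs = (RecoverableFileSystem) fs;
        try {
            return rfs.recoverLease(path) && !rfs.isInSafeMode();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeToWrite(new InMemoryRecoverableFs(), "/hbase/wal"));
    }
}
```

This keeps the FileSystem base class unbloated: only filesystems that genuinely support lease recovery opt into the interface, and callers discover support at runtime.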
