Thank you. Makes sense to me. Yes, as part of this effort we are going to
need contract tests.
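
Something along these lines, maybe - the class, method and capability names
below are placeholders to illustrate the shape, not a settled API:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.contract.AbstractFSContractTestBase;
import org.apache.hadoop.fs.contract.ContractTestUtils;
import org.junit.Test;

import static org.junit.Assert.assertTrue;
import static org.junit.Assume.assumeTrue;

public abstract class AbstractContractLeaseRecoveryTest
    extends AbstractFSContractTestBase {

  @Test
  public void testRecoverLeaseOnClosedFile() throws Throwable {
    FileSystem fs = getFileSystem();
    Path file = path("testRecoverLeaseOnClosedFile");
    ContractTestUtils.touch(fs, file);

    // skip on stores which do not declare the (placeholder) capability
    assumeTrue("lease recovery not supported by " + fs.getUri(),
        fs.hasPathCapability(file, "fs.capability.lease.recoverable"));

    // LeaseRecoverable stands in for whatever interface we end up defining
    LeaseRecoverable recoverable = (LeaseRecoverable) fs;
    // the file is already closed, so recovery should complete immediately
    assertTrue("lease was not recovered for " + file,
        recoverable.recoverLease(file));
  }
}

A fuller suite would presumably also need a case where the file is still
open for write, since that is the situation HBase actually cares about.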

On Fri, Mar 17, 2023 at 3:52 AM Steve Loughran <ste...@cloudera.com.invalid>
wrote:

>    1. I think a new interface would be good, as FileContext could do the
>    same thing.
>    2. Using PathCapabilities probes should still be mandatory, as for
>    FileContext it would depend on the back end (see the probe sketch
>    below).
>    3. Whoever does this gets to specify what the API does and write the
>    contract tests. Saying "just do what HDFS does" isn't enough, as it's
>    not always clear whether the HDFS team knows how much of that behaviour
>    is intentional (rename, anyone?).
>
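> For illustration, the probe-then-invoke pattern could look roughly like
> this; the capability string and the LeaseRecoverable name are placeholders
> for whatever we settle on:
>
> import java.io.IOException;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public final class LeaseRecoveryHelper {
>
>   // placeholder capability name; the real constant would live alongside
>   // the other CommonPathCapabilities entries
>   private static final String LEASE_RECOVERABLE =
>       "fs.capability.lease.recoverable";
>
>   private LeaseRecoveryHelper() {
>   }
>
>   /** Probe first, then recover; returns false if the FS can't do it. */
>   public static boolean recoverLeaseIfSupported(FileSystem fs, Path path)
>       throws IOException {
>     // mandatory probe: for FileContext and other backends, support
>     // depends on the underlying store
>     if (!fs.hasPathCapability(path, LEASE_RECOVERABLE)) {
>       return false;
>     }
>     // LeaseRecoverable stands in for the new interface proposed here
>     return ((LeaseRecoverable) fs).recoverLease(path);
>   }
> }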
>
> For any new API (a better rename, a better delete, ...) I would normally
> insist on making it cloud friendly, with an extensible builder API and an
> emphasis on asynchronous IO. However, this is existing code and does
> target HDFS and Ozone - pulling the existing APIs up into a new interface
> seems the right thing to do here.
>
> I have a WiP project to do a shim library to offer new FS APIs to older
> Hadoop releases by way of reflection, so that we can get new APIs taken up
> across projects where we cannot choreograph version updates across the
> entire stack (hello Parquet, Spark, ...). My goal is to actually make this
> a Hadoop-managed project, with its own release schedule. You could add an
> equivalent of the new interface in here, which would then use reflection
> behind the scenes to invoke the underlying HDFS methods when the FS client
> has them.
>
> https://github.com/steveloughran/fs-api-shim
>
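> As a sketch, the shim side of something like recoverLease could look
> roughly like this - names here are illustrative, not the library's actual
> API:
>
> import java.io.IOException;
> import java.lang.reflect.InvocationTargetException;
> import java.lang.reflect.Method;
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public final class RecoverLeaseShim {
>
>   private RecoverLeaseShim() {
>   }
>
>   /** Invoke recoverLease reflectively if the concrete FS exposes it. */
>   public static boolean recoverLease(FileSystem fs, Path path)
>       throws IOException {
>     try {
>       // DistributedFileSystem declares public boolean recoverLease(Path)
>       Method m = fs.getClass().getMethod("recoverLease", Path.class);
>       return (Boolean) m.invoke(fs, path);
>     } catch (NoSuchMethodException e) {
>       throw new UnsupportedOperationException(
>           "recoverLease not available on " + fs.getClass().getName(), e);
>     } catch (IllegalAccessException | InvocationTargetException e) {
>       throw new IOException("recoverLease failed on " + path, e);
>     }
>   }
> }
>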
> I've just added the vector IO API there; the next step is to copy over a
> lot of the contract tests from hadoop-common and apply them through the
> shim - to Hadoop 3.2 and 3.3.0-3.3.5. Testing against many backends is
> actually as tricky as the reflection itself. However, without this library
> it is going to take a long, long time for the open source applications to
> pick up the higher-performance, cloud-ready APIs. Yes, those of us who can
> build the entire stack can do it, but that gradually adds more divergence
> from the open source libraries, reduces the test coverage overall and only
> increases maintenance costs over time.
>
> steve
>
> On Thu, 16 Mar 2023 at 20:56, Wei-Chiu Chuang <weic...@apache.org> wrote:
>
> > Hi,
> >
> > Stephen and I are working on a project to make HBase run on Ozone.
> >
> > HBase, born out of the Hadoop project, depends on a number of
> > HDFS-specific APIs, including recoverLease() and isInSafeMode(). The
> > HBase community [1] strongly voiced that they don't want the project to
> > have a direct dependency on additional FS implementations, due to
> > dependency and vulnerability management concerns.
> >
> > To make this project successful, we're exploring options to push these
> > APIs up into the FileSystem abstraction. Eventually, it would make HBase
> > agnostic to the FS implementation, and perhaps enable HBase to support
> > other storage systems in the future.
> >
> > We'd use the PathCapabilities API to probe whether the underlying FS
> > implementation supports these APIs, and would then invoke the
> > corresponding FileSystem APIs. This is straightforward, but FileSystem
> > would become bloated.
> >
> > Another option is to create a "RecoverableFileSystem" interface and have
> > both DistributedFileSystem (HDFS) and RootedOzoneFileSystem (Ozone)
> > implement it. This way the impact on the Hadoop project and the
> > FileSystem abstraction is even smaller.
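> >
> > A rough sketch of what that interface could look like - the method set
> > is just what HBase needs today, and none of this is settled:
> >
> > import java.io.IOException;
> > import org.apache.hadoop.fs.Path;
> >
> > /**
> >  * Sketch only: an interface both DistributedFileSystem and
> >  * RootedOzoneFileSystem could implement.
> >  */
> > public interface RecoverableFileSystem {
> >
> >   /**
> >    * Start lease recovery on the given file.
> >    * @return true if the file is closed and its lease has been recovered
> >    */
> >   boolean recoverLease(Path file) throws IOException;
> >
> >   /** @return true if the store is currently in safe mode. */
> >   boolean isInSafeMode() throws IOException;
> > }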
> >
> > Thoughts?
> >
> > [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
> >
>
