They both need it for a similar use case: "to support Ozone", not for
anything core that we handle as part of "Apache Hadoop". I suppose both
work fine with HDFS because they added a dependency on HDFS? Now they
don't want to add Ozone, for whatever reason, and the folks chasing this
integration want to pass those issues and the workload
(maintaining/releasing) to Hadoop.

Adding *isReady()* does make sense to me. I haven't checked what it would
do or whether it would work for all FileSystems, but it sounds fair enough
to me. Feel free to raise a ticket for that if you have done some work
already.

Adding isInSafeMode and recoverLease to FileSystem still doesn't make sense
to me: they are not useful for a bunch of the other FileSystem
implementations, and we don't want them lying around as Technical Debt for
everyone. Doing this sounds to me like a hack to avoid adding Ozone as a
dependency to these client projects, where they want to use Ozone.
Anyway, not starting this again 😉

Quoting Wei-Chiu himself from the first mail, he agreed that adding those
isn't a good idea:

> This is straightforward but the FileSystem would become
> bloated.
>

That was an agreed and known fact; if it has changed now, I can't help it.
HADOOP-18671 <https://issues.apache.org/jira/browse/HADOOP-18671>, which
Wei-Chiu raised to add these to "FileSystem", also has a comment suggesting
they go behind an interface. That is still "kind of OK" if done properly:
in addition to what is already mentioned in the comments there, it must not
introduce any "incompatibilities", and it needs not just tests but "stable"
tests. Not like some of the recent changes where folks come and say "we
can't do it without breaking anything" and other people end up trying to
fix the mess. But I am not going off topic with that here...

The second idea mentioned in the original mail is similar to the one in the
comment on the above ticket and is still quite acceptable, though the name
can be negotiated: add an interface and pull the relevant methods up into
it without touching the FileSystem class. We can have DFS implement it and
Ozone FS implement it as well. We should be sorted: No Hacking, No
Bothering FileSystem, and things still work.
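
To make that second idea concrete, here is a rough sketch, not a proposal
for the final API. The interface name LeaseRecoverable, the helper
recoverIfSupported, and the stand-in FileSystem, DfsLike and
ObjectStoreLike classes are all made up for illustration; only the method
names recoverLease and isInSafeMode come from this thread, and the
signatures are simplified (a String path instead of a real Path type).

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class LeaseRecoverableSketch {

  /** Hypothetical opt-in interface; the name and signatures are assumptions. */
  interface LeaseRecoverable {
    boolean recoverLease(String path) throws IOException;
    boolean isInSafeMode() throws IOException;
  }

  /** Stand-in for the real FileSystem class; it gains no new methods. */
  abstract static class FileSystem {
    abstract String getScheme();
  }

  /** Plays the role of DFS or Ozone FS: extends FileSystem and opts in. */
  static class DfsLike extends FileSystem implements LeaseRecoverable {
    @Override String getScheme() { return "hdfs"; }
    @Override public boolean recoverLease(String path) { return true; }
    @Override public boolean isInSafeMode() { return false; }
  }

  /** Plays the role of a store with no lease concept: it does not opt in. */
  static class ObjectStoreLike extends FileSystem {
    @Override String getScheme() { return "s3a"; }
  }

  /** Client-side probe: a plain type check, no reflection needed. */
  static boolean recoverIfSupported(FileSystem fs, String path) {
    if (!(fs instanceof LeaseRecoverable)) {
      return false; // nothing to recover on this implementation
    }
    try {
      return ((LeaseRecoverable) fs).recoverLease(path);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String wal = "/hbase/WALs/wal-1"; // illustrative path only
    System.out.println(recoverIfSupported(new DfsLike(), wal));
    System.out.println(recoverIfSupported(new ObjectStoreLike(), wal));
  }
}
```

With something along these lines, a client like HBase would compile only
against the small interface, and implementations that have no notion of
leases or safe mode would carry nothing extra.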

-Ayush




On Thu, 23 Mar 2023 at 14:01, Tsz Wo Sze <szets...@yahoo.com> wrote:

> (Clicked "send" accidentally.  Please ignore my previous email.
> Sorry.)
>
> Hi,
>
>
> We probably should exclude HBase from this discussion.  I guess Wei-Chiu
> mentioned it as an example use case.  There are other projects such as
> Apache Solr requiring similar features.
>
>
> (1) We already have the Syncable (hsync/hflush) interface in Hadoop, so it
> makes sense to have a recover() method for recovering hsync'ed/hflushed
> files.  Otherwise, the Syncable feature is incomplete.
>
>
> (2) I also suggest adding an isReady() method.  In FileSystem, there is an
> initialize(..) and its javadoc says:
>
>    * Called after the new FileSystem instance is constructed, and before it
>    * is ready for use.
>
> However, there is no way to check if the FileSystem instance is ready.
>
>
> These are currently two missing features in Hadoop FileSystem.
>
>
> Tsz-Wo
>
>
>
>
> On Wednesday, March 22, 2023, 12:30:49 PM GMT+8, Ayush Saxena <
> ayush...@gmail.com> wrote:
>
>
> Well, whether reflections are good or not will drag this somewhere else. I
> will respect what Tsz-Wo said and put this in my rule book for future :)
>
> If I get into why we don’t have “all” the APIs in FileSystem itself, it
> will drag this into another area: what abstraction is and where to use it,
> and stuff like that, which none of the people over here would be
> interested in.
>
> On a conclusive note: using reflection in HBase is not something we at
> Hadoop suggest, as Tsz-Wo considers it a hack.
>
> I have strict objections to pulling them up to the FileSystem class,
> because they are very core to HDFS and the mentioned APIs are not the only
> ones. ErasureCoding? And tomorrow we would have similar requests for
> ABFS-only APIs, Huawei OBS-only ones and many more. The FileSystem class
> would become huge, and the Technical Debt that we would bring in or
> encourage would be really high. Who is gonna chase people if this code
> creates issues somewhere else? I don’t want to quote examples publicly to
> prove a point. So, leaving this here, with my conclusion on this solution.
>
> Technically it is an HBase problem; they should adapt to Ozone, and I am
> not sure why they are creating unnecessary noise. I got into a similar
> situation for Ozone in a different downstream project, and folks were very
> encouraging about adding support for Ozone; Guava messed it up, else it
> would have been in. Now, with this approach, those guys won’t even agree:
> “Go handle it at Hadoop or Ozone, why us?” This is something neither of the
> projects wants….
>
> There was mention of favoured nodes and all as well on the HBase ML; after
> this it would be that stuff, IIRC. The proposal to have an option for
> favoured nodes in DistCp was vetoed recently, considering it HDFS-only (not
> by me), so that’s never ending….
>
> We at Hadoop are discussing and trying to negotiate for HBase and Ozone
> 🤷‍♂️, when in the past ViewDFS was also done in HDFS for the same use case.
> I think people now don’t consider it a solution, and we will keep on doing
> stuff for HBase, and then other folks will keep on managing, maintaining
> and releasing it forever!!!
>
> Good Luck!!! But 2 possible solutions are down in this thread
>
> -Ayush
>
> On Wed, 22 Mar 2023 at 7:48 AM, Tsz Wo Sze <szets...@yahoo.com> wrote:
>
> >
> > Ayush,
> >
> >
> > Yes, reflections are a part of Java.  Why do we have to define the
> > FileSystem APIs rather than simply use reflection all the time?
> >
> >
> > Reflection is good for dealing with unknown code such as loading a
> > plugin, code analysis, etc.  However, it probably is not a good way to
> > define APIs.
> >
> >
> >
> > Tsz-Wo
> >
> >
> > On Tuesday, March 21, 2023, 01:00:20 PM GMT+8, Ayush Saxena <
> > ayush...@gmail.com> wrote:
> >
> >
> > I am not sure what classifies as a hack and what doesn’t; I thought
> > reflection was part of Java.
> >
> > Whatever the solution, pulling just the HDFS-specific stuff into
> > FileSystem just for Ozone, because the HBase guys didn’t agree and we
> > have people in Hadoop whom we can convince: I am -1 to such an approach
> > and mindset. HBase wants Ozone; they should make way for it like they do
> > for HDFS.
> >
> > Explore ways in HBase, explore the utils and ways in the links that Steve
> > shared, try ViewDFS. When we have some more convincing reasons, we can
> > discuss more over here about pulling them up to FileSystem as the last
> > option.
> >
> > -Ayush
> >
> > On Fri, 17 Mar 2023 at 2:26 AM, Wei-Chiu Chuang <weic...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > Stephen and I are working on a project to make HBase run on Ozone.
> > >
> > > HBase, born out of the Hadoop project, depends on a number of
> > > HDFS-specific APIs, including recoverLease() and isInSafeMode(). The
> > > HBase community [1] strongly voiced that they don't want the project
> > > to have a direct dependency on additional FS implementations due to
> > > dependency and vulnerability management concerns.
> > >
> > > To make this project successful, we're exploring options to push
> > > these APIs up to the FileSystem abstraction. Eventually, it would
> > > make HBase FS implementation agnostic, and perhaps enable HBase to
> > > support other storage systems in the future.
> > >
> > > We'd use the PathCapabilities API to probe if the underlying FS
> > > implementation supports these APIs, and would then invoke the
> > > corresponding FileSystem APIs. This is straightforward but the
> > > FileSystem would become bloated.
> > >
> > > Another option is to create a "RecoverableFileSystem" interface, and
> > > have both DistributedFileSystem (HDFS) and RootedOzoneFileSystem
> > > (Ozone) implement it. This way the impact to the Hadoop project and
> > > the FileSystem abstraction is even smaller.
> > >
> > > Thoughts?
> > >
> > > [1] https://lists.apache.org/thread/tcrp8vxxs3z12y36mpzx35txhpp7tvxv
> > >
> >
>
