With S3, we use reflection when loading the filesystem. As long as it is on
the classpath, we're okay. However, I think the difference here is that the
APIs Stephen wants to use are not in the Hadoop FileSystem API/class but in
DFS, I believe. Others in this thread have mentioned ways to work around
this in both the short term and the long term.
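
To make the reflection idea concrete, here is a minimal sketch, not the
actual HBase code. It probes an object for a public recoverLease method and
calls it only if present, so nothing filesystem-specific is needed at compile
time. The class names (LeaseRecoveryProbe, FakeLeaseFs, FakePlainFs) are
hypothetical, and a String stands in for Hadoop's Path so the example runs
without Hadoop on the classpath. In HDFS the real method would be
DistributedFileSystem#recoverLease(Path).

```java
import java.lang.reflect.Method;
import java.util.Optional;

public class LeaseRecoveryProbe {

  /** Looks up a public one-argument recoverLease method; empty if the class has none. */
  public static Optional<Method> findRecoverLease(Object fs, Class<?> argType) {
    try {
      return Optional.of(fs.getClass().getMethod("recoverLease", argType));
    } catch (NoSuchMethodException e) {
      return Optional.empty();
    }
  }

  /**
   * Calls recoverLease reflectively when the file system offers it.
   * Returns false when the method is missing or when recovery reports false.
   * Note: the lookup uses the argument's exact runtime class (a simplification).
   */
  public static boolean tryRecoverLease(Object fs, Object path) throws Exception {
    Optional<Method> method = findRecoverLease(fs, path.getClass());
    if (!method.isPresent()) {
      return false; // no lease recovery support, nothing to do
    }
    return (Boolean) method.get().invoke(fs, path);
  }

  /** Hypothetical stand-in for a file system that supports lease recovery. */
  public static class FakeLeaseFs {
    public boolean recoverLease(String path) {
      return true;
    }
  }

  /** Hypothetical stand-in for a file system without lease recovery. */
  public static class FakePlainFs {
  }

  public static void main(String[] args) throws Exception {
    System.out.println(tryRecoverLease(new FakeLeaseFs(), "/wal/1"));
    System.out.println(tryRecoverLease(new FakePlainFs(), "/wal/1"));
  }
}
```

The trade-off is the usual one for reflection: no extra build dependency, but
the method name is only checked at runtime, so a common Hadoop interface
would still be the cleaner long-term answer.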

On Thu, Mar 16, 2023, 11:06 PM 张铎(Duo Zhang) <[email protected]> wrote:

> I just mean how do we manage the dependencies. We do not ship hbase with
> aws-sdk, but I believe users can put the related jars under lib and then
> use S3AFileSystem. I think we can support ozone in the same way?
>
> Viraj Jasani <[email protected]> wrote on Fri, Mar 17, 2023 at 13:33:
>
> > WAL is still kept on hdfs only, even when hfiles are kept in s3 AFAIK. But
> > here it seems, both WAL and HFiles can be kept in Ozone IIUC.
> >
> >
> > On Thu, Mar 16, 2023 at 8:46 PM 张铎(Duo Zhang) <[email protected]>
> > wrote:
> >
> > > How do we support S3 as HFile storage currently? I do not think we have
> > > added aws-sdk as a direct dependency in HBase now?
> > >
> > > Viraj Jasani <[email protected]> wrote on Fri, Mar 17, 2023 at 04:37:
> > >
> > > > +1, similar to what was done in the past for using
> > > > HdfsDataOutputStreamBuilder that was available since hadoop 2.9 or so I
> > > > think.
> > > >
> > > >
> > > > On Thu, Mar 16, 2023 at 1:04 PM Andrew Purtell <[email protected]>
> > > > wrote:
> > > >
> > > > > It should be done with reflection rather than take a direct dependency,
> > > > > until Hadoop common interfaces are available in what we consider the
> > > > > lowest supported version.
> > > > >
> > > > > > On Mar 16, 2023, at 12:35 PM, Viraj Jasani <[email protected]> wrote:
> > > > > >
> > > > > > It would be nice to use PathCapabilities to determine lease recovery as a
> > > > > > feature flag.
> > > > > > In fact, s3a and abfs have lots of feature flags being derived from this
> > > > > > API already. It would be good for dfs and ozone to recognize lease recovery
> > > > > > as a capability.
> > > > > >
> > > > > > However, this alone might not be sufficient and something like a
> > > > > > RecoverableFileSystem interface would be helpful as long as we can abstract
> > > > > > out lease recovery (and safe mode etc) options, as hbase needs to
> > > > > > perform them anyway.
> > > > > >
> > > > > > Hence, having both: a) a path capability to identify if lease recovery etc
> > > > > > features are available and b) a new FileSystem interface that both dfs and
> > > > > > ozone can implement, would be great IMHO. Because even if we just have the
> > > > > > path capability for the feature flag, we would still end up adding an ozone
> > > > > > dependency (unless done with reflection as Andrew mentioned) to perform
> > > > > > lease recovery unless lease recovery is abstracted out somewhere in hadoop.
> > > > > >
> > > > > >> One of the original worries is if the Hadoop/HDFS community
> > > > > >> would reject our proposal when we change the base interface/abstract class
> > > > > >> in FileSystem (if it's non-backward compatible).
> > > > > >
> > > > > > I believe a new IA.Public interface in hadoop that can abstract out lease
> > > > > > recovery etc would have less likelihood of getting rejected than "making
> > > > > > changes in FileSystem directly".
> > > > > >
> > > > > >
> > > > > >> On Thu, Mar 16, 2023 at 2:07 AM Tak Lon (Stephen) Wu <[email protected]>
> > > > > >> wrote:
> > > > > >>
> > > > > >> In addition, I have yet to confirm this, but based on another search in the
> > > > > >> hadoop code, we may be able to add recover lease as a feature flag in
> > > > > >> CommonPathCapabilities [3], which can be used through the interface of
> > > > > >> PathCapabilities#hasPathCapability [4]. (this is similar to
> > > > > >> StreamCapabilities as mentioned by Viraj)
> > > > > >>
> > > > > >> 3.
> > > > > >> https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonPathCapabilities.java
> > > > > >> 4.
> > > > > >> https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathCapabilities.java
> > > > > >>
> > > > > >> -Stephen
> > > > > >>
> > > > > >>> On Thu, Mar 16, 2023 at 12:00 AM Tak Lon (Stephen) Wu <[email protected]>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> Thanks everyone! Sean helped to clarify that DFS-specific APIs are
> > > > > >>> already used in many HBase modules for feature implementations but are
> > > > > >>> not yet standardized in the general hadoop FileSystem API, e.g.
> > > > > >>> lease recovery. One of the original worries is if the Hadoop/HDFS
> > > > > >>> community would reject our proposal when we change the base
> > > > > >>> interface/abstract class in FileSystem (if it's non-backward
> > > > > >>> compatible). The discussion here helps to confirm the direction, and
> > > > > >>> let's see how we can make it generic and help avoid confusion in both
> > > > > >>> places.
> > > > > >>>
> > > > > >>> Thanks again,
> > > > > >>> Stephen
> > > > > >>>
> > > > > >>> On Wed, Mar 15, 2023 at 2:54 PM Andrew Purtell <[email protected]>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Then Hadoop should add one and although we would need a reflection
> > > > > >>>> based check in the interim we can converge toward the ideal.
> > > > > >>>>
> > > > > >>>> In any case I believe we can avoid a direct dependency on Ozone and
> > > > > >>>> should strongly avoid taking such unnecessary dependencies. The Hadoop
> > > > > >>>> and HBase build dependency sets are already very large and we and other
> > > > > >>>> users are being hit with significant security issue remediation work,
> > > > > >>>> much of which represents compatibility problems and is not upstreamable
> > > > > >>>> (like protobuf 2 removal in 2.x). We struggle with the existing
> > > > > >>>> dependencies enough already at my employer.
> > > > > >>>>
> > > > > >>>>> On Mar 15, 2023, at 1:53 PM, Sean Busbey <[email protected]> wrote:
> > > > > >>>>>
> > > > > >>>>> the check that Stephen is referring to is for logic around lease
> > > > > >>>>> recovery and not stream flush/sync. the lease recovery is specific to
> > > > > >>>>> DFS IIRC and doesn't have a FileSystem marker.
> > > > > >>>>>
> > > > > >>>>>> On Wed, Mar 15, 2023 at 3:22 PM Andrew Purtell <[email protected]>
> > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>> So we can test StreamCapabilities in code, in worst case by wrapping
> > > > > >>>>>> some probe code during startup with try-catch and examining the
> > > > > >>>>>> exception.
> > > > > >>>>>>
> > > > > >>>>>>> On Wed, Mar 15, 2023 at 1:09 PM Viraj Jasani <[email protected]>
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> As of today, both WAL impls (fshlog and asyncfs) throw
> > > > > >>>>>>> StreamLacksCapabilityException if the FSDataOutputStream probe fails
> > > > > >>>>>>> for Hflush/Hsync:
> > > > > >>>>>>>
> > > > > >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HFLUSH)
> > > > > >>>>>>> and
> > > > > >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HSYNC)
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Wed, Mar 15, 2023 at 12:51 PM Andrew Purtell <
> > > > > >> [email protected]>
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Does Hadoop have a marker interface that lets an application know its
> > > > > >>>>>>>> FileSystem instances can support hsync/hflush? Ideally all we should
> > > > > >>>>>>>> need to do is test with instanceof for that marker and use reflection
> > > > > >>>>>>>> (in the worst case) to get a handle to the hsync or hflush method, and
> > > > > >>>>>>>> then call it. This approach should be taken wherever we have a
> > > > > >>>>>>>> requirement to use a special WAL specific API provided by the
> > > > > >>>>>>>> underlying FileSystem, so we can abstract it sufficiently to not
> > > > > >>>>>>>> require a direct dependency on Ozone or S3A or any non HDFS
> > > > > >>>>>>>> filesystem.
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Wed, Mar 15, 2023 at 12:31 PM Tak Lon (Stephen) Wu <[email protected]>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi team,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Recently, Wei-Chiu and I have been discussing whether HBase can use
> > > > > >>>>>>>>> Ozone as another storage for WAL (see the hsync and hflush JIRAs [1])
> > > > > >>>>>>>>> and HFile. For HFile, it’s pluggable by configuring the file system to
> > > > > >>>>>>>>> use Ozone File System (Ozone).
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> But we found that the WAL is a bit different, especially
> > > > > >>>>>>>>> RecoverLeaseFSUtils#recoverFileLease [2]: it has one check on whether
> > > > > >>>>>>>>> the file system is an instance of HDFS, and only then does WAL recovery
> > > > > >>>>>>>>> execute file lease recovery after RS crashes. Here, if we would like to
> > > > > >>>>>>>>> add Ozone, it does not matter whether we import it as a direct
> > > > > >>>>>>>>> dependency to perform similar lease recovery or go via reflection by
> > > > > >>>>>>>>> class name in a plaintext String; we still need to somehow introduce
> > > > > >>>>>>>>> Ozone as another supported file system. (we can discuss how we can
> > > > > >>>>>>>>> implement this better as well)
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> We also found other places, e.g. FSUtils and HFileSystem, that have
> > > > > >>>>>>>>> used DistributedFileSystem, but it should be possible to move them into
> > > > > >>>>>>>>> either hbase-asyncfs or a new FS related component to separate the use
> > > > > >>>>>>>>> of different supported file systems.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> So, we’re wondering if anyone would have any objections to adding
> > > > > >>>>>>>>> Ozone as a dependency to hbase-asyncfs? Or if you have a better idea
> > > > > >>>>>>>>> how this could be added without adding Ozone as a dependency, please
> > > > > >>>>>>>>> feel free to comment on this thread.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> [1] Ozone is working on support for hsync and hflush,
> > > > > >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-7593,
> > > > > >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-4353
> > > > > >>>>>>>>> [2] RecoverLeaseFSUtils#recoverFileLease,
> > > > > >>>>>>>>> https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L53-L63
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thanks,
> > > > > >>>>>>>>> Stephen
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Viraj
> > > > >
> > > >
> > >
> >
>
