I just mean: how do we manage the dependencies? We do not ship HBase with
aws-sdk, but I believe users can put the related jars under lib and then
use S3AFileSystem. I think we can support Ozone in the same way?

On Fri, Mar 17, 2023 at 13:33, Viraj Jasani <vjas...@apache.org> wrote:

> WAL is still kept on hdfs only, even when hfiles are kept in s3 AFAIK. But
> here it seems, both WAL and HFiles can be kept in Ozone IIUC.
>
>
> On Thu, Mar 16, 2023 at 8:46 PM 张铎(Duo Zhang) <palomino...@gmail.com>
> wrote:
>
> > How do we support S3 as HFile storage currently? I do not think we have
> > added aws-sdk as a direct dependency in HBase now?
> >
> > On Fri, Mar 17, 2023 at 04:37, Viraj Jasani <vjas...@apache.org> wrote:
> >
> > > +1, similar to what was done in the past for using
> > > HdfsDataOutputStreamBuilder that was available since hadoop 2.9 or so I
> > > think.
> > >
> > >
> > > On Thu, Mar 16, 2023 at 1:04 PM Andrew Purtell <andrew.purt...@gmail.com>
> > > wrote:
> > >
> > > > It should be done with reflection rather than taking a direct dependency,
> > > > until Hadoop common interfaces are available in what we consider the
> > > > lowest supported version.
> > > >
> > > > > On Mar 16, 2023, at 12:35 PM, Viraj Jasani <vjas...@apache.org> wrote:
> > > > >
> > > > > > It would be nice to use PathCapabilities to expose lease recovery as a
> > > > > > feature flag.
> > > > > > In fact, s3a and abfs already derive lots of feature flags from this
> > > > > > API. It would be good for dfs and ozone to recognize lease recovery
> > > > > > as a capability.
> > > > >
> > > > > > However, this alone might not be sufficient, and something like a
> > > > > > RecoverableFileSystem interface would be helpful as long as we can
> > > > > > abstract out lease recovery (and safe mode etc.) options, as HBase
> > > > > > needs to perform them anyway.
> > > > >
> > > > > > Hence, having both a) a path capability to identify whether lease
> > > > > > recovery etc. features are available and b) a new FileSystem interface
> > > > > > that both dfs and ozone can implement would be great IMHO. Even if we
> > > > > > just have a path capability for the feature flag, we would still end up
> > > > > > adding an ozone dependency (unless done with reflection as Andrew
> > > > > > mentioned) to perform lease recovery, unless lease recovery is
> > > > > > abstracted out somewhere in hadoop.
> > > > >
> > > > > >> One of the original worries is if the Hadoop/HDFS community
> > > > > >> would reject our proposal when we change the base interface/abstract
> > > > > >> class in FileSystem (if it's non-backward compatible).
> > > > >
> > > > > > I believe a new IA.Public interface in hadoop that can abstract out
> > > > > > lease recovery etc. would be less likely to get rejected than "making
> > > > > > changes in FileSystem directly".
> > > > >
> > > > >
> > > > > >> On Thu, Mar 16, 2023 at 2:07 AM Tak Lon (Stephen) Wu <tak...@apache.org>
> > > > > >> wrote:
> > > > >>
> > > > > >> In addition, I have yet to confirm this, but based on another search
> > > > > >> in the hadoop code, we may be able to add recover lease as a feature
> > > > > >> flag in CommonPathCapabilities [3], which can then be checked through
> > > > > >> PathCapabilities#hasPathCapability [4]. (This is similar to
> > > > > >> StreamCapabilities as mentioned by Viraj.)
> > > > >>
> > > > >> 3.
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonPathCapabilities.java
> > > > >> 4.
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/hadoop/blob/branch-3.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PathCapabilities.java
> > > > >>
> > > > >> -Stephen
> > > > >>
> > > > > >>> On Thu, Mar 16, 2023 at 12:00 AM Tak Lon (Stephen) Wu <tak...@apache.org>
> > > > > >>> wrote:
> > > > >>>
> > > > > >>> Thanks everyone! Sean helped to clarify that DFS-specific APIs used by
> > > > > >>> HBase, e.g. lease recovery, have been in place in many HBase modules as
> > > > > >>> the feature implementation but are not yet standardized in the general
> > > > > >>> Hadoop FileSystem API. One of the original worries is if the Hadoop/HDFS
> > > > > >>> community would reject our proposal when we change the base
> > > > > >>> interface/abstract class in FileSystem (if it's non-backward
> > > > > >>> compatible). The discussion here helps to confirm the direction, so
> > > > > >>> let's see how we can make it generic, which could help avoid confusion
> > > > > >>> in both places.
> > > > >>>
> > > > >>> Thanks again,
> > > > >>> Stephen
> > > > >>>
> > > > > >>> On Wed, Mar 15, 2023 at 2:54 PM Andrew Purtell <andrew.purt...@gmail.com>
> > > > > >>> wrote:
> > > > >>>
> > > > > >>>> Then Hadoop should add one, and although we would need a
> > > > > >>>> reflection-based check in the interim, we can converge toward the ideal.
> > > > >>>>
> > > > > >>>> In any case I believe we can avoid a direct dependency on Ozone and
> > > > > >>>> should strongly avoid taking such unnecessary dependencies. The Hadoop
> > > > > >>>> and HBase build dependency sets are already very large and we and other
> > > > > >>>> users are being hit with significant security issue remediation work,
> > > > > >>>> much of which represents compatibility problems and is not upstreamable
> > > > > >>>> (like protobuf 2 removal in 2.x). We struggle with the existing
> > > > > >>>> dependencies enough already at my employer.
> > > > >>>>
> > > > > >>>>> On Mar 15, 2023, at 1:53 PM, Sean Busbey <bus...@apache.org> wrote:
> > > > >>>>>
> > > > > >>>>> The check that Stephen is referring to is for logic around lease
> > > > > >>>>> recovery and not stream flush/sync. The lease recovery is specific to
> > > > > >>>>> DFS IIRC and doesn't have a FileSystem marker.
> > > > >>>>>
> > > > > >>>>>> On Wed, Mar 15, 2023 at 3:22 PM Andrew Purtell <apurt...@apache.org>
> > > > > >>>>>> wrote:
> > > > >>>>>>
> > > > > >>>>>> So we can test StreamCapabilities in code, in the worst case by
> > > > > >>>>>> wrapping some probe code during startup with try-catch and examining
> > > > > >>>>>> the exception.
> > > > >>>>>>
> > > > > >>>>>>> On Wed, Mar 15, 2023 at 1:09 PM Viraj Jasani <vjas...@apache.org>
> > > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > > >>>>>>> As of today, both WAL implementations (fshlog and asyncfs) throw
> > > > > >>>>>>> StreamLacksCapabilityException if the FSDataOutputStream probe fails
> > > > > >>>>>>> for hflush/hsync:
> > > > >>>>>>>
> > > > >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HFLUSH)
> > > > >>>>>>> and
> > > > >>>>>>> StreamLacksCapabilityException(StreamCapabilities.HSYNC)
> > > > >>>>>>>
> > > > >>>>>>>
> > > > > >>>>>>> On Wed, Mar 15, 2023 at 12:51 PM Andrew Purtell <apurt...@apache.org>
> > > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > > >>>>>>>> Does Hadoop have a marker interface that lets an application know its
> > > > > >>>>>>>> FileSystem instances can support hsync/hflush? Ideally all we should
> > > > > >>>>>>>> need to do is test with instanceof for that marker and use reflection
> > > > > >>>>>>>> (in the worst case) to get a handle to the hsync or hflush method, and
> > > > > >>>>>>>> then call it. This approach should be taken wherever we have a
> > > > > >>>>>>>> requirement to use a special WAL-specific API provided by the
> > > > > >>>>>>>> underlying FileSystem, so we can abstract it sufficiently to not
> > > > > >>>>>>>> require a direct dependency on Ozone or S3A or any non-HDFS filesystem.
> > > > >>>>>>>>
> > > > > >>>>>>>> On Wed, Mar 15, 2023 at 12:31 PM Tak Lon (Stephen) Wu <tak...@apache.org>
> > > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi team,
> > > > >>>>>>>>>
> > > > > >>>>>>>>> Recently, Wei-Chiu and I have been discussing whether HBase can use
> > > > > >>>>>>>>> Ozone as another storage layer for the WAL (see the hsync and hflush
> > > > > >>>>>>>>> JIRAs [1]) and HFiles. For HFiles, it’s pluggable by configuring the
> > > > > >>>>>>>>> file system to use the Ozone File System (Ozone).
> > > > >>>>>>>>>
> > > > > >>>>>>>>> But we found that the WAL is a bit different, especially
> > > > > >>>>>>>>> RecoverLeaseFSUtils#recoverFileLease [2]: it checks whether the file
> > > > > >>>>>>>>> system is an instance of HDFS, and only then does WAL recovery execute
> > > > > >>>>>>>>> file lease recovery after RS crashes. Here, if we would like to add
> > > > > >>>>>>>>> Ozone, it does not matter whether we import it as a direct dependency
> > > > > >>>>>>>>> to perform similar lease recovery or go via reflection with the class
> > > > > >>>>>>>>> name in a plaintext String; we still need to somehow introduce Ozone
> > > > > >>>>>>>>> as another supported file system. (We can discuss how to implement
> > > > > >>>>>>>>> this better as well.)
> > > > >>>>>>>>>
> > > > > >>>>>>>>> We also found that other places, e.g. FSUtils and HFileSystem, use
> > > > > >>>>>>>>> DistributedFileSystem, but it should be possible to move them into
> > > > > >>>>>>>>> either hbase-asyncfs or a new FS-related component to separate the use
> > > > > >>>>>>>>> of different supported file systems.
> > > > >>>>>>>>>
> > > > > >>>>>>>>> So, we’re wondering if anyone has any objections to adding Ozone as a
> > > > > >>>>>>>>> dependency of hbase-asyncfs? Or, if you have a better idea of how this
> > > > > >>>>>>>>> could be added without adding Ozone as a dependency, please feel free
> > > > > >>>>>>>>> to comment on this thread.
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> [1] Ozone is working on support for hsync and hflush,
> > > > >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-7593,
> > > > >>>>>>>>> https://issues.apache.org/jira/browse/HDDS-4353
> > > > > >>>>>>>>> [2] RecoverLeaseFSUtils#recoverFileLease,
> > > > > >>>>>>>>> https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L53-L63
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thanks,
> > > > >>>>>>>>> Stephen
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Viraj
> > > >
> > >
> >
>
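
For illustration, the reflection-based approach discussed in the thread
(calling a lease recovery method without a compile-time dependency on the
file system that provides it) could look roughly like the sketch below. This
is only a minimal, self-contained example and not HBase or Hadoop code:
LeaseRecoveryProbe, FakeLeaseFs and FakePlainFs are hypothetical stand-ins,
and the path is passed as a String so the sketch compiles without Hadoop on
the classpath. A real implementation would look up recoverLease with Hadoop's
Path type on the actual FileSystem instance.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class LeaseRecoveryProbe {

  /** Hypothetical stand-in for a file system that offers lease recovery. */
  public static class FakeLeaseFs {
    public boolean recoverLease(String path) {
      // Pretend recovery succeeds for any non-empty path.
      return path != null && !path.isEmpty();
    }
  }

  /** Hypothetical stand-in for a file system without lease recovery. */
  public static class FakePlainFs {
  }

  /** Looks up a public recoverLease(String) method; returns null when absent. */
  public static Method findRecoverLease(Object fs) {
    try {
      return fs.getClass().getMethod("recoverLease", String.class);
    } catch (NoSuchMethodException e) {
      return null;
    }
  }

  /**
   * Returns true if the lease was recovered, false if the file system has no
   * such method or recovery did not complete.
   */
  public static boolean tryRecoverLease(Object fs, String path) {
    Method m = findRecoverLease(fs);
    if (m == null) {
      return false; // capability not available on this file system
    }
    try {
      return (Boolean) m.invoke(fs, path);
    } catch (IllegalAccessException | InvocationTargetException e) {
      throw new IllegalStateException(
          "recoverLease failed on " + fs.getClass().getName(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(tryRecoverLease(new FakeLeaseFs(), "/hbase/WALs/example"));
    System.out.println(tryRecoverLease(new FakePlainFs(), "/hbase/WALs/example"));
  }
}
```

The caller never names the concrete file system class, so supporting another
file system this way only requires its jars under lib at runtime. A path
capability flag, if Hadoop adds one, could replace the method lookup as the
probe.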
