[ https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827003#comment-16827003 ]
Sean Busbey commented on HBASE-22149:
-------------------------------------

Given that this a) requires Hadoop 3 and b) is an experimental approach whose sustainability in production we're not yet sure of, I'd much prefer a different repository.

Is anyone opposed to landing this in a new repository, i.e. `hbase-filesystem`? Provided it includes instructions for installation and setup, we wouldn't even need to add the artifacts from that repository as a dependency for the main repo's binary artifacts.

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> ------------------------------------------------------------------------
>
>                 Key: HBASE-22149
>                 URL: https://issues.apache.org/jira/browse/HBASE-22149
>             Project: HBase
>          Issue Type: New Feature
>          Components: Filesystem Integration
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Critical
>         Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch,
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase-4.patch,
> HBASE-22149-hbase-5.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on
> object stores. There has been some thought in the past about adding the
> required semantics to S3Guard, but I have some concerns about that. First,
> it's mixing complicated solutions to different problems (bridging the gap
> between a flat namespace and a hierarchical namespace vs. solving
> inconsistency). Second, it's S3-specific, whereas other object stores could
> use virtually identical solutions. And third, we can't do things like atomic
> renames in a true sense. There would have to be some trade-offs specific to
> HBase's needs and it's better if we can solve that in an HBase-specific
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and
> considered (HBASE-20431, for one), and maybe that's the right way forward
> long-term, but it certainly seems to be a hard problem and hasn't been done
> yet. But I don't know enough of all the internal considerations to make much
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance
> and provides locking of FileSystem operations to ensure correct semantics.
> Locking could quite possibly be done on the same ZooKeeper ensemble as an
> HBase cluster already uses (I'm sure there are some performance
> considerations here that deserve more attention). I've put together a
> proof-of-concept on which I've tested some aspects of atomic renames and
> atomic file creates. Both of these tests fail reliably on a naked s3a
> instance. I've also done a small YCSB run against a small cluster to
> sanity-check other functionality, and it was successful. I will post the
> patch and my laundry list of things that still need work. The WAL is still
> placed on HDFS, but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's
> purely for my convenience in putting it together quickly, as that's where I
> mostly work. I actually think long-term, if this is accepted as a good
> solution, it makes sense for it to live in HBase (or its own repository). It
> only depends on stable, public APIs in Hadoop and is targeted entirely at
> HBase's needs, so it should be able to iterate on the HBase community's terms
> alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based
> FileSystem that keeps hierarchical metadata in a more appropriate store that
> would allow the required transactions (maybe a special table in HBase could
> provide that store itself for other tables), and stores the underlying files
> with unique identifiers on S3.
> This allows renames to actually become fast
> instead of just large atomic operations. It does however place a strong
> dependency on the metadata store. I have not explored this idea much. My
> current proof-of-concept has been pleasantly simple, so I think it's the
> right solution unless it proves unable to provide the required performance
> characteristics.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
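
For readers following the quoted description, the "FileSystem that wraps another FileSystem and locks its operations" idea can be sketched roughly as below. This is not the HBOSS patch and does not use Hadoop's FileSystem API: `Store`, `InMemoryStore`, `LockingStore`, `createExclusive` and `raceCreate` are hypothetical names made up for illustration. The sketch also assumes a single in-process `ReentrantLock` as a stand-in for the ZooKeeper-based locking mentioned in the description, and one coarse lock where a real implementation would presumably lock per path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch only: not the HBOSS code and not Hadoop's FileSystem API. */
final class LockingStoreSketch {

    /** Minimal stand-in for the wrapped file system (assumed interface). */
    interface Store {
        boolean exists(String path);
        void put(String path, String data);
        String get(String path);
        void delete(String path);
    }

    /** Flat key/value map playing the role of an object store. */
    static final class InMemoryStore implements Store {
        private final Map<String, String> objects = new ConcurrentHashMap<>();
        public boolean exists(String path) { return objects.containsKey(path); }
        public void put(String path, String data) { objects.put(path, data); }
        public String get(String path) { return objects.get(path); }
        public void delete(String path) { objects.remove(path); }
    }

    /** Wrapper that serializes multi-step operations on the wrapped store. */
    static final class LockingStore {
        private final Store inner;
        // Assumption: an in-process lock. Across region servers this would
        // have to be a distributed lock (the description suggests ZooKeeper).
        private final ReentrantLock lock = new ReentrantLock();

        LockingStore(Store inner) { this.inner = inner; }

        /** Check-then-write under the lock, so only one caller can create a path. */
        boolean createExclusive(String path, String data) {
            lock.lock();
            try {
                if (inner.exists(path)) return false;
                inner.put(path, data);
                return true;
            } finally {
                lock.unlock();
            }
        }

        /** Copy-then-delete rename; callers of this wrapper never see a half-done state. */
        boolean rename(String src, String dst) {
            lock.lock();
            try {
                if (!inner.exists(src) || inner.exists(dst)) return false;
                inner.put(dst, inner.get(src));
                inner.delete(src);
                return true;
            } finally {
                lock.unlock();
            }
        }

        /** Reads take the same lock so they cannot interleave with a rename. */
        String read(String path) {
            lock.lock();
            try {
                return inner.get(path);
            } finally {
                lock.unlock();
            }
        }
    }

    /** Starts several threads racing to create the same path; returns how many succeeded. */
    static int raceCreate(LockingStore fs, String path, int threads) {
        AtomicInteger wins = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String data = "writer-" + i;
            workers[i] = new Thread(() -> {
                if (fs.createExclusive(path, data)) wins.incrementAndGet();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return wins.get();
    }

    public static void main(String[] args) {
        LockingStore fs = new LockingStore(new InMemoryStore());
        System.out.println("successful creates: " + raceCreate(fs, "/hbase/example", 8));
        System.out.println("renamed: " + fs.rename("/hbase/example", "/hbase/example.done"));
    }
}
```

Because the existence check and the write happen under the same lock, exactly one of the racing creators succeeds. The rename is only atomic with respect to clients that go through the wrapper; anything talking to the underlying store directly could still observe the intermediate state, which is in line with the description's point that atomic renames "in a true sense" are not available.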