All,

Some of AWS's back-end services use a version of Accumulo modified to use
Amazon S3 as its storage system. Amazon engineers forked Accumulo 2.0 and
merged that S3 support into the fork <https://github.com/cmilbert/accumulo/>.
Chris Milbert is the lead Amazon engineer who did the integration. Chris
and I would like to jump-start the conversation about how best to open a
pull request that brings these changes into Accumulo 2.1.

Mike Wall suggested using this as an opportunity to abstract out the
storage system of Accumulo and make it pluggable. He suggested the
following broad steps:

   1. Identify everything HDFS provides, such as read, write,
   replication, and failover.
   2. Abstract out a file system interface with hooks for all of those
   things, one that does not require loading Hadoop jars.
   3. Plug in HDFS as the default implementation of that interface, hiding
   all Hadoop jars there.
   4. Make another implementation that plugs in S3 and make it an optional
   configuration.
   5. Run tests to make sure we didn't break things with HDFS.
   6. Run tests to see if S3 meets all the requirements.
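
To make steps 2 through 4 a little more concrete, here is a rough sketch of
what a pluggable storage interface might look like. Every name in it
(StorageBackend, InMemoryBackend, StorageBackends) is hypothetical and is not
an existing Accumulo or Hadoop API. The in-memory class only stands in for the
HDFS and S3 implementations so that the sketch compiles without Hadoop or AWS
jars, and replication and failover hooks are left out for brevity.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical interface (step 2): no Hadoop types appear in the signatures,
// so callers would not need Hadoop jars on their classpath.
interface StorageBackend {
    void write(String path, byte[] data);
    byte[] read(String path);
    boolean exists(String path);
    void delete(String path);
}

// Stand-in implementation for illustration only. A real HDFS or S3 backend
// would wrap its own client library behind the same interface (steps 3 and 4).
final class InMemoryBackend implements StorageBackend {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    @Override
    public void write(String path, byte[] data) {
        files.put(path, data.clone());
    }

    @Override
    public byte[] read(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new UncheckedIOException(new FileNotFoundException(path));
        }
        return data.clone();
    }

    @Override
    public boolean exists(String path) {
        return files.containsKey(path);
    }

    @Override
    public void delete(String path) {
        files.remove(path);
    }
}

// Hypothetical registry: a configured name selects the implementation,
// which is one way the S3 backend could be made optional.
final class StorageBackends {
    private static final Map<String, Supplier<StorageBackend>> FACTORIES =
            new ConcurrentHashMap<>();

    private StorageBackends() {}

    static void register(String name, Supplier<StorageBackend> factory) {
        FACTORIES.put(name, factory);
    }

    static StorageBackend create(String name) {
        Supplier<StorageBackend> factory = FACTORIES.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown storage backend: " + name);
        }
        return factory.get();
    }
}
```

With something along these lines, the HDFS implementation could be registered
under a default name and the S3 one under another, and the tests in steps 5
and 6 could run the same suite against each registered backend.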

Ed Coleman also suggested first forking Accumulo 2.1 and merging the S3
changes into it.

Chris and I look forward to the discussion on how best to add S3 support to
Accumulo.

Thanks,
Jeff
-- 
Jeff Kubina
