Hi Jeff,

What would be the difference between this path and what can be accomplished by using a connector based on the Hadoop FileSystem interface to talk to S3? Is it because of the consistency limitations with s3a:// (https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html)?
As you probably know, for Azure we went with the abfss:// connector provided as part of hadoop-azure (https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html) with minimal effort. Just wondering what the key difference here is for S3.

Thanks!
Arvind.

-----Original Message-----
From: Jeff Kubina <jeff.kub...@gmail.com>
Sent: Tuesday, July 27, 2021 10:16 AM
To: dev@accumulo.apache.org
Subject: [EXTERNAL] Accumulo with Native S3 Support

All,

Some of AWS's back end services use a version of Accumulo modified to use Amazon's S3 as its storage system. Amazon engineers forked Accumulo 2.0 and merged that S3 support into it <https://github.com/cmilbert/accumulo/>. Chris Milbert is the lead Amazon engineer who did the integration. Chris and I would like to jump start the conversation about how best to initiate the pull request for these changes into Accumulo 2.1.

Mike Wall suggested using this as an opportunity to abstract out the storage system of Accumulo and make it pluggable. He suggested the following broad steps:

1. Identify all the things HDFS provides, such as read, write, replication, and failover.
2. Abstract out a file system interface with hooks for all those things (one that does not require loading Hadoop jars).
3. Plug in HDFS as the default implementation of that interface, hiding all Hadoop jars there.
4. Make another implementation that plugs in S3 and make it optionally configured.
5. Run tests to make sure we didn't break things with HDFS.
6. Run tests to see if S3 meets all the requirements.

Ed Coleman also suggested first forking Accumulo 2.1 and merging the S3 changes into it.

Chris and I look forward to the discussion on how best to add S3 support to Accumulo.

Thanks,
Jeff

--
Jeff Kubina
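
For readers following the thread, the pluggable-storage steps attributed to Mike Wall in the quoted message (an interface free of Hadoop types, a default implementation, and an optional alternative selected by configuration) might look roughly like the sketch below. This is only an illustration under stated assumptions: every name in it (PluggableStorageSketch, StorageBackend, InMemoryBackend, register, create) is hypothetical and does not come from Accumulo, Hadoop, or the S3 fork, and the in-memory class merely stands in for a real HDFS or S3 implementation.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative only: all names are hypothetical and are not taken from
// Accumulo, Hadoop, or the S3 fork discussed in the thread.
class PluggableStorageSketch {

  /** Step 2: the operations the core needs, with no Hadoop types in the signatures. */
  interface StorageBackend {
    void write(String path, byte[] data);
    byte[] read(String path);
    boolean exists(String path);
    boolean delete(String path);
  }

  /** Stand-in for a real implementation (HDFS, S3); keeps all data in a map. */
  static final class InMemoryBackend implements StorageBackend {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    @Override
    public void write(String path, byte[] data) {
      files.put(path, data.clone()); // copy so later caller mutations do not leak in
    }

    @Override
    public byte[] read(String path) {
      byte[] data = files.get(path);
      if (data == null) {
        throw new UncheckedIOException(new FileNotFoundException(path));
      }
      return data.clone();
    }

    @Override
    public boolean exists(String path) {
      return files.containsKey(path);
    }

    @Override
    public boolean delete(String path) {
      return files.remove(path) != null; // true only if something was removed
    }
  }

  /** Steps 3 and 4: implementations are registered by name and chosen by configuration. */
  private static final Map<String, Supplier<StorageBackend>> REGISTRY = new ConcurrentHashMap<>();

  static void register(String name, Supplier<StorageBackend> factory) {
    REGISTRY.put(name, factory);
  }

  static StorageBackend create(String name) {
    Supplier<StorageBackend> factory = REGISTRY.get(name);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown storage backend: " + name);
    }
    return factory.get();
  }

  public static void main(String[] args) {
    register("memory", InMemoryBackend::new);
    StorageBackend storage = create("memory");
    storage.write("tables/1/example.rf", "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(storage.read("tables/1/example.rf"), StandardCharsets.UTF_8));
    System.out.println(storage.exists("tables/1/example.rf"));
    System.out.println(storage.delete("tables/1/example.rf"));
    System.out.println(storage.exists("tables/1/example.rf"));
  }
}
```

In a design like this, steps 5 and 6 of the list would amount to running the same test suite against each registered implementation. A production interface would likely also need to cover the replication and failover behavior named in step 1, which this sketch leaves out.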