Thank you everyone for reviewing and voting on the S3Guard feature branch merge.
It looks like the Vote was a success. We have six binding +1's (Steve
Loughran, Sean Mackrory, Mingliang Liu, Sanjay Radia, Kihwal Lee, and Lei
(Eddy) Xu) and zero -1's.

I will coordinate w/ Steve L to get this committed to trunk. I think we are
going to bring it to branch-2 as well.

-AF

On Thu, Aug 24, 2017 at 1:54 PM, Mingliang Liu <lium...@gmail.com> wrote:

> Thanks Andrew. Arpit also told me about this but I forgot to bring it up
> here.
>
> Best,
>
> > On Aug 24, 2017, at 10:59 AM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> >
> > FYI that committer +1s are binding on merges, so Sean and Mingliang's +1s
> > can be upgraded to binding.
> >
> > On Thu, Aug 24, 2017 at 6:09 AM, Kihwal Lee <kih...@oath.com.invalid> wrote:
> >
> >> +1 (binding)
> >> Great work guys!
> >>
> >> On Thu, Aug 24, 2017 at 5:01 AM, Steve Loughran <ste...@hortonworks.com> wrote:
> >>
> >>> On 23 Aug 2017, at 19:21, Aaron Fabbri <fab...@cloudera.com> wrote:
> >>>
> >>> On Tue, Aug 22, 2017 at 10:24 AM, Steve Loughran <ste...@hortonworks.com> wrote:
> >>> video being processed: https://www.youtube.com/watch?v=oIe5Zl2YsLE&feature=youtu.be
> >>>
> >>> Awesome demo Steve, thanks for doing this. Particularly glad to see folks
> >>> using and extending the failure injection client.
> >>>
> >>> The HADOOP-13786 iteration turns on throttle event generation. All the new
> >>> committer stuff is ready for it, but all the existing S3A FS ops react to a
> >>> throttle exception by failing, when they need to just back off a bit. This
> >>> complicates testing as I have to explicitly turn off fault injection for
> >>> setup & teardown
> >>>
> >>> Demoing the CLI tool was great as well.
> >>>
> >>> I'm going to have to do another iteration on that CLI tool post-merge, as
> >>> I had one big problem: working out if the bucket and all the binding
> >>> settings meant it was "guarded". I think we'll need to track what issues
> >>> like that crop up in the field and add the diagnostics/other options.
> >>>
> >>> +I think another one that'd be useful would be to enum all s3guard DDB
> >>> tables in a region/globally & list their allocated IOPs. I know the AWS UI
> >>> can list tables by region, but you need to look around every region to find
> >>> out if you've accidentally created one. If you enum all tables & look for
> >>> an s3guard version marker, then you can identify tables.
> >>>
> >>> Wanted to mention two things:
> >>>
> >>> 1. Authoritative mode is not fully implemented yet with Dynamo (it needs
> >>> to persist an extra bit for directories). I do have an auth-mode patch
> >>> (done for a hackathon) that I need to post which shows large performance
> >>> improvements over what S3Guard has today. As you said, we don't consider
> >>> authoritative mode ready for production yet: we want to play with it more
> >>> and improve the prune algorithm first. Authoritative mode can be thought
> >>> of as a nice bonus in the future: The main goal of S3Guard v1 is to fix the
> >>> get / list consistency issues you mentioned, which it does well.
> >>>
> >>> we need to call that out in the release notes.
> >>>
> >>> 2. Also wanted to thank Lei (Eddy) Xu, he was very active during early
> >>> design and contributed some patches as well.
> >>>
> >>> good point. Lei: you will get a special mention the next time I do the demo
> >>>
> >>> Again, great demo, enjoyed it!
> >>>
> >>> -AF
> >>>
> >>> it's actually quite hard to show any benefits of s3guard on the command
> >>> line, so I've ended up showing some scala tests where I turn on the
> >>> (bundled) inconsistent AWS client to show how you then need to enable
> >>> s3guard to make the stack traces go away
> >>>
> >>> On 22 Aug 2017, at 11:17, Steve Loughran <ste...@hortonworks.com> wrote:
> >>>
> >>> +1 (binding)
> >>>
> >>> I'm happy with it; it's a great piece of work by (in no particular order):
> >>> Chris Nauroth, Aaron Fabbri, Sean Mackrory & Mingliang Liu, plus a few bits
> >>> in the corners where I got to break things while they were all asleep. Also
> >>> deserving a mention: Thomas Demoor & Ewan Higgs @ WDC for consultancy on
> >>> the corners of S3, everyone who tested it (including our QA team), Sanjay
> >>> Radia, & others.
> >>>
> >>> I've already done a couple of iterations of fixing checkstyles & code
> >>> reviews, so I think it is ready. I also have a branch-2 patch based on
> >>> earlier work by Mingliang, for people who want that.
> >>>
> >>> On 17 Aug 2017, at 23:07, Aaron Fabbri <fab...@cloudera.com> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge the
> >>> HADOOP-13345 feature branch into trunk.
> >>>
> >>> This branch contains the new S3Guard feature which adds metadata
> >>> consistency features to the S3A client.
> >>> Formatted site documentation can be found here:
> >>>
> >>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
> >>>
> >>> The current patch against trunk is posted here:
> >>>
> >>> https://issues.apache.org/jira/browse/HADOOP-13998
> >>>
> >>> The branch modifies the s3a portion of the hadoop-tools/hadoop-aws module:
> >>>
> >>> - The feature is off by default, and care has been taken to ensure it has
> >>>   no impact when disabled.
> >>> - S3Guard can be enabled with the production database which is backed by
> >>>   DynamoDB, or with a local, in-memory implementation that facilitates
> >>>   integration testing without having to pay for a database.
> >>> - getFileStatus() as well as directory listing consistency has been
> >>>   implemented and thoroughly tested, including delete tracking.
> >>> - Convenient Maven profiles for testing with and without S3Guard.
> >>> - New failure injection code and integration tests that exercise it. We
> >>>   use timers and a wrapper around the Amazon SDK client object to force
> >>>   consistency delays to occur. This allows us to assert that S3Guard works
> >>>   as advertised. This will be extended with more types of failure injection
> >>>   to continue hardening the S3A client.
> >>>
> >>> Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor
> >>> changes:
> >>>
> >>> - core-default.xml defaults and documentation for s3guard parameters.
> >>> - A couple additional FS contract test cases around rename.
> >>> - More goodies in LambdaTestUtils
> >>> - A new CLI tool for inspecting and manipulating S3Guard features,
> >>>   including the backing MetadataStore database.
> >>>
> >>> This branch has seen extensive testing as well as use in production. This
> >>> branch makes significant improvements to S3A's test toolkit as well.
> >>>
> >>> Performance is typically on par with, and in some cases better than, the
> >>> existing S3A code without S3Guard enabled.
> >>>
> >>> This feature was developed with contributions and feedback from many
> >>> people. I'd like to thank everyone who worked on HADOOP-13345 as well as
> >>> all of those who contributed feedback and work on the original design
> >>> document.
> >>>
> >>> This is the first major Apache Hadoop project I've worked on from start to
> >>> finish, and I've really enjoyed it. Please shout if I've missed anything
> >>> important here or in the VOTE process.
> >>>
> >>> Cheers,
> >>> Aaron Fabbri