Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-09-01 Thread Steve Loughran

On 25 Aug 2017, at 20:22, Aaron Fabbri wrote:

> Thank you everyone for reviewing and voting on the S3Guard feature branch merge.
>
> It looks like the Vote was a success. We have six binding +1's (Steve Loughran,
> Sean Mackrory, Mingliang Liu, Sanjay Radia, Kihwal Lee, and Lei (Eddy) Xu) and
> zero -1's.
>
> I will coordinate w/ Steve L to get this committed to trunk.  I think we are
> going to bring it to branch-2 as well.
>
> -AF




Update: this is now committed to trunk!


This was a major piece of work, and it's been a great time working with people.

Chris Nauroth, Aaron Fabbri, Mingliang Liu, Lei (Eddy) Xu, Sean Mackrory, & 
others, as well as the effort of everyone who tested this, helped with the 
documentation, complained when it broke, etc.

Special mention: Thomas Demoor & Ewan Higgs for explaining the low-level 
details of S3 protocols in a way that AWS themselves don't document.

It's gone in as one big patch & not many small ones. We'd always planned it
that way: we regularly merged trunk into the branch, and its history has
regressions and their fixes in it, so replaying it commit by commit would have
been a mess. Now it's a single patch and you know who to complain to when it
doesn't work. Sorry.


What's next in S3A land? Well, let me see:

* HADOOP-14825 is where all the unfinished S3Guard work goes, with HADOOP-14220 
being some CLI improvements I've been adding based on recent use.
* HADOOP-13786 is my big "0-rename committer". It's been a regularly rebased 
branch atop the HADOOP-13345 branch, alongside an external downstream module to 
test the Spark integration (we know the mapred v2 API stuff works; it's only 
Spark & Parquet that don't play along).
* With S3Guard in, you can now turn on listing inconsistency in the client; in 
HADOOP-13786 I've added more fault injection in the form of "service throttled" 
responses. S3A doesn't handle them yet: the new commit operations do, but every 
single FileSystem API call needs to as well. Same for other failures. Hence 
HADOOP-14531.
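The retry behaviour asked for here (back off and try again on a "service throttled" response instead of failing) is commonly done with exponential backoff plus jitter. A minimal Python sketch, assuming a hypothetical `ThrottledError` that the wrapped call raises when throttled; the function name, attempt limit and delays are illustrative only and are not S3A's retry code or configuration:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a "service throttled" (e.g. HTTP 503) response."""


def retry_on_throttle(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, backing off exponentially with full jitter when throttled.

    Other exceptions propagate immediately. Once `max_attempts` calls have
    all been throttled, the last ThrottledError is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # The delay ceiling doubles on each attempt (capped at max_delay);
            # sleeping a random fraction of it spreads out retries from many clients.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Usage: an operation that is throttled twice, then succeeds.
calls = {"n": 0}


def flaky_list():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError("503 Slow Down")
    return ["part-0000", "part-0001"]


delays = []  # record the chosen delays instead of really sleeping
result = retry_on_throttle(flaky_list, sleep=delays.append)
print(result, len(delays))  # ['part-0000', 'part-0001'] 2
```

Passing `sleep` in as a parameter lets tests record the chosen delays instead of actually waiting.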

As usual, people willing to code, document & test are welcome. Go on: download 
trunk and test with S3Guard enabled. Now is the time to complain that things 
don't work!

-Steve.


Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-25 Thread Aaron Fabbri
Thank you everyone for reviewing and voting on the S3Guard feature branch
merge.

It looks like the Vote was a success. We have six binding +1's (Steve
Loughran, Sean Mackrory, Mingliang Liu, Sanjay Radia, Kihwal Lee, and Lei
(Eddy) Xu) and zero -1's.

I will coordinate w/ Steve L to get this committed to trunk.  I think we
are going to bring it to branch-2 as well.

-AF



Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-24 Thread Mingliang Liu
Thanks Andrew. Arpit also told me about this but I forgot to bring it up here.

Best,

> On Aug 24, 2017, at 10:59 AM, Andrew Wang  wrote:
> 
> FYI that committer +1s are binding on merges, so Sean and Mingliang's +1s
> can be upgraded to binding.

Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-24 Thread Andrew Wang
FYI that committer +1s are binding on merges, so Sean and Mingliang's +1s
can be upgraded to binding.


Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-24 Thread Kihwal Lee
+1 (binding)
Great work guys!

Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-24 Thread Steve Loughran

On 23 Aug 2017, at 19:21, Aaron Fabbri wrote:


> On Tue, Aug 22, 2017 at 10:24 AM, Steve Loughran wrote:
>> video being processed: https://www.youtube.com/watch?v=oIe5Zl2YsLE=youtu.be


> Awesome demo Steve, thanks for doing this.  Particularly glad to see folks 
> using and extending the failure injection client.

The HADOOP-13786 iteration turns on throttle event generation. All the new 
committer stuff is ready for it, but all the existing S3A FS ops react to a 
throttle exception by failing, when they need to just back off a bit. This 
complicates testing as I have to explicitly turn off fault injection for setup 
& teardown.
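The testing pattern described above (a client wrapper that injects throttle failures, with injection switched off around setup and teardown) can be sketched as follows. This is a Python illustration with hypothetical class and method names, not the Java failure-injection client bundled with S3A:

```python
import random
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttle response from the object store."""


class InMemoryStore:
    """Toy object store playing the role of the real client being wrapped."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def list_keys(self, prefix: str) -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


class ThrottlingStore:
    """Wrapper that fails a random fraction of calls with ThrottledError."""

    def __init__(self, inner: InMemoryStore, probability: float, seed: int = 0) -> None:
        self._inner = inner
        self.probability = probability
        self._rng = random.Random(seed)  # seeded so test runs are repeatable

    def _maybe_throttle(self) -> None:
        if self._rng.random() < self.probability:
            raise ThrottledError("simulated throttle")

    def put(self, key: str, data: bytes) -> None:
        self._maybe_throttle()
        self._inner.put(key, data)

    def list_keys(self, prefix: str) -> List[str]:
        self._maybe_throttle()
        return self._inner.list_keys(prefix)

    @contextmanager
    def injection_disabled(self) -> Iterator[None]:
        """Temporarily switch fault injection off, e.g. around test setup/teardown."""
        saved = self.probability
        self.probability = 0.0
        try:
            yield
        finally:
            self.probability = saved


# Usage: every call is throttled unless injection is disabled.
store = ThrottlingStore(InMemoryStore(), probability=1.0)
with store.injection_disabled():
    store.put("test/setup-file", b"x")  # setup must not be throttled
try:
    store.list_keys("test/")
    throttled = False
except ThrottledError:
    throttled = True
print(throttled)  # True
```

A context manager keeps the "turn it off, then restore it" step from leaking into the next test if setup fails.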


> Demoing the CLI tool was great as well.


I'm going to have to do another iteration on that CLI tool post-merge, as I had 
one big problem: working out if the bucket and all the binding settings meant 
it was "guarded". I think we'll need to track what issues like that crop up in 
the field and add the diagnostics/other options.
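One way to frame the "is this bucket guarded?" question: resolve the metadata store setting for the bucket, letting a per-bucket `fs.s3a.bucket.<bucket>.` key override the base `fs.s3a.` key, and treat the null store as "not guarded". A Python sketch over a plain dict standing in for a Hadoop `Configuration`; it assumes the null store is what applies when nothing is set, and it only checks `fs.s3a.metadatastore.impl`, whereas a real diagnostic would also have to look at the table, region and other binding settings:

```python
from typing import Dict

NULL_STORE = "org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore"
DYNAMO_STORE = "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore"


def effective_option(conf: Dict[str, str], bucket: str, option: str, default: str) -> str:
    """Resolve an S3A option for one bucket.

    A per-bucket key fs.s3a.bucket.<bucket>.<option> wins over the
    base key fs.s3a.<option>; if neither is set, `default` is used.
    """
    per_bucket = f"fs.s3a.bucket.{bucket}.{option}"
    if per_bucket in conf:
        return conf[per_bucket].strip()
    return conf.get(f"fs.s3a.{option}", default).strip()


def is_guarded(conf: Dict[str, str], bucket: str) -> bool:
    """A bucket counts as guarded here if its metadata store is not the null store."""
    impl = effective_option(conf, bucket, "metadatastore.impl", NULL_STORE)
    return impl != NULL_STORE


# Usage: S3Guard on globally, but one (hypothetical) bucket opted out.
conf = {
    "fs.s3a.metadatastore.impl": DYNAMO_STORE,
    "fs.s3a.bucket.public-data.metadatastore.impl": NULL_STORE,
}
print(is_guarded(conf, "my-data"))      # True: inherits the global setting
print(is_guarded(conf, "public-data"))  # False: per-bucket override
```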

I think another useful one would be to enumerate all S3Guard DynamoDB tables 
in a region (or globally) & list their allocated IOPS. I know the AWS UI can 
list tables by region, but you need to look around every region to find out if 
you've accidentally created one. If you enumerate all tables & look for an 
S3Guard version marker, then you can identify the S3Guard ones.
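The proposed scan amounts to: walk every region, list its DynamoDB tables, keep those containing an S3Guard version marker, and report their provisioned capacity. A minimal Python sketch of that filtering step; the AWS calls are replaced by a hypothetical in-memory inventory so it runs offline, and the marker check is passed in as a function rather than guessing at the marker's actual key:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TableInfo:
    region: str
    name: str
    read_capacity: int   # provisioned read capacity units
    write_capacity: int  # provisioned write capacity units


def find_s3guard_tables(
    inventory: Dict[str, List[TableInfo]],
    has_version_marker: Callable[[TableInfo], bool],
) -> List[TableInfo]:
    """Return every table, across all regions, that carries a version marker."""
    return [
        table
        for region in sorted(inventory)
        for table in inventory[region]
        if has_version_marker(table)
    ]


def report(tables: List[TableInfo]) -> List[str]:
    """One line per table: region, name and allocated read/write capacity."""
    return [
        f"{t.region}\t{t.name}\tread={t.read_capacity}\twrite={t.write_capacity}"
        for t in tables
    ]


# Usage with a hypothetical inventory; pretend the marker check matched two tables.
inventory = {
    "us-west-2": [TableInfo("us-west-2", "s3guard-prod", 500, 100),
                  TableInfo("us-west-2", "orders", 50, 50)],
    "eu-west-1": [TableInfo("eu-west-1", "s3guard-test", 10, 10)],
}
marked = {"s3guard-prod", "s3guard-test"}
found = find_s3guard_tables(inventory, lambda t: t.name in marked)
for line in report(found):
    print(line)
```

Against real AWS, the inventory would come from the DynamoDB ListTables and DescribeTable calls, made once per region.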

> Wanted to mention two things:
>
> 1. Authoritative mode is not fully implemented yet with Dynamo (it needs to
> persist an extra bit for directories).  I do have an auth-mode patch (done for
> a hackathon) that I need to post which shows large performance improvements
> over what S3Guard has today.  As you said, we don't consider authoritative mode
> ready for production yet: we want to play with it more and improve the prune
> algorithm first.  Authoritative mode can be thought of as a nice bonus in the
> future: The main goal of S3Guard v1 is to fix the get / list consistency issues
> you mentioned, which it does well.


we need to call that out in the release notes.

> 2. Also wanted to thank Lei (Eddy) Xu, he was very active during early design
> and contributed some patches as well.


good point. Lei: you will get a special mention the next time I do the demo


> Again, great demo, enjoyed it!
>
> -AF


>> it's actually quite hard to show any benefits of S3Guard on the command line,
>> so I've ended up showing some Scala tests where I turn on the (bundled)
>> inconsistent AWS client to show how you then need to enable S3Guard to make
>> the stack traces go away.



Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-23 Thread sanjay Radia

+1 (binding)
Thanks to the community for all the hard work that went into this critical feature.


sanjay
> 
> 
> On 22 Aug 2017, at 11:17, Steve Loughran wrote:
> 
> +1 (binding)
> 
> I'm happy with it; it's a great piece of work by (in no particular order):
> Chris Nauroth, Aaron Fabbri, Sean Mackrory & Mingliang Liu, plus a few bits in
> the corners where I got to break things while they were all asleep. Also
> deserving a mention: Thomas Demoor & Ewan Higgs @ WDC for consultancy on the
> corners of S3, everyone who tested it (including our QA team), Sanjay Radia,
> & others.
> 
> I've already done a couple of iterations of fixing checkstyle issues & code
> review comments, so I think it is ready. I also have a branch-2 patch based on
> earlier work by Mingliang, for people who want that.
> 
> 
> 
> 
> On 17 Aug 2017, at 23:07, Aaron Fabbri wrote:
> 
> Hello,
> 
> I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge the
> HADOOP-13345 feature branch into trunk.
> 
> This branch contains the new S3Guard feature which adds metadata
> consistency features to the S3A client.  Formatted site documentation can
> be found here:
> 
> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
> 
> The current patch against trunk is posted here:
> 
> https://issues.apache.org/jira/browse/HADOOP-13998
> 
> The branch modifies the s3a portion of the hadoop-tools/hadoop-aws module:
> 
> - The feature is off by default, and care has been taken to ensure it has
> no impact when disabled.
> - S3Guard can be enabled with the production database which is backed by
> DynamoDB, or with a local, in-memory implementation that facilitates
> integration testing without having to pay for a database.
> - getFileStatus() as well as directory listing consistency has been
> implemented and thoroughly tested, including delete tracking.
> - Convenient Maven profiles for testing with and without S3Guard.
> - New failure injection code and integration tests that exercise it.  We
> use timers and a wrapper around the Amazon SDK client object to force
> consistency delays to occur.  This allows us to assert that S3Guard works
> as advertised.  This will be extended with more types of failure injection
> to continue hardening the S3A client.
> 
> Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor
> changes:
> 
> - core-default.xml defaults and documentation for s3guard parameters.
> - A couple of additional FS contract test cases around rename.
> - More goodies in LambdaTestUtils
> - A new CLI tool for inspecting and manipulating S3Guard features,
> including the backing MetadataStore database.
> 
> This branch has seen extensive testing as well as use in production.  This
> branch makes significant improvements to S3A's test toolkit as well.
> 
> Performance is typically on par with, and in some cases better than, the
> existing S3A code without S3Guard enabled.
> 
> This feature was developed with contributions and feedback from many
> people.  I'd like to thank everyone who worked on HADOOP-13345 as well as
> all of those who contributed feedback and work on the original design
> document.
> 
> This is the first major Apache Hadoop project I've worked on from start to
> finish, and I've really enjoyed it.  Please shout if I've missed anything
> important here or in the VOTE process.
> 
> Cheers,
> Aaron Fabbri
> 
> 
> -
> To unsubscribe, e-mail: 
> common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: 
> common-dev-h...@hadoop.apache.org
> 
> 
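The vote email above makes two claims that fit together: the failure-injection wrapper delays changes before they show up in listings, and S3Guard's metadata store repairs listings, including delete tracking. The Python sketch below models both in one process. Everything in it is hypothetical and simplified: delays are counted in logical ticks rather than real timers, the "metadata store" is a dict mapping each key to live or tombstone, and old entries are never pruned:

```python
from typing import Dict, List, Set, Tuple


class DelayedListingStore:
    """Toy object store whose listings lag behind puts and deletes by `delay` ticks."""

    def __init__(self, delay: int) -> None:
        self.delay = delay
        self.now = 0                                    # logical clock, advanced by tick()
        self._listed: Set[str] = set()                  # what LIST currently returns
        self._pending: List[Tuple[int, str, str]] = []  # (visible_at, op, key)

    def tick(self, n: int = 1) -> None:
        """Advance time and apply every change whose delay has expired, in order."""
        self.now += n
        remaining = []
        for visible_at, op, key in self._pending:
            if visible_at > self.now:
                remaining.append((visible_at, op, key))
            elif op == "put":
                self._listed.add(key)
            else:
                self._listed.discard(key)
        self._pending = remaining

    def put(self, key: str) -> None:
        self._pending.append((self.now + self.delay, "put", key))

    def delete(self, key: str) -> None:
        self._pending.append((self.now + self.delay, "delete", key))

    def list_keys(self, prefix: str) -> Set[str]:
        return {k for k in self._listed if k.startswith(prefix)}


class GuardedClient:
    """Records its own changes in a metadata store and merges them into listings."""

    def __init__(self, store: DelayedListingStore) -> None:
        self.store = store
        self.entries: Dict[str, bool] = {}  # key -> True (exists) or False (tombstone)

    def put(self, key: str) -> None:
        self.store.put(key)
        self.entries[key] = True

    def delete(self, key: str) -> None:
        self.store.delete(key)
        self.entries[key] = False  # tombstone hides a stale listing entry

    def list_keys(self, prefix: str) -> Set[str]:
        listed = self.store.list_keys(prefix)
        created = {k for k, live in self.entries.items() if live and k.startswith(prefix)}
        deleted = {k for k, live in self.entries.items() if not live}
        return (listed | created) - deleted


# Usage
raw = DelayedListingStore(delay=3)
guarded = GuardedClient(raw)
guarded.put("data/a")
raw.tick(5)               # "data/a" is now visible to plain LIST calls too
guarded.put("data/b")     # not yet visible to LIST
guarded.delete("data/a")  # deletion not yet visible to LIST
print(sorted(raw.list_keys("data/")))      # ['data/a']  (stale)
print(sorted(guarded.list_keys("data/")))  # ['data/b']
```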





Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-23 Thread Aaron Fabbri
On Tue, Aug 22, 2017 at 10:24 AM, Steve Loughran 
wrote:

> video being processed: https://www.youtube.com/watch?v=oIe5Zl2YsLE=youtu.be
>
>
Awesome demo Steve, thanks for doing this.  Particularly glad to see folks
using and extending the failure injection client.  Demoing the CLI tool was
great as well.

Wanted to mention two things:

1. Authoritative mode is not fully implemented yet with Dynamo (it needs to
persist an extra bit for directories).  I do have an auth-mode patch (done
for a hackathon) that I need to post which shows large performance
improvements over what S3Guard has today.  As you said, we don't consider
authoritative mode ready for production yet: we want to play with it more
and improve the prune algorithm first.  Authoritative mode can be thought
of as a nice bonus in the future: The main goal of S3Guard v1 is to fix the
get / list consistency issues you mentioned, which it does well.

2. Also wanted to thank Lei (Eddy) Xu, he was very active during early
design and contributed some patches as well.
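The "extra bit for directories" is what lets a listing skip S3: if the metadata store records that it holds a directory's complete child list (authoritative), the listing can be served from the store alone; otherwise it still has to be merged with an S3 LIST. A hypothetical Python sketch of that decision; `DirEntry` and its `authoritative` flag are illustrative names, not the DynamoDB schema, and deletes are ignored for brevity:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class DirEntry:
    children: Set[str] = field(default_factory=set)
    authoritative: bool = False  # True only if `children` is known to be complete


def list_directory(
    path: str,
    metadata: Dict[str, DirEntry],
    s3_list: Callable[[str], Set[str]],
) -> Set[str]:
    """List a directory, skipping the S3 LIST call when the store is authoritative."""
    entry = metadata.get(path)
    if entry is not None and entry.authoritative:
        return set(entry.children)  # served entirely from the metadata store
    listed = set(s3_list(path))     # otherwise ask S3 ...
    if entry is not None:
        listed |= entry.children    # ... and add what the store knows about
    return listed


# Usage: count how often the (fake) S3 LIST is called.
s3_calls: List[str] = []


def fake_s3_list(path: str) -> Set[str]:
    s3_calls.append(path)
    return {"part-0000"}


metadata = {
    "/out": DirEntry({"part-0000", "part-0001"}, authoritative=True),
    "/tmp": DirEntry({"x"}),
}
print(sorted(list_directory("/out", metadata, fake_s3_list)))  # ['part-0000', 'part-0001']
print(sorted(list_directory("/tmp", metadata, fake_s3_list)))  # ['part-0000', 'x']
print(s3_calls)  # ['/tmp']
```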

Again, great demo, enjoyed it!

-AF



> its actually quite hard to show any benefits of s3guard on the command
> line, so I've ended up showing some scala tests where I turn on the
> (bundled) inconsistent AWS client to show how you then need to enable
> s3guard to make the stack traces go away
>
>
> On 22 Aug 2017, at 11:17, Steve Loughran > wrote:
>
> +1 (binding)
>
> I'm happy with it; it's a great piece of work by (in no particular order):
> Chris Nauroth, Aaron Fabbri, Sean McRory & Mingliang Liu. plus a few bits
> in the corners where I got to break things while they were all asleep. Also
> deserving a mention: Thomas Demoor & Ewan Higgs @ WDC for consultancy on
> the corners of S3, everyone who tested in (including our QA team), Sanjay
> Radia, & others.
>
> I've already done a couple of iterations of fixing checkstyle issues & code
> reviews, so I think it is ready. I also have a branch-2 patch based on
> earlier work by Mingliang, for people who want that.
>
>
>
>
> On 17 Aug 2017, at 23:07, Aaron Fabbri wrote:
>
> Hello,
>
> I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge the
> HADOOP-13345 feature branch into trunk.
>
> This branch contains the new S3Guard feature which adds metadata
> consistency features to the S3A client.  Formatted site documentation can
> be found here:
>
> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>
> The current patch against trunk is posted here:
>
> https://issues.apache.org/jira/browse/HADOOP-13998
>
> The branch modifies the s3a portion of the hadoop-tools/hadoop-aws module:
>
> - The feature is off by default, and care has been taken to ensure it has
> no impact when disabled.
> - S3Guard can be enabled with the production database which is backed by
> DynamoDB, or with a local, in-memory implementation that facilitates
> integration testing without having to pay for a database.
> - getFileStatus() as well as directory listing consistency has been
> implemented and thoroughly tested, including delete tracking.
> - Convenient Maven profiles for testing with and without S3Guard.
> - New failure injection code and integration tests that exercise it.  We
> use timers and a wrapper around the Amazon SDK client object to force
> consistency delays to occur.  This allows us to assert that S3Guard works
> as advertised.  This will be extended with more types of failure injection
> to continue hardening the S3A client.
>
> Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor
> changes:
>
> - core-default.xml defaults and documentation for s3guard parameters.
> - A couple of additional FS contract test cases around rename.
> - More goodies in LambdaTestUtils
> - A new CLI tool for inspecting and manipulating S3Guard features,
> including the backing MetadataStore database.
>
> This branch has seen extensive testing as well as use in production.  This
> branch makes significant improvements to S3A's test toolkit as well.
>
> Performance is typically on par with, and in some cases better than, the
> existing S3A code without S3Guard enabled.
>
> This feature was developed with contributions and feedback from many
> people.  I'd like to thank everyone who worked on HADOOP-13345 as well as
> all of those who contributed feedback and work on the original design
> document.
>
> This is the first major Apache Hadoop project I've worked on from start to
> finish, and I've really enjoyed it.  Please shout if I've missed anything
> important here or in the VOTE process.
>
> Cheers,
> Aaron Fabbri
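The failure-injection and listing-consistency points in the vote message can be illustrated together. The following is a minimal, self-contained Java sketch under stated assumptions, not the branch's code: InconsistentStore, ToyMetadataStore and GuardedClient are hypothetical names, S3 is modelled as an in-memory map, and time is a logical clock passed in explicitly instead of a real timer so the result is deterministic. The wrapper delays the visibility of new keys and keeps deleted keys listed for a while; the metadata store records every create and delete (deletes as tombstones) and is merged into each listing.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for S3 with injected listing inconsistency: a new key
// appears in listings only after delayMs, and a deleted key lingers for delayMs.
final class InconsistentStore {
    private final long delayMs;
    private final Map<String, Long> createdAt = new HashMap<>();
    private final Map<String, Long> deletedAt = new HashMap<>();

    InconsistentStore(long delayMs) { this.delayMs = delayMs; }

    void put(String key, long now) { createdAt.put(key, now); deletedAt.remove(key); }

    void delete(String key, long now) {
        if (createdAt.remove(key) != null) { deletedAt.put(key, now); }
    }

    // Eventually consistent LIST: what a client sees at logical time `now`.
    Set<String> list(long now) {
        Set<String> visible = new TreeSet<>();
        createdAt.forEach((k, t) -> { if (now - t >= delayMs) visible.add(k); });
        deletedAt.forEach((k, t) -> { if (now - t < delayMs) visible.add(k); });
        return visible;
    }
}

// Hypothetical metadata store: remembers creates, and deletes as tombstones.
final class ToyMetadataStore {
    private final Set<String> live = new HashSet<>();
    private final Set<String> tombstones = new HashSet<>();

    void recordPut(String key) { live.add(key); tombstones.remove(key); }
    void recordDelete(String key) { live.remove(key); tombstones.add(key); }

    // S3 listing, plus keys known to exist, minus keys known to be deleted.
    Set<String> merge(Set<String> s3Listing) {
        Set<String> result = new TreeSet<>(s3Listing);
        result.addAll(live);
        result.removeAll(tombstones);
        return result;
    }
}

// Client that writes through to both the (inconsistent) store and the metadata store.
final class GuardedClient {
    final InconsistentStore s3;
    final ToyMetadataStore ms = new ToyMetadataStore();

    GuardedClient(InconsistentStore s3) { this.s3 = s3; }

    void put(String key, long now) { s3.put(key, now); ms.recordPut(key); }
    void delete(String key, long now) { s3.delete(key, now); ms.recordDelete(key); }
    Set<String> rawList(long now) { return s3.list(now); }
    Set<String> guardedList(long now) { return ms.merge(s3.list(now)); }
}
```

For example, with a 5000 ms delay, writing a and b at t=0, then deleting a and writing c at t=10000, a listing at t=10001 returns [a, b] from the raw store but [b, c] through the guarded client. The real feature also covers getFileStatus() and persists its entries (in DynamoDB or the in-memory test store), which this sketch omits.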

Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-22 Thread Sean Mackrory
+1 (non-binding)

On Tue, Aug 22, 2017 at 10:24 AM, Steve Loughran 
wrote:

> video being processed: https://www.youtube.com/watch?v=oIe5Zl2YsLE
>
> it's actually quite hard to show any benefits of s3guard on the command
> line, so I've ended up showing some Scala tests where I turn on the
> (bundled) inconsistent AWS client to show how you then need to enable
> s3guard to make the stack traces go away


Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-22 Thread Steve Loughran
video being processed:
https://www.youtube.com/watch?v=oIe5Zl2YsLE

It's actually quite hard to show any benefits of S3Guard on the command line, so 
I've ended up showing some Scala tests where I turn on the (bundled) 
inconsistent AWS client, to show how you then need to enable S3Guard to make the 
stack traces go away.






Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-22 Thread Steve Loughran
+1 (binding)

I'm happy with it; it's a great piece of work by (in no particular order): 
Chris Nauroth, Aaron Fabbri, Sean Mackrory & Mingliang Liu, plus a few bits in 
the corners where I got to break things while they were all asleep. Also 
deserving a mention: Thomas Demoor & Ewan Higgs @ WDC for consultancy on the 
corners of S3, everyone who tested it (including our QA team), Sanjay Radia, & 
others.

I've already done a couple of iterations of fixing checkstyle issues & code 
review comments, so I think it is ready. I also have a branch-2 patch based on 
earlier work by Mingliang, for people who want that.


 




Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-21 Thread Lei Xu
+1 (binding)

S3Guard significantly extends the use cases of Hadoop with S3. I was
involved in the early days of the development.  Really appreciate the
high quality work and collaborative environment from the Hadoop
community.

Best,

On Mon, Aug 21, 2017 at 1:09 PM, Aaron Fabbri  wrote:
> + Eddy Xu (having list issues)
>
>
>
> On Mon, Aug 21, 2017 at 8:10 AM, Steve Loughran 
> wrote:
>>
>>
>> On 18 Aug 2017, at 17:51, John Zhuge wrote:
>>
>> That will be great. Please record it if possible.
>>
>> good idea. I'll do a video & demo, that way I can avoid fielding hard
>> questions
>>
>
> Hah!
>
> -AF
>
>
>>
>>
>> On Fri, Aug 18, 2017 at 4:12 AM, Steve Loughran wrote:
>>
>> I can do a demo of this next week if people are interested
>>
>> --
>> John
>>
>



-- 
Lei (Eddy) Xu
Software Engineer, Cloudera




Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-21 Thread Aaron Fabbri
+ Eddy Xu (having list issues)



On Mon, Aug 21, 2017 at 8:10 AM, Steve Loughran 
wrote:

>
> On 18 Aug 2017, at 17:51, John Zhuge wrote:
>
> That will be great. Please record it if possible.
>
> good idea. I'll do a video & demo, that way I can avoid fielding hard
> questions
>
>
Hah!

-AF



>
> On Fri, Aug 18, 2017 at 4:12 AM, Steve Loughran wrote:
>
> I can do a demo of this next week if people are interested
>
> --
> John
>
>


Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-21 Thread Steve Loughran

On 18 Aug 2017, at 17:51, John Zhuge wrote:

That will be great. Please record it if possible.

good idea. I'll do a video & demo, that way I can avoid fielding hard questions


On Fri, Aug 18, 2017 at 4:12 AM, Steve Loughran wrote:

I can do a demo of this next week if people are interested


--
John



Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-19 Thread Mingliang Liu
+1 (non-binding)

I also worked on this project from start to finish, and I really enjoyed the 
collaboration in the community. The feature solves the very important and 
challenging consistency problem stated in the design doc. All patches were 
reviewed by feature/trunk committers, and we have been testing it with 
real-world applications. Overall I think it is now production ready. Most 
contributors to this project are active in the community, and I believe future 
work and code maintenance will be well addressed.

Thanks,




Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-18 Thread John Zhuge
That will be great. Please record it if possible.

On Fri, Aug 18, 2017 at 4:12 AM, Steve Loughran 
wrote:

>
> I can do a demo of this next week if people are interested
>


-- 
John


Re: [VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-18 Thread Steve Loughran

I can do a demo of this next week if people are interested

> On 17 Aug 2017, at 23:07, Aaron Fabbri  wrote:
> 
> Hello,
> 
> I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge the
> HADOOP-13345 feature branch into trunk.
> 
> This branch contains the new S3Guard feature which adds metadata
> consistency features to the S3A client.  Formatted site documentation can
> be found here:
> 
> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
> 
> The current patch against trunk is posted here:
> 
> https://issues.apache.org/jira/browse/HADOOP-13998
> 
> The branch modifies the s3a portion of the hadoop-tools/hadoop-aws module:
> 
> - The feature is off by default, and care has been taken to ensure it has
> no impact when disabled.
> - S3Guard can be enabled with the production database which is backed by
> DynamoDB, or with a local, in-memory implementation that facilitates
> integration testing without having to pay for a database.
> - getFileStatus() and directory listing consistency have been
> implemented and thoroughly tested, including delete tracking.
> - Convenient Maven profiles for testing with and without S3Guard.
> - New failure injection code and integration tests that exercise it.  We
> use timers and a wrapper around the Amazon SDK client object to force
> consistency delays to occur.  This allows us to assert that S3Guard works
> as advertised.  This will be extended with more types of failure injection
> to continue hardening the S3A client.
> 
> Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor
> changes:
> 
> - core-default.xml defaults and documentation for s3guard parameters.
> - A couple of additional FS contract test cases around rename.
> - More goodies in LambdaTestUtils
> - A new CLI tool for inspecting and manipulating S3Guard features,
> including the backing MetadataStore database.
> 
> This branch has seen extensive testing as well as use in production.  This
> branch makes significant improvements to S3A's test toolkit as well.
> 
> Performance is typically on par with, and in some cases better than, the
> existing S3A code without S3Guard enabled.
> 
> This feature was developed with contributions and feedback from many
> people.  I'd like to thank everyone who worked on HADOOP-13345 as well as
> all of those who contributed feedback and work on the original design
> document.
> 
> This is the first major Apache Hadoop project I've worked on from start to
> finish, and I've really enjoyed it.  Please shout if I've missed anything
> important here or in the VOTE process.
> 
> Cheers,
> Aaron Fabbri





[VOTE] Merge HADOOP-13345 (S3Guard feature branch)

2017-08-17 Thread Aaron Fabbri
Hello,

I'd like to open a vote (7 days, ending August 24 at 3:10 PST) to merge the
HADOOP-13345 feature branch into trunk.

This branch contains the new S3Guard feature which adds metadata
consistency features to the S3A client.  Formatted site documentation can
be found here:

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md

The current patch against trunk is posted here:

https://issues.apache.org/jira/browse/HADOOP-13998

The branch modifies the s3a portion of the hadoop-tools/hadoop-aws module:

- The feature is off by default, and care has been taken to ensure it has
no impact when disabled.
- S3Guard can be enabled with the production database which is backed by
DynamoDB, or with a local, in-memory implementation that facilitates
integration testing without having to pay for a database.
- getFileStatus() and directory listing consistency have been
implemented and thoroughly tested, including delete tracking.
- Convenient Maven profiles for testing with and without S3Guard.
- New failure injection code and integration tests that exercise it.  We
use timers and a wrapper around the Amazon SDK client object to force
consistency delays to occur.  This allows us to assert that S3Guard works
as advertised.  This will be extended with more types of failure injection
to continue hardening the S3A client.
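To make the listing-consistency and failure-injection points above a bit more concrete, here is a minimal, self-contained Java sketch of the idea. It is not code from the HADOOP-13345 branch: `S3GuardSketch`, `InconsistentStore` and `GuardedFs` are hypothetical names, the "bucket" is an in-memory map rather than the Amazon SDK client, and a hand-advanced fake clock stands in for the timers. The wrapper delays new keys from appearing in listings (and keeps deleted keys listed for a while), and a metadata store that records creates plus tombstones for deletes is used to correct the raw listing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.LongSupplier;

// Hypothetical sketch, NOT code from the HADOOP-13345 branch.
public class S3GuardSketch {

  // Hypothetical failure-injecting wrapper around an in-memory "bucket":
  // new keys only show up in listings delayMs after they are written, and
  // deleted keys keep showing up for delayMs after they are removed.
  static class InconsistentStore {
    private final long delayMs;
    private final LongSupplier clock;
    private final Map<String, Long> listVisibleFrom = new HashMap<>();
    private final Map<String, Long> staleUntil = new HashMap<>();

    InconsistentStore(long delayMs, LongSupplier clock) {
      this.delayMs = delayMs;
      this.clock = clock;
    }

    void put(String key) {
      staleUntil.remove(key);
      listVisibleFrom.put(key, clock.getAsLong() + delayMs);
    }

    void delete(String key) {
      Long visibleFrom = listVisibleFrom.remove(key);
      long now = clock.getAsLong();
      // Only a key that was already listable can linger as a stale entry.
      if (visibleFrom != null && now >= visibleFrom) {
        staleUntil.put(key, now + delayMs);
      }
    }

    TreeSet<String> list() {
      long now = clock.getAsLong();
      TreeSet<String> out = new TreeSet<>();
      listVisibleFrom.forEach((k, from) -> { if (now >= from) out.add(k); });
      staleUntil.forEach((k, until) -> { if (now < until) out.add(k); });
      return out;
    }
  }

  // Hypothetical guarded client: every create/delete is also recorded in a
  // metadata store (true = exists, false = tombstone), and listings are the
  // raw store listing corrected by those records.
  static class GuardedFs {
    private final InconsistentStore store;
    private final Map<String, Boolean> metadata = new HashMap<>();

    GuardedFs(InconsistentStore store) { this.store = store; }

    void create(String key) { store.put(key); metadata.put(key, true); }

    void delete(String key) { store.delete(key); metadata.put(key, false); }

    TreeSet<String> list() {
      TreeSet<String> out = store.list();
      metadata.forEach((k, exists) -> {
        if (exists) { out.add(k); } else { out.remove(k); }
      });
      return out;
    }
  }

  public static void main(String[] args) {
    long[] now = {0};  // fake clock in milliseconds, advanced by hand
    InconsistentStore raw = new InconsistentStore(5_000, () -> now[0]);
    GuardedFs fs = new GuardedFs(raw);
    fs.create("a");
    fs.create("b");
    now[0] = 10_000;   // a and b are now visible in raw listings
    fs.delete("a");
    fs.create("c");    // raw listings lag these two changes by 5 s
    System.out.println("raw:     " + raw.list());  // prints "raw:     [a, b]"
    System.out.println("guarded: " + fs.list());   // prints "guarded: [b, c]"
  }
}
```

The point of the sketch is that the injected delay makes the raw listing visibly wrong (the deleted "a" is still there, the new "c" is missing), which gives a test something to assert against, while the guarded listing is already correct. A real metadata store also has to deal with directories, persistence and pruning of old entries, none of which is modelled here.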

Outside of hadoop-tools/hadoop-aws's s3a directory there are some minor
changes:

- core-default.xml defaults and documentation for s3guard parameters.
- A couple of additional FS contract test cases around rename.
- More goodies in LambdaTestUtils
- A new CLI tool for inspecting and manipulating S3Guard features,
including the backing MetadataStore database.

This branch has seen extensive testing as well as use in production.  This
branch makes significant improvements to S3A's test toolkit as well.

Performance is typically on par with, and in some cases better than, the
existing S3A code without S3Guard enabled.

This feature was developed with contributions and feedback from many
people.  I'd like to thank everyone who worked on HADOOP-13345 as well as
all of those who contributed feedback and work on the original design
document.

This is the first major Apache Hadoop project I've worked on from start to
finish, and I've really enjoyed it.  Please shout if I've missed anything
important here or in the VOTE process.

Cheers,
Aaron Fabbri