Re: Getting close to a vote on a merge of S3Guard., HADOOP-13345

2017-08-17 Thread Andrew Wang
Thanks for the reply, Steve; it aligns with what Aaron said above. The sooner
the better for this branch merge :)

On Thu, Aug 17, 2017 at 6:49 AM, Steve Loughran 
wrote:

>
> On 16 Aug 2017, at 18:39, Andrew Wang  wrote:
>
> Hi Steve,
>
> What's the target release vehicle, and the timeline for merging this? The
> target date for beta1 is mid-September, so any large code movements make me
> nervous.
>
>
> Code targets trunk, current state is ready to go in.
>
> I've also got it building & running against branch-2: all the code is
> Java-7 and the classpath problems were dealt with by Mingliang.
>
>
> Could you comment on testing and API stability of this branch? I'm
> trusting the judgement of the contributors involved, since there isn't much
> time to fix things before beta1.
>
>
>
> This is all working in the s3 code, and it's something you have to
> explicitly enable; I'm confident that when disabled it doesn't cause
> problems
>
> There are two modes of use in production (as well as a local DynamoDB for
> testing)
>
> * DynamoDB as cache, "non authoritative"
> * DynamoDB as store of record, "authoritative"
>
> I'm fairly happy with non-auth; but as auth assumes that all clients are
> using s3guard, it's the one with the most risks. That one I'd be cautious
> over. But it does deliver the best speedup. And it lets you use the v1/v2
> algorithms to commit output, as now you get the consistent directory
> listings you need. There's still the O(data) COPY call, but at least the
> risk of incomplete listings -> incomplete copy operation is eliminated.
>
> We've had a preview version up for a while, running large hive/LLAP tests
> against it happily in particular, and my spark & cloud testing has shown
> all is well (indeed, I can show how all isn't well if you enable the
> inconsistent FS client and *don't* turn s3guard on).
>
> After the initial merge, there is more work to do, but mostly around:
> metrics, diagnostics, and the new committer work which depends on the
> consistent listings for one of the committers, but doesn't do *any* API
> calls into s3guard itself. All it needs is a consistent S3 endpoint, be it
> AWS S3 & S3Guard, or something else like the WDC cloud store. That's not
> going to be ready for Beta 1.
>
> -Steve
>
>
>
>
> Best,
> Andrew
>
> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran 
> wrote:
>
>>
>> FYI, We're getting ready for a patch to merge the current S3Guard branch,
>> HADOOP-13345, via a patch
>> https://issues.apache.org/jira/browse/HADOOP-13998
>>
>> After that's done, we do plan to have a second iteration, work on a
>> 0-rename committer (HADOOP-13786) with all the other tuning and
>> improvements; We'd add a new uber-JIRA & move stuff over, maybe branch,
>> and/or do things patch-by-patch.
>>
>> Anyway, now is a great time for people to download and play
>>
>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>>
>> testing this
>>
>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
>>
>> The Inconsistent AWS Client is also something everyone is free to use for
>> injecting inconsistencies (and soon faults) into their own apps by way of
>> 2-3 config options. Want to know how your code handles S3A being observably
>> inconsistent? We'll let you do that.
>>
>> -Steve
>>
>>
>>
>
>


Re: Getting close to a vote on a merge of S3Guard., HADOOP-13345

2017-08-17 Thread Steve Loughran

On 16 Aug 2017, at 18:39, Andrew Wang wrote:

> Hi Steve,
>
> What's the target release vehicle, and the timeline for merging this? The
> target date for beta1 is mid-September, so any large code movements make me
> nervous.

Code targets trunk, current state is ready to go in.

I've also got it building & running against branch-2: all the code is Java-7 
and the classpath problems were dealt with by Mingliang.


> Could you comment on testing and API stability of this branch? I'm trusting the
> judgement of the contributors involved, since there isn't much time to fix
> things before beta1.


This is all working in the s3 code, and it's something you have to explicitly 
enable; I'm confident that when disabled it doesn't cause problems

There are two modes of use in production (as well as a local DynamoDB for testing)

* DynamoDB as cache, "non authoritative"
* DynamoDB as store of record, "authoritative"

I'm fairly happy with non-auth; but as auth assumes that all clients are using 
s3guard, it's the one with the most risks. That one I'd be cautious over. But 
it does deliver the best speedup. And it lets you use the v1/v2 algorithms to 
commit output, as now you get the consistent directory listings you need. 
There's still the O(data) COPY call, but at least the risk of incomplete 
listings -> incomplete copy operation is eliminated.
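
To make the difference between the two modes concrete, here is a toy model in Java. It is only a sketch under stated assumptions: the class and method names are hypothetical and are not the S3A or S3Guard API, two in-memory maps stand in for DynamoDB and for the S3 listing, and the real rules for when the store can be trusted are simplified away. In the non-authoritative case the listing merges the store with S3; in the authoritative case the store alone answers, which saves the S3 call but misses anything written by a client that bypassed the store.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only; every name here is hypothetical, not the S3A or S3Guard API. */
class ToyGuardedListing {
    // Stand-in for the DynamoDB table: directory -> children recorded by guarded clients.
    private final Map<String, Set<String>> store = new HashMap<>();
    // Stand-in for what an S3 LIST returns right now; it may lag behind recent writes.
    private final Map<String, Set<String>> s3 = new HashMap<>();
    private final boolean authoritative;
    // Counts simulated S3 LIST calls, to show where the speedup comes from.
    int s3ListCalls = 0;

    ToyGuardedListing(boolean authoritative) {
        this.authoritative = authoritative;
    }

    /** A write through a guarded client: recorded in the store at once. */
    void guardedCreate(String dir, String name, boolean visibleInS3Yet) {
        store.computeIfAbsent(dir, d -> new TreeSet<>()).add(name);
        if (visibleInS3Yet) {
            s3.computeIfAbsent(dir, d -> new TreeSet<>()).add(name);
        }
    }

    /** A write by a client that does not use the store at all. */
    void unguardedCreate(String dir, String name) {
        s3.computeIfAbsent(dir, d -> new TreeSet<>()).add(name);
    }

    /** Authoritative: the store alone answers. Non-authoritative: store merged with S3. */
    Set<String> list(String dir) {
        Set<String> result = new TreeSet<>(store.getOrDefault(dir, Collections.emptySet()));
        if (!authoritative) {
            s3ListCalls++;
            result.addAll(s3.getOrDefault(dir, Collections.emptySet()));
        }
        return result;
    }
}

class ToyGuardedListingDemo {
    public static void main(String[] args) {
        for (boolean auth : new boolean[] {false, true}) {
            ToyGuardedListing fs = new ToyGuardedListing(auth);
            fs.guardedCreate("out", "part-0000", false); // not yet visible in S3
            fs.unguardedCreate("out", "stray");          // written behind the store's back
            System.out.println("authoritative=" + auth + " listing=" + fs.list("out")
                + " s3ListCalls=" + fs.s3ListCalls);
        }
    }
}
```

In this model both modes see the guarded write before S3 does, but only the non-authoritative one sees the file from the unguarded client, which is the risk described above.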

We've had a preview version up for a while, running large hive/LLAP tests 
against it happily in particular, and my spark & cloud testing has shown all is 
well (indeed, I can show how all isn't well if you enable the inconsistent FS 
client and *don't* turn s3guard on).

After the initial merge, there is more work to do, but mostly around: metrics, 
diagnostics, and the new committer work which depends on the consistent 
listings for one of the committers, but doesn't do *any* API calls into s3guard 
itself. All it needs is a consistent S3 endpoint, be it AWS S3 & S3Guard, or 
something else like the WDC cloud store. That's not going to be ready for Beta 
1.

-Steve




> Best,
> Andrew

On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran wrote:

FYI, We're getting ready for a patch to merge the current S3Guard branch, 
HADOOP-13345, via a patch https://issues.apache.org/jira/browse/HADOOP-13998

After that's done, we do plan to have a second iteration, work on a 0-rename 
committer (HADOOP-13786) with all the other tuning and improvements; We'd add a 
new uber-JIRA & move stuff over, maybe branch, and/or do things patch-by-patch.

Anyway, now is a great time for people to download and play

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md

testing this

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md

The Inconsistent AWS Client is also something everyone is free to use for 
injecting inconsistencies (and soon faults) into their own apps by way of 2-3 
config options. Want to know how your code handles S3A being observably 
inconsistent? We'll let you do that.

-Steve






Re: Getting close to a vote on a merge of S3Guard., HADOOP-13345

2017-08-16 Thread Andrew Wang
Thanks for the detailed explanation Aaron. Given that this has gone through
Cloudera's QA cycle and is run in production, that adds a lot of confidence
in the feature. Looking forward to having this in 3.0.0-beta1!

Best,
Andrew

On Wed, Aug 16, 2017 at 2:17 PM, Aaron Fabbri  wrote:

>
>
> On Wed, Aug 16, 2017 at 1:39 PM, Andrew Wang 
> wrote:
>
>> Hi Steve,
>>
>> What's the target release vehicle, and the timeline for merging this? The
>> target date for beta1 is mid-September, so any large code movements make me
>> nervous.
>>
>
> I think this is ready to get in before beta1.  Most of upstream s3a dev
> has been happening on this branch so it has a lot of improvements and
> testing.
>
>
>> Could you comment on testing and API stability of this branch? I'm trusting
>> the judgement of the contributors involved, since there isn't much time to
>> fix things before beta1.
>>
>>
> We've done a ton of testing on this branch:
>
> - List consistency tests with failure injection. (HADOOP-13793) This
> integration test forces a delay in visibility of certain files by wrapping
> the AWS S3 client. It asserts listing is consistent. The test fails without
> S3Guard, and succeeds with it.
>
> - All existing S3 integration tests with and without S3Guard. The
> filesystem contract tests have been invaluable here. (HADOOP-13589 makes
> these very easy to run).
>
> - MetadataStore contract tests that ensure that the API semantics of the
> DynamoDB and in-memory reference implementations are correct.
>
> - MetadataStore scale tests that can be used to force DynamoDB service
> throttling and ensure we are robust to that.
>
> - Unit tests for different parts of the S3Guard logic.
>
> As you probably know, at Cloudera we are using this codebase in
> production, and have run all of our downstream tests including Hive, Spark,
> Impala on the new S3A client code, with and without S3Guard enabled.
>
> In terms of API compatibility, the new features sit behind the FileSystem
> / FileContext APIs, which have not changed.  Applications don't require any
> changes.  Internal APIs for S3Guard, such as MetadataStore (currently
> private / evolving), should be properly annotated already.  The S3Guard
> work has been active for quite a while now, so the APIs are fairly stable
> in practice.
>
> Probably my biggest goal in writing the S3AFileSystem integration code
> (HADOOP-13651) was to preserve existing logic and correctness when S3Guard
> is not enabled.  One design choice which has worked well was to define a
> "null" implementation of the MetadataStore (the API that filesystem clients
> use to log metadata changes):
>
> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
>
> This is used in S3A by default. This made it easier to reason about
> correctness and minimized the size of the diff to the FS client as well.
>
> Other questions welcomed!
>
> Cheers,
> Aaron
>
>
>
>> Best,
>> Andrew
>>
>> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran 
>> wrote:
>>
>> >
>> > FYI, We're getting ready for a patch to merge the current S3Guard branch,
>> > HADOOP-13345, via a patch
>> > https://issues.apache.org/jira/browse/HADOOP-13998
>> >
>> > After that's done, we do plan to have a second iteration, work on a
>> > 0-rename committer (HADOOP-13786) with all the other tuning and
>> > improvements; We'd add a new uber-JIRA & move stuff over, maybe branch,
>> > and/or do things patch-by-patch.
>> >
>> > Anyway, now is a great time for people to download and play
>> >
>> > https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>> >
>> > testing this
>> >
>> > https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
>> >
>> > The Inconsistent AWS Client is also something everyone is free to use for
>> > injecting inconsistencies (and soon faults) into their own apps by way of
>> > 2-3 config options. Want to know how your code handles S3A being observably
>> > inconsistent? We'll let you do that.
>> >
>> > -Steve
>> >
>> >
>> >
>>
>
>


Re: Getting close to a vote on a merge of S3Guard., HADOOP-13345

2017-08-16 Thread Aaron Fabbri
On Wed, Aug 16, 2017 at 1:39 PM, Andrew Wang 
wrote:

> Hi Steve,
>
> What's the target release vehicle, and the timeline for merging this? The
> target date for beta1 is mid-September, so any large code movements make me
> nervous.
>

I think this is ready to get in before beta1.  Most of upstream s3a dev has
been happening on this branch so it has a lot of improvements and testing.


> Could you comment on testing and API stability of this branch? I'm trusting
> the judgement of the contributors involved, since there isn't much time to
> fix things before beta1.
>
>
We've done a ton of testing on this branch:

- List consistency tests with failure injection. (HADOOP-13793) This
integration test forces a delay in visibility of certain files by wrapping
the AWS S3 client. It asserts listing is consistent. The test fails without
S3Guard, and succeeds with it.

- All existing S3 integration tests with and without S3Guard. The
filesystem contract tests have been invaluable here. (HADOOP-13589 makes
these very easy to run).

- MetadataStore contract tests that ensure that the API semantics of the
DynamoDB and in-memory reference implementations are correct.

- MetadataStore scale tests that can be used to force DynamoDB service
throttling and ensure we are robust to that.

- Unit tests for different parts of the S3Guard logic.
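
To illustrate the failure-injection idea in the first item, here is a minimal Java sketch of a client wrapper that delays the visibility of certain keys in listings. It is an assumption-laden toy, not the HADOOP-13793 test or the AWS SDK: the `ToyListingClient` interface, the class names, and the substring/delay parameters are all hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for an object store client; not the AWS SDK interface. */
interface ToyListingClient {
    void put(String key, long nowMillis);
    List<String> list(long nowMillis);
}

/** A consistent in-memory store: every key is listed as soon as it is written. */
class ToyConsistentClient implements ToyListingClient {
    private final Set<String> keys = new LinkedHashSet<>();

    public void put(String key, long nowMillis) {
        keys.add(key);
    }

    public List<String> list(long nowMillis) {
        return new ArrayList<>(keys);
    }
}

/** Wraps another client and hides matching keys from listings for a fixed delay. */
class ToyDelayedVisibilityClient implements ToyListingClient {
    private final ToyListingClient inner;
    private final String delaySubstring;
    private final long delayMillis;
    // Key -> time at which the key becomes visible in listings.
    private final Map<String, Long> visibleFrom = new HashMap<>();

    ToyDelayedVisibilityClient(ToyListingClient inner, String delaySubstring, long delayMillis) {
        this.inner = inner;
        this.delaySubstring = delaySubstring;
        this.delayMillis = delayMillis;
    }

    public void put(String key, long nowMillis) {
        inner.put(key, nowMillis);
        if (key.contains(delaySubstring)) {
            visibleFrom.put(key, nowMillis + delayMillis);
        }
    }

    public List<String> list(long nowMillis) {
        List<String> visible = new ArrayList<>();
        for (String key : inner.list(nowMillis)) {
            Long from = visibleFrom.get(key);
            if (from == null || nowMillis >= from) {
                visible.add(key);
            }
        }
        return visible;
    }
}

class ToyDelayedVisibilityDemo {
    public static void main(String[] args) {
        ToyListingClient client =
            new ToyDelayedVisibilityClient(new ToyConsistentClient(), "DELAY", 5000L);
        client.put("dir/a", 0L);
        client.put("dir/DELAY_b", 0L);
        System.out.println("t=1000 -> " + client.list(1000L));
        System.out.println("t=5000 -> " + client.list(5000L));
    }
}
```

A listing-consistency assertion run against the wrapped client immediately after the writes would fail in this toy, which mirrors the "fails without S3Guard" half of the test described above.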

As you probably know, at Cloudera we are using this codebase in production,
and have run all of our downstream tests including Hive, Spark, Impala on
the new S3A client code, with and without S3Guard enabled.

In terms of API compatibility, the new features sit behind the FileSystem /
FileContext APIs, which have not changed.  Applications don't require any
changes.  Internal APIs for S3Guard, such as MetadataStore (currently
private / evolving), should be properly annotated already.  The S3Guard
work has been active for quite a while now, so the APIs are fairly stable
in practice.

Probably my biggest goal in writing the S3AFileSystem integration code
(HADOOP-13651) was to preserve existing logic and correctness when S3Guard
is not enabled.  One design choice which has worked well was to define a
"null" implementation of the MetadataStore (the API that filesystem clients
use to log metadata changes):

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java

This is used in S3A by default. This made it easier to reason about
correctness and minimized the size of the diff to the FS client as well.
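
As a rough sketch of why a "null" implementation keeps the client diff small, consider the following Java toy. It is not the actual MetadataStore interface or NullMetadataStore class linked above; the interface, its two methods, and the `ToyFsClient` are hypothetical and drastically simplified. The point it tries to show is that the client always calls the store, with no "is S3Guard enabled?" branches, and the no-op default leaves the original code path in charge.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, heavily simplified store interface; not the real MetadataStore API. */
interface ToyMetadataStore {
    void put(String path, long length);
    Long get(String path); // null means "no entry; ask the object store"
}

/** Null object: records nothing and never has an answer. */
class ToyNullMetadataStore implements ToyMetadataStore {
    public void put(String path, long length) {
        // Intentionally a no-op.
    }

    public Long get(String path) {
        return null;
    }
}

/** In-memory store used here as the "enabled" case. */
class ToyInMemoryMetadataStore implements ToyMetadataStore {
    private final Map<String, Long> entries = new HashMap<>();

    public void put(String path, long length) {
        entries.put(path, length);
    }

    public Long get(String path) {
        return entries.get(path);
    }
}

/** Toy filesystem client: calls the store unconditionally, with no feature flag checks. */
class ToyFsClient {
    private final ToyMetadataStore store;
    private final Map<String, Long> objectStore = new HashMap<>();
    // Counts lookups that had to go to the (simulated) object store.
    int objectStoreLookups = 0;

    ToyFsClient(ToyMetadataStore store) {
        this.store = store;
    }

    /** Default construction uses the null store, so behaviour matches the unguarded client. */
    ToyFsClient() {
        this(new ToyNullMetadataStore());
    }

    void write(String path, long length) {
        objectStore.put(path, length);
        store.put(path, length);
    }

    Long length(String path) {
        Long cached = store.get(path);
        if (cached != null) {
            return cached;
        }
        objectStoreLookups++;
        return objectStore.get(path);
    }
}

class ToyNullStoreDemo {
    public static void main(String[] args) {
        ToyFsClient plain = new ToyFsClient();
        plain.write("a", 10L);
        System.out.println("null store: length=" + plain.length("a")
            + " objectStoreLookups=" + plain.objectStoreLookups);

        ToyFsClient guarded = new ToyFsClient(new ToyInMemoryMetadataStore());
        guarded.write("a", 10L);
        System.out.println("in-memory store: length=" + guarded.length("a")
            + " objectStoreLookups=" + guarded.objectStoreLookups);
    }
}
```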

Other questions welcomed!

Cheers,
Aaron



> Best,
> Andrew
>
> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran 
> wrote:
>
> >
> > FYI, We're getting ready for a patch to merge the current S3Guard branch,
> > HADOOP-13345, via a patch
> > https://issues.apache.org/jira/browse/HADOOP-13998
> >
> > After that's done, we do plan to have a second iteration, work on a
> > 0-rename committer (HADOOP-13786) with all the other tuning and
> > improvements; We'd add a new uber-JIRA & move stuff over, maybe branch,
> > and/or do things patch-by-patch.
> >
> > Anyway, now is a great time for people to download and play
> >
> > https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
> >
> > testing this
> >
> > https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
> >
> > The Inconsistent AWS Client is also something everyone is free to use for
> > injecting inconsistencies (and soon faults) into their own apps by way of
> > 2-3 config options. Want to know how your code handles S3A being observably
> > inconsistent? We'll let you do that.
> >
> > -Steve
> >
> >
> >
>


Re: Getting close to a vote on a merge of S3Guard., HADOOP-13345

2017-08-16 Thread Andrew Wang
Hi Steve,

What's the target release vehicle, and the timeline for merging this? The
target date for beta1 is mid-September, so any large code movements make me
nervous.

Could you comment on testing and API stability of this branch? I'm trusting
the judgement of the contributors involved, since there isn't much time to
fix things before beta1.

Best,
Andrew

On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran 
wrote:

>
> FYI, We're getting ready for a patch to merge the current S3Guard branch,
> HADOOP-13345, via a patch
> https://issues.apache.org/jira/browse/HADOOP-13998
>
> After that's done, we do plan to have a second iteration, work on a
> 0-rename committer (HADOOP-13786) with all the other tuning and
> improvements; We'd add a new uber-JIRA & move stuff over, maybe branch,
> and/or do things patch-by-patch.
>
> Anyway, now is a great time for people to download and play
>
> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>
> testing this
>
> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
>
> The Inconsistent AWS Client is also something everyone is free to use for
> injecting inconsistencies (and soon faults) into their own apps by way of
> 2-3 config options. Want to know how your code handles S3A being observably
> inconsistent? We'll let you do that.
>
> -Steve
>
>
>


Getting close to a vote on a merge of S3Guard., HADOOP-13345

2017-08-16 Thread Steve Loughran

FYI, We're getting ready for a patch to merge the current S3Guard branch, 
HADOOP-13345, via a patch https://issues.apache.org/jira/browse/HADOOP-13998

After that's done, we do plan to have a second iteration, work on a 0-rename 
committer (HADOOP-13786) with all the other tuning and improvements; We'd add a 
new uber-JIRA & move stuff over, maybe branch, and/or do things patch-by-patch.

Anyway, now is a great time for people to download and play

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md

testing this

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md

The Inconsistent AWS Client is also something everyone is free to use for 
injecting inconsistencies (and soon faults) into their own apps by way of 2-3 
config options. Want to know how your code handles S3A being observably 
inconsistent? We'll let you do that.

-Steve