[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2018-02-01 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349673#comment-16349673
 ] 

shanyu zhao commented on HADOOP-13345:
--

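Part of this patch added a "copy-dependencies" execution to hadoop-aws/pom.xml 
without a scope. This causes all test jars to be copied to the lib folder as well. 
  <execution>
    <phase>package</phase>
    <goals>
      <goal>copy-dependencies</goal>
    </goals>
    <configuration>
      <outputDirectory>${project.build.directory}/lib</outputDirectory>
    </configuration>
  </execution>

Should we limit the scope of the copy to runtime? E.g. add the following to the 
<configuration> section:
  <includeScope>runtime</includeScope>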

> S3Guard: Improved Consistency for S3A
> -
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>Priority: Major
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: HADOOP-13345.prototype1.patch, 
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf, 
> S3GuardImprovedConsistencyforS3AV2.pdf, s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a 
> stronger consistency model than what is currently offered.  The solution 
> coordinates with a strongly consistent external store to resolve 
> inconsistencies caused by the S3 eventual consistency model.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-09-29 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185673#comment-16185673
 ] 

Steve Loughran commented on HADOOP-13345:
-

This has now been cherry-picked/backported into branch-2; I'm modifying the fix 
version in this JIRA to note this. 




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-09-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150975#comment-16150975
 ] 

Steve Loughran commented on HADOOP-13345:
-

Filed HADOOP-14826. 





[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-09-01 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150884#comment-16150884
 ] 

Andrew Wang commented on HADOOP-13345:
--

Great to see this merged :)

Steve, could you file a JIRA for the doc enhancements and raise it as a blocker 
for beta1? I can do a prompt review of whatever gets posted there.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-09-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150528#comment-16150528
 ] 

Hudson commented on HADOOP-13345:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12297 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12297/])
HADOOP-13345 HS3Guard: Improved Consistency for S3A. Contributed by: (stevel: 
rev 621b43e254afaff708cd6fc4698b29628f6abc33)
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/S3xLoginHelper.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3GuardEmptyDirs.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LruHashMap.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/AbstractS3AMockTest.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ACredentialsInURL.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestPathMetadataDynamoDBTranslation.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/fileContext/ITestS3AFileContextURI.java
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3ADirectoryPerformance.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DirListingMetadata.java
* (add) hadoop-tools/hadoop-aws/src/main/shellprofile.d/hadoop-s3guard.sh
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3GuardTool.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileOperationCost.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AFileSystemContract.java
* (edit) hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/UploadInfo.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/fileContext/ITestS3AFileContextStatistics.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/AbstractS3GuardToolTestBase.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3GuardListConsistency.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AInconsistency.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Statistic.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/AbstractITestS3AMetadataStoreScale.java
* (edit) hadoop-project/pom.xml
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ACopyFromLocalFile.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestListing.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java
* (edit) hadoop-tools/hadoop-aws/dev-support/findbugs-exclude.xml
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRootDir.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AMiscOperations.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ADelayedFNF.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/PathMetadataDynamoDBTranslation.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Tristate.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractMkdir.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestDynamoDBMetadataStore.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestLocalMetadataStore.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/AbstractS3ATestBase.java
* (add) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardConcurrentOps.java
* (add) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
* (edit) hadoop-tools/hadoop-aws/src/test/resources/core-site.xml
* (add) 

[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-08-16 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129948#comment-16129948
 ] 

Andrew Wang commented on HADOOP-13345:
--

One small request, when this merges to trunk, could someone also add a blurb to 
hadoop-project/src/site/markdown/index.md.vm describing the feature, with a 
link to the relevant documentation? Similarly, a JIRA release note would be 
great too.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-08-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114396#comment-16114396
 ] 

Steve Loughran commented on HADOOP-13345:
-

Just ran the parallel tests with {{ -Dparallel-tests -DtestsThreadCount=8 
-Ddynamodblocal -Ds3guard }} and got one failure:

{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 133.321 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardConcurrentOps
testConcurrentTableCreations(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardConcurrentOps)  Time elapsed: 132.954 sec  <<< ERROR!
com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException: Cannot do operations on a non-existent table (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ResourceNotFoundException; Request ID: c9372335-5346-47dd-9375-e8c83d976f70)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
    at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2089)
    at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2065)
    at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeDescribeTable(AmazonDynamoDBClient.java:1048)
    at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.describeTable(AmazonDynamoDBClient.java:1024)
    at com.amazonaws.services.dynamodbv2.document.Table.describe(Table.java:137)
    at com.amazonaws.services.dynamodbv2.document.Table.waitForActive(Table.java:488)
    at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardConcurrentOps.deleteTable(ITestS3GuardConcurrentOps.java:68)
    at org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardConcurrentOps.testConcurrentTableCreations(ITestS3GuardConcurrentOps.java:132)
{code}
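
The trace shows the test's deleteTable helper calling Table.waitForActive, whose describeTable request fails because the table does not exist. The comment does not say why the table was missing. As a rough sketch only, assuming the table had already been removed by the time of the call, a helper could treat "not found" as "already deleted". Every type below (TableHandle, FakeTable, TableNotFoundException) is a hypothetical stand-in, not the AWS SDK API and not the code in the patch.

```java
// Hypothetical sketch: tolerate a table that is already gone when deleting it.
final class DeleteTableSketch {

  /** Hypothetical stand-in for the SDK's ResourceNotFoundException. */
  static final class TableNotFoundException extends RuntimeException {
    TableNotFoundException(String message) {
      super(message);
    }
  }

  /** Hypothetical, minimal view of a table handle; not the SDK's Table class. */
  interface TableHandle {
    void waitForActive(); // throws TableNotFoundException if the table is gone
    void delete();
  }

  /** In-memory fake used only to exercise the sketch. */
  static final class FakeTable implements TableHandle {
    private boolean exists;

    FakeTable(boolean exists) {
      this.exists = exists;
    }

    @Override
    public void waitForActive() {
      if (!exists) {
        throw new TableNotFoundException("Cannot do operations on a non-existent table");
      }
    }

    @Override
    public void delete() {
      exists = false;
    }
  }

  /** Returns true if this call deleted the table, false if it was already gone. */
  static boolean deleteIfExists(TableHandle table) {
    try {
      table.waitForActive();
      table.delete();
      return true;
    } catch (TableNotFoundException e) {
      // Nothing left to delete; treat as success rather than a test error.
      return false;
    }
  }

  public static void main(String[] args) {
    FakeTable table = new FakeTable(true);
    System.out.println(deleteIfExists(table));
    System.out.println(deleteIfExists(table));
  }
}
```

Whether swallowing the exception is the right choice depends on what the test is meant to assert; if a missing table points to a real race in table creation, failing loudly may be preferable.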




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-08-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114392#comment-16114392
 ] 

Steve Loughran commented on HADOOP-13345:
-

# I get an intermittent bad request on that list directory test too; never 
worked out why. That's on S3 Ireland.
# What do you need me to do?

How about you attach it as a .patch under the phase I release JIRA, 
HADOOP-13998 (which I just renamed), and see what Yetus says? No doubt 
there'll be some auditing of all its warnings, which will take an iteration or 
two; then we can vote for the big merge, which will need two +1 votes. 
After it's in, we can do a branch-2 version, which will have to switch 
lambda expressions into anonymous classes; IntelliJ will do that for us.
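
For anyone unfamiliar with what that branch-2 conversion looks like, here is a minimal sketch. It is not taken from the S3Guard code; the class and method names are made up for illustration. Both methods return a comparator with the same behaviour, the second written without Java 8 syntax.

```java
import java.util.Comparator;

// Hypothetical example; names are illustrative and not from the patch.
final class LambdaBackportSketch {

  // Java 8 form, as it might appear on trunk.
  static Comparator<String> byLengthLambda() {
    return (a, b) -> Integer.compare(a.length(), b.length());
  }

  // Java 7-compatible form: the same logic as an anonymous inner class.
  static Comparator<String> byLengthAnonymous() {
    return new Comparator<String>() {
      @Override
      public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(byLengthLambda().compare("a", "bb"));
    System.out.println(byLengthAnonymous().compare("a", "bb"));
  }
}
```

Any local variable captured by the anonymous class has to be declared final when compiling for Java 7, which is the main manual fix-up after an automated conversion.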





[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-08-01 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109690#comment-16109690
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

I have a clean trunk merge commit ready to push.  [~ste...@apache.org] would 
you like me to proceed?

I still have one key on a different bucket that gives me 400 errors.  I have an 
AWS ticket open, no response yet.

On my new bucket, I ran:
mvn clean verify -Dit.test="ITestS3A*,ITestS3G*" -Dtest=none -Ds3guard

with zero failures in US West 2.  I'm re-running with dynamo now.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-07-28 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105773#comment-16105773
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

FYI I started working on merging the latest trunk into the HADOOP-13345 feature branch.

Right now I am looking into a 400 Bad Request error that reproduces 100% of the 
time on ITestS3AContractRootDir#testListEmptyRootDirectory, on trunk.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-06-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063035#comment-16063035
 ] 

Steve Loughran commented on HADOOP-13345:
-

OK, it took so long that I assumed it was a failure. That is too long for a test 
run, as it hurts overall test execution time.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-06-06 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039636#comment-16039636
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Not hanging for me, but took about 8 1/2 minutes to complete.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-06-06 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039611#comment-16039611
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

I'll test it in a moment.  How long did you wait?  I thought someone increased 
the visibility delay for the inconsistent s3 client, and IIRC the test waits 2x 
that long in some cases.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-06-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039602#comment-16039602
 ] 

Steve Loughran commented on HADOOP-13345:
-

I'm getting ITestS3GuardListConsistency hanging when I run with dynamo or 
localdynamo. Anyone else?




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-06-05 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037850#comment-16037850
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Sounds good to me [~liuml07].  Thank you for doing the merge.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-06-05 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037785#comment-16037785
 ] 

Mingliang Liu commented on HADOOP-13345:


Kinda clean integration tests.

# Running w/o s3guard {{mvn -Dit.test='ITestS3A*' -Dtest=none -Dscale -q clean 
verify}}, all test cases pass.
# Running with DynamoDB web service us-west-1 region, {{mvn 
-Dit.test='ITestS3A*,ITestS3Guard*,ITestDynamo*' -Dtest=none -Ds3guard -Ddynamo 
-q verify}}. Only one test failure, ITestS3AEncryptionSSEC. This has been 
identified and reported by [HADOOP-14448].
# Running with DynamoDB Local (in-memory DDB simulator for test), {{mvn 
-Dit.test='ITestS3A*,ITestS3Guard*,ITestDynamo*' -Dtest=none -Ds3guard 
-Ddynamodblocal -q verify}}. As above, only one test failure, 
ITestS3AEncryptionSSEC. This has been identified and reported by [HADOOP-14448].
# Running with Local mode (in-memory metadata store),
{code}
$ mvn -Dit.test='ITestS3A*,ITestS3Guard*,ITestDynamo*' -Dtest=none -Ds3guard 
-Dlocal -q verify
Results :

Tests run: 390, Failures: 0, Errors: 0, Skipped: 55
{code}




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-06-05 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037592#comment-16037592
 ] 

Mingliang Liu commented on HADOOP-13345:


I'll do a new merge from {{trunk}} today. I merged on my local machine and have 
almost finished all the integration tests. Unless there are any objections or 
concerns, I'll push the merge after I post a clean test report by the end of 
the day.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-05-30 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030134#comment-16030134
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

FYI [~ste...@apache.org], I created two JIRAs for your suggestions here: 
HADOOP-14467, and HADOOP-14468.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-05-30 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030050#comment-16030050
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

{quote}This is a read pipeline.{quote}

Ah, I read that wrong, sorry.

{quote}
1. Could this be reported? E.g. when an FNFE is raised when opening a stream on 
an s3guarded bucket, warn the user that this may be an inconsistency.
{quote}
For now, this sounds reasonable.
{quote}
2. S3AInputStream relies on the file length being normative (see 
calculateRequestLimit). If DDB thinks there is less data than there is, the 
extra data isn't picked up. You won't be able to seek past the amount of data 
that s3guard thinks is in the file, even if there is now more.
{quote}
I can't think of any normal cases off the top of my head where the MetadataStore 
length would be wrong (can you?).  Still, this is a good point on the side effects 
of skipping S3 for the getObjectMetadata().
{quote}
We may want to have s3guard in non-auth mode do the HEAD on the final entry for 
that failfast and to get the length.
{quote}
Yes.  I also think we should add a new config flag for this behavior: keep 
fs.s3a.metadatastore.authoritative for listings, and add a new 
fs.s3a.metadatastore.getfilestatus.authoritative for this case.  That way you 
can still get the same behavior we have today (which is useful IMO).
{quote}
 (side topic: if we do that, and note the length is different, what to do in 
s3guard itself?). 
{quote}
The "correct" thing to do is to go into a retry policy until there is consensus.  
And we should really be doing the DynamoDB and S3 requests async (in parallel) so 
the round trips can overlap.
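
To make the two ideas above concrete, here is a minimal Java sketch, not S3Guard code. Both lookups are started at once so their round trips overlap, and the pair is retried until the two lengths agree. The names ParallelLengthCheck, agreedLength and awaitConsensus, and the two Supplier arguments standing in for the DynamoDB and S3 calls, are hypothetical; a real retry policy would also back off between attempts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class ParallelLengthCheck {

  /**
   * Starts both lookups concurrently and returns the length they agree on,
   * or -1 if the metadata store and the object store disagree.
   */
  static long agreedLength(Supplier<Long> metadataStoreLength,
                           Supplier<Long> objectStoreLength) {
    CompletableFuture<Long> fromStore =
        CompletableFuture.supplyAsync(metadataStoreLength);
    CompletableFuture<Long> fromS3 =
        CompletableFuture.supplyAsync(objectStoreLength);
    // Combine once both have completed; the two round trips overlap.
    return fromStore
        .thenCombine(fromS3, (a, b) -> a.equals(b) ? a : -1L)
        .join();
  }

  /**
   * Repeats the paired lookup until both sides agree or attempts run out.
   * Returns the agreed length, or -1 if no consensus was reached.
   */
  static long awaitConsensus(Supplier<Long> metadataStoreLength,
                             Supplier<Long> objectStoreLength,
                             int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      long length = agreedLength(metadataStoreLength, objectStoreLength);
      if (length >= 0) {
        return length;
      }
    }
    return -1L;
  }
}
```

Returning -1 rather than throwing leaves the "what to do in s3guard itself" question open to the caller, which is where the thread leaves it too.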




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-05-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019741#comment-16019741
 ] 

Steve Loughran commented on HADOOP-13345:
-

This is a read pipeline. What I think has happened is that the client did open(), 
and s3guard skipped the existence check because DDB said the file was there (and 
how long it was). The HTTP stream isn't set up in open(); it relies on the HEAD 
having done the check first (a getFileStatus() is called to verify the path isn't 
a dir; if the path isn't there, it fails). Note that we could do a simpler check 
without the LIST call in the dir scan.

Because with s3guard the HEAD request is skipped, it's only on the first seek 
that an attempt is made to GET the file contents. No file, error. There's 
nothing wrong with that per se; it just means that if s3guard is inconsistent 
with the store, things show up later.

1. Could this be reported? E.g. when an FNFE is raised when opening a stream 
on an s3guarded bucket, warn the user that this may be an inconsistency.
2. S3AInputStream relies on the file length being normative (see 
{{calculateRequestLimit}}). If DDB thinks there is less data than there is, the 
extra data isn't picked up. You won't be able to seek past the amount of data 
that s3guard thinks is in the file, even if there is now more.

We may want to have s3guard in non-auth mode do the HEAD on the final entry for 
that failfast and to get the length. (Side topic: if we do that, and note the 
length is different, what to do in s3guard itself?) This could be done in the s3a 
input stream: if fadvise=normal, it could start with a full GET of the 
file and pick up the content-length there. It's for the seek-optimised random IO 
that we'd want to postpone the GET until the first readFully(), and limit its 
length to something shorter.
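
As a rough illustration of point 2, here is a much-simplified stand-in for the range calculation. It is not the real {{calculateRequestLimit}}, and the class and method names are hypothetical. The end of a ranged GET is capped at the length the client believes the file has, so bytes past a stale length are never requested.

```java
final class RequestLimitSketch {

  /**
   * Exclusive end offset of a ranged GET that starts at targetPos and wants
   * `wanted` bytes, capped at the file length the client believes in.
   * Assumption: knownLength comes from the metadata store, not from S3.
   */
  static long requestLimit(long targetPos, long wanted, long knownLength) {
    return Math.min(knownLength, targetPos + wanted);
  }
}
```

With a real object of 2048 bytes and a metadata entry recording 1024, requestLimit(0, 4096, 1024) yields 1024, so the second half of the file is never asked for.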




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-05-17 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014800#comment-16014800
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Very interesting [~ste...@apache.org], thanks for sharing.  I've heard that S3 
GET is supposed to be consistent, except maybe after a previous negative GET.  
So, I'm trying to understand if that is the case.  I suppose we naturally have 
a negative GET preceding the S3 object creation, where 
{{S3AFileSystem#create()}} does a {{getFileStatus()}} to see if the file 
already exists...  So we have 

- Create test file: 
   GET -> 404 (existence check)
   PUT ...
   S3Guard: Record (path, metadata)
- Read test file:
  S3Guard -> Yes, file exists (short-circuit getFileStatus())
  GET -> 404 (eventual consistency)

The simple solution would be to add a bit of plumbing into the InputStream so 
it knows that "the file should exist" and thus 404 should be subject to a retry 
policy.  That bit would be set when we get a hit from the MetadataStore's 
get().  I'm not sure we'd ever want to retry in other cases, as it slows down 
applications that may just be trying to confirm a file does not exist.
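
The "bit of plumbing" described above could look roughly like the following Java sketch. It is an illustration only: ExpectedFileRetry, openWithRetry and the expectedToExist flag are hypothetical names, and the fixed sleep stands in for whatever retry policy is eventually chosen. A FileNotFoundException is retried only when the metadata store reported a hit; otherwise the first 404 fails fast.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;

final class ExpectedFileRetry {

  /**
   * Runs the open/GET operation. If the metadata store said the file exists
   * (expectedToExist), a FileNotFoundException is retried up to maxAttempts
   * times; if not, the first FileNotFoundException is rethrown immediately.
   */
  static <T> T openWithRetry(Callable<T> open, boolean expectedToExist,
                             int maxAttempts, long sleepMillis) throws Exception {
    int attempts = expectedToExist ? Math.max(1, maxAttempts) : 1;
    FileNotFoundException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return open.call();
      } catch (FileNotFoundException e) {
        last = e;
        if (i + 1 < attempts) {
          // Give the eventually consistent store time to catch up.
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw last;
  }
}
```

Keeping the flag on the caller's side means applications that are only probing for absence still pay for a single request.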




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-05-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014702#comment-16014702
 ] 

Steve Loughran commented on HADOOP-13345:
-

Think I may have hit my first inconsistency BTW, in a localdb s3guard test. 
open() worked, but the first attempt to read the file triggered the FNFE. 

As discussed on HADOOP-14303, we should consider what our retry policy is. Here 
I think 404 -> fail fast.

Also, S3AInputStream retries on some non-recoverable events, as it does one extra 
attempt on any exception. This can lead to 404s triggering a retry rather than 
failing fast. 
{code}
testSequentialRead(org.apache.hadoop.fs.contract.s3a.ITestS3AContractOpen)  Time elapsed: 1.221 sec  <<< ERROR!
java.io.FileNotFoundException: Reopen at position 0 on s3a://hwdev-steve-ireland-new/fork-0007/test/testsequentialread.txt: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 8D81F218D02DE21E), S3 Extended Request ID: aXUWP6yYGSsP9ofVawyIteGZWBmkNTFjmRCvwAR1KyJmtR0A6H6UOggE4OlYB2ZOJ99F3MV74fU=
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:166)
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:165)
    at org.apache.hadoop.fs.s3a.S3AInputStream.onReadFailure(S3AInputStream.java:348)
    at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:321)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.fs.contract.AbstractContractOpenTest.testSequentialRead(AbstractContractOpenTest.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 8D81F218D02DE21E)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1586)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1254)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4185)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4132)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1373)
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:158)
    at org.apache.hadoop.fs.s3a.S3AInputStream.onReadFailure(S3AInputStream.java:348)
    at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:321)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.fs.contract.AbstractContractOpenTest.testSequentialRead(AbstractContractOpenTest.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 

[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-05-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014122#comment-16014122
 ] 

Steve Loughran commented on HADOOP-13345:
-

HADOOP-14432 adds tests and robustness for {{copyFromLocalFile(false, true, 
dst, dst)}}. The tests will verify that s3guard doesn't cause regressions.





[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-04-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967836#comment-15967836
 ] 

Steve Loughran commented on HADOOP-13345:
-

Broke my test runs. HADOOP-14216 is the cause. Workaround detailed on that 
now-reopened JIRA.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-04-12 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966770#comment-15966770
 ] 

Mingliang Liu commented on HADOOP-13345:


{code}
$ mvn -Dit.test='ITestS3A*,ITestS3Guard*,ITestDynamo*' -Dtest=none -Dscale -Ds3guard -Ddynamo -q clean verify
{code}

Merge happened.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-04-12 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966690#comment-15966690
 ] 

Mingliang Liu commented on HADOOP-13345:


Now that all four of the recent changes in trunk are committed, I'll merge from 
trunk again. I'll run the integration tests again before that.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-04-04 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955569#comment-15955569
 ] 

Mingliang Liu commented on HADOOP-13345:


I'm expecting some conflicts because of [HADOOP-14135] and [HADOOP-14248]. Also 
there are some improvements meaningful to S3Guard like [HADOOP-14255] and 
[HADOOP-14247]. Let's issue another merge from trunk after these are resolved.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-03-20 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933777#comment-15933777
 ] 

Mingliang Liu commented on HADOOP-13345:


{code}
$ mvn -Dit.test='ITestS3A*, ITestS3Guard*' -Dtest=none -Dscale -Ds3guard -Ddynamo -q clean verify

Results :

Tests run: 348, Failures: 0, Errors: 0, Skipped: 16
{code}
Merge happened.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-03-20 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932993#comment-15932993
 ] 

Mingliang Liu commented on HADOOP-13345:


Hi all, I'll merge from trunk again in 24 hours to pick up the latest conflicting 
changes in {{FileSystemContractBaseTest}}. I will commit if no tests fail. Thanks.





[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-03-14 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925119#comment-15925119
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

{quote}What about the prefix "s3guard+" on all buckets unless otherwise 
stated?{quote}

We generally try to discourage the per-bucket DDB table naming, as customers 
with jobs that access multiple buckets can end up with a slew of DDB tables, 
and it is a waste to provision them each for peak load.

We generally try to do one DDB table per cluster, with optional sharing between 
clusters.  I'm leaning towards a default name "s3guard-metadata" which is 
populated above Hadoop (i.e. CM or Ambari).

Personally I'd suggest not bothering too much with the per-bucket case.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-03-14 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925108#comment-15925108
 ] 

Mingliang Liu commented on HADOOP-13345:


Easy and useful change. According to:

{quote}
DDB - TableName
The name of the table to create.

Type: String

Length Constraints: Minimum length of 3. Maximum length of 255.

Pattern: {noformat}[a-zA-Z0-9_.-]+{noformat}

Required: Yes
{quote}
And
{quote}
S3: Rules for Bucket Naming:
Bucket names must be at least 3 and no more than 63 characters long.
Bucket names must be a series of one or more labels. Adjacent labels are 
separated by a single period (.). Bucket names can contain lowercase letters, 
numbers, and hyphens. Each label must start and end with a lowercase letter or 
a number.
Bucket names must not be formatted as an IP address (e.g., 192.168.5.4).
When using virtual hosted–style buckets with SSL, the SSL wildcard certificate 
only matches buckets that do not contain periods. To work around this, use HTTP 
or write your own certificate verification logic. We recommend that you do not 
use periods (".") in bucket names.
{quote}

Either S3Guard_ or S3Guard- works well as a prefix.

Alternatively, we can add tags for DDB tables. See 
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TagResource.html
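
The two quoted rule sets can be checked mechanically. The Java sketch below is an illustration only: TableNames, isValidTableName and tableNameFor are hypothetical names, not part of S3Guard. It encodes just the DDB constraints quoted above (length 3 to 255, pattern [a-zA-Z0-9_.-]+) and derives a table name from a prefix plus a bucket name.

```java
import java.util.regex.Pattern;

final class TableNames {

  // Character set quoted from the DDB TableName constraints above.
  private static final Pattern DDB_NAME = Pattern.compile("[a-zA-Z0-9_.-]+");

  /** True if the name satisfies the quoted DDB length and pattern rules. */
  static boolean isValidTableName(String name) {
    return name.length() >= 3
        && name.length() <= 255
        && DDB_NAME.matcher(name).matches();
  }

  /** Builds prefix + bucket and rejects results DDB would not accept. */
  static String tableNameFor(String prefix, String bucket) {
    String name = prefix + bucket;
    if (!isValidTableName(name)) {
      throw new IllegalArgumentException("Not a valid DDB table name: " + name);
    }
    return name;
  }
}
```

Under the quoted pattern a "+" is not an accepted character, so tableNameFor("s3guard+", "my-bucket") is rejected while tableNameFor("S3Guard-", "my-bucket") passes. Since bucket names are at most 63 characters, a short prefix stays well inside the 255-character limit.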




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-03-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925087#comment-15925087
 ] 

Steve Loughran commented on HADOOP-13345:
-

I've been thinking a bit about DDB table names in a large organisation. What 
we've done today, table name == bucket name, works for us developers, but I'm not 
so sure it will work in large orgs. Even in house I can see a trend for developers 
like rajesh to use his name in tables to help assign ownership.

h4. What about the prefix "s3guard+" on all buckets unless otherwise stated? 
That way, people looking at costs of AWS accounts can see which costs are due 
to s3guard, without having to look into the tables, or search for matching 
buckets of the same name...





[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-03-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907495#comment-15907495
 ] 

Steve Loughran commented on HADOOP-13345:
-

Merged trunk in again; testing with -Ds3guard, all tests passed except that 
intermittent root dir test (which is really annoying me; I think I may just 
skip it unless I can fix it).




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-22 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879422#comment-15879422
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Works for me, thanks [~liuml07]




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-22 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879382#comment-15879382
 ] 

Mingliang Liu commented on HADOOP-13345:


Thanks [~ste...@apache.org] for testing and commenting here. I plan to commit 
the merge-from-trunk and file a new JIRA for making 
{{ITestS3GuardListConsistency}} stable. The improvements to the version 
marker/check can be a separate JIRA as well. If [~fabbri] thinks differently, 
we can also address these before merging.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878871#comment-15878871
 ] 

Steve Loughran commented on HADOOP-13345:
-

Also, as tables are global & shared with others, how about we add the username 
and timestamp of creation to the version marker, along with an optional 
message?
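
To make the proposal concrete, here is a minimal sketch of what such a marker could carry. The class name {{VersionMarkerSketch}} and its fields are hypothetical; this is not the actual S3Guard table layout, only an illustration of "version + username + creation timestamp + optional message".

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical value object; NOT the real S3Guard version marker. */
final class VersionMarkerSketch {
  final int version;
  final String createdBy;
  final Instant createdAt;
  final Optional<String> message;

  /** The message may be null, since it is optional in the proposal. */
  VersionMarkerSketch(int version, String createdBy, Instant createdAt,
      String message) {
    this.version = version;
    this.createdBy = createdBy;
    this.createdAt = createdAt;
    this.message = Optional.ofNullable(message);
  }

  /** One-line description, e.g. for a version-mismatch error message. */
  String describe() {
    return "version=" + version
        + ", createdBy=" + createdBy
        + ", createdAt=" + createdAt
        + message.map(m -> ", message=" + m).orElse("");
  }
}
```

With something like this, a version-mismatch error could also say who created the shared table and when, which helps when the table belongs to someone else.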




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878821#comment-15878821
 ] 

Steve Loughran commented on HADOOP-13345:
-

I'm trying to run the tests against S3 Frankfurt, and they fail because I have 
an older table there, which I'll have to delete. Here's the fun part: because 
the version check is in {{DynamoDBMetadataStore.initialize()}}, and that gets 
called by the CLI Destroy before metastore.destroy can be called, there's 
currently no way to destroy a table of an incompatible s3guard version from the 
CLI.

How about making that version check a (secret) internal config option, and 
having the destroy operation set that option so that the init code skips the 
version check?
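
A rough sketch of the idea, using a plain map instead of a Hadoop {{Configuration}}. Everything here is hypothetical: the class {{TableStoreSketch}}, the option name and the version number are made up for illustration and are not the real {{DynamoDBMetadataStore}} code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a metadata store with a version check. */
final class TableStoreSketch {
  /** Made-up internal option name; NOT a real Hadoop configuration key. */
  static final String SKIP_VERSION_CHECK = "sketch.internal.skip.version.check";
  /** Made-up version number that this client understands. */
  static final int SUPPORTED_VERSION = 100;

  private final int tableVersion;
  private boolean initialized;
  private boolean destroyed;

  TableStoreSketch(int tableVersion) {
    this.tableVersion = tableVersion;
  }

  /** Normal init path: refuse incompatible tables unless the flag is set. */
  void initialize(Map<String, String> conf) {
    boolean skip = Boolean.parseBoolean(
        conf.getOrDefault(SKIP_VERSION_CHECK, "false"));
    if (!skip && tableVersion != SUPPORTED_VERSION) {
      throw new IllegalStateException("Table version " + tableVersion
          + " is incompatible with " + SUPPORTED_VERSION);
    }
    initialized = true;
  }

  void destroy() {
    if (!initialized) {
      throw new IllegalStateException("Store is not initialized");
    }
    destroyed = true;
  }

  boolean isDestroyed() {
    return destroyed;
  }

  /** What a CLI destroy could do: set the flag on a copy, init, destroy. */
  static TableStoreSketch destroyFromCli(int tableVersion,
      Map<String, String> userConf) {
    Map<String, String> conf = new HashMap<>(userConf);
    conf.put(SKIP_VERSION_CHECK, "true");
    TableStoreSketch store = new TableStoreSketch(tableVersion);
    store.initialize(conf);
    store.destroy();
    return store;
  }
}
```

The flag is set only on the copy of the configuration used by the destroy path, so every other caller still gets the version check.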




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-21 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876991#comment-15876991
 ] 

Mingliang Liu commented on HADOOP-13345:


Hi Aaron, I updated from the feature branch but still get the following error. 
It is the same with and without merging from trunk, so I assume it's not 
related to the merge (not committed yet). I'll have a look at the test. Thanks,

{code}
mvn -Dit.test='ITestS3GuardListConsistency' -Dtest=none -Dscale -Ds3guard -Ddynamo -q clean verify

---
 T E S T S
---
Running org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.544 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency
testListStatusWriteBack(org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency)  Time elapsed: 3.147 sec  <<< FAILURE!
java.lang.AssertionError: Unexpected number of results from metastore. Metastore should only know about /XYZ: DirListingMetadata{path=s3a://mliu-s3guard/test/ListStatusWriteBack, listMap={s3a://mliu-s3guard/test/ListStatusWriteBack/XYZ=PathMetadata{fileStatus=S3AFileStatus{path=s3a://mliu-s3guard/test/ListStatusWriteBack/XYZ; isDirectory=true; modification_time=0; access_time=0; owner=mliu; group=mliu; permission=rwxrwxrwx; isSymlink=false} isEmptyDirectory=true}, s3a://mliu-s3guard/test/ListStatusWriteBack/123=PathMetadata{fileStatus=S3AFileStatus{path=s3a://mliu-s3guard/test/ListStatusWriteBack/123; isDirectory=true; modification_time=0; access_time=0; owner=mliu; group=mliu; permission=rwxrwxrwx; isSymlink=false} isEmptyDirectory=true}}, isAuthoritative=false}
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testListStatusWriteBack(ITestS3GuardListConsistency.java:127)
{code}

{quote}
I'm wondering if that fix should be a separate commit instead of modifying the 
merge commit? 
{quote}
That makes sense. I'll file a separate JIRA for tracking this and submit a 
patch for fixing it (unless Steve objects). The merge commit should be 
simple/small/clear if possible.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-21 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876515#comment-15876515
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Sorry for the delayed response.

{quote}
org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency fails before/after merge. 
Do I need to configure something special?
{quote}
You should not have to. 

Strange, it has been working for me.  Could be a difference in the tables we 
use.  I will run that test on the latest code now and see what happens.  Could 
be related to HADOOP-14096, which just got fixed.

{quote}We don't really change anything in that part. I guess the reason is 
that, when enabling S3Guard, the code path that fails in S3AFileSystem changes 
for that test somehow. 
{quote}

Ok.. I'm wondering if that fix should be a separate commit instead of modifying 
the merge commit?  Maybe ping [~ste...@apache.org] for his opinion.







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-17 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872936#comment-15872936
 ] 

Mingliang Liu commented on HADOOP-13345:


Thanks [~fabbri] for promptly reviewing the test report!

{quote}
Thats ok.. It does miss ITestS3Guard\{ListConsistency, ToolDynamoDB\}
{quote}
That's a good catch. I just learned about these two tests. However, 
{{org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency}} fails before/after 
merge. Do I need to configure something special?
{code}
mvn -Dit.test='ITestS3Guard*' -Dtest=none -Dscale -Ds3guard -Ddynamo -q clean verify

---
 T E S T S
---
Running org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 14.992 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency
testListStatusWriteBack(org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency)  Time elapsed: 13.552 sec  <<< FAILURE!
java.lang.AssertionError: Unexpected number of results from metastore. Metastore should only know about /XYZ: DirListingMetadata{path=s3a://mliu-s3guard/test/ListStatusWriteBack, listMap={s3a://mliu-s3guard/test/ListStatusWriteBack/XYZ=PathMetadata{fileStatus=S3AFileStatus{path=s3a://mliu-s3guard/test/ListStatusWriteBack/XYZ; isDirectory=true; modification_time=0; access_time=0; owner=mliu; group=mliu; permission=rwxrwxrwx; isSymlink=false} isEmptyDirectory=true}, s3a://mliu-s3guard/test/ListStatusWriteBack/123=PathMetadata{fileStatus=S3AFileStatus{path=s3a://mliu-s3guard/test/ListStatusWriteBack/123; isDirectory=true; modification_time=0; access_time=0; owner=mliu; group=mliu; permission=rwxrwxrwx; isSymlink=false} isEmptyDirectory=true}}, isAuthoritative=false}
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testListStatusWriteBack(ITestS3GuardListConsistency.java:127)
{code}

{quote}
Curious, what is our difference in HADOOP-13345 that changes this? Is our 
feature branch exception behavior different?
{quote}
We don't really change anything in that part. I guess the reason is that, when 
enabling S3Guard, the code path that fails in S3AFileSystem changes for that 
test somehow. For example (to be confirmed), without S3Guard the request calls 
{{getFileStatus()}}, which fails with an access denied exception containing the 
"Forbidden" keyword; with S3Guard, {{getFileStatus()}} succeeds and the later 
read operation fails with an access denied exception containing the "Access 
Denied" keyword. So I think relaxing the exception message assertion in the 
test should work just fine.
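
To illustrate what a relaxed check could look like, here is a small self-contained sketch. The helper class {{RelaxedAccessDeniedCheck}} is hypothetical and is not the change made in the merge; it only shows matching on the exception class and the 403 status code instead of the "Forbidden" / "Access Denied" wording.

```java
import java.nio.file.AccessDeniedException;

/** Hypothetical helper; not part of the S3A test code. */
final class RelaxedAccessDeniedCheck {
  private RelaxedAccessDeniedCheck() {
  }

  /**
   * Accept any AccessDeniedException whose message reports HTTP 403,
   * whether S3 worded the failure as "Forbidden" or as "Access Denied".
   */
  static boolean isExpectedDenial(Throwable t) {
    return t instanceof AccessDeniedException
        && t.getMessage() != null
        && t.getMessage().contains("Status Code: 403");
  }
}
```

The trade-off is that the assertion no longer distinguishes which request was rejected, only that the failure was translated into an access denied exception with status 403.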




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-17 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872867#comment-15872867
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Thanks [~liuml07]!

{quote}
$ mvn -Dit.test='ITestS3A*' -Dscale -Dtest=none -Ds3guard -Ddynamo -q clean 
verify
{quote}
That's OK. It does miss ITestS3Guard{ListConsistency, ToolDynamoDB}, FYI, but 
you got most of the tests.

{quote}
2. ITestS3ACredentialsInURL#testInstantiateFromURL is not supported. Should we 
simply skip this test if the metadata store is enabled (in a separate JIRA)?
{quote}

Yes.  Nothing new here and we do need to fix it.

{quote}
3. ITestS3AEncryptionSSEC started failing after merge because of the strict 
exception message assertion; it is fine in trunk. The only change is to remove 
"Forbidden" word as it would be "Access Denied" sometimes along with the same 
exception class java.nio.file.AccessDeniedException and message Service: Amazon 
S3; Status Code: 403; Error Code: AccessDenied;. For this I made the change 
when merging.
{quote}

Sounds ok.  Curious, what is our difference in HADOOP-13345 that changes this?  
Is our feature branch exception behavior different?




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-17 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872859#comment-15872859
 ] 

Mingliang Liu commented on HADOOP-13345:


Thanks for your comments.

The summary of test report is:
{code}
$ mvn -Dit.test='ITestS3A*' -Dscale -Dtest=none -Ds3guard -Ddynamo -q clean verify

Results :

Failed tests:
  ITestS3AEncryptionSSEC.testCreateFileAndReadWithDifferentEncryptionKey:60 Expected to find 'Forbidden (Service: Amazon S3; Status Code: 403;' but got unexpected exception:java.nio.file.AccessDeniedException: s3a://mliu-s3guard/test/testCreateFileAndReadWithDifferentEncryptionKey-0800: Reopen at position 0 on s3a://mliu-s3guard/test/testCreateFileAndReadWithDifferentEncryptionKey-0800: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 8A23739237751886), S3 Extended Request ID: BEDP2iHUuZXjZTnU/s1f/8+kHM7F+czV2CAGJm3FEpzxxxo37nb+OqbswYsM7vUpWd682RP+4iY=
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:165)
    at org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:291)
    at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:374)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.hadoop.fs.contract.ContractTestUtils.readDataset(ContractTestUtils.java:180)
    at org.apache.hadoop.fs.contract.ContractTestUtils.verifyFileContents(ContractTestUtils.java:204)
    at org.apache.hadoop.fs.s3a.ITestS3AEncryptionSSEC.lambda$testCreateFileAndReadWithDifferentEncryptionKey$4(ITestS3AEncryptionSSEC.java:80)
    at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:346)
    at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:418)
    at org.apache.hadoop.fs.s3a.ITestS3AEncryptionSSEC.testCreateFileAndReadWithDifferentEncryptionKey(ITestS3AEncryptionSSEC.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 8A23739237751886), S3 Extended Request ID: BEDP2iHUuZXjZTnU/s1f/8+kHM7F+czV2CAGJm3FEpzxxxo37nb+OqbswYsM7vUpWd682RP+4iY=
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1586)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1254)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4185)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4132)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1373)
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:158)
    ... 21 more


Tests in error:
  ITestS3ACredentialsInURL.testInstantiateFromURL:86 » InterruptedIO initTable: 
...
  

[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-17 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872384#comment-15872384
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Once we figure out the SSE test failure I'm +1 to do a merge.  Looks like 
exception behavior is different in trunk?  Or the trunk test is also broken?

Also if you can paste a summary of tests (CLI command used, number of tests 
run / errors / etc.), if you still have it handy, that would be awesome.  
Trying to make sure this branch is stable.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871651#comment-15871651
 ] 

Steve Loughran commented on HADOOP-13345:
-

{{ITestS3AEncryptionSSEC}} is a new test from HADOOP-13075; have a look to see 
if it is failing for you on trunk & if it does, open a JIRA.

Maybe we're just translating the exception more strictly.

I don't think s3guard and credentials in URLs should work together at all. In 
fact, explicitly refusing to work with them could be an extra incentive to stop 
using them.
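
A minimal sketch of what "explicitly refusing" could look like, using only {{java.net.URI}}. The class and method names are hypothetical, and where such a check would be wired into S3A is left open.

```java
import java.net.URI;

/** Hypothetical helper; not the actual S3A initialization code. */
final class UriSecretsCheck {
  private UriSecretsCheck() {
  }

  /** True if the authority has a user-info part, e.g. key:secret@bucket. */
  static boolean hasUserInfo(URI uri) {
    String authority = uri.getRawAuthority();
    return authority != null && authority.indexOf('@') >= 0;
  }

  /** Fail fast when a metadata store is enabled and secrets are in the URI. */
  static void rejectIfCredentialsInUri(URI uri) {
    if (hasUserInfo(uri)) {
      // Deliberately do not echo the URI: it contains the secret.
      throw new IllegalArgumentException(
          "Credentials embedded in the filesystem URI are not supported"
          + " when a metadata store is enabled");
    }
  }
}
```

The check looks at the raw authority rather than {{getUserInfo()}}, so it still triggers when the authority cannot be parsed as a host name, for example a bucket name containing an underscore.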




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-17 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871450#comment-15871450
 ] 

Mingliang Liu commented on HADOOP-13345:


I propose we merge from trunk again. I have fixed the conflicts so if you vote 
up, I'll simply push.

{code}
commit a434d50fe4547f32de7b1fafb3c370a7123cda2d
Merge: 8b37b6a96c 02c549484a
Author: Mingliang Liu 
Date:   Thu Feb 16 22:38:55 2017 -0800

Merge branch 'trunk' into HADOOP-13345

After HADOOP-14040, we use shaded aws-sdk uber-JAR so don't have to
bring DynamoDB dependency explicitly. However, for tests we do need the
DynamoDBLocal dependency from its Maven repository.
{code}

I ran the integration tests against us-west-1. Please confirm, as this merge is 
major. Thanks,

{code}
Failed tests:
  ITestS3AEncryptionSSEC.testCreateFileAndReadWithDifferentEncryptionKey:60 Expected to find 'Forbidden (Service: Amazon S3; Status Code: 403;' but got unexpected exception:java.nio.file.AccessDeniedException: s3a://mliu-s3guard/test/testCreateFileAndReadWithDifferentEncryptionKey-0800: Reopen at position 0 on s3a://mliu-s3guard/test/testCreateFileAndReadWithDifferentEncryptionKey-0800: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 4E0A3A7A0B2D8005), S3 Extended Request ID: ZKm3w28W57skopifj0wH5p+c8KF1NVzL7ItNG067aK6FNK9dk1kmGrykda/NI4EhtFmN1/bv60c=
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158)
    at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:165)
    at org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:291)
    at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:374)

Tests in error:
  ITestS3ACredentialsInURL.testInstantiateFromURL:86 » InterruptedIO initTable: ...
  ITestS3AFileSystemContract>FileSystemContractBaseTest.testRenameToDirWithSamePrefixAllowed:669->FileSystemContractBaseTest.rename:525 » AWSServiceIO
{code}
As for the failing tests: I'm not sure {{ITestS3AEncryptionSSEC}} is caused by 
the merge; {{ITestS3ACredentialsInURL}} is known to be unsupported, as 
credentials in URLs are very unsafe; and 
{{ITestS3AFileSystemContract#testRenameToDirWithSamePrefixAllowed}} passed on a 
second run.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-09 Thread Sameer Choudhary (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859628#comment-15859628
 ] 

Sameer Choudhary commented on HADOOP-13345:
---

Makes sense. Thanks! For persistence, the user would have to implement 
frequent, lossless snapshotting of their metadata store to S3. However, I agree 
that for most users the DynamoDB-based solution should be sufficient.

 




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859586#comment-15859586
 ] 

Steve Loughran commented on HADOOP-13345:
-

I should add to Aaron's comment:

# the in-memory one really, really is for testing only. 
# it won't be throttling per se; rather, when API calls get rejected, the 
client will back off. See HADOOP-13904.
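
As an illustration of "the client will back off", here is a generic capped exponential backoff loop. This is a sketch under stated assumptions, not the HADOOP-13904 implementation: {{BackoffSketch}}, {{ThrottledException}} and the doubling policy are all hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper; not the actual S3Guard/DynamoDB client code. */
final class BackoffSketch {
  private BackoffSketch() {
  }

  /** Stands in for "the service rejected this API call". */
  static final class ThrottledException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    ThrottledException(String message) {
      super(message);
    }
  }

  /** Delay doubles per attempt (base, 2*base, 4*base, ...) up to a cap. */
  static long delayMillis(int attempt, long baseMillis, long maxMillis) {
    // Cap the shift so the multiplication cannot overflow for small bases.
    long delay = baseMillis << Math.min(attempt, 20);
    return Math.min(delay, maxMillis);
  }

  /** Retry op while it is throttled, sleeping longer after each rejection. */
  static <T> T callWithBackoff(Supplier<T> op, int maxAttempts,
      long baseMillis, long maxMillis) {
    for (int attempt = 0; ; attempt++) {
      try {
        return op.get();
      } catch (ThrottledException e) {
        if (attempt + 1 >= maxAttempts) {
          throw e; // out of attempts: surface the rejection to the caller
        }
        try {
          Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted during backoff", ie);
        }
      }
    }
  }
}
```

The effect on a job is that rejected calls slow the caller down rather than failing it outright, as long as the rejections stop before the attempt limit is reached.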

I like your thoughts about HBase; there's no obvious reason why this won't work 
(though you need to persist it somehow). For now though, Dynamo is what we 
target, so we can just use something that AWS keeps up. It helps for dev & test 
as we don't need to bring up miniHbase clusters. 




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858758#comment-15858758
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

S3Guard is specifically designed to allow multiple backend implementations for 
MetadataStore (the interface that stores metadata).  So far we have an 
in-memory "reference" or testing implementation, and one for DynamoDB.  We 
expect more back ends to be implemented in the future.

Note there is also a test suite that ensures that different implementations 
provide correct semantics.
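
As a rough illustration of the "one interface, several back ends, one shared test suite" idea, here is a heavily simplified sketch. The names {{PathStoreSketch}}, {{InMemoryPathStoreSketch}} and {{PathStoreContract}} are hypothetical; the real MetadataStore interface is richer than this.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified stand-in for a pluggable metadata back end. */
interface PathStoreSketch {
  void put(String path, long length);

  Optional<Long> get(String path);

  void delete(String path);
}

/** In-memory "reference" style implementation, for testing only. */
final class InMemoryPathStoreSketch implements PathStoreSketch {
  private final Map<String, Long> entries = new ConcurrentHashMap<>();

  @Override
  public void put(String path, long length) {
    entries.put(path, length);
  }

  @Override
  public Optional<Long> get(String path) {
    return Optional.ofNullable(entries.get(path));
  }

  @Override
  public void delete(String path) {
    entries.remove(path);
  }
}

/** A contract check that any back end is expected to pass. */
final class PathStoreContract {
  private PathStoreContract() {
  }

  /** True if the store shows read-after-write and read-after-delete. */
  static boolean holdsFor(PathStoreSketch store) {
    store.put("/test/file", 42L);
    boolean readAfterWrite = store.get("/test/file").equals(Optional.of(42L));
    store.delete("/test/file");
    boolean readAfterDelete = !store.get("/test/file").isPresent();
    return readAfterWrite && readAfterDelete;
  }
}
```

Because the contract is written against the interface only, a new back end can be checked by passing it to the same method.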




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-02-08 Thread Sameer Choudhary (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858755#comment-15858755
 ] 

Sameer Choudhary commented on HADOOP-13345:
---

Hi,

Today, I attended the talk on the project at Spark Summit 2017. Thanks for 
putting in all the effort!

I have a question regarding the pricing of DynamoDB. It charges by read/write 
request rate, so users might have to pay a high price for the consistency 
guarantees. This would especially affect large Spark jobs with many tasks 
executing in parallel that all read from and write to DynamoDB. Adding 
throttling would affect job performance. Some benchmarks here would be great.

A solution could be for S3Guard to additionally support a custom key-value 
store such as Apache HBase, which supports strictly consistent reads/writes. A 
user can create a separate cluster or use the same Spark cluster to set up the 
store. The benefit of this approach is that users can achieve high throughput 
even on large Spark jobs while paying just a fraction of the cost.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829973#comment-15829973
 ] 

Steve Loughran commented on HADOOP-13345:
-

One thing everyone needs to keep an eye on is HADOOP-14000 : support for 
millions of files. This'll require big changes in the current 
{{DirListingMetadata}} class, as well as hooking up all the S3aFS list* 
operations




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829778#comment-15829778
 ] 

Steve Loughran commented on HADOOP-13345:
-

FYI, I've just cherry-picked in HADOOP-13999, "Add -DskipShade maven profile to 
disable jar shading to reduce compile time"

I/we will probably need to roll this back before the next trunk merge; until 
then it lets us build this branch without waiting 6+ minutes for the shade




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-18 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828156#comment-15828156
 ] 

Steve Loughran commented on HADOOP-13345:
-

I've created a JIRA to explicitly track the things people think are needed for 
the first preview/merge: HADOOP-13998

I want the version marker in, as we mustn't let this get into the field without 
one: otherwise the whole version marker logic is crippled from the outset.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-10 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816491#comment-15816491
 ] 

Mingliang Liu commented on HADOOP-13345:


I re-visited the retry logic and think it's not a needed item for basic list 
consistency. I can think of a few more JIRAs like better request throttling 
handling in DynamoDB, failure detection and recovery between writing to S3 and 
S3Guard etc; I think they can be post merge subtasks. Thanks,




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-10 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815897#comment-15815897
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

+1 merging in trunk.. Works for me [~ste...@apache.org]




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15814651#comment-15814651
 ] 

Steve Loughran commented on HADOOP-13345:
-

I'd like to merge trunk into this branch (i.e. not a rebase, just a merge). 
Why? I think I need HADOOP-13922 for building/testing spark




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15811874#comment-15811874
 ] 

Steve Loughran commented on HADOOP-13345:
-

-1 to anything working with user:pass in URIs. It's wrong because it 
contaminates the logs with secrets. With HADOOP-13336 you can define per-bucket 
secrets in core-site.xml, or on the command line with {{-D 
fs.s3a.bucket.stevel-new.aws.secret.id=mykey}}; they'll get picked up in the FS 
instance.

This means that the bucket URI will be needed for setup, but it doesn't have to 
involve instantiating the FS, just: 

{code}
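// The bucket name comes straight from the URI; no FS instance is needed.
String bucket = new URI(fsName).getHost();
// Overlay the per-bucket options (HADOOP-13336) onto the base configuration.
Configuration conf = propagateBucketOptions(origConf, bucket);
Class<? extends S3ClientFactory> s3ClientFactoryClass = conf.getClass(
  S3_CLIENT_FACTORY_IMPL, DEFAULT_S3_CLIENT_FACTORY_IMPL,
  S3ClientFactory.class);
AmazonS3 s3Client = ReflectionUtils.newInstance(s3ClientFactoryClass, conf)
  .createS3Client(name, uri);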
{code}
That's all. But we will need that bucket name if there's any need to go near 
bucket-related options.
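
To make the per-bucket idea concrete, here is a rough sketch that uses a plain {{Map}} in place of a Hadoop {{Configuration}}. It is not the HADOOP-13336 implementation: the class name {{BucketOptions}} and the method {{propagate}} are hypothetical, and the only assumption taken from the discussion is the {{fs.s3a.bucket.<name>.}} prefix convention.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; a Map stands in for a Hadoop Configuration. */
final class BucketOptions {
  private BucketOptions() {}

  /**
   * Returns a copy of source in which every fs.s3a.bucket.BUCKET.KEY entry
   * has also been written to fs.s3a.KEY, overriding the generic value.
   * The source map itself is left untouched.
   */
  static Map<String, String> propagate(Map<String, String> source,
                                       String bucket) {
    // The trailing dot stops "stevel" from matching "stevel-new".
    String prefix = "fs.s3a.bucket." + bucket + ".";
    Map<String, String> result = new HashMap<>(source);
    for (Map.Entry<String, String> e : source.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        String generic = "fs.s3a." + e.getKey().substring(prefix.length());
        result.put(generic, e.getValue());
      }
    }
    return result;
  }
}
```

The bucket name passed in would be the host part of the filesystem URI, for example {{URI.create("s3a://stevel-new/data").getHost()}}.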




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806403#comment-15806403
 ] 

Mingliang Liu commented on HADOOP-13345:


Yeah, both points make sense to me. I'll get my hands dirty by working 
on that early; the code might not go into the feature branch before the merge 
to trunk. For the 2nd point, I was thinking of rename, which uses the copy 
operation. But I was not sure whether GET after copy is consistent. Will check 
that as well.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-05 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802751#comment-15802751
 ] 

Mingliang Liu commented on HADOOP-13345:


Hi [~eddyxu],

You're right that in the current code users have to specify the defaultFS (via 
configuration file or the -fs option on the command line) to operate on the DDB 
metadata store directly. The s3 URI is used to create the AmazonS3 object along 
with the credential object (for S3 and DDB).
# The AmazonS3 client, which is used for detecting the bucket region, is able 
to operate on any bucket; creating such an object does not bind it to any 
specific bucket.
# As to the credentials in the URI (e.g. s3://user:pass@bucket/), they're 
optional and deprecated. This pattern is not supported in DDB. However, the 
DDBClientFactory itself uses the same {{createAWSCredentialProviderSet}} as 
S3ClientFactory does, so it honors the creds in the URI name. The reason it's 
not yet supported is that after {{FS#initialization}}, S3AFS has stripped the 
creds and returns the {{scheme://host}}-only URI for creating a MetadataStore. 
One possible fix is to pass the name URI, which contains the creds, to 
S3Guard#getMetadataStore().
{code:title=S3AFileSystem#initialize()}
-  metadataStore = S3Guard.getMetadataStore(this);
+  metadataStore = S3Guard.getMetadataStore(this, name);
{code}
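
The stripping described in the second point can be illustrated with {{java.net.URI}} alone. This is only a sketch of the effect, not the S3A code: {{UriSecrets}} and its methods are hypothetical names. It shows why a store that only ever sees the {{scheme://host}} form cannot recover the credentials.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper showing what survives credential stripping. */
final class UriSecrets {
  private UriSecrets() {}

  /** Returns the user-info part ("user:pass"), or null when absent. */
  static String userInfo(URI name) {
    return name.getUserInfo();
  }

  /** Rebuilds the URI as scheme://host, dropping user info and path. */
  static URI stripCredentials(URI name) {
    try {
      return new URI(name.getScheme(), null, name.getHost(), -1,
          null, null, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot rebuild " + name, e);
    }
  }
}
```

Calling {{stripCredentials}} on {{s3a://user:pass@bucket/dir/file}} leaves {{s3a://bucket}}, and {{userInfo}} on that result is null.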

For command line operations, I think fs.defaultFS is a basic config for users 
and specifying s3://bucket does not seem burdensome. But still, we can remove 
this constraint.
# Option 1: The DDB table name has to be specified via configuration, and we 
assume the bucket name is the DDB table name if the defaultFS is not provided 
(or it's not S3). To determine the region of the bucket, we still assume the S3 
bucket (whose name is the same as the DDB table name) does exist, and 
AmazonS3.getBucketLocation will provide the value.
# Option 2: The DDB table name and endpoint have to be specified via 
configuration. We can determine the DDB region from the DDB endpoint. This way, 
we don't have to know the related S3 bucket for the DDB metadata store to 
operate.

I prefer the 2nd approach. I'm not sure that both options work, but I can work 
on a WIP patch soon; or, as you suggested, we can support this later.
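
For Option 2, deriving the region from the endpoint could look roughly like the sketch below. It assumes the standard public endpoint form {{dynamodb.<region>.amazonaws.com}} and returns an empty result for anything else (local test endpoints, other partitions). {{DdbEndpoints}} and {{regionOf}} are hypothetical names, not part of any patch or SDK.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: guess the region from a DynamoDB endpoint. */
final class DdbEndpoints {
  private DdbEndpoints() {}

  /**
   * Accepts "dynamodb.REGION.amazonaws.com", optionally prefixed with a
   * scheme such as "https://". Anything else yields Optional.empty().
   */
  static Optional<String> regionOf(String endpoint) {
    String host = endpoint.trim().toLowerCase(Locale.ROOT);
    int scheme = host.indexOf("://");
    if (scheme >= 0) {
      host = host.substring(scheme + 3);
    }
    String[] parts = host.split("\\.");
    boolean standard = parts.length == 4
        && parts[0].equals("dynamodb")
        && parts[2].equals("amazonaws")
        && parts[3].equals("com");
    return standard ? Optional.of(parts[1]) : Optional.empty();
  }
}
```

When the result is empty, the caller would need an explicit region setting instead of guessing.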




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-05 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802338#comment-15802338
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Thanks [~ste...@apache.org], this is good stuff.  Sounds like you are adding 
(1) CLI polish / docs / fsck and (2) FS metrics to my list.

In terms of clean integration with S3A code:  I spent a lot of time 
streamlining the S3AFileSystem integration.  One of the main warts IMO is the 
isEmptyDirectory thing, but refactoring that post-merge seems much easier to me 
(and also isn't S3Guard specific per-se).  Anything else actionable we can do 
to address the maintainability point?  Maybe we do another feature branch code 
review with this in mind after we get the above features in?






[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-05 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802302#comment-15802302
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Hi [~cnauroth].  Just to clarify, you can run all the S3A integration tests 
with S3Guard enabled, and the steps are documented in s3guard.md.  I agree 
having better integration / automation is important.  Will discuss on that JIRA.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801890#comment-15801890
 ] 

Chris Nauroth commented on HADOOP-13345:


I would prefer to see HADOOP-13589 completed before merge.  Being able to run 
the existing S3A test suite with S3Guard enabled would help ensure that we're 
maintaining the existing semantics as much as possible as we iterate further on 
the code.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801537#comment-15801537
 ] 

Steve Loughran commented on HADOOP-13345:
-

I'd like to see manageability, correctness and maintainability, especially to 
the extent that we are confident that people can start to play with it and not 
get into a mess.

This means that the CLI is usable, documented and tested. I think we need some 
fsck option to verify that everything matches between DB and FS, and fsck --fix 
to do the corrections.
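
The core of such an fsck check is a two-way comparison of paths. The sketch below assumes both sides have already been listed into sets of path strings; {{FsckSketch}} and its methods are hypothetical names, and a real check would also compare lengths, timestamps and directory flags.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical two-way diff between S3 and the metadata store. */
final class FsckSketch {
  private FsckSketch() {}

  /** Paths present in S3 but unknown to the metadata store. */
  static SortedSet<String> missingFromStore(Set<String> s3Paths,
                                            Set<String> storePaths) {
    SortedSet<String> result = new TreeSet<>(s3Paths);
    result.removeAll(storePaths);
    return result;
  }

  /** Paths recorded in the metadata store but absent from S3. */
  static SortedSet<String> missingFromS3(Set<String> s3Paths,
                                         Set<String> storePaths) {
    SortedSet<String> result = new TreeSet<>(storePaths);
    result.removeAll(s3Paths);
    return result;
  }
}
```

A --fix mode would then add the first set to the store and decide what to do with the second; because S3 itself may be listing stale data, deleting store entries is the riskier of the two corrections.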

Other management? Some metrics, consistent with the rest of the S3A metrics, in 
terms of what they count, and appearing in S3AFileSystem.toString for some 
diagnostics downstream

maintainability: let's look at the code and see if its integration with S3A is 
as clean as we can have it. Once you have things merged into trunk and two 
branches forking again, it becomes much harder to clean up.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-05 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801188#comment-15801188
 ] 

Lei (Eddy) Xu commented on HADOOP-13345:


Hi, [~fabbri], [~liuml07], [~ste...@apache.org].   Happy new year.

[~liuml07], regarding HADOOP-13650 (CLI): for 
{{DynamoDBMetadataStore#initialize(Configuration)}}, does it still require the 
s3 fs name to be defined in the configuration?  I think users might not be 
used to setting {{fs.defaultFS}} to an S3 bucket. It seems that the logic that 
creates AWS credentials from the S3 URL sits deep in the code.  The consequence 
is that even if {{hadoop s3a [init|destroy]}} can directly use a DynamoDB URL 
to create or clear / destroy the metadata store, it still requires the user to 
specify an s3a URL {{s3a://bucket}} to create the FS instance.  It is also 
undesirable for multi-bucket cases, i.e., sharing one DynamoDB table between 
multiple buckets. 

Should we get the current form of the CLI committed first, then change the CLI 
parameters after the feature branch is merged into trunk?

Thanks




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-04 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1577#comment-1577
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Thanks for your input [~liuml07].  The reason I propose doing the retry logic 
after merge is:

1. It will change S3AFileSystem code significantly.  At least in terms of 
dealing with merge conflicts, things like putting a function inside a retry 
loop (increasing the indent of all the code) might mean a good amount of code 
churn.

2. I'm not sure retries are needed for basic list consistency.  The failure 
injection test I added demonstrates that basic listing consistency works.  
Since most S3 GETs are consistent, I would expect use cases like PUT, list, GET 
to work as is.

Let me know if I'm missing something.  Looking forward to others' opinions too. 
 Cheers!






[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-04 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799916#comment-15799916
 ] 

Mingliang Liu commented on HADOOP-13345:


Hi [~fabbri], happy new year!

The list makes sense to me. I also think retry policies will be needed. We can 
1) make it pluggable and 2) provide a simple one; other efforts 
to make the retry policy work better will be appreciated. My current work is 
focused on fixing DDB related bugs/improvements.
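
A pluggable policy with one simple default could be sketched as follows. The names {{RetryDecision}}, {{Retries.call}} and {{Retries.upTo}} are hypothetical and there is no backoff or sleep here; in the real code, Hadoop's existing retry classes would be the more natural starting point.

```java
import java.util.concurrent.Callable;

/** Hypothetical pluggable policy: decide whether to try again. */
interface RetryDecision {
  boolean shouldRetry(int attempt, Exception failure);
}

final class Retries {
  private Retries() {}

  /** Runs the operation until it succeeds or the policy gives up. */
  static <T> T call(Callable<T> operation, RetryDecision policy)
      throws Exception {
    int attempt = 0;
    while (true) {
      attempt++;
      try {
        return operation.call();
      } catch (Exception e) {
        if (!policy.shouldRetry(attempt, e)) {
          throw e; // policy gave up: surface the last failure
        }
      }
    }
  }

  /** The "simple one": allow at most maxAttempts attempts in total. */
  static RetryDecision upTo(int maxAttempts) {
    return (attempt, failure) -> attempt < maxAttempts;
  }
}
```

Passing the operation in as a {{Callable}} keeps the retry loop in one place, so existing method bodies would not need to be re-indented inside a loop.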

Thanks,




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-04 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15799249#comment-15799249
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Happy new year [~ste...@apache.org] and [~liuml07].  I'd like to figure out the 
set of JIRAs we want to resolve before merging the feature branch.  We've 
discussed before targeting basic list consistency for the initial feature set.  
It seems like the main features we need are:

Needed:
- Any DynamoDB bugs (initialization stuff, etc).
- Basic CLI functionality. HADOOP-13650
- Multi-bucket improvements, including read-only buckets.  HADOOP-13876.  I 
think the solution will be to tackle the per-bucket config JIRA HADOOP-13336

Maybe:
- Make sure testing is sorted.  Currently tests are good but a bit manual.  Do 
we want to do HADOOP-13589 pre merge?

Post merge: 
- The rest of the JIRAs.

Let me know what you think.  Should we create a new "s3guard phase 2" umbrella 
JIRA so we can start moving stuff over once we have consensus?





[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2017-01-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795579#comment-15795579
 ] 

Steve Loughran commented on HADOOP-13345:
-

the CLI needs to be able to rebuild its state from the bucket, including 
detecting & logging any inconsistencies.
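
A rough Java sketch of what such a rebuild could look like follows. It is an illustration only, not the S3Guard CLI: `RebuildSketch` and its methods are hypothetical names, and both the bucket listing and the stored state are stood in for by plain sets of path strings.

```java
import java.util.Set;
import java.util.TreeSet;

public class RebuildSketch {

  /** Paths present in the bucket listing but unknown to the metadata store. */
  static Set<String> missingFromStore(Set<String> bucket, Set<String> store) {
    Set<String> diff = new TreeSet<>(bucket);
    diff.removeAll(store);
    return diff;
  }

  /** Paths the metadata store holds that the bucket listing does not show. */
  static Set<String> missingFromBucket(Set<String> bucket, Set<String> store) {
    Set<String> diff = new TreeSet<>(store);
    diff.removeAll(bucket);
    return diff;
  }

  /** Log every inconsistency, then return the bucket listing as the new state. */
  static Set<String> rebuild(Set<String> bucket, Set<String> store) {
    for (String p : missingFromStore(bucket, store)) {
      System.err.println("in bucket but not in store: " + p);
    }
    for (String p : missingFromBucket(bucket, store)) {
      System.err.println("in store but not in bucket: " + p);
    }
    return new TreeSet<>(bucket);
  }
}
```

Treating the bucket as the winner on every mismatch is an assumption of this sketch; a real tool would have to allow for listings that are themselves stale.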




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-12-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735396#comment-15735396
 ] 

Steve Loughran commented on HADOOP-13345:
-

yeah, but the patch has been reverted from trunk as it broke a small bit of 
YARN. Once the final patch is in I'll merge up again.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-12-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733598#comment-15733598
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Looks like you did the commit *and* you also did a merge to update HADOOP-13345 
w/ trunk.  Thanks!




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-12-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15733590#comment-15733590
 ] 

Steve Loughran commented on HADOOP-13345:
-

done




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-12-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730226#comment-15730226
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Sounds good.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-12-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730218#comment-15730218
 ] 

Steve Loughran commented on HADOOP-13345:
-

thanks. I've actually got permission to add to trunk...how about I do that and 
then do another merge of trunk -> HADOOP-13345?




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-12-07 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729951#comment-15729951
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

+1 on adding patch v2 from HADOOP-13852 to the s3guard feature branch 
(HADOOP-13345).




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-12-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728778#comment-15728778
 ] 

Steve Loughran commented on HADOOP-13345:
-

HADOOP-13852 allows Hadoop to declare a different version in its 
version-info than in the POM. This is currently needed to allow Spark to act as 
a regression test for the s3guard work.

I've already got it in the HADOOP-13786 branch, but putting it into the base 
s3guard branch lets me test dynamodb and other patches while the committer 
code is still a WiP.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-11-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15647834#comment-15647834
 ] 

Steve Loughran commented on HADOOP-13345:
-

thanks: sorry for breaking things; it's the price of working against a changing 
part of the codebase. Now that I'm involved in this branch as well, (a) I'll have 
less time to break things on branch-2 and (b) I'll be more aware of what I've 
just broken.

FWIW I don't think you need to rebase here; merging is better for 
collaboration, and avoids the hell of having to fix up some patch conflict 
over a class you know gets deleted later. When this gets pulled into 
trunk/branch-2, it'll be done as a squashed merge, so there's no harm in doing 
merges here rather than rebases.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-11-06 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642759#comment-15642759
 ] 

Mingliang Liu commented on HADOOP-13345:


Thanks!

Will post the DDBMetadatastore patch on the rebased branch soon.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-11-06 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642735#comment-15642735
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

[~liuml07], [~eddyxu], [~steve_l] I just rebased on trunk and force-pushed to 
the HADOOP-13345 feature branch.  You'll need to do the usual 
branch/reset/rebase dance for your outstanding commits.  Shout if you have any 
questions/concerns.







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630005#comment-15630005
 ] 

Steve Loughran commented on HADOOP-13345:
-

+1 for rebasing; I know the S3AFS class is a moving target, but regular resyncs 
will stop things diverging/breaking each other




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-11-02 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629966#comment-15629966
 ] 

Lei (Eddy) Xu commented on HADOOP-13345:


Thanks, [~fabbri]. 

It works for me. Looking forward to it.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-11-02 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629922#comment-15629922
 ] 

Mingliang Liu commented on HADOOP-13345:


That sounds perfect. I think Steve also has some plan about this? Perhaps you 
can sync with him before rebasing. Thanks,




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-11-02 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629911#comment-15629911
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

FYI [~eddyxu], [~liuml07], [~ste...@apache.org], I'd like to rebase the feature 
branch on latest trunk tomorrow morning.  Let me know if this works for you.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-09-09 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478327#comment-15478327
 ] 

Chris Nauroth commented on HADOOP-13345:


Based on the most recent discussion here about simplifying the notion of 
policy, I have resolved as invalid sub-tasks HADOOP-13450 and HADOOP-13451.  I 
expect the previous intended scope of these issues will be small enough that we 
can combine it into the metadata store implementation sub-tasks and possibly a 
few other sub-tasks to track integration back into {{S3AFileSystem}}.




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-09-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468732#comment-15468732
 ] 

Chris Nauroth commented on HADOOP-13345:


Yes, on further reflection, I agree that many of the aspects of policy that I 
laid out earlier (and their corresponding JIRA sub-tasks) can be simplified.  
Aaron's last 2 comments look like a step in the right direction.  I suggest 
that we proceed toward implementing what the design doc describes in "Policy C: 
On-Demand Source of Truth".  A small number of properties like Aaron described 
might influence the exact behavior.  Beyond that, a more complex notion of 
policy might turn out to be overkill.  Some of those JIRA sub-tasks might just 
get closed as invalid.

My immediate next step will be to review the work Aaron did on HADOOP-13448 in 
my absence.





[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-08-31 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453734#comment-15453734
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Thank you for the feedback [~eddyxu].

Let me give an example why I think the {{fullyCached}} or {{isAuthoritative}} 
flag is required for return value from {{MetadataStore.listChildren()}}.

Assume we have an existing s3 bucket that contains these files:

/a/b/file0
/a/b/file1

Now, assume we start up a Hadoop cluster, with s3guard configured for the 
{{MetadataStore}} to be authoritative, and do the following operations:

create(/a/b/file2)
listStatus(/a/b)

In this case we have to query both the MetadataStore, and the s3 backend, as 
/a/b/file2 visibility may be subject to eventual consistency.  Also the 
MetadataStore only knows about /a/b/file2, so the client has to consult s3 to 
learn about file0 and file1.  In the listStatus() above, 
{{MetadataStore.listChildren(/a/b)}} will return {{(("/a/b/file2"), 
isAuthoritative=false)}}, since the MetadataStore did not get a {{put()}} with 
{{isAuthoritative=true}}, nor did it see a {{mkdir(/a/b)}} happen.

Two examples where {{MetadataStore.listChildren()}} would return a result with 
{{isAuthoritative=true}}:

1. 
mkdir(/a/b/c)
create(/a/b/c/fileA)
listStatus(/a/b/c)

Here, since the metadata store saw the creation of /a/b/c, it knows that it has 
observed all creations and deletions inside the /a/b/c directory.

2. Extending the original example:

Existed before cluster startup:
/a/b/file0
/a/b/file1

Then with cluster, we see:
create(/a/b/file2)
listStatus(/a/b)
listStatus(/a/b)

The first call to listStatus(/a/b) will have to fetch the full directory 
contents from s3 since {{MetadataStore.listChildren(/a/b)}} will return 
{{isAuthoritative=false}}.  Once the client gets the full listing from s3, it 
can call {{MetadataStore.put(('/a/b/file0', '/a/b/file1', '/a/b/file2'), 
isAuthoritative=true)}}.  *Now*, the MetadataStore has been told it has the full 
contents of /a/b, and the second call to listStatus(/a/b) above will see the 
MetadataStore return {{(('/a/b/file0', '/a/b/file1', '/a/b/file2'), 
isAuthoritative=true)}}.
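
The flow in the examples above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the S3Guard implementation: `Listing`, `InMemoryStore`, `putChild`, `putListing` and `listStatus` are hypothetical names, and the S3 backend is stood in for by a plain set of paths whose listing is assumed to be complete when read.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AuthoritativeListingSketch {

  /** Children of a directory plus a flag saying whether the set is complete. */
  static final class Listing {
    final Set<String> children;
    final boolean isAuthoritative;

    Listing(Set<String> children, boolean isAuthoritative) {
      this.children = children;
      this.isAuthoritative = isAuthoritative;
    }
  }

  /** Hypothetical in-memory stand-in for a MetadataStore. */
  static final class InMemoryStore {
    private final Map<String, Listing> dirs = new HashMap<>();

    /** Record one created child; an unknown directory stays non-authoritative. */
    void putChild(String dir, String child) {
      Listing old = listChildren(dir);
      Set<String> children = new TreeSet<>(old.children);
      children.add(child);
      dirs.put(dir, new Listing(children, old.isAuthoritative));
    }

    /** Record a whole listing, optionally marking it as complete. */
    void putListing(String dir, Set<String> children, boolean isAuthoritative) {
      dirs.put(dir, new Listing(new TreeSet<>(children), isAuthoritative));
    }

    /** Directories never seen before yield an empty, non-authoritative result. */
    Listing listChildren(String dir) {
      Listing l = dirs.get(dir);
      return l != null ? l : new Listing(new TreeSet<>(), false);
    }
  }

  /** Counts how often the (simulated) S3 listing had to be consulted. */
  static int s3ListCalls = 0;

  static Set<String> listStatus(String dir, InMemoryStore store, Set<String> s3Children) {
    Listing cached = store.listChildren(dir);
    if (cached.isAuthoritative) {
      return new TreeSet<>(cached.children);   // no S3 round trip needed
    }
    s3ListCalls++;                             // fall back to S3 and merge
    Set<String> merged = new TreeSet<>(s3Children);
    merged.addAll(cached.children);
    store.putListing(dir, merged, true);       // store now has full contents
    return merged;
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    Set<String> s3 = new TreeSet<>(Arrays.asList("/a/b/file0", "/a/b/file1"));
    store.putChild("/a/b", "/a/b/file2");                // create(/a/b/file2)
    System.out.println(listStatus("/a/b", store, s3));   // consults S3, then caches
    System.out.println(listStatus("/a/b", store, s3));   // served from the store
    System.out.println("S3 list calls: " + s3ListCalls); // prints "S3 list calls: 1"
  }
}
```

The sketch ignores deletes and the possibility that the S3 listing is itself stale, both of which a real client would have to handle before marking a directory authoritative.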





[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-08-31 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453539#comment-15453539
 ] 

Lei (Eddy) Xu commented on HADOOP-13345:


Hi, [~fabbri] thanks for these great suggestions here. 

One question here is:
* Can we consider {{fullycache.directories == true iff 
metadatastore.allow.authoritative == true}}?  If we combine them together, case 
2 of {{fullycache.directories}} should not happen. 

bq. as the MetadataStore will always return results marked as non-authoritative.
If we have this flag, we might not need to mark results as well. 

So I think code like the following can make things simpler:

{code}
List subFiles = metadataStore.get(path);
if (!metadataStore.isAuthoritative()) {
  // The store cannot vouch for a complete listing, so consult S3 as well.
  List s3Files = s3.listDir(path);
  // merge subFiles and s3Files...
}
{code}

What do you think?




[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-08-23 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434139#comment-15434139
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Having the MetadataStore interface is an important first step for us to 
parallelize our effort here.  Thanks again Chris for getting that first patch 
out.  

I still have questions about the subtasks though. There is still some fuzziness 
with respect to the policy part.  (We may want to have a conf. call to 
discuss--and I'm open tomorrow.)

I've been thinking about policy a little and I believe:

- Allowing MetadataStore implementations to opt in/out of being source of truth 
is important.  Implementations may wish to opt out based on implementation 
complexity, or lack of transactions for underlying store, or policy (LRU 
discard).

- Allowing the client to opt out of relying on MetadataStore as source of truth 
is also desirable.  Workloads that add files outside of hadoop, for example.  
And opting out is less risky while we stabilize the codebase.

This implies some configuration parameters (ignoring the naming for now--I 
assume a future where this is factored out of s3a for any FS client to utilize)

fs..metadatastore.allow.authoritative
- If true, allow configured metadata store (if any) to be source of truth on 
cached file metadata and directory listings.
- If true, but configured metadata store does not support being authoritative, 
this setting will have no effect,
  as the MetadataStore will always return results marked as non-authoritative.

fs..metadatastore.class
- Configure which MetadataStore implementation to use, if any.
- This may replace fs.s3a.s3guard.enabled proposed in doc?

fs.metadatastore..fullycache.directories
- If the metadata store implementation supports being authoritative on 
directory listings, this will cause it 
  to return DirectoryListMetadata (name tbd) results with fullyCached=true when 
it has complete directory 
  listing.
- If metadata store implementation does not support this, it should log an 
error.  Client will work correctly
  as implementation will never claim to fully cache listings / PathMetadata.

We could name this authoritative.directories instead.. We could 
also add an analogue for files:  ...authoritative.files as well.  In 
my prototype I assumed get() on a single Path could always be authoritative.  I 
could go either way.
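
How the client-side setting and the store-side flag might combine can be sketched as below. This is only an illustration of the rule described above: `canSkipS3` is a hypothetical helper, and its two boolean parameters stand in for the `allow.authoritative` setting and the `fullyCached` flag on a returned listing.

```java
public class AuthoritativePolicySketch {

  /**
   * A cached listing may be used without consulting S3 only when the client
   * has opted in AND the store marked this particular result as complete.
   */
  static boolean canSkipS3(boolean allowAuthoritative, boolean resultFullyCached) {
    return allowAuthoritative && resultFullyCached;
  }

  public static void main(String[] args) {
    // Client opted in, but the store never claims authority: no effect.
    System.out.println(canSkipS3(true, false));  // false
    // Store has the full listing, but the client opted out.
    System.out.println(canSkipS3(false, true));  // false
    // Both agree: the cached listing is the source of truth.
    System.out.println(canSkipS3(true, true));   // true
  }
}
```

With this shape, a store that cannot be authoritative needs no special casing in the client; it simply never sets the flag.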

Thoughts?







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-08-10 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415653#comment-15415653
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Thanks [~cnauroth] for updating the doc.  I was out last week.  There are some 
minor sections that need reconciliation (text from your doc says always use s3 
as source of truth, text from our doc says make it configurable but shoot for 
support for directory caching), but overall it is very good.

I'll comment on HADOOP-13448 with some questions I have related to 
MetadataStore interface design in your patch here.









[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-08-01 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403080#comment-15403080
 ] 

Mingliang Liu commented on HADOOP-13345:


Thanks for updating the design doc and creating the sub-tasks. I suggest we 
raise the sub-tasks' priority to "Major", since that is what they are.







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-26 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394627#comment-15394627
 ] 

Lei (Eddy) Xu commented on HADOOP-13345:


Thanks a lot, [~cnauroth]!







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-26 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394590#comment-15394590
 ] 

Chris Nauroth commented on HADOOP-13345:


I have created the HADOOP-13345 feature branch and JIRA fix version.







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-26 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394565#comment-15394565
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Thanks [~cnauroth].  Really cool to compare two independently designed 
solutions.  Thanks for the GCS link, I'll check that out.

I agree we should proceed and collaborate on this.  Feature branch sounds good.

{quote}
The main difference I see is that my work focused more on consistency, with the 
S3 bucket still treated as source of truth, and your work focused more on 
performance. I hadn't tried anything with the DynamoDB lookup completely 
short-circuiting the S3 lookup. I think we can reconcile this though.
{quote}

We tried to make the {{MetadataStore}} interface expressive enough to allow 
implementations (both the MetadataStore impl. and the client code that uses it) 
to decide on whether or not the {{MetadataStore}} can be source of truth on 
directory listings:

- Our {{MetadataStore#listStatus(Path)}} returns a {{CachedDirectory}} which 
contains a flag {{isFullyCached}}.   Implementations may always set that flag 
to false, indicating that the client needs to consult the backing storage as 
well.

- If a client connector wishes to take advantage of the performance benefits, 
it can publish full directory listings to the {{MetadataStore}} via 
{{putListStatus()}} with {{isFullyCached=true}}, and also note the 
{{isFullyCached}} flags on the return values from {{listStatus()}}.  If a 
client connector does not want to deal with two possible sources of truth (e.g. 
to simplify failure cases), it can choose not to publish full listings to the 
{{MetadataStore}}, and to ignore any {{isFullyCached}} flags that are set on 
return from {{MetadataStore#listStatus()}}.
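
A rough sketch of how a client connector might act on the flag, assuming a simplified {{CachedDirectory}} that holds entry names rather than {{FileStatus}} objects. The class {{FullyCachedListingSketch}}, the {{honorFullyCached}} parameter, and the {{Supplier}} standing in for the backing store are all hypothetical and not taken from the patch. Merging is a plain union here; a real client would also have to account for deletes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical sketch of the flag handling; not the code from the patch. */
class FullyCachedListingSketch {

    /** Simplified stand-in for CachedDirectory: names instead of FileStatus. */
    static final class CachedDirectory {
        final Set<String> entries;
        final boolean isFullyCached;

        CachedDirectory(Collection<String> entries, boolean isFullyCached) {
            this.entries = new TreeSet<>(entries);
            this.isFullyCached = isFullyCached;
        }
    }

    /**
     * Lists a directory. The backing store is consulted unless the client
     * honors the flag and the cached listing claims to be complete.
     */
    static Set<String> listStatus(CachedDirectory cached,
                                  Supplier<List<String>> backingStore,
                                  boolean honorFullyCached) {
        if (honorFullyCached && cached != null && cached.isFullyCached) {
            // Metadata store is treated as the source of truth for this listing.
            return new TreeSet<>(cached.entries);
        }
        // Otherwise the backing store is listed and merged with cached entries.
        Set<String> merged = new TreeSet<>(backingStore.get());
        if (cached != null) {
            merged.addAll(cached.entries);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Pretend the backing store listing is stale and does not show "c" yet.
        Supplier<List<String>> s3 = () -> Arrays.asList("a", "b");
        CachedDirectory full =
            new CachedDirectory(Arrays.asList("a", "b", "c"), true);

        System.out.println(listStatus(full, s3, true));   // cache only
        System.out.println(listStatus(full, s3, false));  // backing store + cache
    }
}
```

A connector that sets {{honorFullyCached}} to false never depends on the metadata store being complete, which is the simpler failure model described above, at the cost of always paying for the backing store listing.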









[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-25 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392892#comment-15392892
 ] 

Chris Nauroth commented on HADOOP-13345:


[~ajfabbri] and [~eddyxu], thank you for sharing your work.  I see a lot of 
commonality between the 2 efforts so far.

I also explored something like the Layered FileSystem vs. Pluggable Metadata 
trade-off that you described.  Specifically, I had an earlier prototype (not 
attached) that was a {{FilterFileSystem}}, with the intent that you could layer 
it over any other {{FileSystem}} implementation.  I abandoned this idea when I 
got into implementation and found that I was going to need to coordinate more 
directly with the S3A logic in a way that wasn't amenable to overriding 
{{FileSystem}} methods.  For example, I needed special case logic around 
{{createFakeDirectoryIfNecessary}}.  It looks like you came to the same 
conclusion in your patch.

The main difference I see is that my work focused more on consistency, with the 
S3 bucket still treated as source of truth, and your work focused more on 
performance.  I hadn't tried anything with the DynamoDB lookup completely 
short-circuiting the S3 lookup.  I think we can reconcile this though.  Like 
you said, we can support configurable policies for different use cases.  For 
example, if a user is willing to commit to performing all access through S3A 
and no external tools, then I expect it's safe for them to turn on a more 
aggressive caching policy that satisfies all metadata lookups from DynamoDB.  
Alternatively, there can be a fix-up tool like you described.  This might fold 
into HADOOP-13311, where I proposed a new shell entry point for S3A-specific 
administration commands.

Another interesting example in this area is GCS, which has something like the 
policies we are describing in terms of their {{DirectoryListCache}}.  This 
includes an implementation like the in-memory one included in your patch.

https://github.com/GoogleCloudPlatform/bigdata-interop/blob/1447da82f2bded2ac8493b07797a5c2483b70497/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/InMemoryDirectoryListCache.java

The JavaDocs advertise it as providing consistency within a single process.  
Like you said, there is no cache coherence across processes.

[HADOOP-12876|https://issues.apache.org/jira/browse/HADOOP-12876] is slightly 
related.  Azure Data Lake has implemented an in-memory {{FileStatus}} cache 
(patch not yet available).  When this idea was suggested, I raised the concern 
about cache coherence, but system testing with that caching enabled has gone 
well.  That's a good sign that the cache coherence problem might not cause much 
harm to applications in practice.  I had been thinking the HADOOP-12876 work 
could eventually be refactored to hadoop-common for any {{FileSystem}} to use, 
effectively becoming something like the "dentry cache" of Hadoop.  I had been 
thinking this could happen independent of S3Guard.  We can explore further if 
that makes sense, or if it's really beneficial to push the caching lower into 
S3A itself.  (Some of the internal S3 listing calls don't map exactly to 
{{FileSystem}} method calls.)

To summarize though, I see more commonality than difference, so I'd like to 
proceed with collaborating on this.  I'd start by creating a feature branch and 
folding all of the information into a shared design doc.  Please let me know 
your thoughts.







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-23 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390915#comment-15390915
 ] 

Mingliang Liu commented on HADOOP-13345:


Thanks for addressing the comments, [~cnauroth]. I like the design and look 
forward to its addressing the important S3A consistency problem in Hadoop.

{quote}
Are you recommending this based on the fact that the AWS SDK JavaDocs for 
withKeyConditions describe it as a "legacy parameter", or is there something 
more to it? This is my first time working with DynamoDB, so I'm learning as I 
go.
{quote}
Sorry, I don't have solid evidence for this; it stems from my personal 
experience when I worked for DynamoDB. A key condition expression is a newer 
and easier way to specify key conditions, using expression-style syntax. I 
searched and found the official AWS blog 
[here|https://aws.amazon.com/blogs/aws/dynamodb-update-improved-json-editing-key-condition-expressions/].
 I then realized that the conditions of DDB requests are not that complex in 
the current design. I now believe it's minor and can be improved later.







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-23 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390848#comment-15390848
 ] 

Chris Nauroth commented on HADOOP-13345:


[~liuml07], thank you for your feedback.

bq. Besides the metrics2, do you plan to support statistics (as subclass of 
StorageStatistics probably)?

I hadn't considered it, but yes, I think we can investigate adding statistics 
specific to S3Guard's implementation.

bq. As there is no limit to the number of objects that can be stored in an S3 
bucket, the S3 bucket may be very large. In this case, the consistency check 
requests may go to a single DynamoDB table. S3Guard may suffer from the low 
capacity units (read and write). To avoid this, the customers need to monitor 
and provision the table throughput. I suggest we consider this as the third 
"potential drawback" when using DynamoDB as a consistency store. See page 11 of 
design doc. I think using namenode should work just fine regarding the 
operation overhead.

Management of provisioned throughput is an additional source of operational 
complexity, but I failed to call that out specifically in the first revision of 
the document.  I'll add it in the next revision.

bq. For fs.s3a.s3guard.fail.on.error, the default value is false, but it should 
be true as indicated by the config key description. I believe this is an omission.

Yes, thank you for catching it.

bq. As to the exponential back-off strategy for recheck, will the jitter be 
helpful?

Yes, FWIW, I consider jitter important enough that it should be a part of any 
exponential back-off implementation.  I tend to think of it as implicit 
whenever anyone uses the phrase "exponential back-off", but that's not 
necessarily true, so I'll state it explicitly in the next revision.
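
A minimal sketch of exponential back-off with jitter, for concreteness. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and the capped exponential ceiling. The class name, method names, and the base and cap values are hypothetical and are not taken from the prototype patch.

```java
import java.util.Random;

/** Hypothetical sketch of capped exponential back-off with full jitter. */
class JitteredBackoffSketch {

    /**
     * Upper bound for the given attempt: min(cap, base * 2^attempt).
     * Assumes non-negative inputs and a cap far below Long.MAX_VALUE / 2.
     */
    static long ceilingMillis(long baseMillis, long capMillis, int attempt) {
        long ceiling = Math.min(baseMillis, capMillis);
        for (int i = 0; i < attempt && ceiling < capMillis; i++) {
            ceiling = Math.min(capMillis, ceiling * 2);
        }
        return ceiling;
    }

    /** Full jitter: a delay drawn uniformly from [0, ceiling]. */
    static long delayMillis(long baseMillis, long capMillis, int attempt,
                            Random rng) {
        long ceiling = ceilingMillis(baseMillis, capMillis, attempt);
        long delay = (long) (rng.nextDouble() * (ceiling + 1));
        return Math.min(ceiling, delay); // guard against rounding up
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt
                + ": ceiling=" + ceilingMillis(100, 5000, attempt)
                + "ms, delay=" + delayMillis(100, 5000, attempt, rng) + "ms");
        }
    }
}
```

Without the random draw, clients that fail at the same moment retry at the same moments too; spreading each delay across the whole interval avoids those synchronized bursts.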

bq. I think we can also discuss, in the design doc, the ConsistentStore methods 
that a consistent store should implement, plus the DynamoDB table schema/index 
design. I saw in the code there is discussion about alternative schema ideas, 
which is helpful.

Yes, I can fold this information into the design document.  There is a balance 
to strike as I expect some of these aspects to evolve during implementation, 
which risks invalidating an overly prescriptive upfront design document.  We'll 
figure out that balance as we go.

bq. DescendantsIterator.java claims to implement pre-order depth-first 
traversal (DFS) of a path and all of its descendants recursively. The example 
given was actually a breadth-first traversal (BFS). I checked the code and 
think that it did implement a BFS, which conforms with the example. I think 
this is an omission in the javadoc.

I'll need to revisit this, because I think I actually have a bug in here right 
now.  My intent was to match the iteration order as would be seen through the 
S3 object listings performed inside {{S3AFileSystem}} during recursive deletes 
and renames.  I believed matching the iteration order would make it easier to 
reason about failure modes.  However, I now realize that it's almost never 
going to match up exactly anyway, because S3 won't have a key for every 
intermediate directory, but I expect DynamoDB will.
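
To pin down the two orders being discussed, here is a small sketch that walks a hypothetical directory tree both ways. It is not the {{DescendantsIterator}} code; the class, the methods, and the sample paths are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch contrasting pre-order DFS with BFS on a small tree. */
class TraversalOrderSketch {

    /** Parent path -> child paths, for a made-up tree. */
    static Map<String, List<String>> sampleTree() {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/", Arrays.asList("/a", "/b"));
        tree.put("/a", Arrays.asList("/a/1", "/a/2"));
        tree.put("/b", Arrays.asList("/b/1"));
        return tree;
    }

    /** Pre-order DFS: visit a node, then each child's whole subtree in turn. */
    static List<String> depthFirstPreOrder(Map<String, List<String>> tree,
                                           String root) {
        List<String> order = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            order.add(node);
            List<String> kids =
                tree.getOrDefault(node, Collections.<String>emptyList());
            // Push in reverse so the first child is popped first.
            for (int i = kids.size() - 1; i >= 0; i--) {
                stack.push(kids.get(i));
            }
        }
        return order;
    }

    /** BFS: visit all nodes at one depth before any node at the next depth. */
    static List<String> breadthFirst(Map<String, List<String>> tree,
                                     String root) {
        List<String> order = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            order.add(node);
            queue.addAll(
                tree.getOrDefault(node, Collections.<String>emptyList()));
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = sampleTree();
        System.out.println("DFS: " + depthFirstPreOrder(tree, "/"));
        System.out.println("BFS: " + breadthFirst(tree, "/"));
    }
}
```

On the sample tree, the pre-order depth-first walk visits /, /a, /a/1, /a/2, /b, /b/1, while the breadth-first walk visits /, /a, /b, /a/1, /a/2, /b/1.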

bq. In DynamoDBConsistentStore#listChildren(), we can use key condition 
expression instead of key conditions in pathToParentEq for the query request.

Are you recommending this based on the fact that the AWS SDK JavaDocs for 
{{withKeyConditions}} describe it as a "legacy parameter", or is there 
something more to it?  This is my first time working with DynamoDB, so I'm 
learning as I go.

All of the other code suggestions look great to me.  Thanks!








[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-23 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15390836#comment-15390836
 ] 

Chris Nauroth commented on HADOOP-13345:


Everyone, thank you for reviewing.  There is a lot of helpful feedback here.  
I'm going to respond to everyone's points over the next several days and then 
fold that into another revision of the design document.







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-14 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378372#comment-15378372
 ] 

Aaron Fabbri commented on HADOOP-13345:
---

Thanks for posting [~cnauroth].  Very cool stuff. I'm interested in helping.

[~eddyxu] and I have also been working on a similar effort called S3 
Consistency (S3C)  and want to share it here so we can compare designs.  I will 
attach a design doc and patch of what we did.

Some interesting differences, from first glance:

- You are a little further along.   We only started this a month or so ago, 
part time.  Our DynamoDB back-end is not finished.  We wrote the design doc 
then started prototyping.   Prototyping was valuable experience.  Devil is in 
the details, as they say.
- Our prototype (S3C) treats performance as a primary goal.  There are some 
interesting tradeoffs.  We demonstrated a significant performance improvement 
with our "fully cached directory" concept (see the design doc), but having a 
non-S3 source of truth makes failure handling trickier. Ultimately, I'd love 
the policy to be configurable.
- Really interesting to compare your {{ConsistentStore}} with our 
{{MetadataStore}} interface.  Look forward to discussions on this, and the 
"caching policy" as I call it (i.e. source of truth policy).
- Our code separation at this point needs work.  We intended to refactor.  
{{S3AFileSystem}} could use that in general. We wanted to keep 
{{MetadataStore}} separate from {{s3a}}, so it wouldn't be too hard to pull it 
out of {{s3a}} and use it for other storage connectors.  We debated doing a 
separate {{FileSystem}} wrapper (discussed in design doc).  We also thought 
about subclassing {{S3AFileSystem}} as you did (might be brittle to future 
change, but your separation is much better).  At this point, though, the code 
served to make us aware of the details we need to conquer in the 
{{MetadataStore}} interface as well as feasibility of using it as a source of 
truth, adaptively (Policy C in design doc).
- Having a local implementation of the {{MetadataStore}} has been fun.  The 
{{TestLocalMetadataStore}} unit tests would probably become a general contract 
test of the interface.  Intention is for local unit testing, but it happens to 
speed up something like {{hadoop fs -copyFromLocal /my/dir/tree s3a://bucket}} 
by over 2x in my quick tests.  In addition to the warning in the class comment, 
it needs a log.WARN "this is not for production use" or something.  I'm 
concerned about people turning it on for performance without realizing that it 
has zero cross-node coherency.

I'm on vacation next week but wanted to send out what we have.  I'm hoping we 
can help you with this and combine the best of both efforts.









[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-14 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378360#comment-15378360
 ] 

Mingliang Liu commented on HADOOP-13345:


Thanks, [~cnauroth] for the design doc and prototype patch. I like the proposal.

h6. Design doc:
# Besides the metrics2, do you plan to support statistics (as subclass of 
{{StorageStatistics}} probably)?
# As there is no limit to the number of objects that can be stored in an S3 
bucket, the S3 bucket may be very large. In this case, the consistency check 
requests may go to a single DynamoDB table. S3Guard may suffer from the low 
capacity units (read and write). To avoid this, the customers need to monitor 
and provision the table throughput. I suggest we consider this as the third 
"potential drawback" when using DynamoDB as a consistency store. See page 11 of 
design doc. I think using namenode should work just fine regarding the 
operation overhead.
# For {{fs.s3a.s3guard.fail.on.error}}, the default value is false, but it 
should be true as indicated by the config key description. I believe this is 
an omission.
# As to the exponential back-off strategy for recheck, will the jitter be 
helpful? I referred to https://www.awsarchitectureblog.com/2015/03/backoff.html.
# I think we can also discuss, in the design doc, the {{ConsistentStore}} 
methods that a consistent store should implement, plus the DynamoDB table 
schema/index design. I saw in the code there is discussion about alternative 
schema ideas, which is helpful.

h6. The patch:
# {{DescendantsIterator.java}} claims to implement pre-order depth-first 
traversal (DFS) of a path and all of its descendants recursively. The example 
given was actually a breadth-first traversal (BFS). I checked the code and 
think that it did implement a BFS, which conforms with the example. I think 
this is an omission in the javadoc.
# In {{DynamoDBConsistentStore#initTable()}}, perhaps we can call 
{{dynamodb.getTable(tableName).waitForActiveOrDelete()}} instead of sleeping 
and polling manually.
# I think we can use the DynamoDB document API (refer to 
[here|http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/document/DynamoDB.html]).
 For example, if we use the @ThreadSafe {{Table}} class, we can avoid directly 
operating the {{AmazonDynamoDBClient}} object and setting the table name for 
each request.
# For {{DynamoDBConsistentStore#get()}}, do we need to set *ConsistentRead* to 
true for the {{getItem()}} request?
# In {{DynamoDBConsistentStore#listChildren()}}, we can use key condition 
expression instead of key conditions in {{pathToParentEq}} for the query 
request.

h6. Nits:
# I understand that the {{fs.s3a.s3guard.store.table.name.prefix}} was not used 
yet in the patch.
# {{S3AFileSystem#awsConf}} can be final?
# In {{DescendantsIterator#hasNext()}} the statement {{\!(stack.isEmpty() && 
!children.hasNext());}} can be simplified to {{\!stack.isEmpty() || 
children.hasNext();}}. It's simpler to me.
# I may need to read the {{DescendantsIterator#move()}} and related helper 
methods carefully, but it'd be helpful if we can add some javadoc stating that 
in DynamoDB, we can not update the key schema attributes. We need to delete and 
put a new item for key changes.
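
The simplification in the {{hasNext()}} nit is an application of De Morgan's law: {{!(a && !b)}} is equivalent to {{!a || b}}. A quick exhaustive check follows, with plain booleans standing in for {{stack.isEmpty()}} and {{children.hasNext()}}; the class and method names are hypothetical.

```java
/** Hypothetical check that the two hasNext() conditions are equivalent. */
class HasNextSimplificationSketch {

    /** The condition as written in the patch: !(empty && !hasNext). */
    static boolean original(boolean stackEmpty, boolean childrenHasNext) {
        return !(stackEmpty && !childrenHasNext);
    }

    /** The suggested form after applying De Morgan's law. */
    static boolean simplified(boolean stackEmpty, boolean childrenHasNext) {
        return !stackEmpty || childrenHasNext;
    }

    /** Compares both forms over all four input combinations. */
    static boolean equivalentForAllInputs() {
        boolean[] values = {false, true};
        for (boolean stackEmpty : values) {
            for (boolean childrenHasNext : values) {
                if (original(stackEmpty, childrenHasNext)
                        != simplified(stackEmpty, childrenHasNext)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("equivalent: " + equivalentForAllInputs());
    }
}
```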

Considering the dozen or so TODOs in the current patch, plus the user docs and 
tests, I agree with [~ste...@apache.org] that the work can be done in a feature 
branch, with this JIRA as an umbrella for subtasks, so that people who are 
interested (like me) can pick up small tasks and contribute.








[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370553#comment-15370553
 ] 

Steve Loughran commented on HADOOP-13345:
-

Obviously I'm supportive of this. I think we should be doing this as a feature 
branch though, because it'll take >1 patch for the code to be functional. The 
other s3a work has generally been incremental.







[jira] [Commented] (HADOOP-13345) S3Guard: Improved Consistency for S3A

2016-07-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365126#comment-15365126
 ] 

Chris Nauroth commented on HADOOP-13345:


[~andrew.wang], [~jzhuge] and [~eddyxu], please have a look.  Feedback 
appreciated.  I will be offline for a while, returning 7/17, so I won't be able 
to respond to comments until then.



