[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion

2017-01-26 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14020:
--
Status: Open  (was: Patch Available)

> Optimize dirListingUnion
> 
>
> Key: HADOOP-14020
> URL: https://issues.apache.org/jira/browse/HADOOP-14020
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Attachments: HADOOP-14020-HADOOP-13345.001.patch, 
> HADOOP-14020-HADOOP-13345.002.patch, HADOOP-14020-HADOOP-13345.003.patch, 
> HADOOP-14020-HADOOP-13345.004.patch
>
>
> There's a TODO in dirListingUnion:
> {quote}// TODO optimize for when allowAuthoritative = false{quote}
> There will be cases when we can intelligently avoid a round trip: if S3A 
> results are a subset of the metadatastore results (including them being equal 
> or empty) then writing back will do nothing (although perhaps that should set 
> the authoritative flag if it isn't set already).
> There may also be cases where users want to just skip that altogether. It's 
> wasted work if authoritative mode is disabled, so perhaps we want to trigger 
> a skip if that's false, or perhaps it should be a separate property. First 
> one makes for simpler config, second is more flexible...
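
A minimal sketch of the subset check described in the issue, assuming a simplified listing type. ListingSketch and DirListingUnionSketch are hypothetical names invented for illustration and are not the S3Guard classes or the attached patch. The method merges the S3A children into the stored listing and reports whether anything changed, so a caller could skip the write-back round trip when the result is false.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a metadata store directory listing. */
final class ListingSketch {
    final Set<String> children = new LinkedHashSet<>();
    boolean authoritative;
}

/** Illustration only: not the real dirListingUnion implementation. */
final class DirListingUnionSketch {

    /**
     * Merges the children seen in S3 into the stored listing.
     *
     * @return true if the stored listing changed and a write-back is needed;
     *         false if the S3 results were a subset of what was already
     *         stored (equal and empty included) and the flag was unchanged.
     */
    static boolean union(ListingSketch stored, List<String> s3Children,
                         boolean allowAuthoritative) {
        boolean changed = false;
        for (String child : s3Children) {
            // Set.add returns true only when the entry was not already present.
            if (stored.children.add(child)) {
                changed = true;
            }
        }
        // Setting the authoritative flag for the first time is also a change
        // that has to be persisted.
        if (allowAuthoritative && !stored.authoritative) {
            stored.authoritative = true;
            changed = true;
        }
        return changed;
    }
}
```

With this shape, a caller would write the listing back only when union(...) returns true. Whether the real code tracks changes this way is an assumption of the sketch.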



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion

2017-01-26 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14020:
--
Status: Patch Available  (was: Open)







[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion

2017-01-26 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14020:
---
Attachment: HADOOP-14020-HADOOP-13345.004.patch

Ah good catch. Fixed!







[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion

2017-01-26 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14020:
---
Attachment: HADOOP-14020-HADOOP-13345.003.patch

Thanks for the review [~fabbri] - I like the feedback. Attaching a patch that 
incorporates all of it.

I've tested against a couple of US regions, with and without '-Ds3guard 
-Ddynamo', and with and without parallel tests. Every test passes when I run 
them one at a time, but I'm unfortunately seeing some gremlins again: when I 
run all of them in a single Maven command, I get errors like this:

{code}
java.io.IOException: Failed to instantiate metadata store 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore defined in 
fs.s3a.metadatastore.impl: java.lang.IllegalArgumentException: Table 
sean-s3guard-test is not being created (with status=DELETING)
{code}

I'm just flagging this as a problem I'm seeing: it happens both with and 
without this patch, so I'm satisfied the patch isn't the cause. It's usually 
the same 5 or 6 tests that fail, but the set does vary.







[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion

2017-01-25 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14020:
---
Status: Patch Available  (was: Open)







[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion

2017-01-25 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14020:
---
Attachment: HADOOP-14020-HADOOP-13345.002.patch

Attaching a patch that just uses authoritative mode to enable / disable the new 
logic, and also fixing some javadoc errors. 

Not sure what was up with my failure to build all of a sudden yesterday. 
hadoop-kms was not building the classes or tests artifacts, and the failure to 
find them got blamed on the DynamoDB repo. Rebasing my local branch on the 
latest trunk did the trick...







[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion

2017-01-24 Thread Sean Mackrory (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14020:
---
Attachment: HADOOP-14020-HADOOP-13345.001.patch

Attaching a patch with the optimization and tests. I originally went with a 
separate property to enable write back. The more I think about it the more I 
think it makes perfect sense to just use authoritative mode for this. Let me 
know if you disagree. It eliminates a few lines from the patch. I'm unable to 
build and test it because of issues with the DynamoDB Local repo that I'm 
having trouble working around, so just posting this for initial comment. I 
would vote to end up going with the next patch (that just uses the 
authoritative mode config) unless anyone can think of a use case that justifies 
2 separate configs.
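
To make the trade-off between the two options concrete, here is a rough sketch under stated assumptions. It uses a plain Map in place of Hadoop's Configuration so that it stands alone. The key fs.s3a.metadatastore.authoritative is what I believe the authoritative-mode setting is called, and example.s3guard.listing.writeback is a made-up key that exists only in this sketch. The fallback to the authoritative setting when the separate key is unset is also an assumption, not something either patch is known to do.

```java
import java.util.Map;

/** Illustration only: two ways to gate the listing write-back. */
final class WriteBackGateSketch {
    // Believed to be the S3Guard authoritative-mode key; treat as an assumption.
    static final String AUTHORITATIVE_KEY = "fs.s3a.metadatastore.authoritative";
    // Hypothetical key, invented for this sketch.
    static final String WRITE_BACK_KEY = "example.s3guard.listing.writeback";

    /** Reads a boolean setting, returning dflt when the key is absent. */
    static boolean getBoolean(Map<String, String> conf, String key, boolean dflt) {
        String value = conf.get(key);
        return value == null ? dflt : Boolean.parseBoolean(value);
    }

    /** Option 1: one switch. Write back only in authoritative mode. */
    static boolean writeBackSingleConfig(Map<String, String> conf) {
        return getBoolean(conf, AUTHORITATIVE_KEY, false);
    }

    /**
     * Option 2: a separate property. In this sketch it falls back to the
     * authoritative setting when unset, which is an assumption.
     */
    static boolean writeBackSeparateConfig(Map<String, String> conf) {
        boolean authoritative = getBoolean(conf, AUTHORITATIVE_KEY, false);
        return getBoolean(conf, WRITE_BACK_KEY, authoritative);
    }
}
```

The single switch leaves users one setting to reason about. The separate property only earns its keep if someone wants authoritative mode on with write-back off, or the reverse, which is the use case asked about above.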



