[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion
     [ https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Fabbri updated HADOOP-14020:
----------------------------------
    Status: Open  (was: Patch Available)

> Optimize dirListingUnion
> ------------------------
>
>                 Key: HADOOP-14020
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14020
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-14020-HADOOP-13345.001.patch,
> HADOOP-14020-HADOOP-13345.002.patch, HADOOP-14020-HADOOP-13345.003.patch,
> HADOOP-14020-HADOOP-13345.004.patch
>
>
> There's a TODO in dirListingUnion:
> {quote}// TODO optimize for when allowAuthoritative = false{quote}
> There will be cases when we can intelligently avoid a round trip: if S3A
> results are a subset of the metadatastore results (including them being equal
> or empty) then writing back will do nothing (although perhaps that should set
> the authoritative flag if it isn't set already).
> There may also be cases where users want to just skip that altogether. It's
> wasted work if authoritative mode is disabled, so perhaps we want to trigger
> a skip if that's false, or perhaps it should be a separate property. First
> one makes for simpler config, second is more flexible...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
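
The issue description quoted in this notification proposes skipping the write-back when the S3A results add nothing to what the metadata store already holds, and when authoritative mode is disabled. The following is a minimal sketch of one possible reading of that idea, not the code from any of the attached patches. The class and member names (`DirListingUnionSketch`, `CachedListing`, `CountingStore`) are hypothetical stand-ins rather than Hadoop types, and plain strings stand in for file statuses. Whether the real patch treats "authoritative flag not yet set" as a change is an assumption here, taken from the parenthetical in the description.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: every name here is hypothetical, not the Hadoop API. */
final class DirListingUnionSketch {

    /** Cached listing for one directory, as a metadata store might hold it. */
    static final class CachedListing {
        final Set<String> entries = new LinkedHashSet<>();
        boolean authoritative;
    }

    /** Stand-in store that only counts write-backs; each one models a round trip. */
    static final class CountingStore {
        int writes;

        void put(CachedListing listing) {
            writes++;
        }
    }

    /**
     * Merges the S3 listing into the cached listing and writes back only when
     * authoritative mode is enabled and the merge actually changed something.
     */
    static Set<String> dirListingUnion(CountingStore store, CachedListing cached,
            List<String> s3Entries, boolean allowAuthoritative) {
        boolean changed = false;
        for (String entry : s3Entries) {
            // Set.add returns true only if the entry was not already cached.
            changed |= cached.entries.add(entry);
        }
        // Assumption: marking the listing authoritative for the first time
        // also counts as a change worth writing back.
        if (allowAuthoritative && !cached.authoritative) {
            changed = true;
        }
        if (changed && allowAuthoritative) {
            cached.authoritative = true;
            store.put(cached);
        }
        return cached.entries;
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        CachedListing cached = new CachedListing();
        cached.entries.addAll(Arrays.asList("a", "b"));
        cached.authoritative = true;

        // S3 results are a subset of the cached results: no write-back.
        dirListingUnion(store, cached, Arrays.asList("a"), true);
        // S3 results contain a new entry: one write-back.
        dirListingUnion(store, cached, Arrays.asList("a", "c"), true);
        System.out.println(store.writes); // prints 1
    }
}
```

In this sketch a single flag gates the write-back, which corresponds to the "simpler config" option in the description. The "more flexible" option would pass a second, separate boolean for the write-back decision.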
[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion
     [ https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Fabbri updated HADOOP-14020:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion
     [ https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-14020:
-----------------------------------
    Attachment: HADOOP-14020-HADOOP-13345.004.patch

Ah, good catch. Fixed!
[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion
     [ https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-14020:
-----------------------------------
    Attachment: HADOOP-14020-HADOOP-13345.003.patch

Thanks for the review [~fabbri] - I like the feedback. Attaching a patch that incorporates all of it.

I've tested against a couple of US regions with and without '-Ds3guard -Ddynamo', and with and without parallel tests. I can get all the tests to pass when I run specific tests one at a time, but I'm unfortunately seeing some gremlins again. When I run all of them in the same maven command, I get errors like this:
{code}
java.io.IOException: Failed to instantiate metadata store org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore defined in fs.s3a.metadatastore.impl: java.lang.IllegalArgumentException: Table sean-s3guard-test is not being created (with status=DELETING)
{code}
I'm just throwing it out there as a problem I'm seeing. It happens both with and without this patch, so I'm satisfied that this patch is not the cause. It's usually the same 5 or 6 tests that fail, but the set does vary.
[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion
     [ https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-14020:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion
     [ https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-14020:
-----------------------------------
    Attachment: HADOOP-14020-HADOOP-13345.002.patch

Attaching a patch that just uses authoritative mode to enable / disable the new logic, and that also fixes some javadoc errors.

I'm not sure why my build suddenly started failing yesterday. hadoop-kms was not building the classes or tests artifacts, and the failure to find them got blamed on the DynamoDB repo. Rebasing my local branch on the latest trunk did the trick...
[jira] [Updated] (HADOOP-14020) Optimize dirListingUnion
     [ https://issues.apache.org/jira/browse/HADOOP-14020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-14020:
-----------------------------------
    Attachment: HADOOP-14020-HADOOP-13345.001.patch

Attaching a patch with the optimization and tests. I originally went with a separate property to enable write back, but the more I think about it, the more I think it makes perfect sense to just use authoritative mode for this. Let me know if you disagree. It eliminates a few lines from the patch.

I'm unable to build and test it because of issues with the DynamoDB Local repo that I'm having trouble working around, so I'm just posting this for initial comment. I would vote to go with the next patch (the one that just uses the authoritative mode config) unless anyone can think of a use case that justifies 2 separate configs.