[GitHub] [hadoop] bgaborg commented on a change in pull request #1810: HADOOP-16746 mkdirs and s3guard auth mode
bgaborg commented on a change in pull request #1810: HADOOP-16746 mkdirs and s3guard auth mode URL: https://github.com/apache/hadoop/pull/1810#discussion_r368971623 ## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java ## @@ -299,29 +313,72 @@ public static BulkOperationState initiateBulkWrite( // Since the authoritative case is already handled outside this function, // we will basically start with the set of directory entries in the // DirListingMetadata, and add any that only exist in the backingStatuses. +// +// We try to avoid writing any more child entries than need be to :- +// (a) save time and money. +// (b) avoid overwriting the authoritative bit of children (HADOOP-16746). +// For auth mode updates, we supply the full listing and a list of which +// child entries have not been changed; the store gets to optimize its +// update however it chooses. +// +// for non-auth-mode S3Guard, we just build a list of entries to add and +// submit them in a batch; this is more efficient than trickling out the +// updates one-by-one. + +// track all unchanged entries; used so the metastore can identify entries +// it doesn't need to update +List unchangedEntries = new ArrayList<>(dirMeta.getListing().size()); +List nonAuthEntriesToAdd = new ArrayList<>(backingStatuses.size()); boolean changed = false; -final Map dirMetaMap = dirMeta.getListing().stream() -.collect(Collectors.toMap( -pm -> pm.getFileStatus().getPath(), PathMetadata::getFileStatus) -); +final Map dirMetaMap = dirMeta.getListing().stream() +.collect(Collectors.toMap(pm -> pm.getFileStatus().getPath(), pm -> pm)); BulkOperationState operationState = ms.initiateBulkWrite( BulkOperationState.OperationType.Listing, path); for (S3AFileStatus s : backingStatuses) { - if (deleted.contains(s.getPath())) { + final Path statusPath = s.getPath(); + if (deleted.contains(statusPath)) { continue; } - final PathMetadata pathMetadata = new PathMetadata(s); - - if (!isAuthoritative){ -FileStatus status = dirMetaMap.get(s.getPath()); -if (status != null -&& s.getModificationTime() > status.getModificationTime()) { - LOG.debug("Update ms with newer metadata of: {}", status); - S3Guard.putWithTtl(ms, pathMetadata, timeProvider, operationState); + final PathMetadata originalMD = dirMetaMap.get(statusPath); + + // this is built up to be whatever entry is added to the dirMeta + // collection + PathMetadata pathMetadata = originalMD; + + if (!isAuthoritative) { +// in non-auth listings, we compare the file status of the metastore +// list with those in the FS, and overwrite the MS entry if +// either of two conditions are met +// - there is no entry in the metadata and Review comment: nit: there is no entry in the _metastore_ and This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bgaborg commented on a change in pull request #1810: HADOOP-16746 mkdirs and s3guard auth mode
bgaborg commented on a change in pull request #1810: HADOOP-16746 mkdirs and s3guard auth mode URL: https://github.com/apache/hadoop/pull/1810#discussion_r368986839 ## File path: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java ## @@ -299,29 +313,72 @@ public static BulkOperationState initiateBulkWrite( // Since the authoritative case is already handled outside this function, // we will basically start with the set of directory entries in the // DirListingMetadata, and add any that only exist in the backingStatuses. +// +// We try to avoid writing any more child entries than need be to :- +// (a) save time and money. +// (b) avoid overwriting the authoritative bit of children (HADOOP-16746). +// For auth mode updates, we supply the full listing and a list of which +// child entries have not been changed; the store gets to optimize its +// update however it chooses. +// +// for non-auth-mode S3Guard, we just build a list of entries to add and +// submit them in a batch; this is more efficient than trickling out the +// updates one-by-one. + +// track all unchanged entries; used so the metastore can identify entries +// it doesn't need to update +List unchangedEntries = new ArrayList<>(dirMeta.getListing().size()); +List nonAuthEntriesToAdd = new ArrayList<>(backingStatuses.size()); boolean changed = false; -final Map dirMetaMap = dirMeta.getListing().stream() -.collect(Collectors.toMap( -pm -> pm.getFileStatus().getPath(), PathMetadata::getFileStatus) -); +final Map dirMetaMap = dirMeta.getListing().stream() +.collect(Collectors.toMap(pm -> pm.getFileStatus().getPath(), pm -> pm)); BulkOperationState operationState = ms.initiateBulkWrite( BulkOperationState.OperationType.Listing, path); for (S3AFileStatus s : backingStatuses) { - if (deleted.contains(s.getPath())) { + final Path statusPath = s.getPath(); + if (deleted.contains(statusPath)) { continue; } - final PathMetadata pathMetadata = new PathMetadata(s); - - if (!isAuthoritative){ -FileStatus status = dirMetaMap.get(s.getPath()); -if (status != null -&& s.getModificationTime() > status.getModificationTime()) { - LOG.debug("Update ms with newer metadata of: {}", status); - S3Guard.putWithTtl(ms, pathMetadata, timeProvider, operationState); + final PathMetadata originalMD = dirMetaMap.get(statusPath); + + // this is built up to be whatever entry is added to the dirMeta + // collection + PathMetadata pathMetadata = originalMD; + + if (!isAuthoritative) { Review comment: Could we factor out this to another method? Just to make Uncle Bob happy :). Jokes aside this grew really huge, and we should split the auth and non-auth execution paths to be more maintainable in the future. It was your idea in one of your comments on the PR, and that was a good idea. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] bgaborg commented on a change in pull request #1810: HADOOP-16746 mkdirs and s3guard auth mode
bgaborg commented on a change in pull request #1810: HADOOP-16746 mkdirs and s3guard auth mode URL: https://github.com/apache/hadoop/pull/1810#discussion_r368973683 ## File path: hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStoreAuthoritativeMode.java ## @@ -55,7 +53,7 @@ import static org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides; import static org.apache.hadoop.fs.s3a.S3AUtils.applyLocatedFiles; import static org.apache.hadoop.fs.s3a.Statistic.OBJECT_LIST_REQUESTS; -import static org.apache.hadoop.fs.s3a.Statistic.S3GUARD_METADATASTORE_AUTHORITATIVE_DIRECTORIES_UPDATED; +import static org.apache.hadoop.fs.s3a.Statistic.S3GUARD_METADATASTORE_AUTHORITATIVE_DIRECTORIES_UPDATED;import static org.apache.hadoop.fs.s3a.Statistic.S3GUARD_METADATASTORE_RECORD_WRITES; Review comment: nit: import static in new line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org