[ https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031152#comment-16031152 ]
Steve Loughran commented on HADOOP-14457: ----------------------------------------- -1 as is Reviewing the patch, it does fix HADOOP-13221. But it's going to be expensive, where expensive == ~200+mS per parent entry. And as it doesn't appear to break the loop when the situation : dir on s3 & s3guard disabled. On raw S3 that should be marker where you can break successfully, relying on mkdirs() to have done the walk earlier. I've actually considered fixing HADOOP-13221 async: you can do the checking before the close(). After all, there's nothing to stop me manipulating the parent directory tree after create() and before close(), and end up with the wrong outcome. But since nobody ever noticed the problem occurring in the wild, I've never been in a rush to hurt performance for a corner case which only arises once you reset your assumptions about files and directories. # for raw-S3, I'd like a full treewalk to be optional. Even checking the parent is expensive. Needs to bail out on that first parent==dir. # tests should be for {{AbstractContractCreateTest}}; that way we can see what fails. Add a test for direct parent being a file, one for a couple of entries up. This test can go straight into trunk with any fixes/opt-outs for all the blobstores # The raw S3 tests don't need to be getFileStatus calls. as we are only looking to make sure there aren't parent files (with the absolute paths), not whether they are mock dirs or prefixes with children, That is, for a path "/a/b/c.txt", you only need to do a HEAD /a/b and a HEAD a (actually, you could do them in parallel), and skip the HEAD /a/b/ HEAD /a/, LIST /a/b, LIST /a. This will strip down on the HTTP calls, especially that LIST > create() does not notify metadataStore of parent directories or ensure > they're not existing files > ------------------------------------------------------------------------------------------------- > > Key: HADOOP-14457 > URL: https://issues.apache.org/jira/browse/HADOOP-14457 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 > Reporter: Sean Mackrory > Attachments: HADOOP-14457-HADOOP-13345.001.patch > > > Not a great test yet, but it at least reliably demonstrates the issue. > LocalMetadataStore will sometimes erroneously report that a directory is > empty with isAuthoritative = true when it *definitely* has children the > metadatastore should know about. It doesn't appear to happen if the children > are just directory. The fact that it's returning an empty listing is > concerning, but the fact that it says it's authoritative *might* be a second > bug. > {code} > diff --git > a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java > > b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java > index 78b3970..1821d19 100644 > --- > a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java > +++ > b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java > @@ -965,7 +965,7 @@ public boolean hasMetadataStore() { > } > > @VisibleForTesting > - MetadataStore getMetadataStore() { > + public MetadataStore getMetadataStore() { > return metadataStore; > } > > diff --git > a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java > > b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java > index 4339649..881bdc9 100644 > --- > a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java > +++ > b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java > @@ -23,6 +23,11 @@ > import org.apache.hadoop.fs.contract.AbstractFSContract; > import org.apache.hadoop.fs.FileSystem; > import org.apache.hadoop.fs.Path; > +import org.apache.hadoop.fs.s3a.S3AFileSystem; > +import org.apache.hadoop.fs.s3a.Tristate; > +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata; > +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore; > +import org.junit.Test; > > import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset; > import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset; > @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws > Throwable { > boolean rename = fs.rename(srcDir, destDir); > assertFalse("s3a doesn't support rename to non-empty directory", rename); > } > + > + @Test > + public void testMkdirPopulatesFileAncestors() throws Exception { > + final FileSystem fs = getFileSystem(); > + final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore(); > + final Path parent = path("testMkdirPopulatesFileAncestors/source"); > + try { > + fs.mkdirs(parent); > + final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4"); > + byte[] srcDataset = dataset(256, 'a', 'z'); > + writeDataset(fs, nestedFile, srcDataset, srcDataset.length, > + 1024, false); > + > + DirListingMetadata list = ms.listChildren(parent); > + assertTrue("MetadataStore falsely reports authoritative empty list", > + list.isEmpty() == Tristate.FALSE || !list.isAuthoritative()); > + } finally { > + fs.delete(parent, true); > + } > + } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org