[ 
https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031152#comment-16031152
 ] 

Steve Loughran commented on HADOOP-14457:
-----------------------------------------

-1 as is

Reviewing the patch, it does fix HADOOP-13221. But it's going to be expensive, 
where expensive == ~200+mS per parent entry. And as it doesn't appear to break 
the loop when the situation : dir on s3 & s3guard disabled. On raw S3 that 
should be marker where you can break successfully, relying on mkdirs() to have 
done the walk earlier.

I've actually considered fixing HADOOP-13221 async: you can do the checking 
before the close(). After all, there's nothing to stop me manipulating the 
parent directory tree after create() and before close(), and end up with the 
wrong outcome.  But since nobody ever noticed the problem occurring in the 
wild, I've never been in a rush to hurt performance for a corner case which 
only arises once you reset your assumptions about files and directories.

# for raw-S3, I'd like a full treewalk to be optional. Even checking the parent 
is expensive. Needs to bail out on that first parent==dir.
# tests should be for {{AbstractContractCreateTest}}; that way we can see what 
fails. Add a test for direct parent being a file, one for a couple of entries 
up. This test can go straight into trunk with any fixes/opt-outs for all the 
blobstores
# The raw S3 tests don't need to be getFileStatus calls. as we are only looking 
to make sure there aren't parent files (with the absolute paths), not whether 
they are mock dirs or prefixes with children, That is, for a path "/a/b/c.txt", 
you only need to do a HEAD /a/b and a HEAD a (actually, you could do them in 
parallel), and skip the HEAD /a/b/ HEAD /a/, LIST /a/b, LIST /a. This will 
strip down on the HTTP calls, especially that LIST
 

> create() does not notify metadataStore of parent directories or ensure 
> they're not existing files
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14457
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14457
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>         Attachments: HADOOP-14457-HADOOP-13345.001.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue. 
> LocalMetadataStore will sometimes erroneously report that a directory is 
> empty with isAuthoritative = true when it *definitely* has children the 
> metadatastore should know about. It doesn't appear to happen if the children 
> are just directory. The fact that it's returning an empty listing is 
> concerning, but the fact that it says it's authoritative *might* be a second 
> bug.
> {code}
> diff --git 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
>  
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- 
> a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ 
> b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
>    }
>  
>    @VisibleForTesting
> -  MetadataStore getMetadataStore() {
> +  public MetadataStore getMetadataStore() {
>      return metadataStore;
>    }
>  
> diff --git 
> a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
>  
> b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- 
> a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> +++ 
> b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> @@ -23,6 +23,11 @@
>  import org.apache.hadoop.fs.contract.AbstractFSContract;
>  import org.apache.hadoop.fs.FileSystem;
>  import org.apache.hadoop.fs.Path;
> +import org.apache.hadoop.fs.s3a.S3AFileSystem;
> +import org.apache.hadoop.fs.s3a.Tristate;
> +import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
> +import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
> +import org.junit.Test;
>  
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
>  import static org.apache.hadoop.fs.contract.ContractTestUtils.writeDataset;
> @@ -72,4 +77,24 @@ public void testRenameDirIntoExistingDir() throws 
> Throwable {
>      boolean rename = fs.rename(srcDir, destDir);
>      assertFalse("s3a doesn't support rename to non-empty directory", rename);
>    }
> +
> +  @Test
> +  public void testMkdirPopulatesFileAncestors() throws Exception {
> +    final FileSystem fs = getFileSystem();
> +    final MetadataStore ms = ((S3AFileSystem) fs).getMetadataStore();
> +    final Path parent = path("testMkdirPopulatesFileAncestors/source");
> +    try {
> +      fs.mkdirs(parent);
> +      final Path nestedFile = new Path(parent, "dir1/dir2/dir3/file4");
> +      byte[] srcDataset = dataset(256, 'a', 'z');
> +      writeDataset(fs, nestedFile, srcDataset, srcDataset.length,
> +          1024, false);
> +
> +      DirListingMetadata list = ms.listChildren(parent);
> +      assertTrue("MetadataStore falsely reports authoritative empty list",
> +          list.isEmpty() == Tristate.FALSE || !list.isAuthoritative());
> +    } finally {
> +      fs.delete(parent, true);
> +    }
> +  }
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to