[jira] [Comment Edited] (HADOOP-14236) S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries in metadata store

2017-03-29 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948009#comment-15948009
 ] 

Mingliang Liu edited comment on HADOOP-14236 at 3/29/17 10:43 PM:
--

[~fabbri], thanks for your comment and review. You get the point precisely, and 
even better.

For the inferred ancestor directories of nested directory, I was thinking that 
they're listed explicitly in S3A {{listObjects()}}. I got that impression 
because when we mkdir(), unlike creating new file, we don't delete its fake 
parent directory objects. There is a long-pending TODO for {{mkdir()}}:
{quote}
// TODO: If we have created an empty file at /foo/bar and we then call
// mkdirs for /foo/bar/baz/roo what happens to the empty file /foo/bar/?
private boolean innerMkdirs(Path p, FsPermission permission)
{quote}
However, we can see that we still need to take care of its ancestors, as all 
ancestors are not listed in later {{listObjects()}} call. For example (using 
your example), {{/a/source}} is empty directory and after making subdirectory 
{{/a/source/c/d}}, later {{listObjects("/a/source")}} will get {{/a/source}} 
and {{/a/source/c/d}}: w/o {{/a/source/c}} but w/ {{/a/source}}. I filed 
[HADOOP-14255] to address this by deleting fake directory objects when 
{{mkdir()}}: that will make the senmatic easier. Thanks [~ste...@apache.org] 
for offline discussion.

The new patch (tested against us-west-1):
# takes care of its ancestors for both a nested file and a nested directory.
# makes test code FileSystem-only in the new patch. By enabling S3Guard, they 
can not pass w/o this patch while they can pass w/ this patch. By disabling 
S3Guard, they pass as expected.
# moves the "add inferred directories" logic into a helper function

For asserting MetadataStore (and {{AbstractMSContract#allowMissing()}} check), 
I found the file system object in {{MetadataStoreTestBase}} is mocked. So 
testing integration code between S3AFileSystem and S3Guard seems challenging 
there. If that part of code is MS dependent, perhaps we can skip testing that 
part as long as the FS contract is verified carefully.


was (Author: liuml07):
[~fabbri], thanks for your comment and review. You get the point precisely, and 
even better.

For the inferred ancestor directories of nested directory, I was thinking that 
they're listed explicitly in S3A {{listObjects()}}. I got that impression 
because when we mkdir(), unlike creating new file, we don't delete its fake 
parent directory objects. There is a long-pending TODO for {{mkdir()}}:
{quote}
// TODO: If we have created an empty file at /foo/bar and we then call
// mkdirs for /foo/bar/baz/roo what happens to the empty file /foo/bar/?
private boolean innerMkdirs(Path p, FsPermission permission)
{quote}
However, we can see that we still need to take care of its ancestors, as all 
ancestors are not listed in later {{listObjects()}} call. For example (using 
your example), {{/a/source}} is empty directory and after making subdirectory 
{{/a/source/c/d}}, later {{listObjects("/a/source")}} will get {{/a/source}} 
and {{/a/source/c/d}}: w/o {{/a/source/c}} but w/ {{/a/source}}. I filed 
[HADOOP-14255] to address this by deleting fake directory objects when 
{{mkdir()}}: that will make the senmatic easier. Thanks [~ste...@apache.org] 
for offline discussion.

The new patch (tested against us-west-1):
# takes care of its ancestors for both a nested file and a nested directory.
# makes test code FileSystem-only in the new patch. They can not pass w/o this 
patch while can pass w/ this patch.
# moves the "add inferred directories" logic into a helper function

For asserting MetadataStore (and {{AbstractMSContract#allowMissing()}} check), 
I found the file system object in {{MetadataStoreTestBase}} is mocked. So 
testing integration code between S3AFileSystem and S3Guard seems challenging 
there. If that part of code is MS dependent, perhaps we can skip testing that 
part as long as the FS contract is verified carefully.

> S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries 
> in metadata store
> ---
>
> Key: HADOOP-14236
> URL: https://issues.apache.org/jira/browse/HADOOP-14236
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HADOOP-14236-HADOOP-13345.000.patch, 
> HADOOP-14236-HADOOP-13345.001.patch, HADOOP-14236-HADOOP-13345.002.patch, 
> HADOOP-14236-HADOOP-13345.003.patch, HADOOP-14236-HADOOP-13345.004.patch
>
>
> After running integration test {{ITestS3AFileSystemContract}}, I found the 
> following items are not cleaned up in DynamoDB:
> {code}
> 

[jira] [Comment Edited] (HADOOP-14236) S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries in metadata store

2017-03-29 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948009#comment-15948009
 ] 

Mingliang Liu edited comment on HADOOP-14236 at 3/29/17 10:32 PM:
--

[~fabbri], thanks for your comment and review. You get the point precisely, and 
even better.

For the inferred ancestor directories of nested directory, I was thinking that 
they're listed explicitly in S3A {{listObjects()}}. I got that impression 
because when we mkdir(), unlike creating new file, we don't delete its fake 
parent directory objects. There is a long-pending TODO for {{mkdir()}}:
{quote}
// TODO: If we have created an empty file at /foo/bar and we then call
// mkdirs for /foo/bar/baz/roo what happens to the empty file /foo/bar/?
private boolean innerMkdirs(Path p, FsPermission permission)
{quote}
However, we can see that we still need to take care of its ancestors, as all 
ancestors are not listed in later {{listObjects()}} call. For example (using 
your example), {{/a/source}} is empty directory and after making subdirectory 
{{/a/source/c/d}}, later {{listObjects("/a/source")}} will get {{/a/source}} 
and {{/a/source/c/d}}: w/o {{/a/source/c}} but w/ {{/a/source}}. I filed 
[HADOOP-14255] to address this by deleting fake directory objects when 
{{mkdir()}}: that will make the senmatic easier. Thanks [~ste...@apache.org] 
for offline discussion.

The new patch (tested against us-west-1):
# takes care of its ancestors for both a nested file and a nested directory.
# makes test code FileSystem-only in the new patch. They can not pass w/o this 
patch while can pass w/ this patch.
# moves the "add inferred directories" logic into a helper function

For asserting MetadataStore (and {{AbstractMSContract#allowMissing()}} check), 
I found the file system object in {{MetadataStoreTestBase}} is mocked. So 
testing integration code between S3AFileSystem and S3Guard seems challenging 
there. If that part of code is MS dependent, perhaps we can skip testing that 
part as long as the FS contract is verified carefully.


was (Author: liuml07):
[~fabbri], thanks for your comment and review. You get the point precisely, and 
even better.

For the inferred ancestor directories of nested directory, I was thinking that 
they're listed explicitly in S3A {{listObjects()}}. That's because when we 
mkdir(), unlike creating new file, we don't delete it's fake parent directory 
objects. There is a long-pending TODO for {{mkdir()}}:
{quote}
// TODO: If we have created an empty file at /foo/bar and we then call
// mkdirs for /foo/bar/baz/roo what happens to the empty file /foo/bar/?
private boolean innerMkdirs(Path p, FsPermission permission)
{quote}
However, we can see that we still need to take care of its ancestors, as all 
ancestors are not listed in later {{listObjects()}} call. For example (using 
your example), {{/a/source}} is empty directory and after making subdirectory 
{{/a/source/c/d}}, later {{listObjects("/a/source")}} will get {{/a/source}} 
and {{/a/source/c/d}}: w/o {{/a/source/c}} but w/ {{/a/source}}. I filed 
[HADOOP-14255] to address this by deleting fake directory objects when 
{{mkdir()}}: that will make the senmatic easier.

The new patch (tested against us-west-1):
# takes care of its ancestors for both a nested file and a nested directory.
# makes test code FileSystem-only in the new patch. They can not pass w/o this 
patch while can pass w/ this patch.
# moves the "add inferred directories" logic into a helper function

For asserting MetadataStore (and {{AbstractMSContract#allowMissing()}} check), 
I found the file system object in {{MetadataStoreTestBase}} is mocked. So 
testing integration code between S3AFileSystem and S3Guard seems challenging 
there. If that part of code is MS dependent, perhaps we can skip testing that 
part as long as the FS contract is verified carefully.

> S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries 
> in metadata store
> ---
>
> Key: HADOOP-14236
> URL: https://issues.apache.org/jira/browse/HADOOP-14236
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HADOOP-14236-HADOOP-13345.000.patch, 
> HADOOP-14236-HADOOP-13345.001.patch, HADOOP-14236-HADOOP-13345.002.patch, 
> HADOOP-14236-HADOOP-13345.003.patch
>
>
> After running integration test {{ITestS3AFileSystemContract}}, I found the 
> following items are not cleaned up in DynamoDB:
> {code}
> parent=/mliu-s3guard/user/mliu/s3afilesystemcontract/testRenameDirectoryAsExisting/dir,
>  child=subdir
> 

[jira] [Comment Edited] (HADOOP-14236) S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries in metadata store

2017-03-29 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947439#comment-15947439
 ] 

Steve Loughran edited comment on HADOOP-14236 at 3/29/17 4:26 PM:
--

As mingliang has just pointed out to me, s3a's {{mkdirs()}} doesn't delete mock 
parent paths, so 
{{mkdir /a}}  creates "a/"
{{mkdir /a/b}} creates "a/b/" and leaves "a/"
only in the {{reate(a/b/c.txt)}} will a/ and b/ get deleted

FS contract tests need to verify empty directory create through sequence of 

{code}
mkdir a
mkdir a/b
mkdir a/b/c
rename (a, d)
assertIsDir(d/b)
assertIsDir(d/b/c)
{code}

+ maybe also create the dest path using 1+ mkdir call, so it also has mock 
parent entries


was (Author: ste...@apache.org):
As mingliang has just pointed out to me, s3a's {{mkdirs()}} doesn't delete mock 
parent paths, so 
{{mkdir /a}}  creates "a/"
{{mkdir /a/b}} creates "a/b/" and leaves "a/"
only in the {{reate(a/b/c.txt)}} will a/ and b/ get deleted

FS contract tests need to verify empty directory create through sequence of 

{code}
mkdir a
mkdir a/b
mkdir a/b/c
rename (a, d)
assertIsDir(d/b)
assertIsDir(d/b/c)
{code}

> S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries 
> in metadata store
> ---
>
> Key: HADOOP-14236
> URL: https://issues.apache.org/jira/browse/HADOOP-14236
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HADOOP-14236-HADOOP-13345.000.patch, 
> HADOOP-14236-HADOOP-13345.001.patch, HADOOP-14236-HADOOP-13345.002.patch
>
>
> After running integration test {{ITestS3AFileSystemContract}}, I found the 
> following items are not cleaned up in DynamoDB:
> {code}
> parent=/mliu-s3guard/user/mliu/s3afilesystemcontract/testRenameDirectoryAsExisting/dir,
>  child=subdir
> parent=/mliu-s3guard/user/mliu/s3afilesystemcontract/testRenameDirectoryAsExistingNew/newdir/subdir,
>  child=file2
> {code}
> At first I thought it’s similar to [HADOOP-14226] or [HADOOP-14227], and we 
> need to be careful when cleaning up test data.
> Then I found it’s a bug(?) in the code of integrating S3Guard with 
> S3AFileSystem: for rename we miss sub-directory items to put (dest) and 
> delete (src). The reason is that in S3A, we delete those fake directory 
> objects if they are not necessary, e.g. non-empty. So when we list the 
> objects to rename, the object summaries will only return _file_ objects. This 
> has two consequences after rename:
> #  there will be left items for src path in metadata store - left-overs will 
> confuse {{get(Path)}} which should return null
> # we are not persisting the whole subtree for dest path to metadata store - 
> this will break the DynamoDBMetadataStore invariant: _if a path exists, all 
> its ancestors will also exist in the table_.
> UPDATE: modified test case {{ITestS3AFileSystemContract:: 
> testRenameDirectoryAsExistingDirectory()}} will fail w/o this patch; it 
> passes w/ this patch. If the test case makes sense, the proposal follows.
> Existing tests are not complaining about this though. If this is a real bug, 
> let’s address it here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org