[ 
https://issues.apache.org/jira/browse/HADOOP-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613313#comment-15613313
 ] 

Aaron Fabbri commented on HADOOP-13449:
---------------------------------------

Exciting stuff, thanks for the update.

{quote}
I changed the base unit test as the owner, group and permission etc are not 
part of the metadata we're interested in by now.
{quote}

Good. We could add a helper function that all tests can use, e.g. 
{{doesMetadataStorePersistOwnerGroupPermission()}}, which returns false if the 
MetadataStore is an instance of DynamoDBMetadataStore.  This is another spot 
where it might be nice to add a {{getProperty()}} function to MetadataStore, so 
we could call {{getProperty(PERSISTS_PERMISSIONS)}} etc.  We could do that later on.
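
A rough sketch of how such a capability query might look. Everything here is 
hypothetical: {{StoreProperty}}, {{PropertyAwareStore}}, {{SketchStore}} and 
{{StoreTestHelper}} are stand-ins invented for illustration and are not part of 
the patch or of the real MetadataStore interface.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical capability flags; these names are assumptions, not existing API.
enum StoreProperty { PERSISTS_OWNER, PERSISTS_GROUP, PERSISTS_PERMISSIONS }

// Stand-in for MetadataStore, reduced to the one proposed method.
interface PropertyAwareStore {
  boolean getProperty(StoreProperty property);
}

// Toy store that reports whichever properties it was constructed with.
class SketchStore implements PropertyAwareStore {
  private final Set<StoreProperty> supported;

  SketchStore(Set<StoreProperty> supported) {
    // EnumSet.copyOf rejects an empty non-EnumSet collection, so guard for it.
    this.supported = supported.isEmpty()
        ? EnumSet.noneOf(StoreProperty.class)
        : EnumSet.copyOf(supported);
  }

  @Override
  public boolean getProperty(StoreProperty property) {
    return supported.contains(property);
  }
}

// Test helper that asks the store instead of checking instanceof.
final class StoreTestHelper {
  private StoreTestHelper() {}

  static boolean persistsOwnerGroupPermission(PropertyAwareStore store) {
    return store.getProperty(StoreProperty.PERSISTS_OWNER)
        && store.getProperty(StoreProperty.PERSISTS_GROUP)
        && store.getProperty(StoreProperty.PERSISTS_PERMISSIONS);
  }
}
```

With something like this, a base test could skip its owner/group/permission 
assertions whenever the helper returns false, without naming any concrete store 
class.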

{quote}
We store the is_empty for directory in the DynamoDB (DDB) metadata store now. 
We have to update this information in a consistent and efficient way. We don't 
want to check the parent directory every time we delete/put a file item. At 
least we can optimize this when deleting a subtree.
{quote}
This part is a pain.  We should revisit the whole 
{{S3AFileStatus#isEmptyDirectory}} idea in the future. 

In case it helps, my algorithm is here:

In put(PathMetadata meta):
{code}
  if we have PathMetadata for meta's parent path:
      parentMeta.setIsEmpty(false)
{code}
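
The put case above could be sketched in plain Java roughly as follows. This is a 
simplified illustration, not the actual store code: {{DirEntry}} and 
{{PutSketchStore}} are hypothetical stand-ins, and paths are plain strings 
instead of Path objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for PathMetadata; it only tracks the empty-directory flag.
class DirEntry {
  private boolean isEmpty = true;

  boolean isEmpty() { return isEmpty; }

  void setIsEmpty(boolean value) { isEmpty = value; }
}

class PutSketchStore {
  // Directory entries keyed by path string (an assumption made for brevity).
  private final Map<String, DirEntry> entries = new HashMap<>();

  void put(String path, boolean isDirectory) {
    // If we have metadata for the parent, it can no longer be empty.
    if (!"/".equals(path)) {
      DirEntry parentMeta = entries.get(parentOf(path));
      if (parentMeta != null) {
        parentMeta.setIsEmpty(false);
      }
    }
    // Only directories get an entry in this sketch; a new one starts out empty.
    if (isDirectory) {
      entries.putIfAbsent(path, new DirEntry());
    }
  }

  DirEntry get(String path) { return entries.get(path); }

  // "/a/b" -> "/a", "/a" -> "/"
  static String parentOf(String path) {
    int slash = path.lastIndexOf('/');
    return slash <= 0 ? "/" : path.substring(0, slash);
  }
}
```

Note that a parent we hold no metadata for is left alone, which matches the 
"if we have PathMetadata" condition in the pseudocode.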

The harder case, when we are removing an entry:

{code}

      // If we have cached a FileStatus for the parent...
      DirListingMetadata dir = dirHash.get(parent);
      if (dir != null) {
        LOG.debug("removing parent's entry for {} ", path);

        // Remove our path from the parent dir
        dir.remove(path);

        // S3A-specific logic dealing with S3AFileStatus#isEmptyDirectory()
        if (isS3A) {
          if (dir.isAuthoritative() && dir.numEntries() == 0) {
            setS3AIsEmpty(parent, true);
          } else if (dir.numEntries() == 0) {
            // We do not know of any remaining entries in parent directory.
            // However, we do not have authoritative listing, so there may
            // still be some entries in the dir.  Since we cannot know the
            // proper state of the parent S3AFileStatus#isEmptyDirectory, we
            // will invalidate our entries for it.
            // Better than deleting entries would be marking them as "missing
            // metadata".  Deleting them means we lose consistent listing and
            // ability to retry for eventual consistency for the parent path.

            // TODO implement missing metadata feature
            invalidateFileStatus(parent);
          }
          // else parent directory still has entries in it, isEmptyDirectory
          // does not change
        }
      }
{code}

Fixing the loss of consistency on the parent could be achieved by leaving an 
empty PathMetadata for the parent that does not contain a FileStatus in it.  
That "missing metadata" PathMetadata would indicate to future getFileStatus() 
or listStatus() calls that the file does exist (so retry if S3 is eventually 
consistent), but the FileStatus needs to be fetched from S3, since we cannot 
know the value of its isEmptyDirectory().

I added a TODO because we can tackle this later if we want.
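
For what it's worth, a minimal sketch of the "missing metadata" idea might look 
like the following. It is only an illustration under my own assumptions: 
{{StatusSketch}}, {{PathEntrySketch}}, {{LookupResult}} and 
{{MissingMetadataStore}} are hypothetical names, and the real PathMetadata and 
FileStatus classes are not used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for S3AFileStatus.
class StatusSketch {
  final String path;
  final boolean emptyDirectory;

  StatusSketch(String path, boolean emptyDirectory) {
    this.path = path;
    this.emptyDirectory = emptyDirectory;
  }
}

// Hypothetical PathMetadata variant whose status may be absent.
class PathEntrySketch {
  final String path;
  final StatusSketch status; // null means: path exists, details unknown

  private PathEntrySketch(String path, StatusSketch status) {
    this.path = path;
    this.status = status;
  }

  static PathEntrySketch withStatus(StatusSketch status) {
    return new PathEntrySketch(status.path, status);
  }

  static PathEntrySketch missingMetadata(String path) {
    return new PathEntrySketch(path, null);
  }
}

enum LookupResult { NOT_TRACKED, EXISTS_FETCH_FROM_S3, EXISTS_USE_CACHED }

class MissingMetadataStore {
  private final Map<String, PathEntrySketch> entries = new HashMap<>();

  void put(StatusSketch status) {
    entries.put(status.path, PathEntrySketch.withStatus(status));
  }

  // Instead of deleting the entry, keep a placeholder that records existence.
  void invalidateFileStatus(String path) {
    entries.computeIfPresent(path, (p, old) -> PathEntrySketch.missingMetadata(p));
  }

  LookupResult lookup(String path) {
    PathEntrySketch entry = entries.get(path);
    if (entry == null) {
      return LookupResult.NOT_TRACKED;
    }
    return entry.status == null
        ? LookupResult.EXISTS_FETCH_FROM_S3
        : LookupResult.EXISTS_USE_CACHED;
  }
}
```

A caller seeing {{EXISTS_FETCH_FROM_S3}} would know the path should exist, so it 
could retry against S3 rather than trusting a 404, and then refresh the entry 
with the fetched status.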

{quote}The contract assumes we create the direct parent directory (other 
ancestors should be taken care of by the clients/callers) when putting a new 
file item{quote}

Yeah, this is for consistent listing on the parent after the child is created.  
I'm wondering if we can relax this or make it configurable?  When 
{{fs.s3a.metadatastore.authoritative}} is true, the performance hit on create 
could be offset by a performance gain on subsequent listing of the parent 
directory. 
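
To make the "configurable" option concrete, here is a small sketch. It assumes a 
made-up key, {{example.metadatastore.put.creates.parent}}, which is not a real 
Hadoop option, and uses {{java.util.Properties}} in place of Hadoop's 
Configuration so it stays self-contained. {{ConfigurableParentStore}} is 
likewise hypothetical.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

class ConfigurableParentStore {
  // Hypothetical key invented for this sketch; NOT an existing Hadoop option.
  static final String CREATE_PARENT_KEY = "example.metadatastore.put.creates.parent";

  private final boolean createParentOnPut;
  private final Set<String> knownPaths = new HashSet<>();

  ConfigurableParentStore(Properties conf) {
    // Default to true, i.e. the contract quoted above.
    this.createParentOnPut =
        Boolean.parseBoolean(conf.getProperty(CREATE_PARENT_KEY, "true"));
  }

  void putFile(String path) {
    knownPaths.add(path);
    if (createParentOnPut) {
      // Extra write on create, paid back by consistent parent listings later.
      int slash = path.lastIndexOf('/');
      knownPaths.add(slash <= 0 ? "/" : path.substring(0, slash));
    }
  }

  boolean isTracked(String path) {
    return knownPaths.contains(path);
  }
}
```

Deployments that do not rely on authoritative listings could then switch the 
parent write off and avoid the extra cost on create.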

Looks like good progress! Please shout if I can help at all.


> S3Guard: Implement DynamoDBMetadataStore.
> -----------------------------------------
>
>                 Key: HADOOP-13449
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13449
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13449-HADOOP-13345.000.patch, 
> HADOOP-13449-HADOOP-13345.001.patch
>
>
> Provide an implementation of the metadata store backed by DynamoDB.



