[ 
https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616699#comment-15616699
 ] 

Chris Nauroth commented on HADOOP-13651:
----------------------------------------

Hello [~fabbri].  Thank you for sharing your patch.

I have not yet reviewed everything, but first I would like to discuss the 
management of {{MetadataStore}} as a singleton.  This could be problematic for 
a process that wants to access multiple {{S3AFileSystem}} instances backed by 
different S3 buckets.  A concrete example of this would be a DistCp task 
copying data from one bucket to another.

I had been thinking there would be a 1:1 cardinality relationship between 
{{S3AFileSystem}} instances and {{MetadataStore}} instances.  An 
{{S3AFileSystem}} instance accesses exactly one bucket, and likewise, a 
{{DynamoDBMetadataStore}} instance would access exactly one DynamoDB table.  (I 
also see this relationship is carried through into the latest HADOOP-13449 
patch from [~liuml07].)

I think this is overall the easiest implementation path that supports use of 
multiple {{S3AFileSystem}} instances in the same process.  I suppose the 
{{MetadataStore}} implementations could be made flexible enough to handle 
paths from multiple {{S3AFileSystem}} instances, but that seems to add 
complexity: each {{MetadataStore}} implementation would then have to manage 
mapping tables and multiple AWS SDK client instances internally.
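
As a rough illustration of the 1:1 relationship, here is a minimal sketch.  It 
is not code from the patch: {{SketchFileSystem}} and {{SketchMetadataStore}} 
are hypothetical stand-ins for {{S3AFileSystem}} and a {{MetadataStore}} 
implementation, and the "<bucket>-metadata" table naming is an assumption made 
only for the example.  Each file system instance creates and owns its own 
store, so the two file systems in a DistCp-style copy never share metadata.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: hypothetical stand-ins, not the real Hadoop classes. */
public class OneStorePerFileSystemSketch {

  /** Stand-in for a MetadataStore bound to exactly one table. */
  static final class SketchMetadataStore {
    private final String table;
    private final Map<String, String> entries = new HashMap<>();

    SketchMetadataStore(String table) {
      this.table = table;
    }

    String table() {
      return table;
    }

    void put(String path, String status) {
      entries.put(path, status);
    }

    String get(String path) {
      return entries.get(path);
    }
  }

  /** Stand-in for an S3AFileSystem bound to exactly one bucket. */
  static final class SketchFileSystem {
    private final String bucket;
    private final SketchMetadataStore store;

    SketchFileSystem(String bucket) {
      this.bucket = bucket;
      // Assumption for the sketch: one table per bucket, named after it.
      this.store = new SketchMetadataStore(bucket + "-metadata");
    }

    String bucket() {
      return bucket;
    }

    SketchMetadataStore store() {
      return store;
    }
  }

  public static void main(String[] args) {
    // DistCp-like case: two buckets accessed from the same process.
    SketchFileSystem src = new SketchFileSystem("source-bucket");
    SketchFileSystem dst = new SketchFileSystem("dest-bucket");

    src.store().put("/data/part-0000", "file");

    // The destination store is a separate instance with no such entry.
    System.out.println("separate stores: " + (src.store() != dst.store()));
    System.out.println("dst sees src entry: "
        + (dst.store().get("/data/part-0000") != null));
  }
}
```

With a process-wide singleton, both file systems above would instead write 
into the same store, which is the situation I would like to avoid.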

If the goal is to guard against costly repeated initialization, then I think 
the {{FileSystem}} cache already has us covered.  {{S3AFileSystem}} instances 
can get reused via the cache, and assuming the 1:1 relationship, the 
corresponding {{MetadataStore}} would get reused too.
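
To make the reuse argument concrete, here is a simplified sketch of a cache 
keyed by URI scheme and authority.  It is only an approximation of the idea: 
the real {{FileSystem}} cache key also includes the user, and every class name 
below ({{SketchCache}}, {{SketchFileSystem}}, {{SketchMetadataStore}}) is 
hypothetical.  The point is that a second lookup for the same bucket returns 
the already-initialized instance, and its store comes along with it.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: a simplified cache, not the Hadoop FileSystem cache. */
public class FileSystemCacheSketch {

  /** Stand-in for a MetadataStore; assumed costly to initialize. */
  static final class SketchMetadataStore {
    private final String table;

    SketchMetadataStore(String table) {
      this.table = table;
    }

    String table() {
      return table;
    }
  }

  /** Stand-in for an S3AFileSystem that owns exactly one store. */
  static final class SketchFileSystem {
    private final SketchMetadataStore store;

    SketchFileSystem(String bucket) {
      // Assumption for the sketch: table name derived from the bucket.
      this.store = new SketchMetadataStore(bucket + "-metadata");
    }

    SketchMetadataStore store() {
      return store;
    }
  }

  /** Caches one file system per scheme + authority (the bucket). */
  static final class SketchCache {
    private final Map<String, SketchFileSystem> byKey =
        new ConcurrentHashMap<>();
    private final AtomicInteger initializations = new AtomicInteger();

    SketchFileSystem get(URI uri) {
      String key = uri.getScheme() + "://" + uri.getAuthority();
      return byKey.computeIfAbsent(key, k -> {
        // Runs only on a cache miss, so the store is created once per key.
        initializations.incrementAndGet();
        return new SketchFileSystem(uri.getAuthority());
      });
    }

    int initializations() {
      return initializations.get();
    }
  }

  public static void main(String[] args) {
    SketchCache cache = new SketchCache();
    SketchFileSystem a1 = cache.get(URI.create("s3a://bucket-a/dir/file1"));
    SketchFileSystem a2 = cache.get(URI.create("s3a://bucket-a/other"));
    SketchFileSystem b = cache.get(URI.create("s3a://bucket-b/dir/file1"));

    System.out.println("same bucket, same store: "
        + (a1.store() == a2.store()));
    System.out.println("different bucket, different store: "
        + (a1.store() != b.store()));
    System.out.println("initializations: " + cache.initializations());
  }
}
```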

> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>
>                 Key: HADOOP-13651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13651
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13651-HADOOP-13345.001.patch, 
> HADOOP-13651-HADOOP-13345.002.patch, HADOOP-13651-HADOOP-13345.003.patch
>
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata 
> consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is 
> configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
