[ 
https://issues.apache.org/jira/browse/HADOOP-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416136#comment-15416136
 ] 

Chris Nauroth commented on HADOOP-13448:
----------------------------------------

bq. Why does your {{DynamoDBConsistentStore#save()}} implementation walk the 
path to the root and save all ancestor paths as well?

That's a good observation.  I think this is a weakness of my prototype, not a 
desirable choice intended to carry through to the full implementation.

More specifically, I approached my prototype by developing a separate 
hadoop-s3guard module with a new {{ConsistentS3AFileSystem}} class defined as a 
subclass of the existing {{S3AFileSystem}} class.  The benefit of this approach 
was that I didn't need to make a lot of code changes directly in hadoop-aws, so 
I could develop the prototype isolated from the churn of merge conflicts on 
upstream hadoop-aws patches.  (There was a lot of optimization and bug fixing 
happening concurrently at the time.)  The drawback of this approach was that it 
constrained my implementation.  For {{mkdirs}}, I could only call the 
superclass and then pass the path to {{ConsistentStore#save}}, so the 
consistent store code needed a complete implementation using solely that path 
argument.  There was no way for me to preserve the information discovered in 
{{S3AFileSystem#innerMkdirs}} about which intermediate directories were 
missing, as was done in your prototype.
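
To make the constraint concrete, here is a minimal sketch of what a path-only {{save}} ends up doing.  It is not the prototype's actual code: {{AncestorWalkSketch}}, its in-memory set, and the string-based paths are all hypothetical stand-ins for the DynamoDB table and Hadoop's {{Path}}, and it assumes absolute, normalized paths with no trailing slash.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a consistent store that is handed only a leaf path. */
final class AncestorWalkSketch {
    // In-memory substitute for the real backing table (an assumption for illustration).
    private final Set<String> entries = new LinkedHashSet<>();

    /**
     * Records the path and every ancestor up to the root. Because the caller
     * passes nothing except the path, the store cannot know which ancestors
     * already existed, so it writes all of them.
     */
    void save(String path) {
        String current = path;
        while (current != null) {
            entries.add(current);
            current = parentOf(current);
        }
    }

    /** Returns the parent of an absolute path, or null for the root. */
    static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int idx = path.lastIndexOf('/');
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    /** Returns the recorded entries in insertion order. */
    List<String> entries() {
        return new ArrayList<>(entries);
    }
}
```

Saving {{/a/b/c}} here writes four entries, one per level of the hierarchy, even when {{/a}} and {{/a/b}} were already known.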

I came to the conclusion that the subclassing approach wouldn't be ideal for 
reasons like this.  We can get better results by hooking into implementation 
details more deeply, and that led me to the refactoring proposed on 
HADOOP-13447.  Between {{S3Store}}, {{AbstractS3AccessPolicy}} and the 
{{MetadataStore}} interface, we should feel free to evolve those interfaces 
however best suits our requirements.  They are internal interfaces, so they 
don't need to be constrained by the Hadoop compatibility guidelines, as long as 
{{S3AFileSystem}} can translate back to the public {{FileSystem}} interface at 
the end.  In the example you gave here, maybe that means something like 
{{S3Store#mkdirs}} returning a result object that lists which directories in 
the ancestry were not pre-existing.
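
As a rough illustration of that idea, the sketch below shows one possible shape for such a result object.  Everything in it is an assumption made for the example: {{MkdirsResult}}, {{createdDirectories}} and {{InMemoryStoreSketch}} are hypothetical names, the store is an in-memory set rather than S3, and paths are plain absolute strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical result type listing the directories that mkdirs had to create. */
final class MkdirsResult {
    private final List<String> created;

    MkdirsResult(List<String> created) {
        this.created = Collections.unmodifiableList(new ArrayList<>(created));
    }

    /** Directories that were not pre-existing, ordered from shallowest to deepest. */
    List<String> createdDirectories() {
        return created;
    }
}

/** Hypothetical in-memory stand-in for the store; only the root exists initially. */
final class InMemoryStoreSketch {
    private final Set<String> directories = new HashSet<>(Collections.singleton("/"));

    MkdirsResult mkdirs(String path) {
        List<String> created = new ArrayList<>();
        String current = path;
        // Walk up the ancestry until an existing directory (or the root) is reached.
        while (current != null && !directories.contains(current)) {
            created.add(current);
            current = parentOf(current);
        }
        Collections.reverse(created); // report parents before children
        directories.addAll(created);
        return new MkdirsResult(created);
    }

    private static String parentOf(String path) {
        if (path.equals("/")) {
            return null;
        }
        int idx = path.lastIndexOf('/');
        return idx == 0 ? "/" : path.substring(0, idx);
    }
}
```

With a result like this, the caller could pass only the newly created directories to the metadata store instead of re-saving every ancestor up to the root.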

Another, smaller reason my prototype worked that way is that it made it easy to 
hook a call to {{ConsistentStore#save}} onto the close of the stream returned 
by {{FileSystem#create}}.  Unlike {{mkdirs}}, {{create}} does not walk up the 
ancestry to check for pre-existing directories, so I had to take care of that 
entirely within my code.  This is really a bug in the existing S3A code that I 
was working around, though.  (See HADOOP-13221.)
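
For readers unfamiliar with the pattern, hooking onto the close of a stream can be sketched with a small wrapper like the one below.  This is not the prototype's code: {{SaveOnCloseStream}} is a hypothetical name, and the {{Runnable}} callback stands in for whatever call records the path in the consistent store.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical wrapper that runs a callback once the wrapped stream has closed. */
final class SaveOnCloseStream extends FilterOutputStream {
    private final Runnable onClose;
    private boolean closed;

    SaveOnCloseStream(OutputStream out, Runnable onClose) {
        super(out);
        this.onClose = onClose;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate bulk writes directly rather than byte-by-byte.
        out.write(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // run the callback at most once
        }
        closed = true;
        super.close();  // flush and close the underlying stream first
        onClose.run();  // reached only if the close above did not throw
    }
}
```

In this sketch the callback runs only after the underlying stream closes without throwing, so no metadata is recorded for a write that failed to complete.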

> S3Guard: Define MetadataStore interface.
> ----------------------------------------
>
>                 Key: HADOOP-13448
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13448
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>
> Define the common interface for metadata store operations.  This is the 
> interface that any metadata back-end must implement in order to integrate 
> with S3Guard.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
