[ https://issues.apache.org/jira/browse/HADOOP-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416136#comment-15416136 ]
Chris Nauroth commented on HADOOP-13448:
----------------------------------------

bq. Why does your {{DynamoDBConsistentStore#save()}} implementation walk the path to the root and save all ancestor paths as well?

That's a good observation. I think this is a weakness of my prototype, not a desirable choice intended to carry through to the full implementation.

More specifically, I approached my prototype by developing a separate hadoop-s3guard module with a new {{ConsistentS3AFileSystem}} class defined as a subclass of the existing {{S3AFileSystem}} class. The benefit of this approach was that I didn't need to make a lot of code changes directly in hadoop-aws, so I could develop the prototype isolated from the churn of merge conflicts on upstream hadoop-aws patches. (There was a lot of optimization and bug fixing happening concurrently at the time.)

The drawback of this approach was that it constrained my implementation. For {{mkdirs}}, I could only call the superclass and then pass the path to {{ConsistentStore#save}}, so the consistent store code needed a complete implementation using solely that path argument. There was no way for me to preserve the information discovered in {{S3AFileSystem#innerMkdirs}} about which intermediate directories were missing, as was done in your prototype.

I came to the conclusion that the subclassing approach wouldn't be ideal for reasons like this. We can get better results by hooking into implementation details more deeply, and that led me to the refactoring proposed on HADOOP-13447.

Between {{S3Store}}, {{AbstractS3AccessPolicy}} and the {{MetadataStore}} interface, we should feel free to evolve those interfaces however it best suits requirements. They are internal interfaces, so they don't need to be constrained by the Hadoop compatibility guidelines, as long as {{S3AFileSystem}} can translate back to the public {{FileSystem}} interface at the end.
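
To make the constraint above concrete, here is a minimal, self-contained sketch. It is not the prototype's code and uses no Hadoop classes. The class {{AncestorWalk}} and both method names are hypothetical, and paths are assumed to be absolute, normalized strings with no trailing slash. The first method shows what a store must record when it is handed only a path: the path plus every ancestor up to the root. The second shows how much smaller the set becomes when the caller can also report which directories already existed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; plain strings stand in for Hadoop Path objects.
class AncestorWalk {

    /** Returns the path followed by each of its ancestors, ending at the root "/". */
    static List<String> pathAndAncestors(String path) {
        List<String> result = new ArrayList<>();
        String current = path;
        while (!current.equals("/")) {
            result.add(current);
            int slash = current.lastIndexOf('/');
            // A slash at index 0 (or no slash at all) means the parent is the root.
            current = (slash <= 0) ? "/" : current.substring(0, slash);
        }
        result.add("/");
        return result;
    }

    /** With knowledge of pre-existing directories, only the remaining paths need saving. */
    static List<String> pathsToSave(String path, Set<String> preExisting) {
        List<String> result = new ArrayList<>();
        for (String p : pathAndAncestors(path)) {
            if (!preExisting.contains(p)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Path-only save: every ancestor has to be written.
        System.out.println(pathAndAncestors("/a/b/c")); // [/a/b/c, /a/b, /a, /]

        // Caller knows "/" and "/a" already existed: two entries instead of four.
        Set<String> existing = new HashSet<>(Arrays.asList("/", "/a"));
        System.out.println(pathsToSave("/a/b/c", existing)); // [/a/b/c, /a/b]
    }
}
```

In this sketch the path-only variant always writes one entry per level of depth, whereas the second variant writes only what the caller reports as new.
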
In the example you gave here, maybe that means something like {{S3Store#mkdirs}} returning a result object that lists which directories in the ancestry were not pre-existing.

Another smaller reason my prototype worked that way is that it was also easy to hook a call to {{ConsistentStore#save}} onto the close of the stream returned by {{FileSystem#create}}. Unlike {{mkdirs}}, there is no such walk up the ancestry to check for pre-existing directories there, so I had to take care of it entirely within my code. This is really more of a bug in the existing S3A code though that I was working around. (See HADOOP-13221.)

> S3Guard: Define MetadataStore interface.
> ----------------------------------------
>
>                 Key: HADOOP-13448
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13448
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>
> Define the common interface for metadata store operations. This is the
> interface that any metadata back-end must implement in order to integrate
> with S3Guard.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)