[ http://issues.apache.org/jira/browse/HADOOP-574?page=comments#action_12449188 ]

Tom White commented on HADOOP-574:
----------------------------------
Thanks Doug. Collaboration sounds good: I'll contact Jim directly.

Regarding HADOOP-571, I agree it makes sense to tackle it in conjunction with this. I'll have a look at it after we get the basics of the S3 filesystem working.

As far as the design goes, I agree that (like DFS) the S3 filesystem should divide files into blocks and buffer them to disk before writing them to S3. I'm not sure about putting the block number at the end of the filename (using a delimiter), since that makes renames very inefficient: S3 has no rename operation, so every block object would have to be copied under a new key. Instead I have opted for a level of indirection, whereby the S3 object stored at the filename is a metadata file listing the block IDs that hold the data. A rename is then simply a re-PUT of the metadata (see the sketch at the end of this message). What do you think?

The other aspect I haven't put much thought into yet is locking. Keeping the number of HTTP requests to a minimum will be an interesting challenge.

> want FileSystem implementation for Amazon S3
> --------------------------------------------
>
>          Key: HADOOP-574
>          URL: http://issues.apache.org/jira/browse/HADOOP-574
>      Project: Hadoop
>   Issue Type: New Feature
>   Components: fs
>     Reporter: Doug Cutting
>
> An S3-based Hadoop FileSystem would make a great addition to Hadoop.
> It would facilitate use of Hadoop on Amazon's EC2 computing grid, as
> discussed here:
> http://www.mail-archive.com/[email protected]/msg00318.html
>
> This is related to HADOOP-571, which would make Hadoop's FileSystem
> considerably easier to extend.
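To make the metadata-indirection idea concrete, here is a rough sketch of the shape it could take. This is purely illustrative and not the actual patch: S3FileMetadata, S3FileSystemSketch and S3Store are names invented for the example, and block writing, error handling and locking are left out.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.List;

    // Hypothetical metadata object stored at the file's key. The file's data
    // lives in separate block objects; this object only records their IDs.
    class S3FileMetadata {
      private final List<Long> blockIds;

      S3FileMetadata(List<Long> blockIds) {
        this.blockIds = blockIds;
      }

      // Serialized form of the block list; these bytes (not the file data)
      // are what gets PUT at the filename's key.
      byte[] serialize() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(blockIds.size());
        for (long id : blockIds) {
          out.writeLong(id);
        }
        out.close();
        return bytes.toByteArray();
      }
    }

    // Hypothetical thin wrapper around raw S3 PUT/GET/DELETE requests.
    interface S3Store {
      byte[] get(String key) throws IOException;
      void put(String key, byte[] value) throws IOException;
      void delete(String key) throws IOException;
    }

    // Rename never touches the data blocks: it re-PUTs the small metadata
    // object under the new key and deletes the old key.
    class S3FileSystemSketch {
      private final S3Store store;

      S3FileSystemSketch(S3Store store) {
        this.store = store;
      }

      void rename(String src, String dst) throws IOException {
        byte[] metadata = store.get(src); // metadata only, a few bytes
        store.put(dst, metadata);         // re-PUT under the new name
        store.delete(src);                // block objects are untouched
      }
    }

The point is just that the cost of a rename would be proportional to the size of the metadata, not the size of the file.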
