Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "AmazonS3" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/AmazonS3?action=diff&rev1=18&rev2=19 Comment: remove all content on configuring the S3 filesystems -point to the markdown docs on github instead. = History = * The S3 block filesystem was introduced in Hadoop 0.10.0 ([[http://issues.apache.org/jira/browse/HADOOP-574|HADOOP-574]]). * The S3 native filesystem was introduced in Hadoop 0.18.0 ([[http://issues.apache.org/jira/browse/HADOOP-930|HADOOP-930]]) and rename support was added in Hadoop 0.19.0 ([[https://issues.apache.org/jira/browse/HADOOP-3361|HADOOP-3361]]). - * The S3A filesystem was introduced in Hadoop 2.6.0. Some issues were found and fixed for later Hadoop versions[[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]], so Hadoop-2.6.0's support of s3a must be considered an incomplete replacement for the s3n FS. + * The S3A filesystem was introduced in Hadoop 2.6.0. Some issues were found and fixed for later Hadoop versions [[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]]. - = Why you cannot use S3 as a replacement for HDFS = + = Configuring and using the S3 filesystem support = + + Consult the [[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md|Latest Hadoop documentation]] for the specifics on using any of the S3 clients. + + + = Important: you cannot use S3 as a replacement for HDFS = + - You cannot use either of the S3 filesystem clients as a drop-in replacement for HDFS. Amazon S3 is an "object store" with + You cannot use any of the S3 filesystem clients as a drop-in replacement for HDFS. Amazon S3 is an "object store" with * eventual consistency: changes made by one application (creation, updates and deletions) will not be visible until some undefined time. * s3n and s3a: non-atomic rename and delete operations. Renaming or deleting large directories takes time proportional to the number of entries -and visible to other processes during this time, and indeed, until the eventual consistency has been resolved. S3 is not a filesystem. The Hadoop S3 filesystem bindings make it pretend to be a filesystem, but it is not. It can act as a source of data, and as a destination -though in the latter case, you must remember that the output may not be immediately visible. - - == Configuring to use s3/ s3n filesystems == - - Edit your `core-site.xml` file to include your S3 keys - - {{{ - - <property> - <name>fs.s3.awsAccessKeyId</name> - <value>ID</value> - </property> - - <property> - <name>fs.s3.awsSecretAccessKey</name> - <value>SECRET</value> - </property> - }}} - - You can then use URLs to your bucket : ``s3n://MYBUCKET/``, or directories and files inside it. - - {{{ - - s3n://BUCKET/ - s3n://BUCKET/dir - s3n://BUCKET/dir/files.csv.tar.gz - s3n://BUCKET/dir/*.gz - - }}} - - Alternatively, you can put the access key ID and the secret access key into a ''s3n'' (or ''s3'') URI as the user info: - - {{{ - s3n://ID:SECRET@BUCKET - }}} - - Note that since the secret - access key can contain slashes, you must remember to escape them by replacing each slash `/` with the string `%2F`. - Keys specified in the URI take precedence over any specified using the properties `fs.s3.awsAccessKeyId` and - `fs.s3.awsSecretAccessKey`. - - This option is less secure as the URLs are likely to appear in output logs and error messages, so being exposed to remote users. = Security =