Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "AmazonS3" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/AmazonS3?action=diff&rev1=18&rev2=19 Comment: remove all content on configuring the S3 filesystems -point to the markdown docs on github instead. = History = * The S3 block filesystem was introduced in Hadoop 0.10.0 ([[http://issues.apache.org/jira/browse/HADOOP-574|HADOOP-574]]). * The S3 native filesystem was introduced in Hadoop 0.18.0 ([[http://issues.apache.org/jira/browse/HADOOP-930|HADOOP-930]]) and rename support was added in Hadoop 0.19.0 ([[https://issues.apache.org/jira/browse/HADOOP-3361|HADOOP-3361]]). - * The S3A filesystem was introduced in Hadoop 2.6.0. Some issues were found and fixed for later Hadoop versions[[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]], so Hadoop-2.6.0's support of s3a must be considered an incomplete replacement for the s3n FS. + * The S3A filesystem was introduced in Hadoop 2.6.0. Some issues were found and fixed for later Hadoop versions [[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]]. - = Why you cannot use S3 as a replacement for HDFS = + = Configuring and using the S3 filesystem support = + + Consult the [[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md|Latest Hadoop documentation]] for the specifics on using any of the S3 clients. + + + = Important: you cannot use S3 as a replacement for HDFS = + - You cannot use either of the S3 filesystem clients as a drop-in replacement for HDFS. Amazon S3 is an "object store" with + You cannot use any of the S3 filesystem clients as a drop-in replacement for HDFS. Amazon S3 is an "object store" with * eventual consistency: changes made by one application (creation, updates and deletions) will not be visible until some undefined time. * s3n and s3a: non-atomic rename and delete operations. Renaming or deleting large directories takes time proportional to the number of entries -and visible to other processes during this time, and indeed, until the eventual consistency has been resolved. S3 is not a filesystem. The Hadoop S3 filesystem bindings make it pretend to be a filesystem, but it is not. It can act as a source of data, and as a destination -though in the latter case, you must remember that the output may not be immediately visible. - - == Configuring to use s3/ s3n filesystems == - - Edit your `core-site.xml` file to include your S3 keys - - {{{ - - <property> - <name>fs.s3.awsAccessKeyId</name> - <value>ID</value> - </property> - - <property> - <name>fs.s3.awsSecretAccessKey</name> - <value>SECRET</value> - </property> - }}} - - You can then use URLs to your bucket : ``s3n://MYBUCKET/``, or directories and files inside it. - - {{{ - - s3n://BUCKET/ - s3n://BUCKET/dir - s3n://BUCKET/dir/files.csv.tar.gz - s3n://BUCKET/dir/*.gz - - }}} - - Alternatively, you can put the access key ID and the secret access key into a ''s3n'' (or ''s3'') URI as the user info: - - {{{ - s3n://ID:SECRET@BUCKET - }}} - - Note that since the secret - access key can contain slashes, you must remember to escape them by replacing each slash `/` with the string `%2F`. - Keys specified in the URI take precedence over any specified using the properties `fs.s3.awsAccessKeyId` and - `fs.s3.awsSecretAccessKey`. - - This option is less secure as the URLs are likely to appear in output logs and error messages, so being exposed to remote users. = Security =