[Hadoop Wiki] Update of "AmazonS3" by SteveLoughran

Apache Wiki Tue, 03 Mar 2015 18:32:21 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The "AmazonS3" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/AmazonS3?action=diff&rev1=15&rev2=16

Comment:
s3a

  monthly for storage and data transfer. Transfer between S3 and [[AmazonEC2]] 
is free. This makes use of
  S3 attractive for Hadoop users who run clusters on EC2.
  
- Hadoop provides two filesystems that use S3.
+ Hadoop provides multiple filesystem clients for reading from and writing to 
Amazon S3 or a compatible service.
  
   S3 Native FileSystem (URI scheme: s3n)::
   A native filesystem for reading and writing regular files on S3. The 
advantage of this filesystem is that you can access files on S3 that were 
written with other tools. Conversely, other tools can access files written 
using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3.
+ 
+  S3A (URI scheme: s3a)::
+  A successor to the S3 Native filesystem (s3n), the S3A filesystem uses Amazon's 
libraries to interact with S3. This allows S3A to support larger files (no more 
5GB limit), higher performance operations and more. The filesystem is intended 
to be a replacement for S3 Native: all objects accessible from 
s3n:// URLs should also be accessible from s3a:// simply by replacing the URL 
scheme.
  
   S3 Block FileSystem (URI scheme: s3)::
   A block-based filesystem backed by S3. Files are stored as blocks, just like 
they are in HDFS. This permits efficient implementation of renames. This 
filesystem requires you to dedicate a bucket for the filesystem - you should 
not use an existing bucket containing files, or write other files to the same 
bucket. The files stored by this filesystem can be larger than 5GB, but they 
are not interoperable with other S3 tools.
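
The scheme swap described for S3A can be sketched in a few lines. This is only an illustration, not part of Hadoop: the helper name `to_s3a` and the bucket name `example-bucket` are hypothetical, and the sketch assumes the bucket and object key carry over unchanged, as the S3A description above says they should.

```python
from urllib.parse import urlsplit, urlunsplit


def to_s3a(url: str) -> str:
    """Return url with an s3n scheme replaced by s3a.

    URLs with any other scheme are returned unchanged.
    (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(url)
    if parts.scheme != "s3n":
        return url
    # Only the scheme changes; bucket (netloc) and key (path) are kept as-is.
    return urlunsplit(parts._replace(scheme="s3a"))


if __name__ == "__main__":
    # "example-bucket" is a made-up bucket name.
    print(to_s3a("s3n://example-bucket/logs/2015/03/part-00000"))
```

Whether a given job behaves identically after the swap still depends on the Hadoop version and its S3A configuration, so treat the rewritten URL as something to verify rather than assume.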
@@ -20, +23 @@

  = History =
   * The S3 block filesystem was introduced in Hadoop 0.10.0 
([[http://issues.apache.org/jira/browse/HADOOP-574|HADOOP-574]]), but this had 
a few bugs so you should use Hadoop 0.10.1 or later.
   * The S3 native filesystem was introduced in Hadoop 0.18.0 
([[http://issues.apache.org/jira/browse/HADOOP-930|HADOOP-930]]) and rename 
support was added in Hadoop 0.19.0 
([[https://issues.apache.org/jira/browse/HADOOP-3361|HADOOP-3361]]).
+  * The S3A filesystem was introduced in Hadoop 2.6.0.
  
  = Why you cannot use S3 as a replacement for HDFS =
  
  You cannot use any of the S3 filesystems as a drop-in replacement for 
HDFS. Amazon S3 is an "object store" with
   * eventual consistency: changes made by one application (creation, updates 
and deletions) will not be visible until some undefined time.
-  * s3n: non-atomic rename and delete operations. Renaming or deleting large 
directories takes time proportional to the number of entries -and visible to 
other processes during this time, and indeed, until the eventual consistency 
has been resolved.
+  * s3n and s3a: non-atomic rename and delete operations. Renaming or deleting 
large directories takes time proportional to the number of entries, and the 
intermediate state is visible to other processes during this time and, indeed, 
until the eventual consistency has been resolved.
  
  S3 is not a filesystem. The Hadoop S3 filesystem bindings make it pretend to 
be a filesystem, but it is not. It can
  act as a source of data, and as a destination, though in the latter case you 
must remember that the output may not be immediately visible.
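
One way a client might cope with output that is not immediately visible is to poll for it with a deadline. The sketch below is an assumption-laden illustration, not a Hadoop API: `wait_until_visible` is a hypothetical helper, and `exists` stands for whatever check the caller supplies (for example, a listing or metadata call against the store). A successful check only shows that this one probe saw the object; it does not prove that every other reader will.

```python
import time
from typing import Callable


def wait_until_visible(
    exists: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll exists() until it returns True or timeout_s elapses.

    Returns True if the probe succeeded before the deadline, else False.
    (Hypothetical helper; sleep and clock are injectable for testing.)
    """
    deadline = clock() + timeout_s
    while True:
        if exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass its own probe, for example `wait_until_visible(lambda: my_store_has(path))`, where `my_store_has` is a placeholder for a real existence check.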
@@ -73, +77 @@

  
  = Security =
  
- Your Amazon Secret Access Key is that: secret. If it gets known you have to 
go to the [[https://portal.aws.amazon.com/gp/aws/securityCredentials|Security 
Credentials]] page and revoke it. Try and avoid printing it in logs, or 
checking the XML configuration files into revision control.
+ Your Amazon Secret Access Key is just that: secret. If it becomes known, you 
have to go to the [[https://portal.aws.amazon.com/gp/aws/securityCredentials|Security 
Credentials]] page and revoke it. Avoid printing it in logs, and never check 
it, or XML configuration files that contain it, into revision control.
  
  = Running bulk copies in and out of S3 =
  
@@ -90, +94 @@

  
  Flip the arguments if you want to run the copy in the opposite direction.
  
- Other schemes supported by `distcp` are `file` (for local), and `http`.
+ Other schemes supported by `distcp` include `file:` (for local files) and `http:`.
