[ https://issues.apache.org/jira/browse/HADOOP-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463198 ]
[EMAIL PROTECTED] commented on HADOOP-862:
------------------------------------------
Updated patch.
+ Renamed DFSCopyFilesMapper to FSCopyFilesMapper
+ If a path has no scheme, fall back to the default filesystem (the value of 'fs.default.name' in hadoop-site.xml); see the sketch below.
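For clarity, here is a minimal sketch of that defaulting rule. It is not code from the patch: the class name is made up, and it uses the FileSystem.get(URI, Configuration) overload from later Hadoop releases rather than whatever the patch itself does. The point is only that a bare path resolves against fs.default.name while an explicit scheme selects its own filesystem.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only: resolve a path argument the way the defaulting
// rule above describes. An explicit scheme (hdfs://, s3://, ...) picks that
// filesystem; a bare path falls back to the filesystem named by
// fs.default.name in hadoop-site.xml.
public class ResolveFs {
  public static FileSystem resolve(String arg, Configuration conf)
      throws IOException {
    URI uri = new Path(arg).toUri();
    if (uri.getScheme() == null) {
      // No scheme on the argument: use the configured default filesystem.
      return FileSystem.get(conf);
    }
    // Scheme present: get the filesystem registered for that scheme.
    return FileSystem.get(uri, conf);
  }
}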
I ran more extensive tests copying from hdfs to s3 and back again, and from http into both s3 and hdfs (distcp is a nice tool). For example, here is the output from a copy of a small nutch segment from hdfs to s3 (in the run below, hdfs was the fs.default.name filesystem):
[EMAIL PROTECTED]:~/checkouts/hadoop$ ./bin/hadoop fs -lsr outputs/segments
/user/stack/outputs/segments/20070108213341-test <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/data <r 1> 1187
/user/stack/outputs/segments/20070108213341-test/crawl_fetch/part-00000/index <r 1> 234
/user/stack/outputs/segments/20070108213341-test/crawl_parse <dir>
/user/stack/outputs/segments/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
/user/stack/outputs/segments/20070108213341-test/parse_data <dir>
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/data <r 1> 4630
/user/stack/outputs/segments/20070108213341-test/parse_data/part-00000/index <r 1> 234
/user/stack/outputs/segments/20070108213341-test/parse_text <dir>
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000 <dir>
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/data <r 1> 6180
/user/stack/outputs/segments/20070108213341-test/parse_text/part-00000/index <r 1> 234
Here's a copy to an s3 directory named segments-bkup:
% ./bin/hadoop distcp /user/stack/outputs/segments s3://KEY:[EMAIL PROTECTED]/segments-bkup
Here's a listing of the s3 content:
[EMAIL PROTECTED]:~/checkouts/hadoop$ ./bin/hadoop fs -fs s3://KEY:[EMAIL PROTECTED]/segments-bkup -lsr /segments-bkup/
/segments-bkup/20070108213341-test <dir>
/segments-bkup/20070108213341-test/crawl_fetch <dir>
/segments-bkup/20070108213341-test/crawl_fetch/part-00000 <dir>
/segments-bkup/20070108213341-test/crawl_fetch/part-00000/data <r 1> 1187
/segments-bkup/20070108213341-test/crawl_fetch/part-00000/index <r 1> 234
/segments-bkup/20070108213341-test/crawl_parse <dir>
/segments-bkup/20070108213341-test/crawl_parse/part-00000 <r 1> 9010
/segments-bkup/20070108213341-test/parse_data <dir>
/segments-bkup/20070108213341-test/parse_data/part-00000 <dir>
/segments-bkup/20070108213341-test/parse_data/part-00000/data <r 1> 4630
/segments-bkup/20070108213341-test/parse_data/part-00000/index <r 1> 234
/segments-bkup/20070108213341-test/parse_text <dir>
/segments-bkup/20070108213341-test/parse_text/part-00000 <dir>
/segments-bkup/20070108213341-test/parse_text/part-00000/data <r 1> 6180
/segments-bkup/20070108213341-test/parse_text/part-00000/index <r 1> 234
> Add handling of s3 to CopyFile tool
> -----------------------------------
>
> Key: HADOOP-862
> URL: https://issues.apache.org/jira/browse/HADOOP-862
> Project: Hadoop
> Issue Type: Improvement
> Components: util
> Affects Versions: 0.10.0
> Reporter: [EMAIL PROTECTED]
> Priority: Minor
> Attachments: copyfiles-s3-2.diff, copyfiles-s3.diff
>
>
> CopyFile is a useful tool for doing bulk copies. It doesn't have handling
> for the recently added s3 filesystem.