[
https://issues.apache.org/jira/browse/CRUNCH-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shawn Smith updated CRUNCH-47:
------------------------------
Attachment: multiple-file-systems.patch
The attached patch replaces all calls to FileSystem.get() with Path.getFileSystem().
When src and dest are on different file systems, it uses
FileUtil.copy(src, dst, ..deleteSource=true.., conf) instead of FileSystem.rename().
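As a rough sketch of the first pattern (class and method names here are illustrative, not the actual Crunch code), resolving the FileSystem from the Path's own scheme instead of the default configuration looks like:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathSizeSketch {
  // Before: FileSystem.get(conf) returns the *default* file system (HDFS in
  // the reported setup), which throws IllegalArgumentException when handed an
  // s3n:// path.
  // After: path.getFileSystem(conf) resolves the file system from the path's
  // scheme, so hdfs:// and s3n:// paths both resolve correctly.
  public static long getPathSize(Configuration conf, Path path) throws IOException {
    FileSystem fs = path.getFileSystem(conf);  // instead of FileSystem.get(conf)
    if (!fs.exists(path)) {
      return -1L;
    }
    return fs.getFileStatus(path).getLen();
  }
}
```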
> Inputs and outputs can't use non-default Hadoop FileSystem
> ----------------------------------------------------------
>
> Key: CRUNCH-47
> URL: https://issues.apache.org/jira/browse/CRUNCH-47
> Project: Crunch
> Issue Type: Bug
> Components: IO
> Affects Versions: 0.3.0
> Environment: Elastic MapReduce Hadoop 1.0.3
> Reporter: Shawn Smith
> Attachments: multiple-file-systems.patch
>
>
> I'm getting the following exception trying to use Crunch with Elastic
> MapReduce where input and output files use the Native S3 FileSystem and
> intermediate files use HDFS. HDFS is configured as the default file system:
> Exception in thread "main" java.lang.IllegalArgumentException: This file
> system object (hdfs://10.114.37.65:9000) does not support access to the
> request path 's3n://test-bucket/test/Input.avro' You possibly called
> FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to
> obtain a file system supporting your path.
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:767)
>         at org.apache.crunch.io.SourceTargetHelper.getPathSize(SourceTargetHelper.java:44)
> It looks like Crunch has a number of calls to FileSystem.get(Configuration)
> that assume the default configured file system and fail with an S3 input or
> output.
> Also, CrunchJob.handleMultiPaths() calls FileSystem.rename() which works only
> if the source and destination use the same file system. This breaks the
> final upload of the output files from HDFS to S3.
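A hedged sketch of the cross-file-system move described above (helper name is hypothetical; FileUtil.copy with deleteSource=true acts as a copy-then-delete "rename" when the schemes differ):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CrossFsMoveSketch {
  // FileSystem.rename() only works within a single file system, so moving
  // job output from hdfs:// to s3n:// needs a copy followed by a delete.
  public static boolean move(Configuration conf, Path src, Path dst) throws IOException {
    FileSystem srcFs = src.getFileSystem(conf);
    FileSystem dstFs = dst.getFileSystem(conf);
    if (srcFs.getUri().equals(dstFs.getUri())) {
      // Same file system: a plain rename is a cheap metadata operation.
      return srcFs.rename(src, dst);
    }
    // Different file systems: copy the data across, then delete the source.
    return FileUtil.copy(srcFs, src, dstFs, dst, /* deleteSource= */ true, conf);
  }
}
```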
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira