[
https://issues.apache.org/jira/browse/CRUNCH-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shawn Smith updated CRUNCH-47:
------------------------------
Attachment: multiple-file-systems.patch
The attached patch replaces all calls to FileSystem.get() with Path.getFileSystem().
When src and dest are on different file systems, it uses
FileUtil.copy(src, dst, ..deleteSource=true.., conf) instead of FileSystem.rename().
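As a rough sketch of the first pattern (class and method names here are illustrative, not the actual Crunch code), resolving the FileSystem from the Path's own scheme instead of the default configuration looks like:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PathSizeSketch {
  // Before: FileSystem.get(conf) returns the *default* file system (HDFS in
  // the reported setup), which throws IllegalArgumentException when handed an
  // s3n:// path.
  // After: path.getFileSystem(conf) resolves the file system from the path's
  // scheme, so hdfs:// and s3n:// paths both resolve correctly.
  public static long getPathSize(Configuration conf, Path path) throws IOException {
    FileSystem fs = path.getFileSystem(conf);  // instead of FileSystem.get(conf)
    if (!fs.exists(path)) {
      return -1L;
    }
    return fs.getFileStatus(path).getLen();
  }
}
```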
> Inputs and outputs can't use non-default Hadoop FileSystem
> ----------------------------------------------------------
>
> Key: CRUNCH-47
> URL: https://issues.apache.org/jira/browse/CRUNCH-47
> Project: Crunch
> Issue Type: Bug
> Components: IO
> Affects Versions: 0.3.0
> Environment: Elastic MapReduce Hadoop 1.0.3
> Reporter: Shawn Smith
> Attachments: multiple-file-systems.patch
>
>
> I'm getting the following exception trying to use Crunch with Elastic
> MapReduce where input and output files use the Native S3 FileSystem and
> intermediate files use HDFS. HDFS is configured as the default file system:
> Exception in thread "main" java.lang.IllegalArgumentException: This file
> system object (hdfs://10.114.37.65:9000) does not support access to the
> request path 's3n://test-bucket/test/Input.avro' You possibly called
> FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to
> obtain a file system supporting your path.
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:767)
>         at org.apache.crunch.io.SourceTargetHelper.getPathSize(SourceTargetHelper.java:44)
> It looks like Crunch has a number of calls to FileSystem.get(Configuration)
> that assume the default configured file system and fail with an S3 input or
> output.
> Also, CrunchJob.handleMultiPaths() calls FileSystem.rename() which works only
> if the source and destination use the same file system. This breaks the
> final upload of the output files from HDFS to S3.
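A hedged sketch of the cross-file-system move described above (helper name is hypothetical; FileUtil.copy with deleteSource=true acts as a copy-then-delete "rename" when the schemes differ):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CrossFsMoveSketch {
  // FileSystem.rename() only works within a single file system, so moving
  // job output from hdfs:// to s3n:// needs a copy followed by a delete.
  public static boolean move(Configuration conf, Path src, Path dst) throws IOException {
    FileSystem srcFs = src.getFileSystem(conf);
    FileSystem dstFs = dst.getFileSystem(conf);
    if (srcFs.getUri().equals(dstFs.getUri())) {
      // Same file system: a plain rename is a cheap metadata operation.
      return srcFs.rename(src, dst);
    }
    // Different file systems: copy the data across, then delete the source.
    return FileUtil.copy(srcFs, src, dstFs, dst, /* deleteSource= */ true, conf);
  }
}
```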
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira