[ 
https://issues.apache.org/jira/browse/PIG-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-4032:
-------------------------------

    Attachment: PIG-4032-1.patch

The patch discards the scheme from the given path and uses only the remainder
to create the distributed cache file name.
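
As a rough illustration of that idea (not the code in PIG-4032-1.patch), the
sketch below drops the scheme and flattens what is left into a single file
name. The class name, the method name cacheFileName, and the choice to replace
'/' and ':' with '_' are assumptions made for this example only.

```java
import java.net.URI;

public class CacheFileName {
    // Hypothetical helper: builds a flat file name from a location string
    // while ignoring the URI scheme (e.g. "s3n"). Not taken from the patch.
    static String cacheFileName(String location) {
        URI uri = URI.create(location);
        StringBuilder rest = new StringBuilder();
        // Keep the authority (the bucket name in an s3n URI) and the path,
        // but never the scheme.
        if (uri.getAuthority() != null) {
            rest.append(uri.getAuthority());
        }
        if (uri.getPath() != null) {
            rest.append(uri.getPath());
        }
        // Flatten separators so the result is a plain file name.
        return rest.toString().replaceAll("[/:]", "_");
    }

    public static void main(String[] args) {
        System.out.println(cacheFileName("s3n://foo/bar/bloom"));
    }
}
```

With this helper, s3n://foo/bar/bloom maps to foo_bar_bloom, which contains no
colon and so cannot be mistaken for a URI with a scheme.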

> BloomFilter fails with s3 path in Hadoop 2.4
> --------------------------------------------
>
>                 Key: PIG-4032
>                 URL: https://issues.apache.org/jira/browse/PIG-4032
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4032-1.patch
>
>
> BloomFilter is broken with an s3 path in Hadoop 2. Here is a simple example:
> {code}
> DEFINE bloomtest Bloom('s3n://foo/bar/bloom');
> a = LOAD 's3n://foo/bar/test.txt' using PigStorage('\t') as (k:int, v:int) ;
> split a into yes if bloomtest(k,v), no otherwise;
> dump yes;
> {code}
> This query fails with the following error:
> {code}
> 14/06/22 06:28:58 INFO jobcontrol.ControlledJob: PigLatin:test.pig got an error while submitting
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: s3n:__foo_bar_bloom
>       at org.apache.hadoop.fs.Path.initialize(Path.java:206)
>       at org.apache.hadoop.fs.Path.<init>(Path.java:172)
>       at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java
> {code}
> The problem is that the distributed cache file name {{s3n:__foo_bar_bloom}}
> causes a URI syntax error because of the s3n prefix.
> This is a regression introduced by HADOOP-8562, which includes the following
> change:
> {code:title=Path.java}
> -      this.uri = new URI(scheme, authority, normalizePath(path), null, fragment)
> +      this.uri = new URI(scheme, authority, normalizePath(scheme, path), null, fragment)
> {code}
> Since the scheme was ignored in Hadoop 1, s3 paths worked only by accident.
> In Hadoop 2 they fail.
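
The URISyntaxException in the log above can be reproduced with java.net.URI
alone, without Hadoop. The sketch below assumes the file name is split into
scheme "s3n" and path "__foo_bar_bloom" before the URI is built, as the error
message suggests. The class and method names are made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RelativePathInAbsoluteUri {
    // Returns the reason java.net.URI rejects the scheme/path pair,
    // or null if the URI is accepted.
    static String rejectionReason(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // A scheme combined with a path that does not start with '/'
        // is rejected.
        System.out.println(rejectionReason("s3n", "__foo_bar_bloom"));
        // The same path without a scheme is accepted as a relative URI.
        System.out.println(rejectionReason(null, "__foo_bar_bloom"));
    }
}
```

The first call reports "Relative path in absolute URI" and the second reports
no error, which is consistent with a cache file name that no longer carries
the scheme.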



--
This message was sent by Atlassian JIRA
(v6.2#6252)
