[ 
https://issues.apache.org/jira/browse/PIG-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729771#action_12729771
 ] 

Hong Tang commented on PIG-879:
-------------------------------

Both are valid arguments. The problem with 2) and 4) is that they require 
changes to the load statement syntax or the LoadFunc API, and so would take 
longer to deliver. 

I guess we could structure the fix in two phases. Phase One: support 1) and 
3), so that we have the minimum needed to move along without having to disable 
multi-query optimization completely. Users should be able to modify their 
scripts to change all relative paths to absolute ones (such usage should be 
rare enough that most people will not be impacted). Phase Two: support either 
2) or 4) (I do not think we need both). Personally, I think 4) would be better, 
because the loader should be the one that interprets the location string 
syntax.
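
To make the two phases concrete, here is a rough Java sketch of how they could 
fit together. It is only an illustration under assumptions, not Pig's actual 
code: the class LocationSketch, the interface LocationResolver, the 
pathsAreAbsolute flag and the namenode:8020 address are all hypothetical 
names. Only the method name and argument order of 
relativeToAbsolutePath(String, String) come from option 4) in the description 
quoted below.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: apart from the relativeToAbsolutePath(String, String)
// shape proposed in option 4), none of these names exist in Pig.
class LocationSketch {

    /** Stand-in for the hook that option 4) proposes adding to LoadFunc. */
    interface LocationResolver {
        String relativeToAbsolutePath(String location, String curDir);
    }

    /** Resolves one path against curDir unless it is already absolute. */
    static String resolveOne(String path, String curDir) {
        String p = path.trim();
        // Assumption: a leading "/" or an explicit scheme means "absolute".
        if (p.startsWith("/") || p.contains("://")) {
            return p;
        }
        String base = curDir.endsWith("/") ? curDir : curDir + "/";
        return base + p;
    }

    /** A loader that treats the location as a comma-separated list. */
    static final LocationResolver COMMA_LIST = (location, curDir) -> {
        List<String> resolved = new ArrayList<>();
        for (String part : location.split(",")) {
            resolved.add(resolveOne(part, curDir));
        }
        return String.join(",", resolved);
    };

    /** A loader that implements the hook as a no-op (string passed as-is). */
    static final LocationResolver AS_IS = (location, curDir) -> location;

    /**
     * Phase One: a global switch (options 1 and 3) skips conversion entirely.
     * Phase Two: otherwise the loader decides how to interpret the string.
     */
    static String resolve(String location, String curDir,
                          boolean pathsAreAbsolute, LocationResolver loader) {
        return pathsAreAbsolute
                ? location
                : loader.relativeToAbsolutePath(location, curDir);
    }

    public static void main(String[] args) {
        String curDir = "hdfs://namenode:8020/user/bla"; // hypothetical address
        System.out.println(resolve("1,2", curDir, false, COMMA_LIST));
        System.out.println(resolve("1,2", curDir, false, AS_IS));
        System.out.println(resolve("1,2", curDir, true, COMMA_LIST));
    }
}
```

The point of the sketch is that with the switch on, or with a no-op loader, 
the location string reaches the loader untouched, while a loader that 
understands comma-separated lists can still resolve each entry itself.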

> Pig should provide a way for input location string in load statement to be 
> passed as-is to the Loader
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-879
>                 URL: https://issues.apache.org/jira/browse/PIG-879
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>
>  Due to multiquery optimization, Pig always converts the filenames to 
> absolute URIs (see 
> http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section 
> about Incompatible Changes - Path Names and Schemes). This is necessary since 
> the script may have "cd .." statements between load or store statements and 
> if the load statements have relative paths, we would need to convert to 
> absolute paths to know where to load/store from. To do this 
> QueryParser.massageFilename() has the code below[1] which basically gives the 
> fully qualified hdfs path
>  
> However the issue with this approach is that if the filename string is 
> something like 
> "hdfs://localhost.localdomain:39125/user/bla/1,hdfs://localhost.localdomain:39125/user/bla/2",
>  the code below[1] actually translates this to 
> hdfs://localhost.localdomain:38264/user/bla/1,hdfs://localhost.localdomain:38264/user/bla/2
>  and throws an exception that it is an incorrect path.
>  
> Some loaders may want to interpret the filenames (the input location string 
> in the load statement) in any way they wish and may want Pig to not make 
> absolute paths out of them.
>  
> There are a few options to address this:
> 1)    A command line switch to indicate to Pig that pathnames in the script 
> are all absolute and hence Pig should not alter them and pass them as-is to 
> Loaders and Storers. 
> 2)    A keyword in the load and store statements to indicate the same intent 
> to pig
> 3)    A property which users can supply on cmdline or in pig.properties to 
> indicate the same intent.
> 4)    A method in LoadFunc - relativeToAbsolutePath(String filename, String 
> curDir) which does the conversion to absolute - this way a Loader can choose 
> to implement it as a no-op.
> Thoughts?
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
