[ https://issues.apache.org/jira/browse/PIG-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809237#comment-16809237 ]
Xuzhou Yin commented on PIG-5360:
---------------------------------

This issue has been open for a while. It would be greatly appreciated if anyone could take a look. Thanks a lot!

> Pig sets working directory of input file systems causes exception thrown
> ------------------------------------------------------------------------
>
>                 Key: PIG-5360
>                 URL: https://issues.apache.org/jira/browse/PIG-5360
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.17.0
>            Reporter: Xuzhou Yin
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.18.0
>
>         Attachments: PIG-5360.diff
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> In the getSplits() method of PigInputFormat, Pig sets the working directory
> of the input file system to jobContext.getWorkingDirectory(), which is
> always the default working directory of the default file system (e.g.
> hdfs://host:port/user/userId in the case of HDFS) unless
> "mapreduce.job.working.dir" is explicitly set to a non-default value. So if
> the input path uses a non-default file system, the job fails, because Pig is
> trying to set the working directory of the non-default file system to an
> HDFS path.
>
> The proposed change is to remove this working-directory logic completely.
> There are several reasons for doing so.
>
> Firstly, getSplits() is only supposed to return a list of input splits. It
> should not have side effects, especially ones that can change the output
> path. Having an InputFormat alter the behavior of the OutputFormat does not
> make much sense here.
>
> Secondly, the working directories of the input and output file systems are
> inconsistent. If "mapreduce.job.working.dir" is set to a non-default value,
> it affects only the output path (if it is a relative path), because the
> input path has already been made qualified before this logic runs.
>
> Thirdly, there is already a "CD" functionality that allows customers to
> change the working directory. However, this logic overwrites the effect of
> "CD" if the input and output paths both use the default file system.
>
> Lastly, if a customer has a sequence of jobs, changing the working directory
> may change the input paths of downstream jobs if those paths are specified
> as relative.
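
The failure mode described in the issue can be illustrated with a small self-contained Java sketch. It does not use Pig or Hadoop classes. ToyFileSystem, its scheme and authority check, the otherfs scheme, and the host names namenode:8020 and store are all hypothetical stand-ins, assumed here only to show why handing an HDFS working directory to a file system on a different scheme would be rejected, while the same call goes through when the input sits on the default file system.

```java
import java.net.URI;
import java.util.Objects;

/** Toy model of the problem described in PIG-5360; not Pig or Hadoop code. */
public class WorkingDirMismatch {

    /** Hypothetical file system that only accepts paths on its own scheme and authority. */
    static final class ToyFileSystem {
        private final URI root;
        private URI workingDir;

        ToyFileSystem(String rootUri) {
            this.root = URI.create(rootUri);
            this.workingDir = this.root;
        }

        /** Rejects a working directory that belongs to a different file system. */
        void setWorkingDirectory(URI dir) {
            boolean sameFs = Objects.equals(root.getScheme(), dir.getScheme())
                    && Objects.equals(root.getAuthority(), dir.getAuthority());
            if (!sameFs) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + dir + ", expected: " + root);
            }
            this.workingDir = dir;
        }

        URI getWorkingDirectory() {
            return workingDir;
        }
    }

    /** Mimics the side effect in question: push the job's working dir onto the input FS. */
    static boolean trySetWorkingDirectory(ToyFileSystem inputFs, URI jobWorkingDir) {
        try {
            inputFs.setWorkingDirectory(jobWorkingDir);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default: the job working directory lives on the default (HDFS) file system.
        URI jobWorkingDir = URI.create("hdfs://namenode:8020/user/userId");

        ToyFileSystem defaultFs = new ToyFileSystem("hdfs://namenode:8020/");
        ToyFileSystem otherFs = new ToyFileSystem("otherfs://store/");

        System.out.println("input on default FS accepted:     "
                + trySetWorkingDirectory(defaultFs, jobWorkingDir));
        System.out.println("input on non-default FS accepted: "
                + trySetWorkingDirectory(otherFs, jobWorkingDir));
    }
}
```

In this model the first call is accepted and the second is rejected, which matches the behavior the report describes. Dropping the call altogether, as the attached patch proposes, sidesteps the mismatch.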