[ https://issues.apache.org/jira/browse/PIG-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan reassigned PIG-1216:
-------------------------------------

    Assignee: Ashutosh Chauhan

> New load store design does not allow Pig to validate inputs and outputs up front
> --------------------------------------------------------------------------------
>
>                 Key: PIG-1216
>                 URL: https://issues.apache.org/jira/browse/PIG-1216
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Alan Gates
>            Assignee: Ashutosh Chauhan
>
> In Pig 0.6 and before, Pig attempts to verify existence of inputs and
> non-existence of outputs during parsing to avoid run time failures when
> inputs don't exist or outputs can't be overwritten. The downside to this was
> that Pig assumed all inputs and outputs were HDFS files, which made
> implementation harder for non-HDFS-based load and store functions. In the
> load store redesign (PIG-966) this was delegated to InputFormats and
> OutputFormats to avoid this problem and to make use of the checks already
> being done in those implementations. Unfortunately, for Pig Latin scripts
> that run more than one MR job, this does not work well. MR does not do
> input/output verification on all the jobs at once. It does them one at a
> time. So if a Pig Latin script results in 10 MR jobs and the file to store
> to at the end already exists, the first 9 jobs will be run before the 10th
> job discovers that the whole thing was doomed from the beginning.
> To avoid this, a validate call needs to be added to the new LoadFunc and
> StoreFunc interfaces. Pig needs to pass this method enough information that
> the load function implementer can delegate to InputFormat.getSplits() and the
> store function implementer to OutputFormat.checkOutputSpecs() if s/he decides
> to. Since 90% of all load and store functions use HDFS and PigStorage will
> also need to, the Pig team should implement a default file existence check on
> HDFS and make it available as a static method to other Load/Store function
> implementers.
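
A minimal sketch of the proposal above, to make the intended behavior concrete. Everything here is an assumption for illustration: the Validatable interface, the UpFrontValidation class, and the method names are hypothetical and are not the Pig LoadFunc/StoreFunc API, and java.nio.file stands in for HDFS so the example runs without Hadoop. It shows a shared static existence check plus a pass that validates every input and output before any job would be launched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: none of these names come from Pig or Hadoop.
final class UpFrontValidation {

    /** Hypothetical validate hook that a load or store function would implement. */
    public interface Validatable {
        void validate() throws IOException;
    }

    /** Default check for loads: the input location must already exist. */
    public static void checkInputExists(Path location) throws IOException {
        if (!Files.exists(location)) {
            throw new IOException("Input does not exist: " + location);
        }
    }

    /** Default check for stores: the output location must not exist yet. */
    public static void checkOutputAbsent(Path location) throws IOException {
        if (Files.exists(location)) {
            throw new IOException("Output already exists: " + location);
        }
    }

    /**
     * Validates every endpoint of the whole script before any job starts and
     * collects all failures, instead of discovering them one job at a time.
     */
    public static List<String> validateAll(List<Validatable> endpoints) {
        List<String> errors = new ArrayList<>();
        for (Validatable endpoint : endpoints) {
            try {
                endpoint.validate();
            } catch (IOException e) {
                errors.add(e.getMessage());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // The temp directory stands in for a location that already exists.
        Path existing = Paths.get(System.getProperty("java.io.tmpdir"));
        Path missing = existing.resolve("pig-1216-no-such-input");
        List<Validatable> endpoints = Arrays.<Validatable>asList(
            () -> checkInputExists(missing),    // load from a missing path
            () -> checkOutputAbsent(existing)); // store to an existing path
        for (String error : validateAll(endpoints)) {
            System.out.println(error);
        }
    }
}
```

In a real implementation the two static checks would go through the Hadoop FileSystem API, and a load or store function not backed by HDFS would supply its own validate() body, for example by delegating to InputFormat.getSplits() or OutputFormat.checkOutputSpecs() as the description suggests.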
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.