[ https://issues.apache.org/jira/browse/PIG-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich reassigned PIG-978:
----------------------------------

    Assignee: Corinne Chandel  (was: Richard Ding)

> ERROR 2100 (hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist)
> and ERROR 2999: (Unexpected internal error. null) when using Multi-Query
> optimization
> ------------------------------------------------------------------------------
>
>                 Key: PIG-978
>                 URL: https://issues.apache.org/jira/browse/PIG-978
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Corinne Chandel
>             Fix For: 0.6.0
>
>
> I have a Pig script of the following form, which I execute using multi-query
> optimization.
> {code}
> A = load '/user/viraj/firstinput' using PigStorage();
> B = group ....
> C = ..aggregation function
> store C into '/user/viraj/firstinputtempresult/days1';
> ..
> Atab = load '/user/viraj/secondinput' using PigStorage();
> Btab = group ....
> Ctab = ..aggregation function
> store Ctab into '/user/viraj/secondinputtempresult/days1';
> ..
> E = load '/user/viraj/firstinputtempresult/' using PigStorage();
> F = group
> G = aggregation function
> store G into '/user/viraj/finalresult1';
> Etab = load '/user/viraj/secondinputtempresult/' using PigStorage();
> Ftab = group
> Gtab = aggregation function
> store Gtab into '/user/viraj/finalresult2';
> {code}
> 2009-07-20 22:05:44,507 [main] ERROR org.apache.pig.tools.grunt.GruntParser -
> ERROR 2100: hdfs://localhost/tmp/temp175740929/tmp-1126214010 does not exist.
> Details at logfile: /homes/viraj/pigscripts/pig_1248127173601.log
> This error is due to a mismatch between the store and load commands.
> The script first stores files into the 'days1' directory (store C into
> '/user/viraj/firstinputtempresult/days1' using PigStorage();), but it later
> loads from the top-level directory (E = load
> '/user/viraj/firstinputtempresult/' using PigStorage()) instead of the
> original directory (/user/viraj/firstinputtempresult/days1).
> The current multi-query optimizer cannot resolve the dependency between these
> two commands because their file paths differ. The jobs therefore run
> concurrently, which results in the errors.
> The workaround is to add an 'exec' or 'run' command after the first two stores.
> This forces the first two store commands to run before the remaining commands.
> It would be nice to see this fixed as part of an enhancement to multi-query.
> We should either disable multi-query in this case or emit a warning/error
> message, so that the user can correct the load/store statements.
> Viraj
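>
> For illustration, the workaround described above could look roughly like the
> sketch below. It reuses the script from the report with its elisions left
> as-is. The exact placement and the bare 'exec;' form are assumptions based
> on the description, not a verified fix.
> {code}
> A = load '/user/viraj/firstinput' using PigStorage();
> B = group ....
> C = ..aggregation function
> store C into '/user/viraj/firstinputtempresult/days1';
> ..
> Atab = load '/user/viraj/secondinput' using PigStorage();
> Btab = group ....
> Ctab = ..aggregation function
> store Ctab into '/user/viraj/secondinputtempresult/days1';
> ..
> -- Assumed workaround: force the two stores above to finish before the
> -- loads below read from their parent directories.
> exec;
> E = load '/user/viraj/firstinputtempresult/' using PigStorage();
> F = group
> G = aggregation function
> store G into '/user/viraj/finalresult1';
> Etab = load '/user/viraj/secondinputtempresult/' using PigStorage();
> Ftab = group
> Gtab = aggregation function
> store Gtab into '/user/viraj/finalresult2';
> {code}
> With the 'exec' in place, the first batch runs to completion on its own, so
> the later loads no longer race with the stores that produce their input.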