Aha, thanks Gabor. I'll just wait until Twadoop is building and let it fill in.
On Thu, Nov 15, 2012 at 9:53 AM, Gabor Szabo <ga...@twitter.com> wrote:

> Kurt, there are a lot of dependencies that this job expects in different
> places, and if you run it as a normal user, it will get confused. Most
> likely some input files are not found.
>
>
> On 11/15/2012 09:36 AM, Julien Le Dem wrote:
>
>> Hi Kurt,
>> From the stack trace, it looks like it runs into an error while
>> estimating the size of the input.
>> Are all of the paths it's looking for present in hdfs:///user/kurt?
>> Does it work with pig_11? Add --pig_version pig_11 to the oink command.
>> Also send out the command line you are using.
>> Thanks,
>> Julien
>>
>> On Thu, Nov 15, 2012 at 9:15 AM, Kurt Smith <k...@twitter.com> wrote:
>>
>>> I'm getting this error when doing a manual run of the search_simplified
>>> twadoop query. This query has run fine before. Any idea what the issue is?
>>>
>>> Pig Stack Trace
>>> ---------------
>>> ERROR 2017: Internal error creating job configuration.
>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:738)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:264)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:150)
>>>         at org.apache.pig.PigServer.launchPlan(PigServer.java:1267)
>>>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1252)
>>>         at org.apache.pig.PigServer.execute(PigServer.java:1242)
>>>         at org.apache.pig.PigServer.executeBatch(PigServer.java:356)
>>>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>>>         at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:452)
>>>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:752)
>>>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:423)
>>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>>>         at org.apache.pig.Main.run(Main.java:561)
>>>         at org.apache.pig.Main.main(Main.java:111)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>> Caused by: java.lang.NullPointerException
>>>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:971)
>>>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:944)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getInputSize(JobControlCompiler.java:840)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:810)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:750)
>>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:396)
>>>         ... 20 more
>>> ================================================================================
>>>
>>> from the output:
>>> ---
>>>
>>> 2012-11-15 07:54:02,554 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>> 2012-11-15 07:54:02,555 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>> 2012-11-15 07:54:03,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1610612736 maxReducers=999 totalInputFileSize=1456250098769
>>> 2012-11-15 07:54:03,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 15
>>> 2012-11-15 07:54:04,655 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job917013994443138523.jar
>>> 2012-11-15 07:54:07,963 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job917013994443138523.jar created
>>> 2012-11-15 07:54:07,972 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
>>> 2012-11-15 07:54:08,199 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>> 2012-11-15 07:54:08,199 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>> 2012-11-15 07:54:08,404 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
>>> Details at logfile: /var/log/pig/pig_1352950582587.log
>>>
>>> 2012-11-15 07:54:08,706 [Thread-5] INFO  org.apache.hcatalog.common.HiveClientCache - Cleaning up hive client cache in ShutDown hook
>>>
>>> [2012-11-15 07:54:10] Pig job failed: return code 6 from running
>>> /usr/bin/pig_9 -t ColumnMapKeyPrune -F -param TWADOOP_HOME=/home/kurt/twadoop
>>> -p END_DATE="20121102" -p END_TIME_UNIX="1351814400"
>>> -p LATEST_PATH_FILE="/user/kurt/processed/search/search_simplified/daily/_latest"
>>> -p START_TIME="2012-11-01-00:00:00" -p PREVIOUS_ONE_WEEK="2012/10/25"
>>> -p REGISTER_HCAT="'register /usr/lib/hcatalog/share/hcatalog/hcatalog-{core,pig-adapter}-*-*.jar'"
>>> -p START_DATE="20121101" -p PREVIOUS_ONE_MONTH="2012/10/01" -p BATCH_ID="0"
>>> -p SCHEDULER_POOL="'set mapred.fairscheduler.pool search'"
>>> -p PREVIOUS_FOUR_WEEKS_AGO="2012/10/05" -p PREVIOUS_ONE_DAY="2012/10/31"
>>> -p REGISTER_DAL="'register /usr/lib/dal/dal.jar;'"
>>> -p REGISTER_HIVE="'register /usr/lib/hive/lib/{hive-exec-*-*,hive-metastore-*-*,libfb303-*}.jar'"
>>> -p INPROCESS_DIR="/user/kurt/in_process/processed/search/search_simplified/daily/2012/11/01"
>>> -p JOB_NAME="search_simplified:daily_2012/11/01_to_2012/11/02"
>>> -p BATCH_DESC="'oink search_simplified:daily'" -p END_TIME_DAY_OF_WEEK="4"
>>> -p SET_PARALLEL="'set default_parallel 15'"
>>> -p ALL_DATES_4_WEEKS_AGO_TO_TODAY="2012/10/05,2012/10/06,2012/10/07,2012/10/08,2012/10/09,2012/10/10,2012/10/11,2012/10/12,2012/10/13,2012/10/14,2012/10/15,2012/10/16,2012/10/17,2012/10/18,2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>>> -p ALL_DATES_2_WEEKS_AGO_TO_TODAY="2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>>> -p RAND="816544"
>>> -p ALL_DATES_8_WEEKS_TO_4_WEEKS_AGO="2012/09/07,2012/09/08,2012/09/09,2012/09/10,2012/09/11,2012/09/12,2012/09/13,2012/09/14,2012/09/15,2012/09/16,2012/09/17,2012/09/18,2012/09/19,2012/09/20,2012/09/21,2012/09/22,2012/09/23,2012/09/24,2012/09/25,2012/09/26,2012/09/27,2012/09/28,2012/09/29,2012/09/30,2012/10/01,2012/10/02,2012/10/03,2012/10/04"
>>> -p PREVIOUS_TWO_WEEKS_AGO="2012/10/19"
>>> -p OUTPUT_DIR="/user/kurt/processed/search/search_simplified/daily/2012/11/01"
>>> -p OUTPUT_DIR_PARENT="/user/kurt/processed/search/search_simplified/daily/2012/11"
>>> -p OUTPUT_BASE="dal://smf1-dw-hcat.kurt.search_search_simplified_daily"
>>> -p START_TIME_DAY_OF_WEEK="3" -p END_TIME="2012-11-02-00:00:00"
>>> -p BATCH_STEP="86400" -p END_DATE_MINUS_ONE_WITH_SLASHES="2012/11/01"
>>> -p START_TIME_UNIX="1351728000" -p START_HOUR="00" -p END_HOUR="00"
>>> -p DEBUG="off" -p PART="'part_dt=20121101T000000Z'"
>>> /tmp/oink20121115-33234-42y95k-0.pig at Thu Nov 15 07:54:10 +0000 2012.
>>> Exiting...
>>>
>>> [2012-11-15 07:54:10] Oink failed because of unhandled exeception:
>>> Pig job failed: return code 6 from running
>>> /usr/bin/pig_9 -t ColumnMapKeyPrune -F -param TWADOOP_HOME=/home/kurt/twadoop
>>> -p END_DATE="20121102" -p END_TIME_UNIX="1351814400"
>>> -p LATEST_PATH_FILE="/user/kurt/processed/search/search_simplified/daily/_latest"
>>> -p START_TIME="2012-11-01-00:00:00" -p PREVIOUS_ONE_WEEK="2012/10/25"
>>> -p REGISTER_HCAT="'register /usr/lib/hcatalog/share/hcatalog/hcatalog-{core,pig-adapter}-*-*.jar'"
>>> -p START_DATE="20121101" -p PREVIOUS_ONE_MONTH="2012/10/01" -p BATCH_ID="0"
>>> -p SCHEDULER_POOL="'set mapred.fairscheduler.pool search'"
>>> -p PREVIOUS_FOUR_WEEKS_AGO="2012/10/05" -p PREVIOUS_ONE_DAY="2012/10/31"
>>> -p REGISTER_DAL="'register /usr/lib/dal/dal.jar;'"
>>> -p REGISTER_HIVE="'register /usr/lib/hive/lib/{hive-exec-*-*,hive-metastore-*-*,libfb303-*}.jar'"
>>> -p INPROCESS_DIR="/user/kurt/in_process/processed/search/search_simplified/daily/2012/11/01"
>>> -p JOB_NAME="search_simplified:daily_2012/11/01_to_2012/11/02"
>>> -p BATCH_DESC="'oink search_simplified:daily'" -p END_TIME_DAY_OF_WEEK="4"
>>> -p SET_PARALLEL="'set default_parallel 15'"
>>> -p ALL_DATES_4_WEEKS_AGO_TO_TODAY="2012/10/05,2012/10/06,2012/10/07,2012/10/08,2012/10/09,2012/10/10,2012/10/11,2012/10/12,2012/10/13,2012/10/14,2012/10/15,2012/10/16,2012/10/17,2012/10/18,2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>>> -p ALL_DATES_2_WEEKS_AGO_TO_TODAY="2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>>> -p RAND="816544"
>>> -p ALL_DATES_8_WEEKS_TO_4_WEEKS_AGO="2012/09/07,2012/09/08,2012/09/09,2012/09/10,2012/09/11,2012/09/12,2012/09/13,2012/09/14,2012/09/15,2012/09/16,2012/09/17,2012/09/18,2012/09/19,2012/09/20,2012/09/21,2012/09/22,2012/09/23,2012/09/24,2012/09/25,2012/09/26,2012/09/27,2012/09/28,2012/09/29,2012/09/30,2012/10/01,2012/10/02,2012/10/03,2012/10/04"
>>> -p PREVIOUS_TWO_WEEKS_AGO="2012/10/19"
>>> -p OUTPUT_DIR="/user/kurt/processed/search/search_simplified/daily/2012/11/01"
>>> -p OUTPUT_DIR_PARENT="/user/kurt/processed/search/search_simplified/daily/2012/11"
>>> -p OUTPUT_BASE="dal://smf1-dw-hcat.kurt.search_search_simplified_daily"
>>> -p START_TIME_DAY_OF_WEEK="3" -p END_TIME="2012-11-02-00:00:00"
>>> -p BATCH_STEP="86400" -p END_DATE_MINUS_ONE_WITH_SLASHES="2012/11/01"
>>> -p START_TIME_UNIX="1351728000" -p START_HOUR="00" -p END_HOUR="00"
>>> -p DEBUG="off" -p PART="'part_dt=20121101T000000Z'"
>>> /tmp/oink20121115-33234-42y95k-0.pig at Thu Nov 15 07:54:10 +0000 2012.
>>> Exiting...
>>> /home/kurt/twadoop/oink/lib/runner.rb:327:in `fail'
>>> /home/kurt/twadoop/oink/lib/runner.rb:172:in `run'
>>> /home/kurt/twadoop/oink/oink.rb:25:in `go'
>>> /home/kurt/twadoop/oink/oink.rb:46
>>>
>>> --
>>> Kurt Smith
>>> Senior Data Scientist, Analytics | Twitter, Inc
>>> @kurtosis0
>>>

--
Kurt Smith
Senior Data Scientist, Analytics | Twitter, Inc
@kurtosis0
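For anyone retracing this failure, here is a minimal shell sketch of the checks Julien suggests above. It only uses paths that appear as -p parameters in the failing command; whether those are actually among the inputs the search_simplified script reads is an assumption on my part, since the real input globs are resolved inside the script.

    # Verify the parameterized /user/kurt paths from the command line exist in HDFS
    # (a missing path is consistent with the NullPointerException in FileSystem.globStatus
    # while Pig estimates the input size).
    hadoop fs -test -e /user/kurt/processed/search/search_simplified/daily/_latest \
      && echo "_latest marker present" || echo "_latest marker missing"
    hadoop fs -ls /user/kurt/in_process/processed/search/search_simplified/daily/2012/11/01
    hadoop fs -ls /user/kurt/processed/search/search_simplified/daily/2012/11

    # Then retry against Pig 0.11 by adding Julien's flag to the usual oink invocation
    # (shown schematically; all other arguments stay as in the command above):
    #   <oink command> --pig_version pig_11 ...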