Hi Kurt,

From the stack trace, it looks like Pig runs into an error while estimating the size of the input. Are all of the paths it's looking for present in hdfs:///user/kurt? Does it work with pig_11? Add --pig_version pig_11 to the oink command. Also, please send the command line you are using.

Thanks,
Julien
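The two checks suggested above can be sketched as shell commands run from a gateway host. The HDFS path below is taken from the LATEST_PATH_FILE parameter later in this thread; the oink invocation in the trailing comment is a placeholder, since the actual command line hasn't been sent yet.

```shell
#!/bin/sh
# Sketch of the suggested checks. INPUT_DIR comes from the job parameters
# quoted in this thread; the oink arguments are placeholders, not the
# actual failing command line.

INPUT_DIR="hdfs:///user/kurt/processed/search/search_simplified/daily"

if command -v hadoop >/dev/null 2>&1; then
    # 1. Confirm the input paths the script globs actually exist:
    hadoop fs -ls "$INPUT_DIR"
else
    echo "hadoop client not found on PATH; run this on a gateway host"
fi

# 2. Retry the job under Pig 0.11 by adding the flag mentioned above:
#      oink --pig_version pig_11 <same arguments as the failing run>
```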
On Thu, Nov 15, 2012 at 9:15 AM, Kurt Smith <k...@twitter.com> wrote:
> I'm getting this error when doing a manual run of the search_simplified
> twadoop query. This query has run fine before. Any idea what the issue is?
>
> Pig Stack Trace
> ---------------
> ERROR 2017: Internal error creating job configuration.
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:738)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:264)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:150)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1267)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1252)
>     at org.apache.pig.PigServer.execute(PigServer.java:1242)
>     at org.apache.pig.PigServer.executeBatch(PigServer.java:356)
>     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>     at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:452)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:752)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:423)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>     at org.apache.pig.Main.run(Main.java:561)
>     at org.apache.pig.Main.main(Main.java:111)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:971)
>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:944)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getInputSize(JobControlCompiler.java:840)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:810)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:750)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:396)
>     ... 20 more
> ================================================================================
>
>
> from the output:
> ---
>
> 2012-11-15 07:54:02,554 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
> 2012-11-15 07:54:02,555 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-11-15 07:54:03,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1610612736 maxReducers=999 totalInputFileSize=1456250098769
> 2012-11-15 07:54:03,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 15
> 2012-11-15 07:54:04,655 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job917013994443138523.jar
> 2012-11-15 07:54:07,963 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job917013994443138523.jar created
> 2012-11-15 07:54:07,972 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
> 2012-11-15 07:54:08,199 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
> 2012-11-15 07:54:08,199 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-11-15 07:54:08,404 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
> Details at logfile: /var/log/pig/pig_1352950582587.log
> 2012-11-15 07:54:08,706 [Thread-5] INFO  org.apache.hcatalog.common.HiveClientCache - Cleaning up hive client cache in ShutDown hook
> [2012-11-15 07:54:10] Pig job failed: return code 6 from running /usr/bin/pig_9 -t ColumnMapKeyPrune -F
>     -param TWADOOP_HOME=/home/kurt/twadoop
>     -p END_DATE="20121102"
>     -p END_TIME_UNIX="1351814400"
>     -p LATEST_PATH_FILE="/user/kurt/processed/search/search_simplified/daily/_latest"
>     -p START_TIME="2012-11-01-00:00:00"
>     -p PREVIOUS_ONE_WEEK="2012/10/25"
>     -p REGISTER_HCAT="'register /usr/lib/hcatalog/share/hcatalog/hcatalog-{core,pig-adapter}-*-*.jar'"
>     -p START_DATE="20121101"
>     -p PREVIOUS_ONE_MONTH="2012/10/01"
>     -p BATCH_ID="0"
>     -p SCHEDULER_POOL="'set mapred.fairscheduler.pool search'"
>     -p PREVIOUS_FOUR_WEEKS_AGO="2012/10/05"
>     -p PREVIOUS_ONE_DAY="2012/10/31"
>     -p REGISTER_DAL="'register /usr/lib/dal/dal.jar;'"
>     -p REGISTER_HIVE="'register /usr/lib/hive/lib/{hive-exec-*-*,hive-metastore-*-*,libfb303-*}.jar'"
>     -p INPROCESS_DIR="/user/kurt/in_process/processed/search/search_simplified/daily/2012/11/01"
>     -p JOB_NAME="search_simplified:daily_2012/11/01_to_2012/11/02"
>     -p BATCH_DESC="'oink search_simplified:daily'"
>     -p END_TIME_DAY_OF_WEEK="4"
>     -p SET_PARALLEL="'set default_parallel 15'"
>     -p ALL_DATES_4_WEEKS_AGO_TO_TODAY="2012/10/05,2012/10/06,2012/10/07,2012/10/08,2012/10/09,2012/10/10,2012/10/11,2012/10/12,2012/10/13,2012/10/14,2012/10/15,2012/10/16,2012/10/17,2012/10/18,2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>     -p ALL_DATES_2_WEEKS_AGO_TO_TODAY="2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>     -p RAND="816544"
>     -p ALL_DATES_8_WEEKS_TO_4_WEEKS_AGO="2012/09/07,2012/09/08,2012/09/09,2012/09/10,2012/09/11,2012/09/12,2012/09/13,2012/09/14,2012/09/15,2012/09/16,2012/09/17,2012/09/18,2012/09/19,2012/09/20,2012/09/21,2012/09/22,2012/09/23,2012/09/24,2012/09/25,2012/09/26,2012/09/27,2012/09/28,2012/09/29,2012/09/30,2012/10/01,2012/10/02,2012/10/03,2012/10/04"
>     -p PREVIOUS_TWO_WEEKS_AGO="2012/10/19"
>     -p OUTPUT_DIR="/user/kurt/processed/search/search_simplified/daily/2012/11/01"
>     -p OUTPUT_DIR_PARENT="/user/kurt/processed/search/search_simplified/daily/2012/11"
>     -p OUTPUT_BASE="dal://smf1-dw-hcat.kurt.search_search_simplified_daily"
>     -p START_TIME_DAY_OF_WEEK="3"
>     -p END_TIME="2012-11-02-00:00:00"
>     -p BATCH_STEP="86400"
>     -p END_DATE_MINUS_ONE_WITH_SLASHES="2012/11/01"
>     -p START_TIME_UNIX="1351728000"
>     -p START_HOUR="00"
>     -p END_HOUR="00"
>     -p DEBUG="off"
>     -p PART="'part_dt=20121101T000000Z'"
>     /tmp/oink20121115-33234-42y95k-0.pig at Thu Nov 15 07:54:10 +0000 2012. Exiting...
> [2012-11-15 07:54:10] Oink failed because of unhandled exeception:
> Pig job failed: return code 6 from running /usr/bin/pig_9 -t ColumnMapKeyPrune -F
>     -param TWADOOP_HOME=/home/kurt/twadoop
>     -p END_DATE="20121102"
>     -p END_TIME_UNIX="1351814400"
>     -p LATEST_PATH_FILE="/user/kurt/processed/search/search_simplified/daily/_latest"
>     -p START_TIME="2012-11-01-00:00:00"
>     -p PREVIOUS_ONE_WEEK="2012/10/25"
>     -p REGISTER_HCAT="'register /usr/lib/hcatalog/share/hcatalog/hcatalog-{core,pig-adapter}-*-*.jar'"
>     -p START_DATE="20121101"
>     -p PREVIOUS_ONE_MONTH="2012/10/01"
>     -p BATCH_ID="0"
>     -p SCHEDULER_POOL="'set mapred.fairscheduler.pool search'"
>     -p PREVIOUS_FOUR_WEEKS_AGO="2012/10/05"
>     -p PREVIOUS_ONE_DAY="2012/10/31"
>     -p REGISTER_DAL="'register /usr/lib/dal/dal.jar;'"
>     -p REGISTER_HIVE="'register /usr/lib/hive/lib/{hive-exec-*-*,hive-metastore-*-*,libfb303-*}.jar'"
>     -p INPROCESS_DIR="/user/kurt/in_process/processed/search/search_simplified/daily/2012/11/01"
>     -p JOB_NAME="search_simplified:daily_2012/11/01_to_2012/11/02"
>     -p BATCH_DESC="'oink search_simplified:daily'"
>     -p END_TIME_DAY_OF_WEEK="4"
>     -p SET_PARALLEL="'set default_parallel 15'"
>     -p ALL_DATES_4_WEEKS_AGO_TO_TODAY="2012/10/05,2012/10/06,2012/10/07,2012/10/08,2012/10/09,2012/10/10,2012/10/11,2012/10/12,2012/10/13,2012/10/14,2012/10/15,2012/10/16,2012/10/17,2012/10/18,2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>     -p ALL_DATES_2_WEEKS_AGO_TO_TODAY="2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>     -p RAND="816544"
>     -p ALL_DATES_8_WEEKS_TO_4_WEEKS_AGO="2012/09/07,2012/09/08,2012/09/09,2012/09/10,2012/09/11,2012/09/12,2012/09/13,2012/09/14,2012/09/15,2012/09/16,2012/09/17,2012/09/18,2012/09/19,2012/09/20,2012/09/21,2012/09/22,2012/09/23,2012/09/24,2012/09/25,2012/09/26,2012/09/27,2012/09/28,2012/09/29,2012/09/30,2012/10/01,2012/10/02,2012/10/03,2012/10/04"
>     -p PREVIOUS_TWO_WEEKS_AGO="2012/10/19"
>     -p OUTPUT_DIR="/user/kurt/processed/search/search_simplified/daily/2012/11/01"
>     -p OUTPUT_DIR_PARENT="/user/kurt/processed/search/search_simplified/daily/2012/11"
>     -p OUTPUT_BASE="dal://smf1-dw-hcat.kurt.search_search_simplified_daily"
>     -p START_TIME_DAY_OF_WEEK="3"
>     -p END_TIME="2012-11-02-00:00:00"
>     -p BATCH_STEP="86400"
>     -p END_DATE_MINUS_ONE_WITH_SLASHES="2012/11/01"
>     -p START_TIME_UNIX="1351728000"
>     -p START_HOUR="00"
>     -p END_HOUR="00"
>     -p DEBUG="off"
>     -p PART="'part_dt=20121101T000000Z'"
>     /tmp/oink20121115-33234-42y95k-0.pig at Thu Nov 15 07:54:10 +0000 2012. Exiting...
> /home/kurt/twadoop/oink/lib/runner.rb:327:in `fail'
> /home/kurt/twadoop/oink/lib/runner.rb:172:in `run'
> /home/kurt/twadoop/oink/oink.rb:25:in `go'
> /home/kurt/twadoop/oink/oink.rb:46
>
>
> --
> Kurt Smith
> Senior Data Scientist, Analytics | Twitter, Inc
> @kurtosis0
>