Kurt, there are a lot of dependencies that this job expects in different places, and if you run it as a normal user, it will get confused. Most likely some input files are not found.
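For example (a minimal sketch, not taken from the job; the class name and the relative path are hypothetical), a relative HDFS path resolves against the working directory of whoever runs the job, which defaults to that user's home directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Prints where a relative load path actually points for the current
    // user. "processed/search" is a hypothetical path, not one from the job.
    public class WhereAmI {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            System.out.println(fs.getHomeDirectory());       // e.g. /user/kurt
            System.out.println(fs.makeQualified(new Path("processed/search")));
        }
    }

So the same script can look under a different /user/<name> tree depending on who runs it, which is one way a manual run as yourself ends up not finding files the scheduled run sees.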

On 11/15/2012 09:36 AM, Julien Le Dem wrote:
Hi Kurt,
From the stack trace, it looks like it runs into an error while
estimating the size of the input.
Are all of the paths it's looking for present in hdfs:///user/kurt?
Does it work with pig_11? Add --pig_version pig_11 to the oink command.
Also, please send the command line you are using.
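If it helps, here is a rough sketch of that path check against the stock Hadoop FileSystem API (the class name and the way you invoke it are illustrative, not something the job ships with):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Probes each argument the way Pig sizes its inputs in the trace
    // below: globStatus returns null when a non-glob path does not
    // exist, so a missing input shows up here instead of deep inside
    // job setup.
    public class GlobProbe {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            for (String arg : args) {
                FileStatus[] matches = fs.globStatus(new Path(arg));
                System.out.println(arg + " -> " + (matches == null
                        ? "MISSING" : matches.length + " match(es)"));
            }
        }
    }

Point it at each path the script loads from under hdfs:///user/kurt and see which ones come back MISSING.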
Thanks,
Julien

On Thu, Nov 15, 2012 at 9:15 AM, Kurt Smith <k...@twitter.com> wrote:
I'm getting this error when doing a manual run of the search_simplified
twadoop query. This query has run fine before. Any idea what the issue is?

Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:738)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:264)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:150)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1267)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1252)
        at org.apache.pig.PigServer.execute(PigServer.java:1242)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:356)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
        at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:452)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:752)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:423)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:561)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:971)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:944)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getInputSize(JobControlCompiler.java:840)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:810)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:750)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:396)
        ... 20 more
================================================================================


From the output:
---

2012-11-15 07:54:02,554 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-11-15 07:54:02,555 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-11-15 07:54:03,005 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1610612736 maxReducers=999 totalInputFileSize=1456250098769
2012-11-15 07:54:03,005 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 15
2012-11-15 07:54:04,655 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job917013994443138523.jar
2012-11-15 07:54:07,963 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job917013994443138523.jar created
2012-11-15 07:54:07,972 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2012-11-15 07:54:08,199 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-11-15 07:54:08,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-11-15 07:54:08,404 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Details at logfile: /var/log/pig/pig_1352950582587.log
2012-11-15 07:54:08,706 [Thread-5] INFO org.apache.hcatalog.common.HiveClientCache - Cleaning up hive client cache in ShutDown hook
[2012-11-15 07:54:10] Pig job failed: return code 6 from running
/usr/bin/pig_9 -t ColumnMapKeyPrune -F -param
TWADOOP_HOME=/home/kurt/twadoop  -p END_DATE="20121102" -p
END_TIME_UNIX="1351814400" -p
LATEST_PATH_FILE="/user/kurt/processed/search/search_simplified/daily/_latest"
-p START_TIME="2012-11-01-00:00:00" -p PREVIOUS_ONE_WEEK="2012/10/25" -p
REGISTER_HCAT="'register
/usr/lib/hcatalog/share/hcatalog/hcatalog-{core,pig-adapter}-*-*.jar'" -p
START_DATE="20121101" -p PREVIOUS_ONE_MONTH="2012/10/01" -p BATCH_ID="0" -p
SCHEDULER_POOL="'set mapred.fairscheduler.pool search'" -p
PREVIOUS_FOUR_WEEKS_AGO="2012/10/05" -p PREVIOUS_ONE_DAY="2012/10/31" -p
REGISTER_DAL="'register /usr/lib/dal/dal.jar;'" -p REGISTER_HIVE="'register
/usr/lib/hive/lib/{hive-exec-*-*,hive-metastore-*-*,libfb303-*}.jar'" -p
INPROCESS_DIR="/user/kurt/in_process/processed/search/search_simplified/daily/2012/11/01"
-p JOB_NAME="search_simplified:daily_2012/11/01_to_2012/11/02" -p
BATCH_DESC="'oink search_simplified:daily'" -p END_TIME_DAY_OF_WEEK="4" -p
SET_PARALLEL="'set default_parallel 15'" -p
ALL_DATES_4_WEEKS_AGO_TO_TODAY="2012/10/05,2012/10/06,2012/10/07,2012/10/08,2012/10/09,2012/10/10,2012/10/11,2012/10/12,2012/10/13,2012/10/14,2012/10/15,2012/10/16,2012/10/17,2012/10/18,2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
-p
ALL_DATES_2_WEEKS_AGO_TO_TODAY="2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
-p RAND="816544" -p
ALL_DATES_8_WEEKS_TO_4_WEEKS_AGO="2012/09/07,2012/09/08,2012/09/09,2012/09/10,2012/09/11,2012/09/12,2012/09/13,2012/09/14,2012/09/15,2012/09/16,2012/09/17,2012/09/18,2012/09/19,2012/09/20,2012/09/21,2012/09/22,2012/09/23,2012/09/24,2012/09/25,2012/09/26,2012/09/27,2012/09/28,2012/09/29,2012/09/30,2012/10/01,2012/10/02,2012/10/03,2012/10/04"
-p PREVIOUS_TWO_WEEKS_AGO="2012/10/19" -p
OUTPUT_DIR="/user/kurt/processed/search/search_simplified/daily/2012/11/01"
-p
OUTPUT_DIR_PARENT="/user/kurt/processed/search/search_simplified/daily/2012/11"
-p OUTPUT_BASE="dal://smf1-dw-hcat.kurt.search_search_simplified_daily" -p
START_TIME_DAY_OF_WEEK="3" -p END_TIME="2012-11-02-00:00:00" -p
BATCH_STEP="86400" -p END_DATE_MINUS_ONE_WITH_SLASHES="2012/11/01" -p
START_TIME_UNIX="1351728000" -p START_HOUR="00" -p END_HOUR="00" -p
DEBUG="off" -p PART="'part_dt=20121101T000000Z'"
/tmp/oink20121115-33234-42y95k-0.pig at Thu Nov 15 07:54:10 +0000 2012.
Exiting...
[2012-11-15 07:54:10] Oink failed because of unhandled exception:
Pig job failed: return code 6 from running /usr/bin/pig_9 [same command and
parameters as above] /tmp/oink20121115-33234-42y95k-0.pig at Thu Nov 15
07:54:10 +0000 2012.
Exiting...
         /home/kurt/twadoop/oink/lib/runner.rb:327:in `fail'
         /home/kurt/twadoop/oink/lib/runner.rb:172:in `run'
         /home/kurt/twadoop/oink/oink.rb:25:in `go'
         /home/kurt/twadoop/oink/oink.rb:46


--
Kurt Smith
Senior Data Scientist, Analytics | Twitter, Inc
@kurtosis0
