Aha, thanks Gabor. I'll just wait until Twadoop finishes building and let it
fill those in.
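
For reference, a quick way to sanity-check the missing-input theory below
(assuming the standard Hadoop CLI is available; the paths here are just taken
from the job's command line as examples, and the actual inputs the script
reads may differ):

    # list what's actually under the job's daily directory
    hadoop fs -ls /user/kurt/processed/search/search_simplified/daily/
    # check that the _latest marker the script points at exists
    hadoop fs -test -e /user/kurt/processed/search/search_simplified/daily/_latest && echo exists

And, per Julien's suggestion, re-running the same oink command with
--pig_version pig_11 added would tell us whether this is specific to pig_9.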

On Thu, Nov 15, 2012 at 9:53 AM, Gabor Szabo <ga...@twitter.com> wrote:

> Kurt, there are a lot of dependencies that this job expects in different
> places, and if you run it as a normal user, it will get confused. Most
> likely some of the input files are not being found.
>
>
> On 11/15/2012 09:36 AM, Julien Le Dem wrote:
>
>> Hi Kurt,
>> From the stack trace, it looks like it runs into an error while
>> estimating the size of the input.
>> Are all of the paths it's looking for present in hdfs:///user/kurt?
>> Does it work with pig_11? Add --pig_version pig_11 to the oink command.
>> Also, please send the command line you are using.
>> Thanks,
>> Julien
>>
>> On Thu, Nov 15, 2012 at 9:15 AM, Kurt Smith <k...@twitter.com> wrote:
>>
>>> I'm getting this error when doing a manual run of the search_simplified
>>> twadoop query. This query has run fine before. Any idea what the issue
>>> is?
>>>
>>> Pig Stack Trace
>>> ---------------
>>> ERROR 2017: Internal error creating job configuration.
>>>
>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
>>> ERROR 2017: Internal error creating job configuration.
>>>          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:738)
>>>          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:264)
>>>          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:150)
>>>          at org.apache.pig.PigServer.launchPlan(PigServer.java:1267)
>>>          at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1252)
>>>          at org.apache.pig.PigServer.execute(PigServer.java:1242)
>>>          at org.apache.pig.PigServer.executeBatch(PigServer.java:356)
>>>          at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>>>          at org.apache.pig.tools.grunt.GruntParser.processScript(GruntParser.java:452)
>>>          at org.apache.pig.tools.pigscript.parser.PigScriptParser.Script(PigScriptParser.java:752)
>>>          at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:423)
>>>          at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>>>          at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>          at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>>>          at org.apache.pig.Main.run(Main.java:561)
>>>          at org.apache.pig.Main.main(Main.java:111)
>>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>>          at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>> Caused by: java.lang.NullPointerException
>>>          at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:971)
>>>          at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:944)
>>>          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getInputSize(JobControlCompiler.java:840)
>>>          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:810)
>>>          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:750)
>>>          at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:396)
>>>          ... 20 more
>>> ================================================================================
>>>
>>>
>>> from the output:
>>> ---
>>>
>>> 2012-11-15 07:54:02,554 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>> 2012-11-15 07:54:02,555 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>> 2012-11-15 07:54:03,005 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1610612736 maxReducers=999 totalInputFileSize=1456250098769
>>> 2012-11-15 07:54:03,005 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 15
>>> 2012-11-15 07:54:04,655 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job917013994443138523.jar
>>> 2012-11-15 07:54:07,963 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job917013994443138523.jar created
>>> 2012-11-15 07:54:07,972 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
>>> 2012-11-15 07:54:08,199 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
>>> 2012-11-15 07:54:08,199 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
>>> 2012-11-15 07:54:08,404 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
>>> Details at logfile: /var/log/pig/pig_1352950582587.log
>>> 2012-11-15 07:54:08,706 [Thread-5] INFO org.apache.hcatalog.common.HiveClientCache - Cleaning up hive client cache in ShutDown hook
>>> [2012-11-15 07:54:10] Pig job failed: return code 6 from running
>>> /usr/bin/pig_9 -t ColumnMapKeyPrune -F -param TWADOOP_HOME=/home/kurt/twadoop
>>> -p END_DATE="20121102" -p END_TIME_UNIX="1351814400"
>>> -p LATEST_PATH_FILE="/user/kurt/processed/search/search_simplified/daily/_latest"
>>> -p START_TIME="2012-11-01-00:00:00" -p PREVIOUS_ONE_WEEK="2012/10/25"
>>> -p REGISTER_HCAT="'register /usr/lib/hcatalog/share/hcatalog/hcatalog-{core,pig-adapter}-*-*.jar'"
>>> -p START_DATE="20121101" -p PREVIOUS_ONE_MONTH="2012/10/01" -p BATCH_ID="0"
>>> -p SCHEDULER_POOL="'set mapred.fairscheduler.pool search'"
>>> -p PREVIOUS_FOUR_WEEKS_AGO="2012/10/05" -p PREVIOUS_ONE_DAY="2012/10/31"
>>> -p REGISTER_DAL="'register /usr/lib/dal/dal.jar;'"
>>> -p REGISTER_HIVE="'register /usr/lib/hive/lib/{hive-exec-*-*,hive-metastore-*-*,libfb303-*}.jar'"
>>> -p INPROCESS_DIR="/user/kurt/in_process/processed/search/search_simplified/daily/2012/11/01"
>>> -p JOB_NAME="search_simplified:daily_2012/11/01_to_2012/11/02"
>>> -p BATCH_DESC="'oink search_simplified:daily'" -p END_TIME_DAY_OF_WEEK="4"
>>> -p SET_PARALLEL="'set default_parallel 15'"
>>> -p ALL_DATES_4_WEEKS_AGO_TO_TODAY="2012/10/05,2012/10/06,2012/10/07,2012/10/08,2012/10/09,2012/10/10,2012/10/11,2012/10/12,2012/10/13,2012/10/14,2012/10/15,2012/10/16,2012/10/17,2012/10/18,2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>>> -p ALL_DATES_2_WEEKS_AGO_TO_TODAY="2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>>> -p RAND="816544"
>>> -p ALL_DATES_8_WEEKS_TO_4_WEEKS_AGO="2012/09/07,2012/09/08,2012/09/09,2012/09/10,2012/09/11,2012/09/12,2012/09/13,2012/09/14,2012/09/15,2012/09/16,2012/09/17,2012/09/18,2012/09/19,2012/09/20,2012/09/21,2012/09/22,2012/09/23,2012/09/24,2012/09/25,2012/09/26,2012/09/27,2012/09/28,2012/09/29,2012/09/30,2012/10/01,2012/10/02,2012/10/03,2012/10/04"
>>> -p PREVIOUS_TWO_WEEKS_AGO="2012/10/19"
>>> -p OUTPUT_DIR="/user/kurt/processed/search/search_simplified/daily/2012/11/01"
>>> -p OUTPUT_DIR_PARENT="/user/kurt/processed/search/search_simplified/daily/2012/11"
>>> -p OUTPUT_BASE="dal://smf1-dw-hcat.kurt.search_search_simplified_daily"
>>> -p START_TIME_DAY_OF_WEEK="3" -p END_TIME="2012-11-02-00:00:00"
>>> -p BATCH_STEP="86400" -p END_DATE_MINUS_ONE_WITH_SLASHES="2012/11/01"
>>> -p START_TIME_UNIX="1351728000" -p START_HOUR="00" -p END_HOUR="00"
>>> -p DEBUG="off" -p PART="'part_dt=20121101T000000Z'"
>>> /tmp/oink20121115-33234-42y95k-0.pig at Thu Nov 15 07:54:10 +0000 2012.
>>> Exiting...
[2012-11-15 07:54:10] Oink failed because of unhandled exeception:
>>> Pig job failed: return code 6 from running
>>> /usr/bin/pig_9 -t ColumnMapKeyPrune -F -param TWADOOP_HOME=/home/kurt/twadoop
>>> -p END_DATE="20121102" -p END_TIME_UNIX="1351814400"
>>> -p LATEST_PATH_FILE="/user/kurt/processed/search/search_simplified/daily/_latest"
>>> -p START_TIME="2012-11-01-00:00:00" -p PREVIOUS_ONE_WEEK="2012/10/25"
>>> -p REGISTER_HCAT="'register /usr/lib/hcatalog/share/hcatalog/hcatalog-{core,pig-adapter}-*-*.jar'"
>>> -p START_DATE="20121101" -p PREVIOUS_ONE_MONTH="2012/10/01" -p BATCH_ID="0"
>>> -p SCHEDULER_POOL="'set mapred.fairscheduler.pool search'"
>>> -p PREVIOUS_FOUR_WEEKS_AGO="2012/10/05" -p PREVIOUS_ONE_DAY="2012/10/31"
>>> -p REGISTER_DAL="'register /usr/lib/dal/dal.jar;'"
>>> -p REGISTER_HIVE="'register /usr/lib/hive/lib/{hive-exec-*-*,hive-metastore-*-*,libfb303-*}.jar'"
>>> -p INPROCESS_DIR="/user/kurt/in_process/processed/search/search_simplified/daily/2012/11/01"
>>> -p JOB_NAME="search_simplified:daily_2012/11/01_to_2012/11/02"
>>> -p BATCH_DESC="'oink search_simplified:daily'" -p END_TIME_DAY_OF_WEEK="4"
>>> -p SET_PARALLEL="'set default_parallel 15'"
>>> -p ALL_DATES_4_WEEKS_AGO_TO_TODAY="2012/10/05,2012/10/06,2012/10/07,2012/10/08,2012/10/09,2012/10/10,2012/10/11,2012/10/12,2012/10/13,2012/10/14,2012/10/15,2012/10/16,2012/10/17,2012/10/18,2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>>> -p ALL_DATES_2_WEEKS_AGO_TO_TODAY="2012/10/19,2012/10/20,2012/10/21,2012/10/22,2012/10/23,2012/10/24,2012/10/25,2012/10/26,2012/10/27,2012/10/28,2012/10/29,2012/10/30,2012/10/31,2012/11/01"
>>> -p RAND="816544"
>>> -p ALL_DATES_8_WEEKS_TO_4_WEEKS_AGO="2012/09/07,2012/09/08,2012/09/09,2012/09/10,2012/09/11,2012/09/12,2012/09/13,2012/09/14,2012/09/15,2012/09/16,2012/09/17,2012/09/18,2012/09/19,2012/09/20,2012/09/21,2012/09/22,2012/09/23,2012/09/24,2012/09/25,2012/09/26,2012/09/27,2012/09/28,2012/09/29,2012/09/30,2012/10/01,2012/10/02,2012/10/03,2012/10/04"
>>> -p PREVIOUS_TWO_WEEKS_AGO="2012/10/19"
>>> -p OUTPUT_DIR="/user/kurt/processed/search/search_simplified/daily/2012/11/01"
>>> -p OUTPUT_DIR_PARENT="/user/kurt/processed/search/search_simplified/daily/2012/11"
>>> -p OUTPUT_BASE="dal://smf1-dw-hcat.kurt.search_search_simplified_daily"
>>> -p START_TIME_DAY_OF_WEEK="3" -p END_TIME="2012-11-02-00:00:00"
>>> -p BATCH_STEP="86400" -p END_DATE_MINUS_ONE_WITH_SLASHES="2012/11/01"
>>> -p START_TIME_UNIX="1351728000" -p START_HOUR="00" -p END_HOUR="00"
>>> -p DEBUG="off" -p PART="'part_dt=20121101T000000Z'"
>>> /tmp/oink20121115-33234-42y95k-0.pig at Thu Nov 15 07:54:10 +0000 2012.
>>> Exiting...
>>>          /home/kurt/twadoop/oink/lib/runner.rb:327:in `fail'
>>>          /home/kurt/twadoop/oink/lib/runner.rb:172:in `run'
>>>          /home/kurt/twadoop/oink/oink.rb:25:in `go'
>>>          /home/kurt/twadoop/oink/oink.rb:46
>>>
>>>
>>> --
>>> Kurt Smith
>>> Senior Data Scientist, Analytics | Twitter, Inc
>>> @kurtosis0
>>>
>>>


-- 
Kurt Smith
Senior Data Scientist, Analytics | Twitter, Inc
@kurtosis0
