[ 
https://issues.apache.org/jira/browse/PIG-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730940#comment-13730940
 ] 

Cheolsoo Park commented on PIG-3411:
------------------------------------

[~ihadanny], it looks like you can't override 
"mapreduce.jobtracker.split.metainfo.maxsize" on a per-job basis. You probably 
need to set it in the JobTracker's own configuration and restart the JT:
{quote}
There is a property (mapreduce.jobtracker.split.metainfo.maxsize) that can be 
used to override the default of 10^6.
We found that passing this along with the job has no effect, this worked only 
when setting this property on the jobtracker node. Not sure if this is a 
feature or a bug.
{quote}
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/UWBMKplvGkg
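
For illustration, a minimal sketch of what the JobTracker-side setting might 
look like. This assumes an MR1 cluster where the JobTracker reads 
mapred-site.xml at startup; the file name and location are assumptions and may 
differ in your distribution. The value -1 is the one tried in the issue 
description below and is expected to disable the size check, but please verify 
that against your Hadoop version; a larger positive byte limit should also work.

```xml
<!-- mapred-site.xml on the JobTracker node (assumed file; adjust for your setup) -->
<property>
  <name>mapreduce.jobtracker.split.metainfo.maxsize</name>
  <!-- -1 is the value tried in the issue; a larger positive byte limit is an alternative -->
  <value>-1</value>
</property>
```

The JobTracker has to be restarted afterwards, since per the quote above 
passing the property along with the job has no effect.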

                
> pig skewed join with a big table causes “Split metadata size exceeded 
> 10000000”
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3411
>                 URL: https://issues.apache.org/jira/browse/PIG-3411
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: Pig version 0.10.0-cdh3u4a
> Hadoop 0.20.2-cdh3u4a
>            Reporter: Ido Hadanny
>
> We have a pig join between a small (16M rows) distinct table and a big (6B 
> rows) skewed table. A regular join finishes in 2 hours (after some tweaking). 
> We tried using skewed and were able to cut the run time to 20 minutes.
> HOWEVER, when we try a bigger skewed table (19B rows), we get this message 
> from the SAMPLER job:
> Split metadata size exceeded 10000000. Aborting job job_201305151351_21573 
> [ScriptRunner]
> at 
> org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:48)
> This is reproducible every time we try using skewed, and does not happen when 
> we use the regular join.
> We tried setting mapreduce.jobtracker.split.metainfo.maxsize=-1 and we can 
> see it's there in the job.xml file, but it doesn't change anything!
> What's happening here? Is this a bug with the distribution sample created by 
> using skewed? Why doesn't changing the param to -1 help?
> Also available at http://stackoverflow.com/q/17163112/574187
> Thanks!
