[ https://issues.apache.org/jira/browse/PIG-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981052#action_12981052 ]

Thejas M Nair commented on PIG-1803:
------------------------------------

bq. (2) There is a bug in Hadoop that causes memory overuse when the combiner is
used. I don't believe it has been addressed. Thejas, do you remember what the
JIRA number for it is in MR?
HADOOP-5494 caused out-of-memory errors in the reduce phase, not in the map
phase. It is triggered when large records are being combined, as in the case of
a GROUP followed by a DISTINCT inside a nested FOREACH.
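To make the "large records" point concrete, here is a simplified model of what a combiner emits per group. It is not Pig's combiner code: the class and method names are hypothetical, and it assumes that the partial result of a nested DISTINCT is a collection of the distinct values seen so far, whereas SUM (the only aggregate in the reporter's job) collapses to a single number.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of per-group combiner output (not Pig's code).
public class PartialAggregates {

    // SUM is algebraic: any number of values collapse into one partial sum,
    // so the intermediate record stays a single number.
    static long partialSum(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Assumption: a nested DISTINCT can only be deduplicated map-side, so the
    // intermediate record still holds every distinct value seen and grows with them.
    static Set<Long> partialDistinct(List<Long> values) {
        return new HashSet<>(values);
    }

    public static void main(String[] args) {
        List<Long> values = new ArrayList<>();
        for (long i = 0; i < 100_000; i++) {
            values.add(i % 50_000); // 50,000 distinct values, each seen twice
        }
        System.out.println("partial SUM record: 1 value (" + partialSum(values) + ")");
        System.out.println("partial DISTINCT record: "
                + partialDistinct(values).size() + " values");
    }
}
```

The combined SUM record has the same size no matter how many rows feed it, while the combined DISTINCT record scales with the number of distinct values, which is the kind of large record the comment describes.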



> Maps are failing if combiner is enabled
> ---------------------------------------
>
>                 Key: PIG-1803
>                 URL: https://issues.apache.org/jira/browse/PIG-1803
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Alex Rovner
>             Fix For: 0.7.0
>
>
> We are constantly hitting Java heap space out-of-memory errors when the
> combiner is enabled on our jobs.
> Configs:
> pig.cachedbag.memusage=20
> io.sort.mb=300
> pig.exec.nocombiner=false
> mapred.child.java.opts=-Xmx750m
> Sample job:
> {noformat} 
> A = LOAD '$INPUT' USING com.contextweb.pig.CWHeaderLoader('$WORK_DIR/schema/rpt.xml');
> AA = FOREACH A GENERATE checkPointStart, PublisherId, TagId,
>     ContextCategoryId, Impressions, Clicks, Actions;
> DESCRIBE AA;
> B = GROUP AA BY (checkPointStart, PublisherId, TagId, ContextCategoryId);
> result = FOREACH B GENERATE group,
>     SUM(AA.Impressions) AS Impressions,
>     SUM(AA.Clicks) AS Clicks,
>     SUM(AA.Actions) AS Actions;
> DESCRIBE result;
> STORE result INTO '$OUTPUT' USING com.contextweb.pig.CWHeaderStore();
> {noformat} 
> Mapper Error Log:
> 2011-01-12 18:43:22,084 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
>       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:799)
>       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:549)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>       at org.apache.hadoop.mapred.Child.main(Child.java:211)
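The reported error is thrown from MapTask$MapOutputBuffer.<init>, i.e. while the map task is still constructing its output buffer. In Hadoop MapTask of this era that constructor allocates the in-memory sort buffer sized by io.sort.mb, so part of the heap is claimed up front. The arithmetic below is only a sketch of that budget for the reported settings; it assumes the buffer is roughly io.sort.mb megabytes and ignores other JVM and framework overhead. The class and method names are hypothetical.

```java
// Hypothetical, illustrative heap arithmetic for the reported map task; not Hadoop code.
public class SortBufferBudget {

    // Fraction of the child JVM heap claimed by a sort buffer of sortMb megabytes,
    // assuming the buffer is roughly io.sort.mb in size.
    static double heapFraction(long sortMb, long heapMb) {
        return (double) sortMb / heapMb;
    }

    // Megabytes left for everything else in the task; ignores JVM and framework
    // overhead, so the real figure is lower.
    static long remainingMb(long sortMb, long heapMb) {
        return heapMb - sortMb;
    }

    public static void main(String[] args) {
        long ioSortMb = 300; // io.sort.mb=300 from the report
        long heapMb = 750;   // -Xmx750m from mapred.child.java.opts
        System.out.printf("sort buffer: %d MB (%.0f%% of the heap)%n",
                ioSortMb, 100 * heapFraction(ioSortMb, heapMb));
        System.out.printf("left for everything else: at most %d MB%n",
                remainingMb(ioSortMb, heapMb));
    }
}
```

With these settings the sort buffer alone accounts for about 40% of the 750 MB heap before any records are processed.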
