[ https://issues.apache.org/jira/browse/PIG-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981042#action_12981042 ]

Olga Natkovich commented on PIG-1803:
-------------------------------------

There are two things:

(1) Have you tried Pig 0.8? We have made quite a bit of progress on memory
utilization there.
(2) There is a bug in Hadoop that causes memory overuse when the combiner is
used. I don't believe it has been addressed. Thejas, do you remember the JIRA
number for the MR issue?

> Maps are failing if combiner is enabled
> ---------------------------------------
>
>                 Key: PIG-1803
>                 URL: https://issues.apache.org/jira/browse/PIG-1803
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Alex Rovner
>             Fix For: 0.7.0
>
>
> We are constantly hitting the java heap space memory issue if the combiner is 
> enabled on our jobs.
> Configs:
> pig.cachedbag.memusage=20
> io.sort.mb=300
> pig.exec.nocombiner=false
> mapred.child.java.opts=-Xmx750m
> Sample job:
> {noformat} 
> A = LOAD '$INPUT' USING 
> com.contextweb.pig.CWHeaderLoader('$WORK_DIR/schema/rpt.xml');
> AA = foreach A GENERATE checkPointStart, PublisherId, TagId,
> ContextCategoryId,Impressions, Clicks, Actions;
> DESCRIBE AA;
> B = GROUP AA BY (checkPointStart, PublisherId, TagId,
> ContextCategoryId);
> result = FOREACH B GENERATE group, SUM(AA.Impressions) as Impressions, 
> SUM(AA.Clicks) as Clicks, SUM(AA.Actions) as Actions;
> DESCRIBE result;
> STORE result INTO '$OUTPUT' USING com.contextweb.pig.CWHeaderStore();
> {noformat} 
> Mapper Error Log:
> 2011-01-12 18:43:22,084 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
>       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:799)
>       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:549)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>       at org.apache.hadoop.mapred.Child.main(Child.java:211)
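
For context on the trace above: it fails inside the MapOutputBuffer
constructor, which, as far as I understand the Hadoop MapTask code of that
era, is where the map-side sort buffer sized by io.sort.mb is allocated. The
Python sketch below only does the arithmetic on two settings quoted in the
report (io.sort.mb=300 and -Xmx750m). The function names are hypothetical, and
it ignores every other consumer of heap, so treat it as a rough budget rather
than a diagnosis.

```python
import re


def parse_xmx_mb(java_opts: str) -> int:
    """Extract the -Xmx value from a JVM options string and return it in MB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx setting found in: " + java_opts)
    value = int(match.group(1))
    unit = match.group(2).lower()
    # Convert kilobytes / megabytes / gigabytes to megabytes.
    factor = {"k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return int(value * factor)


def heap_budget(io_sort_mb: int, java_opts: str) -> dict:
    """Rough budget: how much of the task heap the sort buffer would claim.

    Assumption (not confirmed by the report): the sort buffer is allocated
    up front and is roughly io.sort.mb megabytes in size.
    """
    heap_mb = parse_xmx_mb(java_opts)
    return {
        "heap_mb": heap_mb,
        "sort_buffer_mb": io_sort_mb,
        "sort_buffer_share": io_sort_mb / heap_mb,
        "remaining_mb": heap_mb - io_sort_mb,
    }


# Settings taken from the issue description.
budget = heap_budget(io_sort_mb=300, java_opts="-Xmx750m")
print(budget)
```

With the reported settings the sort buffer alone would account for about 40%
of the heap, leaving roughly 450 MB for everything else in the map task,
combiner included.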

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.