[ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794217#action_12794217 ]

Sriranjan Manjunath commented on PIG-1102:
------------------------------------------

(3) refers to the case where we try to guess the number of records that fit into memory and start spilling the remaining records. InternalCachedBag.java addresses this case:

+        if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
+            /* Increment the spill count */
+            incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT);
+        }
     }

cacheLimit holds the number of records that can be held in memory, whereas mContents is the collection that holds all the tuples. Here, I do not increment the counter for every record; instead I count every n'th record, n being the cacheLimit. This, however, does not increment the counter by the buffer size. Incrementing it by the buffer size would give a value approximately equal to the number of spilled records. (A standalone sketch of this counting arithmetic follows the quoted issue below.)

> Collect number of spills per job
> --------------------------------
>
>                 Key: PIG-1102
>                 URL: https://issues.apache.org/jira/browse/PIG-1102
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Sriranjan Manjunath
>             Fix For: 0.7.0
>
>         Attachments: PIG_1102.patch, PIG_1102.patch.1
>
>
> Memory shortage is one of the main performance issues in Pig. Knowing when we
> spill to disk is useful for understanding query performance and also to
> see how certain changes in Pig affect that.
> Other interesting stats to collect would be average CPU usage and max memory
> usage, but I am not sure if this information is easily retrievable.
> Using Hadoop counters for this would make sense.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
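
Below is a minimal, self-contained Java sketch of the counting arithmetic described in the comment above. It is not Pig's actual InternalCachedBag: it keeps every record in an in-memory list and models only the counter, not the spilling itself. The class name, the proactiveSpillTicks field, and the approxSpilledRecords() method are hypothetical; cacheLimit, mContents, incSpillCount and PigCounters.PROACTIVE_SPILL_COUNT are the names that appear in the patch.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch only: illustrates bumping a counter once every cacheLimit records
    // (rather than once per record) and scaling by cacheLimit to approximate
    // the number of proactively spilled records.
    public class ProactiveSpillCounterSketch {
        private final int cacheLimit;          // records we guess will fit in memory
        private final List<Object> mContents = new ArrayList<Object>(); // records seen so far
        private long proactiveSpillTicks = 0;  // incremented on every cacheLimit-th record

        public ProactiveSpillCounterSketch(int cacheLimit) {
            this.cacheLimit = cacheLimit;
        }

        public void add(Object record) {
            mContents.add(record);
            // Count every n'th record, n being cacheLimit, mirroring the patch above.
            if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
                // In Pig this would be incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT).
                proactiveSpillTicks++;
            }
        }

        // Scaling the ticks by cacheLimit (the "buffer size" in the comment) gives
        // a value in the neighborhood of the number of records that overflowed the cache.
        public long approxSpilledRecords() {
            return proactiveSpillTicks * cacheLimit;
        }

        public static void main(String[] args) {
            ProactiveSpillCounterSketch bag = new ProactiveSpillCounterSketch(1000);
            for (int i = 0; i < 4200; i++) {
                bag.add(i);
            }
            // 4 ticks (at 1000, 2000, 3000, 4000 records) scaled by 1000 prints 4000.
            System.out.println("approx spilled records: " + bag.approxSpilledRecords());
        }
    }

The design trade-off the comment describes is visible here: incrementing once per cacheLimit records keeps the counter cheap, at the cost of only approximating the true spill volume until the ticks are multiplied back by the buffer size.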
Sriranjan Manjunath commented on PIG-1102: ------------------------------------------ (3) refers to the case where we try to guess the number of records that fit into memory and start spilling the other records. InternalCachedBag.java addresses this case: + if (cacheLimit!= 0 && mContents.size() % cacheLimit == 0) { + /* Increment the spill count*/ + incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT); + } } cacheLimit holds the number of records that can be held in memory whereas mContents is the tuple that holds all the records. Here, I do not increment the counter for every record. Instead I count every n'th record, n being the cacheLimit. This however, does not increment the counter by the buffer size. Incrementing it by the buffer size will give us a value which approximately equal to the number of spilled records. > Collect number of spills per job > -------------------------------- > > Key: PIG-1102 > URL: https://issues.apache.org/jira/browse/PIG-1102 > Project: Pig > Issue Type: Improvement > Reporter: Olga Natkovich > Assignee: Sriranjan Manjunath > Fix For: 0.7.0 > > Attachments: PIG_1102.patch, PIG_1102.patch.1 > > > Memory shortage is one of the main performance issues in Pig. Knowing when we > spill do the disk is useful for understanding query performance and also to > see how certain changes in Pig effect that. > Other interesting stats to collect would be average CPU usage and max mem > usage but I am not sure if this information is easily retrievable. > Using Hadoop counters for this would make sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.