[jira] Commented: (PIG-1102) Collect number of spills per job

Olga Natkovich (JIRA) Mon, 21 Dec 2009 13:27:46 -0800

    [ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793369#action_12793369 ]


Olga Natkovich commented on PIG-1102:
-------------------------------------

A few questions/comments on the patch:

(1) I think the count should default to 0, not -1.
(2) Does the increment of the count have to be combined with the warn statement?
Does this mean that users will see this many warnings? If so, should we combine
this with the spill message we already print?
(3) I thought we discussed incrementing the count per buffer rather than per
record, and approximating that based on the buffer size. I did not see the code
that does this.
(4) I don't think you correctly separated bags that proactively spill from the
bags that are spilled by the memory manager. All the bags created by
DefaultBagFactory get registered with SpillableMemoryManager and belong to the
second category.
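A minimal sketch of how points (1), (3) and (4) could fit together, written in
Python for brevity rather than Pig's Java. The `SpillCounter` names, the
`SpillStats` class and the records-per-buffer estimate are hypothetical and not
taken from the patch. It assumes counts start at 0, each spill adds an estimated
number of buffers instead of one per record, and proactive spills are tallied
separately from spills triggered by the memory manager.

```python
from collections import Counter
from enum import Enum


class SpillCounter(Enum):
    # Hypothetical counter names; not Pig's actual identifiers.
    PROACTIVE_SPILLS = "proactive"
    MEMORY_MANAGER_SPILLS = "memory_manager"


def estimated_buffers(records_spilled: int, records_per_buffer: int) -> int:
    """Approximate how many buffers a spill wrote, from its record count.

    Ceiling division, so a partially filled buffer still counts as one.
    """
    if records_spilled <= 0:
        return 0
    if records_per_buffer <= 0:
        raise ValueError("records_per_buffer must be positive")
    return -(-records_spilled // records_per_buffer)


class SpillStats:
    def __init__(self) -> None:
        # Counter returns 0 for keys never incremented, so every count
        # effectively defaults to 0 rather than -1 (point 1).
        self.counts = Counter()

    def record_spill(self, kind: SpillCounter, records_spilled: int,
                     records_per_buffer: int) -> None:
        # One increment per estimated buffer, not per record (point 3);
        # the caller picks the category (point 4).
        self.counts[kind] += estimated_buffers(records_spilled, records_per_buffer)
```

Under this scheme the call site decides the category: a spill initiated through
SpillableMemoryManager, which covers every bag from DefaultBagFactory, passes
MEMORY_MANAGER_SPILLS, and only a bag that spills on its own initiative passes
PROACTIVE_SPILLS.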


> Collect number of spills per job
> --------------------------------
>
>                 Key: PIG-1102
>                 URL: https://issues.apache.org/jira/browse/PIG-1102
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Sriranjan Manjunath
>             Fix For: 0.7.0
>
>         Attachments: PIG_1102.patch
>
>
> Memory shortage is one of the main performance issues in Pig. Knowing when we 
> spill to disk is useful for understanding query performance and also for 
> seeing how certain changes in Pig affect it.
> Other interesting stats to collect would be average CPU usage and max memory 
> usage, but I am not sure if this information is easily retrievable.
> Using Hadoop counters for this would make sense.
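The per-job view comes from Hadoop counters being updated inside each task and
summed by the framework into job-level totals; the old API exposes this as
`Reporter.incrCounter(String group, String counter, long amount)`. The sketch
below is a plain-Python stand-in for that aggregation, not Hadoop code: the
`TaskCounters` class, `job_totals` function and the "pig"/"SPILLS" names are
hypothetical.

```python
from collections import Counter
from typing import Iterable


class TaskCounters:
    """Stand-in for the counters a single Hadoop task would update."""

    def __init__(self) -> None:
        self.values = Counter()

    def incr(self, group: str, name: str, amount: int = 1) -> None:
        # Keyed by (group, name), loosely mirroring
        # Reporter.incrCounter(group, counter, amount).
        self.values[(group, name)] += amount


def job_totals(tasks: Iterable[TaskCounters]) -> Counter:
    """Sum every task's counters into one job-level total."""
    total = Counter()
    for task in tasks:
        # Counter.update adds values rather than replacing them.
        total.update(task.values)
    return total
```

A task that never spills contributes nothing, so its counter simply reads 0 in
the job total.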

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
