[ 
https://issues.apache.org/jira/browse/PIG-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682557#comment-13682557
 ] 

Rohini Palaniswamy commented on PIG-3325:
-----------------------------------------

Dmitriy,

bq. at least iterate to it without calling getMemorySize(), and then add to our 
running avg, rather than recomputing it.
   Still does not help. It is around 5000-6000 ns. However hard we try, I don't 
think it is going to come back to ~400 ns unless we revert to relying on the 
SpillableManager to do the memory size computation. Looking at the 
SpillableManager code, if GC has happened normally, clearSpillables() would 
take care of removing smaller bags. 

{noformat}
                if (toBeFreed < spillFileSizeThreshold) {
                    log.debug("spilling small files - getting out of memory handler");
                    break;
                }
{noformat}
  With the default spillFileSizeThreshold at 5MB, we don't attempt to spill 
smaller objects at all. So, going back to Mark's question: how big an issue 
were small bags for spilling, and do we need markSpillableIfNecessary() at all?

 One thing I can see that could speed up spills is moving the getMemorySize 
call out of the compare in Collections.sort and having a composite Spillable 
whose memory size is reset at the beginning and calculated only once during 
the run. 
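
 A rough sketch of that idea, not taken from any patch on this issue: the 
Spillable interface below is a stand-in that declares only getMemorySize(), and 
SizedSpillable and sortBySizeDescending are hypothetical names. Each size is 
read once when the wrapper is built, so the comparator only compares cached 
longs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class CachedSizeSort {

    /** Stand-in for Pig's Spillable; only the method this sketch needs. */
    public interface Spillable {
        long getMemorySize();
    }

    /** Hypothetical wrapper that records the memory size once per spill run. */
    public static final class SizedSpillable {
        public final Spillable target;
        public final long cachedSize;

        public SizedSpillable(Spillable target) {
            this.target = target;
            // The only getMemorySize() call for this object during the run.
            this.cachedSize = target.getMemorySize();
        }
    }

    /** Wraps each spillable, then sorts largest-first using the cached sizes. */
    public static List<SizedSpillable> sortBySizeDescending(List<Spillable> spillables) {
        List<SizedSpillable> sized = new ArrayList<SizedSpillable>(spillables.size());
        for (Spillable s : spillables) {
            sized.add(new SizedSpillable(s));
        }
        Collections.sort(sized, new Comparator<SizedSpillable>() {
            @Override
            public int compare(SizedSpillable a, SizedSpillable b) {
                // No size estimation here, just a comparison of two longs.
                return Long.compare(b.cachedSize, a.cachedSize);
            }
        });
        return sized;
    }
}
```

 With n spillables this makes n getMemorySize() calls per run instead of one 
or two per comparison, at the cost of sorting on sizes that may be slightly 
stale by the time the spill happens.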
                
> Adding a tuple to a bag is slow
> -------------------------------
>
>                 Key: PIG-3325
>                 URL: https://issues.apache.org/jira/browse/PIG-3325
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1, 0.11.2
>            Reporter: Mark Wagner
>            Assignee: Mark Wagner
>            Priority: Critical
>         Attachments: PIG-3325.demo.patch, PIG-3325.optimize.1.patch
>
>
> The time it takes to add a tuple to a bag has increased significantly, 
> causing some jobs to take about 50x longer compared to 0.10.1. I've tracked 
> this down to PIG-2923, which has made adding a tuple heavier weight (it now 
> includes some memory estimation).

