[ 
https://issues.apache.org/jira/browse/PIG-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689406#comment-13689406
 ] 

Rohini Palaniswamy commented on PIG-3325:
-----------------------------------------

{noformat}
if (avgTupleSize != 0 && (mLastContentsSize == numInMem ||
        mLastContentsSize > 100 && numInMem > 100))
    return totalSizeFromAvgTupleSize(avgTupleSize, numInMem);
{noformat}

Actually, I was wrong. Initializing the memory size only once does not help
much: it only saves the call to totalSizeFromAvgTupleSize(avgTupleSize,
numInMem). When getMemorySize() is called multiple times from the Comparator,
the second call already hits mLastContentsSize == numInMem and returns
totalSizeFromAvgTupleSize() directly, without iterating through the tuples
again.
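
To make the short-circuit concrete, here is a minimal sketch of the behaviour described above. It is not Pig's DefaultAbstractBag: the class SketchBag, the byte[] "tuples", the samplingPasses counter and the multiplication inside totalSizeFromAvgTupleSize() are all assumptions made for illustration. Only the guarding condition is taken from the snippet quoted in this comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a bag; NOT Pig's DefaultAbstractBag.
class SketchBag {
    private final List<byte[]> contents = new ArrayList<>();
    private long avgTupleSize = 0;       // 0 means "not sampled yet"
    private int mLastContentsSize = -1;  // bag size at the last estimate
    int samplingPasses = 0;              // counts how often tuples are walked

    void add(byte[] tuple) {
        contents.add(tuple);
    }

    long getMemorySize() {
        int numInMem = contents.size();
        // Condition quoted in the comment: reuse the cached average when the
        // bag size is unchanged, or when both sizes are already above 100.
        if (avgTupleSize != 0 && (mLastContentsSize == numInMem ||
                mLastContentsSize > 100 && numInMem > 100)) {
            return totalSizeFromAvgTupleSize(avgTupleSize, numInMem);
        }
        // Slow path (assumed): sample up to the first 100 tuples.
        samplingPasses++;
        int n = Math.min(numInMem, 100);
        long sampled = 0;
        for (int i = 0; i < n; i++) {
            sampled += contents.get(i).length;
        }
        avgTupleSize = (n == 0) ? 0 : sampled / n;
        mLastContentsSize = numInMem;
        return totalSizeFromAvgTupleSize(avgTupleSize, numInMem);
    }

    // Assumed to be a cheap multiplication in this sketch.
    static long totalSizeFromAvgTupleSize(long avgTupleSize, int numInMem) {
        return avgTupleSize * numInMem;
    }
}
```

In this sketch, a second getMemorySize() call on an unchanged bag never reaches the sampling loop, so caching the final size on top of that would only save the call to totalSizeFromAvgTupleSize().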

I am still trying to figure out how to optimize spilling. I wonder whether
splitting into two lists after the first spill pass, one for the bigger sizes
and one for sizes < spillFileSizeThreshold, and then sorting/iterating through
them separately, would help in future invocations.
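
One possible reading of the two-list idea, as a rough sketch only. The Item class, the partition() method and the choice to sort just the larger list are hypothetical; none of this is existing Pig code, and spillFileSizeThreshold is simply passed in as a parameter here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SpillPartitionSketch {
    // Hypothetical stand-in for a spillable object with a known size estimate.
    static final class Item {
        final String name;
        final long memorySize;

        Item(String name, long memorySize) {
            this.name = name;
            this.memorySize = memorySize;
        }
    }

    final List<Item> large = new ArrayList<>();  // sizes >= threshold
    final List<Item> small = new ArrayList<>();  // sizes <  threshold

    // Split once (e.g. after the first spill pass) so that later passes can
    // handle the two groups separately.
    void partition(List<Item> all, long spillFileSizeThreshold) {
        large.clear();
        small.clear();
        for (Item item : all) {
            if (item.memorySize < spillFileSizeThreshold) {
                small.add(item);
            } else {
                large.add(item);
            }
        }
        // Assumption: only the large list needs ordering, biggest first.
        large.sort(Comparator.comparingLong((Item i) -> i.memorySize).reversed());
    }
}
```

Whether this helps would depend on how often sizes cross the threshold between invocations, since items that do would need to be moved to the other list.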
                
> Adding a tuple to a bag is slow
> -------------------------------
>
>                 Key: PIG-3325
>                 URL: https://issues.apache.org/jira/browse/PIG-3325
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1, 0.11.2
>            Reporter: Mark Wagner
>            Assignee: Mark Wagner
>            Priority: Critical
>         Attachments: PIG-3325.demo.patch, PIG-3325.optimize.1.patch
>
>
> The time it takes to add a tuple to a bag has increased significantly, 
> causing some jobs to take about 50x longer compared to 0.10.1. I've tracked 
> this down to PIG-2923, which has made adding a tuple heavier weight (it now 
> includes some memory estimation).
