[ https://issues.apache.org/jira/browse/PIG-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682557#comment-13682557 ]
Rohini Palaniswamy commented on PIG-3325:
-----------------------------------------

Dmitriy,
bq. at least iterate to it without calling getMemorySize(), and then add to our running avg, rather than recomputing it.

That still does not help. It is around 5,000 to 6,000 ns. However we try, I don't think it is going to come back to ~400 ns unless we go back to relying on the SpillableManager to do the memory size computation.

Looking at the SpillableManager code, if GC has happened normally, clearSpillables() would take care of removing smaller bags.
{noformat}
if (toBeFreed < spillFileSizeThreshold) {
    log.debug("spilling small files - getting out of memory handler");
    break ;
}
{noformat}
With the default spillFileSizeThreshold of 5MB, we don't attempt to spill smaller objects at all. So, going back to Mark's question: how big an issue were small bags for spilling, and do we need markSpillableIfNecessary() at all?

One thing I can see that could speed up spills is moving the getMemorySize() call out of the comparator passed to Collections.sort, and having a composite Spillable whose memory size is reset at the beginning of a run and calculated only once during that run.

> Adding a tuple to a bag is slow
> -------------------------------
>
>                 Key: PIG-3325
>                 URL: https://issues.apache.org/jira/browse/PIG-3325
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1, 0.11.2
>            Reporter: Mark Wagner
>            Assignee: Mark Wagner
>            Priority: Critical
>         Attachments: PIG-3325.demo.patch, PIG-3325.optimize.1.patch
>
>
> The time it takes to add a tuple to a bag has increased significantly,
> causing some jobs to take about 50x longer compared to 0.10.1. I've tracked
> this down to PIG-2923, which has made adding a tuple heavier weight (it now
> includes some memory estimation).
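The suggestion to compute each memory size once per spill run, instead of calling getMemorySize() inside the Collections.sort comparator, can be sketched in Java. This is a minimal illustration under stated assumptions, not Pig's actual SpillableMemoryManager code: the `Spillable` interface below is a local stand-in, and `CachedSize`, `spillLargest` and `CountingBag` are hypothetical names. Sizes are computed exactly once at the start of a run, the list is sorted largest-first on the cached values, and the loop stops at the first entry below the threshold, loosely mirroring the `spillFileSizeThreshold` check quoted above.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillSketch {

    // Hypothetical stand-in for a spillable bag: spill() returns bytes freed,
    // getMemorySize() is assumed to be expensive to compute.
    public interface Spillable {
        long spill();
        long getMemorySize();
    }

    // Hypothetical wrapper that caches getMemorySize() so it runs at most
    // once per spill run instead of on every sort comparison.
    public static final class CachedSize {
        final Spillable target;
        private long size;
        private boolean computed;

        public CachedSize(Spillable target) { this.target = target; }

        // Called at the beginning of a run so stale sizes are not reused.
        void reset() { computed = false; }

        long size() {
            if (!computed) {
                size = target.getMemorySize();
                computed = true;
            }
            return size;
        }
    }

    // Computes every size once, sorts largest-first on the cached values,
    // then spills until the next candidate is below the threshold.
    public static long spillLargest(List<CachedSize> entries, long spillFileSizeThreshold) {
        for (CachedSize e : entries) {
            e.reset();
            e.size(); // the only getMemorySize() call for this entry in this run
        }
        // Comparator only reads cached values; descending by size.
        entries.sort((a, b) -> Long.compare(b.size(), a.size()));
        long freed = 0;
        for (CachedSize e : entries) {
            if (e.size() < spillFileSizeThreshold) {
                break; // sorted descending, so every remaining entry is smaller
            }
            freed += e.target.spill();
        }
        return freed;
    }

    // Hypothetical test double: reports a fixed size and counts size calls.
    public static final class CountingBag implements Spillable {
        public final long bytes;
        public int sizeCalls = 0;
        public boolean spilled = false;

        public CountingBag(long bytes) { this.bytes = bytes; }

        @Override
        public long getMemorySize() {
            sizeCalls++;
            return bytes;
        }

        @Override
        public long spill() {
            spilled = true;
            return bytes;
        }
    }

    public static void main(String[] args) {
        CountingBag big = new CountingBag(8_000_000L);
        CountingBag mid = new CountingBag(6_000_000L);
        CountingBag small = new CountingBag(1_000L);
        List<CachedSize> entries = new ArrayList<>();
        entries.add(new CachedSize(small));
        entries.add(new CachedSize(big));
        entries.add(new CachedSize(mid));
        // Roughly the 5MB default threshold mentioned in the comment (assumed value).
        long freed = spillLargest(entries, 5_000_000L);
        // prints "freed=14000000, small spilled=false"
        System.out.println("freed=" + freed + ", small spilled=" + small.spilled);
    }
}
```

The design choice here is a decorate-sort pattern: paying for getMemorySize() n times up front rather than O(n log n) times inside the comparator, and relying on the descending order so the threshold check can end the loop early.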