[ 
https://issues.apache.org/jira/browse/PIG-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772772#action_12772772
 ] 

Alan Gates commented on PIG-1037:
---------------------------------

The difference is much more than switching from dumping one tuple at a time to 
multiple tuples.  It is about how spilling is activated.  In the past, spilling 
was passive; it was done when the JVM informed us that memory was getting low.  
This did not work well as the JVM only checks memory usage when it garbage 
collects.  So by the time Pig was notified of a low-memory condition it was 
often too late, and we often ran out of memory while trying to spill.  Now, 
instead, spilling is active.  Pig sets aside a buffer for a bag to put its 
tuples in.  For default bags, once this buffer is full any additional tuples 
are written to disk.  For sorted or distinct bags, once the buffer is full it 
is sorted and dumped to disk, and new records go into the buffer.
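
To make the active approach concrete, here is a minimal sketch of the 
sorted-bag case described above.  It is not the code from the patch: the class 
name ActiveSpillSortedBag, the tuple-count limit (a real bag would more likely 
budget by bytes), and the use of plain strings as tuples are all assumptions 
made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an actively spilling sorted bag; not Pig's actual class. */
class ActiveSpillSortedBag {
    // Assumption: the buffer limit is counted in tuples, not bytes.
    private final int bufferLimit;
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    ActiveSpillSortedBag(int bufferLimit) {
        if (bufferLimit < 1) {
            throw new IllegalArgumentException("bufferLimit must be >= 1");
        }
        this.bufferLimit = bufferLimit;
    }

    /** Adds a tuple. If the buffer is already full, it is spilled first. */
    void add(String tuple) {
        if (buffer.size() >= bufferLimit) {
            spill();
        }
        buffer.add(tuple);
    }

    /** Sorts the buffer, writes it to a temp file as one sorted run, and empties it. */
    private void spill() {
        Collections.sort(buffer);
        try {
            Path file = Files.createTempFile("bag-spill", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, buffer); // one tuple per line
            spillFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
    }

    int spillCount() {
        return spillFiles.size();
    }

    /**
     * Returns every tuple in sorted order by merging the spilled runs with the
     * in-memory buffer. For brevity each run is read back whole; a real
     * implementation would stream the runs instead.
     */
    List<String> sortedContents() {
        List<List<String>> runs = new ArrayList<>();
        try {
            for (Path p : spillFiles) {
                runs.add(Files.readAllLines(p));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> inMemory = new ArrayList<>(buffer);
        Collections.sort(inMemory);
        runs.add(inMemory);

        // k-way merge: each queue entry is {run index, position within that run}.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<String> run = runs.get(head[0]);
            out.add(run.get(head[1]));
            if (head[1] + 1 < run.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }
}
```

With a limit of 2, adding "d", "b", "a", "c", "e" in that order spills twice 
(first the run b,d and then the run a,c), leaves "e" in memory, and 
sortedContents() returns a, b, c, d, e.  The point of the design is that the 
decision to spill is made by the bag itself on every add, so it never depends 
on a notification from the garbage collector arriving in time.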

This particular patch only adds the change for sorted and distinct bags.  
PIG-975 contains the original patch for default bags.


> better memory layout and spill for sorted and distinct bags
> -----------------------------------------------------------
>
>                 Key: PIG-1037
>                 URL: https://issues.apache.org/jira/browse/PIG-1037
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Ying He
>             Fix For: 0.6.0
>
>         Attachments: PIG-1037.patch, PIG-1037.patch2, PIG-1037.patch3
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.