Keep tuples serialized to limit spilling and speed it when it happens
---------------------------------------------------------------------

                 Key: PIG-1875
                 URL: https://issues.apache.org/jira/browse/PIG-1875
             Project: Pig
          Issue Type: Improvement
          Components: impl
            Reporter: Alan Gates
            Priority: Minor


Currently Pig reads records off of the reduce iterator and immediately 
deserializes them into Java objects.  The deserialized objects take up much 
more memory than their serialized form, so Pig spills sooner than it would if 
it kept the records serialized.  Also, when it does have to spill, it has to 
serialize the records again, and then deserialize them again after reading 
them back from the spill file.

We should explore keeping the records in memory in serialized form as they 
are read off of the reduce iterator, and deserializing them only when needed.
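
To make the proposal concrete, here is a minimal sketch of the idea, not Pig's 
actual implementation.  The class SerializedBag, its methods, and the toy 
record encoding (a field count followed by UTF strings) are all hypothetical 
names invented for illustration.  The bag holds each record as the byte array 
it arrived in, so a spill is a plain byte copy and deserialization is deferred 
until a record is actually requested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bag that keeps records as serialized bytes (not a Pig class). */
class SerializedBag {
    private final List<byte[]> records = new ArrayList<>();
    private long bytesHeld = 0;

    /** Stores the record as received; nothing is deserialized here. */
    void add(byte[] serializedRecord) {
        records.add(serializedRecord);
        bytesHeld += serializedRecord.length;
    }

    int size() { return records.size(); }

    /** Bytes of record payload currently held in memory. */
    long bytesHeld() { return bytesHeld; }

    /** Deserializes a single record on demand. */
    String[] get(int index) {
        return decode(records.get(index));
    }

    /** Spills by copying bytes: a length prefix, then the record, no re-serialization. */
    void spill(OutputStream out) {
        try {
            DataOutputStream data = new DataOutputStream(out);
            for (byte[] r : records) {
                data.writeInt(r.length);
                data.write(r);
            }
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        records.clear();
        bytesHeld = 0;
    }

    /** Toy encoding assumed for this sketch: field count, then each field as UTF. */
    static byte[] encode(String... fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(fields.length);
            for (String f : fields) {
                out.writeUTF(f);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Inverse of encode(). */
    static String[] decode(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            String[] fields = new String[in.readInt()];
            for (int i = 0; i < fields.length; i++) {
                fields[i] = in.readUTF();
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the spill threshold would be driven by bytesHeld(), which 
tracks the serialized payload size rather than an estimate of Java object 
overhead.  The cost is that every get() pays for deserialization, so the 
approach fits best when records are read once or rarely.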

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
