Keep tuples serialized to limit spilling and speed it when it happens
---------------------------------------------------------------------
Key: PIG-1875
URL: https://issues.apache.org/jira/browse/PIG-1875
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Alan Gates
Priority: Minor
Currently Pig reads records off the reduce iterator and immediately
deserializes them into Java objects. The deserialized objects take up much
more memory than their serialized form, so Pig spills sooner than it would if
it kept the records serialized. When it does have to spill, it must serialize
the records again, and then deserialize them once more after reading them
back from the spill file.
We should explore keeping records serialized in memory as they are read off
the reduce iterator.
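
A rough sketch of the idea in Java, to make the proposal concrete. Everything
here is hypothetical: SerializedBag, encode and decode are illustrative names,
not Pig's DataBag or Tuple API, and the record format (a field count followed
by UTF strings) is an assumption chosen only to keep the example short. The
point is that records stay as byte arrays, a spill just copies those bytes,
and deserialization happens only when a consumer asks for a record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Pig.
class SerializedBag {
    // Records are held exactly as they arrived: one byte[] per record.
    private final List<byte[]> records = new ArrayList<>();

    void addSerialized(byte[] raw) { records.add(raw); }

    int size() { return records.size(); }

    // Spill by copying the raw bytes with a length prefix.
    // No object graph is walked and nothing is re-serialized.
    void spill(DataOutputStream out) {
        try {
            out.writeInt(records.size());
            for (byte[] r : records) {
                out.writeInt(r.length);
                out.write(r);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        records.clear(); // memory is released once the bytes are written
    }

    // Reading a spill back also keeps the records serialized.
    void readSpill(DataInputStream in) {
        try {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                byte[] r = new byte[in.readInt()];
                in.readFully(r);
                records.add(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize lazily, only for the record a consumer asks for.
    String[] get(int i) { return decode(records.get(i)); }

    // Assumed wire format: field count, then each field as a UTF string.
    static byte[] encode(String... fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(fields.length);
            for (String f : fields) out.writeUTF(f);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String[] decode(byte[] raw) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
            String[] fields = new String[in.readInt()];
            for (int i = 0; i < fields.length; i++) fields[i] = in.readUTF();
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the trade-off is that every access to a record pays the
deserialization cost, so it helps most when records are read once or when
avoiding a spill outweighs the repeated decoding.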