[ https://issues.apache.org/jira/browse/PIG-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1474:
-------------------------------

    Fix Version/s: 0.9.0
                       (was: 0.8.0)

Unlinking from the 0.8 release. I was planning to use the lazy implementations of Map and Bag proposed in PIG-1473 for this. Those objects would have held a copy of the serialized versions of the map and bag. But the plan in that jira had to be abandoned for the reasons mentioned there, so a different approach is required to solve this issue.

> Avoid serialization/deserialization costs for PigStorage data - Use custom Tuple
> --------------------------------------------------------------------------------
>
>                 Key: PIG-1474
>                 URL: https://issues.apache.org/jira/browse/PIG-1474
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>
> Avoid sedes when possible for data loaded using PigStorage by implementing
> approach #4 proposed in http://wiki.apache.org/pig/AvoidingSedes .
> The write() and readFields() functions of the tuple returned by TupleFactory
> are used to serialize data between Map and Reduce. By using a tuple that
> knows the serialization format of the loader, we avoid sedes at the
> Map-Reduce boundary and use the load function's serialized format between
> Map and Reduce.
> To use a new custom tuple for this purpose, a custom TupleFactory that
> returns tuples of this type has to be specified using the property
> "pig.data.tuple.factory.name" .
> This approach will work only for a set of load functions in the query that
> share the same serialization format for maps and bags. If this approach
> proves to be very useful, it will build a case for a more extensible approach.
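
For readers unfamiliar with the idea, the following is a minimal, self-contained sketch of a tuple that "knows the serialization format of the loader". It is not Pig code and not the implementation planned for this issue: the class RawLineTuple, its delimiter handling, and the roundTrip helper are hypothetical names made up for illustration. Only the write()/readFields() method pair mirrors the functions named in the description. The point it tries to show is that the record's original bytes can be passed through unchanged, with fields parsed only if someone asks for them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Pig's Tuple API): keeps the record exactly as the
 * loader read it and reuses those bytes when serializing.
 */
public class RawLineTuple {
    private byte[] raw;           // delimited record, as read by the loader
    private List<String> fields;  // parsed lazily, only on first field access
    private final char delimiter; // assumed single-character field delimiter

    public RawLineTuple(char delimiter) {
        this.delimiter = delimiter;
        this.raw = new byte[0];
    }

    public RawLineTuple(byte[] raw, char delimiter) {
        this.raw = raw.clone();
        this.delimiter = delimiter;
    }

    /** Writes the loader's bytes as-is (length-prefixed); no field is parsed. */
    public void write(DataOutput out) throws IOException {
        out.writeInt(raw.length);
        out.write(raw);
    }

    /** Reads the bytes back without deserializing individual fields. */
    public void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        raw = new byte[length];
        in.readFully(raw);
        fields = null; // any earlier parse is stale now
    }

    /** Splits the record on first access and returns field i as a string. */
    public String get(int i) {
        if (fields == null) {
            String line = new String(raw, StandardCharsets.UTF_8);
            String separator = Pattern.quote(String.valueOf(delimiter));
            fields = Arrays.asList(line.split(separator, -1));
        }
        return fields.get(i);
    }

    /** True once the record has been split into fields. */
    public boolean isParsed() {
        return fields != null;
    }

    /** Simulates the map-to-reduce hop with an in-memory buffer. */
    public static RawLineTuple roundTrip(RawLineTuple t) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        t.write(new DataOutputStream(buffer));
        RawLineTuple copy = new RawLineTuple(t.delimiter);
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));
        return copy;
    }
}
```

In this sketch a tuple that is only shuffled from map to reduce is never parsed on either side. In Pig itself, tuples of such a type would have to come from a custom TupleFactory selected through the "pig.data.tuple.factory.name" property mentioned in the description; that wiring is not shown here.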