[ https://issues.apache.org/jira/browse/PIG-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair reassigned PIG-1473:
----------------------------------

    Assignee: Thejas M Nair

> Avoid serialization/deserialization costs for PigStorage data - Use custom Map and Bag implementation
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1473
>                 URL: https://issues.apache.org/jira/browse/PIG-1473
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>
> The cost of serialization/deserialization (sedes) can be very high, and avoiding
> it will improve performance.
> Avoid sedes when possible by implementing approach #3 proposed in
> http://wiki.apache.org/pig/AvoidingSedes .
> The load function uses subclasses of Map and DataBag that hold the serialized
> copy of the data. It delays deserialization of map and bag types until a
> member function of java.util.Map or DataBag is called.
> Example of a query where this will help:
> {CODE}
> l = LOAD 'file1' AS (a : int, b : map [ ]);
> f = FOREACH l GENERATE udf1(a), b;
> fil = FILTER f BY $0 > 5;
> dump fil; -- Deserialization of column b can be delayed until here using this approach.
> {CODE}

-- 
This message is automatically generated by JIRA.
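A minimal Java sketch of the lazy-map idea described in the issue: the map holds the serialized copy and parses it only when a java.util.Map method needs the contents. The class name `LazyMap`, the methods `isDeserialized()` and `toBytes()`, and the byte format are all hypothetical, not Pig's actual implementation. The format is assumed to be a simplified version of Pig's `[key#value,...]` map text syntax (no nesting, no escaping, values kept as strings). A DataBag counterpart would presumably follow the same pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: a Map that keeps the raw bytes it was loaded from and
 * only parses them when a java.util.Map method actually needs the contents.
 */
class LazyMap extends AbstractMap<String, Object> {
    private byte[] serialized;          // raw bytes from the input; null once parsed
    private Map<String, Object> parsed; // null until first access

    LazyMap(byte[] serialized) {
        this.serialized = serialized;
    }

    /** True once the serialized bytes have been parsed into a real map. */
    boolean isDeserialized() {
        return parsed != null;
    }

    // Parse on first use. Assumed format: "[k1#v1,k2#v2]", no nesting or escaping.
    private Map<String, Object> contents() {
        if (parsed == null) {
            parsed = new HashMap<>();
            String text = new String(serialized, StandardCharsets.UTF_8);
            String body = text.substring(1, text.length() - 1); // strip '[' and ']'
            if (!body.isEmpty()) {
                for (String pair : body.split(",")) {
                    int sep = pair.indexOf('#');
                    parsed.put(pair.substring(0, sep), pair.substring(sep + 1));
                }
            }
            serialized = null; // the parsed map is now authoritative
        }
        return parsed;
    }

    // AbstractMap builds size(), containsKey(), etc. on top of entrySet(),
    // so every Map operation goes through contents() and triggers the parse.
    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        return contents().entrySet();
    }

    @Override
    public Object get(Object key) {
        return contents().get(key);
    }

    @Override
    public Object put(String key, Object value) {
        return contents().put(key, value);
    }

    /** Bytes for output: reuses the original bytes if the map was never touched. */
    byte[] toBytes() {
        if (serialized != null) {
            return serialized; // pass-through: no deserialize/serialize round trip
        }
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<String, Object> e : parsed.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append(e.getKey()).append('#').append(e.getValue());
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "[x#1,y#2]".getBytes(StandardCharsets.UTF_8);

        LazyMap passThrough = new LazyMap(raw);
        System.out.println(passThrough.isDeserialized()); // false
        System.out.println(passThrough.toBytes() == raw);  // true: original bytes reused

        LazyMap accessed = new LazyMap(raw);
        System.out.println(accessed.get("y"));             // 2
        System.out.println(accessed.isDeserialized());     // true
    }
}
```

The savings come from the pass-through path in `toBytes()`: if a map flows through the pipeline without any Map method being called on it, its original bytes can be written back out with no parse/re-serialize round trip. Once any method is called, the sketch parses once and thereafter treats the parsed map as the source of truth.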