[ https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai closed PIG-844. -------------------------- > PERFORMANCE: streaming data to the UDFs in foreach > -------------------------------------------------- > > Key: PIG-844 > URL: https://issues.apache.org/jira/browse/PIG-844 > Project: Pig > Issue Type: Improvement > Reporter: Olga Natkovich > Fix For: 0.7.0 > > > Currently, Pig places the data passed to UDFs into a bag. This can cause the > process to use more memory than actually needed as in many cases it would be > better to push the data one tuple at a time to the UDFs. > For the case where combiner is invoked, this might not be that important; > however, for non-algebraic UDFs as well as other cases where combiner can't > be used, this can provide significant memory improvement. > Another possible use case is where the data is already grouped going into pig > and we don't need to group it again. > How this will effect UDF interface needs to be further discussed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.