[
https://issues.apache.org/jira/browse/PIG-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhijie Shen updated PIG-2163:
-----------------------------
Attachment: PIG-2163_1.patch
Hi Daniel,
I have revised the patch according to your comments.
Now the right-most bag is streamed and the cross product is generated on
the fly. Additionally, to make the order of the generated tuples more
natural, I reversed the iteration order over the n bags: they are now
visited in n, n - 1, ..., 2, 1 order instead of the odd 2, 3, ..., n - 1,
n, 1 order, so the left-most bag varies fastest in the output. For
example, given three bags from left to right:
bag #1 {(a, 1), (a, 2)}
bag #2 {(a, 11), (a, 22)}
bag #3 {(a, 111), (a, 222)}
the generated bag will be:
{
(a, 1, a, 11, a, 111),
(a, 2, a, 11, a, 111),
(a, 1, a, 22, a, 111),
(a, 2, a, 22, a, 111),
(a, 1, a, 11, a, 222),
(a, 2, a, 11, a, 222),
(a, 1, a, 22, a, 222),
(a, 2, a, 22, a, 222)
}
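To make the ordering concrete, here is a minimal Python sketch of the idea. It is not the code in the patch (which is Java inside Pig); the names nested_cross, materialized and streamed are hypothetical and chosen only for illustration. It assumes every bag except the right-most one fits in memory, and that the right-most bag is read exactly once.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def nested_cross(materialized: Sequence[List[Tuple]],
                 streamed: Iterable[Tuple]) -> Iterator[Tuple]:
    """Cross the in-memory bags (given left to right) with a streamed
    right-most bag, yielding result tuples lazily."""

    def expand(index: int, suffix: Tuple) -> Iterator[Tuple]:
        # index walks from the right-most in-memory bag down to bag #1,
        # so bag #1 ends up in the innermost loop and varies fastest.
        if index < 0:
            yield suffix
            return
        for t in materialized[index]:
            yield from expand(index - 1, t + suffix)

    # The streamed bag drives the outermost loop; each of its tuples is
    # consumed once and never stored.
    for s in streamed:
        yield from expand(len(materialized) - 1, s)


bag1 = [("a", 1), ("a", 2)]
bag2 = [("a", 11), ("a", 22)]
bag3 = (t for t in [("a", 111), ("a", 222)])  # a one-pass stand-in for a stream

for row in nested_cross([bag1, bag2], bag3):
    print(row)
```

With the three example bags this yields the eight tuples in the same order as the listing above. Putting the streamed bag in the outermost loop is what allows it to be read only once; the other bags are re-iterated for each streamed tuple and therefore have to be materialized.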
> Improve nested cross to stream one relation
> -------------------------------------------
>
> Key: PIG-2163
> URL: https://issues.apache.org/jira/browse/PIG-2163
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.10
> Reporter: Daniel Dai
> Assignee: Zhijie Shen
> Fix For: 0.10
>
> Attachments: PIG-2163.patch, PIG-2163_1.patch
>
>
> PIG-1916 added nested cross support to Pig. One optimization is that, instead
> of materializing all bags before producing the result, we can stream one of
> the inputs to save memory.