[
https://issues.apache.org/jira/browse/PIG-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olga Natkovich updated PIG-1631:
--------------------------------
Fix Version/s: 0.10
> Support to 2 level nested foreach
> ---------------------------------
>
> Key: PIG-1631
> URL: https://issues.apache.org/jira/browse/PIG-1631
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Fix For: 0.10
>
>
> What I would like to do is generate certain metrics for every listing
> impression in the context of a page like clicks on the page etc. So, I first
> group by to get clicks and impression together. Now, I would want to iterate
> through the mini-table (one per serve-id) and compute metrics. Since nested
> foreach within foreach is not supported I ended up writing a UDF that took
> both the bags and computed the metric. It would have been elegant to keep the
> logic of iterating over the records outside in the PIG script.
> Here is some pseudocode of how I would have liked to write it:
> {code}
> -- Let us say in our page context there was click on rank 2 for which there
> were 3 ads
> A1 = LOAD '...' AS (page_id, rank); -- clicks.
> A2 = Load '...' AS (page_id, rank); -- impressions
> B = COGROUP A1 by (page_id), A2 by (page_id);
> -- Let us say B contains the following schema
> -- (group, {(A1...)} {(A2...)})
> -- Each record would be in B would be:
> -- page_id_1, {(page_id_1, 2)} {(page_id_1, 1) (page_id_1, 2) (page_id_1, 3))}
> C = FOREACH B GENERATE {
> D = FLATTEN(A1), FLATTEN(A2); -- This wont work in current
> pig as well. Basically, I would like a mini-table which represents an entire
> serve.
> FOREACH D GENERATE
> page_id_1,
> A2:rank,
> SOMEUDF(A1:rank, A2::rank); -- This UDF returns a
> value (like v1, v2, v3 depending on A1::rank and A2::rank)
> };
> # output
> # page_id, 1, v1
> # page_id, 2, v2
> # page_id, 3, v3
> DUMP C;
> {code}
> P.S: I understand that I could have alternatively, flattened the fields of B
> and then done a GROUP on page_id and then iterated through the records
> calling 'SOMEUDF' appropriately but that would be 2 map-reduce operations
> AFAIK.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira