[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------

    Resolution: Fixed
  Release Note:
A new builtin UDF, CubeDimensions, is added to simplify the process of producing cube-like aggregations.

CubeDimensions produces a DataBag containing every combination of the argument tuple's members, as in a data cube. That is, (a, b, c) produces the following bag:

{ (a, b, c), (null, null, null), (a, b, null), (a, null, c), (a, null, null), (null, b, c), (null, null, c), (null, b, null) }

The "all" marker is null by default, but it can be set to an arbitrary string by invoking the constructor (via a DEFINE). The constructor takes a single argument: the string that should represent "all".

Usage goes something like this:

events = load '/logs/events' using EventLoader() as (lang, event, app_id, measure);
cubed = foreach events generate
    FLATTEN(piggybank.CubeDimensions(lang, event, app_id)) as (lang, event, app_id),
    measure;
cube = foreach (group cubed by (lang, event, app_id) parallel $P) generate
    flatten(group) as (lang, event, app_id),
    COUNT_STAR(cubed),
    SUM(cubed.measure);
store cube into 'event_cube';

Note: doing this with non-algebraic aggregations on large data can result in very slow reducers, since one of the groups (the one in which every dimension is "all") receives every record in the relation.

        Status: Resolved  (was: Patch Available)

Committed to 0.10

> CubeDimensions UDF
> ------------------
>
>                 Key: PIG-2168
>                 URL: https://issues.apache.org/jira/browse/PIG-2168
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.10
>
>         Attachments: PIG-2168.2.patch, PIG-2168.patch
>
>
> A prerequisite for a naive cubing implementation:
> A UDF that, given a set of dimensions (a, b, c), generates all the points on
> the cube:
> (a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c),
> (a, null, null), (null, b, null), (null, null, null).
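
For readers who want to check the enumeration by hand, here is a minimal Python sketch of the behaviour the release note describes. It is not the UDF's actual Java implementation; the function name cube_dimensions and the parameter all_marker are hypothetical, and the order of the returned tuples is an artifact of this sketch, not something the release note specifies.

```python
from itertools import product


def cube_dimensions(dims, all_marker=None):
    """Return every cube point for one input tuple (illustrative only).

    Each position either keeps its value or is replaced by all_marker,
    so a tuple with n members yields 2**n points.
    """
    points = []
    # Each mask holds one boolean per dimension:
    # True keeps the value, False rolls it up to the "all" marker.
    for mask in product((True, False), repeat=len(dims)):
        points.append(
            tuple(value if keep else all_marker for value, keep in zip(dims, mask))
        )
    return points


if __name__ == "__main__":
    # Default marker: None stands in for Pig's null.
    for point in cube_dimensions(("a", "b", "c")):
        print(point)

    # Custom marker, analogous to passing a string to the constructor via DEFINE.
    for point in cube_dimensions(("en", "click"), all_marker="ALL"):
        print(point)
```

Because every input tuple contributes exactly one point with all dimensions rolled up, the group keyed by that point grows with the full size of the relation, which is the slow-reducer case the note warns about.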