[ https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------
Resolution: Fixed
Release Note:
A new builtin UDF, CubeDimensions, is added to simplify the process of
producing cube-like aggregations.
CubeDimensions produces a DataBag containing every combination of the argument
tuple's members, as in a data cube. For example, (a, b, c) produces the
following bag:
{ (a, b, c), (null, null, null), (a, b, null), (a, null, c),
(a, null, null), (null, b, c), (null, null, c), (null, b, null) }
The "all" marker is null by default, but can be set to an arbitrary string by
invoking a constructor (via a DEFINE). The constructor takes a single argument,
the string you want to represent "all".
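The enumeration described above can be sketched in plain Java. This is only an illustration of the idea, not the code committed in the patch: the class name CubeSketch and the method cube are hypothetical, and the order of the generated tuples is an artifact of this sketch rather than something the UDF promises. Each bit of a counter decides whether a dimension keeps its value or is replaced by the "all" marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; this is NOT the Pig UDF itself.
final class CubeSketch {

    // Returns all 2^n combinations of dims, where each position either keeps
    // its value or is replaced by allMarker (pass null for the default marker).
    static List<List<String>> cube(List<String> dims, String allMarker) {
        int n = dims.size();
        List<List<String>> points = new ArrayList<>(1 << n);
        // Bit i of mask set means "roll dimension i up to the all marker".
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> point = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                point.add((mask & (1 << i)) != 0 ? allMarker : dims.get(i));
            }
            points.add(point);
        }
        return points;
    }

    public static void main(String[] args) {
        // Prints the eight cube points for (a, b, c) with null as the marker.
        for (List<String> p : cube(Arrays.asList("a", "b", "c"), null)) {
            System.out.println(p);
        }
    }
}
```

Passing a string such as "ALL" as the second argument mirrors what the constructor argument does for the UDF: the rolled-up positions carry that string instead of null.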
Usage goes something like this:
events = load '/logs/events' using EventLoader() as (lang, event, app_id, measure);
cubed = foreach events generate
    FLATTEN(piggybank.CubeDimensions(lang, event, app_id))
        as (lang, event, app_id),
    measure;
cube = foreach (group cubed
                by (lang, event, app_id) parallel $P)
       generate
    flatten(group) as (lang, event, app_id),
    COUNT_STAR(cubed),
    SUM(cubed.measure);
store cube into 'event_cube';
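Note: doing this with non-algebraic aggregations on large data can result in
very slow reducers, because the group in which every dimension is rolled up to
"all" receives every record in your relation, and a non-algebraic function
cannot use the combiner to shrink that group before it reaches the reducer.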
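To make the skew concrete, here is a small Java simulation, offered as a rough sketch rather than anything taken from the patch. The class SkewSketch, the method groupSizes, and the sample rows are all made up for illustration, and the "all" marker is rendered as the text null inside the group key. It cubes each input row and counts how many cubed rows fall into each group.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation for illustration; not part of Pig.
final class SkewSketch {

    // Cubes every row and returns the number of cubed rows per group key.
    // Keys are comma-joined dimension values, with "null" for rolled-up ones.
    static Map<String, Integer> groupSizes(String[][] rows) {
        Map<String, Integer> sizes = new HashMap<>();
        for (String[] row : rows) {
            int n = row.length;
            for (int mask = 0; mask < (1 << n); mask++) {
                StringBuilder key = new StringBuilder();
                for (int i = 0; i < n; i++) {
                    if (i > 0) key.append(',');
                    key.append((mask & (1 << i)) != 0 ? "null" : row[i]);
                }
                sizes.merge(key.toString(), 1, Integer::sum);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Made-up (lang, event) rows.
        String[][] rows = {
            {"en", "click"}, {"en", "view"}, {"fr", "click"}, {"de", "view"}
        };
        Map<String, Integer> sizes = groupSizes(rows);
        // The fully rolled-up group holds one cubed row per input row.
        System.out.println("null,null -> " + sizes.get("null,null"));
        System.out.println("en,null   -> " + sizes.get("en,null"));
        System.out.println("en,click  -> " + sizes.get("en,click"));
    }
}
```

In this toy input the fully rolled-up group holds as many rows as the whole relation, while the most specific groups hold one each. That is the imbalance the note warns about when the aggregation cannot be partially computed before the reduce phase.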
Status: Resolved (was: Patch Available)
Committed to 0.10
> CubeDimensions UDF
> ------------------
>
> Key: PIG-2168
> URL: https://issues.apache.org/jira/browse/PIG-2168
> Project: Pig
> Issue Type: Sub-task
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Fix For: 0.10
>
> Attachments: PIG-2168.2.patch, PIG-2168.patch
>
>
> A prerequisite for a naive cubing implementation:
> A UDF that, given a set of dimensions (a, b, c) generates all the points on
> the cube:
> (a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a,
> null, null), (null, b, null), (null, null, null).