[ 
https://issues.apache.org/jira/browse/PIG-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2168:
-----------------------------------

      Resolution: Fixed
    Release Note: 
A new builtin UDF, CubeDimensions, is added to simplify the process of 
producing cube-like aggregations.

CubeDimensions produces a DataBag containing every combination of the argument 
tuple's members, as in a data cube, with each omitted member replaced by an 
"all" marker. For example, (a, b, c) produces the following bag:

 { (a, b, c), (null, null, null), (a, b, null), (a, null, c),
   (a, null, null), (null, b, c), (null, null, c), (null, b, null) }
 
The "all" marker is null by default, but can be set to an arbitrary string by 
invoking a constructor (via a DEFINE). The constructor takes a single argument, 
the string you want to represent "all".
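
A minimal sketch of overriding the marker, assuming the UDF resolves under its 
bare builtin name. The alias CubeAll and the marker string 'ALL' are 
illustrative choices, not something the patch prescribes:

```pig
-- CubeAll is a hypothetical alias; 'ALL' is an arbitrary marker string.
DEFINE CubeAll CubeDimensions('ALL');

events = load '/logs/events' using EventLoader() as (lang, event, app_id);
-- Rolled-up dimensions now carry 'ALL' instead of null.
cubed = foreach events generate
  FLATTEN(CubeAll(lang, event, app_id)) as (lang, event, app_id);
```

Whatever marker is chosen should be a string that cannot collide with a real 
dimension value, otherwise rolled-up rows become indistinguishable from 
ordinary ones.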

Typical usage looks like this:

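 -- 'measure' is assumed to be a numeric field supplied by the loader.
 events = load '/logs/events' using EventLoader()
          as (lang, event, app_id, measure);
 -- Expand each record into one row per cube point.
 cubed = foreach events generate
   FLATTEN(CubeDimensions(lang, event, app_id))
     as (lang, event, app_id),
   measure;
 -- Aggregate per cube point; $P is the reducer parallelism parameter.
 cube = foreach (group cubed
                 by (lang, event, app_id) parallel $P)
        generate
   flatten(group) as (lang, event, app_id),
   COUNT_STAR(cubed),
   SUM(cubed.measure);
 store cube into 'event_cube';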

Note: using non-algebraic aggregations on large data can result in very slow 
reducers, because the group in which every dimension is "all" (the 
(null, null, null) group by default) receives every record in your relation.
          Status: Resolved  (was: Patch Available)

Committed to 0.10

> CubeDimensions UDF
> ------------------
>
>                 Key: PIG-2168
>                 URL: https://issues.apache.org/jira/browse/PIG-2168
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.10
>
>         Attachments: PIG-2168.2.patch, PIG-2168.patch
>
>
> A prerequisite for a naive cubing implementation:
> A UDF that, given a set of dimensions (a, b, c), generates all the points on 
> the cube:
> (a, b, c), (a, b, null), (a, null, c), (null, b, c), (null, null, c), (a, 
> null, null), (null, b, null), (null, null, null).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
