[ 
https://issues.apache.org/jira/browse/PIG-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066023#comment-13066023
 ] 

Dmitriy V. Ryaboy commented on PIG-2167:
----------------------------------------

I believe there is value in providing the naive solution first and improving 
on it later, rather than trying to build the optimal plan from the get-go.

Initial (naive) implementation plan:

Add an optional "WITH CUBE" clause to the group operator.

In LogicalPlanBuilder, if "WITH CUBE" is present, insert operators equivalent 
to the following upstream of the group operator, so that the group sees the 
expanded rows:

{code}
relation = foreach relation generate
   FLATTEN(CubeDimensions(dim1, dim2, dim3))
     as (dim1, dim2, dim3),
   other_fields;
{code}
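
As an illustration of what that foreach is expected to produce, here is a 
rough Python sketch of the expansion. It is not the code of the UDF from 
PIG-2168: the function name cube_dimensions and the sample values are made up, 
and it assumes the UDF emits every combination in which each dimension is 
either kept or replaced by the "all" marker (null, shown as None).

```python
from itertools import product


def cube_dimensions(dims, all_marker=None):
    """Return every variant of `dims` where any subset of positions
    is replaced by `all_marker` (hypothetical stand-in for the UDF)."""
    # For each position, choose the real value or the marker: 2**n combinations.
    choices = [(value, all_marker) for value in dims]
    return list(product(*choices))


rows = cube_dimensions(("us", "web", "2011"))
for row in rows:
    print(row)
```

With three dimensions, one input row becomes 2^3 = 8 rows, so a plain group 
on (dim1, dim2, dim3) afterwards yields every level of the cube at once.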

It may be desirable in some cases to group by a superset of the dimensions one 
wants to cube on, for example: group by dim1, dim2, dim3 with cube on (dim1, 
dim2). To support that use case, we only need to call the UDF on (dim1, dim2) 
and push dim3 into the other_fields list.
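
A small Python sketch of that superset case, under the same assumptions as 
before (expand_row and the field names are hypothetical, and this is not Pig 
code): only the cubed fields are expanded, and every other field is copied 
through unchanged.

```python
from itertools import product


def expand_row(row, cube_on, all_marker=None):
    """Expand one record into its cube variants.

    Fields named in `cube_on` are either kept or replaced by `all_marker`;
    all other fields (e.g. dim3, measures) pass through untouched.
    """
    choices = [(row[name], all_marker) for name in cube_on]
    variants = []
    for combo in product(*choices):
        variant = dict(row)                   # copy the pass-through fields
        variant.update(zip(cube_on, combo))   # overwrite only the cubed fields
        variants.append(variant)
    return variants


row = {"dim1": "us", "dim2": "web", "dim3": "2011", "clicks": 7}
variants = expand_row(row, cube_on=("dim1", "dim2"))
```

Here the row expands to 4 variants instead of 8, and dim3 is "2011" in all of 
them, so grouping on (dim1, dim2, dim3) cubes over dim1 and dim2 only.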

Note also that there's a bit of a problem if null values are legitimate values 
for the dimensions, as we use null to indicate "all". The UDF provided in 
PIG-2168 allows one to use custom strings instead of null for the "all" marker. 
We can optionally support this in the grammar, as well.
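
The ambiguity can be seen in a toy Python sketch (again hypothetical, not the 
UDF itself; cube_counts, the "*" marker and the data are invented for the 
example). It counts rows per cube cell, once with null as the "all" marker and 
once with a custom string.

```python
from collections import Counter
from itertools import product


def cube_counts(rows, all_marker=None):
    """Count input rows per cube cell, using `all_marker` for "all"."""
    counts = Counter()
    for dims in rows:
        choices = [(value, all_marker) for value in dims]
        # set() so a row contributes at most once to each cell, even when a
        # real value happens to equal the marker.
        for key in set(product(*choices)):
            counts[key] += 1
    return counts


rows = [("us", None), ("us", "web")]   # dim2 is genuinely null in the first row
with_null = cube_counts(rows)
with_star = cube_counts(rows, all_marker="*")

print(with_null[("us", None)])   # one cell for both "dim2 is null" and "all dim2"
print(with_star[("us", None)])   # rows where dim2 is really null
print(with_star[("us", "*")])    # all rows for dim1 = "us"
```

With null as the marker, the cell ("us", null) mixes two different questions; 
with a distinct marker they land in separate cells.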

> CUBE operation in Pig
> ---------------------
>
>                 Key: PIG-2167
>                 URL: https://issues.apache.org/jira/browse/PIG-2167
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>             Fix For: 0.10
>
>
> Computing aggregates over a cube of several dimensions is a common operation 
> in data warehousing.
> The standard SQL syntax is "GROUP relation BY dim1, dim2, dim3 WITH CUBE" -- 
> which in addition to all dim1-2-3, produces aggregations for just dim1, just 
> dim1 and dim2, etc. NULL is generally used to represent "all".
> A presentation by Arnab Nandi describes how one might implement efficient 
> cubing in Map-Reduce here: http://pdf.cx/44wrk
> We can start with the naive solution which only works for algebraic measures, 
> and work up from there.
