[ 
https://issues.apache.org/jira/browse/DERBY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521308
 ] 

Bryan Pendleton commented on DERBY-3002:
----------------------------------------

Hi Manish thanks for reading the diff. You are absolutely right that it is 
possible to compute the rollups with a single sort, but I haven't made that 
optimization yet. I shall work on improving the wiki to make this clearer.

As I see it, there are basically 3 ways to perform grouped aggregation, and 
numerous ways to combine these three variants.
1) If the records to be grouped are already in sorted order, perhaps due to a 
scan of an index, or a previous merge-sort-join operation, or a 
duplicate-eliminating distinct sort, then the grouping operation can be 
implemented by comparing each pair of records on the grouping attributes; each 
time a new value of the grouping attributes is encountered, the computation of 
the previous group can be finalized and emitted. This is the algorithm 
discussed in the wiki page, and is currently implemented by Derby for a single 
set of grouping attributes; I have not yet constructed a patch to extend this 
logic for a ROLLUP set of groups.
2) An unsorted set of records can be grouped by feeding them into the sorter, 
configured for duplicate elimination, with a sort observer which aggregates 
each row's values into the retained row's columns during duplicate processing. 
Each such sort groups the records by a specific set of grouping attributes. The 
patch that I constructed uses N sorts to group the records by N different sets 
of grouping attributes. The input rows are read only once, but then are added 
to each of the N sorts. This is dramatically more efficient than the alternate 
query-rewriting algorithm of constructing N GROUP BY statements and then 
UNION'ing their results together, since in this implementation the input rows 
are materialized and read only once.
3) An unsorted set of records can also be grouped by feeding them into a hash 
table, whose keys are the grouping attributes, with a hash-collision observer 
which aggregates each row's values into the retained row's columns during 
duplicate key elimination. This is fundamentally similar to the sort-based 
technique, with the important difference that at the end of the processing, the 
rows will NOT be emitted in sorted order.

Derby currently implements techniques (1) and (2) for grouped aggregate 
computation; I am not aware of any code which implements a hash-based technique 
for grouped aggregates in Derby.

As Manish notes, techniques (1) and (2) can be combined into an algorithm which 
computes the N ROLLUP groups using a single sort, by first sorting (and 
aggregating) the rows on their finest level of grouping detail. Then, as the 
rows are being retrieved in sorted order, the code could detect when a new 
value of grouping attributes is being retrieved, and could automatically 
compute that higher level group at that time. Thus if we are computing the 
GROUP BY ABCD, ABC, AB, and A, we can:
 - sort (and group and aggregate) the data on ABCD
 - For each row we retrieve, we can propagate it up the pipeline to compute the 
appropriate ABC group, the appropriate AB group, the appropriate A group, and 
the appropriate all-rows group.

This algorithm seems to hold the potential for being quite efficient, but I 
haven't investigated the actual coding yet, and for the time being as an 
experiment I investigated the easier-to-code but less efficient multiple-sorts 
algorithm.


> Add support for GROUP BY ROLLUP
> -------------------------------
>
>                 Key: DERBY-3002
>                 URL: https://issues.apache.org/jira/browse/DERBY-3002
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Bryan Pendleton
>            Assignee: Bryan Pendleton
>            Priority: Minor
>         Attachments: prototypeChangeNoTests.diff
>
>
> Provide an implementation of the ROLLUP form of multi-dimensional grouping 
> according to the SQL standard.
> See http://wiki.apache.org/db-derby/OLAPRollupLists for some more detailed 
> information about this aspect of the SQL standard.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to