[ https://issues.apache.org/jira/browse/DERBY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724860#action_12724860 ]

Bryan Pendleton commented on DERBY-3002:
----------------------------------------

Indeed, I *am* concerned that there may be some circumstances where the
new technique is less efficient. An example might be a case where there
is a large amount of data to be processed, but a very small number of
groups. In this case, most of the raw data ends up being discarded
during the GROUP BY processing, and the sort-observer technique will
discard the unneeded data sooner, which means it may have a substantial
edge over the new algorithm.

However, other circumstances may not see much impact at all:
 - those in which an index exists, and thus the data can be processed
in sorted order automatically
 - those in which the data is relatively small, and thus is not
expensive to sort
 - those in which there is a lot of data, but there are also many
different values for the grouping column
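
To make the trade-off concrete, here is a toy Java sketch of the two
strategies. It is not Derby code: the class GroupBySketch, the Row type,
and both method names are made up for illustration, and a TreeMap merely
stands in for a sorter that folds duplicate groups together as rows
arrive. The point it tries to show is only that the first strategy
retains one entry per group, while the second retains every input row
until the sort has finished.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Derby.
public class GroupBySketch {

    /** One input row: a grouping key and a value to be summed. */
    static final class Row {
        final String key;
        final long value;

        Row(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Stand-in for the sort-observer idea: fold each row into its group's
     * running total as soon as it arrives, so at most one entry per
     * distinct key is ever retained.
     */
    static Map<String, Long> aggregateWhileSorting(List<Row> rows) {
        Map<String, Long> totals = new TreeMap<>();
        for (Row r : rows) {
            totals.merge(r.key, r.value, Long::sum);
        }
        return totals;
    }

    /**
     * Stand-in for the new algorithm: sort every row first, then make one
     * pass over the sorted rows, emitting a total each time the key changes.
     */
    static Map<String, Long> sortThenScan(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows); // retains every input row
        sorted.sort(Comparator.comparing((Row r) -> r.key));

        Map<String, Long> totals = new LinkedHashMap<>();
        String current = null;
        long sum = 0;
        for (Row r : sorted) {
            if (current != null && !current.equals(r.key)) {
                totals.put(current, sum); // key changed: close the group
                sum = 0;
            }
            current = r.key;
            sum += r.value;
        }
        if (current != null) {
            totals.put(current, sum); // close the final group
        }
        return totals;
    }

    public static void main(String[] args) {
        // Many rows, only two groups: the case described above.
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(new Row(i % 2 == 0 ? "a" : "b", 1));
        }
        System.out.println(aggregateWhileSorting(rows));
        System.out.println(sortThenScan(rows));
    }
}
```

Both methods return the same totals; they differ only in how much data
is held while the work is in progress. If the input already arrives in
key order (for example from an index), the sort step in sortThenScan is
unnecessary and only the single pass remains.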

And, I am just guessing about the performance impacts; I don't know how
important this distinction will be, given the multitude of other things
that occur during query processing.

Earlier in the project, I intended to support multiple algorithms.
However, I then realized:
 - we'd have to have multiple sets of code for the same functionality
 - we'd have to have some way of determining, at runtime, which
implementation to choose.

Both problems seemed quite troubling. The second seemed especially
important, because if the selection of the appropriate algorithm is
based on information about the size and distribution of the data being
processed by the query, then the decision ought to be made by the
optimizer, which appeared likely to dramatically increase the
complexity of this project.

So it would be great if the single implementation were "good enough"
for the queries we expect to run.

I'll try to put a benchmark together so we can see what the results
say; then we'll have a better idea of how big a problem we have here.

> Add support for GROUP BY ROLLUP
> -------------------------------
>
>                 Key: DERBY-3002
>                 URL: https://issues.apache.org/jira/browse/DERBY-3002
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 10.4.1.3
>            Reporter: Bryan Pendleton
>            Assignee: Bryan Pendleton
>            Priority: Minor
>         Attachments: fixWhiteSpace.diff, IncludesASimpleTest.diff, 
> passesRegressionTests.diff, prototypeChangeNoTests.diff, 
> rewriteGroupByRS.diff, rollupNullability.diff, useLookahead.diff
>
>
> Provide an implementation of the ROLLUP form of multi-dimensional grouping 
> according to the SQL standard.
> See http://wiki.apache.org/db-derby/OLAPRollupLists for some more detailed 
> information about this aspect of the SQL standard.

