Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Army Mon, 24 Jul 2006 16:29:21 -0700

Hi Manish,

I have to be honest and say that I haven't taken the time to fully understandthe "guts" of this patch--namely, the part where you rewrite the group-by treein GroupByNode.addAggregateColumns(). But I did do some review of the overallpatch and I have the following comments. If you can address these I think theresultant patch will be a lot easier to read (and will also be smaller :) sothat I or anyone else reviewing can focus more on the "guts" of what this patchis really doing...


------------------------------------------------------------------

0) Patch doesn't apply cleanly with latest codeline--there appear to beconflicts with the JUnit test changes and also one conflict in sqlgrammar.jj. Imanually resolved these conflicts in my local codeline so that I could run sometests/view the code, but I think it'd be appropriate to post an updated patch(sync'd with the latest codeline) to ease the review process for others.

1) TernarnyOpNode.isEquivalent(): I think we need to check "receiver"equivalence as well, in addition to left and right operands. For example, I didthe following and I received an error as expected:


ij> create table s1 (vc varchar(20), vc2 varchar(10));
0 rows inserted/updated/deleted

ij> insert into s1 values ('allo', 'bye'), ('inigo', 'montoya'), 
('six','fingers');
3 rows inserted/updated/deleted

ij> select substr(vc, 3, 4) from s1 group by vc2;

ERROR 42Y30: The SELECT list of a grouped query contains at least one invalidexpression. If a SELECT list has a GROUP BY, the list may only contain validgrouping expressions and valid aggregate expressions.

I then I did the following, expecting to see the same error--but this actuallyworks:


ij> select substr(vc, 3, 4) from s1 group by substr(vc2, 3, 4);
1
----
igo
lo
x

3 rows selected

I think this is because TernaryOperatorNode.isEquivalent() doesn't compare"receiver" and thus thinks that vc and vc2 are the same, hence allowing thegroup by. Is this supposed to work or is it supposed to throw an error? Myunderstanding is that it should fail, but I could of course be wrong...


2) Aren't we losing information with this change?

In VerifyAggregateExpressionsVisitor.java:

-                               throw

StandardException.newException(SQLState.LANG_INVALID_COL_REF_GROUPED_SELECT_LIST,cr.getSQLColumnName());

+                               throw
StandardException.newException(SQLState.LANG_INVALID_GROUPED_SELECT_LIST);

Before your changes we see the column reference name:

ij> select substr(vc, 3, 4) from s1 group by vc2;

ERROR 42Y36: Column reference 'VC' is invalid. For a SELECT list with a GROUPBY, the list may only contain grouping columns and valid aggregate expressions.


But after the patch is applied, we lose the name of the invalid column 
reference:

ij> select substr(vc, 3, 4) from s1 group by vc2;

Generally it seems like a good idea to provide the user with as much informationas possible for error conditions like this. Is it possible to preserve theresult column name in cases like this (i.e. cases where we do actually know whatthe invalid column reference is)?

3) In UnaryOperatorNode.java the check for equivalence of the operand has threeparts to it:


+       return (operator.equals(other.operator)
+                       || (operand == other.operand)
+                       || ((operand != null) && operand.isEquivalent(other)));

But for other classes--such as BinaryOperatorNode.java--the check is only asingle call to "isEquivalent" for each operand:


+               return methodName.equals(other.methodName)
+                      && leftOperand.isEquivalent(other.leftOperand)
+                      && rightOperand.isEquivalent(other.rightOperand);

Can you explain why the operand for UnaryOperatorNode has three checks(.equals(), "==", and .isEquivalent()) while the operands for BinaryOperatorNodeonly have a single check (.isEquivalent())? Some comments to describe thedifference might be useful in these classes.

3) I think there are several places in this patch where you have commentsexplaining "what" by not explaining "why"--and I think the latter would behelpful. For example:


In VerifyAggregateExpressionsVisitor.java:

Why do we disallow expressions involving native java computation in thefollowing code?


+    } else if (node instanceof JavaToSQLValueNode)
+    {
+      // disallow any expression which involves native java computation.
+      throw StandardException.newException( (groupByList == null) ?
+          SQLState.LANG_INVALID_NON_GROUPED_SELECT_LIST :
+          SQLState.LANG_INVALID_GROUPED_SELECT_LIST);

Why do we not visit children under subqueries in the following code?

        /**
-        * Don't visit children under an aggregate
+        * Don't visit children under an aggregate, subquery or any node which
+        * is equivalent to any of the group by expressions.

In SelectNode.java:

Why do we need to add a call to preprocess in the following code?

+    if (groupByList != null)
+    {
+      groupByList.preprocess(numTables, fromList, whereSubquerys, 
wherePredicates);
+               }
+

For this one you can probably just use the comments that you already included inyour patch description, namely: "This is needed because expressions can getrewritten in the select list but not in the grouping list causing problems whenthe result set is generated." It'd be great if this explanation was included inthe actual code.


In GroupByNode.java:

Why do we always say that the source is not in sorted order if we have anexpression instead of a ColumnReference?


-        crs[index] = gc.getColumnReference();
+        if (gc.getColumnExpression() instanceof ColumnReference)
+        {
+          crs[index] = (ColumnReference)gc.getColumnExpression();
+        }
+        else
+        {
+          isInSortedOrder = false;
+          break;
+        }

Also, in the comment you posted to DERBY-883 with the patch you explained howthe rewrite is different now:

o the rewrite logic is a bit different now. Earlier, we would change
each unaggregate ColumnReference in the select list and point it to the
GroupByColumn. Now we replace each group by expression that we find
in the projection list with a virtual column node that effectively points
to a result column in the result set doing the group by. In addition
the original routine which did the rewrite is now split up into two separate
and smaller routines: addUnAggColumns and addAggregateColumns.

It would be great if this explanation could be added to the relevant parts inthe code and/or if the comments for that section could be updated to match thenew behavior. This will make it easier for people to understand what is goingon when they read the code several years from now.


In GroupByList.java:

Why don't we need to add a ResultColumn/ColumnReference pair to SelectNode's RCLwhen the grouping column is an expression (as opposed to a ColumnReference), asin the following code?


@@ -185,7 +174,8 @@
    /* If no match found in the SELECT list, then add a matching
     * ResultColumn/ColumnReference pair to the SelectNode's RCL.
     */
-       if (! matchFound)
+       if (! matchFound &&
+           groupingCol.getColumnExpression() instanceof ColumnReference)
        {
                ResultColumn newRC;

These are just some examples; I think there are others too. If you can lookagain at the patch and try to add comments explaining *why* we do or do not needto do things like this, that would be a huge help to those reading the code nowand in the future. These don't have to be extensive--even just a line or twocan be very helpful.

4) In GroupByColumn and GroupByList you have deleted a bunch of methods thatapparently are unused. While this is a useful thing to do, I don't know if wewant to do that as part of the DERBY-883 patch--it makes the diff bigger anddoesn't add to the patch functionally. Instead you could file a new Jira entryand attach a separate patch that removes the unnecessary methods to that Jiraissue, to be reviewed/committed separately. Or you could create a follow-uppatch for this issue that just separates out the changes for removing these methods.

5) Extraneous diff in QueryTreeNode.java? Are any of these changes actuallyrequired or are they just there for debugging? If they are an intentional partof the patch then a) comments explaining these changes would be helpful, and b)it might be good to actually delete the commented out lines altogether, insteadof leaving them in to clutter the code.


6) New methods could use javadoc explaining in more detail what they do:

GroupByNode.addUnAggColumns()
GroupByNode.addAggregateColumns()

7) There are a couple of places where you have commented lines out instead ofdeleting them. If there's a reason you left these in, then comments explainingthat would be great; otherwise I think it's cleaner to just delete the linesaltogether. For example:


In GroupByColumn.java:

@@ -48,7 +48,8 @@
         */
        public void init(Object colRef)
        {
-               this.colRef = (ColumnReference) colRef;
+               //this.colRef = (ColumnReference) colRef;
+               this.columnExpression = (ValueNode)colRef;
        }

        /**

8) I think the diff for master/subquery.out has some stuff from a failed mergein it:


Index: java/testing/org/apache/derbyTesting/functionTests/master/subquery.out
===================================================================
--- java/testing/org/apache/derbyTesting/functionTests/master/subquery.out
(revision 422486)
+++ java/testing/org/apache/derbyTesting/functionTests/master/subquery.out
(working copy)
@@ -1016,8 +1016,14 @@
                                        Number of rows visited=9
                                        Scan type=heap
                                        start position:
+<<<<<<< .mine
 null                                   stop position:
+null
+                                       scan qualifiers:
+=======
+null                                   stop position:
 null                                   scan qualifiers:
+>>>>>>> .r422486
 None
                                        next qualifiers:
 None

**** Nit-picking: Changes that you don't *have* to do but that might help incleaning up the patch.


A. Unnecessary white-space changes?

In SelectNode.java:

-                       VerifyAggregateExpressionsVisitor visitor =
-                               new 
VerifyAggregateExpressionsVisitor(groupByList);
+                       VerifyAggregateExpressionsVisitor visitor =
+                               new 
VerifyAggregateExpressionsVisitor(groupByList);
                        resultColumns.accept(visitor);
-               }
-
+        }
+
+
+               

In GroupByNode.java:

-                                                                               
C_NodeTypes.RESULT_COLUMN_LIST,
-                                                                               
getContextManager()),
-                                       ((FromTable) 
childResult).getTableNumber());
+                                                       
C_NodeTypes.RESULT_COLUMN_LIST,
+                                                       getContextManager()),
+                               ((FromTable) childResult).getTableNumber());

...

-               ** For each aggregate
-               */
+                ** For each aggregate
+                */

...

-
+                       
                        /*
-                       ** AGG RESULT: Set the aggregate result to null in the
-                       ** bottom project restrict.
-                       */
+                        ** AGG RESULT: Set the aggregate result to null in the
+                        ** bottom project restrict.
+                        */

... etc.

There are quite a few white-space changes in this patch that make it hard tofigure out what has actually changed.


B. Lines longer than 80 chars:

In SelectNode.java:

@@ -965,6 +956,11 @@
                                                                   
wherePredicates);
                }

+               if (groupByList != null)
+               {
+                       groupByList.preprocess(numTables, fromList, 
whereSubquerys, wherePredicates);
+               }
+               

In VerifyAggregateExpressionsVisitor.java:

+               return ((node instanceof AggregateNode) ||
+                               (node instanceof SubqueryNode) ||
+                               (node instanceof ValueNode &&
+                                               groupByList != null
+                                               && 
groupByList.findGroupingColumn((ValueNode)node) != null));

I know, this whole 80 character thing is a bit picky--but that seems to be therule adopted by most developers in the Derby code, so I'm just bringing it up...


------------------------------------------------------------------

I know that's a lot of feedback and very little of it has to do with the actualcorrectness of the code--so my apologies if this is a bit annoying. But I dothink that some additional comments and an effort to reduce the extra diffs willmake this a cleaner patch and will help developers in the long and short-termwho might find themselves reading this code.

Functionally, this is a great patch--an excellent feature enhancement for Derby.Thank you so much for taking the time to figure out all of the details--andfor writing such extensive tests, as well. I hope this feedback doesn'tdiscourage you from continuing to work on this or other Derby changes: I'm justposting feedback based on my own opinions. If you disagree with anything I'vewritten here, please feel free to speak up!


Thanks again for all of the work on this,
Army

Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Reply via email to