Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Manish Khettry Fri, 28 Jul 2006 14:15:18 -0700

On 7/26/06, Army <[EMAIL PROTECTED]> wrote:

Hi Manish,

Thanks for addressing my comments.  The patch is looking better now--especially
with the whitespace issues in version 4.   Some follow-up comments below.

Manish Khettry wrote:
>> 1) TernarnyOpNode.isEquivalent (): I think we need to check "receiver"
>> equivalence as well, in addition to left and right operands.
>
> yes, you are right on this one. The new patch fixes this.

Thanks for fixing this.  If possible, could you add a test case to your JUnit
test to demonstrate?  Could just be the query that I posted in my previous comments.

I have added a couple of queries to make sure that TernaryOpNode#isEquivalent is tested.

> Actually the existing error message no longer makes sense with group by
> expressions. For example, the line of code that you point to, also catches
> errors like this:
> select c1+1,  avg(c2) from test group by c1+10;
> Does the existing error message make sense for this query?

Not with this query, no.  But two things to note here:

1) Before your patch, if you replace "group by c1+10" with "group by c2" you'll
get error 42Y36 saying that column reference C1 is invalid, even though it's
actually part of an _expression_ ("c1+1")--so that type of behavior *does* already
exist, even if it doesn't really "make sense".  If you change the SQLSTATE and
error message for this case, the result is a change (albeit a pretty small one)
in customer apps, which is really the issue here.

2) The key _expression_ in my previous comment was "in cases like this", where
"like this" meant queries where the existing error message *does* make sense.

For example, with the following query it makes sense to include the column
reference because we know what it is:

ij> select c1 from test group by c2;
ERROR 42Y36: Column reference 'C1' is invalid.  For a SELECT list with a GROUP
BY, the list may only contain grouping columns and valid aggregate expressions.

And that's what we'll get before your patch.  But after your patch we a) get a
different SQLSTATE and b) lose the column reference name:

ij> select c1 from test group by c2;
ERROR 42Y30: The SELECT list of a grouped query contains at least one invalid
_expression_. If a SELECT list has a GROUP BY, the list may only contain valid
grouping expressions and valid aggregate expressions.

So now we've changed the way an existing query behaves (by returning a different
SQLSTATE)--which (I think?) means we need to mark this as "Existing Application
Impact"--*and* we're now providing the user with less information.

I see your point but I would prefer to keep it this way, first because the error message in question 42Y36-- even though it gives more information, is misleading; specifically this phrase: "the list may only contain grouping columns .....". I also don't like error handling for a subset of the cases taking recourse to instanceof. I'll throw it up to the community to weigh in.

I can see that it would be useful to point out which _expression_ is invalid as opposed to saying ".. atleast one invalid _expression_....". If the community wants this, then I'd like to be done with this patch and file another issue to specifically improve the error handling of this case.

If it's possible to figure out when we have a simple column reference (c1)
verses an _expression_ (c1+1) and to throw a corresponding error, I think that
would be ideal.  But if you decide that it's not possible or not worth it to
preserve the error for queries with simple column references, then feel free to
leave this as it is.  If that's your decision, though, I think you should ask
derby-dev if a different SQLSTATE for the same query counts as "Existing
Application Impact", and if so, mark this issue as appropriate.

>> + return (operator.equals(other.operator)
>> +        || (operand == other.operand)
>> +        || ((operand != null) && operand.isEquivalent(other)));
>
> If you look at bindUnaryOperator for UnaryOperatorNode, you will see that
> it is possible for operand to be null; not so in BinaryOperatorNode.

So the assumption here is that if (operand == other.operand) then both operands
are null?  If so, a simple comment saying that could be helpful.

OK. I think the code speaks clearly for itself (the second part of the || checks for != null before calling isEquivalent) but I've added the comment. I like to add comments if they really clarify something that the code may not. Usually the code changes and the comments don't and thats worse than no comments at all. Those XP guys were onto something!

> In GroupByList.java:
<snip code snippet>
>
> That is a good question-- this bit of code add invisible column references
> when a column reference in the group by list, is not in the projection
> list;
> i.e.
>
> select c2, sum(c3) from test group by c1;
>
> will add c1 to the projection list as a generated column. I'm not sure if
> you want to do the same kind of rewrite for expressions. I thought this
> wasn't worth doing but I'm not sure anymore.

I just tried this query with and without your patch and it fails in both cases
with error 42Y36/42Y30. So I guess I'm a little confused as to why Derby would
add "c1" to the projection list and what good it does us?

Sorry, wrong query, try

select c2, sum(c3) from test group by c1,c2;

If you're not sure whether or not it's worth adding equivalent logic for
expressions, do you plan to follow-up with this later? If not, a comment
explaining what you have found (or what your uncertainties are) might help other
developers down the road (esp. if a bug comes up in this area later).

I'll add a comment. My preference would be to leave as is and evaluate it later.

> I disagree with you here-- when a contributor has spent a considerable
> amount of time understanding and making a lot of changes to a class, it
> makes sense to remove dead code at the same time. If you don't, stuff like
> this will likely get ignored or slip through the cracks, imho.

I didn't mean to say that the changes were not useful nor that they shouldn't be
committed; just that it might make it easier to track "cleanup" vs "functional"
changes by filing a separate Jira issue to remove the dead code and posting an
isolated patch to that issue. The changes will still go in--no slipping through
the cracks--but they will go in as part of their *own* Jira commit, which seems
cleaner to me.

In any event, that was just a suggestion. No need to let this prevent the
commit of the patch.

OK, I see your point although the overhead of a new jira issue, code reviews and such is a bit high, imho.

> 7) There are a couple of places where you have commented lines out instead
>
> yes, you're right. I've removed it.

There are still a couple of these lingering:

I got rid of both of them.

>> A. Unnecessary white-space changes?
>
> Atleast some of these were done to improve existing indentation. Next time,
> I'll resist the urge to improve readability.

Thanks for cleaning a bunch of these whitespace diffs up in the version four
patch.  That said, there are still quite a few white-space changes left--did you
intentionally leave those in?  Ex.

GroupByNode.java:

@@ -470,10 +489,10 @@
                        ** bottom project restrict.
                        */
                        newRC = (ResultColumn) getNodeFactory().getNode(
-                                                                               C_NodeTypes.RESULT_COLUMN,
-                                                                               "##aggregate result",
-                                                                               aggregate.getNewNullResultExpression(),
-                                                                               getContextManager());
+                                       C_NodeTypes.RESULT_COLUMN,
+                                       "##aggregate result",
+                                       aggregate.getNewNullResultExpression (),
+                                       getContextManager());
                        newRC.markGenerated();
                        newRC.bindResultColumnToExpression();
                        bottomRCL.addElement (newRC);

Maybe one or two slipped by-- this one does reduce the indentation and its possible it was going past the right margin and I decided to fix it. Some of the others really served no purpose and I got rid of them.

1 -- SubstituteExpressionVisitor.java:

* The copyright says 1998, 2004; I think this is supposed to just say 2006
since that's when the file was created.

OK.

2 -- I see that in version 4 of the patch you updated UnaryOperatorNode to have
the following logic in isEquivalent():

+               UnaryOperatorNode other = (UnaryOperatorNode)o;
+               return (operator.equals(other.operator) &&
+                       ((operand == other.operand)||
+                        ((operand != null) && operand.isEquivalent(other.operand))));

Did you become aware of this problem because of some test(s) that you were
running?  If so, it'd be great if you could add the actual test case(s) to your
JUnit test for regression purposes.  The more tests there are to check for these
kinds of subtleties, the better...

Not tests, just visual inspection. I was going over the email exchange when it hit me. I thought about tests and the only one case where the operand is null is count(*)-- since isEquivalent will only be called for grouping expressions, we can never exercise this particular branch of the conditional.

And finally:

> Manish Khettry updated DERBY-883:
> ---------------------------------
>
>     Attachment: 883.patch4.txt
>
> I found a problem in one of my changes. Also got rid of a lot of
> white space only diffs.

In the future it would be helpful if you could say what changed between versions
of the patch, so that reviewers don't have to scour the patches to try to figure
out what changed.  I managed to find out what "a problem" meant in this case,
but it's generally a good idea to explicitly state the problem/changes in the
comments.  Makes my life as a reviewer easier :)

OK. I will attach a new patch shortly incorporating your comments and the NPE that Yip found. Also, just as an fyi-- I will be away all of next week, so if you have additional comments, I can only get to them, the second week of august.

m

Re: [jira] Updated: (DERBY-883) Enhance GROUP BY clause to support expressions instead of just column references.

Reply via email to