[ 
https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573520#action_12573520
 ] 

A B commented on DERBY-2998:
----------------------------

> the two last join queries both fail due to failure to classify some of the 
> PRNs as noops.

The code to check for "no-op" ProjectRestrictNodes depends, among other things, 
on object equality between two ResultColumn nodes.  See 
ResultColumnList.nopProjection().  I think the manipulation that is done in 
SelectNode.genProjectRestrict() re-arranges the tree and creates new nodes such 
that the object equality fails.  And even if the object equality wasn't an 
issue, I think re-arragement of the tree would cause the method to return false 
anyway.  Which corresponds with your findings on "redundant" result column 
lists.

I did some tracing for the queries that are still failing with patch 18.  The 
failure occurs during code generation, when we try to generate the predicate 
"x2.c4 = t1.i" as a qualifier on table T1.  When it comes time to generate the 
left operand, which points to a table that is BENEATH the WindowNode that was 
generated for the ROW_NUMBER(), we can't find the target result set number for 
the operand "X2.C4".  In looking at the predicate contents at the point of 
failure, I could see that the target result set number was buried down inside 
of a ResultColumn-to-VirtualColumn chain rooted at X2.C4.  Again, this matches 
your findings regarding "redundant" result columns.

I then replaced ROW_NUMBER() with a simple expression ("i+j") and stopped at 
the same point; in this case the target result set number is readily available 
(i.e. it's not buried) and the query passes.

On a whim, I then removed the following code from SelectNode:

  Code fragment "A":

   /*
    * The this.resultColumns object is shared with any PRNs above this
    * SelectNode in case this is a subquery. The following RCL
    * modifications cause problems in the column mapping of the upper
    * PRN unless we start off by making a copy for ourselves.
    */
    setResultColumns(resultColumns.copyListAndObjects());

When I took that out, the target result set number became readily available for 
the join queries and the queries executed without error, which was good.  Of 
course, the queries returned the wrong results, and several other ROW_NUMBER() 
queries which used to work started failing again.  So I tried to figure out why 
the code was added in the first place, as the comments didn't quite make it 
clear for me.

As far as I can tell, the scenario that prompted the addition of the above code 
is something like:

  select * from
    (select row_number() over(), 'HMM' from t1) x(a,c)
  where a > 2

which leads to a (simplified) tree that looks like:

  ... PRN0(a,c) -> SELECT(row_number(), 'HMM') -> ...

In this tree, PRN0's RCL is a different object than SELECT's RCL, and the 
columns in PRN0's RCL have VirtualColumnNodes that do not appear in SELECT's 
RCL. But underneath it all, both RCLs ultimately point to the same underlying 
ResultColumn objects.  Thus any changes we make to the underlying ResultColumns 
in SELECT's RCL will be "shared" by PRN0, meaning that the ResultColumns 
beneath PRN0's RCL will see the changes, as well.

>From what I can gather, the changes that we are talking about here deal with 
>virtual column ids.  During SelectNode.genProjectRestrict() we remove the 
>WindowFunctionColumns from SELECT's ResultColumnList and then adjust the 
>virtual column id's accordingly.  So we get:

  ... PRN0(a,c) -> SELECT('HMM') -> ...

In the absence of code fragment "A", the result column for "HMM" now has a 
virtual column id of "1" (instead of "2").  The fact that we changed the 
virtual column id from "2" to "1" means that the result column beneath "C" in 
PRN0's ResultColumnList *also* now has a virtual column id of "1".  That 
indicates that "C" should pull its value from the 1st column in its child's 
RCL. This is where things start to go wrong...

When we finish with SelectNode.genProjectRestrict(), our simplified tree looks 
like:

  PRN0(a,c)
    -> PRNDUMMY(row_number(), 'HMM')
      -> WindowNode(row_number(), 'HMM')
        -> PRN1('HMM') // this now has virtual column id of "1"
          -> ...

Note that PRNDUMMY will be a "no-op" project restrict and thus will not 
actually be generated.  So PRN0's child will end up being WindowNode, which has 
TWO columns.  But PRN0.C still has a virtual column id of "1", meaning that it 
should pull its value from the first column of PRN0's child.  PRN0's child is 
WindowNode, and the first column in WindowNode is row_number()--not 'HMM'.  So 
"C" is now pointing to the wrong place.

At code generation time, when we try to generate the projection columns for 
PRN0, we'll take a look at it's virtual column ids and generate the project 
"mapping" array from those.  In ProjectRestrictNode we find the following 
within "generateMinion()":

        // Map the result columns to the source columns
        int[] mapArray = resultColumns.mapSourceColumns();

The method "mapSourceColumns()" simply walks through the RCL and plugs in the 
virtual column ids for each of the columns.  For PRN0, since "A" and "C" both 
have a virtual column id of "1", we'll get back an array whose contents are 
"[1, 1]".

That array is then passed to the execution-time ProjectRestrictResultSet and is 
used for extracting columns from the underlying result set, which will be a 
WindowResultSet in this case.  So we'll get rows from WindowResultSet that have 
the values (1, 'HMM'), (2, 'HMM'), etc. That part is fine.  But then the 
ProjectRestrictResultSet for PRN0 will use the "mapping" array to pull columns 
out of each row from WindowResultSet.  Since _both_ elements in the array are 
"1", we extract out the 1st column and put it into _both_ places.  So the final 
rows will end up as (1, 1), (2, 2), etc. which is wrong.

All of that said, code fragment "A" resolved this problem by cloning SELECT's 
result column list and making all modifications on the clone.  So when the 
virtual column id for "HMM" changes from "2" to "1", PRN0 doesn't see the 
change and therefore code generation and execution run fine.

But as I mentioned at the start of this comment, the cloning leads to other 
problems--such as the queries that are still failing with patch 18.  To make a 
long story short, I tried various "tweaks" (or perhaps "hacks" is a better 
word) to get the virtual column ids for PRN0 to remain correct without having 
to clone SELECT's columns. In the end I couldn't find a clean way to do so.

But I did find that if I made the following changes, things started working:

1) REMOVE code fragment "A" from SelectNode, as mentioned above.
2) REMOVE the following from SelectNode.java:

    for (int index = 0; index < size; index++) {
        ResultColumn rc = (ResultColumn) originalRCL.elementAt(index);
        /*
         * NOTE: We only care about window function columns that appear
         * here, and not about references into subquerys or similar.
         */
         if (!rc.expressionIsWindowFunction()) {
             windowFunctionRCL.setElementAt(
                prnRSN.getResultColumns().elementAt(srcindex), index);
             srcindex++;
         }
    }

3) REMOVE the following from ResultColumnList:

    /*
     * All expressions are not columns if this RCL contains
     *  - A windowfunction column
     *  - A virtual column node pointing to window function column
     * This check is done as a separate loop to avoid incorrectly start
     * marking any/parts of the base table columns as correlated columns.
     * Marking a VCN pointing into a WindowResultSet as correlated cause
     * resultset reference errors during code generation.
     */
    for (int index = 0; index < size; index++) {
       ResultColumn rc = (ResultColumn) elementAt(index);
       if (rc.isWindowFunction()){
           return false;
        }
    }

4) Change ProjectRestrictNode.java as follows:

 // Map the result columns to the source columns
- int[] mapArray = resultColumns.mapSourceColumns();
+ int[] mapArray =
    resultColumns.containsWindowFunctionResultColumn()
        ? childResult.getResultColumns().mapSourceColumns()
        : resultColumns.mapSourceColumns();

I *think* this change has the effect of saying "ignore PRN0's RCL if it points 
to a WindowNode, because PRN0's RCL may not be correct--esp. it may have 
incorrect virtual column ids".  Thus instead of using "[1,1]" for the 
projection mapping, PRN0 will create its mapping from PRNDUMMY, which has 
correct virtual column ids of "[1,2]", so everything ends up okay.  At the same 
time, we did *not* have to clone the SELECT's ResultColumnList objects, and 
that appears to make the join queries run correctly.

With these changes, OLAP tests runs without error and the two join queries I 
posted previously also execute without error.  And from the brief looking that 
I did it seems like the results are correct, though it wouldn't hurt to 
double-check that.

All in all, though, I'd have to say this seems pretty hackish, so it may not be 
a proper solution.  But it does cause all of the queries posted so far to pass, 
and it simplifies the changes by removing the chunks of code mentioned above.  
So I figured I'd post my findings anyways.  I'll leave it up to you to 
determine if any of it is useful or worth pursing further...

> Add support for ROW_NUMBER() window function
> --------------------------------------------
>
>                 Key: DERBY-2998
>                 URL: https://issues.apache.org/jira/browse/DERBY-2998
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Thomas Nielsen
>            Assignee: Thomas Nielsen
>            Priority: Minor
>         Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff, 
> d2998-12.diff, d2998-12.stat, d2998-13.diff, d2998-13.stat, d2998-14.diff, 
> d2998-14.stat, d2998-15.diff, d2998-15.stat, d2998-16.diff, d2998-16.stat, 
> d2998-17.diff, d2998-17.stat, d2998-18.diff, d2998-18.stat, d2998-4.diff, 
> d2998-4.stat, d2998-5.diff, d2998-5.stat, d2998-6.diff, d2998-6.stat, 
> d2998-7.diff, d2998-7.stat, d2998-8.diff, d2998-8.stat, d2998-9-derby.log, 
> d2998-9.diff, d2998-9.stat, d2998-doc-1.diff, d2998-doc-1.stat, 
> d2998-test.diff, d2998-test.stat, d2998-test2.diff, d2998-test2.stat, 
> d2998-test3.diff, d2998-test3.stat, d2998-test4.diff, d2998-test4.stat, 
> d2998-test6.diff, d2998-test7.diff, d2998-test8.diff
>
>
> As part of implementing the overall OLAP Operations features of SQL 
> (DERBY-581), implement the ROW_NUMBER() window function.
> More information about this feature is available at 
> http://wiki.apache.org/db-derby/OLAPRowNumber

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to