[jira] Updated: (DERBY-2998) Add support for ROW_NUMBER() window function

Thomas Nielsen (JIRA) Tue, 26 Feb 2008 03:56:49 -0800

     [ 
https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Thomas Nielsen updated DERBY-2998:
----------------------------------

    Attachment: d2998-16.stat
                d2998-16.diff

Thanks again for having a look Army!

Attaching patch d2998-16.diff that fixes most of Army comments, plus a few 
other very minor issues. Testwise I get the same result with patch 16.

> 1)  ....  I wonder if it would be worth it to add a new "isWindowColumn()" 
> method

Agree - makes the code more easily readable. Fixed in patch 16.

Also added RC.isVirtualWindowColumn() for checking for a VirtualColumnNode 
pointing to a WindowFunctionColumnNode.

> 2)
> It might be good to add an ASSERT here to make sure that the 'hopefully' 
> condition is satisified...

The comment is not correct and should be removed. Fixed in patch 16 by 
rephrasing to reflect actual conditions.

> 3)
> Many of the fields in WindowNode (and other new classes added by this patch) 
> are 
> "public", ...

Fixed in patch 16

> 4)
> In WindowNode.java there is a field "level" which is initialized to -1 and 
> then is set from
> SelectNode.java.  

Patch 16 adds getters and setters, as well as comments on use of 'level'.

I'm not too satisfied with the variable name 'level' but it was the best 
I could think of...

> 5)
> In ColumnReference.java there are two different kinds of checks to see if the 
> ColumnReference is pointing to a WindowColumnNode.  ...

Yes, the two different approaches are intentional. See updated comment for 
CR.pointsToWindowFunction().
Fixed to use isWindowFunction() and isVirtualWindowFunction() in patch 16.

> 6)
> See comment #1 above for an alternate (cleaner?) approach...

Patch 16 now uses isWindowFunction() and new isVirtualWindowFunction() for 
this if().

> 7)
> The result set statistics node for WindowResultSet does not currently print 
> out the 
> name of the result set...(I don't think). 

I'll need to look into this.

> 8)
> If I'm understanding correctly, the execution-time logic for WindowResultSet 
> operates 
> in a fashion similar to ProjectRestrictResultSet.

True, but more like the 'inverse' of PRRS. The PRRS gets many columns 
from its child resultset, and projects a few of these. The WRS projects 
a few columns into the many destination columns, along with its 
windowfunction result.

>  That is, it iterates through all of its source's rows and, for each one, it 
> applies a 
> restriction.  If the row fails the restriction, then the iteration moves on 
> to the next source
> row, and so on.  This agrees with the following comments, pulled from the 
> getNextRowCore() method of WindowResultSet:
> 
>    /*
>     * Loop until we get a row from the source that qualifies, or there are
>     * no more rows to qualify. For each iteration fetch a row from the
>     * source, and evaluate against the restriction if any.
>     */
> 

What should happen is that the outer selects(S1) PRN calls getNextRow() 
that again calls into the child/subquerys(S2) PRN getNextRowCore(). That 
again calls into the chain of WindowResultSets and PRNs, and eventually 
ends up in a a BulkTableScanResultSet.getNextRowCore() or similar to 
actually fetch *one* row. This single row is then pulled up the 
resultset chain, and the restriction is evaluated at the level it's been 
pushed down to.

In your example S1 evaluates the restriction. It's not 
pushed from S1 into S2 for evaluation as it's a restriction on a window 
function 
column of S1s subquery/child resultset.
S1 knows that column r is a row number column and that it 
is ascending, so we should stop pulling rows from below once we reach 
the condition.

See reply to #9 too.

> If this is correct, then I'm not entirely sure we'll get the performance 
> boost that one might perhaps expect from a query like:
> 
>     SELECT * FROM (
>       SELECT row_number() over () as r, t.* FROM T
>     ) AS tmp WHERE r <= 3;
> 
> .... For the above query, the plan shows the following for the top-most 
> ProjectRestrictResultSet:
> 
>   ******* Project-Restrict ResultSet (1):
>   Number of opens = 1
>   Rows seen = 1280
>   Rows filtered = 1277
>   restriction = true
> 

This looks wrong given my above explaination.
It seems we don't take the fact that we know we have an ascending column 
into consideration any more (we used to). I'll have to have another look at 
this.

I haven't focused on the store layer at all. But based on how many rows 
there are in your table the store layer may load either all or parts of 
your data from disk into its cache. But they should not be pulled up the 
resultset chain to either S2 or S1.

> 9)
> For the query mentioned in comment #8 above, the query plan shows:
> 
>   ProjectRestrictResultSet (sees ALL rows, restricts down to 3)
>     -> ProjectRestrictResultSet (sees ALL rows)
>       -> WindowResultSet (sees ALL rows)
> 
> I intuitively expected the restriction to appear at the level of the 
> WindowResultSet, as 
> opposed to appearing at the level of the top-most PRN.  The plan as shown 
> above 
> makes it look like the restriction is not being enforced by the Window 
> ResultSet, but is 
> instead being enforced by the ProjectRestrict. Is that correct?  Or is this 
> just a case 
> where the query plan fails to capture what is really happening with the 
> WindowResultSet?  I'm guessing the latter, but am not sure...

This is (more or less) correct.

As explained above, the top PRN must do the restriction, but it should 
never *fetch* all the rows.

I briefly tried pushing the restriction down to the WindowResultSet some time 
ago, but the expression generation/execution code does not 
currently want to work with the window function columns.
I did not pursue this further, but agree it should be looked at for a 
future optimization.

However, the code needed to do restriction at the WindowResultSet level 
have been kept in its getNextRowCore(). This is the reason for the loop 
and comment mentioned in reply to #8.

> Add support for ROW_NUMBER() window function
> --------------------------------------------
>
>                 Key: DERBY-2998
>                 URL: https://issues.apache.org/jira/browse/DERBY-2998
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Thomas Nielsen
>            Assignee: Thomas Nielsen
>            Priority: Minor
>         Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff, 
> d2998-12.diff, d2998-12.stat, d2998-13.diff, d2998-13.stat, d2998-14.diff, 
> d2998-14.stat, d2998-15.diff, d2998-15.stat, d2998-16.diff, d2998-16.stat, 
> d2998-4.diff, d2998-4.stat, d2998-5.diff, d2998-5.stat, d2998-6.diff, 
> d2998-6.stat, d2998-7.diff, d2998-7.stat, d2998-8.diff, d2998-8.stat, 
> d2998-9-derby.log, d2998-9.diff, d2998-9.stat, d2998-doc-1.diff, 
> d2998-doc-1.stat, d2998-test.diff, d2998-test.stat, d2998-test2.diff, 
> d2998-test2.stat, d2998-test3.diff, d2998-test3.stat, d2998-test4.diff, 
> d2998-test4.stat, d2998-test6.diff, d2998-test7.diff
>
>
> As part of implementing the overall OLAP Operations features of SQL 
> (DERBY-581), implement the ROW_NUMBER() window function.
> More information about this feature is available at 
> http://wiki.apache.org/db-derby/OLAPRowNumber

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-2998) Add support for ROW_NUMBER() window function

Reply via email to