[ 
https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572260#action_12572260
 ] 

A B commented on DERBY-2998:
----------------------------

Hi Thomas, thanks for your continued work on this.

> Sorry about that :|

Not a problem :)

I haven't had a chance to look at the failing queries that you refer to in 
test7, but I did take a look at patch 15.  Some comments/questions after my 
first run through are below.

1)  There are a lot of places with logic similar to the following:

  ResultColumn rc = (ResultColumn) resultColumns.elementAt(index);
  if (rc.getExpression() instanceof WindowFunctionColumnNode)
  {
    ...
  }

Since WindowFunctionColumnNode extends ResultColumn, I wonder if it would be 
worth it to add a new "isWindowColumn()" method (or something like that) to 
ResultColumn.java and let that method contain all of the necessary logic for 
finding a WindowFunctionColumnNode.  If that was done then the above type of 
"if" expression becomes simply:

  ResultColumn rc = (ResultColumn) resultColumns.elementAt(index);
  if (rc.isWindowColumn())
  {
     ...
  }

which makes it easier to extend the check for "window columns" in the future if 
needed.  For example, would it ever be necessary to walk further down the 
ResultColumn's expression chain to look for a RowNumber column?  That is, could 
we ever have:

   ResultColumn
     -> VirtualColumnNode
        -> ResultColumn
          -> ...
            -> RowNumberColumnNode

If so, then the "isWindowColumn()" as defined in ResultColumn.java could 
include the necessary logic to walk the chain, which would simplify the calling 
code.  See, for example, comments #5 and #6 below.

2)

In WindowNode.bind() there is the following:

  /* At this stage there is hopefully a PRN below us as the fromList */
  return this;

It might be good to add an ASSERT here to make sure that the 'hopefully' 
condition is satisified...

3)

Many of the fields in WindowNode (and other new classes added by this patch) 
are "public", but some of them are never referenced outside of WindowNode. 
Unless a "public" declaration is functionally necessary, it seems like it would 
be safer to make them private and then supply the necessary getter/setter 
methods where needed, to avoid accidental modification.

4)

In WindowNode.java there is a field "level" which is initialized to -1 and then 
is set from SelectNode.java.  But as far as I can tell there are no comments 
explaining what that field does.  I think I was able to figure it out from the 
code, but it'd be nice to have explanatory comments within WindowNode itself.  
And per comment #3 above, I think it'd be nice to use a setter method in 
SelectNode instead of direct assignment to the public field.

5)

In ColumnReference.java there are two different kinds of checks to see if the 
ColumnReference is pointing to a WindowColumnNode.  The first is via the method 
"pointsToWindowFunction()", which does an instanceof check on the source 
ResultColumn's expression and returns true if it is a WindowFunctionColumnNode. 
 The second is in the categorize() method, where the logic only evaluates to 
true if the source ResultColumn's expression is a *VirtualColumnNode* whose 
source ResultColumn's expression is a WindowFunctionColumnNode.  So the checks 
do not appear to be equivalent.  Is that intentional?  If so, then explanatory 
comments might be good here.  Also, I believe this is a good example of the 
kind of code that might benefit from an "isWindowColumn()" function on 
ResultColumn, per comment #1 above.

6)

In ResultColumnList.java, there is the following:

    /* Window function columns are added by the WindowResultSet */
    if (sourceExpr instanceof WindowFunctionColumnNode ||
        (sourceExpr instanceof VirtualColumnNode &&
        (sourceExpr.getSourceResultColumn().getExpression()
            instanceof WindowFunctionCumnNode)))
    {
        continue;
    }

See comment #1 above for an alternate (cleaner?) approach...

7)

The result set statistics node for WindowResultSet does not currently print out 
the name of the result set...(I don't think).  This is pretty minor, but I was 
briefly confused when I saw the RealWindowResultSetStatistics class and yet the 
query plans weren't showing anything "window" releated.

8)

If I'm understanding correctly, the execution-time logic for WindowResultSet 
operates in a fashion similar to ProjectRestrictResultSet.  That is, it 
iterates through all of its source's rows and, for each one, it applies a 
restriction.  If the row fails the restriction, then the iteration moves on to 
the next source row, and so on.  This agrees with the following comments, 
pulled from the getNextRowCore() method of WindowResultSet:

   /*
    * Loop until we get a row from the source that qualifies, or there are
    * no more rows to qualify. For each iteration fetch a row from the
    * source, and evaluate against the restriction if any.
    */

If this is correct, then I'm not entirely sure we'll get the performance boost 
that one might perhaps expect from a query like:

    SELECT * FROM (
      SELECT row_number() over () as r, t.* FROM T
    ) AS tmp WHERE r <= 3;

I made a similar comment on Februrary 5th that was based on materialization of 
the source result set, and that comment ended up being wrong because I 
misunderstood materialization.  But in looking at this query again, I think 
something might still be slightly sub-optimal here.

If we take the approach of "loop until we get a row that qualifies", then we 
will loop through _all_ of the rows in table T, even though only the first 3 
satisfy the restriction.  In the end we will correctly return the first 3 rows 
(and only the first 3 rows) but not before we've read all of the others from 
disk and applied the "<= 3" restriction to each one.  This can be observed by 
either a) adding debug printouts to WindowResultSet.getNextRowCore() for every 
call to "source.getNextRowCore()", or b) looking at the query plan.  For the 
above query, the plan shows the following for the top-most 
ProjectRestrictResultSet:

  ******* Project-Restrict ResultSet (1):
  Number of opens = 1
  Rows seen = 1280
  Rows filtered = 1277
  restriction = true

So we actually read all 1280 rows from disk, then filtered 1277 of them out so 
that, in the end, we only returned 3 rows.  Thus from a performance perspective 
the ROW_NUMBER() function does not appear to improve things.  Is that a known 
limitation of the current development (which, for the record, would be 
perfectly acceptable)?

Having said that, there is absolutely _NO_ need for you to boost performance 
before committing the changes :)  Incremental development is good and it's 
great to have the functionality working.  Performance improvements can always 
be addressed later, if you (or anyone else) so chooses.  I'm just raising the 
issue to make sure that I understand how things are working.

9)

For the query mentioned in comment #8 above, the query plan shows:

  ProjectRestrictResultSet (sees ALL rows, restricts down to 3)
    -> ProjectRestrictResultSet (sees ALL rows)
      -> WindowResultSet (sees ALL rows)

I intuitively expected the restriction to appear at the level of the 
WindowResultSet, as opposed to appearing at the level of the top-most PRN.  The 
plan as shown above makes it look like the restriction is not being enforced by 
the Window ResultSet, but is instead being enforced by the ProjectRestrict. Is 
that correct?  Or is this just a case where the query plan fails to capture 
what is really happening with the WindowResultSet?  I'm guessing the latter, 
but am not sure...

> Add support for ROW_NUMBER() window function
> --------------------------------------------
>
>                 Key: DERBY-2998
>                 URL: https://issues.apache.org/jira/browse/DERBY-2998
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Thomas Nielsen
>            Assignee: Thomas Nielsen
>            Priority: Minor
>         Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff, 
> d2998-12.diff, d2998-12.stat, d2998-13.diff, d2998-13.stat, d2998-14.diff, 
> d2998-14.stat, d2998-15.diff, d2998-15.stat, d2998-4.diff, d2998-4.stat, 
> d2998-5.diff, d2998-5.stat, d2998-6.diff, d2998-6.stat, d2998-7.diff, 
> d2998-7.stat, d2998-8.diff, d2998-8.stat, d2998-9-derby.log, d2998-9.diff, 
> d2998-9.stat, d2998-doc-1.diff, d2998-doc-1.stat, d2998-test.diff, 
> d2998-test.stat, d2998-test2.diff, d2998-test2.stat, d2998-test3.diff, 
> d2998-test3.stat, d2998-test4.diff, d2998-test4.stat, d2998-test6.diff, 
> d2998-test7.diff
>
>
> As part of implementing the overall OLAP Operations features of SQL 
> (DERBY-581), implement the ROW_NUMBER() window function.
> More information about this feature is available at 
> http://wiki.apache.org/db-derby/OLAPRowNumber

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to