[ 
https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565963#action_12565963
 ] 

A B commented on DERBY-2998:
----------------------------

Thank you for updating the wiki page with more details, Thomas.  A couple of 
comments that come to mind from a quick read of that page (without having 
examined the code changes themselves in detail):

 1. Near the top of the wiki page there is a sentence saying:

     "In this example the ROW_NUMBER function is used to limit the
      query as soon as the first N rows have been determined".

    But in the details that you've added to the wiki, you note the following:

     "For the nested select query above we materialize the subquery
      select result, and have the outer SELECT pull rows from the
      materialized result".

    From a quick reading of your writeup, it seems like materialization
    of the subquery might be counter-productive?  That is, if I have the
    following query:

        SELECT * FROM (
          SELECT row_number() over () as r, t.* FROM T
        ) AS tmp WHERE r <= 3;

    and table T has a thousand rows in it, the ideal would be to use the
    ROW_NUMBER() function to limit the scan on T so that we only fetch
    the first 3 rows from disk.  But if we materialize the subquery, I
    think that means the nested SELECT will issue a full scan on table T,
    returning all 1000 rows, and we'll "materialize" those into memory.
    Then the ROW_NUMBER() function will simply extract out the first 3
    rows from the materialized result set.  Is that an accurate description
    of what you mean by "materialized result", or am I misreading?  If
    this is in fact what you are proposing, then can you explain a bit more
    about what the benefit of such materialization is?  Am I right in
    thinking that this could negate potential performance gains that might
    otherwise come from ROW_NUMBER() as a "limiting" predicate?

    The wiki page does say:

      "window function results that span multiple rows, like a moving
       average, will benefit from materialization..."

    which may be true.  But it seems like there may be better ways to
    deal with such functions than full materialization of the result set.
    In particular, the idea of a "sort observer", similar to what is used
    for GroupedAggregateResultSet, seems like it could potentially be
    useful for such a function?  Which brings me to my next comment...

 2. Did you by chance look at GroupedAggregateResultSet and its surrounding
    code to see if the OLAP functions might extend that class?  It's perhaps
    overkill for a ROW_NUMBER() function that only supports empty, unnamed
    windows.  But the OLAP window specification in general seems to share some
    common concepts with GROUP BY processing, so I wonder if it'd be possible
    to put building blocks in place based on that similarity--ex. to create
    some kind of "WindowFunctionValue" class that extends GroupedAggregated-
    ResultSet, and then have the various OLAP functions in turn extend
    WindowFunctionValue as needed.

    I have no idea if/how that would actually work, but I thought I'd mention
    it to see if you've had an opportunity to look at that option?  Maybe
    it's not worth it for the simple ROW_NUMBER() function that you're currently
    working on--feel free to say so :)

 3. Under "modification of access paths" there is the following:

    "We add one WindowNode for each window function column, and they are
     evaluated left to right in the result column list, with the right most 
column 
     being the top WindowNode in the query plan."

    So if I have:

        SELECT
            row_number() over () as r1,
            row_number() over () as r2,
            t.col1
        FROM T

    Is it correct to say that my query tree will end up as:

      ProjectRestrictNode
        ==> WindowNode (r2)
            ==> WindowNode (r1)
                ==> ProjectRestrictNode
                    ==> FromBaseTable

    and my execution tree will look like:

      ProjectRestrictResultSet
        ==> WindowResultSet (r2)
            ==> WindowResultSet (r1)
                ==> ProjectRestrictResultSet
                    ==> TableScanResultSet

    I guess this isn't really a comment, just a quick check to make
    sure I'm understanding what is supposed to happen :)

> Add support for ROW_NUMBER() window function
> --------------------------------------------
>
>                 Key: DERBY-2998
>                 URL: https://issues.apache.org/jira/browse/DERBY-2998
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Thomas Nielsen
>            Assignee: Thomas Nielsen
>            Priority: Minor
>         Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff, 
> d2998-4.diff, d2998-4.stat, d2998-5.diff, d2998-5.stat, d2998-6.diff, 
> d2998-6.stat, d2998-7.diff, d2998-7.stat, d2998-8.diff, d2998-8.stat, 
> d2998-9-derby.log, d2998-9.diff, d2998-9.stat, d2998-doc-1.diff, 
> d2998-doc-1.stat, d2998-test.diff, d2998-test.stat, d2998-test2.diff, 
> d2998-test2.stat, d2998-test3.diff, d2998-test3.stat
>
>
> As part of implementing the overall OLAP Operations features of SQL 
> (DERBY-581), implement the ROW_NUMBER() window function.
> More information about this feature is available at 
> http://wiki.apache.org/db-derby/OLAPRowNumber

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to