[ https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565963#action_12565963 ]
A B commented on DERBY-2998: ---------------------------- Thank you for updating the wiki page with more details, Thomas. A couple of comments that come to mind from a quick read of that page (without having examined the code changes themselves in detail): 1. Near the top of the wiki page there is a sentence saying: "In this example the ROW_NUMBER function is used to limit the query as soon as the first N rows have been determined". But in the details that you've added to the wiki, you note the following: "For the nested select query above we materialize the subquery select result, and have the outer SELECT pull rows from the materialized result". From a quick reading of your writeup, it seems like materialization of the subquery might be counter-productive? That is, if I have the following query: SELECT * FROM ( SELECT row_number() over () as r, t.* FROM T ) AS tmp WHERE r <= 3; and table T has a thousand rows in it, the ideal would be to use the ROW_NUMBER() function to limit the scan on T so that we only fetch the first 3 rows from disk. But if we materialize the subquery, I think that means the nested SELECT will issue a full scan on table T, returning all 1000 rows, and we'll "materialize" those into memory. Then the ROW_NUMBER() function will simply extract out the first 3 rows from the materialized result set. Is that an accurate description of what you mean by "materialized result", or am I misreading? If this is in fact what you are proposing, then can you explain a bit more about what the benefit of such materialization is? Am I right in thinking that this could negate potential performance gains that might otherwise come from ROW_NUMBER() as a "limiting" predicate? The wiki page does say: "window function results that span multiple rows, like a moving average, will benefit from materialization..." which may be true. But it seems like there may be better ways to deal with such functions than full materialization of the result set. In particular, the idea of a "sort observer", similar to what is used for GroupedAggregateResultSet, seems like it could potentially be useful for such a function? Which brings me to my next comment... 2. Did you by chance look at GroupedAggregateResultSet and its surrounding code to see if the OLAP functions might extend that class? It's perhaps overkill for a ROW_NUMBER() function that only supports empty, unnamed windows. But the OLAP window specification in general seems to share some common concepts with GROUP BY processing, so I wonder if it'd be possible to put building blocks in place based on that similarity--ex. to create some kind of "WindowFunctionValue" class that extends GroupedAggregated- ResultSet, and then have the various OLAP functions in turn extend WindowFunctionValue as needed. I have no idea if/how that would actually work, but I thought I'd mention it to see if you've had an opportunity to look at that option? Maybe it's not worth it for the simple ROW_NUMBER() function that you're currently working on--feel free to say so :) 3. Under "modification of access paths" there is the following: "We add one WindowNode for each window function column, and they are evaluated left to right in the result column list, with the right most column being the top WindowNode in the query plan." So if I have: SELECT row_number() over () as r1, row_number() over () as r2, t.col1 FROM T Is it correct to say that my query tree will end up as: ProjectRestrictNode ==> WindowNode (r2) ==> WindowNode (r1) ==> ProjectRestrictNode ==> FromBaseTable and my execution tree will look like: ProjectRestrictResultSet ==> WindowResultSet (r2) ==> WindowResultSet (r1) ==> ProjectRestrictResultSet ==> TableScanResultSet I guess this isn't really a comment, just a quick check to make sure I'm understanding what is supposed to happen :) > Add support for ROW_NUMBER() window function > -------------------------------------------- > > Key: DERBY-2998 > URL: https://issues.apache.org/jira/browse/DERBY-2998 > Project: Derby > Issue Type: Sub-task > Components: SQL > Reporter: Thomas Nielsen > Assignee: Thomas Nielsen > Priority: Minor > Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff, > d2998-4.diff, d2998-4.stat, d2998-5.diff, d2998-5.stat, d2998-6.diff, > d2998-6.stat, d2998-7.diff, d2998-7.stat, d2998-8.diff, d2998-8.stat, > d2998-9-derby.log, d2998-9.diff, d2998-9.stat, d2998-doc-1.diff, > d2998-doc-1.stat, d2998-test.diff, d2998-test.stat, d2998-test2.diff, > d2998-test2.stat, d2998-test3.diff, d2998-test3.stat > > > As part of implementing the overall OLAP Operations features of SQL > (DERBY-581), implement the ROW_NUMBER() window function. > More information about this feature is available at > http://wiki.apache.org/db-derby/OLAPRowNumber -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.