[ https://issues.apache.org/jira/browse/DERBY-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573867#action_12573867 ]
A B commented on DERBY-2998: ---------------------------- Thanks for patch 19, Thomas. I think your approach to handling the virtual column nodes is a good one (certainly better than the alternatives seen thus far). In the interest of getting something into the codeline for others to play with--and to simplify future patches--per Dan's comment a few days back, I'll try to commit patch 19 sometime today. Once that is committed, perhaps you can post one or more follow-up patches to address the following un-related comments, which I noticed while looking into the VCN issue. Of these # 3 is probably the most important one... 1) Instead of using "row_number()" for the default name of a ROW_NUMBER() column, I think it'd be better to use a normal generated column name, as is done for all other un-named expressions in Derby (I believe?). So something like: select i, j, i+j, row_number() over() from t1 should return four columns, "I", "J", "3", and "4", where the last two are generic generated names used for unnamed expressions. With patch 19 such a query currently returns "I", "J", "3", "row_number()", which deviates from the existing pattern. 2) Regarding the following logic in PredicateList.pushExpressionsIntoSelect(): ColumnReference ref = (ColumnReference)e.nextElement(); if (!ref.pointsToColumnReference() && !ref.pointsToWindowFunction()) { state = false; break; } I think it says "if the column reference points to something that is neither a) another column reference NOR b) a window function, then do *not* push it down." So if the column reference points to a window function, we we could still potentially push it down. Is that what we want? For the record, I commented the call to "pointsToWindowFunction()" out completely and all of the queries still ran the same. So is this check still necessary? 3) The following query returns one row, regardless of how many rows there are in T1: select distinct row_number() over(), 'HMM' from t1 It appears that we will evaluate the entire SELECT result set, then apply the DISTINCT, and finally, assign the row numbers. So if there are five rows in T1, we'll get five rows having a single "HMM" column, then we'll apply the distinct to get a single "HMM" row, and that row will have row number "1". But is that the correct behavior? Or is the row_number() supposed to be included in the "DISTINCT" qualification, in which case the query would return 5 rows because each row has a different row number and thus each is distinct from the others? Just to see what would happen I ran the above query on DB2 v8 and it returned one row for each row in T1. I took a look at the syntax rules for <window function> in SQL 2003 spec section 6.10 but could not, in my inexperience with reading specs, determine one way or the other. That said, though, I think functions in a ResultColumnList are typically evaluated once per row, *prior* to the DISTINCT being applied, so I would guess ROW_NUMBER() should do the same? 4) Might be good to add some test cases for GROUP BY and HAVING clauses in the presence of ROW_NUMBER(). I tried a few quick ones and they seem to be working fine, but it wouldn't hurt to add some tests for posterity. These are the ones I tried, but use your imagination :) select r from (select i, row_number() over() as r, j from t1) x group by r; select * from (select i, row_number() over() as r, j from t1) x group by i, j, r; select * from (select i, row_number() over() as r, j from t1) x group by j, r, i; select * from (select i, row_number() over() as r, j from t1) x group by j, r, i having r > 2; select * from (select i, row_number() over() as r, j from t1) x group by j, r, i having r > 2 and i >=3 order by i desc; select * from (select i, row_number() over() as r, j from t1) x group by j, r, i having r > 2 and i >=3 order by r desc; select * from (select i, row_number() over() as r, j from t1) x group by j, r, i having r > 2 and i >=3 order by i asc, r desc; And yes I realize that grouping by a ROW_NUMBER() is kind of silly since row numbers are unique across rows--but it's good make sure the behavior is correct :) 5) SQL spec indicates that window functions like ROW_NUMBER() can be used in an ORDER BY clause, but that is not yet implemented (right?). Do you think it would be worth it to add a statement saying as much to the documentation? > Add support for ROW_NUMBER() window function > -------------------------------------------- > > Key: DERBY-2998 > URL: https://issues.apache.org/jira/browse/DERBY-2998 > Project: Derby > Issue Type: Sub-task > Components: SQL > Reporter: Thomas Nielsen > Assignee: Thomas Nielsen > Priority: Minor > Attachments: d2998-10.diff, d2998-10.stat, d2998-11.diff, > d2998-12.diff, d2998-12.stat, d2998-13.diff, d2998-13.stat, d2998-14.diff, > d2998-14.stat, d2998-15.diff, d2998-15.stat, d2998-16.diff, d2998-16.stat, > d2998-17.diff, d2998-17.stat, d2998-18.diff, d2998-18.stat, d2998-19.diff, > d2998-19.stat, d2998-4.diff, d2998-4.stat, d2998-5.diff, d2998-5.stat, > d2998-6.diff, d2998-6.stat, d2998-7.diff, d2998-7.stat, d2998-8.diff, > d2998-8.stat, d2998-9-derby.log, d2998-9.diff, d2998-9.stat, > d2998-doc-1.diff, d2998-doc-1.stat, d2998-test.diff, d2998-test.stat, > d2998-test2.diff, d2998-test2.stat, d2998-test3.diff, d2998-test3.stat, > d2998-test4.diff, d2998-test4.stat, d2998-test6.diff, d2998-test7.diff, > d2998-test8.diff, d2998-test9.diff > > > As part of implementing the overall OLAP Operations features of SQL > (DERBY-581), implement the ROW_NUMBER() window function. > More information about this feature is available at > http://wiki.apache.org/db-derby/OLAPRowNumber -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.