[ 
https://issues.apache.org/jira/browse/TRAFODION-3296?focusedWorklogId=226235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-226235
 ]

ASF GitHub Bot logged work on TRAFODION-3296:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Apr/19 17:56
            Start Date: 11/Apr/19 17:56
    Worklog Time Spent: 10m 
      Work Description: DaveBirdsall commented on pull request #1828: 
[TRAFODION-3296] Fix handling of [first n] + ORDER BY subqueries
URL: https://github.com/apache/trafodion/pull/1828#discussion_r274554946
 
 

 ##########
 File path: core/sql/optimizer/BindRelExpr.cpp
 ##########
 @@ -6807,6 +6846,100 @@ RelExpr *RelRoot::bindNode(BindWA *bindWA)
   return boundExpr;
 } // RelRoot::bindNode()
 
+
+RelExpr * RelRoot::rewriteFirstNOrderBySubquery(BindWA *bindWA)
+{
+  // We have a subquery of the form:
+  //
+  // select [first n] <x> from <t> order by <y>
+  //
+  // Unfortunately, there are chicken-and-egg problems that prevent us
+  // from pushing the ORDER BY down to the firstN node. (We have to wait
+  // until normalizeNode time to insure we get the right ValueIds, but
+  // the RelRoot containing the ORDER BY has already been deleted by
+  // the time we get there.) To circumvent that, we rewrite the subquery
+  // in this form:
+  //
+  // select <x>
+  // from (select <x>, row_number() over(order by <y>) rn from <t>)
+  // where rn <= n
+  //
+  // Aside: For row subqueries in the top-most select list, the 
+  // chicken-and-egg problem doesn't exist because the top-most RelRoot
+  // does survive until normalizeNode time.
+  //
+  // To perform this transformation, we take the input tree (on the left
+  // below) and rewrite it to the output tree (on the right below). Since
+  // this transformation is being done before binding, we move around
+  // ItemExpr trees instead of ValueIds. Binding happens at the end of
+  // the method, after the rewrite.
+  //
+  //   
+  //                                       new RelRoot (newRelRoot), with copy 
of
+  //                                        original select list
+  //                                          | 
+  //                                       new RenameTable (renameTable), 
+  //                                         with the WHERE rn <= n
+  //                                         clause attached
+  //                                          |
+  //     RelRoot of subquery (this)        original RelRoot (this),
+  //       with original select             with ROW_NUMBER ORDER BY aggregate 
added
+  //       list (originalSelectList)         to select list, ORDER BY removed 
+  //        |                                from RelRoot 
+  //        |                                 |
+  //        |                              new RelSequence (sequenceNode)
+  //        |                                 |
+  //     Subquery tree (query)             Subquery tree (query)            
+
+
+  // point to subquery tree
+  RelExpr * query = child(0)->castToRelExpr();
+
+  // retain a pointer to the present select list to put in our new RelRoot
+  ItemExpr * originalSelectList = getCompExprTree();
+
+  // create "row_number() over(order by <y>) as rn" parse tree, removing the
+  // order by tree from this RelRoot in the process
+  Aggregate * rowNumberOverOrderBy = 
+    new (bindWA->wHeap()) Aggregate(ITM_COUNT, 
+                                    new (bindWA->wHeap()) SystemLiteral(1),
+                                    FALSE /*i.e. not distinct*/,
+                                    ITM_COUNT_STAR__ORIGINALLY,
+                                    '!');
+  rowNumberOverOrderBy->setOLAPInfo(NULL, removeOrderByTree(), -INT_MAX, 0);
+  NAString rowNumberColumnName = "_sys_RN_" + bindWA->fabricateUniqueName();
+  ColRefName * colRefName = new (bindWA->wHeap()) 
ColRefName(rowNumberColumnName, bindWA->wHeap());
+  RenameCol * rename = new (bindWA->wHeap())RenameCol(rowNumberOverOrderBy, 
colRefName);
+
+  // add it to select list of the current RelRoot
+  compExprTree_ = new (bindWA->wHeap()) ItemList(compExprTree_,rename);
+
+  // put a RelSequence node on top of the query node, and make it the child of 
this
+  RelSequence * sequenceNode = new (bindWA->wHeap()) RelSequence(query,NULL);
+  sequenceNode->setHasOlapFunctions(TRUE);
+  this->child(0) = sequenceNode;
 
 Review comment:
   Hmmmm... The presence of an ORDER BY implies that all rows must be read and 
sorted. There is no escaping that. Even in firstn plans (where [first n] + 
ORDER BY is specified at the top level query) we do that. There may be an 
optimization possibility of doing Top N sort rather than a full sort. But in 
any case, all rows must be read.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 226235)
    Time Spent: 2h  (was: 1h 50m)

> Subquery with [first n] + ORDER BY gives wrong answer
> -----------------------------------------------------
>
>                 Key: TRAFODION-3296
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3296
>             Project: Apache Trafodion
>          Issue Type: Bug
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The following test script illustrates the problem:
> {quote}?section setup
> drop table if exists t1;
> drop table if exists t2;
> create table t1 (val integer);
> create table t2 (val integer);
> insert into t1 values(994707150),(1923979352),(-1865644273);
> insert into t2 select * from t1;
> ?section testit
> select [first 1] val from t2 order by val;
> prepare xx from select val from t1 where val in (select [first 1] val from t2 
> order by val);
> explain options 'f' xx;
> -- should return -1865644273, but returns something different
> execute xx;
> {quote}
> When run, the script shows:
> {quote}>>?section testit
> >>
> >>select [first 1] val from t2 order by val;
> VAL 
> -----------
> -1865644273
> --- 1 row(s) selected.
> >>
> >>prepare xx from select val from t1 where val in (select [first 1] val from 
> >>t2 order by val);
> --- SQL command prepared.
> >>
> >>explain options 'f' xx;
> LC RC OP OPERATOR OPT DESCRIPTION CARD
> ---- ---- ---- -------------------- -------- -------------------- ---------
> 5 . 6 root 1.00E+002
> 3 4 5 nested_join 1.00E+002
> . . 4 trafodion_scan T1 1.00E+002
> 2 . 3 hash_groupby 1.00E+000
> 1 . 2 firstn 1.00E+000
> . . 1 trafodion_scan T2 1.00E+002
> --- SQL operation complete.
> >>
> >>-- should return -1865644273, but returns something different
> >>execute xx;
> VAL 
> -----------
> 994707150
> --- 1 row(s) selected.
> >>
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to