alamb commented on issue #4854:
URL: 
https://github.com/apache/arrow-datafusion/issues/4854#issuecomment-1400821159

   Hi @jackwener  -- you are right this is tricky. I think the correct 
semantics are to resolve the ORDER BY in terms of the output schema of the 
stage (not the output of the select list)
   
   > spark-sql> explain extended select uuid from hudi_test order by price+1;
   
   As I recall the way postgres handled this case was to add `price+1` to the 
select list but mark it "hidden" so it was removed from the final select list
   
   Maybe we can also special case using the postgres model of "resolve using 
select list" and if that is not possible, try and "pull" up relevant columns 
through the Projection. For example, if the input Plan to `order_by` 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/sql/src/query.rs#L144
 was 
   
   ```text
   Projection(column1)
     TableScan(column1, column2)
   ```
   
   And the sort was by `column2`:
   
   Today the code will try to put the sort *above* the projection:
   ```text
   Sort(column2) <-- will error / resolve to the incorrect alias
     Projection(column1)
       TableScan(column1, column2)
   ```
   
   Perhaps we could put the Sort *below* the Projection like
   
   ```text
   Projection(column1)
     Sort(coumn2)
       TableScan(column1, column2)
   ```
   
   🤔 
   
   I can take a shot at doing this if you wanted. A few of our users have hit 
this so I am incentivized to try and help this
   
   Here are some other examples from postgres
   
   ```sql
   postgres=# create table foo as values (1, 2), (3, 4), (5, 6);
   SELECT 3
   postgres=# select * from foo;
    column1 | column2 
   ---------+---------
          1 |       2
          3 |       4
          5 |       6
   (3 rows)
   
   
   postgres=# select column1 from foo order by column2;
    column1 
   ---------
          1
          3
          5
   (3 rows)
   
   postgres=# select column1 from foo order by column1 + column2;
    column1 
   ---------
          1
          3
          5
   (3 rows)
   ```
   
   But for trickier stuff postgres requires the expressions to appear directly 
in the select list
   
   ```sql
   postgres=# select distinct column1 from foo order by column1 + column2;
   ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
   LINE 1: select distinct column1 from foo order by column1 + column2;
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to