alamb commented on issue #4854: URL: https://github.com/apache/arrow-datafusion/issues/4854#issuecomment-1400821159
Hi @jackwener -- you are right this is tricky. I think the correct semantics are to resolve the ORDER BY in terms of the output schema of the stage (not the output of the select list) > spark-sql> explain extended select uuid from hudi_test order by price+1; As I recall the way postgres handled this case was to add `price+1` to the select list but mark it "hidden" so it was removed from the final select list Maybe we can also special case using the postgres model of "resolve using select list" and if that is not possible, try and "pull" up relevant columns through the Projection. For example, if the input Plan to `order_by` https://github.com/apache/arrow-datafusion/blob/master/datafusion/sql/src/query.rs#L144 was ```text Projection(column1) TableScan(column1, column2) ``` And the sort was by `column2`: Today the code will try to put the sort *above* the projection: ```text Sort(column2) <-- will error / resolve to the incorrect alias Projection(column1) TableScan(column1, column2) ``` Perhaps we could put the Sort *below* the Projection like ```text Projection(column1) Sort(coumn2) TableScan(column1, column2) ``` 🤔 I can take a shot at doing this if you wanted. A few of our users have hit this so I am incentivized to try and help this Here are some other examples from postgres ```sql postgres=# create table foo as values (1, 2), (3, 4), (5, 6); SELECT 3 postgres=# select * from foo; column1 | column2 ---------+--------- 1 | 2 3 | 4 5 | 6 (3 rows) postgres=# select column1 from foo order by column2; column1 --------- 1 3 5 (3 rows) postgres=# select column1 from foo order by column1 + column2; column1 --------- 1 3 5 (3 rows) ``` But for trickier stuff postgres requires the expressions to appear directly in the select list ```sql postgres=# select distinct column1 from foo order by column1 + column2; ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list LINE 1: select distinct column1 from foo order by column1 + column2; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org