logan-keede commented on issue #16137: URL: https://github.com/apache/datafusion/issues/16137#issuecomment-2907673352
https://github.com/apache/datafusion/blob/34f250a2b4800845b5c4e61bd928ddbbc4af7ba0/datafusion/expr/src/logical_plan/invariants.rs#L174-L201

DataFusion tries to predict the maximum number of rows a subquery can return instead of checking the actual number of rows at runtime. DuckDB used to handle this by limiting the subquery to one row; it has since changed the behaviour to raise an error by adding something like `case when rows > 1 then error(err_msg)` to the projection itself.

```
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│ CASE WHEN ((#1 > 1)) THEN │
│ (error('More than one row │
│  returned by a subquery   │
│  used as an expression -  │
│   scalar subqueries can   │
│ only return a single row. │
│         Use "SET          │
│ scalar_subquery_error_on_m│
│  ultiple_rows=false" to   │
│    revert to previous     │
│  behavior of returning a  │
│ random row.')) ELSE #0 END│
│                           │
│          ~1 Rows          │
└─────────────┬─────────────┘
```

[Reference PR](https://github.com/duckdb/duckdb/pull/13514)

I tried using `limit` and got

```sql
> WITH src AS (
    SELECT * FROM (VALUES
      (1, NULL, 'Europe'),
      (2, 1, 'Warsaw'),
      (3, 1, 'Paris')
    ) t(id, parent_id, name)
  )
  SELECT
    id,
    name,
    (SELECT p.name FROM src p WHERE p.id = s.parent_id limit 1) AS parent_name
  FROM src s;
This feature is not implemented: Physical plan does not support logical expression ScalarSubquery(<subquery>)
```

which is weird, to say the least. I do not understand why just adding a limit makes a difference.
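
To make the runtime-check idea concrete, here is a minimal Python model of what the DuckDB projection does: count the rows the scalar subquery produced for each outer row and raise only when there is more than one. This is an illustrative sketch, not DuckDB's or DataFusion's actual code. The names `ScalarSubqueryError`, `scalar_subquery`, and `parent_names` are hypothetical, and the data is the `src` table from the query in this comment (without the `limit`).

```python
# Illustrative model only: not DataFusion's or DuckDB's implementation.
# Rows of the `src` table: (id, parent_id, name).
ROWS = [
    (1, None, "Europe"),
    (2, 1, "Warsaw"),
    (3, 1, "Paris"),
]


class ScalarSubqueryError(Exception):
    """Raised when a scalar subquery yields more than one row."""


def scalar_subquery(matches):
    """Runtime check, analogous to CASE WHEN count > 1 THEN error(...) ELSE value END."""
    if len(matches) > 1:
        raise ScalarSubqueryError(
            "More than one row returned by a subquery used as an expression"
        )
    # Zero rows evaluate to NULL (None); exactly one row yields its value.
    return matches[0] if matches else None


def parent_names(rows):
    """Evaluate (SELECT p.name FROM src p WHERE p.id = s.parent_id) per outer row."""
    out = []
    for id_, parent_id, name in rows:
        # In SQL, `p.id = NULL` is never true, so a NULL parent_id matches nothing.
        matches = [
            p_name
            for p_id, _, p_name in rows
            if parent_id is not None and p_id == parent_id
        ]
        out.append((id_, name, scalar_subquery(matches)))
    return out


RESULT = parent_names(ROWS)
```

With this data every `parent_id` matches at most one `id`, so the check never fires and each row gets its parent name (or `None` for `Europe`). If a second row with `id = 1` were added, evaluating the rows for `Warsaw` and `Paris` would raise `ScalarSubqueryError`. A check based on a predicted maximum row count cannot tell these two cases apart, because `id` is not known to be unique.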
