ethan-tyler commented on issue #6051: URL: https://github.com/apache/datafusion/issues/6051#issuecomment-3792625467
@alamb - Awesome! Happy to take this on, and I appreciate the endorsement!

Starting with an SLT for Parquet external tables:

```sql
CREATE EXTERNAL TABLE t STORED AS PARQUET LOCATION '...';
SELECT col1, input_file_name() FROM t;
```

I'd prefer keeping the semantics "Spark-like" with `input_file_name()`, but can adjust if we prefer `filename()`.

@adriangb - Love it :) It's a trivial expression that defaults to NULL and is rewritten by the file opener. The cost is paid only when it's referenced, and there's no schema mutation. It also composes well with the `(file, row_position)` direction for DV semantics (#13261).

Good point on ensuring pushdown for `GROUP BY filename()` / `ORDER BY filename()`. We may need to do the rewrite early in planning so that downstream operators see the resolved column. I'll reach out if that gets tricky.

I'll prototype the rewrite in the Parquet scan first (a per-row literal, likely dictionary encoded). Once the SLT passes, I'll open a draft PR to iterate on naming and semantics and gather feedback before splitting it into reviewable pieces.
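To make the "defaults to NULL, rewritten by the file opener" idea concrete, here is a minimal sketch in Rust. It assumes a toy expression type rather than DataFusion's real physical-expression or file-opener APIs. `Expr`, `DictColumn`, `rewrite_projection`, `constant_column`, and `eval_constant` are hypothetical names used only for illustration. The placeholder evaluates to NULL until a per-file rewrite swaps in the path. The resulting per-row literal is dictionary encoded as a single value whose keys are all zero.

```rust
/// Toy expression tree (hypothetical; not DataFusion's PhysicalExpr API).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to a column decoded from the file.
    Column(usize),
    /// Placeholder for `input_file_name()`; evaluates to NULL unless rewritten.
    InputFileName,
    /// Constant string value (`None` means NULL).
    Literal(Option<String>),
}

/// Dictionary-encoded string column: row `i` has value `values[keys[i]]`.
#[derive(Debug, PartialEq)]
struct DictColumn {
    values: Vec<Option<String>>,
    keys: Vec<u32>,
}

/// Hypothetical file-opener hook: replace every `InputFileName` placeholder in
/// the projection with a literal holding the path of the file being opened.
/// Other expressions pass through unchanged, so the table schema is untouched.
fn rewrite_projection(exprs: &[Expr], path: &str) -> Vec<Expr> {
    exprs
        .iter()
        .map(|e| match e {
            Expr::InputFileName => Expr::Literal(Some(path.to_string())),
            other => other.clone(),
        })
        .collect()
}

/// Build a column where every row holds the same value: one dictionary
/// entry plus `num_rows` keys that all point at it.
fn constant_column(value: Option<String>, num_rows: usize) -> DictColumn {
    DictColumn {
        values: vec![value],
        keys: vec![0; num_rows],
    }
}

/// Evaluate an expression that does not depend on file data.
/// Returns `None` for `Column`, whose values would come from the decoded batch.
fn eval_constant(expr: &Expr, num_rows: usize) -> Option<DictColumn> {
    match expr {
        // Not rewritten (e.g. no file opener involved): defaults to NULL.
        Expr::InputFileName => Some(constant_column(None, num_rows)),
        Expr::Literal(v) => Some(constant_column(v.clone(), num_rows)),
        Expr::Column(_) => None,
    }
}

fn main() {
    // Mirrors: SELECT col1, input_file_name() FROM t
    let projection = vec![Expr::Column(0), Expr::InputFileName];

    // Before the rewrite, the placeholder is NULL for every row.
    let before = eval_constant(&projection[1], 3).unwrap();
    assert_eq!(before.values, vec![None::<String>]);

    // The (hypothetical) file opener rewrites the projection once per file.
    let rewritten = rewrite_projection(&projection, "data/part-0.parquet");
    assert_eq!(rewritten[0], Expr::Column(0));

    // One dictionary entry, one key per row.
    let names = eval_constant(&rewritten[1], 3).unwrap();
    println!("{:?}", names);
}
```

Every row from a given file shares the same path, so in this sketch the dictionary holds one entry regardless of batch size. Only the key buffer grows with the row count, which is what keeps the per-row literal cheap.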
