ethan-tyler commented on issue #6051:
URL: https://github.com/apache/datafusion/issues/6051#issuecomment-3792625467

   @alamb - Awesome! Happy to take this on, appreciate the endorsement! 
   
   Starting with an SLT for Parquet external tables:
   ```sql
   CREATE EXTERNAL TABLE t STORED AS PARQUET LOCATION '...';
   SELECT col1, input_file_name() FROM t;
   ```
   
   I’d prefer keeping the semantics “Spark-like” with `input_file_name()`, but 
can adjust if we prefer `filename()`.
   
   @adriangb - Love it :) A trivial expression that defaults to NULL and is 
rewritten by the file opener. The cost is paid only when it is referenced, and 
there is no schema mutation. It also composes well with the 
`(file, row_position)` direction for DV semantics (#13261).
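   
   To make the idea concrete, here is a rough, self-contained sketch of the 
rewrite. It is a toy model only: `Expr`, `references_file_name`, and `rewrite` 
are hypothetical names for illustration and are not DataFusion's actual types 
or APIs.
   
   ```rust
   // Toy model: these types are hypothetical, not DataFusion's API.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Literal(Option<String>), // None models SQL NULL
       InputFileName,           // placeholder for input_file_name()
   }

   /// True only if the projection mentions the placeholder, so a scan
   /// that never references it can skip the rewrite entirely.
   fn references_file_name(projection: &[Expr]) -> bool {
       projection.iter().any(|e| *e == Expr::InputFileName)
   }

   /// Replace the placeholder with a literal. With no file in scope
   /// (`path == None`) it falls back to NULL; a per-file opener would
   /// pass `Some(path)`. Other expressions are left untouched, so the
   /// table schema itself never changes.
   fn rewrite(projection: &[Expr], path: Option<&str>) -> Vec<Expr> {
       projection
           .iter()
           .map(|e| match e {
               Expr::InputFileName => Expr::Literal(path.map(|p| p.to_string())),
               other => other.clone(),
           })
           .collect()
   }
   ```
   
   In this sketch, each file opener would call `rewrite` with its own path, 
and only when `references_file_name` returns true.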
   
   Good point on ensuring pushdown for `GROUP BY filename()` / `ORDER BY 
filename()`. May need the rewrite early in planning so downstream operators see 
the resolved column. I'll reach out if that gets tricky.
   
   I'll prototype the rewrite in Parquet scan first (per-row literal, likely 
dictionary encoded). Once SLT passes, I'll open a draft PR to iterate on 
naming/semantics and gather feedback before splitting into reviewable pieces.
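   
   For the "per-row literal, likely dictionary encoded" part, a minimal sketch 
of what I have in mind, assuming one dictionary entry per file. `DictColumn` 
and `file_name_column` are hypothetical names for illustration, not Arrow or 
DataFusion types.
   
   ```rust
   /// Hypothetical dictionary-encoded string column: the distinct values
   /// plus one key per row pointing into `values`.
   #[derive(Debug)]
   struct DictColumn {
       values: Vec<String>,
       keys: Vec<u32>,
   }

   /// Build the file-name column for one file: the path is stored once
   /// and every row's key is 0.
   fn file_name_column(path: &str, num_rows: usize) -> DictColumn {
       DictColumn {
           values: vec![path.to_string()],
           keys: vec![0; num_rows],
       }
   }

   impl DictColumn {
       /// Look up the string value for a given row.
       fn get(&self, row: usize) -> &str {
           &self.values[self.keys[row] as usize]
       }
   }
   ```
   
   The path string is stored once per file rather than once per row, which is 
why the per-row cost should stay small.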


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

