adriangb commented on issue #20135: URL: https://github.com/apache/datafusion/issues/20135#issuecomment-3841492089
> > Maybe something like filename(a.id) (really just any reference to a column in either table) would force the expression to be
>
> It looks to me like what spark / databricks did was to add a special column to each table named `_metadata`
>
> And then that `_metadata` column has various fields like `filename`, `row`, etc
>
> This might fit nicely into the expression pushdown / rewrite mechanism that you are working on 🤔

The idea of a single struct containing all of the info is interesting. If we make it a column, we could use the rewrite stuff at the file level to rewrite any column with this particular name into the right data for that file. But adding a column opens up a whole can of worms: https://github.com/apache/datafusion/pull/14362, https://github.com/apache/datafusion/pull/14057
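To make the file-level rewrite idea concrete, here is a minimal sketch under stated assumptions. It uses a toy expression tree, not DataFusion's `Expr` or `PhysicalExpr`. The names `Expr`, `FileMeta`, `rewrite_for_file` and `METADATA_COLUMN` are all hypothetical. When a file is opened, any `_metadata.<field>` reference whose value is constant for that file (such as `filename`) is replaced with a literal. Fields like `row` vary per row, so they cannot become a per-file literal and are left untouched here.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified expression tree (NOT DataFusion's `Expr`/`PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    GetField(Box<Expr>, String), // struct field access, e.g. `_metadata.filename`
    Literal(String),
    Call(String, Vec<Expr>),
}

/// Hypothetical per-file values that are constant for every row of one file.
struct FileMeta {
    values: HashMap<String, String>,
}

/// Hypothetical name of the special struct column.
const METADATA_COLUMN: &str = "_metadata";

/// Replace `_metadata.<field>` with a literal when the field is known for this file.
/// Unknown fields (e.g. a per-row `row` index) are left as-is.
fn rewrite_for_file(expr: &Expr, file: &FileMeta) -> Expr {
    match expr {
        Expr::GetField(inner, field) => {
            if let Expr::Column(name) = &**inner {
                if name == METADATA_COLUMN {
                    if let Some(v) = file.values.get(field) {
                        return Expr::Literal(v.clone());
                    }
                }
            }
            Expr::GetField(Box::new(rewrite_for_file(inner, file)), field.clone())
        }
        // Recurse into function arguments so nested references are rewritten too.
        Expr::Call(f, args) => Expr::Call(
            f.clone(),
            args.iter().map(|a| rewrite_for_file(a, file)).collect(),
        ),
        other => other.clone(),
    }
}

fn main() {
    // concat(_metadata.filename, id)
    let expr = Expr::Call(
        "concat".into(),
        vec![
            Expr::GetField(Box::new(Expr::Column("_metadata".into())), "filename".into()),
            Expr::Column("id".into()),
        ],
    );
    let mut values = HashMap::new();
    values.insert("filename".to_string(), "part-0.parquet".to_string());
    let file = FileMeta { values };
    println!("{:?}", rewrite_for_file(&expr, &file));
}
```

Because the rewrite happens per file, the same logical plan can be reused for every file, with only the substituted literals differing.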
