adriangb commented on issue #20135:
URL: https://github.com/apache/datafusion/issues/20135#issuecomment-3841492089

   > > Maybe something like filename(a.id) (really just any reference to a
   > > column in either table) would force the expression to be
   >
   > It looks to me like what spark / databricks did was to add a special
   > column to each table named `_metadata`
   >
   > And then that `_metadata` column has various fields like `filename`,
   > `row`, etc
   >
   > This might fit nicely into the expression pushdown / rewrite mechanism
   > that you are working on 🤔
   
   The idea of a single struct containing all of the info is interesting.
   
   If we make it a column, we could use the rewrite mechanism at the file
   level to rewrite any reference to a column with this particular name into
   the right data for that file. But adding a column opens up a whole can of
   worms, see:
   https://github.com/apache/datafusion/pull/14362,
   https://github.com/apache/datafusion/pull/14057
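
To make the file-level rewrite idea concrete, here is a minimal sketch. It uses a toy expression type rather than DataFusion's actual expression or rewrite APIs, so `Expr`, `FileContext`, and `rewrite_for_file` are all hypothetical names chosen for illustration. It only shows how a reference to `_metadata.filename` might be replaced by a literal once the file being scanned is known. It handles only `filename`, since a field like `row` is not constant per file.

```rust
// Toy expression tree: a stand-in for a real engine's expression types
// (an assumption for illustration, not DataFusion's API).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    GetField(Box<Expr>, String),
    Literal(String),
    Eq(Box<Expr>, Box<Expr>),
}

// Hypothetical per-file context available when a single file is opened.
struct FileContext {
    filename: String,
}

// Name of the special struct column described in the quoted comment.
const METADATA_COLUMN: &str = "_metadata";

// Recursively replace `_metadata.filename` with a literal holding the
// current file's name; every other expression is rebuilt unchanged.
fn rewrite_for_file(expr: &Expr, file: &FileContext) -> Expr {
    match expr {
        Expr::GetField(base, field) => match (&**base, field.as_str()) {
            (Expr::Column(name), "filename") if name.as_str() == METADATA_COLUMN => {
                Expr::Literal(file.filename.clone())
            }
            // Any other field access: rewrite the base and keep the field.
            _ => Expr::GetField(Box::new(rewrite_for_file(base, file)), field.clone()),
        },
        Expr::Eq(left, right) => Expr::Eq(
            Box::new(rewrite_for_file(left, file)),
            Box::new(rewrite_for_file(right, file)),
        ),
        // Plain columns and literals carry no file-specific data.
        Expr::Column(_) | Expr::Literal(_) => expr.clone(),
    }
}
```

With this shape, a predicate such as `_metadata.filename = 'x'` becomes a comparison between two literals for each file, which could then be simplified without reading the file. The sketch deliberately avoids the harder question raised above, namely how such a column would be exposed in the table schema.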


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

