seddonm1 opened a new pull request #9944:
URL: https://github.com/apache/arrow/pull/9944


   For lineage and diffing purposes (used by protocols like Delta Lake), it can 
be useful to know which file the input data for a DataFrame came from. This adds 
the `input_file_name` function which, like its Spark counterpart, returns the 
name of the file being read, or NULL if it is not available.
   
   Unfortunately, the Arrow RecordBatch cannot serialise this information 
correctly, so the file name is available at runtime only. See: 
https://lists.apache.org/thread.html/rd1ab179db7e899635351df7d5de2286915cc439fd1f48e0057a373db%40%3Cdev.arrow.apache.org%3E
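
To make the intended semantics concrete, here is a minimal sketch in plain Rust. It is not the code in this PR and does not use the Arrow or DataFusion APIs: `SourcedBatch` and its fields are hypothetical stand-ins for a batch that carries its source file name as runtime-only information, and the free function `input_file_name` only mirrors the behaviour described above (one value per row, NULL modelled as `None`).

```rust
/// Hypothetical stand-in for a batch of rows plus runtime-only provenance.
/// This is NOT the Arrow `RecordBatch` type; it exists only for illustration.
#[derive(Debug)]
struct SourcedBatch {
    /// Name of the file the rows were read from, if the scan knows it.
    source_file: Option<String>,
    /// Number of rows in the batch.
    num_rows: usize,
}

/// Returns one value per row: the source file name, or `None` (NULL)
/// when the batch did not come from a file.
fn input_file_name(batch: &SourcedBatch) -> Vec<Option<String>> {
    vec![batch.source_file.clone(); batch.num_rows]
}
```

Under these assumptions, a batch read from a file yields that file's name repeated for every row, while an in-memory batch yields all NULLs. Because the name lives beside the rows rather than in them, it would be lost if only the row data were serialised.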


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

