seddonm1 opened a new pull request #9944: URL: https://github.com/apache/arrow/pull/9944
For lineage and diffing purposes (used by protocols such as Delta Lake), it can be useful to know which source file the input data of a DataFrame came from. This PR adds an `input_file_name` function which, like Spark's function of the same name, returns the name of the file being read, or NULL if that is not available. Unfortunately, an Arrow RecordBatch cannot correctly serialise this information, so the value is available at runtime only. See: https://lists.apache.org/thread.html/rd1ab179db7e899635351df7d5de2286915cc439fd1f48e0057a373db%40%3Cdev.arrow.apache.org%3E
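To make the intended semantics concrete, here is a minimal, dependency-free sketch. It does not use the real Arrow or DataFusion APIs. `Batch`, `input_file_name`, and `round_trip` are hypothetical stand-ins, assuming that each batch carries an optional, in-memory-only source file name. The function yields one value per row: the file name when it is known, otherwise `None` (NULL). Any step that serialises the batch drops the name, which mirrors the "runtime only" limitation described above.

```rust
/// Hypothetical stand-in for a RecordBatch plus runtime-only metadata.
/// `source_file` lives only in memory and is not part of the serialised form.
#[derive(Debug, Clone)]
struct Batch {
    num_rows: usize,
    source_file: Option<String>,
}

/// Hypothetical evaluation of `input_file_name()` over one batch:
/// one value per row, the file name if known, otherwise None (NULL).
fn input_file_name(batch: &Batch) -> Vec<Option<String>> {
    vec![batch.source_file.clone(); batch.num_rows]
}

/// Hypothetical serialise/deserialise round trip: the row data survives,
/// but the runtime-only source file name is lost.
fn round_trip(batch: &Batch) -> Batch {
    Batch {
        num_rows: batch.num_rows,
        source_file: None,
    }
}

fn main() {
    // A batch produced by a file scan keeps its source name at runtime.
    let scanned = Batch {
        num_rows: 2,
        source_file: Some("part-0000.parquet".to_string()),
    };
    // A batch built in memory has no source file, so every row is NULL.
    let in_memory = Batch {
        num_rows: 1,
        source_file: None,
    };

    println!("{:?}", input_file_name(&scanned));
    println!("{:?}", input_file_name(&in_memory));
    // After a round trip the name is gone, so the result is NULL again.
    println!("{:?}", input_file_name(&round_trip(&scanned)));
}
```

Returning `Option<String>` per row keeps "no known source" distinct from an empty file name. That distinction is why NULL is a natural choice for the unavailable case.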