[GitHub] [arrow] seddonm1 opened a new pull request #9944: ARROW-12290: [Rust][DataFusion] Add input_file_name function

GitBox Wed, 07 Apr 2021 19:30:16 -0700


seddonm1 opened a new pull request #9944:
URL: https://github.com/apache/arrow/pull/9944



   For lineage and diffing purposes (as used by protocols such as Delta Lake) it 
can be useful to know the source of the input data for a DataFrame. This adds the 
`input_file_name` function which, like its Spark counterpart, returns the name of 
the file being read, or NULL if it is not available.
   
   Unfortunately the Arrow RecordBatch does not have the ability to serialise 
this information correctly, so the file name is only available at runtime. See: 
https://lists.apache.org/thread.html/rd1ab179db7e899635351df7d5de2286915cc439fd1f48e0057a373db%40%3Cdev.arrow.apache.org%3E
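
   As a rough illustration of the intended semantics only (not the code in 
this PR), the sketch below models a batch whose source file is tracked beside 
the data at runtime. `ScannedBatch` and its fields are hypothetical stand-ins, 
not Arrow or DataFusion types, and the free function `input_file_name` here 
only mirrors the behaviour described above: one value per row, holding the 
file name or NULL (`None`).

```rust
/// Hypothetical stand-in for a batch produced by a file scan.
/// This is NOT an Arrow or DataFusion type; it exists only for illustration.
#[derive(Debug, Clone)]
struct ScannedBatch {
    /// Number of rows in the batch.
    num_rows: usize,
    /// Source file tracked by the reader at runtime; `None` when unknown.
    /// It lives beside the data, so it would be lost on serialisation.
    source_file: Option<String>,
}

/// Sketch of the described behaviour: one value per row, holding the
/// source file name, or `None` (SQL NULL) when no file name is available.
fn input_file_name(batch: &ScannedBatch) -> Vec<Option<&str>> {
    vec![batch.source_file.as_deref(); batch.num_rows]
}
```

   Under these assumptions, a batch read from a file yields that file's name 
for every row, while a batch with no known source (for example one built in 
memory) yields NULL for every row.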


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
