All, the only documentation I can find about the file metadata column (the hidden _metadata struct) is on the Databricks website:

https://docs.databricks.com/en/ingestion/file-metadata-column.html#file-metadata-column

For reference, here is the struct:

_metadata: struct (nullable = false)
 |-- file_path: string (nullable = false)
 |-- file_name: string (nullable = false)
 |-- file_size: long (nullable = false)
 |-- file_block_start: long (nullable = false)
 |-- file_block_length: long (nullable = false)
 |-- file_modification_time: timestamp (nullable = false)

As far as I can tell, this feature was released as part of Spark 3.2.0, based on this Stack Overflow post:

https://stackoverflow.com/questions/62846669/can-i-get-metadata-of-files-reading-by-spark/77238087#77238087

Unfortunately, I wasn't able to locate this in the release notes, though I may have missed it somehow.

So I have the following questions, and I'm seeking guidance from the list on how best to approach them:

  1. Is the documentation for this feature missing from the Spark 3.2.0 docs, or am I just unable to find it?
  2. While the struct provides file_modification_time, there doesn't seem to be a corresponding file_creation_time.

Should both of these be opened as issues in JIRA? Both seem like simple and useful additions, but submitting PRs for them is beyond my ability without some guidance.

I'm happy to help, especially with a documentation PR, if someone can confirm and point me in the right direction. I don't really have the Java/Scala skills needed to implement the feature.

Thanks for any pointers

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org