jorisvandenbossche commented on issue #11935:
URL: https://github.com/apache/arrow/issues/11935#issuecomment-992476427


   > attributes set in R don't appear to be easily accessible via the python 
implementation and vice versa.
   
   Can you give a more concrete code example?
   
   As far as I know, on the Python side the table schema's metadata is written 
to the Parquet FileMetaData `key_value_metadata` field, which should be the 
standard place to put this.
   
   I am less familiar with the R side, but it seems this is similarly available 
in the R arrow table's metadata:
   
   ```python
   >>> import pyarrow as pa
   # create a table with some top-level metadata
   >>> table = pa.table({"a": [1, 2, 3], "b": [4, 5, 6]})
   >>> table = table.replace_schema_metadata({"a": "long name"})
   # in python this is exposed as a dict
   >>> table.schema.metadata
   {b'a': b'long name'}
   
   >>> import pyarrow.parquet as pq
   >>> pq.write_table(table, "test_metadata.parquet")
   # this metadata is stored in the Parquet FileMetaData "key_value_metadata",
   # in the python interface again exposed as a dict
   >>> file_metadata = pq.read_metadata("test_metadata.parquet")
   >>> file_metadata.metadata
   {b'ARROW:schema': b'/////+gAAAAQAAAAAAAKAA4ABgAFAAgACgAAAAABBAAQAAAAAA....',
    b'a': b'long name'}
   # and after reading also available
   >>> pq.read_table("test_metadata.parquet").schema.metadata
   {b'a': b'long name'}
   ```
   
   Reading the same file from R:
   
   ```R
   > library(arrow)
   > table <- read_parquet("test_metadata.parquet", as_data_frame = FALSE)
   > table
   Table
   3 rows x 2 columns
   $a <int64>
   $b <int64>
   
   See $metadata for additional Schema metadata
   > table$metadata
   $a
   [1] "long name"
   ```
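
On the Python side, both dicts above use bytes for keys and values, and the Parquet-level one additionally carries the serialized `ARROW:schema` entry. To compare the two views (or to line them up with the strings R prints), a small helper along these lines may be handy. This is only a sketch: `decode_user_metadata` is a hypothetical name, not part of pyarrow, and it assumes the metadata is UTF-8 text and that keys starting with `ARROW:` are the internal ones to skip.

```python
def decode_user_metadata(raw, skip_prefixes=(b"ARROW:",)):
    """Return a str -> str copy of a bytes metadata dict, dropping internal keys.

    Hypothetical helper; assumes keys and values are UTF-8 encoded.
    """
    decoded = {}
    # `raw` may be None when no metadata is set, so fall back to an empty dict.
    for key, value in (raw or {}).items():
        if key.startswith(skip_prefixes):
            continue  # skip entries such as b"ARROW:schema"
        decoded[key.decode("utf-8")] = value.decode("utf-8")
    return decoded


# Literal dicts shaped like the two outputs shown above; the ARROW:schema
# payload is replaced by a placeholder here.
schema_metadata = {b"a": b"long name"}
file_metadata = {b"ARROW:schema": b"<serialized schema>", b"a": b"long name"}

print(decode_user_metadata(schema_metadata))  # {'a': 'long name'}
print(decode_user_metadata(file_metadata))    # {'a': 'long name'}
```

In a real session the inputs would be `table.schema.metadata` and `pq.read_metadata(...).metadata` from the example above instead of the literal dicts.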



