René Rex created ARROW-8703:
-------------------------------

Summary: [R][Parquet] table$schema$metadata is a string
Key: ARROW-8703
URL: https://issues.apache.org/jira/browse/ARROW-8703
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 0.17.0
Reporter: René Rex

I am trying to export numeric data plus some metadata from Python to a Parquet file and read it in R. However, the metadata is a dict in Python but a single string in R. I would have expected a list (which is roughly the R equivalent of a Python dict). Am I missing something?

Here is the code to demonstrate the issue:

{{import sys}}
{{import numpy as np}}
{{import pyarrow as pa}}
{{import pyarrow.parquet as pq}}

{{print(sys.version)}}
{{print(pa.__version__)}}

{{x = np.random.randint(0, 10, (10, 3))}}
{{arrays = [pa.array(x[:, i]) for i in range(x.shape[1])]}}
{{table = pa.Table.from_arrays(arrays=arrays, names=['A', 'B', 'C'],}}
{{                             metadata=\{'foo': '42'})}}
{{pq.write_table(table, 'array.parquet', compression='snappy')}}

{{table = pq.read_table('array.parquet')}}
{{metadata = table.schema.metadata}}
{{print(metadata)}}
{{print(type(metadata))}}

And in R:

{{library(arrow)}}

{{print(R.version)}}
{{print(packageVersion("arrow"))}}

{{table <- read_parquet("array.parquet", as_data_frame = FALSE)}}
{{metadata <- table$schema$metadata}}
{{print(metadata)}}
{{print(is(metadata))}}
{{print(metadata["foo"])}}

Output Python:

{{3.6.8 (default, Aug 7 2019, 17:28:10)}}
{{[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]}}
{{0.13.0}}
{{OrderedDict([(b'foo', b'42')])}}
{{<class 'collections.OrderedDict'>}}

Output R:

{{[1] ‘0.17.0’}}
{{[1] "\n-- metadata --\nfoo: 42"}}
{{[1] "character" "vector" "data.frameRowLabels"}}
{{[4] "SuperClassMethod"}}
{{[1] NA}}
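
For reference, the Python side returns the metadata as a mapping with bytes keys and bytes values (see the {{OrderedDict([(b'foo', b'42')])}} output above), which is the key/value shape I would have expected to get back in R as well. A minimal sketch of turning that mapping into a plain str dict follows. The helper name {{decode_metadata}} and the UTF-8 default are my own assumptions and not part of the pyarrow API. The sample mapping is copied from the output above, so the snippet runs without pyarrow installed.

```python
from collections import OrderedDict


def decode_metadata(metadata, encoding="utf-8"):
    """Return schema metadata as a plain dict of str keys and str values.

    Hypothetical helper, not part of pyarrow. `metadata` is expected to look
    like table.schema.metadata: a mapping of bytes to bytes, or None when the
    schema carries no metadata.
    """
    if metadata is None:
        return {}

    def to_str(value):
        # Decode bytes; fall back to str() for anything already textual.
        return value.decode(encoding) if isinstance(value, bytes) else str(value)

    return {to_str(key): to_str(value) for key, value in metadata.items()}


# Sample copied from the Python output in this report.
raw = OrderedDict([(b"foo", b"42")])
decoded = decode_metadata(raw)
print(decoded)         # {'foo': '42'}
print(decoded["foo"])  # 42
```

With a real table, the call would be {{decode_metadata(table.schema.metadata)}}. Lookup by key then works, unlike {{metadata["foo"]}} on the R side, which returns NA because the metadata there is one formatted string.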