[ https://issues.apache.org/jira/browse/ARROW-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francois Saint-Jacques updated ARROW-8703:
------------------------------------------
    Summary: [R] schema$metadata should be properly typed  (was: [R][Parquet] table$schema$metadata is a string)

> [R] schema$metadata should be properly typed
> --------------------------------------------
>
>                 Key: ARROW-8703
>                 URL: https://issues.apache.org/jira/browse/ARROW-8703
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 0.17.0
>            Reporter: René Rex
>            Priority: Critical
>
> I am trying to export numeric data plus some metadata from Python to a Parquet file and read it in R. However, the metadata is a dict in Python but a string in R. I would have expected a list (roughly the R counterpart of a Python dict). Am I missing something? Here is code that demonstrates the issue:
>
> {{import sys}}
> {{import numpy as np}}
> {{import pyarrow as pa}}
> {{import pyarrow.parquet as pq}}
> {{print(sys.version)}}
> {{print(pa.__version__)}}
> {{x = np.random.randint(0, 10, (10, 3))}}
> {{arrays = [pa.array(x[:, i]) for i in range(x.shape[1])]}}
> {{table = pa.Table.from_arrays(arrays=arrays, names=['A', 'B', 'C'],}}
> {{                             metadata=\{'foo': '42'})}}
> {{pq.write_table(table, 'array.parquet', compression='snappy')}}
> {{table = pq.read_table('array.parquet')}}
> {{metadata = table.schema.metadata}}
> {{print(metadata)}}
> {{print(type(metadata))}}
>
> And in R:
>
> {{library(arrow)}}
> {{print(R.version)}}
> {{print(packageVersion("arrow"))}}
> {{table <- read_parquet("array.parquet", as_data_frame = FALSE)}}
> {{metadata <- table$schema$metadata}}
> {{print(metadata)}}
> {{print(is(metadata))}}
> {{print(metadata["foo"])}}
>
> Python output:
> {{3.6.8 (default, Aug 7 2019, 17:28:10)}}
> {{[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]}}
> {{0.13.0}}
> {{OrderedDict([(b'foo', b'42')])}}
> {{<class 'collections.OrderedDict'>}}
>
> R output:
> {{[1] ‘0.17.0’}}
> {{[1] "\n-- metadata --\nfoo: 42"}}
> {{[1] "character"           "vector"              "data.frameRowLabels"}}
> {{[4] "SuperClassMethod"}}
> {{[1] NA}}
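On the Python side the metadata comes back as a mapping whose keys and values are both {{bytes}} (the {{OrderedDict([(b'foo', b'42')])}} printed in the report). A minimal sketch of turning that into the plain string-keyed dict the reporter expected is below. It assumes the metadata is UTF-8 encoded and that {{schema.metadata}} yields either a bytes-to-bytes mapping or {{None}} when no metadata is set; the helper name {{decode_schema_metadata}} is hypothetical and not part of pyarrow. It uses only the standard library, so it runs on the literal value from the report.

```python
from collections import OrderedDict
from typing import Dict, Mapping, Optional


def decode_schema_metadata(metadata: Optional[Mapping[bytes, bytes]]) -> Dict[str, str]:
    """Hypothetical helper: convert byte-keyed schema metadata to a str -> str dict.

    Assumes keys and values are UTF-8 encoded bytes, as in the report.
    """
    if metadata is None:
        # Assumption: no metadata on the schema is represented as None.
        return {}
    return {k.decode("utf-8"): v.decode("utf-8") for k, v in metadata.items()}


# The value printed by the Python script in the report.
raw = OrderedDict([(b'foo', b'42')])
decoded = decode_schema_metadata(raw)

print(decoded)         # {'foo': '42'}
# Values stay strings: '42' was written as a string, so convert explicitly if needed.
print(int(decoded["foo"]))  # 42
```

With pyarrow installed, passing {{pq.read_table('array.parquet').schema.metadata}} to the same helper would apply the conversion to the file from the report. Note that this only normalizes the Python view; it does not change what the R package returns.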