Alan Snow created ARROW-12823:
---------------------------------

             Summary: [Parquet][Python] Read and write file/column metadata using pandas attrs
                 Key: ARROW-12823
                 URL: https://issues.apache.org/jira/browse/ARROW-12823
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Parquet, Python
            Reporter: Alan Snow

Related: https://github.com/pandas-dev/pandas/issues/20521

What are the general thoughts on using [DataFrame.attrs|https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.attrs.html#pandas-dataframe-attrs] and [Series.attrs|https://pandas.pydata.org/pandas-docs/stable//reference/api/pandas.Series.attrs.html#pandas-series-attrs] for reading and writing metadata to/from Parquet?

For example, here is how the metadata would be written:
{code:python}
import pandas

pdf = pandas.DataFrame({"a": [1]})
pdf.attrs = {"name": "my custom dataset"}
pdf.a.attrs = {"long_name": "Description about data", "nodata": -1, "units": "metre"}
pdf.to_parquet("file.parquet"){code}

Then, when loading the data back in, the file-level metadata would be restored:
{code:python}
pdf = pandas.read_parquet("file.parquet")
pdf.attrs{code}
{"name": "my custom dataset"}

and likewise the column-level metadata:
{code:python}
pdf.a.attrs{code}
{"long_name": "Description about data", "nodata": -1, "units": "metre"}
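
One detail the proposal would need to settle is serialization: Parquet key/value metadata holds strings (Arrow exposes it as bytes to bytes), while attrs can hold values such as the integer -1. Below is a minimal sketch of one possible round trip using JSON. It is not what pandas or pyarrow currently do; the key name {{pandas_attrs}} and the helpers {{pack_attrs}} / {{unpack_attrs}} are hypothetical and exist only for illustration.

```python
import json

# Hypothetical metadata key; not a name reserved by pandas or pyarrow.
ATTRS_KEY = b"pandas_attrs"


def pack_attrs(attrs):
    """Encode an attrs dict as bytes -> bytes key/value metadata."""
    return {ATTRS_KEY: json.dumps(attrs, sort_keys=True).encode("utf-8")}


def unpack_attrs(metadata):
    """Decode attrs from key/value metadata; return {} if the key is absent."""
    raw = (metadata or {}).get(ATTRS_KEY)
    return json.loads(raw.decode("utf-8")) if raw is not None else {}


# The attrs from the example above: one dict for the frame, one for column "a".
frame_attrs = {"name": "my custom dataset"}
column_attrs = {"long_name": "Description about data", "nodata": -1, "units": "metre"}

frame_meta = pack_attrs(frame_attrs)    # candidate for schema-level metadata
column_meta = pack_attrs(column_attrs)  # candidate for field-level metadata

# Reading reverses the encoding and preserves value types such as int.
restored_frame = unpack_attrs(frame_meta)
restored_column = unpack_attrs(column_meta)
```

With pyarrow, dictionaries of this shape could presumably be attached through {{Table.replace_schema_metadata}} for the file level and {{Field.with_metadata}} for each column. JSON only covers JSON-serializable values, so attrs holding NumPy scalars or arbitrary objects would need a different encoding or an explicit error.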