Alan Snow created ARROW-12823:
---------------------------------

             Summary: [Parquet][Python] Read and write file/column metadata using pandas attrs
                 Key: ARROW-12823
                 URL: https://issues.apache.org/jira/browse/ARROW-12823
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Parquet, Python
            Reporter: Alan Snow


Related: https://github.com/pandas-dev/pandas/issues/20521

What are the general thoughts on using 
[DataFrame.attrs|https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.attrs.html#pandas-dataframe-attrs]
 and 
[Series.attrs|https://pandas.pydata.org/pandas-docs/stable//reference/api/pandas.Series.attrs.html#pandas-series-attrs]
 for reading and writing file-level and column-level metadata to/from Parquet?

For example, here is how the metadata would be written:
{code:python}
import pandas

pdf = pandas.DataFrame({"a": [1]})
# File-level metadata
pdf.attrs = {"name": "my custom dataset"}
# Column-level metadata
pdf.a.attrs = {"long_name": "Description about data", "nodata": -1, "units": "metre"}
pdf.to_parquet("file.parquet"){code}

Then, when loading the data back in, the attrs would be restored:
{code:python}
pdf = pandas.read_parquet("file.parquet")
pdf.attrs{code}
{"name": "my custom dataset"}
{code:python}
pdf.a.attrs{code}
{"long_name": "Description about data", "nodata": -1, "units": "metre"}
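
One possible way to carry the attrs through, sketched under assumptions: Parquet key/value metadata holds strings only (Arrow exposes schema metadata as bytes keys and values), so non-string values such as "nodata": -1 would need to be serialized. The sketch below uses JSON for that. The key name pandas_attrs and the helpers encode_attrs / decode_attrs are hypothetical names for illustration, not existing pandas or pyarrow API.

```python
import json

# Hypothetical metadata key; not an existing pandas or Arrow convention.
ATTRS_KEY = b"pandas_attrs"


def encode_attrs(frame_attrs, column_attrs):
    """Pack DataFrame-level and per-column attrs into a bytes -> bytes mapping."""
    payload = {"frame": frame_attrs, "columns": column_attrs}
    return {ATTRS_KEY: json.dumps(payload, sort_keys=True).encode("utf-8")}


def decode_attrs(metadata):
    """Inverse of encode_attrs; returns (frame_attrs, column_attrs)."""
    raw = metadata.get(ATTRS_KEY)
    if raw is None:
        # No attrs were stored, so fall back to empty attrs.
        return {}, {}
    payload = json.loads(raw.decode("utf-8"))
    return payload.get("frame", {}), payload.get("columns", {})


frame_attrs = {"name": "my custom dataset"}
column_attrs = {
    "a": {"long_name": "Description about data", "nodata": -1, "units": "metre"}
}

# Round trip: the mapping is what would be attached to the file on write
# and read back on load.
metadata = encode_attrs(frame_attrs, column_attrs)
restored_frame, restored_columns = decode_attrs(metadata)
```

On the write side, a mapping like this could be merged into the Arrow schema metadata before the table is written, and decoded again after reading. Only JSON-serializable attrs values survive this round trip, which is one of the design questions the proposal would have to settle.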

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
