[ 
https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-4359:
-----------------------------------------
    Description: 
Hi all,

a while ago I posted this issue:

https://issues.apache.org/jira/browse/ARROW-3866

While working with pyarrow I encountered another potential bug related to 
column metadata: if I create a table containing columns with metadata, 
everything is fine. But after I save the table to Parquet and load it back 
as a table using pq.read_table, the column metadata is gone.

As of now I cannot say whether the metadata is not saved correctly or not 
loaded correctly, as I have no idea how to verify it. Unfortunately I also 
don't have the time to try a lot, but I wanted to let you know anyway.

 
{code}
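import pyarrow as pa
import pyarrow.parquet as pq

path = 'test.parquet'  # example output path (not given in the original report)

# The first field carries key/value metadata; the second is non-nullable.
field0 = pa.field('field1', pa.int64(), metadata=dict(a="A", b="B"))
field1 = pa.field('field2', pa.int64(), nullable=False)
columns = [
    pa.column(field0, pa.array([1, 2])),
    pa.column(field1, pa.array([3, 4]))
]
tab = pa.Table.from_arrays(columns)

# Round-trip the table through Parquet.
pq.write_table(tab, path)

tab2 = pq.read_table(path)
tab2.column(0).field.metadata  # the metadata set on field0 is gone here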
{code}
 

  was:
Hi all,

a while ago I posted this issue:

https://issues.apache.org/jira/browse/ARROW-3866

While working with pyarrow I encountered another potential bug related to 
column metadata: if I create a table containing columns with metadata, 
everything is fine. But after I save the table to Parquet and load it back 
as a table using pq.read_table, the column metadata is gone.

As of now I cannot say whether the metadata is not saved correctly or not 
loaded correctly, as I have no idea how to verify it. Unfortunately I also 
don't have the time to try a lot, but I wanted to let you know anyway. The 
mentioned issue can be used as an example; just add the following lines:

>>> pq.write_table(tab, path)
>>> tab2 = pq.read_table(path)
>>> tab2.column(0).field.metadata


> [Python][Parquet] Column metadata is not saved or loaded in parquet
> -------------------------------------------------------------------
>
>                 Key: ARROW-4359
>                 URL: https://issues.apache.org/jira/browse/ARROW-4359
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Seb Fru
>            Priority: Major
>              Labels: parquet



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
