[ 
https://issues.apache.org/jira/browse/ARROW-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van den Bossche updated ARROW-8860:
-----------------------------------------
    Description: 
When writing a table with a Struct typed column, it is read back with garbage 
values when using compression (which is the default):

{code:python}
>>> import pyarrow as pa
>>> from pyarrow import feather

>>> table = pa.table({'col': pa.StructArray.from_arrays([[0, 1, 2], [1, 2, 3]], names=["f1", "f2"])})

# roundtrip through feather
>>> feather.write_feather(table, "test_struct.feather")
>>> table2 = feather.read_table("test_struct.feather")

>>> table2.column("col")
<pyarrow.lib.ChunkedArray object at 0x7f0b0c4d7728>
[
  -- is_valid: all not null
  -- child 0 type: int64
    [
      24,
      1261641627085906436,
      1369095386551025664
    ]
  -- child 1 type: int64
    [
      24,
      1405756815161762308,
      281479842103296
    ]
]
{code}

When not using compression, it is read back correctly:

{code:python}
>>> feather.write_feather(table, "test_struct.feather", compression="uncompressed")
>>> table2 = feather.read_table("test_struct.feather")

>>> table2.column("col")
<pyarrow.lib.ChunkedArray object at 0x7f0b0e466778>
[
  -- is_valid: all not null
  -- child 0 type: int64
    [
      0,
      1,
      2
    ]
  -- child 1 type: int64
    [
      1,
      2,
      3
    ]
]
{code}


  was:
When writing a table with a Struct typed column, it is read back with garbage 
values when using compression (which is the default):

{code:python}
>>> table = pa.table({'col': pa.StructArray.from_arrays([[0, 1, 2], [1, 2, 3]], names=["f1", "f2"])})
>>> table.column("col")
<pyarrow.lib.ChunkedArray object at 0x7f0b0c4d7458>
[
  -- is_valid: all not null
  -- child 0 type: int64
    [
      0,
      1,
      2
    ]
  -- child 1 type: int64
    [
      1,
      2,
      3
    ]
]

# roundtrip through feather
>>> feather.write_feather(table, "test_struct.feather")
>>> table2 = feather.read_table("test_struct.feather")

>>> table2.column("col")
<pyarrow.lib.ChunkedArray object at 0x7f0b0c4d7728>
[
  -- is_valid: all not null
  -- child 0 type: int64
    [
      24,
      1261641627085906436,
      1369095386551025664
    ]
  -- child 1 type: int64
    [
      24,
      1405756815161762308,
      281479842103296
    ]
]
{code}

When not using compression, it is read back correctly:

{code:python}
>>> feather.write_feather(table, "test_struct.feather", compression="uncompressed")
>>> table2 = feather.read_table("test_struct.feather")

>>> table2.column("col")
<pyarrow.lib.ChunkedArray object at 0x7f0b0e466778>
[
  -- is_valid: all not null
  -- child 0 type: int64
    [
      0,
      1,
      2
    ]
  -- child 1 type: int64
    [
      1,
      2,
      3
    ]
]
{code}



> [C++] Compressed Feather file with struct array roundtrips incorrectly
> ----------------------------------------------------------------------
>
>                 Key: ARROW-8860
>                 URL: https://issues.apache.org/jira/browse/ARROW-8860
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> When writing a table with a Struct typed column, it is read back with 
> garbage values when using compression (which is the default):
> {code:python}
> >>> table = pa.table({'col': pa.StructArray.from_arrays([[0, 1, 2], [1, 2, 3]], names=["f1", "f2"])})
> # roundtrip through feather
> >>> feather.write_feather(table, "test_struct.feather")
> >>> table2 = feather.read_table("test_struct.feather")
> >>> table2.column("col")
> <pyarrow.lib.ChunkedArray object at 0x7f0b0c4d7728>
> [
>   -- is_valid: all not null
>   -- child 0 type: int64
>     [
>       24,
>       1261641627085906436,
>       1369095386551025664
>     ]
>   -- child 1 type: int64
>     [
>       24,
>       1405756815161762308,
>       281479842103296
>     ]
> ]
> {code}
> When not using compression, it is read back correctly:
> {code:python}
> >>> feather.write_feather(table, "test_struct.feather", compression="uncompressed")
> >>> table2 = feather.read_table("test_struct.feather")
> >>> table2.column("col")
> <pyarrow.lib.ChunkedArray object at 0x7f0b0e466778>
> [
>   -- is_valid: all not null
>   -- child 0 type: int64
>     [
>       0,
>       1,
>       2
>     ]
>   -- child 1 type: int64
>     [
>       1,
>       2,
>       3
>     ]
> ]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
