[ 
https://issues.apache.org/jira/browse/ARROW-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369582#comment-17369582
 ] 

Jim Pivarski commented on ARROW-13151:
--------------------------------------

Great, thank you! I see now that your calling it a "bug" was commenting on
Joris's question about whether it ought to be supported, and that's what I was 
responding to.

When this is fixed, it will be a new minimum version of Arrow for us because of 
its importance in our work.

(As a side-note, if you do change the ugly "list.item" access, we'll have to 
adjust because of course we're generating column names to request them like 
that. So if that changes, we'll definitely need to pin a minimum Arrow version 
because the new names would be incompatible. I'd prefer that it not change; after 
all, it's what's in the Parquet schema. Maybe "synonyms" could hide that 
feature from high-level users, though that complicates the interface.)
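
A minimal sketch of what generating such column names could look like. This is 
illustrative only, not code from any project mentioned here: the helper 
`parquet_column_paths` and the dict/list schema description are hypothetical, 
and the "list.item" segment simply follows the path used in this issue.

```python
def parquet_column_paths(name, node):
    """Yield dotted leaf-column paths for a nested type description.

    `node` is a hypothetical stand-in for a real schema:
    a dict describes a struct, a one-element list describes a list,
    and anything else (e.g. the string "int64") is a leaf.
    """
    if isinstance(node, dict):
        # Struct: descend into each child field by name.
        for child_name, child in node.items():
            yield from parquet_column_paths(f"{name}.{child_name}", child)
    elif isinstance(node, list):
        # List: assumed to add the "list.item" levels seen in this issue.
        (item,) = node
        yield from parquet_column_paths(f"{name}.list.item", item)
    else:
        # Leaf: the accumulated path names one column.
        yield name


# Same shape as the table in the issue: root is a list of structs.
schema = {"root": [{"addr": {"this": "int64", "that": "int64"}}]}
paths = [
    path
    for top_name, node in schema.items()
    for path in parquet_column_paths(top_name, node)
]
print(paths)
```

Because the "list.item" segment is hard-coded in generators like this, any 
renaming on the Arrow side would require a matching change and a version pin.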

> [Python] Unable to read single child field of struct column from Parquet
> ------------------------------------------------------------------------
>
>                 Key: ARROW-13151
>                 URL: https://issues.apache.org/jira/browse/ARROW-13151
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>            Reporter: Angus Hollands
>            Priority: Major
>
> Given the following table
> {code:java}
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> data = {"root": [[{"addr": {"this": 3, "that": 3}}]]}
> table = pa.Table.from_pydict(data)
> {code}
> reading the nested column leads to a `pyarrow.lib.ArrowInvalid` error:
> {code}
> pq.write_table(table, "/tmp/table.parquet")
> file = pq.ParquetFile("/tmp/table.parquet")
> array = file.read(["root.list.item.addr.that"])
> {code}
> Traceback:
> {code}
> Traceback (most recent call last):
>   File "....", line 21, in <module>
>     array = file.read(["root.list.item.addr.that"])
>   File "/home/angus/.mambaforge/envs/awkward/lib/python3.9/site-packages/pyarrow/parquet.py", line 383, in read
>     return self.reader.read_all(column_indices=column_indices,
>   File "pyarrow/_parquet.pyx", line 1097, in pyarrow._parquet.ParquetReader.read_all
>   File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: List child array invalid: Invalid: Struct child array #0 does not match type field: struct<that: int64> vs struct<that: int64, this: int64>
> {code}
> It's possible that I don't quite understand this properly - am I doing 
> something wrong?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
