[ https://issues.apache.org/jira/browse/ARROW-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-2587.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 6751
[https://github.com/apache/arrow/pull/6751]

> [Python] Unable to write StructArrays with multiple children to parquet
> -----------------------------------------------------------------------
>
>                 Key: ARROW-2587
>                 URL: https://issues.apache.org/jira/browse/ARROW-2587
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.9.0
>            Reporter: jacques
>            Assignee: Micah Kornfield
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.17.0
>
>         Attachments: Screen Shot 2018-05-16 at 12.24.39.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Although I am able to read StructArray from parquet, I am still unable to
> write it back from pa.Table to parquet.
> I get an "ArrowInvalid: Nested column branch had multiple children"
> Here is a quick example:
> {noformat}
> In [2]: import pyarrow.parquet as pq
> In [3]: table = pq.read_table('test.parquet')
> In [4]: table
> Out[4]:
> pyarrow.Table
> weight: double
> animal_type: string
> animal_interpretation: struct<is_large_animal: bool, is_mammal: bool>
>   child 0, is_large_animal: bool
>   child 1, is_mammal: bool
> metadata
> --------
> {'org.apache.spark.sql.parquet.row.metadata':
> '{"type":"struct","fields":[{"name":"weight","type":"double","nullable":true,"metadata":{}},{"name":"animal_type","type":"string","nullable":true,"metadata":{}},{"name":"animal_interpretation","type":{"type":"struct","fields":[{"name":"is_large_animal","type":"boolean","nullable":true,"metadata":{}},{"name":"is_mammal","type":"boolean","nullable":true,"metadata":{}}]},"nullable":false,"metadata":{}}]}'}
> In [5]: table.schema
> Out[5]:
> weight: double
> animal_type: string
> animal_interpretation: struct<is_large_animal: bool, is_mammal: bool>
>   child 0, is_large_animal: bool
>   child 1, is_mammal: bool
> metadata
> --------
> {'org.apache.spark.sql.parquet.row.metadata':
> '{"type":"struct","fields":[{"name":"weight","type":"double","nullable":true,"metadata":{}},{"name":"animal_type","type":"string","nullable":true,"metadata":{}},{"name":"animal_interpretation","type":{"type":"struct","fields":[{"name":"is_large_animal","type":"boolean","nullable":true,"metadata":{}},{"name":"is_mammal","type":"boolean","nullable":true,"metadata":{}}]},"nullable":false,"metadata":{}}]}'}
> In [6]: pq.write_table(table,"test_write.parquet")
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-6-bd9d7deee437> in <module>()
> ----> 1 pq.write_table(table,"test_write.parquet")
> /usr/local/lib/python2.7/dist-packages/pyarrow/parquet.pyc in
> write_table(table, where, row_group_size, version, use_dictionary,
> compression, use_deprecated_int96_timestamps, coerce_timestamps, flavor,
> **kwargs)
>     982                 use_deprecated_int96_timestamps=use_int96,
>     983                 **kwargs) as writer:
> --> 984             writer.write_table(table, row_group_size=row_group_size)
>     985     except Exception:
>     986         if is_path(where):
> /usr/local/lib/python2.7/dist-packages/pyarrow/parquet.pyc in
> write_table(self, table, row_group_size)
>     325             table = _sanitize_table(table, self.schema, self.flavor)
>     326         assert self.is_open
> --> 327         self.writer.write_table(table, row_group_size=row_group_size)
>     328
>     329     def close(self):
> /usr/local/lib/python2.7/dist-packages/pyarrow/_parquet.so in
> pyarrow._parquet.ParquetWriter.write_table()
> /usr/local/lib/python2.7/dist-packages/pyarrow/lib.so in
> pyarrow.lib.check_status()
> ArrowInvalid: Nested column branch had multiple children
> {noformat}
>
> I would really appreciate a fix on this.
> Best,
> Jacques



--
This message was sent by Atlassian Jira
(v8.3.4#803005)