[jira] [Commented] (ARROW-5030) [Python] read_row_group fails with Nested data conversions not implemented for chunked array outputs
[ https://issues.apache.org/jira/browse/ARROW-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528890#comment-17528890 ]

Judah commented on ARROW-5030:
------------------------------

[~wesm_impala_7e40] I'm also running into this issue. Is this likely to be fixed / easy to fix? I'd be happy to give it a go but not really sure where to start.

> [Python] read_row_group fails with Nested data conversions not implemented for chunked array outputs
> ----------------------------------------------------------------------------------------------------
>
> Key: ARROW-5030
> URL: https://issues.apache.org/jira/browse/ARROW-5030
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.12.0
> Reporter: Jakub Okoński
> Priority: Major
> Labels: parquet
>
> Hey, I'm trying to concatenate two files and to avoid reading everything to memory at once, I wanted to use `read_row_group` for my solution, but it fails.
>
> I think it's due to fields like these:
> {{pyarrow.Field>}}
>
> But I'm not sure. Is this a duplicate? The issue linked in the code is resolved
> https://github.com/apache/arrow/blob/fd0b90a7f7e65fde32af04c4746004a1240914cf/cpp/src/parquet/arrow/reader.cc#L915
>
> Stacktrace is
>
> {{ File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches}}
> {{ table = pf.read_row_group(ix, columns=self._columns)}}
> {{ File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group}}
> {{ use_threads=use_threads)}}
> {{ File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group}}
> {{ File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status}}
> {{pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs}}

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Issue Comment Deleted] (ARROW-14196) [C++][Parquet] Default to compliant nested types in Parquet writer
[ https://issues.apache.org/jira/browse/ARROW-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Judah updated ARROW-14196:
--------------------------
Comment: was deleted

(was: [~trucnguyenlam] Is this something that could be flipped now? https://github.com/pandas-dev/pandas/pull/43690#pullrequestreview-760364590)

> [C++][Parquet] Default to compliant nested types in Parquet writer
> ------------------------------------------------------------------
>
> Key: ARROW-14196
> URL: https://issues.apache.org/jira/browse/ARROW-14196
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Parquet
> Reporter: Joris Van den Bossche
> Priority: Major
>
> In C++ there is already an option to get the "compliant_nested_types" (to have the list columns follow the Parquet specification), and ARROW-11497 exposed this option in Python.
> This is still set to False by default, but in the source it says "TODO: At some point we should flip this.", and in ARROW-11497 there was also some discussion about what it would take to change the default.
> cc [~emkornfield] [~apitrou]

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-14196) [C++][Parquet] Default to compliant nested types in Parquet writer
[ https://issues.apache.org/jira/browse/ARROW-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423299#comment-17423299 ]

Judah commented on ARROW-14196:
-------------------------------

[~trucnguyenlam] Is this something that could be flipped now? https://github.com/pandas-dev/pandas/pull/43690#pullrequestreview-760364590

> [C++][Parquet] Default to compliant nested types in Parquet writer
> ------------------------------------------------------------------
>
> Key: ARROW-14196
> URL: https://issues.apache.org/jira/browse/ARROW-14196
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Parquet
> Reporter: Joris Van den Bossche
> Priority: Major
>
> In C++ there is already an option to get the "compliant_nested_types" (to have the list columns follow the Parquet specification), and ARROW-11497 exposed this option in Python.
> This is still set to False by default, but in the source it says "TODO: At some point we should flip this.", and in ARROW-11497 there was also some discussion about what it would take to change the default.
> cc [~emkornfield] [~apitrou]

--
This message was sent by Atlassian Jira (v8.3.4#803005)