[ https://issues.apache.org/jira/browse/ARROW-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582918#comment-17582918 ]
Arthur Passos commented on ARROW-17459:
---------------------------------------

[~willjones127] at first glance, it seems to be working. The client code I had was something like the below:
{code:java}
// Read a single row group directly into a Table.
std::shared_ptr<arrow::Table> table;
arrow::Status read_status = file_reader->ReadRowGroup(row_group_current, column_indices, &table);
if (!read_status.ok())
    throw ParsingException{"Error while reading Parquet data: " + read_status.ToString(), ErrorCodes::CANNOT_READ_ALL_DATA};
++row_group_current;
{code}
Now it's the below:
{code:java}
std::shared_ptr<arrow::Table> table;
std::unique_ptr<::arrow::RecordBatchReader> rbr;
std::vector<int> row_group_indices { row_group_current };

// Get a RecordBatchReader restricted to the current row group.
arrow::Status get_batch_reader_status = file_reader->GetRecordBatchReader(row_group_indices, column_indices, &rbr);
if (!get_batch_reader_status.ok())
    throw ParsingException{"Error while reading Parquet data: " + get_batch_reader_status.ToString(), ErrorCodes::CANNOT_READ_ALL_DATA};

// Read every batch from the reader into a Table.
arrow::Status read_status = rbr->ReadAll(&table);
if (!read_status.ok())
    throw ParsingException{"Error while reading Parquet data: " + read_status.ToString(), ErrorCodes::CANNOT_READ_ALL_DATA};
++row_group_current;
{code}
*Question: Should I expect any regressions or different behaviour by changing the code path to the latter?*

> [C++] Support nested data conversions for chunked array
> -------------------------------------------------------
>
>                 Key: ARROW-17459
>                 URL: https://issues.apache.org/jira/browse/ARROW-17459
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Arthur Passos
>            Priority: Blocker
>
> `FileReaderImpl::ReadRowGroup` fails with "Nested data conversions not implemented for chunked array outputs".
> It fails on [ChunksToSingle](https://github.com/apache/arrow/blob/7f6b074b84b1ca519b7c5fc7da318e8d47d44278/cpp/src/parquet/arrow/reader.cc#L95)
> Data schema is:
> {code:java}
> optional group fields_map (MAP) = 217 {
>   repeated group key_value {
>     required binary key (STRING) = 218;
>     optional binary value (STRING) = 219;
>   }
> }
> fields_map.key_value.value-> Size In Bytes: 13243589 Size In Ratio: 0.20541047
> fields_map.key_value.key-> Size In Bytes: 3008860 Size In Ratio: 0.046667963
> {code}
> Is there a way to work around this issue in the cpp lib?
> In any case, I am willing to implement this, but I need some guidance. I am very new to parquet (as in started reading about it yesterday).
>
> Probably related to: https://issues.apache.org/jira/browse/ARROW-10958