[ https://issues.apache.org/jira/browse/ARROW-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629192#comment-17629192 ]

Arthur Passos commented on ARROW-17459:
---------------------------------------

Hi [~willjones127]. I implemented your suggestion of using GetRecordBatchReader
and, at first, things worked as expected. Recently, however, an issue involving
Parquet data was reported, and reverting to the ReadRowGroup approach seems to
fix it. This might be a misuse of the Arrow library on my side, although I have
read the API docs and my usage looks correct.


My question is essentially: should there be a difference in the output when
using the two APIs?

> [C++] Support nested data conversions for chunked array
> -------------------------------------------------------
>
>                 Key: ARROW-17459
>                 URL: https://issues.apache.org/jira/browse/ARROW-17459
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Arthur Passos
>            Assignee: Arthur Passos
>            Priority: Blocker
>
> `FileReaderImpl::ReadRowGroup` fails with "Nested data conversions not 
> implemented for chunked array outputs". It fails on 
> [ChunksToSingle|https://github.com/apache/arrow/blob/7f6b074b84b1ca519b7c5fc7da318e8d47d44278/cpp/src/parquet/arrow/reader.cc#L95]
> Data schema is: 
> {code:java}
>   optional group fields_map (MAP) = 217 {
>     repeated group key_value {
>       required binary key (STRING) = 218;
>       optional binary value (STRING) = 219;
>     }
>   }
> fields_map.key_value.value-> Size In Bytes: 13243589 Size In Ratio: 0.20541047
> fields_map.key_value.key-> Size In Bytes: 3008860 Size In Ratio: 0.046667963
> {code}
> Is there a way to work around this issue in the C++ library?
> In any case, I am willing to implement this, but I need some guidance. I am 
> very new to Parquet (as in, I started reading about it yesterday).
>  
> Probably related to: https://issues.apache.org/jira/browse/ARROW-10958



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
