[ 
https://issues.apache.org/jira/browse/ARROW-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662343#comment-17662343
 ] 

Rok Mihevc commented on ARROW-5322:
-----------------------------------

This issue has been migrated to [issue 
#21784|https://github.com/apache/arrow/issues/21784] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [C++] [Parquet] Parquet files with dictionary page offset as 0 is not 
> readable 
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-5322
>                 URL: https://issues.apache.org/jira/browse/ARROW-5322
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: shyam narayan singh
>            Priority: Major
>              Labels: parquet, pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Many Parquet files generated in our customers' environment can be read by
> the Java Parquet readers but not by the C++ Parquet readers or pyarrow.
> The reason is that the Java readers check "dictionaryPageOffset = 0" to
> determine whether a dictionary page exists, whereas the C++ readers use
> "has_dictionaryPageOffset" (the _isset bit in the Thrift message) for the
> same purpose, which results in incompatible behaviour. This incompatibility
> is limiting pyarrow usage in our customers' environment.
> Making this change makes the C++ Parquet readers and pyarrow more usable and
> more compatible with the Java Parquet readers.
>  
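
The difference between the two checks described above can be sketched as follows. This is a minimal illustration, not the actual Arrow or parquet-mr code: the `ColumnChunkMeta` class, both `has_dictionary_*` functions, and `first_page_offset` are hypothetical names, and `None` is assumed to stand in for an unset Thrift _isset bit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ColumnChunkMeta:
    """Hypothetical stand-in for the Thrift column metadata."""
    data_page_offset: int
    # None models the Thrift _isset bit being unset (field not written).
    dictionary_page_offset: Optional[int] = None


def has_dictionary_isset_only(meta: ColumnChunkMeta) -> bool:
    # Strict check: the field counts as present whenever it was written,
    # even if the writer stored 0 in it.
    return meta.dictionary_page_offset is not None


def has_dictionary_lenient(meta: ColumnChunkMeta) -> bool:
    # Lenient check (assumed to mirror the Java behaviour described above):
    # an offset of 0 is treated as "no dictionary page".
    return meta.dictionary_page_offset is not None and meta.dictionary_page_offset > 0


def first_page_offset(meta: ColumnChunkMeta,
                      has_dictionary: Callable[[ColumnChunkMeta], bool]) -> int:
    # A reader starts at the dictionary page if one exists,
    # otherwise at the first data page.
    if has_dictionary(meta):
        return meta.dictionary_page_offset
    return meta.data_page_offset


# A file whose writer set the field explicitly to 0 without a dictionary page.
meta = ColumnChunkMeta(data_page_offset=4, dictionary_page_offset=0)
strict_start = first_page_offset(meta, has_dictionary_isset_only)
lenient_start = first_page_offset(meta, has_dictionary_lenient)
```

Under these assumptions the strict check makes the reader start at offset 0, which is not a valid page position, while the lenient check falls back to the data page offset.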



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
