[ https://issues.apache.org/jira/browse/ARROW-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662343#comment-17662343 ]
Rok Mihevc commented on ARROW-5322:
-----------------------------------

This issue has been migrated to [issue #21784|https://github.com/apache/arrow/issues/21784] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [C++] [Parquet] Parquet files with dictionary page offset as 0 is not readable
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-5322
>                 URL: https://issues.apache.org/jira/browse/ARROW-5322
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: shyam narayan singh
>            Priority: Major
>              Labels: parquet, pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are many Parquet files generated in our customers' environments that can
> be read by the Java Parquet readers but not by the C++ Parquet readers or pyarrow.
> The reason is that the Java readers check "dictionaryPageOffset = 0" to determine
> whether a dictionary page exists, whereas the C++ readers use
> "has_dictionaryPageOffset" (the _isset bit in the Thrift message) to determine
> the same, resulting in incompatible behaviour. This incompatibility is curbing
> pyarrow usage in our customers' environments.
> Making this change makes the C++ Parquet readers and pyarrow more usable and
> compatible with the Java Parquet readers.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
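
The mismatch described in the issue can be illustrated with a small model. This is only a sketch and not the code of either reader: `ColumnChunkMeta` and both function names are hypothetical, and `None` stands in for an optional Thrift field whose _isset bit is false.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnChunkMeta:
    """Hypothetical stand-in for the Thrift column metadata (not a pyarrow class)."""
    data_page_offset: int
    # None models an unset optional Thrift field (_isset bit is false).
    dictionary_page_offset: Optional[int] = None


def has_dictionary_isset_only(meta: ColumnChunkMeta) -> bool:
    """Behaviour the issue attributes to the C++ reader: trust the _isset bit alone."""
    return meta.dictionary_page_offset is not None


def has_dictionary_zero_means_absent(meta: ColumnChunkMeta) -> bool:
    """Behaviour the issue attributes to the Java reader: offset 0 means no dictionary page."""
    return meta.dictionary_page_offset is not None and meta.dictionary_page_offset != 0


# A writer that sets the field but stores 0 in it, as in the reported files.
meta = ColumnChunkMeta(data_page_offset=4, dictionary_page_offset=0)
print(has_dictionary_isset_only(meta))         # True: reader looks for a dictionary page at offset 0
print(has_dictionary_zero_means_absent(meta))  # False: reader skips straight to the data page
```

Under this model the two checks disagree only when the field is set to 0. That is the case the issue asks the C++ reader to treat as "no dictionary page".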