[ https://issues.apache.org/jira/browse/ARROW-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc updated ARROW-5322:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/21784

> [C++] [Parquet] Parquet files with dictionary page offset as 0 is not
> readable
> -------------------------------------------------------------------------------
>
>                 Key: ARROW-5322
>                 URL: https://issues.apache.org/jira/browse/ARROW-5322
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: shyam narayan singh
>            Priority: Major
>              Labels: parquet, pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are many Parquet files generated in our customers' environments that
> can be read by the Java Parquet readers but not by the C++ Parquet readers or
> pyarrow. The reason is that the Java readers check for
> "dictionaryPageOffset = 0" to determine whether a dictionary page exists,
> whereas the C++ readers use "has_dictionaryPageOffset" (the __isset bit in the
> Thrift message) to determine the same thing, which results in incompatible
> behaviour. This incompatibility is curbing pyarrow usage in our customers'
> environments.
> Handling a dictionary page offset of 0 the same way the Java readers do makes
> the C++ Parquet readers and pyarrow more usable and compatible with the Java
> Parquet readers.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
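To make the mismatch described in the issue concrete, here is a minimal C++ sketch of the two checks. `ColumnMetaDataSketch`, its fields, and all helper functions are hypothetical stand-ins for the Thrift `ColumnMetaData` fields, not the actual parquet-cpp or parquet-mr code. The "compatible" check (field set AND non-zero) is one possible way to reconcile the two behaviours, assumed here for illustration, and offsets are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the Thrift ColumnMetaData fields
// discussed in the issue; not the real parquet-cpp or parquet-mr type.
struct ColumnMetaDataSketch {
  int64_t data_page_offset = 0;
  int64_t dictionary_page_offset = 0;
  // Mirrors the Thrift-generated __isset flag for the optional field.
  bool dictionary_page_offset_isset = false;
};

// Behaviour described for the C++ reader: trust the __isset bit alone.
bool HasDictionaryByIsset(const ColumnMetaDataSketch& m) {
  return m.dictionary_page_offset_isset;
}

// Behaviour described for the Java reader: an offset of 0 means
// "no dictionary page", regardless of whether the field was written.
bool HasDictionaryByOffset(const ColumnMetaDataSketch& m) {
  return m.dictionary_page_offset != 0;
}

// Assumed reconciliation: the field must be set AND non-zero.
bool HasDictionaryCompatible(const ColumnMetaDataSketch& m) {
  return m.dictionary_page_offset_isset && m.dictionary_page_offset != 0;
}

// Byte offset at which reading of the column chunk would start:
// the dictionary page if one exists and precedes the data pages,
// otherwise the first data page.
int64_t ColumnChunkStart(const ColumnMetaDataSketch& m) {
  // Assumption: offsets in valid metadata are never negative.
  assert(m.data_page_offset >= 0 && m.dictionary_page_offset >= 0);
  if (HasDictionaryCompatible(m) &&
      m.dictionary_page_offset < m.data_page_offset) {
    return m.dictionary_page_offset;
  }
  return m.data_page_offset;
}
```

For metadata where the field is set but equal to 0, `HasDictionaryByIsset` reports a dictionary page while `HasDictionaryByOffset` does not. With the compatible check, `ColumnChunkStart` falls back to the data page offset instead of seeking to byte 0 of the file, which holds the `PAR1` magic bytes rather than a page header.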