shyam narayan singh created ARROW-5322:
------------------------------------------

             Summary: [C++] [Parquet] Parquet files with dictionary page offset 
as 0 is not readable 
                 Key: ARROW-5322
                 URL: https://issues.apache.org/jira/browse/ARROW-5322
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: shyam narayan singh


There are many parquet files generated in our customers' environments that can 
be read by Java parquet readers but not by the C++ parquet readers or pyarrow.

The reason is that the Java readers check "dictionaryPageOffset = 0" to 
determine whether a dictionary page exists, whereas the C++ readers use 
"has_dictionaryPageOffset" (the _isset bit in the Thrift message) to determine 
the same thing, which results in incompatible behaviour. This incompatibility 
is limiting pyarrow usage in our customers' environments.
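
To illustrate the mismatch, here is a minimal sketch of the two checks. The 
struct and function names are hypothetical stand-ins, not the actual 
Thrift-generated code or the reader implementations, and the sketch assumes 
that an offset of 0 means "no dictionary page".

```cpp
#include <cstdint>

// Hypothetical stand-in for the column chunk metadata; the field names are
// illustrative assumptions, not the actual Thrift-generated struct.
struct ColumnMeta {
  int64_t dictionary_page_offset = 0;
  bool has_dictionary_page_offset = false;  // mirrors the Thrift _isset bit
};

// Check based only on the _isset bit, as the issue describes for the C++
// readers: a field that is set to 0 still counts as "dictionary present".
inline bool HasDictionaryByIssetBit(const ColumnMeta& meta) {
  return meta.has_dictionary_page_offset;
}

// Check based on the offset value, as the issue describes for the Java
// readers (assumption: 0 is treated as "no dictionary page").
inline bool HasDictionaryByOffset(const ColumnMeta& meta) {
  return meta.dictionary_page_offset > 0;
}
```

For a file whose metadata has the field set but equal to 0, the first check 
reports a dictionary page and the second does not, which is the disagreement 
described above.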

Handling a dictionary page offset of 0 the same way in the C++ readers would 
make the C++ parquet readers and pyarrow more usable and compatible with the 
Java parquet readers.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)