Pierre Belzile created ARROW-8657:
-------------------------------------

             Summary: Distinguish parquet version 2 logical type vs DataPageV2
                 Key: ARROW-8657
                 URL: https://issues.apache.org/jira/browse/ARROW-8657
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.17.0
            Reporter: Pierre Belzile


Since the 0.17 release, ParquetVersion controls both the logical type
interpretation of fields and the selection of the DataPage format.

As a result, all Parquet files that were created with ParquetVersion::V2 to get
features such as unsigned int32s, timestamps with nanosecond resolution, etc.
are now unreadable. In my case, that is TBs of data.

These two concerns should be separated. DataPageV2 pages were not written
before 0.17, so to keep existing files readable, the existing version property
should continue to behave as it did in 0.16 and govern the logical type
mapping.
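A minimal sketch of the proposed separation follows, modeling writer options in
plain Python. WriterOptions, data_page_version, uses_v2_logical_types and
uses_data_page_v2 are hypothetical names chosen for illustration. They are not
the actual Arrow C++ or pyarrow API. The point is only that requesting V2
logical types no longer implies writing DataPageV2 pages.

```python
from dataclasses import dataclass


# Hypothetical model of the proposed writer options; these names are
# illustrative only and are NOT the real Arrow C++ or pyarrow API.
@dataclass(frozen=True)
class WriterOptions:
    # Governs logical type mapping (e.g. uint32, nanosecond timestamps),
    # as the version property did in 0.16.
    version: str = "1.0"
    # Governs the data page format independently; V1 pages remain the
    # default, matching what was written before 0.17.
    data_page_version: str = "1.0"

    def uses_v2_logical_types(self) -> bool:
        return self.version == "2.0"

    def uses_data_page_v2(self) -> bool:
        return self.data_page_version == "2.0"


# Asking for V2 logical types leaves the page format at V1.
opts = WriterOptions(version="2.0")
print(opts.uses_v2_logical_types(), opts.uses_data_page_v2())
```

With the two settings decoupled, a writer that only wants V2 logical types keeps
producing V1 data pages, and DataPageV2 becomes an explicit opt-in.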

Consideration should be given to issuing a 0.17.1 release.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to