[ https://issues.apache.org/jira/browse/PARQUET-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030160#comment-17030160 ]
Deepak Majeti commented on PARQUET-1781:
----------------------------------------

Even though the 1.3 writer wrote the "min_value", "max_value" along with the old "min", "max", the new statistics are not valid since the column order is not set according to the Parquet spec. In a way, this is a bug in the 1.3 reader to return new stats without verifying the column order. The reader in 1.4 does the right thing.

> [C++] 1.4.0+ reader ignore stats created by 1.3.* writer
> --------------------------------------------------------
>
>                 Key: PARQUET-1781
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1781
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>    Affects Versions: cpp-1.4.0, cpp-1.5.0
>            Reporter: Milos Sukovic
>            Priority: Major
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> [https://github.com/apache/arrow/commit/d257a88ed612301c0411894dfa783fcbff1bc867]
> In referenced commit, change to metadata.cc file changed the way for checking
> if new stats (min_value/max_value) are used.
> From
>     if (metadata.statistics.__isset.max_value ||
>         metadata.statistics.__isset.min_value)
> to
>     if (descr->column_order().get_order() == ColumnOrder::TYPE_DEFINED_ORDER)
>
> This change is breaking backward compat - all files which contain new stats
> (min_value/max_value), and are created before this change are valid, but they
> do not set column order flag.
> After this change, those stats are ignored, because column order flag is
> checked.
> Possible fix would be something like:
>     if (descr->column_order().get_order() == ColumnOrder::TYPE_DEFINED_ORDER ||
>         (version == parquetcpp 1.3.* && (metadata.statistics.__isset.max_value ||
>          metadata.statistics.__isset.min_value)))
> I checked parquet-mr, and it seems like there, columnOrder is introduced as
> part of the same change as min_value and max_value, so issue shouldn't happen
> for files created by java code, but probably, stats are ignored by their
> reader too for files created by parquet-cpp 1.3.*.
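
To make the two positions concrete, here is a rough, self-contained sketch of the selection logic under discussion. It is not the parquet-cpp code: `RawStatistics`, `SelectStats`, `StatsPolicy`, and `IsLegacyCppWriter` are hypothetical names invented for illustration, and the `created_by` prefix used to detect a 1.3.* writer is an assumption. The strict policy mirrors the column-order check described for the 1.4 reader; the lenient policy mirrors the fallback proposed in the issue. Any sort-order handling for the deprecated fields is left out.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins for the Thrift-generated metadata; not the real parquet-cpp types.
enum class ColumnOrder { kUndefined, kTypeDefinedOrder };

struct RawStatistics {
  std::optional<std::string> min;        // deprecated "min"
  std::optional<std::string> max;        // deprecated "max"
  std::optional<std::string> min_value;  // new "min_value"
  std::optional<std::string> max_value;  // new "max_value"
};

struct SelectedStats {
  std::optional<std::string> min;
  std::optional<std::string> max;
  bool from_new_fields = false;
};

enum class StatsPolicy {
  kStrict,                // trust min_value/max_value only when the column order is set
  kTrustLegacyCppWriter   // additionally trust them for files from a 1.3.* C++ writer
};

// Assumption: a 1.3.* C++ writer identifies itself with this created_by prefix.
inline bool IsLegacyCppWriter(const std::string& created_by) {
  static const std::string kPrefix = "parquet-cpp version 1.3.";
  return created_by.rfind(kPrefix, 0) == 0;  // true when created_by starts with kPrefix
}

inline SelectedStats SelectStats(const RawStatistics& stats, ColumnOrder order,
                                 const std::string& created_by, StatsPolicy policy) {
  const bool has_new = stats.min_value.has_value() || stats.max_value.has_value();
  const bool order_ok = (order == ColumnOrder::kTypeDefinedOrder);
  const bool legacy_ok = (policy == StatsPolicy::kTrustLegacyCppWriter) &&
                         IsLegacyCppWriter(created_by) && has_new;

  SelectedStats out;
  if (order_ok || legacy_ok) {
    // New fields are considered valid: either the column order vouches for them,
    // or the lenient policy accepts them from a known legacy writer.
    out.min = stats.min_value;
    out.max = stats.max_value;
    out.from_new_fields = true;
  } else {
    // Fall back to the deprecated fields.
    out.min = stats.min;
    out.max = stats.max;
  }
  return out;
}
```

With `StatsPolicy::kStrict`, a file whose column order is unset falls back to the deprecated `min`/`max` even when `min_value`/`max_value` are present. With `StatsPolicy::kTrustLegacyCppWriter`, the same file yields the new fields, provided `created_by` matches the assumed 1.3.* prefix.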