[ https://issues.apache.org/jira/browse/PARQUET-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Micah Kornfield resolved PARQUET-2090.
--------------------------------------
    Resolution: Invalid

> [C++] Parquet writes incorrect file_offset
> -------------------------------------------
>
>                 Key: PARQUET-2090
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2090
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Chao Sun
>            Assignee: Micah Kornfield
>            Priority: Critical
>
> Currently the Parquet writer sets {{file_offset}} in the following way (from
> {{metadata.cc}}):
> {code:cpp}
> if (dictionary_page_offset > 0) {
>   column_chunk_->meta_data.__set_dictionary_page_offset(dictionary_page_offset);
>   column_chunk_->__set_file_offset(dictionary_page_offset + compressed_size);
> } else {
>   column_chunk_->__set_file_offset(data_page_offset + compressed_size);
> }
> {code}
> This doesn't look correct, as it shouldn't take {{compressed_size}} into
> consideration.
> The {{file_offset}} is used when filtering row groups, and the above could
> cause a correctness issue. See SPARK-36696.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)