[ https://issues.apache.org/jira/browse/SPARK-34276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425959#comment-17425959 ]
Micah Kornfield commented on SPARK-34276:
-----------------------------------------

Sorry for the late reply. PARQUET-2089 has been a long-standing bug in the C++ implementation where we were setting file_offset to the beginning of the column_chunk metadata and not the actual data page. It's not clear to me if this was a problem in practice before parquet-mr 1.12.

[~gershinsky] Would the fix in PARQUET-2078 make parquet-mr resilient to this bug?

> Check the unreleased/unresolved JIRAs/PRs of Parquet 1.11 and 1.12
> ------------------------------------------------------------------
>
>                 Key: SPARK-34276
>                 URL: https://issues.apache.org/jira/browse/SPARK-34276
>             Project: Spark
>          Issue Type: Task
>          Components: Build, SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuming Wang
>            Assignee: Chao Sun
>            Priority: Blocker
>
> Before the release, we need to double check the unreleased/unresolved
> JIRAs/PRs of Parquet 1.11/1.12 and then decide whether we should
> upgrade/revert Parquet. At the same time, we should encourage the whole
> community to do the compatibility and performance tests for their production
> workloads, including both read and write code paths.
> More details:
> [https://github.com/apache/spark/pull/26804#issuecomment-768790620]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)