suxiaogang223 opened a new pull request, #61759: URL: https://github.com/apache/doris/pull/61759
### What problem does this PR solve?

For Iceberg parquet position delete files, Doris currently treats the `file_path` column as dictionary-encoded as long as the column chunk has a dictionary page. That check is too loose: parquet allows mixed encodings in the same column chunk, so a chunk can contain both dictionary-encoded and plain-encoded data pages.

When that happens, Doris builds a `ColumnDictI32` for `file_path`, but the plain decoder later calls `insert_many_strings()`, which fails with:

`Method insert_many_strings is not supported for ColumnDictionary`

This PR fixes the issue by using dictionary-backed decoding for Iceberg position delete `file_path` columns only when the entire parquet column chunk is dictionary encoded. Mixed-encoding chunks now fall back to normal string columns.

It also adds BE unit coverage for:

- fully dictionary-encoded parquet metadata
- mixed dictionary/plain parquet metadata
- parquet metadata without `encoding_stats` but with non-dictionary encodings

### Release note

None

### Check List

- [x] This issue was confirmed with code analysis and user logs
- [x] This change includes unit test coverage
- [ ] Local unit tests were run in this environment

### Testing

Local `git diff --check` passed. BE unit test execution was not run locally because the current build directory on this machine does not include the `doris_be_test` target.
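For reviewers, the "fully dictionary encoded" check described above could be sketched roughly as follows. This is not the code in this PR: `ColumnChunkMeta`, `PageEncodingStats`, and `is_fully_dictionary_encoded` are hypothetical stand-ins for the Thrift-generated parquet metadata types and the actual Doris helper. The enum values follow the parquet-format definitions. The fallback for metadata without `encoding_stats` is an assumed conservative rule: any encoding other than a dictionary encoding or a level encoding (`RLE`, `BIT_PACKED`) disables dictionary-backed decoding.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the Thrift-generated parquet metadata types.
enum class Encoding { PLAIN = 0, PLAIN_DICTIONARY = 2, RLE = 3, BIT_PACKED = 4, RLE_DICTIONARY = 8 };
enum class PageType { DATA_PAGE = 0, INDEX_PAGE = 1, DICTIONARY_PAGE = 2, DATA_PAGE_V2 = 3 };

struct PageEncodingStats {
    PageType page_type;
    Encoding encoding;
    int32_t count;
};

struct ColumnChunkMeta {
    bool has_dictionary_page = false;
    std::vector<Encoding> encodings;
    // Optional in the parquet footer; older writers may omit it.
    std::optional<std::vector<PageEncodingStats>> encoding_stats;
};

inline bool is_dict_encoding(Encoding e) {
    return e == Encoding::PLAIN_DICTIONARY || e == Encoding::RLE_DICTIONARY;
}

// RLE and BIT_PACKED are used for repetition/definition levels, not values.
inline bool is_level_encoding(Encoding e) {
    return e == Encoding::RLE || e == Encoding::BIT_PACKED;
}

// Returns true only if every data page in the chunk is dictionary encoded.
bool is_fully_dictionary_encoded(const ColumnChunkMeta& meta) {
    if (!meta.has_dictionary_page) {
        return false;
    }
    if (meta.encoding_stats.has_value()) {
        bool saw_data_page = false;
        for (const auto& stat : *meta.encoding_stats) {
            bool is_data_page = stat.page_type == PageType::DATA_PAGE ||
                                stat.page_type == PageType::DATA_PAGE_V2;
            if (!is_data_page || stat.count <= 0) {
                continue; // dictionary/index pages do not affect the decision
            }
            if (!is_dict_encoding(stat.encoding)) {
                return false; // mixed chunk: at least one non-dictionary data page
            }
            saw_data_page = true;
        }
        return saw_data_page;
    }
    // No encoding_stats: fall back to the coarse encodings list (assumed rule).
    bool saw_dict = false;
    for (Encoding e : meta.encodings) {
        if (is_dict_encoding(e)) {
            saw_dict = true;
        } else if (!is_level_encoding(e)) {
            return false; // e.g. PLAIN present, cannot prove full dictionary use
        }
    }
    return saw_dict;
}
```

Under this sketch, a caller would build a `ColumnDictI32` for `file_path` only when the function returns true and use a normal string column otherwise. The fallback rule can reject chunks that are in fact fully dictionary encoded, for example when a writer lists `PLAIN` for the dictionary page itself, but that only costs the optimization and never correctness.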
