zclllyybb commented on issue #64800: URL: https://github.com/apache/doris/issues/64800#issuecomment-4790366568
Breakwater-GitHub-Analysis-Slot: slot_688d925f92cd

This content is generated by AI for reference only.

Initial triage: this looks like a real FE planner bug in Hive external partition pruning, not a BE scan failure.

What I checked:

- The issue currently has no comments or labels, and I did not find a public matching PR for this exact stack.
- On current `upstream/master`, and also on `branch-3.1`, `branch-4.0`, and `branch-4.1`, `PruneFileScanPartition.pruneExternalPartitions()` uses `scan.getSelectedPartitions().selectedPartitions` as the selected partition map.
- When `enable_binary_search_filtering_partitions` is enabled, the same rule also calls `externalTable.getSortedPartitionRanges(scan)`.
- For Hive tables, `HMSExternalTable.getSortedPartitionRanges()` obtains ranges from `HivePartitionValues`; it does not derive the ranges from the scan's captured selected-partition map.
- `PartitionPruner.binarySearchFiltering()` can therefore return partition names from the sorted-range source. `PruneFileScanPartition` then does `selectedPartitionItems.put(name, nameToPartitionItem.get(name))`. If that name is absent from the scan's selected map, the value inserted is `null`.
- `LogicalFileScan.SelectedPartitions` then calls `ImmutableMap.copyOf(selectedPartitions)`, which exactly matches the reported exception: `null value in entry: dt=2026-06-22=null`.

Assessment:

The reporter's root-cause direction is plausible and matches the current code path. The binary-search pruning path mixes two partition sources: the scan's captured selected-partition map and Hive's current sorted partition ranges. For Hive, `loadSnapshot()` returns an empty MVCC snapshot, so the planner does not pin a Hive partition-list version. If HMS/cache state changes between scan construction and partition pruning, the binary-search result can contain a partition name that is not present in the scan map, causing the NPE.
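The failure mode described above can be sketched in isolation. This is a minimal, self-contained illustration and not Doris code: the class `PartitionSourceMismatchSketch`, its methods, and the `String` partition items are hypothetical stand-ins, and the JDK's `Map.copyOf` is used in place of Guava's `ImmutableMap.copyOf` (both reject null values, though the exception message differs). The second method shows one possible guard, assuming the caller is able to rebuild or fall back when names are missing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in only; none of these names are Doris classes. */
final class PartitionSourceMismatchSketch {

    /** Mirrors the reported pattern: put(name, map.get(name)) with no presence check. */
    static Map<String, String> collectUnchecked(List<String> prunedNames,
                                                Map<String, String> nameToItem) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (String name : prunedNames) {
            // When the name came from a different partition source, get() returns null.
            selected.put(name, nameToItem.get(name));
        }
        return selected;
    }

    /** Hypothetical guard: keep known names and report unknown ones to the caller. */
    static Map<String, String> collectChecked(List<String> prunedNames,
                                              Map<String, String> nameToItem,
                                              List<String> missingOut) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (String name : prunedNames) {
            String item = nameToItem.get(name);
            if (item == null) {
                missingOut.add(name); // signals a partition-source mismatch
            } else {
                selected.put(name, item);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // The map captured by the scan knows only one partition.
        Map<String, String> scanMap = new HashMap<>();
        scanMap.put("dt=2026-06-21", "item-21");

        // The sorted-range source has since seen a newer partition.
        List<String> fromSortedRanges = List.of("dt=2026-06-21", "dt=2026-06-22");

        Map<String, String> unchecked = collectUnchecked(fromSortedRanges, scanMap);
        try {
            Map.copyOf(unchecked); // stand-in for the immutable copy step
            System.out.println("immutable copy succeeded");
        } catch (NullPointerException e) {
            System.out.println("immutable copy rejected a null value");
        }

        List<String> missing = new ArrayList<>();
        Map<String, String> checked = collectChecked(fromSortedRanges, scanMap, missing);
        System.out.println("kept=" + checked.keySet() + " missing=" + missing);
    }
}
```

In this sketch a non-empty `missing` list is only a signal. Whether the planner should then rebuild the ranges from the scan's map or fall back to sequential pruning is a design decision for the maintainers.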
A practical workaround should be to disable `enable_binary_search_filtering_partitions`, because the sequential pruning path iterates only the scan's selected-partition map and should not produce missing-name/null-value entries.

Suggested maintainer next steps:

- Make the binary-search path use the same partition source as `scan.getSelectedPartitions().selectedPartitions`, for example by building `SortedPartitionRanges` from that map for this pruning pass, or otherwise enforce a coherent snapshot/map before constructing `SelectedPartitions`.
- Do not silently insert `nameToPartitionItem.get(name)` into `selectedPartitionItems` without verifying the value is present; a missing name indicates a partition-source mismatch and should be handled by rebuilding/falling back rather than producing a null map value.
- Add a regression test where the sorted ranges contain a partition name not present in the selected-partition map, and verify the planner does not throw an NPE. A stronger test would simulate Hive partition cache refresh/add-partition during planning.

Information still needed from the reporter to confirm the exact blast radius:

- Exact Doris build/tag for 3.1, and whether the same issue has been reproduced on concrete 4.0/4.1 release tags.
- Minimal Hive table DDL, partition DDL, query SQL, and all relevant session variables.
- Whether `dt=2026-06-22` was added, dropped, or refreshed in HMS while the failing query was being planned.
- Full FE stack trace, query id, and FE logs around external metadata cache refresh/invalidation for the affected table.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
