zclllyybb commented on issue #64800:
URL: https://github.com/apache/doris/issues/64800#issuecomment-4790366568

   Breakwater-GitHub-Analysis-Slot: slot_688d925f92cd
   This content is generated by AI for reference only.
   
   Initial triage: this looks like a real FE planner bug in Hive external 
partition pruning, not a BE scan failure.
   
   What I checked:
   
   - The issue currently has no comments or labels, and I did not find a public 
PR matching this exact stack trace.
   - On current `upstream/master`, and also on `branch-3.1`, `branch-4.0`, and 
`branch-4.1`, `PruneFileScanPartition.pruneExternalPartitions()` uses 
`scan.getSelectedPartitions().selectedPartitions` as the selected partition map.
   - When `enable_binary_search_filtering_partitions` is enabled, the same rule 
also calls `externalTable.getSortedPartitionRanges(scan)`.
   - For Hive tables, `HMSExternalTable.getSortedPartitionRanges()` obtains 
ranges from `HivePartitionValues`; it does not derive the ranges from the 
scan's captured selected-partition map.
   - `PartitionPruner.binarySearchFiltering()` can therefore return partition 
names from the sorted-range source. `PruneFileScanPartition` then does 
`selectedPartitionItems.put(name, nameToPartitionItem.get(name))`. If that name 
is absent from the scan's selected map, the value inserted is `null`.
   - `LogicalFileScan.SelectedPartitions` then calls 
`ImmutableMap.copyOf(selectedPartitions)`, which exactly matches the reported 
exception: `null value in entry: dt=2026-06-22=null`.
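   
   The failure mode in the list above can be reproduced in isolation. The 
sketch below is not Doris code: the class, method, and partition-item values 
are hypothetical, plain strings stand in for partition items, and the JDK's 
`Map.copyOf` (Java 10+) stands in for Guava's `ImmutableMap.copyOf`. Both 
reject null values with a `NullPointerException`, although the exception 
message differs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the pattern described above; not Doris code.
final class PartitionSourceMismatchDemo {

    // Names come from one source (the sorted ranges), values are looked up
    // in another (the scan's selected-partition map). A name missing from
    // the second source is stored with a null value.
    static Map<String, String> collectSelected(List<String> prunedNames,
                                               Map<String, String> nameToPartitionItem) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (String name : prunedNames) {
            selected.put(name, nameToPartitionItem.get(name)); // may insert null
        }
        return selected;
    }

    // Returns true if taking an immutable copy throws NullPointerException.
    // Map.copyOf is used here as a stand-in for ImmutableMap.copyOf.
    static boolean immutableCopyFails(Map<String, String> selected) {
        try {
            Map.copyOf(selected);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The scan captured only one partition...
        Map<String, String> scanMap = Map.of("dt=2026-06-21", "item-21");
        // ...but the sorted-range source already knows about a newer one.
        List<String> fromSortedRanges = List.of("dt=2026-06-21", "dt=2026-06-22");

        Map<String, String> selected = collectSelected(fromSortedRanges, scanMap);
        System.out.println(selected);
        System.out.println("immutable copy fails: " + immutableCopyFails(selected));
    }
}
```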
   
   Assessment:
   
   The reporter's root-cause direction is plausible and matches the current 
code path. The binary-search pruning path mixes two partition sources: the 
scan's captured selected-partition map and Hive's current sorted partition 
ranges. For Hive, `loadSnapshot()` returns an empty MVCC snapshot, so the 
planner does not pin a Hive partition-list version. If HMS/cache state changes 
between scan construction and partition pruning, the binary-search result can 
contain a partition name that is not present in the scan map, causing the NPE.
   
   A likely workaround is to disable 
`enable_binary_search_filtering_partitions`, because the sequential pruning 
path iterates only the scan's selected-partition map and so should not produce 
missing-name/null-value entries.
   
   Suggested maintainer next steps:
   
   - Make the binary-search path use the same partition source as 
`scan.getSelectedPartitions().selectedPartitions`, for example by building 
`SortedPartitionRanges` from that map for this pruning pass, or otherwise 
enforce a coherent snapshot/map before constructing `SelectedPartitions`.
   - Do not insert `nameToPartitionItem.get(name)` into 
`selectedPartitionItems` without first checking that the lookup returned a 
non-null value; a missing name indicates a partition-source mismatch and should 
be handled by rebuilding or falling back rather than by producing a null map 
value.
   - Add a regression test where the sorted ranges contain a partition name not 
present in the selected-partition map, and verify the planner does not throw an 
NPE. A stronger test would simulate Hive partition cache refresh/add-partition 
during planning.
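   
   The guard and fallback suggested above could look roughly like the 
following. This is a minimal sketch under stated assumptions, not a proposed 
patch: `GuardedPartitionSelection`, `select`, and `sequentialScan` are 
hypothetical names, strings stand in for partition items, and a `Predicate` on 
the partition name stands in for the real sequential pruning logic.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of "verify, then fall back"; not Doris code.
final class GuardedPartitionSelection {

    // Builds the selected map from the binary-search result. If any name is
    // absent from the scan's own map, the two sources disagree, so the
    // binary-search result is discarded and the scan's map is filtered
    // sequentially instead. No null value is ever stored.
    static Map<String, String> select(List<String> prunedNames,
                                      Map<String, String> scanPartitions,
                                      Predicate<String> sequentialFilter) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (String name : prunedNames) {
            String item = scanPartitions.get(name);
            if (item == null) {
                // Partition-source mismatch: fall back to the sequential path.
                return sequentialScan(scanPartitions, sequentialFilter);
            }
            selected.put(name, item);
        }
        return selected;
    }

    // Sequential path: iterates only the scan's own map, so every selected
    // name is guaranteed to have a value.
    static Map<String, String> sequentialScan(Map<String, String> scanPartitions,
                                              Predicate<String> filter) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : scanPartitions.entrySet()) {
            if (filter.test(entry.getKey())) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }
}
```

   A regression test along the lines suggested above would pass a pruned-name 
list containing a name that the scan map lacks and assert that the result 
contains no null values.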
   
   Information still needed from the reporter to confirm the exact blast radius:
   
   - Exact Doris build/tag for 3.1, and whether the same issue has been 
reproduced on concrete 4.0/4.1 release tags.
   - Minimal Hive table DDL, partition DDL, query SQL, and all relevant session 
variables.
   - Whether `dt=2026-06-22` was added, dropped, or refreshed in HMS while the 
failing query was being planned.
   - Full FE stack trace, query id, and FE logs around external metadata cache 
refresh/invalidation for the affected table.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

