Re: [PR] [improvement](query) Align olap scan schema with storage keys [doris]

via GitHub Mon, 22 Jun 2026 05:10:33 -0700


github-actions[bot] commented on code in PR #64413:
URL: https://github.com/apache/doris/pull/64413#discussion_r3452061760



##########
be/src/storage/segment/segment_iterator.cpp:
##########
@@ -1486,6 +1368,9 @@ Status 
SegmentIterator::_apply_inverted_index_on_column_predicate(
 }
 
 bool SegmentIterator::_need_read_data(ColumnId cid) {
+    if (_can_skip_reading_extra_column(cid)) {
+        return false;

Review Comment:
   This new no-read path should also honor `enable_no_need_read_data_opt`. 
Right now `_can_skip_reading_extra_column()` can return `false` from 
`_need_read_data()` before the session kill switch is checked, and 
`_prune_column()` will synthesize defaults for the extra key. The existing 
no-read-data paths below and in `_no_need_read_key_data()` both consult 
`query_options().enable_no_need_read_data_opt` first, so setting 
`enable_no_need_read_data_opt=false` no longer disables all 
default-fill/no-read behavior. Please move the session-variable check before 
this extra-column shortcut, or add the same guard inside 
`_can_skip_reading_extra_column()`.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java:
##########
@@ -1001,6 +1024,84 @@ private PlanFragment 
computePhysicalOlapScan(PhysicalOlapScan olapScan, PlanTran
         return planFragment;
     }
 
+    private StorageAlignedScanSlots 
computeStorageAlignedScanSlots(PhysicalOlapScan olapScan) {
+        if (!shouldAlignScanSlotsToStorageSchema(olapScan)) {
+            return new StorageAlignedScanSlots(olapScan.getOutput(), 
Collections.emptySet());
+        }
+
+        Set<ExprId> outputExprIds = olapScan.getOutput().stream()
+                .map(Slot::getExprId)
+                .collect(Collectors.toSet());
+        Map<Integer, Slot> slotByColumnUniqueId = new HashMap<>();
+        Map<String, Slot> slotByColumnName = new HashMap<>();
+        for (Slot slot : olapScan.getOutput()) {
+            Optional<Column> originalColumn = ((SlotReference) 
slot).getOriginalColumn();
+            if (originalColumn.isPresent()) {
+                Column column = originalColumn.get();
+                if (column.getUniqueId() == 
Column.COLUMN_UNIQUE_ID_INIT_VALUE) {
+                    slotByColumnName.put(column.getName(), slot);
+                } else {
+                    slotByColumnUniqueId.put(column.getUniqueId(), slot);
+                }
+            }
+        }
+
+        List<Slot> storageSlots = new ArrayList<>();
+        Set<ExprId> storageExprIds = new HashSet<>();
+        Set<ExprId> extraKeyExprIds = new HashSet<>();
+        long selectedIndexId = olapScan.getSelectedIndexId() == -1
+                ? olapScan.getTable().getBaseIndexId()
+                : olapScan.getSelectedIndexId();
+        for (Column column : 
olapScan.getTable().getSchemaByIndexId(selectedIndexId, true)) {
+            if (!column.isKey()) {
+                break;
+            }
+            Slot slot = column.getUniqueId() == 
Column.COLUMN_UNIQUE_ID_INIT_VALUE
+                    ? slotByColumnName.get(column.getName())
+                    : slotByColumnUniqueId.get(column.getUniqueId());
+            slot = Objects.requireNonNull(slot, "missing scan slot for storage 
key column " + column.getName());
+            if (storageExprIds.add(slot.getExprId())) {
+                storageSlots.add(slot);
+            }
+            if (!outputExprIds.contains(slot.getExprId())) {
+                extraKeyExprIds.add(slot.getExprId());

Review Comment:
   `extraKeyExprIds` is currently unreachable. `outputExprIds` is built from 
`olapScan.getOutput()`, and every storage-key `slot` above is fetched from maps 
built from that same `olapScan.getOutput()`, so this test is always false and 
`extra_key_column_slot_ids` stays empty. The parent `PhysicalProject` later 
calls `updateScanSlotsMaterialization()` and only preserves slots from 
`getExtraKeyColumnSlotIds()` before pruning the scan tuple. For an AGG/non-MOW 
UNIQUE table with keys `(k1, k2)` and a query that projects only `k2`, `k1` can 
still be removed from the scan tuple even though BE non-direct reads expand 
`return_columns` back to all keys for merge/aggregation. Please mark required 
storage-key slots relative to the projected/required output, or otherwise 
preserve these key slots during scan materialization pruning, and add a 
translator test for `Project(OlapScan)` on `(k1, k2)` with only `k2` projected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [improvement](query) Align olap scan schema with storage keys [doris]

Reply via email to