924060929 commented on PR #64436: URL: https://github.com/apache/doris/pull/64436#issuecomment-4853411497
Correctness is fine here: full access is a safe upper bound. But the implementation **over-reads**: it uses a fresh `CollectorContext` + `ACCESS_ALL` and discards the incoming context, forcing a full element read even when only metadata is needed.

Example: `cardinality(array_map(x->1, arr))`, where the body doesn't reference the item. The access path comes out as `[arr, *]` (full element read), but only the array length is needed, so it should be `[arr, OFFSET]`. `array_map`/`array_count`/`array_exists` with an element-independent body depend only on the array's length, not its elements.

A more principled approach: when the body doesn't reference the item, branch on the function's *result* semantics instead of always full-reading:

- derived value (`array_map` / `array_count` / `array_exists`) → `[arr, OFFSET]` (metadata only)
- original elements (`array_filter` / `array_first`) → full read

Related: `array_sort` is in the same `return collectArrayPathInLambda(...)` (pruning) branch but returns the original elements, reordered. A comparator that reads only part of an element can therefore prune away data that the function still has to return. That's the flip side of the same "blanket rule vs. decide-by-result-semantics" issue and worth a separate look.
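
To make the suggestion concrete, here is a minimal standalone sketch of branching on result semantics when the lambda body ignores the item. It is an illustration only, not Doris code: the class `LambdaAccessPathSketch`, the method `pathWhenBodyIgnoresItem`, and the representation of an access path as a list of strings are all hypothetical. Only the function names and the `[arr, OFFSET]` / `[arr, *]` paths come from the comment above.

```java
import java.util.List;
import java.util.Set;

public class LambdaAccessPathSketch {
    // Functions whose result is derived from the lambda's output, so an
    // element-independent body needs only the array length (offsets).
    static final Set<String> DERIVED_VALUE =
            Set.of("array_map", "array_count", "array_exists");

    // Functions that return the original elements, so the elements must
    // still be read even if the lambda body never looks at them.
    static final Set<String> ORIGINAL_ELEMENTS =
            Set.of("array_filter", "array_first");

    /**
     * Hypothetical helper: access path for `function(lambda, column)` when
     * the lambda body does NOT reference the lambda item.
     */
    static List<String> pathWhenBodyIgnoresItem(String function, String column) {
        if (DERIVED_VALUE.contains(function)) {
            return List.of(column, "OFFSET"); // metadata only
        }
        if (ORIGINAL_ELEMENTS.contains(function)) {
            return List.of(column, "*");      // full element read
        }
        // Unclassified functions fall back to the safe upper bound.
        return List.of(column, "*");
    }

    public static void main(String[] args) {
        System.out.println(pathWhenBodyIgnoresItem("array_map", "arr"));    // [arr, OFFSET]
        System.out.println(pathWhenBodyIgnoresItem("array_filter", "arr")); // [arr, *]
    }
}
```

In this sketch, anything not explicitly classified (including `array_sort`) keeps the full read, so adding a function to the derived-value set is the only way to narrow a path.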
