yaooqinn commented on PR #11837:
URL: https://github.com/apache/gluten/pull/11837#issuecomment-4144087793
## Root Cause Analysis
Thanks @zhouyuan for catching this! I traced the issue:
### What changed
**Before PR #16416**: `collect_set` had **1 signature** →
`hasSameIntermediateTypesAcrossSignatures()` returns `false` → companion
function registered as `collect_set_merge_extract` (no suffix).
**After PR #16416**: `collect_set` has **2 signatures** (1-arg + 2-arg with
boolean), both with `intermediateType("array(T)")` →
`hasSameIntermediateTypesAcrossSignatures()` returns `true` → companion
function registered as `collect_set_merge_extract_array_T` (**with suffix**).
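To make the naming change concrete, here is a minimal sketch of the rule as described above. It is a toy model, not Velox's implementation: `ToySignature`, `toySharedIntermediateType`, `toySuffix`, and `toyCompanionName` are hypothetical names, and the suffix encoding is only assumed to turn `array(T)` into `array_T`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an aggregate function signature.
// Only the intermediate type matters for this illustration.
struct ToySignature {
  int argCount;
  std::string intermediateType;  // e.g. "array(T)"
};

// Returns the intermediate type shared by at least two signatures, or an
// empty string if no two signatures share one. This mirrors the behavior
// described in the comment (1 signature -> false, 2 identical -> true).
std::string toySharedIntermediateType(const std::vector<ToySignature>& sigs) {
  for (std::size_t i = 0; i < sigs.size(); ++i) {
    for (std::size_t j = i + 1; j < sigs.size(); ++j) {
      if (sigs[i].intermediateType == sigs[j].intermediateType) {
        return sigs[i].intermediateType;
      }
    }
  }
  return "";
}

// Assumed suffix encoding: "array(T)" becomes "array_T".
std::string toySuffix(const std::string& type) {
  std::string out;
  for (char c : type) {
    if (c == '(' || c == ',') {
      out += '_';
    } else if (c != ')' && c != ' ') {
      out += c;
    }
  }
  return out;
}

// Builds the companion name, adding a suffix only when signatures share an
// intermediate type.
std::string toyCompanionName(
    const std::string& baseName,
    const std::vector<ToySignature>& sigs) {
  std::string name = baseName + "_merge_extract";
  const std::string shared = toySharedIntermediateType(sigs);
  if (!shared.empty()) {
    name += "_" + toySuffix(shared);
  }
  return name;
}
```

With a single `array(T)` signature this yields `collect_set_merge_extract`; adding a second signature with the same intermediate type yields `collect_set_merge_extract_array_T`.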
### The mismatch
Gluten's `SubstraitToVeloxPlan.cc::toAggregationFunctionName()` (lines
280-294):
1. First tries `collect_set_merge_extract` → **not found** (it is now
registered with a suffix)
2. Falls through, constructs suffix from concrete result type:
`collect_set_merge_extract_array_row_VARCHAR_BIGINT_BIGINT_endrow`
3. Looks up that name → **not found** (registered as generic
`collect_set_merge_extract_array_T`)
4. **Throws**: `Cannot find function signature`
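The four steps above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual Gluten code: `ToyRegistry` and `toyResolveCompanion` are hypothetical, and the registry is modeled as a plain set of registered names.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the function registry: just the registered names.
using ToyRegistry = std::set<std::string>;

// Mimics the lookup order described above. concreteSuffix is the suffix
// built from the concrete result type, e.g.
// "array_row_VARCHAR_BIGINT_BIGINT_endrow".
std::string toyResolveCompanion(
    const ToyRegistry& registry,
    const std::string& baseName,
    const std::string& concreteSuffix) {
  // Step 1: try the companion name without a suffix.
  const std::string plain = baseName + "_merge_extract";
  if (registry.count(plain) > 0) {
    return plain;
  }
  // Step 2: build a suffixed name from the concrete result type.
  const std::string suffixed = plain + "_" + concreteSuffix;
  // Step 3: look up the suffixed name.
  if (registry.count(suffixed) > 0) {
    return suffixed;
  }
  // Step 4: neither name is registered.
  throw std::runtime_error("Cannot find function signature: " + suffixed);
}
```

With a registry containing only `collect_set_merge_extract`, step 1 succeeds. With a registry containing only `collect_set_merge_extract_array_T`, both lookups miss and the call throws, which matches the failure reported here.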
### Fix options
1. **In Velox**: Keep the `collect_set` companion functions registered
without a suffix, i.e. make sure `hasSameIntermediateTypesAcrossSignatures`
still returns `false`. This could use `ignoreDuplicates` or a separate
registration for the 2-arg signature.
2. **In Gluten Substrait C++**: Update `toAggregationFunctionName` to first
look up the generic (no-suffix) companion name via
`getAggregateFunctionSignatures(baseName + "_merge_extract")` and resolve
types against its signatures, similar to how Velox's own planner resolves
generic companion functions.
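As a rough illustration of what a type-aware fallback for option 2 could look like, the sketch below matches a concrete type against a generic signature instead of comparing name suffixes. Everything here is hypothetical (`ToySignatureRegistry`, `toyMatchesGeneric`, `toyResolveGeneric`); it assumes a single type variable spelled `T` and does not call any real Velox or Gluten API.

```cpp
#include <map>
#include <string>

// Hypothetical registry: companion name -> generic intermediate type.
// Assumption: "T" is the only type variable and appears at most once.
using ToySignatureRegistry = std::map<std::string, std::string>;

// Returns true if the concrete type fits the generic pattern, e.g.
// "array(T)" matches "array(row(VARCHAR,BIGINT,BIGINT))".
bool toyMatchesGeneric(const std::string& generic, const std::string& concrete) {
  const std::string::size_type pos = generic.find('T');
  if (pos == std::string::npos) {
    return generic == concrete;
  }
  const std::string prefix = generic.substr(0, pos);
  const std::string suffix = generic.substr(pos + 1);
  // T must bind to at least one character.
  if (concrete.size() <= prefix.size() + suffix.size()) {
    return false;
  }
  return concrete.compare(0, prefix.size(), prefix) == 0 &&
      concrete.compare(
          concrete.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Scans registered companions of baseName and returns the first one whose
// generic signature accepts the concrete type, or "" if none does.
std::string toyResolveGeneric(
    const ToySignatureRegistry& registry,
    const std::string& baseName,
    const std::string& concreteType) {
  const std::string prefix = baseName + "_merge_extract";
  for (const auto& entry : registry) {
    const bool isCompanion =
        entry.first.compare(0, prefix.size(), prefix) == 0;
    if (isCompanion && toyMatchesGeneric(entry.second, concreteType)) {
      return entry.first;
    }
  }
  return "";
}
```

In this toy model, a registry entry `collect_set_merge_extract_array_T` with signature `array(T)` is found for the concrete type `array(row(VARCHAR,BIGINT,BIGINT))`, whereas the purely name-based lookup would miss it.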
I'll fix this in the Velox PR #16416 (or a follow-up) since the root cause
is the signature change creating a suffix-based companion registration.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]