Richard Tia created ARROW-17061:
-----------------------------------

Summary: [Python] Acero consumer is unable to consume count function from substrait query plan
Key: ARROW-17061
URL: https://issues.apache.org/jira/browse/ARROW-17061
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Richard Tia

SQL
{code:java}
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    '{}'
where
    l_shipdate <= date '1998-12-01' - interval '120' day (3)
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus
{code}

The Substrait plan was generated from this SQL using Isthmus.

Substrait count definition: [https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml]

Running the Substrait plan with Acero returns this error:
{code:java}
E pyarrow.lib.ArrowInvalid: JsonToBinaryStream returned INVALID_ARGUMENT:(relations[0].root.input.sort.input.aggregate.measures[7].measure) arguments: Cannot find field.
{code}

From the Substrait query plan, at relations[0].root.input.sort.input.aggregate.measures[7].measure:
{code:java}
"measure": {
  "functionReference": 7,
  "args": [],
  "sorts": [],
  "phase": "AGGREGATION_PHASE_INITIAL_TO_RESULT",
  "outputType": {
    "i64": {
      "typeVariationReference": 0,
      "nullability": "NULLABILITY_REQUIRED"
    }
  },
  "invocation": "AGGREGATION_INVOCATION_ALL",
  "arguments": []
}
{code}
{code:java}
"extensionFunction": {
  "extensionUriReference": 3,
  "functionAnchor": 7,
  "name": "count:opt"
}
{code}

Count is a unary function and should be consumable, but it isn't in this case.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
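
The measure quoted in the report has empty "args" and "arguments" lists. A minimal sketch of how such measures could be located in a plan's JSON, which may help when narrowing down which measure the consumer rejects. It uses only the standard library; the helper name `find_zero_arg_measures` is hypothetical and is not part of pyarrow or Substrait, and the fragment is wrapped in braces here only so that it parses as standalone JSON.

```python
import json

# Measure fragment copied from the report, wrapped in braces so it parses.
MEASURE_JSON = """
{
  "measure": {
    "functionReference": 7,
    "args": [],
    "sorts": [],
    "phase": "AGGREGATION_PHASE_INITIAL_TO_RESULT",
    "outputType": {
      "i64": {"typeVariationReference": 0, "nullability": "NULLABILITY_REQUIRED"}
    },
    "invocation": "AGGREGATION_INVOCATION_ALL",
    "arguments": []
  }
}
"""


def find_zero_arg_measures(node, path=""):
    """Collect (path, functionReference) for every "measure" object whose
    "args" and "arguments" lists are both empty or missing.

    Hypothetical helper, for illustration only.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            if key == "measure" and isinstance(value, dict):
                if not value.get("args") and not value.get("arguments"):
                    found.append((child_path, value.get("functionReference")))
            # Keep walking so nested relations are covered as well.
            found.extend(find_zero_arg_measures(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_zero_arg_measures(item, f"{path}[{index}]"))
    return found


zero_arg = find_zero_arg_measures(json.loads(MEASURE_JSON))
print(zero_arg)
```

Passing the full plan (for example `json.loads(plan_text)`, where `plan_text` is an assumed variable holding the Isthmus output) instead of the fragment would report each match with its path inside the plan.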