Nicola Crane created ARROW-18403:
------------------------------------

Summary: [C++] Error consuming Substrait plan which uses count function: "only unary aggregate functions are currently supported"
Key: ARROW-18403
URL: https://issues.apache.org/jira/browse/ARROW-18403
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Nicola Crane

ARROW-17523 added support for the Substrait extension function "count", but when I write code which produces a Substrait plan which calls it, and then try to run it in Acero, I get an error.

The plan:

{code:r}
message of type 'substrait.Plan' with 3 fields set
extension_uris {
  extension_uri_anchor: 1
  uri: "https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml"
}
extension_uris {
  extension_uri_anchor: 2
  uri: "https://github.com/substrait-io/substrait/blob/main/extensions/functions_comparison.yaml"
}
extension_uris {
  extension_uri_anchor: 3
  uri: "https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml"
}
extensions {
  extension_function {
    extension_uri_reference: 3
    function_anchor: 2
    name: "count"
  }
}
relations {
  rel {
    aggregate {
      input {
        project {
          common {
            emit {
              output_mapping: 9
              output_mapping: 10
              output_mapping: 11
              output_mapping: 12
              output_mapping: 13
              output_mapping: 14
              output_mapping: 15
              output_mapping: 16
              output_mapping: 17
            }
          }
          input {
            read {
              base_schema {
                names: "int"
                names: "dbl"
                names: "dbl2"
                names: "lgl"
                names: "false"
                names: "chr"
                names: "verses"
                names: "padded_strings"
                names: "some_negative"
                struct_ {
                  types { i32 { nullability: NULLABILITY_NULLABLE } }
                  types { fp64 { nullability: NULLABILITY_NULLABLE } }
                  types { fp64 { nullability: NULLABILITY_NULLABLE } }
                  types { bool_ { nullability: NULLABILITY_NULLABLE } }
                  types { bool_ { nullability: NULLABILITY_NULLABLE } }
                  types { string { nullability: NULLABILITY_NULLABLE } }
                  types { string { nullability: NULLABILITY_NULLABLE } }
                  types { string { nullability: NULLABILITY_NULLABLE } }
                  types { fp64 { nullability: NULLABILITY_NULLABLE } }
                }
              }
              local_files {
                items {
                  uri_file: "file:///tmp/RtmpsBsoZJ/file1915f604cff4a"
                  parquet { }
                }
              }
            }
          }
          expressions { selection { direct_reference { struct_field { } } root_reference { } } }
          expressions { selection { direct_reference { struct_field { field: 1 } } root_reference { } } }
          expressions { selection { direct_reference { struct_field { field: 2 } } root_reference { } } }
          expressions { selection { direct_reference { struct_field { field: 3 } } root_reference { } } }
          expressions { selection { direct_reference { struct_field { field: 4 } } root_reference { } } }
          expressions { selection { direct_reference { struct_field { field: 5 } } root_reference { } } }
          expressions { selection { direct_reference { struct_field { field: 6 } } root_reference { } } }
          expressions { selection { direct_reference { struct_field { field: 7 } } root_reference { } } }
          expressions { selection { direct_reference { struct_field { field: 8 } } root_reference { } } }
        }
      }
      groupings {
        grouping_expressions { selection { direct_reference { struct_field { field: 3 } } root_reference { } } }
      }
      measures {
        measure {
          function_reference: 2
          phase: AGGREGATION_PHASE_INITIAL_TO_RESULT
          output_type { i64 { nullability: NULLABILITY_NULLABLE } }
          invocation: AGGREGATION_INVOCATION_ALL
        }
      }
    }
  }
}
{code}

The error:

{code:java}
Error: NotImplemented: Only unary aggregate functions are currently supported
/home/nic2/arrow/cpp/src/arrow/engine/substrait/relation_internal.cc:587  converter(aggregate_call)
/home/nic2/arrow/cpp/src/arrow/engine/substrait/serde.cc:153  FromProto(plan_rel.has_root() ? plan_rel.root().input() : plan_rel.rel(), ext_set, conversion_options)
{code}

I have no idea what the "phase" and "invocation" fields above do, but previous attempts to get Acero to consume this plan led to errors due to me using default values instead of the ones specified there (e.g. "Not Implemented: Unsupported aggregation phase 'AGGREGATION_PHASE_UNSPECIFIED'"), so I just changed them to see if it helped.
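
One detail that may be relevant: the measure in the plan carries no arguments at all, so it is a nullary "count" (count of records) rather than a count over a column. The sketch below is only an illustration of how an "exactly one argument" check would reject such a measure. The function {{check_measure}}, the dictionary layout, and the choice of field 3 as the argument are hypothetical and are not taken from the Arrow source; the real check lives somewhere behind relation_internal.cc and may differ.

```python
# Hypothetical model of a consumer-side arity check; NOT Arrow's actual code.
def check_measure(measure: dict) -> str:
    """Accept an aggregate measure only if it has exactly one argument."""
    args = measure.get("arguments", [])
    if len(args) != 1:
        raise NotImplementedError(
            "Only unary aggregate functions are currently supported"
        )
    return "ok"


# The measure as it appears in the plan above: no "arguments" entry.
nullary_count = {
    "function_reference": 2,
    "phase": "AGGREGATION_PHASE_INITIAL_TO_RESULT",
    "invocation": "AGGREGATION_INVOCATION_ALL",
}

# The same measure with a single argument added. Field 3 ("lgl") is used
# purely for illustration; any single field reference would pass the check.
unary_count = dict(nullary_count, arguments=[{"field": 3}])

if __name__ == "__main__":
    print(check_measure(unary_count))
    try:
        check_measure(nullary_count)
    except NotImplementedError as err:
        print(f"rejected: {err}")
```

If that is indeed what is happening, the "phase" and "invocation" values would be unrelated to this particular error, and a plan that passes one column to "count" might get further, though I have not verified this.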