alamb commented on code in PR #10148: URL: https://github.com/apache/datafusion/pull/10148#discussion_r1581806680
########## datafusion/functions/src/core/arrowtypeof.rs: ########## @@ -69,4 +69,8 @@ impl ScalarUDFImpl for ArrowTypeOfFunc { "{input_data_type}" )))) } + + fn validate_number_of_rows(&self) -> bool { Review Comment: I don't think this is correct -- arrow_typeof always returns the same number of rows as its input... ########## datafusion/functions/src/core/getfield.rs: ########## @@ -107,29 +109,55 @@ impl ScalarUDFImpl for GetFieldFunc { ); } }; + match (array.data_type(), name) { - (DataType::Map(_, _), ScalarValue::Utf8(Some(k))) => { - let map_array = as_map_array(array.as_ref())?; - let key_scalar = Scalar::new(StringArray::from(vec![k.clone()])); - let keys = arrow::compute::kernels::cmp::eq(&key_scalar, map_array.keys())?; - let entries = arrow::compute::filter(map_array.entries(), &keys)?; Review Comment: I don't understand this If the input is like this (two rows, each three elements) ``` { a: 1, b: 2, c: 100} { a: 3, b: 4, c: 200} ``` An expression like `col['c']` will still return 2 rows (but each row will have only a single element) ``` { c: 100 } { c: 200 } ``` ########## datafusion/physical-expr/src/scalar_function.rs: ########## @@ -146,11 +146,22 @@ impl PhysicalExpr for ScalarFunctionExpr { // evaluate the function match self.fun { ScalarFunctionDefinition::UDF(ref fun) => { - if self.args.is_empty() { - fun.invoke_no_args(batch.num_rows()) - } else { - fun.invoke(&inputs) + let output = match self.args.is_empty() { + true => fun.invoke_no_args(batch.num_rows()), + false => fun.invoke(&inputs), + }?; + + if fun.validate_number_of_rows() { + let output_size = match &output { + ColumnarValue::Array(array) => array.len(), + ColumnarValue::Scalar(_) => 1, Review Comment: I don't think this logic is correct -- specifically `ColumnarValue::Scalar` represents (logically) a single value for all the rows. Thus I think if the function returns `ColumnarValue::Scalar` that implicitly can be any number of rows. Therefore I think the check could be something like ```rust if let ColumnarValue::Array(array) = &output { if output_size != array.len() { return internal_err... } } ``` I'll try and make a PR to improve the documentation on ColumnarValue to make this clearer: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.ColumnarValue.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org