alamb commented on code in PR #10148:
URL: https://github.com/apache/datafusion/pull/10148#discussion_r1581806680


##########
datafusion/functions/src/core/arrowtypeof.rs:
##########
@@ -69,4 +69,8 @@ impl ScalarUDFImpl for ArrowTypeOfFunc {
             "{input_data_type}"
         ))))
     }
+
+    fn validate_number_of_rows(&self) -> bool {

Review Comment:
   I don't think this is correct -- arrow_typeof always returns the same number 
of rows as its input...



##########
datafusion/functions/src/core/getfield.rs:
##########
@@ -107,29 +109,55 @@ impl ScalarUDFImpl for GetFieldFunc {
                 );
             }
         };
+
         match (array.data_type(), name) {
-                (DataType::Map(_, _), ScalarValue::Utf8(Some(k))) => {
-                    let map_array = as_map_array(array.as_ref())?;
-                    let key_scalar = 
Scalar::new(StringArray::from(vec![k.clone()]));
-                    let keys = arrow::compute::kernels::cmp::eq(&key_scalar, 
map_array.keys())?;
-                    let entries = arrow::compute::filter(map_array.entries(), 
&keys)?;

Review Comment:
   I don't understand this
   
   If the input is like this (two rows, each three elements)
   ```
   { a: 1, b: 2, c: 100}
   { a: 3, b: 4, c: 200}
   ```
   
   An expression like `col['c']` will still return 2 rows (but each row will 
have only a single element)
   
   ```
   { c: 100 }
   { c: 200 }
   ```



##########
datafusion/physical-expr/src/scalar_function.rs:
##########
@@ -146,11 +146,22 @@ impl PhysicalExpr for ScalarFunctionExpr {
         // evaluate the function
         match self.fun {
             ScalarFunctionDefinition::UDF(ref fun) => {
-                if self.args.is_empty() {
-                    fun.invoke_no_args(batch.num_rows())
-                } else {
-                    fun.invoke(&inputs)
+                let output = match self.args.is_empty() {
+                    true => fun.invoke_no_args(batch.num_rows()),
+                    false => fun.invoke(&inputs),
+                }?;
+
+                if fun.validate_number_of_rows() {
+                    let output_size = match &output {
+                        ColumnarValue::Array(array) => array.len(),
+                        ColumnarValue::Scalar(_) => 1,

Review Comment:
   I don't think this logic is correct -- specifically `ColumnarValue::Scalar` 
represents (logically) a single value for all the rows.
   
   Thus I think if the function returns `ColumnarValue::Scalar` that implicitly 
can be any number of rows. 
   
   Therefore I think the check could be something like 
   ```rust
   if let ColumnarValue::Array(array) = &output {
     if output_size != array.len() {
       return internal_err...
     }
   }
   ```
   
   I'll try and make a PR to improve the documentation on ColumnarValue to make 
this clearer: 
https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.ColumnarValue.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to