rluvaton commented on issue #21231:
URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4164883881
No, and I already have it in my repo.
Let's say I have this:
```
select external_column, array_transform(list, item -> item + external_column) from tbl
```
So array_transform knows it has one column that it must capture from the parent scope.
This is what the implementation will store:
```
ArrayTransform:
child: column list
external references mapping:
// (name used in the lambda expression, reference expression)
"external_column", column(external column)
lambda:
Add:
bound lambda variable: position: 0
bound lambda variable: position: 1
```
When you create the schema for the lambda expression, you map each captured external reference to an internal column position.
And when the CASE expression rewrites columns, it rewrites the external reference mapping.
Here is an example implementation of array_transform with support for external reference capture that works with this mapping:
```rust
struct ArrayTransformExpression {
    child: Arc<dyn PhysicalExpr>,
    lambda_function: Arc<dyn PhysicalExpr>,
    /// (renamed field name in the lambda schema, the expression to bind)
    external_references: Vec<(String, Arc<dyn PhysicalExpr>)>,
}

impl ArrayTransformExpression {
    fn create_lambda_function_schema(
        &self,
        input_schema: &Schema,
        lambda_args: Vec<FieldRef>,
    ) -> Result<SchemaRef> {
        let mut lambda_schema_fields = lambda_args;
        for (field_name, external_ref) in &self.external_references {
            let field = external_ref.return_field(input_schema)?;
            // Rename the captured field to the name used inside the lambda
            let renamed_field = field.as_ref().clone().with_name(field_name.clone());
            lambda_schema_fields.push(Arc::new(renamed_field));
        }
        Ok(Arc::new(Schema::new(lambda_schema_fields)))
    }

    fn create_lambda_function_input_batch(
        &self,
        input_batch: &RecordBatch,
        lambda_args: Vec<(FieldRef, ArrayRef)>,
    ) -> Result<RecordBatch> {
        let (mut lambda_schema_fields, mut lambda_arrays): (Vec<FieldRef>, Vec<ArrayRef>) =
            lambda_args.into_iter().unzip();
        let input_schema = input_batch.schema();
        for (field_name, external_ref) in &self.external_references {
            let field = external_ref.return_field(&input_schema)?;
            let renamed_field = field.as_ref().clone().with_name(field_name.clone());
            lambda_schema_fields.push(Arc::new(renamed_field));
            // Evaluate the captured expression against the parent batch
            let external_reference_value = external_ref.evaluate(input_batch)?;
            lambda_arrays.push(external_reference_value.into_array(input_batch.num_rows())?);
        }
        let lambda_function_schema = Arc::new(Schema::new(lambda_schema_fields));
        Ok(RecordBatch::try_new(lambda_function_schema, lambda_arrays)?)
    }

    fn children_in_scope(&self) -> Vec<&Arc<dyn PhysicalExpr>> {
        std::iter::once(&self.child)
            .chain(self.external_references.iter().map(|(_, external_ref)| external_ref))
            .collect()
    }

    fn with_new_children_in_scope(
        self: Arc<Self>,
        children_in_scope: Vec<Arc<dyn PhysicalExpr>>,
    ) -> Result<Arc<dyn PhysicalExpr>> {
        // child plus one entry per external reference
        assert_eq!(children_in_scope.len(), 1 + self.external_references.len());
        Ok(Arc::new(ArrayTransformExpression {
            child: children_in_scope[0].clone(),
            lambda_function: self.lambda_function.clone(),
            // this updates the mapping while keeping the lambda-internal names
            external_references: self
                .external_references
                .iter()
                .map(|(name, _old_ref)| name.clone())
                .zip(children_in_scope.into_iter().skip(1))
                .collect(),
        }))
    }
}
```
So in your case, if you go through children_in_scope and keep the same rewrite code you already have for CASE WHEN, it will work.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]