rluvaton commented on issue #21231:
URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4164883881
No, and I already have it in my repo.
Let's say I have this:
```
select external_column, array_transform(list, item -> item + external_column) from tbl
```
So array_transform knows it has one column that it must capture from the parent scope.
This is what the implementation will store:
```
ArrayTransform:
child: column list
external references mapping:
// (name used in the lambda expression, reference expression)
"external_column", column(external column)
lambda:
Add:
bound lambda variable: position: 0
bound lambda variable: position: 1
```
When you create the schema for the lambda expression, you map each captured external reference to an internal column position.
And when the CASE expression rewrites columns, it rewrites the external reference mapping.
Here is an example implementation of array_transform with support for external reference capture that works with this mapping:
```rust
struct ArrayTransformExpression {
    child: Arc<dyn PhysicalExpr>,
    lambda_function: Arc<dyn PhysicalExpr>,
    /// (renamed field name in the lambda schema, the expression to bind)
    external_references: Vec<(String, Arc<dyn PhysicalExpr>)>,
}

impl ArrayTransformExpression {
    fn create_lambda_function_schema(
        &self,
        input_schema: &Schema,
        lambda_args: Vec<FieldRef>,
    ) -> Result<SchemaRef> {
        let mut lambda_schema_fields = lambda_args;
        for (field_name, external_ref) in &self.external_references {
            let field = external_ref.return_field(input_schema)?;
            // Rename the captured field to the name used inside the lambda
            let renamed_field = field.as_ref().clone().with_name(field_name.clone());
            lambda_schema_fields.push(Arc::new(renamed_field));
        }
        Ok(Arc::new(Schema::new(lambda_schema_fields)))
    }

    fn create_lambda_function_input_batch(
        &self,
        input_batch: &RecordBatch,
        lambda_args: Vec<(FieldRef, ArrayRef)>,
    ) -> Result<RecordBatch> {
        let (mut lambda_schema_fields, mut lambda_arrays): (Vec<FieldRef>, Vec<ArrayRef>) =
            lambda_args.into_iter().unzip();
        let input_schema = input_batch.schema();
        for (field_name, external_ref) in &self.external_references {
            let field = external_ref.return_field(&input_schema)?;
            let renamed_field = field.as_ref().clone().with_name(field_name.clone());
            lambda_schema_fields.push(Arc::new(renamed_field));
            // Evaluate the captured expression against the parent batch
            let external_reference_value = external_ref.evaluate(input_batch)?;
            lambda_arrays.push(external_reference_value.into_array(input_batch.num_rows())?);
        }
        let lambda_function_schema = Arc::new(Schema::new(lambda_schema_fields));
        Ok(RecordBatch::try_new(lambda_function_schema, lambda_arrays)?)
    }

    fn children_in_scope(&self) -> Vec<&Arc<dyn PhysicalExpr>> {
        std::iter::once(&self.child)
            .chain(self.external_references.iter().map(|(_, external_ref)| external_ref))
            .collect()
    }

    fn with_new_children_in_scope(
        self: Arc<Self>,
        children_in_scope: Vec<Arc<dyn PhysicalExpr>>,
    ) -> Result<Arc<dyn PhysicalExpr>> {
        // child plus one entry per external reference
        assert_eq!(children_in_scope.len(), 1 + self.external_references.len());
        Ok(Arc::new(ArrayTransformExpression {
            child: children_in_scope[0].clone(),
            lambda_function: self.lambda_function.clone(),
            // this updates the mapping while keeping the lambda-internal names
            external_references: self
                .external_references
                .iter()
                .map(|(name, _old_ref)| name.clone())
                .zip(children_in_scope.into_iter().skip(1))
                .collect(),
        }))
    }
}
```
So in your case, if you go through children_in_scope and keep the same rewrite code you already have for CASE WHEN, it will work.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]