alamb commented on code in PR #16589:
URL: https://github.com/apache/datafusion/pull/16589#discussion_r2176112016


##########
datafusion/physical-expr-adapter/src/schema_rewriter.rs:
##########
@@ -0,0 +1,811 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Physical expression schema rewriting utilities with struct support
+
+use std::sync::Arc;
+
+use arrow::compute::can_cast_types;
+use arrow::datatypes::{DataType, FieldRef, Schema};
+use datafusion_common::{
+    exec_err,
+    nested_struct::validate_struct_compatibility,
+    tree_node::{Transformed, TransformedResult, TreeNode},
+    Result, ScalarValue,
+};
+use datafusion_expr::ScalarUDF;
+use datafusion_functions::core::{getfield::GetFieldFunc, r#struct::StructFunc};
+use datafusion_physical_expr::{
+    expressions::{self, CastExpr, Column},
+    ScalarFunctionExpr,
+};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+
+/// Build a struct expression by recursively extracting and rewriting fields 
from a source struct.

Review Comment:
   > > Something about relying on struct and field extraction feels very wrong 
to me. Among other things now this casting may not be consistent with what 
happens when someone tries to call CAST(..) manually
   > 
   > Does it have to match? These seem like two similar but not exactly 
identical use cases to me.
   
   I worry that if they don't match we'll have potentially divergent behavior 
which could be confusing to users. 
   
   Also another use case is when doing stuff like coercing structs (so for 
example you could compare two struct columns for equality or `UNION` them 
together)
   
   > 
   > I am probably missing something but thinking about it to me it didn't seem 
like those calls would be much of a problem. Either way a new struct column is 
going to have to be built and the existing struct column is going to have to be 
read from disk in its entirety.
   
   Yeah, I don't think it will be that much faster (maybe it would save some 
virtual function calls or something). What I am really concerned about is 
having consistent cast behavior
   
   
   > But the pushback made me think that maybe we should be doing something 
else here: we should be inspecting the expression more deeply and in the case 
of `col.field` if `field` is not present in the physical schema we replace the 
whole `get_field(col, 'field')` expression with `null` instead of 
`get_col(struct('other', get_field(col, 'other'), 'field', null)))` which is 
basically what is happening now. Then we avoid reading the column altogether. 
We can do something similar with casts. I think this is complimentary to your 
proposal because the fallthrough case for `select struct_col` can call `cast` 
if needed like any other column, and our internal `cast` implementation can do 
the "smart things" with structs.
   
   👍 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to