vbarua commented on code in PR #12245:
URL: https://github.com/apache/datafusion/pull/12245#discussion_r1744627789
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -850,6 +868,31 @@ pub async fn from_substrait_rel(
}
}
+/// Validates that the given Substrait schema matches the given DataFusion
schema
Review Comment:
Good point, I've made the validation more flexible based on the work in your
PR.
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -850,6 +864,82 @@ pub async fn from_substrait_rel(
}
}
+/// Ensures that the given Substrait schema is compatible with the schema as
given by DataFusion
+///
+/// This means:
+/// 1. All fields present in the Substrait schema are present in the
DataFusion schema. The
+/// DataFusion schema may have MORE fields, but not the other way around.
+/// 2. All fields are compatible. See [`ensure_field_compatability`] for
details
+///
+/// This function returns a DataFrame with fields adjusted if necessary in the
event that the
+/// SubstraitSchema is a subset of the DataFusion schema.
+// ```
+fn ensure_schema_compatability(
+ table: DataFrame,
+ substrait_schema: DFSchema,
+) -> Result<DataFrame> {
+ // For now strip the qualifiers from the DataFusion and Substrait schemas
to make comparisons
+ // easier as there are issues around qualifier case.
+ // TODO: ensure that qualifiers are the same
+ let df_schema = table.schema().to_owned().strip_qualifiers();
+ if df_schema.logically_equivalent_names_and_types(&substrait_schema) {
+ return Ok(table);
+ }
+ let selected_columns = substrait_schema
+ .strip_qualifiers()
+ .fields()
+ .iter()
+ .map(|substrait_field| {
+ let df_field =
+ df_schema.field_with_unqualified_name(substrait_field.name())?;
+ ensure_field_compatability(df_field, substrait_field)?;
+ Ok(col(format!("\"{}\"", df_field.name())))
+ })
+ .collect::<Result<_>>()?;
+
+ table.select(selected_columns)
+}
+
+///
+/// Ensures that the given Substrait field is compatible with the given
DataFusion field
+///
+/// A field is compatible between Substrait and DataFusion if:
+/// 1. They have logically equivalent types.
+/// 2. They have the same nullability OR the Substrait field is nullable and
the DataFusion fields
+/// is not nullable.
+///
+/// If a Substrait field is not nullable, the Substrait plan may be built
around assuming it is not
+/// nullable. As such if DataFusion has that field as not nullable the plan
should be rejected.
+//
+fn ensure_field_compatability(
+ datafusion_field: &Field,
+ substrait_field: &Field,
+) -> Result<()> {
+ if DFSchema::datatype_is_logically_equal(
+ datafusion_field.data_type(),
+ substrait_field.data_type(),
+ ) {
+ if (datafusion_field.is_nullable() == substrait_field.is_nullable())
+ || (substrait_field.is_nullable() &&
!datafusion_field.is_nullable())
+ {
+ Ok(())
+ } else {
+ // TODO: from_substrait_struct_type needs to be updated to set the
nullability correctly. It defaults to true for now.
+ substrait_err!(
+ "Field '{}' is nullable in the Substrait schema but not
nullable in the DataFusion schema.",
+ substrait_field.name()
+ )
Review Comment:
Because of what's mentioned in the TODO, this code never fires. I'm opting
not to fix that as part of this PR.
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1586,9 +1629,12 @@ fn next_struct_field_name(
}
}
-fn from_substrait_named_struct(
+/// Convert Substrait NamedStruct to DataFusion DFSchemaRef
+pub fn from_substrait_named_struct(
Review Comment:
I tried doing that originally, but ran into failures in the intergration
tests which failed to compile with:
```
error[E0603]: function `from_substrait_named_struct` is private
--> datafusion/substrait/tests/utils.rs:24:55
|
24 | use
datafusion_substrait::logical_plan::consumer::from_substrait_named_struct;
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^ private function
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]