wiedld commented on code in PR #17250:
URL: https://github.com/apache/datafusion/pull/17250#discussion_r2286541349


##########
datafusion/optimizer/src/optimize_projections/mod.rs:
##########
@@ -55,6 +55,24 @@ use datafusion_common::tree_node::{
 /// The rule analyzes the input logical plan, determines the necessary column
 /// indices, and then removes any unnecessary columns. It also removes any
 /// unnecessary projections from the plan tree.
+///
+/// ## Schema, Field Properties, and Metadata Handling
+///
+/// The `OptimizeProjections` rule preserves schema and field metadata in most optimization scenarios:
+///
+/// **Schema-level metadata preservation by plan type**:
+/// - **Window and Aggregate plans**: Schema metadata is preserved
+/// - **Projection plans**: Schema metadata is preserved per [`projection_schema`](datafusion_expr::logical_plan::projection_schema).
+/// - **Other logical plans**: Schema metadata is preserved unless [`LogicalPlan::recompute_schema`]
+///   is called on plan types that drop metadata
+///
+/// **Field-level properties and metadata**: Individual field properties are preserved when fields
+/// are retained in the optimized plan, determined by [`exprlist_to_fields`](datafusion_expr::utils::exprlist_to_fields)
+/// and [`ExprSchemable::to_field`](datafusion_expr::expr_schema::ExprSchemable::to_field).
+///
+/// **Field precedence**: When the same field appears multiple times, the optimizer
+/// maintains one occurrence and removes duplicates (refer to `RequiredIndices::compact()`),

Review Comment:
   Note that `RequiredIndices::compact()` currently [uses `sort_unstable`](https://github.com/apache/datafusion/blob/4bc069693dc374c1b52c6266ae7a01ef46e1fdf5/datafusion/optimizer/src/optimize_projections/required_indices.rs#L222), which means that it (performantly) picks a winning field among duplicates, not necessarily the first one.
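
   To make the distinction concrete, here is a minimal sketch of sort-then-dedup with an unstable versus a stable sort. It is not the DataFusion implementation: the function names `compact_unstable` and `compact_stable` and the `(index, label)` tuples are hypothetical, with the label standing in for whatever distinguishes two occurrences of the same field.

```rust
// Hypothetical illustration only; this is NOT DataFusion's `RequiredIndices` code.
// Each item is (index, label); the label marks which occurrence of an index it is.

/// Sort by index with an unstable sort, then drop adjacent duplicates.
/// Items with equal indices may be reordered by the sort, so which
/// occurrence survives the dedup is unspecified.
fn compact_unstable(mut items: Vec<(usize, &'static str)>) -> Vec<(usize, &'static str)> {
    items.sort_unstable_by_key(|item| item.0);
    // `dedup_by_key` keeps the first element of each run of equal keys.
    items.dedup_by_key(|item| item.0);
    items
}

/// Same as above, but with a stable sort. Items with equal indices keep
/// their original relative order, so the first occurrence always survives.
fn compact_stable(mut items: Vec<(usize, &'static str)>) -> Vec<(usize, &'static str)> {
    items.sort_by_key(|item| item.0);
    items.dedup_by_key(|item| item.0);
    items
}
```

   For an input such as `[(2, "b1"), (0, "a"), (2, "b2"), (1, "c")]`, `compact_stable` is guaranteed to keep `(2, "b1")`, while `compact_unstable` may keep either `(2, "b1")` or `(2, "b2")`. Both return the same set of indices, so the difference only matters if the docs promise which occurrence wins.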



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

