aaraujo commented on PR #17634:
URL: https://github.com/apache/datafusion/pull/17634#issuecomment-3316071773

   @Jefffrey Absolutely! Here's a standalone test case that reproduces the issue without external dependencies:
   
   ## Standalone Reproduction
   
   ```rust
   // Save as: datafusion/core/examples/qualified_column_repro.rs
   // Run with: cargo run --example qualified_column_repro
   
   use datafusion::arrow::datatypes::{DataType, Field};
   use datafusion::common::{Column, DFSchema, Result, TableReference};
   use datafusion::logical_expr::{lit, BinaryExpr, Expr, ExprSchemable, Operator};
   
   #[tokio::main]
   async fn main() -> Result<()> {
       // Create a schema that represents the output of an aggregation
       // Aggregations produce unqualified column names in their output schema
       let post_agg_schema = DFSchema::from_unqualified_fields(
           vec![Field::new("avg(metrics.value)", DataType::Float64, true)].into(),
           Default::default(),
       )?;
   
       println!("Post-aggregation schema has field: {:?}",
                post_agg_schema.fields()[0].name());
   
       // Create a qualified column reference (as the optimizer might produce)
       let qualified_col = Expr::Column(Column::new(
           Some(TableReference::bare("metrics")),
           "avg(metrics.value)"
       ));
   
       // Create a binary expression: metrics.avg(metrics.value) / 1024
       let binary_expr = Expr::BinaryExpr(BinaryExpr::new(
           Box::new(qualified_col.clone()),
           Operator::Divide,
           Box::new(lit(1024.0)),
       ));
   
       println!("\nTrying to resolve qualified column: metrics.avg(metrics.value)");
       match qualified_col.get_type(&post_agg_schema) {
           Ok(dtype) => println!("✓ SUCCESS: Resolved to type {:?}", dtype),
           Err(e) => println!("✗ ERROR: {}", e),
       }
   
       println!("\nTrying to resolve binary expression: metrics.avg(metrics.value) / 1024");
       match binary_expr.get_type(&post_agg_schema) {
           Ok(dtype) => println!("✓ SUCCESS: Resolved to type {:?}", dtype),
           Err(e) => println!("✗ ERROR: {}", e),
       }
       Ok(())
   }
   ```
   ## Results
   
   Without the fix:
   
   Post-aggregation schema has field: "avg(metrics.value)"
   
   Trying to resolve qualified column: metrics.avg(metrics.value)
   ✗ ERROR: Schema error: No field named metrics."avg(metrics.value)". Did you mean 'avg(metrics.value)'?.
   
   Trying to resolve binary expression: metrics.avg(metrics.value) / 1024
   ✗ ERROR: Schema error: No field named metrics."avg(metrics.value)". Did you mean 'avg(metrics.value)'?.
   
   With the fix:
   
   Post-aggregation schema has field: "avg(metrics.value)"
   
   Trying to resolve qualified column: metrics.avg(metrics.value)
   ✓ SUCCESS: Resolved to type Float64
   
   Trying to resolve binary expression: metrics.avg(metrics.value) / 1024
   ✓ SUCCESS: Resolved to type Float64
   
   ## The Issue
   
   The problem occurs when:
   1. An aggregation produces an unqualified output schema (e.g., the result of `avg(metrics.value)` is exposed as a field named `avg(metrics.value)` with no table qualifier)
   2. Subsequent operations (like binary expressions) still reference the qualified column name (`metrics."avg(metrics.value)"`)
   3. Schema resolution fails because the qualified name doesn't exist in the post-aggregation schema
   
   This pattern commonly occurs in query builders, ORMs, and SQL translation 
layers that maintain qualified references throughout the query pipeline for 
clarity and correctness.
   
   ## The Fix
   
   The fix adds a fallback mechanism in `expr_schema.rs` that:
   - First attempts to resolve the column with its qualifier
   - If that fails AND the column has a qualifier, tries resolving without the qualifier
   - Only returns an error if both attempts fail (preserving the original error message)
   
   This conservative approach maintains backward compatibility while enabling 
legitimate query patterns that were previously failing.
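
To make the lookup order concrete, here is a minimal, dependency-free sketch of the fallback. It is a toy model, not the code in the PR: `Schema`, `Field`, `lookup`, `lookup_with_fallback`, and `post_agg_schema` are hypothetical stand-ins for illustration and do not mirror DataFusion's actual types or error formatting.

```rust
// Toy model only: every type and function here is a hypothetical stand-in,
// not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float64,
}

#[derive(Debug)]
struct Field {
    qualifier: Option<String>,
    name: String,
    data_type: DataType,
}

struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Strict lookup: qualifier and name must both match exactly.
    fn lookup(&self, qualifier: Option<&str>, name: &str) -> Result<&Field, String> {
        self.fields
            .iter()
            .find(|f| f.qualifier.as_deref() == qualifier && f.name == name)
            .ok_or_else(|| match qualifier {
                Some(q) => format!("No field named {q}.\"{name}\""),
                None => format!("No field named \"{name}\""),
            })
    }

    /// Try the qualified name first, then retry without the qualifier.
    fn lookup_with_fallback(&self, qualifier: Option<&str>, name: &str) -> Result<&Field, String> {
        match self.lookup(qualifier, name) {
            Ok(field) => Ok(field),
            // Retry only when a qualifier was given; if the retry also fails,
            // report the error from the first (qualified) attempt.
            Err(original) if qualifier.is_some() => {
                self.lookup(None, name).map_err(|_| original)
            }
            Err(original) => Err(original),
        }
    }
}

/// Schema with a single unqualified field, like the post-aggregation schema above.
fn post_agg_schema() -> Schema {
    Schema {
        fields: vec![Field {
            qualifier: None,
            name: "avg(metrics.value)".to_string(),
            data_type: DataType::Float64,
        }],
    }
}

fn main() {
    let schema = post_agg_schema();
    let name = "avg(metrics.value)";

    // Strict lookup with the stale qualifier fails.
    println!("strict:   {:?}", schema.lookup(Some("metrics"), name).map(|f| &f.data_type));

    // The fallback resolves the same reference against the unqualified field.
    println!(
        "fallback: {:?}",
        schema.lookup_with_fallback(Some("metrics"), name).map(|f| &f.data_type)
    );

    // A column that does not exist at all still reports the qualified error.
    println!(
        "missing:  {:?}",
        schema.lookup_with_fallback(Some("metrics"), "nope").map(|f| &f.data_type)
    );
}
```

In this sketch the unqualified retry is attempted only after the qualified lookup has failed, so a reference that already resolves keeps resolving exactly as before.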


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

