kosiew commented on code in PR #18814:
URL: https://github.com/apache/datafusion/pull/18814#discussion_r2542019520


##########
datafusion/substrait/tests/cases/roundtrip_logical_plan.rs:
##########
@@ -1099,6 +1099,92 @@ async fn simple_intersect_table_reuse() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn self_referential_intersect() -> Result<()> {
+    // Test INTERSECT with the same table on both sides
+    // This previously failed with "Schema contains duplicate qualified field 
name"
+    // The fix ensures requalify_sides_if_needed is called in 
intersect_or_except
+    // After roundtrip through Substrait, SubqueryAlias is lost and 
requalification
+    // produces "left" and "right" aliases
+    // Note: INTERSECT (without ALL) includes DISTINCT, but the outer Aggregate
+    // is optimized away, resulting in just the LeftSemi join
+    assert_expected_plan(
+        "SELECT a FROM data WHERE a > 0 INTERSECT SELECT a FROM data WHERE a < 
5",
+        "LeftSemi Join: left.a = right.a\
+        \n  SubqueryAlias: left\
+        \n    Aggregate: groupBy=[[data.a]], aggr=[[]]\
+        \n      Filter: data.a > Int64(0)\
+        \n        TableScan: data projection=[a], partial_filters=[data.a > 
Int64(0)]\
+        \n  SubqueryAlias: right\
+        \n    Filter: data.a < Int64(5)\
+        \n      TableScan: data projection=[a], partial_filters=[data.a < 
Int64(5)]",
+        true,
+    )
+    .await
+}
+
+#[tokio::test]
+async fn self_referential_except() -> Result<()> {
+    // Test EXCEPT with the same table on both sides
+    // This previously failed with "Schema contains duplicate qualified field 
name"
+    // The fix ensures requalify_sides_if_needed is called in 
intersect_or_except
+    // After roundtrip through Substrait, SubqueryAlias is lost and 
requalification
+    // produces "left" and "right" aliases
+    // Note: EXCEPT (without ALL) includes DISTINCT, but the outer Aggregate
+    // is optimized away, resulting in just the LeftAnti join
+    assert_expected_plan(
+        "SELECT a FROM data WHERE a > 0 EXCEPT SELECT a FROM data WHERE a < 5",
+        "LeftAnti Join: left.a = right.a\

Review Comment:
   The expected plans look almost identical in the test assertions, which is 
confusing. The key difference is actually in the **join type**, not the overall 
structure:
   
   - `self_referential_intersect` produces: `**LeftSemi** Join: left.a = 
right.a`
   - `self_referential_except` produces: `**LeftAnti** Join: left.a = right.a`
   
   The rest of the plan structure is identical because:
   1. Both operate on the same table (`data`) with similar filters
   2. Both include the DISTINCT operation (via `Aggregate: groupBy=[[data.a]]`) 
because neither uses `ALL`
   3. Both get requalified to `left` and `right` aliases due to the duplicate 
field name issue



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to