DDtKey commented on code in PR #14102:
URL: https://github.com/apache/datafusion/pull/14102#discussion_r1913426831
##########
datafusion/sql/tests/sql_integration.rs:
##########
@@ -4552,3 +4552,31 @@ fn test_error_message_invalid_window_aggregate_function_signature() {
"Error during planning: sum does not support zero arguments",
);
}
+
+// Test issue: https://github.com/apache/datafusion/issues/14058
+// Select with wildcard over a USING/NATURAL JOIN should deduplicate condition columns.
+#[test]
+fn test_using_join_wildcard_schema() {
+    let sql = "SELECT * FROM orders o1 JOIN orders o2 USING (order_id)";
+    let plan = logical_plan(sql).unwrap();
+    let count = plan
+        .schema()
+        .iter()
+        .filter(|(_, f)| f.name() == "order_id")
+        .count();
+    // Only one order_id column
+    assert_eq!(count, 1);
+
+    let sql = "SELECT * FROM orders o1 NATURAL JOIN orders o2";
+    let plan = logical_plan(sql).unwrap();
+    // Only columns from one join side should be present
+    let expected_fields = vec![
+        "o1.order_id".to_string(),
+        "o1.customer_id".to_string(),
+        "o1.o_item_id".to_string(),
+        "o1.qty".to_string(),
+        "o1.price".to_string(),
+        "o1.delivered".to_string(),
+    ];
+    assert_eq!(plan.schema().field_names(), expected_fields);
+}
Review Comment:
I'd insist on better test coverage:
- a test case where the output (expected fields) contains at least one column from
the second table, just like the MRE in #14058
- a more complex select, e.g. with `WITH` or a subselect
- a join of more than two tables (?)

Otherwise, another regression could easily slip in.
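
To make the first and third bullets concrete, here is a minimal sketch of the expected column sets. It is a standalone toy model, not DataFusion code: `expand_using_wildcard`, the table aliases, and the column names are all hypothetical, and it assumes each `USING` column is kept once from the left-most table while every other column keeps its own qualifier. It does not cover the `WITH`/subselect case.

```rust
/// Toy model (hypothetical, not DataFusion's implementation): expand `SELECT *`
/// over tables chained with `JOIN ... USING (using)`. Each USING column is
/// emitted once, from the left-most table that has it; every other column is
/// kept with its table qualifier.
fn expand_using_wildcard(tables: &[(&str, &[&str])], using: &[&str]) -> Vec<String> {
    let mut emitted_keys: Vec<&str> = Vec::new();
    let mut fields = Vec::new();
    for (alias, columns) in tables {
        for column in columns.iter() {
            if using.contains(column) {
                if emitted_keys.contains(column) {
                    continue; // join key already emitted by an earlier table
                }
                emitted_keys.push(*column);
            }
            fields.push(format!("{alias}.{column}"));
        }
    }
    fields
}

#[test]
fn using_join_keeps_second_table_columns() {
    // Made-up tables: only `order_id` is shared between the two sides.
    let orders: &[&str] = &["order_id", "customer_id"];
    let items: &[&str] = &["order_id", "item_name"];
    let fields = expand_using_wildcard(&[("o", orders), ("i", items)], &["order_id"]);
    // The join key appears once; the non-key column of the second table survives.
    assert_eq!(fields, ["o.order_id", "o.customer_id", "i.item_name"]);
}

#[test]
fn using_join_over_three_tables() {
    let a: &[&str] = &["id", "x"];
    let b: &[&str] = &["id", "y"];
    let c: &[&str] = &["id", "z"];
    let fields = expand_using_wildcard(&[("a", a), ("b", b), ("c", c)], &["id"]);
    assert_eq!(fields, ["a.id", "a.x", "b.y", "c.z"]);
}
```

The real tests would go through `logical_plan(sql)` and compare `plan.schema().field_names()` as in the diff above. The point of the sketch is that a self-join hides the bug class: only a join between tables with different non-key columns shows whether second-table columns are dropped.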
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]