nuno-faria commented on code in PR #21715:
URL: https://github.com/apache/datafusion/pull/21715#discussion_r3213121882
##########
datafusion/sqllogictest/test_files/cte.slt:
##########
@@ -1319,3 +1319,34 @@ RESET datafusion.execution.enable_recursive_ctes;
statement ok
RESET datafusion.sql_parser.enable_ident_normalization;
+
+# Regression test: functional dependencies from the static (anchor) term of a
+# recursive CTE must NOT be propagated to the outer SubqueryAlias. The
+# recursive term can produce rows that violate any uniqueness constraint that
+# holds for the anchor alone. Without this guard, Filter(pk = const) on the
+# CTE result would be mis-identified as scalar (at most 1 row) and return only
+# one row instead of all matching rows.
+statement ok
+CREATE TABLE pk_table(id INT NOT NULL, val INT NOT NULL, PRIMARY KEY(id));
+
+statement ok
+INSERT INTO pk_table VALUES (1, 100), (2, 200);
+
+# The recursive term produces a second row with id=1 (val=300). Without the
+# FD fix, Filter(nodes.id = 1) would be deemed scalar and return only the
+# first matching row.
+query II rowsort
+WITH RECURSIVE nodes AS (
+ SELECT id, val FROM pk_table
+ UNION ALL
+ SELECT 1 AS id, 300 AS val
+ FROM nodes
+ WHERE nodes.id = 2
+)
+SELECT id, val FROM nodes WHERE id = 1
+----
+1 100
+1 300
Review Comment:
I'm not sure the filter will test the bug here. Shouldn't it be a group by
like `count(*) from nodes group by id, val`?
##########
datafusion/common/src/functional_dependencies.rs:
##########
@@ -196,15 +196,25 @@ impl FunctionalDependencies {
}
/// Creates a new `FunctionalDependencies` object from the given
constraints.
+ ///
+ /// `nullable_flags` must contain one entry per field in the relation,
+ /// indicating whether that field is nullable. A `UNIQUE` constraint whose
+ /// source columns include any nullable field is **not** a functional
+ /// dependency — because SQL treats `NULL` values as distinct, multiple
rows
+ /// may carry `NULL` in a unique-key column without violating the
constraint.
+ /// Such constraints are therefore omitted entirely. When all source
columns
+ /// are non-nullable a `UNIQUE` constraint is equivalent to a primary key
and
+ /// is recorded with `nullable = false`.
pub fn new_from_constraints(
constraints: Option<&Constraints>,
- n_field: usize,
+ nullable_flags: &[bool],
) -> Self {
Review Comment:
Since the signature of this pub function has been changed, I think we should
add an entry to the upgrade guide
(docs/source/library-user-guide/upgrading/54.0.0.md).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]