[ https://issues.apache.org/jira/browse/FLINK-23740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405065#comment-17405065 ]
wangzhihao commented on FLINK-23740: ------------------------------------ This SIM is currently depending on https://issues.apache.org/jira/browse/FLINK-23989 > SQL Full Outer Join bug > ----------------------- > > Key: FLINK-23740 > URL: https://issues.apache.org/jira/browse/FLINK-23740 > Project: Flink > Issue Type: Bug > Components: Table SQL / Runtime > Affects Versions: 1.13.1, 1.13.2 > Reporter: Fu Kai > Priority: Critical > > Hi team, > We encountered an issue about FULL OUTER JOIN of Flink SQL, which happens > occasionally at very low probability that join output records cannot be > correctly updated. We cannot locate the root cause for now by glancing at the > SQL join logic in > [StreamingJoinOperator.|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/join/stream/StreamingJoinOperator.java#L198] > It cannot be stably reproduced and it does happen with massive data volume. > The reason we suspect it's the FULL OUER join problem instead of others like > LEFT OUTER join is because the issue only arises after we introduced FULL > OUTER into the join flow. The query we are using is like the following. The > are two join code pieces below, the fist one contains solely left join(though > with nested) and there is no issue detected; the second one contains both > left and full outer join(nested as well), and the problem is that sometimes > update from the left table A(and other tables before the full outer join > operator) cannot be reflected in the final output. We suspect it could be the > introduce of full outer join that caused the problem, although at a very low > probability(~10 out of ~30million). > The root cause of the bug could be something else, the suspecting of FULL OUT > join is based on the result of our current experiment and observation. > {code:java} > create table A( > k1 int, > k2 int, > k3 int, > k4 int, > k5 int, > PRIMARY KEY (k1, k2, k3, k4, k5) NOT ENFORCED > ) WITH (); > create table B( > k1 int, > k2 int, > k3 int, > PRIMARY KEY (k1, k2, k3) NOT ENFORCED > ) WITH (); > create table C( > k1 int, > k2 int, > k3 int, > PRIMARY KEY (k1, k2, k3) NOT ENFORCED > ) WITH (); > create table D( > k1 int, > k2 int, > PRIMARY KEY (k1, k2) NOT ENFORCED > ) WITH (); > // query with left join, no issue detected > select * from A > left outer join > (select * from B > left outer join C > on > B.k1 = C.k1 > B.k2 = C.k2 > B.k3 = C.k3 > ) as BC > on > A.k1 = BC.k1 > A.k2 = BC.k2 > A.k3 = BC.k3 > left outer join D > on > A.k1 = D.k1 > A.k2 = D.k2 > ; > // query with full outer join combined with left outer join, record updates > from left table A cannot be updated in the final output record some times > select * from A > left outer join > (select * from B > full outer join C > on > B.k1 = C.k1 > B.k2 = C.k2 > B.k3 = C.k3 > ) as BC > on > A.k1 = BC.k1 > A.k2 = BC.k2 > A.k3 = BC.k3 > left outer join D > on > A.k1 = D.k1 > A.k2 = D.k2 > ; > {code} > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)