[ https://issues.apache.org/jira/browse/IGNITE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Ozerov updated IGNITE-8732:
------------------------------------
    Description:
*Steps to reproduce*
# Run {{org.apache.ignite.sqltests.ReplicatedSqlTest#testLeftJoinReplicatedPartitioned}}
# Observe that we get 2x results on a 2-node cluster

*Root Cause*
The {{left LEFT JOIN right ON cond}} operation assumes a full scan of the left expression. Currently we perform this scan on every node and then simply merge the results on the reducer. Two nodes, two scans of the {{REPLICATED}} cache, 2x results.

*Solution*
We may consider several solutions. Deeper analysis is required to understand which is the right one.
# Perform deduplication on the reducer
# Treat the {{REPLICATED}} cache as {{PARTITIONED}}. Essentially, we just need to pass the proper backup filter. But what if the {{REPLICATED}} cache spans more nodes than the {{PARTITIONED}} one? We cannot rely on primary/backup in this case
# Implement an additional execution phase as follows:
{code}
SELECT left.cols, right.cols FROM left INNER JOIN right ON cond  -- Get "inner join" part
UNION
UNICAST SELECT left.cols, [NULL].cols FROM left WHERE left.id NOT IN ([ids from the first phase])  -- Get "outer join" part
{code}

  was:
*Steps to reproduce*
# Run {{org.apache.ignite.sqltests.ReplicatedSqlTest#testLeftJoinReplicatedPartitioned}}
# Observe that we get 2x results on a 2-node cluster

*Root Cause*
The {{left LEFT JOIN right ON cond}} operation assumes a full scan of the left expression. Currently we perform this scan on every node and then simply merge the results on the reducer. Two nodes, two scans of the {{REPLICATED}} cache, 2x results.

*Solution*
We may consider several solutions. Deeper analysis is required to understand which is the right one.
# Perform deduplication on the reducer
# Treat the {{REPLICATED}} cache as {{PARTITIONED}}. Essentially, we just need to pass the proper backup filter. But what if the {{REPLICATED}} cache spans more nodes than the {{PARTITIONED}} one?
We cannot rely on primary/backup in this case
# Implement an additional execution phase as follows:
{code}
SELECT left.cols, right.cols FROM left INNER JOIN right ON cond  -- Get "inner join" part
UNION
SELECT left.cols, [NULL].cols FROM left WHERE left.id NOT IN ([ids from the first phase])  -- Get "outer join" part
{code}

> SQL: REPLICATED cache cannot be left-joined to PARTITIONED
> ----------------------------------------------------------
>
>                 Key: IGNITE-8732
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8732
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>    Affects Versions: 2.5
>            Reporter: Vladimir Ozerov
>            Priority: Major
>
> *Steps to reproduce*
> # Run {{org.apache.ignite.sqltests.ReplicatedSqlTest#testLeftJoinReplicatedPartitioned}}
> # Observe that we get 2x results on a 2-node cluster
>
> *Root Cause*
> The {{left LEFT JOIN right ON cond}} operation assumes a full scan of the left
> expression. Currently we perform this scan on every node and then simply
> merge the results on the reducer. Two nodes, two scans of the {{REPLICATED}}
> cache, 2x results.
>
> *Solution*
> We may consider several solutions. Deeper analysis is required to understand
> which is the right one.
> # Perform deduplication on the reducer
> # Treat the {{REPLICATED}} cache as {{PARTITIONED}}. Essentially, we just need
> to pass the proper backup filter. But what if the {{REPLICATED}} cache spans
> more nodes than the {{PARTITIONED}} one? We cannot rely on primary/backup in
> this case
> # Implement an additional execution phase as follows:
> {code}
> SELECT left.cols, right.cols FROM left INNER JOIN right ON cond  -- Get "inner join" part
> UNION
> UNICAST SELECT left.cols, [NULL].cols FROM left WHERE left.id NOT IN ([ids from the first phase])  -- Get "outer join" part
> {code}
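
The root cause and the third proposed solution can be illustrated with a small simulation. This is a toy Python model, not Ignite code: the node names, table contents, and helper functions ({{local_left_join}}, {{local_inner_join}}) are all hypothetical, and the "unicast" phase is modelled simply as a computation done once rather than once per node.

```python
# Toy model, NOT Ignite code. All names and data below are illustrative assumptions.
LEFT = [1, 2, 3]  # left.id values; REPLICATED, so every node holds all of them

RIGHT_BY_NODE = {  # right rows as (left_id, value); PARTITIONED across two nodes
    "node_a": [(1, "a")],
    "node_b": [(2, "b")],
}


def local_left_join(left_ids, right_rows):
    """LEFT JOIN as one node sees it: full left scan, local right partition only."""
    out = []
    for lid in left_ids:
        matches = [val for (rid, val) in right_rows if rid == lid]
        if matches:
            out.extend((lid, val) for val in matches)
        else:
            out.append((lid, None))  # NULL-padded row for an unmatched left row
    return out


def local_inner_join(left_ids, right_rows):
    """INNER JOIN as one node sees it: only rows matched in the local partition."""
    return [(lid, val) for lid in left_ids
            for (rid, val) in right_rows if rid == lid]


# Current behaviour: every node runs the LEFT JOIN, the reducer just concatenates.
naive = []
for rows in RIGHT_BY_NODE.values():
    naive.extend(local_left_join(LEFT, rows))

# Proposed extra phase. Phase 1: distributed INNER JOIN, merged on the reducer.
phase1 = []
for rows in RIGHT_BY_NODE.values():
    phase1.extend(local_inner_join(LEFT, rows))

# Phase 2 ("unicast"): evaluated once, over left ids that phase 1 did not match.
matched_ids = {lid for (lid, _) in phase1}
phase2 = [(lid, None) for lid in LEFT if lid not in matched_ids]

# The two phases are disjoint here, so concatenation stands in for UNION.
two_phase = phase1 + phase2

print(len(naive))   # 6 rows for 3 left rows on 2 nodes
print(two_phase)    # [(1, 'a'), (2, 'b'), (3, None)]
```

In this model the naive merge returns six rows for three left rows, which matches the 2x effect described above, while the two-phase variant returns exactly one row per left row.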