advancedxy commented on PR #9233:
URL: https://github.com/apache/iceberg/pull/9233#issuecomment-1847450030
>
```
== Physical Plan ==
ReplaceData (13)
+- * Sort (12)
+- * Project (11)
+- MergeRows (10)
+- SortMergeJoin FullOuter (9) <---- Full Outer here
```
If the join type is full outer, it means the merge has not-matched actions.
So your `MERGE INTO` command should have a `WHEN NOT MATCHED` clause, is that
correct?
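To make that reasoning concrete, here is a minimal sketch of the rule as I understand it for the copy-on-write (`ReplaceData`) plan: full outer when not-matched actions exist, left outer otherwise. The left-outer fallback is my assumption about the rewrite rule, not something shown in the plan above. The function names are hypothetical and the regex check is deliberately naive (it ignores comments and string literals).

```python
import re

# Matches "WHEN NOT MATCHED" but not "WHEN NOT MATCHED BY SOURCE".
_NOT_MATCHED = re.compile(
    r"\bWHEN\s+NOT\s+MATCHED\b(?!\s+BY\s+SOURCE)", re.IGNORECASE
)


def has_not_matched_clause(merge_sql: str) -> bool:
    """Return True if the MERGE statement text contains a WHEN NOT MATCHED clause."""
    return _NOT_MATCHED.search(merge_sql) is not None


def expected_cow_join_type(merge_sql: str) -> str:
    """Join type expected in the copy-on-write plan (assumed rule, see above)."""
    # Unmatched source rows are only needed when there is a not-matched action.
    return "FullOuter" if has_not_matched_clause(merge_sql) else "LeftOuter"
```

If the original command has no `WHEN NOT MATCHED` clause and the plan still shows `FullOuter`, that would be worth reporting too.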
>
```
(1) BatchScan target
Output [60]: [..., _file#2279]
target (branch=null) [filters=((((MEAS_YM = '202306' AND ((MEAS_DD = '02'
AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307'
AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND
MEAS_DD = '03') OR ((MEAS_YM = '202308' AND MEAS_DD = '01') AND bucket[4](POD)
IN (0, 1, 2)))) OR ((MEAS_DD = '03' AND ((MEAS_YM = '202307' AND bucket[4](POD)
IN (0, 1, 2)) OR (MEAS_YM = '202308' AND bucket[4](POD) IN (0, 3)))) OR
(((MEAS_YM = '202307' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)) OR
((MEAS_YM = '202308' AND MEAS_DD = '02') AND bucket[4](POD) = 3)))),
groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
(5) BatchScan source
Output [60]: [...]
source (branch=null) [filters=, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
(14) BatchScan target
Output [8]: [..., _file#2590]
target (branch=null) [filters=((((MEAS_YM = '202306' AND ((MEAS_DD = '02'
AND bucket[4](POD) IN (0, 2, 3)) OR MEAS_DD = '01')) OR ((MEAS_YM = '202307'
AND MEAS_DD = '02') AND bucket[4](POD) IN (1, 3))) OR ((MEAS_YM = '202306' AND
MEAS_DD = '03') OR ((MEAS_YM = '202308' AND MEAS_DD = '01') AND bucket[4](POD)
IN (0, 1, 2)))) OR ((MEAS_DD = '03' AND ((MEAS_YM = '202307' AND bucket[4](POD)
IN (0, 1, 2)) OR (MEAS_YM = '202308' AND bucket[4](POD) IN (0, 3)))) OR
(((MEAS_YM = '202307' AND MEAS_DD = '01') AND bucket[4](POD) IN (0, 1, 2)) OR
((MEAS_YM = '202308' AND MEAS_DD = '02') AND bucket[4](POD) = 3)))), POD IS NOT
NULL, MEAS_YM IS NOT NULL, MEAS_DD IS NOT NULL, MAGNITUDE IS NOT NULL,
METER_KEY IS NOT NULL, REC_ID IS NOT NULL, COLLECT_ID IS NOT NULL,
groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
(18) BatchScan source
Output [7]: [...]
source (branch=null) [filters=POD IS NOT NULL, MEAS_YM IS NOT NULL, MEAS_DD
IS NOT NULL, MAGNITUDE IS NOT NULL, METER_KEY IS NOT NULL, REC_ID IS NOT NULL,
COLLECT_ID IS NOT NULL, groupedBy=MEAS_YM, MEAS_DD, POD_bucket]
```
Could you share the full plan tree or DAG for this changed plan? Is the join
type still full outer? This is quite strange: I'm not sure why a Filter would
be pushed down to the data source for a full outer join. You could set
`spark.sql.planChangeLog.level` to `INFO` to see which rule changes the plan,
and post the related plan changes in a gist. That would help clarify the
problem.
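
For reference, a minimal PySpark-style sketch of turning on the plan change log for the current session. The helper name `enable_plan_change_log` is hypothetical, and `spark` is assumed to be an existing `SparkSession`; only the config key comes from the comment above.

```python
def enable_plan_change_log(spark, level="INFO"):
    """Set spark.sql.planChangeLog.level on the given session and return the level used."""
    allowed = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR"}
    level = level.upper()
    if level not in allowed:
        raise ValueError(f"unsupported log level: {level}")
    # Spark then logs each rule or batch that changes the plan at this level.
    spark.conf.set("spark.sql.planChangeLog.level", level)
    return level
```

Call `enable_plan_change_log(spark)` before re-running the `MERGE INTO`, then look in the driver log for the rule that introduces the pushed-down filters. The driver's logging configuration has to allow the chosen level for the messages to show up.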