2010YOUY01 opened a new issue, #17611: URL: https://github.com/apache/datafusion/issues/17611
### Is your feature request related to a problem or challenge?

@comphead has come up with this great idea: https://github.com/apache/datafusion/pull/17319#discussion_r2354215736

If we rewrite `IS NOT DISTINCT FROM` into equality in hash join (see the write-up in https://github.com/apache/datafusion/pull/17319 for details), it's better to mark this in the EXPLAIN output; otherwise it might freak out careful users.

### Describe the solution you'd like

I think we don't have to change the output for the default behavior (NULL does not equal NULL). However, if that behavior is changed by the above-mentioned optimization, it should be marked in the join executors. (Hash Join and Sort Merge Join support this feature.)

```sh
DataFusion CLI v49.0.2
> create table t1(v1 int);
0 row(s) fetched.
Elapsed 0.025 seconds.

> create table t2(v1 int);
0 row(s) fetched.
Elapsed 0.001 seconds.

> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.v1 IS NOT DISTINCT FROM t2.v1;
+---------------+------------------------------------------------------------+
| plan_type     | plan                                                       |
+---------------+------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐                              |
|               | │    CoalesceBatchesExec    │                              |
|               | │    --------------------   │                              |
|               | │     target_batch_size:    │                              |
|               | │            8192           │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │        HashJoinExec       │                              |
|               | │    --------------------   ├──────────────┐               |
|               | │       on: (v1 = v1)       │              │               |
|               | └─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      ││       DataSourceExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │          bytes: 0         ││          bytes: 0         │ |
|               | │       format: memory      ││       format: memory      │ |
|               | │          rows: 0          ││          rows: 0          │ |
|               | └───────────────────────────┘└───────────────────────────┘ |
|               |                                                            |
+---------------+------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.032 seconds.

> set datafusion.explain.format = indent;
0 row(s) fetched.
Elapsed 0.000 seconds.

> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.v1 IS NOT DISTINCT FROM t2.v1;
+---------------+----------------------------------------------------------------------+
| plan_type     | plan                                                                 |
+---------------+----------------------------------------------------------------------+
| logical_plan  | Inner Join: t1.v1 = t2.v1                                            |
|               |   TableScan: t1 projection=[v1]                                      |
|               |   TableScan: t2 projection=[v1]                                      |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                          |
|               |   HashJoinExec: mode=Partitioned, join_type=Inner, on=[(v1@0, v1@0)] |
|               |     DataSourceExec: partitions=1, partition_sizes=[0]                |
|               |     DataSourceExec: partitions=1, partition_sizes=[0]                |
|               |                                                                      |
+---------------+----------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.002 seconds.
```

We can add a new entry `Null Equality: NULL equals NULL` in the HashJoinExec node.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
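
As a rough illustration of the proposal above (leave the default output untouched and add a marker only when NULL is treated as equal to NULL), here is a minimal, self-contained Rust sketch. Everything in it is hypothetical: the `NullEquality` enum is a local stand-in, and `JoinDisplay` and `keys_match` are not DataFusion APIs. The marker text is simply the wording suggested in this issue, and the real display code in the join executors may look quite different.

```rust
use std::fmt;

/// Local stand-in for a join's null-equality setting.
/// Hypothetical: defined here for illustration, not imported from DataFusion.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NullEquality {
    /// Default `=` semantics: NULL never matches NULL.
    NullEqualsNothing,
    /// `IS NOT DISTINCT FROM` semantics: NULL matches NULL.
    NullEqualsNull,
}

/// Returns true if two nullable join keys match under the given setting.
pub fn keys_match(left: Option<i32>, right: Option<i32>, null_equality: NullEquality) -> bool {
    match (left, right) {
        (Some(l), Some(r)) => l == r,
        (None, None) => null_equality == NullEquality::NullEqualsNull,
        _ => false,
    }
}

/// Hypothetical, simplified join node used only to show the display logic.
pub struct JoinDisplay {
    pub on: Vec<(String, String)>,
    pub null_equality: NullEquality,
}

impl fmt::Display for JoinDisplay {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let on = self
            .on
            .iter()
            .map(|(l, r)| format!("({l}, {r})"))
            .collect::<Vec<_>>()
            .join(", ");
        write!(f, "HashJoinExec: on=[{on}]")?;
        // Emit the extra entry only for the non-default behavior,
        // so existing EXPLAIN output stays unchanged.
        if self.null_equality == NullEquality::NullEqualsNull {
            write!(f, ", Null Equality: NULL equals NULL")?;
        }
        Ok(())
    }
}
```

With this sketch, a join built with `NullEquality::NullEqualsNothing` formats exactly as before, while one built with `NullEquality::NullEqualsNull` gains the extra entry, which is the signal a careful reader of the plan would need to tell `on: (v1 = v1)` apart from a plain equality join.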