2010YOUY01 opened a new issue, #17611:
URL: https://github.com/apache/datafusion/issues/17611

   ### Is your feature request related to a problem or challenge?
   
   @comphead has come up with this great idea: 
https://github.com/apache/datafusion/pull/17319#discussion_r2354215736
   
   If we rewrite `IS NOT DISTINCT FROM` into an equality join key in hash join (see the write-up in https://github.com/apache/datafusion/pull/17319 for details), we should mark this in the EXPLAIN output. Otherwise the plan shows a plain `=` and might alarm careful users.
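   A minimal Rust sketch of the two comparison semantics shows why a bare `on: (v1 = v1)` hides a real difference. It models a nullable `INT` value as `Option<i32>`, with SQL NULL as `None`. The function names are hypothetical and not part of DataFusion's API.

   ```rust
   /// SQL `a = b` under three-valued logic: the result is NULL (`None`)
   /// whenever either input is NULL.
   fn sql_eq(a: Option<i32>, b: Option<i32>) -> Option<bool> {
       match (a, b) {
           (Some(x), Some(y)) => Some(x == y),
           _ => None,
       }
   }

   /// SQL `a IS NOT DISTINCT FROM b`: never NULL, and two NULLs compare equal.
   fn is_not_distinct_from(a: Option<i32>, b: Option<i32>) -> bool {
       // `Option<i32>` equality already treats `None == None` as true.
       a == b
   }
   ```

   Under `=`, a NULL on either side yields NULL, which a join condition treats as "no match". `IS NOT DISTINCT FROM` returns true for two NULLs instead.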
   
   ### Describe the solution you'd like
   
   I think we don't need to change the output for the default behavior (NULL does not equal NULL). However, if the above-mentioned optimization changes that behavior, it should be marked in the join executors (Hash Join and Sort Merge Join support this feature).
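   The behavior that such a marker would surface can be sketched with a toy single-key inner hash join that takes a null-equality mode. `NullEquality` and `hash_join` below are hypothetical names used only for illustration. They are not DataFusion's actual join code.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical null-equality setting for a join (illustrative only).
   #[derive(Clone, Copy, Debug, PartialEq, Eq)]
   enum NullEquality {
       /// Default: NULL keys never match anything (`=` semantics).
       NullEqualsNothing,
       /// NULL keys match other NULL keys (`IS NOT DISTINCT FROM` semantics).
       NullEqualsNull,
   }

   /// Toy inner hash join on one nullable key column.
   /// Returns matching (build_row, probe_row) index pairs.
   fn hash_join(
       build: &[Option<i32>],
       probe: &[Option<i32>],
       null_equality: NullEquality,
   ) -> Vec<(usize, usize)> {
       // Build phase: index build-side rows by key. `Option<i32>` is hashable,
       // so NULL (`None`) can be stored like any other key.
       let mut table: HashMap<Option<i32>, Vec<usize>> = HashMap::new();
       for (i, key) in build.iter().enumerate() {
           if key.is_none() && null_equality == NullEquality::NullEqualsNothing {
               continue; // Under `=` semantics a NULL key can never match.
           }
           table.entry(*key).or_default().push(i);
       }
       // Probe phase: look up each probe-side key in the table.
       let mut out = Vec::new();
       for (j, key) in probe.iter().enumerate() {
           if let Some(rows) = table.get(key) {
               for &i in rows {
                   out.push((i, j));
               }
           }
       }
       out
   }
   ```

   With keys `[Some(1), None]` on both sides, the default mode returns only the `1`/`1` pair. `NullEqualsNull` also pairs the two NULL rows, and a plan that prints only `on: (v1 = v1)` does not reveal this.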
   
   ```sh
   DataFusion CLI v49.0.2
   > create table t1(v1 int);
   0 row(s) fetched.
   Elapsed 0.025 seconds.
   
   > create table t2(v1 int);
   0 row(s) fetched.
   Elapsed 0.001 seconds.
   
   > EXPLAIN SELECT *
   FROM t1
   JOIN t2 ON t1.v1 IS NOT DISTINCT FROM t2.v1;
   +---------------+------------------------------------------------------------+
   | plan_type     | plan                                                       |
   +---------------+------------------------------------------------------------+
   | physical_plan | ┌───────────────────────────┐                              |
   |               | │    CoalesceBatchesExec    │                              |
   |               | │    --------------------   │                              |
   |               | │     target_batch_size:    │                              |
   |               | │            8192           │                              |
   |               | └─────────────┬─────────────┘                              |
   |               | ┌─────────────┴─────────────┐                              |
   |               | │        HashJoinExec       │                              |
   |               | │    --------------------   ├──────────────┐               |
   |               | │       on: (v1 = v1)       │              │               |
   |               | └─────────────┬─────────────┘              │               |
   |               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
   |               | │       DataSourceExec      ││       DataSourceExec      │ |
   |               | │    --------------------   ││    --------------------   │ |
   |               | │          bytes: 0         ││          bytes: 0         │ |
   |               | │       format: memory      ││       format: memory      │ |
   |               | │          rows: 0          ││          rows: 0          │ |
   |               | └───────────────────────────┘└───────────────────────────┘ |
   |               |                                                            |
   +---------------+------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.032 seconds.
   
   > set datafusion.explain.format = indent;
   0 row(s) fetched.
   Elapsed 0.000 seconds.
   
   > EXPLAIN SELECT *
   FROM t1
   JOIN t2 ON t1.v1 IS NOT DISTINCT FROM t2.v1;
   +---------------+----------------------------------------------------------------------+
   | plan_type     | plan                                                                 |
   +---------------+----------------------------------------------------------------------+
   | logical_plan  | Inner Join: t1.v1 = t2.v1                                            |
   |               |   TableScan: t1 projection=[v1]                                      |
   |               |   TableScan: t2 projection=[v1]                                      |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                          |
   |               |   HashJoinExec: mode=Partitioned, join_type=Inner, on=[(v1@0, v1@0)] |
   |               |     DataSourceExec: partitions=1, partition_sizes=[0]                |
   |               |     DataSourceExec: partitions=1, partition_sizes=[0]                |
   |               |                                                                      |
   +---------------+----------------------------------------------------------------------+
   2 row(s) fetched.
   Elapsed 0.002 seconds.
   ```
   
   We can add a new entry, `Null Equality: NULL equals NULL`, to the HashJoinExec node.
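   The sketch below shows one way to render that entry only when null handling differs from the default. `HashJoinSummary` and `NullEquality` are hypothetical types for illustration, not DataFusion's `HashJoinExec` or its display code. Only the entry text comes from this proposal.

   ```rust
   use std::fmt;

   /// Hypothetical null-equality setting (illustrative, not DataFusion's type).
   #[derive(Clone, Copy, Debug, PartialEq, Eq)]
   enum NullEquality {
       NullEqualsNothing,
       NullEqualsNull,
   }

   /// Hypothetical summary of a hash join node, used only to sketch EXPLAIN text.
   struct HashJoinSummary {
       /// Equi-join key pairs as (left column, right column).
       on: Vec<(String, String)>,
       null_equality: NullEquality,
   }

   impl fmt::Display for HashJoinSummary {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           let keys: Vec<String> = self
               .on
               .iter()
               .map(|(l, r)| format!("({l} = {r})"))
               .collect();
           write!(f, "on: {}", keys.join(", "))?;
           // Mention null handling only when it differs from the SQL default,
           // so the default output stays exactly as it is today.
           if self.null_equality == NullEquality::NullEqualsNull {
               write!(f, "\nNull Equality: NULL equals NULL")?;
           }
           Ok(())
       }
   }
   ```

   Keeping the default case unchanged means existing EXPLAIN output would likely not need updating. Only plans produced by the `IS NOT DISTINCT FROM` rewrite gain the extra line.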
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_



