eejbyfeldt commented on PR #11232:
URL: https://github.com/apache/datafusion/pull/11232#issuecomment-2206921868

   > > @viirya Good suggestion. I tried adding some sqllogictests, and they work. But as far as I can tell they are not using HashJoinExec (even though the exact same query joining on only the field is). Is there some way to force a HashJoin, or can you help me understand why HashJoin is not used?
   > 
   > Hmm, what join operator is it using? I think HashJoin is used by default.
   
   It is just an inner join using `=`. Here are the same joins in datafusion-cli. The one comparing the structs uses `NestedLoopJoinExec`, while the one comparing the `id` field directly uses `HashJoinExec`:
   
   ```
   > CREATE TABLE join_t3(s3 struct<id INT>)
     AS VALUES
     (NULL),
     (struct(1)),
     (struct(2));
   
   0 row(s) fetched. 
   Elapsed 0.003 seconds.
   
   > CREATE TABLE join_t4(s4 struct<id INT>)
     AS VALUES
     (NULL),
     (struct(2)),
     (struct(3));
   
   0 row(s) fetched. 
   Elapsed 0.002 seconds.
   
   > explain analyze select join_t3.s3, join_t4.s4
   from join_t3
   inner join join_t4 on join_t3.s3 = join_t4.s4;
   
   +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type         | plan                                                                                                                                                                                                                                       |
   +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | Plan with Metrics | NestedLoopJoinExec: join_type=Inner, filter=s3@0 = s4@1, metrics=[output_rows=1, output_batches=1, build_input_batches=1, input_rows=3, build_input_rows=3, input_batches=1, build_mem_used=266, join_time=270.817µs, build_time=34.164µs] |
   |                   |   MemoryExec: partitions=1, partition_sizes=[1], metrics=[]                                                                                                                                                                                |
   |                   |   MemoryExec: partitions=1, partition_sizes=[1], metrics=[]                                                                                                                                                                                |
   |                   |                                                                                                                                                                                                                                            |
   +-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.002 seconds.
   
   > explain analyze select join_t3.s3, join_t4.s4
   from join_t3
   inner join join_t4 on join_t3.s3.id = join_t4.s4.id;
   
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type         | plan                                                   
                                                                                
                                                                                
                                                                                
    |
   
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, 
metrics=[output_rows=1, elapsed_compute=13.975µs]                               
                                                                                
                                                                                
              |
   |                   |   HashJoinExec: mode=Partitioned, join_type=Inner, 
on=[(join_t3.s3[id]@1, join_t4.s4[id]@1)], projection=[s3@0, s4@2], 
metrics=[output_rows=1, output_batches=3, build_input_batches=3, input_rows=3, 
build_input_rows=3, input_batches=3, build_mem_used=2146, join_time=141.757µs, 
build_time=647.423µs] |
   |                   |     CoalesceBatchesExec: target_batch_size=8192, 
metrics=[output_rows=3, elapsed_compute=68.41µs]                                
                                                                                
                                                                                
          |
   |                   |       RepartitionExec: 
partitioning=Hash([join_t3.s3[id]@1], 16), input_partitions=1, 
metrics=[fetch_time=50.816µs, repart_time=84.96µs, send_time=16.675µs]          
                                                                                
                                                     |
   |                   |         ProjectionExec: expr=[s3@0 as s3, 
get_field(s3@0, id) as join_t3.s3[id]], metrics=[output_rows=3, 
elapsed_compute=28.443µs]                                                       
                                                                                
                                 |
   |                   |           MemoryExec: partitions=1, 
partition_sizes=[1], metrics=[]                                                 
                                                                                
                                                                                
                       |
   |                   |     CoalesceBatchesExec: target_batch_size=8192, 
metrics=[output_rows=3, elapsed_compute=129.626µs]                              
                                                                                
                                                                                
          |
   |                   |       RepartitionExec: 
partitioning=Hash([join_t4.s4[id]@1], 16), input_partitions=1, 
metrics=[fetch_time=9.288µs, repart_time=36.86µs, send_time=11.054µs]           
                                                                                
                                                     |
   |                   |         ProjectionExec: expr=[s4@0 as s4, 
get_field(s4@0, id) as join_t4.s4[id]], metrics=[output_rows=3, 
elapsed_compute=4.569µs]                                                        
                                                                                
                                 |
   |                   |           MemoryExec: partitions=1, 
partition_sizes=[1], metrics=[]                                                 
                                                                                
                                                                                
                       |
   |                   |                                                        
                                                                                
                                                                                
                                                                                
    |
   
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.003 seconds.
   
   ```
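
For anyone comparing the two plans, here is a rough Python sketch of the two strategies on the same data. It is only an illustration, not DataFusion's implementation: the names `nested_loop_join` and `hash_join` are made up, structs are modelled as one-element tuples, and SQL NULL is modelled as `None` (which never matches under `=`).

```python
# Illustrative only: models struct<id INT> values as 1-tuples and NULL as None.
join_t3 = [None, (1,), (2,)]
join_t4 = [None, (2,), (3,)]


def nested_loop_join(left, right):
    """Evaluate the `=` filter on every (left, right) pair."""
    out = []
    for l in left:
        for r in right:
            # NULL = anything is not true in SQL, so skip None on either side.
            if l is not None and r is not None and l == r:
                out.append((l, r))
    return out


def hash_join(left, right):
    """Build a hash table on the left keys, then probe it with the right keys."""
    table = {}
    for l in left:
        if l is not None:  # NULL keys can never match, so they are not inserted
            table.setdefault(l, []).append(l)
    out = []
    for r in right:
        if r is not None:
            for l in table.get(r, []):
                out.append((l, r))
    return out


nl_rows = nested_loop_join(join_t3, join_t4)
hj_rows = hash_join(join_t3, join_t4)
print(nl_rows, hj_rows)
```

Both functions return the single matching pair for `struct(2)`, in line with `output_rows=1` in both plans above. The difference is the work done: the nested loop compares all 3 x 3 pairs, while the hash join builds once and probes once per row.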


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org