2010YOUY01 opened a new issue, #18407:
URL: https://github.com/apache/datafusion/issues/18407

   ### Is your feature request related to a problem or challenge?
   
   In a Nested Loop Join, the selectivity metric is defined as `output_rows / 
possible_combinations`. I believe this provides useful application-level 
insight.
   
   ### Example
   Consider the query below (run in `datafusion-cli`):
   ```
   > set datafusion.explain.analyze_level = summary;
   0 row(s) fetched.
   Elapsed 0.000 seconds.
   
   > explain analyze select *
   from generate_series(10) as t1(a)
   join generate_series(10) as t2(b)
   on (t1.a + t2.b) = 20;
   
   +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type         | plan                                                                                                                                                                               |
   +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | Plan with Metrics | NestedLoopJoinExec: join_type=Inner, filter=a@0 + b@1 = 20, metrics=[output_rows=1, elapsed_compute=709.665µs, output_bytes=128.0 KB]                                              |
   |                   |   ProjectionExec: expr=[value@0 as a], metrics=[output_rows=11, elapsed_compute=1.958µs, output_bytes=64.0 KB]                                                                     |
   |                   |     LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=10, batch_size=8192], metrics=[output_rows=11, elapsed_compute=13.083µs, output_bytes=64.0 KB]   |
   |                   |   RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1, metrics=[]                                                                                                |
   |                   |     ProjectionExec: expr=[value@0 as b], metrics=[output_rows=11, elapsed_compute=584ns, output_bytes=64.0 KB]                                                                     |
   |                   |       LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=10, batch_size=8192], metrics=[output_rows=11, elapsed_compute=14.875µs, output_bytes=64.0 KB] |
   |                   |                                                                                                                                                                                    |
   +-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.003 seconds.
   ```
   
   Logically, the NLJ evaluates the join like this:
   ```
   for left_row in t1:
       for right_row in t2:
           if left_row.a + right_row.b == 20:
               output(left_row, right_row)
   ```
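   Each input has 11 rows (`generate_series(10)` yields 0 through 10, matching `output_rows=11` in the plan), so the selectivity is calculated as `output_size / (left_size * right_size)` => `1 / (11 * 11)` => about `0.83%`.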
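
As a rough check of the arithmetic, the loop above can be run directly. This is only an illustrative sketch, not DataFusion code: `nlj_selectivity` is a hypothetical helper, and returning `0.0` for empty inputs is an assumption rather than behavior taken from the reference PR. Each side has 11 rows because `generate_series(10)` yields 0 through 10, as the `output_rows=11` metrics show, so there are 121 possible combinations.

```python
def nlj_selectivity(left_rows, right_rows, predicate):
    """Count matches and candidate pairs the way a nested loop join would.

    Hypothetical helper for illustration only; not part of DataFusion.
    """
    output_rows = 0
    possible_combinations = 0
    for a in left_rows:
        for b in right_rows:
            possible_combinations += 1
            if predicate(a, b):
                output_rows += 1
    # Assumption: report 0.0 when there are no candidate pairs at all.
    selectivity = output_rows / possible_combinations if possible_combinations else 0.0
    return output_rows, possible_combinations, selectivity


# generate_series(10) yields 0..10 inclusive, i.e. 11 rows per side.
t1 = range(0, 11)
t2 = range(0, 11)
out, combos, sel = nlj_selectivity(t1, t2, lambda a, b: a + b == 20)
print(f"{out} / {combos} = {sel:.2%}")  # prints "1 / 121 = 0.83%"
```

Only the pair `(10, 10)` satisfies the filter, which is why the join reports `output_rows=1`.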
   
   ### Describe the solution you'd like
   
   Add a `selectivity` metric to `NestedLoopJoinExec`.
   Reference PR: https://github.com/apache/datafusion/pull/18406
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
