[ 
https://issues.apache.org/jira/browse/IMPALA-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v updated IMPALA-7564:
------------------------------
    Description: 
With IMPALA-5547, we predict whether a join is an FK/PK join as follows.

{noformat}
 // Iterate over all groups of conjuncts that belong to the same joined tuple 
id pair.
    // For each group, we compute the join NDV of the rhs slots and compare it 
to the
    // number of rows in the rhs table.
    for (List<EqJoinConjunctScanSlots> fkPkCandidate: 
scanSlotsByJoinedTids.values()) {
      double jointNdv = 1.0;
      for (EqJoinConjunctScanSlots slots: fkPkCandidate) jointNdv *= 
slots.rhsNdv();
      double rhsNumRows = fkPkCandidate.get(0).rhsNumRows();
      if (jointNdv >= Math.round(rhsNumRows * (1.0 - 
FK_PK_MAX_STATS_DELTA_PERC))) {
        // We cannot disprove that the RHS is a PK.
        if (result == null) result = Lists.newArrayList();
        result.addAll(fkPkCandidate);
      }
    }
{noformat}

We iterate through all the "simple" equi join conjuncts on the RHS, multiply 
their NDVs and check if it close to rhsNumRows. The issue here is that this can 
result in conservative FK/Pk detection if the equi-join conjuncts are not 
simple (of the form <slotRef> = <slotRef>)

{noformat}
/**
     * Returns a new EqJoinConjunctScanSlots for the given equi-join conjunct 
or null if
     * the given conjunct is not of the form <SlotRef> = <SlotRef> or if the 
underlying
     * table/column of at least one side is missing stats.
     */
    public static EqJoinConjunctScanSlots create(Expr eqJoinConjunct) {
      if (!Expr.IS_EQ_BINARY_PREDICATE.apply(eqJoinConjunct)) return null;
      SlotDescriptor lhsScanSlot = eqJoinConjunct.getChild(0).findSrcScanSlot();
      if (lhsScanSlot == null || !hasNumRowsAndNdvStats(lhsScanSlot)) return 
null;
      SlotDescriptor rhsScanSlot = eqJoinConjunct.getChild(1).findSrcScanSlot();
{noformat}

For example, the following query contains a complex equi-join conjunct 
{{substr(l.c3, 1, 6) = substr(r.c3, 1,6)}}, so while detecting if the left 
outer join is an FK/PK, we just check if 
{{NDVs(r.c1) * NDVs(r.c2) ~ r.numRows()}} which is incorrect. (This happens 
because EqJoinConjunctScanSlots.create() returns null for any non-simple 
predicates which not considered later).

{noformat}
[localhost:21000]> explain select * from test_left l left outer join test_right 
r on l.c1 = r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = substr(r.c3, 1,6);
Query: explain select * from test_left l left outer join test_right r on l.c1 = 
r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = substr(r.c3, 1,6)
+-----------------------------------------------------------------------------------------+
| Explain String                                                                
          |
+-----------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.95MB Threads=5                    
          |
| Per-Host Resource Estimates: Memory=66MB                                      
          |
|                                                                               
          |
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
          |
| |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
thread-reservation=1          |
| PLAN-ROOT SINK                                                                
          |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                    
          |
| |                                                                             
          |
| 04:EXCHANGE [UNPARTITIONED]                                                   
          |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                    
          |
| |  tuple-ids=0,1N row-size=94B cardinality=49334767023                        
          |
| |  in pipelines: 00(GETNEXT)                                                  
          |
| |                                                                             
          |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                
          |
| Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.95MB 
thread-reservation=2    |
| 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                     
          |
| |  hash predicates: l.c1 = r.c1, l.c2 = r.c2, substr(l.c3, 1, 6) = 
substr(r.c3, 1, 6)   |
| |  fk/pk conjuncts: none                                                      
          |
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
thread-reservation=0 |
| |  tuple-ids=0,1N row-size=94B cardinality=49334767023                        
          |
| |  in pipelines: 00(GETNEXT), 01(OPEN)                                        
          |
| |                                                                             
          |
| |--03:EXCHANGE [BROADCAST]                                                    
          |
| |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0                 
          |
| |  |  tuple-ids=1 row-size=47B cardinality=2510                               
          |
| |  |  in pipelines: 01(GETNEXT)                                               
          |
| |  |                                                                          
          |
| |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                             
          |
| |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
thread-reservation=2 |
| |  01:SCAN HDFS [cdh72724.test_right r, RANDOM]                               
          |
| |     partitions=1/1 files=1 size=8B                                          
          |
| |     stored statistics:                                                      
          |
| |       table: rows=2510 size=8B                                              
          |
| |       columns: all                                                          
          |
| |     extrapolated-rows=disabled max-scan-range-rows=2510                     
          |
| |     mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1        
          |
| |     tuple-ids=1 row-size=47B cardinality=2510                               
          |
| |     in pipelines: 01(GETNEXT)                                               
          |
| |                                                                             
          |
| 00:SCAN HDFS [cdh72724.test_left l, RANDOM]                                   
          |
|    partitions=1/1 files=1 size=8B                                             
          |
|    stored statistics:                                                         
          |
|      table: rows=589658570 size=8B                                            
          |
|      columns: all                                                             
          |
|    extrapolated-rows=disabled max-scan-range-rows=589658570                   
          |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1           
          |
|    tuple-ids=0 row-size=47B cardinality=589658570                             
          |
|    in pipelines: 00(GETNEXT)                                                  
          |
+-----------------------------------------------------------------------------------------+
Fetched 48 row(s) in 0.02s
{noformat}

We should either consider NDVs of complex conjuncts (error-prone for obvious 
reasons) or generally assume a join to be FK/PK until and unless proven 
otherwise.

Here is the plan if the complex conjunct is relaxed to a simple <slotRef> = 
<slotRef> type. We can see that it detects the join to be of fk/pk type.

{noformat}
[localhost:21000] cdh72724> explain select * from test_left l left outer join 
test_right r on l.c1 = r.c1 and l.c2 = r.c2 and l.c3 = r.c3;
Query: explain select * from test_left l left outer join test_right r on l.c1 = 
r.c1 and l.c2 = r.c2 and l.c3 = r.c3
+-----------------------------------------------------------------------------------------+
| Explain String                                                                
          |
+-----------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.95MB Threads=5                    
          |
| Per-Host Resource Estimates: Memory=66MB                                      
          |
|                                                                               
          |
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
          |
| |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
thread-reservation=1          |
| PLAN-ROOT SINK                                                                
          |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                    
          |
| |                                                                             
          |
| 04:EXCHANGE [UNPARTITIONED]                                                   
          |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                    
          |
| |  tuple-ids=0,1N row-size=94B cardinality=589658570                          
          |
| |  in pipelines: 00(GETNEXT)                                                  
          |
| |                                                                             
          |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                
          |
| Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.95MB 
thread-reservation=2    |
| 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                     
          |
| |  hash predicates: l.c1 = r.c1, l.c2 = r.c2, l.c3 = r.c3                     
          |
| |  fk/pk conjuncts: l.c1 = r.c1, l.c2 = r.c2, l.c3 = r.c3                     
          |
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
thread-reservation=0 |
| |  tuple-ids=0,1N row-size=94B cardinality=589658570                          
          |
| |  in pipelines: 00(GETNEXT), 01(OPEN)                                        
          |
| |                                                                             
          |
| |--03:EXCHANGE [BROADCAST]                                                    
          |
| |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0                 
          |
| |  |  tuple-ids=1 row-size=47B cardinality=2510                               
          |
| |  |  in pipelines: 01(GETNEXT)                                               
          |
| |  |                                                                          
          |
| |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                             
          |
| |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
thread-reservation=2 |
| |  01:SCAN HDFS [cdh72724.test_right r, RANDOM]                               
          |
| |     partitions=1/1 files=1 size=8B                                          
          |
| |     stored statistics:                                                      
          |
| |       table: rows=2510 size=8B                                              
          |
| |       columns: all                                                          
          |
| |     extrapolated-rows=disabled max-scan-range-rows=2510                     
          |
| |     mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1        
          |
| |     tuple-ids=1 row-size=47B cardinality=2510                               
          |
| |     in pipelines: 01(GETNEXT)                                               
          |
| |                                                                             
          |
| 00:SCAN HDFS [cdh72724.test_left l, RANDOM]                                   
          |
|    partitions=1/1 files=1 size=8B                                             
          |
|    stored statistics:                                                         
          |
|      table: rows=589658570 size=8B                                            
          |
|      columns: all                                                             
          |
|    extrapolated-rows=disabled max-scan-range-rows=589658570                   
          |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1           
          |
|    tuple-ids=0 row-size=47B cardinality=589658570                             
          |
|    in pipelines: 00(GETNEXT)                                                  
          |
+-----------------------------------------------------------------------------------------+
Fetched 48 row(s) in 0.01s
{noformat}

  was:
With IMPALA-5547, we predict whether a join is an FK/PK join as follows.

{noformat}
 // Iterate over all groups of conjuncts that belong to the same joined tuple 
id pair.
    // For each group, we compute the join NDV of the rhs slots and compare it 
to the
    // number of rows in the rhs table.
    for (List<EqJoinConjunctScanSlots> fkPkCandidate: 
scanSlotsByJoinedTids.values()) {
      double jointNdv = 1.0;
      for (EqJoinConjunctScanSlots slots: fkPkCandidate) jointNdv *= 
slots.rhsNdv();
      double rhsNumRows = fkPkCandidate.get(0).rhsNumRows();
      if (jointNdv >= Math.round(rhsNumRows * (1.0 - 
FK_PK_MAX_STATS_DELTA_PERC))) {
        // We cannot disprove that the RHS is a PK.
        if (result == null) result = Lists.newArrayList();
        result.addAll(fkPkCandidate);
      }
    }
{noformat}

We iterate through all the "simple" equi join conjuncts on the RHS, multiply 
their NDVs and check if it close to rhsNumRows. The issue here is that this can 
result in conservative FK/Pk detection if the equi-join conjuncts are not 
simple (of the form <slotRef> = <slotRef>)

{noformat}
/**
     * Returns a new EqJoinConjunctScanSlots for the given equi-join conjunct 
or null if
     * the given conjunct is not of the form <SlotRef> = <SlotRef> or if the 
underlying
     * table/column of at least one side is missing stats.
     */
    public static EqJoinConjunctScanSlots create(Expr eqJoinConjunct) {
      if (!Expr.IS_EQ_BINARY_PREDICATE.apply(eqJoinConjunct)) return null;
      SlotDescriptor lhsScanSlot = eqJoinConjunct.getChild(0).findSrcScanSlot();
      if (lhsScanSlot == null || !hasNumRowsAndNdvStats(lhsScanSlot)) return 
null;
      SlotDescriptor rhsScanSlot = eqJoinConjunct.getChild(1).findSrcScanSlot();
{noformat}

For example, the following query contains a complex equi-join conjunct 
{{substr(l.c3, 1, 6) = substr(r.c3, 1,6)}}, so while detecting if the left 
outer join is an FK/PK, we just check if 
{{NDVs(r.c1) * NDVs(r.c2) ~ r.numRows()}} which is incorrect. 

{noformat}
[localhost:21000]> explain select * from test_left l left outer join test_right 
r on l.c1 = r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = substr(r.c3, 1,6);
Query: explain select * from test_left l left outer join test_right r on l.c1 = 
r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = substr(r.c3, 1,6)
+-----------------------------------------------------------------------------------------+
| Explain String                                                                
          |
+-----------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.95MB Threads=5                    
          |
| Per-Host Resource Estimates: Memory=66MB                                      
          |
|                                                                               
          |
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
          |
| |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
thread-reservation=1          |
| PLAN-ROOT SINK                                                                
          |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                    
          |
| |                                                                             
          |
| 04:EXCHANGE [UNPARTITIONED]                                                   
          |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                    
          |
| |  tuple-ids=0,1N row-size=94B cardinality=49334767023                        
          |
| |  in pipelines: 00(GETNEXT)                                                  
          |
| |                                                                             
          |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                
          |
| Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.95MB 
thread-reservation=2    |
| 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                     
          |
| |  hash predicates: l.c1 = r.c1, l.c2 = r.c2, substr(l.c3, 1, 6) = 
substr(r.c3, 1, 6)   |
| |  fk/pk conjuncts: none                                                      
          |
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
thread-reservation=0 |
| |  tuple-ids=0,1N row-size=94B cardinality=49334767023                        
          |
| |  in pipelines: 00(GETNEXT), 01(OPEN)                                        
          |
| |                                                                             
          |
| |--03:EXCHANGE [BROADCAST]                                                    
          |
| |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0                 
          |
| |  |  tuple-ids=1 row-size=47B cardinality=2510                               
          |
| |  |  in pipelines: 01(GETNEXT)                                               
          |
| |  |                                                                          
          |
| |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                             
          |
| |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
thread-reservation=2 |
| |  01:SCAN HDFS [cdh72724.test_right r, RANDOM]                               
          |
| |     partitions=1/1 files=1 size=8B                                          
          |
| |     stored statistics:                                                      
          |
| |       table: rows=2510 size=8B                                              
          |
| |       columns: all                                                          
          |
| |     extrapolated-rows=disabled max-scan-range-rows=2510                     
          |
| |     mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1        
          |
| |     tuple-ids=1 row-size=47B cardinality=2510                               
          |
| |     in pipelines: 01(GETNEXT)                                               
          |
| |                                                                             
          |
| 00:SCAN HDFS [cdh72724.test_left l, RANDOM]                                   
          |
|    partitions=1/1 files=1 size=8B                                             
          |
|    stored statistics:                                                         
          |
|      table: rows=589658570 size=8B                                            
          |
|      columns: all                                                             
          |
|    extrapolated-rows=disabled max-scan-range-rows=589658570                   
          |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1           
          |
|    tuple-ids=0 row-size=47B cardinality=589658570                             
          |
|    in pipelines: 00(GETNEXT)                                                  
          |
+-----------------------------------------------------------------------------------------+
Fetched 48 row(s) in 0.02s
{noformat}

We should either consider NDVs of complex conjuncts (error-prone for obvious 
reasons) or generally assume a join to be FK/PK until and unless proven 
otherwise.

Here is the plan if the complex conjunct is relaxed to a simple <slotRef> = 
<slotRef> type. We can see that it detects the join to be of fk/pk type.

{noformat}
[localhost:21000] cdh72724> explain select * from test_left l left outer join 
test_right r on l.c1 = r.c1 and l.c2 = r.c2 and l.c3 = r.c3;
Query: explain select * from test_left l left outer join test_right r on l.c1 = 
r.c1 and l.c2 = r.c2 and l.c3 = r.c3
+-----------------------------------------------------------------------------------------+
| Explain String                                                                
          |
+-----------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.95MB Threads=5                    
          |
| Per-Host Resource Estimates: Memory=66MB                                      
          |
|                                                                               
          |
| F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
          |
| |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
thread-reservation=1          |
| PLAN-ROOT SINK                                                                
          |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                    
          |
| |                                                                             
          |
| 04:EXCHANGE [UNPARTITIONED]                                                   
          |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                    
          |
| |  tuple-ids=0,1N row-size=94B cardinality=589658570                          
          |
| |  in pipelines: 00(GETNEXT)                                                  
          |
| |                                                                             
          |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                
          |
| Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.95MB 
thread-reservation=2    |
| 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                     
          |
| |  hash predicates: l.c1 = r.c1, l.c2 = r.c2, l.c3 = r.c3                     
          |
| |  fk/pk conjuncts: l.c1 = r.c1, l.c2 = r.c2, l.c3 = r.c3                     
          |
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
thread-reservation=0 |
| |  tuple-ids=0,1N row-size=94B cardinality=589658570                          
          |
| |  in pipelines: 00(GETNEXT), 01(OPEN)                                        
          |
| |                                                                             
          |
| |--03:EXCHANGE [BROADCAST]                                                    
          |
| |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0                 
          |
| |  |  tuple-ids=1 row-size=47B cardinality=2510                               
          |
| |  |  in pipelines: 01(GETNEXT)                                               
          |
| |  |                                                                          
          |
| |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                             
          |
| |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
thread-reservation=2 |
| |  01:SCAN HDFS [cdh72724.test_right r, RANDOM]                               
          |
| |     partitions=1/1 files=1 size=8B                                          
          |
| |     stored statistics:                                                      
          |
| |       table: rows=2510 size=8B                                              
          |
| |       columns: all                                                          
          |
| |     extrapolated-rows=disabled max-scan-range-rows=2510                     
          |
| |     mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1        
          |
| |     tuple-ids=1 row-size=47B cardinality=2510                               
          |
| |     in pipelines: 01(GETNEXT)                                               
          |
| |                                                                             
          |
| 00:SCAN HDFS [cdh72724.test_left l, RANDOM]                                   
          |
|    partitions=1/1 files=1 size=8B                                             
          |
|    stored statistics:                                                         
          |
|      table: rows=589658570 size=8B                                            
          |
|      columns: all                                                             
          |
|    extrapolated-rows=disabled max-scan-range-rows=589658570                   
          |
|    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1           
          |
|    tuple-ids=0 row-size=47B cardinality=589658570                             
          |
|    in pipelines: 00(GETNEXT)                                                  
          |
+-----------------------------------------------------------------------------------------+
Fetched 48 row(s) in 0.01s
{noformat}


> Conservative FK/PK join type detection with complex equi-join conjuncts
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-7564
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7564
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.12.0, Impala 2.13.0, Impala 3.1.0
>            Reporter: bharath v
>            Priority: Major
>
> With IMPALA-5547, we predict whether a join is an FK/PK join as follows.
> {noformat}
>  // Iterate over all groups of conjuncts that belong to the same joined tuple 
> id pair.
>     // For each group, we compute the join NDV of the rhs slots and compare 
> it to the
>     // number of rows in the rhs table.
>     for (List<EqJoinConjunctScanSlots> fkPkCandidate: 
> scanSlotsByJoinedTids.values()) {
>       double jointNdv = 1.0;
>       for (EqJoinConjunctScanSlots slots: fkPkCandidate) jointNdv *= 
> slots.rhsNdv();
>       double rhsNumRows = fkPkCandidate.get(0).rhsNumRows();
>       if (jointNdv >= Math.round(rhsNumRows * (1.0 - 
> FK_PK_MAX_STATS_DELTA_PERC))) {
>         // We cannot disprove that the RHS is a PK.
>         if (result == null) result = Lists.newArrayList();
>         result.addAll(fkPkCandidate);
>       }
>     }
> {noformat}
> We iterate through all the "simple" equi join conjuncts on the RHS, multiply 
> their NDVs and check if it close to rhsNumRows. The issue here is that this 
> can result in conservative FK/Pk detection if the equi-join conjuncts are not 
> simple (of the form <slotRef> = <slotRef>)
> {noformat}
> /**
>      * Returns a new EqJoinConjunctScanSlots for the given equi-join conjunct 
> or null if
>      * the given conjunct is not of the form <SlotRef> = <SlotRef> or if the 
> underlying
>      * table/column of at least one side is missing stats.
>      */
>     public static EqJoinConjunctScanSlots create(Expr eqJoinConjunct) {
>       if (!Expr.IS_EQ_BINARY_PREDICATE.apply(eqJoinConjunct)) return null;
>       SlotDescriptor lhsScanSlot = 
> eqJoinConjunct.getChild(0).findSrcScanSlot();
>       if (lhsScanSlot == null || !hasNumRowsAndNdvStats(lhsScanSlot)) return 
> null;
>       SlotDescriptor rhsScanSlot = 
> eqJoinConjunct.getChild(1).findSrcScanSlot();
> {noformat}
> For example, the following query contains a complex equi-join conjunct 
> {{substr(l.c3, 1, 6) = substr(r.c3, 1,6)}}, so while detecting if the left 
> outer join is an FK/PK, we just check if 
> {{NDVs(r.c1) * NDVs(r.c2) ~ r.numRows()}} which is incorrect. (This happens 
> because EqJoinConjunctScanSlots.create() returns null for any non-simple 
> predicates which not considered later).
> {noformat}
> [localhost:21000]> explain select * from test_left l left outer join 
> test_right r on l.c1 = r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = 
> substr(r.c3, 1,6);
> Query: explain select * from test_left l left outer join test_right r on l.c1 
> = r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = substr(r.c3, 1,6)
> +-----------------------------------------------------------------------------------------+
> | Explain String                                                              
>             |
> +-----------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=5                  
>             |
> | Per-Host Resource Estimates: Memory=66MB                                    
>             |
> |                                                                             
>             |
> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                       
>             |
> | |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
> thread-reservation=1          |
> | PLAN-ROOT SINK                                                              
>             |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
>             |
> | |                                                                           
>             |
> | 04:EXCHANGE [UNPARTITIONED]                                                 
>             |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
>             |
> | |  tuple-ids=0,1N row-size=94B cardinality=49334767023                      
>             |
> | |  in pipelines: 00(GETNEXT)                                                
>             |
> | |                                                                           
>             |
> | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                              
>             |
> | Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.95MB 
> thread-reservation=2    |
> | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                   
>             |
> | |  hash predicates: l.c1 = r.c1, l.c2 = r.c2, substr(l.c3, 1, 6) = 
> substr(r.c3, 1, 6)   |
> | |  fk/pk conjuncts: none                                                    
>             |
> | |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0 |
> | |  tuple-ids=0,1N row-size=94B cardinality=49334767023                      
>             |
> | |  in pipelines: 00(GETNEXT), 01(OPEN)                                      
>             |
> | |                                                                           
>             |
> | |--03:EXCHANGE [BROADCAST]                                                  
>             |
> | |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0               
>             |
> | |  |  tuple-ids=1 row-size=47B cardinality=2510                             
>             |
> | |  |  in pipelines: 01(GETNEXT)                                             
>             |
> | |  |                                                                        
>             |
> | |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                           
>             |
> | |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
> thread-reservation=2 |
> | |  01:SCAN HDFS [cdh72724.test_right r, RANDOM]                             
>             |
> | |     partitions=1/1 files=1 size=8B                                        
>             |
> | |     stored statistics:                                                    
>             |
> | |       table: rows=2510 size=8B                                            
>             |
> | |       columns: all                                                        
>             |
> | |     extrapolated-rows=disabled max-scan-range-rows=2510                   
>             |
> | |     mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1      
>             |
> | |     tuple-ids=1 row-size=47B cardinality=2510                             
>             |
> | |     in pipelines: 01(GETNEXT)                                             
>             |
> | |                                                                           
>             |
> | 00:SCAN HDFS [cdh72724.test_left l, RANDOM]                                 
>             |
> |    partitions=1/1 files=1 size=8B                                           
>             |
> |    stored statistics:                                                       
>             |
> |      table: rows=589658570 size=8B                                          
>             |
> |      columns: all                                                           
>             |
> |    extrapolated-rows=disabled max-scan-range-rows=589658570                 
>             |
> |    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         
>             |
> |    tuple-ids=0 row-size=47B cardinality=589658570                           
>             |
> |    in pipelines: 00(GETNEXT)                                                
>             |
> +-----------------------------------------------------------------------------------------+
> Fetched 48 row(s) in 0.02s
> {noformat}
> We should either consider NDVs of complex conjuncts (error-prone for obvious 
> reasons) or generally assume a join to be FK/PK until and unless proven 
> otherwise.
> Here is the plan if the complex conjunct is relaxed to a simple <slotRef> = 
> <slotRef> type. We can see that it detects the join to be of fk/pk type.
> {noformat}
> [localhost:21000] cdh72724> explain select * from test_left l left outer join 
> test_right r on l.c1 = r.c1 and l.c2 = r.c2 and l.c3 = r.c3;
> Query: explain select * from test_left l left outer join test_right r on l.c1 
> = r.c1 and l.c2 = r.c2 and l.c3 = r.c3
> +-----------------------------------------------------------------------------------------+
> | Explain String                                                              
>             |
> +-----------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=5                  
>             |
> | Per-Host Resource Estimates: Memory=66MB                                    
>             |
> |                                                                             
>             |
> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                       
>             |
> | |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
> thread-reservation=1          |
> | PLAN-ROOT SINK                                                              
>             |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
>             |
> | |                                                                           
>             |
> | 04:EXCHANGE [UNPARTITIONED]                                                 
>             |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
>             |
> | |  tuple-ids=0,1N row-size=94B cardinality=589658570                        
>             |
> | |  in pipelines: 00(GETNEXT)                                                
>             |
> | |                                                                           
>             |
> | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                              
>             |
> | Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.95MB 
> thread-reservation=2    |
> | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST]                                   
>             |
> | |  hash predicates: l.c1 = r.c1, l.c2 = r.c2, l.c3 = r.c3                   
>             |
> | |  fk/pk conjuncts: l.c1 = r.c1, l.c2 = r.c2, l.c3 = r.c3                   
>             |
> | |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB 
> thread-reservation=0 |
> | |  tuple-ids=0,1N row-size=94B cardinality=589658570                        
>             |
> | |  in pipelines: 00(GETNEXT), 01(OPEN)                                      
>             |
> | |                                                                           
>             |
> | |--03:EXCHANGE [BROADCAST]                                                  
>             |
> | |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0               
>             |
> | |  |  tuple-ids=1 row-size=47B cardinality=2510                             
>             |
> | |  |  in pipelines: 01(GETNEXT)                                             
>             |
> | |  |                                                                        
>             |
> | |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                           
>             |
> | |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
> thread-reservation=2 |
> | |  01:SCAN HDFS [cdh72724.test_right r, RANDOM]                             
>             |
> | |     partitions=1/1 files=1 size=8B                                        
>             |
> | |     stored statistics:                                                    
>             |
> | |       table: rows=2510 size=8B                                            
>             |
> | |       columns: all                                                        
>             |
> | |     extrapolated-rows=disabled max-scan-range-rows=2510                   
>             |
> | |     mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1      
>             |
> | |     tuple-ids=1 row-size=47B cardinality=2510                             
>             |
> | |     in pipelines: 01(GETNEXT)                                             
>             |
> | |                                                                           
>             |
> | 00:SCAN HDFS [cdh72724.test_left l, RANDOM]                                 
>             |
> |    partitions=1/1 files=1 size=8B                                           
>             |
> |    stored statistics:                                                       
>             |
> |      table: rows=589658570 size=8B                                          
>             |
> |      columns: all                                                           
>             |
> |    extrapolated-rows=disabled max-scan-range-rows=589658570                 
>             |
> |    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1         
>             |
> |    tuple-ids=0 row-size=47B cardinality=589658570                           
>             |
> |    in pipelines: 00(GETNEXT)                                                
>             |
> +-----------------------------------------------------------------------------------------+
> Fetched 48 row(s) in 0.01s
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to