[ 
https://issues.apache.org/jira/browse/IMPALA-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730018#comment-16730018
 ] 

Paul Rogers commented on IMPALA-8015:
-------------------------------------

Here is an example from {{PlannerTest.testJoins()}} that illustrates the 
incorrect math. This uses the cardinality estimates added in IMPALA-8021

{noformat}
# Tests that the partitioned join between a and b exploits the existing
# data partition of its rhs input.
select * from functional.alltypes a
inner join [shuffle]
  (select count(*), int_col, bool_col
   from functional.alltypes group by int_col, bool_col) b
on (a.int_col = b.int_col and b.bool_col = a.bool_col)
---- PLAN
...
03:HASH JOIN [INNER JOIN]
|  hash predicates: a.bool_col = bool_col, a.int_col = int_col
|  runtime filters: RF000 <- bool_col, RF001 <- int_col
|  row-size=102B cardinality=14.60K
|
|--02:AGGREGATE [FINALIZE]
|  |  output: count(*)
|  |  group by: int_col, bool_col
|  |  row-size=13B cardinality=20
|  |
|  01:SCAN HDFS [functional.alltypes]
|     partitions=24/24 files=24 size=478.45KB row-size=5B cardinality=7.30K
{noformat}

The probe table has 7K rows. The AGGREGATION node has 20 rows. We know its keys 
must be unique, by definition. We then join on these unique keys. The most rows 
that could pass through the join is all 7K. Yet, the hash join cardinality 
estimate is twice that: 14K.

> Incorrect cardinality calculation for the generic case
> ------------------------------------------------------
>
>                 Key: IMPALA-8015
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8015
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Please see the background information in IMPALA-8014, including notation. The 
> material here is a version of that in [these 
> note|https://github.com/paul-rogers/impala/wiki/Planner].
> h4. Background
> Impala uses two distinct ways to estimate join cardinality: FK/PK and the 
> "generic" case. Both are in {{JoinNode.getJoinCardinality()}}.
> The generic case handles the M:N case, multiple rows in the left table join 
> with multiple rows in the right table. The {{getGenericJoinCardinality()}} 
> works out the estimated cardinality.
> h4. Deriving the Cardinality
> Assume a join of two tables, left ({{l}}) and right ({{r}}), with no 
> predicate. We have a Cartesian product:
> {noformat}
> |l >< r| = |l| * |r|
> {noformat}
> Suppose we have an equi-join predicate: {{l.a = r.b}} and that we have the 
> NDV values for both columns.
> Lets assume the left is the probe side. How many rows will it match? It will 
> match all the rows with a given value, or {{|r 𝜎 b=x|}}, (which we'll 
> abbreviate to {{|r.b=x|}} which is:
> {noformat}
> |r.b=x| = |r| / ndv(r.b)
> {noformat}
> Intuitively, the {{r.b}} keys are uniformly distributed, each left-hand row 
> matches a group of right-hand rows the size of which is given by the equation 
> above. If all values in r are unique, then {{|r| / ndv(r.b)}} is one and we 
> match one row. If all values are the same, {{|r| / ndv(r.b) = |r|}}, so we 
> match all rows (or none).
> The total number of output rows, to a first approximation, is:
> {noformat}
> |l >< r| = |l| * |r| / ndv(r.b)
> {noformat}
> Next, how many rows will appear on the left (probe) side? It is the output of 
> the previous join, or the scan of the table in that node, thus {{scan(l)}}.
> How many rows on the right? Impala requires that the right side be a table, 
> so the number is {{|scan(r)|}}. (Actually, either side can be a table scan, 
> but the reasoning is symmetrical.) If we randomly sample the {{r}} table to 
> get {{scan(r)}}, then the probability of getting any one row from r is:
> {noformat}
> p(scan(r) : row = x) = |scan(r)| / |r|
> {noformat}
> (Read this as the probability or a row x appearing in the scan result of r.)
> Let's check the math. Using the above, the total number of scanned rows is:
> {noformat}
> |scan(r)| = |r| * p(scan(r) : row = x)
>           = |r| * |scan(r)| / |r|
>           = |scan(r)|
> {noformat}
> The total number of keys in each group in the scanned result is the group 
> size adjusted for the probability that any one group member appears:
> {noformat}
> |scan(r.b)| = |r| / ndv(r.b) * p(r.b)
>             = |r| / ndv(r.b) * |scan(r)| / |r|
>             = |scan(r)| / ndv(r.b)
> {noformat}
> Intuitively, however may rows are scanned, they are still divided into the 
> same set of groups.
> Next we can work out the total join cardinality given the number of rows on 
> the left, and the number of matching rows on the right:
> {noformat}
> |l >< r| = |scan(l)| * |scan(r.b)|
>          = |scan(l)| * |scan(r)| / ndv(r.b)
> {noformat}
> The above assumes that all rows from l match rows in r. But, if there are 
> more key values in l than r, some can't match, and visa-versa. What is the 
> probability that a row from l will find a match? We can reason it out like 
> this:
> * If there are fewer values in l.a than in r.b, we can assume all will match.
> * If there are more values in l.a than in r.b, we can assume we'll match all 
> values in l.a, then discard the extra values that don't match.
> The details are already worked out in IMPALA-8014. The probability is:
> {noformat}
> p(match) = ndv(l.a) / ndv(r.b) if ndv(l.a) < ndv(r.b),
>            ndv(r.b) / ndv(l.a) otherwise.
> {noformat}
> Intuitively, if there are more l rows than r, then a match is the ratio 
> between them and visa-versa.
> Let's check. If the ndv's are equal, the probability of a match is 1. If 
> either table is empty, the probability is 0. If ndv(l.a) is half that of 
> ndv(l.b) the probability is 0.5 and visa versa. All good.
> Putting it all together:
> {noformat}
> |l >< r| = |scan(l)| * |scan(r)| / ndv(r.b) * p(match)
> {noformat}
> For the case that {{ndv(l.a) > ndv(r.b)}}:
> {noformat}
> |l >< r| = |scan(l)| * (|scan(r)| / ndv(r.b)) * (ndv(r.b) / ndv(l.a))
>          = |scan(l)| * |scan(r)| / ndv(l.a)
>          = (|scan(l)| / ndv(l.a)) * |scan(r)|
> {noformat}
> Think if this as the inverse of the above: every row in r matches the set of 
> rows in l with the same key value.
> h4. Code Bug
> The code uses the following:
> {code:java}
>       double lhsAdjNdv = slots.lhsNdv();
>       if (slots.lhsNumRows() > lhsCard) lhsAdjNdv *= lhsCard / 
> slots.lhsNumRows();
>       double rhsAdjNdv = slots.rhsNdv();
>       if (slots.rhsNumRows() > rhsCard) rhsAdjNdv *= rhsCard / 
> slots.rhsNumRows();
>       long joinCard = Math.round((lhsCard / Math.max(1, Math.max(lhsAdjNdv, 
> rhsAdjNdv))) *
>           rhsCard);
> {code}
> Where:
> * {{lhsCard}} is the output cardinality of the l (previous join) table or 
> {{|l|}}
> * {{rhsCard}} is the output cardinality of the r (scanned) table or 
> {{|scan(r)|}}
> * {slots.lhsNdv()}} is the NDV of the left key or {{ndv(l.a)}}
> * {slots.lnsNumRows()}} is the cardinality of the LHS table, or {{|l|}}
> * {slots.rhsNdv()}} is the NDV of the right key or {{ndv(r.a)}}
> * {slots.rnsNumRows()}} is the cardinality of the LHS table, or {{|r|}}
> Translated to our notation:
> {noformat}
> adjNdv(l.a) = ndv(l.a) * ( |scan(l)| / |l| if |l| > |scan(l)|, 1 otherwise )
> adjNdv(r.b) = ndv(r.b) * ( |scan(r)| / |r| if |r| > |scan(r)|, 1 otherwise )
> |join| = |scan(l)| * |scan(r)| / max( adjNdv(l.a), adjNdv(r.b) )
> {noformat}
> Since {{|scan\(x)| <= |x|}} we can reduce the above to:
> {noformat}
> adjNdv(l.a) = ndv(l.a) * |scan(l)| / |l|
> adjNdv(r.b) = ndv(r.b) * |scan(r)| / |r|
> |join| = |scan(l)| * |scan(r)| / max( adjNdv(l.a), adjNdv(r.b) )
> {noformat}
> For one case:
> {noformat}
> |join| = |scan(l)| * |scan(r)| / adjNdv(l.a)
>        = |scan(l)| * |scan(r)| / (ndv(l.a) * |scan(l)| / |l|)
>        = |scan(l)| * |scan(r)| * |l| / (ndv(l.a) * |scan(l)|)
>        = |scan(l)| * |scan(r)| * (|l| / |scan(l)|) / ndv(l.a)
> {noformat}
> Comparing the two results
> {noformat}
> |l >< r|  = |scan(l)| * |scan(r)| / ndv(l.a)
>          != |scan(l)| * |scan(r)| * (|l| / |scan(l)|) / ndv(l.a)
> {noformat}
> The only way we get the code's answer is if we assume {{|l| / |scan(r)| = 1}} 
> which seems unlikely. That is, the code applies an unnecessary multiplication 
> factor that increases as the ratio of scanned rows decreases.
> So, a bug. 
> h4. Worked Example
> Let's check.
> * {{|l| = 10000}}
> * {{|r| = 100}}
> * {{scan(l) = 3000}}
> * {{scan(r) = 50}}
> * {{ndv(l.a) = 40}}
> * {{ndv(r.a) = 20}}
> Let's work this out intuitively first. The join is the normal cross join with 
> adjustments:
> {noformat}
> |l >< r| = |l| * |r| * adjustments
> {noformat}
> Adjustments:
> * The left side scan returns only 30% of the total rows. 
> * The left hand has twice the key values of the right, so we have to discard 
> half of the left rows.
> * Of the l keys with a match in r, it will match, on average, 100/20 = 5 rows.
> * But half of r is filtered out, reducing the available matching rows.
> And:
> {noformat}
> |l >< r| = (|l| / 2) * (|r| / 20 / 2)
>          = 10000 * 30% / 2 * 100 * / 20 / 2
>          = 1500 * 2.5
>          = 3750
> {noformat}
> Using the formula derived above:
> {noformat}
> p(match) = ndv(r.b) / ndv(l.a) if ndv(r.b) < ndv(l.a),
>            ndv(l.a) / ndv(r.b) otherwise
> |l >< r| = |scan(l)| * |scan(r)| * (|r| / ndv(r.b)) * p(match)
>          = |scan(l)| * |scan(r)| * (|r| / ndv(r.b)) * (ndv(r.b) / ndv(l.a))
>          = |scan(l)| * |scan(r)| * |r| / ndv(l.a)
>          = 3000 * 50 * 100 / 40
>          = 3750
> {noformat}
> Which all works out.
> Now let's check the code's formula:
> {noformat}
> |l >< r| = |scan(l)| * |scan(r)| * (|l| / |scan(l)|) / ndv(l.a)
>          = (|scan(l)| / |scan(l)|) * |scan(r)| * |l| / ndv(l.a)
>          = |scan(r)| * |l| / ndv(l.a)
>          = 50 * 10,000 / 50
>          = 10,000
> {noformat}
> Which does not quite work out, verifying the bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to