[ 
https://issues.apache.org/jira/browse/IMPALA-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730053#comment-16730053
 ] 

Paul Rogers commented on IMPALA-8015:
-------------------------------------

Here's an example, also from {{PlannerTest.testJoins()}}, that estimates a join 
incorrectly on the small side: 53 instead of 11K:
{noformat}
# hbase-hdfs join
select *
from functional.alltypesagg join functional_hbase.alltypessmall using (id, 
int_col)
---- PLAN
PLAN-ROOT SINK
|
02:HASH JOIN [INNER JOIN]
|  hash predicates: functional.alltypesagg.id = 
functional_hbase.alltypessmall.id, functional.alltypesagg.int_col = 
functional_hbase.alltypessmall.int_col
|  runtime filters: RF000 <- functional_hbase.alltypessmall.id, RF001 <- 
functional_hbase.alltypessmall.int_col
|  row-size=184B cardinality=53
|
|--01:SCAN HBASE [functional_hbase.alltypessmall]
|     row-size=89B cardinality=50
|
00:SCAN HDFS [functional.alltypesagg]
   partitions=11/11 files=11 size=814.73KB row-size=95B cardinality=11.00K
   runtime filters: RF000 -> functional.alltypesagg.id, RF001 -> 
functional.alltypesagg.int_col
{noformat}
We read 11K rows on the probe side, joined with 53 rows on the build side. The 
id column is unique (a PK), adding the int_col makes it "more unique." So, each 
of the 11K probe rows joins with (at most) one row on the build side. The 
correct output cardinality is 11K, not 53.

> Incorrect cardinality calculation for the generic case
> ------------------------------------------------------
>
>                 Key: IMPALA-8015
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8015
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> Please see the background information in IMPALA-8014, including notation. The 
> material here is a version of that in [these 
> note|https://github.com/paul-rogers/impala/wiki/Planner].
> h4. Background
> Impala uses two distinct ways to estimate join cardinality: FK/PK and the 
> "generic" case. Both are in {{JoinNode.getJoinCardinality()}}.
> The generic case handles the M:N case, multiple rows in the left table join 
> with multiple rows in the right table. The {{getGenericJoinCardinality()}} 
> works out the estimated cardinality.
> h4. Deriving the Cardinality
> Assume a join of two tables, left ({{l}}) and right ({{r}}), with no 
> predicate. We have a Cartesian product:
> {noformat}
> |l >< r| = |l| * |r|
> {noformat}
> Suppose we have an equi-join predicate: {{l.a = r.b}} and that we have the 
> NDV values for both columns.
> Lets assume the left is the probe side. How many rows will it match? It will 
> match all the rows with a given value, or {{|r 𝜎 b=x|}}, (which we'll 
> abbreviate to {{|r.b=x|}} which is:
> {noformat}
> |r.b=x| = |r| / ndv(r.b)
> {noformat}
> Intuitively, the {{r.b}} keys are uniformly distributed, each left-hand row 
> matches a group of right-hand rows the size of which is given by the equation 
> above. If all values in r are unique, then {{|r| / ndv(r.b)}} is one and we 
> match one row. If all values are the same, {{|r| / ndv(r.b) = |r|}}, so we 
> match all rows (or none).
> The total number of output rows, to a first approximation, is:
> {noformat}
> |l >< r| = |l| * |r| / ndv(r.b)
> {noformat}
> Next, how many rows will appear on the left (probe) side? It is the output of 
> the previous join, or the scan of the table in that node, thus {{scan(l)}}.
> How many rows on the right? Impala requires that the right side be a table, 
> so the number is {{|scan(r)|}}. (Actually, either side can be a table scan, 
> but the reasoning is symmetrical.) If we randomly sample the {{r}} table to 
> get {{scan(r)}}, then the probability of getting any one row from r is:
> {noformat}
> p(scan(r) : row = x) = |scan(r)| / |r|
> {noformat}
> (Read this as the probability or a row x appearing in the scan result of r.)
> Let's check the math. Using the above, the total number of scanned rows is:
> {noformat}
> |scan(r)| = |r| * p(scan(r) : row = x)
>           = |r| * |scan(r)| / |r|
>           = |scan(r)|
> {noformat}
> The total number of keys in each group in the scanned result is the group 
> size adjusted for the probability that any one group member appears:
> {noformat}
> |scan(r.b)| = |r| / ndv(r.b) * p(r.b)
>             = |r| / ndv(r.b) * |scan(r)| / |r|
>             = |scan(r)| / ndv(r.b)
> {noformat}
> Intuitively, however may rows are scanned, they are still divided into the 
> same set of groups.
> Next we can work out the total join cardinality given the number of rows on 
> the left, and the number of matching rows on the right:
> {noformat}
> |l >< r| = |scan(l)| * |scan(r.b)|
>          = |scan(l)| * |scan(r)| / ndv(r.b)
> {noformat}
> The above assumes that all rows from l match rows in r. But, if there are 
> more key values in l than r, some can't match, and visa-versa. What is the 
> probability that a row from l will find a match? We can reason it out like 
> this:
> * If there are fewer values in l.a than in r.b, we can assume all will match.
> * If there are more values in l.a than in r.b, we can assume we'll match all 
> values in l.a, then discard the extra values that don't match.
> The details are already worked out in IMPALA-8014. The probability is:
> {noformat}
> p(match) = ndv(l.a) / ndv(r.b) if ndv(l.a) < ndv(r.b),
>            ndv(r.b) / ndv(l.a) otherwise.
> {noformat}
> Intuitively, if there are more l rows than r, then a match is the ratio 
> between them and visa-versa.
> Let's check. If the ndv's are equal, the probability of a match is 1. If 
> either table is empty, the probability is 0. If ndv(l.a) is half that of 
> ndv(l.b) the probability is 0.5 and visa versa. All good.
> Putting it all together:
> {noformat}
> |l >< r| = |scan(l)| * |scan(r)| / ndv(r.b) * p(match)
> {noformat}
> For the case that {{ndv(l.a) > ndv(r.b)}}:
> {noformat}
> |l >< r| = |scan(l)| * (|scan(r)| / ndv(r.b)) * (ndv(r.b) / ndv(l.a))
>          = |scan(l)| * |scan(r)| / ndv(l.a)
>          = (|scan(l)| / ndv(l.a)) * |scan(r)|
> {noformat}
> Think if this as the inverse of the above: every row in r matches the set of 
> rows in l with the same key value.
> h4. Code Bug
> The code uses the following:
> {code:java}
>       double lhsAdjNdv = slots.lhsNdv();
>       if (slots.lhsNumRows() > lhsCard) lhsAdjNdv *= lhsCard / 
> slots.lhsNumRows();
>       double rhsAdjNdv = slots.rhsNdv();
>       if (slots.rhsNumRows() > rhsCard) rhsAdjNdv *= rhsCard / 
> slots.rhsNumRows();
>       long joinCard = Math.round((lhsCard / Math.max(1, Math.max(lhsAdjNdv, 
> rhsAdjNdv))) *
>           rhsCard);
> {code}
> Where:
> * {{lhsCard}} is the output cardinality of the l (previous join) table or 
> {{|l|}}
> * {{rhsCard}} is the output cardinality of the r (scanned) table or 
> {{|scan(r)|}}
> * {slots.lhsNdv()}} is the NDV of the left key or {{ndv(l.a)}}
> * {slots.lnsNumRows()}} is the cardinality of the LHS table, or {{|l|}}
> * {slots.rhsNdv()}} is the NDV of the right key or {{ndv(r.a)}}
> * {slots.rnsNumRows()}} is the cardinality of the LHS table, or {{|r|}}
> Translated to our notation:
> {noformat}
> adjNdv(l.a) = ndv(l.a) * ( |scan(l)| / |l| if |l| > |scan(l)|, 1 otherwise )
> adjNdv(r.b) = ndv(r.b) * ( |scan(r)| / |r| if |r| > |scan(r)|, 1 otherwise )
> |join| = |scan(l)| * |scan(r)| / max( adjNdv(l.a), adjNdv(r.b) )
> {noformat}
> Since {{|scan\(x)| <= |x|}} we can reduce the above to:
> {noformat}
> adjNdv(l.a) = ndv(l.a) * |scan(l)| / |l|
> adjNdv(r.b) = ndv(r.b) * |scan(r)| / |r|
> |join| = |scan(l)| * |scan(r)| / max( adjNdv(l.a), adjNdv(r.b) )
> {noformat}
> For one case:
> {noformat}
> |join| = |scan(l)| * |scan(r)| / adjNdv(l.a)
>        = |scan(l)| * |scan(r)| / (ndv(l.a) * |scan(l)| / |l|)
>        = |scan(l)| * |scan(r)| * |l| / (ndv(l.a) * |scan(l)|)
>        = |scan(l)| * |scan(r)| * (|l| / |scan(l)|) / ndv(l.a)
> {noformat}
> Comparing the two results
> {noformat}
> |l >< r|  = |scan(l)| * |scan(r)| / ndv(l.a)
>          != |scan(l)| * |scan(r)| * (|l| / |scan(l)|) / ndv(l.a)
> {noformat}
> The only way we get the code's answer is if we assume {{|l| / |scan(r)| = 1}} 
> which seems unlikely. That is, the code applies an unnecessary multiplication 
> factor that increases as the ratio of scanned rows decreases.
> So, a bug. 
> h4. Worked Example
> Let's check.
> * {{|l| = 10000}}
> * {{|r| = 100}}
> * {{scan(l) = 3000}}
> * {{scan(r) = 50}}
> * {{ndv(l.a) = 40}}
> * {{ndv(r.a) = 20}}
> Let's work this out intuitively first. The join is the normal cross join with 
> adjustments:
> {noformat}
> |l >< r| = |l| * |r| * adjustments
> {noformat}
> Adjustments:
> * The left side scan returns only 30% of the total rows. 
> * The left hand has twice the key values of the right, so we have to discard 
> half of the left rows.
> * Of the l keys with a match in r, it will match, on average, 100/20 = 5 rows.
> * But half of r is filtered out, reducing the available matching rows.
> And:
> {noformat}
> |l >< r| = (|l| / 2) * (|r| / 20 / 2)
>          = 10000 * 30% / 2 * 100 * / 20 / 2
>          = 1500 * 2.5
>          = 3750
> {noformat}
> Using the formula derived above:
> {noformat}
> p(match) = ndv(r.b) / ndv(l.a) if ndv(r.b) < ndv(l.a),
>            ndv(l.a) / ndv(r.b) otherwise
> |l >< r| = |scan(l)| * |scan(r)| * (|r| / ndv(r.b)) * p(match)
>          = |scan(l)| * |scan(r)| * (|r| / ndv(r.b)) * (ndv(r.b) / ndv(l.a))
>          = |scan(l)| * |scan(r)| * |r| / ndv(l.a)
>          = 3000 * 50 * 100 / 40
>          = 3750
> {noformat}
> Which all works out.
> Now let's check the code's formula:
> {noformat}
> |l >< r| = |scan(l)| * |scan(r)| * (|l| / |scan(l)|) / ndv(l.a)
>          = (|scan(l)| / |scan(l)|) * |scan(r)| * |l| / ndv(l.a)
>          = |scan(r)| * |l| / ndv(l.a)
>          = 50 * 10,000 / 50
>          = 10,000
> {noformat}
> Which does not quite work out, verifying the bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to