[ 
https://issues.apache.org/jira/browse/IMPALA-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917317#comment-16917317
 ] 

Paul Rogers commented on IMPALA-8014:
-------------------------------------

Hi [~tarmstrong]. The change is somewhat isolated in the join calculation. As a 
result, you can include an if-statement to include the old or new calculations. 
However, as work proceeded, I discovered a widening net of errors: wrong 
cardinality estimates, failure to correct for correlated filters, etc. Still, 
these can be fixed one by one, each wrapped in an if-statement. (The problem 
is, at the end, you've got 2 * 2 * 2 cases, which means 8 different copies of 
the planer test Golden files.)

The work itself should be in my Github repo. There were also some patches with 
some of the fixes. I can point you to that work when you are ready to start on 
this.

This work was inspired by a very complex real-world star schema with correlated 
keys. The net result of all the fixes was that the resulting plan was close to 
ideal; much better than the somewhat random plan when the work started. Not 
sure if the artifacts of that work still exist; would have been on my dev. EC2 
instance.

> Incorrect M:1 (FK/PK) cardinality estimation
> --------------------------------------------
>
>                 Key: IMPALA-8014
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8014
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Tim Armstrong
>            Priority: Major
>
> The math is wrong when we estimate the cardinality of a join we've labeled as 
> "FK/PK" (commonly known as many-to-one or M:1.)
> h4. Join Estimation
> The standard calculation for joins is explained in Swami & Schiefer, [On the 
> Estimation of Join Result 
> Sizes|https://pdfs.semanticscholar.org/2735/672262940a23cfce8d4cc1e25c0191de7da7.pdf]
>  (S&S). Note especially section 3, Background. Also see [How Good Are Query 
> Optimizers, Really?|http://www.vldb.org/pvldb/vol9/p204-leis.pdf] by Leis et 
> al.
> {noformat}
>                 |L'| * |R'| 
> |L' ⋈ R'| = ------------------
>             max(|L.k'|, |R.k'|)
> {noformat}
> Here:
> * {{|x|}} is the row cardinality if x is a relation, the NDV (domain 
> cardinality) if x is a column.
> * {{L}}, {{R}} are left and right input relations.
> * {{L.k}} and {{R.k}} are the (possibly compound) join keys.
> * {{x’}} is the result of applying a filter to relation or column {{x}}.
> Intuitively, the cardinality of the join is the Cartesian product reduced by 
> the largest column domain. Since the relations fed into the join are 
> filtered, we are concerned with the new, filtered relations created by the 
> scan nodes.
> Impala, like most planners, makes three basic assumptions (see the first 
> paper above):
> * Uniformity: keys are evenly distributed.
> * Independence: the filtering on the two tables is independent of keys.
> * Containment: that all detail (FK) rows have a corresponding master (FK) row.
> (These are traditional assumptions, but they turn out to be unrealistic in 
> many cases; see the second paper above.)
> h4. M:1 Case
> The planner (unnecessarily) divides join planning into two cases. 
> (IMPALA-8018 describes how the two steps are unnecessary. But, to minimize 
> code change, we live with this process here.)
> * M:1 (AKA "FK/PK", "detail/master", "fact/dimension"). A "foreign key" (FK) 
> in the detail table matches at most one "primary key" (PK) in the master 
> table. ("At most one" because of filtering which may have removed master 
> keys.) Impala assumes that the detail ("FK") table is on the left (probe) 
> side, and the master ("PK") table is on the right ("build") side.
> * M:N (AKA "generic", "many-to-many"). Every key on the left matches 
> potentially many keys on the right.
> If we focus on just the M:1 case we can rename the relations for convenience 
> and observe a simplification:
> {noformat}
> L = D
> R = M
> L.k = M.pk
> R.k = D.fk
> {noformat}
> By definition of M:1:
> {noformat}
> |M.pk'| = |M'|
> {noformat}
> Which gives a revised join expression:
> {noformat}
>                 |D'| * |M'| 
> |D' ⋈ M'| = ------------------
>             max(|D.fk'|, |M'|)
> {noformat}
> The expression above is handy: it shows that we only need three values to 
> compute the join cardinality:
> * {{|D'|}} the output cardinality from the left plan node, which is already 
> available.
> * {{|M'|}} the output cardinality from the right scan node, which is already 
> available. (In Impala, the right node will always be a table scan.)
> * {{|D.fk'|}} the number of foreign key values in the filtered {{D'}} 
> relation. (In relational theory terms, the cardinality of the domain of the 
> foreign key after applying pushed-down selection operations.)
> See IMPALA-XXXX for a necessary adjustment to {{|M'|}} to handle predicates 
> common to both sides.
> We only need to estimate {{|D.fk'|}}. We will work up to this step by step 
> because the intermediate steps help explain the bug in the current code.
> h4. Filtering of Master Table Only
> Let's start with the simplest case, filtering on only the master table:
> {noformat}
> |D.fk'| = |D.fk| <= |M.pk|
> max(|D.fk'|, |M'|) = max(|M.pk|, |M'|) = |M.pk| = |M|
> {noformat}
> So:
> {noformat}   
> |D ⋈ M'| = |D| * |M'| / |M|
> {noformat}
> Intuitively: the probability of any detail row finding a match is simply the 
> selectivity of the master filter, or {{|M'| / |M|}}.
> h4. Filtering on the Foreign Key Column
> The next case is also fairly easy. Suppose we know that the left (detail) 
> scan applied a filter. Impala presently uses an incorrect, but simplified, 
> model for this case. (See IMPALA-XXXX for the correct model.)
> The cardinality of the foreign key is simply the result of applying that 
> filter:
> {noformat}
> |D.fk'| = |D.fk| * sel(f)
> {noformat}
> Let us assume we do not filter the master table in this case, so:
> {noformat}
> |D.fk'| < |D.fk| <= |M.pk|
> max(|D.fk'|, |M|) = |M|
> {noformat}
> And:
> {noformat}
> |D' ⋈ M| = |D'| * |M| / |M| = |D'|
> {noformat}
> We assume "containment" from the S&S paper above: all the foreign keys have a 
> matching primary key if {{|F.fk| <= |M.pk|}}. So, if we filter only on the 
> detail table, all foreign keys find a match and the cardinality of the join 
> is just the cardinality of the left input relation.
> h4. Combined Left and Right Filtering
> Now, let's combine the master and detail filtering cases. In general, we will 
> have filtering on both sides. The {{max}} in the expression automatically 
> combines the cases:
> {noformat}
> |D.fk'| = |D.fk| * |D'| / |D|
>                 |D'| * |M'| 
> |D' ⋈ M'| = ------------------
>             max(|D.fk'|, |M'|)
> {noformat}
> Intuitively, the number of rows is the Cartesian product divided by the 
> larger of the number of keys in either table after the scan. Said another 
> way, each detail row finds a master, unless filtering has removed so many 
> masters that some detail rows find no match, in which case the probability of 
> a match is {{|M'| / |D.fk'|}}.
> h4. Compound Primary Keys
> The final complexity is to consider a compound key. That is:
> {noformat}
> (D.fk1, Dfk2) --> (M.pk1, M.pk2)
> {noformat}
> The foreign key pair points to a matching primary key pair. Here we consider 
> only pairs, but the logic is the same for a compound key with any number of 
> columns. Obviously, by the definition of a join in SQL, the number of keys on 
> each side must be the same.
> If we know (from HMS metadata) that the above pairs are, in fact, the keys, 
> then we can make a simplifying assumption:
> {noformat}
> |(M.pk1, M.pk2)| = |M|
> |(D.fk1, Dfk2)| <= |(M.pk1, M.pk2)| = |M|
> {noformat}
> If we don't know a-priori that a pair (k1, k1) is a foreign or primary key, 
> then we can estimate its cardinality (assuming independence of values) as:
> {noformat}
> |(k1, k2)| = |k1| * |k2|
> {noformat}
> In the M:1 case, the primary key cardinality must be the same as table 
> cardinality. If the two columns are completely independent, then:
> {noformat}
> |(M.pk1, M.pk2)| = |M.pk1| * |M.pk2| = |M|
> {noformat}
> More typically, there is some correlation between the columns so:
> {noformat}
> |(M.pk1, M.pk2)| = |M| <= |M.pk1| * |M.pk2|
> {noformat}
> Indeed, Impala uses the above relation to decide we have the FK/PK case. Said 
> another way, if Impala follows the FK/PK logic, then we can simply assume 
> that {{|M.pk| = |M|}} even if the key is compound.
> h4. Compound Foreign Keys
> Foreign keys require a bit more thought. We could have a detail table with 
> only one row. If we have many detail rows, the assumption of containment says 
> that each detail record points to some master record, so: 
> {noformat}
> |D.fk| <= |M.pk|
> {noformat}
> This lets us estimate the cardinality of a compound foreign key as:
> {noformat}
> |(D.fk1, D.fk2)| = min( |D.fk1| * |D.fk2|, |M| )
> {noformat}
> That is, if the detail table is small, the left term is a reasonable estimate 
> (ignoring the urn model issue). But, as the table gets larger, and begins to 
> include most primary keys, we know that the number of foreign keys can't be 
> larger than the number of primary keys, so the right term is the better 
> estimate.
> h4. Filtering on Compound Keys
> Suppose that the join inputs have filtering applied to the left (detail) 
> table. We discussed how to handle this fo a single column. For a compound 
> key, we can observe that the cardinality of the key is the product of the 
> cardinality of the columns, but (using the containment assumption), no larger 
> than the cardinality of the primary key (which is the cardinality of the 
> master table.)
> So:
> {noformat}
> |D.fk| = min( ∏ |D.fki|, |M| )
>                            
> |D.fk'| = |D.fk| * |D'| / |D|
>         = min( ∏ |D.fki|, |M| ) * |D'| / |D|
> {noformat}
> So the final expression for join cardinality is:
> {noformat}
> |D.fk'| = min( ∏ |D.fki|, |M| ) * |D'| / |D|
>                 |D'| * |M'| 
> |D' ⋈ M'| = -------------------
>             max( |D.fk'|, |M'|)
> {noformat}
> The first expression says that the compound foreign key cardinality is either 
> the product of the columns that make up the key, or the cardinality of the 
> primary key, whichever is less. We then adjust that amount (incorrectly) by 
> the percentage of the detail table that is scanned.
> The second expression is just the Cartesian product divided by the largest 
> key cardinality: either foreign key or primary key (which is, by definition, 
> equal to the cardinality of the master table.) 
> h4. Complexity: Compound Joins
> The above expression works well if the left input to a join is a base table 
> row which we know the original table cardinality that corresponds to the 
> original column NDV. The above can be used as-is in a M:N ("generic") join 
> for a right-side table. But, when when used on the left side, we must recall 
> that Impala builds left-deep join plans, so the left side may be a join. In 
> this case, there is no original left-side base table.
> See IMPALA-XXXX for discussion of the complexities in this case. A quick and 
> dirty solution is to use the scan output cardinality in place of the left 
> input cardinality. That is:
> * Determine the table that contains the join key.
> * Search the left subtree for the scan for that table.
> To do this, start with the left input:
> * If the node is a scan node, and the scan is for the target table, return 
> the output cardinality of that scan.
> * If the node is a join, then apply the search to *both* sides of the join.
> This approach is cumbersome, and may run into complexities if the node is 
> something other than a scan or join. It also may underestimate if a filter is 
> applied at the join level.
> A cleaner, tough more involved, solution is to track adjusted NDV for each 
> column through each operator as described in IMPALA-8220.
> h4. Code Bug
> The commit "IMPALA-5547: Rework FK/PK join detection", ID 
> [{{9f678a74269250bf5c7ae2c5e8afd93c5b3734de}}|https://github.com/apache/impala/commit/9f678a74269250bf5c7ae2c5e8afd93c5b3734de#diff-b10ccc2cdf68b236be400cde1e858a7c]
>  on Jun 6, 2017 reworked the FK/PK logic. It has one flaw: after determining 
> that we have an FK/PK (M:1) case, it then attempts to adjust the compound FK 
> and PK columns. This has two problems:
> * It is unnecessary, as we saw above.
> * The math is wrong and produces bogus estimates.
> Here is the code in {{JoinNode.java}}:
> {code:java}
>     long result = -1;
>     for (EqJoinConjunctScanSlots slots: eqJoinConjunctSlots) {
>       // Adjust the join selectivity based on the NDV ratio to avoid 
> underestimating
>       // the cardinality if the PK side has a higher NDV than the FK side.
>       double ndvRatio = 1.0;
>       if (slots.lhsNdv() > 0) ndvRatio = slots.rhsNdv() / slots.lhsNdv();
>       double rhsSelectivity = Double.MIN_VALUE;
>       if (slots.rhsNumRows() > 0) rhsSelectivity = rhsCard / 
> slots.rhsNumRows();
>       long joinCard = (long) Math.ceil(lhsCard * rhsSelectivity * ndvRatio);
>       if (result == -1) {
>         result = joinCard;
>       } else {
>         result = Math.min(result, joinCard);
>       }
>     }
>     // FK/PK join cardinality must be <= the lhs cardinality.
>     result = Math.min(result, lhsCard);
> {code}
> The above is hard to follow, which may account for why the bug was not 
> caught. Ignoring some corner cases, the logic is essentially:
> {noformat}
> lhsCard = |D'|
> ndvRatioi = |D.fki| / |M.pki|
> rhsSelectivityi = |M'| / |M|
> joinCardi = lhsCard * ndvRatioi * rhsSelectivityi
>           = |D'| * (|D.fki| / |M.pki|) * |M'| / |M|
> |join| = min(joinCardi)
>        = (|D'| * |M'| / |M|) * min(|D.fki| / |M.pki|)
> {noformat}
> Though hard to see, the above is not equivalent to the logic worked out in 
> the previous section. Using the correct expression from earlier sections:
> {noformat}
>                             |D'| * |M'| 
> |D' ⋈ M'| = ----------------------------------------------
>             max(min( ∏ |D.fki|, |M| ) * |D'| / |D|), |M'|)
> {noformat}
> We can factor out the common {{|D'| * |M'|}} terms and compare:
> {noformat}
> min(|D.fki| / |M.pki|)                     1
> ----------------------  !=  ----------------------------------------------
>         |M|                 max(min( ∏ |D.fki|, |M| ) * |D'| / |D|), |M'|)
>              1                                  1
> ----------------------------  !=  
> ----------------------------------------------
> |M| * max(|M.pki| / |D.fki|)      max(min( ∏ |D.fki|, |M| ) * |D'| / |D|), 
> |M'|)
> |M| * max(|M.pki| / |D.fki|)  !=  max(min( ∏ |D.fki|, |M| ) * |D'| / |D|), 
> |M'|)
> {noformat}
> There is no valid operation that can convert one side to the other, so they 
> are unequal.
> It is likely that the code's version attempts to work around issues elsewhere 
> in the calculations (such as ignoring some predicates, using exponential 
> back-off for filters, not having a good estimate for {{|D|}}, etc.)
> h4. Longer-Term Fix
> The above simple fix is the target of this ticket. Longer term, the code 
> should evolve to use a single path for both the M:1 and M:N cases since as 
> described in IMPALA-8018. (Both cases start with HMS data. Currently we use 
> two paths to arrive at the same result. IMPALA-8018 suggests we need only one 
> path.) We should also adopt the simple urn model as described in IMPALA-8218.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to