[ https://issues.apache.org/jira/browse/IMPALA-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers resolved IMPALA-8034.
---------------------------------
    Resolution: Fixed

> PlannerTest cardinality tests are not realistic
> -----------------------------------------------
>
>                 Key: IMPALA-8034
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8034
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Impala generally assumes that joins are M:1, on the FK/PK. A PK uniquely
> identifies a row, so {{|pk1| = |Table|}}. This assumption is built into
> join estimation: columns are assumed to be independent, so if we have
> multiple keys, {{|pk1| * |pk2| * … * |pkn| = |Table|}}.
>
> But PlannerTest frequently uses non-independent, non-unique columns. For
> example, it might join on both the (unique) {{id}} column and the
> non-unique {{int_col}} column, which throws off the calculations. Consider:
> {noformat}
> select *
> from functional.alltypesagg a
> full outer join functional.alltypessmall b using (id, int_col)
> right join functional.alltypesaggnonulls c on (a.id = c.id and b.string_col = c.string_col)
> {noformat}
> If we then try to get the estimated cardinalities to match the actual
> cardinalities obtained from running the query, we end up fighting our
> assumptions. This shows up in the code: rather than use the classic
> assumption that the key columns are independent, the code uses special
> adjustments for redundant columns, perhaps so that tests such as the above
> produce good estimates.
>
> Better to modify (or add) tests that are based on our assumptions so we can
> verify that the intended logic works. It is fine to then add a few “oddball”
> queries to see how well the estimates hold up when the data (or user) does
> not follow the independence assumption.
>
> Alternatively, add new tests that use realistic joins, and retain the
> existing tests, adding a note explaining why the resulting cardinality
> estimates appear wrong (because we are using unrealistic, redundant columns
> in joins, which real users seldom do).
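
To make the independence assumption in the description concrete, here is a rough sketch of the textbook join estimate, |L| * |R| / max(ndv(L.key), ndv(R.key)), applied to a join on a unique column plus a redundant one. This is not Impala's planner code: the function names, the row counts and the NDVs are all hypothetical, and the row-count cap only stands in for whatever redundant-column adjustment the planner really applies.

```python
from math import prod


def compound_ndv(column_ndvs, row_count=None):
    """NDV of a multi-column key, assuming the columns are independent.

    Hypothetical helper: multiplies the per-column NDVs and, if a row
    count is given, caps the product at the number of rows in the table.
    """
    estimate = prod(column_ndvs)
    return estimate if row_count is None else min(estimate, row_count)


def join_cardinality(left_rows, right_rows, left_key_ndv, right_key_ndv):
    """Textbook equi-join estimate: |L| * |R| / max(ndv(L.key), ndv(R.key))."""
    return left_rows * right_rows / max(left_key_ndv, right_key_ndv)


# Made-up statistics, NOT taken from the functional test tables.
big_rows, small_rows = 10_000, 100
big_ndvs = [10_000, 1_000]   # (id, int_col): id is unique in the big table
small_ndvs = [100, 10]       # (id, int_col): id is unique in the small table

# Join on the unique id alone: the M:1 case the estimate is designed for.
on_id = join_cardinality(big_rows, small_rows, big_ndvs[0], small_ndvs[0])

# Join on (id, int_col), multiplying NDVs as if the columns were independent.
naive = join_cardinality(
    big_rows, small_rows, compound_ndv(big_ndvs), compound_ndv(small_ndvs)
)

# Same join, with the compound NDV capped at each table's row count.
capped = join_cardinality(
    big_rows,
    small_rows,
    compound_ndv(big_ndvs, big_rows),
    compound_ndv(small_ndvs, small_rows),
)

print(on_id, naive, capped)  # 100.0 0.1 100.0
```

With these made-up numbers, adding the redundant int_col to the join key drops the uncapped estimate from 100 rows to 0.1, even though int_col cannot filter anything that id has not already filtered. That is the kind of mismatch the existing tests provoke.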