Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19354 )
Change subject: IMPALA-11787, IMPALA-11516: Cardinality estimate for UNION in Iceberg position-delete plans can double the actual table cardinality

The plan for Iceberg tables with position-delete files includes a UNION
operator that takes the following inputs:

LHS: Scan of the data files that don't have corresponding delete files
RHS: ANTI JOIN that filters the data files that do have corresponding
     delete files, based on the content of the delete files

The planner's cardinality estimate for each of these two inputs to the
UNION can be as large as the full row count of the table (assuming no
other predicates in the scan). The planner simply sums the two estimates
in the UNION, which can result in a cardinality estimate for the UNION
that is twice the size of the table.

In this patch IcebergScanNode overrides computeCardinalities() of
HdfsScanNode. The method is implemented similarly, with a few
modifications:
 * we know the exact record counts of the data files
 * for table sampling we know the file descriptors, hence the record
   counts as well
 * IDENTITY-based partition conjuncts have already filtered out the
   files, so we don't need to apply their selectivity

So we calculate the SCAN NODE's cardinalities much more precisely.

This patch also sets the column stats for the virtual columns of the
scan node on the left-hand side of the ANTI JOIN. However, because of
IMPALA-11797, the ANTI JOIN's cardinality always equals the cardinality
of its LHS. IMPALA-11619 can also resolve this.
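The arithmetic behind the over-estimate, and behind the per-file fix, can be
sketched as follows. This is a minimal illustration, not the code of the patch:
the class IcebergCardinalitySketch, the DataFile type and all method names are
hypothetical, and the "old" estimate models the worst case described above, in
which each UNION input is estimated at the full table row count. The ANTI JOIN
output is taken to equal its LHS scan cardinality, as the commit message notes.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; these are not Impala's planner classes.
final class IcebergCardinalitySketch {

    // Hypothetical stand-in for an Iceberg data file and its metadata.
    static final class DataFile {
        final long recordCount;
        final boolean hasDeleteFile;

        DataFile(long recordCount, boolean hasDeleteFile) {
            this.recordCount = recordCount;
            this.hasDeleteFile = hasDeleteFile;
        }
    }

    // Sums the exact record counts of the files that feed one UNION input:
    // files without delete files (LHS) or files with delete files (RHS).
    static long scanCardinality(List<DataFile> files, boolean withDeleteFiles) {
        long total = 0;
        for (DataFile f : files) {
            if (f.hasDeleteFile == withDeleteFiles) total += f.recordCount;
        }
        return total;
    }

    // Worst case before the fix: both inputs estimated at the table row count.
    static long naiveUnionEstimate(List<DataFile> files) {
        long tableRows = 0;
        for (DataFile f : files) tableRows += f.recordCount;
        return tableRows + tableRows;
    }

    // Per-file estimate: each input is sized from its own files only.
    // Assumption: the ANTI JOIN output equals its LHS scan cardinality.
    static long preciseUnionEstimate(List<DataFile> files) {
        long lhs = scanCardinality(files, false);
        long rhs = scanCardinality(files, true);
        return lhs + rhs;
    }

    public static void main(String[] args) {
        List<DataFile> files = Arrays.asList(
            new DataFile(100, false),
            new DataFile(200, true),
            new DataFile(300, false));
        System.out.println(naiveUnionEstimate(files));   // prints 1200
        System.out.println(preciseUnionEstimate(files)); // prints 600
    }
}
```

With three files of 100, 200 and 300 records, of which only the second has a
delete file, the worst-case estimate is twice the 600-row table, while the
per-file estimate matches the table row count.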
Testing:
 * planner tests updated

Change-Id: Ie2927c58c4adfd0ba1e135b63454ac9b07991cbf
Reviewed-on: http://gerrit.cloudera.org:8080/19354
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M common/fbs/IcebergObjects.fbs
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
9 files changed, 419 insertions(+), 91 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/19354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie2927c58c4adfd0ba1e135b63454ac9b07991cbf
Gerrit-Change-Number: 19354
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <lipeng...@sensorsdata.cn>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>