Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18995 )
Change subject: IMPALA-11583: Use Iceberg API to update stats ...................................................................... Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1698 PS1, Line 1698: Preconditions.checkState(msTbl.getParameters().containsKey(COMPUTE_STATS_TIME)); Are you sure that this is always set? Is it possible for the condition at line 1658 to be false in case of Iceberg tables? I forgot how this worked exacly, but I think that it is false if you manually change columns stats. http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1707 PS1, Line 1707: IcebergCatalogOpExecutor.unsetTblProperties(iceTxn, Lists.newArrayList( : StatsSetupConst.COLUMN_STATS_ACCURATE)); Do you know whether COLUMN_STATS_ACCURATE is used with Iceberg tables? It may make sense to discuss this with Hive people - AFAIK the only stat where this matters in numRows, and numRows will stay accurate. So we do not change stats stored as table properties, while column stats are stored for Hive and Impala independently - this means that we cannot mess up the stats of Hive. http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test: http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@33 PS1, Line 33: compute stats ice_alltypes; What happens if we run COMPUTE INCREMENTAL STATS for Iceberg? It shouldn't be supported, but I don't know whether we have tests for that. http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@57 PS1, Line 57: ==== Can you also add a test for setting stats manually? See https://docs.cloudera.com/best-practices/latest/impala-performance/topics/bp-impala-manually-setting-table-and-partition-statistics.html http://gerrit.cloudera.org:8080/#/c/18995/1/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/18995/1/tests/query_test/test_iceberg.py@795 PS1, Line 795: unique_database) nit: would fit to one line -- To view, visit http://gerrit.cloudera.org:8080/18995 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92 Gerrit-Change-Number: 18995 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tamas Mate <tma...@apache.org> Gerrit-Comment-Date: Fri, 16 Sep 2022 16:37:54 +0000 Gerrit-HasComments: Yes