Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18995 )

Change subject: IMPALA-11583: Use Iceberg API to update stats
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1698
PS1, Line 1698:     
Preconditions.checkState(msTbl.getParameters().containsKey(COMPUTE_STATS_TIME));
Are you sure that this is always set? Is it possible for the condition at line 
1658 to be false in case of Iceberg tables? I forgot how this worked exacly, 
but I think that it is false if you manually change columns stats.


http://gerrit.cloudera.org:8080/#/c/18995/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1707
PS1, Line 1707:     IcebergCatalogOpExecutor.unsetTblProperties(iceTxn, 
Lists.newArrayList(
              :         StatsSetupConst.COLUMN_STATS_ACCURATE));
Do you know whether COLUMN_STATS_ACCURATE is used with Iceberg tables? It may 
make sense to discuss this with Hive people - AFAIK the only stat where this 
matters in numRows, and numRows will stay accurate. So we do not change stats 
stored as table properties, while column stats are stored for Hive and Impala 
independently - this means that we cannot mess up the stats of Hive.


http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test:

http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@33
PS1, Line 33: compute stats ice_alltypes;
What happens if we run COMPUTE INCREMENTAL STATS for Iceberg? It shouldn't be 
supported, but I don't know whether we have tests for that.


http://gerrit.cloudera.org:8080/#/c/18995/1/testdata/workloads/functional-query/queries/QueryTest/iceberg-compute-stats.test@57
PS1, Line 57: ====
Can you also add a test for setting stats manually? See 
https://docs.cloudera.com/best-practices/latest/impala-performance/topics/bp-impala-manually-setting-table-and-partition-statistics.html


http://gerrit.cloudera.org:8080/#/c/18995/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18995/1/tests/query_test/test_iceberg.py@795
PS1, Line 795:                        unique_database)
nit: would fit to one line



--
To view, visit http://gerrit.cloudera.org:8080/18995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46b6e0a5a65e18e5aaf2a007ec0242b28e0fed92
Gerrit-Change-Number: 18995
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gfurnst...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tma...@apache.org>
Gerrit-Comment-Date: Fri, 16 Sep 2022 16:37:54 +0000
Gerrit-HasComments: Yes

Reply via email to