Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14066 )
Change subject: IMPALA-8836: Support COMPUTE STATS on insert only ACID tables
......................................................................

IMPALA-8836: Support COMPUTE STATS on insert only ACID tables

For ACID tables COMPUTE STATS needs to use a new HMS API, as the old
one is rejected by the metastore. This API currently has some
counterintuitive parts:
- setPartitionColumnStatistics is used to set table stats, as there is
  no similar function exposed by the HMS client for tables at the
  moment.
- A new writeId is allocated for the stat change, and this needs a
  transaction, so a transaction is opened/committed/aborted even
  though this doesn't seem necessary. The Hive code seems to use an
  internal API for this.
- Even though the HMS thrift Table object has a colStats field, it is
  only applied during alter_table if there are other changes, like new
  columns in the table, so alter_table couldn't be used to change
  column stats.

Additional changes:
- DROP STATS is no longer allowed for transactional tables, as it
  turned out that there is no transactional version of the old API.
- Remove the COLUMN_STATS_ACCURATE table property during COMPUTE STATS
  to ensure that Hive does not use stats computed by Impala to answer
  queries like SELECT count(*).
- Changed CatalogOpExecutor.updateCatalog() to get the writeIds
  earlier. This can mean unnecessary HMS RPC calls if no property
  change is needed in the end, but I found it hard to reason about
  what happens if these RPC calls fail at their original location.

TODOs (my plan is to do these in IMPALA-8865):
- Tried to make the MetastoreShim API easier to use by adding a class
  to encapsulate things like txnId and writeId, but it feels rather
  half-baked and under-documented. A similar class is added in
  https://gerrit.cloudera.org/#/c/14071/, it would be good to merge
  them.
- The validWriteIdList of the original SELECT(s) behind COMPUTE STATS
  could be used in the HMS API calls, but this would need more
  plumbing.
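The open/allocate/commit-or-abort sequence described in the commit message can be illustrated roughly as follows. This is only a sketch, not the code from this change: TxnClient, RecordingClient, updateStats and all method names in them are hypothetical stand-ins, and none of them are the real HMS client or MetastoreShim API.

```java
import java.util.ArrayList;
import java.util.List;

public class AcidStatsSketch {
  /** Hypothetical stand-in for a metastore client; not the real HMS or MetastoreShim API. */
  public interface TxnClient {
    long openTxn();
    long allocateWriteId(long txnId, String table);
    void setTableStats(String table, long writeId, long rowCount);
    void commitTxn(long txnId);
    void abortTxn(long txnId);
  }

  /** Opens a transaction, allocates a writeId, writes stats, then commits; aborts on failure. */
  public static void updateStats(TxnClient client, String table, long rowCount) {
    long txnId = client.openTxn();
    boolean committed = false;
    try {
      long writeId = client.allocateWriteId(txnId, table);
      client.setTableStats(table, writeId, rowCount);
      client.commitTxn(txnId);
      committed = true;
    } finally {
      // Abort so that a failed stats update does not leave the transaction open.
      if (!committed) client.abortTxn(txnId);
    }
  }

  /** In-memory fake that records the order of calls and can simulate a failed stats write. */
  public static class RecordingClient implements TxnClient {
    public final List<String> calls = new ArrayList<>();
    private final boolean failOnSetStats;

    public RecordingClient(boolean failOnSetStats) { this.failOnSetStats = failOnSetStats; }

    @Override public long openTxn() { calls.add("openTxn"); return 1L; }
    @Override public long allocateWriteId(long txnId, String table) {
      calls.add("allocateWriteId");
      return 1L;
    }
    @Override public void setTableStats(String table, long writeId, long rowCount) {
      calls.add("setTableStats");
      if (failOnSetStats) throw new IllegalStateException("simulated metastore failure");
    }
    @Override public void commitTxn(long txnId) { calls.add("commitTxn"); }
    @Override public void abortTxn(long txnId) { calls.add("abortTxn"); }
  }

  public static void main(String[] args) {
    RecordingClient ok = new RecordingClient(false);
    updateStats(ok, "db.tbl", 42L);
    System.out.println(ok.calls);

    RecordingClient bad = new RecordingClient(true);
    try {
      updateStats(bad, "db.tbl", 42L);
    } catch (IllegalStateException e) {
      System.out.println(bad.calls);
    }
  }
}
```

In this sketch the successful path ends with commitTxn, while a failure in the stats write ends with abortTxn instead, which is the shape of the "opened/committed/aborted" handling mentioned above.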
Change-Id: I5c06b4678c1ff75c5aa1586a78afea563e64057f
Reviewed-on: http://gerrit.cloudera.org:8080/14066
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Tested-by: Tim Armstrong <tarmstr...@cloudera.com>
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
A testdata/workloads/functional-query/queries/QueryTest/acid-compute-stats.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
M tests/query_test/test_acid.py
10 files changed, 406 insertions(+), 119 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/14066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5c06b4678c1ff75c5aa1586a78afea563e64057f
Gerrit-Change-Number: 14066
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>