[ https://issues.apache.org/jira/browse/IMPALA-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483015#comment-16483015 ]
ASF subversion and git services commented on IMPALA-6131:
---------------------------------------------------------

Commit 5c7d3b12e3aa750e7ab88e3ef1092d5218e53cc2 in impala's branch refs/heads/master from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=5c7d3b1 ]

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.
- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".
- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is run only in
exhaustive testing.
Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Reviewed-on: http://gerrit.cloudera.org:8080/10116
Reviewed-by: Lars Volker <l...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Track time of last statistics update in metadata
> ------------------------------------------------
>
>                 Key: IMPALA-6131
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6131
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend, Frontend
>            Reporter: Lars Volker
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: ramp-up
>
> Currently we (ab-)use {{transient_lastDdlTime}} to track the last update time
> of statistics. Instead we should introduce a separate counter to track the
> last update. With that we should also remove all occurrences of
> {{catalog_.updateLastDdlTime()}} from {{CatalogOpExecutor}} and fall back to
> Hive's default behavior.
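
The commit message notes that "impala.lastComputeStatsTime" and "transient_lastDdlTime" share a format, so the two can be compared to check whether the schema has changed since statistics were computed. A minimal sketch of that comparison follows. It is not code from the commit: the helper name and the plain dict of table properties are hypothetical, and it assumes both values are stored as strings holding seconds since the Unix epoch.

```python
# Hypothetical helper, not part of Impala. Assumes both properties hold
# seconds since the Unix epoch, stored as strings.
LAST_DDL_KEY = "transient_lastDdlTime"
LAST_STATS_KEY = "impala.lastComputeStatsTime"


def ddl_ran_after_compute_stats(tbl_properties):
    """Compare the two timestamps found in a table's properties.

    Returns True if the last DDL time is later than the last COMPUTE STATS
    time, False if it is not, and None if either property is missing (for
    example, when stats were never computed on the table).
    """
    ddl_time = tbl_properties.get(LAST_DDL_KEY)
    stats_time = tbl_properties.get(LAST_STATS_KEY)
    if ddl_time is None or stats_time is None:
        return None
    return int(ddl_time) > int(stats_time)


# Made-up property values, for illustration only.
example_props = {
    "transient_lastDdlTime": "1526900000",
    "impala.lastComputeStatsTime": "1526800000",
}
print(ddl_ran_after_compute_stats(example_props))
```

Returning None when a property is absent keeps "stats were never computed" distinct from "stats are current", which a plain boolean would conflate.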