[ 
https://issues.apache.org/jira/browse/IMPALA-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483015#comment-16483015
 ] 

ASF subversion and git services commented on IMPALA-6131:
---------------------------------------------------------

Commit 5c7d3b12e3aa750e7ab88e3ef1092d5218e53cc2 in impala's branch 
refs/heads/master from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=5c7d3b1 ]

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
the table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since statistics
were computed.
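
As an illustration of that comparison, here is a minimal Python sketch. It
assumes both properties hold seconds since the epoch as decimal strings
(the usual form of "transient_lastDdlTime") and that the table properties
have already been fetched into a plain dict. The function name and the
dict are hypothetical, not part of Impala.

```python
from typing import Mapping, Optional


def schema_changed_since_stats(props: Mapping[str, str]) -> Optional[bool]:
    """Return True if the last DDL is newer than the last COMPUTE STATS.

    Returns None when either property is missing, e.g. when stats were
    never computed on the table.
    Assumption: both values are epoch seconds stored as decimal strings.
    """
    stats_time = props.get("impala.lastComputeStatsTime")
    ddl_time = props.get("transient_lastDdlTime")
    if stats_time is None or ddl_time is None:
        return None
    return int(ddl_time) > int(stats_time)


# Hypothetical property values, for illustration only.
example_props = {
    "transient_lastDdlTime": "1526900000",
    "impala.lastComputeStatsTime": "1526800000",
}
print(schema_changed_since_stats(example_props))
```

A strict ">" is used here so that equal timestamps are not reported as a
schema change; whether that is the right tie-breaking rule depends on the
caller.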

Other changes:
- Handling of "transient_lastDdlTime" is simplified. The old
  logic set it to current time + 1 if the old value was
  >= current time, to ensure that DDL operations always
  increase it. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needed to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, Impala now checks whether
  their HMS representation is up to date and, if it is, does
  not call IMetaStoreClient.alter_table(). The old logic
  always called alter_table() after loading metadata from
  Kudu. This change was needed to ensure that
  "transient_lastDdlTime" behaves similarly for HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.
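
For reference, the old "transient_lastDdlTime" rule from the first item
above can be sketched in Python as follows. This is a literal reading of
the description in this message, not a copy of the removed Impala code;
the function name is hypothetical and the "otherwise use the current time"
branch is an assumption.

```python
def old_next_last_ddl_time(old_ddl_time: int, now: int) -> int:
    """Sketch of the removed rule, with times in seconds since the epoch.

    If the stored value is already >= the current time, use
    current time + 1; otherwise (assumed) use the current time.
    """
    if old_ddl_time >= now:
        return now + 1
    return now


# Two DDL operations within the same second get distinct timestamps.
first = old_next_last_ddl_time(old_ddl_time=99, now=100)
second = old_next_last_ddl_time(old_ddl_time=first, now=100)
print(first, second)
```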

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is run only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Reviewed-on: http://gerrit.cloudera.org:8080/10116
Reviewed-by: Lars Volker <l...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Track time of last statistics update in metadata
> ------------------------------------------------
>
>                 Key: IMPALA-6131
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6131
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend, Frontend
>            Reporter: Lars Volker
>            Assignee: Csaba Ringhofer
>            Priority: Major
>              Labels: ramp-up
>
> Currently we (ab-)use {{transient_lastDdlTime}} to track the last update time 
> of statistics. Instead we should introduce a separate counter to track the 
> last update. With that we should also remove all occurrences of 
> {{catalog_.updateLastDdlTime()}} from {{CatalogOpExecutor}} and fall back to 
> Hive's default behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
