[ https://issues.apache.org/jira/browse/IMPALA-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980439#comment-16980439 ]
ASF subversion and git services commented on IMPALA-7322:
---------------------------------------------------------

Commit 65198faa3beeea13aec905f8cda8f644e99af960 in impala's branch refs/heads/master from Jiawei Wang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=65198fa ]

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

A. Problem:
Catalog table loading currently records only the total loading time. We
need a break-down, i.e. more detailed timing for each loading function.
Also, table schema loading is not counted in load-duration, so we need
additional metrics for it.

B. Solution:
- We added "hms-load-tbl-schema", "load-duration.all-column-stats",
"load-duration.all-partitions.total-time" and
"load-duration.all-partitions.file-metadata". We also log the
loadValidWriteIdList() time. This gives a more detailed breakdown of
table loading time.

The table loading time metrics for HDFS tables are in the following
hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time (standalone metric)

1. Table Schema Loading:
* Meaning: The time for HMS to fetch the table object plus the actual
schema loading time. Normally, the code path is
"msClient.getHiveClient().getTable(dbName, tblName)"
* Metric: hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is HdfsTable.load()
* Metric: load-duration.total-time

2.1 Table Metadata Loading -- all column stats
* Meaning: Load all column stats; this is part of table metadata loading.
The code path is HdfsTable.loadAllColumnStats()
* Metric: load-duration.all-column-stats

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: Fetch ValidWriteIds from HMS.
The code path is HdfsTable.loadValidWriteIdList()
* Metric: No metric is recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading (standalone metric)
* Meaning: Time spent on file system operations during metadata loading,
i.e. the amount of time spent loading metadata from the underlying
storage layer.
* Metric: Renamed to load-duration.storage-metadata. This metric was
introduced by IMPALA-7322.

2.4 Table Metadata Loading -- load all partitions
* Meaning: The time to load all partitions, including fetching all
partitions from HMS and loading them.
The code path is MetaStoreUtil.fetchAllPartitions() and
HdfsTable.loadAllPartitions()
* Metric: load-duration.all-partitions

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading for all partitions. (This is part
of 2.4.)
Code path: loadFileMetadataForPartitions() inside loadAllPartitions()
* Metric: load-duration.all-partitions.file-metadata

C. Extra things in this commit:
1. Add PrintUtils.printTimeNs to pretty-print time in the FrontEnd
2. Add an explanation for the table loading manager

D. Test:
1. Add unit tests for the PrintUtils.printTime() function
2. Manually describe a table and verify that the table loading metrics
are correct.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Reviewed-on: http://gerrit.cloudera.org:8080/14611
Reviewed-by: Vihang Karajgaonkar <vih...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Add storage wait time to profile for operations with metadata load
> ------------------------------------------------------------------
>
>                 Key: IMPALA-7322
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7322
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.0, Impala 2.12.0
>            Reporter: Balazs Jeszenszky
>            Assignee: Yongzhi Chen
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> The profile of a REFRESH or of the query triggering metadata load should
> point out how much time was spent waiting for source systems.
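
For readers unfamiliar with nested load timers, the metric hierarchy in the
commit message above can be illustrated with a small sketch. This is not the
code from the commit: the class LoadTimingSketch, its timed() and durations()
methods, and the fakeWork() helper are hypothetical names made up for
illustration, and plain System.nanoTime() stands in for whatever metrics
library the catalog actually uses. Only the metric names are taken from the
commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of nested load timers; not Impala's actual classes. */
public class LoadTimingSketch {
  // Metric name -> accumulated elapsed nanoseconds, in the order timers started.
  private final Map<String, Long> durationsNs = new LinkedHashMap<>();

  /** Runs the step and adds its elapsed wall-clock time to the named metric. */
  public <T> T timed(String metric, Supplier<T> step) {
    durationsNs.putIfAbsent(metric, 0L); // register at start so parents list first
    long start = System.nanoTime();
    try {
      return step.get();
    } finally {
      durationsNs.merge(metric, System.nanoTime() - start, Long::sum);
    }
  }

  public Map<String, Long> durations() {
    return durationsNs;
  }

  // Stand-in for real work such as an HMS call or a file listing.
  private static int fakeWork(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return 0;
  }

  public static void main(String[] args) {
    LoadTimingSketch t = new LoadTimingSketch();
    // Schema loading is timed on its own, outside load-duration.total-time.
    t.timed("hms-load-tbl-schema", () -> fakeWork(5));
    // Each child timer runs inside its parent, so a parent's time includes its children.
    t.timed("load-duration.total-time", () -> {
      t.timed("load-duration.all-column-stats", () -> fakeWork(2));
      t.timed("load-duration.all-partitions", () -> {
        t.timed("load-duration.all-partitions.file-metadata", () -> fakeWork(3));
        return null;
      });
      return null;
    });
    t.durations().forEach(
        (name, ns) -> System.out.printf("%s: %.3f ms%n", name, ns / 1e6));
  }
}
```

Because every child timer starts after and stops before its parent, a parent's
recorded duration is never smaller than any single child's, which is the
property the "total time" entries in the hierarchy rely on.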