[ 
https://issues.apache.org/jira/browse/IMPALA-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980439#comment-16980439
 ] 

ASF subversion and git services commented on IMPALA-7322:
---------------------------------------------------------

Commit 65198faa3beeea13aec905f8cda8f644e99af960 in impala's branch 
refs/heads/master from Jiawei Wang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=65198fa ]

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

A. Problem:
Catalog table loading currently records only the total loading
time. We need a break-down of that time, i.e. more detailed timing
for each loading function. Also, table schema loading is not
included in load-duration, so additional metrics are needed to
cover it.

B. Solution:
- We added the metrics "hms-load-tbl-schema", "load-duration.all-column-stats",
"load-duration.all-partitions.total-time" and
"load-duration.all-partitions.file-metadata".
We also log the time spent in loadValidWriteIdList(). Together these
give a more detailed break-down of table loading time.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
    - all column stats loading time
    - ValidWriteIds loading time
    - all partitions loading time - total time
        - file metadata loading time
    - storage-metadata-loading-time (standalone metric)
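The nesting in this hierarchy can be illustrated with a minimal Java sketch. It is hypothetical and is not Impala's implementation: the class LoadTimeBreakdown, its time() helper and the sleep() placeholders are assumptions, and only the metric name strings come from this commit. Each timer wraps one step, so a parent metric such as load-duration.total-time always covers the time of the timers nested inside it. The standalone storage metric is left out because it does not wrap a single contiguous step.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not Impala code: records nested load-time
// break-downs keyed by metric name.
class LoadTimeBreakdown {
  // Metric name -> elapsed nanoseconds, in the order the timers finished.
  private final Map<String, Long> durationsNs = new LinkedHashMap<>();

  /** Runs the step, adds its wall-clock duration under 'metric', returns its result. */
  <T> T time(String metric, Supplier<T> step) {
    long start = System.nanoTime();
    try {
      return step.get();
    } finally {
      durationsNs.merge(metric, System.nanoTime() - start, Long::sum);
    }
  }

  Map<String, Long> durationsNs() { return durationsNs; }

  public static void main(String[] args) {
    LoadTimeBreakdown t = new LoadTimeBreakdown();
    t.time("hms-load-tbl-schema", () -> sleep(5));
    t.time("load-duration.total-time", () -> {
      t.time("load-duration.all-column-stats", () -> sleep(2));
      t.time("load-duration.all-partitions.total-time", () -> {
        // Fetching partitions from HMS would happen here (placeholder).
        return t.time("load-duration.all-partitions.file-metadata", () -> sleep(3));
      });
      return null;
    });
    t.durationsNs().forEach((k, v) -> System.out.println(k + ": " + v / 1_000_000 + "ms"));
  }

  // Placeholder for real work; returns null so it fits a Supplier<Void>.
  private static Void sleep(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return null;
  }
}
```

Because System.nanoTime() is monotonic and an inner timer starts after and finishes before its parent, a parent's recorded duration is never smaller than any one of its children.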

1. Table Schema Loading:
* Meaning: The time for HMS to fetch the table object, including the
actual schema loading time.
Normally, the code path is "msClient.getHiveClient().getTable(dbName, tblName)"
* Metric : hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is HdfsTable.load().
* Metric: load-duration.total-time

2.1 Table Metadata Loading -- all column stats
* Meaning: The time to load all column stats; this is part of table metadata loading.
The code path is HdfsTable.loadAllColumnStats()
* Metric: load-duration.all-column-stats

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: The time to fetch ValidWriteIds from HMS.
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.
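The trade-off between publishing a metric and emitting only a debug log can be sketched in Java. This sketch assumes java.util.logging, with Level.FINE standing in for debug level. Impala uses its own logging setup, and the helper name timeAndLog and the placeholder supplier are hypothetical, not code from this commit.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not Impala code: times one call and reports the
// duration only through a debug-level (FINE) log line, not a metric.
class ValidWriteIdTiming {
  private static final Logger LOG = Logger.getLogger(ValidWriteIdTiming.class.getName());

  static <T> T timeAndLog(String what, Supplier<T> call) {
    long start = System.nanoTime();
    T result = call.get();
    long elapsedNs = System.nanoTime() - start;
    // The guard avoids formatting the message when debug logging is off.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(String.format("Loaded %s in %d ns", what, elapsedNs));
    }
    return result;
  }

  public static void main(String[] args) {
    // The supplier is a placeholder for the HMS round trip made by loadValidWriteIdList().
    String writeIds = timeAndLog("ValidWriteIdList for db.tbl", () -> "placeholder-write-ids");
    System.out.println(writeIds);
  }
}
```

The timing is visible only when debug logging is enabled for that logger, so it adds no entry to the table's load-duration metrics.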

2.3 Table Metadata Loading -- storage metadata loading (standalone metric)
* Meaning: The time spent on file system operations during metadata
loading, i.e. the amount of time spent loading metadata from the
underlying storage layer.
* Metric: renamed to load-duration.storage-metadata. This metric was
introduced by IMPALA-7322.

2.4 Table Metadata Loading -- load all partitions
* Meaning: The time to load all partitions, including fetching all
partitions from HMS and loading them. The code paths are
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: load-duration.all-partitions

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: The file metadata loading time for all partitions (this is
part of 2.4). Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: load-duration.all-partitions.file-metadata

C. Other changes in this commit:
1. Add PrintUtils.printTimeNs to pretty-print times in the frontend
2. Add an explanation of the table loading manager
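Pretty-printing a nanosecond duration can be sketched as follows. The output format (largest units first, for example 1s234ms, with sub-millisecond values printed in ns) and the names TimeFormatSketch and formatNanos are assumptions. The actual output of PrintUtils.printTimeNs may differ.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical formatter, not Impala's PrintUtils: prints a nanosecond
// duration as h/m/s/ms components, omitting zero components.
final class TimeFormatSketch {
  static String formatNanos(long ns) {
    if (ns < 0) throw new IllegalArgumentException("negative duration: " + ns);
    long ms = TimeUnit.NANOSECONDS.toMillis(ns);
    if (ms == 0) return ns + "ns"; // Sub-millisecond: show raw nanoseconds.
    long h = ms / 3_600_000; ms %= 3_600_000;
    long m = ms / 60_000;    ms %= 60_000;
    long s = ms / 1_000;     ms %= 1_000;
    StringBuilder sb = new StringBuilder();
    if (h > 0) sb.append(h).append('h');
    if (m > 0) sb.append(m).append('m');
    if (s > 0) sb.append(s).append('s');
    if (ms > 0) sb.append(ms).append("ms");
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(formatNanos(500));                // 500ns
    System.out.println(formatNanos(1_234_000_000L));     // 1s234ms
    System.out.println(formatNanos(3_723_004_000_000L)); // 1h2m3s4ms
  }
}
```

Precision below one millisecond is truncated for longer durations, so 1.5 ms prints as 1ms. This keeps the output compact for values such as load-duration.total-time.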

D. Test:
1. Add unit tests for the PrintUtils.printTime() function
2. Manually run DESCRIBE on a table and verify that the table loading
metrics are correct.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Reviewed-on: http://gerrit.cloudera.org:8080/14611
Reviewed-by: Vihang Karajgaonkar <vih...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Add storage wait time to profile for operations with metadata load
> ------------------------------------------------------------------
>
>                 Key: IMPALA-7322
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7322
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.0, Impala 2.12.0
>            Reporter: Balazs Jeszenszky
>            Assignee: Yongzhi Chen
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> The profile of a REFRESH or of the query triggering metadata load should 
> point out how much time was spent waiting for source systems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
