[ https://issues.apache.org/jira/browse/IMPALA-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761485#comment-16761485 ]
ASF subversion and git services commented on IMPALA-7694:
---------------------------------------------------------

Commit b5714097e096c6e4b0573a7b326789807a1e4e5f in impala's branch refs/heads/master from Lars Volker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b571409 ]

IMPALA-7694: Add host resource usage metrics to profile

This change adds a mechanism to collect host resource usage metrics to
profiles. Metric collection can be controlled through a new query option
'RESOURCE_TRACE_RATIO'. It specifies the probability with which metrics
collection will be enabled. Collection always happens per query for all
executors that run one or more fragment instances of the query.

This mechanism adds a new time series counter class that collects all
measured values and does not re-sample them. It will re-sample values
when printing them into a string profile, preserving up to 64 values,
but Thrift profiles will contain the full list of values.

We add a new section "Per Node Profiles" to the profile to store and
show these values:

  Per Node Profiles:
    lv-desktop:22000:
      CpuIoWaitPercentage (500.000ms): 0, 0
      CpuSysPercentage (500.000ms): 1, 1
      CpuUserPercentage (500.000ms): 4, 0
        - ScratchBytesRead: 0
        - ScratchBytesWritten: 0
        - ScratchFileUsedBytes: 0
        - ScratchReads: 0 (0)
        - ScratchWrites: 0 (0)
        - TotalEncryptionTime: 0.000ns
        - TotalReadBlockTime: 0.000ns

This change also uses the aforementioned mechanism to collect CPU usage
metrics (user, system, and IO wait time).

A future change can then add a tool to decode a Thrift profile and plot
the contained usage metrics, e.g. using matplotlib (IMPALA-8123). Such a
tool is not included in this change because it will require some
reworking of the python dependencies.

This change also includes a few minor improvements to make the resulting
code more readable:
- Extend the PeriodicCounterUpdater to call functions to update global
  metrics before updating the counters.
- Expose the scratch profile within the per node resource usage section.
- Improve documentation of the profile counter classes.
- Remove synchronization from StreamingSampler.
- Remove a few pieces of dead code that otherwise would have required
  updates.
- Factor some code for profile decoding into the Impala python library.

Testing: This change contains a unit test for the system level metrics
collection and e2e tests for the profile changes.

Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Reviewed-on: http://gerrit.cloudera.org:8080/12069
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Add CPU resource utilization (user, system, iowait) timelines to profiles
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-7694
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7694
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Lars Volker
>            Assignee: Lars Volker
>            Priority: Major
>              Labels: observability, supportability
>
> We often struggle to determine why a query was slow, in particular if it was
> caused by other tasks on the same machine using resources. To help with this
> we should include timelines for system resource utilization to the profiles.
> These should eventually include CPU and disk and network I/O. If it is too
> expensive to include these in all queries we should add a flag to add these
> to a percentage of queries, and a query option to force-enable them.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
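
For readers who want a feel for the display-time re-sampling described in the commit message (all values are kept, but at most 64 are shown in a string profile), here is a minimal Python sketch. It is not the Impala implementation, which lives in the C++ backend and may choose samples differently. The function names, the "keep every step-th value" strategy, and the scaling of the printed period are all assumptions made for illustration; only the limit of 64 and the "Name (period): v1, v2" line format are taken from the message above.

```python
import math
from typing import List, Sequence, Tuple

# Limit quoted in the commit message for string profiles.
MAX_PRINTED_SAMPLES = 64


def resample_for_display(
    values: Sequence[int],
    period_ms: float,
    max_samples: int = MAX_PRINTED_SAMPLES,
) -> Tuple[float, List[int]]:
    """Return (effective_period_ms, values) with at most max_samples entries.

    Assumption: keep every step-th value and scale the period by the same
    factor. The stored series itself is never modified.
    """
    if max_samples <= 0:
        raise ValueError("max_samples must be positive")
    if len(values) <= max_samples:
        return period_ms, list(values)
    step = math.ceil(len(values) / max_samples)
    return period_ms * step, list(values[::step])


def format_counter(name: str, period_ms: float, values: Sequence[int]) -> str:
    """Render one time series line in the style of the profile excerpt."""
    effective_period, shown = resample_for_display(values, period_ms)
    joined = ", ".join(str(v) for v in shown)
    return f"{name} ({effective_period:.3f}ms): {joined}"


if __name__ == "__main__":
    # Short series are printed unchanged.
    print(format_counter("CpuUserPercentage", 500.0, [4, 0]))
    # A 200-sample series is thinned for display only.
    print(format_counter("CpuSysPercentage", 500.0, list(range(200))))
```

In this sketch the full list stays available to callers that need it, which mirrors the statement that Thrift profiles contain every collected value while only the string rendering is thinned.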