Hi,

We did some research on the topic. Here is what we have found so far.

Impala tracks two sets of information on the coordinator node for each query: a summary and a profile. The profile is currently accessible as a string, which is unwieldy to parse. A Thrift format is theoretically available, but it was affected by a bug, https://issues.apache.org/jira/browse/IMPALA-8252 , which is resolved in v3.2.0. So you need version >= 3.2.

After that, the Thrift JSON encoder from Twitter commons may be used: https://github.com/twitter/commons/blob/06905dc0f1a26440a79ff1164831c85ce2d1bdf0/src/python/twitter/thrift/text/thrift_json_encoder.py

The Thrift profile can be downloaded from the coordinator node, e.g. http://coord-node:25000/query_profile_encoded?query_id=442c057197d9c0d:81810ccd00000000 (442c057197d9c0d:81810ccd00000000 is the query ID).

It can also be downloaded through the Cloudera REST API (if using Cloudera).

Or, if you use the impyla <https://github.com/cloudera/impyla> Python library, you can get the profile after execution:

    cur.execute(sql)
    return cur.get_profile(profile_format=TRuntimeProfileFormat.THRIFT)

Just posting here in case it's helpful to anyone following the user group.

-Antoni

From: Antoni Ivanov
Sent: Wednesday, August 7, 2019 10:13 AM
To: u...@impala.apache.org
Cc: dev@impala <dev@impala.apache.org>; Jenny Kwan (c) <kje...@vmware.com>
Subject: How to parse a query plan /summary/profile

Hi,

We'd like to get better visibility into the way our Impala cluster is used. For example, there is per-node utilization: sometimes fragments on a given node are slower, and this is visible in the profile. There are also some statistics available only in the profile (such as runtime filters used or Parquet file pruning stats).

I think you can download it as Thrift? But is it easily deserializable? (We would need the Thrift schema at least, I think.)

Thanks,
Antoni
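For the coordinator download mentioned in the reply above, here is a minimal Python sketch of fetching the encoded profile and turning it into raw Thrift bytes. The function names (profile_url, decode_profile, fetch_profile_bytes) are hypothetical. The decoding step assumes the endpoint returns base64 text wrapping a zlib-compressed Thrift payload, which is our understanding of Impala's archive format and should be verified against your version.

```python
import base64
import zlib
from urllib.parse import urlencode
from urllib.request import urlopen


def profile_url(coordinator, query_id, port=25000):
    """Build the coordinator URL for the encoded profile of one query."""
    # safe=":" keeps the query ID in the same form as in the example URL.
    query = urlencode({"query_id": query_id}, safe=":")
    return "http://{}:{}/query_profile_encoded?{}".format(coordinator, port, query)


def decode_profile(encoded):
    """Turn the downloaded text into raw Thrift bytes.

    ASSUMPTION: the payload is base64 text wrapping zlib-compressed,
    Thrift-serialized data. Verify this against your Impala version.
    """
    return zlib.decompress(base64.b64decode(encoded))


def fetch_profile_bytes(coordinator, query_id):
    """Download and decode the profile (requires Impala >= 3.2, see IMPALA-8252)."""
    with urlopen(profile_url(coordinator, query_id)) as resp:
        return decode_profile(resp.read().strip())


# Deserializing the bytes into an object additionally needs Python classes
# generated from Impala's Thrift definitions (e.g. TRuntimeProfileTree) and the
# `thrift` package. That step is not shown here because it depends on how the
# generated code is packaged in your environment.
```

Usage would look like fetch_profile_bytes("coord-node", "442c057197d9c0d:81810ccd00000000"). The result is still binary Thrift, so the schema question from the original message remains: you need the generated Thrift classes to read it.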