Hi,

We did some research on this topic; here is what we have found so far:

Impala tracks two sets of information on the coordinator node for each 
query: a summary and a profile.
The profile is currently accessible as a string, which is unwieldy to parse. 
A Thrift format is theoretically available, but there is a bug, 
https://issues.apache.org/jira/browse/IMPALA-8252 , which is resolved in 
v3.2.0, so you need Impala version >= 3.2.


After that, the Thrift JSON encoder from Twitter commons may be used:
https://github.com/twitter/commons/blob/06905dc0f1a26440a79ff1164831c85ce2d1bdf0/src/python/twitter/thrift/text/thrift_json_encoder.py
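
To illustrate the idea without depending on that module (whose exact API is 
not shown here), below is a rough sketch of a different, simpler technique: 
recursively walking an object tree and dumping it as JSON. The helper name 
to_plain and the stand-in objects are hypothetical, and the sketch assumes 
that the Thrift-generated Python structs keep their fields in __dict__.

```python
import json
from types import SimpleNamespace


def to_plain(value):
    """Recursively convert an object tree into JSON-friendly Python types."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, dict):
        return {str(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [to_plain(v) for v in value]
    # Assumption: struct-like objects expose their fields through __dict__.
    fields = getattr(value, "__dict__", None)
    if fields is None:
        return repr(value)  # fall back to a string for anything else
    return {k: to_plain(v) for k, v in fields.items()}


# Hypothetical stand-in for a deserialized profile node; a real one would
# come from the Thrift-generated Impala classes.
node = SimpleNamespace(
    name="HDFS_SCAN_NODE",
    counters=[SimpleNamespace(name="RowsRead", value=42)],
)
profile_json = json.dumps(to_plain(node), sort_keys=True)
```

Once the profile is plain JSON, it can be loaded into whatever tooling is 
used for the rest of the cluster metrics.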


The Thrift profile can be downloaded from the coordinator node, e.g. 
http://coord-node:25000/query_profile_encoded?query_id=442c057197d9c0d:81810ccd00000000
 (442c057197d9c0d:81810ccd00000000 is the query ID).
It can also be downloaded through the Cloudera REST API (if using Cloudera).
Or, if you use the impyla (https://github.com/cloudera/impyla) Python library, 
you can get the profile after execution:
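        # Assumption: TRuntimeProfileFormat is importable from impyla's generated Thrift code
        from impala._thrift_gen.RuntimeProfile.ttypes import TRuntimeProfileFormat
        cur.execute(sql)
        return cur.get_profile(profile_format=TRuntimeProfileFormat.THRIFT)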
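
For the first option (the coordinator URL), a minimal sketch using only the 
Python standard library might look like the following. The helper names 
(profile_url, fetch_encoded_profile, decode_profile_bytes) are made up for 
illustration. The decoding step rests on an assumption not stated above: that 
the endpoint returns base64 text wrapping zlib-compressed Thrift bytes. Turning 
those bytes into a profile object would still need the Thrift-generated 
classes from the Impala schema, which are left out here.

```python
import base64
import zlib
from urllib.parse import urlencode
from urllib.request import urlopen


def profile_url(coordinator, query_id, port=25000):
    """Build the coordinator URL for the encoded profile of one query."""
    # safe=":" keeps the colon in the query ID unescaped, as in the URL above.
    query = urlencode({"query_id": query_id}, safe=":")
    return "http://{}:{}/query_profile_encoded?{}".format(coordinator, port, query)


def fetch_encoded_profile(coordinator, query_id, port=25000, timeout=30):
    """Download the encoded profile as raw bytes from the coordinator."""
    with urlopen(profile_url(coordinator, query_id, port), timeout=timeout) as resp:
        return resp.read()


def decode_profile_bytes(encoded):
    """Undo the assumed transport encoding (base64, then zlib).

    The result is assumed to be serialized Thrift, not yet a Python object.
    """
    return zlib.decompress(base64.b64decode(encoded))
```

A call such as decode_profile_bytes(fetch_encoded_profile("coord-node", 
"442c057197d9c0d:81810ccd00000000")) would then give the bytes to hand to a 
Thrift deserializer.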


Just posting here in case it's helpful to anyone following the user group.

-Antoni

From: Antoni Ivanov
Sent: Wednesday, August 7, 2019 10:13 AM
To: u...@impala.apache.org
Cc: dev@impala <dev@impala.apache.org>; Jenny Kwan (c) <kje...@vmware.com>
Subject: How to parse a query plan /summary/profile

Hi,

We'd like to get better visibility into the way our Impala cluster is used.
For example, there's per-node utilization: sometimes fragments on a given 
node are slower, and this is visible in the profile. There are also some 
statistics available only in the profile (like runtime filters used or 
Parquet file pruning stats).

I think you can download it as Thrift? But is it easily deserializable? (We 
would need at least the Thrift schema, I think.)
Thanks,
Antoni
