Riza Suminto created IMPALA-12984:
-------------------------------------

             Summary: Show inactivity of data exchanges in query profile
                 Key: IMPALA-12984
                 URL: https://issues.apache.org/jira/browse/IMPALA-12984
             Project: IMPALA
          Issue Type: Improvement
          Components: Distributed Exec
            Reporter: Riza Suminto


Many-to-many data exchanges can be bottlenecked by a hotspot receiver, as in the 
scenario described in IMPALA-6692, or when data spilling happens on a subset of 
backends. Ideally, these occurrences should be easy to spot in the query 
profile. But triaging this kind of issue often requires correlation analysis of 
several counters in the query profile. Here are a few ideas on how to improve 
this identification:
 # Upon query completion, let the coordinator do some profile analysis and print 
a WARNING in the query profile pointing at the skew. One group of EXCHANGE 
senders and receivers can only complete simultaneously, since all receivers need 
to wait for the EOS signal from all senders. Say we take the max of 
TotalNetworkSendTime across all senders and the max of DataWaitTime across all 
receivers; a "mutual wait" time of min(TotalNetworkSendTime, DataWaitTime) can 
then be used as an indicator of how long the exchanges are waiting for the query 
operators above them to progress.
 # Add a "Max Inactive" column to the ExecSummary table. The existing "Avg Time" 
and "Max Time" columns are derived from RuntimeProfileBase::local_time_ns_. If 
ExecSummary also displays the maximum value of 
RuntimeProfileBase::inactive_timer_ for each query operator as "Max Inactive", 
we can compare it against "Max Time" and figure out which exchange is mostly 
idle, waiting. The calculation relating local_time_ns, children_total_time, and 
inactive_timer can be seen at 
[https://github.com/apache/impala/blob/0721858/be/src/util/runtime-profile.cc#L935-L938]
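To make the two ideas concrete, here is a minimal Python sketch of the arithmetic. It is not Impala code. It assumes the per-instance counter values have already been extracted from the profile as plain nanosecond integers. The function names, the warning text, and the threshold parameter are all hypothetical. The local-time formula (total time minus children's total time minus inactive time, clamped at zero) is an assumption about what the linked lines compute and should be checked against them.

```python
from typing import Iterable, Optional


def mutual_wait_ns(sender_send_times_ns: Iterable[int],
                   receiver_wait_times_ns: Iterable[int]) -> int:
    """Idea 1: min(max TotalNetworkSendTime, max DataWaitTime) for one exchange."""
    max_send = max(sender_send_times_ns, default=0)
    max_wait = max(receiver_wait_times_ns, default=0)
    return min(max_send, max_wait)


def skew_warning(exchange_id: int,
                 sender_send_times_ns: Iterable[int],
                 receiver_wait_times_ns: Iterable[int],
                 threshold_ns: int) -> Optional[str]:
    """Return a warning line if the mutual wait reaches the threshold.

    threshold_ns is a hypothetical tuning knob; the issue does not propose
    a specific value.
    """
    wait = mutual_wait_ns(sender_send_times_ns, receiver_wait_times_ns)
    if wait < threshold_ns:
        return None
    return (f"WARNING: exchange {exchange_id} senders and receivers were "
            f"mutually waiting for {wait / 1e9:.1f}s")


def local_time_ns(total_ns: int, children_total_ns: int, inactive_ns: int) -> int:
    """Assumed relation between the three timers, clamped at zero."""
    return max(0, total_ns - children_total_ns - inactive_ns)


def mostly_idle(max_time_ns: int, max_inactive_ns: int) -> bool:
    """Idea 2: an operator whose "Max Inactive" exceeds its "Max Time"
    spent more time waiting than working."""
    return max_inactive_ns > max_time_ns
```

For example, with senders reporting send times of 5, 9, and 7 and receivers reporting wait times of 4, 12, and 3, the mutual wait is min(9, 12) = 9.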
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
