[ https://issues.apache.org/jira/browse/IMPALA-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725292#comment-16725292 ]

ASF subversion and git services commented on IMPALA-5200:
---------------------------------------------------------

Commit ba6ca11d8b073fce20f8f9d824b3dc8a2f331082 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=ba6ca11 ]

IMPALA-5200: Count child time for parent's total time

One problem with the total time counter on runtime
profiles is that a parent's time may not be updated
if execution is stuck in a child node. The child
can accumulate time while the parent is stuck at
zero. This leads to incorrect or misleading
calculations of total time or non-child time
for the parent node during execution.

This makes a modest change in calculation for total
time for parent nodes. It takes advantage of the
fact that the parent should count all of the time
from all of its children as total time for itself.
Specifically, if a parent has accumulated X in its
total timer and its children have accumulated Y
summed across all of their timers, then a parent's
total time should be at least max(X, Y). There is no
way to know the actual overlap between X and Y, so
this reports max(X, Y), the conservative value that
assumes complete overlap.
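
The max(X, Y) rule described above can be sketched as follows. This is a
minimal illustration, not the code from the commit: the ProfileNode class,
its field names, and the numbers are hypothetical, and applying the rule
recursively to each child is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfileNode:
    # Hypothetical stand-in for a runtime profile node, not Impala's real class.
    name: str
    own_total_ns: int = 0  # X: what this node's own total timer has recorded
    children: List["ProfileNode"] = field(default_factory=list)

    def children_total_ns(self) -> int:
        # Y: total time summed across all children. Applying the same rule
        # recursively to each child is an assumption of this sketch.
        return sum(child.total_ns() for child in self.children)

    def total_ns(self) -> int:
        # Conservative estimate: assume X and Y overlap completely.
        return max(self.own_total_ns, self.children_total_ns())

    def non_child_ns(self) -> int:
        # Time attributed to the node itself rather than to its children.
        return self.total_ns() - self.children_total_ns()


# Made-up numbers: the parent's timer is still at zero while its children run.
stuck_parent = ProfileNode("parent", 0, [ProfileNode("a", 5), ProfileNode("b", 3)])
# Made-up numbers: the parent's timer is already ahead of its children.
updated_parent = ProfileNode("parent", 10, [ProfileNode("a", 5), ProfileNode("b", 3)])
```

With these made-up values, the stuck parent reports the time its children
have accumulated instead of zero, and its non-child time stays at zero
rather than appearing to be all of its total. If a child's own timer is
also stuck at zero, the sketch has nothing to propagate.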

This prevents a parent node from reporting itself
as 100% non-child time when it is actually stuck
executing child code. However, it does not help
if a child node is stuck and is not reporting its
own time.

Testing:
 - Added test case to runtime-profile-test
 - Core tests pass

Change-Id: Id6c1191c39fd18b6be45325366a74cf54908c77e
Reviewed-on: http://gerrit.cloudera.org:8080/11791
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Profile timers not updated during long-running sort
> ---------------------------------------------------
>
>                 Key: IMPALA-5200
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5200
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Tim Armstrong
>            Priority: Minor
>              Labels: observability, ramp-up, supportability
>
> If you have a query plan with a long-running sort operation (e.g. minutes), 
> the profile timers are not updated to reflect the time spent in the sort 
> until the sort starts returning rows.
> E.g. this is a summary from a sort query that had been running for a few
> hours (!). Both the summary and the "heat map" plan in the debug web UI
> were misleading: they showed the join as the "hot" operator. It would be
> ideal if we could at least periodically update the time spent in the
> operator.
> {code}
> Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
> ----------------------------------------------------------------------------------------------------------------------------
> 05:MERGING-EXCHANGE        1    0.000ns    0.000ns        0     635.58M          0        -1.00 B  UNPARTITIONED
> 03:SORT                    1    0.000ns    0.000ns        0     635.58M   47.86 GB      800.00 MB
> 02:HASH JOIN               1    4s859ms    4s859ms  771.02M     635.58M  162.11 MB       16.03 MB  INNER JOIN, BROADCAST
> |--04:EXCHANGE             1   38.988ms   38.988ms  247.20K     247.20K          0              0  BROADCAST
> |  01:SCAN HDFS            1    8s089ms    8s089ms  247.20K     247.20K    3.79 MB       32.00 MB  product b
> 00:SCAN HDFS               1  209.997ms  209.997ms   15.09M     635.58M  185.27 MB      176.00 MB  sales a
> {code}
> http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-join-query-running-slow



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
