[ 
https://issues.apache.org/jira/browse/IMPALA-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-7449:
-------------------------------
    Description: The network throughput computation fails to take into account 
the fact that multiple RPCs can be in flight in parallel. Currently, the 
throughput is computed as (total bytes sent / total network time), where the 
total network time is the sum of the network time observed for each RPC. This 
is hard to interpret (or wrong?) when throughput differs drastically between 
destination hosts. It may be slightly easier to understand if we switch to 
measuring the observed network throughput of each individual RPC and use a 
summary counter or a histogram to record the throughput.  (was: The network 
throughput computation fails to take into account the fact that multiple RPCs 
can be in flight in parallel. Currently, the throughput is computed as (total 
bytes sent / total network time), where the total network time is the sum of 
the network time observed for each RPC. This is hard to interpret, especially 
when throughput differs drastically between destination hosts. It may be 
slightly easier to understand if we switch to measuring the observed network 
throughput of each individual RPC and use a summary counter to record the 
avg/min/max.)

> TotalNetworkThroughput in KrpcDataStreamSender is broken
> --------------------------------------------------------
>
>                 Key: IMPALA-7449
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7449
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 3.0, Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Critical
>
> The network throughput computation fails to take into account the fact that 
> multiple RPCs can be in flight in parallel. Currently, the throughput is 
> computed as (total bytes sent / total network time), where the total network 
> time is the sum of the network time observed for each RPC. This is hard to 
> interpret (or wrong?) when throughput differs drastically between 
> destination hosts. It may be slightly easier to understand if we switch to 
> measuring the observed network throughput of each individual RPC and use a 
> summary counter or a histogram to record the throughput.
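
To make the difference between the two computations concrete, here is a minimal
sketch in Python. It is not Impala's code: the names Rpc, aggregate_throughput
and per_rpc_summary and the sample numbers are assumptions made up for
illustration. It contrasts (total bytes sent / total network time) with a
per-RPC min/avg/max summary for two RPCs sent to a fast host and a slow host.

```python
from dataclasses import dataclass


@dataclass
class Rpc:
    """One completed RPC (hypothetical record, not an Impala type)."""
    bytes_sent: int
    network_time_s: float


def aggregate_throughput(rpcs):
    """Current approach: total bytes sent / sum of per-RPC network time."""
    total_bytes = sum(r.bytes_sent for r in rpcs)
    total_time = sum(r.network_time_s for r in rpcs)
    return total_bytes / total_time


def per_rpc_summary(rpcs):
    """Proposed approach: throughput of each RPC, summarized as min/avg/max."""
    rates = [r.bytes_sent / r.network_time_s for r in rpcs]
    return {"min": min(rates), "avg": sum(rates) / len(rates), "max": max(rates)}


# Illustrative numbers only: the same payload sent to a fast and a slow host.
rpcs = [
    Rpc(bytes_sent=100_000_000, network_time_s=1.0),   # fast host
    Rpc(bytes_sent=100_000_000, network_time_s=10.0),  # slow host
]

aggregate = aggregate_throughput(rpcs)  # about 18.2 MB/s
summary = per_rpc_summary(rpcs)         # min 10 MB/s, avg 55 MB/s, max 100 MB/s
```

With these made-up inputs the aggregate figure matches neither host, while the
per-RPC summary exposes the spread between them. If the two RPCs overlapped in
time, summing their network times also overstates the elapsed time, which is
the parallelism problem described above.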


