[jira] [Created] (IMPALA-10141) Include aggregate TCP metrics in per-node profiles

2020-09-02 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-10141:
-

 Summary: Include aggregate TCP metrics in per-node profiles
 Key: IMPALA-10141
 URL: https://issues.apache.org/jira/browse/IMPALA-10141
 Project: IMPALA
  Issue Type: Improvement
Reporter: Sahil Takiar


The /rpcz endpoint in the debug web UI exposes many useful TCP-level 
metrics per kRPC connection, for all inbound and outbound connections. It 
would be useful to aggregate some of these metrics and include them in the 
per-node profiles. Since it is not currently possible to split these metrics 
out per query, they should be added at the per-host level. Furthermore, only 
metrics that can be sanely aggregated across all connections should be 
included. For example, tracking the number of retransmitted TCP packets across 
all connections for the duration of the query would be useful. TCP 
retransmissions should be rare and are typically indicative of network 
hardware issues or network congestion. Having at least a high-level idea of 
the number of TCP retransmissions that occur during a query can go a long way 
toward determining whether the network is to blame for query slowness.
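
As a rough illustration of the retransmission example, the per-host aggregate 
could be computed as the difference between two snapshots of per-connection 
counters, one taken when the query starts on the host and one when it ends. 
This is only a sketch under stated assumptions, not Impala code: it assumes 
each connection exposes a cumulative retransmit counter (Linux reports one as 
tcpi_total_retrans in struct tcp_info), and the snapshot layout and the 
function name retransmit_delta are hypothetical.

```python
from typing import Dict

# Hypothetical snapshot layout: connection id -> cumulative number of
# retransmitted TCP packets observed on that connection so far.
Snapshot = Dict[str, int]


def retransmit_delta(start: Snapshot, end: Snapshot) -> int:
    """Sum the retransmits that occurred on all connections between snapshots."""
    total = 0
    for conn_id, end_count in end.items():
        # A connection opened after the first snapshot has no baseline,
        # so all of its retransmits are attributed to the interval.
        start_count = start.get(conn_id, 0)
        # Clamp at zero so a counter that went backwards (for example a
        # reused connection id) never produces a negative contribution.
        total += max(end_count - start_count, 0)
    return total


at_query_start = {"10.0.0.2:27000": 5, "10.0.0.3:27000": 2}
at_query_end = {"10.0.0.2:27000": 8, "10.0.0.3:27000": 2, "10.0.0.4:27000": 1}
print(retransmit_delta(at_query_start, at_query_end))
```

Because the counters are per host rather than per query, the result includes 
retransmissions caused by any other queries running on the host during the 
same interval. Connections that close before the second snapshot are missed 
by this sketch.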



--
This message was sent by Atlassian Jira
(v8.3.4#803005)



