As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC).

This capture, which shows up as one ksoftirqd eating
all cycles, can last for an unbounded amount of time, since
one CPU is generally not able to drain all the queues under load.

It seems that networking drivers that use NAPI for their
TX completions should not provide an ndo_poll_controller():

Most NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.
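
To illustrate why a per-driver hook is redundant, here is a
minimal userspace sketch of that iteration. It is not the
kernel code: struct model_napi, struct model_dev,
model_driver_poll() and model_poll_napi() are hypothetical
stand-ins, and the sketch assumes the netpoll convention of
calling the driver's poll callback with a budget of 0, so that
only TX completions are reaped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one NAPI context (one queue pair). */
struct model_napi {
	int (*poll)(struct model_napi *napi, int budget);
	int pending_tx;			/* TX completions not yet reaped */
	int pending_rx;			/* RX packets not yet processed */
	struct model_napi *next;	/* next context on the same device */
};

/* Hypothetical stand-in for a net device and its NAPI list. */
struct model_dev {
	struct model_napi *napi_list;
};

/*
 * Model of a driver poll callback: TX completions are always
 * reaped, RX processing is bounded by the budget. Returns the
 * number of RX packets processed.
 */
static int model_driver_poll(struct model_napi *napi, int budget)
{
	int done = napi->pending_rx < budget ? napi->pending_rx : budget;

	napi->pending_tx = 0;
	napi->pending_rx -= done;
	return done;
}

/*
 * Model of the core iteration: visit every registered context
 * and poll it with budget 0. Returns the number of contexts
 * visited.
 */
static int model_poll_napi(struct model_dev *dev)
{
	int visited = 0;

	assert(dev != NULL);
	for (struct model_napi *n = dev->napi_list; n; n = n->next) {
		n->poll(n, 0);
		visited++;
	}
	return visited;
}
```

In this model every context registered on the device is
reached through the generic list, so a driver whose poll
callback already handles TX completions gains nothing from a
separate ndo_poll_controller().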

The first patch is a fix in poll_one_napi().

The following patches then take care of ten drivers.

Eric Dumazet (11):
  netpoll: do not test NAPI_STATE_SCHED in poll_one_napi()
  hinic: remove ndo_poll_controller
  ehea: remove ndo_poll_controller
  net: hns: remove ndo_poll_controller
  virtio_net: remove ndo_poll_controller
  qlcnic: remove ndo_poll_controller
  qlogic: netxen: remove ndo_poll_controller
  net: ena: remove ndo_poll_controller
  sfc: remove ndo_poll_controller
  sfc-falcon: remove ndo_poll_controller
  ibmvnic: remove ndo_poll_controller

 drivers/net/ethernet/amazon/ena/ena_netdev.c  | 22 ---------
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 18 --------
 .../net/ethernet/huawei/hinic/hinic_main.c    | 20 ---------
 drivers/net/ethernet/ibm/ehea/ehea_main.c     | 14 ------
 drivers/net/ethernet/ibm/ibmvnic.c            | 16 -------
 .../ethernet/qlogic/netxen/netxen_nic_main.c  | 23 ----------
 .../net/ethernet/qlogic/qlcnic/qlcnic_main.c  | 45 -------------------
 drivers/net/ethernet/sfc/efx.c                | 26 -----------
 drivers/net/ethernet/sfc/falcon/efx.c         | 26 -----------
 drivers/net/virtio_net.c                      | 14 ------
 net/core/netpoll.c                            | 20 +--------
 11 files changed, 1 insertion(+), 243 deletions(-)

-- 
2.19.0.605.g01d371f741-goog
