We are debugging an issue with netconsole and ixgbe where ksoftirqd takes 100%
of a core. It happens with both current net and net-next.

To reproduce the issue:

  1. Set up a server with ixgbe and netconsole. We bind each queue to a
     separate core via smp_affinity;
  2. Start a simple netperf job from the client, like:
        ./super_netperf 201 -P 0 -t TCP_RR -p 8888 -H <SERVER> -l 7200 -- -r 300,300 -o -s 1M,1M -S 1M,1M
  3. On the server, write to /dev/kmsg in a loop (to trigger netconsole sends):
        for x in {1..7200} ; do echo aa >> /dev/kmsg ; sleep 1; done
  4. On the server, monitor ksoftirqd in top.

Within a few minutes, top will show one ksoftirqd taking 100% of its core for
many seconds in a row.

When ksoftirqd takes 100% of a core, the driver hits the "clean_complete =
false" path below, so this NAPI instance stays in polling mode.

        ixgbe_for_each_ring(ring, q_vector->rx) {
                int cleaned = ixgbe_clean_rx_irq(q_vector, ring,
                                                 per_ring_budget);

                work_done += cleaned;
                if (cleaned >= per_ring_budget)
                        clean_complete = false;
        }

        /* If all work not completed, return budget and keep polling */
        if (!clean_complete)
                return budget;

We didn't see this issue on a 4.6-based kernel.

We are still debugging the issue, but we would like to check whether there is a
known solution for it. Any comments and suggestions are highly appreciated.

Best,
Song
