Hi All,

We are running a 32-core system at Owens Valley that has a dual-port
Myricom 10GbE NIC.  The system ran very successfully under Ubuntu 12.04
for more than a year, but since upgrading to Ubuntu 18.04 (generic) we
have been experiencing reliability problems, even though the tuning
parameters and smp_affinity settings are (as far as we can tell) the
same.  The problem seems to be tied to system load and packet handling
rather than to receipt of the packets by the interface, since things run
fine for up to 10 minutes and then start to deteriorate.  In researching
this, I see various other flavors of Ubuntu (low-latency, realtime, rt,
preempt) that make kernel adjustments that might help, but I cannot tell
from the descriptions which of these, if any, would address the problem.
Has anyone had a similar experience, or does anyone have advice about
what options we might have?  I am using the myri10ge driver that came
with Ubuntu 18.04.
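
For anyone comparing settings: the value written to /proc/irq/<N>/smp_affinity
is a hexadecimal CPU bitmask.  Below is a minimal sketch of that conversion,
assuming a machine with at most 32 CPUs like ours.  The helper name
cpus_to_affinity_mask is only illustrative, and the CPU numbers are the two
pairs mentioned further down in this message.

```python
def cpus_to_affinity_mask(cpus):
    """Return the hex bitmask string for a list of CPU numbers (0-31).

    Assumption: a single 32-bit mask group is enough, which holds for a
    32-core machine.  Larger systems use comma-separated groups instead.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError(f"CPU {cpu} is outside the 0-31 range")
        mask |= 1 << cpu  # set the bit that corresponds to this CPU
    return format(mask, "x")


if __name__ == "__main__":
    # The two CPU pairs the NIC interrupts have been assigned to.
    for cpus in ([1, 2], [30, 31]):
        print(cpus, "->", cpus_to_affinity_mask(cpus))
```

With these inputs, CPUs 1 and 2 give the mask 6 and CPUs 30 and 31 give
c0000000.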

One thing I should mention is that I ran this script:
https://github.com/majek/dump/blob/master/how-to-receive-a-packet/softnet.sh,
and it reports a number of "squeezed" events, which are the "# of times
ksoftirq ran out of netdev_budget or time slice with work remaining."  I
don't know whether this is something to worry about.  The output of
softnet.sh is shown below.  Note that we originally had the NIC assigned
to CPUs 1 and 2, but later changed it to CPUs 30 and 31.

user@dpp:~$ ./softnet.sh
cpu      total    dropped   squeezed  collision        rps flow_limit
  0    1328082          0       3729          0          0          0
  1 1716559544          0    7208929          0          0          0
  2 1793125842          0    8158475          0          0          0
  3    1069150          0       3714          0          0          0
  4    1400569          0       5443          0          0          0
  5    6988379          0       5985          0          0          0
  6    6466640          0       5950          0          0          0
  7    1070366          0       4097          0          0          0
  8     878808          0       3906          0          0          0
  9     933541          0       4207          0          0          0
 10       1229          0          4          0          0          0
 11        848          0          0          0          0          0
 12       1310          0          5          0          0          0
 13        662          0          0          0          0          0
 14       1304          0          2          0          0          0
 15        680          0          3          0          0          0
 16       1817          0          2          0          0          0
 17        648          0          3          0          0          0
 18        742          0          2          0          0          0
 19        605          0          2          0          0          0
 20        690          0          2          0          0          0
 21        536          0          3          0          0          0
 22        860          0          0          0          0          0
 23        493          0          3          0          0          0
 24       1657          0          4          0          0          0
 25    9244642          0       1487          0          0          0
 26        912          0          2          0          0          0
 27        287          0          0          0          0          0
 28    5252171          0        877          0          0          0
 29        339          0          3          0          0          0
 30 3378532079          0   17299324          0          0          0
 31 3390959304          0   16129528          0          0          0
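
The counters in /proc/net/softnet_stat are cumulative, so the table above
mixes the period when the NIC was on CPUs 1 and 2 with the current one.  To
see whether squeezing is still happening, one option is to take two samples
and look at the difference.  Here is a rough sketch of that idea.  It
assumes the first three hex columns are total, dropped, and squeezed (the
same columns softnet.sh prints) and that row order matches CPU number; the
function names are made up for illustration.

```python
import time

# 0-based column positions in /proc/net/softnet_stat (assumed layout,
# matching the first three columns that softnet.sh reports).
FIELDS = {"total": 0, "dropped": 1, "squeezed": 2}


def parse_softnet(text):
    """Parse softnet_stat text into one dict of counters per row."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        # Every column in this file is a hexadecimal counter.
        rows.append({name: int(cols[i], 16) for name, i in FIELDS.items()})
    return rows


def deltas(before, after):
    """Return the per-row counter increase between two samples."""
    return [{k: a[k] - b[k] for k in FIELDS} for b, a in zip(before, after)]


def sample_deltas(interval=10.0, path="/proc/net/softnet_stat"):
    """Read the file twice, `interval` seconds apart, and diff the samples."""
    with open(path) as f:
        first = parse_softnet(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_softnet(f.read())
    return deltas(first, second)
```

Calling sample_deltas() on the machine and printing the rows with a nonzero
"squeezed" or "dropped" value would show which CPUs are affected right now,
rather than since boot.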

Thanks,
Dale
