Hello David,

            Thank you for the detailed pointers on the networking issue;
I'll certainly have a look at your suggestion about the NUMA topology.  Our
application is built on the Bifrost framework, which handles subscribing
the application code to the multicast network. I had already checked the
interface statistics, and the kernel is clearly dropping the packets in our
case. It is true that IRQs are polled; I guess the interrupt counts
indicate that. However, IRQ-to-core mapping still seems essential for
optimized throughput with close to zero loss (<0.5 %). Right now we are
losing almost all packets (95 %), and I was actually wondering whether
there is a more systematic way to do the mapping rather than trial and
error with the bitmask, which seems to change on reboot and which I don't
fully understand.
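
To make my question concrete, this is the kind of mapping I mean (a
minimal, untested sketch: the IRQ number and core below are placeholders,
the real IRQ numbers would come from /proc/interrupts, and I gather
irqbalance, if it is running, may rewrite these values):

    /* Sketch: pin one NIC queue IRQ to one CPU core by writing a hex
     * bitmask to /proc/irq/<N>/smp_affinity (needs root).  The IRQ number
     * and core are placeholders. */
    #include <stdio.h>

    int main(void)
    {
        int irq  = 98;      /* placeholder: NIC queue IRQ from /proc/interrupts */
        int core = 2;       /* placeholder: target CPU core */
        char path[64];
        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);

        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return 1; }
        fprintf(f, "%x\n", 1u << core);   /* bitmask with only that core set */
        fclose(f);
        return 0;
    }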

I was also curious about your statement that the "average processing rate
isn't keeping up with the data rate": are you suggesting that the
application's processing time per packet is longer than the time between
packets? (That often isn't the case, right?)

Regards,

Hari

On Thu, Sep 10, 2020 at 1:03 AM David MacMahon <dav...@berkeley.edu> wrote:

> Hi, Hari,
>
> I think modern Linux network drivers use a "polling" approach rather than
> an interrupt driven approach, so I've found IRQ affinity to be less
> important than it used to be.  This can be observed as relatively low
> interrupt counts in /proc/interrupts.  The main things that I've found
> beneficial are:
>
> 1. Ensuring that the processing code runs on CPU cores in the same socket
> that the NIC's PCIe slot is connected to.  If you have a multi-socket NUMA
> system you will want to become familiar with its NUMA topology.  The
> "hwloc" package includes the cool "lstopo" utility that will show you a lot
> about your system's topology.  Even on a single socket system it can help
> to stay away from core 0 where many OS things tend to run.
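> 
> For concreteness, a small sketch of one way to check which NUMA node a
> PCI NIC is attached to (besides lstopo): the kernel exposes it in sysfs.
> The interface name "eth4" below is just a placeholder.
> 
>     /* Sketch: read a NIC's NUMA node from sysfs ("eth4" is a placeholder
>      * interface name).  A value of -1 means the kernel has no NUMA
>      * information for this device. */
>     #include <stdio.h>
> 
>     int main(void)
>     {
>         FILE *f = fopen("/sys/class/net/eth4/device/numa_node", "r");
>         if (!f) { perror("numa_node"); return 1; }
>         int node;
>         if (fscanf(f, "%d", &node) == 1)
>             printf("NIC is attached to NUMA node %d\n", node);
>         fclose(f);
>         return 0;
>     }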
>
> 2. Ensuring that memory allocations happen after your processes/threads
> have had their CPU affinity set, either by "taskset" or "numactl" or its
> own built-in CPU affinity setting code.  This is mostly for NUMA systems.
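> 
> As a rough illustration of this point (a sketch only; the core number and
> buffer size are made up), the idea is to pin the thread first and only
> then allocate and touch the buffers, so first-touch placement puts the
> pages on the local NUMA node:
> 
>     /* Sketch: set CPU affinity first, then allocate and touch memory so
>      * first-touch places the pages on the local NUMA node. */
>     #define _GNU_SOURCE
>     #include <sched.h>
>     #include <stdio.h>
>     #include <stdlib.h>
>     #include <string.h>
> 
>     int main(void)
>     {
>         cpu_set_t set;
>         CPU_ZERO(&set);
>         CPU_SET(3, &set);                 /* placeholder: core on the NIC's socket */
>         if (sched_setaffinity(0, sizeof(set), &set) != 0) {
>             perror("sched_setaffinity");
>             return 1;
>         }
> 
>         size_t len = 256UL * 1024 * 1024; /* placeholder buffer size */
>         char *buf = malloc(len);
>         if (!buf) return 1;
>         memset(buf, 0, len);              /* touch pages after pinning */
>         /* ... hand buf to the receive/processing code ... */
>         free(buf);
>         return 0;
>     }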
>
> 3. Ensuring that various buffers are sized appropriately.  There are a
> number of settings that can be tweaked in this category, most via
> "sysctl".  I won't dare to make any specific recommendations here.
> Everybody seems to have their own set of "these are the settings I used
> last time".  One of the most important things you can do in your packet
> receiving code is to keep track of how many packets you receive over a
> certain time interval.  If this value does not match the expected number of
> packets then you have a problem.  Any difference usually will be that the
> received packet count is lower than the expected packet count.  Some people
> call these dropped packets, but I prefer to call them "missed packets" at
> this point because all we can say is that we didn't get them.  We don't yet
> know what happened to them (maybe they were dropped, maybe they were
> misdirected, maybe they were never sent), but it helps to know where to
> look to find out.
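> 
> For what it's worth, a minimal sketch of that bookkeeping for a plain UDP
> socket might look like the following.  The port and expected packet rate
> are placeholders, and the multicast group join (IP_ADD_MEMBERSHIP) plus
> most error handling are omitted for brevity.
> 
>     /* Sketch: count received UDP packets per one-second interval and
>      * compare against the expected rate. */
>     #include <arpa/inet.h>
>     #include <stdio.h>
>     #include <sys/socket.h>
>     #include <time.h>
> 
>     int main(void)
>     {
>         int fd = socket(AF_INET, SOCK_DGRAM, 0);
>         struct sockaddr_in addr = {0};
>         addr.sin_family      = AF_INET;
>         addr.sin_port        = htons(12345);      /* placeholder port */
>         addr.sin_addr.s_addr = htonl(INADDR_ANY);
>         if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
>             perror("bind");
>             return 1;
>         }
> 
>         const long expected_pps = 100000;         /* placeholder expected rate */
>         char pkt[9000];
>         long count = 0;
>         time_t start = time(NULL);
>         for (;;) {
>             if (recv(fd, pkt, sizeof(pkt), 0) > 0)
>                 count++;
>             time_t now = time(NULL);
>             if (now - start >= 1) {
>                 printf("got %ld packets this interval, expected ~%ld\n",
>                        count, expected_pps);
>                 count = 0;
>                 start = now;
>             }
>         }
>     }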
>
> 4. Places to check for missing packets getting "dropped":
>
> 4.1 If you are using "normal" (aka SOCK_DGRAM) sockets to receive UDP
> packets, you will see a line in /proc/net/udp for your socket.  The last
> number on that line will be the number of packets that the kernel wanted to
> give to your socket but couldn't because the socket's receive buffer was
> full so the kernel had to drop the packet.
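> 
> A small sketch of reading that column (the drop count is the last field
> on each line; picking out your own socket by its local port, which is the
> second field in hex, is left out for brevity):
> 
>     /* Sketch: print the "drops" field (last column) of every socket line
>      * in /proc/net/udp. */
>     #include <stdio.h>
>     #include <string.h>
> 
>     int main(void)
>     {
>         FILE *f = fopen("/proc/net/udp", "r");
>         if (!f) { perror("/proc/net/udp"); return 1; }
> 
>         char line[512], local[64], drops[64];
>         if (fgets(line, sizeof(line), f) == NULL) {   /* skip the header */
>             fclose(f);
>             return 0;
>         }
>         while (fgets(line, sizeof(line), f)) {
>             local[0] = drops[0] = '\0';
>             int field = 0;
>             for (char *tok = strtok(line, " \n"); tok; tok = strtok(NULL, " \n")) {
>                 if (field == 1)
>                     snprintf(local, sizeof(local), "%s", tok); /* local addr:port */
>                 snprintf(drops, sizeof(drops), "%s", tok);     /* last field = drops */
>                 field++;
>             }
>             printf("socket %s  drops %s\n", local, drops);
>         }
>         fclose(f);
>         return 0;
>     }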
>
> 4.2 If you are using "packet" (aka SOCK_RAW) sockets to receive UDP
> packets, there are ways to get the total number of packets the kernel has
> handled for that socket and the number it had to drop because of lack of
> kernel/application buffer space.  I forget the details, but I'm sure you
> can google for it.  If you're using Hashpipe's packet socket support it has
> a function that will fetch these values for you.
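> 
> For reference, the generic sockets API for this (independent of any
> particular framework) is getsockopt with PACKET_STATISTICS.  A sketch,
> assuming "fd" is an already-open AF_PACKET socket:
> 
>     /* Sketch: query a packet socket's kernel counters.  tp_packets is the
>      * total the kernel handled for this socket and tp_drops the number it
>      * dropped for lack of buffer space; the counters reset each time they
>      * are read. */
>     #include <linux/if_packet.h>
>     #include <stdio.h>
>     #include <sys/socket.h>
> 
>     void print_packet_stats(int fd)
>     {
>         struct tpacket_stats stats;
>         socklen_t len = sizeof(stats);
>         if (getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len) == 0)
>             printf("packets: %u  drops: %u\n", stats.tp_packets, stats.tp_drops);
>         else
>             perror("PACKET_STATISTICS");
>     }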
>
> 4.3 The ifconfig utility will give you a count of "RX errors".  This is a
> generic category and I don't know all possible contributions to it, but one
> is that the NIC couldn't pass packets to the kernel.
>
> 4.4 Using "ethtool -S IFACE" (e.g. "ethtool -S eth4") will show you loads of
> stats.  These values all come from counters on the NIC.  Two interesting
> ones are called something like "rx_dropped" and "rx_fifo_errors".  A
> non-zero rx_fifo_errors value means that the kernel was not keeping up with
> the packet rate for long enough that the NIC/kernel buffers filled up and
> packets had to be dropped.
>
> 4.5 If you're using a lower-level kernel bypass approach (e.g. IBVerbs or
> DPDK), then you may have to dig a little harder to find the packet drop
> counters as the kernel is no longer involved and all the previously
> mentioned counters will be useless (with the possible exception of the NIC
> counters).
>
> 4.6 You may be able to log in to your switch and query it for interface
> statistics.  That can show various data and packet rates as well as bytes
> sent, packets sent, and various error counters.
>
> One thing to remember about buffer sizes is that if your average
> processing rate isn't keeping up with the data rate, larger buffers won't
> solve your problem.  Larger buffers will only allow the system to withstand
> slightly longer temporary lulls in throughput ("hiccups") if the overall
> throughput of the system (including the lulls/hiccups) is as fast or
> (ideally) faster than the incoming data rate.
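> 
> To put rough, made-up numbers on that: if data arrives at 10 Gb/s (about
> 1.25 GB/s) but the application only sustains 1.0 GB/s on average, the
> backlog grows at roughly 0.25 GB/s, so even a 256 MB buffer is exhausted
> after about a second (256 MB / 250 MB/s).  Beyond that point a bigger
> buffer only delays the losses; it cannot prevent them.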
>
> Hope this helps,
> Dave
>
> On Sep 9, 2020, at 22:15, Hariharan Krishnan <vasanthikrishh...@gmail.com>
> wrote:
>
> Hello Everyone,
>
>                   I'm trying to tune the NIC on a server running Ubuntu
> 18.04 to listen to a multicast network and optimize it for throughput
> through IRQ affinity binding. It is a Mellanox card, and I have tried
> using "mlnx_tune" for this, but haven't been successful.
> I would really appreciate any help in this regard.
>
> Looking forward to responses from the group.
>
> Thank you.
>
> Regards,
>
> Hari
>