On 21.03.2014, at 16:22, Christopher Forgeron <csforge...@gmail.com> wrote:
> Markus,
>
> I don't know why I didn't notice this before... I copied your cpuset ping
> verbatim, not realizing that I should be using 172.16.0.x, as that's my
> network on the ix's.
>
> On this tester box, 10.0.0.1 goes out a different interface, thus it never
> reported back any problems.

I'm sorry, I should have mentioned that. The good news is that this makes
our two problems look very similar again.

> Now that I've corrected that, I see I have problems on the same queues:
>
> CPU0
> ping: sendto: No buffer space available
> ping: sendto: No buffer space available
> CPU1
> CPU2
> CPU3
> CPU4
> CPU5
> CPU6
> CPU7
> CPU8
> ping: sendto: No buffer space available
> ping: sendto: No buffer space available
> CPU9
> CPU10
> CPU11
> CPU12
> CPU13
> CPU14
> CPU15
> CPU16
> ping: sendto: No buffer space available
> ping: sendto: No buffer space available
> CPU17
> CPU18
> CPU19
> CPU20
> CPU21
> CPU22
> CPU23

While this is not EFBIG, we've seen this too. It usually starts out as
EFBIG because _bus_dmamap_load_mbuf_sg() fails, and at some point turns
into ENOBUFS when the software tx queue fills up as well (a toy model of
this progression follows the sysctl dump below).

If there's no flow id, CPU cores and tx queues have a direct relationship,
which seems to be the case with ping packets. From ixgbe.c:

	/* Which queue to use */
	if ((m->m_flags & M_FLOWID) != 0)
		i = m->m_pkthdr.flowid % adapter->num_queues;
	else
		i = curcpu % adapter->num_queues;

In your example, queue 0 got stuck, which is the one that cpus 0, 8 and 16
will queue their ping packets in. num_queues defaults to 8 on systems with
8 or more cores, so running the test on the first 8 cpus is already enough
to cover all tx queues.

> I can run that three times and get the same CPU's. I'll try a reboot and
> see if they always fail on the same queues, though I don't know if that
> would show anything.

I've seen it happen on 2 queues at the same time. And I've seen it go away
after leaving the system idle for a couple of hours.

> At this stage, NFS connections coming into the box are down, but I can
> still ping out. Incoming pings show 'host is down'.

I guess what will still work really depends on which queue is affected and
which flows are tied to that queue.
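To illustrate that mapping, here's a minimal userland sketch (not driver
code; num_queues and the CPU count are simply taken from your test above)
that prints which tx queue each CPU's packets fall back to when there is
no flow id:

	#include <stdio.h>

	int
	main(void)
	{
		const int num_queues = 8;	/* ix default with >= 8 cores */
		const int ncpu = 24;		/* CPUs covered by the test above */

		/* Same arithmetic as the curcpu fallback in ixgbe.c. */
		for (int cpu = 0; cpu < ncpu; cpu++)
			printf("CPU%-2d -> tx queue %d\n", cpu,
			    cpu % num_queues);
		return (0);
	}

CPUs 0, 8 and 16 all land on queue 0, which is exactly where your pings
failed.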
Markus

> Here is the dump of ix0's sysctls (only ix0 is in use on this machine for
> testing):
>
> dev.ix.0.queue0.interrupt_rate: 500000
> dev.ix.0.queue0.irqs: 100179
> dev.ix.0.queue0.txd_head: 0
> dev.ix.0.queue0.txd_tail: 0
> dev.ix.0.queue0.tso_tx: 104156
> dev.ix.0.queue0.no_tx_dma_setup: 0
> dev.ix.0.queue0.no_desc_avail: 5
> dev.ix.0.queue0.tx_packets: 279480
> dev.ix.0.queue0.rxd_head: 513
> dev.ix.0.queue0.rxd_tail: 512
> dev.ix.0.queue0.rx_packets: 774424
> dev.ix.0.queue0.rx_bytes: 281916
> dev.ix.0.queue0.rx_copies: 4609
> dev.ix.0.queue0.lro_queued: 0
> dev.ix.0.queue0.lro_flushed: 0
> dev.ix.0.queue1.interrupt_rate: 71428
> dev.ix.0.queue1.irqs: 540682
> dev.ix.0.queue1.txd_head: 1295
> dev.ix.0.queue1.txd_tail: 1295
> dev.ix.0.queue1.tso_tx: 15
> dev.ix.0.queue1.no_tx_dma_setup: 0
> dev.ix.0.queue1.no_desc_avail: 0
> dev.ix.0.queue1.tx_packets: 93248
> dev.ix.0.queue1.rxd_head: 0
> dev.ix.0.queue1.rxd_tail: 2047
> dev.ix.0.queue1.rx_packets: 462225
> dev.ix.0.queue1.rx_bytes: 0
> dev.ix.0.queue1.rx_copies: 0
> dev.ix.0.queue1.lro_queued: 0
> dev.ix.0.queue1.lro_flushed: 0
> dev.ix.0.queue2.interrupt_rate: 71428
> dev.ix.0.queue2.irqs: 282801
> dev.ix.0.queue2.txd_head: 367
> dev.ix.0.queue2.txd_tail: 367
> dev.ix.0.queue2.tso_tx: 312757
> dev.ix.0.queue2.no_tx_dma_setup: 0
> dev.ix.0.queue2.no_desc_avail: 0
> dev.ix.0.queue2.tx_packets: 876533
> dev.ix.0.queue2.rxd_head: 0
> dev.ix.0.queue2.rxd_tail: 2047
> dev.ix.0.queue2.rx_packets: 2324954
> dev.ix.0.queue2.rx_bytes: 0
> dev.ix.0.queue2.rx_copies: 0
> dev.ix.0.queue2.lro_queued: 0
> dev.ix.0.queue2.lro_flushed: 0
> dev.ix.0.queue3.interrupt_rate: 71428
> dev.ix.0.queue3.irqs: 1424108
> dev.ix.0.queue3.txd_head: 499
> dev.ix.0.queue3.txd_tail: 499
> dev.ix.0.queue3.tso_tx: 1263116
> dev.ix.0.queue3.no_tx_dma_setup: 0
> dev.ix.0.queue3.no_desc_avail: 0
> dev.ix.0.queue3.tx_packets: 1590798
> dev.ix.0.queue3.rxd_head: 0
> dev.ix.0.queue3.rxd_tail: 2047
> dev.ix.0.queue3.rx_packets: 8319143
> dev.ix.0.queue3.rx_bytes: 0
> dev.ix.0.queue3.rx_copies: 0
> dev.ix.0.queue3.lro_queued: 0
> dev.ix.0.queue3.lro_flushed: 0
> dev.ix.0.queue4.interrupt_rate: 71428
> dev.ix.0.queue4.irqs: 138019
> dev.ix.0.queue4.txd_head: 1620
> dev.ix.0.queue4.txd_tail: 1620
> dev.ix.0.queue4.tso_tx: 29235
> dev.ix.0.queue4.no_tx_dma_setup: 0
> dev.ix.0.queue4.no_desc_avail: 0
> dev.ix.0.queue4.tx_packets: 200853
> dev.ix.0.queue4.rxd_head: 6
> dev.ix.0.queue4.rxd_tail: 5
> dev.ix.0.queue4.rx_packets: 218327
> dev.ix.0.queue4.rx_bytes: 1527
> dev.ix.0.queue4.rx_copies: 0
> dev.ix.0.queue4.lro_queued: 0
> dev.ix.0.queue4.lro_flushed: 0
> dev.ix.0.queue5.interrupt_rate: 71428
> dev.ix.0.queue5.irqs: 131367
> dev.ix.0.queue5.txd_head: 330
> dev.ix.0.queue5.txd_tail: 330
> dev.ix.0.queue5.tso_tx: 9907
> dev.ix.0.queue5.no_tx_dma_setup: 0
> dev.ix.0.queue5.no_desc_avail: 0
> dev.ix.0.queue5.tx_packets: 150955
> dev.ix.0.queue5.rxd_head: 0
> dev.ix.0.queue5.rxd_tail: 2047
> dev.ix.0.queue5.rx_packets: 72814
> dev.ix.0.queue5.rx_bytes: 0
> dev.ix.0.queue5.rx_copies: 0
> dev.ix.0.queue5.lro_queued: 0
> dev.ix.0.queue5.lro_flushed: 0
> dev.ix.0.queue6.interrupt_rate: 71428
> dev.ix.0.queue6.irqs: 839814
> dev.ix.0.queue6.txd_head: 1402
> dev.ix.0.queue6.txd_tail: 1402
> dev.ix.0.queue6.tso_tx: 327633
> dev.ix.0.queue6.no_tx_dma_setup: 0
> dev.ix.0.queue6.no_desc_avail: 0
> dev.ix.0.queue6.tx_packets: 1371262
> dev.ix.0.queue6.rxd_head: 0
> dev.ix.0.queue6.rxd_tail: 2047
> dev.ix.0.queue6.rx_packets: 2559592
> dev.ix.0.queue6.rx_bytes: 0
> dev.ix.0.queue6.rx_copies: 0
> dev.ix.0.queue6.lro_queued: 0
> dev.ix.0.queue6.lro_flushed: 0
> dev.ix.0.queue7.interrupt_rate: 71428
> dev.ix.0.queue7.irqs: 150693
> dev.ix.0.queue7.txd_head: 1965
> dev.ix.0.queue7.txd_tail: 1965
> dev.ix.0.queue7.tso_tx: 248
> dev.ix.0.queue7.no_tx_dma_setup: 0
> dev.ix.0.queue7.no_desc_avail: 0
> dev.ix.0.queue7.tx_packets: 145736
> dev.ix.0.queue7.rxd_head: 0
> dev.ix.0.queue7.rxd_tail: 2047
> dev.ix.0.queue7.rx_packets: 19030
> dev.ix.0.queue7.rx_bytes: 0
> dev.ix.0.queue7.rx_copies: 0
> dev.ix.0.queue7.lro_queued: 0
> dev.ix.0.queue7.lro_flushed: 0
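Note that queue0 is the only queue with no_desc_avail > 0 in the dump
above, which fits the picture of queue 0 being the stuck one. To make the
EFBIG -> ENOBUFS progression mentioned earlier concrete, here's a toy
userland model (nothing here is actual ixgbe code; the ring size and
helper names are made up for illustration):

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>

	#define RING_SIZE 4	/* stand-in for the sw tx ring capacity */

	static int ring_used;

	/* Stand-in for _bus_dmamap_load_mbuf_sg() persistently failing. */
	static int
	dmamap_load(void)
	{
		return (EFBIG);
	}

	static int
	xmit(void)
	{
		if (dmamap_load() == EFBIG) {
			if (ring_used < RING_SIZE) {
				ring_used++;	/* packet parked on the sw ring */
				return (EFBIG);
			}
			return (ENOBUFS);	/* sw ring is full too */
		}
		return (0);
	}

	int
	main(void)
	{
		for (int i = 0; i < RING_SIZE + 2; i++)
			printf("send %d: %s\n", i, strerror(xmit()));
		return (0);
	}

Once the toy ring fills up, strerror(ENOBUFS) prints exactly the "No
buffer space available" seen in the ping output.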
> On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert
> <markus.geb...@hostpoint.ch> wrote:
>
>> Can you try this when the problem occurs?
>>
>> for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.2 -c 2 -W 1 10.0.0.1 | grep sendto; done
>>
>> It will tie ping to certain cpus to test the different tx queues of your
>> ix interface. If the pings reliably fail only on some queues, then your
>> problem is more likely to be the same as ours.
>>
>> Also, if you have dtrace available:
>>
>> kldload dtraceall
>> dtrace -n 'fbt:::return / arg1 == EFBIG && execname == "ping" / { stack(); }'
>>
>> while you run pings over the affected interface. This will give you hints
>> about where the EFBIG error comes from.
>>
>>> [...]
>>
>> Markus

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"