Re: em(4) watchdog timeouts
Hi Gregor,

Thank you for your feedback.

Did you see any timeouts on 5.6? On the amd64 version, I experienced
some under heavy network load. Is it related?

Regards,
Alexis VACHETTE.

On 11/11/2015 21:19, Gregor Best wrote:
> Hi Alexis,
>
> On Wed, Nov 11, 2015 at 08:11:15PM +0000, Alexis VACHETTE wrote:
>> [...]
>> Even with heavy network load?
>> [...]
>
> So far, yes. I've saturated the device for about 45 Minutes with
> something like this (the other end is my laptop):
>
> 	## on the router
> 	$ dd if=/dev/zero bs=8k | nc 172.31.64.174 55000
>
> 	## on my laptop
> 	$ nc -l 55000 | dd of=/dev/null bs=8k
>
> (with two or three streams in parallel). There were about 6k
> interrupts per second and bandwidth was about 250Mbps, which seems to
> be the maximum the tiny CPU in this router can do.
>
> No watchdog timeouts appeared, where previously something relatively
> low bandwidth (the SSDs in router and laptop suck) like this caused
> one every 20 or 30 seconds:
>
> 	## on the router
> 	$ pax -w /home | nc 172.31.64.174 55000
>
> I'll keep an eye on things, but so far it looks good. Regular usage
> works out so far as well.
>
> If you need me to run some special workload for you, I'd be more than
> happy to do that.
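As a rough sanity check on the figures quoted above (about 6k interrupts per second at about 250Mbps), the average amount of data handled per interrupt can be estimated with a few lines of C. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, "Mbps" is assumed to mean 10^6 bits per second, and the 1500-byte frame size is an assumption that is not stated anywhere in the thread.

```c
#include <assert.h>

/* Assumption: "Mbps" means 10^6 bits per second. */
#define BITS_PER_MBIT		1000000.0
/* Assumption: full-size Ethernet payloads; not stated in the thread. */
#define ASSUMED_FRAME_BYTES	1500.0

/* Hypothetical helper: average number of bytes moved per interrupt. */
double
bytes_per_interrupt(double mbps, double intr_per_sec)
{
	assert(intr_per_sec > 0.0);
	return (mbps * BITS_PER_MBIT / 8.0 / intr_per_sec);
}

/* Hypothetical helper: the same figure expressed in assumed-size frames. */
double
frames_per_interrupt(double mbps, double intr_per_sec)
{
	return (bytes_per_interrupt(mbps, intr_per_sec) / ASSUMED_FRAME_BYTES);
}
```

Calling bytes_per_interrupt(250.0, 6000.0) gives roughly 5.2 kB per interrupt, which is about 3.5 frames per interrupt under the frame-size assumption above.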
Re: em(4) watchdog timeouts
Hi Gregor,

I use the same revision as yours:

- "Intel 82583V" rev 0x00: msi

Regards,
Alexis VACHETTE.

On 16/11/2015 10:12, Alexis VACHETTE wrote:
> Hi Gregor,
>
> Thank you for your feedback.
>
> Did you see any timeouts on 5.6? On the amd64 version, I experienced
> some under heavy network load. Is it related?
>
> Regards,
> Alexis VACHETTE.
>
> On 11/11/2015 21:19, Gregor Best wrote:
>> Hi Alexis,
>>
>> On Wed, Nov 11, 2015 at 08:11:15PM +0000, Alexis VACHETTE wrote:
>>> [...]
>>> Even with heavy network load?
>>> [...]
>>
>> So far, yes. I've saturated the device for about 45 Minutes with
>> something like this (the other end is my laptop):
>>
>> 	## on the router
>> 	$ dd if=/dev/zero bs=8k | nc 172.31.64.174 55000
>>
>> 	## on my laptop
>> 	$ nc -l 55000 | dd of=/dev/null bs=8k
>>
>> (with two or three streams in parallel). There were about 6k
>> interrupts per second and bandwidth was about 250Mbps, which seems to
>> be the maximum the tiny CPU in this router can do.
>>
>> No watchdog timeouts appeared, where previously something relatively
>> low bandwidth (the SSDs in router and laptop suck) like this caused
>> one every 20 or 30 seconds:
>>
>> 	## on the router
>> 	$ pax -w /home | nc 172.31.64.174 55000
>>
>> I'll keep an eye on things, but so far it looks good. Regular usage
>> works out so far as well.
>>
>> If you need me to run some special workload for you, I'd be more than
>> happy to do that.
Re: em(4) watchdog timeouts
Hi Gregor,

Even with heavy network load?

Regards,
Alexis.

From: owner-t...@openbsd.org on behalf of Gregor Best
Sent: Wednesday, 11 November 2015 15:20
To: Mark Kettenis
Cc: tech@openbsd.org; m...@openbsd.org
Subject: Re: em(4) watchdog timeouts

I've done some further testing and I think I've narrowed it down to the
"Unlocking em(4) a bit further"-patch [0]. With the patch reverted, I
haven't seen any watchdog timeouts yet.

I'm currently running the router with the patch reverted to make sure
the timeouts don't happen again.

[0]: https://www.marc.info/?l=openbsd-tech&m=144347723907388&w=4

--
Gregor
Re: Possible em(4) fix
Hi Mark,

If you need a box for testing this issue, I can provide bug reports
once I get a spare box that triggers the watchdog timeout.

In my case, it has only happened with a trunk device in failover mode
so far.

Regards,
Alexis VACHETTE

On 05/10/2015 22:45, Mark Kettenis wrote:
> Several people seem to complain on misc@ that they're seeing watchdog
> timeouts on em(4). But none of them bother to submit a proper bug
> report to bugs@.
>
> Anyway, here is a diff that might fix the issue. Please test, even if
> you're not experiencing any problems.
>
> Thanks,
>
> Mark
>
> Index: if_em.c
> ===================================================================
> RCS file: /home/cvs/src/sys/dev/pci/if_em.c,v
> retrieving revision 1.306
> diff -u -p -r1.306 if_em.c
> --- if_em.c	30 Sep 2015 11:25:08 -0000	1.306
> +++ if_em.c	5 Oct 2015 20:35:13 -0000
> @@ -1210,12 +1210,6 @@ em_encap(struct em_softc *sc, struct mbu
>  		}
>  	}
>  
> -	sc->next_avail_tx_desc = i;
> -	if (sc->pcix_82544)
> -		atomic_sub_int(&sc->num_tx_desc_avail, txd_used);
> -	else
> -		atomic_sub_int(&sc->num_tx_desc_avail, map->dm_nsegs);
> -
>  #if NVLAN > 0
>  	/* Find out if we are in VLAN mode */
>  	if (m_head->m_flags & M_VLANTAG) {
> @@ -1249,6 +1243,14 @@ em_encap(struct em_softc *sc, struct mbu
>  	tx_buffer = &sc->tx_buffer_area[first];
>  	tx_buffer->next_eop = last;
>  
> +	membar_producer();
> +
> +	sc->next_avail_tx_desc = i;
> +	if (sc->pcix_82544)
> +		atomic_sub_int(&sc->num_tx_desc_avail, txd_used);
> +	else
> +		atomic_sub_int(&sc->num_tx_desc_avail, map->dm_nsegs);
> +
>  	/*
>  	 * Advance the Transmit Descriptor Tail (Tdt),
>  	 * this tells the E1000 that this frame is
> @@ -2377,6 +2379,8 @@ em_transmit_checksum_setup(struct em_sof
>  	tx_buffer->m_head = NULL;
>  	tx_buffer->next_eop = -1;
> +
> +	membar_producer();
>  
>  	if (++curr_txd == sc->num_tx_desc)
>  		curr_txd = 0;
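The idea behind the diff appears to be ordering: finish writing the per-descriptor bookkeeping (such as next_eop) first, issue a store barrier, and only then publish the new next_avail_tx_desc and num_tx_desc_avail values that other code may be reading concurrently. The following is a minimal userland sketch of that pattern, not the driver code. It uses a C11 release fence as a rough analogue of membar_producer() (the C11 fence is somewhat stronger), and every name in it (struct ring, ring_put(), ring_get(), RING_SIZE) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE	8	/* hypothetical size, not the driver's */

struct slot {
	uint32_t	len;	/* stands in for descriptor contents */
	int		eop;	/* stands in for tx_buffer->next_eop */
};

struct ring {
	struct slot	slots[RING_SIZE];
	atomic_uint	next_avail;	/* index published by the producer */
	atomic_int	num_avail;	/* free slots, like num_tx_desc_avail */
};

void
ring_init(struct ring *r)
{
	assert(r != NULL);
	atomic_init(&r->next_avail, 0);
	atomic_init(&r->num_avail, RING_SIZE);
}

/* Producer: fill the slot first, publish the counters last. */
int
ring_put(struct ring *r, uint32_t len)
{
	unsigned int i;

	if (atomic_load_explicit(&r->num_avail, memory_order_acquire) == 0)
		return (-1);		/* ring is full */

	i = atomic_load_explicit(&r->next_avail, memory_order_relaxed);
	r->slots[i].len = len;
	r->slots[i].eop = (int)i;

	/* Rough analogue of membar_producer(): the slot stores come first. */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&r->next_avail, (i + 1) % RING_SIZE,
	    memory_order_relaxed);
	atomic_fetch_sub_explicit(&r->num_avail, 1, memory_order_relaxed);
	return ((int)i);
}

/* Consumer: observe the counter, then read the slot it covers. */
int
ring_get(struct ring *r, unsigned int *tail, uint32_t *len)
{
	int idx;

	if (atomic_load_explicit(&r->num_avail, memory_order_relaxed) ==
	    RING_SIZE)
		return (-1);		/* ring is empty */

	/* Pairs with the producer's release fence. */
	atomic_thread_fence(memory_order_acquire);

	idx = r->slots[*tail].eop;
	*len = r->slots[*tail].len;
	*tail = (*tail + 1) % RING_SIZE;

	/* Hand the slot back only after it has been read. */
	atomic_fetch_add_explicit(&r->num_avail, 1, memory_order_release);
	return (idx);
}
```

If the counter updates were moved above the slot stores, as in the code the diff removes, a consumer on another CPU could see a slot counted as in use before its contents were written. That is the kind of race the reordering in the diff looks intended to close, though whether it accounts for the reported watchdog timeouts is exactly what the thread is trying to establish.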