Re: iwm: work around missing Tx completion interrupts
On Mon, Feb 10, 2020 at 01:14:25PM +0100, Mark Kettenis wrote: > Stefan Sperling schreef op 2020-02-06 12:45: > > At 36c3 I noticed roaming failures with iwm(4) where we would get stuck > > trying to roam to a different AP. Debugging this with bluhm@ we found > > that the reason it gets stuck is a non-zero refcount on the ic_bss node. > > > > When roaming, we wait for this reference count to hit zero before > > switching > > the new AP. A non-zero reference count implies that the driver still has > > outstanding frames destined for the old AP queued to hardware. > > What we observed was that the reference count never went back to zero > > so roaming never completed and no further data frames could be sent. > > ifconfig iwm0 down/up was required to get the interface working again. > > > > iwm(4) decrements the refcount whenever hardware signals Tx completion > > for a frame at Tx queue index 'N'. We observed that we sometimes get an > > interrupt for frame 'N - 2' followed by an interrupt for frame 'N', > > with no interrupt being received for frame 'N - 1'. > > I don't really know how the rings work on this hardware, but coalescing > interrupts like that doesn't sound unreasonable. Interrupt coalescing is indeed enabled so it could be related to that. I don't know how it is supposed to work either. We will never know for sure until somebody leaks datasheets. You could be correct that "missed Tx completion" is what coalescing looks like for individually transmitted frames. I would expect coalescing to signal one interrupt with multiple 'packets', which contain firmware command responses and/or 802.11 frames, delivered for processing by the driver. But that's just a guess based on how interrupts behave when receiving frames. The proposed fix is about Tx. Since we do not yet aggregrate on Tx it seems reasonable to assume that each frame will be completed separately which is what happens virtually 99.9% of the time. It is also possible that the amount of frames in the air at 36c3 triggered errors in the firmware or hardware that lead to the symptoms we observed.
Re: iwm: work around missing Tx completion interrupts
Stefan Sperling schreef op 2020-02-06 12:45: At 36c3 I noticed roaming failures with iwm(4) where we would get stuck trying to roam to a different AP. Debugging this with bluhm@ we found that the reason it gets stuck is a non-zero refcount on the ic_bss node. When roaming, we wait for this reference count to hit zero before switching the new AP. A non-zero reference count implies that the driver still has outstanding frames destined for the old AP queued to hardware. What we observed was that the reference count never went back to zero so roaming never completed and no further data frames could be sent. ifconfig iwm0 down/up was required to get the interface working again. iwm(4) decrements the refcount whenever hardware signals Tx completion for a frame at Tx queue index 'N'. We observed that we sometimes get an interrupt for frame 'N - 2' followed by an interrupt for frame 'N', with no interrupt being received for frame 'N - 1'. I don't really know how the rings work on this hardware, but coalescing interrupts like that doesn't sound unreasonable. Whenever this had occurred a later decision to roam to another AP would fail as described above. A side-effect of this is that an mbuf gets leaked. This diff implements a workaround in iwm's interrupt handler. It's not pretty but I don't know the root cause. Given that this happens very rarely, and all we lose is success/failure information for the affected frames, this workaround seems acceptable to me. So far I only managed to trigger the problem at 36c3. If you want to see a printf when it happens, compile with 'option IWM_DEBUG'. ok? diff refs/heads/master refs/heads/iwm-txintr blob - 348771350037a5a281b12647bf95b5c2173bf57f (mode 644) blob + bdc3cebd73f392a5c1bfab702e6e8685298a6648 (mode 600) --- sys/dev/pci/if_iwm.c +++ sys/dev/pci/if_iwm.c @@ -1276,6 +1276,7 @@ iwm_alloc_tx_ring(struct iwm_softc *sc, struct iwm_tx_ ring->qid = qid; ring->queued = 0; ring->cur = 0; + ring->tail = 0; /* Allocate TX descriptors (256-byte aligned). */ size = IWM_TX_RING_COUNT * sizeof (struct iwm_tfd); @@ -1377,6 +1378,7 @@ iwm_reset_tx_ring(struct iwm_softc *sc, struct iwm_tx_ } ring->queued = 0; ring->cur = 0; + ring->tail = 0; } void @@ -4193,6 +4195,25 @@ iwm_rx_tx_cmd_single(struct iwm_softc *sc, struct iwm_ } void +iwm_txd_done(struct iwm_softc *sc, struct iwm_tx_data *txd) +{ + struct ieee80211com *ic = &sc->sc_ic; + + bus_dmamap_sync(sc->sc_dmat, txd->map, 0, txd->map->dm_mapsize, + BUS_DMASYNC_POSTWRITE); + bus_dmamap_unload(sc->sc_dmat, txd->map); + m_freem(txd->m); + txd->m = NULL; + + KASSERT(txd->in); + ieee80211_release_node(ic, &txd->in->in_ni); + txd->in = NULL; + + KASSERT(txd->done == 0); + txd->done = 1; +} + +void iwm_rx_tx_cmd(struct iwm_softc *sc, struct iwm_rx_packet *pkt, struct iwm_rx_data *data) { @@ -4202,31 +4223,35 @@ iwm_rx_tx_cmd(struct iwm_softc *sc, struct iwm_rx_pack int idx = cmd_hdr->idx; int qid = cmd_hdr->qid; struct iwm_tx_ring *ring = &sc->txq[qid]; - struct iwm_tx_data *txd = &ring->data[idx]; - struct iwm_node *in = txd->in; + struct iwm_tx_data *txd; - if (txd->done) - return; - bus_dmamap_sync(sc->sc_dmat, data->map, 0, IWM_RBUF_SIZE, BUS_DMASYNC_POSTREAD); sc->sc_tx_timer = 0; - iwm_rx_tx_cmd_single(sc, pkt, in); + txd = &ring->data[idx]; + if (txd->done) + return; - bus_dmamap_sync(sc->sc_dmat, txd->map, 0, txd->map->dm_mapsize, - BUS_DMASYNC_POSTWRITE); - bus_dmamap_unload(sc->sc_dmat, txd->map); - m_freem(txd->m); + iwm_rx_tx_cmd_single(sc, pkt, txd->in); + iwm_txd_done(sc, txd); - KASSERT(txd->done == 0); - txd->done = 1; - KASSERT(txd->in); - - txd->m = NULL; - txd->in = NULL; - ieee80211_release_node(ic, &in->in_ni); + /* +* XXX Sometimes we miss Tx completion interrupts. +* We cannot check Tx success/failure for affected frames; just free +* the associated mbuf and release the associated node reference. +*/ + while (ring->tail != idx) { + txd = &ring->data[ring->tail]; + if (!txd->done) { + DPRINTF(("%s: missed Tx completion: tail=%d idx=%d\n", + __func__, ring->tail, idx)); + iwm_txd_done(sc, txd); + ring->queued--; + } + ring->tail = (ring->tail + 1) % IWM_TX_RING_COUNT; + } if (--ring->queued < IWM_TX_RING_LOMARK) { sc->qfullmsk &= ~(1 << ring->qid); blob - 5206d9ac72a829d4bc2fd307a1d1a79e6e14627e blob + ad0975b69dd3d3888f584dd759f5ebcf5dd8e940 --- sys/dev/pci/if_iwmvar.h +++ sys/dev/pci/if_iwmvar.h @@ -270,6 +270,7
Re: iwm: work around missing Tx completion interrupts
On Thu, Feb 06, 2020 at 12:45:38PM +0100, Stefan Sperling wrote: > At 36c3 I noticed roaming failures with iwm(4) where we would get stuck > trying to roam to a different AP. Debugging this with bluhm@ we found > that the reason it gets stuck is a non-zero refcount on the ic_bss node. > > When roaming, we wait for this reference count to hit zero before switching > the new AP. A non-zero reference count implies that the driver still has > outstanding frames destined for the old AP queued to hardware. > What we observed was that the reference count never went back to zero > so roaming never completed and no further data frames could be sent. > ifconfig iwm0 down/up was required to get the interface working again. > > iwm(4) decrements the refcount whenever hardware signals Tx completion > for a frame at Tx queue index 'N'. We observed that we sometimes get an > interrupt for frame 'N - 2' followed by an interrupt for frame 'N', > with no interrupt being received for frame 'N - 1'. > Whenever this had occurred a later decision to roam to another AP would > fail as described above. A side-effect of this is that an mbuf gets leaked. > > This diff implements a workaround in iwm's interrupt handler. > It's not pretty but I don't know the root cause. > Given that this happens very rarely, and all we lose is success/failure > information for the affected frames, this workaround seems acceptable to me. > > So far I only managed to trigger the problem at 36c3. > If you want to see a printf when it happens, compile with 'option IWM_DEBUG'. > > ok? I've been testing the diff for the last two days and haven't seen any regressions. Unfortunately it seems I also didn't manage to trigger the actual bug so I can not say for sure whether your workaround actually fixes it. I think your solution makes sense until we found the underlying problem.
iwm: work around missing Tx completion interrupts
At 36c3 I noticed roaming failures with iwm(4) where we would get stuck trying to roam to a different AP. Debugging this with bluhm@ we found that the reason it gets stuck is a non-zero refcount on the ic_bss node. When roaming, we wait for this reference count to hit zero before switching the new AP. A non-zero reference count implies that the driver still has outstanding frames destined for the old AP queued to hardware. What we observed was that the reference count never went back to zero so roaming never completed and no further data frames could be sent. ifconfig iwm0 down/up was required to get the interface working again. iwm(4) decrements the refcount whenever hardware signals Tx completion for a frame at Tx queue index 'N'. We observed that we sometimes get an interrupt for frame 'N - 2' followed by an interrupt for frame 'N', with no interrupt being received for frame 'N - 1'. Whenever this had occurred a later decision to roam to another AP would fail as described above. A side-effect of this is that an mbuf gets leaked. This diff implements a workaround in iwm's interrupt handler. It's not pretty but I don't know the root cause. Given that this happens very rarely, and all we lose is success/failure information for the affected frames, this workaround seems acceptable to me. So far I only managed to trigger the problem at 36c3. If you want to see a printf when it happens, compile with 'option IWM_DEBUG'. ok? diff refs/heads/master refs/heads/iwm-txintr blob - 348771350037a5a281b12647bf95b5c2173bf57f (mode 644) blob + bdc3cebd73f392a5c1bfab702e6e8685298a6648 (mode 600) --- sys/dev/pci/if_iwm.c +++ sys/dev/pci/if_iwm.c @@ -1276,6 +1276,7 @@ iwm_alloc_tx_ring(struct iwm_softc *sc, struct iwm_tx_ ring->qid = qid; ring->queued = 0; ring->cur = 0; + ring->tail = 0; /* Allocate TX descriptors (256-byte aligned). */ size = IWM_TX_RING_COUNT * sizeof (struct iwm_tfd); @@ -1377,6 +1378,7 @@ iwm_reset_tx_ring(struct iwm_softc *sc, struct iwm_tx_ } ring->queued = 0; ring->cur = 0; + ring->tail = 0; } void @@ -4193,6 +4195,25 @@ iwm_rx_tx_cmd_single(struct iwm_softc *sc, struct iwm_ } void +iwm_txd_done(struct iwm_softc *sc, struct iwm_tx_data *txd) +{ + struct ieee80211com *ic = &sc->sc_ic; + + bus_dmamap_sync(sc->sc_dmat, txd->map, 0, txd->map->dm_mapsize, + BUS_DMASYNC_POSTWRITE); + bus_dmamap_unload(sc->sc_dmat, txd->map); + m_freem(txd->m); + txd->m = NULL; + + KASSERT(txd->in); + ieee80211_release_node(ic, &txd->in->in_ni); + txd->in = NULL; + + KASSERT(txd->done == 0); + txd->done = 1; +} + +void iwm_rx_tx_cmd(struct iwm_softc *sc, struct iwm_rx_packet *pkt, struct iwm_rx_data *data) { @@ -4202,31 +4223,35 @@ iwm_rx_tx_cmd(struct iwm_softc *sc, struct iwm_rx_pack int idx = cmd_hdr->idx; int qid = cmd_hdr->qid; struct iwm_tx_ring *ring = &sc->txq[qid]; - struct iwm_tx_data *txd = &ring->data[idx]; - struct iwm_node *in = txd->in; + struct iwm_tx_data *txd; - if (txd->done) - return; - bus_dmamap_sync(sc->sc_dmat, data->map, 0, IWM_RBUF_SIZE, BUS_DMASYNC_POSTREAD); sc->sc_tx_timer = 0; - iwm_rx_tx_cmd_single(sc, pkt, in); + txd = &ring->data[idx]; + if (txd->done) + return; - bus_dmamap_sync(sc->sc_dmat, txd->map, 0, txd->map->dm_mapsize, - BUS_DMASYNC_POSTWRITE); - bus_dmamap_unload(sc->sc_dmat, txd->map); - m_freem(txd->m); + iwm_rx_tx_cmd_single(sc, pkt, txd->in); + iwm_txd_done(sc, txd); - KASSERT(txd->done == 0); - txd->done = 1; - KASSERT(txd->in); - - txd->m = NULL; - txd->in = NULL; - ieee80211_release_node(ic, &in->in_ni); + /* +* XXX Sometimes we miss Tx completion interrupts. +* We cannot check Tx success/failure for affected frames; just free +* the associated mbuf and release the associated node reference. +*/ + while (ring->tail != idx) { + txd = &ring->data[ring->tail]; + if (!txd->done) { + DPRINTF(("%s: missed Tx completion: tail=%d idx=%d\n", + __func__, ring->tail, idx)); + iwm_txd_done(sc, txd); + ring->queued--; + } + ring->tail = (ring->tail + 1) % IWM_TX_RING_COUNT; + } if (--ring->queued < IWM_TX_RING_LOMARK) { sc->qfullmsk &= ~(1 << ring->qid); blob - 5206d9ac72a829d4bc2fd307a1d1a79e6e14627e blob + ad0975b69dd3d3888f584dd759f5ebcf5dd8e940 --- sys/dev/pci/if_iwmvar.h +++ sys/dev/pci/if_iwmvar.h @@ -270,6 +270,7 @@ struct iwm_tx_ring { int qid; int queued; int cur; + int tail; };