> -----Original Message-----
> From: Loktionov, Aleksandr <[email protected]>
> Sent: Friday, December 12, 2025 6:18 PM
> To: Behera, Vivek (DI FA DSP ICC PRC1) <[email protected]>; Keller,
> Jacob E <[email protected]>; Nguyen, Anthony L
> <[email protected]>; Kitszel, Przemyslaw
> <[email protected]>; Fijalkowski, Maciej
> <[email protected]>; [email protected];
> [email protected]
> Cc: [email protected]
> Subject: RE: [PATCH iwl-net v1] igb: Fix trigger of incorrect irq in
> igb_xsk_wakeup
> 
> 
> 
> > -----Original Message-----
> > From: Behera, VIVEK <[email protected]>
> > Sent: Friday, December 12, 2025 5:26 PM
> > To: Loktionov, Aleksandr <[email protected]>; Keller,
> > Jacob E <[email protected]>; Nguyen, Anthony L
> > <[email protected]>; Kitszel, Przemyslaw
> > <[email protected]>; Fijalkowski, Maciej
> > <[email protected]>; [email protected];
> > [email protected]
> > Cc: [email protected]
> > Subject: RE: [PATCH iwl-net v1] igb: Fix trigger of incorrect irq in
> > igb_xsk_wakeup
> >
> >
> >
> > > -----Original Message-----
> > > From: Loktionov, Aleksandr <[email protected]>
> > > Sent: Friday, December 12, 2025 2:56 PM
> > > To: Behera, Vivek (DI FA DSP ICC PRC1) <[email protected]>;
> > > Keller, Jacob E <[email protected]>; Nguyen, Anthony L
> > > <[email protected]>; Kitszel, Przemyslaw
> > > <[email protected]>; Fijalkowski, Maciej
> > > <[email protected]>; [email protected];
> > > [email protected]
> > > Cc: [email protected]; Behera, Vivek (DI FA DSP ICC
> > > PRC1) <[email protected]>
> > > Subject: RE: [PATCH iwl-net v1] igb: Fix trigger of incorrect irq in
> > > igb_xsk_wakeup
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Vivek Behera <[email protected]>
> > > > Sent: Friday, December 12, 2025 2:15 PM
> > > > To: Loktionov, Aleksandr <[email protected]>; Keller,
> > > > Jacob E <[email protected]>; Nguyen, Anthony L
> > > > <[email protected]>; Kitszel, Przemyslaw
> > > > <[email protected]>; Fijalkowski, Maciej
> > > > <[email protected]>; [email protected];
> > > > [email protected]
> > > > Cc: [email protected]; Behera, Vivek
> > > > <[email protected]>
> > > > Subject: [PATCH iwl-net v1] igb: Fix trigger of incorrect irq in
> > > > igb_xsk_wakeup
> > > >
> > > > The current implementation in the igb_xsk_wakeup expects the Rx
> > and
> > > > Tx queues to share the same irq. This would lead to triggering of
> > > > incorrect irq in split irq configuration.
> > > > This patch addresses this issue which could impact environments
> > with
> > > > 2 active cpu cores or when the number of queues is reduced to 2 or
> > > > less
> > > >
> > > > cat /proc/interrupts | grep eno2
> > > >  167:          0          0          0          0 IR-PCI-MSIX-
> > > > 0000:08:00.0    0-edge      eno2
> > > >  168:          0          0          0          0 IR-PCI-MSIX-
> > > > 0000:08:00.0    1-edge      eno2-rx-0
> > > >  169:          0          0          0          0 IR-PCI-MSIX-
> > > > 0000:08:00.0    2-edge      eno2-rx-1
> > > >  170:          0          0          0          0 IR-PCI-MSIX-
> > > > 0000:08:00.0    3-edge      eno2-tx-0
> > > >  171:          0          0          0          0 IR-PCI-MSIX-
> > > > 0000:08:00.0    4-edge      eno2-tx-1
> > > >
> > > > Furthermore it uses the flags input argument to trigger either rx,
> > > > tx or both rx and tx irqs as specified in the ndo_xsk_wakeup api
> > > > documentation
> > > >
> > > > Fixes: 80f6ccf9f116 ("igb: Introduce XSK data structures and
> > > > helpers")
> > > > Signed-off-by: Vivek Behera <[email protected]>
> > > > ---
> > > >  drivers/net/ethernet/intel/igb/igb_xsk.c | 77
> > > > ++++++++++++++++++++++-
> > > > -
> > > >  1 file changed, 71 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/intel/igb/igb_xsk.c
> > > > b/drivers/net/ethernet/intel/igb/igb_xsk.c
> > > > index 30ce5fbb5b77..9f23e6740640 100644
> > > > --- a/drivers/net/ethernet/intel/igb/igb_xsk.c
> > > > +++ b/drivers/net/ethernet/intel/igb/igb_xsk.c
> > > > @@ -529,7 +529,9 @@ int igb_xsk_wakeup(struct net_device *dev, u32
> > > > qid, u32 flags)
> > > >         struct igb_adapter *adapter = netdev_priv(dev);
> > > >         struct e1000_hw *hw = &adapter->hw;
> > > >         struct igb_ring *ring;
> > > > +       struct igb_q_vector *q_vector;
> > > >         u32 eics = 0;
> > > > +       bool trigger_irq_no_msix = false;
> > > >
> > > >         if (test_bit(__IGB_DOWN, &adapter->state))
> > > >                 return -ENETDOWN;
> > > > @@ -537,13 +539,76 @@ int igb_xsk_wakeup(struct net_device *dev,
> > u32
> > > > qid, u32 flags)
> > > >         if (!igb_xdp_is_enabled(adapter))
> > > >                 return -EINVAL;
> > > >
> > > > -       if (qid >= adapter->num_tx_queues)
> > > > -               return -EINVAL;
> > > > -
> > > > -       ring = adapter->tx_ring[qid];
> > > > +       if ((flags & XDP_WAKEUP_RX) && (flags & XDP_WAKEUP_TX)) {
> > > > +               /* If both TX and RX need to be woken up */
> > > > +               /* check if queue pairs are active. */
> > > > +               if ((adapter->flags & IGB_FLAG_QUEUE_PAIRS)) {
> > > Extra parentheses in if().
> > Noted. Will remove in the next version
> > >
> > > > +                       /* Extract ring params from Rx. */
> > > > +                       if (qid >= adapter->num_rx_queues)
> > > > +                               return -EINVAL;
> > > > +                       ring = adapter->rx_ring[qid];
> > > > +               } else {
> > > > +                       /* Two irqs for Rx AND Tx need to be triggered 
> > > > */
> > > > +                       if (qid >= adapter->num_rx_queues)
> > > > +                               return -EINVAL;
> > > > +
> > > > +                       if (qid >= adapter->num_tx_queues)
> > > > +                               return -EINVAL;
> > > > +
> > > > +                       /* IRQ trigger preparation for Rx */
> > > > +                       ring = adapter->rx_ring[qid];
> > > > +                       if (!ring->xsk_pool)
> > > > +                               return -ENXIO;
> > > > +
> > > > +                       q_vector = ring->q_vector;
> > > > +                       if (!napi_if_scheduled_mark_missed(&q_vector-
> > > > >napi)) {
> > > > +                               /* Extend the BIT mask for eics */
> > > > +                               if (adapter->flags & IGB_FLAG_HAS_MSIX)
> > > > +                                       eics |= 
> > > > ring->q_vector->eims_value;
> > > > +                               else
> > > > +                                       trigger_irq_no_msix = true;
> > > > +                       }
> > > > +                       /* IRQ trigger preparation for Tx */
> > > > +                       ring = adapter->tx_ring[qid];
> > > > +                       if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring-
> > > > >flags))
> > > > +                               return -ENETDOWN;
> > > > +
> > > > +                       if (!ring->xsk_pool)
> > > > +                               return -ENXIO;
> > > Looks like violation of the API contract:
> > > When XDP_WAKEUP_RX | XDP_WAKEUP_TX are both set, if RX can be
> woken
> > > but TX cannot, the caller expects a clear error with no side
> > effects,
> > > not a half- prepared state.
> > > Shouldn't we validate both RX and TX rings before preparing any
> > interrupt state?
> > >
> > I am not sure if I understand correctly what you mean by a half-
> > prepared state.
> > Preparation of Tx and Rx purely includes(unless I missed something)
> > validating all the necessary conditions required to wakeup both Tx and
> > RX. Only after all checks are done tx and rx rings a single trigger is
> > executed either over EICS or E1000_ICS right at the end.
> > In my understanding the ndo_xsk_wakeup with RX and TX flags set
> > expects both the xsk pools to be woken up.
> > So how is it a violation of the API contract if we exit immediately on
> > error during our checks?
> >  Any failing check eliminates the success criteria defined by the API
> 
> Good day, Vivek
> 
> Thanks for the clarification. Let me explain the side effect concern more 
> clearly:
> 
> > Do you mean that the error should only be returned after doing all the
> > checks instead of exiting the function with the corresponding error
> > immediately when a check fails?
> 
> Yes, exactly. The issue is that napi_if_scheduled_mark_missed() is not just a
> "check" - it has side effects. When NAPI is already scheduled, calling this 
> function
> marks the queue as "missed," which modifies the NAPI subsystem's state. If we
> then fail the TX validation and return an error, we've left the RX NAPI in a 
> modified
> state without completing the operation.
> 
> The sequence should be:
> 1. Validate both RX and TX rings (xsk_pool existence, disabled flags, etc.) 2.
> Only if BOTH validations pass, call napi_if_scheduled_mark_missed() and
>    prepare interrupt state
> 3. Trigger interrupts
> 
> This ensures atomicity - either the entire operation succeeds, or no state is
> modified.

Hi, Alexander

Thanks. Now I understand that you were referring to the napi state in your 
original comment. Yes indeed we have to wait for all checks to finish
Before changing the napi state and triggering irqs. This would ensure that 
operation required by napi state change is executed.
I will add this to the next version
> 
> > >
> > > > +
> > > > +                       /* Retrieve the q_vector saved in the ring */
> > > > +                       q_vector = ring->q_vector;
> > > > +                       if (!napi_if_scheduled_mark_missed(&q_vector-
> > > > >napi)) {
> > > > +                               /* Extend the BIT mask for eics */
> > > > +                               if (adapter->flags & IGB_FLAG_HAS_MSIX)
> > > > +                                       eics |= 
> > > > ring->q_vector->eims_value;
> > > > +                               else
> > > > +                                       trigger_irq_no_msix = true;
> > > > +                       }
> > > > +                       /* Now we trigger the split irqs for Rx and Tx
> > > > over eics */
> > > > +                       if (eics)
> > > > +                               wr32(E1000_EICS, eics);
> > > > +                       else if (trigger_irq_no_msix)
> > > > +                               wr32(E1000_ICS, E1000_ICS_RXDMT0);
> > > Shouldn't we use a combined interrupt bit or set both RX and TX bits
> > here?
> > >   wr32(E1000_ICS, E1000_ICS_RXDMT0 | E1000_ICS_TXDW);
> > I think the bits for both RX and TX should only be used if there isn't
> > a combined interrupt bit that can trigger both RX and TX.
> > I am not sure what is the correct setting for this use case. What do
> > you think?
> 
> You're correct that we need both RX and TX bits when both wakeups are
> requested.
> E1000_ICS_RXDMT0 alone only triggers RX processing. Using:
> 
>     wr32(E1000_ICS, E1000_ICS_RXDMT0 | E1000_ICS_TXDW);
> 
> will trigger both RX and TX interrupt processing in legacy interrupt mode, 
> which is
> what the flags parameter requires.
 Okay. I will add this change to the next version
Thanks for your vital inputs,
Vivek
> 
> Thanks,
> Aleksandr
> 
> > > >
> > > > -       if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
> > > > -               return -ENETDOWN;
> > > > +                       return 0;
> > > > +               }
> > > > +       } else if (flags & XDP_WAKEUP_TX) {
> > > > +               if (qid >= adapter->num_tx_queues)
> > > > +                       return -EINVAL;
> > > > +               /* Get the ring params from Tx */
> > > > +               ring = adapter->tx_ring[qid];
> > > > +               if (test_bit(IGB_RING_FLAG_TX_DISABLED, &ring->flags))
> > > > +                       return -ENETDOWN;
> > > > +       } else if (flags & XDP_WAKEUP_RX) {
> > > > +               if (qid >= adapter->num_rx_queues)
> > > > +                       return -EINVAL;
> > > > +               /* Get the ring params from Rx */
> > > > +               ring = adapter->rx_ring[qid];
> > > > +       } else {
> > > > +               /* Invalid Flags */
> > > > +               return -EINVAL;
> > > > +       }
> > > >
> > > >         if (!READ_ONCE(ring->xsk_pool))
> > > >                 return -EINVAL;
> > > > --
> > > > 2.34.1

Reply via email to