On Thu, Jul 24, 2025 at 12:03:33AM +0000, Hay, Joshua A wrote:
> > On Thu, Jul 17, 2025 at 05:21:44PM -0700, Joshua Hay wrote:
> > > This series fixes a stability issue in the flow scheduling Tx send/clean
> > > path that results in a Tx timeout.
> > >
> > > The existing guardrails in the Tx path were not sufficient to prevent
> > > the driver from reusing completion tags that were still in flight (held
> > > by the HW).  This collision would cause the driver to erroneously clean
> > > the wrong packet thus leaving the descriptor ring in a bad state.
> > >
> > > The main point of this refactor is to replace the flow scheduling buffer
> > > ring with a large pool/array of buffers.  The completion tag then simply
> > > is the index into this array.  The driver tracks the free tags and pulls
> > > the next free one from a refillq.  The cleaning routines simply use the
> > > completion tag from the completion descriptor to index into the array to
> > > quickly find the buffers to clean.
> > >
> > > All of the code to support the refactor is added first to ensure traffic
> > > still passes with each patch.  The final patch then removes all of the
> > > obsolete stashing code.
> > >
> > > ---
> > > v2:
> > > - Add a new patch "idpf: simplify and fix splitq Tx packet rollback
> > >   error path" that fixes a bug in the error path. It also sets up
> > >   changes in patch 4 that are necessary to prevent a crash when a packet
> > >   rollback occurs using the buffer pool.
> > >
> > > v1:
> > > https://lore.kernel.org/intel-wired-lan/c6444d15-bc20-41a8-9230-
> > [email protected]/T/#maf9f464c598951ee860e5dd24ef8a45
> > 1a488c5a0
> > >
> > > Joshua Hay (6):
> > >   idpf: add support for Tx refillqs in flow scheduling mode
> > >   idpf: improve when to set RE bit logic
> > >   idpf: simplify and fix splitq Tx packet rollback error path
> > >   idpf: replace flow scheduling buffer ring with buffer pool
> > >   idpf: stop Tx if there are insufficient buffer resources
> > >   idpf: remove obsolete stashing code
> > >
> > >  .../ethernet/intel/idpf/idpf_singleq_txrx.c   |  61 +-
> > >  drivers/net/ethernet/intel/idpf/idpf_txrx.c   | 723 +++++++-----------
> > >  drivers/net/ethernet/intel/idpf/idpf_txrx.h   |  87 +--
> > >  3 files changed, 356 insertions(+), 515 deletions(-)
> > 
> > Hi Joshua, all,
> > 
> > Perhaps it is not followed much anymore, but at least according to [1]
> > patches for stable should not be more than 100 lines, with context.
> > 
> > This patch-set is an order of magnitude larger.
> > Can something be done to create a more minimal fix?
> > 
> > [1] https://docs.kernel.org/process/stable-kernel-rules.html#stable-kernel-
> > rules
> 
> Hi Simon,
> 
> It will be quite difficult to make this series smaller without any negative 
> side effects. Here's a summary of why certain patches are necessary or where 
> deferrals are possible (and the side effects):
> 
> "idpf: add support for Tx refillqs in flow scheduling mode" is required to 
> keep the Tx path lockless. Otherwise, we would have to use locking in hot 
> path to pass the tags freed in the cleaning thread back to whatever data 
> struct the sending thread pulls the tags from.
> 
> Without "idpf: improve when to set RE bit logic", the driver will be much 
> more susceptible to Tx timeouts when sending large packets.
> 
> "idpf: simplify and fix splitq Tx packet rollback error path" is a must to 
> setup for the new buffer chaining. The previous implementation rolled back 
> based on ring indexing, which will crash the system if a dma_mapping_error 
> occurs or we run out of tags (more on that below). The buffer tags (indexes) 
> pulled for a packet are not guaranteed to be contiguous, especially as 
> out-of-order completions are processed. 
> 
> "idpf: stop Tx if there are insufficient buffer resources" could possibly be 
> deferred, but that makes the rollback change above even more critical as we 
> lose an extra layer of protection from running out of tags. It's also one of 
> the smaller patches and won't make a significant difference in terms of size.
> 
> "idpf: remove obsolete stashing code" could also potentially be deferred but 
> is 95% removing obsolete code, which leaves a lot of dead code.

Thanks Joshua,

if there is justification, which is indeed the case,
then I withdraw my suggestion to create a smaller fix.

Reply via email to