> On Thu, Jul 17, 2025 at 05:21:44PM -0700, Joshua Hay wrote:
> > This series fixes a stability issue in the flow scheduling Tx send/clean
> > path that results in a Tx timeout.
> >
> > The existing guardrails in the Tx path were not sufficient to prevent
> > the driver from reusing completion tags that were still in flight (held
> > by the HW).  This collision would cause the driver to erroneously clean
> > the wrong packet thus leaving the descriptor ring in a bad state.
> >
> > The main point of this refactor is to replace the flow scheduling buffer
> > ring with a large pool/array of buffers.  The completion tag then simply
> > is the index into this array.  The driver tracks the free tags and pulls
> > the next free one from a refillq.  The cleaning routines simply use the
> > completion tag from the completion descriptor to index into the array to
> > quickly find the buffers to clean.
> >
> > All of the code to support the refactor is added first to ensure traffic
> > still passes with each patch.  The final patch then removes all of the
> > obsolete stashing code.
> >
> > ---
> > v2:
> > - Add a new patch "idpf: simplify and fix splitq Tx packet rollback
> >   error path" that fixes a bug in the error path. It also sets up
> >   changes in patch 4 that are necessary to prevent a crash when a packet
> >   rollback occurs using the buffer pool.
> >
> > v1:
> > https://lore.kernel.org/intel-wired-lan/c6444d15-bc20-41a8-9230-
> [email protected]/T/#maf9f464c598951ee860e5dd24ef8a45
> 1a488c5a0
> >
> > Joshua Hay (6):
> >   idpf: add support for Tx refillqs in flow scheduling mode
> >   idpf: improve when to set RE bit logic
> >   idpf: simplify and fix splitq Tx packet rollback error path
> >   idpf: replace flow scheduling buffer ring with buffer pool
> >   idpf: stop Tx if there are insufficient buffer resources
> >   idpf: remove obsolete stashing code
> >
> >  .../ethernet/intel/idpf/idpf_singleq_txrx.c   |  61 +-
> >  drivers/net/ethernet/intel/idpf/idpf_txrx.c   | 723 +++++++-----------
> >  drivers/net/ethernet/intel/idpf/idpf_txrx.h   |  87 +--
> >  3 files changed, 356 insertions(+), 515 deletions(-)
> 
> Hi Joshua, all,
> 
> Perhaps it is not followed much anymore, but at least according to [1]
> patches for stable should not be more than 100 lines, with context.
> 
> This patch-set is an order of magnitude larger.
> Can something be done to create a more minimal fix?
> 
> [1] https://docs.kernel.org/process/stable-kernel-rules.html#stable-kernel-
> rules

Hi Simon,

It will be quite difficult to make this series smaller without any negative 
side effects. Here's a summary of why certain patches are necessary or where 
deferrals are possible (and the side effects):

"idpf: add support for Tx refillqs in flow scheduling mode" is required to keep 
the Tx path lockless. Otherwise, we would have to use locking in hot path to 
pass the tags freed in the cleaning thread back to whatever data struct the 
sending thread pulls the tags from.

Without "idpf: improve when to set RE bit logic", the driver will be much more 
susceptible to Tx timeouts when sending large packets.

"idpf: simplify and fix splitq Tx packet rollback error path" is a must to 
setup for the new buffer chaining. The previous implementation rolled back 
based on ring indexing, which will crash the system if a dma_mapping_error 
occurs or we run out of tags (more on that below). The buffer tags (indexes) 
pulled for a packet are not guaranteed to be contiguous, especially as 
out-of-order completions are processed. 

"idpf: stop Tx if there are insufficient buffer resources" could possibly be 
deferred, but that makes the rollback change above even more critical as we 
lose an extra layer of protection from running out of tags. It's also one of 
the smaller patches and won't make a significant difference in terms of size.

"idpf: remove obsolete stashing code" could also potentially be deferred but is 
95% removing obsolete code, which leaves a lot of dead code.

Thanks!
Josh

Reply via email to