On 2/27/26 9:29 AM, David Marchand via dev wrote:
> Prepare dpif-netdev to handle larger batches.
> 
> Metering does not need support for large batches, as it is invoked
> before actions that will produce large batches in next commits (ip reassembly,
> GSO).

It is possible for metering to go after ct(commit).  I think you could even
just add a meter action to your ipf test in patch 7 to reproduce.

There is a way to refactor the meter action to avoid having stack-allocated
arrays that depend on the batch size.  It would look somewhat like this:

1. Go over all the bands and subtract the full batch value.  Remember failed
   bands in a bitmap.  MAX_BANDS is 8, so the bitmap is simple.

2. Return if the bitmap is empty.

3. For each packet:
    For each failed band in the bitmap (ctz):
      Try to subtract, remember the highest failed band.
    Count band bytes and packets for the highest band.

4. Add counted band stats to the actual band.

Since we're iterating over the packets in the outer loop in this case,
there should be no need to store the highest bands per packet across
iterations, so simple temporary variables can be used.

Inner loop for the failed bands should be fast enough.
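
To make the steps above a bit more concrete, here is a rough sketch.  It is
not the dpif-netdev code: struct band, meter_run() and the byte-only token
accounting are made up for illustration, packets are reduced to an array of
sizes, and __builtin_ctz assumes GCC or Clang.  "Highest" is taken to mean
the failed band with the highest rate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BANDS 8

/* Hypothetical, simplified band.  Tokens are counted in bytes only. */
struct band {
    uint64_t rate;          /* Only used to rank the failed bands. */
    uint64_t bucket;        /* Tokens currently available. */
    uint64_t packet_count;  /* Packets that hit this band. */
    uint64_t byte_count;    /* Bytes that hit this band. */
};

/* Runs 'bands' over the packets described by 'sizes[0..n)'.  Sizes of the
 * packets that pass are compacted to the front of 'sizes'.  Returns the
 * number of packets kept.  No array here depends on 'n'. */
static size_t
meter_run(struct band *bands, size_t n_bands, uint32_t *sizes, size_t n)
{
    uint64_t hit_pkts[MAX_BANDS] = {0};
    uint64_t hit_bytes[MAX_BANDS] = {0};
    unsigned int failed = 0;
    uint64_t total = 0;
    size_t kept = 0;

    assert(n_bands <= MAX_BANDS);

    for (size_t i = 0; i < n; i++) {
        total += sizes[i];
    }

    /* Step 1: try the whole batch on every band, remember the failures. */
    for (size_t b = 0; b < n_bands; b++) {
        if (bands[b].bucket >= total) {
            bands[b].bucket -= total;
        } else {
            failed |= 1u << b;
        }
    }

    /* Step 2: every band had enough tokens for the whole batch. */
    if (!failed) {
        return n;
    }

    /* Step 3: packets in the outer loop, failed bands in the inner one. */
    for (size_t i = 0; i < n; i++) {
        int highest = -1;

        for (unsigned int map = failed; map; map &= map - 1) {
            int b = __builtin_ctz(map);

            if (bands[b].bucket >= sizes[i]) {
                bands[b].bucket -= sizes[i];
            } else if (highest < 0 || bands[b].rate > bands[highest].rate) {
                highest = b;
            }
        }
        if (highest >= 0) {
            hit_pkts[highest]++;
            hit_bytes[highest] += sizes[i];
        } else {
            sizes[kept++] = sizes[i];
        }
    }

    /* Step 4: fold the temporary counters into the real band stats. */
    for (size_t b = 0; b < n_bands; b++) {
        bands[b].packet_count += hit_pkts[b];
        bands[b].byte_count += hit_bytes[b];
    }
    return kept;
}
```

The only per-call storage is the bitmap plus two MAX_BANDS-sized counter
arrays, so the stack usage stays the same no matter how big the batch is.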

Alternatively, we could also split the large batch as you did in other
places before calling the actual meter code.

> (can someone with more knowledge about this code confirm?)
> 
> Split big batches into "small" NETDEV_MAX_BURST batches for recirculation,
> as some dpif-netdev input operations (flow extraction and AVX512 lookups)
> are using arrays (on the stack) sized for NETDEV_MAX_BURST burst of
> packets.

This is the most unfortunate part.  AVX512 is no more, but all the other
stuff is deeply rooted in NETDEV_MAX_BURST.  I can't think right now of a
good way to rework all of that, so I guess splitting on recirculation
is fine for now.
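
For reference, the splitting itself is just a chunked walk over the big
batch.  A minimal sketch, assuming the batch can be viewed as a plain array
of packet pointers; for_each_burst(), chunk_fn and count_chunk() are
hypothetical names, and NETDEV_MAX_BURST is redefined here only to keep the
example self-contained:

```c
#include <assert.h>
#include <stddef.h>

#define NETDEV_MAX_BURST 32

/* Hypothetical callback that consumes at most NETDEV_MAX_BURST packets. */
typedef void chunk_fn(void **pkts, size_t n, void *aux);

/* Feeds 'pkts[0..n)' to 'cb' in slices of at most NETDEV_MAX_BURST. */
static void
for_each_burst(void **pkts, size_t n, chunk_fn *cb, void *aux)
{
    assert(cb != NULL);

    for (size_t off = 0; off < n; off += NETDEV_MAX_BURST) {
        size_t len = n - off;

        if (len > NETDEV_MAX_BURST) {
            len = NETDEV_MAX_BURST;
        }
        cb(pkts + off, len, aux);
    }
}

/* Example consumer: only records what it has been given. */
struct chunk_stats {
    size_t calls;
    size_t max_len;
    size_t total;
};

static void
count_chunk(void **pkts, size_t n, void *aux)
{
    struct chunk_stats *st = aux;

    (void) pkts;
    st->calls++;
    st->total += n;
    if (n > st->max_len) {
        st->max_len = n;
    }
}
```

With 70 packets the consumer is called three times (32, 32 and 6 packets),
so code sized for NETDEV_MAX_BURST never sees more than that.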

> 
> Also update dpif-netdev output action, as it relies on structures sized
> against NETDEV_MAX_BURST (for stats/cycles tracking, and txq distribution).

Maybe this one split can be avoided?  We could try to track the growth
of the output_pkts batch and make the output_pkts_rxqs dynamic and
grow it at the same time.  The only place where the output_pkts can
grow seems to be dp_execute_output_chunk().  We could store the previous
size, then add_array(), check if the size changed and realloc the rxqs
array before updating the pointer in it.  This should allow us to avoid
extra splitting work.  WDYT?
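
Something along these lines, as a sketch only.  struct batch, its
'allocated' field, batch_add_array() and tx_port_queue() are stand-ins I
made up here, since the real batch API in this series may look different;
the point is just the "remember the size, add, realloc if it changed"
sequence:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable batch, standing in for the real one. */
struct batch {
    void **pkts;
    size_t count;
    size_t allocated;
};

/* Appends 'n' packets, growing the storage when needed. */
static void
batch_add_array(struct batch *b, void **pkts, size_t n)
{
    if (b->count + n > b->allocated) {
        size_t cap = b->allocated ? b->allocated : 32;
        void **p;

        while (cap < b->count + n) {
            cap *= 2;
        }
        p = realloc(b->pkts, cap * sizeof *p);
        assert(p != NULL);  /* Real code would use an aborting allocator. */
        b->pkts = p;
        b->allocated = cap;
    }
    memcpy(b->pkts + b->count, pkts, n * sizeof *pkts);
    b->count += n;
}

/* Hypothetical output port state: one rxq pointer per queued packet. */
struct tx_port {
    struct batch output_pkts;
    void **output_pkts_rxqs;
};

/* Queues 'n' packets received on 'rxq', keeping the rxqs array the same
 * size as the batch storage. */
static void
tx_port_queue(struct tx_port *p, void **pkts, size_t n, void *rxq)
{
    size_t prev_allocated = p->output_pkts.allocated;
    size_t prev_count = p->output_pkts.count;

    batch_add_array(&p->output_pkts, pkts, n);

    if (p->output_pkts.allocated != prev_allocated) {
        void **rxqs = realloc(p->output_pkts_rxqs,
                              p->output_pkts.allocated * sizeof *rxqs);

        assert(rxqs != NULL);
        p->output_pkts_rxqs = rxqs;
    }
    for (size_t i = prev_count; i < p->output_pkts.count; i++) {
        p->output_pkts_rxqs[i] = rxq;
    }
}
```

The realloc only happens when the batch itself had to grow, so the common
case of small batches pays nothing extra.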

Best regards, Ilya Maximets.