This commit optimizes the output action, by enabling the compiler to optimize the code better through reducing code complexity.
The core concept of this optimization is that the array-length checks have already been performed above the copying code, so can be removed. Removing of the per-packet length checks allows the compiler to auto-vectorize the stores using SIMD registers. Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com> --- v8: Add NEWS entry. --- NEWS | 1 + lib/dpif-netdev.c | 23 ++++++++++++++++++----- 2 files changed, 19 insertions(+), 5 deletions(-) diff --git a/NEWS b/NEWS index 5f1e3b5e0..2ffc155f9 100644 --- a/NEWS +++ b/NEWS @@ -13,6 +13,7 @@ Post-v2.15.0 * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. + * Optimize dp_netdev_output by enhancing compiler optimization potential. v2.15.0 - xx xxx xxxx --------------------- diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 5e83755d7..b2cf1bd46 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7254,12 +7254,25 @@ dp_execute_output_action(struct dp_netdev_pmd_thread *pmd, pmd->n_output_batches++; } - struct dp_packet *packet; - DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) { - p->output_pkts_rxqs[dp_packet_batch_size(&p->output_pkts)] = - pmd->ctx.last_rxq; - dp_packet_batch_add(&p->output_pkts, packet); + /* The above checks ensure that there is enough space in the output batch. + * Using dp_packet_batch_add() has a branch to check if the batch is full. + * This branch reduces the compiler's ability to optimize efficiently. The + * below code implements packet movement between batches without checks, + * with the required semantics of output batch perhaps containing packets. + */ + int batch_size = dp_packet_batch_size(packets_); + int out_batch_idx = dp_packet_batch_size(&p->output_pkts); + struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq; + struct dp_packet_batch *output_batch = &p->output_pkts; + + for (int i = 0; i < batch_size; i++) { + struct dp_packet *packet = packets_->packets[i]; + p->output_pkts_rxqs[out_batch_idx] = rxq; + output_batch->packets[out_batch_idx] = packet; + out_batch_idx++; } + output_batch->count += batch_size; + return true; } -- 2.25.1 _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev