Hi Cristian,

Please see my comments inline.
>
>
>> -----Original Message-----
>> From: Robert Sanford [mailto:rsanford2 at gmail.com]
>> Sent: Monday, March 28, 2016 9:52 PM
>> To: dev at dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu at intel.com>
>> Subject: [PATCH 4/4] port: fix ethdev writer burst too big
>>
>> For f_tx_bulk functions in rte_port_ethdev.c, we may unintentionally
>> send bursts larger than tx_burst_sz to the underlying ethdev.
>> Some PMDs (e.g., ixgbe) may truncate this request to their maximum
>> burst size, resulting in unnecessary enqueuing failures or ethdev
>> writer retries.
>
> Sending bursts larger than tx_burst_sz is actually intentional. The
> assumption is that NIC performance benefits from larger burst sizes, so
> tx_burst_sz is used as a minimal burst size requirement, not as a
> maximal or fixed one.
>
> I agree with you that a while ago the vector version of the IXGBE driver
> used to work the way you describe, but I don't think this is the case
> anymore. As an example, if the TX burst size is set to 32 and 48 packets
> are transmitted, then the PMD will TX all 48 packets (internally it can
> work in batches of 4, 8, 32, etc.; it should not matter), rather than
> TXing just 32 packets out of 48 and the user having to either discard or
> retry with the remaining 16 packets. I am CC-ing Steve Liang to confirm
> this.
>
> Is there any PMD that people can name that currently behaves the
> opposite way, i.e. given a burst of 48 packets for TX, accepts 32
> packets and discards the other 16?

Yes, I believe that IXGBE *still* truncates. What am I missing? :)

My interpretation of the latest vector TX burst function is that it
truncates bursts longer than txq->tx_rs_thresh. Here are the relevant
code snippets that show it lowering the number of packets (nb_pkts) to
enqueue (apologies in advance for the email client mangling the
indentation):

---
#define IXGBE_DEFAULT_TX_RSBIT_THRESH 32

static void
ixgbe_dev_info_get(struct rte_eth_dev *dev,
		struct rte_eth_dev_info *dev_info)
{
	...
	dev_info->default_txconf = (struct rte_eth_txconf) {
		...
		.tx_rs_thresh = IXGBE_DEFAULT_TX_RSBIT_THRESH,
		...
	};
	...
}

uint16_t
ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
		uint16_t nb_pkts)
{
	...
	/* cross rx_thresh boundary is not allowed */
	nb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);

	if (txq->nb_tx_free < txq->tx_free_thresh)
		ixgbe_tx_free_bufs(txq);

	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
	if (unlikely(nb_pkts == 0))
		return 0;
	...
	return nb_pkts;
}
---

>> We propose to fix this by moving the tx buffer flushing logic from
>> *after* the loop that puts all packets into the tx buffer, to *inside*
>> the loop, testing for a full burst when adding each packet.
>
> The issue I have with this approach is the introduction of a branch that
> has to be tested for each iteration of the loop, rather than once for
> the entire loop.
>
> The code branch where you add this is actually the slow(er) code path
> (where the local variable expr != 0), which is used for non-contiguous
> packet masks or bursts smaller than tx_burst_sz. Is there a particular
> reason you are interested in enabling this strategy (of using
> tx_burst_sz as a fixed burst size requirement) only on this code path?
> The reason I am asking is that the other, fast(er) code path (where
> expr == 0) also uses tx_burst_sz as a minimal requirement, and therefore
> it too can send bursts bigger than tx_burst_sz.

The reason we limit the burst size only in the "else" path is that we
also proposed to limit the ethdev TX burst in the "if (expr == 0)" path,
in patch 3/4.
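To make sure we are talking about the same two paths: as I read it, the
structure of rte_port_ethdev_writer_tx_bulk is roughly the following
(a simplified sketch from memory, not the verbatim code; error handling
of unsent packets is omitted):

---
static int
writer_tx_bulk_sketch(void *port, struct rte_mbuf **pkts,
		uint64_t pkts_mask)
{
	struct rte_port_ethdev_writer *p = port;
	uint64_t bsz_mask = p->bsz_mask; /* 1 << (tx_burst_sz - 1) */

	/* expr == 0 iff pkts_mask is a contiguous run of low-order bits
	 * containing at least tx_burst_sz packets. */
	uint64_t expr = (pkts_mask & (pkts_mask + 1)) |
			((pkts_mask & bsz_mask) ^ bsz_mask);

	if (expr == 0) {
		/* Fast path: hand the whole burst (>= tx_burst_sz,
		 * possibly larger) straight to the PMD. */
		uint64_t n_pkts = __builtin_popcountll(pkts_mask);

		if (p->tx_buf_count)
			send_burst(p);
		rte_eth_tx_burst(p->port_id, p->queue_id, pkts, n_pkts);
	} else {
		/* Slow path: buffer packet by packet, flush once after
		 * the loop when >= tx_burst_sz packets are buffered. */
		uint32_t tx_buf_count = p->tx_buf_count;

		for ( ; pkts_mask != 0; ) {
			uint32_t pkt_index = __builtin_ctzll(pkts_mask);

			p->tx_buf[tx_buf_count++] = pkts[pkt_index];
			pkts_mask &= ~(1LLU << pkt_index);
		}

		p->tx_buf_count = tx_buf_count;
		if (tx_buf_count >= p->tx_burst_sz)
			send_burst(p);
	}

	return 0;
}
---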
>
>
>> Signed-off-by: Robert Sanford <rsanford at akamai.com>
>> ---
>>  lib/librte_port/rte_port_ethdev.c |   20 ++++++++++----------
>>  1 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/lib/librte_port/rte_port_ethdev.c
>> b/lib/librte_port/rte_port_ethdev.c
>> index 3fb4947..1283338 100644
>> --- a/lib/librte_port/rte_port_ethdev.c
>> +++ b/lib/librte_port/rte_port_ethdev.c
>> @@ -151,7 +151,7 @@ static int rte_port_ethdev_reader_stats_read(void
>> *port,
>>  struct rte_port_ethdev_writer {
>>  	struct rte_port_out_stats stats;
>>
>> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>>  	uint32_t tx_burst_sz;
>>  	uint16_t tx_buf_count;
>>  	uint64_t bsz_mask;
>> @@ -257,11 +257,11 @@ rte_port_ethdev_writer_tx_bulk(void *port,
>>  			p->tx_buf[tx_buf_count++] = pkt;
>>  			RTE_PORT_ETHDEV_WRITER_STATS_PKTS_IN_ADD(p, 1);
>>  			pkts_mask &= ~pkt_mask;
>> -		}
>>
>> -		p->tx_buf_count = tx_buf_count;
>> -		if (tx_buf_count >= p->tx_burst_sz)
>> -			send_burst(p);
>> +			p->tx_buf_count = tx_buf_count;
>> +			if (tx_buf_count >= p->tx_burst_sz)
>> +				send_burst(p);
>> +		}
>>  	}
>
> One observation here: if we enable this proposal (which I have an issue
> with, due to executing the branch per loop iteration rather than once
> per entire loop), it also eliminates the buffer overflow issue flagged
> by you in the other email :), so there is no need to e.g. double the
> size of the port's internal buffer (tx_buf).

Not exactly correct: we suggested doubling tx_buf[] for the *ring*
writers. Here (in the hunks above) we suggest the opposite: *reduce* the
size of the *ethdev* tx_buf[], because we never expect to buffer more
than a full burst.

You are correct about the additional branch per loop iteration. On the
other hand, the proposed change is simpler than something like this:
compute how many more packets we need to complete a full burst, copy
them to tx_buf[], send_burst(), and then copy the rest to tx_buf[].
Either way is acceptable to me.

>>  	return 0;
>> @@ -328,7 +328,7 @@ static int rte_port_ethdev_writer_stats_read(void
>> *port,
>>  struct rte_port_ethdev_writer_nodrop {
>>  	struct rte_port_out_stats stats;
>>
>> -	struct rte_mbuf *tx_buf[2 * RTE_PORT_IN_BURST_SIZE_MAX];
>> +	struct rte_mbuf *tx_buf[RTE_PORT_IN_BURST_SIZE_MAX];
>>  	uint32_t tx_burst_sz;
>>  	uint16_t tx_buf_count;
>>  	uint64_t bsz_mask;
>> @@ -466,11 +466,11 @@ rte_port_ethdev_writer_nodrop_tx_bulk(void
>> *port,
>>  			p->tx_buf[tx_buf_count++] = pkt;
>>  			RTE_PORT_ETHDEV_WRITER_NODROP_STATS_PKTS_IN_ADD(p, 1);
>>  			pkts_mask &= ~pkt_mask;
>> -		}
>>
>> -		p->tx_buf_count = tx_buf_count;
>> -		if (tx_buf_count >= p->tx_burst_sz)
>> -			send_burst_nodrop(p);
>> +			p->tx_buf_count = tx_buf_count;
>> +			if (tx_buf_count >= p->tx_burst_sz)
>> +				send_burst_nodrop(p);
>> +		}
>>  	}
>>
>>  	return 0;
>> --
>> 1.7.1
>

--
Regards,
Robert
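P.S. For concreteness, the "complete a full burst, then buffer the rest"
alternative mentioned above might look roughly like this (an untested
sketch; it assumes a simple pkts/n_pkts array view of the input for
brevity, and that send_burst() transmits all buffered packets and resets
p->tx_buf_count, as the existing helper does):

---
/* Untested sketch: top up tx_buf[] to exactly tx_burst_sz, flush full
 * bursts while enough packets remain, then buffer the leftovers. */
static inline void
tx_bulk_fill_then_flush(struct rte_port_ethdev_writer *p,
		struct rte_mbuf **pkts, uint32_t n_pkts)
{
	uint32_t i = 0;

	/* While a full burst can be formed, fill and flush it. */
	while (p->tx_buf_count + (n_pkts - i) >= p->tx_burst_sz) {
		while (p->tx_buf_count < p->tx_burst_sz)
			p->tx_buf[p->tx_buf_count++] = pkts[i++];
		send_burst(p); /* always called with a full burst */
	}

	/* Buffer the remainder (fewer than tx_burst_sz packets). */
	while (i < n_pkts)
		p->tx_buf[p->tx_buf_count++] = pkts[i++];
}
---

This moves the flush test out of the per-packet loop (it runs once per
burst instead), at the cost of slightly more bookkeeping.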