Re: [dpdk-dev] Retire x86 32 bit?
On 4/17/18 4:46 PM, Stephen Hemminger wrote: On Tue, 17 Apr 2018 13:01:14 -0700 Jim Murphy wrote: Still used in certain memory constrained environments. On Tue, Apr 17, 2018 at 11:39 AM, David Harton (dharton) wrote: It is used and tested in production and non-production environments. Regards, Dave -Original Message- From: dev On Behalf Of Stephen Hemminger Sent: Tuesday, April 17, 2018 2:31 PM To: dev@dpdk.org Subject: [dpdk-dev] Retire x86 32 bit? I wonder if x86 32 bit is still useful? Many distributions no longer support it, and not sure if it is tested througly by anyone. Maybe time to deprecate it (gradually)? Pure 32 bit, or x86-64 instructions and registers used in 32 bit mode (which can be faster). . Pure 32bit in our case. We do not use x32 ABI. -Roger
Re: [dpdk-dev] [PATCH v2] net/e1000: correct VLAN tag byte order for i35x LB packets
On 10/25/17 4:22 PM, Ferruh Yigit wrote: On 10/25/2017 1:16 PM, Bruce Richardson wrote: On Wed, Oct 25, 2017 at 11:11:08AM -0700, Ferruh Yigit wrote: On 10/23/2017 10:42 AM, Roger B. Melton wrote: On 10/20/17 3:04 PM, Ferruh Yigit wrote: On 10/12/2017 10:24 AM, Roger B Melton wrote: When copying VLAN tags from the RX descriptor to the vlan_tci field in the mbuf header, igb_rxtx.c:eth_igb_recv_pkts() and eth_igb_recv_scattered_pkts() both assume that the VLAN tag is always little endian. While i350, i354 and /i350vf VLAN non-loopback packets are stored little endian, VLAN tags in loopback packets for those devices are big endian. For i350, i354 and i350vf VLAN loopback packets, swap the tag when copying from the RX descriptor to the mbuf header. This will ensure that the mbuf vlan_tci is always little endian. Signed-off-by: Roger B Melton <...> @@ -946,9 +954,16 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, rxm->hash.rss = rxd.wb.lower.hi_dword.rss; hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data); - /* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */ - rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); - + /* +* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is +* set in the pkt_flags field and must be in CPU byte order. +*/ + if ((staterr & rte_cpu_to_le_32(E1000_RXDEXT_STATERR_LB)) && + (rxq->flags & IGB_RXQ_FLAG_LB_BSWAP_VLAN)) { This is adding more condition checks into Rx path. What is the performance cost of this addition? I have not measured the performance cost, but I can collect data. What specifically are you looking for? To be clear the current implementation incorrect as it does not normalize the vlan tag to CPU byte order before copying it into mbuf and applications have no visibility to determine if the tag in the mbuf is big or little endian. Do you have any suggestions for an alternative approach to avoid rx patch checks? No suggestion indeed. And correctness matters. But this add a cost and I wonder how much it is, based on that result it may be possible to do more investigation for alternate solutions or trade-offs. Konstantin, Bruce, Wenzhuo, What do you think, do you have any comment? For a 1G driver, is performance really that big an issue? I don't know. So is this an Ack from you for the patch? I can tell you that from the perspective of my application the performance impact for 1G is not a concern. FWIW, I did go through a few iterations with Wenzhou to minimize the performance impact before we settled on this implementation, and Wenzhou did Ack it btw. I'm hoping we can get this into 17.11. Thanks, -Roger Unless you have a *lot* of 1G ports, I would expect most platforms not to notice an extra couple of cycles when dealing with 1G line rates. /Bruce .
Re: [dpdk-dev] [PATCH v2] net/e1000: correct VLAN tag byte order for i35x LB packets
On 10/20/17 3:04 PM, Ferruh Yigit wrote: On 10/12/2017 10:24 AM, Roger B Melton wrote: When copying VLAN tags from the RX descriptor to the vlan_tci field in the mbuf header, igb_rxtx.c:eth_igb_recv_pkts() and eth_igb_recv_scattered_pkts() both assume that the VLAN tag is always little endian. While i350, i354 and /i350vf VLAN non-loopback packets are stored little endian, VLAN tags in loopback packets for those devices are big endian. For i350, i354 and i350vf VLAN loopback packets, swap the tag when copying from the RX descriptor to the mbuf header. This will ensure that the mbuf vlan_tci is always little endian. Signed-off-by: Roger B Melton <...> @@ -946,9 +954,16 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, rxm->hash.rss = rxd.wb.lower.hi_dword.rss; hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data); - /* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */ - rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); - + /* +* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is +* set in the pkt_flags field and must be in CPU byte order. +*/ + if ((staterr & rte_cpu_to_le_32(E1000_RXDEXT_STATERR_LB)) && + (rxq->flags & IGB_RXQ_FLAG_LB_BSWAP_VLAN)) { This is adding more condition checks into Rx path. What is the performance cost of this addition? I have not measured the performance cost, but I can collect data. What specifically are you looking for? To be clear the current implementation incorrect as it does not normalize the vlan tag to CPU byte order before copying it into mbuf and applications have no visibility to determine if the tag in the mbuf is big or little endian. Do you have any suggestions for an alternative approach to avoid rx patch checks? Thanks, Roger + rxm->vlan_tci = rte_be_to_cpu_16(rxd.wb.upper.vlan); + } else { + rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); + } pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(rxq, hlen_type_rss); pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr); pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr); <...> .
[dpdk-dev] [PATCH v2] net/e1000: correct VLAN tag byte order for i35x LB packets
When copying VLAN tags from the RX descriptor to the vlan_tci field in the mbuf header, igb_rxtx.c:eth_igb_recv_pkts() and eth_igb_recv_scattered_pkts() both assume that the VLAN tag is always little endian. While i350, i354 and /i350vf VLAN non-loopback packets are stored little endian, VLAN tags in loopback packets for those devices are big endian. For i350, i354 and i350vf VLAN loopback packets, swap the tag when copying from the RX descriptor to the mbuf header. This will ensure that the mbuf vlan_tci is always little endian. Signed-off-by: Roger B Melton --- v2: * PF: Limit swap to i350 and i354 * VF: Limit swap to i350VF drivers/net/e1000/igb_rxtx.c | 55 1 file changed, 50 insertions(+), 5 deletions(-) diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c index 1c80a2a..3eb547b 100644 --- a/drivers/net/e1000/igb_rxtx.c +++ b/drivers/net/e1000/igb_rxtx.c @@ -105,6 +105,13 @@ struct igb_tx_entry { }; /** + * rx queue flags + */ +enum igb_rxq_flags { + IGB_RXQ_FLAG_LB_BSWAP_VLAN = 0x01, +}; + +/** * Structure associated with each RX queue. */ struct igb_rx_queue { @@ -128,6 +135,7 @@ struct igb_rx_queue { uint8_t wthresh;/**< Write-back threshold register. */ uint8_t crc_len;/**< 0 if CRC stripped, 4 otherwise. */ uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. */ + uint32_tflags; /**< RX flags. */ }; /** @@ -946,9 +954,16 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, rxm->hash.rss = rxd.wb.lower.hi_dword.rss; hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data); - /* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */ - rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); - + /* +* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is +* set in the pkt_flags field and must be in CPU byte order. +*/ + if ((staterr & rte_cpu_to_le_32(E1000_RXDEXT_STATERR_LB)) && + (rxq->flags & IGB_RXQ_FLAG_LB_BSWAP_VLAN)) { + rxm->vlan_tci = rte_be_to_cpu_16(rxd.wb.upper.vlan); + } else { + rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); + } pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(rxq, hlen_type_rss); pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr); pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr); @@ -1181,9 +1196,16 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, /* * The vlan_tci field is only valid when PKT_RX_VLAN_PKT is -* set in the pkt_flags field. +* set in the pkt_flags field and must be in CPU byte order. */ - first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); + if ((staterr & rte_cpu_to_le_32(E1000_RXDEXT_STATERR_LB)) && + (rxq->flags & IGB_RXQ_FLAG_LB_BSWAP_VLAN)) { + first_seg->vlan_tci = + rte_be_to_cpu_16(rxd.wb.upper.vlan); + } else { + first_seg->vlan_tci = + rte_le_to_cpu_16(rxd.wb.upper.vlan); + } hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data); pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(rxq, hlen_type_rss); pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr); @@ -2278,6 +2300,18 @@ eth_igb_rx_init(struct rte_eth_dev *dev) rxq = dev->data->rx_queues[i]; + rxq->flags = 0; + /* +* i350 and i354 vlan packets have vlan tags byte swapped. +*/ + if ((hw->mac.type == e1000_i350) || + (hw->mac.type == e1000_i354)) { + rxq->flags |= IGB_RXQ_FLAG_LB_BSWAP_VLAN; + PMD_INIT_LOG(DEBUG, "IGB rx vlan bswap required"); + } else { + PMD_INIT_LOG(DEBUG, "IGB rx vlan bswap not required"); + } + /* Allocate buffers for descriptor rings and set up queue */ ret = igb_alloc_rx_queue_mbufs(rxq); if (ret) @@ -2557,6 +2591,17 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev) rxq = dev->data->rx_queues[i]; + rxq->flags = 0; + /* +* i350VF LB vlan packets have vlan tags byte swapped. +*/ + if (hw->mac.type == e1000_vfadapt_i350) { + rxq->flags |= I
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/i40e: fix unexpected mbuf free in vPMD
On 10/10/17 7:47 AM, Roger B. Melton wrote: On 10/10/17 7:39 AM, Bruce Richardson wrote: On Tue, Oct 10, 2017 at 07:05:33AM -0400, Roger B. Melton wrote: Hi Bruce, I can. It will take a day or 2 to get the results. Regards, Roger Thanks. Keep us posted. We want to ensure we have the best fix possible for the issue in this release. Thanks for finding a better solution ;-). I'll let you know the results when I have them, just wanted you to know it may take a couple of days to turn around. -Roger Confirmed that this patch does address the issue. Thanks again Bruce/Qi for finding a better alternative. Regards, Roger /Bruce On 10/10/17 4:48 AM, Bruce Richardson wrote: +Roger Melton On Tue, Oct 10, 2017 at 09:22:05AM -0400, Qi Zhang wrote: vPMD tx does not set sw_ring's mbuf to NULL after free it. So to prevent same mbuf be free again, we need more carefully check in i40e_tx_queue_release_mbufs. Fixes: b4669bb95038 ("i40e: add vector Tx") Cc: sta...@dpdk.org Signed-off-by: Qi Zhang --- v2: - fix at i40e_tx_queue_release_mbufs so no performance impact. This patch also supercedes this one, http://dpdk.org/dev/patchwork/patch/29814/ right? Roger, can you please confirm that this alternative fix works in your testing and that your patch is no longer necessary too. Thanks, /Bruce . . .
Re: [dpdk-dev] [PATCH] net/e1000: i350 VLAN tags in loopback packets are big endian
Hi Wenzhou, On 10/10/17 9:32 PM, Lu, Wenzhuo wrote: Hi Roger, -Original Message- From: Roger B Melton [mailto:rmel...@cisco.com] Sent: Saturday, October 7, 2017 7:34 AM To: Lu, Wenzhuo Cc: dev@dpdk.org; Roger B Melton Subject: [PATCH] net/e1000: i350 VLAN tags in loopback packets are big endian When copying VLAN tags from the RX descriptor to the vlan_tci field in the mbuf header, igb_rxtx.c:eth_igb_recv_pkts() and eth_igb_recv_scattered_pkts() both assume that the VLAN tag is always little endian. While i350 VLAN non-loopback packets are stored little endian, VLAN tags for i350 VLAN loopback packets are big endian. For i350 VLAN loopback packets, swap the tag when copying from the RX descriptor to the mbuf header. This will ensure that the mbuf vlan_tci is always little endian. Signed-off-by: Roger B Melton --- drivers/net/e1000/igb_rxtx.c | 56 1 file changed, 51 insertions(+), 5 deletions(-) diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c index 1c80a2a..8432d8b 100644 --- a/drivers/net/e1000/igb_rxtx.c +++ b/drivers/net/e1000/igb_rxtx.c @@ -2278,6 +2300,18 @@ eth_igb_rx_init(struct rte_eth_dev *dev) rxq = dev->data->rx_queues[i]; + rxq->flags = 0; + /* +* The i210, i211, i350, i354 and i350VF LB vlan packets +* have vlan tags byte swapped. +*/ + if (hw->mac.type >= e1000_i350) { Many thanks for the patch. I think we need to check i350 and i354 here as LB bit is only meaningful on these NICs. Agreed, I'll change this to: if ((hw->mac.type == e1000_i350) || (hw->mac.type == e1000_i354)) {... + rxq->flags |= IGB_RXQ_FLAG_LB_BSWAP_VLAN; + PMD_INIT_LOG(DEBUG, "IGB rx vlan bswap required"); + } else { + PMD_INIT_LOG(DEBUG, "IGB rx vlan bswap not required"); + } + /* Allocate buffers for descriptor rings and set up queue */ ret = igb_alloc_rx_queue_mbufs(rxq); if (ret) @@ -2557,6 +2591,18 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev) rxq = dev->data->rx_queues[i]; + rxq->flags = 0; + /* +* The i210, i211, i350, i354 and i350VF LB vlan packets +* have vlan tags byte swapped. +*/ + if (hw->mac.type >= e1000_i350) { We need to check e1000_vfadapt_i350 here to exclude e1000_vfadapt. Agreed, I'll change this to: if (hw->mac.type == e1000_vfadapt_i350) {... I'll make these changes and submit an updated patch. + rxq->flags |= IGB_RXQ_FLAG_LB_BSWAP_VLAN; + PMD_INIT_LOG(DEBUG, "IGB rx vlan bswap required"); + } else { + PMD_INIT_LOG(DEBUG, "IGB rx vlan bswap not required"); + } + /* Allocate buffers for descriptor rings and set up queue */ ret = igb_alloc_rx_queue_mbufs(rxq); if (ret) -- 2.10.3.dirty .
Re: [dpdk-dev] [PATCH] net/i40e: fix unexpected mbuf free in vPMD
On 10/10/17 7:39 AM, Bruce Richardson wrote: On Tue, Oct 10, 2017 at 07:05:33AM -0400, Roger B. Melton wrote: Hi Bruce, I can. It will take a day or 2 to get the results. Regards, Roger Thanks. Keep us posted. We want to ensure we have the best fix possible for the issue in this release. Thanks for finding a better solution ;-). I'll let you know the results when I have them, just wanted you to know it may take a couple of days to turn around. -Roger /Bruce On 10/10/17 4:48 AM, Bruce Richardson wrote: +Roger Melton On Tue, Oct 10, 2017 at 09:22:05AM -0400, Qi Zhang wrote: vPMD tx does not set sw_ring's mbuf to NULL after free it. So to prevent same mbuf be free again, we need more carefully check in i40e_tx_queue_release_mbufs. Fixes: b4669bb95038 ("i40e: add vector Tx") Cc: sta...@dpdk.org Signed-off-by: Qi Zhang --- v2: - fix at i40e_tx_queue_release_mbufs so no performance impact. This patch also supercedes this one, http://dpdk.org/dev/patchwork/patch/29814/ right? Roger, can you please confirm that this alternative fix works in your testing and that your patch is no longer necessary too. Thanks, /Bruce . .
Re: [dpdk-dev] [PATCH] net/i40e: fix unexpected mbuf free in vPMD
Hi Bruce, I can. It will take a day or 2 to get the results. Regards, Roger On 10/10/17 4:48 AM, Bruce Richardson wrote: +Roger Melton On Tue, Oct 10, 2017 at 09:22:05AM -0400, Qi Zhang wrote: vPMD tx does not set sw_ring's mbuf to NULL after free it. So to prevent same mbuf be free again, we need more carefully check in i40e_tx_queue_release_mbufs. Fixes: b4669bb95038 ("i40e: add vector Tx") Cc: sta...@dpdk.org Signed-off-by: Qi Zhang --- v2: - fix at i40e_tx_queue_release_mbufs so no performance impact. This patch also supercedes this one, http://dpdk.org/dev/patchwork/patch/29814/ right? Roger, can you please confirm that this alternative fix works in your testing and that your patch is no longer necessary too. Thanks, /Bruce .
[dpdk-dev] [PATCH] net/e1000: i350 VLAN tags in loopback packets are big endian
When copying VLAN tags from the RX descriptor to the vlan_tci field in the mbuf header, igb_rxtx.c:eth_igb_recv_pkts() and eth_igb_recv_scattered_pkts() both assume that the VLAN tag is always little endian. While i350 VLAN non-loopback packets are stored little endian, VLAN tags for i350 VLAN loopback packets are big endian. For i350 VLAN loopback packets, swap the tag when copying from the RX descriptor to the mbuf header. This will ensure that the mbuf vlan_tci is always little endian. Signed-off-by: Roger B Melton --- drivers/net/e1000/igb_rxtx.c | 56 1 file changed, 51 insertions(+), 5 deletions(-) diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c index 1c80a2a..8432d8b 100644 --- a/drivers/net/e1000/igb_rxtx.c +++ b/drivers/net/e1000/igb_rxtx.c @@ -105,6 +105,13 @@ struct igb_tx_entry { }; /** + * rx queue flags + */ +enum igb_rxq_flags { + IGB_RXQ_FLAG_LB_BSWAP_VLAN = 0x01, +}; + +/** * Structure associated with each RX queue. */ struct igb_rx_queue { @@ -128,6 +135,7 @@ struct igb_rx_queue { uint8_t wthresh;/**< Write-back threshold register. */ uint8_t crc_len;/**< 0 if CRC stripped, 4 otherwise. */ uint8_t drop_en; /**< If not 0, set SRRCTL.Drop_En. */ + uint32_tflags; /**< RX flags. */ }; /** @@ -946,9 +954,16 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, rxm->hash.rss = rxd.wb.lower.hi_dword.rss; hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data); - /* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */ - rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); - + /* +* The vlan_tci field is only valid when PKT_RX_VLAN_PKT is +* set in the pkt_flags field and must be in CPU byte order. +*/ + if ((staterr & rte_cpu_to_le_32(E1000_RXDEXT_STATERR_LB)) && + (rxq->flags & IGB_RXQ_FLAG_LB_BSWAP_VLAN)) { + rxm->vlan_tci = rte_be_to_cpu_16(rxd.wb.upper.vlan); + } else { + rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); + } pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(rxq, hlen_type_rss); pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr); pkt_flags = pkt_flags | rx_desc_error_to_pkt_flags(staterr); @@ -1181,9 +1196,16 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, /* * The vlan_tci field is only valid when PKT_RX_VLAN_PKT is -* set in the pkt_flags field. +* set in the pkt_flags field and must be in CPU byte order. */ - first_seg->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); + if ((staterr & rte_cpu_to_le_32(E1000_RXDEXT_STATERR_LB)) && + (rxq->flags & IGB_RXQ_FLAG_LB_BSWAP_VLAN)) { + first_seg->vlan_tci = + rte_be_to_cpu_16(rxd.wb.upper.vlan); + } else { + first_seg->vlan_tci = + rte_le_to_cpu_16(rxd.wb.upper.vlan); + } hlen_type_rss = rte_le_to_cpu_32(rxd.wb.lower.lo_dword.data); pkt_flags = rx_desc_hlen_type_rss_to_pkt_flags(rxq, hlen_type_rss); pkt_flags = pkt_flags | rx_desc_status_to_pkt_flags(staterr); @@ -2278,6 +2300,18 @@ eth_igb_rx_init(struct rte_eth_dev *dev) rxq = dev->data->rx_queues[i]; + rxq->flags = 0; + /* +* The i210, i211, i350, i354 and i350VF LB vlan packets +* have vlan tags byte swapped. +*/ + if (hw->mac.type >= e1000_i350) { + rxq->flags |= IGB_RXQ_FLAG_LB_BSWAP_VLAN; + PMD_INIT_LOG(DEBUG, "IGB rx vlan bswap required"); + } else { + PMD_INIT_LOG(DEBUG, "IGB rx vlan bswap not required"); + } + /* Allocate buffers for descriptor rings and set up queue */ ret = igb_alloc_rx_queue_mbufs(rxq); if (ret) @@ -2557,6 +2591,18 @@ eth_igbvf_rx_init(struct rte_eth_dev *dev) rxq = dev->data->rx_queues[i]; + rxq->flags = 0; + /* +* The i210, i211, i350, i354 and i350VF LB vlan packets +* have vlan tags byte swapped. +*/ + if (hw->mac.type >= e1000_i350) { + rxq->flags |= IGB_RXQ_FLAG_LB_BSWAP_VLAN; + PMD_INIT_LOG(DEBUG, &quo
[dpdk-dev] [PATCH v2] net/i40e: improve i40evf buffer cleanup in Tx vector mode
Signed-off-by: Roger B Melton --- v2 - Same content as v1, but properly signed-off. i40evf tx vector logic frees mbufs, but it does not remove the mbufs from software rings which leads to double frees. This change corrects that oversight. I've validated this fix within our application. drivers/net/i40e/i40e_rxtx_vec_common.h | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 39a6da0..fdc6fce 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -127,6 +127,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq) if (likely(m != NULL)) { free[0] = m; nb_free = 1; + txep[0].mbuf = NULL; for (i = 1; i < n; i++) { m = rte_pktmbuf_prefree_seg(txep[i].mbuf); if (likely(m != NULL)) { @@ -139,14 +140,17 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq) free[0] = m; nb_free = 1; } + txep[i].mbuf = NULL; } } rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free); } else { + txep[0].mbuf = NULL; for (i = 1; i < n; i++) { m = rte_pktmbuf_prefree_seg(txep[i].mbuf); if (m != NULL) rte_mempool_put(m->pool, m); + txep[i].mbuf = NULL; } } -- 2.10.3.dirty
Re: [dpdk-dev] [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
On 10/6/17 9:51 AM, Bruce Richardson wrote: On Fri, Oct 06, 2017 at 08:38:01AM -0400, Roger B. Melton wrote: On 10/6/17 4:54 AM, Bruce Richardson wrote: On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote: --- i40evf tx vector logic frees mbufs, but it does not remove the mbufs from software rings which leads to double frees. This change corrects that oversight. We've validated this fix within our application. Hi Roger, I'm a little concerned here by this driver fix, since if we are getting double frees of mbufs there must be another bug somewhere in tracking the free ring elements. Clearing the entries to NULL introduces extra writes to the vector code path which will likely have a performance impact, but also should be unnecessary, given proper tracking of the ring status. Can you provide us with some details as to how to reproduce the issue, especially with a public sample app? I'd really like to look more into why this is happening, and if other fixes may work. Thanks, /Bruce . Hey Bruce, I've not attempted to reproduce the issue using sample apps. It was initially difficult for us to reproduce with our application until we stumbled on a recipe. The symptoms of the double free are two fold: * Application crashes: Corrupted packet headers, walking a chain of rx segments and loosing one in the middle,... * i40e adminq lockup - Meaning VF sends an OP and PF does not process it The former has been directly correlated to tx vector mode. We still don't understand how this could lead to adminq lockup, but we have not observed the issue applied the patch I submitted. We have questions out to Intel i40e team on this. Here's a high level view of the scenario under which the issues are observed and how we concluded that there were issues with tx vector free. We provided this information to the Intel i40e DPDK team, they reviewed the tx vector logic and suggested changes. With the changes suggested by Intel (the patch I submitted) we have re-enabled tx vector when MTU is < MBUF size and observed no crashes. Below that you will find additional detail on the procedure within our application for changing MTU, including DPDK API calls. Let me know if you have additional questions. Regards, Roger o Scenario: + Intel i40e-da4 (4x10G) or Cisco branded equivalent + KVM or ESXi 6.5 host + In a single VM we have at least 1 VF from all 4 PFs. + We are running traffic on all interfaces. + We stop traffic + We stop a device (VF) + We start the device + We start traffic + At any point after we start the device we can crash o Experiment (Jumping over some of the work to get to the point where we believed that i40e driver was doing double frees): + Our application attaches userdata to mbufs and we put that userdata on a linked list. + We observed that when we processed the userdata it had been corrupted which lead to crashes. + This gave us a hint that the mbuf was on multiple lists. + We reviewed our application code and could not find a cause. + We began to suspect a double free in the i40evf PMD. # We disabled rx free logic and observed crashes (intentionally leaking mbufs in search of the double free). # We disabled tx free logic and observed no crashes # This gave us a hint that the double frees were coming from the i40evf PMD tx logic. + We had also observed that if we forced MTU to large always that there were no crashes # A side effect of forcing large MTU is that multi-segment is enabled. # This gave us a hint that enabling multi-segment was somehow avoiding the double free. + We forced multi-segments regardless of MTU and permitted MTU changes and observed no crashes. + We reviewed the i40evf mbuf free logic to see the effect of enabling multi-segment and observed that when multi-segment is enabled, rx vector was enabled but tx vector was not. + This lead us to examine RX vector mode free logic vs TX vector mode free logic. # RX free logic has special handling for vector mode free # TX free logic does not have any special handling for vector free o By enabling multiple segments always (even if MTU does not require multiple segments), we effectively disabled tx vector mode and this avoided the double free. o Our application no longer crashed, but it could not take advantage of tx vector optimizations. CP == Control Plane DP == Data Plane * CP sends admin down to DP * DP disables RX/TX o Block all future tx burst calls
Re: [dpdk-dev] [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
On 10/6/17 4:54 AM, Bruce Richardson wrote: On Thu, Oct 05, 2017 at 03:11:11PM -0400, Roger B Melton wrote: --- i40evf tx vector logic frees mbufs, but it does not remove the mbufs from software rings which leads to double frees. This change corrects that oversight. We've validated this fix within our application. Hi Roger, I'm a little concerned here by this driver fix, since if we are getting double frees of mbufs there must be another bug somewhere in tracking the free ring elements. Clearing the entries to NULL introduces extra writes to the vector code path which will likely have a performance impact, but also should be unnecessary, given proper tracking of the ring status. Can you provide us with some details as to how to reproduce the issue, especially with a public sample app? I'd really like to look more into why this is happening, and if other fixes may work. Thanks, /Bruce . Hey Bruce, I've not attempted to reproduce the issue using sample apps. It was initially difficult for us to reproduce with our application until we stumbled on a recipe. The symptoms of the double free are two fold: * Application crashes: Corrupted packet headers, walking a chain of rx segments and loosing one in the middle,... * i40e adminq lockup - Meaning VF sends an OP and PF does not process it The former has been directly correlated to tx vector mode. We still don't understand how this could lead to adminq lockup, but we have not observed the issue applied the patch I submitted. We have questions out to Intel i40e team on this. Here's a high level view of the scenario under which the issues are observed and how we concluded that there were issues with tx vector free. We provided this information to the Intel i40e DPDK team, they reviewed the tx vector logic and suggested changes. With the changes suggested by Intel (the patch I submitted) we have re-enabled tx vector when MTU is < MBUF size and observed no crashes. Below that you will find additional detail on the procedure within our application for changing MTU, including DPDK API calls. Let me know if you have additional questions. Regards, Roger o Scenario: + Intel i40e-da4 (4x10G) or Cisco branded equivalent + KVM or ESXi 6.5 host + In a single VM we have at least 1 VF from all 4 PFs. + We are running traffic on all interfaces. + We stop traffic + We stop a device (VF) + We start the device + We start traffic + At any point after we start the device we can crash o Experiment (Jumping over some of the work to get to the point where we believed that i40e driver was doing double frees): + Our application attaches userdata to mbufs and we put that userdata on a linked list. + We observed that when we processed the userdata it had been corrupted which lead to crashes. + This gave us a hint that the mbuf was on multiple lists. + We reviewed our application code and could not find a cause. + We began to suspect a double free in the i40evf PMD. # We disabled rx free logic and observed crashes (intentionally leaking mbufs in search of the double free). # We disabled tx free logic and observed no crashes # This gave us a hint that the double frees were coming from the i40evf PMD tx logic. + We had also observed that if we forced MTU to large always that there were no crashes # A side effect of forcing large MTU is that multi-segment is enabled. # This gave us a hint that enabling multi-segment was somehow avoiding the double free. + We forced multi-segments regardless of MTU and permitted MTU changes and observed no crashes. + We reviewed the i40evf mbuf free logic to see the effect of enabling multi-segment and observed that when multi-segment is enabled, rx vector was enabled but tx vector was not. + This lead us to examine RX vector mode free logic vs TX vector mode free logic. # RX free logic has special handling for vector mode free # TX free logic does not have any special handling for vector free o By enabling multiple segments always (even if MTU does not require multiple segments), we effectively disabled tx vector mode and this avoided the double free. o Our application no longer crashed, but it could not take advantage of tx vector optimizations. CP == Control Plane DP == Data Plane * CP sends admin down to DP * DP disables RX/TX o Block all future tx burst calls o rte_eth_dev_set_link_down() invoked o Block all future rx burst calls * DP notifies CP admin down action is complete * CP sends MTU change
Re: [dpdk-dev] [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
Hi Everyone, As soon as I submitted the patch, I realized I neglected to signoff. What's the recommended procedure, resubmit the original signed off, or bump to v1? Thanks, Roger On 10/5/17 3:11 PM, Roger B Melton wrote: --- i40evf tx vector logic frees mbufs, but it does not remove the mbufs from software rings which leads to double frees. This change corrects that oversight. We've validated this fix within our application. drivers/net/i40e/i40e_rxtx_vec_common.h | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 39a6da0..fdc6fce 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -127,6 +127,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq) if (likely(m != NULL)) { free[0] = m; nb_free = 1; + txep[0].mbuf = NULL; for (i = 1; i < n; i++) { m = rte_pktmbuf_prefree_seg(txep[i].mbuf); if (likely(m != NULL)) { @@ -139,14 +140,17 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq) free[0] = m; nb_free = 1; } + txep[i].mbuf = NULL; } } rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free); } else { + txep[0].mbuf = NULL; for (i = 1; i < n; i++) { m = rte_pktmbuf_prefree_seg(txep[i].mbuf); if (m != NULL) rte_mempool_put(m->pool, m); + txep[i].mbuf = NULL; } }
[dpdk-dev] [PATCH] net/i40e: Improve i40evf buffer cleanup in tx vector mode
--- i40evf tx vector logic frees mbufs, but it does not remove the mbufs from software rings which leads to double frees. This change corrects that oversight. We've validated this fix within our application. drivers/net/i40e/i40e_rxtx_vec_common.h | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 39a6da0..fdc6fce 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -127,6 +127,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq) if (likely(m != NULL)) { free[0] = m; nb_free = 1; + txep[0].mbuf = NULL; for (i = 1; i < n; i++) { m = rte_pktmbuf_prefree_seg(txep[i].mbuf); if (likely(m != NULL)) { @@ -139,14 +140,17 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq) free[0] = m; nb_free = 1; } + txep[i].mbuf = NULL; } } rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free); } else { + txep[0].mbuf = NULL; for (i = 1; i < n; i++) { m = rte_pktmbuf_prefree_seg(txep[i].mbuf); if (m != NULL) rte_mempool_put(m->pool, m); + txep[i].mbuf = NULL; } } -- 2.10.3.dirty
Re: [dpdk-dev] Byte order of vlan_tci of rte_mbuf is different on different source
Hi folks, Resurrecting this old thread. I ran across this issue recently in DPDK 17.05, and it's also present in17.08. It appears no fix was committed. I'm working on a solution, but if anyone has a fix in flight please let me know. Regards, Roger On 4/29/16 3:29 AM, Olivier Matz wrote: Hi, On 04/25/2016 04:35 AM, zhang.xingh...@zte.com.cn wrote: When using I350 working on SR-IOV mode, we got confused that byte order of vlan_tci in the VF received packet descriptor is different when the packet source is different. 1) Packets from VF to VF, the byte order is big-endian. (e.g. 0xF00) 2) Packets from PC to VF, the byte order is little-endian. (e.g. 0xF) Below is the testing net-work: VM0VM1 PC VF0VF1 | | | | +--+--+ | | | PF | hypervisor | SR-IOV NIC | | | |VLAN 15 | +-switch---+ We make a breakpoint at the following line of eth_igb_recv_pkts, the vlan_tci we observed that everytime. uint16_t eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) /* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */ rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan); In rte_mbuf.h, it is specified that these values (vlan_tci and vlan_tci_outer) must be stored in CPU order. It's probably a driver or hardware issue. Note that in linux there is something that looks similar to your issue: http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/igb/igb_main.c#L1278 /* On i350, i354, i210, and i211, loopback VLAN packets * have the tag byte-swapped. */ if (adapter->hw.mac.type >= e1000_i350) set_bit(IGB_RING_FLAG_RX_LB_VLAN_BSWAP, &ring->flags); I think you could check if the same thing is done in the dpdk driver. ZTE Information Security Notice: The information contained in this mail (and any attachment transmitted herewith) is privileged and confidential and is intended for the exclusive use of the addressee(s). If you are not an intended recipient, any disclosure, reproduction, distribution or other dissemination or use of the information contained is strictly prohibited. If you have received this mail in error, please delete it and notify us immediately. This notice should be removed in public emails. Regards, Olivier .
Re: [dpdk-dev] 2nd try: problem with ixgbe_dev_link_update() for multi-speed fiber [was] Re: [PATCH v4] net/ixgbe: ensure link status is updated
Hi Wei, Have you had a chance to look at the debug output yet? Is there any additional data you need? Regards, Roger On 5/24/17 2:38 PM, Roger B. Melton wrote: Hi Wei, I am using ixgbe: 00:02.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) The debug output you requested is below showing the results for both far end link up and far end link down. In the case of far end link up, the elapse time is too small for the granularity of my logging system. In the case when the far end link is down, the elapsed time for rte_eth_link_get_nowait() is ~1.4 seconds. You can really see those 40 and 100msec delays add up when the link is down. Let me know if you require more information. Regards, Roger Far end link up: 2017/05/24 18:15:14.353 [ulord-dp]: [3706]: UUID: 0, ra: 0 (debug): rmeltonDBG Start rte_eth_link_get_nowait() 2017/05/24 18:15:14.353 [ulord-dp]: [3706]: UUID: 0, ra: 0 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:14.353 [ulord-dp]: [3706]: UUID: 0, ra: 0 (debug): rmeltonDBG :End rte_eth_link_get_nowait() Far end link down: 2017/05/24 18:15:38.502 (debug): rmeltonDBG: Start rte_eth_link_get_nowait() 2017/05/24 18:15:38.502 (debug): PMD: ixgbe_get_media_type_82599(): ixgbe_get_media_type_82599 2017/05/24 18:15:38.502 (debug): PMD: ixgbe_setup_mac_link_multispeed_fiber(): ixgbe_setup_mac_link_multispeed_fiber 2017/05/24 18:15:38.502 (debug): PMD: ixgbe_get_link_capabilities_82599(): ixgbe_get_link_capabilities_82599 2017/05/24 18:15:38.542 (debug): PMD: ixgbe_setup_mac_link_82599(): ixgbe_setup_mac_link_82599 2017/05/24 18:15:38.542 (debug): PMD: ixgbe_get_link_capabilities_82599(): ixgbe_get_link_capabilities_82599 2017/05/24 18:15:38.542 (debug): PMD: ixgbe_flap_tx_laser_multispeed_fiber(): ixgbe_flap_tx_laser_multispeed_fiber 2017/05/24 18:15:38.542 (debug): PMD: ixgbe_check_reset_blocked(): ixgbe_check_reset_blocked 2017/05/24 18:15:38.643 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:38.743 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:38.843 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:38.943 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.043 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_setup_mac_link_82599(): ixgbe_setup_mac_link_82599 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_get_link_capabilities_82599(): ixgbe_get_link_capabilities_82599 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_check_reset_blocked(): ixgbe_check_reset_blocked 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_verify_lesm_fw_enabled_82599(): ixgbe_verify_lesm_fw_enabled_82599 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_read_eeprom_82599(): ixgbe_read_eeprom_82599 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_read_eerd_buffer_generic(): ixgbe_read_eerd_buffer_generic 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_init_eeprom_params_generic(): ixgbe_init_eeprom_params_generic 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_poll_eerd_eewr_done(): ixgbe_poll_eerd_eewr_done 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_read_eeprom_82599(): ixgbe_read_eeprom_82599 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_read_eerd_buffer_generic(): ixgbe_read_eerd_buffer_generic 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_init_eeprom_params_generic(): ixgbe_init_eeprom_params_generic 2017/05/24 18:15:39.083 (debug): PMD: ixgbe_poll_eerd_eewr_done(): ixgbe_poll_eerd_eewr_done 2017/05/24 18:15:39.165 (debug): PMD: ixgbe_flap_tx_laser_multispeed_fiber(): ixgbe_flap_tx_laser_multispeed_fiber 2017/05/24 18:15:39.165 (debug): PMD: ixgbe_check_reset_blocked(): ixgbe_check_reset_blocked 2017/05/24 18:15:39.265 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.265 (debug): PMD: ixgbe_setup_mac_link_multispeed_fiber(): ixgbe_setup_mac_link_multispeed_fiber 2017/05/24 18:15:39.265 (debug): PMD: ixgbe_get_link_capabilities_82599(): ixgbe_get_link_capabilities_82599 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_setup_mac_link_82599(): ixgbe_setup_mac_link_82599 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_get_link_capabilities_82599(): ixgbe_get_link_capabilities_82599 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_check_reset_blocked(): ixgbe_check_reset_blocked 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_verify_lesm_fw_enabled_82599(): ixgbe_verify_lesm_fw_enabled_82599 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_read_eeprom_82599(): ixgbe_read_eeprom_82599 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_read_eerd_buffer_generic(): ixgbe_read_eerd_buffer_generic 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_init_eeprom_params_generic(): ixgbe_init_eeprom_params_generic 2017/05/24 18:15:39.305 (debug): PMD
Re: [dpdk-dev] 2nd try: problem with ixgbe_dev_link_update() for multi-speed fiber [was] Re: [PATCH v4] net/ixgbe: ensure link status is updated
(debug): PMD: ixgbe_read_eerd_buffer_generic(): ixgbe_read_eerd_buffer_generic 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_init_eeprom_params_generic(): ixgbe_init_eeprom_params_generic 2017/05/24 18:15:39.305 (debug): PMD: ixgbe_poll_eerd_eewr_done(): ixgbe_poll_eerd_eewr_done 2017/05/24 18:15:39.387 (debug): PMD: ixgbe_flap_tx_laser_multispeed_fiber(): ixgbe_flap_tx_laser_multispeed_fiber 2017/05/24 18:15:39.387 (debug): PMD: ixgbe_check_reset_blocked(): ixgbe_check_reset_blocked 2017/05/24 18:15:39.487 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.587 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.687 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.787 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.887 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.887 (debug): PMD: ixgbe_check_mac_link_generic(): ixgbe_check_mac_link_generic 2017/05/24 18:15:39.887 (debug): rmeltonDBG posix_pmd_oper_state_get_by_id:End rte_eth_link_get_nowait() On 5/24/17 10:40 AM, Roger B. Melton wrote: Hi Wei, Thanks for getting back to me. I'm also dealing with various priorities so it'll take me a bit to get the data, but I'll get back to you with the details in the next day or so. Regards, Roger On 5/23/17 3:41 AM, Dai, Wei wrote: Hi, Roger Sorry for typo error. You should set --log-level=8 Regards -Wei -Original Message- From: Dai, Wei Sent: Tuesday, May 23, 2017 3:24 PM To: 'Roger B. Melton' ; Laurent Hardy ; dev@dpdk.org Cc: Yigit, Ferruh ; Zhang, Helin ; Ananyev, Konstantin ; olivier.m...@6wind.com Subject: RE: 2nd try: problem with ixgbe_dev_link_update() for multi-speed fiber [was] Re: [dpdk-dev] [PATCH v4] net/ixgbe: ensure link status is updated Hi, Roger Sorry for late response as we are busy with other higher priority task. ixgbe_setup_mac_link_multispeed_fiber( ) in ixgbe_common.c calls ixgbe_setup_mac_link( ) which some functions defined in ixgbe/base . Would you please give us more info to narrow down this issue ? What device id did you use to trigger this issue ? To get more info, would you please change "CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n" to" CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=y" in config/common_base And run your application with EAL parameter --log-level=7 to get trace path of ixgbe functions for us. Thanks & Best Regards -Wei -Original Message- From: Roger B. Melton [mailto:rmel...@cisco.com] Sent: Tuesday, May 23, 2017 2:54 AM To: Laurent Hardy ; dev@dpdk.org; Dai, Wei Cc: Yigit, Ferruh ; Zhang, Helin ; Ananyev, Konstantin ; olivier.m...@6wind.com Subject: 2nd try: problem with ixgbe_dev_link_update() for multi-speed fiber [was] Re: [dpdk-dev] [PATCH v4] net/ixgbe: ensure link status is updated Hi Laurent/Wei, Any thoughts? Regards, Roger On 5/17/17 9:31 AM, Roger B Melton wrote: Hi Laurent/Wei, As I continue to integrate DPDK 17.05 into my application, I have hit another issue with this patch while testing in a VM with multispeed fiber ixgbe PCI passthrough. My application periodically invokes rte_eth_link_get_nowait() to detect link state changes. If the link is down (no cable or far end disabled), ixgbe_setup_link() will not return for ~1.3seconds due to the link setup algorithm in ixgbe_common.c:ixgbe_multispeed_fiber(): +if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && +hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { +speed = hw->phy.autoneg_advertised; +if (!speed) +ixgbe_get_link_capabilities(hw, &speed, &autoneg); +ixgbe_setup_link(hw, speed, true); +} + I have two questions: * Shouldn't we avoid the link setup cost if the caller has specified not to wait_to_complete? * If the concern is speed may not be properly configured, shouldn't the link setup be deferred until state changes link up thus minimizing the delays enforced in ixgbe_multispeed_fiber()? Regards, -Roger On 4/27/17 11:03 AM, Laurent Hardy wrote: In case of fiber and link speed set to 1Gb at peer side (with autoneg or with defined speed), link status could be not properly updated at time cable is plugged-in. Indeed if cable was not plugged when device has been configured and started then link status will not be updated properly with new speed as no link setup will be triggered. To avoid this issue, IXGBE_FLAG_NEED_LINK_CONFIG is set to try a link setup each time link_update() is triggered and current link status is down. When cable is plugged-in, link setup will be performed via ixgbe_setup_link(). Signed-off-by: Laurent Hardy --- Hi Wei, please find enclosed patch v4, tested using testpmd. v1 -> v2: - rebase on top of head (change flag to 1&l
Re: [dpdk-dev] 2nd try: problem with ixgbe_dev_link_update() for multi-speed fiber [was] Re: [PATCH v4] net/ixgbe: ensure link status is updated
Hi Wei, Thanks for getting back to me. I'm also dealing with various priorities so it'll take me a bit to get the data, but I'll get back to you with the details in the next day or so. Regards, Roger On 5/23/17 3:41 AM, Dai, Wei wrote: Hi, Roger Sorry for typo error. You should set --log-level=8 Regards -Wei -Original Message- From: Dai, Wei Sent: Tuesday, May 23, 2017 3:24 PM To: 'Roger B. Melton' ; Laurent Hardy ; dev@dpdk.org Cc: Yigit, Ferruh ; Zhang, Helin ; Ananyev, Konstantin ; olivier.m...@6wind.com Subject: RE: 2nd try: problem with ixgbe_dev_link_update() for multi-speed fiber [was] Re: [dpdk-dev] [PATCH v4] net/ixgbe: ensure link status is updated Hi, Roger Sorry for late response as we are busy with other higher priority task. ixgbe_setup_mac_link_multispeed_fiber( ) in ixgbe_common.c calls ixgbe_setup_mac_link( ) which some functions defined in ixgbe/base . Would you please give us more info to narrow down this issue ? What device id did you use to trigger this issue ? To get more info, would you please change "CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=n" to" CONFIG_RTE_LIBRTE_IXGBE_DEBUG_DRIVER=y" in config/common_base And run your application with EAL parameter --log-level=7 to get trace path of ixgbe functions for us. Thanks & Best Regards -Wei -Original Message- From: Roger B. Melton [mailto:rmel...@cisco.com] Sent: Tuesday, May 23, 2017 2:54 AM To: Laurent Hardy ; dev@dpdk.org; Dai, Wei Cc: Yigit, Ferruh ; Zhang, Helin ; Ananyev, Konstantin ; olivier.m...@6wind.com Subject: 2nd try: problem with ixgbe_dev_link_update() for multi-speed fiber [was] Re: [dpdk-dev] [PATCH v4] net/ixgbe: ensure link status is updated Hi Laurent/Wei, Any thoughts? Regards, Roger On 5/17/17 9:31 AM, Roger B Melton wrote: Hi Laurent/Wei, As I continue to integrate DPDK 17.05 into my application, I have hit another issue with this patch while testing in a VM with multispeed fiber ixgbe PCI passthrough. My application periodically invokes rte_eth_link_get_nowait() to detect link state changes. If the link is down (no cable or far end disabled), ixgbe_setup_link() will not return for ~1.3seconds due to the link setup algorithm in ixgbe_common.c:ixgbe_multispeed_fiber(): +if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && +hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { +speed = hw->phy.autoneg_advertised; +if (!speed) +ixgbe_get_link_capabilities(hw, &speed, &autoneg); +ixgbe_setup_link(hw, speed, true); +} + I have two questions: * Shouldn't we avoid the link setup cost if the caller has specified not to wait_to_complete? * If the concern is speed may not be properly configured, shouldn't the link setup be deferred until state changes link up thus minimizing the delays enforced in ixgbe_multispeed_fiber()? Regards, -Roger On 4/27/17 11:03 AM, Laurent Hardy wrote: In case of fiber and link speed set to 1Gb at peer side (with autoneg or with defined speed), link status could be not properly updated at time cable is plugged-in. Indeed if cable was not plugged when device has been configured and started then link status will not be updated properly with new speed as no link setup will be triggered. To avoid this issue, IXGBE_FLAG_NEED_LINK_CONFIG is set to try a link setup each time link_update() is triggered and current link status is down. When cable is plugged-in, link setup will be performed via ixgbe_setup_link(). Signed-off-by: Laurent Hardy --- Hi Wei, please find enclosed patch v4, tested using testpmd. v1 -> v2: - rebase on top of head (change flag to 1<<4) - fix regression with copper links: only update link for fiber links v2 -> v3: - remove unnescessary check on speed mask if autoneg is false v3 -> v4: - remove default speed set to 10Gb if autoneg is false, rely on ixgbe_get_link_capabilities( ) instead. --- drivers/net/ixgbe/ixgbe_ethdev.c | 14 ++ drivers/net/ixgbe/ixgbe_ethdev.h | 1 + 2 files changed, 15 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 7b856bb..8a0c0a7 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -3782,8 +3782,12 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); struct rte_eth_link link, old; ixgbe_link_speed link_speed = IXGBE_LINK_SPEED_UNKNOWN; +struct ixgbe_interrupt *intr = + IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private); int link_up; int diag; +u32 speed = 0; +bool autoneg = false; link.link_status = ETH_LINK_DOWN; link.link_speed = 0; @@ -3793,6 +3797,14 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) hw->mac.get
[dpdk-dev] 2nd try: problem with ixgbe_dev_link_update() for multi-speed fiber [was] Re: [PATCH v4] net/ixgbe: ensure link status is updated
Hi Laurent/Wei, Any thoughts? Regards, Roger On 5/17/17 9:31 AM, Roger B Melton wrote: Hi Laurent/Wei, As I continue to integrate DPDK 17.05 into my application, I have hit another issue with this patch while testing in a VM with multispeed fiber ixgbe PCI passthrough. My application periodically invokes rte_eth_link_get_nowait() to detect link state changes. If the link is down (no cable or far end disabled), ixgbe_setup_link() will not return for ~1.3seconds due to the link setup algorithm in ixgbe_common.c:ixgbe_multispeed_fiber(): +if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && +hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { +speed = hw->phy.autoneg_advertised; +if (!speed) +ixgbe_get_link_capabilities(hw, &speed, &autoneg); +ixgbe_setup_link(hw, speed, true); +} + I have two questions: * Shouldn't we avoid the link setup cost if the caller has specified not to wait_to_complete? * If the concern is speed may not be properly configured, shouldn't the link setup be deferred until state changes link up thus minimizing the delays enforced in ixgbe_multispeed_fiber()? Regards, -Roger On 4/27/17 11:03 AM, Laurent Hardy wrote: In case of fiber and link speed set to 1Gb at peer side (with autoneg or with defined speed), link status could be not properly updated at time cable is plugged-in. Indeed if cable was not plugged when device has been configured and started then link status will not be updated properly with new speed as no link setup will be triggered. To avoid this issue, IXGBE_FLAG_NEED_LINK_CONFIG is set to try a link setup each time link_update() is triggered and current link status is down. When cable is plugged-in, link setup will be performed via ixgbe_setup_link(). Signed-off-by: Laurent Hardy --- Hi Wei, please find enclosed patch v4, tested using testpmd. v1 -> v2: - rebase on top of head (change flag to 1<<4) - fix regression with copper links: only update link for fiber links v2 -> v3: - remove unnescessary check on speed mask if autoneg is false v3 -> v4: - remove default speed set to 10Gb if autoneg is false, rely on ixgbe_get_link_capabilities( ) instead. --- drivers/net/ixgbe/ixgbe_ethdev.c | 14 ++ drivers/net/ixgbe/ixgbe_ethdev.h | 1 + 2 files changed, 15 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 7b856bb..8a0c0a7 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -3782,8 +3782,12 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); struct rte_eth_link link, old; ixgbe_link_speed link_speed = IXGBE_LINK_SPEED_UNKNOWN; +struct ixgbe_interrupt *intr = + IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private); int link_up; int diag; +u32 speed = 0; +bool autoneg = false; link.link_status = ETH_LINK_DOWN; link.link_speed = 0; @@ -3793,6 +3797,14 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) hw->mac.get_link_status = true; +if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && +hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { +speed = hw->phy.autoneg_advertised; +if (!speed) +ixgbe_get_link_capabilities(hw, &speed, &autoneg); +ixgbe_setup_link(hw, speed, true); +} + /* check if it needs to wait to complete, if lsc interrupt is enabled */ if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc != 0) diag = ixgbe_check_link(hw, &link_speed, &link_up, 0); @@ -3810,10 +3822,12 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) if (link_up == 0) { rte_ixgbe_dev_atomic_write_link_status(dev, &link); +intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG; if (link.link_status == old.link_status) return -1; return 0; } +intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG; link.link_status = ETH_LINK_UP; link.link_duplex = ETH_LINK_FULL_DUPLEX; diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h index 5176b02..b576a6f 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.h +++ b/drivers/net/ixgbe/ixgbe_ethdev.h @@ -45,6 +45,7 @@ #define IXGBE_FLAG_MAILBOX (uint32_t)(1 << 1) #define IXGBE_FLAG_PHY_INTERRUPT(uint32_t)(1 << 2) #define IXGBE_FLAG_MACSEC (uint32_t)(1 << 3) +#define IXGBE_FLAG_NEED_LINK_CONFIG (uint32_t)(1 << 4) /* * Defines that were not part of ixgbe_type.h as they are not used by the .
Re: [dpdk-dev] [PATCH v4] net/ixgbe: ensure link status is updated
Hi Laurent/Wei, As I continue to integrate DPDK 17.05 into my application, I have hit another issue with this patch while testing in a VM with multispeed fiber ixgbe PCI passthrough. My application periodically invokes rte_eth_link_get_nowait() to detect link state changes. If the link is down (no cable or far end disabled), ixgbe_setup_link() will not return for ~1.3seconds due to the link setup algorithm in ixgbe_common.c:ixgbe_multispeed_fiber(): + if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && + hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { + speed = hw->phy.autoneg_advertised; + if (!speed) + ixgbe_get_link_capabilities(hw, &speed, &autoneg); + ixgbe_setup_link(hw, speed, true); +} + I have two questions: * Shouldn't we avoid the link setup cost if the caller has specified not to wait_to_complete? * If the concern is speed may not be properly configured, shouldn't the link setup be deferred until state changes link up thus minimizing the delays enforced in ixgbe_multispeed_fiber()? Regards, -Roger On 4/27/17 11:03 AM, Laurent Hardy wrote: In case of fiber and link speed set to 1Gb at peer side (with autoneg or with defined speed), link status could be not properly updated at time cable is plugged-in. Indeed if cable was not plugged when device has been configured and started then link status will not be updated properly with new speed as no link setup will be triggered. To avoid this issue, IXGBE_FLAG_NEED_LINK_CONFIG is set to try a link setup each time link_update() is triggered and current link status is down. When cable is plugged-in, link setup will be performed via ixgbe_setup_link(). Signed-off-by: Laurent Hardy --- Hi Wei, please find enclosed patch v4, tested using testpmd. v1 -> v2: - rebase on top of head (change flag to 1<<4) - fix regression with copper links: only update link for fiber links v2 -> v3: - remove unnescessary check on speed mask if autoneg is false v3 -> v4: - remove default speed set to 10Gb if autoneg is false, rely on ixgbe_get_link_capabilities( ) instead. --- drivers/net/ixgbe/ixgbe_ethdev.c | 14 ++ drivers/net/ixgbe/ixgbe_ethdev.h | 1 + 2 files changed, 15 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 7b856bb..8a0c0a7 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -3782,8 +3782,12 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); struct rte_eth_link link, old; ixgbe_link_speed link_speed = IXGBE_LINK_SPEED_UNKNOWN; + struct ixgbe_interrupt *intr = + IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private); int link_up; int diag; + u32 speed = 0; + bool autoneg = false; link.link_status = ETH_LINK_DOWN; link.link_speed = 0; @@ -3793,6 +3797,14 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) hw->mac.get_link_status = true; + if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && + hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { + speed = hw->phy.autoneg_advertised; + if (!speed) + ixgbe_get_link_capabilities(hw, &speed, &autoneg); + ixgbe_setup_link(hw, speed, true); + } + /* check if it needs to wait to complete, if lsc interrupt is enabled */ if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc != 0) diag = ixgbe_check_link(hw, &link_speed, &link_up, 0); @@ -3810,10 +3822,12 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) if (link_up == 0) { rte_ixgbe_dev_atomic_write_link_status(dev, &link); + intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG; if (link.link_status == old.link_status) return -1; return 0; } + intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG; link.link_status = ETH_LINK_UP; link.link_duplex = ETH_LINK_FULL_DUPLEX; diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h index 5176b02..b576a6f 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.h +++ b/drivers/net/ixgbe/ixgbe_ethdev.h @@ -45,6 +45,7 @@ #define IXGBE_FLAG_MAILBOX (uint32_t)(1 << 1) #define IXGBE_FLAG_PHY_INTERRUPT(uint32_t)(1 << 2) #define IXGBE_FLAG_MACSEC (uint32_t)(1 << 3) +#define IXGBE_FLAG_NEED_LINK_CONFIG (uint32_t)(1 << 4) /* * Defines that were not part of ixgbe_type.h as they are not used by the
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/ixgbe: fix calling null function of VF
-sta...@dpdk.org per Thomas I know this patch has already been applied but to complete the loop, I have applied the patch to 17.05-rc4 and verified within my application that the NULL function pointer issue for ixgbe_vf has been resolved. Regards, Roger On 5/10/17 3:00 AM, Wei Dai wrote: hw->mac.ops.get_media-type() of ixgbe VF is NULL and should not be called directly. It had better be replaced by calling ixgbe_get_media_type( ) to avoid crash. Fixes: c12d22f65b13 ("net/ixgbe: ensure link status is updated") Cc: sta...@dpdk.org Signed-off-by: Wei Dai Tested-by Roger B Melton --- drivers/net/ixgbe/ixgbe_ethdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index ec667d8..ed2baec 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -3799,7 +3799,7 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) hw->mac.get_link_status = true; if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && - hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { + ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) { speed = hw->phy.autoneg_advertised; if (!speed) ixgbe_get_link_capabilities(hw, &speed, &autoneg);
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/ixgbe: fix calling null function of VF
On 5/10/17 5:50 AM, Ferruh Yigit wrote: On 5/10/2017 8:00 AM, Wei Dai wrote: hw->mac.ops.get_media-type() of ixgbe VF is NULL and should not be called directly. It had better be replaced by calling ixgbe_get_media_type( ) to avoid crash. Fixes: c12d22f65b13 ("net/ixgbe: ensure link status is updated") Cc: sta...@dpdk.org Signed-off-by: Wei Dai Hi Roger and Laurent, This is very close the release, can you please support verifying this defect? An explicit Acked-by and/or Tested-by would help a lot. Thanks, ferruh Hi Ferruh, Happy to test it. I'll get back to you shortly. Regards, Roger --- drivers/net/ixgbe/ixgbe_ethdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index ec667d8..ed2baec 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -3799,7 +3799,7 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) hw->mac.get_link_status = true; if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && - hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { + ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) { speed = hw->phy.autoneg_advertised; if (!speed) ixgbe_get_link_capabilities(hw, &speed, &autoneg); .
Re: [dpdk-dev] bug in net/ixgbe/ixgbe_ethdev.c:ixgbe_dev_link_update beginning in 17.05-rc3?
Thanks -Roger On 5/10/17 4:20 AM, Dai, Wei wrote: Yes, it is a bug. The hw->mac.ops.get_media_type of ixgbe VF is NULL. I have just submitted a patch to fix it. http://dpdk.org/dev/patchwork/patch/24188/ -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Roger B Melton Sent: Wednesday, May 10, 2017 1:53 AM To: dev@dpdk.org; laurent.ha...@6wind.com Subject: [dpdk-dev] bug in net/ixgbe/ixgbe_ethdev.c:ixgbe_dev_link_update beginning in 17.05-rc3? After updating to 17.05-rc4 I hit a crash in drivers/net/ixgbe/ixgbe_ethdev.c:ixgbe_dev_link_update(). The issue was a NULL get_media_type FV for the VF driver. Looking at recent commits, I see the following added the get_media_type() check: commit c12d22f65b132c56db7b4fdbfd5ddce27d1e9572 Author: Laurent Hardy Date: Thu Apr 27 17:03:42 2017 +0200 @@ -3793,6 +3797,14 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) hw->mac.get_link_status = true; + if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && + hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { + speed = hw->phy.autoneg_advertised; + if (!speed) + ixgbe_get_link_capabilities(hw, &speed, &autoneg); + ixgbe_setup_link(hw, speed, true); + } + This is fine for the PF driver, but we shouldn't invoke get_media_type for the VF. Laurent, Is this a bug, or am I missing something? If it is a bug, what's the proper fix? Regards, Roger --
[dpdk-dev] bug in net/ixgbe/ixgbe_ethdev.c:ixgbe_dev_link_update beginning in 17.05-rc3?
After updating to 17.05-rc4 I hit a crash in drivers/net/ixgbe/ixgbe_ethdev.c:ixgbe_dev_link_update(). The issue was a NULL get_media_type FV for the VF driver. Looking at recent commits, I see the following added the get_media_type() check: commit c12d22f65b132c56db7b4fdbfd5ddce27d1e9572 Author: Laurent Hardy Date: Thu Apr 27 17:03:42 2017 +0200 @@ -3793,6 +3797,14 @@ ixgbe_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete) hw->mac.get_link_status = true; + if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) && + hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) { + speed = hw->phy.autoneg_advertised; + if (!speed) + ixgbe_get_link_capabilities(hw, &speed, &autoneg); + ixgbe_setup_link(hw, speed, true); + } + This is fine for the PF driver, but we shouldn't invoke get_media_type for the VF. Laurent, Is this a bug, or am I missing something? If it is a bug, what's the proper fix? Regards, Roger -- ________ |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmel...@cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
Re: [dpdk-dev] [PATCH v6 0/5] Extended xstats API in ethdev library to allow grouping of stats
On 4/24/17 11:49 AM, Mcnamara, John wrote: -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon Sent: Monday, April 24, 2017 1:41 PM To: Kozak, KubaX Cc: dev@dpdk.org; Olivier Matz ; Van Haaren, Harry ; Jain, Deepak K ; Piasecki, JacekX Subject: Re: [dpdk-dev] [PATCH v6 0/5] Extended xstats API in ethdev library to allow grouping of stats 24/04/2017 14:32, Olivier Matz: Hi, ... So, I wonder if it wouldn't be more simple to keep the old API intact (it would avoid unannounced breakage). The new feature can be implemented in an additional API: rte_eth_xstats_get_by_id(uint8_t port_id, const uint64_t *ids, uint64_t *values, unsigned int size) rte_eth_xstats_get_names_by_id(uint8_t port_id, const uint64_t *ids, struct rte_eth_xstat_name *xstats_names, unsigned int size) Or: rte_eth_xstats_get_by_id(uint8_t port_id, const uint64_t *ids, struct rte_eth_xstat *values, unsigned int size) rte_eth_xstats_get_names_by_id(uint8_t port_id, const uint64_t *ids, struct rte_eth_xstat_name *xstats_names, unsigned int size) (which would allow to deprecate the old API, but I'm not sure we need to) Can we fix that for 17.05? ... Back to the issues, please try to fix it quickly or we should revert it for 17.05-rc3. Hi, We'll submit a patch to change the APIs for rte_eth_xstats_get() and rte_eth_xstats_get_names() back to their previous signature, without symbol versions, and add the APIs suggested by Olivier. We'll work on that as soon as possible. John . Hi folks, I adapted our application to the API changes presented in 17.05-rc2 and encountered problem that I don't think was pointed out in Olivier's list of issues. In our application we call rte_eth_stats_get() with NULL pointers to determine the number of xstats. We use the returned count to pre-allocate buffers for the name strings and xstats values and then invoke rte_eth_xstats_get_names() and rte_eth_xstats_get() with pointers to appropriately sized buffers and the number of entries available in those buffers. Things appeared to work fine until I requested stats for a net_ixgbe_vf port. In that case the initial call to rte_eth_xstats_get() did not account for the xstats extensions from the ixgbevf PMD. Long story short, my buffers were undersized and bad things happened. John, If you intend to restore the previous API signature but retain the -rc2 internal implementation, may I suggest that both APIs use the same internal function to determine stats count. Otherwise the potential for disagreement between the APIs will remain. Also while on this topic, IMO the above demonstrates that we really need an API to query the xstats count rather than relying on rte_eth_xstats_get() or rte_eth_xstats_get_names() with 0'd out arguments. Regards, Roger
[dpdk-dev] [PATCH v2] eal: fix compile error for old glibc caused by pthread_setname_np()
On 11/25/15 9:03 AM, Thomas Monjalon wrote: > 2015-11-25 08:51, Roger B. Melton: >> Have you thought about a way to set thread name when glibc < 2.12. I >> also ran into the problem recently and played around with prctl() >> (Linux) to set thread (process) name. e.g. >> >> ret = prctl(PR_SET_NAME,,0,0,0); >> >> >> There are 2 issues I think: >> >> 1) The semantics are different than prthread_setname_np(). With >> pthread_setname_np() a name can be assigned to any thread, with >> prctl() the name is assigned to the active thread. That would mean >> that rather than rte_eal_init(), rte_eal_intr_init() could not >> assign thread names. Rather the threads would have to name themselves. >> >> 2) I think BSD lacks prctl(), but some (not all?) BSD >> implementations have setproctitle() to do the same thing. >> >> >> It might be too late for 2.2, but something to think about for the future. > I don't think this feature is important enough to deal with old environments > and to risk some complicated bugs. > Do you think it deserves more tricks? > . > I agree with you Thomas. While I am one of those living in an old environment, I believe that the complications of the tricks out weight the debug benefit. However there may be other in the community who have a different view, so I thought I would at least suggest that there are alternatives. Thanks, -Roger
[dpdk-dev] [PATCH v2] eal: fix compile error for old glibc caused by pthread_setname_np()
> +/** > + * Set thread names. > + * > + * Macro to wrap `pthread_setname_np()` with a glibc version check. > + * Only glibc >= 2.12 supports this feature. > + * > + * This macro only used for Linux, BSD does direct libc call. > + * BSD libc version of function is `pthread_set_name_np()`. > + */ > +#if defined(__DOXYGEN__) > +#define rte_thread_setname(...) pthread_setname_np(__VA_ARGS__) > +#endif > + > +#if defined(__GLIBC__) && defined(__GLIBC_PREREQ) > +#if __GLIBC_PREREQ(2, 12) > +#define rte_thread_setname(...) pthread_setname_np(__VA_ARGS__) > +#else > +#define rte_thread_setname(...) 0 > +#endif > +#endif Have you thought about a way to set thread name when glibc < 2.12. I also ran into the problem recently and played around with prctl() (Linux) to set thread (process) name. e.g. ret = prctl(PR_SET_NAME,,0,0,0); There are 2 issues I think: 1) The semantics are different than prthread_setname_np(). With pthread_setname_np() a name can be assigned to any thread, with prctl() the name is assigned to the active thread. That would mean that rather than rte_eal_init(), rte_eal_intr_init() could not assign thread names. Rather the threads would have to name themselves. 2) I think BSD lacks prctl(), but some (not all?) BSD implementations have setproctitle() to do the same thing. It might be too late for 2.2, but something to think about for the future. Regards, Roger
[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?
David/Thomas, in-line -Roger On 11/18/15 5:13 PM, Roger B. Melton wrote: > Hi Thomas, in-line -Roger > > On 11/17/15 10:46 AM, Thomas Monjalon wrote: >> 2015-11-17 08:56, Roger B. Melton: >>> Hi David, in-line -Roger >>> >>> On 11/16/15 4:46 AM, David Marchand wrote: >>>> Hello Roger, >>>> >>>> On Sun, Nov 15, 2015 at 3:45 PM, Roger B. Melton >>> <mailto:rmelton at cisco.com>> wrote: >>>> >>>> I like the "-b all" and "-w none" idea, but I think it might be >>>> complicated to implement it the way we would need it to work. >>>> The >>>> existing -b and -w options persist for the duration of the >>>> application, and we would need the "-b all"/"-w none" to persists >>>> only through rte_eal_init() time. Otherwise our attempt to to >>>> attach a device at a later time would be blocked by the option. >>>> >>>> I agree, the black/white lists should only apply to initial scan. >>>> I forgot about this problem ... >>>> I had started some cleanup in the pci scan / attach code but this is >>>> too late for 2.2, I will post this in the next merge window. >>>> >>>> >>>> Wouldn't it be simpler to have an option to disable the >>>> rte_eal_init() time the probe. Would that address the issue with >>>> VFIO, prevent automatically attaching to devices while permitting >>>> on demand attach? >>>> >>>> >>>> I suppose we can do this yes (I think Thomas once proposed off-list an >>>> option like --no-pci-scan). >>>> Do you think you can send a patch ? >>> What about --no-pci-init-probe? I know it's long, but it is more >>> descriptive of it's purpose to disable only the init time pci probe. >> Why not a "-b all"? >> Making it work would also solve the case where you to scan only part of >> the devices and initialize the blacklisted ones later. >> . >> > Do you envision "-b all" setting a flag that would be used to block > rte_eal_init() invocation of rte_eal_pci_probe()? e.g. If we have a > new API, *rte_eal_pci_blacklist_all_get()* that returns a non-zero > value if "-b all" was specified, then in rte_eal_init() we would have > something like: > > ... > /* Probe & Initialize PCI devices */ > *if (!rte_eal_pci_blacklist_all_get())** <--- New check* > if (rte_eal_pci_probe()) > rte_panic("Cannot probe PCI\n"); > ... > > > Or setting a flag that would be checked in rte_eal_probe_one() similar > to the existing per device blacklist check? e.g. Again with a new > API, *rte_eal_pci_blacklist_all_get()* that returns a non-zero value > if "-b all" was specified, then in rte_eal_pci_probe_one_driver() we > would have something like: > > /* no initialization when blacklisted, return without error */ > if (*rte_eal_pci_blacklist_all_get() || <--- New check* > (dev->devargs != NULL && > dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI)) { > RTE_LOG(DEBUG, EAL, " Device is blacklisted, not > initializing\n"); > return 1; > } > > The former would work, but I think it would be confusing for "-b all" > to only apply to init. > > The latter would be consistent with how "-b " works, but in > order to initialize devices at a later time, we would need a way to > clear the blacklist all state at run time so that > *rte_eal_pci_blacklist_all()* would return false. For example, > something like *rte_eal_pci_blacklist_all_clear()*. > > Or do you have something else in mind entirely? > > . > How about something like this: * Add pci_blacklist_all member to eal internal config. * When non-zero, pci_blacklist_all will prevent PCI probing. Result is that probes will find no match. * pci_blacklist_all is temporal o Can be set at rte_eal_init() time with "-b all" switch. o Can be set or cleared at run time with rte_eal_pci_blacklist_all_set() or rte_eal_pci_blacklist_all_clear() respectively. o State can be queried with rte_eal_pci_blacklist_all_get() * Blacklist and whitelist switches with PCI DBDF are still permitted as before, but they are used only when the blacklist all override is cleared. I've tested the implementation from the diff below in our application and it works well. Note ou
[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?
Hi Thomas, in-line -Roger On 11/17/15 10:46 AM, Thomas Monjalon wrote: > 2015-11-17 08:56, Roger B. Melton: >> Hi David, in-line -Roger >> >> On 11/16/15 4:46 AM, David Marchand wrote: >>> Hello Roger, >>> >>> On Sun, Nov 15, 2015 at 3:45 PM, Roger B. Melton >> <mailto:rmelton at cisco.com>> wrote: >>> >>> I like the "-b all" and "-w none" idea, but I think it might be >>> complicated to implement it the way we would need it to work. The >>> existing -b and -w options persist for the duration of the >>> application, and we would need the "-b all"/"-w none" to persists >>> only through rte_eal_init() time. Otherwise our attempt to to >>> attach a device at a later time would be blocked by the option. >>> >>> I agree, the black/white lists should only apply to initial scan. >>> I forgot about this problem ... >>> I had started some cleanup in the pci scan / attach code but this is >>> too late for 2.2, I will post this in the next merge window. >>> >>> >>> Wouldn't it be simpler to have an option to disable the >>> rte_eal_init() time the probe. Would that address the issue with >>> VFIO, prevent automatically attaching to devices while permitting >>> on demand attach? >>> >>> >>> I suppose we can do this yes (I think Thomas once proposed off-list an >>> option like --no-pci-scan). >>> Do you think you can send a patch ? >> What about --no-pci-init-probe? I know it's long, but it is more >> descriptive of it's purpose to disable only the init time pci probe. > Why not a "-b all"? > Making it work would also solve the case where you to scan only part of > the devices and initialize the blacklisted ones later. > . > Do you envision "-b all" setting a flag that would be used to block rte_eal_init() invocation of rte_eal_pci_probe()? e.g. If we have a new API, *rte_eal_pci_blacklist_all_get()* that returns a non-zero value if "-b all" was specified, then in rte_eal_init() we would have something like: ... /* Probe & Initialize PCI devices */ *if (!rte_eal_pci_blacklist_all_get())** <--- New check* if (rte_eal_pci_probe()) rte_panic("Cannot probe PCI\n"); ... Or setting a flag that would be checked in rte_eal_probe_one() similar to the existing per device blacklist check? e.g. Again with a new API, *rte_eal_pci_blacklist_all_get()* that returns a non-zero value if "-b all" was specified, then in rte_eal_pci_probe_one_driver() we would have something like: /* no initialization when blacklisted, return without error */ if (*rte_eal_pci_blacklist_all_get() || <--- New check* (dev->devargs != NULL && dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI)) { RTE_LOG(DEBUG, EAL, " Device is blacklisted, not initializing\n"); return 1; } The former would work, but I think it would be confusing for "-b all" to only apply to init. The latter would be consistent with how "-b " works, but in order to initialize devices at a later time, we would need a way to clear the blacklist all state at run time so that *rte_eal_pci_blacklist_all()* would return false. For example, something like *rte_eal_pci_blacklist_all_clear()*. Or do you have something else in mind entirely?
[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?
Hi David, in-line -Roger On 11/16/15 4:46 AM, David Marchand wrote: > Hello Roger, > > On Sun, Nov 15, 2015 at 3:45 PM, Roger B. Melton <mailto:rmelton at cisco.com>> wrote: > > I like the "-b all" and "-w none" idea, but I think it might be > complicated to implement it the way we would need it to work. The > existing -b and -w options persist for the duration of the > application, and we would need the "-b all"/"-w none" to persists > only through rte_eal_init() time. Otherwise our attempt to to > attach a device at a later time would be blocked by the option. > > > I agree, the black/white lists should only apply to initial scan. > I forgot about this problem ... > I had started some cleanup in the pci scan / attach code but this is > too late for 2.2, I will post this in the next merge window. > > > Wouldn't it be simpler to have an option to disable the > rte_eal_init() time the probe. Would that address the issue with > VFIO, prevent automatically attaching to devices while permitting > on demand attach? > > > I suppose we can do this yes (I think Thomas once proposed off-list an > option like --no-pci-scan). > Do you think you can send a patch ? What about --no-pci-init-probe? I know it's long, but it is more descriptive of it's purpose to disable only the init time pci probe. I code and test and have it ready. I'm still working through internal processes to allow me to submit patches, but I hope to have that resolved in the next few weeks and at that time I can submit the patch. > > > -- > David Marchand >
[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?
Hi David, We are on a very old kernel (2.6.xx) that lacks VFIO. In the future however, after migration to a newer kernel it will be an option. I like the "-b all" and "-w none" idea, but I think it might be complicated to implement it the way we would need it to work. The existing -b and -w options persist for the duration of the application, and we would need the "-b all"/"-w none" to persists only through rte_eal_init() time. Otherwise our attempt to to attach a device at a later time would be blocked by the option. Wouldn't it be simpler to have an option to disable the rte_eal_init() time the probe. Would that address the issue with VFIO, prevent automatically attaching to devices while permitting on demand attach? Thanks again. Regards, -Roger On 11/14/15 12:51 PM, David Marchand wrote: > Hello Roger, > > On Sat, Nov 14, 2015 at 4:55 PM, Roger B. Melton <mailto:rmelton at cisco.com>> wrote: > > Agreed. For our application, the debug case would be to _enable_ > the PCI scan. > > Again, thanks David for pointing it out. It did solve our problem. > > > The only problem with --no-pci is that I think that vfio won't work > properly if used. > > Did you try to blacklist all your devices then attach them later ? > I would say what you need here is to "blacklist all" or "whitelist > none" at startup, so maybe a special keyword for -b/-w options. > > > -- > David Marchand -- |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?
Agreed. For our application, the debug case would be to _enable_ the PCI scan. Again, thanks David for pointing it out. It did solve our problem. Regards, Roger On 11/13/15 5:58 PM, Don Provan wrote: > --no-pci is cool. I'm pretty sure that wasn't there when the PCI scan was > first added to > the library init routine. I'm glad to see it, so thanks for pointing it out. > > Just for the record: The comment says, "for debug purposes, PCI can be > disabled". > This exhibits one of those classic DPDK blindspots. The library can be used > for many > things entirely unrelated to hardware. My project has several DPDK > applications > intended to be used by users that do not have privs to mess around with PCI > hardware, so, for them, turning off PCI wouldn't be just for debugging > purposes. > > -don provan > dprovan at bivio.net > > -Original Message- > From: David Marchand [mailto:david.marchand at 6wind.com] > Sent: Friday, November 13, 2015 12:50 AM > To: Roger B Melton > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional? > > ... > Did you try the --no-pci option ? > It avoids the initial sysfs scan, so with no pci device, the initial > pci_probe should do nothing. > ... >
[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?
Hi David, On 11/13/15 3:49 AM, David Marchand wrote: > Hello Roger, > > On Thu, Nov 12, 2015 at 11:43 PM, Roger B Melton <mailto:rmelton at cisco.com>> wrote: > > Hi folks, > > With the addition of hot plug support we have been migrating away > from device discovery and attach at initialization time to a model > where it is controlled from a separate process. The separate > process manages the binding of devices to UIO and instructs the > DPDK process when to attach. One of the problems we stumbled onto > was that if our control process discovered devices and bound them > to UIO before our DPDK process started, then rte_eal_init() would > discover and attach to those devices via the rte_eal_pci_probe() > invocation. This caused problems later on when when our control > process, instructed our DPDK process to attach to a device. > > There are a number of ways we could address this, but the simplest > is to prevent the rte_eal_pci_probe() at rte_eal_init() time. In > our model we will never need it and I suspect others may also be > in that boat. > > What are your thoughts on adding an argument to instruct > rte_eal_init() to skip the PCI probe? > > > Did you try the --no-pci option ? > It avoids the initial sysfs scan, so with no pci device, the initial > pci_probe should do nothing. > > Attaching devices later will trigger this sysfs scan and only probe > the requested device. > I am not totally happy with the way it is done right now, but I think > this should work for you. I saw it, but I was so caught up in the probe that I didn't consider that delaying the scan until attach time might solve the problem. I'll give it shot. Thanks for pointing it out David. Regards, -Roger > > > -- > David Marchand -- |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?
Hi folks, With the addition of hot plug support we have been migrating away from device discovery and attach at initialization time to a model where it is controlled from a separate process. The separate process manages the binding of devices to UIO and instructs the DPDK process when to attach. One of the problems we stumbled onto was that if our control process discovered devices and bound them to UIO before our DPDK process started, then rte_eal_init() would discover and attach to those devices via the rte_eal_pci_probe() invocation. This caused problems later on when when our control process, instructed our DPDK process to attach to a device. There are a number of ways we could address this, but the simplest is to prevent the rte_eal_pci_probe() at rte_eal_init() time. In our model we will never need it and I suspect others may also be in that boat. What are your thoughts on adding an argument to instruct rte_eal_init() to skip the PCI probe? Thanks, -Roger
[dpdk-dev] [PATCH 2/2] igb: fix VF statistic wraparound handling macro
On 10/12/15 12:45 PM, Harry van Haaren wrote: > Fix a misinterpreatation of VF statistic macro in e1000/igb. > > Signed-off-by: Harry van Haaren Acked-by: Roger Melton
[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro
On 10/12/15 12:45 PM, Harry van Haaren wrote: > Fix a misinterpretation of VF stats in ixgbe > > Signed-off-by: Harry van Haaren Acked-by: Roger Melton
[dpdk-dev] [PATCH 2/2] igb: fix VF statistic wraparound handling macro
ack On 10/12/15 12:45 PM, Harry van Haaren wrote: > Fix a misinterpreatation of VF statistic macro in e1000/igb. > > Signed-off-by: Harry van Haaren > --- > drivers/net/e1000/igb_ethdev.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c > index 848ef6e..e4911fc 100644 > --- a/drivers/net/e1000/igb_ethdev.c > +++ b/drivers/net/e1000/igb_ethdev.c > @@ -246,11 +246,10 @@ static void eth_igb_configure_msix_intr(struct > rte_eth_dev *dev); > #define UPDATE_VF_STAT(reg, last, cur)\ > { \ > u32 latest = E1000_READ_REG(hw, reg); \ > - cur += latest - last; \ > + cur += (latest-last) & UINT_MAX; \ > last = latest;\ > } > > - > #define IGB_FC_PAUSE_TIME 0x0680 > #define IGB_LINK_UPDATE_CHECK_TIMEOUT 90 /* 9s */ > #define IGB_LINK_UPDATE_CHECK_INTERVAL 100 /* ms */ -- ____ |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro
ack On 10/12/15 12:45 PM, Harry van Haaren wrote: > Fix a misinterpretation of VF stats in ixgbe > > Signed-off-by: Harry van Haaren > --- > drivers/net/ixgbe/ixgbe_ethdev.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > b/drivers/net/ixgbe/ixgbe_ethdev.c > index ec2918c..86dcd87 100644 > --- a/drivers/net/ixgbe/ixgbe_ethdev.c > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > @@ -329,10 +329,10 @@ static int ixgbe_timesync_read_tx_timestamp(struct > rte_eth_dev *dev, > /* >* Define VF Stats MACRO for Non "cleared on read" register >*/ > -#define UPDATE_VF_STAT(reg, last, cur) \ > +#define UPDATE_VF_STAT(reg, last, cur) \ > { \ > uint32_t latest = IXGBE_READ_REG(hw, reg); \ > - cur += latest - last; \ > + cur += (latest-last) & UINT_MAX;\ > last = latest; \ > } > -- |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro
Agreed, this handles the off by one error on wrap around and should be faster. -Roger On 10/12/15 11:41 AM, Alexander Duyck wrote: > On 10/12/2015 06:33 AM, Harry van Haaren wrote: >> Fix a misinterpretation of VF stats in ixgbe >> >> Signed-off-by: Harry van Haaren >> --- >> drivers/net/ixgbe/ixgbe_ethdev.c | 8 ++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c >> b/drivers/net/ixgbe/ixgbe_ethdev.c >> index ec2918c..d226e8d 100644 >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c >> @@ -329,10 +329,14 @@ static int >> ixgbe_timesync_read_tx_timestamp(struct rte_eth_dev *dev, >> /* >>* Define VF Stats MACRO for Non "cleared on read" register >>*/ >> -#define UPDATE_VF_STAT(reg, last, cur)\ >> +#define UPDATE_VF_STAT(reg, last, cur) \ >> { \ >> uint32_t latest = IXGBE_READ_REG(hw, reg); \ >> -cur += latest - last; \ >> +if(likely(latest > last)) { \ >> +cur += latest - last; \ >> +} else {\ >> +cur += (UINT_MAX - last) + latest; \ >> +} \ >> last = latest; \ >> } > > From what I can tell your math is adding an off by one error. You > should probably be using UINT_MAX as a mask for the result, not as a > part of the calculation itself. > > So the correct way to compute this would be "cur += (latest - last) & > UINT_MAX". Also the mask approach should be faster as it avoids any > conditional jumps. > > - Alex > . > -- |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
[dpdk-dev] [PATCH 2/2] igb: fix VF statistic wraparound handling macro
ack On 10/12/15 9:33 AM, Harry van Haaren wrote: > Fix a misinterpreatation of VF statistic macro in e1000/igb. > > Signed-off-by: Harry van Haaren > --- > drivers/net/e1000/igb_ethdev.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c > index 848ef6e..e3f7402 100644 > --- a/drivers/net/e1000/igb_ethdev.c > +++ b/drivers/net/e1000/igb_ethdev.c > @@ -246,7 +246,11 @@ static void eth_igb_configure_msix_intr(struct > rte_eth_dev *dev); > #define UPDATE_VF_STAT(reg, last, cur)\ > { \ > u32 latest = E1000_READ_REG(hw, reg); \ > - cur += latest - last; \ > + if(likely(latest > last)) { \ > + cur += latest - last; \ > + } else { \ > + cur += (UINT_MAX - last) + latest;\ > + } \ > last = latest;\ > } > -- ____ |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
[dpdk-dev] [PATCH 1/2] ixgbe: fix VF statistic wraparound handling macro
ack On 10/12/15 9:33 AM, Harry van Haaren wrote: > Fix a misinterpretation of VF stats in ixgbe > > Signed-off-by: Harry van Haaren > --- > drivers/net/ixgbe/ixgbe_ethdev.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > b/drivers/net/ixgbe/ixgbe_ethdev.c > index ec2918c..d226e8d 100644 > --- a/drivers/net/ixgbe/ixgbe_ethdev.c > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > @@ -329,10 +329,14 @@ static int ixgbe_timesync_read_tx_timestamp(struct > rte_eth_dev *dev, > /* >* Define VF Stats MACRO for Non "cleared on read" register >*/ > -#define UPDATE_VF_STAT(reg, last, cur) \ > +#define UPDATE_VF_STAT(reg, last, cur) \ > { \ > uint32_t latest = IXGBE_READ_REG(hw, reg); \ > - cur += latest - last; \ > + if(likely(latest > last)) { \ > + cur += latest - last; \ > + } else {\ > + cur += (UINT_MAX - last) + latest; \ > + } \ > last = latest; \ > } > -- |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |
[dpdk-dev] [PATCH 4/4] ethdev: check support for rx_queue_count and descriptor_done fns
Hi Bruce, Comment in-line. Regards, Roger On 6/12/15 7:28 AM, Bruce Richardson wrote: > The functions rte_eth_rx_queue_count and rte_eth_descriptor_done are > supported by very few PMDs. Therefore, it is best to check for support > for the functions in the ethdev library, so as to avoid crashes > at run-time if the application goes to use those APIs. The performance > impact of this change should be very small as this is a predictable > branch in the function. > > Signed-off-by: Bruce Richardson > --- > lib/librte_ether/rte_ethdev.h | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h > index 827ca3e..9ad1b6a 100644 > --- a/lib/librte_ether/rte_ethdev.h > +++ b/lib/librte_ether/rte_ethdev.h > @@ -2496,6 +2496,8 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id, >* The queue id on the specific port. >* @return >* The number of used descriptors in the specific queue. > + * NOTE: if function is not supported by device this call > + *returns (uint32_t)-ENOTSUP >*/ > static inline uint32_t Why not change the return type to int32_t? In this way, the caller isn't required to make the assumption that a large queue count indicates an error. < 0 means error, other wise it's a valid queue count. This approach would be consistent with other APIs. > rte_eth_rx_queue_count(uint8_t port_id, uint16_t queue_id) > @@ -2507,8 +2509,9 @@ rte_eth_rx_queue_count(uint8_t port_id, uint16_t > queue_id) > RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); > return 0; > } > - RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, 0); > #endif > + RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_queue_count, > (uint32_t)-ENOTSUP); > + > return (*dev->dev_ops->rx_queue_count)(dev, queue_id); > } > > @@ -2525,6 +2528,7 @@ rte_eth_rx_queue_count(uint8_t port_id, uint16_t > queue_id) >* - (1) if the specific DD bit is set. >* - (0) if the specific DD bit is not set. >* - (-ENODEV) if *port_id* invalid. > + * - (-ENOTSUP) if the device does not support this function >*/ > static inline int > rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t queue_id, uint16_t > offset) > @@ -2536,8 +2540,8 @@ rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t > queue_id, uint16_t offset) > RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); > return -ENODEV; > } > - RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP); > #endif > + RTE_ETH_FPTR_OR_ERR_RET(*dev->dev_ops->rx_descriptor_done, -ENOTSUP); > > return (*dev->dev_ops->rx_descriptor_done)( \ > dev->data->rx_queues[queue_id], offset); -- |Roger B. Melton| | Cisco Systems | |CPP Software :|::|: 7100 Kit Creek Rd | |+1.919.476.2332 phone:|||: :|||:RTP, NC 27709-4987 | |+1.919.392.1094 fax .:|||:..:|||:. rmelton at cisco.com | || | This email may contain confidential and privileged material for the| | sole use of the intended recipient. Any review, use, distribution | | or disclosure by others is strictly prohibited. If you are not the | | intended recipient (or authorized to receive for the recipient), | | please contact the sender by reply email and delete all copies of | | this message. | || | For corporate legal information go to: | | http://www.cisco.com/web/about/doing_business/legal/cri/index.html | |__ http://www.cisco.com |