[dpdk-dev] Coverity policy for upstream (base) drivers.
2015-11-12 14:05, Stephen Hemminger:
> Looking at the Coverity scan for DPDK, it looks like all the base
> drivers are marked to be ignored.
>
> Although changes to base drivers should not be made directly through
> the DPDK list, I think it is still valuable to have these drivers
> scanned and to notify (badger) the vendors to fix their code.
>
> Since lots of the bugs could be there, just blindly ignoring warnings
> and issues is being naive.

I think the Coverity setup is outdated:

ignore_driver_1   /lib/librte_pmd_e1000/e1000/.*   Yes   Remove
ignore_driver_2   /lib/librte_pmd_fm10k/base/.*    Yes   Remove
ignore_driver_3   /lib/librte_pmd_i40e/i40e/.*     Yes   Remove
ignore_driver_4   /lib/librte_pmd_ixgbe/ixgbe/.*   Yes   Remove

These directories don't exist anymore.
[dpdk-dev] [PATCH v2 0/7] ethdev: force deprecation of statistics
2015-11-05 17:04, Stephen Hemminger:
> Several fields in ether statistics were tagged with comment that they
> were going to be deprecated, but comments don't cause compile warnings.
> Instead use Gcc attributes to force the issue.
>
> Of course to do that, all the drivers and tests which are using
> those fields have to be fixed first.
>
> The input multicast statistic was listed as deprecated, but I find
> it useful, and therefore the first patch is to revive it.
>
> Stephen Hemminger (7):
>   ether: don't mark input multicast for deprecation

not applied

>   bond: don't sum deprecated statistics
>   cxgbe: don't report deprecated statistics
>   i40e: don't report deprecated statistics
>   e1000: don't report deprecated statistics
>   test-pmd: remove references to deprecated statistics
>   rte_ether: mark deprecated statistics with attribute

The rest is applied with an extra patch for ip_pipeline example.
Thanks
[dpdk-dev] [PATCH 7/7] rte_ether: mark deprecated statistics with attribute
2015-11-05 17:04, Stephen Hemminger:
> Use deprecated attribute to highlight any use of fields that
> are marked as going away in the rte_ether device statistics.

The example app ip_pipeline does not compile.
I will add a patch to fix it.
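The idea behind this patch, tagging a struct field so that any remaining use triggers a compiler diagnostic, can be sketched roughly as below. The structure and field names (demo_stats, old_counter) are hypothetical and do not reproduce the real rte_eth_stats layout; the only thing assumed is the GCC/Clang "deprecated" attribute.

```c
#include <stdint.h>

/* Hypothetical statistics structure, for illustration only. */
struct demo_stats {
	uint64_t ipackets;	/* still supported */
	uint64_t old_counter __attribute__((deprecated));	/* going away */
};

/* Reading a field that is not tagged compiles silently. */
static inline uint64_t
demo_rx_packets(const struct demo_stats *s)
{
	return s->ipackets;
}

/*
 * By contrast, an expression such as `s->old_counter` would make GCC or
 * Clang report a -Wdeprecated-declarations warning at that line, which
 * a "Deprecated" note in a comment cannot do.
 */
```

This is also why every in-tree user (drivers, test-pmd, examples) has to stop touching the tagged fields first: otherwise builds that treat warnings as errors fail.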
[dpdk-dev] [PATCH v5 4/4] example/vhost: add virtio offload test in vhost sample
Change the codes in vhost sample to test virtio offload feature. These changes include, 1. add two test options: tx-csum and tso. 2. add virtio_tx_offload() function to test vhost TX offload feature for VM to NIC case; however, for VM to VM case, it doesn't need to call this function, the reason is explained in patch 2. Signed-off-by: Jijiang Liu --- examples/vhost/main.c | 105 +++- 1 files changed, 102 insertions(+), 3 deletions(-) diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 044c680..210e631 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -51,6 +51,9 @@ #include #include #include +#include +#include +#include #include "main.h" @@ -198,6 +201,13 @@ typedef enum { static uint32_t enable_stats = 0; /* Enable retries on RX. */ static uint32_t enable_retry = 1; + +/* Disable TX checksum offload */ +static uint32_t enable_tx_csum; + +/* Disable TSO offload */ +static uint32_t enable_tso; + /* Specify timeout (in useconds) between retries on RX. */ static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US; /* Specify the number of retries on RX. */ @@ -428,6 +438,14 @@ port_init(uint8_t port) if (port >= rte_eth_dev_count()) return -1; + if (enable_tx_csum == 0) + rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_CSUM); + + if (enable_tso == 0) { + rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4); + rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6); + } + rx_rings = (uint16_t)dev_info.max_rx_queues; /* Configure ethernet device. 
*/ retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); @@ -563,7 +581,9 @@ us_vhost_usage(const char *prgname) " --rx-desc-num [0-N]: the number of descriptors on rx, " "used only when zero copy is enabled.\n" " --tx-desc-num [0-N]: the number of descriptors on tx, " - "used only when zero copy is enabled.\n", + "used only when zero copy is enabled.\n" + " --tx-csum [0|1] disable/enable TX checksum offload.\n" + " --tso [0|1] disable/enable TCP segement offload.\n", prgname); } @@ -589,6 +609,8 @@ us_vhost_parse_args(int argc, char **argv) {"zero-copy", required_argument, NULL, 0}, {"rx-desc-num", required_argument, NULL, 0}, {"tx-desc-num", required_argument, NULL, 0}, + {"tx-csum", required_argument, NULL, 0}, + {"tso", required_argument, NULL, 0}, {NULL, 0, 0, 0}, }; @@ -643,6 +665,28 @@ us_vhost_parse_args(int argc, char **argv) } } + /* Enable/disable TX checksum offload. */ + if (!strncmp(long_option[option_index].name, "tx-csum", MAX_LONG_OPT_SZ)) { + ret = parse_num_opt(optarg, 1); + if (ret == -1) { + RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for tx-csum [0|1]\n"); + us_vhost_usage(prgname); + return -1; + } else + enable_tx_csum = ret; + } + + /* Enable/disable TSO offload. */ + if (!strncmp(long_option[option_index].name, "tso", MAX_LONG_OPT_SZ)) { + ret = parse_num_opt(optarg, 1); + if (ret == -1) { + RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for tso [0|1]\n"); + us_vhost_usage(prgname); + return -1; + } else + enable_tso = ret; + } + /* Specify the retries delay time (in useconds) on RX. 
*/ if (!strncmp(long_option[option_index].name, "rx-retry-delay", MAX_LONG_OPT_SZ)) { ret = parse_num_opt(optarg, INT32_MAX); @@ -1101,6 +1145,58 @@ find_local_dest(struct virtio_net *dev, struct rte_mbuf *m, return 0; } +static uint16_t +get_psd_sum(void *l3_hdr, uint64_t ol_flags) +{ + if (ol_flags & PKT_TX_IPV4) + return rte_ipv4_phdr_cksum(l3_hdr, ol_flags); + else /* assume ethertype == ETHER_TYPE_IPv6 */ + return rte_ipv6_phdr_cksum(l3_hdr, ol_flags); +} + +static void virtio_tx_offload(struct rte_mbuf *m) +{ + void *l3_hdr; + struct ipv4_hdr *ipv4_hdr = NULL; + struct tcp_hdr *tcp_hdr = NULL; + struct udp_hdr *udp_hdr = NULL; + struct sctp_hdr *sctp_hdr = N
[dpdk-dev] [PATCH v5 3/4] sample/vhost: remove the ipv4_hdr structure definition
Remove the ipv4_hdr structure definition from the vhost sample. The same structure is already defined in the rte_ip.h file, so remove the definition from the sample and include that header file instead.

Signed-off-by: Jijiang Liu
---
 examples/vhost/main.c | 15 +--
 1 files changed, 1 insertions(+), 14 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c081b18..044c680 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -50,6 +50,7 @@
 #include
 #include
 #include
+#include

 #include "main.h"

@@ -292,20 +293,6 @@ struct vlan_ethhdr {
 	__be16 h_vlan_encapsulated_proto;
 };

-/* IPv4 Header */
-struct ipv4_hdr {
-	uint8_t version_ihl; /**< version and header length */
-	uint8_t type_of_service; /**< type of service */
-	uint16_t total_length; /**< length of packet */
-	uint16_t packet_id; /**< packet ID */
-	uint16_t fragment_offset; /**< fragmentation offset */
-	uint8_t time_to_live; /**< time to live */
-	uint8_t next_proto_id; /**< protocol ID */
-	uint16_t hdr_checksum; /**< header checksum */
-	uint32_t src_addr; /**< source address */
-	uint32_t dst_addr; /**< destination address */
-} __attribute__((__packed__));
-
 /* Header lengths. */
 #define VLAN_HLEN 4
 #define VLAN_ETH_HLEN 18
--
1.7.7.6
[dpdk-dev] [PATCH v5 2/4] vhost/lib: add guest offload setting
Add guest offload setting in vhost lib.

Refer to the feature bits description in the Virtual I/O Device (VIRTIO) Version 1.0 below:

1. VIRTIO_NET_F_GUEST_CSUM (1) Driver handles packets with partial checksum.

2. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags MAY be set: if so, the checksum on the packet is incomplete and csum_start and csum_offset indicate how to calculate it (see Packet Transmission point 1).

3. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were negotiated, then gso_type MAY be something other than VIRTIO_NET_HDR_GSO_NONE, and gso_size field indicates the desired MSS (see Packet Transmission point 2).

In order to support these features, the following changes are added:

1. Extend the 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation.

2. Enqueue these offloads: convert some fields in mbuf to the fields in virtio_net_hdr.

Some more explanation of the implementation:

For the VM2VM case there is no need to compute the checksum, because we consider the data reliable enough. Setting VIRTIO_NET_HDR_F_NEEDS_CSUM at the RX side lets the TCP layer bypass the checksum validation, so that the RX side can receive the packet in the end.

In terms of us-vhost, at the vhost RX side the offload information is inherited from the mbuf, which in turn inherits it from the TX side. If we can still get that information at the RX side, it means the packet comes from another VM on the same host. So it is safe to set VIRTIO_NET_HDR_F_NEEDS_CSUM to skip checksum validation.
Signed-off-by: Jijiang Liu --- lib/librte_vhost/vhost_rxtx.c | 47 +++- lib/librte_vhost/virtio-net.c |5 +++- 2 files changed, 49 insertions(+), 3 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 47d5f85..9d97e19 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -54,6 +54,44 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb) return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM; } +static void +virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr) +{ + memset(net_hdr, 0, sizeof(struct virtio_net_hdr)); + + if (m_buf->ol_flags & PKT_TX_L4_MASK) { + net_hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; + net_hdr->csum_start = m_buf->l2_len + m_buf->l3_len; + + switch (m_buf->ol_flags & PKT_TX_L4_MASK) { + case PKT_TX_TCP_CKSUM: + net_hdr->csum_offset = (offsetof(struct tcp_hdr, + cksum)); + break; + case PKT_TX_UDP_CKSUM: + net_hdr->csum_offset = (offsetof(struct udp_hdr, + dgram_cksum)); + break; + case PKT_TX_SCTP_CKSUM: + net_hdr->csum_offset = (offsetof(struct sctp_hdr, + cksum)); + break; + } + } + + if (m_buf->ol_flags & PKT_TX_TCP_SEG) { + if (m_buf->ol_flags & PKT_TX_IPV4) + net_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4; + else + net_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6; + net_hdr->gso_size = m_buf->tso_segsz; + net_hdr->hdr_len = m_buf->l2_len + m_buf->l3_len + + m_buf->l4_len; + } + + return; +} + /** * This function adds buffers to the virtio devices RX virtqueue. Buffers can * be received from the physical port or from another virtio device. A packet @@ -67,7 +105,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, { struct vhost_virtqueue *vq; struct vring_desc *desc; - struct rte_mbuf *buff; + struct rte_mbuf *buff, *first_buff; /* The virtio_hdr is initialised to 0. 
*/ struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; uint64_t buff_addr = 0; @@ -139,6 +177,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, desc = &vq->desc[head[packet_success]]; buff = pkts[packet_success]; + first_buff = buff; /* Convert from gpa to vva (guest physical addr -> vhost virtual addr) */ buff_addr = gpa_to_vva(dev, desc->addr); @@ -221,7 +260,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, if (unlikely(uncompleted_pkt == 1)) continue; - + + virtio_enqueue_offload(first_buff, &virtio_hdr.hdr); + rte_memcpy((void *)(uintptr_t)buff_hdr_addr, (const void *)&virtio_hdr, vq->vhost_hlen); @@ -295,6 +336,8 @@ copy_from_mbuf_to_vring(struct virtio_net *dev,
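As a rough illustration of the enqueue-side conversion this patch describes (mbuf offload metadata turned into virtio-net header fields), here is a minimal self-contained sketch. All types and names (demo_pkt_meta, demo_net_hdr, demo_tcp_hdr, DEMO_HDR_F_NEEDS_CSUM) are hypothetical stand-ins, not the DPDK or virtio definitions, and only the TCP checksum case is covered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_HDR_F_NEEDS_CSUM 1	/* stand-in for VIRTIO_NET_HDR_F_NEEDS_CSUM */

/* Hypothetical TCP header layout, used only for offsetof(). */
struct demo_tcp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint32_t sent_seq;
	uint32_t recv_ack;
	uint8_t  data_off;
	uint8_t  tcp_flags;
	uint16_t rx_win;
	uint16_t cksum;
	uint16_t tcp_urp;
};

/* Hypothetical subset of the virtio-net header. */
struct demo_net_hdr {
	uint8_t  flags;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Hypothetical subset of the per-packet offload metadata. */
struct demo_pkt_meta {
	uint16_t l2_len;	/* Ethernet (plus VLAN) header length */
	uint16_t l3_len;	/* IP header length */
	int      want_tcp_cksum;	/* non-zero: TCP checksum offload requested */
};

static void
demo_enqueue_offload(const struct demo_pkt_meta *m, struct demo_net_hdr *hdr)
{
	/* Always start from a clean header so stale values never leak. */
	memset(hdr, 0, sizeof(*hdr));

	if (!m->want_tcp_cksum)
		return;

	hdr->flags = DEMO_HDR_F_NEEDS_CSUM;
	/* The checksum covers data starting at the L4 header... */
	hdr->csum_start = m->l2_len + m->l3_len;
	/* ...and is stored at this offset inside the L4 header. */
	hdr->csum_offset = offsetof(struct demo_tcp_hdr, cksum);
}
```

For an untagged Ethernet frame carrying IPv4 without options (l2_len 14, l3_len 20), this yields csum_start 34 and csum_offset 16.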
[dpdk-dev] [PATCH v5 1/4] vhost/lib: add vhost TX offload capabilities in vhost lib
Add vhost TX offload (CSUM and TSO) capabilities to the vhost lib.

Refer to the feature bits in Virtual I/O Device (VIRTIO) Version 1.0 below:

VIRTIO_NET_F_CSUM (0) Device handles packets with partial checksum. This "checksum offload" is a common feature on modern network cards.
VIRTIO_NET_F_HOST_TSO4 (11) Device can receive TSOv4.
VIRTIO_NET_F_HOST_TSO6 (12) Device can receive TSOv6.

In order to support these features, the following changes are added:

1. Extend the 'VHOST_SUPPORTED_FEATURES' macro to add the offload features negotiation.

2. Dequeue TX offload: convert the fields in virtio_net_hdr to the related fields in mbuf.

Signed-off-by: Jijiang Liu
---
 lib/librte_vhost/vhost_rxtx.c | 103 +
 lib/librte_vhost/virtio-net.c |6 ++-
 2 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 9322ce6..47d5f85 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -37,7 +37,12 @@
 #include
 #include

+#include
+#include
 #include
+#include
+#include
+#include

 #include "vhost-net.h"

@@ -568,6 +573,97 @@ rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id,
 	return virtio_dev_rx(dev, queue_id, pkts, count);
 }

+static void
+parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
+{
+	struct ipv4_hdr *ipv4_hdr;
+	struct ipv6_hdr *ipv6_hdr;
+	void *l3_hdr = NULL;
+	struct ether_hdr *eth_hdr;
+	uint16_t ethertype;
+
+	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+	m->l2_len = sizeof(struct ether_hdr);
+	ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
+
+	if (ethertype == ETHER_TYPE_VLAN) {
+		struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+
+		m->l2_len += sizeof(struct vlan_hdr);
+		ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
+	}
+
+	l3_hdr = (char *)eth_hdr + m->l2_len;
+
+	switch (ethertype) {
+	case ETHER_TYPE_IPv4:
+		ipv4_hdr = (struct ipv4_hdr *)l3_hdr;
+		*l4_proto = ipv4_hdr->next_proto_id;
+		m->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+		*l4_hdr = (char *)l3_hdr + m->l3_len;
+		m->ol_flags |= PKT_TX_IPV4;
+		break;
+	case ETHER_TYPE_IPv6:
+		ipv6_hdr = (struct ipv6_hdr *)l3_hdr;
+		*l4_proto = ipv6_hdr->proto;
+		m->l3_len = sizeof(struct ipv6_hdr);
+		*l4_hdr = (char *)l3_hdr + m->l3_len;
+		m->ol_flags |= PKT_TX_IPV6;
+		break;
+	default:
+		m->l3_len = 0;
+		*l4_proto = 0;
+		break;
+	}
+}
+
+static inline void __attribute__((always_inline))
+vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
+{
+	uint16_t l4_proto = 0;
+	void *l4_hdr = NULL;
+	struct tcp_hdr *tcp_hdr = NULL;
+
+	parse_ethernet(m, &l4_proto, &l4_hdr);
+	if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		if (hdr->csum_start == (m->l2_len + m->l3_len)) {
+			switch (hdr->csum_offset) {
+			case (offsetof(struct tcp_hdr, cksum)):
+				if (l4_proto == IPPROTO_TCP)
+					m->ol_flags |= PKT_TX_TCP_CKSUM;
+				break;
+			case (offsetof(struct udp_hdr, dgram_cksum)):
+				if (l4_proto == IPPROTO_UDP)
+					m->ol_flags |= PKT_TX_UDP_CKSUM;
+				break;
+			case (offsetof(struct sctp_hdr, cksum)):
+				if (l4_proto == IPPROTO_SCTP)
+					m->ol_flags |= PKT_TX_SCTP_CKSUM;
+				break;
+			default:
+				break;
+			}
+		}
+	}
+
+	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+		case VIRTIO_NET_HDR_GSO_TCPV4:
+		case VIRTIO_NET_HDR_GSO_TCPV6:
+			tcp_hdr = (struct tcp_hdr *)l4_hdr;
+			m->ol_flags |= PKT_TX_TCP_SEG;
+			m->tso_segsz = hdr->gso_size;
+			m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+			break;
+		default:
+			RTE_LOG(WARNING, VHOST_DATA,
+				"unsupported gso type %u.\n", hdr->gso_type);
+			break;
+		}
+	}
+}
+
 uint16_t
 rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
@@ -576,11 +672,13 @@
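The header-length arithmetic used by parse_ethernet() and vhost_dequeue_offload() in this patch can be checked in isolation. The sketch below only restates those two expressions; the helper names (demo_ipv4_hdr_len, demo_tcp_hdr_len) are made up for illustration.

```c
#include <stdint.h>

/*
 * IPv4: the low nibble of version_ihl is the header length in 32-bit
 * words, so multiply by 4 to get bytes.
 */
static inline uint16_t
demo_ipv4_hdr_len(uint8_t version_ihl)
{
	return (uint16_t)((version_ihl & 0x0f) * 4);
}

/*
 * TCP: the high nibble of data_off is the header length in 32-bit
 * words. Shifting right by 4 and multiplying by 4 is the same as
 * masking the high nibble and shifting right by 2.
 */
static inline uint16_t
demo_tcp_hdr_len(uint8_t data_off)
{
	return (uint16_t)((data_off & 0xf0) >> 2);
}
```

A version_ihl byte of 0x45 gives 20 bytes, and a data_off byte of 0x50 gives 20 bytes as well, the usual option-less header sizes.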
[dpdk-dev] [PATCH v5 0/4] add virtio offload support in us-vhost
Adds virtio offload support in us-vhost.

The patch set adds the feature negotiation of checksum and TSO between us-vhost and a vanilla Linux virtio guest, adds support for these offload features in the vhost lib, and changes the vhost sample to test them.

v5 changes:
  Add clearer descriptions to explain these changes.
  Reset the 'virtio_net_hdr' value in the virtio_enqueue_offload() function.
  Reorganize patches.

v4 change:
  Remove virtio-net change, only keep vhost changes.
  Add guest TX offload capabilities to support VM to VM case.
  Split the cleanup code as a separate patch.

v3 change:
  Rebase on latest code.

v2 change:
  Fill virtio device information for TX offloads.

Jijiang Liu (4):
  add vhost offload capabilities
  remove ipv4_hdr structure from vhost sample.
  add guest offload setting in the vhost lib.
  change vhost application to test checksum and TSO for VM to NIC case

 examples/vhost/main.c | 120 -
 lib/librte_vhost/vhost_rxtx.c | 150 -
 lib/librte_vhost/virtio-net.c |9 ++-
 3 files changed, 259 insertions(+), 20 deletions(-)

--
1.7.7.6
[dpdk-dev] [PATCH v3 2/2] vhost: Add VHOST PMD
>
> +	if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +					 &open_iface, &iface_name);
> +		if (ret < 0)
> +			goto out_free;
> +	}
>

I noticed that the strdup in eth_dev_vhost_create crashes if you don't pass the iface option, so this should probably return an error if the option doesn't exist.
[dpdk-dev] [PATCH] mem: fix how to calculate space left in a hugetlbfs
2015-11-12 09:38, Stephen Hemminger:
> On Thu, 12 Nov 2015 08:17:57 +0800
> Jianfeng Tan wrote:
>
> > This patch enables calculating space left in a hugetlbfs.
> > There are three sources to get the information: 1. from
> > sysfs; 2. from option size specified when mount; 3. use
> > statfs. We should use the minimum one of these three sizes.
> >
> > Signed-off-by: Jianfeng Tan
>
> Thanks, the hugetlbfs usage up until now has been rather brute force.
> I wonder if long term it might be better to defer all this stuff
> to another library like libhugetlbfs.
> https://github.com/libhugetlbfs/libhugetlbfs
>
> Especially when dealing with other architectures it might provide
> some nice abstraction.

Maybe, maybe not :)
Sergio already looked at it:
http://dpdk.org/ml/archives/dev/2015-July/022080.html
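The rule stated in the quoted patch description, taking the smallest of the three reported sizes, amounts to the following. This is only a sketch: the function and parameter names are hypothetical, and how each value is actually read (sysfs, the mount "size" option, statfs) is left out.

```c
#include <stdint.h>

static inline uint64_t
demo_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * All three inputs are byte counts (hypothetical parameters):
 *   sysfs_free  - free hugepages reported by sysfs times the page size
 *   mount_limit - space still allowed by the size= mount option
 *   statfs_free - free space reported by statfs() on the mount point
 * The usable space is bounded by the most restrictive of the three.
 */
static uint64_t
demo_hugetlbfs_space_left(uint64_t sysfs_free, uint64_t mount_limit,
			  uint64_t statfs_free)
{
	return demo_min_u64(sysfs_free, demo_min_u64(mount_limit, statfs_free));
}
```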
[dpdk-dev] Coverity policy for upstream (base) drivers.
On Thu, Nov 12, 2015 at 02:05:08PM -0800, Stephen Hemminger wrote:
> Looking at the Coverity scan for DPDK, it looks like all the base
> drivers are marked to be ignored.
>
> Although changes to base drivers should not be made directly through
> the DPDK list, I think it is still valuable to have these drivers
> scanned and to notify (badger) the vendors to fix their code.
>
> Since lots of the bugs could be there, just blindly ignoring warnings
> and issues is being naive.

I am with Stephen. Ignoring base driver vulns is a bad practice. With these L1-L4 bugs the chances are good somebody could trigger these and find 0days using tools as old and simple as this one: http://isic.sourceforge.net/

Matthew.
[dpdk-dev] [PATCH 1/7] ether: don't mark input multicast for deprecation
2015-11-05 17:04, Stephen Hemminger:
> The number of received multicast frames is useful and already
> available in many/most drivers. Therefore don't mark it as
> deprecated.

There are other useful stats in xstats.
The idea of this basic stats structure is to provide only the really mandatory and basic counters. A multicast counter is not so basic and won't be implemented everywhere.

This patch won't be applied.
We'll need a consensus to definitively remove the deprecated stats.
[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?
Hi folks,

With the addition of hot plug support we have been migrating away from device discovery and attach at initialization time to a model where it is controlled from a separate process. The separate process manages the binding of devices to UIO and instructs the DPDK process when to attach.

One of the problems we stumbled onto was that if our control process discovered devices and bound them to UIO before our DPDK process started, then rte_eal_init() would discover and attach to those devices via the rte_eal_pci_probe() invocation. This caused problems later on when our control process instructed our DPDK process to attach to a device.

There are a number of ways we could address this, but the simplest is to prevent the rte_eal_pci_probe() at rte_eal_init() time. In our model we will never need it and I suspect others may also be in that boat.

What are your thoughts on adding an argument to instruct rte_eal_init() to skip the PCI probe?

Thanks,
-Roger
[dpdk-dev] [PATCH 0/3] xstats queue handling
2015-11-06 14:12, Harry van Haaren:
> This patchset modifies how queue statistics are presented by
> rte_eth_xstats_get() and each PMD's xstats_get().
>
> Generic stats from the rte_eth_stats struct are presented by rte, and each
> PMD can augment those stats with extra stats that are available (if any).
>
> Currently ixgbe and i40e are the only NICs supporting queue xstats, and
> they have been updated to conform with the new method of presentation.
>
>
> Harry van Haaren (3):
>   ethdev: xstats generic Q stats refactor
>   ixgbe: refactor xstats queue handling
>   i40e: refactor xstats queue handling

Applied, thanks
[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
On Thu, Nov 12, 2015 at 12:02:33AM -0800, Rich Lane wrote:
> The guest could trigger this buffer overflow by creating a cycle of
> descriptors (which would also cause an infinite loop). The more common
> case is that vq->avail->idx jumps out of the range
> [last_used_idx, last_used_idx+256). This happens nearly every time when
> restarting a DPDK app inside a VM connected to a vhost-user vswitch
> because the virtqueue memory allocated by the previous run is zeroed.

Hi,

I somehow was aware of this issue before while reading the code. Thinking that we never met that, I delayed the fix (it was still in my TODO list).

Would you please tell me the steps (commands would be better) to reproduce your issue? I'd like to know more about the issue: I'm guessing maybe we need to fix it with a bit more care.

	--yliu

>
> Signed-off-by: Rich Lane
> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 9322ce6..d95b478 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
>  	vq->buf_vec[vec_id].desc_idx = idx;
>  	vec_id++;
>
> -	if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
> +	if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id < BUF_VECTOR_MAX) {
>  		idx = vq->desc[idx].next;
>  		next_desc = 1;
>  	}
> --
> 1.9.1
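The general technique behind the one-line fix, never following a guest-controlled descriptor chain further than the local vector can hold, can be sketched as below. This is not the vhost code: demo_desc, DEMO_DESC_F_NEXT, DEMO_VEC_MAX and demo_walk_chain are hypothetical, and the extra range check on the index is an addition of this sketch rather than part of the patch.

```c
#include <stdint.h>

#define DEMO_DESC_F_NEXT 1	/* stand-in for VRING_DESC_F_NEXT */
#define DEMO_VEC_MAX     8	/* stand-in for BUF_VECTOR_MAX */

struct demo_desc {
	uint16_t flags;
	uint16_t next;
};

/*
 * Record the descriptor indexes of one chain into vec[], which must
 * have room for DEMO_VEC_MAX entries. Returns the number of entries
 * written. The walk stops when the chain ends, when the vector is
 * full, or when an index points outside the ring, so a cycle built by
 * the guest can neither overflow vec[] nor loop forever.
 */
static uint32_t
demo_walk_chain(const struct demo_desc *ring, uint16_t ring_size,
		uint16_t head, uint16_t *vec)
{
	uint32_t n = 0;
	uint16_t idx = head;

	while (n < DEMO_VEC_MAX && idx < ring_size) {
		vec[n++] = idx;
		if (!(ring[idx].flags & DEMO_DESC_F_NEXT))
			break;
		idx = ring[idx].next;
	}
	return n;
}
```

With a two-entry ring whose descriptors point at each other, the walk returns DEMO_VEC_MAX instead of running off the end of vec[]. Whether a truncated chain should additionally be reported as an error to the caller is a separate design question.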
[dpdk-dev] [PATCH v4 0/2] Add support for driver directories
> > This mini-series adds support for driver directory concept
> > based on idea by Thomas Monjalon back in February:
> > http://dpdk.org/ml/archives/dev/2015-February/013285.html
> >
> > In the process FreeBSD also gains plugin support (but untested).
> >
> > v4: - introduce error-early behavior for invalid plugin paths
> >     - support directories via the existing -d option instead of adding new
> >
> > v3: - merge the first commits
> >
> > v2: - move code to eal/common
> >     - add bsd support
> >
> > Panu Matilainen (2):
> >   eal: move plugin loading to eal/common
> >   eal: add support for driver directory concept
> >
> checkpatch complains for some indent problem (Thomas, can you fix this ?),
> but the rest looks good to me.
>
> Acked-by: David Marchand
>
> Thanks Panu.

Applied, thanks
[dpdk-dev] [PATCH] MAINTAINERS: update maintainer for reorder library
> > Reorder
> > -M: Sergio Gonzalez Monroy
> > +M: Reshma Pattan
> > F: lib/librte_reorder/
> > F: doc/guides/prog_guide/reorder_lib.rst
> > F: app/test/test_reorder*
> Acked-by: Sergio Gonzalez Monroy

So you are replacing Sergio.
Any enhancement or feature planned?
[dpdk-dev] [PATCH v2] app/test: fix reorder library unit test
2015-10-30 14:30, Sergio Gonzalez Monroy:
> On 21/10/2015 14:01, Sergio Gonzalez Monroy wrote:
> > On 21/10/2015 11:50, Reshma Pattan wrote:
> >> The reorder library unit test was performed under the assumption that
> >> the start sequence number was always 0.
> >> This is not the case anymore as the start sequence number is
> >> initialized by the first packet inserted into the reorder buffer.
> >>
> >> This patch updates the unit test to reflect the new behavior.
> >>
> >> Fixes: 7e1fa1de8a53 ("reorder: allow random number as starting point")
> >>
> >> Signed-off-by: Reshma Pattan
> >>
> > Acked-by: Sergio Gonzalez Monroy
> Forgot to add this tag:
>
> Reported-by: Mukesh Dua

Applied, thanks
[dpdk-dev] [PATCH] i40e: fix the issue of trying more VSIs for VMDq than available
It fixes the issue of trying to allocate more VSIs for VMDq than the hardware has remaining. It adds a check of the VSIs still available in hardware before allocating VSIs for VMDq.

Fixes: c80707a0fd9c ("i40e: fix VMDq pool limit")

Signed-off-by: Helin Zhang
---
 drivers/net/i40e/i40e_ethdev.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index e4684d3..323b1ff 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3118,7 +3118,8 @@ i40e_pf_parameter_init(struct rte_eth_dev *dev)
 	pf->vmdq_nb_qps = 0;
 	pf->max_nb_vmdq_vsi = 0;
 	if (hw->func_caps.vmdq) {
-		if (qp_count < hw->func_caps.num_tx_qp) {
+		if (qp_count < hw->func_caps.num_tx_qp &&
+			vsi_count < hw->func_caps.num_vsis) {
 			pf->max_nb_vmdq_vsi = (hw->func_caps.num_tx_qp -
 					qp_count) / pf->vmdq_nb_qp_max;

@@ -3126,6 +3127,8 @@ i40e_pf_parameter_init(struct rte_eth_dev *dev)
 			 * ethdev can support
 			 */
 			pf->max_nb_vmdq_vsi = RTE_MIN(pf->max_nb_vmdq_vsi,
+				hw->func_caps.num_vsis - vsi_count);
+			pf->max_nb_vmdq_vsi = RTE_MIN(pf->max_nb_vmdq_vsi,
 				ETH_64_POOLS);
 			if (pf->max_nb_vmdq_vsi) {
 				pf->flags |= I40E_FLAG_VMDQ;
@@ -3140,7 +3143,7 @@ i40e_pf_parameter_init(struct rte_eth_dev *dev)
 					"VMDq");
 			}
 		} else {
-			PMD_DRV_LOG(INFO, "No queue left for VMDq");
+			PMD_DRV_LOG(INFO, "No queue or VSI left for VMDq");
 		}
 	}
 	qp_count += pf->vmdq_nb_qps * pf->max_nb_vmdq_vsi;
--
1.9.3
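The limit computed by the patched code can be restated as a small standalone function, which makes the three bounds easier to see. The function and parameter names below are hypothetical, and DEMO_MAX_POOLS merely stands in for ETH_64_POOLS (assumed here to be 64).

```c
#include <stdint.h>

#define DEMO_MAX_POOLS 64	/* stand-in for ETH_64_POOLS */

/*
 * Number of VMDq VSIs that can still be created, bounded by:
 *   1. the queue pairs left, divided by the queue pairs per VMDq VSI,
 *   2. the VSIs left in hardware (the check added by the patch),
 *   3. the maximum number of pools ethdev can support.
 */
static uint32_t
demo_max_vmdq_vsi(uint32_t num_tx_qp, uint32_t qp_count,
		  uint32_t num_vsis, uint32_t vsi_count,
		  uint32_t qp_per_vsi)
{
	uint32_t n;

	if (qp_per_vsi == 0 || qp_count >= num_tx_qp || vsi_count >= num_vsis)
		return 0;	/* no queue or VSI left for VMDq */

	n = (num_tx_qp - qp_count) / qp_per_vsi;
	if (n > num_vsis - vsi_count)
		n = num_vsis - vsi_count;
	if (n > DEMO_MAX_POOLS)
		n = DEMO_MAX_POOLS;
	return n;
}
```

For example, with 64 queue pairs left, 4 queue pairs per VSI and only 2 VSIs left in hardware, the queue-based bound alone would allow 16 VSIs, while the added VSI bound reduces the result to 2.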
[dpdk-dev] [PATCH] doc: add entry for enic PMD Tx improvement to the 2.2 release notes.
2015-11-06 15:08, johndale:
> Signed-off-by: johndale

Applied, thanks
[dpdk-dev] Permanently binding NIC ports with DPDK drivers
On 11/11/2015 06:28 PM, Bruce Richardson wrote:
> On Wed, Nov 11, 2015 at 04:13:01PM +, Montorsi, Francesco wrote:
>> Hi,
>> Is there a way to permanently (i.e., have the configuration automatically
>> applied after reboot) bind a NIC port to DPDK?
>>
>> In case there's none, I'm thinking to save in my software a list of the NIC
>> ports chosen by the user for use with DPDK and then, upon software startup
>> to just do
>>   for (int i=0; i < ...; i++)
>>     system("dpdk_nic_bind.py --bind=igb_uio " + PCI_device_chosen[i]);
>> Do you see any problem with that?
>>
>> Thanks!
>> Francesco Montorsi
>>
>
> Hi Francesco,
>
> I'm not aware of any way to make the bindings permanent across reboots.
> What you have suggested will work, but there are probably better ways to
> do the same thing.
> For example, a couple of lines in an rc.local script can reapply the
> bindings at boot for you. I'm sure others can suggest other ways of having
> the same effect, for example, there may be a way to automatically do this
> using udev or systemd or some such package.

I've been looking into this recently, here's what I have so far:
http://laiskiainen.org/git/?p=driverctl.git

For the impatient, "make rpm" should produce something usable for recent Fedora/RHEL systems, usage looks somewhat like this:

Find devices currently driven by ixgbe driver:
# driverctl -v list-devices | grep ixgbe
:01:00.0 ixgbe (Ethernet 10G 4P X520/I350 rNDC)
:01:00.1 ixgbe (Ethernet 10G 4P X520/I350 rNDC)

Change them to use the vfio-pci driver permanently:
# driverctl set-override :01:00.0 vfio-pci
# driverctl set-override :01:00.1 vfio-pci

Find devices with driver overrides:
[root at wsfd-netdev32 ~]# driverctl -v list-devices|grep \*
:01:00.0 vfio-pci [*] (Ethernet 10G 4P X520/I350 rNDC)
:01:00.1 vfio-pci [*] (Ethernet 10G 4P X520/I350 rNDC)

Remove the permanent driver override for device :01:00.1:
# driverctl unset-override :01:00.1

In addition it has udev rules to export vfio and uio devices on systemd level, eg the above looks like this with normal drivers:

# systemctl |grep :01:00
sys-devices-pci:00-:00:03.0-:01:00.0-net-em1.device loaded active plugged Ethernet 10G 4P X520/I350 rNDC
sys-devices-pci:00-:00:03.0-:01:00.1-net-em2.device loaded active plugged Ethernet 10G 4P X520/I350 rNDC

When changed to vfio, with upstream systemd/udev rules they would just disappear entirely, but with the driverctl rules they become:

# systemctl |grep :01:00
sys-devices-pci:00-:00:03.0-:01:00.0-vfio.device loaded active plugged /sys/devices/pci:00/:00:03.0/:01:00.0/vfio
sys-devices-pci:00-:00:03.0-:01:00.1-vfio.device loaded active plugged /sys/devices/pci:00/:00:03.0/:01:00.1/vfio

- Panu -
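The startup loop suggested in the quoted question is pseudocode (C strings cannot be joined with "+"). A compilable version of the same idea might look roughly like the sketch below. The function names are hypothetical, the command string is the one from the question, and the PCI addresses are assumed to come from a trusted, already validated list, since they are handed to a shell.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Build the bind command for one device into buf.
 * Returns 0 on success, -1 if the command does not fit.
 */
static int
demo_build_bind_cmd(char *buf, size_t len, const char *pci_addr)
{
	int n = snprintf(buf, len, "dpdk_nic_bind.py --bind=igb_uio %s",
			 pci_addr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Rebind every saved device at application startup.
 * Returns 0 if all commands succeeded, -1 on the first failure.
 */
static int
demo_bind_all(const char *const *pci_addrs, size_t count)
{
	char cmd[128];
	size_t i;

	for (i = 0; i < count; i++) {
		if (demo_build_bind_cmd(cmd, sizeof(cmd), pci_addrs[i]) != 0)
			return -1;
		if (system(cmd) != 0)
			return -1;
	}
	return 0;
}
```

Compared with a persistent override as provided by driverctl, this approach needs the application to run with enough privileges at every start, and the devices stay on their kernel driver until the application has run.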
[dpdk-dev] [PATCH 1/7] ether: don't mark input multicast for deprecation
On Thu, 5 Nov 2015 17:04:33 -0800
Stephen Hemminger wrote:

> The number of received multicast frames is useful and already
> available in many/most drivers. Therefore don't mark it as
> deprecated.
>
> Signed-off-by: Stephen Hemminger
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 1 -
>  lib/librte_ether/rte_ethdev.h| 3 +--
>  2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 0b0bbcf..3b71c0c 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -2715,7 +2715,6 @@ ixgbevf_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>  	stats->opackets = hw_stats->vfgptc;
>  	stats->obytes = hw_stats->vfgotc;
>  	stats->imcasts = hw_stats->vfmprc;
> -	/* stats->imcasts should be removed as imcasts is deprecated */
>  }
>
>  static void
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 48a540d..f653e37 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -204,8 +204,7 @@ struct rte_eth_stats {
>  	/**< Deprecated; Total of RX packets with bad length. */
>  	uint64_t ierrors; /**< Total number of erroneous received packets. */
>  	uint64_t oerrors; /**< Total number of failed transmitted packets. */
> -	uint64_t imcasts;
> -	/**< Deprecated; Total number of multicast received packets. */
> +	uint64_t imcasts; /**< Total number of multicast received packets. */
>  	uint64_t rx_nombuf; /**< Total number of RX mbuf allocation failures. */
>  	uint64_t fdirmatch;
>  	/**< Deprecated; Total number of RX packets matching a filter. */

I am okay with removing imcasts if all the drivers that support it provide the same information in xstats.
[dpdk-dev] [PATCH v3] vhost: fix mmap failure as len not aligned with hugepage size
This patch fixes a bug under lower version linux kernel, mmap() fails when length is not aligned with hugepage size. mmap() without flag of MAP_ANONYMOUS, should be called with length argument aligned with hugepagesz at older longterm version Linux, like 2.6.32 and 3.2.72, or mmap() will fail with EINVAL. This bug was fixed in Linux kernel by commit: dab2d3dc45ae7343216635d981d43637e1cb7d45 To avoid failure, make sure in caller to keep length aligned. v3 changes: - fix (u64) -> (void *) convert error on 32-bit system v2 changes: - add Kernel version comments and commit msg - remove unnecessary alignments when munmap Signed-off-by: Jianfeng Tan --- lib/librte_vhost/vhost_user/virtio-net-user.c | 36 --- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c index d07452a..99da029 100644 --- a/lib/librte_vhost/vhost_user/virtio-net-user.c +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c @@ -74,7 +74,6 @@ free_mem_region(struct virtio_net *dev) { struct orig_region_map *region; unsigned int idx; - uint64_t alignment; if (!dev || !dev->mem) return; @@ -82,12 +81,8 @@ free_mem_region(struct virtio_net *dev) region = orig_region(dev->mem, dev->mem->nregions); for (idx = 0; idx < dev->mem->nregions; idx++) { if (region[idx].mapped_address) { - alignment = region[idx].blksz; - munmap((void *)(uintptr_t) - RTE_ALIGN_FLOOR( - region[idx].mapped_address, alignment), - RTE_ALIGN_CEIL( - region[idx].mapped_size, alignment)); + munmap((void *)(uintptr_t)region[idx].mapped_address, + region[idx].mapped_size); close(region[idx].fd); } } @@ -147,6 +142,18 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) /* This is ugly */ mapped_size = memory.regions[idx].memory_size + memory.regions[idx].mmap_offset; + + /* mmap() without flag of MAP_ANONYMOUS, should be called +* with length argument aligned with hugepagesz at older +* longterm version Linux, like 
2.6.32 and 3.2.72, or +* mmap() will fail with EINVAL. +* +* to avoid failure, make sure in caller to keep length +* aligned. +*/ + alignment = get_blk_size(pmsg->fds[idx]); + mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment); + mapped_address = (uint64_t)(uintptr_t)mmap(NULL, mapped_size, PROT_READ | PROT_WRITE, MAP_SHARED, @@ -154,9 +161,11 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) 0); RTE_LOG(INFO, VHOST_CONFIG, - "mapped region %d fd:%d to %p sz:0x%"PRIx64" off:0x%"PRIx64"\n", + "mapped region %d fd:%d to:%p sz:0x%"PRIx64" " + "off:0x%"PRIx64" align:0x%"PRIx64"\n", idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address, - mapped_size, memory.regions[idx].mmap_offset); + mapped_size, memory.regions[idx].mmap_offset, + alignment); if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) { RTE_LOG(ERR, VHOST_CONFIG, @@ -166,7 +175,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) pregion_orig[idx].mapped_address = mapped_address; pregion_orig[idx].mapped_size = mapped_size; - pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]); + pregion_orig[idx].blksz = alignment; pregion_orig[idx].fd = pmsg->fds[idx]; mapped_address += memory.regions[idx].mmap_offset; @@ -193,11 +202,8 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) err_mmap: while (idx--) { - alignment = pregion_orig[idx].blksz; - munmap((void *)(uintptr_t)RTE_ALIGN_FLOOR( - pregion_orig[idx].mapped_address, alignment), - RTE_ALIGN_CEIL(pregion_orig[idx].mapped_size, - alignment)); + munmap((void *)(uintptr_t)pregion_orig[idx].mapped_address, + pregion_orig[idx].mapped_size); close(pregion_orig[idx].fd); } free(dev->mem); -- 2.1.4
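The length fix in the patch above comes down to rounding the mapped size up to the hugepage size before calling mmap(). A minimal sketch of that arithmetic follows, assuming a power-of-two hugepage size; `align_ceil` and `hugepage_map_len` are hypothetical helper names used only for illustration (the patch itself uses RTE_ALIGN_CEIL and get_blk_size()).

```c
#include <stdint.h>

/* Round len up to the next multiple of align.
 * Assumption: align is a power of two, as hugepage sizes are. */
static uint64_t
align_ceil(uint64_t len, uint64_t align)
{
	return (len + align - 1) & ~(align - 1);
}

/* Hypothetical helper: length to pass to mmap() for one guest region,
 * mirroring "memory_size + mmap_offset, then align" from the patch. */
static uint64_t
hugepage_map_len(uint64_t memory_size, uint64_t mmap_offset,
		 uint64_t hugepage_sz)
{
	return align_ceil(memory_size + mmap_offset, hugepage_sz);
}
```

Because the aligned length is stored in mapped_size, the later munmap() calls can reuse it unchanged, which is why the patch drops the extra alignment there.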
[dpdk-dev] Coverity policy for upstream (base) drivers.
Looking at the Coverity scan for DPDK, it looks like all the base drivers are marked to be ignored. Changes to base drivers should not be made directly through the DPDK list, but I think it is still valuable to have these drivers scanned and to notify (badger) the vendors to fix their code. Since lots of bugs could be there, blindly ignoring warnings and issues is naive.
[dpdk-dev] ACL Library Information Request
Hi, I've read the documentation and looked at the example ACL app. What is the best practice for deleting rules? From the API, it looks like a new context needs to be created and built. Is that true? Also, this is more of a confirmation, but RTE_ACL_MAX_FIELDS is defined as 64, so I assume that for IPv4 we can have a tuple that's larger than 5? Thanks, Jason
[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
You can reproduce this with l2fwd and the vhost PMD. You'll need this patch on top of the vhost PMD patches: --- a/lib/librte_vhost/virtio-net.c +++ b/lib/librte_vhost/virtio-net.c @@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx) return -1; if (dev->flags & VIRTIO_DEV_RUNNING) - notify_ops->destroy_device(dev); + notify_destroy_device(dev); cleanup_device(dev); reset_device(dev); 1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev eth_vhost0,iface=/run/vhost0.sock -- -p3 2. Start a VM using vhost-user and set up uio, hugepages, etc. 3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3 4. Kill the l2fwd inside the VM with SIGINT. 5. Start l2fwd inside the VM. 6. l2fwd on the host crashes. I found the source of the memory corruption by setting a watchpoint in gdb: watch -l rte_eth_devices[1].data->rx_queues On Thu, Nov 12, 2015 at 1:23 AM, Yuanhan Liu wrote: > On Thu, Nov 12, 2015 at 12:02:33AM -0800, Rich Lane wrote: > > The guest could trigger this buffer overflow by creating a cycle of > descriptors > > (which would also cause an infinite loop). The more common case is that > > vq->avail->idx jumps out of the range [last_used_idx, > last_used_idx+256). This > > happens nearly every time when restarting a DPDK app inside a VM > connected to a > > vhost-user vswitch because the virtqueue memory allocated by the > previous run > > is zeroed. > > Hi, > > I somehow was aware of this issue before while reading the code. > Thinking that we never met that, I delayed the fix (it was still > in my TODO list). > > Would you please tell me the steps (commands would be better) to > reproduce your issue? I'd like to know more about the isue: I'm > guessing maybe we need fix it with a bit more cares. 
> > --yliu > > > > Signed-off-by: Rich Lane > > --- > > lib/librte_vhost/vhost_rxtx.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/lib/librte_vhost/vhost_rxtx.c > b/lib/librte_vhost/vhost_rxtx.c > > index 9322ce6..d95b478 100644 > > --- a/lib/librte_vhost/vhost_rxtx.c > > +++ b/lib/librte_vhost/vhost_rxtx.c > > @@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq, > uint32_t id, > > vq->buf_vec[vec_id].desc_idx = idx; > > vec_id++; > > > > - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) { > > + if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id < > BUF_VECTOR_MAX) { > > idx = vq->desc[idx].next; > > next_desc = 1; > > } > > -- > > 1.9.1 >
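The overflow discussed in this thread happens when a descriptor chain is followed without a bound on the number of entries recorded. The sketch below is not the vhost code; it is a simplified illustration with a hypothetical `struct desc`, a `DESC_F_NEXT` stand-in for VRING_DESC_F_NEXT, and a caller-supplied `vec_max` playing the role of BUF_VECTOR_MAX. It also rejects out-of-range indices, which the one-line patch does not attempt.

```c
#include <stdint.h>

#define DESC_F_NEXT 1 /* stand-in for VRING_DESC_F_NEXT */

/* Simplified descriptor, for illustration only. */
struct desc {
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Follow a descriptor chain starting at head, recording each index in vec.
 * Stops when the chain ends, when vec is full, or when an index falls
 * outside the ring. Returns the number of entries recorded.
 */
static uint32_t
walk_chain(const struct desc *ring, uint16_t ring_size, uint16_t head,
	   uint16_t *vec, uint32_t vec_max)
{
	uint32_t n = 0;
	uint16_t idx = head;

	while (n < vec_max && idx < ring_size) {
		vec[n++] = idx;
		if (!(ring[idx].flags & DESC_F_NEXT))
			break;
		idx = ring[idx].next;
	}
	return n;
}
```

With the bound in place, a guest-created cycle of descriptors fills the vector once and then stops, instead of looping forever and writing past the end of the array.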
[dpdk-dev] [PATCH v6 0/8] add sample ptp slave application
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara > Sent: Thursday, November 12, 2015 12:56 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v6 0/8] add sample ptp slave application > > > Add a sample application that acts as a PTP slave using the DPDK IEEE1588 > functions. > > Also add some additional IEEE1588 support functions to enable getting, > setting and adjusting the device time. > > V5->v6: > - Moved common functionality for cyclecounter and time conversions >functions to lib/librte_eal/common/include/rte_time.h, based on mailing >list comments. > - Prefixed functions with rte_ and added Doxygen comments. > - Refactored cyclecounter structs from previous version to make it more >generic. > - Fix ieee1588 fwd output in testpmd. Series Acked-by: John McNamara
[dpdk-dev] [PATCH v2] mem: calculate space left in a hugetlbfs
Hi, On 12/11/2015 02:10, Jianfeng Tan wrote: > This patch enables calculating space left in a hugetlbfs. > There are three sources to get the information: 1. from > sysfs; 2. from option size specified when mount; 3. use > statfs. We should use the minimum one of these three sizes. We could improve the message by stating the current issue (when the hugetlbfs mount specifies size= option), then how the patch deals with the problem and also outstanding issues. > Signed-off-by: Jianfeng Tan > --- > Changes in v2: > - reword title > - fix compiler error of v1 > > lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 85 > - > 1 file changed, 84 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c > b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c > index 18858e2..8305a58 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c > +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c > @@ -44,6 +44,8 @@ > #include > #include > #include > +#include > +#include > > #include > #include > @@ -189,6 +191,70 @@ get_hugepage_dir(uint64_t hugepage_sz) > return retval; > } > > +/* Caller to make sure this mnt_dir exist > + */ > +static uint64_t > +get_hugetlbfs_mount_size(const char *mnt_dir) > +{ > + char *start, *end, *opt_size; > + struct mntent *ent; > + uint64_t size; > + FILE *f; > + int len; > + > + f = setmntent("/proc/mounts", "r"); > + if (f == NULL) { > + RTE_LOG(ERR, EAL, "setmntent() error: %s\n", > + strerror(errno)); > + return 0; > + } > + while (NULL != (ent = getmntent(f))) { > + if (!strcmp(ent->mnt_dir, mnt_dir)) > + break; > + } > + > + start = hasmntopt(ent, "size"); > + if (start == NULL) { > + RTE_LOG(DEBUG, EAL, "option size not specified for %s\n", > + mnt_dir); > + size = 0; > + goto end; > + } > + start += strlen("size="); > + end = strstr(start, ","); > + if (end != NULL) > + len = end - start; > + else > + len = strlen(start); > + opt_size = strndup(start, len); > + size = rte_str_to_size(opt_size); > + 
free(opt_size); > + > +end: > + endmntent(f); > + return size; > +} > + The function above is very similar to get_hugepage_dir, ie. open and parse /proc/mounts. I think it would be better to have a more generic function that retrieves all needed info from /proc/mounts. > +/* Caller to make sure this mount has option size > + * so that statfs is not zero. > + */ > +static uint64_t > +get_hugetlbfs_free_size(const char *mnt_dir) > +{ > + int r; > + struct statfs stats; > + > + r = statfs(mnt_dir, &stats); > + if (r != 0) { > + RTE_LOG(ERR, EAL, "statfs() error: %s\n", > + strerror(errno)); > + return 0; > + } > + > + return stats.f_bfree * stats.f_bsize; > +} > + > + > /* >* Clear the hugepage directory of whatever hugepage files >* there are. Checks if the file is locked (i.e. > @@ -329,9 +395,26 @@ eal_hugepage_info_init(void) > if (clear_hugedir(hpi->hugedir) == -1) > break; > > + /* there are three souces of how much space left in a > + * hugetlbfs dir. > + */ > + uint64_t sz_left, sz_sysfs, sz_option, sz_statfs; > + > + sz_sysfs = get_num_hugepages(dirent->d_name) * > + hpi->hugepage_sz; > + sz_left = sz_sysfs; > + sz_option = get_hugetlbfs_mount_size(hpi->hugedir); > + if (sz_option) { > + sz_statfs = get_hugetlbfs_free_size(hpi->hugedir); > + sz_left = RTE_MIN(sz_sysfs, sz_statfs); > + RTE_LOG(INFO, EAL, "sz_sysfs: %"PRIu64", sz_option: " > + "%"PRIu64", sz_statfs: %"PRIu64"\n", > + sz_sysfs, sz_option, sz_statfs); > + } > + > /* for now, put all pages into socket 0, >* later they will be sorted */ > - hpi->num_pages[0] = get_num_hugepages(dirent->d_name); > + hpi->num_pages[0] = sz_left / hpi->hugepage_sz; > > #ifndef RTE_ARCH_64 > /* for 32-bit systems, limit number of hugepages to A couple more things: - Update release-notes and/or relevant doc about improved detection of free hugepages - Update the status of previous/old patches in patchwork Sergio
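As a rough illustration of the size= handling discussed above, the sketch below extracts the option from a comma-separated mount option string without touching /proc/mounts. `mount_opt_size` is a hypothetical name, and the suffix handling (K, M, G only) is an assumption made for this example; the patch itself relies on hasmntopt() and rte_str_to_size().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value of the "size=" mount option in bytes, or 0 if the
 * option is absent or has no digits. opts is a comma-separated string
 * such as "rw,relatime,pagesize=2M,size=1G". Matching is done at the
 * start of each option so that "pagesize=" is not mistaken for "size=".
 */
static uint64_t
mount_opt_size(const char *opts)
{
	const char *p = opts;
	char *end;
	unsigned long long val;

	while (p != NULL && *p != '\0') {
		if (strncmp(p, "size=", 5) == 0)
			break;
		p = strchr(p, ',');
		if (p != NULL)
			p++;
	}
	if (p == NULL || *p == '\0')
		return 0;

	val = strtoull(p + 5, &end, 10);
	if (end == p + 5)
		return 0;

	/* Assumption: only binary K/M/G suffixes are handled here. */
	switch (*end) {
	case 'k': case 'K': val <<= 10; break;
	case 'm': case 'M': val <<= 20; break;
	case 'g': case 'G': val <<= 30; break;
	default: break;
	}
	return val;
}
```

A non-zero result would then gate the statfs() lookup, and the usable space is the minimum of the sysfs figure and the statfs figure, as in the patch.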
[dpdk-dev] [PATCH] fm10k: fix a crash bug when quit from testpmd
From: "Chen Jing D(Mark)" When the fm10k port is closed, both func tx_queue_clean() and fm10k_tx_queue_release_mbufs_vec() will try to release buffer in SW ring. The latter func won't do sanity check on those pointers and cause crash. The fix include 2 parts. 1. Remove Vector TX buffer release func since it can share the release functions with regular TX. 2. Add log to print out what actual Rx/Tx func is used. Signed-off-by: Chen Jing D(Mark) --- drivers/net/fm10k/fm10k.h |1 - drivers/net/fm10k/fm10k_ethdev.c | 17 - drivers/net/fm10k/fm10k_rxtx_vec.c | 28 3 files changed, 12 insertions(+), 34 deletions(-) diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h index 754aa6a..38d5489 100644 --- a/drivers/net/fm10k/fm10k.h +++ b/drivers/net/fm10k/fm10k.h @@ -237,7 +237,6 @@ struct fm10k_tx_queue { }; struct fm10k_txq_ops { - void (*release_mbufs)(struct fm10k_tx_queue *txq); void (*reset)(struct fm10k_tx_queue *txq); }; diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c index cf7ada7..af7b0c2 100644 --- a/drivers/net/fm10k/fm10k_ethdev.c +++ b/drivers/net/fm10k/fm10k_ethdev.c @@ -386,7 +386,6 @@ fm10k_check_mq_mode(struct rte_eth_dev *dev) } static const struct fm10k_txq_ops def_txq_ops = { - .release_mbufs = tx_queue_free, .reset = tx_queue_reset, }; @@ -1073,7 +1072,7 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev) for (i = 0; i < dev->data->nb_tx_queues; i++) { struct fm10k_tx_queue *txq = dev->data->tx_queues[i]; - txq->ops->release_mbufs(txq); + tx_queue_free(txq); } } @@ -1793,7 +1792,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_id, if (dev->data->tx_queues[queue_id] != NULL) { struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id]; - txq->ops->release_mbufs(txq); + tx_queue_free(txq); dev->data->tx_queues[queue_id] = NULL; } @@ -1872,7 +1871,7 @@ fm10k_tx_queue_release(void *queue) struct fm10k_tx_queue *q = queue; PMD_INIT_FUNC_TRACE(); - q->ops->release_mbufs(q); + tx_queue_free(q); } 
static int @@ -2439,13 +2438,16 @@ fm10k_set_tx_function(struct rte_eth_dev *dev) } if (use_sse) { + PMD_INIT_LOG(ERR, "Use vector Tx func"); for (i = 0; i < dev->data->nb_tx_queues; i++) { txq = dev->data->tx_queues[i]; fm10k_txq_vec_setup(txq); } dev->tx_pkt_burst = fm10k_xmit_pkts_vec; - } else + } else { dev->tx_pkt_burst = fm10k_xmit_pkts; + PMD_INIT_LOG(ERR, "Use regular Tx func"); + } } static void __attribute__((cold)) @@ -2469,6 +2471,11 @@ fm10k_set_rx_function(struct rte_eth_dev *dev) (dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec || dev->rx_pkt_burst == fm10k_recv_pkts_vec); + if (rx_using_sse) + PMD_INIT_LOG(ERR, "Use vector Rx func"); + else + PMD_INIT_LOG(ERR, "Use regular Rx func"); + for (i = 0; i < dev->data->nb_rx_queues; i++) { struct fm10k_rx_queue *rxq = dev->data->rx_queues[i]; diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 06beca9..6042568 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -45,8 +45,6 @@ #endif static void -fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq); -static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); /* Handling the offload flags (olflags) field takes computation @@ -634,7 +632,6 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue, } static const struct fm10k_txq_ops vec_txq_ops = { - .release_mbufs = fm10k_tx_queue_release_mbufs_vec, .reset = fm10k_reset_tx_queue, }; @@ -795,31 +792,6 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts, } static void __attribute__((cold)) -fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq) -{ - unsigned i; - const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1); - - if (txq->sw_ring == NULL || txq->nb_free == max_desc) - return; - - /* release the used mbufs in sw_ring */ - for (i = txq->next_dd - (txq->rs_thresh - 1); -i != txq->next_free; -i = (i + 1) & max_desc) - rte_pktmbuf_free_seg(txq->sw_ring[i]); - - txq->nb_free = max_desc; - - /* reset 
tx_entry */ - for (i = 0; i < txq->nb_desc; i++) - txq->sw_ring[i] = NULL; - - rte_free(txq->sw_ring); - txq->sw_ring = NULL; -} - -static void __attribute__((cold)) fm10k_reset_tx_queue(struct fm10k_tx_queue *txq) { static const st
[dpdk-dev] [PATCH v6 8/8] doc: add a ptpclient sample guide
From: Daniel Mrzyglod Add a sample app guide for the ptpclient application. Signed-off-by: Daniel Mrzyglod Signed-off-by: Pablo de Lara Reviewed-by: John McNamara --- doc/guides/sample_app_ug/img/ptpclient.svg | 524 + doc/guides/sample_app_ug/index.rst | 3 + doc/guides/sample_app_ug/ptpclient.rst | 306 + 3 files changed, 833 insertions(+) create mode 100644 doc/guides/sample_app_ug/img/ptpclient.svg create mode 100644 doc/guides/sample_app_ug/ptpclient.rst diff --git a/doc/guides/sample_app_ug/img/ptpclient.svg b/doc/guides/sample_app_ug/img/ptpclient.svg new file mode 100644 index 000..84f9c22 --- /dev/null +++ b/doc/guides/sample_app_ug/img/ptpclient.svg @@ -0,0 +1,524 @@ [SVG markup not recoverable from the archive. The figure (ptpclient.svg, 105mm x 148mm) is a PTP message sequence diagram between master and slave along a time axis, with the labels SYNC, FOLLOW UP:T1, DELAY REQUEST, DELAY RESPONSE:T4 and the timestamps T1, T2, T3, T4.] diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst index 9beedd9..8ae86c0 100644 --- a/doc/guides/sample_app_ug/index.rst +++ b/doc/guides/sample_app_ug/index.rst @@ -73,6 +73,7 @@ Sample Applications User Guide vm_power_management tep_termination proc_info +ptpclient **Figures** @@ -136,6 +137,8 @@ Sample Applications User Guide :numref:`figure_overlay_networking` :ref:`figure_overlay_networking` :numref:`figure_tep_termination_arch` 
:ref:`figure_tep_termination_arch` +:numref:`figure_ptpclient_highlevel` :ref:`figure_ptpclient_highlevel` + **Tables** :numref:`table_qos_metering_1` :ref:`table_qos_metering_1` diff --git a/doc/guides/sample_app_ug/ptpclient.rst b/doc/guides/sample_app_ug/ptpclient.rst new file mode 100644 index 000..6e425b7 --- /dev/null +++ b/doc/guides/sample_app_ug/ptpclient.rst @@ -0,0 +1,306 @@ +.. BSD LICENSE +Copyright(c) 2015 Intel Corporation. All rights reserved. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions +are met: + +* Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. +* Redistributions in binary form must reproduce the above copyright +notice, this list of conditions and the following disclaimer in +the documentation and/or other materials provided with the +distribution. +* Neither the name of Intel Corporation nor the names of its +contributors may be used to endorse or promote products derived +from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ + +PTP Client Sample Application += + +The PTP (Precision Time Protocol) client sample application is a simple +example of using the DPDK IEEE1588 API to communicate with a PTP master clock +to synchronize the time on the NIC and, optionally, on the Linux system. + +Note, PTP is a time syncing protocol and cannot be used within DPDK as a +time-stamping mechanism. See the following for an explanation of th
[dpdk-dev] [PATCH v6 7/8] example: minimal ptp client implementation
From: Daniel Mrzyglod Add a sample application that acts as a PTP slave using the DPDK ieee1588 functions. Signed-off-by: Daniel Mrzyglod Signed-off-by: Pablo de Lara Reviewed-by: John McNamara --- MAINTAINERS| 4 + examples/Makefile | 1 + examples/ptpclient/Makefile| 56 +++ examples/ptpclient/ptpclient.c | 780 + 4 files changed, 841 insertions(+) create mode 100644 examples/ptpclient/Makefile create mode 100644 examples/ptpclient/ptpclient.c diff --git a/MAINTAINERS b/MAINTAINERS index c8be5d2..28b04ae 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -520,3 +520,7 @@ F: examples/tep_termination/ F: examples/vmdq/ F: examples/vmdq_dcb/ F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst + +M: Pablo de Lara +M: Daniel Mrzyglod +F: examples/ptpclient diff --git a/examples/Makefile b/examples/Makefile index b4eddbd..4672534 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -74,5 +74,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen DIRS-y += vmdq DIRS-y += vmdq_dcb DIRS-$(CONFIG_RTE_LIBRTE_POWER) += vm_power_manager +DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient include $(RTE_SDK)/mk/rte.extsubdir.mk diff --git a/examples/ptpclient/Makefile b/examples/ptpclient/Makefile new file mode 100644 index 000..b77cf71 --- /dev/null +++ b/examples/ptpclient/Makefile @@ -0,0 +1,56 @@ +# BSD LICENSE +# +# Copyright(c) 2015 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. 
+# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overriddegitn by command line or environment +RTE_TARGET ?= x86_64-native-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = ptpclient + +# all source are stored in SRCS-y +SRCS-y := ptpclient.c + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +# workaround for a gcc bug with noreturn attribute +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +CFLAGS_main.o += -Wno-return-type +endif + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/ptpclient/ptpclient.c b/examples/ptpclient/ptpclient.c new file mode 100644 index 000..0af4f3b --- /dev/null +++ b/examples/ptpclient/ptpclient.c @@ -0,0 +1,780 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Intel Corporation. All rights reserved. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + *
[dpdk-dev] [PATCH v6 6/8] testpmd: add nanosecond output for ieee1588 fwd
Testpmd was only printing out second values when printing RX/TX timestamp value, instead of both second and nanoseconds. Since resolution of time counters is in nanoseconds, testpmd should print out both. Signed-off-by: Pablo de Lara Reviewed-by: John McNamara --- app/test-pmd/ieee1588fwd.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/app/test-pmd/ieee1588fwd.c b/app/test-pmd/ieee1588fwd.c index b1a301b..c69023a 100644 --- a/app/test-pmd/ieee1588fwd.c +++ b/app/test-pmd/ieee1588fwd.c @@ -89,8 +89,8 @@ port_ieee1588_rx_timestamp_check(portid_t pi, uint32_t index) (unsigned) pi); return; } - printf("Port %u RX timestamp value %lu\n", - (unsigned) pi, timestamp.tv_sec); + printf("Port %u RX timestamp value %lu s %lu ns\n", + (unsigned) pi, timestamp.tv_sec, timestamp.tv_nsec); } #define MAX_TX_TMST_WAIT_MICROSECS 1000 /**< 1 milli-second */ @@ -112,9 +112,9 @@ port_ieee1588_tx_timestamp_check(portid_t pi) (unsigned) pi, (unsigned) MAX_TX_TMST_WAIT_MICROSECS); return; } - printf("Port %u TX timestamp value %lu validated after " + printf("Port %u TX timestamp value %lu s %lu ns validated after " "%u micro-second%s\n", - (unsigned) pi, timestamp.tv_sec, wait_us, + (unsigned) pi, timestamp.tv_sec, timestamp.tv_nsec, wait_us, (wait_us == 1) ? "" : "s"); } -- 1.8.1.4
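The patch above prints both fields of the struct timespec with %lu. As a side note, the widths of time_t and tv_nsec vary between platforms, so a more portable way to produce the same "s ... ns" output is sketched below. `format_timestamp` is a hypothetical helper written for this example, not part of testpmd.

```c
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/*
 * Format a timestamp as "<sec> s <nsec> ns" into buf.
 * Both fields are cast to fixed, known types so the format string
 * matches on 32-bit and 64-bit builds. Returns snprintf()'s result.
 */
static int
format_timestamp(char *buf, size_t len, const struct timespec *ts)
{
	return snprintf(buf, len, "%" PRIdMAX " s %ld ns",
			(intmax_t)ts->tv_sec, (long)ts->tv_nsec);
}
```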
[dpdk-dev] [PATCH v6 5/8] i40e: add additional ieee1588 support functions
Add additional functions to support the existing IEEE1588 functionality and to enable getting, setting and adjusting the device time. Signed-off-by: Daniel Mrzyglod Signed-off-by: Pablo de Lara Reviewed-by: John McNamara --- drivers/net/i40e/i40e_ethdev.c | 147 +++-- drivers/net/i40e/i40e_ethdev.h | 6 +- 2 files changed, 132 insertions(+), 21 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index ddf3d38..d6b3311 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -125,11 +125,13 @@ (1UL << RTE_ETH_FLOW_NONFRAG_IPV6_OTHER) | \ (1UL << RTE_ETH_FLOW_L2_PAYLOAD)) -#define I40E_PTP_40GB_INCVAL 0x01ULL -#define I40E_PTP_10GB_INCVAL 0x03ULL -#define I40E_PTP_1GB_INCVAL 0x20ULL -#define I40E_PRTTSYN_TSYNENA 0x8000 -#define I40E_PRTTSYN_TSYNTYPE 0x0e00 +/* Additional timesync values. */ +#define I40E_PTP_40GB_INCVAL 0x01ULL +#define I40E_PTP_10GB_INCVAL 0x03ULL +#define I40E_PTP_1GB_INCVAL 0x20ULL +#define I40E_PRTTSYN_TSYNENA 0x8000 +#define I40E_PRTTSYN_TSYNTYPE0x0e00 +#define I40E_CYCLECOUNTER_MASK 0x #define I40E_MAX_PERCENT100 #define I40E_DEFAULT_DCB_APP_NUM1 @@ -400,11 +402,20 @@ static int i40e_timesync_read_rx_timestamp(struct rte_eth_dev *dev, static int i40e_timesync_read_tx_timestamp(struct rte_eth_dev *dev, struct timespec *timestamp); static void i40e_read_stats_registers(struct i40e_pf *pf, struct i40e_hw *hw); + +static int i40e_timesync_adjust_time(struct rte_eth_dev *dev, int64_t delta); + +static int i40e_timesync_read_time(struct rte_eth_dev *dev, + struct timespec *timestamp); +static int i40e_timesync_write_time(struct rte_eth_dev *dev, + const struct timespec *timestamp); + static int i40e_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id); static int i40e_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id); + static const struct rte_pci_id pci_id_i40e_map[] = { #define RTE_PCI_DEV_ID_DECL_I40E(vend, dev) {RTE_PCI_DEVICE(vend, dev)}, #include 
"rte_pci_dev_ids.h" @@ -469,6 +480,9 @@ static const struct eth_dev_ops i40e_eth_dev_ops = { .timesync_read_rx_timestamp = i40e_timesync_read_rx_timestamp, .timesync_read_tx_timestamp = i40e_timesync_read_tx_timestamp, .get_dcb_info = i40e_dev_get_dcb_info, + .timesync_adjust_time = i40e_timesync_adjust_time, + .timesync_read_time = i40e_timesync_read_time, + .timesync_write_time = i40e_timesync_write_time, }; /* store statistics names and its offset in stats structure */ @@ -7738,17 +7752,36 @@ i40e_mirror_rule_reset(struct rte_eth_dev *dev, uint8_t sw_id) return 0; } -static int -i40e_timesync_enable(struct rte_eth_dev *dev) +static uint64_t +i40e_read_cyclecounter(void *arg) { + struct rte_eth_dev *dev = (struct rte_eth_dev *) arg; struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); - struct rte_eth_link *link = &dev->data->dev_link; - uint32_t tsync_ctl_l; - uint32_t tsync_ctl_h; + uint64_t systim_cycles = 0; + + systim_cycles |= (uint64_t)I40E_READ_REG(hw, I40E_PRTTSYN_TIME_L); + systim_cycles |= (uint64_t)I40E_READ_REG(hw, I40E_PRTTSYN_TIME_H) + << 32; + + return systim_cycles; +} + +static void +i40e_start_cyclecounter(struct rte_eth_dev *dev) +{ + struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct i40e_adapter *adapter = + (struct i40e_adapter *)dev->data->dev_private; + struct rte_eth_link link; uint32_t tsync_inc_l; uint32_t tsync_inc_h; - switch (link->link_speed) { + /* Get current link speed. */ + memset(&link, 0, sizeof(link)); + i40e_dev_link_update(dev, 1); + rte_i40e_dev_atomic_read_link_status(dev, &link); + + switch (link.link_speed) { case ETH_LINK_SPEED_40G: tsync_inc_l = I40E_PTP_40GB_INCVAL & 0x; tsync_inc_h = I40E_PTP_40GB_INCVAL >> 32; @@ -7766,6 +7799,72 @@ i40e_timesync_enable(struct rte_eth_dev *dev) tsync_inc_h = 0x0; } + /* Set the timesync increment value. 
*/ + I40E_WRITE_REG(hw, I40E_PRTTSYN_INC_L, tsync_inc_l); + I40E_WRITE_REG(hw, I40E_PRTTSYN_INC_H, tsync_inc_h); + + memset(&adapter->tc, 0, sizeof(struct rte_timecounter)); + adapter->tc.read = i40e_read_cyclecounter; + adapter->tc.cc_mask = I40E_CYCLECOUNTER_MASK; + adapter->tc.cc_shift = 0; + adapter->tc.arg = dev; +} + +static int +i40e_timesyn
[dpdk-dev] [PATCH v6 4/8] igb: add additional ieee1588 support functions
Add additional functions to support the existing IEEE1588 functionality and to enable getting, setting and adjusting the device time. Signed-off-by: Daniel Mrzyglod Signed-off-by: Pablo de Lara Reviewed-by: John McNamara --- drivers/net/e1000/e1000_ethdev.h | 2 + drivers/net/e1000/igb_ethdev.c | 202 +-- 2 files changed, 194 insertions(+), 10 deletions(-) diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h index a667a1a..5401277 100644 --- a/drivers/net/e1000/e1000_ethdev.h +++ b/drivers/net/e1000/e1000_ethdev.h @@ -33,6 +33,7 @@ #ifndef _E1000_ETHDEV_H_ #define _E1000_ETHDEV_H_ +#include /* need update link, bit flag */ #define E1000_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0) @@ -257,6 +258,7 @@ struct e1000_adapter { struct e1000_vf_info*vfdata; struct e1000_filter_info filter; bool stopped; + struct rte_timecounter tc; }; #define E1000_DEV_PRIVATE(adapter) \ diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c index 2cb115c..ec2e79c 100644 --- a/drivers/net/e1000/igb_ethdev.c +++ b/drivers/net/e1000/igb_ethdev.c @@ -78,10 +78,11 @@ #define IGB_8_BIT_MASK UINT8_MAX /* Additional timesync values. 
*/ -#define E1000_ETQF_FILTER_1588 3 -#define E1000_TIMINCA_INCVALUE 1600 -#define E1000_TIMINCA_INIT ((0x02 << E1000_TIMINCA_16NS_SHIFT) \ - | E1000_TIMINCA_INCVALUE) +#define E1000_CYCLECOUNTER_MASK 0x +#define E1000_ETQF_FILTER_1588 3 +#define IGB_82576_TSYNC_SHIFT16 +#define E1000_INCPERIOD_82576(1 << E1000_TIMINCA_16NS_SHIFT) +#define E1000_INCVALUE_82576 (16 << IGB_82576_TSYNC_SHIFT) #define E1000_TSAUXC_DISABLE_SYSTIME 0x8000 static int eth_igb_configure(struct rte_eth_dev *dev); @@ -236,6 +237,11 @@ static int igb_timesync_read_rx_timestamp(struct rte_eth_dev *dev, uint32_t flags); static int igb_timesync_read_tx_timestamp(struct rte_eth_dev *dev, struct timespec *timestamp); +static int igb_timesync_adjust_time(struct rte_eth_dev *dev, int64_t delta); +static int igb_timesync_read_time(struct rte_eth_dev *dev, + struct timespec *timestamp); +static int igb_timesync_write_time(struct rte_eth_dev *dev, + const struct timespec *timestamp); static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id); static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev, @@ -349,6 +355,9 @@ static const struct eth_dev_ops eth_igb_ops = { .get_eeprom_length= eth_igb_get_eeprom_length, .get_eeprom = eth_igb_get_eeprom, .set_eeprom = eth_igb_set_eeprom, + .timesync_adjust_time = igb_timesync_adjust_time, + .timesync_read_time = igb_timesync_read_time, + .timesync_write_time = igb_timesync_write_time, }; /* @@ -4182,20 +4191,151 @@ eth_igb_set_mc_addr_list(struct rte_eth_dev *dev, return 0; } +static uint64_t +igb_read_cyclecounter(void *arg) +{ + struct rte_eth_dev *dev = (struct rte_eth_dev *) arg; + struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private); + uint64_t systime_cycles = 0; + + switch (hw->mac.type) { + case e1000_i210: + case e1000_i211: + /* +* Need to read System Time Residue Register to be able +* to read the other two registers. 
+*/ + E1000_READ_REG(hw, E1000_SYSTIMR); + /* SYSTIMEL stores ns and SYSTIMEH stores seconds. */ + systime_cycles = (uint64_t)E1000_READ_REG(hw, E1000_SYSTIML); + systime_cycles += (uint64_t)E1000_READ_REG(hw, E1000_SYSTIMH) + * NSEC_PER_SEC; + break; + case e1000_82580: + case e1000_i350: + case e1000_i354: + /* +* Need to read System Time Residue Register to be able +* to read the other two registers. +*/ + E1000_READ_REG(hw, E1000_SYSTIMR); + systime_cycles |= (uint64_t)E1000_READ_REG(hw, E1000_SYSTIML); + /* Only the 8 LSB are valid. */ + systime_cycles |= (uint64_t)(E1000_READ_REG(hw, E1000_SYSTIMH) + & 0xff) << 32; + break; + default: + systime_cycles |= (uint64_t)E1000_READ_REG(hw, E1000_SYSTIML); + systime_cycles |= (uint64_t)E1000_READ_REG(hw, E1000_SYSTIMH) + << 32; + break; + } + + return systime_cycles; +} + +static void +igb_start_cyclecounter(struct rte_eth_dev *dev) +{ + struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private); +
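Note on the register layouts handled by igb_read_cyclecounter() in the patch above: the three switch cases differ only in how the low and high SYSTIM words are combined. The following is a minimal standalone sketch of that arithmetic, not driver code. The enum and the function name combine_systime() are hypothetical, and NSEC_PER_SEC is assumed to be one billion.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL /* assumed value: nanoseconds per second */

/* Hypothetical stand-in for the hw->mac.type cases in the patch. */
enum systime_layout {
	SYSTIME_SEC_NSEC, /* i210/i211: high word = seconds, low word = ns */
	SYSTIME_40BIT,    /* 82580/i350/i354: only 8 LSB of high word valid */
	SYSTIME_64BIT     /* default: plain 64-bit cycle counter */
};

/* Combine the two 32-bit register reads into one 64-bit value. */
static uint64_t
combine_systime(enum systime_layout layout, uint32_t low, uint32_t high)
{
	switch (layout) {
	case SYSTIME_SEC_NSEC:
		/* Result is already in nanoseconds. */
		return (uint64_t)high * NSEC_PER_SEC + low;
	case SYSTIME_40BIT:
		/* Mask off the invalid upper 24 bits of the high word. */
		return ((uint64_t)(high & 0xff) << 32) | low;
	default:
		return ((uint64_t)high << 32) | low;
	}
}
```

Casting the high word to uint64_t before shifting or multiplying matters: without it the arithmetic would be done in 32 bits and the upper part would be lost.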
[dpdk-dev] [PATCH v6 3/8] ixgbe: add additional ieee1588 support functions
From: Daniel Mrzyglod Add additional functions to support the existing IEEE1588 functionality and to enable getting, setting and adjusting the device time. Signed-off-by: Daniel Mrzyglod Signed-off-by: Pablo de Lara Reviewed-by: John McNamara --- drivers/net/ixgbe/ixgbe_ethdev.c | 187 --- drivers/net/ixgbe/ixgbe_ethdev.h | 2 + 2 files changed, 178 insertions(+), 11 deletions(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 0b0bbcf..91a903d 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -126,10 +126,17 @@ #define IXGBE_HKEY_MAX_INDEX 10 /* Additional timesync values. */ -#define IXGBE_TIMINCA_16NS_SHIFT 24 -#define IXGBE_TIMINCA_INCVALUE 1600 -#define IXGBE_TIMINCA_INIT ((0x02 << IXGBE_TIMINCA_16NS_SHIFT) \ - | IXGBE_TIMINCA_INCVALUE) +#define NSEC_PER_SEC 10L +#define IXGBE_INCVAL_10GB0x +#define IXGBE_INCVAL_1GB 0x4000 +#define IXGBE_INCVAL_100 0x5000 +#define IXGBE_INCVAL_SHIFT_10GB 28 +#define IXGBE_INCVAL_SHIFT_1GB 24 +#define IXGBE_INCVAL_SHIFT_100 21 +#define IXGBE_INCVAL_SHIFT_82599 7 +#define IXGBE_INCPER_SHIFT_82599 24 + +#define IXGBE_CYCLECOUNTER_MASK 0x static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev); static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev); @@ -325,6 +332,11 @@ static int ixgbe_timesync_read_rx_timestamp(struct rte_eth_dev *dev, uint32_t flags); static int ixgbe_timesync_read_tx_timestamp(struct rte_eth_dev *dev, struct timespec *timestamp); +static int ixgbe_timesync_adjust_time(struct rte_eth_dev *dev, int64_t delta); +static int ixgbe_timesync_read_time(struct rte_eth_dev *dev, + struct timespec *timestamp); +static int ixgbe_timesync_write_time(struct rte_eth_dev *dev, + const struct timespec *timestamp); /* * Define VF Stats MACRO for Non "cleared on read" register @@ -480,6 +492,9 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { .get_eeprom = ixgbe_get_eeprom, .set_eeprom = ixgbe_set_eeprom, .get_dcb_info = 
ixgbe_dev_get_dcb_info, + .timesync_adjust_time = ixgbe_timesync_adjust_time, + .timesync_read_time = ixgbe_timesync_read_time, + .timesync_write_time = ixgbe_timesync_write_time, }; /* @@ -5608,20 +5623,147 @@ ixgbe_dev_set_mc_addr_list(struct rte_eth_dev *dev, ixgbe_dev_addr_list_itr, TRUE); } +static uint64_t +ixgbe_read_cyclecounter(void *arg) +{ + struct rte_eth_dev *dev = (struct rte_eth_dev *) arg; + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); + uint64_t systime_cycles = 0; + + switch (hw->mac.type) { + case ixgbe_mac_X550: + /* SYSTIMEL stores ns and SYSTIMEH stores seconds. */ + systime_cycles = (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIML); + systime_cycles += (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIMH) + * NSEC_PER_SEC; + break; + default: + systime_cycles |= (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIML); + systime_cycles |= (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIMH) + << 32; + } + + return systime_cycles; +} + +static void +ixgbe_start_cyclecounter(struct rte_eth_dev *dev) +{ + struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); + struct ixgbe_adapter *adapter = + (struct ixgbe_adapter *)dev->data->dev_private; + struct rte_eth_link link; + uint32_t incval = 0; + uint32_t shift = 0; + + /* Get current link speed. */ + memset(&link, 0, sizeof(link)); + ixgbe_dev_link_update(dev, 1); + rte_ixgbe_dev_atomic_read_link_status(dev, &link); + + switch (link.link_speed) { + case ETH_LINK_SPEED_100: + incval = IXGBE_INCVAL_100; + shift = IXGBE_INCVAL_SHIFT_100; + break; + case ETH_LINK_SPEED_1000: + incval = IXGBE_INCVAL_1GB; + shift = IXGBE_INCVAL_SHIFT_1GB; + break; + case ETH_LINK_SPEED_1: + default: + incval = IXGBE_INCVAL_10GB; + shift = IXGBE_INCVAL_SHIFT_10GB; + break; + } + + switch (hw->mac.type) { + case ixgbe_mac_X550: + /* Independent of link speed. */ + incval = 1; + /* Cycles read will be interpreted as ns. 
*/ + shift = 0; + /* Fall-through */ + case ixgbe_mac_X540: + IXGBE_WRITE_REG(hw, IXGBE_TIMINCA, incval); + break; +
[dpdk-dev] [PATCH v6 2/8] eal: add common time structures and functions
From: Daniel Mrzyglod

Add common functions and structures to handle time and cycle counts,
which will be used for PTP processing.

Signed-off-by: Daniel Mrzyglod
Signed-off-by: Pablo de Lara
Reviewed-by: John McNamara
---
 lib/librte_eal/common/Makefile           |   2 +-
 lib/librte_eal/common/include/rte_time.h | 210 +++
 2 files changed, 211 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/common/include/rte_time.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 0c43d6a..8508473 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -40,7 +40,7 @@ INC += rte_string_fns.h rte_version.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
 INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
-INC += rte_malloc.h
+INC += rte_malloc.h rte_time.h
 
 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_time.h b/lib/librte_eal/common/include/rte_time.h
new file mode 100644
index 000..33f3038
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_time.h
@@ -0,0 +1,210 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define NSEC_PER_SEC 1000000000L
+
+/**
+ * @internal
+ *
+ * Structure to hold the parameters of a running cycle counter to assist
+ * in converting cycles to nanoseconds.
+ */
+struct rte_timecounter {
+	/** Last cycle counter value read. */
+	uint64_t cycle_last;
+	/** Nanoseconds count. */
+	uint64_t nsec;
+	/** Bitmask separating nanosecond and sub-nanoseconds. */
+	uint64_t nsec_mask;
+	/** Sub-nanoseconds count. */
+	uint64_t nsec_frac;
+	/** Reads the current cycle counter value. */
+	uint64_t (*read)(void *arg);
+	/** Bitmask for two's complement subtraction of non-64 bit counters. */
+	uint64_t cc_mask;
+	/** Cycle to nanosecond divisor (power of two). */
+	uint32_t cc_shift;
+	/** Argument of read() function pointer. */
+	void *arg;
+};
+
+/**
+ * @internal
+ *
+ * Initialize the rte_timecounter structure.
+ */
+static inline void
+rte_timecounter_init(struct rte_timecounter *tc, uint64_t start_time)
+{
+	tc->cycle_last = tc->read(tc->arg);
+	tc->nsec = start_time;
+	tc->nsec_mask = (1ULL << tc->cc_shift) - 1;
+	tc->nsec_frac = 0;
+}
+
+/**
+ * @internal
+ *
+ * Converts cyclecounter cycles to nanoseconds.
+ */
+static inline uint64_t
+rte_cyclecounter_cycles_to_ns(uint64_t cycles, uint64_t *frac,
+			      uint32_t shift, uint64_t mask)
+{
+	uint64_t ns;
+
+	/* Add fractional nanoseconds. */
+	ns = cycles + *frac;
+	*frac = ns & mask;
+
+	/* Shift to get only nanoseconds. */
+	return ns >> shift;
+}
+
+/**
+ * @internal
+ *
+ * Similar to rte_cyclecounter_cycles_to_ns(), but this is used when computing
+ * a time previous to the time stored in the cycle counter.
+ */
+static inline uint64_t
+rte_cyclecounter_cycles_to_ns_previous(uint64_t cycles, uint64_t frac,
+				       uint32_t shift)
+{
+	return ((cycles - frac) >> shift);
+}
+
+/**
+ * @internal
+ *
+ * Converts cycle units into nanoseconds and adds to the previ
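Note on the conversion in rte_cyclecounter_cycles_to_ns(): each call carries the sub-nanosecond remainder into the next one, so no time is lost to rounding across calls. The sketch below reproduces that arithmetic standalone, with a small worked example in the comments. The function names are hypothetical, and cycle_delta() is only an assumption about how the cc_mask field is meant to be used, since the part of the patch that would show it is truncated here.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for rte_cyclecounter_cycles_to_ns().
 * With shift = 4, one nanosecond equals 16 cycles:
 *   call 1: cycles = 20, *frac = 0 -> returns 1, *frac = 4
 *   call 2: cycles = 12, *frac = 4 -> returns 1, *frac = 0
 * Without the carried remainder, call 2 would return 12 >> 4 = 0.
 */
static uint64_t
cycles_to_ns(uint64_t cycles, uint64_t *frac, uint32_t shift)
{
	uint64_t mask = (1ULL << shift) - 1; /* low bits hold sub-ns part */
	uint64_t ns = cycles + *frac;        /* carry in previous remainder */

	*frac = ns & mask;                   /* keep the new remainder */
	return ns >> shift;                  /* whole nanoseconds only */
}

/*
 * Assumed use of cc_mask: subtracting two reads of a counter narrower
 * than 64 bits and masking the result gives the right delta even when
 * the counter wrapped between the two reads.
 */
static uint64_t
cycle_delta(uint64_t now, uint64_t last, uint64_t cc_mask)
{
	return (now - last) & cc_mask;
}
```

The shift must be smaller than 64 for the mask computation to be well defined.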
[dpdk-dev] [PATCH v6 1/8] ethdev: add additional ieee1588 support functions
From: Daniel Mrzyglod Add additional functions to support the existing IEEE1588 functionality. * rte_eth_timesync_write_time(): set the device clock time. * rte_eth_timesync_read_time(): get the device clock time. * rte_eth_timesync_adjust_time(): adjust the device clock time. Signed-off-by: Daniel Mrzyglod Signed-off-by: Pablo de Lara Reviewed-by: John McNamara --- doc/guides/rel_notes/release_2_2.rst | 4 ++ lib/librte_ether/rte_ethdev.c | 36 + lib/librte_ether/rte_ethdev.h | 71 ++ lib/librte_ether/rte_ether_version.map | 3 ++ 4 files changed, 114 insertions(+) diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst index 59dda59..2ef6c29 100644 --- a/doc/guides/rel_notes/release_2_2.rst +++ b/doc/guides/rel_notes/release_2_2.rst @@ -94,6 +94,10 @@ New Features * **Added port hotplug support to xenvirt.** +* **Added API in in ethdev to support IEEE1588.** + + Added functions to read and write and adjust system time in the NIC. + Resolved Issues --- diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index e0e1dca..daca6fa 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -3193,6 +3193,42 @@ rte_eth_timesync_read_tx_timestamp(uint8_t port_id, struct timespec *timestamp) } int +rte_eth_timesync_adjust_time(uint8_t port_id, int64_t delta) +{ + struct rte_eth_dev *dev; + + VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = &rte_eth_devices[port_id]; + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_adjust_time, -ENOTSUP); + return (*dev->dev_ops->timesync_adjust_time)(dev, delta); +} + +int +rte_eth_timesync_read_time(uint8_t port_id, struct timespec *timestamp) +{ + struct rte_eth_dev *dev; + + VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = &rte_eth_devices[port_id]; + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_read_time, -ENOTSUP); + return (*dev->dev_ops->timesync_read_time)(dev, timestamp); +} + +int +rte_eth_timesync_write_time(uint8_t port_id, const struct timespec 
*timestamp) +{ + struct rte_eth_dev *dev; + + VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = &rte_eth_devices[port_id]; + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_write_time, -ENOTSUP); + return (*dev->dev_ops->timesync_write_time)(dev, timestamp); +} + +int rte_eth_dev_get_reg_length(uint8_t port_id) { struct rte_eth_dev *dev; diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 48a540d..b7be4b8 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -1206,6 +1206,17 @@ typedef int (*eth_timesync_read_tx_timestamp_t)(struct rte_eth_dev *dev, struct timespec *timestamp); /**< @internal Function used to read a TX IEEE1588/802.1AS timestamp. */ +typedef int (*eth_timesync_adjust_time)(struct rte_eth_dev *dev, int64_t); +/**< @internal Function used to adjust the device clock */ + +typedef int (*eth_timesync_read_time)(struct rte_eth_dev *dev, + struct timespec *timestamp); +/**< @internal Function used to get time from the device clock. */ + +typedef int (*eth_timesync_write_time)(struct rte_eth_dev *dev, + const struct timespec *timestamp); +/**< @internal Function used to get time from the device clock */ + typedef int (*eth_get_reg_length_t)(struct rte_eth_dev *dev); /**< @internal Retrieve device register count */ @@ -1400,6 +1411,12 @@ struct eth_dev_ops { /** Get DCB information */ eth_get_dcb_info get_dcb_info; + /** Adjust the device clock.*/ + eth_timesync_adjust_time timesync_adjust_time; + /** Get the device clock time. */ + eth_timesync_read_time timesync_read_time; + /** Set the device clock time. */ + eth_timesync_write_time timesync_write_time; }; /** @@ -3755,6 +3772,60 @@ extern int rte_eth_timesync_read_tx_timestamp(uint8_t port_id, struct timespec *timestamp); /** + * Adjust the timesync clock on an Ethernet device. + * + * This is usually used in conjunction with other Ethdev timesync functions to + * synchronize the device time using the IEEE1588/802.1AS protocol. 
+ * + * @param port_id + * The port identifier of the Ethernet device. + * @param delta + * The adjustment in nanoseconds. + * + * @return + * - 0: Success. + * - -ENODEV: The port ID is invalid. + * - -ENOTSUP: The function is not supported by the Ethernet driver. + */ +extern int rte_eth_timesync_adjust_time(uint8_t port_id, int64_t delta); + +/** + * Read the time from the timesync clock on an Ethernet device. + * + * This is usually used in conjunction with other Ethdev timesync functions to + * synchronize the device time using the IEEE1588/802.1AS protoco
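Note on the dispatch pattern used by the three new rte_eth_timesync_*() wrappers: each one validates the port, checks that the driver filled in the callback, and only then calls it. The following is a self-contained sketch of that pattern with mock types. Everything here (struct dev, struct dev_ops, MAX_PORTS, demo_adjust) is hypothetical and only mirrors the shape of the ethdev code; it is not the DPDK API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical size of the device table */

struct dev;

/* Per-driver callback table; a NULL entry means "not supported". */
struct dev_ops {
	int (*timesync_adjust_time)(struct dev *dev, int64_t delta);
};

struct dev {
	const struct dev_ops *ops; /* NULL when the port is unused */
	int64_t clock_ns;          /* mock device clock */
};

static struct dev devices[MAX_PORTS];

/* Generic wrapper: validate the port, then the callback, then dispatch. */
static int
timesync_adjust_time(uint8_t port_id, int64_t delta)
{
	struct dev *dev;

	if (port_id >= MAX_PORTS || devices[port_id].ops == NULL)
		return -ENODEV;
	dev = &devices[port_id];

	if (dev->ops->timesync_adjust_time == NULL)
		return -ENOTSUP;
	return dev->ops->timesync_adjust_time(dev, delta);
}

/* Mock driver callback: apply the signed nanosecond delta. */
static int
demo_adjust(struct dev *dev, int64_t delta)
{
	dev->clock_ns += delta;
	return 0;
}

static const struct dev_ops demo_ops = { .timesync_adjust_time = demo_adjust };
static const struct dev_ops empty_ops = { NULL };
```

A caller would point devices[n].ops at a driver table and then treat -ENOTSUP as "this NIC has no adjustable clock" rather than as a hard failure.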
[dpdk-dev] [PATCH v6 0/8] add sample ptp slave application
Add a sample application that acts as a PTP slave using the DPDK IEEE1588 functions. Also add some additional IEEE1588 support functions to enable getting, setting and adjusting the device time. V5->v6: - Moved common functionality for cyclecounter and time conversions functions to lib/librte_eal/common/include/rte_time.h, based on mailing list comments. - Prefixed functions with rte_ and added Doxygen comments. - Refactored cyclecounter structs from previous version to make it more generic. - Fix ieee1588 fwd output in testpmd. V4->v5: - rebase to the current master V3->V4: Doc: - Update documentation for ptpclient - fix: put information about ptp application in correct place V2->V3: PMD: - move common structures and functions for PTP protocol to librte_net/rte_ptp.h V1->V2: PMDs: - add support for e1000 - add support for ixgbe - add support for i40 ethdev: - change function names to more proper Doc: - add documentation for ptpclient sample: - add kernel adjustment option - add portmask option to provide portmask to application Daniel Mrzyglod (5): ethdev: add additional ieee1588 support functions eal: add common time structures and functions ixgbe: add additional ieee1588 support functions doc: add a ptpclient sample guide example: minimal ptp client implementation Pablo de Lara (3): igb: add additional ieee1588 support functions i40e: add additional ieee1588 support functions testpmd: add nanosecond output for ieee1588 fwd MAINTAINERS| 4 + app/test-pmd/ieee1588fwd.c | 8 +- doc/guides/rel_notes/release_2_2.rst | 4 + doc/guides/sample_app_ug/img/ptpclient.svg | 524 +++ doc/guides/sample_app_ug/index.rst | 3 + doc/guides/sample_app_ug/ptpclient.rst | 306 +++ drivers/net/e1000/e1000_ethdev.h | 2 + drivers/net/e1000/igb_ethdev.c | 202 +++- drivers/net/i40e/i40e_ethdev.c | 147 +- drivers/net/i40e/i40e_ethdev.h | 6 +- drivers/net/ixgbe/ixgbe_ethdev.c | 187 ++- drivers/net/ixgbe/ixgbe_ethdev.h | 2 + examples/Makefile | 1 + examples/ptpclient/Makefile| 56 +++ 
examples/ptpclient/ptpclient.c | 780 + lib/librte_eal/common/Makefile | 2 +- lib/librte_eal/common/include/rte_time.h | 210 lib/librte_ether/rte_ethdev.c | 36 ++ lib/librte_ether/rte_ethdev.h | 71 +++ lib/librte_ether/rte_ether_version.map | 3 + 20 files changed, 2507 insertions(+), 47 deletions(-) create mode 100644 doc/guides/sample_app_ug/img/ptpclient.svg create mode 100644 doc/guides/sample_app_ug/ptpclient.rst create mode 100644 examples/ptpclient/Makefile create mode 100644 examples/ptpclient/ptpclient.c create mode 100644 lib/librte_eal/common/include/rte_time.h -- 1.8.1.4
[dpdk-dev] [PATCH v3 2/2] vhost: Add VHOST PMD
Hi Tetsuya, In my test I created 2 vdev using "--vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' --vdev 'eth_vhost1,iface=/tmp/sock1,queues=1'", and the qemu message got handled in wrong order. The reason is that: 2 threads are created to handle message from 2 sockets, but their fds are SHARED, so each thread are reading from both sockets. This can lead to incorrect behaviors, in my case sometimes the VHOST_USER_SET_MEM_TABLE got handled after VRING initialization and lead to destroy_device(). Detailed log as shown below: thread 69351 & 69352 are both reading fd 25. Thanks Yuanhan for helping debugging! Thanks Zhihong - > debug: setting up new vq conn for fd: 23, tid: 69352 VHOST_CONFIG: new virtio connection is 25 VHOST_CONFIG: new device, handle is 0 > debug: vserver_message_handler thread id: 69352, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_OWNER > debug: vserver_message_handler thread id: 69352, fd: 25 VHOST_CONFIG: read message VHOST_USER_GET_FEATURES > debug: vserver_message_handler thread id: 69352, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL VHOST_CONFIG: vring call idx:0 file:26 > debug: vserver_message_handler thread id: 69352, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL VHOST_CONFIG: vring call idx:1 file:27 > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL VHOST_CONFIG: vring call idx:0 file:28 > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL VHOST_CONFIG: vring call idx:1 file:26 > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_FEATURES > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE > debug: device_fh: 0: user_set_mem_table VHOST_CONFIG: mapped region 0 fd:27 to 0x7ff6c000 sz:0xa off:0x0 VHOST_CONFIG: mapped region 1 fd:29 to 0x7ff68000 sz:0x4000 off:0xc > debug: 
vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK VHOST_CONFIG: vring kick idx:0 file:30 > debug: vserver_message_handler thread id: 69352, fd: 25 VHOST_CONFIG: virtio is not ready for processing. > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR > debug: vserver_message_handler thread id: 69351, fd: 25 VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK VHOST_CONFIG: vring kick idx:1 file:31 VHOST_CONFIG: virtio is now ready for processing. PMD: New connection established VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM - > ... > + > +static void *vhost_driver_session(void *param __rte_unused) > +{ > + static struct virtio_net_device_ops *vhost_ops; > + > + vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0); > + if (vhost_ops == NULL) > + rte_panic("Can't allocate memory\n"); > + > + /* set vhost arguments */ > + vhost_ops->new_device = new_device; > + vhost_ops->destroy_device = destroy_device; > + if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0) > + rte_panic("Can't register callbacks\n"); > + > + /* start event handling */ > + rte_vhost_driver_session_start(); > + > + rte_free(vhost_ops); > + pthread_exit(0); > +} > + > +static void vhost_driver_session_start(struct pmd_internal *internal) > +{ > + int ret; > + > + ret = pthread_create(&internal->session_th, > + NULL, vhost_driver_session, NULL); > + if (ret) > + rte_panic("Can't create a thread\n"); > +} > + > ...
[dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned with hugepage size
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, November 12, 2015 7:19 PM
> To: Tan, Jianfeng
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned
> with hugepage size
>
> 2015-11-12 06:04, Jianfeng Tan:
> > -	alignment = region[idx].blksz;
> > -	munmap((void *)(uintptr_t)
> > -		RTE_ALIGN_FLOOR(
> > -			region[idx].mapped_address, alignment),
> > -		RTE_ALIGN_CEIL(
> > -			region[idx].mapped_size, alignment));
> > +	munmap((void *)region[idx].mapped_address,
> > +		region[idx].mapped_size);
>
> Sorry, it does not compile for 32-bit:
> virtio-net-user.c:84:11: error: cast to pointer from integer of different size

Oops, sorry, should use (void *)(uintptr_t). I'll resend this patch.

Jianfeng
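Note on the 32-bit build error discussed above: a minimal sketch of the cast, assuming the address is stored as a 64-bit integer (the struct and function names below are hypothetical). Casting a uint64_t directly to a pointer triggers the warning on 32-bit targets because the sizes differ; going through uintptr_t first converts to an integer of pointer width, which can then be cast to a pointer.

```c
#include <stdint.h>

/* Hypothetical region record; the address is kept as a 64-bit integer. */
struct region {
	uint64_t mapped_address;
	uint64_t mapped_size;
};

static void *
region_ptr(const struct region *r)
{
	/*
	 * (void *)r->mapped_address would be a cast from a 64-bit integer
	 * to a 32-bit pointer on 32-bit builds. The intermediate uintptr_t
	 * cast makes the integer pointer-sized first.
	 */
	return (void *)(uintptr_t)r->mapped_address;
}
```

The same two-step cast works unchanged on 64-bit builds, where uintptr_t and uint64_t have the same width.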
[dpdk-dev] [PATCH] maintainers: Add maintainers for enic PMD
> --- a/MAINTAINERS > +++ b/MAINTAINERS > Cisco enic > +M: John Daley > +M: Sujith Sankar > F: drivers/net/enic/ Welcome :) Now as we officially have some maintainers for enic, please could you consider writing doc/guides/nics/enic.rst? Thanks
[dpdk-dev] [PATCH] maintainers: claim to be reviewer of virtio/vhost component
2015-11-12 12:10, Yuanhan Liu:
> Firstly, Changchun's email address has been invalid for a while.
>
> Secondly, I'd like to take the responsibility to review patches
> of virtio/vhost component.
[...]
>  RedHat virtio
>  M: Huawei Xie
> -M: Changchun Ouyang
> +M: Yuanhan Liu
>  F: drivers/net/virtio/
>  F: doc/guides/nics/virtio.rst
>  F: lib/librte_vhost/

Again, thanks Yuanhan for the excellent contributions, and welcome as a new maintainer!

Changchun, you are still welcome with a new email address if you have some time.
[dpdk-dev] [PATCH] vhost: reset device properly
> > Currently, we reset all fields of a device to zero when reset > > happens, which is wrong, since for some fields like device_fh, > > ifname, and virt_qp_nb, they should be same and be kept after > > reset until the device is removed. And this is what's the new > > helper function reset_device() for. > > > > And use rte_zmalloc() instead of rte_malloc, so that we could > > avoid init_device(), which basically dose zero reset only so far. > > Hence, init_device() is dropped in this patch. > > > > This patch also removes a hack of using the offset a specific > > field (which is virtqueue now) inside of `virtio_net' structure > > to do reset, which could be broken easily if someone changed the > > field order without caution. > > > > Cc: Tetsuya Mukawa > > Cc: Xie Huawei > > Signed-off-by: Yuanhan Liu > > > > I had a patch that just saved the ifname but this is much better. > > Acked-by: Rich Lane Applied, thanks
[dpdk-dev] [PATCH] vhost: make destroy callback on VHOST_USER_RESET_OWNER
2015-11-10 10:25, Yuanhan Liu: > On Mon, Nov 09, 2015 at 06:15:13PM -0800, Rich Lane wrote: > > QEMU sends this message first when shutting down. There was previously no > > way > > for the dataplane to know that the virtio_net instance had become unusable > > and > > it would segfault when trying to do RX/TX. > > > > Signed-off-by: Rich Lane > > Thanks. Even I have same patch in my patch queue (I have some other > issues to fix), you got my ack. > > Acked-by: Yuanhan Liu Applied, thanks
[dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned with hugepage size
2015-11-12 06:04, Jianfeng Tan: > - alignment = region[idx].blksz; > - munmap((void *)(uintptr_t) > - RTE_ALIGN_FLOOR( > - region[idx].mapped_address, alignment), > - RTE_ALIGN_CEIL( > - region[idx].mapped_size, alignment)); > + munmap((void *)region[idx].mapped_address, > + region[idx].mapped_size); Sorry, it does not compile for 32-bit: virtio-net-user.c:84:11: error: cast to pointer from integer of different size
[dpdk-dev] [PATCH] i40e: fix the issue of trying more VSIs for VMDq than available
2015-11-12 15:09, Helin Zhang: > It fixes the issue of trying to allocate more VSIs for VMDq than > hardware remaining. It adds a check of the hardware remaining > before allocating VSIs for VMDq. > > Fixes: c80707a0fd9c ("i40e: fix VMDq pool limit") > > Signed-off-by: Helin Zhang Applied, thanks
[dpdk-dev] [PATCH] vhost: reset device properly
Currently, we reset all fields of a device to zero when a reset
happens, which is wrong: some fields, such as device_fh, ifname,
and virt_qp_nb, should stay the same and be kept after reset
until the device is removed. This is what the new helper
function reset_device() is for.

Also use rte_zmalloc() instead of rte_malloc(), so that we can
avoid init_device(), which so far basically only does a zero reset.
Hence, init_device() is dropped in this patch.

This patch also removes a hack of using the offset of a specific
field (which is virtqueue now) inside the `virtio_net' structure
to do the reset, which could be broken easily if someone changed
the field order without caution.

Cc: Tetsuya Mukawa
Cc: Xie Huawei
Signed-off-by: Yuanhan Liu
---

This patch is based on:

    http://dpdk.org/dev/patchwork/patch/8818/

---
 lib/librte_vhost/virtio-net.c | 27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 39a6a5e..cc917da 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -204,6 +204,7 @@ cleanup_device(struct virtio_net *dev)
 		munmap((void *)(uintptr_t)dev->mem->mapped_address,
 			(size_t)dev->mem->mapped_size);
 		free(dev->mem);
+		dev->mem = NULL;
 	}
 
 	for (i = 0; i < dev->virt_qp_nb; i++) {
@@ -306,20 +307,18 @@ alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
 }
 
 /*
- * Initialise all variables in device structure.
+ * Reset some variables in device structure, while keeping few
+ * others untouched, such as device_fh, ifname, virt_qp_nb: they
+ * should be same unless the device is removed.
  */
 static void
-init_device(struct virtio_net *dev)
+reset_device(struct virtio_net *dev)
 {
-	int vq_offset;
 	uint32_t i;
 
-	/*
-	 * Virtqueues have already been malloced so
-	 * we don't want to set them to NULL.
-	 */
-	vq_offset = offsetof(struct virtio_net, virtqueue);
-	memset(dev, 0, vq_offset);
+	dev->features = 0;
+	dev->protocol_features = 0;
+	dev->flags = 0;
 
 	for (i = 0; i < dev->virt_qp_nb; i++)
 		init_vring_queue_pair(dev, i);
@@ -336,7 +335,7 @@ new_device(struct vhost_device_ctx ctx)
 	struct virtio_net_config_ll *new_ll_dev;
 
 	/* Setup device and virtqueues. */
-	new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
+	new_ll_dev = rte_zmalloc(NULL, sizeof(struct virtio_net_config_ll), 0);
 	if (new_ll_dev == NULL) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%"PRIu64") Failed to allocate memory for dev.\n",
@@ -344,9 +343,6 @@ new_device(struct vhost_device_ctx ctx)
 		return -1;
 	}
 
-	/* Initialise device and virtqueues. */
-	init_device(&new_ll_dev->dev);
-
 	new_ll_dev->next = NULL;
 
 	/* Add entry to device configuration linked list. */
@@ -430,7 +426,6 @@ static int
 reset_owner(struct vhost_device_ctx ctx)
 {
 	struct virtio_net *dev;
-	uint64_t device_fh;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
@@ -439,10 +434,8 @@ reset_owner(struct vhost_device_ctx ctx)
 	if (dev->flags & VIRTIO_DEV_RUNNING)
 		notify_ops->destroy_device(dev);
 
-	device_fh = dev->device_fh;
 	cleanup_device(dev);
-	init_device(dev);
-	dev->device_fh = device_fh;
+	reset_device(dev);
 
 	return 0;
 }
-- 
1.9.0
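Note on the reset strategy in this patch: clearing named fields one by one, instead of a memset() up to an offsetof() boundary, keeps the reset correct even if fields are later reordered. The sketch below shows the idea on a simplified, hypothetical struct. The field types are assumptions, and the per-queue re-initialisation that the real reset_device() also performs is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified device record; not the real struct virtio_net. */
struct device {
	/* Identity: must survive a reset until the device is removed. */
	uint64_t device_fh;
	char     ifname[32];
	uint32_t virt_qp_nb;
	/* Negotiated state: cleared on every reset. */
	uint64_t features;
	uint64_t protocol_features;
	uint32_t flags;
};

/* Clear only the negotiated state and leave the identity fields alone. */
static void
reset_device(struct device *dev)
{
	dev->features = 0;
	dev->protocol_features = 0;
	dev->flags = 0;
}
```

With this shape, the caller no longer needs to save device_fh before the reset and restore it afterwards, which is the bookkeeping the patch removes from reset_owner().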
[dpdk-dev] [PATCH] maintainers: claim to be reviewer of virtio/vhost component
Firstly, Changchun's email address has been invalid for a while.

Secondly, I'd like to take the responsibility to review patches
of virtio/vhost component.

Cc: Huawei Xie
Cc: Thomas Monjalon
Signed-off-by: Yuanhan Liu
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c8be5d2..b05724a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -262,7 +262,7 @@ F: doc/guides/nics/mlx5.rst
 
 RedHat virtio
 M: Huawei Xie
-M: Changchun Ouyang
+M: Yuanhan Liu
 F: drivers/net/virtio/
 F: doc/guides/nics/virtio.rst
 F: lib/librte_vhost/
-- 
1.9.0
[dpdk-dev] [PATCH] doc: update release notes
Updated release notes about adding X722 support.

Signed-off-by: Helin Zhang
---
 doc/guides/rel_notes/release_2_2.rst | 4
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 5636aad..5811c2f 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -59,6 +59,10 @@ New Features
 
 * **Added flow director support in i40e VF.**
 
+* **Added i40e support of early X722 series.**
+
+  * Add early X722 support for evaluation only, as the hardware is in A0.
+
 * **Added fm10k vector RX/TX.**
 
 * **Added fm10k TSO support for both PF and VF.**
-- 
1.9.3
[dpdk-dev] [PATCH] examples/l3fwd: fix eth-dest commandline strncmp size
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara > Sent: Monday, November 2, 2015 5:46 PM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix eth-dest commandline > strncmp size > > Fix minor, and non critical, copy and paste error in strncmp() of eth-dest > commandline argument. > > Fixes: bd785f6f6791 ("examples/l3fwd: make destination mac address > configurable") > > Signed-off-by: John McNamara Acked-by: Andrey Chilikin
[dpdk-dev] [PATCH v2] mem: calculate space left in a hugetlbfs
This patch enables calculating space left in a hugetlbfs.
There are three sources to get the information: 1. from
sysfs; 2. from option size specified when mount; 3. use
statfs. We should use the minimum one of these three sizes.

Signed-off-by: Jianfeng Tan
---
Changes in v2:
- reword title
- fix compiler error of v1

 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 85 -
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 18858e2..8305a58 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -44,6 +44,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -189,6 +191,70 @@ get_hugepage_dir(uint64_t hugepage_sz)
 	return retval;
 }
 
+/* Caller to make sure this mnt_dir exist
+ */
+static uint64_t
+get_hugetlbfs_mount_size(const char *mnt_dir)
+{
+	char *start, *end, *opt_size;
+	struct mntent *ent;
+	uint64_t size;
+	FILE *f;
+	int len;
+
+	f = setmntent("/proc/mounts", "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL, "setmntent() error: %s\n",
+				strerror(errno));
+		return 0;
+	}
+	while (NULL != (ent = getmntent(f))) {
+		if (!strcmp(ent->mnt_dir, mnt_dir))
+			break;
+	}
+
+	start = hasmntopt(ent, "size");
+	if (start == NULL) {
+		RTE_LOG(DEBUG, EAL, "option size not specified for %s\n",
+				mnt_dir);
+		size = 0;
+		goto end;
+	}
+	start += strlen("size=");
+	end = strstr(start, ",");
+	if (end != NULL)
+		len = end - start;
+	else
+		len = strlen(start);
+	opt_size = strndup(start, len);
+	size = rte_str_to_size(opt_size);
+	free(opt_size);
+
+end:
+	endmntent(f);
+	return size;
+}
+
+/* Caller to make sure this mount has option size
+ * so that statfs is not zero.
+ */
+static uint64_t
+get_hugetlbfs_free_size(const char *mnt_dir)
+{
+	int r;
+	struct statfs stats;
+
+	r = statfs(mnt_dir, &stats);
+	if (r != 0) {
+		RTE_LOG(ERR, EAL, "statfs() error: %s\n",
+				strerror(errno));
+		return 0;
+	}
+
+	return stats.f_bfree * stats.f_bsize;
+}
+
+
 /*
  * Clear the hugepage directory of whatever hugepage files
  * there are. Checks if the file is locked (i.e.
@@ -329,9 +395,26 @@ eal_hugepage_info_init(void)
 		if (clear_hugedir(hpi->hugedir) == -1)
 			break;
 
+		/* there are three souces of how much space left in a
+		 * hugetlbfs dir.
+		 */
+		uint64_t sz_left, sz_sysfs, sz_option, sz_statfs;
+
+		sz_sysfs = get_num_hugepages(dirent->d_name) *
+			hpi->hugepage_sz;
+		sz_left = sz_sysfs;
+		sz_option = get_hugetlbfs_mount_size(hpi->hugedir);
+		if (sz_option) {
+			sz_statfs = get_hugetlbfs_free_size(hpi->hugedir);
+			sz_left = RTE_MIN(sz_sysfs, sz_statfs);
+			RTE_LOG(INFO, EAL, "sz_sysfs: %"PRIu64", sz_option: "
+					"%"PRIu64", sz_statfs: %"PRIu64"\n",
+					sz_sysfs, sz_option, sz_statfs);
+		}
+
 		/* for now, put all pages into socket 0,
 		 * later they will be sorted */
-		hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+		hpi->num_pages[0] = sz_left / hpi->hugepage_sz;
 
 #ifndef RTE_ARCH_64
 		/* for 32-bit systems, limit number of hugepages to
-- 
2.1.4
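For readers following the thread, here is a minimal standalone sketch of the "minimum of three sources" idea from the commit message. It is not the patch itself: parse_size_opt(), fs_free_bytes() and space_left() are hypothetical names, the option string is assumed to look like the fourth field of /proc/mounts, and only K/M/G suffixes are handled. Taking the minimum with the mount's size option is redundant when statfs() succeeds (free space cannot exceed the total), so the result should match the RTE_MIN(sz_sysfs, sz_statfs) in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/vfs.h>

/* Hypothetical helper (not in the patch): extract "size=<n>[K|M|G]" from a
 * comma-separated mount option string. Returns 0 when the option is absent
 * or malformed. */
static uint64_t
parse_size_opt(const char *opts)
{
	const char *p = opts;

	while (p != NULL && *p != '\0') {
		if (strncmp(p, "size=", 5) == 0) {
			char *end;
			uint64_t v = strtoull(p + 5, &end, 10);

			if (end == p + 5)
				return 0; /* no digits after "size=" */
			if (*end == 'K' || *end == 'k') {
				v <<= 10;
				end++;
			} else if (*end == 'M' || *end == 'm') {
				v <<= 20;
				end++;
			} else if (*end == 'G' || *end == 'g') {
				v <<= 30;
				end++;
			}
			return (*end == '\0' || *end == ',') ? v : 0;
		}
		/* step to the next option, so "pagesize=" never matches */
		p = strchr(p, ',');
		if (p != NULL)
			p++;
	}
	return 0;
}

/* Free bytes that statfs() reports for a mount point; 0 on error. */
static uint64_t
fs_free_bytes(const char *mnt_dir)
{
	struct statfs st;

	if (statfs(mnt_dir, &st) != 0)
		return 0;
	return (uint64_t)st.f_bfree * (uint64_t)st.f_bsize;
}

/* sysfs is always an upper bound; the mount option and statfs values are
 * only considered when the mount was given a size. */
static uint64_t
space_left(uint64_t sz_sysfs, uint64_t sz_option, uint64_t sz_statfs)
{
	uint64_t left = sz_sysfs;

	if (sz_option != 0) {
		if (sz_option < left)
			left = sz_option;
		if (sz_statfs < left)
			left = sz_statfs;
	}
	return left;
}
```

A caller would pass the result of fs_free_bytes(hugedir) as sz_statfs and divide the value returned by space_left() by the hugepage size to get a page count, as the last hunk of the patch does.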
[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow
> -Original Message- > From: Wu, Jingjing > Sent: Tuesday, November 10, 2015 3:11 AM > To: dev at dpdk.org > Cc: Wu, Jingjing; Zhang, Helin; Chilikin, Andrey > Subject: [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow > > Signed-off-by: Jingjing Wu Acked-by: Andrey Chilikin
[dpdk-dev] [PATCH v4 3/8] virtio/lib:add vhost TX checksum support capabilities
On Wed, Nov 11, 2015 at 09:31:14AM -0800, Stephen Hemminger wrote:
> On Wed, 11 Nov 2015 16:26:57 +0800
> Yuanhan Liu wrote:
>
> > On Wed, Nov 11, 2015 at 02:40:41PM +0800, Jijiang Liu wrote:
> > > Add vhost TX offload (CSUM and TSO) support capabilities.
> >
> > Claiming first that we support something, and then actually implementing
> > it in a later patch is wrong, as at this stage we actually do not support
> > that, hence, the functionality is broken.
> >
> > --yliu
>
> Actually in this case it is okay to claim that driver "might" use offload
> capability but never do it.

But it will not work once it does use it, right?

	--yliu

> But agree in general better to keep both together.
[dpdk-dev] [PATCH] mem: fix how to calculate space left in a hugetlbfs
On Thu, 12 Nov 2015 08:17:57 +0800
Jianfeng Tan wrote:

> This patch enables calculating space left in a hugetlbfs.
> There are three sources to get the information: 1. from
> sysfs; 2. from option size specified when mount; 3. use
> statfs. We should use the minimum one of these three sizes.
>
> Signed-off-by: Jianfeng Tan

Thanks, the hugetlbfs usage up until now has been rather brute force.
I wonder if long term it might be better to defer all this stuff
to another library like libhugetlbfs.
  https://github.com/libhugetlbfs/libhugetlbfs

Especially when dealing with other architectures it might provide
some nice abstraction.
[dpdk-dev] [PATCH] mem: fix how to calculate space left in a hugetlbfs
This patch enables calculating space left in a hugetlbfs.
There are three sources to get the information: 1. from
sysfs; 2. from option size specified when mount; 3. use
statfs. We should use the minimum one of these three sizes.

Signed-off-by: Jianfeng Tan
---
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 85 -
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 18858e2..6db8c33 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -44,6 +44,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -189,6 +191,70 @@ get_hugepage_dir(uint64_t hugepage_sz)
 	return retval;
 }
 
+/* Caller to make sure this mnt_dir exist
+ */
+static uint64_t
+get_hugetlbfs_mount_size(const char *mnt_dir)
+{
+	char *start, *end, *opt_size;
+	struct mntent *ent;
+	uint64_t size;
+	FILE *f;
+	int len;
+
+	f = setmntent("/proc/mounts", "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL, "setmntent() error: %s\n",
+				strerror(errno));
+		return 0;
+	}
+	while (NULL != (ent = getmntent(f))) {
+		if (!strcmp(ent->mnt_dir, mnt_dir))
+			break;
+	}
+
+	start = hasmntopt(ent, "size");
+	if (start == NULL) {
+		RTE_LOG(DEBUG, EAL, "option size not specified for %s\n",
+				mnt_dir);
+		size = 0;
+		goto end;
+	}
+	start += strlen("size=");
+	end = strstr(start, ",");
+	if (end != NULL)
+		len = end - start;
+	else
+		len = strlen(start);
+	opt_size = strndup(start, len);
+	size = rte_str_to_size(opt_size);
+	free(opt_size);
+
+end:
+	endmntent(f);
+	return size;
+}
+
+/* Caller to make sure this mount has option size
+ * so that statfs is not zero.
+ */
+static uint64_t
+get_hugetlbfs_free_size(const char *mnt_dir)
+{
+	int r;
+	struct statfs stats;
+
+	r = statfs(mnt_dir, &stats);
+	if (r != 0) {
+		RTE_LOG(ERR, EAL, "statfs() error: %s\n",
+				strerror(errno));
+		return 0;
+	}
+
+	return stats.f_bfree * stats.f_bsize;
+}
+
+
 /*
  * Clear the hugepage directory of whatever hugepage files
  * there are. Checks if the file is locked (i.e.
@@ -329,9 +395,26 @@ eal_hugepage_info_init(void)
 		if (clear_hugedir(hpi->hugedir) == -1)
 			break;
 
+		/* there are three souces of how much space left in a
+		 * hugetlbfs dir.
+		 */
+		uint64_t sz_left, sz_sysfs, sz_option, sz_statfs;
+
+		sz_sysfs = get_num_hugepages(dirent->d_name) *
+			hpi->hugepage_sz;
+		sz_left = sz_sysfs;
+		sz_option = get_hugetlbfs_mount_size(hpi->hugedir);
+		if (sz_option) {
+			sz_statfs = get_hugetlbfs_free_size(hpi->hugedir);
+			sz_left = RTE_MIN(sz_sysfs, sz_statfs);
+			RTE_LOG(INFO, "sz_sysfs: %"PRIu64", sz_option: "
+					"%"PRIu64", sz_statfs: %"PRIu64"\n",
+					sz_sysfs, sz_option, sz_statfs);
+		}
+
 		/* for now, put all pages into socket 0,
 		 * later they will be sorted */
-		hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+		hpi->num_pages[0] = sz_left / hpi->hugepage_sz;
 
 #ifndef RTE_ARCH_64
 		/* for 32-bit systems, limit number of hugepages to
-- 
2.1.4
[dpdk-dev] [PATCH] mem: fix how to calculate space left in a hugetlbfs
Hi Jianfeng,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianfeng Tan
> Sent: Thursday, November 12, 2015 12:18 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] mem: fix how to calculate space left in a
> hugetlbfs
>
> This patch enables calculating space left in a hugetlbfs.
> There are three sources to get the information: 1. from
> sysfs; 2. from option size specified when mount; 3. use
> statfs. We should use the minimum one of these three sizes.
>
> Signed-off-by: Jianfeng Tan

You should reword the title of the patch, as this does not look like a fix.
[dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned with hugepage size
On 11/12/2015 1:04 PM, Tan, Jianfeng wrote:
> This patch fixes a bug under lower version linux kernel, mmap()
> fails when length is not aligned with hugepage size. mmap()
> without flag of MAP_ANONYMOUS, should be called with length
> argument aligned with hugepagesz at older longterm version
> Linux, like 2.6.32 and 3.2.72, or mmap() will fail with EINVAL.
> This bug was fixed in Linux kernel by commit:
> dab2d3dc45ae7343216635d981d43637e1cb7d45
> To avoid failure, make sure in caller to keep length aligned.
>
> Signed-off-by: Jianfeng Tan

Acked-by: Huawei Xie

Next time please add --in-reply-to with original message id.

> ---
> lib/librte_vhost/vhost_user/virtio-net-user.c | 36 ---
> 1 file changed, 21 insertions(+), 15 deletions(-)
>
[dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned with hugepage size
This patch fixes a bug under lower version linux kernel, mmap()
fails when length is not aligned with hugepage size. mmap()
without flag of MAP_ANONYMOUS, should be called with length
argument aligned with hugepagesz at older longterm version
Linux, like 2.6.32 and 3.2.72, or mmap() will fail with EINVAL.
This bug was fixed in Linux kernel by commit:
dab2d3dc45ae7343216635d981d43637e1cb7d45
To avoid failure, make sure in caller to keep length aligned.

Signed-off-by: Jianfeng Tan
---
 lib/librte_vhost/vhost_user/virtio-net-user.c | 36 ---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..7ce48d0 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -74,7 +74,6 @@ free_mem_region(struct virtio_net *dev)
 {
 	struct orig_region_map *region;
 	unsigned int idx;
-	uint64_t alignment;
 
 	if (!dev || !dev->mem)
 		return;
@@ -82,12 +81,8 @@ free_mem_region(struct virtio_net *dev)
 	region = orig_region(dev->mem, dev->mem->nregions);
 	for (idx = 0; idx < dev->mem->nregions; idx++) {
 		if (region[idx].mapped_address) {
-			alignment = region[idx].blksz;
-			munmap((void *)(uintptr_t)
-				RTE_ALIGN_FLOOR(
-					region[idx].mapped_address, alignment),
-				RTE_ALIGN_CEIL(
-					region[idx].mapped_size, alignment));
+			munmap((void *)region[idx].mapped_address,
+				region[idx].mapped_size);
 			close(region[idx].fd);
 		}
 	}
@@ -147,6 +142,18 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 		/* This is ugly */
 		mapped_size = memory.regions[idx].memory_size +
 			memory.regions[idx].mmap_offset;
+
+		/* mmap() without flag of MAP_ANONYMOUS, should be called
+		 * with length argument aligned with hugepagesz at older
+		 * longterm version Linux, like 2.6.32 and 3.2.72, or
+		 * mmap() will fail with EINVAL.
+		 *
+		 * to avoid failure, make sure in caller to keep length
+		 * aligned.
+		 */
+		alignment = get_blk_size(pmsg->fds[idx]);
+		mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
+
 		mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
 			mapped_size,
 			PROT_READ | PROT_WRITE, MAP_SHARED,
@@ -154,9 +161,11 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 			0);
 
 		RTE_LOG(INFO, VHOST_CONFIG,
-			"mapped region %d fd:%d to %p sz:0x%"PRIx64" off:0x%"PRIx64"\n",
+			"mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
+			"off:0x%"PRIx64" align:0x%"PRIx64"\n",
 			idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
-			mapped_size, memory.regions[idx].mmap_offset);
+			mapped_size, memory.regions[idx].mmap_offset,
+			alignment);
 
 		if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
 			RTE_LOG(ERR, VHOST_CONFIG,
@@ -166,7 +175,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 		pregion_orig[idx].mapped_address = mapped_address;
 		pregion_orig[idx].mapped_size = mapped_size;
-		pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]);
+		pregion_orig[idx].blksz = alignment;
 		pregion_orig[idx].fd = pmsg->fds[idx];
 
 		mapped_address += memory.regions[idx].mmap_offset;
@@ -193,11 +202,8 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
 
 err_mmap:
 	while (idx--) {
-		alignment = pregion_orig[idx].blksz;
-		munmap((void *)(uintptr_t)RTE_ALIGN_FLOOR(
-			pregion_orig[idx].mapped_address, alignment),
-			RTE_ALIGN_CEIL(pregion_orig[idx].mapped_size,
-				alignment));
+		munmap((void *)pregion_orig[idx].mapped_address,
+			pregion_orig[idx].mapped_size);
 		close(pregion_orig[idx].fd);
 	}
 	free(dev->mem);
-- 
2.1.4
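As a side note for reviewers, the length rounding that the patch performs before mmap() can be sketched on its own. This is an illustration only: align_ceil() and fd_blk_size() are hypothetical stand-ins for RTE_ALIGN_CEIL and get_blk_size(), and it is an assumption here that the block size is taken from fstat(), where st_blksize on a hugetlbfs file is expected to be the hugepage size.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Round len up to the next multiple of align.
 * align must be a non-zero power of two (true for hugepage sizes). */
static uint64_t
align_ceil(uint64_t len, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Preferred block size of the file behind fd; 0 if fstat() fails. */
static uint64_t
fd_blk_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return 0;
	return (uint64_t)st.st_blksize;
}
```

With 2 MB hugepages, a region of memory_size + mmap_offset = 0x201000 bytes would be mapped with a length of 0x400000. Because the rounded length is what gets stored in mapped_size, the later munmap() calls can use it directly, which is why the patch drops the extra RTE_ALIGN_* calls there.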
[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_tunnel_filter_conf
> -Original Message- > From: Wu, Jingjing > Sent: Tuesday, November 10, 2015 11:50 AM > To: dev at dpdk.org > Cc: Wu, Jingjing; Zhang, Helin; Lu, Wenzhuo > Subject: [PATCH] doc: announce ABI change for struct > rte_eth_tunnel_filter_conf > > Signed-off-by: Jingjing Wu Acked-by: Helin Zhang
[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow
> -Original Message- > From: Wu, Jingjing > Sent: Tuesday, November 10, 2015 11:11 AM > To: dev at dpdk.org > Cc: Wu, Jingjing; Zhang, Helin; Chilikin, Andrey > Subject: [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow > > Signed-off-by: Jingjing Wu Acked-by: Helin Zhang
[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_tunnel_filter_conf
Hi, > -Original Message- > From: Wu, Jingjing > Sent: Tuesday, November 10, 2015 11:50 AM > To: dev at dpdk.org > Cc: Wu, Jingjing ; Zhang, Helin > ; Lu, Wenzhuo > Subject: [PATCH] doc: announce ABI change for struct > rte_eth_tunnel_filter_conf > > Signed-off-by: Jingjing Wu Acked-by: Wenzhuo Lu
[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow
Hi, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jingjing Wu > Sent: Tuesday, November 10, 2015 11:11 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH] doc: announce ABI change for struct > rte_eth_fdir_flow > > Signed-off-by: Jingjing Wu Acked-by: Wenzhuo Lu
[dpdk-dev] [PATCH] vhost: fix mmap failure as len not aligned with hugepage size
On 11/12/2015 10:35 AM, Tan, Jianfeng wrote: > >> -Original Message- >> From: Xie, Huawei >> Sent: Wednesday, November 11, 2015 11:57 AM >> To: Tan, Jianfeng; dev at dpdk.org >> Subject: Re: [dpdk-dev] [PATCH] vhost: fix mmap failure as len not aligned >> with >> hugepage size >> >> On 10/30/2015 2:52 PM, Jianfeng Tan wrote: >>> This patch fixes a bug under lower version linux kernel, mmap() fails >>> when >> Since which version Linux hugetlbfs changes the requirement of size >> alignment? >>> length is not aligned with hugepage size. > This link shows this bug was fixed in Linux kernel commit: > dab2d3dc45ae7343216635d981d43637e1cb7d45 > After my check, that patch was applied to long term version 3.4.110+ > So distributions using 2.6.32 and 3.2.72 need this patch to make vhost work > well. > https://bugzilla.kernel.org/show_bug.cgi?id=56881 OK, please add this in commit message, remove unnecessary RTE_ALIGN in free_memory_region, and add comment to the code because our fix is a workaround to kernel hugetlbfs implementation issue. > >>> Signed-off-by: Jianfeng Tan >>> --- >>> lib/librte_vhost/vhost_user/virtio-net-user.c | 12 +--- >>> 1 file changed, 9 insertions(+), 3 deletions(-) >>> >>> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c >>> b/lib/librte_vhost/vhost_user/virtio-net-user.c >>> index a998ad8..641561c 100644 >>> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c >>> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c >>> @@ -147,6 +147,10 @@ user_set_mem_table(struct vhost_device_ctx ctx, >> struct VhostUserMsg *pmsg) >>> /* This is ugly */ >>> mapped_size = memory.regions[idx].memory_size + >>> memory.regions[idx].mmap_offset; >>> + >>> + alignment = get_blk_size(pmsg->fds[idx]); >>> + mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment); >> Probably we could remove the alignment of mapped size in free_mem_region as >> well. > Yes, after aligning mapped_address when mmap(), this address does not need to > be aligned again > when munmap(). 
But this will effect nothing, or incur any performance issue. > I'm prone to take no > change to it. > >>RTE_ALIGN_CEIL( >> region[idx].mapped_size, alignment) If we are not sure, leave it as >> it is. >>> + >>> mapped_address = (uint64_t)(uintptr_t)mmap(NULL, >>> mapped_size, >>> PROT_READ | PROT_WRITE, MAP_SHARED, @@ -154,9 >> +158,11 @@ >>> user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) >>> 0); >>> >>> RTE_LOG(INFO, VHOST_CONFIG, >>> - "mapped region %d fd:%d to %p sz:0x%"PRIx64" >> off:0x%"PRIx64"\n", >>> + "mapped region %d fd:%d to:%p sz:0x%"PRIx64" " >>> + "off:0x%"PRIx64" align:0x%"PRIx64"\n", >>> idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address, >>> - mapped_size, memory.regions[idx].mmap_offset); >>> + mapped_size, memory.regions[idx].mmap_offset, >>> + alignment); >>> >>> if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) { >>> RTE_LOG(ERR, VHOST_CONFIG, >>> @@ -166,7 +172,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, >>> struct VhostUserMsg *pmsg) >>> >>> pregion_orig[idx].mapped_address = mapped_address; >>> pregion_orig[idx].mapped_size = mapped_size; >>> - pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]); >>> + pregion_orig[idx].blksz = alignment; >>> pregion_orig[idx].fd = pmsg->fds[idx]; >>> >>> mapped_address += memory.regions[idx].mmap_offset; >
[dpdk-dev] [PATCH] vhost: fix mmap failure as len not aligned with hugepage size
> -Original Message- > From: Xie, Huawei > Sent: Wednesday, November 11, 2015 11:57 AM > To: Tan, Jianfeng; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] vhost: fix mmap failure as len not aligned > with > hugepage size > > On 10/30/2015 2:52 PM, Jianfeng Tan wrote: > > This patch fixes a bug under lower version linux kernel, mmap() fails > > when > Since which version Linux hugetlbfs changes the requirement of size alignment? > > length is not aligned with hugepage size. This link shows this bug was fixed in Linux kernel commit: dab2d3dc45ae7343216635d981d43637e1cb7d45 After my check, that patch was applied to long term version 3.4.110+ So distributions using 2.6.32 and 3.2.72 need this patch to make vhost work well. https://bugzilla.kernel.org/show_bug.cgi?id=56881 > > > > Signed-off-by: Jianfeng Tan > > --- > > lib/librte_vhost/vhost_user/virtio-net-user.c | 12 +--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c > > b/lib/librte_vhost/vhost_user/virtio-net-user.c > > index a998ad8..641561c 100644 > > --- a/lib/librte_vhost/vhost_user/virtio-net-user.c > > +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c > > @@ -147,6 +147,10 @@ user_set_mem_table(struct vhost_device_ctx ctx, > struct VhostUserMsg *pmsg) > > /* This is ugly */ > > mapped_size = memory.regions[idx].memory_size + > > memory.regions[idx].mmap_offset; > > + > > + alignment = get_blk_size(pmsg->fds[idx]); > > + mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment); > Probably we could remove the alignment of mapped size in free_mem_region as > well. Yes, after aligning mapped_address when mmap(), this address does not need to be aligned again when munmap(). But this will effect nothing, or incur any performance issue. I'm prone to take no change to it. >RTE_ALIGN_CEIL( > region[idx].mapped_size, alignment) If we are not sure, leave it as > it is. 
> > + > > mapped_address = (uint64_t)(uintptr_t)mmap(NULL, > > mapped_size, > > PROT_READ | PROT_WRITE, MAP_SHARED, @@ -154,9 > +158,11 @@ > > user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg) > > 0); > > > > RTE_LOG(INFO, VHOST_CONFIG, > > - "mapped region %d fd:%d to %p sz:0x%"PRIx64" > off:0x%"PRIx64"\n", > > + "mapped region %d fd:%d to:%p sz:0x%"PRIx64" " > > + "off:0x%"PRIx64" align:0x%"PRIx64"\n", > > idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address, > > - mapped_size, memory.regions[idx].mmap_offset); > > + mapped_size, memory.regions[idx].mmap_offset, > > + alignment); > > > > if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) { > > RTE_LOG(ERR, VHOST_CONFIG, > > @@ -166,7 +172,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, > > struct VhostUserMsg *pmsg) > > > > pregion_orig[idx].mapped_address = mapped_address; > > pregion_orig[idx].mapped_size = mapped_size; > > - pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]); > > + pregion_orig[idx].blksz = alignment; > > pregion_orig[idx].fd = pmsg->fds[idx]; > > > > mapped_address += memory.regions[idx].mmap_offset;
[dpdk-dev] [PATCH] vhost: reset device properly
On Wed, Nov 11, 2015 at 8:10 PM, Yuanhan Liu wrote:
> Currently, we reset all fields of a device to zero when reset
> happens, which is wrong, since some fields like device_fh,
> ifname, and virt_qp_nb should stay the same and be kept after
> reset until the device is removed. And this is what the new
> helper function reset_device() is for.
>
> And use rte_zmalloc() instead of rte_malloc, so that we could
> avoid init_device(), which basically does zero reset only so far.
> Hence, init_device() is dropped in this patch.
>
> This patch also removes a hack of using the offset of a specific
> field (which is virtqueue now) inside of `virtio_net' structure
> to do reset, which could be broken easily if someone changed the
> field order without caution.
>
> Cc: Tetsuya Mukawa
> Cc: Xie Huawei
> Signed-off-by: Yuanhan Liu

I had a patch that just saved the ifname but this is much better.

Acked-by: Rich Lane
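To make the difference concrete, here is a toy sketch of a reset that clears per-connection state field by field instead of zeroing everything from some offset onwards. struct toy_dev and toy_reset() are hypothetical, and which fields the real reset_device() clears is an assumption; only device_fh, ifname and virt_qp_nb are named in the commit message as fields to keep.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for struct virtio_net (hypothetical layout). */
struct toy_dev {
	uint64_t device_fh;   /* identity: must survive a reset */
	uint32_t virt_qp_nb;  /* must survive a reset */
	char ifname[16];      /* must survive a reset */
	uint64_t features;    /* negotiated state: cleared on reset */
	uint32_t flags;       /* run state: cleared on reset */
};

/* Clear only the state that a reset invalidates. Naming the fields keeps
 * this correct even if someone reorders the structure later. */
static void
toy_reset(struct toy_dev *dev)
{
	dev->features = 0;
	dev->flags = 0;
}
```

The offset-based memset that the patch removes would silently start wiping ifname or device_fh if a field were moved below the chosen offset; the explicit form does not depend on field order.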
[dpdk-dev] [PATCH] bonding: fix enumerated type mixed with another type
> > ICC complains about enumerated types being mixed in link bonding driver, > > as ETH_MQ_RX_RSS is an enum type of mq_mode and not a bitmask as it > > was > > being treated. > > > > Fixes: 734ce47f71e0 ("bonding: support RSS dynamic configuration") > > > > Signed-off-by: Tomasz Kulasek > > Acked-by: Pablo de Lara Applied, thanks
[dpdk-dev] [PATCHv7 0/2] ixgbe: fix TX hang when RS distance exceeds HW limit
> > First patch contains changes in testpmd that allow to reproduce the issue. > > Second patch is the actual fix. > > > > Konstantin Ananyev (2): > > testpmd: add ability to split outgoing packets > > ixgbe: fix TX hang when RS distance exceeds HW limit > > Series-acked-by: Pablo de Lara Applied, thanks
[dpdk-dev] Permanently binding NIC ports with DPDK drivers
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Montorsi, Francesco
> Sent: Wednesday, November 11, 2015 4:13 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Permanently binding NIC ports with DPDK drivers
>
> Hi,
> Is there a way to permanently (i.e., have the configuration automatically
> applied after reboot) bind a NIC port to DPDK?

Hi,

The Ubuntu dpdk package for 15.10 contains system scripts with functions
for reserving hugepages and binding interfaces on bootup:

    /etc/dpdk/dpdk.conf
    /etc/dpdk/interfaces
    /etc/init.d/dpdk
    /lib/dpdk/dpdk-init
    /lib/systemd/system/dpdk.service
    /sbin/dpdk_nic_bind
    /usr/bin/testpmd
    /usr/share/doc/dpdk/README.Debian
    /usr/share/doc/dpdk/changelog.Debian.gz
    /usr/share/doc/dpdk/copyright
    /usr/share/dpdk/tools/cpu_layout.py
    /usr/share/dpdk/tools/dpdk_nic_bind.py
    /usr/share/dpdk/tools/setup.sh
    /usr/share/python/runtime.d/dpdk.rtupdate

    http://packages.ubuntu.com/wily/amd64/dpdk/filelist

If you have the latest version of Ubuntu you can check that out or else
download and extract the files from the .deb to see how they do it.

John.
--
[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
The guest could trigger this buffer overflow by creating a cycle of
descriptors (which would also cause an infinite loop). The more common
case is that vq->avail->idx jumps out of the range
[last_used_idx, last_used_idx+256). This happens nearly every time when
restarting a DPDK app inside a VM connected to a vhost-user vswitch
because the virtqueue memory allocated by the previous run is zeroed.

Signed-off-by: Rich Lane
---
 lib/librte_vhost/vhost_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 9322ce6..d95b478 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
 		vq->buf_vec[vec_id].desc_idx = idx;
 		vec_id++;
 
-		if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
+		if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id < BUF_VECTOR_MAX) {
 			idx = vq->desc[idx].next;
 			next_desc = 1;
 		}
-- 
1.9.1
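A standalone sketch of the bounded chain walk may help show why the extra condition also ends the infinite loop. The types and names below are simplified stand-ins, not the vhost structures: DESC_F_NEXT and VEC_MAX play the roles of VRING_DESC_F_NEXT and BUF_VECTOR_MAX, and walk_chain() is a hypothetical reduction of update_secure_len() to its indexing logic.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_F_NEXT 1u  /* stand-in for VRING_DESC_F_NEXT */
#define VEC_MAX 256u    /* stand-in for BUF_VECTOR_MAX */

/* Simplified descriptor: only the fields the walk needs. */
struct desc {
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Record the descriptor indices of the chain starting at head into vec[].
 * The walk stops at the end of the chain or once vec[] is full, so a
 * guest-built cycle can neither overflow vec[] nor loop forever.
 * Returns the number of entries written. */
static uint32_t
walk_chain(const struct desc *ring, uint16_t head, uint16_t vec[VEC_MAX])
{
	uint32_t n = 0;
	uint16_t idx = head;
	int more;

	do {
		more = 0;
		vec[n++] = idx;
		if ((ring[idx].flags & DESC_F_NEXT) && n < VEC_MAX) {
			idx = ring[idx].next;
			more = 1;
		}
	} while (more);

	assert(n <= VEC_MAX);
	return n;
}
```

Note that this only bounds the number of entries written; it does not validate that next stays inside the ring, which would be a separate check.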