[dpdk-dev] Coverity policy for upstream (base) drivers.

2015-11-12 Thread Thomas Monjalon
2015-11-12 14:05, Stephen Hemminger:
> Looking at the Coverity scan for DPDK, it looks like all the base
> drivers are marked to be ignored.
> 
> Although the changes to base drivers should not be done directly through
> the DPDK list, I think it is still valuable to have these drivers scanned
> and to notify (badger) the vendors to fix their code.
> 
> Since lots of the bugs could be there, just blindly ignoring warnings
> and issues is being naive.

I think the Coverity setup is outdated:
ignore_driver_1 /lib/librte_pmd_e1000/e1000/.*  Yes Remove
ignore_driver_2 /lib/librte_pmd_fm10k/base/.*   Yes Remove
ignore_driver_3 /lib/librte_pmd_i40e/i40e/.*    Yes Remove
ignore_driver_4 /lib/librte_pmd_ixgbe/ixgbe/.*  Yes Remove

These directories don't exist anymore.


[dpdk-dev] [PATCH v2 0/7] ethdev: force deprecation of statistics

2015-11-12 Thread Thomas Monjalon
2015-11-05 17:04, Stephen Hemminger:
> Several fields in ether statistics were tagged with comment that they
> were going to be deprecated, but comments don't cause compile warnings.
> Instead use Gcc attributes to force the issue.
> 
> Of course to do that, all the drivers and tests which are using
> those fields have to be fixed first.
> 
> The input multicast statistic was listed as deprecated, but I find
> it useful, and therefore the first patch is to revive it.
> 
> Stephen Hemminger (7):
>   ether: don't mark input multicast for deprecation

not applied

>   bond: don't sum deprecated statistics
>   cxgbe: don't report deprecated statistics
>   i40e: don't report deprecated statistics
>   e1000: don't report deprecated statistics
>   test-pmd: remove references to deprecated statistics
>   rte_ether: mark deprecated statistics with attribute

The rest is applied with an extra patch for ip_pipeline example.
Thanks


[dpdk-dev] [PATCH 7/7] rte_ether: mark deprecated statistics with attribute

2015-11-12 Thread Thomas Monjalon
2015-11-05 17:04, Stephen Hemminger:
> Use deprecated attribute to highlight any use of fields that
> are marked as going away in the rte_ether device statistics.

The example app ip_pipeline does not compile.
I will add a patch to fix it.
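
For readers unfamiliar with the mechanism discussed in this series, here is a minimal sketch of how a GCC `deprecated` attribute on a struct field turns a comment into a compile-time warning. The struct, field and function names are hypothetical stand-ins, not the real rte_eth_stats layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for a statistics structure (not rte_eth_stats). */
struct demo_stats {
	uint64_t ipackets;	/* still supported */
	/* Any read or write of this field makes GCC/Clang emit
	 * -Wdeprecated-declarations at the point of use. */
	uint64_t imissed_old __attribute__((deprecated));
};

/* Uses only the supported field, so it compiles without warnings. */
static uint64_t
demo_rx_total(const struct demo_stats *s)
{
	return s->ipackets;
}
```

Code that still touches the deprecated field, as ip_pipeline apparently did, then shows up in the build log (or fails the build when warnings are treated as errors), which is the effect the series is after.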



[dpdk-dev] [PATCH v5 4/4] example/vhost: add virtio offload test in vhost sample

2015-11-12 Thread Jijiang Liu
Change the vhost sample code to test the virtio offload feature.

The changes include:

1. Add two test options: tx-csum and tso.

2. Add the virtio_tx_offload() function to test the vhost TX offload feature
for the VM-to-NIC case. The VM-to-VM case does not need to call this
function; the reason is explained in patch 2.

Signed-off-by: Jijiang Liu 
---
 examples/vhost/main.c |  105 +++-
 1 files changed, 102 insertions(+), 3 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 044c680..210e631 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -51,6 +51,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 

 #include "main.h"

@@ -198,6 +201,13 @@ typedef enum {
 static uint32_t enable_stats = 0;
 /* Enable retries on RX. */
 static uint32_t enable_retry = 1;
+
+/* Enable TX checksum offload (disabled by default). */
+static uint32_t enable_tx_csum;
+
+/* Enable TSO offload (disabled by default). */
+static uint32_t enable_tso;
+
 /* Specify timeout (in useconds) between retries on RX. */
 static uint32_t burst_rx_delay_time = BURST_RX_WAIT_US;
 /* Specify the number of retries on RX. */
@@ -428,6 +438,14 @@ port_init(uint8_t port)

if (port >= rte_eth_dev_count()) return -1;

+   if (enable_tx_csum == 0)
+   rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_CSUM);
+
+   if (enable_tso == 0) {
+   rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
+   rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
+   }
+
rx_rings = (uint16_t)dev_info.max_rx_queues;
/* Configure ethernet device. */
retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
@@ -563,7 +581,9 @@ us_vhost_usage(const char *prgname)
"   --rx-desc-num [0-N]: the number of descriptors on rx, "
"used only when zero copy is enabled.\n"
"   --tx-desc-num [0-N]: the number of descriptors on tx, "
-   "used only when zero copy is enabled.\n",
+   "used only when zero copy is enabled.\n"
+   "   --tx-csum [0|1] disable/enable TX checksum offload.\n"
+   "   --tso [0|1] disable/enable TCP segment offload.\n",
   prgname);
 }

@@ -589,6 +609,8 @@ us_vhost_parse_args(int argc, char **argv)
{"zero-copy", required_argument, NULL, 0},
{"rx-desc-num", required_argument, NULL, 0},
{"tx-desc-num", required_argument, NULL, 0},
+   {"tx-csum", required_argument, NULL, 0},
+   {"tso", required_argument, NULL, 0},
{NULL, 0, 0, 0},
};

@@ -643,6 +665,28 @@ us_vhost_parse_args(int argc, char **argv)
}
}

+   /* Enable/disable TX checksum offload. */
+   if (!strncmp(long_option[option_index].name, "tx-csum", MAX_LONG_OPT_SZ)) {
+   ret = parse_num_opt(optarg, 1);
+   if (ret == -1) {
+   RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for tx-csum [0|1]\n");
+   us_vhost_usage(prgname);
+   return -1;
+   } else
+   enable_tx_csum = ret;
+   }
+
+   /* Enable/disable TSO offload. */
+   if (!strncmp(long_option[option_index].name, "tso", MAX_LONG_OPT_SZ)) {
+   ret = parse_num_opt(optarg, 1);
+   if (ret == -1) {
+   RTE_LOG(INFO, VHOST_CONFIG, "Invalid argument for tso [0|1]\n");
+   us_vhost_usage(prgname);
+   return -1;
+   } else
+   enable_tso = ret;
+   }
+
/* Specify the retries delay time (in useconds) on RX. */
if (!strncmp(long_option[option_index].name, "rx-retry-delay", MAX_LONG_OPT_SZ)) {
ret = parse_num_opt(optarg, INT32_MAX);
@@ -1101,6 +1145,58 @@ find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
return 0;
 }

+static uint16_t
+get_psd_sum(void *l3_hdr, uint64_t ol_flags)
+{
+   if (ol_flags & PKT_TX_IPV4)
+   return rte_ipv4_phdr_cksum(l3_hdr, ol_flags);
+   else /* assume ethertype == ETHER_TYPE_IPv6 */
+   return rte_ipv6_phdr_cksum(l3_hdr, ol_flags);
+}
+
+static void virtio_tx_offload(struct rte_mbuf *m)
+{
+   void *l3_hdr;
+   struct ipv4_hdr *ipv4_hdr = NULL;
+   struct tcp_hdr *tcp_hdr = NULL;
+   struct udp_hdr *udp_hdr = NULL;
+   struct sctp_hdr *sctp_hdr = N

[dpdk-dev] [PATCH v5 3/4] sample/vhost: remove the ipv4_hdr structure definition

2015-11-12 Thread Jijiang Liu
Remove the ipv4_hdr structure definition from the vhost sample.

The same structure is already defined in the rte_ip.h file, so remove the
definition from the sample and include that header file instead.

Signed-off-by: Jijiang Liu 
---
 examples/vhost/main.c |   15 +--
 1 files changed, 1 insertions(+), 14 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c081b18..044c680 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "main.h"

@@ -292,20 +293,6 @@ struct vlan_ethhdr {
__be16  h_vlan_encapsulated_proto;
 };

-/* IPv4 Header */
-struct ipv4_hdr {
-   uint8_t  version_ihl;   /**< version and header length */
-   uint8_t  type_of_service;   /**< type of service */
-   uint16_t total_length;  /**< length of packet */
-   uint16_t packet_id; /**< packet ID */
-   uint16_t fragment_offset;   /**< fragmentation offset */
-   uint8_t  time_to_live;  /**< time to live */
-   uint8_t  next_proto_id; /**< protocol ID */
-   uint16_t hdr_checksum;  /**< header checksum */
-   uint32_t src_addr;  /**< source address */
-   uint32_t dst_addr;  /**< destination address */
-} __attribute__((__packed__));
-
 /* Header lengths. */
 #define VLAN_HLEN   4
 #define VLAN_ETH_HLEN   18
-- 
1.7.7.6



[dpdk-dev] [PATCH v5 2/4] vhost/lib: add guest offload setting

2015-11-12 Thread Jijiang Liu
Add guest offload setting in vhost lib.

Refer to the feature bit descriptions in the Virtual I/O Device (VIRTIO)
Version 1.0 specification:

1. VIRTIO_NET_F_GUEST_CSUM (1) Driver handles packets with partial checksum.

2. If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags MAY be set: if so, the checksum on
the packet is incomplete and csum_start and csum_offset indicate how to
calculate it (see Packet Transmission point 1).

3. If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were negotiated, then
gso_type MAY be something other than VIRTIO_NET_HDR_GSO_NONE, and gso_size
field indicates the desired MSS (see Packet Transmission point 2).

To support these features, the following changes are made:

1. Extend the 'VHOST_SUPPORTED_FEATURES' macro to add the offload feature
negotiation.

2. Enqueue these offloads: convert some fields in the mbuf to the
corresponding fields in virtio_net_hdr.

Some further notes on the implementation:

In the VM2VM case there is no need to compute the checksum, because we
consider the data reliable enough. Setting VIRTIO_NET_HDR_F_NEEDS_CSUM
on the RX side lets the TCP layer bypass checksum validation, so the
RX side can still receive the packet.

In us-vhost, on the vhost RX side, the offload information is inherited
from the mbuf, which in turn inherited it from the TX side. If that
information is still available on the RX side, the packet came from
another VM on the same host, so it is safe to set
VIRTIO_NET_HDR_F_NEEDS_CSUM and skip checksum validation.

Signed-off-by: Jijiang Liu 
---
 lib/librte_vhost/vhost_rxtx.c |   47 +++-
 lib/librte_vhost/virtio-net.c |5 +++-
 2 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 47d5f85..9d97e19 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -54,6 +54,44 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb)
return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM;
 }

+static void
+virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr)
+{
+   memset(net_hdr, 0, sizeof(struct virtio_net_hdr));
+
+   if (m_buf->ol_flags & PKT_TX_L4_MASK) {
+   net_hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+   net_hdr->csum_start = m_buf->l2_len + m_buf->l3_len;
+
+   switch (m_buf->ol_flags & PKT_TX_L4_MASK) {
+   case PKT_TX_TCP_CKSUM:
+   net_hdr->csum_offset = (offsetof(struct tcp_hdr,
+   cksum));
+   break;
+   case PKT_TX_UDP_CKSUM:
+   net_hdr->csum_offset = (offsetof(struct udp_hdr,
+   dgram_cksum));
+   break;
+   case PKT_TX_SCTP_CKSUM:
+   net_hdr->csum_offset = (offsetof(struct sctp_hdr,
+   cksum));
+   break;
+   }
+   }
+
+   if (m_buf->ol_flags & PKT_TX_TCP_SEG) {
+   if (m_buf->ol_flags & PKT_TX_IPV4)
+   net_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
+   else
+   net_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+   net_hdr->gso_size = m_buf->tso_segsz;
+   net_hdr->hdr_len = m_buf->l2_len + m_buf->l3_len
+   + m_buf->l4_len;
+   }
+
+   return;
+}
+
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
@@ -67,7 +105,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 {
struct vhost_virtqueue *vq;
struct vring_desc *desc;
-   struct rte_mbuf *buff;
+   struct rte_mbuf *buff, *first_buff;
/* The virtio_hdr is initialised to 0. */
struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
uint64_t buff_addr = 0;
@@ -139,6 +177,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
desc = &vq->desc[head[packet_success]];

buff = pkts[packet_success];
+   first_buff = buff;

/* Convert from gpa to vva (guest physical addr -> vhost virtual addr) */
buff_addr = gpa_to_vva(dev, desc->addr);
@@ -221,7 +260,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,

if (unlikely(uncompleted_pkt == 1))
continue;
-
+   
+   virtio_enqueue_offload(first_buff, &virtio_hdr.hdr);
+   
rte_memcpy((void *)(uintptr_t)buff_hdr_addr,
(const void *)&virtio_hdr, vq->vhost_hlen);

@@ -295,6 +336,8 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, 

[dpdk-dev] [PATCH v5 1/4] vhost/lib: add vhost TX offload capabilities in vhost lib

2015-11-12 Thread Jijiang Liu
Add vhost TX offload (CSUM and TSO) capabilities to the vhost lib.

Refer to the feature bits in the Virtual I/O Device (VIRTIO) Version 1.0
specification:

VIRTIO_NET_F_CSUM (0) Device handles packets with partial checksum. This
"checksum offload" is a common feature on modern network cards.
VIRTIO_NET_F_HOST_TSO4 (11) Device can receive TSOv4.
VIRTIO_NET_F_HOST_TSO6 (12) Device can receive TSOv6.

To support these features, the following changes are made:

1. Extend the 'VHOST_SUPPORTED_FEATURES' macro to add the offload feature
negotiation.

2. Dequeue TX offload: convert the fields in virtio_net_hdr to the related
fields in the mbuf.
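
The dequeue conversion in item 2 depends on recovering the L3 and L4 header lengths from packed header fields. A minimal sketch of that arithmetic, mirroring the expressions used in the diff that follows; the helper names are illustrative only and are not part of the patch:

```c
#include <stdint.h>

/* IPv4: the low nibble of version_ihl is the header length
 * in 32-bit words, so multiply by 4 to get bytes. */
static inline uint16_t
ipv4_hdr_len(uint8_t version_ihl)
{
	return (uint16_t)((version_ihl & 0x0f) * 4);
}

/* TCP: the high nibble of data_off is the header length in 32-bit
 * words; ((x & 0xf0) >> 4) * 4 collapses to (x & 0xf0) >> 2. */
static inline uint16_t
tcp_hdr_len(uint8_t data_off)
{
	return (uint16_t)((data_off & 0xf0) >> 2);
}
```

For a plain IPv4 header (version_ihl 0x45) and a TCP header without options (data_off 0x50), both helpers yield 20 bytes.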


Signed-off-by: Jijiang Liu 
---
 lib/librte_vhost/vhost_rxtx.c |  103 +
 lib/librte_vhost/virtio-net.c |6 ++-
 2 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 9322ce6..47d5f85 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -37,7 +37,12 @@

 #include 
 #include 
+#include 
+#include 
 #include 
+#include 
+#include 
+#include 

 #include "vhost-net.h"

@@ -568,6 +573,97 @@ rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id,
return virtio_dev_rx(dev, queue_id, pkts, count);
 }

+static void
+parse_ethernet(struct rte_mbuf *m, uint16_t *l4_proto, void **l4_hdr)
+{
+   struct ipv4_hdr *ipv4_hdr;
+   struct ipv6_hdr *ipv6_hdr;
+   void *l3_hdr = NULL;
+   struct ether_hdr *eth_hdr;
+   uint16_t ethertype;
+
+   eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+   m->l2_len = sizeof(struct ether_hdr);
+   ethertype = rte_be_to_cpu_16(eth_hdr->ether_type);
+
+   if (ethertype == ETHER_TYPE_VLAN) {
+   struct vlan_hdr *vlan_hdr = (struct vlan_hdr *)(eth_hdr + 1);
+
+   m->l2_len += sizeof(struct vlan_hdr);
+   ethertype = rte_be_to_cpu_16(vlan_hdr->eth_proto);
+   }
+
+   l3_hdr = (char *)eth_hdr + m->l2_len;
+
+   switch (ethertype) {
+   case ETHER_TYPE_IPv4:
+   ipv4_hdr = (struct ipv4_hdr *)l3_hdr;
+   *l4_proto = ipv4_hdr->next_proto_id;
+   m->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+   *l4_hdr = (char *)l3_hdr + m->l3_len;
+   m->ol_flags |= PKT_TX_IPV4;
+   break;
+   case ETHER_TYPE_IPv6:
+   ipv6_hdr = (struct ipv6_hdr *)l3_hdr;
+   *l4_proto = ipv6_hdr->proto;
+   m->l3_len = sizeof(struct ipv6_hdr);
+   *l4_hdr = (char *)l3_hdr + m->l3_len;
+   m->ol_flags |= PKT_TX_IPV6;
+   break;
+   default:
+   m->l3_len = 0;
+   *l4_proto = 0;
+   break;
+   }
+}
+
+static inline void __attribute__((always_inline))
+vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
+{
+   uint16_t l4_proto = 0;
+   void *l4_hdr = NULL;
+   struct tcp_hdr *tcp_hdr = NULL;
+
+   parse_ethernet(m, &l4_proto, &l4_hdr);
+   if (hdr->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+   if (hdr->csum_start == (m->l2_len + m->l3_len)) {
+   switch (hdr->csum_offset) {
+   case (offsetof(struct tcp_hdr, cksum)):
+   if (l4_proto == IPPROTO_TCP)
+   m->ol_flags |= PKT_TX_TCP_CKSUM;
+   break;
+   case (offsetof(struct udp_hdr, dgram_cksum)):
+   if (l4_proto == IPPROTO_UDP)
+   m->ol_flags |= PKT_TX_UDP_CKSUM;
+   break;
+   case (offsetof(struct sctp_hdr, cksum)):
+   if (l4_proto == IPPROTO_SCTP)
+   m->ol_flags |= PKT_TX_SCTP_CKSUM;
+   break;
+   default:
+   break;
+   }
+   }
+   }
+
+   if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+   switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+   case VIRTIO_NET_HDR_GSO_TCPV4:
+   case VIRTIO_NET_HDR_GSO_TCPV6:
+   tcp_hdr = (struct tcp_hdr *)l4_hdr;
+   m->ol_flags |= PKT_TX_TCP_SEG;
+   m->tso_segsz = hdr->gso_size;
+   m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+   break;
+   default:
+   RTE_LOG(WARNING, VHOST_DATA,
+   "unsupported gso type %u.\n", hdr->gso_type);
+   break;
+   }
+   }
+}
+
 uint16_t
 rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
@@ -576,11 +672,13 @@ 

[dpdk-dev] [PATCH v5 0/4] add virtio offload support in us-vhost

2015-11-12 Thread Jijiang Liu
Add virtio offload support to us-vhost.

This patch set adds feature negotiation of checksum and TSO between us-vhost
and a vanilla Linux virtio guest, adds support for these offload features to
the vhost lib, and changes the vhost sample to test them.

v5 changes:
  Add clearer descriptions to explain these changes.
  Reset the 'virtio_net_hdr' value in the virtio_enqueue_offload() function.
  Reorganize patches.


v4 changes:
  Remove the virtio-net change; keep only the vhost changes.
  Add guest TX offload capabilities to support the VM-to-VM case.
  Split the cleanup code into a separate patch.

v3 changes:
  Rebase on the latest code.

v2 changes:
  Fill in virtio device information for TX offloads.

Jijiang Liu (4):
  add vhost offload capabilities
  remove ipv4_hdr structure from vhost sample.
  add guest offload setting in the vhost lib.
  change vhost application to test checksum and TSO for VM to NIC case

 examples/vhost/main.c |  120 -
 lib/librte_vhost/vhost_rxtx.c |  150 -
 lib/librte_vhost/virtio-net.c |9 ++-
 3 files changed, 259 insertions(+), 20 deletions(-)

-- 
1.7.7.6



[dpdk-dev] [PATCH v3 2/2] vhost: Add VHOST PMD

2015-11-12 Thread Rich Lane
>
> +   if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +   ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +&open_iface, &iface_name);
> +   if (ret < 0)
> +   goto out_free;
> +   }
>

I noticed that the strdup in eth_dev_vhost_create crashes if you don't pass
the iface option, so this should probably return an error if the option
doesn't exist.
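
A minimal sketch of the kind of guard suggested here: reject a missing argument before it reaches a string copy. The helper name and error codes are assumptions for illustration, not the PMD's actual code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a mandatory device argument.
 * Returns 0 on success, -EINVAL when the argument was never supplied,
 * and -ENOMEM when the copy cannot be allocated. */
static int
dup_required_arg(const char *value, char **out)
{
	size_t len;
	char *copy;

	if (value == NULL)	/* option absent: fail instead of copying NULL */
		return -EINVAL;

	len = strlen(value) + 1;
	copy = malloc(len);	/* same effect as strdup(), spelled out */
	if (copy == NULL)
		return -ENOMEM;

	memcpy(copy, value, len);
	*out = copy;
	return 0;
}
```

The caller can then propagate the error out of device creation rather than crashing.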


[dpdk-dev] [PATCH] mem: fix how to calculate space left in a hugetlbfs

2015-11-12 Thread Thomas Monjalon
2015-11-12 09:38, Stephen Hemminger:
> On Thu, 12 Nov 2015 08:17:57 +0800
> Jianfeng Tan  wrote:
> 
> > This patch enables calculating space left in a hugetlbfs.
> > There are three sources to get the information: 1. from
> > sysfs; 2. from option size specified when mount; 3. use
> > statfs. We should use the minimum one of these three sizes.
> > 
> > Signed-off-by: Jianfeng Tan 
> 
> Thanks, the hugetlbfs usage up until now has been rather brute force.
> I wonder if long term it might be better to defer all this stuff
> to another library like libhugetlbfs.
>  https://github.com/libhugetlbfs/libhugetlbfs
> 
> Especially when dealing with other architectures it might provide
> some nice abstraction.

Maybe, maybe not :)
Sergio already looked at it:
http://dpdk.org/ml/archives/dev/2015-July/022080.html
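
For context, the "minimum of three sizes" rule from the quoted patch description can be sketched as follows. This is an illustration under assumptions, not the patch's code: the sysfs and mount-option values are assumed to be obtained elsewhere, the helper names are made up, and `statfs()` here is the Linux call from `<sys/vfs.h>`.

```c
#include <stdint.h>
#include <sys/vfs.h>	/* statfs(), Linux-specific */

/* Smallest of the three size estimates, in bytes. */
static uint64_t
min3_u64(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t m = a < b ? a : b;

	return m < c ? m : c;
}

/* Free space reported by statfs() for a mount point, in bytes.
 * Returns 0 on error, which a caller should treat as "unknown". */
static uint64_t
statfs_free_bytes(const char *mount_point)
{
	struct statfs st;

	if (statfs(mount_point, &st) != 0)
		return 0;
	return (uint64_t)st.f_bfree * (uint64_t)st.f_bsize;
}
```

A caller would combine them as `min3_u64(sysfs_bytes, mount_opt_bytes, statfs_free_bytes(dir))`, after deciding how to handle a source that could not be read.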


[dpdk-dev] Coverity policy for upstream (base) drivers.

2015-11-12 Thread Matthew Hall
On Thu, Nov 12, 2015 at 02:05:08PM -0800, Stephen Hemminger wrote:
> Looking at the Coverity scan for DPDK, it looks like all the base
> drivers are marked to be ignored.
> 
> Although the changes to base drivers should not be done directly through
> the DPDK list, I think it is still valuable to have these drivers scanned
> and to notify (badger) the vendors to fix their code.
> 
> Since lots of the bugs could be there, just blindly ignoring warnings
> and issues is being naive.

I am with Stephen. Ignoring base driver vulns is a bad practice.

With these L1-L4 bugs the chances are good somebody could trigger these and 
find 0days using tools as old and simple as this one:

http://isic.sourceforge.net/

Matthew.


[dpdk-dev] [PATCH 1/7] ether: don't mark input multicast for deprecation

2015-11-12 Thread Thomas Monjalon
2015-11-05 17:04, Stephen Hemminger:
> The number of received multicast frames is useful and already
> available in many/most drivers. Therefore don't mark it as
> deprecated.

There are other useful stats in xstats.
The idea of this basic stats structure is to provide only
the really mandatory and basic counters.
A multicast counter is not so basic and won't be implemented everywhere.

This patch won't be applied.
We'll need a consensus to definitively remove the deprecated stats.



[dpdk-dev] Making rte_eal_pci_probe() in rte_eal_init() optional?

2015-11-12 Thread Roger B Melton
Hi folks,

With the addition of hot plug support we have been migrating away from 
device discovery and attach at initialization time to a model where it 
is controlled from a separate process.  The separate process manages the 
binding of devices to UIO and instructs the DPDK process when to 
attach.  One of the problems we stumbled onto was that if our control 
process discovered devices and bound them to UIO before our DPDK process 
started, then rte_eal_init() would discover and attach to those devices 
via the rte_eal_pci_probe() invocation. This caused problems later on
when our control process instructed our DPDK process to attach to
a device.

There are a number of ways we could address this, but the simplest is to 
prevent the rte_eal_pci_probe() at rte_eal_init() time.  In our model we 
will never need it and I suspect others may also be in that boat.

What are your thoughts on adding an argument to instruct rte_eal_init() 
to skip the PCI probe?

Thanks,
-Roger



[dpdk-dev] [PATCH 0/3] xstats queue handling

2015-11-12 Thread Thomas Monjalon
2015-11-06 14:12, Harry van Haaren:
> This patchset modifies how queue statistics are presented by
> rte_eth_xstats_get() and each PMD's xstats_get().
> 
> Generic stats from the rte_eth_stats struct are presented by rte, and each
> PMD can augment those stats with extra stats that are available (if any).
> 
> Currently ixgbe and i40e are the only NICs supporting queue xstats, and
> they have been updated to conform with the new method of presentation.
> 
> 
> Harry van Haaren (3):
>   ethdev: xstats generic Q stats refactor
>   ixgbe: refactor xstats queue handling
>   i40e: refactor xstats queue handling

Applied, thanks


[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-12 Thread Yuanhan Liu
On Thu, Nov 12, 2015 at 12:02:33AM -0800, Rich Lane wrote:
> The guest could trigger this buffer overflow by creating a cycle of
> descriptors (which would also cause an infinite loop). The more common
> case is that vq->avail->idx jumps out of the range
> [last_used_idx, last_used_idx+256). This happens nearly every time when
> restarting a DPDK app inside a VM connected to a vhost-user vswitch
> because the virtqueue memory allocated by the previous run is zeroed.

Hi,

I was already aware of this issue from reading the code.
Since we had never hit it, I delayed the fix (it is still
on my TODO list).

Would you please tell me the steps (commands would be better) to
reproduce your issue? I'd like to know more about the issue: I'm
guessing we may need to fix it with a bit more care.

--yliu
> 
> Signed-off-by: Rich Lane 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 9322ce6..d95b478 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
>   vq->buf_vec[vec_id].desc_idx = idx;
>   vec_id++;
>  
> - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
> + if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id < BUF_VECTOR_MAX) {
>   idx = vq->desc[idx].next;
>   next_desc = 1;
>   }
> -- 
> 1.9.1
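
To illustrate the failure mode under discussion, here is a self-contained sketch of walking a guest-controlled descriptor chain with a hard bound. All names and sizes are hypothetical, and unlike the one-line fix quoted above (which simply stops following the chain), this version reports an error when the bound is hit.

```c
#include <stdint.h>

#define DEMO_DESC_F_NEXT 1	/* hypothetical flag: another descriptor follows */
#define DEMO_VEC_MAX     4	/* hypothetical capacity of the output vector */

struct demo_desc {
	uint16_t flags;
	uint16_t next;
};

/* Walk a chain starting at idx, recording at most DEMO_VEC_MAX indices.
 * Returns the number recorded, or -1 if an index is outside the ring or
 * the chain is longer than the vector (this also catches a cycle). */
static int
demo_walk_chain(const struct demo_desc *ring, uint16_t ring_size,
		uint16_t idx, uint16_t *vec)
{
	int n = 0;

	for (;;) {
		if (idx >= ring_size || n >= DEMO_VEC_MAX)
			return -1;
		vec[n++] = idx;
		if (!(ring[idx].flags & DEMO_DESC_F_NEXT))
			return n;
		idx = ring[idx].next;
	}
}
```

Bounding the walk by the vector capacity means a cyclic chain terminates after a fixed number of steps instead of overrunning the vector or looping forever.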


[dpdk-dev] [PATCH v4 0/2] Add support for driver directories

2015-11-12 Thread Thomas Monjalon
> > This mini-series adds support for driver directory concept
> > based on idea by Thomas Monjalon back in February:
> > http://dpdk.org/ml/archives/dev/2015-February/013285.html
> >
> > In the process FreeBSD also gains plugin support (but untested).
> >
> > v4: - introduce error-early behavior for invalid plugin paths
> > - support directories via the existing -d option instead of adding new
> >
> > v3: - merge the first commits
> >
> > v2: - move code to eal/common
> > - add bsd support
> >
> > Panu Matilainen (2):
> >   eal: move plugin loading to eal/common
> >   eal: add support for driver directory concept
> 
> 
> checkpatch complains for some indent problem (Thomas, can you fix this ?),
> but the rest looks good to me.
> 
> Acked-by: David Marchand 
> 
> Thanks Panu.

Applied, thanks


[dpdk-dev] [PATCH] MAINTAINERS: update maintainer for reorder library

2015-11-12 Thread Thomas Monjalon
> >   Reorder
> > -M: Sergio Gonzalez Monroy 
> > +M: Reshma Pattan 
> >   F: lib/librte_reorder/
> >   F: doc/guides/prog_guide/reorder_lib.rst
> >   F: app/test/test_reorder*
> Acked-by: Sergio Gonzalez Monroy 

So you are replacing Sergio.
Any enhancement or feature planned?


[dpdk-dev] [PATCH v2] app/test: fix reorder library unit test

2015-11-12 Thread Thomas Monjalon
2015-10-30 14:30, Sergio Gonzalez Monroy:
> On 21/10/2015 14:01, Sergio Gonzalez Monroy wrote:
> > On 21/10/2015 11:50, Reshma Pattan wrote:
> >> The reorder library unit test was performed under the assumption that
> >> the start sequence number was always 0.
> >> This is not the case anymore as the start sequence number is
> >> initialized by the first packet inserted into the reorder buffer.
> >>
> >> This patch updates the unit test to reflect the new behavior.
> >>
> >> Fixes: 7e1fa1de8a53 ("reorder: allow random number as starting point")
> >>
> >> Signed-off-by: Reshma Pattan
> >>
> > Acked-by: Sergio Gonzalez Monroy 
> Forgot to add this tag:
> 
> Reported-by: Mukesh Dua 

Applied, thanks


[dpdk-dev] [PATCH] i40e: fix the issue of trying more VSIs for VMDq than available

2015-11-12 Thread Helin Zhang
Fix the issue of trying to allocate more VSIs for VMDq than the
hardware has remaining, by checking the number of remaining hardware
VSIs before allocating VSIs for VMDq.

Fixes: c80707a0fd9c ("i40e: fix VMDq pool limit")

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/i40e_ethdev.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index e4684d3..323b1ff 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3118,7 +3118,8 @@ i40e_pf_parameter_init(struct rte_eth_dev *dev)
pf->vmdq_nb_qps = 0;
pf->max_nb_vmdq_vsi = 0;
if (hw->func_caps.vmdq) {
-   if (qp_count < hw->func_caps.num_tx_qp) {
+   if (qp_count < hw->func_caps.num_tx_qp &&
+   vsi_count < hw->func_caps.num_vsis) {
pf->max_nb_vmdq_vsi = (hw->func_caps.num_tx_qp -
qp_count) / pf->vmdq_nb_qp_max;

@@ -3126,6 +3127,8 @@ i40e_pf_parameter_init(struct rte_eth_dev *dev)
 * ethdev can support
 */
pf->max_nb_vmdq_vsi = RTE_MIN(pf->max_nb_vmdq_vsi,
+   hw->func_caps.num_vsis - vsi_count);
+   pf->max_nb_vmdq_vsi = RTE_MIN(pf->max_nb_vmdq_vsi,
ETH_64_POOLS);
if (pf->max_nb_vmdq_vsi) {
pf->flags |= I40E_FLAG_VMDQ;
@@ -3140,7 +3143,7 @@ i40e_pf_parameter_init(struct rte_eth_dev *dev)
"VMDq");
}
} else {
-   PMD_DRV_LOG(INFO, "No queue left for VMDq");
+   PMD_DRV_LOG(INFO, "No queue or VSI left for VMDq");
}
}
qp_count += pf->vmdq_nb_qps * pf->max_nb_vmdq_vsi;
-- 
1.9.3



[dpdk-dev] [PATCH] doc: add entry for enic PMD Tx improvement to the 2.2 release notes.

2015-11-12 Thread Thomas Monjalon
2015-11-06 15:08, johndale:
> Signed-off-by: johndale 

Applied, thanks


[dpdk-dev] Permanently binding NIC ports with DPDK drivers

2015-11-12 Thread Panu Matilainen
On 11/11/2015 06:28 PM, Bruce Richardson wrote:
> On Wed, Nov 11, 2015 at 04:13:01PM +, Montorsi, Francesco wrote:
>> Hi,
>> Is there a way to permanently (i.e., have the configuration automatically 
>> applied after reboot) bind a NIC port to DPDK?
>>
>> In case there's none, I'm thinking to save in my software a list of the NIC 
>> ports chosen by the user for use with DPDK and then, upon software startup 
>> to just do
>>  for (int i=0; i < ...; i++)
>>   system("dpdk_nic_bind.py --bind=igb_uio " + PCI_device_chosen[i]);
>> Do you see any problem with that?
>>
>> Thanks!
>> Francesco Montorsi
>>
>
> Hi Francesco,
>
> I'm not aware of any way to make the bindings permanent across reboots.
> What you have suggested will work, but there are probably better ways to
> do the same thing. For example, a couple of lines in an rc.local script
> can reapply the bindings at boot for you. I'm sure others can suggest
> other ways of having the same effect; for example, there may be a way to
> do this automatically using udev or systemd or some such package.

I've been looking into this recently, here's what I have so far:
http://laiskiainen.org/git/?p=driverctl.git

For the impatient, "make rpm" should produce something usable for recent 
Fedora/RHEL systems, usage looks somewhat like this:

Find devices currently driven by ixgbe driver:
# driverctl -v list-devices | grep ixgbe
:01:00.0 ixgbe (Ethernet 10G 4P X520/I350 rNDC)
:01:00.1 ixgbe (Ethernet 10G 4P X520/I350 rNDC)

Change them to use the vfio-pci driver permanently:
# driverctl set-override :01:00.0 vfio-pci
# driverctl set-override :01:00.1 vfio-pci

Find devices with driver overrides:
[root@wsfd-netdev32 ~]# driverctl -v list-devices|grep \*
:01:00.0 vfio-pci [*] (Ethernet 10G 4P X520/I350 rNDC)
:01:00.1 vfio-pci [*] (Ethernet 10G 4P X520/I350 rNDC)

Remove the permanent driver override for device :01:00.1:
# driverctl unset-override :01:00.1

In addition it has udev rules to export vfio and uio devices on systemd 
level, eg the above looks like this with normal drivers:

# systemctl |grep :01:00
sys-devices-pci:00-:00:03.0-:01:00.0-net-em1.device loaded active plugged Ethernet 10G 4P X520/I350 rNDC
sys-devices-pci:00-:00:03.0-:01:00.1-net-em2.device loaded active plugged Ethernet 10G 4P X520/I350 rNDC

When changed to vfio, with upstream systemd/udev rules they would just 
disappear entirely, but with the driverctl rules they become:

# systemctl |grep :01:00
sys-devices-pci:00-:00:03.0-:01:00.0-vfio.device loaded active plugged /sys/devices/pci:00/:00:03.0/:01:00.0/vfio
sys-devices-pci:00-:00:03.0-:01:00.1-vfio.device loaded active plugged /sys/devices/pci:00/:00:03.0/:01:00.1/vfio

- Panu -
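
As an aside on the loop quoted at the top of this message: C cannot concatenate strings with `+`, so a working version has to format the command first. A minimal sketch, assuming the PCI addresses come from a trusted, already validated list (they are passed to a shell); the function names are made up for illustration, and only the `dpdk_nic_bind.py --bind=igb_uio` command line is taken from the quoted mail.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format the bind command for one PCI address into buf.
 * Returns 0 on success, -1 if the command would not fit. */
static int
build_bind_cmd(char *buf, size_t len, const char *pci_addr)
{
	int n = snprintf(buf, len, "dpdk_nic_bind.py --bind=igb_uio %s",
			 pci_addr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Run the bind command for every address; returns the number of failures. */
static int
bind_all(const char *const *pci_addrs, size_t count)
{
	char cmd[128];
	int failures = 0;

	for (size_t i = 0; i < count; i++) {
		if (build_bind_cmd(cmd, sizeof(cmd), pci_addrs[i]) != 0 ||
		    system(cmd) != 0)
			failures++;
	}
	return failures;
}
```

This only reapplies the binding at application startup; it does not make the binding persistent in the way the driverctl overrides above do.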


[dpdk-dev] [PATCH 1/7] ether: don't mark input multicast for deprecation

2015-11-12 Thread Stephen Hemminger
On Thu,  5 Nov 2015 17:04:33 -0800
Stephen Hemminger  wrote:

> The number of received multicast frames is useful and already
> available in many/most drivers. Therefore don't mark it as
> deprecated.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 1 -
>  lib/librte_ether/rte_ethdev.h| 3 +--
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 0b0bbcf..3b71c0c 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -2715,7 +2715,6 @@ ixgbevf_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>   stats->opackets = hw_stats->vfgptc;
>   stats->obytes = hw_stats->vfgotc;
>   stats->imcasts = hw_stats->vfmprc;
> - /* stats->imcasts should be removed as imcasts is deprecated */
>  }
>  
>  static void
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 48a540d..f653e37 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -204,8 +204,7 @@ struct rte_eth_stats {
>   /**< Deprecated; Total of RX packets with bad length. */
>   uint64_t ierrors;   /**< Total number of erroneous received packets. */
>   uint64_t oerrors;   /**< Total number of failed transmitted packets. */
> - uint64_t imcasts;
> - /**< Deprecated; Total number of multicast received packets. */
> + uint64_t imcasts;   /**< Total number of multicast received packets. */
>   uint64_t rx_nombuf; /**< Total number of RX mbuf allocation failures. */
>   uint64_t fdirmatch;
>   /**< Deprecated; Total number of RX packets matching a filter. */

I am okay with removing imcasts if all the drivers that support it provide
the same information in xstats.


[dpdk-dev] [PATCH v3] vhost: fix mmap failure as len not aligned with hugepage size

2015-11-12 Thread Jianfeng Tan
This patch fixes a bug seen on older Linux kernels: mmap() fails
when the length is not aligned to the hugepage size. On older
longterm kernels such as 2.6.32 and 3.2.72, mmap() without the
MAP_ANONYMOUS flag must be called with a length argument aligned
to the hugepage size, or it fails with EINVAL.
This bug was fixed in the Linux kernel by commit:
dab2d3dc45ae7343216635d981d43637e1cb7d45
To avoid the failure, the caller keeps the length aligned.

v3 changes:
 - fix (u64) -> (void *) conversion error on 32-bit systems

v2 changes:
 - add Kernel version comments and commit msg
 - remove unnecessary alignments when munmap

Signed-off-by: Jianfeng Tan 
---
 lib/librte_vhost/vhost_user/virtio-net-user.c | 36 ---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..99da029 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -74,7 +74,6 @@ free_mem_region(struct virtio_net *dev)
 {
struct orig_region_map *region;
unsigned int idx;
-   uint64_t alignment;

if (!dev || !dev->mem)
return;
@@ -82,12 +81,8 @@ free_mem_region(struct virtio_net *dev)
region = orig_region(dev->mem, dev->mem->nregions);
for (idx = 0; idx < dev->mem->nregions; idx++) {
if (region[idx].mapped_address) {
-   alignment = region[idx].blksz;
-   munmap((void *)(uintptr_t)
-   RTE_ALIGN_FLOOR(
-   region[idx].mapped_address, alignment),
-   RTE_ALIGN_CEIL(
-   region[idx].mapped_size, alignment));
+   munmap((void *)(uintptr_t)region[idx].mapped_address,
+   region[idx].mapped_size);
close(region[idx].fd);
}
}
@@ -147,6 +142,18 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct 
VhostUserMsg *pmsg)
/* This is ugly */
mapped_size = memory.regions[idx].memory_size +
memory.regions[idx].mmap_offset;
+
+   /* mmap() without flag of MAP_ANONYMOUS, should be called
+* with length argument aligned with hugepagesz at older
+* longterm version Linux, like 2.6.32 and 3.2.72, or
+* mmap() will fail with EINVAL.
+*
+* to avoid failure, make sure in caller to keep length
+* aligned.
+*/
+   alignment = get_blk_size(pmsg->fds[idx]);
+   mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
+
mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
mapped_size,
PROT_READ | PROT_WRITE, MAP_SHARED,
@@ -154,9 +161,11 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct 
VhostUserMsg *pmsg)
0);

RTE_LOG(INFO, VHOST_CONFIG,
-   "mapped region %d fd:%d to %p sz:0x%"PRIx64" 
off:0x%"PRIx64"\n",
+   "mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
+   "off:0x%"PRIx64" align:0x%"PRIx64"\n",
idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
-   mapped_size, memory.regions[idx].mmap_offset);
+   mapped_size, memory.regions[idx].mmap_offset,
+   alignment);

if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
RTE_LOG(ERR, VHOST_CONFIG,
@@ -166,7 +175,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct 
VhostUserMsg *pmsg)

pregion_orig[idx].mapped_address = mapped_address;
pregion_orig[idx].mapped_size = mapped_size;
-   pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]);
+   pregion_orig[idx].blksz = alignment;
pregion_orig[idx].fd = pmsg->fds[idx];

mapped_address +=  memory.regions[idx].mmap_offset;
@@ -193,11 +202,8 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct 
VhostUserMsg *pmsg)

 err_mmap:
while (idx--) {
-   alignment = pregion_orig[idx].blksz;
-   munmap((void *)(uintptr_t)RTE_ALIGN_FLOOR(
-   pregion_orig[idx].mapped_address, alignment),
-   RTE_ALIGN_CEIL(pregion_orig[idx].mapped_size,
-   alignment));
+   munmap((void *)(uintptr_t)pregion_orig[idx].mapped_address,
+   pregion_orig[idx].mapped_size);
close(pregion_orig[idx].fd);
}
free(dev->mem);
-- 
2.1.4



[dpdk-dev] Coverity policy for upstream (base) drivers.

2015-11-12 Thread Stephen Hemminger
Looking at the Coverity scan for DPDK, it looks like all the base
drivers are marked to be ignored.

Although changes to the base drivers should not be made directly through
the DPDK list, I think it is still valuable to have these drivers scanned and
to notify (badger) the vendors to fix their code.

Since lots of bugs could be there, blindly ignoring warnings
and issues is naive.


[dpdk-dev] ACL Library Information Request

2015-11-12 Thread Jason Terry
Hi,

   I've read the documentation and looked at the example ACL app.  What is the
best practice for deleting rules?  From the API, it looks like a new context
needs to be created and built.  Is that true?  Also, this is more of a
confirmation, but RTE_ACL_MAX_FIELDS is defined as 64, so I assume that for
IPv4 we can have a tuple that's larger than 5?

Thanks,
Jason



[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-12 Thread Rich Lane
You can reproduce this with l2fwd and the vhost PMD.

You'll need this patch on top of the vhost PMD patches:
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
return -1;

if (dev->flags & VIRTIO_DEV_RUNNING)
-   notify_ops->destroy_device(dev);
+   notify_destroy_device(dev);

cleanup_device(dev);
reset_device(dev);

1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
eth_vhost0,iface=/run/vhost0.sock -- -p3
2. Start a VM using vhost-user and set up uio, hugepages, etc.
3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
4. Kill the l2fwd inside the VM with SIGINT.
5. Start l2fwd inside the VM.
6. l2fwd on the host crashes.

I found the source of the memory corruption by setting a watchpoint in
gdb: watch -l rte_eth_devices[1].data->rx_queues

On Thu, Nov 12, 2015 at 1:23 AM, Yuanhan Liu 
wrote:

> On Thu, Nov 12, 2015 at 12:02:33AM -0800, Rich Lane wrote:
> > The guest could trigger this buffer overflow by creating a cycle of
> descriptors
> > (which would also cause an infinite loop). The more common case is that
> > vq->avail->idx jumps out of the range [last_used_idx,
> last_used_idx+256). This
> > happens nearly every time when restarting a DPDK app inside a VM
> connected to a
> > vhost-user vswitch because the virtqueue memory allocated by the
> previous run
> > is zeroed.
>
> Hi,
>
> I somehow was aware of this issue before while reading the code.
> Thinking that we never met that, I delayed the fix (it was still
> in my TODO list).
>
> Would you please tell me the steps (commands would be better) to
> reproduce your issue? I'd like to know more about the issue: I'm
> guessing maybe we need to fix it with a bit more care.
>
> --yliu
> >
> > Signed-off-by: Rich Lane 
> > ---
> >  lib/librte_vhost/vhost_rxtx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_vhost/vhost_rxtx.c
> b/lib/librte_vhost/vhost_rxtx.c
> > index 9322ce6..d95b478 100644
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq,
> uint32_t id,
> >   vq->buf_vec[vec_id].desc_idx = idx;
> >   vec_id++;
> >
> > - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
> > + if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id <
> BUF_VECTOR_MAX) {
> >   idx = vq->desc[idx].next;
> >   next_desc = 1;
> >   }
> > --
> > 1.9.1
>
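
The failure mode described in the quoted commit message (a descriptor chain
that loops, or that is simply longer than the vector it is copied into) can
be shown with a simplified, stand-alone walk. This is a sketch under stated
assumptions, not the vhost code: struct desc, DESC_F_NEXT, VEC_MAX and
walk_chain() are stand-ins invented for illustration, and the range check
on the index goes beyond what the one-line patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_F_NEXT 1u   /* stand-in for VRING_DESC_F_NEXT */
#define VEC_MAX     256u /* stand-in for BUF_VECTOR_MAX */

/* Simplified descriptor: only the fields the walk needs. */
struct desc {
    uint16_t flags;
    uint16_t next;
};

/* Follow a descriptor chain starting at head and record each index in
 * vec[]. The walk stops when the chain ends, when an index is out of
 * range, or when vec[] is full, so a guest-built cycle can neither
 * overflow vec[] nor loop forever.
 * Returns the number of entries written to vec[]. */
uint32_t
walk_chain(const struct desc *ring, uint16_t ring_size, uint16_t head,
           uint16_t *vec, uint32_t vec_max)
{
    uint32_t n = 0;
    uint16_t idx = head;

    assert(ring != NULL && vec != NULL);
    while (n < vec_max) {
        if (idx >= ring_size)
            break; /* index supplied by the guest is out of range */
        vec[n++] = idx;
        if (!(ring[idx].flags & DESC_F_NEXT))
            break; /* normal end of chain */
        idx = ring[idx].next;
    }
    return n;
}
```

With a cyclic chain the walk returns exactly vec_max entries instead of
writing past the end of vec[]; a caller could treat a full vector as a
sign of a malformed ring.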


[dpdk-dev] [PATCH v6 0/8] add sample ptp slave application

2015-11-12 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pablo de Lara
> Sent: Thursday, November 12, 2015 12:56 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v6 0/8] add sample ptp slave application
> 
> 
> Add a sample application that acts as a PTP slave using the DPDK IEEE1588
> functions.
> 
> Also add some additional IEEE1588 support functions to enable getting,
> setting and adjusting the device time.
> 
> V5->v6:
>  - Moved common functionality for cyclecounter and time conversions
>functions to lib/librte_eal/common/include/rte_time.h, based on mailing
>list comments.
>  - Prefixed functions with rte_ and added Doxygen comments.
>  - Refactored cyclecounter structs from previous version to make it more
>generic.
>  - Fix ieee1588 fwd output in testpmd.

Series Acked-by: John McNamara 
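
For readers new to PTP, the arithmetic a slave performs on the four
timestamps of a delay request-response exchange can be sketched as below.
This is the standard IEEE 1588 calculation written as a stand-alone
example; it is not taken from the patch series, the function names are
hypothetical, and it assumes the network delay is the same in both
directions.

```c
#include <assert.h>
#include <stdint.h>

/* All timestamps are in nanoseconds.
 *   t1: master sends SYNC (reported in FOLLOW UP)
 *   t2: slave receives SYNC
 *   t3: slave sends DELAY REQUEST
 *   t4: master receives DELAY REQUEST (reported in DELAY RESPONSE)
 */

/* Offset of the slave clock relative to the master clock.
 * A positive result means the slave clock is ahead. */
int64_t
ptp_offset_ns(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
    return ((t2 - t1) - (t4 - t3)) / 2;
}

/* One-way path delay between master and slave. */
int64_t
ptp_path_delay_ns(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
    return ((t2 - t1) + (t4 - t3)) / 2;
}
```

For example, with a slave clock running 500 ns ahead and a 100 ns path
delay, t1 = 1000, t2 = 1600, t3 = 2000 and t4 = 1600 give an offset of
500 and a delay of 100. The correction to apply to the slave clock is the
negated offset.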


[dpdk-dev] [PATCH v2] mem: calculate space left in a hugetlbfs

2015-11-12 Thread Sergio Gonzalez Monroy
Hi,

On 12/11/2015 02:10, Jianfeng Tan wrote:
> This patch enables calculating space left in a hugetlbfs.
> There are three sources to get the information: 1. from
> sysfs; 2. from option size specified when mount; 3. use
> statfs. We should use the minimum one of these three sizes.
We could improve the message by stating the current issue (when the
hugetlbfs mount specifies size= option), then how the patch deals
with the problem and also outstanding issues.
> Signed-off-by: Jianfeng Tan 
> ---
> Changes in v2:
>   - reword title
>   - fix compiler error of v1
>
>   lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 85 
> -
>   1 file changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c 
> b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
> index 18858e2..8305a58 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
> @@ -44,6 +44,8 @@
>   #include 
>   #include 
>   #include 
> +#include 
> +#include 
>   
>   #include 
>   #include 
> @@ -189,6 +191,70 @@ get_hugepage_dir(uint64_t hugepage_sz)
>   return retval;
>   }
>   
> +/* Caller to make sure this mnt_dir exist
> + */
> +static uint64_t
> +get_hugetlbfs_mount_size(const char *mnt_dir)
> +{
> + char *start, *end, *opt_size;
> + struct mntent *ent;
> + uint64_t size;
> + FILE *f;
> + int len;
> +
> + f = setmntent("/proc/mounts", "r");
> + if (f == NULL) {
> + RTE_LOG(ERR, EAL, "setmntent() error: %s\n",
> + strerror(errno));
> + return 0;
> + }
> + while (NULL != (ent = getmntent(f))) {
> + if (!strcmp(ent->mnt_dir, mnt_dir))
> + break;
> + }
> +
> + start = hasmntopt(ent, "size");
> + if (start == NULL) {
> + RTE_LOG(DEBUG, EAL, "option size not specified for %s\n",
> + mnt_dir);
> + size = 0;
> + goto end;
> + }
> + start += strlen("size=");
> + end = strstr(start, ",");
> + if (end != NULL)
> + len = end - start;
> + else
> + len = strlen(start);
> + opt_size = strndup(start, len);
> + size = rte_str_to_size(opt_size);
> + free(opt_size);
> +
> +end:
> + endmntent(f);
> + return size;
> +}
> +
The function above is very similar to get_hugepage_dir, ie. open and parse
/proc/mounts.
I think it would be better to have a more generic function that retrieves
all needed info from /proc/mounts.
> +/* Caller to make sure this mount has option size
> + * so that statfs is not zero.
> + */
> +static uint64_t
> +get_hugetlbfs_free_size(const char *mnt_dir)
> +{
> + int r;
> + struct statfs stats;
> +
> + r = statfs(mnt_dir, &stats);
> + if (r != 0) {
> + RTE_LOG(ERR, EAL, "statfs() error: %s\n",
> + strerror(errno));
> + return 0;
> + }
> +
> + return stats.f_bfree * stats.f_bsize;
> +}
> +
> +
>   /*
>* Clear the hugepage directory of whatever hugepage files
>* there are. Checks if the file is locked (i.e.
> @@ -329,9 +395,26 @@ eal_hugepage_info_init(void)
>   if (clear_hugedir(hpi->hugedir) == -1)
>   break;
>   
> + /* there are three sources of how much space is left in a
> +  * hugetlbfs dir.
> +  */
> + uint64_t sz_left, sz_sysfs, sz_option, sz_statfs;
> +
> + sz_sysfs = get_num_hugepages(dirent->d_name) *
> + hpi->hugepage_sz;
> + sz_left = sz_sysfs;
> + sz_option = get_hugetlbfs_mount_size(hpi->hugedir);
> + if (sz_option) {
> + sz_statfs = get_hugetlbfs_free_size(hpi->hugedir);
> + sz_left = RTE_MIN(sz_sysfs, sz_statfs);
> + RTE_LOG(INFO, EAL, "sz_sysfs: %"PRIu64", sz_option: "
> + "%"PRIu64", sz_statfs: %"PRIu64"\n",
> + sz_sysfs, sz_option, sz_statfs);
> + }
> +
>   /* for now, put all pages into socket 0,
>* later they will be sorted */
> - hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
> + hpi->num_pages[0] = sz_left / hpi->hugepage_sz;
>   
>   #ifndef RTE_ARCH_64
>   /* for 32-bit systems, limit number of hugepages to

A couple more things:
- Update release-notes and/or relevant doc about improved detection of 
free hugepages
- Update the status of previous/old patches in patchwork

Sergio
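
The idea under review (take the size= mount option into account and use
the smallest of the available limits) can be sketched without any DPDK or
mntent dependency. This is an illustration only: parse_size_option() and
usable_hugepages() are hypothetical names, the option string format is an
assumption, and taking the minimum of all three sizes follows the patch
description rather than the exact code in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse the value of "size=" from a comma-separated mount option string
 * such as "rw,relatime,pagesize=2M,size=1G". An optional K/M/G suffix is
 * accepted. Returns 0 when the option is absent or has no digits. */
uint64_t
parse_size_option(const char *opts)
{
    const char *p = opts;
    char *end;
    unsigned long long val;

    /* Match "size=" only at the start of an option, so that
     * "pagesize=" is not mistaken for it. */
    while ((p = strstr(p, "size=")) != NULL) {
        if (p == opts || p[-1] == ',')
            break;
        p += strlen("size=");
    }
    if (p == NULL)
        return 0;
    p += strlen("size=");

    val = strtoull(p, &end, 10);
    if (end == p)
        return 0;
    switch (*end) {
    case 'G':
    case 'g':
        val <<= 10;
        /* fall through */
    case 'M':
    case 'm':
        val <<= 10;
        /* fall through */
    case 'K':
    case 'k':
        val <<= 10;
        break;
    default:
        break;
    }
    return (uint64_t)val;
}

/* Number of hugepages that can actually be used in a mount.
 * sz_option == 0 means the mount has no size= limit, in which case
 * only the sysfs figure applies. */
uint64_t
usable_hugepages(uint64_t sz_sysfs, uint64_t sz_option, uint64_t sz_statfs,
                 uint64_t hugepage_sz)
{
    uint64_t left = sz_sysfs;

    assert(hugepage_sz != 0);
    if (sz_option != 0) {
        if (sz_option < left)
            left = sz_option;
        if (sz_statfs < left)
            left = sz_statfs;
    }
    return left / hugepage_sz;
}
```

Keeping the parser a pure string function makes it easy to fold into a
single /proc/mounts pass, which is the direction the review comment
suggests.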


[dpdk-dev] [PATCH] fm10k: fix a crash bug when quit from testpmd

2015-11-12 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

When the fm10k port is closed, both tx_queue_clean() and
fm10k_tx_queue_release_mbufs_vec() try to release the buffers in the
SW ring. The latter does no sanity check on those pointers,
which causes a crash.

The fix includes 2 parts.
1. Remove the vector TX buffer release function, since vector TX can
   share the release function with regular TX.
2. Add logs to print out which Rx/Tx function is actually used.

Signed-off-by: Chen Jing D(Mark) 
---
 drivers/net/fm10k/fm10k.h  |1 -
 drivers/net/fm10k/fm10k_ethdev.c   |   17 -
 drivers/net/fm10k/fm10k_rxtx_vec.c |   28 
 3 files changed, 12 insertions(+), 34 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 754aa6a..38d5489 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -237,7 +237,6 @@ struct fm10k_tx_queue {
 };

 struct fm10k_txq_ops {
-   void (*release_mbufs)(struct fm10k_tx_queue *txq);
void (*reset)(struct fm10k_tx_queue *txq);
 };

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index cf7ada7..af7b0c2 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -386,7 +386,6 @@ fm10k_check_mq_mode(struct rte_eth_dev *dev)
 }

 static const struct fm10k_txq_ops def_txq_ops = {
-   .release_mbufs = tx_queue_free,
.reset = tx_queue_reset,
 };

@@ -1073,7 +1072,7 @@ fm10k_dev_queue_release(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_tx_queues; i++) {
struct fm10k_tx_queue *txq = dev->data->tx_queues[i];

-   txq->ops->release_mbufs(txq);
+   tx_queue_free(txq);
}
}

@@ -1793,7 +1792,7 @@ fm10k_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
queue_id,
if (dev->data->tx_queues[queue_id] != NULL) {
struct fm10k_tx_queue *txq = dev->data->tx_queues[queue_id];

-   txq->ops->release_mbufs(txq);
+   tx_queue_free(txq);
dev->data->tx_queues[queue_id] = NULL;
}

@@ -1872,7 +1871,7 @@ fm10k_tx_queue_release(void *queue)
struct fm10k_tx_queue *q = queue;
PMD_INIT_FUNC_TRACE();

-   q->ops->release_mbufs(q);
+   tx_queue_free(q);
 }

 static int
@@ -2439,13 +2438,16 @@ fm10k_set_tx_function(struct rte_eth_dev *dev)
}

if (use_sse) {
+   PMD_INIT_LOG(ERR, "Use vector Tx func");
for (i = 0; i < dev->data->nb_tx_queues; i++) {
txq = dev->data->tx_queues[i];
fm10k_txq_vec_setup(txq);
}
dev->tx_pkt_burst = fm10k_xmit_pkts_vec;
-   } else
+   } else {
dev->tx_pkt_burst = fm10k_xmit_pkts;
+   PMD_INIT_LOG(ERR, "Use regular Tx func");
+   }
 }

 static void __attribute__((cold))
@@ -2469,6 +2471,11 @@ fm10k_set_rx_function(struct rte_eth_dev *dev)
(dev->rx_pkt_burst == fm10k_recv_scattered_pkts_vec ||
dev->rx_pkt_burst == fm10k_recv_pkts_vec);

+   if (rx_using_sse)
+   PMD_INIT_LOG(ERR, "Use vector Rx func");
+   else
+   PMD_INIT_LOG(ERR, "Use regular Rx func");
+
for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct fm10k_rx_queue *rxq = dev->data->rx_queues[i];

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 06beca9..6042568 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -45,8 +45,6 @@
 #endif

 static void
-fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq);
-static void
 fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);

 /* Handling the offload flags (olflags) field takes computation
@@ -634,7 +632,6 @@ fm10k_recv_scattered_pkts_vec(void *rx_queue,
 }

 static const struct fm10k_txq_ops vec_txq_ops = {
-   .release_mbufs = fm10k_tx_queue_release_mbufs_vec,
.reset = fm10k_reset_tx_queue,
 };

@@ -795,31 +792,6 @@ fm10k_xmit_pkts_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 }

 static void __attribute__((cold))
-fm10k_tx_queue_release_mbufs_vec(struct fm10k_tx_queue *txq)
-{
-   unsigned i;
-   const uint16_t max_desc = (uint16_t)(txq->nb_desc - 1);
-
-   if (txq->sw_ring == NULL || txq->nb_free == max_desc)
-   return;
-
-   /* release the used mbufs in sw_ring */
-   for (i = txq->next_dd - (txq->rs_thresh - 1);
-i != txq->next_free;
-i = (i + 1) & max_desc)
-   rte_pktmbuf_free_seg(txq->sw_ring[i]);
-
-   txq->nb_free = max_desc;
-
-   /* reset tx_entry */
-   for (i = 0; i < txq->nb_desc; i++)
-   txq->sw_ring[i] = NULL;
-
-   rte_free(txq->sw_ring);
-   txq->sw_ring = NULL;
-}
-
-static void __attribute__((cold))
 fm10k_reset_tx_queue(struct fm10k_tx_queue *txq)
 {
static const st
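
The crash described in the commit message comes from two release paths
touching the same software ring. A release routine that checks and clears
each pointer can be called from either path, or twice, without harm. The
sketch below shows that pattern with plain malloc/free; struct mbuf,
struct tx_queue and both functions are simplified stand-ins, not the fm10k
driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for rte_mbuf and fm10k_tx_queue; only what the sketch needs. */
struct mbuf {
    int dummy;
};

struct tx_queue {
    struct mbuf **sw_ring;
    uint16_t nb_desc;
};

/* Free every buffer still held in the software ring. NULL slots are
 * skipped and each freed slot is cleared, so a second call finds
 * nothing left to free. */
void
tx_queue_clean_sketch(struct tx_queue *q)
{
    uint16_t i;

    if (q == NULL || q->sw_ring == NULL)
        return;
    for (i = 0; i < q->nb_desc; i++) {
        if (q->sw_ring[i] != NULL) {
            free(q->sw_ring[i]);
            q->sw_ring[i] = NULL;
        }
    }
}

/* Release the buffers and then the ring itself. Clearing sw_ring makes
 * the function safe to call more than once. */
void
tx_queue_free_sketch(struct tx_queue *q)
{
    if (q == NULL)
        return;
    tx_queue_clean_sketch(q);
    free(q->sw_ring);
    q->sw_ring = NULL;
}
```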

[dpdk-dev] [PATCH v6 8/8] doc: add a ptpclient sample guide

2015-11-12 Thread Pablo de Lara
From: Daniel Mrzyglod 

Add a sample app guide for the ptpclient application.

Signed-off-by: Daniel Mrzyglod 
Signed-off-by: Pablo de Lara 
Reviewed-by: John McNamara 
---
 doc/guides/sample_app_ug/img/ptpclient.svg | 524 +
 doc/guides/sample_app_ug/index.rst |   3 +
 doc/guides/sample_app_ug/ptpclient.rst | 306 +
 3 files changed, 833 insertions(+)
 create mode 100644 doc/guides/sample_app_ug/img/ptpclient.svg
 create mode 100644 doc/guides/sample_app_ug/ptpclient.rst

diff --git a/doc/guides/sample_app_ug/img/ptpclient.svg 
b/doc/guides/sample_app_ug/img/ptpclient.svg
new file mode 100644
index 000..84f9c22
--- /dev/null
+++ b/doc/guides/sample_app_ug/img/ptpclient.svg
@@ -0,0 +1,524 @@
+[SVG markup stripped by the archive. Text labels that survive from the
+figure: master, slave, time, SYNC, FOLLOW UP:T1, DELAY REQUEST,
+DELAY RESPONSE:T4, and the timestamps T1, T2, T3, T4.]
diff --git a/doc/guides/sample_app_ug/index.rst 
b/doc/guides/sample_app_ug/index.rst
index 9beedd9..8ae86c0 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -73,6 +73,7 @@ Sample Applications User Guide
 vm_power_management
 tep_termination
 proc_info
+ptpclient

 **Figures**

@@ -136,6 +137,8 @@ Sample Applications User Guide
 :numref:`figure_overlay_networking` :ref:`figure_overlay_networking`
 :numref:`figure_tep_termination_arch` :ref:`figure_tep_termination_arch`

+:numref:`figure_ptpclient_highlevel` :ref:`figure_ptpclient_highlevel`
+
 **Tables**

 :numref:`table_qos_metering_1` :ref:`table_qos_metering_1`
diff --git a/doc/guides/sample_app_ug/ptpclient.rst 
b/doc/guides/sample_app_ug/ptpclient.rst
new file mode 100644
index 000..6e425b7
--- /dev/null
+++ b/doc/guides/sample_app_ug/ptpclient.rst
@@ -0,0 +1,306 @@
+..  BSD LICENSE
+Copyright(c) 2015 Intel Corporation. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+PTP Client Sample Application
+=============================
+
+The PTP (Precision Time Protocol) client sample application is a simple
+example of using the DPDK IEEE1588 API to communicate with a PTP master clock
+to synchronize the time on the NIC and, optionally, on the Linux system.
+
+Note, PTP is a time syncing protocol and cannot be used within DPDK as a
+time-stamping mechanism. See the following for an explanation of th

[dpdk-dev] [PATCH v6 7/8] example: minimal ptp client implementation

2015-11-12 Thread Pablo de Lara
From: Daniel Mrzyglod 

Add a sample application that acts as a PTP slave using the
DPDK ieee1588 functions.

Signed-off-by: Daniel Mrzyglod 
Signed-off-by: Pablo de Lara 
Reviewed-by: John McNamara 
---
 MAINTAINERS|   4 +
 examples/Makefile  |   1 +
 examples/ptpclient/Makefile|  56 +++
 examples/ptpclient/ptpclient.c | 780 +
 4 files changed, 841 insertions(+)
 create mode 100644 examples/ptpclient/Makefile
 create mode 100644 examples/ptpclient/ptpclient.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c8be5d2..28b04ae 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -520,3 +520,7 @@ F: examples/tep_termination/
 F: examples/vmdq/
 F: examples/vmdq_dcb/
 F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
+
+M: Pablo de Lara 
+M: Daniel Mrzyglod 
+F: examples/ptpclient
diff --git a/examples/Makefile b/examples/Makefile
index b4eddbd..4672534 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -74,5 +74,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += vhost_xen
 DIRS-y += vmdq
 DIRS-y += vmdq_dcb
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += vm_power_manager
+DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient

 include $(RTE_SDK)/mk/rte.extsubdir.mk
diff --git a/examples/ptpclient/Makefile b/examples/ptpclient/Makefile
new file mode 100644
index 000..b77cf71
--- /dev/null
+++ b/examples/ptpclient/Makefile
@@ -0,0 +1,56 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2015 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = ptpclient
+
+# all source are stored in SRCS-y
+SRCS-y := ptpclient.c
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/ptpclient/ptpclient.c b/examples/ptpclient/ptpclient.c
new file mode 100644
index 000..0af4f3b
--- /dev/null
+++ b/examples/ptpclient/ptpclient.c
@@ -0,0 +1,780 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *

[dpdk-dev] [PATCH v6 6/8] testpmd: add nanosecond output for ieee1588 fwd

2015-11-12 Thread Pablo de Lara
testpmd printed only the seconds value of RX/TX timestamps,
not both seconds and nanoseconds.
Since the time counters have nanosecond resolution,
testpmd should print both.

Signed-off-by: Pablo de Lara 
Reviewed-by: John McNamara 
---
 app/test-pmd/ieee1588fwd.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/ieee1588fwd.c b/app/test-pmd/ieee1588fwd.c
index b1a301b..c69023a 100644
--- a/app/test-pmd/ieee1588fwd.c
+++ b/app/test-pmd/ieee1588fwd.c
@@ -89,8 +89,8 @@ port_ieee1588_rx_timestamp_check(portid_t pi, uint32_t index)
   (unsigned) pi);
return;
}
-   printf("Port %u RX timestamp value %lu\n",
-  (unsigned) pi, timestamp.tv_sec);
+   printf("Port %u RX timestamp value %lu s %lu ns\n",
+  (unsigned) pi, timestamp.tv_sec, timestamp.tv_nsec);
 }

 #define MAX_TX_TMST_WAIT_MICROSECS 1000 /**< 1 milli-second */
@@ -112,9 +112,9 @@ port_ieee1588_tx_timestamp_check(portid_t pi)
   (unsigned) pi, (unsigned) MAX_TX_TMST_WAIT_MICROSECS);
return;
}
-   printf("Port %u TX timestamp value %lu validated after "
+   printf("Port %u TX timestamp value %lu s %lu ns validated after "
   "%u micro-second%s\n",
-  (unsigned) pi, timestamp.tv_sec, wait_us,
+  (unsigned) pi, timestamp.tv_sec, timestamp.tv_nsec, wait_us,
   (wait_us == 1) ? "" : "s");
 }

-- 
1.8.1.4
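
A stand-alone variant of the same output is sketched below. The patch
prints both fields with %lu; the sketch instead casts each field
explicitly, which keeps the format string valid whatever the width of
time_t on the target. format_timestamp() is a hypothetical helper, not
part of testpmd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format ts as "<sec> s <nsec> ns" into buf.
 * Returns the value of snprintf(): the number of characters that the
 * full string needs, excluding the terminating NUL. */
int
format_timestamp(char *buf, size_t len, const struct timespec *ts)
{
    return snprintf(buf, len, "%lld s %ld ns",
                    (long long)ts->tv_sec, (long)ts->tv_nsec);
}
```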



[dpdk-dev] [PATCH v6 5/8] i40e: add additional ieee1588 support functions

2015-11-12 Thread Pablo de Lara
Add additional functions to support the existing IEEE1588
functionality and to enable getting, setting and adjusting
the device time.

Signed-off-by: Daniel Mrzyglod 
Signed-off-by: Pablo de Lara 
Reviewed-by: John McNamara 
---
 drivers/net/i40e/i40e_ethdev.c | 147 +++--
 drivers/net/i40e/i40e_ethdev.h |   6 +-
 2 files changed, 132 insertions(+), 21 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index ddf3d38..d6b3311 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -125,11 +125,13 @@
(1UL << RTE_ETH_FLOW_NONFRAG_IPV6_OTHER) | \
(1UL << RTE_ETH_FLOW_L2_PAYLOAD))

-#define I40E_PTP_40GB_INCVAL  0x01ULL
-#define I40E_PTP_10GB_INCVAL  0x03ULL
-#define I40E_PTP_1GB_INCVAL   0x20ULL
-#define I40E_PRTTSYN_TSYNENA  0x8000
-#define I40E_PRTTSYN_TSYNTYPE 0x0e00
+/* Additional timesync values. */
+#define I40E_PTP_40GB_INCVAL 0x01ULL
+#define I40E_PTP_10GB_INCVAL 0x03ULL
+#define I40E_PTP_1GB_INCVAL  0x20ULL
+#define I40E_PRTTSYN_TSYNENA 0x8000
+#define I40E_PRTTSYN_TSYNTYPE0x0e00
+#define I40E_CYCLECOUNTER_MASK   0x

 #define I40E_MAX_PERCENT100
 #define I40E_DEFAULT_DCB_APP_NUM1
@@ -400,11 +402,20 @@ static int i40e_timesync_read_rx_timestamp(struct 
rte_eth_dev *dev,
 static int i40e_timesync_read_tx_timestamp(struct rte_eth_dev *dev,
   struct timespec *timestamp);
 static void i40e_read_stats_registers(struct i40e_pf *pf, struct i40e_hw *hw);
+
+static int i40e_timesync_adjust_time(struct rte_eth_dev *dev, int64_t delta);
+
+static int i40e_timesync_read_time(struct rte_eth_dev *dev,
+  struct timespec *timestamp);
+static int i40e_timesync_write_time(struct rte_eth_dev *dev,
+   const struct timespec *timestamp);
+
 static int i40e_dev_rx_queue_intr_enable(struct rte_eth_dev *dev,
 uint16_t queue_id);
 static int i40e_dev_rx_queue_intr_disable(struct rte_eth_dev *dev,
  uint16_t queue_id);

+
 static const struct rte_pci_id pci_id_i40e_map[] = {
 #define RTE_PCI_DEV_ID_DECL_I40E(vend, dev) {RTE_PCI_DEVICE(vend, dev)},
 #include "rte_pci_dev_ids.h"
@@ -469,6 +480,9 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
.timesync_read_rx_timestamp   = i40e_timesync_read_rx_timestamp,
.timesync_read_tx_timestamp   = i40e_timesync_read_tx_timestamp,
.get_dcb_info = i40e_dev_get_dcb_info,
+   .timesync_adjust_time = i40e_timesync_adjust_time,
+   .timesync_read_time   = i40e_timesync_read_time,
+   .timesync_write_time  = i40e_timesync_write_time,
 };

 /* store statistics names and its offset in stats structure */
@@ -7738,17 +7752,36 @@ i40e_mirror_rule_reset(struct rte_eth_dev *dev, uint8_t 
sw_id)
return 0;
 }

-static int
-i40e_timesync_enable(struct rte_eth_dev *dev)
+static uint64_t
+i40e_read_cyclecounter(void *arg)
 {
+   struct rte_eth_dev *dev = (struct rte_eth_dev *) arg;
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   struct rte_eth_link *link = &dev->data->dev_link;
-   uint32_t tsync_ctl_l;
-   uint32_t tsync_ctl_h;
+   uint64_t systim_cycles = 0;
+
+   systim_cycles |= (uint64_t)I40E_READ_REG(hw, I40E_PRTTSYN_TIME_L);
+   systim_cycles |= (uint64_t)I40E_READ_REG(hw, I40E_PRTTSYN_TIME_H)
+   << 32;
+
+   return systim_cycles;
+}
+
+static void
+i40e_start_cyclecounter(struct rte_eth_dev *dev)
+{
+   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct i40e_adapter *adapter =
+   (struct i40e_adapter *)dev->data->dev_private;
+   struct rte_eth_link link;
uint32_t tsync_inc_l;
uint32_t tsync_inc_h;

-   switch (link->link_speed) {
+   /* Get current link speed. */
+   memset(&link, 0, sizeof(link));
+   i40e_dev_link_update(dev, 1);
+   rte_i40e_dev_atomic_read_link_status(dev, &link);
+
+   switch (link.link_speed) {
case ETH_LINK_SPEED_40G:
tsync_inc_l = I40E_PTP_40GB_INCVAL & 0x;
tsync_inc_h = I40E_PTP_40GB_INCVAL >> 32;
@@ -7766,6 +7799,72 @@ i40e_timesync_enable(struct rte_eth_dev *dev)
tsync_inc_h = 0x0;
}

+   /* Set the timesync increment value. */
+   I40E_WRITE_REG(hw, I40E_PRTTSYN_INC_L, tsync_inc_l);
+   I40E_WRITE_REG(hw, I40E_PRTTSYN_INC_H, tsync_inc_h);
+
+   memset(&adapter->tc, 0, sizeof(struct rte_timecounter));
+   adapter->tc.read = i40e_read_cyclecounter;
+   adapter->tc.cc_mask = I40E_CYCLECOUNTER_MASK;
+   adapter->tc.cc_shift = 0;
+   adapter->tc.arg = dev;
+}
+
+static int
+i40e_timesyn

[dpdk-dev] [PATCH v6 4/8] igb: add additional ieee1588 support functions

2015-11-12 Thread Pablo de Lara
Add additional functions to support the existing IEEE1588
functionality and to enable getting, setting and adjusting
the device time.

Signed-off-by: Daniel Mrzyglod 
Signed-off-by: Pablo de Lara 
Reviewed-by: John McNamara 
---
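
Reading aid for igb_read_cyclecounter() in this patch: the three switch
branches combine the SYSTIML/SYSTIMH register pair in three different ways.
The sketch below restates that arithmetic on plain integers. All demo_*
names are hypothetical and no hardware register access is modelled; it is
only an illustration of the bit handling, not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Default case: a 64-bit counter split across two 32-bit registers. */
static inline uint64_t
demo_join_halves(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* 82580/i350/i354 case: only the 8 LSB of the high register are valid. */
static inline uint64_t
demo_join_40bit(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)(hi & 0xff) << 32);
}

/* i210/i211 case: low register holds nanoseconds, high holds seconds. */
static inline uint64_t
demo_join_sec_nsec(uint32_t nsec, uint32_t sec)
{
	return (uint64_t)nsec + (uint64_t)sec * DEMO_NSEC_PER_SEC;
}

/* Small self-check covering the three cases. */
static inline void
demo_selfcheck(void)
{
	assert(demo_join_halves(1, 2) == 0x200000001ULL);
	assert(demo_join_40bit(0xffffffffu, 0x1ffu) == 0xffffffffffULL);
	assert(demo_join_sec_nsec(500, 2) == 2000000500ULL);
}
```

The cast to uint64_t has to happen before the shift or the multiplication;
otherwise the arithmetic would be done in 32 bits and the high part lost.
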
 drivers/net/e1000/e1000_ethdev.h |   2 +
 drivers/net/e1000/igb_ethdev.c   | 202 +--
 2 files changed, 194 insertions(+), 10 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index a667a1a..5401277 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -33,6 +33,7 @@

 #ifndef _E1000_ETHDEV_H_
 #define _E1000_ETHDEV_H_
+#include <rte_time.h>

 /* need update link, bit flag */
 #define E1000_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0)
@@ -257,6 +258,7 @@ struct e1000_adapter {
struct e1000_vf_info*vfdata;
struct e1000_filter_info filter;
bool stopped;
+   struct rte_timecounter tc;
 };

 #define E1000_DEV_PRIVATE(adapter) \
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 2cb115c..ec2e79c 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -78,10 +78,11 @@
 #define IGB_8_BIT_MASK   UINT8_MAX

 /* Additional timesync values. */
-#define E1000_ETQF_FILTER_1588 3
-#define E1000_TIMINCA_INCVALUE 1600
-#define E1000_TIMINCA_INIT ((0x02 << E1000_TIMINCA_16NS_SHIFT) \
-   | E1000_TIMINCA_INCVALUE)
+#define E1000_CYCLECOUNTER_MASK  0x
+#define E1000_ETQF_FILTER_1588   3
+#define IGB_82576_TSYNC_SHIFT16
+#define E1000_INCPERIOD_82576(1 << E1000_TIMINCA_16NS_SHIFT)
+#define E1000_INCVALUE_82576 (16 << IGB_82576_TSYNC_SHIFT)
 #define E1000_TSAUXC_DISABLE_SYSTIME 0x8000

 static int  eth_igb_configure(struct rte_eth_dev *dev);
@@ -236,6 +237,11 @@ static int igb_timesync_read_rx_timestamp(struct rte_eth_dev *dev,
  uint32_t flags);
 static int igb_timesync_read_tx_timestamp(struct rte_eth_dev *dev,
  struct timespec *timestamp);
+static int igb_timesync_adjust_time(struct rte_eth_dev *dev, int64_t delta);
+static int igb_timesync_read_time(struct rte_eth_dev *dev,
+ struct timespec *timestamp);
+static int igb_timesync_write_time(struct rte_eth_dev *dev,
+  const struct timespec *timestamp);
 static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev,
uint16_t queue_id);
 static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev,
@@ -349,6 +355,9 @@ static const struct eth_dev_ops eth_igb_ops = {
.get_eeprom_length= eth_igb_get_eeprom_length,
.get_eeprom   = eth_igb_get_eeprom,
.set_eeprom   = eth_igb_set_eeprom,
+   .timesync_adjust_time = igb_timesync_adjust_time,
+   .timesync_read_time   = igb_timesync_read_time,
+   .timesync_write_time  = igb_timesync_write_time,
 };

 /*
@@ -4182,20 +4191,151 @@ eth_igb_set_mc_addr_list(struct rte_eth_dev *dev,
return 0;
 }

+static uint64_t
+igb_read_cyclecounter(void *arg)
+{
+   struct rte_eth_dev *dev = (struct rte_eth_dev *) arg;
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint64_t systime_cycles = 0;
+
+   switch (hw->mac.type) {
+   case e1000_i210:
+   case e1000_i211:
+   /*
+* Need to read System Time Residue Register to be able
+* to read the other two registers.
+*/
+   E1000_READ_REG(hw, E1000_SYSTIMR);
+   /* SYSTIMEL stores ns and SYSTIMEH stores seconds. */
+   systime_cycles = (uint64_t)E1000_READ_REG(hw, E1000_SYSTIML);
+   systime_cycles += (uint64_t)E1000_READ_REG(hw, E1000_SYSTIMH)
+   * NSEC_PER_SEC;
+   break;
+   case e1000_82580:
+   case e1000_i350:
+   case e1000_i354:
+   /*
+* Need to read System Time Residue Register to be able
+* to read the other two registers.
+*/
+   E1000_READ_REG(hw, E1000_SYSTIMR);
+   systime_cycles |= (uint64_t)E1000_READ_REG(hw, E1000_SYSTIML);
+   /* Only the 8 LSB are valid. */
+   systime_cycles |= (uint64_t)(E1000_READ_REG(hw, E1000_SYSTIMH)
+   & 0xff) << 32;
+   break;
+   default:
+   systime_cycles |= (uint64_t)E1000_READ_REG(hw, E1000_SYSTIML);
+   systime_cycles |= (uint64_t)E1000_READ_REG(hw, E1000_SYSTIMH)
+   << 32;
+   break;
+   }
+
+   return systime_cycles;
+}
+
+static void
+igb_start_cyclecounter(struct rte_eth_dev *dev)
+{
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+  

[dpdk-dev] [PATCH v6 3/8] ixgbe: add additional ieee1588 support functions

2015-11-12 Thread Pablo de Lara
From: Daniel Mrzyglod 

Add additional functions to support the existing IEEE1588
functionality and to enable getting, setting and adjusting
the device time.

Signed-off-by: Daniel Mrzyglod 
Signed-off-by: Pablo de Lara 
Reviewed-by: John McNamara 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 187 ---
 drivers/net/ixgbe/ixgbe_ethdev.h |   2 +
 2 files changed, 178 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0b0bbcf..91a903d 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -126,10 +126,17 @@
 #define IXGBE_HKEY_MAX_INDEX 10

 /* Additional timesync values. */
-#define IXGBE_TIMINCA_16NS_SHIFT 24
-#define IXGBE_TIMINCA_INCVALUE   1600
-#define IXGBE_TIMINCA_INIT   ((0x02 << IXGBE_TIMINCA_16NS_SHIFT) \
- | IXGBE_TIMINCA_INCVALUE)
+#define NSEC_PER_SEC 1000000000L
+#define IXGBE_INCVAL_10GB0x
+#define IXGBE_INCVAL_1GB 0x4000
+#define IXGBE_INCVAL_100 0x5000
+#define IXGBE_INCVAL_SHIFT_10GB  28
+#define IXGBE_INCVAL_SHIFT_1GB   24
+#define IXGBE_INCVAL_SHIFT_100   21
+#define IXGBE_INCVAL_SHIFT_82599 7
+#define IXGBE_INCPER_SHIFT_82599 24
+
+#define IXGBE_CYCLECOUNTER_MASK   0x

 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
@@ -325,6 +332,11 @@ static int ixgbe_timesync_read_rx_timestamp(struct rte_eth_dev *dev,
uint32_t flags);
 static int ixgbe_timesync_read_tx_timestamp(struct rte_eth_dev *dev,
struct timespec *timestamp);
+static int ixgbe_timesync_adjust_time(struct rte_eth_dev *dev, int64_t delta);
+static int ixgbe_timesync_read_time(struct rte_eth_dev *dev,
+  struct timespec *timestamp);
+static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
+  const struct timespec *timestamp);

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -480,6 +492,9 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.get_eeprom   = ixgbe_get_eeprom,
.set_eeprom   = ixgbe_set_eeprom,
.get_dcb_info = ixgbe_dev_get_dcb_info,
+   .timesync_adjust_time = ixgbe_timesync_adjust_time,
+   .timesync_read_time   = ixgbe_timesync_read_time,
+   .timesync_write_time  = ixgbe_timesync_write_time,
 };

 /*
@@ -5608,20 +5623,147 @@ ixgbe_dev_set_mc_addr_list(struct rte_eth_dev *dev,
 ixgbe_dev_addr_list_itr, TRUE);
 }

+static uint64_t
+ixgbe_read_cyclecounter(void *arg)
+{
+   struct rte_eth_dev *dev = (struct rte_eth_dev *) arg;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint64_t systime_cycles = 0;
+
+   switch (hw->mac.type) {
+   case ixgbe_mac_X550:
+   /* SYSTIMEL stores ns and SYSTIMEH stores seconds. */
+   systime_cycles = (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIML);
+   systime_cycles += (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIMH)
+   * NSEC_PER_SEC;
+   break;
+   default:
+   systime_cycles |= (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIML);
+   systime_cycles |= (uint64_t)IXGBE_READ_REG(hw, IXGBE_SYSTIMH)
+   << 32;
+   }
+
+   return systime_cycles;
+}
+
+static void
+ixgbe_start_cyclecounter(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_adapter *adapter =
+   (struct ixgbe_adapter *)dev->data->dev_private;
+   struct rte_eth_link link;
+   uint32_t incval = 0;
+   uint32_t shift = 0;
+
+   /* Get current link speed. */
+   memset(&link, 0, sizeof(link));
+   ixgbe_dev_link_update(dev, 1);
+   rte_ixgbe_dev_atomic_read_link_status(dev, &link);
+
+   switch (link.link_speed) {
+   case ETH_LINK_SPEED_100:
+   incval = IXGBE_INCVAL_100;
+   shift = IXGBE_INCVAL_SHIFT_100;
+   break;
+   case ETH_LINK_SPEED_1000:
+   incval = IXGBE_INCVAL_1GB;
+   shift = IXGBE_INCVAL_SHIFT_1GB;
+   break;
+   case ETH_LINK_SPEED_10000:
+   default:
+   incval = IXGBE_INCVAL_10GB;
+   shift = IXGBE_INCVAL_SHIFT_10GB;
+   break;
+   }
+
+   switch (hw->mac.type) {
+   case ixgbe_mac_X550:
+   /* Independent of link speed. */
+   incval = 1;
+   /* Cycles read will be interpreted as ns. */
+   shift = 0;
+   /* Fall-through */
+   case ixgbe_mac_X540:
+   IXGBE_WRITE_REG(hw, IXGBE_TIMINCA, incval);
+   break;
+  

[dpdk-dev] [PATCH v6 2/8] eal: add common time structures and functions

2015-11-12 Thread Pablo de Lara
From: Daniel Mrzyglod 

Add common functions and structures to handle time, and cycle counts
which will be used for PTP processing.

Signed-off-by: Daniel Mrzyglod 
Signed-off-by: Pablo de Lara 
Reviewed-by: John McNamara 
---
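
Reading aid for the timecounter helpers added by this patch: a raw cycle
delta is masked to the counter width, added to the carried sub-nanosecond
remainder, and then shifted down to whole nanoseconds. The sketch below
restates that logic in a self-contained form. The demo_* names are
hypothetical, and unlike rte_timecounter_init() the raw counter value is
passed in as a parameter instead of being fetched through a read() callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's struct rte_timecounter. */
struct demo_timecounter {
	uint64_t cycle_last; /* last raw counter value seen */
	uint64_t nsec;       /* accumulated whole nanoseconds */
	uint64_t nsec_mask;  /* low cc_shift bits: sub-nanosecond part */
	uint64_t nsec_frac;  /* carried sub-nanosecond remainder */
	uint64_t cc_mask;    /* width mask of the hardware counter */
	uint32_t cc_shift;   /* cycles >> cc_shift gives nanoseconds */
};

static inline void
demo_timecounter_init(struct demo_timecounter *tc, uint64_t cycle_now,
		      uint64_t start_ns)
{
	assert(tc != NULL);
	tc->cycle_last = cycle_now;
	tc->nsec = start_ns;
	tc->nsec_mask = (1ULL << tc->cc_shift) - 1;
	tc->nsec_frac = 0;
}

/* Fold a new raw counter reading into the running nanosecond count. */
static inline uint64_t
demo_timecounter_update(struct demo_timecounter *tc, uint64_t cycle_now)
{
	/* Masked subtraction keeps the delta correct across a counter wrap. */
	uint64_t delta = (cycle_now - tc->cycle_last) & tc->cc_mask;
	uint64_t ns = delta + tc->nsec_frac;

	tc->cycle_last = cycle_now;
	tc->nsec_frac = ns & tc->nsec_mask;
	tc->nsec += ns >> tc->cc_shift;
	return tc->nsec;
}

/* Worked example: a 32-bit counter, 4 cycles per ns, wrapping once. */
static inline uint64_t
demo_wraparound_example(void)
{
	struct demo_timecounter tc = { .cc_mask = 0xffffffffULL, .cc_shift = 2 };

	demo_timecounter_init(&tc, 0xfffffffeULL, 100);
	demo_timecounter_update(&tc, 0x5ULL);        /* delta 7: +1 ns, 3 carried */
	return demo_timecounter_update(&tc, 0x6ULL); /* delta 1 + 3 carried: +1 ns */
}
```

Carrying the remainder in nsec_frac means repeated short updates do not
lose the fractional cycles that a plain shift would discard.
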
 lib/librte_eal/common/Makefile   |   2 +-
 lib/librte_eal/common/include/rte_time.h | 210 +++
 2 files changed, 211 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/common/include/rte_time.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 0c43d6a..8508473 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -40,7 +40,7 @@ INC += rte_string_fns.h rte_version.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
 INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
-INC += rte_malloc.h
+INC += rte_malloc.h rte_time.h

 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_time.h b/lib/librte_eal/common/include/rte_time.h
new file mode 100644
index 000..33f3038
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_time.h
@@ -0,0 +1,210 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define NSEC_PER_SEC 1000000000L
+
+/**
+ * @internal
+ *
+ * Structure to hold the parameters of a running cycle counter to assist
+ * in converting cycles to nanoseconds.
+ */
+struct rte_timecounter {
+   /** Last cycle counter value read. */
+   uint64_t cycle_last;
+   /** Nanoseconds count. */
+   uint64_t nsec;
+   /** Bitmask separating nanosecond and sub-nanoseconds. */
+   uint64_t nsec_mask;
+   /** Sub-nanoseconds count. */
+   uint64_t nsec_frac;
+   /** Reads the current cycle counter value. */
+   uint64_t (*read)(void *arg);
+   /** Bitmask for two's complement subtraction of non-64 bit counters. */
+   uint64_t cc_mask;
+   /** Cycle to nanosecond divisor (power of two). */
+   uint32_t cc_shift;
+   /** Argument of read() function pointer. */
+   void *arg;
+};
+
+/**
+ * @internal
+ *
+ * Initialize the rte_timecounter structure.
+ */
+static inline void
+rte_timecounter_init(struct rte_timecounter *tc, uint64_t start_time)
+{
+   tc->cycle_last = tc->read(tc->arg);
+   tc->nsec = start_time;
+   tc->nsec_mask = (1ULL << tc->cc_shift) - 1;
+   tc->nsec_frac = 0;
+}
+
+/**
+ * @internal
+ *
+ * Converts cyclecounter cycles to nanoseconds.
+ */
+static inline uint64_t
+rte_cyclecounter_cycles_to_ns(uint64_t cycles, uint64_t *frac,
+ uint32_t shift, uint64_t mask)
+{
+   uint64_t ns;
+
+   /* Add fractional nanoseconds. */
+   ns = cycles + *frac;
+   *frac = ns & mask;
+
+   /* Shift to get only nanoseconds. */
+   return ns >> shift;
+}
+
+/**
+ * @internal
+ *
+ * Similar to rte_cyclecounter_cycles_to_ns(), but this is used when computing
+ * a time previous to the time stored in the cycle counter.
+ */
+static inline uint64_t
+rte_cyclecounter_cycles_to_ns_previous(uint64_t cycles, uint64_t frac,
+  uint32_t shift)
+{
+   return ((cycles - frac) >> shift);
+}
+
+/**
+ * @internal
+ *
+ * Converts cycle units into nanoseconds and adds to the previ

[dpdk-dev] [PATCH v6 1/8] ethdev: add additional ieee1588 support functions

2015-11-12 Thread Pablo de Lara
From: Daniel Mrzyglod 

Add additional functions to support the existing IEEE1588
functionality.

  * rte_eth_timesync_write_time():  set the device clock time.
  * rte_eth_timesync_read_time():   get the device clock time.
  * rte_eth_timesync_adjust_time(): adjust the device clock time.

Signed-off-by: Daniel Mrzyglod 
Signed-off-by: Pablo de Lara 
Reviewed-by: John McNamara 
---
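
Reading aid for the three new ethdev wrappers in this patch: each one
validates the port, checks that the driver filled in the callback, and then
forwards the call. The sketch below shows that dispatch pattern with a
hypothetical two-port table. The demo_* types and names are stand-ins, not
the ethdev structures, and the validity checks are written out by hand
instead of using the VALID_PORTID_OR_ERR_RET and FUNC_PTR_OR_ERR_RET macros.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 2

struct demo_dev;

/* Hypothetical, trimmed-down ops table: callbacks may be left NULL. */
struct demo_dev_ops {
	int (*timesync_adjust_time)(struct demo_dev *dev, int64_t delta);
};

struct demo_dev {
	const struct demo_dev_ops *ops;
	int64_t clock_ns;
};

/* Driver-side callback: shift the device clock by delta nanoseconds. */
static int
demo_adjust(struct demo_dev *dev, int64_t delta)
{
	dev->clock_ns += delta;
	return 0;
}

static const struct demo_dev_ops demo_ops_full = {
	.timesync_adjust_time = demo_adjust,
};
static const struct demo_dev_ops demo_ops_none = {
	.timesync_adjust_time = NULL,
};

/* Port 0 supports the call, port 1 does not. */
static struct demo_dev demo_devices[DEMO_MAX_PORTS] = {
	{ .ops = &demo_ops_full, .clock_ns = 1000 },
	{ .ops = &demo_ops_none, .clock_ns = 0 },
};

/* Library-side wrapper: validate, check for support, then forward. */
static int
demo_timesync_adjust_time(uint8_t port_id, int64_t delta)
{
	struct demo_dev *dev;

	if (port_id >= DEMO_MAX_PORTS)
		return -ENODEV;
	dev = &demo_devices[port_id];
	assert(dev->ops != NULL);
	if (dev->ops->timesync_adjust_time == NULL)
		return -ENOTSUP;
	return dev->ops->timesync_adjust_time(dev, delta);
}
```

A caller can therefore tell "no such port" (-ENODEV) apart from "driver
does not implement this" (-ENOTSUP) without knowing which PMD is behind
the port.
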
 doc/guides/rel_notes/release_2_2.rst   |  4 ++
 lib/librte_ether/rte_ethdev.c  | 36 +
 lib/librte_ether/rte_ethdev.h  | 71 ++
 lib/librte_ether/rte_ether_version.map |  3 ++
 4 files changed, 114 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 59dda59..2ef6c29 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -94,6 +94,10 @@ New Features

 * **Added port hotplug support to xenvirt.**

+* **Added API in ethdev to support IEEE1588.**
+
+  Added functions to read, write and adjust system time in the NIC.
+

 Resolved Issues
 ---
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index e0e1dca..daca6fa 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3193,6 +3193,42 @@ rte_eth_timesync_read_tx_timestamp(uint8_t port_id, struct timespec *timestamp)
 }

 int
+rte_eth_timesync_adjust_time(uint8_t port_id, int64_t delta)
+{
+   struct rte_eth_dev *dev;
+
+   VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_adjust_time, -ENOTSUP);
+   return (*dev->dev_ops->timesync_adjust_time)(dev, delta);
+}
+
+int
+rte_eth_timesync_read_time(uint8_t port_id, struct timespec *timestamp)
+{
+   struct rte_eth_dev *dev;
+
+   VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_read_time, -ENOTSUP);
+   return (*dev->dev_ops->timesync_read_time)(dev, timestamp);
+}
+
+int
+rte_eth_timesync_write_time(uint8_t port_id, const struct timespec *timestamp)
+{
+   struct rte_eth_dev *dev;
+
+   VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->timesync_write_time, -ENOTSUP);
+   return (*dev->dev_ops->timesync_write_time)(dev, timestamp);
+}
+
+int
 rte_eth_dev_get_reg_length(uint8_t port_id)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 48a540d..b7be4b8 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1206,6 +1206,17 @@ typedef int (*eth_timesync_read_tx_timestamp_t)(struct rte_eth_dev *dev,
struct timespec *timestamp);
 /**< @internal Function used to read a TX IEEE1588/802.1AS timestamp. */

+typedef int (*eth_timesync_adjust_time)(struct rte_eth_dev *dev, int64_t);
+/**< @internal Function used to adjust the device clock */
+
+typedef int (*eth_timesync_read_time)(struct rte_eth_dev *dev,
+ struct timespec *timestamp);
+/**< @internal Function used to get time from the device clock. */
+
+typedef int (*eth_timesync_write_time)(struct rte_eth_dev *dev,
+  const struct timespec *timestamp);
+/**< @internal Function used to set the device clock time. */
+
 typedef int (*eth_get_reg_length_t)(struct rte_eth_dev *dev);
 /**< @internal Retrieve device register count  */

@@ -1400,6 +1411,12 @@ struct eth_dev_ops {

/** Get DCB information */
eth_get_dcb_info get_dcb_info;
+   /** Adjust the device clock.*/
+   eth_timesync_adjust_time timesync_adjust_time;
+   /** Get the device clock time. */
+   eth_timesync_read_time timesync_read_time;
+   /** Set the device clock time. */
+   eth_timesync_write_time timesync_write_time;
 };

 /**
@@ -3755,6 +3772,60 @@ extern int rte_eth_timesync_read_tx_timestamp(uint8_t port_id,
  struct timespec *timestamp);

 /**
+ * Adjust the timesync clock on an Ethernet device.
+ *
+ * This is usually used in conjunction with other Ethdev timesync functions to
+ * synchronize the device time using the IEEE1588/802.1AS protocol.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param delta
+ *   The adjustment in nanoseconds.
+ *
+ * @return
+ *   - 0: Success.
+ *   - -ENODEV: The port ID is invalid.
+ *   - -ENOTSUP: The function is not supported by the Ethernet driver.
+ */
+extern int rte_eth_timesync_adjust_time(uint8_t port_id, int64_t delta);
+
+/**
+ * Read the time from the timesync clock on an Ethernet device.
+ *
+ * This is usually used in conjunction with other Ethdev timesync functions to
+ * synchronize the device time using the IEEE1588/802.1AS protoco

[dpdk-dev] [PATCH v6 0/8] add sample ptp slave application

2015-11-12 Thread Pablo de Lara

Add a sample application that acts as a PTP slave using the DPDK IEEE1588
functions.

Also add some additional IEEE1588 support functions to enable getting,
setting and adjusting the device time.

V5->v6:
 - Moved common functionality for cyclecounter and time conversions
   functions to lib/librte_eal/common/include/rte_time.h, based on mailing
   list comments.
 - Prefixed functions with rte_ and added Doxygen comments.
 - Refactored cyclecounter structs from previous version to make it more
   generic.
 - Fix ieee1588 fwd output in testpmd.

V4->v5:
 - rebase to the current master

V3->V4:
Doc:
 - Update documentation for ptpclient
 - fix: put information about ptp application in correct place

V2->V3:
PMD:
 - move common structures and functions for PTP protocol to
   librte_net/rte_ptp.h

V1->V2:
PMDs:
 - add support for e1000
 - add support for ixgbe
 - add support for i40e
ethdev:
 - change function names to more appropriate ones
Doc:
 - add documentation for ptpclient
sample:
 - add kernel adjustment option
 - add portmask option to provide portmask to application


Daniel Mrzyglod (5):
  ethdev: add additional ieee1588 support functions
  eal: add common time structures and functions
  ixgbe: add additional ieee1588 support functions
  doc: add a ptpclient sample guide
  example: minimal ptp client implementation

Pablo de Lara (3):
  igb: add additional ieee1588 support functions
  i40e: add additional ieee1588 support functions
  testpmd: add nanosecond output for ieee1588 fwd

 MAINTAINERS|   4 +
 app/test-pmd/ieee1588fwd.c |   8 +-
 doc/guides/rel_notes/release_2_2.rst   |   4 +
 doc/guides/sample_app_ug/img/ptpclient.svg | 524 +++
 doc/guides/sample_app_ug/index.rst |   3 +
 doc/guides/sample_app_ug/ptpclient.rst | 306 +++
 drivers/net/e1000/e1000_ethdev.h   |   2 +
 drivers/net/e1000/igb_ethdev.c | 202 +++-
 drivers/net/i40e/i40e_ethdev.c | 147 +-
 drivers/net/i40e/i40e_ethdev.h |   6 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   | 187 ++-
 drivers/net/ixgbe/ixgbe_ethdev.h   |   2 +
 examples/Makefile  |   1 +
 examples/ptpclient/Makefile|  56 +++
 examples/ptpclient/ptpclient.c | 780 +
 lib/librte_eal/common/Makefile |   2 +-
 lib/librte_eal/common/include/rte_time.h   | 210 
 lib/librte_ether/rte_ethdev.c  |  36 ++
 lib/librte_ether/rte_ethdev.h  |  71 +++
 lib/librte_ether/rte_ether_version.map |   3 +
 20 files changed, 2507 insertions(+), 47 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/img/ptpclient.svg
 create mode 100644 doc/guides/sample_app_ug/ptpclient.rst
 create mode 100644 examples/ptpclient/Makefile
 create mode 100644 examples/ptpclient/ptpclient.c
 create mode 100644 lib/librte_eal/common/include/rte_time.h

--
1.8.1.4


[dpdk-dev] [PATCH v3 2/2] vhost: Add VHOST PMD

2015-11-12 Thread Wang, Zhihong
Hi Tetsuya,

In my test I created 2 vdevs using "--vdev
'eth_vhost0,iface=/tmp/sock0,queues=1' --vdev
'eth_vhost1,iface=/tmp/sock1,queues=1'", and the QEMU messages were handled
in the wrong order.
The reason is that 2 threads are created to handle messages from the 2
sockets, but their fds are SHARED, so each thread reads from both sockets.

This can lead to incorrect behavior: in my case VHOST_USER_SET_MEM_TABLE was
sometimes handled after VRING initialization, which led to destroy_device().

The detailed log is shown below: threads 69351 & 69352 are both reading fd 25.
Thanks Yuanhan for helping with the debugging!


Thanks
Zhihong


-

>  debug: setting up new vq conn for fd: 23, tid: 69352
VHOST_CONFIG: new virtio connection is 25
VHOST_CONFIG: new device, handle is 0
>  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
>  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
>  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:26
>  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:27
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:28
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:26
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
>  debug: device_fh: 0: user_set_mem_table
VHOST_CONFIG: mapped region 0 fd:27 to 0x7ff6c000 sz:0xa off:0x0
VHOST_CONFIG: mapped region 1 fd:29 to 0x7ff68000 sz:0x4000 off:0xc
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:0 file:30
>  debug: vserver_message_handler thread id: 69352, fd: 25
VHOST_CONFIG: virtio is not ready for processing.
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
>  debug: vserver_message_handler thread id: 69351, fd: 25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:1 file:31
VHOST_CONFIG: virtio is now ready for processing.
PMD: New connection established
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM

-

> ...
> +
> +static void *vhost_driver_session(void *param __rte_unused)
> +{
> + static struct virtio_net_device_ops *vhost_ops;
> +
> + vhost_ops = rte_zmalloc(NULL, sizeof(*vhost_ops), 0);
> + if (vhost_ops == NULL)
> + rte_panic("Can't allocate memory\n");
> +
> + /* set vhost arguments */
> + vhost_ops->new_device = new_device;
> + vhost_ops->destroy_device = destroy_device;
> + if (rte_vhost_driver_pmd_callback_register(vhost_ops) < 0)
> + rte_panic("Can't register callbacks\n");
> +
> + /* start event handling */
> + rte_vhost_driver_session_start();
> +
> + rte_free(vhost_ops);
> + pthread_exit(0);
> +}
> +
> +static void vhost_driver_session_start(struct pmd_internal *internal)
> +{
> + int ret;
> +
> + ret = pthread_create(&internal->session_th,
> + NULL, vhost_driver_session, NULL);
> + if (ret)
> + rte_panic("Can't create a thread\n");
> +}
> +
> ...



[dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned with hugepage size

2015-11-12 Thread Tan, Jianfeng


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, November 12, 2015 7:19 PM
> To: Tan, Jianfeng
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned
> with hugepage size
> 
> 2015-11-12 06:04, Jianfeng Tan:
> > -   alignment = region[idx].blksz;
> > -   munmap((void *)(uintptr_t)
> > -   RTE_ALIGN_FLOOR(
> > -   region[idx].mapped_address, alignment),
> > -   RTE_ALIGN_CEIL(
> > -   region[idx].mapped_size, alignment));
> > +   munmap((void *)region[idx].mapped_address,
> > +   region[idx].mapped_size);
> 
> Sorry, it does not compile for 32-bit:
> virtio-net-user.c:84:11: error: cast to pointer from integer of different size

Oops, sorry, should use (void *)(uintptr_t). I'll resend this patch.

Jianfeng
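
For readers following the 32-bit build error discussed above: when an
address is stored in a 64-bit integer, casting it straight to a pointer
triggers "cast to pointer from integer of different size" on targets with
32-bit pointers, while casting through uintptr_t first makes the narrowing
explicit. A minimal sketch, assuming the address field is a uint64_t; the
demo_* names are hypothetical and this is not the vhost code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region record: the mapping address is kept as an integer. */
struct demo_region {
	uint64_t mapped_address;
	uint64_t mapped_size;
};

/* Convert the stored address back to a pointer via uintptr_t. */
static inline void *
demo_region_ptr(const struct demo_region *r)
{
	assert(r != NULL);
	return (void *)(uintptr_t)r->mapped_address;
}

/* Round trip: store a pointer as an integer, then recover it. */
static inline int
demo_roundtrip_ok(void)
{
	static char buf[16];
	struct demo_region r = {
		.mapped_address = (uint64_t)(uintptr_t)buf,
		.mapped_size = sizeof(buf),
	};

	return demo_region_ptr(&r) == (void *)buf;
}
```

The double cast compiles without that warning on both 32-bit and 64-bit
targets, because uintptr_t always has the width of a pointer.
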


[dpdk-dev] [PATCH] maintainers: Add maintainers for enic PMD

2015-11-12 Thread Thomas Monjalon
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
>  Cisco enic
> +M: John Daley 
> +M: Sujith Sankar 
>  F: drivers/net/enic/

Welcome :)

Now as we officially have some maintainers for enic,
please could you consider writing doc/guides/nics/enic.rst?
Thanks


[dpdk-dev] [PATCH] maintainers: claim to be reviewer of virtio/vhost component

2015-11-12 Thread Thomas Monjalon
2015-11-12 12:10, Yuanhan Liu:
> Firstly, Changchun's email address has been invalid for a while.
> 
> Secondly, I'd like to take the responsibility to review patches
> of virtio/vhost component.
[...]
>  RedHat virtio
>  M: Huawei Xie 
> -M: Changchun Ouyang 
> +M: Yuanhan Liu 
>  F: drivers/net/virtio/
>  F: doc/guides/nics/virtio.rst
>  F: lib/librte_vhost/

Again, thanks Yuanhan for the excellent contributions
and welcome as a new maintainer!

Changchun, you are still welcome with a new email address
if you have some time. 




[dpdk-dev] [PATCH] vhost: reset device properly

2015-11-12 Thread Thomas Monjalon
> > Currently, we reset all fields of a device to zero when a reset
> > happens, which is wrong: some fields, such as device_fh, ifname,
> > and virt_qp_nb, should stay the same and be kept after a reset
> > until the device is removed. This is what the new helper function
> > reset_device() is for.
> >
> > Also use rte_zmalloc() instead of rte_malloc(), so that we can
> > avoid init_device(), which so far basically only does a zero reset.
> > Hence, init_device() is dropped in this patch.
> >
> > This patch also removes a hack that used the offset of a specific
> > field (currently virtqueue) inside the `virtio_net' structure
> > to do the reset, which could break easily if someone changed the
> > field order without caution.
> >
> > Cc: Tetsuya Mukawa 
> > Cc: Xie Huawei 
> > Signed-off-by: Yuanhan Liu 
> >
> 
> I had a patch that just saved the ifname but this is much better.
> 
> Acked-by: Rich Lane 

Applied, thanks


[dpdk-dev] [PATCH] vhost: make destroy callback on VHOST_USER_RESET_OWNER

2015-11-12 Thread Thomas Monjalon
2015-11-10 10:25, Yuanhan Liu:
> On Mon, Nov 09, 2015 at 06:15:13PM -0800, Rich Lane wrote:
> > QEMU sends this message first when shutting down. There was previously no way
> > for the dataplane to know that the virtio_net instance had become unusable and
> > it would segfault when trying to do RX/TX.
> > 
> > Signed-off-by: Rich Lane 
> 
> Thanks. Even though I have the same patch in my patch queue (I have some
> other issues to fix), you got my ack.
> 
> Acked-by: Yuanhan Liu 

Applied, thanks


[dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned with hugepage size

2015-11-12 Thread Thomas Monjalon
2015-11-12 06:04, Jianfeng Tan:
> - alignment = region[idx].blksz;
> - munmap((void *)(uintptr_t)
> - RTE_ALIGN_FLOOR(
> - region[idx].mapped_address, alignment),
> - RTE_ALIGN_CEIL(
> - region[idx].mapped_size, alignment));
> + munmap((void *)region[idx].mapped_address,
> + region[idx].mapped_size);

Sorry, it does not compile for 32-bit:
virtio-net-user.c:84:11: error: cast to pointer from integer of different size



[dpdk-dev] [PATCH] i40e: fix the issue of trying more VSIs for VMDq than available

2015-11-12 Thread Thomas Monjalon
2015-11-12 15:09, Helin Zhang:
> It fixes the issue of trying to allocate more VSIs for VMDq than
> the hardware has remaining. It adds a check of the remaining hardware
> VSIs before allocating VSIs for VMDq.
> 
> Fixes: c80707a0fd9c ("i40e: fix VMDq pool limit")
> 
> Signed-off-by: Helin Zhang 

Applied, thanks


[dpdk-dev] [PATCH] vhost: reset device properly

2015-11-12 Thread Yuanhan Liu
Currently, we reset all fields of a device to zero when a reset
happens, which is wrong: some fields, such as device_fh, ifname,
and virt_qp_nb, should stay the same and be kept after a reset
until the device is removed. This is what the new helper function
reset_device() is for.

Also use rte_zmalloc() instead of rte_malloc(), so that we can
avoid init_device(), which so far basically only does a zero reset.
Hence, init_device() is dropped in this patch.

This patch also removes a hack that used the offset of a specific
field (currently virtqueue) inside the `virtio_net' structure
to do the reset, which could break easily if someone changed the
field order without caution.

Cc: Tetsuya Mukawa 
Cc: Xie Huawei 
Signed-off-by: Yuanhan Liu 

---

This patch is based on:

http://dpdk.org/dev/patchwork/patch/8818/
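
Illustration of the approach described in the commit message: rather than
zeroing a prefix of the structure with memset() and offsetof(), which
silently depends on field order, the reset clears named fields only. The
sketch below uses a hypothetical, trimmed-down demo_dev record; its field
names follow the ones mentioned in this patch, but the layout and sizes are
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down device record. */
struct demo_dev {
	uint64_t features;
	uint64_t protocol_features;
	uint32_t flags;
	uint64_t device_fh;  /* identity: must survive a reset */
	char ifname[16];     /* identity: must survive a reset */
	uint32_t virt_qp_nb; /* identity: must survive a reset */
};

/* Clear only the negotiated state; leave the identity fields alone. */
static void
demo_reset_device(struct demo_dev *dev)
{
	assert(dev != NULL);
	dev->features = 0;
	dev->protocol_features = 0;
	dev->flags = 0;
}

/* Returns 1 if the reset cleared the state and kept the identity. */
static int
demo_reset_keeps_identity(void)
{
	struct demo_dev dev = {
		.features = 0xff,
		.protocol_features = 1,
		.flags = 1,
		.device_fh = 42,
		.virt_qp_nb = 2,
	};

	strcpy(dev.ifname, "/tmp/sock0");
	demo_reset_device(&dev);

	return dev.features == 0 && dev.protocol_features == 0 &&
	       dev.flags == 0 && dev.device_fh == 42 &&
	       dev.virt_qp_nb == 2 &&
	       strcmp(dev.ifname, "/tmp/sock0") == 0;
}
```

With per-field assignments, reordering the structure members cannot change
which fields a reset touches.
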
---
 lib/librte_vhost/virtio-net.c | 27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 39a6a5e..cc917da 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -204,6 +204,7 @@ cleanup_device(struct virtio_net *dev)
munmap((void *)(uintptr_t)dev->mem->mapped_address,
(size_t)dev->mem->mapped_size);
free(dev->mem);
+   dev->mem = NULL;
}

for (i = 0; i < dev->virt_qp_nb; i++) {
@@ -306,20 +307,18 @@ alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
 }

 /*
- *  Initialise all variables in device structure.
+ * Reset some variables in device structure, while keeping few
+ * others untouched, such as device_fh, ifname, virt_qp_nb: they
+ * should be same unless the device is removed.
  */
 static void
-init_device(struct virtio_net *dev)
+reset_device(struct virtio_net *dev)
 {
-   int vq_offset;
uint32_t i;

-   /*
-* Virtqueues have already been malloced so
-* we don't want to set them to NULL.
-*/
-   vq_offset = offsetof(struct virtio_net, virtqueue);
-   memset(dev, 0, vq_offset);
+   dev->features = 0;
+   dev->protocol_features = 0;
+   dev->flags = 0;

for (i = 0; i < dev->virt_qp_nb; i++)
init_vring_queue_pair(dev, i);
@@ -336,7 +335,7 @@ new_device(struct vhost_device_ctx ctx)
struct virtio_net_config_ll *new_ll_dev;

/* Setup device and virtqueues. */
-   new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
+   new_ll_dev = rte_zmalloc(NULL, sizeof(struct virtio_net_config_ll), 0);
if (new_ll_dev == NULL) {
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to allocate memory for dev.\n",
@@ -344,9 +343,6 @@ new_device(struct vhost_device_ctx ctx)
return -1;
}

-   /* Initialise device and virtqueues. */
-   init_device(&new_ll_dev->dev);
-
new_ll_dev->next = NULL;

/* Add entry to device configuration linked list. */
@@ -430,7 +426,6 @@ static int
 reset_owner(struct vhost_device_ctx ctx)
 {
struct virtio_net *dev;
-   uint64_t device_fh;

dev = get_device(ctx);
if (dev == NULL)
@@ -439,10 +434,8 @@ reset_owner(struct vhost_device_ctx ctx)
if (dev->flags & VIRTIO_DEV_RUNNING)
notify_ops->destroy_device(dev);

-   device_fh = dev->device_fh;
cleanup_device(dev);
-   init_device(dev);
-   dev->device_fh = device_fh;
+   reset_device(dev);
return 0;
 }

-- 
1.9.0
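
Note: the difference between the old offsetof()/memset() reset and the new
field-by-field reset can be sketched as below. This is only a minimal
illustration under stated assumptions: struct demo_dev and
demo_reset_device() are hypothetical, simplified stand-ins and are not the
real struct virtio_net or reset_device() from librte_vhost.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct virtio_net. */
struct demo_dev {
	uint64_t features;          /* negotiated state, cleared on reset */
	uint64_t protocol_features; /* negotiated state, cleared on reset */
	uint32_t flags;             /* negotiated state, cleared on reset */
	uint64_t device_fh;         /* identity, must survive a reset */
	char     ifname[16];        /* identity, must survive a reset */
	uint32_t virt_qp_nb;        /* identity, must survive a reset */
};

/*
 * Clear only the negotiated state, one named field at a time.
 * Unlike memset(dev, 0, offsetof(...)), this does not depend on the
 * order of the fields inside the structure.
 */
void
demo_reset_device(struct demo_dev *dev)
{
	dev->features = 0;
	dev->protocol_features = 0;
	dev->flags = 0;
}
```

Because every cleared field is named, reordering the members of the
structure cannot silently change what a reset wipes.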



[dpdk-dev] [PATCH] maintainers: claim to be reviewer of virtio/vhost component

2015-11-12 Thread Yuanhan Liu
Firstly, Changchun's email address has been invalid for a while.

Secondly, I'd like to take the responsibility to review patches
of virtio/vhost component.

Cc: Huawei Xie 
Cc: Thomas Monjalon 
Signed-off-by: Yuanhan Liu 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index c8be5d2..b05724a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -262,7 +262,7 @@ F: doc/guides/nics/mlx5.rst

 RedHat virtio
 M: Huawei Xie 
-M: Changchun Ouyang 
+M: Yuanhan Liu 
 F: drivers/net/virtio/
 F: doc/guides/nics/virtio.rst
 F: lib/librte_vhost/
-- 
1.9.0



[dpdk-dev] [PATCH] doc: update release notes

2015-11-12 Thread Helin Zhang
Updated release notes about adding X722 support.

Signed-off-by: Helin Zhang 
---
 doc/guides/rel_notes/release_2_2.rst | 4 
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 5636aad..5811c2f 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -59,6 +59,10 @@ New Features

 * **Added flow director support in i40e VF.**

+* **Added i40e support of early X722 series.**
+
+  * Add early X722 support for evaluation only, as the hardware is in A0.
+
 * **Added fm10k vector RX/TX.**

 * **Added fm10k TSO support for both PF and VF.**
-- 
1.9.3



[dpdk-dev] [PATCH] examples/l3fwd: fix eth-dest commandline strncmp size

2015-11-12 Thread Chilikin, Andrey


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Monday, November 2, 2015 5:46 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix eth-dest commandline
> strncmp size
> 
> Fix minor, and non critical, copy and paste error in strncmp() of eth-dest
> commandline argument.
> 
> Fixes: bd785f6f6791 ("examples/l3fwd: make destination mac address
> configurable")
> 
> Signed-off-by: John McNamara 
Acked-by: Andrey Chilikin 


[dpdk-dev] [PATCH v2] mem: calculate space left in a hugetlbfs

2015-11-12 Thread Jianfeng Tan
This patch enables calculating the space left in a hugetlbfs mount.
There are three sources for this information: 1. sysfs;
2. the size option specified at mount time; 3. statfs().
We should use the smallest of these three sizes.

Signed-off-by: Jianfeng Tan 
---
Changes in v2:
 - reword title
 - fix compiler error of v1

 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 85 -
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 18858e2..8305a58 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include 
 #include 
@@ -189,6 +191,70 @@ get_hugepage_dir(uint64_t hugepage_sz)
return retval;
 }

+/* Caller to make sure this mnt_dir exist
+ */
+static uint64_t
+get_hugetlbfs_mount_size(const char *mnt_dir)
+{
+   char *start, *end, *opt_size;
+   struct mntent *ent;
+   uint64_t size;
+   FILE *f;
+   int len;
+
+   f = setmntent("/proc/mounts", "r");
+   if (f == NULL) {
+   RTE_LOG(ERR, EAL, "setmntent() error: %s\n",
+   strerror(errno));
+   return 0;
+   }
+   while (NULL != (ent = getmntent(f))) {
+   if (!strcmp(ent->mnt_dir, mnt_dir))
+   break;
+   }
+
+   start = hasmntopt(ent, "size");
+   if (start == NULL) {
+   RTE_LOG(DEBUG, EAL, "option size not specified for %s\n",
+   mnt_dir);
+   size = 0;
+   goto end;
+   }
+   start += strlen("size=");
+   end = strstr(start, ",");
+   if (end != NULL)
+   len = end - start;
+   else
+   len = strlen(start);
+   opt_size = strndup(start, len);
+   size = rte_str_to_size(opt_size);
+   free(opt_size);
+
+end:
+   endmntent(f);
+   return size;
+}
+
+/* Caller to make sure this mount has option size
+ * so that statfs is not zero.
+ */
+static uint64_t
+get_hugetlbfs_free_size(const char *mnt_dir)
+{
+   int r;
+   struct statfs stats;
+
+   r = statfs(mnt_dir, &stats);
+   if (r != 0) {
+   RTE_LOG(ERR, EAL, "statfs() error: %s\n",
+   strerror(errno));
+   return 0;
+   }
+
+   return stats.f_bfree * stats.f_bsize;
+}
+
+
 /*
  * Clear the hugepage directory of whatever hugepage files
  * there are. Checks if the file is locked (i.e.
@@ -329,9 +395,26 @@ eal_hugepage_info_init(void)
if (clear_hugedir(hpi->hugedir) == -1)
break;

+   /* there are three sources of how much space left in a
+* hugetlbfs dir.
+*/
+   uint64_t sz_left, sz_sysfs, sz_option, sz_statfs;
+
+   sz_sysfs = get_num_hugepages(dirent->d_name) *
+   hpi->hugepage_sz;
+   sz_left = sz_sysfs;
+   sz_option = get_hugetlbfs_mount_size(hpi->hugedir);
+   if (sz_option) {
+   sz_statfs = get_hugetlbfs_free_size(hpi->hugedir);
+   sz_left = RTE_MIN(sz_sysfs, sz_statfs);
+   RTE_LOG(INFO, EAL, "sz_sysfs: %"PRIu64", sz_option: "
+   "%"PRIu64", sz_statfs: %"PRIu64"\n",
+   sz_sysfs, sz_option, sz_statfs);
+   }
+
/* for now, put all pages into socket 0,
 * later they will be sorted */
-   hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+   hpi->num_pages[0] = sz_left / hpi->hugepage_sz;

 #ifndef RTE_ARCH_64
/* for 32-bit systems, limit number of hugepages to
-- 
2.1.4
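
Note: the "smallest of the three sizes" rule from the commit message can be
sketched as below. This is a hedged illustration, not the code of the patch:
demo_mount_size_opt() and demo_space_left() are hypothetical helpers, the
option string is assumed to be a comma-separated list as found in
/proc/mounts, and only the K/M/G suffixes are handled. Note that the patch
itself takes the minimum of the sysfs and statfs values once a size option
is present, while this sketch follows the commit message and compares all
three.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the value of the "size=" mount option in
 * bytes, or 0 when the option is absent. Only whole options are matched,
 * so "pagesize=2M" is not mistaken for "size=".
 */
uint64_t
demo_mount_size_opt(const char *opts)
{
	const size_t klen = strlen("size=");
	const char *p = opts;

	while (p != NULL && *p != '\0') {
		if (strncmp(p, "size=", klen) == 0) {
			char *end;
			uint64_t v = strtoull(p + klen, &end, 10);

			switch (*end) {
			case 'G': case 'g':
				v <<= 10; /* fall through */
			case 'M': case 'm':
				v <<= 10; /* fall through */
			case 'K': case 'k':
				v <<= 10;
				break;
			default:
				break;
			}
			return v;
		}
		/* Jump to the start of the next comma-separated option. */
		p = strchr(p, ',');
		if (p != NULL)
			p++;
	}
	return 0;
}

/*
 * Hypothetical helper: with no size option, trust sysfs alone;
 * otherwise use the smallest of the three values.
 */
uint64_t
demo_space_left(uint64_t sz_sysfs, uint64_t sz_option, uint64_t sz_statfs)
{
	uint64_t left = sz_sysfs;

	if (sz_option == 0)
		return left;
	if (sz_option < left)
		left = sz_option;
	if (sz_statfs < left)
		left = sz_statfs;
	return left;
}
```

Dividing the result by the hugepage size gives the number of pages that can
still be used, which is what the patch stores in hpi->num_pages[0].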



[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow

2015-11-12 Thread Chilikin, Andrey


> -Original Message-
> From: Wu, Jingjing
> Sent: Tuesday, November 10, 2015 3:11 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Chilikin, Andrey
> Subject: [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow
> 
> Signed-off-by: Jingjing Wu 
Acked-by: Andrey Chilikin 


[dpdk-dev] [PATCH v4 3/8] virtio/lib:add vhost TX checksum support capabilities

2015-11-12 Thread Yuanhan Liu
On Wed, Nov 11, 2015 at 09:31:14AM -0800, Stephen Hemminger wrote:
> On Wed, 11 Nov 2015 16:26:57 +0800
> Yuanhan Liu  wrote:
> 
> > On Wed, Nov 11, 2015 at 02:40:41PM +0800, Jijiang Liu wrote:
> > > Add vhost TX offload(CSUM and TSO) support capabilities.
> > 
> > Claiming first that we support something, and then actually implementing
> > it in a later patch is wrong, as at this stage we actually do not support
> > it; hence, the functionality is broken.
> > 
> > --yliu
> 
> Actually, in this case it is okay to claim that the driver "might" use the
> offload capability but never do it.

But it will not work once it does use it, right?

--yliu

> But agree in general better to keep both together.


[dpdk-dev] [PATCH] mem: fix how to calculate space left in a hugetlbfs

2015-11-12 Thread Stephen Hemminger
On Thu, 12 Nov 2015 08:17:57 +0800
Jianfeng Tan  wrote:

> This patch enables calculating the space left in a hugetlbfs mount.
> There are three sources for this information: 1. sysfs;
> 2. the size option specified at mount time; 3. statfs().
> We should use the smallest of these three sizes.
> 
> Signed-off-by: Jianfeng Tan 

Thanks, the hugetlbfs usage up until now has been rather brute force.
I wonder if long term it might be better to defer all this stuff
to another library like libhugetlbfs.
 https://github.com/libhugetlbfs/libhugetlbfs

Especially when dealing with other architectures it might provide
some nice abstraction.


[dpdk-dev] [PATCH] mem: fix how to calculate space left in a hugetlbfs

2015-11-12 Thread Jianfeng Tan
This patch enables calculating the space left in a hugetlbfs mount.
There are three sources for this information: 1. sysfs;
2. the size option specified at mount time; 3. statfs().
We should use the smallest of these three sizes.

Signed-off-by: Jianfeng Tan 
---
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 85 -
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 18858e2..6db8c33 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -44,6 +44,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include 
 #include 
@@ -189,6 +191,70 @@ get_hugepage_dir(uint64_t hugepage_sz)
return retval;
 }

+/* Caller to make sure this mnt_dir exist
+ */
+static uint64_t
+get_hugetlbfs_mount_size(const char *mnt_dir)
+{
+   char *start, *end, *opt_size;
+   struct mntent *ent;
+   uint64_t size;
+   FILE *f;
+   int len;
+
+   f = setmntent("/proc/mounts", "r");
+   if (f == NULL) {
+   RTE_LOG(ERR, EAL, "setmntent() error: %s\n",
+   strerror(errno));
+   return 0;
+   }
+   while (NULL != (ent = getmntent(f))) {
+   if (!strcmp(ent->mnt_dir, mnt_dir))
+   break;
+   }
+
+   start = hasmntopt(ent, "size");
+   if (start == NULL) {
+   RTE_LOG(DEBUG, EAL, "option size not specified for %s\n",
+   mnt_dir);
+   size = 0;
+   goto end;
+   }
+   start += strlen("size=");
+   end = strstr(start, ",");
+   if (end != NULL)
+   len = end - start;
+   else
+   len = strlen(start);
+   opt_size = strndup(start, len);
+   size = rte_str_to_size(opt_size);
+   free(opt_size);
+
+end:
+   endmntent(f);
+   return size;
+}
+
+/* Caller to make sure this mount has option size
+ * so that statfs is not zero.
+ */
+static uint64_t
+get_hugetlbfs_free_size(const char *mnt_dir)
+{
+   int r;
+   struct statfs stats;
+
+   r = statfs(mnt_dir, &stats);
+   if (r != 0) {
+   RTE_LOG(ERR, EAL, "statfs() error: %s\n",
+   strerror(errno));
+   return 0;
+   }
+
+   return stats.f_bfree * stats.f_bsize;
+}
+
+
 /*
  * Clear the hugepage directory of whatever hugepage files
  * there are. Checks if the file is locked (i.e.
@@ -329,9 +395,26 @@ eal_hugepage_info_init(void)
if (clear_hugedir(hpi->hugedir) == -1)
break;

+   /* there are three sources of how much space left in a
+* hugetlbfs dir.
+*/
+   uint64_t sz_left, sz_sysfs, sz_option, sz_statfs;
+
+   sz_sysfs = get_num_hugepages(dirent->d_name) *
+   hpi->hugepage_sz;
+   sz_left = sz_sysfs;
+   sz_option = get_hugetlbfs_mount_size(hpi->hugedir);
+   if (sz_option) {
+   sz_statfs = get_hugetlbfs_free_size(hpi->hugedir);
+   sz_left = RTE_MIN(sz_sysfs, sz_statfs);
+   RTE_LOG(INFO, "sz_sysfs: %"PRIu64", sz_option: "
+   "%"PRIu64", sz_statfs: %"PRIu64"\n",
+   sz_sysfs, sz_option, sz_statfs);
+   }
+
/* for now, put all pages into socket 0,
 * later they will be sorted */
-   hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+   hpi->num_pages[0] = sz_left / hpi->hugepage_sz;

 #ifndef RTE_ARCH_64
/* for 32-bit systems, limit number of hugepages to
-- 
2.1.4
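
Note: the statfs() source used by get_hugetlbfs_free_size() can be sketched
in isolation as below. This is a minimal, Linux-only illustration;
demo_fs_free_bytes() is a hypothetical name. As far as I understand, on a
hugetlbfs mount f_bsize reports the huge page size and f_bfree is only
meaningful when the mount was given a size limit, which is why the patch
calls it only when the size option is present.

```c
#include <stdint.h>
#include <sys/vfs.h>	/* statfs(), Linux-specific */

/*
 * Hypothetical helper: free space of the filesystem that contains dir,
 * in bytes. Returns 0 when statfs() fails, mirroring the patch.
 */
uint64_t
demo_fs_free_bytes(const char *dir)
{
	struct statfs st;

	if (statfs(dir, &st) != 0)
		return 0;

	/* Widen both operands before multiplying to avoid truncation. */
	return (uint64_t)st.f_bfree * (uint64_t)st.f_bsize;
}
```

A return value of 0 is ambiguous here (error or no free space), so a caller
that needs to tell the two apart would have to check errno itself.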



[dpdk-dev] [PATCH] mem: fix how to calculate space left in a hugetlbfs

2015-11-12 Thread De Lara Guarch, Pablo
Hi Jianfeng,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jianfeng Tan
> Sent: Thursday, November 12, 2015 12:18 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] mem: fix how to calculate space left in a
> hugetlbfs
> 
> This patch enables calculating the space left in a hugetlbfs mount.
> There are three sources for this information: 1. sysfs;
> 2. the size option specified at mount time; 3. statfs().
> We should use the smallest of these three sizes.
> 
> Signed-off-by: Jianfeng Tan 

You should reword the title of the patch, as this does not look like a fix.


[dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned with hugepage size

2015-11-12 Thread Xie, Huawei
On 11/12/2015 1:04 PM, Tan, Jianfeng wrote:
> This patch fixes a bug seen on older Linux kernels: mmap()
> fails when length is not aligned to the hugepage size. On
> older longterm kernels, such as 2.6.32 and 3.2.72, mmap()
> without the MAP_ANONYMOUS flag must be called with a length
> argument aligned to the hugepage size, or it fails with EINVAL.
> This bug was fixed in the Linux kernel by commit:
> dab2d3dc45ae7343216635d981d43637e1cb7d45
> To avoid the failure, make sure the caller keeps length aligned.
>
> Signed-off-by: Jianfeng Tan 
Acked-by: Huawei Xie 

Next time please add --in-reply-to with original message id.
> ---
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 36 ---
>  1 file changed, 21 insertions(+), 15 deletions(-)
>


[dpdk-dev] [PATCH v2] vhost: fix mmap failure as len not aligned with hugepage size

2015-11-12 Thread Jianfeng Tan
This patch fixes a bug seen on older Linux kernels: mmap()
fails when length is not aligned to the hugepage size. On
older longterm kernels, such as 2.6.32 and 3.2.72, mmap()
without the MAP_ANONYMOUS flag must be called with a length
argument aligned to the hugepage size, or it fails with EINVAL.
This bug was fixed in the Linux kernel by commit:
dab2d3dc45ae7343216635d981d43637e1cb7d45
To avoid the failure, make sure the caller keeps length aligned.

Signed-off-by: Jianfeng Tan 
---
 lib/librte_vhost/vhost_user/virtio-net-user.c | 36 ---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index d07452a..7ce48d0 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -74,7 +74,6 @@ free_mem_region(struct virtio_net *dev)
 {
struct orig_region_map *region;
unsigned int idx;
-   uint64_t alignment;

if (!dev || !dev->mem)
return;
@@ -82,12 +81,8 @@ free_mem_region(struct virtio_net *dev)
region = orig_region(dev->mem, dev->mem->nregions);
for (idx = 0; idx < dev->mem->nregions; idx++) {
if (region[idx].mapped_address) {
-   alignment = region[idx].blksz;
-   munmap((void *)(uintptr_t)
-   RTE_ALIGN_FLOOR(
-   region[idx].mapped_address, alignment),
-   RTE_ALIGN_CEIL(
-   region[idx].mapped_size, alignment));
+   munmap((void *)region[idx].mapped_address,
+   region[idx].mapped_size);
close(region[idx].fd);
}
}
@@ -147,6 +142,18 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
/* This is ugly */
mapped_size = memory.regions[idx].memory_size +
memory.regions[idx].mmap_offset;
+
+   /* mmap() without flag of MAP_ANONYMOUS, should be called
+* with length argument aligned with hugepagesz at older
+* longterm version Linux, like 2.6.32 and 3.2.72, or
+* mmap() will fail with EINVAL.
+*
+* to avoid failure, make sure in caller to keep length
+* aligned.
+*/
+   alignment = get_blk_size(pmsg->fds[idx]);
+   mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
+
mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
mapped_size,
PROT_READ | PROT_WRITE, MAP_SHARED,
@@ -154,9 +161,11 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
0);

RTE_LOG(INFO, VHOST_CONFIG,
-   "mapped region %d fd:%d to %p sz:0x%"PRIx64" off:0x%"PRIx64"\n",
+   "mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
+   "off:0x%"PRIx64" align:0x%"PRIx64"\n",
idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
-   mapped_size, memory.regions[idx].mmap_offset);
+   mapped_size, memory.regions[idx].mmap_offset,
+   alignment);

if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
RTE_LOG(ERR, VHOST_CONFIG,
@@ -166,7 +175,7 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

pregion_orig[idx].mapped_address = mapped_address;
pregion_orig[idx].mapped_size = mapped_size;
-   pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]);
+   pregion_orig[idx].blksz = alignment;
pregion_orig[idx].fd = pmsg->fds[idx];

mapped_address +=  memory.regions[idx].mmap_offset;
@@ -193,11 +202,8 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)

 err_mmap:
while (idx--) {
-   alignment = pregion_orig[idx].blksz;
-   munmap((void *)(uintptr_t)RTE_ALIGN_FLOOR(
-   pregion_orig[idx].mapped_address, alignment),
-   RTE_ALIGN_CEIL(pregion_orig[idx].mapped_size,
-   alignment));
+   munmap((void *)pregion_orig[idx].mapped_address,
+   pregion_orig[idx].mapped_size);
close(pregion_orig[idx].fd);
}
free(dev->mem);
-- 
2.1.4
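
Note: the length alignment done before mmap() can be sketched as below. This
is a minimal illustration, assuming the alignment is a non-zero power of two
(true for hugepage sizes); demo_align_ceil() and demo_mmap_len() are
hypothetical names standing in for RTE_ALIGN_CEIL() and the mapped_size
computation in user_set_mem_table().

```c
#include <stdint.h>

/*
 * Round v up to the next multiple of align.
 * Assumption: align is a non-zero power of two.
 */
uint64_t
demo_align_ceil(uint64_t v, uint64_t align)
{
	return (v + align - 1) & ~(align - 1);
}

/*
 * Length to pass to mmap() for one region: region size plus the offset
 * into the file, rounded up to the hugepage size of the backing file.
 */
uint64_t
demo_mmap_len(uint64_t memory_size, uint64_t mmap_offset, uint64_t hugepage_sz)
{
	return demo_align_ceil(memory_size + mmap_offset, hugepage_sz);
}
```

For example, with 2 MB hugepages a region of 0x100000 bytes at offset
0xc0000 needs 0x1c0000 bytes, which is rounded up to 0x200000. Since the
stored mapped_size is already aligned, munmap() can reuse it unchanged.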



[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_tunnel_filter_conf

2015-11-12 Thread Zhang, Helin


> -Original Message-
> From: Wu, Jingjing
> Sent: Tuesday, November 10, 2015 11:50 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Lu, Wenzhuo
> Subject: [PATCH] doc: announce ABI change for struct 
> rte_eth_tunnel_filter_conf
> 
> Signed-off-by: Jingjing Wu 
Acked-by: Helin Zhang 


[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow

2015-11-12 Thread Zhang, Helin


> -Original Message-
> From: Wu, Jingjing
> Sent: Tuesday, November 10, 2015 11:11 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Chilikin, Andrey
> Subject: [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow
> 
> Signed-off-by: Jingjing Wu 
Acked-by: Helin Zhang 


[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_tunnel_filter_conf

2015-11-12 Thread Lu, Wenzhuo
Hi,

> -Original Message-
> From: Wu, Jingjing
> Sent: Tuesday, November 10, 2015 11:50 AM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Zhang, Helin; Lu, Wenzhuo
> Subject: [PATCH] doc: announce ABI change for struct
> rte_eth_tunnel_filter_conf
> 
> Signed-off-by: Jingjing Wu  
Acked-by: Wenzhuo Lu 


[dpdk-dev] [PATCH] doc: announce ABI change for struct rte_eth_fdir_flow

2015-11-12 Thread Lu, Wenzhuo
Hi,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jingjing Wu
> Sent: Tuesday, November 10, 2015 11:11 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for struct
> rte_eth_fdir_flow
> 
> Signed-off-by: Jingjing Wu 
Acked-by: Wenzhuo Lu 



[dpdk-dev] [PATCH] vhost: fix mmap failure as len not aligned with hugepage size

2015-11-12 Thread Xie, Huawei
On 11/12/2015 10:35 AM, Tan, Jianfeng wrote:
>
>> -Original Message-
>> From: Xie, Huawei
>> Sent: Wednesday, November 11, 2015 11:57 AM
>> To: Tan, Jianfeng; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] vhost: fix mmap failure as len not aligned 
>> with
>> hugepage size
>>
>> On 10/30/2015 2:52 PM, Jianfeng Tan wrote:
>>> This patch fixes a bug under lower version linux kernel, mmap() fails
>>> when length is not aligned with hugepage size.
>> Since which Linux version did hugetlbfs change the size alignment
>> requirement?
> This link shows this bug was fixed in the Linux kernel by commit
> dab2d3dc45ae7343216635d981d43637e1cb7d45.
> From my check, that fix was applied to longterm version 3.4.110+,
> so distributions using 2.6.32 and 3.2.72 need this patch to make vhost
> work well.
> https://bugzilla.kernel.org/show_bug.cgi?id=56881
OK, please add this to the commit message, remove the unnecessary RTE_ALIGN
in free_mem_region, and add a comment to the code, because our fix is a
workaround for a kernel hugetlbfs implementation issue.
>
>>> Signed-off-by: Jianfeng Tan 
>>> ---
>>>  lib/librte_vhost/vhost_user/virtio-net-user.c | 12 +---
>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c
>>> b/lib/librte_vhost/vhost_user/virtio-net-user.c
>>> index a998ad8..641561c 100644
>>> --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
>>> +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
>>> @@ -147,6 +147,10 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
>>> /* This is ugly */
>>> mapped_size = memory.regions[idx].memory_size +
>>> memory.regions[idx].mmap_offset;
>>> +
>>> +   alignment = get_blk_size(pmsg->fds[idx]);
>>> +   mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
>> Probably we could remove the alignment of mapped size in free_mem_region as
>> well.
> Yes, after aligning at mmap() time, there is no need to align again at
> munmap(). But the extra alignment affects nothing and incurs no
> performance cost, so I'm inclined to leave it unchanged.
>
>> RTE_ALIGN_CEIL(region[idx].mapped_size, alignment)
>> If we are not sure, leave it as it is.
>>> +
>>> mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
>>> mapped_size,
>>> PROT_READ | PROT_WRITE, MAP_SHARED,
>>> @@ -154,9 +158,11 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
>>> 0);
>>>
>>> RTE_LOG(INFO, VHOST_CONFIG,
>>> -   "mapped region %d fd:%d to %p sz:0x%"PRIx64" off:0x%"PRIx64"\n",
>>> +   "mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
>>> +   "off:0x%"PRIx64" align:0x%"PRIx64"\n",
>>> idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
>>> -   mapped_size, memory.regions[idx].mmap_offset);
>>> +   mapped_size, memory.regions[idx].mmap_offset,
>>> +   alignment);
>>>
>>> if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
>>> RTE_LOG(ERR, VHOST_CONFIG,
>>> @@ -166,7 +172,7 @@ user_set_mem_table(struct vhost_device_ctx ctx,
>>> struct VhostUserMsg *pmsg)
>>>
>>> pregion_orig[idx].mapped_address = mapped_address;
>>> pregion_orig[idx].mapped_size = mapped_size;
>>> -   pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]);
>>> +   pregion_orig[idx].blksz = alignment;
>>> pregion_orig[idx].fd = pmsg->fds[idx];
>>>
>>> mapped_address +=  memory.regions[idx].mmap_offset;
>



[dpdk-dev] [PATCH] vhost: fix mmap failure as len not aligned with hugepage size

2015-11-12 Thread Tan, Jianfeng


> -Original Message-
> From: Xie, Huawei
> Sent: Wednesday, November 11, 2015 11:57 AM
> To: Tan, Jianfeng; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: fix mmap failure as len not aligned 
> with
> hugepage size
> 
> On 10/30/2015 2:52 PM, Jianfeng Tan wrote:
> > This patch fixes a bug under lower version linux kernel, mmap() fails
> > when length is not aligned with hugepage size.
> Since which Linux version did hugetlbfs change the size alignment
> requirement?

This link shows this bug was fixed in the Linux kernel by commit
dab2d3dc45ae7343216635d981d43637e1cb7d45.
From my check, that fix was applied to longterm version 3.4.110+,
so distributions using 2.6.32 and 3.2.72 need this patch to make vhost
work well.
https://bugzilla.kernel.org/show_bug.cgi?id=56881


> >
> > Signed-off-by: Jianfeng Tan 
> > ---
> >  lib/librte_vhost/vhost_user/virtio-net-user.c | 12 +---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c
> > b/lib/librte_vhost/vhost_user/virtio-net-user.c
> > index a998ad8..641561c 100644
> > --- a/lib/librte_vhost/vhost_user/virtio-net-user.c
> > +++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
> > @@ -147,6 +147,10 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
> > /* This is ugly */
> > mapped_size = memory.regions[idx].memory_size +
> > memory.regions[idx].mmap_offset;
> > +
> > +   alignment = get_blk_size(pmsg->fds[idx]);
> > +   mapped_size = RTE_ALIGN_CEIL(mapped_size, alignment);
> Probably we could remove the alignment of mapped size in free_mem_region as
> well.

Yes, after aligning at mmap() time, there is no need to align again at
munmap(). But the extra alignment affects nothing and incurs no
performance cost, so I'm inclined to leave it unchanged.

> RTE_ALIGN_CEIL(region[idx].mapped_size, alignment)
> If we are not sure, leave it as it is.
> > +
> > mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
> > mapped_size,
> > PROT_READ | PROT_WRITE, MAP_SHARED,
> > @@ -154,9 +158,11 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
> > 0);
> >
> > RTE_LOG(INFO, VHOST_CONFIG,
> > -   "mapped region %d fd:%d to %p sz:0x%"PRIx64" off:0x%"PRIx64"\n",
> > +   "mapped region %d fd:%d to:%p sz:0x%"PRIx64" "
> > +   "off:0x%"PRIx64" align:0x%"PRIx64"\n",
> > idx, pmsg->fds[idx], (void *)(uintptr_t)mapped_address,
> > -   mapped_size, memory.regions[idx].mmap_offset);
> > +   mapped_size, memory.regions[idx].mmap_offset,
> > +   alignment);
> >
> > if (mapped_address == (uint64_t)(uintptr_t)MAP_FAILED) {
> > RTE_LOG(ERR, VHOST_CONFIG,
> > @@ -166,7 +172,7 @@ user_set_mem_table(struct vhost_device_ctx ctx,
> > struct VhostUserMsg *pmsg)
> >
> > pregion_orig[idx].mapped_address = mapped_address;
> > pregion_orig[idx].mapped_size = mapped_size;
> > -   pregion_orig[idx].blksz = get_blk_size(pmsg->fds[idx]);
> > +   pregion_orig[idx].blksz = alignment;
> > pregion_orig[idx].fd = pmsg->fds[idx];
> >
> > mapped_address +=  memory.regions[idx].mmap_offset;



[dpdk-dev] [PATCH] vhost: reset device properly

2015-11-12 Thread Rich Lane
On Wed, Nov 11, 2015 at 8:10 PM, Yuanhan Liu wrote:

> Currently, we reset all fields of a device to zero when reset
> happens, which is wrong, since for some fields like device_fh,
> ifname, and virt_qp_nb, they should be same and be kept after
> reset until the device is removed. And this is what's the new
> helper function reset_device() for.
>
> Also use rte_zmalloc() instead of rte_malloc(), so that we can
> avoid init_device(), which so far basically only does a zero reset.
> Hence, init_device() is dropped in this patch.
>
> This patch also removes the hack of using the offset of a specific
> field (currently virtqueue) inside the `virtio_net' structure
> to do the reset, which could break easily if someone changed the
> field order without caution.
>
> Cc: Tetsuya Mukawa 
> Cc: Xie Huawei 
> Signed-off-by: Yuanhan Liu 
>

I had a patch that just saved the ifname but this is much better.

Acked-by: Rich Lane 


[dpdk-dev] [PATCH] bonding: fix enumerated type mixed with another type

2015-11-12 Thread Thomas Monjalon
> > ICC complains about enumerated types being mixed in link bonding driver,
> > as ETH_MQ_RX_RSS is an enum type of mq_mode and not a bitmask as it
> > was being treated.
> > 
> > Fixes: 734ce47f71e0 ("bonding: support RSS dynamic configuration")
> > 
> > Signed-off-by: Tomasz Kulasek 
> 
> Acked-by: Pablo de Lara 

Applied, thanks


[dpdk-dev] [PATCHv7 0/2] ixgbe: fix TX hang when RS distance exceeds HW limit

2015-11-12 Thread Thomas Monjalon
> > First patch contains changes in testpmd that allow to reproduce the issue.
> > Second patch is the actual fix.
> > 
> > Konstantin Ananyev (2):
> >   testpmd: add ability to split outgoing packets
> >   ixgbe: fix TX hang when RS distance exceeds HW limit
> 
> Series-acked-by: Pablo de Lara 

Applied, thanks


[dpdk-dev] Permanently binding NIC ports with DPDK drivers

2015-11-12 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Montorsi, Francesco
> Sent: Wednesday, November 11, 2015 4:13 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Permanently binding NIC ports with DPDK drivers
> 
> Hi,
> Is there a way to permanently (i.e., have the configuration automatically
> applied after reboot) bind a NIC port to DPDK?


Hi,

The Ubuntu dpdk package for 15.10 contains system scripts with functions for 
reserving hugepages and binding interfaces on bootup:


/etc/dpdk/dpdk.conf
/etc/dpdk/interfaces
/etc/init.d/dpdk
/lib/dpdk/dpdk-init
/lib/systemd/system/dpdk.service
/sbin/dpdk_nic_bind
/usr/bin/testpmd
/usr/share/doc/dpdk/README.Debian
/usr/share/doc/dpdk/changelog.Debian.gz
/usr/share/doc/dpdk/copyright
/usr/share/dpdk/tools/cpu_layout.py
/usr/share/dpdk/tools/dpdk_nic_bind.py
/usr/share/dpdk/tools/setup.sh
/usr/share/python/runtime.d/dpdk.rtupdate

http://packages.ubuntu.com/wily/amd64/dpdk/filelist

If you have the latest version of Ubuntu you can check that out; otherwise,
download and extract the files from the .deb to see how they do it.

John.
-- 


[dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len

2015-11-12 Thread Rich Lane
The guest could trigger this buffer overflow by creating a cycle of descriptors
(which would also cause an infinite loop). The more common case is that
vq->avail->idx jumps out of the range [last_used_idx, last_used_idx+256). This
happens nearly every time when restarting a DPDK app inside a VM connected to a
vhost-user vswitch because the virtqueue memory allocated by the previous run
is zeroed.

Signed-off-by: Rich Lane 
---
 lib/librte_vhost/vhost_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 9322ce6..d95b478 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
vq->buf_vec[vec_id].desc_idx = idx;
vec_id++;

-   if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
+   if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id < BUF_VECTOR_MAX) {
idx = vq->desc[idx].next;
next_desc = 1;
}
-- 
1.9.1
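
Note: the bounded chain walk that this fix relies on can be sketched as
below. This is a simplified illustration and not the vhost code itself:
struct demo_desc, DEMO_VEC_MAX and demo_walk_chain() are hypothetical
stand-ins for the vring descriptor, BUF_VECTOR_MAX and the loop in
update_secure_len(). The extra range check on idx is an assumption of this
sketch and is not part of the patch.

```c
#include <stdint.h>

#define DEMO_DESC_F_NEXT 1	/* same value as VRING_DESC_F_NEXT */
#define DEMO_VEC_MAX     256	/* stand-in for BUF_VECTOR_MAX */

/* Simplified descriptor: only the fields needed to follow a chain. */
struct demo_desc {
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Follow a descriptor chain starting at head and record each index in
 * vec, which must have room for DEMO_VEC_MAX entries. The walk stops at
 * the end of the chain, at an out-of-range index, or once vec is full,
 * so a guest-made cycle can neither overflow vec nor loop forever.
 * Returns the number of entries recorded.
 */
uint32_t
demo_walk_chain(const struct demo_desc *desc, uint16_t nb_desc,
		uint16_t head, uint16_t *vec)
{
	uint32_t n = 0;
	uint16_t idx = head;

	while (n < DEMO_VEC_MAX && idx < nb_desc) {
		vec[n++] = idx;
		if (!(desc[idx].flags & DEMO_DESC_F_NEXT))
			break;
		idx = desc[idx].next;
	}
	return n;
}
```

A descriptor whose next field points back to itself fills vec up to
DEMO_VEC_MAX entries and then stops, instead of writing past the array.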