date:20160502

[dpdk-dev] performance of rte_lpm rule subsystem

2016-05-02 Thread Александр Киселев

Stephen, what was the main reason you used a red-black tree instead of
dir-24-8? Did you switch to trees because the memory working set of the
dir-24-8 algorithm is too big?


2016-04-19 18:46 GMT+03:00 Stephen Hemminger :

> On Tue, 19 Apr 2016 14:11:11 +0300
> Александр Киселев wrote:
>
> > Hi.
> >
> > While testing rte_lpm (adding and deleting a full BGP table of rules), I
> > noticed that the rule subsystem is very slow, even considering that it
> > was probably never designed for use in the data forwarding plane. So I
> > want to propose some changes to the "rule" subsystem.
> >
> > I reimplemented the rule part of the library using rte_hash, and the
> > performance of adding and deleting routes has increased dramatically.
> > If faster route addition and deletion makes sense for anybody else,
> > I would like to discuss my patch.
> > The patch also includes changes that make next_hop 64 bit, so please just
> > ignore them. The rule changes are in the following
> > functions only:
> >
> > rte_lpm2_create
> >
> > rule_find
> > rule_add
> > rule_delete
> > find_previous_rule
> > delete_depth_small
> > delete_depth_big
> >
> > rte_lpm2_add
> > rte_lpm2_delete
> > rte_lpm2_is_rule_present
> > rte_lpm2_delete_all
> >
>
> We forked LPM several versions ago.
> I sent patches to use a BSD red-black tree for the rules, but they were
> ignored, mostly because they broke the ABI.
>



-- 
--
Kiselev Alexander
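
For readers following the thread, the idea of looking rules up through a hash instead of scanning them can be sketched as follows. This is only an illustration under stated assumptions, not the patch under discussion: `rule_table`, `rule_add`, `rule_find` and `rule_hash` are hypothetical names, and a tiny open-addressing table stands in for rte_hash. Deletion is left out because open addressing would need tombstones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RULE_SLOTS 1024u            /* power of two, hypothetical capacity */

struct rule {
	uint32_t ip;                /* prefix, already masked to depth */
	uint8_t  depth;             /* prefix length, 1..32 */
	uint8_t  used;              /* slot occupied flag */
	uint32_t next_hop;
};

struct rule_table {
	struct rule slots[RULE_SLOTS];
};

/* Mix prefix and depth into a slot index (illustrative hash only). */
static uint32_t rule_hash(uint32_t ip, uint8_t depth)
{
	uint32_t h = (ip * 2654435761u) ^ ((uint32_t)depth * 40503u);

	return (h ^ (h >> 16)) & (RULE_SLOTS - 1);
}

/* Return the matching rule, or NULL if (ip, depth) is not present. */
static struct rule *rule_find(struct rule_table *t, uint32_t ip, uint8_t depth)
{
	uint32_t i = rule_hash(ip, depth);

	for (uint32_t n = 0; n < RULE_SLOTS; n++, i = (i + 1) & (RULE_SLOTS - 1)) {
		struct rule *r = &t->slots[i];

		if (!r->used)
			return NULL;        /* empty slot ends the probe */
		if (r->ip == ip && r->depth == depth)
			return r;
	}
	return NULL;
}

/* Insert or update a rule; returns 0 on success, -1 if the table is full. */
static int rule_add(struct rule_table *t, uint32_t ip, uint8_t depth,
		    uint32_t next_hop)
{
	uint32_t i = rule_hash(ip, depth);

	assert(depth >= 1 && depth <= 32);
	for (uint32_t n = 0; n < RULE_SLOTS; n++, i = (i + 1) & (RULE_SLOTS - 1)) {
		struct rule *r = &t->slots[i];

		if (!r->used || (r->ip == ip && r->depth == depth)) {
			r->ip = ip;
			r->depth = depth;
			r->next_hop = next_hop;
			r->used = 1;
			return 0;
		}
	}
	return -1;
}
```

With a zero-initialized table, `rule_add()` followed by `rule_find()` on the same (ip, depth) pair touches a handful of slots on average, independent of how many rules of that depth exist.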

[dpdk-dev] [PATCH] mk: add rpath for applications

2016-05-02 Thread Thomas Monjalon

2016-04-29 17:34, Ferruh Yigit:
> Add default library output folder to the library search folder.
> 
> This is useful in a development environment; in a production environment
> the DPDK libraries should already be in known locations.

Yes, it is useful in a dev environment, but it can be risky or strange when
packaged for a production environment.
Shouldn't we have a switch to keep development leftovers out of production?
I suggest using RTE_DEVEL_BUILD.

> The patch removes the requirement to set the LD_LIBRARY_PATH variable when
> DPDK is compiled as a shared library.

Yes, this patch could remove
export LD_LIBRARY_PATH=$build/lib:$LD_LIBRARY_PATH
in scripts/test-null.sh.

[...]
> +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> +LDFLAGS += --rpath=$(RTE_SDK_BIN)/lib
> +endif

Isn't it -rpath, with a single dash?

As it is a variable setting, it should be added before the rules,
just after LDLIBS settings.

[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop

2016-05-02 Thread De Lara Guarch, Pablo

Hi Jerin,

> -Original Message-
> From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> Sent: Monday, May 02, 2016 1:00 PM
> To: dev at dpdk.org
> Cc: De Lara Guarch, Pablo; Jerin Jacob
> Subject: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in
> macswap loop
> 
> prefetch the next packet data address in advance in macswap loop
> for performance improvement.
> 
> Signed-off-by: Jerin Jacob 
> ---
>  app/test-pmd/macswap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> index 154889d..c10f4b5 100644
> --- a/app/test-pmd/macswap.c
> +++ b/app/test-pmd/macswap.c
> @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>   if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ)
>   ol_flags |= PKT_TX_QINQ_PKT;
>   for (i = 0; i < nb_rx; i++) {
> + if (likely(i < nb_rx - 1))
> + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
> +void *));
>   mb = pkts_burst[i];
>   eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> 
> --
> 2.1.0

This looks good. Could you also add it in the other forwarding modes (the ones 
that make changes in the packets)?

Thanks,
Pablo

[dpdk-dev] [PATCH] mk: do not enforce any specific ARM ABI

2016-05-02 Thread Thomas Monjalon

2016-04-16 00:33, Jan Viktorin:
> The dpdk build system passes -mfloat-abi=softfp, which makes the build fail
> when the selected ABI is EABIhf. The dpdk build system should not make
> assumptions on the selected ARM ABI.
> 
> Signed-off-by: Jan Viktorin 
> Reported-by: Thomas Petazzoni 

Applied, thanks

[dpdk-dev] [PATCH 3/3] vhost: arrange virtio_net fields for better cache sharing

2016-05-02 Thread Yuanhan Liu

The ifname[] field takes so much space that it separates some frequently
used fields, such as features and broadcast_rarp, into different cache lines.

This patch moves all the fields that are accessed frequently in Rx/Tx
together (before the ifname[] field) so that they share one cache line.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost-net.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 9dec83c..3b0ffe7 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -123,16 +123,16 @@ struct virtio_net {
        int                     vid;
        uint32_t                flags;
        uint16_t                vhost_hlen;
+       /* to tell if we need broadcast rarp packet */
+       rte_atomic16_t          broadcast_rarp;
+       uint32_t                virt_qp_nb;
+       struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
        char                    ifname[IF_NAME_SZ];
-       uint32_t                virt_qp_nb;
        uint64_t                log_size;
        uint64_t                log_base;
        struct ether_addr       mac;

-       /* to tell if we need broadcast rarp packet */
-       rte_atomic16_t          broadcast_rarp;
-       struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
 } __rte_cache_aligned;

 /**
-- 
1.9.0
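
The effect of the reordering can be checked with `offsetof`. The sketch below is a simplified stand-in, not the real `struct virtio_net`: the two struct names, `NAME_SZ`, and the assumed 64-byte cache line are all hypothetical, and only a few scalar fields are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64               /* assumed cache-line size */
#define NAME_SZ    4096             /* stand-in for IF_NAME_SZ */

/* Hot fields first, large cold array after them. */
struct dev_hot_first {
	uint64_t features;
	uint32_t flags;
	uint16_t hlen;
	uint16_t broadcast_rarp;
	uint32_t nr_queue_pairs;
	char     ifname[NAME_SZ];   /* cold: only used on the control path */
	uint64_t log_size;
};

/* Same fields, but the large array splits the hot ones apart. */
struct dev_hot_split {
	uint64_t features;
	uint32_t flags;
	uint16_t hlen;
	char     ifname[NAME_SZ];
	uint32_t nr_queue_pairs;
	uint64_t log_size;
	uint16_t broadcast_rarp;
};

/* Cache-line index of a field, relative to the start of the struct. */
#define LINE_OF(type, field) (offsetof(type, field) / CACHE_LINE)

/* Compile-time check that the last hot field still sits in line 0. */
static_assert(LINE_OF(struct dev_hot_first, nr_queue_pairs) == 0,
	      "hot fields should fit in the first cache line");

/* Returns 1 when features and broadcast_rarp share a cache line. */
static int hot_fields_share_line(void)
{
	return LINE_OF(struct dev_hot_first, features) ==
	       LINE_OF(struct dev_hot_first, broadcast_rarp);
}
```

The line indexes are relative to the struct start, so they only describe real cache lines when the struct itself is cache-line aligned, as `__rte_cache_aligned` arranges in the patch.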

[dpdk-dev] [PATCH 2/3] vhost: optimize dequeue for small packets

2016-05-02 Thread Yuanhan Liu

Both the current kernel virtio driver and the DPDK virtio driver use at least
two desc buffers for Tx: the first for storing the header, and the others
for storing the data.

Therefore, we can fetch the first data desc buf before the main loop
and do the copy before checking "are we done yet?". This
saves one check for small packets that have just one data desc
buffer and need one mbuf to store it.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_rxtx.c | 52 ++-
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 2c3b810..34d6ed1 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -753,18 +753,48 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
return -1;

desc_addr = gpa_to_vva(dev, desc->addr);
-   rte_prefetch0((void *)(uintptr_t)desc_addr);
-
-   /* Retrieve virtio net header */
hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
-   desc_avail  = desc->len - dev->vhost_hlen;
-   desc_offset = dev->vhost_hlen;
+   rte_prefetch0(hdr);
+
+   /*
+* Both current kernel virtio driver and DPDK virtio driver
+* use at least 2 desc buffers for Tx: the first for storing
+* the header, and others for storing the data.
+*/
+   if (likely(desc->len == dev->vhost_hlen)) {
+   desc = &vq->desc[desc->next];
+
+   desc_addr = gpa_to_vva(dev, desc->addr);
+   rte_prefetch0((void *)(uintptr_t)desc_addr);
+
+   desc_offset = 0;
+   desc_avail  = desc->len;
+   nr_desc+= 1;
+
+   PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
+   } else {
+   desc_avail  = desc->len - dev->vhost_hlen;
+   desc_offset = dev->vhost_hlen;
+   }

mbuf_offset = 0;
mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
-   while (desc_avail != 0 || (desc->flags & VRING_DESC_F_NEXT) != 0) {
+   while (1) {
+   cpy_len = RTE_MIN(desc_avail, mbuf_avail);
+   rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
+   (void *)((uintptr_t)(desc_addr + desc_offset)),
+   cpy_len);
+
+   mbuf_avail  -= cpy_len;
+   mbuf_offset += cpy_len;
+   desc_avail  -= cpy_len;
+   desc_offset += cpy_len;
+
/* This desc reaches to its end, get the next one */
if (desc_avail == 0) {
+   if ((desc->flags & VRING_DESC_F_NEXT) == 0)
+   break;
+
if (unlikely(desc->next >= vq->size ||
 ++nr_desc >= vq->size))
return -1;
@@ -800,16 +830,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
mbuf_offset = 0;
mbuf_avail  = cur->buf_len - RTE_PKTMBUF_HEADROOM;
}
-
-   cpy_len = RTE_MIN(desc_avail, mbuf_avail);
-   rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
-   (void *)((uintptr_t)(desc_addr + desc_offset)),
-   cpy_len);
-
-   mbuf_avail  -= cpy_len;
-   mbuf_offset += cpy_len;
-   desc_avail  -= cpy_len;
-   desc_offset += cpy_len;
}

prev->data_len = mbuf_offset;
-- 
1.9.0
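
The "copy first, check afterwards" loop shape can be tried outside DPDK with a simplified model. The sketch below is not the vhost code: `struct desc`, `DESC_F_NEXT` and `copy_chain` are hypothetical, the destination is one flat buffer instead of an mbuf chain, and the NEXT-flag check on the header descriptor is an extra safety check added for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_F_NEXT 1u              /* stand-in for VRING_DESC_F_NEXT */

struct desc {                       /* simplified, not the virtio layout */
	const uint8_t *addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Copy the payload of a descriptor chain into dst. When the first
 * descriptor holds exactly hdr_len bytes, the payload starts in the
 * next descriptor; otherwise it follows the header in place.
 * Returns the number of payload bytes, or -1 on a malformed chain or
 * when dst is too small.
 */
static int copy_chain(const struct desc *ring, uint16_t ring_size,
		      uint16_t head, uint32_t hdr_len,
		      uint8_t *dst, uint32_t dst_cap)
{
	const struct desc *d = &ring[head];
	uint32_t off, avail, copied = 0, nr = 1;

	assert(ring != NULL && dst != NULL);
	if (d->len < hdr_len)
		return -1;

	if (d->len == hdr_len) {
		/* Common case: header alone, payload in the next desc. */
		if (!(d->flags & DESC_F_NEXT) || d->next >= ring_size)
			return -1;
		d = &ring[d->next];
		off = 0;
		avail = d->len;
		nr++;
	} else {
		off = hdr_len;
		avail = d->len - hdr_len;
	}

	for (;;) {
		/* Copy first: a data descriptor is already in hand. */
		if (avail > dst_cap - copied)
			return -1;
		memcpy(dst + copied, d->addr + off, avail);
		copied += avail;

		/* Only now ask "are we done yet?". */
		if (!(d->flags & DESC_F_NEXT))
			break;
		if (d->next >= ring_size || ++nr > ring_size)
			return -1;
		d = &ring[d->next];
		off = 0;
		avail = d->len;
	}
	return (int)copied;
}
```

For a packet with one header descriptor and one data descriptor, the loop body runs once and exits at the first flag test, which is the small-packet case the commit message describes.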

[dpdk-dev] [PATCH 1/3] vhost: pre update used ring for Tx and Rx

2016-05-02 Thread Yuanhan Liu

Pre-update the used ring, in batch, for Tx and Rx at the stage where all
avail desc indexes are fetched. This reduces some cache misses
and hence increases performance a bit.

Pre-updating is feasible because the guest driver will not start processing
those entries as long as we don't update "used->idx". (I'm not 100%
certain I haven't missed anything, though.)

Cc: Michael S. Tsirkin 
Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_rxtx.c | 58 +--
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index c9cd1c5..2c3b810 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -137,7 +137,7 @@ copy_virtio_net_hdr(struct virtio_net *dev, uint64_t 
desc_addr,

 static inline int __attribute__((always_inline))
 copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
- struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied)
+ struct rte_mbuf *m, uint16_t desc_idx)
 {
uint32_t desc_avail, desc_offset;
uint32_t mbuf_avail, mbuf_offset;
@@ -161,7 +161,6 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
desc_offset = dev->vhost_hlen;
desc_avail  = desc->len - dev->vhost_hlen;

-   *copied = rte_pktmbuf_pkt_len(m);
mbuf_avail  = rte_pktmbuf_data_len(m);
mbuf_offset = 0;
while (mbuf_avail != 0 || m->next != NULL) {
@@ -262,6 +261,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
struct vhost_virtqueue *vq;
uint16_t res_start_idx, res_end_idx;
uint16_t desc_indexes[MAX_PKT_BURST];
+   uint16_t used_idx;
uint32_t i;

LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
@@ -285,27 +285,29 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
/* Retrieve all of the desc indexes first to avoid caching issues. */
rte_prefetch0(&vq->avail->ring[res_start_idx & (vq->size - 1)]);
for (i = 0; i < count; i++) {
-   desc_indexes[i] = vq->avail->ring[(res_start_idx + i) &
- (vq->size - 1)];
+   used_idx = (res_start_idx + i) & (vq->size - 1);
+   desc_indexes[i] = vq->avail->ring[used_idx];
+   vq->used->ring[used_idx].id = desc_indexes[i];
+   vq->used->ring[used_idx].len = pkts[i]->pkt_len +
+  dev->vhost_hlen;
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used, ring[used_idx]),
+   sizeof(vq->used->ring[used_idx]));
}

rte_prefetch0(&vq->desc[desc_indexes[0]]);
for (i = 0; i < count; i++) {
uint16_t desc_idx = desc_indexes[i];
-   uint16_t used_idx = (res_start_idx + i) & (vq->size - 1);
-   uint32_t copied;
int err;

-   err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx, &copied);
-
-   vq->used->ring[used_idx].id = desc_idx;
-   if (unlikely(err))
+   err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx);
+   if (unlikely(err)) {
+   used_idx = (res_start_idx + i) & (vq->size - 1);
vq->used->ring[used_idx].len = dev->vhost_hlen;
-   else
-   vq->used->ring[used_idx].len = copied + dev->vhost_hlen;
-   vhost_log_used_vring(dev, vq,
-   offsetof(struct vring_used, ring[used_idx]),
-   sizeof(vq->used->ring[used_idx]));
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used, ring[used_idx]),
+   sizeof(vq->used->ring[used_idx]));
+   }

if (i + 1 < count)
rte_prefetch0(&vq->desc[desc_indexes[i+1]]);
@@ -879,6 +881,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
/* Prefetch available ring to retrieve head indexes. */
used_idx = vq->last_used_idx & (vq->size - 1);
rte_prefetch0(&vq->avail->ring[used_idx]);
+   rte_prefetch0(&vq->used->ring[used_idx]);

count = RTE_MIN(count, MAX_PKT_BURST);
count = RTE_MIN(count, free_entries);
@@ -887,22 +890,23 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,

/* Retrieve all of the head indexes first to avoid caching issues. */
for (i = 0; i < count; i++) {
-   desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
-   (vq->size - 1)];
+   used_idx = (vq->last_used_idx + i) & (vq->size - 1);
+   desc_indexes[i] = vq->avail->ring[used_idx];
+
+   vq->used->ring[used_idx].id  = desc_indexes[i];
+   vq->used->ring[used_idx].len = 0;
+   vhost_log_used_vring(dev, vq,
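
The reasoning in the commit message (entries may be written early because the consumer only trusts "used->idx") can be sketched with C11 atomics. This is an assumption-laden model, not the vhost implementation: `used_ring`, `publish_used` and `RING_SIZE` are hypothetical, and a release store stands in for whatever barrier the real code uses before bumping the index.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u                /* power of two, hypothetical */

struct used_elem {
	uint32_t id;
	uint32_t len;
};

struct used_ring {
	_Atomic uint16_t idx;       /* consumer reads entries below idx only */
	struct used_elem ring[RING_SIZE];
};

/*
 * Fill `count` used entries starting at `start`, then publish them all
 * with one release store to idx. A consumer that honours idx cannot
 * observe the entries before that store.
 */
static void publish_used(struct used_ring *u, uint16_t start,
			 const uint16_t *desc_ids, const uint32_t *lens,
			 uint16_t count)
{
	assert(count <= RING_SIZE);
	for (uint16_t i = 0; i < count; i++) {
		uint16_t slot = (uint16_t)((start + i) & (RING_SIZE - 1));

		u->ring[slot].id  = desc_ids[i];
		u->ring[slot].len = lens[i];
	}
	atomic_store_explicit(&u->idx, (uint16_t)(start + count),
			      memory_order_release);
}
```

The batch write touches consecutive ring slots while they are likely still in cache, and the index is the only field the two sides synchronize on.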

[dpdk-dev] ovs crash when running traffic from VM to VM over DPDK and vhostuser

2016-05-02 Thread Yi Ba

Running with dpdk 16.04 and the latest ovs from git, and removing "mrg_rxbuf=off"
from the virtio params, the crash is no longer observed. However, we are
witnessing ovs getting stuck, and will post to the ovs mailing list:

2016-05-02T17:26:18.804Z|00111|ovs_rcu|WARN|blocked 1000 ms waiting for pmd145 to quiesce
2016-05-02T17:26:19.805Z|00112|ovs_rcu|WARN|blocked 2001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:21.804Z|00113|ovs_rcu|WARN|blocked 4000 ms waiting for pmd145 to quiesce
2016-05-02T17:26:25.805Z|00114|ovs_rcu|WARN|blocked 8001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:33.805Z|00115|ovs_rcu|WARN|blocked 16001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:49.805Z|00116|ovs_rcu|WARN|blocked 32001 ms waiting for pmd145 to quiesce
2016-05-02T17:27:14.354Z|00072|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms waiting for pmd145 to quiesce
2016-05-02T17:27:15.841Z|8|ovs_rcu(urcu3)|WARN|blocked 128001 ms waiting for pmd145 to quiesce
2016-05-02T17:27:21.805Z|00117|ovs_rcu|WARN|blocked 64000 ms waiting for pmd145 to quiesce
2016-05-02T17:28:25.804Z|00118|ovs_rcu|WARN|blocked 128000 ms waiting for pmd145 to quiesce

On Wednesday, 6 April 2016 10:56 AM, Yuanhan Liu  wrote:

 On Tue, Apr 05, 2016 at 08:36:19PM +, Yi Ba wrote:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ff1ddffb700 (LWP 21287)]
> 0x00450da7 in update_secure_len (vec_idx=0x7ff1ddff27f8, secure_len=0x7ff1ddff27fc, id=13948, vq=0x7fe7992c8940)
>     at /home/stack/ovs-dpdk/dpdk-2.2.0/lib/librte_vhost/vhost_rxtx.c:452
> 452     /home/stack/ovs-dpdk/dpdk-2.2.0/lib/librte_vhost/vhost_rxtx.c: No such file or directory.
> (gdb) bt
> #0  0x00450da7 in update_secure_len (vec_idx=0x7ff1ddff27f8, secure_len=0x7ff1ddff27fc, id=13948, vq=0x7fe7992c8940)

It looks like a known issue, which has been fixed in this release. So,
could you please just try again with the latest DPDK code? It should
be able to solve your issue.

    --yliu

[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop

2016-05-02 Thread Jerin Jacob

prefetch the next packet data address in advance in macswap loop
for performance improvement.

Signed-off-by: Jerin Jacob 
---
 app/test-pmd/macswap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 154889d..c10f4b5 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ)
ol_flags |= PKT_TX_QINQ_PKT;
for (i = 0; i < nb_rx; i++) {
+   if (likely(i < nb_rx - 1))
+   rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
+  void *));
mb = pkts_burst[i];
eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);

-- 
2.1.0
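
The prefetch-one-ahead pattern from the patch can be tried outside DPDK. The sketch below is a stand-alone illustration: `struct pkt` and `sum_first_bytes` are hypothetical, and the GCC/Clang builtin `__builtin_prefetch` stands in for `rte_prefetch0`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pkt {
	uint8_t data[64];           /* stand-in for packet data */
};

/* Sum the first byte of every packet, prefetching the next one early. */
static uint32_t sum_first_bytes(struct pkt *const *burst, size_t n)
{
	uint32_t sum = 0;

	assert(burst != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Hint the cache about packet i + 1 while handling packet i. */
		if (i + 1 < n)
			__builtin_prefetch(burst[i + 1]->data, 0, 3);
		sum += burst[i]->data[0];
	}
	return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it; any gain depends on how much work is done per packet and on the memory system, and has to be measured.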

[dpdk-dev] [PATCH] cmdline: fix unchecked return value

2016-05-02 Thread Olivier Matz

Hi Daniel,

On 04/14/2016 03:01 PM, Daniel Mrzyglod wrote:
> This patch checks whether error values occur.
> fix for coverity errors #13209 & #13195
> 
> If the function returns an error value, the error value may be mistaken
> for a normal value.
> 
> In rdline_char_in: Value returned from a function is not checked for errors
> before being used
> 
> Signed-off-by: Daniel Mrzyglod 
> ---
>  lib/librte_cmdline/cmdline_rdline.c | 19 +++
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_cmdline/cmdline_rdline.c 
> b/lib/librte_cmdline/cmdline_rdline.c
> index 1ef2258..e75a556 100644
> --- a/lib/librte_cmdline/cmdline_rdline.c
> +++ b/lib/librte_cmdline/cmdline_rdline.c
> @@ -377,7 +377,10 @@ rdline_char_in(struct rdline *rdl, char c)
>   case CMDLINE_KEY_CTRL_K:
>   cirbuf_get_buf_head(&rdl->right, rdl->kill_buf, RDLINE_BUF_SIZE);
>   rdl->kill_size = CIRBUF_GET_LEN(&rdl->right);
> - cirbuf_del_buf_head(&rdl->right, rdl->kill_size);
> +
> + if (cirbuf_del_buf_head(&rdl->right, rdl->kill_size) < 0)
> + return -EINVAL;
> +
>   rdline_puts(rdl, vt100_clear_right);
>   break;
>  

I wonder if a better way to fix wouldn't be to remove the checks
introduced in http://dpdk.org/browse/dpdk/commit/?id=ab971e562860

There is no reason to check that in cirbuf_get_buf_head/tail():
if (!cbuf || !c)

The function should never fail; it just returns the number of
copied chars. It is the responsibility of the caller to ensure
that the pointer to the circular buffer is not NULL.

Also, rdline_char_in() is not expected to return -EINVAL, but
RDLINE_RES_* instead.

So I think that partially reverting ab971e562860 would fix the
coverity warning.

Regards,
Olivier

[dpdk-dev] [PATCH 0/4] cleanup debug and dead code

2016-05-02 Thread Thomas Monjalon

2016-04-22 15:43, Thomas Monjalon:
> With this series, the default log level is not debug anymore.
> And more code depends on debug level instead of having some
> almost dead code.
> 
> Thomas Monjalon (4):
>   eal: increase log level of some messages
>   log: increase default level to info
>   examples: remove useless debug flags
>   eal: add assert macro for debug

Applied with small fix discussed for vmxnet3.

[dpdk-dev] [PATCH v3 1/1] cmdline: add any multi string mode to token string

2016-05-02 Thread Thomas Monjalon

2016-04-29 16:29, Piotr Azarewicz:
> When parsing a token string, there are several possible modes:
> - fixed single string
> - multi-choice single string
> - any single string
> 
> This patch adds one more mode: any multi string.
> 
> Signed-off-by: Piotr Azarewicz 
> Acked-by: Olivier Matz 

Applied, thanks

[dpdk-dev] Flow Director Example?

2016-05-02 Thread Alex Forster

Hi Helin, thanks for the reply.

Some code might help me explain myself better:

port->configuration = rte_eth_conf {
.fdir_conf = {
.mode = RTE_FDIR_MODE_SIGNATURE,
.pballoc = RTE_FDIR_PBALLOC_64K,
.mask = rte_eth_fdir_masks {
.ipv4_mask = rte_eth_ipv4_flow {
.dst_ip = 0x0,
},
.ipv6_mask = rte_eth_ipv6_flow {
.dst_ip = { 0x0, 0x0, 0x0, 0x0 },
},
},
.status = RTE_FDIR_REPORT_STATUS,
.drop_queue = 127,
},
.rxmode = {
.mq_mode = ETH_MQ_RX_NONE,
.max_rx_pkt_len = ETHER_MAX_LEN,
.split_hdr_size = 0,
.header_split   = 0,
.hw_ip_checksum = 0,
.hw_vlan_filter = 0,
.jumbo_frame= 0,
.hw_strip_crc   = 0,
},
.txmode = {
.mq_mode = ETH_MQ_TX_NONE,
},
};




I'm trying to direct packets with the same destination IPv4 or IPv6 address 
into the same RX queues. I haven't been able to find any examples of using Flow 
Director with DPDK, so I'm sure I'm doing something obviously wrong here, but I 
can't figure out what it is.

Alex Forster

On 5/2/16, 1:38 AM, "Zhang, Helin"  wrote:

>Hi Alex
>
>Can you confirm that you are using DPDK? And how do you use DPDK and possibly 
>kernel driver?
>I need your detailed topo of how are you using DPDK, as I am a bit confused. 
>Thanks!
>
>Regards,
>Helin
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Forster
>> Sent: Saturday, April 30, 2016 4:34 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] Flow Director Example?
>> 
>> Hi guys, apologies if this is the wrong list, but the others look pretty 
>> bare.
>> 
>> We have a 32 core server that has two X520-QDA1's NICs with 2x10G ports
>> plugged into each. I'm using 2016.1 (latest stable) with ixgbe 4.3.15 
>> (latest stable).
>> I'm setting up 8 RX queues per port, and I'd like Flow Director in signature 
>> mode
>> (?) to place packets into queues based on a hash of destination IPv4 or IPv6
>> address. However, I can't figure out rte_fdir_conf, and despite a good 
>> amount of
>> trial and error, each of my ports are still only using one of the RX queues 
>> I set up.
>> 
>> Would anyone be able to point me in the right direction here? Thanks in 
>> advance!
>> 
>> Alex Forster

[dpdk-dev] [PATCH 16/16] vhost: make buf vector for scatter Rx local

2016-05-02 Thread Yuanhan Liu

From: Ilya Maximets 

The array of buf_vectors is just temporary storage for information
about available descriptors. It is used only locally in virtio_dev_merge_rx(),
and there is no reason for that array to be shared.

Fix that by allocating local buf_vec inside virtio_dev_merge_rx().

Signed-off-by: Ilya Maximets 
Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost-net.h  |  1 -
 lib/librte_vhost/vhost_rxtx.c | 41 ++---
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 9dec83c..e697d96 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -81,7 +81,6 @@ struct vhost_virtqueue {

/* Physical address of used ring, for logging */
        uint64_t                log_guest_addr;
-   struct buf_vector   buf_vec[BUF_VECTOR_MAX];
 } __rte_cache_aligned;

 /* Old kernels have no such macro defined */
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index c9cd1c5..96720db 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -335,7 +335,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,

 static inline int
 fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
-uint32_t *allocated, uint32_t *vec_idx)
+uint32_t *allocated, uint32_t *vec_idx,
+struct buf_vector *buf_vec)
 {
uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)];
uint32_t vec_id = *vec_idx;
@@ -346,9 +347,9 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
return -1;

len += vq->desc[idx].len;
-   vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
-   vq->buf_vec[vec_id].buf_len  = vq->desc[idx].len;
-   vq->buf_vec[vec_id].desc_idx = idx;
+   buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
+   buf_vec[vec_id].buf_len  = vq->desc[idx].len;
+   buf_vec[vec_id].desc_idx = idx;
vec_id++;

if ((vq->desc[idx].flags & VRING_DESC_F_NEXT) == 0)
@@ -371,7 +372,8 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
  */
 static inline int
 reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size,
-   uint16_t *start, uint16_t *end)
+   uint16_t *start, uint16_t *end,
+   struct buf_vector *buf_vec)
 {
uint16_t res_start_idx;
uint16_t res_cur_idx;
@@ -393,7 +395,7 @@ again:
return -1;

if (unlikely(fill_vec_buf(vq, res_cur_idx, &allocated,
- &vec_idx) < 0))
+ &vec_idx, buf_vec) < 0))
return -1;

res_cur_idx++;
@@ -427,7 +429,7 @@ again:
 static inline uint32_t __attribute__((always_inline))
 copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
uint16_t res_start_idx, uint16_t res_end_idx,
-   struct rte_mbuf *m)
+   struct rte_mbuf *m, struct buf_vector *buf_vec)
 {
struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
uint32_t vec_idx = 0;
@@ -444,10 +446,10 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, 
struct vhost_virtqueue *vq,
LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
dev->vid, cur_idx, res_end_idx);

-   if (vq->buf_vec[vec_idx].buf_len < dev->vhost_hlen)
+   if (buf_vec[vec_idx].buf_len < dev->vhost_hlen)
return -1;

-   desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
+   desc_addr = gpa_to_vva(dev, buf_vec[vec_idx].buf_addr);
rte_prefetch0((void *)(uintptr_t)desc_addr);

virtio_hdr.num_buffers = res_end_idx - res_start_idx;
@@ -456,10 +458,10 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, 
struct vhost_virtqueue *vq,

virtio_enqueue_offload(m, &virtio_hdr.hdr);
copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
-   vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, dev->vhost_hlen);
+   vhost_log_write(dev, buf_vec[vec_idx].buf_addr, dev->vhost_hlen);
PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);

-   desc_avail  = vq->buf_vec[vec_idx].buf_len - dev->vhost_hlen;
+   desc_avail  = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
desc_offset = dev->vhost_hlen;

mbuf_avail  = rte_pktmbuf_data_len(m);
@@ -467,7 +469,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
while (mbuf_avail != 0 || m->next != NULL) {
/* done with current desc buf, get the next one */
if (desc_avail == 0) {
-   desc_idx = vq->buf_vec[vec_idx].desc_idx;
+
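
The refactoring pattern in this patch, moving scratch storage out of a shared struct and passing it down as a parameter, can be sketched in isolation. Everything below is hypothetical (`struct queue`, `fill_vec`, `total_len`, `VEC_MAX`) and only mirrors the shape of the change, not the vhost code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_MAX 8u                  /* stand-in for BUF_VECTOR_MAX */

struct buf_vec {
	uint64_t addr;
	uint32_t len;
};

struct queue {                      /* shared state: no scratch array here */
	const uint32_t *desc_len;   /* hypothetical per-descriptor lengths */
	uint32_t nr_desc;
};

/* Fill caller-provided scratch space; returns entries written or -1. */
static int fill_vec(const struct queue *q, struct buf_vec *vec,
		    uint32_t vec_cap)
{
	assert(q != NULL && vec != NULL);
	if (q->nr_desc > vec_cap)
		return -1;
	for (uint32_t i = 0; i < q->nr_desc; i++) {
		vec[i].addr = i;            /* placeholder address */
		vec[i].len  = q->desc_len[i];
	}
	return (int)q->nr_desc;
}

/* Total bytes available; the scratch array lives on this call's stack. */
static int64_t total_len(const struct queue *q)
{
	struct buf_vec vec[VEC_MAX];
	int n = fill_vec(q, vec, VEC_MAX);
	int64_t sum = 0;

	if (n < 0)
		return -1;
	for (int i = 0; i < n; i++)
		sum += vec[i].len;
	return sum;
}
```

Because each call owns its own array, two callers working on the same queue struct no longer overwrite each other's scratch data, at the cost of some stack space per call.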

[dpdk-dev] [PATCH 15/16] vhost: per device vhost_hlen

2016-05-02 Thread Yuanhan Liu

The virtio net header length is set per device, not per queue. So there
is no reason to store it in the vhost_virtqueue struct; instead, we should
store it in the virtio_net struct, so that there is only one copy.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost-net.h  |  2 +-
 lib/librte_vhost/vhost_rxtx.c | 40 
 lib/librte_vhost/virtio-net.c | 13 ++---
 3 files changed, 23 insertions(+), 32 deletions(-)

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 9710009..9dec83c 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -63,7 +63,6 @@ struct vhost_virtqueue {
        struct vring_avail      *avail;
        struct vring_used       *used;
        uint32_t                size;
-       uint16_t                vhost_hlen;

/* Last index used on the available ring */
volatile uint16_t   last_used_idx;
@@ -123,6 +122,7 @@ struct virtio_net {
        uint64_t                protocol_features;
        int                     vid;
        uint32_t                flags;
+       uint16_t                vhost_hlen;
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
        char                    ifname[IF_NAME_SZ];
        uint32_t                virt_qp_nb;
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 65278bb..c9cd1c5 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -126,10 +126,10 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct 
virtio_net_hdr *net_hdr)
 }

 static inline void
-copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr,
+copy_virtio_net_hdr(struct virtio_net *dev, uint64_t desc_addr,
struct virtio_net_hdr_mrg_rxbuf hdr)
 {
-   if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf))
+   if (dev->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf))
*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
else
*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
@@ -147,19 +147,19 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};

desc = &vq->desc[desc_idx];
-   if (unlikely(desc->len < vq->vhost_hlen))
+   if (unlikely(desc->len < dev->vhost_hlen))
return -1;

desc_addr = gpa_to_vva(dev, desc->addr);
rte_prefetch0((void *)(uintptr_t)desc_addr);

virtio_enqueue_offload(m, &virtio_hdr.hdr);
-   copy_virtio_net_hdr(vq, desc_addr, virtio_hdr);
-   vhost_log_write(dev, desc->addr, vq->vhost_hlen);
-   PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0);
+   copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
+   vhost_log_write(dev, desc->addr, dev->vhost_hlen);
+   PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);

-   desc_offset = vq->vhost_hlen;
-   desc_avail  = desc->len - vq->vhost_hlen;
+   desc_offset = dev->vhost_hlen;
+   desc_avail  = desc->len - dev->vhost_hlen;

*copied = rte_pktmbuf_pkt_len(m);
mbuf_avail  = rte_pktmbuf_data_len(m);
@@ -300,9 +300,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,

vq->used->ring[used_idx].id = desc_idx;
if (unlikely(err))
-   vq->used->ring[used_idx].len = vq->vhost_hlen;
+   vq->used->ring[used_idx].len = dev->vhost_hlen;
else
-   vq->used->ring[used_idx].len = copied + vq->vhost_hlen;
+   vq->used->ring[used_idx].len = copied + dev->vhost_hlen;
vhost_log_used_vring(dev, vq,
offsetof(struct vring_used, ring[used_idx]),
sizeof(vq->used->ring[used_idx]));
@@ -444,7 +444,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
LOG_DEBUG(VHOST_DATA, "(%d) current index %d | end index %d\n",
dev->vid, cur_idx, res_end_idx);

-   if (vq->buf_vec[vec_idx].buf_len < vq->vhost_hlen)
+   if (vq->buf_vec[vec_idx].buf_len < dev->vhost_hlen)
return -1;

desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
@@ -455,12 +455,12 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, 
struct vhost_virtqueue *vq,
dev->vid, virtio_hdr.num_buffers);

virtio_enqueue_offload(m, &virtio_hdr.hdr);
-   copy_virtio_net_hdr(vq, desc_addr, virtio_hdr);
-   vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, vq->vhost_hlen);
-   PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0);
+   copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
+   vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, dev->vhost_hlen);
+   PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);

-   desc_avail  = vq->buf_vec[vec_idx].buf_len -

[dpdk-dev] [PATCH 14/16] vhost: reserve few more space for future extension

2016-05-02 Thread Yuanhan Liu

"virtio_net_device_ops" is the only left open struct that an application
can access; therefore, it is the only place where a future extension
might introduce an ABI break.

So, reserve some space in it. 5 should be plenty, considering
that we have barely touched it for a long while. Another reason to
choose 5 is cache alignment: 5 makes the struct 64 bytes on a 64-bit
machine.

With this, it is fairly safe to say that we may be free from
ABI violations for good.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_virtio_net.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_vhost/rte_virtio_net.h 
b/lib/librte_vhost/rte_virtio_net.h
index 388621e..4e50425 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -77,6 +77,8 @@ struct virtio_net_device_ops {
void (*destroy_device)(int vid);/**< Remove device. */

int (*vring_state_changed)(int vid, uint16_t queue_id, int enable); 
/**< triggered when a vring is enabled or disabled */
+
+   void *reserved[5]; /**< Reserved for future extension */
 };

 /**
-- 
1.9.0
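
The size arithmetic behind "5 makes the struct 64 bytes" can be checked with a small model. The struct below is hypothetical: it assumes three callbacks (the diff only shows `destroy_device` and `vring_state_changed`; `new_device` is an assumption inferred from the 64-byte figure), and `features_changed` is an invented example of a future callback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ops table with the shape discussed in the patch. */
struct device_ops {
	int  (*new_device)(int vid);
	void (*destroy_device)(int vid);
	int  (*vring_state_changed)(int vid, uint16_t queue_id, int enable);
	void *reserved[5];          /* room for future callbacks */
};

/* 3 callbacks + 5 reserved slots = 8 pointers (64 bytes with 8-byte pointers). */
static_assert(sizeof(struct device_ops) == 8 * sizeof(void *),
	      "ops table should be exactly eight pointers wide");

/* A later version can claim one reserved slot without changing the size. */
struct device_ops_v2 {
	int  (*new_device)(int vid);
	void (*destroy_device)(int vid);
	int  (*vring_state_changed)(int vid, uint16_t queue_id, int enable);
	int  (*features_changed)(int vid);   /* hypothetical new callback */
	void *reserved[4];
};
```

An application compiled against the first layout passes a struct of the same size, with the new slot zeroed if it was zero-initialized, so a library built against the second layout can test the new callback for NULL before calling it. The size check assumes function pointers and `void *` have the same width, which holds on common platforms but is not guaranteed by the C standard.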

[dpdk-dev] [PATCH 13/16] vhost: remove virtio-net.h

2016-05-02 Thread Yuanhan Liu

It has barely anything useful in it, just 2 function prototypes. Move
them to vhost-net.h, and delete it.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost-net.h  |  3 ++
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |  1 -
 lib/librte_vhost/vhost_rxtx.c |  1 -
 lib/librte_vhost/vhost_user/virtio-net-user.c |  1 -
 lib/librte_vhost/virtio-net.c |  1 -
 lib/librte_vhost/virtio-net.h | 43 ---
 6 files changed, 3 insertions(+), 47 deletions(-)
 delete mode 100644 lib/librte_vhost/virtio-net.h

diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
index 2e4c95d..9710009 100644
--- a/lib/librte_vhost/vhost-net.h
+++ b/lib/librte_vhost/vhost-net.h
@@ -214,6 +214,9 @@ gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
return vhost_va;
 }

+struct virtio_net_device_ops const *notify_ops;
+struct virtio_net *get_device(int vid);
+
 int vhost_new_device(void);
 void vhost_destroy_device(int);

diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
index 0723a7a..552be7d 100644
--- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
+++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
@@ -54,7 +54,6 @@
 #include "rte_virtio_net.h"
 #include "vhost-net.h"
 #include "virtio-net-cdev.h"
-#include "virtio-net.h"
 #include "eventfd_copy.h"

 /* Line size for reading maps file. */
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 08cab08..65278bb 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -46,7 +46,6 @@
 #include 

 #include "vhost-net.h"
-#include "virtio-net.h"

 #define MAX_PKT_BURST 32
 #define VHOST_LOG_PAGE 4096
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c 
b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 7fa69a7..6463bdd 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -43,7 +43,6 @@
 #include 
 #include 

-#include "virtio-net.h"
 #include "virtio-net-user.h"
 #include "vhost-net-user.h"
 #include "vhost-net.h"
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 9fd80a8..6577fe0 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -53,7 +53,6 @@
 #include 

 #include "vhost-net.h"
-#include "virtio-net.h"

 #define MAX_VHOST_DEVICE   1024
 static struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
diff --git a/lib/librte_vhost/virtio-net.h b/lib/librte_vhost/virtio-net.h
deleted file mode 100644
index 9812545..000
--- a/lib/librte_vhost/virtio-net.h
+++ /dev/null
@@ -1,43 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- * * Redistributions of source code must retain the above copyright
- *   notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- *   notice, this list of conditions and the following disclaimer in
- *   the documentation and/or other materials provided with the
- *   distribution.
- * * Neither the name of Intel Corporation nor the names of its
- *   contributors may be used to endorse or promote products derived
- *   from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_NET_H
-#define _VIRTIO_NET_H
-
-#include "vhost-net.h"
-#include "rte_virtio_net.h"
-
-struct virtio_net_device_ops const *notify_ops;
-struct virtio_net *get_device(int vid);
-
-#endif
-- 
1.9.0

[dpdk-dev] [PATCH 11/16] vhost: hide internal structs/macros/functions

2016-05-02 Thread Yuanhan Liu

We are now safe to move all those internal structs/macros/functions to
vhost-net.h, to hide them from external access.

This patch also breaks long lines and removes some redundant comments.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_virtio_net.h | 128 --
 lib/librte_vhost/vhost-net.h  | 142 ++
 2 files changed, 142 insertions(+), 128 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 0a26df9..388621e 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -65,111 +65,6 @@ struct rte_mbuf;
 /* Enum for virtqueue management. */
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};

-#define BUF_VECTOR_MAX 256
-
-/**
- * Structure contains buffer address, length and descriptor index
- * from vring to do scatter RX.
- */
-struct buf_vector {
-   uint64_t buf_addr;
-   uint32_t buf_len;
-   uint32_t desc_idx;
-};
-
-/**
- * Structure contains variables relevant to RX/TX virtqueues.
- */
-struct vhost_virtqueue {
-   struct vring_desc   *desc;  /**< Virtqueue descriptor ring. */
-   struct vring_avail  *avail; /**< Virtqueue available ring. */
-   struct vring_used   *used;  /**< Virtqueue used ring. */
-   uint32_t    size;   /**< Size of descriptor ring. */
-   int backend;    /**< Backend value to determine if device should started/stopped. */
-   uint16_t    vhost_hlen; /**< Vhost header length (varies depending on RX merge buffers. */
-   volatile uint16_t   last_used_idx;  /**< Last index used on the available ring */
-   volatile uint16_t   last_used_idx_res;  /**< Used for multiple devices reserving buffers. */
-#define VIRTIO_INVALID_EVENTFD (-1)
-#define VIRTIO_UNINITIALIZED_EVENTFD   (-2)
-   int callfd; /**< Used to notify the guest (trigger interrupt). */
-   int kickfd; /**< Currently unused as polling mode is enabled. */
-   int enabled;
-   uint64_t    log_guest_addr; /**< Physical address of used ring, for logging */
-   uint64_t    reserved[15];   /**< Reserve some spaces for future extension. */
-   struct buf_vector   buf_vec[BUF_VECTOR_MAX];    /**< for scatter RX. */
-} __rte_cache_aligned;
-
-/* Old kernels have no such macro defined */
-#ifndef VIRTIO_NET_F_GUEST_ANNOUNCE
- #define VIRTIO_NET_F_GUEST_ANNOUNCE 21
-#endif
-
-
-/*
- * Make an extra wrapper for VIRTIO_NET_F_MQ and
- * VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX as they are
- * introduced since kernel v3.8. This makes our
- * code buildable for older kernel.
- */
-#ifdef VIRTIO_NET_F_MQ
- #define VHOST_MAX_QUEUE_PAIRS VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX
- #define VHOST_SUPPORTS_MQ (1ULL << VIRTIO_NET_F_MQ)
-#else
- #define VHOST_MAX_QUEUE_PAIRS 1
- #define VHOST_SUPPORTS_MQ 0
-#endif
-
-/*
- * Define virtio 1.0 for older kernels
- */
-#ifndef VIRTIO_F_VERSION_1
- #define VIRTIO_F_VERSION_1 32
-#endif
-
-/**
- * Device structure contains all configuration information relating to the device.
- */
-struct virtio_net {
-   struct virtio_memory    *mem;   /**< QEMU memory and memory region information. */
-   uint64_t    features;   /**< Negotiated feature set. */
-   uint64_t    protocol_features;  /**< Negotiated protocol feature set. */
-   int vid;    /**< device identifier. */
-   uint32_t    flags;  /**< Device flags. Only used to check if device is running on data core. */
-#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
-   char    ifname[IF_NAME_SZ]; /**< Name of the tap device or socket path. */
-   uint32_t    virt_qp_nb; /**< number of queue pair we have allocated */
-   void    *priv;  /**< private context */
-   uint64_t    log_size;   /**< Size of log area */
-   uint64_t    log_base;   /**< Where dirty pages are logged */
-   struct ether_addr   mac;    /**< MAC address */
-   rte_atomic16_t  broadcast_rarp; /**< A flag to tell if we need broadcast rarp packet */
-   uint64_t    reserved[61];   /**< Reserve some spaces for future extension. */
-   struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];  /**< Contains all virtqueue information. */
-} __rte_cache_aligned;
-
-/**
- * Information relating to memory regions including offsets to addresses in QEMUs memory file.
- */
- */
-struct virtio_memory_regions {
-   uint64_tguest_phys_address; /**< Base guest physical

[dpdk-dev] [PATCH 10/16] vhost: export vid as the only interface to applications

2016-05-02 Thread Yuanhan Liu

With all the previous preparation work done, we are just one step away
from the final ABI refactoring: changing the current APIs so that they
take a vid instead of the old virtio_net dev pointer.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c | 61 ++-
 examples/vhost/main.c | 41 --
 lib/librte_vhost/rte_virtio_net.h | 30 +
 lib/librte_vhost/vhost_rxtx.c | 15 ++-
 lib/librte_vhost/vhost_user/virtio-net-user.c | 14 +++---
 lib/librte_vhost/virtio-net.c | 17 +---
 6 files changed, 91 insertions(+), 87 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 9763cd4..a9dada5 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -68,9 +68,9 @@ static struct ether_addr base_eth_addr = {
 };

 struct vhost_queue {
+   int vid;
rte_atomic32_t allow_queuing;
rte_atomic32_t while_queuing;
-   struct virtio_net *device;
struct pmd_internal *internal;
struct rte_mempool *mb_pool;
uint8_t port;
@@ -137,7 +137,7 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
goto out;

/* Dequeue packets from guest TX queue */
-   nb_rx = rte_vhost_dequeue_burst(r->device,
+   nb_rx = rte_vhost_dequeue_burst(r->vid,
r->virtqueue_id, r->mb_pool, bufs, nb_bufs);

r->rx_pkts += nb_rx;
@@ -168,7 +168,7 @@ eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
goto out;

/* Enqueue packets to guest RX queue */
-   nb_tx = rte_vhost_enqueue_burst(r->device,
+   nb_tx = rte_vhost_enqueue_burst(r->vid,
r->virtqueue_id, bufs, nb_bufs);

r->tx_pkts += nb_tx;
@@ -217,7 +217,7 @@ find_internal_resource(int vid)
 }

 static int
-new_device(struct virtio_net *dev)
+new_device(int vid)
 {
struct rte_eth_dev *eth_dev;
struct internal_list *list;
@@ -228,23 +228,17 @@ new_device(struct virtio_net *dev)
int newnode;
 #endif

-   if (dev == NULL) {
-   RTE_LOG(INFO, PMD, "Invalid argument\n");
-   return -1;
-   }
-
-   list = find_internal_resource(dev->vid);
+   list = find_internal_resource(vid);
if (list == NULL) {
-   RTE_LOG(INFO, PMD, "Invalid vid %d\n", dev->vid);
+   RTE_LOG(INFO, PMD, "Invalid vid %d\n", vid);
return -1;
}

eth_dev = list->eth_dev;
internal = eth_dev->data->dev_private;
-   internal->vid = dev->vid;

 #ifdef RTE_LIBRTE_VHOST_NUMA
-   newnode = rte_vhost_get_numa_node(dev->vid);
+   newnode = rte_vhost_get_numa_node(vid);
if (newnode > 0)
eth_dev->data->numa_node = newnode;
 #endif
@@ -253,7 +247,7 @@ new_device(struct virtio_net *dev)
vq = eth_dev->data->rx_queues[i];
if (vq == NULL)
continue;
-   vq->device = dev;
+   vq->vid = vid;
vq->internal = internal;
vq->port = eth_dev->data->port_id;
}
@@ -261,15 +255,14 @@ new_device(struct virtio_net *dev)
vq = eth_dev->data->tx_queues[i];
if (vq == NULL)
continue;
-   vq->device = dev;
+   vq->vid = vid;
vq->internal = internal;
vq->port = eth_dev->data->port_id;
}

-   for (i = 0; i < rte_vhost_get_queue_num(dev->vid) * VIRTIO_QNUM; i++)
-   rte_vhost_enable_guest_notification(dev, i, 0);
+   for (i = 0; i < rte_vhost_get_queue_num(vid) * VIRTIO_QNUM; i++)
+   rte_vhost_enable_guest_notification(vid, i, 0);

-   dev->priv = eth_dev;
eth_dev->data->dev_link.link_status = ETH_LINK_UP;

for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
@@ -293,22 +286,19 @@ new_device(struct virtio_net *dev)
 }

 static void
-destroy_device(volatile struct virtio_net *dev)
+destroy_device(int vid)
 {
+   struct internal_list *list;
struct rte_eth_dev *eth_dev;
struct vhost_queue *vq;
unsigned i;

-   if (dev == NULL) {
-   RTE_LOG(INFO, PMD, "Invalid argument\n");
-   return;
-   }
-
-   eth_dev = (struct rte_eth_dev *)dev->priv;
-   if (eth_dev == NULL) {
-   RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
+   list = find_internal_resource(vid);
+   if (list == NULL) {
+   RTE_LOG(INFO, PMD, "Invalid vid %d\n", vid);
return;
}
+   eth_dev = list->eth_dev;

/* Wait until rx/tx_pkt_burst stops accessing vhost device */
for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
@@ -330,19 +320,17 @@ destroy_device(volatile struct virtio_net *dev)

[dpdk-dev] [PATCH 09/16] vhost: add few more functions

2016-05-02 Thread Yuanhan Liu

Add a few more functions to export a few more fields or pieces of
information from the virtio_net struct to applications, as we are going
to make the struct private. They include:

- rte_vhost_avail_entries
  It's actually a rename of "rte_vring_available_entries", with the
  "vring" to "vhost" name change to keep it consistent with the other
  functions.

- rte_vhost_get_queue_num
  Exports the "virt_qp_nb" field.

- rte_vhost_get_numa_node
  Exports the NUMA node from which the virtio-net device is allocated.
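
The accessor pattern described above can be sketched in isolation. This is
only an illustration under stated assumptions, not the vhost library code:
"struct device", "MAX_DEVICES", "device_register" and the "device_get_*"
names are hypothetical stand-ins, and only the failure values (0 for the
queue count, -1 for the NUMA node) follow the doc comments in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVICES 8	/* hypothetical table size for this sketch */

/* Private to the library: applications never see this layout. */
struct device {
	uint32_t queue_pairs;
	int numa_node;
};

/* Library-side table mapping a vid to its private device struct. */
static struct device *devices[MAX_DEVICES];

/* Translate a vid into the private struct; NULL if the vid is unknown. */
static struct device *get_device(int vid)
{
	if (vid < 0 || vid >= MAX_DEVICES)
		return NULL;
	return devices[vid];
}

/* Store a device in the first free slot and return its vid, or -1. */
int device_register(struct device *dev)
{
	for (int i = 0; i < MAX_DEVICES; i++) {
		if (devices[i] == NULL) {
			devices[i] = dev;
			return i;
		}
	}
	return -1;
}

/* Accessor: number of queue pairs, 0 on failure. */
uint32_t device_get_queue_num(int vid)
{
	struct device *dev = get_device(vid);

	return dev ? dev->queue_pairs : 0;
}

/* Accessor: NUMA node, -1 on failure. */
int device_get_numa_node(int vid)
{
	struct device *dev = get_device(vid);

	return dev ? dev->numa_node : -1;
}
```

Since callers only ever hold the integer, the fields of the private struct
can be reordered or extended without affecting them.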

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c | 18 -
 examples/vhost/main.c |  4 +--
 lib/librte_vhost/rte_virtio_net.h | 37 +++
 lib/librte_vhost/virtio-net.c | 54 +++
 4 files changed, 98 insertions(+), 15 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 290fd9e..9763cd4 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -33,9 +33,6 @@
 #include 
 #include 
 #include 
-#ifdef RTE_LIBRTE_VHOST_NUMA
-#include 
-#endif

 #include 
 #include 
@@ -228,7 +225,7 @@ new_device(struct virtio_net *dev)
struct vhost_queue *vq;
unsigned i;
 #ifdef RTE_LIBRTE_VHOST_NUMA
-   int newnode, ret;
+   int newnode;
 #endif

if (dev == NULL) {
@@ -247,14 +244,9 @@ new_device(struct virtio_net *dev)
internal->vid = dev->vid;

 #ifdef RTE_LIBRTE_VHOST_NUMA
-   ret  = get_mempolicy(&newnode, NULL, 0, dev,
-   MPOL_F_NODE | MPOL_F_ADDR);
-   if (ret < 0) {
-   RTE_LOG(ERR, PMD, "Unknown numa node\n");
-   return -1;
-   }
-
-   eth_dev->data->numa_node = newnode;
+   newnode = rte_vhost_get_numa_node(dev->vid);
+   if (newnode > 0)
+   eth_dev->data->numa_node = newnode;
 #endif

for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
@@ -274,7 +266,7 @@ new_device(struct virtio_net *dev)
vq->port = eth_dev->data->port_id;
}

-   for (i = 0; i < dev->virt_qp_nb * VIRTIO_QNUM; i++)
+   for (i = 0; i < rte_vhost_get_queue_num(dev->vid) * VIRTIO_QNUM; i++)
rte_vhost_enable_guest_notification(dev, i, 0);

dev->priv = eth_dev;
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index e395e4a..145fa6f 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1056,13 +1056,13 @@ drain_eth_rx(struct vhost_dev *vdev)
 * to diminish packet loss.
 */
if (enable_retry &&
-   unlikely(rx_count > rte_vring_available_entries(dev,
+   unlikely(rx_count > rte_vhost_avail_entries(vdev->vid,
VIRTIO_RXQ))) {
uint32_t retry;

for (retry = 0; retry < burst_rx_retry_num; retry++) {
rte_delay_us(burst_rx_delay_time);
-   if (rx_count <= rte_vring_available_entries(dev,
+   if (rx_count <= rte_vhost_avail_entries(vdev->vid,
VIRTIO_RXQ))
break;
}
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index bc64e89..27f6847 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -245,6 +245,43 @@ int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * cons
 int rte_vhost_driver_session_start(void);

 /**
+ * Get how many avail entries are left in the queue @queue_id.
+ *
+ * @param vid
+ *  virtio-net device ID
+ * @param queue_id
+ *  virtio queue index in mq case
+ *
+ * @return
+ *  num of avail entries left
+ */
+uint16_t rte_vhost_avail_entries(int vid, uint16_t queue_id);
+
+/**
+ * Get the number of queues the device supports.
+ *
+ * @param vid
+ *  virtio-net device ID
+ *
+ * @return
+ *  The number of queues, 0 on failure
+ */
+uint32_t rte_vhost_get_queue_num(int vid);
+
+/**
+ * Get the numa node from which the virtio net device's memory
+ * is allocated.
+ *
+ * @param vid
+ *  virtio-net device ID
+ *
+ * @return
+ *  The numa node, -1 on failure
+ */
+int rte_vhost_get_numa_node(int vid);
+
+
+/**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtual device. A packet
  * count is returned to indicate the number of packets that were succesfully
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index ea28090..6bf4d87 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -753,6 +753,60 @@ int rte_vhost_feature_disable(uint64_t feature_mask)
return 0;
 }

+uint16_t
+rte_vhost_avail_entries(int vid, uint16_t queue_id)
+{
+   struct virtio_net *dev;
+   struct vhost_virtqueue *vq;
+
+   dev = get_device(vid);
+   if (!dev)
+   return 0;
+
+   vq = dev->virtqueue[queue_id];
+

[dpdk-dev] [PATCH 08/16] vhost: query pmd internal by vid

2016-05-02 Thread Yuanhan Liu

Query the PMD internal by vid instead of "ifname", to avoid depending on
the virtio_net struct.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 63538c1..290fd9e 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -86,6 +86,7 @@ struct vhost_queue {
 };

 struct pmd_internal {
+   int vid;
char *dev_name;
char *iface_name;
uint16_t max_queues;
@@ -194,20 +195,17 @@ eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
 }

 static inline struct internal_list *
-find_internal_resource(char *ifname)
+find_internal_resource(int vid)
 {
int found = 0;
struct internal_list *list;
struct pmd_internal *internal;

-   if (ifname == NULL)
-   return NULL;
-
pthread_mutex_lock(_list_lock);

TAILQ_FOREACH(list, _list, next) {
internal = list->eth_dev->data->dev_private;
-   if (!strcmp(internal->iface_name, ifname)) {
+   if (internal->vid == vid) {
found = 1;
break;
}
@@ -238,14 +236,15 @@ new_device(struct virtio_net *dev)
return -1;
}

-   list = find_internal_resource(dev->ifname);
+   list = find_internal_resource(dev->vid);
if (list == NULL) {
-   RTE_LOG(INFO, PMD, "Invalid device name\n");
+   RTE_LOG(INFO, PMD, "Invalid vid %d\n", dev->vid);
return -1;
}

eth_dev = list->eth_dev;
internal = eth_dev->data->dev_private;
+   internal->vid = dev->vid;

 #ifdef RTE_LIBRTE_VHOST_NUMA
ret  = get_mempolicy(, NULL, 0, dev,
@@ -371,9 +370,9 @@ vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
return -1;
}

-   list = find_internal_resource(dev->ifname);
+   list = find_internal_resource(dev->vid);
if (list == NULL) {
-   RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
+   RTE_LOG(ERR, PMD, "Invalid vid %d\n", dev->vid);
return -1;
}

@@ -884,7 +883,7 @@ rte_pmd_vhost_devuninit(const char *name)
if (internal == NULL)
return -ENODEV;

-   list = find_internal_resource(internal->iface_name);
+   list = find_internal_resource(internal->vid);
if (list == NULL)
return -ENODEV;

-- 
1.9.0

[dpdk-dev] [PATCH 04/16] example/vhost: make a copy of virtio device id

2016-05-02 Thread Yuanhan Liu

Make a copy of virtio device id (device_fh) from the virtio_net struct,
so that we could have less dependency on the virtio_net struct.

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 59 ---
 examples/vhost/main.h |  1 +
 2 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 23bfe09..7273897 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -709,7 +709,6 @@ static int
 link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
 {
struct ether_hdr *pkt_hdr;
-   struct virtio_net *dev = vdev->dev;
int i, ret;

/* Learn MAC address of guest device from packet */
@@ -718,7 +717,7 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
if (find_vhost_dev(&pkt_hdr->s_addr)) {
RTE_LOG(ERR, VHOST_DATA,
"(%d) device is using a registered MAC!\n",
-   dev->device_fh);
+   vdev->device_fh);
return -1;
}

@@ -726,12 +725,12 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
vdev->mac_address.addr_bytes[i] = pkt_hdr->s_addr.addr_bytes[i];

/* vlan_tag currently uses the device_id. */
-   vdev->vlan_tag = vlan_tags[dev->device_fh];
+   vdev->vlan_tag = vlan_tags[vdev->device_fh];

/* Print out VMDQ registration info. */
RTE_LOG(INFO, VHOST_DATA,
"(%d) mac %02x:%02x:%02x:%02x:%02x:%02x and vlan %d registered\n",
-   dev->device_fh,
+   vdev->device_fh,
vdev->mac_address.addr_bytes[0], vdev->mac_address.addr_bytes[1],
vdev->mac_address.addr_bytes[2], vdev->mac_address.addr_bytes[3],
vdev->mac_address.addr_bytes[4], vdev->mac_address.addr_bytes[5],
@@ -739,11 +738,11 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)

/* Register the MAC address. */
ret = rte_eth_dev_mac_addr_add(ports[0], &vdev->mac_address,
-   (uint32_t)dev->device_fh + vmdq_pool_base);
+   (uint32_t)vdev->device_fh + vmdq_pool_base);
if (ret)
RTE_LOG(ERR, VHOST_DATA,
"(%d) failed to add device MAC address to VMDQ\n",
-   dev->device_fh);
+   vdev->device_fh);

/* Enable stripping of the vlan tag as we handle routing. */
if (vlan_strip)
@@ -815,7 +814,6 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 {
struct ether_hdr *pkt_hdr;
struct vhost_dev *dst_vdev;
-   int fh;

pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

@@ -823,19 +821,19 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
if (!dst_vdev)
return -1;

-   fh = dst_vdev->dev->device_fh;
-   if (fh == vdev->dev->device_fh) {
+   if (vdev->device_fh == dst_vdev->device_fh) {
RTE_LOG(DEBUG, VHOST_DATA,
"(%d) TX: src and dst MAC is same. Dropping packet.\n",
-   fh);
+   vdev->device_fh);
return 0;
}

-   RTE_LOG(DEBUG, VHOST_DATA, "(%d) TX: MAC address is local\n", fh);
+   RTE_LOG(DEBUG, VHOST_DATA,
+   "(%d) TX: MAC address is local\n", dst_vdev->device_fh);

if (unlikely(dst_vdev->remove)) {
RTE_LOG(DEBUG, VHOST_DATA,
-   "(%d) device is marked for removal\n", fh);
+   "(%d) device is marked for removal\n", dst_vdev->device_fh);
return 0;
}

@@ -848,7 +846,7 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
  * and get its vlan tag, and offset if it is.
  */
 static inline int __attribute__((always_inline))
-find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
+find_local_dest(struct vhost_dev *vdev, struct rte_mbuf *m,
uint32_t *offset, uint16_t *vlan_tag)
 {
struct vhost_dev *dst_vdev;
@@ -858,10 +856,10 @@ find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
if (!dst_vdev)
return 0;

-   if (dst_vdev->dev->device_fh == dev->device_fh) {
+   if (vdev->device_fh == dst_vdev->device_fh) {
RTE_LOG(DEBUG, VHOST_DATA,
"(%d) TX: src and dst MAC is same. Dropping packet.\n",
-   dst_vdev->dev->device_fh);
+   vdev->device_fh);
return -1;
}

@@ -871,11 +869,11 @@ find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
 * the packet length by plus it.
 */
*offset  = VLAN_HLEN;
-   *vlan_tag = vlan_tags[(uint16_t)dst_vdev->dev->device_fh];
+   *vlan_tag = vlan_tags[vdev->device_fh];

RTE_LOG(DEBUG, VHOST_DATA,
-   "(%d) TX: pkt to local VM device id (%d) vlan tag: %u.\n",
-

[dpdk-dev] [PATCH 03/16] vhost: declare device_fh as int

2016-05-02 Thread Yuanhan Liu

First, "int" is big enough: we don't need 64 bits to represent a
virtio-net device id.

Second, this lets us avoid the ugly "%" PRIu64 ".." stuff.

And since ctx.fh is derived from device_fh, declare it as int, too.
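
To illustrate the second point, here is a small standalone sketch (not
taken from the patch) that formats the same id both ways. The function
names and the message text are hypothetical. It only shows that the
"%d" form needs no <inttypes.h> macro spliced into the format string.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit id: the format string has to be split around PRIu64. */
int format_id_u64(char *buf, size_t len, uint64_t id)
{
	return snprintf(buf, len, "(%" PRIu64 ") device added", id);
}

/* int id: a plain "%d" is enough and the string stays in one piece. */
int format_id_int(char *buf, size_t len, int id)
{
	return snprintf(buf, len, "(%d) device added", id);
}
```

Both produce identical text for small ids; only the source readability
differs.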

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 45 ++-
 lib/librte_vhost/rte_virtio_net.h |  2 +-
 lib/librte_vhost/vhost-net.h  |  8 ++---
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 34 ++--
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |  7 ++---
 lib/librte_vhost/vhost_rxtx.c | 34 +---
 lib/librte_vhost/vhost_user/virtio-net-user.c |  2 +-
 lib/librte_vhost/virtio-net.c | 21 ++---
 8 files changed, 74 insertions(+), 79 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 93f9994..23bfe09 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -717,7 +717,7 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)

if (find_vhost_dev(&pkt_hdr->s_addr)) {
RTE_LOG(ERR, VHOST_DATA,
-   "Device (%" PRIu64 ") is using a registered MAC!\n",
+   "(%d) device is using a registered MAC!\n",
dev->device_fh);
return -1;
}
@@ -729,7 +729,8 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
vdev->vlan_tag = vlan_tags[dev->device_fh];

/* Print out VMDQ registration info. */
-   RTE_LOG(INFO, VHOST_DATA, "(%"PRIu64") MAC_ADDRESS %02x:%02x:%02x:%02x:%02x:%02x and VLAN_TAG %d registered\n",
+   RTE_LOG(INFO, VHOST_DATA,
+   "(%d) mac %02x:%02x:%02x:%02x:%02x:%02x and vlan %d registered\n",
dev->device_fh,
vdev->mac_address.addr_bytes[0], vdev->mac_address.addr_bytes[1],
vdev->mac_address.addr_bytes[2], vdev->mac_address.addr_bytes[3],
@@ -740,8 +741,9 @@ link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
ret = rte_eth_dev_mac_addr_add(ports[0], &vdev->mac_address,
(uint32_t)dev->device_fh + vmdq_pool_base);
if (ret)
-   RTE_LOG(ERR, VHOST_DATA, "(%"PRIu64") Failed to add device MAC address to VMDQ\n",
-   dev->device_fh);
+   RTE_LOG(ERR, VHOST_DATA,
+   "(%d) failed to add device MAC address to VMDQ\n",
+   dev->device_fh);

/* Enable stripping of the vlan tag as we handle routing. */
if (vlan_strip)
@@ -813,7 +815,7 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 {
struct ether_hdr *pkt_hdr;
struct vhost_dev *dst_vdev;
-   uint64_t fh;
+   int fh;

pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

@@ -824,17 +826,16 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
fh = dst_vdev->dev->device_fh;
if (fh == vdev->dev->device_fh) {
RTE_LOG(DEBUG, VHOST_DATA,
-   "(%" PRIu64 ") TX: src and dst MAC is same. "
-   "Dropping packet.\n", fh);
+   "(%d) TX: src and dst MAC is same. Dropping packet.\n",
+   fh);
return 0;
}

-   RTE_LOG(DEBUG, VHOST_DATA,
-   "(%" PRIu64 ") TX: MAC address is local\n", fh);
+   RTE_LOG(DEBUG, VHOST_DATA, "(%d) TX: MAC address is local\n", fh);

if (unlikely(dst_vdev->remove)) {
-   RTE_LOG(DEBUG, VHOST_DATA, "(%" PRIu64 ") "
-   "Device is marked for removal\n", fh);
+   RTE_LOG(DEBUG, VHOST_DATA,
+   "(%d) device is marked for removal\n", fh);
return 0;
}

@@ -859,8 +860,8 @@ find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,

if (dst_vdev->dev->device_fh == dev->device_fh) {
RTE_LOG(DEBUG, VHOST_DATA,
-   "(%" PRIu64 ") TX: src and dst MAC is same. "
-   " Dropping packet.\n", dst_vdev->dev->device_fh);
+   "(%d) TX: src and dst MAC is same. Dropping packet.\n",
+   dst_vdev->dev->device_fh);
return -1;
}

@@ -873,8 +874,7 @@ find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
*vlan_tag = vlan_tags[(uint16_t)dst_vdev->dev->device_fh];

RTE_LOG(DEBUG, VHOST_DATA,
-   "(%" PRIu64 ") TX: pkt to local VM device id: (%" PRIu64 ") "
-   "vlan tag: %u.\n",
+   "(%d) TX: pkt to local VM device id (%d) vlan tag: %u.\n",
dev->device_fh, dst_vdev->dev->device_fh, *vlan_tag);

return 0;
@@ -965,8 +965,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
}
}

-   RTE_LOG(DEBUG, VHOST_DATA, "(%" PRIu64 ") TX: "
-   "MAC address is

[dpdk-dev] [PATCH 02/16] vhost: set/reset dev flags internally

2016-05-02 Thread Yuanhan Liu

It does not make sense to ask the application to set/unset the flag
VIRTIO_DEV_RUNNING (which is used internally only) in the new_device()/
destroy_device() callbacks.

Instead, it should be set after new_device() succeeds and reset before
destroy_device() is invoked, inside the vhost lib. This patch fixes it.
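
The intended ordering can be sketched outside of DPDK as follows. This is
a minimal illustration, not the library code: "DEV_RUNNING", "struct dev",
"struct dev_ops", "lib_start" and "lib_stop" are hypothetical stand-ins
for VIRTIO_DEV_RUNNING, virtio_net, virtio_net_device_ops and the call
sites touched by this patch. The three callbacks exist only to make the
behaviour observable.

```c
#include <assert.h>

#define DEV_RUNNING 0x1u	/* hypothetical stand-in for VIRTIO_DEV_RUNNING */

struct dev {
	unsigned int flags;
};

struct dev_ops {
	int  (*new_device)(struct dev *dev);
	void (*destroy_device)(struct dev *dev);
};

/* Library side: mark the device running only if the callback accepted it. */
void lib_start(struct dev *dev, const struct dev_ops *ops)
{
	if (dev->flags & DEV_RUNNING)
		return;
	if (ops->new_device(dev) == 0)
		dev->flags |= DEV_RUNNING;
}

/* Library side: clear the flag first, then tell the application. */
void lib_stop(struct dev *dev, const struct dev_ops *ops)
{
	if (dev->flags & DEV_RUNNING) {
		dev->flags &= ~DEV_RUNNING;
		ops->destroy_device(dev);
	}
}

/* Application side: none of these callbacks touches dev->flags. */
int destroy_calls;
unsigned int running_seen_by_destroy;

int accept_device(struct dev *dev)
{
	(void)dev;
	return 0;
}

int reject_device(struct dev *dev)
{
	(void)dev;
	return -1;
}

void record_destroy(struct dev *dev)
{
	destroy_calls++;
	running_seen_by_destroy = dev->flags & DEV_RUNNING;
}
```

With this split, a callback that fails leaves the device not running, and
the destroy callback always sees the flag already cleared.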

Signed-off-by: Yuanhan Liu 
---
 drivers/net/vhost/rte_eth_vhost.c |  2 --
 examples/vhost/main.c |  3 ---
 lib/librte_vhost/vhost_user/virtio-net-user.c | 11 +++
 lib/librte_vhost/virtio-net.c | 21 ++---
 4 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 310cbef..63538c1 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -278,7 +278,6 @@ new_device(struct virtio_net *dev)
for (i = 0; i < dev->virt_qp_nb * VIRTIO_QNUM; i++)
rte_vhost_enable_guest_notification(dev, i, 0);

-   dev->flags |= VIRTIO_DEV_RUNNING;
dev->priv = eth_dev;
eth_dev->data->dev_link.link_status = ETH_LINK_UP;

@@ -341,7 +340,6 @@ destroy_device(volatile struct virtio_net *dev)
eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;

dev->priv = NULL;
-   dev->flags &= ~VIRTIO_DEV_RUNNING;

for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
vq = eth_dev->data->rx_queues[i];
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index d3da41b..93f9994 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1180,8 +1180,6 @@ destroy_device (volatile struct virtio_net *dev)
struct vhost_dev *vdev;
int lcore;

-   dev->flags &= ~VIRTIO_DEV_RUNNING;
-
vdev = (struct vhost_dev *)dev->priv;
/*set the remove flag. */
vdev->remove = 1;
@@ -1258,7 +1256,6 @@ new_device (struct virtio_net *dev)
/* Disable notifications. */
rte_vhost_enable_guest_notification(dev, VIRTIO_RXQ, 0);
rte_vhost_enable_guest_notification(dev, VIRTIO_TXQ, 0);
-   dev->flags |= VIRTIO_DEV_RUNNING;

RTE_LOG(INFO, VHOST_DATA, "(%"PRIu64") Device has been added to data core %d\n", dev->device_fh, vdev->coreid);

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index f5248bc..e775e45 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -115,8 +115,10 @@ user_set_mem_table(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
return -1;

/* Remove from the data plane. */
-   if (dev->flags & VIRTIO_DEV_RUNNING)
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
notify_ops->destroy_device(dev);
+   }

if (dev->mem) {
free_mem_region(dev);
@@ -286,9 +288,10 @@ user_set_vring_kick(struct vhost_device_ctx ctx, struct VhostUserMsg *pmsg)
"vring kick idx:%d file:%d\n", file.index, file.fd);
vhost_set_vring_kick(ctx, &file);

-   if (virtio_is_ready(dev) &&
-   !(dev->flags & VIRTIO_DEV_RUNNING))
-   notify_ops->new_device(dev);
+   if (virtio_is_ready(dev) && !(dev->flags & VIRTIO_DEV_RUNNING)) {
+   if (notify_ops->new_device(dev) == 0)
+   dev->flags |= VIRTIO_DEV_RUNNING;
+   }
 }

 /*
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index f7c7215..5eea3be 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -296,8 +296,10 @@ vhost_destroy_device(struct vhost_device_ctx ctx)
if (dev == NULL)
return;

-   if (dev->flags & VIRTIO_DEV_RUNNING)
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
notify_ops->destroy_device(dev);
+   }

cleanup_device(dev, 1);
free_device(dev);
@@ -352,8 +354,10 @@ vhost_reset_owner(struct vhost_device_ctx ctx)
if (dev == NULL)
return -1;

-   if (dev->flags & VIRTIO_DEV_RUNNING)
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   dev->flags &= ~VIRTIO_DEV_RUNNING;
notify_ops->destroy_device(dev);
+   }

cleanup_device(dev, 0);
reset_device(dev);
@@ -718,12 +722,15 @@ vhost_set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
if (dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED &&
dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) {
-   return notify_ops->new_device(dev);
+   if (notify_ops->new_device(dev) < 0)
+   return -1;
+   dev->flags |= VIRTIO_DEV_RUNNING;
}
-   /* Otherwise we

[dpdk-dev] [PATCH 01/16] vhost: declare backend with int type

2016-05-02 Thread Yuanhan Liu

It's an fd, so define it as "int", which also saves the unnecessary
(int) cast.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/rte_virtio_net.h | 2 +-
 lib/librte_vhost/virtio-net.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 600b20b..4d25f79 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -85,7 +85,7 @@ struct vhost_virtqueue {
struct vring_avail  *avail; /**< Virtqueue available ring. */
struct vring_used   *used;  /**< Virtqueue used ring. */
uint32_t    size;   /**< Size of descriptor ring. */
-   uint32_t    backend;    /**< Backend value to determine if device should started/stopped. */
+   int backend;    /**< Backend value to determine if device should started/stopped. */
uint16_t    vhost_hlen; /**< Vhost header length (varies depending on RX merge buffers. */
volatile uint16_t   last_used_idx;  /**< Last index used on the available ring */
volatile uint16_t   last_used_idx_res;  /**< Used for multiple devices reserving buffers. */
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index d870ad9..f7c7215 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -716,8 +716,8 @@ vhost_set_backend(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 * we add the device.
 */
if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
-   if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED) &&
-   ((int)dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED)) {
+   if (dev->virtqueue[VIRTIO_TXQ]->backend != VIRTIO_DEV_STOPPED &&
+   dev->virtqueue[VIRTIO_RXQ]->backend != VIRTIO_DEV_STOPPED) {
return notify_ops->new_device(dev);
}
/* Otherwise we remove it. */
-- 
1.9.0

[dpdk-dev] [PATCH 00/16] vhost ABI/API refactoring

2016-05-02 Thread Yuanhan Liu

Every time we introduce a new feature to vhost, we are likely
to break ABI. Moreover, some cleanups (such as the one from Ilya
to remove vec_buf from vhost_virtqueue struct) also break ABI.

This patch set is meant to resolve the above issue for good, by
hiding the virtio_net structure (as well as a few others) internally,
and exposing the virtio_net dev struct to applications by a number,
vid, the way the kernel exposes an fd to user space.
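
To illustrate the idea only (this is not the code from the series), here is
a minimal sketch of hiding a struct behind an integer handle. All names
(demo_dev, demo_dev_new, MAX_DEVICES and so on) are hypothetical and exist
only for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVICES 8	/* hypothetical table size */

/* Private to the library: applications never see this layout. */
struct demo_dev {
	int      in_use;
	uint64_t features;
};

static struct demo_dev dev_table[MAX_DEVICES];

/* Allocate a slot and return its index (the "vid"), or -1 if full. */
int demo_dev_new(void)
{
	for (int vid = 0; vid < MAX_DEVICES; vid++) {
		if (!dev_table[vid].in_use) {
			dev_table[vid].in_use = 1;
			dev_table[vid].features = 0;
			return vid;
		}
	}
	return -1;
}

/* Translate a vid back to the private struct; NULL if it is invalid. */
static struct demo_dev *get_dev(int vid)
{
	if (vid < 0 || vid >= MAX_DEVICES || !dev_table[vid].in_use)
		return NULL;
	return &dev_table[vid];
}

/* Public accessors take the vid, never a struct pointer. */
int demo_dev_set_features(int vid, uint64_t features)
{
	struct demo_dev *dev = get_dev(vid);

	if (dev == NULL)
		return -1;
	dev->features = features;
	return 0;
}

int demo_dev_get_features(int vid, uint64_t *features)
{
	struct demo_dev *dev = get_dev(vid);

	assert(features != NULL);
	if (dev == NULL)
		return -1;
	*features = dev->features;
	return 0;
}
```

Since callers only ever hold the number, fields can be added to or removed
from the private struct without changing what the application was compiled
against.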

Back to the patch set: the first part makes some
changes to the vhost example, vhost-pmd and vhost, bit by bit, to
remove the dependence on the "virtio_net" struct. It then makes the
final change to adapt the current APIs to use "vid".

After that, "virtio_net_device_ops" is the only open struct left
that an application can access; therefore, it's the only place
that might introduce ABI breakage in future
extensions. Hence, I reserved a few more (5) spaces, to
make sure we will not break ABI for a long time and, hopefully,
forever.
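
As a rough sketch of what such a reservation can look like (the struct and
function names below are hypothetical, and only the count of 5 comes from
the text above), an ops table can carry spare pointer slots so that a new
callback can later take over a slot without changing the struct size:

```c
#include <stddef.h>

/* Hypothetical callback table; not the actual layout from the series. */
struct demo_device_ops {
	int  (*new_device)(int vid);
	void (*destroy_device)(int vid);
	void *reserved[5];	/* spare slots for future callbacks */
};

/* Treat a NULL reserved slot as "this extension is not provided". */
int demo_ops_has_extension(const struct demo_device_ops *ops, size_t slot)
{
	if (slot >= sizeof(ops->reserved) / sizeof(ops->reserved[0]))
		return 0;
	return ops->reserved[slot] != NULL;
}
```

Applications that zero the struct before filling it in keep working when a
reserved slot is given a meaning later, because a NULL slot still reads as
"not provided".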

The last bit of this patch set is some cleanups, including the
one from Ilya.

Note that this refactoring breaks the tep_termination example.
Well, it's just another copy of the original messy vhost example,
and I have no interest in cleaning it up again. Therefore, I might
consider removing that example later and adding the vxlan bits
to the vhost example.

A few more TODOs: update the release notes, update the lib version, update
version.map

Thanks.

--yliu

---
Ilya Maximets (1):
  vhost: make buf vector for scatter Rx local

Yuanhan Liu (15):
  vhost: declare backend with int type
  vhost: set/reset dev flags internally
  vhost: declare device_fh as int
  example/vhost: make a copy of virtio device id
  vhost: rename device_fh to vid
  vhost: get device by vid only
  vhost: move vhost_device_ctx to cuse
  vhost: query pmd internal by vid
  vhost: add few more functions
  vhost: export vid as the only interface to applications
  vhost: hide internal structs/macros/functions
  vhost: remove unnecessary fields
  vhost: remove virtio-net.h
  vhost: reserve few more space for future extension
  vhost: per device vhost_hlen

 drivers/net/vhost/rte_eth_vhost.c |  86 ---
 examples/vhost/main.c | 126 ---
 examples/vhost/main.h |   1 +
 lib/librte_vhost/rte_virtio_net.h | 197 ++--
 lib/librte_vhost/vhost-net.h  | 195 +++
 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  |  83 +-
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |  30 ++--
 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  12 +-
 lib/librte_vhost/vhost_rxtx.c | 133 
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  53 +++
 lib/librte_vhost/vhost_user/virtio-net-user.c |  64 
 lib/librte_vhost/vhost_user/virtio-net-user.h |  18 +--
 lib/librte_vhost/virtio-net.c | 213 --
 lib/librte_vhost/virtio-net.h |  43 --
 14 files changed, 644 insertions(+), 610 deletions(-)
 delete mode 100644 lib/librte_vhost/virtio-net.h

-- 
1.9.0

[dpdk-dev] [PATCH] lpm6: fix assigned value is garbage or undefined

2016-05-02 Thread Thomas Monjalon

2016-04-27 17:07, Daniel Mrzyglod:
> Fix issue reported by clang scan-build
> 
> Value of pointer tbl_next was uninitialized. When function lookup_step()
> take else branch it may provide garbage into tbl = tbl_next;
> 
> Fixes: 5c510e13a9cb ("lpm: add IPv6 support")
> 
> Signed-off-by: Daniel Mrzyglod 

Applied, thanks

[dpdk-dev] [PATCH] cfgfile: fix uninitialized scalar variable

2016-05-02 Thread Thomas Monjalon

> > CID 13323:
> > Uninitialized scalar variable. Using uninitialized value
> > cfg->num_sections when calling rte_cfgfile_close.
> > 
> > Fixes: eaafbad419bf ("cfgfile: library to interpret config files")
> > 
> > Signed-off-by: Michal Kobylinski 
> 
> Acked-by: Cristian Dumitrescu 

Applied, thanks

[dpdk-dev] [PATCH] cfgfile: fix return value comment

2016-05-02 Thread Thomas Monjalon

> > Function rte_cfgfile_load can return NULL value, when something goes
> > wrong.
> > 
> > Signed-off-by: Dmitriy Yakovlev 
> Acked-by: Cristian Dumitrescu 

Applied, thanks

[dpdk-dev] [PATCH v2 8/8] examples/vhost: embed statistics into vhost_dev struct

2016-05-02 Thread Yuanhan Liu

Embed dev_statistics into the vhost_dev struct, which cleans up
the code a bit.

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 87 +++
 examples/vhost/main.h | 11 +--
 2 files changed, 40 insertions(+), 58 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 66d3bf2..d3da41b 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -217,15 +217,6 @@ struct mbuf_table lcore_tx_queue[RTE_MAX_LCORE];
 / US_PER_S * BURST_TX_DRAIN_US)
 #define VLAN_HLEN   4

-/* Per-device statistics struct */
-struct device_statistics {
-   uint64_t tx_total;
-   rte_atomic64_t rx_total_atomic;
-   uint64_t tx;
-   rte_atomic64_t rx_atomic;
-} __rte_cache_aligned;
-struct device_statistics dev_statistics[MAX_DEVICES];
-
 /*
  * Builds up the correct configuration for VMDQ VLAN pool map
  * according to the pool & queue limits.
@@ -799,17 +790,17 @@ unlink_vmdq(struct vhost_dev *vdev)
 }

 static inline void __attribute__((always_inline))
-virtio_xmit(struct virtio_net *dst_dev, struct virtio_net *src_dev,
+virtio_xmit(struct vhost_dev *dst_vdev, struct vhost_dev *src_vdev,
struct rte_mbuf *m)
 {
uint16_t ret;

-   ret = rte_vhost_enqueue_burst(dst_dev, VIRTIO_RXQ, &m, 1);
+   ret = rte_vhost_enqueue_burst(dst_vdev->dev, VIRTIO_RXQ, &m, 1);
if (enable_stats) {
-   rte_atomic64_inc(&dev_statistics[dst_dev->device_fh].rx_total_atomic);
-   rte_atomic64_add(&dev_statistics[dst_dev->device_fh].rx_atomic, ret);
-   dev_statistics[src_dev->device_fh].tx_total++;
-   dev_statistics[src_dev->device_fh].tx += ret;
+   rte_atomic64_inc(&dst_vdev->stats.rx_total_atomic);
+   rte_atomic64_add(&dst_vdev->stats.rx_atomic, ret);
+   src_vdev->stats.tx_total++;
+   src_vdev->stats.tx += ret;
}
 }

@@ -847,7 +838,7 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
return 0;
}

-   virtio_xmit(dst_vdev->dev, vdev->dev, m);
+   virtio_xmit(dst_vdev, vdev, m);
return 0;
 }

@@ -956,7 +947,7 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, 
uint16_t vlan_tag)
struct vhost_dev *vdev2;

TAILQ_FOREACH(vdev2, &vhost_dev_list, next) {
-   virtio_xmit(vdev2->dev, vdev->dev, m);
+   virtio_xmit(vdev2, vdev, m);
}
goto queue2nic;
}
@@ -1020,8 +1011,8 @@ queue2nic:

tx_q->m_table[tx_q->len++] = m;
if (enable_stats) {
-   dev_statistics[dev->device_fh].tx_total++;
-   dev_statistics[dev->device_fh].tx++;
+   vdev->stats.tx_total++;
+   vdev->stats.tx++;
}

if (unlikely(tx_q->len == MAX_PKT_BURST))
@@ -1082,10 +1073,8 @@ drain_eth_rx(struct vhost_dev *vdev)
enqueue_count = rte_vhost_enqueue_burst(dev, VIRTIO_RXQ,
pkts, rx_count);
if (enable_stats) {
-   uint64_t fh = dev->device_fh;
-
-   rte_atomic64_add(&dev_statistics[fh].rx_total_atomic, rx_count);
-   rte_atomic64_add(&dev_statistics[fh].rx_atomic, enqueue_count);
+   rte_atomic64_add(&vdev->stats.rx_total_atomic, rx_count);
+   rte_atomic64_add(&vdev->stats.rx_atomic, enqueue_count);
}

free_pkts(pkts, rx_count);
@@ -1266,9 +1255,6 @@ new_device (struct virtio_net *dev)
TAILQ_INSERT_TAIL(&lcore_info[vdev->coreid].vdev_list, vdev, next);
lcore_info[vdev->coreid].device_num++;

-   /* Initialize device stats */
-   memset(&dev_statistics[dev->device_fh], 0, sizeof(struct 
device_statistics));
-
/* Disable notifications. */
rte_vhost_enable_guest_notification(dev, VIRTIO_RXQ, 0);
rte_vhost_enable_guest_notification(dev, VIRTIO_TXQ, 0);
@@ -1299,7 +1285,6 @@ print_stats(void)
struct vhost_dev *vdev;
uint64_t tx_dropped, rx_dropped;
uint64_t tx, tx_total, rx, rx_total;
-   uint32_t device_fh;
const char clr[] = { 27, '[', '2', 'J', '\0' };
const char top_left[] = { 27, '[', '1', ';', '1', 'H','\0' };

@@ -1307,37 +1292,32 @@ print_stats(void)
sleep(enable_stats);

/* Clear screen and move to top left */
-   printf("%s%s", clr, top_left);
-
-   printf("\nDevice statistics 
");
+   printf("%s%s\n", clr, top_left);
+   printf("Device statistics =\n");

TAILQ_FOREACH(vdev, &vhost_dev_list, next) {
-   device_fh = vdev->dev->device_fh;
-   tx_total = dev_statistics[device_fh].tx_total;
-   tx = dev_statistics[device_fh].tx;
+   tx_total   = vdev->stats.tx_total;
+

[dpdk-dev] [PATCH v2 7/8] examples/vhost: switch_worker cleanup

2016-05-02 Thread Yuanhan Liu

switch_worker() is the last piece of messy code that touches the
virtio/vhost device.

Clean it up here, so that the later vhost ABI refactoring will be
less painful.

The cleanup is straightforward: break long lines and move some code into
functions. Lastly, add a few comments to switch_worker().
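
One piece that gets moved into its own function is the timed drain of the
TX table. A standalone sketch of that timeout arithmetic follows; the clock
frequency is passed in as a plain parameter here instead of being read from
the TSC, and the function names are made up for this example:

```c
#include <stdint.h>

#define US_PER_S          1000000ULL
#define BURST_TX_DRAIN_US 100ULL	/* drain roughly every 100us */

/* Cycles in BURST_TX_DRAIN_US microseconds, rounding cycles-per-us up. */
uint64_t drain_threshold_cycles(uint64_t tsc_hz)
{
	return (tsc_hz + US_PER_S - 1) / US_PER_S * BURST_TX_DRAIN_US;
}

/*
 * Return 1 (and remember the timestamp) when a non-empty queue has
 * waited longer than the threshold, 0 otherwise.
 */
int drain_due(uint64_t now, uint64_t *prev, uint64_t threshold,
	      unsigned int queued)
{
	if (queued == 0)
		return 0;
	if (now - *prev > threshold) {
		*prev = now;
		return 1;
	}
	return 0;
}
```

With a 2 GHz clock this gives 2000 cycles per microsecond, so the threshold
is 200000 cycles.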

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 253 +++---
 1 file changed, 136 insertions(+), 117 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index dbb42ee..66d3bf2 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -213,6 +213,8 @@ struct mbuf_table {
 /* TX queue for each data core. */
 struct mbuf_table lcore_tx_queue[RTE_MAX_LCORE];

+#define MBUF_TABLE_DRAIN_TSC   ((rte_get_tsc_hz() + US_PER_S - 1) \
+/ US_PER_S * BURST_TX_DRAIN_US)
 #define VLAN_HLEN   4

 /* Per-device statistics struct */
@@ -915,16 +917,35 @@ static void virtio_tx_offload(struct rte_mbuf *m)
tcp_hdr->cksum = get_psd_sum(l3_hdr, m->ol_flags);
 }

+static inline void
+free_pkts(struct rte_mbuf **pkts, uint16_t n)
+{
+   while (n--)
+   rte_pktmbuf_free(pkts[n]);
+}
+
+static inline void __attribute__((always_inline))
+do_drain_mbuf_table(struct mbuf_table *tx_q)
+{
+   uint16_t count;
+
+   count = rte_eth_tx_burst(ports[0], tx_q->txq_id,
+tx_q->m_table, tx_q->len);
+   if (unlikely(count < tx_q->len))
+   free_pkts(&tx_q->m_table[count], tx_q->len - count);
+
+   tx_q->len = 0;
+}
+
 /*
- * This function routes the TX packet to the correct interface. This may be a 
local device
- * or the physical port.
+ * This function routes the TX packet to the correct interface. This
+ * may be a local device or the physical port.
  */
 static inline void __attribute__((always_inline))
 virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, uint16_t vlan_tag)
 {
struct mbuf_table *tx_q;
-   struct rte_mbuf **m_table;
-   unsigned len, ret, offset = 0;
+   unsigned offset = 0;
const uint16_t lcore_id = rte_lcore_id();
struct virtio_net *dev = vdev->dev;
struct ether_hdr *nh;
@@ -960,7 +981,6 @@ queue2nic:

/*Add packet to the port tx queue*/
tx_q = &lcore_tx_queue[lcore_id];
-   len = tx_q->len;

nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
if (unlikely(nh->ether_type == rte_cpu_to_be_16(ETHER_TYPE_VLAN))) {
@@ -998,55 +1018,130 @@ queue2nic:
if (m->ol_flags & PKT_TX_TCP_SEG)
virtio_tx_offload(m);

-   tx_q->m_table[len] = m;
-   len++;
+   tx_q->m_table[tx_q->len++] = m;
if (enable_stats) {
dev_statistics[dev->device_fh].tx_total++;
dev_statistics[dev->device_fh].tx++;
}

-   if (unlikely(len == MAX_PKT_BURST)) {
-   m_table = (struct rte_mbuf **)tx_q->m_table;
-   ret = rte_eth_tx_burst(ports[0], (uint16_t)tx_q->txq_id, 
m_table, (uint16_t) len);
-   /* Free any buffers not handled by TX and update the port 
stats. */
-   if (unlikely(ret < len)) {
-   do {
-   rte_pktmbuf_free(m_table[ret]);
-   } while (++ret < len);
+   if (unlikely(tx_q->len == MAX_PKT_BURST))
+   do_drain_mbuf_table(tx_q);
+}
+
+
+static inline void __attribute__((always_inline))
+drain_mbuf_table(struct mbuf_table *tx_q)
+{
+   static uint64_t prev_tsc;
+   uint64_t cur_tsc;
+
+   if (tx_q->len == 0)
+   return;
+
+   cur_tsc = rte_rdtsc();
+   if (unlikely(cur_tsc - prev_tsc > MBUF_TABLE_DRAIN_TSC)) {
+   prev_tsc = cur_tsc;
+
+   RTE_LOG(DEBUG, VHOST_DATA,
+   "TX queue drained after timeout with burst size %u\n",
+   tx_q->len);
+   do_drain_mbuf_table(tx_q);
+   }
+}
+
+static inline void __attribute__((always_inline))
+drain_eth_rx(struct vhost_dev *vdev)
+{
+   uint16_t rx_count, enqueue_count;
+   struct virtio_net *dev = vdev->dev;
+   struct rte_mbuf *pkts[MAX_PKT_BURST];
+
+   rx_count = rte_eth_rx_burst(ports[0], vdev->vmdq_rx_q,
+   pkts, MAX_PKT_BURST);
+   if (!rx_count)
+   return;
+
+   /*
+* When "enable_retry" is set, here we wait and retry when there
+* is no enough free slots in the queue to hold @rx_count packets,
+* to diminish packet loss.
+*/
+   if (enable_retry &&
+   unlikely(rx_count > rte_vring_available_entries(dev,
+   VIRTIO_RXQ))) {
+   uint32_t retry;
+
+   for (retry = 0; retry < burst_rx_retry_num; retry++) {
+   rte_delay_us(burst_rx_delay_time);
+   if (rx_count <= rte_vring_available_entries(dev,
+

[dpdk-dev] [PATCH v2 6/8] examples/vhost: fix mbuf allocation failure

2016-05-02 Thread Yuanhan Liu

It has always been a mystery (at least to me before) how many
mbufs are enough when creating an mbuf pool. While the current macro
NUM_MBUFS_PER_PORT gives you some insight, it's not that accurate:
it doesn't consider the case where we may receive a big packet, say 64K
when TSO is enabled.

We actually tried to fix it once before, with commit 5499c1fc9baa
("examples/vhost: fix mbuf allocation"), but that just worked around it
by enlarging the pool a bit so that the case described in the commit log
passes. So, while trying to fix it for good, I have been thinking about how
big is big enough, and which factors need to be considered to figure
out a proper value.

Therefore, here you are. I introduced a helper function to create
the mbuf pool and do the "how many mbufs are needed" calculation
there. I also added detailed comments on how the number is derived, to serve as
guidelines.
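
For readers who want the gist of the counting without DPDK at hand, here is
a simplified sketch. It is not the exact formula used in the patch; the
helper names are hypothetical, and it assumes a big packet is chained
across mbufs of a fixed data room:

```c
#include <stdint.h>

#define MAX_PKT_BURST 32

static uint32_t div_round_up(uint32_t a, uint32_t b)
{
	return (a + b - 1) / b;
}

/*
 * mbufs one switching core may hold at once: a full burst of the
 * largest packet, plus one RX ring worth, but never less than the
 * per-core mbuf cache.
 */
uint32_t mbufs_per_core(uint32_t max_pkt_size, uint32_t data_room,
			uint32_t nr_rx_desc, uint32_t nr_mbuf_cache)
{
	uint32_t per_pkt = div_round_up(max_pkt_size, data_room);
	uint32_t n = per_pkt * MAX_PKT_BURST + nr_rx_desc;

	return n > nr_mbuf_cache ? n : nr_mbuf_cache;
}

/* Pool size: RX rings of all queues plus the per-core share, per port. */
uint32_t mbufs_total(uint32_t nr_port, uint32_t nr_queues,
		     uint32_t nr_rx_desc, uint32_t nr_switch_core,
		     uint32_t per_core)
{
	return (nr_queues * nr_rx_desc + per_core * nr_switch_core) * nr_port;
}
```

For example, with a 64K maximum packet (TSO), a 2048-byte data room and 1024
RX descriptors, one core needs 32 * 32 + 1024 = 2048 mbufs, against
1 * 32 + 1024 = 1056 for a 1500-byte packet.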

Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
Fixes: 5499c1fc9baa ("examples/vhost: fix mbuf allocation")

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 79 ++-
 1 file changed, 59 insertions(+), 20 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 7448e4f..dbb42ee 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -62,14 +62,6 @@
 /* the maximum number of external ports supported */
 #define MAX_SUP_PORTS 1

-/*
- * Calculate the number of buffers needed per port
- */
-#define NUM_MBUFS_PER_PORT ((MAX_QUEUES*RTE_TEST_RX_DESC_DEFAULT) +
\
-   
(num_switching_cores*MAX_PKT_BURST) +   \
-   
(num_switching_cores*RTE_TEST_TX_DESC_DEFAULT) +\
-   
((num_switching_cores+1)*MBUF_CACHE_SIZE))
-
 #define MBUF_CACHE_SIZE128
 #define MBUF_DATA_SIZE RTE_MBUF_DEFAULT_BUF_SIZE

@@ -110,9 +102,6 @@ static uint32_t enabled_port_mask = 0;
 /* Promiscuous mode */
 static uint32_t promiscuous;

-/*Number of switching cores enabled*/
-static uint32_t num_switching_cores = 0;
-
 /* number of devices/queues to support*/
 static uint32_t num_queues = 0;
 static uint32_t num_devices;
@@ -1345,6 +1334,57 @@ sigint_handler(__rte_unused int signum)
 }

 /*
+ * While creating an mbuf pool, one key thing is to figure out how
+ * many mbuf entries is enough for our use. FYI, here are some
+ * guidelines:
+ *
+ * - Each rx queue would reserve @nr_rx_desc mbufs at queue setup stage
+ *
+ * - For each switch core (A CPU core does the packet switch), we need
+ *   also make some reservation for receiving the packets from virtio
+ *   Tx queue. How many is enough depends on the usage. It's normally
+ *   a simple calculation like following:
+ *
+ *   MAX_PKT_BURST * max packet size / mbuf size
+ *
+ *   So, we definitely need allocate more mbufs when TSO is enabled.
+ *
+ * - Similarly, for each switching core, we should serve @nr_rx_desc
+ *   mbufs for receiving the packets from physical NIC device.
+ *
+ * - We also need make sure, for each switch core, we have allocated
+ *   enough mbufs to fill up the mbuf cache.
+ */
+static void
+create_mbuf_pool(uint16_t nr_port, uint32_t nr_switch_core, uint32_t mbuf_size,
+   uint32_t nr_queues, uint32_t nr_rx_desc, uint32_t nr_mbuf_cache)
+{
+   uint32_t nr_mbufs;
+   uint32_t nr_mbufs_per_core;
+   uint32_t mtu = 1500;
+
+   if (mergeable)
+   mtu = 9000;
+   if (enable_tso)
+   mtu = 64 * 1024;
+
+   nr_mbufs_per_core  = (mtu + mbuf_size) * MAX_PKT_BURST /
+   (mbuf_size - RTE_PKTMBUF_HEADROOM) * MAX_PKT_BURST;
+   nr_mbufs_per_core += nr_rx_desc;
+   nr_mbufs_per_core  = RTE_MAX(nr_mbufs_per_core, nr_mbuf_cache);
+
+   nr_mbufs  = nr_queues * nr_rx_desc;
+   nr_mbufs += nr_mbufs_per_core * nr_switch_core;
+   nr_mbufs *= nr_port;
+
+   mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", nr_mbufs,
+   nr_mbuf_cache, 0, mbuf_size,
+   rte_socket_id());
+   if (mbuf_pool == NULL)
+   rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
+}
+
+/*
  * Main function, does initialisation and calls the per-lcore functions. The 
CUSE
  * device is also registered here to handle the IOCTLs.
  */
@@ -1381,9 +1421,6 @@ main(int argc, char *argv[])
if (rte_lcore_count() > RTE_MAX_LCORE)
rte_exit(EXIT_FAILURE,"Not enough cores\n");

-   /*set the number of swithcing cores available*/
-   num_switching_cores = rte_lcore_count()-1;
-
/* Get the number of physical ports. */
nb_ports = rte_eth_dev_count();
if (nb_ports > RTE_MAX_ETHPORTS)
@@ -1401,12 +1438,14 @@ main(int argc, char *argv[])
return -1;
}

-   /* Create the mbuf pool. */
-   mbuf_pool =

[dpdk-dev] [PATCH v2 5/8] examples/vhost: handle broadcast packet

2016-05-02 Thread Yuanhan Liu

Every time I do a VM2VM iperf test with the vhost example, I have to set
the ARP table manually, as vhost-switch just ignores broadcast
packets, leaving the ARP request unserved.

Here we transmit a broadcast packet (such as an ARP request) to
every vhost device, as well as the physical port, to fix the above
ARP table issue.
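
A minimal, DPDK-free sketch of the flooding decision is given below. The
demo_port type and both function names are hypothetical; delivery is
modelled as a simple counter rather than an enqueue to a virtio ring:

```c
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* A destination of ff:ff:ff:ff:ff:ff is the Ethernet broadcast address. */
int is_broadcast_mac(const uint8_t mac[MAC_LEN])
{
	for (size_t i = 0; i < MAC_LEN; i++) {
		if (mac[i] != 0xff)
			return 0;
	}
	return 1;
}

struct demo_port {
	unsigned int rx_count;	/* frames delivered to this port */
};

/*
 * Deliver a broadcast frame to every port in the list and return how
 * many ports received it; a non-broadcast frame is left alone (0).
 */
size_t flood_if_broadcast(const uint8_t dst_mac[MAC_LEN],
			  struct demo_port *ports, size_t nr_ports)
{
	if (!is_broadcast_mac(dst_mac))
		return 0;

	for (size_t i = 0; i < nr_ports; i++)
		ports[i].rx_count++;
	return nr_ports;
}
```

A unicast frame returns 0 here, so the caller can fall through to its
normal lookup path.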

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index bfcabf3..7448e4f 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -807,6 +807,21 @@ unlink_vmdq(struct vhost_dev *vdev)
}
 }

+static inline void __attribute__((always_inline))
+virtio_xmit(struct virtio_net *dst_dev, struct virtio_net *src_dev,
+   struct rte_mbuf *m)
+{
+   uint16_t ret;
+
+   ret = rte_vhost_enqueue_burst(dst_dev, VIRTIO_RXQ, &m, 1);
+   if (enable_stats) {
+   rte_atomic64_inc(&dev_statistics[dst_dev->device_fh].rx_total_atomic);
+   rte_atomic64_add(&dev_statistics[dst_dev->device_fh].rx_atomic, ret);
+   dev_statistics[src_dev->device_fh].tx_total++;
+   dev_statistics[src_dev->device_fh].tx += ret;
+   }
+}
+
 /*
  * Check if the packet destination MAC address is for a local device. If so 
then put
  * the packet on that devices RX queue. If not then return.
@@ -815,7 +830,6 @@ static inline int __attribute__((always_inline))
 virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 {
struct ether_hdr *pkt_hdr;
-   uint64_t ret = 0;
struct vhost_dev *dst_vdev;
uint64_t fh;

@@ -842,15 +856,7 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
return 0;
}

-   /* send the packet to the local virtio device */
-   ret = rte_vhost_enqueue_burst(dst_vdev->dev, VIRTIO_RXQ, &m, 1);
-   if (enable_stats) {
-   rte_atomic64_inc(&dev_statistics[fh].rx_total_atomic);
-   rte_atomic64_add(&dev_statistics[fh].rx_atomic, ret);
-   dev_statistics[vdev->dev->device_fh].tx_total++;
-   dev_statistics[vdev->dev->device_fh].tx += ret;
-   }
-
+   virtio_xmit(dst_vdev->dev, vdev->dev, m);
return 0;
 }

@@ -934,6 +940,17 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
struct virtio_net *dev = vdev->dev;
struct ether_hdr *nh;

+
+   nh = rte_pktmbuf_mtod(m, struct ether_hdr *);
+   if (unlikely(is_broadcast_ether_addr(&nh->d_addr))) {
+   struct vhost_dev *vdev2;
+
+   TAILQ_FOREACH(vdev2, &vhost_dev_list, next) {
+   virtio_xmit(vdev2->dev, vdev->dev, m);
+   }
+   goto queue2nic;
+   }
+
/*check if destination is local VM*/
if ((vm2vm_mode == VM2VM_SOFTWARE) && (virtio_tx_local(vdev, m) == 0)) {
rte_pktmbuf_free(m);
@@ -950,6 +967,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, 
uint16_t vlan_tag)
RTE_LOG(DEBUG, VHOST_DATA, "(%" PRIu64 ") TX: "
"MAC address is external\n", dev->device_fh);

+queue2nic:
+
/*Add packet to the port tx queue*/
tx_q = &lcore_tx_queue[lcore_id];
len = tx_q->len;
-- 
1.9.3

[dpdk-dev] [PATCH v2 4/8] examples/vhost: use mac compare helper function directly

2016-05-02 Thread Yuanhan Liu

rte_ether.h already provides a helper function to compare MAC
addresses. No need to define our own; use it directly.
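
For comparison, a plain bytewise check is sketched below. It is not claimed
to be how the rte_ether.h helper is implemented, and the demo_ names are
hypothetical. Unlike a 64-bit load followed by a mask, it never reads past
the 6 address bytes:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 6-byte Ethernet address struct. */
struct demo_ether_addr {
	uint8_t addr_bytes[6];
};

/* Return 1 when both addresses are identical, 0 otherwise. */
int demo_same_ether_addr(const struct demo_ether_addr *a,
			 const struct demo_ether_addr *b)
{
	return memcmp(a->addr_bytes, b->addr_bytes,
		      sizeof(a->addr_bytes)) == 0;
}
```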

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 135a4a4..bfcabf3 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -104,9 +104,6 @@
 /* Maximum long option length for option parsing. */
 #define MAX_LONG_OPT_SZ 64

-/* Used to compare MAC addresses. */
-#define MAC_ADDR_CMP 0xFFFFFFFFFFFFULL
-
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;

@@ -708,15 +705,6 @@ static unsigned check_ports_num(unsigned nb_ports)
return valid_num_ports;
 }

-/*
- * Compares a packet destination MAC address to a device MAC address.
- */
-static inline int __attribute__((always_inline))
-ether_addr_cmp(struct ether_addr *ea, struct ether_addr *eb)
-{
-   return ((*(uint64_t *)ea ^ *(uint64_t *)eb) & MAC_ADDR_CMP) == 0;
-}
-
 static inline struct vhost_dev *__attribute__((always_inline))
 find_vhost_dev(struct ether_addr *mac)
 {
@@ -724,7 +712,7 @@ find_vhost_dev(struct ether_addr *mac)

TAILQ_FOREACH(vdev, _dev_list, next) {
if (vdev->ready == DEVICE_RX &&
-   ether_addr_cmp(mac, &vdev->mac_address))
+   is_same_ether_addr(mac, &vdev->mac_address))
return vdev;
}

-- 
1.9.3

[dpdk-dev] [PATCH v2 3/8] examples/vhost: use tailq to link vhost devices

2016-05-02 Thread Yuanhan Liu

To simplify code and logic.
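
For anyone unfamiliar with the tail queue macros, a small standalone sketch
of the pattern follows. It uses the TAILQ macros from <sys/queue.h> (a
BSD/glibc header, not ISO C); the demo_vdev type and the function names are
made up for this example:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/queue.h>

struct demo_vdev {
	uint8_t mac[6];
	int     ready;
	TAILQ_ENTRY(demo_vdev) next;	/* linkage lives inside the element */
};

TAILQ_HEAD(demo_vdev_list, demo_vdev);

static struct demo_vdev_list vdev_list = TAILQ_HEAD_INITIALIZER(vdev_list);

void demo_vdev_add(struct demo_vdev *vdev)
{
	TAILQ_INSERT_TAIL(&vdev_list, vdev, next);
}

void demo_vdev_remove(struct demo_vdev *vdev)
{
	TAILQ_REMOVE(&vdev_list, vdev, next);
}

/* Walk the list and return the first ready device with this MAC. */
struct demo_vdev *demo_vdev_find(const uint8_t mac[6])
{
	struct demo_vdev *vdev;

	TAILQ_FOREACH(vdev, &vdev_list, next) {
		if (vdev->ready && memcmp(vdev->mac, mac, 6) == 0)
			return vdev;
	}
	return NULL;
}
```

Because the linkage is embedded in the element, adding and removing a device
needs no separate list-node allocation or free-list bookkeeping.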

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 457 --
 examples/vhost/main.h |  32 ++--
 2 files changed, 126 insertions(+), 363 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 9445100..135a4a4 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -86,10 +86,6 @@
 #define DEVICE_RX  1
 #define DEVICE_SAFE_REMOVE 2

-/* Config_core_flag status definitions. */
-#define REQUEST_DEV_REMOVAL 1
-#define ACK_DEV_REMOVAL 0
-
 /* Configurable number of RX/TX ring descriptors */
 #define RTE_TEST_RX_DESC_DEFAULT 1024
 #define RTE_TEST_TX_DESC_DEFAULT 512
@@ -216,11 +212,9 @@ const uint16_t vlan_tags[] = {
 /* ethernet addresses of ports */
 static struct ether_addr vmdq_ports_eth_addr[RTE_MAX_ETHPORTS];

-/* heads for the main used and free linked lists for the data path. */
-static struct virtio_net_data_ll *ll_root_used = NULL;
-static struct virtio_net_data_ll *ll_root_free = NULL;
+static struct vhost_dev_tailq_list vhost_dev_list =
+   TAILQ_HEAD_INITIALIZER(vhost_dev_list);

-/* Array of data core structures containing information on individual core 
linked lists. */
 static struct lcore_info lcore_info[RTE_MAX_LCORE];

 /* Used for queueing bursts of TX packets. */
@@ -723,6 +717,20 @@ ether_addr_cmp(struct ether_addr *ea, struct ether_addr 
*eb)
return ((*(uint64_t *)ea ^ *(uint64_t *)eb) & MAC_ADDR_CMP) == 0;
 }

+static inline struct vhost_dev *__attribute__((always_inline))
+find_vhost_dev(struct ether_addr *mac)
+{
+   struct vhost_dev *vdev;
+
+   TAILQ_FOREACH(vdev, &vhost_dev_list, next) {
+   if (vdev->ready == DEVICE_RX &&
+   ether_addr_cmp(mac, &vdev->mac_address))
+   return vdev;
+   }
+
+   return NULL;
+}
+
 /*
  * This function learns the MAC address of the device and registers this along 
with a
  * vlan tag to a VMDQ.
@@ -731,21 +739,17 @@ static int
 link_vmdq(struct vhost_dev *vdev, struct rte_mbuf *m)
 {
struct ether_hdr *pkt_hdr;
-   struct virtio_net_data_ll *dev_ll;
struct virtio_net *dev = vdev->dev;
int i, ret;

/* Learn MAC address of guest device from packet */
pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

-   dev_ll = ll_root_used;
-
-   while (dev_ll != NULL) {
-   if (ether_addr_cmp(&(pkt_hdr->s_addr), 
&dev_ll->vdev->mac_address)) {
-   RTE_LOG(INFO, VHOST_DATA, "(%"PRIu64") WARNING: This 
device is using an existing MAC address and has not been registered.\n", 
dev->device_fh);
-   return -1;
-   }
-   dev_ll = dev_ll->next;
+   if (find_vhost_dev(&pkt_hdr->s_addr)) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "Device (%" PRIu64 ") is using a registered MAC!\n",
+   dev->device_fh);
+   return -1;
}

for (i = 0; i < ETHER_ADDR_LEN; i++)
@@ -822,60 +826,44 @@ unlink_vmdq(struct vhost_dev *vdev)
 static inline int __attribute__((always_inline))
 virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
 {
-   struct virtio_net_data_ll *dev_ll;
struct ether_hdr *pkt_hdr;
uint64_t ret = 0;
-   struct virtio_net *dev = vdev->dev;
-   struct virtio_net *tdev; /* destination virito device */
+   struct vhost_dev *dst_vdev;
+   uint64_t fh;

pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);

-   /*get the used devices list*/
-   dev_ll = ll_root_used;
+   dst_vdev = find_vhost_dev(&pkt_hdr->d_addr);
+   if (!dst_vdev)
+   return -1;
+
+   fh = dst_vdev->dev->device_fh;
+   if (fh == vdev->dev->device_fh) {
+   RTE_LOG(DEBUG, VHOST_DATA,
+   "(%" PRIu64 ") TX: src and dst MAC is same. "
+   "Dropping packet.\n", fh);
+   return 0;
+   }

-   while (dev_ll != NULL) {
-   if ((dev_ll->vdev->ready == DEVICE_RX) && 
ether_addr_cmp(&(pkt_hdr->d_addr),
- &dev_ll->vdev->mac_address)) {
+   RTE_LOG(DEBUG, VHOST_DATA,
+   "(%" PRIu64 ") TX: MAC address is local\n", fh);

-   /* Drop the packet if the TX packet is destined for the 
TX device. */
-   if (dev_ll->vdev->dev->device_fh == dev->device_fh) {
-   RTE_LOG(DEBUG, VHOST_DATA, "(%" PRIu64 ") TX: "
-   "Source and destination MAC addresses 
are the same. "
-   "Dropping packet.\n",
-

[dpdk-dev] [PATCH v2 2/8] examples/vhost: remove unused macro and struct

2016-05-02 Thread Yuanhan Liu

Interestingly, DESC_PER_CACHELINE has never been used since the
introduction of the vhost example. Remove it.

The references to the vlan_ethhdr struct and the VLAN_ETH_HLEN macro were
removed by commit 4d50b6acbd95 ("examples/vhost: adapt Tx routing to lib"),
but that commit forgot to remove the definitions.

Fixes: 4d50b6acbd95 ("examples/vhost: adapt Tx routing to lib")

Signed-off-by: Yuanhan Liu 
---
 examples/vhost/main.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 9452bab..9445100 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -111,9 +111,6 @@
 /* Used to compare MAC addresses. */
 #define MAC_ADDR_CMP 0xFFFFFFFFFFFFULL

-/* Number of descriptors per cacheline. */
-#define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))
-
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;

@@ -236,18 +233,7 @@ struct mbuf_table {
 /* TX queue for each data core. */
 struct mbuf_table lcore_tx_queue[RTE_MAX_LCORE];

-/* Vlan header struct used to insert vlan tags on TX. */
-struct vlan_ethhdr {
-   unsigned char   h_dest[ETH_ALEN];
-   unsigned char   h_source[ETH_ALEN];
-   __be16  h_vlan_proto;
-   __be16  h_vlan_TCI;
-   __be16  h_vlan_encapsulated_proto;
-};
-
-/* Header lengths. */
 #define VLAN_HLEN   4
-#define VLAN_ETH_HLEN   18

 /* Per-device statistics struct */
 struct device_statistics {
-- 
1.9.3

[dpdk-dev] [PATCH v2 1/8] examples/vhost: remove the non-working zero copy code

2016-05-02 Thread Yuanhan Liu

It's reported that zero copy has not been working for a long while. And
because the code is complex, it's better to redesign it than to fix it to
make it work again.

Signed-off-by: Yuanhan Liu 
---

v2: remove macro PRINT_PACKET; will not be used anymore
---
 doc/guides/sample_app_ug/vhost.rst |   36 +-
 examples/vhost/main.c  | 1497 +---
 examples/vhost/main.h  |   17 -
 3 files changed, 25 insertions(+), 1525 deletions(-)

diff --git a/doc/guides/sample_app_ug/vhost.rst 
b/doc/guides/sample_app_ug/vhost.rst
index 47ce36c..5f81802 100644
--- a/doc/guides/sample_app_ug/vhost.rst
+++ b/doc/guides/sample_app_ug/vhost.rst
@@ -491,39 +491,9 @@ The default value is 15.
  -- --rx-retry 1 --rx-retry-delay 20

 **Zero copy.**
-The zero copy option enables/disables the zero copy mode for RX/TX packet,
-in the zero copy mode the packet buffer address from guest translate into host 
physical address
-and then set directly as DMA address.
-If the zero copy mode is disabled, then one copy mode is utilized in the 
sample.
-This option is disabled by default.
-
-.. code-block:: console
-
-./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
- -- --zero-copy [0,1]
-
-**RX descriptor number.**
-The RX descriptor number option specify the Ethernet RX descriptor number,
-Linux legacy virtio-net has different behavior in how to use the vring 
descriptor from DPDK based virtio-net PMD,
-the former likely allocate half for virtio header, another half for frame 
buffer,
-while the latter allocate all for frame buffer,
-this lead to different number for available frame buffer in vring,
-and then lead to different Ethernet RX descriptor number could be used in zero 
copy mode.
-So it is valid only in zero copy mode is enabled. The value is 32 by default.
-
-.. code-block:: console
-
-./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
- -- --zero-copy 1 --rx-desc-num [0, n]
-
-**TX descriptor number.**
-The TX descriptor number option specify the Ethernet TX descriptor number, it 
is valid only in zero copy mode is enabled.
-The value is 64 by default.
-
-.. code-block:: console
-
-./vhost-switch -c f -n 4 --socket-mem 1024 --huge-dir /mnt/huge \
- -- --zero-copy 1 --tx-desc-num [0, n]
+Zero copy mode is removed, due to it has not been working for a while. And
+due to the large and complex code, it's better to redesign it than fixing
+it to make it work again. Hence, zero copy may be added back later.

 **VLAN strip.**
 The VLAN strip option enable/disable the VLAN strip on host, if disabled, the 
guest will receive the packets with VLAN tag.
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 78fd1ab..9452bab 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -73,15 +73,6 @@
 #define MBUF_CACHE_SIZE128
 #define MBUF_DATA_SIZE RTE_MBUF_DEFAULT_BUF_SIZE

-/*
- * No frame data buffer allocated from host are required for zero copy
- * implementation, guest will allocate the frame data buffer, and vhost
- * directly use it.
- */
-#define VIRTIO_DESCRIPTOR_LEN_ZCP  RTE_MBUF_DEFAULT_DATAROOM
-#define MBUF_DATA_SIZE_ZCP RTE_MBUF_DEFAULT_BUF_SIZE
-#define MBUF_CACHE_SIZE_ZCP 0
-
 #define MAX_PKT_BURST 32   /* Max burst size for RX/TX */
 #define BURST_TX_DRAIN_US 100  /* TX drain every ~100us */

@@ -103,25 +94,6 @@
 #define RTE_TEST_RX_DESC_DEFAULT 1024
 #define RTE_TEST_TX_DESC_DEFAULT 512

-/*
- * Need refine these 2 macros for legacy and DPDK based front end:
- * Max vring avail descriptor/entries from guest - MAX_PKT_BURST
- * And then adjust power 2.
- */
-/*
- * For legacy front end, 128 descriptors,
- * half for virtio header, another half for mbuf.
- */
-#define RTE_TEST_RX_DESC_DEFAULT_ZCP 32   /* legacy: 32, DPDK virt FE: 128. */
-#define RTE_TEST_TX_DESC_DEFAULT_ZCP 64   /* legacy: 64, DPDK virt FE: 64.  */
-
-/* Get first 4 bytes in mbuf headroom. */
-#define MBUF_HEADROOM_UINT32(mbuf) (*(uint32_t *)((uint8_t *)(mbuf) \
-   + sizeof(struct rte_mbuf)))
-
-/* true if x is a power of 2 */
-#define POWEROF2(x) ((((x)-1) & (x)) == 0)
-
 #define INVALID_PORT_ID 0xFF

 /* Max number of devices. Limited by vmdq. */
@@ -142,8 +114,6 @@
 /* Number of descriptors per cacheline. */
 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))

-#define MBUF_EXT_MEM(mb)   (rte_mbuf_from_indirect(mb) != (mb))
-
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;

@@ -157,29 +127,12 @@ static uint32_t num_switching_cores = 0;
 static uint32_t num_queues = 0;
 static uint32_t num_devices;

-/*
- * Enable zero copy, pkts buffer will directly dma to hw descriptor,
- * disabled on default.
- */
-static uint32_t zero_copy;
+static struct rte_mempool *mbuf_pool;
 static int mergeable;

 /* Do vlan strip on host, enabled on default */
 static uint32_t vlan_strip = 1;

-/* number of descriptors to apply*/
-static uint32_t num_rx_descriptor =

[dpdk-dev] [PATCH v2 0/8] vhost/example cleanup/fix

2016-05-02 Thread Yuanhan Liu

I'm starting to work on the vhost ABI refactoring, which means I also have
to touch the vhost example code. The vhost example code, however, is very
messy and full of __very__ long lines. That would make a later diff applying
the new vhost API very ugly and therefore unfriendly to review.
This is how this cleanup came about.

Besides that, there is one enhancement patch, which handles broadcast
packets so that we can relay the ARP request packet, making vhost-switch
behave more like a real switch. There is another patch that (hopefully)
fixes the mbuf allocation failure for good. I also added some guidelines
there as comments to show how to count how many mbuf entries are enough for
our usage.

In other words, an example is meant to be clean and simple, with good
coding style, so that people can grasp the usage easily. So, one way or
another, this patch set is good to have, even without the ABI refactoring
stuff.

Note that I'm going to apply it before the end of this week, if no objections.


v2: - some checkpatch fixes
    - cleaned up the code for device statistics

---
Yuanhan Liu (8):
  examples/vhost: remove the non-working zero copy code
  examples/vhost: remove unused macro and struct
  examples/vhost: use tailq to link vhost devices
  examples/vhost: use mac compare helper function directly
  examples/vhost: handle broadcast packet
  examples/vhost: fix mbuf allocation failure
  examples/vhost: switch_worker cleanup
  examples/vhost: embed statistics into vhost_dev struct

 doc/guides/sample_app_ug/vhost.rst |   36 +-
 examples/vhost/main.c  | 2394 ++--
 examples/vhost/main.h  |   56 +-
 3 files changed, 391 insertions(+), 2095 deletions(-)

-- 
1.9.3

[dpdk-dev] [PATCH] ethdev: make struct rte_eth_dev cache aligned

2016-05-02 Thread Jerin Jacob

Elements of struct rte_eth_dev are used in the fast path.
Make struct rte_eth_dev cache aligned to avoid the cases where
rte_eth_dev elements share the same cache line with other structures.

Signed-off-by: Jerin Jacob 
---
 lib/librte_ether/rte_ethdev.c | 2 +-
 lib/librte_ether/rte_ethdev.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a31018e..04f492d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -70,7 +70,7 @@
 #include "rte_ethdev.h"

 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
-struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
+struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] __rte_cache_aligned;
 static struct rte_eth_dev_data *rte_eth_dev_data;
 static uint8_t nb_ports;

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8519ff6..e359dda 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1630,7 +1630,7 @@ struct rte_eth_dev {
struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
uint8_t attached; /**< Flag indicating the port is attached */
enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
-};
+} __rte_cache_aligned;

 struct rte_eth_dev_sriov {
uint8_t active;   /**< SRIOV is active with 16, 32 or 64 
pools */
-- 
2.1.0
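
For readers unfamiliar with the effect of the attribute in the patch above, here is a minimal sketch of cache-line alignment in plain C. It is not DPDK code: `CACHE_LINE_SIZE`, `struct port_dev`, `port_devices` and `ports_are_cache_aligned` are hypothetical names, the 64-byte line size is an assumption, and the GCC/Clang `aligned` attribute stands in for `__rte_cache_aligned`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; DPDK takes the real value from RTE_CACHE_LINE_SIZE. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for a per-port structure read in the fast path. */
struct port_dev {
	void *rx_burst;
	void *tx_burst;
	uint8_t attached;
} __attribute__((aligned(CACHE_LINE_SIZE)));

#define MAX_PORTS 4

/* Aligning the type aligns the array start and pads every element. */
static struct port_dev port_devices[MAX_PORTS];

/* Returns 1 if every array element starts on a cache line boundary. */
static int ports_are_cache_aligned(void)
{
	for (size_t i = 0; i < MAX_PORTS; i++) {
		if ((uintptr_t)&port_devices[i] % CACHE_LINE_SIZE != 0)
			return 0;
	}
	return 1;
}
```

Because the alignment is attached to the type, sizeof(struct port_dev) is rounded up to a multiple of the line size, so neighbouring array elements and unrelated variables placed after the array do not share a line with a port's data.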

[dpdk-dev] [PATCH] ip_pipeline: configuration file parser cleanup

2016-05-02 Thread Jasvinder Singh

This patch updates the parsing routines for packet queues (pktq_in/out
fields in the PIPELINE section) and message queues (msgq_in/out fields
in the MSGQ section) specified in the ip_pipeline configuration file.

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/config_parse.c | 221 
 1 file changed, 45 insertions(+), 176 deletions(-)

diff --git a/examples/ip_pipeline/config_parse.c b/examples/ip_pipeline/config_parse.c
index ab79cd5..72e3d61 100644
--- a/examples/ip_pipeline/config_parse.c
+++ b/examples/ip_pipeline/config_parse.c
@@ -1128,61 +1128,29 @@ parse_pipeline_pcap_sink(struct app_params *app,
 static int
 parse_pipeline_pktq_in(struct app_params *app,
struct app_pipeline_params *p,
-   const char *value)
+   char *value)
 {
-   const char *next = value;
-   char *end;
-   char name[APP_PARAM_NAME_SIZE];
-   size_t name_len;

-   while (*next != '\0') {
+   while (1) {
enum app_pktq_in_type type;
int id;
-   char *end_space;
-   char *end_tab;
+   char *token = strtok_r(value, PARSE_DELIMITER, );

-   next = skip_white_spaces(next);
-   if (!next)
+   if (token == NULL)
break;

-   end_space = strchr(next, ' ');
-   end_tab = strchr(next, '\t');
-
-   if (end_space && (!end_tab))
-   end = end_space;
-   else if ((!end_space) && end_tab)
-   end = end_tab;
-   else if (end_space && end_tab)
-   end = RTE_MIN(end_space, end_tab);
-   else
-   end = NULL;
-
-   if (!end)
-   name_len = strlen(next);
-   else
-   name_len = end - next;
-
-   if (name_len == 0 || name_len == sizeof(name))
-   return -EINVAL;
-
-   strncpy(name, next, name_len);
-   name[name_len] = '\0';
-   next += name_len;
-   if (*next != '\0')
-   next++;
-
-   if (validate_name(name, "RXQ", 2) == 0) {
+   if (validate_name(token, "RXQ", 2) == 0) {
type = APP_PKTQ_IN_HWQ;
-   id = APP_PARAM_ADD(app->hwq_in_params, name);
-   } else if (validate_name(name, "SWQ", 1) == 0) {
+   id = APP_PARAM_ADD(app->hwq_in_params, token);
+   } else if (validate_name(token, "SWQ", 1) == 0) {
type = APP_PKTQ_IN_SWQ;
-   id = APP_PARAM_ADD(app->swq_params, name);
-   } else if (validate_name(name, "TM", 1) == 0) {
+   id = APP_PARAM_ADD(app->swq_params, token);
+   } else if (validate_name(token, "TM", 1) == 0) {
type = APP_PKTQ_IN_TM;
-   id = APP_PARAM_ADD(app->tm_params, name);
-   } else if (validate_name(name, "SOURCE", 1) == 0) {
+   id = APP_PARAM_ADD(app->tm_params, token);
+   } else if (validate_name(token, "SOURCE", 1) == 0) {
type = APP_PKTQ_IN_SOURCE;
-   id = APP_PARAM_ADD(app->source_params, name);
+   id = APP_PARAM_ADD(app->source_params, token);
} else
return -EINVAL;

@@ -1200,60 +1168,28 @@ parse_pipeline_pktq_in(struct app_params *app,
 static int
 parse_pipeline_pktq_out(struct app_params *app,
struct app_pipeline_params *p,
-   const char *value)
+   char *value)
 {
-   const char *next = value;
-   char *end;
-   char name[APP_PARAM_NAME_SIZE];
-   size_t name_len;
-
-   while (*next != '\0') {
-   enum app_pktq_out_type type;
+   while (1) {
+   enum app_pktq_in_type type;
int id;
-   char *end_space;
-   char *end_tab;
+   char *token = strtok_r(value, PARSE_DELIMITER, );

-   next = skip_white_spaces(next);
-   if (!next)
+   if (token == NULL)
break;

-   end_space = strchr(next, ' ');
-   end_tab = strchr(next, '\t');
-
-   if (end_space && (!end_tab))
-   end = end_space;
-   else if ((!end_space) && end_tab)
-   end = end_tab;
-   else if (end_space && end_tab)
-   end = RTE_MIN(end_space, end_tab);
-   else
-   end = NULL;
-
-   if (!end)
-   name_len = strlen(next);
-   else
-   name_len = end - next;
-
-   if (name_len == 0 || name_len == sizeof(name))
-
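
The patch above replaces hand-rolled whitespace scanning with strtok_r(). As a rough illustration of that tokenizing loop outside ip_pipeline, the sketch below splits a queue list on spaces and tabs. `QUEUE_DELIMS` and `split_queue_names` are hypothetical names, the delimiter set is an assumption (the definition of PARSE_DELIMITER is not shown in this message), and the sketch uses the conventional separate save pointer rather than whatever the patch passes as the third argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/* Assumed delimiter set; the real PARSE_DELIMITER is not shown above. */
#define QUEUE_DELIMS " \t"

/*
 * Split value in place into whitespace-separated names.
 * Stores up to max_out token pointers in out and returns the token count,
 * or -1 if there are more tokens than max_out.
 * The input buffer is modified: strtok_r writes '\0' over delimiters.
 */
static int split_queue_names(char *value, char **out, int max_out)
{
	char *saveptr = NULL;
	int n = 0;

	for (char *token = strtok_r(value, QUEUE_DELIMS, &saveptr);
	     token != NULL;
	     token = strtok_r(NULL, QUEUE_DELIMS, &saveptr)) {
		if (n == max_out)
			return -1;
		out[n++] = token;
	}
	return n;
}
```

Since strtok_r() writes into the string it scans, the value parameter can no longer be const, which matches the signature change from const char * to char * in the diff.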

[dpdk-dev] [PATCH] lpm6: fix missing header dependency

2016-05-02 Thread Thomas Monjalon

2016-04-28 15:08, Igor Ryzhov:
> Include stdint.h for the definition of uint*_t types.
> 
> Signed-off-by: Igor Ryzhov 

Applied, thanks

[dpdk-dev] [PATCH] lpm: fix freeing of rules_tbl in rte_lpm_free_v20

2016-05-02 Thread Thomas Monjalon

> > Back when we fixed the missing free in lpm, I was too quick to say yes
> > when asked whether it applies not only to lpm6 but to all of the lpm code.
> >
> > It turned out to not apply to all of them. In rte_lpm_create_v20 there
> > is an unexpected fused allocation:
> > mem_size = sizeof(*lpm) + (sizeof(lpm->rules_tbl[0]) * max_rules);
> > [...]
> > lpm = (struct rte_lpm_v20 *)rte_zmalloc_socket(mem_name,mem_size,
> > RTE_CACHE_LINE_SIZE, socket_id);
> >
> > That causes lpm->rules_tbl not to have its own struct malloc_elem that
> > can be derived via RTE_PTR_SUB(data, MALLOC_ELEM_HEADER_LEN) in
> > malloc_elem_from_data.
> > Because of that, rte_lpm_free_v20 accidentally misderives the elem and
> > assumes it is ELEM_FREE, triggering this check in malloc_elem_free:
> > if (!malloc_elem_cookies_ok(elem) || elem->state !=
> >  return -1;
> >
> > While it seems counter-intuitive, the way to properly remove rules_tbl in
> > the old fused allocation style of rte_lpm_free_v20 is to not remove it.
> >
> > The newer rte_lpm_free_v1604 is safe because in rte_lpm_create_v1604
> > rules_tbl is a separate allocation.
> >
> > Fixes: d4c18f0a1d5d ("lpm: fix missing free")
> >
> > Signed-off-by: Christian Ehrhardt 
> 
> Acked-by: Olivier Matz 
> 
> Thanks, I missed it too during the review.

Applied, thanks
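
The fused allocation described in the quoted commit message can be sketched with the standard allocator. This is only an illustration under stated assumptions: `struct rule`, `struct fused_table`, `fused_table_create` and `fused_table_free` are hypothetical names, calloc()/free() stand in for rte_zmalloc_socket()/rte_free(), and the rules table is modelled as a flexible array member at the end of the header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical rule entry. */
struct rule {
	uint32_t ip;
	uint8_t next_hop;
};

/* Hypothetical table whose rules live in the same allocation as the header. */
struct fused_table {
	uint32_t max_rules;
	struct rule rules_tbl[];	/* part of the parent block, not its own allocation */
};

/* One allocation covers the header plus max_rules entries. */
static struct fused_table *fused_table_create(uint32_t max_rules)
{
	size_t mem_size = sizeof(struct fused_table) +
			  sizeof(struct rule) * max_rules;
	struct fused_table *t = calloc(1, mem_size);

	if (t != NULL)
		t->max_rules = max_rules;
	return t;
}

/*
 * Free only the parent pointer. rules_tbl points into the middle of the
 * block, so the allocator has no header for it and freeing it is invalid.
 */
static void fused_table_free(struct fused_table *t)
{
	free(t);
}
```

With a separate allocation per table, as the message says rte_lpm_create_v1604 does for rules_tbl, each pointer would have its own allocator header and would need its own free call.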

[dpdk-dev] [PATCH v2] virtio: fix modify drv_flags for specific device

2016-05-02 Thread Yuanhan Liu

On Thu, Apr 28, 2016 at 06:08:59PM +, Jianfeng Tan wrote:
> Issue: virtio's drv_flags are decided by the device type (modern vs legacy),
> by which kernel driver is used, and by the features negotiated with the
> backend (especially VIRTIO_NET_STATUS). Multiple virtio devices can
> therefore need different drv_flags, but this variable is currently
> shared by all virtio devices.
> 
> How to fix: store this info in dev_flags, which is a device-specific variable.
> 
> Fixes: da978dfdc43 ("virtio: use port IO to get PCI resource")
> 
> Reported-by: David Marchand 
> Suggested-by: David Marchand 

Hi David,

May I ask you to review this and give your ACK if you see no issues?

Thanks.

--yliu

[dpdk-dev] Flow Director Example?

2016-05-02 Thread Zhang, Helin

Hi Alex

Can you confirm that you are using DPDK? And how are you using DPDK and,
possibly, the kernel driver?
I need a detailed description of your topology and how you are using DPDK,
as I am a bit confused. Thanks!

Regards,
Helin

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Forster
> Sent: Saturday, April 30, 2016 4:34 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Flow Director Example?
> 
> Hi guys, apologies if this is the wrong list, but the others look pretty bare.
> 
> We have a 32 core server that has two X520-QDA1 NICs with 2x10G ports
> plugged into each. I'm using 2016.1 (latest stable) with ixgbe 4.3.15
> (latest stable). I'm setting up 8 RX queues per port, and I'd like Flow
> Director in signature mode (?) to place packets into queues based on a hash
> of destination IPv4 or IPv6 address. However, I can't figure out
> rte_fdir_conf, and despite a good amount of trial and error, each of my
> ports is still only using one of the RX queues I set up.
> 
> Would anyone be able to point me in the right direction here? Thanks in
> advance!
> 
> Alex Forster
