[dpdk-dev] [PATCH v13 2/2] vhost: Add VHOST PMD

Loftus, Ciara Mon, 21 Mar 2016 15:40:07 +0000

Hi Tetsuya,

Thanks for the patches. Just one query below re max queue numbers.


Thanks,
Ciara

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tetsuya Mukawa
> Sent: Monday, March 21, 2016 5:45 AM
> To: dev at dpdk.org
> Cc: Richardson, Bruce <bruce.richardson at intel.com>;
> ann.zhuangyanying at huawei.com; thomas.monjalon at 6wind.com; Tetsuya
> Mukawa <mukawa at igel.co.jp>
> Subject: [dpdk-dev] [PATCH v13 2/2] vhost: Add VHOST PMD
> 
> This patch introduces a new PMD, implemented as a thin wrapper of
> librte_vhost, so librte_vhost is also needed to compile the PMD.
> Vhost messages are handled only while a port is started, so start
> the port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The path of the socket used to connect to a
>            virtio-net device.
>  - queues: The number of queues the virtio-net device has.
>            (Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect to the above testpmd, here is an example QEMU command.
> 
> $ qemu-system-x86_64 \
>         <snip>
>         -chardev socket,id=chr0,path=/tmp/sock0 \
>         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
>         -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa <mukawa at igel.co.jp>
> Acked-by: Ferruh Yigit <ferruh.yigit at intel.com>
> Acked-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
> Acked-by: Rich Lane <rich.lane at bigswitch.com>
> Tested-by: Rich Lane <rich.lane at bigswitch.com>
> ---
>  MAINTAINERS                                 |   5 +
>  config/common_base                          |   6 +
>  config/common_linuxapp                      |   1 +
>  doc/guides/nics/index.rst                   |   1 +
>  doc/guides/nics/overview.rst                |  37 +-
>  doc/guides/nics/vhost.rst                   | 110 ++++
>  doc/guides/rel_notes/release_16_04.rst      |   4 +
>  drivers/net/Makefile                        |   4 +
>  drivers/net/vhost/Makefile                  |  62 ++
>  drivers/net/vhost/rte_eth_vhost.c           | 917 ++++++++++++++++++++++++++++
>  drivers/net/vhost/rte_eth_vhost.h           | 109 ++++
>  drivers/net/vhost/rte_pmd_vhost_version.map |  10 +
>  mk/rte.app.mk                               |   6 +
>  13 files changed, 1254 insertions(+), 18 deletions(-)
>  create mode 100644 doc/guides/nics/vhost.rst
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8b21979..7a47fc0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -352,6 +352,11 @@ Null PMD
>  M: Tetsuya Mukawa <mukawa at igel.co.jp>
>  F: drivers/net/null/
> 
> +Vhost PMD
> +M: Tetsuya Mukawa <mukawa at igel.co.jp>
> +M: Yuanhan Liu <yuanhan.liu at linux.intel.com>
> +F: drivers/net/vhost/
> +
>  Intel AES-NI GCM PMD
>  M: Declan Doherty <declan.doherty at intel.com>
>  F: drivers/crypto/aesni_gcm/
> diff --git a/config/common_base b/config/common_base
> index dbd405b..5efee07 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
>  CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
> 
>  #
> +# Compile vhost PMD
> +# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
> +#
> +CONFIG_RTE_LIBRTE_PMD_VHOST=n
> +
> +#
>  #Compile Xen domain0 support
>  #
>  CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index ffbe260..7e698e2 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -40,5 +40,6 @@ CONFIG_RTE_EAL_VFIO=y
>  CONFIG_RTE_KNI_KMOD=y
>  CONFIG_RTE_LIBRTE_KNI=y
>  CONFIG_RTE_LIBRTE_VHOST=y
> +CONFIG_RTE_LIBRTE_PMD_VHOST=y
>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>  CONFIG_RTE_LIBRTE_POWER=y
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 0b353a8..d53b0c7 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -49,6 +49,7 @@ Network Interface Controller Drivers
>      nfp
>      szedata2
>      virtio
> +    vhost
>      vmxnet3
>      pcap_ring
> 
> diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
> index 2d4f014..40ca5ec 100644
> --- a/doc/guides/nics/overview.rst
> +++ b/doc/guides/nics/overview.rst
> @@ -74,20 +74,21 @@ Most of these differences are summarized below.
> 
>  .. table:: Features availability in networking drivers
> 
> -   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> -   Feature              a b b b c e e i i i i i i i i i i f f m m m n n p r s v v v x
> -                        f n n o x 1 n 4 4 4 4 g g x x x x m m l l p f u c i z i i m e
> -                        p x x n g 0 i 0 0 0 0 b b g g g g 1 1 x x i p l a n e r r x n
> -                        a 2 2 d b 0 c e e e e   v b b b b 0 0 4 5 p   l p g d t t n v
> -                        c x x i e 0     . v v   f e e e e k k     e         a i i e i
> -                        k   v n         . f f       . v v   .               t o o t r
> -                        e   f g         .   .       . f f   .               a   . 3 t
> -                        t               v   v       v   v   v               2   v
> -                                        e   e       e   e   e                   e
> -                                        c   c       c   c   c                   c
> -   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> -   link status                  X     X X                                   X
> -   link status event                  X X
> +   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> +   Feature              a b b b c e e i i i i i i i i i i f f m m m n n p r s v v v v x
> +                        f n n o x 1 n 4 4 4 4 g g x x x x m m l l p f u c i z h i i m e
> +                        p x x n g 0 i 0 0 0 0 b b g g g g 1 1 x x i p l a n e o r r x n
> +                        a 2 2 d b 0 c e e e e   v b b b b 0 0 4 5 p   l p g d s t t n v
> +                        c x x i e 0     . v v   f e e e e k k     e         a t i i e i
> +                        k   v n         . f f       . v v   .               t   o o t r
> +                        e   f g         .   .       . f f   .               a     . 3 t
> +                        t               v   v       v   v   v               2     v
> +                                        e   e       e   e   e                     e
> +                                        c   c       c   c   c                     c
> +   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> +   link status                  X     X X                                   X X
> +   link status event                  X X                                     X
> +   queue status event                                                        X
>     Rx interrupt                       X X X X
>     queue start/stop             X     X X X X                               X
>     MTU update                   X
> @@ -125,7 +126,7 @@ Most of these differences are summarized below.
>     inner L4 checksum                  X   X
>     packet type parsing          X     X   X
>     timesync                           X X
> -   basic stats                  X     X X X X                               X
> +   basic stats                  X     X X X X                               X X
>     extended stats                     X X X X
>     stats per queue              X                                           X
>     EEPROM dump
> @@ -139,9 +140,9 @@ Most of these differences are summarized below.
>     ARMv8
>     Power8
>     TILE-Gx
> -   x86-32                       X     X X X X
> -   x86-64                       X     X X X X                               X
> +   x86-32                       X     X X X X                                 X
> +   x86-64                       X     X X X X                               X X
>     usage doc                    X                                           X
>     design doc
>     perf doc
> -   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> +   ==================== = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
> diff --git a/doc/guides/nics/vhost.rst b/doc/guides/nics/vhost.rst
> new file mode 100644
> index 0000000..50e8a3a
> --- /dev/null
> +++ b/doc/guides/nics/vhost.rst
> @@ -0,0 +1,110 @@
> +..  BSD LICENSE
> +    Copyright(c) 2016 IGEL Co., Ltd.
> +    All rights reserved.
> +
> +    Redistribution and use in source and binary forms, with or without
> +    modification, are permitted provided that the following conditions
> +    are met:
> +
> +    * Redistributions of source code must retain the above copyright
> +    notice, this list of conditions and the following disclaimer.
> +    * Redistributions in binary form must reproduce the above copyright
> +    notice, this list of conditions and the following disclaimer in
> +    the documentation and/or other materials provided with the
> +    distribution.
> +    * Neither the name of IGEL Co., Ltd. nor the names of its
> +    contributors may be used to endorse or promote products derived
> +    from this software without specific prior written permission.
> +
> +    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +Poll Mode Driver that wraps vhost library
> +=========================================
> +
> +This PMD is a thin wrapper of the DPDK vhost library.
> +The user can handle virtqueues as a normal DPDK port.
> +
> +Vhost Implementation in DPDK
> +----------------------------
> +
> +Please refer to the "Vhost Library" chapter of the *DPDK Programmer's Guide* for details of vhost.
> +
> +Features and Limitations of vhost PMD
> +-------------------------------------
> +
> +Currently, the vhost PMD provides the basic functionality of packet reception, transmission and event handling.
> +
> +*   It supports multiple queues.
> +
> +*   It supports ``RTE_ETH_EVENT_INTR_LSC`` and ``RTE_ETH_EVENT_QUEUE_STATE_CHANGE`` events.
> +
> +*   It supports Port Hotplug functionality.
> +
> +*   There is no need to stop RX/TX when the user wants to stop a guest or a virtio-net driver on the guest.
> +
> +Vhost PMD arguments
> +-------------------
> +
> +The user can specify the following arguments in the ``--vdev`` option.
> +
> +#.  ``iface``:
> +
> +    It is used to specify a path to connect to a QEMU virtio-net device.
> +
> +#.  ``queues``:
> +
> +    It is used to specify the number of queues the virtio-net device has.
> +    (Default: 1)
> +
> +Vhost PMD event handling
> +------------------------
> +
> +This section describes how to handle vhost PMD events.
> +
> +The user can register an event callback handler with ``rte_eth_dev_callback_register()``.
> +The registered callback handler will be invoked with one of the following event types.
> +
> +#.  ``RTE_ETH_EVENT_INTR_LSC``:
> +
> +    The link status of the port has changed.
> +
> +#.  ``RTE_ETH_EVENT_QUEUE_STATE_CHANGE``:
> +
> +    The status of one or more queues has changed. Call ``rte_eth_vhost_get_queue_event()`` in the callback handler.
> +    Because a single event may cover multiple status changes, call the function repeatedly until it returns a negative value.
> +
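Illustrative sketch only, not part of the patch: a standalone model of the cur/seen bookkeeping behind RTE_ETH_EVENT_QUEUE_STATE_CHANGE, written so it builds without DPDK. It mirrors vring_state_changed() and rte_eth_vhost_get_queue_event() from rte_eth_vhost.c but leaves out the spinlock; struct vring_state, struct queue_event, record_change(), get_queue_event(), drain_events() and the MAX_VRINGS bound are hypothetical names. drain_events() is the loop the documentation asks a callback to run: since one event can cover several vring changes, keep calling until the function returns a negative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, lock-free stand-ins for rte_vhost_vring_state and
 * rte_eth_vhost_queue_event; MAX_VRINGS is an arbitrary bound. */
#define MAX_VRINGS 8

struct vring_state {
	bool cur[MAX_VRINGS];   /* latest state reported by the vhost library */
	bool seen[MAX_VRINGS];  /* state already handed to the application */
	unsigned int index;     /* round-robin scan position */
	unsigned int max_vring; /* highest vring index reported so far */
};

struct queue_event {
	uint16_t queue_id;
	bool rx;
	bool enable;
};

/* Mirrors vring_state_changed(): remember the new state of one vring. */
static void record_change(struct vring_state *s, uint16_t vring, bool enable)
{
	assert(vring < MAX_VRINGS); /* extra bounds check for the sketch */
	s->cur[vring] = enable;
	if (vring > s->max_vring)
		s->max_vring = vring;
}

/* Mirrors rte_eth_vhost_get_queue_event(): report one unseen change,
 * resuming the scan where the previous call stopped. */
static int get_queue_event(struct vring_state *s, struct queue_event *ev)
{
	unsigned int i, idx;

	for (i = 0; i <= s->max_vring; i++) {
		idx = s->index++ % (s->max_vring + 1);
		if (s->cur[idx] != s->seen[idx]) {
			s->seen[idx] = s->cur[idx];
			ev->queue_id = (uint16_t)(idx / 2);
			ev->rx = (idx & 1) != 0; /* odd vrings back PMD Rx queues */
			ev->enable = s->cur[idx];
			return 0;
		}
	}
	return -1; /* no pending change */
}

/* What a queue-state callback would do: drain until a negative return. */
static int drain_events(struct vring_state *s)
{
	struct queue_event ev;
	int n = 0;

	while (get_queue_event(s, &ev) >= 0)
		n++; /* a real handler would act on ev.queue_id/rx/enable */
	return n;
}
```

The round-robin index means each call resumes scanning after the vring it last reported, so when several vrings change at once none of them is starved.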
> +Vhost PMD with testpmd application
> +----------------------------------
> +
> +This section demonstrates the vhost PMD with the testpmd DPDK sample application.
> +
> +#.  Launch the testpmd with vhost PMD:
> +
> +    .. code-block:: console
> +
> +        ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> +
> +    Other basic DPDK preparations, such as enabling hugepages, are also needed at this step.
> +    Please refer to the *DPDK Getting Started Guide* for detailed instructions.
> +
> +#.  Launch the QEMU:
> +
> +    .. code-block:: console
> +
> +        qemu-system-x86_64 <snip> \
> +                   -chardev socket,id=chr0,path=/tmp/sock0 \
> +                   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
> +                   -device virtio-net-pci,netdev=net0
> +
> +    This command attaches one virtio-net device to the QEMU guest.
> +    After the initialization between QEMU and the DPDK vhost library completes, the link status of the port will change to up.
> diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
> index 2785b29..2e4bbb3 100644
> --- a/doc/guides/rel_notes/release_16_04.rst
> +++ b/doc/guides/rel_notes/release_16_04.rst
> @@ -248,6 +248,10 @@ This section should contain new features added in this release. Sample format:
> 
>    New application implementing an IPsec Security Gateway.
> 
> +* **Added vhost PMD.**
> +
> +  Added a virtual PMD that wraps librte_vhost.
> +
> 
>  Resolved Issues
>  ---------------
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 0c3393f..8ba37fb 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -52,4 +52,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
> 
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> +endif # $(CONFIG_RTE_LIBRTE_VHOST)
> +
>  include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
> new file mode 100644
> index 0000000..f49a69b
> --- /dev/null
> +++ b/drivers/net/vhost/Makefile
> @@ -0,0 +1,62 @@
> +#   BSD LICENSE
> +#
> +#   Copyright (c) 2010-2016 Intel Corporation.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of Intel corporation nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_vhost.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +
> +EXPORT_MAP := rte_pmd_vhost_version.map
> +
> +LIBABIVER := 1
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
> +
> +#
> +# Export include files
> +#
> +SYMLINK-y-include += rte_eth_vhost.h
> +
> +# this lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_mbuf
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_kvargs
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += lib/librte_vhost
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
> new file mode 100644
> index 0000000..6b9d287
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -0,0 +1,917 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (c) 2016 IGEL Co., Ltd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IGEL Co.,Ltd. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#include <unistd.h>
> +#include <pthread.h>
> +#include <stdbool.h>
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +#include <numaif.h>
> +#endif
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> +#include <rte_malloc.h>
> +#include <rte_memcpy.h>
> +#include <rte_dev.h>
> +#include <rte_kvargs.h>
> +#include <rte_virtio_net.h>
> +#include <rte_spinlock.h>
> +
> +#include "rte_eth_vhost.h"
> +
> +#define ETH_VHOST_IFACE_ARG          "iface"
> +#define ETH_VHOST_QUEUES_ARG         "queues"
> +
> +static const char *drivername = "VHOST PMD";
> +
> +static const char *valid_arguments[] = {
> +     ETH_VHOST_IFACE_ARG,
> +     ETH_VHOST_QUEUES_ARG,
> +     NULL
> +};
> +
> +static struct ether_addr base_eth_addr = {
> +     .addr_bytes = {
> +             0x56 /* V */,
> +             0x48 /* H */,
> +             0x4F /* O */,
> +             0x53 /* S */,
> +             0x54 /* T */,
> +             0x00
> +     }
> +};
> +
> +struct vhost_queue {
> +     rte_atomic32_t allow_queuing;
> +     rte_atomic32_t while_queuing;
> +     struct virtio_net *device;
> +     struct pmd_internal *internal;
> +     struct rte_mempool *mb_pool;
> +     uint8_t port;
> +     uint16_t virtqueue_id;
> +     uint64_t rx_pkts;
> +     uint64_t tx_pkts;
> +     uint64_t missed_pkts;
> +     uint64_t rx_bytes;
> +     uint64_t tx_bytes;
> +};
> +
> +struct pmd_internal {
> +     char *dev_name;
> +     char *iface_name;
> +
> +     volatile uint16_t once;
> +};
> +
> +struct internal_list {
> +     TAILQ_ENTRY(internal_list) next;
> +     struct rte_eth_dev *eth_dev;
> +};
> +
> +TAILQ_HEAD(internal_list_head, internal_list);
> +static struct internal_list_head internal_list =
> +     TAILQ_HEAD_INITIALIZER(internal_list);
> +
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> +static rte_atomic16_t nb_started_ports;
> +static pthread_t session_th;
> +
> +static struct rte_eth_link pmd_link = {
> +             .link_speed = 10000,
> +             .link_duplex = ETH_LINK_FULL_DUPLEX,
> +             .link_status = 0
> +};
> +
> +struct rte_vhost_vring_state {
> +     rte_spinlock_t lock;
> +
> +     bool cur[RTE_MAX_QUEUES_PER_PORT * 2];
> +     bool seen[RTE_MAX_QUEUES_PER_PORT * 2];
> +     unsigned int index;
> +     unsigned int max_vring;
> +};
> +
> +static struct rte_vhost_vring_state *vring_states[RTE_MAX_ETHPORTS];
> +
> +static uint16_t
> +eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +     struct vhost_queue *r = q;
> +     uint16_t i, nb_rx = 0;
> +
> +     if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +             return 0;
> +
> +     rte_atomic32_set(&r->while_queuing, 1);
> +
> +     if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +             goto out;
> +
> +     /* Dequeue packets from guest TX queue */
> +     nb_rx = rte_vhost_dequeue_burst(r->device,
> +                     r->virtqueue_id, r->mb_pool, bufs, nb_bufs);
> +
> +     r->rx_pkts += nb_rx;
> +
> +     for (i = 0; likely(i < nb_rx); i++) {
> +             bufs[i]->port = r->port;
> +             r->rx_bytes += bufs[i]->pkt_len;
> +     }
> +
> +out:
> +     rte_atomic32_set(&r->while_queuing, 0);
> +
> +     return nb_rx;
> +}
> +
> +static uint16_t
> +eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> +{
> +     struct vhost_queue *r = q;
> +     uint16_t i, nb_tx = 0;
> +
> +     if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +             return 0;
> +
> +     rte_atomic32_set(&r->while_queuing, 1);
> +
> +     if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> +             goto out;
> +
> +     /* Enqueue packets to guest RX queue */
> +     nb_tx = rte_vhost_enqueue_burst(r->device,
> +                     r->virtqueue_id, bufs, nb_bufs);
> +
> +     r->tx_pkts += nb_tx;
> +     r->missed_pkts += nb_bufs - nb_tx;
> +
> +     for (i = 0; likely(i < nb_tx); i++)
> +             r->tx_bytes += bufs[i]->pkt_len;
> +
> +     for (i = 0; likely(i < nb_tx); i++)
> +             rte_pktmbuf_free(bufs[i]);
> +out:
> +     rte_atomic32_set(&r->while_queuing, 0);
> +
> +     return nb_tx;
> +}
> +
> +static int
> +eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +{
> +     return 0;
> +}
> +
> +static inline struct internal_list *
> +find_internal_resource(char *ifname)
> +{
> +     int found = 0;
> +     struct internal_list *list;
> +     struct pmd_internal *internal;
> +
> +     if (ifname == NULL)
> +             return NULL;
> +
> +     pthread_mutex_lock(&internal_list_lock);
> +
> +     TAILQ_FOREACH(list, &internal_list, next) {
> +             internal = list->eth_dev->data->dev_private;
> +             if (!strcmp(internal->iface_name, ifname)) {
> +                     found = 1;
> +                     break;
> +             }
> +     }
> +
> +     pthread_mutex_unlock(&internal_list_lock);
> +
> +     if (!found)
> +             return NULL;
> +
> +     return list;
> +}
> +
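Illustrative sketch only, not part of the patch: the same name lookup as find_internal_resource(), reduced to a standalone program using <sys/queue.h> and a pthread mutex. struct entry, add_entry() and find_entry() are hypothetical names; the patch stores an rte_eth_dev pointer per node and reads iface_name through dev_private. With glibc's TAILQ_FOREACH the cursor is NULL when the walk finishes without a break, which stands in for the separate found flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical, simplified node: the name is stored directly. */
struct entry {
	TAILQ_ENTRY(entry) next;
	const char *iface_name;
};

TAILQ_HEAD(entry_head, entry);
static struct entry_head entries = TAILQ_HEAD_INITIALIZER(entries);
static pthread_mutex_t entries_lock = PTHREAD_MUTEX_INITIALIZER;

static void add_entry(struct entry *e)
{
	pthread_mutex_lock(&entries_lock);
	TAILQ_INSERT_TAIL(&entries, e, next);
	pthread_mutex_unlock(&entries_lock);
}

/* NULL-safe lookup; the lock is held only while walking the list. */
static struct entry *find_entry(const char *ifname)
{
	struct entry *e;

	if (ifname == NULL)
		return NULL;

	pthread_mutex_lock(&entries_lock);
	TAILQ_FOREACH(e, &entries, next) {
		if (strcmp(e->iface_name, ifname) == 0)
			break;
	}
	pthread_mutex_unlock(&entries_lock);

	return e; /* NULL if the loop ran off the end */
}
```

As in the patch, the returned pointer is used after the lock is released, which is only safe as long as nodes are not removed concurrently.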
> +static int
> +new_device(struct virtio_net *dev)
> +{
> +     struct rte_eth_dev *eth_dev;
> +     struct internal_list *list;
> +     struct pmd_internal *internal;
> +     struct vhost_queue *vq;
> +     unsigned i;
> +
> +     if (dev == NULL) {
> +             RTE_LOG(INFO, PMD, "Invalid argument\n");
> +             return -1;
> +     }
> +
> +     list = find_internal_resource(dev->ifname);
> +     if (list == NULL) {
> +             RTE_LOG(INFO, PMD, "Invalid device name\n");
> +             return -1;
> +     }
> +
> +     eth_dev = list->eth_dev;
> +     internal = eth_dev->data->dev_private;
> +
> +     for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +             vq = eth_dev->data->rx_queues[i];
> +             if (vq == NULL)
> +                     continue;
> +             vq->device = dev;
> +             vq->internal = internal;
> +             vq->port = eth_dev->data->port_id;
> +             rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
> +     }
> +     for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +             vq = eth_dev->data->tx_queues[i];
> +             if (vq == NULL)
> +                     continue;
> +             vq->device = dev;
> +             vq->internal = internal;
> +             vq->port = eth_dev->data->port_id;
> +             rte_vhost_enable_guest_notification(dev, vq->virtqueue_id, 0);
> +     }
> +
> +     dev->flags |= VIRTIO_DEV_RUNNING;
> +     dev->priv = eth_dev;
> +     eth_dev->data->dev_link.link_status = 1;
> +
> +     for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +             vq = eth_dev->data->rx_queues[i];
> +             if (vq == NULL)
> +                     continue;
> +             rte_atomic32_set(&vq->allow_queuing, 1);
> +     }
> +     for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +             vq = eth_dev->data->tx_queues[i];
> +             if (vq == NULL)
> +                     continue;
> +             rte_atomic32_set(&vq->allow_queuing, 1);
> +     }
> +
> +     RTE_LOG(INFO, PMD, "New connection established\n");
> +
> +     _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
> +
> +     return 0;
> +}
> +
> +static void
> +destroy_device(volatile struct virtio_net *dev)
> +{
> +     struct rte_eth_dev *eth_dev;
> +     struct vhost_queue *vq;
> +     unsigned i;
> +
> +     if (dev == NULL) {
> +             RTE_LOG(INFO, PMD, "Invalid argument\n");
> +             return;
> +     }
> +
> +     eth_dev = (struct rte_eth_dev *)dev->priv;
> +     if (eth_dev == NULL) {
> +             RTE_LOG(INFO, PMD, "Failed to find a ethdev\n");
> +             return;
> +     }
> +
> +     /* Wait until rx/tx_pkt_burst stops accessing vhost device */
> +     for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +             vq = eth_dev->data->rx_queues[i];
> +             if (vq == NULL)
> +                     continue;
> +             rte_atomic32_set(&vq->allow_queuing, 0);
> +             while (rte_atomic32_read(&vq->while_queuing))
> +                     rte_pause();
> +     }
> +     for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +             vq = eth_dev->data->tx_queues[i];
> +             if (vq == NULL)
> +                     continue;
> +             rte_atomic32_set(&vq->allow_queuing, 0);
> +             while (rte_atomic32_read(&vq->while_queuing))
> +                     rte_pause();
> +     }
> +
> +     eth_dev->data->dev_link.link_status = 0;
> +
> +     dev->priv = NULL;
> +     dev->flags &= ~VIRTIO_DEV_RUNNING;
> +
> +     for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
> +             vq = eth_dev->data->rx_queues[i];
> +             if (vq == NULL)
> +                     continue;
> +             vq->device = NULL;
> +     }
> +     for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
> +             vq = eth_dev->data->tx_queues[i];
> +             if (vq == NULL)
> +                     continue;
> +             vq->device = NULL;
> +     }
> +
> +     RTE_LOG(INFO, PMD, "Connection closed\n");
> +
> +     _rte_eth_dev_callback_process(eth_dev, RTE_ETH_EVENT_INTR_LSC);
> +}
> +
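Illustrative sketch only, not part of the patch: the allow_queuing/while_queuing handshake used by eth_vhost_rx()/eth_vhost_tx() and new_device()/destroy_device(), rewritten with C11 <stdatomic.h> so it builds without DPDK. struct queue_guard and the guard_*() functions are hypothetical names. The argument relies on sequentially consistent ordering (the default for atomic_load()/atomic_store()): the data path's store to while_queuing must be visible before its re-check of allow_queuing, and the control path's store to allow_queuing before its read of while_queuing. It is not obvious that rte_atomic32_set()/rte_atomic32_read() alone give that store-then-load ordering on every architecture, so that part may be worth a closer look.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two flags in struct vhost_queue. */
struct queue_guard {
	atomic_int allow_queuing; /* control path: device may be used */
	atomic_int while_queuing; /* data path: a burst is in progress */
};

/* Data path entry, as at the top of eth_vhost_rx()/eth_vhost_tx(). */
static bool guard_enter(struct queue_guard *g)
{
	if (atomic_load(&g->allow_queuing) == 0)
		return false; /* cheap early exit */
	atomic_store(&g->while_queuing, 1);
	/* Re-check after announcing ourselves: the control path may have
	 * cleared allow_queuing in between. Both accesses are seq_cst, so
	 * the store above is ordered before this load. */
	if (atomic_load(&g->allow_queuing) == 0) {
		atomic_store(&g->while_queuing, 0);
		return false;
	}
	return true;
}

/* Data path exit, the "out:" label in the patch. */
static void guard_exit(struct queue_guard *g)
{
	assert(atomic_load(&g->while_queuing) == 1); /* pairs with enter */
	atomic_store(&g->while_queuing, 0);
}

/* Control path, as in destroy_device(): block new bursts, then wait
 * for an in-flight one to finish. */
static void guard_disable(struct queue_guard *g)
{
	atomic_store(&g->allow_queuing, 0);
	while (atomic_load(&g->while_queuing) != 0)
		; /* the patch calls rte_pause() here */
}

/* Control path, as in new_device(). */
static void guard_enable(struct queue_guard *g)
{
	atomic_store(&g->allow_queuing, 1);
}
```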
> +static int
> +vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
> +{
> +     struct rte_vhost_vring_state *state;
> +     struct rte_eth_dev *eth_dev;
> +     struct internal_list *list;
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +     int newnode, ret;
> +#endif
> +
> +     if (dev == NULL) {
> +             RTE_LOG(ERR, PMD, "Invalid argument\n");
> +             return -1;
> +     }
> +
> +     list = find_internal_resource(dev->ifname);
> +     if (list == NULL) {
> +             RTE_LOG(ERR, PMD, "Invalid interface name: %s\n", dev->ifname);
> +             return -1;
> +     }
> +
> +     eth_dev = list->eth_dev;
> +     /* won't be NULL */
> +     state = vring_states[eth_dev->data->port_id];
> +
> +#ifdef RTE_LIBRTE_VHOST_NUMA
> +     ret  = get_mempolicy(&newnode, NULL, 0, dev,
> +                     MPOL_F_NODE | MPOL_F_ADDR);
> +     if (ret < 0) {
> +             RTE_LOG(ERR, PMD, "Unknown numa node\n");
> +             return -1;
> +     }
> +
> +     eth_dev->data->numa_node = newnode;
> +#endif
> +     rte_spinlock_lock(&state->lock);
> +     state->cur[vring] = enable;
> +     state->max_vring = RTE_MAX(vring, state->max_vring);
> +     rte_spinlock_unlock(&state->lock);
> +
> +     RTE_LOG(INFO, PMD, "vring%u is %s\n",
> +                     vring, enable ? "enabled" : "disabled");
> +
> +     _rte_eth_dev_callback_process(eth_dev,
> +                     RTE_ETH_EVENT_QUEUE_STATE_CHANGE);
> +
> +     return 0;
> +}
> +
> +int
> +rte_eth_vhost_get_queue_event(uint8_t port_id,
> +             struct rte_eth_vhost_queue_event *event)
> +{
> +     struct rte_vhost_vring_state *state;
> +     unsigned int i;
> +     int idx;
> +
> +     if (port_id >= RTE_MAX_ETHPORTS) {
> +             RTE_LOG(ERR, PMD, "Invalid port id\n");
> +             return -1;
> +     }
> +
> +     state = vring_states[port_id];
> +     if (!state) {
> +             RTE_LOG(ERR, PMD, "Unused port\n");
> +             return -1;
> +     }
> +
> +     rte_spinlock_lock(&state->lock);
> +     for (i = 0; i <= state->max_vring; i++) {
> +             idx = state->index++ % (state->max_vring + 1);
> +
> +             if (state->cur[idx] != state->seen[idx]) {
> +                     state->seen[idx] = state->cur[idx];
> +                     event->queue_id = idx / 2;
> +                     event->rx = idx & 1;
> +                     event->enable = state->cur[idx];
> +                     rte_spinlock_unlock(&state->lock);
> +                     return 0;
> +             }
> +     }
> +     rte_spinlock_unlock(&state->lock);
> +
> +     return -1;
> +}
> +
> +static void *
> +vhost_driver_session(void *param __rte_unused)
> +{
> +     static struct virtio_net_device_ops vhost_ops;
> +
> +     /* set vhost arguments */
> +     vhost_ops.new_device = new_device;
> +     vhost_ops.destroy_device = destroy_device;
> +     vhost_ops.vring_state_changed = vring_state_changed;
> +     if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
> +             RTE_LOG(ERR, PMD, "Can't register callbacks\n");
> +
> +     /* start event handling */
> +     rte_vhost_driver_session_start();
> +
> +     return NULL;
> +}
> +
> +static int
> +vhost_driver_session_start(void)
> +{
> +     int ret;
> +
> +     ret = pthread_create(&session_th,
> +                     NULL, vhost_driver_session, NULL);
> +     if (ret)
> +             RTE_LOG(ERR, PMD, "Can't create a thread\n");
> +
> +     return ret;
> +}
> +
> +static void
> +vhost_driver_session_stop(void)
> +{
> +     int ret;
> +
> +     ret = pthread_cancel(session_th);
> +     if (ret)
> +             RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
> +
> +     ret = pthread_join(session_th, NULL);
> +     if (ret)
> +             RTE_LOG(ERR, PMD, "Can't join the thread\n");
> +}
> +
> +static int
> +eth_dev_start(struct rte_eth_dev *dev)
> +{
> +     struct pmd_internal *internal = dev->data->dev_private;
> +     int ret = 0;
> +
> +     if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
> +             ret = rte_vhost_driver_register(internal->iface_name);
> +             if (ret)
> +                     return ret;
> +     }
> +
> +     /* We need only one message handling thread */
> +     if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
> +             ret = vhost_driver_session_start();
> +
> +     return ret;
> +}
> +
> +static void
> +eth_dev_stop(struct rte_eth_dev *dev)
> +{
> +     struct pmd_internal *internal = dev->data->dev_private;
> +
> +     if (rte_atomic16_cmpset(&internal->once, 1, 0))
> +             rte_vhost_driver_unregister(internal->iface_name);
> +
> +     if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
> +             vhost_driver_session_stop();
> +}
> +
> +static int
> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
> +                uint16_t nb_rx_desc __rte_unused,
> +                unsigned int socket_id,
> +                const struct rte_eth_rxconf *rx_conf __rte_unused,
> +                struct rte_mempool *mb_pool)
> +{
> +     struct vhost_queue *vq;
> +
> +     vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +                     RTE_CACHE_LINE_SIZE, socket_id);
> +     if (vq == NULL) {
> +             RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
> +             return -ENOMEM;
> +     }
> +
> +     vq->mb_pool = mb_pool;
> +     vq->virtqueue_id = rx_queue_id * VIRTIO_QNUM + VIRTIO_TXQ;
> +     dev->data->rx_queues[rx_queue_id] = vq;
> +
> +     return 0;
> +}
> +
> +static int
> +eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
> +                uint16_t nb_tx_desc __rte_unused,
> +                unsigned int socket_id,
> +                const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> +     struct vhost_queue *vq;
> +
> +     vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
> +                     RTE_CACHE_LINE_SIZE, socket_id);
> +     if (vq == NULL) {
> +             RTE_LOG(ERR, PMD, "Failed to allocate memory for tx queue\n");
> +             return -ENOMEM;
> +     }
> +
> +     vq->virtqueue_id = tx_queue_id * VIRTIO_QNUM + VIRTIO_RXQ;
> +     dev->data->tx_queues[tx_queue_id] = vq;
> +
> +     return 0;
> +}
> +
> +static void
> +eth_dev_info(struct rte_eth_dev *dev,
> +          struct rte_eth_dev_info *dev_info)
> +{
> +     dev_info->driver_name = drivername;
> +     dev_info->max_mac_addrs = 1;
> +     dev_info->max_rx_pktlen = (uint32_t)-1;
> +     dev_info->max_rx_queues = dev->data->nb_rx_queues;
> +     dev_info->max_tx_queues = dev->data->nb_tx_queues;

I'm not entirely familiar with the ethdev driver code, so please correct me if
I'm wrong.

I'm wondering whether it is correct to take the max queue values from
dev->data->nb_*x_queues.
A user can change nb_*x_queues by calling rte_eth_dev_configure(n_queues),
which in turn calls rte_eth_dev_*x_queue_config(n_queues) and sets
dev->data->nb_*x_queues to n_queues, an arbitrary value chosen by the user. If
that is the case, dev->data->nb_*x_queues no longer reflects the maximum, only
the value the user passed in the most recent call to rte_eth_dev_configure(),
and the reported max could change with each call to configure. Is this the
intended behaviour?
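To illustrate the concern, here is a minimal, self-contained sketch. It uses hypothetical mock structs that stand in for rte_eth_dev_data and rte_eth_dev_info, so it is not DPDK code. It compares the v13 behaviour with one possible alternative: recording the queue count from the "queues" devarg in the PMD's private data at create time. The max_queues field and every mock_* name are assumptions for illustration only, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the ethdev structures involved.
 * Only the fields relevant to the max-queue question are modelled. */
struct mock_internal {
	uint16_t max_queues;	/* hypothetical: recorded from the "queues" devarg */
};

struct mock_dev_data {
	uint16_t nb_rx_queues;	/* overwritten by every configure call */
	uint16_t nb_tx_queues;
	struct mock_internal *dev_private;
};

struct mock_dev_info {
	uint16_t max_rx_queues;
	uint16_t max_tx_queues;
};

/* Models the effect rte_eth_dev_configure() has on the queue counts. */
static void mock_configure(struct mock_dev_data *data,
			   uint16_t nb_rx, uint16_t nb_tx)
{
	data->nb_rx_queues = nb_rx;
	data->nb_tx_queues = nb_tx;
}

/* v13 behaviour: the reported max tracks the last configure call. */
static void info_from_nb_queues(const struct mock_dev_data *data,
				struct mock_dev_info *info)
{
	info->max_rx_queues = data->nb_rx_queues;
	info->max_tx_queues = data->nb_tx_queues;
}

/* Possible alternative: report the limit fixed at create time. */
static void info_from_internal(const struct mock_dev_data *data,
			       struct mock_dev_info *info)
{
	info->max_rx_queues = data->dev_private->max_queues;
	info->max_tx_queues = data->dev_private->max_queues;
}
```

Suppose a port is created with queues=4 and then configured with a single queue pair. The nb_*x_queues-based max drops to 1, while the create-time value stays at 4.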

> +     dev_info->min_rx_bufsize = 0;
> +}
> +
> +static void
> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> +     unsigned i;
> +     unsigned long rx_total = 0, tx_total = 0, tx_missed_total = 0;
> +     unsigned long rx_total_bytes = 0, tx_total_bytes = 0;
> +     struct vhost_queue *vq;
> +
> +     for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +                     i < dev->data->nb_rx_queues; i++) {
> +             if (dev->data->rx_queues[i] == NULL)
> +                     continue;
> +             vq = dev->data->rx_queues[i];
> +             stats->q_ipackets[i] = vq->rx_pkts;
> +             rx_total += stats->q_ipackets[i];
> +
> +             stats->q_ibytes[i] = vq->rx_bytes;
> +             rx_total_bytes += stats->q_ibytes[i];
> +     }
> +
> +     for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS &&
> +                     i < dev->data->nb_tx_queues; i++) {
> +             if (dev->data->tx_queues[i] == NULL)
> +                     continue;
> +             vq = dev->data->tx_queues[i];
> +             stats->q_opackets[i] = vq->tx_pkts;
> +             tx_missed_total += vq->missed_pkts;
> +             tx_total += stats->q_opackets[i];
> +
> +             stats->q_obytes[i] = vq->tx_bytes;
> +             tx_total_bytes += stats->q_obytes[i];
> +     }
> +
> +     stats->ipackets = rx_total;
> +     stats->opackets = tx_total;
> +     stats->imissed = tx_missed_total;
> +     stats->ibytes = rx_total_bytes;
> +     stats->obytes = tx_total_bytes;
> +}
> +
> +static void
> +eth_stats_reset(struct rte_eth_dev *dev)
> +{
> +     struct vhost_queue *vq;
> +     unsigned i;
> +
> +     for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +             if (dev->data->rx_queues[i] == NULL)
> +                     continue;
> +             vq = dev->data->rx_queues[i];
> +             vq->rx_pkts = 0;
> +             vq->rx_bytes = 0;
> +     }
> +     for (i = 0; i < dev->data->nb_tx_queues; i++) {
> +             if (dev->data->tx_queues[i] == NULL)
> +                     continue;
> +             vq = dev->data->tx_queues[i];
> +             vq->tx_pkts = 0;
> +             vq->tx_bytes = 0;
> +             vq->missed_pkts = 0;
> +     }
> +}
> +
> +static void
> +eth_queue_release(void *q)
> +{
> +     rte_free(q);
> +}
> +
> +static int
> +eth_link_update(struct rte_eth_dev *dev __rte_unused,
> +             int wait_to_complete __rte_unused)
> +{
> +     return 0;
> +}
> +
> +/**
> + * Disable features in feature_mask. Returns 0 on success.
> + */
> +int
> +rte_eth_vhost_feature_disable(uint64_t feature_mask)
> +{
> +     return rte_vhost_feature_disable(feature_mask);
> +}
> +
> +/**
> + * Enable features in feature_mask. Returns 0 on success.
> + */
> +int
> +rte_eth_vhost_feature_enable(uint64_t feature_mask)
> +{
> +     return rte_vhost_feature_enable(feature_mask);
> +}
> +
> +/* Returns currently supported vhost features */
> +uint64_t
> +rte_eth_vhost_feature_get(void)
> +{
> +     return rte_vhost_feature_get();
> +}
> +
> +static const struct eth_dev_ops ops = {
> +     .dev_start = eth_dev_start,
> +     .dev_stop = eth_dev_stop,
> +     .dev_configure = eth_dev_configure,
> +     .dev_infos_get = eth_dev_info,
> +     .rx_queue_setup = eth_rx_queue_setup,
> +     .tx_queue_setup = eth_tx_queue_setup,
> +     .rx_queue_release = eth_queue_release,
> +     .tx_queue_release = eth_queue_release,
> +     .link_update = eth_link_update,
> +     .stats_get = eth_stats_get,
> +     .stats_reset = eth_stats_reset,
> +};
> +
> +static int
> +eth_dev_vhost_create(const char *name, char *iface_name, int16_t queues,
> +                  const unsigned numa_node)
> +{
> +     struct rte_eth_dev_data *data = NULL;
> +     struct pmd_internal *internal = NULL;
> +     struct rte_eth_dev *eth_dev = NULL;
> +     struct ether_addr *eth_addr = NULL;
> +     struct rte_vhost_vring_state *vring_state = NULL;
> +     struct internal_list *list = NULL;
> +
> +     RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
> +             numa_node);
> +
> +     /* now do all data allocation - for eth_dev structure, dummy pci driver
> +      * and internal (private) data
> +      */
> +     data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
> +     if (data == NULL)
> +             goto error;
> +
> +     internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
> +     if (internal == NULL)
> +             goto error;
> +
> +     list = rte_zmalloc_socket(name, sizeof(*list), 0, numa_node);
> +     if (list == NULL)
> +             goto error;
> +
> +     /* reserve an ethdev entry */
> +     eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
> +     if (eth_dev == NULL)
> +             goto error;
> +
> +     eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
> +     if (eth_addr == NULL)
> +             goto error;
> +     *eth_addr = base_eth_addr;
> +     eth_addr->addr_bytes[5] = eth_dev->data->port_id;
> +
> +     vring_state = rte_zmalloc_socket(name,
> +                     sizeof(*vring_state), 0, numa_node);
> +     if (vring_state == NULL)
> +             goto error;
> +
> +     TAILQ_INIT(&eth_dev->link_intr_cbs);
> +
> +     /* now put it all together
> +      * - store queue data in internal,
> +      * - store numa_node info in ethdev data
> +      * - point eth_dev_data to internals
> +      * - and point eth_dev structure to new eth_dev_data structure
> +      */
> +     internal->dev_name = strdup(name);
> +     if (internal->dev_name == NULL)
> +             goto error;
> +     internal->iface_name = strdup(iface_name);
> +     if (internal->iface_name == NULL)
> +             goto error;
> +
> +     list->eth_dev = eth_dev;
> +     pthread_mutex_lock(&internal_list_lock);
> +     TAILQ_INSERT_TAIL(&internal_list, list, next);
> +     pthread_mutex_unlock(&internal_list_lock);
> +
> +     rte_spinlock_init(&vring_state->lock);
> +     vring_states[eth_dev->data->port_id] = vring_state;
> +
> +     data->dev_private = internal;
> +     data->port_id = eth_dev->data->port_id;
> +     memmove(data->name, eth_dev->data->name, sizeof(data->name));
> +     data->nb_rx_queues = queues;
> +     data->nb_tx_queues = queues;
> +     data->dev_link = pmd_link;
> +     data->mac_addrs = eth_addr;
> +
> +     /* We'll replace the 'data' originally allocated by eth_dev. So the
> +      * vhost PMD resources won't be shared between multi processes.
> +      */
> +     eth_dev->data = data;
> +     eth_dev->dev_ops = &ops;
> +     eth_dev->driver = NULL;
> +     data->dev_flags =
> +             RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
> +     data->kdrv = RTE_KDRV_NONE;
> +     data->drv_name = internal->dev_name;
> +     data->numa_node = numa_node;
> +
> +     /* finally assign rx and tx ops */
> +     eth_dev->rx_pkt_burst = eth_vhost_rx;
> +     eth_dev->tx_pkt_burst = eth_vhost_tx;
> +
> +     return data->port_id;
> +
> +error:
> +     if (internal)
> +             free(internal->dev_name);
> +     rte_free(vring_state);
> +     rte_free(eth_addr);
> +     if (eth_dev)
> +             rte_eth_dev_release_port(eth_dev);
> +     rte_free(internal);
> +     rte_free(list);
> +     rte_free(data);
> +
> +     return -1;
> +}
> +
> +static inline int
> +open_iface(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +     const char **iface_name = extra_args;
> +
> +     if (value == NULL)
> +             return -1;
> +
> +     *iface_name = value;
> +
> +     return 0;
> +}
> +
> +static inline int
> +open_queues(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> +     uint16_t *q = extra_args;
> +
> +     if (value == NULL || extra_args == NULL)
> +             return -EINVAL;
> +
> +     *q = (uint16_t)strtoul(value, NULL, 0);
> +     if (*q == USHRT_MAX && errno == ERANGE)
> +             return -1;
> +
> +     if (*q > RTE_MAX_QUEUES_PER_PORT)
> +             return -1;
> +
> +     return 0;
> +}
> +
> +static int
> +rte_pmd_vhost_devinit(const char *name, const char *params)
> +{
> +     struct rte_kvargs *kvlist = NULL;
> +     int ret = 0;
> +     char *iface_name;
> +     uint16_t queues;
> +
> +     RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
> +
> +     kvlist = rte_kvargs_parse(params, valid_arguments);
> +     if (kvlist == NULL)
> +             return -1;
> +
> +     if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
> +             ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
> +                                      &open_iface, &iface_name);
> +             if (ret < 0)
> +                     goto out_free;
> +     } else {
> +             ret = -1;
> +             goto out_free;
> +     }
> +
> +     if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
> +             ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
> +                                      &open_queues, &queues);
> +             if (ret < 0)
> +                     goto out_free;
> +
> +     } else
> +             queues = 1;
> +
> +     eth_dev_vhost_create(name, iface_name, queues, rte_socket_id());
> +
> +out_free:
> +     rte_kvargs_free(kvlist);
> +     return ret;
> +}
> +
> +static int
> +rte_pmd_vhost_devuninit(const char *name)
> +{
> +     struct rte_eth_dev *eth_dev = NULL;
> +     struct pmd_internal *internal;
> +     struct internal_list *list;
> +     unsigned int i;
> +
> +     RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
> +
> +     /* find an ethdev entry */
> +     eth_dev = rte_eth_dev_allocated(name);
> +     if (eth_dev == NULL)
> +             return -ENODEV;
> +
> +     internal = eth_dev->data->dev_private;
> +     if (internal == NULL)
> +             return -ENODEV;
> +
> +     list = find_internal_resource(internal->iface_name);
> +     if (list == NULL)
> +             return -ENODEV;
> +
> +     pthread_mutex_lock(&internal_list_lock);
> +     TAILQ_REMOVE(&internal_list, list, next);
> +     pthread_mutex_unlock(&internal_list_lock);
> +     rte_free(list);
> +
> +     eth_dev_stop(eth_dev);
> +
> +     rte_free(vring_states[eth_dev->data->port_id]);
> +     vring_states[eth_dev->data->port_id] = NULL;
> +
> +     free(internal->dev_name);
> +     free(internal->iface_name);
> +
> +     for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
> +             rte_free(eth_dev->data->rx_queues[i]);
> +     for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
> +             rte_free(eth_dev->data->tx_queues[i]);
> +
> +     rte_free(eth_dev->data->mac_addrs);
> +     rte_free(eth_dev->data);
> +     rte_free(internal);
> +
> +     rte_eth_dev_release_port(eth_dev);
> +
> +     return 0;
> +}
> +
> +static struct rte_driver pmd_vhost_drv = {
> +     .name = "eth_vhost",
> +     .type = PMD_VDEV,
> +     .init = rte_pmd_vhost_devinit,
> +     .uninit = rte_pmd_vhost_devuninit,
> +};
> +
> +PMD_REGISTER_DRIVER(pmd_vhost_drv);
> diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
> new file mode 100644
> index 0000000..e78cb74
> --- /dev/null
> +++ b/drivers/net/vhost/rte_eth_vhost.h
> @@ -0,0 +1,109 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 IGEL Co., Ltd.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IGEL Co., Ltd. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_ETH_VHOST_H_
> +#define _RTE_ETH_VHOST_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_virtio_net.h>
> +
> +/**
> + * Disable features in feature_mask.
> + *
> + * @param feature_mask
> + *  Vhost features defined in "linux/virtio_net.h".
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_eth_vhost_feature_disable(uint64_t feature_mask);
> +
> +/**
> + * Enable features in feature_mask.
> + *
> + * @param feature_mask
> + *  Vhost features defined in "linux/virtio_net.h".
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_eth_vhost_feature_enable(uint64_t feature_mask);
> +
> +/**
> + * Returns currently supported vhost features.
> + *
> + * @return
> + *  Vhost features defined in "linux/virtio_net.h".
> + */
> +uint64_t rte_eth_vhost_feature_get(void);
> +
> +/*
> + * Event description.
> + */
> +struct rte_eth_vhost_queue_event {
> +     uint16_t queue_id;
> +     bool rx;
> +     bool enable;
> +};
> +
> +/**
> + * Get queue events from specified port.
> + * If a callback for below event is registered by
> + * rte_eth_dev_callback_register(), this function will describe what was
> + * changed.
> + *  - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
> + * Multiple events may cause only one callback kicking, so call this function
> + * while returning 0.
> + *
> + * @param port_id
> + *  Port id.
> + * @param event
> + *  Pointer to a rte_eth_vhost_queue_event structure.
> + * @return
> + *  - On success, zero.
> + *  - On failure, a negative value.
> + */
> +int rte_eth_vhost_get_queue_event(uint8_t port_id,
> +             struct rte_eth_vhost_queue_event *event);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
> new file mode 100644
> index 0000000..65bf3a8
> --- /dev/null
> +++ b/drivers/net/vhost/rte_pmd_vhost_version.map
> @@ -0,0 +1,10 @@
> +DPDK_16.04 {
> +     global:
> +
> +     rte_eth_vhost_feature_disable;
> +     rte_eth_vhost_feature_enable;
> +     rte_eth_vhost_feature_get;
> +     rte_eth_vhost_get_queue_event;
> +
> +     local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index a1cd9a3..bd973e8 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -166,6 +166,12 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G) += -lrte_pmd_snow3g
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SNOW3G)     += -L$(LIBSSO_PATH)/build -lsso
>  endif # CONFIG_RTE_LIBRTE_CRYPTODEV
> 
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
> +
> +endif # $(CONFIG_RTE_LIBRTE_VHOST)
> +
>  endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
> 
>  _LDLIBS-y += $(EXECENV_LDLIBS)
> --
> 2.1.4

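A note for anyone handling RTE_ETH_EVENT_QUEUE_STATE_CHANGE. The header comment on rte_eth_vhost_get_queue_event() says to keep calling it while it returns 0, because several vring state changes may be coalesced into a single callback. Below is a minimal, DPDK-free sketch of that drain loop. It copies the round-robin scan from the patch into a hypothetical mock_vring_state struct. The spinlock is omitted because the sketch is single-threaded. mock_get_queue_event() and drain_queue_events() are illustrative names only, not part of the patch or of librte_vhost.

```c
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_VRING 8	/* hypothetical: 4 queue pairs, 2 vrings each */

/* Simplified copy of the per-port state the patch keeps in vring_states[];
 * no spinlock, since this sketch is single-threaded. */
struct mock_vring_state {
	bool cur[MOCK_MAX_VRING];	/* latest state reported by librte_vhost */
	bool seen[MOCK_MAX_VRING];	/* state already handed to the application */
	unsigned int index;		/* round-robin cursor */
	unsigned int max_vring;		/* highest vring index in use */
};

struct mock_queue_event {
	uint16_t queue_id;
	bool rx;
	bool enable;
};

/* Same scan as rte_eth_vhost_get_queue_event() in the patch: returns 0 and
 * fills *event for the next vring whose state changed, -1 if none pending. */
static int mock_get_queue_event(struct mock_vring_state *state,
				struct mock_queue_event *event)
{
	unsigned int i, idx;

	for (i = 0; i <= state->max_vring; i++) {
		idx = state->index++ % (state->max_vring + 1);
		if (state->cur[idx] != state->seen[idx]) {
			state->seen[idx] = state->cur[idx];
			event->queue_id = idx / 2;	/* two vrings per queue pair */
			/* odd vring = guest TX ring = this PMD's RX queue */
			event->rx = idx & 1;
			event->enable = state->cur[idx];
			return 0;
		}
	}
	return -1;
}

/* What a queue-state-change handler might do: one callback can stand for
 * several changes, so keep fetching events until -1 is returned. */
static unsigned int drain_queue_events(struct mock_vring_state *state,
				       struct mock_queue_event *out,
				       unsigned int max_out)
{
	unsigned int n = 0;
	struct mock_queue_event ev;

	while (n < max_out && mock_get_queue_event(state, &ev) == 0)
		out[n++] = ev;
	return n;
}
```

If vring 1 (queue 0, RX) and vring 2 (queue 1, TX) are both enabled before the handler runs, one drain returns both events. A second drain then returns none.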