Re: [dpdk-dev] [PATCH v2 2/2] vhost: notice Vhost ops struct renaming

2021-08-06 Thread Liu, Yong
> -Original Message-
> From: dev  On Behalf Of Maxime Coquelin
> Sent: Friday, July 30, 2021 4:13 PM
> To: dev@dpdk.org; Xia, Chenbo ;
> amore...@redhat.com; Richardson, Bruce ;
> Yigit, Ferruh ; tho...@monjalon.net;
> acon...@redhat.com
> Cc: Maxime Coquelin 
> Subject: [dpdk-dev] [PATCH v2 2/2] vhost: notice Vhost ops struct renaming
> 
> This patch announces the renaming of struct
> vhost_device_ops to rte_vhost_device_ops in DPDK v21.11.
> 
> Acked-by: Chenbo Xia 
> Signed-off-by: Maxime Coquelin 
> ---
>  doc/guides/rel_notes/deprecation.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index b34bed61a6..76ebf162bd 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -151,3 +151,6 @@ Deprecation Notices
>  * vhost: ``rte_vdpa_register_device``, ``rte_vdpa_unregister_device``,
>``rte_vhost_host_notifier_ctrl`` and ``rte_vdpa_relay_vring_used`` vDPA
>driver API will be marked as internal in DPDK v21.11.
> +
> +* vhost: rename ``struct vhost_device_ops`` to ``struct
> rte_vhost_device_ops``
> +  in DPDK v21.11.
> --
> 2.31.1

Acked-by: Marvin Liu 


Re: [dpdk-dev] [PATCH v2 1/2] vhost: announce vDPA driver API marking as internal

2021-08-06 Thread Liu, Yong



> -Original Message-
> From: dev  On Behalf Of Maxime Coquelin
> Sent: Friday, July 30, 2021 4:12 PM
> To: dev@dpdk.org; Xia, Chenbo ;
> amore...@redhat.com; Richardson, Bruce ;
> Yigit, Ferruh ; tho...@monjalon.net;
> acon...@redhat.com
> Cc: Maxime Coquelin 
> Subject: [dpdk-dev] [PATCH v2 1/2] vhost: announce vDPA driver API marking
> as internal
> 
> This patch announces the marking of all the vDPA driver APIs
> as internal.
> 
> Acked-by: Chenbo Xia 
> Signed-off-by: Maxime Coquelin 
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 9584d6bfd7..b34bed61a6 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,7 @@ Deprecation Notices
>  * cmdline: ``cmdline`` structure will be made opaque to hide platform-
> specific
>content. On Linux and FreeBSD, supported prior to DPDK 20.11,
>original structure will be kept until DPDK 21.11.
> +
> +* vhost: ``rte_vdpa_register_device``, ``rte_vdpa_unregister_device``,
> +  ``rte_vhost_host_notifier_ctrl`` and ``rte_vdpa_relay_vring_used`` vDPA
> +  driver API will be marked as internal in DPDK v21.11.
> --
> 2.31.1


Acked-by: Marvin Liu 


Re: [dpdk-dev] [PATCH v2] vhost: announce experimental tag removal of vhost APIs

2021-08-06 Thread Liu, Yong


> -Original Message-
> From: dev  On Behalf Of Chenbo Xia
> Sent: Friday, July 30, 2021 4:24 PM
> To: dev@dpdk.org; maxime.coque...@redhat.com; amore...@redhat.com;
> step...@networkplumber.org; tho...@monjalon.net; Yigit, Ferruh
> ; Richardson, Bruce ;
> Ananyev, Konstantin ;
> ktray...@redhat.com; jerinjac...@gmail.com
> Subject: [dpdk-dev] [PATCH v2] vhost: announce experimental tag removal of
> vhost APIs
> 
> This patch announces the experimental tag removal of 10 vhost APIs,
> which have been experimental for more than 2 years. All APIs could
> be made stable in DPDK 21.11.
> 
> Signed-off-by: Chenbo Xia 
> Acked-by: Maxime Coquelin 
> ---
>  doc/guides/rel_notes/deprecation.rst | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 9584d6bfd7..5d5b7884d7 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -147,3 +147,11 @@ Deprecation Notices
>  * cmdline: ``cmdline`` structure will be made opaque to hide platform-
> specific
>content. On Linux and FreeBSD, supported prior to DPDK 20.11,
>original structure will be kept until DPDK 21.11.
> +
> +* vhost: The experimental tags of
> ``rte_vhost_driver_get_protocol_features``,
> +  ``rte_vhost_driver_get_queue_num``, ``rte_vhost_crypto_create``,
> +  ``rte_vhost_crypto_free``, ``rte_vhost_crypto_fetch_requests``,
> +  ``rte_vhost_crypto_finalize_requests``, ``rte_vhost_crypto_set_zero_copy``,
> +  ``rte_vhost_va_from_guest_pa``, ``rte_vhost_extern_callback_register``,
> +  and ``rte_vhost_driver_set_protocol_features`` APIs will be removed and
> the
> +  APIs will be made stable in DPDK 21.11.
> --
> 2.17.1

Acked-by: Marvin Liu 


Re: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check

2021-05-27 Thread Liu, Yong



> -Original Message-
> From: Burakov, Anatoly 
> Sent: Thursday, May 27, 2021 9:07 PM
> To: Liu, Yong ; dev@dpdk.org; McDaniel, Timothy
> ; Xing, Beilei ; Wu,
> Jingjing ; Yang, Qiming ;
> Zhang, Qi Z ; Wang, Haiyue
> ; Matan Azrad ; Shahaf
> Shuler ; Viacheslav Ovsiienko
> ; Richardson, Bruce
> ; Ananyev, Konstantin
> 
> Cc: Loftus, Ciara 
> Subject: Re: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor
> check
> 
> On 25-May-21 10:15 AM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: dev  On Behalf Of Anatoly Burakov
> >> Sent: Tuesday, May 11, 2021 11:32 PM
> >> To: dev@dpdk.org; McDaniel, Timothy ;
> Xing,
> >> Beilei ; Wu, Jingjing ; Yang,
> >> Qiming ; Zhang, Qi Z ;
> >> Wang, Haiyue ; Matan Azrad
> >> ; Shahaf Shuler ; Viacheslav
> >> Ovsiienko ; Richardson, Bruce
> >> ; Ananyev, Konstantin
> >> 
> >> Cc: Loftus, Ciara 
> >> Subject: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check
> >>
> >> Previously, the semantics of power monitor were such that we were
> >> checking current value against the expected value, and if they matched,
> >> then the sleep was aborted. This is somewhat inflexible, because it only
> >> allowed us to check for a specific value.
> >>
> >> We can reverse the check, and instead have monitor sleep to be aborted
> >> if the expected value *doesn't* match what's in memory. This allows us
> >> to both implement all currently implemented driver code, as well as
> >> support more use cases which don't easily map to previous semantics
> >> (such as waiting on writes to AF_XDP counter value).
> >>
> >
> > Hi Anatoly,
> > In the virtio spec, a packed-format descriptor uses two bits to
> > represent its status: one bit for available, one bit for used.
> > To check the status more precisely, the value must be compared against
> > the expected value.
> > The monitor function in the virtio datapath can still work with the new
> > semantics, but it may lead to some useless I/O calls.
> > Based on that, I'd like to keep the previous semantics.
> >
> > Regards,
> > Marvin
> >
> 
> Thanks for your feedback! Would making this an option make things
> better? Because we need the inverted semantics for AF_XDP, it can't work
> without it. So, we either invert all of them, or we have an option to do
> regular or inverted check on a per-condition basis. Would that work?
> 

It would be great if we can select the check type based on an input
parameter.
In the virtio datapath alone, we need both the inverted and the original
semantics for the different ring formats.

Regards,
Marvin
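
For reference, here is a minimal sketch of the two-bit status check in
question, using the packed-ring flag bits from the virtio 1.1 spec. It is
an illustration of why an exact-match condition is wanted, not the actual
vhost code:

    #include <stdbool.h>
    #include <stdint.h>

    #define VRING_DESC_F_AVAIL (1 << 7)   /* virtio 1.1 packed ring */
    #define VRING_DESC_F_USED  (1 << 15)

    /* A packed descriptor is available only when its AVAIL bit equals the
     * ring's wrap counter AND its USED bit differs from it, i.e. the two
     * flag bits must match an exact expected pattern. */
    static bool
    desc_is_avail(uint16_t flags, bool wrap_counter)
    {
        return wrap_counter == !!(flags & VRING_DESC_F_AVAIL) &&
               wrap_counter != !!(flags & VRING_DESC_F_USED);
    }

Under "wake when the value differs from the expected one" semantics, a
write that flips only one of the two bits already aborts the sleep even
though the descriptor is not yet available, which is where the useless
wakeups mentioned above come from.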

> 
> --
> Thanks,
> Anatoly


Re: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check

2021-05-25 Thread Liu, Yong



> -Original Message-
> From: dev  On Behalf Of Anatoly Burakov
> Sent: Tuesday, May 11, 2021 11:32 PM
> To: dev@dpdk.org; McDaniel, Timothy ; Xing,
> Beilei ; Wu, Jingjing ; Yang,
> Qiming ; Zhang, Qi Z ;
> Wang, Haiyue ; Matan Azrad
> ; Shahaf Shuler ; Viacheslav
> Ovsiienko ; Richardson, Bruce
> ; Ananyev, Konstantin
> 
> Cc: Loftus, Ciara 
> Subject: [dpdk-dev] [21.08 PATCH v1 1/2] power: invert the monitor check
> 
> Previously, the semantics of power monitor were such that we were
> checking current value against the expected value, and if they matched,
> then the sleep was aborted. This is somewhat inflexible, because it only
> allowed us to check for a specific value.
> 
> We can reverse the check, and instead have monitor sleep to be aborted
> if the expected value *doesn't* match what's in memory. This allows us
> to both implement all currently implemented driver code, as well as
> support more use cases which don't easily map to previous semantics
> (such as waiting on writes to AF_XDP counter value).
> 

Hi Anatoly,
In the virtio spec, a packed-format descriptor uses two bits to represent
its status: one bit for available, one bit for used.
To check the status more precisely, the value must be compared against the
expected value.
The monitor function in the virtio datapath can still work with the new
semantics, but it may lead to some useless I/O calls.
Based on that, I'd like to keep the previous semantics.

Regards,
Marvin

> This commit also adjusts all current driver implementations to match the
> new semantics.
> 
> Signed-off-by: Anatoly Burakov 
> ---
>  drivers/event/dlb2/dlb2.c  | 2 +-
>  drivers/net/i40e/i40e_rxtx.c   | 2 +-
>  drivers/net/iavf/iavf_rxtx.c   | 2 +-
>  drivers/net/ice/ice_rxtx.c | 2 +-
>  drivers/net/ixgbe/ixgbe_rxtx.c | 2 +-
>  drivers/net/mlx5/mlx5_rx.c | 2 +-
>  lib/eal/include/generic/rte_power_intrinsics.h | 8 
>  lib/eal/x86/rte_power_intrinsics.c | 4 ++--
>  8 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> index 3570678b9e..5701bbb8ab 100644
> --- a/drivers/event/dlb2/dlb2.c
> +++ b/drivers/event/dlb2/dlb2.c
> @@ -3188,7 +3188,7 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
>   &cq_base[qm_port->cq_idx];
>   monitor_addr++; /* cq_gen bit is in second 64bit location */
> 
> - if (qm_port->gen_bit)
> + if (!qm_port->gen_bit)
>   expected_value = qe_mask.raw_qe[1];
>   else
>   expected_value = 0;
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 02cf5e787c..4617ae914a 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -88,7 +88,7 @@ i40e_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>* we expect the DD bit to be set to 1 if this descriptor was already
>* written to.
>*/
> - pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> + pmc->val = 0;
>   pmc->mask = rte_cpu_to_le_64(1 <<
> I40E_RX_DESC_STATUS_DD_SHIFT);
> 
>   /* registers are 64-bit */
> diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
> index 87f7eebc65..d8d9cc860c 100644
> --- a/drivers/net/iavf/iavf_rxtx.c
> +++ b/drivers/net/iavf/iavf_rxtx.c
> @@ -73,7 +73,7 @@ iavf_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>* we expect the DD bit to be set to 1 if this descriptor was already
>* written to.
>*/
> - pmc->val = rte_cpu_to_le_64(1 <<
> IAVF_RX_DESC_STATUS_DD_SHIFT);
> + pmc->val = 0;
>   pmc->mask = rte_cpu_to_le_64(1 <<
> IAVF_RX_DESC_STATUS_DD_SHIFT);
> 
>   /* registers are 64-bit */
> diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
> index 92fbbc18da..4e349bfa3f 100644
> --- a/drivers/net/ice/ice_rxtx.c
> +++ b/drivers/net/ice/ice_rxtx.c
> @@ -43,7 +43,7 @@ ice_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>* we expect the DD bit to be set to 1 if this descriptor was already
>* written to.
>*/
> - pmc->val = rte_cpu_to_le_16(1 <<
> ICE_RX_FLEX_DESC_STATUS0_DD_S);
> + pmc->val = 0;
>   pmc->mask = rte_cpu_to_le_16(1 <<
> ICE_RX_FLEX_DESC_STATUS0_DD_S);
> 
>   /* register is 16-bit */
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index d69f36e977..2793718171 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1385,7 +1385,7 @@ ixgbe_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>* we expect the DD bit to be set to 1 if this descriptor was already
>* written to.
>*/
> - pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> +
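
To make the proposed change concrete, here is a minimal sketch of how the
sleep-abort condition flips; the helper names are hypothetical, not the
actual rte_power_monitor implementation:

    #include <stdbool.h>
    #include <stdint.h>

    /* Old semantics: abort the monitor sleep when the masked current
     * value MATCHES the expected value. */
    static bool
    abort_sleep_old(uint64_t cur, uint64_t expected, uint64_t mask)
    {
        return (cur & mask) == (expected & mask);
    }

    /* New semantics: abort when it DIFFERS. As the diff above shows,
     * drivers now set pmc->val to the "not yet written" pattern (e.g. 0
     * under the DD-bit mask), so any write that changes the masked bits
     * wakes the core. This also covers waiting on AF_XDP counter writes. */
    static bool
    abort_sleep_new(uint64_t cur, uint64_t expected, uint64_t mask)
    {
        return (cur & mask) != (expected & mask);
    }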

Re: [dpdk-dev] [PATCH] vhost: fix accessing uninitialized variables

2021-04-06 Thread Liu, Yong


> -Original Message-
> From: Xia, Chenbo 
> Sent: Tuesday, April 6, 2021 2:18 PM
> To: Liu, Yong ; maxime.coque...@redhat.com
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [PATCH] vhost: fix accessing uninitialized variables
> 
> Hi Marvin,
> 
> > -Original Message-
> > From: Liu, Yong 
> > Sent: Wednesday, March 3, 2021 3:28 PM
> > To: maxime.coque...@redhat.com; Xia, Chenbo 
> > Cc: dev@dpdk.org; Liu, Yong ; sta...@dpdk.org
> > Subject: [PATCH] vhost: fix accessing uninitialized variables
> >
> > This patch fixes a Coverity issue by adding an initialization step
> > before using the temporary virtio header.
> >
> > Coverity issue: 366181, 366123
> > Fixes: fb3815cc614d ("vhost: handle virtually non-contiguous buffers in Rx-
> > mrg")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 583bf379c6..fe464b3088 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -808,9 +808,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> 
> You should apply the same fix to async_mbuf_to_desc.
> Maybe you did not notice: one coverity issue is in copy_mbuf_to_desc, but
> another
> in async_mbuf_to_desc :)
> 
Thanks for the reminder; I will fix async_mbuf_to_desc in the next version.

Regards,
Marvin

> Thanks,
> Chenbo
> 
> >
> >  hdr_mbuf = m;
> >  hdr_addr = buf_addr;
> > -if (unlikely(buf_len < dev->vhost_hlen))
> > +if (unlikely(buf_len < dev->vhost_hlen)) {
> > +memset(&tmp_hdr, 0, sizeof(struct virtio_net_hdr_mrg_rxbuf));
> >  hdr = &tmp_hdr;
> > -else
> > +} else
> >  hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)hdr_addr;
> >
> >  VHOST_LOG_DATA(DEBUG, "(%d) RX: num merge buffers %d\n",
> > --
> > 2.17.1
> 



Re: [dpdk-dev] [PATCH] vhost: fix accessing uninitialized variables

2021-03-28 Thread Liu, Yong


> -Original Message-
> From: wangyunjian 
> Sent: Saturday, March 27, 2021 6:06 PM
> To: Maxime Coquelin ; Liu, Yong
> ; Xia, Chenbo 
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] vhost: fix accessing uninitialized variables
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
> > Sent: Wednesday, March 24, 2021 5:55 PM
> > To: Marvin Liu ; chenbo@intel.com
> > Cc: dev@dpdk.org; sta...@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] vhost: fix accessing uninitialized variables
> >
> >
> >
> > On 3/3/21 8:27 AM, Marvin Liu wrote:
> > > This patch fixes a Coverity issue by adding an initialization step
> > > before using the temporary virtio header.
> > >
> > > Coverity issue: 366181, 366123
> > > Fixes: fb3815cc614d ("vhost: handle virtually non-contiguous buffers
> > > in Rx-mrg")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Marvin Liu 
> > >
> > > diff --git a/lib/librte_vhost/virtio_net.c
> > > b/lib/librte_vhost/virtio_net.c index 583bf379c6..fe464b3088 100644
> > > --- a/lib/librte_vhost/virtio_net.c
> > > +++ b/lib/librte_vhost/virtio_net.c
> > > @@ -808,9 +808,10 @@ copy_mbuf_to_desc(struct virtio_net *dev,
> struct
> > > vhost_virtqueue *vq,
> > >
> > >   hdr_mbuf = m;
> > >   hdr_addr = buf_addr;
> > > - if (unlikely(buf_len < dev->vhost_hlen))
> > > + if (unlikely(buf_len < dev->vhost_hlen)) {
> > > + memset(&tmp_hdr, 0, sizeof(struct
> virtio_net_hdr_mrg_rxbuf));
> > >   hdr = &tmp_hdr;
> > > - else
> > > + } else
> > >   hdr = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)hdr_addr;
> > >
> > >   VHOST_LOG_DATA(DEBUG, "(%d) RX: num merge buffers %d\n",
> > >
> 
> I think it's better to revise it in this way:
> 

Thanks, Yunjian. This patch addresses the reported Coverity issue.

The problem comes from the read of net_hdr->csum_offset when using the
ASSIGN_UNLESS_EQUAL macro.
When the net_hdr is not fully contained in the first buffer, the temporary
net_hdr is used, and it has not been initialized.

Regards,
Marvin
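
For context, the macro in question (as defined in
lib/librte_vhost/virtio_net.c at the time) reads its destination before
conditionally writing it, which is exactly where the uninitialized read
comes from:

    #define ASSIGN_UNLESS_EQUAL(var, val) do { \
            if ((var) != (val))                \
                    (var) = (val);             \
    } while (0)

When the header does not fit in the first buffer, "var" resolves to a
field of the on-stack tmp_hdr, so the comparison reads uninitialized stack
memory. Zeroing tmp_hdr (as in the original patch) or adding the "default"
branch suggested below both eliminate the stale read.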

> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 583bf37..ccb73b9 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -420,6 +420,8 @@
> net_hdr->csum_offset = (offsetof(struct rte_sctp_hdr,
> cksum));
> break;
> +   default:
> +   ASSIGN_UNLESS_EQUAL(net_hdr->csum_offset, 0);
> }
> 
> >
> > Reviewed-by: Maxime Coquelin 
> >
> > Thanks,
> > Maxime



Re: [dpdk-dev] [PATCH] vhost: fix potential buffer overflow

2021-03-24 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Wednesday, March 24, 2021 4:56 PM
> To: Liu, Yong ; Xia, Chenbo 
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: Re: [PATCH] vhost: fix potential buffer overflow
> 
> Hi Marvin,
> 
> On 2/26/21 8:33 AM, Marvin Liu wrote:
> > In the vhost datapath, a descriptor's length is mostly used in two
> > coherent operations: the first step is address translation, the second
> > is the memory transaction from guest to host. The interval between the
> > two steps gives a window to a malicious guest, which can change the
> > descriptor length after vhost has calculated the buffer size. This may
> > lead to a buffer overflow on the vhost side. This potential risk can be
> > eliminated by accessing the descriptor length only once.
> >
> > Fixes: 1be4ebb1c464 ("vhost: support indirect descriptor in mergeable Rx")
> > Fixes: 2f3225a7d69b ("vhost: add vector filling support for packed ring")
> > Fixes: 75ed51697820 ("vhost: add packed ring batch dequeue")
> 
> As the offending commits have been introduced in different LTS, I would
> prefer the patch to be split. It will make it easier to backport later.
> 

Maxime,
Thanks for your suggestion. I will split this patch into three parts, as
the offending commits are spread over three different LTS releases.

Regards,
Marvin
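
The hazard is a classic double-read (time-of-check/time-of-use) on
guest-shared memory. A minimal standalone sketch of the pattern, with
hypothetical helpers rather than the actual vhost code:

    #include <stdint.h>
    #include <string.h>

    struct shared_desc { volatile uint32_t len; }; /* guest-writable */

    /* Racy: desc->len is read twice; the guest can enlarge it between
     * the bounds check and the copy, overflowing dst. */
    static void
    copy_racy(uint8_t *dst, size_t dst_sz, const uint8_t *src,
              const struct shared_desc *desc)
    {
            if (desc->len > dst_sz)              /* read 1 */
                    return;
            memcpy(dst, src, desc->len);         /* read 2 */
    }

    /* Fixed, as in the patch below: read the length exactly once and use
     * the local copy for both the check and the copy. */
    static void
    copy_safe(uint8_t *dst, size_t dst_sz, const uint8_t *src,
              const struct shared_desc *desc)
    {
            uint32_t dlen = desc->len;           /* single read */
            if (dlen > dst_sz)
                    return;
            memcpy(dst, src, dlen);
    }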

> > Signed-off-by: Marvin Liu 
> > Cc: sta...@dpdk.org
> >
> > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> > index 583bf379c6..0a7d008a91 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -548,10 +548,11 @@ fill_vec_buf_split(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> > return -1;
> > }
> >
> > -   len += descs[idx].len;
> > +   dlen = descs[idx].len;
> > +   len += dlen;
> >
> > if (unlikely(map_one_desc(dev, vq, buf_vec, &vec_id,
> > -   descs[idx].addr,
> descs[idx].len,
> > +   descs[idx].addr, dlen,
> > perm))) {
> > free_ind_table(idesc);
> > return -1;
> > @@ -668,9 +669,10 @@ fill_vec_buf_packed_indirect(struct virtio_net
> *dev,
> > return -1;
> > }
> >
> > -   *len += descs[i].len;
> > +   dlen = descs[i].len;
> > +   *len += dlen;
> > if (unlikely(map_one_desc(dev, vq, buf_vec, &vec_id,
> > -   descs[i].addr, descs[i].len,
> > +   descs[i].addr, dlen,
> > perm)))
> > return -1;
> > }
> > @@ -691,6 +693,7 @@ fill_vec_buf_packed(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> > bool wrap_counter = vq->avail_wrap_counter;
> > struct vring_packed_desc *descs = vq->desc_packed;
> > uint16_t vec_id = *vec_idx;
> > +   uint64_t dlen;
> >
> > if (avail_idx < vq->last_avail_idx)
> > wrap_counter ^= 1;
> > @@ -723,11 +726,12 @@ fill_vec_buf_packed(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> > len, perm) < 0))
> > return -1;
> > } else {
> > -   *len += descs[avail_idx].len;
> > +   dlen = descs[avail_idx].len;
> > +   *len += dlen;
> >
> > if (unlikely(map_one_desc(dev, vq, buf_vec, &vec_id,
> > descs[avail_idx].addr,
> > -   descs[avail_idx].len,
> > +   dlen,
> > perm)))
> > return -1;
> > }
> > @@ -2314,7 +2318,7 @@ vhost_reserve_avail_batch_packed(struct
> virtio_net *dev,
> > }
> >
> > vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > -   pkts[i]->pkt_len = descs[avail_idx + i].len - buf_offset;
> > +   pkts[i]->pkt_len = lens[i] - buf_offset;
> > pkts[i]->data_len = pkts[i]->pkt_len;
> > ids[i] = descs[avail_idx + i].id;
> > }
> >
> 
> Other than that, the patch looks valid to me.
> With the split done:
> 
> Reviewed-by: Maxime Coquelin 
> 
> Thanks,
> Maxime



Re: [dpdk-dev] [PATCH v2] vhost: add support for packed ring in async vhost

2021-03-24 Thread Liu, Yong



> -Original Message-
> From: dev  On Behalf Of Cheng Jiang
> Sent: Monday, March 22, 2021 2:15 PM
> To: maxime.coque...@redhat.com; Xia, Chenbo 
> Cc: dev@dpdk.org; Hu, Jiayu ; Yang, YvonneX
> ; Wang, Yinan ; Jiang,
> Cheng1 
> Subject: [dpdk-dev] [PATCH v2] vhost: add support for packed ring in async
> vhost
> 
> For now async vhost data path only supports split ring structure. In
> order to make async vhost compatible with virtio 1.1 spec this patch
> enables packed ring in async vhost data path.
> 
> Signed-off-by: Cheng Jiang 
> ---
> v2:
>   * fix wrong buffer index in rte_vhost_poll_enqueue_completed()
>   * add async_buffers_packed memory free in vhost_free_async_mem()
> 
>  lib/librte_vhost/rte_vhost_async.h |   1 +
>  lib/librte_vhost/vhost.c   |  24 +-
>  lib/librte_vhost/vhost.h   |   7 +-
>  lib/librte_vhost/virtio_net.c  | 447 +++--
>  4 files changed, 441 insertions(+), 38 deletions(-)
> 
> diff --git a/lib/librte_vhost/rte_vhost_async.h
> b/lib/librte_vhost/rte_vhost_async.h
> index c855ff875..6faa31f5a 100644
> --- a/lib/librte_vhost/rte_vhost_async.h
> +++ b/lib/librte_vhost/rte_vhost_async.h
> @@ -89,6 +89,7 @@ struct rte_vhost_async_channel_ops {
>  struct async_inflight_info {
>   struct rte_mbuf *mbuf;
>   uint16_t descs; /* num of descs inflight */
> + uint16_t nr_buffers; /* num of buffers inflight for packed ring */
>  };
> 
>  /**
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 52ab93d1e..51b44d6f2 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -330,15 +330,20 @@ vhost_free_async_mem(struct vhost_virtqueue
> *vq)
>  {
>   if (vq->async_pkts_info)
>   rte_free(vq->async_pkts_info);
> - if (vq->async_descs_split)
> + if (vq->async_buffers_packed) {
> + rte_free(vq->async_buffers_packed);
> + vq->async_buffers_packed = NULL;
> + } else {
>   rte_free(vq->async_descs_split);
> + vq->async_descs_split = NULL;
> + }
> +
>   if (vq->it_pool)
>   rte_free(vq->it_pool);
>   if (vq->vec_pool)
>   rte_free(vq->vec_pool);
> 
>   vq->async_pkts_info = NULL;
> - vq->async_descs_split = NULL;
>   vq->it_pool = NULL;
>   vq->vec_pool = NULL;
>  }
> @@ -1603,9 +1608,9 @@ int rte_vhost_async_channel_register(int vid,
> uint16_t queue_id,
>   return -1;
> 
>   /* packed queue is not supported */
> - if (unlikely(vq_is_packed(dev) || !f.async_inorder)) {
> + if (unlikely(!f.async_inorder)) {
>   VHOST_LOG_CONFIG(ERR,
> - "async copy is not supported on packed queue or
> non-inorder mode "
> + "async copy is not supported on non-inorder mode "
>   "(vid %d, qid: %d)\n", vid, queue_id);
>   return -1;
>   }
> @@ -1643,10 +1648,17 @@ int rte_vhost_async_channel_register(int vid,
> uint16_t queue_id,
>   vq->vec_pool = rte_malloc_socket(NULL,
>   VHOST_MAX_ASYNC_VEC * sizeof(struct iovec),
>   RTE_CACHE_LINE_SIZE, node);
> - vq->async_descs_split = rte_malloc_socket(NULL,
> + if (vq_is_packed(dev)) {
> + vq->async_buffers_packed = rte_malloc_socket(NULL,
> + vq->size * sizeof(struct vring_used_elem_packed),
> + RTE_CACHE_LINE_SIZE, node);
> + } else {
> + vq->async_descs_split = rte_malloc_socket(NULL,
>   vq->size * sizeof(struct vring_used_elem),
>   RTE_CACHE_LINE_SIZE, node);
> - if (!vq->async_descs_split || !vq->async_pkts_info ||
> + }
> +
> + if (!vq->async_pkts_info ||
>   !vq->it_pool || !vq->vec_pool) {
>   vhost_free_async_mem(vq);
>   VHOST_LOG_CONFIG(ERR,
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 658f6fc28..d6324fbf8 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -206,9 +206,14 @@ struct vhost_virtqueue {
>   uint16_tasync_pkts_idx;
>   uint16_tasync_pkts_inflight_n;
>   uint16_tasync_last_pkts_n;
> - struct vring_used_elem  *async_descs_split;
> + union {
> + struct vring_used_elem  *async_descs_split;
> + struct vring_used_elem_packed *async_buffers_packed;
> + };
>   uint16_t async_desc_idx;
> + uint16_t async_packed_buffer_idx;
>   uint16_t last_async_desc_idx;
> + uint16_t last_async_buffer_idx;
> 
>   /* vq async features */
>   boolasync_inorder;
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 583bf379c..fa8c4f4fe 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -363,8 +363,7 @@
> vhost_shadow_dequeue_single_packed_inorder(struct vhost_virtqueue *vq,
>  }
> 
>  static __rte_alway

Re: [dpdk-dev] [PATCH] net/virtio: enable packet data prefetch on x86

2020-11-13 Thread Liu, Yong
+ more people into this conversation. IMHO, restoring the previous state is
the best choice for now.

> -Original Message-
> From: David Marchand 
> Sent: Friday, November 13, 2020 3:27 PM
> To: Liu, Yong 
> Cc: Maxime Coquelin ; Xia, Chenbo
> ; dev ; Richardson, Bruce
> 
> Subject: Re: [dpdk-dev] [PATCH] net/virtio: enable packet data prefetch on
> x86
> 
> On Fri, Nov 13, 2020 at 2:20 AM Liu, Yong  wrote:
> > > Yes and this will also solve https://patchwork.dpdk.org/patch/83468/.
> > > Thanks.
> > >
> >
> > Agreed, the original patch was intended to restore the prefetch
> > configuration in the meson build.
> > Please check http://patchwork.dpdk.org/patch/78451/.
> > It led to a discussion about how to utilize the prefetch function
> > optimally.
> > Since there was no conclusion that the current placement is best for
> > platforms other than x86, prefetch is now only enabled for virtio on
> > x86.
> 
> I disagree.
> No conclusion means the best is to restore the previous state, i.e.
> enable this option for all platforms.
> 
> If later other architectures want to change this, this can revisit.
> 
> 
> --
> David Marchand



Re: [dpdk-dev] [PATCH] net/virtio: enable packet data prefetch on x86

2020-11-12 Thread Liu, Yong


> -Original Message-
> From: David Marchand 
> Sent: Thursday, November 12, 2020 4:58 PM
> To: Liu, Yong ; Maxime Coquelin
> 
> Cc: Xia, Chenbo ; dev ; Richardson,
> Bruce 
> Subject: Re: [dpdk-dev] [PATCH] net/virtio: enable packet data prefetch on
> x86
> 
> On Thu, Nov 12, 2020 at 9:48 AM Maxime Coquelin
>  wrote:
> > On 11/11/20 4:40 PM, Marvin Liu wrote:
> > > Data prefetch instruction can preload data into cpu’s hierarchical
> > > cache before data access. Virtio datapath utilized this feature for
> > > data access acceleration. As config RTE_PMD_PACKET_PREFETCH was
> > > discarded, now packet data prefetch is enabled based on architecture.
> > >
> > > Signed-off-by: Marvin Liu 
> > >
> > > diff --git a/drivers/net/virtio/virtqueue.h
> b/drivers/net/virtio/virtqueue.h
> > > index 42c4c9882..0196290a5 100644
> > > --- a/drivers/net/virtio/virtqueue.h
> > > +++ b/drivers/net/virtio/virtqueue.h
> > > @@ -106,7 +106,7 @@ virtqueue_store_flags_packed(struct
> vring_packed_desc *dp,
> > >   dp->flags = flags;
> > >   }
> > >  }
> > > -#ifdef RTE_PMD_PACKET_PREFETCH
> > > +#if defined(RTE_ARCH_X86)
> > >  #define rte_packet_prefetch(p)  rte_prefetch1(p)
> > >  #else
> > >  #define rte_packet_prefetch(p)  do {} while(0)
> > >
> >
> > Thanks for catching this issue.
> > I agree it should be re-enabled by default, and not only on X86, not
> > only on Virtio PMD.
> >
> > AFAICS, prefetch was enabled for all platforms before the switch to
> > Meson, so I see it as an involuntary change that needs to be reverted.
> 
> Yes and this will also solve https://patchwork.dpdk.org/patch/83468/.
> Thanks.
> 

Agreed, the original patch was intended to restore the prefetch
configuration in the meson build.
Please check http://patchwork.dpdk.org/patch/78451/.
It led to a discussion about how to utilize the prefetch function
optimally.
Since there was no conclusion that the current placement is best for
platforms other than x86, prefetch is now only enabled for virtio on x86.

> --
> David Marchand



Re: [dpdk-dev] [PATCH v3 0/5] vhost add vectorized data path

2020-10-15 Thread Liu, Yong
Hi All,
The performance gain from the vectorized datapath in OVS-DPDK is around 1%,
while it has a small impact on the original datapath.
On the other hand, it increases the complexity of vhost (a new parameter is
introduced, and memory information must be prepared for address
translation).
After weighing the pros and cons, I'd like to withdraw this patch set.
Thanks for your time.

Regards,
Marvin

> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, October 12, 2020 4:22 PM
> To: Liu, Yong ; Xia, Chenbo ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3 0/5] vhost add vectorized data path
> 
> Hi Marvin,
> 
> On 10/9/20 10:14 AM, Marvin Liu wrote:
> > Packed ring format is imported since virtio spec 1.1. All descriptors
> > are compacted into one single ring when packed ring format is on. It is
> > straight forward that ring operations can be accelerated by utilizing
> > SIMD instructions.
> >
> > This patch set will introduce vectorized data path in vhost library. If
> > vectorized option is on, operations like descs check, descs writeback,
> > address translation will be accelerated by SIMD instructions. On skylake
> > server, it can bring 6% performance gain in loopback case and around 4%
> > performance gain in PvP case.
> 
> IMHO, 4% gain on PVP is not a significant gain if we compare to the
> added complexity. Moreover, I guess this is 4% gain with testpmd-based
> PVP? If this is the case it may be even lower with OVS-DPDK PVP
> benchmark, I will try to do a benchmark this week.
> 
> Thanks,
> Maxime
> 
> > Vhost application can choose whether using vectorized acceleration, just
> > like external buffer feature. If platform or ring format not support
> > vectorized function, vhost will fallback to use default batch function.
> > There will be no impact in current data path.
> >
> > v3:
> > * rename vectorized datapath file
> > * eliminate the impact when avx512 disabled
> > * dynamically allocate memory regions structure
> > * remove unlikely hint for in_order
> >
> > v2:
> > * add vIOMMU support
> > * add dequeue offloading
> > * rebase code
> >
> > Marvin Liu (5):
> >   vhost: add vectorized data path
> >   vhost: reuse packed ring functions
> >   vhost: prepare memory regions addresses
> >   vhost: add packed ring vectorized dequeue
> >   vhost: add packed ring vectorized enqueue
> >
> >  doc/guides/nics/vhost.rst   |   5 +
> >  doc/guides/prog_guide/vhost_lib.rst |  12 +
> >  drivers/net/vhost/rte_eth_vhost.c   |  17 +-
> >  lib/librte_vhost/meson.build|  16 ++
> >  lib/librte_vhost/rte_vhost.h|   1 +
> >  lib/librte_vhost/socket.c   |   5 +
> >  lib/librte_vhost/vhost.c|  11 +
> >  lib/librte_vhost/vhost.h| 239 +++
> >  lib/librte_vhost/vhost_user.c   |  26 +++
> >  lib/librte_vhost/virtio_net.c   | 258 -
> >  lib/librte_vhost/virtio_net_avx.c   | 344 
> >  11 files changed, 718 insertions(+), 216 deletions(-)
> >  create mode 100644 lib/librte_vhost/virtio_net_avx.c
> >



Re: [dpdk-dev] [PATCH v2] config: enable packet data prefetch

2020-10-15 Thread Liu, Yong


> -Original Message-
> From: Honnappa Nagarahalli 
> Sent: Thursday, October 15, 2020 12:10 PM
> To: Liu, Yong ; tho...@monjalon.net
> Cc: Richardson, Bruce ;
> step...@networkplumber.org; dev@dpdk.org;
> david.march...@redhat.com; Yigit, Ferruh ;
> maxime.coque...@redhat.com; David Christensen
> ; Ruifeng Wang ; nd
> ; Honnappa Nagarahalli ;
> nd 
> Subject: RE: [dpdk-dev] [PATCH v2] config: enable packet data prefetch
> 
> 
> 
> > >
> > > 23/09/2020 03:51, Marvin Liu:
> > > > Data prefetch instruction can preload data into cpu’s hierarchical
> > > > cache before data access. Virtualized data paths like virtio
> > > utilized this feature for acceleration. Since most modern CPUs support
> > > the prefetch function, we can enable packet data prefetch by default.
> > > >
> > > > Signed-off-by: Marvin Liu 
> > > > ---
> > > > +#define RTE_PMD_PACKET_PREFETCH 1
> > >
> > > We could also remove the related #ifdefs.
> > >
> > > What can be the drawback of always enable those prefetches?
> > >
> >
> > Hi Thomas,
> > I think the potential drawback is that the current prefetch locations
> > cannot guarantee the best performance across different platforms.
> Then, does it make sense to enable this by default?
> 

Right now, most of the prefetch actions are placed after the data pointer
is known to be valid. I think this methodology can benefit all platforms.
It's hard to say that it is the best choice for everyone, but I have no
better solution in mind.
At the very least, we need to allow users to enable packet data prefetch.

Regards,
Marvin
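
As a concrete illustration of that placement, here is a minimal sketch of
an Rx loop that prefetches one packet ahead, only after the pointer is
known to be valid (process_packet is a hypothetical stand-in):

    #include <rte_mbuf.h>
    #include <rte_prefetch.h>

    static inline void
    rx_loop_sketch(struct rte_mbuf **pkts, uint16_t nb,
                   void (*process_packet)(struct rte_mbuf *))
    {
            uint16_t i;

            for (i = 0; i < nb; i++) {
                    /* Preload the next packet's data into cache while the
                     * current one is processed; pkts[i + 1] is already a
                     * valid pointer here. */
                    if (i + 1 < nb)
                            rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 1],
                                            void *));
                    process_packet(pkts[i]);
            }
    }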

> > Each developer has tuned performance by adding prefetch instructions
> > and verifying the result on their own platform.
> > So prefetch placement is platform-specific, and it will be hard for
> > developers to compare results across platforms.
> >
> > Thanks,
> > Marvin


Re: [dpdk-dev] [PATCH v2] config: enable packet data prefetch

2020-10-14 Thread Liu, Yong


> -Original Message-
> From: Thomas Monjalon 
> Sent: Thursday, October 15, 2020 6:03 AM
> To: Liu, Yong 
> Cc: Richardson, Bruce ;
> step...@networkplumber.org; dev@dpdk.org;
> david.march...@redhat.com; Yigit, Ferruh ;
> maxime.coque...@redhat.com; honnappa.nagaraha...@arm.com; David
> Christensen ; ruifeng.w...@arm.com
> Subject: Re: [dpdk-dev] [PATCH v2] config: enable packet data prefetch
> 
> 23/09/2020 03:51, Marvin Liu:
> > Data prefetch instruction can preload data into cpu’s hierarchical
> > cache before data access. Virtualized data paths like virtio utilized
> > this feature for acceleration. Since most modern CPUs support the
> > prefetch function, we can enable packet data prefetch by default.
> >
> > Signed-off-by: Marvin Liu 
> > ---
> > +#define RTE_PMD_PACKET_PREFETCH 1
> 
> We could also remove the related #ifdefs.
> 
> What can be the drawback of always enable those prefetches?
> 

Hi Thomas,
I think the potential drawback is that the current prefetch locations
cannot guarantee the best performance across different platforms.
Each developer has tuned performance by adding prefetch instructions and
verifying the result on their own platform.
So prefetch placement is platform-specific, and it will be hard for
developers to compare results across platforms.

Thanks,
Marvin


Re: [dpdk-dev] [PATCH v3 0/5] vhost add vectorized data path

2020-10-12 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, October 12, 2020 5:57 PM
> To: Liu, Yong ; Xia, Chenbo ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3 0/5] vhost add vectorized data path
> 
> Hi Marvin,
> 
> On 10/12/20 11:10 AM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Monday, October 12, 2020 4:22 PM
> >> To: Liu, Yong ; Xia, Chenbo
> ;
> >> Wang, Zhihong 
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v3 0/5] vhost add vectorized data path
> >>
> >> Hi Marvin,
> >>
> >> On 10/9/20 10:14 AM, Marvin Liu wrote:
> >>> Packed ring format is imported since virtio spec 1.1. All descriptors
> >>> are compacted into one single ring when packed ring format is on. It is
> >>> straight forward that ring operations can be accelerated by utilizing
> >>> SIMD instructions.
> >>>
> >>> This patch set will introduce vectorized data path in vhost library. If
> >>> vectorized option is on, operations like descs check, descs writeback,
> >>> address translation will be accelerated by SIMD instructions. On skylake
> >>> server, it can bring 6% performance gain in loopback case and around 4%
> >>> performance gain in PvP case.
> >>
> >> IMHO, 4% gain on PVP is not a significant gain if we compare to the
> >> added complexity. Moreover, I guess this is 4% gain with testpmd-based
> >> PVP? If this is the case it may be even lower with OVS-DPDK PVP
> >> benchmark, I will try to do a benchmark this week.
> >>
> >
> > Maxime,
> > I have observed around a 3% gain with OVS-DPDK on the first version,
> > but the number is not reliable as the datapath has since changed.
> > I will try again after fixing the OVS integration issue with the latest
> > DPDK.
> 
> Thanks for the information.
> 
> Also, wouldn't using AVX512 lower the CPU frequency?
> If so, could it have an impact on the workload running on the other
> CPUs?
> 

All AVX512 instructions used in vhost are lightweight ones, so the CPU
frequency won't be affected.
Theoretically, system performance won't be affected as long as only
lightweight instructions are used.

Thanks.

> Thanks,
> Maxime
> 
> >> Thanks,
> >> Maxime
> >>
> >>> Vhost application can choose whether using vectorized acceleration,
> just
> >>> like external buffer feature. If platform or ring format not support
> >>> vectorized function, vhost will fallback to use default batch function.
> >>> There will be no impact in current data path.
> >>>
> >>> v3:
> >>> * rename vectorized datapath file
> >>> * eliminate the impact when avx512 disabled
> >>> * dynamically allocate memory regions structure
> >>> * remove unlikely hint for in_order
> >>>
> >>> v2:
> >>> * add vIOMMU support
> >>> * add dequeue offloading
> >>> * rebase code
> >>>
> >>> Marvin Liu (5):
> >>>   vhost: add vectorized data path
> >>>   vhost: reuse packed ring functions
> >>>   vhost: prepare memory regions addresses
> >>>   vhost: add packed ring vectorized dequeue
> >>>   vhost: add packed ring vectorized enqueue
> >>>
> >>>  doc/guides/nics/vhost.rst   |   5 +
> >>>  doc/guides/prog_guide/vhost_lib.rst |  12 +
> >>>  drivers/net/vhost/rte_eth_vhost.c   |  17 +-
> >>>  lib/librte_vhost/meson.build|  16 ++
> >>>  lib/librte_vhost/rte_vhost.h|   1 +
> >>>  lib/librte_vhost/socket.c   |   5 +
> >>>  lib/librte_vhost/vhost.c|  11 +
> >>>  lib/librte_vhost/vhost.h| 239 +++
> >>>  lib/librte_vhost/vhost_user.c   |  26 +++
> >>>  lib/librte_vhost/virtio_net.c   | 258 -
> >>>  lib/librte_vhost/virtio_net_avx.c   | 344
> 
> >>>  11 files changed, 718 insertions(+), 216 deletions(-)
> >>>  create mode 100644 lib/librte_vhost/virtio_net_avx.c
> >>>
> >



Re: [dpdk-dev] [PATCH v3 0/5] vhost add vectorized data path

2020-10-12 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, October 12, 2020 4:22 PM
> To: Liu, Yong ; Xia, Chenbo ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3 0/5] vhost add vectorized data path
> 
> Hi Marvin,
> 
> On 10/9/20 10:14 AM, Marvin Liu wrote:
> > Packed ring format is imported since virtio spec 1.1. All descriptors
> > are compacted into one single ring when packed ring format is on. It is
> > straight forward that ring operations can be accelerated by utilizing
> > SIMD instructions.
> >
> > This patch set will introduce vectorized data path in vhost library. If
> > vectorized option is on, operations like descs check, descs writeback,
> > address translation will be accelerated by SIMD instructions. On skylake
> > server, it can bring 6% performance gain in loopback case and around 4%
> > performance gain in PvP case.
> 
> IMHO, 4% gain on PVP is not a significant gain if we compare to the
> added complexity. Moreover, I guess this is 4% gain with testpmd-based
> PVP? If this is the case it may be even lower with OVS-DPDK PVP
> benchmark, I will try to do a benchmark this week.
> 

Maxime, 
I have observed around a 3% gain with OVS-DPDK on the first version, but
the number is not reliable as the datapath has since changed.
I will try again after fixing the OVS integration issue with the latest
DPDK.

> Thanks,
> Maxime
> 
> > Vhost application can choose whether using vectorized acceleration, just
> > like external buffer feature. If platform or ring format not support
> > vectorized function, vhost will fallback to use default batch function.
> > There will be no impact in current data path.
> >
> > v3:
> > * rename vectorized datapath file
> > * eliminate the impact when avx512 disabled
> > * dynamically allocate memory regions structure
> > * remove unlikely hint for in_order
> >
> > v2:
> > * add vIOMMU support
> > * add dequeue offloading
> > * rebase code
> >
> > Marvin Liu (5):
> >   vhost: add vectorized data path
> >   vhost: reuse packed ring functions
> >   vhost: prepare memory regions addresses
> >   vhost: add packed ring vectorized dequeue
> >   vhost: add packed ring vectorized enqueue
> >
> >  doc/guides/nics/vhost.rst   |   5 +
> >  doc/guides/prog_guide/vhost_lib.rst |  12 +
> >  drivers/net/vhost/rte_eth_vhost.c   |  17 +-
> >  lib/librte_vhost/meson.build|  16 ++
> >  lib/librte_vhost/rte_vhost.h|   1 +
> >  lib/librte_vhost/socket.c   |   5 +
> >  lib/librte_vhost/vhost.c|  11 +
> >  lib/librte_vhost/vhost.h| 239 +++
> >  lib/librte_vhost/vhost_user.c   |  26 +++
> >  lib/librte_vhost/virtio_net.c   | 258 -
> >  lib/librte_vhost/virtio_net_avx.c   | 344 
> >  11 files changed, 718 insertions(+), 216 deletions(-)
> >  create mode 100644 lib/librte_vhost/virtio_net_avx.c
> >



Re: [dpdk-dev] [PATCH v2 4/5] vhost: add packed ring vectorized dequeue

2020-10-09 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, October 6, 2020 11:19 PM
> To: Liu, Yong ; Xia, Chenbo ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v2 4/5] vhost: add packed ring vectorized dequeue
> 
> 
> 
> On 9/21/20 8:48 AM, Marvin Liu wrote:
> > Optimize vhost packed ring dequeue path with SIMD instructions. Four
> > descriptors status check and writeback are batched handled with AVX512
> > instructions. Address translation operations are also accelerated by
> > AVX512 instructions.
> >
> > If platform or compiler not support vectorization, will fallback to
> > default path.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> > index cc9aa65c67..c1481802d7 100644
> > --- a/lib/librte_vhost/meson.build
> > +++ b/lib/librte_vhost/meson.build
> > @@ -8,6 +8,22 @@ endif
> >  if has_libnuma == 1
> > dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true)
> >  endif
> > +
> > +if arch_subdir == 'x86'
> > +if not machine_args.contains('-mno-avx512f')
> > +if cc.has_argument('-mavx512f') and cc.has_argument('-
> mavx512vl') and cc.has_argument('-mavx512bw')
> > +cflags += ['-DCC_AVX512_SUPPORT']
> > +vhost_avx512_lib = 
> > static_library('vhost_avx512_lib',
> > +  'vhost_vec_avx.c',
> > +  dependencies: 
> > [static_rte_eal,
> static_rte_mempool,
> > +  static_rte_mbuf, 
> > static_rte_ethdev,
> static_rte_net],
> > +  include_directories: 
> > includes,
> > +  c_args: [cflags, 
> > '-mavx512f', '-mavx512bw', '-
> mavx512vl'])
> > +objs += 
> > vhost_avx512_lib.extract_objects('vhost_vec_avx.c')
> > +endif
> > +endif
> > +endif
> > +
> >  if (toolchain == 'gcc' and cc.version().version_compare('>=8.3.0'))
> > cflags += '-DVHOST_GCC_UNROLL_PRAGMA'
> >  elif (toolchain == 'clang' and cc.version().version_compare('>=3.7.0'))
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 4a81f18f01..fc7daf2145 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -1124,4 +1124,12 @@ virtio_dev_pktmbuf_alloc(struct virtio_net
> *dev, struct rte_mempool *mp,
> > return NULL;
> >  }
> >
> > +int
> > +vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev,
> > +struct vhost_virtqueue *vq,
> > +struct rte_mempool *mbuf_pool,
> > +struct rte_mbuf **pkts,
> > +uint16_t avail_idx,
> > +uintptr_t *desc_addrs,
> > +uint16_t *ids);
> >  #endif /* _VHOST_NET_CDEV_H_ */
> > diff --git a/lib/librte_vhost/vhost_vec_avx.c
> b/lib/librte_vhost/vhost_vec_avx.c
> > new file mode 100644
> > index 00..dc5322d002
> > --- /dev/null
> > +++ b/lib/librte_vhost/vhost_vec_avx.c
> > @@ -0,0 +1,181 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2010-2016 Intel Corporation
> > + */
> > +#include 
> > +
> > +#include "vhost.h"
> > +
> > +#define BYTE_SIZE 8
> > +/* reference count offset in mbuf rearm data */
> > +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
> > +   offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +/* segment number offset in mbuf rearm data */
> > +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
> > +   offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +
> > +/* default rearm data */
> > +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \
> > +   1ULL << REFCNT_BITS_OFFSET)
> > +
> > +#define DESC_FLAGS_SHORT_OFFSET (offsetof(struct vring_packed_desc,
> flags) / \
> > +   sizeof(uint16_t))
> > +
> > +#define DESC_FLAGS_SHORT_SIZE (sizeof(struct vring_packed_desc) / \
> > +   sizeof(uint16_t))
> > +#define BATCH_FLAGS_MASK (1 << DESC_FLAGS_SHORT_OFFSE
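
To illustrate the batching idea: four 16-byte packed descriptors fit
exactly into one 64-byte ZMM register, so one load plus one masked compare
can replace four scalar checks. A simplified standalone sketch (the lane
mask and names are illustrative, not the patch's actual constants):

    #include <immintrin.h>
    #include <stdint.h>

    struct vring_packed_desc {
            uint64_t addr;
            uint32_t len;
            uint16_t id;
            uint16_t flags;
    };

    /* "flags" is the 8th uint16_t of each descriptor, so in a 32-lane
     * 16-bit view of 4 descriptors it occupies lanes 7, 15, 23 and 31. */
    #define FLAG_LANES 0x80808080u

    /* Return non-zero when all four descriptors starting at "descs"
     * carry exactly the expected flags value. */
    static inline int
    batch_flags_match(const struct vring_packed_desc *descs,
                      uint16_t expected)
    {
            __m512i v = _mm512_loadu_si512(descs);   /* 4 descs = 64 B */
            __m512i ref = _mm512_set1_epi16((short)expected);
            __mmask32 eq = _mm512_cmpeq_epi16_mask(v, ref);

            return (eq & FLAG_LANES) == FLAG_LANES;
    }

(Compile with -mavx512f -mavx512bw, matching the build flags added above.)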

Re: [dpdk-dev] [PATCH v2 5/5] vhost: add packed ring vectorized enqueue

2020-10-08 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, October 6, 2020 11:00 PM
> To: Liu, Yong ; Xia, Chenbo ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v2 5/5] vhost: add packed ring vectorized enqueue
> 
> 
> 
> On 9/21/20 8:48 AM, Marvin Liu wrote:
> > Optimize vhost packed ring enqueue path with SIMD instructions. Four
> > descriptors status and length are batched handled with AVX512
> > instructions. Address translation operations are also accelerated
> > by AVX512 instructions.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index fc7daf2145..b78b2c5c1b 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -1132,4 +1132,10 @@ vhost_reserve_avail_batch_packed_avx(struct
> virtio_net *dev,
> >  uint16_t avail_idx,
> >  uintptr_t *desc_addrs,
> >  uint16_t *ids);
> > +
> > +int
> > +virtio_dev_rx_batch_packed_avx(struct virtio_net *dev,
> > +  struct vhost_virtqueue *vq,
> > +  struct rte_mbuf **pkts);
> > +
> >  #endif /* _VHOST_NET_CDEV_H_ */
> > diff --git a/lib/librte_vhost/vhost_vec_avx.c
> b/lib/librte_vhost/vhost_vec_avx.c
> > index dc5322d002..7d2250ed86 100644
> > --- a/lib/librte_vhost/vhost_vec_avx.c
> > +++ b/lib/librte_vhost/vhost_vec_avx.c
> > @@ -35,9 +35,15 @@
> >  #define PACKED_AVAIL_FLAG ((0ULL | VRING_DESC_F_AVAIL) <<
> FLAGS_BITS_OFFSET)
> >  #define PACKED_AVAIL_FLAG_WRAP ((0ULL | VRING_DESC_F_USED) << \
> > FLAGS_BITS_OFFSET)
> > +#define PACKED_WRITE_AVAIL_FLAG (PACKED_AVAIL_FLAG | \
> > +   ((0ULL | VRING_DESC_F_WRITE) << FLAGS_BITS_OFFSET))
> > +#define PACKED_WRITE_AVAIL_FLAG_WRAP
> (PACKED_AVAIL_FLAG_WRAP | \
> > +   ((0ULL | VRING_DESC_F_WRITE) << FLAGS_BITS_OFFSET))
> >
> >  #define DESC_FLAGS_POS 0xaa
> >  #define MBUF_LENS_POS 0x
> > +#define DESC_LENS_POS 0x
> > +#define DESC_LENS_FLAGS_POS 0xB0B0B0B0
> >
> >  int
> >  vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev,
> > @@ -179,3 +185,154 @@ vhost_reserve_avail_batch_packed_avx(struct
> virtio_net *dev,
> >
> > return -1;
> >  }
> > +
> > +int
> > +virtio_dev_rx_batch_packed_avx(struct virtio_net *dev,
> > +  struct vhost_virtqueue *vq,
> > +  struct rte_mbuf **pkts)
> > +{
> > +   struct vring_packed_desc *descs = vq->desc_packed;
> > +   uint16_t avail_idx = vq->last_avail_idx;
> > +   uint64_t desc_addrs[PACKED_BATCH_SIZE];
> > +   uint32_t buf_offset = dev->vhost_hlen;
> > +   uint32_t desc_status;
> > +   uint64_t lens[PACKED_BATCH_SIZE];
> > +   uint16_t i;
> > +   void *desc_addr;
> > +   uint8_t cmp_low, cmp_high, cmp_result;
> > +
> > +   if (unlikely(avail_idx & PACKED_BATCH_MASK))
> > +   return -1;
> 
> Same comment as for patch 4. Packed ring size may not be a pow2.
> 
Thanks, will fix in next version.

> > +   /* check refcnt and nb_segs */
> > +   __m256i mbuf_ref = _mm256_set1_epi64x(DEFAULT_REARM_DATA);
> > +
> > +   /* load four mbufs rearm data */
> > +   __m256i mbufs = _mm256_set_epi64x(
> > +   *pkts[3]->rearm_data,
> > +   *pkts[2]->rearm_data,
> > +   *pkts[1]->rearm_data,
> > +   *pkts[0]->rearm_data);
> > +
> > +   uint16_t cmp = _mm256_cmpneq_epu16_mask(mbufs, mbuf_ref);
> > +   if (cmp & MBUF_LENS_POS)
> > +   return -1;
> > +
> > +   /* check desc status */
> > +   desc_addr = &vq->desc_packed[avail_idx];
> > +   __m512i desc_vec = _mm512_loadu_si512(desc_addr);
> > +
> > +   __m512i avail_flag_vec;
> > +   __m512i used_flag_vec;
> > +   if (vq->avail_wrap_counter) {
> > +#if defined(RTE_ARCH_I686)
> 
> Is supporting AVX512 on i686 really useful/necessary?
> 
It is useless from a functional point of view; it is only there so that
compilation succeeds when the i686 build is enabled.

> > +   avail_flag_vec =
> _mm512_set4_epi64(PACKED_WRITE_AVAIL_FLAG,
> > +   0x0, PACKED_WRITE_AVAIL_FLAG,
> 0x0);
> > +   used_flag_vec = _mm512_set4_epi64(PACKED_FLAGS_MASK,
> 0x0,
> > +   PACKED_FLAGS_MASK, 0x0);
> > +#else

Re: [dpdk-dev] [PATCH v2 4/5] vhost: add packed ring vectorized dequeue

2020-10-08 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, October 6, 2020 10:59 PM
> To: Liu, Yong ; Xia, Chenbo ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v2 4/5] vhost: add packed ring vectorized dequeue
> 
> 
> 
> On 9/21/20 8:48 AM, Marvin Liu wrote:
> > Optimize vhost packed ring dequeue path with SIMD instructions. Four
> > descriptors status check and writeback are batched handled with AVX512
> > instructions. Address translation operations are also accelerated by
> > AVX512 instructions.
> >
> > If platform or compiler not support vectorization, will fallback to
> > default path.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> > index cc9aa65c67..c1481802d7 100644
> > --- a/lib/librte_vhost/meson.build
> > +++ b/lib/librte_vhost/meson.build
> > @@ -8,6 +8,22 @@ endif
> >  if has_libnuma == 1
> > dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true)
> >  endif
> > +
> > +if arch_subdir == 'x86'
> > +if not machine_args.contains('-mno-avx512f')
> > +if cc.has_argument('-mavx512f') and cc.has_argument('-
> mavx512vl') and cc.has_argument('-mavx512bw')
> > +cflags += ['-DCC_AVX512_SUPPORT']
> > +vhost_avx512_lib = 
> > static_library('vhost_avx512_lib',
> > +  'vhost_vec_avx.c',
> > +  dependencies: 
> > [static_rte_eal,
> static_rte_mempool,
> > +  static_rte_mbuf, 
> > static_rte_ethdev,
> static_rte_net],
> > +  include_directories: 
> > includes,
> > +  c_args: [cflags, 
> > '-mavx512f', '-mavx512bw', '-
> mavx512vl'])
> > +objs += 
> > vhost_avx512_lib.extract_objects('vhost_vec_avx.c')
> > +endif
> > +endif
> > +endif
> 
> Not a Meson expert, but wonder how I can disable CC_AVX512_SUPPORT.
> I checked the DPDK doc, but I could not find how to pass -mno-avx512f to
> the machine_args.

Hi Maxime,
By now mno-avx512f flag will be set only if binutils check script found issue 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90028.
So avx512 code will be built-in if compiler support that. There's alternative 
way is that introduce one new option in meson build. 

Thanks,
Marvin

> 
> > +
> >  if (toolchain == 'gcc' and cc.version().version_compare('>=8.3.0'))
> > cflags += '-DVHOST_GCC_UNROLL_PRAGMA'
> >  elif (toolchain == 'clang' and cc.version().version_compare('>=3.7.0'))
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 4a81f18f01..fc7daf2145 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -1124,4 +1124,12 @@ virtio_dev_pktmbuf_alloc(struct virtio_net
> *dev, struct rte_mempool *mp,
> > return NULL;
> >  }
> >
> > +int
> > +vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev,
> > +struct vhost_virtqueue *vq,
> > +struct rte_mempool *mbuf_pool,
> > +struct rte_mbuf **pkts,
> > +uint16_t avail_idx,
> > +uintptr_t *desc_addrs,
> > +uint16_t *ids);
> >  #endif /* _VHOST_NET_CDEV_H_ */
> > diff --git a/lib/librte_vhost/vhost_vec_avx.c
> b/lib/librte_vhost/vhost_vec_avx.c
> > new file mode 100644
> > index 00..dc5322d002
> > --- /dev/null
> > +++ b/lib/librte_vhost/vhost_vec_avx.c
> 
> For consistency it should be prefixed with virtio_net, not vhost.
> 
> > @@ -0,0 +1,181 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2010-2016 Intel Corporation
> > + */
> > +#include 
> > +
> > +#include "vhost.h"
> > +
> > +#define BYTE_SIZE 8
> > +/* reference count offset in mbuf rearm data */
> > +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
> > +   offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +/* segment number offset in mbuf rearm data */
> > +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
> > +   offsetof

Re: [dpdk-dev] [PATCH v2 0/5] vhost add vectorized data path

2020-10-07 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, October 6, 2020 9:34 PM
> To: Liu, Yong ; Xia, Chenbo ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v2 0/5] vhost add vectorized data path
> 
> Hi,
> 
> On 9/21/20 8:48 AM, Marvin Liu wrote:
> > Packed ring format is imported since virtio spec 1.1. All descriptors
> > are compacted into one single ring when packed ring format is on. It is
> > straight forward that ring operations can be accelerated by utilizing
> > SIMD instructions.
> >
> > This patch set will introduce vectorized data path in vhost library. If
> > vectorized option is on, operations like descs check, descs writeback,
> > address translation will be accelerated by SIMD instructions. Vhost
> > application can choose whether using vectorized acceleration, it is
> > like external buffer and zero copy features.
> >
> > If platform or ring format not support vectorized function, vhost will
> > fallback to use default batch function. There will be no impact in current
> > data path.
> 
> As a pre-requisite, I'd like some performance numbers in both loopback
> and PVP to figure out if adding such complexity is worth it, given we
> will have to support it for at least one year.
> 


Thanks for the suggestion; I will add some reference numbers in the next version.

> Thanks,
> Maxime
> 
> > v2:
> > * add vIOMMU support
> > * add dequeue offloading
> > * rebase code
> >
> > Marvin Liu (5):
> >   vhost: add vectorized data path
> >   vhost: reuse packed ring functions
> >   vhost: prepare memory regions addresses
> >   vhost: add packed ring vectorized dequeue
> >   vhost: add packed ring vectorized enqueue
> >
> >  doc/guides/nics/vhost.rst   |   5 +
> >  doc/guides/prog_guide/vhost_lib.rst |  12 +
> >  drivers/net/vhost/rte_eth_vhost.c   |  17 +-
> >  lib/librte_vhost/meson.build|  16 ++
> >  lib/librte_vhost/rte_vhost.h|   1 +
> >  lib/librte_vhost/socket.c   |   5 +
> >  lib/librte_vhost/vhost.c|  11 +
> >  lib/librte_vhost/vhost.h| 235 +++
> >  lib/librte_vhost/vhost_user.c   |  11 +
> >  lib/librte_vhost/vhost_vec_avx.c| 338
> 
> >  lib/librte_vhost/virtio_net.c   | 257 -
> >  11 files changed, 692 insertions(+), 216 deletions(-)
> >  create mode 100644 lib/librte_vhost/vhost_vec_avx.c
> >



Re: [dpdk-dev] [PATCH] build: enable packet data prefetch

2020-09-22 Thread Liu, Yong


> -Original Message-
> From: Stephen Hemminger 
> Sent: Tuesday, September 22, 2020 10:12 PM
> To: Liu, Yong 
> Cc: Richardson, Bruce ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] build: enable packet data prefetch
> 
> On Tue, 22 Sep 2020 16:21:35 +0800
> Marvin Liu  wrote:
> 
> > Data prefetch instruction can preload data into cpu’s hierarchical
> > cache before data access. Virtualized data paths like virtio utilized
> > this feature for acceleration. Since most modern CPUs support the
> > prefetch function, we can enable packet data prefetch by default.
> >
> > Signed-off-by: Marvin Liu 
> >
> 
> With meson, the project has been using rte_config.h for this.

Thanks a lot, I will send a v2 with the change.

Regards,
Marvin


Re: [dpdk-dev] [PATCH v1 4/5] vhost: add packed ring vectorized dequeue

2020-09-21 Thread Liu, Yong


> -Original Message-
> From: Liu, Yong
> Sent: Monday, September 21, 2020 2:27 PM
> To: 'Maxime Coquelin' ; Xia, Chenbo
> ; Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v1 4/5] vhost: add packed ring vectorized dequeue
> 
> 
> 
> > -Original Message-
> > From: Maxime Coquelin 
> > Sent: Friday, September 18, 2020 9:45 PM
> > To: Liu, Yong ; Xia, Chenbo ;
> > Wang, Zhihong 
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH v1 4/5] vhost: add packed ring vectorized dequeue
> >
> >
> >
> > On 8/19/20 5:24 AM, Marvin Liu wrote:
> > > Optimize vhost packed ring dequeue path with SIMD instructions. Four
> > > descriptors status check and writeback are batched handled with
> AVX512
> > > instructions. Address translation operations are also accelerated by
> > > AVX512 instructions.
> > >
> > > If platform or compiler not support vectorization, will fallback to
> > > default path.
> > >
> > > Signed-off-by: Marvin Liu 
> > >
> > > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > > index 4f2f3e47da..c0cd7d498f 100644
> > > --- a/lib/librte_vhost/Makefile
> > > +++ b/lib/librte_vhost/Makefile
> > > @@ -31,6 +31,13 @@ CFLAGS += -DVHOST_ICC_UNROLL_PRAGMA
> > >  endif
> > >  endif
> > >
> > > +ifneq ($(FORCE_DISABLE_AVX512), y)
> > > +CC_AVX512_SUPPORT=\
> > > +$(shell $(CC) -march=native -dM -E - </dev/null 2>&1 | \
> > > +sed '/./{H;$$!d} ; x ; /AVX512F/!d; /AVX512BW/!d; /AVX512VL/!d' |
> \
> > > +grep -q AVX512 && echo 1)
> > > +endif
> > > +
> > >  ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y)
> > >  LDLIBS += -lnuma
> > >  endif
> > > @@ -40,6 +47,12 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -
> > lrte_ethdev -lrte_net
> > >  SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c
> \
> > >   vhost_user.c virtio_net.c vdpa.c
> > >
> > > +ifeq ($(CC_AVX512_SUPPORT), 1)
> > > +CFLAGS += -DCC_AVX512_SUPPORT
> > > +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_vec_avx.c
> > > +CFLAGS_vhost_vec_avx.o += -mavx512f -mavx512bw -mavx512vl
> > > +endif
> > > +
> > >  # install includes
> > >  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> > rte_vdpa.h \
> > >   rte_vdpa_dev.h
> > rte_vhost_async.h
> > > diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> > > index cc9aa65c67..c1481802d7 100644
> > > --- a/lib/librte_vhost/meson.build
> > > +++ b/lib/librte_vhost/meson.build
> > > @@ -8,6 +8,22 @@ endif
> > >  if has_libnuma == 1
> > >   dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true)
> > >  endif
> > > +
> > > +if arch_subdir == 'x86'
> > > +if not machine_args.contains('-mno-avx512f')
> > > +if cc.has_argument('-mavx512f') and cc.has_argument('-
> > mavx512vl') and cc.has_argument('-mavx512bw')
> > > +cflags += ['-DCC_AVX512_SUPPORT']
> > > +vhost_avx512_lib = 
> > > static_library('vhost_avx512_lib',
> > > +  'vhost_vec_avx.c',
> > > +  dependencies: 
> > > [static_rte_eal,
> > static_rte_mempool,
> > > +  static_rte_mbuf, 
> > > static_rte_ethdev,
> > static_rte_net],
> > > +  include_directories: 
> > > includes,
> > > +  c_args: [cflags, 
> > > '-mavx512f', '-mavx512bw', '-
> > mavx512vl'])
> > > +objs +=
> vhost_avx512_lib.extract_objects('vhost_vec_avx.c')
> > > +endif
> > > +endif
> > > +endif
> > > +
> > >  if (toolchain == 'gcc' and cc.version().version_compare('>=8.3.0'))
> > >   cflags += '-DVHOST_GCC_UNROLL_PRAGMA'
> > >  elif (toolchain == 'clang' and cc.version().version_compare('>=3.7.0'))
> > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > 

Re: [dpdk-dev] [PATCH v1 4/5] vhost: add packed ring vectorized dequeue

2020-09-20 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Friday, September 18, 2020 9:45 PM
> To: Liu, Yong ; Xia, Chenbo ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 4/5] vhost: add packed ring vectorized dequeue
> 
> 
> 
> On 8/19/20 5:24 AM, Marvin Liu wrote:
> > Optimize vhost packed ring dequeue path with SIMD instructions. Four
> > descriptors status check and writeback are batched handled with AVX512
> > instructions. Address translation operations are also accelerated by
> > AVX512 instructions.
> >
> > If platform or compiler not support vectorization, will fallback to
> > default path.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index 4f2f3e47da..c0cd7d498f 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -31,6 +31,13 @@ CFLAGS += -DVHOST_ICC_UNROLL_PRAGMA
> >  endif
> >  endif
> >
> > +ifneq ($(FORCE_DISABLE_AVX512), y)
> > +CC_AVX512_SUPPORT=\
> > +$(shell $(CC) -march=native -dM -E - </dev/null 2>&1 | \
> > +sed '/./{H;$$!d} ; x ; /AVX512F/!d; /AVX512BW/!d; /AVX512VL/!d' | \
> > +grep -q AVX512 && echo 1)
> > +endif
> > +
> >  ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y)
> >  LDLIBS += -lnuma
> >  endif
> > @@ -40,6 +47,12 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -
> lrte_ethdev -lrte_net
> >  SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
> > vhost_user.c virtio_net.c vdpa.c
> >
> > +ifeq ($(CC_AVX512_SUPPORT), 1)
> > +CFLAGS += -DCC_AVX512_SUPPORT
> > +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_vec_avx.c
> > +CFLAGS_vhost_vec_avx.o += -mavx512f -mavx512bw -mavx512vl
> > +endif
> > +
> >  # install includes
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> rte_vdpa.h \
> > rte_vdpa_dev.h
> rte_vhost_async.h
> > diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> > index cc9aa65c67..c1481802d7 100644
> > --- a/lib/librte_vhost/meson.build
> > +++ b/lib/librte_vhost/meson.build
> > @@ -8,6 +8,22 @@ endif
> >  if has_libnuma == 1
> > dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true)
> >  endif
> > +
> > +if arch_subdir == 'x86'
> > +if not machine_args.contains('-mno-avx512f')
> > +if cc.has_argument('-mavx512f') and cc.has_argument('-
> mavx512vl') and cc.has_argument('-mavx512bw')
> > +cflags += ['-DCC_AVX512_SUPPORT']
> > +vhost_avx512_lib = 
> > static_library('vhost_avx512_lib',
> > +  'vhost_vec_avx.c',
> > +  dependencies: 
> > [static_rte_eal,
> static_rte_mempool,
> > +  static_rte_mbuf, 
> > static_rte_ethdev,
> static_rte_net],
> > +  include_directories: 
> > includes,
> > +  c_args: [cflags, 
> > '-mavx512f', '-mavx512bw', '-
> mavx512vl'])
> > +objs += 
> > vhost_avx512_lib.extract_objects('vhost_vec_avx.c')
> > +endif
> > +endif
> > +endif
> > +
> >  if (toolchain == 'gcc' and cc.version().version_compare('>=8.3.0'))
> > cflags += '-DVHOST_GCC_UNROLL_PRAGMA'
> >  elif (toolchain == 'clang' and cc.version().version_compare('>=3.7.0'))
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 4a81f18f01..fc7daf2145 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -1124,4 +1124,12 @@ virtio_dev_pktmbuf_alloc(struct virtio_net
> *dev, struct rte_mempool *mp,
> > return NULL;
> >  }
> >
> > +int
> > +vhost_reserve_avail_batch_packed_avx(struct virtio_net *dev,
> > +struct vhost_virtqueue *vq,
> > +struct rte_mempool *mbuf_pool,
> > +struct rte_mbuf **pkts,
> > +uint16_t avail_idx,
> > +uintptr_t *desc_addrs,
> > +uint16_t *ids);
> >

Re: [dpdk-dev] [PATCH v4 1/2] vhost: introduce async enqueue registration API

2020-07-05 Thread Liu, Yong
Hi Patrick,
A few comments are inline; the others look fine to me.

Regards,
Marvin

> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 0d822d6..58ee3ef 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -332,8 +332,13 @@
>  {
>   if (vq_is_packed(dev))
>   rte_free(vq->shadow_used_packed);
> - else
> + else {
>   rte_free(vq->shadow_used_split);
> + if (vq->async_pkts_pending)
> + rte_free(vq->async_pkts_pending);
> + if (vq->async_pending_info)
> + rte_free(vq->async_pending_info);

Freeing is done here, but resetting the pointers to NULL and clearing the
async feature flag to 0 are missed.


> +int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id)
> +{
> + struct vhost_virtqueue *vq;
> + struct virtio_net *dev = get_device(vid);
> + int ret = -1;
> +
> + if (dev == NULL)
> + return ret;
> +
> + vq = dev->virtqueue[queue_id];
> +
> + if (vq == NULL)
> + return ret;
> +
> + ret = 0;
> + rte_spinlock_lock(&vq->access_lock);
> +
> + if (!vq->async_registered)
> + goto out;
> +
> + if (vq->async_pkts_inflight_n) {
> + VHOST_LOG_CONFIG(ERR, "Failed to unregister async
> channel. "
> + "async inflight packets must be completed before
> unregistration.\n");
> + ret = -1;
> + goto out;
> + }
> +
> + if (vq->async_pkts_pending) {
> + rte_free(vq->async_pkts_pending);
> + vq->async_pkts_pending = 0;
> + }
> +
> + if (vq->async_pending_info) {
> + rte_free(vq->async_pending_info);
> + vq->async_pending_info = 0;
> + }
> +

Please unify the check-and-free logic for the two async pending pointers, and
set the pointers to NULL after freeing; a minimal sketch is below.
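
A minimal sketch of the unified cleanup (vhost_free_async_mem is a
hypothetical helper name; rte_free() already accepts NULL, so the explicit
checks can be dropped). The same helper would also serve the free_vq hunk
commented on above:

static void
vhost_free_async_mem(struct vhost_virtqueue *vq)
{
        /* rte_free(NULL) is a no-op, so no NULL checks are needed */
        rte_free(vq->async_pkts_pending);
        vq->async_pkts_pending = NULL;
        rte_free(vq->async_pending_info);
        vq->async_pending_info = NULL;
}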

> + vq->async_ops.transfer_data = NULL;
> + vq->async_ops.check_completed_copies = NULL;
> + vq->async_registered = false;
> +
> +out:
> + rte_spinlock_unlock(&vq->access_lock);
> +
> + return ret;
> +}
> +



Re: [dpdk-dev] [PATCH v2 2/2] vhost: introduce async enqueue for split ring

2020-07-01 Thread Liu, Yong
> 
> +#define VHOST_ASYNC_BATCH_THRESHOLD 8
> +

It is not very clear why the batch number is 8. If the value comes from a
hardware requirement, it would be better to carry it in
rte_vhost_async_features.

> +
> +static __rte_noinline uint32_t
> +virtio_dev_rx_async_submit_split(struct virtio_net *dev,
> + struct vhost_virtqueue *vq, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint32_t count)
> +{
> + uint32_t pkt_idx = 0, pkt_burst_idx = 0;
> + uint16_t num_buffers;
> + struct buf_vector buf_vec[BUF_VECTOR_MAX];
> + uint16_t avail_head, last_idx, shadow_idx;
> +
> + struct rte_vhost_iov_iter *it_pool = vq->it_pool;
> + struct iovec *vec_pool = vq->vec_pool;
> + struct rte_vhost_async_desc tdes[MAX_PKT_BURST];
> + struct iovec *src_iovec = vec_pool;
> + struct iovec *dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
> + struct rte_vhost_iov_iter *src_it = it_pool;
> + struct rte_vhost_iov_iter *dst_it = it_pool + 1;
> + uint16_t n_free_slot, slot_idx;
> + int n_pkts = 0;
> +
> + avail_head = *((volatile uint16_t *)&vq->avail->idx);
> + last_idx = vq->last_avail_idx;
> + shadow_idx = vq->shadow_used_idx;
> +
> + /*
> +  * The ordering between avail index and
> +  * desc reads needs to be enforced.
> +  */
> + rte_smp_rmb();
> +
> + rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
> +
> + for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
> + uint32_t pkt_len = pkts[pkt_idx]->pkt_len + dev->vhost_hlen;
> + uint16_t nr_vec = 0;
> +
> + if (unlikely(reserve_avail_buf_split(dev, vq,
> + pkt_len, buf_vec,
> &num_buffers,
> + avail_head, &nr_vec) < 0)) {
> + VHOST_LOG_DATA(DEBUG,
> + "(%d) failed to get enough desc from
> vring\n",
> + dev->vid);
> + vq->shadow_used_idx -= num_buffers;
> + break;
> + }
> +
> + VHOST_LOG_DATA(DEBUG, "(%d) current index %d | end
> index %d\n",
> + dev->vid, vq->last_avail_idx,
> + vq->last_avail_idx + num_buffers);
> +
> + if (async_mbuf_to_desc(dev, vq, pkts[pkt_idx],
> + buf_vec, nr_vec, num_buffers,
> + src_iovec, dst_iovec, src_it, dst_it) < 0) {
> + vq->shadow_used_idx -= num_buffers;
> + break;
> + }
> +
> + slot_idx = (vq->async_pkts_idx + pkt_idx) & (vq->size - 1);
> + if (src_it->count) {
> + async_fill_des(&tdes[pkt_burst_idx], src_it, dst_it);
> + pkt_burst_idx++;
> + vq->async_pending_info[slot_idx] =
> + num_buffers | (src_it->nr_segs << 16);
> + src_iovec += src_it->nr_segs;
> + dst_iovec += dst_it->nr_segs;
> + src_it += 2;
> + dst_it += 2;

Patrick,
In my understanding, the nr_segs type can follow the nr_vec type (uint16_t).
That would shrink the data saved in async_pending_info from 64 bits to
32 bits. Since this information is used in the datapath, the smaller size
should give better performance.

It is also better to replace the bare integer 2 with a macro; a sketch of
both suggestions follows.
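
For illustration (all names below are hypothetical):

/* Pack both 16-bit values into one 32-bit slot: */
#define ASYNC_PENDING_INFO(nr_buffers, nr_segs) \
        ((uint32_t)(nr_buffers) | ((uint32_t)(nr_segs) << 16))
#define ASYNC_PENDING_NR_BUFFERS(info) ((uint32_t)(info) & 0xffff)
#define ASYNC_PENDING_NR_SEGS(info)    ((uint32_t)(info) >> 16)

/* Name the iterator stride instead of the bare "2": */
#define ASYNC_ITER_STRIDE 2 /* one src/dst iterator pair per packet */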

Thanks,
Marvin




Re: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path registration API

2020-06-18 Thread Liu, Yong



> -Original Message-
> From: Fu, Patrick 
> Sent: Thursday, June 18, 2020 5:09 PM
> To: Liu, Yong 
> Cc: Jiang, Cheng1 ; Liang, Cunming
> ; dev@dpdk.org; maxime.coque...@redhat.com;
> Xia, Chenbo ; Wang, Zhihong
> ; Ye, Xiaolong 
> Subject: RE: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
> registration API
> 
> 
> 
> > -Original Message-
> > From: Liu, Yong 
> > Sent: Thursday, June 18, 2020 1:51 PM
> > To: Fu, Patrick 
> > Cc: Fu, Patrick ; Jiang, Cheng1
> > ; Liang, Cunming ;
> > dev@dpdk.org; maxime.coque...@redhat.com; Xia, Chenbo
> > ; Wang, Zhihong ; Ye,
> > Xiaolong 
> > Subject: RE: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
> > registration API
> >
> > Thanks, Patrick. So comments are inline.
> >
> > > -Original Message-
> > > From: dev  On Behalf Of patrick...@intel.com
> > > Sent: Thursday, June 11, 2020 6:02 PM
> > > To: dev@dpdk.org; maxime.coque...@redhat.com; Xia, Chenbo
> > > ; Wang, Zhihong ;
> Ye,
> > > Xiaolong 
> > > Cc: Fu, Patrick ; Jiang, Cheng1
> > > ; Liang, Cunming 
> > > Subject: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
> > > registration API
> > >
> > > From: Patrick 
> > >
> > > This patch introduces registration/un-registration APIs for async data
> > > path together with all required data structures and DMA callback
> > > function proto-types.
> > >
> > > Signed-off-by: Patrick 
> > > ---
> > >  lib/librte_vhost/Makefile  |   3 +-
> > >  lib/librte_vhost/rte_vhost.h   |   1 +
> > >  lib/librte_vhost/rte_vhost_async.h | 134
> > > +
> > >  lib/librte_vhost/socket.c  |  20 ++
> > >  lib/librte_vhost/vhost.c   |  74 +++-
> > >  lib/librte_vhost/vhost.h   |  30 -
> > >  lib/librte_vhost/vhost_user.c  |  28 ++--
> > >  7 files changed, 283 insertions(+), 7 deletions(-)  create mode
> > > 100644 lib/librte_vhost/rte_vhost_async.h
> > >
> > > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > > index e592795..3aed094 100644
> > > --- a/lib/librte_vhost/Makefile
> > > +++ b/lib/librte_vhost/Makefile
> > > @@ -41,7 +41,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c
> > iotlb.c
> > > socket.c vhost.c \
> > >  vhost_user.c virtio_net.c vdpa.c
> > >
> > >  # install includes
> > > -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> > rte_vdpa.h
> > > +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> > rte_vdpa.h
> > > \
> > > +rte_vhost_async.h
> > >
> > Hi Patrick,
> > Please also update meson build for newly added file.
> >
> > Thanks,
> > Marvin
> >
> > >  # only compile vhost crypto when cryptodev is enabled  ifeq
> > > ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
> > > diff --git a/lib/librte_vhost/rte_vhost.h
> > > b/lib/librte_vhost/rte_vhost.h index d43669f..cec4d07 100644
> > > --- a/lib/librte_vhost/rte_vhost.h
> > > +++ b/lib/librte_vhost/rte_vhost.h
> > > @@ -35,6 +35,7 @@
> > >  #define RTE_VHOST_USER_EXTBUF_SUPPORT(1ULL << 5)
> > >  /* support only linear buffers (no chained mbufs) */
> > >  #define RTE_VHOST_USER_LINEARBUF_SUPPORT(1ULL << 6)
> > > +#define RTE_VHOST_USER_ASYNC_COPY(1ULL << 7)
> > >
> > >  /** Protocol features. */
> > >  #ifndef VHOST_USER_PROTOCOL_F_MQ
> > > diff --git a/lib/librte_vhost/rte_vhost_async.h
> > > b/lib/librte_vhost/rte_vhost_async.h
> > > new file mode 100644
> > > index 000..82f2ebe
> > > --- /dev/null
> > > +++ b/lib/librte_vhost/rte_vhost_async.h
> > > @@ -0,0 +1,134 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2018 Intel Corporation  */
> >
> > s/2018/2020/
> >
> > > +
> > > +#ifndef _RTE_VHOST_ASYNC_H_
> > > +#define _RTE_VHOST_ASYNC_H_
> > > +
> > > +#include "rte_vhost.h"
> > > +
> > > +/**
> > > + * iovec iterator
> > > + */
> > > +struct iov_it {
> > > +/** offset to the first byte of interesting data */
> > > +size_t offset;
> > > +/** total bytes of data in this iterator */
> > > +size_t count;
> > > +/** pointer to the iovec array */
> 

Re: [dpdk-dev] [PATCH v1 2/2] vhost: introduce async enqueue for split ring

2020-06-17 Thread Liu, Yong
Thanks, Patrick. Some comments are inline.

> -Original Message-
> From: dev  On Behalf Of patrick...@intel.com
> Sent: Thursday, June 11, 2020 6:02 PM
> To: dev@dpdk.org; maxime.coque...@redhat.com; Xia, Chenbo
> ; Wang, Zhihong ; Ye,
> Xiaolong 
> Cc: Fu, Patrick ; Jiang, Cheng1
> ; Liang, Cunming 
> Subject: [dpdk-dev] [PATCH v1 2/2] vhost: introduce async enqueue for split
> ring
> 
> From: Patrick 
> 
> This patch implement async enqueue data path for split ring.
> 
> Signed-off-by: Patrick 
> ---
>  lib/librte_vhost/rte_vhost_async.h |  38 +++
>  lib/librte_vhost/virtio_net.c  | 538
> -
>  2 files changed, 574 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_vhost/rte_vhost_async.h
> b/lib/librte_vhost/rte_vhost_async.h
> index 82f2ebe..efcba0a 100644
> --- a/lib/librte_vhost/rte_vhost_async.h
> +++ b/lib/librte_vhost/rte_vhost_async.h
> @@ -131,4 +131,42 @@ int rte_vhost_async_channel_register(int vid,
> uint16_t queue_id,
>   */
>  int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id);
> 
> +/**
> + * This function submit enqueue data to DMA. This function has no
> + * guranttee to the transfer completion upon return. Applications should
> + * poll transfer status by rte_vhost_poll_enqueue_completed()
> + *
> + * @param vid
> + *  id of vhost device to enqueue data
> + * @param queue_id
> + *  queue id to enqueue data
> + * @param pkts
> + *  array of packets to be enqueued
> + * @param count
> + *  packets num to be enqueued
> + * @return
> + *  num of packets enqueued
> + */
> +uint16_t rte_vhost_submit_enqueue_burst(int vid, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint16_t count);
> +
> +/**
> + * This function check DMA completion status for a specific vhost
> + * device queue. Packets which finish copying (enqueue) operation
> + * will be returned in an array.
> + *
> + * @param vid
> + *  id of vhost device to enqueue data
> + * @param queue_id
> + *  queue id to enqueue data
> + * @param pkts
> + *  blank array to get return packet pointer
> + * @param count
> + *  size of the packet array
> + * @return
> + *  num of packets returned
> + */
> +uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
> + struct rte_mbuf **pkts, uint16_t count);
> +
>  #endif /* _RTE_VDPA_H_ */
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 751c1f3..cf9f884 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -17,14 +17,15 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "iotlb.h"
>  #include "vhost.h"
> 
> -#define MAX_PKT_BURST 32
> -
>  #define MAX_BATCH_LEN 256
> 
> +#define VHOST_ASYNC_BATCH_THRESHOLD 8
> +
>  static  __rte_always_inline bool
>  rxvq_is_mergeable(struct virtio_net *dev)
>  {
> @@ -117,6 +118,35 @@
>  }
> 
>  static __rte_always_inline void
> +async_flush_shadow_used_ring_split(struct virtio_net *dev,
> + struct vhost_virtqueue *vq)
> +{
> + uint16_t used_idx = vq->last_used_idx & (vq->size - 1);
> +
> + if (used_idx + vq->shadow_used_idx <= vq->size) {
> + do_flush_shadow_used_ring_split(dev, vq, used_idx, 0,
> +   vq->shadow_used_idx);
> + } else {
> + uint16_t size;
> +
> + /* update used ring interval [used_idx, vq->size] */
> + size = vq->size - used_idx;
> + do_flush_shadow_used_ring_split(dev, vq, used_idx, 0, size);
> +
> + /* update the left half used ring interval [0, left_size] */
> + do_flush_shadow_used_ring_split(dev, vq, 0, size,
> +   vq->shadow_used_idx - size);
> + }
> + vq->last_used_idx += vq->shadow_used_idx;
> +
> + rte_smp_wmb();
> +
> + vhost_log_cache_sync(dev, vq);
> +
> + vq->shadow_used_idx = 0;
> +}
> +
> +static __rte_always_inline void
>  update_shadow_used_ring_split(struct vhost_virtqueue *vq,
>uint16_t desc_idx, uint32_t len)
>  {
> @@ -905,6 +935,199 @@
>   return error;
>  }
> 
> +static __rte_always_inline void
> +async_fill_vec(struct iovec *v, void *base, size_t len)
> +{
> + v->iov_base = base;
> + v->iov_len = len;
> +}
> +
> +static __rte_always_inline void
> +async_fill_it(struct iov_it *it, size_t count,
> + struct iovec *vec, unsigned long nr_seg)
> +{
> + it->offset = 0;
> + it->count = count;
> +
> + if (count) {
> + it->iov = vec;
> + it->nr_segs = nr_seg;
> + } else {
> + it->iov = 0;
> + it->nr_segs = 0;
> + }
> +}
> +
> +static __rte_always_inline void
> +async_fill_des(struct dma_trans_desc *desc,
> + struct iov_it *src, struct iov_it *dst)
> +{
> + desc->src = src;
> + desc->dst = dst;
> +}
> +
> +static __rte_always_inline int
> +async_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> + st

Re: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path registration API

2020-06-17 Thread Liu, Yong
Thanks, Patrick. Some comments are inline.

> -Original Message-
> From: dev  On Behalf Of patrick...@intel.com
> Sent: Thursday, June 11, 2020 6:02 PM
> To: dev@dpdk.org; maxime.coque...@redhat.com; Xia, Chenbo
> ; Wang, Zhihong ; Ye,
> Xiaolong 
> Cc: Fu, Patrick ; Jiang, Cheng1
> ; Liang, Cunming 
> Subject: [dpdk-dev] [PATCH v1 1/2] vhost: introduce async data path
> registration API
> 
> From: Patrick 
> 
> This patch introduces registration/un-registration APIs
> for async data path together with all required data
> structures and DMA callback function proto-types.
> 
> Signed-off-by: Patrick 
> ---
>  lib/librte_vhost/Makefile  |   3 +-
>  lib/librte_vhost/rte_vhost.h   |   1 +
>  lib/librte_vhost/rte_vhost_async.h | 134
> +
>  lib/librte_vhost/socket.c  |  20 ++
>  lib/librte_vhost/vhost.c   |  74 +++-
>  lib/librte_vhost/vhost.h   |  30 -
>  lib/librte_vhost/vhost_user.c  |  28 ++--
>  7 files changed, 283 insertions(+), 7 deletions(-)
>  create mode 100644 lib/librte_vhost/rte_vhost_async.h
> 
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index e592795..3aed094 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -41,7 +41,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c
> iotlb.c socket.c vhost.c \
>   vhost_user.c virtio_net.c vdpa.c
> 
>  # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
> \
> + rte_vhost_async.h
> 
Hi Patrick,
Please also update the meson build for the newly added file.

Thanks,
Marvin

>  # only compile vhost crypto when cryptodev is enabled
>  ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index d43669f..cec4d07 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -35,6 +35,7 @@
>  #define RTE_VHOST_USER_EXTBUF_SUPPORT(1ULL << 5)
>  /* support only linear buffers (no chained mbufs) */
>  #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6)
> +#define RTE_VHOST_USER_ASYNC_COPY(1ULL << 7)
> 
>  /** Protocol features. */
>  #ifndef VHOST_USER_PROTOCOL_F_MQ
> diff --git a/lib/librte_vhost/rte_vhost_async.h
> b/lib/librte_vhost/rte_vhost_async.h
> new file mode 100644
> index 000..82f2ebe
> --- /dev/null
> +++ b/lib/librte_vhost/rte_vhost_async.h
> @@ -0,0 +1,134 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */

s/2018/2020/ 

> +
> +#ifndef _RTE_VHOST_ASYNC_H_
> +#define _RTE_VHOST_ASYNC_H_
> +
> +#include "rte_vhost.h"
> +
> +/**
> + * iovec iterator
> + */
> +struct iov_it {
> + /** offset to the first byte of interesting data */
> + size_t offset;
> + /** total bytes of data in this iterator */
> + size_t count;
> + /** pointer to the iovec array */
> + struct iovec *iov;
> + /** number of iovec in this iterator */
> + unsigned long nr_segs;
> +};

Patrick,
I think a structure named "it" is too generic to understand easily; please
use a more meaningful name like "iov_iter".

> +
> +/**
> + * dma transfer descriptor pair
> + */
> +struct dma_trans_desc {
> + /** source memory iov_it */
> + struct iov_it *src;
> + /** destination memory iov_it */
> + struct iov_it *dst;
> +};
> +

This patch series is about async copy, and DMA is just one async copy method
supplied by the underlying hardware.
IMHO, the structure is better named "async_copy_desc", which matches the
overall concept; a sketch of the renaming follows.
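
For illustration, the renaming could look like this (a sketch only; the later
v2 of this series uses similar names, rte_vhost_iov_iter and
rte_vhost_async_desc):

/* iovec iterator (was "struct iov_it") */
struct rte_vhost_iov_iter {
        size_t offset;          /* offset to the first byte of data */
        size_t count;           /* total bytes of data in this iterator */
        struct iovec *iov;      /* pointer to the iovec array */
        unsigned long nr_segs;  /* number of iovec in this iterator */
};

/* async copy descriptor pair (was "struct dma_trans_desc") */
struct rte_vhost_async_desc {
        struct rte_vhost_iov_iter *src; /* source memory iterator */
        struct rte_vhost_iov_iter *dst; /* destination memory iterator */
};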

> +/**
> + * dma transfer status
> + */
> +struct dma_trans_status {
> + /** An array of application specific data for source memory */
> + uintptr_t *src_opaque_data;
> + /** An array of application specific data for destination memory */
> + uintptr_t *dst_opaque_data;
> +};
> +
Same as the previous comment.

> +/**
> + * dma operation callbacks to be implemented by applications
> + */
> +struct rte_vhost_async_channel_ops {
> + /**
> +  * instruct a DMA channel to perform copies for a batch of packets
> +  *
> +  * @param vid
> +  *  id of vhost device to perform data copies
> +  * @param queue_id
> +  *  queue id to perform data copies
> +  * @param descs
> +  *  an array of DMA transfer memory descriptors
> +  * @param opaque_data
> +  *  opaque data pair sending to DMA engine
> +  * @param count
> +  *  number of elements in the "descs" array
> +  * @return
> +  *  -1 on failure, number of descs processed on success
> +  */
> + int (*transfer_data)(int vid, uint16_t queue_id,
> + struct dma_trans_desc *descs,
> + struct dma_trans_status *opaque_data,
> + uint16_t count);
> + /**
> +  * check copy-completed packe

Re: [dpdk-dev] [PATCH] net/virtio: disable event suppression when reconnect

2020-05-14 Thread Liu, Yong
Thanks for the reminder, Xiao. v2 will cc the stable branch.
As the spec mentions, "notification" is semantically more precise. Will do
s/event suppression/event notification/.

Regards,
Marvin

> -Original Message-
> From: Wang, Xiao W 
> Sent: Friday, May 15, 2020 9:53 AM
> To: Liu, Yong ; maxime.coque...@redhat.com; Ye,
> Xiaolong ; Wang, Zhihong
> 
> Cc: dev@dpdk.org; Liu, Yong 
> Subject: RE: [dpdk-dev] [PATCH] net/virtio: disable event suppression when
> reconnect
> 
> Hi Marvin,
> 
> Comments inline.
> Thanks for the fix.
> 
> Best Regards,
> Xiao
> 
> > -Original Message-
> > From: dev  On Behalf Of Marvin Liu
> > Sent: Friday, May 15, 2020 9:41 AM
> > To: maxime.coque...@redhat.com; Ye, Xiaolong ;
> > Wang, Zhihong 
> > Cc: dev@dpdk.org; Liu, Yong 
> > Subject: [dpdk-dev] [PATCH] net/virtio: disable event suppression when
> > reconnect
> >
> > Event suppression should be disabled after virtqueue initialization. It
> 
> s/Event suppression/interrupt/g
> 
> > can be enabled by calling rte_eth_dev_rx_intr_enable later.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c
> > index 408bba236a..2702e120ee 100644
> > --- a/drivers/net/virtio/virtqueue.c
> > +++ b/drivers/net/virtio/virtqueue.c
> > @@ -175,6 +175,7 @@ virtqueue_rxvq_reset_packed(struct virtqueue *vq)
> >
> >  vring_desc_init_packed(vq, size);
> >
> > +virtqueue_disable_intr(vq);
> >  return 0;
> >  }
> >
> > @@ -211,5 +212,6 @@ virtqueue_txvq_reset_packed(struct virtqueue *vq)
> >
> >  vring_desc_init_packed(vq, size);
> >
> > +virtqueue_disable_intr(vq);
> >  return 0;
> >  }
> > --
> > 2.17.1
> 
> Can we backport it to LTS by cc stable?
> 



Re: [dpdk-dev] [PATCH v2] net/virtio: fix AVX512 datapath selection

2020-05-12 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, May 12, 2020 6:04 PM
> To: Liu, Yong ; Yigit, Ferruh 
> Cc: dev@dpdk.org; Thomas Monjalon ; David
> Marchand ; Richardson, Bruce
> ; Nicolau, Radu ;
> Luca Boccassi ; Wang, Zhihong
> ; Ye, Xiaolong 
> Subject: Re: [PATCH v2] net/virtio: fix AVX512 datapath selection
> 
> 
> 
> On 5/12/20 10:46 AM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Tuesday, May 12, 2020 4:36 PM
> >> To: Liu, Yong ; Yigit, Ferruh 
> >> Cc: dev@dpdk.org; Thomas Monjalon ; David
> >> Marchand ; Richardson, Bruce
> >> ; Nicolau, Radu ;
> >> Luca Boccassi ; Wang, Zhihong
> >> ; Ye, Xiaolong 
> >> Subject: Re: [PATCH v2] net/virtio: fix AVX512 datapath selection
> >>
> >>
> >>
> >> On 5/12/20 5:29 AM, Liu, Yong wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Maxime Coquelin 
> >>>> Sent: Tuesday, May 12, 2020 3:50 AM
> >>>> To: Yigit, Ferruh ; Wang, Zhihong
> >>>> ; Ye, Xiaolong ;
> Liu,
> >>>> Yong 
> >>>> Cc: dev@dpdk.org; Thomas Monjalon ; David
> >>>> Marchand ; Richardson, Bruce
> >>>> ; Nicolau, Radu
> ;
> >>>> Luca Boccassi 
> >>>> Subject: Re: [PATCH v2] net/virtio: fix AVX512 datapath selection
> >>>>
> >>>>
> >>>>
> >>>> On 5/11/20 8:48 PM, Ferruh Yigit wrote:
> >>>>> From: Maxime Coquelin 
> >>>>>
> >>>>> The AVX512 packed ring datapath selection was only done
> >>>>> at build time, but it should also be checked at runtime
> >>>>> that the CPU supports it.
> >>>>>
> >>>>> This patch add a CPU flags check so that non-vectorized
> >>>>> path is selected at runtime if AVX512 is not supported.
> >>>>>
> >>>>> Also in meson build enable vectorization only for relevant file, not
> for
> >>>>> all driver.
> >>>>>
> >>>>> Fixes: ccb10995c2ad ("net/virtio: add election for vectorized path")
> >>>>>
> >>>>> Signed-off-by: Maxime Coquelin 
> >>>>> Signed-off-by: Ferruh Yigit 
> >>>>> ---
> >>>>> Cc: Bruce Richardson 
> >>>>> Cc: Radu Nicolau 
> >>>>> Cc: Luca Boccassi 
> >>>>>
> >>>>> For meson I mainly adapted implementation from other driver, not
> >> able
> >>>> to
> >>>>> test or verify myself.
> >>>>> ---
> >>>>>  drivers/net/virtio/meson.build | 9 +++--
> >>>>>  drivers/net/virtio/virtio_ethdev.c | 6 --
> >>>>>  2 files changed, 11 insertions(+), 4 deletions(-)
> >>>>
> >>>> Thanks Ferruh, I cannot test either right now but it looks good to me:
> >>>>
> >>>
> >>> Hi Maxime & Ferruh,
> >>> IMHO, meson build update is the essential part for fixing unexpected
> >> AVX512 instructions.
> >>> Change in virtio_ethdev may cause building issues on ppc and arm
> >> platform.  Is it convenient to revert that change?
> >>
> >> As replied to v1:
> >>
> >> With a bit more of context, we can see that it only affects packed ring
> >> when CC_AVX512_SUPPORT is set. So it does break neither split ring nor
> >> ARM/PPC:
> >>
> >>if (vectorized) {
> >>if (!vtpci_packed_queue(hw)) {
> >>hw->use_vec_rx = 1;
> >>} else {
> >> #if !defined(CC_AVX512_SUPPORT)
> >>PMD_DRV_LOG(INFO,
> >>"building environment do not support
> >> packed ring vectorized");
> >> #else
> >>if
> >> (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F)) {
> >>hw->use_vec_rx = 1;
> >>hw->use_vec_tx = 1;
> >>}
> >> #endif
> >>}
> >>}
> >>
> >> So IMO, no revert has to be done.
> >>
> >
> > Ok, I messed it with my previous building fix.  It will be no harm for  this
> double check.
> 
> While it does not break, I agree for the unnecessary double-check.
> You can send a clean-up patch to remove this part in -rc3.
> 

Thanks a lot, I have sent the clean-up patch.

> Thanks,
> Maxime
> 
> >>> Regards,
> >>> Marvin
> >>>
> >>>> In case you're waiting for it:
> >>>> Acked-by: Maxime Coquelin 
> >>>>
> >>>> Maxime
> >>>
> >



Re: [dpdk-dev] [PATCH v2] net/virtio: fix AVX512 datapath selection

2020-05-12 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, May 12, 2020 4:36 PM
> To: Liu, Yong ; Yigit, Ferruh 
> Cc: dev@dpdk.org; Thomas Monjalon ; David
> Marchand ; Richardson, Bruce
> ; Nicolau, Radu ;
> Luca Boccassi ; Wang, Zhihong
> ; Ye, Xiaolong 
> Subject: Re: [PATCH v2] net/virtio: fix AVX512 datapath selection
> 
> 
> 
> On 5/12/20 5:29 AM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Tuesday, May 12, 2020 3:50 AM
> >> To: Yigit, Ferruh ; Wang, Zhihong
> >> ; Ye, Xiaolong ; Liu,
> >> Yong 
> >> Cc: dev@dpdk.org; Thomas Monjalon ; David
> >> Marchand ; Richardson, Bruce
> >> ; Nicolau, Radu ;
> >> Luca Boccassi 
> >> Subject: Re: [PATCH v2] net/virtio: fix AVX512 datapath selection
> >>
> >>
> >>
> >> On 5/11/20 8:48 PM, Ferruh Yigit wrote:
> >>> From: Maxime Coquelin 
> >>>
> >>> The AVX512 packed ring datapath selection was only done
> >>> at build time, but it should also be checked at runtime
> >>> that the CPU supports it.
> >>>
> >>> This patch add a CPU flags check so that non-vectorized
> >>> path is selected at runtime if AVX512 is not supported.
> >>>
> >>> Also in meson build enable vectorization only for relevant file, not for
> >>> all driver.
> >>>
> >>> Fixes: ccb10995c2ad ("net/virtio: add election for vectorized path")
> >>>
> >>> Signed-off-by: Maxime Coquelin 
> >>> Signed-off-by: Ferruh Yigit 
> >>> ---
> >>> Cc: Bruce Richardson 
> >>> Cc: Radu Nicolau 
> >>> Cc: Luca Boccassi 
> >>>
> >>> For meson I mainly adapted implementation from other driver, not
> able
> >> to
> >>> test or verify myself.
> >>> ---
> >>>  drivers/net/virtio/meson.build | 9 +++--
> >>>  drivers/net/virtio/virtio_ethdev.c | 6 --
> >>>  2 files changed, 11 insertions(+), 4 deletions(-)
> >>
> >> Thanks Ferruh, I cannot test either right now but it looks good to me:
> >>
> >
> > Hi Maxime & Ferruh,
> > IMHO, meson build update is the essential part for fixing unexpected
> AVX512 instructions.
> > Change in virtio_ethdev may cause building issues on ppc and arm
> platform.  Is it convenient to revert that change?
> 
> As replied to v1:
> 
> With a bit more of context, we can see that it only affects packed ring
> when CC_AVX512_SUPPORT is set. So it does break neither split ring nor
> ARM/PPC:
> 
>   if (vectorized) {
>   if (!vtpci_packed_queue(hw)) {
>   hw->use_vec_rx = 1;
>   } else {
> #if !defined(CC_AVX512_SUPPORT)
>   PMD_DRV_LOG(INFO,
>   "building environment do not support
> packed ring vectorized");
> #else
>   if
> (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F)) {
>   hw->use_vec_rx = 1;
>   hw->use_vec_tx = 1;
>   }
> #endif
>   }
>   }
> 
> So IMO, no revert has to be done.
> 

OK, I confused it with my previous build fix. There is no harm in this
double check.

> > Regards,
> > Marvin
> >
> >> In case you're waiting for it:
> >> Acked-by: Maxime Coquelin 
> >>
> >> Maxime
> >



Re: [dpdk-dev] [PATCH v2] net/virtio: fix AVX512 datapath selection

2020-05-11 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, May 12, 2020 3:50 AM
> To: Yigit, Ferruh ; Wang, Zhihong
> ; Ye, Xiaolong ; Liu,
> Yong 
> Cc: dev@dpdk.org; Thomas Monjalon ; David
> Marchand ; Richardson, Bruce
> ; Nicolau, Radu ;
> Luca Boccassi 
> Subject: Re: [PATCH v2] net/virtio: fix AVX512 datapath selection
> 
> 
> 
> On 5/11/20 8:48 PM, Ferruh Yigit wrote:
> > From: Maxime Coquelin 
> >
> > The AVX512 packed ring datapath selection was only done
> > at build time, but it should also be checked at runtime
> > that the CPU supports it.
> >
> > This patch add a CPU flags check so that non-vectorized
> > path is selected at runtime if AVX512 is not supported.
> >
> > Also in meson build enable vectorization only for relevant file, not for
> > all driver.
> >
> > Fixes: ccb10995c2ad ("net/virtio: add election for vectorized path")
> >
> > Signed-off-by: Maxime Coquelin 
> > Signed-off-by: Ferruh Yigit 
> > ---
> > Cc: Bruce Richardson 
> > Cc: Radu Nicolau 
> > Cc: Luca Boccassi 
> >
> > For meson I mainly adapted implementation from other driver, not able
> to
> > test or verify myself.
> > ---
> >  drivers/net/virtio/meson.build | 9 +++--
> >  drivers/net/virtio/virtio_ethdev.c | 6 --
> >  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> Thanks Ferruh, I cannot test either right now but it looks good to me:
> 

Hi Maxime & Ferruh,
IMHO, the meson build update is the essential part of fixing the unexpected
AVX512 instructions.
The change in virtio_ethdev may cause build issues on the ppc and arm
platforms. Would it be convenient to revert that change?

Regards,
Marvin

> In case you're waiting for it:
> Acked-by: Maxime Coquelin 
> 
> Maxime



Re: [dpdk-dev] [PATCH] net/virtio: fix AVX512 datapath selection

2020-05-11 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, May 11, 2020 10:47 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Yigit, Ferruh ; dev@dpdk.org
> Cc: Maxime Coquelin 
> Subject: [PATCH] net/virtio: fix AVX512 datapath selection
> 
> The AVX512 packed ring datapath selection was only done
> at build time, but it should also be checked at runtime
> that the CPU supports it.
> 
> This patch add a CPU flags check so that non-vectorized
> path is selected at runtime if AVX512 is not supported.
> 
> Fixes: ccb10995c2ad ("net/virtio: add election for vectorized path")
> 
> Signed-off-by: Maxime Coquelin 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> index 312871cb48..49ccef12c7 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1965,8 +1965,10 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
>   PMD_DRV_LOG(INFO,
>   "building environment do not support
> packed ring vectorized");
>  #else
> - hw->use_vec_rx = 1;
> - hw->use_vec_tx = 1;
> + if
> (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F)) {
> + hw->use_vec_rx = 1;
> + hw->use_vec_tx = 1;
> + }
>  #endif

Hi Maxime,
This is only the pre-setting for vectorized path selection;
virtio_dev_configure will do a second check, and the running environment is
verified there. We could move some checks from virtio_dev_configure to here,
but is that really needed?

BTW, both split ring and packed ring use this setting, so it will break the
split vectorized datapath if the server does not have the AVX512F flag.
It may also cause build issues on platforms where RTE_CPUFLAG_AVX512F is not
defined; a possible guard is sketched below.
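
Something along these lines (a sketch only, reusing the CC_AVX512_SUPPORT
macro from this series, so that non-x86 builds never reference
RTE_CPUFLAG_AVX512F):

#if defined(RTE_ARCH_X86_64) && defined(CC_AVX512_SUPPORT)
        if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F)) {
                hw->use_vec_rx = 1;
                hw->use_vec_tx = 1;
        }
#endif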

Thanks,
Marvin

>   }
>   }
> --
> 2.25.4



Re: [dpdk-dev] Intel CI failure due to Virtio PMD AVX series

2020-05-05 Thread Liu, Yong
Sure, I have fixed it in http://patchwork.dpdk.org/patch/69802/.
The issue was caused by clang 6.0.0 not defining the function when building a
32-bit target.

Thanks,
Marvin

> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, May 4, 2020 5:59 PM
> To: Liu, Yong 
> Cc: Yigit, Ferruh ; Thomas Monjalon
> ; David Marchand ;
> dev@dpdk.org
> Subject: Intel CI failure due to Virtio PMD AVX series
> 
> Hi Marvin,
> 
> Could you please check what is wrong with your AVX series for
> Virtio packed ring in Intel CI (UB1804-32  + Meson)?
> 
> http://mails.dpdk.org/archives/test-report/2020-May/130108.html
> 
> Thanks,
> Maxime



Re: [dpdk-dev] [PATCH v4 2/2] vhost: utilize dpdk dynamic memory allocator

2020-04-28 Thread Liu, Yong
This was sent by mistake; please ignore it.

> -Original Message-
> From: Liu, Yong 
> Sent: Wednesday, April 29, 2020 9:01 AM
> To: maxime.coque...@redhat.com; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org; Liu, Yong 
> Subject: [PATCH v4 2/2] vhost: utilize dpdk dynamic memory allocator
> 
> Replace dynamic memory allocator with dpdk memory allocator.
> 
> Signed-off-by: Marvin Liu 
> Reviewed-by: Maxime Coquelin 
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index bd1be0104..79fcb9d19 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -191,7 +191,7 @@ vhost_backend_cleanup(struct virtio_net *dev)
>   dev->mem = NULL;
>   }
> 
> - free(dev->guest_pages);
> + rte_free(dev->guest_pages);
>   dev->guest_pages = NULL;
> 
>   if (dev->log_addr) {
> @@ -903,11 +903,12 @@ add_one_guest_page(struct virtio_net *dev,
> uint64_t guest_phys_addr,
>   if (dev->nr_guest_pages == dev->max_guest_pages) {
>   dev->max_guest_pages *= 2;
>   old_pages = dev->guest_pages;
> - dev->guest_pages = realloc(dev->guest_pages,
> - dev->max_guest_pages *
> sizeof(*page));
> - if (!dev->guest_pages) {
> + dev->guest_pages = rte_realloc(dev->guest_pages,
> + dev->max_guest_pages *
> sizeof(*page),
> + RTE_CACHE_LINE_SIZE);
> + if (dev->guest_pages == NULL) {
>   VHOST_LOG_CONFIG(ERR, "cannot realloc
> guest_pages\n");
> - free(old_pages);
> + rte_free(old_pages);
>   return -1;
>   }
>   }
> @@ -1062,10 +1063,12 @@ vhost_user_set_mem_table(struct virtio_net
> **pdev, struct VhostUserMsg *msg,
>   vhost_user_iotlb_flush_all(dev->virtqueue[i]);
> 
>   dev->nr_guest_pages = 0;
> - if (!dev->guest_pages) {
> + if (dev->guest_pages == NULL) {
>   dev->max_guest_pages = 8;
> - dev->guest_pages = malloc(dev->max_guest_pages *
> - sizeof(struct guest_page));
> + dev->guest_pages = rte_zmalloc(NULL,
> + dev->max_guest_pages *
> + sizeof(struct guest_page),
> + RTE_CACHE_LINE_SIZE);
>   if (dev->guest_pages == NULL) {
>   VHOST_LOG_CONFIG(ERR,
>   "(%d) failed to allocate memory "
> --
> 2.17.1



Re: [dpdk-dev] [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path

2020-04-28 Thread Liu, Yong



> -Original Message-
> From: Liu, Yong
> Sent: Tuesday, April 28, 2020 9:01 PM
> To: 'Maxime Coquelin' ; Ye, Xiaolong
> ; Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> > -Original Message-
> > From: Maxime Coquelin 
> > Sent: Tuesday, April 28, 2020 4:44 PM
> > To: Liu, Yong ; Ye, Xiaolong ;
> > Wang, Zhihong 
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> path
> >
> >
> >
> > On 4/28/20 3:14 AM, Liu, Yong wrote:
> > >
> > >
> > >> -Original Message-
> > >> From: Maxime Coquelin 
> > >> Sent: Monday, April 27, 2020 7:21 PM
> > >> To: Liu, Yong ; Ye, Xiaolong
> > ;
> > >> Wang, Zhihong 
> > >> Cc: dev@dpdk.org
> > >> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> > path
> > >>
> > >>
> > >>
> > >> On 4/26/20 4:19 AM, Marvin Liu wrote:
> > >>> Optimize packed ring Rx path with SIMD instructions. Solution of
> > >>> optimization is pretty like vhost, is that split path into batch and
> > >>> single functions. Batch function is further optimized by AVX512
> > >>> instructions. Also pad desc extra structure to 16 bytes aligned, thus
> > >>> four elements will be saved in one batch.
> > >>>
> > >>> Signed-off-by: Marvin Liu 
> > >>>
> > >>> diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> > >>> index c9edb84ee..102b1deab 100644
> > >>> --- a/drivers/net/virtio/Makefile
> > >>> +++ b/drivers/net/virtio/Makefile
> > >>> @@ -36,6 +36,41 @@ else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM)
> > >> $(CONFIG_RTE_ARCH_ARM64)),)
> > >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> > virtio_rxtx_simple_neon.c
> > >>>  endif
> > >>>
> > >>> +ifneq ($(FORCE_DISABLE_AVX512), y)
> > >>> +   CC_AVX512_SUPPORT=\
> > >>> +   $(shell $(CC) -march=native -dM -E - </dev/null 2>&1 | \
> > >>> +   sed '/./{H;$$!d} ; x ; /AVX512F/!d; /AVX512BW/!d; /AVX512VL/!d' 
> > >>> | \
> > >>> +   grep -q AVX512 && echo 1)
> > >>> +endif
> > >>> +
> > >>> +ifeq ($(CC_AVX512_SUPPORT), 1)
> > >>> +CFLAGS += -DCC_AVX512_SUPPORT
> > >>> +SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> virtio_rxtx_packed_avx.c
> > >>> +
> > >>> +ifeq ($(RTE_TOOLCHAIN), gcc)
> > >>> +ifeq ($(shell test $(GCC_VERSION) -ge 83 && echo 1), 1)
> > >>> +CFLAGS += -DVIRTIO_GCC_UNROLL_PRAGMA
> > >>> +endif
> > >>> +endif
> > >>> +
> > >>> +ifeq ($(RTE_TOOLCHAIN), clang)
> > >>> +ifeq ($(shell test
> > $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -
> > >> ge 37 && echo 1), 1)
> > >>> +CFLAGS += -DVIRTIO_CLANG_UNROLL_PRAGMA
> > >>> +endif
> > >>> +endif
> > >>> +
> > >>> +ifeq ($(RTE_TOOLCHAIN), icc)
> > >>> +ifeq ($(shell test $(ICC_MAJOR_VERSION) -ge 16 && echo 1), 1)
> > >>> +CFLAGS += -DVIRTIO_ICC_UNROLL_PRAGMA
> > >>> +endif
> > >>> +endif
> > >>> +
> > >>> +CFLAGS_virtio_rxtx_packed_avx.o += -mavx512f -mavx512bw -
> > mavx512vl
> > >>> +ifeq ($(shell test $(GCC_VERSION) -ge 100 && echo 1), 1)
> > >>> +CFLAGS_virtio_rxtx_packed_avx.o += -Wno-zero-length-bounds
> > >>> +endif
> > >>> +endif
> > >>> +
> > >>>  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
> > >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> virtio_user/vhost_user.c
> > >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> > virtio_user/vhost_kernel.c
> > >>> diff --git a/drivers/net/virtio/meson.build
> > b/drivers/net/virtio/meson.build
> > >>> index 15150eea1..8e68c3039 100644
> > >>> --- a/drivers/net/virtio/meson.build
> > >>> +++ b/drivers/net/virtio/meson.build
> > >>> @@ -9,6 +9,20 @@ sources += files('virtio_ethdev.c',
> > >>>  deps += ['kvargs', 'bus_pci']
> > >>>
> > >>>  if arch_

Re: [dpdk-dev] [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path

2020-04-28 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, April 28, 2020 11:40 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> ; jer...@marvell.com
> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> On 4/28/20 5:35 PM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Tuesday, April 28, 2020 10:50 PM
> >> To: Liu, Yong ; Ye, Xiaolong
> ;
> >> Wang, Zhihong 
> >> Cc: dev@dpdk.org; Honnappa Nagarahalli
> >> ; jer...@marvell.com
> >> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> path
> >>
> >>
> >>
> >> On 4/28/20 4:43 PM, Liu, Yong wrote:
> >>>
> >>>
> >>>> -Original Message-
> >>>> From: Maxime Coquelin 
> >>>> Sent: Tuesday, April 28, 2020 9:46 PM
> >>>> To: Liu, Yong ; Ye, Xiaolong
> >> ;
> >>>> Wang, Zhihong 
> >>>> Cc: dev@dpdk.org; Honnappa Nagarahalli
> >>>> ; jer...@marvell.com
> >>>> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> >> path
> >>>>
> >>>>
> >>>>
> >>>> On 4/28/20 3:01 PM, Liu, Yong wrote:
> >>>>>>> Maxime,
> >>>>>>> Thanks for point it out, it will add extra cache miss in datapath.
> >>>>>>> And its impact on performance is around 1% in loopback case.
> >>>>>> Ok, thanks for doing the test. I'll try to run some PVP benchmarks
> >>>>>> on my side because when doing IO loopback, the cache pressure is
> >>>>>> much less important.
> >>>>>>
> >>>>>>> While benefit of vectorized path will be more than that number.
> >>>>>> Ok, but I disagree for two reasons:
> >>>>>>  1. You have to keep in mind than non-vectorized is the default and
> >>>>>> encouraged mode to use. Indeed, it takes a lot of shortcuts like not
> >>>>>> checking header length (so no error stats), etc...
> >>>>>>
> >>>>> Ok, I will keep non-vectorized same as before.
> >>>>>
> >>>>>>  2. It's like saying it's OK it degrades by 5% on $CPU_VENDOR_A
> >> because
> >>>>>> the gain is 20% on $CPU_VENDOR_B.
> >>>>>>
> >>>>>> In the case we see more degradation in real-world scenario, you
> might
> >>>>>> want to consider using ifdefs to avoid adding padding in the non-
> >>>>>> vectorized case, like you did to differentiate Virtio PMD to Virtio-
> user
> >>>>>> PMD in patch 7.
> >>>>>>
> >>>>> Maxime,
> >>>>> The performance difference is so slight, so I ignored for it look like a
> >>>> sampling error.
> >>>>
> >>>> Agree for IO loopback, but it adds one more cache line access per
> burst,
> >>>> which might be see in some real-life use cases.
> >>>>
> >>>>> It maybe not suitable to add new configuration for such setting
> which
> >>>> only used inside driver.
> >>>>
> >>>> Wait, the Virtio-user #ifdef is based on the defconfig options? How
> can
> >>>> it work since both Virtio PMD and Virtio-user PMD can be selected at
> the
> >>>> same time?
> >>>>
> >>>> I thought it was a define set before the headers inclusion and unset
> >>>> afterwards, but I didn't checked carefully.
> >>>>
> >>>
> >>> Maxime,
> >>> The difference between virtio PMD and Virtio-user PMD addresses is
> >> handled by vq->offset.
> >>>
> >>> When virtio PMD is running, offset will be set to buf_iova.
> >>> vq->offset = offsetof(struct rte_mbuf, buf_iova);
> >>>
> >>> When virtio_user PMD is running, offset will be set to buf_addr.
> >>> vq->offset = offsetof(struct rte_mbuf, buf_addr);
> >>
> >> Ok, but below is a build time check:
> >>
> >> +#ifdef RTE_VIRTIO_USER
> >> +  __m128i flag_offset = _mm_set_epi64x(flags_temp, (uint64_t)vq-
> >>> offset);
> >> +#else
> >> +  __m128i flag_offset = _m

Re: [dpdk-dev] [PATCH v3 2/2] vhost: binary search address mapping table

2020-04-28 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, April 28, 2020 11:28 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v3 2/2] vhost: binary search address mapping table
> 
> 
> 
> On 4/28/20 11:13 AM, Marvin Liu wrote:
> > If Tx zero copy enabled, gpa to hpa mapping table is updated one by
> > one. This will harm performance when guest memory backend using 2M
> > hugepages. Now utilize binary search to find the entry in mapping
> > table, meanwhile set threshold to 256 entries for linear search.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index e592795f2..8769afaad 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -10,7 +10,7 @@ EXPORT_MAP := rte_vhost_version.map
> >
> >  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> >  CFLAGS += -I vhost_user
> > -CFLAGS += -fno-strict-aliasing
> > +CFLAGS += -fno-strict-aliasing -Wno-maybe-uninitialized
> >  LDLIBS += -lpthread
> >
> >  ifeq ($(RTE_TOOLCHAIN), gcc)
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 507dbf214..a0fee39d5 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -546,20 +546,46 @@ extern int vhost_data_log_level;
> >  #define MAX_VHOST_DEVICE   1024
> >  extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
> >
> > +#define VHOST_BINARY_SEARCH_THRESH 256
> > +static int guest_page_addrcmp(const void *p1, const void *p2)
> > +{
> > +   const struct guest_page *page1 = (const struct guest_page *)p1;
> > +   const struct guest_page *page2 = (const struct guest_page *)p2;
> > +
> > +   if (page1->guest_phys_addr > page2->guest_phys_addr)
> > +   return 1;
> > +   if (page1->guest_phys_addr < page2->guest_phys_addr)
> > +   return -1;
> > +
> > +   return 0;
> > +}
> > +
> >  /* Convert guest physical address to host physical address */
> >  static __rte_always_inline rte_iova_t
> >  gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
> >  {
> > uint32_t i;
> > struct guest_page *page;
> > -
> > -   for (i = 0; i < dev->nr_guest_pages; i++) {
> > -   page = &dev->guest_pages[i];
> > -
> > -   if (gpa >= page->guest_phys_addr &&
> > -   gpa + size < page->guest_phys_addr + page->size) {
> > -   return gpa - page->guest_phys_addr +
> > -  page->host_phys_addr;
> > +   struct guest_page key;
> > +
> > +   if (dev->nr_guest_pages >= VHOST_BINARY_SEARCH_THRESH) {
> 
> I would have expected the binary search to be more efficient for much
> smaller number of pages. Have you done some tests to define this
> threshold value?
> 
Maxime,
In my unit test, binary search starts to win once the table size is over 16
entries, but that is not the case with a real VM.
I have tested with around 128 to 1024 pages; the benefit shows up at around
256 entries, so the threshold is set to that. A sketch of the kind of
measurement used is below.
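
The measurement was roughly of this shape (a hypothetical sketch, not the
actual test code; it times the existing gpa_to_hpa() with rte_rdtsc()):

static uint64_t
bench_gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint32_t iters)
{
        uint64_t start = rte_rdtsc();
        uint32_t i;

        for (i = 0; i < iters; i++)
                (void)gpa_to_hpa(dev, gpa, 64);

        return (rte_rdtsc() - start) / iters; /* cycles per lookup */
}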

Thanks,
Marvin

> > +   key.guest_phys_addr = gpa;
> > +   page = bsearch(&key, dev->guest_pages, dev-
> >nr_guest_pages,
> > +  sizeof(struct guest_page), guest_page_addrcmp);
> > +   if (page) {
> > +   if (gpa + size < page->guest_phys_addr + page->size)
> > +   return gpa - page->guest_phys_addr +
> > +   page->host_phys_addr;
> > +   }
> 
> Is all the generated code inlined?
> 
The compare function hasn't been inlined. I will inline it in the next
version.
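
Roughly as follows (this keeps the existing body and only adds the inline
attribute; note the address is still taken by qsort()/bsearch(), so the
compiler keeps an out-of-line copy for those indirect call sites):

static __rte_always_inline int
guest_page_addrcmp(const void *p1, const void *p2)
{
        const struct guest_page *page1 = (const struct guest_page *)p1;
        const struct guest_page *page2 = (const struct guest_page *)p2;

        if (page1->guest_phys_addr > page2->guest_phys_addr)
                return 1;
        if (page1->guest_phys_addr < page2->guest_phys_addr)
                return -1;

        return 0;
}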

> I see that in the elf file:
> 2386: 00874f7016 FUNCLOCAL  DEFAULT   13
> guest_page_addrcmp
> 
> > +   } else {
> > +   for (i = 0; i < dev->nr_guest_pages; i++) {
> > +   page = &dev->guest_pages[i];
> > +
> > +   if (gpa >= page->guest_phys_addr &&
> > +   gpa + size < page->guest_phys_addr +
> > +   page->size)
> > +   return gpa - page->guest_phys_addr +
> > +  page->host_phys_addr;
> > }
> > }
> >
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index 79fcb9d19..15e50d27d 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -965,6 +965,12 @@ add_guest_pages(struct virtio_net *dev, struct
> rte_vhost_mem_region *reg,
> > reg_size -= size;
> > }
> >
> > +   /* sort guest page array if over binary search threshold */
> > +   if (dev->nr_guest_pages >= VHOST_BINARY_SEARCH_THRESH) {
> > +   qsort((void *)dev->guest_pages, dev->nr_guest_pages,
> > +   sizeof(struct guest_page), guest_page_addrcmp);
> > +   }
> > +
> > return 0;
> >  }
> >
> >



Re: [dpdk-dev] [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path

2020-04-28 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, April 28, 2020 10:50 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> ; jer...@marvell.com
> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> On 4/28/20 4:43 PM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Tuesday, April 28, 2020 9:46 PM
> >> To: Liu, Yong ; Ye, Xiaolong
> ;
> >> Wang, Zhihong 
> >> Cc: dev@dpdk.org; Honnappa Nagarahalli
> >> ; jer...@marvell.com
> >> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> path
> >>
> >>
> >>
> >> On 4/28/20 3:01 PM, Liu, Yong wrote:
> >>>>> Maxime,
> >>>>> Thanks for point it out, it will add extra cache miss in datapath.
> >>>>> And its impact on performance is around 1% in loopback case.
> >>>> Ok, thanks for doing the test. I'll try to run some PVP benchmarks
> >>>> on my side because when doing IO loopback, the cache pressure is
> >>>> much less important.
> >>>>
> >>>>> While benefit of vectorized path will be more than that number.
> >>>> Ok, but I disagree for two reasons:
> >>>>  1. You have to keep in mind than non-vectorized is the default and
> >>>> encouraged mode to use. Indeed, it takes a lot of shortcuts like not
> >>>> checking header length (so no error stats), etc...
> >>>>
> >>> Ok, I will keep non-vectorized same as before.
> >>>
> >>>>  2. It's like saying it's OK it degrades by 5% on $CPU_VENDOR_A
> because
> >>>> the gain is 20% on $CPU_VENDOR_B.
> >>>>
> >>>> In the case we see more degradation in real-world scenario, you might
> >>>> want to consider using ifdefs to avoid adding padding in the non-
> >>>> vectorized case, like you did to differentiate Virtio PMD to Virtio-user
> >>>> PMD in patch 7.
> >>>>
> >>> Maxime,
> >>> The performance difference is so slight, so I ignored for it look like a
> >> sampling error.
> >>
> >> Agree for IO loopback, but it adds one more cache line access per burst,
> >> which might be see in some real-life use cases.
> >>
> >>> It maybe not suitable to add new configuration for such setting which
> >> only used inside driver.
> >>
> >> Wait, the Virtio-user #ifdef is based on the defconfig options? How can
> >> it work since both Virtio PMD and Virtio-user PMD can be selected at the
> >> same time?
> >>
> >> I thought it was a define set before the headers inclusion and unset
> >> afterwards, but I didn't checked carefully.
> >>
> >
> > Maxime,
> > The difference between virtio PMD and Virtio-user PMD addresses is
> handled by vq->offset.
> >
> > When virtio PMD is running, offset will be set to buf_iova.
> > vq->offset = offsetof(struct rte_mbuf, buf_iova);
> >
> > When virtio_user PMD is running, offset will be set to buf_addr.
> > vq->offset = offsetof(struct rte_mbuf, buf_addr);
> 
> Ok, but below is a build time check:
> 
> +#ifdef RTE_VIRTIO_USER
> + __m128i flag_offset = _mm_set_epi64x(flags_temp, (uint64_t)vq-
> >offset);
> +#else
> + __m128i flag_offset = _mm_set_epi64x(flags_temp, 0);
> +#endif
> 
> So how can it work for a single build for both Virtio and Virtio-user?
> 

Sorry, this is an implementation error on my side. vq->offset should be used
in descs_base to get the IOVA address.
It will then work the same way as the VIRTIO_MBUF_ADDR macro.

> >>> Virtio driver can check whether virtqueue is using vectorized path when
> >> initialization, will use padded structure if it is.
> >>> I have added some tested code and now performance came back.  Since
> >> code has changed in initialization process,  it need some time for
> regression
> >> check.
> >>
> >> Ok, works for me.
> >>
> >> I am investigating a linkage issue with your series, which does not
> >> happen systematically (see below, it happens also with clang). David
> >> pointed me to some Intel patches removing the usage if __rte_weak,
> >> could it be related?
> >>
> >
> > I checked David's patch, it only changed i40e driver. Meanwhile attribute
> __rte_weak should still be in virtio_rxtx.c.
> > I will follow David's patch, eliminate the usage of weak attribute.
> 
> Yeah, I meant below issue could be linked to __rte_weak, not that i40e
> patch was the cause of this problem.
> 

Maxime,
I haven't seen any build issue related to __rte_weak with either gcc or clang.

Thanks,
Marvin


Re: [dpdk-dev] [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path

2020-04-28 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, April 28, 2020 9:46 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> ; jer...@marvell.com
> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> On 4/28/20 3:01 PM, Liu, Yong wrote:
> >>> Maxime,
> >>> Thanks for point it out, it will add extra cache miss in datapath.
> >>> And its impact on performance is around 1% in loopback case.
> >> Ok, thanks for doing the test. I'll try to run some PVP benchmarks
> >> on my side because when doing IO loopback, the cache pressure is
> >> much less important.
> >>
> >>> While benefit of vectorized path will be more than that number.
> >> Ok, but I disagree for two reasons:
> >>  1. You have to keep in mind than non-vectorized is the default and
> >> encouraged mode to use. Indeed, it takes a lot of shortcuts like not
> >> checking header length (so no error stats), etc...
> >>
> > Ok, I will keep non-vectorized same as before.
> >
> >>  2. It's like saying it's OK it degrades by 5% on $CPU_VENDOR_A because
> >> the gain is 20% on $CPU_VENDOR_B.
> >>
> >> In the case we see more degradation in real-world scenario, you might
> >> want to consider using ifdefs to avoid adding padding in the non-
> >> vectorized case, like you did to differentiate Virtio PMD to Virtio-user
> >> PMD in patch 7.
> >>
> > Maxime,
> > The performance difference is so slight that I ignored it; it looks like a
> > sampling error.
> 
> Agree for IO loopback, but it adds one more cache line access per burst,
> which might be see in some real-life use cases.
> 
> > It may not be suitable to add a new configuration option for a setting that
> > is only used inside the driver.
> 
> Wait, the Virtio-user #ifdef is based on the defconfig options? How can
> it work since both Virtio PMD and Virtio-user PMD can be selected at the
> same time?
> 
> I thought it was a define set before the headers inclusion and unset
> afterwards, but I didn't checked carefully.
> 

Maxime,
The difference between virtio PMD and Virtio-user PMD addresses is handled by 
vq->offset. 

When virtio PMD is running, offset will be set to buf_iova.
vq->offset = offsetof(struct rte_mbuf, buf_iova);

When virtio_user PMD is running, offset will be set to buf_addr.
vq->offset = offsetof(struct rte_mbuf, buf_addr);

> > The virtio driver can check whether the virtqueue is using the vectorized
> > path at initialization, and will use the padded structure if it is.
> > I have added some test code and now the performance came back. Since the
> > code changed in the initialization process, it needs some time for
> > regression checking.
> 
> Ok, works for me.
> 
> I am investigating a linkage issue with your series, which does not
> happen systematically (see below, it happens also with clang). David
> pointed me to some Intel patches removing the usage if __rte_weak,
> could it be related?
> 

I checked David's patch; it only changed the i40e driver. Meanwhile, the
__rte_weak attribute should still be in virtio_rxtx.c.
I will follow David's patch and eliminate the usage of the weak attribute.

> 
> gcc  -o app/test/dpdk-test
> 'app/test/3062f5d@@dpdk-test@exe/commands.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/packet_burst_generator.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_acl.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_alarm.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_atomic.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_barrier.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_bpf.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_byteorder.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cmdline.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cmdline_cirbuf.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cmdline_etheraddr.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cmdline_ipaddr.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cmdline_lib.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cmdline_num.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cmdline_portlist.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cmdline_string.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_common.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cpuflags.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_crc.c.o'
> 'app/test/3062f5d@@dpdk-test@exe/test_cryptodev.c.o'
> 'app/test/3062f5d@@dpd

Re: [dpdk-dev] [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path

2020-04-28 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Tuesday, April 28, 2020 4:44 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> On 4/28/20 3:14 AM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Monday, April 27, 2020 7:21 PM
> >> To: Liu, Yong ; Ye, Xiaolong
> ;
> >> Wang, Zhihong 
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx
> path
> >>
> >>
> >>
> >> On 4/26/20 4:19 AM, Marvin Liu wrote:
> >>> Optimize packed ring Rx path with SIMD instructions. Solution of
> >>> optimization is pretty like vhost, is that split path into batch and
> >>> single functions. Batch function is further optimized by AVX512
> >>> instructions. Also pad desc extra structure to 16 bytes aligned, thus
> >>> four elements will be saved in one batch.
> >>>
> >>> Signed-off-by: Marvin Liu 
> >>>
> >>> diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> >>> index c9edb84ee..102b1deab 100644
> >>> --- a/drivers/net/virtio/Makefile
> >>> +++ b/drivers/net/virtio/Makefile
> >>> @@ -36,6 +36,41 @@ else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM)
> >> $(CONFIG_RTE_ARCH_ARM64)),)
> >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> virtio_rxtx_simple_neon.c
> >>>  endif
> >>>
> >>> +ifneq ($(FORCE_DISABLE_AVX512), y)
> >>> + CC_AVX512_SUPPORT=\
> >>> + $(shell $(CC) -march=native -dM -E - &1 | \
> >>> + sed '/./{H;$$!d} ; x ; /AVX512F/!d; /AVX512BW/!d; /AVX512VL/!d' | \
> >>> + grep -q AVX512 && echo 1)
> >>> +endif
> >>> +
> >>> +ifeq ($(CC_AVX512_SUPPORT), 1)
> >>> +CFLAGS += -DCC_AVX512_SUPPORT
> >>> +SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_packed_avx.c
> >>> +
> >>> +ifeq ($(RTE_TOOLCHAIN), gcc)
> >>> +ifeq ($(shell test $(GCC_VERSION) -ge 83 && echo 1), 1)
> >>> +CFLAGS += -DVIRTIO_GCC_UNROLL_PRAGMA
> >>> +endif
> >>> +endif
> >>> +
> >>> +ifeq ($(RTE_TOOLCHAIN), clang)
> >>> +ifeq ($(shell test
> $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -
> >> ge 37 && echo 1), 1)
> >>> +CFLAGS += -DVIRTIO_CLANG_UNROLL_PRAGMA
> >>> +endif
> >>> +endif
> >>> +
> >>> +ifeq ($(RTE_TOOLCHAIN), icc)
> >>> +ifeq ($(shell test $(ICC_MAJOR_VERSION) -ge 16 && echo 1), 1)
> >>> +CFLAGS += -DVIRTIO_ICC_UNROLL_PRAGMA
> >>> +endif
> >>> +endif
> >>> +
> >>> +CFLAGS_virtio_rxtx_packed_avx.o += -mavx512f -mavx512bw -
> mavx512vl
> >>> +ifeq ($(shell test $(GCC_VERSION) -ge 100 && echo 1), 1)
> >>> +CFLAGS_virtio_rxtx_packed_avx.o += -Wno-zero-length-bounds
> >>> +endif
> >>> +endif
> >>> +
> >>>  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
> >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
> >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> virtio_user/vhost_kernel.c
> >>> diff --git a/drivers/net/virtio/meson.build
> b/drivers/net/virtio/meson.build
> >>> index 15150eea1..8e68c3039 100644
> >>> --- a/drivers/net/virtio/meson.build
> >>> +++ b/drivers/net/virtio/meson.build
> >>> @@ -9,6 +9,20 @@ sources += files('virtio_ethdev.c',
> >>>  deps += ['kvargs', 'bus_pci']
> >>>
> >>>  if arch_subdir == 'x86'
> >>> + if '-mno-avx512f' not in machine_args
> >>> + if cc.has_argument('-mavx512f') and cc.has_argument('-
> >> mavx512vl') and cc.has_argument('-mavx512bw')
> >>> + cflags += ['-mavx512f', '-mavx512bw', '-mavx512vl']
> >>> + cflags += ['-DCC_AVX512_SUPPORT']
> >>> + if (toolchain == 'gcc' and
> >> cc.version().version_compare('>=8.3.0'))
> >>> + cflags += '-DVHOST_GCC_UNROLL_PRAGMA'
> >>> + elif (toolchain == 'clang' and
> >> cc.version().version_compare('>=3.7.0&

Re: [dpdk-dev] [PATCH v10 4/9] net/virtio-user: add vectorized devarg

2020-04-27 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, April 27, 2020 7:07 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v10 4/9] net/virtio-user: add vectorized devarg
> 
> 
> 
> On 4/26/20 4:19 AM, Marvin Liu wrote:
> > Add new devarg for virtio user device vectorized path selection. By
> > default vectorized path is disabled.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/doc/guides/nics/virtio.rst b/doc/guides/nics/virtio.rst
> > index 902a1f0cf..d59add23e 100644
> > --- a/doc/guides/nics/virtio.rst
> > +++ b/doc/guides/nics/virtio.rst
> > @@ -424,6 +424,12 @@ Below devargs are supported by the virtio-user
> vdev:
> >  rte_eth_link_get_nowait function.
> >  (Default: 1 (10G))
> >
> > +#.  ``vectorized``:
> > +
> > +It is used to specify whether virtio device perfer to use vectorized 
> > path.
> 
> s/perfer/prefers/
> 
> I'll fix while applying if the rest of the series is ok.

Thanks, Maxime. I will fix it in the next version, together with the i686 build fix.

> 
> Reviewed-by: Maxime Coquelin 
> 
> Thanks,
> Maxime



Re: [dpdk-dev] [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path

2020-04-27 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, April 27, 2020 7:21 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v10 6/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> On 4/26/20 4:19 AM, Marvin Liu wrote:
> > Optimize packed ring Rx path with SIMD instructions. Solution of
> > optimization is pretty like vhost, is that split path into batch and
> > single functions. Batch function is further optimized by AVX512
> > instructions. Also pad desc extra structure to 16 bytes aligned, thus
> > four elements will be saved in one batch.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> > index c9edb84ee..102b1deab 100644
> > --- a/drivers/net/virtio/Makefile
> > +++ b/drivers/net/virtio/Makefile
> > @@ -36,6 +36,41 @@ else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM)
> $(CONFIG_RTE_ARCH_ARM64)),)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c
> >  endif
> >
> > +ifneq ($(FORCE_DISABLE_AVX512), y)
> > +   CC_AVX512_SUPPORT=\
> > +   $(shell $(CC) -march=native -dM -E - &1 | \
> > +   sed '/./{H;$$!d} ; x ; /AVX512F/!d; /AVX512BW/!d; /AVX512VL/!d' | \
> > +   grep -q AVX512 && echo 1)
> > +endif
> > +
> > +ifeq ($(CC_AVX512_SUPPORT), 1)
> > +CFLAGS += -DCC_AVX512_SUPPORT
> > +SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_packed_avx.c
> > +
> > +ifeq ($(RTE_TOOLCHAIN), gcc)
> > +ifeq ($(shell test $(GCC_VERSION) -ge 83 && echo 1), 1)
> > +CFLAGS += -DVIRTIO_GCC_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +ifeq ($(RTE_TOOLCHAIN), clang)
> > +ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -
> ge 37 && echo 1), 1)
> > +CFLAGS += -DVIRTIO_CLANG_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +ifeq ($(RTE_TOOLCHAIN), icc)
> > +ifeq ($(shell test $(ICC_MAJOR_VERSION) -ge 16 && echo 1), 1)
> > +CFLAGS += -DVIRTIO_ICC_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +CFLAGS_virtio_rxtx_packed_avx.o += -mavx512f -mavx512bw -mavx512vl
> > +ifeq ($(shell test $(GCC_VERSION) -ge 100 && echo 1), 1)
> > +CFLAGS_virtio_rxtx_packed_avx.o += -Wno-zero-length-bounds
> > +endif
> > +endif
> > +
> >  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
> > diff --git a/drivers/net/virtio/meson.build b/drivers/net/virtio/meson.build
> > index 15150eea1..8e68c3039 100644
> > --- a/drivers/net/virtio/meson.build
> > +++ b/drivers/net/virtio/meson.build
> > @@ -9,6 +9,20 @@ sources += files('virtio_ethdev.c',
> >  deps += ['kvargs', 'bus_pci']
> >
> >  if arch_subdir == 'x86'
> > +   if '-mno-avx512f' not in machine_args
> > +   if cc.has_argument('-mavx512f') and cc.has_argument('-
> mavx512vl') and cc.has_argument('-mavx512bw')
> > +   cflags += ['-mavx512f', '-mavx512bw', '-mavx512vl']
> > +   cflags += ['-DCC_AVX512_SUPPORT']
> > +   if (toolchain == 'gcc' and
> cc.version().version_compare('>=8.3.0'))
> > +   cflags += '-DVHOST_GCC_UNROLL_PRAGMA'
> > +   elif (toolchain == 'clang' and
> cc.version().version_compare('>=3.7.0'))
> > +   cflags += '-
> DVHOST_CLANG_UNROLL_PRAGMA'
> > +   elif (toolchain == 'icc' and
> cc.version().version_compare('>=16.0.0'))
> > +   cflags += '-DVHOST_ICC_UNROLL_PRAGMA'
> > +   endif
> > +   sources += files('virtio_rxtx_packed_avx.c')
> > +   endif
> > +   endif
> > sources += files('virtio_rxtx_simple_sse.c')
> >  elif arch_subdir == 'ppc'
> > sources += files('virtio_rxtx_simple_altivec.c')
> > diff --git a/drivers/net/virtio/virtio_ethdev.h
> b/drivers/net/virtio/virtio_ethdev.h
> > index febaf17a8..5c112cac7 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -105,6 +105,9 @@ uint16_t virtio_xmit_pkts_inorder(void *tx_queue,
> struct rte_mbuf **tx_pkts,
> >  ui

Re: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa translation

2020-04-27 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, April 27, 2020 4:45 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v2 2/2] vhost: cache gpa to hpa translation
> 
> Hi Marvin,
> 
> On 4/1/20 4:50 PM, Marvin Liu wrote:
> > If Tx zero copy enabled, gpa to hpa mapping table is updated one by
> > one. This will harm performance when guest memory backend using 2M
> > hugepages. Now add cached mapping table which will sorted by using
> > sequence. Address translation will first check cached mapping table,
> > then check unsorted mapping table if no match found.
> >
> > Signed-off-by: Marvin Liu 
> >
> 
> I don't like the approach, as I think it could have nasty effects.
> For example, the system is loaded normally and let's say 25% of the
> pages are used. Then we have a small spike, and buffers that were never
> used start to be used, it will cause writing new entries into the cache
> in the hot path when it is already overloaded. Wouldn't it increase the
> number of packets dropped?
> 
> At set_mem_table time, instead of adding the guest pages unsorted, maybe
> better to add them sorted there. Then you can use a better algorithm
> than linear searching (O(n)), like binary search (O(log n)).
> 

Maxime,
Thanks for the input. The previous sorting was by usage sequence; it may cause
more packet drops if the page access sequence varies a lot.
Based on the current DPDK and virtio-net implementations, that is unlikely to
happen. Anyway, it is not the best choice.
I will replace the current cache solution with a binary search.
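
For reference, a minimal sketch of the sorted-table approach (illustrative
field names; pages are kept sorted by guest physical address at set_mem_table
time, so lookup becomes O(log n)):

#include <stdint.h>

struct guest_page {
	uint64_t guest_phys_addr;
	uint64_t host_phys_addr;
	uint64_t size;
};

static uint64_t
gpa_to_hpa(const struct guest_page *pages, uint32_t nr_pages, uint64_t gpa)
{
	uint32_t lo = 0, hi = nr_pages;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		const struct guest_page *page = &pages[mid];

		if (gpa < page->guest_phys_addr)
			hi = mid;
		else if (gpa >= page->guest_phys_addr + page->size)
			lo = mid + 1;
		else
			return page->host_phys_addr +
				gpa - page->guest_phys_addr;
	}
	return 0; /* no mapping found */
}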

Regards,
Marvin

> Thanks,
> Maxime
> 



Re: [dpdk-dev] [PATCH v9 5/9] net/virtio: add vectorized packed ring Rx path

2020-04-24 Thread Liu, Yong



> -Original Message-
> From: Liu, Yong
> Sent: Friday, April 24, 2020 9:41 PM
> To: 'Maxime Coquelin' ; Ye, Xiaolong
> ; Wang, Zhihong 
> Cc: dev@dpdk.org; Van Haaren, Harry 
> Subject: RE: [PATCH v9 5/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> > -Original Message-
> > From: Maxime Coquelin 
> > Sent: Friday, April 24, 2020 9:34 PM
> > To: Liu, Yong ; Ye, Xiaolong ;
> > Wang, Zhihong 
> > Cc: dev@dpdk.org; Van Haaren, Harry 
> > Subject: Re: [PATCH v9 5/9] net/virtio: add vectorized packed ring Rx path
> >
> >
> >
> > On 4/24/20 3:12 PM, Liu, Yong wrote:
> > >> IIUC, the only difference with the non-vectorized version is the GSO
> > >> support removed here.
> > >> gso_type being in the same cacheline as flags in virtio_net_hdr, I don't
> > >> think checking the performance gain is worth the added maintainance
> > >> effort due to code duplication.
> > >>
> > >> Please prove I'm wrong, otherwise please move virtio_rx_offload() in a
> > >> header and use it here. Alternative if it really imapcts performance is
> > >> to put all the shared code in a dedicated function that can be re-used
> > >> by both implementations.
> > >>
> > > Maxime,
> > > There won't be much performance difference between non-vectorized and
> > > vectorized.
> > > The reason for adding a special vectorized version is to skip the handling
> > > of garbage GSO packets.
> > > As all descs are handled in a batch, it is necessary to revert when
> > > garbage packets are found.
> > > That would introduce complicated logic in the vectorized path.
> >
> 
> The dequeue function will call virtio_discard_rxbuf when it finds that the GSO
> info in the header is invalid.
> IMHO, there's no need to check the GSO info when GSO is not negotiated.
> An alternative is to use a single function to handle GSO packets, but its
> performance will be worse than the normal function.
> 
> if ((hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) ||
>     (hdr->gso_size == 0)) {
> 	return -EINVAL;
> }
> 

Hi Maxime,
There's about a 6% performance drop in the loopback case after handling this
special case in the Rx path.
I prefer to keep the current implementation. What's your opinion?

Thanks,
Marvin

> >
> > What do you mean by garbage packet?
> > Is it really good to just ignore such issues?
> >
> > Thanks,
> > Maxime
> >
> > > Regards,
> > > Marvin
> > >



Re: [dpdk-dev] [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path

2020-04-24 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Friday, April 24, 2020 9:36 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org; Van Haaren, Harry 
> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
> 
> 
> 
> On 4/24/20 3:33 PM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Friday, April 24, 2020 8:30 PM
> >> To: Liu, Yong ; Ye, Xiaolong ;
> >> Wang, Zhihong 
> >> Cc: dev@dpdk.org; Van Haaren, Harry 
> >> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
> >>
> >>
> >>
> >> On 4/24/20 11:24 AM, Marvin Liu wrote:
> >>> Optimize packed ring Tx path alike Rx path. Split Tx path into batch and
> >>
> >> s/alike/like/ ?
> >>
> >>> single Tx functions. Batch function is further optimized by AVX512
> >>> instructions.
> >>>
> >>> Signed-off-by: Marvin Liu 
> >>>
> >>> diff --git a/drivers/net/virtio/virtio_ethdev.h
> >> b/drivers/net/virtio/virtio_ethdev.h
> >>> index 5c112cac7..b7d52d497 100644
> >>> --- a/drivers/net/virtio/virtio_ethdev.h
> >>> +++ b/drivers/net/virtio/virtio_ethdev.h
> >>> @@ -108,6 +108,9 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue,
> >> struct rte_mbuf **rx_pkts,
> >>>  uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf
> >> **rx_pkts,
> >>>   uint16_t nb_pkts);
> >>>
> >>> +uint16_t virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf
> >> **tx_pkts,
> >>> + uint16_t nb_pkts);
> >>> +
> >>>  int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
> >>>
> >>>  void virtio_interrupt_handler(void *param);
> >>> diff --git a/drivers/net/virtio/virtio_rxtx.c
> b/drivers/net/virtio/virtio_rxtx.c
> >>> index cf18fe564..f82fe8d64 100644
> >>> --- a/drivers/net/virtio/virtio_rxtx.c
> >>> +++ b/drivers/net/virtio/virtio_rxtx.c
> >>> @@ -2175,3 +2175,11 @@ virtio_recv_pkts_packed_vec(void *rx_queue
> >> __rte_unused,
> >>>  {
> >>>   return 0;
> >>>  }
> >>> +
> >>> +__rte_weak uint16_t
> >>> +virtio_xmit_pkts_packed_vec(void *tx_queue __rte_unused,
> >>> + struct rte_mbuf **tx_pkts __rte_unused,
> >>> + uint16_t nb_pkts __rte_unused)
> >>> +{
> >>> + return 0;
> >>> +}
> >>> diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c
> >> b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> >>> index 8a7b459eb..c023ace4e 100644
> >>> --- a/drivers/net/virtio/virtio_rxtx_packed_avx.c
> >>> +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> >>> @@ -23,6 +23,24 @@
> >>>  #define PACKED_FLAGS_MASK ((0ULL |
> >> VRING_PACKED_DESC_F_AVAIL_USED) << \
> >>>   FLAGS_BITS_OFFSET)
> >>>
> >>> +/* reference count offset in mbuf rearm data */
> >>> +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
> >>> + offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> >>> +/* segment number offset in mbuf rearm data */
> >>> +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
> >>> + offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> >>> +
> >>> +/* default rearm data */
> >>> +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \
> >>> + 1ULL << REFCNT_BITS_OFFSET)
> >>> +
> >>> +/* id bits offset in packed ring desc higher 64bits */
> >>> +#define ID_BITS_OFFSET ((offsetof(struct vring_packed_desc, id) - \
> >>> + offsetof(struct vring_packed_desc, len)) * BYTE_SIZE)
> >>> +
> >>> +/* net hdr short size mask */
> >>> +#define NET_HDR_MASK 0x3F
> >>> +
> >>>  #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
> >>>   sizeof(struct vring_packed_desc))
> >>>  #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
> >>> @@ -47,6 +65,48 @@
> >>>   for (iter = val; iter < num; iter++)
> >>>  #endif
> >>>
> >>> +static inline void
> >>> +virtio_xmit_cleanup_packed_vec(struct virtqueue *vq)
> >>> +{
> >>> + struct vring_packed_desc *desc = vq->vq_pack

Re: [dpdk-dev] [PATCH v9 5/9] net/virtio: add vectorized packed ring Rx path

2020-04-24 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Friday, April 24, 2020 9:34 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org; Van Haaren, Harry 
> Subject: Re: [PATCH v9 5/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> On 4/24/20 3:12 PM, Liu, Yong wrote:
> >> IIUC, the only difference with the non-vectorized version is the GSO
> >> support removed here.
> >> gso_type being in the same cacheline as flags in virtio_net_hdr, I don't
> >> think checking the performance gain is worth the added maintainance
> >> effort due to code duplication.
> >>
> >> Please prove I'm wrong, otherwise please move virtio_rx_offload() in a
> >> header and use it here. Alternative if it really imapcts performance is
> >> to put all the shared code in a dedicated function that can be re-used
> >> by both implementations.
> >>
> > Maxime,
> > There won't be much performance difference between non-vectorized and
> > vectorized.
> > The reason for adding a special vectorized version is to skip the handling
> > of garbage GSO packets.
> > As all descs are handled in a batch, it is necessary to revert when garbage
> > packets are found.
> > That would introduce complicated logic in the vectorized path.
> 

The dequeue function will call virtio_discard_rxbuf when it finds that the GSO
info in the header is invalid.
IMHO, there's no need to check the GSO info when GSO is not negotiated.
An alternative is to use a single function to handle GSO packets, but its
performance will be worse than the normal function.

if ((hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) ||
    (hdr->gso_size == 0)) {
	return -EINVAL;
}
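
For clarity, the non-vectorized handling is roughly the following (a sketch
using the names from this thread, not the exact code):

/* in the dequeue loop: on a bad GSO header, drop the buffer and count
 * an error instead of delivering a garbage packet */
if (virtio_rx_offload(rxm, hdr) < 0) {
	virtio_discard_rxbuf(vq, rxm);
	rxvq->stats.errors++;
	continue;
}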

> 
> What do you mean by garbage packet?
> Is it really good to just ignore such issues?
> 
> Thanks,
> Maxime
> 
> > Regards,
> > Marvin
> >



Re: [dpdk-dev] [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path

2020-04-24 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Friday, April 24, 2020 8:30 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org; Van Haaren, Harry 
> Subject: Re: [PATCH v9 7/9] net/virtio: add vectorized packed ring Tx path
> 
> 
> 
> On 4/24/20 11:24 AM, Marvin Liu wrote:
> > Optimize packed ring Tx path alike Rx path. Split Tx path into batch and
> 
> s/alike/like/ ?
> 
> > single Tx functions. Batch function is further optimized by AVX512
> > instructions.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/drivers/net/virtio/virtio_ethdev.h
> b/drivers/net/virtio/virtio_ethdev.h
> > index 5c112cac7..b7d52d497 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -108,6 +108,9 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue,
> struct rte_mbuf **rx_pkts,
> >  uint16_t virtio_recv_pkts_packed_vec(void *rx_queue, struct rte_mbuf
> **rx_pkts,
> > uint16_t nb_pkts);
> >
> > +uint16_t virtio_xmit_pkts_packed_vec(void *tx_queue, struct rte_mbuf
> **tx_pkts,
> > +   uint16_t nb_pkts);
> > +
> >  int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
> >
> >  void virtio_interrupt_handler(void *param);
> > diff --git a/drivers/net/virtio/virtio_rxtx.c 
> > b/drivers/net/virtio/virtio_rxtx.c
> > index cf18fe564..f82fe8d64 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -2175,3 +2175,11 @@ virtio_recv_pkts_packed_vec(void *rx_queue
> __rte_unused,
> >  {
> > return 0;
> >  }
> > +
> > +__rte_weak uint16_t
> > +virtio_xmit_pkts_packed_vec(void *tx_queue __rte_unused,
> > +   struct rte_mbuf **tx_pkts __rte_unused,
> > +   uint16_t nb_pkts __rte_unused)
> > +{
> > +   return 0;
> > +}
> > diff --git a/drivers/net/virtio/virtio_rxtx_packed_avx.c
> b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > index 8a7b459eb..c023ace4e 100644
> > --- a/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > +++ b/drivers/net/virtio/virtio_rxtx_packed_avx.c
> > @@ -23,6 +23,24 @@
> >  #define PACKED_FLAGS_MASK ((0ULL |
> VRING_PACKED_DESC_F_AVAIL_USED) << \
> > FLAGS_BITS_OFFSET)
> >
> > +/* reference count offset in mbuf rearm data */
> > +#define REFCNT_BITS_OFFSET ((offsetof(struct rte_mbuf, refcnt) - \
> > +   offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +/* segment number offset in mbuf rearm data */
> > +#define SEG_NUM_BITS_OFFSET ((offsetof(struct rte_mbuf, nb_segs) - \
> > +   offsetof(struct rte_mbuf, rearm_data)) * BYTE_SIZE)
> > +
> > +/* default rearm data */
> > +#define DEFAULT_REARM_DATA (1ULL << SEG_NUM_BITS_OFFSET | \
> > +   1ULL << REFCNT_BITS_OFFSET)
> > +
> > +/* id bits offset in packed ring desc higher 64bits */
> > +#define ID_BITS_OFFSET ((offsetof(struct vring_packed_desc, id) - \
> > +   offsetof(struct vring_packed_desc, len)) * BYTE_SIZE)
> > +
> > +/* net hdr short size mask */
> > +#define NET_HDR_MASK 0x3F
> > +
> >  #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
> > sizeof(struct vring_packed_desc))
> >  #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
> > @@ -47,6 +65,48 @@
> > for (iter = val; iter < num; iter++)
> >  #endif
> >
> > +static inline void
> > +virtio_xmit_cleanup_packed_vec(struct virtqueue *vq)
> > +{
> > +   struct vring_packed_desc *desc = vq->vq_packed.ring.desc;
> > +   struct vq_desc_extra *dxp;
> > +   uint16_t used_idx, id, curr_id, free_cnt = 0;
> > +   uint16_t size = vq->vq_nentries;
> > +   struct rte_mbuf *mbufs[size];
> > +   uint16_t nb_mbuf = 0, i;
> > +
> > +   used_idx = vq->vq_used_cons_idx;
> > +
> > +   if (!desc_is_used(&desc[used_idx], vq))
> > +   return;
> > +
> > +   id = desc[used_idx].id;
> > +
> > +   do {
> > +   curr_id = used_idx;
> > +   dxp = &vq->vq_descx[used_idx];
> > +   used_idx += dxp->ndescs;
> > +   free_cnt += dxp->ndescs;
> > +
> > +   if (dxp->cookie != NULL) {
> > +   mbufs[nb_mbuf] = dxp->cookie;
> > +   dxp->cookie = NULL;
> > +   nb_mbuf++;
> > +   }
> > +
> > +   if (used_idx >= size) {
> > +   used_idx -= size;
> > +   vq->vq_packed.used_wrap_c

Re: [dpdk-dev] [PATCH v9 5/9] net/virtio: add vectorized packed ring Rx path

2020-04-24 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Friday, April 24, 2020 7:52 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org; Van Haaren, Harry 
> Subject: Re: [PATCH v9 5/9] net/virtio: add vectorized packed ring Rx path
> 
> 
> 
> On 4/24/20 11:24 AM, Marvin Liu wrote:
> > Optimize packed ring Rx path with SIMD instructions. Solution of
> > optimization is pretty like vhost, is that split path into batch and
> > single functions. Batch function is further optimized by AVX512
> > instructions. Also pad desc extra structure to 16 bytes aligned, thus
> > four elements will be saved in one batch.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> > index c9edb84ee..102b1deab 100644
> > --- a/drivers/net/virtio/Makefile
> > +++ b/drivers/net/virtio/Makefile
> > @@ -36,6 +36,41 @@ else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM)
> $(CONFIG_RTE_ARCH_ARM64)),)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c
> >  endif
> >
> > +ifneq ($(FORCE_DISABLE_AVX512), y)
> > +   CC_AVX512_SUPPORT=\
> > +   $(shell $(CC) -march=native -dM -E - &1 | \
> > +   sed '/./{H;$$!d} ; x ; /AVX512F/!d; /AVX512BW/!d; /AVX512VL/!d' | \
> > +   grep -q AVX512 && echo 1)
> > +endif
> > +
> > +ifeq ($(CC_AVX512_SUPPORT), 1)
> > +CFLAGS += -DCC_AVX512_SUPPORT
> > +SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_packed_avx.c
> > +
> > +ifeq ($(RTE_TOOLCHAIN), gcc)
> > +ifeq ($(shell test $(GCC_VERSION) -ge 83 && echo 1), 1)
> > +CFLAGS += -DVIRTIO_GCC_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +ifeq ($(RTE_TOOLCHAIN), clang)
> > +ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -
> ge 37 && echo 1), 1)
> > +CFLAGS += -DVIRTIO_CLANG_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +ifeq ($(RTE_TOOLCHAIN), icc)
> > +ifeq ($(shell test $(ICC_MAJOR_VERSION) -ge 16 && echo 1), 1)
> > +CFLAGS += -DVIRTIO_ICC_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +CFLAGS_virtio_rxtx_packed_avx.o += -mavx512f -mavx512bw -mavx512vl
> > +ifeq ($(shell test $(GCC_VERSION) -ge 100 && echo 1), 1)
> > +CFLAGS_virtio_rxtx_packed_avx.o += -Wno-zero-length-bounds
> > +endif
> > +endif
> > +
> >  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
> > diff --git a/drivers/net/virtio/meson.build b/drivers/net/virtio/meson.build
> > index 15150eea1..8e68c3039 100644
> > --- a/drivers/net/virtio/meson.build
> > +++ b/drivers/net/virtio/meson.build
> > @@ -9,6 +9,20 @@ sources += files('virtio_ethdev.c',
> >  deps += ['kvargs', 'bus_pci']
> >
> >  if arch_subdir == 'x86'
> > +   if '-mno-avx512f' not in machine_args
> > +   if cc.has_argument('-mavx512f') and cc.has_argument('-
> mavx512vl') and cc.has_argument('-mavx512bw')
> > +   cflags += ['-mavx512f', '-mavx512bw', '-mavx512vl']
> > +   cflags += ['-DCC_AVX512_SUPPORT']
> > +   if (toolchain == 'gcc' and
> cc.version().version_compare('>=8.3.0'))
> > +   cflags += '-DVHOST_GCC_UNROLL_PRAGMA'
> > +   elif (toolchain == 'clang' and
> cc.version().version_compare('>=3.7.0'))
> > +   cflags += '-
> DVHOST_CLANG_UNROLL_PRAGMA'
> > +   elif (toolchain == 'icc' and
> cc.version().version_compare('>=16.0.0'))
> > +   cflags += '-DVHOST_ICC_UNROLL_PRAGMA'
> > +   endif
> > +   sources += files('virtio_rxtx_packed_avx.c')
> > +   endif
> > +   endif
> > sources += files('virtio_rxtx_simple_sse.c')
> >  elif arch_subdir == 'ppc'
> > sources += files('virtio_rxtx_simple_altivec.c')
> > diff --git a/drivers/net/virtio/virtio_ethdev.h
> b/drivers/net/virtio/virtio_ethdev.h
> > index febaf17a8..5c112cac7 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -105,6 +105,9 @@ uint16_t virtio_xmit_pkts_inorder(void *tx_queue,
> struct rte_mbuf **t

Re: [dpdk-dev] [PATCH v8 2/9] net/virtio: enable vectorized path

2020-04-23 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Thursday, April 23, 2020 4:50 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: Van Haaren, Harry ; dev@dpdk.org
> Subject: Re: [PATCH v8 2/9] net/virtio: enable vectorized path
> 
> 
> 
> On 4/23/20 10:46 AM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Thursday, April 23, 2020 4:34 PM
> >> To: Liu, Yong ; Ye, Xiaolong ;
> >> Wang, Zhihong 
> >> Cc: Van Haaren, Harry ; dev@dpdk.org
> >> Subject: Re: [PATCH v8 2/9] net/virtio: enable vectorized path
> >>
> >>
> >>
> >> On 4/23/20 2:30 PM, Marvin Liu wrote:
> >>> Previously, virtio split ring vectorized path is enabled as default.
> >>
> >> s/is/was/
> >> s/as/by/
> >>
> >>> This is not suitable for everyone because of that path not follow virtio
> >>
> >> s/because of that path not follow/because that path does not follow the/
> >>
> >>> spec. Add new config for virtio vectorized path selection. By default
> >>> vectorized path is disabled.
> >>
> >> I think we can keep it enabled by default for consistency between make &
> >> meson, now that you are providing a devarg for it that is disabled by
> >> default.
> >>
> >> Maybe we can just drop this config flag, what do you think?
> >>
> >
> > Maxime,
> > The devarg only affects virtio-user path selection, while the DPDK
> > configuration affects both the virtio PMD and virtio-user.
> > It may be worth adding the new configuration option, as it allows the user
> > to choose whether the vectorized path is disabled in the virtio PMD.
> 
> Ok, so we had a misunderstanding. I was requesting the the devarg to be
> effective also for the Virtio PMD, disabled by default.
> 
Got it, will change in the next version.

> Thanks,
> Maxime
> > IMHO, AVX512 instructions should be selectable per component.
> >
> > Regards,
> > Marvin
> >
> >> Thanks,
> >> Maxime
> >>
> >>> Signed-off-by: Marvin Liu 
> >>>
> >>> diff --git a/config/common_base b/config/common_base
> >>> index 00d8d0792..334a26a17 100644
> >>> --- a/config/common_base
> >>> +++ b/config/common_base
> >>> @@ -456,6 +456,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
> >>>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
> >>>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
> >>>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
> >>> +CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR=n
> >>>
> >>>  #
> >>>  # Compile virtio device emulation inside virtio PMD driver
> >>> diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> >>> index c9edb84ee..4b69827ab 100644
> >>> --- a/drivers/net/virtio/Makefile
> >>> +++ b/drivers/net/virtio/Makefile
> >>> @@ -28,6 +28,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> >> virtio_rxtx.c
> >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
> >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
> >>>
> >>> +ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR),y)
> >>>  ifeq ($(CONFIG_RTE_ARCH_X86),y)
> >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_sse.c
> >>>  else ifeq ($(CONFIG_RTE_ARCH_PPC_64),y)
> >>> @@ -35,6 +36,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> >> virtio_rxtx_simple_altivec.c
> >>>  else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM)
> >> $(CONFIG_RTE_ARCH_ARM64)),)
> >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c
> >>>  endif
> >>> +endif
> >>>
> >>>  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
> >>>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
> >>> diff --git a/drivers/net/virtio/meson.build
> b/drivers/net/virtio/meson.build
> >>> index 15150eea1..ce3525ef5 100644
> >>> --- a/drivers/net/virtio/meson.build
> >>> +++ b/drivers/net/virtio/meson.build
> >>> @@ -8,6 +8,7 @@ sources += files('virtio_ethdev.c',
> >>>   'virtqueue.c')
> >>>  deps += ['kvargs', 'bus_pci']
> >>>
> >>> +dpdk_conf.set('RTE_LIBRTE_VIRTIO_INC_VECTOR', 1)
> >>>  if arch_subdir == 'x86'
> >>>   sources += files('virtio_rxtx_simple_sse.c')
> >>>  elif arch_subdir == 'ppc'
> >>>
> >



Re: [dpdk-dev] [PATCH v8 2/9] net/virtio: enable vectorized path

2020-04-23 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Thursday, April 23, 2020 4:34 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: Van Haaren, Harry ; dev@dpdk.org
> Subject: Re: [PATCH v8 2/9] net/virtio: enable vectorized path
> 
> 
> 
> On 4/23/20 2:30 PM, Marvin Liu wrote:
> > Previously, virtio split ring vectorized path is enabled as default.
> 
> s/is/was/
> s/as/by/
> 
> > This is not suitable for everyone because of that path not follow virtio
> 
> s/because of that path not follow/because that path does not follow the/
> 
> > spec. Add new config for virtio vectorized path selection. By default
> > vectorized path is disabled.
> 
> I think we can keep it enabled by default for consistency between make &
> meson, now that you are providing a devarg for it that is disabled by
> default.
> 
> Maybe we can just drop this config flag, what do you think?
> 

Maxime, 
The devarg only affects virtio-user path selection, while the DPDK
configuration affects both the virtio PMD and virtio-user.
It may be worth adding the new configuration option, as it allows the user to
choose whether the vectorized path is disabled in the virtio PMD.
IMHO, AVX512 instructions should be selectable per component.

Regards,
Marvin

> Thanks,
> Maxime
> 
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/config/common_base b/config/common_base
> > index 00d8d0792..334a26a17 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -456,6 +456,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
> >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
> >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
> >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
> > +CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR=n
> >
> >  #
> >  # Compile virtio device emulation inside virtio PMD driver
> > diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> > index c9edb84ee..4b69827ab 100644
> > --- a/drivers/net/virtio/Makefile
> > +++ b/drivers/net/virtio/Makefile
> > @@ -28,6 +28,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> virtio_rxtx.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
> >
> > +ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR),y)
> >  ifeq ($(CONFIG_RTE_ARCH_X86),y)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_sse.c
> >  else ifeq ($(CONFIG_RTE_ARCH_PPC_64),y)
> > @@ -35,6 +36,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> virtio_rxtx_simple_altivec.c
> >  else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM)
> $(CONFIG_RTE_ARCH_ARM64)),)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c
> >  endif
> > +endif
> >
> >  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
> > diff --git a/drivers/net/virtio/meson.build b/drivers/net/virtio/meson.build
> > index 15150eea1..ce3525ef5 100644
> > --- a/drivers/net/virtio/meson.build
> > +++ b/drivers/net/virtio/meson.build
> > @@ -8,6 +8,7 @@ sources += files('virtio_ethdev.c',
> > 'virtqueue.c')
> >  deps += ['kvargs', 'bus_pci']
> >
> > +dpdk_conf.set('RTE_LIBRTE_VIRTIO_INC_VECTOR', 1)
> >  if arch_subdir == 'x86'
> > sources += files('virtio_rxtx_simple_sse.c')
> >  elif arch_subdir == 'ppc'
> >



Re: [dpdk-dev] [PATCH v6 2/9] net/virtio: enable vectorized path

2020-04-22 Thread Liu, Yong



> -Original Message-
> From: Liu, Yong
> Sent: Tuesday, April 21, 2020 2:43 PM
> To: 'Maxime Coquelin' ; Ye, Xiaolong
> ; Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v6 2/9] net/virtio: enable vectorized path
> 
> 
> 
> > -Original Message-
> > From: Maxime Coquelin 
> > Sent: Monday, April 20, 2020 10:08 PM
> > To: Liu, Yong ; Ye, Xiaolong ;
> > Wang, Zhihong 
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH v6 2/9] net/virtio: enable vectorized path
> >
> > Hi Marvin,
> >
> > On 4/17/20 12:24 AM, Marvin Liu wrote:
> > > Previously, virtio split ring vectorized path is enabled as default.
> > > This is not suitable for everyone because of that path not follow virtio
> > > spec. Add new config for virtio vectorized path selection. By default
> > > vectorized path is enabled.
> >
> > It should be disabled by default if not following spec. Also, it means
> > it will always be enabled with Meson, which is not acceptable.
> >
> > I think we should have a devarg, so that it is built by default but
> > disabled. User would specify explicitly he wants to enable vector
> > support when probing the device.
> >
> 

Hi Maxime,
There's a new devarg parameter "vectorized" which allows the user to specify
whether the vectorized path is enabled or disabled.
For now this parameter depends on RTE_LIBRTE_VIRTIO_INC_VECTOR; the parameter
won't be used if the INC_VECTOR option is disabled.
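
As an illustration, a user would then opt in per device through the devarg,
e.g. (hypothetical command line; the socket path and core list are examples):

testpmd -l 0-1 --no-pci \
	--vdev 'net_virtio_user0,path=/tmp/vhost-user.sock,queues=1,vectorized=1' \
	-- -i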

Regards,
Marvin

> Thanks, Maxime. Will change it to disabled by default in the next version.
> 
> > Thanks,
> > Maxime
> >
> > > Signed-off-by: Marvin Liu 
> > >
> > > diff --git a/config/common_base b/config/common_base
> > > index c31175f9d..5901a94f7 100644
> > > --- a/config/common_base
> > > +++ b/config/common_base
> > > @@ -449,6 +449,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
> > >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
> > >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
> > >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
> > > +CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR=y
> > >
> > >  #
> > >  # Compile virtio device emulation inside virtio PMD driver
> > > diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> > > index efdcb0d93..9ef445bc9 100644
> > > --- a/drivers/net/virtio/Makefile
> > > +++ b/drivers/net/virtio/Makefile
> > > @@ -29,6 +29,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> > virtio_rxtx.c
> > >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
> > >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
> > >
> > > +ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR),y)
> > >  ifeq ($(CONFIG_RTE_ARCH_X86),y)
> > >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_sse.c
> > >  else ifeq ($(CONFIG_RTE_ARCH_PPC_64),y)
> > > @@ -36,6 +37,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> > virtio_rxtx_simple_altivec.c
> > >  else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM)
> > $(CONFIG_RTE_ARCH_ARM64)),)
> > >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c
> > >  endif
> > > +endif
> > >
> > >  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
> > >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
> > > diff --git a/drivers/net/virtio/meson.build
> > b/drivers/net/virtio/meson.build
> > > index 5e7ca855c..f9619a108 100644
> > > --- a/drivers/net/virtio/meson.build
> > > +++ b/drivers/net/virtio/meson.build
> > > @@ -9,12 +9,14 @@ sources += files('virtio_ethdev.c',
> > >   'virtqueue.c')
> > >  deps += ['kvargs', 'bus_pci']
> > >
> > > -if arch_subdir == 'x86'
> > > - sources += files('virtio_rxtx_simple_sse.c')
> > > -elif arch_subdir == 'ppc'
> > > - sources += files('virtio_rxtx_simple_altivec.c')
> > > -elif arch_subdir == 'arm' and
> > host_machine.cpu_family().startswith('aarch64')
> > > - sources += files('virtio_rxtx_simple_neon.c')
> > > +if dpdk_conf.has('RTE_LIBRTE_VIRTIO_INC_VECTOR')
> > > + if arch_subdir == 'x86'
> > > + sources += files('virtio_rxtx_simple_sse.c')
> > > + elif arch_subdir == 'ppc'
> > > + sources += files('virtio_rxtx_simple_altivec.c')
> > > + elif arch_subdir == 'arm' and
> > host_machine.cpu_family().startswith('aarch64')
> > > + sources += files('virtio_rxtx_simple_neon.c')
> > > + endif
> > >  endif
> > >
> > >  if is_linux
> > >



Re: [dpdk-dev] [PATCH v6 2/9] net/virtio: enable vectorized path

2020-04-20 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Monday, April 20, 2020 10:08 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v6 2/9] net/virtio: enable vectorized path
> 
> Hi Marvin,
> 
> On 4/17/20 12:24 AM, Marvin Liu wrote:
> > Previously, virtio split ring vectorized path is enabled as default.
> > This is not suitable for everyone because of that path not follow virtio
> > spec. Add new config for virtio vectorized path selection. By default
> > vectorized path is enabled.
> 
> It should be disabled by default if not following spec. Also, it means
> it will always be enabled with Meson, which is not acceptable.
> 
> I think we should have a devarg, so that it is built by default but
> disabled. User would specify explicitly he wants to enable vector
> support when probing the device.
> 

Thanks, Maxime. Will change it to disabled by default in the next version.

> Thanks,
> Maxime
> 
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/config/common_base b/config/common_base
> > index c31175f9d..5901a94f7 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -449,6 +449,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
> >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
> >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
> >  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
> > +CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR=y
> >
> >  #
> >  # Compile virtio device emulation inside virtio PMD driver
> > diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> > index efdcb0d93..9ef445bc9 100644
> > --- a/drivers/net/virtio/Makefile
> > +++ b/drivers/net/virtio/Makefile
> > @@ -29,6 +29,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> virtio_rxtx.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
> >
> > +ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_INC_VECTOR),y)
> >  ifeq ($(CONFIG_RTE_ARCH_X86),y)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_sse.c
> >  else ifeq ($(CONFIG_RTE_ARCH_PPC_64),y)
> > @@ -36,6 +37,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) +=
> virtio_rxtx_simple_altivec.c
> >  else ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM)
> $(CONFIG_RTE_ARCH_ARM64)),)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple_neon.c
> >  endif
> > +endif
> >
> >  ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
> >  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
> > diff --git a/drivers/net/virtio/meson.build
> b/drivers/net/virtio/meson.build
> > index 5e7ca855c..f9619a108 100644
> > --- a/drivers/net/virtio/meson.build
> > +++ b/drivers/net/virtio/meson.build
> > @@ -9,12 +9,14 @@ sources += files('virtio_ethdev.c',
> > 'virtqueue.c')
> >  deps += ['kvargs', 'bus_pci']
> >
> > -if arch_subdir == 'x86'
> > -   sources += files('virtio_rxtx_simple_sse.c')
> > -elif arch_subdir == 'ppc'
> > -   sources += files('virtio_rxtx_simple_altivec.c')
> > -elif arch_subdir == 'arm' and
> host_machine.cpu_family().startswith('aarch64')
> > -   sources += files('virtio_rxtx_simple_neon.c')
> > +if dpdk_conf.has('RTE_LIBRTE_VIRTIO_INC_VECTOR')
> > +   if arch_subdir == 'x86'
> > +   sources += files('virtio_rxtx_simple_sse.c')
> > +   elif arch_subdir == 'ppc'
> > +   sources += files('virtio_rxtx_simple_altivec.c')
> > +   elif arch_subdir == 'arm' and
> host_machine.cpu_family().startswith('aarch64')
> > +   sources += files('virtio_rxtx_simple_neon.c')
> > +   endif
> >  endif
> >
> >  if is_linux
> >



Re: [dpdk-dev] [PATCH] net/virtio: fix crash when device reconnecting

2020-04-18 Thread Liu, Yong
Sorry for missing this question. The purpose of changing the function is to skip
device initialization, which is not needed in the configuration stage.
When the features don't match, we can just do feature negotiation in the
configuration stage and perform the related actions when the virtio device starts.

Regards,
Marvin

> -Original Message-
> From: Maxime Coquelin 
> Sent: Friday, April 17, 2020 11:18 PM
> To: Liu, Yong ; Ye, Xiaolong 
> Cc: Wang, Zhihong ; dev@dpdk.org; Ding, Xuan
> 
> Subject: Re: [PATCH] net/virtio: fix crash when device reconnecting
> 
> Hi Marvin,
> 
> On 4/15/20 9:30 AM, Liu, Yong wrote:
> >> @@ -2120,7 +2119,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
> >>
> >>/* if request features changed, reinit the device */
> >>if (req_features != hw->req_guest_features) {
> >> -  ret = virtio_init_device(dev, req_features);
> >> +  ret = virtio_negotiate_features(hw, req_features);
> > Why do we need to change virtio_init_device to virtio_negotiate_features
> > here?
> 
> 
> You missed to reply to that question from Xiaolong.
> 
> Regards,
> Maxime



Re: [dpdk-dev] [PATCH] vhost: remove deferred shadow update

2020-04-15 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Wednesday, April 15, 2020 11:04 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong ; epere...@redhat.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] vhost: remove deferred shadow update
> 
> Hi Marvin,
> 
> On 4/15/20 4:55 PM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin 
> >> Sent: Wednesday, April 15, 2020 10:16 PM
> >> To: Liu, Yong ; Ye, Xiaolong
> ;
> >> Wang, Zhihong ; epere...@redhat.com
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH] vhost: remove deferred shadow update
> >>
> >>
> >>
> >> On 4/1/20 11:29 PM, Marvin Liu wrote:
> >>> Defer shadow ring update will help overall throughput when frontend
> >>> much slower than backend. But that is not all the cases we faced now.
> >>> In case like ovs-dpdk + dpdk virtio user, frontend will much faster
> >>> than backend. Frontend may not be able to collect available descs
> when
> >>> shadow update is deferred. Thus will harm RFC2544 performance.
> >>
> >> I don't understand this comment. What is the difference in term of
> >> performance between Qemu + Virtio PMD and Virtio-User PMD, as the
> >> datapath is the same?
> >>
> >
> > Hi Maxime,
> > The statement is about the different situations between virtio-net + vhost
> > PMD and virtio-user + vhost PMD in OVS.
> > When the combination is virtio-user + vhost PMD in OVS, the frontend will be
> > much faster than the backend. Deferring the used ring update won't give any
> > benefit when zero packet loss is required.
> 
> Ok, so you mean Virtio PMD vs. Virtio-net kernel driver.
> 
> Regarding who is faster between Virtio PMD and Vhost PMD, it actually
> depends on what the applications using them are doing.
> 
> If you have OVS on host + testpmd on guest doing IO fowarding, then of
> course the frontent is much faster.
> 
> But if you have testpmd IO forward on host + tespmd MACSWAP forward in
> guest, then the frontend could be slower.
> 
> That looks like a benchmark optimization only.
> 

Maxime,
IMHO, it is more like a performance bug fix. The deferred shadow ring update
method causes a performance issue in certain cases.

Thanks,
Marvin

> > Regards,
> > Marvin
> >
> >>> Solution is just remove deferred shadow update, which will help
> RFC2544
> >>> and fix potential issue with virtio net driver.
> >>
> >> What is the potential issue?
> >>
> >> Maxime
> >
> > It is the NAPI-stop issue which has been fixed by Eugenio.
> 
> OK, then I would suggest to change the patch title to:
> "vhost: fix shadow update"
> 
> Then explicit the commit message to point to Eugenio's bug, and tag it
> with the proper Fixes tag, so that the patch gets backported to 19.11
> LTS.
> 

Thanks, will do it in the next version.

> Thanks,
> Maxime



Re: [dpdk-dev] [PATCH v4 6/8] eal/x86: identify AVX512 extensions flag

2020-04-15 Thread Liu, Yong
Thanks for the note, David. Kevin's patch fully covers this one.
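
For illustration, these flags let a driver gate the AVX512 path at runtime via
the existing EAL API, roughly like this (a sketch, not part of either patch):

if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
    rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) &&
    rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL))
	use_vectorized_path = 1;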

> -Original Message-
> From: David Marchand 
> Sent: Wednesday, April 15, 2020 9:32 PM
> To: Liu, Yong 
> Cc: Maxime Coquelin ; Ye, Xiaolong
> ; Wang, Zhihong ; Van
> Haaren, Harry ; dev ; Laatz,
> Kevin ; Kinsella, Ray 
> Subject: Re: [dpdk-dev] [PATCH v4 6/8] eal/x86: identify AVX512 extensions
> flag
> 
> On Wed, Apr 15, 2020 at 11:14 AM Marvin Liu  wrote:
> >
> > Read CPUID to check if AVX512 extensions are supported.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_eal/common/arch/x86/rte_cpuflags.c
> b/lib/librte_eal/common/arch/x86/rte_cpuflags.c
> > index 6492df556..54e9f6185 100644
> > --- a/lib/librte_eal/common/arch/x86/rte_cpuflags.c
> > +++ b/lib/librte_eal/common/arch/x86/rte_cpuflags.c
> > @@ -109,6 +109,9 @@ const struct feature_entry rte_cpu_feature_table[]
> = {
> > FEAT_DEF(RTM, 0x0007, 0, RTE_REG_EBX, 11)
> > FEAT_DEF(AVX512F, 0x0007, 0, RTE_REG_EBX, 16)
> > FEAT_DEF(RDSEED, 0x0007, 0, RTE_REG_EBX, 18)
> > +   FEAT_DEF(AVX512CD, 0x0007, 0, RTE_REG_EBX, 28)
> > +   FEAT_DEF(AVX512BW, 0x0007, 0, RTE_REG_EBX, 30)
> > +   FEAT_DEF(AVX512VL, 0x0007, 0, RTE_REG_EBX, 31)
> >
> > FEAT_DEF(LAHF_SAHF, 0x8001, 0, RTE_REG_ECX,  0)
> > FEAT_DEF(LZCNT, 0x8001, 0, RTE_REG_ECX,  4)
> > diff --git a/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h
> b/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h
> > index 25ba47b96..5bf99e05f 100644
> > --- a/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h
> > +++ b/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h
> > @@ -98,6 +98,9 @@ enum rte_cpu_flag_t {
> > RTE_CPUFLAG_RTM,/**< Transactional memory */
> > RTE_CPUFLAG_AVX512F,/**< AVX512F */
> > RTE_CPUFLAG_RDSEED, /**< RDSEED instruction */
> > +   RTE_CPUFLAG_AVX512CD,   /**< AVX512CD */
> > +   RTE_CPUFLAG_AVX512BW,   /**< AVX512BW */
> > +   RTE_CPUFLAG_AVX512VL,   /**< AVX512VL */
> >
> > /* (EAX 8001h) ECX features */
> > RTE_CPUFLAG_LAHF_SAHF,  /**< LAHF_SAHF */
> 
> This patch most likely breaks the ABI (renumbering flags after
> RTE_CPUFLAG_LAHF_SAHF).
> This change should not go through the virtio tree and is not rebased on
> master.
> A similar patch had been proposed by Kevin:
> http://patchwork.dpdk.org/patch/67438/
> 
> 
> --
> David Marchand



Re: [dpdk-dev] [PATCH] vhost: remove deferred shadow update

2020-04-15 Thread Liu, Yong



> -Original Message-
> From: Maxime Coquelin 
> Sent: Wednesday, April 15, 2020 10:16 PM
> To: Liu, Yong ; Ye, Xiaolong ;
> Wang, Zhihong ; epere...@redhat.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] vhost: remove deferred shadow update
> 
> 
> 
> On 4/1/20 11:29 PM, Marvin Liu wrote:
> > Defer shadow ring update will help overall throughput when frontend
> > much slower than backend. But that is not all the cases we faced now.
> > In case like ovs-dpdk + dpdk virtio user, frontend will much faster
> > than backend. Frontend may not be able to collect available descs when
> > shadow update is deferred. Thus will harm RFC2544 performance.
> 
> I don't understand this comment. What is the difference in term of
> performance between Qemu + Virtio PMD and Virtio-User PMD, as the
> datapath is the same?
> 

Hi Maxime,
The statement is about the different situations between virtio-net + vhost PMD
and virtio-user + vhost PMD in OVS.
When the combination is virtio-user + vhost PMD in OVS, the frontend will be
much faster than the backend. Deferring the used ring update won't give any
benefit when zero packet loss is required.
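
To make the mechanism concrete, a simplified sketch of the shadow used-ring
idea (stand-in types, not the vhost library's structures): completed entries
are staged privately and written back in one burst; deferred mode postpones
this flush, which starves a fast frontend polling for used entries.

#include <stdint.h>

struct used_elem { uint32_t id; uint32_t len; };

struct vq_stub {
	struct used_elem *used_ring;  /* shared with the frontend */
	struct used_elem shadow[64];  /* backend-private staging area */
	uint16_t shadow_count;
	uint16_t used_idx;
	uint16_t size_mask;           /* ring size minus one */
};

static void
flush_shadow(struct vq_stub *vq)
{
	uint16_t i;

	for (i = 0; i < vq->shadow_count; i++)
		vq->used_ring[(vq->used_idx + i) & vq->size_mask] =
			vq->shadow[i];
	/* a write barrier plus the used-index update would follow here */
	vq->used_idx += vq->shadow_count;
	vq->shadow_count = 0;
}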

Regards,
Marvin

> > Solution is just remove deferred shadow update, which will help RFC2544
> > and fix potential issue with virtio net driver.
> 
> What is the potential issue?
> 
> Maxime

It is the NAPI-stop issue which has been fixed by Eugenio.


Re: [dpdk-dev] [PATCH] net/virtio: fix crash when device reconnecting

2020-04-15 Thread Liu, Yong



> -Original Message-
> From: Ye, Xiaolong 
> Sent: Wednesday, April 15, 2020 3:24 PM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; Wang, Zhihong
> ; dev@dpdk.org; Ding, Xuan
> 
> Subject: Re: [PATCH] net/virtio: fix crash when device reconnecting
> 
> Hi, Marvin
> 
> On 04/14, Marvin Liu wrote:
> >When doing virtio device initialization, virtqueues will be reset in
> >server mode if ring type is packed. This will cause issue because queues
> >have been freed in the beginning of device initialization.
> >
> >Fix this issue by splitting device initial process and device reinit
> >process. Virt queues won't be freed or realloc in reinit process. Also
> >moved virtio device initialization from configuration to start stage,
> >which can reduce number of reinitialization times.
> >
> >Fixes: 6ebbf4109f35 ("net/virtio-user: fix packed ring server mode")
> 
> I think it also needs to cc stable.
> 

Hi Xiaolong,
The packed ring server mode fix was merged in the 20.02 release, so there's no
stable branch for it.

Thanks,
Marvin

> >
> >Signed-off-by: Marvin Liu 
> >
> >diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> >index 21570e5cf..8c84bfe91 100644
> >--- a/drivers/net/virtio/virtio_ethdev.c
> >+++ b/drivers/net/virtio/virtio_ethdev.c
> >@@ -1670,7 +1670,9 @@ virtio_configure_intr(struct rte_eth_dev *dev)
> >
> > /* reset device and renegotiate features if needed */
> > static int
> >-virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
> >+virtio_init_device(struct rte_eth_dev *eth_dev,
> >+   uint64_t req_features,
> >+   bool reinit)
> > {
> > struct virtio_hw *hw = eth_dev->data->dev_private;
> > struct virtio_net_config *config;
> >@@ -1681,7 +1683,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev,
> uint64_t req_features)
> > /* Reset the device although not necessary at startup */
> > vtpci_reset(hw);
> >
> >-if (hw->vqs) {
> >+if (hw->vqs && !reinit) {
> > virtio_dev_free_mbufs(eth_dev);
> > virtio_free_queues(hw);
> > }
> >@@ -1794,9 +1796,11 @@ virtio_init_device(struct rte_eth_dev *eth_dev,
> uint64_t req_features)
> > VLAN_TAG_LEN - hw->vtnet_hdr_size;
> > }
> >
> >-ret = virtio_alloc_queues(eth_dev);
> >-if (ret < 0)
> >-return ret;
> >+if (!reinit) {
> >+ret = virtio_alloc_queues(eth_dev);
> >+if (ret < 0)
> >+return ret;
> >+}
> >
> > if (eth_dev->data->dev_conf.intr_conf.rxq) {
> > if (virtio_configure_intr(eth_dev) < 0) {
> >@@ -1925,7 +1929,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
> > rte_spinlock_init(&hw->state_lock);
> >
> > /* reset device and negotiate default features */
> >-ret = virtio_init_device(eth_dev,
> VIRTIO_PMD_DEFAULT_GUEST_FEATURES);
> >+ret = virtio_init_device(eth_dev,
> VIRTIO_PMD_DEFAULT_GUEST_FEATURES,
> >+false);
> > if (ret < 0)
> > goto err_virtio_init;
> >
> >@@ -2091,12 +2096,6 @@ virtio_dev_configure(struct rte_eth_dev *dev)
> > return -EINVAL;
> > }
> >
> >-if (dev->data->dev_conf.intr_conf.rxq) {
> >-ret = virtio_init_device(dev, hw->req_guest_features);
> >-if (ret < 0)
> >-return ret;
> >-}
> >-
> > if (rxmode->max_rx_pkt_len > hw->max_mtu + ether_hdr_len)
> > req_features &= ~(1ULL << VIRTIO_NET_F_MTU);
> >
> >@@ -2120,7 +2119,7 @@ virtio_dev_configure(struct rte_eth_dev *dev)
> >
> > /* if request features changed, reinit the device */
> > if (req_features != hw->req_guest_features) {
> >-ret = virtio_init_device(dev, req_features);
> >+ret = virtio_negotiate_features(hw, req_features);
> 
> Why do we need to change virtio_init_device to virtio_negotiate_features
> here?
> 
> Thanks,
> Xiaolong
> 
> > if (ret < 0)
> > return ret;
> > }
> >@@ -2235,6 +2234,11 @@ virtio_dev_start(struct rte_eth_dev *dev)
> > struct virtio_hw *hw = dev->data->dev_private;
> > int ret;
> >
> >+/* reinit the device */
> >+ret = virtio_init_device(dev, hw->req_guest_features, true);
> >+if (ret < 0)
> >+return ret;
> >+
> > /* Finish the initialization of the queues */
> > for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > ret = virtio_dev_rx_queue_setup_finish(dev, i);
> >--
> >2.17.1
> >


Re: [dpdk-dev] [PATCH v3 2/7] net/virtio-user: add vectorized packed ring parameter

2020-04-08 Thread Liu, Yong



> -Original Message-
> From: Ye, Xiaolong 
> Sent: Wednesday, April 8, 2020 2:23 PM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; Wang, Zhihong
> ; Van Haaren, Harry
> ; dev@dpdk.org
> Subject: Re: [PATCH v3 2/7] net/virtio-user: add vectorized packed ring
> parameter
> 
> On 04/08, Marvin Liu wrote:
> >Add a new parameter "packed_vec" which can disable the vectorized packed
> >ring datapath explicitly. When the "packed_vec" option is on, the driver
> >will check the packed ring vectorized datapath prerequisites. If any one
> >of them is not matched, the vectorized datapath won't be selected.
> >
> >Signed-off-by: Marvin Liu 
> >
> >diff --git a/drivers/net/virtio/virtio_pci.h 
> >b/drivers/net/virtio/virtio_pci.h
> >index 7433d2f08..8103b7a18 100644
> >--- a/drivers/net/virtio/virtio_pci.h
> >+++ b/drivers/net/virtio/virtio_pci.h
> >@@ -251,6 +251,8 @@ struct virtio_hw {
> > uint8_t use_msix;
> > uint8_t modern;
> > uint8_t use_simple_rx;
> >+uint8_t packed_vec_rx;
> >+uint8_t packed_vec_tx;
> > uint8_t use_inorder_rx;
> > uint8_t use_inorder_tx;
> > uint8_t weak_barriers;
> >diff --git a/drivers/net/virtio/virtio_user_ethdev.c
> b/drivers/net/virtio/virtio_user_ethdev.c
> >index e61af4068..399ac5511 100644
> >--- a/drivers/net/virtio/virtio_user_ethdev.c
> >+++ b/drivers/net/virtio/virtio_user_ethdev.c
> >@@ -450,6 +450,8 @@ static const char *valid_args[] = {
> > VIRTIO_USER_ARG_IN_ORDER,
> > #define VIRTIO_USER_ARG_PACKED_VQ  "packed_vq"
> > VIRTIO_USER_ARG_PACKED_VQ,
> >+#define VIRTIO_USER_ARG_PACKED_VEC "packed_vec"
> >+VIRTIO_USER_ARG_PACKED_VEC,
> > NULL
> > };
> >
> >@@ -552,6 +554,8 @@ virtio_user_pmd_probe(struct rte_vdev_device
> *dev)
> > uint64_t mrg_rxbuf = 1;
> > uint64_t in_order = 1;
> > uint64_t packed_vq = 0;
> >+uint64_t packed_vec = 0;
> >+
> > char *path = NULL;
> > char *ifname = NULL;
> > char *mac_addr = NULL;
> >@@ -668,6 +672,15 @@ virtio_user_pmd_probe(struct rte_vdev_device
> *dev)
> > }
> > }
> >
> >+if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_PACKED_VEC) == 1) {
> >+if (rte_kvargs_process(kvlist,
> VIRTIO_USER_ARG_PACKED_VEC,
> >+   &get_integer_arg, &packed_vec) < 0) {
> >+PMD_INIT_LOG(ERR, "error to parse %s",
> >+ VIRTIO_USER_ARG_PACKED_VQ);
> >+goto end;
> >+}
> >+}
> >+
> > if (queues > 1 && cq == 0) {
> > PMD_INIT_LOG(ERR, "multi-q requires ctrl-q");
> > goto end;
> >@@ -705,6 +718,17 @@ virtio_user_pmd_probe(struct rte_vdev_device
> *dev)
> > }
> >
> > hw = eth_dev->data->dev_private;
> >+#if defined(RTE_ARCH_X86) && defined(CC_AVX512_SUPPORT)
> >+if (packed_vec) {
> >+hw->packed_vec_rx = 1;
> >+hw->packed_vec_tx = 1;
> >+}
> >+#else
> >+if (packed_vec)
> >+PMD_INIT_LOG(ERR, "building environment not match
> vectorized "
> >+  "packed ring datapath requirement");
> 
> Minor nit:
> 
> s/not match/doesn't match/
> 
> And better to avoid breaking error message strings across multiple source
> lines.
> It makes it harder to use tools like grep to find errors in source.
> E.g. user uses "vectorized packed ring datapath" to grep the code.
> 
> Thanks,
> Xiaolong
> 

Thanks for the reminder. Will change it in the next release.

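A sketch of the follow-up being promised: the wording fixed and the message
kept on one line so it can be grepped (illustrative only, not the merged code):

    if (packed_vec)
            PMD_INIT_LOG(ERR,
                    "building environment doesn't match vectorized packed ring datapath requirement");
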
> >+#endif
> >+
> > if (virtio_user_dev_init(hw->virtio_user_dev, path, queues, cq,
> >  queue_size, mac_addr, &ifname, server_mode,
> >  mrg_rxbuf, in_order, packed_vq) < 0) {
> >@@ -777,4 +801,5 @@
> RTE_PMD_REGISTER_PARAM_STRING(net_virtio_user,
> > "server=<0|1> "
> > "mrg_rxbuf=<0|1> "
> > "in_order=<0|1> "
> >-"packed_vq=<0|1>");
> >+"packed_vq=<0|1>"
> >+"packed_vec=<0|1>");
> >--
> >2.17.1
> >


Re: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa translation

2020-04-01 Thread Liu, Yong



> -Original Message-
> From: Gavin Hu 
> Sent: Thursday, April 2, 2020 11:05 AM
> To: Liu, Yong ; maxime.coque...@redhat.com; Ye,
> Xiaolong ; Wang, Zhihong
> 
> Cc: dev@dpdk.org; nd ; nd 
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa translation
> 
> Hi Marvin,
> 
> > -----Original Message-
> > From: Liu, Yong 
> > Sent: Wednesday, April 1, 2020 9:01 PM
> > To: Gavin Hu ; maxime.coque...@redhat.com; Ye,
> > Xiaolong ; Wang, Zhihong
> > 
> > Cc: dev@dpdk.org; nd 
> > Subject: RE: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa translation
> >
> >
> >
> > > -Original Message-
> > > From: Gavin Hu 
> > > Sent: Wednesday, April 1, 2020 6:07 PM
> > > To: Liu, Yong ; maxime.coque...@redhat.com; Ye,
> > > Xiaolong ; Wang, Zhihong
> > > 
> > > Cc: dev@dpdk.org; nd 
> > > Subject: RE: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa
> translation
> > >
> > > Hi Marvin,
> > >
> > > > -Original Message-
> > > > From: dev  On Behalf Of Marvin Liu
> > > > Sent: Wednesday, April 1, 2020 10:50 PM
> > > > To: maxime.coque...@redhat.com; xiaolong...@intel.com;
> > > > zhihong.w...@intel.com
> > > > Cc: dev@dpdk.org; Marvin Liu 
> > > > Subject: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa translation
> > > >
> > > > If Tx zero copy is enabled, the gpa to hpa mapping table is updated one
> > > > by one. This will harm performance when the guest memory backend uses
> > > > 2M hugepages. Now add a cached mapping table which is sorted by usage
> > > > sequence. Address translation will first check the cached mapping
> > > > table, then check the unsorted mapping table if no match is found.
> > > >
> > > > Signed-off-by: Marvin Liu 
> > > >
> > > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > > > index 2087d1400..5cb0e83dd 100644
> > > > --- a/lib/librte_vhost/vhost.h
> > > > +++ b/lib/librte_vhost/vhost.h
> > > > @@ -368,7 +368,9 @@ struct virtio_net {
> > > > struct vhost_device_ops const *notify_ops;
> > > >
> > > > uint32_tnr_guest_pages;
> > > > +   uint32_tnr_cached_guest_pages;
> > > > uint32_tmax_guest_pages;
> > > > +   struct guest_page   *cached_guest_pages;
> > > > struct guest_page   *guest_pages;
> > > >
> > > > int slave_req_fd;
> > > > @@ -553,12 +555,25 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t
> > gpa,
> > > > uint64_t size)
> > > >  {
> > > > uint32_t i;
> > > > struct guest_page *page;
> > > > +   uint32_t cached_pages = dev->nr_cached_guest_pages;
> > > > +
> 
> Add a comment here, something like "Firstly look up in the cached pages"?
> 
> > > > +   for (i = 0; i < cached_pages; i++) {
> 
> Should the searching order be reversed here to search the most recent entries?
> 
> > > > +   page = &dev->cached_guest_pages[i];
> > > > +   if (gpa >= page->guest_phys_addr &&
> > > > +   gpa + size < page->guest_phys_addr + 
> > > > page->size) {
> > > > +   return gpa - page->guest_phys_addr +
> > > > +   page->host_phys_addr;
> > > > +   }
> > > > +   }
> > > Sorry, I did not see any speedup with cached guest pages in comparison
> to
> > > the old code below.
> > > Is it not a simple copy?
> > > Is it a better idea to use hash instead to speed up the translation?
> > > /Gavin
> >
> > Hi Gavin,
> > This just re-sorts the overall mapping table according to usage sequence.
> > Most likely the virtio driver will reuse recently recycled buffers, so the
> > search will find a match near the beginning.
> > That is a simple fix for performance enhancement. Using a hash for the
> > index would cost much more in the normal case.
> >
> > Regards,
> > Marvin
> 
> There are issues here: the cached table grows over time. Will it become
> less efficient when it grows too big, even bigger than the original table,
> and an overflow happens?
> Is it a good idea to limit 

Re: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa translation

2020-04-01 Thread Liu, Yong



> -Original Message-
> From: Gavin Hu 
> Sent: Wednesday, April 1, 2020 6:07 PM
> To: Liu, Yong ; maxime.coque...@redhat.com; Ye,
> Xiaolong ; Wang, Zhihong
> 
> Cc: dev@dpdk.org; nd 
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa translation
> 
> Hi Marvin,
> 
> > -Original Message-
> > From: dev  On Behalf Of Marvin Liu
> > Sent: Wednesday, April 1, 2020 10:50 PM
> > To: maxime.coque...@redhat.com; xiaolong...@intel.com;
> > zhihong.w...@intel.com
> > Cc: dev@dpdk.org; Marvin Liu 
> > Subject: [dpdk-dev] [PATCH v2 2/2] vhost: cache gpa to hpa translation
> >
> > If Tx zero copy is enabled, the gpa to hpa mapping table is updated one by
> > one. This will harm performance when the guest memory backend uses 2M
> > hugepages. Now add a cached mapping table which is sorted by usage
> > sequence. Address translation will first check the cached mapping table,
> > then check the unsorted mapping table if no match is found.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 2087d1400..5cb0e83dd 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -368,7 +368,9 @@ struct virtio_net {
> > struct vhost_device_ops const *notify_ops;
> >
> > uint32_tnr_guest_pages;
> > +   uint32_tnr_cached_guest_pages;
> > uint32_tmax_guest_pages;
> > +   struct guest_page   *cached_guest_pages;
> > struct guest_page   *guest_pages;
> >
> > int slave_req_fd;
> > @@ -553,12 +555,25 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa,
> > uint64_t size)
> >  {
> > uint32_t i;
> > struct guest_page *page;
> > +   uint32_t cached_pages = dev->nr_cached_guest_pages;
> > +
> > +   for (i = 0; i < cached_pages; i++) {
> > +   page = &dev->cached_guest_pages[i];
> > +   if (gpa >= page->guest_phys_addr &&
> > +   gpa + size < page->guest_phys_addr + page->size) {
> > +   return gpa - page->guest_phys_addr +
> > +   page->host_phys_addr;
> > +   }
> > +   }
> Sorry, I did not see any speedup with cached guest pages in comparison to
> the old code below.
> Is it not a simple copy?
> Is it a better idea to use hash instead to speed up the translation?
> /Gavin

Hi Gavin,
This just re-sorts the overall mapping table according to usage sequence.
Most likely the virtio driver will reuse recently recycled buffers, so the
search will find a match near the beginning.
That is a simple fix for performance enhancement. Using a hash for the index
would cost much more in the normal case.

Regards,
Marvin 

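A self-contained sketch of the two-level lookup being discussed: search the
recently-used cache first, then fall back to the full table and promote the
hit into the cache. Names mirror the quoted patch, but this is illustrative
code, not the actual vhost implementation (it also omits the cache-capacity
handling Gavin asks about):

    #include <stdint.h>

    struct guest_page {
            uint64_t guest_phys_addr;
            uint64_t host_phys_addr;
            uint64_t size;
    };

    static uint64_t
    gpa_to_hpa_cached(struct guest_page *cached, uint32_t *nr_cached,
                      struct guest_page *pages, uint32_t nr_pages,
                      uint64_t gpa, uint64_t size)
    {
            uint32_t i;

            /* Hot path: recently recycled buffers hit near the front. */
            for (i = 0; i < *nr_cached; i++) {
                    struct guest_page *p = &cached[i];
                    if (gpa >= p->guest_phys_addr &&
                        gpa + size < p->guest_phys_addr + p->size)
                            return gpa - p->guest_phys_addr + p->host_phys_addr;
            }

            /* Slow path: scan the full table, then cache the hit. */
            for (i = 0; i < nr_pages; i++) {
                    struct guest_page *p = &pages[i];
                    if (gpa >= p->guest_phys_addr &&
                        gpa + size < p->guest_phys_addr + p->size) {
                            cached[(*nr_cached)++] = *p;
                            return gpa - p->guest_phys_addr + p->host_phys_addr;
                    }
            }
            return 0; /* no mapping found */
    }
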

> >
> > for (i = 0; i < dev->nr_guest_pages; i++) {
> > page = &dev->guest_pages[i];
> >
> > if (gpa >= page->guest_phys_addr &&
> > gpa + size < page->guest_phys_addr + page->size) {
> > +   rte_memcpy(&dev-
> > >cached_guest_pages[cached_pages],
> > +  page, sizeof(struct guest_page));
> > +   dev->nr_cached_guest_pages++;
> > return gpa - page->guest_phys_addr +
> >page->host_phys_addr;
> > }
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index 79fcb9d19..1bae1fddc 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -192,7 +192,9 @@ vhost_backend_cleanup(struct virtio_net *dev)
> > }
> >
> > rte_free(dev->guest_pages);
> > +   rte_free(dev->cached_guest_pages);
> > dev->guest_pages = NULL;
> > +   dev->cached_guest_pages = NULL;
> >
> > if (dev->log_addr) {
> > munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
> > @@ -898,7 +900,7 @@ add_one_guest_page(struct virtio_net *dev,
> > uint64_t guest_phys_addr,
> >uint64_t host_phys_addr, uint64_t size)
> >  {
> > struct guest_page *page, *last_page;
> > -   struct guest_page *old_pages;
> > +   struct guest_page *old_pages, *old_cached_pages;
> >
> > if (dev->nr_guest_pages == dev->max_guest_pages) {
> > dev->max_guest_pages *= 2;
> > @@ -906,9 +908,19 @@ add_one_guest_page(struct virtio_net *dev,
> > uint64_t guest_phys_addr,
> >

Re: [dpdk-dev] [PATCH 3/4] net/vhost: leverage DMA engines to accelerate Tx operations

2020-03-17 Thread Liu, Yong



> -Original Message-
> From: Hu, Jiayu 
> Sent: Tuesday, March 17, 2020 5:31 PM
> To: Liu, Yong ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Ye, Xiaolong ;
> Wang, Zhihong 
> Subject: RE: [dpdk-dev] [PATCH 3/4] net/vhost: leverage DMA engines to
> accelerate Tx operations
> 
> Hi Marvin,
> 
> Thanks for comments. Replies are inline.
> 
> > -Original Message-
> > From: Liu, Yong 
> > Sent: Tuesday, March 17, 2020 3:21 PM
> > To: Hu, Jiayu ; dev@dpdk.org
> > Cc: maxime.coque...@redhat.com; Ye, Xiaolong ;
> > Wang, Zhihong ; Hu, Jiayu
> 
> > Subject: RE: [dpdk-dev] [PATCH 3/4] net/vhost: leverage DMA engines to
> > accelerate Tx operations
> >
> > Hi Jiayu,
> > Some comments are inline.
> >
> > Thanks,
> > Marvin
> >
> > > -Original Message-
> > > From: dev  On Behalf Of Jiayu Hu
> > > Sent: Tuesday, March 17, 2020 5:21 PM
> > > To: dev@dpdk.org
> > > Cc: maxime.coque...@redhat.com; Ye, Xiaolong
> ;
> > > Wang, Zhihong ; Hu, Jiayu
> 
> > > Subject: [dpdk-dev] [PATCH 3/4] net/vhost: leverage DMA engines to
> > > accelerate Tx operations
> > >
> > >
> > >  int vhost_logtype;
> > > @@ -30,8 +34,12 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
> > >  #define ETH_VHOST_IOMMU_SUPPORT  "iommu-support"
> > >  #define ETH_VHOST_POSTCOPY_SUPPORT   "postcopy-support"
> > >  #define ETH_VHOST_VIRTIO_NET_F_HOST_TSO "tso"
> > > +#define ETH_VHOST_DMA_ARG"dmas"
> > >  #define VHOST_MAX_PKT_BURST 32
> > >
> > > +/* ring size of I/OAT */
> > > +#define IOAT_RING_SIZE 1024
> > > +
> >
> > Jiayu,
> > The configured I/OAT ring size is 1024 here, but I do not see an in_flight
> > or nr_batching size check in the enqueue function.
> > Is there any possibility that the IOAT ring gets exhausted?
> 
> We will wait for IOAT's copy completion when its ring is full.
> This is to guarantee that all enqueues to IOAT can succeed.
> 
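A minimal sketch of that wait-on-full policy, assuming the DPDK 20.02-era ioat
rawdev API (rte_ioat_enqueue_copy(), rte_ioat_do_copies(),
rte_ioat_completed_copies()); the exact signatures and the handle bookkeeping
are simplified here:

    #include <rte_ioat_rawdev.h>

    static int
    ioat_copy_blocking(int dev_id, phys_addr_t src, phys_addr_t dst,
                       unsigned int len)
    {
            uintptr_t src_hdls[8], dst_hdls[8];

            /* Ring full: drain completions until the enqueue succeeds,
             * so no submitted copy job is ever dropped. */
            while (rte_ioat_enqueue_copy(dev_id, src, dst, len, 0, 0, 0) == 0) {
                    if (rte_ioat_completed_copies(dev_id, 8, src_hdls,
                                                  dst_hdls) < 0)
                            return -1; /* DMA error */
            }
            rte_ioat_do_copies(dev_id); /* ring the doorbell */
            return 0;
    }
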
> > > +struct dma_info_input {
> > > + struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
> > > + uint16_t nr;
> > > +};
> > > +
> > > +static inline int
> > > +open_dma(const char *key __rte_unused, const char *value, void
> > > *extra_args)
> > > +{
> > > + struct dma_info_input *dma_info = extra_args;
> > > + char *input = strndup(value, strlen(value) + 1);
> > > + char *addrs = input;
> > > + char *ptrs[2];
> > > + char *start, *end, *substr;
> > > + int64_t qid, vring_id;
> > > + struct rte_ioat_rawdev_config config;
> > > + struct rte_rawdev_info info = { .dev_private = &config };
> > > + char name[32];
> > > + int dev_id;
> > > + int ret = 0;
> > > +
> > > + while (isblank(*addrs))
> > > + addrs++;
> > > + if (addrs == '\0') {
> > > + VHOST_LOG(ERR, "No input DMA addresses\n");
> > > + ret = -1;
> > > + goto out;
> > > + }
> > > +
> > > + /* process DMA devices within bracket. */
> > > + addrs++;
> > > + substr = strtok(addrs, ";]");
> > > + if (!substr) {
> > > + VHOST_LOG(ERR, "No input DMA addresse\n");
> > > + ret = -1;
> > > + goto out;
> > > + }
> > > +
> > > + do {
> > > + rte_strsplit(substr, strlen(substr), ptrs, 2, '@');
> > > +
> > Function rte_strsplit can fail. The return value needs to be checked.
> 
> Thanks. Will check it later.
> 
> >
> > > + start = strstr(ptrs[0], "txq");
> > > + if (start == NULL) {
> > > + VHOST_LOG(ERR, "Illegal queue\n");
> > > + ret = -1;
> > > + goto out;
> > > + }
> > > +
> > > + start += 3;
> >
> > It's better not to use a hardcoded value.
> >
> > > + qid = strtol(start, &end, 0);
> > > + if (end == start) {
> > > + VHOST_LOG(ERR, "No input queue ID\n");
> > > + ret = -1;
> > > + goto out;
> > > + }
> > > +
> > > + vring_id = qid * 2 + VIRTIO_RXQ;
> > > + if (rte_pci_addr_parse(ptrs[1],
> > > +&dma_info-

Re: [dpdk-dev] [PATCH 2/4] net/vhost: setup vrings for DMA-accelerated datapath

2020-03-17 Thread Liu, Yong



> -Original Message-
> From: Hu, Jiayu 
> Sent: Tuesday, March 17, 2020 5:36 PM
> To: Liu, Yong ; dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Ye, Xiaolong ;
> Wang, Zhihong 
> Subject: RE: [dpdk-dev] [PATCH 2/4] net/vhost: setup vrings for DMA-
> accelerated datapath
> 
> Hi Marvin,
> 
> > -----Original Message-
> > From: Liu, Yong 
> > Sent: Tuesday, March 17, 2020 2:30 PM
> > To: Hu, Jiayu ; dev@dpdk.org
> > Cc: maxime.coque...@redhat.com; Ye, Xiaolong ;
> > Wang, Zhihong ; Hu, Jiayu
> 
> > Subject: RE: [dpdk-dev] [PATCH 2/4] net/vhost: setup vrings for DMA-
> > accelerated datapath
> >
> >
> >
> > > +
> > > +struct guest_page {
> > > + uint64_t guest_phys_addr;
> > > + uint64_t host_phys_addr;
> > > + uint64_t size;
> > > +};
> > > +
> > > +struct dma_vring {
> > > + struct rte_vhost_vring  vr;
> > > +
> > > + uint16_t last_avail_idx;
> > > + uint16_t last_used_idx;
> > > +
> > > + /* the last used index that front end can consume */
> > > + uint16_t copy_done_used;
> > > +
> > > + uint16_t signalled_used;
> > > + bool signalled_used_valid;
> > > +
> > > + struct vring_used_elem *shadow_used_split;
> > > + uint16_t shadow_used_idx;
> > > +
> > > + struct batch_copy_elem  *batch_copy_elems;
> > > + uint16_t batch_copy_nb_elems;
> > > +
> > > + bool dma_enabled;
> > > + /**
> > > +  * DMA ID. Currently, we only support I/OAT,
> > > +  * so it's I/OAT rawdev ID.
> > > +  */
> > > + uint16_t dev_id;
> > > + /* DMA address */
> > > + struct rte_pci_addr dma_addr;
> > > + /**
> > > +  * the number of copy jobs that are submitted to the DMA
> > > +  * but may not be completed.
> > > +  */
> > > + uint64_t nr_inflight;
> > > + int nr_batching;
> >
> > Looks like nr_batching can't be a negative value; please change it to
> > uint16_t or uint32_t.
> 
> Thanks, will change it later.
> 
> > > diff --git a/drivers/net/vhost/virtio_net.h
> b/drivers/net/vhost/virtio_net.h
> > > new file mode 100644
> > > index 000..7f99f1d
> > > --- /dev/null
> > > +++ b/drivers/net/vhost/virtio_net.h
> > > @@ -0,0 +1,168 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2020 Intel Corporation
> > > + */
> > > +#ifndef _VIRTIO_NET_H_
> > > +#define _VIRTIO_NET_H_
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include "internal.h"
> > > +
> > > +static uint64_t
> > > +get_blk_size(int fd)
> > > +{
> > > + struct stat stat;
> > > + int ret;
> > > +
> > > + ret = fstat(fd, &stat);
> > > + return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize;
> > > +}
> > > +
> > > +static __rte_always_inline int
> > > +add_one_guest_page(struct pmd_internal *dev, uint64_t
> > guest_phys_addr,
> > > +uint64_t host_phys_addr, uint64_t size)
> >
> > Jiayu,
> > We have the same set of functions for GPA to HPA translation in the vhost
> > library. Can those functions be shared here?
> 
> Do you think it's necessary to provide a API for translating GPA to HPA?
> 

IMHO, these functions are a common requirement for accelerators. It is worth
thinking about.

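A declaration-level sketch of the kind of shared helper being suggested; the
name and signature below are hypothetical, not an existing vhost API:

    /* Hypothetical public wrapper around the library's internal gpa_to_hpa();
     * returns 0 if no contiguous host-physical mapping covers the range. */
    uint64_t rte_vhost_gpa_to_hpa(int vid, uint64_t gpa, uint64_t size);
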
> >
> > Thanks,
> > Marvin
> >


Re: [dpdk-dev] [PATCH 3/4] net/vhost: leverage DMA engines to accelerate Tx operations

2020-03-17 Thread Liu, Yong
Hi Jiayu,
Some comments are inline.

Thanks,
Marvin

> -Original Message-
> From: dev  On Behalf Of Jiayu Hu
> Sent: Tuesday, March 17, 2020 5:21 PM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Ye, Xiaolong ;
> Wang, Zhihong ; Hu, Jiayu 
> Subject: [dpdk-dev] [PATCH 3/4] net/vhost: leverage DMA engines to
> accelerate Tx operations
> 
> This patch accelerates large data movement in Tx operations via DMA
> engines, like I/OAT, the DMA engine in Intel's processors.
> 
> Large copies are offloaded from the CPU to the DMA engine in an
> asynchronous manner. The CPU just submits copy jobs to the DMA engine
> without waiting for DMA copy completion; there is no CPU intervention
> during DMA data transfer. By overlapping CPU computation and DMA copy,
> we can save precious CPU cycles and improve the overall throughput for
> vhost-user PMD based applications, like OVS. Due to startup overheads
> associated with DMA engines, small copies are performed by the CPU.
> 
> Note that vhost-user PMD can support various DMA engines, but it just
> supports I/OAT devices currently. In addition, I/OAT acceleration
> is only enabled for split rings.
> 
> DMA devices used by queues are assigned by users; for a queue without
> assigning a DMA device, the PMD will leverage librte_vhost to perform
> Tx operations. A queue can only be assigned one I/OAT device, and
> an I/OAT device can only be used by one queue.
> 
> We introduce a new vdev parameter to enable DMA acceleration for Tx
> operations of queues:
>  - dmas: This parameter is used to specify the assigned DMA device of
>a queue.
> Here is an example:
>  $ ./testpmd -c f -n 4 \
>--vdev 'net_vhost0,iface=/tmp/s0,queues=1,dmas=[txq0@00:04.0]'
> 
> Signed-off-by: Jiayu Hu 
> ---
>  drivers/net/vhost/Makefile|   2 +-
>  drivers/net/vhost/internal.h  |  19 +
>  drivers/net/vhost/meson.build |   2 +-
>  drivers/net/vhost/rte_eth_vhost.c | 252 -
>  drivers/net/vhost/virtio_net.c| 742
> ++
>  drivers/net/vhost/virtio_net.h| 120 ++
>  6 files changed, 1120 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
> index 19cae52..87dfb14 100644
> --- a/drivers/net/vhost/Makefile
> +++ b/drivers/net/vhost/Makefile
> @@ -11,7 +11,7 @@ LIB = librte_pmd_vhost.a
>  LDLIBS += -lpthread
>  LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool -lrte_ring
>  LDLIBS += -lrte_ethdev -lrte_net -lrte_kvargs -lrte_vhost
> -LDLIBS += -lrte_bus_vdev
> +LDLIBS += -lrte_bus_vdev -lrte_rawdev_ioat
> 
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS)
> diff --git a/drivers/net/vhost/internal.h b/drivers/net/vhost/internal.h
> index 7588fdf..f19ed7a 100644
> --- a/drivers/net/vhost/internal.h
> +++ b/drivers/net/vhost/internal.h
> @@ -20,6 +20,8 @@ extern int vhost_logtype;
>  #define VHOST_LOG(level, ...) \
>   rte_log(RTE_LOG_ ## level, vhost_logtype, __VA_ARGS__)
> 
> +typedef int (*process_dma_done_fn)(void *dev, void *dma_vr);
> +
>  enum vhost_xstats_pkts {
>   VHOST_UNDERSIZE_PKT = 0,
>   VHOST_64_PKT,
> @@ -96,6 +98,11 @@ struct dma_vring {
>* used by the DMA.
>*/
>   phys_addr_t used_idx_hpa;
> +
> + struct ring_index *indices;
> + uint16_t max_indices;
> +
> + process_dma_done_fn dma_done_fn;
>  };
> 
>  struct vhost_queue {
> @@ -110,6 +117,13 @@ struct vhost_queue {
>   struct dma_vring *dma_vring;
>  };
> 
> +struct dma_info {
> + process_dma_done_fn dma_done_fn;
> + struct rte_pci_addr addr;
> + uint16_t dev_id;
> + bool is_valid;
> +};
> +
>  struct pmd_internal {
>   rte_atomic32_t dev_attached;
>   char *iface_name;
> @@ -132,6 +146,11 @@ struct pmd_internal {
>   /* negotiated features */
>   uint64_t features;
>   size_t hdr_len;
> + bool vring_setup_done;
> + bool guest_mem_populated;
> +
> + /* User-assigned DMA information */
> + struct dma_info dmas[RTE_MAX_QUEUES_PER_PORT * 2];
>  };
> 
>  #ifdef __cplusplus
> diff --git a/drivers/net/vhost/meson.build b/drivers/net/vhost/meson.build
> index b308dcb..af3c640 100644
> --- a/drivers/net/vhost/meson.build
> +++ b/drivers/net/vhost/meson.build
> @@ -6,4 +6,4 @@ reason = 'missing dependency, DPDK vhost library'
>  sources = files('rte_eth_vhost.c',
>   'virtio_net.c')
>  install_headers('rte_eth_vhost.h')
> -deps += 'vhost'
> +deps += ['vhost', 'rawdev']
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index b5c927c..9faaa02 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -15,8 +15,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> 
>  #include "internal.h"
> +#include "virtio_net.h"
>  #include "rte_eth_vhost.h"
> 
>  int vhost_logtype;
> @@ -30,8 +34,12 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>  #define ETH_VHOST_IOMMU_SUPPORT  "iommu-support

Re: [dpdk-dev] [PATCH 2/4] net/vhost: setup vrings for DMA-accelerated datapath

2020-03-16 Thread Liu, Yong



> -Original Message-
> From: dev  On Behalf Of Jiayu Hu
> Sent: Tuesday, March 17, 2020 5:21 PM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Ye, Xiaolong ;
> Wang, Zhihong ; Hu, Jiayu 
> Subject: [dpdk-dev] [PATCH 2/4] net/vhost: setup vrings for DMA-
> accelerated datapath
> 
> This patch gets vrings' addresses and sets up GPA and HPA mappings
> for offloading large data movement from the CPU to DMA engines in
> vhost-user PMD.
> 
> Signed-off-by: Jiayu Hu 
> ---
>  drivers/Makefile  |   2 +-
>  drivers/net/vhost/Makefile|   4 +-
>  drivers/net/vhost/internal.h  | 141
> 
>  drivers/net/vhost/meson.build |   3 +-
>  drivers/net/vhost/rte_eth_vhost.c |  56 +
>  drivers/net/vhost/virtio_net.c| 119 +++
>  drivers/net/vhost/virtio_net.h| 168
> ++
>  7 files changed, 438 insertions(+), 55 deletions(-)
>  create mode 100644 drivers/net/vhost/internal.h
>  create mode 100644 drivers/net/vhost/virtio_net.c
>  create mode 100644 drivers/net/vhost/virtio_net.h
> 
> diff --git a/drivers/Makefile b/drivers/Makefile
> index c70bdf9..8555ddd 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -9,7 +9,7 @@ DEPDIRS-bus := common
>  DIRS-y += mempool
>  DEPDIRS-mempool := common bus
>  DIRS-y += net
> -DEPDIRS-net := common bus mempool
> +DEPDIRS-net := common bus mempool raw
>  DIRS-$(CONFIG_RTE_LIBRTE_BBDEV) += baseband
>  DEPDIRS-baseband := common bus mempool
>  DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += crypto
> diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
> index 0461e29..19cae52 100644
> --- a/drivers/net/vhost/Makefile
> +++ b/drivers/net/vhost/Makefile
> @@ -15,13 +15,15 @@ LDLIBS += -lrte_bus_vdev
> 
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -fno-strict-aliasing
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> 
>  EXPORT_MAP := rte_pmd_vhost_version.map
> 
>  #
>  # all source are stored in SRCS-y
>  #
> -SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += rte_eth_vhost.c virtio_net.c
> 
>  #
>  # Export include files
> diff --git a/drivers/net/vhost/internal.h b/drivers/net/vhost/internal.h
> new file mode 100644
> index 000..7588fdf
> --- /dev/null
> +++ b/drivers/net/vhost/internal.h
> @@ -0,0 +1,141 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +#ifndef _INTERNAL_H_
> +#define _INTERNAL_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +extern int vhost_logtype;
> +
> +#define VHOST_LOG(level, ...) \
> + rte_log(RTE_LOG_ ## level, vhost_logtype, __VA_ARGS__)
> +
> +enum vhost_xstats_pkts {
> + VHOST_UNDERSIZE_PKT = 0,
> + VHOST_64_PKT,
> + VHOST_65_TO_127_PKT,
> + VHOST_128_TO_255_PKT,
> + VHOST_256_TO_511_PKT,
> + VHOST_512_TO_1023_PKT,
> + VHOST_1024_TO_1522_PKT,
> + VHOST_1523_TO_MAX_PKT,
> + VHOST_BROADCAST_PKT,
> + VHOST_MULTICAST_PKT,
> + VHOST_UNICAST_PKT,
> + VHOST_ERRORS_PKT,
> + VHOST_ERRORS_FRAGMENTED,
> + VHOST_ERRORS_JABBER,
> + VHOST_UNKNOWN_PROTOCOL,
> + VHOST_XSTATS_MAX,
> +};
> +
> +struct vhost_stats {
> + uint64_t pkts;
> + uint64_t bytes;
> + uint64_t missed_pkts;
> + uint64_t xstats[VHOST_XSTATS_MAX];
> +};
> +
> +struct batch_copy_elem {
> + void *dst;
> + void *src;
> + uint32_t len;
> +};
> +
> +struct guest_page {
> + uint64_t guest_phys_addr;
> + uint64_t host_phys_addr;
> + uint64_t size;
> +};
> +
> +struct dma_vring {
> + struct rte_vhost_vring  vr;
> +
> + uint16_t last_avail_idx;
> + uint16_t last_used_idx;
> +
> + /* the last used index that front end can consume */
> + uint16_t copy_done_used;
> +
> + uint16_t signalled_used;
> + bool signalled_used_valid;
> +
> + struct vring_used_elem *shadow_used_split;
> + uint16_t shadow_used_idx;
> +
> + struct batch_copy_elem  *batch_copy_elems;
> + uint16_t batch_copy_nb_elems;
> +
> + bool dma_enabled;
> + /**
> +  * DMA ID. Currently, we only support I/OAT,
> +  * so it's I/OAT rawdev ID.
> +  */
> + uint16_t dev_id;
> + /* DMA address */
> + struct rte_pci_addr dma_addr;
> + /**
> +  * the number of copy jobs that are submitted to the DMA
> +  * but may not be completed.
> +  */
> + uint64_t nr_inflight;
> + int nr_batching;

Looks like nr_batching can't be a negative value; please change it to uint16_t
or uint32_t.

> +
> + /**
> +  * host physical address of used ring index,
> +  * used by the DMA.
> +  */
> + phys_addr_t used_idx_hpa;
> +};
> +
> +struct vhost_queue {
> + int vid;
> + rte_atomic32_t allow_queuing;
> + rte_atomic32_t while_queuing;
> + struct pmd_internal *internal;
> + struct rte_mempool *mb_po

Re: [dpdk-dev] [PATCH] vhost: cache guest/vhost physical address mapping

2020-03-16 Thread Liu, Yong
Thanks, xiaolong. 

> -Original Message-
> From: Ye, Xiaolong 
> Sent: Monday, March 16, 2020 9:48 PM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; Wang, Zhihong
> ; dev@dpdk.org
> Subject: Re: [PATCH] vhost: cache guest/vhost physical address mapping
> 
> Hi, Marvin
> 
> On 03/16, Marvin Liu wrote:
> >If Tx zero copy is enabled, the gpa to hpa mapping table is updated one by
> >one. This will harm performance when the guest memory backend uses 2M
> >hugepages. Now add a cached mapping table which is sorted by usage sequence.
> >Address translation will first check the cached mapping table; now
> >performance is back.
> >
> >Signed-off-by: Marvin Liu 
> >
> >diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> >index 2087d1400..de2c09e7e 100644
> >--- a/lib/librte_vhost/vhost.h
> >+++ b/lib/librte_vhost/vhost.h
> >@@ -368,7 +368,9 @@ struct virtio_net {
> > struct vhost_device_ops const *notify_ops;
> >
> > uint32_tnr_guest_pages;
> >+uint32_tnr_cached;
> 
> What about naming it nr_cached_guest_pages to make it more self-
> explanatory
> as nr_cached is too generic?

Agreed, the name is too generic. It will be changed in the next version.

> 
> > uint32_tmax_guest_pages;
> >+struct guest_page   *cached_guest_pages;
> > struct guest_page   *guest_pages;
> >
> > int slave_req_fd;
> >@@ -554,11 +556,23 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa,
> uint64_t size)
> > uint32_t i;
> > struct guest_page *page;
> >
> >+for (i = 0; i < dev->nr_cached; i++) {
> >+page = &dev->cached_guest_pages[i];
> >+if (gpa >= page->guest_phys_addr &&
> >+gpa + size < page->guest_phys_addr + page->size) {
> >+return gpa - page->guest_phys_addr +
> >+page->host_phys_addr;
> >+}
> >+}
> >+
> > for (i = 0; i < dev->nr_guest_pages; i++) {
> > page = &dev->guest_pages[i];
> >
> > if (gpa >= page->guest_phys_addr &&
> > gpa + size < page->guest_phys_addr + page->size) {
> >+rte_memcpy(&dev->cached_guest_pages[dev-
> >nr_cached],
> >+   page, sizeof(struct guest_page));
> >+dev->nr_cached++;
> > return gpa - page->guest_phys_addr +
> >page->host_phys_addr;
> > }
> >diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >index bd1be0104..573e99066 100644
> >--- a/lib/librte_vhost/vhost_user.c
> >+++ b/lib/librte_vhost/vhost_user.c
> >@@ -192,7 +192,9 @@ vhost_backend_cleanup(struct virtio_net *dev)
> > }
> >
> > free(dev->guest_pages);
> >+free(dev->cached_guest_pages);
> > dev->guest_pages = NULL;
> >+dev->cached_guest_pages = NULL;
> >
> > if (dev->log_addr) {
> > munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
> >@@ -905,7 +907,10 @@ add_one_guest_page(struct virtio_net *dev,
> uint64_t guest_phys_addr,
> > old_pages = dev->guest_pages;
> > dev->guest_pages = realloc(dev->guest_pages,
> > dev->max_guest_pages *
> sizeof(*page));
> >-if (!dev->guest_pages) {
> >+dev->cached_guest_pages = realloc(dev-
> >cached_guest_pages,
> >+dev->max_guest_pages *
> sizeof(*page));
> >+dev->nr_cached = 0;
> >+if (!dev->guest_pages || !dev->cached_guest_pages) {
> 
> Better to compare pointer to NULL according to DPDK's coding style.
> 
 OK, will change it.
> > VHOST_LOG_CONFIG(ERR, "cannot realloc
> guest_pages\n");
> > free(old_pages);
> > return -1;
> >@@ -1075,6 +1080,18 @@ vhost_user_set_mem_table(struct virtio_net
> **pdev, struct VhostUserMsg *msg,
> > }
> > }
> >
> 
> Do we need to initialize dev->nr_cached to 0 explicitly here?
> 

The vhost_virtqueue structure has already been cleared in init_vring_queue, so
no initialization is needed elsewhere.

> >+if (!dev->cached_guest_pages) {
> >+dev->cached

Re: [dpdk-dev] [PATCH] vhost: fix zmbuf buffer id invalid

2020-02-23 Thread Liu, Yong
Thanks, Xiaolong & Maxime. The commit log has been fixed in v2.

> -Original Message-
> From: Ye, Xiaolong 
> Sent: Monday, February 24, 2020 3:26 PM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; dev@dpdk.org; Bie, Tiwei
> ; Wang, Zhihong ;
> sta...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: fix zmbuf buffer id invalid
> 
> For the subject, what about:
> 
> vhost: fix invalid zmbuf buffer id
> 
> On 02/24, Marvin Liu wrote:
> >zc mbufs should record available buffer id when doing dequeue zcopy.
> >There's no guarantee that local queue avail index equal to buffer index.
> 
> s/equal to/is equal to
> 
> >
> >Fixes: d1eafb532268 ("vhost: add packed ring zcopy batch and single
> dequeue")
> >Cc: sta...@dpdk.org
> >
> >Signed-off-by: Marvin Liu 
> >Reported-by: Yinan Wang 
> >
> >diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> >index 37c47c7dc..210415904 100644
> >--- a/lib/librte_vhost/virtio_net.c
> >+++ b/lib/librte_vhost/virtio_net.c
> >@@ -2004,7 +2004,7 @@ virtio_dev_tx_batch_packed_zmbuf(struct virtio_net
> *dev,
> >
> > vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > zmbufs[i]->mbuf = pkts[i];
> >-zmbufs[i]->desc_idx = avail_idx + i;
> >+zmbufs[i]->desc_idx = ids[i];
> > zmbufs[i]->desc_count = 1;
> > }
> >
> >@@ -2045,7 +2045,7 @@ virtio_dev_tx_single_packed_zmbuf(struct
> virtio_net *dev,
> > return -1;
> > }
> > zmbuf->mbuf = *pkts;
> >-zmbuf->desc_idx = vq->last_avail_idx;
> >+zmbuf->desc_idx = buf_id;
> > zmbuf->desc_count = desc_count;
> >
> > rte_mbuf_refcnt_update(*pkts, 1);
> >--
> >2.17.1
> >
> 
> Apart from above,
> 
> Reviewed-by: Xiaolong Ye 


Re: [dpdk-dev] [Bug 383] dpdk virtio_user lack of notifications make vhost_net+napi stops tx buffers

2020-01-14 Thread Liu, Yong


> -Original Message-
> From: dev  On Behalf Of epere...@redhat.com
> Sent: Thursday, January 09, 2020 11:56 PM
> To: bugzi...@dpdk.org; dev@dpdk.org; Maxime Coquelin 
> Cc: Jason Wang ; Michael S. Tsirkin ;
> Adrian Moreno Zapata 
> Subject: Re: [dpdk-dev] [Bug 383] dpdk virtio_user lack of notifications
> make vhost_net+napi stops tx buffers
> 
> Proposal for patch - Requesting For Comments.
> 
> Just running the shadow copy-flush-call unconditionally in
> vhost_flush_dequeue_packed solve the issue, and it gives the best
> latency I can get in the tests (8101.959 trans/sec if I run netperf
> TCP_RR, 1331.735 trans/sec if I run TCP_STREAM at the same time). Apart
> from that, testpmd is able to tx about 820Kpps to the guest.
> 

Hi Eugenio,

The shadow method is aimed at maximizing throughput. In our experiments, there
is no clear performance gain once the shadowed size is over half of the ring
size (e.g. 256).

Unconditionally doing the shadow flush will harm performance a lot with the
virtio-user frontend.
Checking the next descriptor will have less impact on performance; I prefer
that solution.

Thanks,
Marvin


> However, to still do a little bit of batching I replace the condition
> for the one attached here. Although it implies a read barrier, I am
> able to achieve a little more of throughput (about 890Kpps), reducing
> to 8048.919 the numbers of transactions/sec in TCP_RR test (1372.327 if
> it runs in parallel with TCP_STREAM).
> 
> I also tried to move the vhost_flush_dequeue_shadow_packed and
> host_flush_dequeue_shadow_packed after the do_data_copy_dequeue in
> virtio_dev_tx_packed, more or less the same way virtio_dev_rx_packed
> do, but I repeatedly find less throughput in this case, even if I add
> the !next_desc_is_avail(vq) test. Not sure why, since both ways should
> be very similar. About 836Kpps are achieved this way, and TCP_RR is
> able to do 8120.154 trans/sec by itself and 1363.341 trans/sec if it
> runs with another TCP_STREAM test in parallel.
> 
> So, is there room for improvement, either in the patches or in the
> tests? Is one of the solutions preferred over another?
> 
> All tests were run with:
> * producer in a different processor than consumer (host testpmd and VM
> never run in the same core)
> * 256 descriptors queues in guest's testpmd and tx vq
> 
> Thanks!
> 
> PS: Sorry for the from mail address change, DPDK bugzilla doesn't send
> me the confirmation mail to this account.
> 
> diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> index 21c311732..f7137149c 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -382,6 +382,20 @@ vhost_shadow_enqueue_single_packed(struct
> virtio_net *dev,
> }
>  }
> 
> +static __rte_always_inline bool
> +next_desc_is_avail(const struct vhost_virtqueue *vq)
> +{
> +   bool wrap_counter = vq->avail_wrap_counter;
> +   uint16_t next_used_idx = vq->last_used_idx + 1;
> +
> +   if (next_used_idx >= vq->size) {
> +   next_used_idx -= vq->size;
> +   wrap_counter ^= 1;
> +   }
> +
> +   return desc_is_avail(&vq->desc_packed[next_used_idx],
> wrap_counter);
> +}
> +
>  static __rte_always_inline void
>  vhost_flush_dequeue_packed(struct virtio_net *dev,
>struct vhost_virtqueue *vq)
> @@ -394,7 +408,8 @@ vhost_flush_dequeue_packed(struct virtio_net *dev,
> if (shadow_count <= 0)
> shadow_count += vq->size;
> 
> -   if ((uint32_t)shadow_count >= (vq->size - MAX_PKT_BURST)) {
> +   if ((uint32_t)shadow_count >= (vq->size - MAX_PKT_BURST)
> +   || !next_desc_is_avail(vq)) {
> do_data_copy_dequeue(vq);
> vhost_flush_dequeue_shadow_packed(dev, vq);
> vhost_vring_call_packed(dev, vq);
> 
> On Thu, 2020-01-09 at 15:47 +, bugzi...@dpdk.org wrote:
> > https://bugs.dpdk.org/show_bug.cgi?id=383
> >
> > Bug ID: 383
> >Summary: dpdk virtio_user lack of notifications make
> > vhost_net+napi stops tx buffers
> >Product: DPDK
> >Version: unspecified
> >   Hardware: All
> > OS: Linux
> > Status: UNCONFIRMED
> >   Severity: normal
> >   Priority: Normal
> >  Component: vhost/virtio
> >   Assignee: dev@dpdk.org
> >   Reporter: eup...@gmail.com
> >   Target Milestone: ---
> >
> > Using the current testpmd vhost_user as:
> >
> > ./app/testpmd -l 6,7,8 --vdev='net_vhost1,iface=/tmp/vhost-user1'
> > --vdev='net_vhost2,iface=/tmp/vhost-user2' -- -a -i --rxq=1 --txq=1
> > --txd=1024
> > --forward-mode=rxonly
> >
> > And starting qemu using packed=on on the interface:
> >
> > -netdev vhost-user,chardev=charnet1,id=hostnet1 -device
> > virtio-net-pci,rx_queue_size=256,...,packed=on
> >
> > And start to tx in the guest using:
> >
> > ./dpdk/build/app/testpmd -l 1,2 --vdev=eth_af_packet0,iface=eth0 -- \
> > 

Re: [dpdk-dev] [DPDK] net/virtio: packed ring notification data feature support

2020-01-08 Thread Liu, Yong



> -Original Message-
> From: dev  On Behalf Of Cheng Jiang
> Sent: Wednesday, December 04, 2019 11:03 PM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Bie, Tiwei ; Wang,
> Zhihong ; Jiang, Cheng1 
> Subject: [dpdk-dev] [DPDK] net/virtio: packed ring notification data
> feature support
> 
> This patch supports the feature that the driver passes extra data
> (besides identifying the virtqueue) in its device notifications.
> 
> Signed-off-by: Cheng Jiang 
> ---
>  drivers/net/virtio/virtio_ethdev.h |  3 ++-
>  drivers/net/virtio/virtio_pci.c| 15 ++-
>  drivers/net/virtio/virtio_pci.h|  6 ++
>  3 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.h
> b/drivers/net/virtio/virtio_ethdev.h
> index a10111758..cd8947656 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> + if (vtpci_with_feature(hw, VIRTIO_F_RING_PACKED))
> + notify_data = ((((uint32_t)vq->vq_packed.used_wrap_counter <<
> + 15) | vq->vq_avail_idx) << 16) | vq->vq_queue_index;
> + else
> + notify_data = ((uint32_t)vq->vq_avail_idx << 16) |
> + vq->vq_queue_index;

Hi Cheng,
According to the virtio 1.1 spec, the wrap counter in the notification data
should refer to the next available descriptor, so used_wrap_counter should be
replaced with the avail wrap counter. Sorry for noticing this late.

Thanks,
Marvin

> + rte_write32(notify_data, vq->notify_addr);
>  }
> 

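Following the comment above, the corrected packed-ring case would carry the
avail wrap counter in bit 31 instead. A sketch, assuming the PMD tracks the
avail wrap state via VRING_PACKED_DESC_F_AVAIL in vq_packed.cached_flags:

    /* Bit 31: avail wrap counter, bits 30:16: avail index,
     * bits 15:0: queue index. */
    uint32_t avail_wrap = !!(vq->vq_packed.cached_flags &
                             VRING_PACKED_DESC_F_AVAIL);

    notify_data = (((avail_wrap << 15) | vq->vq_avail_idx) << 16) |
                  vq->vq_queue_index;
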


Re: [dpdk-dev] [PATCH v1] net/virtio-user: fix packed ring server mode

2019-12-09 Thread Liu, Yong



> -Original Message-
> From: dev  On Behalf Of Xuan Ding
> Sent: Tuesday, December 10, 2019 12:50 AM
> To: maintai...@dpdk.org
> Cc: dev@dpdk.org; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong ; Ding, Xuan
> ; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH v1] net/virtio-user: fix packed ring server
> mode
> 
> This patch fixes the situation where datapath does not work properly when
> vhost reconnects to virtio in server mode with packed ring.
> 
> Currently, virtio and vhost share memory of vring. For split ring, vhost
> can read the status of descriptors directly from the available ring and
> the used ring during reconnection. Therefore, the datapath can continue.
> 
> But for packed ring, when reconnecting to virtio, vhost cannot get the
> status of descriptors only through the descriptor ring. By resetting Tx
> and Rx queues, the datapath can restart from the beginning.
> 
> Fixes: 4c3f5822eb214 ("net/virtio: add packed virtqueue defines")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Xuan Ding 
> ---
>  drivers/net/virtio/virtio_ethdev.c  | 112 +++-
>  drivers/net/virtio/virtio_ethdev.h  |   3 +
>  drivers/net/virtio/virtio_user_ethdev.c |   8 ++
>  3 files changed, 121 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> index 044eb10a7..c0cb0f23c 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -433,6 +433,94 @@ virtio_init_vring(struct virtqueue *vq)
>   virtqueue_disable_intr(vq);
>  }
> 
> +static int
> +virtio_user_reset_rx_queues(struct rte_eth_dev *dev, uint16_t queue_idx)
> +{

Hi Xuan,
This function is named with a virtio_user_reset prefix, but it looks like it
has no relationship with virtio_user.
Renaming it to virtqueue_reset and moving it to virtqueue.c would be more
suitable.
Please also add the suffix _packed, as this function only works for the packed
ring.

Thanks,
Marvin

> + uint16_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_RQ_QUEUE_IDX;
> + struct virtio_hw *hw = dev->data->dev_private;
> + struct virtqueue *vq = hw->vqs[vtpci_queue_idx];
> + struct virtnet_rx *rxvq;
> + struct vq_desc_extra *dxp;
> + unsigned int vq_size;
> + uint16_t desc_idx, i;
> +
> + vq_size = VTPCI_OPS(hw)->get_queue_num(hw, vtpci_queue_idx);
> +
The virtqueue size has already been stored as vq_nentries in the virtqueue
structure. Do we need to fetch it again?

> + vq->vq_packed.used_wrap_counter = 1;
> + vq->vq_packed.cached_flags = VRING_PACKED_DESC_F_AVAIL;
> + vq->vq_packed.event_flags_shadow = 0;
> + vq->vq_packed.cached_flags |= VRING_DESC_F_WRITE;
> +
> + rxvq = &vq->rxq;
> + memset(rxvq->mz->addr, 0, rxvq->mz->len);
> +
> + for (desc_idx = 0; desc_idx < vq_size; desc_idx++) {
> + dxp = &vq->vq_descx[desc_idx];
> + if (dxp->cookie != NULL) {
> + rte_pktmbuf_free(dxp->cookie);
> + dxp->cookie = NULL;
> + }
> + }
> +
> + virtio_init_vring(vq);
> +
> + for (i = 0; i < hw->max_queue_pairs; i++)
> + if (rxvq->mpool != NULL)
> + virtio_dev_rx_queue_setup_finish(dev, i);
> +

Please add braces around the multi-line loop body.

> + return 0;
> +}
> +
> +static int
> +virtio_user_reset_tx_queues(struct rte_eth_dev *dev, uint16_t queue_idx)
> +{
> + uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
> + struct virtio_hw *hw = dev->data->dev_private;
> + struct virtqueue *vq = hw->vqs[vtpci_queue_idx];
> + struct virtnet_tx *txvq;
> + struct vq_desc_extra *dxp;
> + unsigned int vq_size;
> + uint16_t desc_idx;
> +
> + vq_size = VTPCI_OPS(hw)->get_queue_num(hw, vtpci_queue_idx);
> +
> + vq->vq_packed.used_wrap_counter = 1;
> + vq->vq_packed.cached_flags = VRING_PACKED_DESC_F_AVAIL;
> + vq->vq_packed.event_flags_shadow = 0;
> +
> + txvq = &vq->txq;
> + memset(txvq->mz->addr, 0, txvq->mz->len);
> + memset(txvq->virtio_net_hdr_mz->addr, 0,
> + txvq->virtio_net_hdr_mz->len);
> +
> + for (desc_idx = 0; desc_idx < vq_size; desc_idx++) {
> + dxp = &vq->vq_descx[desc_idx];
> + if (dxp->cookie != NULL) {
> + rte_pktmbuf_free(dxp->cookie);
> + dxp->cookie = NULL;
> + }
> + }
> +
> + virtio_init_vring(vq);
> +
> + return 0;
> +}
> +
> +static int
> +virtio_user_reset_queues(struct rte_eth_dev *eth_dev)
> +{
> + uint16_t i;
> +
> + /* Vring reset for each Tx queue and Rx queue. */
> + for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
> + virtio_user_reset_rx_queues(eth_dev, i);
> +
> + for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
> + virtio_user_reset_tx_queues(eth_dev, i);
> +
> + return 0;
> +}
> +
>  static int
>  virtio_init_queue(struct rte_eth_dev *dev, uint16_t vtpci_queue_i

Re: [dpdk-dev] [DPDK] net/virtio: packed ring notification data feature support

2019-12-08 Thread Liu, Yong



> -Original Message-
> From: dev  On Behalf Of Cheng Jiang
> Sent: Wednesday, December 04, 2019 11:03 PM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Bie, Tiwei ; Wang,
> Zhihong ; Jiang, Cheng1 
> Subject: [dpdk-dev] [DPDK] net/virtio: packed ring notification data
> feature support
> 
> This patch supports the feature that the driver passes extra data
> (besides identifying the virtqueue) in its device notifications.
> 
> Signed-off-by: Cheng Jiang 
> ---
>  drivers/net/virtio/virtio_ethdev.h |  3 ++-
>  drivers/net/virtio/virtio_pci.c| 15 ++-
>  drivers/net/virtio/virtio_pci.h|  6 ++
>  3 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.h
> b/drivers/net/virtio/virtio_ethdev.h
> index a10111758..cd8947656 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -36,7 +36,8 @@
>1ULL << VIRTIO_F_IN_ORDER| \
>1ULL << VIRTIO_F_RING_PACKED | \
>1ULL << VIRTIO_F_IOMMU_PLATFORM  | \
> -  1ULL << VIRTIO_F_ORDER_PLATFORM)
> +  1ULL << VIRTIO_F_ORDER_PLATFORM  | \
> +  1ULL << VIRTIO_F_NOTIFICATION_DATA)
> 
>  #define VIRTIO_PMD_SUPPORTED_GUEST_FEATURES  \
>   (VIRTIO_PMD_DEFAULT_GUEST_FEATURES |\
> diff --git a/drivers/net/virtio/virtio_pci.c
> b/drivers/net/virtio/virtio_pci.c
> index 4468e89cb..2462a7dab 100644
> --- a/drivers/net/virtio/virtio_pci.c
> +++ b/drivers/net/virtio/virtio_pci.c
> @@ -418,7 +418,20 @@ modern_del_queue(struct virtio_hw *hw, struct
> virtqueue *vq)
>  static void
>  modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue
> *vq)
>  {

Hi Cheng,
The hw pointer will now be used in the notify function; please remove the
__rte_unused attribute.

Thanks,
Marvin

> - rte_write16(vq->vq_queue_index, vq->notify_addr);
> + uint32_t notify_data;
> +
> + if (!vtpci_with_feature(hw, VIRTIO_F_NOTIFICATION_DATA)) {
> + rte_write16(vq->vq_queue_index, vq->notify_addr);
> + return;
> + }
> +
> + if (vtpci_with_feature(hw, VIRTIO_F_RING_PACKED))
> + notify_data = uint32_t)vq->vq_packed.used_wrap_counter <<
> 15) |
> + vq->vq_avail_idx) << 16) | vq->vq_queue_index;
> + else
> + notify_data = ((uint32_t)vq->vq_avail_idx << 16) |
> + vq->vq_queue_index;
> + rte_write32(notify_data, vq->notify_addr);
>  }
> 
>  const struct virtio_pci_ops modern_ops = {
> diff --git a/drivers/net/virtio/virtio_pci.h
> b/drivers/net/virtio/virtio_pci.h
> index a38cb45ad..7433d2f08 100644
> --- a/drivers/net/virtio/virtio_pci.h
> +++ b/drivers/net/virtio/virtio_pci.h
> @@ -135,6 +135,12 @@ struct virtnet_ctl;
>   */
>  #define VIRTIO_F_ORDER_PLATFORM 36
> 
> +/*
> + * This feature indicates that the driver passes extra data (besides
> + * identifying the virtqueue) in its device notifications.
> + */
> +#define VIRTIO_F_NOTIFICATION_DATA 38
> +
>  /* The Guest publishes the used index for which it expects an interrupt
>   * at the end of the avail ring. Host should ignore the avail->flags
> field. */
>  /* The Host publishes the avail index for which it expects a kick
> --
> 2.17.1



Re: [dpdk-dev] [PATCH] vhost: fix batch enqueue only handle few packets

2019-11-07 Thread Liu, Yong


> -Original Message-
> From: Liu, Yong
> Sent: Thursday, November 07, 2019 4:29 PM
> To: Maxime Coquelin ; Bie, Tiwei
> ; Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: RE: [PATCH] vhost: fix batch enqueue only handle few packets
> 
> 
> 
> > -Original Message-
> > From: Maxime Coquelin 
> > Sent: Thursday, November 07, 2019 4:20 PM
> > To: Liu, Yong ; Bie, Tiwei ;
> Wang,
> > Zhihong 
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH] vhost: fix batch enqueue only handle few packets
> >
> >
> >
> > On 11/7/19 3:37 PM, Marvin Liu wrote:
> > > After the enqueue function finished, the packet index has been increased.
> > > The batch enqueue function should retrieve the mbuf structure pointed to
> > > by that index.
> > >
> > > Fixes: 0294211bb6dc ("vhost: optimize packed ring enqueue")
> > >
> > > Signed-off-by: Marvin Liu 
> > > ---
> > >  lib/librte_vhost/virtio_net.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > Applied to dpdk-next-virtio/master
> >
> > Can you please run again the performance benchmarks, and see what is the
> > loss, if any?
> >
> Sure, I will rerun the performance benchmarks.
> 

Latest result with gcc 9.0.1:

+-----------------------------------+-------+-------+
|                                   | 19.08 | + opt |
|-----------------------------------|-------|-------|
| 1518B PvP                         | 2.63M | 3.07M |
|-----------------------------------|-------|-------|
| 64B loopback                      | 7.81M | 12.3M |
|-----------------------------------|-------|-------|
| 1518B loopback                    | 3.59M | 4.54M |
|-----------------------------------|-------|-------|
| 16K chained loopback              | 297K  | 322K  |
|-----------------------------------|-------|-------|
| 50% 256B + 50% 16K                | 688K  | 959K  |
|-----------------------------------|-------|-------|
| pktgen_sample03_burst_single_flow | 5.78M | 5.80M |
+-----------------------------------+-------+-------+

> Thanks,
> Marvin
> 
> > Thanks,
> > Maxime

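For context, the commit message above boils down to an indexing fix in the
packed-ring enqueue loop. A condensed sketch of the loop shape, simplified
from the actual lib/librte_vhost code:

    while (remained) {
            if (remained >= PACKED_BATCH_SIZE) {
                    /* Must index from the current position; passing `pkts`
                     * (the start of the burst) here was the bug. */
                    if (!virtio_dev_rx_batch_packed(dev, vq,
                                                    &pkts[pkt_idx])) {
                            pkt_idx += PACKED_BATCH_SIZE;
                            remained -= PACKED_BATCH_SIZE;
                            continue;
                    }
            }
            if (virtio_dev_rx_single_packed(dev, vq, pkts[pkt_idx]))
                    break;
            pkt_idx++;
            remained--;
    }
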


Re: [dpdk-dev] [PATCH] vhost: fix batch enqueue only handle few packets

2019-11-07 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin 
> Sent: Thursday, November 07, 2019 4:20 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] vhost: fix batch enqueue only handle few packets
> 
> 
> 
> On 11/7/19 3:37 PM, Marvin Liu wrote:
> > After the enqueue function finished, the packet index has been increased.
> > The batch enqueue function should retrieve the mbuf structure pointed to
> > by that index.
> >
> > Fixes: 0294211bb6dc ("vhost: optimize packed ring enqueue")
> >
> > Signed-off-by: Marvin Liu 
> > ---
> >  lib/librte_vhost/virtio_net.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Applied to dpdk-next-virtio/master
> 
> Can you please run again the performance benchmarks, and see what is the
> loss, if any?
> 
Sure, I will rerun the performance benchmarks.

Thanks,
Marvin

> Thanks,
> Maxime



Re: [dpdk-dev] [PATCH v3] vhost: fix vhost user virtqueue not accessible

2019-11-01 Thread Liu, Yong


> -Original Message-
> From: Adrian Moreno [mailto:amore...@redhat.com]
> Sent: Friday, November 01, 2019 1:48 AM
> To: Liu, Yong ; Bie, Tiwei 
> Cc: maxime.coque...@redhat.com; Wang, Zhihong ;
> dev@dpdk.org
> Subject: Re: [PATCH v3] vhost: fix vhost user virtqueue not accessible
> 
> Hi Marvin,
> 
> On 10/31/19 3:54 PM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Bie, Tiwei
> >> Sent: Thursday, October 31, 2019 6:42 PM
> >> To: Liu, Yong 
> >> Cc: maxime.coque...@redhat.com; Wang, Zhihong ;
> >> amore...@redhat.com; dev@dpdk.org
> >> Subject: Re: [PATCH v3] vhost: fix vhost user virtqueue not accessible
> >>
> >> On Wed, Oct 30, 2019 at 10:56:02PM +0800, Marvin Liu wrote:
> >>> Log feature is disabled in vhost user, so that log address was invalid
> >>> when checking. Check whether log address is valid can workaround it.
> >>> Also log address should be translated in packed ring virtqueue.
> >>>
> >>> Fixes: 04cfc7fdbfca ("vhost: translate incoming log address to gpa")
> >>>
> >>> Signed-off-by: Marvin Liu 
> >>> ---
> >>>  lib/librte_vhost/vhost_user.c | 30 +-
> >>>  1 file changed, 13 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/lib/librte_vhost/vhost_user.c
> >> b/lib/librte_vhost/vhost_user.c
> >>> index 61ef699ac..7754d2467 100644
> >>> --- a/lib/librte_vhost/vhost_user.c
> >>> +++ b/lib/librte_vhost/vhost_user.c
> >>> @@ -641,11 +641,23 @@ translate_ring_addresses(struct virtio_net *dev,
> >> int vq_index)
> >>>   struct vhost_vring_addr *addr = &vq->ring_addrs;
> >>>   uint64_t len, expected_len;
> >>>
> >>> + dev = numa_realloc(dev, vq_index);
> >>
> >> We need to update `vq->desc` first before doing numa_realloc.
> >>
> https://github.com/DPDK/dpdk/blob/19397c7bf2545e6adab41b657a1f1da3c7344e7b/
> >> lib/librte_vhost/vhost_user.c#L445
> >>
> >>> + vq = dev->virtqueue[vq_index];
> >>> + if (addr->flags & (1 << VHOST_VRING_F_LOG)) {
> >>
> I fear the possible consequences of this change.
> Before 04cfc7fdbfca the approach was "best-effort". The log address would
> be
> assigned without further checks:
> 
>   vq->log_guest_addr = addr->log_guest_addr;
> 
> Then, the behavior changed and an error was generated if the log address
> was
> invalid, which I guess is the problem you have hit:
> 
>   vq->log_guest_addr =
>   translate_log_addr(dev, vq, addr->log_guest_addr);
>   if (vq->log_guest_addr == 0) {
>   RTE_LOG(DEBUG, VHOST_CONFIG,
>   "(%d) failed to map log_guest_addr .\n",
>   dev->vid);
>   return dev;
>   }
> 
> In the tests I ran I always saw valid log addresses being sent at ring
> initialization phase, but if, as you claim, it's possible that invalid
> addresses
> are given at initialization phase, maybe we should go back to "best-effort"
> (i.e: remove the return statement)
> 
> But it's unlikely that qemu has enabled logging at ring initialization so
> this
> would effectively disable the translation at the initialization phase. I
> cannot
> forecast the consequences of this change without deeper analysis.

That's fine, Adrian. This issue only occurred when using the DPDK virtio-user
device.
Since address logging is disabled in virtio-user, a simple flag check fixes it.

> 
> >> `vq` can be reallocated by numa_realloc.
> >> We need to update the `addr` pointer before using it.
> >>
> >
> > Hi Tiwei,
> > The numa_realloc function will copy data from the original vq structure to
> > the new vq when reallocating.
> > The content of vhost_vring_addr will be the same in the new and old vqs,
> > so it may not be necessary to update the pointer.
> That's true but 'addr' still holds a pointer to the old structure, assigned
> at
> line 641.
> 
> Also, note Tiwei's comment regarding updating 'vq->desc'. The idea behind
> numa_realloc is to reallocate the vhost_virtqueue structure to the same
> numa
> node as the descriptor ring. This function is updating the descriptor rings,
> so
> I think the idea is to update the ring addresses and then reallocate the
> virtqueue structure if needed.
> 
You are right, I misunderstood Tiwei's comment. The ring address is useful for
checking the NUMA id, num

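A condensed illustration of the reallocation hazard discussed in this thread:
numa_realloc() may move the virtqueue to another NUMA node and free the old
structure, so every pointer derived from the old vq must be refreshed
(simplified, not the exact vhost code):

    struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
    struct vhost_vring_addr *addr = &vq->ring_addrs;  /* points into old vq */

    dev = numa_realloc(dev, vq_index);                /* may free the old vq */
    vq = dev->virtqueue[vq_index];                    /* refresh vq ...      */
    addr = &vq->ring_addrs;                           /* ... and addr too    */
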
Re: [dpdk-dev] [PATCH v3] vhost: fix vhost user virtqueue not accessible

2019-10-31 Thread Liu, Yong


> -Original Message-
> From: Bie, Tiwei
> Sent: Thursday, October 31, 2019 6:42 PM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; Wang, Zhihong ;
> amore...@redhat.com; dev@dpdk.org
> Subject: Re: [PATCH v3] vhost: fix vhost user virtqueue not accessible
> 
> On Wed, Oct 30, 2019 at 10:56:02PM +0800, Marvin Liu wrote:
> > Log feature is disabled in vhost user, so that log address was invalid
> > when checking. Check whether log address is valid can workaround it.
> > Also log address should be translated in packed ring virtqueue.
> >
> > Fixes: 04cfc7fdbfca ("vhost: translate incoming log address to gpa")
> >
> > Signed-off-by: Marvin Liu 
> > ---
> >  lib/librte_vhost/vhost_user.c | 30 +-
> >  1 file changed, 13 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_user.c
> b/lib/librte_vhost/vhost_user.c
> > index 61ef699ac..7754d2467 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -641,11 +641,23 @@ translate_ring_addresses(struct virtio_net *dev,
> int vq_index)
> > struct vhost_vring_addr *addr = &vq->ring_addrs;
> > uint64_t len, expected_len;
> >
> > +   dev = numa_realloc(dev, vq_index);
> 
> We need to update `vq->desc` first before doing numa_realloc.
> https://github.com/DPDK/dpdk/blob/19397c7bf2545e6adab41b657a1f1da3c7344e7b/
> lib/librte_vhost/vhost_user.c#L445
> 
> > +   vq = dev->virtqueue[vq_index];
> > +   if (addr->flags & (1 << VHOST_VRING_F_LOG)) {
> 
> `vq` can be reallocated by numa_realloc.
> We need to update the `addr` pointer before using it.
> 

Hi Tiwei,
The numa_realloc function will copy data from the original vq structure to the
new vq when reallocating.
The content of vhost_vring_addr will be the same in the new and old vqs, so it
may not be necessary to update the pointer.

Regards,
Marvin

> Thanks,
> Tiwei
> 
> 
> > +   vq->log_guest_addr =
> > +   translate_log_addr(dev, vq, addr->log_guest_addr);
> > +   if (vq->log_guest_addr == 0) {
> > +   RTE_LOG(DEBUG, VHOST_CONFIG,
> > +   "(%d) failed to map log_guest_addr.\n",
> > +   dev->vid);
> > +   return dev;
> > +   }
> > +   }
> > +
> > if (vq_is_packed(dev)) {
> > len = sizeof(struct vring_packed_desc) * vq->size;
> > vq->desc_packed = (struct vring_packed_desc *)(uintptr_t)
> > ring_addr_to_vva(dev, vq, addr->desc_user_addr, &len);
> > -   vq->log_guest_addr = 0;
> > if (vq->desc_packed == NULL ||
> > len != sizeof(struct vring_packed_desc) *
> > vq->size) {
> > @@ -655,10 +667,6 @@ translate_ring_addresses(struct virtio_net *dev, int
> vq_index)
> > return dev;
> > }
> >
> > -   dev = numa_realloc(dev, vq_index);
> > -   vq = dev->virtqueue[vq_index];
> > -   addr = &vq->ring_addrs;
> > -
> > len = sizeof(struct vring_packed_desc_event);
> > vq->driver_event = (struct vring_packed_desc_event *)
> > (uintptr_t)ring_addr_to_vva(dev,
> > @@ -701,10 +709,6 @@ translate_ring_addresses(struct virtio_net *dev, int
> vq_index)
> > return dev;
> > }
> >
> > -   dev = numa_realloc(dev, vq_index);
> > -   vq = dev->virtqueue[vq_index];
> > -   addr = &vq->ring_addrs;
> > -
> > len = sizeof(struct vring_avail) + sizeof(uint16_t) * vq->size;
> > if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> > len += sizeof(uint16_t);
> > @@ -741,14 +745,6 @@ translate_ring_addresses(struct virtio_net *dev, int
> vq_index)
> > vq->last_avail_idx = vq->used->idx;
> > }
> >
> > -   vq->log_guest_addr =
> > -   translate_log_addr(dev, vq, addr->log_guest_addr);
> > -   if (vq->log_guest_addr == 0) {
> > -   RTE_LOG(DEBUG, VHOST_CONFIG,
> > -   "(%d) failed to map log_guest_addr .\n",
> > -   dev->vid);
> > -   return dev;
> > -   }
> > vq->access_ok = 1;
> >
> > VHOST_LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address desc: %p\n",
> > --
> > 2.17.1
> >


Re: [dpdk-dev] [PATCH v3] net/virtio: fix multicast and promisc mode enable failure

2019-10-29 Thread Liu, Yong


> -Original Message-
> From: Bie, Tiwei
> Sent: Tuesday, October 29, 2019 8:28 PM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; Wang, Zhihong ;
> dev@dpdk.org; sta...@dpdk.org
> Subject: Re: [PATCH v3] net/virtio: fix multicast and promisc mode enable
> failure
> 
> On Tue, Oct 29, 2019 at 12:42:20AM +0800, Marvin Liu wrote:
> > As doc mentioned, promisc and multicast are by-default supported in
> > virtio pmd. Mac/vlan filter are supported by best effort. These control
> > messages should return pass.
> >
> > Fixes: f9b9d1a55775 ("net/virtio-user: add multiple queues in device
> emulation")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Marvin Liu 
> > ---
> >  .../net/virtio/virtio_user/virtio_user_dev.c  | 37 ++-
> >  drivers/net/virtio/virtio_user_ethdev.c   |  4 ++
> >  2 files changed, 31 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> > index 1c575d0cd..b614dd0c0 100644
> > --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> > +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> > @@ -587,7 +587,7 @@ static uint32_t
> >  virtio_user_handle_ctrl_msg(struct virtio_user_dev *dev, struct vring
> *vring,
> > uint16_t idx_hdr)
> >  {
> > -   struct virtio_net_ctrl_hdr *hdr;
> > +   struct virtio_pmd_ctrl *ctrl;
> 
> We shouldn't use virtio_pmd_ctrl here. The virtio_pmd_ctrl
> is just a private structure defined in virtio PMD (upper layer).
> And we won't put this structure as is in the buffer pointed
> by the descriptor.

Thanks, will change in next version.

> 
> > virtio_net_ctrl_ack status = ~0;
> > uint16_t i, idx_data, idx_status;
> > uint32_t n_descs = 0;
> > @@ -606,13 +606,22 @@ virtio_user_handle_ctrl_msg(struct virtio_user_dev
> *dev, struct vring *vring,
> > idx_status = i;
> > n_descs++;
> >
> > -   hdr = (void *)(uintptr_t)vring->desc[idx_hdr].addr;
> > -   if (hdr->class == VIRTIO_NET_CTRL_MQ &&
> > -   hdr->cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
> > +   ctrl = (void *)(uintptr_t)vring->desc[idx_hdr].addr;
> > +   if (ctrl->hdr.class == VIRTIO_NET_CTRL_MQ &&
> > +   ctrl->hdr.cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
> > uint16_t queues;
> >
> > queues = *(uint16_t *)(uintptr_t)vring->desc[idx_data].addr;
> > status = virtio_user_handle_mq(dev, queues);
> > +   } else if (ctrl->hdr.class == VIRTIO_NET_CTRL_RX) {
> > +   if (ctrl->hdr.cmd == VIRTIO_NET_CTRL_RX_PROMISC ||
> > +   ctrl->hdr.cmd == VIRTIO_NET_CTRL_RX_ALLMULTI) {
> > +   if (ctrl->data[0])
> 
> Why do we need this check?
> 
The promisc and multicast settings should be checked before returning. The
ctrl data field is not the value that should be checked.
I will check the content in idx_data in the next version.
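
For illustration only, a hedged sketch of what checking the content via
idx_data could look like (variable names follow the quoted patch; the actual
v4 change may differ):

	/* the on/off byte of an RX-mode command lives in the data
	 * descriptor of the control virtqueue, not in the header */
	uint8_t on_off = *(uint8_t *)(uintptr_t)vring->desc[idx_data].addr;

	if (ctrl->hdr.class == VIRTIO_NET_CTRL_RX &&
	    (ctrl->hdr.cmd == VIRTIO_NET_CTRL_RX_PROMISC ||
	     ctrl->hdr.cmd == VIRTIO_NET_CTRL_RX_ALLMULTI)) {
		/* promisc/allmulti are always supported by virtio-user,
		 * so accept any valid on/off value */
		status = on_off <= 1 ? 0 : (virtio_net_ctrl_ack)~0;
	}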

> 
> > +   status = 0;
> > +   }
> > +   } else if (ctrl->hdr.class == VIRTIO_NET_CTRL_MAC ||
> > +  ctrl->hdr.class == VIRTIO_NET_CTRL_VLAN) {
> > +   status = 0;
> > }
> >
> > /* Update status */
> > @@ -635,7 +644,7 @@ virtio_user_handle_ctrl_msg_packed(struct
> virtio_user_dev *dev,
> >struct vring_packed *vring,
> >uint16_t idx_hdr)
> >  {
> > -   struct virtio_net_ctrl_hdr *hdr;
> > +   struct virtio_pmd_ctrl *ctrl;
> > virtio_net_ctrl_ack status = ~0;
> > uint16_t idx_data, idx_status;
> > /* initialize to one, header is first */
> > @@ -656,14 +665,22 @@ virtio_user_handle_ctrl_msg_packed(struct
> virtio_user_dev *dev,
> > n_descs++;
> > }
> >
> > -   hdr = (void *)(uintptr_t)vring->desc[idx_hdr].addr;
> > -   if (hdr->class == VIRTIO_NET_CTRL_MQ &&
> > -   hdr->cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
> > +   ctrl = (void *)(uintptr_t)vring->desc[idx_hdr].addr;
> > +   if (ctrl->hdr.class == VIRTIO_NET_CTRL_MQ &&
> > +   ctrl->hdr.cmd == VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET) {
> > uint16_t queues;
> >
> > -   queues = *(uint16_t *)(uintptr_t)
> > -   vring->desc[idx_data].addr;
> > +   queues = *(uint16_t *)(uintptr_t)vring->desc[idx_data].addr;
> > status = virtio_user_handle_mq(dev, queues);
> 

Re: [dpdk-dev] [PATCH] vhost: fix vhost user virtqueue not accessable

2019-10-27 Thread Liu, Yong


> -Original Message-
> From: Adrian Moreno [mailto:amore...@redhat.com]
> Sent: Friday, October 25, 2019 8:21 PM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] vhost: fix vhost user virtqueue not accessable
> 
> Hi Marvin,
> 
> On 10/25/19 6:20 PM, Marvin Liu wrote:
> > Log feature is disabled in vhost user, so that log address was invalid
> > when checking. Add feature bit check can skip useless address check.
> >
> Just so I understand, what conditions is the log address invalid?
> 
> > Fixes: 04cfc7fdbfca ("vhost: translate incoming log address to gpa")
> >
> > Signed-off-by: Marvin Liu 
> > ---
> >  lib/librte_vhost/vhost_user.c | 16 +---
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_user.c
> b/lib/librte_vhost/vhost_user.c
> > index 61ef699ac..0407fdc29 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -741,13 +741,15 @@ translate_ring_addresses(struct virtio_net *dev,
> int vq_index)
> > vq->last_avail_idx = vq->used->idx;
> > }
> >
> > -   vq->log_guest_addr =
> > -   translate_log_addr(dev, vq, addr->log_guest_addr);
> > -   if (vq->log_guest_addr == 0) {
> > -   RTE_LOG(DEBUG, VHOST_CONFIG,
> > -   "(%d) failed to map log_guest_addr .\n",
> > -   dev->vid);
> > -   return dev;
> > +   if (dev->features & (1ULL << VHOST_F_LOG_ALL)) {
> > +   vq->log_guest_addr =
> > +   translate_log_addr(dev, vq, addr->log_guest_addr);
> 
> VHOST_F_LOG_ALL is only negotiated once the migration has started (at least
> from
> qemu's perspective).
> That means that we will postponing the translation  of the log address to
> the
> vhost_user_set_vring_addr() call that follows the VHOST_F_LOG_ALL enabling.
> In
> that call there are (at least) two things that could go wrong and lead to a
> migration failure:
> - If VHOST_USER_F_PROTOCOL_FEATURES is not enabled, the address won't be
> translated:
> 
> vhost_user:795
>   if ((vq->enabled && (dev->features &
>   (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) ||
>   access_ok) {
>   dev = translate_ring_addresses(dev, msg->payload.addr.index);
>   if (!dev)
>   return RTE_VHOST_MSG_RESULT_ERR;
> 
>   *pdev = dev;
>   }
> 
> - If the IOMMU is enabled and there's a miss, we would have to wait for the
> IOTLB_UPDATE and during that time, there would be failed accesses to the
> (still
> untranslated) log address.
> 
> 

Thanks, Adrian.
The log address can be zero when logging is not enabled.
How about adding another criterion after the translation? The log address
will be translated anyway and will not affect the vq status:

vq->log_guest_addr =
translate_log_addr(dev, vq, addr->log_guest_addr);
-   if (vq->log_guest_addr == 0) {
+   if (vq->log_guest_addr == 0 && addr->flags) {
RTE_LOG(DEBUG, VHOST_CONFIG,
"(%d) failed to map log_guest_addr .\n",
dev->vid);
return dev;
}

Meanwhile, the log address of the packed ring is fixed to zero. Is there any
special reason for that?

Regards,
Marvin

> 
> > +   if (vq->log_guest_addr == 0) {
> > +   RTE_LOG(DEBUG, VHOST_CONFIG,
> > +   "(%d) failed to map log_guest_addr .\n",
> > +   dev->vid);
> > +   return dev;
> > +   }
> > }
> > vq->access_ok = 1;
> >
> >
> Thanks,
> Adrian


Re: [dpdk-dev] [PATCH v8 00/13] vhost packed ring performance optimization

2019-10-24 Thread Liu, Yong
Thanks, Maxime. Just sent out v9. 

> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Thursday, October 24, 2019 4:25 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong ; step...@networkplumber.org;
> gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v8 00/13] vhost packed ring performance optimization
> 
> 
> 
> On 10/24/19 9:18 AM, Liu, Yong wrote:
> >
> >
> >> -Original Message-
> >> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> >> Sent: Thursday, October 24, 2019 2:50 PM
> >> To: Liu, Yong ; Bie, Tiwei ;
> Wang,
> >> Zhihong ; step...@networkplumber.org;
> >> gavin...@arm.com
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v8 00/13] vhost packed ring performance optimization
> >>
> >> I get some checkpatch warnings, and build fails with clang.
> >> Could you please fix these issues and send v9?
> >>
> >
> >
> > Hi Maxime,
> > The clang build failure will be fixed in v9. The checkpatch warnings are
> > due to the pragma strings inside the macros.
> > The previous version avoided such warnings, but its format is a little
> > messy, as below.
> > I prefer to keep the code clean and more readable. What do you think?
> >
> > +#ifdef UNROLL_PRAGMA_PARAM
> > +#define VHOST_UNROLL_PRAGMA(param) _Pragma(param)
> > +#else
> > +#define VHOST_UNROLL_PRAGMA(param) do {} while (0);
> > +#endif
> >
> > +   VHOST_UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_BATCH_SIZE; i++)
> 
> That's less clean indeed. I agree to waive the checkpatch errors.
> just fix the Clang build for patch 8 and we're good.
> 
> Thanks,
> Maxime
> 
> > Regards,
> > Marvin
> >
> >> Thanks,
> >> Maxime
> >>
> >> ### [PATCH] vhost: try to unroll for each loop
> >>
> >> WARNING:CAMELCASE: Avoid CamelCase: <_Pragma>
> >> #78: FILE: lib/librte_vhost/vhost.h:47:
> >> +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("GCC unroll
> >> 4") \
> >>
> >> ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in
> >> parenthesis
> >> #78: FILE: lib/librte_vhost/vhost.h:47:
> >> +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("GCC unroll
> >> 4") \
> >> +  for (iter = val; iter < size; iter++)
> >>
> >> ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in
> >> parenthesis
> >> #83: FILE: lib/librte_vhost/vhost.h:52:
> >> +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("unroll 4")
> \
> >> +  for (iter = val; iter < size; iter++)
> >>
> >> ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in
> >> parenthesis
> >> #88: FILE: lib/librte_vhost/vhost.h:57:
> >> +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("unroll (4)")
> \
> >> +  for (iter = val; iter < size; iter++)
> >>
> >> total: 3 errors, 1 warnings, 67 lines checked
> >>
> >> 0/1 valid patch
> >>
> >> /tmp/dpdk_build/lib/librte_vhost/virtio_net.c:2065:1:
> >> error: unused function 'free_zmbuf' [-Werror,-Wunused-function]
> >> free_zmbuf(struct vhost_virtqueue *vq)
> >> ^
> >> 1 error generated.
> >> make[5]: *** [virtio_net.o] Error 1
> >> make[4]: *** [librte_vhost] Error 2
> >> make[4]: *** Waiting for unfinished jobs
> >> make[3]: *** [lib] Error 2
> >> make[2]: *** [all] Error 2
> >> make[1]: *** [pre_install] Error 2
> >> make: *** [install] Error 2
> >>
> >>
> >> On 10/22/19 12:08 AM, Marvin Liu wrote:
> >>> Packed ring has more compact ring format and thus can significantly
> >>> reduce the number of cache miss. It can lead to better performance.
> >>> This has been approved in virtio user driver, on normal E5 Xeon cpu
> >>> single core performance can raise 12%.
> >>>
> >>> http://mails.dpdk.org/archives/dev/2018-April/095470.html
> >>>
> >>> However vhost performance with packed ring performance was decreased.
> >>> Through analysis, mostly extra cost was from the calculating of each
> >>> descriptor flag which depended on ring wrap counter. Moreover, both
> >>> frontend and backend need to write same descriptors which will cause
> >>> cache contention. Especially when doing vhost enqueue function, virtio
> >>

Re: [dpdk-dev] [PATCH v8 00/13] vhost packed ring performance optimization

2019-10-24 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Thursday, October 24, 2019 2:50 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong ; step...@networkplumber.org;
> gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v8 00/13] vhost packed ring performance optimization
> 
> I get some checkpatch warnings, and build fails with clang.
> Could you please fix these issues and send v9?
> 


Hi Maxime,
The clang build failure will be fixed in v9. The checkpatch warnings are due
to the pragma strings inside the macros.
The previous version avoided such warnings, but its format is a little messy,
as below.
I prefer to keep the code clean and more readable. What do you think?

+#ifdef UNROLL_PRAGMA_PARAM
+#define VHOST_UNROLL_PRAGMA(param) _Pragma(param)
+#else
+#define VHOST_UNROLL_PRAGMA(param) do {} while (0);
+#endif

+   VHOST_UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
+   for (i = 0; i < PACKED_BATCH_SIZE; i++)

Regards,
Marvin

> Thanks,
> Maxime
> 
> ### [PATCH] vhost: try to unroll for each loop
> 
> WARNING:CAMELCASE: Avoid CamelCase: <_Pragma>
> #78: FILE: lib/librte_vhost/vhost.h:47:
> +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("GCC unroll
> 4") \
> 
> ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in
> parenthesis
> #78: FILE: lib/librte_vhost/vhost.h:47:
> +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("GCC unroll
> 4") \
> + for (iter = val; iter < size; iter++)
> 
> ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in
> parenthesis
> #83: FILE: lib/librte_vhost/vhost.h:52:
> +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("unroll 4") \
> + for (iter = val; iter < size; iter++)
> 
> ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in
> parenthesis
> #88: FILE: lib/librte_vhost/vhost.h:57:
> +#define vhost_for_each_try_unroll(iter, val, size) _Pragma("unroll (4)") \
> + for (iter = val; iter < size; iter++)
> 
> total: 3 errors, 1 warnings, 67 lines checked
> 
> 0/1 valid patch
>
> /tmp/dpdk_build/lib/librte_vhost/virtio_net.c:2065:1:
> error: unused function 'free_zmbuf' [-Werror,-Wunused-function]
> free_zmbuf(struct vhost_virtqueue *vq)
> ^
> 1 error generated.
> make[5]: *** [virtio_net.o] Error 1
> make[4]: *** [librte_vhost] Error 2
> make[4]: *** Waiting for unfinished jobs
> make[3]: *** [lib] Error 2
> make[2]: *** [all] Error 2
> make[1]: *** [pre_install] Error 2
> make: *** [install] Error 2
> 
> 
> On 10/22/19 12:08 AM, Marvin Liu wrote:
> > Packed ring has more compact ring format and thus can significantly
> > reduce the number of cache miss. It can lead to better performance.
> > This has been approved in virtio user driver, on normal E5 Xeon cpu
> > single core performance can raise 12%.
> >
> > http://mails.dpdk.org/archives/dev/2018-April/095470.html
> >
> > However vhost performance with packed ring performance was decreased.
> > Through analysis, mostly extra cost was from the calculating of each
> > descriptor flag which depended on ring wrap counter. Moreover, both
> > frontend and backend need to write same descriptors which will cause
> > cache contention. Especially when doing vhost enqueue function, virtio
> > refill packed ring function may write same cache line when vhost doing
> > enqueue function. This kind of extra cache cost will reduce the benefit
> > of reducing cache misses.
> >
> > For optimizing vhost packed ring performance, vhost enqueue and dequeue
> > function will be splitted into fast and normal path.
> >
> > Several methods will be taken in fast path:
> >   Handle descriptors in one cache line by batch.
> >   Split loop function into more pieces and unroll them.
> >   Prerequisite check that whether I/O space can copy directly into mbuf
> > space and vice versa.
> >   Prerequisite check that whether descriptor mapping is successful.
> >   Distinguish vhost used ring update function by enqueue and dequeue
> > function.
> >   Buffer dequeue used descriptors as many as possible.
> >   Update enqueue used descriptors by cache line.
> >
> > After all these methods done, single core vhost PvP performance with 64B
> > packet on Xeon 8180 can boost 35%.
> >
> > v8:
> > - Allocate mbuf by virtio_dev_pktmbuf_alloc
> >
> > v7:
> > - Rebase code
> > - Rename unroll macro and definitions
> > - Calculate flags when doing single dequeue
> >
> > v6:
> > - Fix dequeue zcopy result check
> >
> > v5:
> > - Remove disa

Re: [dpdk-dev] [PATCH v7 06/13] vhost: add packed ring batch dequeue

2019-10-21 Thread Liu, Yong
Thanks Maxime, this has been modified in v8.

> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Monday, October 21, 2019 5:47 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong ; step...@networkplumber.org;
> gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v7 06/13] vhost: add packed ring batch dequeue
> 
> 
> 
> On 10/21/19 5:40 PM, Marvin Liu wrote:
> > Add batch dequeue function like enqueue function for packed ring, batch
> > dequeue function will not support chained descritpors, single packet
> > dequeue function will handle it.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index a2b9221e0..67724c342 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -39,6 +39,9 @@
> >
> >  #define VHOST_LOG_CACHE_NR 32
> >
> > +#define PACKED_DESC_SINGLE_DEQUEUE_FLAG (VRING_DESC_F_NEXT | \
> > +VRING_DESC_F_INDIRECT)
> > +
> >  #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
> > sizeof(struct vring_packed_desc))
> >  #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
> > diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> > index 317be1aed..f13fcafbb 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -1635,6 +1635,114 @@ virtio_dev_tx_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> > return i;
> >  }
> >
> > +static __rte_always_inline int
> > +vhost_reserve_avail_batch_packed(struct virtio_net *dev,
> > +struct vhost_virtqueue *vq,
> > +struct rte_mempool *mbuf_pool,
> > +struct rte_mbuf **pkts,
> > +uint16_t avail_idx,
> > +uintptr_t *desc_addrs,
> > +uint16_t *ids)
> > +{
> > +   bool wrap = vq->avail_wrap_counter;
> > +   struct vring_packed_desc *descs = vq->desc_packed;
> > +   struct virtio_net_hdr *hdr;
> > +   uint64_t lens[PACKED_BATCH_SIZE];
> > +   uint64_t buf_lens[PACKED_BATCH_SIZE];
> > +   uint32_t buf_offset = dev->vhost_hlen;
> > +   uint16_t flags, i;
> > +
> > +   if (unlikely(avail_idx & PACKED_BATCH_MASK))
> > +   return -1;
> > +   if (unlikely((avail_idx + PACKED_BATCH_SIZE) > vq->size))
> > +   return -1;
> > +
> > +   vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   flags = descs[avail_idx + i].flags;
> > +   if (unlikely((wrap != !!(flags & VRING_DESC_F_AVAIL)) ||
> > +(wrap == !!(flags & VRING_DESC_F_USED))  ||
> > +(flags & PACKED_DESC_SINGLE_DEQUEUE_FLAG)))
> > +   return -1;
> > +   }
> > +
> > +   rte_smp_rmb();
> > +
> > +   vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> > +   lens[i] = descs[avail_idx + i].len;
> > +
> > +   vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   desc_addrs[i] = vhost_iova_to_vva(dev, vq,
> > + descs[avail_idx + i].addr,
> > + &lens[i], VHOST_ACCESS_RW);
> > +   }
> > +
> > +   vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   if (unlikely((lens[i] != descs[avail_idx + i].len)))
> > +   return -1;
> > +   }
> > +
> > +   if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts, PACKED_BATCH_SIZE))
> 
> Same here, you may want to create a variant of Flavio's
> virtio_dev_pktmbuf_alloc for bulk allocations.

> 
> > +   return -1;
> > +
> > +   vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> > +   buf_lens[i] = pkts[i]->buf_len - pkts[i]->data_off;
> > +
> > +   vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   if (unlikely(buf_lens[i] < (lens[i] - buf_offset)))
> > +   goto free_buf;
> > +   }
> > +
> > +   vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   pkts[i]->pkt_len = descs[avail_idx + i].len - buf_offset;
> > +   pkts[i]->data_len = pkts[i]->pkt_len;
> > +   ids[i] = descs[avail_idx + i].id;
> > +   }
> > +
> > +   if (virtio_net_with_host_offload(dev)) {
> > +   vhost_for_each_try_un

Re: [dpdk-dev] [PATCH v6 00/13] vhost packed ring performance optimization

2019-10-17 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Thursday, October 17, 2019 3:31 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong ; step...@networkplumber.org;
> gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v6 00/13] vhost packed ring performance optimization
> 
> Hi Marvin,
> 
> This is almost good, just fix the small comments I made.
> 
> Also, please rebase on top of next-virtio branch, because I applied
> below patch from Flavio that you need to take into account:
> 
> http://patches.dpdk.org/patch/61284/

Thanks, Maxime. I will start the rebasing work.

> 
> Regards,
> Maxime
> 
> On 10/15/19 6:07 PM, Marvin Liu wrote:
> > Packed ring has more compact ring format and thus can significantly
> > reduce the number of cache miss. It can lead to better performance.
> > This has been approved in virtio user driver, on normal E5 Xeon cpu
> > single core performance can raise 12%.
> >
> > http://mails.dpdk.org/archives/dev/2018-April/095470.html
> >
> > However vhost performance with packed ring performance was decreased.
> > Through analysis, mostly extra cost was from the calculating of each
> > descriptor flag which depended on ring wrap counter. Moreover, both
> > frontend and backend need to write same descriptors which will cause
> > cache contention. Especially when doing vhost enqueue function, virtio
> > refill packed ring function may write same cache line when vhost doing
> > enqueue function. This kind of extra cache cost will reduce the benefit
> > of reducing cache misses.
> >
> > For optimizing vhost packed ring performance, vhost enqueue and dequeue
> > function will be splitted into fast and normal path.
> >
> > Several methods will be taken in fast path:
> >   Handle descriptors in one cache line by batch.
> >   Split loop function into more pieces and unroll them.
> >   Prerequisite check that whether I/O space can copy directly into mbuf
> > space and vice versa.
> >   Prerequisite check that whether descriptor mapping is successful.
> >   Distinguish vhost used ring update function by enqueue and dequeue
> > function.
> >   Buffer dequeue used descriptors as many as possible.
> >   Update enqueue used descriptors by cache line.
> >
> > After all these methods done, single core vhost PvP performance with 64B
> > packet on Xeon 8180 can boost 35%.
> >
> > v6:
> > - Fix dequeue zcopy result check
> >
> > v5:
> > - Remove disable sw prefetch as performance impact is small
> > - Change unroll pragma macro format
> > - Rename shadow counter elements names
> > - Clean dequeue update check condition
> > - Add inline functions replace of duplicated code
> > - Unify code style
> >
> > v4:
> > - Support meson build
> > - Remove memory region cache for no clear performance gain and ABI break
> > - Not assume ring size is power of two
> >
> > v3:
> > - Check available index overflow
> > - Remove dequeue remained descs number check
> > - Remove changes in split ring datapath
> > - Call memory write barriers once when updating used flags
> > - Rename some functions and macros
> > - Code style optimization
> >
> > v2:
> > - Utilize compiler's pragma to unroll loop, distinguish clang/icc/gcc
> > - Buffered dequeue used desc number changed to (RING_SZ - PKT_BURST)
> > - Optimize dequeue used ring update when in_order negotiated
> >
> >
> > Marvin Liu (13):
> >   vhost: add packed ring indexes increasing function
> >   vhost: add packed ring single enqueue
> >   vhost: try to unroll for each loop
> >   vhost: add packed ring batch enqueue
> >   vhost: add packed ring single dequeue
> >   vhost: add packed ring batch dequeue
> >   vhost: flush enqueue updates by batch
> >   vhost: flush batched enqueue descs directly
> >   vhost: buffer packed ring dequeue updates
> >   vhost: optimize packed ring enqueue
> >   vhost: add packed ring zcopy batch and single dequeue
> >   vhost: optimize packed ring dequeue
> >   vhost: optimize packed ring dequeue when in-order
> >
> >  lib/librte_vhost/Makefile |  18 +
> >  lib/librte_vhost/meson.build  |   7 +
> >  lib/librte_vhost/vhost.h  |  57 +++
> >  lib/librte_vhost/virtio_net.c | 924 +++---
> >  4 files changed, 812 insertions(+), 194 deletions(-)
> >


Re: [dpdk-dev] [PATCH v6 06/13] vhost: add packed ring batch dequeue

2019-10-16 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Wednesday, October 16, 2019 6:36 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong ; step...@networkplumber.org;
> gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v6 06/13] vhost: add packed ring batch dequeue
> 
> 
> 
> On 10/15/19 6:07 PM, Marvin Liu wrote:
> > Add batch dequeue function like enqueue function for packed ring, batch
> > dequeue function will not support chained descritpors, single packet
> > dequeue function will handle it.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 18d01cb19..96bf763b1 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -39,6 +39,9 @@
> >
> >  #define VHOST_LOG_CACHE_NR 32
> >
> > +#define PACKED_DESC_SINGLE_DEQUEUE_FLAG (VRING_DESC_F_NEXT | \
> > +VRING_DESC_F_INDIRECT)
> > +
> >  #define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
> > sizeof(struct vring_packed_desc))
> >  #define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
> > diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> > index e1b06c1ce..274a28f99 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -1551,6 +1551,113 @@ virtio_dev_tx_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> > return i;
> >  }
> >
> > +static __rte_always_inline int
> > +vhost_reserve_avail_batch_packed(struct virtio_net *dev,
> > +struct vhost_virtqueue *vq,
> > +struct rte_mempool *mbuf_pool,
> > +struct rte_mbuf **pkts,
> > +uint16_t avail_idx,
> > +uintptr_t *desc_addrs,
> > +uint16_t *ids)
> > +{
> > +   bool wrap = vq->avail_wrap_counter;
> > +   struct vring_packed_desc *descs = vq->desc_packed;
> > +   struct virtio_net_hdr *hdr;
> > +   uint64_t lens[PACKED_BATCH_SIZE];
> > +   uint64_t buf_lens[PACKED_BATCH_SIZE];
> > +   uint32_t buf_offset = dev->vhost_hlen;
> > +   uint16_t flags, i;
> > +
> > +   if (unlikely(avail_idx & PACKED_BATCH_MASK))
> > +   return -1;
> > +   if (unlikely((avail_idx + PACKED_BATCH_SIZE) > vq->size))
> > +   return -1;
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   flags = descs[avail_idx + i].flags;
> > +   if (unlikely((wrap != !!(flags & VRING_DESC_F_AVAIL)) ||
> > +(wrap == !!(flags & VRING_DESC_F_USED))  ||
> > +(flags & PACKED_DESC_SINGLE_DEQUEUE_FLAG)))
> > +   return -1;
> > +   }
> > +
> > +   rte_smp_rmb();
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> > +   lens[i] = descs[avail_idx + i].len;
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   desc_addrs[i] = vhost_iova_to_vva(dev, vq,
> > + descs[avail_idx + i].addr,
> > + &lens[i], VHOST_ACCESS_RW);
> > +   }
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   if (unlikely((lens[i] != descs[avail_idx + i].len)))
> > +   return -1;
> > +   }
> > +
> > +   if (rte_pktmbuf_alloc_bulk(mbuf_pool, pkts, PACKED_BATCH_SIZE))
> > +   return -1;
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> > +   buf_lens[i] = pkts[i]->buf_len - pkts[i]->data_off;
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   if (unlikely(buf_lens[i] < (lens[i] - buf_offset)))
> > +   goto free_buf;
> > +   }
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   pkts[i]->pkt_len = descs[avail_idx + i].len - buf_offset;
> > +   pkts[i]->data_len = pkts[i]->pkt_len;
> > +   ids[i] = descs[avail_idx + i].id;
> > +   }
> > +
> > +   if (virtio_net_with_host_offload(dev)) {
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   hdr = (struct virtio_net_hdr *)(desc_addrs[i]);
> > +   vhost_dequeue_offload(hdr, pkts[i]);
> > +

Re: [dpdk-dev] [PATCH v6 04/13] vhost: add packed ring batch enqueue

2019-10-15 Thread Liu, Yong



> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Tuesday, October 15, 2019 7:36 PM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong ;
> step...@networkplumber.org
> Cc: dev@dpdk.org; nd 
> Subject: RE: [PATCH v6 04/13] vhost: add packed ring batch enqueue
> 
> Hi Marvin,
> 
> > -Original Message-
> > From: Marvin Liu 
> > Sent: Wednesday, October 16, 2019 12:08 AM
> > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > zhihong.w...@intel.com; step...@networkplumber.org; Gavin Hu (Arm
> > Technology China) 
> > Cc: dev@dpdk.org; Marvin Liu 
> > Subject: [PATCH v6 04/13] vhost: add packed ring batch enqueue
> >
> > Batch enqueue function will first check whether descriptors are cache
> > aligned. It will also check prerequisites in the beginning. Batch
> > enqueue function do not support chained mbufs, single packet enqueue
> > function will handle it.
> >
> > Signed-off-by: Marvin Liu 
> > Reviewed-by: Maxime Coquelin 
> >
> > diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> > index 142c14e04..a8130dc06 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -881,6 +881,76 @@ virtio_dev_rx_split(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> > return pkt_idx;
> >  }
> >
> > +static __rte_unused int
> > +virtio_dev_rx_batch_packed(struct virtio_net *dev,
> > +  struct vhost_virtqueue *vq,
> > +  struct rte_mbuf **pkts)
> > +{
> > +   bool wrap_counter = vq->avail_wrap_counter;
> > +   struct vring_packed_desc *descs = vq->desc_packed;
> > +   uint16_t avail_idx = vq->last_avail_idx;
> > +   uint64_t desc_addrs[PACKED_BATCH_SIZE];
> > +   struct virtio_net_hdr_mrg_rxbuf *hdrs[PACKED_BATCH_SIZE];
> > +   uint32_t buf_offset = dev->vhost_hlen;
> > +   uint64_t lens[PACKED_BATCH_SIZE];
> > +   uint16_t i;
> > +
> > +   if (unlikely(avail_idx & PACKED_BATCH_MASK))
> > +   return -1;
> > +
> > +   if (unlikely((avail_idx + PACKED_BATCH_SIZE) > vq->size))
> > +   return -1;
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   if (unlikely(pkts[i]->next != NULL))
> > +   return -1;
> > +   if (unlikely(!desc_is_avail(&descs[avail_idx + i],
> > +   wrap_counter)))
> > +   return -1;
> > +   }
> > +
> > +   rte_smp_rmb();
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> > +   lens[i] = descs[avail_idx + i].len;
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   if (unlikely(pkts[i]->pkt_len > (lens[i] - buf_offset)))
> > +   return -1;
> > +   }
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> > +   desc_addrs[i] = vhost_iova_to_vva(dev, vq,
> > + descs[avail_idx + i].addr,
> > + &lens[i],
> > + VHOST_ACCESS_RW);
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   if (unlikely(lens[i] != descs[avail_idx + i].len))
> > +   return -1;
> > +   }
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   rte_prefetch0((void *)(uintptr_t)desc_addrs[i]);
> > +   hdrs[i] = (struct virtio_net_hdr_mrg_rxbuf *)
> > +   (uintptr_t)desc_addrs[i];
> > +   lens[i] = pkts[i]->pkt_len + dev->vhost_hlen;
> > +   }
> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
> > +   virtio_enqueue_offload(pkts[i], &hdrs[i]->hdr);
> > +
> > +   vq_inc_last_avail_packed(vq, PACKED_BATCH_SIZE);
> 
> Is the last_avail_idx a shared variable? Why is updated before the
> following payload copy?
> This will cause the other side get earlier-than-arrival data?
> /Gavin

Hi Gavin,
last_avail_idx and last_used_idx are both vhost-local variables;
they track the next available and the next used index of the virtqueue.
The last_avail_idx value should increase after the descriptors are consumed,
and the last_used_idx value should increase after the descriptor flags are
updated.
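
For reference, the index-advance helper used above behaves roughly like this
(a sketch; the exact code is in the series):

	static __rte_always_inline void
	vq_inc_last_avail_packed(struct vhost_virtqueue *vq, uint16_t num)
	{
		vq->last_avail_idx += num;
		if (vq->last_avail_idx >= vq->size) {
			vq->avail_wrap_counter ^= 1;
			vq->last_avail_idx -= vq->size;
		}
	}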

Thanks,
Marvin

> > +
> > +   for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
> > +   rte_memcpy((void *)(uintptr_t)(desc_addrs[i] + buf_offset),
> > +  rte_pktmbuf_mtod_offset(pkts[i], void *, 0),
> > +  pkts[i]->pkt_len);
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> >  static __rte_unused int16_t
> >  virtio_dev_rx_single_packed(struct virtio_net *dev,
> > struct vhost_virtqueue *vq,
> > --
> > 2.17.1



Re: [dpdk-dev] [PATCH v4 03/14] vhost: add batch enqueue function for packed ring

2019-10-14 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Friday, October 11, 2019 10:22 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong ; step...@networkplumber.org;
> gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v4 03/14] vhost: add batch enqueue function for packed
> ring
> 
> 
> 
> On 10/9/19 3:38 PM, Marvin Liu wrote:
> > Batch enqueue function will first check whether descriptors are cache
> > aligned. It will also check prerequisites in the beginning. Batch
> > enqueue function not support chained mbufs, single packet enqueue
> > function will handle it.
> >
> > Signed-off-by: Marvin Liu 
> >
> 
> Thinking again about this patch and series in general...
> 
> So this series improves performance by 40% in cases where:
>  - descriptors are cache aligned
>  - single mbuf
> 
> But my understanding is that it will cause performance regression for
> the other cases, which may not be that uncommon, no?
> 
> Do you have some number about the performance impact on these other
> cases?
> 

Hi Maxime,
Checking the prerequisites for batch handling is pretty simple and fast,
so it has almost no performance impact on the uncommon cases.
Chained packets can even benefit slightly from the cache-related optimization.
As shown in the table below, all the cases I ran benefit from the vhost
optimization. From our experiments, more performance gain can be seen when
more packets are handled by batch.

+-----------------------------------+-------+-------+
|                                   | 19.08 | + opt |
+-----------------------------------+-------+-------+
| 1518B PvP                         | 2.63M | 2.98M |
| 64B loopback                      | 7.81M | 12.0M |
| 1518B loopback                    | 3.59M | 4.69M |
| 16K chained loopback              | 297K  | 306K  |
| 50% 256B + 50% 16K                | 296K  | 309K  |
| pktgen_sample03_burst_single_flow | 6.03M | 6.39M |
+-----------------------------------+-------+-------+

Regards,
Marvin

> Thanks,
> Maxime


Re: [dpdk-dev] [PATCH v4 02/14] vhost: unify unroll pragma parameter

2019-10-13 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Friday, October 11, 2019 8:49 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong ; step...@networkplumber.org;
> gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v4 02/14] vhost: unify unroll pragma parameter
> 
> 
> 
> On 10/9/19 3:38 PM, Marvin Liu wrote:
> > Add macro for unifying Clang/ICC/GCC unroll pragma format. Batch
> > functions were contained of several small loops which optimized by
> > compiler’s loop unrolling pragma.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index 8623e91c0..30839a001 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -16,6 +16,24 @@ CFLAGS += -I vhost_user
> >  CFLAGS += -fno-strict-aliasing
> >  LDLIBS += -lpthread
> >
> > +ifeq ($(RTE_TOOLCHAIN), gcc)
> > +ifeq ($(shell test $(GCC_VERSION) -ge 83 && echo 1), 1)
> > +CFLAGS += -DSUPPORT_GCC_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +ifeq ($(RTE_TOOLCHAIN), clang)
> > +ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -ge 37
> && echo 1), 1)
> > +CFLAGS += -DSUPPORT_CLANG_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +ifeq ($(RTE_TOOLCHAIN), icc)
> > +ifeq ($(shell test $(ICC_MAJOR_VERSION) -ge 16 && echo 1), 1)
> > +CFLAGS += -DSUPPORT_ICC_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> >  ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y)
> >  LDLIBS += -lnuma
> >  endif
> > diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> > index cb1123ae3..ddf0ee579 100644
> > --- a/lib/librte_vhost/meson.build
> > +++ b/lib/librte_vhost/meson.build
> > @@ -8,6 +8,13 @@ endif
> >  if has_libnuma == 1
> > dpdk_conf.set10('RTE_LIBRTE_VHOST_NUMA', true)
> >  endif
> > +if (toolchain == 'gcc' and cc.version().version_compare('>=8.3.0'))
> > +   cflags += '-DSUPPORT_GCC_UNROLL_PRAGMA'
> > +elif (toolchain == 'clang' and cc.version().version_compare('>=3.7.0'))
> > +   cflags += '-DSUPPORT_CLANG_UNROLL_PRAGMA'
> > +elif (toolchain == 'icc' and cc.version().version_compare('>=16.0.0'))
> > +   cflags += '-DSUPPORT_ICC_UNROLL_PRAGMA'
> > +endif
> >  dpdk_conf.set('RTE_LIBRTE_VHOST_POSTCOPY',
> >   cc.has_header('linux/userfaultfd.h'))
> >  version = 4
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 884befa85..4cba8c5ef 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -39,6 +39,24 @@
> >
> >  #define VHOST_LOG_CACHE_NR 32
> >
> > +#ifdef SUPPORT_GCC_UNROLL_PRAGMA
> > +#define UNROLL_PRAGMA_PARAM "GCC unroll 4"
> 
> Shouldn't al these defines be either prefixed with VHOST_, or being
> declared in EAL headers, so that it can be used by other DPDK libs?
> 
> I will pick it as is for now, but please consider above comment and
> and send a patch on top if it makes sense.
> 

Hi Maxime,
To make the loop-unroll macro more generic, I modified it as below.
Since only vhost utilizes the compiler's unroll feature, I'd like to keep it
in vhost for now.

#ifdef SUPPORT_GCC_UNROLL_PRAGMA
#define for_each_try_unroll(iter, val, size) _Pragma("GCC unroll 4") \
for (iter = val; iter < size; iter++)
#endif

#ifdef SUPPORT_CLANG_UNROLL_PRAGMA
#define for_each_try_unroll(iter, val, size) _Pragma("unroll 4") \
for (iter = val; iter < size; iter++)
#endif

#ifdef SUPPORT_ICC_UNROLL_PRAGMA
#define for_each_try_unroll(iter, val, size) _Pragma("unroll (4)") \
for (iter = val; iter < size; iter++)
#endif

#ifndef for_each_try_unroll
#define for_each_try_unroll(iter, val, num) \
for (iter = val; iter < num; iter++)
#endif
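
A typical use then looks like:

	uint16_t i;
	uint64_t lens[PACKED_BATCH_SIZE];

	for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
		lens[i] = descs[avail_idx + i].len;

Because the trip count is a small compile-time constant, the pragma lets the
compiler fully unroll the loop; on compilers without unroll-pragma support
the macro degrades to a plain for loop.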

Regards,
Marvin

> Thanks,
> Maxime
> > +#endif
> > +
> > +#ifdef SUPPORT_CLANG_UNROLL_PRAGMA
> > +#define UNROLL_PRAGMA_PARAM "unroll 4"
> > +#endif
> > +
> > +#ifdef SUPPORT_ICC_UNROLL_PRAGMA
> > +#define UNROLL_PRAGMA_PARAM "unroll (4)"
> > +#endif
> > +
> > +#ifdef UNROLL_PRAGMA_PARAM
> > +#define UNROLL_PRAGMA(param) _Pragma(param)
> > +#else
> > +#define UNROLL_PRAGMA(param) do {} while (0);
> > +#endif
> > +
> >  /**
> >   * Structure contains buffer address, length and descriptor index
> >   * from vring to do scatter RX.
> >


Re: [dpdk-dev] [PATCH v4 13/14] vhost: check whether disable software pre-fetch

2019-10-13 Thread Liu, Yong


> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Friday, October 11, 2019 10:12 PM
> To: Liu, Yong ; Bie, Tiwei ; Wang,
> Zhihong ; step...@networkplumber.org;
> gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v4 13/14] vhost: check whether disable software pre-
> fetch
> 
> 
> 
> On 10/9/19 3:38 PM, Marvin Liu wrote:
> > Disable software pre-fetch actions on Skylake and later platforms.
> > Hardware can fetch needed data for vhost, additional software pre-fetch
> > will impact performance.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index 30839a001..5f3b42e56 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -16,6 +16,12 @@ CFLAGS += -I vhost_user
> >  CFLAGS += -fno-strict-aliasing
> >  LDLIBS += -lpthread
> >
> > +AVX512_SUPPORT=$(shell $(CC) -march=native -dM -E - < /dev/null | grep AVX512F)
> > +
> > +ifneq ($(AVX512_SUPPORT),)
> > +CFLAGS += -DDISABLE_SWPREFETCH
> > +endif
> 
> That's problematic I think, because the machine running the lib may be
> different from the machine building it, for example distros.
> 
> In this case, a Skylake or later may be used to build the package, but
> with passing "-march=haswell". It would end-up prefetching being
> disabled whereas we would expect it to be enabled.
> 
Thanks, Maxime. I got your idea: the build environment and the running
environment may be different.
The performance impact on Skylake is around 1% with the v1 patch in the
vhost/virtio loopback scenario.
Since the impact is very small, and there is no impact in the later revised
version, I'd like to remove this patch.

Regards,
Marvin

> I see several solutions:
> - Check for CONFIG_RTE_ENABLE_AVX512 flag.
> - Keep prefetch instructions (what would be the impact on Skylake and
>   later?)
> - Remove prefetch instructions (what would be the impact on pre-
>   Skylake?)
> 
> 
> But really, I think we need some figures before applying such a patch.
> What performance gain do you measure with this patch?
> 
> >  ifeq ($(RTE_TOOLCHAIN), gcc)
> >  ifeq ($(shell test $(GCC_VERSION) -ge 83 && echo 1), 1)
> >  CFLAGS += -DSUPPORT_GCC_UNROLL_PRAGMA
> > diff --git a/lib/librte_vhost/meson.build b/lib/librte_vhost/meson.build
> > index ddf0ee579..5c6f0c0b4 100644
> > --- a/lib/librte_vhost/meson.build
> > +++ b/lib/librte_vhost/meson.build
> > @@ -15,6 +15,10 @@ elif (toolchain == 'clang' and
> cc.version().version_compare('>=3.7.0'))
> >  elif (toolchain == 'icc' and cc.version().version_compare('>=16.0.0'))
> > cflags += '-DSUPPORT_ICC_UNROLL_PRAGMA'
> >  endif
> > +r = run_command(toolchain, '-march=native', '-dM', '-E', '-',
> > +if (r.stdout().strip() != '')
> > +   cflags += '-DDISABLE_SWPREFETCH'
> > +endif
> >  dpdk_conf.set('RTE_LIBRTE_VHOST_POSTCOPY',
> >   cc.has_header('linux/userfaultfd.h'))
> >  version = 4
> > diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> > index 56c2080fb..046e497c2 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -1075,7 +1075,9 @@ virtio_dev_rx_batch_packed(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> >
> > UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > for (i = 0; i < PACKED_BATCH_SIZE; i++) {
> > +#ifndef DISABLE_SWPREFETCH
> > rte_prefetch0((void *)(uintptr_t)desc_addrs[i]);
> > +#endif
> > hdrs[i] = (struct virtio_net_hdr_mrg_rxbuf *)
> > (uintptr_t)desc_addrs[i];
> > lens[i] = pkts[i]->pkt_len + dev->vhost_hlen;
> > @@ -1144,7 +1146,9 @@ virtio_dev_rx_packed(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> > uint32_t remained = count;
> >
> > do {
> > +#ifndef DISABLE_SWPREFETCH
> > rte_prefetch0(&vq->desc_packed[vq->last_avail_idx]);
> > +#endif
> >
> > if (remained >= PACKED_BATCH_SIZE) {
> > if (!virtio_dev_rx_batch_packed(dev, vq, pkts)) {
> > @@ -1790,7 +1794,9 @@ virtio_dev_tx_batch_packed(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> >
> > UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > for (i = 0; i < PACKED_BATCH_SIZE; i++) {
> > +#ifndef DISABLE_SWPREFETCH
> > rte_prefetch0((void *)(uintptr_t)desc_addrs[i]);
> > +#endif
> > rte_memcpy(rte_pktmbuf_mtod_offset(pkts[i], void *, 0),
> >(void *)(uintptr_t)(desc_addrs[i] + buf_offset),
> >pkts[i]->pkt_len);
> > @@ -2046,7 +2052,9 @@ virtio_dev_tx_packed(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> > uint32_t remained = count;
> >
> > do {
> > +#ifndef DISABLE_SWPREFETCH
> > rte_prefetch0(&vq->desc_packed[vq->last_avail_idx]);
> > +#endif
> >
> > if (remained >= PACKED_BATCH_SIZE) {
> > if (!virtio_dev_tx_batch_packed(dev, vq, mbuf_pool,
> >


Re: [dpdk-dev] [PATCH v3 03/15] vhost: add batch enqueue function for packed ring

2019-10-08 Thread Liu, Yong


> -Original Message-
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Thursday, September 26, 2019 3:31 AM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong ;
> step...@networkplumber.org; gavin...@arm.com
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 03/15] vhost: add batch enqueue function
> for packed ring
> 
> On 2019-09-25 19:13, Marvin Liu wrote:
> > Batch enqueue function will first check whether descriptors are cache
> > aligned. It will also check prerequisites in the beginning. Batch
> > enqueue function not support chained mbufs, single packet enqueue
> > function will handle it.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 4cba8c5ef..e241436c7 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -39,6 +39,10 @@
> >
> >   #define VHOST_LOG_CACHE_NR 32
> >
> > +#define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
> > +   sizeof(struct vring_packed_desc))
> > +#define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)
> > +
> >   #ifdef SUPPORT_GCC_UNROLL_PRAGMA
> >   #define UNROLL_PRAGMA_PARAM "GCC unroll 4"
> >   #endif
> > diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> > index 520c4c6a8..5e08f7d9b 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -883,6 +883,86 @@ virtio_dev_rx_split(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> > return pkt_idx;
> >   }
> >
> > +static __rte_unused int
> > +virtio_dev_rx_batch_packed(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> > +struct rte_mbuf **pkts)
> > +{
> > +   bool wrap_counter = vq->avail_wrap_counter;
> > +   struct vring_packed_desc *descs = vq->desc_packed;
> > +   uint16_t avail_idx = vq->last_avail_idx;
> > +   uint64_t desc_addrs[PACKED_BATCH_SIZE];
> > +   struct virtio_net_hdr_mrg_rxbuf *hdrs[PACKED_BATCH_SIZE];
> > +   uint32_t buf_offset = dev->vhost_hlen;
> > +   uint64_t lens[PACKED_BATCH_SIZE];
> > +   uint16_t i;
> > +
> > +   if (unlikely(avail_idx & PACKED_BATCH_MASK))
> > +   return -1;
> 
> Does this really generate better code than just "avail_idx <
> PACKED_BATCH_SIZE"? and+jne vs cmp+jbe.

Hi Mattias,
This comparison checks whether the descriptor location is cache aligned.
On x86 the cache line size is 64 bytes, so the mask here is 0x3. The check
compiles to and + test + je, which is very cheap.
Most of the time the execution cost is hidden, as the branch result is easily
predicted.
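
Spelled out, assuming a 64-byte cache line and the 16-byte vring_packed_desc:

	/* 64 / 16 = 4 descriptors fit in one cache line */
	#define PACKED_BATCH_SIZE (RTE_CACHE_LINE_SIZE / \
				   sizeof(struct vring_packed_desc))
	#define PACKED_BATCH_MASK (PACKED_BATCH_SIZE - 1)	/* 0x3 */

	/* non-zero only when avail_idx is not a multiple of 4, i.e. when
	 * the four descriptors would straddle a cache line boundary */
	if (unlikely(avail_idx & PACKED_BATCH_MASK))
		return -1;	/* fall back to the single-packet path */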

Thanks,
Marvin

> 
> > +
> > +   if (unlikely((avail_idx + PACKED_BATCH_SIZE) > vq->size))
> > +   return -1;
> > +
> > +   UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_BATCH_SIZE; i++) {
> > +   if (unlikely(pkts[i]->next != NULL))
> > +   return -1;
> > +   if (unlikely(!desc_is_avail(&descs[avail_idx + i],
> > +   wrap_counter)))
> > +   return -1;
> > +   }
> > +
> > +   rte_smp_rmb();
> > +
> > +   UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_BATCH_SIZE; i++)
> > +   lens[i] = descs[avail_idx + i].len;
> > +
> > +   UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_BATCH_SIZE; i++) {
> > +   if (unlikely(pkts[i]->pkt_len > (lens[i] - buf_offset)))
> > +   return -1;
> > +   }
> > +
> > +   UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_BATCH_SIZE; i++)
> > +   desc_addrs[i] = vhost_iova_to_vva(dev, vq,
> > + descs[avail_idx + i].addr,
> > + &lens[i],
> > + VHOST_ACCESS_RW);
> > +   UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_BATCH_SIZE; i++) {
> > +   if (unlikely(lens[i] != descs[avail_idx + i].len))
> > +   return -1;
> > +   }
> > +
> > +   UNROLL_PRAGMA(UNROLL_PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_BATCH_SIZE; i++) {
> > +   rte_prefetch0((void *)(uintptr_t)desc_addrs[i]);
> > +   hdrs[i] = (struct virtio_net_hdr_mrg_rxbuf *)
> > +   (uintptr_t)desc_addrs[i];
> > +   len

Re: [dpdk-dev] [PATCH v3 13/15] vhost: cache address translation result

2019-10-08 Thread Liu, Yong


> -Original Message-
> From: Bie, Tiwei
> Sent: Thursday, September 26, 2019 1:32 PM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; Wang, Zhihong ;
> step...@networkplumber.org; gavin...@arm.com; dev@dpdk.org
> Subject: Re: [PATCH v3 13/15] vhost: cache address translation result
> 
> On Thu, Sep 26, 2019 at 01:13:27AM +0800, Marvin Liu wrote:
> > Cache address translation result and use it in next translation. Due
> > to limited regions are supported, buffers are most likely in same
> > region when doing data transmission.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> > index 7fb172912..d90235cd6 100644
> > --- a/lib/librte_vhost/rte_vhost.h
> > +++ b/lib/librte_vhost/rte_vhost.h
> > @@ -91,10 +91,18 @@ struct rte_vhost_mem_region {
> > int fd;
> >  };
> >
> > +struct rte_vhost_mem_region_cache {
> > +   uint64_t guest_phys_addr;
> > +   uint64_t guest_phys_addr_end;
> > +   int64_t host_user_addr_offset;
> > +   uint64_t size;
> > +};
> > +
> >  /**
> >   * Memory structure includes region and mapping information.
> >   */
> >  struct rte_vhost_memory {
> > +   struct rte_vhost_mem_region_cache cache_region;
> 
> This breaks ABI.
> 
Got, will remove it as no clear performance gain with this patch.

> > uint32_t nregions;
> > struct rte_vhost_mem_region regions[];
> >  };
> > @@ -232,11 +240,30 @@ rte_vhost_va_from_guest_pa(struct rte_vhost_memory
> *mem,
> > struct rte_vhost_mem_region *r;
> > uint32_t i;
> >
> > +   struct rte_vhost_mem_region_cache *r_cache;
> > +   /* check with cached region */
> > +   r_cache = &mem->cache_region;
> > +   if (likely(gpa >= r_cache->guest_phys_addr && gpa <
> > +  r_cache->guest_phys_addr_end)) {
> > +   if (unlikely(*len > r_cache->guest_phys_addr_end - gpa))
> > +   *len = r_cache->guest_phys_addr_end - gpa;
> > +
> > +   return gpa - r_cache->host_user_addr_offset;
> > +   }
> 
> Does this help a lot in performance?
> We can implement this caching for builtin backend first.
> 
Tiwei,

It won't help performance much, as the region number is 1 most of the time.
I will remove the cache function in the next version.

Thanks,
Marvin
> 
> > +
> > +
> > for (i = 0; i < mem->nregions; i++) {
> > r = &mem->regions[i];
> > if (gpa >= r->guest_phys_addr &&
> > gpa <  r->guest_phys_addr + r->size) {
> >
> > +   r_cache->guest_phys_addr = r->guest_phys_addr;
> > +   r_cache->guest_phys_addr_end = r->guest_phys_addr +
> > +  r->size;
> > +   r_cache->size = r->size;
> > +   r_cache->host_user_addr_offset = r->guest_phys_addr -
> > +r->host_user_addr;
> > +
> > if (unlikely(*len > r->guest_phys_addr + r->size - gpa))
> > *len = r->guest_phys_addr + r->size - gpa;
> >
> > --
> > 2.17.1
> >


Re: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function for packed ring

2019-09-25 Thread Liu, Yong



> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Wednesday, September 25, 2019 5:29 PM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong 
> Cc: dev@dpdk.org; nd ; nd 
> Subject: RE: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function
> for packed ring
> 
> Hi Marvin,
> 
> > -----Original Message-
> > From: Liu, Yong 
> > Sent: Tuesday, September 24, 2019 11:31 AM
> > To: Gavin Hu (Arm Technology China) ;
> > maxime.coque...@redhat.com; Bie, Tiwei ; Wang,
> > Zhihong 
> > Cc: dev@dpdk.org; nd 
> > Subject: RE: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue
> function
> > for packed ring
> >
> > Thanks, Gavin. My comments are inline.
> >
> > > -Original Message-
> > > From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> > > Sent: Monday, September 23, 2019 7:09 PM
> > > To: Liu, Yong ; maxime.coque...@redhat.com; Bie,
> > Tiwei
> > > ; Wang, Zhihong 
> > > Cc: dev@dpdk.org; nd 
> > > Subject: RE: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue
> > function
> > > for packed ring
> > >
> > > Hi Marvin,
> > >
> > > Is it possible to vectorize the processing?
> > > Other comments inline:
> > > /Gavin
> >
> > Gavin,
> > According to our experiments, vectorizing only some parts of the
> > enqueue/dequeue functions does not improve performance.
> > Functions like vhost_iova_to_vva and virtio_enqueue_offload can't be
> > easily vectorized as they are full of conditional branches.
> >
> > Thanks,
> > Marvin
> >
> > > > -Original Message-
> > > > From: dev  On Behalf Of Marvin Liu
> > > > Sent: Friday, September 20, 2019 12:37 AM
> > > > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > > > zhihong.w...@intel.com
> > > > Cc: dev@dpdk.org; Marvin Liu 
> > > > Subject: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue
> function
> > > for
> > > > packed ring
> > > >
> > > > Burst enqueue function will first check whether descriptors are cache
> > > > aligned. It will also check prerequisites in the beginning. Burst
> > > > enqueue function not support chained mbufs, single packet enqueue
> > > > function will handle it.
> > > >
> > > > Signed-off-by: Marvin Liu 
> > > >
> > > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > > > index 5074226f0..67889c80a 100644
> > > > --- a/lib/librte_vhost/vhost.h
> > > > +++ b/lib/librte_vhost/vhost.h
> > > > @@ -39,6 +39,9 @@
> > > >
> > > >  #define VHOST_LOG_CACHE_NR 32
> > > >
> > > > +#define PACKED_DESCS_BURST (RTE_CACHE_LINE_SIZE / \
> > > > +   sizeof(struct vring_packed_desc))
> > > > +
> > > >  #ifdef SUPPORT_GCC_UNROLL_PRAGMA
> > > >  #define PRAGMA_PARAM "GCC unroll 4"
> > > >  #endif
> > > > @@ -57,6 +60,8 @@
> > > >  #define UNROLL_PRAGMA(param) do {} while(0);
> > > >  #endif
> > > >
> > > > +#define PACKED_BURST_MASK (PACKED_DESCS_BURST - 1)
> > > > +
> > > >  /**
> > > >   * Structure contains buffer address, length and descriptor index
> > > >   * from vring to do scatter RX.
> > > > diff --git a/lib/librte_vhost/virtio_net.c
> > > b/lib/librte_vhost/virtio_net.c
> > > > index 2b5c47145..c664b27c5 100644
> > > > --- a/lib/librte_vhost/virtio_net.c
> > > > +++ b/lib/librte_vhost/virtio_net.c
> > > > @@ -895,6 +895,84 @@ virtio_dev_rx_split(struct virtio_net *dev,
> > struct
> > > > vhost_virtqueue *vq,
> > > > return pkt_idx;
> > > >  }
> > > >
> > > > +static __rte_unused __rte_always_inline int
> > > I remember "__rte_always_inline" should start at the first and separate
> > > line, otherwise you will get a style issue.
> > > /Gavin
> > > > +virtio_dev_rx_burst_packed(struct virtio_net *dev, struct
> > > vhost_virtqueue
> > > > *vq,
> > > > +struct rte_mbuf **pkts)
> > > > +{
> > > > +   bool wrap_counter = vq->avail_wrap_counter;
> > > > +   struct vring_packed_desc *descs = vq->desc_packed;
> > > > +   

Re: [dpdk-dev] [PATCH v2 08/16] vhost: add flush function for burst enqueue

2019-09-24 Thread Liu, Yong
> -Original Message-
> From: Liu, Yong
> Sent: Wednesday, September 25, 2019 1:38 PM
> To: Gavin Hu (Arm Technology China) ;
> maxime.coque...@redhat.com; Bie, Tiwei ; Wang, Zhihong
> 
> Cc: dev@dpdk.org; nd 
> Subject: RE: [dpdk-dev] [PATCH v2 08/16] vhost: add flush function for
> burst enqueue
> 
> 
> 
> > -Original Message-
> > From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> > Sent: Wednesday, September 25, 2019 11:38 AM
> > To: Liu, Yong ; maxime.coque...@redhat.com; Bie,
> Tiwei
> > ; Wang, Zhihong 
> > Cc: dev@dpdk.org; nd 
> > Subject: RE: [dpdk-dev] [PATCH v2 08/16] vhost: add flush function for
> > burst enqueue
> >
> > Hi Marvin,
> >
> > One typo and one comment about the barrier.
> >
> > /Gavin
> >
> > > -Original Message-
> > > From: dev  On Behalf Of Marvin Liu
> > > Sent: Friday, September 20, 2019 12:37 AM
> > > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > > zhihong.w...@intel.com
> > > Cc: dev@dpdk.org; Marvin Liu 
> > > Subject: [dpdk-dev] [PATCH v2 08/16] vhost: add flush function for
> burst
> > > enqueue
> > >
> > > Flush used flags when burst enqueue function is finished. Descriptor's
> > > flags are pre-calculated as them will be reset by vhost.
> > s/them/they
> >
> 
> Thanks.
> 
> > >
> > > Signed-off-by: Marvin Liu 
> > >
> > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > > index 000648dd4..9c42c7db0 100644
> > > --- a/lib/librte_vhost/vhost.h
> > > +++ b/lib/librte_vhost/vhost.h
> > > @@ -39,6 +39,9 @@
> > >
> > >  #define VHOST_LOG_CACHE_NR 32
> > >
> > > +#define VIRTIO_RX_USED_FLAG  (0ULL | VRING_DESC_F_AVAIL |
> > > VRING_DESC_F_USED \
> > > + | VRING_DESC_F_WRITE)
> > > +#define VIRTIO_RX_USED_WRAP_FLAG (VRING_DESC_F_WRITE)
> > >  #define PACKED_DESCS_BURST (RTE_CACHE_LINE_SIZE / \
> > >   sizeof(struct vring_packed_desc))
> > >
> > > diff --git a/lib/librte_vhost/virtio_net.c
> > b/lib/librte_vhost/virtio_net.c
> > > index e2787b72e..8e4036204 100644
> > > --- a/lib/librte_vhost/virtio_net.c
> > > +++ b/lib/librte_vhost/virtio_net.c
> > > @@ -169,6 +169,51 @@ update_shadow_packed(struct vhost_virtqueue
> > > *vq,
> > >   vq->shadow_used_packed[i].count = count;
> > >  }
> > >
> > > +static __rte_always_inline void
> > > +flush_burst_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > > + uint64_t *lens, uint16_t *ids, uint16_t flags)
> > > +{
> > > + uint16_t i;
> > > +
> > > + UNROLL_PRAGMA(PRAGMA_PARAM)
> > > + for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > > + vq->desc_packed[vq->last_used_idx + i].id = ids[i];
> > > + vq->desc_packed[vq->last_used_idx + i].len = lens[i];
> > > + }
> > > +
> > > + UNROLL_PRAGMA(PRAGMA_PARAM)
> > > + for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > > + rte_smp_wmb();
> > Should this rte_smp_wmb() be moved above the loop? It guarantees that the
> > updates of id and len happen before the flags,
> > but the flags of different descriptors need not be ordered with each other.
> >
> Hi Gavin,
> For each descriptor, the virtio driver will first check the flags, then issue
> a read barrier, and at last read the id and length.
> So the wmb here guarantees that id and length are updated before the flags.
> The wmb in each iteration additionally guarantees the flags store sequence.
> 
Gavin,
I checked with the master branch; the flags store sequence is not needed.
But in my environment, performance is a little better with ordered flags
stores.
I think it is harmless to keep the wmb here. What do you think?
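
For reference, a minimal self-contained sketch of the two barrier placements
under discussion (PACKED_DESCS_BURST, the descriptor layout, and the
my_smp_wmb stand-in are assumptions for illustration, not the actual vhost
code):

#include <stdint.h>

#define PACKED_DESCS_BURST 4	/* assumed: 64B cache line / 16B descriptor */
#define my_smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE) /* stand-in */

struct my_packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Patch variant: a wmb before each flags store orders id/len against the
 * flags and also orders the flags stores of different descriptors. */
static void
flush_ordered(struct my_packed_desc *desc, uint16_t used_idx,
		const uint32_t *lens, const uint16_t *ids, uint16_t flags)
{
	uint16_t i;

	for (i = 0; i < PACKED_DESCS_BURST; i++) {
		desc[used_idx + i].id = ids[i];
		desc[used_idx + i].len = lens[i];
	}
	for (i = 0; i < PACKED_DESCS_BURST; i++) {
		my_smp_wmb();
		desc[used_idx + i].flags = flags;
	}
}

/* Suggested variant: one wmb before the loop is enough to order id/len
 * against the flags; flags stores of different descriptors stay unordered. */
static void
flush_single_wmb(struct my_packed_desc *desc, uint16_t used_idx,
		const uint32_t *lens, const uint16_t *ids, uint16_t flags)
{
	uint16_t i;

	for (i = 0; i < PACKED_DESCS_BURST; i++) {
		desc[used_idx + i].id = ids[i];
		desc[used_idx + i].len = lens[i];
	}
	my_smp_wmb();
	for (i = 0; i < PACKED_DESCS_BURST; i++)
		desc[used_idx + i].flags = flags;
}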

> Thanks,
> Marvin
> 
> > > + vq->desc_packed[vq->last_used_idx + i].flags = flags;
> > > + }
> > > +
> > > + vhost_log_cache_used_vring(dev, vq, vq->last_used_idx *
> > > +sizeof(struct vring_packed_desc),
> > > +sizeof(struct vring_packed_desc) *
> > > +PACKED_DESCS_BURST);
> > > + vhost_log_cache_sync(dev, vq);
> > > +
> > > + vq->last_used_idx += PACKED_DESCS_BURST;
> > > + if (vq->last_used_idx >= vq->size) {
> > > + vq->used_wrap_counter ^= 1;
> > > + vq->last_used_id

Re: [dpdk-dev] [PATCH v2 08/16] vhost: add flush function for burst enqueue

2019-09-24 Thread Liu, Yong



> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Wednesday, September 25, 2019 11:38 AM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong 
> Cc: dev@dpdk.org; nd 
> Subject: RE: [dpdk-dev] [PATCH v2 08/16] vhost: add flush function for
> burst enqueue
> 
> Hi Marvin,
> 
> One typo and one comment about the barrier.
> 
> /Gavin
> 
> > -Original Message-
> > From: dev  On Behalf Of Marvin Liu
> > Sent: Friday, September 20, 2019 12:37 AM
> > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > zhihong.w...@intel.com
> > Cc: dev@dpdk.org; Marvin Liu 
> > Subject: [dpdk-dev] [PATCH v2 08/16] vhost: add flush function for burst
> > enqueue
> >
> > Flush used flags when burst enqueue function is finished. Descriptor's
> > flags are pre-calculated as them will be reset by vhost.
> s/them/they
> 

Thanks.

> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 000648dd4..9c42c7db0 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -39,6 +39,9 @@
> >
> >  #define VHOST_LOG_CACHE_NR 32
> >
> > +#define VIRTIO_RX_USED_FLAG  (0ULL | VRING_DESC_F_AVAIL |
> > VRING_DESC_F_USED \
> > +   | VRING_DESC_F_WRITE)
> > +#define VIRTIO_RX_USED_WRAP_FLAG (VRING_DESC_F_WRITE)
> >  #define PACKED_DESCS_BURST (RTE_CACHE_LINE_SIZE / \
> > sizeof(struct vring_packed_desc))
> >
> > diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> > index e2787b72e..8e4036204 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -169,6 +169,51 @@ update_shadow_packed(struct vhost_virtqueue
> > *vq,
> > vq->shadow_used_packed[i].count = count;
> >  }
> >
> > +static __rte_always_inline void
> > +flush_burst_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > +   uint64_t *lens, uint16_t *ids, uint16_t flags)
> > +{
> > +   uint16_t i;
> > +
> > +   UNROLL_PRAGMA(PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +   vq->desc_packed[vq->last_used_idx + i].id = ids[i];
> > +   vq->desc_packed[vq->last_used_idx + i].len = lens[i];
> > +   }
> > +
> > +   UNROLL_PRAGMA(PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +   rte_smp_wmb();
> Should this rte_smp_wmb() be moved above the loop? It guarantees that the
> updates of id and len happen before the flags,
> but the flags of different descriptors need not be ordered with each other.
> 
Hi Gavin,
For each descriptor, the virtio driver will first check the flags, then issue
a read barrier, and at last read the id and length.
So the wmb here guarantees that id and length are updated before the flags.
The wmb in each iteration additionally guarantees the flags store sequence.

Thanks,
Marvin

> > +   vq->desc_packed[vq->last_used_idx + i].flags = flags;
> > +   }
> > +
> > +   vhost_log_cache_used_vring(dev, vq, vq->last_used_idx *
> > +  sizeof(struct vring_packed_desc),
> > +  sizeof(struct vring_packed_desc) *
> > +  PACKED_DESCS_BURST);
> > +   vhost_log_cache_sync(dev, vq);
> > +
> > +   vq->last_used_idx += PACKED_DESCS_BURST;
> > +   if (vq->last_used_idx >= vq->size) {
> > +   vq->used_wrap_counter ^= 1;
> > +   vq->last_used_idx -= vq->size;
> > +   }
> > +}
> > +
> > +static __rte_always_inline void
> > +flush_enqueue_burst_packed(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> > +   uint64_t *lens, uint16_t *ids)
> > +{
> > +   uint16_t flags = 0;
> > +
> > +   if (vq->used_wrap_counter)
> > +   flags = VIRTIO_RX_USED_FLAG;
> > +   else
> > +   flags = VIRTIO_RX_USED_WRAP_FLAG;
> > +
> > +   flush_burst_packed(dev, vq, lens, ids, flags);
> > +}
> > +
> >  static __rte_always_inline void
> >  update_enqueue_shadow_packed(struct vhost_virtqueue *vq, uint16_t
> > desc_idx,
> > uint32_t len, uint16_t count)
> > @@ -950,6 +995,7 @@ virtio_dev_rx_burst_packed(struct virtio_net *dev,
> > struct vhost_virtqueue *vq,
> > struct virtio_net_hdr_mrg_rxbuf *hdrs[PACKED_DESCS_BURST];
> > uint32_t buf_offset = dev->vhost_hlen;
> > uint64_t lens[PACKED_DESCS_BURST];
> > +   uint16_t ids[PACKED_DESCS_BURST];
> >
> > uint16_t i;
> >
> > @@ -1013,6 +1059,12 @@ virtio_dev_rx_burst_packed(struct virtio_net
> > *dev, struct vhost_virtqueue *vq,
> >pkts[i]->pkt_len);
> > }
> >
> > +   UNROLL_PRAGMA(PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_DESCS_BURST; i++)
> > +   ids[i] = descs[avail_idx + i].id;
> > +
> > +   flush_enqueue_burst_packed(dev, vq, lens, ids);
> > +
> > return 0;
> >  }
> >
> > --
> > 2.17.1



Re: [dpdk-dev] [PATCH] net/virtio: fix mbuf data and pkt length mismatch

2019-09-23 Thread Liu, Yong
> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Monday, September 23, 2019 11:22 PM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; Bie, Tiwei ; Wang,
> Zhihong ; dev@dpdk.org; sta...@dpdk.org
> Subject: Re: [PATCH] net/virtio: fix mbuf data and pkt length mismatch
> 
> On Mon, 23 Sep 2019 22:05:11 +0800
> Marvin Liu  wrote:
> 
> > If the virtio header room is reserved via rte_pktmbuf_prepend, both the
> > segment data length and the packet length of the mbuf are increased.
> > The data length will equal the descriptor length, while the packet length
> > should be decreased, as the virtio-net header is not counted in the packet.
> > This causes a mismatch in the mbuf structure. Fix this issue by accessing
> > the mbuf data directly and increasing the descriptor length when needed.
> >
> > Fixes: 58169a9c8153 ("net/virtio: support Tx checksum offload")
> > Fixes: 892dc798fa9c ("net/virtio: implement Tx path for packed queues")
> > Fixes: 4905ed3a523f ("net/virtio: optimize Tx enqueue for packed ring")
> > Fixes: e5f456a98d3c ("net/virtio: support in-order Rx and Tx")
> > Cc: sta...@dpdk.org
> >
> > Reported-by: Stephen Hemminger 
> > Signed-off-by: Marvin Liu 
> 
> Looks good, for current code.
> Won't apply cleanly to 18.11. Could you send a version for that as well?

Thanks for reviewing. The version for 18.11 has been sent to the stable
mailing list.

Regards,
Marvin
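
To illustrate the mismatch described in the commit message above, a mock of
the two mbuf length fields (illustrative only; not the rte_mbuf API):

#include <stdint.h>
#include <stdio.h>

struct mock_mbuf {
	uint32_t pkt_len;	/* total packet length */
	uint16_t data_len;	/* length of this segment */
};

/* Mimics what rte_pktmbuf_prepend() does to the two length fields. */
static void
mock_prepend(struct mock_mbuf *m, uint16_t len)
{
	m->data_len += len;
	m->pkt_len += len;
}

int
main(void)
{
	struct mock_mbuf m = { .pkt_len = 64, .data_len = 64 };

	/* reserve room for a 12-byte virtio-net header */
	mock_prepend(&m, 12);
	/* data_len (76) now matches the descriptor length, but pkt_len
	 * should have stayed 64: the header is not part of the packet. */
	printf("pkt_len=%u data_len=%u\n", m.pkt_len, m.data_len);
	return 0;
}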


Re: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function for packed ring

2019-09-23 Thread Liu, Yong
Thanks, Gavin. My comments are inline.

> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Monday, September 23, 2019 7:09 PM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong 
> Cc: dev@dpdk.org; nd 
> Subject: RE: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function
> for packed ring
> 
> Hi Marvin,
> 
> Is it possible to vectorize the processing?
> Other comments inline:
> /Gavin

Gavin,
According to our experiments, vectorizing only some parts of the
enqueue/dequeue functions does not benefit performance.
Functions like vhost_iova_to_vva and virtio_enqueue_offload cannot easily be
vectorized as they are full of conditional branches. A schematic sketch of
the problem follows below.
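
As an illustration (a mock, not the actual vhost code): address translation
walks a page table with a data-dependent branch per element, which blocks the
compiler from emitting straight-line SIMD. All names below are hypothetical.

#include <stdint.h>

struct my_guest_page {
	uint64_t guest_phys_addr;
	uint64_t host_user_addr;
	uint64_t size;
};

/* Each iteration takes a different path depending on the data, so the
 * loop body cannot be vectorized effectively. */
static uint64_t
my_gpa_to_vva(const struct my_guest_page *pages, int nr_pages,
		uint64_t gpa, uint64_t *len)
{
	int i;

	for (i = 0; i < nr_pages; i++) {
		if (gpa >= pages[i].guest_phys_addr &&
		    gpa < pages[i].guest_phys_addr + pages[i].size) {
			uint64_t off = gpa - pages[i].guest_phys_addr;

			/* clamp the mapped length to the page boundary */
			if (*len > pages[i].size - off)
				*len = pages[i].size - off;
			return pages[i].host_user_addr + off;
		}
	}
	return 0;	/* translation failed */
}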

Thanks,
Marvin

> > -Original Message-
> > From: dev  On Behalf Of Marvin Liu
> > Sent: Friday, September 20, 2019 12:37 AM
> > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > zhihong.w...@intel.com
> > Cc: dev@dpdk.org; Marvin Liu 
> > Subject: [dpdk-dev] [PATCH v2 03/16] vhost: add burst enqueue function
> for
> > packed ring
> >
> > Burst enqueue function will first check whether descriptors are cache
> > aligned. It will also check prerequisites in the beginning. Burst
> > enqueue function does not support chained mbufs; the single packet
> > enqueue function will handle them.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 5074226f0..67889c80a 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -39,6 +39,9 @@
> >
> >  #define VHOST_LOG_CACHE_NR 32
> >
> > +#define PACKED_DESCS_BURST (RTE_CACHE_LINE_SIZE / \
> > +   sizeof(struct vring_packed_desc))
> > +
> >  #ifdef SUPPORT_GCC_UNROLL_PRAGMA
> >  #define PRAGMA_PARAM "GCC unroll 4"
> >  #endif
> > @@ -57,6 +60,8 @@
> >  #define UNROLL_PRAGMA(param) do {} while(0);
> >  #endif
> >
> > +#define PACKED_BURST_MASK (PACKED_DESCS_BURST - 1)
> > +
> >  /**
> >   * Structure contains buffer address, length and descriptor index
> >   * from vring to do scatter RX.
> > diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> > index 2b5c47145..c664b27c5 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -895,6 +895,84 @@ virtio_dev_rx_split(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> > return pkt_idx;
> >  }
> >
> > +static __rte_unused __rte_always_inline int
> I remember "__rte_always_inline" should start on a separate line,
> otherwise you will get a style issue.
> /Gavin
> > +virtio_dev_rx_burst_packed(struct virtio_net *dev, struct
> vhost_virtqueue
> > *vq,
> > +struct rte_mbuf **pkts)
> > +{
> > +   bool wrap_counter = vq->avail_wrap_counter;
> > +   struct vring_packed_desc *descs = vq->desc_packed;
> > +   uint16_t avail_idx = vq->last_avail_idx;
> > +
> > +   uint64_t desc_addrs[PACKED_DESCS_BURST];
> > +   struct virtio_net_hdr_mrg_rxbuf *hdrs[PACKED_DESCS_BURST];
> > +   uint32_t buf_offset = dev->vhost_hlen;
> > +   uint64_t lens[PACKED_DESCS_BURST];
> > +
> > +   uint16_t i;
> > +
> > +   if (unlikely(avail_idx & PACKED_BURST_MASK))
> > +   return -1;
> > +
> > +   UNROLL_PRAGMA(PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +   if (unlikely(pkts[i]->next != NULL))
> > +   return -1;
> > +   if (unlikely(!desc_is_avail(&descs[avail_idx + i],
> > +   wrap_counter)))
> > +   return -1;
> > +   }
> > +
> > +   rte_smp_rmb();
> > +
> > +   UNROLL_PRAGMA(PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_DESCS_BURST; i++)
> > +   lens[i] = descs[avail_idx + i].len;
> Looks like the code is a strong candidate for vectorization.
> > +
> > +   UNROLL_PRAGMA(PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_DESCS_BURST; i++) {
> > +   if (unlikely(pkts[i]->pkt_len > (lens[i] - buf_offset)))
> > +   return -1;
> > +   }
> > +
> > +   UNROLL_PRAGMA(PRAGMA_PARAM)
> > +   for (i = 0; i < PACKED_DESCS_BURST; i++)
> > +   desc_addrs[i] = vhost_iova_to_vva(dev, vq,
> > +descs[avail_idx + i].addr,
> > +&lens[i],
> > + 

Re: [dpdk-dev] [PATCH v2 02/16] vhost: unify unroll pragma parameter

2019-09-23 Thread Liu, Yong
> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Monday, September 23, 2019 6:09 PM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong ; Richardson,
> Bruce 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 02/16] vhost: unify unroll pragma
> parameter
> 
> Hi Marvin,
> 
> One general comment and other comments inline:
> 1. Meson build should also be supported as Makefile is phasing out and
> Meson is the future in DPDK.
> 
> /Gavin
> 

Thanks, Gavin. I will update the meson build file in the next version.

> > -Original Message-
> > From: dev  On Behalf Of Marvin Liu
> > Sent: Friday, September 20, 2019 12:36 AM
> > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > zhihong.w...@intel.com
> > Cc: dev@dpdk.org; Marvin Liu 
> > Subject: [dpdk-dev] [PATCH v2 02/16] vhost: unify unroll pragma parameter
> >
> > Add macro for unifying Clang/ICC/GCC unroll pragma format. Burst
> > functions were contained of several small loops which optimized by
> > compiler’s loop unrolling pragma.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index 8623e91c0..30839a001 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -16,6 +16,24 @@ CFLAGS += -I vhost_user
> >  CFLAGS += -fno-strict-aliasing
> >  LDLIBS += -lpthread
> >
> > +ifeq ($(RTE_TOOLCHAIN), gcc)
> > +ifeq ($(shell test $(GCC_VERSION) -ge 83 && echo 1), 1)
> It is better to move this toolchain version related definition to eg:
> mk/toolchain/icc/rte.toolchain-compat.mk.
> There is a lot of similar stuff over there.
> Although "CFLAGS" is added to something under this subfolder, it still
> applies globally to other components.
> /Gavin
> > +CFLAGS += -DSUPPORT_GCC_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> > +ifeq ($(RTE_TOOLCHAIN), clang)
> > +ifeq ($(shell test $(CLANG_MAJOR_VERSION)$(CLANG_MINOR_VERSION) -
> > ge 37 && echo 1), 1)
> > +CFLAGS += -DSUPPORT_CLANG_UNROLL_PRAGMA
> Why not combine all three "-DSUPPORT_*_UNROLL_PRAGMA" into one
> "-DSUPPORT_UNROLL_PRAGMA" for simplicity?
> Any differences for the support by different compilers?
> /Gavin

Gavin,
This is because the pragma parameter formats differ between compilers, so a
separate macro is created for each compiler. A usage sketch is appended after
the quoted patch below.

> > +endif
> > +endif
> > +
> > +ifeq ($(RTE_TOOLCHAIN), icc)
> > +ifeq ($(shell test $(ICC_MAJOR_VERSION) -ge 16 && echo 1), 1)
> > +CFLAGS += -DSUPPORT_ICC_UNROLL_PRAGMA
> > +endif
> > +endif
> > +
> >  ifeq ($(CONFIG_RTE_LIBRTE_VHOST_NUMA),y)
> >  LDLIBS += -lnuma
> >  endif
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 884befa85..5074226f0 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -39,6 +39,24 @@
> >
> >  #define VHOST_LOG_CACHE_NR 32
> >
> > +#ifdef SUPPORT_GCC_UNROLL_PRAGMA
> > +#define PRAGMA_PARAM "GCC unroll 4"
> > +#endif
> > +
> > +#ifdef SUPPORT_CLANG_UNROLL_PRAGMA
> > +#define PRAGMA_PARAM "unroll 4"
> > +#endif
> > +
> > +#ifdef SUPPORT_ICC_UNROLL_PRAGMA
> > +#define PRAGMA_PARAM "unroll (4)"
> > +#endif
> > +
> > +#ifdef PRAGMA_PARAM
> > +#define UNROLL_PRAGMA(param) _Pragma(param)
> > +#else
> > +#define UNROLL_PRAGMA(param) do {} while(0);
> > +#endif
> > +
> >  /**
> >   * Structure contains buffer address, length and descriptor index
> >   * from vring to do scatter RX.
> > --
> > 2.17.1
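
For reference, a minimal usage sketch of the macro defined above (the GCC
parameter is shown; PACKED_DESCS_BURST is assumed from patch 03/16 and the
helper name is hypothetical):

#include <stdint.h>

#define PRAGMA_PARAM "GCC unroll 4"
#define UNROLL_PRAGMA(param) _Pragma(param)
#define PACKED_DESCS_BURST 4

/* The pragma must directly precede the loop it applies to; the compiler
 * then unrolls the fixed-count loop into straight-line code. */
static void
copy_lens(uint32_t *dst, const uint32_t *src)
{
	uint16_t i;

	UNROLL_PRAGMA(PRAGMA_PARAM)
	for (i = 0; i < PACKED_DESCS_BURST; i++)
		dst[i] = src[i];
}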


Re: [dpdk-dev] [PATCH v2 00/16] vhost packed ring performance optimization

2019-09-23 Thread Liu, Yong
Sure, I have changed the state of v1 to Superseded.

> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Monday, September 23, 2019 5:05 PM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 00/16] vhost packed ring performance
> optimization
> 
> Hi Marvin,
> 
> A general comment for the series, could you mark V1 Superseded?
> 
> /Gavin
> 
> > -Original Message-
> > From: dev  On Behalf Of Marvin Liu
> > Sent: Friday, September 20, 2019 12:36 AM
> > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > zhihong.w...@intel.com
> > Cc: dev@dpdk.org; Marvin Liu 
> > Subject: [dpdk-dev] [PATCH v2 00/16] vhost packed ring performance
> > optimization
> >
> > Packed ring has a more compact ring format and thus can significantly
> > reduce the number of cache misses. It can lead to better performance.
> > This has been proven in the virtio user driver: on a normal E5 Xeon CPU,
> > single core performance can rise by 12%.
> >
> > http://mails.dpdk.org/archives/dev/2018-April/095470.html
> >
> > However, vhost performance with packed ring was decreased.
> > Through analysis, most of the extra cost came from calculating each
> > descriptor flag, which depends on the ring wrap counter. Moreover, both
> > frontend and backend need to write the same descriptors, which will cause
> > cache contention. Especially during vhost enqueue, the virtio packed ring
> > refill function may write the same cache line that vhost is writing. This
> > extra cache cost reduces the benefit of fewer cache misses.
> >
> > To optimize vhost packed ring performance, the vhost enqueue and dequeue
> > functions will be split into fast and normal paths.
> >
> > Several methods will be taken in the fast path:
> >   Unroll the burst loop into more pieces.
> >   Handle the descriptors in one cache line simultaneously.
> >   Check in advance whether I/O space can be copied directly into mbuf
> > space and vice versa.
> >   Check in advance whether descriptor mapping is successful.
> >   Distinguish the vhost used ring update function between enqueue and
> > dequeue.
> >   Buffer dequeue used descriptors as many as possible.
> >   Update enqueue used descriptors by cache line.
> >   Cache the memory region structure for fast conversion.
> >   Disable software prefetch if hardware can do better.
> >
> > After all these methods are done, single core vhost PvP performance with
> > 64B packets on Xeon 8180 can be boosted by 40%.
> >
> > v2:
> > - Utilize compiler's pragma to unroll loop, distinguish clang/icc/gcc
> > - Buffered dequeue used desc number changed to (RING_SZ - PKT_BURST)
> > - Optimize dequeue used ring update when in_order negotiated
> >
> > Marvin Liu (16):
> >   vhost: add single packet enqueue function
> >   vhost: unify unroll pragma parameter
> >   vhost: add burst enqueue function for packed ring
> >   vhost: add single packet dequeue function
> >   vhost: add burst dequeue function
> >   vhost: rename flush shadow used ring functions
> >   vhost: flush vhost enqueue shadow ring by burst
> >   vhost: add flush function for burst enqueue
> >   vhost: buffer vhost dequeue shadow ring
> >   vhost: split enqueue and dequeue flush functions
> >   vhost: optimize enqueue function of packed ring
> >   vhost: add burst and single zero dequeue functions
> >   vhost: optimize dequeue function of packed ring
> >   vhost: cache address translation result
> >   vhost: check whether disable software pre-fetch
> >   vhost: optimize packed ring dequeue when in-order
> >
> >  lib/librte_vhost/Makefile |   24 +
> >  lib/librte_vhost/rte_vhost.h  |   27 +
> >  lib/librte_vhost/vhost.h  |   33 +
> >  lib/librte_vhost/virtio_net.c | 1071 +++--
> >  4 files changed, 960 insertions(+), 195 deletions(-)
> >
> > --
> > 2.17.1
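
As background for the flag cost mentioned in the cover letter above, a
minimal sketch of the packed ring availability check (flag bit values as in
the virtio 1.1 spec; the helper mirrors the vhost code but is simplified
here):

#include <stdbool.h>
#include <stdint.h>

#define VRING_DESC_F_AVAIL (1 << 7)
#define VRING_DESC_F_USED  (1 << 15)

/* A descriptor is available when its AVAIL bit equals the driver's wrap
 * counter and its USED bit differs, so every check depends on the wrap
 * counter's current value. */
static bool
desc_is_avail(uint16_t flags, bool wrap_counter)
{
	bool avail = !!(flags & VRING_DESC_F_AVAIL);
	bool used = !!(flags & VRING_DESC_F_USED);

	return avail == wrap_counter && used != wrap_counter;
}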


Re: [dpdk-dev] [PATCH v2 01/16] vhost: add single packet enqueue function

2019-09-23 Thread Liu, Yong



> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Monday, September 23, 2019 5:09 PM
> To: Liu, Yong ; maxime.coque...@redhat.com; Bie, Tiwei
> ; Wang, Zhihong 
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2 01/16] vhost: add single packet enqueue
> function
> 
> Hi Marvin,
> 
> > -Original Message-
> > From: dev  On Behalf Of Marvin Liu
> > Sent: Friday, September 20, 2019 12:36 AM
> > To: maxime.coque...@redhat.com; tiwei@intel.com;
> > zhihong.w...@intel.com
> > Cc: dev@dpdk.org; Marvin Liu 
> > Subject: [dpdk-dev] [PATCH v2 01/16] vhost: add single packet enqueue
> > function
> >
> > Add vhost enqueue function for single packet and meanwhile left space
> > for flush used ring function.
> >
> > Signed-off-by: Marvin Liu 
> >
> > diff --git a/lib/librte_vhost/virtio_net.c
> b/lib/librte_vhost/virtio_net.c
> > index 5b85b832d..2b5c47145 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -774,6 +774,70 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> >   return error;
> >  }
> >
> > +/*
> > + * Returns -1 on fail, 0 on success
> > + */
> > +static __rte_always_inline int
> > +vhost_enqueue_single_packed(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> > + struct rte_mbuf *pkt, struct buf_vector *buf_vec, uint16_t
> > *nr_descs)
> > +{
> > + uint16_t nr_vec = 0;
> > +
> > + uint16_t avail_idx;
> > + uint16_t max_tries, tries = 0;
> > +
> > + uint16_t buf_id = 0;
> > + uint32_t len = 0;
> > + uint16_t desc_count;
> > +
> > + uint32_t size = pkt->pkt_len + dev->vhost_hlen;
> > + avail_idx = vq->last_avail_idx;
> > +
> > + if (rxvq_is_mergeable(dev))
> > + max_tries = vq->size - 1;
> > + else
> > + max_tries = 1;
> > +
> > + uint16_t num_buffers = 0;
> > +
> > + while (size > 0) {
> > + /*
> > +  * if we tried all available ring items, and still
> > +  * can't get enough buf, it means something abnormal
> > +  * happened.
> > +  */
> > + if (unlikely(++tries > max_tries))
> > + return -1;
> > +
> > + if (unlikely(fill_vec_buf_packed(dev, vq,
> > + avail_idx, &desc_count,
> > + buf_vec, &nr_vec,
> > + &buf_id, &len,
> > + VHOST_ACCESS_RW) < 0)) {
> > + return -1;
> > + }
> > +
> > + len = RTE_MIN(len, size);
> > +
> > + size -= len;
> > +
> > + avail_idx += desc_count;
> > + if (avail_idx >= vq->size)
> > + avail_idx -= vq->size;
> > +
> > + *nr_descs += desc_count;
> > + num_buffers += 1;
> > + }
> > +
> > + if (copy_mbuf_to_desc(dev, vq, pkt,
> > + buf_vec, nr_vec,
> > + num_buffers) < 0) {
> > + return 0;
> Why return 0, which means success, when "copy_mbuf_to_desc" encountered
> a problem and failed?
> /Gavin

Gavin,
Thanks for noticing this typo. It should return the negative value -1 here;
a sketch of the corrected tail follows below.

Regards,
Marvin
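
The corrected tail of vhost_enqueue_single_packed would then read (a sketch
of the agreed fix, not the final patch):

	if (copy_mbuf_to_desc(dev, vq, pkt,
			buf_vec, nr_vec,
			num_buffers) < 0)
		return -1;	/* propagate the copy failure */

	return 0;
}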

> > + }
> > +
> > + return 0;
> > +}
> > +
> >  static __rte_noinline uint32_t
> >  virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >   struct rte_mbuf **pkts, uint32_t count)
> > @@ -831,6 +895,35 @@ virtio_dev_rx_split(struct virtio_net *dev, struct
> > vhost_virtqueue *vq,
> >   return pkt_idx;
> >  }
> >
> > +static __rte_unused int16_t
> > +virtio_dev_rx_single_packed(struct virtio_net *dev, struct
> vhost_virtqueue
> > *vq,
> > + struct rte_mbuf *pkt)
> > +{
> > + struct buf_vector buf_vec[BUF_VECTOR_MAX];
> > + uint16_t nr_descs = 0;
> > +
> > + rte_smp_rmb();
> > + if (unlikely(vhost_enqueue_single_packed(dev, vq, pkt, buf_vec,
> > +  &nr_descs) < 0)) {
> > + VHOST_LOG_DEBUG(VHOST_DATA,
> >

Re: [dpdk-dev] examples/l3fwd-power: fix RX interrupt disable

2019-09-19 Thread Liu, Yong
> -Original Message-
> From: Zhang, Xiao
> Sent: Wednesday, September 11, 2019 12:10 AM
> To: dev@dpdk.org
> Cc: Liu, Yong ; Zhang, Xiao ;
> sta...@dpdk.org
> Subject: examples/l3fwd-power: fix RX interrupt disable
> 
> Without a synchronization mechanism, an interrupt can be missed when RX
> interrupts are disabled, which sometimes leads to a wake-up issue; add a
> spinlock to fix it.
> 
> Fixes: b736d64787fc ("examples/l3fwd-power: disable Rx interrupt when
> waking up")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Xiao Zhang 
> ---
>  examples/l3fwd-power/main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
> index fd8d952..ff1ad37 100644
> --- a/examples/l3fwd-power/main.c
> +++ b/examples/l3fwd-power/main.c
> @@ -880,7 +880,9 @@ sleep_until_rx_interrupt(int num)
>   port_id = ((uintptr_t)data) >> CHAR_BIT;
>   queue_id = ((uintptr_t)data) &
>   RTE_LEN2MASK(CHAR_BIT, uint8_t);
> + rte_spinlock_lock(&(locks[port_id]));
>   rte_eth_dev_rx_intr_disable(port_id, queue_id);
> + rte_spinlock_unlock(&(locks[port_id]));
>   RTE_LOG(INFO, L3FWD_POWER,
>   "lcore %u is waked up from rx interrupt on"
>   " port %d queue %d\n",
> --
> 2.7.4

Reviewed-by: Marvin Liu


Regards,
Marvin
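
For context, the counterpart enable path in the same example takes the same
per-port lock, so enable and disable serialize against each other (a
schematic sketch based on the patch above; the locks array and the helper
shape are assumptions):

static rte_spinlock_t locks[RTE_MAX_ETHPORTS];

static void
intr_ctl(uint16_t port_id, uint16_t queue_id, int on)
{
	rte_spinlock_lock(&locks[port_id]);
	if (on)
		rte_eth_dev_rx_intr_enable(port_id, queue_id);
	else
		rte_eth_dev_rx_intr_disable(port_id, queue_id);
	rte_spinlock_unlock(&locks[port_id]);
}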



Re: [dpdk-dev] [PATCH v2 2/2] net/virtio: on demand cleanup when doing in order xmit

2019-09-17 Thread Liu, Yong
Thanks for the comments, I will update in the next version.

> -Original Message-
> From: Bie, Tiwei
> Sent: Wednesday, September 18, 2019 10:44 AM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; dev@dpdk.org
> Subject: Re: [PATCH v2 2/2] net/virtio: on demand cleanup when doing in
> order xmit
> 
> On Wed, Sep 11, 2019 at 12:14:46AM +0800, Marvin Liu wrote:
> > Check whether enough space is available before a burst enqueue operation.
> > If more space is needed, try to clean up used descriptors for space on
> > demand. This gives more chances to free used descriptors and thus helps
> > RFC2544 performance.
> >
> > Signed-off-by: Marvin Liu 
> > ---
> >  drivers/net/virtio/virtio_rxtx.c | 73 +++-
> >  1 file changed, 54 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_rxtx.c
> b/drivers/net/virtio/virtio_rxtx.c
> > index d3ca36831..842b600c3 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -2152,6 +2152,22 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf
> **tx_pkts, uint16_t nb_pkts)
> > return nb_tx;
> >  }
> >
> > +static __rte_always_inline int
> > +virtio_xmit_try_cleanup_inorder(struct virtqueue *vq, uint16_t need)
> > +{
> > +   uint16_t nb_used, nb_clean, nb_descs;
> > +   struct virtio_hw *hw = vq->hw;
> > +
> > +   nb_descs = vq->vq_free_cnt + need;
> > +   nb_used = VIRTQUEUE_NUSED(vq);
> > +   virtio_rmb(hw->weak_barriers);
> > +   nb_clean = RTE_MIN(need, (int)nb_used);
> > +
> > +   virtio_xmit_cleanup_inorder(vq, nb_clean);
> > +
> > +   return (nb_descs - vq->vq_free_cnt);
> > +}
> > +
> >  uint16_t
> >  virtio_xmit_pkts_inorder(void *tx_queue,
> > struct rte_mbuf **tx_pkts,
> > @@ -2161,8 +2177,9 @@ virtio_xmit_pkts_inorder(void *tx_queue,
> > struct virtqueue *vq = txvq->vq;
> > struct virtio_hw *hw = vq->hw;
> > uint16_t hdr_size = hw->vtnet_hdr_size;
> > -   uint16_t nb_used, nb_avail, nb_tx = 0, nb_inorder_pkts = 0;
> > +   uint16_t nb_used, nb_tx = 0, nb_inorder_pkts = 0;
> > struct rte_mbuf *inorder_pkts[nb_pkts];
> > +   int need, nb_left;
> >
> > if (unlikely(hw->started == 0 && tx_pkts != hw->inject_pkts))
> > return nb_tx;
> > @@ -2175,17 +2192,12 @@ virtio_xmit_pkts_inorder(void *tx_queue,
> > nb_used = VIRTQUEUE_NUSED(vq);
> >
> > virtio_rmb(hw->weak_barriers);
> > -   if (likely(nb_used > vq->vq_nentries - vq->vq_free_thresh))
> > -   virtio_xmit_cleanup_inorder(vq, nb_used);
> > -
> > -   if (unlikely(!vq->vq_free_cnt))
> > +   if (likely(nb_used > (vq->vq_nentries - vq->vq_free_thresh)))
> > virtio_xmit_cleanup_inorder(vq, nb_used);
> >
> > -   nb_avail = RTE_MIN(vq->vq_free_cnt, nb_pkts);
> > -
> > -   for (nb_tx = 0; nb_tx < nb_avail; nb_tx++) {
> > +   for (nb_tx = 0; nb_tx < nb_pkts; nb_tx++) {
> > struct rte_mbuf *txm = tx_pkts[nb_tx];
> > -   int slots, need;
> > +   int slots;
> >
> > /* optimize ring usage */
> > if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
> > @@ -2199,11 +2211,25 @@ virtio_xmit_pkts_inorder(void *tx_queue,
> > inorder_pkts[nb_inorder_pkts] = txm;
> > nb_inorder_pkts++;
> >
> > -   virtio_update_packet_stats(&txvq->stats, txm);
> > continue;
> > }
> >
> > if (nb_inorder_pkts) {
> > +   need = nb_inorder_pkts - vq->vq_free_cnt;
> > +
> > +   if (unlikely(need > 0)) {
> > +   nb_left = virtio_xmit_try_cleanup_inorder(vq,
> > +   need);
> 
> There is no need to introduce `nb_left`. Looks better
> to just reuse `need`.
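
A sketch of the simplification suggested here, reusing `need` instead of
introducing `nb_left` (a fragment mirroring the quoted patch, not the final
code):

	need = nb_inorder_pkts - vq->vq_free_cnt;
	if (unlikely(need > 0)) {
		need = virtio_xmit_try_cleanup_inorder(vq, need);
		if (unlikely(need > 0)) {
			PMD_TX_LOG(ERR,
				"No free tx descriptors to transmit");
			nb_inorder_pkts = vq->vq_free_cnt;
		}
	}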
> 
> > +
> > +   if (unlikely(nb_left > 0)) {
> > +   PMD_TX_LOG(ERR,
> > +   "No free tx descriptors to "
> > +   "transmit");
> > +   nb_inorder_pkts = vq->vq_free_cnt;
> 
> You need to handle nb_tx as well, otherwise mbufs will leak.
> Or maybe just leave nb_inorder_pkts unchanged, and let the code
> outside th

Re: [dpdk-dev] [PATCH v2 1/2] net/virtio: update stats when in order xmit done

2019-09-17 Thread Liu, Yong


> -Original Message-
> From: Bie, Tiwei
> Sent: Wednesday, September 18, 2019 10:35 AM
> To: Liu, Yong 
> Cc: maxime.coque...@redhat.com; dev@dpdk.org
> Subject: Re: [PATCH v2 1/2] net/virtio: update stats when in order xmit
> done
> 
> On Wed, Sep 11, 2019 at 12:14:45AM +0800, Marvin Liu wrote:
> > When doing xmit in-order enqueue, packets are buffered and then flushed
> > into avail ring. Buffered packets can be dropped due to insufficient
> > space. Moving stats update action just after successful avail ring
> > updates can guarantee correctness.
> >
> > Signed-off-by: Marvin Liu 
> > ---
> >  drivers/net/virtio/virtio_rxtx.c | 87 
> >  1 file changed, 44 insertions(+), 43 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_rxtx.c
> b/drivers/net/virtio/virtio_rxtx.c
> > index 27ead19fb..d3ca36831 100644
> > --- a/drivers/net/virtio/virtio_rxtx.c
> > +++ b/drivers/net/virtio/virtio_rxtx.c
> > @@ -106,6 +106,48 @@ vq_ring_free_id_packed(struct virtqueue *vq,
> uint16_t id)
> > dxp->next = VQ_RING_DESC_CHAIN_END;
> >  }
> >
> > +static inline void
> > +virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf
> *mbuf)
> > +{
> > +   uint32_t s = mbuf->pkt_len;
> > +   struct rte_ether_addr *ea;
> > +
> > +   stats->bytes += s;
> > +
> > +   if (s == 64) {
> > +   stats->size_bins[1]++;
> > +   } else if (s > 64 && s < 1024) {
> > +   uint32_t bin;
> > +
> > +   /* count zeros, and offset into correct bin */
> > +   bin = (sizeof(s) * 8) - __builtin_clz(s) - 5;
> > +   stats->size_bins[bin]++;
> > +   } else {
> > +   if (s < 64)
> > +   stats->size_bins[0]++;
> > +   else if (s < 1519)
> > +   stats->size_bins[6]++;
> > +   else
> > +   stats->size_bins[7]++;
> > +   }
> > +
> > +   ea = rte_pktmbuf_mtod(mbuf, struct rte_ether_addr *);
> > +   if (rte_is_multicast_ether_addr(ea)) {
> > +   if (rte_is_broadcast_ether_addr(ea))
> > +   stats->broadcast++;
> > +   else
> > +   stats->multicast++;
> > +   }
> > +}
> > +
> > +static inline void
> > +virtio_rx_stats_updated(struct virtnet_rx *rxvq, struct rte_mbuf *m)
> > +{
> > +   VIRTIO_DUMP_PACKET(m, m->data_len);
> > +
> > +   virtio_update_packet_stats(&rxvq->stats, m);
> > +}
> > +
> >  static uint16_t
> >  virtqueue_dequeue_burst_rx_packed(struct virtqueue *vq,
> >   struct rte_mbuf **rx_pkts,
> > @@ -317,7 +359,7 @@ virtio_xmit_cleanup(struct virtqueue *vq, uint16_t
> num)
> >  }
> >
> >  /* Cleanup from completed inorder transmits. */
> > -static void
> > +static __rte_always_inline void
> >  virtio_xmit_cleanup_inorder(struct virtqueue *vq, uint16_t num)
> >  {
> > uint16_t i, idx = vq->vq_used_cons_idx;
> > @@ -596,6 +638,7 @@ virtqueue_enqueue_xmit_inorder(struct virtnet_tx
> *txvq,
> > dxp = &vq->vq_descx[vq->vq_avail_idx & (vq->vq_nentries - 1)];
> > dxp->cookie = (void *)cookies[i];
> > dxp->ndescs = 1;
> > +   virtio_update_packet_stats(&txvq->stats, cookies[i]);
> 
> The virtio_update_packet_stats() call in virtio_xmit_pkts_inorder()
> should be removed.
> 

Hi Tiwei,
The call remaining in virtio_xmit_pkts_inorder is for those packets not
handled by the burst enqueue function.
Statistics for packets handled in the burst in-order enqueue function are
updated in the inner loop.

Thanks,
Marvin
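
As an aside, the size_bins index computation quoted above maps packet sizes
in (64, 1024) to power-of-two bins; a quick self-contained check
(illustrative only):

#include <assert.h>
#include <stdint.h>

/* Mirrors the quoted expression: bin 2 covers 65-127, bin 3 covers
 * 128-255, ..., bin 5 covers 512-1023. */
static uint32_t
size_bin(uint32_t s)
{
	return (sizeof(s) * 8) - __builtin_clz(s) - 5;
}

int
main(void)
{
	assert(size_bin(65) == 2);
	assert(size_bin(128) == 3);
	assert(size_bin(1023) == 5);
	return 0;
}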

> 
> >
> > hdr = (struct virtio_net_hdr *)
> > rte_pktmbuf_prepend(cookies[i], head_size);
> > @@ -1083,48 +1126,6 @@ virtio_discard_rxbuf_inorder(struct virtqueue *vq,
> struct rte_mbuf *m)
> > }
> >  }
> >
> > -static inline void
> > -virtio_update_packet_stats(struct virtnet_stats *stats, struct rte_mbuf
> *mbuf)
> > -{
> > -   uint32_t s = mbuf->pkt_len;
> > -   struct rte_ether_addr *ea;
> > -
> > -   stats->bytes += s;
> > -
> > -   if (s == 64) {
> > -   stats->size_bins[1]++;
> > -   } else if (s > 64 && s < 1024) {
> > -   uint32_t bin;
> > -
> > -   /* count zeros, and offset into correct bin */
> > -   bin = (sizeo

Re: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed vring desc avail flags

2019-09-10 Thread Liu, Yong
Thanks Gavin, my answers are inline.

> -Original Message-
> From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> Sent: Wednesday, September 11, 2019 11:35 AM
> To: Liu, Yong ; Wang, Yinan ;
> Maxime Coquelin ; Joyce Kong (Arm Technology
> China) ; dev@dpdk.org
> Cc: nd ; Bie, Tiwei ; Wang, Zhihong
> ; amore...@redhat.com; Wang, Xiao W
> ; jfreim...@redhat.com; Honnappa Nagarahalli
> ; Steve Capper 
> Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
> vring desc avail flags
> 
> Hi Marvin,
> 
> Thanks for your answers, one more question for x86:
> 1. For CIO memory alone or MMIO memory(eg PCI BAR) alone, the compiler
> barrier is enough to keep ordering, that's why both rte_io_mb and
> rte_cio_mb are defined as compiler barriers, right?

Yes, that's right for x86.

> 2. How about the ordering of interleaved CIO and MMIO accesses, for example,
> a young store to MMIO can be reordered before an older store to CIO? CIO
> may be faster than devices, but store buffers or caching may cause the CIO
> update not visible to the device(in a common doorbell case)?
> 

There is always a cache coherency engine in the x86 uncore sub-system.
When a CIO write instruction has retired, the data will be in the CPU LLC.
When the device does an inbound read, the request will go to the cache engine
first, which checks the memory state and retrieves the latest value.
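
A reference sketch of how the write barriers discussed here typically expand
per architecture (summarized from DPDK's per-arch headers of that era; treat
the exact mappings as assumptions to verify, and note the stand-in names):

/* x86 is TSO: stores are not reordered with older stores, so a compiler
 * barrier is enough for both the SMP and the CIO case. */
#define my_compiler_barrier()	asm volatile ("" ::: "memory")
#define my_smp_wmb_x86()	my_compiler_barrier()
#define my_cio_wmb_x86()	my_compiler_barrier()

/* aarch64 needs explicit DMB instructions; only the shareability domain
 * differs (inner-shareable for CPU-CPU, outer-shareable for CPU-device). */
#define my_smp_wmb_arm64()	asm volatile ("dmb ishst" ::: "memory")
#define my_cio_wmb_arm64()	asm volatile ("dmb oshst" ::: "memory")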

> Best regards,
> Gavin
> 
> > -Original Message-
> > From: Liu, Yong 
> > Sent: Wednesday, September 11, 2019 10:39 AM
> > To: Gavin Hu (Arm Technology China) ; Wang, Yinan
> > ; Maxime Coquelin ;
> > Joyce Kong (Arm Technology China) ; dev@dpdk.org
> > Cc: nd ; Bie, Tiwei ; Wang, Zhihong
> > ; amore...@redhat.com; Wang, Xiao W
> > ; jfreim...@redhat.com; Honnappa Nagarahalli
> > ; Steve Capper 
> > Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for packed
> vring
> > desc avail flags
> >
> >
> >
> > > -Original Message-
> > > From: Gavin Hu (Arm Technology China) [mailto:gavin...@arm.com]
> > > Sent: Tuesday, September 10, 2019 5:49 PM
> > > To: Wang, Yinan ; Maxime Coquelin
> > > ; Joyce Kong (Arm Technology China)
> > > ; dev@dpdk.org
> > > Cc: nd ; Bie, Tiwei ; Wang, Zhihong
> > > ; amore...@redhat.com; Wang, Xiao W
> > > ; Liu, Yong ;
> > > jfreim...@redhat.com; Honnappa Nagarahalli
> > ;
> > > Steve Capper 
> > > Subject: RE: [dpdk-dev] [PATCH v3 1/2] virtio: one way barrier for
> packed
> > > vring desc avail flags
> > >
> > > Hi Yinan,
> > >
> > > We have done a comparative analysis and found with the old code the
> > > if(weak_barriers) and else branches were saved on x86 as rte_smp_wmb
> > and
> > > rte_cio_wmb are identical.
> > > http://git.dpdk.org/dpdk/tree/drivers/net/virtio/virtqueue.h#n49
> > > For the new code, with Joyce's patches applied, the branches were not
> > saved,
> > > which requir additional cpu cycles, this caused slight degradation on
> x86.
> > >
> > > The patches uplifted the performance on aarch64 by about 9% as indicated
> > > in the cover letter. While I am thinking over a solution to the
> > > degradation on x86, could you help answer:
> > > 1. Is rte_cio_wmb sufficient for the non-weak-barrier case (HW
> > > offloading)?
> > > I got this question because I see that in Intel NIC PMDs it is almost
> > > never used; it is rte_wmb that is more widely used to notify the NIC
> > > device. Is there any difference between virtio ring compatible smartNIC
> > > devices (or vDPA?) and i40e-like devices?
> >
> > Hi Gavin,
> > The x86 architecture guarantees that a younger store becomes globally
> > visible after an older store (total store ordering).
> > So rte_cio_wmb is just a compiler barrier on x86.
> >
> > I think a compiler barrier is also enough in the PMD; rte_wmb is used in
> > the PMD because it was inherited from the first implementation :)
> >
> > Thanks,
> > Marvin
> >
> > > 2. If rte_cio_wmb is not sufficient for this case and is replaced by
> > > stronger barriers, like sfence, then the branches will not be saved by
> > > the compiler; the problem then becomes the correct use of barriers,
> > > rather than the degradation.
> > >
> > > Any comments are welcome!
> > >
> > > Best Regards,
> > > Gavin
> > >
> > > > -Original Message-
> > > > From: Wang, Yinan 
> > > > Sent: Tuesday, September 10, 2019
