[dpdk-dev] VFIO no-iommu

2015-12-17 Thread Vincent JARDIN
On 17/12/2015 20:38, Jan Viktorin wrote:
> which platforms (or computer systems) I am targeting?

It is about VMs on IOMMU capable systems. What if you need to use SRIOV 
with IXGBE, or IGB devices?

For some DPDK cases, like Mellanox or virtio, you do not need to use
VFIO/UIO in the guests, so there is no issue. But for some other PMDs, you
need VFIO/UIO.

Best regards,
   Vincent


[dpdk-dev] [PATCH v2 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-17 Thread Stephen Hemminger
On Mon, 14 Dec 2015 09:14:41 +0800
Huawei Xie  wrote:

> v2 changes:
>  unroll the loop a bit to help the performance
> 
> rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.
> 
> There is related thread about this bulk API.
> http://dpdk.org/dev/patchwork/patch/4718/
> Thanks to Konstantin's loop unrolling.
> 
> Signed-off-by: Gerald Rogers 
> Signed-off-by: Huawei Xie 
> Acked-by: Konstantin Ananyev 
> ---
>  lib/librte_mbuf/rte_mbuf.h | 50 
> ++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f234ac9..4e209e0 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1336,6 +1336,56 @@ static inline struct rte_mbuf 
> *rte_pktmbuf_alloc(struct rte_mempool *mp)
>  }
>  
>  /**
> + * Allocate a bulk of mbufs, initialize refcnt and reset the fields to 
> default
> + * values.
> + *
> + *  @param pool
> + *The mempool from which mbufs are allocated.
> + *  @param mbufs
> + *Array of pointers to mbufs
> + *  @param count
> + *Array size
> + *  @return
> + *   - 0: Success
> + */
> +static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
> +  struct rte_mbuf **mbufs, unsigned count)
> +{
> + unsigned idx = 0;
> + int rc;
> +
> + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
> + if (unlikely(rc))
> + return rc;
> +
> + switch (count % 4) {
> + while (idx != count) {
> + case 0:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 3:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 2:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 1:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + }
> + }
> + return 0;
> +}

This is weird. Why not just use Duff's device in a more normal manner?
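
For illustration, a conventional Duff's device keeps a do/while inside the
switch and counts whole passes, instead of wrapping the case labels in a
while loop. The sketch below is self-contained and does not use the DPDK
API: struct item, item_init() and item_init_bulk() are hypothetical
stand-ins for the mbuf, its refcnt/reset step, and the bulk helper. It
guards count == 0 explicitly, because jumping to a case label skips any
loop condition placed in front of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: only the fields this sketch needs. */
struct item {
	int refcnt;
	int data_len;
};

/* Stand-in for the per-object "set refcnt, reset fields" step. */
static inline void item_init(struct item *it)
{
	assert(it->refcnt == 0);	/* object must arrive unused */
	it->refcnt = 1;
	it->data_len = 0;
}

/* Initialize count objects, four per pass, using a classic Duff's device. */
static inline void item_init_bulk(struct item **items, unsigned count)
{
	unsigned idx = 0;
	unsigned passes;

	if (count == 0)
		return;		/* the switch below would otherwise run case 0 */

	passes = (count + 3) / 4;	/* number of trips through the do/while */
	switch (count % 4) {
	case 0: do { item_init(items[idx++]);	/* fall through */
	case 3:      item_init(items[idx++]);	/* fall through */
	case 2:      item_init(items[idx++]);	/* fall through */
	case 1:      item_init(items[idx++]);
		} while (--passes > 0);
	}
}
```

With count = 7 the switch enters at case 3, handles three objects, and the
loop then runs one full pass of four.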



[dpdk-dev] [PATCH v2 0/6] vhost-user live migration support

2015-12-17 Thread Yuanhan Liu
On Thu, Dec 17, 2015 at 12:08:13PM +, Iremonger, Bernard wrote:
> Hi Yuanhan,
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > Sent: Thursday, December 17, 2015 3:12 AM
> > To: dev at dpdk.org
> > Cc: Xie, Huawei ; Michael S. Tsirkin
> > ; Victor Kaplansky ; Iremonger,
> > Bernard ; Pavel Fedin
> > ; Peter Xu ; Yuanhan Liu
> > ; Chen, Zhihui ;
> > Yang, Maggie 
> > Subject: [PATCH v2 0/6] vhost-user live migration support
> > 
> > This patch set adds the vhost-user live migration support.
> > 
> > The major task behind that is to log pages we touched during live migration,
> > including used vring and desc buffer. So, this patch set is basically about
> > adding vhost log support, and using it.
> > 
> > Patchset
> > 
> > - Patch 1 handles VHOST_USER_SET_LOG_BASE, which tells us where
> >   the dirty memory bitmap is.
> > 
> > - Patch 2 introduces a vhost_log_write() helper function to log
> >   pages we are gonna change.
> > 
> > - Patch 3 logs changes we made to used vring.
> > 
> > - Patch 4 logs changes we made to vring desc buffer.
> > 
> > - Patch 5 and 6 add some feature bits related to live migration.
> > 
> >
>  
> The follow test guide should probably be added the DPDK doc files.

Yes, but not this one, which is a fairly rough one. The official one
should do live migration between two hosts.

> It could be added to the sample app guide or the programmers guide.
> There is already a Vhost Library  section in the programmers guide and
> A Vhost Sample Application section in the sample app guide.

We may do it after validation by the validation team.

--yliu
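
For readers following the patch list quoted above, the idea of logging
touched pages into a dirty bitmap can be sketched as below. This is only a
minimal illustration under stated assumptions, not the code from the patch
set: sketch_log_write(), sketch_log_page() and SKETCH_PAGE_SIZE are
hypothetical names, a 4 KiB page size and one bit per page are assumed, and
a bitmap shared with another process would likely need atomic updates,
which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB logging granularity */

/* Mark one page as dirty: one bit per page, eight pages per byte. */
static void sketch_log_page(uint8_t *log_base, uint64_t page)
{
	log_base[page / 8] |= (uint8_t)(1u << (page % 8));
}

/*
 * Mark every page overlapped by [addr, addr + len) as dirty.
 * log_size is the bitmap size in bytes. Returns 0 on success and
 * -1 if the range falls outside the bitmap.
 */
static int sketch_log_write(uint8_t *log_base, uint64_t log_size,
			    uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	assert(log_base != NULL);
	if (len == 0)
		return 0;

	page = addr / SKETCH_PAGE_SIZE;
	last = (addr + len - 1) / SKETCH_PAGE_SIZE;
	if (last / 8 >= log_size)
		return -1;

	for (; page <= last; page++)
		sketch_log_page(log_base, page);
	return 0;
}
```

A write of two bytes starting at address 4095 straddles pages 0 and 1, so
both bits are set.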


[dpdk-dev] VFIO no-iommu

2015-12-17 Thread Jan Viktorin
On Thu, 17 Dec 2015 11:09:23 +0100
Thomas Monjalon  wrote:

> Hi,
> 
> 2015-12-17 09:52, Burakov, Anatoly:
> > > >  > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > > > > So it works.  Is it acceptable?  Useful?  Sufficiently complete?
> > > > > > Does it imply deprecating the uio interface?  I believe the
> > > > > > feature that started this discussion was support for MSI/X
> > > > > > interrupts so that VFs can support some kind of interrupt (uio
> > > > > > only supports INTx since it doesn't allow DMA).  Implementing that
> > > > > > would be the ultimate test of whether this provides dpdk with not
> > > > > > only a more consistent interface, but the feature dpdk wants
> > > > > > that's missing in uio. Thanks,  
> > > >
> > > > Ferruh has done a great job so far testing Alex's patch, very few 
> > > > changes  
> > > from DPDK side seem to be required as far as existing functionality goes 
> > > (not
> > > sure about VF interrupts mentioned by Alex). However, one thing that
> > > concerns me is usability. While it is true that no-IOMMU mode in VFIO 
> > > would
> > > mean uio interfaces could be deprecated in time, the no-iommu mode is way
> > > more hassle than using igb_uio/uio_pci_generic because it will require a
> > > kernel recompile as opposed to simply compiling and insmod'ding an out-of-
> > > tree driver. So, in essence, if you don't want an IOMMU, it's becoming 
> > > that
> > > much harder to use DPDK. Would that be something DPDK is willing to live
> > > with in the absence of uio interfaces?
> > > 
> > > Excuse me if I missed something obvious.
> > > Why a kernel compilation is needed?  
> > 
> > Well, not really full kernel compilation, but in the default configuration, 
> > VFIO driver would not support NOIOMMU mode. I.e. it's not compiled by 
> > default. Support for no-iommu should be enabled in kernel config and 
> > compiled in. So, whoever is going to use DPDK with VFIO-no-iommu will have 
> > to download kernel tree and recompile the VFIO module and install it. 
> > That's obviously way more hassle than simply compiling an out-of-tree 
> > driver that's already included and works with an out-of-the-box kernel.  
> 
> The "out-of-the-box kernel" is configured by your distribution.
> So we don't know yet what will be their choice.
> If the distribution supports DPDK, it should be enabled.

I have a question as I am not involved in all possible DPDK
configurations, platforms, etc. and not yet very involved in vfio. What
are the devices which do not have IOMMU? If I have, say, DPDK 2.3 with
vfio-noiommu, which platforms (or computer systems) I am targeting?

Would it be an Intel-based system? Would it be PPC8, ARM?

If it is ARMv7... I would say that the fact I have to explicitly enable
the no-IOMMU feature and rebuild the kernel (or whatever) is just OK. As
for such systems, it is common to have a quite customized OS. Well,
the big distributions are able to run on those devices, that's true...
However, in such case, the users are usually skilled enough to take
care of having their own special Linux kernel.

So, is the fact the distributions would not support the no-IOMMU setup
in their default configuration really an issue? Will some very common
Intel/DPDK-based box need this?

Regards
Jan


[dpdk-dev] [PATCH] Unlink existing unused sockets at start up

2015-12-17 Thread Yuanhan Liu
On Wed, Dec 16, 2015 at 11:21:02PM -0500, Zhihong Wang wrote:
> This patch unlinks existing unused sockets (which cause new bindings to fail, 
> e.g. vHost PMD) to ensure smooth startup.
> In a lot of cases DPDK applications are terminated abnormally without proper 
> resource release. Therefore, DPDK libs should be able to deal with unclean 
> boot environment.

No, I thought we had made it clear that a library should not remove a
file given by the application; the application should.


(BTW, please wrap your commit log at 80 chars).

--yliu
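
As a rough illustration of doing the cleanup on the application side, the
sketch below removes a leftover socket file before the path is handed to a
library. It uses only POSIX calls (lstat, unlink); the function name
app_remove_stale_socket() is hypothetical, and the check is deliberately
simple: it verifies that the path is a socket but does not test whether
another process is still listening on it.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Remove a leftover UNIX socket file at path.
 * Returns 0 if the path is absent or was removed, -1 otherwise
 * (for example when the path exists but is not a socket).
 */
static int app_remove_stale_socket(const char *path)
{
	struct stat st;

	assert(path != NULL);

	if (lstat(path, &st) != 0)
		return errno == ENOENT ? 0 : -1;

	if (!S_ISSOCK(st.st_mode)) {
		errno = EEXIST;	/* refuse to delete anything but a socket */
		return -1;
	}

	return unlink(path);
}
```

An application would call this just before passing the socket path to the
vhost library or PMD, so the library itself never deletes files.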


[dpdk-dev] [PATCH v5 3/3] vhost: Add helper function to convert port id to virtio device pointer

2015-12-17 Thread Yuanhan Liu
On Tue, Nov 24, 2015 at 06:00:03PM +0900, Tetsuya Mukawa wrote:
> This helper function is used to convert port id to virtio device
> pointer. To use this function, a port should be managed by vhost PMD.
> After getting virtio device pointer, it can be used for calling vhost
> library APIs.

I'm wondering why that is necessary. I mean, can we simply treat it as
a normal PMD, and not consider any vhost lib functions any more while
using the vhost PMD?

--yliu


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-17 Thread Yuanhan Liu
On Tue, Nov 24, 2015 at 06:00:01PM +0900, Tetsuya Mukawa wrote:
> The vhost PMD will be a wrapper of vhost library, but some of vhost
> library APIs cannot be mapped to ethdev library APIs.
> Because of this, in some cases, we still need to use vhost library APIs
> for a port created by the vhost PMD.
> 
> Currently, when virtio device is created and destroyed, vhost library
> will call one of callback handlers. The vhost PMD need to use this
> pair of callback handlers to know which virtio devices are connected
> actually.
> Because we can register only one pair of callbacks to vhost library, if
> the PMD use it, DPDK applications cannot have a way to know the events.
> 
> This may break legacy DPDK applications that uses vhost library. To prevent
> it, this patch adds one more pair of callbacks to vhost library especially
> for the vhost PMD.
> With the patch, legacy applications can use the vhost PMD even if they need
> additional specific handling for virtio device creation and destruction.
> 
> For example, legacy application can call
> rte_vhost_enable_guest_notification() in callbacks to change setting.

TBH, I have never liked it. Introducing two callbacks for one event is
a bit messy, and therefore error prone.

I have been thinking about this occasionally over the last few weeks, and
have come up with an idea: we may introduce another layer of callbacks
based on the vhost PMD itself, through a new API:

rte_eth_vhost_register_callback().

We would then call those new callbacks inside the vhost PMD's new_device()
and destroy_device() implementations.

We could have the same callbacks as vhost has, but I think that
new_device() and destroy_device() don't sound like good names for a
PMD. Maybe a name like "link_state_changed" is better?

What do you think of that?


On the other hand, I'm still wondering whether it is really necessary to
let the application call vhost functions like
rte_vhost_enable_guest_notification() with the vhost PMD.

--yliu
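
The layering suggested above (the PMD owns the library callbacks and
re-exports its own event to the application) could look roughly like the
sketch below. Everything here is hypothetical and self-contained: the ex_
names, the event enum and the fixed-size table are assumptions made for
illustration and are not the proposed rte_eth_vhost_register_callback()
API, whose signature is not given in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define EX_MAX_PORTS 4	/* assumption: small fixed port table */

/* Hypothetical PMD-level event, named after the link-state idea. */
enum ex_event { EX_LINK_UP = 0, EX_LINK_DOWN = 1 };

typedef void (*ex_event_cb)(unsigned port_id, enum ex_event ev, void *arg);

struct ex_cb_slot {
	ex_event_cb cb;
	void *arg;
};

static struct ex_cb_slot ex_cb_table[EX_MAX_PORTS];

/* Application-facing registration: one callback per port. */
static int ex_register_callback(unsigned port_id, ex_event_cb cb, void *arg)
{
	if (port_id >= EX_MAX_PORTS)
		return -1;
	ex_cb_table[port_id].cb = cb;
	ex_cb_table[port_id].arg = arg;
	return 0;
}

/* Forward an event to the application, if it registered a callback. */
static void ex_notify(unsigned port_id, enum ex_event ev)
{
	assert(port_id < EX_MAX_PORTS);
	if (ex_cb_table[port_id].cb != NULL)
		ex_cb_table[port_id].cb(port_id, ev, ex_cb_table[port_id].arg);
}

/* Stand-ins for the PMD's own handlers of the library callbacks. */
static void ex_pmd_new_device(unsigned port_id)
{
	ex_notify(port_id, EX_LINK_UP);
}

static void ex_pmd_destroy_device(unsigned port_id)
{
	ex_notify(port_id, EX_LINK_DOWN);
}

/* Example application callback: count events per type. */
static void ex_count_events(unsigned port_id, enum ex_event ev, void *arg)
{
	int *counters = arg;

	(void)port_id;
	counters[ev]++;
}
```

The point of the design is that the library still sees a single pair of
callbacks (the PMD's), while the application gets one event per port.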


[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Successfully setup DPDK OVS with vhostuser

2015-12-17 Thread Abhijeet Karve
Hi Przemek,

Thank you so much for sharing the ref guide.

I would appreciate it if you could clear up one doubt.

At present we are setting up OpenStack Kilo interactively and then
replacing OVS with a DPDK-enabled OVS.
Once that setup is done, we create an instance in OpenStack and pass
that instance ID to the QEMU command line, which in turn passes the
vhost-user sockets to the instances, enabling the DPDK libraries in them.

Isn't this the correct way of integrating ovs-dpdk with OpenStack?


Thanks & Regards
Abhijeet Karve




From:   "Czesnowicz, Przemyslaw" 
To: Abhijeet Karve 
Cc: "dev at dpdk.org" , "discuss at openvswitch.org" 
, "Gray, Mark D" 
Date:   12/17/2015 05:27 PM
Subject:RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser



Hi Abhijeet,

For Kilo you need to use ovsdpdk mechanism driver and a matching agent to 
integrate ovs-dpdk with OpenStack.

The guide you are following only talks about running ovs-dpdk, not how it
should be integrated with OpenStack.

Please follow this guide:
https://github.com/openstack/networking-ovs-dpdk/blob/stable/kilo/doc/source/getstarted/ubuntu.rst

Best regards
Przemek


From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com] 
Sent: Wednesday, December 16, 2015 9:37 AM
To: Czesnowicz, Przemyslaw
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser

Hi Przemek, 


We have configured the accelerated data path between a physical interface
and the VM using openvswitch netdev-dpdk with vhost-user support. I am
calling a VM created with this special data path and vhost library a
DPDK instance.

If we assign an IP manually to the newly created Cirros VM instances, we
are able to make 2 VMs communicate on the same compute node. Otherwise no
IP is assigned through DHCP, even though DHCP runs on the compute node itself.

Yes, it's a compute + controller node setup and we are using the following
software platform on the compute node:
_ 
Openstack: Kilo 
Distribution: Ubuntu 14.04 
OVS Version: 2.4.0 
DPDK 2.0.0 
_ 

We are following the Intel guide
https://software.intel.com/en-us/blogs/2015/06/09/building-vhost-user-for-ovs-today-using-dpdk-200
 


Running "ovs-vsctl show" on the compute node shows the output below:
_ 
ovs-vsctl show 
c2ec29a5-992d-4875-8adc-1265c23e0304
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-int
        fail_mode: secure
        Port "qvo0ae19a43-b6"
            tag: 2
            Interface "qvo0ae19a43-b6"
        Port br-int
            Interface br-int
                type: internal
        Port "qvo31c89856-a2"
            tag: 1
            Interface "qvo31c89856-a2"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qvo97fef28a-ec"
            tag: 2
            Interface "qvo97fef28a-ec"
    Bridge br-dpdk
        Port br-dpdk
            Interface br-dpdk
                type: internal
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "vhost-user-2"
            Interface "vhost-user-2"
                type: dpdkvhostuser
        Port "vhost-user-0"
            Interface "vhost-user-0"
                type: dpdkvhostuser
        Port "vhost-user-1"
            Interface "vhost-user-1"
                type: dpdkvhostuser
    ovs_version: "2.4.0"
root at dpdk:~# 
_ 

The OpenFlow flows on the bridge on the compute node are as below:
_ 
root at dpdk:~# ovs-ofctl dump-flows br-tun 
NXST_FLOW reply (xid=0x4): 
 cookie=0x0, duration=71796.741s, table=0, n_packets=519, n_bytes=33794, 
idle_age=19982, hard_age=65534, priority=1,in_port=1 actions=resubmit(,2) 
 cookie=0x0, duration=71796.700s, table=0, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, priority=0 actions=drop 
 cookie=0x0, duration=71796.649s, table=2, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, 
priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 
actions=resubmit(,20) 
 

[dpdk-dev] [ [PATCH v2] 01/13] virtio: Introduce config RTE_VIRTIO_INC_VECTOR

2015-12-17 Thread Santosh Shukla
On Thu, Dec 17, 2015 at 5:33 PM, Thomas Monjalon
 wrote:
> 2015-12-17 17:32, Santosh Shukla:
>> On Mon, Dec 14, 2015 at 6:30 PM, Santosh Shukla  
>> wrote:
>> > virtio_recv_pkts_vec and other virtio vector friend apis are written for 
>> > sse/avx
>> > instructions. For arm64 in particular, virtio vector implementation does 
>> > not
>> > exist(todo).
>> >
>> > So virtio pmd driver wont build for targets like i686, arm64.  By making
>> > RTE_VIRTIO_INC_VECTOR=n, Driver can build for non-sse/avx targets and will 
>> > work
>> > in non-vectored virtio mode.
>> >
>> > Signed-off-by: Santosh Shukla 
>> > ---
>>
>> Ping?
>>
>> any review  / comment on this patch much appreciated. Thanks
>
> Why not check for SSE/AVX support instead of adding yet another config option?

OK. Keeping a check for sse/avx across the patch won't hold true for
future virtio vectored implementations, let's say for the arm/arm64 case,
i.e. sse2neon types. That implies the user is supposed to keep on appending
checks, for sse2neon for example, and so forth.

On the other hand, the motivation for the INC_VEC config was inspired
by IXGBE and other PMDs, which support a vectored sse/avx rx path
and can also work without vectored mode. Current virtio is missing such
support, and arm doesn't have a vectored sse2neon-type implementation right
now, so it's a blocker for the arm case. Also, keeping the virtio PMD
flexible enough to work in non-vectored mode is a requirement and a
feature.
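
To make the argument concrete, the sketch below shows the general pattern
of a build-time switch that compiles the vectored path only when a config
macro is set and otherwise falls back to a scalar path. It is a
self-contained illustration, not the virtio PMD code: EXAMPLE_INC_VECTOR,
rx_burst_fn, rx_burst_scalar(), rx_burst_vec() and select_rx_burst() are
hypothetical names, and the burst functions are empty stubs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical burst-receive signature used only by this sketch. */
typedef uint16_t (*rx_burst_fn)(void *rxq, void **pkts, uint16_t n);

/* Scalar path: always built, works on every architecture. Stub body. */
static uint16_t rx_burst_scalar(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	assert(pkts != NULL || n == 0);
	return 0;	/* stub: no packets received */
}

#ifdef EXAMPLE_INC_VECTOR
/* Vectored path: only built when the config option is enabled. Stub body. */
static uint16_t rx_burst_vec(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	(void)n;
	return 0;	/* stub: a real version would use SIMD intrinsics */
}
#endif

/*
 * Pick the receive function. simple_ok stands for the runtime condition
 * "single segment and no offloads"; the vectored path is considered only
 * when it was compiled in.
 */
static rx_burst_fn select_rx_burst(int simple_ok)
{
#ifdef EXAMPLE_INC_VECTOR
	if (simple_ok)
		return rx_burst_vec;
#else
	(void)simple_ok;
#endif
	return rx_burst_scalar;
}
```

With one macro per driver, a new architecture only has to set the option to
n until it has its own vectored implementation, instead of extending an
instruction-set check in the code.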


[dpdk-dev] [PATCH] eal: map io resources for non x86 architectures

2015-12-17 Thread Yuanhan Liu
On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>  wrote:
> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
> >> x86 requires a special set of instructions to access ioports, but other
> >> architectures let you remap io resources.
> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
> >> architectures other than x86.
> >
> > One question: this patch could be a replacement of the igbuio_iomap patch
> > from Santosh? If so, I like it: It's more elegant.
> >
> > --yliu
> >
> 
> I did tried similar in past but not in parse_sysfs (such that
> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
> pci_map_resource not able to map address hence segfault at tespmd
> initialization.
> 
> i was getting these:
> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c0, 0x20, 0x0):
> Invalid argument (0x)

That's because ARM (at least the kernel) doesn't allow an IO map:

arch/arm/kernel/bios32.c

int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
                        enum pci_mmap_state mmap_state, int write_combine)
{
        if (mmap_state == pci_mmap_io)
                return -EINVAL;

And with a quick glimpse of powerpc, I see no such limitation. Hence,
this piece of code may work only on the PowerPC platform (and maybe a few
others we don't care about).

So, apparently, this will not work for ARM.

--yliu
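
For context, the user-space side of such a mapping is essentially an
open() followed by an mmap() of a device resource file, which is why a
kernel-side -EINVAL surfaces as "cannot mmap ... Invalid argument" in the
log quoted above. The sketch below is a minimal illustration under stated
assumptions: map_resource_file() is a hypothetical helper, not the EAL
function, and the path passed in is up to the caller.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map len bytes of the file at path, read/write and shared.
 * Returns the mapping address, or NULL if open() or mmap() fails;
 * errno then holds the reason (EINVAL when the kernel refuses the
 * mapping, as in the ARM I/O-port case discussed above).
 */
static void *map_resource_file(const char *path, size_t len)
{
	void *addr;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after the descriptor is closed */

	return addr == MAP_FAILED ? NULL : addr;
}
```

A caller that gets NULL back can inspect errno to tell a missing file from
a mapping the architecture does not allow.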


[dpdk-dev] [ [PATCH v2] 02/13] config: i686: set RTE_VIRTIO_INC_VECTOR=n

2015-12-17 Thread Santosh Shukla
On Mon, Dec 14, 2015 at 6:30 PM, Santosh Shukla  wrote:
> i686 target config example:
> config/defconfig_i686-native-linuxapp-gcc says "Vectorized PMD is not 
> supported
> on 32-bit".
>
> So setting RTE_VIRTIO_INC_VECTOR to 'n'.
>
> Signed-off-by: Santosh Shukla 
> ---

ping? review comment please.

>  config/defconfig_i686-native-linuxapp-gcc |1 +
>  config/defconfig_i686-native-linuxapp-icc |1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/config/defconfig_i686-native-linuxapp-gcc 
> b/config/defconfig_i686-native-linuxapp-gcc
> index a90de9b..a4b1c49 100644
> --- a/config/defconfig_i686-native-linuxapp-gcc
> +++ b/config/defconfig_i686-native-linuxapp-gcc
> @@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
>  # Vectorized PMD is not supported on 32-bit
>  #
>  CONFIG_RTE_IXGBE_INC_VECTOR=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> diff --git a/config/defconfig_i686-native-linuxapp-icc 
> b/config/defconfig_i686-native-linuxapp-icc
> index c021321..f8eb6ad 100644
> --- a/config/defconfig_i686-native-linuxapp-icc
> +++ b/config/defconfig_i686-native-linuxapp-icc
> @@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
>  # Vectorized PMD is not supported on 32-bit
>  #
>  CONFIG_RTE_IXGBE_INC_VECTOR=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=n
> --
> 1.7.9.5
>


[dpdk-dev] [ [PATCH v2] 01/13] virtio: Introduce config RTE_VIRTIO_INC_VECTOR

2015-12-17 Thread Santosh Shukla
On Mon, Dec 14, 2015 at 6:30 PM, Santosh Shukla  wrote:
> virtio_recv_pkts_vec and other virtio vector friend apis are written for 
> sse/avx
> instructions. For arm64 in particular, virtio vector implementation does not
> exist(todo).
>
> So virtio pmd driver wont build for targets like i686, arm64.  By making
> RTE_VIRTIO_INC_VECTOR=n, Driver can build for non-sse/avx targets and will 
> work
> in non-vectored virtio mode.
>
> Signed-off-by: Santosh Shukla 
> ---

Ping?

any review  / comment on this patch much appreciated. Thanks

>  config/common_linuxapp   |1 +
>  drivers/net/virtio/Makefile  |2 +-
>  drivers/net/virtio/virtio_rxtx.c |7 +++
>  3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index ba9e55d..275fb40 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -273,6 +273,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DRIVER=n
>  CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
> +CONFIG_RTE_VIRTIO_INC_VECTOR=y
>
>  #
>  # Compile burst-oriented VMXNET3 PMD driver
> diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
> index 43835ba..25a842d 100644
> --- a/drivers/net/virtio/Makefile
> +++ b/drivers/net/virtio/Makefile
> @@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
> -SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
> +SRCS-$(CONFIG_RTE_VIRTIO_INC_VECTOR) += virtio_rxtx_simple.c
>
>  # this lib depends upon:
>  DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
> diff --git a/drivers/net/virtio/virtio_rxtx.c 
> b/drivers/net/virtio/virtio_rxtx.c
> index 74b39ef..23be1ff 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -438,7 +438,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,
>
> dev->data->rx_queues[queue_idx] = vq;
>
> +#ifdef RTE_VIRTIO_INC_VECTOR
> virtio_rxq_vec_setup(vq);
> +#endif
>
> return 0;
>  }
> @@ -464,7 +466,10 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
> const struct rte_eth_txconf *tx_conf)
>  {
> uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
> +
> +#ifdef RTE_VIRTIO_INC_VECTOR
> struct virtio_hw *hw = dev->data->dev_private;
> +#endif
> struct virtqueue *vq;
> uint16_t tx_free_thresh;
> int ret;
> @@ -477,6 +482,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
> return -EINVAL;
> }
>
> +#ifdef RTE_VIRTIO_INC_VECTOR
> /* Use simple rx/tx func if single segment and no offloads */
> if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS 
> &&
>  !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
> @@ -485,6 +491,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
> dev->rx_pkt_burst = virtio_recv_pkts_vec;
> use_simple_rxtx = 1;
> }
> +#endif
>
> ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, 
> vtpci_queue_idx,
> nb_desc, socket_id, );
> --
> 1.7.9.5
>


[dpdk-dev] [PATCH] doc: fix missing link target

2015-12-17 Thread Thomas Monjalon
> > Fix missing link in the Linux GSG, accidentally removed in previous merge:
> > 
> >   WARNING: undefined label: linux_gsg_compiling_dpdk
> >   Fixes: 29c673401c4d ("doc: improve Linux guide layout")
> > 
> > Signed-off-by: John McNamara 
> Acked-by: Bernard Iremonger 

Applied, thanks


[dpdk-dev] [PATCH] doc: remove DPDK from guide titles

2015-12-17 Thread Thomas Monjalon
> > In HTML and PDF guides, it is clear in the header that the doc is related
> > to the DPDK.
> > So "DPDK" is redundant and can be removed from FAQ and release notes
> > titles to improve consistency.
> > 
> > Signed-off-by: Thomas Monjalon 
> 
> Good point.
> 
> Acked-by: John McNamara 

Applied


[dpdk-dev] [PATCH] librte_ether: fix crashes in rte_ethdev functions.

2015-12-17 Thread Bernard Iremonger
The nb_rx_queues and nb_tx_queues are initialised before
the tx_queue and rx_queue arrays are allocated. The arrays
are allocated when the ethdev port is started.

If any of the following functions are called before the ethdev
port is started there is a segmentation fault:

rte_eth_stats_get
rte_eth_stats_reset
rte_eth_xstats_get
rte_eth_xstats_reset

Fixes: af75078fece3 ("first public release")
Fixes: ce757f5c9a4d ("ethdev: new method to retrieve extended statistics")
Fixes: d4fef8b0d5e5 ("ethdev: expose generic and driver specific stats in 
xstats")
Signed-off-by: Bernard Iremonger 
---
 lib/librte_ether/rte_ethdev.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..a0ee84d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1441,7 +1441,10 @@ rte_eth_stats_get(uint8_t port_id, struct rte_eth_stats 
*stats)
memset(stats, 0, sizeof(*stats));

RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_get, -ENOTSUP);
-   (*dev->dev_ops->stats_get)(dev, stats);
+
+   if (dev->data->dev_started)
+   (*dev->dev_ops->stats_get)(dev, stats);
+
stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
return 0;
 }
@@ -1455,7 +1458,10 @@ rte_eth_stats_reset(uint8_t port_id)
dev = _eth_devices[port_id];

RTE_FUNC_PTR_OR_RET(*dev->dev_ops->stats_reset);
-   (*dev->dev_ops->stats_reset)(dev);
+
+   if (dev->data->dev_started)
+   (*dev->dev_ops->stats_reset)(dev);
+
dev->data->rx_mbuf_alloc_failed = 0;
 }

@@ -1479,7 +1485,8 @@ rte_eth_xstats_get(uint8_t port_id, struct rte_eth_xstats 
*xstats,
(dev->data->nb_tx_queues * RTE_NB_TXQ_STATS);

/* implemented by the driver */
-   if (dev->dev_ops->xstats_get != NULL) {
+   if ((dev->dev_ops->xstats_get != NULL) &&
+   (dev->data->dev_started)) {
/* Retrieve the xstats from the driver at the end of the
 * xstats struct.
 */
@@ -1548,7 +1555,8 @@ rte_eth_xstats_reset(uint8_t port_id)
dev = _eth_devices[port_id];

/* implemented by the driver */
-   if (dev->dev_ops->xstats_reset != NULL) {
+   if ((dev->dev_ops->xstats_reset != NULL) &&
+   (dev->data->dev_started)) {
(*dev->dev_ops->xstats_reset)(dev);
return;
}
-- 
2.6.3



[dpdk-dev] [PATCH] eal: map io resources for non x86 architectures

2015-12-17 Thread Santosh Shukla
On Thu, Dec 17, 2015 at 4:03 PM, Thomas Monjalon
 wrote:
> 2015-12-17 15:51, Santosh Shukla:
>> On Thu, Dec 17, 2015 at 3:44 PM, Thomas Monjalon
>>  wrote:
>> > Hi,
>> >
>> > 2015-12-17 15:37, Santosh Shukla:
>> >> On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla  
>> >> wrote:
>> >> > On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla  
>> >> > wrote:
>> >> >> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
>> >> >>  wrote:
>> >> >>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>> >>  On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>> >>   wrote:
>> >>  > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>> >>  >> x86 requires a special set of instructions to access ioports, but 
>> >>  >> other
>> >>  >> architectures let you remap io resources.
>> >>  >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>> >>  >> architectures other than x86.
>> >>  >
>> >>  > One question: this patch could be a replacement of the 
>> >>  > igbuio_iomap patch
>> >>  > from Santosh? If so, I like it: It's more elegant.
>> >>  >
>> >>  > --yliu
>> >>  >
>> >> 
>> >>  I did tried similar in past but not in parse_sysfs (such that
>> >>  mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>> >>  pci_map_resource not able to map address hence segfault at tespmd
>> >>  initialization.
>> >> 
>> >>  i was getting these:
>> >>  EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c0, 0x20, 0x0):
>> >>  Invalid argument (0x)
>> >> >>>
>> >> >>> That's because ARM (at least the kernel) doesn't allow an IO map:
>> >> >>>
>> >> >>> arch/arm/kernel/bios32.c
>> >> >>> 
>> >> >>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct 
>> >> >>> vm_area_struct *vma,
>> >> >>> 619 enum pci_mmap_state mmap_state, int 
>> >> >>> write_combine)
>> >> >>> 620 {
>> >> >>> 621 if (mmap_state == pci_mmap_io)
>> >> >>> 622 return -EINVAL;
>> >> >>>
>> >> >>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
>> >> >>> this peice of code may work only on Powerpc platform (and maybe a few
>> >> >>> others we don't care).
>> >> >>>
>> >> >>> So, apparently, this will not work for ARM.
>> >> >>>
>> >> >>
>> >> >> Right and I did shared detailed explanation on why it wont work on
>> >> >> this link [1], infact this patch shouldn;t work for mips too.
>> >> >>
>> >> >> As I mentioned earlier I did tried similar approach and so to get
>> >> >> everything working like iomem is currently in dpdk; we need to add
>> >> >> something like pci_remap_iospace --> ioremap_page_range() but this api
>> >> >> not really pci_mmap_page_range types. user need to write more code on
>> >> >> top so to use this api efficiently, also this api looks like meant to
>> >> >> use by arch file only in kernel space.
>> >> >>
>> >> >>
>> >> > missed link;
>> >> >
>> >> > [1] http://dpdk.org/dev/patchwork/patch/9365/
>> >> >
>> >>
>> >> IMO, it is worth keeping one special device file who could work across
>> >> archs like arm/arm64/powerpc and others, who could map iopci bar to
>> >> dpdk user-space. also this approach has no kernel version dependency
>> >> too. BTW; I did mentioned in second approach in to add /dev/ioport
>> >> interface in drivers/char/mem.c which could read more than byte in one
>> >> single operation, but that has kernel dependency. However that
>> >> approach too is arch agnostic.
>> >
>> > Your first approach use an out-of-tree kernel module (igb_uio), so we 
>> > cannot
>> > really say there is no kernel dependency.
>>
>> Agree but I mentioned kernel __version__ dependency.
>
> Yes you did.
> One of the main issue with out-of-tree kernel modules is the version
> dependency. Probably that igb_uio from DPDK 2.3 will not compile with
> the kernel 5.0.
>

I don't know the kernel 5.0 feature list, so I guess you may be right. Is
uio obsoleted for the 5.0 kernel?

>> > We should try to remove the need for any out-of-tree kernel module.
>> > That's why the Linux upstream approach is a better solution.
>>
>> IIUC, your suggesting archs like arm/arm64 to support io_mappe_io in
>> pci_mmap_page_range()?
>
> I don't know what is the best solution in the kernel.
> First we need to be sure that there is absolutely no solution without
> kernel changes.

I guess we have done enough evaluation / investigation to suggest that,
to map the iopci region to userspace in an arch-agnostic way:

# either we need to modify the kernel
   - Make sure all the non-x86 archs support mapping of the
iopci region (i.e. pci_mmap_page_range). I don't think it's a correct
approach though.
or
   - include a /dev/ioport char-mem device file which could do
more than byte operations. Note that this implementation does not exist
in the kernel. I could send an RFC to lkml.

OR keep the device file in user space (current approach)

[dpdk-dev] [PATCH 1/2] version: 2.3.0-rc0

2015-12-17 Thread Thomas Monjalon
> > Signed-off-by: Thomas Monjalon 
> 
> Acked-by: John McNamara 

Series applied, thanks


[dpdk-dev] [PATCH] eal: map io resources for non x86 architectures

2015-12-17 Thread Santosh Shukla
On Thu, Dec 17, 2015 at 3:44 PM, Thomas Monjalon
 wrote:
> Hi,
>
> 2015-12-17 15:37, Santosh Shukla:
>> On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla  
>> wrote:
>> > On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla  
>> > wrote:
>> >> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
>> >>  wrote:
>> >>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>> >>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>> >>>>  wrote:
>> >>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>> >>>> >> x86 requires a special set of instructions to access ioports, but
>> >>>> >> other architectures let you remap io resources.
>> >>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>> >>>> >> architectures other than x86.
>> >>>> >
>> >>>> > One question: this patch could be a replacement of the igbuio_iomap
>> >>>> > patch from Santosh? If so, I like it: It's more elegant.
>> >>>> >
>> >>>> > --yliu
>> >>>> >
>> >>>>
>> >>>> I did tried similar in past but not in parse_sysfs (such that
>> >>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>> >>>> pci_map_resource not able to map address hence segfault at tespmd
>> >>>> initialization.
>> >>>>
>> >>>> i was getting these:
>> >>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c0, 0x20, 0x0):
>> >>>> Invalid argument (0x)
>> >>>
>> >>> That's because ARM (at least the kernel) doesn't allow an IO map:
>> >>>
>> >>> arch/arm/kernel/bios32.c
>> >>> 
>> >>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct 
>> >>> *vma,
>> >>> 619 enum pci_mmap_state mmap_state, int 
>> >>> write_combine)
>> >>> 620 {
>> >>> 621 if (mmap_state == pci_mmap_io)
>> >>> 622 return -EINVAL;
>> >>>
>> >>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
>> >>> this peice of code may work only on Powerpc platform (and maybe a few
>> >>> others we don't care).
>> >>>
>> >>> So, apparently, this will not work for ARM.
>> >>>
>> >>
>> >> Right and I did shared detailed explanation on why it wont work on
>> >> this link [1], infact this patch shouldn;t work for mips too.
>> >>
>> >> As I mentioned earlier I did tried similar approach and so to get
>> >> everything working like iomem is currently in dpdk; we need to add
>> >> something like pci_remap_iospace --> ioremap_page_range() but this api
>> >> not really pci_mmap_page_range types. user need to write more code on
>> >> top so to use this api efficiently, also this api looks like meant to
>> >> use by arch file only in kernel space.
>> >>
>> >>
>> > missed link;
>> >
>> > [1] http://dpdk.org/dev/patchwork/patch/9365/
>> >
>>
>> IMO, it is worth keeping one special device file who could work across
>> archs like arm/arm64/powerpc and others, who could map iopci bar to
>> dpdk user-space. also this approach has no kernel version dependency
>> too. BTW; I did mentioned in second approach in to add /dev/ioport
>> interface in drivers/char/mem.c which could read more than byte in one
>> single operation, but that has kernel dependency. However that
>> approach too is arch agnostic.
>
> Your first approach use an out-of-tree kernel module (igb_uio), so we cannot
> really say there is no kernel dependency.

Agreed, but I mentioned kernel __version__ dependency.

> We should try to remove the need for any out-of-tree kernel module.
> That's why the Linux upstream approach is a better solution.

IIUC, you're suggesting that archs like arm/arm64 should support io_mappe_io in
pci_mmap_page_range()?


[dpdk-dev] [PATCH v2 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-17 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Thursday, December 17, 2015 6:41 AM
> To: Xie, Huawei
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] mbuf: provide rte_pktmbuf_alloc_bulk 
> API
> 
> On Mon, Dec 14, 2015 at 09:14:41AM +0800, Huawei Xie wrote:
> > v2 changes:
> >  unroll the loop a bit to help the performance
> >
> > rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.
> >
> > There is related thread about this bulk API.
> > http://dpdk.org/dev/patchwork/patch/4718/
> > Thanks to Konstantin's loop unrolling.
> >
> > Signed-off-by: Gerald Rogers 
> > Signed-off-by: Huawei Xie 
> > Acked-by: Konstantin Ananyev 
> > ---
> >  lib/librte_mbuf/rte_mbuf.h | 50 
> > ++
> >  1 file changed, 50 insertions(+)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index f234ac9..4e209e0 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -1336,6 +1336,56 @@ static inline struct rte_mbuf 
> > *rte_pktmbuf_alloc(struct rte_mempool *mp)
> >  }
> >
> >  /**
> > + * Allocate a bulk of mbufs, initialize refcnt and reset the fields to 
> > default
> > + * values.
> > + *
> > + *  @param pool
> > + *The mempool from which mbufs are allocated.
> > + *  @param mbufs
> > + *Array of pointers to mbufs
> > + *  @param count
> > + *Array size
> > + *  @return
> > + *   - 0: Success
> > + */
> > +static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
> > +struct rte_mbuf **mbufs, unsigned count)
> 
> It violates the coding style a bit.
> 
> > +{
> > +   unsigned idx = 0;
> > +   int rc;
> > +
> > +   rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
> > +   if (unlikely(rc))
> > +   return rc;
> > +
> > +   switch (count % 4) {
> > +   while (idx != count) {
> 
> Well, that's an awkward trick, putting while between switch and case.
> 
> How about moving the whole switch block ahead, and use goto?
> 
>   switch (count % 4) {
>   case 3:
>   goto __3;
>   break;
>   case 2:
>   goto __2;
>   break;
>   ...
> 
>   }
> 
> It basically generates same instructions, yet it improves the
> readability a bit.

I am personally not a big fan of gotos, unless they are totally unavoidable.
I think the switch/while construction is pretty obvious these days.
For me the original variant looks cleaner, so my vote would be to stick with it.
Konstantin

> 
>   --yliu
> 
> > +   case 0:
> > +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> > +   rte_mbuf_refcnt_set(mbufs[idx], 1);
> > +   rte_pktmbuf_reset(mbufs[idx]);
> > +   idx++;
> > +   case 3:
> > +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> > +   rte_mbuf_refcnt_set(mbufs[idx], 1);
> > +   rte_pktmbuf_reset(mbufs[idx]);
> > +   idx++;
> > +   case 2:
> > +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> > +   rte_mbuf_refcnt_set(mbufs[idx], 1);
> > +   rte_pktmbuf_reset(mbufs[idx]);
> > +   idx++;
> > +   case 1:
> > +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> > +   rte_mbuf_refcnt_set(mbufs[idx], 1);
> > +   rte_pktmbuf_reset(mbufs[idx]);
> > +   idx++;
> > +   }
> > +   }
> > +   return 0;
> > +}
> > +
> > +/**
> >   * Attach packet mbuf to another packet mbuf.
> >   *
> >   * After attachment we refer the mbuf we attached as 'indirect',
> > --
> > 1.8.1.4
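
For reference, a minimal sketch of the switch/while construction being
discussed, reduced to a plain int array. init_one() and init_bulk() are
hypothetical names used only for illustration, not DPDK API. One deliberate
difference from the quoted v2 patch: here `case 0:` sits in front of the
`while`, on the assumption that a zero count should evaluate the loop
condition first instead of jumping straight into the loop body.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-mbuf refcnt/reset work. */
static inline void init_one(int *slot)
{
	*slot = 1;
}

/*
 * Initialise `count` slots, four per loop iteration. The switch jumps
 * into the middle of the loop body to handle count % 4 leftovers first.
 */
static void init_bulk(int *slots, unsigned count)
{
	unsigned idx = 0;

	switch (count % 4) {
	case 0:
		while (idx != count) {
			init_one(&slots[idx++]);
			/* fall through */
	case 3:
			init_one(&slots[idx++]);
			/* fall through */
	case 2:
			init_one(&slots[idx++]);
			/* fall through */
	case 1:
			init_one(&slots[idx++]);
		}
	}
}
```

With count = 5 the switch enters at `case 1`, handles one slot, and the
loop then handles the remaining four in a single pass.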


[dpdk-dev] [PATCH] eal: map io resources for non x86 architectures

2015-12-17 Thread Santosh Shukla
On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla  wrote:
> On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla  wrote:
>> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
>>  wrote:
>>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>>>>  wrote:
>>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>>>> >> x86 requires a special set of instructions to access ioports, but other
>>>> >> architectures let you remap io resources.
>>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>>>> >> architectures other than x86.
>>>> >
>>>> > One question: this patch could be a replacement of the igbuio_iomap patch
>>>> > from Santosh? If so, I like it: It's more elegant.
>>>> >
>>>> > --yliu
>>>> >
>>>>
>>>> I did tried similar in past but not in parse_sysfs (such that
>>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>>>> pci_map_resource not able to map address hence segfault at tespmd
>>>> initialization.
>>>>
>>>> i was getting these:
>>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c0, 0x20, 0x0):
>>>> Invalid argument (0x)
>>>
>>> That's because ARM (at least the kernel) doesn't allow an IO map:
>>>
>>> arch/arm/kernel/bios32.c
>>> 
>>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
>>> 619 enum pci_mmap_state mmap_state, int 
>>> write_combine)
>>> 620 {
>>> 621 if (mmap_state == pci_mmap_io)
>>> 622 return -EINVAL;
>>>
>>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
>>> this peice of code may work only on Powerpc platform (and maybe a few
>>> others we don't care).
>>>
>>> So, apparently, this will not work for ARM.
>>>
>>
>> Right and I did shared detailed explanation on why it wont work on
>> this link [1], infact this patch shouldn;t work for mips too.
>>
>> As I mentioned earlier I did tried similar approach and so to get
>> everything working like iomem is currently in dpdk; we need to add
>> something like pci_remap_iospace --> ioremap_page_range() but this api
>> not really pci_mmap_page_range types. user need to write more code on
>> top so to use this api efficiently, also this api looks like meant to
>> use by arch file only in kernel space.
>>
>>
> missed link;
>
> [1] http://dpdk.org/dev/patchwork/patch/9365/
>

IMO, it is worth keeping one special device file that works across
archs like arm/arm64/powerpc and others, and that can map the iopci BAR to
DPDK user space. This approach also has no kernel version dependency.
BTW, as a second approach I mentioned adding a /dev/ioport
interface in drivers/char/mem.c that could read more than a byte in a
single operation, but that has a kernel dependency. However, that
approach is arch-agnostic too.

Let me know!

>>> --yliu


[dpdk-dev] [PATCH] eal: map io resources for non x86 architectures

2015-12-17 Thread Santosh Shukla
On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla  wrote:
> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
>  wrote:
>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>>>  wrote:
>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>>> >> x86 requires a special set of instructions to access ioports, but other
>>> >> architectures let you remap io resources.
>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>>> >> architectures other than x86.
>>> >
>>> > One question: this patch could be a replacement of the igbuio_iomap patch
>>> > from Santosh? If so, I like it: It's more elegant.
>>> >
>>> > --yliu
>>> >
>>>
>>> I did tried similar in past but not in parse_sysfs (such that
>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>>> pci_map_resource not able to map address hence segfault at tespmd
>>> initialization.
>>>
>>> i was getting these:
>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c0, 0x20, 0x0):
>>> Invalid argument (0x)
>>
>> That's because ARM (at least the kernel) doesn't allow an IO map:
>>
>> arch/arm/kernel/bios32.c
>> 
>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
>> 619 enum pci_mmap_state mmap_state, int 
>> write_combine)
>> 620 {
>> 621 if (mmap_state == pci_mmap_io)
>> 622 return -EINVAL;
>>
>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
>> this peice of code may work only on Powerpc platform (and maybe a few
>> others we don't care).
>>
>> So, apparently, this will not work for ARM.
>>
>
> Right and I did shared detailed explanation on why it wont work on
> this link [1], infact this patch shouldn;t work for mips too.
>
> As I mentioned earlier I did tried similar approach and so to get
> everything working like iomem is currently in dpdk; we need to add
> something like pci_remap_iospace --> ioremap_page_range() but this api
> not really pci_mmap_page_range types. user need to write more code on
> top so to use this api efficiently, also this api looks like meant to
> use by arch file only in kernel space.
>
>
Missed the link:

[1] http://dpdk.org/dev/patchwork/patch/9365/

>> --yliu


[dpdk-dev] [PATCH] eal: map io resources for non x86 architectures

2015-12-17 Thread Santosh Shukla
On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
 wrote:
> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>>  wrote:
>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>> >> x86 requires a special set of instructions to access ioports, but other
>> >> architectures let you remap io resources.
>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>> >> architectures other than x86.
>> >
>> > One question: this patch could be a replacement of the igbuio_iomap patch
>> > from Santosh? If so, I like it: It's more elegant.
>> >
>> > --yliu
>> >
>>
>> I did tried similar in past but not in parse_sysfs (such that
>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
>> pci_map_resource not able to map address hence segfault at tespmd
>> initialization.
>>
>> i was getting these:
>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c0, 0x20, 0x0):
>> Invalid argument (0x)
>
> That's because ARM (at least the kernel) doesn't allow an IO map:
>
> arch/arm/kernel/bios32.c
> 
> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
> 619 enum pci_mmap_state mmap_state, int write_combine)
> 620 {
> 621 if (mmap_state == pci_mmap_io)
> 622 return -EINVAL;
>
> And with a quick glimpse of powerpc, I see no such limitation. Hence,
> this peice of code may work only on Powerpc platform (and maybe a few
> others we don't care).
>
> So, apparently, this will not work for ARM.
>

Right, and I shared a detailed explanation of why it won't work at
this link [1]. In fact, this patch shouldn't work for MIPS either.

As I mentioned earlier, I tried a similar approach. To get
everything working the way iomem currently does in DPDK, we need to add
something like pci_remap_iospace --> ioremap_page_range(), but this API is
not really of the pci_mmap_page_range type. The user needs to write more code
on top to use this API efficiently, and the API looks like it is meant to be
used by arch files only, in kernel space.


> --yliu
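
To make the failure mode above concrete, here is a minimal user-space
sketch of mapping a PCI resource file and reporting the errno. It is not
the EAL code: map_pci_resource() is a hypothetical helper, and the path is
assumed to be a sysfs resource file of the form
/sys/bus/pci/devices/<BDF>/resourceN. On a kernel whose
pci_mmap_page_range() rejects pci_mmap_io, the mmap() call on an I/O BAR
would be expected to fail with EINVAL, which matches the log quoted above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of the file at `path` (hypothetical helper).
 * Returns the mapping, or NULL with *err set to the errno of the
 * step that failed (open or mmap).
 */
static void *map_pci_resource(const char *path, size_t len, int *err)
{
	void *addr;
	int fd;

	fd = open(path, O_RDWR);
	if (fd < 0) {
		*err = errno;
		return NULL;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* Capture errno before close() can overwrite it. */
	*err = (addr == MAP_FAILED) ? errno : 0;
	close(fd);

	return (addr == MAP_FAILED) ? NULL : addr;
}
```

A caller would check for NULL and compare *err against EINVAL to tell an
unsupported I/O mapping apart from, say, a missing device node (ENOENT).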


[dpdk-dev] VFIO no-iommu

2015-12-17 Thread Stephen Hemminger
On Thu, 17 Dec 2015 20:38:16 +0100
Jan Viktorin  wrote:

> On Thu, 17 Dec 2015 11:09:23 +0100
> Thomas Monjalon  wrote:
> 
> > Hi,
> > 
> > 2015-12-17 09:52, Burakov, Anatoly:
> > > > >  > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > > > > > So it works.  Is it acceptable?  Useful?  Sufficiently complete?
> > > > > > > Does it imply deprecating the uio interface?  I believe the
> > > > > > > feature that started this discussion was support for MSI/X
> > > > > > > interrupts so that VFs can support some kind of interrupt (uio
> > > > > > > only supports INTx since it doesn't allow DMA).  Implementing that
> > > > > > > would be the ultimate test of whether this provides dpdk with not
> > > > > > > only a more consistent interface, but the feature dpdk wants
> > > > > > > that's missing in uio. Thanks,  
> > > > >
> > > > > Ferruh has done a great job so far testing Alex's patch, very few 
> > > > > changes  
> > > > from DPDK side seem to be required as far as existing functionality 
> > > > goes (not
> > > > sure about VF interrupts mentioned by Alex). However, one thing that
> > > > concerns me is usability. While it is true that no-IOMMU mode in VFIO 
> > > > would
> > > > mean uio interfaces could be deprecated in time, the no-iommu mode is 
> > > > way
> > > > more hassle than using igb_uio/uio_pci_generic because it will require a
> > > > kernel recompile as opposed to simply compiling and insmod'ding an 
> > > > out-of-
> > > > tree driver. So, in essence, if you don't want an IOMMU, it's becoming 
> > > > that
> > > > much harder to use DPDK. Would that be something DPDK is willing to live
> > > > with in the absence of uio interfaces?
> > > > 
> > > > Excuse me if I missed something obvious.
> > > > Why a kernel compilation is needed?  
> > > 
> > > Well, not really full kernel compilation, but in the default 
> > > configuration, VFIO driver would not support NOIOMMU mode. I.e. it's not 
> > > compiled by default. Support for no-iommu should be enabled in kernel 
> > > config and compiled in. So, whoever is going to use DPDK with 
> > > VFIO-no-iommu will have to download kernel tree and recompile the VFIO 
> > > module and install it. That's obviously way more hassle than simply 
> > > compiling an out-of-tree driver that's already included and works with an 
> > > out-of-the-box kernel.  
> > 
> > The "out-of-the-box kernel" is configured by your distribution.
> > So we don't know yet what will be their choice.
> > If the distribution supports DPDK, it should be enabled.
> 
> I have a question as I am not involved in all possible DPDK
> configurations, platforms, etc. and not yet very involved in vfio. What
> are the devices which do not have IOMMU? If I have, say, DPDK 2.3 with
> vfio-noiommu, which platforms (or computer systems) I am targeting?
> 
> Would it be an Intel-based system? Would it be PPC8, ARM?
> 
> If it is ARMv7... I would say that the fact I have to explicitly enable
> the no-IOMMU feature and rebuild the kernel (or whatever) is just OK. As
> for such systems, it is common to have a quite customized OS. Well,
> the big distributions are able to run on those devices, that's true...
> However, in such case, the users are usually skilled enough to take
> care of having their own special Linux kernel.
> 
> So, is the fact the distributions would not support the no-IOMMU setup
> in their default configuration really an issue? Will some very common
> Intel/DPDK-based box need this?
> 
> Regards
> Jan

So far:
  * Broken hardware (many systems, including those from Dell) does not provide
a working IOMMU because of BIOS bugs etc.
  * Linux guests in VMware/KVM/Hyper-V. There is no IOMMU emulation in most
of these systems.
  * Older, smaller systems (e.g. Atom) may not have an IOMMU



[dpdk-dev] DPDK Community Call - Linux Foundation

2015-12-17 Thread O'Driscoll, Tim
850
> Sweden : +46 (0) 853 527 835
> Switzerland : +41 (0) 435 0006 96
> United Kingdom : +44 (0) 20 3713 5028
-- next part --
A non-text attachment was scrubbed...
Name: LF Overview for DPDK Community - 16 December 2015.pdf
Type: application/pdf
Size: 1092634 bytes
Desc: LF Overview for DPDK Community - 16 December 2015.pdf
URL: 
<http://dpdk.org/ml/archives/dev/attachments/20151217/25c431bd/attachment-0001.pdf>


[dpdk-dev] [ [PATCH v2] 01/13] virtio: Introduce config RTE_VIRTIO_INC_VECTOR

2015-12-17 Thread Stephen Hemminger
On Thu, 17 Dec 2015 17:32:38 +0530
Santosh Shukla  wrote:

> On Mon, Dec 14, 2015 at 6:30 PM, Santosh Shukla  wrote:
> > virtio_recv_pkts_vec and other virtio vector friend apis are written for 
> > sse/avx
> > instructions. For arm64 in particular, virtio vector implementation does not
> > exist(todo).
> >
> > So virtio pmd driver wont build for targets like i686, arm64.  By making
> > RTE_VIRTIO_INC_VECTOR=n, Driver can build for non-sse/avx targets and will 
> > work
> > in non-vectored virtio mode.
> >
> > Signed-off-by: Santosh Shukla 
> > ---
> 
> Ping?
> 
> any review  / comment on this patch much appreciated. Thanks

The patches I posted (and were ignored by Intel) to support indirect
and any layout should have much bigger performance gain than all this
low level SSE bit twiddling.



[dpdk-dev] [ [PATCH v2] 05/13] virtio: change io_base datatype from uint32_t to uint64_type

2015-12-17 Thread Yuanhan Liu
On Wed, Dec 16, 2015 at 08:35:58PM +0530, Santosh Shukla wrote:
> On Wed, Dec 16, 2015 at 8:28 PM, Yuanhan Liu
>  wrote:
> > On Wed, Dec 16, 2015 at 08:09:40PM +0530, Santosh Shukla wrote:
> >> On Wed, Dec 16, 2015 at 7:53 PM, Yuanhan Liu
> >>  wrote:
> >> > On Wed, Dec 16, 2015 at 07:31:57PM +0530, Santosh Shukla wrote:
> >> >> On Wed, Dec 16, 2015 at 7:18 PM, Yuanhan Liu
> >> >>  wrote:
> >> >> > On Mon, Dec 14, 2015 at 06:30:24PM +0530, Santosh Shukla wrote:
> >> >> >> In x86 case io_base to store ioport address not more than 65535 
> >> >> >> ioports. i.e..0
> >> >> >> to  but in non-x86 case in particular arm64 it need to store 
> >> >> >> more than 32
> >> >> >> bit address so changing io_base datatype from 32 to 64.
> >> >> >>
> >> >> >> Signed-off-by: Santosh Shukla 
> >> >> >> ---
> >> >> >>  drivers/net/virtio/virtio_ethdev.c |2 +-
> >> >> >>  drivers/net/virtio/virtio_pci.h|4 ++--
> >> >> >>  2 files changed, 3 insertions(+), 3 deletions(-)
> >> >> >>
> >> >> >> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> >> >> >> b/drivers/net/virtio/virtio_ethdev.c
> >> >> >> index d928339..620e0d4 100644
> >> >> >> --- a/drivers/net/virtio/virtio_ethdev.c
> >> >> >> +++ b/drivers/net/virtio/virtio_ethdev.c
> >> >> >> @@ -1291,7 +1291,7 @@ eth_virtio_dev_init(struct rte_eth_dev 
> >> >> >> *eth_dev)
> >> >> >>   return -1;
> >> >> >>
> >> >> >>   hw->use_msix = virtio_has_msix(&pci_dev->addr);
> >> >> >> - hw->io_base = 
> >> >> >> (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
> >> >> >> + hw->io_base = 
> >> >> >> (uint64_t)(uintptr_t)pci_dev->mem_resource[0].addr;
> >> >> >
> >> >> > I'd suggest to move the io_base assignment (and cast) into 
> >> >> > virtio_ioport_init()
> >> >> > so that we could do the correct cast there, say cast it to uint32_t 
> >> >> > for
> >> >> > X86, and uint64_t for others.
> >> >> >
> >> >>
> >> >> Ok.
> >> >>
> >> >> This was deliberately done considering your 1.0 virtio spec patch do
> >> >> care for uint64_t types and in arm64 case, If I plan to use those
> >> >> future patches, IMO it make more sense to me keep it in uint64_t way;
> >> >
> >> > I did different cast, 32 bit for legacy virtio pci device, and 64 bit
> >> > for modern virtio pci device.
> >> >
> >> >> Also in x86 case max address could of type 0x1000-101f and so forth;
> >> >> changing data-type to uint64_t default wont effect such address,
> >> >> right?
> >> >
> >> > Right, but what's the harm of doing the right cast? :)
> >> >
> >>
> >> Agree.
> >>
> >> >> And hw->io_base by looking at virtio_pci.h function like
> >> >> inb/outb etc.. takes io_base address as unsigned long types which is
> >> >> arch dependent; i.e.. 4 byte for 32 bit and 8 for 64 bit so the lower
> >> >> level rd/wr apis are taking care of data-types accordingly.
> >> >
> >> > Didn't get it. inb/outb takes "unsigned short" arguments, but not
> >> > "unsigned long".
> >> >
> >>
> >> sys/io.h in x86 case using unsigned short int  types..
> >>
> >> include/asm-generic/io.h for arm64 using it unsigned long (from linux
> >> header files)
> >>
> >> In such case keeping
> >> #define VIRTIO_PCI_REG_ADDR(hw, reg) \
> >> (unsigned short)((hw)->io_base + (reg))
> >>
> >> would be x86 specific and what I thought and used in this patch is
> >>
> >> #define VIRTIO_PCI_REG_ADDR(hw, reg) \
> >> (unsigned long)((hw)->io_base + (reg))
> >>
> >> to avoid ifdef ARM or non-x86..clutter, I know data-type is not right
> >> fit for x86 sys/io.h but considering possible address inside
> >> hw->io_base, wont effect functionality and performance my any mean.
> >> That is why at virtio_ethdev_init() i choose to keep it in hw->io_base
> >> = (uint64_t) types.
> >>
> >> Otherwise I'll have to duplicate VIRTIO_PCI_REG_XXX definition for
> >> non-x86 case, Pl. suggest better alternative. Thanks
> >
> >
> > My understanding is that if you have done the right cast in the first
> > time (at the io_base assignment), casting from a short type to a longer
> > type will not matter: the upper bits will be filled with zero.
> >
> > So, I guess we are fine here. I'm thinking that the extra cast in
> > VIRTIO_PCI_REG_ADDR() is not necessary, as C will do the right
> > cast for different inb(), say cast it to "unsigned short" for x86,
> > and "unsigned long" for your arm implementation. The same to
> > other io helpers.
> >
> 
> so to summarize and correct me if i misunderstood,
> keep hw->io_base = (uint64_t)

I still want a different explicit cast for x86 and non-x86. And
actually, we should cast it to (unsigned short) rather than (uint32_t)
for x86, shouldn't we?

On the other hand, we may cast it to uint64_t unconditionally,
and then have an explicit sanity check for io_base for x86, say

	if ((unsigned short)hw->io_base != hw->io_base) {
		PMD_INIT_LOG(ERR, "invalid io port: %"PRIx64, ...);
		return -1;
	}

It's better than the (unsigned short) cast, as the latter simply hides
the issue when something 
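
A minimal standalone sketch of that sanity check, assuming io_base is
stored as uint64_t and that an x86 I/O port must fit in 16 bits.
io_base_fits_x86_port() is a hypothetical helper name, and uint16_t is used
in place of unsigned short so the width is explicit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when `io_base` survives a round trip through a 16-bit
 * type, i.e. it is a representable x86 I/O port number.
 */
static bool io_base_fits_x86_port(uint64_t io_base)
{
	return (uint64_t)(uint16_t)io_base == io_base;
}
```

A plain narrowing cast would silently turn 0x10000 into port 0, while this
check lets the caller log the bad value and fail initialisation.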

[dpdk-dev] [PATCH] Unlink existing unused sockets at start up

2015-12-17 Thread Ilya Maximets
On 17.12.2015 07:21, Zhihong Wang wrote:
> This patch unlinks existing unused sockets (which cause new bindings to fail, 
> e.g. vHost PMD) to ensure smooth startup.
> In a lot of cases DPDK applications are terminated abnormally without proper 
> resource release.

The original OVS-related problem, discussed previously here
( http://dpdk.org/ml/archives/dev/2015-December/030326.html ),
was fixed in OVS by

commit 9b5422a98f817b9f2a1f8224cab7e1a8d0bbba1f
Author: Ilya Maximets 
Date:   Wed Dec 16 15:32:21 2015 +0300

    ovs-lib: Try to call exit before killing.

    While killing OVS may not free all allocated resources.

    Example:
    Socket for vhost-user port will stay in a system
    after 'systemctl stop openvswitch' and opening
    that port after restart will fail.


So, an application crash is the last remaining point of discussion.

> Therefore, DPDK libs should be able to deal with unclean boot environment.

Why do you think that recovery after an application crash
is a problem for the underlying library?

Best regards, Ilya Maximets.

> 
> Signed-off-by: Zhihong Wang 
> ---
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 28 
> 
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
> b/lib/librte_vhost/vhost_user/vhost-net-user.c
> index 8b7a448..eac0721 100644
> --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -120,18 +120,38 @@ uds_socket(const char *path)
>   sockfd = socket(AF_UNIX, SOCK_STREAM, 0);
>   if (sockfd < 0)
>   return -1;
> - RTE_LOG(INFO, VHOST_CONFIG, "socket created, fd:%d\n", sockfd);
> + RTE_LOG(INFO, VHOST_CONFIG, "socket created, fd: %d\n", sockfd);
>  
>   memset(&un, 0, sizeof(un));
>   un.sun_family = AF_UNIX;
>   snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);
>   ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
>   if (ret == -1) {
> - RTE_LOG(ERR, VHOST_CONFIG, "fail to bind fd:%d, remove file:%s 
> and try again.\n",
> + RTE_LOG(ERR, VHOST_CONFIG,
> + "bind fd: %d to file: %s failed, checking socket...\n",
>   sockfd, path);
> - goto err;
> + ret = connect(sockfd, (struct sockaddr *)&un, sizeof(un));
> + if (ret == -1) {
> + RTE_LOG(INFO, VHOST_CONFIG,
> + "socket: %s is inactive, rebinding after 
> unlink...\n", path);
> + unlink(path);
> + ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
> + if (ret == -1) {
> + RTE_LOG(ERR, VHOST_CONFIG,
> + "bind fd: %d to file: %s failed even 
> after unlink\n",
> + sockfd, path);
> + goto err;
> + }
> + } else {
> + RTE_LOG(INFO, VHOST_CONFIG,
> + "socket: %s is alive, remove it and try 
> again\n", path);
> + RTE_LOG(ERR, VHOST_CONFIG,
> + "bind fd: %d to file: %s failed\n", sockfd, 
> path);
> + goto err;
> + }
>   }
> - RTE_LOG(INFO, VHOST_CONFIG, "bind to %s\n", path);
> + RTE_LOG(INFO, VHOST_CONFIG,
> + "bind fd: %d to file: %s successful\n", sockfd, path);
>  
>   ret = listen(sockfd, MAX_VIRTIO_BACKLOG);
>   if (ret == -1)
> 
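
For readers following the patch, a self-contained sketch of the same idea
(bind, probe with connect, unlink if nobody answers) outside of
librte_vhost. bind_unix_reclaim_stale() is a hypothetical name. Two choices
here are assumptions rather than what the patch does: the probe uses a
separate socket so the original fd stays untouched, and only ECONNREFUSED
is treated as "stale". There is still a window between the probe and the
unlink in which another process could claim the path.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Bind an AF_UNIX stream socket to `path`. If the path already exists,
 * probe it with connect(): a refused connection is taken to mean the
 * file was left behind by a dead process, so it is unlinked and the
 * bind is retried. Returns the bound fd, or -1 on failure.
 */
static int bind_unix_reclaim_stale(const char *path)
{
	struct sockaddr_un un;
	int fd, probe, rc;

	memset(&un, 0, sizeof(un));
	un.sun_family = AF_UNIX;
	snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *)&un, sizeof(un)) == 0)
		return fd;
	if (errno != EADDRINUSE)
		goto fail;

	/* Probe on a throwaway socket so `fd` can still be bound later. */
	probe = socket(AF_UNIX, SOCK_STREAM, 0);
	if (probe < 0)
		goto fail;
	rc = connect(probe, (struct sockaddr *)&un, sizeof(un));
	close(probe);
	if (rc == 0 || errno != ECONNREFUSED)
		goto fail;	/* a live listener, or an unrelated error */

	/* Nobody is listening: treat the file as stale and retry once. */
	unlink(path);
	if (bind(fd, (struct sockaddr *)&un, sizeof(un)) == 0)
		return fd;
fail:
	close(fd);
	return -1;
}
```

Whether this belongs in the library or in the application is exactly the
question raised above; the sketch only shows the mechanics.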


[dpdk-dev] [PATCH v2 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

2015-12-17 Thread Yuanhan Liu
On Mon, Dec 14, 2015 at 09:14:42AM +0800, Huawei Xie wrote:
> pre-allocate a bulk of mbufs instead of allocating one mbuf a time on demand
> 
> Signed-off-by: Gerald Rogers 
> Signed-off-by: Huawei Xie 
> Acked-by: Konstantin Ananyev 

Acked-by: Yuanhan Liu 
Tested-by: Yuanhan Liu 

Thanks.

--yliu


[dpdk-dev] [PATCH v2 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-17 Thread Yuanhan Liu
On Mon, Dec 14, 2015 at 09:14:41AM +0800, Huawei Xie wrote:
> v2 changes:
>  unroll the loop a bit to help the performance
> 
> rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.
> 
> There is related thread about this bulk API.
> http://dpdk.org/dev/patchwork/patch/4718/
> Thanks to Konstantin's loop unrolling.
> 
> Signed-off-by: Gerald Rogers 
> Signed-off-by: Huawei Xie 
> Acked-by: Konstantin Ananyev 
> ---
>  lib/librte_mbuf/rte_mbuf.h | 50 
> ++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f234ac9..4e209e0 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1336,6 +1336,56 @@ static inline struct rte_mbuf 
> *rte_pktmbuf_alloc(struct rte_mempool *mp)
>  }
>  
>  /**
> + * Allocate a bulk of mbufs, initialize refcnt and reset the fields to 
> default
> + * values.
> + *
> + *  @param pool
> + *The mempool from which mbufs are allocated.
> + *  @param mbufs
> + *Array of pointers to mbufs
> + *  @param count
> + *Array size
> + *  @return
> + *   - 0: Success
> + */
> +static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
> +  struct rte_mbuf **mbufs, unsigned count)

It violates the coding style a bit.

> +{
> + unsigned idx = 0;
> + int rc;
> +
> + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
> + if (unlikely(rc))
> + return rc;
> +
> + switch (count % 4) {
> + while (idx != count) {

Well, that's an awkward trick, putting the while between switch and case.

How about moving the whole switch block ahead, and using goto?

	switch (count % 4) {
	case 3:
		goto __3;
		break;
	case 2:
		goto __2;
		break;
	...

	}

It basically generates the same instructions, yet it improves the
readability a bit.

--yliu

> + case 0:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 3:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 2:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 1:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + }
> + }
> + return 0;
> +}
> +
> +/**
>   * Attach packet mbuf to another packet mbuf.
>   *
>   * After attachment we refer the mbuf we attached as 'indirect',
> -- 
> 1.8.1.4
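
One possible reading of the goto variant suggested above, as a minimal
sketch on a plain int array. The placement of the labels inside the loop is
an assumption, since the suggestion elides that part, and init_bulk_goto()
and the rem3/rem2/rem1 labels are hypothetical names (identifiers starting
with a double underscore are reserved in C, so they are avoided here).

```c
#include <stddef.h>

/*
 * Initialise `count` slots, four per loop iteration. The switch only
 * picks an entry point; the unrolled work lives in the loop below.
 */
static void init_bulk_goto(int *slots, unsigned count)
{
	unsigned idx = 0;

	switch (count % 4) {
	case 3:
		goto rem3;
	case 2:
		goto rem2;
	case 1:
		goto rem1;
	default:
		break;	/* multiple of 4 (including 0): enter the loop normally */
	}

	while (idx != count) {
		slots[idx++] = 1;
rem3:
		slots[idx++] = 1;
rem2:
		slots[idx++] = 1;
rem1:
		slots[idx++] = 1;
	}
}
```

Jumping to a label inside the loop body is valid C; once the body finishes,
the loop condition is evaluated as usual.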


[dpdk-dev] [PATCH 2/2] doc: init next release notes

2015-12-17 Thread Mcnamara, John
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, December 17, 2015 11:17 AM
> To: Mcnamara, John
> Cc: dev at dpdk.org
> Subject: [PATCH 2/2] doc: init next release notes
> 
> Signed-off-by: Thomas Monjalon 

Acked-by: John McNamara 


[dpdk-dev] [PATCH] doc: remove DPDK from guide titles

2015-12-17 Thread Mcnamara, John
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, December 17, 2015 11:16 AM
> To: Mcnamara, John
> Cc: dev at dpdk.org
> Subject: [PATCH] doc: remove DPDK from guide titles
> 
> In HTML and PDF guides, it is clear in the header that the doc is related
> to the DPDK.
> So "DPDK" is redundant and can be removed from FAQ and release notes
> titles to improve consistency.
> 
> Signed-off-by: Thomas Monjalon 

Good point.

Acked-by: John McNamara 





[dpdk-dev] [ [PATCH v2] 05/13] virtio: change io_base datatype from uint32_t to uint64_type

2015-12-17 Thread Santosh Shukla
On Thu, Dec 17, 2015 at 12:49 PM, Yuanhan Liu
 wrote:
> On Wed, Dec 16, 2015 at 08:35:58PM +0530, Santosh Shukla wrote:
>> On Wed, Dec 16, 2015 at 8:28 PM, Yuanhan Liu
>>  wrote:
>> > On Wed, Dec 16, 2015 at 08:09:40PM +0530, Santosh Shukla wrote:
>> >> On Wed, Dec 16, 2015 at 7:53 PM, Yuanhan Liu
>> >>  wrote:
>> >> > On Wed, Dec 16, 2015 at 07:31:57PM +0530, Santosh Shukla wrote:
>> >> >> On Wed, Dec 16, 2015 at 7:18 PM, Yuanhan Liu
>> >> >>  wrote:
>> >> >> > On Mon, Dec 14, 2015 at 06:30:24PM +0530, Santosh Shukla wrote:
>> >> >> >> In x86 case io_base to store ioport address not more than 65535 
>> >> >> >> ioports. i.e..0
>> >> >> >> to  but in non-x86 case in particular arm64 it need to store 
>> >> >> >> more than 32
>> >> >> >> bit address so changing io_base datatype from 32 to 64.
>> >> >> >>
>> >> >> >> Signed-off-by: Santosh Shukla 
>> >> >> >> ---
>> >> >> >>  drivers/net/virtio/virtio_ethdev.c |2 +-
>> >> >> >>  drivers/net/virtio/virtio_pci.h|4 ++--
>> >> >> >>  2 files changed, 3 insertions(+), 3 deletions(-)
>> >> >> >>
>> >> >> >> diff --git a/drivers/net/virtio/virtio_ethdev.c 
>> >> >> >> b/drivers/net/virtio/virtio_ethdev.c
>> >> >> >> index d928339..620e0d4 100644
>> >> >> >> --- a/drivers/net/virtio/virtio_ethdev.c
>> >> >> >> +++ b/drivers/net/virtio/virtio_ethdev.c
>> >> >> >> @@ -1291,7 +1291,7 @@ eth_virtio_dev_init(struct rte_eth_dev 
>> >> >> >> *eth_dev)
>> >> >> >>   return -1;
>> >> >> >>
>> >> >> >>   hw->use_msix = virtio_has_msix(_dev->addr);
>> >> >> >> - hw->io_base = 
>> >> >> >> (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
>> >> >> >> + hw->io_base = 
>> >> >> >> (uint64_t)(uintptr_t)pci_dev->mem_resource[0].addr;
>> >> >> >
>> >> >> > I'd suggest to move the io_base assignment (and cast) into 
>> >> >> > virtio_ioport_init()
>> >> >> > so that we could do the correct cast there, say cast it to uint32_t 
>> >> >> > for
>> >> >> > X86, and uint64_t for others.
>> >> >> >
>> >> >>
>> >> >> Ok.
>> >> >>
>> >> >> This was deliberately done considering your 1.0 virtio spec patch do
>> >> >> care for uint64_t types and in arm64 case, If I plan to use those
>> >> >> future patches, IMO it make more sense to me keep it in uint64_t way;
>> >> >
>> >> > I did different cast, 32 bit for legacy virtio pci device, and 64 bit
>> >> > for modern virtio pci device.
>> >> >
>> >> >> Also in x86 case max address could of type 0x1000-101f and so forth;
>> >> >> changing data-type to uint64_t default wont effect such address,
>> >> >> right?
>> >> >
>> >> > Right, but what's the harm of doing the right cast? :)
>> >> >
>> >>
>> >> Agree.
>> >>
>> >> >> And hw->io_base by looking at virtio_pci.h function like
>> >> >> inb/outb etc.. takes io_base address as unsigned long types which is
>> >> >> arch dependent; i.e.. 4 byte for 32 bit and 8 for 64 bit so the lower
>> >> >> level rd/wr apis are taking care of data-types accordingly.
>> >> >
>> >> > Didn't get it. inb/outb takes "unsigned short" arguments, but not
>> >> > "unsigned long".
>> >> >
>> >>
>> >> sys/io.h in x86 case using unsigned short int  types..
>> >>
>> >> include/asm-generic/io.h for arm64 using it unsigned long (from linux
>> >> header files)
>> >>
>> >> In such case keeping
>> >> #define VIRTIO_PCI_REG_ADDR(hw, reg) \
>> >> (unsigned short)((hw)->io_base + (reg))
>> >>
>> >> would be x86 specific and what I thought and used in this patch is
>> >>
>> >> #define VIRTIO_PCI_REG_ADDR(hw, reg) \
>> >> (unsigned long)((hw)->io_base + (reg))
>> >>
>> >> to avoid ifdef ARM or non-x86..clutter, I know data-type is not right
>> >> fit for x86 sys/io.h but considering possible address inside
>> >> hw->io_base, wont effect functionality and performance my any mean.
>> >> That is why at virtio_ethdev_init() i choose to keep it in hw->io_base
>> >> = (uint64_t) types.
>> >>
>> >> Otherwise I'll have to duplicate VIRTIO_PCI_REG_XXX definition for
>> >> non-x86 case, Pl. suggest better alternative. Thanks
>> >
>> >
>> > My understanding is that if you have done the right cast in the first
>> > time (at the io_base assignment), casting from a short type to a longer
>> > type will not matter: the upper bits will be filled with zero.
>> >
>> > So, I guess we are fine here. I'm thinking that the extra cast in
>> > VIRTIO_PCI_REG_ADDR() is not necessary, as C will do the right
>> > cast for different inb(), say cast it to "unsigned short" for x86,
>> > and "unsigned long" for your arm implementation. The same to
>> > other io helpers.
>> >
>>
>> so to summarize and correct me if i misunderstood,
>> keep hw->io_base = (uint64_t)
>
> I still want a different explicit cast for x86 and non-x86. And
> actually, we should cast it to (unsigned short) but not (uint32_t)
> for x86, don't we?
>
> On the other hand, we may cast it to uint64_t unconditionally,
> and then have an explicit sanity check for io_base for x86, say
>
> if ((unsigned short)hw->io_base != hw->io_base) {
> 
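
The second option in the quoted message (store io_base as a 64-bit value unconditionally, then verify on x86 that it still fits an I/O port number) might look roughly like the sketch below. The struct and function names are hypothetical and are not taken from the virtio PMD; only the comparison mirrors the truncated snippet above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's hw structure; only io_base matters here. */
struct sketch_virtio_hw {
	uint64_t io_base;
};

/*
 * Return 0 when io_base is usable on the given target, -1 otherwise.
 * On x86 the value must survive a round trip through unsigned short,
 * because I/O port numbers are at most 16 bits wide. On other targets
 * the full 64-bit value is accepted as-is.
 */
static int
sketch_check_io_base(const struct sketch_virtio_hw *hw, int is_x86)
{
	if (is_x86 && (unsigned short)hw->io_base != hw->io_base)
		return -1;
	return 0;
}
```

With this shape, the cast at the assignment stays the same on every architecture and only the check is x86-specific.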

[dpdk-dev] [ [PATCH v2] 01/13] virtio: Introduce config RTE_VIRTIO_INC_VECTOR

2015-12-17 Thread Thomas Monjalon
2015-12-17 17:32, Santosh Shukla:
> On Mon, Dec 14, 2015 at 6:30 PM, Santosh Shukla  wrote:
> > virtio_recv_pkts_vec and other virtio vector friend apis are written for 
> > sse/avx
> > instructions. For arm64 in particular, virtio vector implementation does not
> > exist(todo).
> >
> > So virtio pmd driver wont build for targets like i686, arm64.  By making
> > RTE_VIRTIO_INC_VECTOR=n, Driver can build for non-sse/avx targets and will 
> > work
> > in non-vectored virtio mode.
> >
> > Signed-off-by: Santosh Shukla 
> > ---
> 
> Ping?
> 
> any review  / comment on this patch much appreciated. Thanks

Why not check for SSE/AVX support instead of adding yet another config option?
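
One way to read this suggestion is a compile-time test on the macros that GCC and Clang predefine for the target instruction set, instead of a separate build option. The sketch below is only an illustration: the VIRTIO_SKETCH_HAVE_VECTOR macro and the helper are hypothetical names, and the exact instruction-set level the vector path needs is an assumption.

```c
/*
 * __SSE3__ and __AVX__ are predefined by GCC and Clang when the build
 * targets those instruction sets (for example via -msse3 or -march=native).
 * VIRTIO_SKETCH_HAVE_VECTOR is a hypothetical name used only in this sketch.
 */
#if defined(__SSE3__) || defined(__AVX__)
#define VIRTIO_SKETCH_HAVE_VECTOR 1
#else
#define VIRTIO_SKETCH_HAVE_VECTOR 0
#endif

/* Report which receive path a build like this would select. */
static const char *
virtio_sketch_rx_mode(void)
{
	return VIRTIO_SKETCH_HAVE_VECTOR ? "vector" : "scalar";
}
```

On an arm64 build neither macro is defined, so the scalar path would be chosen without any extra configuration switch.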


[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Successfully setup DPDK OVS with vhostuser

2015-12-17 Thread Czesnowicz, Przemyslaw
I haven't tried that approach, so I'm not sure whether it would work; it seems clunky.

If you enable the ovsdpdk ml2 mechanism driver and agent, all of that (adding ports 
to ovs with the right type, passing the sockets to qemu) would be done by OpenStack.

Przemek

From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com]
Sent: Thursday, December 17, 2015 12:41 PM
To: Czesnowicz, Przemyslaw
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser

Hi Przemek,

Thank you so much for sharing the ref guide.

I would appreciate it if you could clear up one doubt.

At present we are setting up OpenStack Kilo interactively and then replacing 
OVS with DPDK-enabled OVS.
Once that setup is done, we create an instance in OpenStack and pass that 
instance ID to the QEMU command line, which in turn passes the vhost-user 
sockets to the instance, enabling the DPDK libraries in it.

Isn't this the correct way of integrating ovs-dpdk with OpenStack?


Thanks & Regards
Abhijeet Karve




From: Czesnowicz, Przemyslaw [mailto:przemyslaw.czesnow...@intel.com]
To: Abhijeet Karve [mailto:abhijeet.karve at tcs.com]
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Date: 12/17/2015 05:27 PM
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser




HI Abhijeet,

For Kilo you need to use ovsdpdk mechanism driver and a matching agent to 
integrate ovs-dpdk with OpenStack.

The guide you are following only talks about running ovs-dpdk not how it should 
be integrated with OpenStack.

Please follow this guide:
https://github.com/openstack/networking-ovs-dpdk/blob/stable/kilo/doc/source/getstarted/ubuntu.rst

Best regards
Przemek


From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com]
Sent: Wednesday, December 16, 2015 9:37 AM
To: Czesnowicz, Przemyslaw
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser

Hi Przemek,


We have configured the accelerated data path between a physical interface and 
the VM using openvswitch netdev-dpdk with vhost-user support. I refer to a VM 
created with this special data path and the vhost library as a DPDK instance.

If we assign an IP address manually to the newly created Cirros VM instances, 
we are able to make two VMs communicate on the same compute node. Otherwise, 
no IP address is assigned through DHCP, even though DHCP runs on the compute 
node itself.

Yes, it's a compute + controller node setup and we are using the following 
software platform on the compute node:
_
Openstack: Kilo
Distribution: Ubuntu 14.04
OVS Version: 2.4.0
DPDK 2.0.0
_

We are following the intel guide 
https://software.intel.com/en-us/blogs/2015/06/09/building-vhost-user-for-ovs-today-using-dpdk-200

Running "ovs-vsctl show" on the compute node gives the output below:
_
ovs-vsctl show
c2ec29a5-992d-4875-8adc-1265c23e0304
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-int
        fail_mode: secure
        Port "qvo0ae19a43-b6"
            tag: 2
            Interface "qvo0ae19a43-b6"
        Port br-int
            Interface br-int
                type: internal
        Port "qvo31c89856-a2"
            tag: 1
            Interface "qvo31c89856-a2"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qvo97fef28a-ec"
            tag: 2
            Interface "qvo97fef28a-ec"
    Bridge br-dpdk
        Port br-dpdk
            Interface br-dpdk
                type: internal
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "vhost-user-2"
            Interface "vhost-user-2"
                type: dpdkvhostuser
        Port "vhost-user-0"
            Interface "vhost-user-0"
                type: dpdkvhostuser
        Port "vhost-user-1"
            Interface "vhost-user-1"
                type: dpdkvhostuser
    ovs_version: "2.4.0"

[dpdk-dev] dpdk multi process increase the number of mbufs, throughput gets dropped

2015-12-17 Thread 张伟
Hi all, 


When running the multi-process example, does anybody know why performance 
drops when the number of mbufs is increased?


In the multi-process example, there are two macros that control the number 
of mbufs:


#define MBUFS_PER_CLIENT 1536
#define MBUFS_PER_PORT 1536


If I increase these two numbers by 8 times, performance drops by about 10%. 
Does anybody know why?

const unsigned num_mbufs = (num_clients * MBUFS_PER_CLIENT) \
        + (ports->num_ports * MBUFS_PER_PORT);
pktmbuf_pool = rte_mempool_create(PKTMBUF_POOL_NAME, num_mbufs,
        MBUF_SIZE, MBUF_CACHE_SIZE,
        sizeof(struct rte_pktmbuf_pool_private), rte_pktmbuf_pool_init,
        NULL, rte_pktmbuf_init, NULL, rte_socket_id(), NO_FLAGS );
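
To put numbers on the question, the pool size arithmetic can be reproduced in isolation. This is only a sketch: the helper name and the scale parameter are hypothetical, and the client and port counts in the usage note are made-up examples. It shows how the mbuf count grows and does not by itself explain the throughput drop.

```c
#define MBUFS_PER_CLIENT 1536
#define MBUFS_PER_PORT   1536

/*
 * Same arithmetic as the quoted pool setup: one share of mbufs per client
 * plus one share per port. 'scale' models multiplying both macros by the
 * same factor (8 in the question above).
 */
static unsigned
sketch_num_mbufs(unsigned num_clients, unsigned num_ports, unsigned scale)
{
	return (num_clients * MBUFS_PER_CLIENT * scale)
		+ (num_ports * MBUFS_PER_PORT * scale);
}
```

For example, with 2 clients and 2 ports the pool holds 6144 mbufs at the default values and 49152 mbufs after the 8x increase.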


[dpdk-dev] make install and RTE_KERNELDIR in dpdk 2.2

2015-12-17 Thread Thomas Monjalon
2015-12-17 12:11, Piotr Bartosiewicz:
> On 17.12.2015 at 00:26, Thomas Monjalon wrote:
> > 2015-12-16 15:14, Piotr Bartosiewicz:
> >> A new 'make install' wrongly assumes that the output module name is
> >> always 'uname -r' even if RTE_KERNELDIR is passed.
> > No it does not assume anything, it is just a default value.
> > How can you find the directory based on RTE_KERNELDIR?
> >
> > You can set kerneldir=something-else on the "make install" command line.
> 
> OK, I understand kerneldir in general can't be guessed from RTE_KERNELDIR,
> but maybe there should be some hint in docs to pass kerneldir when 
> RTE_KERNELDIR is used.

Yes you are right, it can be better documented.

> In my case the working command is:
> make install T=... DESTDIR=... 
> RTE_KERNELDIR=/lib/modules/3.16.0-4-amd64/build 
> kerneldir=/lib/modules/3.16.0-4-amd64/extra/dpdk

Please, feel free to update the doc.
Thanks


[dpdk-dev] [PATCH 2/2] doc: init next release notes

2015-12-17 Thread Thomas Monjalon
Signed-off-by: Thomas Monjalon 
---
 doc/guides/rel_notes/index.rst   |  1 +
 doc/guides/rel_notes/release_2_3.rst | 76 
 2 files changed, 77 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_2_3.rst

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index e633e13..29013cf 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -36,6 +36,7 @@ Release Notes
 :numbered:

 rel_description
+release_2_3
 release_2_2
 release_2_1
 release_2_0
diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
new file mode 100644
index 000..99de186
--- /dev/null
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -0,0 +1,76 @@
+DPDK Release 2.3
+================
+
+New Features
+------------
+
+
+Resolved Issues
+---------------
+
+EAL
+~~~
+
+
+Drivers
+~~~~~~~
+
+
+Libraries
+~~~~~~~~~
+
+
+Examples
+~~~~~~~~
+
+
+Other
+~~~~~
+
+
+Known Issues
+------------
+
+
+API Changes
+-----------
+
+
+ABI Changes
+-----------
+
+
+Shared Library Versions
+-----------------------
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+ libethdev.so.2
+ librte_acl.so.2
+ librte_cfgfile.so.2
+ librte_cmdline.so.1
+ librte_distributor.so.1
+ librte_eal.so.2
+ librte_hash.so.2
+ librte_ip_frag.so.1
+ librte_ivshmem.so.1
+ librte_jobstats.so.1
+ librte_kni.so.2
+ librte_kvargs.so.1
+ librte_lpm.so.2
+ librte_mbuf.so.2
+ librte_mempool.so.1
+ librte_meter.so.1
+ librte_pipeline.so.2
+ librte_pmd_bond.so.1
+ librte_pmd_ring.so.2
+ librte_port.so.2
+ librte_power.so.1
+ librte_reorder.so.1
+ librte_ring.so.1
+ librte_sched.so.1
+ librte_table.so.2
+ librte_timer.so.1
+ librte_vhost.so.2
-- 
2.5.2



[dpdk-dev] [PATCH 1/2] version: 2.3.0-rc0

2015-12-17 Thread Thomas Monjalon
Signed-off-by: Thomas Monjalon 
---
 lib/librte_eal/common/include/rte_version.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_version.h 
b/lib/librte_eal/common/include/rte_version.h
index bb3e9fc..6b1890e 100644
--- a/lib/librte_eal/common/include/rte_version.h
+++ b/lib/librte_eal/common/include/rte_version.h
@@ -60,7 +60,7 @@ extern "C" {
 /**
  * Minor version number i.e. the y in x.y.z
  */
-#define RTE_VER_MINOR 2
+#define RTE_VER_MINOR 3

 /**
  * Patch level number i.e. the z in x.y.z
@@ -70,14 +70,14 @@ extern "C" {
 /**
  * Extra string to be appended to version number
  */
-#define RTE_VER_SUFFIX ""
+#define RTE_VER_SUFFIX "-rc"

 /**
  * Patch release number
  *   0-15 = release candidates
  *   16   = release
  */
-#define RTE_VER_PATCH_RELEASE 16
+#define RTE_VER_PATCH_RELEASE 0

 /**
  * Macro to compute a version number usable for comparisons
-- 
2.5.2
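
The effect of this patch on version comparisons can be checked with a small sketch. It assumes the comparison macro packs the four fields one byte each (major, minor, patch level, patch release), which is my reading of rte_version.h and is not shown in the hunks above; the names prefixed with sketch_ or SKETCH_ are hypothetical.

```c
/* Assumed packing: one byte per field, most significant field first. */
#define SKETCH_VERSION_NUM(a, b, c, d) ((a) << 24 | (b) << 16 | (c) << 8 | (d))

/*
 * Build a comparable version number from major, minor, patch level and
 * patch release (0-15 = release candidates, 16 = release).
 */
static unsigned
sketch_version(unsigned major, unsigned minor, unsigned level, unsigned release)
{
	return SKETCH_VERSION_NUM(major, minor, level, release);
}
```

Under that assumption, 2.3.0-rc0 (patch release 0) compares higher than the 2.2.0 release (patch release 16) and lower than the eventual 2.3.0 release, which is the ordering the new values are meant to give.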



[dpdk-dev] [PATCH] doc: remove DPDK from guide titles

2015-12-17 Thread Thomas Monjalon
In HTML and PDF guides, it is clear in the header that the doc
is related to the DPDK.
So "DPDK" is redundant and can be removed from FAQ and release notes
titles to improve consistency.

Signed-off-by: Thomas Monjalon 
---
 doc/guides/faq/index.rst   | 4 ++--
 doc/guides/rel_notes/index.rst | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/guides/faq/index.rst b/doc/guides/faq/index.rst
index 6ca659f..264a3a9 100644
--- a/doc/guides/faq/index.rst
+++ b/doc/guides/faq/index.rst
@@ -28,8 +28,8 @@
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

-DPDK FAQ
-========
+FAQ
+===

 This document contains some Frequently Asked Questions that arise when working 
with DPDK.

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 007308f..29013cf 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -28,8 +28,8 @@
 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

-DPDK Release Notes
-==================
+Release Notes
+=============

 .. toctree::
 :maxdepth: 1
-- 
2.5.2



[dpdk-dev] [PATCH] Patch introducing API to read/write Intel Architecture Model Specific Registers (MSR), rte_msr_read and rte_msr_write functions.

2015-12-17 Thread Wojciech Andralojc
There is work in progress to implement Intel Cache Allocation Technology (CAT) 
support in DPDK; this technology is programmed through MSRs.
In the future it will be possible to program CAT through Linux cgroups, and the 
DPDK CAT implementation will take advantage of that.

MSR reads and writes are privileged ring 0 operations and must be done in kernel 
space. For this reason, the implementation uses the Linux MSR driver.

Signed-off-by: Wojciech Andralojc 
---
 lib/librte_eal/common/Makefile |   1 +
 lib/librte_eal/common/include/arch/arm/rte_msr.h   |  65 ++
 .../common/include/arch/ppc_64/rte_msr.h   |  65 ++
 lib/librte_eal/common/include/arch/tile/rte_msr.h  |  65 ++
 lib/librte_eal/common/include/arch/x86/rte_msr.h   | 143 +
 lib/librte_eal/common/include/generic/rte_msr.h|  78 +++
 lib/librte_eal/common/include/rte_lcore.h  |  18 +++
 7 files changed, 435 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_msr.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_msr.h
 create mode 100644 lib/librte_eal/common/include/arch/tile/rte_msr.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_msr.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_msr.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index f5ea0ee..567c206 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -48,6 +48,7 @@ endif

 GENERIC_INC := rte_atomic.h rte_byteorder.h rte_cycles.h rte_prefetch.h
 GENERIC_INC += rte_spinlock.h rte_memcpy.h rte_cpuflags.h rte_rwlock.h
+GENERIC_INC += rte_msr.h
 # defined in mk/arch/$(RTE_ARCH)/rte.vars.mk
 ARCH_DIR ?= $(RTE_ARCH)
 ARCH_INC := $(notdir $(wildcard 
$(RTE_SDK)/lib/librte_eal/common/include/arch/$(ARCH_DIR)/*.h))
diff --git a/lib/librte_eal/common/include/arch/arm/rte_msr.h 
b/lib/librte_eal/common/include/arch/arm/rte_msr.h
new file mode 100644
index 000..85c009c
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_msr.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_MSR_ARM_H_
+#define _RTE_MSR_ARM_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_msr.h"
+
+/* Function to read CPU's MSR */
+static inline int
+rte_msr_read(__attribute__((unused)) const unsigned lcore,
+   __attribute__((unused)) const uint32_t reg,
+   __attribute__((unused)) uint64_t *value)
+{
+   return -1;
+}
+
+/* Function to write CPU's MSR */
+static inline int
+rte_msr_write(__attribute__((unused)) const unsigned lcore,
+   __attribute__((unused)) const uint32_t reg,
+   __attribute__((unused)) const uint64_t value)
+{
+   return -1;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MSR_ARM_H_ */
diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_msr.h 
b/lib/librte_eal/common/include/arch/ppc_64/rte_msr.h
new file mode 100644
index 000..44f3de2
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_msr.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   
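
The x86 part of the patch is cut off above, so the following is not its code. It is a minimal sketch of how a user-space read through the Linux MSR driver usually works, assuming the msr kernel module is loaded and the process has sufficient privileges. The function name is hypothetical, and it takes a CPU number directly rather than a DPDK lcore ID.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read one 64-bit MSR of the given CPU through /dev/cpu/<cpu>/msr.
 * Returns 0 on success and -1 on any failure (device missing, no
 * permission, or a short read), matching the -1 convention of the
 * stub functions in the patch.
 */
static int
sketch_msr_read(unsigned cpu, uint32_t reg, uint64_t *value)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%u/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The msr driver interprets the file offset as the register number. */
	n = pread(fd, value, sizeof(*value), (off_t)reg);
	close(fd);

	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A write would follow the same pattern with O_WRONLY and pwrite(). On architectures without MSRs there is nothing to open, which is consistent with the ARM stubs above simply returning -1.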

[dpdk-dev] make install and RTE_KERNELDIR in dpdk 2.2

2015-12-17 Thread Piotr Bartosiewicz


On 17.12.2015 at 00:26, Thomas Monjalon wrote:
> 2015-12-16 15:14, Piotr Bartosiewicz:
>> A new 'make install' wrongly assumes that the output module name is
>> always 'uname -r' even if RTE_KERNELDIR is passed.
> No it does not assume anything, it is just a default value.
> How can you find the directory based on RTE_KERNELDIR?
>
> You can set kerneldir=something-else on the "make install" command line.

OK, I understand kerneldir in general can't be guessed from RTE_KERNELDIR,
but maybe there should be some hint in docs to pass kerneldir when 
RTE_KERNELDIR is used.

In my case the working command is:
make install T=... DESTDIR=... 
RTE_KERNELDIR=/lib/modules/3.16.0-4-amd64/build 
kerneldir=/lib/modules/3.16.0-4-amd64/extra/dpdk

Thanks


[dpdk-dev] [PATCH v2 0/6] vhost-user live migration support

2015-12-17 Thread Iremonger, Bernard
Hi Yuanhan,

> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Thursday, December 17, 2015 3:12 AM
> To: dev at dpdk.org
> Cc: Xie, Huawei ; Michael S. Tsirkin
> ; Victor Kaplansky ; Iremonger,
> Bernard ; Pavel Fedin
> ; Peter Xu ; Yuanhan Liu
> ; Chen, Zhihui ;
> Yang, Maggie 
> Subject: [PATCH v2 0/6] vhost-user live migration support
> 
> This patch set adds the vhost-user live migration support.
> 
> The major task behind that is to log pages we touched during live migration,
> including used vring and desc buffer. So, this patch set is basically about
> adding vhost log support, and using it.
> 
> Patchset
> ========
> - Patch 1 handles VHOST_USER_SET_LOG_BASE, which tells us where
>   the dirty memory bitmap is.
> 
> - Patch 2 introduces a vhost_log_write() helper function to log
>   pages we are gonna change.
> 
> - Patch 3 logs changes we made to used vring.
> 
> - Patch 4 logs changes we made to vring desc buffer.
> 
> - Patch 5 and 6 add some feature bits related to live migration.
> 
>

The following test guide should probably be added to the DPDK doc files.
It could be added to the sample app guide or the programmer's guide.
There is already a Vhost Library section in the programmer's guide and
a Vhost Sample Application section in the sample app guide.


> A simple test guide (on same host)
> ==================================
> 
> The following test is based on OVS + DPDK (check [0] for how to setup OVS +
> DPDK):
> 
> [0]: http://wiki.qemu.org/Features/vhost-user-ovs-dpdk
> 
> Here is the rough test guide:
> 
> 1. start ovs-vswitchd
> 
> 2. Add two ovs vhost-user port, say vhost0 and vhost1
> 
> 3. Start a VM1 to connect to vhost0. Here is my example:
> 
>$ $QEMU -enable-kvm -m 1024 -smp 4 \
>-chardev socket,id=char0,path=/var/run/openvswitch/vhost0  \
>-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>-device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>-object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
>-numa node,memdev=mem -mem-prealloc \
>-kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
>-hda fc-19-i386.img \
>-monitor telnet::,server,nowait -curses
> 
> 4. run "ping $host" inside VM1
> 
> 5. Start VM2 to connect to vhost1, and mark it as the target
>of live migration (by adding -incoming tcp:0: option)
> 
>$ $QEMU -enable-kvm -m 1024 -smp 4 \
>-chardev socket,id=char0,path=/var/run/openvswitch/vhost1  \
>-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
>-device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
>-object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
>-numa node,memdev=mem -mem-prealloc \
>-kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
>-hda fc-19-i386.img \
>-monitor telnet::3334,server,nowait -curses \
>-incoming tcp:0:
> 
> 6. connect to VM1 monitor, and start migration:
> 
>> migrate tcp:0:
> 
> 7. After a while, you will find that VM1 has been migrated to VM2,
>and the "ping" command continues running, perfectly.
> 
> 
> Cc: Chen Zhihui 
> Cc: Yang Maggie 
> ---
> Yuanhan Liu (6):
>   vhost: handle VHOST_USER_SET_LOG_BASE request
>   vhost: introduce vhost_log_write
>   vhost: log used vring changes
>   vhost: log vring desc buffer changes
>   vhost: claim that we support GUEST_ANNOUNCE feature
>   vhost: enable log_shmfd protocol feature
> 
>  lib/librte_vhost/rte_virtio_net.h | 36 ++-
>  lib/librte_vhost/vhost_rxtx.c | 88 
> +++
>  lib/librte_vhost/vhost_user/vhost-net-user.c  |  7 ++-
> lib/librte_vhost/vhost_user/vhost-net-user.h  |  6 ++
> lib/librte_vhost/vhost_user/virtio-net-user.c | 48 +++
> lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +-
>  lib/librte_vhost/virtio-net.c |  5 ++
>  7 files changed, 165 insertions(+), 30 deletions(-)
> 
> --
> 1.9.0

Regards,

Bernard.
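
The dirty-page logging described for patches 1 and 2 in the quoted cover letter can be sketched as below. This is an illustration and not the patch itself: it assumes the bitmap announced by VHOST_USER_SET_LOG_BASE uses one bit per 4 KiB page, and all sketch_ names are hypothetical. A real implementation would also need to consider atomicity and memory ordering, which are left out here.

```c
#include <stdint.h>

#define SKETCH_LOG_PAGE 4096u	/* assumed logging granularity in bytes */

/* Mark one page as dirty: one bit per page in the shared bitmap. */
static void
sketch_log_page(uint8_t *log_base, uint64_t page)
{
	log_base[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Mark every page overlapped by the byte range [addr, addr + len). */
static void
sketch_log_write(uint8_t *log_base, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	if (len == 0)
		return;

	last = (addr + len - 1) / SKETCH_LOG_PAGE;
	for (page = addr / SKETCH_LOG_PAGE; page <= last; page++)
		sketch_log_page(log_base, page);
}
```

Logging a used vring update or a descriptor buffer write then comes down to calling the helper with the guest address and length of the region that was modified.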



[dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# Successfully setup DPDK OVS with vhostuser

2015-12-17 Thread Czesnowicz, Przemyslaw
HI Abhijeet,

For Kilo you need to use ovsdpdk mechanism driver and a matching agent to 
integrate ovs-dpdk with OpenStack.

The guide you are following only talks about running ovs-dpdk not how it should 
be integrated with OpenStack.

Please follow this guide:
https://github.com/openstack/networking-ovs-dpdk/blob/stable/kilo/doc/source/getstarted/ubuntu.rst

Best regards
Przemek


From: Abhijeet Karve [mailto:abhijeet.ka...@tcs.com]
Sent: Wednesday, December 16, 2015 9:37 AM
To: Czesnowicz, Przemyslaw
Cc: dev at dpdk.org; discuss at openvswitch.org; Gray, Mark D
Subject: RE: [dpdk-dev] DPDK OVS on Ubuntu 14.04# Issue's Resolved# 
Successfully setup DPDK OVS with vhostuser

Hi Przemek,


We have configured the accelerated data path between a physical interface and 
the VM using openvswitch netdev-dpdk with vhost-user support. I refer to a VM 
created with this special data path and the vhost library as a DPDK instance.

If we assign an IP address manually to the newly created Cirros VM instances, 
we are able to make two VMs communicate on the same compute node. Otherwise, 
no IP address is assigned through DHCP, even though DHCP runs on the compute 
node itself.

Yes, it's a compute + controller node setup and we are using the following 
software platform on the compute node:
_
Openstack: Kilo
Distribution: Ubuntu 14.04
OVS Version: 2.4.0
DPDK 2.0.0
_

We are following the intel guide 
https://software.intel.com/en-us/blogs/2015/06/09/building-vhost-user-for-ovs-today-using-dpdk-200

Running "ovs-vsctl show" on the compute node gives the output below:
_
ovs-vsctl show
c2ec29a5-992d-4875-8adc-1265c23e0304
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-int
        fail_mode: secure
        Port "qvo0ae19a43-b6"
            tag: 2
            Interface "qvo0ae19a43-b6"
        Port br-int
            Interface br-int
                type: internal
        Port "qvo31c89856-a2"
            tag: 1
            Interface "qvo31c89856-a2"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qvo97fef28a-ec"
            tag: 2
            Interface "qvo97fef28a-ec"
    Bridge br-dpdk
        Port br-dpdk
            Interface br-dpdk
                type: internal
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "vhost-user-2"
            Interface "vhost-user-2"
                type: dpdkvhostuser
        Port "vhost-user-0"
            Interface "vhost-user-0"
                type: dpdkvhostuser
        Port "vhost-user-1"
            Interface "vhost-user-1"
                type: dpdkvhostuser
    ovs_version: "2.4.0"
root at dpdk:~#
_

The OpenFlow flows on the br-tun bridge on the compute node are as below:
_
root at dpdk:~# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=71796.741s, table=0, n_packets=519, n_bytes=33794, 
idle_age=19982, hard_age=65534, priority=1,in_port=1 actions=resubmit(,2)
 cookie=0x0, duration=71796.700s, table=0, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=71796.649s, table=2, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, 
priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
 cookie=0x0, duration=71796.610s, table=2, n_packets=519, n_bytes=33794, 
idle_age=19982, hard_age=65534, 
priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
 cookie=0x0, duration=71794.631s, table=3, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, priority=1,tun_id=0x5c 
actions=mod_vlan_vid:2,resubmit(,10)
 cookie=0x0, duration=71794.316s, table=3, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, priority=1,tun_id=0x57 
actions=mod_vlan_vid:1,resubmit(,10)
 cookie=0x0, duration=71796.565s, table=3, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=71796.522s, table=4, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, priority=0 actions=drop
 cookie=0x0, duration=71796.481s, table=10, n_packets=0, n_bytes=0, 
idle_age=65534, hard_age=65534, priority=1 

[dpdk-dev] [PATCH] eal: map io resources for non x86 architectures

2015-12-17 Thread Thomas Monjalon
2015-12-17 15:51, Santosh Shukla:
> On Thu, Dec 17, 2015 at 3:44 PM, Thomas Monjalon
>  wrote:
> > Hi,
> >
> > 2015-12-17 15:37, Santosh Shukla:
> >> On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla  
> >> wrote:
> >> > On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla  
> >> > wrote:
> >> >> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
> >> >>  wrote:
> >> >>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
> >> >>>> On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
> >> >>>>  wrote:
> >> >>>> > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
> >> >>>> >> x86 requires a special set of instructions to access ioports, but other
> >> >>>> >> architectures let you remap io resources.
> >> >>>> >> So let eal remap io resources by accepting IORESOURCE_IO flag for
> >> >>>> >> architectures other than x86.
> >> >>>> >
> >> >>>> > One question: this patch could be a replacement of the igbuio_iomap patch
> >> >>>> > from Santosh? If so, I like it: It's more elegant.
> >> >>>> >
> >> >>>> > --yliu
> >> >>>> >
> >> >>>>
> >> >>>> I did tried similar in past but not in parse_sysfs (such that
> >> >>>> mem.resource_addr to accept IO_RESOURCE_IO types) and observed that
> >> >>>> pci_map_resource not able to map address hence segfault at tespmd
> >> >>>> initialization.
> >> >>>>
> >> >>>> i was getting these:
> >> >>>> EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c0, 0x20, 0x0):
> >> >>>> Invalid argument (0x)
> >> >>>
> >> >>> That's because ARM (at least the kernel) doesn't allow an IO map:
> >> >>>
> >> >>> arch/arm/kernel/bios32.c
> >> >>> 
> >> >>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct 
> >> >>> *vma,
> >> >>> 619 enum pci_mmap_state mmap_state, int 
> >> >>> write_combine)
> >> >>> 620 {
> >> >>> 621 if (mmap_state == pci_mmap_io)
> >> >>> 622 return -EINVAL;
> >> >>>
> >> >>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
> >> >>> this peice of code may work only on Powerpc platform (and maybe a few
> >> >>> others we don't care).
> >> >>>
> >> >>> So, apparently, this will not work for ARM.
> >> >>>
> >> >>
> >> >> Right and I did shared detailed explanation on why it wont work on
> >> >> this link [1], infact this patch shouldn;t work for mips too.
> >> >>
> >> >> As I mentioned earlier I did tried similar approach and so to get
> >> >> everything working like iomem is currently in dpdk; we need to add
> >> >> something like pci_remap_iospace --> ioremap_page_range() but this api
> >> >> not really pci_mmap_page_range types. user need to write more code on
> >> >> top so to use this api efficiently, also this api looks like meant to
> >> >> use by arch file only in kernel space.
> >> >>
> >> >>
> >> > missed link;
> >> >
> >> > [1] http://dpdk.org/dev/patchwork/patch/9365/
> >> >
> >>
> >> IMO, it is worth keeping one special device file that could work across
> >> archs like arm/arm64/powerpc and others, and that could map the iopci bar
> >> to dpdk user-space. This approach also has no kernel version dependency.
> >> BTW, I mentioned a second approach: adding a /dev/ioport interface in
> >> drivers/char/mem.c which could read more than a byte in a single
> >> operation, but that has a kernel dependency. However, that approach is
> >> arch agnostic too.
> >
> > Your first approach uses an out-of-tree kernel module (igb_uio), so we cannot
> > really say there is no kernel dependency.
> 
> Agree but I mentioned kernel __version__ dependency.

Yes you did.
One of the main issues with out-of-tree kernel modules is the version
dependency. The igb_uio from DPDK 2.3 will probably not compile with
kernel 5.0.

> > We should try to remove the need for any out-of-tree kernel module.
> > That's why the Linux upstream approach is a better solution.
> 
> IIUC, you're suggesting that archs like arm/arm64 support io_mappe_io in
> pci_mmap_page_range()?

I don't know what the best solution in the kernel is.
First we need to be sure that there is absolutely no solution without
kernel changes.
Then we can try a pci_mmap solution or, as you suggest, an interface in
drivers/char/mem.c



[dpdk-dev] [PATCH] doc: show version in html guides

2015-12-17 Thread Thomas Monjalon
The version does not appear in the readthedocs theme.
We may try to customize the theme, or just update the project name
as in this patch. The project name is not used in the PDF.

Signed-off-by: Thomas Monjalon 
---
 doc/guides/conf.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 1861443..773c565 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -35,8 +35,6 @@ from sphinx import __version__ as sphinx_version
 from sphinx.highlighting import PygmentsBridge
 from pygments.formatters.latex import LatexFormatter

-project = 'Data Plane Development Kit'
-
 if LooseVersion(sphinx_version) >= LooseVersion('1.3.1'):
 html_theme = "sphinx_rtd_theme"
 html_logo = '../logo/DPDK_logo_vertical_rev_small.png'
@@ -47,6 +45,7 @@ highlight_language = 'none'

 version = subprocess.check_output(['make', '-sRrC', '../../', 
'showversion']).decode('utf-8')
 release = version
+project = 'Data Plane Development Kit ' + version

 master_doc = 'index'

-- 
2.5.2



[dpdk-dev] [PATCH] virtio: fix rx ring descriptor starvation

2015-12-17 Thread Tom Kiely


On 11/25/2015 05:32 PM, Xie, Huawei wrote:
> On 11/13/2015 5:33 PM, Tom Kiely wrote:
>> If all rx descriptors are processed while transient
>> mbuf exhaustion is present, the rx ring ends up with
>> no available descriptors. Thus no packets are received
>> on that ring. Since descriptor refill is performed post
>> rx descriptor processing, in this case no refill is
>> ever subsequently performed resulting in permanent rx
>> traffic drop.
>>
>> Signed-off-by: Tom Kiely 
>> ---
>>   drivers/net/virtio/virtio_rxtx.c |6 --
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio/virtio_rxtx.c 
>> b/drivers/net/virtio/virtio_rxtx.c
>> index 5770fa2..a95e234 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -586,7 +586,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
>> **rx_pkts, uint16_t nb_pkts)
>>  if (likely(num > DESC_PER_CACHELINE))
>>  num = num - ((rxvq->vq_used_cons_idx + num) % 
>> DESC_PER_CACHELINE);
>>   
>> -if (num == 0)
>> +/* Refill free descriptors even if no pkts recvd */
>> +if (num == 0 && virtqueue_full(rxvq))
> Should the return condition be that no used buffers and we have avail
> descs in avail ring, i.e,
>  num == 0 && rxvq->vq_free_cnt != rxvq->vq_nentries
>
> rather than
>  num == 0 && rxvq->vq_free_cnt == 0
Yes we could do that but I don't see a good reason to wait until the 
vq_free_cnt == vq_nentries
before attempting the refill. The existing code will attempt refill even 
if only 1 packet was received
and the free count is small. To me it seems safer to extend that to try 
refill even if no packet was received
but the free count is non-zero.

Tom

>>  return 0;
>>   
>>  num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
>> @@ -683,7 +684,8 @@ virtio_recv_mergeable_pkts(void *rx_queue,
>>   
>>  virtio_rmb();
>>   
>> -if (nb_used == 0)
>> +/* Refill free descriptors even if no pkts recvd */
>> +if (nb_used == 0 && virtqueue_full(rxvq))
>>  return 0;
>>   
>>  PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
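
The starvation described in the commit message can be reproduced in a toy
model. The sketch below is only an illustration: struct toy_vq and the toy_*
functions are hypothetical stand-ins, not the virtio PMD structures, and the
"patched" flag selects between the old early return (num == 0) and the
proposed one (num == 0 and no free descriptors).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an rx virtqueue (hypothetical; not the DPDK structures). */
struct toy_vq {
	uint16_t nentries; /* ring size */
	uint16_t free_cnt; /* descriptors not currently posted to the host */
	uint16_t used;     /* descriptors filled by the host, not yet consumed */
};

/* Same meaning as the condition discussed in the thread: free count is 0. */
static bool toy_vq_full(const struct toy_vq *vq)
{
	return vq->free_cnt == 0;
}

/* Host side: fill up to n posted descriptors; returns how many were filled. */
static uint16_t toy_host_send(struct toy_vq *vq, uint16_t n)
{
	uint16_t posted, done;

	assert(vq->free_cnt + vq->used <= vq->nentries);
	posted = (uint16_t)(vq->nentries - vq->free_cnt - vq->used);
	done = n < posted ? n : posted;
	vq->used = (uint16_t)(vq->used + done);
	return done;
}

/* Post free descriptors back to the host while mbufs are available. */
static void toy_refill(struct toy_vq *vq, uint16_t *mbufs_avail)
{
	while (vq->free_cnt > 0 && *mbufs_avail > 0) {
		vq->free_cnt--;
		(*mbufs_avail)--;
	}
}

/* One rx poll; returns the number of packets handed to the caller. */
static uint16_t toy_recv(struct toy_vq *vq, uint16_t *mbufs_avail, bool patched)
{
	uint16_t num = vq->used;

	/* Old code returns on num == 0; patched code also requires a full ring. */
	if (num == 0 && (!patched || toy_vq_full(vq)))
		return 0;

	vq->used = 0;
	vq->free_cnt = (uint16_t)(vq->free_cnt + num);
	toy_refill(vq, mbufs_avail); /* refill happens after rx processing */
	return num;
}
```

With a 4-entry ring, draining all 4 packets while no mbufs are available
leaves free_cnt at 4. After that, the unpatched poll keeps returning 0 without
refilling, even once mbufs are available again, while the patched poll refills
the ring so the host can deliver packets again.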



[dpdk-dev] [PATCH] eal: map io resources for non x86 architectures

2015-12-17 Thread Thomas Monjalon
Hi,

2015-12-17 15:37, Santosh Shukla:
> On Thu, Dec 17, 2015 at 3:32 PM, Santosh Shukla  wrote:
> > On Thu, Dec 17, 2015 at 3:31 PM, Santosh Shukla  
> > wrote:
> >> On Thu, Dec 17, 2015 at 3:08 PM, Yuanhan Liu
> >>  wrote:
> >>> On Wed, Dec 16, 2015 at 07:21:55PM +0530, Santosh Shukla wrote:
>  On Wed, Dec 16, 2015 at 6:18 PM, Yuanhan Liu
>   wrote:
>  > On Wed, Dec 16, 2015 at 01:31:04PM +0100, David Marchand wrote:
>  >> x86 requires a special set of instructions to access ioports, but 
>  >> other
>  >> architectures let you remap io resources.
>  >> So let eal remap io resources by accepting IORESOURCE_IO flag for
>  >> architectures other than x86.
>  >
>  > One question: this patch could be a replacement of the igbuio_iomap 
>  > patch
>  > from Santosh? If so, I like it: It's more elegant.
>  >
>  > --yliu
>  >
> 
>  I tried something similar in the past, but not in parse_sysfs (such that
>  mem.resource_addr accepts IORESOURCE_IO types), and observed that
>  pci_map_resource was not able to map the address, hence a segfault at
>  testpmd initialization.
> 
>  i was getting these:
>  EAL: pci_map_resource(): cannot mmap(19, 0x7fa5c0, 0x20, 0x0):
>  Invalid argument (0x)
> >>>
> >>> That's because ARM (at least the kernel) doesn't allow an IO map:
> >>>
> >>> arch/arm/kernel/bios32.c
> >>> 
> >>> 618 int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct 
> >>> *vma,
> >>> 619 enum pci_mmap_state mmap_state, int 
> >>> write_combine)
> >>> 620 {
> >>> 621 if (mmap_state == pci_mmap_io)
> >>> 622 return -EINVAL;
> >>>
> >>> And with a quick glimpse of powerpc, I see no such limitation. Hence,
> >>> this piece of code may work only on the PowerPC platform (and maybe a
> >>> few others we don't care about).
> >>>
> >>> So, apparently, this will not work for ARM.
> >>>
> >>
> >> Right, and I shared a detailed explanation of why it won't work at
> >> this link [1]; in fact, this patch shouldn't work for mips either.
> >>
> >> As I mentioned earlier, I tried a similar approach. To get
> >> everything working the way iomem currently does in dpdk, we need to add
> >> something like pci_remap_iospace --> ioremap_page_range(), but this api
> >> is not really of the pci_mmap_page_range kind. The user needs to write
> >> more code on top to use this api efficiently; also, this api looks like
> >> it is meant to be used by arch files only, in kernel space.
> >>
> >>
> > missed link;
> >
> > [1] http://dpdk.org/dev/patchwork/patch/9365/
> >
> 
> IMO, it is worth keeping one special device file that could work across
> archs like arm/arm64/powerpc and others, and that could map the iopci bar
> to dpdk user-space. This approach also has no kernel version dependency.
> BTW, I mentioned a second approach: adding a /dev/ioport interface in
> drivers/char/mem.c which could read more than a byte in a single
> operation, but that has a kernel dependency. However, that approach is
> arch agnostic too.

Your first approach uses an out-of-tree kernel module (igb_uio), so we cannot
really say there is no kernel dependency.
We should try to remove the need for any out-of-tree kernel module.
That's why the Linux upstream approach is a better solution.


[dpdk-dev] [PATCH v2 6/6] vhost: enable log_shmfd protocol feature

2015-12-17 Thread Yuanhan Liu
To claim that we support vhost-user live migration: the
SET_LOG_BASE request will be sent only when this feature flag
is set.

Besides this flag, we actually need another feature flag set
to make vhost-user live migration work: VHOST_F_LOG_ALL,
which, however, was enabled a long time ago.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_user/virtio-net-user.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h 
b/lib/librte_vhost/vhost_user/virtio-net-user.h
index 013cf38..a3a889d 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -38,8 +38,10 @@
 #include "vhost-net-user.h"

 #define VHOST_USER_PROTOCOL_F_MQ   0
+#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1

-#define VHOST_USER_PROTOCOL_FEATURES   (1ULL << VHOST_USER_PROTOCOL_F_MQ)
+#define VHOST_USER_PROTOCOL_FEATURES   ((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
+(1ULL << 
VHOST_USER_PROTOCOL_F_LOG_SHMFD))

 int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);

-- 
1.9.0
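
As a rough illustration of what the new mask evaluates to, the sketch below
repeats the two bit numbers from the patch and adds two helpers. The helper
names (sketch_negotiate, sketch_has_feature) are hypothetical, and treating
negotiation as a plain intersection of masks is an assumption made for the
example, not something this patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as defined in the patch above. */
#define VHOST_USER_PROTOCOL_F_MQ        0
#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1

#define VHOST_USER_PROTOCOL_FEATURES \
	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
	 (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD))

/* Hypothetical helper: keep only the bits that both sides advertise. */
static uint64_t sketch_negotiate(uint64_t frontend_features)
{
	return frontend_features & VHOST_USER_PROTOCOL_FEATURES;
}

/* Hypothetical helper: test a single feature bit in a mask. */
static bool sketch_has_feature(uint64_t features, unsigned int bit)
{
	assert(bit < 64);
	return ((features >> bit) & 1ULL) != 0;
}
```

With both bits set the mask is 0x3, so a frontend that offers only bit 1 ends
up with LOG_SHMFD enabled and MQ disabled in this model.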



[dpdk-dev] [PATCH v2 5/6] vhost: claim that we support GUEST_ANNOUNCE feature

2015-12-17 Thread Yuanhan Liu
It's actually a feature already enabled in the Linux kernel. What we need to
do is simply to claim that we support this feature, and nothing else.

With that, the guest will send GARP messages after live migration.

Signed-off-by: Yuanhan Liu 
---
 lib/librte_vhost/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 03044f6..0ba5045 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
(1ULL << VIRTIO_NET_F_CTRL_RX) | \
+   (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
(VHOST_SUPPORTS_MQ)| \
(1ULL << VIRTIO_F_VERSION_1)   | \
(1ULL << VHOST_F_LOG_ALL)  | \
-- 
1.9.0



[dpdk-dev] [PATCH v2 4/6] vhost: log vring desc buffer changes

2015-12-17 Thread Yuanhan Liu
Every time we copy a buf to vring desc, we need to log it.

Signed-off-by: Yuanhan Liu 
Signed-off-by: Victor Kaplansky addr + vb_offset, 
len_to_cpy);
PRINT_PACKET(dev, (uintptr_t)(buff_addr + vb_offset),
len_to_cpy, 0);

@@ -232,6 +234,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,

rte_memcpy((void *)(uintptr_t)buff_hdr_addr,
(const void *)&virtio_hdr, vq->vhost_hlen);
+   vhost_log_write(dev, hdr_desc->addr, vq->vhost_hlen);

PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1);

@@ -309,6 +312,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,

rte_memcpy((void *)(uintptr_t)vb_hdr_addr,
(const void *)&virtio_hdr, vq->vhost_hlen);
+   vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, vq->vhost_hlen);

PRINT_PACKET(dev, (uintptr_t)vb_hdr_addr, vq->vhost_hlen, 1);

@@ -353,6 +357,8 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,
rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset),
rte_pktmbuf_mtod_offset(pkt, const void *, seg_offset),
cpy_len);
+   vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr + vb_offset,
+   cpy_len);

PRINT_PACKET(dev,
(uintptr_t)(vb_addr + vb_offset),
-- 
1.9.0



[dpdk-dev] [PATCH v2 3/6] vhost: log used vring changes

2015-12-17 Thread Yuanhan Liu
Introducing a vhost_log_write() wrapper, vhost_log_used_vring, to
log used vring changes.

Signed-off-by: Yuanhan Liu 
Signed-off-by: Victor Kaplansky log_guest_addr + offset;
+   vhost_log_write(dev, addr, len);
+}
+
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
@@ -129,6 +139,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
uint32_t offset = 0, vb_offset = 0;
uint32_t pkt_len, len_to_cpy, data_len, total_copied = 0;
uint8_t hdr = 0, uncompleted_pkt = 0;
+   uint16_t idx;

/* Get descriptor from available ring */
desc = &vq->desc[head[packet_success]];
@@ -200,16 +211,18 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
}

/* Update used ring with desc information */
-   vq->used->ring[res_cur_idx & (vq->size - 1)].id =
-   head[packet_success];
+   idx = res_cur_idx & (vq->size - 1);
+   vq->used->ring[idx].id = head[packet_success];

/* Drop the packet if it is uncompleted */
if (unlikely(uncompleted_pkt == 1))
-   vq->used->ring[res_cur_idx & (vq->size - 1)].len =
-   vq->vhost_hlen;
+   vq->used->ring[idx].len = vq->vhost_hlen;
else
-   vq->used->ring[res_cur_idx & (vq->size - 1)].len =
-   pkt_len + 
vq->vhost_hlen;
+   vq->used->ring[idx].len = pkt_len + vq->vhost_hlen;
+
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used, ring[idx]),
+   sizeof(vq->used->ring[idx]));

res_cur_idx++;
packet_success++;
@@ -236,6 +249,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,

*(volatile uint16_t *)&vq->used->idx += count;
vq->last_used_idx = res_end_idx;
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used, idx),
+   sizeof(vq->used->idx));

/* flush used->idx update before we read avail->flags. */
rte_mb();
@@ -265,6 +281,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,
uint32_t seg_avail;
uint32_t vb_avail;
uint32_t cpy_len, entry_len;
+   uint16_t idx;

if (pkt == NULL)
return 0;
@@ -302,16 +319,18 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,
entry_len = vq->vhost_hlen;

if (vb_avail == 0) {
-   uint32_t desc_idx =
-   vq->buf_vec[vec_idx].desc_idx;
+   uint32_t desc_idx = vq->buf_vec[vec_idx].desc_idx;
+
+   if ((vq->desc[desc_idx].flags & VRING_DESC_F_NEXT) == 0) {
+   idx = cur_idx & (vq->size - 1);

-   if ((vq->desc[desc_idx].flags
-   & VRING_DESC_F_NEXT) == 0) {
/* Update used ring with desc information */
-   vq->used->ring[cur_idx & (vq->size - 1)].id
-   = vq->buf_vec[vec_idx].desc_idx;
-   vq->used->ring[cur_idx & (vq->size - 1)].len
-   = entry_len;
+   vq->used->ring[idx].id = vq->buf_vec[vec_idx].desc_idx;
+   vq->used->ring[idx].len = entry_len;
+
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used, ring[idx]),
+   sizeof(vq->used->ring[idx]));

entry_len = 0;
cur_idx++;
@@ -354,10 +373,13 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t 
queue_id,
if ((vq->desc[vq->buf_vec[vec_idx].desc_idx].flags &
VRING_DESC_F_NEXT) == 0) {
/* Update used ring with desc information */
-   vq->used->ring[cur_idx & (vq->size - 1)].id
+   idx = cur_idx & (vq->size - 1);
+   vq->used->ring[idx].id
= vq->buf_vec[vec_idx].desc_idx;
-   vq->used->ring[cur_idx & (vq->size - 1)].len
-   = entry_len;
+   vq->used->ring[idx].len = entry_len;
+   vhost_log_used_vring(dev, vq,
+   offsetof(struct vring_used, ring[idx]),
+   sizeof(vq->used->ring[idx]));
entry_len = 0;
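
The hunks above log a single used-ring entry by passing its byte offset inside
the used ring and its size. The sketch below shows how such an offset can be
computed in portable C. The struct layouts are assumptions modelled on the
usual virtio used ring (two 16-bit fields followed by id/len pairs), and all
sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one used-ring entry. */
struct sketch_used_elem {
	uint32_t id;
	uint32_t len;
};

/* Assumed layout of the used ring header followed by its entries. */
struct sketch_vring_used {
	uint16_t flags;
	uint16_t idx;
	struct sketch_used_elem ring[];
};

/* Wrap a free-running index into a power-of-two sized ring. */
static uint16_t sketch_ring_index(uint16_t cur_idx, uint16_t ring_size)
{
	assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
	return (uint16_t)(cur_idx & (ring_size - 1));
}

/* Byte offset of ring[idx] from the start of the used ring. */
static size_t sketch_used_elem_offset(uint16_t idx)
{
	return offsetof(struct sketch_vring_used, ring) +
	       (size_t)idx * sizeof(struct sketch_used_elem);
}
```

The patch writes offsetof(struct vring_used, ring[idx]) directly, which relies
on the compiler accepting a runtime index inside offsetof; the explicit
base-plus-stride form above gives the same value under the assumed layout.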

[dpdk-dev] [PATCH v2 1/6] vhost: handle VHOST_USER_SET_LOG_BASE request

2015-12-17 Thread Yuanhan Liu
VHOST_USER_SET_LOG_BASE request is used to tell the backend (dpdk
vhost-user) where we should log dirty pages, and how big the log
buffer is.

This request introduces a new payload:

typedef struct VhostUserLog {
uint64_t mmap_size;
uint64_t mmap_offset;
} VhostUserLog;

Also, a fd is delivered from QEMU by ancillary data.

With that info given, an area of memory is mmapped and assigned
to dev->log_base, for logging dirty pages.

Signed-off-by: Yuanhan Liu 
Signed-off-by: Victor Kaplansky protocol_features = protocol_features;
 }
+
+int
+user_set_log_base(struct vhost_device_ctx ctx,
+struct VhostUserMsg *msg)
+{
+   struct virtio_net *dev;
+   int fd = msg->fds[0];
+   uint64_t size, off;
+   void *addr;
+
+   dev = get_device(ctx);
+   if (!dev)
+   return -1;
+
+   if (fd < 0) {
+   RTE_LOG(ERR, VHOST_CONFIG, "invalid log fd: %d\n", fd);
+   return -1;
+   }
+
+   if (msg->size != sizeof(VhostUserLog)) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "invalid log base msg size: %"PRId32" != %d\n",
+   msg->size, (int)sizeof(VhostUserLog));
+   return -1;
+   }
+
+   size = msg->payload.log.mmap_size;
+   off  = msg->payload.log.mmap_offset;
+   RTE_LOG(INFO, VHOST_CONFIG,
+   "log mmap size: %"PRId64", offset: %"PRId64"\n",
+   size, off);
+
+   /*
+* mmap from 0 to workaround a hugepage mmap bug: mmap will be
+* failed when offset is not page size aligned.
+*/
+   addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+   if (addr == MAP_FAILED) {
+   RTE_LOG(ERR, VHOST_CONFIG, "mmap log base failed!\n");
+   return -1;
+   }
+
+   /* TODO: unmap on stop */
+   dev->log_base = (uint64_t)(uintptr_t)addr + off;
+   dev->log_size = size;
+
+   return 0;
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h 
b/lib/librte_vhost/vhost_user/virtio-net-user.h
index b82108d..013cf38 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -49,6 +49,7 @@ void user_set_vring_kick(struct vhost_device_ctx, struct 
VhostUserMsg *);

 void user_set_protocol_features(struct vhost_device_ctx ctx,
uint64_t protocol_features);
+int user_set_log_base(struct vhost_device_ctx ctx, struct VhostUserMsg *);

 int user_get_vring_base(struct vhost_device_ctx, struct vhost_vring_state *);

-- 
1.9.0
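
The mapping step in user_set_log_base() can be tried outside of vhost with an
ordinary file. The sketch below follows the same idea (map from file offset 0,
then add the offset to the returned address); struct sketch_log_region and the
sketch_* functions are hypothetical names, and unlike the patch it keeps the
mapping base and length so the region can be unmapped later.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping for a mapped log area. */
struct sketch_log_region {
	void *map_base;     /* address returned by mmap() */
	size_t map_len;     /* length passed to mmap() */
	uint8_t *log_base;  /* map_base + offset: start of the log itself */
	uint64_t log_size;  /* size of the log in bytes */
};

/*
 * Map size + off bytes starting at file offset 0, then skip off bytes.
 * This avoids passing a non page-aligned offset to mmap().
 */
static int sketch_map_log(int fd, uint64_t size, uint64_t off,
			  struct sketch_log_region *out)
{
	void *addr;

	assert(out != NULL);
	if (fd < 0)
		return -1;

	addr = mmap(NULL, (size_t)(size + off), PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap log base");
		return -1;
	}

	out->map_base = addr;
	out->map_len = (size_t)(size + off);
	out->log_base = (uint8_t *)addr + off;
	out->log_size = size;
	return 0;
}

/* Release the whole mapping, not just the log part. */
static void sketch_unmap_log(struct sketch_log_region *r)
{
	munmap(r->map_base, r->map_len);
	r->map_base = NULL;
	r->log_base = NULL;
}
```

Keeping map_base separately is one possible way to address the "unmap on stop"
TODO in the patch, since munmap() needs the original address and length rather
than the offset-adjusted pointer.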



[dpdk-dev] [PATCH v2 0/6] vhost-user live migration support

2015-12-17 Thread Yuanhan Liu
This patch set adds the vhost-user live migration support.

The major task behind that is to log pages we touched during
live migration, including used vring and desc buffer. So, this
patch set is basically about adding vhost log support, and
using it.

Patchset

- Patch 1 handles VHOST_USER_SET_LOG_BASE, which tells us where
  the dirty memory bitmap is.

- Patch 2 introduces a vhost_log_write() helper function to log
  pages we are gonna change.

- Patch 3 logs changes we made to used vring.

- Patch 4 logs changes we made to vring desc buffer.

- Patch 5 and 6 add some feature bits related to live migration.
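
To make the idea behind patches 1 and 2 concrete, here is a minimal sketch of
a dirty-page bitmap writer. It is not the vhost_log_write() from patch 2: the
4 KiB page size, the one-bit-per-page layout with the lowest bit first, and
all sketch_* names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LOG_PAGE 4096ULL /* assumed logging granularity in bytes */

/* Hypothetical view of the log area: a bitmap, one bit per guest page. */
struct sketch_log {
	uint8_t *base;  /* start of the bitmap */
	uint64_t size;  /* bitmap size in bytes */
};

/* Mark one page dirty; bit 0 of byte 0 is page 0 (assumed bit order). */
static void sketch_log_page(struct sketch_log *log, uint64_t page)
{
	assert(page / 8 < log->size);
	log->base[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Mark every page touched by the byte range [addr, addr + len) dirty. */
static void sketch_log_write(struct sketch_log *log, uint64_t addr, uint64_t len)
{
	uint64_t page, last;

	if (len == 0)
		return;

	page = addr / SKETCH_LOG_PAGE;
	last = (addr + len - 1) / SKETCH_LOG_PAGE;
	for (; page <= last; page++)
		sketch_log_page(log, page);
}
```

A 2-byte write at address 4095 straddles pages 0 and 1, so both bits get set;
this is why the helper loops over a page range instead of logging only the
page that contains the start address.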


A simple test guide (on same host)
==================================

The following test is based on OVS + DPDK (check [0] for
how to set up OVS + DPDK):

[0]: http://wiki.qemu.org/Features/vhost-user-ovs-dpdk

Here is the rough test guide:

1. start ovs-vswitchd

2. Add two ovs vhost-user ports, say vhost0 and vhost1

3. Start a VM1 to connect to vhost0. Here is my example:

   $ $QEMU -enable-kvm -m 1024 -smp 4 \
   -chardev socket,id=char0,path=/var/run/openvswitch/vhost0  \
   -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
   -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
   -object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
   -numa node,memdev=mem -mem-prealloc \
   -kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
   -hda fc-19-i386.img \
   -monitor telnet::,server,nowait -curses

4. run "ping $host" inside VM1

5. Start VM2 to connect to vhost1, and mark it as the target
   of live migration (by adding the -incoming tcp:0: option)

   $ $QEMU -enable-kvm -m 1024 -smp 4 \
   -chardev socket,id=char0,path=/var/run/openvswitch/vhost1  \
   -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
   -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \
   -object memory-backend-file,id=mem,size=1024M,mem-path=$HOME/hugetlbfs,share=on \
   -numa node,memdev=mem -mem-prealloc \
   -kernel $HOME/iso/vmlinuz -append "root=/dev/sda1" \
   -hda fc-19-i386.img \
   -monitor telnet::3334,server,nowait -curses \
   -incoming tcp:0: 

6. connect to VM1 monitor, and start migration:

   > migrate tcp:0:

7. After a while, you will find that VM1 has been migrated to VM2,
   and the "ping" command continues running, perfectly.


Cc: Chen Zhihui 
Cc: Yang Maggie 
---
Yuanhan Liu (6):
  vhost: handle VHOST_USER_SET_LOG_BASE request
  vhost: introduce vhost_log_write
  vhost: log used vring changes
  vhost: log vring desc buffer changes
  vhost: claim that we support GUEST_ANNOUNCE feature
  vhost: enable log_shmfd protocol feature

 lib/librte_vhost/rte_virtio_net.h | 36 ++-
 lib/librte_vhost/vhost_rxtx.c | 88 +++
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  7 ++-
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  6 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 48 +++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +-
 lib/librte_vhost/virtio-net.c |  5 ++
 7 files changed, 165 insertions(+), 30 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH] ixgbe: Discard SRIOV transparent vlan packet headers.

2015-12-17 Thread Tom Kiely
Sorry for the delay in replying to this thread. I was on vacation for 
the last 3 days. Please see inline for my comments.

On 12/15/2015 02:37 PM, Ananyev, Konstantin wrote:
>
>> -Original Message-
>> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
>> Sent: Monday, December 14, 2015 9:35 PM
>> To: Ananyev, Konstantin
>> Cc: Zhang, Helin; dev at dpdk.org; Tom Kiely
>> Subject: Re: [PATCH] ixgbe: Discard SRIOV transparent vlan packet headers.
>>
>> On Mon, 14 Dec 2015 19:57:10 +
>> "Ananyev, Konstantin"  wrote:
>>
>>>
 -Original Message-
 From: Stephen Hemminger [mailto:stephen at networkplumber.org]
 Sent: Monday, December 14, 2015 7:25 PM
 To: Ananyev, Konstantin
 Cc: Zhang, Helin; dev at dpdk.org; Tom Kiely
 Subject: Re: [PATCH] ixgbe: Discard SRIOV transparent vlan packet headers.

 On Mon, 14 Dec 2015 19:12:26 +
 "Ananyev, Konstantin"  wrote:

>
>> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
>> Sent: Friday, December 11, 2015 4:59 PM
>> To: Zhang, Helin; Ananyev, Konstantin
>> Cc: dev at dpdk.org; Tom Kiely; Stephen Hemminger
>> Subject: [PATCH] ixgbe: Discard SRIOV transparent vlan packet headers.
>>
>> From: Tom Kiely 
>>
>> SRIOV VFs support "transparent" vlans. Traffic from/to a VM
>> associated with a VF is tagged/untagged with the specified
>> vlan in a manner intended to be totally transparent to the VM.
>>
>> The vlan is specified by "ip link set  vf  vlan ".
>> The VM is not configured for any vlan on the VF and the VM
>> should never see these transparent vlan headers for that reason.
>>
>> However, in practice these vlan headers are being received by
>> the VM which discards the packets as that vlan is unknown to it.
>> The Linux kernel explicitly discards such vlan headers but DPDK
>> does not.
>> This patch mirrors the kernel behaviour for SRIOV VFs only
>
> I have few concerns about that approach:
>
> 1. I don't think vlan_tci info should *always* be stripped by vf RX 
> routine.
> There could be configurations when that information might be needed by 
> upper layer.
> Let say VF can be member of 2 or more VLANs and upper layer would like to 
> have that information
> for further processing.
> Or a special mirror VF that does traffic snooping, or something else.
> 2. Proposed implementation would introduce a slowdown for all VF RX 
> routines.
> 3. From the description it seems like the aim is to clear VLAN 
> information for the RX packet.
> Though the patch actually clears VLAN info only for the RX packet whose 
> VLAN tag is not present inside SW copy of VFTA table.
> Which makes no much point to me:
> If VLAN is not present in HW VFTA table, then packet with that VLAN tag 
> will be discarded by HW anyway.
> If it is present inside VFTA table (both SW & HW), then VLAN information 
> would be preserved with and without the patch.
>
> If you need to clear VLAN information, why not to do it on the upper 
> layer - inside your application itself?
> Either create some sort of wrapper around rx_burst(), or setup an RX 
> call-back for your VF device.
>
> Konstantin

 The aim is to get SRIOV to work when the transparent VLAN tag feature is 
 used.
 Please talk to the Linux driver team. Similar code exists there in 
 ixgbevf_process_skb_fields.
>>>
>>> Ah ok, I realised what you are trying to achieve now:
>>> You setup HW VFTA[] from the PF, so from VF point of view SW copy of the 
>>> VFTA[] remains unset.
>>> So HW will pass VLAN packet in, but then SW will clear VLAN tag.
>>> Ok, that clears #3 above, but I think #1,2 still remain.
>> On the host, what configured is a vlan tag per VF per guest
>>
>> Tom had more info in the original mail.
>>
>> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/28932
>>
 The other option is have a copy of all the receive logic which is only
 used by VF code.
>>> Why that's the only option?
>>> Why can't you clear that VLAN information above the PMD layer?
>>> Keep/obtain a copy of VFTA[] somewhere on the upper layer,
>>> and do actual clear after rx_burst() returns?
>>> Konstantin
>> The problem is that the guest is supposed to not see the VLAN tags (it has 
>> no reason to),
>> but the hardware leaves a VLAN tag on there.
> Yes, I understand what you are trying to achieve.
>   What I am trying to say:
> 1. VLAN tag removing shouldn't be forced for all VFs.
> I think there are scenarios where existing behaviour (keeping vlan_tci and 
> ol_flags intact) are what people need.
> One example would be mirror VF doing other VFs traffic snooping.
> Probably some other cases too.
> 2. The way you implemented it - it might cause a RX performance degradation 
> (specially for VF).
> That's why I think it better to be implemented on 

[dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing

2015-12-17 Thread Thomas Monjalon
2015-12-17 05:22, Xie, Huawei:
> You ever asked about the performance data.
> Another thing is how about adding a simple vhost performance example,
> like the vring bench which is used to test virtio performance, so that
> each time we have some performance related patches, we could use this
> benchmark to report the performance difference?

The examples are part of the doc and should be simple enough to be used
in a tutorial.
The tool to test the DPDK drivers is testpmd. There are also the simpler
unit tests in app/test. If something more complex is needed (with qemu
options automated), maybe dts is a better choice.


[dpdk-dev] VFIO no-iommu

2015-12-17 Thread Alex Williamson
On Wed, 2015-12-16 at 17:22 +, Burakov, Anatoly wrote:
> Hi Alex,
> 
> > On Wed, 2015-12-16 at 08:35 +, Burakov, Anatoly wrote:
> > > Hi Alex,
> > > 
> > > > On Wed, 2015-12-16 at 04:04 +, Ferruh Yigit wrote:
> > > > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson
> > > > > wrote:
> > > > > I tested the DPDK (HEAD of master) with the patch, with help
> > > > > of
> > > > > Anatoly, and DPDK works in no-iommu environment with a little
> > > > > modification.
> > > > > 
> > > > > Basically the only modification is adapt new group naming
> > > > > (noiommu-$)
> > > > > and
> > > > 
> > > > Sorry, forgot to mention that one. The intention with the
> > > > modified
> > > > group name is that I want to be very certain that a user
> > > > intending
> > > > to only support properly iommu isolated devices doesn't
> > > > accidentally
> > > > need to deal with these no-iommu mode devices.
> > > > 
> > > > > disable dma mapping (VFIO_IOMMU_MAP_DMA)
> > > > > 
> > > > > Also I need to disable VFIO_CHECK_EXTENSION ioctl, because in
> > > > > vfio
> > > > > module,
> > > > > container->noiommu is not set before doing a
> > > > > vfio_group_set_container()
> > > > > and vfio_for_each_iommu_driver selects wrong driver.
> > > > 
> > > > Running CHECK_EXTENSION on a container without the group
> > > > attached is
> > > > only going to tell you what extensions vfio is capable of, not
> > > > necessarily what extensions are available to you with that
> > > > group.
> > > > Is this just a general dpdk- vfio ordering bug?
> > > 
> > > Yes, that is how VFIO was implemented in DPDK. I was under the
> > > impression that checking extension before assigning devices was
> > > the
> > > correct way to do things, so as to not to try anything we know
> > > would
> > > fail anyway. Does this imply that CHECK_EXTENSION needs to be
> > > called
> > > on both container and groups (or just on groups)?
> > 
> > Hmm, in Documentation/vfio.txt we do give the following algorithm:
> > 
> > if (ioctl(container, VFIO_GET_API_VERSION) !=
> > VFIO_API_VERSION)
> > /* Unknown API version */
> > 
> > if (!ioctl(container, VFIO_CHECK_EXTENSION,
> > VFIO_TYPE1_IOMMU))
> > /* Doesn't support the IOMMU driver we want. */
> > ...
> > 
> > That's just going to query each iommu driver and we can't yet say
> > whether
> > the group the user attaches to the container later will actually
> > support that
> > extension until we try to do it, that would come at VFIO_SET_IOMMU.
> > So is
> > it perhaps a vfio bug that we're not advertising no-iommu until the
> > group is
> > attached? After all, we are capable of it with just an empty
> > container, just
> > like we are with type1, but we're going to fail SET_IOMMU for the
> > wrong
> > combination.
> > This is exactly the sort of thing that makes me glad we reverted
> > it without
> > feedback from a working user driver. Thanks,
> 
> Whether it should be considered a "bug" in VFIO or "by design" is up
> to you, of course, but at least according to the VFIO documentation,
> we are meant to check for type 1 extension and then attach devices,
> so it would be expected to get VFIO_NOIOMMU_IOMMU marked as supported
> even without any devices attached to the container (just like we get
> type 1 as supported without any devices attached). Having said that,
> if it was meant to attach devices first and then check the
> extensions, then perhaps the documentation should also point out that
> fact (or perhaps I missed that detail in my readings of the docs, in
> which case my apologies).

Hi Anatoly,

Does the patch below make it behave more like you'd expect? This
applies to v4.4-rc4; I'd fold this into the base patch if we
reincorporate it into a future kernel. Thanks,

Alex

commit 88d4dcb6b77624965f0b45b5cd305a2b4a105c94
Author: Alex Williamson 
Date:   Wed Dec 16 19:02:01 2015 -0700

vfio: Fix no-iommu CHECK_EXTENSION

Previously the no-iommu iommu driver was only visible when the
container had an attached no-iommu group.  This means that
CHECK_EXTENSION on an empty container couldn't report the possibility
of using VFIO_NOIOMMU_IOMMU.  We report TYPE1 whether or not the user
can make use of it with the group, so this is inconsistent.  Add the
no-iommu iommu to the list of iommu drivers when enabled via module
option, but skip all the others if the container is attached to a
no-iommu group.  Note that tainting is now done with the "unsafe"
module callback rather than explicitly within vfio.

Also fixes module option and module description name inconsistency.

Also make vfio_noiommu_ops const.

Signed-off-by: Alex Williamson 

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index de632da..d3a9432 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -99,9 +99,6 @@ struct vfio_device {
 
 #ifdef CONFIG_VFIO_NOIOMMU
 static bool noiommu __read_mostly;

[dpdk-dev] [PATCH] virtio: fix rx ring descriptor starvation

2015-12-17 Thread Tom Kiely
Hi,
Sorry for the delay. I have been occupied on another critical issue. 
I'll look at this today.
Tom

On 12/17/2015 04:47 AM, Xie, Huawei wrote:
> On 11/26/2015 1:33 AM, Xie, Huawei wrote:
>> On 11/13/2015 5:33 PM, Tom Kiely wrote:
>>> If all rx descriptors are processed while transient
>>> mbuf exhaustion is present, the rx ring ends up with
>>> no available descriptors. Thus no packets are received
>>> on that ring. Since descriptor refill is performed post
>>> rx descriptor processing, in this case no refill is
>>> ever subsequently performed resulting in permanent rx
>>> traffic drop.
>>>
>>> Signed-off-by: Tom Kiely 
>>> ---
>>>   drivers/net/virtio/virtio_rxtx.c |6 --
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/virtio_rxtx.c 
>>> b/drivers/net/virtio/virtio_rxtx.c
>>> index 5770fa2..a95e234 100644
>>> --- a/drivers/net/virtio/virtio_rxtx.c
>>> +++ b/drivers/net/virtio/virtio_rxtx.c
>>> @@ -586,7 +586,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
>>> **rx_pkts, uint16_t nb_pkts)
>>> if (likely(num > DESC_PER_CACHELINE))
>>> num = num - ((rxvq->vq_used_cons_idx + num) % 
>>> DESC_PER_CACHELINE);
>>>   
>>> -   if (num == 0)
>>> +   /* Refill free descriptors even if no pkts recvd */
>>> +   if (num == 0 && virtqueue_full(rxvq))
>> Should the return condition be that there are no used buffers and we
>> have avail descs in the avail ring, i.e.,
>>  num == 0 && rxvq->vq_free_cnt != rxvq->vq_nentries
>>
>> rather than
>>  num == 0 && rxvq->vq_free_cnt == 0
>> ?
> Tom:
> Any further progress?
>>> return 0;
>>>   
>>> num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
>>> @@ -683,7 +684,8 @@ virtio_recv_mergeable_pkts(void *rx_queue,
>>>   
>>> virtio_rmb();
>>>   
>>> -   if (nb_used == 0)
>>> +   /* Refill free descriptors even if no pkts recvd */
>>> +   if (nb_used == 0 && virtqueue_full(rxvq))
>>> return 0;
>>>   
>>> PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
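
The three early-return conditions discussed in this thread can be compared side by side. The following is a minimal sketch, not the driver code: struct vq_model and the early_return_* helpers are hypothetical names, and virtqueue_full() is modelled as free_cnt == 0, following the reading given in the review comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the rx virtqueue state. */
struct vq_model {
	uint16_t nentries; /* ring size */
	uint16_t free_cnt; /* descriptors not currently posted to the device */
};

/* Original code: return as soon as no used buffers are found. */
static bool early_return_original(const struct vq_model *vq, uint16_t num_used)
{
	(void)vq;
	return num_used == 0;
}

/* Patch as posted: return only if nothing is left to refill.
 * Assumption: virtqueue_full() is equivalent to free_cnt == 0. */
static bool early_return_patch(const struct vq_model *vq, uint16_t num_used)
{
	return num_used == 0 && vq->free_cnt == 0;
}

/* Review suggestion: return while the device still holds some descriptors,
 * so the refill path is forced only when the ring is completely starved. */
static bool early_return_review(const struct vq_model *vq, uint16_t num_used)
{
	return num_used == 0 && vq->free_cnt != vq->nentries;
}
```

With a starved ring (free_cnt == nentries) and no used buffers, the original condition returns early and never reaches the refill, while both the patch and the review suggestion fall through to it. The two differ only for a partially posted ring: the patch attempts a refill on every empty poll, the review suggestion does not.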



[dpdk-dev] [PATCH] log: add missing symbol

2015-12-17 Thread Neil Horman
On Wed, Dec 16, 2015 at 04:38:34PM -0800, Stephen Hemminger wrote:
> The rte_get_log_type and rte_get_log_level functions have been available
> for many versions. But they are missing from the shared library map
> and therefore do not get exported correctly.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
> b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> index cbe175f..51a241c 100644
> --- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> +++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
> @@ -93,7 +93,9 @@ DPDK_2.0 {
>   rte_realloc;
>   rte_set_application_usage_hook;
>   rte_set_log_level;
> + rte_get_log_level;
>   rte_set_log_type;
> + rte_get_log_type;
>   rte_socket_id;
>   rte_strerror;
>   rte_strsplit;
> -- 
> 2.1.4
> 
> 
Acked-by: Neil Horman 



[dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple rx/tx processing

2015-12-17 Thread Xie, Huawei
On 11/27/2015 2:03 PM, Xu, Qian Q wrote:
> Sharing some virtio-pmd optimization performance data:
> 1. Use a simplified vhost-sample that only does dequeue and free, so that
> virtio is tx-only, and measure the virtio tx improvement. In the VM, one
> virtio port runs txonly. With the txonly file modified to remove the memory
> copy, the TX rate of the optimized virtio-pmd is ~2x that of the
> non-optimized virtio-pmd.
> 2. Same as item 1, but with the default txonly file (memory copy included):
> the optimized virtio-pmd shows a ~37% performance improvement over the
> non-optimized virtio-pmd.
> 3. In the OVS test scenario, with one physical NIC plus one virtio in the VM,
> virtio doing loopback (both rx and tx) and testpmd running in the VM, the
> optimized virtio-pmd shows a 60% performance improvement over the
> non-optimized virtio-pmd.
Thomas:
You once asked about the performance data.
Separately, how about adding a simple vhost performance example,
like the vring bench that is used to test virtio performance, so that
each time we have performance-related patches, we could use this
benchmark to report the performance difference?
>
>
>
> Thanks
> Qian
>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Huawei Xie
> Sent: Thursday, October 29, 2015 10:53 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v6 0/8] virtio ring layout optimization and simple 
> rx/tx processing
>
> Changes in v6:
> - Update release notes
> - Fix the error in virtio tx ring layout ascii chart in the cover-letter
>
> Changes in v5:
> - Call __rte_pktmbuf_prefree_seg to check refcnt when freeing mbufs
>
> Changes in v4:
> - Fix the error in virtio tx ring layout ascii chart in the commit message
> - Move virtio_xmit_cleanup ahead to free descriptors earlier
> - Test the mergeable feature when selecting simple rx/tx functions
>
> Changes in v3:
> - Remove unnecessary NULL test for rte_free
> - Remove unnecessary assign of local var after free
> - Remove return at the end of void function
> - Remove always_inline attribute for virtio_xmit_cleanup
> - Reword some commit messages
> - Add TODO in the commit message of simple tx patch
>
> Changes in v2:
> - Remove the configure macro
> - Enable simple RX/TX processing when user specifies simple txq flags
> - Reword some comments and commit messages
>
> In a DPDK-based switching environment, vhost mostly runs on a dedicated core
> while virtio processing in guest VMs runs on other cores.
> Take RX for example, with generic implementation, for each guest buffer,
> a) virtio driver allocates a descriptor from free descriptor list
> b) modify the entry of avail ring to point to allocated descriptor
> c) after packet is received, free the descriptor
>
> When vhost fetches the avail ring, it needs to fetch the modified L1 cache
> line from the virtio core, which is a heavy cost in current CPU implementations.
>
> The idea of this optimization is:
> allocate a fixed descriptor for each entry of the avail ring, so the avail
> ring will always be the same during the run.
> This removes L1M cache transfer from virtio core to vhost core for avail ring.
> (Note we couldn't avoid the cache transfer for descriptors).
> Besides, descriptor allocation and free operations are eliminated.
> This also makes vector processing possible to further accelerate the 
> processing.
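
The fixed mapping described above can be sketched as follows. This is an illustrative model under stated assumptions, not the driver code: struct avail_ring_model, RING_SIZE and the avail_ring_* helpers are hypothetical names, and the 256-entry size is the example size used in the cover letter.

```c
#include <stdint.h>

#define RING_SIZE 256 /* example ring size from the cover letter */

/* Hypothetical, minimal model of the avail ring. */
struct avail_ring_model {
	uint16_t idx;             /* running producer index */
	uint16_t ring[RING_SIZE]; /* slot -> descriptor index */
};

/* Done once at setup: slot i always refers to descriptor i. */
static void avail_ring_init_fixed(struct avail_ring_model *avail)
{
	avail->idx = 0;
	for (uint16_t i = 0; i < RING_SIZE; i++)
		avail->ring[i] = i;
}

/* Posting n buffers only advances idx; ring[] is never rewritten,
 * so its cache lines are not dirtied again on the virtio core. */
static void avail_ring_post(struct avail_ring_model *avail, uint16_t n)
{
	avail->idx = (uint16_t)(avail->idx + n);
}

/* Descriptor used for a given running index: always the slot number itself. */
static uint16_t avail_ring_desc_for(const struct avail_ring_model *avail,
				    uint16_t running_idx)
{
	return avail->ring[running_idx % RING_SIZE];
}
```

Because the slot-to-descriptor table is constant, no free-list allocation is needed per buffer, and a consumer can derive the descriptor index from the slot number alone.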
>
> This is the layout for the avail ring(take 256 ring entries for example), 
> with each entry pointing to the descriptor with the same index.
>                avail
>                 idx
>                  +
>                  |
> +----+----+---+-----+-------+------+
> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
> +-+--+-+--+-+-+-----+---+---+--+---+
>   |    |    |    |      |      |
>   |    |    |    |      |      |
>   v    v    v    |      v      v
> +-+--+-+--+-+-+-----+---+---+--+---+
> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
> +----+----+---+-----+-------+------+
>                  |
>                  |
> +----+----+---+-----+-------+------+
> | 0  | 1  | 2 |     |  254  | 255  |  used ring
> +----+----+---+-----+-------+------+
>                  |
>                  +
>
> This is the ring layout for TX.
> As we need one virtio header for each xmit packet, we have 128 slots 
> available.
>
>                          ++
>                          ||
>                          ||
> +-----+-----+-----+------++------+------+------+------+
> |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring
> +--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
>    |     |            |  ||  |      |             |
>    v     v            v  ||  v      v             v
> +--+--+--+--+-----+---+--++--+---+--+---+------+--+---+
> | 128 | 129 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for 
> 

[dpdk-dev] [PATCH] virtio: fix rx ring descriptor starvation

2015-12-17 Thread Xie, Huawei
On 11/26/2015 1:33 AM, Xie, Huawei wrote:
> On 11/13/2015 5:33 PM, Tom Kiely wrote:
>> If all rx descriptors are processed while transient
>> mbuf exhaustion is present, the rx ring ends up with
>> no available descriptors. Thus no packets are received
>> on that ring. Since descriptor refill is performed post
>> rx descriptor processing, in this case no refill is
>> ever subsequently performed resulting in permanent rx
>> traffic drop.
>>
>> Signed-off-by: Tom Kiely 
>> ---
>>  drivers/net/virtio/virtio_rxtx.c |6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio/virtio_rxtx.c 
>> b/drivers/net/virtio/virtio_rxtx.c
>> index 5770fa2..a95e234 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -586,7 +586,8 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
>> **rx_pkts, uint16_t nb_pkts)
>>  if (likely(num > DESC_PER_CACHELINE))
>>  num = num - ((rxvq->vq_used_cons_idx + num) % 
>> DESC_PER_CACHELINE);
>>  
>> -if (num == 0)
>> +/* Refill free descriptors even if no pkts recvd */
>> +if (num == 0 && virtqueue_full(rxvq))
> Should the return condition be that there are no used buffers and we
> have avail descs in the avail ring, i.e.,
> num == 0 && rxvq->vq_free_cnt != rxvq->vq_nentries
>
> rather than
> num == 0 && rxvq->vq_free_cnt == 0
> ?
Tom:
Any further progress?
>>  return 0;
>>  
>>  num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
>> @@ -683,7 +684,8 @@ virtio_recv_mergeable_pkts(void *rx_queue,
>>  
>>  virtio_rmb();
>>  
>> -if (nb_used == 0)
>> +/* Refill free descriptors even if no pkts recvd */
>> +if (nb_used == 0 && virtqueue_full(rxvq))
>>  return 0;
>>  
>>  PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
>



[dpdk-dev] make install and RTE_KERNELDIR in dpdk 2.2

2015-12-17 Thread Thomas Monjalon
2015-12-16 15:14, Piotr Bartosiewicz:
> A new 'make install' wrongly assumes that the output module name is 
> always 'uname -r' even if RTE_KERNELDIR is passed.

No, it does not assume anything; it is just a default value.
How can you find the directory based on RTE_KERNELDIR?

You can set kerneldir=something-else on the "make install" command line.


[dpdk-dev] VFIO no-iommu

2015-12-17 Thread Thomas Monjalon
2015-12-16 16:23, Burakov, Anatoly:
> Hi Thomas,
> 
> > > On Tue, Dec 15, 2015 at 09:53:18AM -0700, Alex Williamson wrote:
> > > So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
> > > it imply deprecating the uio interface?  I believe the feature that
> > > started this discussion was support for MSI/X interrupts so that VFs
> > > can support some kind of interrupt (uio only supports INTx since it
> > > doesn't allow DMA).  Implementing that would be the ultimate test of
> > > whether this provides dpdk with not only a more consistent interface,
> > > but the feature dpdk wants that's missing in uio. Thanks,
> 
> Ferruh has done a great job so far testing Alex's patch, very few changes 
> from DPDK side seem to be required as far as existing functionality goes (not 
> sure about VF interrupts mentioned by Alex). However, one thing that concerns 
> me is usability. While it is true that no-IOMMU mode in VFIO would mean uio 
> interfaces could be deprecated in time, the no-iommu mode is way more hassle 
> than using igb_uio/uio_pci_generic because it will require a kernel recompile 
> as opposed to simply compiling and insmod'ding an out-of-tree driver. So, in 
> essence, if you don't want an IOMMU, it's becoming that much harder to use 
> DPDK. Would that be something DPDK is willing to live with in the absence of 
> uio interfaces?

Excuse me if I missed something obvious.
Why is a kernel compilation needed?