[dpdk-dev] Jumbo frame support in pktgen

2015-10-06 Thread Wiles, Keith

On 10/6/15, 11:07 PM, "dev on behalf of Hyunseok"  wrote:

>Hi,
>
>Can we generate 9k jumbo frames using the latest pktgen?

Internally I do not create mbufs larger than 2K at this time in Pktgen. I
guess it could be changed, but I do not have the time. If you want to
submit a patch that would be great.
>
>Thanks!
>-hs

Regards,
++Keith Wiles

Intel Corporation
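
As a rough sense of scale for the 2K limit mentioned above, the sketch below counts how many 2K buffers a frame would span if buffers were chained instead of enlarged. The 2048-byte data room and the helper name `segments_needed` are assumptions for illustration only; this is not Pktgen code.

```python
import math

# Assumed usable data room per buffer, in bytes (the "2K" from the reply above).
DATA_ROOM = 2048


def segments_needed(frame_len, data_room=DATA_ROOM):
    """Return how many chained buffers are needed to hold frame_len bytes."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    return math.ceil(frame_len / data_room)
```

Under that assumption a standard 1518-byte frame fits in one buffer, while a 9000-byte jumbo frame would need several, which is why either larger mbufs or multi-segment transmit support would be required.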



[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Vlad Zolotarov


On 10/06/15 18:00, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 05:49:21PM +0300, Vlad Zolotarov wrote:
>>> and read/write the config space.
>>> This means that a single userspace bug is enough to corrupt kernel
>>> memory.
>> Could u, pls., provide an example of this simple bug? Because it's
>> absolutely not obvious...
> Stick a value that happens to match a kernel address in Msg Addr field
> in an unmasked MSI-X entry.

This patch neither configures MSI-X entries in the user space nor 
provides additional means to do so therefore this "sticking" would be a 
matter of some extra code that is absolutely unrelated to this patch. 
So, this example seems absolutely irrelevant to this particular discussion.

thanks,
vlad

>



[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Avi Kivity


On 10/06/2015 05:07 PM, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 03:15:57PM +0300, Avi Kivity wrote:
>> btw, (2) doesn't really add any insecurity.  The user could already poke at
>> the msix tables (as well as perform DMA); they just couldn't get a useful
>> interrupt out of them.
> Poking at msix tables won't cause memory corruption unless msix and bus
> mastering is enabled.

It's a given that bus mastering is enabled.  It's true that msix is 
unlikely to be enabled, unless msix support is added.

>It's true root can enable msix and bus mastering
> through sysfs - but that's easy to block or detect. Even if you don't
> buy a security story, it seems less likely to trigger as a result
> of a userspace bug.

If you're doing DMA, that's the least of your worries.

Still, zero-mapping the msix space seems reasonable, and can protect 
userspace from silly stuff.  It can't be considered to have anything to 
do with security though, as long as users can simply DMA to every bit of 
RAM in the system they want to.


[dpdk-dev] Jumbo frame support in pktgen

2015-10-06 Thread Hyunseok
Hi,

Can we generate 9k jumbo frames using the latest pktgen?

Thanks!
-hs


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Michael S. Tsirkin
On Tue, Oct 06, 2015 at 05:49:21PM +0300, Vlad Zolotarov wrote:
> >and read/write the config space.
> >This means that a single userspace bug is enough to corrupt kernel
> >memory.
> 
> Could u, pls., provide an example of this simple bug? Because it's
> absolutely not obvious...

Stick a value that happens to match a kernel address in Msg Addr field
in an unmasked MSI-X entry.

-- 
MST
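
To make the scenario concrete, here is a minimal sketch of the 16-byte MSI-X table entry layout (Message Address, Message Upper Address, Message Data, Vector Control with the mask bit in bit 0). The helper names and the example address used in any caller are hypothetical. The sketch only models the table entry; an actual write additionally requires MSI-X and bus mastering to be enabled on the device.

```python
import struct

# One MSI-X table entry: four little-endian 32-bit words.
#   word 0: Message Address (low 32 bits)
#   word 1: Message Upper Address (high 32 bits)
#   word 2: Message Data
#   word 3: Vector Control (bit 0 is the per-vector mask bit)
MSIX_ENTRY = struct.Struct("<IIII")
MASK_BIT = 0x1


def pack_entry(msg_addr, msg_data, masked):
    """Build the raw bytes of one table entry (hypothetical helper)."""
    lo = msg_addr & 0xFFFFFFFF
    hi = (msg_addr >> 32) & 0xFFFFFFFF
    ctrl = MASK_BIT if masked else 0
    return MSIX_ENTRY.pack(lo, hi, msg_data & 0xFFFFFFFF, ctrl)


def write_target(raw):
    """Address the device would write Message Data to when the vector
    fires, or None if the entry is masked."""
    lo, hi, _data, ctrl = MSIX_ENTRY.unpack(raw)
    if ctrl & MASK_BIT:
        return None
    return (hi << 32) | lo
```

Whatever 64-bit value ends up in the address words of an unmasked entry is where the device will issue its write, which is the point being made about a stray value matching a kernel address.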


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Vlad Zolotarov


On 10/06/15 16:58, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 11:23:11AM +0300, Vlad Zolotarov wrote:
>> Michael, how this or any other related patch is related to the problem u r
>> describing?
>> The above ability is there for years and if memory serves me
>> well it was u who wrote uio_pci_generic with this "security flaw".  ;)
> I answered all this already.
>
> This patch enables bus mastering, enables MSI or MSI-X

This may be done from the user space right now without this patch...

> , and requires
> userspace to map the MSI-X table

Hmmm... I must have missed this requirement. Could u, pls., clarify? 
From what I see, MSI/MSI-X table is configured completely in the kernel 
here...

> and read/write the config space.
> This means that a single userspace bug is enough to corrupt kernel
> memory.

Could u, pls., provide an example of this simple bug? Because it's 
absolutely not obvious...

>
> uio_pci_generic does not enable bus mastering or MSI, and
> it might be a good idea to have uio_pci_generic block
> access to MSI/MSI-X config.

Since device bars may be mapped bypassing the UIO/uio_pci_generic - this 
won't solve any issue.




[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Michael S. Tsirkin
On Tue, Oct 06, 2015 at 03:15:57PM +0300, Avi Kivity wrote:
> btw, (2) doesn't really add any insecurity.  The user could already poke at
> the msix tables (as well as perform DMA); they just couldn't get a useful
> interrupt out of them.

Poking at msix tables won't cause memory corruption unless msix and bus
mastering is enabled.  It's true root can enable msix and bus mastering
through sysfs - but that's easy to block or detect. Even if you don't
buy a security story, it seems less likely to trigger as a result
of a userspace bug.

-- 
MST


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Michael S. Tsirkin
On Tue, Oct 06, 2015 at 11:23:11AM +0300, Vlad Zolotarov wrote:
> Michael, how this or any other related patch is related to the problem u r
> describing?
> The above ability is there for years and if memory serves me
> well it was u who wrote uio_pci_generic with this "security flaw".  ;)

I answered all this already.

This patch enables bus mastering, enables MSI or MSI-X, and requires
userspace to map the MSI-X table and read/write the config space.
This means that a single userspace bug is enough to corrupt kernel
memory.

uio_pci_generic does not enable bus mastering or MSI, and
it might be a good idea to have uio_pci_generic block
access to MSI/MSI-X config.
-- 
MST


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Michael S. Tsirkin
On Tue, Oct 06, 2015 at 08:33:56AM +0100, Stephen Hemminger wrote:
> Other than implementation objections, so far the two main arguments
> against this reduce to:
>   1. If you allow UIO ioctl then it opens an API hook for all the crap out
>  of tree UIO drivers to do what they want.
>   2. If you allow UIO MSI-X then you are expanding the usage of userspace
>  device access in an insecure manner.

That's not all. Without MSI one can detect insecure usage by detecting
userspace enabling bus mastering.  This can be detected simply using
lspci.  Or one can also imagine a configuration where this ability is
disabled, is logged, or taints kernel.  This seems like something that
might be worth having for some locked-down systems.

OTOH enabling MSI requires enabling bus mastering so suddenly we have no
idea whether device can be/is used in a safe way.
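
The detection described above amounts to checking the Bus Master Enable bit (bit 2) of the 16-bit PCI Command register at config-space offset 0x04, which is what lspci reports as `BusMaster+`. A minimal sketch follows; the function names are hypothetical, and `read_config` assumes the usual Linux sysfs layout.

```python
import struct

PCI_COMMAND_OFFSET = 0x04    # 16-bit Command register in PCI config space
PCI_COMMAND_MASTER = 0x0004  # Bus Master Enable bit


def bus_master_enabled(config):
    """config: at least the first 6 bytes of a device's PCI config space."""
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND_OFFSET)
    return bool(command & PCI_COMMAND_MASTER)


def read_config(bdf):
    """Read the first 64 config bytes for a device such as '0000:03:00.0'."""
    with open("/sys/bus/pci/devices/%s/config" % bdf, "rb") as f:
        return f.read(64)
```

A monitoring script could call `bus_master_enabled(read_config(bdf))` for each device bound to a UIO driver and flag any device where it returns True.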

> 
> Another alternative which I explored was making a version of VFIO that
> works without IOMMU. It solves #1 but actually increases the likely negative
> response to argument #2.

No - because VFIO has limited protection against device misuse by
userspace, by limiting access to sub-ranges of device BARs and config
space.  For a device that doesn't do DMA, that will be enough to make it
secure to use.

That's a pretty weak excuse to support userspace drivers for PCI devices
without an IOMMU, but it's the best I heard so far.

Is that worth the security trade-off? I'm still not sure.

> This would keep same API, and avoid having to
> modify UIO. But we would still have the same (if not more resistance)
> from IOMMU developers who believe all systems have to be secure against
> root.

"Secure against root" is a confusing way to put it IMHO. We are talking
about memory protection.

So that's not IOMMU developers IIUC. I believe most kernel developers will
agree it's not a good idea to let userspace corrupt kernel memory.
Otherwise, the driver can't be supported, and maintaining upstream
drivers that can't be supported serves no useful purpose.  Anyone can
load out of tree ones just as well.

VFIO already supports MSI so VFIO developers already have a lot of
experience with these issues. Getting their input would be valuable.

-- 
MST


[dpdk-dev] [PATCH v2 4/4] Modifying configuration scripts for Netronome's nfp_uio driver.

2015-10-06 Thread Alejandro.Lucero
From: "Alejandro.Lucero" 

Signed-off-by: Alejandro.Lucero 
Signed-off-by: Rolf.Neugebauer 
---
 tools/dpdk_nic_bind.py |8 ++--
 tools/setup.sh |  122 ++--
 2 files changed, 101 insertions(+), 29 deletions(-)

diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk_nic_bind.py
index b7bd877..f7f8a39 100755
--- a/tools/dpdk_nic_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -43,7 +43,7 @@ ETHERNET_CLASS = "0200"
 # Each device within this is itself a dictionary of device properties
 devices = {}
 # list of supported DPDK drivers
-dpdk_drivers = [ "igb_uio", "vfio-pci", "uio_pci_generic" ]
+dpdk_drivers = [ "igb_uio", "vfio-pci", "uio_pci_generic", "nfp_uio" ]

 # command-line arg flags
 b_flag = None
@@ -153,7 +153,7 @@ def find_module(mod):
 return path

 def check_modules():
-'''Checks that igb_uio is loaded'''
+'''Checks that at least one dpdk module is loaded'''
 global dpdk_drivers

 fd = file("/proc/modules")
@@ -261,7 +261,7 @@ def get_nic_details():
 devices[d]["Active"] = "*Active*"
 break;

-# add igb_uio to list of supporting modules if needed
+# add module to list of supporting modules if needed
 if "Module_str" in devices[d]:
 for driver in dpdk_drivers:
 if driver not in devices[d]["Module_str"]:
@@ -440,7 +440,7 @@ def display_devices(title, dev_list, extra_params = None):

 def show_status():
 '''Function called when the script is passed the "--status" option. Displays
-to the user what devices are bound to the igb_uio driver, the kernel driver
+to the user what devices are bound to a dpdk driver, the kernel driver
 or to no driver'''
 global dpdk_drivers
 kernel_drv = []
diff --git a/tools/setup.sh b/tools/setup.sh
index 5a8b2f3..e434ddb 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -236,6 +236,52 @@ load_vfio_module()
 }

 #
+# Unloads nfp_uio.ko.
+#
+remove_nfp_uio_module()
+{
+   echo "Unloading any existing DPDK UIO module"
+   /sbin/lsmod | grep -s nfp_uio > /dev/null
+   if [ $? -eq 0 ] ; then
+   sudo /sbin/rmmod nfp_uio
+   fi
+}
+
+#
+# Loads new nfp_uio.ko (and uio module if needed).
+#
+load_nfp_uio_module()
+{
+   echo "Using RTE_SDK=$RTE_SDK and RTE_TARGET=$RTE_TARGET"
+   if [ ! -f $RTE_SDK/$RTE_TARGET/kmod/nfp_uio.ko ];then
+   echo "## ERROR: Target does not have the DPDK UIO Kernel Module."
+   echo "   To fix, please try to rebuild target."
+   return
+   fi
+
+   remove_nfp_uio_module
+
+   /sbin/lsmod | grep -s uio > /dev/null
+   if [ $? -ne 0 ] ; then
+   modinfo uio > /dev/null
+   if [ $? -eq 0 ]; then
+   echo "Loading uio module"
+   sudo /sbin/modprobe uio
+   fi
+   fi
+
+   # UIO may be compiled into kernel, so it may not be an error if it can't
+   # be loaded.
+
+   echo "Loading DPDK UIO module"
+   sudo /sbin/insmod $RTE_SDK/$RTE_TARGET/kmod/nfp_uio.ko
+   if [ $? -ne 0 ] ; then
+   echo "## ERROR: Could not load kmod/nfp_uio.ko."
+   quit
+   fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -427,10 +473,10 @@ grep_meminfo()
 #
 show_nics()
 {
-   if  /sbin/lsmod | grep -q -e igb_uio -e vfio_pci; then
+   if  /sbin/lsmod | grep -q -e igb_uio -e vfio_pci -e nfp_uio; then
${RTE_SDK}/tools/dpdk_nic_bind.py --status
else
-   echo "# Please load the 'igb_uio' or 'vfio-pci' kernel module before "
+   echo "# Please load the 'igb_uio', 'vfio-pci' or 'nfp_uio' kernel module before "
echo "# querying or adjusting NIC device bindings"
fi
 }
@@ -471,6 +517,23 @@ bind_nics_to_igb_uio()
 }

 #
+# Uses dpdk_nic_bind.py to move devices to work with nfp_uio
+#
+bind_nics_to_nfp_uio()
+{
+   if  /sbin/lsmod  | grep -q nfp_uio ; then
+   ${RTE_SDK}/tools/dpdk_nic_bind.py --status
+   echo ""
+   echo -n "Enter PCI address of device to bind to NFP UIO driver: "
+   read PCI_PATH
+   sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b nfp_uio $PCI_PATH && echo "OK"
+   else
+   echo "# Please load the 'nfp_uio' kernel module before querying or "
+   echo "# adjusting NIC device bindings"
+   fi
+}
+
+#
 # Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
@@ -513,29 +576,35 @@ step2_func()
TEXT[1]="Insert IGB UIO module"
FUNC[1]="load_igb_uio_module"

-   TEXT[2]="Insert VFIO module"
-   FUNC[2]="load_vfio_module"
+   TEXT[2]="Insert NFP UIO module"
+   FUNC[2]="load_nfp_uio_module"

-   TEXT[3]="Insert KNI module"
-   FUNC[3]="load_kni_module"
+   TEXT[3]="Insert VFIO 

[dpdk-dev] [PATCH v2 3/4] This patch adds documentation about Netronome´s NFP nic

2015-10-06 Thread Alejandro.Lucero
From: "Alejandro.Lucero" 

Signed-off-by: Alejandro.Lucero 
Signed-off-by: Rolf.Neugebauer 
---
 doc/guides/nics/index.rst |1 +
 doc/guides/nics/nfp.rst   |  270 +
 2 files changed, 271 insertions(+)
 create mode 100644 doc/guides/nics/nfp.rst

diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index d1a92f8..596ff88 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -48,6 +48,7 @@ Network Interface Controller Drivers
 virtio
 vmxnet3
 pcap_ring
+nfp

 **Figures**

diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
new file mode 100644
index 0000000..57b34c6
--- /dev/null
+++ b/doc/guides/nics/nfp.rst
@@ -0,0 +1,270 @@
+..  BSD LICENSE
+Copyright(c) 2015 Netronome Systems, Inc. All rights reserved.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in
+the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Intel Corporation nor the names of its
+contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+NFP poll mode driver library
+
+
+Netronome's sixth generation of flow processors pack 216 programmable
+cores and over 100 hardware accelerators that uniquely combine packet,
+flow, security and content processing in a single device that scales
+up to 400 Gbps.
+
+This document explains how to use DPDK with the Netronome Poll Mode
+Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
+(NFP-6xxx).
+
+Currently the driver supports virtual functions (VFs) only.
+
+Dependencies
+
+
+Before using the Netronome's DPDK PMD some NFP-6xxx configuration,
+which is not related to DPDK, is required. The system requires
+installation of **Netronome's BSP (Board Support Package)** which includes
+Linux drivers, programs and libraries.
+
+If you have a NFP-6xxx device you should already have the code and
+documentation for doing this configuration. Contact
+**support at netronome.com** to obtain the latest available firmware.
+
+The NFP Linux kernel drivers (including the required PF driver for the
+NFP) are available on Github at
+**https://github.com/Netronome/nfp-drv-kmods** along with build
+instructions.
+
+DPDK runs in userspace and PMDs use the Linux kernel UIO interface to
+allow access to physical devices from userspace. The NFP PMD requires
+a separate UIO driver, **nfp_uio**, to perform correct
+initialization. This driver is part of the DPDK source tree and is
+equivalent to Intel's igb_uio driver.
+
+Building the software
+-
+
+Netronome's PMD code is provided in the **drivers/net/nfp** directory and
+nfp_uio is present in the **lib/librte_eal/linuxapp/nfp_uio** directory. Both
+are part of the DPDK build if the **common_linuxapp configuration** file is
+used. If you use another configuration file and want to have NFP support
+just add:
+
+- **CONFIG_RTE_EAL_NFP_UIO=y**
+
+- **CONFIG_RTE_LIBRTE_NFP_PMD=y**
+
+Once DPDK is built all the DPDK apps and examples include support for
+the NFP PMD. The nfp_uio.ko module will be at build/kmods directory or
+at the directory specified when building DPDK.
+
+
+System configuration
+
+
+Using the NFP PMD is not different to using other PMDs. Usual steps are:
+
+#. **Configure hugepages:** All major Linux distributions have the hugepages
+   functionality enabled by default. By default this allows the system uses for
+   working with transparent hugepages. But in this case some hugepages need to
+   be created/reserved for use with the DPDK through the hugetlbfs file system.
+   First the virtual 

[dpdk-dev] [PATCH v2 2/4] This patch adds a new UIO driver for Netronome NFP PCI cards.

2015-10-06 Thread Alejandro.Lucero
From: "Alejandro.Lucero" 

Current Netronome's PMD just supports Virtual Functions. Future Physical
Function support will require specific Netronome code here.

Signed-off-by: Alejandro.Lucero 
Signed-off-by: Rolf.Neugebauer 
---
 lib/librte_eal/common/include/rte_pci.h   |1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c |4 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |2 +-
 lib/librte_eal/linuxapp/nfp_uio/Makefile  |   53 +++
 lib/librte_eal/linuxapp/nfp_uio/nfp_uio.c |  497 +
 lib/librte_ether/rte_ethdev.c |1 +
 6 files changed, 557 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eal/linuxapp/nfp_uio/Makefile
 create mode 100644 lib/librte_eal/linuxapp/nfp_uio/nfp_uio.c

diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 83e3c28..89baaf6 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -146,6 +146,7 @@ struct rte_devargs;
 enum rte_kernel_driver {
RTE_KDRV_UNKNOWN = 0,
RTE_KDRV_IGB_UIO,
+   RTE_KDRV_NFP_UIO,
RTE_KDRV_VFIO,
RTE_KDRV_UIO_GENERIC,
RTE_KDRV_NIC_UIO,
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index bc5b5be..19a93fe 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -137,6 +137,7 @@ pci_map_device(struct rte_pci_device *dev)
 #endif
break;
case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_NFP_UIO:
case RTE_KDRV_UIO_GENERIC:
/* map resources for devices that use uio */
ret = pci_uio_map_resource(dev);
@@ -161,6 +162,7 @@ pci_unmap_device(struct rte_pci_device *dev)
RTE_LOG(ERR, EAL, "Hotplug doesn't support vfio yet\n");
break;
case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_NFP_UIO:
case RTE_KDRV_UIO_GENERIC:
/* unmap resources for devices that use uio */
pci_uio_unmap_resource(dev);
@@ -357,6 +359,8 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
dev->kdrv = RTE_KDRV_VFIO;
else if (!strcmp(driver, "igb_uio"))
dev->kdrv = RTE_KDRV_IGB_UIO;
+   else if (!strcmp(driver, "nfp_uio"))
+   dev->kdrv = RTE_KDRV_NFP_UIO;
else if (!strcmp(driver, "uio_pci_generic"))
dev->kdrv = RTE_KDRV_UIO_GENERIC;
else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ac50e13..29ec9cb 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -270,7 +270,7 @@ pci_uio_alloc_resource(struct rte_pci_device *dev,
goto error;
}

-   if (dev->kdrv == RTE_KDRV_IGB_UIO)
+   if (dev->kdrv == RTE_KDRV_IGB_UIO || dev->kdrv == RTE_KDRV_NFP_UIO)
dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
else {
dev->intr_handle.type = RTE_INTR_HANDLE_UIO_INTX;
diff --git a/lib/librte_eal/linuxapp/nfp_uio/Makefile b/lib/librte_eal/linuxapp/nfp_uio/Makefile
new file mode 100644
index 0000000..b9e2f0a
--- /dev/null
+++ b/lib/librte_eal/linuxapp/nfp_uio/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2014-2015 Netronome. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 

[dpdk-dev] [PATCH v2 1/4] This patch adds a PMD driver for Netronome NFP PCI cards.

2015-10-06 Thread Alejandro.Lucero
From: "Alejandro.Lucero" 

Signed-off-by: Alejandro.Lucero 
Signed-off-by: Rolf.Neugebauer 
---
 MAINTAINERS  |9 +
 config/common_linuxapp   |6 +
 doc/guides/rel_notes/release_2_2.rst |8 +
 drivers/net/Makefile |1 +
 drivers/net/nfp/Makefile |   88 ++
 drivers/net/nfp/nfp_net.c| 2495 ++
 drivers/net/nfp/nfp_net_ctrl.h   |  290 
 drivers/net/nfp/nfp_net_logs.h   |   75 +
 drivers/net/nfp/nfp_net_pmd.h|  434 ++
 lib/librte_eal/linuxapp/Makefile |3 +
 mk/rte.app.mk|1 +
 11 files changed, 3410 insertions(+)
 create mode 100644 drivers/net/nfp/Makefile
 create mode 100644 drivers/net/nfp/nfp_net.c
 create mode 100644 drivers/net/nfp/nfp_net_ctrl.h
 create mode 100644 drivers/net/nfp/nfp_net_logs.h
 create mode 100644 drivers/net/nfp/nfp_net_pmd.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 080a8e8..1fb2ba6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -167,6 +167,10 @@ FreeBSD UIO
 M: Bruce Richardson 
 F: lib/librte_eal/bsdapp/nic_uio/

+NFP UIO
+M: Alejandro Lucero 
+F: lib/librte_eal/linuxapp/nfp_uio/
+

 Core Libraries
 --
@@ -255,6 +259,11 @@ M: Adrien Mazarguil 
 F: drivers/net/mlx4/
 F: doc/guides/nics/mlx4.rst

+Netronome NFP
+M: Alejandro Lucero 
+F: drivers/net/nfp/
+F: doc/guides/nics/nfp.rst
+
 RedHat virtio
 M: Huawei Xie 
 M: Changchun Ouyang 
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..d8d6384 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -108,6 +108,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_NFP_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_MALLOC_DEBUG=n

@@ -238,6 +239,11 @@ CONFIG_RTE_LIBRTE_ENIC_PMD=y
 CONFIG_RTE_LIBRTE_ENIC_DEBUG=n

 #
+# Compile burst-oriented Netronome PMD driver
+#
+CONFIG_RTE_LIBRTE_NFP_PMD=y
+
+#
 # Compile burst-oriented VIRTIO PMD driver
 #
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 5687676..364cca3 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -16,6 +16,10 @@ EAL
   Fixed issue where the ``rte_epoll_wait()`` function didn't return when the
   underlying call to ``epoll_wait()`` timed out.

+* **eal/linuxapp: New UIO driver for Netronome's NFP support**
+
+  Netronome's NFP PMD requires some specific configuration. Current implementation
+  supports just VFs. Future PF support will require major changes to this driver.

 Drivers
 ~~~
@@ -39,6 +43,10 @@ Drivers

   Fixed issue with libvirt ``virsh destroy`` not killing the VM.

+* **drivers/net: New PMD for Netronome's NFP 6xxx cards**
+
+  PMD supporting VFs with Netronome's NFP card. It requires specific UIO
+  driver, nfp_uio, and previous configuration using Netronome's BSP.

 Libraries
 ~
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..bc08591 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -48,6 +48,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += ring
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
+DIRS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp

 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/nfp/Makefile b/drivers/net/nfp/Makefile
new file mode 100644
index 0000000..ef74e27
--- /dev/null
+++ b/drivers/net/nfp/Makefile
@@ -0,0 +1,88 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 

[PATCH v2 0/4] Support for Netronome´s NFP-6xxx card

2015-10-06 Thread Alejandro.Lucero
From: "Alejandro.Lucero" 

This patchset adds a new PMD for Netronome's NFP-6xxx card along with a new
UIO driver, documentation and minor changes to configuration scripts.

Alejandro.Lucero (4):
  This patch adds a PMD driver for Netronome NFP PCI cards.
  This patch adds a new UIO driver for Netronome NFP PCI cards.
  This patch adds documentation about Netronome's NFP nic
  Modifying configuration scripts for Netronome's nfp_uio driver.

 MAINTAINERS   |9 +
 config/common_linuxapp|6 +
 doc/guides/nics/index.rst |1 +
 doc/guides/nics/nfp.rst   |  270 
 doc/guides/rel_notes/release_2_2.rst  |8 +
 drivers/net/Makefile  |1 +
 drivers/net/nfp/Makefile  |   88 +
 drivers/net/nfp/nfp_net.c | 2495 +
 drivers/net/nfp/nfp_net_ctrl.h|  290 
 drivers/net/nfp/nfp_net_logs.h|   75 +
 drivers/net/nfp/nfp_net_pmd.h |  434 +
 lib/librte_eal/common/include/rte_pci.h   |1 +
 lib/librte_eal/linuxapp/Makefile  |3 +
 lib/librte_eal/linuxapp/eal/eal_pci.c |4 +
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |2 +-
 lib/librte_eal/linuxapp/nfp_uio/Makefile  |   53 +
 lib/librte_eal/linuxapp/nfp_uio/nfp_uio.c |  497 ++
 lib/librte_ether/rte_ethdev.c |1 +
 mk/rte.app.mk |1 +
 tools/dpdk_nic_bind.py|8 +-
 tools/setup.sh|  122 +-
 21 files changed, 4339 insertions(+), 30 deletions(-)
 create mode 100644 doc/guides/nics/nfp.rst
 create mode 100644 drivers/net/nfp/Makefile
 create mode 100644 drivers/net/nfp/nfp_net.c
 create mode 100644 drivers/net/nfp/nfp_net_ctrl.h
 create mode 100644 drivers/net/nfp/nfp_net_logs.h
 create mode 100644 drivers/net/nfp/nfp_net_pmd.h
 create mode 100644 lib/librte_eal/linuxapp/nfp_uio/Makefile
 create mode 100644 lib/librte_eal/linuxapp/nfp_uio/nfp_uio.c

-- 
1.7.9.5



[dpdk-dev] [PATCH v5 3/4] ethdev: redesign link speed config API

2015-10-06 Thread Nélio Laranjeiro
Hi Marc,

On Sun, Oct 04, 2015 at 11:12:46PM +0200, Marc Sune wrote:
>[...]
>  /**
> + * Device supported speeds bitmap flags
> + */
> +#define ETH_LINK_SPEED_AUTONEG     (0 << 0)   /*< Autonegotiate (all speeds) */
> +#define ETH_LINK_SPEED_NO_AUTONEG  (1 << 0)   /*< Disable autoneg (fixed speed) */
> +#define ETH_LINK_SPEED_10M_HD      (1 << 1)   /*< 10 Mbps half-duplex */
> +#define ETH_LINK_SPEED_10M         (1 << 2)   /*< 10 Mbps full-duplex */
> +#define ETH_LINK_SPEED_100M_HD     (1 << 3)   /*< 100 Mbps half-duplex */
> +#define ETH_LINK_SPEED_100M        (1 << 4)   /*< 100 Mbps full-duplex */
> +#define ETH_LINK_SPEED_1G          (1 << 5)   /*< 1 Gbps */
> +#define ETH_LINK_SPEED_2_5G        (1 << 6)   /*< 2.5 Gbps */
> +#define ETH_LINK_SPEED_5G          (1 << 7)   /*< 5 Gbps */
> +#define ETH_LINK_SPEED_10G         (1 << 8)   /*< 10 Gbps */
> +#define ETH_LINK_SPEED_20G         (1 << 9)   /*< 20 Gbps */
> +#define ETH_LINK_SPEED_25G         (1 << 10)  /*< 25 Gbps */
> +#define ETH_LINK_SPEED_40G         (1 << 11)  /*< 40 Gbps */
> +#define ETH_LINK_SPEED_50G         (1 << 12)  /*< 50 Gbps */
> +#define ETH_LINK_SPEED_56G         (1 << 13)  /*< 56 Gbps */
> +#define ETH_LINK_SPEED_100G        (1 << 14)  /*< 100 Gbps */
> +
> +/**
> + * Ethernet numeric link speeds in Mbps
> + */
> +#define ETH_SPEED_NUM_NONE   0       /*< Not defined */
> +#define ETH_SPEED_NUM_10M    10      /*< 10 Mbps */
> +#define ETH_SPEED_NUM_100M   100     /*< 100 Mbps */
> +#define ETH_SPEED_NUM_1G     1000    /*< 1 Gbps */
> +#define ETH_SPEED_NUM_2_5G   2500    /*< 2.5 Gbps */
> +#define ETH_SPEED_NUM_5G     5000    /*< 5 Gbps */
> +#define ETH_SPEED_NUM_10G    10000   /*< 10 Gbps */
> +#define ETH_SPEED_NUM_20G    20000   /*< 20 Gbps */
> +#define ETH_SPEED_NUM_25G    25000   /*< 25 Gbps */
> +#define ETH_SPEED_NUM_40G    40000   /*< 40 Gbps */
> +#define ETH_SPEED_NUM_50G    50000   /*< 50 Gbps */
> +#define ETH_SPEED_NUM_56G    56000   /*< 56 Gbps */
> +#define ETH_SPEED_NUM_100G   100000  /*< 100 Gbps */
> +
> +/**
>   * A structure used to retrieve link-level information of an Ethernet port.
>   */
>  struct rte_eth_link {
> - uint16_t link_speed;  /**< ETH_LINK_SPEED_[10, 100, 1000, 10000] */
> - uint16_t link_duplex; /**< ETH_LINK_[HALF_DUPLEX, FULL_DUPLEX] */
> - uint8_t  link_status : 1; /**< 1 -> link up, 0 -> link down */
> -}__attribute__((aligned(8))); /**< aligned for atomic64 read/write */
> -
> -#define ETH_LINK_SPEED_AUTONEG  0       /**< Auto-negotiate link speed. */
> -#define ETH_LINK_SPEED_10       10      /**< 10 megabits/second. */
> -#define ETH_LINK_SPEED_100      100     /**< 100 megabits/second. */
> -#define ETH_LINK_SPEED_1000     1000    /**< 1 gigabits/second. */
> -#define ETH_LINK_SPEED_10000    10000   /**< 10 gigabits/second. */
> -#define ETH_LINK_SPEED_10G      10000   /**< alias of 10 gigabits/second. */
> -#define ETH_LINK_SPEED_20G      20000   /**< 20 gigabits/second. */
> -#define ETH_LINK_SPEED_40G      40000   /**< 40 gigabits/second. */
> + uint32_t link_speed;   /**< Link speed (ETH_SPEED_NUM_) */
> + uint16_t link_duplex;  /**< 1 -> full duplex, 0 -> half duplex */
> + uint8_t link_autoneg : 1;  /**< 1 -> link speed has been autoneg */
> + uint8_t link_status  : 1;  /**< 1 -> link up, 0 -> link down */
> +} __attribute__((aligned(8)));  /**< aligned for atomic64 read/write */
>[...]
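
The relationship between the two tables in the quoted patch (a bitmap of capability flags versus numeric speeds in Mbps) can be sketched as follows. The bit positions are taken from the quoted defines, and the numeric speeds follow the ETH_SPEED_NUM_* names, read as values in Mbps. The table name `SPEED_FLAGS` and the function `decode_speeds` are hypothetical and not part of the proposed API.

```python
# Bit 0 set means autonegotiation is disabled (fixed speed); a bitmap of 0
# corresponds to ETH_LINK_SPEED_AUTONEG, i.e. autonegotiate all speeds.
ETH_LINK_SPEED_NO_AUTONEG = 1 << 0

# (flag bit, numeric speed in Mbps), following the quoted patch.
SPEED_FLAGS = [
    (1 << 1, 10),       # 10M half-duplex
    (1 << 2, 10),       # 10M full-duplex
    (1 << 3, 100),      # 100M half-duplex
    (1 << 4, 100),      # 100M full-duplex
    (1 << 5, 1000),     # 1G
    (1 << 6, 2500),     # 2.5G
    (1 << 7, 5000),     # 5G
    (1 << 8, 10000),    # 10G
    (1 << 9, 20000),    # 20G
    (1 << 10, 25000),   # 25G
    (1 << 11, 40000),   # 40G
    (1 << 12, 50000),   # 50G
    (1 << 13, 56000),   # 56G
    (1 << 14, 100000),  # 100G
]


def decode_speeds(bitmap):
    """Return (autoneg_allowed, sorted distinct speeds in Mbps) for a bitmap."""
    autoneg = not (bitmap & ETH_LINK_SPEED_NO_AUTONEG)
    speeds = sorted({mbps for flag, mbps in SPEED_FLAGS if bitmap & flag})
    return autoneg, speeds
```

For instance, a device advertising the 1G and 10G flags decodes to autonegotiation allowed with speeds 1000 and 10000 Mbps.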

Pretty good.  One question, why did you not merge link_duplex, autoneg,
and status like:

struct rte_eth_link {
uint32_t link_speed;
uint32_t link_duplex:1;
uint32_t link_autoneg:1;
uint32_t link_status:1;
};

is it really useful to keep a uint16_t for the duplex alone?

Another point, the comment about link_duplex field should point to the
defines you have changed i.e. ETH_LINK_HALF_DUPLEX, ETH_LINK_FULL_DUPLEX.
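
For illustration only, here is a minimal sketch of the merged layout asked about above. It assumes GCC or Clang (bit-fields on uint32_t and the aligned attribute are not guaranteed by ISO C), and the names eth_link_sketch and eth_link_make are hypothetical, not part of the patch. The point it shows is that the three flags share one 32-bit storage unit, so the whole structure still fits the single 64-bit word that the atomic read/write relies on.

```c
#include <stdint.h>

/* Hypothetical merged layout; not the structure from the patch. */
struct eth_link_sketch {
	uint32_t link_speed;        /* speed in Mbps, e.g. 10000 for 10G */
	uint32_t link_duplex  : 1;  /* 1 -> full duplex, 0 -> half duplex */
	uint32_t link_autoneg : 1;  /* 1 -> speed was auto-negotiated */
	uint32_t link_status  : 1;  /* 1 -> link up, 0 -> link down */
} __attribute__((aligned(8)));

/* The atomic64 read/write only works if everything fits in 8 bytes. */
_Static_assert(sizeof(struct eth_link_sketch) == 8,
	       "link info must fit in one 64-bit word");

/* Build a link descriptor; any non-zero flag argument is stored as 1. */
static inline struct eth_link_sketch
eth_link_make(uint32_t speed, int full_duplex, int autoneg, int up)
{
	struct eth_link_sketch l = {
		.link_speed   = speed,
		.link_duplex  = !!full_duplex,
		.link_autoneg = !!autoneg,
		.link_status  = !!up,
	};
	return l;
}
```

Calling eth_link_make(10000, 1, 1, 1) would describe an auto-negotiated, full-duplex 10G link that is up.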

Regards,

-- 
Nélio Laranjeiro
6WIND


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Avi Kivity
On 10/06/2015 10:33 AM, Stephen Hemminger wrote:
> Other than implementation objections, so far the two main arguments
> against this reduce to:
>1. If you allow UIO ioctl then it opens an API hook for all the crap out
>   of tree UIO drivers to do what they want.
>2. If you allow UIO MSI-X then you are expanding the usage of userspace
>   device access in an insecure manner.
>
> Another alternative which I explored was making a version of VFIO that
> works without IOMMU. It solves #1 but actually increases the likely negative
> response to argument #2. This would keep same API, and avoid having to
> modify UIO. But we would still have the same (if not more) resistance
> from IOMMU developers who believe all systems have to be secure against
> root.

vfio's charter was explicitly aiming for modern setups with iommus.

This could be revisited, but I agree it will have even more resistance, 
justified IMO.

btw, (2) doesn't really add any insecurity.  The user could already poke 
at the msix tables (as well as perform DMA); they just couldn't get a 
useful interrupt out of them.

Maybe a module parameter "allow_insecure_dma" can be added to 
uio_pci_generic.  Without the parameter, bus mastering and msix are 
disabled; with the parameter they are allowed.  This requires the sysadmin 
to take a positive step in order to make use of their hardware.



[dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1

2015-10-06 Thread Vincent JARDIN
On Oct 6, 2015 09:54, "Stephen Hemminger" wrote:
>
> On Mon,  5 Oct 2015 19:54:35 +0200
> Adrien Mazarguil  wrote:
>
> > Mellanox OFED 3.1 [1] comes with improved APIs that Mellanox ConnectX-4
> > (mlx5) adapters can take advantage of, such as:
> >
> > - Separate post and doorbell operations on all queues.
> > - Lightweight RX queues called Work Queues (WQs).
> > - Low-level RSS indirection table and hash key configuration.
> >
> > This patchset enhances mlx5 with all of these for better performance and
> > flexibility. Documentation is updated accordingly.
>
> Has anybody explored doing a driver without the dependency on OFED?
> It is certainly possible. The Linux kernel drivers don't depend on it.
> And dropping OFED would certainly be faster.

OFED is an established kernel API. I agree that F from infiniband should be
deprecated since it has broader scope of use.

It avoids wasting effort on duplicating the kernel's code.

It provides security too that UIO could not provide.

Best regards,
  Vincent


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Vlad Zolotarov


On 10/06/15 01:49, Michael S. Tsirkin wrote:
> On Tue, Oct 06, 2015 at 01:09:55AM +0300, Vladislav Zolotarov wrote:
>> How about instead of trying to invent the wheel just go and attack the
>> problem directly just like i've proposed already a few times in the last
>> days: instead of limiting the UIO limit the users that are allowed to use
>> UIO to privileged users only (e.g. root). This would solve all clearly
>> unresolvable issues u are raising here all together, wouldn't it?
> No - root or no root, if the user can modify the addresses in the MSI-X
> table and make the chip corrupt random memory, this is IMHO a non-starter.

Michael, how is this or any other related patch related to the problem u 
r describing? The above ability has been there for years and if memory 
serves me well it was u who wrote uio_pci_generic with this "security 
flaw".  ;)

This patch in general only adds the ability to receive notifications per 
MSI-X interrupt and it has nothing to do with the ability to reprogram 
the MSI-X related registers from the user space which was always there.

>
> And tainting kernel is not a solution - your patch adds a pile of
> code that either goes completely unused or taints the kernel.
> Not just that - it's a dedicated userspace API that either
> goes completely unused or taints the kernel.
>
>>> --
>>> MST



[dpdk-dev] [PATCH v2] devargs: add blacklisting by linux interface name

2015-10-06 Thread Charles (Chas) Williams
On Tue, 2015-10-06 at 08:35 +0100, Stephen Hemminger wrote:
> On Mon,  5 Oct 2015 11:26:08 -0400
> Chas Williams <3chas3 at gmail.com> wrote:
> 
> > diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> > index 83e3c28..852c149 100644
> > --- a/lib/librte_eal/common/include/rte_pci.h
> > +++ b/lib/librte_eal/common/include/rte_pci.h
> > @@ -161,6 +161,7 @@ struct rte_pci_device {
> > struct rte_pci_resource mem_resource[PCI_MAX_RESOURCE];   /**< PCI Memory Resource */
> > struct rte_intr_handle intr_handle; /**< Interrupt handle */
> > struct rte_pci_driver *driver;  /**< Associated driver */
> > +   char name[32];
> 
> Why not use IFNAMSIZ rather than magic constant here?

No particular reason.  It just matches the virtual device name size.
I will change it.



[dpdk-dev] [PATCH 00/17] Enhance mlx5 with Mellanox OFED 3.1

2015-10-06 Thread Stephen Hemminger
On Mon,  5 Oct 2015 19:54:35 +0200
Adrien Mazarguil  wrote:

> Mellanox OFED 3.1 [1] comes with improved APIs that Mellanox ConnectX-4
> (mlx5) adapters can take advantage of, such as:
> 
> - Separate post and doorbell operations on all queues.
> - Lightweight RX queues called Work Queues (WQs).
> - Low-level RSS indirection table and hash key configuration.
> 
> This patchset enhances mlx5 with all of these for better performance and
> flexibility. Documentation is updated accordingly.

Has anybody explored doing a driver without the dependency on OFED?
It is certainly possible. The Linux kernel drivers don't depend on it.
And dropping OFED would certainly be faster.


[dpdk-dev] [PATCH v2] devargs: add blacklisting by linux interface name

2015-10-06 Thread Stephen Hemminger
On Mon,  5 Oct 2015 11:26:08 -0400
Chas Williams <3chas3 at gmail.com> wrote:

> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
> index 83e3c28..852c149 100644
> --- a/lib/librte_eal/common/include/rte_pci.h
> +++ b/lib/librte_eal/common/include/rte_pci.h
> @@ -161,6 +161,7 @@ struct rte_pci_device {
>   struct rte_pci_resource mem_resource[PCI_MAX_RESOURCE];   /**< PCI Memory Resource */
>   struct rte_intr_handle intr_handle; /**< Interrupt handle */
>   struct rte_pci_driver *driver;  /**< Associated driver */
> + char name[32];

Why not use IFNAMSIZ rather than magic constant here?


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Stephen Hemminger
Other than implementation objections, so far the two main arguments
against this reduce to:
  1. If you allow UIO ioctl then it opens an API hook for all the crap out
 of tree UIO drivers to do what they want.
  2. If you allow UIO MSI-X then you are expanding the usage of userspace
 device access in an insecure manner.

Another alternative which I explored was making a version of VFIO that
works without IOMMU. It solves #1 but actually increases the likely negative
response to argument #2. This would keep same API, and avoid having to
modify UIO. But we would still have the same (if not more) resistance
from IOMMU developers who believe all systems have to be secure against
root.




[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Michael S. Tsirkin
On Tue, Oct 06, 2015 at 01:09:55AM +0300, Vladislav Zolotarov wrote:
> How about instead of trying to invent the wheel just go and attack the problem
> directly just like i've proposed already a few times in the last days: instead
> of limiting the UIO limit the users that are allowed to use UIO to privileged
> users only (e.g. root). This would solve all clearly unresolvable issues u are
> raising here all together, wouldn't it?

No - root or no root, if the user can modify the addresses in the MSI-X
table and make the chip corrupt random memory, this is IMHO a non-starter.
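
To make the concern concrete, here is a small sketch of one MSI-X table entry as the PCI specification lays it out: four 32-bit words holding the message address (low and high), the message data, and a vector control word whose bit 0 masks the vector. When an unmasked vector fires, the device performs a memory write of the data word to the address in the entry, so whoever can write the entry chooses where that write lands unless an IOMMU or interrupt remapping intervenes. The struct and helper names are hypothetical and are not kernel or DPDK definitions.

```c
#include <stdint.h>

/* One 16-byte MSI-X table entry (hypothetical names, spec-defined layout). */
struct msix_entry_sketch {
	uint32_t msg_addr_lo;   /* Message Address, low 32 bits */
	uint32_t msg_addr_hi;   /* Message Upper Address */
	uint32_t msg_data;      /* value the device writes */
	uint32_t vector_ctrl;   /* bit 0: 1 -> vector masked */
};

#define MSIX_SKETCH_CTRL_MASKBIT 0x1u

_Static_assert(sizeof(struct msix_entry_sketch) == 16,
	       "an MSI-X table entry is four dwords");

/* Address the device will write to when this vector fires. */
static inline uint64_t msix_target(const struct msix_entry_sketch *e)
{
	return ((uint64_t)e->msg_addr_hi << 32) | e->msg_addr_lo;
}

/* Non-zero if the vector is masked and therefore will not be delivered. */
static inline int msix_is_masked(const struct msix_entry_sketch *e)
{
	return (e->vector_ctrl & MSIX_SKETCH_CTRL_MASKBIT) != 0;
}
```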

And tainting kernel is not a solution - your patch adds a pile of
code that either goes completely unused or taints the kernel.
Not just that - it's a dedicated userspace API that either
goes completely unused or taints the kernel.

> >
> > --
> > MST
> 


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Vladislav Zolotarov
On Oct 6, 2015 12:55 AM, "Michael S. Tsirkin"  wrote:
>
> On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> > Just forwarding events is not enough to make a valid driver.
> > What is missing is a way to access the device in a safe way.
>
> Thinking about it some more, maybe some devices don't do DMA, and merely
> signal events with MSI/MSI-X.
>
> The fact you mention igb_uio in the cover letter seems to hint that this
> isn't the case, and that the real intent is to abuse it for DMA-capable
> devices, but still ...
>
> If we assume such a simple device, we need to block userspace from
> tweaking at least the MSI control and the MSI-X table.
> And changing BARs might make someone else corrupt the MSI-X
> table, so we need to block it from changing BARs, too.
>
> Things like device reset will clear the table.  I guess this means we
> need to track access to reset, too, make sure we restore the
> table to a sane config.
>
> PM  capability can be used to reset things too, I think. Better be
> careful about that.
>
> And a bunch of devices could be doing weird things that need
> to be special-cased.
>
> All of this is what VFIO is already dealing with.
>
> Maybe extending VFIO for this usecase, or finding another way to share
> code might be a better idea than duplicating the code within uio?

How about instead of trying to invent the wheel just go and attack the
problem directly just like i've proposed already a few times in the last
days: instead of limiting the UIO limit the users that are allowed to use
UIO to privileged users only (e.g. root). This would solve all clearly
unresolvable issues u are raising here all together, wouldn't it?

>
> --
> MST


[dpdk-dev] [PATCH 2/2] uio: new driver to support PCI MSI-X

2015-10-06 Thread Michael S. Tsirkin
On Thu, Oct 01, 2015 at 11:33:06AM +0300, Michael S. Tsirkin wrote:
> Just forwarding events is not enough to make a valid driver.
> What is missing is a way to access the device in a safe way.

Thinking about it some more, maybe some devices don't do DMA, and merely
signal events with MSI/MSI-X.

The fact you mention igb_uio in the cover letter seems to hint that this
isn't the case, and that the real intent is to abuse it for DMA-capable
devices, but still ...

If we assume such a simple device, we need to block userspace from
tweaking at least the MSI control and the MSI-X table.
And changing BARs might make someone else corrupt the MSI-X
table, so we need to block it from changing BARs, too.

Things like device reset will clear the table.  I guess this means we
need to track access to reset, too, make sure we restore the
table to a sane config.

PM  capability can be used to reset things too, I think. Better be
careful about that.

And a bunch of devices could be doing weird things that need
to be special-cased.

All of this is what VFIO is already dealing with.

Maybe extending VFIO for this usecase, or finding another way to share
code might be a better idea than duplicating the code within uio?

-- 
MST


[dpdk-dev] Unlinking hugepage backing file after initialization

2015-10-06 Thread Michael S. Tsirkin
On Mon, Oct 05, 2015 at 01:08:52PM +0000, Xie, Huawei wrote:
> On 9/30/2015 5:36 AM, Michael S. Tsirkin wrote:
> > On Tue, Sep 29, 2015 at 05:50:00PM +0000, shesha Sreenivasamurthy (shesha) wrote:
> >> Sure. Then, is there any real reason why the backing files should not be
> >> unlinked ?
> > AFAIK qemu unlinks them already.
> Sorry, i didn't make it clear. Let us take the physical Ethernet
> controller in the host for example
> 
> 1)  DPDK app1 unlinked huge page after initialization.
> 2)  DPDK app1 crashed or got killed unexpectedly.
> 3)  The nic device is just DMAing to the buffer memory allocated from
> the huge page.
> 4)  Another app2 started, allocated memory from the hugetlbfs, and the
> memory allocated happened to be the buffer memory.
> Ok, the nic device dmaed to memory of app2, which corrupted app2.
> Btw, the window opened is very very narrow, but we could avoid this
> corruption if we don't unlink huge page immediately.  We could
> reinitialize the nic through binding operation and then remove the huge
> page.
> 
> I mentioned virtio at the first time. For its case, the one who does DMA
> is vhost and i am talking about the guest huge page not the huge pages
> used to back guest memory.
> 
> So we had better not unlink huge pages unless we have other solution to
> avoid the corruption.

Oh, I get it now. It's when you (ab)use UIO to bypass all normal kernel
protections.  There's no problem when using VFIO.

So kernel doesn't protect you in case of a crash, but I guess you
can try to protect yourself.

For example, write a separate service that you can pass the hugepage FDs
and the device FDs to. Have it hold on to them, and when it detects your
app crashed, have it reset the device before closing the FDs.
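
The hand-off part of such a service could be sketched as follows, assuming the application and the watchdog are connected by an AF_UNIX socket and pass descriptors with SCM_RIGHTS ancillary data. The helper names send_fd and recv_fd are hypothetical, and the crash detection and device reset steps are deliberately left out since they depend on the setup.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Ancillary-data buffer big enough for one descriptor, suitably aligned. */
union fd_ctl {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one open descriptor over a connected AF_UNIX socket. 0 on success. */
static int send_fd(int sock, int fd)
{
	char byte = 0;  /* at least one byte of real data must accompany it */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Because the received descriptor refers to the same open file, the hugepage mapping and the device stay referenced by the watchdog even after the sending process dies, which is what gives it the chance to reset the device before the memory is released.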

Just make sure that one doesn't crash :).

But really, people should just use VFIO.

-- 
MST