date:20160202

[dpdk-dev] [PATCH v1 1/5] eal: get CPU flag name

2016-02-02 Thread Thomas Monjalon

The new function rte_cpu_get_flag_name() is added to the EAL API.
It is implemented (duplicated) in each arch because the next patch
will remove the public exposure of the feature tables.

Signed-off-by: Thomas Monjalon 
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map|  1 +
 lib/librte_eal/common/arch/arm/rte_cpuflags.c|  7 +++
 lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c |  8 
 lib/librte_eal/common/arch/x86/rte_cpuflags.c|  8 
 lib/librte_eal/common/eal_common_cpuflags.c  |  2 +-
 lib/librte_eal/common/include/generic/rte_cpuflags.h | 14 --
 lib/librte_eal/linuxapp/eal/rte_eal_version.map  |  2 ++
 7 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 6fa9c67..1280cb2 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -140,5 +140,6 @@ DPDK_2.3 {
global:

rte_cpu_feature_table;
+   rte_cpu_get_flag_name;

 } DPDK_2.2;
diff --git a/lib/librte_eal/common/arch/arm/rte_cpuflags.c 
b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
index 4348574..62e0791 100644
--- a/lib/librte_eal/common/arch/arm/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/arm/rte_cpuflags.c
@@ -77,3 +77,10 @@ const struct feature_entry rte_cpu_feature_table[] = {
 };
 #endif

+const char *
+rte_cpu_get_flag_name(enum rte_cpu_flag_t feature)
+{
+   if (feature >= RTE_CPUFLAG_NUMFLAGS)
+   return NULL;
+   return rte_cpu_feature_table[feature].name;
+}
diff --git a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c 
b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
index 7f4e6dd..a270ccc 100644
--- a/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/ppc_64/rte_cpuflags.c
@@ -68,3 +68,11 @@ const struct feature_entry rte_cpu_feature_table[] = {
FEAT_DEF(HTM, 0x0001, 0, REG_HWCAP2,  30)
FEAT_DEF(ARCH_2_07, 0x0001, 0, REG_HWCAP2,  31)
 };
+
+const char *
+rte_cpu_get_flag_name(enum rte_cpu_flag_t feature)
+{
+   if (feature >= RTE_CPUFLAG_NUMFLAGS)
+   return NULL;
+   return rte_cpu_feature_table[feature].name;
+}
diff --git a/lib/librte_eal/common/arch/x86/rte_cpuflags.c 
b/lib/librte_eal/common/arch/x86/rte_cpuflags.c
index 22dc572..3346fde 100644
--- a/lib/librte_eal/common/arch/x86/rte_cpuflags.c
+++ b/lib/librte_eal/common/arch/x86/rte_cpuflags.c
@@ -127,3 +127,11 @@ const struct feature_entry rte_cpu_feature_table[] = {

FEAT_DEF(INVTSC, 0x8007, 0, RTE_REG_EDX,  8)
 };
+
+const char *
+rte_cpu_get_flag_name(enum rte_cpu_flag_t feature)
+{
+   if (feature >= RTE_CPUFLAG_NUMFLAGS)
+   return NULL;
+   return rte_cpu_feature_table[feature].name;
+}
diff --git a/lib/librte_eal/common/eal_common_cpuflags.c 
b/lib/librte_eal/common/eal_common_cpuflags.c
index 9ba9b1e..8c0576d 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -79,7 +79,7 @@ rte_cpu_check_supported(void)
fprintf(stderr,
"ERROR: This system does not support \"%s\".\n"
"Please check that RTE_MACHINE is set 
correctly.\n",
-   
rte_cpu_feature_table[compile_time_flags[i]].name);
+   rte_cpu_get_flag_name(compile_time_flags[i]));
exit(1);
}
}
diff --git a/lib/librte_eal/common/include/generic/rte_cpuflags.h 
b/lib/librte_eal/common/include/generic/rte_cpuflags.h
index 5738a2a..3ca2e36 100644
--- a/lib/librte_eal/common/include/generic/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/generic/rte_cpuflags.h
@@ -47,9 +47,7 @@
 /**
  * Enumeration of all CPU features supported
  */
-#ifdef __DOXYGEN__
 enum rte_cpu_flag_t;
-#endif

 /**
  * Enumeration of CPU registers
@@ -95,6 +93,18 @@ static inline void
 rte_cpu_get_features(uint32_t leaf, uint32_t subleaf, cpuid_registers_t out);

 /**
+ * Get name of CPU flag
+ *
+ * @param feature
+ * CPU flag ID
+ * @return
+ * flag name
+ * NULL if flag ID is invalid
+ */
+const char *
+rte_cpu_get_flag_name(enum rte_cpu_flag_t feature);
+
+/**
  * Function for checking a CPU flag availability
  *
  * @param feature
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index aa4fcbd..d8a5978 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -143,4 +143,6 @@ DPDK_2.3 {
global:

rte_cpu_feature_table;
+   rte_cpu_get_flag_name;
+
 } DPDK_2.2;
-- 
2.5.2

[dpdk-dev] [PATCH v6 2/2] vhost: Add VHOST PMD

2016-02-02 Thread Ferruh Yigit

On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote:
> The patch introduces a new PMD. This PMD is implemented as thin wrapper
> of librte_vhost. It means librte_vhost is also needed to compile the PMD.
> The vhost messages will be handled only when a port is started. So start
> a port first, then invoke QEMU.
> 
> The PMD has 2 parameters.
>  - iface:  The parameter is used to specify a path to connect to a
>virtio-net device.
>  - queues: The parameter is used to specify the number of the queues
>virtio-net device has.
>(Default: 1)
> 
> Here is an example.
> $ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i
> 
> To connect above testpmd, here is qemu command example.
> 
> $ qemu-system-x86_64 \
> 
> -chardev socket,id=chr0,path=/tmp/sock0 \
> -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
> -device virtio-net-pci,netdev=net0,mq=on
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  config/common_linuxapp  |   6 +
>  doc/guides/nics/index.rst   |   1 +
>  doc/guides/rel_notes/release_2_2.rst|   2 +
>  drivers/net/Makefile|   4 +
>  drivers/net/vhost/Makefile  |  62 ++
>  drivers/net/vhost/rte_eth_vhost.c   | 920 
> 
>  drivers/net/vhost/rte_eth_vhost.h   |  73 +++
>  drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
>  mk/rte.app.mk   |   8 +-
>  9 files changed, 1086 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/vhost/Makefile
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.c
>  create mode 100644 drivers/net/vhost/rte_eth_vhost.h
>  create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map
> 
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 74bc515..357b557 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
>  CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
>  
>  #
> +# Compile vhost PMD
> +# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
> +#
> +CONFIG_RTE_LIBRTE_PMD_VHOST=y
> +
> +#
>  #Compile Xen domain0 support
>  #
>  CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 33c9cea..5819cdb 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -47,6 +47,7 @@ Network Interface Controller Drivers
>  nfp
>  szedata2
>  virtio
> +vhost
>  vmxnet3
>  pcap_ring
>  
> diff --git a/doc/guides/rel_notes/release_2_2.rst 
> b/doc/guides/rel_notes/release_2_2.rst

Should this be 2.3 release notes?

> index bb7d15a..6390b44 100644
> --- a/doc/guides/rel_notes/release_2_2.rst
> +++ b/doc/guides/rel_notes/release_2_2.rst
> @@ -168,6 +168,8 @@ New Features
>  
>  * **Added vhost-user multiple queue support.**
>  
> +* **Added vhost PMD.**
> +
>  * **Added port hotplug support to vmxnet3.**
>  
>  * **Added port hotplug support to xenvirt.**
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 6e4497e..4300b93 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -52,5 +52,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
>  
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
> +endif # $(CONFIG_RTE_LIBRTE_VHOST)
> +

Not directly related to the your patch, but since I saw here:
I think we should have resolve dependencies in our "config tool"/"make system" 
instead of having them in makefiles, for long term.
But there is nothing you can do right now, since there is nothing available to 
resolve configuration dependencies.

>  include $(RTE_SDK)/mk/rte.sharelib.mk
>  include $(RTE_SDK)/mk/rte.subdir.mk
> diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
> new file mode 100644
> index 000..f49a69b
> --- /dev/null
> +++ b/drivers/net/vhost/Makefile
> @@ -0,0 +1,62 @@
> +#   BSD LICENSE
> +#
> +#   Copyright (c) 2010-2016 Intel Corporation.
> +#   All rights reserved.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of Intel corporation nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +#

[dpdk-dev] [PATCH v5 3/4] ethdev: redesign link speed config API

2016-02-02 Thread Marc

On 2 February 2016 at 03:20, Stephen Hemminger 
wrote:

> On Thu, 28 Jan 2016 17:33:20 +
> Harish Patil  wrote:
>
> > * Added utility MACROs ETH_SPEED_NUM_XXX with the numeric
> >   values of all supported link speeds, in Mbps.
>
> I would prefer that there were no speed value macros.
> Linux used to have these, but people kept adding new hardware speeds
> and it soon gets out of date.
>

I see what you mean, but I am not sure I agree. Link speeds are generally a
reduced amount of items (~20). Though it is true it can eventually grow,
but at small rate. Having numeric constants all over the source seems less
readable and less maintainable (e.g. less "grepable"/"sedable") to me.

>
> If you are going to redo it, then just increase speed to 64 bit, and allow
> any non-zero value.
>

Value is now 32 bits, which I think is enough for future rates in mbps.
Since these constants were there, and before doing something to have to
revert it, can someone else give his/her opinion on this?

If there is consensus, I've no problem on removing it for v8

Thanks
marc

[dpdk-dev] [PATCH v6 1/8] eal: pci: add api to rd/wr pci bar region

2016-02-02 Thread Santosh Shukla

On Tue, Feb 2, 2016 at 9:21 PM, Santosh Shukla  wrote:
> On Tue, Feb 2, 2016 at 2:19 PM, Yuanhan Liu  
> wrote:
>> On Tue, Feb 02, 2016 at 06:50:18AM +0100, David Marchand wrote:
>>> On Tue, Feb 2, 2016 at 6:43 AM, Yuanhan Liu >> linux.intel.com> wrote:
>>> > On Tue, Feb 02, 2016 at 09:44:14AM +0530, Santosh Shukla wrote:
>>> >> Current use-case is virtio: It is used as io_bar which is first
>>> >> bar[1]. But implementation is generic, can be used to do rd/wr for
>>> >> other bar index too. Also vfio facilitate user to do rd/wr to pci_bars
>>> >> w/o mapping that bar, So apis will be useful for such cases in future.
>>> >>
>>> >> AFAIU: uio has read/write_config api only and Yes if bar region mapped
>>> >> then no need to do rd/wr, user can directly access the pci_memory. But
>>> >> use-case of this api entirely different: unmapped memory by
>>> >> application context i.e.. vfio_rd/wr-way {pread/pwrite-way}.
>>> >>
>>> >> Is above explanation convincing? Pl. let me know.
>>> >
>>> > TBH, not really. So, as you stated, it should be generic APIs to
>>> > read/write bar space, but limiting it to VFIO only and claiming
>>> > that read/write bar space is not support by other drivers (such
>>> > as UIO) while in fact it can (in some ways) doesn't seem right
>>> > to me.
>>> >
>>> > Anyway, it's just some thoughts from me. David, comments?
>>>
>>> >From the very start, same opinion.
>>> We should have a unique api to access those, and eal should hide
>>> details like kernel drivers (uio, vfio, whatever) to the pmd.
>>>
>>> Now the thing is, how to do this in an elegant and efficient way.
>>
>> I was thinking that we may just make it be IO port specific read/
>> write functions:
>>
>
> Ok,
>
>> rte_eal_pci_ioport_read(dev, bar, buf, size)
>> {
>>
>> return if not an IO bar;
>>
>> if (has io)
>> return inb/w/l();
>>
>
> In that case, It may be r / if (has io) / if (drv->kdrv == UIO)
>
>> if (vfio)
>> return vfio_ioport_read();
>>
>> else, claim aloud that io port read is not allowed
>> }
>>
>> Let us not handle memory bar resource here: in such case, you should
>> go with rte_eal_pci_map_device() and do it with memory mapped io.
>>
>> Does that make any sense?
>>
> I am not entirely sure.
> Are you considering IGB_UIO, UIO_GENERIC and NIC_UIO: all the cases ?
>

Just came-up something below what Yuanhan has proposed, Does this look okay?

int rte_eal_pci_ioport_read(const struct rte_pci_device *device,
  void *buf, size_t len,
off_t offset,
  int bar_idx)
{
  if (bar_idx != 0) {
  RTE_LOG(ERR, EAL, "not a ioport bar\n");
  return -1;
   }

 switch (device->kdrv) {
 case RTE_KDRV_VFIO:
   return pci_vfio_ioport_read(device, buf, len, offset, bar_idx);
 case RTE_KDRV_IGB_UIO:
 case RTE_KDRV_UIO_GENERIC:
 case RTE_KDRV_NIC_UIO:
 {
  switch (size)
  case 1: return inb(buf /*ioport address*/);
  case 2: return inw(buf /* ioport address*/);
  case 4: return inl(buf /* ioport address*/);
  default:
 RTE_LOG(ERR, EAL, "invalid size\n");
  }

   default:
 RTE_LOG(ERR, EAL, "read bar not supported by driver\n");
 return -1;
  }
}

>
>> --yliu

[dpdk-dev] [PATCH v7 9/9] virtio: move VIRTIO_READ/WRITE_REG_X into virtio_pci.c

2016-02-02 Thread Yuanhan Liu

virtio_pci.c is the only file references those macros; move them there.

Signed-off-by: Yuanhan Liu 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: Huawei Xie 
---
 drivers/net/virtio/virtio_pci.c | 19 +++
 drivers/net/virtio/virtio_pci.h | 18 --
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 4cdecea..9fbc2f5 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -49,6 +49,25 @@
 #define PCI_CAPABILITY_LIST0x34
 #define PCI_CAP_ID_VNDR0x09

+
+#define VIRTIO_PCI_REG_ADDR(hw, reg) \
+   (unsigned short)((hw)->io_base + (reg))
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+   inb((VIRTIO_PCI_REG_ADDR((hw), (reg
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+   outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg
+
+#define VIRTIO_READ_REG_2(hw, reg) \
+   inw((VIRTIO_PCI_REG_ADDR((hw), (reg
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+   outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg
+
+#define VIRTIO_READ_REG_4(hw, reg) \
+   inl((VIRTIO_PCI_REG_ADDR((hw), (reg
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+   outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg
+
 static void
 legacy_read_dev_config(struct virtio_hw *hw, size_t offset,
   void *dst, int length)
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index fcac660..0544a07 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -318,24 +318,6 @@ outl_p(unsigned int data, unsigned int port)
 }
 #endif

-#define VIRTIO_PCI_REG_ADDR(hw, reg) \
-   (unsigned short)((hw)->io_base + (reg))
-
-#define VIRTIO_READ_REG_1(hw, reg) \
-   inb((VIRTIO_PCI_REG_ADDR((hw), (reg
-#define VIRTIO_WRITE_REG_1(hw, reg, value) \
-   outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg
-
-#define VIRTIO_READ_REG_2(hw, reg) \
-   inw((VIRTIO_PCI_REG_ADDR((hw), (reg
-#define VIRTIO_WRITE_REG_2(hw, reg, value) \
-   outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg
-
-#define VIRTIO_READ_REG_4(hw, reg) \
-   inl((VIRTIO_PCI_REG_ADDR((hw), (reg
-#define VIRTIO_WRITE_REG_4(hw, reg, value) \
-   outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg
-
 static inline int
 vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 {
-- 
1.9.0

[dpdk-dev] [PATCH v7 8/9] virtio: add 1.0 support

2016-02-02 Thread Yuanhan Liu

Modern (v1.0) virtio pci device defines several pci capabilities.
Each cap has a configure structure corresponding to it, and the
cap.bar and cap.offset fields tell us where to find it.

Firstly, we map the pci resources by rte_eal_pci_map_device().
We then could easily locate a cfg structure by:

cfg_addr = dev->mem_resources[cap.bar].addr + cap.offset;

Therefore, the entrance of enabling modern (v1.0) pci device support
is to iterate the pci capability lists, and to locate some configs
we care; and they are:

- common cfg

  For generic virtio and virtqueue configuration, such as setting/getting
  features, enabling a specific queue, and so on.

- nofity cfg

  Combining with `queue_notify_off' from common cfg, we could use it to
  notify a specific virt queue.

- device cfg

  Where virtio_net_config structure is located.

- isr cfg

  Where to read isr (interrupt status).

If any of above cap is not found, we fallback to the legacy virtio
handling.

If succeed, hw->vtpci_ops is assigned to modern_ops, where all
operations are implemented by reading/writing a (or few) specific
configuration space from above 4 cfg structures. And that's basically
how this patch works.

Besides those changes, virtio 1.0 introduces a new status field:
FEATURES_OK, which is set after features negotiation is done.

Last, set the VIRTIO_F_VERSION_1 feature flag.

Signed-off-by: Yuanhan Liu 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: Huawei Xie 
---

v6: - unfold DEF_IO_READ/WRITE macros

v5: - rename MODERN_READ/WRITE_DEF macro name to IO_READ/WRITE_DEF

- check offset + length overflow
---
 doc/guides/rel_notes/release_2_3.rst |   3 +
 drivers/net/virtio/virtio_ethdev.c   |  25 ++-
 drivers/net/virtio/virtio_ethdev.h   |   3 +-
 drivers/net/virtio/virtio_pci.c  | 355 ++-
 drivers/net/virtio/virtio_pci.h  |  67 +++
 drivers/net/virtio/virtqueue.h   |   2 +
 6 files changed, 450 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..c390d97 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,9 @@ DPDK Release 2.3
 New Features
 

+* **Virtio 1.0 support.**
+
+  Enabled virtio 1.0 support for virtio pmd driver.

 Resolved Issues
 ---
diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 4f84757..755503d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -927,7 +927,7 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t 
vlan_id, int on)
return virtio_send_command(hw->cvq, , , 1);
 }

-static void
+static int
 virtio_negotiate_features(struct virtio_hw *hw)
 {
uint64_t host_features;
@@ -949,6 +949,22 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features = vtpci_negotiate_features(hw, host_features);
PMD_INIT_LOG(DEBUG, "features after negotiate = %" PRIx64,
hw->guest_features);
+
+   if (hw->modern) {
+   if (!vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) {
+   PMD_INIT_LOG(ERR,
+   "VIRTIO_F_VERSION_1 features is not enabled.");
+   return -1;
+   }
+   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_FEATURES_OK);
+   if (!(vtpci_get_status(hw) & VIRTIO_CONFIG_STATUS_FEATURES_OK)) 
{
+   PMD_INIT_LOG(ERR,
+   "failed to set FEATURES_OK status!");
+   return -1;
+   }
+   }
+
+   return 0;
 }

 /*
@@ -1032,7 +1048,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

/* Tell the host we've known how to drive the device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-   virtio_negotiate_features(hw);
+   if (virtio_negotiate_features(hw) < 0)
+   return -1;

/* If host does not support status then disable LSC */
if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
@@ -1043,7 +1060,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
rx_func_get(eth_dev);

/* Setting up rx_header size for the device */
-   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+   vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
else
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
@@ -1159,6 +1177,7 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
rte_intr_callback_unregister(_dev->intr_handle,
virtio_interrupt_handler,
eth_dev);
+   rte_eal_pci_unmap_device(pci_dev);

[dpdk-dev] [PATCH v7 7/9] eal: pci: export pci_[un]map_device

2016-02-02 Thread Yuanhan Liu

Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will
invoke pci_map_device internally for us. From that point view, there
is no need to export pci_map_device.

However, for virtio pmd driver, which is designed to work without
binding UIO (or something similar first), pci_map_device() will fail,
which ends up with virtio pmd driver being skipped. Therefore, we can
not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver.

Therefore, this patch exports pci_map_device, and let virtio pmd call
it when necessary.

Cc: David Marchand 
Signed-off-by: Yuanhan Liu 
Tested-by: Santosh Shukla 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: David Marchand 
Acked-by: Huawei Xie 
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |  4 ++--
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  2 ++
 lib/librte_eal/common/eal_common_pci.c  |  4 ++--
 lib/librte_eal/common/eal_private.h | 18 -
 lib/librte_eal/common/include/rte_pci.h | 27 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |  4 ++--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  3 +++
 7 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 6c21fbd..95c32c1 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -93,7 +93,7 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
__rte_unused)

 /* Map pci device */
 int
-pci_map_device(struct rte_pci_device *dev)
+rte_eal_pci_map_device(struct rte_pci_device *dev)
 {
int ret = -1;

@@ -115,7 +115,7 @@ pci_map_device(struct rte_pci_device *dev)

 /* Unmap pci device */
 void
-pci_unmap_device(struct rte_pci_device *dev)
+rte_eal_pci_unmap_device(struct rte_pci_device *dev)
 {
/* try unmapping the NIC resources */
switch (dev->kdrv) {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 6fa9c67..d8ac7f7 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -139,6 +139,8 @@ DPDK_2.2 {
 DPDK_2.3 {
global:

+   rte_eal_pci_map_device;
+   rte_eal_pci_unmap_device;
rte_cpu_feature_table;

 } DPDK_2.2;
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index dcfe947..96d5113 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -188,7 +188,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
pci_config_space_set(dev);
 #endif
/* map resources for devices that use igb_uio */
-   ret = pci_map_device(dev);
+   ret = rte_eal_pci_map_device(dev);
if (ret != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
@@ -254,7 +254,7 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr,

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)
/* unmap resources for devices that use igb_uio */
-   pci_unmap_device(dev);
+   rte_eal_pci_unmap_device(dev);

return 0;
}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 072e672..2342fa1 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -165,24 +165,6 @@ struct rte_pci_device;
 int pci_unbind_kernel_driver(struct rte_pci_device *dev);

 /**
- * Map this device
- *
- * This function is private to EAL.
- *
- * @return
- *   0 on success, negative on error and positive if no driver
- *   is found for the device.
- */
-int pci_map_device(struct rte_pci_device *dev);
-
-/**
- * Unmap this device
- *
- * This function is private to EAL.
- */
-void pci_unmap_device(struct rte_pci_device *dev);
-
-/**
  * Map the PCI resource of a PCI device in virtual memory
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 334c12e..2224109 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -485,6 +485,33 @@ int rte_eal_pci_read_config(const struct rte_pci_device 
*device,
  */
 int rte_eal_pci_write_config(const struct rte_pci_device *device,
 const void *buf, size_t len, off_t offset);
+/**
+ * Map the PCI device resources in user space virtual memory address
+ *
+ * Note that driver should not call this function when flag
+ * RTE_PCI_DRV_NEED_MAPPING is set, as EAL will do that for
+ * you when it's on.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ *
+ * @return
+ *   0 on success,

[dpdk-dev] [PATCH v7 6/9] virtio: retrieve hdr_size from hw->vtnet_hdr_size

2016-02-02 Thread Yuanhan Liu

The mergeable virtio net hdr format has been the standard and the
only virtio net hdr format since virtio 1.0. Therefore, we can
not hardcode hdr_size to "sizeof(struct virtio_net_hdr)" any more
at virtio_recv_pkts(), otherwise, there would be a mismatch of
hdr size from rte_vhost_enqueue_burst() and virtio_recv_pkts(),
leading a packet corruption.

Instead, we should retrieve it from hw->vtnet_hdr_size; we will
do proper settings at eth_virtio_dev_init() in later patches.

Signed-off-by: Yuanhan Liu 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: Huawei Xie 
---
---
 drivers/net/virtio/virtio_rxtx.c|  6 --
 drivers/net/virtio/virtio_rxtx_simple.c | 12 ++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index b7267c0..41a1366 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -560,7 +560,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
int error;
uint32_t i, nb_enqueued;
-   const uint32_t hdr_size = sizeof(struct virtio_net_hdr);
+   uint32_t hdr_size;

nb_used = VIRTQUEUE_NUSED(rxvq);

@@ -580,6 +580,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
hw = rxvq->hw;
nb_rx = 0;
nb_enqueued = 0;
+   hdr_size = hw->vtnet_hdr_size;

for (i = 0; i < num ; i++) {
rxm = rcv_pkts[i];
@@ -664,7 +665,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
uint32_t seg_num;
uint16_t extra_idx;
uint32_t seg_res;
-   const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   uint32_t hdr_size;

nb_used = VIRTQUEUE_NUSED(rxvq);

@@ -682,6 +683,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
seg_num = 0;
extra_idx = 0;
seg_res = 0;
+   hdr_size = hw->vtnet_hdr_size;

while (i < nb_used) {
struct virtio_net_hdr_mrg_rxbuf *header;
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c 
b/drivers/net/virtio/virtio_rxtx_simple.c
index ff3c11a..3a1de9d 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -81,9 +81,9 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,

start_dp = vq->vq_ring.desc;
start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
-   RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+   RTE_PKTMBUF_HEADROOM - vq->hw->vtnet_hdr_size);
start_dp[desc_idx].len = cookie->buf_len -
-   RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+   RTE_PKTMBUF_HEADROOM + vq->hw->vtnet_hdr_size;

vq->vq_free_cnt--;
vq->vq_avail_idx++;
@@ -120,9 +120,9 @@ virtio_rxq_rearm_vec(struct virtqueue *rxvq)

start_dp[i].addr =
(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
-   RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+   RTE_PKTMBUF_HEADROOM - rxvq->hw->vtnet_hdr_size);
start_dp[i].len = sw_ring[i]->buf_len -
-   RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+   RTE_PKTMBUF_HEADROOM + rxvq->hw->vtnet_hdr_size;
}

rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
@@ -175,8 +175,8 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
len_adjust = _mm_set_epi16(
0, 0,
0,
-   (uint16_t) -sizeof(struct virtio_net_hdr),
-   0, (uint16_t) -sizeof(struct virtio_net_hdr),
+   (uint16_t)-rxvq->hw->vtnet_hdr_size,
+   0, (uint16_t)-rxvq->hw->vtnet_hdr_size,
0, 0);

if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
-- 
1.9.0

[dpdk-dev] [PATCH v7 5/9] viritio: switch to 64 bit features

2016-02-02 Thread Yuanhan Liu

Switch to 64 bit features, which virtio 1.0 supports.

While legacy virtio only supports 32 bit features, it complains aloud
and quit when trying to setting > 32 bit features.

Signed-off-by: Yuanhan Liu 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.c |  8 
 drivers/net/virtio/virtio_pci.c| 15 ++-
 drivers/net/virtio/virtio_pci.h| 12 ++--
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index b57224d..4f84757 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -930,16 +930,16 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t 
vlan_id, int on)
 static void
 virtio_negotiate_features(struct virtio_hw *hw)
 {
-   uint32_t host_features;
+   uint64_t host_features;

/* Prepare guest_features: feature that driver wants to support */
hw->guest_features = VIRTIO_PMD_GUEST_FEATURES;
-   PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %" PRIx64,
hw->guest_features);

/* Read device(host) feature bits */
host_features = hw->vtpci_ops->get_features(hw);
-   PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "host_features before negotiate = %" PRIx64,
host_features);

/*
@@ -947,7 +947,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 * guest feature bits.
 */
hw->guest_features = vtpci_negotiate_features(hw, host_features);
-   PMD_INIT_LOG(DEBUG, "features after negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "features after negotiate = %" PRIx64,
hw->guest_features);
 }

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index e89a044..3961077 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -87,15 +87,20 @@ legacy_write_dev_config(struct virtio_hw *hw, size_t offset,
}
 }

-static uint32_t
+static uint64_t
 legacy_get_features(struct virtio_hw *hw)
 {
return VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
 }

 static void
-legacy_set_features(struct virtio_hw *hw, uint32_t features)
+legacy_set_features(struct virtio_hw *hw, uint64_t features)
 {
+   if ((features >> 32) != 0) {
+   PMD_DRV_LOG(ERR,
+   "only 32 bit features are allowed for legacy virtio!");
+   return;
+   }
VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_GUEST_FEATURES, features);
 }

@@ -453,10 +458,10 @@ vtpci_write_dev_config(struct virtio_hw *hw, size_t 
offset,
hw->vtpci_ops->write_dev_cfg(hw, offset, src, length);
 }

-uint32_t
-vtpci_negotiate_features(struct virtio_hw *hw, uint32_t host_features)
+uint64_t
+vtpci_negotiate_features(struct virtio_hw *hw, uint64_t host_features)
 {
-   uint32_t features;
+   uint64_t features;

/*
 * Limit negotiated features to what the driver, virtqueue, and
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index e8e7509..d7bc6bb 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -175,8 +175,8 @@ struct virtio_pci_ops {
uint8_t (*get_status)(struct virtio_hw *hw);
void(*set_status)(struct virtio_hw *hw, uint8_t status);

-   uint32_t (*get_features)(struct virtio_hw *hw);
-   void (*set_features)(struct virtio_hw *hw, uint32_t features);
+   uint64_t (*get_features)(struct virtio_hw *hw);
+   void (*set_features)(struct virtio_hw *hw, uint64_t features);

uint8_t (*get_isr)(struct virtio_hw *hw);

@@ -191,7 +191,7 @@ struct virtio_pci_ops {
 struct virtio_hw {
struct virtqueue *cvq;
uint32_tio_base;
-   uint32_tguest_features;
+   uint64_tguest_features;
uint32_tmax_tx_queues;
uint32_tmax_rx_queues;
uint16_tvtnet_hdr_size;
@@ -271,9 +271,9 @@ outl_p(unsigned int data, unsigned int port)
outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg

 static inline int
-vtpci_with_feature(struct virtio_hw *hw, uint32_t bit)
+vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 {
-   return (hw->guest_features & (1u << bit)) != 0;
+   return (hw->guest_features & (1ULL << bit)) != 0;
 }

 /*
@@ -286,7 +286,7 @@ void vtpci_reinit_complete(struct virtio_hw *);

 void vtpci_set_status(struct virtio_hw *, uint8_t);

-uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
+uint64_t vtpci_negotiate_features(struct virtio_hw *, uint64_t);

 void vtpci_write_dev_config(struct virtio_hw *, size_t, const void *, int);

-- 
1.9.0

[dpdk-dev] [PATCH v7 4/9] virtio: move left pci stuff to virtio_pci.c

2016-02-02 Thread Yuanhan Liu

virtio_pci.c is a more proper place for pci stuff; virtio_ethdev is not.

Signed-off-by: Yuanhan Liu 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: Huawei Xie 
---
 drivers/net/virtio/virtio_ethdev.c | 265 +---
 drivers/net/virtio/virtio_pci.c| 272 -
 2 files changed, 272 insertions(+), 265 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 6c1d3a0..b57224d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -36,10 +36,6 @@
 #include 
 #include 
 #include 
-#ifdef RTE_EXEC_ENV_LINUXAPP
-#include 
-#include 
-#endif

 #include 
 #include 
@@ -955,260 +951,6 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features);
 }

-#ifdef RTE_EXEC_ENV_LINUXAPP
-static int
-parse_sysfs_value(const char *filename, unsigned long *val)
-{
-   FILE *f;
-   char buf[BUFSIZ];
-   char *end = NULL;
-
-   f = fopen(filename, "r");
-   if (f == NULL) {
-   PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s",
-__func__, filename);
-   return -1;
-   }
-
-   if (fgets(buf, sizeof(buf), f) == NULL) {
-   PMD_INIT_LOG(ERR, "%s(): cannot read sysfs value %s",
-__func__, filename);
-   fclose(f);
-   return -1;
-   }
-   *val = strtoul(buf, , 0);
-   if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-   PMD_INIT_LOG(ERR, "%s(): cannot parse sysfs value %s",
-__func__, filename);
-   fclose(f);
-   return -1;
-   }
-   fclose(f);
-   return 0;
-}
-
-static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int 
buflen,
-   unsigned int *uio_num)
-{
-   struct dirent *e;
-   DIR *dir;
-   char dirname[PATH_MAX];
-
-   /* depending on kernel version, uio can be located in uio/uioX
-* or uio:uioX */
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-loc->domain, loc->bus, loc->devid, loc->function);
-   dir = opendir(dirname);
-   if (dir == NULL) {
-   /* retry with the parent directory */
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-loc->domain, loc->bus, loc->devid, loc->function);
-   dir = opendir(dirname);
-
-   if (dir == NULL) {
-   PMD_INIT_LOG(ERR, "Cannot opendir %s", dirname);
-   return -1;
-   }
-   }
-
-   /* take the first file starting with "uio" */
-   while ((e = readdir(dir)) != NULL) {
-   /* format could be uio%d ...*/
-   int shortprefix_len = sizeof("uio") - 1;
-   /* ... or uio:uio%d */
-   int longprefix_len = sizeof("uio:uio") - 1;
-   char *endptr;
-
-   if (strncmp(e->d_name, "uio", 3) != 0)
-   continue;
-
-   /* first try uio%d */
-   errno = 0;
-   *uio_num = strtoull(e->d_name + shortprefix_len, , 10);
-   if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-   snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num);
-   break;
-   }
-
-   /* then try uio:uio%d */
-   errno = 0;
-   *uio_num = strtoull(e->d_name + longprefix_len, , 10);
-   if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-   snprintf(buf, buflen, "%s/uio:uio%u", dirname,
-*uio_num);
-   break;
-   }
-   }
-   closedir(dir);
-
-   /* No uio resource found */
-   if (e == NULL) {
-   PMD_INIT_LOG(ERR, "Could not find uio resource");
-   return -1;
-   }
-
-   return 0;
-}
-
-static int
-virtio_has_msix(const struct rte_pci_addr *loc)
-{
-   DIR *d;
-   char dirname[PATH_MAX];
-
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
-loc->domain, loc->bus, loc->devid, loc->function);
-
-   d = opendir(dirname);
-   if (d)
-   closedir(d);
-
-   return (d != NULL);
-}
-
-/* Extract I/O port numbers from sysfs */
-static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev)
-{
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   unsigned long start, size;
-   unsigned int uio_num;
-
-   if (get_uio_dev(_dev->addr, dirname, sizeof(dirname), _num) < 0)
-   return -1;
-
-   /* get portio size */
-

[dpdk-dev] [PATCH v7 3/9] virtio: introduce struct virtio_pci_ops

2016-02-02 Thread Yuanhan Liu

Introduce struct virtio_pci_ops, to let legacy virtio (v0.95) and
modern virtio (1.0) have different implementation regarding to a
specific pci action, such as read host status.

With that, this patch reimplements all exported pci functions, in
a way like:

vtpci_foo_bar(struct virtio_hw *hw)
{
hw->vtpci_ops->foo_bar(hw);
}

So that we need pay attention to those pci related functions only
while adding virtio 1.0 support.

This patch introduced a new vtpci function, vtpci_init(), to do
proper virtio pci settings. It's pretty simple so far: just sets
hw->vtpci_ops to legacy_ops as we don't support 1.0 yet.

Signed-off-by: Yuanhan Liu 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: Huawei Xie 
---
v5: - define "src" arg of vtpci_write_dev_config()
---
 drivers/net/virtio/virtio_ethdev.c |  22 ++---
 drivers/net/virtio/virtio_pci.c| 164 ++---
 drivers/net/virtio/virtio_pci.h|  29 ++-
 drivers/net/virtio/virtqueue.h |   2 +-
 4 files changed, 170 insertions(+), 47 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index d928339..6c1d3a0 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -272,9 +272,7 @@ virtio_dev_queue_release(struct virtqueue *vq) {

if (vq) {
hw = vq->hw;
-   /* Select and deactivate the queue */
-   VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, 
vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
+   hw->vtpci_ops->del_queue(hw, vq);

rte_free(vq->sw_ring);
rte_free(vq);
@@ -295,15 +293,13 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
struct virtio_hw *hw = dev->data->dev_private;
struct virtqueue *vq = NULL;

-   /* Write the virtqueue index to the Queue Select Field */
-   VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vtpci_queue_idx);
-   PMD_INIT_LOG(DEBUG, "selecting queue: %u", vtpci_queue_idx);
+   PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx);

/*
 * Read the virtqueue size from the Queue Size field
 * Always power of 2 and if 0 virtqueue does not exist
 */
-   vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
+   vq_size = hw->vtpci_ops->get_queue_num(hw, vtpci_queue_idx);
PMD_INIT_LOG(DEBUG, "vq_size: %u nb_desc:%u", vq_size, nb_desc);
if (vq_size == 0) {
PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
@@ -436,12 +432,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

-   /*
-* Set guest physical address of the virtqueue
-* in VIRTIO_PCI_QUEUE_PFN config register of device
-*/
-   VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN,
-   mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+   hw->vtpci_ops->setup_queue(hw, vq);
+
*pvq = vq;
return 0;
 }
@@ -950,7 +942,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features);

/* Read device(host) feature bits */
-   host_features = VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
+   host_features = hw->vtpci_ops->get_features(hw);
PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
host_features);

@@ -1287,6 +1279,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

pci_dev = eth_dev->pci_dev;

+   vtpci_init(pci_dev, hw);
+
if (virtio_resource_init(pci_dev) < 0)
return -1;

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index b34b59e..8d001e8 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -34,12 +34,11 @@

 #include "virtio_pci.h"
 #include "virtio_logs.h"
+#include "virtqueue.h"

-static uint8_t vtpci_get_status(struct virtio_hw *);
-
-void
-vtpci_read_dev_config(struct virtio_hw *hw, size_t offset,
-   void *dst, int length)
+static void
+legacy_read_dev_config(struct virtio_hw *hw, size_t offset,
+  void *dst, int length)
 {
uint64_t off;
uint8_t *d;
@@ -60,22 +59,22 @@ vtpci_read_dev_config(struct virtio_hw *hw, size_t offset,
}
 }

-void
-vtpci_write_dev_config(struct virtio_hw *hw, size_t offset,
-   void *src, int length)
+static void
+legacy_write_dev_config(struct virtio_hw *hw, size_t offset,
+   const void *src, int length)
 {
uint64_t off;
-   uint8_t *s;
+   const uint8_t *s;
int size;

off = VIRTIO_PCI_CONFIG(hw) + offset;
for (s = src; length > 0; s += size, off += size, length -= size) {
if (length >= 4) {
size = 4;
-

[dpdk-dev] [PATCH v7 2/9] virtio: define offset as size_t type

2016-02-02 Thread Yuanhan Liu

offset arg of vtpci_read/write_dev_config is derived from offsetof(),
which is of size_t type, instead of uint64_t. So, define it as size_t
type.

Signed-off-by: Yuanhan Liu 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: Huawei Xie 
---
 drivers/net/virtio/virtio_pci.c | 4 ++--
 drivers/net/virtio/virtio_pci.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 2245bec..b34b59e 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -38,7 +38,7 @@
 static uint8_t vtpci_get_status(struct virtio_hw *);

 void
-vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+vtpci_read_dev_config(struct virtio_hw *hw, size_t offset,
void *dst, int length)
 {
uint64_t off;
@@ -61,7 +61,7 @@ vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
 }

 void
-vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
+vtpci_write_dev_config(struct virtio_hw *hw, size_t offset,
void *src, int length)
 {
uint64_t off;
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 47f722a..fe89c21 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -261,9 +261,9 @@ void vtpci_set_status(struct virtio_hw *, uint8_t);

 uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);

-void vtpci_write_dev_config(struct virtio_hw *, uint64_t, void *, int);
+void vtpci_write_dev_config(struct virtio_hw *, size_t, void *, int);

-void vtpci_read_dev_config(struct virtio_hw *, uint64_t, void *, int);
+void vtpci_read_dev_config(struct virtio_hw *, size_t, void *, int);

 uint8_t vtpci_isr(struct virtio_hw *);

-- 
1.9.0

[dpdk-dev] [PATCH v7 1/9] virtio: don't set vring address again at queue startup

2016-02-02 Thread Yuanhan Liu

As we have already set up it at virtio_dev_queue_setup(), and a vq
restart will not reset the settings.

Signed-off-by: Yuanhan Liu 
Tested-by: Qian Xu 
Reviewed-by: Tetsuya Mukawa 
Tested-by: Tetsuya Mukawa 
Acked-by: Huawei Xie 
---
 drivers/net/virtio/virtio_rxtx.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 74b39ef..b7267c0 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -339,11 +339,6 @@ virtio_dev_vring_start(struct virtqueue *vq, int 
queue_type)
vq_update_avail_idx(vq);

PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
-
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
} else if (queue_type == VTNET_TQ) {
if (use_simple_rxtx) {
int mid_idx  = vq->vq_nentries >> 1;
@@ -362,16 +357,6 @@ virtio_dev_vring_start(struct virtqueue *vq, int 
queue_type)
for (i = mid_idx; i < vq->vq_nentries; i++)
vq->vq_ring.avail->ring[i] = i;
}
-
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-   } else {
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
}
 }

-- 
1.9.0

[dpdk-dev] [PATCH v7 0/9] virtio 1.0 enabling for virtio pmd driver

2016-02-02 Thread Yuanhan Liu

v7: - make checkpatch a bit happier; few (false) warnings still left.

- rebase to latest code: fixed a conflict.

v6: unfold IO_READ/WRITE_DEF macro

v5: minor fixes:

- fix wrong type of arg "offset" of read/write_dev_config(): patch 2
  is newly added for that.

- check "offset + length" overflow

Almost all difference comes from virtio 1.0 are the PCI layout change:
the major configuration structures are stored at bar space, and their
location is stored at corresponding pci cap structure. Reading/parsing
them is one of the major work of patch 8.

To make handling virtio v1.0 and v0.95 co-exist well, this patch set
introduces a virtio_pci_ops structure, to add another layer so that
we could keep those vtpci_foo_bar "APIs". With that, we could do the
minimum change to add virtio 1.0 support.


Rough test guide


Firstly, you need get a virtio 1.0 supported QEMU (say, v2.5), then add
option "disable-modern=false" to qemu virtio-net-pci device to enable
virtio 1.0 (which is disabled by default).

And if you see something like following from 'lspci -v', it means virtio
1.0 is indeed enabled:

00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 4
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c040 [size=64]
Memory at febf1000 (32-bit, non-prefetchable) [size=4K]
Memory at fe00 (64-bit, prefetchable) [size=8M]
Expansion ROM at feb8 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=6 Masked-
==> Capabilities: [84] Vendor Specific Information: Len=14 
==> Capabilities: [70] Vendor Specific Information: Len=14 
==> Capabilities: [60] Vendor Specific Information: Len=10 
==> Capabilities: [50] Vendor Specific Information: Len=10 
==> Capabilities: [40] Vendor Specific Information: Len=10 
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci

After that, there wasn't anything speical comparing to the old virtio
0.95 pmd driver.


---
Yuanhan Liu (9):
  virtio: don't set vring address again at queue startup
  virtio: define offset as size_t type
  virtio: introduce struct virtio_pci_ops
  virtio: move left pci stuff to virtio_pci.c
  viritio: switch to 64 bit features
  virtio: retrieve hdr_size from hw->vtnet_hdr_size
  eal: pci: export pci_[un]map_device
  virtio: add 1.0 support
  virtio: move VIRTIO_READ/WRITE_REG_X into virtio_pci.c

 doc/guides/rel_notes/release_2_3.rst|   3 +
 drivers/net/virtio/virtio_ethdev.c  | 302 +
 drivers/net/virtio/virtio_ethdev.h  |   3 +-
 drivers/net/virtio/virtio_pci.c | 815 +++-
 drivers/net/virtio/virtio_pci.h | 124 +++-
 drivers/net/virtio/virtio_rxtx.c|  21 +-
 drivers/net/virtio/virtio_rxtx_simple.c |  12 +-
 drivers/net/virtio/virtqueue.h  |   4 +-
 lib/librte_eal/bsdapp/eal/eal_pci.c |   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   2 +
 lib/librte_eal/common/eal_common_pci.c  |   4 +-
 lib/librte_eal/common/eal_private.h |  18 -
 lib/librte_eal/common/include/rte_pci.h |  27 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   4 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   3 +
 15 files changed, 964 insertions(+), 382 deletions(-)

-- 
1.9.0

[dpdk-dev] [PATCH v6 1/8] eal: pci: add api to rd/wr pci bar region

2016-02-02 Thread Santosh Shukla

On Tue, Feb 2, 2016 at 2:19 PM, Yuanhan Liu  
wrote:
> On Tue, Feb 02, 2016 at 06:50:18AM +0100, David Marchand wrote:
>> On Tue, Feb 2, 2016 at 6:43 AM, Yuanhan Liu  
>> wrote:
>> > On Tue, Feb 02, 2016 at 09:44:14AM +0530, Santosh Shukla wrote:
>> >> Current use-case is virtio: It is used as io_bar which is first
>> >> bar[1]. But implementation is generic, can be used to do rd/wr for
>> >> other bar index too. Also vfio facilitate user to do rd/wr to pci_bars
>> >> w/o mapping that bar, So apis will be useful for such cases in future.
>> >>
>> >> AFAIU: uio has read/write_config api only and Yes if bar region mapped
>> >> then no need to do rd/wr, user can directly access the pci_memory. But
>> >> use-case of this api entirely different: unmapped memory by
>> >> application context i.e.. vfio_rd/wr-way {pread/pwrite-way}.
>> >>
>> >> Is above explanation convincing? Pl. let me know.
>> >
>> > TBH, not really. So, as you stated, it should be generic APIs to
>> > read/write bar space, but limiting it to VFIO only and claiming
>> > that read/write bar space is not support by other drivers (such
>> > as UIO) while in fact it can (in some ways) doesn't seem right
>> > to me.
>> >
>> > Anyway, it's just some thoughts from me. David, comments?
>>
>> >From the very start, same opinion.
>> We should have a unique api to access those, and eal should hide
>> details like kernel drivers (uio, vfio, whatever) to the pmd.
>>
>> Now the thing is, how to do this in an elegant and efficient way.
>
> I was thinking that we may just make it be IO port specific read/
> write functions:
>

Ok,

> rte_eal_pci_ioport_read(dev, bar, buf, size)
> {
>
> return if not an IO bar;
>
> if (has io)
> return inb/w/l();
>

In that case, It may be r / if (has io) / if (drv->kdrv == UIO)

> if (vfio)
> return vfio_ioport_read();
>
> else, claim aloud that io port read is not allowed
> }
>
> Let us not handle memory bar resource here: in such case, you should
> go with rte_eal_pci_map_device() and do it with memory mapped io.
>
> Does that make any sense?
>
I am not entirely sure.
Are you considering IGB_UIO, UIO_GENERIC and NIC_UIO: all the cases ?


> --yliu

[dpdk-dev] [PATCH v6 0/9] virtio 1.0 enabling for virtio pmd driver

2016-02-02 Thread Yuanhan Liu

On Tue, Feb 02, 2016 at 11:46:54AM +0100, Thomas Monjalon wrote:
> Hi Yuanhan,
> 
> I wanted to apply these patches but I see few checkpatch warnings,
> inluding a typo.
> Sometimes I fix them myself but I guess you would prefer checking it.

Okay; I will fix them all.

--yliu

[dpdk-dev] [PATCH v6 2/2] vhost: Add VHOST PMD

2016-02-02 Thread Tetsuya Mukawa

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost. It means librte_vhost is also needed to compile the PMD.
The vhost messages will be handled only when a port is started. So start
a port first, then invoke QEMU.

The PMD has 2 parameters.
 - iface:  The parameter is used to specify a path to connect to a
   virtio-net device.
 - queues: The parameter is used to specify the number of the queues
   virtio-net device has.
   (Default: 1)

Here is an example.
$ ./testpmd -c f -n 4 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' -- -i

To connect above testpmd, here is qemu command example.

$ qemu-system-x86_64 \

-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=1 \
-device virtio-net-pci,netdev=net0,mq=on

Signed-off-by: Tetsuya Mukawa 
---
 config/common_linuxapp  |   6 +
 doc/guides/nics/index.rst   |   1 +
 doc/guides/rel_notes/release_2_2.rst|   2 +
 drivers/net/Makefile|   4 +
 drivers/net/vhost/Makefile  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c   | 920 
 drivers/net/vhost/rte_eth_vhost.h   |  73 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 mk/rte.app.mk   |   8 +-
 9 files changed, 1086 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..357b557 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -514,6 +514,12 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=n
 CONFIG_RTE_LIBRTE_VHOST_DEBUG=n

 #
+# Compile vhost PMD
+# To compile, CONFIG_RTE_LIBRTE_VHOST should be enabled.
+#
+CONFIG_RTE_LIBRTE_PMD_VHOST=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 33c9cea..5819cdb 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -47,6 +47,7 @@ Network Interface Controller Drivers
 nfp
 szedata2
 virtio
+vhost
 vmxnet3
 pcap_ring

diff --git a/doc/guides/rel_notes/release_2_2.rst 
b/doc/guides/rel_notes/release_2_2.rst
index bb7d15a..6390b44 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -168,6 +168,8 @@ New Features

 * **Added vhost-user multiple queue support.**

+* **Added vhost PMD.**
+
 * **Added port hotplug support to vmxnet3.**

 * **Added port hotplug support to xenvirt.**
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 6e4497e..4300b93 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -52,5 +52,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

+ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
+endif # $(CONFIG_RTE_LIBRTE_VHOST)
+
 include $(RTE_SDK)/mk/rte.sharelib.mk
 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/drivers/net/vhost/Makefile b/drivers/net/vhost/Makefile
new file mode 100644
index 000..f49a69b
--- /dev/null
+++ b/drivers/net/vhost/Makefile
@@ -0,0 +1,62 @@
+#   BSD LICENSE
+#
+#   Copyright (c) 2010-2016 Intel Corporation.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#

[dpdk-dev] [PATCH v6 1/2] ethdev: Add a new event type to notify a queue state changed event

2016-02-02 Thread Tetsuya Mukawa

This patch adds a below event type.
 - RTE_ETH_EVENT_QUEUE_STATE_CHANGE
This event is used for notifying a queue state changed event.

Signed-off-by: Tetsuya Mukawa 
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8710dd7..2fbf42a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2661,6 +2661,8 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 enum rte_eth_event_type {
RTE_ETH_EVENT_UNKNOWN,  /**< unknown event type */
RTE_ETH_EVENT_INTR_LSC, /**< lsc interrupt event */
+   RTE_ETH_EVENT_QUEUE_STATE_CHANGE,
+   /**< queue state changed interrupt */
RTE_ETH_EVENT_MAX   /**< max value of this enum */
 };

-- 
2.1.4

[dpdk-dev] [PATCH v6 0/2] Add VHOST PMD

2016-02-02 Thread Tetsuya Mukawa

The patch introduces a new PMD. This PMD is implemented as thin wrapper
of librte_vhost.

PATCH v6 changes:
 - Remove rte_vhost_driver_pmd_callback_registe().
 - Support link status interrupt.
 - Support queue state changed interrupt.
 - Add rte_eth_vhost_get_queue_event().
 - Support numa node detection when new device is connected.

PATCH v5 changes:
 - Rebase on latest master.
 - Fix RX/TX routine to count RX/TX bytes.
 - Fix RX/TX routine not to count as error packets if enqueue/dequeue
   cannot send all packets.
 - Fix if-condition checking for multiqueues.
 - Add "static" to pthread variable.
 - Fix format.
 - Change default behavior not to receive queueing event from driver.
 - Split the patch to separate rte_eth_vhost_portid2vdev().

PATCH v4 changes:
 - Rebase on latest DPDK tree.
 - Fix cording style.
 - Fix code not to invoke multiple messaging handling threads.
 - Fix code to handle vdev parameters correctly.
 - Remove needless cast.
 - Remove needless if-condition before rt_free().

PATCH v3 changes:
 - Rebase on latest matser
 - Specify correct queue_id in RX/TX function.

PATCH v2 changes:
 - Remove a below patch that fixes vhost library.
   The patch was applied as a separate patch.
   - vhost: fix crash with multiqueue enabled
 - Fix typos.
   (Thanks to Thomas, Monjalon)
 - Rebase on latest tree with above bernard's patches.

PATCH v1 changes:
 - Support vhost multiple queues.
 - Rebase on "remove pci driver from vdevs".
 - Optimize RX/TX functions.
 - Fix resource leaks.
 - Fix compile issue.
 - Add patch to fix vhost library.

RFC PATCH v3 changes:
 - Optimize performance.
   In RX/TX functions, change code to access only per core data.
 - Add below API to allow user to use vhost library APIs for a port managed
   by vhost PMD. There are a few limitations. See "rte_eth_vhost.h".
- rte_eth_vhost_portid2vdev()
   To support this functionality, vhost library is also changed.
   Anyway, if users doesn't use vhost PMD, can fully use vhost library APIs.
 - Add code to support vhost multiple queues.
   Actually, multiple queues functionality is not enabled so far.

RFC PATCH v2 changes:
 - Fix issues reported by checkpatch.pl
   (Thanks to Stephen Hemminger)



Tetsuya Mukawa (2):
  ethdev: Add a new event type to notify a queue state changed event
  vhost: Add VHOST PMD

 config/common_linuxapp  |   6 +
 doc/guides/nics/index.rst   |   1 +
 doc/guides/rel_notes/release_2_2.rst|   2 +
 drivers/net/Makefile|   4 +
 drivers/net/vhost/Makefile  |  62 ++
 drivers/net/vhost/rte_eth_vhost.c   | 920 
 drivers/net/vhost/rte_eth_vhost.h   |  73 +++
 drivers/net/vhost/rte_pmd_vhost_version.map |  11 +
 lib/librte_ether/rte_ethdev.h   |   2 +
 mk/rte.app.mk   |   8 +-
 10 files changed, 1088 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vhost/Makefile
 create mode 100644 drivers/net/vhost/rte_eth_vhost.c
 create mode 100644 drivers/net/vhost/rte_eth_vhost.h
 create mode 100644 drivers/net/vhost/rte_pmd_vhost_version.map

-- 
2.1.4

[dpdk-dev] [PATCH v2 3/3] doc: update release note for fm10k FTAG support

2016-02-02 Thread Wang Xiao W

Update the release note.

Signed-off-by: Wang Xiao W 
---
 doc/guides/rel_notes/release_2_3.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..2c8b6f9 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,7 @@ DPDK Release 2.3
 New Features
 

+* **Added fm10k FTAG based forwarding support.**

 Resolved Issues
 ---
-- 
1.9.3

[dpdk-dev] [PATCH v2 2/3] doc: add introduction for fm10k FTAG based forwarding

2016-02-02 Thread Wang Xiao W

Add a brief introduction on FTAG, describe what's FTAG and how it works
in forwarding.

Signed-off-by: Wang Xiao W 
---
 doc/guides/nics/fm10k.rst | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index 4206b7f..6f61482 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -1,5 +1,5 @@
 ..  BSD LICENSE
-Copyright(c) 2015 Intel Corporation. All rights reserved.
+Copyright(c) 2015-2016 Intel Corporation. All rights reserved.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
@@ -34,6 +34,20 @@ FM10K Poll Mode Driver
 The FM10K poll mode driver library provides support for the Intel FM1
 (FM10K) family of 40GbE/100GbE adapters.

+FTAG Based Forwarding of FM10K
+--
+
+FTAG Based Forwarding is a unique feature of FM10K. The FM10K family of NICs
+support the addition of a Fabric Tag (FTAG) to carry special information.
+The FTAG is placed at the beginning of the frame, it contains information
+such as where the packet comes from and goes, and the vlan tag. In FTAG based
+forwarding mode, the switch logic forwards packets according to glort (global
+resource tag) information, rather than the mac and vlan table. Currently this
+feature works only on PF.
+
+To enable this feature, turn ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD`` to y in the
+configuration file, and the application should make sure an appropriate FTAG is
+inserted for every frame on TX side.

 Limitations
 ---
-- 
1.9.3

[dpdk-dev] [PATCH v2 1/3] fm10k: enable FTAG based forwarding

2016-02-02 Thread Wang Xiao W

This patch enables reading sglort info into mbuf for RX and inserting
an FTAG at the beginning of the packet for TX. The vlan_tci_outer field
selected from rte_mbuf structure for sglort is not used in fm10k now.
In FTAG based forwarding mode, the switch will forward packets according
to glort info in FTAG rather than mac and vlan table.

To activate this feature, user needs to turn 
``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
to y in common_linuxapp or common_bsdapp. Currently this feature is supported
only on PF, because FM10K_PFVTCTL register is read-only for VF.

Signed-off-by: Wang Xiao W 
---
 config/common_bsdapp   |  1 +
 config/common_linuxapp |  1 +
 drivers/net/fm10k/fm10k_ethdev.c   | 12 
 drivers/net/fm10k/fm10k_rxtx.c | 17 +
 drivers/net/fm10k/fm10k_rxtx_vec.c |  9 +
 5 files changed, 40 insertions(+)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index ed7c31c..451f81a 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -208,6 +208,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
+CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n

 #
 # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..c928bce 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -207,6 +207,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
 CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y
+CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n

 #
 # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index e4aed94..3a15c24 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -668,6 +668,18 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
PMD_INIT_LOG(ERR, "failed to disable queue %d", i);
return -1;
}
+#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
+   /* Enable use of FTAG bit in TX descriptor, PFVTCTL
+* register is read-only for VF.
+*/
+   if (hw->mac.type == fm10k_mac_pf)
+   FM10K_WRITE_REG(hw, FM10K_PFVTCTL(i),
+   FM10K_PFVTCTL_FTAG_DESC_ENABLE);
+   else {
+   PMD_INIT_LOG(ERR, "FTAG is not supported in VF.\n");
+   return -1;
+   }
+#endif

/* set location and size for descriptor ring */
FM10K_WRITE_REG(hw, FM10K_TDBAL(i),
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index e958865..f87987d 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -152,6 +152,13 @@ fm10k_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 */
mbuf->ol_flags |= PKT_RX_VLAN_PKT;
mbuf->vlan_tci = desc.w.vlan;
+#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
+   /**
+* mbuf->vlan_tci_outer is an idle field in fm10k driver,
+* so it can be selected to store sglort value.
+*/
+   mbuf->vlan_tci_outer = rte_le_to_cpu_16(desc.w.sglort);
+#endif

rx_pkts[count] = mbuf;
if (++next_dd == q->nb_desc) {
@@ -307,6 +314,13 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
 */
mbuf->ol_flags |= PKT_RX_VLAN_PKT;
first_seg->vlan_tci = desc.w.vlan;
+#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
+   /**
+* mbuf->vlan_tci_outer is an idle field in fm10k driver,
+* so it can be selected to store sglort value.
+*/
+   first_seg->vlan_tci_outer = rte_le_to_cpu_16(desc.w.sglort);
+#endif

/* Prefetch data of first segment, if configured to do so. */
rte_packet_prefetch((char *)first_seg->buf_addr +
@@ -432,6 +446,9 @@ static inline void tx_xmit_pkt(struct fm10k_tx_queue *q, 
struct rte_mbuf *mb)
q->nb_free -= mb->nb_segs;

q->hw_ring[q->next_free].flags = 0;
+#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
+   q->hw_ring[q->next_free].flags |= FM10K_TXD_FLAG_FTAG;
+#endif
/* set checksum flags on first descriptor of packet. SCTP checksum
 * offload is not supported, but we do not explicitly check for this
 * case in favor of greatly simplified processing. */
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 2a57eef..0b0f2e3 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -198,7 +198,12 @@ fm10k_rx_vec_condition_check(struct rte_eth_dev *dev)
rxmode->header_split == 1)
return -1;

[dpdk-dev] [PATCH v2 0/3] fm10k: enable FTAG based forwarding

2016-02-02 Thread Wang Xiao W

v2:
* Gave an error message for VF FTAG use case.

* Added a notice in the doc to emphasize that application should ensure
  an appropriate FTAG for every frame in FTAG based forwarding mode.

Wang Xiao W (3):
  fm10k: enable FTAG based forwarding
  doc: add introduction for fm10k FTAG based forwarding
  doc: update release note for fm10k FTAG support

 config/common_bsdapp |  1 +
 config/common_linuxapp   |  1 +
 doc/guides/nics/fm10k.rst| 16 +++-
 doc/guides/rel_notes/release_2_3.rst |  1 +
 drivers/net/fm10k/fm10k_ethdev.c | 12 
 drivers/net/fm10k/fm10k_rxtx.c   | 17 +
 drivers/net/fm10k/fm10k_rxtx_vec.c   |  9 +
 7 files changed, 56 insertions(+), 1 deletion(-)

-- 
1.9.3

[dpdk-dev] [PATCH] cryptodev: add capabilities discovery mechanism

2016-02-02 Thread De Lara Guarch, Pablo

Hi Declan,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Declan Doherty
> Sent: Saturday, January 30, 2016 1:11 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] cryptodev: add capabilities discovery
> mechanism

[...]

> --- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
> +++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
> @@ -572,6 +572,7 @@ aesni_mb_pmd_dequeue_burst(void *queue_pair,
>  }
> 
> 
> +

Remove this blank line.

>  static int cryptodev_aesni_mb_uninit(const char *name);
> 
>  static int
> @@ -622,6 +623,24 @@ cryptodev_aesni_mb_create(const char *name,
> unsigned socket_id)

[...]

> diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> index 96d22f6..368a803 100644
> --- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> +++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
> @@ -38,6 +38,87 @@
> 
>  #include "rte_aesni_mb_pmd_private.h"
> 
> +
> +static const struct rte_cryptodev_capabilities aesni_mb_pmd_capabilities[]
> = {
> + {   /* MD5 HMAC */
> + .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
> + .sym = {
> + .type = RTE_CRYPTO_XFORM_AUTH,
> + .auth = {
> + RTE_CRYPTO_AUTH_MD5_HMAC, 64, 16, 12,
> { 0 }

I would use the variable names to set the values,
it will be longer but I find it more readable.

> + }
> + }

[...]

> diff --git a/lib/librte_cryptodev/rte_cryptodev.h
> b/lib/librte_cryptodev/rte_cryptodev.h
> index 892375d..61a162b 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.h
> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> @@ -91,12 +91,131 @@ enum rte_cryptodev_type {
>  #define CDEV_PMD_TRACE(fmt, args...)
>  #endif
> 
> +
> +/**
> + * Symmetric Crypto Capability
> + */
> +struct rte_cryptodev_symmetric_capability {
> + enum rte_crypto_xform_type type;
> + /**< Transform type : Authentication / Cipher */
> +
> + union {
> + struct {
> + enum rte_crypto_auth_algorithm algo;
> + /**< authentication algorithm */
> + uint16_t block;

Suggest to rename most variables to x_size (i.e. block_size).

> + /**< algorithm block size */
> + uint16_t key;
> + /**< Key size supported */


In cipher structure, key size is a structure supporting a range of sizes,
authentication key does not need it?

> + uint16_t digest;
> + /**< Maximum digest size supported */
> + struct {
> + uint16_t min;   /**< minimum aad size */
> + uint16_t max;   /**< maximum aad size */
> + uint16_t increment;
> + /**< if a range of sizes are supported,
> +  * this parameter is used to indicate
> +  * increments in byte size that are supported
> +  * between the minimum and maximum */
> + } aad;

[...]

> +/**
> + * Get the name of a crypto device feature flag
> + *
> + * @parammaskThe mask describing the flag.

Wrong parameter name.

> + *
> + * @return
> + *   The name of this flag, or NULL if it's not a valid feature flag.
> + */
> +
> +extern const char *
> +rte_cryptodev_get_feature_name(uint64_t flag);
> +

[...]

> diff --git a/lib/librte_cryptodev/rte_cryptodev_version.map
> b/lib/librte_cryptodev/rte_cryptodev_version.map
> index ff8e93d..9cd12cf 100644
> --- a/lib/librte_cryptodev/rte_cryptodev_version.map
> +++ b/lib/librte_cryptodev/rte_cryptodev_version.map
> @@ -10,6 +10,7 @@ DPDK_2.2 {
>   rte_cryptodev_configure;
>   rte_cryptodev_create_vdev;
>   rte_cryptodev_get_dev_id;
> + rte_cryptodev_get_feature_name;
>   rte_cryptodev_info_get;
>   rte_cryptodev_pmd_allocate;
>   rte_cryptodev_pmd_callback_process;
> @@ -27,6 +28,5 @@ DPDK_2.2 {
>   rte_cryptodev_queue_pair_setup;
>   rte_cryptodev_queue_pair_start;
>   rte_cryptodev_queue_pair_stop;
> -

No need to remove the blank line here.

>   local: *;
>  };
> \ No newline at end of file
> --
> 2.5.0

[dpdk-dev] [PATCH] mempool: Reduce rte_mempool structure size

2016-02-02 Thread Keith Wiles

The rte_mempool structure is changed, which will cause an ABI change
for this structure.

Allow mempool cache support to be dynamic depending on if the
mempool being created needs cache support. Saves about 1.5M of
memory used by the rte_mempool structure. Performance does not seem
to be effected running l3fwd and the test_mempool execution passed.

Allocating small mempools which do not require cache can consume
larges amounts of memory if you have a number of these mempools.

Signed-off-by: Keith Wiles 
---
 app/test/test_mempool.c  |  4 +--
 lib/librte_mempool/rte_mempool.c | 56 ++--
 lib/librte_mempool/rte_mempool.h | 29 ++---
 3 files changed, 40 insertions(+), 49 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 72f8fb6..7b479f8 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -122,8 +122,8 @@ test_mempool_basic(void)
return -1;

printf("get private data\n");
-   if (rte_mempool_get_priv(mp) !=
-   (char*) mp + MEMPOOL_HEADER_SIZE(mp, mp->pg_num))
+   if (rte_mempool_get_priv(mp) != (char *)mp +
+   MEMPOOL_HEADER_SIZE(mp, mp->pg_num, mp->cache_size))
return -1;

printf("get physical address of an object\n");
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index aff5f6d..bdf8e2e 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -450,15 +450,11 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
int page_size = getpagesize();

/* compilation-time checks */
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
  RTE_CACHE_LINE_MASK) != 0);
-#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_cache) &
  RTE_CACHE_LINE_MASK) != 0);
-   RTE_BUILD_BUG_ON((offsetof(struct rte_mempool, local_cache) &
- RTE_CACHE_LINE_MASK) != 0);
-#endif
-#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
RTE_BUILD_BUG_ON((sizeof(struct rte_mempool_debug_stats) &
  RTE_CACHE_LINE_MASK) != 0);
RTE_BUILD_BUG_ON((offsetof(struct rte_mempool, stats) &
@@ -527,9 +523,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
 */
int head = sizeof(struct rte_mempool);
int new_size = (private_data_size + head) % page_size;
-   if (new_size) {
+   if (new_size)
private_data_size += page_size - new_size;
-   }
}

/* try to allocate tailq entry */
@@ -544,7 +539,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
 * store mempool objects. Otherwise reserve a memzone that is large
 * enough to hold mempool header and metadata plus mempool objects.
 */
-   mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num) + private_data_size;
+   mempool_size = MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size);
+   mempool_size += private_data_size;
mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);
if (vaddr == NULL)
mempool_size += (size_t)objsz.total_size * n;
@@ -598,8 +594,15 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
mp->private_data_size = private_data_size;

+   /*
+* local_cache pointer is set even if cache_size is zero.
+* The local_cache points to just past the elt_pa[] array.
+*/
+   mp->local_cache = (struct rte_mempool_cache *)
+   ((char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, 0));
+
/* calculate address of the first element for continuous mempool. */
-   obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num) +
+   obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size) +
private_data_size;
obj = RTE_PTR_ALIGN_CEIL(obj, RTE_MEMPOOL_ALIGN);

@@ -613,9 +616,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, 
unsigned elt_size,
mp->elt_va_start = (uintptr_t)obj;
mp->elt_pa[0] = mp->phys_addr +
(mp->elt_va_start - (uintptr_t)mp);
-
-   /* mempool elements in a separate chunk of memory. */
} else {
+   /* mempool elements in a separate chunk of memory. */
mp->elt_va_start = (uintptr_t)vaddr;
memcpy(mp->elt_pa, paddr, sizeof (mp->elt_pa[0]) * pg_num);
}
@@ -645,19 +647,15 @@ unsigned
 rte_mempool_count(const struct rte_mempool *mp)
 {
unsigned count;
+   unsigned lcore_id;

count = rte_ring_count(mp->ring);

-#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
-   {
-   unsigned

[dpdk-dev] [PATCH v6 1/8] eal: pci: add api to rd/wr pci bar region

2016-02-02 Thread Yuanhan Liu

On Tue, Feb 02, 2016 at 06:50:18AM +0100, David Marchand wrote:
> On Tue, Feb 2, 2016 at 6:43 AM, Yuanhan Liu  
> wrote:
> > On Tue, Feb 02, 2016 at 09:44:14AM +0530, Santosh Shukla wrote:
> >> Current use-case is virtio: It is used as io_bar which is first
> >> bar[1]. But implementation is generic, can be used to do rd/wr for
> >> other bar index too. Also vfio facilitate user to do rd/wr to pci_bars
> >> w/o mapping that bar, So apis will be useful for such cases in future.
> >>
> >> AFAIU: uio has read/write_config api only and Yes if bar region mapped
> >> then no need to do rd/wr, user can directly access the pci_memory. But
> >> use-case of this api entirely different: unmapped memory by
> >> application context i.e.. vfio_rd/wr-way {pread/pwrite-way}.
> >>
> >> Is above explanation convincing? Pl. let me know.
> >
> > TBH, not really. So, as you stated, it should be generic APIs to
> > read/write bar space, but limiting it to VFIO only and claiming
> > that read/write bar space is not support by other drivers (such
> > as UIO) while in fact it can (in some ways) doesn't seem right
> > to me.
> >
> > Anyway, it's just some thoughts from me. David, comments?
> 
> >From the very start, same opinion.
> We should have a unique api to access those, and eal should hide
> details like kernel drivers (uio, vfio, whatever) to the pmd.
> 
> Now the thing is, how to do this in an elegant and efficient way.

I was thinking that we may just make it be IO port specific read/
write functions:

rte_eal_pci_ioport_read(dev, bar, buf, size)
{

return if not an IO bar;

if (has io)
return inb/w/l();

if (vfio)
return vfio_ioport_read();

else, claim aloud that io port read is not allowed
}

Let us not handle memory bar resource here: in such case, you should
go with rte_eal_pci_map_device() and do it with memory mapped io.

Does that make any sense?

--yliu

[dpdk-dev] [PATCH 2/2] ethdev: Export rte_eth_dev_create_unique_device_name() to public API

2016-02-02 Thread krytarow...@caviumnetworks.com

From: Kamil Rytarowski 

Once pci_drv.devinit is overloaded, it's a function used in the original
rte_eth_dev_init(), still reusable in altered versions.

Signed-off-by: Kamil Rytarowski 
---
 lib/librte_ether/rte_ethdev.c |  2 +-
 lib/librte_ether/rte_ethdev.h | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ac4aeab..7f5e741 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -214,7 +214,7 @@ rte_eth_dev_allocate(const char *name, enum 
rte_eth_dev_type type)
return eth_dev;
 }

-static int
+int
 rte_eth_dev_create_unique_device_name(char *name, size_t size,
struct rte_pci_device *pci_dev)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 8710dd7..b19db9d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -3880,6 +3880,24 @@ rte_eth_dma_zone_reserve(const struct rte_eth_dev 
*eth_dev, const char *name,
 uint16_t queue_id, size_t size,
 unsigned align, int socket_id);

+/**
+ * Create unique device name
+ *
+ * @param name
+ *   The port identifier of the Ethernet device.
+ * @param size
+ *   Maximum string length of the generated name
+ * @param pci_dev
+ *   PCI device pointer
+ *
+ * @return
+ *   - 0: Success.
+ *   - <0: Error during generatin
+ *   - -EINVAL: Invalid input parameters.
+ */
+int rte_eth_dev_create_unique_device_name(char *name, size_t size,
+ struct rte_pci_device *pci_dev);
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.9.1

[dpdk-dev] [PATCH 1/2] ethdev: Allow to overload pci_drv.devinit and pci_drv.devuninit

2016-02-02 Thread krytarow...@caviumnetworks.com

From: Kamil Rytarowski 

This change enables drivers needing custom pci (de)initialization functions
through the standard callback overloading.

For example:

 /*
  * virtual function driver struct
  */
 static struct eth_driver rte_DRIVER_pmd = {
{
.name = "rte_DRIVER_pmd",
.id_table = pci_id_DRIVER_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
.devinit = DRIVER_pci_devinit,
},
.eth_dev_init = eth_DRIVER_dev_init,
.dev_private_size = sizeof(struct DRIVER),
 };

Use-case is as follows: NIC offers several pci virtual functions, while
one of them is to be treated as port, we need to configure the rest in a
specific way for particular device for full interface (port) functionality.

Signed-off-by: Kamil Rytarowski 
---
 lib/librte_ether/rte_ethdev.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 756b234..ac4aeab 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -351,8 +351,10 @@ rte_eth_dev_uninit(struct rte_pci_device *pci_dev)
 void
 rte_eth_driver_register(struct eth_driver *eth_drv)
 {
-   eth_drv->pci_drv.devinit = rte_eth_dev_init;
-   eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
+   if (!eth_drv->pci_drv.devinit)
+   eth_drv->pci_drv.devinit = rte_eth_dev_init;
+   if (!eth_drv->pci_drv.devuninit)
+   eth_drv->pci_drv.devuninit = rte_eth_dev_uninit;
rte_eal_pci_register(_drv->pci_drv);
 }

-- 
1.9.1

[dpdk-dev] [PATCH v2] cfgfile: support looking up sections by index

2016-02-02 Thread Dumitrescu, Cristian



> -Original Message-
> From: Rich Lane [mailto:rich.lane at bigswitch.com]
> Sent: Tuesday, January 19, 2016 4:42 AM
> To: dev at dpdk.org
> Cc: Dumitrescu, Cristian ; Panu Matilainen
> 
> Subject: [PATCH v2] cfgfile: support looking up sections by index
> 
> This is useful when sections have duplicate names.

Hi Rich,

You are right, I can see this is needed to allow multiple sections with 
identical name in the same configuration file. When sections with the same name 
are not allowed, then this is not needed, as the current API is sufficient.

To me, having several sections with the same name does not look like a good 
idea, in fact it might look like an application design flaw, as differentiating 
between sections with the same name can only done based on the position of the 
section in the configuration file, which is an error prone option. Some 
examples: 
1. While maintaining a large config file, keeping a specific section at a fixed 
position quickly becomes a pain, and shifting the position of the section up or 
down can completely change the application behavior;
2. Using basic pre-processors (like CPP or M4) or scripts, the master 
configuration file can include other configuration files, with the inclusion of 
each decided at run-time based on application command line parameters, so the 
position of certain sections is actually not known until run-time.

Can you provide some examples when having multiple sections with the same name 
is a key requirement?

A straight forward workaround to having multiple sections with the same name is 
to add a number to the name of each such section. Using the current API, all 
the sections with the same prefix name can be read gracefully. As an example, 
the ip_pipeline application allows multiple sections with the same name prefix 
but a different number prefix:
PIPELINE0, PIPELINE1, ...
LINK0, LINK1, ...
MEMPOOL0, MEMPOOL1, ...
RXQ0.0, RXQ0.1, RXQ1.0, ...
TXQ0.0, TXQ0.1, TXQ1.0, ...

Is there a reason why this approach is not acceptable for your application?

Regards,
Cristian

> 
> Signed-off-by: Rich Lane 
> ---
> v1->v2:
> - Added new symbol to version script.
> 
>  lib/librte_cfgfile/rte_cfgfile.c   | 16 
>  lib/librte_cfgfile/rte_cfgfile.h   | 23 +++
>  lib/librte_cfgfile/rte_cfgfile_version.map |  6 ++
>  3 files changed, 45 insertions(+)
> 
> diff --git a/lib/librte_cfgfile/rte_cfgfile.c 
> b/lib/librte_cfgfile/rte_cfgfile.c
> index a677dad..0bb40a4 100644
> --- a/lib/librte_cfgfile/rte_cfgfile.c
> +++ b/lib/librte_cfgfile/rte_cfgfile.c
> @@ -333,6 +333,22 @@ rte_cfgfile_section_entries(struct rte_cfgfile *cfg,
> const char *sectionname,
>   return i;
>  }
> 
> +int
> +rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg, int index,
> + struct rte_cfgfile_entry *entries, int max_entries)
> +{
> + int i;
> + const struct rte_cfgfile_section *sect;
> +
> + if (index >= cfg->num_sections)
> + return -1;
> +
> + sect = cfg->sections[index];
> + for (i = 0; i < max_entries && i < sect->num_entries; i++)
> + entries[i] = *sect->entries[i];
> + return i;
> +}
> +
>  const char *
>  rte_cfgfile_get_entry(struct rte_cfgfile *cfg, const char *sectionname,
>   const char *entryname)
> diff --git a/lib/librte_cfgfile/rte_cfgfile.h 
> b/lib/librte_cfgfile/rte_cfgfile.h
> index d443782..8e69971 100644
> --- a/lib/librte_cfgfile/rte_cfgfile.h
> +++ b/lib/librte_cfgfile/rte_cfgfile.h
> @@ -155,6 +155,29 @@ int rte_cfgfile_section_entries(struct rte_cfgfile
> *cfg,
>   struct rte_cfgfile_entry *entries,
>   int max_entries);
> 
> +/** Get section entries as key-value pairs
> +*
> +* The index of a section is the same as the index of its name in the
> +* result of rte_cfgfile_sections. This API can be used when there are
> +* multiple sections with the same name.
> +*
> +* @param cfg
> +*   Config file
> +* @param index
> +*   Section index
> +* @param entries
> +*   Pre-allocated array of at least max_entries entries where the section
> +*   entries are stored as key-value pair after successful invocation
> +* @param max_entries
> +*   Maximum number of section entries to be stored in entries array
> +* @return
> +*   Number of entries populated on success, negative error code otherwise
> +*/
> +int rte_cfgfile_section_entries_by_index(struct rte_cfgfile *cfg,
> + int index,
> + struct rte_cfgfile_entry *entries,
> + int max_entries);
> +
>  /** Get value of the named entry in named config file section
>  *
>  * @param cfg
> diff --git a/lib/librte_cfgfile/rte_cfgfile_version.map
> b/lib/librte_cfgfile/rte_cfgfile_version.map
> index bf6c6fd..f6a27a9 100644
> --- a/lib/librte_cfgfile/rte_cfgfile_version.map
> +++ b/lib/librte_cfgfile/rte_cfgfile_version.map
> @@ -13,3 +13,9 @@ DPDK_2.0 {
> 
>   local: *;
>  };
> +
> +DPDK_2.3 {
> + global:
> +
> + rte_cfgfile_section_entries_by_index;
>

[dpdk-dev] [PATCH v2 7/7] app/testpmd: add CLIs for E-tag operation

2016-02-02 Thread Wenzhuo Lu

Add the CLIs to support the E-tag operation.
1, Offloading of E-tag insertion and stripping.
2, Forwarding the E-tag packets to pools based on the GRP and E-CID_base.

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c | 340 +
 1 file changed, 340 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d409934..df07e60 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -9885,6 +9885,340 @@ cmdline_parse_inst_t 
cmd_config_l2_tunnel_en_dis_specific = {
},
 };

+/* E-tag configuration */
+
+/* Common result structure for all E-tag configuration */
+struct cmd_config_e_tag_result {
+   cmdline_fixed_string_t e_tag;
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t insertion;
+   cmdline_fixed_string_t stripping;
+   cmdline_fixed_string_t forwarding;
+   cmdline_fixed_string_t filter;
+   cmdline_fixed_string_t add;
+   cmdline_fixed_string_t del;
+   cmdline_fixed_string_t on;
+   cmdline_fixed_string_t off;
+   cmdline_fixed_string_t on_off;
+   cmdline_fixed_string_t port_tag_id;
+   uint32_t port_tag_id_val;
+   cmdline_fixed_string_t e_tag_id;
+   uint16_t e_tag_id_val;
+   cmdline_fixed_string_t dst_pool;
+   uint8_t dst_pool_val;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+};
+
+/* Common CLI fields for all E-tag configuration */
+cmdline_parse_token_string_t cmd_config_e_tag_e_tag =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+e_tag, "E-tag");
+cmdline_parse_token_string_t cmd_config_e_tag_set =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+set, "set");
+cmdline_parse_token_string_t cmd_config_e_tag_insertion =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+insertion, "insertion");
+cmdline_parse_token_string_t cmd_config_e_tag_stripping =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+stripping, "stripping");
+cmdline_parse_token_string_t cmd_config_e_tag_forwarding =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+forwarding, "forwarding");
+cmdline_parse_token_string_t cmd_config_e_tag_filter =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+filter, "filter");
+cmdline_parse_token_string_t cmd_config_e_tag_add =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+add, "add");
+cmdline_parse_token_string_t cmd_config_e_tag_del =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+del, "del");
+cmdline_parse_token_string_t cmd_config_e_tag_on =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+on, "on");
+cmdline_parse_token_string_t cmd_config_e_tag_off =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+off, "off");
+cmdline_parse_token_string_t cmd_config_e_tag_on_off =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+on_off, "on#off");
+cmdline_parse_token_string_t cmd_config_e_tag_port_tag_id =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+port_tag_id, "port-tag-id");
+cmdline_parse_token_num_t cmd_config_e_tag_port_tag_id_val =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_e_tag_result,
+port_tag_id_val, UINT32);
+cmdline_parse_token_string_t cmd_config_e_tag_e_tag_id =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+e_tag_id, "e-tag-id");
+cmdline_parse_token_num_t cmd_config_e_tag_e_tag_id_val =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_e_tag_result,
+e_tag_id_val, UINT16);
+cmdline_parse_token_string_t cmd_config_e_tag_dst_pool =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+dst_pool, "dst-pool");
+cmdline_parse_token_num_t cmd_config_e_tag_dst_pool_val =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_e_tag_result,
+dst_pool_val, UINT8);
+cmdline_parse_token_string_t cmd_config_e_tag_port =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_e_tag_result,
+port, "port");
+cmdline_parse_token_num_t cmd_config_e_tag_port_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_e_tag_result,
+port_id, UINT8);
+
+/* E-tag insertion configuration */
+static void
+cmd_config_e_tag_insertion_en_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+

[dpdk-dev] [PATCH v2 6/7] ixgbe: support l2 tunnel operation

2016-02-02 Thread Wenzhuo Lu

Add support of l2 tunnel operation.
Support enabling/disabling l2 tunnel tag insertion/stripping.
Support enabling/disabling l2 tunnel packets forwarding.
Support adding/deleting forwarding rules for l2 tunnel packets.
Only support E-tag now.

Also update the release note.

Signed-off-by: Wenzhuo Lu 
---
 doc/guides/rel_notes/release_2_3.rst |   6 +
 drivers/net/ixgbe/ixgbe_ethdev.c | 357 +++
 2 files changed, 363 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..16bd80b 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,12 @@ DPDK Release 2.3
 New Features
 

+* **Added support for E-tag on X550.**
+
+  * Support E-tag offloading of insertion and stripping.
+  * Support Forwarding E-tag packets to pools based on
+GRP and E-CID_base.
+

 Resolved Issues
 ---
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 55ab474..35e76a2 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -139,10 +139,17 @@
 #define IXGBE_CYCLECOUNTER_MASK   0xULL

 #define IXGBE_VT_CTL_POOLING_MODE_MASK 0x0003
+#define IXGBE_VT_CTL_POOLING_MODE_ETAG 0x0001
 #define DEFAULT_ETAG_ETYPE 0x893f
 #define IXGBE_ETAG_ETYPE   0x5084
 #define IXGBE_ETAG_ETYPE_MASK  0x
 #define IXGBE_ETAG_ETYPE_VALID 0x8000
+#define IXGBE_RAH_ADTYPE   0x4000
+#define IXGBE_RAL_ETAG_FILTER_MASK 0x3fff
+#define IXGBE_VMVIR_TAGA_MASK  0x1800
+#define IXGBE_VMVIR_TAGA_ETAG_INSERT   0x0800
+#define IXGBE_VMTIR(_i) (0x00017000 + ((_i) * 4)) /* 64 of these (0-63) */
+#define IXGBE_QDE_STRIP_TAG0x0004

 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
@@ -351,6 +358,30 @@ static int ixgbe_dev_l2_tunnel_enable
 static int ixgbe_dev_l2_tunnel_disable
(struct rte_eth_dev *dev,
 enum rte_eth_l2_tunnel_type l2_tunnel_type);
+static int ixgbe_dev_l2_tunnel_insertion_enable
+   (struct rte_eth_dev *dev, struct rte_eth_l2_tunnel *l2_tunnel);
+static int ixgbe_dev_l2_tunnel_insertion_disable
+   (struct rte_eth_dev *dev,
+enum rte_eth_l2_tunnel_type l2_tunnel_type);
+static int ixgbe_dev_l2_tunnel_stripping_enable
+   (struct rte_eth_dev *dev,
+enum rte_eth_l2_tunnel_type l2_tunnel_type);
+static int ixgbe_dev_l2_tunnel_stripping_disable
+   (struct rte_eth_dev *dev,
+enum rte_eth_l2_tunnel_type l2_tunnel_type);
+static int ixgbe_dev_l2_tunnel_forwarding_enable
+   (struct rte_eth_dev *dev,
+enum rte_eth_l2_tunnel_type l2_tunnel_type);
+static int ixgbe_dev_l2_tunnel_forwarding_disable
+   (struct rte_eth_dev *dev,
+enum rte_eth_l2_tunnel_type l2_tunnel_type);
+static int ixgbe_dev_l2_tunnel_filter_add
+   (struct rte_eth_dev *dev,
+struct rte_eth_l2_tunnel *l2_tunnel,
+uint32_t pool);
+static int ixgbe_dev_l2_tunnel_filter_del
+   (struct rte_eth_dev *dev,
+struct rte_eth_l2_tunnel *l2_tunnel);

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -512,6 +543,14 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.l2_tunnel_eth_type_conf = ixgbe_dev_l2_tunnel_eth_type_conf,
.l2_tunnel_enable= ixgbe_dev_l2_tunnel_enable,
.l2_tunnel_disable   = ixgbe_dev_l2_tunnel_disable,
+   .l2_tunnel_insertion_enable   = ixgbe_dev_l2_tunnel_insertion_enable,
+   .l2_tunnel_insertion_disable  = ixgbe_dev_l2_tunnel_insertion_disable,
+   .l2_tunnel_stripping_enable   = ixgbe_dev_l2_tunnel_stripping_enable,
+   .l2_tunnel_stripping_disable  = ixgbe_dev_l2_tunnel_stripping_disable,
+   .l2_tunnel_forwarding_enable  = ixgbe_dev_l2_tunnel_forwarding_enable,
+   .l2_tunnel_forwarding_disable = ixgbe_dev_l2_tunnel_forwarding_disable,
+   .l2_tunnel_filter_add = ixgbe_dev_l2_tunnel_filter_add,
+   .l2_tunnel_filter_del = ixgbe_dev_l2_tunnel_filter_del,
 };

 /*
@@ -6341,6 +6380,324 @@ ixgbe_dev_l2_tunnel_disable(struct rte_eth_dev *dev,
return ret;
 }

+static int
+ixgbe_e_tag_filter_add(struct rte_eth_dev *dev,
+  struct rte_eth_l2_tunnel *l2_tunnel,
+  uint32_t pool)
+{
+   int ret = 0;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   u32 i, rar_entries;
+   u32 rar_low, rar_high;
+
+   if (hw->mac.type != ixgbe_mac_X550 &&
+   hw->mac.type != ixgbe_mac_X550EM_x) {
+   return -ENOTSUP;
+   }
+
+   rar_entries = ixgbe_get_num_rx_addrs(hw);
+
+   for (i = 1; i < rar_entries; i++) {
+   rar_high =

[dpdk-dev] [PATCH v2 4/7] app/testpmd: add CLIs for l2 tunnel config

2016-02-02 Thread Wenzhuo Lu

Add CLIs to config ether type of l2 tunnel, and to enable/disable
a type of l2 tunnel.
Now only e-tag tunnel is supported.

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c | 259 +
 1 file changed, 259 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..d409934 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -9630,6 +9630,261 @@ cmdline_parse_inst_t cmd_mcast_addr = {
},
 };

+/* l2 tunnel config
+ * only support E-tag now.
+ */
+
+/* Ether type config */
+struct cmd_config_l2_tunnel_eth_type_result {
+   cmdline_fixed_string_t port;
+   cmdline_fixed_string_t config;
+   cmdline_fixed_string_t all;
+   uint8_t id;
+   cmdline_fixed_string_t l2_tunnel;
+   cmdline_fixed_string_t l2_tunnel_type;
+   cmdline_fixed_string_t eth_type;
+   uint16_t eth_type_val;
+};
+
+cmdline_parse_token_string_t cmd_config_l2_tunnel_eth_type_port =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_l2_tunnel_eth_type_result,
+port, "port");
+cmdline_parse_token_string_t cmd_config_l2_tunnel_eth_type_config =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_l2_tunnel_eth_type_result,
+config, "config");
+cmdline_parse_token_string_t cmd_config_l2_tunnel_eth_type_all_str =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_l2_tunnel_eth_type_result,
+all, "all");
+cmdline_parse_token_num_t cmd_config_l2_tunnel_eth_type_id =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_l2_tunnel_eth_type_result,
+id, UINT8);
+cmdline_parse_token_string_t cmd_config_l2_tunnel_eth_type_l2_tunnel =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_l2_tunnel_eth_type_result,
+l2_tunnel, "l2-tunnel");
+cmdline_parse_token_string_t cmd_config_l2_tunnel_eth_type_l2_tunnel_type =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_l2_tunnel_eth_type_result,
+l2_tunnel_type, "E-tag");
+cmdline_parse_token_string_t cmd_config_l2_tunnel_eth_type_eth_type =
+   TOKEN_STRING_INITIALIZER
+   (struct cmd_config_l2_tunnel_eth_type_result,
+eth_type, "ether-type");
+cmdline_parse_token_num_t cmd_config_l2_tunnel_eth_type_eth_type_val =
+   TOKEN_NUM_INITIALIZER
+   (struct cmd_config_l2_tunnel_eth_type_result,
+eth_type_val, UINT16);
+
+static uint32_t
+str2fdir_l2_tunnel_type(char *string)
+{
+   uint32_t i = 0;
+
+   static const struct {
+   char str[32];
+   uint32_t type;
+   } l2_tunnel_type_str[] = {
+   {"E-tag", RTE_L2_TUNNEL_TYPE_E_TAG},
+   };
+
+   for (i = 0; i < RTE_DIM(l2_tunnel_type_str); i++) {
+   if (!strcmp(l2_tunnel_type_str[i].str, string))
+   return l2_tunnel_type_str[i].type;
+   }
+   return RTE_L2_TUNNEL_TYPE_NONE;
+}
+
+/* ether type config for all ports */
+static void
+cmd_config_l2_tunnel_eth_type_all_parsed
+   (void *parsed_result,
+__attribute__((unused)) struct cmdline *cl,
+__attribute__((unused)) void *data)
+{
+   struct cmd_config_l2_tunnel_eth_type_result *res = parsed_result;
+   struct rte_eth_l2_tunnel entry;
+   portid_t pid;
+
+   entry.l2_tunnel_type = str2fdir_l2_tunnel_type(res->l2_tunnel_type);
+   entry.ether_type = res->eth_type_val;
+
+   FOREACH_PORT(pid, ports) {
+   rte_eth_dev_l2_tunnel_eth_type_conf(pid, );
+   }
+}
+
+cmdline_parse_inst_t cmd_config_l2_tunnel_eth_type_all = {
+   .f = cmd_config_l2_tunnel_eth_type_all_parsed,
+   .data = NULL,
+   .help_str = "port config all l2-tunnel ether-type",
+   .tokens = {
+   (void *)_config_l2_tunnel_eth_type_port,
+   (void *)_config_l2_tunnel_eth_type_config,
+   (void *)_config_l2_tunnel_eth_type_all_str,
+   (void *)_config_l2_tunnel_eth_type_l2_tunnel,
+   (void *)_config_l2_tunnel_eth_type_l2_tunnel_type,
+   (void *)_config_l2_tunnel_eth_type_eth_type,
+   (void *)_config_l2_tunnel_eth_type_eth_type_val,
+   NULL,
+   },
+};
+
+/* ether type config for a specific port */
+static void
+cmd_config_l2_tunnel_eth_type_specific_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_config_l2_tunnel_eth_type_result *res =
+parsed_result;
+   struct rte_eth_l2_tunnel entry;
+
+   if (port_id_is_invalid(res->id, ENABLED_WARN))
+   return;
+
+   entry.l2_tunnel_type = str2fdir_l2_tunnel_type(res->l2_tunnel_type);
+   entry.ether_type = res->eth_type_val;
+
+   rte_eth_dev_l2_tunnel_eth_type_conf(res->id, );
+}
+
+cmdline_parse_inst_t

[dpdk-dev] [PATCH v2 3/7] ixgbe: support l2 tunnel config

2016-02-02 Thread Wenzhuo Lu

Add support of l2 tunnel configuration.
Support modifying ether type of a type of l2 tunnel.
Support enabling and disabling the support of a type of l2 tunnel.
Only E-tag tunnel is supported now.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 140 +++
 1 file changed, 140 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 83df0c0..55ab474 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -139,6 +139,10 @@
 #define IXGBE_CYCLECOUNTER_MASK   0xULL

 #define IXGBE_VT_CTL_POOLING_MODE_MASK 0x0003
+#define DEFAULT_ETAG_ETYPE 0x893f
+#define IXGBE_ETAG_ETYPE   0x5084
+#define IXGBE_ETAG_ETYPE_MASK  0x
+#define IXGBE_ETAG_ETYPE_VALID 0x8000

 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
@@ -339,6 +343,14 @@ static int ixgbe_timesync_read_time(struct rte_eth_dev 
*dev,
   struct timespec *timestamp);
 static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
   const struct timespec *timestamp);
+static int ixgbe_dev_l2_tunnel_eth_type_conf
+   (struct rte_eth_dev *dev, struct rte_eth_l2_tunnel *l2_tunnel);
+static int ixgbe_dev_l2_tunnel_enable
+   (struct rte_eth_dev *dev,
+enum rte_eth_l2_tunnel_type l2_tunnel_type);
+static int ixgbe_dev_l2_tunnel_disable
+   (struct rte_eth_dev *dev,
+enum rte_eth_l2_tunnel_type l2_tunnel_type);

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -497,6 +509,9 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.timesync_adjust_time = ixgbe_timesync_adjust_time,
.timesync_read_time   = ixgbe_timesync_read_time,
.timesync_write_time  = ixgbe_timesync_write_time,
+   .l2_tunnel_eth_type_conf = ixgbe_dev_l2_tunnel_eth_type_conf,
+   .l2_tunnel_enable= ixgbe_dev_l2_tunnel_enable,
+   .l2_tunnel_disable   = ixgbe_dev_l2_tunnel_disable,
 };

 /*
@@ -6201,6 +6216,131 @@ ixgbe_dev_get_dcb_info(struct rte_eth_dev *dev,
return 0;
 }

+/* Update e-tag ether type */
+static int
+ixgbe_update_e_tag_eth_type(struct ixgbe_hw *hw,
+   uint16_t ether_type)
+{
+   uint32_t etag_etype;
+
+   if (hw->mac.type != ixgbe_mac_X550 &&
+   hw->mac.type != ixgbe_mac_X550EM_x) {
+   return -ENOTSUP;
+   }
+
+   etag_etype = IXGBE_READ_REG(hw, IXGBE_ETAG_ETYPE);
+   etag_etype &= ~IXGBE_ETAG_ETYPE_MASK;
+   etag_etype |= ether_type;
+   IXGBE_WRITE_REG(hw, IXGBE_ETAG_ETYPE, etag_etype);
+   IXGBE_WRITE_FLUSH(hw);
+
+   return 0;
+}
+
+/* Config l2 tunnel ether type */
+static int
+ixgbe_dev_l2_tunnel_eth_type_conf(struct rte_eth_dev *dev,
+ struct rte_eth_l2_tunnel *l2_tunnel)
+{
+   int ret = 0;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   if (l2_tunnel == NULL)
+   return -EINVAL;
+
+   switch (l2_tunnel->l2_tunnel_type) {
+   case RTE_L2_TUNNEL_TYPE_E_TAG:
+   ret = ixgbe_update_e_tag_eth_type(hw, l2_tunnel->ether_type);
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "Invalid tunnel type");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
+/* Enable e-tag tunnel */
+static int
+ixgbe_e_tag_enable(struct ixgbe_hw *hw)
+{
+   uint32_t etag_etype;
+
+   if (hw->mac.type != ixgbe_mac_X550 &&
+   hw->mac.type != ixgbe_mac_X550EM_x) {
+   return -ENOTSUP;
+   }
+
+   etag_etype = IXGBE_READ_REG(hw, IXGBE_ETAG_ETYPE);
+   etag_etype |= IXGBE_ETAG_ETYPE_VALID;
+   IXGBE_WRITE_REG(hw, IXGBE_ETAG_ETYPE, etag_etype);
+   IXGBE_WRITE_FLUSH(hw);
+
+   return 0;
+}
+
+/* Enable l2 tunnel */
+static int
+ixgbe_dev_l2_tunnel_enable(struct rte_eth_dev *dev,
+  enum rte_eth_l2_tunnel_type l2_tunnel_type)
+{
+   int ret = 0;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   switch (l2_tunnel_type) {
+   case RTE_L2_TUNNEL_TYPE_E_TAG:
+   ret = ixgbe_e_tag_enable(hw);
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "Invalid tunnel type");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
+/* Disable e-tag tunnel */
+static int
+ixgbe_e_tag_disable(struct ixgbe_hw *hw)
+{
+   uint32_t etag_etype;
+
+   if (hw->mac.type != ixgbe_mac_X550 &&
+   hw->mac.type != ixgbe_mac_X550EM_x) {
+   return -ENOTSUP;
+   }
+
+   etag_etype = IXGBE_READ_REG(hw, IXGBE_ETAG_ETYPE);
+   etag_etype &= ~IXGBE_ETAG_ETYPE_VALID;
+   IXGBE_WRITE_REG(hw,

[dpdk-dev] [PATCH v2 2/7] lib/librte_ether: support l2 tunnel config

2016-02-02 Thread Wenzhuo Lu

Add functions to support l2 tunnel configuration.
The support includes ether type modification and the tunnel support
enabling/disabling.
Ether type modification means modifying the ether type of a specific
type of tunnel. So the packet with this ether type will be parsed as
this type of tunnel.
Enabling/disabling a tunnel support means enabling/disabling the
ability of parsing the specific type of tunnel. This ability should
be enabled before we enable filtering, forwarding, offloading for
this specific type of tunnel.
Only support e-tag tunnel now.

E-tag introduction,
E-tag means external tag which is defined in I 802.1BR
specification.
E-tag is a kind of l2 tunnel. It means a tag will be inserted in the
l2 header. Like below,
   |3124|23   16|15 8|7   0|
  0|   Destination MAC address |
  4| Dest MAC address(cont.)| Src MAC address  |
  8|  Source MAC address(cont.)|
 12| E-tag Etherenet type (0x893f)  |  E-tag header|
 16|E-tag header(cont.)|
 20|   VLAN Ethertype(optional) |   VLAN header(optional)  |
 24| Original type  | ..   |
...|  ..   |
The E-tag format is like below,
   |015|16   18|19 |20   31|
   |   Ethertype - 0x893f  | E-PCP |DEI|   Ingress E-CID_base  |

   |32  33|34 35|36  47|48 55|56 63|
   |  RSV | GRP |E-CID_base|Ingress_E-CID_ext|E-CID_ext|

The Ingess_E-CID_ext and E-CID_ext are always zero for endpoints
and are effectively reserved.

The more details of E-tag is in IEEE 802.1BR. 802.1BR is used to
replace 802.1Qbh. 802.1BR is a standard for Bridge Port Extension.
It specifies the operation of Bridge Port Extenders, including
management, protocols, and algorithms. Bridge Port Extenders
operate in support of the MAC Service by Extended Bridges.
The E-tag is added to l2 header to identify the VM channel and
the virtual port.

Signed-off-by: Wenzhuo Lu 
---
 lib/librte_ether/rte_eth_ctrl.h |  9 +
 lib/librte_ether/rte_ethdev.c   | 61 ++
 lib/librte_ether/rte_ethdev.h   | 84 +
 3 files changed, 154 insertions(+)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index ce224ad..09af6fb 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -804,6 +804,15 @@ struct rte_eth_hash_filter_info {
} info;
 };

+/**
+ * l2 tunnel type.
+ */
+enum rte_eth_l2_tunnel_type {
+   RTE_L2_TUNNEL_TYPE_NONE = 0,
+   RTE_L2_TUNNEL_TYPE_E_TAG,
+   RTE_L2_TUNNEL_TYPE_MAX,
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..1b90e09 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3239,3 +3239,64 @@ rte_eth_copy_pci_info(struct rte_eth_dev *eth_dev, 
struct rte_pci_device *pci_de
eth_dev->data->numa_node = pci_dev->numa_node;
eth_dev->data->drv_name = pci_dev->driver->name;
 }
+
+int
+rte_eth_dev_l2_tunnel_eth_type_conf(uint8_t port_id,
+   struct rte_eth_l2_tunnel *l2_tunnel)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   if (l2_tunnel == NULL) {
+   RTE_PMD_DEBUG_TRACE("Invalid l2_tunnel parameter\n");
+   return -EINVAL;
+   }
+
+   if (l2_tunnel->l2_tunnel_type >= RTE_L2_TUNNEL_TYPE_MAX) {
+   RTE_PMD_DEBUG_TRACE("Invalid l2 tunnel type\n");
+   return -EINVAL;
+   }
+
+   dev = _eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->l2_tunnel_eth_type_conf,
+   -ENOTSUP);
+   return (*dev->dev_ops->l2_tunnel_eth_type_conf)(dev, l2_tunnel);
+}
+
+int
+rte_eth_dev_l2_tunnel_enable(uint8_t port_id,
+enum rte_eth_l2_tunnel_type l2_tunnel_type)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   if (l2_tunnel_type >= RTE_L2_TUNNEL_TYPE_MAX) {
+   RTE_PMD_DEBUG_TRACE("Invalid l2 tunnel type\n");
+   return -EINVAL;
+   }
+
+   dev = _eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->l2_tunnel_enable,
+   -ENOTSUP);
+   return (*dev->dev_ops->l2_tunnel_enable)(dev, l2_tunnel_type);
+}
+
+int
+rte_eth_dev_l2_tunnel_disable(uint8_t port_id,
+ enum rte_eth_l2_tunnel_type l2_tunnel_type)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   if (l2_tunnel_type >= RTE_L2_TUNNEL_TYPE_MAX) {
+   RTE_PMD_DEBUG_TRACE("Invalid l2 tunnel type\n");
+

[dpdk-dev] [PATCH v2 1/7] ixgbe: select pool by MAC when using double VLAN

2016-02-02 Thread Wenzhuo Lu

On X550, as required by datasheet, E-tag packets are not expected
when double VLAN are used. So modify the register PFVTCTL after
enabling double VLAN to select pool by MAC but not MAC or E-tag.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4c4c6df..83df0c0 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -138,6 +138,8 @@

 #define IXGBE_CYCLECOUNTER_MASK   0xULL

+#define IXGBE_VT_CTL_POOLING_MODE_MASK 0x0003
+
 static int eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_ixgbe_dev_uninit(struct rte_eth_dev *eth_dev);
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
@@ -1725,6 +1727,14 @@ ixgbe_vlan_hw_extend_enable(struct rte_eth_dev *dev)
ctrl |= IXGBE_EXTENDED_VLAN;
IXGBE_WRITE_REG(hw, IXGBE_CTRL_EXT, ctrl);

+   /* Clear pooling mode of PFVTCTL. It's required by X550. */
+   if (hw->mac.type == ixgbe_mac_X550 ||
+   hw->mac.type == ixgbe_mac_X550EM_x) {
+   ctrl = IXGBE_READ_REG(hw, IXGBE_VT_CTL);
+   ctrl &= ~IXGBE_VT_CTL_POOLING_MODE_MASK;
+   IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, ctrl);
+   }
+
/*
 * VET EXT field in the EXVET register = 0x8100 by default
 * So no need to change. Same to VT field of DMATXCTL register
-- 
1.9.3

[dpdk-dev] [PATCH v2 0/7] support E-tag offloading and forwarding on Intel X550 NIC

2016-02-02 Thread Wenzhuo Lu

This patch set adds the support of E-tag offloading and forwarding
on X550.
The offloading means E-tag can be inserted and stripped by HW.
And E-tag packets can be recognized and forwarded to specific pools
based on GRP and E-CID_base in E-tag.

Wenzhuo Lu (7):
  ixgbe: select pool by MAC when using double VLAN
  lib/librte_ether: support l2 tunnel config
  ixgbe: support l2 tunnel config
  app/testpmd: add CLIs for l2 tunnel config
  lib/librte_ether: support new l2 tunnel operation
  ixgbe: support l2 tunnel operation
  app/testpmd: add CLIs for E-tag operation

 app/test-pmd/cmdline.c   | 599 +++
 doc/guides/rel_notes/release_2_3.rst |   6 +
 drivers/net/ixgbe/ixgbe_ethdev.c | 507 +
 lib/librte_ether/rte_eth_ctrl.h  |   9 +
 lib/librte_ether/rte_ethdev.c| 239 ++
 lib/librte_ether/rte_ethdev.h| 288 +
 6 files changed, 1648 insertions(+)

-- 
1.9.3

[dpdk-dev] [PATCH v2] i40e: fix vlan filtering

2016-02-02 Thread Julien Meunier

VLAN filtering was always performed, even if hw_vlan_filter was
disabled. During device initialization, default filter
RTE_MACVLAN_PERFECT_MATCH was applied. In this situation, all incoming
VLAN frames were dropped by the card (increase of the register RUPP - Rx
Unsupported Protocol).

In order to restore default behavior, if HW VLAN filtering is activated,
set a filter to match MAC and VLAN. If not, set a filter to only match
MAC.

Signed-off-by: Julien Meunier 
---
Changes since v1:
- use ether_addr_copy() for mac copy
- add more debug messages in case of failure
- update all existing filters when multiple mac addresses have been configured
- when adding new mac address, use correct filter

TODO:
- i40e_update_default_filter_setting always forces to
  RTE_MACVLAN_PERFECT_MATCH.
  => The type of filter should be changed according to vlan filter setting.

- What happens if vlan filter setting changes when various filters are already
  set like RTE_MACVLAN_PERFECT_MATCH, RTE_MACVLAN_PERFECT_MATCH,
  RTE_MAC_HASH_MATCH, RTE_MACVLAN_HASH_MATCH ?
  => With testpmd, it is possible to add manually these filters. But when
  changing vlan filter setting, all previous filter set manually are overriden.
---
drivers/net/i40e/i40e_ethdev.c | 73 --
 drivers/net/i40e/i40e_ethdev.h |  1 +
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bf6220d..64d6ada 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -2332,6 +2332,13 @@ i40e_vlan_offload_set(struct rte_eth_dev *dev, int mask)
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct i40e_vsi *vsi = pf->main_vsi;

+   if (mask & ETH_VLAN_FILTER_MASK) {
+   if (dev->data->dev_conf.rxmode.hw_vlan_filter)
+   i40e_vsi_config_vlan_filter(vsi, TRUE);
+   else
+   i40e_vsi_config_vlan_filter(vsi, FALSE);
+   }
+
if (mask & ETH_VLAN_STRIP_MASK) {
/* Enable or disable VLAN stripping */
if (dev->data->dev_conf.rxmode.hw_vlan_strip)
@@ -2583,7 +2590,10 @@ i40e_macaddr_add(struct rte_eth_dev *dev,
}

(void)rte_memcpy(_filter.mac_addr, mac_addr, ETHER_ADDR_LEN);
-   mac_filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;
+   if (dev->data->dev_conf.rxmode.hw_vlan_filter)
+   mac_filter.filter_type = RTE_MACVLAN_PERFECT_MATCH;
+   else
+   mac_filter.filter_type = RTE_MAC_PERFECT_MATCH;

if (pool == 0)
vsi = pf->main_vsi;
@@ -4156,6 +4166,63 @@ fail_mem:
return NULL;
 }

+/* Configure vlan filter on or off */
+int
+i40e_vsi_config_vlan_filter(struct i40e_vsi *vsi, bool on)
+{
+   int i, num;
+   struct i40e_mac_filter *f;
+   struct i40e_mac_filter_info *mac_filter;
+   enum rte_mac_filter_type desired_filter;
+   int ret = I40E_SUCCESS;
+
+   if (on) {
+   /* Filter to match MAC and VLAN */
+   desired_filter = RTE_MACVLAN_PERFECT_MATCH;
+   } else {
+   /* Filter to match only MAC */
+   desired_filter = RTE_MAC_PERFECT_MATCH;
+   }
+
+   num = vsi->mac_num;
+
+   mac_filter = rte_zmalloc("mac_filter_info_data",
+num * sizeof(*mac_filter), 0);
+   if (mac_filter == NULL) {
+   PMD_DRV_LOG(ERR, "failed to allocate memory");
+   return I40E_ERR_NO_MEMORY;
+   }
+
+   i = 0;
+
+   /* Remove all existing mac */
+   TAILQ_FOREACH(f, >mac_list, next) {
+   mac_filter[i] = f->mac_info;
+   ret = i40e_vsi_delete_mac(vsi, >mac_info.mac_addr);
+   if (ret) {
+   PMD_DRV_LOG(INFO, "Update VSI failed to %s vlan filter",
+   on ? "enable" : "disable");
+   goto DONE;
+   }
+   i++;
+   }
+
+   /* Override with new filter */
+   for (i = 0; i < num; i++) {
+   mac_filter[i].filter_type = desired_filter;
+   ret = i40e_vsi_add_mac(vsi, _filter[i]);
+   if (ret) {
+   PMD_DRV_LOG(INFO, "Update VSI failed to %s vlan filter",
+   on ? "enable" : "disable");
+   goto DONE;
+   }
+   }
+
+DONE:
+   rte_free(mac_filter);
+   return ret;
+}
+
 /* Configure vlan stripping on or off */
 int
 i40e_vsi_config_vlan_stripping(struct i40e_vsi *vsi, bool on)
@@ -4203,9 +4270,11 @@ i40e_dev_init_vlan(struct rte_eth_dev *dev)
 {
struct rte_eth_dev_data *data = dev->data;
int ret;
+   int mask = 0;

/* Apply vlan offload setting */
-   i40e_vlan_offload_set(dev, ETH_VLAN_STRIP_MASK);
+   mask = ETH_VLAN_STRIP_MASK | ETH_VLAN_FILTER_MASK;
+

[dpdk-dev] [PATCH v3] eal: add function to check if primary proc alive

2016-02-02 Thread Harry van Haaren

This patch adds a new function to the EAL API:
int rte_eal_primary_proc_alive(const char *path);

The function indicates if a primary process is alive right now.
This functionality is implemented by testing for a write-
lock on the config file, and the function tests for a lock.

The use case for this functionality is that a secondary
process can wait until a primary process starts by polling
the function and waiting. When the primary is running, the
secondary continues to poll to detect if the primary process
has quit unexpectedly, the secondary process can detect this.

The RTE_MAGIC number is written to the shared config by the
primary process, this is the signal to the secondary process
that the EAL is set up, and ready to be used. The function
rte_eal_mcfg_complete() writes RTE_MAGIC. This has been
delayed in the EAL init proceedure, as the PCI probing in
the primary process can interfere with the secondary running.

Signed-off-by: Harry van Haaren 
---

v3:
- Fixed Copyright years

v2:
- Passing NULL as const char* uses default /var/run/.rte_config
- Moved code into /common/ instead of /linuxapp/, should work on BSD now

 doc/guides/rel_notes/release_2_3.rst|  7 +++
 lib/librte_eal/bsdapp/eal/Makefile  |  1 +
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  8 
 lib/librte_eal/common/eal_common_proc.c | 61 +
 lib/librte_eal/common/include/rte_eal.h | 20 +++-
 lib/librte_eal/linuxapp/eal/Makefile|  3 +-
 lib/librte_eal/linuxapp/eal/eal.c   |  6 +--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++
 8 files changed, 108 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_proc.c

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..14b5b06 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -11,6 +11,13 @@ Resolved Issues
 EAL
 ~~~

+* **Added rte_eal_primary_proc_alive() function**
+
+  A new function ``rte_eal_primary_proc_alive()`` has been added
+  to allow the user to detect if a primary process is running.
+  Use cases for this feature include fault detection, and monitoring
+  using secondary processes.
+

 Drivers
 ~~~
diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
b/lib/librte_eal/bsdapp/eal/Makefile
index 65b293f..2d6e3b1 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -61,6 +61,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_alarm.c

 # from common dir
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_lcore.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_proc.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_timer.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_memzone.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_log.c
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..0e28017 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,11 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_primary_proc_alive;
+
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/eal_common_proc.c 
b/lib/librte_eal/common/eal_common_proc.c
new file mode 100644
index 000..c598891
--- /dev/null
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -0,0 +1,61 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 Intel Shannon Ltd. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY,

[dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api

2016-02-02 Thread Ananyev, Konstantin

Hi Tomasz,

> -Original Message-
> From: Kulasek, TomaszX
> Sent: Tuesday, February 02, 2016 10:01 AM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> 
> Hi Konstantin,
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Friday, January 15, 2016 19:45
> > To: Kulasek, TomaszX; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> >
> > Hi Tomasz,
> >
> > >
> > > + /* get new buffer space first, but keep old space around */
> > > + new_bufs = rte_zmalloc("ethdev->txq_bufs",
> > > + sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> > > + if (new_bufs == NULL)
> > > + return -(ENOMEM);
> > > +
> >
> > Why not to allocate space for txq_bufs together with tx_queues (as one
> > chunk for both)?
> > As I understand there is always one to one mapping between them anyway.
> > Would simplify things a bit.
> > Or even introduce a new struct to group with all related tx queue info
> > togetehr struct rte_eth_txq_data {
> > void *queue; /*actual pmd  queue*/
> > struct rte_eth_dev_tx_buffer buf;
> > uint8_t state;
> > }
> > And use it inside struct rte_eth_dev_data?
> > Would probably give a better data locality.
> >
> 
> Introducing such a struct will require a huge rework of pmd drivers. I don't 
> think it's worth only for this one feature.

Why not?
Things are getting more and more messy here: now we have a separate array of 
pointer to queues,
Separate array of queue states, you are going to add separate array of tx 
buffers.
For me it seems logical to unite all these 3 fields into one sub-struct. 

> 
> 
> > > +/**
> > > + * @internal
> > > + * Structure used to buffer packets for future TX
> > > + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush  */
> > > +struct rte_eth_dev_tx_buffer {
> > > + struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];
> >
> > I think it is better to make size of pkts[] configurable at runtime.
> > There are a lot of different usage scenarios - hard to predict what would
> > be an optimal buffer size for all cases.
> >
> 
> This buffer is allocated in eth_dev shared memory, so there are two scenarios:
> 1) We have prealocated buffer with maximal size, and then we can set 
> threshold level without restarting device, or
> 2) We need to set its size before starting device.

> 
> Second one is better, I think.

Yep, I was thinking about 2) too.
Might be an extra parameter in struct rte_eth_txconf.

> 
> Tomasz

[dpdk-dev] [PATCH v6 1/8] eal: pci: add api to rd/wr pci bar region

2016-02-02 Thread Yuanhan Liu

On Tue, Feb 02, 2016 at 09:44:14AM +0530, Santosh Shukla wrote:
> >> +int rte_eal_pci_read_bar(const struct rte_pci_device *device,
> >> +  void *buf, size_t len, off_t offset,
> >> +  int bar_idx)
> >> +
> >> +{
> >> + const struct rte_intr_handle *intr_handle = >intr_handle;
> >
> > I'd suggest to reference this var inside pci_vfio_read/write_bar(), and
> > pass device as the parmater instead.
> >
> 
> (Sorry for late reply, I was travelling on Monday.)
> Make sense.
> 
> >> +
> >> + switch (device->kdrv) {
> >> + case RTE_KDRV_VFIO:
> >> + return pci_vfio_read_bar(intr_handle, buf, len,
> >> +  offset, bar_idx);
> >> + default:
> >> + RTE_LOG(ERR, EAL, "write bar not supported by driver\n");
> >^
> > typo.
> >
> 
> Oh, r / write / read, right? sorry for typo error (:-

Right.

> 
> >
> > BTW, I have a question about this API. Obviously, reading/writing bar
> > space is supported with UIO (when memory resource is mmapped). And I
> > know why you introduced such 2 APIs, for reading IO bar.
> >
> > So, here is the question: what are the 2 APIs for, for being gerneric
> > APIs to read/write bar spaces, or just to read IO bar spaces? If it's
> > former, the message is wrong; if it's later, you may better rename it
> > to rte_eal_pci_read/write_io_bar()?
> >
> 
> Current use-case is virtio: It is used as io_bar which is first
> bar[1]. But implementation is generic, can be used to do rd/wr for
> other bar index too. Also vfio facilitate user to do rd/wr to pci_bars
> w/o mapping that bar, So apis will be useful for such cases in future.
> 
> AFAIU: uio has read/write_config api only and Yes if bar region mapped
> then no need to do rd/wr, user can directly access the pci_memory. But
> use-case of this api entirely different: unmapped memory by
> application context i.e.. vfio_rd/wr-way {pread/pwrite-way}.
> 
> Is above explanation convincing? Pl. let me know.

TBH, not really. So, as you stated, it should be generic APIs to
read/write bar space, but limiting it to VFIO only and claiming
that read/write bar space is not support by other drivers (such
as UIO) while in fact it can (in some ways) doesn't seem right
to me.

Anyway, it's just some thoughts from me. David, comments?

--yliu

[dpdk-dev] [PATCH v5 3/4] ethdev: redesign link speed config API

2016-02-02 Thread Stephen Hemminger

On Thu, 28 Jan 2016 17:33:20 +
Harish Patil  wrote:

> * Added utility MACROs ETH_SPEED_NUM_XXX with the numeric
>   values of all supported link speeds, in Mbps.

I would prefer that there were no speed value macros.
Linux used to have these, but people kept adding new hardware speeds
and it soon gets out of date.

If you are going to redo it, then just increase speed to 64 bit, and allow
any non-zero value.

[dpdk-dev] [PATCH v6 6/8] virtio: add vfio api to rd/wr ioport space

2016-02-02 Thread Yuanhan Liu

On Tue, Feb 02, 2016 at 10:00:36AM +0530, Santosh Shukla wrote:
> >
> > BTW, I didn't follow the noIOMMU discussion; how did it end? Do we still
> > need that? Is this patch a full story to enable virtio on ARM?
> >
> Ok, We agreed that explicit __noiommu suffix not required, atleast for
> rte_xx_drv struct{}, as because sooner than later we'll have virtio
> working for both flavours ie... iommu/noiommu. My only worry was
> parsing for _noiommu and default vfio case, as because noiomu needed
> user to preset "enable_noiommu_" param for vfio driver to do mode
> switch. But we wont need that parsing as because if param is not set
> then binding won't happen, which Thomas rightly pointed out, therefore
> I choose to drop resource parsing for virtio-for-vfio case, now virtio
> driver to check only drv->kdrv == RTE_KDRV_VFIO so to make sure
> interface attached to vfio or not.
> 
> But perhaps when we have both flavours working for virtio, then we
> should at least prompt a  INFO message on console that virtio pmd
> driver attached to default vfio or noIOMMU.
> 
> So we don't need explicit _noIOMMU.

Thanks for the explanation.
> 
> Yes this patch is to enable non-x86 arch to use virtio pmd driver
> (virtio 0.95 spec). After this patch merges-in, I am planning to
> - replace sys/io.h entirely

Hmm, be more specific? Replace it with what?

> - Add raw_read/raw_writel() api for arm/arm64 {Already proposed
> similar implementation in v2} so that they could use virtio 1.0spec
> mapped memory, for both UIO/VFIO mode.

PCI memory bar mapping works both with UIO/VFIO on ARM, without
any extra efforts, right? If so, it should just work with my
patch set and yours.

--yliu

[dpdk-dev] [PATCH 4/9] eal/linux: move back interrupt thread init before setting affinity

2016-02-02 Thread Rahul Lakkireddy

Hi David,

On Monday, February 02/01/16, 2016 at 23:13:55 -0800, David Marchand wrote:
> Hello Rahul,
> 
> On Tue, Feb 2, 2016 at 8:02 AM, Rahul Lakkireddy
>  wrote:
> > On Friday, January 01/29/16, 2016 at 15:08:31 +0100, David Marchand wrote:
> >> Now that virtio pci driver is initialized in a constructor, we only need to
> >> move the interrupt thread init after loading the plugins.
> >> This way, chelsio driver should be happy again [1].
> >
> > Thank you for this patch.  I've tested it and it does improve the perf.
> > back when there is a queue on master lcore in l3fwd app.
> 
> Did you test the whole series ? or just this specific patch ?
> 

I've tested only this particular patch, not the whole series.

> Anyway, great, but this is still too fragile.
> 
> As discussed in the same thread as the problem you reported, there is
> some work over there to be done so that interrupts are "distributed"
> in a more flexible way.
> Did someone look at this ?
> Plans to work on this ? (post 2.3, I suppose)
> 
> 
> -- 
> David Marchand

Thanks,
Rahul

[dpdk-dev] [PATCH v2] doc: minor correction in document

2016-02-02 Thread Ferruh Yigit

* remove outdated chapter reference to Multi-process support.
  Fixes: fc1f2750a3ec ("doc: programmers guide")

* html output converts "--" to "-", this is wrong when explaining the
  command arguments, used fixed width quotes for them.

v2:
* for "--" use fixed width quotes instead of option list.
* expand fixed width quotes usage to other "--" use cases.

Signed-off-by: Ferruh Yigit 
---
 doc/guides/prog_guide/env_abstraction_layer.rst |  2 +-
 doc/guides/prog_guide/multi_proc_support.rst| 16 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst 
b/doc/guides/prog_guide/env_abstraction_layer.rst
index 89feb69..4737dc2 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -103,7 +103,7 @@ Multi-process Support
 ~

 The Linuxapp EAL allows a multi-process as well as a multi-threaded (pthread) 
deployment model.
-See chapter 2.20
+See chapter
 :ref:`Multi-process Support ` for more details.

 Memory Mapping Discovery and Memory Reservation
diff --git a/doc/guides/prog_guide/multi_proc_support.rst 
b/doc/guides/prog_guide/multi_proc_support.rst
index 6562f0d..1680d6b 100644
--- a/doc/guides/prog_guide/multi_proc_support.rst
+++ b/doc/guides/prog_guide/multi_proc_support.rst
@@ -55,9 +55,9 @@ after a primary process has already configured the hugepage 
shared memory for th
 To support these two process types, and other multi-process setups described 
later,
 two additional command-line parameters are available to the EAL:

-*   --proc-type: for specifying a given process instance as the primary or 
secondary DPDK instance
+*   ``--proc-type:`` for specifying a given process instance as the primary or 
secondary DPDK instance

-*   --file-prefix: to allow processes that do not want to co-operate to have 
different memory regions
+*   ``--file-prefix:`` to allow processes that do not want to co-operate to 
have different memory regions

 A number of example applications are provided that demonstrate how multiple 
DPDK processes can be used together.
 These are more fully documented in the "Multi- process Sample Application" 
chapter
@@ -90,7 +90,7 @@ and point to the same objects, in both processes.
Memory Sharing in the DPDK Multi-process Sample Application


-The EAL also supports an auto-detection mode (set by EAL --proc-type=auto flag 
),
+The EAL also supports an auto-detection mode (set by EAL ``--proc-type=auto`` 
flag ),
 whereby an DPDK process is started as a secondary instance if a primary 
instance is already running.

 Deployment Models
@@ -102,8 +102,8 @@ Symmetric/Peer Processes
 DPDK multi-process support can be used to create a set of peer processes where 
each process performs the same workload.
 This model is equivalent to having multiple threads each running the same 
main-loop function,
 as is done in most of the supplied DPDK sample applications.
-In this model, the first of the processes spawned should be spawned using the 
--proc-type=primary EAL flag,
-while all subsequent instances should be spawned using the 
--proc-type=secondary flag.
+In this model, the first of the processes spawned should be spawned using the 
``--proc-type=primary`` EAL flag,
+while all subsequent instances should be spawned using the 
``--proc-type=secondary`` flag.

 The simple_mp and symmetric_mp sample applications demonstrate this usage 
model.
 They are described in the "Multi-process Sample Application" chapter in the 
*DPDK Sample Application's User Guide*.
@@ -125,7 +125,7 @@ Running Multiple Independent DPDK Applications
 In addition to the above scenarios involving multiple DPDK processes working 
together,
 it is possible to run multiple DPDK processes side-by-side,
 where those processes are all working independently.
-Support for this usage scenario is provided using the --file-prefix parameter 
to the EAL.
+Support for this usage scenario is provided using the ``--file-prefix`` 
parameter to the EAL.

 By default, the EAL creates hugepage files on each hugetlbfs filesystem using 
the rtemap_X filename,
 where X is in the range 0 to the maximum number of hugepages -1.
@@ -137,7 +137,7 @@ The rte part of the filenames of each of the above is 
configurable using the fil
 In addition to specifying the file-prefix parameter,
 any DPDK applications that are to be run side-by-side must explicitly limit 
their memory use.
 This is done by passing the -m flag to each process to specify how much 
hugepage memory, in megabytes,
-each process can use (or passing --socket-mem to specify how much hugepage 
memory on each socket each process can use).
+each process can use (or passing ``--socket-mem`` to specify how much hugepage 
memory on each socket each process can use).

 .. note::

@@ -149,7 +149,7 @@ Running Multiple Independent Groups of DPDK Applications

 In the same way that it is possible to run independent DPDK applications side-

[dpdk-dev] DPDK-QoS - link sharing across classes

2016-02-02 Thread sreenaath vasudevan

Hi
I currently have QoS implemented in hardware and I am thinking of using
DPDK's QoS feature to move it to software.
Currently in the hardware,Based on the 4 class per pipe and 4 queues per
class limitation, I was thinking of using 4 classes in DPDK-QoS and spread
out the 8 h/w queues across the 4 classes.
Let me explain with an example. Currently, this is how the h/w queue is
represented
Q0 - 10% b/w
Q1- 10%  b/w
Q2- 15% b/w
Q3 - 15% b/w
Q4 - 15% b/w
Q5 - 15% b/w
Q6 - 10% b/w
Q7 - 10% b/w

Translating the above config to DPDK-QoS, based on my application need, Q0
and Q1 can be logically grouped under class0 with upper b/w = 20%; Q2, Q3,
Q4, Q5 can be logically grouped under class2 with upper b/w = 60%; Q6 and
Q7 can be logically grouped under class 3 with super b/w = 20%.

However, in the h/w, link sharing is available across all the 8 queues.
DPDK materials say link sharing "typically" is enabled for last class, in
this case class2. However, I also want class 1 or class 0 to use the
remaining bandwidth when class2 does not have any traffic and so on. Can
this be done in DPDK ? Do we have a concept of min and max b/w guarantee at
the class level in DPDK-QoS ?

Thanks !

-- 
regards
sreenaath

[dpdk-dev] [PATCH] doc: minor correction in document

2016-02-02 Thread Yigit, Ferruh

On Mon, Feb 01, 2016 at 04:02:05PM +, Mcnamara, John wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
> > Sent: Friday, January 29, 2016 10:12 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] doc: minor correction in document
> > 
> > ...
> >
> > * html output converts "--" to "-", this is wrong when explaining the
> >   command arguments, used "option list" to fix this:
> > 
> > http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#option-
> > lists
> 
> Hi Ferruh,
> 
> That is interesting. I wasn't aware of that option. That would make the 
> documentation
> of commandline options very clean.
> 
> However, it doesn't render as wrapped text in the Html output and has scroll 
> bars instead.
> This make it a little hard to read.
> 
> Maybe as a workaround we could just use fixed width quotes like other places 
> in the docs.
> Like this:
> 
> 
>  * ``--proc-type``:for specifying a given process ...
>  * ``--file-prefix``:  to allow processes that do not ...
> 
Thank you for the workaround, I will update according.

Thanks,
ferruh

[dpdk-dev] [PATCH v2 3/3] virtio: Add a new layer to abstract pci access method

2016-02-02 Thread Tetsuya Mukawa

On 2016/02/02 11:45, Yuanhan Liu wrote:
> On Tue, Feb 02, 2016 at 11:19:50AM +0900, Tetsuya Mukawa wrote:
>> On 2016/02/01 22:15, Yuanhan Liu wrote:
>>> On Mon, Feb 01, 2016 at 10:50:00AM +0900, Tetsuya Mukawa wrote:
 On 2016/01/29 18:17, Yuanhan Liu wrote:
> On Thu, Jan 28, 2016 at 06:33:32PM +0900, Tetsuya Mukawa wrote:
>> This patch addss function pointers to abstract pci access method.
>> This abstraction layer will be used when virtio-net PMD supports
>> container extension.
>>
>> The below functions abstract how to access to pci configuration space.
>>
>> struct virtio_pci_cfg_ops {
>> int   (*map)(...);
>> void  (*unmap)(...);
>> void *(*get_mapped_addr)(...);
>> int   (*read)(...);
>> };
>>
>> The pci configuration space has information how to access to virtio
>> device registers. Basically, there are 2 ways to acccess to the
>> registers. One is using portio and the other is using mapped memory.
>> The below functions abstract this access method.
> One question: is there a way to map PCI memory with Qtest? I'm thinking
> if we can keep the io_read/write() for Qtest as well, if so, code could
> be simplified, a lot, IMO.
>
 Yes, I agree with you.
 But AFAIK, we don't have a way to mmap it from DPDK application.

 We may be able to map PCI configuration space to a memory address space
 that guest CPU can handle.
 But even in this case, I guess we cannot access the memory without qtest
 messaging.
>>> Acutally, I have a concern about this access abstraction, which makes
>>> those simple funciton not inline. It won't be an issue for most of them,
>>> as most of them are invoked during init stage, where has no impact on
>>> performance.
>>>
>>> notify_queue(), however, is a bit different. I was thinking the "inline
>>> to callback (not inline)" convertion might has some impacts on the
>>> performance. Would you do a test for me?
>> Sure, I will be able to.
> Thanks.
>
>> But if we concern about it, I guess it's also nice to implement the PMD
>> on your vtpci abstraction.
>> (It means we don't use the access abstraction)
>> Probably this lets our merging process faster.
>> What do you think?
> Another standalone PMD driver? (sorry that I didn't follow the
> discussion).

Yes, Jianfeng will submit one more virtual virtio-net PMD.

>  If so, won't it introduce too much duplicate code?

Quick look, I guess we won't have not so much duplicated code.

Tetsuya

[dpdk-dev] [PATCH 4/9] eal/linux: move back interrupt thread init before setting affinity

2016-02-02 Thread Rahul Lakkireddy

Hi David,

On Friday, January 01/29/16, 2016 at 15:08:31 +0100, David Marchand wrote:
> Now that virtio pci driver is initialized in a constructor, we only need to
> move the interrupt thread init after loading the plugins.
> This way, chelsio driver should be happy again [1].
> 
> [1] http://dpdk.org/ml/archives/dev/2015-November/028289.html
> 
> Signed-off-by: David Marchand 
> ---
>  lib/librte_eal/linuxapp/eal/eal.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
> b/lib/librte_eal/linuxapp/eal/eal.c
> index 635ec36..62241ee 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -823,6 +823,9 @@ rte_eal_init(int argc, char **argv)
>   if (eal_plugins_init() < 0)
>   rte_panic("Cannot init plugins\n");
>  
> + if (rte_eal_intr_init() < 0)
> + rte_panic("Cannot init interrupt-handling thread\n");
> +
>   eal_thread_init_master(rte_config.master_lcore);
>  
>   ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
> @@ -834,9 +837,6 @@ rte_eal_init(int argc, char **argv)
>   if (rte_eal_dev_init() < 0)
>   rte_panic("Cannot init pmd devices\n");
>  
> - if (rte_eal_intr_init() < 0)
> - rte_panic("Cannot init interrupt-handling thread\n");
> -
>   RTE_LCORE_FOREACH_SLAVE(i) {
>  
>   /*
> -- 
> 1.9.1
> 

Thank you for this patch.  I've tested it and it does improve the perf.
back when there is a queue on master lcore in l3fwd app.

Thanks,
Rahul

[dpdk-dev] [PATCH v6 1/8] eal: pci: add api to rd/wr pci bar region

2016-02-02 Thread Santosh Shukla

On Tue, Feb 2, 2016 at 12:30 PM, Santosh Shukla  wrote:
> On Tue, Feb 2, 2016 at 11:13 AM, Yuanhan Liu
>  wrote:
>> On Tue, Feb 02, 2016 at 09:44:14AM +0530, Santosh Shukla wrote:
>>> >> +int rte_eal_pci_read_bar(const struct rte_pci_device *device,
>>> >> +  void *buf, size_t len, off_t offset,
>>> >> +  int bar_idx)
>>> >> +
>>> >> +{
>>> >> + const struct rte_intr_handle *intr_handle = >intr_handle;
>>> >
>>> > I'd suggest to reference this var inside pci_vfio_read/write_bar(), and
>>> > pass device as the parmater instead.
>>> >
>>>
>>> (Sorry for late reply, I was travelling on Monday.)
>>> Make sense.
>>>
>>> >> +
>>> >> + switch (device->kdrv) {
>>> >> + case RTE_KDRV_VFIO:
>>> >> + return pci_vfio_read_bar(intr_handle, buf, len,
>>> >> +  offset, bar_idx);
>>> >> + default:
>>> >> + RTE_LOG(ERR, EAL, "write bar not supported by driver\n");
>>> >^
>>> > typo.
>>> >
>>>
>>> Oh, r / write / read, right? sorry for typo error (:-
>>
>> Right.
>>
>>>
>>> >
>>> > BTW, I have a question about this API. Obviously, reading/writing bar
>>> > space is supported with UIO (when memory resource is mmapped). And I
>>> > know why you introduced such 2 APIs, for reading IO bar.
>>> >
>>> > So, here is the question: what are the 2 APIs for, for being gerneric
>>> > APIs to read/write bar spaces, or just to read IO bar spaces? If it's
>>> > former, the message is wrong; if it's later, you may better rename it
>>> > to rte_eal_pci_read/write_io_bar()?
>>> >
>>>
>>> Current use-case is virtio: It is used as io_bar which is first
>>> bar[1]. But implementation is generic, can be used to do rd/wr for
>>> other bar index too. Also vfio facilitate user to do rd/wr to pci_bars
>>> w/o mapping that bar, So apis will be useful for such cases in future.
>>>
>>> AFAIU: uio has read/write_config api only and Yes if bar region mapped
>>> then no need to do rd/wr, user can directly access the pci_memory. But
>>> use-case of this api entirely different: unmapped memory by
>>> application context i.e.. vfio_rd/wr-way {pread/pwrite-way}.
>>>
>>> Is above explanation convincing? Pl. let me know.
>>
>> TBH, not really. So, as you stated, it should be generic APIs to
>> read/write bar space, but limiting it to VFIO only and claiming
>> that read/write bar space is not support by other drivers (such
>> as UIO) while in fact it can (in some ways) doesn't seem right
>> to me.
>>
>

Sorry typo
> I agree.. But if UIO doesn't and need could, then I am confused what

r / But if UIO doesn't and need could / But if UIO doesn't and vfio could

> can be done? However we have a use-case for vfio so It make sense to
> me use this api.  Or else If we all agree then I can export api only
> for VFIO.. but it will violate EAL abstraction.
>
>
>> Anyway, it's just some thoughts from me. David, comments?
>>
>> --yliu

[dpdk-dev] [PATCH v6 1/8] eal: pci: add api to rd/wr pci bar region

2016-02-02 Thread Santosh Shukla

On Tue, Feb 2, 2016 at 11:13 AM, Yuanhan Liu
 wrote:
> On Tue, Feb 02, 2016 at 09:44:14AM +0530, Santosh Shukla wrote:
>> >> +int rte_eal_pci_read_bar(const struct rte_pci_device *device,
>> >> +  void *buf, size_t len, off_t offset,
>> >> +  int bar_idx)
>> >> +
>> >> +{
>> >> + const struct rte_intr_handle *intr_handle = >intr_handle;
>> >
>> > I'd suggest to reference this var inside pci_vfio_read/write_bar(), and
>> > pass device as the parmater instead.
>> >
>>
>> (Sorry for late reply, I was travelling on Monday.)
>> Make sense.
>>
>> >> +
>> >> + switch (device->kdrv) {
>> >> + case RTE_KDRV_VFIO:
>> >> + return pci_vfio_read_bar(intr_handle, buf, len,
>> >> +  offset, bar_idx);
>> >> + default:
>> >> + RTE_LOG(ERR, EAL, "write bar not supported by driver\n");
>> >^
>> > typo.
>> >
>>
>> Oh, r / write / read, right? sorry for typo error (:-
>
> Right.
>
>>
>> >
>> > BTW, I have a question about this API. Obviously, reading/writing bar
>> > space is supported with UIO (when memory resource is mmapped). And I
>> > know why you introduced such 2 APIs, for reading IO bar.
>> >
>> > So, here is the question: what are the 2 APIs for, for being gerneric
>> > APIs to read/write bar spaces, or just to read IO bar spaces? If it's
>> > former, the message is wrong; if it's later, you may better rename it
>> > to rte_eal_pci_read/write_io_bar()?
>> >
>>
>> Current use-case is virtio: It is used as io_bar which is first
>> bar[1]. But implementation is generic, can be used to do rd/wr for
>> other bar index too. Also vfio facilitate user to do rd/wr to pci_bars
>> w/o mapping that bar, So apis will be useful for such cases in future.
>>
>> AFAIU: uio has read/write_config api only and Yes if bar region mapped
>> then no need to do rd/wr, user can directly access the pci_memory. But
>> use-case of this api entirely different: unmapped memory by
>> application context i.e.. vfio_rd/wr-way {pread/pwrite-way}.
>>
>> Is above explanation convincing? Pl. let me know.
>
> TBH, not really. So, as you stated, it should be generic APIs to
> read/write bar space, but limiting it to VFIO only and claiming
> that read/write bar space is not support by other drivers (such
> as UIO) while in fact it can (in some ways) doesn't seem right
> to me.
>

I agree.. But if UIO doesn't and need could, then I am confused what
can be done? However we have a use-case for vfio so It make sense to
me use this api.  Or else If we all agree then I can export api only
for VFIO.. but it will violate EAL abstraction.


> Anyway, it's just some thoughts from me. David, comments?
>
> --yliu

[dpdk-dev] [PATCH v1] Modify and modularize l3fwd code

2016-02-02 Thread Kulasek, TomaszX



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
> Sent: Tuesday, December 22, 2015 00:13
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1] Modify and modularize l3fwd code
> 
> v1:
>   > Rebase to latest code base for DPDK team review.
> 
> Intel team's (Konstantin, Bruce and Declan) review comments
> 
> v4<-v3:
> > Fix code review comments from Konstantin
> > Move buffer optimization code into l3fwd_lpm_sse.h
>   and l3fwd_em_sse.h for LPM and EM respectively
> > Add compile time __SSE4_1__ for header file inclusion
> > Tested with CONFIG_RTE_MACHINE=default for non
>   __SSE4_1__ compilation and build
> > Compiled for GCC 4.8.4 and 5.1 on Ubuntu 14.04
> 
> v3<-v2:
> > Fix code review comments from Bruce
> > Fix multiple static definitions
> > Move local #defines to C files, common #defines
> to H file.
> > Rename ipv4_l3fwd_route to ipv4_l3fwd_lpm and ipv4_l3fwd_em
> > Rename ipv6_l3fwd_route to ipv6_l3fwd_lpm and ipv6_l3fwd_lpm
> > Pass additional parameter to send_single_packet
> > Compiled for GCC 4.8.4 and 5.1 on Ubuntu 14.04
> 
> v2<-v1:
> > Fix errors in GCC 5.1
> > Restore "static inline" functions, rearrange
> functions to take "static inline" into account
> > Duplicate main_loop for LPM and EM
> 
> v1:
> > Split main.c into following 3 files
> > main.c, (parsing, buffer alloc, and other utilities)
> > l3fwd_lpm.c, (Longest Prefix Match functions)
> > l3fwd_em.c, (Exact Match f.e. Hash functions)
> > l3fwd.h, (Common defines and prototypes)
> 
> > Select LPM or EM based on run time selection f.e.
> > l3fwd -c 0x1 -n 1 -- -p 0x1 -E ... (Exact Match)
> > l3fwd -c 0x1 -n 1 -- -p 0x1 -L ... (LPM)
> 
> > Options "E" and "L" are mutualy-exclusive.
> 
> > Use function pointers during initialiation of relevant
> data structures.
> 
> > Remove unwanted #ifdefs in the code with exception to
> > DO_RFC_1812_CHECKS
> > RTE_MACHINE_CPUFLAG_SSE4_2
> 
> > Compiled for
> > i686-native-linuxapp-gcc
> > x86_64-native-linuxapp-gcc
> > x86_x32-native-linuxapp-gcc
> > x86_64-native-bsdapp-gcc
> 
> > Tested on
> > Ubuntu 14.04 (GCC 4.8.4)
> > FreeBSD 10.0 (GCC 4.8)
> > I217 and I218 respectively.
> 
> Signed-off-by: Ravi Kerur 
> ---

Tested-by: Tomasz Kulasek 
Acked-by: Tomasz Kulasek

[dpdk-dev] [PATCH v6 0/2] Add VHOST PMD

2016-02-02 Thread Rich Lane

Looks good. I tested the queue state change code in particular and didn't
find any problems.

Reviewed-by: Rich Lane 
Tested-by: Rich Lane

[dpdk-dev] [PATCH v6 0/9] virtio 1.0 enabling for virtio pmd driver

2016-02-02 Thread Thomas Monjalon

Hi Yuanhan,

I wanted to apply these patches but I see few checkpatch warnings,
inluding a typo.
Sometimes I fix them myself but I guess you would prefer checking it.

### PATCH v6 4_9

CHECK:SPACING: spaces preferred around that '+' (ctx:VxV)
#571: FILE: drivers/net/virtio/virtio_pci.c:361:
+   left = [n+1];
 ^
WARNING:NAKED_SSCANF: unchecked sscanf return value
#582: FILE: drivers/net/virtio/virtio_pci.c:372:
+   sscanf(ptr, "%04hx-%04hx", , );

### PATCH v6 5_9

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces between 
elements
#79: FILE: drivers/net/virtio/virtio_ethdev.c:937:
+   PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %"PRIx64,

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces between 
elements
#85: FILE: drivers/net/virtio/virtio_ethdev.c:942:
+   PMD_INIT_LOG(DEBUG, "host_features before negotiate = %"PRIx64,

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces between 
elements
#94: FILE: drivers/net/virtio/virtio_ethdev.c:950:
+   PMD_INIT_LOG(DEBUG, "features after negotiate = %"PRIx64,

### PATCH v6 7_9

WARNING:BAD_SIGN_OFF: 'Tested-by:' is the preferred signature form
#63: 
Tested-By: Santosh Shukla 

### PATCH v6 8_9

CHECK:CAMELCASE: Avoid CamelCase: 
#396: FILE: drivers/net/virtio/virtio_pci.c:620:
+   PMD_INIT_LOG(DEBUG, "\t desc_addr: %"PRIx64, desc_addr);

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces between 
elements
#396: FILE: drivers/net/virtio/virtio_pci.c:620:
+   PMD_INIT_LOG(DEBUG, "\t desc_addr: %"PRIx64, desc_addr);

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces between 
elements
#397: FILE: drivers/net/virtio/virtio_pci.c:621:
+   PMD_INIT_LOG(DEBUG, "\t aval_addr: %"PRIx64, avail_addr);

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces between 
elements
#398: FILE: drivers/net/virtio/virtio_pci.c:622:
+   PMD_INIT_LOG(DEBUG, "\t used_addr: %"PRIx64, used_addr);

WARNING:TYPO_SPELLING: 'lenght' may be misspelled - perhaps 'length'?
#475: FILE: drivers/net/virtio/virtio_pci.c:751:
+   PMD_INIT_LOG(ERR, "offset(%u) + lenght(%u) overflows",

CHECK:CAMELCASE: Avoid CamelCase: 
#482: FILE: drivers/net/virtio/virtio_pci.c:758:
+   "invalid cap: overflows bar space: %u > %"PRIu64,

CHECK:CONCATENATED_STRING: Concatenated strings should use spaces between 
elements
#482: FILE: drivers/net/virtio/virtio_pci.c:758:
+   "invalid cap: overflows bar space: %u > %"PRIu64,

WARNING:INDENTED_LABEL: labels should not be indented
#550: FILE: drivers/net/virtio/virtio_pci.c:826:
+   next:

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-02-02 Thread Van Haaren, Harry

+John,

> From: David Harton 
> Subject: RE: [dpdk-dev] Future Direction for rte_eth_stats_get()
> 
> Hi folks,
> 
> I didn't see any follow up to this response.

I think you may have missed one:
http://dpdk.org/ml/archives/dev/2016-January/032211.html


> 
> However, if modifying that API is not desired then I'd
> really like to have some feedback about extending the current xstats model.

API / ABI modification/breakage is something that will
need to be considered in any discussions. The 2.3 (or 16.04)
V1 patch deadline has passed, so any output from this 
discussion will be leading to the 16.07 release.

> Again, it is desired not to have to copy and/or parse strings for scalability 
> reasons but
> still maintain the "ABI flexibility" for which the xstats model was geared 
> towards.

Agreed.


From: David Harton
> enum rte_eth_stat_e {
> /* accurate desc #1 */
> RTE_ETH_STAT_1,
> /* accurate desc #2 */
> RTE_ETH_STAT_2,
> ...
> }

For this enum, do you see each stat for every PMD entered here?
RX_GOOD_PACKETS,

What about RX and TX, is it listed twice?
RX_GOOD_PACKETS,
TX_GOOD_PACKETS,

In a case where a VF has RX_GOOD_PACKETS, does it get its "own" set?
RX_GOOD_PACKETS,
TX_GOOD_PACKETS,
VF_RX_GOOD_PACKETS,
VF_TX_GOOD_PACKETS,

I'm looking at the enum thinking it will grow out of control.
Have you thought about adding metadata for RX / TX, PF / VF?

If metadata info is added, it would make retrieving a set of
statistics based on a certain mask much easier. Do you think
this may be of use?


-Harry

[dpdk-dev] [PATCH v6 6/8] virtio: add vfio api to rd/wr ioport space

2016-02-02 Thread Santosh Shukla

On Tue, Feb 2, 2016 at 10:49 AM, Yuanhan Liu
 wrote:
> On Tue, Feb 02, 2016 at 10:00:36AM +0530, Santosh Shukla wrote:
>> >
>> > BTW, I didn't follow the noIOMMU discussion; how did it end? Do we still
>> > need that? Is this patch a full story to enable virtio on ARM?
>> >
>> Ok, We agreed that explicit __noiommu suffix not required, atleast for
>> rte_xx_drv struct{}, as because sooner than later we'll have virtio
>> working for both flavours ie... iommu/noiommu. My only worry was
>> parsing for _noiommu and default vfio case, as because noiomu needed
>> user to preset "enable_noiommu_" param for vfio driver to do mode
>> switch. But we wont need that parsing as because if param is not set
>> then binding won't happen, which Thomas rightly pointed out, therefore
>> I choose to drop resource parsing for virtio-for-vfio case, now virtio
>> driver to check only drv->kdrv == RTE_KDRV_VFIO so to make sure
>> interface attached to vfio or not.
>>
>> But perhaps when we have both flavours working for virtio, then we
>> should at least prompt a  INFO message on console that virtio pmd
>> driver attached to default vfio or noIOMMU.
>>
>> So we don't need explicit _noIOMMU.
>
> Thanks for the explanation.
>>
>> Yes this patch is to enable non-x86 arch to use virtio pmd driver
>> (virtio 0.95 spec). After this patch merges-in, I am planning to
>> - replace sys/io.h entirely
>
> Hmm, be more specific? Replace it with what?
>

Just to remove ifdef clutter in general from dpdk source and mo

>> - Add raw_read/raw_writel() api for arm/arm64 {Already proposed
>> similar implementation in v2} so that they could use virtio 1.0spec
>> mapped memory, for both UIO/VFIO mode.
>
> PCI memory bar mapping works both with UIO/VFIO on ARM, without
> any extra efforts, right? If so, it should just work with my
> patch set and yours.
>

Mostl likely, That I have not tried. (todo),
> --yliu

[dpdk-dev] [PATCH v2] eal: add architecture specific rte_cpuflags.c files

2016-02-02 Thread Thomas Monjalon

2016-01-28 12:20, Ferruh Yigit:
> Move cpu_feature_table array from arch specific rte_cpuflags.h files to
> new arch specific rte_cpuflags.c files.
> 
> Main motivation is to escape from static variable declarations in
> header files. cpu_feature_table has many copies in final binary, even
> exist in some object files that does not use this variable at all.
> 
> And this can be a sample to create architecture specific source files
> and move some functions which are not performance sensitive from
> architecture header files to source files.
> 
> v2:
> * rebased for DPDK2.3 (16.04)
> * added arm arch
> * renamed cpu_feature_table[] to rte_cpu_feature_table[]

Applied, thanks.

Now that we have some .c files for rte_cpuflags, I think some code
may be changed.
The function rte_cpu_get_features() and the struct feature_entry are
really specific to each arch and used only by rte_cpu_get_flag_enabled().
So they can be moved in .c and be more specific (leaf/subleaf is unused
for some archs).
Then there is no reason to keep rte_cpu_get_flag_enabled() inline.

[dpdk-dev] [PATCH v2 3/3] virtio: Add a new layer to abstract pci access method

2016-02-02 Thread Tetsuya Mukawa

On 2016/02/01 22:15, Yuanhan Liu wrote:
> On Mon, Feb 01, 2016 at 10:50:00AM +0900, Tetsuya Mukawa wrote:
>> On 2016/01/29 18:17, Yuanhan Liu wrote:
>>> On Thu, Jan 28, 2016 at 06:33:32PM +0900, Tetsuya Mukawa wrote:
 This patch addss function pointers to abstract pci access method.
 This abstraction layer will be used when virtio-net PMD supports
 container extension.

 The below functions abstract how to access to pci configuration space.

 struct virtio_pci_cfg_ops {
 int   (*map)(...);
 void  (*unmap)(...);
 void *(*get_mapped_addr)(...);
 int   (*read)(...);
 };

 The pci configuration space has information how to access to virtio
 device registers. Basically, there are 2 ways to acccess to the
 registers. One is using portio and the other is using mapped memory.
 The below functions abstract this access method.
>>> One question: is there a way to map PCI memory with Qtest? I'm thinking
>>> if we can keep the io_read/write() for Qtest as well, if so, code could
>>> be simplified, a lot, IMO.
>>>
>> Yes, I agree with you.
>> But AFAIK, we don't have a way to mmap it from DPDK application.
>>
>> We may be able to map PCI configuration space to a memory address space
>> that guest CPU can handle.
>> But even in this case, I guess we cannot access the memory without qtest
>> messaging.
> Acutally, I have a concern about this access abstraction, which makes
> those simple funciton not inline. It won't be an issue for most of them,
> as most of them are invoked during init stage, where has no impact on
> performance.
>
> notify_queue(), however, is a bit different. I was thinking the "inline
> to callback (not inline)" convertion might has some impacts on the
> performance. Would you do a test for me?

Sure, I will be able to.
But if we concern about it, I guess it's also nice to implement the PMD
on your vtpci abstraction.
(It means we don't use the access abstraction)
Probably this lets our merging process faster.
What do you think?
Also I guess Jianfeng will implement his PMD on your abstraction.
If so, I will also follow him.

>
> Another off-topic remind is that I guess you might need to send a new
> version of your vhost-pmd patchset, the sooner the better. Chinese new
> year is coming; I'm having vacation since the end of this week (And,
> Huawei has been on vacation sine the end of last week). I hope we could
> make it in v2.3.

Thanks for notification. Sure I will submit it soon.

Tetsuya

[dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice

2016-02-02 Thread Ananyev, Konstantin



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Qiu, Michael
> Sent: Tuesday, February 02, 2016 2:57 AM
> To: Zhang, Helin; Lu, Wenzhuo; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice
> 
> On 2/2/2016 10:14 AM, Zhang, Helin wrote:
> >
> >> -Original Message-
> >> From: Qiu, Michael
> >> Sent: Tuesday, February 2, 2016 10:07 AM
> >> To: Lu, Wenzhuo; dev at dpdk.org
> >> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Zhang, Helin
> >> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> >>
> >> [+cc helin]
> >>
> >> On 2/2/2016 9:03 AM, Lu, Wenzhuo wrote:
> >>> Hi Michael,
> >>>
>  -Original Message-
>  From: Qiu, Michael
>  Sent: Monday, February 1, 2016 4:05 PM
>  To: Lu, Wenzhuo; dev at dpdk.org
>  Cc: Zhou, Danny; Liu, Yong; Liang, Cunming
>  Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> 
>  On 1/29/2016 4:07 PM, Lu, Wenzhuo wrote:
> > Hi Michael,
> >
> >> -Original Message-
> >> From: Qiu, Michael
> >> Sent: Friday, January 29, 2016 1:58 PM
> >> To: dev at dpdk.org
> >> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Lu, Wenzhuo; Qiu,
> >> Michael
> >> Subject: [PATCH v2] ixgbe: Fix disable interrupt twice
> >>
> >> Currently, ixgbe vf and pf will disable interrupt twice in stop
> >> stage and uninit stage. It will cause an error:
> >>
> >> testpmd> quit
> >>
> >> Shutting down port 0...
> >> Stopping ports...
> >> Done
> >> Closing ports...
> >> EAL: Error disabling MSI-X interrupts for fd 26
> >> Done
> >>
> >> Becasue the interrupt already been disabled in stop stage.
> >> Since it is enabled in init stage, better remove from stop stage.
> > I'm afraid it's not a good idea to just remove the intr_disable from
> >> dev_stop.
> > I think dev_stop have the chance to be used independently with
> > dev_unint. In
>  this scenario, we still need intr_disable, right?
> > Maybe what we need is some check before we disable the intr:)
>  Yes, indeed we need some check in disable intr, but it need
>  additional fields in "struct rte_intr_handle",  and it's much saft to
>  do so, but as I check i40e/fm10k code, only ixgbe disable it in 
>  dev_stop().
> >>> I found fm10k doesn't enable intr in dev_start. So, I think it's OK. But 
> >>> i40e
> >> enables intr in dev_start.
> >>> To my opinion, it's more like i40e misses the intr_disable in dev_stop.
> >> I don't think i40e miss it, because it not the right please to disable 
> >> interrupt.
> >> because all interrupts are enabled in init stage.
> >>
> >> Actually, ixgbe enable the interrupt in init stage, but in dev_start, it 
> >> disable it
> >> first and re-enable, so it just the same with doing nothing about 
> >> interrupt.
> >>
> >> Just think below:
> >>
> >> 1. start the port.(interrupt already enabled in init stage, disable -->
> >> re-enable)
> >> 2. stop the port.(disable interrupt)
> >> 3. start port again(Try to disable, but failed, already disabled)
> >>
> >> Would you think the code has issue?
> > [Zhang, Helin] in ixgbe PMD, it can be seen that uninit() calls dev_close(),
> > which calls dev_stop(). So I think the disabling can be done only in 
> > dev_stop().
> > All others can make use of dev_stop to disable the interrupt.
> 
> As I said, if it is in dev_stop, it will has issue when dev_start -->
> dev_stop --> dev_start, this also could applied in i40e and fm10k. If
> you want to put it in dev_stop, better to remove enable interrupts in
> init stage, and only put it in dev_start.

We can't remove enabling interrupt at init stage and put it only in dev_start().
That means PF couldn't handle interrupts from VF till dev_start() will be 
executed on PF
 - which could never happen.
For same reason we can't disable all interrupts in dev_stop().
See: http://dpdk.org/ml/archives/dev/2015-November/027238.html
Konstantin

> 
> Thanks,
> Michael
> > Regards,
> > Helin
> >
> >> Thanks,
> >> Michael
> >>
> >>> Maybe we can follow fm10k's style.
> >>>
>  On other hand, if we remove it in dev_stop, any side effect? In ixgbe
>  start, it will always disable it first and then re-enable it, so it's 
>  safe.
> >>> I think you mean we can disable intr anyway even if it has been disabled.
> >> Actually, we couldn't, DPDK call VFIO ioctl to kernel to disable 
> >> interrupts, and
> >> if we try disable twice, it will return and error.
> >> That's why I mean we need a flag to show the interrupts stats. If it 
> >> already
> >> disabled, we do not need call in to kernel. just return and give a warning
> >> message.
> >>
> >> Thanks,
> >> Michael
> >>
> >>>  Sounds more like why we don't
> >>> need this patch :)
> >>>
>  Thanks,
>  Michael
> >

[dpdk-dev] [PATCH] fm10k: fix vlan flag bug in scattered RX

2016-02-02 Thread Bruce Richardson

On Fri, Dec 18, 2015 at 05:35:35AM +, He, Shaopeng wrote:
> 
> > -Original Message-
> > From: Wang, Xiao W
> > Sent: Friday, December 18, 2015 11:09 AM
> > To: Chen, Jing D
> > Cc: He, Shaopeng; dev at dpdk.org; Wang, Xiao W
> > Subject: [PATCH] fm10k: fix vlan flag bug in scattered RX
> > 
> > In fm10k_recv_scattered_pkts function, a packet is stored in a linked list,
> > offload flags such as PKT_RX_VLAN_PKT should be set in the first segment.
> > 
> > Signed-off-by: Wang Xiao W 
> 
> Thanks for the catch.
> 
> Acked-by: Shaopeng He 

First commit applied to dpdk-next-net tree.

Applied to dpdk-next-net/rel_16_04 with added fixes line:
Fixes: 6b59a3bc82b1 ("fm10k: fix VLAN in Rx mbuf")

I hope to apply more patches that are ready in the coming days.

Regards,
/Bruce

[dpdk-dev] [PATCH 1/3] fm10k: enable FTAG based forwarding

2016-02-02 Thread Wang, Xiao W



> -Original Message-
> From: Qiu, Michael
> Sent: Tuesday, February 2, 2016 11:07 AM
> To: Wang, Xiao W ; Chen, Jing D
> 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] fm10k: enable FTAG based forwarding
> 
> On 1/25/2016 4:08 PM, Wang Xiao W wrote:
> > This patch enables reading sglort info into mbuf for RX and inserting
> > an FTAG at the beginning of the packet for TX. The vlan_tci_outer
> > field selected from rte_mbuf structure for sglort is not used in fm10k now.
> > In FTAG based forwarding mode, the switch will forward packets
> > according to glort info in FTAG rather than mac and vlan table.
> >
> > To activate this feature, user needs to turn
> > ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
> > to y in common_linuxapp or common_bsdapp. Currently this feature is
> > supported only on PF, because FM10K_PFVTCTL register is read-only for VF.
> >
> > Signed-off-by: Wang Xiao W 
> > ---
> >  config/common_bsdapp   |  1 +
> >  config/common_linuxapp |  1 +
> >  drivers/net/fm10k/fm10k_ethdev.c   |  8 
> >  drivers/net/fm10k/fm10k_rxtx.c | 17 +
> >  drivers/net/fm10k/fm10k_rxtx_vec.c |  9 +
> >  5 files changed, 36 insertions(+)
> >
> > diff --git a/config/common_bsdapp b/config/common_bsdapp index
> > ed7c31c..451f81a 100644
> > --- a/config/common_bsdapp
> > +++ b/config/common_bsdapp
> > @@ -208,6 +208,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n
> >  CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
> >  CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
> >  CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
> > +CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n
> >
> >  #
> >  # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD diff --git
> > a/config/common_linuxapp b/config/common_linuxapp index
> > 74bc515..c928bce 100644
> > --- a/config/common_linuxapp
> > +++ b/config/common_linuxapp
> > @@ -207,6 +207,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
> >  CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
> >  CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
> >  CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y
> > +CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n
> >
> >  #
> >  # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD diff --git
> > a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
> > index e4aed94..cc8317f 100644
> > --- a/drivers/net/fm10k/fm10k_ethdev.c
> > +++ b/drivers/net/fm10k/fm10k_ethdev.c
> > @@ -668,6 +668,14 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
> > PMD_INIT_LOG(ERR, "failed to disable queue %d", i);
> > return -1;
> > }
> > +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> > +   /* Enable use of FTAG bit in TX descriptor, PFVTCTL
> > +* register is read-only for VF.
> > +*/
> > +   if (hw->mac.type == fm10k_mac_pf)
> > +   FM10K_WRITE_REG(hw, FM10K_PFVTCTL(i),
> > +
>   FM10K_PFVTCTL_FTAG_DESC_ENABLE);
> 
> So here if somebody enable FTAG, when compile, but he use VF, what will
> happen? We'd better to give a error message when he try to use VF with FTAG.
> 
> Thanks,
> Michael

Yes, we'd better to give an error message and return failure in this use case, 
I'll
rework the patch. Thanks for your comment. 

Best Regards,
Wang, Xiao W

[dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api

2016-02-02 Thread Kulasek, TomaszX

Hi Konstantin,

> -Original Message-
> From: Ananyev, Konstantin
> Sent: Friday, January 15, 2016 19:45
> To: Kulasek, TomaszX; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 1/2] ethdev: add buffered tx api
> 
> Hi Tomasz,
> 
> >
> > +   /* get new buffer space first, but keep old space around */
> > +   new_bufs = rte_zmalloc("ethdev->txq_bufs",
> > +   sizeof(*dev->data->txq_bufs) * nb_queues, 0);
> > +   if (new_bufs == NULL)
> > +   return -(ENOMEM);
> > +
> 
> Why not to allocate space for txq_bufs together with tx_queues (as one
> chunk for both)?
> As I understand there is always one to one mapping between them anyway.
> Would simplify things a bit.
> Or even introduce a new struct to group with all related tx queue info
> togetehr struct rte_eth_txq_data {
>   void *queue; /*actual pmd  queue*/
>   struct rte_eth_dev_tx_buffer buf;
>   uint8_t state;
> }
> And use it inside struct rte_eth_dev_data?
> Would probably give a better data locality.
> 

Introducing such a struct will require a huge rework of pmd drivers. I don't 
think it's worth only for this one feature. 


> > +/**
> > + * @internal
> > + * Structure used to buffer packets for future TX
> > + * Used by APIs rte_eth_tx_buffer and rte_eth_tx_buffer_flush  */
> > +struct rte_eth_dev_tx_buffer {
> > +   struct rte_mbuf *pkts[RTE_ETHDEV_TX_BUFSIZE];
> 
> I think it is better to make size of pkts[] configurable at runtime.
> There are a lot of different usage scenarios - hard to predict what would
> be an optimal buffer size for all cases.
> 

This buffer is allocated in eth_dev shared memory, so there are two scenarios:
1) We have prealocated buffer with maximal size, and then we can set threshold 
level without restarting device, or
2) We need to set its size before starting device.

Second one is better, I think.

Tomasz

[dpdk-dev] [PATCH v3 0/8] vhost-user live migration support

2016-02-02 Thread Yuanhan Liu

On Mon, Feb 01, 2016 at 03:54:05PM +, Iremonger, Bernard wrote:
> Hi Yuanhan,
> 
> 
> 
> > A simple test guide (on same host)
> > ==
> > 
> > The following test is based on OVS + DPDK (check [0] for how to setup OVS +
> > DPDK):
...
> 
> The above test guide should probably be added to the DPDK doc files.
> It could be added to the sample app guide or the programmers guide.
> There is already a Vhost Library section in the programmers guide and
> a Vhost Sample Application section in the sample app guide.

Hi Iremonger,

You had same ask in my last version, and I was aware of that while
preparing this version. The reason I didn't include it is the same
as I replied to you last time: it's a very rough test guide; the
formal test guide would be with 2 hosts, and also you have to establish
a connection to the VM before migration and make sure it still works
right after the migration.

So, I will consider to add such formal test plan after validation is
done from validation team.

> The Vhost Sample Application section might be the best place to add it.

I don't think that's the right place: we can't do the live migration
test with the vhost-switch example: it includes some hardware features,
such as VLAN. Those info would be lost after migration.

And that's the reason I choose OVS.

> 
> It would be useful to add some documentation on the Vhost Logging feature 
> Itself. 

Good suggestion; I will think what I can do for it.

--yliu

[dpdk-dev] [PATCH v6 1/8] eal: pci: add api to rd/wr pci bar region

2016-02-02 Thread Santosh Shukla

On Mon, Feb 1, 2016 at 7:18 PM, Yuanhan Liu  
wrote:
> On Fri, Jan 29, 2016 at 11:51:50PM +0530, Santosh Shukla wrote:
>> Introducing below api for pci bar region rd/wr.
>> Api's are:
>> - rte_eal_pci_read_bar
>> - rte_eal_pci_write_bar
>>
>> Signed-off-by: Santosh Shukla 
>> ---
>> v5-->v6:
>> - update api infor in rte_eal_version.map file
>>   suggested by david manchand.
>>
>>  lib/librte_eal/bsdapp/eal/eal_pci.c |   19 
>>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |3 ++
>>  lib/librte_eal/common/include/rte_pci.h |   38 
>> +++
>>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   34 
>>  lib/librte_eal/linuxapp/eal/eal_pci_init.h  |6 
>>  lib/librte_eal/linuxapp/eal/eal_pci_vfio.c  |   28 +
>>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |3 ++
>>  7 files changed, 131 insertions(+)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
>> b/lib/librte_eal/bsdapp/eal/eal_pci.c
>> index 95c32c1..2e535ea 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> ...
>> +int rte_eal_pci_read_bar(const struct rte_pci_device *device,
>> +  void *buf, size_t len, off_t offset,
>> +  int bar_idx)
>> +
>> +{
>> + const struct rte_intr_handle *intr_handle = >intr_handle;
>
> I'd suggest to reference this var inside pci_vfio_read/write_bar(), and
> pass device as the parmater instead.
>

(Sorry for late reply, I was travelling on Monday.)
Make sense.

>> +
>> + switch (device->kdrv) {
>> + case RTE_KDRV_VFIO:
>> + return pci_vfio_read_bar(intr_handle, buf, len,
>> +  offset, bar_idx);
>> + default:
>> + RTE_LOG(ERR, EAL, "write bar not supported by driver\n");
>^
> typo.
>

Oh, r / write / read, right? sorry for typo error (:-

>
> BTW, I have a question about this API. Obviously, reading/writing bar
> space is supported with UIO (when memory resource is mmapped). And I
> know why you introduced such 2 APIs, for reading IO bar.
>
> So, here is the question: what are the 2 APIs for, for being gerneric
> APIs to read/write bar spaces, or just to read IO bar spaces? If it's
> former, the message is wrong; if it's later, you may better rename it
> to rte_eal_pci_read/write_io_bar()?
>

Current use-case is virtio: It is used as io_bar which is first
bar[1]. But implementation is generic, can be used to do rd/wr for
other bar index too. Also vfio facilitate user to do rd/wr to pci_bars
w/o mapping that bar, So apis will be useful for such cases in future.

AFAIU: uio has read/write_config api only and Yes if bar region mapped
then no need to do rd/wr, user can directly access the pci_memory. But
use-case of this api entirely different: unmapped memory by
application context i.e.. vfio_rd/wr-way {pread/pwrite-way}.

Is above explanation convincing? Pl. let me know.

[1] https://en.wikipedia.org/wiki/PCI_configuration_space (first bar
offset 0x010)

> David, what do you think of that?
>
>
>> + return -1;
>> + }
>> +}
>> +
> ...
>> +int
>> +pci_vfio_read_bar(const struct rte_intr_handle *intr_handle,
>> +   void *buf, size_t len, off_t offs, int bar_idx)
>> +{
>> + if (bar_idx < VFIO_PCI_BAR0_REGION_INDEX
>> +|| bar_idx > VFIO_PCI_BAR5_REGION_INDEX) {
>
> A minor nit: it's more nature to put the '||' at the end of expression,
> instead of at the front:
>
> if (bar_idx < VFIO_PCI_BAR0_REGION_INDEX ||
> bar_idx > VFIO_PCI_BAR5_REGION_INDEX) {
>
>
> --yliu

[dpdk-dev] Future Direction for rte_eth_stats_get()

2016-02-02 Thread Kyle Larose

On Tue, Feb 2, 2016 at 7:44 AM, Tahhan, Maryam  
wrote:
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
>> Sent: Friday, January 22, 2016 8:49 PM
>> To: Igor Ryzhov 
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] Future Direction for rte_eth_stats_get()
>>
>> On Fri, Jan 22, 2016 at 06:02:24PM +0300, Igor Ryzhov wrote:
>> > How about exposing stats according to IF-MIB?
>> >
>> > Statistics to be exposed are - octets, unicast packets, multicast
>> > packets, broadcast packets, errors and discards for both TX and RX.
>> >
>> > These counters are basic and implemented by most of drivers.
>>
>> To be a bit more specific it would be good to have IF-MIB ifTable with
>> the items from ifXTable as well:
>>
>
> I think the MIBs ifTable would be good for the high level stats across all 
> the drivers for sure, we would need to take backward compatibility for the 
> current stats into account.
>
>> ifIndex
>> ifMtu
>> ifHighSpeed
>> ifPromiscuousMode
>> ifPhysAddress
>> ifConnectorPresent
>>
>> ifHCInOctets
>> ifHCInUcastPkts
>> ifHCInMulticastPkts
>> ifHCInBroadcastPkts
>> ifInDiscards
>> ifInErrors
>> ifInUnknownProtos
>>
>> ifHCOutOctets
>> ifHCOutUcastPkts
>> ifHCOutMulticastPkts
>> ifHCOutBroadcastPkts
>> ifOutDiscards
>> ifOutErrors
>>
>> A number of things are missing or weird in the DPDK stats interface.
>> Then I get stuck trying to maintain them in my app instead and it's
>> annoying.
>>
>> Also, it is nice to get the struct populated atomically so the values are as
>> self-consistent as possible. If you have to call a function separately on
>> each stat it makes them self-inconsistent because it is less atomically
>> populated.
>
> +1

I also agree about the ifTable/ifXTable. I think that a few other
ethernet oriented MIBs may also be worth considering. The RMON MIB's
etherStatsTable has some useful counters in it (namely, the packet
size histogram counters). We could also do the dot3StatsTable from the
EtherLike MIB, though I'm not sure how useful it would be.

>>
>> From long experience, this inconsistency is quite annoying when trying to
>> make very accurate traffic measurements in network management
>> software.
>>
>> Matthew.

[dpdk-dev] [PATCH 4/9] eal/linux: move back interrupt thread init before setting affinity

2016-02-02 Thread David Marchand

Hello Rahul,

On Tue, Feb 2, 2016 at 8:02 AM, Rahul Lakkireddy
 wrote:
> On Friday, January 01/29/16, 2016 at 15:08:31 +0100, David Marchand wrote:
>> Now that virtio pci driver is initialized in a constructor, we only need to
>> move the interrupt thread init after loading the plugins.
>> This way, chelsio driver should be happy again [1].
>
> Thank you for this patch.  I've tested it and it does improve the perf.
> back when there is a queue on master lcore in l3fwd app.

Did you test the whole series ? or just this specific patch ?

Anyway, great, but this is still too fragile.

As discussed in the same thread as the problem you reported, there is
some work over there to be done so that interrupts are "distributed"
in a more flexible way.
Did someone look at this ?
Plans to work on this ? (post 2.3, I suppose)

-- 
David Marchand

[dpdk-dev] [PATCH v6 1/8] eal: pci: add api to rd/wr pci bar region

2016-02-02 Thread David Marchand

On Tue, Feb 2, 2016 at 6:43 AM, Yuanhan Liu  
wrote:
> On Tue, Feb 02, 2016 at 09:44:14AM +0530, Santosh Shukla wrote:
>> Current use-case is virtio: It is used as io_bar which is first
>> bar[1]. But implementation is generic, can be used to do rd/wr for
>> other bar index too. Also vfio facilitate user to do rd/wr to pci_bars
>> w/o mapping that bar, So apis will be useful for such cases in future.
>>
>> AFAIU: uio has read/write_config api only and Yes if bar region mapped
>> then no need to do rd/wr, user can directly access the pci_memory. But
>> use-case of this api entirely different: unmapped memory by
>> application context i.e.. vfio_rd/wr-way {pread/pwrite-way}.
>>
>> Is above explanation convincing? Pl. let me know.
>
> TBH, not really. So, as you stated, it should be generic APIs to
> read/write bar space, but limiting it to VFIO only and claiming
> that read/write bar space is not support by other drivers (such
> as UIO) while in fact it can (in some ways) doesn't seem right
> to me.
>
> Anyway, it's just some thoughts from me. David, comments?

>From the very start, same opinion.
We should have a unique api to access those, and eal should hide
details like kernel drivers (uio, vfio, whatever) to the pmd.

Now the thing is, how to do this in an elegant and efficient way.


Regard,
-- 
David Marchand

[dpdk-dev] rings PMD detaching

2016-02-02 Thread David Marchand

Hello Mauricio,

On Tue, Feb 2, 2016 at 2:37 AM, Mauricio V?squez
 wrote:
> I was wondering if there were a way to detach (delete) a ring pmd device
> created with rte_eth_from_rings.  I realized that rte_eth_dev_detach does
> not work in this case because there is a comparison between the device's
> name and the driver's name in rte_eal_vdev_uninit, then devices created
> with arbitrary names can not be uninitialized.

I would say this is the same problem than what I observed in pci code.
Doing a "driver" lookup on detach is useless (and buggy in your case),
eal should just reuse the driver that the device had been attached to.
The problem is that, for vdev, the driver is not stored in a device
object at attach time, since there is no device object.

> My question is how to implement it?, I have two ideas on mind:
> - make rte_pmd_ring_devuninit a public function, then the user can call
> this using as argument the name of the device.

Having something like this enter abi, because of a bug in eal, does
not sound like a good idea.

> - modify rte_eal_vdev_uninit in such a way that there is not any comparison
> based on the dev name, probably it will require to add some extra field in
> the rte_eth_dev structure to distinguish between the different virtual
> devices.

No, the problem lies in eal where resources are bound to drivers.
No reason to pollute ethdev (we have enough workarounds in it ;-)).
This driver lookup should just go away.

If we had the rte_device I described [1], this kind of issues would disappear.

[1] http://dpdk.org/ml/archives/dev/2016-January/031390.html

Regards,
-- 
David Marchand

[dpdk-dev] [PATCH v2 2/7] lib/librte_ether: support l2 tunnel config

2016-02-02 Thread Bruce Richardson

On Tue, Feb 02, 2016 at 02:57:00PM +0800, Wenzhuo Lu wrote:
> Add functions to support l2 tunnel configuration.
> The support includes ether type modification and the tunnel support
> enabling/disabling.
> Ether type modification means modifying the ether type of a specific
> type of tunnel. So the packet with this ether type will be parsed as
> this type of tunnel.
> Enabling/disabling a tunnel support means enabling/disabling the
> ability of parsing the specific type of tunnel. This ability should
> be enabled before we enable filtering, forwarding, offloading for
> this specific type of tunnel.
> Only support e-tag tunnel now.
> 
> E-tag introduction,
> E-tag means external tag which is defined in I 802.1BR
> specification.
> E-tag is a kind of l2 tunnel. It means a tag will be inserted in the
> l2 header. Like below,
>|3124|23   16|15 8|7   0|
>   0|   Destination MAC address |
>   4| Dest MAC address(cont.)| Src MAC address  |
>   8|  Source MAC address(cont.)|
>  12| E-tag Etherenet type (0x893f)  |  E-tag header|
>  16|E-tag header(cont.)|
>  20|   VLAN Ethertype(optional) |   VLAN header(optional)  |
>  24| Original type  | ..   |
> ...|  ..   |
> The E-tag format is like below,
>|015|16   18|19 |20   31|
>|   Ethertype - 0x893f  | E-PCP |DEI|   Ingress E-CID_base  |
> 
>|32  33|34 35|36  47|48 55|56 63|
>|  RSV | GRP |E-CID_base|Ingress_E-CID_ext|E-CID_ext|
> 
> The Ingess_E-CID_ext and E-CID_ext are always zero for endpoints
> and are effectively reserved.
> 
> The more details of E-tag is in IEEE 802.1BR. 802.1BR is used to
> replace 802.1Qbh. 802.1BR is a standard for Bridge Port Extension.
> It specifies the operation of Bridge Port Extenders, including
> management, protocols, and algorithms. Bridge Port Extenders
> operate in support of the MAC Service by Extended Bridges.
> The E-tag is added to l2 header to identify the VM channel and
> the virtual port.
> 
> Signed-off-by: Wenzhuo Lu 

Your introduction to e-tag needs to be in patch 1 rather than 2, and in the
cover letter, since both those earlier mails reference it.
Furthermore, it would be better if you could link to a web description of the
e-tag rather than try and explain it all in a commit message. Diagrams also work
better on a web-page.

/Bruce

[dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice

2016-02-02 Thread Qiu, Michael

On 2/2/2016 11:07 AM, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Tuesday, February 2, 2016 10:57 AM
>> To: Zhang, Helin ; Lu, Wenzhuo
>> ; dev at dpdk.org
>> Cc: Zhou, Danny ; Liu, Yong ;
>> Liang, Cunming 
>> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
>>
>> On 2/2/2016 10:14 AM, Zhang, Helin wrote:
 -Original Message-
 From: Qiu, Michael
 Sent: Tuesday, February 2, 2016 10:07 AM
 To: Lu, Wenzhuo; dev at dpdk.org
 Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Zhang, Helin
 Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice

 [+cc helin]

 On 2/2/2016 9:03 AM, Lu, Wenzhuo wrote:
> Hi Michael,
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Monday, February 1, 2016 4:05 PM
>> To: Lu, Wenzhuo; dev at dpdk.org
>> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming
>> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
>>
>> On 1/29/2016 4:07 PM, Lu, Wenzhuo wrote:
>>> Hi Michael,
>>>
 -Original Message-
 From: Qiu, Michael
 Sent: Friday, January 29, 2016 1:58 PM
 To: dev at dpdk.org
 Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Lu, Wenzhuo; Qiu,
 Michael
 Subject: [PATCH v2] ixgbe: Fix disable interrupt twice

 Currently, ixgbe vf and pf will disable interrupt twice in stop
 stage and uninit stage. It will cause an error:

 testpmd> quit

 Shutting down port 0...
 Stopping ports...
 Done
 Closing ports...
 EAL: Error disabling MSI-X interrupts for fd 26
 Done

 Becasue the interrupt already been disabled in stop stage.
 Since it is enabled in init stage, better remove from stop stage.
>>> I'm afraid it?s not a good idea to just remove the intr_disable
>>> from
 dev_stop.
>>> I think dev_stop have the chance to be used independently with
>>> dev_unint. In
>> this scenario, we still need intr_disable, right?
>>> Maybe what we need is some check before we disable the intr:)
>> Yes, indeed we need some check in disable intr, but it need
>> additional fields in "struct rte_intr_handle",  and it's much saft
>> to do so, but as I check i40e/fm10k code, only ixgbe disable it in
>> dev_stop().
> I found fm10k doesn?t enable intr in dev_start. So, I think it's OK.
> But i40e
 enables intr in dev_start.
> To my opinion, it's more like i40e misses the intr_disable in dev_stop.
 I don't think i40e miss it, because it not the right please to disable 
 interrupt.
 because all interrupts are enabled in init stage.

 Actually, ixgbe enable the interrupt in init stage, but in dev_start,
 it disable it first and re-enable, so it just the same with doing nothing 
 about
>> interrupt.
 Just think below:

 1. start the port.(interrupt already enabled in init stage, disable
 -->
 re-enable)
 2. stop the port.(disable interrupt)
 3. start port again(Try to disable, but failed, already disabled)

 Would you think the code has issue?
>>> [Zhang, Helin] in ixgbe PMD, it can be seen that uninit() calls
>>> dev_close(), which calls dev_stop(). So I think the disabling can be done 
>>> only in
>> dev_stop().
>>> All others can make use of dev_stop to disable the interrupt.
>> As I said, if it is in dev_stop, it will has issue when dev_start --> 
>> dev_stop -->
>> dev_start, this also could applied in i40e and fm10k. If you want to put it 
>> in
>> dev_stop, better to remove enable interrupts in init stage, and only put it 
>> in
>> dev_start.
> Oh, yes, you are talking about the refactoring. That's good, and reasonable.
> Please do more validation with LSC, mailbox, rx interrupts, to make sure there
> is no issue introduced.

I have no plan to do code refactor, it includes lots of validation, and
will influence many components, time is limited for 2.3. I would like
keep it in uninit and remove it from stop, this only affect ixgbe, and I
have done validation for it.

Thanks,
Michael
> Thanks,
> Helin
>
>> Thanks,
>> Michael
>>> Regards,
>>> Helin
>>>
 Thanks,
 Michael

> Maybe we can follow fm10k's style.
>
>> On other hand, if we remove it in dev_stop, any side effect? In
>> ixgbe start, it will always disable it first and then re-enable it, so 
>> it's safe.
> I think you mean we can disable intr anyway even if it has been disabled.
 Actually, we couldn't, DPDK call VFIO ioctl to kernel to disable
 interrupts, and if we try disable twice, it will return and error.
 That's why I mean we need a flag to show the interrupts stats. If it
 already disabled, we do not need call in to kernel. just return and
 give a warning message.

 Thanks,
 Michael

>  Sounds more like why

[dpdk-dev] [PATCH 1/3] fm10k: enable FTAG based forwarding

2016-02-02 Thread Qiu, Michael

On 1/25/2016 4:08 PM, Wang Xiao W wrote:
> This patch enables reading sglort info into mbuf for RX and inserting
> an FTAG at the beginning of the packet for TX. The vlan_tci_outer field
> selected from rte_mbuf structure for sglort is not used in fm10k now.
> In FTAG based forwarding mode, the switch will forward packets according
> to glort info in FTAG rather than mac and vlan table.
>
> To activate this feature, user needs to turn 
> ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
> to y in common_linuxapp or common_bsdapp. Currently this feature is supported
> only on PF, because FM10K_PFVTCTL register is read-only for VF.
>
> Signed-off-by: Wang Xiao W 
> ---
>  config/common_bsdapp   |  1 +
>  config/common_linuxapp |  1 +
>  drivers/net/fm10k/fm10k_ethdev.c   |  8 
>  drivers/net/fm10k/fm10k_rxtx.c | 17 +
>  drivers/net/fm10k/fm10k_rxtx_vec.c |  9 +
>  5 files changed, 36 insertions(+)
>
> diff --git a/config/common_bsdapp b/config/common_bsdapp
> index ed7c31c..451f81a 100644
> --- a/config/common_bsdapp
> +++ b/config/common_bsdapp
> @@ -208,6 +208,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX=n
>  CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
>  CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
>  CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
> +CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n
>  
>  #
>  # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 74bc515..c928bce 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -207,6 +207,7 @@ CONFIG_RTE_LIBRTE_FM10K_DEBUG_TX_FREE=n
>  CONFIG_RTE_LIBRTE_FM10K_DEBUG_DRIVER=n
>  CONFIG_RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y
>  CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y
> +CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD=n
>  
>  #
>  # Compile burst-oriented Mellanox ConnectX-3 (MLX4) PMD
> diff --git a/drivers/net/fm10k/fm10k_ethdev.c 
> b/drivers/net/fm10k/fm10k_ethdev.c
> index e4aed94..cc8317f 100644
> --- a/drivers/net/fm10k/fm10k_ethdev.c
> +++ b/drivers/net/fm10k/fm10k_ethdev.c
> @@ -668,6 +668,14 @@ fm10k_dev_tx_init(struct rte_eth_dev *dev)
>   PMD_INIT_LOG(ERR, "failed to disable queue %d", i);
>   return -1;
>   }
> +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> + /* Enable use of FTAG bit in TX descriptor, PFVTCTL
> +  * register is read-only for VF.
> +  */
> + if (hw->mac.type == fm10k_mac_pf)
> + FM10K_WRITE_REG(hw, FM10K_PFVTCTL(i),
> + FM10K_PFVTCTL_FTAG_DESC_ENABLE);

So here if somebody enable FTAG, when compile, but he use VF, what will
happen? We'd better to give a error message when he try to use VF with FTAG.

Thanks,
Michael
> +#endif
>  
>   /* set location and size for descriptor ring */
>   FM10K_WRITE_REG(hw, FM10K_TDBAL(i),
> diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
> index e958865..f87987d 100644
> --- a/drivers/net/fm10k/fm10k_rxtx.c
> +++ b/drivers/net/fm10k/fm10k_rxtx.c
> @@ -152,6 +152,13 @@ fm10k_recv_pkts(void *rx_queue, struct rte_mbuf 
> **rx_pkts,
>*/
>   mbuf->ol_flags |= PKT_RX_VLAN_PKT;
>   mbuf->vlan_tci = desc.w.vlan;
> +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> + /**
> +  * mbuf->vlan_tci_outer is an idle field in fm10k driver,
> +  * so it can be selected to store sglort value.
> +  */
> + mbuf->vlan_tci_outer = rte_le_to_cpu_16(desc.w.sglort);
> +#endif
>  
>   rx_pkts[count] = mbuf;
>   if (++next_dd == q->nb_desc) {
> @@ -307,6 +314,13 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct 
> rte_mbuf **rx_pkts,
>*/
>   mbuf->ol_flags |= PKT_RX_VLAN_PKT;
>   first_seg->vlan_tci = desc.w.vlan;
> +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> + /**
> +  * mbuf->vlan_tci_outer is an idle field in fm10k driver,
> +  * so it can be selected to store sglort value.
> +  */
> + first_seg->vlan_tci_outer = rte_le_to_cpu_16(desc.w.sglort);
> +#endif
>  
>   /* Prefetch data of first segment, if configured to do so. */
>   rte_packet_prefetch((char *)first_seg->buf_addr +
> @@ -432,6 +446,9 @@ static inline void tx_xmit_pkt(struct fm10k_tx_queue *q, 
> struct rte_mbuf *mb)
>   q->nb_free -= mb->nb_segs;
>  
>   q->hw_ring[q->next_free].flags = 0;
> +#ifdef RTE_LIBRTE_FM10K_FTAG_FWD
> + q->hw_ring[q->next_free].flags |= FM10K_TXD_FLAG_FTAG;
> +#endif
>   /* set checksum flags on first descriptor of packet. SCTP checksum
>* offload is not supported, but we do not explicitly check for this
>* case in favor of greatly simplified processing. */
> diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
> b/drivers/net/fm10k/fm10k_rxtx_vec.c
> index 2a57eef..0b0f2e3 100644
> ---

[dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice

2016-02-02 Thread Zhang, Helin



> -Original Message-
> From: Qiu, Michael
> Sent: Tuesday, February 2, 2016 10:57 AM
> To: Zhang, Helin ; Lu, Wenzhuo
> ; dev at dpdk.org
> Cc: Zhou, Danny ; Liu, Yong ;
> Liang, Cunming 
> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> 
> On 2/2/2016 10:14 AM, Zhang, Helin wrote:
> >
> >> -Original Message-
> >> From: Qiu, Michael
> >> Sent: Tuesday, February 2, 2016 10:07 AM
> >> To: Lu, Wenzhuo; dev at dpdk.org
> >> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Zhang, Helin
> >> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> >>
> >> [+cc helin]
> >>
> >> On 2/2/2016 9:03 AM, Lu, Wenzhuo wrote:
> >>> Hi Michael,
> >>>
>  -Original Message-
>  From: Qiu, Michael
>  Sent: Monday, February 1, 2016 4:05 PM
>  To: Lu, Wenzhuo; dev at dpdk.org
>  Cc: Zhou, Danny; Liu, Yong; Liang, Cunming
>  Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> 
>  On 1/29/2016 4:07 PM, Lu, Wenzhuo wrote:
> > Hi Michael,
> >
> >> -Original Message-
> >> From: Qiu, Michael
> >> Sent: Friday, January 29, 2016 1:58 PM
> >> To: dev at dpdk.org
> >> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Lu, Wenzhuo; Qiu,
> >> Michael
> >> Subject: [PATCH v2] ixgbe: Fix disable interrupt twice
> >>
> >> Currently, ixgbe vf and pf will disable interrupt twice in stop
> >> stage and uninit stage. It will cause an error:
> >>
> >> testpmd> quit
> >>
> >> Shutting down port 0...
> >> Stopping ports...
> >> Done
> >> Closing ports...
> >> EAL: Error disabling MSI-X interrupts for fd 26
> >> Done
> >>
> >> Becasue the interrupt already been disabled in stop stage.
> >> Since it is enabled in init stage, better remove from stop stage.
> > I'm afraid it's not a good idea to just remove the intr_disable
> > from
> >> dev_stop.
> > I think dev_stop have the chance to be used independently with
> > dev_unint. In
>  this scenario, we still need intr_disable, right?
> > Maybe what we need is some check before we disable the intr:)
>  Yes, indeed we need some check in disable intr, but it need
>  additional fields in "struct rte_intr_handle",  and it's much saft
>  to do so, but as I check i40e/fm10k code, only ixgbe disable it in
> dev_stop().
> >>> I found fm10k doesn't enable intr in dev_start. So, I think it's OK.
> >>> But i40e
> >> enables intr in dev_start.
> >>> To my opinion, it's more like i40e misses the intr_disable in dev_stop.
> >> I don't think i40e miss it, because it not the right please to disable 
> >> interrupt.
> >> because all interrupts are enabled in init stage.
> >>
> >> Actually, ixgbe enable the interrupt in init stage, but in dev_start,
> >> it disable it first and re-enable, so it just the same with doing nothing 
> >> about
> interrupt.
> >>
> >> Just think below:
> >>
> >> 1. start the port.(interrupt already enabled in init stage, disable
> >> -->
> >> re-enable)
> >> 2. stop the port.(disable interrupt)
> >> 3. start port again(Try to disable, but failed, already disabled)
> >>
> >> Would you think the code has issue?
> > [Zhang, Helin] in ixgbe PMD, it can be seen that uninit() calls
> > dev_close(), which calls dev_stop(). So I think the disabling can be done 
> > only in
> dev_stop().
> > All others can make use of dev_stop to disable the interrupt.
> 
> As I said, if it is in dev_stop, it will has issue when dev_start --> 
> dev_stop -->
> dev_start, this also could applied in i40e and fm10k. If you want to put it in
> dev_stop, better to remove enable interrupts in init stage, and only put it in
> dev_start.
Oh, yes, you are talking about the refactoring. That's good, and reasonable.
Please do more validation with LSC, mailbox, rx interrupts, to make sure there
is no issue introduced.

Thanks,
Helin

> 
> Thanks,
> Michael
> > Regards,
> > Helin
> >
> >> Thanks,
> >> Michael
> >>
> >>> Maybe we can follow fm10k's style.
> >>>
>  On other hand, if we remove it in dev_stop, any side effect? In
>  ixgbe start, it will always disable it first and then re-enable it, so 
>  it's safe.
> >>> I think you mean we can disable intr anyway even if it has been disabled.
> >> Actually, we couldn't, DPDK call VFIO ioctl to kernel to disable
> >> interrupts, and if we try disable twice, it will return and error.
> >> That's why I mean we need a flag to show the interrupts stats. If it
> >> already disabled, we do not need call in to kernel. just return and
> >> give a warning message.
> >>
> >> Thanks,
> >> Michael
> >>
> >>>  Sounds more like why we don't
> >>> need this patch :)
> >>>
>  Thanks,
>  Michael
> >

[dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice

2016-02-02 Thread Qiu, Michael

On 2/2/2016 10:14 AM, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Tuesday, February 2, 2016 10:07 AM
>> To: Lu, Wenzhuo; dev at dpdk.org
>> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Zhang, Helin
>> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
>>
>> [+cc helin]
>>
>> On 2/2/2016 9:03 AM, Lu, Wenzhuo wrote:
>>> Hi Michael,
>>>
 -Original Message-
 From: Qiu, Michael
 Sent: Monday, February 1, 2016 4:05 PM
 To: Lu, Wenzhuo; dev at dpdk.org
 Cc: Zhou, Danny; Liu, Yong; Liang, Cunming
 Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice

 On 1/29/2016 4:07 PM, Lu, Wenzhuo wrote:
> Hi Michael,
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Friday, January 29, 2016 1:58 PM
>> To: dev at dpdk.org
>> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Lu, Wenzhuo; Qiu,
>> Michael
>> Subject: [PATCH v2] ixgbe: Fix disable interrupt twice
>>
>> Currently, ixgbe vf and pf will disable interrupt twice in stop
>> stage and uninit stage. It will cause an error:
>>
>> testpmd> quit
>>
>> Shutting down port 0...
>> Stopping ports...
>> Done
>> Closing ports...
>> EAL: Error disabling MSI-X interrupts for fd 26
>> Done
>>
>> Becasue the interrupt already been disabled in stop stage.
>> Since it is enabled in init stage, better remove from stop stage.
> I'm afraid it?s not a good idea to just remove the intr_disable from
>> dev_stop.
> I think dev_stop have the chance to be used independently with
> dev_unint. In
 this scenario, we still need intr_disable, right?
> Maybe what we need is some check before we disable the intr:)
 Yes, indeed we need some check in disable intr, but it need
 additional fields in "struct rte_intr_handle",  and it's much saft to
 do so, but as I check i40e/fm10k code, only ixgbe disable it in dev_stop().
>>> I found fm10k doesn?t enable intr in dev_start. So, I think it's OK. But 
>>> i40e
>> enables intr in dev_start.
>>> To my opinion, it's more like i40e misses the intr_disable in dev_stop.
>> I don't think i40e miss it, because it not the right please to disable 
>> interrupt.
>> because all interrupts are enabled in init stage.
>>
>> Actually, ixgbe enable the interrupt in init stage, but in dev_start, it 
>> disable it
>> first and re-enable, so it just the same with doing nothing about interrupt.
>>
>> Just think below:
>>
>> 1. start the port.(interrupt already enabled in init stage, disable -->
>> re-enable)
>> 2. stop the port.(disable interrupt)
>> 3. start port again(Try to disable, but failed, already disabled)
>>
>> Would you think the code has issue?
> [Zhang, Helin] in ixgbe PMD, it can be seen that uninit() calls dev_close(),
> which calls dev_stop(). So I think the disabling can be done only in 
> dev_stop().
> All others can make use of dev_stop to disable the interrupt.

As I said, if it is in dev_stop, it will has issue when dev_start -->
dev_stop --> dev_start, this also could applied in i40e and fm10k. If
you want to put it in dev_stop, better to remove enable interrupts in
init stage, and only put it in dev_start.

Thanks,
Michael
> Regards,
> Helin
>
>> Thanks,
>> Michael
>>
>>> Maybe we can follow fm10k's style.
>>>
 On other hand, if we remove it in dev_stop, any side effect? In ixgbe
 start, it will always disable it first and then re-enable it, so it's safe.
>>> I think you mean we can disable intr anyway even if it has been disabled.
>> Actually, we couldn't, DPDK call VFIO ioctl to kernel to disable interrupts, 
>> and
>> if we try disable twice, it will return and error.
>> That's why I mean we need a flag to show the interrupts stats. If it already
>> disabled, we do not need call in to kernel. just return and give a warning
>> message.
>>
>> Thanks,
>> Michael
>>
>>>  Sounds more like why we don't
>>> need this patch :)
>>>
 Thanks,
 Michael
>

[dpdk-dev] i40evf DPDK init_adminq failed: -53

2016-02-02 Thread Zhang, Helin

Hi Saurabh

XL710/X710 firmware update tool/images can be found on www.intel.com, that?s 
the formal way.
Please try there first.

Regards,
Helin

From: Saurabh Mishra [mailto:saurabh.gl...@gmail.com]
Sent: Tuesday, February 2, 2016 2:25 AM
To: Qiu, Michael
Cc: dev at dpdk.org; users at dpdk.org; Zhang, Helin; Xu, Qian Q
Subject: Re: [dpdk-dev] i40evf DPDK init_adminq failed: -53

Hi Michael --

What are the steps to upgrade i40e firmware. We are using CentOS7

It didn't work with guest VF driver either on ESXi and KVM.

Sure. I will blacklist i40evf driver and try it out.

Thanks,
/Saurabh


On Mon, Feb 1, 2016 at 12:16 AM, Qiu, Michael mailto:michael.qiu at intel.com>> wrote:
Hi, Saurabh

It's known issue, to fix this you'd better to upgrade the firmware
version of i40e.

BTW, will it work in guest with kernel driver?

If yes, to workaround(somebody reports it does not work for them):
Remove i40e.ko in guest, so that it will not auto-loaded when boot up.

Hope it works for you.

Thanks,
Michael


On 1/30/2016 4:35 AM, Saurabh Mishra wrote:
> Has anybody seen this before? What's the workaround or fix? We are using
> dpdk-2.2.0 on KVM centos:
>
> Host PF version: 1.0.11-k on Centos7
>
>
> [root@ ~]# ./symmetric_mp fakeelf -c 2 -m2048 -n4 --proc-type=primary -- -p
> 3 --num-procs=2 --proc-id=0
>
> [.]
>
> EAL: Virtual area found at 0x7fff7580 (size = 0x20)
>
> EAL: Requesting 1024 pages of size 2MB from socket 0
>
> EAL: TSC frequency is ~2600141 KHz
>
> EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable
> clock cycles !
>
> EAL: Master lcore 1 is ready (tid=f7fed880;cpuset=[1])
>
> EAL: PCI device :00:04.0 on NUMA socket 0
>
> EAL:   probe driver: 8086:154c rte_i40evf_pmd
>
> EAL:   PCI memory mapped at 0x7620
>
> EAL:   PCI memory mapped at 0x7621
>
> EAL: PCI device :00:05.0 on NUMA socket 0
>
> EAL:   probe driver: 8086:154c rte_i40evf_pmd
>
> EAL:   PCI memory mapped at 0x76214000
>
> EAL:   PCI memory mapped at 0x76224000
>
> PMD: i40evf_init_vf(): init_adminq failed: -53
>
> PMD: i40evf_dev_init(): Init vf failed
>
> EAL: Error - exiting with code: 1
>
>   Cause: Requested device :00:05.0 cannot be used
>
> [root at PA-VM ~]# ./dpdk-2.2.0/tools/dpdk_nic_bind.py --status
>
>
> Network devices using DPDK-compatible driver
>
> 
>
> :00:04.0 'Device 154c' drv=igb_uio unused=uio_pci_generic
>
> :00:05.0 'Device 154c' drv=igb_uio unused=uio_pci_generic
>
>
> Network devices using kernel driver
>
> ===
>
> :00:03.0 'RTL-8139/8139C/8139C+' if=eth0 drv=8139cp
> unused=igb_uio,uio_pci_generic *Active*
>
>
> Other network devices
>
> =
>
> 
>
> [root@ ~]#
>
>
>
> 04:00.0 *Ether*net controller: Intel Corporation *Ether*net Controller X710
> for 10GbE SFP+ (rev 01)
>
> 04:00.1 *Ether*net controller: Intel Corporation *Ether*net Controller X710
> for 10GbE SFP+ (rev 01)
>
> 04:02.0 *Ether*net controller: Intel Corporation XL710/X710 Virtual
> Function (rev 01)
>
> 04:02.1 *Ether*net controller: Intel Corporation XL710/X710 Virtual
> Function (rev 01)
>
> 04:0a.0 *Ether*net controller: Intel Corporation XL710/X710 Virtual
> Function (rev 01)
>
> 04:0a.1 *Ether*net controller: Intel Corporation XL710/X710 Virtual
> Function (rev 01)
>
>
>
> [root at oscompute3 ~]# dmesg | tail
>
> [2064188.042835] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.062836] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.082862] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.102838] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.122850] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.142852] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.162850] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.182845] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.202845] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [2064188.222858] i40evf :04:0a.1: i40evf_add_ether_addrs: command 15
> pending
>
> [root at oscompute3 ~]#
>
>
> /Saurabh
>

[dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice

2016-02-02 Thread Lu, Wenzhuo

Hi Michael,

Acked-by: Wenzhuo Lu 

> -Original Message-
> From: Qiu, Michael
> Sent: Tuesday, February 2, 2016 10:07 AM
> To: Lu, Wenzhuo; dev at dpdk.org
> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Zhang, Helin
> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> 
> [+cc helin]
> 
> On 2/2/2016 9:03 AM, Lu, Wenzhuo wrote:
> > Hi Michael,
> >
> >> -Original Message-
> >> From: Qiu, Michael
> >> Sent: Monday, February 1, 2016 4:05 PM
> >> To: Lu, Wenzhuo; dev at dpdk.org
> >> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming
> >> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> >>
> >> On 1/29/2016 4:07 PM, Lu, Wenzhuo wrote:
> >>> Hi Michael,
> >>>
>  -Original Message-
>  From: Qiu, Michael
>  Sent: Friday, January 29, 2016 1:58 PM
>  To: dev at dpdk.org
>  Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Lu, Wenzhuo; Qiu,
>  Michael
>  Subject: [PATCH v2] ixgbe: Fix disable interrupt twice
> 
>  Currently, ixgbe vf and pf will disable interrupt twice in stop
>  stage and uninit stage. It will cause an error:
> 
>  testpmd> quit
> 
>  Shutting down port 0...
>  Stopping ports...
>  Done
>  Closing ports...
>  EAL: Error disabling MSI-X interrupts for fd 26
>  Done
> 
>  Becasue the interrupt already been disabled in stop stage.
>  Since it is enabled in init stage, better remove from stop stage.
> >>> I'm afraid it's not a good idea to just remove the intr_disable from 
> >>> dev_stop.
> >>> I think dev_stop have the chance to be used independently with
> >>> dev_unint. In
> >> this scenario, we still need intr_disable, right?
> >>> Maybe what we need is some check before we disable the intr:)
> >> Yes, indeed we need some check in disable intr, but it need
> >> additional fields in "struct rte_intr_handle",  and it's much saft to
> >> do so, but as I check i40e/fm10k code, only ixgbe disable it in dev_stop().
> > I found fm10k doesn't enable intr in dev_start. So, I think it's OK. But 
> > i40e
> enables intr in dev_start.
> > To my opinion, it's more like i40e misses the intr_disable in dev_stop.
> 
> I don't think i40e miss it, because it not the right please to disable 
> interrupt.
> because all interrupts are enabled in init stage.
> 
> Actually, ixgbe enable the interrupt in init stage, but in dev_start, it 
> disable it first
> and re-enable, so it just the same with doing nothing about interrupt.
> 
> Just think below:
> 
> 1. start the port.(interrupt already enabled in init stage, disable -->
> re-enable)
> 2. stop the port.(disable interrupt)
> 3. start port again(Try to disable, but failed, already disabled)
> 
> Would you think the code has issue?
Got your point. So, dev_start/stop will not influence the state of intr 
enabling/disabling.
The intr will be enabled/disabled during dev_init/unint. 
Thanks.

> 
> Thanks,
> Michael
> 
> > Maybe we can follow fm10k's style.
> >
> >> On other hand, if we remove it in dev_stop, any side effect? In ixgbe
> >> start, it will always disable it first and then re-enable it, so it's safe.
> > I think you mean we can disable intr anyway even if it has been disabled.
> 
> Actually, we couldn't, DPDK call VFIO ioctl to kernel to disable interrupts, 
> and if
> we try disable twice, it will return and error.
> That's why I mean we need a flag to show the interrupts stats. If it already
> disabled, we do not need call in to kernel. just return and give a warning
> message.
> 
> Thanks,
> Michael
> 
> >  Sounds more like why we don't
> > need this patch :)
> >
> >> Thanks,
> >> Michael
> >

[dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice

2016-02-02 Thread Zhang, Helin



> -Original Message-
> From: Qiu, Michael
> Sent: Tuesday, February 2, 2016 10:07 AM
> To: Lu, Wenzhuo; dev at dpdk.org
> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Zhang, Helin
> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> 
> [+cc helin]
> 
> On 2/2/2016 9:03 AM, Lu, Wenzhuo wrote:
> > Hi Michael,
> >
> >> -Original Message-
> >> From: Qiu, Michael
> >> Sent: Monday, February 1, 2016 4:05 PM
> >> To: Lu, Wenzhuo; dev at dpdk.org
> >> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming
> >> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> >>
> >> On 1/29/2016 4:07 PM, Lu, Wenzhuo wrote:
> >>> Hi Michael,
> >>>
>  -Original Message-
>  From: Qiu, Michael
>  Sent: Friday, January 29, 2016 1:58 PM
>  To: dev at dpdk.org
>  Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Lu, Wenzhuo; Qiu,
>  Michael
>  Subject: [PATCH v2] ixgbe: Fix disable interrupt twice
> 
>  Currently, ixgbe vf and pf will disable interrupt twice in stop
>  stage and uninit stage. It will cause an error:
> 
>  testpmd> quit
> 
>  Shutting down port 0...
>  Stopping ports...
>  Done
>  Closing ports...
>  EAL: Error disabling MSI-X interrupts for fd 26
>  Done
> 
>  Becasue the interrupt already been disabled in stop stage.
>  Since it is enabled in init stage, better remove from stop stage.
> >>> I'm afraid it's not a good idea to just remove the intr_disable from
> dev_stop.
> >>> I think dev_stop have the chance to be used independently with
> >>> dev_unint. In
> >> this scenario, we still need intr_disable, right?
> >>> Maybe what we need is some check before we disable the intr:)
> >> Yes, indeed we need some check in disable intr, but it need
> >> additional fields in "struct rte_intr_handle",  and it's much saft to
> >> do so, but as I check i40e/fm10k code, only ixgbe disable it in dev_stop().
> > I found fm10k doesn't enable intr in dev_start. So, I think it's OK. But 
> > i40e
> enables intr in dev_start.
> > To my opinion, it's more like i40e misses the intr_disable in dev_stop.
> 
> I don't think i40e miss it, because it not the right please to disable 
> interrupt.
> because all interrupts are enabled in init stage.
> 
> Actually, ixgbe enable the interrupt in init stage, but in dev_start, it 
> disable it
> first and re-enable, so it just the same with doing nothing about interrupt.
> 
> Just think below:
> 
> 1. start the port.(interrupt already enabled in init stage, disable -->
> re-enable)
> 2. stop the port.(disable interrupt)
> 3. start port again(Try to disable, but failed, already disabled)
> 
> Would you think the code has issue?
[Zhang, Helin] in ixgbe PMD, it can be seen that uninit() calls dev_close(),
which calls dev_stop(). So I think the disabling can be done only in dev_stop().
All others can make use of dev_stop to disable the interrupt.

Regards,
Helin

> 
> Thanks,
> Michael
> 
> > Maybe we can follow fm10k's style.
> >
> >> On other hand, if we remove it in dev_stop, any side effect? In ixgbe
> >> start, it will always disable it first and then re-enable it, so it's safe.
> > I think you mean we can disable intr anyway even if it has been disabled.
> 
> Actually, we couldn't, DPDK call VFIO ioctl to kernel to disable interrupts, 
> and
> if we try disable twice, it will return and error.
> That's why I mean we need a flag to show the interrupts stats. If it already
> disabled, we do not need call in to kernel. just return and give a warning
> message.
> 
> Thanks,
> Michael
> 
> >  Sounds more like why we don't
> > need this patch :)
> >
> >> Thanks,
> >> Michael
> >

[dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice

2016-02-02 Thread Qiu, Michael

[+cc helin]

On 2/2/2016 9:03 AM, Lu, Wenzhuo wrote:
> Hi Michael,
>
>> -Original Message-
>> From: Qiu, Michael
>> Sent: Monday, February 1, 2016 4:05 PM
>> To: Lu, Wenzhuo; dev at dpdk.org
>> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming
>> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
>>
>> On 1/29/2016 4:07 PM, Lu, Wenzhuo wrote:
>>> Hi Michael,
>>>
 -Original Message-
 From: Qiu, Michael
 Sent: Friday, January 29, 2016 1:58 PM
 To: dev at dpdk.org
 Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Lu, Wenzhuo; Qiu, Michael
 Subject: [PATCH v2] ixgbe: Fix disable interrupt twice

 Currently, ixgbe vf and pf will disable interrupt twice in stop stage
 and uninit stage. It will cause an error:

 testpmd> quit

 Shutting down port 0...
 Stopping ports...
 Done
 Closing ports...
 EAL: Error disabling MSI-X interrupts for fd 26
 Done

 Becasue the interrupt already been disabled in stop stage.
 Since it is enabled in init stage, better remove from stop stage.
>>> I'm afraid it?s not a good idea to just remove the intr_disable from 
>>> dev_stop.
>>> I think dev_stop have the chance to be used independently with dev_unint. In
>> this scenario, we still need intr_disable, right?
>>> Maybe what we need is some check before we disable the intr:)
>> Yes, indeed we need some check in disable intr, but it need additional 
>> fields in
>> "struct rte_intr_handle",  and it's much saft to do so, but as I check 
>> i40e/fm10k
>> code, only ixgbe disable it in dev_stop().
> I found fm10k doesn?t enable intr in dev_start. So, I think it's OK. But i40e 
> enables intr in dev_start.
> To my opinion, it's more like i40e misses the intr_disable in dev_stop.

I don't think i40e miss it, because it not the right please to disable
interrupt. because all interrupts are enabled in init stage.

Actually, ixgbe enable the interrupt in init stage, but in dev_start, it
disable it first and re-enable, so it just the same with doing nothing
about interrupt.

Just think below:

1. start the port.(interrupt already enabled in init stage, disable -->
re-enable)
2. stop the port.(disable interrupt)
3. start port again(Try to disable, but failed, already disabled)

Would you think the code has issue?

Thanks,
Michael

> Maybe we can follow fm10k's style.
>
>> On other hand, if we remove it in dev_stop, any side effect? In ixgbe start, 
>> it will
>> always disable it first and then re-enable it, so it's safe.
> I think you mean we can disable intr anyway even if it has been disabled.

Actually, we couldn't, DPDK call VFIO ioctl to kernel to disable
interrupts, and if we try disable twice, it will return and error.
That's why I mean we need a flag to show the interrupts stats. If it
already disabled, we do not need call in to kernel. just return and give
a warning message.

Thanks,
Michael

>  Sounds more like why we don't
> need this patch :)
>
>> Thanks,
>> Michael
>

[dpdk-dev] [PATCH 0/8] support E-tag offloading and forwarding on Intel X550 NIC

2016-02-02 Thread Lu, Wenzhuo

Hi Michael,

> -Original Message-
> From: Qiu, Michael
> Sent: Monday, February 1, 2016 4:32 PM
> To: Yuanhan Liu; Lu, Wenzhuo
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/8] support E-tag offloading and forwarding on
> Intel X550 NIC
> 
> On 2/1/2016 9:38 AM, Yuanhan Liu wrote:
> > On Mon, Feb 01, 2016 at 01:04:52AM +, Lu, Wenzhuo wrote:
> >> Hi,
> >>
> >>> -Original Message-
> >>> From: Qiu, Michael
> >>> Sent: Friday, January 29, 2016 3:16 PM
> >>> To: Lu, Wenzhuo; dev at dpdk.org
> >>> Subject: Re: [dpdk-dev] [PATCH 0/8] support E-tag offloading and
> >>> forwarding on Intel X550 NIC
> >>>
> >>> Hi, Wenzhuo
> >>>
> >>> Better to explain what E-tag is, so that reviewers could known it.
> >> Yes, it's better. But not sure where should I add this info. In release 
> >> note or
> just cover letter? Any suggestion? Thanks.
> > It should be done in the first patch introduced E-tag, so that it will
> > be in the git log history. And of course, it does no harm at all to
> > mention (briefly) it again in cover letter, so that
> > reviewer/maintainer has a brief understanding of your whole patchset first.
> >
> > --yliu
> 
> Yes, in my view, cover letter is a good place, but as want to be in git log 
> history,
> it would be better to include in the right patch of the feature, because most 
> of
> time, the first patch is not core related to new feature, perhaps only some
> prepare code.
> 
> 
> My opinion is to explain it where it first be mentioned in the code.
I think it's good to explain it when we meet this concept at the first time. 
I'll do that. Thanks.

> 
> But again, it OK for Yuanhan's solution, the only thing you want to do is 
> think
> you are a reviewer, and want to review you patch, what do you want.
> 
> Thanks,
> Michael

[dpdk-dev] [PATCH v7 3/5] ethdev: redesign link speed config API

2016-02-02 Thread Marc

On 1 February 2016 at 01:40, Zhang, Helin  wrote:

>
>
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Friday, January 29, 2016 6:18 PM
> > To: Thomas Monjalon
> > Cc: dev at dpdk.org; Marc Sune; Lu, Wenzhuo; Zhang, Helin; Harish Patil;
> Chen,
> > Jing D; Mcnamara, John
> > Subject: RE: [dpdk-dev] [PATCH v7 3/5] ethdev: redesign link speed config
> > API
> >
> >
> >
> > > -Original Message-
> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > Sent: Friday, January 29, 2016 9:54 AM
> > > To: Ananyev, Konstantin
> > > Cc: dev at dpdk.org; Marc Sune; Lu, Wenzhuo; Zhang, Helin; Harish Patil;
> > > Chen, Jing D; Mcnamara, John
> > > Subject: Re: [dpdk-dev] [PATCH v7 3/5] ethdev: redesign link speed
> > > config API
> > >
> > > 2016-01-29 09:47, Ananyev, Konstantin:
> > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > > 2016-01-29 09:24, Ananyev, Konstantin:
> > > > > > Can you avoid modifications in the e1000/base code?
> > > > > > We do not modify (and maintain) that part on our own.
> > > > > > Instead we take it straight from Intel ND.
> > > > > > So if you feel like these changes are really necessary - please
> > > > > > submit a patch to ND first, and if your changes will be applied,
> will pick
> > it up from them.
> > > > >
> > > > > I was not aware we can submit a change to ND for Intel base
> drivers.
> > > > > What is the procedure please?
> > > >
> > > > I meant not to the ND directly, but probably to the freebsd e1000
> kernel
> > driver.
> > > > As I remember, that is the closest one to what we have.
> > > > From my understanding (I might be wrong here):
> > > > If they will be accepted, we should see these changes In next code
> drops
> > from ND.
> > >
> > > These base drivers are used in several places.
> > > We are allowed to submit a patch in Linux or FreeBSD but not in DPDK
> > > where the base driver is verbatim?
> >
> > Yes, that's my understanding.
> >
> > > We have an agreement to not touch them in DPDK
> >
> > Yes.
> >
> > > but I still think the
> > > ND team could consider some patches from dpdk.org.
> >
> > I personally think that would be a good thing, but it is up to ND guys
> to make
> > such decision.
> [Zhang, Helin] The key reason of not touching base driver is we don't want
> to
> maintain those source files, and just reuse others.


So files under base/ strictly copies of what is in this other Intel
repository (ND) or there are modifications?

If IIRC rte_link was crafted so that matches e1000 (at least) so that link
reads can be done atomically. I think it makes more sense that ethdev has a
generic, device independent struct and that drivers handle the translation,
if necessary. Do we agree on this?

This can help us a lot.
> We should try to avoid touching source files in base driver, but if you
> still insist
> something critical or a bug should be faced. First of all we can try to do
> something
> in the dpdk developed source files (e.g. i40e_ethdev.c, i40e_rxtx.c,
> i40e_osdep.h).
> This was what we have done for a long time, and it works quite well.
> If there is no other way to fix a bug in base driver, we can try the way
> like
> Konstantin indicated, or let me know, I will try to influence ND. But note
> that this
> might be the lowest efficiency way, due to the complicated process.


> Sorry for any inconvenience! This the way we are using now might be the
> best for
> us right now.
>

I will go back and redesign commit 3 in the series once more. I will need
some time (other things in the pipeline). I would have appreciated rising
this red flag in one of the 6 previous versions.

marc


>
> Regards,
> Heiln
>
> >
> > Konstantin
>
>

[dpdk-dev] [PATCH v2] ixgbe: Fix disable interrupt twice

2016-02-02 Thread Lu, Wenzhuo

Hi Michael,

> -Original Message-
> From: Qiu, Michael
> Sent: Monday, February 1, 2016 4:05 PM
> To: Lu, Wenzhuo; dev at dpdk.org
> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming
> Subject: Re: [PATCH v2] ixgbe: Fix disable interrupt twice
> 
> On 1/29/2016 4:07 PM, Lu, Wenzhuo wrote:
> > Hi Michael,
> >
> >> -Original Message-
> >> From: Qiu, Michael
> >> Sent: Friday, January 29, 2016 1:58 PM
> >> To: dev at dpdk.org
> >> Cc: Zhou, Danny; Liu, Yong; Liang, Cunming; Lu, Wenzhuo; Qiu, Michael
> >> Subject: [PATCH v2] ixgbe: Fix disable interrupt twice
> >>
> >> Currently, ixgbe vf and pf will disable interrupt twice in stop stage
> >> and uninit stage. It will cause an error:
> >>
> >> testpmd> quit
> >>
> >> Shutting down port 0...
> >> Stopping ports...
> >> Done
> >> Closing ports...
> >> EAL: Error disabling MSI-X interrupts for fd 26
> >> Done
> >>
> >> Becasue the interrupt already been disabled in stop stage.
> >> Since it is enabled in init stage, better remove from stop stage.
> > I'm afraid it's not a good idea to just remove the intr_disable from 
> > dev_stop.
> > I think dev_stop have the chance to be used independently with dev_unint. In
> this scenario, we still need intr_disable, right?
> > Maybe what we need is some check before we disable the intr:)
> 
> Yes, indeed we need some check in disable intr, but it need additional fields 
> in
> "struct rte_intr_handle",  and it's much saft to do so, but as I check 
> i40e/fm10k
> code, only ixgbe disable it in dev_stop().
I found fm10k doesn't enable intr in dev_start. So, I think it's OK. But i40e 
enables intr in dev_start.
To my opinion, it's more like i40e misses the intr_disable in dev_stop.
Maybe we can follow fm10k's style.

> 
> On other hand, if we remove it in dev_stop, any side effect? In ixgbe start, 
> it will
> always disable it first and then re-enable it, so it's safe.
I think you mean we can disable intr anyway even if it has been disabled. 
Sounds more like why we don't
need this patch :)

> 
> Thanks,
> Michael
> >

[dpdk-dev] FW: [PATCH v7 3/5] ethdev: redesign link speed config API

2016-02-02 Thread Zhang, Helin



>From: marc.sune at gmail.com [mailto:marc.sune at gmail.com] On Behalf Of Marc
>Sent: Tuesday, February 2, 2016 8:04 AM
>To: Zhang, Helin
>Cc: Ananyev, Konstantin; Thomas Monjalon; dev at dpdk.org; Lu, Wenzhuo; Harish 
>Patil; Chen, Jing D; Mcnamara, John
>Subject: Re: [dpdk-dev] [PATCH v7 3/5] ethdev: redesign link speed config API
>
>
>
>On 1 February 2016 at 01:40, Zhang, Helin  wrote:
>
>
>> -Original Message-
>> From: Ananyev, Konstantin
>> Sent: Friday, January 29, 2016 6:18 PM
>> To: Thomas Monjalon
>> Cc: dev at dpdk.org; Marc Sune; Lu, Wenzhuo; Zhang, Helin; Harish Patil; 
>> Chen,
>> Jing D; Mcnamara, John
>> Subject: RE: [dpdk-dev] [PATCH v7 3/5] ethdev: redesign link speed config
>> API
>>
>>
>>
>> > -Original Message-
>> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> > Sent: Friday, January 29, 2016 9:54 AM
>> > To: Ananyev, Konstantin
>> > Cc: dev at dpdk.org; Marc Sune; Lu, Wenzhuo; Zhang, Helin; Harish Patil;
>> > Chen, Jing D; Mcnamara, John
>> > Subject: Re: [dpdk-dev] [PATCH v7 3/5] ethdev: redesign link speed
>> > config API
>> >
>> > 2016-01-29 09:47, Ananyev, Konstantin:
>> > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> > > > 2016-01-29 09:24, Ananyev, Konstantin:
>> > > > > Can you avoid modifications in the e1000/base code?
>> > > > > We do not modify (and maintain) that part on our own.
>> > > > > Instead we take it straight from Intel ND.
>> > > > > So if you feel like these changes are really necessary - please
>> > > > > submit a patch to ND first, and if your changes will be applied, 
>> > > > > will pick
>> it up from them.
>> > > >
>> > > > I was not aware we can submit a change to ND for Intel base drivers.
>> > > > What is the procedure please?
>> > >
>> > > I meant not to the ND directly, but probably to the freebsd e1000 kernel
>> driver.
>> > > As I remember, that is the closest one to what we have.
>> > > From my understanding (I might be wrong here):
>> > > If they will be accepted, we should see these changes In next code drops
>> from ND.
>> >
>> > These base drivers are used in several places.
>> > We are allowed to submit a patch in Linux or FreeBSD but not in DPDK
>> > where the base driver is verbatim?
>>
>> Yes, that's my understanding.
>>
>> > We have an agreement to not touch them in DPDK
>>
>> Yes.
>>
>> > but I still think the
>> > ND team could consider some patches from dpdk.org.
>>
>> I personally think that would be a good thing, but it is up to ND guys to 
>> make
>> such decision.
>[Zhang, Helin] The key reason of not touching base driver is we don't want to
>maintain those source files, and just reuse others. 
>
>So files under base/ strictly copies of what is in this other Intel repository 
>(ND) or there are modifications?
Long time ago, those source files were copied from FreeBSD version of driver, 
without any modification.
Though there is a bit different from that time, but they are similar.
Note that 'base/i40e_osdep.h or ixgbe_osdep.h' is the only file we can modify 
to support DPDK.
Of cause, all other files than base driver are what we developed, and can be 
modified.

>
>If IIRC rte_link was crafted so that matches e1000 (at least) so that link 
>reads can be done atomically. I think it makes more sense that ethdev has a 
>generic, device independent struct and that drivers handle the translation, if 
>necessary. Do we agree on this?
I agree with you to have a generic structure in ethdev, and it can be filled 
with information obtained from base driver.

>
>This can help us a lot.
>We should try to avoid touching source files in base driver, but if you still 
>insist
>something critical or a bug should be faced. First of all we can try to do 
>something
>in the dpdk developed source files (e.g. i40e_ethdev.c, i40e_rxtx.c, 
>i40e_osdep.h).
>This was what we have done for a long time, and it works quite well.
>If there is no other way to fix a bug in base driver, we can try the way like
>Konstantin indicated, or let me know, I will try to influence ND. But note 
>that this
>might be the lowest efficiency way, due to the complicated process.?
>
>Sorry for any inconvenience! This the way we are using now might be the best 
>for
>us right now.
>
>I will go back and redesign commit 3 in the series once more. I will need some 
>time (other things in the pipeline). I would have appreciated rising this red 
>flag in one of the 6 previous versions.
Thank you very much for the helps, and patience!

Regards,
Helin

>
>marc
>?
>
>Regards,
>Heiln
>
>>
>> Konstantin
>

80 matches

Mail list logo