[dpdk-dev] [PATCH] doc: announce ivshmem support removal

2016-07-22 Thread Hiroshi Shimamoto
Hi,

> Subject: [dpdk-dev] [PATCH] doc: announce ivshmem support removal
> 
> There was a prior call with an explanation of what needs to be done:
>   http://dpdk.org/ml/archives/dev/2016-June/040844.html
> - Qemu patch upstreamed
> - IVSHMEM PCI device managed by a PCI driver
> - No DPDK objects (ring/mempool) allocated by EAL
> 
> As nobody seems interested, it is time to remove this code which
> makes EAL improvements harder.

I'd like to confirm my understanding of the issue.
I know there are real users who rely on the ivshmem mechanism, e.g. spp users.
Unfortunately, they prefer not to voice their opinions to the community.
Furthermore, they may not have noticed this situation.

Anyway, the issue is that the current ivshmem implementation breaks the
EAL framework and is overly complicated, right?
IIUC, for DPDK, the ivshmem support module should be separated from the middle
of the EAL code and turned into a PCI driver. That means the current rte_ivshmem
should be removed. To keep the ability to share DPDK objects
between host and guest in shared memory, as ivshmem does, it should be
reimplemented cleanly.
Is my understanding correct?

thanks,
Hiroshi

> 
> Signed-off-by: Thomas Monjalon 
> ---
>  doc/guides/rel_notes/deprecation.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 9cadf6a..1ef8460 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -42,6 +42,9 @@ Deprecation Notices
>    will be removed in 16.11.
>    It is replaced by rte_mempool_generic_get/put functions.
>
> +* The ``rte_ivshmem`` feature (including library and EAL code) will be removed
> +  in 16.11 because it has some design issues which are not planned to be fixed.
> +
>  * The ethtool support will be removed from KNI in 16.11.
>    It is implemented only for igb and ixgbe.
>    It is really hard to maintain because it requires some out-of-tree kernel
> --
> 2.7.0



[dpdk-dev] daemon process problem in DPDK

2015-01-13 Thread Hiroshi Shimamoto
Hi,

> Subject: Re: [dpdk-dev] daemon process problem in DPDK
> 
> Much appreciated, got it now.
> 
> Thanks,
> Xun
> 
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, January 13, 2015 3:14 AM
> To: Neil Horman
> Cc: Ni, Xun; dev at dpdk.org
> Subject: Re: [dpdk-dev] daemon process problem in DPDK
> 
> On Mon, 12 Jan 2015 09:52:10 -0500
> Neil Horman  wrote:
> 
> > On Mon, Jan 12, 2015 at 02:28:20PM +, Ni, Xun wrote:
> > > Hello:
> > >
> > > I have some basic questions related to DPDK and am trying to find help.
> > >
> > > I am about to create a daemon process. Is there a way for other
> > > processes to know whether the daemon has already been created?
> > > I don't mean getting the pid, because it changes every time.
> > >
> > > If the daemon has been created, how do other processes communicate with
> > > it? DPDK seems to have rte ring, but it only exists on the Ethernet,
> > > while I am talking about processes within the same computer, in a way
> > > like shared memory, and I didn't find examples of shared memory
> > > between processes.
> > >
> > > Thanks,
> > > Xun
> > >
> > >
> >
> > That's not really a DPDK question; that's a generic programming question.
> > You can do this lots of ways.  Open a socket that other processes can
> > connect to on an agreed port, create a shared memory segment, write a
> > file with connection information to a well-known location, etc.
> > Neil
> >
> 
> We did have to make some changes to the basic application model (not in DPDK)
> to allow for a daemon.
>
> The normal/correct way to make a daemon is to use the glibc daemon() call, and
> this closes all file descriptors etc. Therefore the DPDK (EAL) must be
> initialized after the daemon call.

How about having a daemon option in the DPDK EAL?

I think that many network service programs run as daemons.
If DPDK had a daemon option, it might be helpful.

thanks,
Hiroshi
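
The ordering discussed in this thread (application options first, then an
optional daemon(3) call, then EAL initialization) could be sketched roughly as
follows. This is only an illustration under stated assumptions: the `-d` flag,
the `--` separator, and the names `parse_app_args` and `start_app` are
hypothetical and are not part of DPDK. The EAL entry point is passed in as a
function pointer so that the sketch builds without DPDK; a real application
would pass rte_eal_init. Since daemon() forks, anything the EAL sets up
(for example its threads) has to be created after that call.

```c
#define _DEFAULT_SOURCE        /* expose daemon() in glibc headers */
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: scan the application options that precede a "--"
 * separator. Returns the index of "--" (the number of arguments to skip
 * before handing the rest to the EAL), or -1 if no separator is present.
 */
int
parse_app_args(int argc, char **argv, bool *daemon_mode)
{
	int i;

	assert(argc >= 1 && argv != NULL && daemon_mode != NULL);
	*daemon_mode = false;

	for (i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--") == 0)
			return i;
		if (strcmp(argv[i], "-d") == 0)	/* hypothetical daemon flag */
			*daemon_mode = true;
	}
	return -1;
}

/*
 * Daemonize if requested, and only then initialize the EAL with the
 * remaining arguments. eal_init stands in for rte_eal_init().
 */
int
start_app(int argc, char **argv, int (*eal_init)(int, char **))
{
	bool daemon_mode;
	char *progname = argv[0];
	int skip = parse_app_args(argc, argv, &daemon_mode);

	if (skip < 0)
		return -1;

	argc -= skip;
	argv += skip;

	/* daemon(1, 1): keep the working directory and the standard fds. */
	if (daemon_mode && daemon(1, 1) < 0)
		return -1;

	argv[0] = progname;	/* the EAL expects the program name first */
	return eal_init(argc, argv);
}
```

With this layout, leaving out `-d` keeps the process in the foreground, which
matches the wish in the thread to keep daemon mode optional for debugging.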

> 
> Also, we wanted to make daemon mode optional for debugging.
> This led to a change where the main program processes the application argv
> first, then passes the DPDK args as a second group. This is the inverse of
> the example applications.
> 
> 
> int
> main(int argc, char **argv)
> {
> 	int ret;
> 	char *progname;
>
> 	progname = strrchr(argv[0], '/');
> 	progname = strdup(progname ? progname + 1 : argv[0]);
>
> 	ret = parse_args(argc, argv);
> 	if (ret < 0)
> 		return -1;
>
> 	argc -= ret;
> 	argv += ret;
>
> 	if (daemon_mode && daemon(1, 1) < 0)
> 		return -1;
>
> 	/* workaround fact that EAL expects progname as first argument */
> 	argv[0] = progname;
>
> 	ret = rte_eal_init(argc, argv);
> 	if (ret < 0)
> 		return -1;


[dpdk-dev] mk: fix app linking for combined libs

2014-12-16 Thread Hiroshi Shimamoto
> Subject: Re: [dpdk-dev] mk: fix app linking for combined libs
> 
> 2014-12-08 09:53, Neil Horman:
> > On Thu, Oct 23, 2014 at 04:36:44PM +0100, Sergio Gonzalez Monroy wrote:
> > > Building combined shared libraries results in applications being linked
> > > against separated/individual and combined libs altogether.
> > >
> > > Link only against combined lib when the config option is enabled.
> > >
> > > Signed-off-by: Sergio Gonzalez Monroy
> > > Acked-by: Pablo de Lara 
> [...]
> > Acked-by: Neil Horman 
> 
> Neil, I didn't notice your ack, which happened after a discussion I had with
> Sergio. He agreed to make a v2.
> Actually Hiroshi did it:
>   http://dpdk.org/ml/archives/dev/2014-December/009847.html
> I'm going to apply Hiroshi's one.

Ah, I hadn't noticed that there was already a patch to address this issue.

thanks,
Hiroshi


[dpdk-dev] [PATCH v2] add one option memory-only for secondary processes

2014-12-11 Thread Hiroshi Shimamoto
Hi,

sorry for the delay.

> Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for secondary processes
>
> Hi, Hiroshi,
> Yes, you are right. To avoid this problem, when creating a mempool that
> will be shared between the primary process and the secondary processes, we
> need to set the cache_size parameter to zero. And to make the system more
> stable, it's better to define RTE_MEMPOOL_CACHE_MAX_SIZE to be 0 in
> rte_config.h.

Yes, it prevents the data corruption, but it also hurts performance.
I think that if we use mbufs without a cache for the PMD, we will see
performance degradation.

Do you have any numbers?

thanks,
Hiroshi

> 
> /* create the mempool */
> struct rte_mempool *
> rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
>  unsigned cache_size, unsigned private_data_size,
>  rte_mempool_ctor_t *mp_init, void *mp_init_arg,
>  rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg,
>  int socket_id, unsigned flags);
> 
> 
> Brgs,
> Chi xiaobo
> 
> 
> -----Original Message-----
> From: ext Hiroshi Shimamoto [mailto:h-shimamoto at ct.jp.nec.com]
> Sent: Wednesday, December 03, 2014 6:54 PM
> To: Chi, Xiaobo (NSN - CN/Hangzhou); dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> processes
> 
> Hi,
> 
> > Subject: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> > processes
> >
> > From: Chi Xiaobo 
> >
> > Problem: There is one common DPDK process deployment scenario: one
> > primary process and several (even hundreds of) secondary processes. All
> > outside packets/messages are sent/received by the primary process, which
> > then distributes them to the secondary processes through DPDK's
> > ring/shared memory mechanism. In such scenarios, the SECONDARY processes
> > need only the hugepage-based shared memory mechanism and its upper libs
> > (such as ring, mempool, etc.); they do not need CPU core pinning, iopl
> > privilege changing, pci device, timer, alarm, interrupt,
> > shared_driver_list, core_info, threads for each core, etc. For such
> > SECONDARY processes, the current rte_eal_init() is too heavy.
> >
> > Solution: One new EAL initialization argument, --memory-only, is added.
> > It is only for SECONDARY processes that only want to share memory with
> > other processes. If this argument is given, users need not supply the
> > otherwise mandatory arguments, such as -c and -n, because we don't want
> > to pin such processes to any CPUs.
> 
> However, we need an lcore_id per thread to use mempool.
> If the lcore_id is not initialized, it will be 0 in every thread, and multiple
> threads will then corrupt the per-lcore mempool cache because of a race condition.
> We have to either assign each thread its own lcore_id, with no overlap between
> ids, or disable mempool handling in the SECONDARY process.
> 
> thanks,
> Hiroshi
> 
> > Signed-off-by: Chi Xiaobo 
> > ---
> >  lib/librte_eal/common/eal_common_options.c | 17 ---
> >  lib/librte_eal/common/eal_internal_cfg.h   |  1 +
> >  lib/librte_eal/common/eal_options.h|  2 ++
> >  lib/librte_eal/linuxapp/eal/eal.c  | 34 
> > +-
> >  4 files changed, 36 insertions(+), 18 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_options.c 
> > b/lib/librte_eal/common/eal_common_options.c
> > index e2810ab..7b18498 100644
> > --- a/lib/librte_eal/common/eal_common_options.c
> > +++ b/lib/librte_eal/common/eal_common_options.c
> > @@ -85,6 +85,7 @@ eal_long_options[] = {
> > {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
> > {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
> > {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
> > +   {OPT_MEMORY_ONLY, 0, NULL, OPT_MEMORY_ONLY_NUM},
> > {0, 0, 0, 0}
> >  };
> >
> > @@ -126,6 +127,7 @@ eal_reset_internal_config(struct internal_config 
> > *internal_cfg)
> > internal_cfg->no_hpet = 1;
> >  #endif
> > internal_cfg->vmware_tsc_map = 0;
> > +   internal_cfg->memory_only= 0;
> >  }
> >
> >  /*
> > @@ -454,6 +456,10 @@ eal_parse_common_option(int opt, const char *optarg,
> > conf->process_type = eal_parse_proc_type(optarg);
> > break;
> >
> > +   case OPT_MEMORY_ONLY_NUM:
> > +   conf->memory_only= 1;
> > +   break;
> > +
> > case OPT_MASTER_LCORE_NUM:
> >  

[dpdk-dev] [PATCH] mk: fix link to combined library

2014-12-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

The application should be linked to the single combined library when
both CONFIG_RTE_BUILD_COMBINE_LIBS and CONFIG_RTE_BUILD_SHARED_LIB are
enabled.

The current makefile generates an application that links to each individual
library. This patch fixes it to link only to the single combined library.

Before
$ ldd x86_64-ivshmem-linuxapp-gcc/app/test
linux-vdso.so.1 =>  (0x7fff232a1000)
librte_distributor.so => not found
librte_kni.so => not found
librte_ivshmem.so => not found
librte_pipeline.so => not found
librte_table.so => not found
librte_port.so => not found
librte_timer.so => not found
librte_hash.so => not found
librte_lpm.so => not found
librte_power.so => not found
librte_acl.so => not found
librte_meter.so => not found
librte_sched.so => not found
libm.so.6 => /lib64/libm.so.6 (0x7fc63802)
librt.so.1 => /lib64/librt.so.1 (0x7fc637e18000)
librte_kvargs.so => not found
librte_mbuf.so => not found
librte_ip_frag.so => not found
libethdev.so => not found
librte_malloc.so => not found
librte_mempool.so => not found
librte_ring.so => not found
librte_eal.so => not found
librte_cmdline.so => not found
librte_cfgfile.so => not found
librte_pmd_bond.so => not found
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fc637bfe000)
libdl.so.2 => /lib64/libdl.so.2 (0x7fc6379fa000)
libintel_dpdk.so => not found
libpthread.so.0 => /lib64/libpthread.so.0 (0x7fc6377dd000)
libc.so.6 => /lib64/libc.so.6 (0x7fc63741c000)
/lib64/ld-linux-x86-64.so.2 (0x7fc63833)

After
$ ldd x86_64-ivshmem-linuxapp-gcc/app/test
linux-vdso.so.1 =>  (0x7fffb79fe000)
librt.so.1 => /lib64/librt.so.1 (0x7f0d8a971000)
libm.so.6 => /lib64/libm.so.6 (0x7f0d8a66f000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f0d8a458000)
libdl.so.2 => /lib64/libdl.so.2 (0x7f0d8a254000)
libintel_dpdk.so => not found
libpthread.so.0 => /lib64/libpthread.so.0 (0x7f0d8a037000)
libc.so.6 => /lib64/libc.so.6 (0x7f0d89c76000)
/lib64/ld-linux-x86-64.so.2 (0x7f0d8ab82000)

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 mk/rte.app.mk | 8 
 1 file changed, 8 insertions(+)

diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 84ec4df..3782eab 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -61,6 +61,8 @@ ifeq ($(NO_AUTOLIBS),)

 LDLIBS += --whole-archive

+ifeq ($(RTE_BUILD_COMBINE_LIBS),n)
+
 ifeq ($(CONFIG_RTE_LIBRTE_DISTRIBUTOR),y)
 LDLIBS += -lrte_distributor
 endif
@@ -119,8 +121,12 @@ LDLIBS += -lm
 LDLIBS += -lrt
 endif

+endif # ! RTE_BUILD_COMBINE_LIBS
+
 LDLIBS += --start-group

+ifeq ($(RTE_BUILD_COMBINE_LIBS),n)
+
 ifeq ($(CONFIG_RTE_LIBRTE_KVARGS),y)
 LDLIBS += -lrte_kvargs
 endif
@@ -216,6 +222,8 @@ endif

 endif # plugins

+endif # ! RTE_BUILD_COMBINE_LIBS
+
 LDLIBS += $(EXECENV_LDLIBS)

 LDLIBS += --end-group
-- 
1.8.3.1



[dpdk-dev] [PATCH] mk: fix LDFLAGS for shared lib

2014-12-03 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Only CPU_LDFLAGS is used in mk/rte.sharelib.mk.
It should be LDFLAGS to build the library with correct linkage options.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 mk/rte.sharelib.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.sharelib.mk b/mk/rte.sharelib.mk
index c0a811a..df6c268 100644
--- a/mk/rte.sharelib.mk
+++ b/mk/rte.sharelib.mk
@@ -45,7 +45,7 @@ sharelib: $(LIB_ONE) FORCE

 OBJS = $(wildcard $(RTE_OUTPUT)/build/lib/*.o)

-O_TO_S = $(LD) $(CPU_LDFLAGS) -shared $(OBJS) -o $(RTE_OUTPUT)/lib/$(LIB_ONE)
+O_TO_S = $(LD) $(LDFLAGS) -shared $(OBJS) -o $(RTE_OUTPUT)/lib/$(LIB_ONE)
 O_TO_S_STR = $(subst ','\'',$(O_TO_S)) #'# fix syntax highlight
 O_TO_S_DISP = $(if $(V),"$(O_TO_S_STR)","  LD $(@)")
 O_TO_S_CMD = "cmd_$@ = $(O_TO_S_STR)"
-- 
1.8.3.1



[dpdk-dev] [PATCH v2] add one option memory-only for secondary processes

2014-12-03 Thread Hiroshi Shimamoto
Hi,

> Subject: [dpdk-dev] [PATCH v2] add one option memory-only for secondary 
> processes
> 
> From: Chi Xiaobo 
> 
> Problem: There is one common DPDK process deployment scenario: one
> primary process and several (even hundreds of) secondary processes. All
> outside packets/messages are sent/received by the primary process, which
> then distributes them to the secondary processes through DPDK's
> ring/shared memory mechanism. In such scenarios, the SECONDARY processes
> need only the hugepage-based shared memory mechanism and its upper libs
> (such as ring, mempool, etc.); they do not need CPU core pinning, iopl
> privilege changing, pci device, timer, alarm, interrupt,
> shared_driver_list, core_info, threads for each core, etc. For such
> SECONDARY processes, the current rte_eal_init() is too heavy.
>
> Solution: One new EAL initialization argument, --memory-only, is added.
> It is only for SECONDARY processes that only want to share memory with
> other processes. If this argument is given, users need not supply the
> otherwise mandatory arguments, such as -c and -n, because we don't want
> to pin such processes to any CPUs.

However, we need an lcore_id per thread to use mempool.
If the lcore_id is not initialized, it will be 0 in every thread, and multiple
threads will then corrupt the per-lcore mempool cache because of a race condition.
We have to either assign each thread its own lcore_id, with no overlap between
ids, or disable mempool handling in the SECONDARY process.

thanks,
Hiroshi
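
To illustrate the concern, here is a toy model (not DPDK code) of a pool that
keeps one unlocked cache slot per lcore id. All names (`toy_mempool`,
`toy_cache_for`, `toy_alloc_lcore_id`) and the sizes are made up for this
sketch. It shows that two threads that both report id 0 end up on the same
unlocked slot, and one possible way to hand out non-overlapping ids with an
atomic counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_MAX_LCORE	4	/* illustrative value, not RTE_MAX_LCORE */
#define TOY_CACHE_SIZE	8

struct toy_cache {
	unsigned int len;
	void *objs[TOY_CACHE_SIZE];
};

struct toy_mempool {
	/* One slot per lcore id, accessed without any lock. */
	struct toy_cache local_cache[TOY_MAX_LCORE];
};

/* Next id to hand out; static storage, so it starts at 0. */
static atomic_uint toy_next_id;

/* Hand out ids 0, 1, 2, ... once each; return -1 when none are left. */
int
toy_alloc_lcore_id(void)
{
	unsigned int id = atomic_fetch_add(&toy_next_id, 1);

	return id < TOY_MAX_LCORE ? (int)id : -1;
}

/* Return the cache slot that a thread with the given id would use. */
struct toy_cache *
toy_cache_for(struct toy_mempool *mp, unsigned int lcore_id)
{
	assert(mp != NULL && lcore_id < TOY_MAX_LCORE);
	return &mp->local_cache[lcore_id];
}
```

In this model, threads that never obtain an id and default to 0 all receive
the pointer returned by `toy_cache_for(mp, 0)`, so their updates to `len` and
`objs` can interleave. Threads that each call `toy_alloc_lcore_id()` once get
distinct slots.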

> Signed-off-by: Chi Xiaobo 
> ---
>  lib/librte_eal/common/eal_common_options.c | 17 ---
>  lib/librte_eal/common/eal_internal_cfg.h   |  1 +
>  lib/librte_eal/common/eal_options.h|  2 ++
>  lib/librte_eal/linuxapp/eal/eal.c  | 34 
> +-
>  4 files changed, 36 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_options.c 
> b/lib/librte_eal/common/eal_common_options.c
> index e2810ab..7b18498 100644
> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -85,6 +85,7 @@ eal_long_options[] = {
>   {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
>   {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
>   {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
> + {OPT_MEMORY_ONLY, 0, NULL, OPT_MEMORY_ONLY_NUM},
>   {0, 0, 0, 0}
>  };
> 
> @@ -126,6 +127,7 @@ eal_reset_internal_config(struct internal_config 
> *internal_cfg)
>   internal_cfg->no_hpet = 1;
>  #endif
>   internal_cfg->vmware_tsc_map = 0;
> + internal_cfg->memory_only= 0;
>  }
> 
>  /*
> @@ -454,6 +456,10 @@ eal_parse_common_option(int opt, const char *optarg,
>   conf->process_type = eal_parse_proc_type(optarg);
>   break;
> 
> + case OPT_MEMORY_ONLY_NUM:
> + conf->memory_only= 1;
> + break;
> +
>   case OPT_MASTER_LCORE_NUM:
>   if (eal_parse_master_lcore(optarg) < 0) {
>   RTE_LOG(ERR, EAL, "invalid parameter for --"
> @@ -525,9 +531,9 @@ eal_check_common_options(struct internal_config 
> *internal_cfg)
>  {
>   struct rte_config *cfg = rte_eal_get_configuration();
> 
> - if (!lcores_parsed) {
> - RTE_LOG(ERR, EAL, "CPU cores must be enabled with options "
> - "-c or -l\n");
> + if (!lcores_parsed && !(internal_cfg->process_type == 
> RTE_PROC_SECONDARY&& internal_cfg->memory_only) ) {
> + RTE_LOG(ERR, EAL, "For those processes without memory-only 
> option, CPU cores "
> + "must be enabled with 
> options -c or -l\n");
>   return -1;
>   }
>   if (cfg->lcore_role[cfg->master_lcore] != ROLE_RTE) {
> @@ -545,6 +551,10 @@ eal_check_common_options(struct internal_config 
> *internal_cfg)
>   "specified\n");
>   return -1;
>   }
> + if ( internal_cfg->process_type != RTE_PROC_SECONDARY && 
> internal_cfg->memory_only ) {
> + RTE_LOG(ERR, EAL, "only secondary processes can specify 
> memory-only option.\n");
> + return -1;
> + }
>   if (index(internal_cfg->hugefile_prefix, '%') != NULL) {
>   RTE_LOG(ERR, EAL, "Invalid char, '%%', in --"OPT_FILE_PREFIX" "
>   "option\n");
> @@ -590,6 +600,7 @@ eal_common_usage(void)
>  "  --"OPT_SYSLOG" : set syslog facility\n"
>  "  --"OPT_LOG_LEVEL"  : set default log level\n"
>  "  --"OPT_PROC_TYPE"  : type of this process\n"
> +"  --"OPT_MEMORY_ONLY": only use shared memory, valid only for 
> secondary process.\n"
>  "  --"OPT_PCI_BLACKLIST", -b: add a PCI device in black list.\n"
>  "   Prevent EAL from using this PCI device. The 
> argument\n"
>  "   format is .\n"
> diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
> 

[dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in recv/xmit

2014-10-02 Thread Hiroshi Shimamoto
> Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> recv/xmit
> 
> On Wed, Oct 01, 2014 at 11:33:23PM +0000, Hiroshi Shimamoto wrote:
> > > Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> > > recv/xmit
> > >
> > > On Wed, Oct 01, 2014 at 09:12:44AM +0000, Hiroshi Shimamoto wrote:
> > > > > Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> > > > > recv/xmit
> > > > >
> > > > > On Tue, Sep 30, 2014 at 11:52:00PM +0000, Hiroshi Shimamoto wrote:
> > > > > > Hi,
> > > > > >
> > > > > > > Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch 
> > > > > > > hint in recv/xmit
> > > > > > >
> > > > > > > On Tue, Sep 30, 2014 at 11:14:40AM +0000, Hiroshi Shimamoto wrote:
> > > > > > > > From: Hiroshi Shimamoto 
> > > > > > > >
> > > > > > > > To reduce instruction cache miss, add branch condition hints into
> > > > > > > > recv/xmit functions. This improves performance a bit.
> > > > > > > >
> > > > > > > > We can see performance improvements with memnic-tester.
> > > > > > > > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> > > > > > > >  size |  before  |  after
> > > > > > > >64 | 5.54Mpps | 5.55Mpps
> > > > > > > >   128 | 5.46Mpps | 5.44Mpps
> > > > > > > >   256 | 5.21Mpps | 5.22Mpps
> > > > > > > >   512 | 4.50Mpps | 4.52Mpps
> > > > > > > >  1024 | 3.71Mpps | 3.73Mpps
> > > > > > > >  1280 | 3.21Mpps | 3.22Mpps
> > > > > > > >  1518 | 2.92Mpps | 2.93Mpps
> > > > > > > >
> > > > > > > > Signed-off-by: Hiroshi Shimamoto 
> > > > > > > > Reviewed-by: Hayato Momma 
> > > > > > > > ---
> > > > > > > >  pmd/pmd_memnic.c | 18 +-
> > > > > > > >  1 file changed, 9 insertions(+), 9 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
> > > > > > > > index 7fc3093..875d3ea 100644
> > > > > > > > --- a/pmd/pmd_memnic.c
> > > > > > > > +++ b/pmd/pmd_memnic.c
> > > > > > > > @@ -289,26 +289,26 @@ static uint16_t memnic_recv_pkts(void 
> > > > > > > > *rx_queue,
> > > > > > > > int idx, next;
> > > > > > > > struct rte_eth_stats *st = 
> > > > > > > > >stats[rte_lcore_id()];
> > > > > > > >
> > > > > > > > -   if (!adapter->nic->hdr.valid)
> > > > > > > > +   if (unlikely(!adapter->nic->hdr.valid))
> > > > > > > > return 0;
> > > > > > > >
> > > > > > > > pkts = bytes = errs = 0;
> > > > > > > > idx = adapter->up_idx;
> > > > > > > > for (nr = 0; nr < nb_pkts; nr++) {
> > > > > > > > p = >packets[idx];
> > > > > > > > -   if (p->status != MEMNIC_PKT_ST_FILLED)
> > > > > > > > +   if (unlikely(p->status != MEMNIC_PKT_ST_FILLED))
> > > > > > > > break;
> > > > > > > > /* prefetch the next area */
> > > > > > > > next = idx;
> > > > > > > > -   if (++next >= MEMNIC_NR_PACKET)
> > > > > > > > +   if (unlikely(++next >= MEMNIC_NR_PACKET))
> > > > > > > > next = 0;
> > > > > > > > rte_prefetch0(>packets[next]);
> > > > > > > > -   if (p->len > framesz) {
> > > > > > > > +   if (unlikely(p->len > framesz)) {
> > > > > > > > errs++;
> > > > > > > > goto drop;
> > > > > > > > }
> > > > > 

[dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in recv/xmit

2014-10-02 Thread Hiroshi Shimamoto
> Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> recv/xmit
> 
> On Wed, Oct 01, 2014 at 09:12:44AM +0000, Hiroshi Shimamoto wrote:
> > > Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> > > recv/xmit
> > >
> > > On Tue, Sep 30, 2014 at 11:52:00PM +0000, Hiroshi Shimamoto wrote:
> > > > Hi,
> > > >
> > > > > Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> > > > > recv/xmit
> > > > >
> > > > > On Tue, Sep 30, 2014 at 11:14:40AM +0000, Hiroshi Shimamoto wrote:
> > > > > > From: Hiroshi Shimamoto 
> > > > > >
> > > > > > To reduce instruction cache miss, add branch condition hints into
> > > > > > recv/xmit functions. This improves performance a bit.
> > > > > >
> > > > > > We can see performance improvements with memnic-tester.
> > > > > > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> > > > > >  size |  before  |  after
> > > > > >64 | 5.54Mpps | 5.55Mpps
> > > > > >   128 | 5.46Mpps | 5.44Mpps
> > > > > >   256 | 5.21Mpps | 5.22Mpps
> > > > > >   512 | 4.50Mpps | 4.52Mpps
> > > > > >  1024 | 3.71Mpps | 3.73Mpps
> > > > > >  1280 | 3.21Mpps | 3.22Mpps
> > > > > >  1518 | 2.92Mpps | 2.93Mpps
> > > > > >
> > > > > > Signed-off-by: Hiroshi Shimamoto 
> > > > > > Reviewed-by: Hayato Momma 
> > > > > > ---
> > > > > >  pmd/pmd_memnic.c | 18 +-
> > > > > >  1 file changed, 9 insertions(+), 9 deletions(-)
> > > > > >
> > > > > > diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
> > > > > > index 7fc3093..875d3ea 100644
> > > > > > --- a/pmd/pmd_memnic.c
> > > > > > +++ b/pmd/pmd_memnic.c
> > > > > > @@ -289,26 +289,26 @@ static uint16_t memnic_recv_pkts(void 
> > > > > > *rx_queue,
> > > > > > int idx, next;
> > > > > > struct rte_eth_stats *st = >stats[rte_lcore_id()];
> > > > > >
> > > > > > -   if (!adapter->nic->hdr.valid)
> > > > > > +   if (unlikely(!adapter->nic->hdr.valid))
> > > > > > return 0;
> > > > > >
> > > > > > pkts = bytes = errs = 0;
> > > > > > idx = adapter->up_idx;
> > > > > > for (nr = 0; nr < nb_pkts; nr++) {
> > > > > > p = >packets[idx];
> > > > > > -   if (p->status != MEMNIC_PKT_ST_FILLED)
> > > > > > +   if (unlikely(p->status != MEMNIC_PKT_ST_FILLED))
> > > > > > break;
> > > > > > /* prefetch the next area */
> > > > > > next = idx;
> > > > > > -   if (++next >= MEMNIC_NR_PACKET)
> > > > > > +   if (unlikely(++next >= MEMNIC_NR_PACKET))
> > > > > > next = 0;
> > > > > > rte_prefetch0(>packets[next]);
> > > > > > -   if (p->len > framesz) {
> > > > > > +   if (unlikely(p->len > framesz)) {
> > > > > > errs++;
> > > > > > goto drop;
> > > > > > }
> > > > > > mb = rte_pktmbuf_alloc(adapter->mp);
> > > > > > -   if (!mb)
> > > > > > +   if (unlikely(!mb))
> > > > > > break;
> > > > > >
> > > > > > rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, 
> > > > > > p->len);
> > > > > > @@ -350,7 +350,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
> > > > > > uint64_t pkts, bytes, errs;
> > > > > > uint32_t framesz = adapter->framesz;
> > > > > >
> > > > > > -   if (!adapter->nic->hdr.valid)
> > > > > > +   if (unlikely(!adapter->nic->hdr.valid))
> > > > > > return 0;
> > > > > >
> > > > > > pkts = bytes = errs = 0;
> > > > > > @@ -360,7 +360,7 @@ static uint1

[dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in recv/xmit

2014-10-01 Thread Hiroshi Shimamoto
> Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> recv/xmit
> 
> On Tue, Sep 30, 2014 at 11:52:00PM +0000, Hiroshi Shimamoto wrote:
> > Hi,
> >
> > > Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> > > recv/xmit
> > >
> > > On Tue, Sep 30, 2014 at 11:14:40AM +0000, Hiroshi Shimamoto wrote:
> > > > From: Hiroshi Shimamoto 
> > > >
> > > > To reduce instruction cache miss, add branch condition hints into
> > > > recv/xmit functions. This improves performance a bit.
> > > >
> > > > We can see performance improvements with memnic-tester.
> > > > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> > > >  size |  before  |  after
> > > >64 | 5.54Mpps | 5.55Mpps
> > > >   128 | 5.46Mpps | 5.44Mpps
> > > >   256 | 5.21Mpps | 5.22Mpps
> > > >   512 | 4.50Mpps | 4.52Mpps
> > > >  1024 | 3.71Mpps | 3.73Mpps
> > > >  1280 | 3.21Mpps | 3.22Mpps
> > > >  1518 | 2.92Mpps | 2.93Mpps
> > > >
> > > > Signed-off-by: Hiroshi Shimamoto 
> > > > Reviewed-by: Hayato Momma 
> > > > ---
> > > >  pmd/pmd_memnic.c | 18 +-
> > > >  1 file changed, 9 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
> > > > index 7fc3093..875d3ea 100644
> > > > --- a/pmd/pmd_memnic.c
> > > > +++ b/pmd/pmd_memnic.c
> > > > @@ -289,26 +289,26 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
> > > > int idx, next;
> > > > struct rte_eth_stats *st = >stats[rte_lcore_id()];
> > > >
> > > > -   if (!adapter->nic->hdr.valid)
> > > > +   if (unlikely(!adapter->nic->hdr.valid))
> > > > return 0;
> > > >
> > > > pkts = bytes = errs = 0;
> > > > idx = adapter->up_idx;
> > > > for (nr = 0; nr < nb_pkts; nr++) {
> > > > p = >packets[idx];
> > > > -   if (p->status != MEMNIC_PKT_ST_FILLED)
> > > > +   if (unlikely(p->status != MEMNIC_PKT_ST_FILLED))
> > > > break;
> > > > /* prefetch the next area */
> > > > next = idx;
> > > > -   if (++next >= MEMNIC_NR_PACKET)
> > > > +   if (unlikely(++next >= MEMNIC_NR_PACKET))
> > > > next = 0;
> > > > rte_prefetch0(>packets[next]);
> > > > -   if (p->len > framesz) {
> > > > +   if (unlikely(p->len > framesz)) {
> > > > errs++;
> > > > goto drop;
> > > > }
> > > > mb = rte_pktmbuf_alloc(adapter->mp);
> > > > -   if (!mb)
> > > > +   if (unlikely(!mb))
> > > > break;
> > > >
> > > > rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, 
> > > > p->len);
> > > > @@ -350,7 +350,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
> > > > uint64_t pkts, bytes, errs;
> > > > uint32_t framesz = adapter->framesz;
> > > >
> > > > -   if (!adapter->nic->hdr.valid)
> > > > +   if (unlikely(!adapter->nic->hdr.valid))
> > > > return 0;
> > > >
> > > > pkts = bytes = errs = 0;
> > > > @@ -360,7 +360,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
> > > > struct rte_mbuf *sg;
> > > > void *ptr;
> > > >
> > > > -   if (pkt_len > framesz) {
> > > > +   if (unlikely(pkt_len > framesz)) {
> > > > errs++;
> > > > break;
> > > > }
> > > > @@ -379,7 +379,7 @@ retry:
> > > > goto retry;
> > > > }
> > > >
> > > > -   if (idx != ACCESS_ONCE(adapter->down_idx)) {
> > > > +   if (unlikely(idx != ACCESS_ONCE(adapter->down_idx))) {
> > > Why are you using ACCESS_ONCE here?  Or for that matter, anywhere else in this
> > > PMD?  The whole idea of the ACCESS_ONCE macro is to assign a value to a variable
> > > once and prevent it from getting reloaded from memory at a later time; this is
> > > exactly contrary to that, both in the sense that you're explicitly reloading the
> > > same variable multiple times, and that you're using it as part of a comparison
> > > operation, rather than an assignment operation.
> >
> > ACCESS_ONCE prevents compiler optimization and ensures a load from memory.
> > There could be multiple threads which read/write that index.
> > We should compare the previous value with the current value in memory.
> > For that reason, I use the ACCESS_ONCE macro to get the value from memory.
> 
> Should you not just make the variable volatile? That's the normal way to
> guarantee reads from memory and prevent the compiler caching things in
> registers.

We don't want to always access memory; that could cause performance
degradation.
Like the Linux kernel, I use it only in the places where we really need to
load from memory.

thanks,
Hiroshi

> 
> /Bruce
> 
> >
> > thanks,
> > Hiroshi
> >
> > >
> > > Neil
> >


[dpdk-dev] [memnic PATCH v2 0/7] MEMNIC PMD performance improvement

2014-10-01 Thread Hiroshi Shimamoto
Hi Thomas,

> Subject: Re: [dpdk-dev] [memnic PATCH v2 0/7] MEMNIC PMD performance 
> improvement
> 
> > This patchset improves MEMNIC PMD performance.
> >
> > Hiroshi Shimamoto (7):
> >   guest: memnic-tester: PMD benchmark in guest
> >   pmd: remove needless assignment
> >   pmd: use helper macros
> >   pmd: use compiler barrier
> >   pmd: packet receiving optimization with prefetch
> >   pmd: add branch hint in recv/xmit
> >   pmd: burst mbuf freeing in xmit
> 
> Applied with Huawei's wording comment.
> 
> If there is no more patch, it will be tagged v1.3 at the end
> of the week.

I'm fine with that.

Then I will start working on support for DPDK v1.8.

thanks,
Hiroshi

> 
> Thanks
> --
> Thomas


[dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in recv/xmit

2014-10-01 Thread Hiroshi Shimamoto
Hi,

> Subject: RE: [memnic PATCH v2 6/7] pmd: add branch hint in recv/xmit
> 
> The patch is ok. For the commit message, would it be better to say
> "to reduce branch misprediction"?

yes, that seems more suitable to explain the situation.

Thomas, what do you think? Can you replace the message when you apply
this patch?

thanks,
Hiroshi
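
For readers unfamiliar with branch hints, here is a minimal sketch of the
pattern the patch applies to the ring index wrap-around. DPDK provides
likely()/unlikely() in rte_branch_prediction.h; the `toy_` macros below are
stand-ins built on the GCC/Clang builtin __builtin_expect so that the example
compiles without DPDK, and `TOY_NR_PACKET` and `toy_next_idx` are made-up
names.

```c
#include <assert.h>

/* Tell the compiler which outcome of a condition is the common one. */
#define toy_likely(x)	__builtin_expect(!!(x), 1)
#define toy_unlikely(x)	__builtin_expect(!!(x), 0)

#define TOY_NR_PACKET	4	/* illustrative ring size */

/* Advance a ring index; wrapping to 0 is the rare case. */
unsigned int
toy_next_idx(unsigned int idx)
{
	assert(idx < TOY_NR_PACKET);
	if (toy_unlikely(++idx >= TOY_NR_PACKET))
		idx = 0;
	return idx;
}
```

The hint does not change the result of the function; it only lets the
compiler lay out the common path as the fall-through case.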

> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
> > Sent: Tuesday, September 30, 2014 7:15 PM
> > To: dev at dpdk.org
> > Cc: Hayato Momma
> > Subject: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in recv/xmit
> >
> > From: Hiroshi Shimamoto 
> >
> > To reduce instruction cache miss, add branch condition hints into
> > recv/xmit functions. This improves performance a bit.
> >
> > We can see performance improvements with memnic-tester.
> > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> >  size |  before  |  after
> >64 | 5.54Mpps | 5.55Mpps
> >   128 | 5.46Mpps | 5.44Mpps
> >   256 | 5.21Mpps | 5.22Mpps
> >   512 | 4.50Mpps | 4.52Mpps
> >  1024 | 3.71Mpps | 3.73Mpps
> >  1280 | 3.21Mpps | 3.22Mpps
> >  1518 | 2.92Mpps | 2.93Mpps
> >
> > Signed-off-by: Hiroshi Shimamoto 
> > Reviewed-by: Hayato Momma 
> > ---
> >  pmd/pmd_memnic.c | 18 +-
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
> > index 7fc3093..875d3ea 100644
> > --- a/pmd/pmd_memnic.c
> > +++ b/pmd/pmd_memnic.c
> > @@ -289,26 +289,26 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
> > int idx, next;
> > struct rte_eth_stats *st = >stats[rte_lcore_id()];
> >
> > -   if (!adapter->nic->hdr.valid)
> > +   if (unlikely(!adapter->nic->hdr.valid))
> > return 0;
> >
> > pkts = bytes = errs = 0;
> > idx = adapter->up_idx;
> > for (nr = 0; nr < nb_pkts; nr++) {
> > p = >packets[idx];
> > -   if (p->status != MEMNIC_PKT_ST_FILLED)
> > +   if (unlikely(p->status != MEMNIC_PKT_ST_FILLED))
> > break;
> > /* prefetch the next area */
> > next = idx;
> > -   if (++next >= MEMNIC_NR_PACKET)
> > +   if (unlikely(++next >= MEMNIC_NR_PACKET))
> > next = 0;
> > rte_prefetch0(>packets[next]);
> > -   if (p->len > framesz) {
> > +   if (unlikely(p->len > framesz)) {
> > errs++;
> > goto drop;
> > }
> > mb = rte_pktmbuf_alloc(adapter->mp);
> > -   if (!mb)
> > +   if (unlikely(!mb))
> > break;
> >
> > rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, p->len);
> > @@ -350,7 +350,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
> > uint64_t pkts, bytes, errs;
> > uint32_t framesz = adapter->framesz;
> >
> > -   if (!adapter->nic->hdr.valid)
> > +   if (unlikely(!adapter->nic->hdr.valid))
> > return 0;
> >
> > pkts = bytes = errs = 0;
> > @@ -360,7 +360,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
> > struct rte_mbuf *sg;
> > void *ptr;
> >
> > -   if (pkt_len > framesz) {
> > +   if (unlikely(pkt_len > framesz)) {
> > errs++;
> > break;
> > }
> > @@ -379,7 +379,7 @@ retry:
> > goto retry;
> > }
> >
> > -   if (idx != ACCESS_ONCE(adapter->down_idx)) {
> > +   if (unlikely(idx != ACCESS_ONCE(adapter->down_idx))) {
> > /*
> >  * host freed this and got false positive,
> >  * need to recover the status and retry.
> > @@ -388,7 +388,7 @@ retry:
> > goto retry;
> > }
> >
> > -   if (++idx >= MEMNIC_NR_PACKET)
> > +   if (unlikely(++idx >= MEMNIC_NR_PACKET))
> > idx = 0;
> > adapter->down_idx = idx;
> >
> > --
> > 1.8.3.1



[dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in recv/xmit

2014-10-01 Thread Hiroshi Shimamoto
Hi,

> Subject: Re: [dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in 
> recv/xmit
> 
> On Tue, Sep 30, 2014 at 11:14:40AM +0000, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > To reduce instruction cache miss, add branch condition hints into
> > recv/xmit functions. This improves a bit performance.
> >
> > We can see performance improvements with memnic-tester.
> > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> >  size |  before  |  after
> >    64 | 5.54Mpps | 5.55Mpps
> >   128 | 5.46Mpps | 5.44Mpps
> >   256 | 5.21Mpps | 5.22Mpps
> >   512 | 4.50Mpps | 4.52Mpps
> >  1024 | 3.71Mpps | 3.73Mpps
> >  1280 | 3.21Mpps | 3.22Mpps
> >  1518 | 2.92Mpps | 2.93Mpps
> >
> > Signed-off-by: Hiroshi Shimamoto 
> > Reviewed-by: Hayato Momma 
> > ---
> >  pmd/pmd_memnic.c | 18 +-
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
> > index 7fc3093..875d3ea 100644
> > --- a/pmd/pmd_memnic.c
> > +++ b/pmd/pmd_memnic.c
> > @@ -289,26 +289,26 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
> > int idx, next;
> > struct rte_eth_stats *st = >stats[rte_lcore_id()];
> >
> > -   if (!adapter->nic->hdr.valid)
> > +   if (unlikely(!adapter->nic->hdr.valid))
> > return 0;
> >
> > pkts = bytes = errs = 0;
> > idx = adapter->up_idx;
> > for (nr = 0; nr < nb_pkts; nr++) {
> > p = >packets[idx];
> > -   if (p->status != MEMNIC_PKT_ST_FILLED)
> > +   if (unlikely(p->status != MEMNIC_PKT_ST_FILLED))
> > break;
> > /* prefetch the next area */
> > next = idx;
> > -   if (++next >= MEMNIC_NR_PACKET)
> > +   if (unlikely(++next >= MEMNIC_NR_PACKET))
> > next = 0;
> > rte_prefetch0(>packets[next]);
> > -   if (p->len > framesz) {
> > +   if (unlikely(p->len > framesz)) {
> > errs++;
> > goto drop;
> > }
> > mb = rte_pktmbuf_alloc(adapter->mp);
> > -   if (!mb)
> > +   if (unlikely(!mb))
> > break;
> >
> > rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, p->len);
> > @@ -350,7 +350,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
> > uint64_t pkts, bytes, errs;
> > uint32_t framesz = adapter->framesz;
> >
> > -   if (!adapter->nic->hdr.valid)
> > +   if (unlikely(!adapter->nic->hdr.valid))
> > return 0;
> >
> > pkts = bytes = errs = 0;
> > @@ -360,7 +360,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
> > struct rte_mbuf *sg;
> > void *ptr;
> >
> > -   if (pkt_len > framesz) {
> > +   if (unlikely(pkt_len > framesz)) {
> > errs++;
> > break;
> > }
> > @@ -379,7 +379,7 @@ retry:
> > goto retry;
> > }
> >
> > -   if (idx != ACCESS_ONCE(adapter->down_idx)) {
> > +   if (unlikely(idx != ACCESS_ONCE(adapter->down_idx))) {
> Why are you using ACCESS_ONCE here?  Or for that matter, anywhere else in this
> PMD?  The whole idea of the ACCESS_ONCE macro is to assign a value to a
> variable once and prevent it from getting reloaded from memory at a later
> time, this is exactly contrary to that, both in the sense that you're
> explicitly reloading the same variable multiple times, and that you're using
> it as part of a comparison operation, rather than an assignment operation

ACCESS_ONCE prevents the compiler from optimizing the access away and ensures
a load from memory.
Multiple threads may read and write that index.
We need to compare the previous value with the current value in memory.
For that reason, I use the ACCESS_ONCE macro to get the value from memory.
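
As a rough illustration of the point above (not the PMD's actual code), the
sketch below shows how a volatile cast forces a fresh load for a comparison.
The macro body is an assumption modeled on the Linux kernel macro of the same
name, and demo_adapter and demo_index_unchanged() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed definition, modeled on the Linux kernel macro of the same name.
 * The volatile cast makes the compiler emit a real load at this point.
 * __typeof__ is a GCC/Clang extension.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the adapter state shared between threads. */
struct demo_adapter {
	int down_idx;
};

/* Returns 1 when the index in memory still equals the value sampled earlier. */
static int demo_index_unchanged(struct demo_adapter *a, int sampled_idx)
{
	assert(a != NULL);
	return sampled_idx == ACCESS_ONCE(a->down_idx);
}
```

Note that a volatile access only constrains the compiler. It does not make the
access atomic and does not order it against other CPUs.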

thanks,
Hiroshi

> 
> Neil



[dpdk-dev] [memnic PATCH v2 7/7] pmd: burst mbuf freeing in xmit

2014-09-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

rte_pktmbuf_free() may incur cache misses and memory stalls.
With small packets, this could harm performance.

According to the memnic-tester results, performance improves for frame sizes
below 1024 bytes.

Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 5.55Mpps | 5.83Mpps
  128 | 5.44Mpps | 5.71Mpps
  256 | 5.22Mpps | 5.40Mpps
  512 | 4.52Mpps | 4.64Mpps
 1024 | 3.73Mpps | 3.68Mpps
 1280 | 3.22Mpps | 3.17Mpps
 1518 | 2.93Mpps | 2.90Mpps
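
The idea can be sketched outside DPDK as follows: do all the ring work first,
then free the sent buffers in one tight loop. This is only a sketch under
stated assumptions. demo_buf, demo_free() and demo_xmit() are hypothetical
stand-ins for rte_mbuf, rte_pktmbuf_free() and memnic_xmit_pkts(), and the
copy into the shared ring is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_mbuf. */
struct demo_buf {
	unsigned char data[64];
};

/* Counts how many buffers have been released, for inspection only. */
static unsigned demo_freed;

/* Hypothetical stand-in for rte_pktmbuf_free(). */
static void demo_free(struct demo_buf *b)
{
	free(b);
	demo_freed++;
}

/* Sends up to nb buffers, limited by ring_room, and returns the number sent. */
static unsigned demo_xmit(struct demo_buf **tx, unsigned nb, unsigned ring_room)
{
	unsigned i, nr;

	assert(tx != NULL);

	/* Phase 1: hand packets to the ring; stop when the ring is full. */
	for (nr = 0; nr < nb && nr < ring_room; nr++) {
		/* the copy into the shared ring would happen here */
	}

	/* Phase 2: free everything that was sent, in one burst. */
	for (i = 0; i < nr; i++) {
		demo_free(tx[i]);
		tx[i] = NULL;
	}
	return nr;
}
```

Buffers that were not sent stay owned by the caller, as in the patch, where
only tx_pkts[0..nr-1] are freed.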

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 875d3ea..59ee332 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -344,7 +344,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
struct memnic_adapter *adapter = q->adapter;
struct memnic_data *data = >nic->down;
struct memnic_packet *p;
-   uint16_t nr;
+   uint16_t i, nr;
int idx;
struct rte_eth_stats *st = >stats[rte_lcore_id()];
uint64_t pkts, bytes, errs;
@@ -408,9 +408,9 @@ retry:

rte_compiler_barrier();
p->status = MEMNIC_PKT_ST_FILLED;
-
-   rte_pktmbuf_free(tx_pkts[nr]);
}
+   for (i = 0; i < nr; i++)
+   rte_pktmbuf_free(tx_pkts[i]);

/* stats */
st->opackets += pkts;
-- 
1.8.3.1



[dpdk-dev] [memnic PATCH v2 6/7] pmd: add branch hint in recv/xmit

2014-09-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

To reduce instruction cache misses, add branch condition hints to the
recv/xmit functions. This improves performance slightly.

We can see performance improvements with memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 5.54Mpps | 5.55Mpps
  128 | 5.46Mpps | 5.44Mpps
  256 | 5.21Mpps | 5.22Mpps
  512 | 4.50Mpps | 4.52Mpps
 1024 | 3.71Mpps | 3.73Mpps
 1280 | 3.21Mpps | 3.22Mpps
 1518 | 2.92Mpps | 2.93Mpps
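
For readers unfamiliar with the hints, here is a minimal sketch of how such
macros are commonly built on the GCC/Clang builtin __builtin_expect. DPDK
provides likely()/unlikely() in rte_branch_prediction.h; the definitions
below are assumed equivalents so the example stands alone, and
DEMO_FRAME_MAX and demo_count_valid() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed definitions built on the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical frame size limit for this sketch. */
#define DEMO_FRAME_MAX 1518u

/* Counts frames that fit; oversized frames are treated as the rare path. */
static unsigned demo_count_valid(const uint32_t *lens, size_t n)
{
	unsigned ok = 0;
	size_t i;

	assert(lens != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (unlikely(lens[i] > DEMO_FRAME_MAX))
			continue;	/* error path, expected to be cold */
		ok++;
	}
	return ok;
}
```

The hint does not change the result. It only tells the compiler which side of
the branch to lay out as the fall-through path.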

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 7fc3093..875d3ea 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -289,26 +289,26 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
int idx, next;
struct rte_eth_stats *st = >stats[rte_lcore_id()];

-   if (!adapter->nic->hdr.valid)
+   if (unlikely(!adapter->nic->hdr.valid))
return 0;

pkts = bytes = errs = 0;
idx = adapter->up_idx;
for (nr = 0; nr < nb_pkts; nr++) {
p = >packets[idx];
-   if (p->status != MEMNIC_PKT_ST_FILLED)
+   if (unlikely(p->status != MEMNIC_PKT_ST_FILLED))
break;
/* prefetch the next area */
next = idx;
-   if (++next >= MEMNIC_NR_PACKET)
+   if (unlikely(++next >= MEMNIC_NR_PACKET))
next = 0;
rte_prefetch0(>packets[next]);
-   if (p->len > framesz) {
+   if (unlikely(p->len > framesz)) {
errs++;
goto drop;
}
mb = rte_pktmbuf_alloc(adapter->mp);
-   if (!mb)
+   if (unlikely(!mb))
break;

rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, p->len);
@@ -350,7 +350,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
uint64_t pkts, bytes, errs;
uint32_t framesz = adapter->framesz;

-   if (!adapter->nic->hdr.valid)
+   if (unlikely(!adapter->nic->hdr.valid))
return 0;

pkts = bytes = errs = 0;
@@ -360,7 +360,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
struct rte_mbuf *sg;
void *ptr;

-   if (pkt_len > framesz) {
+   if (unlikely(pkt_len > framesz)) {
errs++;
break;
}
@@ -379,7 +379,7 @@ retry:
goto retry;
}

-   if (idx != ACCESS_ONCE(adapter->down_idx)) {
+   if (unlikely(idx != ACCESS_ONCE(adapter->down_idx))) {
/*
 * host freed this and got false positive,
 * need to recover the status and retry.
@@ -388,7 +388,7 @@ retry:
goto retry;
}

-   if (++idx >= MEMNIC_NR_PACKET)
+   if (unlikely(++idx >= MEMNIC_NR_PACKET))
idx = 0;
adapter->down_idx = idx;

-- 
1.8.3.1



[dpdk-dev] [memnic PATCH v2 5/7] pmd: packet receiving optimization with prefetch

2014-09-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Prefetch the next packet area to reduce memory stall cycles.

Prefetching the next packet area can hide memory stalls, because the next
area is accessed right after the current receive operation is processed.

We can see performance improvements with memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 4.59Mpps | 5.54Mpps
  128 | 4.87Mpps | 5.46Mpps
  256 | 4.72Mpps | 5.21Mpps
  512 | 4.41Mpps | 4.50Mpps
 1024 | 3.64Mpps | 3.71Mpps
 1280 | 3.15Mpps | 3.21Mpps
 1518 | 2.87Mpps | 2.92Mpps
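
The loop structure can be sketched without DPDK as below. This is an
illustration under assumptions, not the PMD code: __builtin_prefetch (a
GCC/Clang builtin) stands in for rte_prefetch0(), and demo_packet,
DEMO_NR_PACKET and demo_recv() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_PACKET 8	/* hypothetical ring size */
#define DEMO_ST_FREE   0
#define DEMO_ST_FILLED 1

/* Hypothetical, reduced stand-in for struct memnic_packet. */
struct demo_packet {
	uint32_t status;
	uint32_t len;
};

/* Consumes filled slots starting at *idx_io; returns the number consumed. */
static unsigned demo_recv(struct demo_packet *ring, int *idx_io, unsigned max)
{
	unsigned nr;
	int idx, next;

	assert(ring != NULL && idx_io != NULL);
	idx = *idx_io;
	for (nr = 0; nr < max; nr++) {
		struct demo_packet *p = &ring[idx];

		if (p->status != DEMO_ST_FILLED)
			break;
		/* compute the next slot once and prefetch it for reading */
		next = idx;
		if (++next >= DEMO_NR_PACKET)
			next = 0;
		__builtin_prefetch(&ring[next], 0, 3);

		/* packet processing would happen here */
		p->status = DEMO_ST_FREE;
		idx = next;	/* reuse the already computed index */
	}
	*idx_io = idx;
	return nr;
}
```

Computing next before the packet is processed is what lets the loop end with
idx = next instead of a second wrap-around check.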

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 0783440..7fc3093 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -286,7 +286,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
uint16_t nr;
uint64_t pkts, bytes, errs;
uint32_t framesz = adapter->framesz;
-   int idx;
+   int idx, next;
struct rte_eth_stats *st = >stats[rte_lcore_id()];

if (!adapter->nic->hdr.valid)
@@ -298,6 +298,11 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
p = >packets[idx];
if (p->status != MEMNIC_PKT_ST_FILLED)
break;
+   /* prefetch the next area */
+   next = idx;
+   if (++next >= MEMNIC_NR_PACKET)
+   next = 0;
+   rte_prefetch0(>packets[next]);
if (p->len > framesz) {
errs++;
goto drop;
@@ -318,9 +323,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
 drop:
rte_compiler_barrier();
p->status = MEMNIC_PKT_ST_FREE;
-
-   if (++idx >= MEMNIC_NR_PACKET)
-   idx = 0;
+   idx = next;
}
adapter->up_idx = idx;

-- 
1.8.3.1



[dpdk-dev] [memnic PATCH v2 4/7] pmd: use compiler barrier

2014-09-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

x86 preserves store ordering for standard store operations.

A memory barrier is very expensive in the main packet processing loop.
Replacing it with a compiler barrier improves xmit/recv packet performance.

We can see performance improvements with memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 4.18Mpps | 4.59Mpps
  128 | 3.85Mpps | 4.87Mpps
  256 | 4.01Mpps | 4.72Mpps
  512 | 3.52Mpps | 4.41Mpps
 1024 | 3.18Mpps | 3.64Mpps
 1280 | 2.86Mpps | 3.15Mpps
 1518 | 2.59Mpps | 2.87Mpps

Note: care is needed if non-temporal stores are used, because they are not
covered by this ordering guarantee.
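
A minimal sketch of the publish pattern, assuming GCC/Clang inline asm. The
macro below is an assumed equivalent of rte_compiler_barrier(), and demo_slot
and demo_publish() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed equivalent of rte_compiler_barrier(): emits no instruction. */
#define demo_compiler_barrier() __asm__ __volatile__("" ::: "memory")

#define DEMO_ST_FREE   0
#define DEMO_ST_FILLED 1

/* Hypothetical, reduced stand-in for a shared packet slot. */
struct demo_slot {
	uint32_t len;
	uint32_t status;
};

/* Writes the payload field first, then publishes the slot via its status. */
static void demo_publish(struct demo_slot *s, uint32_t len)
{
	assert(s != NULL);
	s->len = len;
	/* keeps the compiler from moving the status store above the len store */
	demo_compiler_barrier();
	s->status = DEMO_ST_FILLED;
}
```

On x86 this relies on the CPU keeping ordinary stores in program order. On a
weakly ordered architecture, a real write barrier would still be needed.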

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 872f3c4..0783440 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -316,7 +316,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
bytes += p->len;

 drop:
-   rte_mb();
+   rte_compiler_barrier();
p->status = MEMNIC_PKT_ST_FREE;

if (++idx >= MEMNIC_NR_PACKET)
@@ -403,7 +403,7 @@ retry:
pkts++;
bytes += pkt_len;

-   rte_mb();
+   rte_compiler_barrier();
p->status = MEMNIC_PKT_ST_FILLED;

rte_pktmbuf_free(tx_pkts[nr]);
-- 
1.8.3.1



[dpdk-dev] [memnic PATCH v2 3/7] pmd: use helper macros

2014-09-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Do not touch pktmbuf directly.

Instead of direct access, use rte_pktmbuf_pkt_len() and rte_pktmbuf_data_len()
to access these fields.
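
To show why the helpers can appear on the left side of an assignment, here is
a sketch with hypothetical demo_ names. It assumes the helpers are macros that
expand to the struct field, which is how the patch uses them. The real
rte_mbuf layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the mbuf and its packet metadata. */
struct demo_pkt {
	uint32_t pkt_len;
	uint16_t data_len;
};

struct demo_mbuf {
	struct demo_pkt pkt;
};

/* Accessor macros expand to the field itself, so they are valid lvalues. */
#define demo_pktmbuf_pkt_len(m)  ((m)->pkt.pkt_len)
#define demo_pktmbuf_data_len(m) ((m)->pkt.data_len)

/* Sets both lengths for a single-segment packet through the accessors. */
static void demo_set_len(struct demo_mbuf *mb, uint16_t len)
{
	assert(mb != NULL);
	demo_pktmbuf_pkt_len(mb) = len;
	demo_pktmbuf_data_len(mb) = len;
}
```

If the struct layout changes later, only the macros need updating and callers
such as demo_set_len() stay the same.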

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index bbb5380..872f3c4 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -308,8 +308,8 @@ static uint16_t memnic_recv_pkts(void *rx_queue,

rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, p->len);
mb->pkt.in_port = q->port_id;
-   mb->pkt.pkt_len = p->len;
-   mb->pkt.data_len = p->len;
+   rte_pktmbuf_pkt_len(mb) = p->len;
+   rte_pktmbuf_data_len(mb) = p->len;
rx_pkts[nr] = mb;

pkts++;
@@ -394,7 +394,7 @@ retry:
ptr = p->data;
for (sg = tx_pkts[nr]; sg; sg = sg->pkt.next) {
void *src = rte_pktmbuf_mtod(sg, void *);
-   int data_len = sg->pkt.data_len;
+   int data_len = rte_pktmbuf_data_len(sg);

rte_memcpy(ptr, src, data_len);
ptr += data_len;
-- 
1.8.3.1



[dpdk-dev] [memnic PATCH v2 2/7] pmd: remove needless assignment

2014-09-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

These assignments are already done in rte_pktmbuf_alloc(), so remove them.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 994ed0a..bbb5380 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -308,8 +308,6 @@ static uint16_t memnic_recv_pkts(void *rx_queue,

rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, p->len);
mb->pkt.in_port = q->port_id;
-   mb->pkt.nb_segs = 1;
-   mb->pkt.next = NULL;
mb->pkt.pkt_len = p->len;
mb->pkt.data_len = p->len;
rx_pkts[nr] = mb;
-- 
1.8.3.1



[dpdk-dev] [memnic PATCH v2 1/7] guest: memnic-tester: PMD benchmark in guest

2014-09-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Introduce memnic-tester which benchmarks MEMNIC PMD performance in guest.

It starts two threads: one thread produces and consumes packets, and the
other thread receives packets and directly transmits the received
packets. This evaluates the running cost of the MEMNIC PMD.

memnic-tester is a benchmark tool that measures the performance of the MEMNIC
PMD itself.
The master thread forwards packets with Rx and Tx bursts.
The slave thread fills and clears packets in the lightest way. It does not send
packets out of the VM, because that would increase jitter and hide PMD
performance.
Throughput (number of forwarded packets per second) is given for each frame
size.

The master thread does rx_burst and tx_burst through MEMNIC PMD.
+---------+
| master  |
+---------+
 rx_burst ^ | tx_burst
          | V
    +------+------+
    |  up  | down | MEMNIC shared memory
    +------+------+
 set flag ^ | unset flag
          | V
+---------+
|  slave  |
+---------+
The slave thread emulates packet-in/out by setting flag on/off.

It reports throughput for the following frame sizes:
  64, 128, 256, 512, 1024, 1280, 1518
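
The slave side of the diagram can be pictured with the small sketch below. It
is a single-threaded illustration with hypothetical names (demo_slot,
demo_slave_step), not the code in guest/memnic-tester.c: one step sets the
flag on an "up" slot so the master can receive it, and unsets the flag on a
"down" slot the master has transmitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ST_FREE   0
#define DEMO_ST_FILLED 1

/* Hypothetical, reduced stand-in for a slot in the shared memory area. */
struct demo_slot {
	uint32_t status;
	uint32_t len;
};

/* One slave step: emulate packet-in on "up" and packet-out on "down". */
static void demo_slave_step(struct demo_slot *up, struct demo_slot *down,
			    uint32_t frame_len)
{
	assert(up != NULL && down != NULL);

	/* packet-in: mark a free "up" slot as filled with the test frame size */
	if (up->status == DEMO_ST_FREE) {
		up->len = frame_len;
		up->status = DEMO_ST_FILLED;
	}
	/* packet-out: release a "down" slot that the master has filled */
	if (down->status == DEMO_ST_FILLED)
		down->status = DEMO_ST_FREE;
}
```

No payload is copied in this step, which matches the intent of keeping the
slave as light as possible so the measurement reflects the PMD cost.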

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 guest/Makefile|  20 
 guest/README.rst  |  93 +
 guest/memnic-tester.c | 281 ++
 3 files changed, 394 insertions(+)
 create mode 100644 guest/Makefile
 create mode 100644 guest/README.rst
 create mode 100644 guest/memnic-tester.c

diff --git a/guest/Makefile b/guest/Makefile
new file mode 100644
index 000..3c90350
--- /dev/null
+++ b/guest/Makefile
@@ -0,0 +1,20 @@
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+ifeq ($(RTE_TARGET),)
+$(error "Please define RTE_TARGET environment variable")
+endif
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+COMMON_INC_OPT = -I $(PWD)/../common
+
+APP = memnic-tester
+
+CFLAGS += -Wall -g -O3 $(COMMON_INC_OPT)
+
+SRCS-y := memnic-tester.c
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/guest/README.rst b/guest/README.rst
new file mode 100644
index 000..eb230b0
--- /dev/null
+++ b/guest/README.rst
@@ -0,0 +1,93 @@
+.. Copyright 2014 NEC
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+   - Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+   - Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+   FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+   COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+   INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+   (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+   HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+   STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+   OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MEMNIC TESTER
+=
+
+DESCRIPTION
+---
+
+It is a simple benchmark test of MEMNIC PMD in guest.
+
+It has two threads: one thread produces and consumes packets, the
+other thread receives packets and directly transmits the received
+packets back to the MEMNIC interface. This evaluates MEMNIC PMD running cost.
+
+memnic-tester is a benchmark tool to measure performance of MEMNIC PMD itself.
+The master thread forwards packets with Rx and Tx bursts.
+The slave thread fills and clears packets in the lightest way. It doesn't get
+packets out of the VM because it would increase jitter and hide PMD performance.
+Throughput (number of forwarded packets per second) is given for each frame size.
+
+The master thread does rx_burst and tx_burst through MEMNIC PMD.
++---------+
+| master  |
++---------+
+ rx_burst ^ | tx_burst
+          | V
+    +------+------+
+    |  up  | down | MEMNIC shared memory
+    +------+------+
+ set flag ^ | unset flag
+          | V
++---------+
+|  slave  |
++---------+
+The slave thread emulates packet-in/out by setting flag on/off.
+
+As in RFC 2544, evaluations are performed with packets of the frame sizes below.
+  64, 128, 25

[dpdk-dev] [memnic PATCH v2 0/7] MEMNIC PMD performance improvement

2014-09-30 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

This patchset improves MEMNIC PMD performance.

The first patch introduces a new benchmark test run in guest,
and will be used to evaluate the following patch effects.

This patchset improves the throughput results of memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 4.18Mpps | 5.83Mpps
  128 | 3.85Mpps | 5.71Mpps
  256 | 4.01Mpps | 5.40Mpps
  512 | 3.52Mpps | 4.64Mpps
 1024 | 3.18Mpps | 3.68Mpps
 1280 | 2.86Mpps | 3.17Mpps
 1518 | 2.59Mpps | 2.90Mpps

Hiroshi Shimamoto (7):
  guest: memnic-tester: PMD benchmark in guest
  pmd: remove needless assignment
  pmd: use helper macros
  pmd: use compiler barrier
  pmd: packet receiving optimization with prefetch
  pmd: add branch hint in recv/xmit
  pmd: burst mbuf freeing in xmit

 guest/Makefile|  20 
 guest/README.rst  |  93 +
 guest/memnic-tester.c | 281 ++
 pmd/pmd_memnic.c  |  45 
 4 files changed, 417 insertions(+), 22 deletions(-)
 create mode 100644 guest/Makefile
 create mode 100644 guest/README.rst
 create mode 100644 guest/memnic-tester.c

-- 
1.8.3.1



[dpdk-dev] DPDK doesn't work with iommu=pt

2014-09-30 Thread Hiroshi Shimamoto

> Subject: Re: [dpdk-dev] DPDK doesn't work with iommu=pt
> 
> 
> 
> On Mon, Sep 29, 2014 at 2:53 AM, Hiroshi Shimamoto  ct.jp.nec.com> wrote:
> > Hi,
> >
> >> Subject: Re: [dpdk-dev] DPDK doesn't work with iommu=pt
> >>
> >> iommu=pt effectively disables iommu for the kernel and iommu is
> >> enabled only for KVM.
> >> http://lwn.net/Articles/329174/
> >
> > thanks for pointing that.
> >
> > Okay, I think DPDK cannot handle IOMMU because of no kernel code in
> > DPDK application.
> >
> > And now, I think "iommu=pt" doesn't work correctly DMA on host PMD
> > causes DMAR fault which means IOMMU catches a wrong operation.
> > Will dig around "iommu=pt".
> >
> I agree with your analysis. It seems that a fairly recent patch (3~4 months
> old) has introduced a bug that confuses unprotected DMA access by the device
> with an IOMMU access, and produces the equivalent of a page fault.
> 
> >>
> >> Basically unless you have KVM running you can remove both lines for
> >> the same effect.
> >> On the other hand if you do have KVM and you do want iommu=on You can
> >> remove the iommu=pt for the same performance because AFAIK unlike the
> >> kernel drivers DPDK doesn't dma_map and dma_unmap each and every
> >> ingress/egress packet (Please correct me if I'm wrong), and will not
> >> suffer any performance penalties.
> >
> > I also tried "iommu=on", but it didn't fix the issue.
> > I saw the same error messages in kernel.
> >
> 
> Just to clarify, what I suggested you try is leaving only the string
> "intel_iommu=on" in the command line, without iommu=pt.
> But this would work iff DPDK can handle IOVAs (I/O virtual addresses).

Okay, I tried with only "intel_iommu=on", but nothing changed.

By the way, after several tests and some investigation, I think the issue comes
from the missing DMAR entry for hardware pass-through mode.
So using VFIO, which always turns the IOMMU on, seems to solve my issue.

After unbinding the devices from igb_uio and binding them to vfio-pci, testpmd
appears to work.

thanks,
Hiroshi

> 
> >   [   46.978097] dmar: DRHD: handling fault status reg 2
> >   [   46.978120] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 
> > aa01
> >   DMAR:[fault reason 02] Present bit in context entry is clear
> >
> > thanks,
> > Hiroshi
> >
> >>
> >> FYI. Kernel NIC drivers:
> >> When iommu=on{,strict} the kernel network drivers will suffer a heavy
> >> performance penalty due to regular IOVA modifications (both HW and SW
> >> at fault here). Ixgbe and Mellanox reuse dma_mapped pages on the
> >> receive side to avoid this penalty, but still suffer from iommu on TX.
> >>
> >> On Fri, Sep 26, 2014 at 5:47 PM, Choi, Sy Jong  
> >> wrote:
> >> > Hi Shimamoto-san,
> >> >
> >> > There are a lot of sightings related to "DMAR:[fault reason 06] PTE Read
> >> > access is not set"
> >> > https://www.mail-archive.com/kvm at vger.kernel.org/msg106573.html
> >> >
> >> > This might be related to IOMMU, and kernel code.
> >> >
> >> > Here is what we know :-
> >> > 1) Disabling VT-d in bios also removed the symptom
> >> > 2) Switch to another OS distribution also removed the symptom
> >> > 3) even different HW we will not see the symptom. In my case, switch 
> >> > from Engineering board to EPSD board.
> >> >
> >> > Regards,
> >> > Choi, Sy Jong
> >> > Platform Application Engineer
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
> >> > Sent: Friday, September 26, 2014 5:14 PM
> >> > To: dev at dpdk.org
> >> > Cc: Hayato Momma
> >> > Subject: [dpdk-dev] DPDK doesn't work with iommu=pt
> >> >
> >> > I encountered an issue that DPDK doesn't work with "iommu=pt 
> >> > intel_iommu=on"
> >> > on HP ProLiant DL380p Gen8 server. I'm using the following environment;
> >> >
> >> >   HW: ProLiant DL380p Gen8
> >> >   CPU: E5-2697 v2
> >> >   OS: RHEL7
> >> >   kernel: kernel-3.10.0-123 and the latest kernel 3.17-rc6+
> >> >   DPDK: v1.7.1-53-gce5abac
> >> >   NIC: 82599ES
> >> >
> >> > When boot with "iommu=pt intel_iommu=on", I got the bel

[dpdk-dev] DPDK doesn't work with iommu=pt

2014-09-29 Thread Hiroshi Shimamoto
Hi,

> Subject: Re: [dpdk-dev] DPDK doesn't work with iommu=pt
> 
> iommu=pt effectively disables iommu for the kernel and iommu is
> enabled only for KVM.
> http://lwn.net/Articles/329174/

Thanks for pointing that out.

Okay, I think DPDK cannot handle the IOMMU because a DPDK application has no
kernel code.

And now I think "iommu=pt" doesn't work correctly: DMA from the host PMD
causes a DMAR fault, which means the IOMMU catches a wrong operation.
I will dig into "iommu=pt".

> 
> Basically unless you have KVM running you can remove both lines for
> the same effect.
> On the other hand if you do have KVM and you do want iommu=on You can
> remove the iommu=pt for the same performance because AFAIK unlike the
> kernel drivers DPDK doesn't dma_map and dma_unmap each and every
> ingress/egress packet (Please correct me if I'm wrong), and will not
> suffer any performance penalties.

I also tried "iommu=on", but it didn't fix the issue.
I saw the same error messages in kernel.

  [   46.978097] dmar: DRHD: handling fault status reg 2
  [   46.978120] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr aa01
  DMAR:[fault reason 02] Present bit in context entry is clear

thanks,
Hiroshi

> 
> FYI. Kernel NIC drivers:
> When iommu=on{,strict} the kernel network drivers will suffer a heavy
> performance penalty due to regular IOVA modifications (both HW and SW
> at fault here). Ixgbe and Mellanox reuse dma_mapped pages on the
> receive side to avoid this penalty, but still suffer from iommu on TX.
> 
> On Fri, Sep 26, 2014 at 5:47 PM, Choi, Sy Jong  
> wrote:
> > Hi Shimamoto-san,
> >
> > There are a lot of sightings related to "DMAR:[fault reason 06] PTE Read
> > access is not set"
> > https://www.mail-archive.com/kvm at vger.kernel.org/msg106573.html
> >
> > This might be related to IOMMU, and kernel code.
> >
> > Here is what we know :-
> > 1) Disabling VT-d in bios also removed the symptom
> > 2) Switch to another OS distribution also removed the symptom
> > 3) even different HW we will not see the symptom. In my case, switch from 
> > Engineering board to EPSD board.
> >
> > Regards,
> > Choi, Sy Jong
> > Platform Application Engineer
> >
> >
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
> > Sent: Friday, September 26, 2014 5:14 PM
> > To: dev at dpdk.org
> > Cc: Hayato Momma
> > Subject: [dpdk-dev] DPDK doesn't work with iommu=pt
> >
> > I encountered an issue that DPDK doesn't work with "iommu=pt intel_iommu=on"
> > on HP ProLiant DL380p Gen8 server. I'm using the following environment;
> >
> >   HW: ProLiant DL380p Gen8
> >   CPU: E5-2697 v2
> >   OS: RHEL7
> >   kernel: kernel-3.10.0-123 and the latest kernel 3.17-rc6+
> >   DPDK: v1.7.1-53-gce5abac
> >   NIC: 82599ES
> >
> > When boot with "iommu=pt intel_iommu=on", I got the below message and no 
> > packets are handled.
> >
> >   [  120.809611] dmar: DRHD: handling fault status reg 2
> >   [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 
> > aa01
> >   DMAR:[fault reason 02] Present bit in context entry is clear
> >
> > How to reproduce;
> > just run testpmd
> > # ./testpmd -c 0xf -n 4 -- -i
> >
> > Configuring Port 0 (socket 0)
> > PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x754eafc0 
> > hw_ring=0x7420 dma_addr=0xaa00
> > PMD: ixgbe_dev_tx_queue_setup(): Using full-featured tx code path
> > PMD: ixgbe_dev_tx_queue_setup():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
> > PMD: ixgbe_dev_tx_queue_setup():  - tx_rs_thresh = 32 
> > [RTE_PMD_IXGBE_TX_MAX_BURST=32]
> > PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 
> > hw_ring=0x7421 dma_addr=0xaa01
> > PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc 
> > Preconditions: rxq->rx_free_thresh=0,
> RTE_PMD_IXGBE_RX_MAX_BURST=32
> > PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are not 
> > satisfied, Scattered Rx is requested, or
> RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not enabled (port=0, queue=0).
> > PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc 
> > Preconditions: rxq->rx_free_thresh=0,
> RTE_PMD_IXGBE_RX_MAX_BURST=32
> >
> > testpmd> start
> >   io packet forwarding - CRC stripping disabled - packets/burst=32
> >   nb forwarding cores=1 - nb forwarding ports=2
> >   RX queues=1 - RX desc=128 - RX free threshold=0
> >   RX threshold registers: pthresh=8 hthresh=8 wthresh=0

[dpdk-dev] DPDK doesn't work with iommu=pt

2014-09-29 Thread Hiroshi Shimamoto
Hi,

> Subject: RE: DPDK doesn't work with iommu=pt
> 
> Met the similar issue before.
> VT-d enabled? If so you may need to contact HP to upgrade the BIOS or you may 
> disable VT-d and remove iommu=pt intel_iommu=on
> if you don't need VF function.

We need VT-d and it is enabled.
We want to use SR-IOV functionality and a DPDK application concurrently on
the same box.

thanks,
Hiroshi

> 
> >-----Original Message-----
> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
> >Sent: Friday, September 26, 2014 5:14 PM
> >To: dev at dpdk.org
> >Cc: Hayato Momma
> >Subject: [dpdk-dev] DPDK doesn't work with iommu=pt
> >
> >I encountered an issue that DPDK doesn't work with "iommu=pt
> >intel_iommu=on"
> >on HP ProLiant DL380p Gen8 server. I'm using the following environment;
> >
> >  HW: ProLiant DL380p Gen8
> >  CPU: E5-2697 v2
> >  OS: RHEL7
> >  kernel: kernel-3.10.0-123 and the latest kernel 3.17-rc6+
> >  DPDK: v1.7.1-53-gce5abac
> >  NIC: 82599ES
> >
> >When boot with "iommu=pt intel_iommu=on", I got the below message and no
> >packets are handled.
> >
> >  [  120.809611] dmar: DRHD: handling fault status reg 2
> >  [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr
> >aa01
> >  DMAR:[fault reason 02] Present bit in context entry is clear
> >
> >How to reproduce;
> >just run testpmd
> ># ./testpmd -c 0xf -n 4 -- -i
> >
> >Configuring Port 0 (socket 0)
> >PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x754eafc0
> >hw_ring=0x7420 dma_addr=0xaa00
> >PMD: ixgbe_dev_tx_queue_setup(): Using full-featured tx code path
> >PMD: ixgbe_dev_tx_queue_setup():  - txq_flags = 0
> >[IXGBE_SIMPLE_FLAGS=f01]
> >PMD: ixgbe_dev_tx_queue_setup():  - tx_rs_thresh = 32
> >[RTE_PMD_IXGBE_TX_MAX_BURST=32]
> >PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740
> >hw_ring=0x7421 dma_addr=0xaa01
> >PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
> >Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
> >PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are not
> >satisfied, Scattered Rx is requested, or
> >RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not enabled (port=0, queue=0).
> >PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
> >Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
> >
> >testpmd> start
> >  io packet forwarding - CRC stripping disabled - packets/burst=32
> >  nb forwarding cores=1 - nb forwarding ports=2
> >  RX queues=1 - RX desc=128 - RX free threshold=0
> >  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
> >  TX queues=1 - TX desc=512 - TX free threshold=0
> >  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
> >  TX RS bit threshold=0 - TXQ flags=0x0
> >
> >
> >and ping from another box to this server.
> ># ping6 -I eth2 ff02::1
> >
> >I got the error message below and no packet is received.
> >I couldn't see any increase in the RX/TX counts in the testpmd statistics.
> >
> >testpmd> show port stats 0
> >
> >   NIC statistics for port 0
> >
> >  RX-packets: 6  RX-missed: 0  RX-bytes:  732
> >  RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
> >  RX-nombuf:  0
> >  TX-packets: 0  TX-errors: 0  TX-bytes:  0
> >
> >#
> >###
> >testpmd> show port stats 0
> >
> >   NIC statistics for port 0
> >
> >  RX-packets: 6  RX-missed: 0  RX-bytes:  732
> >  RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
> >  RX-nombuf:  0
> >  TX-packets: 0  TX-errors: 0  TX-bytes:  0
> >
> >#
> >###
> >
> >
> >The fault addr in error message must be RX DMA descriptor
> >
> >error message
> >  [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr
> >aa01
> >
> >log in testpmd
> >  PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740
> >hw_ring=0x7421 dma_addr=0xaa01
> >
> >I think the NIC received a packet in fifo and try to put into memory with 
> >DMA.
> >Before starting DMA, the NIC get the target address from RX descriptors in
> >RDBA register.
>

[dpdk-dev] DPDK doesn't work with iommu=pt

2014-09-26 Thread Hiroshi Shimamoto
I encountered an issue where DPDK doesn't work with "iommu=pt intel_iommu=on"
on an HP ProLiant DL380p Gen8 server. I'm using the following environment:

  HW: ProLiant DL380p Gen8
  CPU: E5-2697 v2
  OS: RHEL7 
  kernel: kernel-3.10.0-123 and the latest kernel 3.17-rc6+
  DPDK: v1.7.1-53-gce5abac
  NIC: 82599ES

When booting with "iommu=pt intel_iommu=on", I get the message below and
no packets are handled.

  [  120.809611] dmar: DRHD: handling fault status reg 2
  [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr aa01
  DMAR:[fault reason 02] Present bit in context entry is clear

How to reproduce:
just run testpmd
# ./testpmd -c 0xf -n 4 -- -i

Configuring Port 0 (socket 0)
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x754eafc0 hw_ring=0x7420 dma_addr=0xaa00
PMD: ixgbe_dev_tx_queue_setup(): Using full-featured tx code path
PMD: ixgbe_dev_tx_queue_setup():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
PMD: ixgbe_dev_tx_queue_setup():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE_TX_MAX_BURST=32]
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 hw_ring=0x7421 dma_addr=0xaa01
PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are not satisfied, Scattered Rx is requested, or RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not enabled (port=0, queue=0).
PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32

testpmd> start
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0


and ping from another box to this server.
# ping6 -I eth2 ff02::1

I got the error message below and no packets were received.
I couldn't see any increase in the RX/TX counts in the testpmd statistics.

testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 6  RX-missed: 0  RX-bytes:  732
  RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0
  
testpmd> show port stats 0

   NIC statistics for port 0  
  RX-packets: 6  RX-missed: 0  RX-bytes:  732
  RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 0  TX-errors: 0  TX-bytes:  0
  


The fault address in the error message must be the RX DMA descriptor address.

error message
  [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr aa01

log in testpmd
  PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 hw_ring=0x7421 dma_addr=0xaa01

I think the NIC received a packet into its FIFO and tried to put it into memory
with DMA. Before starting the DMA, the NIC gets the target address from the RX
descriptors located via the RDBA register.
But the access to the RX descriptors failed in the IOMMU, which reported it to
the kernel.

  DMAR:[fault reason 02] Present bit in context entry is clear

The error message suggests that there is no valid entry in the IOMMU.

I think the following issue is very similar, but switching to Ubuntu 14.04 did
not fix it in my case.
http://thread.gmane.org/gmane.comp.networking.dpdk.devel/2281

I tried Ubuntu14.04.1 and got the below error.

  [  199.710191] dmar: DRHD: handling fault status reg 2
  [  199.710896] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 7c24df000
  [  199.710896] DMAR:[fault reason 06] PTE Read access is not set

So far I have seen this issue only on the HP ProLiant DL380p Gen8.
Does anyone have any idea?
Has anyone else noticed this issue?

Note: we're planning to use SR-IOV and a DPDK app on the same box.
The box has 2 NICs: one for SR-IOV, passed through to a VM, and one (no SR-IOV)
for the DPDK app on the host.

thanks,
Hiroshi


[dpdk-dev] [memnic PATCH 7/7] pmd: split calling mbuf free

2014-09-25 Thread Hiroshi Shimamoto
Hi Thomas, Keith,

> Subject: Re: [dpdk-dev] [memnic PATCH 7/7] pmd: split calling mbuf free
> 
> 
> On Sep 24, 2014, at 10:20 AM, Thomas Monjalon  
> wrote:
> 
> > 2014-09-11 07:52, Hiroshi Shimamoto:
> >> @@ -408,9 +408,9 @@ retry:
> >>
> >>rte_compiler_barrier();
> >>p->status = MEMNIC_PKT_ST_FILLED;
> >> -
> >> -  rte_pktmbuf_free(tx_pkts[nr]);
> >>}
> >> +  for (i = 0; i < nr; i++)
> >> +  rte_pktmbuf_free(tx_pkts[i]);
> >>
> >>/* stats */
> >>st->opackets += pkts;
> >>
> >
> > You are bursting mbuf freeing. Why title is about "split??

I thought of this patch as splitting the main loop operations into putting
content and freeing mbufs, hence the word "split", but I see that
"burst mbuf freeing" is preferable.

> 
> Maybe this should be a new API as in rte_pktmbuf_bulk_free(tx_pkts, nr); ??
> This would remove the loop in the application and I know I have done the same 
> thing for Pktgen too.

Good point. Yes, I think that having a new API like
rte_pktmbuf_(alloc|free)_bulk()
would help to reduce TLS accesses and gain performance.
I put that on my stack, but haven't had time for it yet.

Do you have any plan to do such a thing?
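
To illustrate the point about bulk freeing, here is a minimal, self-contained
sketch in plain C. It does not use DPDK: obj_cache, cache_get(), obj_free() and
obj_free_bulk() are hypothetical names standing in for a per-thread mempool
cache and for an API in the style of rte_pktmbuf_(alloc|free)_bulk(). The only
thing it tries to show is that a bulk call can resolve the thread-local cache
once per burst instead of once per object.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 64

/* Hypothetical per-thread object cache, standing in for a mempool cache. */
struct obj_cache {
    void *objs[CACHE_SIZE];
    size_t len;
    unsigned long lookups;   /* number of thread-local lookups so far */
};

_Thread_local struct obj_cache tls_cache;

/* Every call models one thread-local (TLS) access. */
struct obj_cache *cache_get(void)
{
    tls_cache.lookups++;
    return &tls_cache;
}

/* Per-object free: one TLS access per object. Returns 0, or -1 if full. */
int obj_free(void *obj)
{
    struct obj_cache *c = cache_get();

    if (c->len == CACHE_SIZE)
        return -1;
    c->objs[c->len++] = obj;
    return 0;
}

/* Bulk free: one TLS access for the whole array. Returns objects freed. */
size_t obj_free_bulk(void **objs, size_t n)
{
    struct obj_cache *c = cache_get();
    size_t i;

    assert(objs != NULL || n == 0);
    for (i = 0; i < n && c->len < CACHE_SIZE; i++)
        c->objs[c->len++] = objs[i];
    return i;
}
```

With a burst of nr packets, the per-object path performs nr thread-local
lookups in this model, while the bulk path performs one.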

thanks,
Hiroshi

> >
> > --
> > Thomas
> 
> Keith Wiles, Principal Technologist with CTO office, Wind River mobile 
> 972-213-5533



[dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier

2014-09-25 Thread Hiroshi Shimamoto
> Subject: Re: [dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier
> 
> 2014-09-11 07:48, Hiroshi Shimamoto:
> > x86 can keep store ordering with standard operations.
> 
> Are we sure it's always the case (including old 32-bit CPU)?
> I would prefer to have a reference here. I know we already discussed
> this kind of things but having a reference in commit log could help
> for future discussions.
> 
> > Using memory barrier is much expensive in main packet processing loop.
> > Removing this improves xmit/recv packet performance.
> >
> > We can see performance improvements with memnic-tester.
> > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> >  size |  before  |  after
> >64 | 4.18Mpps | 4.59Mpps
> >   128 | 3.85Mpps | 4.87Mpps
> >   256 | 4.01Mpps | 4.72Mpps
> >   512 | 3.52Mpps | 4.41Mpps
> >  1024 | 3.18Mpps | 3.64Mpps
> >  1280 | 2.86Mpps | 3.15Mpps
> >  1518 | 2.59Mpps | 2.87Mpps
> >
> > Note: we have to take care if we use temporal cache.
> 
> Please, could you explain this last sentence?

Oops, I used the wrong word: "temporal" should be "non-temporal".

To explain, there are some instructions which use non-temporal
stores, like the MOVNTx series.
Store ordering is not kept for these instructions.

Ref. Intel Software Developer Manual
 Vol.1 10.4.6.2 Caching of Temporal vs. Non-Temporal Data
 Vol.3 8.2 Memory Ordering
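
The ordering argument can be sketched as follows. This is an illustration only,
not the MEMNIC code: struct slot, slot_fill() and slot_drain() are made-up
names, and compiler_barrier() uses the GCC/Clang inline asm form as an assumed
stand-in for rte_compiler_barrier(). It relies on x86 keeping ordinary stores
in program order, so it would not be sufficient with non-temporal stores
(MOVNTx) or on architectures with weaker ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compiler-only barrier (GCC/Clang syntax): prevents the compiler from
 * reordering memory accesses across it, but emits no CPU instruction. */
#define compiler_barrier() __asm__ volatile("" ::: "memory")

enum { SLOT_FREE = 0, SLOT_FILLED = 1 };

struct slot {
    volatile uint32_t status;
    uint32_t len;
    uint8_t data[64];
};

/* Producer: write the payload first, then publish it by setting status. */
int slot_fill(struct slot *s, const void *buf, uint32_t len)
{
    assert(s != NULL && buf != NULL);
    if (s->status != SLOT_FREE || len > sizeof(s->data))
        return -1;
    memcpy(s->data, buf, len);
    s->len = len;
    compiler_barrier();          /* payload stores stay before the flag store */
    s->status = SLOT_FILLED;
    return 0;
}

/* Consumer: check status, copy the payload out, then release the slot. */
int slot_drain(struct slot *s, void *buf, uint32_t cap)
{
    uint32_t len;

    assert(s != NULL && buf != NULL);
    if (s->status != SLOT_FILLED)
        return -1;
    compiler_barrier();          /* payload loads stay after the flag load */
    len = s->len;
    if (len > cap)
        return -1;
    memcpy(buf, s->data, len);
    compiler_barrier();          /* payload loads stay before the release */
    s->status = SLOT_FREE;
    return (int)len;
}
```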

thanks,
Hiroshi

> 
> Thanks
> --
> Thomas


[dpdk-dev] [memnic PATCH 3/7] pmd: use helper macros

2014-09-25 Thread Hiroshi Shimamoto
> Subject: Re: [dpdk-dev] [memnic PATCH 3/7] pmd: use helper macros
> 
> 2014-09-11 07:47, Hiroshi Shimamoto:
> > Do not touch pktmbuf directly.
> >
> > Instead of direct access, use rte_pktmbuf_pkt_len() and 
> > rte_pktmbuf_data_len()
> > to access the property.
> 
> I guess this change is for compatibility with DPDK 1.8.

Yep, I thought we need to prepare for the upcoming code.

> Does it have an impact on performance?

No, it should not.

thanks,
Hiroshi

> 
> --
> Thomas


[dpdk-dev] [memnic PATCH 2/7] pmd: remove needless assignment

2014-09-25 Thread Hiroshi Shimamoto
> Subject: Re: [dpdk-dev] [memnic PATCH 2/7] pmd: remove needless assignment
> 
> 2014-09-11 07:47, Hiroshi Shimamoto:
> > Because these assignment are done in rte_pktmbuf_alloc(), get rid of them.
> 
> Is it increasing the performances?

I haven't tested it, because I don't think the difference would be noticeable.
It is just a cleanup that removes a few redundant instructions.

thanks,
Hiroshi

> 
> --
> Thomas


[dpdk-dev] [PATCH 0/3] eal affinitize low priority threads to lcore 0

2014-09-12 Thread Hiroshi Shimamoto
Hi Bruce,

> Subject: [dpdk-dev] [PATCH 0/3] eal affinitize low priority threads to lcore 0
> 
> This patchset sets things up so that we can affinitize the interrupt,
> vfio management, and hpet timer management threads to lcore 0, so that
> they never interfere with data plane threads.

I don't think it always works well.
The management threads can float across all cpus on demand, because those
threads are created before the master thread affinity is set. The kernel
scheduler will take care of it. And we should isolate the cpus which data plane
threads are pinned to, so that the management threads cannot run on those
isolated cpus.
In some cases, the user may run a data plane thread on lcore 0, but with
this patchset the data plane thread pinned to lcore 0 always shares the core
with the management threads. That doesn't seem good.

I think this functionality should be conditional.
How about adding a parameter to specify the mask for the management threads
instead of statically assigning them to lcore 0?
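
A rough sketch of what such a parameter could look like, assuming a hexadecimal
mask in the same style as -c COREMASK. parse_coremask() and choose_mgmt_mask()
are hypothetical helpers, not existing EAL functions; the resulting mask would
still have to be turned into a cpu_set_t and applied to the management threads,
for example with pthread_setaffinity_np().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal core mask such as "0x3" or "c".
 * Returns 0 on success, -1 on a malformed, empty or zero mask. */
int parse_coremask(const char *arg, uint64_t *mask)
{
    char *end;
    unsigned long long v;

    assert(arg != NULL && mask != NULL);
    /* Reject signs and leading whitespace that strtoull() would accept. */
    if (!isxdigit((unsigned char)arg[0]))
        return -1;
    errno = 0;
    v = strtoull(arg, &end, 16);
    if (errno != 0 || end == arg || *end != '\0' || v == 0)
        return -1;
    *mask = (uint64_t)v;
    return 0;
}

/* Choose the mask for management threads: the user-supplied one if given
 * (it must stay within all_cores), otherwise every core that is not used
 * by the data plane. Returns 0 on success, -1 if no valid mask results. */
int choose_mgmt_mask(const char *arg, uint64_t all_cores,
                     uint64_t dataplane, uint64_t *mgmt)
{
    assert(mgmt != NULL);
    if (arg != NULL) {
        if (parse_coremask(arg, mgmt) != 0)
            return -1;
        return (*mgmt & ~all_cores) ? -1 : 0;
    }
    *mgmt = all_cores & ~dataplane;
    return *mgmt != 0 ? 0 : -1;
}
```

Falling back to "all cores minus the data plane cores" when no mask is given is
only one possible default; keeping today's floating behavior would be another.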

thanks,
Hiroshi

> 
> Bruce Richardson (3):
>   eal: add core id param to  eal_thread_set_affinity
>   eal: increase scope of eal_thread_set_affinity
>   eal: affinitize low-priority threads to lcore 0
> 
>  lib/librte_eal/bsdapp/eal/eal_thread.c | 12 ++--
>  lib/librte_eal/common/include/eal_private.h| 10 ++
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c   |  5 +
>  lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c |  6 ++
>  lib/librte_eal/linuxapp/eal/eal_thread.c   | 12 ++--
>  lib/librte_eal/linuxapp/eal/eal_timer.c|  5 +
>  6 files changed, 38 insertions(+), 12 deletions(-)
> 
> --
> 1.9.3



[dpdk-dev] [memnic PATCH 0/7] MEMNIC PMD performance improvement

2014-09-11 Thread Hiroshi Shimamoto
Hi Mukawa-san,

> Subject: Re: [dpdk-dev] [memnic PATCH 0/7] MEMNIC PMD performance improvement
> 
> Hi Shimamoto-san,
> 
> 
> (2014/09/11 16:45), Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > This patchset improves MEMNIC PMD performance.
> >
> > The first patch introduces a new benchmark test run in guest,
> > and will be used to evaluate the following patch effects.
> >
> > This patchset improves the throughput results of memnic-tester.
> > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> How many cores are you actually using for sending and receiving?

In this case, each of the 4 vCPUs is pinned to a dedicated core,
so the answer is 4 cores; more precisely, the test DPDK app uses 2 of them.

> I guess 1 dedicated core is used for sending on host or guest side, and
> one more dedicated core is for receiving on the other side.
> And you've got a following performance result.
> Is this correct?

I think you can see the test details in the first patch.
The test is done in the guest only, because I just want to know the
PMD performance. The host does nothing in the test.
In the guest, one thread pinned to a dedicated core emulates packet
send/recv by turning a flag on/off. Another thread, also pinned to
a dedicated core, does rx_burst and tx_burst.
The test measures how many packets can be received and transmitted
by the MEMNIC PMD.
The result shows how much throughput the guest application can
achieve, provided that the host can send and receive packets fast
enough.

thanks,
Hiroshi

> 
> Thanks,
> Tetsuya Mukawa
> 
> >  size |  before  |  after
> >64 | 4.18Mpps | 5.83Mpps
> >   128 | 3.85Mpps | 5.71Mpps
> >   256 | 4.01Mpps | 5.40Mpps
> >   512 | 3.52Mpps | 4.64Mpps
> >  1024 | 3.18Mpps | 3.68Mpps
> >  1280 | 2.86Mpps | 3.17Mpps
> >  1518 | 2.59Mpps | 2.90Mpps
> >
> > Hiroshi Shimamoto (7):
> >   guest: memnic-tester: PMD benchmark in guest
> >   pmd: remove needless assignment
> >   pmd: use helper macros
> >   pmd: use compiler barrier
> >   pmd: packet receiving optimization with prefetch
> >   pmd: add branch hint in recv/xmit
> >   pmd: split calling mbuf free
> >
> >  guest/Makefile|  20 
> >  guest/README.rst  |  94 +
> >  guest/memnic-tester.c | 281 
> > ++
> >  pmd/pmd_memnic.c  |  43 
> >  4 files changed, 417 insertions(+), 21 deletions(-)
> >  create mode 100644 guest/Makefile
> >  create mode 100644 guest/README.rst
> >  create mode 100644 guest/memnic-tester.c
> >



[dpdk-dev] [memnic PATCH 7/7] pmd: split calling mbuf free

2014-09-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

In rte_pktmbuf_free(), there might be a cache miss/memory stall issue.
In the small packet case, it could harm the performance.

From the memnic-tester results, the performance improves for frame
sizes below 1024.

Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 5.55Mpps | 5.83Mpps
  128 | 5.44Mpps | 5.71Mpps
  256 | 5.22Mpps | 5.40Mpps
  512 | 4.52Mpps | 4.64Mpps
 1024 | 3.73Mpps | 3.68Mpps
 1280 | 3.22Mpps | 3.17Mpps
 1518 | 2.93Mpps | 2.90Mpps

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index cc0ae25..1db065f 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -344,7 +344,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
struct memnic_adapter *adapter = q->adapter;
struct memnic_data *data = &adapter->nic->down;
struct memnic_packet *p;
-   uint16_t nr;
+   uint16_t i, nr;
int idx;
struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];
uint64_t pkts, bytes, errs;
@@ -408,9 +408,9 @@ retry:

rte_compiler_barrier();
p->status = MEMNIC_PKT_ST_FILLED;
-
-   rte_pktmbuf_free(tx_pkts[nr]);
}
+   for (i = 0; i < nr; i++)
+   rte_pktmbuf_free(tx_pkts[i]);

/* stats */
st->opackets += pkts;
-- 
1.8.3.1



[dpdk-dev] [memnic PATCH 6/7] pmd: add branch hint in recv/xmit

2014-09-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

To reduce instruction cache misses, add branch condition hints to the
recv/xmit functions. This improves performance a bit.

We can see performance improvements with memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 5.54Mpps | 5.55Mpps
  128 | 5.46Mpps | 5.44Mpps
  256 | 5.21Mpps | 5.22Mpps
  512 | 4.50Mpps | 4.52Mpps
 1024 | 3.71Mpps | 3.73Mpps
 1280 | 3.21Mpps | 3.22Mpps
 1518 | 2.92Mpps | 2.93Mpps

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index dbe5033..cc0ae25 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -289,26 +289,26 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
int idx, next;
struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];

-   if (!adapter->nic->hdr.valid)
+   if (unlikely(!adapter->nic->hdr.valid))
return 0;

pkts = bytes = errs = 0;
idx = adapter->up_idx;
for (nr = 0; nr < nb_pkts; nr++) {
p = &data->packets[idx];
-   if (p->status != MEMNIC_PKT_ST_FILLED)
+   if (unlikely(p->status != MEMNIC_PKT_ST_FILLED))
break;
/* prefetch the next area */
next = idx;
-   if (++next >= MEMNIC_NR_PACKET)
+   if (unlikely(++next >= MEMNIC_NR_PACKET))
next = 0;
rte_prefetch0(&data->packets[next]);
-   if (p->len > framesz) {
+   if (unlikely(p->len > framesz)) {
errs++;
goto drop;
}
mb = rte_pktmbuf_alloc(adapter->mp);
-   if (!mb)
+   if (unlikely(!mb))
break;

rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, p->len);
@@ -350,7 +350,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
uint64_t pkts, bytes, errs;
uint32_t framesz = adapter->framesz;

-   if (!adapter->nic->hdr.valid)
+   if (unlikely(!adapter->nic->hdr.valid))
return 0;

pkts = bytes = errs = 0;
@@ -360,7 +360,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
struct rte_mbuf *sg;
void *ptr;

-   if (pkt_len > framesz) {
+   if (unlikely(pkt_len > framesz)) {
errs++;
break;
}
@@ -379,7 +379,7 @@ retry:
goto retry;
}

-   if (idx != ACCESS_ONCE(adapter->down_idx)) {
+   if (unlikely(idx != ACCESS_ONCE(adapter->down_idx))) {
/*
 * host freed this and got false positive,
 * need to recover the status and retry.
@@ -388,7 +388,7 @@ retry:
goto retry;
}

-   if (++idx >= MEMNIC_NR_PACKET)
+   if (unlikely(++idx >= MEMNIC_NR_PACKET))
idx = 0;
adapter->down_idx = idx;

-- 
1.8.3.1
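
For readers unfamiliar with the hints used in the patch above, here is a small
stand-alone sketch. It assumes that likely()/unlikely() are thin wrappers
around GCC's __builtin_expect(), which is a common way to define such macros;
accept_frames() and NR_SLOTS are invented for the example and are not part of
MEMNIC. The hints mark the oversized-frame check and the index wrap-around as
rarely taken, mirroring the conditions annotated in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

#define NR_SLOTS 8

/* Walk n ring slots starting at *idx. Frames longer than framesz are
 * counted in *errs, all others are accepted. Returns the accepted count
 * and stores the next ring position back into *idx. */
size_t accept_frames(const uint32_t lens[NR_SLOTS], unsigned *idx,
                     size_t n, uint32_t framesz, size_t *errs)
{
    size_t ok = 0, i;
    unsigned cur = *idx;

    assert(cur < NR_SLOTS);
    for (i = 0; i < n; i++) {
        if (unlikely(lens[cur] > framesz))   /* oversized frames are rare */
            (*errs)++;
        else
            ok++;
        if (unlikely(++cur >= NR_SLOTS))     /* wrap happens once per lap */
            cur = 0;
    }
    *idx = cur;
    return ok;
}
```

The hints do not change the result; they only tell the compiler which path to
lay out as the fall-through case.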



[dpdk-dev] [memnic PATCH 5/7] pmd: packet receiving optimization with prefetch

2014-09-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Prefetch the next packet area to reduce memory stall cycles.

Prefetching the next packet area could hide memory stall, because the next
area will be accessed just after processing the current receive operations.

We can see performance improvements with memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 4.59Mpps | 5.54Mpps
  128 | 4.87Mpps | 5.46Mpps
  256 | 4.72Mpps | 5.21Mpps
  512 | 4.41Mpps | 4.50Mpps
 1024 | 3.64Mpps | 3.71Mpps
 1280 | 3.15Mpps | 3.21Mpps
 1518 | 2.87Mpps | 2.92Mpps

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index c22a14d..dbe5033 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -286,7 +286,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
uint16_t nr;
uint64_t pkts, bytes, errs;
uint32_t framesz = adapter->framesz;
-   int idx;
+   int idx, next;
struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];

if (!adapter->nic->hdr.valid)
@@ -298,6 +298,11 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
p = &data->packets[idx];
if (p->status != MEMNIC_PKT_ST_FILLED)
break;
+   /* prefetch the next area */
+   next = idx;
+   if (++next >= MEMNIC_NR_PACKET)
+   next = 0;
+   rte_prefetch0(&data->packets[next]);
if (p->len > framesz) {
errs++;
goto drop;
@@ -318,9 +323,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
 drop:
rte_compiler_barrier();
p->status = MEMNIC_PKT_ST_FREE;
-
-   if (++idx >= MEMNIC_NR_PACKET)
-   idx = 0;
+   idx = next;
}
adapter->up_idx = idx;

-- 
1.8.3.1
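
The idea of the patch above, computing the next ring index early and
prefetching that slot while the current one is processed, can be sketched
without DPDK as follows. prefetch0() is an assumed stand-in for rte_prefetch0()
built on GCC's __builtin_prefetch(); struct pkt_slot, RING_SLOTS and
ring_recv() are hypothetical names for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4

enum { ST_FREE = 0, ST_FILLED = 1 };

struct pkt_slot {
    uint32_t status;
    uint32_t len;
    uint8_t data[2048];
};

#if defined(__GNUC__) || defined(__clang__)
/* Read prefetch with high temporal locality. */
#define prefetch0(p) __builtin_prefetch((p), 0, 3)
#else
#define prefetch0(p) ((void)(p))
#endif

/* Drain up to max filled slots starting at *idx, adding their lengths to
 * *bytes. Returns the number of slots drained and updates *idx. */
size_t ring_recv(struct pkt_slot ring[RING_SLOTS], unsigned *idx,
                 size_t max, size_t *bytes)
{
    size_t nr;
    unsigned cur = *idx, next;

    assert(cur < RING_SLOTS);
    for (nr = 0; nr < max; nr++) {
        struct pkt_slot *p = &ring[cur];

        if (p->status != ST_FILLED)
            break;
        /* Compute the next index first and start fetching that slot,
         * so the load overlaps with the work on the current slot. */
        next = cur + 1;
        if (next >= RING_SLOTS)
            next = 0;
        prefetch0(&ring[next]);

        *bytes += p->len;        /* stands in for the real packet copy */
        p->status = ST_FREE;
        cur = next;              /* reuse the index computed above */
    }
    *idx = cur;
    return nr;
}
```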



[dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier

2014-09-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

x86 can keep store ordering with standard operations.

Using a memory barrier is very expensive in the main packet processing loop.
Removing it improves xmit/recv packet performance.

We can see performance improvements with memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 4.18Mpps | 4.59Mpps
  128 | 3.85Mpps | 4.87Mpps
  256 | 4.01Mpps | 4.72Mpps
  512 | 3.52Mpps | 4.41Mpps
 1024 | 3.18Mpps | 3.64Mpps
 1280 | 2.86Mpps | 3.15Mpps
 1518 | 2.59Mpps | 2.87Mpps

Note: we have to take care if we use temporal cache.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 8341da7..c22a14d 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -316,7 +316,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
bytes += p->len;

 drop:
-   rte_mb();
+   rte_compiler_barrier();
p->status = MEMNIC_PKT_ST_FREE;

if (++idx >= MEMNIC_NR_PACKET)
@@ -403,7 +403,7 @@ retry:
pkts++;
bytes += pkt_len;

-   rte_mb();
+   rte_compiler_barrier();
p->status = MEMNIC_PKT_ST_FILLED;

rte_pktmbuf_free(tx_pkts[nr]);
-- 
1.8.3.1



[dpdk-dev] [memnic PATCH 3/7] pmd: use helper macros

2014-09-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Do not touch pktmbuf directly.

Instead of direct access, use rte_pktmbuf_pkt_len() and rte_pktmbuf_data_len()
to access the property.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index bbb5380..8341da7 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -308,8 +308,8 @@ static uint16_t memnic_recv_pkts(void *rx_queue,

rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, p->len);
mb->pkt.in_port = q->port_id;
-   mb->pkt.pkt_len = p->len;
-   mb->pkt.data_len = p->len;
+   rte_pktmbuf_pkt_len(mb) = p->len;
+   rte_pktmbuf_data_len(mb) = p->len;
rx_pkts[nr] = mb;

pkts++;
-- 
1.8.3.1



[dpdk-dev] [memnic PATCH 2/7] pmd: remove needless assignment

2014-09-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Because these assignments are done in rte_pktmbuf_alloc(), get rid of them.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 994ed0a..bbb5380 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -308,8 +308,6 @@ static uint16_t memnic_recv_pkts(void *rx_queue,

rte_memcpy(rte_pktmbuf_mtod(mb, void *), p->data, p->len);
mb->pkt.in_port = q->port_id;
-   mb->pkt.nb_segs = 1;
-   mb->pkt.next = NULL;
mb->pkt.pkt_len = p->len;
mb->pkt.data_len = p->len;
rx_pkts[nr] = mb;
-- 
1.8.3.1



[dpdk-dev] [memnic PATCH 1/7] guest: memnic-tester: PMD benchmark in guest

2014-09-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Introduce memnic-tester which benchmarks MEMNIC PMD performance in guest.

It starts two threads: one thread produces and consumes packets, and the
other thread receives packets and directly transmits the received
packets. This evaluates the MEMNIC PMD running cost.

The master thread does rx_burst and tx_burst through MEMNIC PMD.
+---------+
| master  |
+---------+
 rx_burst ^ | tx_burst
          | V
    +------+------+
    |  up  | down | MEMNIC shared memory
    +------+------+
 set flag ^ | unset flag
          | V
+---------+
|  slave  |
+---------+
The slave thread emulates packet-in/out by setting flag on/off.

 master |<- put packets ->| |<- get packets ->|
 slave  |   |<- rx packets ->|<- tx packets ->|   |
|<- set ->|

The number of sets completed in a given period represents
the MEMNIC PMD performance. The master workload must be very low.

It reports the throughput for the following frame sizes:
  64, 128, 256, 512, 1024, 1280, 1518

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 guest/Makefile|  20 
 guest/README.rst  |  94 +
 guest/memnic-tester.c | 281 ++
 3 files changed, 395 insertions(+)
 create mode 100644 guest/Makefile
 create mode 100644 guest/README.rst
 create mode 100644 guest/memnic-tester.c

diff --git a/guest/Makefile b/guest/Makefile
new file mode 100644
index 000..3c90350
--- /dev/null
+++ b/guest/Makefile
@@ -0,0 +1,20 @@
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+ifeq ($(RTE_TARGET),)
+$(error "Please define RTE_TARGET environment variable")
+endif
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+COMMON_INC_OPT = -I $(PWD)/../common
+
+APP = memnic-tester
+
+CFLAGS += -Wall -g -O3 $(COMMON_INC_OPT)
+
+SRCS-y := memnic-tester.c
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/guest/README.rst b/guest/README.rst
new file mode 100644
index 000..760014e
--- /dev/null
+++ b/guest/README.rst
@@ -0,0 +1,94 @@
+.. Copyright 2014 NEC
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions
+   are met:
+   - Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+   - Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+   FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+   COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+   INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+   (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+   HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+   STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+   OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MEMNIC TESTER
+=============
+
+DESCRIPTION
+-----------
+
+It is a simple benchmark test of MEMNIC PMD in guest.
+
+It has two threads: one thread produces and consumes packets, and the
+other thread receives packets and directly transmits the received
+packets back on the MEMNIC interface. This evaluates the MEMNIC PMD running cost.
+
+The master thread does rx_burst and tx_burst through MEMNIC PMD.
++---------+
+| master  |
++---------+
+ rx_burst ^ | tx_burst
+          | V
+    +------+------+
+    |  up  | down | MEMNIC shared memory
+    +------+------+
+ set flag ^ | unset flag
+          | V
++---------+
+|  slave  |
++---------+
+The slave thread emulates packet-in/out by setting flag on/off.
+
+The number of sets completed in a given period represents
+the MEMNIC PMD performance. The master workload must be very low.
+
+ master |<- put packets ->| |<- get packets ->|
+ slave  |   |<- rx packets ->|<- tx packets ->|   |
+|<- set ->|
+
+As in RFC 2544, evaluations are performed with packets of the frame sizes below.
+  64, 128, 256, 512, 1024, 1280, 1518
+
+It shows the result as packets per second number of each frame size.
+
+HOW T

[dpdk-dev] [memnic PATCH 0/7] MEMNIC PMD performance improvement

2014-09-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

This patchset improves MEMNIC PMD performance.

The first patch introduces a new benchmark test run in guest,
and will be used to evaluate the following patch effects.

This patchset improves the throughput results of memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 4.18Mpps | 5.83Mpps
  128 | 3.85Mpps | 5.71Mpps
  256 | 4.01Mpps | 5.40Mpps
  512 | 3.52Mpps | 4.64Mpps
 1024 | 3.18Mpps | 3.68Mpps
 1280 | 2.86Mpps | 3.17Mpps
 1518 | 2.59Mpps | 2.90Mpps

Hiroshi Shimamoto (7):
  guest: memnic-tester: PMD benchmark in guest
  pmd: remove needless assignment
  pmd: use helper macros
  pmd: use compiler barrier
  pmd: packet receiving optimization with prefetch
  pmd: add branch hint in recv/xmit
  pmd: split calling mbuf free

 guest/Makefile|  20 
 guest/README.rst  |  94 +
 guest/memnic-tester.c | 281 ++
 pmd/pmd_memnic.c  |  43 
 4 files changed, 417 insertions(+), 21 deletions(-)
 create mode 100644 guest/Makefile
 create mode 100644 guest/README.rst
 create mode 100644 guest/memnic-tester.c

-- 
1.8.3.1



[dpdk-dev] [PATCH] eal/linuxapp: Add parameter to specify master lcore id

2014-08-04 Thread Hiroshi Shimamoto
Hi,

> Subject: Re: [dpdk-dev] [PATCH] eal/linuxapp: Add parameter to specify master 
> lcore id
> 
> 2014-07-23 08:53, Hiroshi Shimamoto:
> > 2014-07-23 09:50, Thomas Monjalon:
> > > 2014-07-22 23:40, Hiroshi Shimamoto:
> > > > does anyone have interest in this functionality?
> > > >
> > > > I think this is important and useful.
> > > > Since we should care about core assignment to get high performance
> > > > and the master lcore thread is special in DPDK, we will want to
> > > > assign the master to the target core.
> > > > For example, with hyperthreading I'd like to make a pair of packet
> > > > processing threads into one physical core and separate the master
> > > > thread which does some management.
> > >
> > > Thank you for showing your interest.
> > > Does it mean you carefully reviewed this patch? In this case, I'd 
> > > appreciate
> > > a note "Reviewed-by:".
> >
> > Not yet deeply, wait a bit, we're testing this patch in our application.
> > Will report if it works fine.

Sorry for the delay, I have confirmed the functionality.
I'm fine with adding
Reviewed-by: Hiroshi Shimamoto 

thanks,
Hiroshi

> >
> > By the way, we should add the same code into the BSD code, right?
> 
> Right.
> I'd prefer to reduce the duplicated footprint and have more common code
> between BSD and Linux. But waiting this enhancement, we have to maintain
> the duplicated code for BSD.
> 
> --
> Thomas


[dpdk-dev] [PATCH] eal/linuxapp: Add parameter to specify master lcore id

2014-07-23 Thread Hiroshi Shimamoto
Hi all,

does anyone have interest in this functionality?

I think this is important and useful.
Since we should care about core assignment to get high performance
and the master lcore thread is special in DPDK, we will want to
assign the master to the target core.
For example, with hyperthreading I'd like to place a pair of packet
processing threads on one physical core and keep the master
thread, which does some management, separate.

thanks,
Hiroshi

> Subject: Re: [dpdk-dev] [PATCH] eal/linuxapp: Add parameter to specify master 
> lcore id
> 
> Comments?
> 
> On 08.07.2014 11:42, Simon Kuenzer wrote:
> > Here are some comments about the use case of this patch:
> >
> > This patch is especially useful in cases where DPDK applications scale
> > their CPU resources at runtime via starting and stopping slave lcores.
> > Since the coremask defines the maximum scale-out for such a application,
> > the master lcore becomes to the minimum scale-in.
> > Imagine, running multiple primary processed of such DPDK applications,
> > users might want to overlap the coremasks for scaling. However, it would
> > still make sense to run the master lcores on different CPU cores.
> >
> > In DPDK vSwitch we might end up in such a scenario with a future release:
> >https://lists.01.org/pipermail/dpdk-ovs/2014-March/000770.html
> >https://lists.01.org/pipermail/dpdk-ovs/2014-March/000773.html
> >
> > Thanks,
> >
> > Simon
> >
> > On 08.07.2014 10:28, Simon Kuenzer wrote:
> >> This commit enables users to specify the lcore id that
> >> is used as master lcore.
> >>
> >> Signed-off-by: Simon Kuenzer 
> >> ---
> >>   lib/librte_eal/linuxapp/eal/eal.c |   33
> >> +
> >>   1 file changed, 33 insertions(+)
> >>
> >> diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> >> b/lib/librte_eal/linuxapp/eal/eal.c
> >> index 573fd06..4ad5b9b 100644
> >> --- a/lib/librte_eal/linuxapp/eal/eal.c
> >> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> >> @@ -101,6 +101,7 @@
> >>   #define OPT_XEN_DOM0"xen-dom0"
> >>   #define OPT_CREATE_UIO_DEV "create-uio-dev"
> >>   #define OPT_VFIO_INTR"vfio-intr"
> >> +#define OPT_MASTER_LCORE "master-lcore"
> >>
> >>   #define RTE_EAL_BLACKLIST_SIZE0x100
> >>
> >> @@ -336,6 +337,7 @@ eal_usage(const char *prgname)
> >>  "[--proc-type primary|secondary|auto] \n\n"
> >>  "EAL options:\n"
> >>  "  -c COREMASK  : A hexadecimal bitmask of cores to run
> >> on\n"
> >> +   "  --"OPT_MASTER_LCORE" ID: Core ID that is used as master\n"
> >>  "  -n NUM   : Number of memory channels\n"
> >>  "  -v   : Display version information on startup\n"
> >>  "  -d LIB.so: add driver (can be used multiple times)\n"
> >> @@ -468,6 +470,21 @@ eal_parse_coremask(const char *coremask)
> >>   return 0;
> >>   }
> >>
> >> +/* Changes the lcore id of the master thread */
> >> +static int
> >> +eal_parse_master_lcore(const char *arg)
> >> +{
> >> +struct rte_config *cfg = rte_eal_get_configuration();
> >> +int master_lcore = atoi(arg);
> >> +
> >> +if (!(master_lcore >= 0 && master_lcore < RTE_MAX_LCORE))
> >> +return -1;
> >> +if (cfg->lcore_role[master_lcore] != ROLE_RTE)
> >> +return -1;
> >> +cfg->master_lcore = master_lcore;
> >> +return 0;
> >> +}
> >> +
> >>   static int
> >>   eal_parse_syslog(const char *facility)
> >>   {
> >> @@ -653,6 +670,7 @@ eal_parse_args(int argc, char **argv)
> >>   {OPT_HUGE_DIR, 1, 0, 0},
> >>   {OPT_NO_SHCONF, 0, 0, 0},
> >>   {OPT_PROC_TYPE, 1, 0, 0},
> >> +{OPT_MASTER_LCORE, 1, 0, 0},
> >>   {OPT_FILE_PREFIX, 1, 0, 0},
> >>   {OPT_SOCKET_MEM, 1, 0, 0},
> >>   {OPT_PCI_WHITELIST, 1, 0, 0},
> >> @@ -802,6 +820,21 @@ eal_parse_args(int argc, char **argv)
> >>   else if (!strcmp(lgopts[option_index].name,
> >> OPT_PROC_TYPE)) {
> >>   internal_config.process_type =
> >> eal_parse_proc_type(optarg);
> >>   }
> >> +else if (!strcmp(lgopts[option_index].name,
> >> OPT_MASTER_LCORE)) {
> >> +if (!coremask_ok) {
> >> +RTE_LOG(ERR, EAL, "please specify the master "
> >> +"lcore id after specifying "
> >> +"the coremask\n");
> >> +eal_usage(prgname);
> >> +return -1;
> >> +}
> >> +if (eal_parse_master_lcore(optarg) < 0) {
> >> +RTE_LOG(ERR, EAL, "invalid parameter for --"
> >> +OPT_MASTER_LCORE "\n");
> >> +eal_usage(prgname);
> >> +return -1;
> >> +}
> >> +}
> >>   else if (!strcmp(lgopts[option_index].name,
> >> OPT_FILE_PREFIX)) {
> >>   internal_config.hugefile_prefix = optarg;
> >>   }
> >>
> >



[dpdk-dev] MENNIC1.2 host-sim crashed for me

2014-07-15 Thread Hiroshi Shimamoto
Hi Srinivas,

> Subject: FW: MENNIC1.2 host-sim crashed for me
> 
> 
> Hi Hiroshi,
> Thanks for ur reply .. I have moved forward little bit.
> 
> MEMNIC-1.2
> 
> 1. I started qemu and then started host-sim application
> 
> Qemu command :
> qemu-system-x86_64 -enable-kvm -cpu host   -boot c -hda 
> /home/vm-images/vm1-clone.img -m 8192M -smp 3 --enable-kvm -name
> vm1 -vnc :1 -pidfile /tmp/vm1.pid -drive file=fat:rw:/tmp/share  -device 
> ivshmem,size=16,shm=ivshm
> vvfat fat:rw:/tmp/share chs 1024,16,63
> 
> 2.Host-sim app command :
> 3.[root at localhost host-sim]# ./memnic-host-sim   /dev/shm/ivshm
> 4.On the guest compiled  memnic-1.2 .
> 5.Inserted memnic.ko
> 6.Found and interface ens4  after insmod memnic.ko
> 
> [root at localhost memnic-1.2]# ifconfig -a
> ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
> inet6 fe80::5054:ff:fe12:3456  prefixlen 64  scopeid 0x20
> ether 52:54:00:12:34:56  txqueuelen 1000  (Ethernet)
> RX packets 0  bytes 0 (0.0 B)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 8  bytes 648 (648.0 B)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> ens4: flags=4098<BROADCAST,MULTICAST>  mtu 1500
> ether 00:09:c0:00:13:37  txqueuelen 1000  (Ethernet)
> RX packets 0  bytes 0 (0.0 B)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 0  bytes 0 (0.0 B)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
> inet 127.0.0.1  netmask 255.0.0.0
> inet6 ::1  prefixlen 128  scopeid 0x10
> loop  txqueuelen 0  (Local Loopback)
> RX packets 386  bytes 33548 (32.7 KiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 386  bytes 33548 (32.7 KiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> 7.lspci on the guest
> 
> [root at localhost memnic-1.2]# lspci
> 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
> 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
> 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
> 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
> 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
> 00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
> Controller (rev 03)
> 00:04.0 RAM memory: Red Hat, Inc Device 1110 [root at localhost memnic-1.2]#
> 
> 8.on the Guest  ran test pmd application

You cannot use the kernel driver and the PMD concurrently.
Before running testpmd, you should unload memnic.ko with the rmmod command.

> 
> [root at localhost test-pmd]# ./testpmd -c7 -n3  -- --d 
> /usr/local/lib/librte_pmd_memnic_copy.so  -i --nb-cores=1
> --nb-ports=1 --port-topology=chained

I don't know testpmd very well, but I guess the correct EAL parameters look
like this.

# ./testpmd -c 0x7 -n 3 -d /usr/local/lib/librte_pmd_memnic_copy.so -- ...

Please pass the extra library as an EAL parameter, that is, before the "--" separator.

> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 0 on socket 0
> EAL: Detected lcore 2 as core 0 on socket 0
> EAL: Setting up memory...
> EAL: Ask a virtual area of 0x4000 bytes
> EAL: Virtual area found at 0x7f551400 (size = 0x4000)
> EAL: Requesting 512 pages of size 2MB from socket 0
> EAL: TSC frequency is ~3092833 KHz
> EAL: Master core 0 is ready (tid=54398880)
> EAL: Core 1 is ready (tid=135f8700)
> EAL: Core 2 is ready (tid=12df7700)
> EAL: PCI device :00:03.0 on NUMA socket -1
> EAL:   probe driver: 8086:100e rte_em_pmd
> EAL:   :00:03.0 not managed by UIO driver, skipping
> EAL: Error - exiting with code: 1
>   Cause: No probed ethernet devices - check that CONFIG_RTE_LIBRTE_IGB_PMD=y 
> and that CONFIG_RTE_LIBRTE_EM_PMD=y and that
> CONFIG_RTE_LIBRTE_IXGBE_PMD=y in your configuration file
> [root at localhost test-pmd]#
> 
> How can I bind 00:04.0 Ram controller to dpdk application (test-pmd ) .
> How DPDK test-pmd application finds the memnic device.
> 
> 9.Am I missing any steps in the guest configurations or host configuration .
> 10.Is there any better manual for testing MEMNIC-1.2 or better understanding .
> 11.Is there any better application  to test MEMNIC   for VM-VM  or VM to host 
> data transfer .

I think the current host-sim doesn't have any packet switching capability; we
need to implement such functionality to test MEMNIC.

Actually, I started MEMNIC development in the DPDK vSwitch project.
You can see it at https://github.com/01org/dpdk-ovs/tree/development

thanks,
Hiroshi

> 
> Thanks &  regards,
> Srinivas.
> 
> 
> 

[dpdk-dev] MENNIC1.2 host-sim crashed for me

2014-07-15 Thread Hiroshi Shimamoto
Hi,

> Subject: [dpdk-dev] MENNIC1.2 host-sim crashed for me
> 
> Hi,
> I want to run MEMNIC 1.2 application .
> 
> 1.   I compiled DPDK1.6
> 
> 2.   I compiled memnic.12
> 
> 3.   And while running memnic-hostsim appgot strucked
> 
> 4.
> 
> 5.   [root at localhost host-sim]# ./memnic-host-sim /dev/shm/ivshm
> 
> Bus error (core dumped)
> 
> 
> 
> Core was generated by `./memnic-host-sim  /dev/shm/ivshm'.
> 
> Program terminated with signal SIGBUS, Bus error.
> 
> #0  0x003a82e894e4 in memset () from /lib64/libc.so.6
> 
> Missing separate debuginfos, use: debuginfo-install glibc-2.18-11.fc20.x86_64
> 
> (gdb) bt
> 
> #0  0x003a82e894e4 in memset () from /lib64/libc.so.6
> 
> #1  0x004008a3 in init_memnic (nic=0x76fe2000) at host-sim.c:55
> 
> #2  0x00400a8a in main (argc=2, argv=0x7fffe4a8) at host-sim.c:106
> 
> (gdb)
> 
> 
> 
> 
> 
> Got error at line 55 .. saying nic is read only..


I have never tried host-sim myself.
I guess the cause is that host-sim doesn't extend the shared memory to the
required size.
Could you try booting qemu first with -device ivshmem,size=16,shm=/ivshm and
then run host-sim?
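
To illustrate the guess above: writing to pages that lie beyond the end of the
backing file raises SIGBUS, so the file has to be grown before it is mapped.
Below is a minimal sketch of that idea. The helper name map_shared_area() is
hypothetical and is not taken from host-sim.c; it only assumes a POSIX system.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from host-sim.c): open or create a shared
 * memory file and make sure it is at least `size` bytes long before
 * mapping it. Touching mapped pages past the end of the backing file
 * raises SIGBUS, which matches the memset() crash in the backtrace.
 */
static void *map_shared_area(const char *path, size_t size)
{
	struct stat st;
	void *addr;
	int fd;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return NULL;

	/* Grow the file if it is shorter than the area we want to map. */
	if (fstat(fd, &st) < 0 ||
	    ((size_t)st.st_size < size && ftruncate(fd, (off_t)size) < 0)) {
		close(fd);
		return NULL;
	}

	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	return addr == MAP_FAILED ? NULL : addr;
}
```

When qemu is started first with the ivshmem device, it creates the shared
memory object with the requested size, so the explicit ftruncate() step is
only needed if the host program runs first.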

thanks,
Hiroshi

> 
> 
> 
> 53 static void init_memnic(struct memnic_area *nic)
> 
> 54 {
> 
> 55 memset(nic, 0, sizeof(*nic));
> 
> 56 nic->hdr.magic = MEMNIC_MAGIC;
> 
> 57 nic->hdr.version = MEMNIC_VERSION;
> 
> 58 /* 00:09:c0:00:13:37 */
> 
> 59 nic->hdr.mac_addr[0] = 0x00;
> 
> 60 nic->hdr.mac_addr[1] = 0x09;
> 
> 61 nic->hdr.mac_addr[2] = 0xc0;
> 
> 62 nic->hdr.mac_addr[3] = 0x00;
> 
> 63 nic->hdr.mac_addr[4] = 0x13;
> 
> 64 nic->hdr.mac_addr[5] = 0x37;
> 
> 65 }
> 
> 
> 
> Thanks,
> 
> Srinivas.
> 


[dpdk-dev] [PATCH] kni: compatibility with RHEL 7

2014-06-26 Thread Hiroshi Shimamoto
Hi,

> Subject: RE: [dpdk-dev] [PATCH] kni: compatibility with RHEL 7
> 
> Hi Hiroshi,
> 
>   Helin submitted one patch to fix compilation error in the redhat 6.4 and 
> 6.5.
>   Patch title is [dpdk-dev] [PATCH] kni: fix compile errors on Oracle 
> Linux6.4 and RHEL6.5
>   With this patch, we don't meet this compilation error in latest RHEL 7.0
>   Can you download latest DPDK code, and try to compile with this patch in 
> RHEL 7.0 again?

okay, I will try the latest code.

thanks,
Hiroshi

> 
> Thanks
> Waterman
> 
> 
> >-Original Message-
> >From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >Sent: Wednesday, June 25, 2014 6:05 PM
> >To: Cao, Waterman
> >Cc: dev at dpdk.org; Hiroshi Shimamoto; Hayato Momma
> >Subject: Re: [dpdk-dev] [PATCH] kni: compatibility with RHEL 7
> >
> >Hi Waterman,
> >
> >2014-06-12 09:35, Hiroshi Shimamoto:
> >> 2014-06-12 09:18, Cao, Waterman:
> >> >   Can you give details about Linux Kernel version and complier version?
> >> >   Because we tried to build code in the Redhat 7.0 before, but we don't
> >> >   meet this issue. Please see information as the following:
> >> >   Linux kernel 3.10.0-54.0.1.el7.x86_64
> >> >   RHEL70BETA_64  GCC 4.8.2  ICC: 14.0.0
> >>
> >> Yes,
> >>
> >> Linux REHEL7RC-1 3.10.0-121.el7.x86_64 #1 SMP Tue Apr 8 10:48:19 EDT
> >> 2014
> >> x86_64 x86_64 x86_64 GNU/Linux gcc version 4.8.2 20140120 (Red Hat
> >> 4.8.2-16) (GCC)
> >>
> >> I got the below error;
> >> /path/to/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h:3851:1: error:
> >> conflicting types for 'skb_set_hash' skb_set_hash(struct sk_buff *skb,
> >> __u32 hash, __always_unused int type)
> >>
> >> /usr/src/kernels/3.10.0-121.el7.x86_64/include/linux/skbuff.h:762:1: note:
> >> previous definition of 'skb_set_hash' was here skb_set_hash(struct
> >> sk_buff *skb, __u32 hash, enum pkt_hash_types type)
> >
> >Could you confirm this fix is needed and acknowledge it?
> >Thanks
> >
> >
> >> > -Original Message-
> >> >
> >> > >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi
> >> > >Shimamoto
> >> > >Sent: Thursday, June 12, 2014 4:10 PM
> >> > >To: dev at dpdk.org
> >> > >Cc: Hayato Momma
> >> > >Subject: [dpdk-dev] [PATCH] kni: compatibility with RHEL 7
> >> > >
> >> > >From: Hiroshi Shimamoto 
> >> > >
> >> > >Compilation in RHEL7 is failed. This fixes the build issue.
> >> > >
> >> > >RHEL7 has skb_set_hash, the kernel version is 3.10 though.
> >> > >Don't define skb_set_hash for RHEL7.
> >> > >
> >> > >Signed-off-by: Hiroshi Shimamoto 
> >> > >Reviewed-by: Hayato Momma 
> >> > >---
> >> > >
> >> > > lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h | 5 +
> >> > > 1 file changed, 5 insertions(+)
> >> > >
> >> > >diff --git a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
> >> > >b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h index
> >> > >4c27d5d..b4de6e2 100644
> >> > >--- a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
> >> > >+++ b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
> >> > >@@ -3843,6 +3843,9 @@ static inline struct sk_buff
> >> > >*__kc__vlan_hwaccel_put_tag(struct sk_buff *skb,  #endif /* >=
> >> > >3.10.0>
> >> > */
> >> >
> >> > > #if ( LINUX_VERSION_CODE < KERNEL_VERSION(3,14,0) )
> >> > >
> >> > >+
> >> > >+#if (!(RHEL_RELEASE_CODE && RHEL_RELEASE_CODE >=
> >> > >+RHEL_RELEASE_VERSION(7,0)))
> >> > >+
> >> > >
> >> > > #ifdef NETIF_F_RXHASH
> >> > > #define PKT_HASH_TYPE_L3 0
> >> > > static inline void
> >> > >
> >> > >@@ -3851,6 +3854,8 @@ skb_set_hash(struct sk_buff *skb, __u32 hash,
> >> > >__always_unused int type)> >
> >> > >skb->rxhash = hash;
> >> > >
> >> > > }
> >> > > #endif /* NETIF_F_RXHASH */
> >> > >
> >> > >+#endif /* < RHEL7 */
> >> > >+
> >> > >
> >> > > #endif /* < 3.14.0 */
> >> > >
> >> > > #endif /* _KCOMPAT_H_ */
> >> > >
> >> > >--
> >> > >1.9.1
> >
> >
> >--
> >Thomas


[dpdk-dev] Testing memnic for VM to VM transfer

2014-06-18 Thread Hiroshi Shimamoto
Hi,

> Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer
> 
> 2014-06-18 11:42, Hiroshi Shimamoto:
> > 2014-06-18 19:26, GongJinrong:
> > > Do you have any idea that how to write a host application
> > > to put the data to guest memnic PMD?
> >
> > Yes, basically I made the MEMNIC interface work with DPDK vSwitch.
> >
> > By the way, you can mmap() the shm which specified as the ivshmem and put
> > the proper data to send a packet to guest PMD.
> > I don't have time to make proper code, but can advise you;
> > please see common/memnic.h and the memory layout.
> > 1) Set magic and version in header on host.
> > 2) Initialize PMD on guest.
> > 3) Check the reset is 1 and set valid to 1, reset to 0 on host.
> > 4) Use uplink area the default block size 4K.
> >Set len and fill ether frame data, then set the status to 2 on host.
> >Guest PMD may receive the packet.
> >Proceed to the next packet block.
> 
> Such application should be integrated in memnic repository.
> I know Olivier wrote one which could be sent on next week.

Yeah, I am just beginning to feel the need for such software in the repository.

thanks,
Hiroshi

> 
> --
> Thomas


[dpdk-dev] Testing memnic for VM to VM transfer

2014-06-18 Thread Hiroshi Shimamoto
Hi,

> Subject: ##freemail## RE: ##freemail## RE: [dpdk-dev] Testing memnic for VM 
> to VM transfer
> 
> Hi, Hiroshi
> 
>Do you mean I must use DPDK vSwitch in host when I use MEMNIC PMD in
> guest VM? actually, I just want a channel which can put the data from host
> to guest quickly. Do you have any idea that how to write a host application
> to put the data to guest memnic PMD?

Yes, basically I made the MEMNIC interface work with DPDK vSwitch.

By the way, you can mmap() the shm file that is specified for ivshmem and
write the proper data into it to send a packet to the guest PMD.
I don't have time to write proper code, but I can advise you;
please see common/memnic.h for the memory layout.
1) Set magic and version in the header on the host.
2) Initialize the PMD on the guest.
3) On the host, check that reset is 1, then set valid to 1 and reset to 0.
4) Use the uplink area; the default block size is 4K.
   On the host, set len and fill in the Ethernet frame data, then set the
   status to 2.
   The guest PMD can then receive the packet.
   Proceed to the next packet block.
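
The host-side steps above could be sketched roughly as follows. This is only
an illustration: the struct layout, the block count, and the "free" status
value are placeholders made up for the sketch, and the real definitions must
be taken from common/memnic.h. Only the field names magic, version, valid,
reset, status and len, the 4K block size, and the status value 2 come from
the steps.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Placeholder layout and values; the real ones are in common/memnic.h. */
#define SKETCH_BLOCK_SIZE 4096u /* default block size (step 4) */
#define SKETCH_NR_BLOCKS  4u    /* illustrative count only */
#define SKETCH_ST_FREE    0u    /* assumed value for a free block */
#define SKETCH_ST_FILLED  2u    /* "set the status to 2" */

struct sketch_header {
	uint32_t magic;
	uint32_t version;
	volatile uint32_t valid;
	volatile uint32_t reset;
};

struct sketch_packet {
	volatile uint32_t status;
	uint32_t len;
	uint8_t data[SKETCH_BLOCK_SIZE - 2 * sizeof(uint32_t)];
};

struct sketch_area {
	struct sketch_header hdr;
	struct sketch_packet up[SKETCH_NR_BLOCKS]; /* host -> guest */
};

/* Step 1: the host writes magic and version before the guest PMD starts. */
static void host_init(struct sketch_area *a, uint32_t magic, uint32_t version)
{
	memset(a, 0, sizeof(*a));
	a->hdr.magic = magic;
	a->hdr.version = version;
}

/* Step 3: once the guest has raised reset, set valid and clear reset. */
static int host_handle_reset(struct sketch_area *a)
{
	if (a->hdr.reset != 1)
		return 0;
	a->hdr.valid = 1;
	atomic_thread_fence(memory_order_release); /* valid before reset */
	a->hdr.reset = 0;
	return 1;
}

/* Step 4: fill the block at *idx, publish it, move on to the next block. */
static int host_send(struct sketch_area *a, unsigned int *idx,
		     const void *frame, uint32_t len)
{
	struct sketch_packet *p = &a->up[*idx];

	if (!a->hdr.valid || len > sizeof(p->data) ||
	    p->status != SKETCH_ST_FREE)
		return -1;
	p->len = len;
	memcpy(p->data, frame, len);
	atomic_thread_fence(memory_order_release); /* data before status */
	p->status = SKETCH_ST_FILLED;
	*idx = (*idx + 1) % SKETCH_NR_BLOCKS;
	return 0;
}
```

In a real host application the struct sketch_area pointer would be the
address returned by mmap() on the ivshmem file, and host_handle_reset()
would be polled until the guest PMD has been initialized (step 2).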

thanks,
Hiroshi

> 
> -----Original Message-
> From: Hiroshi Shimamoto [mailto:h-shimamoto at ct.jp.nec.com]
> Sent: Wednesday, June 18, 2014 7:11 PM
> To: GongJinrong; 'John Joyce (joycej)'; dev at dpdk.org
> Subject: RE: ##freemail## RE: [dpdk-dev] Testing memnic for VM to VM
> transfer
> 
> Hi,
> 
> > Subject: ##freemail## RE: [dpdk-dev] Testing memnic for VM to VM
> > transfer
> >
> > Hi, Hiroshi
> >
> >I just start to learn DPDK and memnic, in memnic guide, you said
> > "On host, the shared memory must be initialized by an application
> > using memnic", I am not so clear that how to initialize the share
> > memory in host, do you means use posix API or DPDK API to create the
> > share memory?(it seems memnic guest side use rte_mbuf to transfer
> > data), do you have any sample code to demo how to use memnic in host?
> 
> I don't have simple MEMNIC sample to use it on host.
> Could you please try DPDK vSwitch and enables MEMNIC vport?
> DPDK vSwitch must handle packets between physical NIC port and MEMNIC vport
> exposed to guest with dpdk.org memnic driver.
> 
> thanks,
> Hiroshi
> 
> >
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
> > Sent: Wednesday, June 18, 2014 12:02 PM
> > To: John Joyce (joycej); dev at dpdk.org
> > Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer
> >
> > Hi,
> >
> > > Subject: [dpdk-dev] Testing memnic for VM to VM transfer
> > >
> > > Hi everyone:
> > > We are interested in testing the performance of the memnic
> > > driver
> > posted at http://dpdk.org/browse/memnic/refs/.
> > > We want to compare its performance compared to other techniques to
> > > transfer packets between the guest and the kernel, predominately for
> > > VM to
> > VM transfers.
> > >
> > > We have downloaded the memnic components and have got it running in
> > > a
> > guest VM.
> > >
> > > The question we hope this group might be able to help with is what
> > > would be the best way to processes the packets in the kernel to get
> > > a VM
> > to VM transfer.
> >
> > I think there is no kernel code work with MEMNIC.
> > The recommend switching software on the host is Intel DPDK vSwitch
> > hosted on 01.org and github.
> > https://github.com/01org/dpdk-ovs/tree/development
> >
> > Intel DPDK vSwitch runs on userspace not kernel.
> >
> > I introduced this mechanism to DPDK vSwitch and the guest drivers are
> > maintained in dpdk.org.
> >
> > thanks,
> > Hiroshi
> >
> > >
> > > A couple options might be possible
> > >
> > >
> > > 1.   Common shared buffer between two VMs.  With some utility/code
> to
> > switch TX & RX rings between the two VMs.
> > >
> > > VM1 application --- memnic  ---  common shared memory buffer on the
> > > host --- memnic  ---  VM2 application
> > >
> > > 2.   Special purpose Kernel switching module
> > >
> > > VM1 application --- memnic  ---  shared memory VM1  --- Kernel
> > > switching module  --- shared memory VM2  --- memnic  ---
> > > VM2 application
> > >
> > > 3.   Existing Kernel switching module
> > >
> > > VM1 application --- memnic  ---  shared memory VM1  --- existing
> > > Kernel switching module (e.g. OVS/linux Bridge/VETh pair)
> > > --- shared memory VM2  --- memnic  ---  VM2 application
> > >
> > > Can anyone recommend which approach might be best or easiest?   We would
> > like to avoid writing much (or any) kernel code
> > > so if there are already any open source code or test utilities that
> > > provide one of these options or would be a good starting point to
> > > start
> > from,  a pointer would be much appreciated.
> > >
> > > Thanks in advance
> > >
> > >
> > > John Joyce



[dpdk-dev] ##freemail## RE: Testing memnic for VM to VM transfer

2014-06-18 Thread Hiroshi Shimamoto
Hi,

> Subject: ##freemail## RE: [dpdk-dev] Testing memnic for VM to VM transfer
> 
> Hi, Hiroshi
> 
>I just start to learn DPDK and memnic, in memnic guide, you said "On
> host, the shared memory must be initialized by an application using memnic",
> I am not so clear that how to initialize the share memory in host, do you
> means use posix API or DPDK API to create the share memory?(it seems memnic
> guest side use rte_mbuf to transfer data), do you have any sample code to
> demo how to use memnic in host?

I don't have a simple MEMNIC sample to use on the host.
Could you please try DPDK vSwitch and enable the MEMNIC vport?
DPDK vSwitch should handle packets between the physical NIC port and the
MEMNIC vport exposed to the guest with the dpdk.org memnic driver.

thanks,
Hiroshi

> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
> Sent: Wednesday, June 18, 2014 12:02 PM
> To: John Joyce (joycej); dev at dpdk.org
> Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer
> 
> Hi,
> 
> > Subject: [dpdk-dev] Testing memnic for VM to VM transfer
> >
> > Hi everyone:
> > We are interested in testing the performance of the memnic driver
> posted at http://dpdk.org/browse/memnic/refs/.
> > We want to compare its performance compared to other techniques to
> > transfer packets between the guest and the kernel, predominately for VM to
> VM transfers.
> >
> > We have downloaded the memnic components and have got it running in a
> guest VM.
> >
> > The question we hope this group might be able to help with is what
> > would be the best way to processes the packets in the kernel to get a VM
> to VM transfer.
> 
> I think there is no kernel code work with MEMNIC.
> The recommend switching software on the host is Intel DPDK vSwitch hosted on
> 01.org and github.
> https://github.com/01org/dpdk-ovs/tree/development
> 
> Intel DPDK vSwitch runs on userspace not kernel.
> 
> I introduced this mechanism to DPDK vSwitch and the guest drivers are
> maintained in dpdk.org.
> 
> thanks,
> Hiroshi
> 
> >
> > A couple options might be possible
> >
> >
> > 1.   Common shared buffer between two VMs.  With some utility/code to
> switch TX & RX rings between the two VMs.
> >
> > VM1 application --- memnic  ---  common shared memory buffer on the
> > host --- memnic  ---  VM2 application
> >
> > 2.   Special purpose Kernel switching module
> >
> > VM1 application --- memnic  ---  shared memory VM1  --- Kernel
> > switching module  --- shared memory VM2  --- memnic  ---
> > VM2 application
> >
> > 3.   Existing Kernel switching module
> >
> > VM1 application --- memnic  ---  shared memory VM1  --- existing
> > Kernel switching module (e.g. OVS/linux Bridge/VETh pair)
> > --- shared memory VM2  --- memnic  ---  VM2 application
> >
> > Can anyone recommend which approach might be best or easiest?   We would
> like to avoid writing much (or any) kernel code
> > so if there are already any open source code or test utilities that
> > provide one of these options or would be a good starting point to start
> from,  a pointer would be much appreciated.
> >
> > Thanks in advance
> >
> >
> > John Joyce



[dpdk-dev] Testing memnic for VM to VM transfer

2014-06-18 Thread Hiroshi Shimamoto
Hi,

> Subject: [dpdk-dev] Testing memnic for VM to VM transfer
> 
> Hi everyone:
> We are interested in testing the performance of the memnic driver 
> posted at http://dpdk.org/browse/memnic/refs/.
> We want to compare its performance compared to other techniques to transfer 
> packets between the guest and the kernel,
> predominately for VM to VM transfers.
> 
> We have downloaded the memnic components and have got it running in a guest 
> VM.
> 
> The question we hope this group might be able to help with is what would be 
> the best way to processes the packets in the
> kernel to get a VM to VM transfer.

I think there is no kernel code that works with MEMNIC.
The recommended switching software on the host is Intel DPDK vSwitch, hosted
on 01.org and GitHub.
https://github.com/01org/dpdk-ovs/tree/development

Intel DPDK vSwitch runs in userspace, not in the kernel.

I introduced this mechanism to DPDK vSwitch, and the guest drivers are
maintained at dpdk.org.

thanks,
Hiroshi

> 
> A couple options might be possible
> 
> 
> 1.   Common shared buffer between two VMs.  With some utility/code to 
> switch TX & RX rings between the two VMs.
> 
> VM1 application --- memnic  ---  common shared memory buffer on the host --- 
> memnic  ---  VM2 application
> 
> 2.   Special purpose Kernel switching module
> 
> VM1 application --- memnic  ---  shared memory VM1  --- Kernel switching 
> module  --- shared memory VM2  --- memnic  ---
> VM2 application
> 
> 3.   Existing Kernel switching module
> 
> VM1 application --- memnic  ---  shared memory VM1  --- existing Kernel 
> switching module (e.g. OVS/linux Bridge/VETh pair)
> --- shared memory VM2  --- memnic  ---  VM2 application
> 
> Can anyone recommend which approach might be best or easiest?   We would like 
> to avoid writing much (or any) kernel code
> so if there are already any open source code or test utilities that provide 
> one of these options or would be a good starting
> point to start from,  a pointer would be much appreciated.
> 
> Thanks in advance
> 
> 
> John Joyce



[dpdk-dev] [PATCH] kni: compatibility with RHEL 7

2014-06-12 Thread Hiroshi Shimamoto
Hi,

> Subject: RE: [PATCH] kni: compatibility with RHEL 7
> 
> Hi Shimamoto,
> 
>   Can you give details about Linux Kernel version and complier version?
>   Because we tried to build code in the Redhat 7.0 before, but we don't meet 
> this issue.
>   Please see information as the following:
>   Linux kernel 3.10.0-54.0.1.el7.x86_64
>   RHEL70BETA_64   GCC 4.8.2  ICC: 14.0.0

Yes,

Linux REHEL7RC-1 3.10.0-121.el7.x86_64 #1 SMP Tue Apr 8 10:48:19 EDT 2014 
x86_64 x86_64 x86_64 GNU/Linux
gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC)

I got the error below:
/path/to/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h:3851:1: error: 
conflicting types for 'skb_set_hash'
 skb_set_hash(struct sk_buff *skb, __u32 hash, __always_unused int type)

/usr/src/kernels/3.10.0-121.el7.x86_64/include/linux/skbuff.h:762:1: note: 
previous definition of 'skb_set_hash' was here
 skb_set_hash(struct sk_buff *skb, __u32 hash, enum pkt_hash_types type)
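
The conflict comes from the compat definition being selected on the upstream
kernel version alone. A small user-space sketch of the guard logic from the
patch is below. The macro definitions are stand-ins written for this sketch
so that it builds outside the kernel tree (the real ones come from the kernel
headers and kcompat.h), and NEED_SKB_SET_HASH_COMPAT is a hypothetical name
used only here.

```c
/* Stand-ins for the kernel version macros, defined here for the sketch. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define RHEL_RELEASE_VERSION(a, b) (((a) << 8) + (b))

/* Pretend to build against the reported kernel: 3.10.0-121.el7 (RHEL 7.0). */
#define LINUX_VERSION_CODE KERNEL_VERSION(3, 10, 0)
#define RHEL_RELEASE_CODE  RHEL_RELEASE_VERSION(7, 0)

/* Same nesting as the patch: old upstream version AND not RHEL >= 7.0. */
#if ( LINUX_VERSION_CODE < KERNEL_VERSION(3,14,0) )
#if (!(RHEL_RELEASE_CODE && RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,0)))
#define NEED_SKB_SET_HASH_COMPAT 1
#endif /* < RHEL7 */
#endif /* < 3.14.0 */

#ifndef NEED_SKB_SET_HASH_COMPAT
#define NEED_SKB_SET_HASH_COMPAT 0
#endif

/* Returns 1 when the compat skb_set_hash() would be defined, else 0. */
static int need_skb_set_hash_compat(void)
{
	return NEED_SKB_SET_HASH_COMPAT;
}
```

With the values above the function returns 0, so the compat definition is
skipped on RHEL 7.0 even though the kernel version is 3.10. Removing the
RHEL_RELEASE_CODE define makes it return 1, which is the plain 3.10 case.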


thanks,
Hiroshi

> 
> Thanks
> 
> Waterman
> 
> -Original Message-
> >From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
> >Sent: Thursday, June 12, 2014 4:10 PM
> >To: dev at dpdk.org
> >Cc: Hayato Momma
> >Subject: [dpdk-dev] [PATCH] kni: compatibility with RHEL 7
> >
> >From: Hiroshi Shimamoto 
> >
> >Compilation in RHEL7 is failed. This fixes the build issue.
> >
> >RHEL7 has skb_set_hash, the kernel version is 3.10 though.
> >Don't define skb_set_hash for RHEL7.
> >
> >Signed-off-by: Hiroshi Shimamoto 
> >Reviewed-by: Hayato Momma 
> >---
> > lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h | 5 +
> > 1 file changed, 5 insertions(+)
> >
> >diff --git a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h 
> >b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
> >index 4c27d5d..b4de6e2 100644
> >--- a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
> >+++ b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
> >@@ -3843,6 +3843,9 @@ static inline struct sk_buff 
> >*__kc__vlan_hwaccel_put_tag(struct sk_buff *skb,  #endif /* >= 3.10.0
> */
> >
> > #if ( LINUX_VERSION_CODE < KERNEL_VERSION(3,14,0) )
> >+
> >+#if (!(RHEL_RELEASE_CODE && RHEL_RELEASE_CODE >=
> >+RHEL_RELEASE_VERSION(7,0)))
> >+
> > #ifdef NETIF_F_RXHASH
> > #define PKT_HASH_TYPE_L3 0
> > static inline void
> >@@ -3851,6 +3854,8 @@ skb_set_hash(struct sk_buff *skb, __u32 hash, 
> >__always_unused int type)
> > skb->rxhash = hash;
> > }
> > #endif /* NETIF_F_RXHASH */
> >+#endif /* < RHEL7 */
> >+
> > #endif /* < 3.14.0 */
> >
> > #endif /* _KCOMPAT_H_ */
> >--
> >1.9.1
> >


[dpdk-dev] [PATCH] rte_memory.h: include stdio.h for FILE

2014-06-12 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

The below commit requires stdio FILE structure.

commit 591a9d7985c1230652d9f7ea1f9221e8c66ec188
Author: Stephen Hemminger 
Date:   Fri May 2 16:42:56 2014 -0700

add FILE argument to debug functions

An application which includes rte_memory.h without stdio.h will hit a
compilation failure.

/path/to/include/rte_memory.h:146:30: error: unknown type name 'FILE'
 void rte_dump_physmem_layout(FILE *f);

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 lib/librte_eal/common/include/rte_memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/include/rte_memory.h 
b/lib/librte_eal/common/include/rte_memory.h
index 7f21244..4cf8ea9 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -42,6 +42,7 @@

 #include 
 #include 
+#include <stdio.h>

 #ifdef RTE_EXEC_ENV_LINUXAPP
 #include 
-- 
1.9.1



[dpdk-dev] [PATCH] kni: compatibility with RHEL 7

2014-06-12 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Compilation on RHEL7 fails. This fixes the build issue.

RHEL7 has skb_set_hash even though its kernel version is 3.10.
Don't define skb_set_hash for RHEL7.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h 
b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
index 4c27d5d..b4de6e2 100644
--- a/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
+++ b/lib/librte_eal/linuxapp/kni/ethtool/igb/kcompat.h
@@ -3843,6 +3843,9 @@ static inline struct sk_buff 
*__kc__vlan_hwaccel_put_tag(struct sk_buff *skb,
 #endif /* >= 3.10.0 */

 #if ( LINUX_VERSION_CODE < KERNEL_VERSION(3,14,0) )
+
+#if (!(RHEL_RELEASE_CODE && RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,0)))
+
 #ifdef NETIF_F_RXHASH
 #define PKT_HASH_TYPE_L3 0
 static inline void
@@ -3851,6 +3854,8 @@ skb_set_hash(struct sk_buff *skb, __u32 hash, 
__always_unused int type)
skb->rxhash = hash;
 }
 #endif /* NETIF_F_RXHASH */
+#endif /* < RHEL7 */
+
 #endif /* < 3.14.0 */

 #endif /* _KCOMPAT_H_ */
-- 
1.9.1



[dpdk-dev] [memnic PATCH 5/5] linux: support MTU change

2014-06-06 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Add the capability to change MTU.

On MTU change, remember the corresponding frame size and request new
frame size to the host on reset, if the host MEMNIC has that feature.

Don't trust framesz of header in general usage, because host might change
the value unexpectedly.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 linux/memnic_net.c | 39 ---
 linux/memnic_net.h |  5 +
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/linux/memnic_net.c b/linux/memnic_net.c
index 02b5acc..8de3668 100644
--- a/linux/memnic_net.c
+++ b/linux/memnic_net.c
@@ -31,6 +31,7 @@

 #include 
 #include 
+#include 

 #include "memnic_net.h"
 #include "memnic.h"
@@ -152,19 +153,31 @@ static int memnic_open(struct net_device *netdev)
 {
struct memnic_net *memnic = netdev_priv(netdev);
struct memnic_area *nic = memnic->dev->base_addr;
+   struct memnic_header *hdr = &nic->hdr;
struct task_struct *kthread;

/* clear stats */
memset(&memnic->stats, 0, sizeof(memnic->stats));
/* invalidate and reset here */
-   nic->hdr.valid = 0;
+   hdr->valid = 0;
+
+   /* setup parameters */
+   if (memnic->request.features & MEMNIC_FEAT_FRAME_SIZE)
+   hdr->framesz = memnic->request.framesz;
+   hdr->request = memnic->request.features;
+
smp_wmb();
-   nic->hdr.reset = 1;
+   hdr->reset = 1;
+
+   while (ACCESS_ONCE(hdr->reset))
+   schedule_timeout_interruptible(HZ/100);
+
/* clear index */
memnic->up = 0;
memnic->down = 0;
memnic->framesz = MEMNIC_MAX_FRAME_LEN;
-   /* will become valid after reset handling in vswitch */
+   if (memnic->request.features & MEMNIC_FEAT_FRAME_SIZE)
+   memnic->framesz = hdr->framesz;

/* already run */
if (memnic->kthread)
@@ -260,6 +273,24 @@ static int memnic_set_mac(struct net_device *netdev, void 
*p)

 static int memnic_change_mtu(struct net_device *netdev, int new_mtu)
 {
+   struct memnic_net *memnic = netdev_priv(netdev);
+   struct memnic_area *nic = memnic->dev->base_addr;
+   struct memnic_header *hdr = &nic->hdr;
+   uint32_t framesz = new_mtu + ETH_HLEN + VLAN_HLEN;
+
+   if (!(hdr->features & MEMNIC_FEAT_FRAME_SIZE))
+   return -ENOSYS;
+
+   /* new_mtu less than 68 might cause problem */
+   if (new_mtu < 68 || framesz > MEMNIC_MAX_JUMBO_FRAME_LEN)
+   return -EINVAL;
+
+   printk(KERN_INFO "MEMNIC: Changing MTU from %u to %u\n",
+   netdev->mtu, new_mtu);
+
+   memnic->request.features |= MEMNIC_FEAT_FRAME_SIZE;
+   memnic->request.framesz = framesz;
+
return 0;
 }

@@ -298,6 +329,8 @@ struct memnic_net *memnic_net_create(struct memnic_dev *dev)

memnic->netdev = netdev;
memnic->dev = dev;
+   memnic->framesz = MEMNIC_MAX_FRAME_LEN;
+   memnic->request.features = 0;

netdev->netdev_ops = _netdev_ops;

diff --git a/linux/memnic_net.h b/linux/memnic_net.h
index 10c8eed..b6c57ab 100644
--- a/linux/memnic_net.h
+++ b/linux/memnic_net.h
@@ -44,6 +44,11 @@ struct memnic_net {
struct net_device_stats stats;
int up, down;
uint32_t framesz;
+   /* request to host */
+   struct {
+   uint32_t features;
+   uint32_t framesz;
+   } request;
 };

 struct memnic_net *memnic_net_create(struct memnic_dev *dev);
-- 
1.8.4
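
The MTU check in memnic_change_mtu() above can be tried out in user space with
a small sketch like the one below. ETH_HLEN and VLAN_HLEN carry their usual
Linux values; SKETCH_MAX_JUMBO_FRAME_LEN is a placeholder, because the real
MEMNIC_MAX_JUMBO_FRAME_LEN is not shown in this patch, and mtu_to_framesz()
is a hypothetical helper name.

```c
#include <errno.h>
#include <stdint.h>

#define ETH_HLEN  14 /* Ethernet header length, as in <linux/if_ether.h> */
#define VLAN_HLEN 4  /* 802.1Q tag length, as in <linux/if_vlan.h> */
/* Placeholder limit; the real MEMNIC_MAX_JUMBO_FRAME_LEN may differ. */
#define SKETCH_MAX_JUMBO_FRAME_LEN 10240u

/*
 * Hypothetical helper mirroring the patch: derive the frame size that
 * would be requested from the host for a given MTU, rejecting MTUs
 * below 68 and frame sizes above the jumbo limit.
 */
static int mtu_to_framesz(int new_mtu, uint32_t *framesz)
{
	uint32_t sz;

	if (new_mtu < 68)
		return -EINVAL;
	sz = (uint32_t)new_mtu + ETH_HLEN + VLAN_HLEN;
	if (sz > SKETCH_MAX_JUMBO_FRAME_LEN)
		return -EINVAL;
	*framesz = sz;
	return 0;
}
```

For the default MTU of 1500 this gives a frame size of 1518 bytes. In the
driver the computed value is only stored as a request and takes effect when
the interface is reopened and the host handles the reset.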



[dpdk-dev] [memnic PATCH 4/5] linux: prepare to support variable frame size

2014-06-06 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Add a framesz field to the memnic data structure, initialized with the
current frame size.
Replace the length checks on TX/RX with this frame size.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 linux/memnic_net.c | 7 +--
 linux/memnic_net.h | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/linux/memnic_net.c b/linux/memnic_net.c
index a1b433a..02b5acc 100644
--- a/linux/memnic_net.c
+++ b/linux/memnic_net.c
@@ -46,6 +46,7 @@ static struct sk_buff *memnic_rx(struct memnic_net *memnic)
struct sk_buff *skb;
struct memnic_packet *p;
int idx, len;
+   uint32_t framesz = memnic->framesz;

idx = ACCESS_ONCE(memnic->up);
p = >packets[idx];
@@ -54,7 +55,7 @@ static struct sk_buff *memnic_rx(struct memnic_net *memnic)
return ERR_PTR(-ENOENT);

len = p->len;
-   if (len > MEMNIC_MAX_FRAME_LEN) {
+   if (len > framesz) {
p->status = MEMNIC_PKT_ST_FREE;
memnic->stats.rx_errors++;
skb = ERR_PTR(-EINVAL);
@@ -162,6 +163,7 @@ static int memnic_open(struct net_device *netdev)
/* clear index */
memnic->up = 0;
memnic->down = 0;
+   memnic->framesz = MEMNIC_MAX_FRAME_LEN;
/* will become valid after reset handling in vswitch */

/* already run */
@@ -196,12 +198,13 @@ static netdev_tx_t memnic_start_xmit(struct sk_buff *skb,
struct memnic_data *down = &nic->down;
struct memnic_packet *p;
int idx, old, len;
+   uint32_t framesz = memnic->framesz;

if (!(nic->hdr.valid))
goto drop;

len = skb->len;
-   if (len > MEMNIC_MAX_FRAME_LEN)
+   if (len > framesz)
goto drop;
 retry:
idx = ACCESS_ONCE(memnic->down);
diff --git a/linux/memnic_net.h b/linux/memnic_net.h
index 761ed0a..10c8eed 100644
--- a/linux/memnic_net.h
+++ b/linux/memnic_net.h
@@ -43,6 +43,7 @@ struct memnic_net {
struct task_struct *kthread;
struct net_device_stats stats;
int up, down;
+   uint32_t framesz;
 };

 struct memnic_net *memnic_net_create(struct memnic_dev *dev);
-- 
1.8.4



[dpdk-dev] [memnic PATCH 3/5] pmd: support variable frame size

2014-06-06 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

If the MEMNIC framework has the MEMNIC_FEAT_FRAME_SIZE feature and a frame
size is configured, set the request bit and the frame size on reset to
support a larger frame size.

Do not trust the framesz field of the header during normal operation,
because the host might change the value unexpectedly.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 6b6bcb3..1f30a8c 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -100,15 +100,36 @@ static int memnic_dev_configure(__rte_unused struct rte_eth_dev *dev)
 static int memnic_dev_start(struct rte_eth_dev *dev)
 {
struct memnic_adapter *adapter = get_adapter(dev);
+   struct memnic_header *hdr = &adapter->nic->hdr;
+   uint32_t request;

/* invalidate */
-   adapter->nic->hdr.valid = 0;
+   hdr->valid = 0;
+
+   /* setup parameters */
+   request = 0;
+   /* setup jumbo frame support, if any. */
+   if (dev->data->dev_conf.rxmode.jumbo_frame == 1) {
+   hdr->framesz = dev->data->dev_conf.rxmode.max_rx_pkt_len;
+   request |= MEMNIC_FEAT_FRAME_SIZE;
+   } else {
+   hdr->framesz = MEMNIC_MAX_FRAME_LEN;
+   }
+   hdr->request = request;
+
rte_mb();
/* reset */
-   adapter->nic->hdr.reset = 1;
-   /* no need to wait here */
+   hdr->reset = 1;
+
+   /* wait */
+   while (ACCESS_ONCE(hdr->reset))
+   rte_pause();
+
adapter->up_idx = adapter->down_idx = 0;
+
adapter->framesz = MEMNIC_MAX_FRAME_LEN;
+   if (request & MEMNIC_FEAT_FRAME_SIZE)
+   adapter->framesz = hdr->framesz;

return 0;
 }
@@ -126,12 +147,18 @@ static void memnic_dev_stop(struct rte_eth_dev *dev)
 static void memnic_dev_infos_get(struct rte_eth_dev *dev,
 struct rte_eth_dev_info *dev_info)
 {
+   struct memnic_adapter *adapter = get_adapter(dev);
+   struct memnic_header *hdr = &adapter->nic->hdr;
+
dev_info->driver_name = dev->driver->pci_drv.name;
dev_info->max_rx_queues = 1;
dev_info->max_tx_queues = 1;
dev_info->min_rx_bufsize = 60;
dev_info->max_rx_pktlen = MEMNIC_MAX_FRAME_LEN;
dev_info->max_mac_addrs = 1;
+
+   if (hdr->features & MEMNIC_FEAT_FRAME_SIZE)
+   dev_info->max_rx_pktlen = MEMNIC_MAX_JUMBO_FRAME_LEN;
 }

 static void memnic_dev_stats_get(struct rte_eth_dev *dev,
-- 
1.8.4
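
The reset handshake in memnic_dev_start() above can be sketched as a
single-threaded simulation. This is an illustration under assumptions: the
host side is not part of this thread, so host_handle_reset() is a
hypothetical stand-in for whatever the vswitch does on reset, and
hdr_sketch holds only the header fields involved. The real PMD also issues
rte_mb() before raising reset and spins on the flag; both are omitted
because nothing runs concurrently here.

```c
#include <stdint.h>

#define MEMNIC_MAX_FRAME_LEN       (1500 + 14 + 4)
#define MEMNIC_MAX_JUMBO_FRAME_LEN (4096 - 8)
#define MEMNIC_FEAT_FRAME_SIZE     0x0001

/* Only the header fields that take part in the negotiation. */
struct hdr_sketch {
	uint32_t valid;
	uint32_t reset;
	uint32_t features;  /* features the host provides */
	uint32_t request;   /* features the guest asks for */
	uint32_t framesz;
};

/* Guest side: publish the request, then raise reset. */
static void guest_prepare(struct hdr_sketch *hdr, int jumbo,
			  uint32_t max_rx_pkt_len)
{
	hdr->valid = 0;
	if (jumbo) {
		hdr->framesz = max_rx_pkt_len;
		hdr->request = MEMNIC_FEAT_FRAME_SIZE;
	} else {
		hdr->framesz = MEMNIC_MAX_FRAME_LEN;
		hdr->request = 0;
	}
	hdr->reset = 1;
}

/* Hypothetical host side: accept or clamp the request, then clear reset. */
static void host_handle_reset(struct hdr_sketch *hdr)
{
	if (!hdr->reset)
		return;
	if (!(hdr->features & hdr->request & MEMNIC_FEAT_FRAME_SIZE) ||
	    hdr->framesz > MEMNIC_MAX_JUMBO_FRAME_LEN)
		hdr->framesz = MEMNIC_MAX_FRAME_LEN;
	hdr->valid = 1;
	hdr->reset = 0;
}

/* Guest side, once reset is cleared: latch a private copy of framesz. */
static uint32_t guest_finish(const struct hdr_sketch *hdr)
{
	if (hdr->request & MEMNIC_FEAT_FRAME_SIZE)
		return hdr->framesz;
	return MEMNIC_MAX_FRAME_LEN;
}
```

The value returned by guest_finish() plays the role of adapter->framesz:
it is read once after the handshake, and the data path uses that private
copy rather than re-reading the shared header.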



[dpdk-dev] [memnic PATCH 2/5] pmd: prepare to support variable frame size

2014-06-06 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Add a framesz field to the adapter structure, and initialize it with
the current frame size.
Replace the fixed length check on TX/RX with a check against this frame size.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 4abdf26..6b6bcb3 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -49,6 +49,7 @@
 struct memnic_adapter {
struct memnic_area *nic;
int up_idx, down_idx;
+   uint32_t framesz;
struct rte_mempool *mp;
struct ether_addr mac_addr;
/*
@@ -107,6 +108,7 @@ static int memnic_dev_start(struct rte_eth_dev *dev)
adapter->nic->hdr.reset = 1;
/* no need to wait here */
adapter->up_idx = adapter->down_idx = 0;
+   adapter->framesz = MEMNIC_MAX_FRAME_LEN;

return 0;
 }
@@ -256,6 +258,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
struct rte_mbuf *mb;
uint16_t nr;
uint64_t pkts, bytes, errs;
+   uint32_t framesz = adapter->framesz;
int idx;
struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];

@@ -268,7 +271,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
p = >packets[idx];
if (p->status != MEMNIC_PKT_ST_FILLED)
break;
-   if (p->len > MEMNIC_MAX_FRAME_LEN) {
+   if (p->len > framesz) {
errs++;
goto drop;
}
@@ -317,6 +320,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
int idx;
struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];
uint64_t pkts, bytes, errs;
+   uint32_t framesz = adapter->framesz;

if (!adapter->nic->hdr.valid)
return 0;
@@ -324,11 +328,11 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
pkts = bytes = errs = 0;

for (nr = 0; nr < nb_pkts; nr++) {
-   int pkt_len = rte_pktmbuf_pkt_len(tx_pkts[nr]);
+   uint32_t pkt_len = rte_pktmbuf_pkt_len(tx_pkts[nr]);
struct rte_mbuf *sg;
void *ptr;

-   if (pkt_len > MEMNIC_MAX_FRAME_LEN) {
+   if (pkt_len > framesz) {
errs++;
break;
}
-- 
1.8.4



[dpdk-dev] [memnic PATCH 1/5] common: update memnic.h to support variable frame size

2014-06-06 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Update MEMNIC data structure in common header file.

Prepare to support extra features for MEMNIC.

Rename the reserved field to request, which will be used to negotiate
between host and guest, and add the feature flag and other definitions.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 common/memnic.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/common/memnic.h b/common/memnic.h
index 84e941c..d5a651f 100644
--- a/common/memnic.h
+++ b/common/memnic.h
@@ -42,19 +42,24 @@
 #define MEMNIC_MAX_PACKET_SIZE (4096)

 #define MEMNIC_MAX_FRAME_LEN   (1500 + 14 + 4) /* MTU + ether header + vlan */
+#define MEMNIC_MAX_JUMBO_FRAME_LEN (MEMNIC_MAX_PACKET_SIZE - 8)

 struct memnic_header {
uint32_t magic;
uint32_t version;
uint32_t valid;
uint32_t reset;
-   uint32_t features;
-   uint32_t reserved;
+   uint32_t features;  /* features this MEMNIC provides */
+   uint32_t request;   /* requesting features from Guest */
union {
uint8_t mac_addr[6];
uint8_t dummy[8];
};
+   /* for extra features */
+   uint32_t framesz;
 };
+#define MEMNIC_FEAT_FRAME_SIZE (0x0001)
+#define MEMNIC_FEAT_ALL (MEMNIC_FEAT_FRAME_SIZE)

 struct memnic_info {
uint32_t flags;
-- 
1.8.4



[dpdk-dev] [memnic PATCH 0/5] support variable frame size

2014-06-06 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

This patchset provides variable frame size functionality using the MEMNIC
extra features framework.

First, update memnic.h to synchronise with the upstream data structure,
which has the extra features framework.
Next, prepare for changing the frame size.
Finally, implement frame size negotiation with the host.

Hiroshi Shimamoto (5):
  common: update memnic.h to support variable frame size
  pmd: prepare to support variable frame size
  pmd: support variable frame size
  linux: prepare to support variable frame size
  linux: support MTU change

 common/memnic.h|  9 +++--
 linux/memnic_net.c | 46 +-
 linux/memnic_net.h |  6 ++
 pmd/pmd_memnic.c   | 43 +--
 4 files changed, 91 insertions(+), 13 deletions(-)

-- 
1.8.4



[dpdk-dev] [memnic PATCH] linux: fix to disable softirq before netif_receive_skb()

2014-06-06 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Calling netif_receive_skb() from the memnic thread may cause a deadlock if
softirq is not disabled.

netif_receive_skb() should be called in softirq context, but the memnic
thread is not in softirq context. That may conflict with softirq work such
as a timer handler in the kernel network stack.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 linux/memnic_net.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/linux/memnic_net.c b/linux/memnic_net.c
index fadece6..a1b433a 100644
--- a/linux/memnic_net.c
+++ b/linux/memnic_net.c
@@ -133,8 +133,14 @@ static int memnic_thread(void *param)
continue;
}

+   local_bh_disable();
+   /*
+* Disable softirq here to avoid race between timers and
+* netif_receive_skb
+*/
for (i = 0; i < n; i++)
netif_receive_skb(skbs[i]);
+   local_bh_enable();

cnt = 0;
}
-- 
1.8.4



[dpdk-dev] [memnic PATCH] pmd: use rte_atomic32_cmpset instead of cmpxchg

2014-04-03 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Because DPDK has its own compare-and-set function, optimized for each
processor type, use rte_atomic32_cmpset() instead of the cmpxchg helper
that was introduced specifically for MEMNIC.

Signed-off-by: Hiroshi Shimamoto 
---
 common/memnic.h  | 12 
 pmd/pmd_memnic.c | 10 ++
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/common/memnic.h b/common/memnic.h
index 2187ac1..84e941c 100644
--- a/common/memnic.h
+++ b/common/memnic.h
@@ -120,18 +120,6 @@ struct memnic_area {
 /* for userspace */
 #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

-static inline uint32_t cmpxchg(uint32_t *dst, uint32_t old, uint32_t new)
-{
-   volatile uint32_t *ptr = (volatile uint32_t *)dst;
-   uint32_t ret;
-
-   asm volatile("lock; cmpxchgl %2, %1"
-: "=a" (ret), "+m" (*ptr)
-: "r" (new), "0" (old)
-: "memory");
-
-   return ret;
-}
 #endif /* __KERNEL__ */

 #endif /* MEMNIC_H */
diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 4a1c1e4..4abdf26 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -314,7 +314,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
struct memnic_data *data = &adapter->nic->down;
struct memnic_packet *p;
uint16_t nr;
-   int idx, old;
+   int idx;
struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];
uint64_t pkts, bytes, errs;

@@ -335,10 +335,12 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
 retry:
idx = ACCESS_ONCE(adapter->down_idx);
p = &data->packets[idx];
-   old = cmpxchg(&p->status, MEMNIC_PKT_ST_FREE, MEMNIC_PKT_ST_USED);
-   if (old != MEMNIC_PKT_ST_FREE) {
-   if (old == MEMNIC_PKT_ST_FILLED &&
+   if (unlikely(rte_atomic32_cmpset(&p->status,
+   MEMNIC_PKT_ST_FREE, MEMNIC_PKT_ST_USED) == 0)) {
+   /* cmpxchg failed */
+   if (p->status == MEMNIC_PKT_ST_FILLED &&
idx == ACCESS_ONCE(adapter->down_idx)) {
+   /* what we're seeing is FILLED means queue full */
errs++;
break;
}
-- 
1.8.4
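
The patch above changes the return convention, which is why the failure
path now re-reads p->status. A minimal sketch of the two conventions,
written with C11 atomics rather than DPDK or inline asm: the function
names cmpxchg_sketch() and cmpset_sketch() are hypothetical, and the
status values 0/1/2 are assumed from the race diagram elsewhere in this
thread (free, used, filled).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Status values as they appear in the race diagram (assumed mapping). */
#define MEMNIC_PKT_ST_FREE   0
#define MEMNIC_PKT_ST_USED   1
#define MEMNIC_PKT_ST_FILLED 2

/*
 * cmpxchg convention: return the value observed in *dst.
 * The swap happened if and only if the return value equals old.
 */
static uint32_t cmpxchg_sketch(_Atomic uint32_t *dst, uint32_t old,
			       uint32_t new_val)
{
	uint32_t expected = old;

	/* On failure, expected is overwritten with the current value. */
	atomic_compare_exchange_strong(dst, &expected, new_val);
	return expected;
}

/*
 * cmpset convention, as used by rte_atomic32_cmpset():
 * return non-zero on success and 0 on failure. The observed value is
 * not reported, so the caller has to load it again if it is needed.
 */
static int cmpset_sketch(_Atomic uint32_t *dst, uint32_t exp, uint32_t src)
{
	return atomic_compare_exchange_strong(dst, &exp, src);
}
```

One consequence of the cmpset convention is that the status inspected
after a failure comes from a second load, so it may differ from the value
the compare actually saw.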



[dpdk-dev] [memnic PATCH] common: add Dual BSD/GPL license line

2014-04-03 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

The MEMNIC header file should be under Dual BSD/GPL license.
Put the license text "Dual BSD/GPL" into the file header.

Signed-off-by: Hiroshi Shimamoto 
---
 common/memnic.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/common/memnic.h b/common/memnic.h
index 8bd483c..2187ac1 100644
--- a/common/memnic.h
+++ b/common/memnic.h
@@ -27,6 +27,7 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  */
+/* Dual BSD/GPL */

 #ifndef MEMNIC_H
 #define MEMNIC_H
-- 
1.8.4



[dpdk-dev] [memnic PATCH v2] pmd: fix race condition

2014-04-03 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

There is a race condition on transmit to the host.

Guest PMD                       Host
Thread-A        Thread-B        vSwitch
   |idx=0          |idx=0          |p[0] st!=2
   |cmpxchg        |               |
   |p[0] st->1     |               |
   |idx=1          |               |
   |fill data      |               |
   |p[0] st->2     |               |p[0] st==2
   |               |               |receive data
   |               |               |p[0] st->0
   |               |cmpxchg        |
   |               |success        |p[1] st!=2
   |               |p[0] st->1     |
                   This is BAD

That causes traffic to stop.

We have to handle that race condition by checking whether the
current index is still correct.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index d833130..4a1c1e4 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -345,6 +345,15 @@ retry:
goto retry;
}

+   if (idx != ACCESS_ONCE(adapter->down_idx)) {
+   /*
+* host freed this and got false positive,
+* need to recover the status and retry.
+*/
+   p->status = MEMNIC_PKT_ST_FREE;
+   goto retry;
+   }
+
if (++idx >= MEMNIC_NR_PACKET)
idx = 0;
adapter->down_idx = idx;
-- 
1.8.4
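
The interleaving in the diagram above can be replayed deterministically
in one thread. The following is a sketch under assumptions, not the PMD
code: ring_sketch, try_claim() and confirm_claim() are hypothetical names,
the ring has only four slots, and C11 atomics stand in for ACCESS_ONCE()
and the compare-and-set. It shows only that the added index check catches
the stale claim; it does not show that no other race exists.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_PACKET 4	/* hypothetical ring size for the sketch */

enum { ST_FREE = 0, ST_USED = 1, ST_FILLED = 2 };

struct ring_sketch {
	_Atomic uint32_t status[NR_PACKET];
	_Atomic int down_idx;
};

static void ring_init(struct ring_sketch *r)
{
	int i;

	for (i = 0; i < NR_PACKET; i++)
		atomic_init(&r->status[i], ST_FREE);
	atomic_init(&r->down_idx, 0);
}

/* First half of a claim: FREE -> USED on an index read earlier. */
static int try_claim(struct ring_sketch *r, int idx)
{
	uint32_t expected = ST_FREE;

	return atomic_compare_exchange_strong(&r->status[idx], &expected,
					      ST_USED);
}

/*
 * Second half, the check added by the patch: if the index moved since
 * it was read, the slot was recycled by the host in between, so hand
 * it back and report failure. Otherwise advance the index.
 */
static int confirm_claim(struct ring_sketch *r, int idx)
{
	if (idx != atomic_load(&r->down_idx)) {
		atomic_store(&r->status[idx], ST_FREE);
		return 0;	/* caller should retry with a fresh index */
	}
	atomic_store(&r->down_idx, (idx + 1) % NR_PACKET);
	return 1;
}
```

Replaying the diagram: both threads read index 0, thread A claims and
fills slot 0, the host frees it, and thread B's compare-and-set on slot 0
then succeeds. confirm_claim() returns 0 for thread B because the index is
already 1, and slot 0 is returned to the free state instead of staying
stuck as used.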



[dpdk-dev] [memnic PATCH 1/5] pmd: fix race condition

2014-03-28 Thread Hiroshi Shimamoto
Hi,

> Subject: Re: [dpdk-dev] [memnic PATCH 1/5] pmd: fix race condition
> 
> Hi Hiroshi-san,
> 
> Please see my comments below.
> 
> On 03/11/2014 06:37 AM, Hiroshi Shimamoto wrote:
> > From: Hiroshi Shimamoto 
> >
> > There is a race condition, on transmit to vSwitch.
> 
> I think we should not talk specifically about vSwitch, as
> another implementation of host memnic is possible. Maybe using
> the term "host" is more appropriate?
> 
> > +   if (idx != ACCESS_ONCE(adapter->down_idx)) {
> > +   /*
> > +* vSwitch freed this and got false positive,
> > +* need to recover the status and retry.
> > +*/
> > +   p->status = MEMNIC_PKT_ST_FREE;
> > +   goto retry;
> > +   }
> > +
> 
> The patch indeed looks to improve reliability, even if it's
> difficult to me to be sure that there is no other race condition.
> Again, I would replace "vSwitch" by "host".

okay, I'm fine with that.
Do you want me to resubmit an updated one?
If so, I will do it next week.

> 
> By the way, I guess the Linux code in linux/memnic_net.c should be
> modified in the same way.

Hm, yes, we should check the kernel driver too.


thanks,
Hiroshi

> 
> Regards,
> Olivier



[dpdk-dev] [memnic PATCH 3/5] pmd: implement stats of MEMNIC

2014-03-25 Thread Hiroshi Shimamoto
Hi,

> Subject: Re: [dpdk-dev] [memnic PATCH 3/5] pmd: implement stats of MEMNIC
> 
> Hi,
> 
> 11/03/2014 05:38, Hiroshi Shimamoto:
> > From: Hiroshi Shimamoto 
> >
> > Implement missing feature to account statistics.
> > This patch adds just an infrastructure.
> >
> > Signed-off-by: Hiroshi Shimamoto 
> > Reviewed-by: Hayato Momma 
> 
> [...]
> 
> > @@ -51,6 +51,7 @@ struct memnic_adapter {
> > int up_idx, down_idx;
> > struct rte_mempool *mp;
> > struct ether_addr mac_addr;
> > +   struct rte_eth_stats stats[RTE_MAX_LCORE];
> >  };
> 
> Could you make a comment to explain why you allocate a structure per core?
> It is easier to read when locking strategy is described.

sure, could you please see the new one?

> 
> > +   for (i = 0; i < RTE_MAX_LCORE; i++) {
> > +   struct rte_eth_stats *st = >stats[i];
> > +
> > +   memset(st, 0, sizeof(*st));
> > +   }
> 
> Could you use only one memset for the array?
> 

Yep, it's reasonable.

Below is the updated patch.
Is it okay with you?

thanks,
Hiroshi

==

From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>
Subject: [PATCH v2] pmd: Implement stats of MEMNIC

Implement missing feature to account statistics.
This patch adds just an infrastructure.

Allocate a per-core stats area to avoid locking.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 45 ++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index bf5fc2e..facaf54 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -51,6 +51,12 @@ struct memnic_adapter {
int up_idx, down_idx;
struct rte_mempool *mp;
struct ether_addr mac_addr;
+   /*
+* Allocate per core stats to avoid lock for accounting.
+* Incrementing stats doesn't require lock, because only one thread
+* is running on per core.
+*/
+   struct rte_eth_stats stats[RTE_MAX_LCORE];
 };

 static inline struct memnic_adapter *get_adapter(const struct rte_eth_dev *dev)
@@ -126,13 +132,46 @@ static void memnic_dev_infos_get(struct rte_eth_dev *dev,
dev_info->max_mac_addrs = 1;
 }

-static void memnic_dev_stats_get(__rte_unused struct rte_eth_dev *dev,
-__rte_unused struct rte_eth_stats *stats)
+static void memnic_dev_stats_get(struct rte_eth_dev *dev,
+struct rte_eth_stats *stats)
 {
+   struct memnic_adapter *adapter = get_adapter(dev);
+   int i;
+
+   memset(stats, 0, sizeof(*stats));
+   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   struct rte_eth_stats *st = &adapter->stats[i];
+
+   stats->ipackets += st->ipackets;
+   stats->opackets += st->opackets;
+   stats->ibytes += st->ibytes;
+   stats->obytes += st->obytes;
+   stats->ierrors += st->ierrors;
+   stats->oerrors += st->oerrors;
+   stats->imcasts += st->imcasts;
+   stats->rx_nombuf += st->rx_nombuf;
+   stats->fdirmatch += st->fdirmatch;
+   stats->fdirmiss += st->fdirmiss;
+
+   /* no multiqueue support now */
+   stats->q_ipackets[0] = st->q_ipackets[0];
+   stats->q_opackets[0] = st->q_opackets[0];
+   stats->q_ibytes[0] = st->q_ibytes[0];
+   stats->q_obytes[0] = st->q_obytes[0];
+   stats->q_errors[0] = st->q_errors[0];
+
+   stats->ilbpackets += st->ilbpackets;
+   stats->olbpackets += st->olbpackets;
+   stats->ilbbytes += st->ilbbytes;
+   stats->olbbytes += st->olbbytes;
+   }
 }

-static void memnic_dev_stats_reset(__rte_unused struct rte_eth_dev *dev)
+static void memnic_dev_stats_reset(struct rte_eth_dev *dev)
 {
+   struct memnic_adapter *adapter = get_adapter(dev);
+
+   memset(adapter->stats, 0, sizeof(adapter->stats));
 }

 static int memnic_dev_link_update(struct rte_eth_dev *dev,
-- 
1.8.4
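
The per-core layout described in the patch above can be sketched without
DPDK. This is an illustration under assumptions: MAX_LCORE stands in for
RTE_MAX_LCORE, stats_sketch keeps only three counters instead of the full
rte_eth_stats, and the lcore id is passed explicitly where the PMD would
call rte_lcore_id(). All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAX_LCORE 8	/* stand-in for RTE_MAX_LCORE */

/* Reduced counter set for the sketch. */
struct stats_sketch {
	uint64_t ipackets;
	uint64_t ibytes;
	uint64_t ierrors;
};

struct adapter_sketch {
	struct stats_sketch stats[MAX_LCORE];
};

/* Data path: each lcore updates only its own slot, so no lock is taken. */
static void account_rx(struct adapter_sketch *ad, unsigned int lcore,
		       uint64_t pkts, uint64_t bytes, uint64_t errs)
{
	struct stats_sketch *st = &ad->stats[lcore];

	st->ipackets += pkts;
	st->ibytes += bytes;
	st->ierrors += errs;
}

/* Control path: sum every per-core slot into the caller's structure. */
static void stats_get(const struct adapter_sketch *ad,
		      struct stats_sketch *out)
{
	unsigned int i;

	memset(out, 0, sizeof(*out));
	for (i = 0; i < MAX_LCORE; i++) {
		out->ipackets += ad->stats[i].ipackets;
		out->ibytes += ad->stats[i].ibytes;
		out->ierrors += ad->stats[i].ierrors;
	}
}

/* Reset: a single memset covers the whole array. */
static void stats_reset(struct adapter_sketch *ad)
{
	memset(ad->stats, 0, sizeof(ad->stats));
}
```

In this sketch every counter is accumulated with += across cores. The
reader in stats_get() may observe a slot while its owner is updating it,
so the totals are a snapshot rather than an atomic view.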



[dpdk-dev] [memnic PATCH v2] linux: fix build with kernel 3.3

2014-03-19 Thread Hiroshi Shimamoto
Hi,

I missed it, sorry.

> Subject: [memnic PATCH v2] linux: fix build with kernel 3.3
> 
> Remove unused dev_ops functions.
> 
> The API of some functions (memnic_vlan_rx_add_vid,
> memnic_vlan_rx_kill_vid) changed starting from 3.3 kernel. Instead of
> using a #ifdef to handle the compilation on any kernel, we can just
> remove these functions as they are not needed.
> 
> Signed-off-by: Olivier Matz 

Acked-by: Hiroshi Shimamoto 

thanks,
Hiroshi

> ---
>  linux/memnic_net.c | 33 -
>  1 file changed, 33 deletions(-)
> 
> Hi Shimamoto-san,
> 
> Here is a new version of the patch, I think we don't need the following
> functions so we can just remove them instead of keeping several dummy
> functions for different kernel versions.
> 
> Let me know if you have any comment.
> 
> Regards,
> Olivier
> 
> diff --git a/linux/memnic_net.c b/linux/memnic_net.c
> index 747ae51..9019258 100644
> --- a/linux/memnic_net.c
> +++ b/linux/memnic_net.c
> @@ -235,16 +235,6 @@ drop:
>   return NETDEV_TX_OK;
>  }
> 
> -static u16 memnic_select_queue(struct net_device *netdev,
> - struct sk_buff *skb)
> -{
> - return 0;
> -}
> -
> -static void memnic_set_rx_mode(struct net_device *netdev)
> -{
> -}
> -
>  static int memnic_set_mac(struct net_device *netdev, void *p)
>  {
>   return 0;
> @@ -255,23 +245,6 @@ static int memnic_change_mtu(struct net_device *netdev, 
> int new_mtu)
>   return 0;
>  }
> 
> -static void memnic_tx_timeout(struct net_device *netdev)
> -{
> -}
> -
> -static void memnic_vlan_rx_add_vid(struct net_device *netdev, unsigned short 
> vid)
> -{
> -}
> -
> -static void memnic_vlan_rx_kill_vid(struct net_device *netdev, unsigned 
> short vid)
> -{
> -}
> -
> -static int memnic_ioctl(struct net_device *netdev, struct ifreq *req, int 
> cmd)
> -{
> - return 0;
> -}
> -
>  static struct net_device_stats *memnic_get_stats(struct net_device *netdev)
>  {
>   struct memnic_net *memnic = netdev_priv(netdev);
> @@ -283,15 +256,9 @@ static const struct net_device_ops memnic_netdev_ops = {
>   .ndo_open   = memnic_open,
>   .ndo_stop   = memnic_close,
>   .ndo_start_xmit = memnic_start_xmit,
> - .ndo_select_queue   = memnic_select_queue,
> - .ndo_set_rx_mode= memnic_set_rx_mode,
>   .ndo_validate_addr  = eth_validate_addr,
>   .ndo_set_mac_address= memnic_set_mac,
>   .ndo_change_mtu = memnic_change_mtu,
> - .ndo_tx_timeout = memnic_tx_timeout,
> - .ndo_vlan_rx_add_vid= memnic_vlan_rx_add_vid,
> - .ndo_vlan_rx_kill_vid   = memnic_vlan_rx_kill_vid,
> - .ndo_do_ioctl   = memnic_ioctl,
>   .ndo_get_stats  = memnic_get_stats,
>  };
> 
> --
> 1.8.5.3



[dpdk-dev] [memnic PATCH 5/5] pmd: handle multiple segments on xmit

2014-03-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

The current MEMNIC PMD cannot handle multiple segments.

Add the functionality to transmit an mbuf which has multiple segments.
Walk every segment of the mbuf being transmitted and copy the data to the
MEMNIC packet buffer.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index abfd437..4ee655d 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -324,9 +324,11 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
pkts = bytes = errs = 0;

for (nr = 0; nr < nb_pkts; nr++) {
-   int len = rte_pktmbuf_data_len(tx_pkts[nr]);
+   int pkt_len = rte_pktmbuf_pkt_len(tx_pkts[nr]);
+   struct rte_mbuf *sg;
+   void *ptr;

-   if (len > MEMNIC_MAX_FRAME_LEN) {
+   if (pkt_len > MEMNIC_MAX_FRAME_LEN) {
errs++;
break;
}
@@ -356,12 +358,19 @@ retry:
idx = 0;
adapter->down_idx = idx;

-   p->len = len;
+   p->len = pkt_len;

-   rte_memcpy(p->data, rte_pktmbuf_mtod(tx_pkts[nr], void *), len);
+   ptr = p->data;
+   for (sg = tx_pkts[nr]; sg; sg = sg->pkt.next) {
+   void *src = rte_pktmbuf_mtod(sg, void *);
+   int data_len = sg->pkt.data_len;
+
+   rte_memcpy(ptr, src, data_len);
+   ptr += data_len;
+   }

pkts++;
-   bytes += len;
+   bytes += pkt_len;

rte_mb();
p->status = MEMNIC_PKT_ST_FILLED;
-- 
1.8.4
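
The segment walk in the patch above can be sketched with a plain linked
list in place of rte_mbuf. This is an illustration under assumptions:
seg_sketch and copy_segments() are hypothetical names, the destination is
a byte pointer so that no arithmetic on void * is needed, and the total
length is computed by walking the chain where the PMD reads pkt_len.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one mbuf segment. */
struct seg_sketch {
	const void *data;
	uint16_t data_len;
	struct seg_sketch *next;
};

/*
 * Copy a segment chain into one flat buffer.
 * Returns the total number of bytes copied, or 0 if the chain does not
 * fit in dst_cap (nothing is written in that case).
 */
static size_t copy_segments(uint8_t *dst, size_t dst_cap,
			    const struct seg_sketch *head)
{
	const struct seg_sketch *sg;
	uint8_t *ptr = dst;
	size_t total = 0;

	/* Check the whole packet length before touching the buffer. */
	for (sg = head; sg; sg = sg->next)
		total += sg->data_len;
	if (total > dst_cap)
		return 0;

	for (sg = head; sg; sg = sg->next) {
		memcpy(ptr, sg->data, sg->data_len);
		ptr += sg->data_len;
	}
	return total;
}
```

Checking the total length first mirrors the patch, which compares pkt_len
with the frame limit before claiming a slot, so an oversized chain never
partially overwrites the packet buffer.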



[dpdk-dev] [memnic PATCH 4/5] pmd: account statistics

2014-03-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Implement packet accounting of MEMNIC on TX/RX.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index fc2d990..abfd437 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -255,18 +255,23 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
struct memnic_packet *p;
struct rte_mbuf *mb;
uint16_t nr;
+   uint64_t pkts, bytes, errs;
int idx;
+   struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];

if (!adapter->nic->hdr.valid)
return 0;

+   pkts = bytes = errs = 0;
idx = adapter->up_idx;
for (nr = 0; nr < nb_pkts; nr++) {
p = >packets[idx];
if (p->status != MEMNIC_PKT_ST_FILLED)
break;
-   if (p->len > MEMNIC_MAX_FRAME_LEN)
+   if (p->len > MEMNIC_MAX_FRAME_LEN) {
+   errs++;
goto drop;
+   }
mb = rte_pktmbuf_alloc(adapter->mp);
if (!mb)
break;
@@ -279,6 +284,9 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
mb->pkt.data_len = p->len;
rx_pkts[nr] = mb;

+   pkts++;
+   bytes += p->len;
+
 drop:
rte_mb();
p->status = MEMNIC_PKT_ST_FREE;
@@ -288,6 +296,13 @@ drop:
}
adapter->up_idx = idx;

+   /* stats */
+   st->ipackets += pkts;
+   st->ibytes += bytes;
+   st->ierrors += errs;
+   st->q_ipackets[0] += pkts;
+   st->q_ibytes[0] += bytes;
+
return nr;
 }

@@ -300,14 +315,21 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
struct memnic_packet *p;
uint16_t nr;
int idx, old;
+   struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];
+   uint64_t pkts, bytes, errs;

if (!adapter->nic->hdr.valid)
return 0;

+   pkts = bytes = errs = 0;
+
for (nr = 0; nr < nb_pkts; nr++) {
int len = rte_pktmbuf_data_len(tx_pkts[nr]);
-   if (len > MEMNIC_MAX_FRAME_LEN)
+
+   if (len > MEMNIC_MAX_FRAME_LEN) {
+   errs++;
break;
+   }
 retry:
idx = ACCESS_ONCE(adapter->down_idx);
p = >packets[idx];
@@ -315,6 +337,7 @@ retry:
if (old != MEMNIC_PKT_ST_FREE) {
if (old == MEMNIC_PKT_ST_FILLED &&
idx == ACCESS_ONCE(adapter->down_idx)) {
+   errs++;
break;
}
goto retry;
@@ -337,12 +360,22 @@ retry:

rte_memcpy(p->data, rte_pktmbuf_mtod(tx_pkts[nr], void *), len);

+   pkts++;
+   bytes += len;
+
rte_mb();
p->status = MEMNIC_PKT_ST_FILLED;

rte_pktmbuf_free(tx_pkts[nr]);
}

+   /* stats */
+   st->opackets += pkts;
+   st->obytes += bytes;
+   st->oerrors += errs;
+   st->q_opackets[0] += pkts;
+   st->q_obytes[0] += bytes;
+
return nr;
 }

-- 
1.8.4



[dpdk-dev] [memnic PATCH 3/5] pmd: implement stats of MEMNIC

2014-03-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Implement missing feature to account statistics.
This patch adds just an infrastructure.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 45 ++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index bf5fc2e..fc2d990 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -51,6 +51,7 @@ struct memnic_adapter {
int up_idx, down_idx;
struct rte_mempool *mp;
struct ether_addr mac_addr;
+   struct rte_eth_stats stats[RTE_MAX_LCORE];
 };

 static inline struct memnic_adapter *get_adapter(const struct rte_eth_dev *dev)
@@ -126,13 +127,51 @@ static void memnic_dev_infos_get(struct rte_eth_dev *dev,
dev_info->max_mac_addrs = 1;
 }

-static void memnic_dev_stats_get(__rte_unused struct rte_eth_dev *dev,
-__rte_unused struct rte_eth_stats *stats)
+static void memnic_dev_stats_get(struct rte_eth_dev *dev,
+struct rte_eth_stats *stats)
 {
+   struct memnic_adapter *adapter = get_adapter(dev);
+   int i;
+
+   memset(stats, 0, sizeof(*stats));
+   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   struct rte_eth_stats *st = &adapter->stats[i];
+
+   stats->ipackets += st->ipackets;
+   stats->opackets += st->opackets;
+   stats->ibytes += st->ibytes;
+   stats->obytes += st->obytes;
+   stats->ierrors += st->ierrors;
+   stats->oerrors += st->oerrors;
+   stats->imcasts += st->imcasts;
+   stats->rx_nombuf += st->rx_nombuf;
+   stats->fdirmatch += st->fdirmatch;
+   stats->fdirmiss += st->fdirmiss;
+
+   /* no multiqueue support now */
+   stats->q_ipackets[0] = st->q_ipackets[0];
+   stats->q_opackets[0] = st->q_opackets[0];
+   stats->q_ibytes[0] = st->q_ibytes[0];
+   stats->q_obytes[0] = st->q_obytes[0];
+   stats->q_errors[0] = st->q_errors[0];
+
+   stats->ilbpackets += st->ilbpackets;
+   stats->olbpackets += st->olbpackets;
+   stats->ilbbytes += st->ilbbytes;
+   stats->olbbytes += st->olbbytes;
+   }
 }

-static void memnic_dev_stats_reset(__rte_unused struct rte_eth_dev *dev)
+static void memnic_dev_stats_reset(struct rte_eth_dev *dev)
 {
+   struct memnic_adapter *adapter = get_adapter(dev);
+   int i;
+
+   for (i = 0; i < RTE_MAX_LCORE; i++) {
+   struct rte_eth_stats *st = &adapter->stats[i];
+
+   memset(st, 0, sizeof(*st));
+   }
 }

 static int memnic_dev_link_update(struct rte_eth_dev *dev,
-- 
1.8.4



[dpdk-dev] [memnic PATCH 2/5] pmd: check frame length from host

2014-03-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

Drop packets which have an invalid length.

Normally this cannot happen while the vSwitch works correctly; however,
it is better to add a guard to prevent memory corruption.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 805f0b2..bf5fc2e 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -226,6 +226,8 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
p = >packets[idx];
if (p->status != MEMNIC_PKT_ST_FILLED)
break;
+   if (p->len > MEMNIC_MAX_FRAME_LEN)
+   goto drop;
mb = rte_pktmbuf_alloc(adapter->mp);
if (!mb)
break;
@@ -238,6 +240,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
mb->pkt.data_len = p->len;
rx_pkts[nr] = mb;

+drop:
rte_mb();
p->status = MEMNIC_PKT_ST_FREE;

-- 
1.8.4



[dpdk-dev] [memnic PATCH 1/5] pmd: fix race condition

2014-03-11 Thread Hiroshi Shimamoto
From: Hiroshi Shimamoto <h-shimam...@ct.jp.nec.com>

There is a race condition on transmit to the vSwitch.

Guest PMD                       Host
Thread-A        Thread-B        vSwitch
   |idx=0          |idx=0          |p[0] st!=2
   |cmpxchg        |               |
   |p[0] st->1     |               |
   |idx=1          |               |
   |fill data      |               |
   |p[0] st->2     |               |p[0] st==2
   |               |               |receive data
   |               |               |p[0] st->0
   |               |cmpxchg        |
   |               |success        |p[1] st!=2
   |               |p[0] st->1     |
                   This is BAD

That causes traffic to stop.

We have to handle that race condition by checking whether the
current index is still correct.

Signed-off-by: Hiroshi Shimamoto 
Reviewed-by: Hayato Momma 
---
 pmd/pmd_memnic.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 30d5a1b..805f0b2 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -278,6 +278,15 @@ retry:
goto retry;
}

+   if (idx != ACCESS_ONCE(adapter->down_idx)) {
+   /*
+* vSwitch freed this and got false positive,
+* need to recover the status and retry.
+*/
+   p->status = MEMNIC_PKT_ST_FREE;
+   goto retry;
+   }
+
if (++idx >= MEMNIC_NR_PACKET)
idx = 0;
adapter->down_idx = idx;
-- 
1.8.4



[dpdk-dev] [memnic PATCH] linux: fix build with kernel >= 3.3

2014-01-30 Thread Hiroshi Shimamoto
I had not noticed that, and I haven't checked compilation with newer kernels.
But I assume you have already tested it.
Fine with me.

thanks,
Hiroshi

> Subject: [dpdk-dev] [memnic PATCH] linux: fix build with kernel >= 3.3
> 
> Signed-off-by: Olivier Matz 
> ---
>  linux/memnic_net.c | 28 ++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/linux/memnic_net.c b/linux/memnic_net.c
> index 747ae51..b6018fb 100644
> --- a/linux/memnic_net.c
> +++ b/linux/memnic_net.c
> @@ -2,6 +2,7 @@
>   *   BSD LICENSE
>   *
>   *   Copyright(c) 2013-2014 NEC All rights reserved.
> + *   Copyright(c) 2014 6WIND S.A.
>   *
>   *   Redistribution and use in source and binary forms, with or without
>   *   modification, are permitted provided that the following conditions
> @@ -29,6 +30,7 @@
>   */
>  /* Dual BSD/GPL */
> 
> +#include 
>  #include 
>  #include 
> 
> @@ -259,13 +261,35 @@ static void memnic_tx_timeout(struct net_device *netdev)
>  {
>  }
> 
> -static void memnic_vlan_rx_add_vid(struct net_device *netdev, unsigned short 
> vid)
> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0)
> +static int memnic_vlan_rx_add_vid(struct net_device *netdev, __be16 proto, 
> u16 vid)
> +{
> + return 0;
> +}
> +
> +static int memnic_vlan_rx_kill_vid(struct net_device *netdev, __be16 proto, 
> u16 vid)
> +{
> + return 0;
> +}
> +#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3,3,0)
> +static int memnic_vlan_rx_add_vid(struct net_device *netdev, uint16_t vid)
> +{
> + return 0;
> +}
> +
> +static int memnic_vlan_rx_kill_vid(struct net_device *netdev, uint16_t vid)
> +{
> + return 0;
> +}
> +#else
> +static void memnic_vlan_rx_add_vid(struct net_device *netdev, uint16_t vid)
>  {
>  }
> 
> -static void memnic_vlan_rx_kill_vid(struct net_device *netdev, unsigned 
> short vid)
> +static void memnic_vlan_rx_kill_vid(struct net_device *netdev, uint16_t vid)
>  {
>  }
> +#endif
> 
>  static int memnic_ioctl(struct net_device *netdev, struct ifreq *req, int 
> cmd)
>  {
> --
> 1.8.4.rc3



[dpdk-dev] [memnic PATCH] pmd: use memory barrier function instead of asm volatile

2014-01-30 Thread Hiroshi Shimamoto
> Subject: [dpdk-dev] [memnic PATCH] pmd: use memory barrier function instead 
> of asm volatile
> 
> Use the DPDK specific function rte_mb() instead of
> the GCC statement asm volatile ("" ::: "memory").

Yes, that's preferred for DPDK, I think.
Looks okay to me.

By the way, I was also asked to use the rte atomic function
instead of the cmpxchg asm statement.
My re-submitted version in dpdk-ovs includes that change.
What do you think?
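
A compare-and-swap without inline assembly could look like the sketch below. It assumes a GCC-compatible compiler and uses the `__sync` builtins as a stand-in, because the DPDK header is not available in a standalone example; `cas32_val()`, `cas32_ok()`, `slot_claim()` and the status values are hypothetical names. The `uint32_t` return of `cas32_val()` follows the `cmpxchg()` signature in common/memnic.h, on the assumption that it returns the value previously stored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: store 'desired' only if *dst equals 'expected',
 * and return the value that was found in *dst.
 */
static inline uint32_t cas32_val(volatile uint32_t *dst,
				 uint32_t expected, uint32_t desired)
{
	return __sync_val_compare_and_swap(dst, expected, desired);
}

/* Hypothetical helper: same operation, but returns 1 on success, 0 otherwise. */
static inline int cas32_ok(volatile uint32_t *dst,
			   uint32_t expected, uint32_t desired)
{
	return __sync_bool_compare_and_swap(dst, expected, desired);
}

/* Illustrative status values, not the memnic definitions. */
enum { ST_FREE = 0, ST_USED = 1 };

/* Try to claim a free slot; only one caller can win the race. */
static int slot_claim(volatile uint32_t *status)
{
	assert(status);
	return cas32_ok(status, ST_FREE, ST_USED);
}
```

As far as I know, DPDK's `rte_atomic32_cmpset()` reports success or failure in the way `cas32_ok()` does, so callers that compare the returned old value would need a small adjustment.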

thanks,
Hiroshi

> 
> Signed-off-by: Olivier Matz 
> ---
>  common/memnic.h  | 2 --
>  pmd/pmd_memnic.c | 6 +++---
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/common/memnic.h b/common/memnic.h
> index 6ff38a0..fdc9fa3 100644
> --- a/common/memnic.h
> +++ b/common/memnic.h
> @@ -123,8 +123,6 @@ struct memnic_area {
>  /* for userspace */
>  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> 
> -#define barrier() do { asm volatile("": : :"memory"); } while (0)
> -
>  static inline uint32_t cmpxchg(uint32_t *dst, uint32_t old, uint32_t new)
>  {
>   volatile uint32_t *ptr = (volatile uint32_t *)dst;
> diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
> index bc01746..1586222 100644
> --- a/pmd/pmd_memnic.c
> +++ b/pmd/pmd_memnic.c
> @@ -100,7 +100,7 @@ static int memnic_dev_start(struct rte_eth_dev *dev)
> 
>   /* invalidate */
>   adapter->nic->hdr.valid = 0;
> - barrier();
> + rte_mb();
>   /* reset */
>   adapter->nic->hdr.reset = 1;
>   /* no need to wait here */
> @@ -242,7 +242,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
>   mb->pkt.data_len = p->len;
>   rx_pkts[nr] = mb;
> 
> - barrier();
> + rte_mb();
>   p->status = MEMNIC_PKT_ST_FREE;
> 
>   if (++idx >= MEMNIC_NR_PACKET)
> @@ -290,7 +290,7 @@ retry:
> 
>   rte_memcpy(p->data, rte_pktmbuf_mtod(tx_pkts[nr], void *), len);
> 
> - barrier();
> + rte_mb();
>   p->status = MEMNIC_PKT_ST_FILLED;
> 
>   rte_pktmbuf_free(tx_pkts[nr]);
> --
> 1.8.4.rc3



[dpdk-dev] [memnic PATCH] pmd: fix attributes

2014-01-30 Thread Hiroshi Shimamoto
> Subject: [dpdk-dev] [memnic PATCH] pmd: fix attributes
> 
> Add missing "const" and remove useless "rte_unused" attributes.

Good catch. Looks fine to me.
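
As a side note on why the added "const" is safe: it only promises that the device structure itself is not modified through that pointer, so an accessor can still hand out a writable pointer to the private data. A minimal sketch with hypothetical, simplified types (`struct eth_dev`, `struct dev_data` and `struct adapter` are stand-ins, not the DPDK or memnic definitions):

```c
#include <assert.h>

/* Simplified stand-ins for the device, its data and the private area. */
struct dev_data { void *dev_private; };
struct eth_dev  { struct dev_data *data; };
struct adapter  { int up; };

/*
 * 'dev' is const: the accessor cannot change dev->data itself,
 * but the object that dev_private points to stays writable.
 */
static inline struct adapter *get_adapter(const struct eth_dev *dev)
{
	assert(dev && dev->data);
	return (struct adapter *)dev->data->dev_private;
}
```

The same reasoning explains the other half of the patch: a parameter that the function really dereferences, as `dev` is in memnic_dev_infos_get(), should not carry an "unused" attribute.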

thanks,
Hiroshi

> 
> Signed-off-by: Olivier Matz 
> ---
>  pmd/pmd_memnic.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
> index d16eb0d..bc01746 100644
> --- a/pmd/pmd_memnic.c
> +++ b/pmd/pmd_memnic.c
> @@ -57,7 +57,7 @@ struct memnic_adapter {
>   struct ether_addr mac_addr;
>  };
> 
> -static inline struct memnic_adapter *get_adapter(struct rte_eth_dev *dev)
> +static inline struct memnic_adapter *get_adapter(const struct rte_eth_dev *dev)
>  {
>   return (struct memnic_adapter *)(dev->data->dev_private);
>  }
> @@ -67,7 +67,7 @@ struct memnic_queue {
>   uint8_t port_id;
>  };
> 
> -static struct memnic_queue *memnic_queue_alloc(struct rte_eth_dev *dev,
> +static struct memnic_queue *memnic_queue_alloc(const struct rte_eth_dev *dev,
>  int tx, uint16_t id)
>  {
>   struct memnic_adapter *adapter = get_adapter(dev);
> @@ -119,7 +119,7 @@ static void memnic_dev_stop(struct rte_eth_dev *dev)
>   return;
>  }
> 
> -static void memnic_dev_infos_get(__rte_unused struct rte_eth_dev *dev,
> +static void memnic_dev_infos_get(struct rte_eth_dev *dev,
>struct rte_eth_dev_info *dev_info)
>  {
>   dev_info->driver_name = dev->driver->pci_drv.name;
> --
> 1.8.4.rc3



[dpdk-dev] [memnic PATCH 3/3] common: remove double underscores

2014-01-30 Thread Hiroshi Shimamoto
Looks fine to me.

thanks,
Hiroshi

> Subject: [memnic PATCH 3/3] common: remove double underscores
> 
> The usage of double underscores is reserved.
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  common/memnic.h |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/common/memnic.h b/common/memnic.h
> index 58dd019..e5b3c6f 100644
> --- a/common/memnic.h
> +++ b/common/memnic.h
> @@ -28,8 +28,8 @@
>   *
>   */
> 
> -#ifndef __MEMNIC_H__
> -#define __MEMNIC_H__
> +#ifndef MEMNIC_H
> +#define MEMNIC_H
> 
>  #define MEMNIC_MAGIC 0x43494e76
>  #define MEMNIC_VERSION   0x0001
> @@ -135,4 +135,4 @@ static inline uint32_t cmpxchg(uint32_t *dst, uint32_t old, uint32_t new)
>  }
>  #endif /* __KERNEL__ */
> 
> -#endif /* __MEMNIC_H__ */
> +#endif /* MEMNIC_H */
> --
> 1.7.10.4



[dpdk-dev] [memnic PATCH 2/3] pmd: remove useless includes

2014-01-30 Thread Hiroshi Shimamoto
> Subject: [memnic PATCH 2/3] pmd: remove useless includes
> 
> Signed-off-by: Thomas Monjalon 
> ---
>  common/memnic.h  |4 ----
>  pmd/pmd_memnic.c |4 ----
>  2 files changed, 8 deletions(-)
> 
> diff --git a/common/memnic.h b/common/memnic.h
> index 6ff38a0..58dd019 100644
> --- a/common/memnic.h
> +++ b/common/memnic.h
> @@ -31,10 +31,6 @@
>  #ifndef __MEMNIC_H__
>  #define __MEMNIC_H__
> 
> -#ifndef __KERNEL__
> -#include 
> -#endif /* __KERNEL__ */
> -

I'm not sure, but if you're not seeing any errors, it's okay.
I originally added it for the uintXX_t types.

The rest looks fine to me.

thanks,
Hiroshi

>  #define MEMNIC_MAGIC 0x43494e76
>  #define MEMNIC_VERSION   0x0001
>  #define MEMNIC_VERSION_1 0x0001
> diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
> index d16eb0d..619941a 100644
> --- a/pmd/pmd_memnic.c
> +++ b/pmd/pmd_memnic.c
> @@ -30,18 +30,14 @@
>   */
> 
>  #include 
> -
>  #include 
>  #include 
>  #include 
> -#include 
> 
>  #include "memnic.h"
> 
>  #include 
> -#include 
>  #include 
> -#include 
>  #include 
>  #include 
> 
> --
> 1.7.10.4



[dpdk-dev] [memnic PATCH 1/3] pmd: remove symlink

2014-01-30 Thread Hiroshi Shimamoto
Hi,

> Subject: [memnic PATCH 1/3] pmd: remove symlink
> 
> No need to have a symbolic link to a common file
> when it can be simply included.

Looks fine to me.

When I prepared the files, the file path layout was a bit complex,
and the symlink made it easy to keep things consistent.
Since you have separated the code from DPDK vSwitch, there is
no longer any reason to do that.

thanks,
Hiroshi

> 
> Signed-off-by: Thomas Monjalon 
> ---
>  pmd/Makefile |2 +-
>  pmd/memnic.h |1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
>  delete mode 120000 pmd/memnic.h
> 
> diff --git a/pmd/Makefile b/pmd/Makefile
> index a96e125..7f96af1 100644
> --- a/pmd/Makefile
> +++ b/pmd/Makefile
> @@ -59,7 +59,7 @@ ifeq '$(RTE_INCLUDE)' ''
>  endif
>   $(CC) $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) \
>   -I$(RTE_INCLUDE) -include $(RTE_CONFIG) \
> - -o $@ $<
> + -I$S/../common -o $@ $<
> 
>  install : $(DESTDIR)$(libdir)/$(SOLIB)
>   install -D -m 644 $S/README.rst $(DESTDIR)$(docdir)/README.rst
> diff --git a/pmd/memnic.h b/pmd/memnic.h
> deleted file mode 120000
> index 5303ad4..0000000
> --- a/pmd/memnic.h
> +++ /dev/null
> @@ -1 +0,0 @@
> -../common/memnic.h
> \ No newline at end of file
> --
> 1.7.10.4