[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Wodkowski, PawelX


> -Original Message-
> From: Buriez, Patrice
> Sent: Thursday, May 28, 2015 1:07 PM
> To: Wodkowski, PawelX; Simon K?gstr?m; Zhang, Helin; Alexander Guy; Julien
> Cretin
> Cc: dev at dpdk.org
> Subject: RE: [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version
> 
> Hi all,
> Please forgive top reply and bottom disclaimer.
> Not sure anyway that this email will reach the mailing list, since I did not
> subscribe to it.
> 
> I am worried about the removal of: cut -d'~' -f1
> It was introduced by Pawel in commit

You are absolutely right. That is why I asked to check with documentation
and to verify  with real build on all supported Ubuntu versions. :)


-- 
Pawel







[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Wodkowski, PawelX

> -Original Message-
> From: Simon K?gstr?m [mailto:simon.kagstrom at netinsight.net]
> Sent: Thursday, May 28, 2015 12:37 PM
> To: Wodkowski, PawelX; Zhang, Helin; Alexander Guy; Julien Cretin; Buriez,
> Patrice
> Cc: dev at dpdk.org
> Subject: Re: [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version
> 
> On 2015-05-28 12:05, Wodkowski, PawelX wrote:
> >>>
> >>> -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \
> >>> -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma))
> >>> +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE
> >>> $(RTE_KERNELDIR)/include/generated/utsrelease.h \
> >>> +  | cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
> >
> > It is fine for me if it do the job and does not break build on other OS 
> > (also other
> > Ubuntu versions especially 12.04 if we still support it).
> > Please only check if UTS_RELEASE is available on all Ubuntu versions DPDK
> support.
> 
> From some digging, it appears it entered the kernel tree in 2006 and
> moved to include/generated/ in 2009 so I guess that should be fine for
> DPDK builds?
> 
> // Simon

I also think that it is OK but I also think should check by building you (o ask
someone to do it for you)  on those systems not by theory :)

-- 
Pawel




[dpdk-dev] [PATCH] kni: Use utsrelease.h to determine Ubuntu kernel version

2015-05-28 Thread Wodkowski, PawelX
> >
> > -ifeq ($(shell test -f /proc/version_signature && lsb_release -si
> > 2>/dev/null),Ubuntu)
> > +ifeq ($(shell lsb_release -si 2>/dev/null),Ubuntu)
> >  MODULE_CFLAGS += -DUBUNTU_RELEASE_CODE=$(shell lsb_release -sr | tr -
> d .)
> > -UBUNTU_KERNEL_CODE := $(shell cut -d' ' -f2 /proc/version_signature | \
> > -cut -d'~' -f1 | cut -d- -f1,2 | tr .- $(comma))
> > +UBUNTU_KERNEL_CODE := $(shell echo `grep UTS_RELEASE
> > $(RTE_KERNELDIR)/include/generated/utsrelease.h \
> > +| cut -d '"' -f2 | cut -d- -f1,2 | tr .- $(comma)`,1)
> >  MODULE_CFLAGS +=
> > -
> D"UBUNTU_KERNEL_CODE=UBUNTU_KERNEL_VERSION($(UBUNTU_KERNEL_
> C
> > ODE))"
> >  endif
> >
> > --
> > 1.9.1
Hi,

It is fine for me if it do the job and does not break build on other OS (also 
other 
Ubuntu versions especially 12.04 if we still support it).
Please only check if UTS_RELEASE is available on all Ubuntu versions DPDK 
support.

Pawel




[dpdk-dev] [PATCH 0/2] cmdline: add polling mode for command line

2015-05-12 Thread Wodkowski, PawelX
Self NACK - misconfigured git - wrong email address :(

I will resend this.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pawel Wodkowski
> Sent: Tuesday, May 12, 2015 1:10 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 0/2] cmdline: add polling mode for command line
> 
> This patchset adds the ability to process console input in the same thread
> as packet processing by using poll() function and fixes some minor issues.
> 
> Pawel Wodkowski (2):
>   cmdline: fix missing include files
>   cmdline: add polling mode for command line
> 
>  lib/librte_cmdline/cmdline.c   | 35 
> ++
>  lib/librte_cmdline/cmdline.h   |  4 
>  lib/librte_cmdline/cmdline_rdline.h|  1 +
>  lib/librte_cmdline/cmdline_vt100.h |  2 ++
>  lib/librte_cmdline/rte_cmdline_version.map |  1 +
>  5 files changed, 43 insertions(+)
> 
> --
> 1.9.1



[dpdk-dev] [PATCH 1/5] bond: use existing enslaved device queues

2015-04-14 Thread Wodkowski, PawelX
> 
> Pawel,
> 
> I generally test things I've just built using virtio devices and calling
> rte_eth_tx_queue_setup() more than once for a given queue id fails.
> However, it seems that most PMDs allow re-allocating device queues while
> virtio does not (xenvirt also seems to lack this functionality), so I
> don't think my approach here is right.  I'll remove this patch when I
> send the next version of this series.
> 
> Thanks,
> 
> Eric

Maybe you should rise this as separate issue maintainers of these drivers?

-- 
Pawel




[dpdk-dev] [PATCH v5 0/3] new headroom stats library and example application

2015-02-24 Thread Wodkowski, PawelX
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, February 24, 2015 11:00 AM
> To: Jastrzebski, MichalX K
> Cc: Wodkowski, PawelX; dev at dpdk.org; Neil Horman
> Subject: Re: [dpdk-dev] [PATCH v5 0/3] new headroom stats library and example
> application
> 
> 2015-02-24 09:49, Jastrzebski, MichalX K:
> > > > > I also feel that the name of this library is a bit too vague. Some 
> > > > > people
> > > > > were asking first what means "headroom". It's actually for CPU
> headroom
> > > > > monitoring.
> > > > > What about "cpuheadroom", "cpuheadroomstat", "jobstat"?
> > > >
> > > > I think we can change the name to "cpuheadroom" as it describes more
> > > clear this library.
> > >
> > > If you're focusing on CPU usage with possible actions, yes.
> > > If you're focusing on decision helper, jobstat would be better IMHO.
> >
> > We will change the name to "jobstats".
> 
> OK, I don't want to impose any name, it should be your choice.
> I guess you did an internal survey?

Yes, there was no better name that we could fit in.

> 
> Would you able to send a v6 today?

In two or three hours it should be ready.

Thanks
Pawel


[dpdk-dev] [PATCH v5 0/3] new headroom stats library and example application

2015-02-24 Thread Wodkowski, PawelX
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, February 24, 2015 11:00 AM
> To: Jastrzebski, MichalX K
> Cc: Wodkowski, PawelX; dev at dpdk.org; Neil Horman
> Subject: Re: [dpdk-dev] [PATCH v5 0/3] new headroom stats library and example
> application
> 
> 2015-02-24 09:49, Jastrzebski, MichalX K:
> > > > > I also feel that the name of this library is a bit too vague. Some 
> > > > > people
> > > > > were asking first what means "headroom". It's actually for CPU
> headroom
> > > > > monitoring.
> > > > > What about "cpuheadroom", "cpuheadroomstat", "jobstat"?
> > > >
> > > > I think we can change the name to "cpuheadroom" as it describes more
> > > clear this library.
> > >
> > > If you're focusing on CPU usage with possible actions, yes.
> > > If you're focusing on decision helper, jobstat would be better IMHO.
> >
> > We will change the name to "jobstats".
> 
> OK, I don't want to impose any name, it should be your choice.
> I guess you did an internal survey?

Yes, there was no better name that we could fit in.

> 
> Would you able to send a v6 today?

In two or three hours it should be ready.

Thanks


[dpdk-dev] [PATCH] ring: fix minor memory leak of kvlist in dev init

2015-02-23 Thread Wodkowski, PawelX


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John McNamara
> Sent: Monday, February 23, 2015 2:17 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] ring: fix minor memory leak of kvlist in dev init
> 
> Fix for Klockwork identified issue.
> 
> Signed-off-by: John McNamara 
> ---
>  lib/librte_pmd_ring/rte_eth_ring.c |3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/lib/librte_pmd_ring/rte_eth_ring.c
> b/lib/librte_pmd_ring/rte_eth_ring.c
> index a23e933..88a1382 100644
> --- a/lib/librte_pmd_ring/rte_eth_ring.c
> +++ b/lib/librte_pmd_ring/rte_eth_ring.c
> @@ -527,7 +527,7 @@ out:
>  static int
>  rte_pmd_ring_devinit(const char *name, const char *params)
>  {
> - struct rte_kvargs *kvlist;
> + struct rte_kvargs *kvlist = NULL;
>   int ret = 0;
>   struct node_action_list *info = NULL;
> 
> @@ -569,6 +569,7 @@ rte_pmd_ring_devinit(const char *name, const char
> *params)
>  out_free:
>   rte_free(info);
>  out:
> + rte_kvargs_free(kvlist);
>   return ret;
>  }
> 
> --
> 1.7.4.1

This is wrong/incomplete as rte_kvargs_free() is unable to handle NULL argument.
I have patch under review that fix this issue along with rte_kvargs_free().

Pawel




[dpdk-dev] [RFC PATCH] lib/librte_ethdev: Expand port identifier

2015-02-18 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, February 18, 2015 1:32 PM
> To: Tetsuya Mukawa
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH] lib/librte_ethdev: Expand port identifier
> 
> On Wed, Feb 18, 2015 at 12:30:07PM +, Bruce Richardson wrote:
> > On Wed, Feb 18, 2015 at 08:02:49PM +0900, Tetsuya Mukawa wrote:
> > > Currently uint8_t is used for port identifier. This patch changes it,
> > > and use uint16_t as port identifier.
> > > This patch only changes ethdev library. ABI of the library will be
> > > kept even after applying it.
> > >
> > > Also, this patch involves following fixes.
> > > - Use "port_id" as variable name instead of "port".
> > >
> > >
> > > Signed-off-by: Tetsuya Mukawa 
> > > ---
> > >  lib/librte_ether/rte_ethdev.c  |  212 +-
> > >  lib/librte_ether/rte_ethdev_internal.h | 3672
> 
> > >  2 files changed, 3778 insertions(+), 106 deletions(-)
> > >  create mode 100644 lib/librte_ether/rte_ethdev_internal.h
> > >
> > I'm not sure I follow why we need a new header file for this.
> > Also, thinking about this change, a more fundamental problem is going to be
> > the mbuf structure, which stores a port id inside it in an 8-bit value.
> > Upgrading that to a 16-bit value requires some thought, and verification to
> > ensure any adjustment of fields does not lead to serious performance issues.
> >
> > Therefore, I suggest we leave the port id values as 8-bits until such time
> > as we need greater than 255 port values in a real-world use case.
> > Out of interest - anyone have a DPDK app where they use >16 port id values? 
> > If
> > so, how high does the port id value get?

Not real application but simple example of setup:
4 Niantic x 2 ports x 64 VF = 512 port id

I don't know what would be the real need/advantage of such setup (bonding?) but 
you see, in theory it is already insufficient.

> >
> > Regards,
> > /Bruce
> >
> 
> Resending with correct email addr for Neil.
> 
> /Bruce


[dpdk-dev] [PATCH v3 0/2] new headroom stats library and example application

2015-02-17 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pawel Wodkowski
> Sent: Tuesday, February 17, 2015 5:20 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v3 0/2] new headroom stats library and example
> application
> 
> Hi community,
> I would like to introduce library for measuring load of some arbitrary jobs. 
> It
> can be used to profile every kind of job sets on any arbitrary execution unit 
> or
> tasking library.
> 
> In provided l2fwd-headroom example I demonstrate how to use this library to
> select optimal rx burst poll time. Jobs are selected by using existing 
> rte_timer
> library calls. This example does no limit possible schemes on which this 
> library
> can be used.
> 
> PATCH v3 changes:
>  - spelling fixes.
> 
> PATCH v2 changes:
>  - Remove jobs management/callback from library to not duplicate tasking
> library
>behaviour.
>  - Cleenup/remove useless statistics.
>  - Rework example application to use rte_timer library for jobs selection.
>  - Introduce new app parameter '-l' for automatic thousands separating in 
> stats.
>  - More readable statistics format.
> 
> 
> Pawel Wodkowski (2):
>   pmd: enable DCB in SRIOV
>   tespmd: fix DCB in SRIOV mode support
> 
>  app/test-pmd/cmdline.c  |  4 ++--
>  app/test-pmd/testpmd.c  | 39 
> +++--
>  app/test-pmd/testpmd.h  | 10 --
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |  2 +-
>  lib/librte_pmd_ixgbe/ixgbe_pf.c | 19 +-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |  7 +++
>  6 files changed, 45 insertions(+), 36 deletions(-)
> 
> --
> 1.9.1

Not this branch :(
Self-NACK


[dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header files

2015-01-30 Thread Wodkowski, PawelX
> -Original Message-
> From: Doherty, Declan
> Sent: Friday, January 30, 2015 12:42 PM
> To: Wodkowski, PawelX; Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4 header
> files
> 
> On 30/01/15 10:56, Wodkowski, PawelX wrote:
> >> -Original Message-
> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >> Sent: Friday, January 30, 2015 11:21 AM
> >> To: Wodkowski, PawelX
> >> Cc: dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] Added missing extern 'C' decls in mode4
> header
> >> files
> >>
> >> Hi Pawel,
> >>
> >>> Signed-off-by: Pawel Wodkowski 
> >>> ---
> >>>   lib/librte_pmd_bond/rte_eth_bond_8023ad.h |8 
> >>>   lib/librte_pmd_bond/rte_eth_bond_8023ad_private.h |8 
> >>
> >> Why adding extern C in a private header file?
> >>
> >> --
> >> Thomas
> >
> > To be consistent with rte_eth_bond_private.h where it is included.
> >
> >
> 
> We only need the decls on the public headers exported by the
> librte_pmd_bond makefile, so there is no need to modify
> rte_eth_bond_private.h as it should never be linked to directly by
> external code.

I modified rte_eth_bond_8023ad_private.h not rte_eth_bond_private.h.
In rte_eth_bond_private.h those declarations are present already.
If so those declarations should be removed from rte_eth_bond_private.h.

I can do this in v2 if you accept this.


[dpdk-dev] [PATCH 0/2] new headroom stats library and example application

2015-01-30 Thread Wodkowski, PawelX


> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Thursday, January 29, 2015 8:13 PM
> To: Wodkowski, PawelX
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/2] new headroom stats library and example
> application
> 
> On Thu, Jan 29, 2015 at 05:10:36PM +, Wodkowski, PawelX wrote:
> > > -Original Message-
> > > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > > Sent: Thursday, January 29, 2015 2:25 PM
> > > To: Wodkowski, PawelX
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 0/2] new headroom stats library and
> example
> > > application
> > >
> > > On Thu, Jan 29, 2015 at 12:50:04PM +0100, Pawel Wodkowski wrote:
> > > > Hi community,
> > > > I would like to introduce library for measuring load of some arbitrary 
> > > > jobs.
> It
> > > > can be used to profile every kind of job sets on any arbitrary 
> > > > execution unit.
> > > > In provided l2fwd-headroom example I demonstrate how to use this library
> to
> > > > profile packet forwarding (job set is froward, flush and stats) on 
> > > > LCores
> > > > (execution unit). This example does no limit possible schemes on which 
> > > > this
> > > > library can be used.
> > > >
> > > > Pawel Wodkowski (2):
> > > >   librte_headroom: New library for checking core/system/app load
> > > >   examples: introduce new l2fwd-headroom example
> > > >
> > > >  config/common_bsdapp   |6 +
> > > >  config/common_linuxapp |6 +
> > > >  examples/Makefile  |1 +
> > > >  examples/l2fwd-headroom/Makefile   |   51 +++
> > > >  examples/l2fwd-headroom/main.c |  875
> > > 
> > > >  lib/Makefile   |1 +
> > > >  lib/librte_headroom/Makefile   |   50 +++
> > > >  lib/librte_headroom/rte_headroom.c |  368 +++
> > > >  lib/librte_headroom/rte_headroom.h |  481 
> > > >  mk/rte.app.mk  |4 +
> > > >  10 files changed, 1843 insertions(+)
> > > >  create mode 100644 examples/l2fwd-headroom/Makefile
> > > >  create mode 100644 examples/l2fwd-headroom/main.c
> > > >  create mode 100644 lib/librte_headroom/Makefile
> > > >  create mode 100644 lib/librte_headroom/rte_headroom.c
> > > >  create mode 100644 lib/librte_headroom/rte_headroom.h
> > > >
> > > > --
> > > > 1.7.9.5
> > > >
> > > >
> > >
> > > Whats the advantage of this library over the other tools to preform the 
> > > same
> > > function.
> >
> > Hi Neil,
> >
> > Good point, what is advantage over perf. Answer is: this library does not
> > supposed to be a perf competition and is not for profiling app in the way 
> > perf
> does.
> > It is an small and fast extension. It's main task is to manage job list to 
> > invoke
> > them exactly when needed and provide some basic stats about application idle
> > time (whatever programmer will consider the idle) and busy time.
> >
> > For example:
> > application might decide to add remove some jobs to/from LCore(s)
> dynamically
> > basing on current idle time (ex: move job from one core to another).
> >
> > Also application might have some information's about traffic type it handles
> > and provide own algorithm to calculate invocation time (it can also
> dynamically
> > switch between those algorithms only replacing handlers).
> >
> > >  Perf can provide all the information in this library, and do so
> > > without having to directly modify the source for the execution unit under
> test
> >
> > Yes, perf can provide those information's  but it can't handle the case when
> > you are poling for packets too fast or too slow and waist time getting only
> couple
> > of them. Library will adjust time when it execute job basing on value this 
> > job
> > returned previously. Code modifications are not so deep, as you can see
> comparing
> > l2wf vs l2fwd-headroom app.
> >
> > For example in  application I introduced, when forward job return less than
> > MAX_PKT_BURST execution period will be increased. If it return more it will
> decrease
> > execution period. Stats provided for that can be used to determine if
> appl

[dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss

2015-01-30 Thread Wodkowski, PawelX
> -Original Message-
> From: Ouyang, Changchun
> Sent: Friday, January 30, 2015 2:20 AM
> To: Wodkowski, PawelX; Thomas Monjalon; Richardson, Bruce
> Cc: dev at dpdk.org; Ouyang, Changchun
> Subject: RE: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss
> 
> Hi PawelX
> 
> > -Original Message-
> > From: Wodkowski, PawelX
> > Sent: Friday, January 30, 2015 12:14 AM
> > To: Ouyang, Changchun; Thomas Monjalon; Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang,
> > Changchun
> > > Sent: Wednesday, January 28, 2015 2:35 AM
> > > To: Thomas Monjalon
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf
> > > rss
> > >
> > > Hi Thomas,
> > >
> > > > -Original Message-
> > > > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > > Sent: Tuesday, January 27, 2015 8:13 PM
> > > > To: Ouyang, Changchun
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in
> > > > vf rss
> > > >
> > > > > To follow up the comments from Wodkowski, PawelX, remove this
> > > > > unnecessary check, as check_mq_mode has already check the queue
> > > > number
> > > > > in device configure stage, if the queue number of vf is not
> > > > > correct, it will return error code and exit, so it doesn't need
> > > > > check again here in device start stage(note: pf_host_configure is
> > > > > called in device start
> > > > stage).
> > > > >
> > > > > This fixes commit 42d2f78abcb77ecb769be4149df550308169ef0f
> > > > >
> > > > > Signed-off-by: Changchun Ouyang 
> > > >
> > > > Suggested-by: Pawel Wodkowski 
> > > > Fixes: 42d2f78abcb77 ("configure VF RSS")
> > > >
> > > > Applied
> > > >
> > >
> > > Thanks very much for the applying!
> > >
> > > > Changchun, as you are working on ixgbe, maybe you would like to
> > > > review some ixgbe patches from others?
> > > >
> > >
> > > No problem, I will try to do it when my bandwidth allows me to do it,
> > > :-) Thanks Changchun
> >
> > Actually I was suggesting exactly opposite direction. Main issue is that the
> > sriov field in rte_eth_dev_data is only used by igb and ixgbe drivers. In
> > addition
> > rte_eth_dev_check_mq_mode() is specialized for ixgbe driver.
> >
> > I am thinking about moving sriov from rte_eth_dev_data to driver's private
> > structure or at least move rte_eth_dev_check_mq_mode() to struct
> > eth_dev_ops as optional driver configuration step.
> >
> > What do you think about both steps?
> 
> Good opinion!
> I prefer to move rte_eth_dev_check_mq_mode to eth_dev_ops as optional
> driver configure,
> The reason is that in future other eth type may also need such kind of check 
> or
> even refine some queue number values by their own way,
> I can help review your patch after you send out.
> Thanks for your enhancing that.
> 

What about about moving sriov from rte_eth_dev_data to driver's private
structure?

Pawel


[dpdk-dev] [PATCH 0/2] new headroom stats library and example application

2015-01-29 Thread Wodkowski, PawelX
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Thursday, January 29, 2015 2:25 PM
> To: Wodkowski, PawelX
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/2] new headroom stats library and example
> application
> 
> On Thu, Jan 29, 2015 at 12:50:04PM +0100, Pawel Wodkowski wrote:
> > Hi community,
> > I would like to introduce library for measuring load of some arbitrary 
> > jobs. It
> > can be used to profile every kind of job sets on any arbitrary execution 
> > unit.
> > In provided l2fwd-headroom example I demonstrate how to use this library to
> > profile packet forwarding (job set is froward, flush and stats) on LCores
> > (execution unit). This example does no limit possible schemes on which this
> > library can be used.
> >
> > Pawel Wodkowski (2):
> >   librte_headroom: New library for checking core/system/app load
> >   examples: introduce new l2fwd-headroom example
> >
> >  config/common_bsdapp   |6 +
> >  config/common_linuxapp |6 +
> >  examples/Makefile  |1 +
> >  examples/l2fwd-headroom/Makefile   |   51 +++
> >  examples/l2fwd-headroom/main.c |  875
> 
> >  lib/Makefile   |1 +
> >  lib/librte_headroom/Makefile   |   50 +++
> >  lib/librte_headroom/rte_headroom.c |  368 +++
> >  lib/librte_headroom/rte_headroom.h |  481 
> >  mk/rte.app.mk  |4 +
> >  10 files changed, 1843 insertions(+)
> >  create mode 100644 examples/l2fwd-headroom/Makefile
> >  create mode 100644 examples/l2fwd-headroom/main.c
> >  create mode 100644 lib/librte_headroom/Makefile
> >  create mode 100644 lib/librte_headroom/rte_headroom.c
> >  create mode 100644 lib/librte_headroom/rte_headroom.h
> >
> > --
> > 1.7.9.5
> >
> >
> 
> Whats the advantage of this library over the other tools to preform the same
> function.

Hi Neil,

Good point, what is advantage over perf. Answer is: this library does not 
supposed to be a perf competition and is not for profiling app in the way perf 
does.
It is an small and fast extension. It's main task is to manage job list to 
invoke
them exactly when needed and provide some basic stats about application idle 
time (whatever programmer will consider the idle) and busy time.

For example:
application might decide to add remove some jobs to/from LCore(s) dynamically
basing on current idle time (ex: move job from one core to another).

Also application might have some information's about traffic type it handles
and provide own algorithm to calculate invocation time (it can also dynamically
switch between those algorithms only replacing handlers).

>  Perf can provide all the information in this library, and do so
> without having to directly modify the source for the execution unit under test

Yes, perf can provide those information's  but it can't handle the case when 
you are poling for packets too fast or too slow and waist time getting only 
couple 
of them. Library will adjust time when it execute job basing on value this job
returned previously. Code modifications are not so deep, as you can see 
comparing 
l2wf vs l2fwd-headroom app.

For example in  application I introduced, when forward job return less than
MAX_PKT_BURST execution period will be increased. If it return more it will 
decrease
execution period. Stats provided for that can be used to determine if 
application is
behaving correctly and if there is a time for handling another port (what did 
for tests).

Pawel


[dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss

2015-01-29 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang, Changchun
> Sent: Wednesday, January 28, 2015 2:35 AM
> To: Thomas Monjalon
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss
> 
> Hi Thomas,
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Tuesday, January 27, 2015 8:13 PM
> > To: Ouyang, Changchun
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss
> >
> > > To follow up the comments from Wodkowski, PawelX, remove this
> > > unnecessary check, as check_mq_mode has already check the queue
> > number
> > > in device configure stage, if the queue number of vf is not correct,
> > > it will return error code and exit, so it doesn't need check again
> > > here in device start stage(note: pf_host_configure is called in device 
> > > start
> > stage).
> > >
> > > This fixes commit 42d2f78abcb77ecb769be4149df550308169ef0f
> > >
> > > Signed-off-by: Changchun Ouyang 
> >
> > Suggested-by: Pawel Wodkowski 
> > Fixes: 42d2f78abcb77 ("configure VF RSS")
> >
> > Applied
> >
> 
> Thanks very much for the applying!
> 
> > Changchun, as you are working on ixgbe, maybe you would like to review
> > some ixgbe patches from others?
> >
> 
> No problem, I will try to do it when my bandwidth allows me to do it, :-)
> Thanks
> Changchun

Actually I was suggesting exactly opposite direction. Main issue is that the
sriov field in rte_eth_dev_data is only used by igb and ixgbe drivers. In 
addition
rte_eth_dev_check_mq_mode() is specialized for ixgbe driver.

I am thinking about moving sriov from rte_eth_dev_data to driver's private
structure or at least move rte_eth_dev_check_mq_mode() to struct eth_dev_ops
as optional driver configuration step.

What do you think about both steps?

Pawel



[dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem

2015-01-28 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexandre Frigon
> Sent: Tuesday, January 27, 2015 8:31 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem
> 
> Hi all,
> 
>  I'm using dpdk 1.8 and pktgen-dpdk 2.8 to generate traffic on a back-to-back
> setup both equipped with 82599EB 10-Gigabit NIC.
> The problem is when I start it, pktgen indicates 1Mbits/s Tx with 64B 
> packet
> size,  but I'm receiving  about 15% of it on the other end.
> This percentage seems to be proportional with the packet size.
> 
> e.g.
> Using nload to read Rx traffic
> Pktgen:   Tx: 1Mbits/s==> Other end:  Rx 1660 Mbits/s
>   Rate: 100%
>   Pkt size: 64B
> 
> 
> e.g 2
> Pktgen:   Tx: 1Mbits/s==> Other end:  Rx 9385 Mbits/s
>   Rate: 100%
>   Pkt size: 1518B
> 
> 
> Pktgen is started with this command on a Xeon(R) CPU E31270 @ 3.40GHz
> ./app/pktgen -c 1f -n 3 --proc-type auto --socket-mem 1024 --file-prefix pg 
> -- -p
> 0x3 -P  -N -m "[1:3].0, [2:4].1"

>From past experience I don't assign more than 1 core per port. It had some
race conditions issues and one core I capable to RX or TX full 10G.
Also check if you assign proper cores/memory for your NICs (the same
NUMA node).

> 
> Is there something I'm not configuring correctly or something I have miss?
> 
> Also, the % rate is  acting strangely since anything above 50% doesn't change
> the Tx rate and anything below is modifying it
> e.g   Tx:  1Mbits/s   5000Mbits/s
>   %Rate:  >=50%   25%
> 
> 

Actually I am getting exactly opposite results :)
If I set rate to 50% I get
MBits/s Rx/Tx   : 0/9942 9942/0  9942/9942

For 10%:
MBits/s Rx/Tx   : 0/1997 1997/0  1997/1997

Which is about 2x  set :D

Additionaly I am getting message when "start 0" -> "stop 0" -> "start 0" is 
issued
PMD: ixgbe_dev_rx_init(): forcing scatter mode

So there is definitely something wrong there but don't know where.
Another issue I encountered is build system that fail when building out-of-tree.

Till this is fixed you can try version 2.7.1 that is working for me.

Pawel


[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms

2015-01-26 Thread Wodkowski, PawelX
Hi,

I must say: greate work.

I have some small comments:

> +/**
> + * Macro for copying unaligned block from one location to another,
> + * 47 bytes leftover maximum,
> + * locations should not overlap.
> + * Requirements:
> + * - Store is aligned
> + * - Load offset is , which must be immediate value within [1, 15]
> + * - For , make sure  bit backwards & <16 - offset> bit forwards
> are available for loading
> + * - , ,  must be variables
> + * - __m128i  ~  must be pre-defined
> + */
> +#define MOVEUNALIGNED_LEFT47(dst, src, len, offset)
> \
> +{
>\
...
> +}

Why not do { ... } while(0) or ({ ... }) ? This could have unpredictable side
effects.

Second:
Why you completely substitute
#define rte_memcpy(dst, src, n)  \
({ (__builtin_constant_p(n)) ?   \
memcpy((dst), (src), (n)) :  \
rte_memcpy_func((dst), (src), (n)); })

with inline rte_memcpy()? This construction  can help compiler to deduce
which version to use (static?) inline implementation or call external
function.

Did you try 'extern inline' type? It could help reducing compilation time.



[dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for cpu assignment

2015-01-22 Thread Wodkowski, PawelX
Hi,
I want to mention that similar but for me much more readable syntax have 
Pktgen-DPDK for defining core - port mapping. Maybe we can adopt this syntax
for new '--lcores' parameter.

See '-m' parameter syntax on Pktgen readme.
https://github.com/pktgen/Pktgen-DPDK/blob/master/dpdk/examples/pktgen/README.md

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Thursday, January 22, 2015 3:34 PM
> To: Richardson, Bruce; Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for 
> cpu
> assignment
> 
> Hi Bruce,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Thursday, January 22, 2015 12:19 PM
> > To: Liang, Cunming
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1 02/15] eal: new eal option '--lcores' for 
> > cpu
> assignment
> >
> > On Thu, Jan 22, 2015 at 04:16:25PM +0800, Cunming Liang wrote:
> > > It supports one new eal long option '--lcores' for EAL thread cpuset
> assignment.
> > >
> > > The format pattern:
> > >   --lcores='lcores[@cpus]<,lcores[@cpus]>'
> > > lcores, cpus could be a single digit or a group.
> > > '(' and ')' are necessary if it's a group.
> > > If not supply '@cpus', the value of cpus uses the same as lcores.
> > >
> > > e.g. '1,2@(5-7),(3-5)@(0,2),(0,6)' means starting 7 EAL thread as below
> > >   lcore 0 runs on cpuset 0x41 (cpu 0,6)
> > >   lcore 1 runs on cpuset 0x2 (cpu 1)
> > >   lcore 2 runs on cpuset 0xe0 (cpu 5,6,7)
> > >   lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2)
> > >   lcore 6 runs on cpuset 0x41 (cpu 0,6)
> > >
> >
> > This strikes me as very confusing, though a couple of tweaks might help with
> > readability. The lcore 0 at the end is especially confusing.
> 
> Didn't get you here: do you find (0,6) confusing, right?
> Because braces implicitly specifies affinity for group of en-braced lcores?
> 
> > Perhaps we can
> > limit the allowed formats here,
> > * require the lcore_id to be specified - the lack of an lcore id for the 
> > last part
> > makes having it as lcore 0 surprising.
> 
> Again, not sure I understand you properly:  lcore_id(s) are always specified
> explicitly.
> Physical cpus part might be omitted.
> 
> > * only allow one lcore id to be given for each set of cores.
> 
> So you mean for '(3-5)@(0,2)' user would have to: '3@(0,2),4@(0,2),5@(0,2)'?
> I don't see big difference here, but imagine you'd like to create a pool of 
> 32 EAL-
> threads running on same cpu set.
> With current syntax it is just something like: '(32-63)@(0-7)'.
> With what you proposing it will be a very long list.
> 
> >
> > I think it may still be readable if we allow the core set to be omitted if 
> > its
> > to be the same as the lcore_id.
> 
> I think that is supported.
> See lcore_id=1 in Steve's example above.
> As I understand: --lcores='0,2,3-5' is equal to '-l 0,2,3-5' and to '-c 0x3d'.
> 
> Konstantin
> 
> >
> > It's probably still not going to be very tidy, but I think we can improve 
> > things.
> >
> > /Bruce
> >
> > > Signed-off-by: Cunming Liang 
> > > ---
> > >  lib/librte_eal/common/eal_common_launch.c  |   1 -
> > >  lib/librte_eal/common/eal_common_options.c | 262
> -
> > >  lib/librte_eal/common/eal_options.h|   2 +
> > >  lib/librte_eal/linuxapp/eal/Makefile   |   1 +
> > >  4 files changed, 261 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/common/eal_common_launch.c
> b/lib/librte_eal/common/eal_common_launch.c
> > > index 599f83b..2d732b1 100644
> > > --- a/lib/librte_eal/common/eal_common_launch.c
> > > +++ b/lib/librte_eal/common/eal_common_launch.c
> > > @@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void)
> > >   rte_eal_wait_lcore(lcore_id);
> > >   }
> > >  }
> > > -
> > > diff --git a/lib/librte_eal/common/eal_common_options.c
> b/lib/librte_eal/common/eal_common_options.c
> > > index e2810ab..fc47588 100644
> > > --- a/lib/librte_eal/common/eal_common_options.c
> > > +++ b/lib/librte_eal/common/eal_common_options.c
> > > @@ -45,6 +45,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >
> > >  #include "eal_internal_cfg.h"
> > >  #include "eal_options.h"
> > > @@ -85,6 +86,7 @@ eal_long_options[] = {
> > >   {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM},
> > >   {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM},
> > >   {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
> > > + {OPT_LCORES, 1, 0, OPT_LCORES_NUM},
> > >   {0, 0, 0, 0}
> > >  };
> > >
> > > @@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist)
> > >   if (min == RTE_MAX_LCORE)
> > >   min = idx;
> > >   for (idx = min; idx <= max; idx++) {
> > > - cfg->lcore_role[idx] = ROLE_RTE;
> > > - lcore_config[idx].core_index = count;
> > > - count++;
> > > +   

[dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS

2015-01-22 Thread Wodkowski, PawelX
> 
> I'm new on the list and my experience with DPDK is about two months so,
> pls., don't judge me too harsh... ;)
> I tried to cover the obvious things and actually learned the code while
> reviewing. The things u say, Pavel(X?) make sense and I obviously missed

I am really puzzled about mail client I have to use. It is really stubborn 
about using my correct name :P

> that.
> But as Changchun mentioned there is nothing that can't be fixed with a
> followup patches... ;)
> 
Roger that :P 
No judging, I should also look those patches before they were acked.

Waiting for fixes.

Pawel


[dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS

2015-01-21 Thread Wodkowski, PawelX


> -Original Message-
> From: Ouyang, Changchun
> Sent: Wednesday, January 21, 2015 3:44 AM
> To: Wodkowski, PawelX; dev at dpdk.org
> Cc: Ouyang, Changchun
> Subject: RE: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> 
> 
> 
> > -Original Message-
> > From: Wodkowski, PawelX
> > Sent: Tuesday, January 20, 2015 5:35 PM
> > To: Ouyang, Changchun; dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang
> > Changchun
> > > Sent: Monday, January 12, 2015 6:59 AM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> > >
> > > It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable VF
> > RSS.
> > >
> > > The psrtype will determine how many queues the received packets will
> > > distribute to, and the value of psrtype should depends on both facet:
> > > max VF rxq number which has been negotiated with PF, and the number of
> > > rxq specified in config on guest.
> > >
> > > Signed-off-by: Changchun Ouyang 
> > >
> > > Changes in v6:
> > >   - Raise an error for the case of ETH_16_POOLS in config vf rss, as the
> > previous
> > > logic have changed it into: ETH_32_POOLS.
> > >
> > > Changes in v4:
> > >  - The number of rxq from config should be power of 2 and should not
> > > bigger than
> > > max VF rxq number(negotiated between guest and host).
> > >
> > > ---
> > >  lib/librte_pmd_ixgbe/ixgbe_pf.c   |  15 ++
> > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 102
> > > +-
> > >  2 files changed, 105 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > b/lib/librte_pmd_ixgbe/ixgbe_pf.c index dbda9b5..93f6e43 100644
> > > --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> > > @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev
> > > *eth_dev)
> > >   IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw-
> > >mac.num_rar_entries),
> > > 0);
> > >   IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw-
> > >mac.num_rar_entries),
> > > 0);
> > >
> > > + /*
> > > +  * VF RSS can support at most 4 queues for each VF, even if
> > > +  * 8 queues are available for each VF, it need refine to 4
> > > +  * queues here due to this limitation, otherwise no queue
> > > +  * will receive any packet even RSS is enabled.
> > > +  */
> > > + if (eth_dev->data->dev_conf.rxmode.mq_mode ==
> > > ETH_MQ_RX_VMDQ_RSS) {
> > > + if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
> > > + RTE_ETH_DEV_SRIOV(eth_dev).active =
> > > ETH_32_POOLS;
> > > + RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
> > > + RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
> > > + dev_num_vf(eth_dev) * 4;
> > > + }
> > > + }
> > > +
> >
> > I did not looked before at your patches but I think you are messing with
> > things that should not be changed:
> >
> > Why you are changing those values. They are set up during
> > ixgbe_pf_host_init(). Limitation you are describing is only RSS related. If
> > there will be reconfiguration from ETH_MQ_RX_VMDQ_RSS to other mode
> > those value need to be re-evaluated. If you find this kind of limitation you
> > should handle it during RSS part configuration. Or if your way is the right 
> > way
> > you should explicitly make separate function that will re-evaluate those
> > parameters each time.
> >
> > Second issue with this code is that the nb_q_per_pool is changed from:
> > ixgbe_pf_host_configure() -> ixgbe_dev_start() -> rte_eth_dev_start() and
> > rte_eth_dev_check_vf_rss_rxq_num() -> rte_eth_dev_check_mq_mode() ->
> > rte_eth_dev_configure()
> >
> > Which one is the right one? If both, why they are calculated twice?
> >
> > I don't think that rte_eth_dev_data::sriov field should be changed at all - 
> > it
> > holds current SRIOV capabilities.
> > If this will change during runtime it no point to have this field at all 
> > and should
> > be some kind of "siov_get()"
> > function that will calculate and return those parameters dynamically.
> &g

[dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS

2015-01-20 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang Changchun
> Sent: Monday, January 12, 2015 6:59 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v6 5/6] ixgbe: Config VF RSS
> 
> It needs config RSS and IXGBE_MRQC and IXGBE_VFPSRTYPE to enable VF RSS.
> 
> The psrtype will determine how many queues the received packets will 
> distribute
> to,
> and the value of psrtype should depends on both facet: max VF rxq number
> which
> has been negotiated with PF, and the number of rxq specified in config on 
> guest.
> 
> Signed-off-by: Changchun Ouyang 
> 
> Changes in v6:
>   - Raise an error for the case of ETH_16_POOLS in config vf rss, as the 
> previous
> logic have changed it into: ETH_32_POOLS.
> 
> Changes in v4:
>  - The number of rxq from config should be power of 2 and should not bigger
> than
> max VF rxq number(negotiated between guest and host).
> 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_pf.c   |  15 ++
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 102
> +-
>  2 files changed, 105 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> index dbda9b5..93f6e43 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> @@ -187,6 +187,21 @@ int ixgbe_pf_host_configure(struct rte_eth_dev
> *eth_dev)
>   IXGBE_WRITE_REG(hw, IXGBE_MPSAR_LO(hw->mac.num_rar_entries),
> 0);
>   IXGBE_WRITE_REG(hw, IXGBE_MPSAR_HI(hw->mac.num_rar_entries),
> 0);
> 
> + /*
> +  * VF RSS can support at most 4 queues for each VF, even if
> +  * 8 queues are available for each VF, it need refine to 4
> +  * queues here due to this limitation, otherwise no queue
> +  * will receive any packet even RSS is enabled.
> +  */
> + if (eth_dev->data->dev_conf.rxmode.mq_mode ==
> ETH_MQ_RX_VMDQ_RSS) {
> + if (RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool == 8) {
> + RTE_ETH_DEV_SRIOV(eth_dev).active =
> ETH_32_POOLS;
> + RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = 4;
> + RTE_ETH_DEV_SRIOV(eth_dev).def_pool_q_idx =
> + dev_num_vf(eth_dev) * 4;
> + }
> + }
> +

I did not looked before at your patches but I think you are messing with things 
that should not be changed:

Why you are changing those values. They are set up during ixgbe_pf_host_init(). 
Limitation you are
describing is only RSS related. If there will be reconfiguration from 
ETH_MQ_RX_VMDQ_RSS to other mode those value need to be re-evaluated. If you 
find this
kind of limitation you should handle it during RSS part configuration. Or if 
your way is the right way
you should explicitly make separate function that will re-evaluate those 
parameters each time.

Second issue with this code is that the nb_q_per_pool is changed from:
ixgbe_pf_host_configure() -> ixgbe_dev_start() -> rte_eth_dev_start()
and
rte_eth_dev_check_vf_rss_rxq_num() -> rte_eth_dev_check_mq_mode() -> 
rte_eth_dev_configure()

Which one is the right one? If both, why they are calculated twice?

I don't think that rte_eth_dev_data::sriov field should be changed at all - it 
holds current SRIOV capabilities.
If this will change during runtime it no point to have this field at all and 
should be some kind of "siov_get()"
function that will calculate and return those parameters dynamically.

Please refer also to 
for further issues.

I think this patchset should not be applied.

>   /* set VMDq map to default PF pool */
>   hw->mac.ops.set_vmdq(hw, 0,
> RTE_ETH_DEV_SRIOV(eth_dev).def_vmdq_idx);
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> index f69abda..20627df 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> @@ -3327,6 +3327,67 @@ ixgbe_alloc_rx_queue_mbufs(struct igb_rx_queue
> *rxq)
>  }
> 
>  static int
> +ixgbe_config_vf_rss(struct rte_eth_dev *dev)
> +{
> + struct ixgbe_hw *hw;
> + uint32_t mrqc;
> +
> + ixgbe_rss_configure(dev);
> +
> + hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +
> + /* MRQC: enable VF RSS */
> + mrqc = IXGBE_READ_REG(hw, IXGBE_MRQC);
> + mrqc &= ~IXGBE_MRQC_MRQE_MASK;
> + switch (RTE_ETH_DEV_SRIOV(dev).active) {
> + case ETH_64_POOLS:
> + mrqc |= IXGBE_MRQC_VMDQRSS64EN;
> + break;
> +
> + case ETH_32_POOLS:
> + mrqc |= IXGBE_MRQC_VMDQRSS32EN;
> + break;
> +
> + default:
> + PMD_INIT_LOG(ERR, "Invalid pool number in IOV mode with
> VMDQ RSS");
> + return -EINVAL;
> + }
> +
> + IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc);
> +
> + return 0;
> +}
> +
> +static int
> +ixgbe_config_vf_default(struct rte_eth_dev *dev)
> +{
> + struct ixgbe_hw *hw =
> + IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +
> + switch 

[dpdk-dev] [PATCH v2 2/4] ethdev: prevent changing of nb_q_per_pool in rte_eth_dev_check_mq_mode()

2015-01-20 Thread Wodkowski, PawelX
> -Original Message-
> From: Ouyang, Changchun
> Sent: Tuesday, January 20, 2015 2:33 AM
> To: Wodkowski, PawelX; dev at dpdk.org
> Cc: Ouyang, Changchun
> Subject: RE: [dpdk-dev] [PATCH v2 2/4] ethdev: prevent changing of
> nb_q_per_pool in rte_eth_dev_check_mq_mode()
> 
> 
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Pawel Wodkowski
> > Sent: Monday, January 19, 2015 9:02 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2 2/4] ethdev: prevent changing of
> > nb_q_per_pool in rte_eth_dev_check_mq_mode()
> >
> > If SRIOV is used and device configuration does not use MQ the
> > RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool is set to 1 in
> > rte_eth_dev_check_mq_mode().
> > This is bad becouse of two reasons:
> > 1. Port reconfiguration from non-MQ mode to MQ mode is impossible 2.
> > Confguring RX and TX side in different way is impossible.
> >
> 
> This case is possible:
> rxmode.mq_mode is ETH_MQ_RX_VMDQ_RSS, and txmode.mq_mode is
> ETH_MQ_TX_NONE.
> 
but ETH_MQ_RX_NONE -> ETH_MQ_RX_VMDQ_RSS is not. 

I have 8 VFs
In testpmd

testpmd> port config all rxq 2
port config all rxq 2

testpmd> port start 0
port start 0

Configuring Port 0 (socket 0)
Fail to configure port 0
testpmd> port config all rxq 4
port config all rxq 4

testpmd> port start 0
port start 0

Configuring Port 0 (socket 0)
Fail to configure port 0
testpmd> port config all rxq 8
port config all rxq 8

testpmd> port start all
port start all

Configuring Port 0 (socket 0)
Fail to configure port 0
testpmd> port config all rxq 1
port config all rxq 1

testpmd> port start 0
port start 0

Configuring Port 0 (socket 0)
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7ffec0ae9140 hw_ring=0x7ffec2c0bf00 
dma_addr=0x102c0bf00
PMD: set_tx_function(): Using full-featured tx code path
PMD: set_tx_function():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
PMD: set_tx_function():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE_TX_MAX_BURST=32]
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7ffec0ae88c0 hw_ring=0x7ffec2c1bf00 
dma_addr=0x102c1bf00
PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are 
satisfied. Rx Burst Bulk Alloc function will be used on port=0, queue=0.
PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX burst 
size no less than 32.
Port 0: 00:1B:21:C7:33:B0
Checking link statuses...
Port 0 Link Up - speed 1 Mbps - full-duplex
Port 1 Link Down
Done
testpmd>

Please refer to RSS patch thread. I will post there second reply.

Pawel


[dpdk-dev] [PATCH v6 4/6] ether: Check VMDq RSS mode

2015-01-19 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang Changchun
> Sent: Monday, January 12, 2015 6:59 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v6 4/6] ether: Check VMDq RSS mode
> 
> Check mq mode for VMDq RSS, handle it correctly instead of returning an error;
> Also remove the limitation of per pool queue number has max value of 1,
> because
> the per pool queue number could be 2 or 4 if it is VMDq RSS mode;
> 
> The number of rxq specified in config will determine the mq mode for VMDq
> RSS.
> 
> Signed-off-by: Changchun Ouyang 
> 
> changes in v6:
>   - More clear error message when queue number is invalid.
> 
> changes in v5:
>   - Fix '<' issue, it should be '<=' to test rxq number;
>   - Extract a function to remove the embeded switch-case statement.
> 
> ---
>  lib/librte_ether/rte_ethdev.c | 51
> ++-
>  1 file changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 95f2ceb..e9e3368 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -503,6 +503,31 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev
> *dev, uint16_t nb_queues)
>  }
> 
>  static int
> +rte_eth_dev_check_vf_rss_rxq_num(uint8_t port_id, uint16_t nb_rx_q)
> +{
> + struct rte_eth_dev *dev = _eth_devices[port_id];
> + switch (nb_rx_q) {
> + case 1:
> + case 2:
> + RTE_ETH_DEV_SRIOV(dev).active =
> + ETH_64_POOLS;
> + break;
> + case 4:
> + RTE_ETH_DEV_SRIOV(dev).active =
> + ETH_32_POOLS;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool = nb_rx_q;
> + RTE_ETH_DEV_SRIOV(dev).def_pool_q_idx =
> + dev->pci_dev->max_vfs * nb_rx_q;
> +
> + return 0;
> +}
> +
> +static int
>  rte_eth_dev_check_mq_mode(uint8_t port_id, uint16_t nb_rx_q, uint16_t
> nb_tx_q,
> const struct rte_eth_conf *dev_conf)
>  {
> @@ -510,8 +535,7 @@ rte_eth_dev_check_mq_mode(uint8_t port_id, uint16_t
> nb_rx_q, uint16_t nb_tx_q,
> 
>   if (RTE_ETH_DEV_SRIOV(dev).active != 0) {
>   /* check multi-queue mode */
> - if ((dev_conf->rxmode.mq_mode == ETH_MQ_RX_RSS) ||
> - (dev_conf->rxmode.mq_mode == ETH_MQ_RX_DCB) ||
> + if ((dev_conf->rxmode.mq_mode == ETH_MQ_RX_DCB) ||
>   (dev_conf->rxmode.mq_mode == ETH_MQ_RX_DCB_RSS) ||
>   (dev_conf->txmode.mq_mode == ETH_MQ_TX_DCB)) {
>   /* SRIOV only works in VMDq enable mode */
> @@ -525,7 +549,6 @@ rte_eth_dev_check_mq_mode(uint8_t port_id, uint16_t
> nb_rx_q, uint16_t nb_tx_q,
>   }
> 
>   switch (dev_conf->rxmode.mq_mode) {
> - case ETH_MQ_RX_VMDQ_RSS:
>   case ETH_MQ_RX_VMDQ_DCB:
>   case ETH_MQ_RX_VMDQ_DCB_RSS:
>   /* DCB/RSS VMDQ in SRIOV mode, not implement yet
> */
> @@ -534,6 +557,26 @@ rte_eth_dev_check_mq_mode(uint8_t port_id,
> uint16_t nb_rx_q, uint16_t nb_tx_q,
>   "unsupported VMDQ mq_mode rx
> %u\n",
>   port_id, dev_conf-
> >rxmode.mq_mode);
>   return (-EINVAL);
> + case ETH_MQ_RX_RSS:
> + PMD_DEBUG_TRACE("ethdev port_id=%" PRIu8
> + " SRIOV active, "
> + "Rx mq mode is changed from:"
> + "mq_mode %u into VMDQ mq_mode
> %u\n",
> + port_id,
> + dev_conf->rxmode.mq_mode,
> + dev->data-
> >dev_conf.rxmode.mq_mode);
> + case ETH_MQ_RX_VMDQ_RSS:
> + dev->data->dev_conf.rxmode.mq_mode =
> ETH_MQ_RX_VMDQ_RSS;
> + if (nb_rx_q <=
> RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool)
> + if (rte_eth_dev_check_vf_rss_rxq_num(port_id,
> nb_rx_q) != 0) {
> + PMD_DEBUG_TRACE("ethdev
> port_id=%d"
> + " SRIOV active, invalid queue"
> + " number for VMDQ RSS,
> allowed"
> + " value are 1, 2 or 4\n",
> + port_id);
> + return -EINVAL;
> + }
> + break;
>   default: /* ETH_MQ_RX_VMDQ_ONLY or ETH_MQ_RX_NONE
> */
>   /* if nothing mq mode configure, use default scheme */
>   dev->data->dev_conf.rxmode.mq_mode =
> ETH_MQ_RX_VMDQ_ONLY;
> @@ -553,8 +596,6 @@ rte_eth_dev_check_mq_mode(uint8_t port_id, 

[dpdk-dev] [PATCH v6 3/6] ixgbe: Get VF queue number

2015-01-19 Thread Wodkowski, PawelX


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang Changchun
> Sent: Monday, January 12, 2015 6:59 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v6 3/6] ixgbe: Get VF queue number
> 
> Get the available Rx and Tx queue number when receiving
> IXGBE_VF_GET_QUEUES message from VF.
> 
> Signed-off-by: Changchun Ouyang 
> 
> changes in v5
>   - Add some 'FIX ME' comments for IXGBE_VF_TRANS_VLAN.
> 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_pf.c | 40
> +++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_pf.c b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> index 495aff5..dbda9b5 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_pf.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_pf.c
> @@ -53,6 +53,8 @@
>  #include "ixgbe_ethdev.h"
> 
>  #define IXGBE_MAX_VFTA (128)
> +#define IXGBE_VF_MSG_SIZE_DEFAULT 1
> +#define IXGBE_VF_GET_QUEUE_MSG_SIZE 5
> 
>  static inline uint16_t
>  dev_num_vf(struct rte_eth_dev *eth_dev)
> @@ -491,9 +493,41 @@ ixgbe_negotiate_vf_api(struct rte_eth_dev *dev,
> uint32_t vf, uint32_t *msgbuf)
>  }
> 
>  static int
> +ixgbe_get_vf_queues(struct rte_eth_dev *dev, uint32_t vf, uint32_t *msgbuf)
> +{
> + struct ixgbe_vf_info *vfinfo =
> + *IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
> + uint32_t default_q = vf * RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool;
> +
> + /* Verify if the PF supports the mbox APIs version or not */
> + switch (vfinfo[vf].api_version) {
> + case ixgbe_mbox_api_20:
> + case ixgbe_mbox_api_11:
> + break;
> + default:
> + return -1;
> + }
> +
> + /* Notify VF of Rx and Tx queue number */
> + msgbuf[IXGBE_VF_RX_QUEUES] = RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool;
> + msgbuf[IXGBE_VF_TX_QUEUES] = RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool;

Are you sure this is good approach to pass nb_q_per_pool to VF as the number of 
available queues? What if PF does not use RSS nor DCB? Are thos queues always 
available in that case?

> +
> + /* Notify VF of default queue */
> + msgbuf[IXGBE_VF_DEF_QUEUE] = default_q;
> +
> + /*
> +  * FIX ME if it needs fill msgbuf[IXGBE_VF_TRANS_VLAN]
> +  * for VLAN strip or VMDQ_DCB or VMDQ_DCB_RSS
> +  */
> +
> + return 0;
> +}
> +


[dpdk-dev] [PATCH] ethdev: fix missing parenthesis

2015-01-16 Thread Wodkowski, PawelX
> Good catch!
> Was introduced in commit 4bdefaade6d1 (VMDQ enhancements).
> Note that quite often, when a patch contains too much things,
> we miss this kind of bugs. That's a reason to well split patches.
> 

What is most surprising I did not spotted this, neither  the compiler
what I think it should. It was my IDE which pointed possible bug.

Are we using '-Wparentheses' switch? If not, we should consider
to use this switch?

Pawel


[dpdk-dev] [PATCH 1/2] pmd: add DCB for VF for ixgbe

2015-01-14 Thread Wodkowski, PawelX
> > >
> > >   - split nb_q_per_pool to nb_rx_q_per_pool and nb_tx_q_per_pool
> > >
> > > Rationale:
> > >
> > > rx and tx number of queue might be different if RX and TX are
> > >
> > > configured in different mode. This allow to inform VF about
> > >
> > > proper number of queues.
> >
> >
> > Nice move! Ouyang, this is a nice answer to my recent remarks about your
> > PATCH4 in "Enable VF RSS for Niantic" series.
> 
> After I respond your last comments, I see this,  :-), I am sure we both agree 
> it is
> the right way to resolve it in vmdq dcb case.
> 

I am now dividing this patch with your suggestions and I am little confused.

In this (DCB in SRIOV) case the primary cause for spliting nb_q_per_pool into
nb_rx_q_per_pool and nb_tx_q_per_pool was because of this code:

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index af9e261..be3afe4 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -537,8 +537,8 @@
default: /* ETH_MQ_RX_VMDQ_ONLY or ETH_MQ_RX_NONE */
/* if nothing mq mode configure, use default scheme */
dev->data->dev_conf.rxmode.mq_mode = 
ETH_MQ_RX_VMDQ_ONLY;
-   if (RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool > 1)
-   RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool = 1;
+   if (RTE_ETH_DEV_SRIOV(dev).nb_rx_q_per_pool > 1)
+   RTE_ETH_DEV_SRIOV(dev).nb_rx_q_per_pool = 1;
break;
}

@@ -553,17 +553,18 @@
default: /* ETH_MQ_TX_VMDQ_ONLY or ETH_MQ_TX_NONE */
/* if nothing mq mode configure, use default scheme */
dev->data->dev_conf.txmode.mq_mode = 
ETH_MQ_TX_VMDQ_ONLY;
-   if (RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool > 1)
-   RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool = 1;
+   if (RTE_ETH_DEV_SRIOV(dev).nb_tx_q_per_pool > 1)
+   RTE_ETH_DEV_SRIOV(dev).nb_tx_q_per_pool = 1;
break;
}

/* check valid queue number */
-   if ((nb_rx_q > RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool) ||
-   (nb_tx_q > RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool)) {
+   if ((nb_rx_q > RTE_ETH_DEV_SRIOV(dev).nb_rx_q_per_pool) ||
+   (nb_tx_q > RTE_ETH_DEV_SRIOV(dev).nb_tx_q_per_pool)) {
PMD_DEBUG_TRACE("ethdev port_id=%d SRIOV active, "
-   "queue number must less equal to %d\n",
-   port_id, 
RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool);
+   "rx/tx queue number must less equal to 
%d/%d\n",
+   port_id, 
RTE_ETH_DEV_SRIOV(dev).nb_rx_q_per_pool,
+   
RTE_ETH_DEV_SRIOV(dev).nb_tx_q_per_pool);
return (-EINVAL);
}
} else {
--

This introduced an issue when RX and TX was configure in different way. The 
problem was
that the RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool as common for RX and TX and it is
changed. So I did the above. But when testpmd was adjusted for DCB in SRIOV 
there 
was another issue. Testpmd is pre-configuring ports by default and since
nb_rx_q_per_pool  and nb_tx_q_per_pool was already reset to 1 there was no way 
to 
use it for DCB in SRIOV. So I did another modification:

> + uint16_t nb_rx_q_per_pool = 
> RTE_ETH_DEV_SRIOV(dev).nb_rx_q_per_pool;
> + uint16_t nb_tx_q_per_pool = 
> RTE_ETH_DEV_SRIOV(dev).nb_tx_q_per_pool;
> +
>   switch (dev_conf->rxmode.mq_mode) {
> - case ETH_MQ_RX_VMDQ_RSS:
>   case ETH_MQ_RX_VMDQ_DCB:
> + break;
> + case ETH_MQ_RX_VMDQ_RSS:
>   case ETH_MQ_RX_VMDQ_DCB_RSS:
> - /* DCB/RSS VMDQ in SRIOV mode, not implement yet */
> + /* RSS, DCB+RSS VMDQ in SRIOV mode, not implement yet */
>   PMD_DEBUG_TRACE("ethdev port_id=%" PRIu8
>   " SRIOV active, "
>   "unsupported VMDQ mq_mode rx %u\n",
> @@ -537,37 +560,32 @@ rte_eth_dev_check_mq_mode(uint8_t port_id, uint16_t 
> nb_rx_q, uint16_t nb_tx_q,
>   default: /* ETH_MQ_RX_VMDQ_ONLY or ETH_MQ_RX_NONE */
>   /* if nothing mq mode configure, use default scheme */
>   dev->data->dev_conf.rxmode.mq_mode = 
> ETH_MQ_RX_VMDQ_ONLY;
> - if (RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool > 1)
> - RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool = 1;
> + if (nb_rx_q_per_pool > 1)
> + nb_rx_q_per_pool = 1;
>   break;
>   }
>   
> 

[dpdk-dev] [PATCH 2/2] testpmd: fix dcb in vt mode

2015-01-13 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> Sent: Tuesday, January 13, 2015 11:16 AM
> To: Jastrzebski, MichalX K; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/2] testpmd: fix dcb in vt mode
> 
> 
> On 01/12/15 17:50, Michal Jastrzebski wrote:
> > From: Pawel Wodkowski 
> >
> > This patch incorporate fixes to support DCB in SRIOV mode for testpmd.
> > It also clean up some old code that is not needed or wrong.
> 
> The same here: could u, pls., separate the "cleanup" part of the patch
> from the "fixes" part into separate patches?
> 

Maybe little confusion I introduced by saying cleanups. Some code became
obsolete (like enum dcb_mode_enable) when I fixed DCV in VT mode, so
removing those parts I called "cleanups". Please consider them to be a fixes.

Pawel



[dpdk-dev] [PATCH 1/2] pmd: add DCB for VF for ixgbe

2015-01-13 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> Sent: Tuesday, January 13, 2015 11:14 AM
> To: Jastrzebski, MichalX K; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/2] pmd: add DCB for VF for ixgbe
> 
> 
> On 01/12/15 17:50, Michal Jastrzebski wrote:
> > From: Pawel Wodkowski 
> >
> > This patch add support for DCB in SRIOV mode. When no PFC
> > is enabled this feature might be used as multiple queues
> > (up to 8 or 4) for VF.
> >
> > It incorporate following modifications:
> >   - Allow zero rx/tx queues to be passed to rte_eth_dev_configure().
> > Rationale:
> > in SRIOV mode PF use first free VF to RX/TX. If VF count
> > is 16 or 32 all recources are assigned to VFs so PF can
> > be used only for configuration.
> >   - split nb_q_per_pool to nb_rx_q_per_pool and nb_tx_q_per_pool
> > Rationale:
> > rx and tx number of queue might be different if RX and TX are
> > configured in different mode. This allow to inform VF about
> > proper number of queues.
> >   - extern mailbox API for DCB mode
> 
> IMHO each bullet above is worth a separate patch. ;)
> It would be much easier to review.
> 

Good point. I will send next version shortly.

Pawel


[dpdk-dev] [PATCH 0/2] Enable DCB in SRIOV mode for ixgbe driver

2015-01-13 Thread Wodkowski, PawelX
Comments are more than welcome :)

Pawel



[dpdk-dev] [PATCH] bond: vlan flags misinterpreted in xmit_slave_hash function

2014-12-16 Thread Wodkowski, PawelX
> -Original Message-
> From: Doherty, Declan
> Sent: Tuesday, December 16, 2014 12:16 PM
> To: dev at dpdk.org
> Cc: Wodkowski, PawelX; Doherty, Declan
> Subject: [PATCH] bond: vlan flags misinterpreted in xmit_slave_hash function
> 
> - Split transmit hashing function into separate functions to reduce branching
>   and to make code clearer.
> - Add IPv4 IHL parameters to rte_ip.h
> - Fixed VLAN tag support in hashing functions and add support for TCP
>   in layer 4 header hashing.
> - Fixed incorrect flag set in test application packet generator.
> 
> Signed-off-by: Declan Doherty 


Acked-by: Wodkowski, Pawel 





[dpdk-dev] [PATCH v2] bond: static analysis issues fix

2014-12-16 Thread Wodkowski, PawelX
> -Original Message-
> From: Doherty, Declan
> Sent: Monday, December 15, 2014 6:14 PM
> To: dev at dpdk.org
> Cc: Wodkowski, PawelX; Doherty, Declan
> Subject: [PATCH v2] bond: static analysis issues fix
> 
> -v2:
> Incorporates Pawel's comments regarding assertion's check on activate_slave
> array indexing
> 
> Fixes for link bonding library identified by static analysis tool
> 
> - Overflow assert for active_slaves array in activate_slave function
> - Allocation check of pci_id_table in rte_eth_bond_create
> - Use of eth_dev pointer in mac_address_get/set before NULL check
> 
> Signed-off-by: Declan Doherty 
> ---
> 

Acked-by: Wodkowski, Pawel 




[dpdk-dev] error: value computed is not used

2014-12-15 Thread Wodkowski, PawelX
> > > > And something like this?
> > > >
> > > >  #define rte_memcpy(dst, src, n)  \
> > > > -   ((__builtin_constant_p(n)) ?  \
> > > > +   ({ (__builtin_constant_p(n)) ?  \
> > > > memcpy((dst), (src), (n)) :  \
> > > > -   rte_memcpy_func((dst), (src), (n)))
> > > > +   rte_memcpy_func((dst), (src), (n)); })
> > >
> > > What happens to the returned value after this change?
> > > ptr = rte_memcpy(dst, src, n) + offset:
> > >
> > https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs
> >
> > Whole expression should be 'void *' type (like *memcpy()) and it should work
> > as usual (see maxint() example in above link). It is GCC extension.
> 
> OK nice.
> I didn't test it on SUSE 11 SP3. I assume you did it?
I did not tested this, as this was only proposal. I only run build process and 
it pass.
Patch proposal will be sent in a while.

> Please Pawel, could you send a proper patch quickly?
> If nobody disagree, it'll be merged in RC5 today.
> 
> > > > Thomas, can you check build with EXTRA_CFLAG='-Wunused-value'.
> > >
> > > You mean EXTRA_CFLAGS (with a S).
> > > It fails in many locations.
> > > What's your point?
> >
> > I am just asking if this is an typo, error or intend to do statements with 
> > no
> effects like bellow.
> >
> > ixgbe_common.c:4429:3: error: statement with no effect [-Werror=unused-
> value]
> >
> > 4426:   /* first pull in the header so we know the buffer length */
> > 4427:   for (bi = 0; bi < dword_len; bi++) {
> > 4428:   buffer[bi] = IXGBE_READ_REG_ARRAY(hw, IXGBE_FLEX_MNG,
> bi);
> > 4429:   IXGBE_LE32_TO_CPUS([bi]); // <-- here
> > 4430}
> 
> It's an intent. On big endian CPU, this has an effect.
> 

If you see something what I am not, please ignore this part but for me this 
looks like it should be:
tmp = IXGBE_READ_REG_ARRAY(hw, IXGBE_FLEX_MNG,
buffer[bi] = IXGBE_LE32_TO_CPUS (tmp);

Pawel


[dpdk-dev] error: value computed is not used

2014-12-15 Thread Wodkowski, PawelX


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, December 15, 2014 2:27 PM
> To: Wodkowski, PawelX
> Cc: Qiu, Michael; dev at dpdk.org
> Subject: Re: [dpdk-dev] error: value computed is not used
> 
> 2014-12-15 11:27, Wodkowski, PawelX:
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > > 2014-12-08 15:26, Wodkowski, PawelX:
> > > > From: Qiu, Michael
> > > > > On 2014/12/8 19:00, Wodkowski, PawelX wrote:
> > > > > >> lib/librte_pmd_enic/enic_main.c: In function 'enic_set_rsskey':
> > > > > >> lib/librte_pmd_enic/enic_main.c:862:2: error: value computed is not
> used
> > > > > >>
> > > > > >> I dig out that, it was ome issue of  the macros rte_memcpy()
> > > > > >> #define rte_memcpy(dst, src, n)  \
> > > > > >> ((__builtin_constant_p(n)) ?  \
> > > > > >> memcpy((dst), (src), (n)) :  \
> > > > > >> rte_memcpy_func((dst), (src), (n)))
> > > > > >>
> > > > > >> When I use only (n) instead of (__builtin_constant_p(n), it will 
> > > > > >> pass( I
> > > > > >> know that it was incorrect, just a experiment).
> > > > > >>
> > > > > >> But I try to use inline function instead of macros:
> > > > > >> static inline void * rte_memcpy(void *dst, const void *src, size_t 
> > > > > >> n)
> > > > > >> {
> > > > > >> return __builtin_constant_p(n) ? memcpy(dst, src, n) :
> > > > > >>  rte_memcpy_func(dst, src, 
> > > > > >> n);
> > > > > >> }
> > > > > >>
> > > > > >> It will pass:), and works, this could be one potential workaround 
> > > > > >> fix.
> > > > > >>
> > > > > >> Who knows why? The root cause is what?
> > > > > >>
> > > > > >> I've no idea about this.
> > > > > >>
> > > > > > I got the same issue while ago. I don't remember exactly everything
> > > > > > but my conclusion was that there was some bug in compiler. I think,
> > > > > > when 'n' I constant and/or small compiler is inlining memcpy and
> throwing
> > > > > > everything else (including returned value). In that case error is 
> > > > > > not
> > > > > > produced (I think this is a bug in compiler). In other case it is 
> > > > > > computing
> > > > > > some value calling memcpy or rte_ memcpy and you should at least
> > > > > > explicitly throw it away by casting to void. I like solution with 
> > > > > > static
> > > > >
> > > > > Actually, I try to pass "n" as a Int value like 4, it still report 
> > > > > this
> > > > > error :)
> > > >
> > > > My workaround was:
> > > > (void) rte_memcpy(...);
> > > >
> > > > But this is only a workaround.
> > >
> > > It's not so bad.
> > >
> > > > > > inline but someone else should spoke about possible side effects.
> > > > >
> > > > > Yes, but as I know inline is better than macros.
> > >
> > > From the GCC manual:
> > > "
> > > You may use this built-in function in either a macro or an inline 
> > > function.
> > > However, if you use it in an inlined function and pass an argument of the
> > > function as the argument to the built-in, GCC never returns 1 when you 
> > > call
> > > the inline function with a string constant or compound literal and does 
> > > not
> > > return 1 when you pass a constant numeric value to the inline function 
> > > unless
> > > you specify the -O option.
> > > "
> > >
> > > It seems the "inline fix" cannot be used.
> > >
> > > I'm going to send a patch with Pawel's workaround.
> >
> > And something like this?
> >
> >  #define rte_memcpy(dst, src, n)  \
> > -   ((__builtin_constant_p(n)) ?  \
> > +   ({ (__builtin_constant_p(n)) ?  \
> > memcpy((dst), (src), (n)) :  \
> > -   rte_memcpy_func((dst), (src), (n)))
> > +   rte_memcpy_func((dst), (src), (n)); })
> 
> What happens to the returned value after this change?
> ptr = rte_memcpy(dst, src, n) + offset:
> 
https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs

Whole expression should be 'void *' type (like *memcpy()) and it should work
as usual (see maxint() example in above link). It is GCC extension. 

> > Thomas, can you check build with EXTRA_CFLAG='-Wunused-value'.
> 
> You mean EXTRA_CFLAGS (with a S).
> It fails in many locations.
> What's your point?

I am just asking if this is an typo, error or intend to do statements with no 
effects like bellow.

ixgbe_common.c:4429:3: error: statement with no effect [-Werror=unused-value]

4426:   /* first pull in the header so we know the buffer length */
4427:   for (bi = 0; bi < dword_len; bi++) {
4428:   buffer[bi] = IXGBE_READ_REG_ARRAY(hw, IXGBE_FLEX_MNG, bi);
4429:   IXGBE_LE32_TO_CPUS([bi]); // <-- here
4430}


> Do you to support -Wunused-value?
No, I just turned this on to check above change and was surprised what happened.


-- 
Pawel



[dpdk-dev] error: value computed is not used

2014-12-15 Thread Wodkowski, PawelX

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, December 15, 2014 11:55 AM
> To: Wodkowski, PawelX; Qiu, Michael
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] error: value computed is not used
> 
> 2014-12-08 15:26, Wodkowski, PawelX:
> > From: Qiu, Michael
> > > On 2014/12/8 19:00, Wodkowski, PawelX wrote:
> > > >> lib/librte_pmd_enic/enic_main.c: In function 'enic_set_rsskey':
> > > >> lib/librte_pmd_enic/enic_main.c:862:2: error: value computed is not 
> > > >> used
> > > >>
> > > >> I dig out that, it was ome issue of  the macros rte_memcpy()
> > > >> #define rte_memcpy(dst, src, n)  \
> > > >> ((__builtin_constant_p(n)) ?  \
> > > >> memcpy((dst), (src), (n)) :  \
> > > >> rte_memcpy_func((dst), (src), (n)))
> > > >>
> > > >> When I use only (n) instead of (__builtin_constant_p(n), it will pass( 
> > > >> I
> > > >> know that it was incorrect, just a experiment).
> > > >>
> > > >> But I try to use inline function instead of macros:
> > > >> static inline void * rte_memcpy(void *dst, const void *src, size_t n)
> > > >> {
> > > >> return __builtin_constant_p(n) ? memcpy(dst, src, n) :
> > > >>  rte_memcpy_func(dst, src, n);
> > > >> }
> > > >>
> > > >> It will pass:), and works, this could be one potential workaround fix.
> > > >>
> > > >> Who knows why? The root cause is what?
> > > >>
> > > >> I've no idea about this.
> > > >>
> > > > I got the same issue while ago. I don't remember exactly everything
> > > > but my conclusion was that there was some bug in compiler. I think,
> > > > when 'n' I constant and/or small compiler is inlining memcpy and 
> > > > throwing
> > > > everything else (including returned value). In that case error is not
> > > > produced (I think this is a bug in compiler). In other case it is 
> > > > computing
> > > > some value calling memcpy or rte_ memcpy and you should at least
> > > > explicitly throw it away by casting to void. I like solution with static
> > >
> > > Actually, I try to pass "n" as a Int value like 4, it still report this
> > > error :)
> >
> > My workaround was:
> > (void) rte_memcpy(...);
> >
> > But this is only a workaround.
> 
> It's not so bad.
> 
> > > > inline but someone else should spoke about possible side effects.
> > >
> > > Yes, but as I know inline is better than macros.
> 
> From the GCC manual:
> "
> You may use this built-in function in either a macro or an inline function.
> However, if you use it in an inlined function and pass an argument of the
> function as the argument to the built-in, GCC never returns 1 when you call
> the inline function with a string constant or compound literal and does not
> return 1 when you pass a constant numeric value to the inline function unless
> you specify the -O option.
> "
> 
> It seems the "inline fix" cannot be used.
> 
> I'm going to send a patch with Pawel's workaround.

And something like this?

diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h 
b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
index 290c5cd..906c911 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
@@ -168,10 +168,10 @@
rte_mov128(dst + 128, src + 128);
 }

diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h 
b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
index 290c5cd..c3e8b81 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
@@ -169,9 +169,9 @@
 }

 #define rte_memcpy(dst, src, n)  \
-   ((__builtin_constant_p(n)) ?  \
+   ({ (__builtin_constant_p(n)) ?  \
memcpy((dst), (src), (n)) :  \
-   rte_memcpy_func((dst), (src), (n)))
+   rte_memcpy_func((dst), (src), (n)); })

 static inline void *
 rte_memcpy_func(void *dst, const void *src, size_t n)


Thomas, can you check build with EXTRA_CFLAG='-Wunused-value'. 

/home/pwodkowx/grizzly/dpdk_org_declan_v3_mode4_v2/lib/librte_pmd_ixgbe/ixgbe/ixgbe_common.c:
 In function 'ixgbe_host_interface_command':
/home/pwodkowx/grizzly/dpdk_org_declan_v3_mode4_v2/lib/librte_pmd_ixgbe/ixgbe/ixgbe_common.c:4429:3:
 error: statement with no effect [-Werror=unused-value]
/home/pwodkowx/grizzly/dpdk_org_declan_v3_mode4_v2/lib/librte_pmd_ixgbe/ixgbe/ixgbe_common.c:4448:3:
 error: statement with no effect [-Werror=unused-value]



[dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with qsort() from standard library

2014-12-15 Thread Wodkowski, PawelX
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jay Rolette
> Sent: Thursday, December 11, 2014 5:06 PM
> To: Dev
> Subject: [dpdk-dev] [PATCH] replaced O(n^2) sort in sort_by_physaddr() with
> qsort() from standard library
> 
> Signed-off-by: Jay Rolette 
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 59
> +++-
>  1 file changed, 20 insertions(+), 39 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
> b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index bae2507..3656515 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -670,6 +670,25 @@ error:
>   return -1;
>  }
> 
> +static int
> +cmp_physaddr(const void *a, const void *b)
> +{
> +#ifndef RTE_ARCH_PPC_64
> + const struct hugepage_file *p1 = (const struct hugepage_file *)a;
> + const struct hugepage_file *p2 = (const struct hugepage_file *)b;
> +#else
> + // PowerPC needs memory sorted in reverse order from x86
> + const struct hugepage_file *p1 = (const struct hugepage_file *)b;
> + const struct hugepage_file *p2 = (const struct hugepage_file *)a;
> +#endif
> + if (p1->physaddr < p2->physaddr)
> + return -1;
> + else if (p1->physaddr > p2->physaddr)
> + return 1;
> + else
> + return 0;
> +}
> +

Why not simply

return (int)(p1->physaddr - p2->physaddr);



[dpdk-dev] [PATCH] bond: static analysis issues fix

2014-12-12 Thread Wodkowski, PawelX


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Declan Doherty
> Sent: Friday, December 12, 2014 6:40 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] bond: static analysis issues fix
> 
> Fixes for link bonding library identified by static analysis tool
> 
> - Overflow check for active_slaves array in activate_slave function
> - Allocation check of pci_id_table in rte_eth_bond_create
> - Use of eth_dev pointer in mac_address_get/set before NULL check
> 
> Signed-off-by: Declan Doherty 
> ---
>  lib/librte_pmd_bond/rte_eth_bond_api.c | 12 
>  lib/librte_pmd_bond/rte_eth_bond_pmd.c |  8 
>  2 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_pmd_bond/rte_eth_bond_api.c
> b/lib/librte_pmd_bond/rte_eth_bond_api.c
> index ef5ddf4..9cb1c1f 100644
> --- a/lib/librte_pmd_bond/rte_eth_bond_api.c
> +++ b/lib/librte_pmd_bond/rte_eth_bond_api.c
> @@ -115,8 +115,11 @@ activate_slave(struct rte_eth_dev *eth_dev, uint8_t
> port_id)
>   if (internals->mode == BONDING_MODE_8023AD)
>   bond_mode_8023ad_activate_slave(eth_dev, port_id);
> 
> - internals->active_slaves[internals->active_slave_count] = port_id;
> - internals->active_slave_count++;
> + if (internals->active_slave_count <
> + RTE_DIM(internals->active_slaves) - 1) {
> + internals->active_slaves[internals->active_slave_count] =
> port_id;
> + internals->active_slave_count++;
> + }
>  }
> 
>  void
> @@ -144,7 +147,8 @@ deactivate_slave(struct rte_eth_dev *eth_dev, uint8_t
> port_id)
>   sizeof(internals->active_slaves[0]));
>   }
> 
> - internals->active_slave_count = active_count;
> + internals->active_slave_count = active_count < RTE_MAX_ETHPORTS ?
> + active_count : RTE_MAX_ETHPORTS - 1;

Since port might not be added twice and active_slaves array is (should be)
 proper size to contain every port you can add to bonding and in fact is
one element bigger and active_slave_count should newer overflow, those
changes might only mask real problems in user application and/or library itself.
I think if you want to make this static analysis tool happy it should be changed
to RTE_VERIFY(), assert(), rte_panic() or something like that to indicate
undefined state.

Pawel


[dpdk-dev] Added Spinlock to l3fwd-vf example to prevent race conditioning

2014-12-10 Thread Wodkowski, PawelX
> Though, that said, doesn't it seem to anyone else like serialization of 
> enqueue
> to a port should be the responsibility of the library, not the application?
> 
> Neil

>From my knowledge it is an application  responsibility to serialize access to
queue on particular port.

Pawel


[dpdk-dev] error: value computed is not used

2014-12-08 Thread Wodkowski, PawelX


> -Original Message-
> From: Qiu, Michael
> Sent: Monday, December 08, 2014 4:24 PM
> To: Wodkowski, PawelX; dev at dpdk.org
> Subject: Re: error: value computed is not used
> 
> On 2014/12/8 19:00, Wodkowski, PawelX wrote:
> >> lib/librte_pmd_enic/enic_main.c: In function 'enic_set_rsskey':
> >> lib/librte_pmd_enic/enic_main.c:862:2: error: value computed is not used
> >>
> >> I dig out that, it was ome issue of  the macros rte_memcpy()
> >> #define rte_memcpy(dst, src, n)  \
> >> ((__builtin_constant_p(n)) ?  \
> >> memcpy((dst), (src), (n)) :  \
> >> rte_memcpy_func((dst), (src), (n)))
> >>
> >> When I use only (n) instead of (__builtin_constant_p(n), it will pass( I
> >> know that it was incorrect, just a experiment).
> >>
> >> But I try to use inline function instead of macros:
> >> static inline void * rte_memcpy(void *dst, const void *src, size_t n)
> >> {
> >> return __builtin_constant_p(n) ? memcpy(dst, src, n) :
> >>  rte_memcpy_func(dst, src, n);
> >> }
> >>
> >> It will pass:), and works, this could be one potential workaround fix.
> >>
> >> Who knows why? The root cause is what?
> >>
> >> I've no idea about this.
> >>
> > I got the same issue while ago. I don't remember exactly everything
> > but my conclusion was that there was some bug in compiler. I think,
> > when 'n' I constant and/or small compiler is inlining memcpy and throwing
> > everything else (including returned value). In that case error is not
> > produced (I think this is a bug in compiler). In other case it is computing
> > some value calling memcpy or rte_ memcpy and you should at least
> > explicitly throw it away by casting to void. I like solution with static
> 
> Actually, I try to pass "n" as a Int value like 4, it still report this
> error :)

My workaround was:
(void) rte_memcpy(...);

But this is only a workaround.

> 
> > inline but someone else should spoke about possible side effects.
> 
> Yes, but as I know inline is better than macros.
> 
> Thanks,
> Michael
> >
> > Pawel
> >
> >



[dpdk-dev] error: value computed is not used

2014-12-08 Thread Wodkowski, PawelX
> lib/librte_pmd_enic/enic_main.c: In function 'enic_set_rsskey':
> lib/librte_pmd_enic/enic_main.c:862:2: error: value computed is not used
> 
> I dig out that, it was ome issue of  the macros rte_memcpy()
> #define rte_memcpy(dst, src, n)  \
> ((__builtin_constant_p(n)) ?  \
> memcpy((dst), (src), (n)) :  \
> rte_memcpy_func((dst), (src), (n)))
> 
> When I use only (n) instead of (__builtin_constant_p(n), it will pass( I
> know that it was incorrect, just a experiment).
> 
> But I try to use inline function instead of macros:
> static inline void * rte_memcpy(void *dst, const void *src, size_t n)
> {
> return __builtin_constant_p(n) ? memcpy(dst, src, n) :
>  rte_memcpy_func(dst, src, n);
> }
> 
> It will pass:), and works, this could be one potential workaround fix.
> 
> Who knows why? The root cause is what?
> 
> I've no idea about this.
> 

I got the same issue while ago. I don't remember exactly everything
but my conclusion was that there was some bug in compiler. I think,
when 'n' I constant and/or small compiler is inlining memcpy and throwing
everything else (including returned value). In that case error is not
produced (I think this is a bug in compiler). In other case it is computing
some value calling memcpy or rte_ memcpy and you should at least
explicitly throw it away by casting to void. I like solution with static
inline but someone else should spoke about possible side effects.

Pawel



[dpdk-dev] [PATCH] bond: fix for mac assignment to slaves device

2014-12-08 Thread Wodkowski, PawelX
Some formatting issues during posting. I was talking about parenthesis in 
count calculation

 (sizeof(internals->slaves[0]) *
internals->slave_count - i - 1));


[dpdk-dev] [PATCH v6 4/6] enicpmd: pmd specific code

2014-11-27 Thread Wodkowski, PawelX
> diff --git a/lib/librte_pmd_enic/enic_main.c b/lib/librte_pmd_enic/enic_main.c
> new file mode 100644
> index 000..c047cc8
> --- /dev/null
> +++ b/lib/librte_pmd_enic/enic_main.c
> @@ -0,0 +1,1266 @@
> +/*
> + * Copyright 2008-2014 Cisco Systems, Inc.  All rights reserved.


> +#ident "$Id$"
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#ifdef RTE_EAL_VFIO
> +#include 
> +#endif
> +

This gives compilation error: lib/librte_pmd_enic/enic_main.c:43:24: fatal 
error: linux/vfio.h: No such file or directory

Ubuntu 12.04
Linux grizzly 3.11.0-26-generic #45~precise1-Ubuntu SMP Tue Jul 15 04:02:35 UTC 
2014 x86_64 x86_64 x86_64 GNU/Linux

lib/librte_eal/linuxapp/eal/eal_vfio.h gives some tips how vfio might be 
included.

Pawel


[dpdk-dev] [PATCH v3 0/2] bond: mode 4 support

2014-11-07 Thread Wodkowski, PawelX


Pawe?


> -Original Message-
> From: Wodkowski, PawelX
> Sent: Friday, November 07, 2014 16:05
> To: De Lara Guarch, Pablo; Jastrzebski, MichalX K
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v3 0/2] bond: mode 4 support
> 
> > Since Declan sent a v6 for link bonding changes, maybe it is a good idea
> > to send another version, as v5 will not be merged.
> >
> > Pablo
> Hi Pablo
> 
> I compared v6 patches vs patches produced by git rebase v5 patches
> on top which I used to produce mode 4 v3 patches and I think they are
> compatible. There is one difference is in unit tests but, fortunately I
> do not touch unit tests yet so they are be applicable. I can't fully verify
> that as I am unable to apply patches in v6.
> If you think this is insufficient perhaps I should wait for Declan's patches
> be merged into master branch.
> 
> $ git apply d_v6/0007*.patch -v
> Checking patch app/test-pmd/cmdline.c...
> Checking patch app/test/test.h...
> Checking patch app/test/test_link_bonding.c...
> Checking patch app/test/virtual_pmd.c...
> Checking patch app/test/virtual_pmd.h...
> Checking patch doc/guides/testpmd_app_ug/testpmd_funcs.rst...
> error: doc/guides/testpmd_app_ug/testpmd_funcs.rst: No such file or directory
> Checking patch lib/librte_pmd_bond/rte_eth_bond.h...
> Checking patch lib/librte_pmd_bond/rte_eth_bond_api.c...
> Checking patch lib/librte_pmd_bond/rte_eth_bond_args.c...
> Checking patch lib/librte_pmd_bond/rte_eth_bond_pmd.c...
> Checking patch lib/librte_pmd_bond/rte_eth_bond_private.h...
> 

Ok, I did manage to apply v6 patches after pulling latest commits. Mode 4
patches are applicable also for Declan's v6 patches.

Pawel


[dpdk-dev] [PATCH v3 0/2] bond: mode 4 support

2014-11-07 Thread Wodkowski, PawelX
> Since Declan sent a v6 for link bonding changes, maybe it is a good idea
> to send another version, as v5 will not be merged.
> 
> Pablo
Hi Pablo

I compared v6 patches vs patches produced by git rebase v5 patches 
on top which I used to produce mode 4 v3 patches and I think they are
compatible. There is one difference is in unit tests but, fortunately I
do not touch unit tests yet so they are be applicable. I can't fully verify
that as I am unable to apply patches in v6.
If you think this is insufficient perhaps I should wait for Declan's patches
be merged into master branch.

$ git apply d_v6/0007*.patch -v
Checking patch app/test-pmd/cmdline.c...
Checking patch app/test/test.h...
Checking patch app/test/test_link_bonding.c...
Checking patch app/test/virtual_pmd.c...
Checking patch app/test/virtual_pmd.h...
Checking patch doc/guides/testpmd_app_ug/testpmd_funcs.rst...
error: doc/guides/testpmd_app_ug/testpmd_funcs.rst: No such file or directory
Checking patch lib/librte_pmd_bond/rte_eth_bond.h...
Checking patch lib/librte_pmd_bond/rte_eth_bond_api.c...
Checking patch lib/librte_pmd_bond/rte_eth_bond_args.c...
Checking patch lib/librte_pmd_bond/rte_eth_bond_pmd.c...
Checking patch lib/librte_pmd_bond/rte_eth_bond_private.h...

Pawel


[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-30 Thread Wodkowski, PawelX
> -Original Message-
> Pawe?
> 
> > On Mon, Sep 29, 2014 at 10:11:38AM +0000, Wodkowski, PawelX wrote:
> > > > >
> > > > > Image how you will be damned by someone that not even notice you
> > change
> > > > > and he Is managing some kind of resource based on returned number of
> > > > > set/canceled timers. If you suddenly start returning negative values 
> > > > > how
> > those
> > > > > application will behave? Silently changing returned value domain is 
> > > > > evil in
> > its
> > > > > pure form.
> > > >
> > > > As I can see the impact is very limited.
> > >
> > > It is small impact to DPDK but can be huge to user application:
> >
> > This is why we traditionally have in the release-notes for each release a
> > section dedicated to calling out changes from one release to another. [See
> > http://dpdk.org/doc/intel/dpdk-release-notes-1.7.0.pdf section 5]. Since
> > from release-to-release there are generally only a couple of changes -
> > though our next release may be a little different - the actual changes are
> > clear enough to read about without wading through pages of documentation.
> I
> > thinking calling out the change in both the release notes and the API docs
> > is sufficient even for a change like this.
> >
> > Basically, I wouldn't let API stability factor in too much in trying to get
> > a proper fix for this issue.
> >
> > /Bruce
> >
> 
> Summarizing all proposed solutions and to be able to produce final patch - 
> what
> Is desired behavior after fix?
> 
> 1. do we leave as is in Patch v2:
> 1.1 if canceling from other thread - if one of the alarms is executing, wait 
> to
>   finish its execution and then cancel any rearmed alarms.
> 1.2 if canceling from alarm handler and one of the alarms to cancel is this
>   executing callback do no wait for it to finish and cancel anything else.
> 
>  in both cases return number of canceled callbacks.
> 
> 2. Do exactly like in 1. but return -EINPROGRESS instead of canceled alarms
>   if one of the alarms to cancel is currently executing callback from alarm 
> thread
>   (information about number of canceled alarms will be lost).

Or instead of returning -EINPROGRESS set errno to EINPROGRESS (replace
returning error value by setting errno and function can always return number
of canceled callbacks - in error condition 0)?

> 
> 3. refuse to cancel anything if canceling currently executing alarm from alarm
>   callback and return -EINPROGRESS otherwise do like in 1.1.
> 
> 4. Implement behaviour 1/2/3 (which?) and add API call to interrogate list of
>   alarms and retrun state of given alarms(s).
> 
> 5. other solutions?
> 
> Pawel


[dpdk-dev] [PATCH v2] bond: Add mode 4 support.

2014-09-30 Thread Wodkowski, PawelX
Fixed patch version sent.

Pawel


[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-30 Thread Wodkowski, PawelX


Pawe?

> -Original Message-
> From: Richardson, Bruce
> Sent: Monday, September 29, 2014 12:33
> To: Wodkowski, PawelX
> Cc: Ananyev, Konstantin; Neil Horman; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] Change alarm cancel function to thread-
> safe:
> 
> On Mon, Sep 29, 2014 at 10:11:38AM +, Wodkowski, PawelX wrote:
> > > >
> > > > Image how you will be damned by someone that not even notice you
> change
> > > > and he Is managing some kind of resource based on returned number of
> > > > set/canceled timers. If you suddenly start returning negative values how
> those
> > > > application will behave? Silently changing returned value domain is 
> > > > evil in
> its
> > > > pure form.
> > >
> > > As I can see the impact is very limited.
> >
> > It is small impact to DPDK but can be huge to user application:
> 
> This is why we traditionally have in the release-notes for each release a
> section dedicated to calling out changes from one release to another. [See
> http://dpdk.org/doc/intel/dpdk-release-notes-1.7.0.pdf section 5]. Since
> from release-to-release there are generally only a couple of changes -
> though our next release may be a little different - the actual changes are
> clear enough to read about without wading through pages of documentation. I
> thinking calling out the change in both the release notes and the API docs
> is sufficient even for a change like this.
> 
> Basically, I wouldn't let API stability factor in too much in trying to get
> a proper fix for this issue.
> 
> /Bruce
> 

Summarizing all proposed solutions and to be able to produce final patch - what
Is desired behavior after fix?

1. do we leave as is in Patch v2:
1.1 if canceling from other thread - if one of the alarms is executing, wait to 
  finish its execution and then cancel any rearmed alarms.
1.2 if canceling from alarm handler and one of the alarms to cancel is this 
  executing callback do no wait for it to finish and cancel anything else.

 in both cases return number of canceled callbacks.

2. Do exactly like in 1. but return -EINPROGRESS instead of canceled alarms
  if one of the alarms to cancel is currently executing callback from alarm 
thread
  (information about number of canceled alarms will be lost).

3. refuse to cancel anything if canceling currently executing alarm from alarm 
  callback and return -EINPROGRESS otherwise do like in 1.1.

4. Implement behaviour 1/2/3 (which?) and add API call to interrogate list of
  alarms and retrun state of given alarms(s).

5. other solutions?

Pawel


[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-29 Thread Wodkowski, PawelX
> Yes, this is my concern exactly.
> 
> >  If that's so, then I suppose we can do: make alarm_cancel() to return a
> negative value for the case #3 (-EINPROGRESS or something).
> >  Something like:
> > ...
> > if (ap->executing == 0) {
> >LIST_REMOVE(ap,next);
> > rte_free(ap);
> > count++;
> > ap = ap_prev;
> > } else if (pthread_equal(ap->executing_id, pthread_self()) == 0) {
> > executing++;
> > } else {
> >ret = -EINPROGRESS;
> > }
> > ...
> > return ((ret != 0) ? ret : count);
> >
> > So the return value  will be > 0 for #1, 0 for #2, <0 for #3.
> > As I remember, you already suggested something similar in one of the 
> > previous
> mails.
> Yes, I rolled the API changes I suggested in with this model, because I wanted
> to be able to do precise specification of a timer instance to cancel, but if
> we're not ready to make that change, I think what you propose above would be
> suffficient.  Theres some question as to weather we would cancel timers that
> are
> still pending on a return of -EINPROGRESS, but I think if we document it
> accordingly, then it can be worked out just fine.
> 
> Best
> Neil
> 

Image how you will be damned by someone that not even notice you change
and he Is managing some kind of resource based on returned number of 
set/canceled timers. If you suddenly start returning negative values how those
application will behave? Silently changing returned value domain is evil in its 
pure form.

>From my point of view, problem is virtual because this is user application 
>task to 
know what it can and what it not. If you really want to inform user application
about timer state you can introduce API call which will interrogate timers list
and return appropriate value, but for god sake, do not introduce untraceable 
bugs.

Pawel


[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Wodkowski, PawelX
> > > Maybe I don't see something obvious? :)
> 
> I think you're missing the fact that your patch doesn't do what you assert 
> above
> either :)

Issue is not in setting alarms but canceling it. If you look closer to my patch 
you
see that it address this issue (look at added *do { lock(); ; unlock(); } 
while( )* 
part).

> 
> First, lets address rte_alarm_set.  There is no notion of "re-arming" in this
> alarm implementation, because theres no ability to refer to a specific alarm
> from the callers perspective.  When you call rte_eal_alarm_set you get a new
> alarm every time.  So I don't really see a race there.  It might not be 
> exactly
> the behavior you want, but its not a race, becuase you're not modifying an
> alarm
> in the middle of execution, you're just creating a new alarm, which is safe.

OK, it is safe, but this is not the case.

> 
> There is a race in what you describe above, insofar as its possible that you
> might call rte_eal_alarm_cancel and return without having canceled all the
> matching alarms.  I don't see any clear documentation on what the behavior is
> supposed to be, but if you want to ensure that all matching alarms are 
> cancelled
> or complete on return from rte_eal_alarm_cancel, thats perfectly fine (in 
> linux
> API parlance, thats usually denoted as a cancel_sync operation).

Again, look at the patch. I changed documentation to inform about this behavior.

> 
> For that race condition, you're correct, my patch doesn't address it, I see 
> that
> now.  Though your patch doesn't either.  If you call rte_eal_alarm_cancel from
> within a callback function, then, by definition, you can't wait on the
> completion of the active alarm, because thats a deadlock.  Its a necessecary
> evil, I grant you, but it means that you can't be guaranteed the cancelled and
> complete (cancel_sync) behavior that you want, at least not with the current
> api.  If you want that behavior, you need to do one of two things:

This patch does not break any API. It only removes undefined behavior.

> 
> 1) Modify the api to allow callers to individually reference timer instances, 
> so
> that when cancelling, we can return an appropriate return code to indicate to
> the caller that this alarm is in-progress.  That way you can guarantee the
> caller that the specific alarm that you cancelled is either complete and 
> cancelled
> or currently executing.  Add an api to expicitly wait on a referenced alarm as
> well.  This allows developers to know that, when executing an alarm callback, 
> an
> -ECURRENTLYEXECUTING return code is ok, because they are in the currently
> executing context.

This would brake API for sure.



[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Wodkowski, PawelX
> So basically cancel() just set ALARM_CANCELLED and leaves actual alarm
> deletion to the callback()?
> That was the thought, yes.
> 
> > I think it is doable - but I don't see any real advantage with that 
> > approach.
> > Yes, code will become a bit simpler, as  we'll have one point when we remove
> alarm from the list.
> Yes, that would be the advantage, that the code would be much simpler.
> 
> > But from other side, imagine such simple test-case:
> >
> > for (i = 0; i < 0x10; i++) {
> >rte_eal_alarm_set(ONE_MIN, cb_func, (void *)i);
> >rte_eal_alarm_cancel(cb_func, (void *)i);
> > }
> >
> > We'll endup with 1M of cancelled, but still not removed entries in the
> alarm_list.
> > With current implementation that means - few MBs of wasted memory,
> Thats correct, and the tradeoff to choose between.  Do you want simpler code
> that is easier to maintain, or do you want a high speed cancel and set
> operation.  I'm not aware of all the use cases, but I have a hard time seeing
> a use case in which the in-flight alarm list grows unboundedly large, which in
> my mind mitigates the risk of deferred removal, but I'm perfectly willing to
> believe that there are use cases which I'm not aware of.
> 
> > plus very slow set() and cancel(), as they'll  have to traverse all entries 
> > in the
> list.
> > And all that - for empty from user perspective alarm_list
> > So I still prefer Michal's way.
> > After all, it doesn't look that complicated to me.
> Except that the need for Michals fix arose from the fact that we have two free
> locations that might both get called depending on the situation.  Thats what 
> I'm
> trying to address here, the complexity itself, rather than the fix (which I
> agree is perfectly valid).
> 
> > BTW, any particular reason you are so negative about pthread_self()?
> >
> Nothing specifically against it (save for its inverted return code sense, 
> which
> made it difficult for me to parse when reviewing).  Its more the complexity
> itself in the alarm cancel and callback routine that I was looking at.  Given
> that the origional bug happened because an cancel in a callback might produce 
> a
> double free, I wanted to fix it by simpifying the code, not adding conditions
> which make it more complex.
> 
> You know, looking at it, something else just occured to me.  I think this 
> could
> all be fixed without the cancel flag or the pthread_self assignments.  What if
> we simply removed the alarm from the list before we called the callback in
> rte_eal_alarm_callback()?  That way any cancel operation called from within 
> the
> callback would fail, as it wouldn't appear on the list, and the callback
> operation would be the only freeing entity?  That would let you still have a
> fast set and cancel, and avoid the race.  Thoughts?  Untested sample patch
> below
> 
> 
> > >
> > > It also seems like the alarm api as a whole could use some improvement.
> The
> > > way its written right now, theres no way to refer to a specific alarm 
> > > (i.e.
> > > cancelation relies on the specification of a function and data pointer, 
> > > which
> > > may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an 
> > > opaque
> > > handle to a unique timer instance that can be store by a caller and used 
> > > to
> > > specfically cancel that timer?  Thats how both the bsd and linux timer
> > > subsystems model timers.
> >
> > Yeh,  alarm API looks a bit unusual.
> > Though, I suppose that's subject for another patch/discussion :)
> >
> Yes, agreed :)
> 

Please read quoted message bellow:

> >
> >
> > His solution *does* eliminate race condition too.
> >
> I applied his patch. And here is the problem
> 1 rte_spinlock_lock(_list_lk);
> 2 while ((ap = LIST_FIRST(_list)) !=NULL &&
> 3 gettimeofday(, NULL) == 0 &&
> 4 (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> now.tv_sec &&
> 5 ap->time.tv_usec <=
> now.tv_usec))){
> 6 ap->executing |= ALARM_EXECUTING;
> 7 if (likely(!(ap->executing & ALARM_CANCELLED))) {
> 8 rte_spinlock_unlock(_list_lk);
> 9   //another thread: rte_alarm_cancel called, mark 
> this timer
> canceled and exit ( THE RACE)
> 10ap->cb_fn(ap->cb_arg); // rte_alarm_set called
> (THE RACE)
> 11
> 12rte_spinlock_lock(_list_lk);
> 13}
> 14
> 15rte_spinlock_lock(_list_lk);
> 16LIST_REMOVE(ap, next);
> 17rte_free(ap);
> 18}
> 
> Imagine
> 
> Thread 1: Thread2
> Execute eal_alarm_callback
> Lock list at 1   rte_alarm_cancel -> block on 
> spinlock
> 
> Realease lock at line 8  rte_alarm_cancel -> resumes 
> execution -> it
> mark this timer canceled
> ap->cb_fn is called at line 10   rte_alarm_cancel 

[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Wodkowski, PawelX
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > index 480f0cb..73b6dc5 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> > @@ -64,6 +64,9 @@
> >  #define MS_PER_S 1000
> >  #define US_PER_S (US_PER_MS * MS_PER_S)
> >
> > +#define ALARM_EXECUTING (1 << 0)
> > +#define ALARM_CANCELLED (1 << 1)
> > +
> >  struct alarm_entry {
> > LIST_ENTRY(alarm_entry) next;
> > struct timeval time;
> > @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl
> > __rte_unused,
> > gettimeofday(, NULL) == 0 &&
> > (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> > now.tv_sec &&
> > ap->time.tv_usec <=
> > now.tv_usec))){
> > -   ap->executing = 1;
> > -   rte_spinlock_unlock(_list_lk);
> 
> Removing unlock here introduce deadlock.

I does no spotted unlocking bellow so above is invalid.

> 
> > +   ap->executing |= ALARM_EXECUTING;
> > +   if (likely(!(ap->executing & ALARM_CANCELLED)) {
> > +   rte_spinlock_unlock(_list_lk);
> >
> > -   ap->cb_fn(ap->cb_arg);
> > +   ap->cb_fn(ap->cb_arg);
> >
> > -   rte_spinlock_lock(_list_lk);
> > +   rte_spinlock_lock(_list_lk);
> > +   }
> > LIST_REMOVE(ap, next);
> > rte_free(ap);
> > }
> > @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> > rte_spinlock_lock(_list_lk);
> > /* remove any matches at the start of the list */
> > while ((ap = LIST_FIRST(_list)) != NULL &&
> > -   cb_fn == ap->cb_fn && ap->executing == 0 &&
> > +   cb_fn == ap->cb_fn &&
> > (cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> > -   LIST_REMOVE(ap, next);
> > -   rte_free(ap);
> > +   ap->executing |= ALARM_CANCELLED;
> > count++;
> > }
> > ap_prev = ap;
> > @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> > void *cb_arg)
> > /* now go through list, removing entries not at start */
> > LIST_FOREACH(ap, _list, next) {
> > /* this won't be true first time through */
> > -   if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> > +   if (cb_fn == ap->cb_fn &&
> > (cb_arg == (void *)-1 || cb_arg == ap->cb_arg))
> > {
> > -   LIST_REMOVE(ap,next);
> > -   rte_free(ap);
> > +   ap->executing |= ALARM_CANCELLED;
> > count++;
> > ap = ap_prev;
> > }
> 
> Pawel


[dpdk-dev] [PATCH v2] Change alarm cancel function to thread-safe:

2014-09-26 Thread Wodkowski, PawelX
> Given what you said above, I agree, at least in the current implementation.  
> It
> still seems like theres a simpler solution that doesn't require all the
> comparative gymnastics.

Yes, there is simpler solution, but this solution involve recursive locking.
DPDK recursive spinlocks are no an option in here, so only option is posix 
recursive
mutex, which I think is even worst option than this gymnastics.

> 
> What if, instead of testing if you're the callback thread, we turn the 
> executing
> field of alarm_entry into a bitfield, where bit 0 represents the former
> "executing" state, and bit 1 is defined as a "cancelled" bit.  Then
> rte_eal_alarm_cancel becomes a search that, when an alarm is found simply or's
> in the cancelled bit to the executing bit field.  When the callback thread 
> runs,
> it skips executing any alarm that is marked as cancelled, but frees all alarm
> entries that are executed or cancelled.  That gives us a single point at which
> frees of alarm entires happen?  Something like the patch below (completely
> untested)?
> 
> It also seems like the alarm api as a whole could use some improvement.  The
> way its written right now, theres no way to refer to a specific alarm (i.e.
> cancelation relies on the specification of a function and data pointer, which
> may refer to multiple timers).  Shouldn't rte_eal_alarm_set return an opaque
> handle to a unique timer instance that can be store by a caller and used to
> specfically cancel that timer?  Thats how both the bsd and linux timer
> subsystems model timers.
> 

Goal was to not break user applications that use this library.

> 
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> index 480f0cb..73b6dc5 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_alarm.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_alarm.c
> @@ -64,6 +64,9 @@
>  #define MS_PER_S 1000
>  #define US_PER_S (US_PER_MS * MS_PER_S)
> 
> +#define ALARM_EXECUTING (1 << 0)
> +#define ALARM_CANCELLED (1 << 1)
> +
>  struct alarm_entry {
>   LIST_ENTRY(alarm_entry) next;
>   struct timeval time;
> @@ -107,12 +110,14 @@ eal_alarm_callback(struct rte_intr_handle *hdl
> __rte_unused,
>   gettimeofday(, NULL) == 0 &&
>   (ap->time.tv_sec < now.tv_sec || (ap->time.tv_sec ==
> now.tv_sec &&
>   ap->time.tv_usec <=
> now.tv_usec))){
> - ap->executing = 1;
> - rte_spinlock_unlock(_list_lk);

Removing unlock here introduce deadlock.

> + ap->executing |= ALARM_EXECUTING;
> + if (likely(!(ap->executing & ALARM_CANCELLED)) {
> + rte_spinlock_unlock(_list_lk);
> 
> - ap->cb_fn(ap->cb_arg);
> + ap->cb_fn(ap->cb_arg);
> 
> - rte_spinlock_lock(_list_lk);
> + rte_spinlock_lock(_list_lk);
> + }
>   LIST_REMOVE(ap, next);
>   rte_free(ap);
>   }
> @@ -209,10 +214,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> void *cb_arg)
>   rte_spinlock_lock(_list_lk);
>   /* remove any matches at the start of the list */
>   while ((ap = LIST_FIRST(_list)) != NULL &&
> - cb_fn == ap->cb_fn && ap->executing == 0 &&
> + cb_fn == ap->cb_fn &&
>   (cb_arg == (void *)-1 || cb_arg == ap->cb_arg)) {
> - LIST_REMOVE(ap, next);
> - rte_free(ap);
> + ap->executing |= ALARM_CANCELLED;
>   count++;
>   }
>   ap_prev = ap;
> @@ -220,10 +224,9 @@ rte_eal_alarm_cancel(rte_eal_alarm_callback cb_fn,
> void *cb_arg)
>   /* now go through list, removing entries not at start */
>   LIST_FOREACH(ap, _list, next) {
>   /* this won't be true first time through */
> - if (cb_fn == ap->cb_fn &&  ap->executing == 0 &&
> + if (cb_fn == ap->cb_fn &&
>   (cb_arg == (void *)-1 || cb_arg == ap->cb_arg))
> {
> - LIST_REMOVE(ap,next);
> - rte_free(ap);
> + ap->executing |= ALARM_CANCELLED;
>   count++;
>   ap = ap_prev;
>   }

Pawel


[dpdk-dev] [PATCH 2/2] bond: add mode 4 support

2014-09-22 Thread Wodkowski, PawelX
> I think that will work, but I believe you're making it more complicated (and
> less reusable) than it needs to be.  What I think you really need to do is
> create a new rte api call, rte_eal_alarm_cancel_sync (something like the
> equivalent of del_timer_sync in linux, that wraps up the
> while(rte_eal_alarm_cancel(...) == 0) {rte_pause} in its own function (so 
> other
> call sites can use it, as I don't think this is an uncommon problem), Then 
> just
> create a bonding-internal state flag to signal the periodic callback that it
> shouldn't re-arm the timer.  That way all you have to do is set the flag, and
> call rte_eal_alarm_cancel_sync, and you're done.  And other applications will 
> be
> able to handle this common type of operation as well
> 
> Neil

I agree with you that alarms should be upgraded but this need to discussed and 
tested. Now there is not time for that.

Thank you
Pawel


[dpdk-dev] [PATCH 2/2] bond: add mode 4 support

2014-09-19 Thread Wodkowski, PawelX
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Thursday, September 18, 2014 18:03
> To: Wodkowski, PawelX
> Cc: dev at dpdk.org; Jastrzebski, MichalX K; Doherty, Declan
> Subject: Re: [dpdk-dev] [PATCH 2/2] bond: add mode 4 support
> 
> On Thu, Sep 18, 2014 at 08:07:31AM +, Wodkowski, PawelX wrote:
> > > > +int
> > > > +bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *bond_dev,
> > > > +   uint8_t slave_pos)
> > > > +{
> > > > +   struct bond_dev_private *internals = 
> > > > bond_dev->data->dev_private;
> > > > +   struct mode8023ad_data *data = >mode4;
> > > > +   struct port *port;
> > > > +   uint8_t i;
> > > > +
> > > > +   bond_mode_8023ad_stop(bond_dev);
> > > > +
> > > > +   /* Exclude slave from transmit policy. If this slave is an 
> > > > aggregator
> > > > +* make all aggregated slaves unselected to force sellection 
> > > > logic
> > > > +* to select suitable aggregator for this port   */
> > > > +   for (i = 0; i < internals->active_slave_count; i++) {
> > > > +   port = >port_list[slave_pos];
> > > > +   if (port->used_agregator_idx == slave_pos) {
> > > > +   port->selected = UNSELECTED;
> > > > +   port->actor_state &= ~(STATE_SYNCHRONIZATION |
> > > STATE_DISTRIBUTING |
> > > > +   STATE_COLLECTING);
> > > > +
> > > > +   /* Use default aggregator */
> > > > +   port->used_agregator_idx = i;
> > > > +   }
> > > > +   }
> > > > +
> > > > +   port = >port_list[slave_pos];
> > > > +   timer_cancel(>current_while_timer);
> > > > +   timer_cancel(>periodic_timer);
> > > > +   timer_cancel(>wait_while_timer);
> > > > +   timer_cancel(>tx_machine_timer);
> > > > +
> > > These all seem rather racy.  Alarm callbacks are executed with the alarm 
> > > list
> > > locks not held.  So there is every possibility that you could execute 
> > > these (or
> > > any timer_cancel calls in this PMD in parallel with the internal state 
> > > machine
> > > timer callback, and leave either with a corrupted timer list (resulting 
> > > from a
> > > double free between here, and the actual callback site),
> >
> > I don't think so. Yes, callbacks are executed with  alarm list locks not 
> > held, but
> > this is not the issue because access to list itself is guarded by lock and
> > ap->executing variable. So list will not be trashed. Check source of
> > eal_alarm_callback(), rte_eal_alarm_set() and rte_eal_alarm_cancel().
> >
> Yes, you're right, the list is probably safe wht the executing bit.
> 
> > > or a timer that is
> > > actually still pending when a slave is removed.
> > >
> > This is not the issue also, but problem might be similar. I assumed that 
> > alarms
> > are atomic but when I looked at rte alarms closer I saw a race condition
> > between and rte_eal_alarm_cancel() from  bond_mode_8023ad_stop()
> > and rte_eal_alarm_set() from state machines callback. This need to be
> > reworked in some way.
> 
> Yes, this is what I was referring to:
> 
> CPU0  CPU1
> rte_eal_alarm_callbackbond_8023ad_deactivate_slave
> -bond_8023_ad_periodic_cb timer_cancel
> timer_set
> 
> If those timer functions operate on the same timer, the result is that you can
> leave the stop/deactivate slave paths with a timer function for that slave 
> still
> pending. The bonding mode needs some internal state to serialize those
> operations and determine if the timer should be reactivated.
> 
> Neil

I did rethink the issue and problem is much simpler than it looks like. I did 
the 
following:
1. Change internal state machine alarms to use rte_rdtsc(). This makes all 
 mode 4 internal timer_*() function not affected by any race condition.
2. Do a busy loop when canceling main callback timer until cancel is 
successfull.
This should do the trick about race condition. Do you agree?

Here is part involving timers I have changed:

static void
-timer_expired_cb(void *arg)
+timer_stop(uint64_t *timer)
 {
-   enum timer_state *timer = arg;
-
-   BOND_ASSERT(*timer == TIMER_RUNNING);
-  

[dpdk-dev] [PATCH 2/2] bond: add mode 4 support

2014-09-18 Thread Wodkowski, PawelX
> > +int
> > +bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *bond_dev,
> > +   uint8_t slave_pos)
> > +{
> > +   struct bond_dev_private *internals = bond_dev->data->dev_private;
> > +   struct mode8023ad_data *data = >mode4;
> > +   struct port *port;
> > +   uint8_t i;
> > +
> > +   bond_mode_8023ad_stop(bond_dev);
> > +
> > +   /* Exclude slave from transmit policy. If this slave is an aggregator
> > +* make all aggregated slaves unselected to force sellection logic
> > +* to select suitable aggregator for this port   */
> > +   for (i = 0; i < internals->active_slave_count; i++) {
> > +   port = >port_list[slave_pos];
> > +   if (port->used_agregator_idx == slave_pos) {
> > +   port->selected = UNSELECTED;
> > +   port->actor_state &= ~(STATE_SYNCHRONIZATION |
> STATE_DISTRIBUTING |
> > +   STATE_COLLECTING);
> > +
> > +   /* Use default aggregator */
> > +   port->used_agregator_idx = i;
> > +   }
> > +   }
> > +
> > +   port = >port_list[slave_pos];
> > +   timer_cancel(>current_while_timer);
> > +   timer_cancel(>periodic_timer);
> > +   timer_cancel(>wait_while_timer);
> > +   timer_cancel(>tx_machine_timer);
> > +
> These all seem rather racy.  Alarm callbacks are executed with the alarm list
> locks not held.  So there is every possibility that you could execute these 
> (or
> any timer_cancel calls in this PMD in parallel with the internal state machine
> timer callback, and leave either with a corrupted timer list (resulting from a
> double free between here, and the actual callback site),

I don't think so. Yes, callbacks are executed with  alarm list locks not held, 
but 
this is not the issue because access to list itself is guarded by lock and 
ap->executing variable. So list will not be trashed. Check source of 
eal_alarm_callback(), rte_eal_alarm_set() and rte_eal_alarm_cancel().

> or a timer that is
> actually still pending when a slave is removed.
> 
This is not the issue also, but problem might be similar. I assumed that alarms
are atomic but when I looked at rte alarms closer I saw a race condition
between and rte_eal_alarm_cancel() from  bond_mode_8023ad_stop()
and rte_eal_alarm_set() from state machines callback. This need to be 
reworked in some way.


[dpdk-dev] [PATCH] ixgbe_rx_scan_hw_ring: Fix initializing id and hash fields in flow director mode.

2014-09-08 Thread Wodkowski, PawelX
> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Friday, August 29, 2014 17:41
> To: dev at dpdk.org
> Cc: Wodkowski, PawelX
> Subject: Re: [dpdk-dev] [PATCH] ixgbe_rx_scan_hw_ring: Fix initializing id and
> hash fields in flow director mode.
> 
> Hi,
> 
> 2014-08-20 08:46, Pawel Wodkowski:
> > When Flow Director was used together with bulk alloc, id and hash was
> swapped
> > when packet matches flow director filter due to improper fdir field
> initialization.
> >
> > Signed-off-by: Pawel Wodkowski 
> 
> I'd really like someone else review this touchy patch for integration
> in 1.7.1.
> 
> Thanks
> --
> Thomas

Reviewed-by: Helin Zhang 
Reviewed-by: Jingjing Wu 


[dpdk-dev] SRIOV mode and different RX and TX configuration

2014-08-11 Thread Wodkowski, PawelX
> Do you mean the configurable number of rx/tx queues in VF? For Niantic,
> hardware just supports only one queue in VF, so there is no flexibility for 
> that.
> For later NICs like i40e, we will have that flexibility.

Yes, you are right but only in when DCB and RSS/TSS are off. When using DCB 
and/or RSS/TSS you have some number TCs/TSS pools x 1 queue.
I am asking if configuring RX side and TX side in different mode is permitted 
and make sense. From Niantic datasheet I did not seen any restrictions on this.
Some part of ixgbe pmd driver code make difference in that and some don't.
I need to unify this in one direction before enabling DCB in SR-IOV mode.

Pawel


[dpdk-dev] About round trip latency with DPDK

2014-07-24 Thread Wodkowski, PawelX
Refer to DPDK getting started guide paragraph 5.4. It might help.
Also it might be easier to do write simple application that send a packet on 
port 1 and rx it on port 2 in separate threads on separate cores (simple 
physical loop). You can then add timestamp and send packet back and see how 
long it will take TX to RX and RX to TX process.

Pawel Wodkowski
--
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender and delete all copies.