[dpdk-dev] [PATCH v2] cryptodev: fix crash on null dereference

2016-12-03 Thread Jerin Jacob
cryptodev->data->name will be null when
rte_cryptodev_get_dev_id() is invoked without a valid
crypto device instance.

Fixes: d11b0f30df88 ("cryptodev: introduce API and framework for crypto devices")

Signed-off-by: Jerin Jacob <jerin.ja...@caviumnetworks.com>
Acked-by: Arek Kusztal <arkadiuszx.kusz...@intel.com>
CC: sta...@dpdk.org
---
 lib/librte_cryptodev/rte_cryptodev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
b/lib/librte_cryptodev/rte_cryptodev.c
index 127e8d0..54e95d5 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -225,13 +225,14 @@ rte_cryptodev_create_vdev(const char *name, const char 
*args)
 }
 
 int
-rte_cryptodev_get_dev_id(const char *name) {
+rte_cryptodev_get_dev_id(const char *name)
+{
unsigned i;
 
if (name == NULL)
return -1;
 
-   for (i = 0; i < rte_cryptodev_globals->max_devs; i++)
+   for (i = 0; i < rte_cryptodev_globals->nb_devs; i++)
if ((strcmp(rte_cryptodev_globals->devs[i].data->name, name)
== 0) &&
(rte_cryptodev_globals->devs[i].attached ==
-- 
2.5.5
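The archived diff above is cut off mid-expression. For illustration only, a minimal, self-contained sketch of the guarded lookup this patch implements (the structure and function names below are simplified stand-ins, not the real rte_cryptodev types) could look like:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified mirror of the cryptodev globals layout;
 * the real structures live in the cryptodev library headers. */
struct dev_data { const char *name; };
struct dev { int attached; struct dev_data *data; };
struct dev_globals { struct dev *devs; unsigned int nb_devs; };

/* Look up a device id by name.  Testing `attached` before touching
 * data->name guards the dereference: detached slots carry no valid
 * data, which is the null pointer this patch fixes. */
int get_dev_id(const struct dev_globals *g, const char *name)
{
	unsigned int i;

	if (name == NULL)
		return -1;

	for (i = 0; i < g->nb_devs; i++)
		if (g->devs[i].attached &&
		    strcmp(g->devs[i].data->name, name) == 0)
			return (int)i;

	return -1;
}
```

Note the short-circuit on `attached`: a detached slot with a null `data` pointer is never dereferenced.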



Re: [dpdk-dev] [RFC PATCH] eventdev: add buffered enqueue and flush APIs

2016-12-02 Thread Jerin Jacob
On Fri, Dec 02, 2016 at 01:45:56PM -0600, Gage Eads wrote:
> This commit adds buffered enqueue functionality to the eventdev API.
> It is conceptually similar to the ethdev API's tx buffering, however
> with a smaller API surface and no dropping of events.

Hello Gage,

Different implementations may have different strategies for holding the
buffers, and some do not need to hold buffers at all if they are DDR
backed. IMHO, this may not be a candidate for common code. I guess you
can move this to the driver side and abstract it under the SW driver's
enqueue_burst.

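For illustration, a minimal sketch of the suggestion above — buffering hidden inside a SW driver's enqueue_burst rather than in common code — might look like this (all names are invented for the example, and a counter stands in for the real device push):

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SZ 16

struct event { int id; };

/* Hypothetical per-port driver-private buffer. */
struct port_buf {
	struct event ev[BUF_SZ];
	unsigned int count;	/* events currently buffered */
	unsigned int flushed;	/* total events pushed to the backend */
};

static void flush(struct port_buf *b)
{
	b->flushed += b->count;	/* stand-in for the real device push */
	b->count = 0;
}

/* Driver-side enqueue: buffer events, flush when the buffer fills.
 * The application only ever sees the plain enqueue_burst() interface;
 * the buffering strategy stays a driver detail. */
unsigned int enqueue_burst(struct port_buf *b, const struct event *ev,
			   unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		b->ev[b->count++] = ev[i];
		if (b->count == BUF_SZ)
			flush(b);
	}
	return n;
}
```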

> 
> Signed-off-by: Gage Eads 
> ---
>  lib/librte_eventdev/rte_eventdev.c |  29 ++
>  lib/librte_eventdev/rte_eventdev.h | 106 
> +
>  2 files changed, 135 insertions(+)
> 
> diff --git a/lib/librte_eventdev/rte_eventdev.c 
> b/lib/librte_eventdev/rte_eventdev.c
> index 17ce5c3..564573f 100644
> --- a/lib/librte_eventdev/rte_eventdev.c
> +++ b/lib/librte_eventdev/rte_eventdev.c
> @@ -219,6 +219,7 @@
>   uint16_t *links_map;
>   uint8_t *ports_dequeue_depth;
>   uint8_t *ports_enqueue_depth;
> + struct rte_eventdev_enqueue_buffer *port_buffers;
>   unsigned int i;
>  
>   EDEV_LOG_DEBUG("Setup %d ports on device %u", nb_ports,
> @@ -272,6 +273,19 @@
>   "nb_ports %u", nb_ports);
>   return -(ENOMEM);
>   }
> +
> + /* Allocate memory to store port enqueue buffers */
> + dev->data->port_buffers =
> + rte_zmalloc_socket("eventdev->port_buffers",
> + sizeof(dev->data->port_buffers[0]) * nb_ports,
> + RTE_CACHE_LINE_SIZE, dev->data->socket_id);
> + if (dev->data->port_buffers == NULL) {
> + dev->data->nb_ports = 0;
> + EDEV_LOG_ERR("failed to get memory for port enq"
> +  " buffers, nb_ports %u", nb_ports);
> + return -(ENOMEM);
> + }
> +
>   } else if (dev->data->ports != NULL && nb_ports != 0) {/* re-config */
>   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->port_release, -ENOTSUP);
>  
> @@ -279,6 +293,7 @@
>   ports_dequeue_depth = dev->data->ports_dequeue_depth;
>   ports_enqueue_depth = dev->data->ports_enqueue_depth;
>   links_map = dev->data->links_map;
> + port_buffers = dev->data->port_buffers;
>  
>   for (i = nb_ports; i < old_nb_ports; i++)
>   (*dev->dev_ops->port_release)(ports[i]);
> @@ -324,6 +339,17 @@
>   return -(ENOMEM);
>   }
>  
> + /* Realloc memory to store port enqueue buffers */
> + port_buffers = rte_realloc(dev->data->port_buffers,
> + sizeof(dev->data->port_buffers[0]) * nb_ports,
> + RTE_CACHE_LINE_SIZE);
> + if (port_buffers == NULL) {
> + dev->data->nb_ports = 0;
> + EDEV_LOG_ERR("failed to realloc mem for port enq"
> +  " buffers, nb_ports %u", nb_ports);
> + return -(ENOMEM);
> + }
> +
>   if (nb_ports > old_nb_ports) {
>   uint8_t new_ps = nb_ports - old_nb_ports;
>  
> @@ -336,12 +362,15 @@
>   memset(links_map +
>   (old_nb_ports * RTE_EVENT_MAX_QUEUES_PER_DEV),
>   0, sizeof(ports_enqueue_depth[0]) * new_ps);
> + memset(port_buffers + old_nb_ports, 0,
> + sizeof(port_buffers[0]) * new_ps);
>   }
>  
>   dev->data->ports = ports;
>   dev->data->ports_dequeue_depth = ports_dequeue_depth;
>   dev->data->ports_enqueue_depth = ports_enqueue_depth;
>   dev->data->links_map = links_map;
> + dev->data->port_buffers = port_buffers;
>   } else if (dev->data->ports != NULL && nb_ports == 0) {
>   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->port_release, -ENOTSUP);
>  
> diff --git a/lib/librte_eventdev/rte_eventdev.h 
> b/lib/librte_eventdev/rte_eventdev.h
> index 778d6dc..3f24342 100644
> --- a/lib/librte_eventdev/rte_eventdev.h
> +++ b/lib/librte_eventdev/rte_eventdev.h
> @@ -246,6 +246,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define EVENTDEV_NAME_SKELETON_PMD event_skeleton
>  /**< Skeleton event device PMD name */
> @@ -965,6 +966,26 @@ typedef uint16_t (*event_dequeue_burst_t)(void *port, 
> struct rte_event ev[],
>  #define RTE_EVENTDEV_NAME_MAX_LEN (64)
>  /**< @internal Max length of name of event PMD */
>  
> +#define RTE_EVENT_BUF_MAX 16
> +/**< Maximum number of events in an enqueue buffer. */
> +
> +/**
> + * @internal
> + * An enqueue buffer for each port.
> + *
> + * The reason this struct is in the header is for 

Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-01 Thread Jerin Jacob
On Thu, Dec 01, 2016 at 09:58:31AM +0100, Thomas Monjalon wrote:
> 2016-12-01 08:15, Adrien Mazarguil:
> > I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() will
> > remain optional for long. Sure, PMDs that do not implement it do not care,
> > I'm focusing on applications, for which the performance impact of calling
> > tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> > performing all the necessary preparation at once.
> 
> I agree that tx_prepare() should become mandatory shortly.

I agree. tx_prepare() has to be mandatory. The application will have no
idea how PMD drivers use this hook to fix up Tx-side limitations of the
PMD. On the other hand, if it turns out to be mandatory, what real
benefit does it have compared to the existing scheme of just tx_burst()?

> 
> > [...]
> > > > Following the same logic, why can't such a thing be made part of the TX
> > > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > > whenever necessary). From an application standpoint, what are the 
> > > > advantages
> > > > of having to:
> > > > 
> > > >  if (tx_prep()) // iterate and update mbufs as needed
> > > >  tx_burst(); // iterate and send
> > > > 
> > > > Compared to:
> > > > 
> > > >  tx_burst(); // iterate, update as needed and send
> > > 
> > > I think that was discussed extensively quite a lot previously here:
> > > As Thomas already replied - main motivation is to allow user
> > > to execute them on different stages of packet TX pipeline,
> > > and probably on different cores.
> > > I think that provides better flexibility to the user to when/where
> > > do these preparations and hopefully would lead to better performance.
> > 
> > And I agree, I think this use case is valid but does not warrant such a high
> > penalty when your application does not need that much flexibility. Simple
> > (yet conscious) applications need the highest performance. Complex ones as
> > you described already suffer quite a bit from IPCs and won't mind a couple
> > of extra CPU cycles right?
> > 
> > Yes they will, therefore we need a method that satisfies both cases.
> > 
> > As a possible solution, a special mbuf flag could be added to each mbuf
> > having gone through tx_prepare(). That way, tx_burst() could skip some
> > checks and things it would otherwise have done.
> 
> I like this idea!
> 
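
A minimal sketch of the flag idea (the PKT_TX_PREPARED flag name and the mbuf layout here are invented for illustration; they are not part of the DPDK API):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag marking mbufs that already went through
 * tx_prepare(), so tx_burst() can skip redundant fixups. */
#define PKT_TX_PREPARED (1ULL << 63)

/* Toy mbuf: just the offload flags and a marker for the example. */
struct mbuf { uint64_t ol_flags; int cksum_fixed; };

static void phdr_cksum_fix(struct mbuf *m) { m->cksum_fixed = 1; }

/* Preparation stage: do the fixup and mark the mbuf as prepared. */
void tx_prepare(struct mbuf *m)
{
	phdr_cksum_fix(m);
	m->ol_flags |= PKT_TX_PREPARED;
}

/* tx_burst path: only do the fixup for mbufs not already prepared,
 * so conscious applications pay the cost at most once. */
void tx_one(struct mbuf *m)
{
	if (!(m->ol_flags & PKT_TX_PREPARED))
		phdr_cksum_fix(m);
	/* ... descriptor setup and send would follow ... */
}
```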


[dpdk-dev] [PATCH v12 0/6] add Tx preparation

2016-12-01 Thread Jerin Jacob
On Wed, Nov 30, 2016 at 07:26:36PM +0100, Thomas Monjalon wrote:
> 2016-11-30 17:42, Ananyev, Konstantin:
> > > >Please, we need a comment for each driver saying
> > > >"it is OK, we do not need any checksum preparation for TSO"
> > > >or
> > > >"yes we have to implement tx_prepare or TSO will not work in this mode"
> > > >
> > > 
> > > qede PMD doesn't currently support TSO yet, it only supports Tx TCP/UDP/IP
> > > csum offloads.
> > > So Tx preparation isn't applicable. So as of now -
> > > "it is OK, we do not need any checksum preparation for TSO"
> > 
> > Thanks for the answer.
> > Though please note that it not only for TSO.
> 
> Oh yes, sorry, my wording was incorrect.
> We need to know if any checksum preparation is needed prior
> offloading its final computation to the hardware or driver.
> So the question applies to TSO and simple checksum offload.
> 
> We are still waiting answers for
>   bnxt, cxgbe, ena, nfp, thunderx, virtio and vmxnet3.

The thunderx devices don't need a pseudo-header checksum
in the packet for TSO or Tx checksum offload. So..
"it is OK, we do not need any checksum preparation for TSO"

> 
> > This is for any TX offload for which the upper layer SW would have
> > to modify the contents of the packet.
> > Though as I can see for qede neither PKT_TX_IP_CKSUM or PKT_TX_TCP_CKSUM
> > exhibits any extra requirements for the user.
> > Is that correct?
> 


[dpdk-dev] [PATCH v2] i40e: Fix eth_i40e_dev_init sequence on ThunderX

2016-12-01 Thread Jerin Jacob
On Wed, Nov 30, 2016 at 05:52:02PM +, Ananyev, Konstantin wrote:
> Hi Jerin,

Hi Konstantin,

> 
> > 
> > On Tue, Nov 22, 2016 at 01:46:54PM +, Bruce Richardson wrote:
> > > On Tue, Nov 22, 2016 at 03:46:38AM +0530, Jerin Jacob wrote:
> > > > On Sun, Nov 20, 2016 at 11:21:43PM +, Ananyev, Konstantin wrote:
> > > > Hi
> > > > > >
> > > > > > i40e_asq_send_command: rd32 & wr32 under ThunderX gives 
> > > > > > unpredictable
> > > > > >results. To solve this include rte memory 
> > > > > > barriers
> > > > > >
> > > > > > Signed-off-by: Satha Rao 
> > > > > > ---
> > > > > >  drivers/net/i40e/base/i40e_osdep.h | 14 ++
> > > > > >  1 file changed, 14 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/net/i40e/base/i40e_osdep.h 
> > > > > > b/drivers/net/i40e/base/i40e_osdep.h
> > > > > > index 38e7ba5..ffa3160 100644
> > > > > > --- a/drivers/net/i40e/base/i40e_osdep.h
> > > > > > +++ b/drivers/net/i40e/base/i40e_osdep.h
> > > > > > @@ -158,7 +158,13 @@ do {   
> > > > > >  \
> > > > > > ((volatile uint32_t *)((char *)(a)->hw_addr + (reg)))
> > > > > >  static inline uint32_t i40e_read_addr(volatile void *addr)
> > > > > >  {
> > > > > > +#if defined(RTE_ARCH_ARM64)
> > > > > > +   uint32_t val = rte_le_to_cpu_32(I40E_PCI_REG(addr));
> > > > > > +   rte_rmb();
> > > > > > +   return val;
> > > > >
> > > > > If you really need an rmb/wmb with MMIO read/writes on ARM,
> > > > > I think you can avoid #ifdefs here and use rte_smp_rmb/rte_smp_wmb.
> > > > > BTW, I suppose if you need it for i40e, you would need it for other 
> > > > > devices too.
> > > >
> > > > Yes. ARM would need for all devices(typically, the devices on external 
> > > > PCI bus).
> > > > I guess rte_smp_rmb may not be the correct abstraction. So we need more 
> > > > of
> > > > rte_rmb() as we need only non smp variant on IO side. I guess then it 
> > > > make sense to
> > > > create new abstraction in eal with following variants so that each arch
> > > > gets opportunity to make what it makes sense that specific platform
> > > >
> > > > rte_readb_relaxed
> > > > rte_readw_relaxed
> > > > rte_readl_relaxed
> > > > rte_readq_relaxed
> > > > rte_writeb_relaxed
> > > > rte_writew_relaxed
> > > > rte_writel_relaxed
> > > > rte_writeq_relaxed
> > > > rte_readb
> > > > rte_readw
> > > > rte_readl
> > > > rte_readq
> > > > rte_writeb
> > > > rte_writew
> > > > rte_writel
> > > > rte_writeq
> > > >
> > > > Thoughts ?
> > > >
> > >
> > > That seems like a lot of API calls!
> > > Perhaps you can clarify - why would the rte_smp_rmb() not work for you?
> > 
> > Currently arm64 mapped DMB as rte_smp_rmb() for smp case.
> > 
> > Ideally for io barrier and non smp case, we need to map it as DSB and it is
> > bit heavier than DMB
> 
> Ok, so you need some new macro, like rte_io_(r|w)mb or so, that would expand 
> into dmb
> for ARM,  correct?

The io barrier expands to dsb.
http://lxr.free-electrons.com/source/arch/arm64/include/asm/io.h#L110

> 
> > 
> > The linux kernel arm64 mappings
> > http://lxr.free-electrons.com/source/arch/arm64/include/asm/io.h#L142
> > 
> > DMB vs DSB
> > https://community.arm.com/thread/3833
> > 
> > The relaxed one are without any barriers.(the use case like accessing on
> > chip peripherals may need only relaxed versions)
> > 
> > Thoughts on new rte EAL abstraction?
> 
> Looks like a lot of macros but if you guys think that would help - NP with 
> that :)

I don't have a strong opinion here. If there is concern about the
number of macros, I can introduce only "rte_io_(r|w)mb" instead of
read[b|w|l|q]/write[b|w|l|q]/relaxed.
Let me know?

> Again, in that case we probably can get rid of driver specific pci reg 
> read/write defines.
Yes. But that's going to be a lot of change :-(

If there is no objection then I will introduce
"read[b|w|l|q]/write[b|w|l|q]/relaxed" and then convert all external
PCIe drivers to the new macros.
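
To make the proposal concrete, here is a hedged sketch of what such accessors might look like. A portable full fence (the GCC `__sync_synchronize` builtin) stands in for the arm64 dsb, and the names follow the proposal in this thread rather than any final API:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: on arm64 these would expand to dsb-based barriers;
 * a full compiler/CPU fence keeps the example portable. */
#define rte_io_rmb() __sync_synchronize()
#define rte_io_wmb() __sync_synchronize()

/* Relaxed variants: plain volatile access, no ordering guarantee.
 * Useful for on-chip peripherals that need no barrier. */
static inline uint32_t rte_read32_relaxed(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void rte_write32_relaxed(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* Ordered variants: the read is ordered before later loads, and the
 * write is ordered after earlier stores, as a device would require. */
static inline uint32_t rte_read32(const volatile void *addr)
{
	uint32_t val = rte_read32_relaxed(addr);
	rte_io_rmb();
	return val;
}

static inline void rte_write32(uint32_t val, volatile void *addr)
{
	rte_io_wmb();
	rte_write32_relaxed(val, addr);
}
```

The b/w/q widths would follow the same pattern with uint8_t, uint16_t, and uint64_t accesses.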

> 
> Konstantin
> 
> > 
> > >
> > > /Bruce


[dpdk-dev] [PATCH] cryptodev: fix crash on null dereference

2016-12-01 Thread Jerin Jacob
On Wed, Nov 30, 2016 at 03:10:14PM +, De Lara Guarch, Pablo wrote:
> Hi Jerin,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Tuesday, November 15, 2016 7:12 PM
> > To: dev at dpdk.org
> > Cc: Doherty, Declan; Jerin Jacob
> > Subject: [dpdk-dev] [PATCH] cryptodev: fix crash on null dereference
> > 
> > cryptodev->data->name will be null when
> > rte_cryptodev_get_dev_id() is invoked without a valid
> > crypto device instance.
> > 
> > Signed-off-by: Jerin Jacob 
> 
> Could you add a "Fixes" line? 

Sure. I will send the v2 then

> 
> Thanks,
> Pablo


[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

2016-11-29 Thread Jerin Jacob
On Mon, Nov 28, 2016 at 09:16:10AM +, Bruce Richardson wrote:
> On Sat, Nov 26, 2016 at 08:24:55AM +0530, Jerin Jacob wrote:
> > On Fri, Nov 25, 2016 at 11:00:53AM +, Bruce Richardson wrote:
> > > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > > > +M: Jerin Jacob 
> > > > > > > > +F: lib/librte_eventdev/
> > > > > > > 
> > > > 
> > > > I don't think there is any portability issue here, I can explain.
> > > > 
> > > > The application level, we have two more use case to deal with non burst
> > > > variant
> > > > 
> > > > - latency critical work
> > > > - on dequeue, if application wants to deal with only one flow(i.e to
> > > >   avoid processing two different application flows to avoid cache 
> > > > trashing)
> > > > 
> > > > Selection of the burst variants will be based on
> > > > rte_event_dev_info_get() and rte_event_dev_configure()(see, 
> > > > max_event_port_dequeue_depth,
> > > > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, 
> > > > nb_event_port_enqueue_depth )
> > > > So I don't think their is portability issue here and I don't want to 
> > > > waste my
> > > > CPU cycles on the for loop if application known to be working with non
> > > > bursts variant like below
> > > > 
> > > 
> > > If the application is known to be working on non-burst varients, then
> > > they always request a burst-size of 1, and skip the loop completely.
> > > There is no extra performance hit in that case in either the app or the
> > > driver (since the non-burst driver always returns 1, irrespective of the
> > > number requested).
> > 
> > Hmm. I am afraid, There is.
> > On the app side, the const "1" can not be optimized by the compiler as
> > on downside it is function pointer based driver interface
> > On the driver side, the implementation would be for loop based instead
> > of plain access.
> > (compiler never can see the const "1" in driver interface)
> > 
> > We are planning to implement burst mode as kind of emulation mode and
> > have a different scheme for burst and nonburst. The similar approach we have
> > taken in introducing rte_event_schedule() and split the responsibility so
> > that SW driver can work without additional performance overhead and neat
> > driver interface.
> > 
> > If you are concerned about the usability part and regression on the SW
> > driver, then it's not the case, application will use nonburst variant only 
> > if
> > dequeue_depth == 1 and/or explicit case where latency matters.
> > 
> > On the portability side, we support both case and application if written 
> > based
> > on dequeue_depth it will perform well in both implementations.IMO, There is
> > no another shortcut for performance optimized application running on 
> > different
> > set of model.I think it is not an issue as, in event model as each cores
> > identical and main loop can be changed based on dequeue_depth
> > if needs performance(anyway mainloop will be function pointer based).
> > 
> 
> Ok, I think I see your point now. Here is an alternative suggestion.
> 
> 1. Keep the single user API.
> 2. Have both single and burst function pointers in the driver
> 3. Call appropriately in the eventdev layer based on parameters. For
> example:
> 
> rte_event_dequeue_burst(..., int num)
> {
>   if (num == 1 && single_dequeue_fn != NULL)
>   return single_dequeue_fn(...);
>   return burst_dequeue_fn(...);
> }
> 
> This way drivers can optionally special-case the single dequeue case -
> the function pointer check will definitely be predictable in HW making
> that a near-zero-cost check - while not forcing all drivers to do so.
> It also reduces the public API surface, and gives us a single enqueue
> and dequeue function.

The alternative suggestion looks good to me. Yes, it makes sense to
reduce the public API surface if possible.

Regarding the implementation, I thought of an approach like the one
below to avoid the cost of the additional AND operation (with a const
"1", the compiler can select the correct path without any overhead):

rte_event_dequeue_burst(..., int num)
{
if (num == 1)
return single_dequeue_fn(...);
return burst_dequeue_fn(...);
}

"single_dequeue_fn" is populated from the driver layer.
If the driver layer does not populate "single_dequeue_fn", the common
code can create it from the driver-provided "burst_dequeue_fn".

something like

generic_single_dequeue_fn(dev)
{
	dev->burst_dequeue_fn(.., 1);
}

Any concerns?

> 
> /Bruce
> 


[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-29 Thread Jerin Jacob
On Mon, Nov 28, 2016 at 03:53:08PM +, Eads, Gage wrote:
> (Bruce's advice heeded :))
> 
> >  -Original Message-
> >  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> >  Sent: Tuesday, November 22, 2016 5:44 PM
> >  To: Eads, Gage 
> >  Cc: dev at dpdk.org; Richardson, Bruce ; Van
> >  Haaren, Harry ; hemant.agrawal at nxp.com
> >  Subject: Re: [dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs
> >  
> >  On Tue, Nov 22, 2016 at 10:48:32PM +, Eads, Gage wrote:
> >  >
> >  >
> >  > >  -Original Message-
> >  > >  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> >  > >  Sent: Tuesday, November 22, 2016 2:00 PM
> >  > >  To: Eads, Gage 
> >  > >  Cc: dev at dpdk.org; Richardson, Bruce  > intel.com>;
> >  > > Van  Haaren, Harry ;
> >  > > hemant.agrawal at nxp.com
> >  > >  Subject: Re: [dpdk-dev] [PATCH 2/4] eventdev: implement the
> >  > > northbound APIs
> >  > >
> >  > >  On Tue, Nov 22, 2016 at 07:43:03PM +, Eads, Gage wrote:
> >  > >  > >  > >  > > One open issue I noticed is the "typical workflow"
> >  > >  > > description starting in  > >  rte_eventdev.h:204 conflicts with
> >  > > the  > > centralized software PMD that Harry  > >  posted last week.
> >  > >  > > Specifically, that PMD expects a single core to call the  > >
> >  > > > > schedule function. We could extend the documentation to account
> >  > > for  > > this  > >  alternative style of scheduler invocation, or
> >  > > discuss  > > ways to make the  software  > >  PMD work with the
> >  > > documented  > > workflow. I prefer the former, but either  way I  >
> >  > > >  think we  > > ought to expose the scheduler's expected usage to
> >  > > the user --  > > perhaps  > >  through an RTE_EVENT_DEV_CAP flag?
> >  > >  > >  > >  >
> >  > >  > >  > >  > I prefer former too, you can propose the documentation
> >  > > > > change required  for  > >  software PMD.
> >  > >  > >  >
> >  > >  > >  > Sure, proposal follows. The "typical workflow" isn't the
> >  > > most  > > optimal by  having a conditional in the fast-path, of
> >  > > course, but it  > > demonstrates the idea  simply.
> >  > >  > >  >
> >  > >  > >  > (line 204)
> >  > >  > >  >  * An event driven based application has following typical
> >  > > > > workflow on  > >  fastpath:
> >  > >  > >  >  * \code{.c}
> >  > >  > >  >  *  while (1) {
> >  > >  > >  >  *
> >  > >  > >  >  *  if (dev_info.event_dev_cap &
> >  > >  > >  >  *  RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED)
> >  > >  > >  >  *  rte_event_schedule(dev_id);
> >  > >  > >
> >  > >  > >  Yes, I like the idea of RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED.
> >  > >  > >  It  can be input to application/subsystem to  launch separate
> >  > > > > core(s) for schedule functions.
> >  > >  > >  But, I think, the "dev_info.event_dev_cap &  > >
> >  > > RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED"
> >  > >  > >  check can be moved inside the implementation(to make the
> >  > > better  > > decisions  and  avoiding consuming cycles on HW based
> >  schedulers.
> >  > >  >
> >  > >  > How would this check work? Wouldn't it prevent any core from
> >  > > running the  software scheduler in the centralized case?
> >  > >
> >  > >  I guess you may not need RTE_EVENT_DEV_CAP here, instead need flag
> >  > > for  device configure here
> >  > >
> >  > >  #define RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED (1ULL << 1)
> >  > >
> >  > >  struct rte_event_dev_config config;  config.event_dev_cfg =
> >  > > RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED;
> >  > >  rte_event_dev_configure(.., );
> >  > >
> >  > >  on the driver side on configure,
> >  > >  if (config.event_dev_cfg & RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED)
> >  > >eventdev->schedule = NULL;
>

[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-29 Thread Jerin Jacob
On Mon, Nov 28, 2016 at 03:53:08PM +, Eads, Gage wrote:
> (Bruce's advice heeded :))
> 
> >  > >  >
> >  > >  > How would this check work? Wouldn't it prevent any core from
> >  > > running the  software scheduler in the centralized case?
> >  > >
> >  > >  I guess you may not need RTE_EVENT_DEV_CAP here, instead need flag
> >  > > for  device configure here
> >  > >
> >  > >  #define RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED (1ULL << 1)
> >  > >
> >  > >  struct rte_event_dev_config config;  config.event_dev_cfg =
> >  > > RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED;
> >  > >  rte_event_dev_configure(.., );
> >  > >
> >  > >  on the driver side on configure,
> >  > >  if (config.event_dev_cfg & RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED)
> >  > >eventdev->schedule = NULL;
> >  > >  else // centralized case
> >  > >eventdev->schedule = your_centrized_schedule_function;
> >  > >
> >  > >  Does that work?
> >  >
> >  > Hm, I fear the API would give users the impression that they can select 
> > the
> >  scheduling behavior of a given eventdev, when a software scheduler is more
> >  likely to be either distributed or centralized -- not both.
> >  
> >  Even if it is capability flag then also it is per "device". Right ?
> >  capability flag is more of read only too. Am i missing something here?
> >  
> 
> Correct, the capability flag I'm envisioning is per-device and read-only. 
> 
> >  >
> >  > What if we use the capability flag, and define rte_event_schedule() as 
> > the
> >  scheduling function for centralized schedulers and rte_event_dequeue() as 
> > the
> >  scheduling function for distributed schedulers? That way, the datapath 
> > could be
> >  the simple dequeue -> process -> enqueue. Applications would check the
> >  capability flag at configuration time to decide whether or not to launch an
> >  lcore that calls rte_event_schedule().
> >  
> >  I am all for simple "dequeue -> process -> enqueue".
> >  rte_event_schedule() added for SW scheduler only,  now it may not make 
> > sense
> >  to add one more check on top of "rte_event_schedule()" to see it is really 
> > need
> >  or not in fastpath?
> >  
> 
> Yes, the additional check shouldn't be needed. In terms of the 'typical 
> workflow' description, this is what I have in mind:
> 
> *
>  * An event driven based application has following typical workflow on 
> fastpath:
>  * \code{.c}
>  *  while (1) {
>  *
>  *  rte_event_dequeue(...);
>  *
>  *  (event processing)
>  *
>  *  rte_event_enqueue(...);
>  *  }
>  * \endcode
>  *
>  * The point at which events are scheduled to ports depends on the device. For
>  * hardware devices, scheduling occurs asynchronously. Software schedulers can
>  * either be distributed (each worker thread schedules events to its own port)
>  * or centralized (a dedicated thread schedules to all ports). Distributed
>  * software schedulers perform the scheduling in rte_event_dequeue(), whereas
>  * centralized scheduler logic is located in rte_event_schedule(). The
>  * RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED capability flag indicates whether a
>  * device is centralized and thus needs a dedicated scheduling thread that
>  * repeatedly calls rte_event_schedule().

Makes sense. I will change the existing schedule description to the
proposed one and add RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED capability flag
in v2.

Thanks Gage.
>  *
>  */
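
The configuration-time decision described in the proposed workflow can be sketched as follows (flag and struct names follow the discussion; the final API may differ):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical read-only capability bit: set when the software
 * scheduler is distributed (each worker schedules in dequeue),
 * clear when a dedicated scheduling thread is required. */
#define RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED (1ULL << 0)

struct dev_info { uint64_t event_dev_cap; };

/* Returns 1 when the application must launch a dedicated lcore that
 * repeatedly calls rte_event_schedule(); 0 otherwise. */
int needs_schedule_core(const struct dev_info *info)
{
	return !(info->event_dev_cap & RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED);
}
```

Hardware devices that schedule asynchronously would also advertise the distributed bit, so only centralized software schedulers trigger the extra lcore.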


[dpdk-dev] [PATCH] tools: add tags and cscope index file generation support

2016-11-27 Thread Jerin Jacob
This script generates cscope, gtags, and tags
index files based on the EAL environment
(architecture and OS (linux/bsd)).

Selection of the architecture and OS environment
is based on the DPDK configuration target (T=).
example usage:
make tags T=x86_64-native-linuxapp-gcc
make cscope T=x86_64-native-linuxapp-gcc
make gtags T=x86_64-native-linuxapp-gcc

Signed-off-by: Jerin Jacob 
---
 .gitignore|   8 ++
 mk/rte.sdkroot.mk |   4 +
 scripts/tags.sh   | 251 ++
 3 files changed, 263 insertions(+)
 create mode 100755 scripts/tags.sh

diff --git a/.gitignore b/.gitignore
index a722abe..76bcae2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,9 @@
 doc/guides/nics/overview_table.txt
+cscope.out.po
+cscope.out.in
+cscope.out
+cscope.files
+GTAGS
+GPATH
+GRTAGS
+tags
diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
index 04ad523..de6355a 100644
--- a/mk/rte.sdkroot.mk
+++ b/mk/rte.sdkroot.mk
@@ -92,6 +92,10 @@ default: all
 config showconfigs showversion showversionum:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk $@

+.PHONY: cscope gtags tags
+cscope gtags tags:
+   $(Q)$(RTE_SDK)/scripts/tags.sh $@
+
 .PHONY: test fast_test ring_test mempool_test perf_test coverage
 test fast_test ring_test mempool_test perf_test coverage:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdktest.mk $@
diff --git a/scripts/tags.sh b/scripts/tags.sh
new file mode 100755
index 000..82c1a2a
--- /dev/null
+++ b/scripts/tags.sh
@@ -0,0 +1,251 @@
+#!/bin/bash
+# Generate tags or gtags or cscope files
+# Usage tags.sh  T= [VERBOSE=1]
+# set -x
+
+verbose=false
+linuxapp=false
+bsdapp=false
+x86_64=false
+arm=false
+arm64=false
+ia_32=false
+ppc_64=false
+tile=false
+
+if [ "$VERBOSE" = "1" ]; then
+   verbose=true
+fi
+
+#ignore version control files
+ignore="( -name .svn -o -name CVS -o -name .hg -o -name .git ) -prune -o"
+
+source_dirs="app buildtools drivers examples lib"
+
+skip_bsd="( -name bsdapp ) -prune -o"
+skip_linux="( -name linuxapp ) -prune -o"
+skip_arch="( -name arch ) -prune -o"
+skip_sse="( -name *_sse*.[chS] ) -prune -o"
+skip_avx="( -name *_avx*.[chS] ) -prune -o"
+skip_neon="( -name *_neon*.[chS] ) -prune -o"
+skip_altivec="( -name *_altivec*.[chS] ) -prune -o"
+skip_arm64="( -name *arm64*.[chS] ) -prune -o"
+skip_x86="( -name *x86*.[chS] ) -prune -o"
+skip_32b_files="( -name *_32.h ) -prune -o"
+skip_64b_files="( -name *_64.h ) -prune -o"
+
+skiplist="${skip_bsd} ${skip_linux} ${skip_arch} ${skip_sse} ${skip_avx} \
+${skip_neon} ${skip_altivec} ${skip_x86} ${skip_arm64}"
+
+find_sources()
+{
+   find $1 $ignore $3 -name $2 -not -type l -print
+}
+
+common_sources()
+{
+   find_sources "${source_dirs}" '*.[chS]' "$skiplist"
+}
+
+linuxapp_sources()
+{
+   find_sources "lib/librte_eal/linuxapp" '*.[chS]'
+}
+
+bsdapp_sources()
+{
+   find_sources "lib/librte_eal/bsdapp" '*.[chS]'
+}
+
+arm_common()
+{
+   find_sources "lib/librte_eal/common/arch/arm" '*.[chS]'
+   find_sources "${source_dirs}" '*neon*.[chS]'
+}
+
+arm_sources()
+{
+   arm_common
+   find_sources "lib/librte_eal/common/include/arch/arm" '*.[chS]' \
+   "$skip_64b_files"
+}
+
+arm64_sources()
+{
+   arm_common
+   find_sources "lib/librte_eal/common/include/arch/arm" '*.[chS]' \
+"$skip_32b_files"
+   find_sources "${source_dirs}" '*arm64.[chS]'
+}
+
+ia_common()
+{
+   find_sources "lib/librte_eal/common/arch/x86" '*.[chS]'
+
+   find_sources "examples/performance-thread/common/arch/x86" '*.[chS]'
+   find_sources "${source_dirs}" '*_sse*.[chS]'
+   find_sources "${source_dirs}" '*_avx*.[chS]'
+   find_sources "${source_dirs}" '*x86.[chS]'
+}
+
+i686_sources()
+{
+   ia_common
+   find_sources "lib/librte_eal/common/include/arch/x86" '*.[chS]' \
+   "$skip_64b_files"
+}
+
+x86_64_sources()
+{
+   ia_common
+   find_sources "lib/librte_eal/common/include/arch/x86" '*.[chS]' \
+   "$skip_32b_files"
+}
+
+ppc64_sources()
+{
+   find_sources "lib/librte_eal/common/arch/ppc_64" '*.[chS]'
+   find_sources "lib/librte_eal/common/include/arch/ppc_64" '*.[chS]'
+   find_sources "${source_dirs}" '*altivec*.[chS]'
+}
+
+tile_sources()
+{
+   find_sources "lib/librte_eal/common/arch/tile" '*.[chS]'
+   find_sources "lib/librte_eal/common/include/arch/tile" '*.[chS]'
+}
+
+config_file()
+{
+   if [ -f $RTE_OUTPUT/include/rte_conf

[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

2016-11-26 Thread Jerin Jacob
On Fri, Nov 25, 2016 at 11:00:53AM +, Bruce Richardson wrote:
> On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > 2016-11-24 07:29, Jerin Jacob:
> > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > +Eventdev API - EXPERIMENTAL
> > > > > > +M: Jerin Jacob 
> > > > > > +F: lib/librte_eventdev/
> > > > > 
> > 
> > I don't think there is any portability issue here, I can explain.
> > 
> > The application level, we have two more use case to deal with non burst
> > variant
> > 
> > - latency critical work
> > - on dequeue, if application wants to deal with only one flow(i.e to
> >   avoid processing two different application flows to avoid cache trashing)
> > 
> > Selection of the burst variants will be based on
> > rte_event_dev_info_get() and rte_event_dev_configure()(see, 
> > max_event_port_dequeue_depth,
> > max_event_port_enqueue_depth, nb_event_port_dequeue_depth, 
> > nb_event_port_enqueue_depth )
> > So I don't think their is portability issue here and I don't want to waste 
> > my
> > CPU cycles on the for loop if application known to be working with non
> > bursts variant like below
> > 
> 
> If the application is known to be working on non-burst varients, then
> they always request a burst-size of 1, and skip the loop completely.
> There is no extra performance hit in that case in either the app or the
> driver (since the non-burst driver always returns 1, irrespective of the
> number requested).

Hmm. I am afraid there is.
On the app side, the const "1" cannot be optimized away by the compiler,
because the driver interface is function pointer based.
On the driver side, the implementation would be for-loop based instead
of a plain access (the compiler can never see the const "1" across the
driver interface).

We are planning to implement burst mode as a kind of emulation mode and
have different schemes for burst and non-burst. A similar approach was
taken in introducing rte_event_schedule(): split the responsibility so
that the SW driver can work without additional performance overhead and
with a neat driver interface.

If you are concerned about usability and regression on the SW driver,
that is not the case: the application will use the non-burst variant only
if dequeue_depth == 1 and/or in the explicit case where latency matters.

On the portability side, we support both cases, and an application written
based on dequeue_depth will perform well on both implementations. IMO,
there is no other shortcut for a performance-optimized application running
on different sets of models. I think it is not an issue: in the event model
each core is identical, and the main loop can be changed based on
dequeue_depth if performance is needed (the main loop will be function
pointer based anyway).

> 
> > nb_events = rte_event_dequeue_burst();
> > for(i=0; i < nb_events; i++){
> > process ev[i]
> > }
> > 
> > And mostly importantly the NPU can get almost same throughput
> > without burst variant so why not?
> > 
> > > 
> > > > > > +/**
> > > > > > + * Converts nanoseconds to *wait* value for rte_event_dequeue()
> > > > > > + *
> > > > > > + * If the device is configured with 
> > > > > > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
> > > > > > + * application can use this function to convert wait value in 
> > > > > > nanoseconds to
> > > > > > + * implementations specific wait value supplied in 
> > > > > > rte_event_dequeue()
> > > > > 
> > > > > Why is it implementation-specific?
> > > > > Why this conversion is not internal in the driver?
> > > > 
> > > > This is for performance optimization, otherwise in drivers
> > > > need to convert ns to ticks in "fast path"
> > > 
> > > So why not defining the unit of this timeout as CPU cycles like the ones
> > > returned by rte_get_timer_cycles()?
> > 
> > Because HW co-processor can run in different clock domain. Need not be at
> > CPU frequency.
> > 
> While I've no huge objection to this API, since it will not be
> implemented by our SW implementation, I'm just curious as to how much
> having this will save. How complicated is the arithmetic that needs to
> be done, and how many cycles on your platform is that going to take?

One load, plus a division and/or multiplication of (floating-point)
numbers. It could be 6-ish cycles or more, and it matters when the burst
size is small (worst case 1). I think the software implementation could
use rte_get_timer_cycles() here if required. There is no harm in moving
work to the slow path when it can be moved, as in this case.



[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

2016-11-26 Thread Jerin Jacob
On Fri, Nov 25, 2016 at 02:09:22PM +0100, Thomas Monjalon wrote:
> 2016-11-25 11:00, Bruce Richardson:
> > On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
> > > On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> > > > 2016-11-24 07:29, Jerin Jacob:
> > > > > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > > > > 2016-11-18 11:14, Jerin Jacob:
> > > > > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > > > > +/**< Skeleton event device PMD name */
> > > > > > 
> > > > > > I do not understand this #define.
> > > > > 
> > > > > Applications can explicitly request the a specific driver though 
> > > > > driver
> > > > > name. This will go as argument to rte_event_dev_get_dev_id(const char 
> > > > > *name).
> > > > > The reason for keeping this #define in rte_eventdev.h is that,
> > > > > application needs to include only rte_eventdev.h not 
> > > > > rte_eventdev_pmd.h.
> > > > 
> > > > So each driver must register its name in the API?
> > > > Is it really needed?
> > > 
> > > Otherwise how application knows the name of the driver.
> > > The similar scheme used in cryptodev.
> > > http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
> > > No strong opinion here. Open for suggestions.
> > > 
> > 
> > I like having a name registered. I think we need a scheme where an app
> > can find and use an implementation using a specific driver.
> 
> I do not like having the driver names in the API.
> An API should not know its drivers.
> If an application do some driver-specific processing, it knows
> the driver name as well. The driver name is written in the driver.

If Bruce doesn't have further objections, then I will go with Thomas's
suggestion.



[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-26 Thread Jerin Jacob
On Fri, Nov 25, 2016 at 09:55:39AM +, Richardson, Bruce wrote:
> > > > +/* Macros to check for valid device */ #define
> > > > +RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, retval) do { \
> > >
> > > Sometimes you use RTE_EVENT_DEV_ and sometimes RTE_EVENTDEV.
> > > (I prefer the latter).
> > 
> > I choose the naming conversion based on the interface. API side it is
> > rte_event_ and driver side it is rte_eventdev_*
> > 
> > rte_event_dev_count;
> > rte_event_dev_get_dev_id
> > rte_event_dev_socket_id;
> > rte_event_dev_info_get;
> > rte_event_dev_configure;
> > rte_event_dev_start;
> > rte_event_dev_stop;
> > rte_event_dev_close;
> > rte_event_dev_dump;
> > 
> > rte_event_port_default_conf_get;
> > rte_event_port_setup;
> > rte_event_port_dequeue_depth;
> > rte_event_port_enqueue_depth;
> > rte_event_port_count;
> > rte_event_port_link;
> > rte_event_port_unlink;
> > rte_event_port_links_get;
> > 
> > rte_event_queue_default_conf_get
> > rte_event_queue_setup;
> > rte_event_queue_count;
> > rte_event_queue_priority;
> > 
> > rte_event_dequeue_wait_time;
> > 
> > rte_eventdev_pmd_allocate;
> > rte_eventdev_pmd_release;
> > rte_eventdev_pmd_vdev_init;
> > rte_eventdev_pmd_pci_probe;
> > rte_eventdev_pmd_pci_remove;
> 
> For this last set, you probably are ok prefixing with just "rte_event_pmd_", 
> and drop the "dev" as unnecessary. That makes everything have a prefix of 
> "rte_event_" and thereafter dev, port, queue, or pmd as appropriate.

OK. I will change the last set to rte_event_pmd_*

> 
> /Bruce


[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-25 Thread Jerin Jacob
On Wed, Nov 23, 2016 at 08:18:09PM +0100, Thomas Monjalon wrote:
> 2016-11-18 11:15, Jerin Jacob:
> > This patch set defines the southbound driver interface
> > and implements the common code required for northbound
> > eventdev API interface.
> 
> Please make two separate patches.

OK

> 
> > +#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
> > +#define RTE_PMD_DEBUG_TRACE(...) \
> > +   rte_pmd_debug_trace(__func__, __VA_ARGS__)
> > +#else
> > +#define RTE_PMD_DEBUG_TRACE(...)
> > +#endif
> 
> I would like to discuss the need for a debug option as there is
> already a log level.

IMO, we don't need this. However, RTE_FUNC_PTR_OR_ERR_RET needs the
definition of RTE_PMD_DEBUG_TRACE in order to compile. I think we can
remove it when it gets fixed in the EAL layer.

> 
> > +/* Logging Macros */
> > +#define EDEV_LOG_ERR(fmt, args...) \
> 
> Every symbols and macros in an exported header must be prefixed by RTE_.
> 
OK. I will fix it

> > +/* Macros to check for valid device */
> > +#define RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, retval) do { \
> 
> Sometimes you use RTE_EVENT_DEV_ and sometimes RTE_EVENTDEV.
> (I prefer the latter).

I chose the naming convention based on the interface: on the API side it
is rte_event_* and on the driver side it is rte_eventdev_*

rte_event_dev_count;
rte_event_dev_get_dev_id
rte_event_dev_socket_id;
rte_event_dev_info_get;
rte_event_dev_configure;
rte_event_dev_start;
rte_event_dev_stop;
rte_event_dev_close;
rte_event_dev_dump;

rte_event_port_default_conf_get;
rte_event_port_setup;
rte_event_port_dequeue_depth;
rte_event_port_enqueue_depth;
rte_event_port_count;
rte_event_port_link;
rte_event_port_unlink;
rte_event_port_links_get;

rte_event_queue_default_conf_get
rte_event_queue_setup;
rte_event_queue_count;
rte_event_queue_priority;

rte_event_dequeue_wait_time;

rte_eventdev_pmd_allocate;
rte_eventdev_pmd_release;
rte_eventdev_pmd_vdev_init;
rte_eventdev_pmd_pci_probe;
rte_eventdev_pmd_pci_remove;



[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

2016-11-25 Thread Jerin Jacob
On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
> 2016-11-24 07:29, Jerin Jacob:
> > On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> > > 2016-11-18 11:14, Jerin Jacob:
> > > > +Eventdev API - EXPERIMENTAL
> > > > +M: Jerin Jacob 
> > > > +F: lib/librte_eventdev/
> > > 
> > > OK to mark it experimental.
> > > What is the plan to remove the experimental word?
> > 
> > IMO, EXPERIMENTAL status can be changed when
> > - At least two event drivers available(Intel and Cavium are working on
> >   SW and HW event drivers)
> > - Functional test applications are fine with at least two drivers
> > - Portable example application to showcase the features of the library
> > - eventdev integration with another dpdk subsystem such as ethdev
> > 
> > Thoughts?. I am not sure the criteria used in cryptodev case.
> 
> Sounds good.
> We will be more confident when drivers and tests will be implemented.
> 
> I think the roadmap for the SW driver targets the release 17.05.
> Do you still plan 17.02 for this API and the Cavium driver?

No. 17.02 is too short for upstreaming the Cavium driver. However, I think
the API and the skeleton event driver can go into 17.02 if there are no
objections.

> 
> > > > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > > > +/**< Skeleton event device PMD name */
> > > 
> > > I do not understand this #define.
> > 
> > Applications can explicitly request the a specific driver though driver
> > name. This will go as argument to rte_event_dev_get_dev_id(const char 
> > *name).
> > The reason for keeping this #define in rte_eventdev.h is that,
> > application needs to include only rte_eventdev.h not rte_eventdev_pmd.h.
> 
> So each driver must register its name in the API?
> Is it really needed?

Otherwise, how does the application know the name of the driver?
A similar scheme is used in cryptodev:
http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
No strong opinion here; open for suggestions.

> 
> > > > +struct rte_event_dev_config {
> > > > +   uint32_t dequeue_wait_ns;
> > > > +   /**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this 
> > > > device.
> > > 
> > > Please explain exactly when the wait occurs and why.
> > 
> > Here is the explanation from rte_event_dequeue() API definition,
> > -
> > @param wait
> > 0 - no-wait, returns immediately if there is no event.
> > >0 - wait for the event, if the device is configured with
> > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
> > the event available or *wait* time.
> > if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> > then this function will wait until the event available or *dequeue_wait_ns*
> >   ^^
> > ns which was previously supplied to rte_event_dev_configure()
> > -
> > This is provides the application to have control over, how long the
> > implementation should wait if event is not available.
> > 
> > Let me know what exact changes are required if details are not enough in
> > rte_event_dequeue() API definition.
> 
> Maybe that timeout would be a better name.
> It waits only if there is nothing in the queue.
> It can be interesting to highlight in this comment that this parameter
> makes the dequeue function a blocking call.

OK. I will change to timeout then

> 
> > > > +/** Event port configuration structure */
> > > > +struct rte_event_port_conf {
> > > > +   int32_t new_event_threshold;
> > > > +   /**< A backpressure threshold for new event enqueues on this 
> > > > port.
> > > > +* Use for *closed system* event dev where event capacity is 
> > > > limited,
> > > > +* and cannot exceed the capacity of the event dev.
> > > > +* Configuring ports with different thresholds can make higher 
> > > > priority
> > > > +* traffic less likely to  be backpressured.
> > > > +* For example, a port used to inject NIC Rx packets into the 
> > > > event dev
> > > > +* can have a lower threshold so as not to overwhelm the device,
> > > > +* while ports used for worker pools can have a higher 
> > > > threshold.
> > > > +* This value cannot exceed the *nb_events_limit*
> > > > +* which previously supplied to rte_event_dev_configu

[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

2016-11-25 Thread Jerin Jacob
On Thu, Nov 24, 2016 at 04:24:11PM +, Bruce Richardson wrote:
> On Fri, Nov 18, 2016 at 11:14:59AM +0530, Jerin Jacob wrote:
> > In a polling model, lcores poll ethdev ports and associated
> > rx queues directly to look for packet. In an event driven model,
> > by contrast, lcores call the scheduler that selects packets for
> > them based on programmer-specified criteria. Eventdev library
> > adds support for event driven programming model, which offer
> > applications automatic multicore scaling, dynamic load balancing,
> > pipelining, packet ingress order maintenance and
> > synchronization services to simplify application packet processing.
> > 
> > By introducing event driven programming model, DPDK can support
> > both polling and event driven programming models for packet processing,
> > and applications are free to choose whatever model
> > (or combination of the two) that best suits their needs.
> > 
> > Signed-off-by: Jerin Jacob 
> > ---
> 
> Hi Jerin,
> 
> Thanks for the patchset. A few minor comments in general on the API that
> we found from working with it (thus far - more may follow :-) ).

Thanks Bruce.

> 
> 1. Priorities: priorities are used in a number of places in the API, but
>all are uint8_t types and have their own MAX/NORMAL/MIN values. I think
>it would be simpler for the user just to have one priority type in the
>library, and use that everywhere. I suggest using RTE_EVENT_PRIORITY_*
>and drop the separate defines for SERVICE_PRIORITY, and QUEUE_PRIORITY
>etc. Ideally, I'd see things like this converted to enums too, rather
>than defines, but I'm not sure it's possible in this case.

OK. I will address it in v2

> 
> 2. Functions for config and setup can have their structure parameter
>types as const as they don't/shouldn't change the values internally.
>So add "const" to parameters to:
>  rte_event_dev_configure()
>  rte_event_queue_setup()
>  rte_event_port_setup()
>  rte_event_port_link()
> 

OK. I will address it in v2

> 3. in event schedule() function, the dev->schedule() function needs the
>dev instance pointer passed in as parameter.

OK. I will address it in v2

> 
> 4. The event op values and the event type values would be better as
>enums rather than as a set of #defines.

OK. I will address it in v2

I will reply to your other comments in Thomas's email.

> 
> Regards,
> /Bruce


[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

2016-11-24 Thread Jerin Jacob
On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
> Hi Jerin,

Hi Thomas,

> 
> Thanks for bringing a big new piece in DPDK.
> 
> I made some comments below.

Thanks for the review.

> 
> 2016-11-18 11:14, Jerin Jacob:
> > +Eventdev API - EXPERIMENTAL
> > +M: Jerin Jacob 
> > +F: lib/librte_eventdev/
> 
> OK to mark it experimental.
> What is the plan to remove the experimental word?

IMO, EXPERIMENTAL status can be changed when
- At least two event drivers available(Intel and Cavium are working on
  SW and HW event drivers)
- Functional test applications are fine with at least two drivers
- Portable example application to showcase the features of the library
- eventdev integration with another dpdk subsystem such as ethdev

Thoughts?. I am not sure the criteria used in cryptodev case.


> 
> > + * RTE event device drivers do not use interrupts for enqueue or dequeue
> > + * operation. Instead, Event drivers export Poll-Mode enqueue and dequeue
> > + * functions to applications.
> 
> To the question "what makes DPDK different" it could be answered
> that DPDK event drivers implement polling functions :)

Mostly taken from ethdev API header file :-)

> 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> 
> Is it possible to remove some of these includes from the API?

OK. I will scan through all the header files and remove the ones that are
not required.

> 
> > +
> > +#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
> > +/**< Skeleton event device PMD name */
> 
> I do not understand this #define.

Applications can explicitly request a specific driver through the driver
name. This will go as the argument to rte_event_dev_get_dev_id(const char *name).
The reason for keeping this #define in rte_eventdev.h is that the
application then needs to include only rte_eventdev.h, not rte_eventdev_pmd.h.

I will remove the definition from this patch and add this definition in
skeleton driver patch(patch 03/04)

> And it is not properly prefixed.

OK. I will prefix with RTE_ in v2.

> 
> > +struct rte_event_dev_info {
> > +   const char *driver_name;/**< Event driver name */
> > +   struct rte_pci_device *pci_dev; /**< PCI information */
> 
> There is some work in progress to remove PCI information from ethdev.
> Please do not add any PCI related structure in eventdev.
> The generic structure is rte_device.

OK, makes sense. A grep for "rte_device" shows that none of the subsystems
implement it yet and the work is in progress. I will change to rte_device
when it is mainlined. The skeleton eventdev driver, being PCI-bus based,
needs this for the moment.


> 
> > +struct rte_event_dev_config {
> > +   uint32_t dequeue_wait_ns;
> > +   /**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
> 
> Please explain exactly when the wait occurs and why.

Here is the explanation from rte_event_dequeue() API definition,
-
@param wait
0 - no-wait, returns immediately if there is no event.
>0 - wait for the event, if the device is configured with
RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
the event available or *wait* time.
if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
then this function will wait until the event available or *dequeue_wait_ns*
  ^^
ns which was previously supplied to rte_event_dev_configure()
-
This provides the application with control over how long the
implementation should wait if an event is not available.

Let me know what exact changes are required if the details are not enough
in the rte_event_dequeue() API definition.

> 
> > +* This value should be in the range of *min_dequeue_wait_ns* and
> > +* *max_dequeue_wait_ns* which previously provided in
> > +* rte_event_dev_info_get()
> > +* \see RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
> 
> I think the @see syntax would be more consistent than \see.

OK. I will change to @see

> 
> > +   uint8_t nb_event_port_dequeue_depth;
> > +   /**< Number of dequeue queue depth for any event port on this device.
> 
> I think it deserves more explanations.

see below

> 
> > +   uint32_t event_dev_cfg;
> > +   /**< Event device config flags(RTE_EVENT_DEV_CFG_)*/
> 
> How this field differs from others in the struct?
> Should it be named flags?

OK. I will change to flags

> 
> > +   uint32_t event_queue_cfg; /**< Queue config flags(EVENT_QUEUE_CFG_) */
> 
> Same comment about the naming of this field for event_queue config sruct.

OK. I will change to flags

> 
> > +/** Event port configuration structure */
> > +struct rte_event_port_conf {
> > +   int32_t 

[dpdk-dev] [PATCH 5/7] test/eventdev: unit and functional tests

2016-11-23 Thread Jerin Jacob
On Wed, Nov 16, 2016 at 06:00:05PM +, Harry van Haaren wrote:
> This commit adds basic unit and functional tests for the eventdev
> API. The test code is added in this commit, but not yet enabled until
> the next commit.
> 
> Signed-off-by: Gage Eads 
> Signed-off-by: David Hunt 
> Signed-off-by: Harry van Haaren 
> ---

A few comments from a portability and usage perspective; see below.

> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include "test.h"
> +
> +
> +static inline int
> +create_ports(struct test *t, int num_ports)
> +{
> + int i;
> + static const struct rte_event_port_conf conf = {
> + .dequeue_queue_depth = 32,
> + .enqueue_queue_depth = 64,
> + };

Check the maximum supported depths through rte_event_dev_info_get() first.

> +
> + for (i = 0; i < num_ports; i++) {
> + if (rte_event_port_setup(t->ev, i, &conf) < 0) {
> + printf("Error setting up port %d\n", i);
> + return -1;
> + }
> + t->port[i] = i;
> + }
> +
> + return 0;
> +}
> +
> +
> +static int
> +run_prio_packet_test(struct test *t)

Run the per-event enqueue priority test only if the platform supports
RTE_EVENT_DEV_CAP_EVENT_QOS.


> +{
> + int err;
> + const uint32_t MAGIC_SEQN[] = {4711, 1234};
> + const uint32_t PRIORITY[] = {3, 0};
> + unsigned i;
> + for(i = 0; i < RTE_DIM(MAGIC_SEQN); i++) {
> + /* generate pkt and enqueue */
> + struct rte_event ev;
> + struct rte_mbuf *arp = rte_gen_arp(0, t->mbuf_pool);
> + if (!arp) {
> + printf("%d: gen of pkt failed\n", __LINE__);
> + return -1;
> + }
> + arp->seqn = MAGIC_SEQN[i];

For me, it makes sense not to touch any field in the mbuf for the eventdev
model to work. Use a private field to store test-specific data.

> +
> + ev = (struct rte_event){
> + .priority = PRIORITY[i],
> + .operation = RTE_EVENT_OP_NEW,
> + .queue_id = t->qid[0],
> + .mbuf = arp
> + };
> + err = rte_event_enqueue(t->ev, t->port[0], &ev, 0);
> + if (err < 0) {
> + printf("%d: error failed to enqueue\n", __LINE__);
> + return -1;
> + }
> + }
> +
> + rte_event_schedule(t->ev);
> +
> + struct rte_event_dev_stats stats;
> + err = rte_event_dev_stats_get(t->ev, &stats);
> + if (err) {
> + printf("%d: error failed to get stats\n", __LINE__);
> + return -1;
> + }
> +
> + if (stats.port_rx_pkts[t->port[0]] != 2) {
> + printf("%d: error stats incorrect for directed port\n", 
> __LINE__);
> + rte_event_dev_dump(stdout, t->ev);
> + return -1;
> + }

Relying on stats for functional verification may not work in all
implementations. It makes sense to have more concrete functional
verification without stats.

> +
> + struct rte_event ev, ev2;
> + uint32_t deq_pkts;
> + deq_pkts = rte_event_dequeue(t->ev, t->port[0], &ev, 0);
> + if (deq_pkts != 1) {
> + printf("%d: error failed to deq\n", __LINE__);
> + rte_event_dev_dump(stdout, t->ev);
> + return -1;
> + }
> + if(ev.mbuf->seqn != MAGIC_SEQN[1]) {
> + printf("%d: first packet out not highest priority\n", __LINE__);
> + rte_event_dev_dump(stdout, t->ev);
> + return -1;
> + }
> + rte_pktmbuf_free(ev.mbuf);
> +
> +
> +static int
> +test_overload_trip(struct test *t)

Overload tests won't fail on DDR-backed systems (a DDR-backed system
mimics an infinite-size queue to the application), so testing against
failure may not work at all in some implementations.

> +{
> + int err;
> +
> + /* Create instance with 3 directed QIDs going to 3 ports */
> + if (init(t, 1, 1) < 0 ||
> + create_ports(t, 1) < 0 ||
> + create_atomic_qids(t, 1) < 0)
> + return -1;
> +


[dpdk-dev] [PATCH] eal: postpone vdev initialization

2016-11-23 Thread Jerin Jacob
On Mon, Nov 21, 2016 at 05:35:58PM +, Ferruh Yigit wrote:
> On 11/21/2016 5:02 PM, Jerin Jacob wrote:
> > On Mon, Nov 21, 2016 at 09:54:57AM +, Ferruh Yigit wrote:
> >> On 11/20/2016 8:00 AM, Jerin Jacob wrote:
> >>> Some platform like octeontx may use pci and
> >>> vdev based combined device to represent a logical
> >>> dpdk functional device.In such case, postponing the
> >>> vdev initialization after pci device
> >>> initialization will provide the better view of
> >>> the pci device resources in the system in
> >>> vdev's probe function, and it allows better
> >>> functional subsystem registration in vdev probe
> >>> function.
> >>>
> >>> As a bonus, This patch fixes a bond device
> >>> initialization use case.
> >>>
> >>> example command to reproduce the issue:
> >>> ./testpmd -c 0x2  --vdev 'eth_bond0,mode=0,
> >>> slave=:02:00.0,slave=:03:00.0' --
> >>> --port-topology=chained
> >>>
> >>> root cause:
> >>> In existing case(vdev initialization and then pci
> >>> initialization), creates three Ethernet ports with
> >>> following port ids
> >>> 0 - Bond device
> >>> 1 - PCI device 0
> >>> 2 - PCI devive 1
> >>>
> >>> Since testpmd, calls the configure/start on all the ports on
> >>> start up,it will translate to following illegal setup sequence
> >>>
> >>> 1)bond device configure/start
> >>> 1.1) pci device0 stop/configure/start
> >>> 1.2) pci device1 stop/configure/start
> >>> 2)pci device 0 configure(illegal setup case,
> >>> as device in start state)
> >>>
> >>> The fix changes the initialization sequence and
> >>> allow initialization in following valid setup order
> >>> 1) pcie device 0 configure/start
> >>> 2) pcie device 1 configure/start
> >>> 3) bond device 2 configure/start
> >>> 3.1) pcie device 0/stop/configure/start
> >>> 3.2) pcie device 1/stop/configure/start
> >>>
> >>> Signed-off-by: Jerin Jacob 
> >>> ---
> >>
> >> This changes the port id assignments to the devices, right?
> >>
> >> Previously virtual devices get first available port ids (0..N1), later
> >> physical devices (N1..N2). Now this becomes reverse.
> >>
> >> Can this change break some existing user applications?
> > 
> > I guess it may be effected only to ethdev bond pmd based application,
> > which is broken anyway.
> 
> My concern is, this may effect the applications that use "--vdev" eal
> parameter and has an assumption about port assignment.

Not sure. An application expecting a specific port assignment is a bad
assumption anyway. But in any event, what do we do about the existing
ethdev bond PMD failure?

> 
> And if this breaks any userspace application, does it require a
> deprecation notice?

I am not sure. Thomas, any input on this?

> 
> > Let me know what it takes to make forward progress on this patch. I can
> > fix the same in v2.
> > 
> > Jerin
> > 
> 


[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-23 Thread Jerin Jacob
On Tue, Nov 22, 2016 at 10:48:32PM +, Eads, Gage wrote:
> 
> 
> >  -Original Message-
> >  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> >  Sent: Tuesday, November 22, 2016 2:00 PM
> >  To: Eads, Gage 
> >  Cc: dev at dpdk.org; Richardson, Bruce ; Van
> >  Haaren, Harry ; hemant.agrawal at nxp.com
> >  Subject: Re: [dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs
> >  
> >  On Tue, Nov 22, 2016 at 07:43:03PM +, Eads, Gage wrote:
> >  > >  > >  > > One open issue I noticed is the "typical workflow"
> >  > > description starting in  > >  rte_eventdev.h:204 conflicts with the
> >  > > centralized software PMD that Harry  > >  posted last week.
> >  > > Specifically, that PMD expects a single core to call the  > >
> >  > > schedule function. We could extend the documentation to account for
> >  > > this  > >  alternative style of scheduler invocation, or discuss
> >  > > ways to make the  software  > >  PMD work with the documented
> >  > > workflow. I prefer the former, but either  way I  > >  think we
> >  > > ought to expose the scheduler's expected usage to the user --
> >  > > perhaps  > >  through an RTE_EVENT_DEV_CAP flag?
> >  > >  > >  >
> >  > >  > >  > I prefer former too, you can propose the documentation
> >  > > change required  for  > >  software PMD.
> >  > >  >
> >  > >  > Sure, proposal follows. The "typical workflow" isn't the most
> >  > > optimal by  having a conditional in the fast-path, of course, but it
> >  > > demonstrates the idea  simply.
> >  > >  >
> >  > >  > (line 204)
> >  > >  >  * An event driven based application has following typical
> >  > > workflow on
> >  > >  fastpath:
> >  > >  >  * \code{.c}
> >  > >  >  *  while (1) {
> >  > >  >  *
> >  > >  >  *  if (dev_info.event_dev_cap &
> >  > >  >  *  RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED)
> >  > >  >  *  rte_event_schedule(dev_id);
> >  > >
> >  > >  Yes, I like the idea of RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED.
> >  > >  It  can be input to application/subsystem to  launch separate
> >  > > core(s) for schedule functions.
> >  > >  But, I think, the "dev_info.event_dev_cap &
> >  > > RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED"
> >  > >  check can be moved inside the implementation(to make the better
> >  > > decisions  and  avoiding consuming cycles on HW based schedulers.
> >  >
> >  > How would this check work? Wouldn't it prevent any core from running the
> >  software scheduler in the centralized case?
> >  
> >  I guess you may not need RTE_EVENT_DEV_CAP here, instead need flag for
> >  device configure here
> >  
> >  #define RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED (1ULL << 1)
> >  
> >  struct rte_event_dev_config config;
> >  config.event_dev_cfg = RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED;
> >  rte_event_dev_configure(.., &config);
> >  
> >  on the driver side on configure,
> >  if (config.event_dev_cfg & RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED)
> > eventdev->schedule = NULL;
> >  else // centralized case
> > eventdev->schedule = your_centrized_schedule_function;
> >  
> >  Does that work?
> 
> Hm, I fear the API would give users the impression that they can select the 
> scheduling behavior of a given eventdev, when a software scheduler is more 
> likely to be either distributed or centralized -- not both.

Even if it is a capability flag, it is still per "device", right?
A capability flag is also more of a read-only attribute. Am I missing
something here?

> 
> What if we use the capability flag, and define rte_event_schedule() as the 
> scheduling function for centralized schedulers and rte_event_dequeue() as the 
> scheduling function for distributed schedulers? That way, the datapath could 
> be the simple dequeue -> process -> enqueue. Applications would check the 
> capability flag at configuration time to decide whether or not to launch an 
> lcore that calls rte_event_schedule().

I am all for the simple "dequeue -> process -> enqueue".
rte_event_schedule() was added for the SW scheduler only; now it may not
make sense to add one more check on top of "rte_event_schedule()" to see
whether it is really needed or 

[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-23 Thread Jerin Jacob
On Tue, Nov 22, 2016 at 07:43:03PM +, Eads, Gage wrote:
> >  > >  > > One open issue I noticed is the "typical workflow" description 
> > starting in
> >  > >  rte_eventdev.h:204 conflicts with the centralized software PMD that 
> > Harry
> >  > >  posted last week. Specifically, that PMD expects a single core to 
> > call the
> >  > >  schedule function. We could extend the documentation to account for 
> > this
> >  > >  alternative style of scheduler invocation, or discuss ways to make the
> >  software
> >  > >  PMD work with the documented workflow. I prefer the former, but either
> >  way I
> >  > >  think we ought to expose the scheduler's expected usage to the user --
> >  perhaps
> >  > >  through an RTE_EVENT_DEV_CAP flag?
> >  > >  >
> >  > >  > I prefer former too, you can propose the documentation change 
> > required
> >  for
> >  > >  software PMD.
> >  >
> >  > Sure, proposal follows. The "typical workflow" isn't the most optimal by
> >  having a conditional in the fast-path, of course, but it demonstrates the 
> > idea
> >  simply.
> >  >
> >  > (line 204)
> >  >  * An event driven based application has following typical workflow on
> >  fastpath:
> >  >  * \code{.c}
> >  >  *  while (1) {
> >  >  *
> >  >  *  if (dev_info.event_dev_cap &
> >  >  *  RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED)
> >  >  *  rte_event_schedule(dev_id);
> >  
> >  Yes, I like the idea of RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED.
> >  It  can be input to application/subsystem to
> >  launch separate core(s) for schedule functions.
> >  But, I think, the "dev_info.event_dev_cap &
> >  RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED"
> >  check can be moved inside the implementation (to make better decisions
> >  and avoid consuming cycles on HW based schedulers).
> 
> How would this check work? Wouldn't it prevent any core from running the 
> software scheduler in the centralized case?

I guess you may not need RTE_EVENT_DEV_CAP here; instead you need a flag for
device configure here:

#define RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED (1ULL << 1)

struct rte_event_dev_config config;
config.event_dev_cfg = RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED;
rte_event_dev_configure(.., &config);

on the driver side on configure,
if (config.event_dev_cfg & RTE_EVENT_DEV_CFG_DISTRIBUTED_SCHED)
eventdev->schedule = NULL;
else // centralized case
eventdev->schedule = your_centralized_schedule_function;

Does that work?

> 
> >  
> >  >  *
> >  >  *  rte_event_dequeue(...);
> >  >  *
> >  >  *  (event processing)
> >  >  *
> >  >  *  rte_event_enqueue(...);
> >  >  *  }
> >  >  * \endcode
> >  >  *
> >  >  * The *schedule* operation is intended to do event scheduling, and the
> >  >  * *dequeue* operation returns the scheduled events. An implementation
> >  >  * is free to define the semantics between *schedule* and *dequeue*. For
> >  >  * example, a system based on a hardware scheduler can define its
> >  >  * rte_event_schedule() to be an NOOP, whereas a software scheduler can
> >  use
> >  >  * the *schedule* operation to schedule events. The
> >  >  * RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED capability flag indicates
> >  whether
> >  >  * rte_event_schedule() should be called by all cores or by a single 
> > (typically
> >  >  * dedicated) core.
> >  >
> >  > (line 308)
> >  > #define RTE_EVENT_DEV_CAP_DISTRIBUTED_SCHED (1ULL < 2)
> >  > /**< Event scheduling implementation is distributed and all cores must
> >  execute
> >  >  *  rte_event_schedule(). If unset, the implementation is centralized and
> >  >  *  a single core must execute the schedule operation.
> >  >  *
> >  >  *  \see rte_event_schedule()
> >  >  */
> >  >
> >  > >  >
> >  > >  > On same note, If software PMD based workflow need  a separate 
> > core(s)
> >  for
> >  > >  > schedule function then, Can we hide that from API specification and 
> > pass
> >  an
> >  > >  > argument to SW pmd to define the scheduling core(s)?
> >  > >  >
> >  > >  > Something like --vdev=eventsw0,schedule_cmask=0x2
> >  >
> >  > An API for controlling the scheduler coremask instead of (or perhaps in
> >  addition to) the vdev argument would be good, to allow runtime control. I 
> > can
> >  imagine apps that scale the number of cores based on load, and in doing so
> >  may want to migrate the scheduler to a different core.
> >  
> >  Yes, an API for the number of scheduler cores looks OK. But if we are
> >  going to have the service core approach, then we just need to specify it
> >  in one place, as the application will not be creating the service functions.
> >  
> >  >
> >  > >
> >  > >  Just a thought,
> >  > >
> >  > >  Perhaps, We could introduce generic "service" cores concept to DPDK to
> >  hide
> >  > >  the
> >  > >  requirement where the implementation needs dedicated core to do 
> > certain
> >  > >  work. I guess it would useful for other NPU integration in DPDK.
> >  > >
> >  >
> >  > That's an interesting idea. 

[dpdk-dev] [PATCH v2] i40e: Fix eth_i40e_dev_init sequence on ThunderX

2016-11-23 Thread Jerin Jacob
On Tue, Nov 22, 2016 at 01:46:54PM +, Bruce Richardson wrote:
> On Tue, Nov 22, 2016 at 03:46:38AM +0530, Jerin Jacob wrote:
> > On Sun, Nov 20, 2016 at 11:21:43PM +, Ananyev, Konstantin wrote:
> > > Hi
> > > > 
> > > > i40e_asq_send_command: rd32 & wr32 under ThunderX gives unpredictable
> > > >results. To solve this include rte memory 
> > > > barriers
> > > > 
> > > > Signed-off-by: Satha Rao 
> > > > ---
> > > >  drivers/net/i40e/base/i40e_osdep.h | 14 ++
> > > >  1 file changed, 14 insertions(+)
> > > > 
> > > > diff --git a/drivers/net/i40e/base/i40e_osdep.h 
> > > > b/drivers/net/i40e/base/i40e_osdep.h
> > > > index 38e7ba5..ffa3160 100644
> > > > --- a/drivers/net/i40e/base/i40e_osdep.h
> > > > +++ b/drivers/net/i40e/base/i40e_osdep.h
> > > > @@ -158,7 +158,13 @@ do {   
> > > >  \
> > > > ((volatile uint32_t *)((char *)(a)->hw_addr + (reg)))
> > > >  static inline uint32_t i40e_read_addr(volatile void *addr)
> > > >  {
> > > > +#if defined(RTE_ARCH_ARM64)
> > > > +   uint32_t val = rte_le_to_cpu_32(I40E_PCI_REG(addr));
> > > > +   rte_rmb();
> > > > +   return val;
> > > 
> > > If you really need an rmb/wmb with MMIO read/writes on ARM,
> > > I think you can avoid #ifdefs here and use rte_smp_rmb/rte_smp_wmb.
> > > BTW, I suppose if you need it for i40e, you would need it for other 
> > > devices too.
> > 
> > Yes. ARM would need it for all devices (typically, the devices on the
> > external PCI bus).
> > I guess rte_smp_rmb may not be the correct abstraction. So we need more of
> > rte_rmb(), as we need only the non-SMP variant on the IO side. I guess then
> > it makes sense to create a new abstraction in EAL with the following
> > variants so that each arch gets the opportunity to do what makes sense for
> > that specific platform:
> > 
> > rte_readb_relaxed
> > rte_readw_relaxed
> > rte_readl_relaxed
> > rte_readq_relaxed
> > rte_writeb_relaxed
> > rte_writew_relaxed
> > rte_writel_relaxed
> > rte_writeq_relaxed
> > rte_readb
> > rte_readw
> > rte_readl
> > rte_readq
> > rte_writeb
> > rte_writew
> > rte_writel
> > rte_writeq
> > 
> > Thoughts ?
> > 
> 
> That seems like a lot of API calls!
> Perhaps you can clarify - why would the rte_smp_rmb() not work for you?

Currently arm64 maps rte_smp_rmb() to DMB for the SMP case.

Ideally, for the IO barrier and non-SMP case, we need to map it to DSB, which
is a bit heavier than DMB.

The linux kernel arm64 mappings
http://lxr.free-electrons.com/source/arch/arm64/include/asm/io.h#L142

DMB vs DSB
https://community.arm.com/thread/3833

The relaxed ones are without any barriers (use cases like accessing on-chip
peripherals may need only the relaxed versions).

Thoughts on new rte EAL abstraction?

> 
> /Bruce


[dpdk-dev] [RFC PATCH 0/7] RFC: EventDev Software PMD

2016-11-22 Thread Jerin Jacob
On Mon, Nov 21, 2016 at 09:48:56AM +, Bruce Richardson wrote:
> On Sat, Nov 19, 2016 at 03:53:25AM +0530, Jerin Jacob wrote:
> > On Thu, Nov 17, 2016 at 10:05:07AM +, Bruce Richardson wrote:
> > > > 2) device stats API can be based on capability, HW implementations may 
> > > > not
> > > > support all the stats
> > > 
> > > Yes, this is something we were thinking about. It would be nice if we
> > > could at least come up with a common set of stats - maybe even ones
> > > tracked at an eventdev API level, e.g. nb enqueues/dequeues. As well as
> > > that, we think the idea of an xstats API, like in ethdev, might work
> > > well. For our software implementation, having visibility into the
> > > scheduler behaviour can be important, so we'd like a way to report out
> > > things like internal queue depths etc.
> > >
> > 
> > Since these are not very generic hardware, I am not sure how much sense it
> > makes to have a generic stats API. But something similar to ethdev's
> > xstats (any capability-based scheme) works well. Look forward to seeing
> > the API proposal with common code.
> > 
> > Jerin
> > 
> Well, to start off with, some stats that could be tracked at the API
> level could be common. What about counts of number of enqueues and
> dequeues?
> 
> I suppose the other way we can look at this is: once we get a few
> implementations of the interface, we can look at the provided xstats
> values from each one, and see if there is anything common between them.

That makes more sense to me, as we don't have proposed counts. I think,
then, we should not use stats for functional tests as proposed. We could
verify the functional test with a scheme that embeds some value in the event
object on enqueue and later checks the same value on dequeue.

Jerin



> 
> /Bruce


[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-22 Thread Jerin Jacob
On Tue, Nov 22, 2016 at 12:43:58AM +0530, Jerin Jacob wrote:
> On Mon, Nov 21, 2016 at 05:45:51PM +, Eads, Gage wrote:
> > Hi Jerin,
> > 
> > I did a quick review and overall this implementation looks good. I noticed 
> > just one issue in rte_event_queue_setup(): the check of 
> > nb_atomic_order_sequences is being applied to atomic-type queues, but that 
> > field applies to ordered-type queues.
> 
> Thanks Gage. I will fix that in v2.
> 
> > 
> > One open issue I noticed is the "typical workflow" description starting in 
> > rte_eventdev.h:204 conflicts with the centralized software PMD that Harry 
> > posted last week. Specifically, that PMD expects a single core to call the 
> > schedule function. We could extend the documentation to account for this 
> > alternative style of scheduler invocation, or discuss ways to make the 
> > software PMD work with the documented workflow. I prefer the former, but 
> > either way I think we ought to expose the scheduler's expected usage to the 
> > user -- perhaps through an RTE_EVENT_DEV_CAP flag?
> 
> I prefer former too, you can propose the documentation change required for 
> software PMD.
> 
> On same note, If software PMD based workflow need  a separate core(s) for
> schedule function then, Can we hide that from API specification and pass an
> argument to SW pmd to define the scheduling core(s)?
> 
> Something like --vdev=eventsw0,schedule_cmask=0x2

Just a thought,

Perhaps we could introduce a generic "service" cores concept to DPDK to hide
the requirement where the implementation needs a dedicated core to do certain
work. I guess it would be useful for other NPU integrations in DPDK.

> 
> > 
> > Thanks,
> > Gage
> > 
> > >  -Original Message-
> > >  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > >  Sent: Thursday, November 17, 2016 11:45 PM
> > >  To: dev at dpdk.org
> > >  Cc: Richardson, Bruce ; Van Haaren, Harry
> > >  ; hemant.agrawal at nxp.com; Eads, Gage
> > >  ; Jerin Jacob 
> > >  Subject: [dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs
> > >  
> > >  This patch set defines the southbound driver interface
> > >  and implements the common code required for northbound
> > >  eventdev API interface.
> > >  
> > >  Signed-off-by: Jerin Jacob 
> > >  ---
> > >   config/common_base   |6 +
> > >   lib/Makefile |1 +
> > >   lib/librte_eal/common/include/rte_log.h  |1 +
> > >   lib/librte_eventdev/Makefile |   57 ++
> > >   lib/librte_eventdev/rte_eventdev.c   | 1211
> > >  ++
> > >   lib/librte_eventdev/rte_eventdev_pmd.h   |  504 +++
> > >   lib/librte_eventdev/rte_eventdev_version.map |   39 +
> > >   mk/rte.app.mk|1 +
> > >   8 files changed, 1820 insertions(+)
> > >   create mode 100644 lib/librte_eventdev/Makefile
> > >   create mode 100644 lib/librte_eventdev/rte_eventdev.c
> > >   create mode 100644 lib/librte_eventdev/rte_eventdev_pmd.h
> > >   create mode 100644 lib/librte_eventdev/rte_eventdev_version.map
> > >  
> > >  diff --git a/config/common_base b/config/common_base
> > >  index 4bff83a..7a8814e 100644
> > >  --- a/config/common_base
> > >  +++ b/config/common_base
> > >  @@ -411,6 +411,12 @@ CONFIG_RTE_LIBRTE_PMD_ZUC_DEBUG=n
> > >   CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > >  
> > >   #
> > >  +# Compile generic event device library
> > >  +#
> > >  +CONFIG_RTE_LIBRTE_EVENTDEV=y
> > >  +CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
> > >  +CONFIG_RTE_EVENT_MAX_DEVS=16
> > >  +CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
> > >   # Compile librte_ring
> > >   #
> > >   CONFIG_RTE_LIBRTE_RING=y
> > >  diff --git a/lib/Makefile b/lib/Makefile
> > >  index 990f23a..1a067bf 100644
> > >  --- a/lib/Makefile
> > >  +++ b/lib/Makefile
> > >  @@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CFGFILE) += librte_cfgfile
> > >   DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += librte_cmdline
> > >   DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
> > >   DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
> > >  +DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += librte_eventdev
> > >   DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
> > >   DIRS-$(CONFIG_RTE_LIBRTE_HASH) +=

[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-22 Thread Jerin Jacob
On Mon, Nov 21, 2016 at 05:45:51PM +, Eads, Gage wrote:
> Hi Jerin,
> 
> I did a quick review and overall this implementation looks good. I noticed 
> just one issue in rte_event_queue_setup(): the check of 
> nb_atomic_order_sequences is being applied to atomic-type queues, but that 
> field applies to ordered-type queues.

Thanks Gage. I will fix that in v2.

> 
> One open issue I noticed is the "typical workflow" description starting in 
> rte_eventdev.h:204 conflicts with the centralized software PMD that Harry 
> posted last week. Specifically, that PMD expects a single core to call the 
> schedule function. We could extend the documentation to account for this 
> alternative style of scheduler invocation, or discuss ways to make the 
> software PMD work with the documented workflow. I prefer the former, but 
> either way I think we ought to expose the scheduler's expected usage to the 
> user -- perhaps through an RTE_EVENT_DEV_CAP flag?

I prefer former too, you can propose the documentation change required for 
software PMD.

On same note, If software PMD based workflow need  a separate core(s) for
schedule function then, Can we hide that from API specification and pass an
argument to SW pmd to define the scheduling core(s)?

Something like --vdev=eventsw0,schedule_cmask=0x2

> 
> Thanks,
> Gage
> 
> >  -Original Message-
> >  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> >  Sent: Thursday, November 17, 2016 11:45 PM
> >  To: dev at dpdk.org
> >  Cc: Richardson, Bruce ; Van Haaren, Harry
> >  ; hemant.agrawal at nxp.com; Eads, Gage
> >  ; Jerin Jacob 
> >  Subject: [dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs
> >  
> >  This patch set defines the southbound driver interface
> >  and implements the common code required for northbound
> >  eventdev API interface.
> >  
> >  Signed-off-by: Jerin Jacob 
> >  ---
> >   config/common_base   |6 +
> >   lib/Makefile |1 +
> >   lib/librte_eal/common/include/rte_log.h  |1 +
> >   lib/librte_eventdev/Makefile |   57 ++
> >   lib/librte_eventdev/rte_eventdev.c   | 1211
> >  ++
> >   lib/librte_eventdev/rte_eventdev_pmd.h   |  504 +++
> >   lib/librte_eventdev/rte_eventdev_version.map |   39 +
> >   mk/rte.app.mk|1 +
> >   8 files changed, 1820 insertions(+)
> >   create mode 100644 lib/librte_eventdev/Makefile
> >   create mode 100644 lib/librte_eventdev/rte_eventdev.c
> >   create mode 100644 lib/librte_eventdev/rte_eventdev_pmd.h
> >   create mode 100644 lib/librte_eventdev/rte_eventdev_version.map
> >  
> >  diff --git a/config/common_base b/config/common_base
> >  index 4bff83a..7a8814e 100644
> >  --- a/config/common_base
> >  +++ b/config/common_base
> >  @@ -411,6 +411,12 @@ CONFIG_RTE_LIBRTE_PMD_ZUC_DEBUG=n
> >   CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> >  
> >   #
> >  +# Compile generic event device library
> >  +#
> >  +CONFIG_RTE_LIBRTE_EVENTDEV=y
> >  +CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
> >  +CONFIG_RTE_EVENT_MAX_DEVS=16
> >  +CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
> >   # Compile librte_ring
> >   #
> >   CONFIG_RTE_LIBRTE_RING=y
> >  diff --git a/lib/Makefile b/lib/Makefile
> >  index 990f23a..1a067bf 100644
> >  --- a/lib/Makefile
> >  +++ b/lib/Makefile
> >  @@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CFGFILE) += librte_cfgfile
> >   DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += librte_cmdline
> >   DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
> >   DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
> >  +DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += librte_eventdev
> >   DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
> >   DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash
> >   DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm
> >  diff --git a/lib/librte_eal/common/include/rte_log.h
> >  b/lib/librte_eal/common/include/rte_log.h
> >  index 29f7d19..9a07d92 100644
> >  --- a/lib/librte_eal/common/include/rte_log.h
> >  +++ b/lib/librte_eal/common/include/rte_log.h
> >  @@ -79,6 +79,7 @@ extern struct rte_logs rte_logs;
> >   #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */
> >   #define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */
> >   #define RTE_LOGTYPE_CRYPTODEV 0x0002 /**< Log related to
> >  cryptodev. */
> >  +#define RTE_LOGTYPE_EVENTDEV 0x0004 /**< Log related to eventdev.
> >  */
> >  
&

[dpdk-dev] [PATCH] eal: postpone vdev initialization

2016-11-21 Thread Jerin Jacob
On Mon, Nov 21, 2016 at 09:54:57AM +, Ferruh Yigit wrote:
> On 11/20/2016 8:00 AM, Jerin Jacob wrote:
> > Some platform like octeontx may use pci and
> > vdev based combined device to represent a logical
> > dpdk functional device. In such case, postponing the
> > vdev initialization after pci device
> > initialization will provide the better view of
> > the pci device resources in the system in
> > vdev's probe function, and it allows better
> > functional subsystem registration in vdev probe
> > function.
> > 
> > As a bonus, This patch fixes a bond device
> > initialization use case.
> > 
> > example command to reproduce the issue:
> > ./testpmd -c 0x2  --vdev 'eth_bond0,mode=0,
> > slave=:02:00.0,slave=:03:00.0' --
> > --port-topology=chained
> > 
> > root cause:
> > In existing case(vdev initialization and then pci
> > initialization), creates three Ethernet ports with
> > following port ids
> > 0 - Bond device
> > 1 - PCI device 0
> > 2 - PCI device 1
> > 
> > Since testpmd, calls the configure/start on all the ports on
> > start up,it will translate to following illegal setup sequence
> > 
> > 1)bond device configure/start
> > 1.1) pci device0 stop/configure/start
> > 1.2) pci device1 stop/configure/start
> > 2)pci device 0 configure(illegal setup case,
> > as device in start state)
> > 
> > The fix changes the initialization sequence and
> > allow initialization in following valid setup order
> > 1) pcie device 0 configure/start
> > 2) pcie device 1 configure/start
> > 3) bond device 2 configure/start
> > 3.1) pcie device 0/stop/configure/start
> > 3.2) pcie device 1/stop/configure/start
> > 
> > Signed-off-by: Jerin Jacob 
> > ---
> 
> This changes the port id assignments to the devices, right?
> 
> Previously virtual devices get first available port ids (0..N1), later
> physical devices (N1..N2). Now this becomes reverse.
> 
> Can this change break some existing user applications?

I guess it may affect only ethdev bond pmd based applications,
which are broken anyway.
Let me know what it takes to make forward progress on this patch. I can
fix the same in v2.

Jerin



[dpdk-dev] [PATCH] eal: postpone vdev initialization

2016-11-20 Thread Jerin Jacob
Some platforms like octeontx may use a pci and
vdev based combined device to represent a logical
dpdk functional device. In such a case, postponing
the vdev initialization until after pci device
initialization will provide a better view of
the pci device resources in the system in the
vdev's probe function, and it allows better
functional subsystem registration in the vdev
probe function.

As a bonus, this patch fixes a bond device
initialization use case.

example command to reproduce the issue:
./testpmd -c 0x2  --vdev 'eth_bond0,mode=0,
slave=:02:00.0,slave=:03:00.0' --
--port-topology=chained

root cause:
In existing case(vdev initialization and then pci
initialization), creates three Ethernet ports with
following port ids
0 - Bond device
1 - PCI device 0
2 - PCI device 1

Since testpmd calls configure/start on all the ports on
start up, it will translate to the following illegal setup sequence

1)bond device configure/start
1.1) pci device0 stop/configure/start
1.2) pci device1 stop/configure/start
2)pci device 0 configure(illegal setup case,
as device in start state)

The fix changes the initialization sequence and
allows initialization in the following valid setup order
1) pcie device 0 configure/start
2) pcie device 1 configure/start
3) bond device 2 configure/start
3.1) pcie device 0/stop/configure/start
3.2) pcie device 1/stop/configure/start

Signed-off-by: Jerin Jacob 
---
 lib/librte_eal/bsdapp/eal/eal.c   | 6 +++---
 lib/librte_eal/linuxapp/eal/eal.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 35e3117..2206277 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -577,9 +577,6 @@ rte_eal_init(int argc, char **argv)
rte_config.master_lcore, thread_id, cpuset,
ret == 0 ? "" : "...");

-   if (rte_eal_dev_init() < 0)
-   rte_panic("Cannot init pmd devices\n");
-
RTE_LCORE_FOREACH_SLAVE(i) {

/*
@@ -616,6 +613,9 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_probe())
rte_panic("Cannot probe PCI\n");

+   if (rte_eal_dev_init() < 0)
+   rte_panic("Cannot init pmd devices\n");
+
rte_eal_mcfg_complete();

return fctret;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 2075282..16dd5b9 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -841,9 +841,6 @@ rte_eal_init(int argc, char **argv)
rte_config.master_lcore, (int)thread_id, cpuset,
ret == 0 ? "" : "...");

-   if (rte_eal_dev_init() < 0)
-   rte_panic("Cannot init pmd devices\n");
-
if (rte_eal_intr_init() < 0)
rte_panic("Cannot init interrupt-handling thread\n");

@@ -887,6 +884,9 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_probe())
rte_panic("Cannot probe PCI\n");

+   if (rte_eal_dev_init() < 0)
+   rte_panic("Cannot init pmd devices\n");
+
rte_eal_mcfg_complete();

return fctret;
-- 
2.5.5



[dpdk-dev] [PATCH 0/4] libeventdev API and northbound implementation

2016-11-19 Thread Jerin Jacob
On Fri, Nov 18, 2016 at 04:04:29PM +, Bruce Richardson wrote:
> +Thomas
> 
> On Fri, Nov 18, 2016 at 03:25:18PM +, Bruce Richardson wrote:
> > On Fri, Nov 18, 2016 at 11:14:58AM +0530, Jerin Jacob wrote:
> > > As previously discussed in RFC v1 [1], RFC v2 [2], with changes
> > > described in [3] (also pasted below), here is the first non-draft series
> > > for this new API.
> > > 
> > > [1] http://dpdk.org/ml/archives/dev/2016-August/045181.html
> > > [2] http://dpdk.org/ml/archives/dev/2016-October/048592.html
> > > [3] http://dpdk.org/ml/archives/dev/2016-October/048196.html
> > > 
> > > Changes since RFC v2:
> > > 
> > > - Updated the documentation to define the need for this library[Jerin]
> > > - Added RTE_EVENT_QUEUE_CFG_*_ONLY configuration parameters in
> > >   struct rte_event_queue_conf to enable optimized sw implementation 
> > > [Bruce]
> > > - Introduced RTE_EVENT_OP* ops [Bruce]
> > > - Added nb_event_queue_flows,nb_event_port_dequeue_depth, 
> > > nb_event_port_enqueue_depth
> > >   in rte_event_dev_configure() like ethdev and crypto library[Jerin]
> > > - Removed rte_event_release() and replaced with RTE_EVENT_OP_RELEASE ops 
> > > to
> > >   reduce fast path APIs and it is redundant too[Jerin]
> > > - In the view of better application portability, Removed pin_event
> > >   from rte_event_enqueue as it is just hint and Intel/NXP can not support 
> > > it[Jerin]
> > > - Added rte_event_port_links_get()[Jerin]
> > > - Added rte_event_dev_dump[Harry]
> > > 
> > > Notes:
> > > 
> > > - This patch set is check-patch clean with an exception that
> > > 02/04 has one WARNING:MACRO_WITH_FLOW_CONTROL
> > > - Looking forward to getting additional maintainers for libeventdev
> > > 
> > > 
> > > Possible next steps:
> > > 1) Review this patch set
> > > 2) Integrate Intel's SW driver[http://dpdk.org/dev/patchwork/patch/17049/]
> > > 3) Review proposed examples/eventdev_pipeline 
> > > application[http://dpdk.org/dev/patchwork/patch/17053/]
> > > 4) Review proposed functional 
> > > tests[http://dpdk.org/dev/patchwork/patch/17051/]
> > > 5) Cavium's HW based eventdev driver
> > > 
> > > I am planning to work on (3),(4) and (5)
> > > 
> > Thanks Jerin,
> > 
> > we'll review and get back to you with any comments or feedback (1), and
> > obviously start working on item (2) also! :-)
> > 
> > I'm also wonder whether we should have a staging tree for this work to
> > make interaction between us easier. Although this may not be
> > finalised enough for 17.02 release, do you think having an
> > dpdk-eventdev-next tree would be a help? My thinking is that once we get
> > the eventdev library itself in reasonable shape following our review, we
> > could commit that and make any changes thereafter as new patches, rather
> > than constantly respinning the same set. It also gives us a clean git
> > tree to base the respective driver implementations on from our two sides.
> > 
> > Thomas, any thoughts here on your end - or from anyone else?

I was thinking more or less along the same lines. To avoid re-spinning the
same set, it is better to have the libeventdev library marked as EXPERIMENTAL
and commit it somewhere on dpdk-eventdev-next or the main tree.

I think, EXPERIMENTAL status can be changed only when
- At least two event drivers available
- Functional test applications work fine with at least two drivers
- Portable example application to showcase the features of the library
- eventdev integration with another dpdk subsystem such as ethdev

Jerin

> > 
> > Regards,
> > /Bruce
> > 


[dpdk-dev] Proposal for a new Committer model

2016-11-19 Thread Jerin Jacob
On Fri, Nov 18, 2016 at 01:09:35PM -0500, Neil Horman wrote:
> On Thu, Nov 17, 2016 at 09:20:50AM +, Mcnamara, John wrote:
> > Repost from the moving at dpdk.org mailing list to get a wider audience.
> > Original thread: 
> > http://dpdk.org/ml/archives/moving/2016-November/59.html
> > 
> > 
> > Hi,
> > 
> > I'd like to propose a change to the DPDK committer model. Currently we have 
> > one committer for the master branch of the DPDK project. 
> > 
> > One committer to master represents a single point of failure and at times 
> > can be inefficient. There is also no agreed cover for times when the 
> > committer is unavailable such as vacation, public holidays, etc. I propose 
> > that we change to a multi-committer model for the DPDK project. We should 
> > have three committers for each release that can commit changes to the 
> > master branch.
> >  
> > There are a number of benefits:
> >  
> > 1. Greater capacity to commit patches.
> > 2. No single points of failure - a committer should always be available if 
> > we have three.
> > 3. A more timely committing of patches. More committers should equal a 
> > faster turnaround - ideally, maintainers should also provide feedback on 
> > patches submitted within a 2-3 day period, as much as possible, to 
> > facilitate this. 
> > 4. It follows best practice in creating a successful multi-vendor community 
> > - to achieve this we must ensure there is a level playing field for all 
> > participants, no single person should be required to make all of the 
> > decisions on patches to be included in the release.  
> > 
> > Having multiple committers will require some degree of co-ordination but 
> > there are a number of other communities successfully following this model 
> > such as Apache, OVS, FD.io, OpenStack etc. so the approach is workable.
> > 
> > John
> 
> I agree that the problems you are attempting to address exist and are
> worth finding a solution for.  That said, I don't think the solution you
> are proposing is the ideal, or complete fix for any of the issues being
> addressed.
> 
> If I may, I'd like to enumerate the issues I think you are trying to
> address based on your comments above, then make a counter-proposal for a
> solution:
> 
> Problems to address:
> 
> 1) high-availability - There is a desire to make sure that, when patches
> are proposed, they are integrated in a timely fashion.
> 
> 2) high-throughput - DPDK has a large volume of patches, more than one
> person can normally integrate.  There is a desire to shard that work such
> that it is handled by multiple individuals
> 
> 3) Multi-Vendor fairness - There is a desire for multiple vendors to feel
> as though the project tree maintainer isn't biased toward any individual
> vendor.
> 
> To solve these I would propose the following solution (which is similar
> to, but not quite identical, to yours).
> 
> A) Further promote subtree maintainership.  This was a conversation that I
> proposed some time ago, but my proposed granularity was discarded in favor
> of something that hasn't worked as well (in my opinion).  That is to say a
> few driver pmds (i40e and fm10k come to mind) have their own tree that
> send pull requests to Thomas.  We should be sharding that at a much higher
> granularity and using it much more consistently.  That is to say, that we
> should have a maintainer for all the ethernet pmds, and another for the
> crypto pmds, another for the core eal layer, another for misc libraries
> that have low patch volumes, etc.  Each of those subdivisions should have
> their own list to communicate on, and each should have a tree that
> integrates patches for their own subsystem, and they should on a regular
> cycle send pull requests to Thomas.  Thomas in turn should by and large,
> only be integrating pull requests.  This should address our high-
> throughput issue, in that it will allow multiple maintainers to share the
> workload, and integration should be relatively easy.

+1

> 
> B) Designate alternates to serve as backups for the maintainer when they
> are unavailable.  This provides high-availability, and sounds very much
> like your proposal, but in the interests of clarity, there is still a
> single maintainer at any one time, it just may change to ensure the
> continued merging of patches, if the primary maintainer isn't available.
> Ideally however, those backup alternates arent needed, because most of the
> primary maintainers work in merging pull requests, which are done based on
> the trust of the submaintainer, and done during a very limited window of
> time.  This also partially addresses multi-vendor fairness if your subtree
> maintainers come from multiple participating companies.

+1

> 
> Regards
> Neil
> 
> 


[dpdk-dev] [PATCH 4/4] app/test: unit test case for eventdev APIs

2016-11-18 Thread Jerin Jacob
This commit adds basic unit tests for the eventdev API.

commands to run the test app:
./build/app/test -c 2
RTE>>eventdev_common_autotest

Signed-off-by: Jerin Jacob 
---
 MAINTAINERS  |   1 +
 app/test/Makefile|   2 +
 app/test/test_eventdev.c | 776 +++
 3 files changed, 779 insertions(+)
 create mode 100644 app/test/test_eventdev.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c594a23..887f133 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -252,6 +252,7 @@ F: examples/l2fwd-crypto/
 Eventdev API - EXPERIMENTAL
 M: Jerin Jacob 
 F: lib/librte_eventdev/
+F: app/test/test_eventdev*
 F: drivers/event/skeleton/

 Networking Drivers
diff --git a/app/test/Makefile b/app/test/Makefile
index 5be023a..e28c079 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -197,6 +197,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += 
test_cryptodev_blockcipher.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev.c

+SRCS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += test_eventdev.c
+
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c

 CFLAGS += -O3
diff --git a/app/test/test_eventdev.c b/app/test/test_eventdev.c
new file mode 100644
index 000..e876804
--- /dev/null
+++ b/app/test/test_eventdev.c
@@ -0,0 +1,776 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Cavium networks. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *  * Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ *  * Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in
+ *the documentation and/or other materials provided with the
+ *distribution.
+ *  * Neither the name of Cavium networks nor the names of its
+ *contributors may be used to endorse or promote products derived
+ *from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define TEST_DEV_NAME EVENTDEV_NAME_SKELETON_PMD
+
+static inline uint8_t
+test_dev_id_get(void)
+{
+   return rte_event_dev_get_dev_id(RTE_STR(TEST_DEV_NAME)"_0");
+}
+
+static int
+testsuite_setup(void)
+{
+   return rte_eal_vdev_init(RTE_STR(TEST_DEV_NAME), NULL);
+}
+
+static void
+testsuite_teardown(void)
+{
+}
+
+static int
+test_eventdev_count(void)
+{
+   uint8_t count;
+   count = rte_event_dev_count();
+   TEST_ASSERT(count > 0, "Invalid eventdev count %" PRIu8, count);
+   return TEST_SUCCESS;
+}
+
+static int
+test_eventdev_get_dev_id(void)
+{
+   int ret;
+   ret = rte_event_dev_get_dev_id(RTE_STR(TEST_DEV_NAME)"_0");
+   TEST_ASSERT(ret >= 0, "Failed to get dev_id %d", ret);
+   ret = rte_event_dev_get_dev_id("not_a_valid_ethdev_driver");
+   TEST_ASSERT_FAIL(ret, "Expected <0 for invalid dev name ret=%d", ret);
+   return TEST_SUCCESS;
+}
+
+static int
+test_eventdev_socket_id(void)
+{
+   int ret, socket_id;
+   ret = rte_event_dev_get_dev_id(RTE_STR(TEST_DEV_NAME)"_0");
+   socket_id = rte_event_dev_socket_id(ret);
+   TEST_ASSERT(socket_id != -EINVAL, "Failed to get socket_id %d",
+   socket_id);
+   socket_id = rte_event_dev_socket_id(RTE_EVENT_MAX_DEVS);
+   TEST_ASSERT(socket_id == -EINVAL, "Expected -EINVAL %d", socket_id);
+
+   return TEST_SUCCESS;
+}
+
+static int
+test_eventdev_info_get(void)
+{
+   int ret;
+   struct rte_event_dev_info info;
+   ret = rte_event_dev_info_get(test_dev_id_get(), NULL);
+   TEST_ASSERT(ret == -EINVAL, "Expected -EINVAL, %d", ret);
+   ret = rte_event_dev_info_get(test_dev_id_get(), &info);
+   TEST_ASSERT_SUCCESS

[dpdk-dev] [PATCH 2/4] eventdev: implement the northbound APIs

2016-11-18 Thread Jerin Jacob
This patch set defines the southbound driver interface
and implements the common code required for northbound
eventdev API interface.

Signed-off-by: Jerin Jacob 
---
 config/common_base   |6 +
 lib/Makefile |1 +
 lib/librte_eal/common/include/rte_log.h  |1 +
 lib/librte_eventdev/Makefile |   57 ++
 lib/librte_eventdev/rte_eventdev.c   | 1211 ++
 lib/librte_eventdev/rte_eventdev_pmd.h   |  504 +++
 lib/librte_eventdev/rte_eventdev_version.map |   39 +
 mk/rte.app.mk|1 +
 8 files changed, 1820 insertions(+)
 create mode 100644 lib/librte_eventdev/Makefile
 create mode 100644 lib/librte_eventdev/rte_eventdev.c
 create mode 100644 lib/librte_eventdev/rte_eventdev_pmd.h
 create mode 100644 lib/librte_eventdev/rte_eventdev_version.map

diff --git a/config/common_base b/config/common_base
index 4bff83a..7a8814e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -411,6 +411,12 @@ CONFIG_RTE_LIBRTE_PMD_ZUC_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y

 #
+# Compile generic event device library
+#
+CONFIG_RTE_LIBRTE_EVENTDEV=y
+CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
+CONFIG_RTE_EVENT_MAX_DEVS=16
+CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
diff --git a/lib/Makefile b/lib/Makefile
index 990f23a..1a067bf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CFGFILE) += librte_cfgfile
 DIRS-$(CONFIG_RTE_LIBRTE_CMDLINE) += librte_cmdline
 DIRS-$(CONFIG_RTE_LIBRTE_ETHER) += librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += librte_cryptodev
+DIRS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += librte_eventdev
 DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
 DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash
 DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm
diff --git a/lib/librte_eal/common/include/rte_log.h 
b/lib/librte_eal/common/include/rte_log.h
index 29f7d19..9a07d92 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -79,6 +79,7 @@ extern struct rte_logs rte_logs;
 #define RTE_LOGTYPE_PIPELINE 0x8000 /**< Log related to pipeline. */
 #define RTE_LOGTYPE_MBUF0x0001 /**< Log related to mbuf. */
 #define RTE_LOGTYPE_CRYPTODEV 0x0002 /**< Log related to cryptodev. */
+#define RTE_LOGTYPE_EVENTDEV 0x0004 /**< Log related to eventdev. */

 /* these log types can be used in an application */
 #define RTE_LOGTYPE_USER1   0x0100 /**< User-defined log type 1. */
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
new file mode 100644
index 000..dac0663
--- /dev/null
+++ b/lib/librte_eventdev/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Cavium networks. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Cavium networks nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_eventdev.a
+
+# library version
+LIBABIVER := 1
+
+# build flags
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+# library source files
+SRCS-y += rte_eventdev.c
+
+# export include files
+SYMLINK-y-include += rte_eventdev.h
+SYMLINK-y-include += rte_eventdev_pmd.h
+
+# versioning export map
+EXPORT_MAP := rte_eventdev_version.map
+
+# library dependencies
+DEPDIRS-y += lib/librte_eal
+DEPDIRS-y += lib/librte_mbuf
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_e

[dpdk-dev] [PATCH 1/4] eventdev: introduce event driven programming model

2016-11-18 Thread Jerin Jacob
In a polling model, lcores poll ethdev ports and associated
rx queues directly to look for packets. In an event driven model,
by contrast, lcores call the scheduler that selects packets for
them based on programmer-specified criteria. The eventdev library
adds support for the event driven programming model, which offers
applications automatic multicore scaling, dynamic load balancing,
pipelining, packet ingress order maintenance and
synchronization services to simplify application packet processing.

By introducing event driven programming model, DPDK can support
both polling and event driven programming models for packet processing,
and applications are free to choose whatever model
(or combination of the two) that best suits their needs.

Signed-off-by: Jerin Jacob 
---
 MAINTAINERS|3 +
 doc/api/doxy-api-index.md  |1 +
 doc/api/doxy-api.conf  |1 +
 lib/librte_eventdev/rte_eventdev.h | 1439 
 4 files changed, 1444 insertions(+)
 create mode 100644 lib/librte_eventdev/rte_eventdev.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d6bb8f8..e430ca7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -249,6 +249,9 @@ F: lib/librte_cryptodev/
 F: app/test/test_cryptodev*
 F: examples/l2fwd-crypto/

+Eventdev API - EXPERIMENTAL
+M: Jerin Jacob 
+F: lib/librte_eventdev/

 Networking Drivers
 ------------------
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..28c1329 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -40,6 +40,7 @@ There are many libraries, so their headers may be grouped by 
topics:
   [ethdev] (@ref rte_ethdev.h),
   [ethctrl](@ref rte_eth_ctrl.h),
   [cryptodev]  (@ref rte_cryptodev.h),
+  [eventdev]   (@ref rte_eventdev.h),
   [devargs](@ref rte_devargs.h),
   [bond]   (@ref rte_eth_bond.h),
   [vhost]  (@ref rte_virtio_net.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 9dc7ae5..9841477 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -41,6 +41,7 @@ INPUT   = doc/api/doxy-api-index.md \
   lib/librte_cryptodev \
   lib/librte_distributor \
   lib/librte_ether \
+  lib/librte_eventdev \
   lib/librte_hash \
   lib/librte_ip_frag \
   lib/librte_jobstats \
diff --git a/lib/librte_eventdev/rte_eventdev.h 
b/lib/librte_eventdev/rte_eventdev.h
new file mode 100644
index 000..778d6dc
--- /dev/null
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -0,0 +1,1439 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright 2016 Cavium.
+ *   Copyright 2016 Intel Corporation.
+ *   Copyright 2016 NXP.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Cavium nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_EVENTDEV_H_
+#define _RTE_EVENTDEV_H_
+
+/**
+ * @file
+ *
+ * RTE Event Device API
+ *
+ * In a polling model, lcores poll ethdev ports and associated rx queues
+ * directly to look for packets. In an event driven model, by contrast, lcores
+ * call the scheduler that selects packets for them based on programmer
+ * specified criteria. The eventdev library adds support for the event driven
+ * programming model, which offers applications automatic multicore scaling,
+ * dynamic load balancing, pipelining, packet ingress order maintenance and
+ * synchronization services to simplify application packet processing.

[dpdk-dev] [PATCH 0/4] libeventdev API and northbound implementation

2016-11-18 Thread Jerin Jacob
As previously discussed in RFC v1 [1], RFC v2 [2], with changes
described in [3] (also pasted below), here is the first non-draft series
for this new API.

[1] http://dpdk.org/ml/archives/dev/2016-August/045181.html
[2] http://dpdk.org/ml/archives/dev/2016-October/048592.html
[3] http://dpdk.org/ml/archives/dev/2016-October/048196.html

Changes since RFC v2:

- Updated the documentation to define the need for this library[Jerin]
- Added RTE_EVENT_QUEUE_CFG_*_ONLY configuration parameters in
  struct rte_event_queue_conf to enable optimized sw implementation [Bruce]
- Introduced RTE_EVENT_OP* ops [Bruce]
- Added nb_event_queue_flows, nb_event_port_dequeue_depth and
  nb_event_port_enqueue_depth in rte_event_dev_configure(), like the ethdev
  and crypto libraries [Jerin]
- Removed rte_event_release() and replaced with RTE_EVENT_OP_RELEASE ops to
  reduce fast path APIs and it is redundant too[Jerin]
- In the interest of better application portability, removed pin_event
  from rte_event_enqueue as it is just a hint and Intel/NXP cannot support
  it [Jerin]
- Added rte_event_port_links_get()[Jerin]
- Added rte_event_dev_dump[Harry]

Notes:

- This patch set is checkpatch clean, with the exception that
02/04 has one WARNING:MACRO_WITH_FLOW_CONTROL
- Looking forward to getting additional maintainers for libeventdev


Possible next steps:
1) Review this patch set
2) Integrate Intel's SW driver[http://dpdk.org/dev/patchwork/patch/17049/]
3) Review proposed examples/eventdev_pipeline 
application[http://dpdk.org/dev/patchwork/patch/17053/]
4) Review proposed functional tests[http://dpdk.org/dev/patchwork/patch/17051/]
5) Cavium's HW based eventdev driver

I am planning to work on (3), (4) and (5)

TODO:
1) Example applications for pipelining, packet ingress order maintenance with
ORDERED type and ATOMIC synchronization services.
2) Create user guide


Jerin Jacob (4):
  eventdev: introduce event driven programming model
  eventdev: implement the northbound APIs
  event/skeleton: add skeleton eventdev driver
  app/test: unit test case for eventdev APIs

 MAINTAINERS|5 +
 app/test/Makefile  |2 +
 app/test/test_eventdev.c   |  776 +++
 config/common_base |   14 +
 doc/api/doxy-api-index.md  |1 +
 doc/api/doxy-api.conf  |1 +
 drivers/Makefile   |1 +
 drivers/event/Makefile |   36 +
 drivers/event/skeleton/Makefile|   55 +
 .../skeleton/rte_pmd_skeleton_event_version.map|4 +
 drivers/event/skeleton/skeleton_eventdev.c |  535 
 drivers/event/skeleton/skeleton_eventdev.h |   72 +
 lib/Makefile   |1 +
 lib/librte_eal/common/include/rte_log.h|1 +
 lib/librte_eventdev/Makefile   |   57 +
 lib/librte_eventdev/rte_eventdev.c | 1211 
 lib/librte_eventdev/rte_eventdev.h | 1439 
 lib/librte_eventdev/rte_eventdev_pmd.h |  504 +++
 lib/librte_eventdev/rte_eventdev_version.map   |   39 +
 mk/rte.app.mk  |5 +
 20 files changed, 4759 insertions(+)
 create mode 100644 app/test/test_eventdev.c
 create mode 100644 drivers/event/Makefile
 create mode 100644 drivers/event/skeleton/Makefile
 create mode 100644 drivers/event/skeleton/rte_pmd_skeleton_event_version.map
 create mode 100644 drivers/event/skeleton/skeleton_eventdev.c
 create mode 100644 drivers/event/skeleton/skeleton_eventdev.h
 create mode 100644 lib/librte_eventdev/Makefile
 create mode 100644 lib/librte_eventdev/rte_eventdev.c
 create mode 100644 lib/librte_eventdev/rte_eventdev.h
 create mode 100644 lib/librte_eventdev/rte_eventdev_pmd.h
 create mode 100644 lib/librte_eventdev/rte_eventdev_version.map

-- 
2.5.5



[dpdk-dev] [RFC PATCH 0/7] RFC: EventDev Software PMD

2016-11-17 Thread Jerin Jacob
On Wed, Nov 16, 2016 at 06:00:00PM +, Harry van Haaren wrote:
> This series of RFC patches implements the libeventdev API and a software
> eventdev PMD.
> 
> The implementation here is intended to enable the community to use the
> eventdev API, specifically to test if the API serves the purpose that it is
> designed to. It should be noted this is an RFC implementation, and hence
> there should be no performance expectations.
> 
> An RFC for the eventdev was sent in August[1] by Jerin Jacob of Cavium,
> which introduced the core concepts of the eventdev to the community. Since
> then there has been extensive discussion[2] on the mailing list, which had
> led to various modifications to the initial proposed API.
> 
> The API as presented in the first patch contains a number of changes that
> have not yet been discussed. These changes were noticed during the
> implementation of the software eventdev PMD, and were added to the API to
> enable completion of the PMD. These modifications include a statistics API
> and a dump API. For more details, please refer to the commit message of the
> patch itself.
> 
> The functionality provided by each of the patches is as follows:
>   1: Add eventdev API and library infrastructure
>   2: Enable compilation of library
>   3: Add software eventdev PMD
>   4: Enable compilation of PMD
>   5: Add test code
>   6: Enable test code compilation
>   7: Sample application demonstrating basic usage
> 
> This breakdown of the patchset hopefully enables the community to experiment
> with the eventdev API, and allows us all to gain first-hand experience in
> using the eventdev API.  Note also that this patchset has not passed
> checkpatch testing just yet - will fix for v2 :)
> 
> As next steps I see value in discussing the proposed changes included in
> this version of the header file, while welcoming feedback from the community
> on the API in general too.

Thanks, Harry.

I was writing similar stuff as well. I took a bit different approach on
the common code side, where I was trying to have fat common code
(lib/librte_eventdev/rte_eventdev.c) with start/stop support for the
slow-path code. I will post the implementation in a few days and then we
can work on a converged solution.

Following sections of code does not have any overlap at all.
test/eventdev: unit and functional tests
event/sw: software eventdev implementation
examples/eventdev_pipeline: adding example

Some questions and initial feedback:
1) I thought RTE_EVENT_OP_DROP and rte_event_release() are the same? No?
2) The device stats API can be based on capability; HW implementations may not
support all the stats.
3) From the HW implementation perspective, the eventdev_pipeline application
needs to have a lot of changes. I will post the comments in the coming days
and we can work together on the converged solution.

Jerin


> 
> Signed-off-by: Harry van Haaren 
> 
> [1] http://dpdk.org/ml/archives/dev/2016-August/045181.html
> [2] http://dpdk.org/ml/archives/dev/2016-October/thread.html#48196
> 
> Harry van Haaren (7):
>   eventdev: header and implementation
>   eventdev: makefiles
>   event/sw: software eventdev implementation
>   event/sw: makefiles and config
>   test/eventdev: unit and functional tests
>   test/eventdev: unit func makefiles
>   examples/eventdev_pipeline: adding example
> 
>  app/test/Makefile |3 +
>  app/test/test_eventdev_func.c | 1272 
>  app/test/test_eventdev_unit.c |  557 +++
>  config/common_base|   12 +
>  drivers/Makefile  |1 +
>  drivers/event/Makefile|   36 +
>  drivers/event/sw/Makefile |   59 ++
>  drivers/event/sw/event_ring.h |  142 +++
>  drivers/event/sw/iq_ring.h|  160 +++
>  drivers/event/sw/rte_pmd_evdev_sw_version.map |3 +
>  drivers/event/sw/sw_evdev.c   |  619 
>  drivers/event/sw/sw_evdev.h   |  234 +
>  drivers/event/sw/sw_evdev_scheduler.c |  660 +
>  drivers/event/sw/sw_evdev_worker.c|  218 +
>  examples/eventdev_pipeline/Makefile   |   49 +
>  examples/eventdev_pipeline/main.c |  717 ++
>  lib/Makefile  |1 +
>  lib/librte_eal/common/include/rte_vdev.h  |1 +
>  lib/librte_eventdev/Makefile  |   54 ++
>  lib/librte_eventdev/rte_eventdev.c|  466 +
>  lib/librte_eventdev/rte_eventdev.h| 1289 
> +
>  lib/librte_eventdev/rte_eventdev_ops.h|  177 
>  lib/librte_eventdev/rte_eventdev_pmd.h   

[dpdk-dev] [PATCH] cryptodev: fix crash on null dereference

2016-11-16 Thread Jerin Jacob
cryptodev->data->name will be null when
rte_cryptodev_get_dev_id() invoked without a valid
crypto device instance.

Signed-off-by: Jerin Jacob 
---
 lib/librte_cryptodev/rte_cryptodev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
b/lib/librte_cryptodev/rte_cryptodev.c
index 127e8d0..54e95d5 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -225,13 +225,14 @@ rte_cryptodev_create_vdev(const char *name, const char 
*args)
 }

 int
-rte_cryptodev_get_dev_id(const char *name) {
+rte_cryptodev_get_dev_id(const char *name)
+{
unsigned i;

if (name == NULL)
return -1;

-   for (i = 0; i < rte_cryptodev_globals->max_devs; i++)
+   for (i = 0; i < rte_cryptodev_globals->nb_devs; i++)
if ((strcmp(rte_cryptodev_globals->devs[i].data->name, name)
== 0) &&
(rte_cryptodev_globals->devs[i].attached ==
-- 
2.5.5



[dpdk-dev] pmdinfogen issues: cross compilation for ARM fails with older host compiler

2016-11-14 Thread Jerin Jacob
On Fri, Nov 11, 2016 at 10:34:39AM +, Hemant Agrawal wrote:
> Hi Neil,
>Pmdinfogen compiles with the host compiler. It uses
> rte_byteorder.h of the target platform.
> However, if the host compiler is older than 4.8, it will be an issue during 
> cross compilation for some platforms.
> e.g. if we are compiling on x86 host for ARM, x86 host compiler will not 
> understand the arm asm instructions.
> 
> /* fix missing __builtin_bswap16 for gcc older then 4.8 */
> #if !(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
> static inline uint16_t rte_arch_bswap16(uint16_t _x)
> {
>register uint16_t x = _x;
>asm volatile ("rev16 %0,%1"
> : "=r" (x)
> : "r" (x)
> );
>return x;
> }
> #endif
> 
> One easy solution is that we add compiler platform check in this code section 
> of rte_byteorder.h
> e.g
> #if !(defined __arm__ || defined __aarch64__)
> static inline uint16_t rte_arch_bswap16(uint16_t _x)
> {
>return (_x >> 8) | ((_x << 8) & 0xff00);
> }
> #else ?.
> 
> Is there a better way to fix it?

IMO, it is a HOST build infrastructure issue. If a host app is using a
DPDK service then it should compile and link against the HOST target (in this
specific case, build/x86_64-native-linuxapp-gcc). I think introducing a
HOSTTARGET kind of scheme is a clean solution.

/Jerin



[dpdk-dev] [PATCH] i40e: Fix eth_i40e_dev_init sequence on ThunderX

2016-11-10 Thread Jerin Jacob
On Thu, Nov 10, 2016 at 04:04:27AM -0800, Satha Rao wrote:
> i40e_asq_send_command: rd32 & wr32 under ThunderX give unpredictable
> results. To solve this, include rte memory barriers
> 
> Signed-off-by: Satha Rao 
> ---
>  drivers/net/i40e/base/i40e_adminq.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/i40e/base/i40e_adminq.c 
> b/drivers/net/i40e/base/i40e_adminq.c
> index 0d3a83f..1038a95 100644
> --- a/drivers/net/i40e/base/i40e_adminq.c
> +++ b/drivers/net/i40e/base/i40e_adminq.c
> @@ -832,6 +832,7 @@ enum i40e_status_code i40e_asq_send_command(struct 
> i40e_hw *hw,
>   }
>  
>   val = rd32(hw, hw->aq.asq.head);
> + rte_rmb();

use rte_smp_rmb() variant to avoid performance regression on x86

>   if (val >= hw->aq.num_asq_entries) {
>   i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE,
>  "AQTX: head overrun at %d\n", val);
> @@ -929,8 +930,10 @@ enum i40e_status_code i40e_asq_send_command(struct 
> i40e_hw *hw,
>   (hw->aq.asq.next_to_use)++;
>   if (hw->aq.asq.next_to_use == hw->aq.asq.count)
>   hw->aq.asq.next_to_use = 0;
> - if (!details->postpone)
> + if (!details->postpone) {
>   wr32(hw, hw->aq.asq.tail, hw->aq.asq.next_to_use);
> + rte_wmb();

ditto

> + }
>  
>   /* if cmd_details are not defined or async flag is not set,
>* we need to wait for desc write back
> -- 
> 2.7.4
> 


[dpdk-dev] [PATCH 2/2] net/thunderx: add cn83xx support

2016-11-08 Thread Jerin Jacob
The 83xx NIC subsystem differs in its new PCI subsystem_device_id and the
NICVF_CAP_DISABLE_APAD capability.

Signed-off-by: Jerin Jacob 
---
 doc/guides/nics/thunderx.rst | 1 +
 drivers/net/thunderx/base/nicvf_hw.c | 4 
 drivers/net/thunderx/base/nicvf_hw.h | 1 +
 drivers/net/thunderx/nicvf_ethdev.c  | 7 +++
 4 files changed, 13 insertions(+)

diff --git a/doc/guides/nics/thunderx.rst b/doc/guides/nics/thunderx.rst
index 9763bb6..187c9a4 100644
--- a/doc/guides/nics/thunderx.rst
+++ b/doc/guides/nics/thunderx.rst
@@ -62,6 +62,7 @@ Supported ThunderX SoCs
 -----------------------
 - CN88xx
 - CN81xx
+- CN83xx

 Prerequisites
 -------------
diff --git a/drivers/net/thunderx/base/nicvf_hw.c 
b/drivers/net/thunderx/base/nicvf_hw.c
index a69cd02..04b3b69 100644
--- a/drivers/net/thunderx/base/nicvf_hw.c
+++ b/drivers/net/thunderx/base/nicvf_hw.c
@@ -146,6 +146,10 @@ nicvf_base_init(struct nicvf *nic)
if (nicvf_hw_version(nic) == PCI_SUB_DEVICE_ID_CN81XX_NICVF)
nic->hwcap |= NICVF_CAP_TUNNEL_PARSING | NICVF_CAP_CQE_RX2;

+   if (nicvf_hw_version(nic) == PCI_SUB_DEVICE_ID_CN83XX_NICVF)
+   nic->hwcap |= NICVF_CAP_TUNNEL_PARSING | NICVF_CAP_CQE_RX2 |
+   NICVF_CAP_DISABLE_APAD;
+
return NICVF_OK;
 }

diff --git a/drivers/net/thunderx/base/nicvf_hw.h 
b/drivers/net/thunderx/base/nicvf_hw.h
index cf68be9..14fb2fe 100644
--- a/drivers/net/thunderx/base/nicvf_hw.h
+++ b/drivers/net/thunderx/base/nicvf_hw.h
@@ -43,6 +43,7 @@
 #definePCI_SUB_DEVICE_ID_CN88XX_PASS1_NICVF0xA11E
 #definePCI_SUB_DEVICE_ID_CN88XX_PASS2_NICVF0xA134
 #definePCI_SUB_DEVICE_ID_CN81XX_NICVF  0xA234
+#definePCI_SUB_DEVICE_ID_CN83XX_NICVF  0xA334

 #define NICVF_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

diff --git a/drivers/net/thunderx/nicvf_ethdev.c 
b/drivers/net/thunderx/nicvf_ethdev.c
index 501c8c2..466e49c 100644
--- a/drivers/net/thunderx/nicvf_ethdev.c
+++ b/drivers/net/thunderx/nicvf_ethdev.c
@@ -2097,6 +2097,13 @@ static const struct rte_pci_id pci_id_nicvf_map[] = {
.subsystem_device_id = PCI_SUB_DEVICE_ID_CN81XX_NICVF,
},
{
+   .class_id = RTE_CLASS_ANY_ID,
+   .vendor_id = PCI_VENDOR_ID_CAVIUM,
+   .device_id = PCI_DEVICE_ID_THUNDERX_NICVF,
+   .subsystem_vendor_id = PCI_VENDOR_ID_CAVIUM,
+   .subsystem_device_id = PCI_SUB_DEVICE_ID_CN83XX_NICVF,
+   },
+   {
.vendor_id = 0,
},
 };
-- 
2.5.5



[dpdk-dev] [PATCH 1/2] net/thunderx: disable l3 alignment pad feature

2016-11-08 Thread Jerin Jacob
Based on the packet type (IPv4 or IPv6), the nicvf HW aligns
L3 data to a 64-bit memory address.
The alignment creates a hole in the mbuf (between the
end of the headroom and the packet data start).
The new revision of the HW provides an option to disable
the L3 alignment feature and make the mbuf layout look
more like other NICs'. For better application compatibility,
disable the L3 alignment feature on the hardware revisions that support it.

Signed-off-by: Jerin Jacob 
---
 drivers/net/thunderx/base/nicvf_hw.c  | 18 ++
 drivers/net/thunderx/base/nicvf_hw.h  |  4 
 drivers/net/thunderx/base/nicvf_hw_defs.h |  2 ++
 drivers/net/thunderx/nicvf_ethdev.c   | 10 ++
 4 files changed, 34 insertions(+)

diff --git a/drivers/net/thunderx/base/nicvf_hw.c 
b/drivers/net/thunderx/base/nicvf_hw.c
index 1f08ef2..a69cd02 100644
--- a/drivers/net/thunderx/base/nicvf_hw.c
+++ b/drivers/net/thunderx/base/nicvf_hw.c
@@ -725,6 +725,24 @@ nicvf_vlan_hw_strip(struct nicvf *nic, bool enable)
 }

 void
+nicvf_apad_config(struct nicvf *nic, bool enable)
+{
+   uint64_t val;
+
+   /* APAD always enabled in this device */
+   if (!(nic->hwcap & NICVF_CAP_DISABLE_APAD))
+   return;
+
+   val = nicvf_reg_read(nic, NIC_VNIC_RQ_GEN_CFG);
+   if (enable)
+   val &= ~(1ULL << NICVF_QS_RQ_DIS_APAD_SHIFT);
+   else
+   val |= (1ULL << NICVF_QS_RQ_DIS_APAD_SHIFT);
+
+   nicvf_reg_write(nic, NIC_VNIC_RQ_GEN_CFG, val);
+}
+
+void
 nicvf_rss_set_key(struct nicvf *nic, uint8_t *key)
 {
int idx;
diff --git a/drivers/net/thunderx/base/nicvf_hw.h 
b/drivers/net/thunderx/base/nicvf_hw.h
index 2b8738b..cf68be9 100644
--- a/drivers/net/thunderx/base/nicvf_hw.h
+++ b/drivers/net/thunderx/base/nicvf_hw.h
@@ -54,6 +54,8 @@
 #define NICVF_CAP_TUNNEL_PARSING   (1ULL << 0)
 /* Additional word in Rx descriptor to hold optional tunneling extension info 
*/
 #define NICVF_CAP_CQE_RX2  (1ULL << 1)
+/* The device capable of setting NIC_CQE_RX_S[APAD] == 0 */
+#define NICVF_CAP_DISABLE_APAD (1ULL << 2)

 enum nicvf_tns_mode {
NIC_TNS_BYPASS_MODE,
@@ -217,6 +219,8 @@ uint32_t nicvf_qsize_sq_roundup(uint32_t val);

 void nicvf_vlan_hw_strip(struct nicvf *nic, bool enable);

+void nicvf_apad_config(struct nicvf *nic, bool enable);
+
 int nicvf_rss_config(struct nicvf *nic, uint32_t  qcnt, uint64_t cfg);
 int nicvf_rss_term(struct nicvf *nic);

diff --git a/drivers/net/thunderx/base/nicvf_hw_defs.h 
b/drivers/net/thunderx/base/nicvf_hw_defs.h
index e144d44..00dd2fe 100644
--- a/drivers/net/thunderx/base/nicvf_hw_defs.h
+++ b/drivers/net/thunderx/base/nicvf_hw_defs.h
@@ -105,6 +105,8 @@
 #define NICVF_INTR_MBOX_SHIFT   22
 #define NICVF_INTR_QS_ERR_SHIFT 23

+#define NICVF_QS_RQ_DIS_APAD_SHIFT  22
+
 #define NICVF_INTR_CQ_MASK  (0xFF << NICVF_INTR_CQ_SHIFT)
 #define NICVF_INTR_SQ_MASK  (0xFF << NICVF_INTR_SQ_SHIFT)
 #define NICVF_INTR_RBDR_MASK(0x03 << NICVF_INTR_RBDR_SHIFT)
diff --git a/drivers/net/thunderx/nicvf_ethdev.c 
b/drivers/net/thunderx/nicvf_ethdev.c
index 094c5d5..501c8c2 100644
--- a/drivers/net/thunderx/nicvf_ethdev.c
+++ b/drivers/net/thunderx/nicvf_ethdev.c
@@ -1527,6 +1527,16 @@ nicvf_vf_start(struct rte_eth_dev *dev, struct nicvf 
*nic, uint32_t rbdrsz)
/* Configure VLAN Strip */
nicvf_vlan_hw_strip(nic, dev->data->dev_conf.rxmode.hw_vlan_strip);

+   /* Based on the packet type (IPv4 or IPv6), the nicvf HW aligns L3 data
+* to a 64-bit memory address.
+* The alignment creates a hole in the mbuf (between the end of headroom
+* and packet data start). The new revision of the HW provides an option
+* to disable the L3 alignment feature and make the mbuf layout look
+* more like other NICs. For better application compatibility, disable
+* the L3 alignment feature on the hardware revisions that support it.
+*/
+   nicvf_apad_config(nic, false);
+
/* Get queue ranges for this VF */
nicvf_tx_range(dev, nic, &tx_start, &tx_end);

-- 
2.5.5



[dpdk-dev] [PATCH 0/2] net/thunderx: add 83xx SoC support

2016-11-08 Thread Jerin Jacob
CN83xx is a 24-core version of the ThunderX ARMv8 SoC with integrated
Octeon-style packet and crypto accelerators.

The standard NIC block used in 88xx/81xx is also included in 83xx.
This patchset adds support for the existing standard NIC block on 83xx by
adding a new HW capability flag to select the differences at runtime.

Jerin Jacob (2):
  net/thunderx: disable l3 alignment pad feature
  net/thunderx: add cn83xx support

 doc/guides/nics/thunderx.rst  |  1 +
 drivers/net/thunderx/base/nicvf_hw.c  | 22 ++
 drivers/net/thunderx/base/nicvf_hw.h  |  5 +
 drivers/net/thunderx/base/nicvf_hw_defs.h |  2 ++
 drivers/net/thunderx/nicvf_ethdev.c   | 17 +
 5 files changed, 47 insertions(+)

-- 
2.5.5



[dpdk-dev] [PATCH v3] doc: arm64: document DPDK application profiling methods

2016-11-08 Thread Jerin Jacob
Signed-off-by: Jerin Jacob 
Signed-off-by: John McNamara 
---
v3:
Fixed formatting issues:
- Remove the introduction heading and put intro text under the main 
heading(Thomas)
- Fixed RST formatting issues such as enclosing technical terms in 
backquotes(John)
Thanks, John for providing the updated version
v2:
-Addressed ARM64 specific review comments(Suggested by Thomas)
http://dpdk.org/dev/patchwork/patch/16362/
---
 doc/guides/prog_guide/profile_app.rst | 64 ++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/profile_app.rst 
b/doc/guides/prog_guide/profile_app.rst
index 3226187..54b546a 100644
--- a/doc/guides/prog_guide/profile_app.rst
+++ b/doc/guides/prog_guide/profile_app.rst
@@ -31,8 +31,15 @@
 Profile Your Application
 ========================

+The following sections describe methods of profiling DPDK applications on
+different architectures.
+
+
+Profiling on x86
+----------------
+
 Intel processors provide performance counters to monitor events.
-Some tools provided by Intel can be used to profile and benchmark an 
application.
+Some tools provided by Intel, such as VTune, can be used to profile and 
benchmark an application.
 See the *VTune Performance Analyzer Essentials* publication from Intel Press 
for more information.

 For a DPDK application, this can be done in a Linux* application environment 
only.
@@ -50,3 +57,58 @@ The main situations that should be monitored through event 
counters are:
 Refer to the
 `Intel Performance Analysis Guide 
<http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf>`_
 for details about application profiling.
+
+
+Profiling on ARM64
+------------------
+
+Using Linux perf
+~~~~~~~~~~~~~~~~
+
+The ARM64 architecture provides performance counters to monitor events.  The
+Linux ``perf`` tool can be used to profile and benchmark an application.  In
+addition to the standard events, ``perf`` can be used to profile arm64
+specific PMU (Performance Monitor Unit) events through raw events (``-e``
+``-rXX``).
+
+For more details refer to the
+`ARM64 specific PMU events enumeration 
<http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100095_0002_04_en/way1382543438508.html>`_.
+
+
+High-resolution cycle counter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The default ``cntvct_el0`` based ``rte_rdtsc()`` provides a portable means to
+get a wall clock counter in user space. Typically it runs at <= 100MHz.
+
+The alternative method to enable ``rte_rdtsc()`` for a high resolution wall
+clock counter is through the armv8 PMU subsystem. The PMU cycle counter runs
+at CPU frequency. However, access to the PMU cycle counter from user space is
+not enabled by default in the arm64 linux kernel. It is possible to enable
+cycle counter for user space access by configuring the PMU from the privileged
+mode (kernel space).
+
+By default the ``rte_rdtsc()`` implementation uses a portable ``cntvct_el0``
+scheme.  Application can choose the PMU based implementation with
+``CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU``.
+
+The example below shows the steps to configure the PMU based cycle counter on
+an armv8 machine.
+
+.. code-block:: console
+
+    git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
+    cd armv8_pmu_cycle_counter_el0
+    make
+    sudo insmod pmu_el0_cycle_counter.ko
+    cd $DPDK_DIR
+    make config T=arm64-armv8a-linuxapp-gcc
+    echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
+    make
+
+.. warning::
+
+   The PMU based scheme is useful for high accuracy performance profiling with
+   ``rte_rdtsc()``. However, this method cannot be used in conjunction with
+   Linux userspace profiling tools like ``perf`` as this scheme alters the PMU
+   registers state.
-- 
2.5.5



[dpdk-dev] [RFC]Generic flow filtering API Sample Application

2016-11-03 Thread Jerin Jacob
On Wed, Nov 02, 2016 at 05:27:50AM +, Zhao1, Wei wrote:
> Hi  All,
> Now we are planning a sample application for the generic flow
> filtering API feature, and I have finished the RFC for this example app.
> Now Adrien Mazarguil has sent the v2 version of the generic flow
> filtering API, and this sample application RFC is based on that.
> 
> Thank you.
> 
> 
> 
> 
> Generic flow filtering API Sample Application
> 
> 
> The application is a simple example of the generic flow filtering API using
> the DPDK.
> The application performs flow director/filtering/classification in packet 
> processing.
> 
> Overview
> 
> 
> The application demonstrates the use of the generic flow
> director/filtering/classification API
> in the DPDK to implement packet forwarding. This document focuses on
> guidelines for writing rule configuration
> files and on prompt command usage. It also supplies the definition of the
> available EAL option arguments, which is useful
> in DPDK packet forwarding processing.
> 
> 
> Compiling the Application
> -
> 
> To compile the application:
> 
> #.Go to the sample application directory:
> 
>   .. code-block:: console
> 
>   export RTE_SDK=/path/to/rte_sdk
>   cd ${RTE_SDK}/examples/gen_filter

Any specific reason to create a separate application for testing the generic
filter, rather than adding it as an extension to testpmd?

> 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-11-02 Thread Jerin Jacob
On Wed, Nov 02, 2016 at 01:56:27PM +, Bruce Richardson wrote:
> On Wed, Nov 02, 2016 at 06:39:27PM +0530, Jerin Jacob wrote:
> > On Wed, Nov 02, 2016 at 11:35:51AM +, Bruce Richardson wrote:
> > > On Wed, Nov 02, 2016 at 04:55:22PM +0530, Jerin Jacob wrote:
> > > > On Fri, Oct 28, 2016 at 02:36:48PM +0530, Jerin Jacob wrote:
> > > > > On Fri, Oct 28, 2016 at 09:36:46AM +0100, Bruce Richardson wrote:
> > > > > > On Fri, Oct 28, 2016 at 08:31:41AM +0530, Jerin Jacob wrote:
> > > > > > > On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> > > > > > > > On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > > > > > > How about making default as "mixed" and let application 
> > > > > > > configures what
> > > > > > > is not required?. That way application responsibility is clear.
> > > > > > > something similar to ETH_TXQ_FLAGS_NOMULTSEGS, 
> > > > > > > ETH_TXQ_FLAGS_NOREFCOUNT
> > > > > > > with default.
> > > > > > > 
> > > > > > I suppose it could work, but why bother doing that? If an app knows 
> > > > > > it's
> > > > > > only going to use one traffic type, why not let it just state what 
> > > > > > it
> > > > > > will do rather than try to specify what it won't do. If mixed is 
> > > > > > needed,
> > > > > 
> > > > > My thought was more inline with ethdev spec, like, ref-count is 
> > > > > default,
> > > > > if application need exception then set ETH_TXQ_FLAGS_NOREFCOUNT. But 
> > > > > it is OK, if
> > > > > you need other way.
> > > > > 
> > > > > > then it's easy enough to specify - and we can make it the 
> > > > > > zero/default
> > > > > > value too.
> > > > > 
> > > > > OK. Then we will make MIX as zero/default and add 
> > > > > "allowed_event_types" in
> > > > > event queue config.
> > > > >
> > > > 
> > > > Bruce,
> > > > 
> > > > I have tried to make it as "allowed_event_types" in event queue config.
> > > > However, rte_event_queue_default_conf_get() can also take NULL for 
> > > > default
> > > > configuration. So I think, It makes sense to go with negation approach
> > > > like ethdev to define the default to avoid confusion on the default. So
> > > > I am thinking like below now,
> > > > 
> > > > ? [master][libeventdev] $ git diff
> > > > diff --git a/rte_eventdev.h b/rte_eventdev.h
> > > > index cf22b0e..cac4642 100644
> > > > --- a/rte_eventdev.h
> > > > +++ b/rte_eventdev.h
> > > > @@ -429,6 +429,12 @@ rte_event_dev_configure(uint8_t dev_id, struct
> > > > rte_event_dev_config *config);
> > > >   *
> > > >   *  \see rte_event_port_setup(), rte_event_port_link()
> > > >   */
> > > > +#define RTE_EVENT_QUEUE_CFG_NOATOMIC_TYPE  (1ULL << 1)
> > > > +/**< Skip configuring atomic schedule type resources */
> > > > +#define RTE_EVENT_QUEUE_CFG_NOORDERED_TYPE (1ULL << 2)
> > > > +/**< Skip configuring ordered schedule type resources */
> > > > +#define RTE_EVENT_QUEUE_CFG_NOPARALLEL_TYPE(1ULL << 3)
> > > > +/**< Skip configuring parallel schedule type resources */
> > > > 
> > > >  /** Event queue configuration structure */
> > > >  struct rte_event_queue_conf {
> > > > 
> > > > Thoughts?
> > > > 
> > > 
> > > I'm ok with the default as being all types, in the case where NULL is
> > > specified for the parameter. It does make the most sense.
> > 
> > Yes. That case I need to explicitly mention in the documentation about what
> > is default case. With RTE_EVENT_QUEUE_CFG_NOATOMIC_TYPE scheme it quite
> > understood what is default. Not adding up? :-)
> > 
> 
> Would below not work? DEFAULT explicitly stated, and can be commented to
> say all types allowed.

All I was trying to do was avoid explicitly stating the default state. It is
not worth going back and forth on slow-path configuration; I will keep it as
positive logic as you suggested :-), inspired by PKT_TX_L4_MASK

#define RTE_EVENT_QUEUE_CFG_TYPE_MASK   (3ULL << 0)
#define RTE_EVENT_QUEUE_CFG_ALL_TYPES   (0ULL << 0) /**< Enable all types */
#define RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY (1ULL << 0)
#define RTE_EVENT_QUEUE_CFG_ORDERED_ONLY(2ULL << 0)
#define RTE_EVENT_QUEUE_CFG_PARALLEL_ONLY   (3ULL << 0)
#define RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER (1ULL << 2)
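As a sketch of how this two-bit type field would be consumed at configure time (the macro values are copied from the proposal above; ``queue_cfg_type()`` is an illustrative helper, not part of any API):

```c
#include <stdint.h>

/* Flag values as proposed in the thread above. */
#define RTE_EVENT_QUEUE_CFG_TYPE_MASK       (3ULL << 0)
#define RTE_EVENT_QUEUE_CFG_ALL_TYPES       (0ULL << 0) /* default: all types */
#define RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY     (1ULL << 0)
#define RTE_EVENT_QUEUE_CFG_ORDERED_ONLY    (2ULL << 0)
#define RTE_EVENT_QUEUE_CFG_PARALLEL_ONLY   (3ULL << 0)
#define RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER (1ULL << 2)

/* Extract the schedule-type field from an event queue cfg word;
 * other flag bits (e.g. SINGLE_CONSUMER) are masked away. */
static uint64_t
queue_cfg_type(uint64_t cfg)
{
	return cfg & RTE_EVENT_QUEUE_CFG_TYPE_MASK;
}
```

With positive logic, a zeroed config word naturally means "all types", which is why no explicit default flag is needed.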

> 
> #define RTE_EVENT_QUEUE_CFG_DEFAULT 0
> #define RTE_EVENT_QUEUE_CFG_ALL_TYPES RTE_EVENT_QUEUE_CFG_DEFAULT
> #define RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY (1<<0)
> #define RTE_EVENT_QUEUE_CFG_ORDERED_ONLY (1<<1) 
> 
> 
> /Bruce


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-11-02 Thread Jerin Jacob
On Wed, Nov 02, 2016 at 11:35:51AM +, Bruce Richardson wrote:
> On Wed, Nov 02, 2016 at 04:55:22PM +0530, Jerin Jacob wrote:
> > On Fri, Oct 28, 2016 at 02:36:48PM +0530, Jerin Jacob wrote:
> > > On Fri, Oct 28, 2016 at 09:36:46AM +0100, Bruce Richardson wrote:
> > > > On Fri, Oct 28, 2016 at 08:31:41AM +0530, Jerin Jacob wrote:
> > > > > On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> > > > > > On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > > > > How about making default as "mixed" and let application configures 
> > > > > what
> > > > > is not required?. That way application responsibility is clear.
> > > > > something similar to ETH_TXQ_FLAGS_NOMULTSEGS, 
> > > > > ETH_TXQ_FLAGS_NOREFCOUNT
> > > > > with default.
> > > > > 
> > > > I suppose it could work, but why bother doing that? If an app knows it's
> > > > only going to use one traffic type, why not let it just state what it
> > > > will do rather than try to specify what it won't do. If mixed is needed,
> > > 
> > > My thought was more inline with ethdev spec, like, ref-count is default,
> > > if application need exception then set ETH_TXQ_FLAGS_NOREFCOUNT. But it 
> > > is OK, if
> > > you need other way.
> > > 
> > > > then it's easy enough to specify - and we can make it the zero/default
> > > > value too.
> > > 
> > > OK. Then we will make MIX as zero/default and add "allowed_event_types" in
> > > event queue config.
> > >
> > 
> > Bruce,
> > 
> > I have tried to make it as "allowed_event_types" in event queue config.
> > However, rte_event_queue_default_conf_get() can also take NULL for default
> > configuration. So I think, It makes sense to go with negation approach
> > like ethdev to define the default to avoid confusion on the default. So
> > I am thinking like below now,
> > 
> > ? [master][libeventdev] $ git diff
> > diff --git a/rte_eventdev.h b/rte_eventdev.h
> > index cf22b0e..cac4642 100644
> > --- a/rte_eventdev.h
> > +++ b/rte_eventdev.h
> > @@ -429,6 +429,12 @@ rte_event_dev_configure(uint8_t dev_id, struct
> > rte_event_dev_config *config);
> >   *
> >   *  \see rte_event_port_setup(), rte_event_port_link()
> >   */
> > +#define RTE_EVENT_QUEUE_CFG_NOATOMIC_TYPE  (1ULL << 1)
> > +/**< Skip configuring atomic schedule type resources */
> > +#define RTE_EVENT_QUEUE_CFG_NOORDERED_TYPE (1ULL << 2)
> > +/**< Skip configuring ordered schedule type resources */
> > +#define RTE_EVENT_QUEUE_CFG_NOPARALLEL_TYPE(1ULL << 3)
> > +/**< Skip configuring parallel schedule type resources */
> > 
> >  /** Event queue configuration structure */
> >  struct rte_event_queue_conf {
> > 
> > Thoughts?
> > 
> 
> I'm ok with the default as being all types, in the case where NULL is
> specified for the parameter. It does make the most sense.

Yes. In that case I need to explicitly state in the documentation what the
default case is. With the RTE_EVENT_QUEUE_CFG_NOATOMIC_TYPE scheme it is
quite clear what the default is. Does that not add up? :-)

> 
> However, for the cases where the user does specify what they want, I
> think it does make more sense, and is easier on the user for things to
> be specified in a positive, rather than negative sense. For a user who
> wants to just use atomic events, having to specify that as "not-reordered
> and not-unordered" just isn't as clear! :-)
> 
> /Bruce
> 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-11-02 Thread Jerin Jacob
On Wed, Nov 02, 2016 at 11:48:37AM +, Bruce Richardson wrote:
> On Wed, Nov 02, 2016 at 01:36:34PM +0530, Jerin Jacob wrote:
> > On Fri, Oct 28, 2016 at 01:48:57PM +, Van Haaren, Harry wrote:
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > > > Sent: Tuesday, October 25, 2016 6:49 PM
> > > 
> > > > 
> > > > Hi Community,
> > > > 
> > > > So far, I have received constructive feedback from Intel, NXP and 
> > > > Linaro folks.
> > > > Let me know, if anyone else interested in contributing to the 
> > > > definition of eventdev?
> > > > 
> > > > If there are no major issues in proposed spec, then Cavium would like 
> > > > work on
> > > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > > in next version).
> > > 
> > > 
> > > Hi All,
> > > 
> > > I've been looking at the eventdev API from a use-case point of view, and 
> > > I'm unclear on a how the API caters for two uses. I have simplified these 
> > > as much as possible, think of them as a theoretical unit-test for the API 
> > > :)
> > > 
> > > 
> > > Fragmentation:
> > > 1. Dequeue 8 packets
> > > 2. Process 2 packets
> > > 3. Processing 3rd, this packet needs fragmentation into two packets
> > > 4. Process remaining 5 packets as normal
> > > 
> > > What function calls does the application make to achieve this?
> > > In particular, I'm referring to how can the scheduler know that the 3rd 
> > > packet is the one being fragmented, and how to keep packet order valid. 
> > > 
> > 
> > OK. I will try to share my views on IP fragmentation on event _HW_
> > models(at least on Cavium HW) then we can see, how we can converge.
> > 
> > First, The fragmentation specific logic should be decoupled from the event
> > model as it specific to packet and L3 layer(Not specific to generic event)
> > 
> I would view fragmentation as just one example of a workload like this,
> multicast and broadcast may be two other cases. Yes, they all apply to
> packet, but the general feature support is just how to provide support
> for one event generating multiple further events which should be linked
> together for reordering. [I think this only really applies in the

AFAIK, there are two different schemes to "maintain ordering". The first
is based on "reordering buffers", i.e. a list data structure used to hold
the events first; then, when the order needs correcting (ORDERED->ATOMIC),
the order is corrected based on those "reordering buffers".
But some HW implementations use a "port"-state-based reordering scheme
(i.e. no external reorder buffer to keep track of the order).

So I think, to have a portable application workflow for the use case where
multiple events are generated from one event, the generated events need to
be stored in the parent event
and processed downstream as required, like the fragmentation example in

http://dpdk.org/ml/archives/dev/2016-November/049707.html

The above scheme should be OK in your implementation. Right?
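The first scheme — a list-based reorder buffer — can be illustrated with a minimal sketch: completions carry sequence numbers, may arrive out of order, and the buffer only releases the longest in-order prefix. This is a simplified model for illustration, not the eventdev API:

```c
#include <stdbool.h>

#define ROB_SIZE 8

struct reorder_buf {
	bool done[ROB_SIZE];	/* completion bitmap, indexed by seqno */
	unsigned int head;	/* next sequence number to release */
};

/* Record one (possibly out-of-order) completion, then release the
 * in-order prefix. Returns the number of events released by this call. */
static unsigned int
rob_complete(struct reorder_buf *rob, unsigned int seqno)
{
	unsigned int released = 0;

	rob->done[seqno % ROB_SIZE] = true;
	while (rob->done[rob->head % ROB_SIZE]) {
		rob->done[rob->head % ROB_SIZE] = false;
		rob->head++;
		released++;
	}
	return released;
}
```

A port-state-based implementation achieves the same externally visible ordering without this external buffer, which is why the API should not assume either scheme.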


> reordered case - which leads to another question: in your experience
> do you see other event types other than packet being handled in a
> "reordered" manner?]

We use timer events, crypto completion events, etc. in the ORDERED
type, but not a scheme where one event creates N events on those.

> 
> /Bruce
> 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-11-02 Thread Jerin Jacob
On Wed, Nov 02, 2016 at 11:45:07AM +, Bruce Richardson wrote:
> On Wed, Nov 02, 2016 at 04:17:04PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > > > 
> > > > So far, I have received constructive feedback from Intel, NXP and 
> > > > Linaro folks.
> > > > Let me know, if anyone else interested in contributing to the 
> > > > definition of eventdev?
> > > > 
> > > > If there are no major issues in proposed spec, then Cavium would like 
> > > > work on
> > > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > > in next version).
> > >
> > 
> > Hi All,
> > 
> > Two queries,
> > 
> > 1) In SW implementation, Is their any connection between "struct
> > rte_event_port_conf"'s dequeue_queue_depth and enqueue_queue_depth ?
> > i.e it should be enqueue_queue_depth >= dequeue_queue_depth. Right ?
> > Thought of adding the common checks in common layer.
> 
> I think this is probably best left to the driver layers to enforce. For
> us, such a restriction doesn't really make sense, though in many cases
> that would be the usual setup. For accurate load balancing, the dequeue
> queue depth would be small, and the burst size would probably equal the
> queue depth, meaning the enqueue depth needs to be at least as big.
> However, for better throughput, or in cases where all traffic is being
> coalesced to a single core e.g. for transmit out a network port, there
> is no need to keep the dequeue queue shallow and so it can be many times
> the burst size, while the enqueue queue can be kept to 1-2 times the
> burst size.
> 

OK

> > 
> > 2)Any comments on follow item(section under ) that needs improvement.
> > ---
> > Abstract the differences in event QoS management with different
> > priority schemes available in different HW or SW implementations with 
> > portable
> > application workflow.
> > 
> > Based on the feedback, there three different kinds of QoS support
> > available in
> > three different HW or SW implementations.
> > 1) Priority associated with the event queue
> > 2) Priority associated with each event enqueue
> > (Same flow can have two different priority on two separate enqueue)
> > 3) Priority associated with the flow(each flow has unique priority)
> > 
> > In v2, The differences abstracted based on device capability
> > (RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
> > RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third scheme).
> > This scheme would call for different application workflow for
> > nontrivial QoS-enabled applications.
> > ---
> > After thinking a while, I think, RTE_EVENT_DEV_CAP_EVENT_QOS is a
> > super-set.if so, the subset RTE_EVENT_DEV_CAP_QUEUE_QOS can be
> > implemented with RTE_EVENT_DEV_CAP_EVENT_QOS. i.e We may not need two
> > flags, Just one flag RTE_EVENT_DEV_CAP_EVENT_QOS is enough to fix
> > portability issue with basic QoS enabled applications.
> > 
> > i.e Introduce RTE_EVENT_DEV_CAP_EVENT_QOS as config option in device
> > configure stage if application needs fine granularity on QoS per event
> > enqueue.For trivial applications, configured
> > rte_event_queue_conf->priority can be used as rte_event_enqueue(struct
> > rte_event.priority)
> > 
> So all implementations should support the concept of priority among
> queues, and then there is optional support for event or flow based
> prioritization. Is that a correct interpretation of what you propose?

Yes. If you _can_ implement it and it is possible in the system.

> 
> /Bruce
> 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-11-02 Thread Jerin Jacob
On Fri, Oct 28, 2016 at 02:36:48PM +0530, Jerin Jacob wrote:
> On Fri, Oct 28, 2016 at 09:36:46AM +0100, Bruce Richardson wrote:
> > On Fri, Oct 28, 2016 at 08:31:41AM +0530, Jerin Jacob wrote:
> > > On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> > > > On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > > > > On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > > > > > > -Original Message-
> > > > rte_event_queue_conf, with possible values:
> > > > * atomic
> > > > * ordered
> > > > * parallel
> > > > * mixed - allowing all 3 types. I think allowing 2 of three types might
> > > > make things too complicated.
> > > > 
> > > > An open question would then be how to behave when the queue type and
> > > > requested event type conflict. We can either throw an error, or just
> > > > ignore the event type and always treat enqueued events as being of the
> > > > queue type. I prefer the latter, because it's faster not having to
> > > > error-check, and it pushes the responsibility on the app to know what
> > > > it's doing.
> > > 
> > > How about making default as "mixed" and let application configures what
> > > is not required?. That way application responsibility is clear.
> > > something similar to ETH_TXQ_FLAGS_NOMULTSEGS, ETH_TXQ_FLAGS_NOREFCOUNT
> > > with default.
> > > 
> > I suppose it could work, but why bother doing that? If an app knows it's
> > only going to use one traffic type, why not let it just state what it
> > will do rather than try to specify what it won't do. If mixed is needed,
> 
> My thought was more inline with ethdev spec, like, ref-count is default,
> if application need exception then set ETH_TXQ_FLAGS_NOREFCOUNT. But it is 
> OK, if
> you need other way.
> 
> > then it's easy enough to specify - and we can make it the zero/default
> > value too.
> 
> OK. Then we will make MIX as zero/default and add "allowed_event_types" in
> event queue config.
>

Bruce,

I have tried to make it "allowed_event_types" in the event queue config.
However, rte_event_queue_default_conf_get() can also take NULL for the default
configuration. So I think it makes sense to go with a negation approach,
like ethdev, to define the default and avoid confusion about the default. So
I am thinking like below now:

[master][libeventdev] $ git diff
diff --git a/rte_eventdev.h b/rte_eventdev.h
index cf22b0e..cac4642 100644
--- a/rte_eventdev.h
+++ b/rte_eventdev.h
@@ -429,6 +429,12 @@ rte_event_dev_configure(uint8_t dev_id, struct
rte_event_dev_config *config);
  *
  *  \see rte_event_port_setup(), rte_event_port_link()
  */
+#define RTE_EVENT_QUEUE_CFG_NOATOMIC_TYPE  (1ULL << 1)
+/**< Skip configuring atomic schedule type resources */
+#define RTE_EVENT_QUEUE_CFG_NOORDERED_TYPE (1ULL << 2)
+/**< Skip configuring ordered schedule type resources */
+#define RTE_EVENT_QUEUE_CFG_NOPARALLEL_TYPE(1ULL << 3)
+/**< Skip configuring parallel schedule type resources */

 /** Event queue configuration structure */
 struct rte_event_queue_conf {

Thoughts?


> /Jerin
> 
> > 
> > Our software implementation for now, only supports one type per queue -
> > which we suspect should meet a lot of use-cases. We'll have to see about
> > adding in mixed types in future.
> > 
> > /Bruce


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-11-02 Thread Jerin Jacob
On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro 
> > folks.
> > Let me know, if anyone else interested in contributing to the definition of 
> > eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work 
> > on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
>

Hi All,

Two queries,

1) In the SW implementation, is there any connection between "struct
rte_event_port_conf"'s dequeue_queue_depth and enqueue_queue_depth?
I.e. it should be enqueue_queue_depth >= dequeue_queue_depth, right?
I thought of adding the common checks in the common layer.

2) Any comments on the following item (section under ) that needs improvement?
---
Abstract the differences in event QoS management, given the different
priority schemes available in different HW or SW implementations, behind a
portable application workflow.

Based on the feedback, there are three different kinds of QoS support
available in three different HW or SW implementations:
1) Priority associated with the event queue
2) Priority associated with each event enqueue
(the same flow can have two different priorities on two separate enqueues)
3) Priority associated with the flow (each flow has a unique priority)

In v2, the differences are abstracted based on device capability
(RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third schemes).
This scheme would call for different application workflows for
nontrivial QoS-enabled applications.
---
After thinking a while, I think RTE_EVENT_DEV_CAP_EVENT_QOS is a
superset; if so, the subset RTE_EVENT_DEV_CAP_QUEUE_QOS can be
implemented with RTE_EVENT_DEV_CAP_EVENT_QOS. I.e. we may not need two
flags; just one flag, RTE_EVENT_DEV_CAP_EVENT_QOS, is enough to fix the
portability issue for basic QoS-enabled applications.

I.e. introduce RTE_EVENT_DEV_CAP_EVENT_QOS as a config option at the device
configure stage if the application needs fine granularity of QoS per event
enqueue. For trivial applications, the configured
rte_event_queue_conf->priority can be used as rte_event_enqueue(struct
rte_event.priority)
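The superset relationship can be made concrete: when the per-event capability is present, the per-event priority is honoured; otherwise the queue-level priority applies. A hedged sketch with illustrative names and struct fields (not the final spec):

```c
#include <stdint.h>

#define CAP_EVENT_QOS (1u << 0)	/* illustrative capability bit */

struct ev_queue { uint8_t priority; };	/* rte_event_queue_conf-like */
struct ev       { uint8_t priority; };	/* rte_event-like */

/* Effective priority the scheduler would use for one enqueued event:
 * per-event priority if the device supports event-level QoS, otherwise
 * the configured queue priority. */
static uint8_t
effective_priority(uint32_t dev_caps, const struct ev_queue *q,
		   const struct ev *e)
{
	return (dev_caps & CAP_EVENT_QOS) ? e->priority : q->priority;
}
```

This shows why one capability flag suffices: queue-level QoS is just the degenerate case where every event inherits its queue's priority.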

Thoughts?

/Jerin




[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-11-02 Thread Jerin Jacob
On Fri, Oct 28, 2016 at 03:16:18PM +0100, Bruce Richardson wrote:
> On Fri, Oct 28, 2016 at 02:48:57PM +0100, Van Haaren, Harry wrote:
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > > Sent: Tuesday, October 25, 2016 6:49 PM
> > 
> > > 
> > > Hi Community,
> > > 
> > > So far, I have received constructive feedback from Intel, NXP and Linaro 
> > > folks.
> > > Let me know, if anyone else interested in contributing to the definition 
> > > of eventdev?
> > > 
> > > If there are no major issues in proposed spec, then Cavium would like 
> > > work on
> > > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > > an associated HW driver.(Requested minor changes of v2 will be addressed
> > > in next version).
> > 
> > 
> > Hi All,
> > 
> > I've been looking at the eventdev API from a use-case point of view, and 
> > I'm unclear on a how the API caters for two uses. I have simplified these 
> > as much as possible, think of them as a theoretical unit-test for the API :)
> > 
> > 
> > Fragmentation:
> > 1. Dequeue 8 packets
> > 2. Process 2 packets
> > 3. Processing 3rd, this packet needs fragmentation into two packets
> > 4. Process remaining 5 packets as normal
> > 
> > What function calls does the application make to achieve this?
> > In particular, I'm referring to how can the scheduler know that the 3rd 
> > packet is the one being fragmented, and how to keep packet order valid. 
> > 
> > 
> > Dropping packets:
> > 1. Dequeue 8 packets
> > 2. Process 2 packets
> > 3. Processing 3rd, this packet needs to be dropped
> > 4. Process remaining 5 packets as normal
> > 
> > What function calls does the application make to achieve this?
> > Again, in particular how does the scheduler know that the 3rd packet is 
> > being dropped.
> > 
> > 
> > Regards, -Harry
> 
> Hi,
> 
> these questions apply particularly to reordered which has a lot more
> complications than the other types in terms of sending packets back into
> the scheduler. However, atomic types will still suffer from problems
> with things the way they are - again if we assume a burst of 8 packets,
> then to forward those packets, we need to re-enqueue them again to the
> scheduler, and also then send 8 releases to the scheduler as well, to
> release the atomic locks for those packets.
> This means that for each packet we have to send two messages to a
> scheduler core, something that is really inefficient.
> 
> This number of messages is critical for any software implementation, as
> the cost of moving items core-to-core is going to be a big bottleneck
> (perhaps the biggest bottleneck) in the system. It's for this reason we
> need to use burst APIs - as with rte_rings.

I agree. That is the reason why we have rte_event_*_burst().

> 
> How we have solved this in our implementation, is to allow there to be
> an event operation type. The four operations we implemented are as below
> (using packet as a synonym for event here, since these would mostly
> apply to packets flowing through a system):
> 
> * NEW - just a regular enqueue of a packet, without any previous context

Makes sense. I was trying to derive it. It makes sense for the application
to request it.

> * FORWARD - enqueue a packet, and mark the flow processing for the
> equivalent packet that was dequeued as completed, i.e.
>   release any atomic locks, or reorder this packet with
>   respect to any other outstanding packets from the event queue.

Default case

> * DROP- this is roughtly equivalent to the existing "release" API call,
> except that having it as an enqueue type allows us to
>   release multiple items in a single call, and also to mix
>   releases with new packets and forwarded packets

Yes. It maps to rte_event_release(); with the index parameter it is kind of
doing the job. But it makes sense as a flag to enable bursts,
and that calls for removing the index parameter. It looks like the index
parameter has an issue in the Intel implementation. If so, maybe we (Cavium)
can fill in the index at dequeue as implementation-specific bits, as Harry
suggested, and use it at enqueue.
http://dpdk.org/ml/archives/dev/2016-October/049459.html

Any thoughts from NXP?

> * PARTIAL - this indicates that the packet being enqueued should be
>   treated according to the context of the current packet, but
>   that that context should not be released/completed by the
>   enqueue of this packet. This only really applie
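The operation types under discussion can be summarised in a small model: a mixed burst carries NEW, FORWARD and DROP entries, and a single enqueue call both submits new work and releases scheduler contexts. The types below are purely illustrative; the real rte_event layout was still being defined at this point:

```c
#include <stddef.h>

enum ev_op { EV_OP_NEW, EV_OP_FORWARD, EV_OP_DROP };

struct ev { enum ev_op op; };

/* Count how many scheduler contexts a mixed burst releases: both
 * FORWARD and DROP complete the context of a previously dequeued
 * event (release atomic locks / reorder slots); NEW does not. */
static unsigned int
burst_releases(const struct ev *burst, size_t n)
{
	unsigned int rel = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (burst[i].op != EV_OP_NEW)
			rel++;
	return rel;
}
```

Folding releases into the enqueue burst is what cuts the per-packet message count to the scheduler core from two down to one.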

[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-11-02 Thread Jerin Jacob
On Fri, Oct 28, 2016 at 01:48:57PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Tuesday, October 25, 2016 6:49 PM
> 
> > 
> > Hi Community,
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro 
> > folks.
> > Let me know, if anyone else interested in contributing to the definition of 
> > eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work 
> > on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
> 
> 
> Hi All,
> 
> I've been looking at the eventdev API from a use-case point of view, and I'm 
> unclear on a how the API caters for two uses. I have simplified these as much 
> as possible, think of them as a theoretical unit-test for the API :)
> 
> 
> Fragmentation:
> 1. Dequeue 8 packets
> 2. Process 2 packets
> 3. Processing 3rd, this packet needs fragmentation into two packets
> 4. Process remaining 5 packets as normal
> 
> What function calls does the application make to achieve this?
> In particular, I'm referring to how can the scheduler know that the 3rd 
> packet is the one being fragmented, and how to keep packet order valid. 
> 

OK. I will try to share my views on IP fragmentation on event _HW_
models (at least on Cavium HW), and then we can see how we can converge.

First, the fragmentation-specific logic should be decoupled from the event
model, as it is specific to packets and the L3 layer (not to generic events).

Now, let us consider fragmentation handling in the non-burst case with a
single flow.
The following text outlines the event flow

a) Set up an event device with a single event queue
b) Link multiple ports to the single event queue
c) The event producer enqueues packets p0..p7 to the event queue with ORDERED
type (let's assume packet p2 needs to be fragmented, i.e. the application
needs to create p2.0 and p2.1 from p2)
d) Since it is an ORDERED type, packets p0 to p7 are distributed to multiple
ports in parallel (assigned to each lcore or lightweight thread)
e) Each lcore/lightweight thread gets a packet from its designated event port,
processes it in parallel, and enqueues it back with ATOMIC type to maintain
ordering
f) One lcore dequeues packet p2 and sees that it needs to be fragmented due to
MTU size etc. It calls rte_ipv4_fragment_packet(), stores the fragments p2.0
and p2.1 in the private area of the p2 mbuf, and, like the other workers,
enqueues p2 to the atomic queue to maintain the order.
g) On the atomic flow, when an lcore dequeues packets, they arrive in order
p0..p7. The application sends p0 to p7 on the wire; when it checks the p2 mbuf
private area it sees that the packet was fragmented and sends p2.0 and p2.1
on the wire.

OR

skip the fragmentation step in (f) and, in step (g) while processing p2,
run rte_ipv4_fragment_packet() to split the packet and transmit the
fragments (in case the application doesn't want to deal with the mbuf
private area)

Now, when it comes to the BURST scheme: we are planning to create a SW
structure as a virtual event port and associate N
(N = rte_event_port_dequeue_depth())
physical HW event ports with the virtual port.
That way it just comes as an extension to the non-burst API, and the
release call can take an explicit "index" to identify the physical event
port associated with the virtual port.

/Jerin

> 
> Dropping packets:
> 1. Dequeue 8 packets
> 2. Process 2 packets
> 3. Processing 3rd, this packet needs to be dropped
> 4. Process remaining 5 packets as normal
> 
> What function calls does the application make to achieve this?
> Again, in particular how does the scheduler know that the 3rd packet is being 
> dropped.

rte_event_release(..,..,3)??

> 
> 
> Regards, -Harry


[dpdk-dev] [PATCH v11 1/6] ethdev: add Tx preparation

2016-10-28 Thread Jerin Jacob
On Fri, Oct 28, 2016 at 10:15:47AM +, Ananyev, Konstantin wrote:
> Hi Tomasz,
> 
> > > > > Not sure why?
> > > > > If tx_pkt_prep == NULL, then rte_eth_tx_prep() would just act as noop.
> > > > > Right now it is not mandatory for the PMD to implement it.
> > > >
> > > > If it is not implemented, the application must do the preparation by
> > > itself.
> > > > From patch 6:
> > > > "
> > > > Removed pseudo header calculation for udp/tcp/tso packets from
> > > > application and used Tx preparation API for packet preparation and
> > > > verification.
> > > > "
> > > > So how does it behave with other drivers?
> > >
> > > Hmm so it seems that we broke testpmd csumonly mode for non-intel
> > > drivers..
> > > My bad, missed that part completely.
> > > Yes, then I suppose for now we'll need to support both (with and without)
> > > code paths for testpmd.
> > > Probably a new fwd mode or just extra parameter for the existing one?
> > > Any other suggestions?
> > >
> > 
> > I had sent txprep engine in v2 
> > (http://dpdk.org/dev/patchwork/patch/15775/), but I'm opened on the 
> > suggestions. If you like it I can resent
> > it in place of csumonly modification.
> 
> I still not sure it is worth to have another version of csum...
> Can we introduce a new global variable in testpmd and a new command:
> testpmd> csum tx_prep

Just my 2 cents: as "tx_prep" is a generic API, if a PMD uses it to fix up
some other limitation (not csum), then it is difficult for the application
to know with which PMD combinations it needs to be used.

> or so? 
> Looking at current testpmd patch, I suppose the changes will be minimal.
> What do you think?
> Konstantin 
> 
> > 
> > Tomasz
> > 
> > > >
> > > > > > >  struct rte_eth_dev {
> > > > > > >   eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive
> > > function. */
> > > > > > >   eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit
> > > > > > > function. */
> > > > > > > + eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit
> > > > > > > +prepare function. */
> > > > > > >   struct rte_eth_dev_data *data;  /**< Pointer to device data */
> > > > > > >   const struct eth_driver *driver;/**< Driver for this device */
> > > > > > >   const struct eth_dev_ops *dev_ops; /**< Functions exported by
> > > > > > > PMD */
> > > > > >
> > > > > > Could you confirm why tx_pkt_prep is not in dev_ops?
> > > > > > I guess we want to have several implementations?
> > > > >
> > > > > Yes, it depends on configuration options, same as tx_pkt_burst.
> > > > >
> > > > > >
> > > > > > Shouldn't we have a const struct control_dev_ops and a struct
> > > datapath_dev_ops?
> > > > >
> > > > > That's probably a good idea, but I suppose it is out of scope for that
> > > patch.
> > > >
> > > > No it's not out of scope.
> > > > It answers to the question "why is it added in this structure and not
> > > dev_ops".
> > > > We won't do this change when nothing else is changed in the struct.
> > >
> > > Not sure I understood you here:
> > > Are you saying datapath_dev_ops/controlpath_dev_ops have to be introduced
> > > as part of that patch?
> > > But that's a lot of  changes all over rte_ethdev.[h,c].
> > > It definitely worse a separate patch (might be some discussion) for me.
> > > Konstantin
> > >
> > >
> 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-28 Thread Jerin Jacob
On Fri, Oct 28, 2016 at 09:36:46AM +0100, Bruce Richardson wrote:
> On Fri, Oct 28, 2016 at 08:31:41AM +0530, Jerin Jacob wrote:
> > On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> > > On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > > > On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > > > > > -Original Message-
> > > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > > Thanks. One other suggestion is that it might be useful to provide
> > > support for having typed queues explicitly in the API. Right now, when
> > > you create an queue, the queue_conf structure takes as parameters how
> > > many atomic flows that are needed for the queue, or how many reorder
> > > slots need to be reserved for it. This implicitly hints at the type of
> > > traffic which will be sent to the queue, but I'm wondering if it's
> > > better to make it explicit. There are certain optimisations that can be
> > > looked at if we know that a queue only handles packets of a particular
> > > type. [Not having to handle reordering when pulling events from a core
> > > can be a big win for software!].
> > 
> > If it helps in SW implementation, then I think we can add this in queue
> > configuration. 
> > 
> > > 
> > > How about adding: "allowed_event_types" as a field to
> > > rte_event_queue_conf, with possible values:
> > > * atomic
> > > * ordered
> > > * parallel
> > > * mixed - allowing all 3 types. I think allowing 2 of three types might
> > > make things too complicated.
> > > 
> > > An open question would then be how to behave when the queue type and
> > > requested event type conflict. We can either throw an error, or just
> > > ignore the event type and always treat enqueued events as being of the
> > > queue type. I prefer the latter, because it's faster not having to
> > > error-check, and it pushes the responsibility on the app to know what
> > > it's doing.
> > 
> > How about making default as "mixed" and let application configures what
> > is not required?. That way application responsibility is clear.
> > something similar to ETH_TXQ_FLAGS_NOMULTSEGS, ETH_TXQ_FLAGS_NOREFCOUNT
> > with default.
> > 
> I suppose it could work, but why bother doing that? If an app knows it's
> only going to use one traffic type, why not let it just state what it
> will do rather than try to specify what it won't do. If mixed is needed,

My thought was more in line with the ethdev spec: ref-counting is the default,
and if the application needs an exception it sets ETH_TXQ_FLAGS_NOREFCOUNT.
But it is OK if you prefer the other way.

> then it's easy enough to specify - and we can make it the zero/default
> value too.

OK. Then we will make MIXED the zero/default value and add
"allowed_event_types" to the event queue config.

/Jerin

> 
> Our software implementation for now, only supports one type per queue -
> which we suspect should meet a lot of use-cases. We'll have to see about
> adding in mixed types in future.
> 
> /Bruce


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-28 Thread Jerin Jacob
On Wed, Oct 26, 2016 at 01:54:14PM +0100, Bruce Richardson wrote:
> On Wed, Oct 26, 2016 at 05:54:17PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> Thanks. One other suggestion is that it might be useful to provide
> support for having typed queues explicitly in the API. Right now, when
> you create an queue, the queue_conf structure takes as parameters how
> many atomic flows that are needed for the queue, or how many reorder
> slots need to be reserved for it. This implicitly hints at the type of
> traffic which will be sent to the queue, but I'm wondering if it's
> better to make it explicit. There are certain optimisations that can be
> looked at if we know that a queue only handles packets of a particular
> type. [Not having to handle reordering when pulling events from a core
> can be a big win for software!].

If it helps the SW implementation, then I think we can add this to the
queue configuration.

> 
> How about adding: "allowed_event_types" as a field to
> rte_event_queue_conf, with possible values:
> * atomic
> * ordered
> * parallel
> * mixed - allowing all 3 types. I think allowing 2 of three types might
> make things too complicated.
> 
> An open question would then be how to behave when the queue type and
> requested event type conflict. We can either throw an error, or just
> ignore the event type and always treat enqueued events as being of the
> queue type. I prefer the latter, because it's faster not having to
> error-check, and it pushes the responsibility on the app to know what
> it's doing.

How about making the default "mixed" and letting the application configure
what is not required? That way the application's responsibility is clear;
something similar to ETH_TXQ_FLAGS_NOMULTSEGS and ETH_TXQ_FLAGS_NOREFCOUNT
with defaults.

/Jerin


> 
> /Bruce


[dpdk-dev] [PATCH] doc: remove Intel reference from multi-process support guide

2016-10-27 Thread Jerin Jacob
multi-process support has been verified on non-IA architectures such as ARMv8.

Signed-off-by: Jerin Jacob 
---
 doc/guides/prog_guide/multi_proc_support.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/guides/prog_guide/multi_proc_support.rst 
b/doc/guides/prog_guide/multi_proc_support.rst
index badd102..2a996ae 100644
--- a/doc/guides/prog_guide/multi_proc_support.rst
+++ b/doc/guides/prog_guide/multi_proc_support.rst
@@ -35,7 +35,7 @@ Multi-process Support

 In the DPDK, multi-process support is designed to allow a group of DPDK 
processes
 to work together in a simple transparent manner to perform packet processing,
-or other workloads, on Intel® architecture hardware.
+or other workloads.
 To support this functionality,
 a number of additions have been made to the core DPDK Environment Abstraction 
Layer (EAL).

-- 
2.5.5



[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-27 Thread Jerin Jacob
On Wed, Oct 26, 2016 at 01:43:25PM +0100, Bruce Richardson wrote:
> On Tue, Oct 25, 2016 at 11:19:05PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> > > Thanks to Intel and NXP folks for the positive and constructive feedback
> > > I've received so far. Here is the updated RFC(v2).
> > > 
> > > I've attempted to address as many comments as possible.
> > > 
> > > This series adds rte_eventdev.h to the DPDK tree with
> > > adequate documentation in doxygen format.
> > > 
> > > Updates are also available online:
> > > 
> > > Related draft header file (this patch):
> > > https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> > > 
> > > PDF version(doxgen output):
> > > https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
> > > 
> > > Repo:
> > > https://github.com/jerinjacobk/libeventdev
> > >
> > 
> > Hi Community,
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro 
> > folks.
> > Let me know, if anyone else interested in contributing to the definition of 
> > eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work 
> > on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
> > 
> > We are planning to submit the work for 17.02 or 17.05 release(based on
> > how implementation goes).
> > 
> 
> Hi Jerin,

Hi Bruce,

> 
> thanks for driving this. In terms of the common code framework, when
> would you see that you might have something to upstream for that? As you
> know, we've been working on a software implementation which we are now
> looking to move to the eventdev APIs, and which also needs this common
> code to support it. 
> 
> If it can accelerate this effort, we can perhaps provide as an RFC
> the common code part that we have implemented for our work, or else we
> are happy to migrate to use common code you provide if it can be
> upstreamed fairly soon.

I have already started the common code framework. I will send the common code
as an RFC in a couple of days, with the vdev and PCI bus interfaces.

> 
> Regards,
> /Bruce


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-26 Thread Jerin Jacob
On Wed, Oct 26, 2016 at 12:11:03PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > 
> > So far, I have received constructive feedback from Intel, NXP and Linaro 
> > folks.
> > Let me know, if anyone else interested in contributing to the definition of 
> > eventdev?
> > 
> > If there are no major issues in proposed spec, then Cavium would like work 
> > on
> > implementing and up-streaming the common code(lib/librte_eventdev/) and
> > an associated HW driver.(Requested minor changes of v2 will be addressed
> > in next version).
> 
> Hi All,
> 
> I will propose a minor change to the rte_event struct, allowing some bits to 
> be implementation specific. Currently the rte_event struct has no space to 
> allow an implementation store any metadata about the event. For software 
> performance it would be really helpful if there are some bits available for 
> the implementation to keep some flags about each event.

OK.

> 
> I suggest to rework the struct as below which opens 6 bits that were 
> otherwise wasted, and define them as implementation specific. By 
> implementation specific it is understood that the implementation can 
> overwrite any information stored in those bits, and the application must not 
> expect the data to remain after the event is scheduled.
> 
> OLD:
> struct rte_event {
>   uint32_t flow_id:24;
>   uint32_t queue_id:8;
>   uint8_t  sched_type; /* Note only 2 bits of 8 are required */
> 
> NEW:
> struct rte_event {
>   uint32_t flow_id:24;
>   uint32_t sched_type:2; /* reduced size : but 2 bits is enough for the 
> enqueue types Ordered,Atomic,Parallel.*/
>   uint32_t implementation:6; /* available for implementation specific 
> metadata */
>   uint8_t queue_id; /* still 8 bits as before */
> 
> 
> Thoughts? -Harry

Looks good to me. I will add it in v3.




[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-26 Thread Jerin Jacob
On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> Thanks to Intel and NXP folks for the positive and constructive feedback
> I've received so far. Here is the updated RFC(v2).
> 
> I've attempted to address as many comments as possible.
> 
> This series adds rte_eventdev.h to the DPDK tree with
> adequate documentation in doxygen format.
> 
> Updates are also available online:
> 
> Related draft header file (this patch):
> https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> 
> PDF version(doxgen output):
> https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
> 
> Repo:
> https://github.com/jerinjacobk/libeventdev
>

Hi Community,

So far, I have received constructive feedback from Intel, NXP and Linaro folks.
Let me know if anyone else is interested in contributing to the definition of
eventdev.

If there are no major issues in the proposed spec, then Cavium would like to
work on implementing and up-streaming the common code (lib/librte_eventdev/)
and an associated HW driver. (The minor changes requested against v2 will be
addressed in the next version.)

We are planning to submit the work for the 17.02 or 17.05 release (based on
how the implementation goes).

/Jerin
Cavium


[dpdk-dev] Project Governance and Linux Foundation

2016-10-19 Thread Jerin Jacob
On Wed, Oct 19, 2016 at 08:04:19AM +, O'Driscoll, Tim wrote:
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > 
> > On Tue, Oct 18, 2016 at 03:27:27PM +0200, Thomas Monjalon wrote:
> > > 2016-10-18 17:04, Jerin Jacob:
> > > > On Mon, Oct 17, 2016 at 05:23:42PM -0400, Dave Neary wrote:
> > > > > > I still hear concerns on this, and based on discussions with
> > others who put their names to the post below, they do too. I think it's
> > a perception that we need to address.
> > > > >
> > > > > I would say that there is still a perception issue, for companies
> > who
> > > > > look at the active developers, the owners of the project's
> > resources
> > > > > (infra, domain name), and who have heard anecdotal evidence of
> > issues in
> > > > > the past. I think the project has made a lot of progress since I
> > have
> > > > > been following it, and I do not believe there are any major issues
> > with
> > > > > the independence of the project. However, there are still
> > concerned
> > > > > parties on this front, and the concerns can be easily addressed by
> > a
> > > > > move to the LF.
> > > >
> > > > +1
> > >
> > > How can we solve issues if you don't give more details than
> > > "hear concerns" or "heard anecdotal evidence of issues"?
> > 
> > Honestly, I don't see any issue in the current DPDK project execution.
> > The concern was more towards the fact that multi-vendor infrastructure
> > project
> > like DPDK owned and controlled by the single company.
> > 
> > We believe, Moving to LF will fix that issue/perception and it will
> > enable more users to use/consume/invest DPDK in their products.
> 
> +1. This is in danger of becoming a never-ending argument. We said in the 
> original post that one of the goals of moving to LF is to "Remove any 
> remaining perception that DPDK is not truly open". I believe that's an 
> important goal for the project and one that we should all agree on.
> 
> Whether you choose the accept it or not, it's a fact that concerns exist in 
> the community over the fact that one single company controls the 
> infrastructure for the project. Moving the project to an independent body 
> like the LF would fix that.
> 
> > Having said that, Does anyone see any issue in moving to LF?
> > If yes, Then we should enumerate the issues and discuss further.
> 
> This is a great point. Can you explain what you see as the benefits of 
> maintaining the current model?

We don't see any additional benefits in maintaining the current model
(compared with the LF model).

> As far as I can see, the LF model provides everything that we currently have, 
> plus it makes DPDK independent of any single company, and it also gives us 
> the option of availing of other LF services if we choose to do so, including 
> the ability to host lab infrastructure for the project, legal support for 
> trademarks if we need that, event planning etc.
> 
> > 
> > Jerin
> 


[dpdk-dev] Project Governance and Linux Foundation

2016-10-18 Thread Jerin Jacob
On Tue, Oct 18, 2016 at 03:27:27PM +0200, Thomas Monjalon wrote:
> 2016-10-18 17:04, Jerin Jacob:
> > On Mon, Oct 17, 2016 at 05:23:42PM -0400, Dave Neary wrote:
> > > > I still hear concerns on this, and based on discussions with others who 
> > > > put their names to the post below, they do too. I think it's a 
> > > > perception that we need to address.
> > > 
> > > I would say that there is still a perception issue, for companies who
> > > look at the active developers, the owners of the project's resources
> > > (infra, domain name), and who have heard anecdotal evidence of issues in
> > > the past. I think the project has made a lot of progress since I have
> > > been following it, and I do not believe there are any major issues with
> > > the independence of the project. However, there are still concerned
> > > parties on this front, and the concerns can be easily addressed by a
> > > move to the LF.
> > 
> > +1
> 
> How can we solve issues if you don't give more details than
> "hear concerns" or "heard anecdotal evidence of issues"?

Honestly, I don't see any issue in the current DPDK project execution.
The concern was more about the fact that a multi-vendor infrastructure project
like DPDK is owned and controlled by a single company.

We believe moving to the LF will fix that issue/perception, and it will
enable more users to use/consume/invest in DPDK in their products.
Having said that, does anyone see any issue in moving to the LF?
If yes, then we should enumerate the issues and discuss further.

Jerin



[dpdk-dev] Project Governance and Linux Foundation

2016-10-18 Thread Jerin Jacob
On Mon, Oct 17, 2016 at 05:23:42PM -0400, Dave Neary wrote:
> Hi,
> 
> On 10/17/2016 07:52 AM, O'Driscoll, Tim wrote:
> >> -Original Message-
> >> I don't really understand what can be gained by moving to Linux
> >> Foundation, but I am almost sure that no individual expert will be able
> >> to take any leaderhip role as those roles will be fulfilled by Platinum,
> >> Gold or Silver members: right ?
> > 
> > No. If DPDK were to move to LF as an independent project, then as discussed 
> > at the Userspace event in Dublin last year, and as documented in the 
> > original post below, the intention would be not to make any significant 
> > changes to the technical governance.
> > 
> > If DPDK were to move to FD.io the situation would be the same. The FD.io 
> > Technical Community Charter 
> > (https://fd.io/governance/technical-community-charter) specifies how 
> > Project Technical Leaders and Committers are nominated and approved, but 
> > there's no requirement for people in those roles to come from Platinum, 
> > Gold or Silver FD.io members. Those decisions are based purely on technical 
> > merit.
> 
> I just want to second what Tim said - it's important for Red Hat, at
> least, that the technical governance of a project be kept separate from
> any membership of an organization managing the budget for the project.
> 
> The technical management of the project can also be discussed, but it is
> out of scope, IMHO, when talking about moving to fd.io or the Linux
> Foundation.
> 
> >> The current DPDK version can run on virtually all processors (Intel, IBM
> >> and ARM) and leverage all NICs: is there **really** anyone questionning
> >> openness of the community?
> > 
> > I still hear concerns on this, and based on discussions with others who put 
> > their names to the post below, they do too. I think it's a perception that 
> > we need to address.
> 
> I would say that there is still a perception issue, for companies who
> look at the active developers, the owners of the project's resources
> (infra, domain name), and who have heard anecdotal evidence of issues in
> the past. I think the project has made a lot of progress since I have
> been following it, and I do not believe there are any major issues with
> the independence of the project. However, there are still concerned
> parties on this front, and the concerns can be easily addressed by a
> move to the LF.

+1

> 
> Regards,
> Dave.
> 
> -- 
> Dave Neary - NFV/SDN Community Strategy
> Open Source and Standards, Red Hat - http://community.redhat.com
> Ph: +1-978-399-2182 / Cell: +1-978-799-3338


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-18 Thread Jerin Jacob
On Mon, Oct 17, 2016 at 08:26:33PM +, Eads, Gage wrote:
> 
> 
> >  -Original Message-
> >  From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> >  Sent: Sunday, October 16, 2016 11:18 PM
> >  To: Eads, Gage 
> >  Cc: dev at dpdk.org; thomas.monjalon at 6wind.com; Richardson, Bruce
> >  ; Vangati, Narender
> >  ; hemant.agrawal at nxp.com
> >  Subject: Re: [dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven
> >  programming model framework for DPDK
> >  
> >  On Fri, Oct 14, 2016 at 03:00:57PM +, Eads, Gage wrote:
> >  > Thanks Jerin, this looks good. I've put a few notes/questions inline.
> >  
> >  Thanks Gage.
> >  
> >  >
> >  > >  +
> >  > >  +/**
> >  > >  + * Get the device identifier for the named event device.
> >  > >  + *
> >  > >  + * @param name
> >  > >  + *   Event device name to select the event device identifier.
> >  > >  + *
> >  > >  + * @return
> >  > >  + *   Returns event device identifier on success.
> >  > >  + *   - <0: Failure to find named event device.
> >  > >  + */
> >  > >  +extern uint8_t
> >  > >  +rte_event_dev_get_dev_id(const char *name);
> >  >
> >  > This return type should be int8_t, or some signed type, to support the 
> > failure
> >  case.
> >  
> >  Makes sense. I will change to int to make consistent with
> >  rte_cryptodev_get_dev_id()
> >  
> >  >
> >  > >  +};
> >  > >  +
> >  > >  +/**
> >  > >  + * Schedule one or more events in the event dev.
> >  > >  + *
> >  > >  + * An event dev implementation may define this is a NOOP, for
> >  > > instance if  + * the event dev performs its scheduling in hardware.
> >  > >  + *
> >  > >  + * @param dev_id
> >  > >  + *   The identifier of the device.
> >  > >  + */
> >  > >  +extern void
> >  > >  +rte_event_schedule(uint8_t dev_id);
> >  >
> >  > One idea: Have the function return the number of scheduled packets (or 0 
> > for
> >  implementations that do scheduling in hardware). This could be a helpful
> >  diagnostic for the software scheduler.
> >  
> >  How about returning an implementation specific value ?
> >  Rather than defining certain function associated with returned value.
> >  Just to  make sure it works with all HW/SW implementations. Something like
> >  below,
> >  
> >  /**
> >   * Schedule one or more events in the event dev.
> >   *
> >   * An event dev implementation may define this is a NOOP, for instance if
> >   * the event dev performs its scheduling in hardware.
> >   *
> >   * @param dev_id
> >   *   The identifier of the device.
> >   * @return
> >   *   Implementation specific value from the event driver for diagnostic 
> > purpose
> >   */
> >  extern int
> >  rte_event_schedule(uint8_t dev_id);
> >  
> >  
> 
> That's fine by me.

OK. I will change it in v3

> 
> I also had a comment on the return value of rte_event_dev_info_get() in my 
> previous email: "I'm wondering if this return type should be int, so we can 
> return an error if the dev_id is invalid."
> 
> What do you think?

The void return was based on cryptodev_info_get(). I think it makes
sense to return "int". I will change it in v3.


> 
> Thanks,
> Gage
> 
> >  
> >  


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-17 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 05:02:21PM +0100, Bruce Richardson wrote:
> On Wed, Oct 12, 2016 at 01:00:16AM +0530, Jerin Jacob wrote:
> > Thanks to Intel and NXP folks for the positive and constructive feedback
> > I've received so far. Here is the updated RFC(v2).
> > 
> > I've attempted to address as many comments as possible.
> > 
> > This series adds rte_eventdev.h to the DPDK tree with
> > adequate documentation in doxygen format.
> > 
> > Updates are also available online:
> > 
> > Related draft header file (this patch):
> > https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> > 
> > PDF version(doxgen output):
> > https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf
> > 
> > Repo:
> > https://github.com/jerinjacobk/libeventdev
> > 
> 
> Thanks for all the work on this.

Thanks

> 
> 
> > +/* Event device configuration bitmap flags */
> > +#define RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT (1 << 0)
> > +/**< Override the global *dequeue_wait_ns* and use per dequeue wait in ns.
> > + *  \see rte_event_dequeue_wait_time(), rte_event_dequeue()
> > + */
> 
> Can you clarify why this is needed? If an app wants to use the same
> dequeue wait times for all dequeues can it not specify that itself via
> the wait time parameter, rather than having a global dequeue wait value?

The rationale for choosing this scheme is to allow an optimized
rte_event_dequeue() for some implementations without losing application
portability.

We mostly have two different HW schemes to define the wait time:

HW1) Only a global wait value for the eventdev, applied across all
dequeues
HW2) A per-queue wait value

In terms of applications:
APP1) Trivial applications do not need a different wait value for each
dequeue
APP2) Non-trivial applications do need different wait values

This config option lets an HW1-based implementation take advantage when the
application only demands APP1, without losing application portability
(i.e. if the application demands the APP2 scheme, an HW1-based
implementation can use a different function pointer to implement the
dequeue function).

The overall theme of the proposal is to have more configuration options
(like RTE_EVENT_QUEUE_CFG_SINGLE_CONSUMER) to enable high-performance SW/HW
implementations.



[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-17 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 03:00:57PM +, Eads, Gage wrote:
> Thanks Jerin, this looks good. I've put a few notes/questions inline.

Thanks Gage.

> 
> >  +
> >  +/**
> >  + * Get the device identifier for the named event device.
> >  + *
> >  + * @param name
> >  + *   Event device name to select the event device identifier.
> >  + *
> >  + * @return
> >  + *   Returns event device identifier on success.
> >  + *   - <0: Failure to find named event device.
> >  + */
> >  +extern uint8_t
> >  +rte_event_dev_get_dev_id(const char *name);
> 
> This return type should be int8_t, or some signed type, to support the 
> failure case.

Makes sense. I will change it to int to be consistent with
rte_cryptodev_get_dev_id().

> 
> >  +};
> >  +
> >  +/**
> >  + * Schedule one or more events in the event dev.
> >  + *
> >  + * An event dev implementation may define this is a NOOP, for instance if
> >  + * the event dev performs its scheduling in hardware.
> >  + *
> >  + * @param dev_id
> >  + *   The identifier of the device.
> >  + */
> >  +extern void
> >  +rte_event_schedule(uint8_t dev_id);
> 
> One idea: Have the function return the number of scheduled packets (or 0 for 
> implementations that do scheduling in hardware). This could be a helpful 
> diagnostic for the software scheduler.

How about returning an implementation-specific value, rather than defining
a particular meaning for the returned value? Just to make sure it works
with all HW/SW implementations. Something like below:

/**
 * Schedule one or more events in the event dev.
 *
 * An event dev implementation may define this as a NOOP, for instance if
 * the event dev performs its scheduling in hardware.
 *
 * @param dev_id
 *   The identifier of the device.
 * @return
 *   Implementation specific value from the event driver for diagnostic purposes
 */
extern int
rte_event_schedule(uint8_t dev_id);






[dpdk-dev] [PATCH v2 3/5] i40e: enable i40e vector PMD on ARMv8a platform

2016-10-14 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 09:30:02AM +0530, Jianbo Liu wrote:
> Signed-off-by: Jianbo Liu 

Reviewed-by: Jerin Jacob 

> ---
>  config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
>  doc/guides/nics/features/i40e_vec.ini  | 1 +
>  doc/guides/nics/features/i40e_vf_vec.ini   | 1 +
>  3 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
> b/config/defconfig_arm64-armv8a-linuxapp-gcc
> index a0f4473..6321884 100644
> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> @@ -45,6 +45,5 @@ CONFIG_RTE_TOOLCHAIN_GCC=y
>  CONFIG_RTE_EAL_IGB_UIO=n
>  
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> -CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n
>  
>  CONFIG_RTE_SCHED_VECTOR=n
> diff --git a/doc/guides/nics/features/i40e_vec.ini 
> b/doc/guides/nics/features/i40e_vec.ini
> index 0953d84..edd6b71 100644
> --- a/doc/guides/nics/features/i40e_vec.ini
> +++ b/doc/guides/nics/features/i40e_vec.ini
> @@ -37,3 +37,4 @@ Linux UIO= Y
>  Linux VFIO   = Y
>  x86-32   = Y
>  x86-64   = Y
> +ARMv8= Y
> diff --git a/doc/guides/nics/features/i40e_vf_vec.ini 
> b/doc/guides/nics/features/i40e_vf_vec.ini
> index 2a44bf6..d6674f7 100644
> --- a/doc/guides/nics/features/i40e_vf_vec.ini
> +++ b/doc/guides/nics/features/i40e_vf_vec.ini
> @@ -26,3 +26,4 @@ Linux UIO= Y
>  Linux VFIO   = Y
>  x86-32   = Y
>  x86-64   = Y
> +ARMv8= Y
> -- 
> 2.4.11
> 


[dpdk-dev] [PATCH v2 2/5] i40e: implement vector PMD for ARM architecture

2016-10-14 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 09:30:01AM +0530, Jianbo Liu wrote:
> Use ARM NEON intrinsic to implement i40e vPMD
> 
> Signed-off-by: Jianbo Liu 

I'm not entirely familiar with i40e internals. The patch looks OK in terms
of using NEON instructions.

Acked-by: Jerin Jacob 

> ---
>  drivers/net/i40e/Makefile |   4 +
>  drivers/net/i40e/i40e_rxtx_vec_neon.c | 614 
> ++
>  2 files changed, 618 insertions(+)
>  create mode 100644 drivers/net/i40e/i40e_rxtx_vec_neon.c
> 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 10:30:33AM +, Hemant Agrawal wrote:

> > > Am I reading this correctly that there is no way to support an
> > > indefinite waiting capability? Or is this just saying that if a timed
> > > wait is performed there are min/max limits for the wait duration?
> > 
> > Application can wait indefinite if required. see
> > RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration option.
> > 
> > Trivial application may not need different wait values on each dequeue.This 
> > is a
> > performance optimization opportunity for implementation.
> 
>  Jerin, It is irrespective of wait configuration, whether you are using per 
> device wait or per dequeuer wait. 
>  Can the value of MAX_U32 or MAX_U64 be treated as infinite weight? 

That would be yet another check in the fast path of the implementation, I
think, for a more fine-grained wait scheme. Let the application configure the
device with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT so that the implementation can
have two different function pointer based implementations of the dequeue
function if required.

With the RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration, MAX_U64 implicitly
becomes an infinite wait, as the wait is a uint64_t.
I can add this info in v3 if required.

Jerin

> 
> > 


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-14 Thread Jerin Jacob
On Thu, Oct 13, 2016 at 11:14:38PM -0500, Bill Fischofer wrote:
> Hi Jerin,

Hi Bill,

Thanks for the review.

[snip]
> > + * If the device init operation is successful, the correspondence between
> > + * the device identifier assigned to the new device and its associated
> > + * *rte_event_dev* structure is effectively registered.
> > + * Otherwise, both the *rte_event_dev* structure and the device
> > identifier are
> > + * freed.
> > + *
> > + * The functions exported by the application Event API to setup a device
> > + * designated by its device identifier must be invoked in the following
> > order:
> > + * - rte_event_dev_configure()
> > + * - rte_event_queue_setup()
> > + * - rte_event_port_setup()
> > + * - rte_event_port_link()
> > + * - rte_event_dev_start()
> > + *
> > + * Then, the application can invoke, in any order, the functions
> > + * exported by the Event API to schedule events, dequeue events, enqueue
> > events,
> > + * change event queue(s) to event port [un]link establishment and so on.
> > + *
> > + * Application may use rte_event_[queue/port]_default_conf_get() to get
> > the
> > + * default configuration to set up an event queue or event port by
> > + * overriding few default values.
> > + *
> > + * If the application wants to change the configuration (i.e. call
> > + * rte_event_dev_configure(), rte_event_queue_setup(), or
> > + * rte_event_port_setup()), it must call rte_event_dev_stop() first to
> > stop the
> > + * device and then do the reconfiguration before calling
> > rte_event_dev_start()
> > + * again. The schedule, enqueue and dequeue functions should not be
> > invoked
> > + * when the device is stopped.
> >
> 
> Given this requirement, the question is what happens to events that are "in
> flight" at the time rte_event_dev_stop() is called? Is stop an asynchronous
> operation that quiesces the event _dev and allows in-flight events to drain
> from queues/ports prior to fully stopping, or is some sort of separate
> explicit quiesce mechanism required? If stop is synchronous and simply
> halts the event_dev, then how is an application to know if subsequent
> configure/setup calls would leave these pending events with no place to
> stand?
>

From an application API perspective, rte_event_dev_stop() is a synchronous
function.
If the stop has been called to re-configure the number of queues, ports etc.
of the device, then "in flight" entry preservation will be implementation
defined. Otherwise, "in flight" entries will be preserved.

[snip]

> > +extern int
> > +rte_event_dev_socket_id(uint8_t dev_id);
> > +
> > +/* Event device capability bitmap flags */
> > +#define RTE_EVENT_DEV_CAP_QUEUE_QOS(1 << 0)
> > +/**< Event scheduling prioritization is based on the priority associated
> > with
> > + *  each event queue.
> > + *
> > + *  \see rte_event_queue_setup(), RTE_EVENT_QUEUE_PRIORITY_NORMAL
> > + */
> > +#define RTE_EVENT_DEV_CAP_EVENT_QOS(1 << 1)
> > +/**< Event scheduling prioritization is based on the priority associated
> > with
> > + *  each event. Priority of each event is supplied in *rte_event*
> > structure
> > + *  on each enqueue operation.
> > + *
> > + *  \see rte_event_enqueue()
> > + */
> > +
> > +/**
> > + * Event device information
> > + */
> > +struct rte_event_dev_info {
> > +   const char *driver_name;/**< Event driver name */
> > +   struct rte_pci_device *pci_dev; /**< PCI information */
> > +   uint32_t min_dequeue_wait_ns;
> > +   /**< Minimum supported global dequeue wait delay(ns) by this
> > device */
> > +   uint32_t max_dequeue_wait_ns;
> > +   /**< Maximum supported global dequeue wait delay(ns) by this
> > device */
> > +   uint32_t dequeue_wait_ns;
> >
> 
> Am I reading this correctly that there is no way to support an indefinite
> waiting capability? Or is this just saying that if a timed wait is
> performed there are min/max limits for the wait duration?

The application can wait indefinitely if required; see the
RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT configuration option.

A trivial application may not need different wait values on each dequeue. This
is a performance optimization opportunity for the implementation.

> 
> 
> > +   /**< Configured global dequeue wait delay(ns) for this device */
> > +   uint8_t max_event_queues;
> > +   /**< Maximum event_queues supported by this device */
> > +   uint32_t max_event_queue_flows;
> > +   /**< Maximum supported flows in an event queue by this device*/
> > +   uint8_t max_event_queue_priority_levels;
> > +   /**< Maximum number of event queue priority levels by this device.
> > +* Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
> > +*/
> > +   uint8_t nb_event_queues;
> > +   /**< Configured number of event queues for this device */
> >
> 
> Is 256 a sufficient number of queues? While various SoCs may have limits,
> why impose such a small limit architecturally?

Each event 

[dpdk-dev] [PATCH v2] examples/l3fwd: em: use hw accelerated crc hash function for arm64

2016-10-13 Thread Jerin Jacob
On Fri, Oct 14, 2016 at 12:17:05AM +0530, Hemant Agrawal wrote:
> if machine level CRC extension are available, offload the
> hash to machine provide functions e.g. armv8-a CRC extensions
> support it
> 
> Signed-off-by: Hemant Agrawal 
> Reviewed-by: Jerin Jacob 
> ---
>  examples/l3fwd/l3fwd_em.c | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> index 89a68e6..d92d0aa 100644
> --- a/examples/l3fwd/l3fwd_em.c
> +++ b/examples/l3fwd/l3fwd_em.c
> @@ -57,13 +57,17 @@
>  
>  #include "l3fwd.h"
>  
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) && defined(RTE_MACHINE_CPUFLAG_CRC32)

This will evaluate as FALSE always.

Please change to a logical OR operation here, i.e.
#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)

> +#define EM_HASH_CRC 1
> +#endif


[dpdk-dev] [RFC] [PATCH v2] libeventdev: event driven programming model framework for DPDK

2016-10-12 Thread Jerin Jacob
Thanks to Intel and NXP folks for the positive and constructive feedback
I've received so far. Here is the updated RFC(v2).

I've attempted to address as many comments as possible.

This series adds rte_eventdev.h to the DPDK tree with
adequate documentation in doxygen format.

Updates are also available online:

Related draft header file (this patch):
https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h

PDF version(doxgen output):
https://rawgit.com/jerinjacobk/libeventdev/master/librte_eventdev_v2.pdf

Repo:
https://github.com/jerinjacobk/libeventdev

v1..v2

- Added Cavium, Intel, NXP copyrights in header file

- Changed the concept of flow queues to flow ids.
This is to avoid dictating a specific structure to hold the flows.
A s/w implementation can do atomic load balancing on multiple
flow ids more efficiently than maintaining each event in a specific flow queue.

- Changed the scheduling group to event queue.
A scheduling group is more a stream of events, so an event queue is a better
abstraction.

- Introduced event port concept. Instead of tying eventdev access to the lcore,
a higher level of abstraction called event port is needed which is the
application i/f to the eventdev to dequeue and enqueue the events.
One or more event queues can be linked to single event port.
There can be more than one event port per lcore allowing multiple lightweight
threads to have their own i/f into eventdev, if the implementation supports it.
An event port will be bound to a lcore or a lightweight thread to keep
portable application workflow.
An event port abstraction also encapsulates dequeue depth and enqueue depth for
a scheduler implementations which can schedule multiple events at a time and
output events that can be buffered.

- Added configuration options for the event queue (nb_atomic_flows,
nb_atomic_order_sequences, single consumer etc.)
and the event port (dequeue_queue_depth, enqueue_queue_depth etc.) to define the
limits on resource usage. (Useful for an optimized software implementation)

- Introduced RTE_EVENT_DEV_CAP_QUEUE_QOS and RTE_EVENT_DEV_CAP_EVENT_QOS
schemes of priority handling

- Added event port to event queue servicing priority.
This allows two event ports to connect to the same event queue with
different priorities.

- Changed the workflow as schedule/dequeue/enqueue.
An implementation is free to define schedule as NOOP.
A distributed s/w scheduler can use this to schedule events;
also a centralized s/w scheduler can make this a NOOP on non-scheduler cores.

- Removed Cavium HW specific schedule_from_group API

- Removed Cavium HW specific ctxt_update/ctxt_wait APIs.
Introduced a more generic "event pinning" concept, i.e.
if the normal workflow is a dequeue -> do work based on event type -> enqueue,
a pin_event argument to enqueue (where the pinned event is returned through
the normal dequeue) allows the application workflow to remain the same whether
or not an implementation supports it.

- Added dequeue() burst variant

- Added the definition of a closed/open system - where an open system is memory
backed and a closed system eventdev has limited capacity.
In such systems, it is also useful to denote per event port how many packets
can be active in the system.
This can serve as a threshold for ethdev like devices so they don't overwhelm
core to core events.

- Added the option to specify the maximum amount of time (in ns) the
application needs to wait on dequeue().

- Removed the scheme of expressing the number of flows in log2 format

Open items or items that need improvement:

- Abstract the differences in event QoS management with different priority 
schemes
available in different HW or SW implementations with portable application 
workflow.

Based on the feedback, there are three different kinds of QoS support available in
three different HW or SW implementations.
1) Priority associated with the event queue
2) Priority associated with each event enqueue
(Same flow can have two different priority on two separate enqueue)
3) Priority associated with the flow(each flow has unique priority)

In v2, the differences are abstracted based on device capability
(RTE_EVENT_DEV_CAP_QUEUE_QOS for the first scheme,
RTE_EVENT_DEV_CAP_EVENT_QOS for the second and third scheme).
This scheme would call for a different application workflow for
nontrivial QoS-enabled applications.

Looking forward to getting comments from both application and driver
implementation perspective.

/Jerin

---
 doc/api/doxy-api-index.md  |1 +
 doc/api/doxy-api.conf  |1 +
 lib/librte_eventdev/rte_eventdev.h | 1204 
 3 files changed, 1206 insertions(+)
 create mode 100644 lib/librte_eventdev/rte_eventdev.h

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 6675f96..28c1329 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -40,6 +40,7 @@ There are many libraries, so their headers may be grouped by 
topics:
   

[dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK

2016-10-09 Thread Jerin Jacob
On Fri, Oct 07, 2016 at 10:40:03AM +, Hemant Agrawal wrote:
> Hi Jerin/Narender,

Hi Hemant,

Thanks for the review.

> 
>   Thanks for the proposal and discussions. 

> 
>   I agree with many of the comment made by Narender.  Here are some 
> additional comments.
> 
> 1. rte_event_schedule - should support option for bulk dequeue. The size of 
> bulk should be a property of device, how much depth it can support.

OK. Will fix it in v2.

> 
> 2. The event schedule should also support the option to specify the amount of 
> time, it can wait. The implementation may only support global 
> setting(dequeue_wait_ns) for wait time. They can take any non-zero wait value 
> as to implement wait.  

OK. Will fix it in v2.

> 
> 3. rte_event_schedule_from_group - there should be one model.  Both Push and 
> Pull may not work well together. At least the simultaneous mixed config will 
> not work on NXP hardware scheduler. 

OK. Will remove Cavium specific "rte_event_schedule_from_group" API in v2.

> 
> 4. Priority of queues within the scheduling group?  - Please keep in mind 
> that some hardware supports intra scheduler priority and some only support 
> intra flow_queue priority within a scheduler instance. The events of same 
> flow id should have same priority.

Will try to address some solution based on capability.

> 
> 5. w.r.t flow_queue numbers in log2, I will prefer to have absolute number. 
> Not all system may have large number of queues. So the design should keep in 
> account the system will fewer number of queues.

OK. Will fix it in v2.

> 
> Regards,
> Hemant
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Wednesday, October 05, 2016 12:55 PM
> > On Tue, Oct 04, 2016 at 09:49:52PM +, Vangati, Narender wrote:
> > > Hi Jerin,
> > 
> > Hi Narender,
> > 
> > Thanks for the comments.I agree with proposed changes; I will address these
> > comments in v2.
> > 
> > /Jerin
> > 
> > 
> > >
> > >
> > >
> > > Here are some comments on the libeventdev RFC.
> > >
> > > These are collated thoughts after discussions with you & others to 
> > > understand
> > the concepts and rationale for the current proposal.
> > >
> > >
> > >
> > > 1. Concept of flow queues. This is better abstracted as flow ids and not 
> > > as flow
> > queues which implies there is a queueing structure per flow. A s/w
> > implementation can do atomic load balancing on multiple flow ids more
> > efficiently than maintaining each event in a specific flow queue.
> > >
> > >
> > >
> > > 2. Scheduling group. A scheduling group is more a steam of events, so an 
> > > event
> > queue might be a better abstraction.
> > >
> > >
> > >
> > > 3. An event queue should support the concept of max active atomic flows
> > (maximum number of active flows this queue can track at any given time) and
> > max active ordered sequences (maximum number of outstanding events waiting
> > to be egress reordered by this queue). This allows a scheduler 
> > implementation to
> > dimension/partition its resources among event queues.
> > >
> > >
> > >
> > > 4. An event queue should support concept of a single consumer. In an
> > application, a stream of events may need to be brought together to a single
> > core for some stages of processing, e.g. for TX at the end of the pipeline 
> > to
> > avoid NIC reordering of the packets. Having a 'single consumer' event queue 
> > for
> > that stage allows the intensive scheduling logic to be short circuited and 
> > can
> > improve throughput for s/w implementations.
> > >
> > >
> > >
> > > 5. Instead of tying eventdev access to an lcore, a higher level of 
> > > abstraction
> > called event port is needed which is the application i/f to the eventdev. 
> > Event
> > ports are connected to event queues and is the object the application uses 
> > to
> > dequeue and enqueue events. There can be more than one event port per lcore
> > allowing multiple lightweight threads to have their own i/f into eventdev, 
> > if the
> > implementation supports it. An event port abstraction also encapsulates
> > dequeue depth and enqueue depth for a scheduler implementations which can
> > schedule multiple events at a time and output events that can be buffered.
> > >
> > >
> > >
> > > 6. 

[dpdk-dev] [PATCH] examples/l3fwd: em path hash offload to machine

2016-10-05 Thread Jerin Jacob
On Tue, Aug 23, 2016 at 08:24:39PM +0530, Hemant Agrawal wrote:

Maybe you can change the subject line to:
examples/l3fwd: em: use hw accelerated crc hash function for arm64
instead of:
examples/l3fwd: em path hash offload to machine

> if machine level CRC extension are available, offload the
> hash to machine provided functions e.g. armv8-a CRC extensions
> support it
> 
> Signed-off-by: Hemant Agrawal 
> ---
>  examples/l3fwd/l3fwd_em.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
> index def5a02..a889c67 100644
> --- a/examples/l3fwd/l3fwd_em.c
> +++ b/examples/l3fwd/l3fwd_em.c
> @@ -58,13 +58,13 @@
>  
>  #include "l3fwd.h"
>  
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)

Rather than adding new compilation flag everywhere, Maybe you can add

#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) ||
defined(RTE_MACHINE_CPUFLAG_CRC32)
#define EM_HASH_CRC 1
#endif

something like the above to reduce the changes for future platforms with CRC
support.

Other than that, you can add:
Reviewed-by: Jerin Jacob 

>  #include 
>  #define DEFAULT_HASH_FUNC   rte_hash_crc
>  #else
>  #include 
>  #define DEFAULT_HASH_FUNC   rte_jhash
> -#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#endif
>  
>  #define IPV6_ADDR_LEN 16
>  
> @@ -169,17 +169,17 @@ ipv4_hash_crc(const void *data, __rte_unused uint32_t 
> data_len,
>   t = k->proto;
>   p = (const uint32_t *)>port_src;
>  
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)
>   init_val = rte_hash_crc_4byte(t, init_val);
>   init_val = rte_hash_crc_4byte(k->ip_src, init_val);
>   init_val = rte_hash_crc_4byte(k->ip_dst, init_val);
>   init_val = rte_hash_crc_4byte(*p, init_val);
> -#else /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#else
>   init_val = rte_jhash_1word(t, init_val);
>   init_val = rte_jhash_1word(k->ip_src, init_val);
>   init_val = rte_jhash_1word(k->ip_dst, init_val);
>   init_val = rte_jhash_1word(*p, init_val);
> -#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#endif
>  
>   return init_val;
>  }
> @@ -191,16 +191,16 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t 
> data_len,
>   const union ipv6_5tuple_host *k;
>   uint32_t t;
>   const uint32_t *p;
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)
>   const uint32_t  *ip_src0, *ip_src1, *ip_src2, *ip_src3;
>   const uint32_t  *ip_dst0, *ip_dst1, *ip_dst2, *ip_dst3;
> -#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#endif
>  
>   k = data;
>   t = k->proto;
>   p = (const uint32_t *)>port_src;
>  
> -#ifdef RTE_MACHINE_CPUFLAG_SSE4_2
> +#if defined(RTE_MACHINE_CPUFLAG_SSE4_2) || defined(RTE_MACHINE_CPUFLAG_CRC32)
>   ip_src0 = (const uint32_t *) k->ip_src;
>   ip_src1 = (const uint32_t *)(k->ip_src+4);
>   ip_src2 = (const uint32_t *)(k->ip_src+8);
> @@ -219,14 +219,14 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t 
> data_len,
>   init_val = rte_hash_crc_4byte(*ip_dst2, init_val);
>   init_val = rte_hash_crc_4byte(*ip_dst3, init_val);
>   init_val = rte_hash_crc_4byte(*p, init_val);
> -#else /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#else
>   init_val = rte_jhash_1word(t, init_val);
>   init_val = rte_jhash(k->ip_src,
>   sizeof(uint8_t) * IPV6_ADDR_LEN, init_val);
>   init_val = rte_jhash(k->ip_dst,
>   sizeof(uint8_t) * IPV6_ADDR_LEN, init_val);
>   init_val = rte_jhash_1word(*p, init_val);
> -#endif /* RTE_MACHINE_CPUFLAG_SSE4_2 */
> +#endif
>   return init_val;
>  }
>  
> -- 
> 1.9.1
> 


[dpdk-dev] [PATCH v2] doc: arm64: document DPDK application profiling methods

2016-10-05 Thread Jerin Jacob
Signed-off-by: Jerin Jacob 
---
v2:
-Addressed ARM64 specific review comments(Suggested by Thomas)
http://dpdk.org/dev/patchwork/patch/16362/
---
 doc/guides/prog_guide/profile_app.rst | 58 +++
 1 file changed, 58 insertions(+)

diff --git a/doc/guides/prog_guide/profile_app.rst 
b/doc/guides/prog_guide/profile_app.rst
index 3226187..9f1b7ee 100644
--- a/doc/guides/prog_guide/profile_app.rst
+++ b/doc/guides/prog_guide/profile_app.rst
@@ -31,6 +31,14 @@
 Profile Your Application
 ========================

+Introduction
+~~~~~~~~~~~~
+
+The following sections describe the methods to profile DPDK applications on
+different architectures.
+
+x86
+~~~
 Intel processors provide performance counters to monitor events.
 Some tools provided by Intel can be used to profile and benchmark an 
application.
 See the *VTune Performance Analyzer Essentials* publication from Intel Press 
for more information.
@@ -50,3 +58,53 @@ The main situations that should be monitored through event 
counters are:
 Refer to the
 `Intel Performance Analysis Guide 
<http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf>`_
 for details about application profiling.
+
+ARM64
+~~~~~
+
+Perf
+^^^^
+The ARM64 architecture provides performance counters to monitor events.
+The Linux perf tool can be used to profile and benchmark an application.
+In addition to the standard events, perf can be used to profile arm64 specific
+PMU events through raw events (-e -rXX).
+
+Refer to the
+`ARM64 specific PMU events enumeration 
<http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100095_0002_04_en/way1382543438508.html>`_
+
+High-resolution cycle counter
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The default cntvct_el0 based rte_rdtsc() provides a portable means to get the
+wall clock counter at user space. Typically it runs at <= 100MHz.
+
+The alternative method to enable rte_rdtsc() for high resolution
+wall clock counter is through armv8 PMU subsystem.
+The PMU cycle counter runs at CPU frequency. However, access to the PMU cycle
+counter from user space is not enabled by default in the arm64 Linux kernel.
+It is possible to enable user space access to the cycle counter
+by configuring the PMU from the privileged mode (kernel space).
+
+By default, the rte_rdtsc() implementation uses the portable cntvct_el0 scheme.
+Applications can choose the PMU based implementation with
+CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
+
+Find below the example steps to configure the PMU based cycle counter on an
+armv8 machine.
+
+.. code-block:: console
+
+git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
+cd armv8_pmu_cycle_counter_el0
+make
+sudo insmod pmu_el0_cycle_counter.ko
+cd $DPDK_DIR
+make config T=arm64-armv8a-linuxapp-gcc
+echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
+make
+
+.. warning::
+
+   The PMU based scheme is useful for high accuracy performance profiling with
+   rte_rdtsc(). However, this method cannot be used in conjunction with Linux
+   userspace profiling tools like perf as this scheme alters the PMU register
+   state.
-- 
2.5.5



[dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK

2016-10-05 Thread Jerin Jacob
On Tue, Oct 04, 2016 at 09:49:52PM +, Vangati, Narender wrote:
> Hi Jerin,

Hi Narender,

Thanks for the comments.I agree with proposed changes; I will address these 
comments in v2.

/Jerin


> 
> 
> 
> Here are some comments on the libeventdev RFC.
> 
> These are collated thoughts after discussions with you & others to understand 
> the concepts and rationale for the current proposal.
> 
> 
> 
> 1. Concept of flow queues. This is better abstracted as flow ids and not as 
> flow queues which implies there is a queueing structure per flow. A s/w 
> implementation can do atomic load balancing on multiple flow ids more 
> efficiently than maintaining each event in a specific flow queue.
> 
> 
> 
> 2. Scheduling group. A scheduling group is more a steam of events, so an 
> event queue might be a better abstraction.
> 
> 
> 
> 3. An event queue should support the concept of max active atomic flows 
> (maximum number of active flows this queue can track at any given time) and 
> max active ordered sequences (maximum number of outstanding events waiting to 
> be egress reordered by this queue). This allows a scheduler implementation to 
> dimension/partition its resources among event queues.
> 
> 
> 
> 4. An event queue should support concept of a single consumer. In an 
> application, a stream of events may need to be brought together to a single 
> core for some stages of processing, e.g. for TX at the end of the pipeline to 
> avoid NIC reordering of the packets. Having a 'single consumer' event queue 
> for that stage allows the intensive scheduling logic to be short circuited 
> and can improve throughput for s/w implementations.
> 
> 
> 
> 5. Instead of tying eventdev access to an lcore, a higher level of 
> abstraction called event port is needed which is the application i/f to the 
> eventdev. Event ports are connected to event queues and is the object the 
> application uses to dequeue and enqueue events. There can be more than one 
> event port per lcore allowing multiple lightweight threads to have their own 
> i/f into eventdev, if the implementation supports it. An event port 
> abstraction also encapsulates dequeue depth and enqueue depth for a scheduler 
> implementations which can schedule multiple events at a time and output 
> events that can be buffered.
> 
> 
> 
> 6. An event should support priority. Per event priority is useful for 
> segregating high priority (control messages) traffic from low priority within 
> the same flow. This needs to be part of the event definition for 
> implementations which support it.
> 
> 
> 
> 7. Event port to event queue servicing priority. This allows two event ports 
> to connect to the same event queue with different priorities. For 
> implementations which support it, this allows a worker core to participate in 
> two different workflows with different priorities (workflow 1 needing 3.5 
> cores, workflow 2 needing 2.5 cores, and so on).
> 
> 
> 
> 8. Define the workflow as schedule/dequeue/enqueue. An implementation is free 
> to define schedule as NOOP. A distributed s/w scheduler can use this to 
> schedule events; also a centralized s/w scheduler can make this a NOOP on 
> non-scheduler cores.
> 
> 
> 
> 9. The schedule_from_group API does not fit the workflow.
> 
> 
> 
> 10. The ctxt_update/ctxt_wait breaks the normal workflow. If the normal 
> workflow is a dequeue -> do work based on event type -> enqueue,  a pin_event 
> argument to enqueue (where the pinned event is returned through the normal 
> dequeue) allows application workflow to remain the same whether or not an 
> implementation supports it.
> 
> 
> 
> 11. Burst dequeue/enqueue needed.
> 
> 
> 
> 12. Definition of a closed/open system - where open system is memory backed 
> and closed system eventdev has limited capacity. In such systems, it is also 
> useful to denote per event port how many packets can be active in the system. 
> This can serve as a threshold for ethdev like devices so they don't overwhelm 
> core to core events.
> 
> 
> 
> 13. There should be sort of device capabilities definition to address 
> different implementations.
> 
> 
> 
> 
> vnr
> ---
> 


[dpdk-dev] [PATCH] doc: arm64: document DPDK application profiling methods

2016-10-04 Thread Jerin Jacob
Signed-off-by: Jerin Jacob 
---
 doc/guides/prog_guide/profile_app.rst | 58 +++
 1 file changed, 58 insertions(+)

diff --git a/doc/guides/prog_guide/profile_app.rst 
b/doc/guides/prog_guide/profile_app.rst
index 3226187..bb78623 100644
--- a/doc/guides/prog_guide/profile_app.rst
+++ b/doc/guides/prog_guide/profile_app.rst
@@ -31,6 +31,14 @@
 Profile Your Application
 ========================

+Introduction
+~~~~~~~~~~~~
+
+The following sections describe the methods to profile DPDK applications on
+different architectures.
+
+x86
+~~~
 Intel processors provide performance counters to monitor events.
 Some tools provided by Intel can be used to profile and benchmark an 
application.
 See the *VTune Performance Analyzer Essentials* publication from Intel Press 
for more information.
@@ -50,3 +58,53 @@ The main situations that should be monitored through event 
counters are:
 Refer to the
 `Intel Performance Analysis Guide 
<http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf>`_
 for details about application profiling.
+
+ARM64
+~~~~~
+
+Perf
+^^^^
+The ARM64 architecture provides performance counters to monitor events.
+The Linux perf tool can be used to profile and benchmark an application.
+In addition to the standard events, perf can be used to profile arm64 specific
+PMU events through raw events (-e -rXX).
+
+Refer to the
+`ARM64 specific PMU events enumeration 
<http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100095_0002_04_en/way1382543438508.html>`_
+
+High-resolution cycle counter
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The default cntvct_el0 based rte_rdtsc() provides a portable means to get the
+wall clock counter at user space. Typically it runs at <= 100MHz.
+
+The alternative method to enable rte_rdtsc() for high resolution
+wall clock counter is through armv8 PMU subsystem.
+The PMU cycle counter runs at CPU frequency, However, access to PMU cycle
+counter from user space is not enabled by default in the arm64 linux kernel.
+It is possible to enable cycle counter at user space access
+by configuring the PMU from the privileged mode (kernel space).
+
+by default rte_rdtsc() implementation uses portable cntvct_el0 scheme.
+Application can choose the PMU based implementation with
+CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU
+
+The PMU based scheme useful for high accuracy performance profiling.
+Find below the example steps to configure the PMU based cycle counter on an
+armv8 machine.
+
+.. code-block:: console
+
+    git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
+    cd armv8_pmu_cycle_counter_el0
+    make
+    sudo insmod pmu_el0_cycle_counter.ko
+    cd $DPDK_DIR
+    make config T=arm64-armv8a-linuxapp-gcc
+    echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
+    make
+
+.. warning::
+
+    This method cannot be used in production systems as it may alter the PMU
+    state used by standard Linux user space tools like perf.
+
-- 
2.5.5



[dpdk-dev] [RFC PATCH v2 1/5] librte_ether: add internal callback functions

2016-09-14 Thread Jerin Jacob
On Tue, Sep 13, 2016 at 02:05:49PM +, ZELEZNIAK, ALEX wrote:
> Idea here is not to allow VM to control policies assigned to it for security
> and other reasons. PF is controlled by host and dictates what VM can and 
> can't do in regards of setting VF parameters.

In the proposed scheme, the VM does not take any action on its own;
it just follows what the centralized entity tells it to do.
I think if you are planning to support different varieties of PMDs, then
this could be an option. However, if you wish to support only a subset of
PMDs, then a PF MBOX-based scheme may be enough.
In any case, I think exposing the fine details of the PF/VF MBOX scheme
in the ethdev spec is not a good idea.

> 
> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Tuesday, September 13, 2016 4:46 AM
> > To: ZELEZNIAK, ALEX 
> > Cc: Bernard Iremonger ;
> > rahul.r.shah at intel.com; wenzhuo.lu at intel.com; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH v2 1/5] librte_ether: add internal
> > callback functions
> > 
> > On Fri, Sep 09, 2016 at 04:32:07PM +, ZELEZNIAK, ALEX wrote:
> > > Use case could be to inform application managing SRIOV about VM's
> > intention
> > > to modify parameters like add VLAN which might not be the one which is
> > > assigned to VF or inform about VF reset and reapply settings like
> > strip/insert
> > > VLAN id based on policy.
> > 
> > Is there any other way(more portable way) where we can realize the same
> > use case?
> > 
> > Something like,
> > 
> > 1) The assigned VM operates/control the VF
> > 2) A centralized entity post messages through UNIX socket or
> > something(like vhost user communicates with VM).
> > On message receive, VM can take necessary action on assigned VF.
> > 
> > This will avoid the need of defining specifics of PF to VF mailbox
> > communication in normative ethdev specification.
> > 
> > And I guess it will work almost the PMD drivers as their is no
> > PMD specific work here.
> > 
> > Just a thought.
> > 
> > >
> > > > -Original Message-
> > > > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > > > Sent: Friday, September 09, 2016 10:11 AM
> > > > To: Bernard Iremonger 
> > > > Cc: rahul.r.shah at intel.com; wenzhuo.lu at intel.com; dev at dpdk.org;
> > > > ZELEZNIAK, ALEX 
> > > > Subject: Re: [dpdk-dev] [RFC PATCH v2 1/5] librte_ether: add internal
> > > > callback functions
> > > >
> > > > On Fri, Aug 26, 2016 at 10:10:16AM +0100, Bernard Iremonger wrote:
> > > > > add _rte_eth_dev_callback_process_vf function.
> > > > > add _rte_eth_dev_callback_process_generic function
> > > > >
> > > > > Adding a callback to the user application on VF to PF mailbox message,
> > > > > allows passing information to the application controlling the PF
> > > > > when a VF mailbox event message is received, such as VF reset.
> > > > >
> > > > > Signed-off-by: azelezniak 
> > > > > Signed-off-by: Bernard Iremonger 
> > > > > ---
> > > > >  lib/librte_ether/rte_ethdev.c  | 17 ++
> > > > >  lib/librte_ether/rte_ethdev.h  | 61
> > > > ++
> > > > >  lib/librte_ether/rte_ether_version.map |  7 
> > > > >  3 files changed, 85 insertions(+)
> > > > >
> > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > b/lib/librte_ether/rte_ethdev.c
> > > > > index f62a9ec..1388ea3 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > @@ -2690,6 +2690,20 @@ void
> > > > >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> > > > >   enum rte_eth_event_type event)
> > > > >  {
> > > > > + return _rte_eth_dev_callback_process_generic(dev, event, NULL);
> > > > > +}
> > > > > +
> > > > > +void
> > > > > +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> > > > > + enum rte_eth_event_type event, void *param)
> > > > > +{
> > > > > + return _rte_eth_dev_callback_process_generic(dev, event, param);
> > > > > +}
> > > > > +
> > > > > +void
> > > > > +_rte_eth_dev_callb

[dpdk-dev] [RFC PATCH v2 1/5] librte_ether: add internal callback functions

2016-09-13 Thread Jerin Jacob
On Fri, Sep 09, 2016 at 04:32:07PM +, ZELEZNIAK, ALEX wrote:
> Use case could be to inform application managing SRIOV about VM's intention
> to modify parameters like add VLAN which might not be the one which is 
> assigned to VF or inform about VF reset and reapply settings like strip/insert
> VLAN id based on policy.

Is there any other (more portable) way where we can realize the same
use case?

Something like:

1) The assigned VM operates/controls the VF.
2) A centralized entity posts messages through a UNIX socket or
something similar (like vhost-user communicates with a VM).
On message receipt, the VM can take the necessary action on the assigned VF.

This will avoid the need of defining the specifics of PF to VF mailbox
communication in the normative ethdev specification.

And I guess it will work with almost all the PMD drivers as there is no
PMD-specific work here.

Just a thought.

> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Friday, September 09, 2016 10:11 AM
> > To: Bernard Iremonger 
> > Cc: rahul.r.shah at intel.com; wenzhuo.lu at intel.com; dev at dpdk.org;
> > ZELEZNIAK, ALEX 
> > Subject: Re: [dpdk-dev] [RFC PATCH v2 1/5] librte_ether: add internal
> > callback functions
> > 
> > On Fri, Aug 26, 2016 at 10:10:16AM +0100, Bernard Iremonger wrote:
> > > add _rte_eth_dev_callback_process_vf function.
> > > add _rte_eth_dev_callback_process_generic function
> > >
> > > Adding a callback to the user application on VF to PF mailbox message,
> > > allows passing information to the application controlling the PF
> > > when a VF mailbox event message is received, such as VF reset.
> > >
> > > Signed-off-by: azelezniak 
> > > Signed-off-by: Bernard Iremonger 
> > > ---
> > >  lib/librte_ether/rte_ethdev.c  | 17 ++
> > >  lib/librte_ether/rte_ethdev.h  | 61
> > ++
> > >  lib/librte_ether/rte_ether_version.map |  7 
> > >  3 files changed, 85 insertions(+)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > > index f62a9ec..1388ea3 100644
> > > --- a/lib/librte_ether/rte_ethdev.c
> > > +++ b/lib/librte_ether/rte_ethdev.c
> > > @@ -2690,6 +2690,20 @@ void
> > >  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
> > >   enum rte_eth_event_type event)
> > >  {
> > > + return _rte_eth_dev_callback_process_generic(dev, event, NULL);
> > > +}
> > > +
> > > +void
> > > +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> > > + enum rte_eth_event_type event, void *param)
> > > +{
> > > + return _rte_eth_dev_callback_process_generic(dev, event, param);
> > > +}
> > > +
> > > +void
> > > +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> > > + enum rte_eth_event_type event, void *param)
> > > +{
> > >   struct rte_eth_dev_callback *cb_lst;
> > >   struct rte_eth_dev_callback dev_cb;
> > >
> > > @@ -2699,6 +2713,9 @@ _rte_eth_dev_callback_process(struct
> > rte_eth_dev *dev,
> > >   continue;
> > >   dev_cb = *cb_lst;
> > >   cb_lst->active = 1;
> > > + if (param != NULL)
> > > + dev_cb.cb_arg = (void *) param;
> > > +
> > >   rte_spinlock_unlock(&rte_eth_dev_cb_lock);
> > >   dev_cb.cb_fn(dev->data->port_id, dev_cb.event,
> > >   dev_cb.cb_arg);
> > > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > > index b0fe033..4fb0b9c 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -3047,9 +3047,27 @@ enum rte_eth_event_type {
> > >   /**< queue state event (enabled/disabled)
> > */
> > >   RTE_ETH_EVENT_INTR_RESET,
> > >   /**< reset interrupt event, sent to VF on PF reset */
> > > + RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
> > >   RTE_ETH_EVENT_MAX   /**< max value of this enum */
> > >  };
> > >
> > > +/**
> > > + * Response sent back to ixgbe driver from user app after callback
> > > + */
> > > +enum rte_eth_mb_event_rsp {
> > > + RTE_ETH_MB_EVENT_NOOP_ACK,  /**< skip mbox request and ACK
> > */
> > > + RTE_ETH_MB_EVENT_NOOP_NACK, /**< skip mbox request and
> > NACK */
> > > + RTE_ETH_MB_EVENT_PROCEED,  /**< proceed with mbox request
> > */
> > > + RTE_ETH_MB_EVENT_MAX   /**< max value of this enum */
> > > +};
> > 
> > Do we really need to define the specifics of PF to VF MBOX communication
> > in normative ethdev specification?
> > Each drivers may have different PF to VF MBOX definitions so it may not be
> > very portable.
> > What is the use-case here? If its for VF configuration, I think we can
> > do it as separate 'sync' functions for each functionality so that
> > PMDs will have room hiding the specifics on MBOX definitions.
> 


[dpdk-dev] [RFC PATCH v2 3/5] librte_ether: add API's for VF management

2016-09-09 Thread Jerin Jacob
On Fri, Aug 26, 2016 at 10:10:18AM +0100, Bernard Iremonger wrote:
> Add new API functions to configure and manage VF's on a NIC.
> 
> add rte_eth_dev_vf_ping function.
> add rte_eth_dev_set_vf_vlan_anti_spoof function.
> add rte_eth_dev_set_vf_mac_anti_spoof function.
> 
> Signed-off-by: azelezniak 
> 
> add rte_eth_dev_set_vf_vlan_strip function.
> add rte_eth_dev_set_vf_vlan_insert function.
> add rte_eth_dev_set_loopback function.
> add rte_eth_dev_set_all_queues_drop function.
> add rte_eth_dev_set_vf_split_drop_en function
> add rte_eth_dev_set_vf_mac_addr function.

Do we really need to expose VF-specific functions here?
It can be a generic (PF/VF) function indexed only through port_id
(example: rte_eth_dev_set_vlan_anti_spoof(uint8_t port_id, uint8_t on)).
For instance, in the ThunderX PMD, we are not exposing a separate port_id
for the PF. We only enumerate 0..N VFs as 0..N ethdev port_ids.

> increment LIBABIVER to 5.
> 
> Signed-off-by: Bernard Iremonger 
> ---
>  lib/librte_ether/rte_ethdev.c  | 159 +++
>  lib/librte_ether/rte_ethdev.h  | 223 
> +
>  lib/librte_ether/rte_ether_version.map |   9 ++
>  3 files changed, 391 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 1388ea3..2a3d2ae 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2306,6 +2306,22 @@ rte_eth_dev_default_mac_addr_set(uint8_t port_id, 
> struct ether_addr *addr)
>  }
>  
>  int
> +rte_eth_dev_set_vf_mac_addr(uint8_t port_id, uint16_t vf, struct ether_addr 
> *addr)
> +{
> + struct rte_eth_dev *dev;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +
> + if (!is_valid_assigned_ether_addr(addr))
> + return -EINVAL;
> +
> + dev = &rte_eth_devices[port_id];
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_mac_addr, -ENOTSUP);
> +
> + return (*dev->dev_ops->set_vf_mac_addr)(dev, vf, addr);
> +}
> +
> +int
>  rte_eth_dev_set_vf_rxmode(uint8_t port_id,  uint16_t vf,
>   uint16_t rx_mode, uint8_t on)
>  {
> @@ -2490,6 +2506,149 @@ rte_eth_dev_set_vf_vlan_filter(uint8_t port_id, 
> uint16_t vlan_id,
>  vf_mask, vlan_on);
>  }
>  
> +int
> +rte_eth_dev_set_vf_vlan_anti_spoof(uint8_t port_id,
> +uint16_t vf, uint8_t on)
> +{
> + struct rte_eth_dev *dev;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +
> + dev = &rte_eth_devices[port_id];
> + if (vf > 63) {

PMD may have more than 64 VFs.


> + RTE_PMD_DEBUG_TRACE("VF VLAN anti spoof:VF %d > 63\n", vf);
> + return -EINVAL;
> + }
> +
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_vlan_anti_spoof, 
> -ENOTSUP);
> + (*dev->dev_ops->set_vf_vlan_anti_spoof)(dev, vf, on);
> + return 0;
> +}
> +


[dpdk-dev] [RFC PATCH v2 1/5] librte_ether: add internal callback functions

2016-09-09 Thread Jerin Jacob
On Fri, Aug 26, 2016 at 10:10:16AM +0100, Bernard Iremonger wrote:
> add _rte_eth_dev_callback_process_vf function.
> add _rte_eth_dev_callback_process_generic function
> 
> Adding a callback to the user application on VF to PF mailbox message,
> allows passing information to the application controlling the PF
> when a VF mailbox event message is received, such as VF reset.
> 
> Signed-off-by: azelezniak 
> Signed-off-by: Bernard Iremonger 
> ---
>  lib/librte_ether/rte_ethdev.c  | 17 ++
>  lib/librte_ether/rte_ethdev.h  | 61 
> ++
>  lib/librte_ether/rte_ether_version.map |  7 
>  3 files changed, 85 insertions(+)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index f62a9ec..1388ea3 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -2690,6 +2690,20 @@ void
>  _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   enum rte_eth_event_type event)
>  {
> + return _rte_eth_dev_callback_process_generic(dev, event, NULL);
> +}
> +
> +void
> +_rte_eth_dev_callback_process_vf(struct rte_eth_dev *dev,
> + enum rte_eth_event_type event, void *param)
> +{
> + return _rte_eth_dev_callback_process_generic(dev, event, param);
> +}
> +
> +void
> +_rte_eth_dev_callback_process_generic(struct rte_eth_dev *dev,
> + enum rte_eth_event_type event, void *param)
> +{
>   struct rte_eth_dev_callback *cb_lst;
>   struct rte_eth_dev_callback dev_cb;
>  
> @@ -2699,6 +2713,9 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   continue;
>   dev_cb = *cb_lst;
>   cb_lst->active = 1;
> + if (param != NULL)
> + dev_cb.cb_arg = (void *) param;
> +
> >   rte_spinlock_unlock(&rte_eth_dev_cb_lock);
>   dev_cb.cb_fn(dev->data->port_id, dev_cb.event,
>   dev_cb.cb_arg);
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index b0fe033..4fb0b9c 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -3047,9 +3047,27 @@ enum rte_eth_event_type {
>   /**< queue state event (enabled/disabled) */
>   RTE_ETH_EVENT_INTR_RESET,
>   /**< reset interrupt event, sent to VF on PF reset */
> + RTE_ETH_EVENT_VF_MBOX,  /**< PF mailbox processing callback */
>   RTE_ETH_EVENT_MAX   /**< max value of this enum */
>  };
>  
> +/**
> + * Response sent back to ixgbe driver from user app after callback
> + */
> +enum rte_eth_mb_event_rsp {
> + RTE_ETH_MB_EVENT_NOOP_ACK,  /**< skip mbox request and ACK */
> + RTE_ETH_MB_EVENT_NOOP_NACK, /**< skip mbox request and NACK */
> + RTE_ETH_MB_EVENT_PROCEED,  /**< proceed with mbox request  */
> + RTE_ETH_MB_EVENT_MAX   /**< max value of this enum */
> +};

Do we really need to define the specifics of PF to VF MBOX communication
in the normative ethdev specification?
Each driver may have different PF to VF MBOX definitions, so it may not be
very portable.
What is the use case here? If it's for VF configuration, I think we can
do it as separate 'sync' functions for each functionality so that
PMDs will have room to hide the specifics of the MBOX definitions.



[dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation

2016-09-09 Thread Jerin Jacob
On Thu, Sep 08, 2016 at 04:09:05PM +, Kulasek, TomaszX wrote:
> Hi Jerin,

Hi TomaszX,

> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Thursday, September 8, 2016 09:29
> > To: Kulasek, TomaszX 
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation
> > 
> 
> [...]
> 
> > > +static inline uint16_t
> > > +rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
> > **tx_pkts,
> > > + uint16_t nb_pkts)
> > > +{
> > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > > +
> > > + if (!dev->tx_pkt_prep) {
> > > + rte_errno = -ENOTSUP;
> > 
> > rte_errno update may not be necessary here. see below
> > 
> > > + return 0;
> > IMO, We should return "nb_pkts" here instead of 0(i.e, all the packets are
> > valid in-case PMD does not have tx_prep function) and in-case of "0"
> > the following check in the application also will fail for no reason if
> > (nb_prep < nb_pkts) {
> > printf("tx_prep failed\n");
> > }
> > 
> 
> Yes, it seems to be reasonable.
> 
> > 
> > > + }
> > > +
> > > +#ifdef RTE_LIBRTE_ETHDEV_DEBUG
> > > + if (queue_id >= dev->data->nb_tx_queues) {
> > > + RTE_PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", queue_id);
> > > + rte_errno = -EINVAL;
> > > + return 0;
> > > + }
> > > +#endif
> > > +
> > > + return (*dev->tx_pkt_prep)(dev->data->tx_queues[queue_id],
> > > + tx_pkts, nb_pkts);
> > > +}
> > > +
> > 
> > IMO, We need to provide a compile time option for rte_eth_tx_prep as NOOP.
> > Default option should be non NOOP but incase a _target_ want to override
> > to NOOP it should be possible, the reasons is:
> > 
> > - Low-end ARMv7,ARMv8 targets may not have PCIE-RC support and it may have
> > only integrated NIC controller. On those targets, where integrated NIC
> > controller does not use tx_prep service it can made it as NOOP to save
> > cycles on following "rte_eth_tx_prep" and associated "if (unlikely(nb_prep
> > < nb_rx))" checks in the application.
> > 
> > /* Prepare burst of TX packets */
> > nb_prep = rte_eth_tx_prep(fs->rx_port, 0, pkts_burst, nb_rx);
> > 
> > if (unlikely(nb_prep < nb_rx)) {
> > int i;
> > for (i = nb_prep; i < nb_rx; i++)
> > rte_pktmbuf_free(pkts_burst[i]); }
> > 
> 
> You mean to have a code for NOOP like:
> 
> 
>   /* Prepare burst of TX packets */
>   nb_prep = nb_rx; /* rte_eth_tx_prep(fs->rx_port, 0, pkts_burst, nb_rx); 
> */
>  
>   if (unlikely(nb_prep < nb_rx)) {
>  int i;
>  for (i = nb_prep; i < nb_rx; i++)
>  rte_pktmbuf_free(pkts_burst[i]); }
> 
> 
> and let optimizer to remove unused parts?

I thought of creating a compile-time NOOP like this:
CONFIG_RTE_LIBRTE_ETHDEV_TXPREP_SUPPORT=y in config/common_base, and
have two flavors of definitions for rte_eth_tx_prep:

#ifdef RTE_LIBRTE_ETHDEV_TXPREP_SUPPORT
static inline uint16_t
rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
**tx_pkts, uint16_t nb_pkts)
{
Proposed implementation
}
#else
static inline uint16_t
rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id, struct rte_mbuf
**tx_pkts, uint16_t nb_pkts)
{
(void)port_id;
(void)queue_id;
..
}
#endif

> 
> 
> IMHO it should be an application issue to use tx_prep or not.

In some cases, even the _target_ (example: config/defconfig_arm64-*) can
decide that.
An example of such a target:
Low-end ARMv7/ARMv8 targets may not have PCIe RC support and may have
only an integrated NIC controller. On those targets/configs, where the
integrated NIC controller does not use the tx_prep service, it can be made
a NOOP to save cycles on the following "rte_eth_tx_prep" and the associated
"if (unlikely(nb_prep < nb_rx))" checks in the application.

> 
> While part of the job is done by the driver (verification and preparation), 
> and part by application (error handling), such a global compile time option 
> can introduce inconsistency, if application will not handle both cases.

Each DPDK application is built/compiled against the target/config, so I think
it is OK.

> 
> If someone wants to turn off this functionality, it should be done on 
> application level, e.g. with compilation option.
>  


[dpdk-dev] [PATCH 1/6] ethdev: add Tx preparation

2016-09-08 Thread Jerin Jacob
On Fri, Aug 26, 2016 at 06:22:53PM +0200, Tomasz Kulasek wrote:
> Added API for `rte_eth_tx_prep`
> 
> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
>   struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
>   uint16_t nb_seg_max;
>   /**< Max number of segments per whole packet. */
> 
>   uint16_t nb_mtu_seg_max;
>   /**< Max number of segments per one MTU */
> 
> Created `rte_pkt.h` header with common used functions:
> 
> int rte_validate_tx_offload(struct rte_mbuf *m)
>   to validate general requirements for tx offload in packet such a
>   flag completness. In current implementation this function is called
>   optionaly when RTE_LIBRTE_ETHDEV_DEBUG is enabled.
> 
> int rte_phdr_cksum_fix(struct rte_mbuf *m)
>   to fix pseudo header checksum for TSO and non-TSO tcp/udp packets
>   before hardware tx checksum offload.
>- for non-TSO tcp/udp packets full pseudo-header checksum is
>  counted and set.
>- for TSO the IP payload length is not included.
> 
> Signed-off-by: Tomasz Kulasek 
> ---
>  lib/librte_ether/rte_ethdev.h |   74 +++
>  lib/librte_mbuf/rte_mbuf.h|8 +++
>  lib/librte_net/Makefile   |2 +-
>  lib/librte_net/rte_pkt.h  |  132 
> +
>  4 files changed, 215 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_net/rte_pkt.h
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index b0fe033..02569ca 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -182,6 +182,7 @@ extern "C" {
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "rte_ether.h"
>  #include "rte_eth_ctrl.h"
>  #include "rte_dev_info.h"
> @@ -696,6 +697,8 @@ struct rte_eth_desc_lim {
>   uint16_t nb_max;   /**< Max allowed number of descriptors. */
>   uint16_t nb_min;   /**< Min allowed number of descriptors. */
>   uint16_t nb_align; /**< Number of descriptors should be aligned to. */
> + uint16_t nb_seg_max; /**< Max number of segments per whole packet. 
> */
> + uint16_t nb_mtu_seg_max; /**< Max number of segments per one MTU */
>  };
>  
>  /**
> @@ -1181,6 +1184,12 @@ typedef uint16_t (*eth_tx_burst_t)(void *txq,
>  uint16_t nb_pkts);
>  /**< @internal Send output packets on a transmit queue of an Ethernet 
> device. */
>  
> +typedef uint16_t (*eth_tx_prep_t)(void *txq,
> +struct rte_mbuf **tx_pkts,
> +uint16_t nb_pkts);
> +/**< @internal Prepare output packets on a transmit queue of an Ethernet
> + device. */
> +
>  typedef int (*flow_ctrl_get_t)(struct rte_eth_dev *dev,
>  struct rte_eth_fc_conf *fc_conf);
>  /**< @internal Get current flow control parameter on an Ethernet device */
> @@ -1626,6 +1635,7 @@ enum rte_eth_dev_type {
>  struct rte_eth_dev {
>   eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function. */
>   eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit function. */
> + eth_tx_prep_t tx_pkt_prep; /**< Pointer to PMD transmit prepare 
> function. */
>   struct rte_eth_dev_data *data;  /**< Pointer to device data */
>   const struct eth_driver *driver;/**< Driver for this device */
>   const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
> @@ -2833,6 +2843,70 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>   return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, 
> nb_pkts);
>  }
>  
> +/**
> + * Process a burst of output packets on a transmit queue of an Ethernet 
> device.
> + *
> + * The rte_eth_tx_prep() function is invoked to prepare output packets to be
> + * transmitted on the output queue *queue_id* of the Ethernet device 
> designated
> + * by its *port_id*.
> + * The *nb_pkts* parameter is the number of packets to be prepared which are
> + * supplied in the *tx_pkts* array of *rte_mbuf* structures, each of them
> + * allocated from a pool created with rte_pktmbuf_pool_create().
> + * For each packet to send, the rte_eth_tx_prep() function performs
> + * the following operations:
> + *
> + * - Check if packet meets devices requirements for tx offloads.
> + *
> + * - Check limitations about number of segments.
> + *
> + * - Check additional requirements when debug is enabled.
> + *
> + * - Update and/or reset required checksums when tx offload is set for 
> packet.
> + *
> + * The rte_eth_tx_prep() function returns the number of packets ready to be
> + * sent. A return value equal to *nb_pkts* means that all packets are valid 
> and
> + * ready to be sent.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue through which output packets must be
> + *   sent.
> + *   The 

[dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter

2016-08-19 Thread Jerin Jacob
On Fri, Aug 19, 2016 at 02:24:58PM +0200, Jan Viktorin wrote:
> On Fri, 19 Aug 2016 17:16:12 +0530
> Jerin Jacob  wrote:
> 
> 
> I've got a private kernel driver enabling and disabling (hopefully) properly
> this for ARMv7. If we'd like to merge it, I'd like to have a single module
> or at least single module with 2 implementations...
> 
> I can post it if it would be helpful.

I don't think we can use this in production as it may alter the PMU state used
by 'perf' etc. I think we should let it be a debug interface for armv7 and
armv8 and disable it by default.


> 
> Regards
> Jan
> 
> > 
> > >   
> > > > + *
> > > > + */
> > > > +static inline uint64_t
> > > > +rte_rdtsc(void)
> > > > +{
> > > > +   uint64_t tsc;
> > > > +
> > > > +   asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> > > > +   return tsc;
> > > > +}
> > > > +#endif
> > > > 
> > > >  static inline uint64_t
> > > >  rte_rdtsc_precise(void)
> > > > --
> > > > 2.5.5  
> > > 
> > > Do you also plan to support performance monitor event counters?  
> > 
> > No. This patch was inspired by armv7 PMU scheme and its part of DPDK.
> > The sole reason to add this support to catch any performance regression
> > through app/test application.Other than that, I think cntvct_el0 based
> > existing scheme is good enough for all the use cases.
> > 
> > > 
> > > Regards,
> > > Nipun
> > >   
> 
> 
> 
> -- 
>Jan Viktorin  E-mail: Viktorin at RehiveTech.com
>System Architect  Web:www.RehiveTech.com
>RehiveTech
>Brno, Czech Republic


[dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter

2016-08-19 Thread Jerin Jacob
On Fri, Aug 19, 2016 at 09:43:36AM +, Nipun Gupta wrote:
> Hi Jerin,
> 

Hi Nipun,

> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Thursday, August 18, 2016 17:22
> > To: dev at dpdk.org
> > Cc: thomas.monjalon at 6wind.com; jianbo.liu at linaro.org;
> > viktorin at rehivetech.com; Jerin Jacob 
> > Subject: [dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter
> > 
> > Existing cntvct_el0 based rte_rdtsc() provides portable
> > means to get wall clock counter at user space. Typically
> > it runs at <= 100MHz.
> > 
> > The alternative method to enable rte_rdtsc() for high resolution
> > wall clock counter is through armv8 PMU subsystem.
> > The PMU cycle counter runs at CPU frequency, However,
> > access to PMU cycle counter from user space is not enabled
> > by default in the arm64 linux kernel.
> > It is possible to enable cycle counter at user space access
> > by configuring the PMU from the privileged mode (kernel space).
> > 
> > by default rte_rdtsc() implementation uses portable
> > cntvct_el0 scheme. Application can choose the PMU based
> > implementation with CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU
> > 
> > Signed-off-by: Jerin Jacob 
> > ---
> > 
> > The PMU based scheme useful for high accuracy performance profiling.
> > Find below the example steps to configure the PMU based cycle counter on an
> > armv8 machine.
> > 
> > # git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
> > # cd armv8_pmu_cycle_counter_el0
> > # make
> > # sudo insmod pmu_el0_cycle_counter.ko
> > # cd $DPDK_DIR
> > # make config T=arm64-armv8a-linuxapp-gcc
> > # echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
> > # make -j 4
> 
> Can we make this kernel module also a part of DPDK. May be in the linuxapp so 
> that it is also compiled with DPDK?

I thought so. Later I realized it may not be a good idea to add yet
another out-of-tree module to the DPDK repo, as DPDK tries to get rid of
existing out-of-tree modules.

> 
> > 
> > ---
> >  .../common/include/arch/arm/rte_cycles_64.h| 33
> > ++
> >  1 file changed, 33 insertions(+)
> > 
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > index 14f2612..867a946 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > @@ -45,6 +45,11 @@ extern "C" {
> >   * @return
> >   *   The time base for this lcore.
> >   */
> > +#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
> > +/**
> > + * This call is portable to any ARMv8 architecture, however, typically
> > + * cntvct_el0 runs at <= 100MHz and it may be imprecise for some tasks.
> > + */
> >  static inline uint64_t
> >  rte_rdtsc(void)
> >  {
> > @@ -53,6 +58,34 @@ rte_rdtsc(void)
> > asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
> > return tsc;
> >  }
> > +#else
> > +/**
> > + * This is an alternative method to enable rte_rdtsc() with high resolution
> > + * PMU cycles counter.The cycle counter runs at cpu frequency and this 
> > scheme
> > + * uses ARMv8 PMU subsystem to get the cycle counter at userspace, However,
> > + * access to PMU cycle counter from user space is not enabled by default in
> > + * arm64 linux kernel.
> > + * It is possible to enable cycle counter at user space access by 
> > configuring
> > + * the PMU from the privileged mode (kernel space).
> > + *
> > + * asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
> > + * asm volatile("msr pmcntenset_el0, %0" :: "r" BIT(31));
> > + * asm volatile("msr pmuserenr_el0, %0" : : "r"(BIT(0) | BIT(2)));
> > + * asm volatile("mrs %0, pmcr_el0" : "=r" (val));
> > + * val |= (BIT(0) | BIT(2));
> > + * isb();
> > + * asm volatile("msr pmcr_el0, %0" : : "r" (val));
> 
> In your git repo I see that on cleanup the cycle count register is not 
> disabled (PMCNTENCLR_EL0). It shall be better to disable the cycle count 
> register too at module exit.

OK

> 
> > + *
> > + */
> > +static inline uint64_t
> > +rte_rdtsc(void)
> > +{
> > +   uint64_t tsc;
> > +
> > +   asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> > +   return tsc;
> > +}
> > +#endif
> > 
> >  static inline uint64_t
> >  rte_rdtsc_precise(void)
> > --
> > 2.5.5
> 
> Do you also plan to support performance monitor event counters?

No. This patch was inspired by the armv7 PMU scheme that is part of DPDK.
The sole reason to add this support is to catch any performance regression
through the app/test application. Other than that, I think the existing
cntvct_el0 based scheme is good enough for all the use cases.

> 
> Regards,
> Nipun
> 


[dpdk-dev] [PATCH v3 2/4] virtio: move SSE based Rx implementation to separate file

2016-08-19 Thread Jerin Jacob
On Thu, Aug 18, 2016 at 02:52:31PM +0800, Yuanhan Liu wrote:
> On Tue, Jul 05, 2016 at 06:19:24PM +0530, Jerin Jacob wrote:
> > Split out SSE instruction based virtio simple Rx
> > implementation to a separate file
> > 
> > Signed-off-by: Jerin Jacob 
> 
> Hi,
> 
> I was about to apply this set. I then did some build test and found a
> weird issue: it breaks the build with clang (ubuntu 16.04).
> 
> drivers/net/virtio/virtio_rxtx_simple_sse.c:130:2: error: cast from 
> 'const void *' to 'void *' drops const qualifier [-Werror,-Wcast-qual]
> _mm_prefetch((const void *)rused, _MM_HINT_T0);
> ^
> /usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include/xmmintrin.h:684:58: 
> note: expanded from macro '_mm_prefetch'
> #define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
>  ^
> 1 error generated.
> 
> Weird enough I don't see this issue before this commit: the error
> line is exactly the same before and after this commit.

Yes, I looked at the preprocessed output as well; it is the same before and
after this commit.

> 
> Another note is that _mm_prefetch() is actually with different prototype
> for gcc and clang. For gcc, we have:
> 
> _mm_prefetch (const void *__P, enum _mm_hint __I)
> 
> Any thoughts?

How about replacing "_mm_prefetch((const void *)rused, _MM_HINT_T0)"
with "rte_prefetch0(rused)", so that both compilers see the same prototype
and the clang issue is fixed?

> 
>   --yliu
> 
> 


[dpdk-dev] [PATCH] eal/armv8: high-resolution cycle counter

2016-08-18 Thread Jerin Jacob
The existing cntvct_el0 based rte_rdtsc() provides a portable
means to read the wall clock counter at user space. Typically
it runs at <= 100MHz.

The alternative method to enable rte_rdtsc() as a high resolution
wall clock counter is through the armv8 PMU subsystem.
The PMU cycle counter runs at CPU frequency. However,
access to the PMU cycle counter from user space is not enabled
by default in the arm64 Linux kernel.
It is possible to enable user space access to the cycle counter
by configuring the PMU from privileged mode (kernel space).

By default the rte_rdtsc() implementation uses the portable
cntvct_el0 scheme. An application can choose the PMU based
implementation with CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU.
Signed-off-by: Jerin Jacob 
---

The PMU based scheme is useful for high accuracy performance profiling.
Find below the example steps to configure the PMU based cycle counter on an
armv8 machine.

# git clone https://github.com/jerinjacobk/armv8_pmu_cycle_counter_el0
# cd armv8_pmu_cycle_counter_el0
# make
# sudo insmod pmu_el0_cycle_counter.ko
# cd $DPDK_DIR
# make config T=arm64-armv8a-linuxapp-gcc
# echo "CONFIG_RTE_ARM_EAL_RDTSC_USE_PMU=y" >> build/.config
# make -j 4

---
 .../common/include/arch/arm/rte_cycles_64.h| 33 ++
 1 file changed, 33 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
index 14f2612..867a946 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
@@ -45,6 +45,11 @@ extern "C" {
  * @return
  *   The time base for this lcore.
  */
+#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
+/**
+ * This call is portable to any ARMv8 architecture; however, cntvct_el0
+ * typically runs at <= 100MHz and may be imprecise for some tasks.
+ */
 static inline uint64_t
 rte_rdtsc(void)
 {
@@ -53,6 +58,34 @@ rte_rdtsc(void)
asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
return tsc;
 }
+#else
+/**
+ * This is an alternative method to enable rte_rdtsc() using the high
+ * resolution PMU cycle counter. The cycle counter runs at CPU frequency and
+ * this scheme uses the ARMv8 PMU subsystem to read it at user space. However,
+ * access to the PMU cycle counter from user space is not enabled by default
+ * in the arm64 Linux kernel.
+ * It is possible to enable user space access to the cycle counter by
+ * configuring the PMU from privileged mode (kernel space).
+ *
+ * asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
+ * asm volatile("msr pmcntenset_el0, %0" :: "r" BIT(31));
+ * asm volatile("msr pmuserenr_el0, %0" : : "r"(BIT(0) | BIT(2)));
+ * asm volatile("mrs %0, pmcr_el0" : "=r" (val));
+ * val |= (BIT(0) | BIT(2));
+ * isb();
+ * asm volatile("msr pmcr_el0, %0" : : "r" (val));
+ *
+ */
+static inline uint64_t
+rte_rdtsc(void)
+{
+   uint64_t tsc;
+
+   asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
+   return tsc;
+}
+#endif

 static inline uint64_t
 rte_rdtsc_precise(void)
-- 
2.5.5



[dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK

2016-08-10 Thread Jerin Jacob
On Tue, Aug 09, 2016 at 09:48:46AM +0100, Bruce Richardson wrote:
> On Tue, Aug 09, 2016 at 06:31:41AM +0530, Jerin Jacob wrote:
> > Find below the URL for the complete API specification.
> > 
> > https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h
> > 
> > I have created a supportive document to share the concepts of
> > event driven programming model and proposed APIs details to get
> > better reach for the specification.
> > This presentation will cover introduction to event driven programming model 
> > concepts,
> > characteristics of hardware-based event manager devices,
> > RFC API proposal, example use case, and benefits of using the event driven 
> > programming model.
> > 
> > Find below the URL for the supportive document.
> > 
> > https://rawgit.com/jerinjacobk/libeventdev/master/DPDK-event_driven_programming_framework.pdf
> > 
> > git repo for the above documents:
> > 
> > https://github.com/jerinjacobk/libeventdev/
> > 
> > Looking forward to getting comments from both application and driver
> > implementation perspective.
> > 
> 
> Hi Jerin,
> 

Hi Bruce,

> thanks for the RFC. Packet distribution and scheduling is something we've been
> thinking about here too. This RFC gives us plenty of new ideas to take on 
> board. :-)

Thanks

> While you refer to HW implementations on SOC's, have you given any thought to
> how a pure-software implementation of an event API might work? I know that

Yes. I have removed almost all hardware specific details from the API
specification. Mostly the APIs are driven by the use case.

I had the impression that a software based scheme would use the
lib_rte_distributor or lib_rte_reorder libraries to get load balancing
and reordering features. However, if we are looking for a converged
solution without impacting the HW models then I think it is a good step
forward.

IMO, implementing the ORDERED schedule sync method in a performance-effective
way in SW may be tricky. Maybe we can introduce some capability-based
scheme that lets the HW and SW solutions coexist.

> while a software implemenation can obviously be done for just about any API,
> I'd be concerned that the API not get in the way of a very highly
> tuned implementation.
> 
> We'll look at it in some detail and get back to you with our feedback, as soon
> as we can, to start getting the discussion going.

OK

> 
> Regards,
> /Bruce
> 


[dpdk-dev] [RFC] libeventdev: event driven programming model framework for DPDK

2016-08-09 Thread Jerin Jacob
Hi All,

Find below an RFC API specification which attempts to
define the standard application programming interface
for event driven programming in DPDK and to abstract HW based event devices.

These devices can support event scheduling and flow ordering
in HW and typically found in NW SoCs as an integrated device or
as PCI EP device.

The RFC APIs are inspired from existing ethernet and crypto devices.
Following are the requirements considered to define the RFC API.

1) APIs similar to the existing Ethernet and crypto API framework for
   - Device creation, device identification and device configuration
2) Enumerate libeventdev resources as numbers (0..N) to
   - Avoid ABI issues with handles
   - An event device may have a million flow queues, so it is not practical
     to have handles for each flow queue and its associated name based
     lookup in the multiprocess case
3) Avoid struct mbuf changes
4) APIs to
   - Enumerate eventdev driver capabilities and resources
   - Enqueue events from an lcore
   - Schedule events
   - Synchronize events
   - Maintain ingress order of the events
   - Support run to completion

Find below the URL for the complete API specification.

https://rawgit.com/jerinjacobk/libeventdev/master/rte_eventdev.h

I have created a supportive document to share the concepts of
event driven programming model and proposed APIs details to get
better reach for the specification.
This presentation will cover introduction to event driven programming model 
concepts,
characteristics of hardware-based event manager devices,
RFC API proposal, example use case, and benefits of using the event driven 
programming model.

Find below the URL for the supportive document.

https://rawgit.com/jerinjacobk/libeventdev/master/DPDK-event_driven_programming_framework.pdf

git repo for the above documents:

https://github.com/jerinjacobk/libeventdev/

Looking forward to getting comments from both application and driver
implementation perspective.

What follows is the text version of the above documents, for inline comments 
and discussion.
I intend to update that specification accordingly.

/**
 * Get the total number of event devices that have been successfully
 * initialised.
 *
 * @return
 *   The total number of usable event devices.
 */
extern uint8_t
rte_eventdev_count(void);

/**
 * Get the device identifier for the named event device.
 *
 * @param name
 *   Event device name to select the event device identifier.
 *
 * @return
 *   Returns event device identifier on success.
 *   - <0: Failure to find named event device.
 */
extern int
rte_eventdev_get_dev_id(const char *name);

/**
 * Return the NUMA socket to which a device is connected.
 *
 * @param dev_id
 *   The identifier of the device.
 * @return
 *   The NUMA socket id to which the device is connected or
 *   a default of zero if the socket could not be determined.
 *   - -1: dev_id value is out of range.
 */
extern int
rte_eventdev_socket_id(uint8_t dev_id);

/**  Event device information */
struct rte_eventdev_info {
const char *driver_name;/**< Event driver name */
struct rte_pci_device *pci_dev; /**< PCI information */
uint32_t min_sched_wait_ns;
/**< Minimum supported scheduler wait delay in ns by this device */
uint32_t max_sched_wait_ns;
/**< Maximum supported scheduler wait delay in ns by this device */
uint32_t sched_wait_ns;
/**< Configured scheduler wait delay in ns of this device */
uint32_t max_flow_queues_log2;
/**< LOG2 of maximum flow queues supported by this device */
uint8_t  max_sched_groups;
/**< Maximum schedule groups supported by this device */
uint8_t  max_sched_group_priority_levels;
/**< Maximum schedule group priority levels supported by this device */
};

/**
 * Retrieve the contextual information of an event device.
 *
 * @param dev_id
 *   The identifier of the device.
 * @param[out] dev_info
 *   A pointer to a structure of type *rte_eventdev_info* to be filled with the
 *   contextual information of the device.
 */
extern void
rte_eventdev_info_get(uint8_t dev_id, struct rte_eventdev_info *dev_info);

/** Event device configuration structure */
struct rte_eventdev_config {
uint32_t sched_wait_ns;
/**< rte_event_schedule() wait for *sched_wait_ns* ns on this device */
uint32_t nb_flow_queues_log2;
/**< LOG2 of the number of flow queues to configure on this device */
uint8_t  nb_sched_groups;
/**< The number of schedule groups to configure on this device */
};

/**
 * Configure an event device.
 *
 * This function must be invoked first before any other function in the
 * API. This function can also be re-invoked when a device is in the
 * stopped state.
 *
 * The caller may use rte_eventdev_info_get() to get the capability of each
 * resources available in this event device.
 *
 * @param dev_id
 *   The identifier of the device to configure.
 * 

[dpdk-dev] [PATCH] doc: announce driver name changes

2016-07-29 Thread Jerin Jacob
On Sat, Jul 09, 2016 at 05:56:34PM +0100, Pablo de Lara wrote:
> Driver names for all the supported devices in DPDK do not have
> a naming convention. Some are using a prefix, some are not
> and some have long names. Driver names are used when creating
> virtual devices, so it is useful to have consistency in the names.
> 
> Signed-off-by: Pablo de Lara 

Acked-by: Jerin Jacob 

> ---
>  doc/guides/rel_notes/deprecation.rst | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst 
> b/doc/guides/rel_notes/deprecation.rst
> index f502f86..37d65c8 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -41,3 +41,8 @@ Deprecation Notices
>  * The mempool functions for single/multi producer/consumer are deprecated and
>will be removed in 16.11.
>It is replaced by rte_mempool_generic_get/put functions.
> +
> +* Driver names are quite inconsistent among each others and they will be
> +  renamed to something more consistent (net_ prefix for net drivers and
> +  crypto_ for crypto drivers) in 16.11. Some of these driver names are used
> +  publicly, to create virtual devices, so a deprecation notice is necessary.
> -- 
> 2.7.4
> 


[dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure

2016-07-28 Thread Jerin Jacob
On Thu, Jul 28, 2016 at 04:52:45PM +0200, Thomas Monjalon wrote:
> 2016-07-28 19:29, Jerin Jacob:
> > Above things worries me, I wouldn't have cared if the changes are not comes
> > in fastpath and I don't think this sort of issues will never get fixed any 
> > time
> > soon in this community.
> > 
> > So I given up.
> 
> I feel something goes wrong here but I cannot understand your sentence.
> Please could you reword/explain Jerin?

I guess you have removed the context from the email. Never mind.

1) IMHO, introducing a new fast path API which has a "performance impact"
on other existing PMDs should get consensus from the other PMD maintainers.
At a bare minimum, send a patch well in advance, with the implementation of
the ethdev API as well as a PMD driver implementation, to get feedback from
other developers _before_ the ABI change announcement, rather than just
debating hypothetical points.

2) What I understand from the discussion is that this is a workaround for
an HW limitation. At this point, I am not sure tx_prep is the only way to
address it. Do other PMDs have a similar restriction? If yes, can we
abstract it properly, so that the usage is very clear from both the PMD
and application perspectives?

Jerin


[dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure

2016-07-28 Thread Jerin Jacob
On Thu, Jul 28, 2016 at 01:01:16PM +, Ananyev, Konstantin wrote:
> 
> 
> > > >
> > > > Not according to proposal. It can't be too as application has no
> > > > idea what PMD driver does with "prep" what is the implication on a
> > > > HW if application does not
> > >
> > > Why application writer wouldn't have an idea?
> > > We would document what tx_prep() supposed to do, and in what cases user 
> > > don't need it.
> > 
> > But how he/she detect that on that run-time ?
> 
> By the application logic for example.
> If let say is doing the l2fwd for that group of packets, it would know
> that it doesn't need to do tx_prep().
> 
> To be honest, I don't understand what is your concern here.
> That proposed change doesn't break any existing functionality,
> doesn't introduce any new requirements to the existing API, 
> and wouldn't introduce any performance regression for existing apps.

Yes for existing applications, but no for ANY application that uses tx_prep()
in future and runs on a PMD where the callback is NULL (one/two PMDs vs N PMDs).

> It is a an extension, and user is free not to use it, if it doesn't fit his 
> needs.

If it is a workaround for a specific HW, then why change the normative
"fast path" ethdev specification? You could provide your fixup as an internal
PMD driver routine and be done with it. It is as simple as that.

> From other side there are users who are interested in that functionality,
> and they do have use-cases for  it.
> So what worries you?
The above things worry me. I wouldn't have cared if the changes did not land
in the fast path, and I don't think this sort of issue will get fixed any time
soon in this community.

So I give up.

Jerin

> Konstantin
> 
> > 
> > > Then it would be up to the user:
> > > - not to use it at all (one segment per packet, no HW TX offloads)
> > 
> > We already have TX flags for that
> > 
> > > - not to use tx_prep(), and make necessary preparations himself,
> > >   that what people have to do now.
> > > - use tx_prep()
> > >
> > > Konstantin
> > >


[dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure

2016-07-28 Thread Jerin Jacob
On Thu, Jul 28, 2016 at 10:36:07AM +, Ananyev, Konstantin wrote:
> > If it does not cope up then it can skip tx'ing in the actual tx burst
> > itself and move the "skipped" tx packets to end of the list in the tx
> > burst so that application can take the action on "skipped" packet after
> > the tx burst
> 
> Sorry, that's too cryptic for me.
> Can you reword it somehow?

OK.
1) let's say the application requests to send 32 packets using tx_burst.
2) the packets are p0 to p31
3) due to some constraint, the driver is not able to send all of them
(say everything except p2 and p16 is sent successfully by the driver)
4) the driver can move p2 and p16 to pkt[0] and pkt[1] during tx_burst and
return 30
5) the application can take action on p2 and p16 based on the return value of
30 (i.e. 32 - 30 = 2 packets need to be handled, at pkt[0] and pkt[1])


> 
> > 
> > 
> > > Instead it just setups the ol_flags, fills tx_offload fields and calls 
> > > tx_prep().
> > > Please read the original Tomasz's patch, I think he explained possible 
> > > use-cases
> > > with lot of details.
> > 
> > Sorry, it is not very clear in terms of use cases.
> 
> Ok, what I meant to say:
> Right now, if user wants to use HW TX cksum/TSO offloads he might have to:
> - setup ipv4 header cksum field.
> - calculate the pseudo header checksum
> - setup tcp/udp cksum field.
> 
> Rules how these calculations need to be done and which fields need to be 
> updated,
> may vary depending on HW underneath and requested offloads.
> tx_prep() - supposed to hide all these nuances from user and allow him to use 
> TX HW offloads
> in a transparent way.

I am not sure I understand it completely. It is a bit contradictory with the
statement below:
|We would document what tx_prep() supposed to do, and in what cases user
|don't need it.

How about introducing a generic ethdev EAL command-line mode, OR a new
ethdev_configure hint, indicating that the PMD driver is in "tx_prep->tx_burst"
mode instead of just tx_burst? That way there is no fast-path performance
degradation for the PMDs that do not need it.


> 
> Another main purpose of tx_prep(): for multi-segment packets is to check
> that number of segments doesn't exceed  HW limit.
> Again right now users have to do that on their own.
> 
> > 
> > In HW perspective, It it tries to avoid the illegal state. But not sure
> > calling "back to back" tx prepare and then tx burst how does it improve the
> > situation as the check illegal state check introduce in actual tx burst
> > it self.
> > 
> > In SW perspective, its try to avoid sending malformed packets. In my
> > view the same can achieved with existing tx burst it self as PMD is the
> > one finally send the packets on the wire.
> 
> Ok, so your question is: why not to put that functionality into
> tx_burst() itself, right?
> For few reasons:
> 1. putting that functionality into tx_burst() would introduce unnecessary
> slowdown for cases when that functionality is not needed
> (one segment per packet, no HW offloads).

These parameters can be configured at init time.

> 2. User might don't want to use tx_prep() - he/she might have its
> own analog, which he/she belives is faster/smarter,etc.

That's the current mode. Right?
> 3.  Having it a s separate function would allow user control when/where
>   to call it, let say only for some packets, or probably call tx_prep()
>   on one core, and do actual tx_burst() for these packets on the other. 
Why process it under tx_prep(), when the application can always process the
packets on one core?

> > 
> > proposal quote:
> > 
> > 1. Introduce rte_eth_tx_prep() function to do necessary preparations of
> >packet burst to be safely transmitted on device for desired HW
> >offloads (set/reset checksum field according to the hardware
> >requirements) and check HW constraints (number of segments per
> >packet, etc).
> > 
> >While the limitations and requirements may differ for devices, it
> >requires to extend rte_eth_dev structure with new function pointer
> >"tx_pkt_prep" which can be implemented in the driver to prepare and
> >verify packets, in devices specific way, before burst, what should to
> >prevent application to send malformed packets.
> > 
> > 
> > >
> > > > and what if the PMD does not implement that callback then it is of 
> > > > waste cycles. Right?
> > >
> > > If you refer as lost cycles here something like:
> > > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP);
> > > then yes.
> > > Though comparing to actual work need to be done for most HW TX offloads,
> > > I think it is neglectable.
> > 
> > Not sure.
> > 
> > > Again, as I said before, it is totally voluntary for the application.
> > 
> > Not according to proposal. It can't be too as application has no idea
> > what PMD driver does with "prep" what is the implication on a HW if
> > application does not
> 
> Why application writer wouldn't have an idea? 
> We would document what tx_prep() supposed to do, and in 

[dpdk-dev] usages issue with external mempool

2016-07-28 Thread Jerin Jacob
On Thu, Jul 28, 2016 at 10:32:44AM +0200, Olivier MATZ wrote:

Hi Olivier,

> Hi Hemant, Jerin,
> 
> On 07/27/2016 11:51 AM, Jerin Jacob wrote:
> > On Tue, Jul 26, 2016 at 10:11:13AM +, Hemant Agrawal wrote:
> 
> > 
> > I agree, To me, this is very bad. I have raised this concern earlier
> > also
> > 
> > Since applications like OVS goes through "rte_mempool_create" for
> > even packet buffer pool creation. IMO it make senses to extend
> > "rte_mempool_create" to take one more argument to provide external pool
> > handler name(NULL for default). I don't see any valid technical reason
> > to treat external pool handler based mempool creation API different
> > from default handler.
> 
> I disagree that changing from one function do_many_stuff(11 args) to several
> do_one_stuff(few args) functions is a regression.
> 
> I don't feel that having a new function with 12 args solves anything.
> What is the problem of having 20 lines of code for initializing a mbuf pool?
> The new API gives more flexibility, and it allow an application to define
> its own function if the default one cannot be used.
>

The problem I have with this scheme is that there is no visibility on
converging the creation/usage of external vs default based pool managers. Can
we deprecate the original (11 args) API and have ONLY ONE WAY to create/use
the mempool, irrespective of external or internal? That way, from the
application perspective it is converged, and it is just a matter of selecting
the handler name.
I believe deprecating the (11 args) API is the only way that can happen.
Any other thoughts on converging it?

For an example, look at cryptodev: the selection and usage of SW based virtual
or HW based physical devices are identical from the APPLICATION perspective.





> I think that the name of the functions pretty well defines what they do:
> 
>   rte_mempool_create_empty(): create an empty mempool
>   rte_mempool_set_ops_byname(): set the mempool handler from its name
>   rte_pktmbuf_pool_init(): initialize the mempool as a packet pool
>   rte_mempool_populate_default(): populate the pool with objects
>   rte_mempool_obj_iter(): call a function for each object
> 

I agree. The new APIs are great. See above.


Jerin


[dpdk-dev] [PATCH v2] doc: announce ABI change for mbuf structure

2016-07-28 Thread Jerin Jacob
On Wed, Jul 20, 2016 at 09:16:14AM +0200, Olivier Matz wrote:
> For 16.11, the mbuf structure will be modified implying ABI breakage.
> Some discussions already took place here:
> http://www.dpdk.org/dev/patchwork/patch/12878/
> 
> Signed-off-by: Olivier Matz 
Acked-by: Jerin Jacob 


[dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure

2016-07-28 Thread Jerin Jacob
On Wed, Jul 27, 2016 at 08:51:09PM +, Ananyev, Konstantin wrote:
> 
> > 
> > On Wed, Jul 27, 2016 at 05:33:01PM +, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-
> > > > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > > > Sent: Wednesday, July 27, 2016 6:11 PM
> > > > To: Thomas Monjalon 
> > > > Cc: Kulasek, TomaszX ; dev at dpdk.org;
> > > > Ananyev, Konstantin 
> > > > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for
> > > > rte_eth_dev structure
> > > >
> > > > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > > > Signed-off-by: Tomasz Kulasek 
> > > > > > > ---
> > > > > > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev``
> > > > > > > +structure will be
> > > > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > > > +verification
> > > > > > > +  and processing of packet burst to meet HW specific
> > > > > > > +requirements before
> > > > > > > +  transmit. Also new fields will be added to the 
> > > > > > > ``rte_eth_desc_lim`` structure:
> > > > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing
> > > > > > > +information about number of
> > > > > > > +  segments limit to be transmitted by device for TSO/non-TSO 
> > > > > > > packets.
> > > > > >
> > > > > > Acked-by: Konstantin Ananyev 
> > > > >
> > > > > I think I understand you want to split the TX processing:
> > > > >   1/ modify/write in mbufs
> > > > >   2/ write in HW
> > > > > and let application decide:
> > > > >   - where the TX prep is done (which core)
> > > >
> > > > In what basics applications knows when and where to call tx_pkt_prep in 
> > > > fast path.
> > > > if all the time it needs to call before tx_burst then the PMD won't
> > > > have/don't need this callback waste cycles in fast path.Is this the 
> > > > expected behavior ?
> > > > Anything think it as compile time to make other PMDs wont suffer 
> > > > because of this change.
> > >
> > > Not sure what suffering you are talking about...
> > > Current model - i.e. when application does preparations (or doesn't if
> > > none is required) on its own and just call tx_burst() would still be 
> > > there.
> > > If the app doesn't want to use tx_prep() by some reason - that still
> > > ok, and decision is up to the particular app.
> > 
> > So my question is in what basics application decides to call the 
> > preparation.
> > Can you tell me the use case in application perspective?
> 
> I suppose one most common use-case when application uses HW TX offloads,
> and don't' to cope on its own which L3/L4 header fields need to be filled
> for that particular dev_type/hw_offload combination.

If it cannot cope, then it can skip tx'ing those packets in the actual tx
burst itself and move the "skipped" tx packets to the end of the list in the
tx burst, so that the application can take action on the "skipped" packets
after the tx burst.


> Instead it just setups the ol_flags, fills tx_offload fields and calls 
> tx_prep().
> Please read the original Tomasz's patch, I think he explained possible 
> use-cases 
> with lot of details.

Sorry, it is not very clear in terms of use cases.

From the HW perspective, it tries to avoid the illegal state. But I am not
sure how calling tx prepare and then tx burst "back to back" improves the
situation, compared to introducing the illegal-state check in the actual tx
burst itself.

From the SW perspective, it tries to avoid sending malformed packets. In my
view the same can be achieved within the existing tx burst itself, as the PMD
is the one that finally sends the packets on the wire.

proposal quote:

1. Introduce rte_eth_tx_prep() function to do necessary preparations of
   packet burst to be safely transmitted on device for desired HW
   offloads (set/reset checksum field according to the hardware
   requirements) and check HW constraints (number of segments per
   packet, etc).

   While the limitations and requirements may differ for devices, it
   requires to extend rte_eth_dev structure with new function pointer
   "tx_pkt_prep" which can be implemented in the driver to prepare and
   verify packets, in devices specific way, before

[dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure

2016-07-28 Thread Jerin Jacob
On Wed, Jul 27, 2016 at 05:33:01PM +, Ananyev, Konstantin wrote:
> 
> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Wednesday, July 27, 2016 6:11 PM
> > To: Thomas Monjalon 
> > Cc: Kulasek, TomaszX ; dev at dpdk.org; 
> > Ananyev, Konstantin 
> > Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev 
> > structure
> > 
> > On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > > > Signed-off-by: Tomasz Kulasek 
> > > > > ---
> > > > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure
> > > > > +will be
> > > > > +  extended with new function pointer ``tx_pkt_prep`` allowing
> > > > > +verification
> > > > > +  and processing of packet burst to meet HW specific requirements
> > > > > +before
> > > > > +  transmit. Also new fields will be added to the 
> > > > > ``rte_eth_desc_lim`` structure:
> > > > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information
> > > > > +about number of
> > > > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > > >
> > > > Acked-by: Konstantin Ananyev 
> > >
> > > I think I understand you want to split the TX processing:
> > >   1/ modify/write in mbufs
> > >   2/ write in HW
> > > and let application decide:
> > >   - where the TX prep is done (which core)
> > 
> > In what basics applications knows when and where to call tx_pkt_prep in 
> > fast path.
> > if all the time it needs to call before tx_burst then the PMD won't 
> > have/don't need this callback waste cycles in fast path.Is this the expected
> > behavior ?
> > Anything think it as compile time to make other PMDs wont suffer because of 
> > this change.
> 
> Not sure what suffering you are talking about...
> Current model - i.e. when application does preparations (or doesn't if none 
> is required)
> on its own and just call tx_burst() would still be there.
> If the app doesn't want to use tx_prep() by some reason - that still ok,
> and decision is up to the particular app. 

So my question is: on what basis does the application decide to call the
preparation? Can you tell me the use case from the application perspective?
And if the PMD does not implement that callback, then it is a waste of
cycles. Right?

Jerin


> Konstantin
> 
> > 
> > 
> > >   - what to do if the TX prep fail
> > > So adding some processing in this first part becomes "not too
> > > expensive" or "manageable" from the application point of view.
> > >
> > > If I well understand the intent,
> > >
> > > Acked-by: Thomas Monjalon  (except typos ;)


[dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev structure

2016-07-27 Thread Jerin Jacob
On Wed, Jul 27, 2016 at 01:59:01AM -0700, Thomas Monjalon wrote:
> > > Signed-off-by: Tomasz Kulasek 
> > > ---
> > > +* In 16.11 ABI changes are plained: the ``rte_eth_dev`` structure will be
> > > +  extended with new function pointer ``tx_pkt_prep`` allowing 
> > > verification
> > > +  and processing of packet burst to meet HW specific requirements before
> > > +  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` 
> > > structure:
> > > +  ``nb_seg_max`` and ``nb_mtu_seg_max`` provideing information about 
> > > number of
> > > +  segments limit to be transmitted by device for TSO/non-TSO packets.
> > 
> > Acked-by: Konstantin Ananyev 
> 
> I think I understand you want to split the TX processing:
>   1/ modify/write in mbufs
>   2/ write in HW
> and let application decide:
>   - where the TX prep is done (which core)

On what basis does the application know when and where to call tx_pkt_prep in
the fast path? If it needs to be called before tx_burst every time, then on a
PMD that doesn't have/need this callback it wastes cycles in the fast path.
Is this the expected behavior?
Can we make it a compile-time option, so that other PMDs won't suffer because
of this change?


>   - what to do if the TX prep fail
> So adding some processing in this first part becomes "not too expensive" or
> "manageable" from the application point of view.
> 
> If I well understand the intent,
> 
> Acked-by: Thomas Monjalon 
> (except typos ;)


[dpdk-dev] usages issue with external mempool

2016-07-27 Thread Jerin Jacob
On Tue, Jul 26, 2016 at 10:11:13AM +, Hemant Agrawal wrote:
> Hi,
>There was lengthy discussions w.r.t external mempool patches. 
> However, I am still finding usages issue with the agreed approach.
> 
> The existing API to create packet mempool, "rte_pktmbuf_pool_create" does not 
> provide the option to change the object init iterator. This may be the reason 
> that many applications (e.g. OVS) are using rte_mempool_create to create 
> packet mempool  with their own object iterator (e.g. ovs_rte_pktmbuf_init).
> 
> e.g the existing usages are:
> dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
>  MP_CACHE_SZ,
>  sizeof(struct rte_pktmbuf_pool_private),
>  rte_pktmbuf_pool_init, NULL,
>  ovs_rte_pktmbuf_init, NULL,
> socket_id, 0);
> 
> 
> With the new API set for packet pool create, this need to be changed to:
> 
> dmp->mp = rte_mempool_create_empty(mp_name, mp_size, MBUF_SIZE(mtu),
>  MP_CACHE_SZ,
>  sizeof(struct rte_pktmbuf_pool_private),
>  socket_id, 0);
>   if (dmp->mp == NULL)
>  break;
> 
>   rte_errno = rte_mempool_set_ops_byname(dmp-mp,
> 
> RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
>   if (rte_errno != 0) {
>  RTE_LOG(ERR, MBUF, "error 
> setting mempool handler\n");
>  return NULL;
>   }
>   rte_pktmbuf_pool_init(dmp->mp, NULL);
> 
>   ret = rte_mempool_populate_default(dmp->mp);
>   if (ret < 0) {
>  rte_mempool_free(dmp->mp);
>  rte_errno = -ret;
>  return NULL;
>   }
> 
>   rte_mempool_obj_iter(dmp->mp, 
> ovs_rte_pktmbuf_init, NULL);
> 
> This is not a user friendly approach to ask for changing 1 API to 6 new APIs. 
> Or, am I missing something?

I agree. To me, this is very bad. I have raised this concern earlier
as well.

Since applications like OVS go through "rte_mempool_create" even for
packet buffer pool creation, IMO it makes sense to extend
"rte_mempool_create" to take one more argument to provide the external pool
handler name (NULL for default). I don't see any valid technical reason
to treat external-pool-handler-based mempool creation differently
from the default handler.

Oliver, David

Thoughts ?

If we agree on this, then maybe I can send the API deprecation notice for
rte_mempool_create for v16.11.

Jerin


> 
> I think, we should do one of the following:
> 
> 1. Enhance "rte_pktmbuf_pool_create" to optionally accept 
> "rte_mempool_obj_cb_t *obj_init, void *obj_init_arg" as inputs. If obj_init 
> is not present, default can be used.
> 2. Create a new wrapper API (e.g. e_pktmbuf_pool_create_new) with  the above 
> said behavior e.g.:
> /* helper to create a mbuf pool */
> struct rte_mempool *
> rte_pktmbuf_pool_create_new(const char *name, unsigned n,
>unsigned cache_size, uint16_t priv_size, uint16_t 
> data_room_size,
> rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
>int socket_id)
> 3. Let the existing rte_mempool_create accept flag as 
> "MEMPOOL_F_HW_PKT_POOL". Obviously, if this flag is set - all other flag 
> values should be ignored. This was discussed earlier also.
> 
> Please share your opinion.
> 
> Regards,
> Hemant
> 
> 


[dpdk-dev] [PATCH] ring: fix sc dequeue performance issue

2016-07-24 Thread Jerin Jacob
Use of rte_smp_wmb() instead of rte_smp_rmb() in the sc dequeue
function creates the additional overhead of waiting for
all the STOREs to the local buffer from ring buffer
memory to be completed. The sc dequeue function demands only a
LOAD-STORE barrier, where LOADs from ring buffer memory need to be
completed before the tail pointer update. Change to rte_smp_rmb()
to enable the required LOAD-STORE barrier.

Fixes: ecc7d10e448e ("ring: guarantee dequeue ordering before tail update")

Signed-off-by: Jerin Jacob 
---
 lib/librte_ring/rte_ring.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index f928324..0e22e69 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -756,7 +756,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void 
**obj_table,

/* copy in table */
DEQUEUE_PTRS();
-   rte_smp_wmb();
+   rte_smp_rmb();

__RING_STAT_ADD(r, deq_success, n);
r->cons.tail = cons_next;
-- 
2.5.5



[dpdk-dev] [PATCH] lib: change rte_ring dequeue to guarantee ordering before tail update

2016-07-23 Thread Jerin Jacob
On Sat, Jul 23, 2016 at 12:32:01PM +, Ananyev, Konstantin wrote:
> 
> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Saturday, July 23, 2016 12:49 PM
> > To: Ananyev, Konstantin 
> > Cc: Thomas Monjalon ; Juhamatti Kuusisaari 
> > ; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] lib: change rte_ring dequeue to guarantee 
> > ordering before tail update
> > 
> > On Sat, Jul 23, 2016 at 11:15:27AM +, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > > > Sent: Saturday, July 23, 2016 11:39 AM
> > > > To: Ananyev, Konstantin 
> > > > Cc: Thomas Monjalon ; Juhamatti
> > > > Kuusisaari ; dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH] lib: change rte_ring dequeue to
> > > > guarantee ordering before tail update
> > > >
> > > > On Sat, Jul 23, 2016 at 10:14:51AM +, Ananyev, Konstantin wrote:
> > > > > Hi lads,
> > > > >
> > > > > > On Sat, Jul 23, 2016 at 11:02:33AM +0200, Thomas Monjalon wrote:
> > > > > > > 2016-07-23 8:05 GMT+02:00 Jerin Jacob  > > > > > > caviumnetworks.com>:
> > > > > > > > On Thu, Jul 21, 2016 at 11:26:50PM +0200, Thomas Monjalon wrote:
> > > > > > > >> > > Consumer queue dequeuing must be guaranteed to be done
> > > > > > > >> > > fully before the tail is updated. This is not
> > > > > > > >> > > guaranteed with a read barrier, changed to a write
> > > > > > > >> > > barrier just before tail update which in
> > > > > > practice guarantees correct order of reads and writes.
> > > > > > > >> > >
> > > > > > > >> > > Signed-off-by: Juhamatti Kuusisaari
> > > > > > > >> > > 
> > > > > > > >> >
> > > > > > > >> > Acked-by: Konstantin Ananyev
> > > > > > > >> > 
> > > > > > > >>
> > > > > > > >> Applied, thanks
> > > > > > > >
> > > > > > > > There was ongoing discussion on this
> > > > > > > > http://dpdk.org/ml/archives/dev/2016-July/044168.html
> > > > > > >
> > > > > > > Sorry Jerin, I forgot this email.
> > > > > > > The problem is that nobody replied to your email and you did
> > > > > > > not nack the v2 of this patch.
> > > > >
> > > > > It's probably my bad.
> > > > > I acked the patch before Jerin response, and forgot to reply later.
> > > > >
> > > > > > >
> > > > > > > > This change may not be required as it has the performance 
> > > > > > > > impact.
> > > > > > >
> > > > > > > We need to clearly understand what is the performance impact
> > > > > > > (numbers and use cases) on one hand, and is there a real bug
> > > > > > > fixed by this patch on the other hand?
> > > > > >
> > > > > > IHMO, there is no real bug here. rte_smb_rmb() provides the
> > > > > > LOAD-STORE barrier to make sure tail pointer WRITE happens only 
> > > > > > after prior LOADS.
> > > > >
> > > > > Yep, from what I read at the link Jerin provided, indeed it seems 
> > > > > rte_smp_rmb() is enough for the arm arch here...
> > > > > For ppc, as I can see both rte_smp_rmb()/rte_smp_wmb() emits the same 
> > > > > instruction.
> > > > >
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > Wonder how big is a performance impact?
> > > >
> > > > With this change we need to wait for addtional STORES to be completed 
> > > > to local buffer in addtion to LOADS from ring buffers memory.
> > >
> > > I understand that, just wonder did you see any real performance 
> > > difference?
> > 
> > Yeah...
> 
> Ok, then I don't see any good reason why we shouldn't revert it.
> I suppose the best way would be to submit a new patch for RC5 to revert the 
> changes.

[dpdk-dev] [PATCH] lib: change rte_ring dequeue to guarantee ordering before tail update

2016-07-23 Thread Jerin Jacob
On Sat, Jul 23, 2016 at 11:15:27AM +, Ananyev, Konstantin wrote:
> 
> 
> > -Original Message-
> > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> > Sent: Saturday, July 23, 2016 11:39 AM
> > To: Ananyev, Konstantin 
> > Cc: Thomas Monjalon ; Juhamatti Kuusisaari 
> > ; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] lib: change rte_ring dequeue to guarantee 
> > ordering before tail update
> > 
> > On Sat, Jul 23, 2016 at 10:14:51AM +, Ananyev, Konstantin wrote:
> > > Hi lads,
> > >
> > > > On Sat, Jul 23, 2016 at 11:02:33AM +0200, Thomas Monjalon wrote:
> > > > > 2016-07-23 8:05 GMT+02:00 Jerin Jacob  > > > > caviumnetworks.com>:
> > > > > > On Thu, Jul 21, 2016 at 11:26:50PM +0200, Thomas Monjalon wrote:
> > > > > >> > > Consumer queue dequeuing must be guaranteed to be done
> > > > > >> > > fully before the tail is updated. This is not guaranteed
> > > > > >> > > with a read barrier, changed to a write barrier just before
> > > > > >> > > tail update which in
> > > > practice guarantees correct order of reads and writes.
> > > > > >> > >
> > > > > >> > > Signed-off-by: Juhamatti Kuusisaari
> > > > > >> > > 
> > > > > >> >
> > > > > >> > Acked-by: Konstantin Ananyev 
> > > > > >>
> > > > > >> Applied, thanks
> > > > > >
> > > > > > There was ongoing discussion on this
> > > > > > http://dpdk.org/ml/archives/dev/2016-July/044168.html
> > > > >
> > > > > Sorry Jerin, I forgot this email.
> > > > > The problem is that nobody replied to your email and you did not
> > > > > nack the v2 of this patch.
> > >
> > > It's probably my bad.
> > > I acked the patch before Jerin response, and forgot to reply later.
> > >
> > > > >
> > > > > > This change may not be required as it has the performance impact.
> > > > >
> > > > > We need to clearly understand what is the performance impact
> > > > > (numbers and use cases) on one hand, and is there a real bug fixed
> > > > > by this patch on the other hand?
> > > >
> > > > IHMO, there is no real bug here. rte_smb_rmb() provides the
> > > > LOAD-STORE barrier to make sure tail pointer WRITE happens only after 
> > > > prior LOADS.
> > >
> > > Yep, from what I read at the link Jerin provided, indeed it seems 
> > > rte_smp_rmb() is enough for the arm arch here...
> > > For ppc, as I can see both rte_smp_rmb()/rte_smp_wmb() emits the same 
> > > instruction.
> > >
> > > >
> > > > Thoughts?
> > >
> > > Wonder how big is a performance impact?
> > 
> > With this change we need to wait for addtional STORES to be completed to 
> > local buffer in addtion to LOADS from ring buffers memory.
> 
> I understand that, just wonder did you see any real performance difference?

Yeah...

> Probably with ring_perf_autotest/mempool_perf_autotest or something?

W/O change 
RTE>>ring_perf_autotest 
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 4
MP/MC single enq/dequeue: 16
SP/SC burst enq/dequeue (size: 8): 0
MP/MC burst enq/dequeue (size: 8): 2
SP/SC burst enq/dequeue (size: 32): 0
MP/MC burst enq/dequeue (size: 32): 0

### Testing empty dequeue ###
SC empty dequeue: 0.35
MC empty dequeue: 0.60

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 0.93
MP/MC bulk enq/dequeue (size: 8): 2.45
SP/SC bulk enq/dequeue (size: 32): 0.58
MP/MC bulk enq/dequeue (size: 32): 0.97

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 1.89
MP/MC bulk enq/dequeue (size: 8): 4.28
SP/SC bulk enq/dequeue (size: 32): 0.90
MP/MC bulk enq/dequeue (size: 32): 1.19
Test OK
RTE>>

With change
RTE>>ring_perf_autotest 
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 6
MP/MC single enq/dequeue: 16
SP/SC burst enq/dequeue (size: 8): 1
MP/MC burst enq/dequeue (size: 8): 2
SP/SC burst enq/dequeue (size: 32): 0
MP/MC burst enq/dequeue (size: 32): 0

### Testing empty dequeue ###
SC empty dequeue: 0.35
MC empty dequeue: 0.60

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 1.28
MP/MC bulk enq/dequeue (size: 8): 2.47
SP/SC bulk enq/dequeue (size: 32): 0.64
MP/MC bulk enq/dequeue (size: 32): 0.97

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 2.08
MP/MC bulk enq/dequeue (size: 8): 4.29
SP/SC bulk enq/dequeue (size: 32): 1.24
MP/MC bulk enq/dequeue (size: 32): 1.19
Test OK

> Konstantin 
> 
> > 
> > > If there is a real one, I suppose we can revert the patch?
> > 
> > Request to revert this one as their no benifts for other architectures and 
> > indeed it creates addtional delay in waiting for STORES to complete
> > in ARM.
> > Lets do the correct thing by reverting it.
> > 
> > Jerin
> > 
> > 
> > 
> > > Konstantin
> > >
> > > >
> > > > >
> > > > > Please guys make things clear and we'll revert if needed.


[dpdk-dev] [PATCH] lib: change rte_ring dequeue to guarantee ordering before tail update

2016-07-23 Thread Jerin Jacob
On Sat, Jul 23, 2016 at 10:14:51AM +, Ananyev, Konstantin wrote:
> Hi lads,
> 
> > On Sat, Jul 23, 2016 at 11:02:33AM +0200, Thomas Monjalon wrote:
> > > 2016-07-23 8:05 GMT+02:00 Jerin Jacob :
> > > > On Thu, Jul 21, 2016 at 11:26:50PM +0200, Thomas Monjalon wrote:
> > > >> > > Consumer queue dequeuing must be guaranteed to be done fully
> > > >> > > before the tail is updated. This is not guaranteed with a read 
> > > >> > > barrier, changed to a write barrier just before tail update which 
> > > >> > > in
> > practice guarantees correct order of reads and writes.
> > > >> > >
> > > >> > > Signed-off-by: Juhamatti Kuusisaari
> > > >> > > 
> > > >> >
> > > >> > Acked-by: Konstantin Ananyev 
> > > >>
> > > >> Applied, thanks
> > > >
> > > > There was ongoing discussion on this
> > > > http://dpdk.org/ml/archives/dev/2016-July/044168.html
> > >
> > > Sorry Jerin, I forgot this email.
> > > The problem is that nobody replied to your email and you did not nack
> > > the v2 of this patch.
> 
> It's probably my bad.
> I acked the patch before Jerin response, and forgot to reply later. 
> 
> > >
> > > > This change may not be required as it has the performance impact.
> > >
> > > We need to clearly understand what is the performance impact (numbers
> > > and use cases) on one hand, and is there a real bug fixed by this
> > > patch on the other hand?
> > 
> > IHMO, there is no real bug here. rte_smb_rmb() provides the LOAD-STORE 
> > barrier to make sure tail pointer WRITE happens only after prior
> > LOADS.
> 
> Yep, from what I read at the link Jerin provided, indeed it seems 
> rte_smp_rmb() is enough for the arm arch here...
> For ppc, as I can see both rte_smp_rmb()/rte_smp_wmb() emits the same 
> instruction.
> 
> > 
> > Thoughts?
> 
> Wonder how big is a performance impact?

With this change we need to wait for additional STOREs to the local buffer to
be completed, in addition to the LOADs from ring buffer memory.

> If there is a real one, I suppose we can revert the patch?

Request to revert this one, as there are no benefits for other architectures
and indeed it creates additional delay waiting for STOREs to complete on ARM.
Let's do the correct thing by reverting it.

Jerin



> Konstantin 
> 
> > 
> > >
> > > Please guys make things clear and we'll revert if needed.

