[lng-odp] [RFC] A prototype of a SW scheduler for ODP

2016-09-20 Thread ola.liljedahl
From: Ola Liljedahl 

A Monkeys Can Code Production

<*- Locks are for Lamers -*>

A high performance SW scheduler for ODP


A queue and scheduler design attempting to use lock-free and lock-less
synchronisation where possible and to minimise ordering and synchronisation
between threads.

Optimised for ARM (specifically Cortex-A53) targets. Builds and runs on
x86(-64), but no attempt has been made to optimise performance there.

Simple performance benchmark, pushing 2048 events through 20 queues (which
takes a few milliseconds).
Avg cycles for single-event enqueue/schedule operations on Cortex-A53@1.5GHz
CPUs    atomic  parallel  ordered
 1      183     222       388
 2      254     282       450
 3      269     333       489
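
For reference, the numbers above come from a loop of roughly the following
shape (a simplified sketch against the public ODP API, not the actual code in
scheduler.c; queue setup, initial event allocation and error handling omitted):

#include <inttypes.h>
#include <stdio.h>
#include <odp_api.h>

/* Sketch of the measured pattern: each round schedules one event and
 * enqueues it to the next queue, so the averages above are per operation. */
static void enq_sched_rounds(odp_queue_t queue[], int num_queues,
			     uint64_t rounds)
{
	uint64_t c1, c2, i;
	odp_event_t ev;

	c1 = odp_cpu_cycles();
	for (i = 0; i < rounds; i++) {
		ev = odp_schedule(NULL, ODP_SCHED_WAIT);
		/* ... per-event work would go here ... */
		(void)odp_queue_enq(queue[i % num_queues], ev);
	}
	c2 = odp_cpu_cycles();
	printf("avg cycles/round: %" PRIu64 "\n",
	       odp_cpu_cycles_diff(c2, c1) / rounds);
}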

A presentation and discussion is scheduled for the ODP Design Sprint at
Linaro Connect Las Vegas.

Signed-off-by: Ola Liljedahl 
---
 LICENSE |   28 +
 Makefile|  164 +
 llqueue.c   |  363 +++
 llsc.c  |  254 
 scheduler.c | 2042 +++
 5 files changed, 2851 insertions(+)
 create mode 100644 LICENSE
 create mode 100644 Makefile
 create mode 100644 llqueue.c
 create mode 100644 llsc.c
 create mode 100644 scheduler.c

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 000..15fdb21
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,28 @@
+Copyright (c) 2016, ARM Limited. All rights reserved.
+
+SPDX-License-Identifier:   BSD-3-Clause
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+Redistributions of source code must retain the above copyright notice, this
+list of conditions and the following disclaimer.
+
+Redistributions in binary form must reproduce the above copyright notice, this
+list of conditions and the following disclaimer in the documentation and/or
+other materials provided with the distribution.
+
+Neither the name of ARM Limited nor the names of its contributors may be
+used to endorse or promote products derived from this software without specific
+prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/Makefile b/Makefile
new file mode 100644
index 000..ac7cd6b
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,164 @@
+###
+# Copyright (c) 2016, ARM Limited. All rights reserved.
+#
+# SPDX-License-Identifier:BSD-3-Clause
+
+
+###
+# Project specific definitions
+
+
+#Name of directory and also Dropbox source tar file
+DIRNAME = scheduler
+#List of executable files to build
+TARGETS = scheduler
+#List object files for each target
+OBJECTS_scheduler = scheduler.o
+
+#Customizable compiler and linker flags
+GCCTARGET =
+CCFLAGS += -mcx16#Required for CMPXCHG16 on x86
+#GCCTARGET = aarch64-linux-gnu
+#CCFLAGS += -mcpu=cortex-a53
+DEFINE += -DNDEBUG#disable assertions
+CCFLAGS += -std=c99
+CCFLAGS += -g -ggdb -Wall
+CCFLAGS += -O2 -fno-stack-check -fno-stack-protector
+LDFLAGS += -g -ggdb -pthread
+LIBS = -lrt
+
+#Where to find the source files
+VPATH += .
+
+#Default to non-verbose mode (echo command lines)
+VERB = @
+
+#Location of object and other derived/temporary files
+OBJDIR = obj#Must not be .
+
+###
+# Make actions (phony targets)
+
+
+.PHONY : default all clean tags etags
+
+default:
+   @echo "Make targets:"
+   @echo "all build all targets ($(TARGETS))"
+   @echo "clean   remove derived files"
+   @echo "tagsgenerate vi tags file"
+   @echo "etags   generate emacs tags file"
+
+all : $(TARGETS)
+
+#Make sure we don't remove current directory with all source files
+ifeq ($(OBJDIR),.)
+$(error invalid OBJDIR=$(OBJDIR))
+endif
+ifeq 

Re: [lng-odp] LPM Algorithm APIs in ODP

2016-09-20 Thread Mike Holmes
On 20 September 2016 at 13:37, gyanesh patra 
wrote:

> Hi,
> The L3fwd is a great addition to the ODP. I am curious if the IPv6 support
> is also under investigation for the l3fwd example??
>

We don't have anything scheduled to specifically address IPv6.


> If not it will be of my interest to take it up and contribute to ODP code
> base.
>

I am sure you would get interested reviewers for your patches.


>
> Thank you
>
> P Gyanesh Kumar Patra
>
> On Mon, Apr 18, 2016 at 10:26 PM, HePeng  wrote:
>
> > Hi,
> >Our current LPM code is based on Tree Bitmap as a backend for IP prefix
> > management. Before we submit the code, we first need to remove this part
> > of the code, as Tree Bitmap is a patented algorithm from Cisco.
> >
> >If you just want an LPM algorithm for evaluation, we can provide the
> > Tree Bitmap code alone, but it does not fit into the ODP architecture.
> > Please check https://github.com/xnhp0320/prefix_lookup_mc.git
> > and pull the develop branch.
> >
> >We are working on the code; I think it should be ready in two weeks.
> >
> >
> >
> >
> > On 18 April 2016, at 12:39, P Gyanesh Kumar Patra wrote:
> >
> > Hi,
> > Thank you for the details.
> > Do we have any time frame for the LPM code submission?
> > Is it possible to do some trial on the LPM code now?
> >
> > Is there a list of algorithms in the pipeline to be developed for ODP?
> >
> > Thank You
> > *P Gyanesh K. Patra*
> > *University of Campinas (Unicamp)*
> >
> >
> >
> >
> > On Apr 17, 2016, at 22:55, HePeng  wrote:
> >
> > Hi,
> >We are in the process of releasing the LPM code, but currently we are
> > busy submitting the cuckoo hash code into the ODP helper.
> >
> >As for the LPM code, we already have a 16-8-8 implementation. Now we are
> > working on fitting it into the ODP architecture, but we have not
> > submitted any code for LPM yet.
> >
> >If there is a requirement, we can switch our focus to the LPM code.
> >
> >
> > On 18 April 2016, at 09:03, gyanesh patra wrote:
> >
> > I encountered an old email chain about the different LPM algorithms for
> > ODP. I am curious if anyone has released or is working on something for l3
> > forwarding/routing.
> > Here is the link to the mail chain:
> > https://lists.linaro.org/pipermail/lng-odp/2015-August/014717.html
> >
> > If any work is going on, then point me in the correct direction. Also do
> > we have any example code for l3 forwarding in ODP available now?
> >
> > Thank you
> > *P Gyanesh K. Patra*
> > *University of Campinas (Unicamp)*
> >
> >
> > ___
> > lng-odp mailing list
> > lng-odp@lists.linaro.org
> > https://lists.linaro.org/mailman/listinfo/lng-odp
> >
> >
> >
> >
> >
>



-- 
Mike Holmes
Program Manager - Linaro Networking Group
Linaro.org │ Open source software for ARM SoCs
"Work should be fun and collaborative, the rest follows"


[lng-odp] query regarding cuckoo hash table support

2016-09-20 Thread gyanesh patra
Hi,
I am unable to find the cuckoo hash files in the recent code base. Has the
feature been removed from the ODP code base or renamed to something else?

Thank you
Gyanesh


Re: [lng-odp] LPM Algorithm APIs in ODP

2016-09-20 Thread gyanesh patra
Hi,
The L3fwd example is a great addition to ODP. I am curious whether IPv6
support is also under investigation for the l3fwd example?
If not, I would be interested in taking it up and contributing it to the
ODP code base.

Thank you

P Gyanesh Kumar Patra

On Mon, Apr 18, 2016 at 10:26 PM, HePeng  wrote:

> Hi,
>Our current LPM code is based on Tree Bitmap as a backend for IP prefix
> management. Before we submit the code, we first need to remove this part
> of the code, as Tree Bitmap is a patented algorithm from Cisco.
>
>If you just want an LPM algorithm for evaluation, we can provide the
> Tree Bitmap code alone, but it does not fit into the ODP architecture.
> Please check https://github.com/xnhp0320/prefix_lookup_mc.git
> and pull the develop branch.
>
>We are working on the code; I think it should be ready in two weeks.
>
>
>
>
> On 18 April 2016, at 12:39, P Gyanesh Kumar Patra wrote:
>
> Hi,
> Thank you for the details.
> Do we have any time frame for the LPM code submission?
> Is it possible to do some trial on the LPM code now?
>
> Is there a list of algorithms in the pipeline to be developed for ODP?
>
> Thank You
> *P Gyanesh K. Patra*
> *University of Campinas (Unicamp)*
>
>
>
>
> On Apr 17, 2016, at 22:55, HePeng  wrote:
>
> Hi,
>We are in the process of releasing the LPM code, but currently we are
> busy submitting the cuckoo hash code into the ODP helper.
>
>As for the LPM code, we already have a 16-8-8 implementation. Now we are
> working on fitting it into the ODP architecture, but we have not
> submitted any code for LPM yet.
>
>If there is a requirement, we can switch our focus to the LPM code.
>
>
> On 18 April 2016, at 09:03, gyanesh patra wrote:
>
> I encountered an old email chain about the different LPM algorithms for
> ODP. I am curious if anyone has released or is working on something for l3
> forwarding/routing.
> Here is the link to the mail chain:
> https://lists.linaro.org/pipermail/lng-odp/2015-August/014717.html
>
> If any work is going on, then point me in the correct direction. Also do
> we have any example code for l3 forwarding in ODP available now?
>
> Thank you
> *P Gyanesh K. Patra*
> *University of Campinas (Unicamp)*
>
>
> ___
> lng-odp mailing list
> lng-odp@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/lng-odp
>
>
>
>
>


Re: [lng-odp] RFC: packet interface to drivers

2016-09-20 Thread Christophe Milard
On 20 September 2016 at 16:01, Bill Fischofer 
wrote:

>
>
> On Tue, Sep 20, 2016 at 8:30 AM, Christophe Milard <
> christophe.mil...@linaro.org> wrote:
>
>> Hi,
>>
>> I am here trying to make a summary of what is needed by the driver
>> interface
>> regarding odp packet handling. Will serve as the base for the discussions
>> at connect. Please read and comment... possibly at connect...
>>
>> /Christophe
>>
>> From the driver perspective, the situation is rather simple: what we need
>> is:
>>
>> /* definition of a packet segment descriptor:
>>  * A packet segment is just an area, continuous in virtual address space,
>>  * and continuous in the physical address space -at least when no iommu is
>>  * used, e.g for virtio-. Probably we want to have physical continuity in
>>  * all cases (to avoid handling different cases to start with), but that
>>  * would not take advantage of the remapping that can be done by iommus,
>>  * so it can come with a little performance penalty for iommu cases.
>>
>
> I thought we had discussed and agreed that ODP would assume it is running
> on a platform with IOMMU capability? Are there any non-IOMMU platforms of
> interest that we need to support? If not, then I see no need to make this
> provision. In ODP we already have an odp_packet_seg_t type that represents
> a portion of an odp_packet_t that can be contiguously addressed.
>

Yes, we did, but then the focus changed to virtio, and there is no IOMMU
there...


>
>
>>  * Segments are shared among all odp threads (including linux processes),
>>
>
> Might be more precise to simply say "segments are accessible to all odp
> threads". Sharing implies simultaneous access, along with some notion of
> coherence, which is something that probably isn't needed.
>

Tell me if I am wrong, but the default in ODP is that queue access can be
shared between different ODP threads (there is a flag to guarantee
1thread<->1queue access, and hence a performance benefit), but as it
is now, nothing prevents thread A from putting something in the TX ring
buffer and thread B from freeing the TX'ed data when putting its own stuff
in the same TX queue. Same shareability in RX.
With these ODP assumptions, we have to access the segments from different
ODP threads. I would be very pleased to be wrong here :-)
Maybe I should say that I don't think it is an option to have a context
switch at each driver "access", i.e. I don't see a driver as its own ODP
thread/linux process being accessed by some IPC: for me, any ODP thread
sending/receiving packets will act as a driver (same context).


>
>
>>  * and are guaranteed to be mapped at the same virtual address space in
>>  * all ODP instances (single_va flag in ishm) */
>>
>
> Why is this important? How does Thread A know how a segment is accessible
> by Thread B, and does it care?
>

I am afraid this relates to my previous answer. If the addresses of
segments (and packets) differ from thread to thread, no reference via shared
pointers will be possible between the ODP threads acting as drivers => loss
in efficiency.

>
>
>>  * Note that this definition just implies that a packet segment is
>> reachable
>>  * by the driver. A segment could actually be part of a HW IO chip in a HW
>>  * accelerated HW.
>>
>
> I think this is the key. All that (should) be needed is for a driver to be
> able to access any segment that it is working with. How it does so would
> seem to be secondary from an architectural perspective.
>

Sure, but we still have to implement something in linux-generic, and
make it possible for others to do something good.


>
>
>> /* for linux-gen:
>>  * Segment are memory areas.
>>  * In TX, pkt_sgmt_join() put the pointer to the odp packet in the
>> 'odp_private'
>>  * element of the last segment of each packet, so that pkt_sgmt_free()
>>  * can just do nothing when odp_private is NULL and release the complete
>>  * odp packet when not null. Segments allocated with pkt_sgmt_alloc()
>>  * will have their odp_private set to NULL. The name and the 'void*' is
>>  * to make that opaque to the driver interface which really should not
>> care...
>>  * Other ODP implementation could handle that as they wish.
>>
>
> Need to elaborate on this. Currently we have an odp_packet_alloc() API
> that allocates a packet that consists of one or more segments. What seems
> to be new from the driver is the ability to allocate (and free) individual
> segments and then (a) assemble them into odp_packet_t objects or (b) remove
> them from odp_packet_t objects so that they become unaffiliated raw
> segments not associated with any odp_packet_t.
>

Yes, (a) is definitely needed. We have to be able to allocate segments
without telling which ODP packet they refer to, simply because (at least for
some NICs) we cannot know at alloc time which segment will relate to which
packet: if we put 32 x 2K segments in an RX ring buffer, this can
result in one single ODP packet using them all (for a 64K jumbo frame) or

Re: [lng-odp] RFC: packet interface to drivers

2016-09-20 Thread Bill Fischofer
On Tue, Sep 20, 2016 at 8:30 AM, Christophe Milard <
christophe.mil...@linaro.org> wrote:

> Hi,
>
> I am here trying to make a summary of what is needed by the driver
> interface
> regarding odp packet handling. Will serve as the base for the discussions
> at connect. Please read and comment... possibly at connect...
>
> /Christophe
>
> From the driver perspective, the situation is rather simple: what we need
> is:
>
> /* definition of a packet segment descriptor:
>  * A packet segment is just an area, continuous in virtual address space,
>  * and continuous in the physical address space -at least when no iommu is
>  * used, e.g for virtio-. Probably we want to have physical continuity in
>  * all cases (to avoid handling different cases to start with), but that
>  * would not take advantage of the remapping that can be done by iommus,
>  * so it can come with a little performance penalty for iommu cases.
>

I thought we had discussed and agreed that ODP would assume it is running
on a platform with IOMMU capability? Are there any non-IOMMU platforms of
interest that we need to support? If not, then I see no need to make this
provision. In ODP we already have an odp_packet_seg_t type that represents
a portion of an odp_packet_t that can be contiguously addressed.


>  * Segments are shared among all odp threads (including linux processes),
>

Might be more precise to simply say "segments are accessible to all odp
threads". Sharing implies simultaneous access, along with some notion of
coherence, which is something that probably isn't needed.


>  * and are guaranteed to be mapped at the same virtual address space in
>  * all ODP instances (single_va flag in ishm) */
>

Why is this important? How does Thread A know how a segment is accessible
by Thread B, and does it care?


>  * Note that this definition just implies that a packet segment is
> reachable
>  * by the driver. A segment could actually be part of a HW IO chip in a HW
>  * accelerated HW.
>

I think this is the key. All that (should) be needed is for a driver to be
able to access any segment that it is working with. How it does so would
seem to be secondary from an architectural perspective.


> /* for linux-gen:
>  * Segment are memory areas.
>  * In TX, pkt_sgmt_join() put the pointer to the odp packet in the
> 'odp_private'
>  * element of the last segment of each packet, so that pkt_sgmt_free()
>  * can just do nothing when odp_private is NULL and release the complete
>  * odp packet when not null. Segments allocated with pkt_sgmt_alloc()
>  * will have their odp_private set to NULL. The name and the 'void*' is
>  * to make that opaque to the driver interface which really should not
> care...
>  * Other ODP implementation could handle that as they wish.
>

Need to elaborate on this. Currently we have an odp_packet_alloc() API that
allocates a packet that consists of one or more segments. What seems to be
new from the driver is the ability to allocate (and free) individual
segments and then (a) assemble them into odp_packet_t objects or (b) remove
them from odp_packet_t objects so that they become unaffiliated raw
segments not associated with any odp_packet_t.

So it seems we need a corresponding set of odp_segment_xxx() APIs that
operate on a new base type: odp_segment_t. An odp_segment_t becomes an
odp_packet_seg_t when it (and possibly other segments) are converted into
an odp_packet_t as part of a packet assembly operation. Conversely, an
odp_packet_seg_t becomes an odp_segment_t when it is disconnected from an
odp_packet_t.
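
For concreteness, such a set of APIs might look roughly as follows. This is
only an illustrative sketch: odp_segment_t and all of these calls are
proposals, not existing ODP APIs, and the names are just placeholders.

/* Hypothetical sketch only; none of this exists in the ODP API today. */
typedef struct odp_segment_s *odp_segment_t;  /* raw, unaffiliated segment */

/* Allocate and free raw segments from a pool */
odp_segment_t odp_segment_alloc(odp_pool_t pool, uint32_t len);
void odp_segment_free(odp_segment_t seg);

/* (a) Assemble raw segments into a packet; they become odp_packet_seg_t's */
odp_packet_t odp_packet_assemble(odp_pool_t pool, odp_segment_t seg[],
				 int num_seg);

/* (b) Detach a segment from a packet; it reverts to an odp_segment_t */
odp_segment_t odp_packet_seg_detach(odp_packet_t pkt, odp_packet_seg_t seg);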


>  */
>
> typedef uint64_t phy_address_t;
>
> typedef struct{
> void*address;
> phy_address_t   phy_addr;
> uint32_tlen;
> void*   odp_private;
> } pkt_sgmt_t;
>
> /* FOR RX: */
> /* segment allocation function:
>  * As it is not possible to guarantee physical memory continuity from
>  * user space, this segment alloc function is best effort:
>  * The size passed in parameter is a hint of what the most probable
> received
>  * packet size could be: this alloc function will allocate a segment whose
> size
>  * will be greater or equal to the required size if the latter can fit in
>  * a single page (or huge page), hence guarateeing the segment physical
>  * continuity.
>  * If there is no physical page large enough for 'size' bytes, then
>  * the largest page is returned, meaning that in that case the allocated
>  * segment will be smaller than the required size. (the received packet
>  * will be fragmented in this case).
>  * This pkt_sgmt_alloc function is called by the driver RX side to populate
>  * the NIC RX ring buffer(s).
>  * returns the number of allocated segments (1) on success or 0 on error.
>  * Note: on unix system with 2K and 2M pages, this means that 2M will get
>  * allocated for each large (64K?) packet... to much waste? should we
> handle
>  * page fragmentation (which would really not change this interface)?
>  */
> int 

[lng-odp] RFC: packet interface to drivers

2016-09-20 Thread Christophe Milard
Hi,

I am trying here to summarise what the driver interface needs regarding
ODP packet handling. This will serve as the base for the discussions
at Connect. Please read and comment... possibly at Connect...

/Christophe

From the driver perspective, the situation is rather simple: what we need is:

/* definition of a packet segment descriptor:
 * A packet segment is just an area, continuous in virtual address space,
 * and continuous in the physical address space -at least when no iommu is
 * used, e.g for virtio-. Probably we want to have physical continuity in
 * all cases (to avoid handling different cases to start with), but that
 * would not take advantage of the remapping that can be done by iommus,
 * so it can come with a little performance penalty for iommu cases.
 * Segments are shared among all odp threads (including linux processes),
 * and are guaranteed to be mapped at the same virtual address space in
 * all ODP instances (single_va flag in ishm).
 * Note that this definition just implies that a packet segment is reachable
 * by the driver. A segment could actually be part of a HW IO chip in a
 * HW-accelerated implementation. */
/* for linux-gen:
 * Segments are memory areas.
 * In TX, pkt_sgmt_join() puts the pointer to the odp packet in the 'odp_private'
 * element of the last segment of each packet, so that pkt_sgmt_free()
 * can just do nothing when odp_private is NULL, and release the complete
 * odp packet when it is not NULL. Segments allocated with pkt_sgmt_alloc()
 * will have their odp_private set to NULL. The name and the 'void *' are
 * there to make this opaque to the driver interface, which really should not
 * care... Other ODP implementations could handle this as they wish.
 */

typedef uint64_t phy_address_t;

typedef struct {
	void          *address;
	phy_address_t  phy_addr;
	uint32_t       len;
	void          *odp_private;
} pkt_sgmt_t;

/* FOR RX: */
/* segment allocation function:
 * As it is not possible to guarantee physical memory continuity from
 * user space, this segment alloc function is best effort:
 * The size passed as a parameter is a hint of the most probable received
 * packet size: this alloc function will allocate a segment whose size
 * will be greater than or equal to the requested size if the latter can fit in
 * a single page (or huge page), hence guaranteeing the segment's physical
 * continuity.
 * If there is no physical page large enough for 'size' bytes, then
 * the largest page is returned, meaning that in that case the allocated
 * segment will be smaller than the required size. (the received packet
 * will be fragmented in this case).
 * This pkt_sgmt_alloc function is called by the driver RX side to populate
 * the NIC RX ring buffer(s).
 * returns the number of allocated segments (1) on success or 0 on error.
 * Note: on a unix system with 2K and 2M pages, this means that 2M will get
 * allocated for each large (64K?) packet... too much waste? Should we handle
 * page fragmentation (which would really not change this interface)?
 */
int pkt_sgmt_alloc(uint32_t size, pkt_sgmt_t *returned_sgmt);

/*
 * another variant of the above function could be:
 * returns the number of allocated segments on success or 0 on error.
 */
int pkt_sgmt_alloc_multi(uint32_t size, pkt_sgmt_t *returned_sgmts,
 int* nb_sgmts);

/*
 * creating ODP packets from the segments:
 * Once a series of segments belonging to a single received packet has been
 * fully received (note that this series can be of length 1 if the received
 * packet fitted in a single segment), we need a function to create the
 * ODP packet from the list of segments.
 * We first define the "pkt_sgmt_hint" structure, which can be used by
 * a NIC to pass information about the received packet (the HW probably
 * knows a lot about the received packet, so the SW does not necessarily
 * need to reparse it: the hint struct contains info which is already known
 * by the HW). If hint is NULL when calling pkt_sgmt_join(), then the SW has
 * to reparse the received packet from scratch.
 * pkt_sgmt_join() returns 0 on success.
 */
typedef struct {
/* ethtype, crc_ok, L2 and L3 offset, ip_crc_ok, ... */
} pkt_sgmt_hint;

int pkt_sgmt_join(pkt_sgmt_hint *hint,
  pkt_sgmt_t *segments, int nb_segments,
  odp_packet_t *returned_packet);

/* another variant of the above, directly passing the packet to a given queue */
int pkt_sgmt_join_and_send(pkt_sgmt_hint *hint,
   pkt_sgmt_t *segments, int nb_segments,
   odp_queue_t *dest_queue);
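
/*
 * An illustration of the intended RX flow (a sketch only:
 * nic_ring_needs_segments(), nic_ring_refill() and nic_rx_poll() are
 * hypothetical driver internals, and MAX_SGMT_PER_PKT / RX_SGMT_SIZE are
 * just example constants):
 */
static void drv_rx_example(odp_queue_t dest_queue)
{
	pkt_sgmt_t sgmts[MAX_SGMT_PER_PKT];
	pkt_sgmt_hint hint;
	int nb;

	/* keep the NIC RX ring populated with freshly allocated segments */
	while (nic_ring_needs_segments()) {
		pkt_sgmt_t seg;

		if (pkt_sgmt_alloc(RX_SGMT_SIZE, &seg) == 1)
			nic_ring_refill(&seg);
	}

	/* collect the segments of one fully received packet and hand it to ODP */
	nb = nic_rx_poll(sgmts, MAX_SGMT_PER_PKT, &hint);
	if (nb > 0)
		pkt_sgmt_join_and_send(&hint, sgmts, nb, &dest_queue);
}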


/* FOR TX: */
/*
 * Function returning a list of segments making an odp_packet:
 * Returns the number of segments or 0 on error.
 * The segments are returned in the segments[] array, whose length will
 * never exceed max_nb_segments.
 */
int pkt_sgmt_get(odp_packet_t *packet, pkt_sgmt_t *segments, int max_nb_segments);
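
/*
 * A possible TX-side usage (again a sketch only; nic_tx_post() stands in for
 * the driver's TX ring handling, and MAX_SGMT_PER_PKT is an example constant).
 * The segments are freed with pkt_sgmt_free() once the NIC has transmitted
 * them, which (on linux-gen) releases the odp packet via odp_private.
 */
static int drv_tx_example(odp_packet_t pkt)
{
	pkt_sgmt_t sgmts[MAX_SGMT_PER_PKT];
	int nb, i;

	nb = pkt_sgmt_get(&pkt, sgmts, MAX_SGMT_PER_PKT);
	if (nb == 0)
		return -1;

	for (i = 0; i < nb; i++)
		nic_tx_post(&sgmts[i]);

	return 0;
}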

/*
 * "free" a segment
 */
/*
 * For linux-generic, 

Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling latency test

2016-09-20 Thread Savolainen, Petri (Nokia - FI/Espoo)
Hi,

First, this app is written according to the current API and we'd like to start 
latency testing schedulers ASAP. A review of the app code itself would be 
appreciated.

Anyway, I'll answer those API-related comments below.


> -Original Message-
> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Bill
> Fischofer
> Sent: Monday, September 19, 2016 11:41 PM
> To: Brian Brooks 
> Cc: Elo, Matias (Nokia - FI/Espoo) ; lng-
> o...@lists.linaro.org
> Subject: Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling
> latency test
> 
> On Mon, Sep 19, 2016 at 2:11 PM, Brian Brooks 
> wrote:
> 
> > On 09/19 07:55:22, Elo, Matias (Nokia - FI/Espoo) wrote:
> > > >
> > > > On 09/14 11:53:06, Matias Elo wrote:
> > > > > +


> >
> > Thinking in the general sense..
> >
> > Should applications have to reason about _and_ code around pre-scheduled
> > and non-scheduled events? If the event hasn't crossed the API boundary
> to
> > be
> > delivered to the application according to the scheduling group policies
> for
> > that core, what is the difference to the application?
> >
> > If a scheduler implementation uses TLS to pre-schedule events it also
> seems
> > like it should be able to support work-stealing of those pre-scheduled
> > events
> > by other threads in the runtime case where odp_schedule() is not called
> > from
> > that thread or the thread id is removed from scheduling group masks.
> From
> > the application perspective these are all implementation details.
> >

Pause signals to a (HW) scheduler that the application will leave the schedule
loop soon (the app stops calling schedule() for a long time or forever). Without
the signal, the scheduler would not see any difference between a "mid" schedule
call and the last call. A schedule() call starts and ends a schedule context
(e.g. atomic locking of a queue). If the application just leaves the loop, the
last context will not be freed and e.g. an atomic queue would deadlock.

Also, generally pre-scheduled work cannot be "stolen" since:
1) it would be a costly operation to unwind decisions that have already been made
2) packet order must also be maintained in this case. It's costly to reorder /
force order for stolen events (other events may already have been processed on
other cores before you "steal" some events).



> 
> You're making an argument I made some time back. :)  As I recall, the
> rationale for pause/resume was to make life easier for existing code that
> is introducing ODP on a more gradual basis. Presumably Nokia has examples
> of such code in house.

No, see the rationale above. It's based on the functionality of existing SoC HW
schedulers. HW is bad at unwinding decisions it has already made. The application
is in the best position to decide what to do with the last events before a thread
exits. Typically, those are processed like any other event.

> 
> From a design standpoint worker threads shouldn't "change their minds" and
> go off to do something else for a while. For whatever else they might want
> to do it would seem that such requirements would be better served by
> simply
> having another thread to do the other things that wakes up periodically to
> do them.
> 

Pause/resume should not be something that a thread does very often. But
without it, a worker thread could never exit the schedule loop, since doing so
could deadlock a queue (or a number of queues).
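
For clarity, the exit sequence would look roughly like this (a sketch against
the current API; process_event() is just a placeholder and error handling is
omitted):

/* Worker leaving the schedule loop without deadlocking a queue: signal
 * pause, drain any prescheduled events (this also ends the last schedule
 * context), then exit the thread or resume later. */
odp_schedule_pause();

while (1) {
	odp_event_t ev = odp_schedule(NULL, ODP_SCHED_NO_WAIT);

	if (ev == ODP_EVENT_INVALID)
		break;               /* nothing left locally prescheduled */

	process_event(ev);           /* placeholder: handle event as usual */
}

/* ... the thread may now exit, or call odp_schedule_resume() before
 * re-entering the normal schedule loop. */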

> 
> >
> > This pause state may also cause some confusion for application writers
> > because
> > it is now possible to write two different event loops for the same core
> > depending on how a particular scheduler implementation behaves. The
> > semantics
> > seem to blur a bit with scheduling groups. Level of abstraction can be
> > raised
> > by deprecating the scheduler pause state and APIs.
> >

Those cannot be just deprecated. The same signal is needed in some form to 
avoid deadlocks.

> 
> This is a worthwhile discussion to have. I'll add it to the agenda for
> tomorrow's ODP call and we can include it in the wider scheduler
> discussions scheduled for next week. The other rationale for not wanting
> this behavior (another argument I advanced earlier) is that it greatly
> complicates recovery processing. A robustly designed application should be
> able to recover from the failure of an individual thread (this is
> especially true if the ODP thread is in fact a separate process). If the
> implementation has prescheduled events to a failed thread then how are
> they
> recovered gracefully? Conversely, if the implementation can recover from
> such a scenario than it would seem it could equally "unschedule" prestaged
> events as needed due to thread termination (normal or abnormal) or for
> load
> balancing purposes.

Unwinding is hard in HW schedulers and something that is not generally 
supported.

> 
> We may not be able to fully deprecate these APIs, but perhaps we can make
> it clearer how they are