On 09/20 08:01:49, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> Hi,
> 
> First, this app is written according to the current API and we'd like to 
> start latency testing schedulers ASAP. A review of the app code itself would 
> be appreciated.

Reviewed and tested.

> Anyway, I'll answer those API-related comments below.
> 
> 
> > -----Original Message-----
> > From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Bill
> > Fischofer
> > Sent: Monday, September 19, 2016 11:41 PM
> > To: Brian Brooks <brian.bro...@linaro.org>
> > Cc: Elo, Matias (Nokia - FI/Espoo) <matias....@nokia-bell-labs.com>; lng-
> > o...@lists.linaro.org
> > Subject: Re: [lng-odp] [PATCH v2 1/2] test: perf: add new scheduling
> > latency test
> > 
> > On Mon, Sep 19, 2016 at 2:11 PM, Brian Brooks <brian.bro...@linaro.org>
> > wrote:
> > 
> > > On 09/19 07:55:22, Elo, Matias (Nokia - FI/Espoo) wrote:
> > > > >
> > > > > On 09/14 11:53:06, Matias Elo wrote:
> > > > > > +
> 
> 
> > >
> > > Thinking in the general sense..
> > >
> > > Should applications have to reason about _and_ code around
> > > pre-scheduled and non-scheduled events? If the event hasn't crossed
> > > the API boundary to be delivered to the application according to the
> > > scheduling group policies for that core, what is the difference to
> > > the application?
> > >
> > > If a scheduler implementation uses TLS to pre-schedule events it also
> > > seems like it should be able to support work-stealing of those
> > > pre-scheduled events by other threads in the runtime case where
> > > odp_schedule() is not called from that thread or the thread id is
> > > removed from scheduling group masks. From the application perspective
> > > these are all implementation details.
> > >
> 
> Pause signals a (HW) scheduler that the application will leave the schedule
> loop soon (the app stops calling schedule() for a long time or forever).
> Without the signal, the scheduler would not see any difference between a "mid"
> schedule call and the last call. A schedule() call starts and ends a schedule
> context (e.g. atomic locking of a queue). If the application just leaves the
> loop, the last context will not be freed and e.g. an atomic queue would deadlock.

It is the scheduler providing exclusive access to the atomic queue. At any
one point in time there may only be one core processing an event from an
atomic queue. Multiple cores can participate in processing from an atomic
queue, but the scheduler will ensure exclusive access.

If the core processing an event from an atomic queue finishes its work and
asks the scheduler for more work, the atomic context is implicitly released
by the application. The scheduler may then give that core an event from a
higher-priority queue and give an event from the original atomic queue to
another core.

Another scenario: the core processing an event from an atomic queue finishes
the critical-section work but still needs to continue processing the event.
In that case it may release the atomic context explicitly. At this point, the
scheduler may dispatch the next event from the atomic queue to another core,
so events from the same atomic queue may be processed in parallel. Maybe
switching the queue to be ordered instead of atomic could be considered here.
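
To make sure we read the explicit release path the same way, here is a minimal
sketch (untested, assuming the odp_api.h umbrella header; the
process_critical_section() and process_rest() helpers are hypothetical
placeholders for application work that consumes the event):

#include <odp_api.h>

/* Hypothetical application handlers; real code would process and
 * eventually consume or free the event. */
static void process_critical_section(odp_event_t ev) { (void)ev; }
static void process_rest(odp_event_t ev) { (void)ev; }

/* Sketch of a worker loop over an atomic queue. */
static void worker_loop(void)
{
        odp_event_t ev;

        while (1) {
                /* Asking for the next event implicitly releases the
                 * previous schedule (atomic) context. */
                ev = odp_schedule(NULL, ODP_SCHED_WAIT);

                /* Work that must be serialized per atomic queue. */
                process_critical_section(ev);

                /* Serialized part done: release the atomic context
                 * explicitly so the scheduler may dispatch the next
                 * event from the same queue to another core. */
                odp_schedule_release_atomic();

                /* Remaining processing that may run in parallel with
                 * events from the same queue on other cores. */
                process_rest(ev);
        }
}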

Do you have something in mind as to why the odp_schedule_release_xxx() APIs
are insufficient for the 'last' schedule call?
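
On the ordered alternative mentioned above: as far as I understand, the
atomic vs. ordered choice is made at queue creation time, roughly as below
(a sketch only, untested; the parameter values and queue name are
illustrative):

#include <odp_api.h>

/* Sketch: create a scheduled queue with ordered synchronization
 * instead of atomic. */
static odp_queue_t create_ordered_queue(void)
{
        odp_queue_param_t param;

        odp_queue_param_init(&param);
        param.type        = ODP_QUEUE_TYPE_SCHED;
        param.sched.sync  = ODP_SCHED_SYNC_ORDERED; /* vs. ODP_SCHED_SYNC_ATOMIC */
        param.sched.prio  = ODP_SCHED_PRIO_DEFAULT;
        param.sched.group = ODP_SCHED_GROUP_ALL;

        return odp_queue_create("test_queue", &param);
}

The application-side implications (e.g. the critical section no longer being
serialized by the scheduler) would of course still need to be handled.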

> Also, generally pre-scheduled work cannot be "stolen" since:
> 1) it would be a costly operation to unwind already-made decisions
> 2) packet order must also be maintained in this case. It's costly to reorder /
> force order for stolen events (other events may already have been processed
> on other cores before you "steal" some events).

A scheduler implementation may pre-schedule work to cores, but you're right
that it could end up being costly if data is moved around like that. Ensuring
correctness could become challenging too.

> > You're making an argument I made some time back. :)  As I recall, the
> > rationale for pause/resume was to make life easier for existing code that
> > is introducing ODP on a more gradual basis. Presumably Nokia has examples
> > of such code in house.
> 
> No. See the rationale above. It's based on the functionality of existing SoC HW
> schedulers. HW is bad at unwinding already-made decisions. The application is
> in the best position to decide what to do with the last events before a thread
> exits. Typically, those are processed like any other event.
> 
> > 
> > From a design standpoint, worker threads shouldn't "change their minds" and
> > go off to do something else for a while. For whatever else they might want
> > to do, it would seem that such requirements would be better served by simply
> > having another thread that wakes up periodically to do those other things.
> > 
> 
> Pause/resume should not be something that a thread does very often. But
> without it, a worker thread could never exit the schedule loop; doing so
> could deadlock a queue (or a number of queues).
> 
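
For concreteness, the exit sequence I understand the current API to imply
looks roughly like this (a sketch only, not tested against any particular
scheduler; handle_event() is a hypothetical placeholder that just frees the
drained event):

#include <odp_api.h>

/* Hypothetical placeholder for handling a drained event. */
static void handle_event(odp_event_t ev)
{
        odp_event_free(ev);
}

/* Sketch of leaving the schedule loop gracefully. */
static void leave_schedule_loop(void)
{
        odp_event_t ev;

        /* Signal the scheduler that this thread will stop calling
         * odp_schedule() for a long time (or forever). */
        odp_schedule_pause();

        /* Drain events that may already have been pre-scheduled to
         * this thread. The schedule calls also end the current
         * schedule context, so e.g. an atomic queue is not left
         * locked. */
        while ((ev = odp_schedule(NULL, ODP_SCHED_NO_WAIT)) !=
               ODP_EVENT_INVALID)
                handle_event(ev);

        /* A thread that later re-enters the schedule loop should call
         * odp_schedule_resume() first. */
}
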
> > 
> > >
> > > This pause state may also cause some confusion for application writers,
> > > because it is now possible to write two different event loops for the
> > > same core depending on how a particular scheduler implementation
> > > behaves. The semantics seem to blur a bit with scheduling groups. The
> > > level of abstraction can be raised by deprecating the scheduler pause
> > > state and APIs.
> > >
> 
> Those cannot be just deprecated. The same signal is needed in some form to 
> avoid deadlocks.
> 
> > 
> > This is a worthwhile discussion to have. I'll add it to the agenda for
> > tomorrow's ODP call and we can include it in the wider scheduler
> > discussions scheduled for next week. The other rationale for not wanting
> > this behavior (another argument I advanced earlier) is that it greatly
> > complicates recovery processing. A robustly designed application should be
> > able to recover from the failure of an individual thread (this is
> > especially true if the ODP thread is in fact a separate process). If the
> > implementation has prescheduled events to a failed thread, then how are
> > they recovered gracefully? Conversely, if the implementation can recover
> > from such a scenario, then it would seem it could equally "unschedule"
> > prestaged events as needed due to thread termination (normal or abnormal)
> > or for load-balancing purposes.
> 
> Unwinding is hard in HW schedulers and something that is not generally 
> supported.

A description of the unwinding process may help here.

> > 
> > We may not be able to fully deprecate these APIs, but perhaps we can make
> > it clearer how they are intended to be used and classify them as
> > "discouraged" for new code.
> 
> I explained above why those are needed. I think there's no real reason to
> change the current API. It's optimized for normal operation (== threads don't
> exit), but offers a way to exit the loop gracefully (== without ruining the
> performance or ease of use of normal operation).
> 
> -Petri
> 
> 
> 
