On 25/04/2017, 12:26, "Savolainen, Petri (Nokia - FI/Espoo)" <petri.savolai...@nokia-bell-labs.com> wrote:
-----Original Message-----
From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of Brian Brooks
Sent: Monday, April 24, 2017 11:59 PM
To: lng-odp@lists.linaro.org
Cc: Ola Liljedahl <ola.liljed...@arm.com>
Subject: [lng-odp] [PATCH] test: odp_sched_latency: robust draining of queues

From: Ola Liljedahl <ola.liljed...@arm.com>

In order to robustly drain all queues when the benchmark has ended, we enqueue a special event on every queue and invoke the scheduler until all such events have been received.

    odp_schedule_pause();

    while (1) {
        ev = odp_schedule(&src_queue, ODP_SCHED_NO_WAIT);

        if (ev == ODP_EVENT_INVALID)
            break;

        if (odp_queue_enq(src_queue, ev)) {
            LOG_ERR("[%i] Queue enqueue failed.\n", thr);
            odp_event_free(ev);
            return -1;
        }
    }

    odp_schedule_resume();

    odp_barrier_wait(&globals->barrier);

    clear_sched_queues();

What is the issue that this patch fixes?

The issue is that odp_schedule() (even with a timeout) returns ODP_EVENT_INVALID although the queues are not actually empty. In a loosely synchronised (e.g. using weak ordering) queue and scheduler implementation, odp_schedule() can spuriously return ODP_EVENT_INVALID. This happens infrequently on some A57 targets.

This sequence should be quite robust already since no new enqueues happen after the barrier. In simple test code like this, the latency from the last enq() (through the barrier) to the schedule loop (in clear_sched_queues()) could be overcome just by not exiting after the first EVENT_INVALID from the scheduler, but after N EVENT_INVALIDs in a row.

In the scalable scheduler & queue implementation, it can take some time before enqueued events become visible and the corresponding ODP queues are pushed to some scheduler queue. So odp_schedule() can return ODP_EVENT_INVALID, even when called with a timeout. There is no timeout and no number of ODP_EVENT_INVALID returns that *guarantees* that the queues have been drained.

Also in your patch, the thread should exit only after the scheduler returns EVENT_INVALID.

Since the cool_down event is the last event on all queues (as they are enqueued after all threads have passed the barrier), when we have received all cool_down events we know that there are no other events on these queues. No need to call odp_schedule() until it returns ODP_EVENT_INVALID (which can happen spuriously anyway so doesn’t signify anything).

-Petri
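
To make the argument above concrete, here is a minimal stand-alone sketch in plain C. It does not use the ODP API and is not the code from the patch. Every name in it (mock_queue_t, mock_enq(), mock_schedule(), naive_drain(), drain_with_sentinels(), EV_COOL_DOWN) is hypothetical. The mock scheduler is assumed to return "invalid" on two out of three calls even when events are pending, as a crude stand-in for delayed visibility in a weakly ordered implementation.

```c
#include <assert.h>
#include <string.h>

#define NUM_QUEUES   4
#define QUEUE_CAP    16
#define EV_INVALID   (-1) /* stands in for ODP_EVENT_INVALID */
#define EV_COOL_DOWN 0    /* hypothetical sentinel; real events are non-zero */

/* Hypothetical FIFO queue, not an ODP type. */
typedef struct {
    int ev[QUEUE_CAP];
    int head; /* next slot to dequeue */
    int tail; /* next slot to enqueue */
} mock_queue_t;

static mock_queue_t queues[NUM_QUEUES];
static unsigned sched_calls;

static void mock_reset(void)
{
    memset(queues, 0, sizeof(queues));
    sched_calls = 0;
}

/* FIFO enqueue. Returns 0 on success, -1 if the queue is full. */
static int mock_enq(int q, int ev)
{
    if (queues[q].tail == QUEUE_CAP)
        return -1;
    queues[q].ev[queues[q].tail++] = ev;
    return 0;
}

/* Number of events still sitting in the queues. */
static int mock_pending(void)
{
    int q, n = 0;

    for (q = 0; q < NUM_QUEUES; q++)
        n += queues[q].tail - queues[q].head;
    return n;
}

/* Mock scheduler: two out of three calls return EV_INVALID even when
 * events are pending (assumed model of spurious "no event" returns). */
static int mock_schedule(void)
{
    int q;

    if (++sched_calls % 3 != 0)
        return EV_INVALID;
    for (q = 0; q < NUM_QUEUES; q++)
        if (queues[q].head < queues[q].tail)
            return queues[q].ev[queues[q].head++];
    return EV_INVALID;
}

/* Naive drain: stop at the first EV_INVALID. Returns events drained. */
static int naive_drain(void)
{
    int n = 0;

    while (mock_schedule() != EV_INVALID)
        n++;
    return n;
}

/* Sentinel drain: enqueue one sentinel per queue, then schedule until
 * every sentinel has been received. Because each queue is FIFO and the
 * sentinel is enqueued last, seeing all sentinels implies all queues
 * are empty. Returns 0 on success, -1 if a sentinel cannot be enqueued. */
static int drain_with_sentinels(int *freed)
{
    int q, seen = 0;

    *freed = 0;
    for (q = 0; q < NUM_QUEUES; q++)
        if (mock_enq(q, EV_COOL_DOWN))
            return -1;

    while (seen < NUM_QUEUES) {
        int ev = mock_schedule();

        if (ev == EV_INVALID)
            continue; /* possibly spurious, proves nothing */
        if (ev == EV_COOL_DOWN)
            seen++;
        else
            (*freed)++; /* a real test would free the event here */
    }
    assert(mock_pending() == 0);
    return 0;
}
```

In this model, naive_drain() gives up on the very first scheduler call and leaves events behind, while drain_with_sentinels() only terminates once one sentinel per queue has been seen. The sketch relies on two assumptions taken from the discussion: nothing is enqueued after the sentinels, and each queue delivers events in FIFO order.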