On 04/06/17 13:35, Ola Liljedahl wrote:
> On 5 April 2017 at 23:39, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>> On 04/05/17 17:30, Ola Liljedahl wrote:
>>> On 5 April 2017 at 14:50, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>>>> On 04/05/17 06:57, Honnappa Nagarahalli wrote:
>>>>> This can go into master/api-next as an independent patch. Agree?
>>>>
>>>> agree. If we accept an implementation where events can be 'delayed'
>>> Probably all platforms with HW queues.
>>>
>>>> then it
>>>> looks like we are missing an API to sync the queues.
>>> When would those APIs be used?
>>>
>>
>> Maybe in a case like this one. Maybe it's not needed in a real-world
>> application.
> This was a test program. I don't see the same situation occurring in a
> real world application. I could be wrong.
> 
>>
>> My point is that if the situation where an event is postponed is
>> accepted, then we need to document that in the API doxygen comments.
> I think the asynchronous behaviour is the default. ODP is a hardware
> abstraction. HW is often asynchronous, writes are posted etc. Ensuring
> synchronous behaviour costs performance.
> 
> Single-threaded software is "synchronous": writes are immediately
> visible to the thread. But as soon as you go multi-threaded and don't
> use locks to access shared resources, software also becomes
> "asynchronous" (I don't know if that is the right word here). Only if
> you use locks to synchronise accesses to shared memory do you return
> to some form of sequential consistency (all threads see updates in the
> same order). You don't want to use locks; they quickly create
> scalability bottlenecks.
> 
> Since the scalable scheduler does its best to avoid locks
> (non-scalable) and sequential consistency (slow), instead utilising
> lock-less and lock-free algorithms and weak memory ordering (e.g.
> acquire/release), it exposes the underlying hardware characteristics.
> 
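> To illustrate with a minimal, hypothetical sketch (C11 atomics, not
> actual scheduler code): a release store in the producer and an acquire
> load in the consumer guarantee ordering, not immediacy:
> 
>         /* producer: write the slot, then publish it */
>         ring[idx] = ev;
>         atomic_store_explicit(&tail, idx + 1, memory_order_release);
> 
>         /* consumer: may observe the old tail for a short while */
>         while (atomic_load_explicit(&tail, memory_order_acquire) <= idx)
>                 odp_cpu_pause();
> 
> Once the consumer sees the new tail, it is also guaranteed to see the
> slot contents, but nothing bounds how soon the new tail becomes
> visible.
> 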

Ola, you understand better than I do how weak memory ordering works. In
this case my understanding is that hardware can 'delay' events in the
queue, making them not visible immediately after enqueueing, and that
this is not possible to solve in the implementation. If we were talking
purely about software, I would understand the case where one thread
enqueues and another dequeues, or the case where you enqueued X and
dequeued Y. But here each thread enqueues 1 event and then dequeues 1
event itself. That looks as if you store a variable in one thread and
then need several loads to read back the value that was stored. Is that
the correct behaviour of weak ordering?
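
A sketch of the case I mean (hypothetical simplification, one and the
same thread, using the ODP calls from the test):

        if (odp_queue_enq(queue, ev) == 0) {
                /* Can this first dequeue really return
                 * ODP_EVENT_INVALID even though this very thread just
                 * enqueued the event? */
                ev = odp_queue_deq(queue);
        }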

Maxim.


>>
>> Maxim.
>>
>>>>
>>>> But I do not see why we need this patch. On the same CPU the test
>>>> enqueues 1 event and after that dequeues 1 event:
>>>>
>>>>         for (i = 0; i < QUEUE_ROUNDS; i++) {
>>>>                 ev = odp_buffer_to_event(buf);
>>>>
>>>>                 if (odp_queue_enq(queue, ev)) {
>>>>                         LOG_ERR("  [%i] Queue enqueue failed.\n", thr);
>>>>                         odp_buffer_free(buf);
>>>>                         return -1;
>>>>                 }
>>>>
>>>>                 ev = odp_queue_deq(queue);
>>>>
>>>>                 buf = odp_buffer_from_event(ev);
>>>>
>>>>                 if (!odp_buffer_is_valid(buf)) {
>>>>                         LOG_ERR("  [%i] Queue empty.\n", thr);
>>>>                         return -1;
>>>>                 }
>>>>         }
>>>>
>>>> Where this exactly event can be delayed?
>>> In the memory system.
>>>
>>>>
>>>> If other threads do the same, then all of them enqueue 1 event first
>>>> and then dequeue 1 event. I can understand a problem with enqueueing
>>>> on one CPU and dequeuing on another CPU. But on the same CPU it has
>>>> to always work, doesn't it?
>>> No.
>>>
>>>>
>>>> Maxim.
>>>>
>>>>>
>>>>> On 4 April 2017 at 21:22, Brian Brooks <brian.bro...@arm.com> wrote:
>>>>>> On 04/04 17:26:12, Bill Fischofer wrote:
>>>>>>>> On Tue, Apr 4, 2017 at 3:37 PM, Brian Brooks <brian.bro...@arm.com> wrote:
>>>>>>>> On 04/04 21:59:15, Maxim Uvarov wrote:
>>>>>>>>> On 04/04/17 21:47, Brian Brooks wrote:
>>>>>>>>>> Signed-off-by: Ola Liljedahl <ola.liljed...@arm.com>
>>>>>>>>>> Reviewed-by: Brian Brooks <brian.bro...@arm.com>
>>>>>>>>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
>>>>>>>>>> Reviewed-by: Kevin Wang <kevin.w...@arm.com>
>>>>>>>>>> ---
>>>>>>>>>>  test/common_plat/performance/odp_scheduling.c | 12 ++++++++++--
>>>>>>>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/test/common_plat/performance/odp_scheduling.c b/test/common_plat/performance/odp_scheduling.c
>>>>>>>>>> index c74a0713..38e76257 100644
>>>>>>>>>> --- a/test/common_plat/performance/odp_scheduling.c
>>>>>>>>>> +++ b/test/common_plat/performance/odp_scheduling.c
>>>>>>>>>> @@ -273,7 +273,7 @@ static int test_plain_queue(int thr, test_globals_t *globals)
>>>>>>>>>>     test_message_t *t_msg;
>>>>>>>>>>     odp_queue_t queue;
>>>>>>>>>>     uint64_t c1, c2, cycles;
>>>>>>>>>> -   int i;
>>>>>>>>>> +   int i, j;
>>>>>>>>>>
>>>>>>>>>>     /* Alloc test message */
>>>>>>>>>>     buf = odp_buffer_alloc(globals->pool);
>>>>>>>>>> @@ -307,7 +307,15 @@ static int test_plain_queue(int thr, test_globals_t *globals)
>>>>>>>>>>                     return -1;
>>>>>>>>>>             }
>>>>>>>>>>
>>>>>>>>>> -           ev = odp_queue_deq(queue);
>>>>>>>>>> +           /* When enqueue and dequeue are decoupled (e.g. not using a
>>>>>>>>>> +            * common lock), an enqueued event may not be immediately
>>>>>>>>>> +            * visible to dequeue. So we just try again for a while. */
>>>>>>>>>> +           for (j = 0; j < 100; j++) {
>>>>>>>>>
>>>>>>>>> where 100 number comes from?
>>>>>>>>
>>>>>>>> It is the retry count. Perhaps it could be a bit lower, or a bit
>>>>>>>> higher, but it works well.
>>>>>>>
>>>>>>> Actually, it's incorrect. What happens if all 100 retries fail? You'll
>>>>>>> call odp_buffer_from_event() for ODP_EVENT_INVALID, which is
>>>>>>> undefined.
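>>>>>>>
>>>>>>> A sketch of the kind of guard I mean (hypothetical, reusing the
>>>>>>> test's own error-handling style):
>>>>>>>
>>>>>>>         if (ev == ODP_EVENT_INVALID) {
>>>>>>>                 LOG_ERR("  [%i] Queue empty after retries.\n", thr);
>>>>>>>                 return -1;
>>>>>>>         }
>>>>>>>
>>>>>>>         buf = odp_buffer_from_event(ev);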
>>>>>>
>>>>>> Incorrect? :) The point is that an event may not be immediately
>>>>>> available to dequeue after it has been enqueued. This is due to the
>>>>>> way a concurrent ring buffer behaves in a multi-threaded
>>>>>> environment. The approach here is just to retry the dequeue a number
>>>>>> of times (100 times, actually) before moving on to the rest of the
>>>>>> code. Perhaps 100 times is too many, but some amount of retry is
>>>>>> needed.
>>>>>>
>>>>>> If this is not desirable, then I think it would be more accurate to
>>>>>> consider odp_queue_enq() / odp_queue_deq() as async APIs -or-
>>>>>> MT-unsafe (must be called from one thread at a time in order to
>>>>>> ensure the behavior that an event is immediately available for
>>>>>> dequeue once it has been enqueued).
>>>>>>
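>>>>>> As a sketch (a hypothetical helper, using the same ODP calls as the
>>>>>> patch), the bounded retry could be factored out like this:
>>>>>>
>>>>>>         static odp_event_t deq_retry(odp_queue_t queue, int retries)
>>>>>>         {
>>>>>>                 odp_event_t ev;
>>>>>>
>>>>>>                 do {
>>>>>>                         ev = odp_queue_deq(queue);
>>>>>>                         if (ev != ODP_EVENT_INVALID)
>>>>>>                                 return ev;
>>>>>>                         odp_cpu_pause();
>>>>>>                 } while (--retries > 0);
>>>>>>
>>>>>>                 /* caller must handle an apparently empty queue */
>>>>>>                 return ODP_EVENT_INVALID;
>>>>>>         }
>>>>>>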
>>>>>>>>
>>>>>>>>> Maxim.
>>>>>>>>>
>>>>>>>>>> +                   ev = odp_queue_deq(queue);
>>>>>>>>>> +                   if (ev != ODP_EVENT_INVALID)
>>>>>>>>>> +                           break;
>>>>>>>>>> +                   odp_cpu_pause();
>>>>>>>>>> +           }
>>>>>>>>>>
>>>>>>>>>>             buf = odp_buffer_from_event(ev);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>
>>
