> -----Original Message-----
> From: Maxim Uvarov [mailto:maxim.uva...@linaro.org]
> Sent: Tuesday, December 13, 2016 4:06 PM
> To: Savolainen, Petri (Nokia - FI/Espoo) <petri.savolainen@nokia-bell-
> labs.com>; lng-odp@lists.linaro.org
> Subject: Re: [lng-odp] [API-NEXT PATCH] linux-gen: schedule_sp: use ring
> as priority queue
> 
> On 12/13/16 10:50, Savolainen, Petri (Nokia - FI/Espoo) wrote:
> >
> >>> +static void add_group(sched_group_t *sched_group, int thr, int group)
> >>> +{
> >>> + int num;
> >>> + uint32_t gen_cnt;
> >>> + thr_group_t *thr_group = &sched_group->s.thr[thr];
> >>> +
> >>> + num = thr_group->num_group;
> >>> + thr_group->group[num] = group;
> >>> + thr_group->num_group  = num + 1;
> >>> + gen_cnt = odp_atomic_load_u32(&thr_group->gen_cnt);
> >>> + odp_atomic_store_u32(&thr_group->gen_cnt, gen_cnt + 1);
> >>
> >>
> >> odp_atomic_inc_u32()
> >
> > Atomic, in-memory increment would be a waste here since this operation
> > is done while holding a lock. The variable is atomic since it's checked
> > with load_acq in sched_cmd() (when not holding a lock).
> >
> >
> >>
> >>
> >>> +}
> >>> +
> >>> +static void remove_group(sched_group_t *sched_group, int thr, int group)
> >>> +{
> >>> + int i, num;
> >>> + int found = 0;
> >>> + thr_group_t *thr_group = &sched_group->s.thr[thr];
> >>> +
> >>> + num = thr_group->num_group;
> >>> +
> >>> + for (i = 0; i < num; i++) {
> >>> +         if (thr_group->group[i] == group) {
> >>> +                 found = 1;
> >>> +
> >>> +                 for (; i < num - 1; i++)
> >>> +                         thr_group->group[i] = thr_group->group[i + 1];
> >>> +
> >>> +                 break;
> >>> +         }
> >>> + }
> >>> +
> >>> + if (found) {
> >>> +         uint32_t gen_cnt;
> >>> +
> >>> +         thr_group->num_group = num - 1;
> >>> +         gen_cnt = odp_atomic_load_u32(&thr_group->gen_cnt);
> >>> +         odp_atomic_store_u32(&thr_group->gen_cnt, gen_cnt + 1);
> >>
> >> odp_atomic_inc_u32()
> >>
> >> and you can remove the 'if (found)' check by placing the inc and break
> >> above. That should fit in 80 characters.
> >
> >
> > Same thing here. Load + store is more performant than an unnecessary
> > atomic, in-memory inc.
> >
> > -Petri
> >
> 
> I think it depends. In ABI compat mode that will be a function call and
> you need 2 stores on the stack.
> 
> And what is faster depends on the implementation of the internal
> __atomic_fetch_add call, which may vary with the gcc version. I have
> never measured which is faster, but it's easy to check...
> 
> Maxim.


No, it does not depend on GCC. Those are plain load and store instructions, so
no atomicity guarantees are needed. Also, atomics can and should be inlined
even in ABI compat mode (which is CPU arch specific anyway). I think those are
already inlined using GCC __atomics, but for strict ABI compat we should write
them in assembly.
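
For reference, the relaxed load/store and the in-memory increment boil down
to roughly the following (a sketch only, using the GCC __atomic builtins,
not the actual odp-linux code; the my_* names are made up):

/* A relaxed 32-bit load or store compiles to a single plain load/store
 * instruction; the only guarantee is that the access is not torn. */
static inline uint32_t my_load_u32(uint32_t *ptr)
{
	return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}

static inline void my_store_u32(uint32_t *ptr, uint32_t val)
{
	__atomic_store_n(ptr, val, __ATOMIC_RELAXED);
}

/* fetch_add is a real atomic read-modify-write (LOCK XADD, LDADD or an
 * LL/SC loop) and is considerably more expensive on most CPUs. */
static inline void my_inc_u32(uint32_t *ptr)
{
	(void)__atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED);
}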

The atomic_inc implements an *in-memory* atomic increment, which needs much
more synchronization than a load or a store (which provide no synchronization
at all). So load + store is always about 2 CPU cycles, while atomic_inc may be
e.g. 30 cycles, depending on the HW implementation. The main point is that this
code section is already synchronized with a lock, so it does not need any extra
(in-memory) atomic synchronization.
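
To make the point concrete, here's the writer-side pattern (a sketch only:
the struct and function names are made up for illustration, only the
odp_spinlock_* and odp_atomic_* calls are actual ODP API):

#include <odp_api.h>

typedef struct {
	odp_spinlock_t   lock;
	odp_atomic_u32_t gen_cnt;
	/* ... group table etc ... */
} example_state_t;

/* Writer side, only entered with the lock held. A plain load + store is
 * enough: the lock already serializes all writers, so an atomic
 * read-modify-write (odp_atomic_inc_u32) would only add cost. */
static void update_locked(example_state_t *st)
{
	uint32_t gen_cnt;

	odp_spinlock_lock(&st->lock);

	/* ... modify the group table ... */

	gen_cnt = odp_atomic_load_u32(&st->gen_cnt);
	odp_atomic_store_u32(&st->gen_cnt, gen_cnt + 1);

	odp_spinlock_unlock(&st->lock);
}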

The variable is atomic only because it's checked with load_acq in sched_cmd().
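
On the reader side the idea is roughly this (again only a sketch reusing the
example above, not the actual sched_cmd() code; last_gen_cnt would be the
value the thread cached earlier):

/* Reader side, called without the lock. The acquire load is what makes
 * gen_cnt an atomic variable at all: it prevents the subsequent reads of
 * the thread's group list from being reordered before the counter check,
 * so a changed counter reliably triggers a re-read (under the lock). */
static int groups_changed(example_state_t *st, uint32_t last_gen_cnt)
{
	return odp_atomic_load_acq_u32(&st->gen_cnt) != last_gen_cnt;
}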

-Petri

