On 12/13/16 10:50, Savolainen, Petri (Nokia - FI/Espoo) wrote: > >>> +static void add_group(sched_group_t *sched_group, int thr, int group) >>> +{ >>> + int num; >>> + uint32_t gen_cnt; >>> + thr_group_t *thr_group = &sched_group->s.thr[thr]; >>> + >>> + num = thr_group->num_group; >>> + thr_group->group[num] = group; >>> + thr_group->num_group = num + 1; >>> + gen_cnt = odp_atomic_load_u32(&thr_group->gen_cnt); >>> + odp_atomic_store_u32(&thr_group->gen_cnt, gen_cnt + 1); >> >> >> odp_atomic_inc_u32() > > Atomic, in-memory increment would be waste here since this operation is done > while holding a lock. The variable is atomic since it's checked with load_acq > in sched_cmd() (when not holding a lock). > > >> >> >>> +} >>> + >>> +static void remove_group(sched_group_t *sched_group, int thr, int >> group) >>> +{ >>> + int i, num; >>> + int found = 0; >>> + thr_group_t *thr_group = &sched_group->s.thr[thr]; >>> + >>> + num = thr_group->num_group; >>> + >>> + for (i = 0; i < num; i++) { >>> + if (thr_group->group[i] == group) { >>> + found = 1; >>> + >>> + for (; i < num - 1; i++) >>> + thr_group->group[i] = thr_group->group[i + 1]; >>> + >>> + break; >>> + } >>> + } >>> + >>> + if (found) { >>> + uint32_t gen_cnt; >>> + >>> + thr_group->num_group = num - 1; >>> + gen_cnt = odp_atomic_load_u32(&thr_group->gen_cnt); >>> + odp_atomic_store_u32(&thr_group->gen_cnt, gen_cnt + 1); >> >> odp_atomic_inc_u32() >> >> and you can remove if found by placing inc and break above. That should >> fit in 80 characters. > > > Same thing here. Load + store is more performant than an unnecessary atomic, > in-memory inc. > > -Petri >
I think it depends. On ABI compat mode that will be function and you need 2 stores in stack. And what is faster depends on implementation of internal __atomic_fetch_add call, which may vary on gcc version. I never measured what is faster, but it's easy to check... Maxim. >