On Wed, Nov 01, 2017 at 05:59:51PM +0000, Van Haaren, Harry wrote: > > From: Richardson, Bruce > > Sent: Wednesday, November 1, 2017 5:09 PM > > To: Van Haaren, Harry <harry.van.haa...@intel.com> > > Cc: dev@dpdk.org; pbhagavat...@caviumnetworks.com; tho...@monjalon.net > > Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore > > function > > > > On Tue, Oct 31, 2017 at 11:49:02AM +0000, Harry van Haaren wrote: > > > This commit fixes a possible race condition if an application > > > uses the service-cores infrastructure and the function to run > > > a service on an application lcore at the same time. > > > > > > The fix is to change the num_mapped_cores variable to be an > > > atomic variable. This causes concurrent accesses by multiple > > > threads to a service using rte_service_run_iter_on_app_lcore() > > > to detect if another core is currently mapped to the service, > > > and refuses to run if it is not multi-thread safe. > > > > > > No performance impact is expected as the mappings for the > > > service-cores changes at control-path frequency, hence the > > > change from an ordinary write to an atomic write will not > > > have any significant impact. > > > > > > Two unit tests were added to verify the behaviour of the > > > function to run a service on an application core, testing both > > > a multi-thread safe service, and a multi-thread unsafe service. > > > > > > The doxygen API documentation for the function has been updated > > > to reflect the current and correct behaviour. > > > > > > Fixes: e9139a32f6e8 ("service: add function to run on app lcore") > > > > > > Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com> > > > > > > --- > > > > > <snip> > > @@ -381,8 +381,28 @@ service_run(uint32_t i, struct core_state *cs, uint64_t > > service_mask) > > > int32_t rte_service_run_iter_on_app_lcore(uint32_t id) > > > { > > > /* run service on calling core, using all-ones as the service mask */ > > > + if (!service_valid(id)) > > > + return -EINVAL; > > > + > > > struct core_state *cs = &lcore_states[rte_lcore_id()]; > > > - return service_run(id, cs, UINT64_MAX); > > > + struct rte_service_spec_impl *s = &rte_services[id]; > > > + > > > + /* Atomically add this core to the mapped cores first, then examine if > > > + * we can run the service. This avoids a race condition between > > > + * checking the value, and atomically adding to the mapped count. > > > + */ > > > + rte_atomic32_inc(&s->num_mapped_cores); > > > + > > > + if (service_mt_safe(s) == 0 && > > > + rte_atomic32_read(&s->num_mapped_cores) > 1) { > > > + rte_atomic32_dec(&s->num_mapped_cores); > > > + return -EBUSY; > > > + } > > > + > > > + int ret = service_run(id, cs, UINT64_MAX); > > > + rte_atomic32_dec(&s->num_mapped_cores); > > > + > > > + return ret; > > > } > > > > Do we really need to do an atomic inc and dec in this function? If we > > check that there are no service cores mapped, would that not be enough > > for safety? If an app core is calling a service, the control plane is > > unlikely to decide to start spawning off a service core for that > > service simultaneously. Manipulating the count is safer, yes, but unlike > > the other changes in this patch, this one will affect performance, so I > > think we can go without. Similarly, for multiple data plane threads > > calling the same service simultaneously: everything else dataplane is > > done without locks so I think this should be too. > > > I think the inc-dec is required with the current code; what if *two* > application > cores are calling the same service? If we don't have the atomic inc-dec of the > num-mapped-cores, both app cores could simultaneously run a multi-thread > unsafe service. > > The other option is to demand that the *app* must serialize access to a > particular service > using this function - but in my eyes that limits the usefulness of this > function, because > it pushes the problem up a level instead of solving it. > > I guess an application "in the know" of how its cores are acting would prefer > to not > have the atomics at the service-cores library level. As a compromise, the use > of these > atomics could be a parameter to the rte_service_run_iter_on_app_lcore() > function. > > That would allow the application to choose if it wants to avail of the > functionality > in the service cores library, or if it takes the responsibility to serialize > access > to multi-thread unsafe functions itself. > > > I'll spin up a v2 with a parameter to the function. > Ok, that sounds reasonable enough. Just so long as apps that don't need the lock don't have to use it.
/Bruce