<snip> > > > > > > Subject: [PATCH v2 1/6] service: fix race condition for MT unsafe > > > > service > > > > > > > > From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > > > > > > > > The MT unsafe service might get configured to run on another core > > > > while the service is running currently. This might result in the > > > > MT unsafe service running on multiple cores simultaneously. Use > > > > 'execute_lock' always when the service is MT unsafe. > > > > > > > > Fixes: e9139a32f6e8 ("service: add function to run on app lcore") > > > > Cc: sta...@dpdk.org > > > > > > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > > > > Reviewed-by: Phil Yang <phil.y...@arm.com> > > > > --- > > > > > > Thanks for spinning a new revision - based on ML discussion > > > previously, it seems like the "use service-run-count" to avoid this > > > race would be a complex solution. > > > > > > Suggesting the following; > > > 1) Take the approach as per this patch, to always take the atomic, > > > fixing the race condition. > > Ok > > I've micro-benchmarked this code change inside the service cores autotest, > and it introduces around 35 cycles of overhead per service call. This is not > ideal, but given it's a bugfix, and by far the simplest method to fix this > race- > condition. Having discussed and investigated multiple other solutions, I > believe this is the right solution. > Thanks Honnappa and Phil for identifying and driving a solution. You are welcome. Thank you for your timely responses.
> > I suggest to post the benchmarking unit-test addition patch, and integrate > that > *before* the series under review here gets merged? This makes > benchmarking of the "before bugfix" performance in future easier should it be > required. I do not see any issues, would be happy to review. I think we still have time to catch up with RC2 (May 8th). You had also mentioned about calling out that, the control plane APIs are not MT safe. Should I add that to this patch? > > > > > 2) Add an API to service-cores, which allows "committing" of mappings. > > > Committing the mapping would imply that the mappings will not be > > > changed in future. With runtime-remapping being removed from the > > > equation, the existing branch-over-atomic optimization is valid again. > > Ok. Just to make sure I understand this: > > a) on the data plane, if commit API is called (probably a new state > > variable) and num_mapped_cores is set to 1, there is no need to take the > lock. > > b) possible implementation of the commit API would check if > > num_mapped_cores for the service is set to 1 and set a variable to > > indicate that the lock is not required. > > > > What do you think about asking the application to set the service > > capability to MT_SAFE if it knows that the service will run on a > > single core? This would require us to change the documentation and does > not require additional code. > > That's a nice idea - I like that if applications want to micro-optimize around > the atomic, that they have a workaround/solution to do so, particularly that > it > doesn't require code-changes and backporting. > > Will send review and send feedback on the patches themselves. > Regards, -Harry > > > > So this would offer applications two situations > > > A) No application change: possible performance regression due to > > > atomic always taken. > > > B) Call "commit" API, and regain the performance as per previous > > > DPDK versions. > > > > > > Thoughts/opinions on the above? I've flagged the rest of the > > > patchset for review ASAP. Regards, -Harry > > > > > > > lib/librte_eal/common/rte_service.c | 11 +++++------ > > > > 1 file changed, 5 insertions(+), 6 deletions(-) > <snip patch changes>