> On Feb 15, 2019, at 9:21 AM, Edward Cree <ec...@solarflare.com> wrote: > > On 05/02/19 08:50, Nadav Amit wrote: >>> In cases where RCU cannot be used (e.g. because some callees need to RCU >>> synchronise), it might be possible to add a variant that uses >>> synchronize_rcu_tasks() when updating, but this series does not attempt >>> this. >> I wonder why. > Mainly because I have yet to convince myself that it's the Right Thing. > Note also the following (from kernel/rcu/update.c): > > /* * This is a very specialized primitive, intended only for a few uses in > * tracing and other situations requiring manipulation of function > * preambles and profiling hooks. The synchronize_rcu_tasks() function > * is not (yet) intended for heavy use from multiple CPUs. */ > >> This seems like an easy solution, and according to Josh, Steven >> Rostedt and the documentation appears to be valid. > Will it hurt performance, though, if we end up (say) having rcu-tasks- > based synchronisation for updates on every indirect call in the kernel? > (As would result from a plugin-based opt-out approach.)
That’s what batching is for.. > >> As I stated before, I think that the best solution is to use a GCC plugin, >> [...] Such a solution will not enable the calling code to be >> written in C and would require a plugin for each architecture. > I'm afraid I don't see why. If we use the static_calls infrastructure, > but then do a source-level transformation in the compiler plugin to turn > indirect calls into dynamic_calls, it should be possible to create an > opt-out system without any arch-specific code in the plugin (the arch- > specific stuff being all in the static_calls code). > Any reason that can't be done? (Note: I don't know much about GCC > internals, maybe there's something obvious that stops a plugin doing > things like that.) Hmm… I think you are right. It may be possible by hooking into PLUGIN_START_PARSE_FUNCTION or PLUGIN_FINISH_PARSE_FUNCTION event. But, I think source code manipulation is likely to be more error prone and “dirty”. I think that assembly is the right level to deal with indirect calls anyhow, specifically if the same mechanism is used for callee-saved functions. >> Feel free to try my code and give me feedback. I did not get a feedback on my >> last version. Is there a fundamental problem with my plugin? Did you try it >> and got bad results, or perhaps it did not build? > I didn't test your patches yet, because I was busy trying to get mine > working and ready to post (and also with unrelated work). But now that > that's done, next time I have cycles spare for indirect call stuff I > guess testing (and reviewing) your approach will be next on my list. > >> Why do you prefer an approach >> which requires annotation of the callers, instead of something that is much >> more transparent? > I'm concerned about the overhead (in both time and memory) of running > learning on every indirect call site (including ones that aren't in a > hot-path, and ones which have such a wide variety of callees that > promotion really doesn't help) throughout the whole kernel. Also, an > annotating programmer knows the locking/rcu context and can thus tell > whether a given dynamic_call should use synchronise_rcu_tasks(), > synchronise_rcu(), or perhaps something else (if e.g. the call always > happens under a mutex, then the updater work could take that mutex). > > The real answer, though, is that I don't so much prefer this approach, > as think that both should be tried "publicly" and evaluated by more > developers than just us three. There's a reason this series is > marked RFC ;-) Reading my email from ~2 weeks ago - I realize I don’t really understand my own question. Clearly, annotation is better (if possible). My point was mainly that it is a tedious job to annotate all the locations, and there are quite a few.