[This is on top of 58d20fcbd005 "Merge branch 'x86/grub2'" from the -tip tree, to have the macros.S mechanism available].
One can replace various uses of variables that are initialized at init and then never changed, so that the code never actually loads the variable from memory. Instead, the value of the variable gets encoded as an immediate operand. For example, many code paths do something like

  p = kmem_cache_alloc(foo_cachep, GFP_KERNEL)

where foo_cachep is some global variable that is set in an init function. The theory is that one can avoid the cost of a D$ miss by having the cpu load the value of foo_cachep directly into %rdi from the instruction stream; there's no way around the I$ cost of running a piece of code anyway.

For system hash tables there are typically two __ro_after_init variables in play, the base and either a shift (e.g. dcache) or a mask (e.g. futex). In both cases, one can implement the entire computation of the relevant hash bucket using just the hash value as input.

For now, this just aims at giving a POC implementation of the above access patterns for x86-64, but one can rather easily identify other patterns one might want to support. For example, pgdir_shift could give rise to implementing rai_shl() and rai_shr(), and rai_and() is also an obvious candidate. Going a bit further, and no longer restricting to __ro_after_init variables, one can imagine implementing rai_gt(), rai_leq() etc. via asm goto, to allow comparisons to sysctl limits. But while that might be able to reuse some of this infrastructure, one would need some way to trigger (another) .text update from the sysctl handler.

I'm not enforcing that referenced variables are actually __ro_after_init, partly because many of the obvious subjects are merely __read_mostly, partly to be able to change some test variables deliberately and check that rai_load() still returns the initial value.

The prefix rai_ is probably awful, but seemed to be an available three-letter acronym. Suggestions for better naming are most welcome.
Implementation-wise, each access to a rai variable that should be patched shortly after init needs to be annotated using one of the rai_* macros. Doing anything more automatic would likely require a gcc plugin, and I'm not sure all read accesses to rai variables from non-init code should necessarily be patched. I'd really like kmalloc(128, GFP_KERNEL) to do a rai_load() of the appropriate kmalloc cache, but it's likely one runs into some __builtin_constant_p trouble.

At each such access, we create four pieces of data:

- a template with the right instructions for patching in, but with dummy immediates;

- a thunk which may be slow and stupid, but which computes the correct result until patching is done (and which is also used as an int3 handler), and which is careful not to clobber any registers;

- a short piece of .text that simply jumps to the thunk, plus nops to make room for the full template;

- and finally a struct describing the type of access, the variables involved and where to find the template, thunk and instruction to be patched.

I'm sure some of this metadata can eventually be discarded with __initdata, but for now I'm just keeping it simple. It's not a big deal when there's only a handful of core users, but if the kmalloc() thing gets implemented, we're going to have lots more rai_entries.

I have no idea how to benchmark this, or whether it is worth it at all. Any micro-benchmark would probably just keep the variable in L1 cache, but if one accesses the variable sufficiently rarely that it's no longer in L1, that extra cache miss is hardly noticeable.

Comments? Flames?
Signed-off-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>
---
 include/linux/rai.h | 83 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 include/linux/rai.h

diff --git a/include/linux/rai.h b/include/linux/rai.h
new file mode 100644
index 000000000000..e839454000ee
--- /dev/null
+++ b/include/linux/rai.h
@@ -0,0 +1,83 @@
+#ifndef _LINUX_RAI_H
+#define _LINUX_RAI_H
+
+/*
+ * These document the behaviour any arch implementation of _rai_*
+ * should have, and can be used by those in cases the arch does not
+ * want to handle (e.g. _rai_load of a 2-byte quantity).
+ */
+#define _rai_load_fallback(var) (var)
+#define _rai_bucket_shift_fallback(base, shift, hash) (&(base)[(hash) >> (shift)])
+#define _rai_bucket_mask_fallback(base, mask, hash) (&(base)[(hash) & (mask)])
+
+#ifdef CONFIG_ARCH_HAS_RAI
+#include <asm/rai.h>
+void update_rai_access(void);
+#else
+static inline void update_rai_access(void) {}
+#endif
+
+#ifdef MODULE /* don't bother with modules for now */
+#undef _rai_load
+#undef _rai_bucket_shift
+#undef _rai_bucket_mask
+#endif
+
+/* Make sure all _rai_* are defined. */
+#ifndef _rai_load
+#define _rai_load _rai_load_fallback
+#endif
+#ifndef _rai_bucket_shift
+#define _rai_bucket_shift _rai_bucket_shift_fallback
+#endif
+#ifndef _rai_bucket_mask
+#define _rai_bucket_mask _rai_bucket_mask_fallback
+#endif
+
+
+/*
+ * The non-underscored rai_* are property of this header, so that it
+ * can do tricks like defining debugging versions. Usually, it just
+ * defines rai_foo as _rai_foo, with the latter being guaranteed to be
+ * defined by the above logic.
+ */
+#if defined(CONFIG_RAI_DEBUG)
+
+#include <bug.h>
+
+#define rai_warn(what, expect, got) \
+	WARN_ONCE(expect != got, \
+		  "%s:%d: %s() returned %*phN, expected %*phN\n", \
+		  __FILE__, __LINE__, what, \
+		  (int)sizeof(got), &(got), \
+		  (int)sizeof(expect), &(expect))
+
+#define rai_load(var) ({					\
+		typeof(var) v1 = _rai_load_fallback(var);	\
+		typeof(var) v2 = _rai_load(var);		\
+		rai_warn("rai_load", v1, v2);			\
+		(v1); /* chicken */				\
+	})
+
+#define rai_bucket_shift(base, shift, hash) ({				\
+		typeof(hash) h = (hash);				\
+		typeof(base) b1 = _rai_bucket_shift_fallback(base, shift, h); \
+		typeof(base) b2 = _rai_bucket_shift(base, shift, h);	\
+		rai_warn("rai_bucket_shift", b1, b2);			\
+		(b1);							\
+	})
+
+#define rai_bucket_mask(base, mask, hash) ({				\
+		typeof(hash) h = (hash);				\
+		typeof(base) b1 = _rai_bucket_mask_fallback(base, mask, h); \
+		typeof(base) b2 = _rai_bucket_mask(base, mask, h);	\
+		rai_warn("rai_bucket_mask", b1, b2);			\
+		(b1);							\
+	})
+#else
+#define rai_load _rai_load
+#define rai_bucket_shift _rai_bucket_shift
+#define rai_bucket_mask _rai_bucket_mask
+#endif
+
+#endif /* _LINUX_RAI_H */
-- 
2.19.1.6.gbde171bbf5