On Mon, Nov 5, 2018 at 11:25 AM Nadav Amit <na...@vmware.com> wrote: > > From: Andy Lutomirski > Sent: November 5, 2018 at 7:03:49 PM GMT > > To: Nadav Amit <na...@vmware.com> > > Cc: Peter Zijlstra <pet...@infradead.org>, Ingo Molnar <mi...@redhat.com>, > > LKML <linux-kernel@vger.kernel.org>, X86 ML <x...@kernel.org>, H. Peter > > Anvin <h...@zytor.com>, Thomas Gleixner <t...@linutronix.de>, Borislav > > Petkov <b...@alien8.de>, Dave Hansen <dave.han...@linux.intel.com>, Andy > > Lutomirski <l...@kernel.org>, Kees Cook <keesc...@chromium.org>, Dave > > Hansen <dave.han...@intel.com>, Masami Hiramatsu <mhira...@kernel.org> > > Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during > > early_init > > > > > > > > > >> On Nov 5, 2018, at 9:49 AM, Nadav Amit <na...@vmware.com> wrote: > >> > >> From: Andy Lutomirski > >> Sent: November 5, 2018 at 5:22:32 PM GMT > >>> To: Peter Zijlstra <pet...@infradead.org> > >>> Cc: Nadav Amit <na...@vmware.com>, Ingo Molnar <mi...@redhat.com>, > >>> linux-kernel@vger.kernel.org, x...@kernel.org, H. Peter Anvin > >>> <h...@zytor.com>, Thomas Gleixner <t...@linutronix.de>, Borislav Petkov > >>> <b...@alien8.de>, Dave Hansen <dave.han...@linux.intel.com>, Andy > >>> Lutomirski <l...@kernel.org>, Kees Cook <keesc...@chromium.org>, Dave > >>> Hansen <dave.han...@intel.com>, Masami Hiramatsu <mhira...@kernel.org> > >>> Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during > >>> early_init > >>> > >>> > >>> > >>>>> On Nov 5, 2018, at 6:09 AM, Peter Zijlstra <pet...@infradead.org> wrote: > >>>>> > >>>>> On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote: > >>>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c > >>>>> index aac0c1f7e354..367c1d0c20a3 100644 > >>>>> --- a/arch/x86/kernel/jump_label.c > >>>>> +++ b/arch/x86/kernel/jump_label.c > >>>>> @@ -52,7 +52,13 @@ static void __ref __jump_label_transform(struct > >>>>> jump_entry *entry, > >>>>> jmp.offset = jump_entry_target(entry) - > >>>>> (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE); > >>>>> > >>>>> - if (early_boot_irqs_disabled) > >>>>> + /* > >>>>> + * As long as we are in early boot, we can use text_poke_early(), > >>>>> which > >>>>> + * is more efficient: the memory was still not marked as read-only > >>>>> (it > >>>>> + * is only marked after poking_init()). This also prevents us from > >>>>> using > >>>>> + * text_poke() before poking_init() is called. > >>>>> + */ > >>>>> + if (!early_boot_done) > >>>>> poker = text_poke_early; > >>>>> > >>>>> if (type == JUMP_LABEL_JMP) { > >>>> > >>>> It took me a while to untangle init/maze^H^Hin.c... but I think this > >>>> is all we need: > >>>> > >>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c > >>>> index aac0c1f7e354..ed5fe274a7d8 100644 > >>>> --- a/arch/x86/kernel/jump_label.c > >>>> +++ b/arch/x86/kernel/jump_label.c > >>>> @@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct > >>>> jump_entry *entry, > >>>> jmp.offset = jump_entry_target(entry) - > >>>> (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE); > >>>> > >>>> - if (early_boot_irqs_disabled) > >>>> + /* > >>>> + * As long as we're UP and not yet marked RO, we can use > >>>> + * text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to > >>>> + * SYSTEM_SCHEDULING before going either. > >>>> + */ > >>>> + if (system_state == SYSTEM_BOOTING) > >>>> poker = text_poke_early; > >>>> > >>>> if (type == JUMP_LABEL_JMP) { > >>> > >>> Can we move this logic into text_poke() and get rid of text_poke_early()? > >> > >> This will negatively affect poking of modules doing module loading, e.g., > >> apply_paravirt(). This can be resolved by keeping track when the module is > >> write-protected and giving a module parameter to text_poke(). Does it worth > >> the complexity? > > > > Probably not. > > > > OTOH, why does alternative patching need text_poke() at all? Can’t it just > > write to the text? > > Good question. According to my understanding, these games of > text_poke_early() are not needed, at least for modules (on Intel). > > Intel SDM 11.6 "SELF-MODIFYING CODE” says: > > "A write to a memory location in a code segment that is currently cached in > the processor causes the associated cache line (or lines) to be invalidated. > This check is based on the physical address of the instruction.” > > Then the manual talks about prefetched instructions, but the modules code is > presumably not be “prefetchable” at this point. So I think it should be > safe, but I guess that you reviewed Intel/AMD manuals better when you wrote > sync_core().
Beats the heck out of me. Linus, hpa, or Dave, a question for you: suppose I map some page writably, write to it, then upgrade permissions to allow execute. Must I force all CPUs that might execute from it without first serializing to serialize? I suspect this doesn't really affect user code, but it may affect the module loader. To be safe, shouldn't the module loader broadcast an IPI to sync_core() everywhere after loading a module and before making it runnable, regardless of alternative patching? IOW, the right sequence of events probably ought to me: 1. Allocate the memory and map it. 2. Copy in the text. 3. Patch alternatives, etc. This is logically just like (2) from an architectural perspective -- we're just writing to memory that won't be executed. 4. Serialize everything. 5. Run it! > > Anyhow, there should be a function that wraps the memcpy() to keep track > when someone changes the text (for potential future use). > > Does it make sense? Do you want me to give it a spin? Sure, I guess. Linus, what do you think? > > Thanks, > Nadav -- Andy Lutomirski AMA Capital Management, LLC