Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-08 Thread Peter Zijlstra
On Wed, Nov 07, 2018 at 07:13:03PM +, Nadav Amit wrote:
> > diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> > index aac0c1f7e354..ed5fe274a7d8 100644
> > --- a/arch/x86/kernel/jump_label.c
> > +++ b/arch/x86/kernel/jump_label.c
> > @@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct 
> > jump_entry *entry,
> > jmp.offset = jump_entry_target(entry) -
> >  (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
> > 
> > -   if (early_boot_irqs_disabled)
> > +   /*
> > +* As long as we're UP and not yet marked RO, we can use
> > +* text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
> > +* SYSTEM_SCHEDULING before going either.
> > +*/
> > +   if (system_state == SYSTEM_BOOTING)
> > poker = text_poke_early;
> > 
> > if (type == JUMP_LABEL_JMP) {
> 
> Thanks for this change, I will incorporate it.
> 
> I wanted to point out a small difference from my version. Although this
> version ensures we are UP and the kernel is still RW, preemption is possible
> with it. I presume that this should not affect jump labels, since jump-label
> patching switches between JMP and multi-byte NOPs.

Right, we're never running the code we're going to change on UP.
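For illustration only, the poker selection in Peter's patch can be modelled in userspace C. This is a sketch under stated assumptions: the enum, the two stub pokers and the slow-path counter are hypothetical stand-ins, not the kernel's text_poke_early() and text_poke(). Only the SYSTEM_BOOTING test mirrors the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; the names mirror, but are not, the kernel's. */
enum system_states { SYSTEM_BOOTING, SYSTEM_SCHEDULING, SYSTEM_RUNNING };

typedef void *(*poker_fn)(void *addr, const void *opcode, size_t len);

static enum system_states system_state = SYSTEM_BOOTING;

/* Stand-in for text_poke_early(): text is still RW and only one CPU runs,
 * so a plain copy is enough. */
static void *model_poke_early(void *addr, const void *opcode, size_t len)
{
    return memcpy(addr, opcode, len);
}

/* Stand-in for text_poke(): the real one must cope with read-only text and
 * other CPUs; this model only counts how often the slow path is taken. */
static int slow_pokes;
static void *model_poke(void *addr, const void *opcode, size_t len)
{
    slow_pokes++;
    return memcpy(addr, opcode, len);
}

static poker_fn choose_poker(void)
{
    /* Per the patch comment: SYSTEM_BOOTING implies UP and not yet RO,
     * because the state moves to SYSTEM_SCHEDULING before either happens. */
    return system_state == SYSTEM_BOOTING ? model_poke_early : model_poke;
}
```

Per Peter's reply, the preemption Nadav mentions is harmless on UP because the instructions being rewritten are not executing while the poke happens.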


Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-07 Thread Nadav Amit
From: Peter Zijlstra
Sent: November 5, 2018 at 2:09:25 PM GMT
> To: Nadav Amit 
> Cc: Ingo Molnar , linux-kernel@vger.kernel.org, 
> x...@kernel.org, H. Peter Anvin , Thomas Gleixner 
> , Borislav Petkov , Dave Hansen 
> , Andy Lutomirski , Kees Cook 
> , Dave Hansen , Masami 
> Hiramatsu 
> Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during 
> early_init
> 
> 
> On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>> index aac0c1f7e354..367c1d0c20a3 100644
>> --- a/arch/x86/kernel/jump_label.c
>> +++ b/arch/x86/kernel/jump_label.c
>> @@ -52,7 +52,13 @@ static void __ref __jump_label_transform(struct 
>> jump_entry *entry,
>>  jmp.offset = jump_entry_target(entry) -
>>   (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>> 
>> -if (early_boot_irqs_disabled)
>> +/*
>> + * As long as we are in early boot, we can use text_poke_early(), which
>> + * is more efficient: the memory was still not marked as read-only (it
>> + * is only marked after poking_init()). This also prevents us from using
>> + * text_poke() before poking_init() is called.
>> + */
>> +if (!early_boot_done)
>>  poker = text_poke_early;
>> 
>>  if (type == JUMP_LABEL_JMP) {
> 
> It took me a while to untangle init/maze^H^Hin.c... but I think this
> is all we need:
> 
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index aac0c1f7e354..ed5fe274a7d8 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct jump_entry 
> *entry,
>   jmp.offset = jump_entry_target(entry) -
>(jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
> 
> - if (early_boot_irqs_disabled)
> + /*
> +  * As long as we're UP and not yet marked RO, we can use
> +  * text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
> +  * SYSTEM_SCHEDULING before going either.
> +  */
> + if (system_state == SYSTEM_BOOTING)
>   poker = text_poke_early;
> 
>   if (type == JUMP_LABEL_JMP) {

Thanks for this change, I will incorporate it.

I wanted to point out a small difference from my version. Although this
version ensures we are UP and the kernel is still RW, preemption is possible
with it. I presume that this should not affect jump labels, since jump-label
patching switches between JMP and multi-byte NOPs.
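The two states mentioned above can be made concrete. The sketch below builds the 5 bytes written at a jump-label site, using the same offset formula as the quoted patch (target minus the address just past the instruction). make_jump_label_insn() is a hypothetical helper, and the NOP bytes are one valid 5-byte x86 NOP chosen as an assumption; the kernel may use a different encoding.

```c
#include <stdint.h>
#include <string.h>

#define JUMP_LABEL_NOP_SIZE 5

/* One valid 5-byte x86 NOP (nopl 0x0(%rax,%rax,1)); an assumption, the
 * kernel may choose a different encoding for the CPU it runs on. */
static const uint8_t nop5[JUMP_LABEL_NOP_SIZE] = { 0x0F, 0x1F, 0x44, 0x00, 0x00 };

/* Hypothetical helper: fill `out` with the bytes for a jump-label site at
 * address `code`, either a NOP or "jmp rel32" to `target`. */
static void make_jump_label_insn(uint8_t out[JUMP_LABEL_NOP_SIZE],
                                 uint64_t code, uint64_t target, int jmp)
{
    if (!jmp) {
        memcpy(out, nop5, sizeof(nop5));
        return;
    }
    /* rel32 counts from the end of the 5-byte instruction, as in
     * jump_entry_target(entry) - (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE). */
    int64_t delta = (int64_t)target - (int64_t)(code + JUMP_LABEL_NOP_SIZE);
    int32_t offset = (int32_t)delta;

    out[0] = 0xE9;                            /* JMP rel32 opcode */
    memcpy(out + 1, &offset, sizeof(offset)); /* assumes a little-endian host, as on x86 */
}
```

For a site at 0x1000 jumping to 0x1100 the offset is 0x1100 - 0x1005 = 0xFB, giving E9 FB 00 00 00. Both forms are exactly JUMP_LABEL_NOP_SIZE bytes long, so switching between them never changes the layout of the surrounding code.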

Thanks,
Nadav

Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Nadav Amit
From: Thomas Gleixner
Sent: November 5, 2018 at 8:28:29 PM GMT
> To: Andy Lutomirski 
> Cc: Nadav Amit , Linus Torvalds 
> , H. Peter Anvin , Peter 
> Zijlstra , Ingo Molnar , LKML 
> , X86 ML , Borislav Petkov 
> , Dave Hansen , Andrew 
> Lutomirski , Kees Cook , Dave Hansen 
> , Masami Hiramatsu 
> Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during 
> early_init
> 
> 
> On Mon, 5 Nov 2018, Andy Lutomirski wrote:
>> On Mon, Nov 5, 2018 at 11:25 AM Nadav Amit  wrote:
>> Linus, hpa, or Dave, a question for you: suppose I map some page
>> writably, write to it, then upgrade permissions to allow execute.
>> Must I force all CPUs that might execute from it without first
>> serializing to serialize?  I suspect this doesn't really affect user
>> code, but it may affect the module loader.
>> 
>> To be safe, shouldn't the module loader broadcast an IPI to
>> sync_core() everywhere after loading a module and before making it
>> runnable, regardless of alternative patching?
>> 
>> IOW, the right sequence of events probably ought to be:
>> 
>> 1. Allocate the memory and map it.
>> 2. Copy in the text.
>> 3. Patch alternatives, etc.  This is logically just like (2) from an
>> architectural perspective -- we're just writing to memory that won't
>> be executed.
>> 4. Serialize everything.
>> 5. Run it!
> 
> I'd make that:
> 
> 1. Allocate the memory and map it RW
> 2. Copy in the text.
> 3. Patch alternatives, etc.  This is logically just like (2) from an
>   architectural perspective -- we're just writing to memory that won't
>   be executed.
> 4. Map it RX
> 5. Serialize everything.
> 6. Run it!

Thanks. I will do something along these lines. This can improve module
loading time (saving IRQ save/restore time), but it will not make things
much prettier, since two code-paths for “early init kernel” and “early init
module” would be needed.



Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Thomas Gleixner
On Mon, 5 Nov 2018, Andy Lutomirski wrote:
> On Mon, Nov 5, 2018 at 11:25 AM Nadav Amit  wrote:
> Linus, hpa, or Dave, a question for you: suppose I map some page
> writably, write to it, then upgrade permissions to allow execute.
> Must I force all CPUs that might execute from it without first
> serializing to serialize?  I suspect this doesn't really affect user
> code, but it may affect the module loader.
> 
> To be safe, shouldn't the module loader broadcast an IPI to
> sync_core() everywhere after loading a module and before making it
> runnable, regardless of alternative patching?
> 
> IOW, the right sequence of events probably ought to be:
> 
> 1. Allocate the memory and map it.
> 2. Copy in the text.
> 3. Patch alternatives, etc.  This is logically just like (2) from an
> architectural perspective -- we're just writing to memory that won't
> be executed.
> 4. Serialize everything.
> 5. Run it!

I'd make that:

1. Allocate the memory and map it RW
2. Copy in the text.
3. Patch alternatives, etc.  This is logically just like (2) from an
   architectural perspective -- we're just writing to memory that won't
   be executed.
4. Map it RX
5. Serialize everything.
6. Run it!

Thanks,

tglx
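A userspace analogue of the six steps above, offered as a sketch rather than as how the module loader works: mmap() and mprotect() stand in for the kernel's page-permission changes, and the "serialize" step is only a placeholder, since the kernel would need every CPU to execute a serializing instruction (the sync_core() IPI Andy mentions). It assumes an x86-64 host whose policy allows remapping anonymous memory RX; LOADER_UNSUPPORTED, patch_imm32() and load_patch_run() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical status when the host cannot run the demo (non-x86-64 CPU,
 * or a policy that forbids the RW -> RX remap). */
#define LOADER_UNSUPPORTED (-1)

/* x86-64 encoding of "mov eax, imm32; ret"; bytes 1..4 hold imm32. */
static const uint8_t template_text[] = { 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3 };

/* Step 3 analogue: patch the immediate while nothing can execute it. */
static void patch_imm32(uint8_t *text, int32_t value)
{
    memcpy(text + 1, &value, sizeof(value));
}

static int load_patch_run(int32_t value)
{
#if defined(__x86_64__)
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    /* 1. Allocate the memory and map it RW (not executable). */
    uint8_t *text = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (text == MAP_FAILED)
        return LOADER_UNSUPPORTED;

    /* 2. Copy in the text. */
    memcpy(text, template_text, sizeof(template_text));

    /* 3. Patch it: just another write to memory that is not executing. */
    patch_imm32(text, value);

    /* 4. Map it RX: drop write permission before anything executes it. */
    if (mprotect(text, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(text, len);
        return LOADER_UNSUPPORTED;
    }

    /* 5. "Serialize": placeholder only. On x86 this builtin emits no
     * instruction; the kernel would instead IPI CPUs to sync_core(). */
    __builtin___clear_cache((char *)text, (char *)text + sizeof(template_text));

    /* 6. Run it! */
    int (*fn)(void) = (int (*)(void))text;
    int result = fn();
    munmap(text, len);
    return result;
#else
    (void)value;
    return LOADER_UNSUPPORTED;
#endif
}
```

In this analogue the page is never writable and executable at the same time, which is the property steps 1 and 4 add over the earlier five-step list.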


Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Andy Lutomirski
On Mon, Nov 5, 2018 at 11:25 AM Nadav Amit  wrote:
>
> From: Andy Lutomirski
> Sent: November 5, 2018 at 7:03:49 PM GMT
> > To: Nadav Amit 
> > Cc: Peter Zijlstra , Ingo Molnar , 
> > LKML , X86 ML , H. Peter 
> > Anvin , Thomas Gleixner , Borislav 
> > Petkov , Dave Hansen , Andy 
> > Lutomirski , Kees Cook , Dave 
> > Hansen , Masami Hiramatsu 
> > Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during 
> > early_init
> >
> >
> >
> >
> >> On Nov 5, 2018, at 9:49 AM, Nadav Amit  wrote:
> >>
> >> From: Andy Lutomirski
> >> Sent: November 5, 2018 at 5:22:32 PM GMT
> >>> To: Peter Zijlstra 
> >>> Cc: Nadav Amit , Ingo Molnar , 
> >>> linux-kernel@vger.kernel.org, x...@kernel.org, H. Peter Anvin 
> >>> , Thomas Gleixner , Borislav Petkov 
> >>> , Dave Hansen , Andy 
> >>> Lutomirski , Kees Cook , Dave 
> >>> Hansen , Masami Hiramatsu 
> >>> Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during 
> >>> early_init
> >>>
> >>>
> >>>
> >>>>> On Nov 5, 2018, at 6:09 AM, Peter Zijlstra  wrote:
> >>>>>
> >>>>> On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
> >>>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> >>>>> index aac0c1f7e354..367c1d0c20a3 100644
> >>>>> --- a/arch/x86/kernel/jump_label.c
> >>>>> +++ b/arch/x86/kernel/jump_label.c
> >>>>> @@ -52,7 +52,13 @@ static void __ref __jump_label_transform(struct 
> >>>>> jump_entry *entry,
> >>>>> jmp.offset = jump_entry_target(entry) -
> >>>>>  (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
> >>>>>
> >>>>> -if (early_boot_irqs_disabled)
> >>>>> +/*
> >>>>> + * As long as we are in early boot, we can use text_poke_early(), 
> >>>>> which
> >>>>> + * is more efficient: the memory was still not marked as read-only 
> >>>>> (it
> >>>>> + * is only marked after poking_init()). This also prevents us from 
> >>>>> using
> >>>>> + * text_poke() before poking_init() is called.
> >>>>> + */
> >>>>> +if (!early_boot_done)
> >>>>> poker = text_poke_early;
> >>>>>
> >>>>> if (type == JUMP_LABEL_JMP) {
> >>>>
> >>>> It took me a while to untangle init/maze^H^Hin.c... but I think this
> >>>> is all we need:
> >>>>
> >>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> >>>> index aac0c1f7e354..ed5fe274a7d8 100644
> >>>> --- a/arch/x86/kernel/jump_label.c
> >>>> +++ b/arch/x86/kernel/jump_label.c
> >>>> @@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct 
> >>>> jump_entry *entry,
> >>>> jmp.offset = jump_entry_target(entry) -
> >>>>  (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
> >>>>
> >>>> -if (early_boot_irqs_disabled)
> >>>> +/*
> >>>> + * As long as we're UP and not yet marked RO, we can use
> >>>> + * text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
> >>>> + * SYSTEM_SCHEDULING before going either.
> >>>> + */
> >>>> +if (system_state == SYSTEM_BOOTING)
> >>>> poker = text_poke_early;
> >>>>
> >>>> if (type == JUMP_LABEL_JMP) {
> >>>
> >>> Can we move this logic into text_poke() and get rid of text_poke_early()?
> >>
> >> This will negatively affect poking of modules during module loading, e.g.,
> >> apply_paravirt(). This can be resolved by keeping track of when the module
> >> is write-protected and giving text_poke() a module parameter. Is it worth
> >> the complexity?
> >
> > Probably not.
> >
> > OTOH, why does alternative patching need text_poke() at all? Can’t it just
> > write to the text?
>
> Good question. According to my understanding, these games of
> text_poke_early() are not needed, at least for modules (on Intel).
>
> Intel SDM 11.6 "SELF-MODIFYING CODE" says:
>
> "A write to a memory location in a code segment that is currently cached in
>

Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Nadav Amit
From: Andy Lutomirski
Sent: November 5, 2018 at 7:03:49 PM GMT
> To: Nadav Amit 
> Cc: Peter Zijlstra , Ingo Molnar , 
> LKML , X86 ML , H. Peter Anvin 
> , Thomas Gleixner , Borislav Petkov 
> , Dave Hansen , Andy Lutomirski 
> , Kees Cook , Dave Hansen 
> , Masami Hiramatsu 
> Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during 
> early_init
> 
> 
> 
> 
>> On Nov 5, 2018, at 9:49 AM, Nadav Amit  wrote:
>> 
>> From: Andy Lutomirski
>> Sent: November 5, 2018 at 5:22:32 PM GMT
>>> To: Peter Zijlstra 
>>> Cc: Nadav Amit , Ingo Molnar , 
>>> linux-kernel@vger.kernel.org, x...@kernel.org, H. Peter Anvin 
>>> , Thomas Gleixner , Borislav Petkov 
>>> , Dave Hansen , Andy 
>>> Lutomirski , Kees Cook , Dave 
>>> Hansen , Masami Hiramatsu 
>>> Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during 
>>> early_init
>>> 
>>> 
>>> 
>>>>> On Nov 5, 2018, at 6:09 AM, Peter Zijlstra  wrote:
>>>>> 
>>>>> On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
>>>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>>>>> index aac0c1f7e354..367c1d0c20a3 100644
>>>>> --- a/arch/x86/kernel/jump_label.c
>>>>> +++ b/arch/x86/kernel/jump_label.c
>>>>> @@ -52,7 +52,13 @@ static void __ref __jump_label_transform(struct 
>>>>> jump_entry *entry,
>>>>> jmp.offset = jump_entry_target(entry) -
>>>>>  (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>>>>> 
>>>>> -if (early_boot_irqs_disabled)
>>>>> +/*
>>>>> + * As long as we are in early boot, we can use text_poke_early(), 
>>>>> which
>>>>> + * is more efficient: the memory was still not marked as read-only 
>>>>> (it
>>>>> + * is only marked after poking_init()). This also prevents us from 
>>>>> using
>>>>> + * text_poke() before poking_init() is called.
>>>>> + */
>>>>> +if (!early_boot_done)
>>>>> poker = text_poke_early;
>>>>> 
>>>>> if (type == JUMP_LABEL_JMP) {
>>>> 
>>>> It took me a while to untangle init/maze^H^Hin.c... but I think this
>>>> is all we need:
>>>> 
>>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>>>> index aac0c1f7e354..ed5fe274a7d8 100644
>>>> --- a/arch/x86/kernel/jump_label.c
>>>> +++ b/arch/x86/kernel/jump_label.c
>>>> @@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct 
>>>> jump_entry *entry,
>>>> jmp.offset = jump_entry_target(entry) -
>>>>  (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>>>> 
>>>> -if (early_boot_irqs_disabled)
>>>> +/*
>>>> + * As long as we're UP and not yet marked RO, we can use
>>>> + * text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
>>>> + * SYSTEM_SCHEDULING before going either.
>>>> + */
>>>> +if (system_state == SYSTEM_BOOTING)
>>>> poker = text_poke_early;
>>>> 
>>>> if (type == JUMP_LABEL_JMP) {
>>> 
>>> Can we move this logic into text_poke() and get rid of text_poke_early()?
>> 
>> This will negatively affect poking of modules during module loading, e.g.,
>> apply_paravirt(). This can be resolved by keeping track of when the module
>> is write-protected and giving text_poke() a module parameter. Is it worth
>> the complexity?
> 
> Probably not.
> 
> OTOH, why does alternative patching need text_poke() at all? Can’t it just
> write to the text?

Good question. According to my understanding, these games of
text_poke_early() are not needed, at least for modules (on Intel).

Intel SDM 11.6 "SELF-MODIFYING CODE" says:

"A write to a memory location in a code segment that is currently cached in
the processor causes the associated cache line (or lines) to be invalidated.
This check is based on the physical address of the instruction."

The manual then talks about prefetched instructions, but the module's code
is presumably not "prefetchable" at this point. So I think it should be
safe, but I guess you studied the Intel/AMD manuals more closely when you
wrote sync_core().

Anyhow, there should be a function that wraps the memcpy() to keep track of
when someone changes the text (for potential future use).
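A minimal sketch of such a wrapper, assuming it only needs to record the address and length of each write. text_write(), the log array and its size are hypothetical names for illustration, not an existing kernel interface, and the plain copy is only appropriate for text that no CPU can be executing yet (e.g. a module that is not runnable).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of one text modification. */
struct text_write_rec {
    uintptr_t addr;
    size_t len;
};

#define TEXT_LOG_MAX 64
static struct text_write_rec text_log[TEXT_LOG_MAX];
static size_t text_log_len;

/* Hypothetical wrapper: log where the text changed, then do the plain
 * memcpy(). Entries beyond TEXT_LOG_MAX are silently dropped. */
static void *text_write(void *dst, const void *src, size_t len)
{
    if (text_log_len < TEXT_LOG_MAX) {
        text_log[text_log_len].addr = (uintptr_t)dst;
        text_log[text_log_len].len = len;
        text_log_len++;
    }
    return memcpy(dst, src, len);
}
```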

Does it make sense? Do you want me to give it a spin?

Thanks,
Nadav

Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Andy Lutomirski



> On Nov 5, 2018, at 9:49 AM, Nadav Amit  wrote:
> 
> From: Andy Lutomirski
> Sent: November 5, 2018 at 5:22:32 PM GMT
>> To: Peter Zijlstra 
>> Cc: Nadav Amit , Ingo Molnar , 
>> linux-kernel@vger.kernel.org, x...@kernel.org, H. Peter Anvin 
>> , Thomas Gleixner , Borislav Petkov 
>> , Dave Hansen , Andy Lutomirski 
>> , Kees Cook , Dave Hansen 
>> , Masami Hiramatsu 
>> Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during 
>> early_init
>> 
>> 
>> 
>>>> On Nov 5, 2018, at 6:09 AM, Peter Zijlstra  wrote:
>>>> 
>>>> On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
>>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>>>> index aac0c1f7e354..367c1d0c20a3 100644
>>>> --- a/arch/x86/kernel/jump_label.c
>>>> +++ b/arch/x86/kernel/jump_label.c
>>>> @@ -52,7 +52,13 @@ static void __ref __jump_label_transform(struct 
>>>> jump_entry *entry,
>>>>  jmp.offset = jump_entry_target(entry) -
>>>>   (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>>>> 
>>>> -if (early_boot_irqs_disabled)
>>>> +/*
>>>> + * As long as we are in early boot, we can use text_poke_early(), 
>>>> which
>>>> + * is more efficient: the memory was still not marked as read-only (it
>>>> + * is only marked after poking_init()). This also prevents us from 
>>>> using
>>>> + * text_poke() before poking_init() is called.
>>>> + */
>>>> +if (!early_boot_done)
>>>>  poker = text_poke_early;
>>>> 
>>>>  if (type == JUMP_LABEL_JMP) {
>>> 
>>> It took me a while to untangle init/maze^H^Hin.c... but I think this
>>> is all we need:
>>> 
>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>>> index aac0c1f7e354..ed5fe274a7d8 100644
>>> --- a/arch/x86/kernel/jump_label.c
>>> +++ b/arch/x86/kernel/jump_label.c
>>> @@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
>>>  jmp.offset = jump_entry_target(entry) -
>>>   (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>>> 
>>> -if (early_boot_irqs_disabled)
>>> +/*
>>> + * As long as we're UP and not yet marked RO, we can use
>>> + * text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
>>> + * SYSTEM_SCHEDULING before going either.
>>> + */
>>> +if (system_state == SYSTEM_BOOTING)
>>>  poker = text_poke_early;
>>> 
>>>  if (type == JUMP_LABEL_JMP) {
>> 
>> Can we move this logic into text_poke() and get rid of text_poke_early()?
> 
> This will negatively affect poking of modules during module loading, e.g.,
> apply_paravirt(). This can be resolved by keeping track of when the module is
> write-protected and giving a module parameter to text_poke(). Is it worth
> the complexity?

Probably not.

OTOH, why does alternative patching need text_poke() at all?  Can’t it just 
write to the text?

> 
>> FWIW, alternative patching was, at some point, a significant fraction of
>> total boot time in some cases. This was probably mostly due to unnecessary
>> sync_core() calls. That said, I think this was reported on a VM, where
>> sync_core() used to be *extremely* expensive; that’s fixed now, and I
>> think the fix even got backported.
>> 
>> (Hmm. Maybe we can also make jump label patching work in early boot, too!)
> 
> It may be possible to resolve the dependencies between poking_init() and the
> other *_init() functions. I first considered doing that, but it makes the code
> very fragile, and I don’t see the value in getting rid of text_poke_early()
> from a security or simplicity point of view. Let me know if you think otherwise.
> 
> Regards,
> Nadav


Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Nadav Amit
From: Andy Lutomirski
Sent: November 5, 2018 at 5:22:32 PM GMT
> To: Peter Zijlstra 
> Cc: Nadav Amit , Ingo Molnar , 
> linux-kernel@vger.kernel.org, x...@kernel.org, H. Peter Anvin 
> , Thomas Gleixner , Borislav Petkov 
> , Dave Hansen , Andy Lutomirski 
> , Kees Cook , Dave Hansen 
> , Masami Hiramatsu 
> Subject: Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during 
> early_init
> 
> 
> 
>> On Nov 5, 2018, at 6:09 AM, Peter Zijlstra  wrote:
>> 
>>> On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
>>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>>> index aac0c1f7e354..367c1d0c20a3 100644
>>> --- a/arch/x86/kernel/jump_label.c
>>> +++ b/arch/x86/kernel/jump_label.c
>>> @@ -52,7 +52,13 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
>>>   jmp.offset = jump_entry_target(entry) -
>>>(jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>>> 
>>> -if (early_boot_irqs_disabled)
>>> +/*
>>> + * As long as we are in early boot, we can use text_poke_early(), which
>>> + * is more efficient: the memory was still not marked as read-only (it
>>> + * is only marked after poking_init()). This also prevents us from using
>>> + * text_poke() before poking_init() is called.
>>> + */
>>> +if (!early_boot_done)
>>>   poker = text_poke_early;
>>> 
>>>   if (type == JUMP_LABEL_JMP) {
>> 
>> It took me a while to untangle init/maze^H^Hin.c... but I think this
>> is all we need:
>> 
>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>> index aac0c1f7e354..ed5fe274a7d8 100644
>> --- a/arch/x86/kernel/jump_label.c
>> +++ b/arch/x86/kernel/jump_label.c
>> @@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
>>   jmp.offset = jump_entry_target(entry) -
>>(jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>> 
>> -if (early_boot_irqs_disabled)
>> +/*
>> + * As long as we're UP and not yet marked RO, we can use
>> + * text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
>> + * SYSTEM_SCHEDULING before going either.
>> + */
>> +if (system_state == SYSTEM_BOOTING)
>>   poker = text_poke_early;
>> 
>>   if (type == JUMP_LABEL_JMP) {
> 
> Can we move this logic into text_poke() and get rid of text_poke_early()?

This will negatively affect poking of modules during module loading, e.g.,
apply_paravirt(). This can be resolved by keeping track of when the module is
write-protected and giving a module parameter to text_poke(). Is it worth
the complexity?
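To make the trade-off concrete, here is a minimal user-space sketch of the bookkeeping described above. Everything in it is hypothetical: `struct fake_module`, its `text_ro` flag and `text_poke_mod()` are illustrative names, not kernel APIs, and `slow_poke()` only stands in for text_poke() by copying bytes and counting calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-module state; not the kernel's struct module. */
struct fake_module {
	bool text_ro;	/* set once the module's text is write-protected */
};

static int direct_writes, slow_pokes;

/* Stand-in for text_poke(); only copies bytes and counts calls. */
static void *slow_poke(void *addr, const void *opcode, size_t len)
{
	slow_pokes++;
	return memcpy(addr, opcode, len);
}

/*
 * Hypothetical text_poke() variant that is told which module owns addr
 * (NULL means core kernel text, assumed to be RO already).
 */
static void *text_poke_mod(struct fake_module *mod, void *addr,
			   const void *opcode, size_t len)
{
	assert(addr != NULL && opcode != NULL);

	if (mod != NULL && !mod->text_ro) {
		/* Module still being loaded: its text is assumed writable
		 * and not yet running, so a plain copy suffices. */
		direct_writes++;
		return memcpy(addr, opcode, len);
	}
	return slow_poke(addr, opcode, len);
}
```

The extra parameter has to be threaded through every caller, and the flag has to be kept in sync with the module's protection state; that is the complexity being weighed here.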

> FWIW, alternative patching was, at some point, a significant fraction of
> total boot time in some cases. This was probably mostly due to unnecessary
> sync_core() calls. That said, I think this was reported on a VM, where
> sync_core() used to be *extremely* expensive; that’s fixed now, and I
> think the fix even got backported.
> 
> (Hmm. Maybe we can also make jump label patching work in early boot, too!)

It may be possible to resolve the dependencies between poking_init() and the
other *_init() functions. I first considered doing that, but it makes the code
very fragile, and I don’t see the value in getting rid of text_poke_early()
from a security or simplicity point of view. Let me know if you think otherwise.

Regards,
Nadav

Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Andy Lutomirski


> On Nov 5, 2018, at 6:09 AM, Peter Zijlstra  wrote:
> 
>> On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>> index aac0c1f7e354..367c1d0c20a3 100644
>> --- a/arch/x86/kernel/jump_label.c
>> +++ b/arch/x86/kernel/jump_label.c
>> @@ -52,7 +52,13 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
>>jmp.offset = jump_entry_target(entry) -
>> (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>> 
>> -if (early_boot_irqs_disabled)
>> +/*
>> + * As long as we are in early boot, we can use text_poke_early(), which
>> + * is more efficient: the memory was still not marked as read-only (it
>> + * is only marked after poking_init()). This also prevents us from using
>> + * text_poke() before poking_init() is called.
>> + */
>> +if (!early_boot_done)
>>poker = text_poke_early;
>> 
>>if (type == JUMP_LABEL_JMP) {
> 
> It took me a while to untangle init/maze^H^Hin.c... but I think this
> is all we need:
> 
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index aac0c1f7e354..ed5fe274a7d8 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
>jmp.offset = jump_entry_target(entry) -
> (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
> 
> -if (early_boot_irqs_disabled)
> +/*
> + * As long as we're UP and not yet marked RO, we can use
> + * text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
> + * SYSTEM_SCHEDULING before going either.
> + */
> +if (system_state == SYSTEM_BOOTING)
>poker = text_poke_early;
> 
>if (type == JUMP_LABEL_JMP) {

Can we move this logic into text_poke() and get rid of text_poke_early()?
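A minimal sketch of what folding the check into a single entry point could look like, assuming the `system_state == SYSTEM_BOOTING` test from Peter's patch is the only criterion. `unified_text_poke()` is a hypothetical name, the enum mirrors only three of the kernel's values, and both paths just copy bytes and count calls instead of really patching text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of a subset of the kernel's enum system_states. */
enum system_states { SYSTEM_BOOTING, SYSTEM_SCHEDULING, SYSTEM_RUNNING };

static enum system_states system_state = SYSTEM_BOOTING;
static int fast_pokes, slow_pokes;	/* which path each call took */

/* Hypothetical single entry point; callers no longer choose a poker. */
static void *unified_text_poke(void *addr, const void *opcode, size_t len)
{
	assert(addr != NULL && opcode != NULL);

	if (system_state == SYSTEM_BOOTING) {
		/* UP and text still writable: a plain copy would do. */
		fast_pokes++;
		return memcpy(addr, opcode, len);
	}

	/* Stand-in for the text_poke() path, which writes through a
	 * temporary writable mapping; here it is only counted. */
	slow_pokes++;
	return memcpy(addr, opcode, len);
}
```

With this shape, text patched after boot (for example in a module being loaded) always takes the slow path, which is the cost Nadav raises in his reply.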

FWIW, alternative patching was, at some point, a significant fraction of total 
boot time in some cases. This was probably mostly due to unnecessary 
sync_core() calls. That said, I think this was reported on a VM, where sync_core()
used to be *extremely* expensive; that’s fixed now, and I think the fix even got
backported.
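A hypothetical user-space sketch of why the number of serializing calls matters: it counts calls to a stand-in for sync_core() when serializing after every patched site versus once after a whole batch. The names (`fake_sync_core()`, `struct patch_site`) are made up, and this is not a description of how the kernel's alternatives code is actually structured. Batching assumes none of the patched sites execute before the final serialization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned long serialize_calls;

/* Stand-in for sync_core(); only counts how often it is called. */
static void fake_sync_core(void)
{
	serialize_calls++;
}

/* Hypothetical description of one site to patch. */
struct patch_site {
	unsigned char *addr;
	const unsigned char *repl;
	size_t len;
};

/* Serialize after every site: n serializing calls. */
static void patch_each_then_sync(struct patch_site *s, size_t n)
{
	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		memcpy(s[i].addr, s[i].repl, s[i].len);
		fake_sync_core();
	}
}

/*
 * Write every site first, then serialize once. Assumes none of the
 * patched sites execute before the final serialization.
 */
static void patch_all_then_sync(struct patch_site *s, size_t n)
{
	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		memcpy(s[i].addr, s[i].repl, s[i].len);
	if (n > 0)
		fake_sync_core();
}
```

If each serialization is expensive (as it reportedly was under a VM), the per-site variant scales that cost with the number of sites, while the batched one pays it once.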

(Hmm. Maybe we can also make jump label patching work in early boot, too!)

Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Peter Zijlstra
On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index aac0c1f7e354..367c1d0c20a3 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -52,7 +52,13 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
>   jmp.offset = jump_entry_target(entry) -
>(jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
>  
> - if (early_boot_irqs_disabled)
> + /*
> +  * As long as we are in early boot, we can use text_poke_early(), which
> +  * is more efficient: the memory was still not marked as read-only (it
> +  * is only marked after poking_init()). This also prevents us from using
> +  * text_poke() before poking_init() is called.
> +  */
> + if (!early_boot_done)
>   poker = text_poke_early;
>  
>   if (type == JUMP_LABEL_JMP) {

It took me a while to untangle init/maze^H^Hin.c... but I think this
is all we need:

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index aac0c1f7e354..ed5fe274a7d8 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -52,7 +52,12 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
jmp.offset = jump_entry_target(entry) -
 (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
 
-   if (early_boot_irqs_disabled)
+   /*
+* As long as we're UP and not yet marked RO, we can use
+* text_poke_early; SYSTEM_BOOTING guarantees both, as we switch to
+* SYSTEM_SCHEDULING before going either.
+*/
+   if (system_state == SYSTEM_BOOTING)
poker = text_poke_early;
 
if (type == JUMP_LABEL_JMP) {
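A user-space sketch of what the patched function does, under stated assumptions: a mirror of three `system_states` values, stand-ins that only copy bytes and count calls in place of text_poke_early() and text_poke() (assumed to share the `void *(void *, const void *, size_t)` shape), and only the JMP case, with no NOP case and no sanity checks. `model_jump_label_transform()` and the other helpers are hypothetical names. It shows the rel32 displacement computed from the end of the 5-byte instruction, like `jmp.offset`, and the poker chosen from `system_state`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space mirrors of kernel values. */
enum system_states { SYSTEM_BOOTING, SYSTEM_SCHEDULING, SYSTEM_RUNNING };
#define JUMP_LABEL_NOP_SIZE 5	/* size of an x86 jump-label site */
#define JMP32_OPCODE 0xe9	/* JMP rel32 */

static enum system_states system_state = SYSTEM_BOOTING;
static int early_pokes, late_pokes;

/* Stand-ins for text_poke_early()/text_poke(): copy bytes, count calls. */
static void *fake_text_poke_early(void *addr, const void *opcode, size_t len)
{
	early_pokes++;
	return memcpy(addr, opcode, len);
}

static void *fake_text_poke(void *addr, const void *opcode, size_t len)
{
	late_pokes++;
	return memcpy(addr, opcode, len);
}

/* Encode "jmp target" for a site at code; same formula as jmp.offset. */
static void encode_jmp(uint8_t insn[JUMP_LABEL_NOP_SIZE], uint64_t code,
		       uint64_t target)
{
	int64_t off = (int64_t)target - (int64_t)(code + JUMP_LABEL_NOP_SIZE);
	uint32_t rel;

	assert(off >= INT32_MIN && off <= INT32_MAX);	/* must fit in rel32 */
	rel = (uint32_t)off;

	insn[0] = JMP32_OPCODE;
	insn[1] = rel & 0xff;		/* x86 immediates are little-endian */
	insn[2] = (rel >> 8) & 0xff;
	insn[3] = (rel >> 16) & 0xff;
	insn[4] = (rel >> 24) & 0xff;
}

/* Hypothetical model of the JMP case of __jump_label_transform(). */
static void model_jump_label_transform(uint8_t *site, uint64_t code,
				       uint64_t target)
{
	void *(*poker)(void *, const void *, size_t) = fake_text_poke;
	uint8_t insn[JUMP_LABEL_NOP_SIZE];

	/* As in the patch: while SYSTEM_BOOTING we are UP and text is RW. */
	if (system_state == SYSTEM_BOOTING)
		poker = fake_text_poke_early;

	encode_jmp(insn, code, target);
	poker(site, insn, sizeof(insn));
}
```

For a site at 0x1000 jumping to 0x1100, the displacement is 0x1100 - (0x1000 + 5) = 0xfb, so the encoded bytes are e9 fb 00 00 00, written via the early poker while the state is still SYSTEM_BOOTING.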


Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Peter Zijlstra
On Mon, Nov 05, 2018 at 01:39:53PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
> > diff --git a/init/main.c b/init/main.c
> > index a664246450d1..b0fa26637496 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -117,6 +117,8 @@ extern void radix_tree_init(void);
> >   */
> >  bool early_boot_irqs_disabled __read_mostly;
> >  
> > +u8 early_boot_done __read_mostly;
> > +
> >  enum system_states system_state __read_mostly;
> >  EXPORT_SYMBOL(system_state);
> 
> Should this not be using system_state ^ ? The site is very close to
> SYSTEM_SCHEDULING, can we use that or should we add another state?

We must be before kernel_init() -> kernel_init_freeable() -> smp_init().

So we need another state, something like SYSTEM_BOOTING_SMP I suppose?
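A small model of the ordering constraint: boot is reduced to four hypothetical events, and the check verifies that whenever the state is still BOOTING, only one CPU is online and text is still writable. The event and state names are illustrative, not kernel functions or values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boot events and states; a simplified model, not kernel code. */
enum boot_event { EV_SET_SCHEDULING, EV_SMP_INIT, EV_MARK_RO, EV_SET_RUNNING };
enum model_state { M_BOOTING, M_SCHEDULING, M_RUNNING };

struct boot_model {
	enum model_state state;
	int online_cpus;
	bool text_ro;
};

/*
 * Replay the events in order and return true if, after each one,
 * "state == M_BOOTING" still implies "one CPU online and text writable".
 */
static bool booting_implies_up_and_rw(const enum boot_event *ev, size_t n)
{
	struct boot_model m = { M_BOOTING, 1, false };

	assert(ev != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (ev[i]) {
		case EV_SET_SCHEDULING:
			m.state = M_SCHEDULING;
			break;
		case EV_SMP_INIT:
			m.online_cpus = 4;	/* any count above one */
			break;
		case EV_MARK_RO:
			m.text_ro = true;
			break;
		case EV_SET_RUNNING:
			m.state = M_RUNNING;
			break;
		}
		if (m.state == M_BOOTING && (m.online_cpus > 1 || m.text_ro))
			return false;
	}
	return true;
}
```

For example, the order "set SCHEDULING, smp_init, mark RO, set RUNNING" passes the check, while running smp_init before the state switch fails it. Peter's later patch in this thread relies on the switch to SYSTEM_SCHEDULING happening before both.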

> > @@ -735,6 +737,8 @@ asmlinkage __visible void __init start_kernel(void)
> > efi_free_boot_services();
> > }
> >  
> > +   early_boot_done = true;
> > +
> > /* Do the rest non-__init'ed, we're now alive */
> > rest_init();
> >  }
> > -- 
> > 2.17.1
> > 


Re: [PATCH v3 2/7] x86/jump_label: Use text_poke_early() during early_init

2018-11-05 Thread Peter Zijlstra
On Fri, Nov 02, 2018 at 04:29:41PM -0700, Nadav Amit wrote:
> diff --git a/init/main.c b/init/main.c
> index a664246450d1..b0fa26637496 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -117,6 +117,8 @@ extern void radix_tree_init(void);
>   */
>  bool early_boot_irqs_disabled __read_mostly;
>  
> +u8 early_boot_done __read_mostly;
> +
>  enum system_states system_state __read_mostly;
>  EXPORT_SYMBOL(system_state);

Should this not be using system_state ^ ? The site is very close to
SYSTEM_SCHEDULING, can we use that or should we add another state?

> @@ -735,6 +737,8 @@ asmlinkage __visible void __init start_kernel(void)
>   efi_free_boot_services();
>   }
>  
> + early_boot_done = true;
> +
>   /* Do the rest non-__init'ed, we're now alive */
>   rest_init();
>  }
> -- 
> 2.17.1
> 

