Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Sven Schnelle
Mark Rutland  writes:

> On Thu, Jan 27, 2022 at 07:46:01AM -0500, Steven Rostedt wrote:
>> On Thu, 27 Jan 2022 12:27:04 +
>> Mark Rutland  wrote:
>> 
>> > Ah, so those non-ELF relocations for the mcount_loc table just mean "apply
>> > the KASLR offset here", which is equivalent for all entries.
>> > 
>> > That makes sense, thanks!
>> 
>> And this is why we were having such a hard time understanding each other ;-)
>
> ;)
>
> With that in mind, I think that we understand that the build-time sort works
> for:
>
> * arch/x86, because the non-ELF relocations for mcount_loc happen to be
>   equivalent.
>  
> * arch/arm, because there's no dynamic relocation and the mcount_loc entries
>   have been finalized prior to sorting.
>
> ... but doesn't work for anyone else (including arm64) because the ELF
> relocations are not equivalent, and need special care that is not yet
> implemented.

For s390 my idea is to just skip the addresses between __start_mcount_loc
and __stop_mcount_loc, because for these addresses we know that they are
64 bits wide, so we just need to add the KASLR offset.

I'm thinking about something like this:

diff --git a/arch/s390/boot/compressed/decompressor.h b/arch/s390/boot/compressed/decompressor.h
index f75cc31a77dd..015d7e2e94ef 100644
--- a/arch/s390/boot/compressed/decompressor.h
+++ b/arch/s390/boot/compressed/decompressor.h
@@ -25,6 +25,8 @@ struct vmlinux_info {
unsigned long rela_dyn_start;
unsigned long rela_dyn_end;
unsigned long amode31_size;
+   unsigned long start_mcount_loc;
+   unsigned long stop_mcount_loc;
 };
 
 /* Symbols defined by linker scripts */
diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c
index 1aa11a8f57dd..7bb0d88db5c6 100644
--- a/arch/s390/boot/startup.c
+++ b/arch/s390/boot/startup.c
@@ -88,6 +88,11 @@ static void handle_relocs(unsigned long offset)
dynsym = (Elf64_Sym *) vmlinux.dynsym_start;
for (rela = rela_start; rela < rela_end; rela++) {
loc = rela->r_offset + offset;
+   if ((loc >= vmlinux.start_mcount_loc) &&
+   (loc < vmlinux.stop_mcount_loc)) {
+   (*(unsigned long *)loc) += offset;
+   continue;
+   }
val = rela->r_addend;
r_sym = ELF64_R_SYM(rela->r_info);
if (r_sym) {
@@ -232,6 +237,8 @@ static void offset_vmlinux_info(unsigned long offset)
vmlinux.rela_dyn_start += offset;
vmlinux.rela_dyn_end += offset;
vmlinux.dynsym_start += offset;
+   vmlinux.start_mcount_loc += offset;
+   vmlinux.stop_mcount_loc += offset;
 }
 
 static unsigned long reserve_amode31(unsigned long safe_addr)
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 42c43521878f..51c773405608 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -213,6 +213,8 @@ SECTIONS
QUAD(__rela_dyn_start)  /* rela_dyn_start */
QUAD(__rela_dyn_end)    /* rela_dyn_end */
QUAD(_eamode31 - _samode31) /* amode31_size */
+   QUAD(__start_mcount_loc)
+   QUAD(__stop_mcount_loc)
} :NONE
 
/* Debugging sections.  */

Not sure whether that would also work on power, and also not sure
whether I missed something thinking about that. Maybe it doesn't even
work. ;-)


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Sven Schnelle
Mark Rutland  writes:

>> Isn't x86 relocatable in some configurations (e.g. for KASLR)?
>> 
>> I can't see how the sort works for those cases, because the mcount_loc
>> entries are absolute, and either:
>> 
>> * The sorted entries will get overwritten by the unsorted relocation entries,
>>   and won't be sorted.
>> 
>> * The sorted entries won't get overwritten, but then the absolute address
>>   will be wrong since they hadn't been relocated.
>> 
>> How does that work?

From what I've seen when looking into this ftrace sort problem, x86 has
a relocation tool which is run before final linking: arch/x86/tools/relocs.c
This tool converts all the required relocations to three types:

- 32 bit relocations
- 64 bit relocations
- inverse 32 bit relocations

These are added to the end of the image.

The decompressor then iterates over that array, and just adds/subtracts
the KASLR offset - see arch/x86/boot/compressed/misc.c, handle_relocations()

So IMHO x86 never uses 'real' relocations during boot, and just
adds/subtracts. That's why the order stays the same, and the compile
time sort works.
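
For illustration, a minimal C sketch of that add/subtract step (the names and
the exact table layout here are illustrative only, not the actual
arch/x86/boot/compressed/misc.c code): each entry just identifies a location
to patch, and every location gets the same KASLR delta added, so the relative
order of the mcount_loc entries is preserved.

/*
 * Illustrative sketch only: the reloc table holds the addresses of pointers
 * to patch, and each pointer is shifted by the same KASLR delta.
 */
static void apply_kaslr_delta(unsigned long *reloc_start,
			      unsigned long *reloc_end,
			      unsigned long delta)
{
	unsigned long *reloc;

	for (reloc = reloc_start; reloc < reloc_end; reloc++) {
		unsigned long *target = (unsigned long *)(*reloc + delta);

		*target += delta;	/* same delta for every entry */
	}
}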

/Sven


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Kees Cook
On Thu, Jan 27, 2022 at 11:46:49AM +, Mark Rutland wrote:
> I'm not sure how x86 works here; AFAICT the relocations are performed during
> decompression, but it looks like there's some special build-time processing
> associated with that, and the vmlinux doesn't contain standard ELF 
> relocations.
> 
> Kees, IIUC you added the x86_64 support there, can you shed any light on
> if/how this works on x86?

I think Sven beat me to it, and this was answered in
https://lore.kernel.org/lkml/yt9dy231gzae@linux.ibm.com
but let me know if anything needs further info.

An additional note is that x86 is built with "-2G addressing"
(-mcmodel=kernel). There was some work done to make it actually
PIE, which would allow the KASLR base to move further:
https://github.com/KSPP/linux/issues/38

-Kees

-- 
Kees Cook


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Ard Biesheuvel
On Thu, 27 Jan 2022 at 15:55, Mark Rutland  wrote:
>
> On Thu, Jan 27, 2022 at 02:59:31PM +0100, Ard Biesheuvel wrote:
> > On Thu, 27 Jan 2022 at 14:24, Mark Rutland  wrote:
> > >
> > > On Thu, Jan 27, 2022 at 02:07:03PM +0100, Ard Biesheuvel wrote:
> > > > I suppose that on arm64, we can work around this by passing
> > > > --apply-dynamic-relocs to the linker, so that all R_AARCH64_RELATIVE
> > > > targets are prepopulated with the link time value of the respective
> > > > addresses. It does cause some bloat, which is why we disable that
> > > > today, but we could make that dependent on ftrace being enabled.
> > >
> > > We'd also need to teach the build-time sort to update the relocations,
> > > unless you mean to also change the boot-time reloc code to RMW with the offset?
> >
> > Why would that be necessary? Every RELA entry has the same effect on
> > its target address, as it just adds a fixed offset.
>
> Currently in relocate_kernel() we generate the absolute address from the
> relocation alone, with the core of the relocation logic being as follows, with
> x9 being the pointer to a RELA entry, and x23 being the offset relative to the
> default load address:
>
> ldp x12, x13, [x9], #24
> ldr x14, [x9, #-8]
>
> add x14, x14, x23   // relocate
> str x14, [x12, x23]
>
> ... and (as per another reply), a sample RELA entry currently contains:
>
> 0x890b1ab0  // default load VA of pointer to update
> 0x0403  // R_AARCH64_RELATIVE
> 0x890b6000  // default load VA of addr to write
>
> So either:
>
> * That code stays as-is, and we must update the relocs to correspond to their
>   new sorted locations, or we'll blat the sorted values with the original
>   relocs as we do today.
>
> * The code needs to change to RMW: read the existing value, add the offset
>   (ignoring the content of the RELA entry's addend field), and write it back.
>   This is what I meant when I said "change the boot-time reloc code to RMW
>   with the offset".
>
> Does that make sense, or have I misunderstood?
>

No, you're right. We'd have to use different sequences here depending
on whether the relocation target is populated or not, which currently
we don't care about.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
On Thu, Jan 27, 2022 at 08:55:43AM -0500, Steven Rostedt wrote:
> On Thu, 27 Jan 2022 13:33:02 +
> Mark Rutland  wrote:
> 
> > I want to get the regression fixed ASAP, so can we take a simple patch for
> > -rc2 which disables the build-time sort where it's currently broken (by
> > limiting the opt-in to arm and x86), then follow-up per-architecture to
> > re-enable it if desired/safe?
> 
> I'm going to retest my patch that makes it an opt in for just x86 and arm
> (32bit). I'll be pushing that hopefully later today. I have some other
> patches to test as well.

Great; thanks!

Let me know if you'd like me to give that a spin on arm or arm64.

Thanks,
Mark.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
On Thu, Jan 27, 2022 at 02:59:31PM +0100, Ard Biesheuvel wrote:
> On Thu, 27 Jan 2022 at 14:24, Mark Rutland  wrote:
> >
> > On Thu, Jan 27, 2022 at 02:07:03PM +0100, Ard Biesheuvel wrote:
> > > I suppose that on arm64, we can work around this by passing
> > > --apply-dynamic-relocs to the linker, so that all R_AARCH64_RELATIVE
> > > targets are prepopulated with the link time value of the respective
> > > addresses. It does cause some bloat, which is why we disable that
> > > today, but we could make that dependent on ftrace being enabled.
> >
> > We'd also need to teach the build-time sort to update the relocations,
> > unless you mean to also change the boot-time reloc code to RMW with the offset?
> 
> Why would that be necessary? Every RELA entry has the same effect on
> its target address, as it just adds a fixed offset.

Currently in relocate_kernel() we generate the absolute address from the
relocation alone, with the core of the relocation logic being as follows, with
x9 being the pointer to a RELA entry, and x23 being the offset relative to the
default load address:

ldp x12, x13, [x9], #24
ldr x14, [x9, #-8]

add x14, x14, x23   // relocate
str x14, [x12, x23]

... and (as per another reply), a sample RELA entry currently contains:

0x890b1ab0  // default load VA of pointer to update
0x0403  // R_AARCH64_RELATIVE
0x890b6000  // default load VA of addr to write

So either:

* That code stays as-is, and we must update the relocs to correspond to their
  new sorted locations, or we'll blat the sorted values with the original
  relocs as we do today.

* The code needs to change to RMW: read the existing value, add the offset
  (ignoring the content of the RELA entry's addend field), and write it back.
  This is what I meant when I said "change the boot-time reloc code to RMW with
  the offset".

Does that make sense, or have I misunderstood?

Thanks,
Mark.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Ard Biesheuvel
On Thu, 27 Jan 2022 at 14:24, Mark Rutland  wrote:
>
> On Thu, Jan 27, 2022 at 02:07:03PM +0100, Ard Biesheuvel wrote:
> > On Thu, 27 Jan 2022 at 13:59, Mark Rutland  wrote:
> > >
> > > On Thu, Jan 27, 2022 at 01:22:17PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 27 Jan 2022 at 13:20, Mark Rutland  wrote:
> > > > > On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> > > > >
> > > > > > These architectures use place-relative extables for the same reason:
> > > > > > place relative references are resolved at build time rather than at
> > > > > > runtime during relocation, making a build time sort feasible.
> > > > > >
> > > > > > arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > >
> > > > > > Note that the swap routine becomes something like the below, given
> > > > > > that the relative references need to be fixed up after the entry
> > > > > > changes place in the sorted list.
> > > > > >
> > > > > > static void swap_ex(void *a, void *b, int size)
> > > > > > {
> > > > > > struct exception_table_entry *x = a, *y = b, tmp;
> > > > > > int delta = b - a;
> > > > > >
> > > > > > tmp = *x;
> > > > > > x->insn = y->insn + delta;
> > > > > > y->insn = tmp.insn - delta;
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > As a bonus, the resulting footprint of the table in the image is
> > > > > > reduced by 8x, given that every 8 byte pointer has an accompanying
> > > > > > 24 byte RELA record, so we go from 32 bytes to 4 bytes for every
> > > > > > call to __gnu_mcount_mc.
> > > > >
> > > > > Absolutely -- it'd be great if we could do that for the callsite
> > > > > locations; the difficulty is that the entries are generated by the
> > > > > compiler itself, so we'd either need some build/link time processing
> > > > > to convert each absolute 64-bit value to a relative 32-bit offset, or
> > > > > new compiler options to generate those as relative offsets from the
> > > > > outset.
> > > >
> > > > Don't we use scripts/recordmcount.pl for that?
> > >
> > > Not quite -- we could adjust it to do so, but today it doesn't consider
> > > existing mcount_loc entries, and just generates new ones where the
> > > compiler has generated calls to mcount, which it finds by scanning the
> > > instructions in the binary. Today it is not used:
> > >
> > > * On arm64 when we default to using `-fpatchable-function-entry=N`.  That
> > >   makes the compiler insert 2 NOPs in the function prologue, and log the
> > >   location of that NOP sled to a section called `__patchable_function_entries`.
> > >
> > >   We need the compiler to do that since we can't reliably identify 2 NOPs
> > >   in a function prologue as being intended to be a patch site, as e.g.
> > >   there could be notrace functions where the compiler had to insert NOPs
> > >   for alignment of a subsequent branch or similar.
> > >
> > > * On architectures with `-nop-mcount`. On these, it's necessary to use
> > >   `-mrecord-mcount` to have the compiler log the patch-site, for the same
> > >   reason as with `-fpatchable-function-entry`.
> > >
> > > * On architectures with `-mrecord-mcount` generally, since this avoids the
> > >   overhead of scanning each object.
> > >
> > > * On x86 when objtool is used.
> > >
> >
> > Right.
> >
> > I suppose that on arm64, we can work around this by passing
> > --apply-dynamic-relocs to the linker, so that all R_AARCH64_RELATIVE
> > targets are prepopulated with the link time value of the respective
> > addresses. It does cause some bloat, which is why we disable that
> > today, but we could make that dependent on ftrace being enabled.
>
> We'd also need to teach the build-time sort to update the relocations, unless
> you mean to also change the boot-time reloc code to RMW with the offset?
>

Why would that be necessary? Every RELA entry has the same effect on
its target address, as it just adds a fixed offset.

> I think for right now the best thing is to disable the build-time sort for
> arm64, but maybe something like that is the right thing to do longer term.
>

Fair enough.

> > I do wonder how much overhead we accumulate, though, by having all
> > these relocations, but I suppose that is the situation today in any
> > case.
>
> Yeah; I suspect if we want to do something about that we want to do it more
> generally, and would probably need to do something like the x86 approach and
> rewrite the relocs at build-time to something more compressed.

Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Steven Rostedt
On Thu, 27 Jan 2022 13:33:02 +
Mark Rutland  wrote:

> I want to get the regression fixed ASAP, so can we take a simple patch for
> -rc2 which disables the build-time sort where it's currently broken (by
> limiting the opt-in to arm and x86), then follow-up per-architecture to
> re-enable it if desired/safe?

I'm going to retest my patch that makes it an opt in for just x86 and arm
(32bit). I'll be pushing that hopefully later today. I have some other
patches to test as well.

-- Steve


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
On Thu, Jan 27, 2022 at 02:16:31PM +0100, Sven Schnelle wrote:
> Mark Rutland  writes:
> 
> > On Thu, Jan 27, 2022 at 07:46:01AM -0500, Steven Rostedt wrote:
> >> On Thu, 27 Jan 2022 12:27:04 +
> >> Mark Rutland  wrote:
> >> 
> >> > Ah, so those non-ELF relocations for the mcount_loc table just mean
> >> > "apply the KASLR offset here", which is equivalent for all entries.
> >> > 
> >> > That makes sense, thanks!
> >> 
> >> And this is why we were having such a hard time understanding each other ;-)
> >
> > ;)
> >
> > With that in mind, I think that we understand that the build-time sort works
> > for:
> >
> > * arch/x86, because the non-ELF relocations for mcount_loc happen to be
> >   equivalent.
> >  
> > * arch/arm, because there's no dynamic relocation and the mcount_loc entries
> >   have been finalized prior to sorting.
> >
> > ... but doesn't work for anyone else (including arm64) because the ELF
> > relocations are not equivalent, and need special care that is not yet
> > implemented.
> 
> For s390 my idea is to just skip the addresses between __start_mcount_loc
> and __stop_mcount_loc, because for these addresses we know that they are
> 64 bits wide, so we just need to add the KASLR offset.
> 
> I'm thinking about something like this:
> 
> diff --git a/arch/s390/boot/compressed/decompressor.h b/arch/s390/boot/compressed/decompressor.h
> index f75cc31a77dd..015d7e2e94ef 100644
> --- a/arch/s390/boot/compressed/decompressor.h
> +++ b/arch/s390/boot/compressed/decompressor.h
> @@ -25,6 +25,8 @@ struct vmlinux_info {
>   unsigned long rela_dyn_start;
>   unsigned long rela_dyn_end;
>   unsigned long amode31_size;
> + unsigned long start_mcount_loc;
> + unsigned long stop_mcount_loc;
>  };
>  
>  /* Symbols defined by linker scripts */
> diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c
> index 1aa11a8f57dd..7bb0d88db5c6 100644
> --- a/arch/s390/boot/startup.c
> +++ b/arch/s390/boot/startup.c
> @@ -88,6 +88,11 @@ static void handle_relocs(unsigned long offset)
>   dynsym = (Elf64_Sym *) vmlinux.dynsym_start;
>   for (rela = rela_start; rela < rela_end; rela++) {
>   loc = rela->r_offset + offset;
> + if ((loc >= vmlinux.start_mcount_loc) &&
> + (loc < vmlinux.stop_mcount_loc)) {
> + (*(unsigned long *)loc) += offset;
> + continue;
> + }
>   val = rela->r_addend;
>   r_sym = ELF64_R_SYM(rela->r_info);
>   if (r_sym) {
> @@ -232,6 +237,8 @@ static void offset_vmlinux_info(unsigned long offset)
>   vmlinux.rela_dyn_start += offset;
>   vmlinux.rela_dyn_end += offset;
>   vmlinux.dynsym_start += offset;
> + vmlinux.start_mcount_loc += offset;
> + vmlinux.stop_mcount_loc += offset;
>  }
>  
>  static unsigned long reserve_amode31(unsigned long safe_addr)
> diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
> index 42c43521878f..51c773405608 100644
> --- a/arch/s390/kernel/vmlinux.lds.S
> +++ b/arch/s390/kernel/vmlinux.lds.S
> @@ -213,6 +213,8 @@ SECTIONS
>   QUAD(__rela_dyn_start)  /* rela_dyn_start */
>   QUAD(__rela_dyn_end)    /* rela_dyn_end */
>   QUAD(_eamode31 - _samode31) /* amode31_size */
> + QUAD(__start_mcount_loc)
> + QUAD(__stop_mcount_loc)
>   } :NONE
>  
>   /* Debugging sections.  */
> 
> Not sure whether that would also work on power, and also not sure
> whether I missed something thinking about that. Maybe it doesn't even
> work. ;-)

I don't know enough about s390 or powerpc relocs to say whether that works, but
I can say that approach isn't going to work for arm64 without other significant
changes.

I want to get the regression fixed ASAP, so can we take a simple patch for -rc2
which disables the build-time sort where it's currently broken (by limiting the
opt-in to arm and x86), then follow-up per-architecture to re-enable it if
desired/safe?

Thanks,
Mark.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
On Thu, Jan 27, 2022 at 02:07:03PM +0100, Ard Biesheuvel wrote:
> On Thu, 27 Jan 2022 at 13:59, Mark Rutland  wrote:
> >
> > On Thu, Jan 27, 2022 at 01:22:17PM +0100, Ard Biesheuvel wrote:
> > > On Thu, 27 Jan 2022 at 13:20, Mark Rutland  wrote:
> > > > On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> > > >
> > > > > These architectures use place-relative extables for the same reason:
> > > > > place relative references are resolved at build time rather than at
> > > > > runtime during relocation, making a build time sort feasible.
> > > > >
> > > > > arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > >
> > > > > Note that the swap routine becomes something like the below, given
> > > > > that the relative references need to be fixed up after the entry
> > > > > changes place in the sorted list.
> > > > >
> > > > > static void swap_ex(void *a, void *b, int size)
> > > > > {
> > > > > struct exception_table_entry *x = a, *y = b, tmp;
> > > > > int delta = b - a;
> > > > >
> > > > > tmp = *x;
> > > > > x->insn = y->insn + delta;
> > > > > y->insn = tmp.insn - delta;
> > > > > ...
> > > > > }
> > > > >
> > > > > As a bonus, the resulting footprint of the table in the image is
> > > > > reduced by 8x, given that every 8 byte pointer has an accompanying 24
> > > > > byte RELA record, so we go from 32 bytes to 4 bytes for every call to
> > > > > __gnu_mcount_mc.
> > > >
> > > > Absolutely -- it'd be great if we could do that for the callsite
> > > > locations; the difficulty is that the entries are generated by the
> > > > compiler itself, so we'd either need some build/link time processing to
> > > > convert each absolute 64-bit value to a relative 32-bit offset, or new
> > > > compiler options to generate those as relative offsets from the outset.
> > >
> > > Don't we use scripts/recordmcount.pl for that?
> >
> > Not quite -- we could adjust it to do so, but today it doesn't consider
> > existing mcount_loc entries, and just generates new ones where the compiler
> > has generated calls to mcount, which it finds by scanning the instructions
> > in the binary. Today it is not used:
> >
> > * On arm64 when we default to using `-fpatchable-function-entry=N`.  That
> >   makes the compiler insert 2 NOPs in the function prologue, and log the
> >   location of that NOP sled to a section called `__patchable_function_entries`.
> >
> >   We need the compiler to do that since we can't reliably identify 2 NOPs
> >   in a function prologue as being intended to be a patch site, as e.g.
> >   there could be notrace functions where the compiler had to insert NOPs
> >   for alignment of a subsequent branch or similar.
> >
> > * On architectures with `-nop-mcount`. On these, it's necessary to use
> >   `-mrecord-mcount` to have the compiler log the patch-site, for the same
> >   reason as with `-fpatchable-function-entry`.
> >
> > * On architectures with `-mrecord-mcount` generally, since this avoids the
> >   overhead of scanning each object.
> >
> > * On x86 when objtool is used.
> >
> 
> Right.
> 
> I suppose that on arm64, we can work around this by passing
> --apply-dynamic-relocs to the linker, so that all R_AARCH64_RELATIVE
> targets are prepopulated with the link time value of the respective
> addresses. It does cause some bloat, which is why we disable that
> today, but we could make that dependent on ftrace being enabled.

We'd also need to teach the build-time sort to update the relocations, unless
you mean to also change the boot-time reloc code to RMW with the offset?

I think for right now the best thing is to disable the build-time sort for
arm64, but maybe something like that is the right thing to do longer term.

> I do wonder how much overhead we accumulate, though, by having all
> these relocations, but I suppose that is the situation today in any
> case.

Yeah; I suspect if we want to do something about that we want to do it more
generally, and would probably need to do something like the x86 approach and
rewrite the relocs at build-time to something more compressed. If we applied
the dynamic relocs with the link-time address we'd only need 4 bytes to
identify each pointer to apply an offset to.
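
As a rough sketch of the kind of compressed form being suggested (a purely
hypothetical format, not an existing kernel mechanism): a table of 32-bit
offsets from the kernel base, each identifying one already-populated pointer
to shift by the KASLR offset at boot.

static void apply_offset_table(const unsigned int *offsets, unsigned long count,
			       unsigned long kernel_base, unsigned long kaslr_offset)
{
	unsigned long i;

	for (i = 0; i < count; i++) {
		/* each 32-bit entry locates one pointer, relative to the kernel base */
		unsigned long *ptr = (unsigned long *)(kernel_base + offsets[i]);

		*ptr += kaslr_offset;
	}
}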

I'm not exactly sure how we could do that, nor what the trade-offs look like in
practice.

Thanks,
Mark.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
On Thu, Jan 27, 2022 at 07:46:01AM -0500, Steven Rostedt wrote:
> On Thu, 27 Jan 2022 12:27:04 +
> Mark Rutland  wrote:
> 
> > Ah, so those non-ELF relocations for the mcount_loc table just mean "apply
> > the KASLR offset here", which is equivalent for all entries.
> > 
> > That makes sense, thanks!
> 
> And this is why we were having such a hard time understanding each other ;-)

;)

With that in mind, I think that we understand that the build-time sort works
for:

* arch/x86, because the non-ELF relocations for mcount_loc happen to be
  equivalent.
 
* arch/arm, because there's no dynamic relocation and the mcount_loc entries
  have been finalized prior to sorting.

... but doesn't work for anyone else (including arm64) because the ELF
relocations are not equivalent, and need special care that is not yet
implemented.

So we should have arm and x86 opt-in, but for now everyone else (including
arm64, powerpc, s390) should be left with the prior behaviour with the runtime
sort only (in case the build-time sort breaks anything, as I mentioned in my
other mail).

Does that sound about right?

In future we might be able to do something much smarter (e.g. adding some
preprocessing to use relative entries).
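
As a sketch of what such a relative entry could look like (hypothetical -- no
such preprocessing exists today): each mcount_loc slot would hold a 32-bit
offset from its own address, so the absolute callsite address can be recovered
at runtime without any relocation, and the table can be sorted at build time
regardless of where the kernel ends up being loaded.

/* hypothetical place-relative mcount_loc entry */
static unsigned long mcount_loc_addr(const int *entry)
{
	return (unsigned long)entry + *entry;
}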

I'll take a look at shelf. :)

Thanks,
Mark.

> I started a new project called "shelf", which is a shell interface to
> read ELF files (Shelf on a ELF!).
> 
> It uses my ccli library:
> 
>https://github.com/rostedt/libccli
> 
> and can be found here:
> 
>https://github.com/rostedt/shelf
> 
> Build and install the latest libccli and then build this with just
> "make".
> 
>   $ shelf vmlinux
> 
> and then you can see what is stored in the mcount location:
> 
>   shelf> dump symbol __start_mcount_loc - __stop_mcount_loc
> 
> I plan on adding more to include the REL and RELA sections and show how
> they affect symbols and such.
> 
> Feel free to contribute too ;-)
> 
> -- Steve


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Ard Biesheuvel
On Thu, 27 Jan 2022 at 13:59, Mark Rutland  wrote:
>
> On Thu, Jan 27, 2022 at 01:22:17PM +0100, Ard Biesheuvel wrote:
> > On Thu, 27 Jan 2022 at 13:20, Mark Rutland  wrote:
> > > On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> > >
> > > > These architectures use place-relative extables for the same reason:
> > > > place relative references are resolved at build time rather than at
> > > > runtime during relocation, making a build time sort feasible.
> > > >
> > > > arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > >
> > > > Note that the swap routine becomes something like the below, given
> > > > that the relative references need to be fixed up after the entry
> > > > changes place in the sorted list.
> > > >
> > > > static void swap_ex(void *a, void *b, int size)
> > > > {
> > > > struct exception_table_entry *x = a, *y = b, tmp;
> > > > int delta = b - a;
> > > >
> > > > tmp = *x;
> > > > x->insn = y->insn + delta;
> > > > y->insn = tmp.insn - delta;
> > > > ...
> > > > }
> > > >
> > > > As a bonus, the resulting footprint of the table in the image is
> > > > reduced by 8x, given that every 8 byte pointer has an accompanying 24
> > > > byte RELA record, so we go from 32 bytes to 4 bytes for every call to
> > > > __gnu_mcount_mc.
> > >
> > > Absolutely -- it'd be great if we could do that for the callsite
> > > locations; the difficulty is that the entries are generated by the
> > > compiler itself, so we'd either need some build/link time processing to
> > > convert each absolute 64-bit value to a relative 32-bit offset, or new
> > > compiler options to generate those as relative offsets from the outset.
> >
> > Don't we use scripts/recordmcount.pl for that?
>
> Not quite -- we could adjust it to do so, but today it doesn't consider
> existing mcount_loc entries, and just generates new ones where the compiler
> has generated calls to mcount, which it finds by scanning the instructions
> in the binary. Today it is not used:
>
> * On arm64 when we default to using `-fpatchable-function-entry=N`.  That
>   makes the compiler insert 2 NOPs in the function prologue, and log the location of
>   that NOP sled to a section called `__patchable_function_entries`.
>
>   We need the compiler to do that since we can't reliably identify 2 NOPs in a
>   function prologue as being intended to be a patch site, as e.g. there could
>   be notrace functions where the compiler had to insert NOPs for alignment of
>   a subsequent branch or similar.
>
> * On architectures with `-nop-mcount`. On these, it's necessary to use
>   `-mrecord-mcount` to have the compiler log the patch-site, for the same
>   reason as with `-fpatchable-function-entry`.
>
> * On architectures with `-mrecord-mcount` generally, since this avoids the
>   overhead of scanning each object.
>
> * On x86 when objtool is used.
>

Right.

I suppose that on arm64, we can work around this by passing
--apply-dynamic-relocs to the linker, so that all R_AARCH64_RELATIVE
targets are prepopulated with the link time value of the respective
addresses. It does cause some bloat, which is why we disable that
today, but we could make that dependent on ftrace being enabled.

I do wonder how much overhead we accumulate, though, by having all
these relocations, but I suppose that is the situation today in any
case.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
On Thu, Jan 27, 2022 at 01:22:17PM +0100, Ard Biesheuvel wrote:
> On Thu, 27 Jan 2022 at 13:20, Mark Rutland  wrote:
> > On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> >
> > > These architectures use place-relative extables for the same reason:
> > > place relative references are resolved at build time rather than at
> > > runtime during relocation, making a build time sort feasible.
> > >
> > > arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > >
> > > Note that the swap routine becomes something like the below, given
> > > that the relative references need to be fixed up after the entry
> > > changes place in the sorted list.
> > >
> > > static void swap_ex(void *a, void *b, int size)
> > > {
> > > struct exception_table_entry *x = a, *y = b, tmp;
> > > int delta = b - a;
> > >
> > > tmp = *x;
> > > x->insn = y->insn + delta;
> > > y->insn = tmp.insn - delta;
> > > ...
> > > }
> > >
> > > As a bonus, the resulting footprint of the table in the image is
> > > reduced by 8x, given that every 8 byte pointer has an accompanying 24
> > > byte RELA record, so we go from 32 bytes to 4 bytes for every call to
> > > __gnu_mcount_mc.
> >
> > Absolutely -- it'd be great if we could do that for the callsite locations;
> > the difficulty is that the entries are generated by the compiler itself, so we'd
> > either need some build/link time processing to convert each absolute 64-bit
> > value to a relative 32-bit offset, or new compiler options to generate those
> > as relative offsets from the outset.
> 
> Don't we use scripts/recordmcount.pl for that?

Not quite -- we could adjust it to do so, but today it doesn't consider
existing mcount_loc entries, and just generates new ones where the compiler has
generated calls to mcount, which it finds by scanning the instructions in the
binary. Today it is not used:

* On arm64 when we default to using `-fpatchable-function-entry=N`.  That makes
  the compiler insert 2 NOPs in the function prologue, and log the location of
  that NOP sled to a section called `__patchable_function_entries`.

  We need the compiler to do that since we can't reliably identify 2 NOPs in a
  function prologue as being intended to be a patch site, as e.g. there could
  be notrace functions where the compiler had to insert NOPs for alignment of a
  subsequent branch or similar.

* On architectures with `-nop-mcount`. On these, it's necessary to use
  `-mrecord-mcount` to have the compiler log the patch-site, for the same
  reason as with `-fpatchable-function-entry`.

* On architectures with `-mrecord-mcount` generally, since this avoids the
  overhead of scanning each object.

* On x86 when objtool is used.

Thanks,
Mark.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Steven Rostedt
On Thu, 27 Jan 2022 12:27:04 +
Mark Rutland  wrote:

> Ah, so those non-ELF relocations for the mcount_loc table just mean "apply the
> KASLR offset here", which is equivalent for all entries.
> 
> That makes sense, thanks!

And this is why we were having such a hard time understanding each other ;-)

I started a new project called "shelf", which is a shell interface to
read ELF files (Shelf on a ELF!).

It uses my ccli library:

   https://github.com/rostedt/libccli

and can be found here:

   https://github.com/rostedt/shelf

Build and install the latest libccli and then build this with just
"make".

  $ shelf vmlinux

and then you can see what is stored in the mcount location:

  shelf> dump symbol __start_mcount_loc - __stop_mcount_loc

I plan on adding more to include the REL and RELA sections and show how
they affect symbols and such.

Feel free to contribute too ;-)

-- Steve


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
On Thu, Jan 27, 2022 at 01:04:41PM +0100, Sven Schnelle wrote:
> Mark Rutland  writes:
> 
> >> Isn't x86 relocatable in some configurations (e.g. for KASLR)?
> >> 
> >> I can't see how the sort works for those cases, because the mcount_loc
> >> entries are absolute, and either:
> >> 
> >> * The sorted entries will get overwritten by the unsorted relocation
> >>   entries, and won't be sorted.
> >> 
> >> * The sorted entries won't get overwritten, but then the absolute address
> >>   will be wrong since they hadn't been relocated.
> >> 
> >> How does that work?
> 
> From what I've seen when looking into this ftrace sort problem, x86 has
> a relocation tool which is run before final linking: arch/x86/tools/relocs.c
> This tool converts all the required relocations to three types:
> 
> - 32 bit relocations
> - 64 bit relocations
> - inverse 32 bit relocations
> 
> These are added to the end of the image.
> 
> The decompressor then iterates over that array, and just adds/subtracts
> the KASLR offset - see arch/x86/boot/compressed/misc.c, handle_relocations()
> 
> So IMHO x86 never uses 'real' relocations during boot, and just
> adds/subtracts. That's why the order stays the same, and the compile
> time sort works.

Ah, so those non-ELF relocations for the mcount_loc table just mean "apply the
KASLR offset here", which is equivalent for all entries.

That makes sense, thanks!

Mark.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Ard Biesheuvel
On Thu, 27 Jan 2022 at 13:20, Mark Rutland  wrote:
>
> On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> > On Thu, 27 Jan 2022 at 12:47, Mark Rutland  wrote:
> > >
> > > [adding LKML so this is easier for others to find]
> > >
> > > If anyone wants to follow the thread from the start, it's at:
> > >
> > >   
> > > https://lore.kernel.org/linuxppc-dev/944d10da-8200-4ba9-8d0a-3bed9aa99...@linux.ibm.com/
> > >
> > > Ard, I was under the impression that the 32-bit arm kernel was (virtually)
> > > relocatable, but I couldn't spot where, and suspect I'm mistaken. Do you
> > > know whether it currently does any boot-time dynamic relocation?
> >
> > No, it does not.
>
> Thanks for confirming!
>
> So 32-bit arm should be able to opt into the build-time sort as-is.
>
> > > Steve asked for a bit more detail on IRC, so the below is an attempt to
> > > explain what's actually going on here.
> > >
> > > The short answer is that relocatable kernels (e.g. those with KASLR
> > > support) need to handle the kernel being loaded at (somewhat) arbitrary
> > > virtual addresses. Even where code can be position-independent, any
> > > pointers in the kernel image (e.g. the contents of the mcount_loc table)
> > > need to be updated to account for the specific VA the kernel was loaded
> > > at -- arch code does this early at boot time by applying dynamic (ELF)
> > > relocations.
> >
> > These architectures use place-relative extables for the same reason:
> > place relative references are resolved at build time rather than at
> > runtime during relocation, making a build time sort feasible.
> >
> > arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> >
> > Note that the swap routine becomes something like the below, given
> > that the relative references need to be fixed up after the entry
> > changes place in the sorted list.
> >
> > static void swap_ex(void *a, void *b, int size)
> > {
> > struct exception_table_entry *x = a, *y = b, tmp;
> > int delta = b - a;
> >
> > tmp = *x;
> > x->insn = y->insn + delta;
> > y->insn = tmp.insn - delta;
> > ...
> > }
> >
> > As a bonus, the resulting footprint of the table in the image is
> > reduced by 8x, given that every 8 byte pointer has an accompanying 24
> > byte RELA record, so we go from 32 bytes to 4 bytes for every call to
> > __gnu_mcount_mc.
>
> Absolutely -- it'd be great if we could do that for the callsite locations;
> the difficulty is that the entries are generated by the compiler itself, so we'd
> either need some build/link time processing to convert each absolute 64-bit
> value to a relative 32-bit offset, or new compiler options to generate those
> as relative offsets from the outset.
>

Don't we use scripts/recordmcount.pl for that?


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> On Thu, 27 Jan 2022 at 12:47, Mark Rutland  wrote:
> >
> > [adding LKML so this is easier for others to find]
> >
> > If anyone wants to follow the thread from the start, it's at:
> >
> >   
> > https://lore.kernel.org/linuxppc-dev/944d10da-8200-4ba9-8d0a-3bed9aa99...@linux.ibm.com/
> >
> > Ard, I was under the impression that the 32-bit arm kernel was (virtually)
> > relocatable, but I couldn't spot where, and suspect I'm mistaken. Do you
> > know whether it currently does any boot-time dynamic relocation?
> 
> No, it does not.

Thanks for confirming!

So 32-bit arm should be able to opt into the build-time sort as-is.

> > Steve asked for a bit more detail on IRC, so the below is an attempt to
> > explain what's actually going on here.
> >
> > The short answer is that relocatable kernels (e.g. those with KASLR support)
> > need to handle the kernel being loaded at (somewhat) arbitrary virtual
> > addresses. Even where code can be position-independent, any pointers in the
> > kernel image (e.g. the contents of the mcount_loc table) need to be updated
> > to account for the specific VA the kernel was loaded at -- arch code does this
> > early at boot time by applying dynamic (ELF) relocations.
> 
> These architectures use place-relative extables for the same reason:
> place relative references are resolved at build time rather than at
> runtime during relocation, making a build time sort feasible.
> 
> arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> 
> Note that the swap routine becomes something like the below, given
> that the relative references need to be fixed up after the entry
> changes place in the sorted list.
> 
> static void swap_ex(void *a, void *b, int size)
> {
> struct exception_table_entry *x = a, *y = b, tmp;
> int delta = b - a;
> 
> tmp = *x;
> x->insn = y->insn + delta;
> y->insn = tmp.insn - delta;
> ...
> }
> 
> As a bonus, the resulting footprint of the table in the image is
> reduced by 8x, given that every 8 byte pointer has an accompanying 24
> byte RELA record, so we go from 32 bytes to 4 bytes for every call to
> __gnu_mcount_mc.

Absolutely -- it'd be great if we could do that for the callsite locations; the
difficulty is that the entries are generated by the compiler itself, so we'd
either need some build/link time processing to convert each absolute 64-bit
value to a relative 32-bit offset, or new compiler options to generate those as
relative offsets from the outset.

Thanks,
Mark.


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Ard Biesheuvel
On Thu, 27 Jan 2022 at 12:47, Mark Rutland  wrote:
>
> [adding LKML so this is easier for others to find]
>
> If anyone wants to follow the thread from the start, it's at:
>
>   
> https://lore.kernel.org/linuxppc-dev/944d10da-8200-4ba9-8d0a-3bed9aa99...@linux.ibm.com/
>
> Ard, I was under the impression that the 32-bit arm kernel was (virtually)
> relocatable, but I couldn't spot where, and suspect I'm mistaken. Do you know
> whether it currently does any boot-time dynamic relocation?
>

No, it does not.

..
> > Steve pointed me at this thread over IRC -- I'm not subscribed to this list
> > so grabbed a copy of the thread thus far via b4.
> >
> > On Tue, Jan 25, 2022 at 11:20:27AM +0800, Yinan Liu wrote:
> > > > Yeah, I think it's time to opt in, instead of opting out.
> >
> > I agree this must be opt-in rather than opt-out.
> >
> > However, I think most architectures were broken (in at least some
> > configurations) by commit:
> >
> >   72b3942a173c387b ("scripts: ftrace - move the sort-processing in 
> > ftrace_init")
> >
> > ... and so I don't think this fix is correct as-is, and we might want to
> > revert that or at least mark it as BROKEN for now.
>
> Steve asked for a bit more detail on IRC, so the below is an attempt to
> explain what's actually going on here.
>
> The short answer is that relocatable kernels (e.g. those with KASLR support)
> need to handle the kernel being loaded at (somewhat) arbitrary virtual
> addresses. Even where code can be position-independent, any pointers in the
> kernel image (e.g. the contents of the mcount_loc table) need to be updated to
> account for the specific VA the kernel was loaded at -- arch code does this
> early at boot time by applying dynamic (ELF) relocations.
>

These architectures use place-relative extables for the same reason:
place relative references are resolved at build time rather than at
runtime during relocation, making a build time sort feasible.

arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE

Note that the swap routine becomes something like the below, given
that the relative references need to be fixed up after the entry
changes place in the sorted list.

static void swap_ex(void *a, void *b, int size)
{
struct exception_table_entry *x = a, *y = b, tmp;
int delta = b - a;

tmp = *x;
x->insn = y->insn + delta;
y->insn = tmp.insn - delta;
...
}

As a bonus, the resulting footprint of the table in the image is
reduced by 8x, given that every 8 byte pointer has an accompanying 24
byte RELA record, so we go from 32 bytes to 4 bytes for every call to
__gnu_mcount_mc.
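
A slightly fuller sketch of that swap routine, assuming a two-field entry with
relative insn and fixup members (illustrative field names; real extable
layouts differ per architecture). Because each field stores an offset relative
to its own address, a swapped entry must have every field adjusted by the
distance the entry moved.

struct exception_table_entry {
	int insn;	/* offset of the faulting instruction, relative to this field */
	int fixup;	/* offset of the fixup code, relative to this field */
};

static void swap_ex(void *a, void *b, int size)
{
	struct exception_table_entry *x = a, *y = b, tmp;
	int delta = b - a;

	tmp = *x;
	x->insn = y->insn + delta;
	x->fixup = y->fixup + delta;
	y->insn = tmp.insn - delta;
	y->fixup = tmp.fixup - delta;
}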



> Walking through how we get there, considering arm64 specifically:
>
> 1) When an object is created with traceable functions:
>
>The compiler records the addresses of the callsites into a section. Those
>are absolute virtual addresses, but the final virtual addresses are not yet
>known, so the compiler generates ELF relocations to tell the linker how to
>fill these in later.
>
>On arm64, since the compiler doesn't know the final value yet, it fills the
>actual values with zero for now. Other architectures might do differently.
>
>For example, per `objdump -r init/main.o`:
>
>| RELOCATION RECORDS FOR [__patchable_function_entries]:
>| OFFSET   TYPE  VALUE
>|  R_AARCH64_ABS64   .text+0x0028
>| 0008 R_AARCH64_ABS64   .text+0x0088
>| 0010 R_AARCH64_ABS64   .text+0x00e8
>
> 2) When vmlinux is linked:
>
>The linker script accumulates the callsite pointers from all the object
>files into the mcount_loc table. Since the kernel is relocatable, the
>runtime absolute addresses are still not yet known, but the offset relative
>to the kernel base is known, and so the linker consumes the absolute
>relocations created by the compiler and generates new relocations relative
>to the kernel's default load address so that these can be adjusted at boot
>time.
>
>On arm64, those are RELA and/or RELR relocations, which our vmlinux.lds.S
>accumulates those into a location in the initdata section that the kernel
>can find at boot time:
>
>  
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/vmlinux.lds.S?h=v5.17-rc1#n252
>
>For example, per `objdump -s vmlinux -j .rela.dyn`:
>
>| vmlinux: file format elf64-littleaarch64
>|

Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-27 Thread Mark Rutland
[adding LKML so this is easier for others to find]

If anyone wants to follow the thread from the start, it's at:

  
https://lore.kernel.org/linuxppc-dev/944d10da-8200-4ba9-8d0a-3bed9aa99...@linux.ibm.com/

Ard, I was under the impression that the 32-bit arm kernel was (virtually)
relocatable, but I couldn't spot where, and suspect I'm mistaken. Do you know
whether it currently does any boot-time dynamic relocation?

Kees, there's an x86_64 relocation question for you at the end.

On Wed, Jan 26, 2022 at 02:37:16PM +, Mark Rutland wrote:
> Hi,
> 
> Steve pointed me at this thread over IRC -- I'm not subscribed to this list so
> grabbed a copy of the thread thus far via b4.
> 
> On Tue, Jan 25, 2022 at 11:20:27AM +0800, Yinan Liu wrote:
> > > Yeah, I think it's time to opt in, instead of opting out.
> 
> I agree this must be opt-in rather than opt-out.
> 
> However, I think most architectures were broken (in at least some
> configurations) by commit:
> 
>   72b3942a173c387b ("scripts: ftrace - move the sort-processing in 
> ftrace_init")
> 
> ... and so I don't think this fix is correct as-is, and we might want to
> revert that or at least mark it as BROKEN for now.

Steve asked for a bit more detail on IRC, so the below is an attempt to explain
what's actually going on here.

The short answer is that relocatable kernels (e.g. those with KASLR support)
need to handle the kernel being loaded at (somewhat) arbitrary virtual
addresses. Even where code can be position-independent, any pointers in the
kernel image (e.g. the contents of the mcount_loc table) need to be updated to
account for the specific VA the kernel was loaded at -- arch code does this
early at boot time by applying dynamic (ELF) relocations.

Walking through how we get there, considering arm64 specifically:

1) When an object is created with traceable functions:

   The compiler records the addresses of the callsites into a section. Those
   are absolute virtual addresses, but the final virtual addresses are not yet
   known, so the compiler generates ELF relocations to tell the linker how to
   fill these in later.

   On arm64, since the compiler doesn't know the final value yet, it fills the
   actual values with zero for now. Other architectures might do differently.

   For example, per `objdump -r init/main.o`:

   | RELOCATION RECORDS FOR [__patchable_function_entries]:
   | OFFSET   TYPE  VALUE 
   |  R_AARCH64_ABS64   .text+0x0028
   | 0008 R_AARCH64_ABS64   .text+0x0088
   | 0010 R_AARCH64_ABS64   .text+0x00e8

2) When vmlinux is linked:

   The linker script accumulates the callsite pointers from all the object
   files into the mcount_loc table. Since the kernel is relocatable, the
   runtime absolute addresses are still not yet known, but the offset relative
   to the kernel base is known, and so the linker consumes the absolute
   relocations created by the compiler and generates new relocations relative
   to the kernel's default load address so that these can be adjusted at boot
   time.

   On arm64, those are RELA and/or RELR relocations, which our vmlinux.lds.S
   accumulates those into a location in the initdata section that the kernel
   can find at boot time:

 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/vmlinux.lds.S?h=v5.17-rc1#n252
 
   For example, per `objdump -s vmlinux -j .rela.dyn`:

   | vmlinux: file format elf64-littleaarch64
   | 
   | Contents of section .rela.dyn:
   |  89beb0d0 b01a0b09 0080 0304   
   |  89beb0e0 00600b09 0080 b81a0b09 0080  .`..
   |  89beb0f0 0304  80710b09 0080  .q..
   |  89beb100 e8670b09 0080 0304   .g..
   |  89beb110 48e60809 0080 f0670b09 0080  Hg..
   |  89beb120 0304  ec190b09 0080  

   Each of the relocations in .rela.dyn consists of 3 64-bit little-endian
   values:

   * The first (e.g. 0x890b1ab0) is the VA of the pointer to write to,
 assuming the kernel's default load address (e.g. 0x8800). An
 offset must be applied to this depending on where the kernel was actually
 loaded relative to that default load address.

   * The second (e.g. 0x0403) is the ELF relocation type (1027, AKA
 R_AARCH64_RELATIVE).

   * The third, (e.g. 0x890b6000) is the VA to write to the pointer,
 assuming the kernel's default load address (e.g. 0x8800). An
 offset must be applied to this depending on where the kernel was actually
 loaded relative to that default load address.
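
   Putting those three words together, the boot-time application of one such
   R_AARCH64_RELATIVE entry amounts to the following (a C sketch only -- the
   real code in relocate_kernel() is assembly; Elf64_Rela is the standard ELF
   type from <elf.h>):

   #include <elf.h>	/* Elf64_Rela: r_offset, r_info, r_addend */

   /* offset = actual load address - default load address */
   static void apply_relative(const Elf64_Rela *rela, unsigned long offset)
   {
   	unsigned long *loc = (unsigned long *)(rela->r_offset + offset);

   	*loc = rela->r_addend + offset;	/* value written ignores what was there */
   }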

   The AArch64 ELF spec defines our relocations, e.g.

 
https://github.com/ARM-software/abi-aa/blob/main/aaelf64/aaelf64.rst#5712dynamic-relocations

   In general, relocations 

Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-26 Thread Mark Rutland
Hi,

Steve pointed me at this thread over IRC -- I'm not subscribed to this list so
grabbed a copy of the thread thus far via b4.

On Tue, Jan 25, 2022 at 11:20:27AM +0800, Yinan Liu wrote:
> > Yeah, I think it's time to opt in, instead of opting out.

I agree this must be opt-in rather than opt-out.

However, I think most architectures were broken (in at least some
configurations) by commit:

  72b3942a173c387b ("scripts: ftrace - move the sort-processing in ftrace_init")

... and so I don't think this fix is correct as-is, and we might want to revert
that or at least mark it as BROKEN for now.

More on that below.

> > 
> > Something like this:
> > 
> > -- Steve
> > 
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index c2724d986fa0..5256ebe57451 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -82,6 +82,7 @@ config ARM
> > select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
> > select HAVE_CONTEXT_TRACKING
> > select HAVE_C_RECORDMCOUNT
> > +   select HAVE_BUILDTIME_MCOUNT_SORT
> > select HAVE_DEBUG_KMEMLEAK if !XIP_KERNEL
> > select HAVE_DMA_CONTIGUOUS if MMU
> > select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU

IIUC the 32-bit arm kernel can be relocated at boot time, so I don't believe
this is correct, and I believe any relocatable arm kernel has been broken since
that was introduced.

> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index c4207cf9bb17..7996548b2b27 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -166,6 +166,7 @@ config ARM64
> > select HAVE_ASM_MODVERSIONS
> > select HAVE_EBPF_JIT
> > select HAVE_C_RECORDMCOUNT
> > +   select HAVE_BUILDTIME_MCOUNT_SORT
> > select HAVE_CMPXCHG_DOUBLE
> > select HAVE_CMPXCHG_LOCAL
> > select HAVE_CONTEXT_TRACKING

The arm64 kernel is relocatable by default, and has been broken since the
build-time sort was introduced -- I see ftrace test failures, and the
CONFIG_FTRACE_SORT_STARTUP_TEST screams at boot time.

> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 7399327d1eff..46080dea5dba 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -186,6 +186,7 @@ config X86
> > select HAVE_CONTEXT_TRACKING_OFFSTACK   if HAVE_CONTEXT_TRACKING
> > select HAVE_C_RECORDMCOUNT
> > select HAVE_OBJTOOL_MCOUNT  if STACK_VALIDATION
> > +   select HAVE_BUILDTIME_MCOUNT_SORT

Isn't x86 relocatable in some configurations (e.g. for KASLR)?

I can't see how the sort works for those cases, because the mcount_loc entries
are absolute, and either:

* The sorted entries will get overwritten by the unsorted relocation entries,
  and won't be sorted.

* The sorted entries won't get overwritten, but then the absolute address will
  be wrong since they hadn't been relocated.

How does that work?

Thanks,
Mark.

> > select HAVE_DEBUG_KMEMLEAK
> > select HAVE_DMA_CONTIGUOUS
> > select HAVE_DYNAMIC_FTRACE
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 752ed89a293b..7e5b92090faa 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -70,10 +70,16 @@ config HAVE_C_RECORDMCOUNT
> > help
> >   C version of recordmcount available?
> > +config HAVE_BUILDTIME_MCOUNT_SORT
> > +   bool
> > +   help
> > + An architecture selects this if it sorts the mcount_loc section
> > +at build time.
> > +
> >   config BUILDTIME_MCOUNT_SORT
> >  bool
> >  default y
> > -   depends on BUILDTIME_TABLE_SORT && !S390
> > +   depends on HAVE_BUILDTIME_MCOUNT_SORT
> >  help
> >Sort the mcount_loc section at build time.
> 
> LGTM. This will no longer destroy ftrace on other architectures.
> Arches that we are not sure about can test and enable this option
> themselves.
> 
> 
> Best regards
> --yinan
> 


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-25 Thread Steven Rostedt
On Tue, 25 Jan 2022 09:30:51 +0530
Sachin Sant  wrote:

> Tested-by: Sachin Sant 

Thanks, I'll start running it through my tests and send it to Linus later
today or tomorrow.

-- Steve


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-24 Thread Sachin Sant


> On 24-Jan-2022, at 10:15 PM, Steven Rostedt  wrote:
> 
> On Mon, 24 Jan 2022 20:15:06 +0800
> Yinan Liu  wrote:
> 
>> Hi, Steven and Sachin
>> 
>> I don't have a powerpc machine for testing; I guess ppc has a
>> similar problem to s390. It's not clear to me why the compiler
>> does this. Maybe we can handle ppc the way you handled s390 before,
>> but I'm not sure whether other architectures have similar issues.
>> Or should we limit BUILDTIME_MCOUNT_SORT to a smaller scope and make
>> it available only for x86 and arm?
>> 
>> Steven, what's your opinion?
> 
> Yeah, I think it's time to opt in, instead of opting out.
> 
> Something like this:
> 
Thanks. This fixes the reported problem.

Tested-by: Sachin Sant 

- Sachin


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-24 Thread Yinan Liu

Yeah, I think it's time to opt in, instead of opting out.

Something like this:

-- Steve

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c2724d986fa0..5256ebe57451 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -82,6 +82,7 @@ config ARM
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
select HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK if !XIP_KERNEL
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..7996548b2b27 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -166,6 +166,7 @@ config ARM64
select HAVE_ASM_MODVERSIONS
select HAVE_EBPF_JIT
select HAVE_C_RECORDMCOUNT
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_CONTEXT_TRACKING
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7399327d1eff..46080dea5dba 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -186,6 +186,7 @@ config X86
select HAVE_CONTEXT_TRACKING_OFFSTACK   if HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
select HAVE_OBJTOOL_MCOUNT  if STACK_VALIDATION
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 752ed89a293b..7e5b92090faa 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -70,10 +70,16 @@ config HAVE_C_RECORDMCOUNT
help
  C version of recordmcount available?
  
+config HAVE_BUILDTIME_MCOUNT_SORT
+   bool
+   help
+     An architecture selects this if it sorts the mcount_loc section
+     at build time.
+
  config BUILDTIME_MCOUNT_SORT
 bool
 default y
-   depends on BUILDTIME_TABLE_SORT && !S390
+   depends on HAVE_BUILDTIME_MCOUNT_SORT
 help
   Sort the mcount_loc section at build time.


LGTM. This will no longer destroy ftrace on other architectures.
Arches that we are not sure about can test and enable this option
themselves.



Best regards
--yinan


Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-24 Thread Yinan Liu

On 2022/1/24 at 5:19 PM, Sachin Sant wrote:

While running the stress_code_patching test from selftests/powerpc/mm
against 5.17-rc1 booted on a POWER10 LPAR, the following ftrace warning
is seen:

WARNING: CPU: 1 PID: 2017392 at kernel/trace/ftrace.c:2068 
ftrace_bug+0x274/0x2d8
Modules linked in: dm_mod bonding rfkill sunrpc pseries_rng xts vmx_crypto 
uio_pdrv_genirq uio sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg 
ibmvscsi ibmveth scsi_transport_srp fuse
CPU: 1 PID: 2017392 Comm: stress_code_pat Not tainted 5.17.0-rc1-gdd81e1c7d5fb 
#1
NIP:  c02d561c LR: c02d5618 CTR: 005b4448
REGS: c000332fb760 TRAP: 0700   Not tainted  (5.17.0-rc1-gdd81e1c7d5fb)
MSR:  8282b033   CR: 48228224  XER: 0009
CFAR: c01f6b00 IRQMASK: 0
GPR00: c02d5618 c000332fba00 c2a2 0022
GPR04: 7fff c000332fb720 c000332fb718 0027
GPR08: c0167cca7e10 0001 0027 c28d6d08
GPR12: 8000 c0167fa30780 4000 7fff9a089798
GPR16: 7fff9a089724 7fff9a026be8 7fff99fbf4f0 7fff9a08d568
GPR20: 7fffce533ed0 0001 7fff9a0399d8 7fffd9eccf94
GPR24: 0001  c000332fbc70 c0fb0d18
GPR28: c0ff5080 c0fadd38 c20032ec c70800a8
NIP [c02d561c] ftrace_bug+0x274/0x2d8
LR [c02d5618] ftrace_bug+0x270/0x2d8


Hi, Steven and Sachin

I don't have a powerpc machine for testing; I guess ppc has a
similar problem to s390. It's not clear to me why the compiler
does this. Maybe we can handle ppc the way you handled s390 before,
but I'm not sure whether other architectures have similar issues.
Or should we limit BUILDTIME_MCOUNT_SORT to a smaller scope and make
it available only for x86 and arm?


Steven, what's your opinion?


Best regards
--yinan



Re: [powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests

2022-01-24 Thread Steven Rostedt
On Mon, 24 Jan 2022 20:15:06 +0800
Yinan Liu  wrote:

> Hi, Steven and Sachin
> 
> I don't have a powerpc machine for testing; I guess ppc has a
> similar problem to s390. It's not clear to me why the compiler
> does this. Maybe we can handle ppc the way you handled s390 before,
> but I'm not sure whether other architectures have similar issues.
> Or should we limit BUILDTIME_MCOUNT_SORT to a smaller scope and make
> it available only for x86 and arm?
> 
> Steven, what's your opinion?

Yeah, I think it's time to opt in, instead of opting out.

Something like this:

-- Steve

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c2724d986fa0..5256ebe57451 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -82,6 +82,7 @@ config ARM
select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32
select HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK if !XIP_KERNEL
select HAVE_DMA_CONTIGUOUS if MMU
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..7996548b2b27 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -166,6 +166,7 @@ config ARM64
select HAVE_ASM_MODVERSIONS
select HAVE_EBPF_JIT
select HAVE_C_RECORDMCOUNT
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_CONTEXT_TRACKING
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7399327d1eff..46080dea5dba 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -186,6 +186,7 @@ config X86
select HAVE_CONTEXT_TRACKING_OFFSTACK   if HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
select HAVE_OBJTOOL_MCOUNT  if STACK_VALIDATION
+   select HAVE_BUILDTIME_MCOUNT_SORT
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 752ed89a293b..7e5b92090faa 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -70,10 +70,16 @@ config HAVE_C_RECORDMCOUNT
help
  C version of recordmcount available?
 
+config HAVE_BUILDTIME_MCOUNT_SORT
+   bool
+   help
+     An architecture selects this if it sorts the mcount_loc section
+     at build time.
+
 config BUILDTIME_MCOUNT_SORT
bool
default y
-   depends on BUILDTIME_TABLE_SORT && !S390
+   depends on HAVE_BUILDTIME_MCOUNT_SORT
help
  Sort the mcount_loc section at build time.