On Tue, 16 Jun 2020 at 23:40, Will Deacon <w...@kernel.org> wrote:
>
> On Fri, Jun 05, 2020 at 03:22:57PM -0700, Saravana Kannan wrote:
> > When loading a module, module_frob_arch_sections() tries to figure out
> > the number of PLTs that'll be needed to handle all the RELAs. While
> > doing this, it tries to dedupe PLT allocations for multiple
> > R_AARCH64_CALL26 relocations to the same symbol. It does the same for
> > R_AARCH64_JUMP26 relocations too.
> >
> > To make checks for duplicates easier/faster, it sorts the relocation
> > list by type, symbol and addend. That way, to check for a duplicate
> > relocation, it just needs to compare with the previous entry.
> >
> > However, sorting the entire relocation array is unnecessary and
> > expensive (O(n log n)) because there are a lot of other relocation types
> > that don't need deduping or can't be deduped.
> >
> > So this commit partitions the array into entries that need deduping and
> > those that don't. And then sorts just the part that needs deduping. And
> > when CONFIG_RANDOMIZE_BASE is disabled, the sorting is skipped entirely
> > because PLTs are not allocated for R_AARCH64_CALL26 and R_AARCH64_JUMP26
> > if it's disabled.
> >
> > This gives significant reduction in module load time for modules with
> > large number of relocations with no measurable impact on modules with a
> > small number of relocations. In my test setup with CONFIG_RANDOMIZE_BASE
> > enabled, the load time for one module went down from 268ms to 100ms.
> > Another module went down from 143ms to 83ms.
>
> Whilst I can see that's a significant relative saving, what proportion of
> actual boot time are we talking about here? It would be interesting to
> know if there are bigger potential savings elsewhere.
>

Also, 'one module' vs 'another module' doesn't really say anything. Please
name the modules in question and give their sizes, so the numbers can be
put in context.
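For reference, once the CALL26/JUMP26 relas are sorted by type, symbol and
addend, the duplicate check the commit message describes boils down to
comparing each entry with the one before it, roughly as in the sketch
below. This is an illustration only, not the helper used in module-plts.c;
it relies on the fact that, for ELF64, r_info packs both the symbol index
and the relocation type, so comparing r_info and r_addend covers all three
sort keys:

/*
 * Sketch only: true if rela[num] duplicates the previous entry in the
 * sorted array and therefore needs no new PLT slot.
 */
static bool is_duplicate_rela(const Elf64_Rela *rela, int num)
{
	return num > 0 &&
	       rela[num].r_info == rela[num - 1].r_info &&
	       rela[num].r_addend == rela[num - 1].r_addend;
}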
> > This commit also disables the sorting if CONFIG_RANDOMIZE_BASE is
> > disabled because it looks like PLTs are not allocated for
> > R_AARCH64_CALL26 and R_AARCH64_JUMP26 if it's disabled.
> >
> > Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
> > Signed-off-by: Saravana Kannan <sarava...@google.com>
> > ---
> >  arch/arm64/kernel/module-plts.c | 37 ++++++++++++++++++++++++++++++++-
> >  1 file changed, 36 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
> > index 65b08a74aec6..bf5118b3b828 100644
> > --- a/arch/arm64/kernel/module-plts.c
> > +++ b/arch/arm64/kernel/module-plts.c
> > @@ -253,6 +253,36 @@ static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num,
> >  	return ret;
> >  }
> >
> > +static bool rela_needs_dedup(Elf64_Rela *rela)
> > +{
> > +	return ELF64_R_TYPE(rela->r_info) == R_AARCH64_JUMP26
> > +	       || ELF64_R_TYPE(rela->r_info) == R_AARCH64_CALL26;
> > +}
>

Would it help to check the section index here as well? Call/jump
instructions within a section are never sent through a PLT entry.

> Does this handle A53 erratum 843419 correctly? I'm worried that we skip
> the ADRP PLTs there.
>

ADRP PLTs cannot be deduplicated, as they incorporate a relative jump
back to the caller.

> > +
> > +/* Group the CALL26/JUMP26 relas toward the beginning of the array. */
> > +static int partition_dedup_relas(Elf64_Rela *rela, int numrels)
> > +{
> > +	int i = 0, j = numrels - 1;
> > +	Elf64_Rela t;
> > +
> > +	while (i < j) {
> > +		while (rela_needs_dedup(rela + i) && i < j)
> > +			i++;
> > +		while (!rela_needs_dedup(rela + j) && i < j)
> > +			j--;
> > +		if (i < j) {
> > +			t = *(rela + j);
> > +			*(rela + j) = *(rela + i);
> > +			*(rela + i) = t;
> > +		}
> > +	}
>
> This is very hard to read and I think some of the 'i < j' comparisons are
> redundant. Would it make more sense to assign a temporary rather than
> post-inc/decrement and recheck?
>

Agreed. Also, what's wrong with [] array indexing?
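Something along the lines of the sketch below is roughly what I would
expect this to turn into, with [] indexing and a single bounds check per
iteration. Untested, and it assumes the function is meant to return the
number of relas that need deduping (the tail of the function isn't visible
in the hunk quoted above):

/*
 * Untested sketch, not the patch itself: move the CALL26/JUMP26 relas to
 * the front of the array and return how many there are.
 */
static int partition_dedup_relas(Elf64_Rela *rela, int numrels)
{
	int i = 0, j = numrels - 1;

	while (i < j) {
		if (rela_needs_dedup(&rela[i])) {
			i++;
		} else if (!rela_needs_dedup(&rela[j])) {
			j--;
		} else {
			Elf64_Rela tmp = rela[i];

			rela[i] = rela[j];
			rela[j] = tmp;
		}
	}

	/* i == j here: classify the element the two indices met on. */
	if (numrels > 0 && rela_needs_dedup(&rela[i]))
		i++;

	return i;
}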