Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Nicolas Pitre writes: > On Tue, 12 Nov 2013, Måns Rullgård wrote: > >> Nicolas Pitre writes: >> >> > On Tue, 12 Nov 2013, Ben Dooks wrote: >> > >> >> Given these are single instructoins for ARM, is it possible we could >> >> make a table of all the callers and fix them up when we initialise >> >> as we do for the SMP/UP case and for page-offset? >> > >> > Not really. Calls to those functions are generated by the compiler >> > implicitly when a divisor operand is used and therefore we cannot >> > annotate those calls. We'd have to use special accessors everywhere to >> > replace the standard division operand (like we do for 64 by 32 bit >> > divisions) but I doubt that people would accept that. >> >> It might be possible to extract this information from relocation tables. > > True, but only for individual .o files. Once the linker puts them > together the information is lost, and trying to infer what the linker > has done is insane. > > Filtering the compiler output to annotate idiv calls before it is > assembled would probably be a better solution. OK, here's an extremely ugly hootenanny of a patch. It seems to work on an A7 Cubieboard2. I would never suggest actually doing this, but maybe it can be useful for comparing performance against the more palatable solutions. diff --git a/arch/arm/Makefile b/arch/arm/Makefile index 7397db6..cf1cd30 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -113,7 +113,7 @@ endif endif # Need -Uarm for gcc < 3.x -KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_THUMB2) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm +KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_THUMB2) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm -include asm/divhack.h KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_THUMB2) $(arch-y) $(tune-y) -include asm/unified.h -msoft-float CHECKFLAGS += -D__arm__ diff --git a/arch/arm/include/asm/divhack.h b/arch/arm/include/asm/divhack.h new file mode 100644 index 000..c750b78 --- /dev/null +++ b/arch/arm/include/asm/divhack.h @@ -0,0 +1,23 @@ +__asm__ (".macro dobl tgt \n" + ".ifc \\tgt, __aeabi_idiv \n" + ".L.sdiv.\\@: \n" + ".pushsection .sdiv_tab.init, \"a\", %progbits \n" + ".word.L.sdiv.\\@ \n" + ".popsection \n" + ".endif\n" + ".ifc \\tgt, __aeabi_uidiv \n" + ".L.udiv.\\@: \n" + ".pushsection .udiv_tab.init, \"a\", %progbits \n" + ".word.L.udiv.\\@ \n" + ".popsection \n" + ".endif\n" + "bl \\tgt \n" + ".endm \n" + ".macro defbl \n" + ".macro bl tgt \n" + ".purgem bl\n" + "dobl \\tgt\n" + "defbl \n" + ".endm \n" + ".endm \n" + "defbl \n"); diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 067815c1..b3a3fe1 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -375,6 +375,18 @@ static void __init cpuid_init_hwcaps(void) case 1: elf_hwcap |= HWCAP_IDIVT; } + + if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVA)) { + extern u32 __sdiv_tab_start, __sdiv_tab_end; + extern u32 __udiv_tab_start, __udiv_tab_end; + u32 *div; + + for (div = &__sdiv_tab_start; div < &__sdiv_tab_end; div++) + *(u32 *)*div = 0xe710f110; + + for (div = &__udiv_tab_start; div < &__udiv_tab_end; div++) + *(u32 *)*div = 0xe730f110; + } } static void __init feat_v6_fixup(void) diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index 43a31fb..3d5c103 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -176,6 +176,8 @@ SECTIONS CON_INITCALL SECURITY_INITCALL INIT_RAM_FS + __sdiv_tab_start = .; *(.sdiv_tab.init); __sdiv_tab_end = .; +
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Nicolas Pitre wrote: > On Tue, 12 Nov 2013, Måns Rullgård wrote: > > > It might be possible to extract this information from relocation tables. > > True, but only for individual .o files. Once the linker puts them > together the information is lost, and trying to infer what the linker > has done is insane. > > Filtering the compiler output to annotate idiv calls before it is > assembled would probably be a better solution. Another solution is to patch the call site from within __aeabi_idiv at run time using lr. That wouldn't work in the presence of tail call optimization though. Again this might not be worth it and patching __aeabi_idiv instead might be a good enough compromize. Nicolas
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Måns Rullgård wrote: > Nicolas Pitre writes: > > > On Tue, 12 Nov 2013, Ben Dooks wrote: > > > >> Given these are single instructoins for ARM, is it possible we could > >> make a table of all the callers and fix them up when we initialise > >> as we do for the SMP/UP case and for page-offset? > > > > Not really. Calls to those functions are generated by the compiler > > implicitly when a divisor operand is used and therefore we cannot > > annotate those calls. We'd have to use special accessors everywhere to > > replace the standard division operand (like we do for 64 by 32 bit > > divisions) but I doubt that people would accept that. > > It might be possible to extract this information from relocation tables. True, but only for individual .o files. Once the linker puts them together the information is lost, and trying to infer what the linker has done is insane. Filtering the compiler output to annotate idiv calls before it is assembled would probably be a better solution. Is it worth it? I'm not sure. Nicolas
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Måns Rullgård wrote: > Nicolas Pitre writes: > > > What about this patch which I think is currently your best option. Note > > it would need to use the facilities from asm/opcodes.h to make it endian > > agnostic. > > > > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c > > index 6a1b8a81b1..379cffe4ab 100644 > > --- a/arch/arm/kernel/setup.c > > +++ b/arch/arm/kernel/setup.c > > @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) > > elf_hwcap |= HWCAP_IDIVT; > > } > > > > + /* > > +* Patch our division routines with the corresponding opcode > > +* if the hardware supports it. > > +*/ > > + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) { > > + extern char __aeabi_uidiv, __aeabi_idiv; > > It would be safer to declare these as arrays of unspecified size. > Otherwise the compiler might do evil things with what to it looks like > out of bounds indexing. Right. > > There should also be some cache maintenance after this patching, or is > that already happening for some other reason? This is so early during boot that the MMU isn't even fully initialized yet. The cache will be flushed. Nicolas
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Nicolas Pitre writes: > On Tue, 12 Nov 2013, Ben Dooks wrote: > >> Given these are single instructoins for ARM, is it possible we could >> make a table of all the callers and fix them up when we initialise >> as we do for the SMP/UP case and for page-offset? > > Not really. Calls to those functions are generated by the compiler > implicitly when a divisor operand is used and therefore we cannot > annotate those calls. We'd have to use special accessors everywhere to > replace the standard division operand (like we do for 64 by 32 bit > divisions) but I doubt that people would accept that. It might be possible to extract this information from relocation tables. -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Ben Dooks wrote: > Given these are single instructoins for ARM, is it possible we could > make a table of all the callers and fix them up when we initialise > as we do for the SMP/UP case and for page-offset? Not really. Calls to those functions are generated by the compiler implicitly when a divisor operand is used and therefore we cannot annotate those calls. We'd have to use special accessors everywhere to replace the standard division operand (like we do for 64 by 32 bit divisions) but I doubt that people would accept that. You cannot just scan the binary for the appropriate branch opcode either as you may turn up false positives in literal pools. Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Nicolas Pitre writes: > What about this patch which I think is currently your best option. Note > it would need to use the facilities from asm/opcodes.h to make it endian > agnostic. > > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c > index 6a1b8a81b1..379cffe4ab 100644 > --- a/arch/arm/kernel/setup.c > +++ b/arch/arm/kernel/setup.c > @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) > elf_hwcap |= HWCAP_IDIVT; > } > > + /* > + * Patch our division routines with the corresponding opcode > + * if the hardware supports it. > + */ > + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) { > + extern char __aeabi_uidiv, __aeabi_idiv; It would be safer to declare these as arrays of unspecified size. Otherwise the compiler might do evil things with what to it looks like out of bounds indexing. There should also be some cache maintenance after this patching, or is that already happening for some other reason? -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On 12/11/13 14:04, Russell King - ARM Linux wrote: On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote: What about this patch which I think is currently your best option. Note it would need to use the facilities from asm/opcodes.h to make it endian agnostic. diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 6a1b8a81b1..379cffe4ab 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + /* +* Patch our division routines with the corresponding opcode +* if the hardware supports it. +*/ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u16 *uidiv = (u16 *)&__aeabi_uidiv; + u16 *idiv = (u16 *)&__aeabi_idiv; + + uidiv[0] = 0xfbb0; /* udiv r0, r0, r1 */ + uidiv[1] = 0xf0f1; + uidiv[2] = 0x4770; /* bx lr */ + + idiv[0] = 0xfb90; /* sdiv r0, r0, r1 */ + idiv[1] = 0xf0f1; + idiv[2] = 0x4770; /* bx lr */ + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVA)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u32 *uidiv = (u32 *)&__aeabi_uidiv; + u32 *idiv = (u32 *)&__aeabi_idiv; + + uidiv[0] = 0xe730f110; /* udiv r0, r0, r1 */ + uidiv[1] = 0xe12fff1e; /* bx lr */ + + idiv[0] = 0xe710f110; /* sdiv r0, r0, r1 */ + idiv[1] = 0xe12fff1e; /* bx lr */ + } What about endianness, and what if XIP is enabled? I was also going to add a note about endian-ness. Given these are single instructoins for ARM, is it possible we could make a table of all the callers and fix them up when we initialise as we do for the SMP/UP case and for page-offset? -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Russell King - ARM Linux wrote: > On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote: > > What about this patch which I think is currently your best option. Note > > it would need to use the facilities from asm/opcodes.h to make it endian > > agnostic. > > > > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c > > index 6a1b8a81b1..379cffe4ab 100644 > > --- a/arch/arm/kernel/setup.c > > +++ b/arch/arm/kernel/setup.c > > @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) > > elf_hwcap |= HWCAP_IDIVT; > > } > > > > + /* > > +* Patch our division routines with the corresponding opcode > > +* if the hardware supports it. > > +*/ > > + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) { > > + extern char __aeabi_uidiv, __aeabi_idiv; > > + u16 *uidiv = (u16 *)&__aeabi_uidiv; > > + u16 *idiv = (u16 *)&__aeabi_idiv; > > + > > + uidiv[0] = 0xfbb0; /* udiv r0, r0, r1 */ > > + uidiv[1] = 0xf0f1; > > + uidiv[2] = 0x4770; /* bx lr */ > > + > > + idiv[0] = 0xfb90; /* sdiv r0, r0, r1 */ > > + idiv[1] = 0xf0f1; > > + idiv[2] = 0x4770; /* bx lr */ > > + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & > > HWCAP_IDIVA)) { > > + extern char __aeabi_uidiv, __aeabi_idiv; > > + u32 *uidiv = (u32 *)&__aeabi_uidiv; > > + u32 *idiv = (u32 *)&__aeabi_idiv; > > + > > + uidiv[0] = 0xe730f110; /* udiv r0, r0, r1 */ > > + uidiv[1] = 0xe12fff1e; /* bx lr */ > > + > > + idiv[0] = 0xe710f110; /* sdiv r0, r0, r1 */ > > + idiv[1] = 0xe12fff1e; /* bx lr */ > > + } > > What about endianness, and what if XIP is enabled? Just as I said above the diff: this needs refined. Obviously XIP can't use this and doesn't need it either as a XIP kernel should be optimized for the very platform it will run onto i.e. gcc should already emit those div opcodes inline if appropriate. Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote: > What about this patch which I think is currently your best option. Note > it would need to use the facilities from asm/opcodes.h to make it endian > agnostic. > > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c > index 6a1b8a81b1..379cffe4ab 100644 > --- a/arch/arm/kernel/setup.c > +++ b/arch/arm/kernel/setup.c > @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) > elf_hwcap |= HWCAP_IDIVT; > } > > + /* > + * Patch our division routines with the corresponding opcode > + * if the hardware supports it. > + */ > + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) { > + extern char __aeabi_uidiv, __aeabi_idiv; > + u16 *uidiv = (u16 *)&__aeabi_uidiv; > + u16 *idiv = (u16 *)&__aeabi_idiv; > + > + uidiv[0] = 0xfbb0; /* udiv r0, r0, r1 */ > + uidiv[1] = 0xf0f1; > + uidiv[2] = 0x4770; /* bx lr */ > + > + idiv[0] = 0xfb90; /* sdiv r0, r0, r1 */ > + idiv[1] = 0xf0f1; > + idiv[2] = 0x4770; /* bx lr */ > + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & > HWCAP_IDIVA)) { > + extern char __aeabi_uidiv, __aeabi_idiv; > + u32 *uidiv = (u32 *)&__aeabi_uidiv; > + u32 *idiv = (u32 *)&__aeabi_idiv; > + > + uidiv[0] = 0xe730f110; /* udiv r0, r0, r1 */ > + uidiv[1] = 0xe12fff1e; /* bx lr */ > + > + idiv[0] = 0xe710f110; /* sdiv r0, r0, r1 */ > + idiv[1] = 0xe12fff1e; /* bx lr */ > + } What about endianness, and what if XIP is enabled? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Mon, 11 Nov 2013, Stephen Boyd wrote: > On 11/09/13 21:03, Nicolas Pitre wrote: > > Bah. NAK. We are doing runtime patching of the kernel for many > > many things already. So why not do the same here? > > static keys are a form of runtime patching, albeit not as extreme as > you're suggesting. > > > > > The obvious strategy is to simply overwrite the start of the existing > > __aeabi_idiv code with the "sdiv r0, r0, r1" and "bx lr" opcodes. > > > > Similarly for the unsigned case. > > I was thinking the same thing when I wrote this, but I didn't know how > to tell the compiler to either inline this function or to let me inilne > an assembly stub with some section magic. > > > > > That let you test the hardware capability only once during boot instead > > of everytime a divide operation is performed. > > The test for hardware capability really isn't done more than once during > boot. The assembly is like so at compile time > > <__aeabi_idiv>: >0: nop {0} >4: b 0 <___aeabi_idiv> >8: sdivr0, r0, r1 >c: bx lr > > and after we test and find support for the instruction it will be > replaced with > > <__aeabi_idiv>: >0: b 8 >4: b 0 <___aeabi_idiv> >8: sdivr0, r0, r1 >c: bx lr > > Unfortunately we still have to jump to this function. It would be great > if we could inline this function at the call site but as I already said > I don't know how to do that. What about this patch which I think is currently your best option. Note it would need to use the facilities from asm/opcodes.h to make it endian agnostic. diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 6a1b8a81b1..379cffe4ab 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + /* +* Patch our division routines with the corresponding opcode +* if the hardware supports it. +*/ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u16 *uidiv = (u16 *)&__aeabi_uidiv; + u16 *idiv = (u16 *)&__aeabi_idiv; + + uidiv[0] = 0xfbb0; /* udiv r0, r0, r1 */ + uidiv[1] = 0xf0f1; + uidiv[2] = 0x4770; /* bx lr */ + + idiv[0] = 0xfb90; /* sdiv r0, r0, r1 */ + idiv[1] = 0xf0f1; + idiv[2] = 0x4770; /* bx lr */ + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVA)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u32 *uidiv = (u32 *)&__aeabi_uidiv; + u32 *idiv = (u32 *)&__aeabi_idiv; + + uidiv[0] = 0xe730f110; /* udiv r0, r0, r1 */ + uidiv[1] = 0xe12fff1e; /* bx lr */ + + idiv[0] = 0xe710f110; /* sdiv r0, r0, r1 */ + idiv[1] = 0xe12fff1e; /* bx lr */ + } + /* LPAE implies atomic ldrd/strd instructions */ vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0) & 0xf) >> 0; if (vmsa >= 5) Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Stephen Boyd writes: > On 11/09/13 21:03, Nicolas Pitre wrote: >> Bah. NAK. We are doing runtime patching of the kernel for many >> many things already. So why not do the same here? > > static keys are a form of runtime patching, albeit not as extreme as > you're suggesting. > >> >> The obvious strategy is to simply overwrite the start of the existing >> __aeabi_idiv code with the "sdiv r0, r0, r1" and "bx lr" opcodes. >> >> Similarly for the unsigned case. > > I was thinking the same thing when I wrote this, but I didn't know how > to tell the compiler to either inline this function or to let me inilne > an assembly stub with some section magic. > >> >> That let you test the hardware capability only once during boot instead >> of everytime a divide operation is performed. > > The test for hardware capability really isn't done more than once during > boot. The assembly is like so at compile time > > <__aeabi_idiv>: >0: nop {0} >4: b 0 <___aeabi_idiv> >8: sdivr0, r0, r1 >c: bx lr > > and after we test and find support for the instruction it will be > replaced with > > <__aeabi_idiv>: >0: b 8 >4: b 0 <___aeabi_idiv> >8: sdivr0, r0, r1 >c: bx lr > > Unfortunately we still have to jump to this function. It would be great > if we could inline this function at the call site but as I already said > I don't know how to do that. Ideally the bl instruction at the call site would be patched over with sdiv/udiv when supported. This would leave things exactly as they are for hardware without div capability and incur only the call setup cost (but no actual call) on div-capable hardware. No, I don't know how to achieve this. -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Stephen Boyd sb...@codeaurora.org writes: On 11/09/13 21:03, Nicolas Pitre wrote: Bah. NAK. We are doing runtime patching of the kernel for many many things already. So why not do the same here? static keys are a form of runtime patching, albeit not as extreme as you're suggesting. The obvious strategy is to simply overwrite the start of the existing __aeabi_idiv code with the sdiv r0, r0, r1 and bx lr opcodes. Similarly for the unsigned case. I was thinking the same thing when I wrote this, but I didn't know how to tell the compiler to either inline this function or to let me inilne an assembly stub with some section magic. That let you test the hardware capability only once during boot instead of everytime a divide operation is performed. The test for hardware capability really isn't done more than once during boot. The assembly is like so at compile time __aeabi_idiv: 0: nop {0} 4: b 0 ___aeabi_idiv 8: sdivr0, r0, r1 c: bx lr and after we test and find support for the instruction it will be replaced with __aeabi_idiv: 0: b 8 4: b 0 ___aeabi_idiv 8: sdivr0, r0, r1 c: bx lr Unfortunately we still have to jump to this function. It would be great if we could inline this function at the call site but as I already said I don't know how to do that. Ideally the bl instruction at the call site would be patched over with sdiv/udiv when supported. This would leave things exactly as they are for hardware without div capability and incur only the call setup cost (but no actual call) on div-capable hardware. No, I don't know how to achieve this. -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Mon, 11 Nov 2013, Stephen Boyd wrote: On 11/09/13 21:03, Nicolas Pitre wrote: Bah. NAK. We are doing runtime patching of the kernel for many many things already. So why not do the same here? static keys are a form of runtime patching, albeit not as extreme as you're suggesting. The obvious strategy is to simply overwrite the start of the existing __aeabi_idiv code with the sdiv r0, r0, r1 and bx lr opcodes. Similarly for the unsigned case. I was thinking the same thing when I wrote this, but I didn't know how to tell the compiler to either inline this function or to let me inilne an assembly stub with some section magic. That let you test the hardware capability only once during boot instead of everytime a divide operation is performed. The test for hardware capability really isn't done more than once during boot. The assembly is like so at compile time __aeabi_idiv: 0: nop {0} 4: b 0 ___aeabi_idiv 8: sdivr0, r0, r1 c: bx lr and after we test and find support for the instruction it will be replaced with __aeabi_idiv: 0: b 8 4: b 0 ___aeabi_idiv 8: sdivr0, r0, r1 c: bx lr Unfortunately we still have to jump to this function. It would be great if we could inline this function at the call site but as I already said I don't know how to do that. What about this patch which I think is currently your best option. Note it would need to use the facilities from asm/opcodes.h to make it endian agnostic. diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 6a1b8a81b1..379cffe4ab 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + /* +* Patch our division routines with the corresponding opcode +* if the hardware supports it. +*/ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVT)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u16 *uidiv = (u16 *)__aeabi_uidiv; + u16 *idiv = (u16 *)__aeabi_idiv; + + uidiv[0] = 0xfbb0; /* udiv r0, r0, r1 */ + uidiv[1] = 0xf0f1; + uidiv[2] = 0x4770; /* bx lr */ + + idiv[0] = 0xfb90; /* sdiv r0, r0, r1 */ + idiv[1] = 0xf0f1; + idiv[2] = 0x4770; /* bx lr */ + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVA)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u32 *uidiv = (u32 *)__aeabi_uidiv; + u32 *idiv = (u32 *)__aeabi_idiv; + + uidiv[0] = 0xe730f110; /* udiv r0, r0, r1 */ + uidiv[1] = 0xe12fff1e; /* bx lr */ + + idiv[0] = 0xe710f110; /* sdiv r0, r0, r1 */ + idiv[1] = 0xe12fff1e; /* bx lr */ + } + /* LPAE implies atomic ldrd/strd instructions */ vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0) 0xf) 0; if (vmsa = 5) Nicolas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote: What about this patch which I think is currently your best option. Note it would need to use the facilities from asm/opcodes.h to make it endian agnostic. diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 6a1b8a81b1..379cffe4ab 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + /* + * Patch our division routines with the corresponding opcode + * if the hardware supports it. + */ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVT)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u16 *uidiv = (u16 *)__aeabi_uidiv; + u16 *idiv = (u16 *)__aeabi_idiv; + + uidiv[0] = 0xfbb0; /* udiv r0, r0, r1 */ + uidiv[1] = 0xf0f1; + uidiv[2] = 0x4770; /* bx lr */ + + idiv[0] = 0xfb90; /* sdiv r0, r0, r1 */ + idiv[1] = 0xf0f1; + idiv[2] = 0x4770; /* bx lr */ + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVA)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u32 *uidiv = (u32 *)__aeabi_uidiv; + u32 *idiv = (u32 *)__aeabi_idiv; + + uidiv[0] = 0xe730f110; /* udiv r0, r0, r1 */ + uidiv[1] = 0xe12fff1e; /* bx lr */ + + idiv[0] = 0xe710f110; /* sdiv r0, r0, r1 */ + idiv[1] = 0xe12fff1e; /* bx lr */ + } What about endianness, and what if XIP is enabled? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On 12/11/13 14:04, Russell King - ARM Linux wrote: On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote: What about this patch which I think is currently your best option. Note it would need to use the facilities from asm/opcodes.h to make it endian agnostic. diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 6a1b8a81b1..379cffe4ab 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + /* +* Patch our division routines with the corresponding opcode +* if the hardware supports it. +*/ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVT)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u16 *uidiv = (u16 *)__aeabi_uidiv; + u16 *idiv = (u16 *)__aeabi_idiv; + + uidiv[0] = 0xfbb0; /* udiv r0, r0, r1 */ + uidiv[1] = 0xf0f1; + uidiv[2] = 0x4770; /* bx lr */ + + idiv[0] = 0xfb90; /* sdiv r0, r0, r1 */ + idiv[1] = 0xf0f1; + idiv[2] = 0x4770; /* bx lr */ + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVA)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u32 *uidiv = (u32 *)__aeabi_uidiv; + u32 *idiv = (u32 *)__aeabi_idiv; + + uidiv[0] = 0xe730f110; /* udiv r0, r0, r1 */ + uidiv[1] = 0xe12fff1e; /* bx lr */ + + idiv[0] = 0xe710f110; /* sdiv r0, r0, r1 */ + idiv[1] = 0xe12fff1e; /* bx lr */ + } What about endianness, and what if XIP is enabled? I was also going to add a note about endian-ness. Given these are single instructoins for ARM, is it possible we could make a table of all the callers and fix them up when we initialise as we do for the SMP/UP case and for page-offset? -- Ben Dooks http://www.codethink.co.uk/ Senior Engineer Codethink - Providing Genius -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Russell King - ARM Linux wrote: On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote: What about this patch which I think is currently your best option. Note it would need to use the facilities from asm/opcodes.h to make it endian agnostic. diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 6a1b8a81b1..379cffe4ab 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + /* +* Patch our division routines with the corresponding opcode +* if the hardware supports it. +*/ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVT)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u16 *uidiv = (u16 *)__aeabi_uidiv; + u16 *idiv = (u16 *)__aeabi_idiv; + + uidiv[0] = 0xfbb0; /* udiv r0, r0, r1 */ + uidiv[1] = 0xf0f1; + uidiv[2] = 0x4770; /* bx lr */ + + idiv[0] = 0xfb90; /* sdiv r0, r0, r1 */ + idiv[1] = 0xf0f1; + idiv[2] = 0x4770; /* bx lr */ + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVA)) { + extern char __aeabi_uidiv, __aeabi_idiv; + u32 *uidiv = (u32 *)__aeabi_uidiv; + u32 *idiv = (u32 *)__aeabi_idiv; + + uidiv[0] = 0xe730f110; /* udiv r0, r0, r1 */ + uidiv[1] = 0xe12fff1e; /* bx lr */ + + idiv[0] = 0xe710f110; /* sdiv r0, r0, r1 */ + idiv[1] = 0xe12fff1e; /* bx lr */ + } What about endianness, and what if XIP is enabled? Just as I said above the diff: this needs refined. Obviously XIP can't use this and doesn't need it either as a XIP kernel should be optimized for the very platform it will run onto i.e. gcc should already emit those div opcodes inline if appropriate. Nicolas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Nicolas Pitre nicolas.pi...@linaro.org writes: What about this patch which I think is currently your best option. Note it would need to use the facilities from asm/opcodes.h to make it endian agnostic. diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 6a1b8a81b1..379cffe4ab 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + /* + * Patch our division routines with the corresponding opcode + * if the hardware supports it. + */ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVT)) { + extern char __aeabi_uidiv, __aeabi_idiv; It would be safer to declare these as arrays of unspecified size. Otherwise the compiler might do evil things with what to it looks like out of bounds indexing. There should also be some cache maintenance after this patching, or is that already happening for some other reason? -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Ben Dooks wrote: Given these are single instructoins for ARM, is it possible we could make a table of all the callers and fix them up when we initialise as we do for the SMP/UP case and for page-offset? Not really. Calls to those functions are generated by the compiler implicitly when a divisor operand is used and therefore we cannot annotate those calls. We'd have to use special accessors everywhere to replace the standard division operand (like we do for 64 by 32 bit divisions) but I doubt that people would accept that. You cannot just scan the binary for the appropriate branch opcode either as you may turn up false positives in literal pools. Nicolas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Nicolas Pitre nicolas.pi...@linaro.org writes: On Tue, 12 Nov 2013, Ben Dooks wrote: Given these are single instructoins for ARM, is it possible we could make a table of all the callers and fix them up when we initialise as we do for the SMP/UP case and for page-offset? Not really. Calls to those functions are generated by the compiler implicitly when a divisor operand is used and therefore we cannot annotate those calls. We'd have to use special accessors everywhere to replace the standard division operand (like we do for 64 by 32 bit divisions) but I doubt that people would accept that. It might be possible to extract this information from relocation tables. -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Måns Rullgård wrote: Nicolas Pitre nicolas.pi...@linaro.org writes: What about this patch which I think is currently your best option. Note it would need to use the facilities from asm/opcodes.h to make it endian agnostic. diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 6a1b8a81b1..379cffe4ab 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + /* +* Patch our division routines with the corresponding opcode +* if the hardware supports it. +*/ + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVT)) { + extern char __aeabi_uidiv, __aeabi_idiv; It would be safer to declare these as arrays of unspecified size. Otherwise the compiler might do evil things with what to it looks like out of bounds indexing. Right. There should also be some cache maintenance after this patching, or is that already happening for some other reason? This is so early during boot that the MMU isn't even fully initialized yet. The cache will be flushed. Nicolas
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Måns Rullgård wrote: Nicolas Pitre nicolas.pi...@linaro.org writes: On Tue, 12 Nov 2013, Ben Dooks wrote: Given these are single instructoins for ARM, is it possible we could make a table of all the callers and fix them up when we initialise as we do for the SMP/UP case and for page-offset? Not really. Calls to those functions are generated by the compiler implicitly when a divisor operand is used and therefore we cannot annotate those calls. We'd have to use special accessors everywhere to replace the standard division operand (like we do for 64 by 32 bit divisions) but I doubt that people would accept that. It might be possible to extract this information from relocation tables. True, but only for individual .o files. Once the linker puts them together the information is lost, and trying to infer what the linker has done is insane. Filtering the compiler output to annotate idiv calls before it is assembled would probably be a better solution. Is it worth it? I'm not sure. Nicolas
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Tue, 12 Nov 2013, Nicolas Pitre wrote: On Tue, 12 Nov 2013, Måns Rullgård wrote: It might be possible to extract this information from relocation tables. True, but only for individual .o files. Once the linker puts them together the information is lost, and trying to infer what the linker has done is insane. Filtering the compiler output to annotate idiv calls before it is assembled would probably be a better solution. Another solution is to patch the call site from within __aeabi_idiv at run time using lr. That wouldn't work in the presence of tail call optimization though. Again this might not be worth it and patching __aeabi_idiv instead might be a good enough compromize. Nicolas
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Nicolas Pitre nicolas.pi...@linaro.org writes: On Tue, 12 Nov 2013, Måns Rullgård wrote: Nicolas Pitre nicolas.pi...@linaro.org writes: On Tue, 12 Nov 2013, Ben Dooks wrote: Given these are single instructoins for ARM, is it possible we could make a table of all the callers and fix them up when we initialise as we do for the SMP/UP case and for page-offset? Not really. Calls to those functions are generated by the compiler implicitly when a divisor operand is used and therefore we cannot annotate those calls. We'd have to use special accessors everywhere to replace the standard division operand (like we do for 64 by 32 bit divisions) but I doubt that people would accept that. It might be possible to extract this information from relocation tables. True, but only for individual .o files. Once the linker puts them together the information is lost, and trying to infer what the linker has done is insane. Filtering the compiler output to annotate idiv calls before it is assembled would probably be a better solution. OK, here's an extremely ugly hootenanny of a patch. It seems to work on an A7 Cubieboard2. I would never suggest actually doing this, but maybe it can be useful for comparing performance against the more palatable solutions. diff --git a/arch/arm/Makefile b/arch/arm/Makefile index 7397db6..cf1cd30 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -113,7 +113,7 @@ endif endif # Need -Uarm for gcc 3.x -KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_THUMB2) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm +KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_THUMB2) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm -include asm/divhack.h KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_THUMB2) $(arch-y) $(tune-y) -include asm/unified.h -msoft-float CHECKFLAGS += -D__arm__ diff --git a/arch/arm/include/asm/divhack.h b/arch/arm/include/asm/divhack.h new file mode 100644 index 000..c750b78 --- /dev/null +++ b/arch/arm/include/asm/divhack.h @@ -0,0 +1,23 @@ +__asm__ (.macro dobl tgt \n + .ifc \\tgt, __aeabi_idiv \n + .L.sdiv.\\@: \n + .pushsection .sdiv_tab.init, \a\, %progbits \n + .word.L.sdiv.\\@ \n + .popsection \n + .endif\n + .ifc \\tgt, __aeabi_uidiv \n + .L.udiv.\\@: \n + .pushsection .udiv_tab.init, \a\, %progbits \n + .word.L.udiv.\\@ \n + .popsection \n + .endif\n + bl \\tgt \n + .endm \n + .macro defbl \n + .macro bl tgt \n + .purgem bl\n + dobl \\tgt\n + defbl \n + .endm \n + .endm \n + defbl \n); diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 067815c1..b3a3fe1 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -375,6 +375,18 @@ static void __init cpuid_init_hwcaps(void) case 1: elf_hwcap |= HWCAP_IDIVT; } + + if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) (elf_hwcap HWCAP_IDIVA)) { + extern u32 __sdiv_tab_start, __sdiv_tab_end; + extern u32 __udiv_tab_start, __udiv_tab_end; + u32 *div; + + for (div = __sdiv_tab_start; div __sdiv_tab_end; div++) + *(u32 *)*div = 0xe710f110; + + for (div = __udiv_tab_start; div __udiv_tab_end; div++) + *(u32 *)*div = 0xe730f110; + } } static void __init feat_v6_fixup(void) diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S index 43a31fb..3d5c103 100644 --- a/arch/arm/kernel/vmlinux.lds.S +++ b/arch/arm/kernel/vmlinux.lds.S @@ -176,6 +176,8 @@ SECTIONS CON_INITCALL SECURITY_INITCALL INIT_RAM_FS + __sdiv_tab_start = .; *(.sdiv_tab.init); __sdiv_tab_end = .; + __udiv_tab_start = .; *(.udiv_tab.init); __udiv_tab_end = .; } #ifndef
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On 11/10/13 23:46, Uwe Kleine-König wrote: > Hello, > > On Fri, Nov 08, 2013 at 03:00:32PM -0800, Stephen Boyd wrote: >> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile >> index bd454b0..38621729 100644 >> --- a/arch/arm/lib/Makefile >> +++ b/arch/arm/lib/Makefile >> @@ -15,6 +15,12 @@ lib-y := backtrace.o changebit.o csumipv6.o >> csumpartial.o \ >> io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ >> call_with_stack.o >> >> +lib-$(CONFIG_CPU_V7) += div-v7.o > CPU_V7M could make use of that, too. > (If you follow Nico's advice to use runtime patching I cannot test it > for you on v7-M though as my machine has to use an XIP kernel.) > It already is runtime patching so I suspect you won't be able to test it anyway. I suppose we need another config like MIGHT_HAVE_IDIV or something that both v7 and v7M select? -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On 11/09/13 21:03, Nicolas Pitre wrote: > Bah. NAK. We are doing runtime patching of the kernel for many > many things already. So why not do the same here? static keys are a form of runtime patching, albeit not as extreme as you're suggesting. > > The obvious strategy is to simply overwrite the start of the existing > __aeabi_idiv code with the "sdiv r0, r0, r1" and "bx lr" opcodes. > > Similarly for the unsigned case. I was thinking the same thing when I wrote this, but I didn't know how to tell the compiler to either inline this function or to let me inilne an assembly stub with some section magic. > > That let you test the hardware capability only once during boot instead > of everytime a divide operation is performed. The test for hardware capability really isn't done more than once during boot. The assembly is like so at compile time <__aeabi_idiv>: 0: nop {0} 4: b 0 <___aeabi_idiv> 8: sdivr0, r0, r1 c: bx lr and after we test and find support for the instruction it will be replaced with <__aeabi_idiv>: 0: b 8 4: b 0 <___aeabi_idiv> 8: sdivr0, r0, r1 c: bx lr Unfortunately we still have to jump to this function. It would be great if we could inline this function at the call site but as I already said I don't know how to do that. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On 11/08/13 22:46, Matt Sealey wrote: > On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd wrote: >> If we're running on a v7 ARM CPU, detect if the CPU supports the >> sdiv/udiv instructions and replace the signed and unsigned >> division library functions with an sdiv/udiv instruction. >> >> Running the perf messaging benchmark in pipe mode >> >> $ perf bench sched messaging -p >> >> shows a modest improvement on my v7 CPU. >> >> before: >> (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + >> 5.683) / 10 = 5.805 >> >> after: >> (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + >> 5.093) / 10 = 5.538 >> >> (5.805 - 5.538) / 5.805 = 4.6% > Even with the change to the output constraint suggested by Mans, you > get absolutely identical benchmark results? There's a lot of variance > in any case.. Yeah sorry I didn't run the testcase again to see if numbers changed because I assumed one less instruction would be in the noise. I agree there is a lot of variance so if you have any better benchmarks/testcases please let me know. > > BTW has there been any evaluation of the penalty for the extra > branching, or the performance hit for the ARMv7-without-division > cases? I haven't done any. I'll factor that in for the next round. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On 11/08/13 22:46, Matt Sealey wrote: On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd sb...@codeaurora.org wrote: If we're running on a v7 ARM CPU, detect if the CPU supports the sdiv/udiv instructions and replace the signed and unsigned division library functions with an sdiv/udiv instruction. Running the perf messaging benchmark in pipe mode $ perf bench sched messaging -p shows a modest improvement on my v7 CPU. before: (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 5.683) / 10 = 5.805 after: (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 5.093) / 10 = 5.538 (5.805 - 5.538) / 5.805 = 4.6% Even with the change to the output constraint suggested by Mans, you get absolutely identical benchmark results? There's a lot of variance in any case.. Yeah sorry I didn't run the testcase again to see if numbers changed because I assumed one less instruction would be in the noise. I agree there is a lot of variance so if you have any better benchmarks/testcases please let me know. BTW has there been any evaluation of the penalty for the extra branching, or the performance hit for the ARMv7-without-division cases? I haven't done any. I'll factor that in for the next round. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On 11/09/13 21:03, Nicolas Pitre wrote: Bah. NAK. We are doing runtime patching of the kernel for many many things already. So why not do the same here? static keys are a form of runtime patching, albeit not as extreme as you're suggesting. The obvious strategy is to simply overwrite the start of the existing __aeabi_idiv code with the sdiv r0, r0, r1 and bx lr opcodes. Similarly for the unsigned case. I was thinking the same thing when I wrote this, but I didn't know how to tell the compiler to either inline this function or to let me inilne an assembly stub with some section magic. That let you test the hardware capability only once during boot instead of everytime a divide operation is performed. The test for hardware capability really isn't done more than once during boot. The assembly is like so at compile time __aeabi_idiv: 0: nop {0} 4: b 0 ___aeabi_idiv 8: sdivr0, r0, r1 c: bx lr and after we test and find support for the instruction it will be replaced with __aeabi_idiv: 0: b 8 4: b 0 ___aeabi_idiv 8: sdivr0, r0, r1 c: bx lr Unfortunately we still have to jump to this function. It would be great if we could inline this function at the call site but as I already said I don't know how to do that. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On 11/10/13 23:46, Uwe Kleine-König wrote: Hello, On Fri, Nov 08, 2013 at 03:00:32PM -0800, Stephen Boyd wrote: diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index bd454b0..38621729 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -15,6 +15,12 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ call_with_stack.o +lib-$(CONFIG_CPU_V7) += div-v7.o CPU_V7M could make use of that, too. (If you follow Nico's advice to use runtime patching I cannot test it for you on v7-M though as my machine has to use an XIP kernel.) It already is runtime patching so I suspect you won't be able to test it anyway. I suppose we need another config like MIGHT_HAVE_IDIV or something that both v7 and v7M select? -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Hello, On Fri, Nov 08, 2013 at 03:00:32PM -0800, Stephen Boyd wrote: > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile > index bd454b0..38621729 100644 > --- a/arch/arm/lib/Makefile > +++ b/arch/arm/lib/Makefile > @@ -15,6 +15,12 @@ lib-y := backtrace.o changebit.o csumipv6.o > csumpartial.o \ > io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ > call_with_stack.o > > +lib-$(CONFIG_CPU_V7) += div-v7.o CPU_V7M could make use of that, too. (If you follow Nico's advice to use runtime patching I cannot test it for you on v7-M though as my machine has to use an XIP kernel.) Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ | -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Hello, On Fri, Nov 08, 2013 at 03:00:32PM -0800, Stephen Boyd wrote: diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index bd454b0..38621729 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -15,6 +15,12 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ call_with_stack.o +lib-$(CONFIG_CPU_V7) += div-v7.o CPU_V7M could make use of that, too. (If you follow Nico's advice to use runtime patching I cannot test it for you on v7-M though as my machine has to use an XIP kernel.) Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ | -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Fri, 8 Nov 2013, Stephen Boyd wrote: > If we're running on a v7 ARM CPU, detect if the CPU supports the > sdiv/udiv instructions and replace the signed and unsigned > division library functions with an sdiv/udiv instruction. > > Running the perf messaging benchmark in pipe mode > > $ perf bench sched messaging -p > > shows a modest improvement on my v7 CPU. > > before: > (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + > 5.683) / 10 = 5.805 > > after: > (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + > 5.093) / 10 = 5.538 > > (5.805 - 5.538) / 5.805 = 4.6% > > Signed-off-by: Stephen Boyd Bah. NAK. We are doing runtime patching of the kernel for many many things already. So why not do the same here? The obvious strategy is to simply overwrite the start of the existing __aeabi_idiv code with the "sdiv r0, r0, r1" and "bx lr" opcodes. Similarly for the unsigned case. That let you test the hardware capability only once during boot instead of everytime a divide operation is performed. Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Matt Sealey writes: > BTW has there been any evaluation of the penalty for the extra > branching, or the performance hit for the ARMv7-without-division > cases? The branches themselves probably have minimal overhead. There will however be code to preserve call-clobbered registers (and move the values to/from r0/r1) that would not be needed if the div instructions were done inline (obviously such a kernel could only run on hardware with division support). -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
Matt Sealey n...@bakuhatsu.net writes: BTW has there been any evaluation of the penalty for the extra branching, or the performance hit for the ARMv7-without-division cases? The branches themselves probably have minimal overhead. There will however be code to preserve call-clobbered registers (and move the values to/from r0/r1) that would not be needed if the div instructions were done inline (obviously such a kernel could only run on hardware with division support). -- Måns Rullgård m...@mansr.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Fri, 8 Nov 2013, Stephen Boyd wrote: If we're running on a v7 ARM CPU, detect if the CPU supports the sdiv/udiv instructions and replace the signed and unsigned division library functions with an sdiv/udiv instruction. Running the perf messaging benchmark in pipe mode $ perf bench sched messaging -p shows a modest improvement on my v7 CPU. before: (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 5.683) / 10 = 5.805 after: (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 5.093) / 10 = 5.538 (5.805 - 5.538) / 5.805 = 4.6% Signed-off-by: Stephen Boyd sb...@codeaurora.org Bah. NAK. We are doing runtime patching of the kernel for many many things already. So why not do the same here? The obvious strategy is to simply overwrite the start of the existing __aeabi_idiv code with the sdiv r0, r0, r1 and bx lr opcodes. Similarly for the unsigned case. That let you test the hardware capability only once during boot instead of everytime a divide operation is performed. Nicolas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd wrote: > If we're running on a v7 ARM CPU, detect if the CPU supports the > sdiv/udiv instructions and replace the signed and unsigned > division library functions with an sdiv/udiv instruction. > > Running the perf messaging benchmark in pipe mode > > $ perf bench sched messaging -p > > shows a modest improvement on my v7 CPU. > > before: > (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + > 5.683) / 10 = 5.805 > > after: > (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + > 5.093) / 10 = 5.538 > > (5.805 - 5.538) / 5.805 = 4.6% Even with the change to the output constraint suggested by Mans, you get absolutely identical benchmark results? There's a lot of variance in any case.. BTW has there been any evaluation of the penalty for the extra branching, or the performance hit for the ARMv7-without-division cases? Ta, Matt Sealey -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
If we're running on a v7 ARM CPU, detect if the CPU supports the sdiv/udiv instructions and replace the signed and unsigned division library functions with an sdiv/udiv instruction. Running the perf messaging benchmark in pipe mode $ perf bench sched messaging -p shows a modest improvement on my v7 CPU. before: (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 5.683) / 10 = 5.805 after: (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 5.093) / 10 = 5.538 (5.805 - 5.538) / 5.805 = 4.6% Signed-off-by: Stephen Boyd --- Changes since v1: * Replace signed with unsigned in unsigned divide function * drop & in inline assembly * Use IS_ENABLED() instead of #ifdef * Pass DIV_V7 into lib1funcs.S instead of depending on ZIMAGE or CPU_V7 arch/arm/kernel/setup.c | 13 ++- arch/arm/lib/Makefile| 6 + arch/arm/lib/div-v7.c| 58 arch/arm/lib/lib1funcs.S | 16 + 4 files changed, 92 insertions(+), 1 deletion(-) create mode 100644 arch/arm/lib/div-v7.c diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 0e1e2b3..f9e577a 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include @@ -365,9 +366,11 @@ void __init early_print(const char *str, ...) printk("%s", buf); } +struct static_key cpu_has_idiv = STATIC_KEY_INIT_FALSE; + static void __init cpuid_init_hwcaps(void) { - unsigned int divide_instrs, vmsa; + unsigned int divide_instrs, vmsa, idiv_mask; if (cpu_architecture() < CPU_ARCH_ARMv7) return; @@ -381,6 +384,14 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) + idiv_mask = HWCAP_IDIVT; + else + idiv_mask = HWCAP_IDIVA; + + if (elf_hwcap & idiv_mask) + static_key_slow_inc(_has_idiv); + /* LPAE implies atomic ldrd/strd instructions */ vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0) & 0xf) >> 0; if (vmsa >= 5) diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index bd454b0..38621729 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -15,6 +15,12 @@ lib-y:= backtrace.o changebit.o csumipv6.o csumpartial.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ call_with_stack.o +lib-$(CONFIG_CPU_V7) += div-v7.o +CFLAGS_div-v7.o := -march=armv7-a +ifeq ($(CONFIG_CPU_V7),y) + AFLAGS_lib1funcs.o := -DDIV_V7 +endif + mmu-y := clear_user.o copy_page.o getuser.o putuser.o # the code in uaccess.S is not preemption safe and diff --git a/arch/arm/lib/div-v7.c b/arch/arm/lib/div-v7.c new file mode 100644 index 000..e20945a --- /dev/null +++ b/arch/arm/lib/div-v7.c @@ -0,0 +1,58 @@ +/* Copyright (c) 2013, The Linux Foundation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 and + * only version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include + +extern int ___aeabi_idiv(int, int); +extern unsigned ___aeabi_uidiv(int, int); + +extern struct static_key cpu_has_idiv; + +int __aeabi_idiv(int numerator, int denominator) +{ + if (static_key_false(_has_idiv)) { + int ret; + + asm volatile ( + ".arch_extension idiv\n" + "sdiv %0, %1, %2" + : "=r" (ret) + : "r" (numerator), "r" (denominator)); + + return ret; + } + + return ___aeabi_idiv(numerator, denominator); +} + +int __divsi3(int numerator, int denominator) + __attribute__((alias("__aeabi_idiv"))); + +unsigned __aeabi_uidiv(unsigned numerator, unsigned denominator) +{ + if (static_key_false(_has_idiv)) { + unsigned ret; + + asm volatile ( + ".arch_extension idiv\n" + "udiv %0, %1, %2" + : "=r" (ret) + : "r" (numerator), "r" (denominator)); + + return ret; + } + + return ___aeabi_uidiv(numerator, denominator); +} + +unsigned __udivsi3(unsigned numerator, unsigned denominator) + __attribute__((alias("__aeabi_uidiv"))); diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S index c562f64..82bbcc7 100644 --- a/arch/arm/lib/lib1funcs.S +++ b/arch/arm/lib/lib1funcs.S @@ -205,8 +205,12 @@ Boston, MA 02111-1307, USA. */ .endm +#ifdef DIV_V7 +ENTRY(___aeabi_uidiv) +#else ENTRY(__udivsi3)
[PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
If we're running on a v7 ARM CPU, detect if the CPU supports the sdiv/udiv instructions and replace the signed and unsigned division library functions with an sdiv/udiv instruction. Running the perf messaging benchmark in pipe mode $ perf bench sched messaging -p shows a modest improvement on my v7 CPU. before: (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 5.683) / 10 = 5.805 after: (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 5.093) / 10 = 5.538 (5.805 - 5.538) / 5.805 = 4.6% Signed-off-by: Stephen Boyd sb...@codeaurora.org --- Changes since v1: * Replace signed with unsigned in unsigned divide function * drop in inline assembly * Use IS_ENABLED() instead of #ifdef * Pass DIV_V7 into lib1funcs.S instead of depending on ZIMAGE or CPU_V7 arch/arm/kernel/setup.c | 13 ++- arch/arm/lib/Makefile| 6 + arch/arm/lib/div-v7.c| 58 arch/arm/lib/lib1funcs.S | 16 + 4 files changed, 92 insertions(+), 1 deletion(-) create mode 100644 arch/arm/lib/div-v7.c diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 0e1e2b3..f9e577a 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -30,6 +30,7 @@ #include linux/bug.h #include linux/compiler.h #include linux/sort.h +#include linux/static_key.h #include asm/unified.h #include asm/cp15.h @@ -365,9 +366,11 @@ void __init early_print(const char *str, ...) printk(%s, buf); } +struct static_key cpu_has_idiv = STATIC_KEY_INIT_FALSE; + static void __init cpuid_init_hwcaps(void) { - unsigned int divide_instrs, vmsa; + unsigned int divide_instrs, vmsa, idiv_mask; if (cpu_architecture() CPU_ARCH_ARMv7) return; @@ -381,6 +384,14 @@ static void __init cpuid_init_hwcaps(void) elf_hwcap |= HWCAP_IDIVT; } + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) + idiv_mask = HWCAP_IDIVT; + else + idiv_mask = HWCAP_IDIVA; + + if (elf_hwcap idiv_mask) + static_key_slow_inc(cpu_has_idiv); + /* LPAE implies atomic ldrd/strd instructions */ vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0) 0xf) 0; if (vmsa = 5) diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index bd454b0..38621729 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -15,6 +15,12 @@ lib-y:= backtrace.o changebit.o csumipv6.o csumpartial.o \ io-readsb.o io-writesb.o io-readsl.o io-writesl.o \ call_with_stack.o +lib-$(CONFIG_CPU_V7) += div-v7.o +CFLAGS_div-v7.o := -march=armv7-a +ifeq ($(CONFIG_CPU_V7),y) + AFLAGS_lib1funcs.o := -DDIV_V7 +endif + mmu-y := clear_user.o copy_page.o getuser.o putuser.o # the code in uaccess.S is not preemption safe and diff --git a/arch/arm/lib/div-v7.c b/arch/arm/lib/div-v7.c new file mode 100644 index 000..e20945a --- /dev/null +++ b/arch/arm/lib/div-v7.c @@ -0,0 +1,58 @@ +/* Copyright (c) 2013, The Linux Foundation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 and + * only version 2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include linux/static_key.h + +extern int ___aeabi_idiv(int, int); +extern unsigned ___aeabi_uidiv(int, int); + +extern struct static_key cpu_has_idiv; + +int __aeabi_idiv(int numerator, int denominator) +{ + if (static_key_false(cpu_has_idiv)) { + int ret; + + asm volatile ( + .arch_extension idiv\n + sdiv %0, %1, %2 + : =r (ret) + : r (numerator), r (denominator)); + + return ret; + } + + return ___aeabi_idiv(numerator, denominator); +} + +int __divsi3(int numerator, int denominator) + __attribute__((alias(__aeabi_idiv))); + +unsigned __aeabi_uidiv(unsigned numerator, unsigned denominator) +{ + if (static_key_false(cpu_has_idiv)) { + unsigned ret; + + asm volatile ( + .arch_extension idiv\n + udiv %0, %1, %2 + : =r (ret) + : r (numerator), r (denominator)); + + return ret; + } + + return ___aeabi_uidiv(numerator, denominator); +} + +unsigned __udivsi3(unsigned numerator, unsigned denominator) + __attribute__((alias(__aeabi_uidiv))); diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S index c562f64..82bbcc7 100644 --- a/arch/arm/lib/lib1funcs.S +++ b/arch/arm/lib/lib1funcs.S @@ -205,8 +205,12 @@ Boston, MA 02111-1307, USA.
Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions
On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd sb...@codeaurora.org wrote: If we're running on a v7 ARM CPU, detect if the CPU supports the sdiv/udiv instructions and replace the signed and unsigned division library functions with an sdiv/udiv instruction. Running the perf messaging benchmark in pipe mode $ perf bench sched messaging -p shows a modest improvement on my v7 CPU. before: (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 5.683) / 10 = 5.805 after: (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 5.093) / 10 = 5.538 (5.805 - 5.538) / 5.805 = 4.6% Even with the change to the output constraint suggested by Mans, you get absolutely identical benchmark results? There's a lot of variance in any case.. BTW has there been any evaluation of the penalty for the extra branching, or the performance hit for the ARMv7-without-division cases? Ta, Matt Sealey n...@bakuhatsu.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/