Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Måns Rullgård
Nicolas Pitre  writes:

> On Tue, 12 Nov 2013, Måns Rullgård wrote:
>
>> Nicolas Pitre  writes:
>> 
>> > On Tue, 12 Nov 2013, Ben Dooks wrote:
>> >
>> >> Given these are single instructoins for ARM, is it possible we could
>> >> make a table of all the callers and fix them up when we initialise
>> >> as we do for the SMP/UP case and for page-offset?
>> >
>> > Not really.  Calls to those functions are generated by the compiler 
>> > implicitly when a divisor operand is used and therefore we cannot 
>> > annotate those calls.  We'd have to use special accessors everywhere to 
>> > replace the standard division operand (like we do for 64 by 32 bit 
>> > divisions) but I doubt that people would accept that.
>> 
>> It might be possible to extract this information from relocation tables.
>
> True, but only for individual .o files.  Once the linker puts them 
> together the information is lost, and trying to infer what the linker 
> has done is insane.
>
> Filtering the compiler output to annotate idiv calls before it is 
> assembled would probably be a better solution.

OK, here's an extremely ugly hootenanny of a patch.  It seems to work on
an A7 Cubieboard2.  I would never suggest actually doing this, but maybe
it can be useful for comparing performance against the more palatable
solutions.

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 7397db6..cf1cd30 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -113,7 +113,7 @@ endif
 endif
 
 # Need -Uarm for gcc < 3.x
-KBUILD_CFLAGS  +=$(CFLAGS_ABI) $(CFLAGS_THUMB2) $(arch-y) $(tune-y) $(call 
cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float 
-Uarm
+KBUILD_CFLAGS  +=$(CFLAGS_ABI) $(CFLAGS_THUMB2) $(arch-y) $(tune-y) $(call 
cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float 
-Uarm -include asm/divhack.h
 KBUILD_AFLAGS  +=$(CFLAGS_ABI) $(AFLAGS_THUMB2) $(arch-y) $(tune-y) -include 
asm/unified.h -msoft-float
 
 CHECKFLAGS += -D__arm__
diff --git a/arch/arm/include/asm/divhack.h b/arch/arm/include/asm/divhack.h
new file mode 100644
index 000..c750b78
--- /dev/null
+++ b/arch/arm/include/asm/divhack.h
@@ -0,0 +1,23 @@
+__asm__ (".macro dobl tgt   \n"
+ ".ifc \\tgt, __aeabi_idiv  \n"
+ ".L.sdiv.\\@:  \n"
+ ".pushsection .sdiv_tab.init, \"a\", %progbits \n"
+ ".word.L.sdiv.\\@  \n"
+ ".popsection   \n"
+ ".endif\n"
+ ".ifc \\tgt, __aeabi_uidiv \n"
+ ".L.udiv.\\@:  \n"
+ ".pushsection .udiv_tab.init, \"a\", %progbits \n"
+ ".word.L.udiv.\\@  \n"
+ ".popsection   \n"
+ ".endif\n"
+ "bl \\tgt  \n"
+ ".endm \n"
+ ".macro defbl  \n"
+ ".macro bl tgt \n"
+ ".purgem bl\n"
+ "dobl \\tgt\n"
+ "defbl \n"
+ ".endm \n"
+ ".endm \n"
+ "defbl \n");
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 067815c1..b3a3fe1 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -375,6 +375,18 @@ static void __init cpuid_init_hwcaps(void)
case 1:
elf_hwcap |= HWCAP_IDIVT;
}
+
+   if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVA)) {
+   extern u32 __sdiv_tab_start, __sdiv_tab_end;
+   extern u32 __udiv_tab_start, __udiv_tab_end;
+   u32 *div;
+
+   for (div = &__sdiv_tab_start; div < &__sdiv_tab_end; div++)
+   *(u32 *)*div = 0xe710f110;
+
+   for (div = &__udiv_tab_start; div < &__udiv_tab_end; div++)
+   *(u32 *)*div = 0xe730f110;
+   }
 }
 
 static void __init feat_v6_fixup(void)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 43a31fb..3d5c103 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -176,6 +176,8 @@ SECTIONS
CON_INITCALL
SECURITY_INITCALL
INIT_RAM_FS
+   __sdiv_tab_start = .; *(.sdiv_tab.init); __sdiv_tab_end = .;
+   

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Nicolas Pitre wrote:

> On Tue, 12 Nov 2013, Måns Rullgård wrote:
> 
> > It might be possible to extract this information from relocation tables.
> 
> True, but only for individual .o files.  Once the linker puts them 
> together the information is lost, and trying to infer what the linker 
> has done is insane.
> 
> Filtering the compiler output to annotate idiv calls before it is 
> assembled would probably be a better solution.

Another solution is to patch the call site from within __aeabi_idiv at 
run time using lr.  That wouldn't work in the presence of tail call 
optimization though.

Again this might not be worth it and patching __aeabi_idiv instead might 
be a good enough compromize.


Nicolas

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Måns Rullgård wrote:

> Nicolas Pitre  writes:
> 
> > On Tue, 12 Nov 2013, Ben Dooks wrote:
> >
> >> Given these are single instructoins for ARM, is it possible we could
> >> make a table of all the callers and fix them up when we initialise
> >> as we do for the SMP/UP case and for page-offset?
> >
> > Not really.  Calls to those functions are generated by the compiler 
> > implicitly when a divisor operand is used and therefore we cannot 
> > annotate those calls.  We'd have to use special accessors everywhere to 
> > replace the standard division operand (like we do for 64 by 32 bit 
> > divisions) but I doubt that people would accept that.
> 
> It might be possible to extract this information from relocation tables.

True, but only for individual .o files.  Once the linker puts them 
together the information is lost, and trying to infer what the linker 
has done is insane.

Filtering the compiler output to annotate idiv calls before it is 
assembled would probably be a better solution.

Is it worth it?  I'm not sure.


Nicolas

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Måns Rullgård wrote:

> Nicolas Pitre  writes:
> 
> > What about this patch which I think is currently your best option.  Note 
> > it would need to use the facilities from asm/opcodes.h to make it endian 
> > agnostic.
> >
> > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> > index 6a1b8a81b1..379cffe4ab 100644
> > --- a/arch/arm/kernel/setup.c
> > +++ b/arch/arm/kernel/setup.c
> > @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
> > elf_hwcap |= HWCAP_IDIVT;
> > }
> >
> > +   /*
> > +* Patch our division routines with the corresponding opcode
> > +* if the hardware supports it.
> > +*/
> > +   if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) {
> > +   extern char __aeabi_uidiv, __aeabi_idiv;
> 
> It would be safer to declare these as arrays of unspecified size.
> Otherwise the compiler might do evil things with what to it looks like
> out of bounds indexing.

Right.

> 
> There should also be some cache maintenance after this patching, or is
> that already happening for some other reason?

This is so early during boot that the MMU isn't even fully initialized 
yet.  The cache will be flushed.


Nicolas

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Måns Rullgård
Nicolas Pitre  writes:

> On Tue, 12 Nov 2013, Ben Dooks wrote:
>
>> Given these are single instructoins for ARM, is it possible we could
>> make a table of all the callers and fix them up when we initialise
>> as we do for the SMP/UP case and for page-offset?
>
> Not really.  Calls to those functions are generated by the compiler 
> implicitly when a divisor operand is used and therefore we cannot 
> annotate those calls.  We'd have to use special accessors everywhere to 
> replace the standard division operand (like we do for 64 by 32 bit 
> divisions) but I doubt that people would accept that.

It might be possible to extract this information from relocation tables.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Ben Dooks wrote:

> Given these are single instructoins for ARM, is it possible we could
> make a table of all the callers and fix them up when we initialise
> as we do for the SMP/UP case and for page-offset?

Not really.  Calls to those functions are generated by the compiler 
implicitly when a divisor operand is used and therefore we cannot 
annotate those calls.  We'd have to use special accessors everywhere to 
replace the standard division operand (like we do for 64 by 32 bit 
divisions) but I doubt that people would accept that.

You cannot just scan the binary for the appropriate branch opcode either 
as you may turn up false positives in literal pools.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Måns Rullgård
Nicolas Pitre  writes:

> What about this patch which I think is currently your best option.  Note 
> it would need to use the facilities from asm/opcodes.h to make it endian 
> agnostic.
>
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index 6a1b8a81b1..379cffe4ab 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
>   elf_hwcap |= HWCAP_IDIVT;
>   }
>
> + /*
> +  * Patch our division routines with the corresponding opcode
> +  * if the hardware supports it.
> +  */
> + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) {
> + extern char __aeabi_uidiv, __aeabi_idiv;

It would be safer to declare these as arrays of unspecified size.
Otherwise the compiler might do evil things with what to it looks like
out of bounds indexing.

There should also be some cache maintenance after this patching, or is
that already happening for some other reason?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Ben Dooks

On 12/11/13 14:04, Russell King - ARM Linux wrote:

On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote:

What about this patch which I think is currently your best option.  Note
it would need to use the facilities from asm/opcodes.h to make it endian
agnostic.

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 6a1b8a81b1..379cffe4ab 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
elf_hwcap |= HWCAP_IDIVT;
}

+   /*
+* Patch our division routines with the corresponding opcode
+* if the hardware supports it.
+*/
+   if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) {
+   extern char __aeabi_uidiv, __aeabi_idiv;
+   u16 *uidiv = (u16 *)&__aeabi_uidiv;
+   u16 *idiv = (u16 *)&__aeabi_idiv;
+
+   uidiv[0] = 0xfbb0;  /* udiv r0, r0, r1 */
+   uidiv[1] = 0xf0f1;
+   uidiv[2] = 0x4770;  /* bx lr */
+
+   idiv[0] = 0xfb90;  /* sdiv r0, r0, r1 */
+   idiv[1] = 0xf0f1;
+   idiv[2] = 0x4770;  /* bx lr */
+   } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & 
HWCAP_IDIVA)) {
+   extern char __aeabi_uidiv, __aeabi_idiv;
+   u32 *uidiv = (u32 *)&__aeabi_uidiv;
+   u32 *idiv = (u32 *)&__aeabi_idiv;
+
+   uidiv[0] = 0xe730f110;  /* udiv r0, r0, r1 */
+   uidiv[1] = 0xe12fff1e;  /* bx lr */
+
+   idiv[0] = 0xe710f110;  /* sdiv r0, r0, r1 */
+   idiv[1] = 0xe12fff1e;  /* bx lr */
+   }


What about endianness, and what if XIP is enabled?


I was also going to add a note about endian-ness.

Given these are single instructoins for ARM, is it possible we could
make a table of all the callers and fix them up when we initialise
as we do for the SMP/UP case and for page-offset?


--
Ben Dooks   http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Russell King - ARM Linux wrote:

> On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote:
> > What about this patch which I think is currently your best option.  Note 
> > it would need to use the facilities from asm/opcodes.h to make it endian 
> > agnostic.
> > 
> > diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> > index 6a1b8a81b1..379cffe4ab 100644
> > --- a/arch/arm/kernel/setup.c
> > +++ b/arch/arm/kernel/setup.c
> > @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
> > elf_hwcap |= HWCAP_IDIVT;
> > }
> >  
> > +   /*
> > +* Patch our division routines with the corresponding opcode
> > +* if the hardware supports it.
> > +*/
> > +   if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) {
> > +   extern char __aeabi_uidiv, __aeabi_idiv;
> > +   u16 *uidiv = (u16 *)&__aeabi_uidiv;
> > +   u16 *idiv = (u16 *)&__aeabi_idiv;
> > +
> > +   uidiv[0] = 0xfbb0;  /* udiv r0, r0, r1 */
> > +   uidiv[1] = 0xf0f1;
> > +   uidiv[2] = 0x4770;  /* bx lr */
> > +
> > +   idiv[0] = 0xfb90;  /* sdiv r0, r0, r1 */
> > +   idiv[1] = 0xf0f1;  
> > +   idiv[2] = 0x4770;  /* bx lr */
> > +   } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & 
> > HWCAP_IDIVA)) {
> > +   extern char __aeabi_uidiv, __aeabi_idiv;
> > +   u32 *uidiv = (u32 *)&__aeabi_uidiv;
> > +   u32 *idiv = (u32 *)&__aeabi_idiv;
> > +
> > +   uidiv[0] = 0xe730f110;  /* udiv r0, r0, r1 */
> > +   uidiv[1] = 0xe12fff1e;  /* bx lr */
> > +
> > +   idiv[0] = 0xe710f110;  /* sdiv r0, r0, r1 */
> > +   idiv[1] = 0xe12fff1e;  /* bx lr */
> > +   }
> 
> What about endianness, and what if XIP is enabled?

Just as I said above the diff: this needs refined.

Obviously XIP can't use this and doesn't need it either as a XIP kernel 
should be optimized for the very platform it will run onto i.e. gcc 
should already emit those div opcodes inline if appropriate.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Russell King - ARM Linux
On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote:
> What about this patch which I think is currently your best option.  Note 
> it would need to use the facilities from asm/opcodes.h to make it endian 
> agnostic.
> 
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index 6a1b8a81b1..379cffe4ab 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
>   elf_hwcap |= HWCAP_IDIVT;
>   }
>  
> + /*
> +  * Patch our division routines with the corresponding opcode
> +  * if the hardware supports it.
> +  */
> + if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) {
> + extern char __aeabi_uidiv, __aeabi_idiv;
> + u16 *uidiv = (u16 *)&__aeabi_uidiv;
> + u16 *idiv = (u16 *)&__aeabi_idiv;
> +
> + uidiv[0] = 0xfbb0;  /* udiv r0, r0, r1 */
> + uidiv[1] = 0xf0f1;
> + uidiv[2] = 0x4770;  /* bx lr */
> +
> + idiv[0] = 0xfb90;  /* sdiv r0, r0, r1 */
> + idiv[1] = 0xf0f1;  
> + idiv[2] = 0x4770;  /* bx lr */
> + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & 
> HWCAP_IDIVA)) {
> + extern char __aeabi_uidiv, __aeabi_idiv;
> + u32 *uidiv = (u32 *)&__aeabi_uidiv;
> + u32 *idiv = (u32 *)&__aeabi_idiv;
> +
> + uidiv[0] = 0xe730f110;  /* udiv r0, r0, r1 */
> + uidiv[1] = 0xe12fff1e;  /* bx lr */
> +
> + idiv[0] = 0xe710f110;  /* sdiv r0, r0, r1 */
> + idiv[1] = 0xe12fff1e;  /* bx lr */
> + }

What about endianness, and what if XIP is enabled?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Mon, 11 Nov 2013, Stephen Boyd wrote:

> On 11/09/13 21:03, Nicolas Pitre wrote:
> > Bah. NAK. We are doing runtime patching of the kernel for many
> > many things already. So why not do the same here?
> 
> static keys are a form of runtime patching, albeit not as extreme as
> you're suggesting.
> 
> >
> > The obvious strategy is to simply overwrite the start of the existing 
> > __aeabi_idiv code with the "sdiv r0, r0, r1" and "bx lr" opcodes.  
> >
> > Similarly for the unsigned case.
> 
> I was thinking the same thing when I wrote this, but I didn't know how
> to tell the compiler to either inline this function or to let me inilne
> an assembly stub with some section magic.
> 
> >
> > That let you test the hardware capability only once during boot instead 
> > of everytime a divide operation is performed.
> 
> The test for hardware capability really isn't done more than once during
> boot. The assembly is like so at compile time
> 
>  <__aeabi_idiv>:
>0:   nop {0}
>4:   b   0 <___aeabi_idiv>
>8:   sdivr0, r0, r1
>c:   bx  lr
> 
> and after we test and find support for the instruction it will be
> replaced with
> 
>  <__aeabi_idiv>:
>0:   b   8
>4:   b   0 <___aeabi_idiv>
>8:   sdivr0, r0, r1
>c:   bx  lr
> 
> Unfortunately we still have to jump to this function. It would be great
> if we could inline this function at the call site but as I already said
> I don't know how to do that.

What about this patch which I think is currently your best option.  Note 
it would need to use the facilities from asm/opcodes.h to make it endian 
agnostic.

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 6a1b8a81b1..379cffe4ab 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
elf_hwcap |= HWCAP_IDIVT;
}
 
+   /*
+* Patch our division routines with the corresponding opcode
+* if the hardware supports it.
+*/
+   if (IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & HWCAP_IDIVT)) {
+   extern char __aeabi_uidiv, __aeabi_idiv;
+   u16 *uidiv = (u16 *)&__aeabi_uidiv;
+   u16 *idiv = (u16 *)&__aeabi_idiv;
+
+   uidiv[0] = 0xfbb0;  /* udiv r0, r0, r1 */
+   uidiv[1] = 0xf0f1;
+   uidiv[2] = 0x4770;  /* bx lr */
+
+   idiv[0] = 0xfb90;  /* sdiv r0, r0, r1 */
+   idiv[1] = 0xf0f1;  
+   idiv[2] = 0x4770;  /* bx lr */
+   } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL) && (elf_hwcap & 
HWCAP_IDIVA)) {
+   extern char __aeabi_uidiv, __aeabi_idiv;
+   u32 *uidiv = (u32 *)&__aeabi_uidiv;
+   u32 *idiv = (u32 *)&__aeabi_idiv;
+
+   uidiv[0] = 0xe730f110;  /* udiv r0, r0, r1 */
+   uidiv[1] = 0xe12fff1e;  /* bx lr */
+
+   idiv[0] = 0xe710f110;  /* sdiv r0, r0, r1 */
+   idiv[1] = 0xe12fff1e;  /* bx lr */
+   }
+
/* LPAE implies atomic ldrd/strd instructions */
vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0) & 0xf) >> 0;
if (vmsa >= 5)


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Måns Rullgård
Stephen Boyd  writes:

> On 11/09/13 21:03, Nicolas Pitre wrote:
>> Bah. NAK. We are doing runtime patching of the kernel for many
>> many things already. So why not do the same here?
>
> static keys are a form of runtime patching, albeit not as extreme as
> you're suggesting.
>
>>
>> The obvious strategy is to simply overwrite the start of the existing 
>> __aeabi_idiv code with the "sdiv r0, r0, r1" and "bx lr" opcodes.  
>>
>> Similarly for the unsigned case.
>
> I was thinking the same thing when I wrote this, but I didn't know how
> to tell the compiler to either inline this function or to let me inilne
> an assembly stub with some section magic.
>
>>
>> That let you test the hardware capability only once during boot instead 
>> of everytime a divide operation is performed.
>
> The test for hardware capability really isn't done more than once during
> boot. The assembly is like so at compile time
>
>  <__aeabi_idiv>:
>0:   nop {0}
>4:   b   0 <___aeabi_idiv>
>8:   sdivr0, r0, r1
>c:   bx  lr
>
> and after we test and find support for the instruction it will be
> replaced with
>
>  <__aeabi_idiv>:
>0:   b   8
>4:   b   0 <___aeabi_idiv>
>8:   sdivr0, r0, r1
>c:   bx  lr
>
> Unfortunately we still have to jump to this function. It would be great
> if we could inline this function at the call site but as I already said
> I don't know how to do that.

Ideally the bl instruction at the call site would be patched over with
sdiv/udiv when supported.  This would leave things exactly as they are
for hardware without div capability and incur only the call setup cost
(but no actual call) on div-capable hardware.  No, I don't know how to
achieve this.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Måns Rullgård
Stephen Boyd sb...@codeaurora.org writes:

 On 11/09/13 21:03, Nicolas Pitre wrote:
 Bah. NAK. We are doing runtime patching of the kernel for many
 many things already. So why not do the same here?

 static keys are a form of runtime patching, albeit not as extreme as
 you're suggesting.


 The obvious strategy is to simply overwrite the start of the existing 
 __aeabi_idiv code with the sdiv r0, r0, r1 and bx lr opcodes.  

 Similarly for the unsigned case.

 I was thinking the same thing when I wrote this, but I didn't know how
 to tell the compiler to either inline this function or to let me inilne
 an assembly stub with some section magic.


 That let you test the hardware capability only once during boot instead 
 of everytime a divide operation is performed.

 The test for hardware capability really isn't done more than once during
 boot. The assembly is like so at compile time

  __aeabi_idiv:
0:   nop {0}
4:   b   0 ___aeabi_idiv
8:   sdivr0, r0, r1
c:   bx  lr

 and after we test and find support for the instruction it will be
 replaced with

  __aeabi_idiv:
0:   b   8
4:   b   0 ___aeabi_idiv
8:   sdivr0, r0, r1
c:   bx  lr

 Unfortunately we still have to jump to this function. It would be great
 if we could inline this function at the call site but as I already said
 I don't know how to do that.

Ideally the bl instruction at the call site would be patched over with
sdiv/udiv when supported.  This would leave things exactly as they are
for hardware without div capability and incur only the call setup cost
(but no actual call) on div-capable hardware.  No, I don't know how to
achieve this.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Mon, 11 Nov 2013, Stephen Boyd wrote:

 On 11/09/13 21:03, Nicolas Pitre wrote:
  Bah. NAK. We are doing runtime patching of the kernel for many
  many things already. So why not do the same here?
 
 static keys are a form of runtime patching, albeit not as extreme as
 you're suggesting.
 
 
  The obvious strategy is to simply overwrite the start of the existing 
  __aeabi_idiv code with the sdiv r0, r0, r1 and bx lr opcodes.  
 
  Similarly for the unsigned case.
 
 I was thinking the same thing when I wrote this, but I didn't know how
 to tell the compiler to either inline this function or to let me inilne
 an assembly stub with some section magic.
 
 
  That let you test the hardware capability only once during boot instead 
  of everytime a divide operation is performed.
 
 The test for hardware capability really isn't done more than once during
 boot. The assembly is like so at compile time
 
  __aeabi_idiv:
0:   nop {0}
4:   b   0 ___aeabi_idiv
8:   sdivr0, r0, r1
c:   bx  lr
 
 and after we test and find support for the instruction it will be
 replaced with
 
  __aeabi_idiv:
0:   b   8
4:   b   0 ___aeabi_idiv
8:   sdivr0, r0, r1
c:   bx  lr
 
 Unfortunately we still have to jump to this function. It would be great
 if we could inline this function at the call site but as I already said
 I don't know how to do that.

What about this patch which I think is currently your best option.  Note 
it would need to use the facilities from asm/opcodes.h to make it endian 
agnostic.

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 6a1b8a81b1..379cffe4ab 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
elf_hwcap |= HWCAP_IDIVT;
}
 
+   /*
+* Patch our division routines with the corresponding opcode
+* if the hardware supports it.
+*/
+   if (IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  HWCAP_IDIVT)) {
+   extern char __aeabi_uidiv, __aeabi_idiv;
+   u16 *uidiv = (u16 *)__aeabi_uidiv;
+   u16 *idiv = (u16 *)__aeabi_idiv;
+
+   uidiv[0] = 0xfbb0;  /* udiv r0, r0, r1 */
+   uidiv[1] = 0xf0f1;
+   uidiv[2] = 0x4770;  /* bx lr */
+
+   idiv[0] = 0xfb90;  /* sdiv r0, r0, r1 */
+   idiv[1] = 0xf0f1;  
+   idiv[2] = 0x4770;  /* bx lr */
+   } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  
HWCAP_IDIVA)) {
+   extern char __aeabi_uidiv, __aeabi_idiv;
+   u32 *uidiv = (u32 *)__aeabi_uidiv;
+   u32 *idiv = (u32 *)__aeabi_idiv;
+
+   uidiv[0] = 0xe730f110;  /* udiv r0, r0, r1 */
+   uidiv[1] = 0xe12fff1e;  /* bx lr */
+
+   idiv[0] = 0xe710f110;  /* sdiv r0, r0, r1 */
+   idiv[1] = 0xe12fff1e;  /* bx lr */
+   }
+
/* LPAE implies atomic ldrd/strd instructions */
vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0)  0xf)  0;
if (vmsa = 5)


Nicolas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Russell King - ARM Linux
On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote:
 What about this patch which I think is currently your best option.  Note 
 it would need to use the facilities from asm/opcodes.h to make it endian 
 agnostic.
 
 diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
 index 6a1b8a81b1..379cffe4ab 100644
 --- a/arch/arm/kernel/setup.c
 +++ b/arch/arm/kernel/setup.c
 @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
   elf_hwcap |= HWCAP_IDIVT;
   }
  
 + /*
 +  * Patch our division routines with the corresponding opcode
 +  * if the hardware supports it.
 +  */
 + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  HWCAP_IDIVT)) {
 + extern char __aeabi_uidiv, __aeabi_idiv;
 + u16 *uidiv = (u16 *)__aeabi_uidiv;
 + u16 *idiv = (u16 *)__aeabi_idiv;
 +
 + uidiv[0] = 0xfbb0;  /* udiv r0, r0, r1 */
 + uidiv[1] = 0xf0f1;
 + uidiv[2] = 0x4770;  /* bx lr */
 +
 + idiv[0] = 0xfb90;  /* sdiv r0, r0, r1 */
 + idiv[1] = 0xf0f1;  
 + idiv[2] = 0x4770;  /* bx lr */
 + } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  
 HWCAP_IDIVA)) {
 + extern char __aeabi_uidiv, __aeabi_idiv;
 + u32 *uidiv = (u32 *)__aeabi_uidiv;
 + u32 *idiv = (u32 *)__aeabi_idiv;
 +
 + uidiv[0] = 0xe730f110;  /* udiv r0, r0, r1 */
 + uidiv[1] = 0xe12fff1e;  /* bx lr */
 +
 + idiv[0] = 0xe710f110;  /* sdiv r0, r0, r1 */
 + idiv[1] = 0xe12fff1e;  /* bx lr */
 + }

What about endianness, and what if XIP is enabled?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Ben Dooks

On 12/11/13 14:04, Russell King - ARM Linux wrote:

On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote:

What about this patch which I think is currently your best option.  Note
it would need to use the facilities from asm/opcodes.h to make it endian
agnostic.

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 6a1b8a81b1..379cffe4ab 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
elf_hwcap |= HWCAP_IDIVT;
}

+   /*
+* Patch our division routines with the corresponding opcode
+* if the hardware supports it.
+*/
+   if (IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  HWCAP_IDIVT)) {
+   extern char __aeabi_uidiv, __aeabi_idiv;
+   u16 *uidiv = (u16 *)__aeabi_uidiv;
+   u16 *idiv = (u16 *)__aeabi_idiv;
+
+   uidiv[0] = 0xfbb0;  /* udiv r0, r0, r1 */
+   uidiv[1] = 0xf0f1;
+   uidiv[2] = 0x4770;  /* bx lr */
+
+   idiv[0] = 0xfb90;  /* sdiv r0, r0, r1 */
+   idiv[1] = 0xf0f1;
+   idiv[2] = 0x4770;  /* bx lr */
+   } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  
HWCAP_IDIVA)) {
+   extern char __aeabi_uidiv, __aeabi_idiv;
+   u32 *uidiv = (u32 *)__aeabi_uidiv;
+   u32 *idiv = (u32 *)__aeabi_idiv;
+
+   uidiv[0] = 0xe730f110;  /* udiv r0, r0, r1 */
+   uidiv[1] = 0xe12fff1e;  /* bx lr */
+
+   idiv[0] = 0xe710f110;  /* sdiv r0, r0, r1 */
+   idiv[1] = 0xe12fff1e;  /* bx lr */
+   }


What about endianness, and what if XIP is enabled?


I was also going to add a note about endian-ness.

Given these are single instructoins for ARM, is it possible we could
make a table of all the callers and fix them up when we initialise
as we do for the SMP/UP case and for page-offset?


--
Ben Dooks   http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Russell King - ARM Linux wrote:

 On Tue, Nov 12, 2013 at 09:01:16AM -0500, Nicolas Pitre wrote:
  What about this patch which I think is currently your best option.  Note 
  it would need to use the facilities from asm/opcodes.h to make it endian 
  agnostic.
  
  diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
  index 6a1b8a81b1..379cffe4ab 100644
  --- a/arch/arm/kernel/setup.c
  +++ b/arch/arm/kernel/setup.c
  @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
  elf_hwcap |= HWCAP_IDIVT;
  }
   
  +   /*
  +* Patch our division routines with the corresponding opcode
  +* if the hardware supports it.
  +*/
  +   if (IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  HWCAP_IDIVT)) {
  +   extern char __aeabi_uidiv, __aeabi_idiv;
  +   u16 *uidiv = (u16 *)__aeabi_uidiv;
  +   u16 *idiv = (u16 *)__aeabi_idiv;
  +
  +   uidiv[0] = 0xfbb0;  /* udiv r0, r0, r1 */
  +   uidiv[1] = 0xf0f1;
  +   uidiv[2] = 0x4770;  /* bx lr */
  +
  +   idiv[0] = 0xfb90;  /* sdiv r0, r0, r1 */
  +   idiv[1] = 0xf0f1;  
  +   idiv[2] = 0x4770;  /* bx lr */
  +   } else if (!IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  
  HWCAP_IDIVA)) {
  +   extern char __aeabi_uidiv, __aeabi_idiv;
  +   u32 *uidiv = (u32 *)__aeabi_uidiv;
  +   u32 *idiv = (u32 *)__aeabi_idiv;
  +
  +   uidiv[0] = 0xe730f110;  /* udiv r0, r0, r1 */
  +   uidiv[1] = 0xe12fff1e;  /* bx lr */
  +
  +   idiv[0] = 0xe710f110;  /* sdiv r0, r0, r1 */
  +   idiv[1] = 0xe12fff1e;  /* bx lr */
  +   }
 
 What about endianness, and what if XIP is enabled?

Just as I said above the diff: this needs refined.

Obviously XIP can't use this and doesn't need it either as a XIP kernel 
should be optimized for the very platform it will run onto i.e. gcc 
should already emit those div opcodes inline if appropriate.


Nicolas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Måns Rullgård
Nicolas Pitre nicolas.pi...@linaro.org writes:

 What about this patch which I think is currently your best option.  Note 
 it would need to use the facilities from asm/opcodes.h to make it endian 
 agnostic.

 diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
 index 6a1b8a81b1..379cffe4ab 100644
 --- a/arch/arm/kernel/setup.c
 +++ b/arch/arm/kernel/setup.c
 @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
   elf_hwcap |= HWCAP_IDIVT;
   }

 + /*
 +  * Patch our division routines with the corresponding opcode
 +  * if the hardware supports it.
 +  */
 + if (IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  HWCAP_IDIVT)) {
 + extern char __aeabi_uidiv, __aeabi_idiv;

It would be safer to declare these as arrays of unspecified size.
Otherwise the compiler might do evil things with what to it looks like
out of bounds indexing.

There should also be some cache maintenance after this patching, or is
that already happening for some other reason?

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Ben Dooks wrote:

 Given these are single instructoins for ARM, is it possible we could
 make a table of all the callers and fix them up when we initialise
 as we do for the SMP/UP case and for page-offset?

Not really.  Calls to those functions are generated by the compiler 
implicitly when a divisor operand is used and therefore we cannot 
annotate those calls.  We'd have to use special accessors everywhere to 
replace the standard division operand (like we do for 64 by 32 bit 
divisions) but I doubt that people would accept that.

You cannot just scan the binary for the appropriate branch opcode either 
as you may turn up false positives in literal pools.


Nicolas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Måns Rullgård
Nicolas Pitre nicolas.pi...@linaro.org writes:

 On Tue, 12 Nov 2013, Ben Dooks wrote:

 Given these are single instructoins for ARM, is it possible we could
 make a table of all the callers and fix them up when we initialise
 as we do for the SMP/UP case and for page-offset?

 Not really.  Calls to those functions are generated by the compiler 
 implicitly when a divisor operand is used and therefore we cannot 
 annotate those calls.  We'd have to use special accessors everywhere to 
 replace the standard division operand (like we do for 64 by 32 bit 
 divisions) but I doubt that people would accept that.

It might be possible to extract this information from relocation tables.

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Måns Rullgård wrote:

 Nicolas Pitre nicolas.pi...@linaro.org writes:
 
  What about this patch which I think is currently your best option.  Note 
  it would need to use the facilities from asm/opcodes.h to make it endian 
  agnostic.
 
  diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
  index 6a1b8a81b1..379cffe4ab 100644
  --- a/arch/arm/kernel/setup.c
  +++ b/arch/arm/kernel/setup.c
  @@ -383,6 +383,34 @@ static void __init cpuid_init_hwcaps(void)
  elf_hwcap |= HWCAP_IDIVT;
  }
 
  +   /*
  +* Patch our division routines with the corresponding opcode
  +* if the hardware supports it.
  +*/
  +   if (IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  HWCAP_IDIVT)) {
  +   extern char __aeabi_uidiv, __aeabi_idiv;
 
 It would be safer to declare these as arrays of unspecified size.
 Otherwise the compiler might do evil things with what to it looks like
 out of bounds indexing.

Right.

 
 There should also be some cache maintenance after this patching, or is
 that already happening for some other reason?

This is so early during boot that the MMU isn't even fully initialized 
yet.  The cache will be flushed.


Nicolas

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Måns Rullgård wrote:

 Nicolas Pitre nicolas.pi...@linaro.org writes:
 
  On Tue, 12 Nov 2013, Ben Dooks wrote:
 
  Given these are single instructoins for ARM, is it possible we could
  make a table of all the callers and fix them up when we initialise
  as we do for the SMP/UP case and for page-offset?
 
  Not really.  Calls to those functions are generated by the compiler 
  implicitly when a divisor operand is used and therefore we cannot 
  annotate those calls.  We'd have to use special accessors everywhere to 
  replace the standard division operand (like we do for 64 by 32 bit 
  divisions) but I doubt that people would accept that.
 
 It might be possible to extract this information from relocation tables.

True, but only for individual .o files.  Once the linker puts them 
together the information is lost, and trying to infer what the linker 
has done is insane.

Filtering the compiler output to annotate idiv calls before it is 
assembled would probably be a better solution.

Is it worth it?  I'm not sure.


Nicolas

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Nicolas Pitre
On Tue, 12 Nov 2013, Nicolas Pitre wrote:

 On Tue, 12 Nov 2013, Måns Rullgård wrote:
 
  It might be possible to extract this information from relocation tables.
 
 True, but only for individual .o files.  Once the linker puts them 
 together the information is lost, and trying to infer what the linker 
 has done is insane.
 
 Filtering the compiler output to annotate idiv calls before it is 
 assembled would probably be a better solution.

Another solution is to patch the call site from within __aeabi_idiv at 
run time using lr.  That wouldn't work in the presence of tail call 
optimization though.

Again this might not be worth it and patching __aeabi_idiv instead might 
be a good enough compromize.


Nicolas

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-12 Thread Måns Rullgård
Nicolas Pitre nicolas.pi...@linaro.org writes:

 On Tue, 12 Nov 2013, Måns Rullgård wrote:

 Nicolas Pitre nicolas.pi...@linaro.org writes:
 
  On Tue, 12 Nov 2013, Ben Dooks wrote:
 
  Given these are single instructoins for ARM, is it possible we could
  make a table of all the callers and fix them up when we initialise
  as we do for the SMP/UP case and for page-offset?
 
  Not really.  Calls to those functions are generated by the compiler 
  implicitly when a divisor operand is used and therefore we cannot 
  annotate those calls.  We'd have to use special accessors everywhere to 
  replace the standard division operand (like we do for 64 by 32 bit 
  divisions) but I doubt that people would accept that.
 
 It might be possible to extract this information from relocation tables.

 True, but only for individual .o files.  Once the linker puts them 
 together the information is lost, and trying to infer what the linker 
 has done is insane.

 Filtering the compiler output to annotate idiv calls before it is 
 assembled would probably be a better solution.

OK, here's an extremely ugly hootenanny of a patch.  It seems to work on
an A7 Cubieboard2.  I would never suggest actually doing this, but maybe
it can be useful for comparing performance against the more palatable
solutions.

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 7397db6..cf1cd30 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -113,7 +113,7 @@ endif
 endif
 
 # Need -Uarm for gcc  3.x
-KBUILD_CFLAGS  +=$(CFLAGS_ABI) $(CFLAGS_THUMB2) $(arch-y) $(tune-y) $(call 
cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float 
-Uarm
+KBUILD_CFLAGS  +=$(CFLAGS_ABI) $(CFLAGS_THUMB2) $(arch-y) $(tune-y) $(call 
cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float 
-Uarm -include asm/divhack.h
 KBUILD_AFLAGS  +=$(CFLAGS_ABI) $(AFLAGS_THUMB2) $(arch-y) $(tune-y) -include 
asm/unified.h -msoft-float
 
 CHECKFLAGS += -D__arm__
diff --git a/arch/arm/include/asm/divhack.h b/arch/arm/include/asm/divhack.h
new file mode 100644
index 000..c750b78
--- /dev/null
+++ b/arch/arm/include/asm/divhack.h
@@ -0,0 +1,23 @@
+__asm__ (.macro dobl tgt   \n
+ .ifc \\tgt, __aeabi_idiv  \n
+ .L.sdiv.\\@:  \n
+ .pushsection .sdiv_tab.init, \a\, %progbits \n
+ .word.L.sdiv.\\@  \n
+ .popsection   \n
+ .endif\n
+ .ifc \\tgt, __aeabi_uidiv \n
+ .L.udiv.\\@:  \n
+ .pushsection .udiv_tab.init, \a\, %progbits \n
+ .word.L.udiv.\\@  \n
+ .popsection   \n
+ .endif\n
+ bl \\tgt  \n
+ .endm \n
+ .macro defbl  \n
+ .macro bl tgt \n
+ .purgem bl\n
+ dobl \\tgt\n
+ defbl \n
+ .endm \n
+ .endm \n
+ defbl \n);
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 067815c1..b3a3fe1 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -375,6 +375,18 @@ static void __init cpuid_init_hwcaps(void)
case 1:
elf_hwcap |= HWCAP_IDIVT;
}
+
+   if (!IS_ENABLED(CONFIG_THUMB2_KERNEL)  (elf_hwcap  HWCAP_IDIVA)) {
+   extern u32 __sdiv_tab_start, __sdiv_tab_end;
+   extern u32 __udiv_tab_start, __udiv_tab_end;
+   u32 *div;
+
+   for (div = __sdiv_tab_start; div  __sdiv_tab_end; div++)
+   *(u32 *)*div = 0xe710f110;
+
+   for (div = __udiv_tab_start; div  __udiv_tab_end; div++)
+   *(u32 *)*div = 0xe730f110;
+   }
 }
 
 static void __init feat_v6_fixup(void)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 43a31fb..3d5c103 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -176,6 +176,8 @@ SECTIONS
CON_INITCALL
SECURITY_INITCALL
INIT_RAM_FS
+   __sdiv_tab_start = .; *(.sdiv_tab.init); __sdiv_tab_end = .;
+   __udiv_tab_start = .; *(.udiv_tab.init); __udiv_tab_end = .;
}
 #ifndef 

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-11 Thread Stephen Boyd
On 11/10/13 23:46, Uwe Kleine-König wrote:
> Hello,
>
> On Fri, Nov 08, 2013 at 03:00:32PM -0800, Stephen Boyd wrote:
>> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
>> index bd454b0..38621729 100644
>> --- a/arch/arm/lib/Makefile
>> +++ b/arch/arm/lib/Makefile
>> @@ -15,6 +15,12 @@ lib-y := backtrace.o changebit.o csumipv6.o 
>> csumpartial.o   \
>> io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
>> call_with_stack.o
>>  
>> +lib-$(CONFIG_CPU_V7) += div-v7.o
> CPU_V7M could make use of that, too.
> (If you follow Nico's advice to use runtime patching I cannot test it
> for you on v7-M though as my machine has to use an XIP kernel.)
>

It already is runtime patching so I suspect you won't be able to test it
anyway. I suppose we need another config like MIGHT_HAVE_IDIV or
something that both v7 and v7M select?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-11 Thread Stephen Boyd
On 11/09/13 21:03, Nicolas Pitre wrote:
> Bah. NAK. We are doing runtime patching of the kernel for many
> many things already. So why not do the same here?

static keys are a form of runtime patching, albeit not as extreme as
you're suggesting.

>
> The obvious strategy is to simply overwrite the start of the existing 
> __aeabi_idiv code with the "sdiv r0, r0, r1" and "bx lr" opcodes.  
>
> Similarly for the unsigned case.

I was thinking the same thing when I wrote this, but I didn't know how
to tell the compiler to either inline this function or to let me inilne
an assembly stub with some section magic.

>
> That let you test the hardware capability only once during boot instead 
> of everytime a divide operation is performed.

The test for hardware capability really isn't done more than once during
boot. The assembly is like so at compile time

 <__aeabi_idiv>:
   0:   nop {0}
   4:   b   0 <___aeabi_idiv>
   8:   sdivr0, r0, r1
   c:   bx  lr

and after we test and find support for the instruction it will be
replaced with

 <__aeabi_idiv>:
   0:   b   8
   4:   b   0 <___aeabi_idiv>
   8:   sdivr0, r0, r1
   c:   bx  lr

Unfortunately we still have to jump to this function. It would be great
if we could inline this function at the call site but as I already said
I don't know how to do that.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-11 Thread Stephen Boyd
On 11/08/13 22:46, Matt Sealey wrote:
> On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd  wrote:
>> If we're running on a v7 ARM CPU, detect if the CPU supports the
>> sdiv/udiv instructions and replace the signed and unsigned
>> division library functions with an sdiv/udiv instruction.
>>
>> Running the perf messaging benchmark in pipe mode
>>
>>  $ perf bench sched messaging -p
>>
>> shows a modest improvement on my v7 CPU.
>>
>> before:
>> (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 
>> 5.683) / 10 = 5.805
>>
>> after:
>> (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 
>> 5.093) / 10 = 5.538
>>
>> (5.805 - 5.538) / 5.805 = 4.6%
> Even with the change to the output constraint suggested by Mans, you
> get absolutely identical benchmark results? There's a lot of variance
> in any case..

Yeah sorry I didn't run the testcase again to see if numbers changed
because I assumed one less instruction would be in the noise. I agree
there is a lot of variance so if you have any better
benchmarks/testcases please let me know.

>
> BTW has there been any evaluation of the penalty for the extra
> branching, or the performance hit for the ARMv7-without-division
> cases?

I haven't done any. I'll factor that in for the next round.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-11 Thread Stephen Boyd
On 11/08/13 22:46, Matt Sealey wrote:
 On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd sb...@codeaurora.org wrote:
 If we're running on a v7 ARM CPU, detect if the CPU supports the
 sdiv/udiv instructions and replace the signed and unsigned
 division library functions with an sdiv/udiv instruction.

 Running the perf messaging benchmark in pipe mode

  $ perf bench sched messaging -p

 shows a modest improvement on my v7 CPU.

 before:
 (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 
 5.683) / 10 = 5.805

 after:
 (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 
 5.093) / 10 = 5.538

 (5.805 - 5.538) / 5.805 = 4.6%
 Even with the change to the output constraint suggested by Mans, you
 get absolutely identical benchmark results? There's a lot of variance
 in any case..

Yeah sorry I didn't run the testcase again to see if numbers changed
because I assumed one less instruction would be in the noise. I agree
there is a lot of variance so if you have any better
benchmarks/testcases please let me know.


 BTW has there been any evaluation of the penalty for the extra
 branching, or the performance hit for the ARMv7-without-division
 cases?

I haven't done any. I'll factor that in for the next round.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-11 Thread Stephen Boyd
On 11/09/13 21:03, Nicolas Pitre wrote:
 Bah. NAK. We are doing runtime patching of the kernel for many
 many things already. So why not do the same here?

static keys are a form of runtime patching, albeit not as extreme as
you're suggesting.


 The obvious strategy is to simply overwrite the start of the existing 
 __aeabi_idiv code with the sdiv r0, r0, r1 and bx lr opcodes.  

 Similarly for the unsigned case.

I was thinking the same thing when I wrote this, but I didn't know how
to tell the compiler to either inline this function or to let me inilne
an assembly stub with some section magic.


 That let you test the hardware capability only once during boot instead 
 of everytime a divide operation is performed.

The test for hardware capability really isn't done more than once during
boot. The assembly is like so at compile time

 __aeabi_idiv:
   0:   nop {0}
   4:   b   0 ___aeabi_idiv
   8:   sdivr0, r0, r1
   c:   bx  lr

and after we test and find support for the instruction it will be
replaced with

 __aeabi_idiv:
   0:   b   8
   4:   b   0 ___aeabi_idiv
   8:   sdivr0, r0, r1
   c:   bx  lr

Unfortunately we still have to jump to this function. It would be great
if we could inline this function at the call site but as I already said
I don't know how to do that.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-11 Thread Stephen Boyd
On 11/10/13 23:46, Uwe Kleine-König wrote:
 Hello,

 On Fri, Nov 08, 2013 at 03:00:32PM -0800, Stephen Boyd wrote:
 diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
 index bd454b0..38621729 100644
 --- a/arch/arm/lib/Makefile
 +++ b/arch/arm/lib/Makefile
 @@ -15,6 +15,12 @@ lib-y := backtrace.o changebit.o csumipv6.o 
 csumpartial.o   \
 io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
 call_with_stack.o
  
 +lib-$(CONFIG_CPU_V7) += div-v7.o
 CPU_V7M could make use of that, too.
 (If you follow Nico's advice to use runtime patching I cannot test it
 for you on v7-M though as my machine has to use an XIP kernel.)


It already is runtime patching so I suspect you won't be able to test it
anyway. I suppose we need another config like MIGHT_HAVE_IDIV or
something that both v7 and v7M select?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-10 Thread Uwe Kleine-König
Hello,

On Fri, Nov 08, 2013 at 03:00:32PM -0800, Stephen Boyd wrote:
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index bd454b0..38621729 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -15,6 +15,12 @@ lib-y  := backtrace.o changebit.o csumipv6.o 
> csumpartial.o   \
>  io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
>  call_with_stack.o
>  
> +lib-$(CONFIG_CPU_V7) += div-v7.o
CPU_V7M could make use of that, too.
(If you follow Nico's advice to use runtime patching I cannot test it
for you on v7-M though as my machine has to use an XIP kernel.)

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-10 Thread Uwe Kleine-König
Hello,

On Fri, Nov 08, 2013 at 03:00:32PM -0800, Stephen Boyd wrote:
 diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
 index bd454b0..38621729 100644
 --- a/arch/arm/lib/Makefile
 +++ b/arch/arm/lib/Makefile
 @@ -15,6 +15,12 @@ lib-y  := backtrace.o changebit.o csumipv6.o 
 csumpartial.o   \
  io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
  call_with_stack.o
  
 +lib-$(CONFIG_CPU_V7) += div-v7.o
CPU_V7M could make use of that, too.
(If you follow Nico's advice to use runtime patching I cannot test it
for you on v7-M though as my machine has to use an XIP kernel.)

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-09 Thread Nicolas Pitre
On Fri, 8 Nov 2013, Stephen Boyd wrote:

> If we're running on a v7 ARM CPU, detect if the CPU supports the
> sdiv/udiv instructions and replace the signed and unsigned
> division library functions with an sdiv/udiv instruction.
> 
> Running the perf messaging benchmark in pipe mode
> 
>  $ perf bench sched messaging -p
> 
> shows a modest improvement on my v7 CPU.
> 
> before:
> (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 
> 5.683) / 10 = 5.805
> 
> after:
> (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 
> 5.093) / 10 = 5.538
> 
> (5.805 - 5.538) / 5.805 = 4.6%
> 
> Signed-off-by: Stephen Boyd 

Bah. NAK.

We are doing runtime patching of the kernel for many many things
already.  So why not do the same here?

The obvious strategy is to simply overwrite the start of the existing 
__aeabi_idiv code with the "sdiv r0, r0, r1" and "bx lr" opcodes.  

Similarly for the unsigned case.

That let you test the hardware capability only once during boot instead 
of everytime a divide operation is performed.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-09 Thread Måns Rullgård
Matt Sealey  writes:

> BTW has there been any evaluation of the penalty for the extra
> branching, or the performance hit for the ARMv7-without-division
> cases?

The branches themselves probably have minimal overhead.  There will
however be code to preserve call-clobbered registers (and move the
values to/from r0/r1) that would not be needed if the div instructions
were done inline (obviously such a kernel could only run on hardware
with division support).

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-09 Thread Måns Rullgård
Matt Sealey n...@bakuhatsu.net writes:

 BTW has there been any evaluation of the penalty for the extra
 branching, or the performance hit for the ARMv7-without-division
 cases?

The branches themselves probably have minimal overhead.  There will
however be code to preserve call-clobbered registers (and move the
values to/from r0/r1) that would not be needed if the div instructions
were done inline (obviously such a kernel could only run on hardware
with division support).

-- 
Måns Rullgård
m...@mansr.com
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-09 Thread Nicolas Pitre
On Fri, 8 Nov 2013, Stephen Boyd wrote:

 If we're running on a v7 ARM CPU, detect if the CPU supports the
 sdiv/udiv instructions and replace the signed and unsigned
 division library functions with an sdiv/udiv instruction.
 
 Running the perf messaging benchmark in pipe mode
 
  $ perf bench sched messaging -p
 
 shows a modest improvement on my v7 CPU.
 
 before:
 (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 
 5.683) / 10 = 5.805
 
 after:
 (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 
 5.093) / 10 = 5.538
 
 (5.805 - 5.538) / 5.805 = 4.6%
 
 Signed-off-by: Stephen Boyd sb...@codeaurora.org

Bah. NAK.

We are doing runtime patching of the kernel for many many things
already.  So why not do the same here?

The obvious strategy is to simply overwrite the start of the existing 
__aeabi_idiv code with the sdiv r0, r0, r1 and bx lr opcodes.  

Similarly for the unsigned case.

That let you test the hardware capability only once during boot instead 
of everytime a divide operation is performed.


Nicolas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-08 Thread Matt Sealey
On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd  wrote:
> If we're running on a v7 ARM CPU, detect if the CPU supports the
> sdiv/udiv instructions and replace the signed and unsigned
> division library functions with an sdiv/udiv instruction.
>
> Running the perf messaging benchmark in pipe mode
>
>  $ perf bench sched messaging -p
>
> shows a modest improvement on my v7 CPU.
>
> before:
> (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 
> 5.683) / 10 = 5.805
>
> after:
> (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 
> 5.093) / 10 = 5.538
>
> (5.805 - 5.538) / 5.805 = 4.6%

Even with the change to the output constraint suggested by Mans, you
get absolutely identical benchmark results? There's a lot of variance
in any case..

BTW has there been any evaluation of the penalty for the extra
branching, or the performance hit for the ARMv7-without-division
cases?

Ta,
Matt Sealey 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-08 Thread Stephen Boyd
If we're running on a v7 ARM CPU, detect if the CPU supports the
sdiv/udiv instructions and replace the signed and unsigned
division library functions with an sdiv/udiv instruction.

Running the perf messaging benchmark in pipe mode

 $ perf bench sched messaging -p

shows a modest improvement on my v7 CPU.

before:
(5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 5.683) 
/ 10 = 5.805

after:
(4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 5.093) 
/ 10 = 5.538

(5.805 - 5.538) / 5.805 = 4.6%

Signed-off-by: Stephen Boyd 
---

Changes since v1:
 * Replace signed with unsigned in unsigned divide function
 * drop & in inline assembly
 * Use IS_ENABLED() instead of #ifdef
 * Pass DIV_V7 into lib1funcs.S instead of depending on ZIMAGE or CPU_V7

 arch/arm/kernel/setup.c  | 13 ++-
 arch/arm/lib/Makefile|  6 +
 arch/arm/lib/div-v7.c| 58 
 arch/arm/lib/lib1funcs.S | 16 +
 4 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/lib/div-v7.c

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 0e1e2b3..f9e577a 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -365,9 +366,11 @@ void __init early_print(const char *str, ...)
printk("%s", buf);
 }
 
+struct static_key cpu_has_idiv = STATIC_KEY_INIT_FALSE;
+
 static void __init cpuid_init_hwcaps(void)
 {
-   unsigned int divide_instrs, vmsa;
+   unsigned int divide_instrs, vmsa, idiv_mask;
 
if (cpu_architecture() < CPU_ARCH_ARMv7)
return;
@@ -381,6 +384,14 @@ static void __init cpuid_init_hwcaps(void)
elf_hwcap |= HWCAP_IDIVT;
}
 
+   if (IS_ENABLED(CONFIG_THUMB2_KERNEL))
+   idiv_mask = HWCAP_IDIVT;
+   else
+   idiv_mask = HWCAP_IDIVA;
+
+   if (elf_hwcap & idiv_mask)
+   static_key_slow_inc(_has_idiv);
+
/* LPAE implies atomic ldrd/strd instructions */
vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0) & 0xf) >> 0;
if (vmsa >= 5)
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index bd454b0..38621729 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -15,6 +15,12 @@ lib-y:= backtrace.o changebit.o csumipv6.o 
csumpartial.o   \
   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
   call_with_stack.o
 
+lib-$(CONFIG_CPU_V7) += div-v7.o
+CFLAGS_div-v7.o := -march=armv7-a
+ifeq ($(CONFIG_CPU_V7),y)
+  AFLAGS_lib1funcs.o := -DDIV_V7
+endif
+
 mmu-y  := clear_user.o copy_page.o getuser.o putuser.o
 
 # the code in uaccess.S is not preemption safe and
diff --git a/arch/arm/lib/div-v7.c b/arch/arm/lib/div-v7.c
new file mode 100644
index 000..e20945a
--- /dev/null
+++ b/arch/arm/lib/div-v7.c
@@ -0,0 +1,58 @@
+/* Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+extern int ___aeabi_idiv(int, int);
+extern unsigned ___aeabi_uidiv(int, int);
+
+extern struct static_key cpu_has_idiv;
+
+int __aeabi_idiv(int numerator, int denominator)
+{
+   if (static_key_false(_has_idiv)) {
+   int ret;
+
+   asm volatile (
+   ".arch_extension idiv\n"
+   "sdiv %0, %1, %2"
+   : "=r" (ret)
+   : "r" (numerator), "r" (denominator));
+
+   return ret;
+   }
+
+   return ___aeabi_idiv(numerator, denominator);
+}
+
+int __divsi3(int numerator, int denominator)
+   __attribute__((alias("__aeabi_idiv")));
+
+unsigned __aeabi_uidiv(unsigned numerator, unsigned denominator)
+{
+   if (static_key_false(_has_idiv)) {
+   unsigned ret;
+
+   asm volatile (
+   ".arch_extension idiv\n"
+   "udiv %0, %1, %2"
+   : "=r" (ret)
+   : "r" (numerator), "r" (denominator));
+
+   return ret;
+   }
+
+   return ___aeabi_uidiv(numerator, denominator);
+}
+
+unsigned __udivsi3(unsigned numerator, unsigned denominator)
+   __attribute__((alias("__aeabi_uidiv")));
diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S
index c562f64..82bbcc7 100644
--- a/arch/arm/lib/lib1funcs.S
+++ b/arch/arm/lib/lib1funcs.S
@@ -205,8 +205,12 @@ Boston, MA 02111-1307, USA.  */
 .endm
 
 
+#ifdef DIV_V7
+ENTRY(___aeabi_uidiv)
+#else
 ENTRY(__udivsi3)
 

[PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-08 Thread Stephen Boyd
If we're running on a v7 ARM CPU, detect if the CPU supports the
sdiv/udiv instructions and replace the signed and unsigned
division library functions with an sdiv/udiv instruction.

Running the perf messaging benchmark in pipe mode

 $ perf bench sched messaging -p

shows a modest improvement on my v7 CPU.

before:
(5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 5.683) 
/ 10 = 5.805

after:
(4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 5.093) 
/ 10 = 5.538

(5.805 - 5.538) / 5.805 = 4.6%

Signed-off-by: Stephen Boyd sb...@codeaurora.org
---

Changes since v1:
 * Replace signed with unsigned in unsigned divide function
 * drop  in inline assembly
 * Use IS_ENABLED() instead of #ifdef
 * Pass DIV_V7 into lib1funcs.S instead of depending on ZIMAGE or CPU_V7

 arch/arm/kernel/setup.c  | 13 ++-
 arch/arm/lib/Makefile|  6 +
 arch/arm/lib/div-v7.c| 58 
 arch/arm/lib/lib1funcs.S | 16 +
 4 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/lib/div-v7.c

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 0e1e2b3..f9e577a 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -30,6 +30,7 @@
 #include linux/bug.h
 #include linux/compiler.h
 #include linux/sort.h
+#include linux/static_key.h
 
 #include asm/unified.h
 #include asm/cp15.h
@@ -365,9 +366,11 @@ void __init early_print(const char *str, ...)
printk(%s, buf);
 }
 
+struct static_key cpu_has_idiv = STATIC_KEY_INIT_FALSE;
+
 static void __init cpuid_init_hwcaps(void)
 {
-   unsigned int divide_instrs, vmsa;
+   unsigned int divide_instrs, vmsa, idiv_mask;
 
if (cpu_architecture()  CPU_ARCH_ARMv7)
return;
@@ -381,6 +384,14 @@ static void __init cpuid_init_hwcaps(void)
elf_hwcap |= HWCAP_IDIVT;
}
 
+   if (IS_ENABLED(CONFIG_THUMB2_KERNEL))
+   idiv_mask = HWCAP_IDIVT;
+   else
+   idiv_mask = HWCAP_IDIVA;
+
+   if (elf_hwcap  idiv_mask)
+   static_key_slow_inc(cpu_has_idiv);
+
/* LPAE implies atomic ldrd/strd instructions */
vmsa = (read_cpuid_ext(CPUID_EXT_MMFR0)  0xf)  0;
if (vmsa = 5)
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index bd454b0..38621729 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -15,6 +15,12 @@ lib-y:= backtrace.o changebit.o csumipv6.o 
csumpartial.o   \
   io-readsb.o io-writesb.o io-readsl.o io-writesl.o  \
   call_with_stack.o
 
+lib-$(CONFIG_CPU_V7) += div-v7.o
+CFLAGS_div-v7.o := -march=armv7-a
+ifeq ($(CONFIG_CPU_V7),y)
+  AFLAGS_lib1funcs.o := -DDIV_V7
+endif
+
 mmu-y  := clear_user.o copy_page.o getuser.o putuser.o
 
 # the code in uaccess.S is not preemption safe and
diff --git a/arch/arm/lib/div-v7.c b/arch/arm/lib/div-v7.c
new file mode 100644
index 000..e20945a
--- /dev/null
+++ b/arch/arm/lib/div-v7.c
@@ -0,0 +1,58 @@
+/* Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include linux/static_key.h
+
+extern int ___aeabi_idiv(int, int);
+extern unsigned ___aeabi_uidiv(int, int);
+
+extern struct static_key cpu_has_idiv;
+
+int __aeabi_idiv(int numerator, int denominator)
+{
+   if (static_key_false(cpu_has_idiv)) {
+   int ret;
+
+   asm volatile (
+   .arch_extension idiv\n
+   sdiv %0, %1, %2
+   : =r (ret)
+   : r (numerator), r (denominator));
+
+   return ret;
+   }
+
+   return ___aeabi_idiv(numerator, denominator);
+}
+
+int __divsi3(int numerator, int denominator)
+   __attribute__((alias(__aeabi_idiv)));
+
+unsigned __aeabi_uidiv(unsigned numerator, unsigned denominator)
+{
+   if (static_key_false(cpu_has_idiv)) {
+   unsigned ret;
+
+   asm volatile (
+   .arch_extension idiv\n
+   udiv %0, %1, %2
+   : =r (ret)
+   : r (numerator), r (denominator));
+
+   return ret;
+   }
+
+   return ___aeabi_uidiv(numerator, denominator);
+}
+
+unsigned __udivsi3(unsigned numerator, unsigned denominator)
+   __attribute__((alias(__aeabi_uidiv)));
diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S
index c562f64..82bbcc7 100644
--- a/arch/arm/lib/lib1funcs.S
+++ b/arch/arm/lib/lib1funcs.S
@@ -205,8 +205,12 @@ Boston, MA 02111-1307, USA. 

Re: [PATCH v2] ARM: Use udiv/sdiv for __aeabi_{u}idiv library functions

2013-11-08 Thread Matt Sealey
On Fri, Nov 8, 2013 at 5:00 PM, Stephen Boyd sb...@codeaurora.org wrote:
 If we're running on a v7 ARM CPU, detect if the CPU supports the
 sdiv/udiv instructions and replace the signed and unsigned
 division library functions with an sdiv/udiv instruction.

 Running the perf messaging benchmark in pipe mode

  $ perf bench sched messaging -p

 shows a modest improvement on my v7 CPU.

 before:
 (5.060 + 5.960 + 5.971 + 5.643 + 6.029 + 5.665 + 6.050 + 5.870 + 6.117 + 
 5.683) / 10 = 5.805

 after:
 (4.884 + 5.549 + 5.749 + 6.001 + 5.460 + 5.103 + 5.956 + 6.112 + 5.468 + 
 5.093) / 10 = 5.538

 (5.805 - 5.538) / 5.805 = 4.6%

Even with the change to the output constraint suggested by Mans, you
get absolutely identical benchmark results? There's a lot of variance
in any case..

BTW has there been any evaluation of the penalty for the extra
branching, or the performance hit for the ARMv7-without-division
cases?

Ta,
Matt Sealey n...@bakuhatsu.net
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/