Re: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
On 07/07/15 16:23, Tejas Belagod wrote: > Ping! > I've had a look at this (sorry for the delay). I think it's mostly OK, but I have two comments to make. 1) It's quite hard to understand the algorithm and there are no comments to aid understanding (to be fair, there aren't many comments on the other algorithms either). 2) It looks as though the new code calculates both the division and the modulus simultaneously. As such, it means that the the divmod code should be simplified to share the same code as the division function itself (saving a few bytes, and more importantly several cycles when modulus is required). Can you please run some tests to validate 2) above? and if correct adjust the code to handle this case. I think that will go some way to mitigating the code size increase from the new implementation. R. > On 30/04/15 10:40, Hale Wang wrote: >>> -Original Message- >>> From: Hale Wang [mailto:hale.w...@arm.com] >>> Sent: Monday, February 09, 2015 9:54 AM >>> To: Richard Earnshaw >>> Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann >>> Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for >>> armv6-m >>> >>> Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html. >>> >> >> Ping for trunk. Is it ok for trunk now? >> >> Thanks, >> Hale -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Hale Wang Sent: Friday, December 12, 2014 9:36 AM To: gcc-patches Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6- m Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk? -Hale > -Original Message- > From: Joey Ye [mailto:joey.ye...@gmail.com] > Sent: Thursday, November 27, 2014 10:01 AM > To: Hale Wang > Cc: gcc-patches > Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for > armv6-m > > OK applying to arm/embedded-4_9-branch, though you still need > maintainer approval into trunk. > > - Joey > > On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang wrote: >> Hi, >> >> This patch ports the aeabi_idiv routine from Linaro Cortex-Strings >> (https://git.linaro.org/toolchain/cortex-strings.git), which was >> contributed by ARM under Free BSD license. >> >> The new aeabi_idiv routine is used to replace the one in >> libgcc/config/arm/lib1funcs.S. This replacement happens within the >> Thumb1 wrapper. The new routine is under LGPLv3 license. >> >> The main advantage of this version is that it can improve the >> performance of the aeabi_idiv function for Thumb1. This solution >> will also increase the code size. So it will only be used if >> __OPTIMIZE_SIZE__ is > not defined. >> >> Make check passed for armv6-m. >> >> OK for trunk? >> >> Thanks, >> Hale Wang >> >> libgcc/ChangeLog: >> >> 2014-11-26 Hale Wang >> >> * config/arm/lib1funcs.S: Add new wrapper. >> >> === >> diff --git a/libgcc/config/arm/lib1funcs.S >> b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644 >> --- a/libgcc/config/arm/lib1funcs.S >> +++ b/libgcc/config/arm/lib1funcs.S >> @@ -306,34 +306,12 @@ LSYM(Lend_fde): >> #ifdef __ARM_EABI__ >> .macro THUMB_LDIV0 name signed >> #if defined(__ARM_ARCH_6M__) >> - .ifc \signed, unsigned >> - cmp r0, #0 >> - beq 1f >> - mov r0, #0 >> - mvn r0, r0 @ 0x >> -1: >> - .else >> - cmp r0, #0 >> - beq 2f >> - blt 3f >> + >> + push{r0, lr} >> mov r0, #0 >> - mvn r0, r0 >> - lsr r0, r0, #1 @ 0x7fff >> - b 2f >> -3: mov r0, #0x80 >> - lsl r0, r0, #24 @ 0x8000 >> -2: >> - .endif >> - push{r0, r1, r2} >> - ldr r0, 4f >> - adr r1, 4f >> - add r0, r1 >> - str r0, [sp, #8] >> - @ We know we are not on armv4t, so pop pc is safe. >> - pop {r0, r1, pc} >> - .align 2 >> -4: >> - .word __aeabi_idiv0 - 4b >> + bl SYM(__aeabi_idiv0) >> + pop {r1, pc} >> + >> #elif defined(__thumb2__) >> .syntax unified >> .ifc \signed, unsigned >> @@ -927,7 +905,158 @@ LSYM(Lover7): >> add dividend, work >> .endif >> LSYM(Lgot_result): >> -.endm >> +.endm >> + >> +#if defined(__prefer_thumb__) > && !defined(__OPTIMIZE_SIZE__) .macro >> +BranchToDiv n, label >> + lsr curbit, dividend, \n >> + cmp curbit, divisor >> +
Re: RE: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
Ping! On 30/04/15 10:40, Hale Wang wrote: -Original Message- From: Hale Wang [mailto:hale.w...@arm.com] Sent: Monday, February 09, 2015 9:54 AM To: Richard Earnshaw Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html. Ping for trunk. Is it ok for trunk now? Thanks, Hale -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Hale Wang Sent: Friday, December 12, 2014 9:36 AM To: gcc-patches Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6- m Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk? -Hale -Original Message- From: Joey Ye [mailto:joey.ye...@gmail.com] Sent: Thursday, November 27, 2014 10:01 AM To: Hale Wang Cc: gcc-patches Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m OK applying to arm/embedded-4_9-branch, though you still need maintainer approval into trunk. - Joey On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang wrote: Hi, This patch ports the aeabi_idiv routine from Linaro Cortex-Strings (https://git.linaro.org/toolchain/cortex-strings.git), which was contributed by ARM under Free BSD license. The new aeabi_idiv routine is used to replace the one in libgcc/config/arm/lib1funcs.S. This replacement happens within the Thumb1 wrapper. The new routine is under LGPLv3 license. The main advantage of this version is that it can improve the performance of the aeabi_idiv function for Thumb1. This solution will also increase the code size. So it will only be used if __OPTIMIZE_SIZE__ is not defined. Make check passed for armv6-m. OK for trunk? Thanks, Hale Wang libgcc/ChangeLog: 2014-11-26 Hale Wang * config/arm/lib1funcs.S: Add new wrapper. === diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644 --- a/libgcc/config/arm/lib1funcs.S +++ b/libgcc/config/arm/lib1funcs.S @@ -306,34 +306,12 @@ LSYM(Lend_fde): #ifdef __ARM_EABI__ .macro THUMB_LDIV0 name signed #if defined(__ARM_ARCH_6M__) - .ifc \signed, unsigned - cmp r0, #0 - beq 1f - mov r0, #0 - mvn r0, r0 @ 0x -1: - .else - cmp r0, #0 - beq 2f - blt 3f + + push{r0, lr} mov r0, #0 - mvn r0, r0 - lsr r0, r0, #1 @ 0x7fff - b 2f -3: mov r0, #0x80 - lsl r0, r0, #24 @ 0x8000 -2: - .endif - push{r0, r1, r2} - ldr r0, 4f - adr r1, 4f - add r0, r1 - str r0, [sp, #8] - @ We know we are not on armv4t, so pop pc is safe. - pop {r0, r1, pc} - .align 2 -4: - .word __aeabi_idiv0 - 4b + bl SYM(__aeabi_idiv0) + pop {r1, pc} + #elif defined(__thumb2__) .syntax unified .ifc \signed, unsigned @@ -927,7 +905,158 @@ LSYM(Lover7): add dividend, work .endif LSYM(Lgot_result): -.endm +.endm + +#if defined(__prefer_thumb__) && !defined(__OPTIMIZE_SIZE__) .macro +BranchToDiv n, label + lsr curbit, dividend, \n + cmp curbit, divisor + blo \label +.endm + +.macro DoDiv n + lsr curbit, dividend, \n + cmp curbit, divisor + bcc 1f + lsl curbit, divisor, \n + sub dividend, dividend, curbit + +1: adc result, result +.endm + +.macro THUMB1_Div_Positive + mov result, #0 + BranchToDiv #1, LSYM(Lthumb1_div1) + BranchToDiv #4, LSYM(Lthumb1_div4) + BranchToDiv #8, LSYM(Lthumb1_div8) + BranchToDiv #12, LSYM(Lthumb1_div12) + BranchToDiv #16, LSYM(Lthumb1_div16) +LSYM(Lthumb1_div_large_positive): + mov result, #0xff + lsl divisor, divisor, #8 + rev result, result + lsr curbit, dividend, #16 + cmp curbit, divisor + blo 1f + asr result, #8 + lsl divisor, divisor, #8 + beq LSYM(Ldivbyzero_waypoint) + +1: lsr curbit, dividend, #12 + cmp curbit, divisor + blo LSYM(Lthumb1_div12) + b LSYM(Lthumb1_div16) +LSYM(Lthumb1_div_loop): + lsr divisor, divisor, #8 +LSYM(Lthumb1_div16): + Dodiv #15 + Dodiv #14 + Dodiv #13 + Dodiv #12 +LSYM(Lthumb1_div12): + Dodiv #11 + Dodiv #10 + Dodiv #9 + Dodiv #8 + bcs LSYM(Lthumb1_div_loop) +LSYM(Lthumb1_div8): + Dodiv #7 + Dodiv #6 + Dodiv #5 +LSYM(Lthumb1_div5): + Dodiv #4 +LSYM(Lthumb1_div4): + Dodiv #3 +LSYM(Lthumb1_div3): + Dodiv #2 +LSYM(Lthumb1_div2): + Dodiv #1 +LSYM(Lthumb1_div1): + sub divisor, dividend, divisor +
RE: [Ping^3] [PATCH, ARM, libgcc] New aeabi_idiv function for armv6-m
> -Original Message- > From: Hale Wang [mailto:hale.w...@arm.com] > Sent: Monday, February 09, 2015 9:54 AM > To: Richard Earnshaw > Cc: Hale Wang; gcc-patches; Matthew Gretton-Dann > Subject: RE: [Ping^2] [PATCH, ARM, libgcc] New aeabi_idiv function for > armv6-m > > Ping https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01059.html. > Ping for trunk. Is it ok for trunk now? Thanks, Hale > > -Original Message- > > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > ow...@gcc.gnu.org] On Behalf Of Hale Wang > > Sent: Friday, December 12, 2014 9:36 AM > > To: gcc-patches > > Subject: RE: [Ping] [PATCH, ARM, libgcc] New aeabi_idiv function for > > armv6- m > > > > Ping? Already applied to arm/embedded-4_9-branch, is it OK for trunk? > > > > -Hale > > > > > -Original Message- > > > From: Joey Ye [mailto:joey.ye...@gmail.com] > > > Sent: Thursday, November 27, 2014 10:01 AM > > > To: Hale Wang > > > Cc: gcc-patches > > > Subject: Re: [PATCH, ARM, libgcc] New aeabi_idiv function for > > > armv6-m > > > > > > OK applying to arm/embedded-4_9-branch, though you still need > > > maintainer approval into trunk. > > > > > > - Joey > > > > > > On Wed, Nov 26, 2014 at 11:43 AM, Hale Wang > > wrote: > > > > Hi, > > > > > > > > This patch ports the aeabi_idiv routine from Linaro Cortex-Strings > > > > (https://git.linaro.org/toolchain/cortex-strings.git), which was > > > > contributed by ARM under Free BSD license. > > > > > > > > The new aeabi_idiv routine is used to replace the one in > > > > libgcc/config/arm/lib1funcs.S. This replacement happens within the > > > > Thumb1 wrapper. The new routine is under LGPLv3 license. > > > > > > > > The main advantage of this version is that it can improve the > > > > performance of the aeabi_idiv function for Thumb1. This solution > > > > will also increase the code size. So it will only be used if > > > > __OPTIMIZE_SIZE__ is > > > not defined. > > > > > > > > Make check passed for armv6-m. > > > > > > > > OK for trunk? > > > > > > > > Thanks, > > > > Hale Wang > > > > > > > > libgcc/ChangeLog: > > > > > > > > 2014-11-26 Hale Wang > > > > > > > > * config/arm/lib1funcs.S: Add new wrapper. > > > > > > > > === > > > > diff --git a/libgcc/config/arm/lib1funcs.S > > > > b/libgcc/config/arm/lib1funcs.S index b617137..de66c81 100644 > > > > --- a/libgcc/config/arm/lib1funcs.S > > > > +++ b/libgcc/config/arm/lib1funcs.S > > > > @@ -306,34 +306,12 @@ LSYM(Lend_fde): > > > > #ifdef __ARM_EABI__ > > > > .macro THUMB_LDIV0 name signed > > > > #if defined(__ARM_ARCH_6M__) > > > > - .ifc \signed, unsigned > > > > - cmp r0, #0 > > > > - beq 1f > > > > - mov r0, #0 > > > > - mvn r0, r0 @ 0x > > > > -1: > > > > - .else > > > > - cmp r0, #0 > > > > - beq 2f > > > > - blt 3f > > > > + > > > > + push{r0, lr} > > > > mov r0, #0 > > > > - mvn r0, r0 > > > > - lsr r0, r0, #1 @ 0x7fff > > > > - b 2f > > > > -3: mov r0, #0x80 > > > > - lsl r0, r0, #24 @ 0x8000 > > > > -2: > > > > - .endif > > > > - push{r0, r1, r2} > > > > - ldr r0, 4f > > > > - adr r1, 4f > > > > - add r0, r1 > > > > - str r0, [sp, #8] > > > > - @ We know we are not on armv4t, so pop pc is safe. > > > > - pop {r0, r1, pc} > > > > - .align 2 > > > > -4: > > > > - .word __aeabi_idiv0 - 4b > > > > + bl SYM(__aeabi_idiv0) > > > > + pop {r1, pc} > > > > + > > > > #elif defined(__thumb2__) > > > > .syntax unified > > > > .ifc \signed, unsigned > > > > @@ -927,7 +905,158 @@ LSYM(Lover7): > > > > add dividend, work > > > >.endif > > > > LSYM(Lgot_result): > > > > -.endm > > > > +.endm > > > > + > > > > +#if defined(__prefer_thumb__) > > > && !defined(__OPTIMIZE_SIZE__) .macro > > > > +BranchToDiv n, label > > > > + lsr curbit, dividend, \n > > > > + cmp curbit, divisor > > > > + blo \label > > > > +.endm > > > > + > > > > +.macro DoDiv n > > > > + lsr curbit, dividend, \n > > > > + cmp curbit, divisor > > > > + bcc 1f > > > > + lsl curbit, divisor, \n > > > > + sub dividend, dividend, curbit > > > > + > > > > +1: adc result, result > > > > +.endm > > > > + > > > > +.macro THUMB1_Div_Positive > > > > + mov result, #0 > > > > + BranchToDiv #1, LSYM(Lthumb1_div1) > > > > + BranchToDiv #4, LSYM(Lthumb1_div4) > > > > + BranchToDiv #8, LSYM(Lthumb1_div8) > > > > + BranchToDiv #12, LSYM(Lthumb1_div12) > > > > + BranchToDiv #16, LSYM(Lthumb1_div16) > > > > +LSYM(Lthumb1_div_large_positive): > > > > + mov result, #0xff > > > > + lsl divisor, divisor, #8 > > > > + rev result, result >