On Wed, 2 Dec 2020 at 04:31, Daniel Engel <lib...@danielengel.com> wrote: > > Hi Christophe, > > On Thu, Nov 26, 2020, at 1:14 AM, Christophe Lyon wrote: > > Hi, > > > > On Fri, 13 Nov 2020 at 00:03, Daniel Engel <lib...@danielengel.com> wrote: > > > > > > Hi, > > > > > > This patch adds an efficient assembly-language implementation of IEEE- > > > 754 compliant floating point routines for Cortex M0 EABI (v6m, thumb- > > > 1). This is the libgcc portion of a larger library originally > > > described in 2018: > > > > > > https://gcc.gnu.org/legacy-ml/gcc/2018-11/msg00043.html > > > > > > Since that time, I've separated the libm functions for submission to > > > newlib. The remaining libgcc functions in the attached patch have > > > the following characteristics: > > > > > > Function(s) Size (bytes) Cycles > > > Stack Accuracy > > > __clzsi2 42 23 0 > > > exact > > > __clzsi2 (OPTIMIZE_SIZE) 22 55 0 > > > exact > > > __clzdi2 8+__clzsi2 4+__clzsi2 0 > > > exact > > > > > > __umulsidi3 44 24 0 > > > exact > > > __mulsidi3 30+__umulsidi3 24+__umulsidi3 8 > > > exact > > > __muldi3 (__aeabi_lmul) 10+__umulsidi3 6+__umulsidi3 0 > > > exact > > > __ashldi3 (__aeabi_llsl) 22 13 0 > > > exact > > > __lshrdi3 (__aeabi_llsr) 22 13 0 > > > exact > > > __ashrdi3 (__aeabi_lasr) 22 13 0 > > > exact > > > > > > __aeabi_lcmp 20 13 0 > > > exact > > > __aeabi_ulcmp 16 10 0 > > > exact > > > > > > __udivsi3 (__aeabi_uidiv) 56 72 – 385 0 > > > < 1 lsb > > > __divsi3 (__aeabi_idiv) 38+__udivsi3 26+__udivsi3 8 > > > < 1 lsb > > > __udivdi3 (__aeabi_uldiv) 164 103 – 1394 > > > 16 < 1 lsb > > > __udivdi3 (OPTIMIZE_SIZE) 142 120 – 1392 > > > 16 < 1 lsb > > > __divdi3 (__aeabi_ldiv) 54+__udivdi3 36+__udivdi3 > > > 32 < 1 lsb > > > > > > __shared_float 178 > > > __shared_float (OPTIMIZE_SIZE) 154 > > > > > > __addsf3 (__aeabi_fadd) 116+__shared_float 31 – 76 8 > > > <= 0.5 ulp > > > __addsf3 (OPTIMIZE_SIZE) 112+__shared_float 74 8 > > > <= 0.5 ulp > > > __subsf3 (__aeabi_fsub) 8+__addsf3 6+__addsf3 8 > > > <= 0.5 ulp > > > __aeabi_frsub 8+__addsf3 6+__addsf3 8 > > > <= 0.5 ulp > > > __mulsf3 (__aeabi_fmul) 112+__shared_float 73 – 97 8 > > > <= 0.5 ulp > > > __mulsf3 (OPTIMIZE_SIZE) 96+__shared_float 93 8 > > > <= 0.5 ulp > > > __divsf3 (__aeabi_fdiv) 132+__shared_float 83 – 361 8 > > > <= 0.5 ulp > > > __divsf3 (OPTIMIZE_SIZE) 120+__shared_float 263 – 359 8 > > > <= 0.5 ulp > > > > > > __cmpsf2/__lesf2/__ltsf2 72 33 0 > > > exact > > > __eqsf2/__nesf2 4+__cmpsf2 3+__cmpsf2 0 > > > exact > > > __gesf2/__gesf2 4+__cmpsf2 3+__cmpsf2 0 > > > exact > > > __unordsf2 (__aeabi_fcmpun) 4+__cmpsf2 3+__cmpsf2 0 > > > exact > > > __aeabi_fcmpeq 4+__cmpsf2 3+__cmpsf2 0 > > > exact > > > __aeabi_fcmpne 4+__cmpsf2 3+__cmpsf2 0 > > > exact > > > __aeabi_fcmplt 4+__cmpsf2 3+__cmpsf2 0 > > > exact > > > __aeabi_fcmple 4+__cmpsf2 3+__cmpsf2 0 > > > exact > > > __aeabi_fcmpge 4+__cmpsf2 3+__cmpsf2 0 > > > exact > > > > > > __floatundisf (__aeabi_ul2f) 14+__shared_float 40 – 81 8 > > > <= 0.5 ulp > > > __floatundisf (OPTIMIZE_SIZE) 14+__shared_float 40 – 237 8 > > > <= 0.5 ulp > > > __floatunsisf (__aeabi_ui2f) 0+__floatundisf 1+__floatundisf 8 > > > <= 0.5 ulp > > > __floatdisf (__aeabi_l2f) 14+__floatundisf 7+__floatundisf 8 > > > <= 0.5 ulp > > > __floatsisf (__aeabi_i2f) 0+__floatdisf 1+__floatdisf 8 > > > <= 0.5 ulp > > > > > > __fixsfdi (__aeabi_f2lz) 74 27 – 33 0 > > > exact > > > __fixunssfdi (__aeabi_f2ulz) 4+__fixsfdi 3+__fixsfdi 0 > > > exact > > > __fixsfsi (__aeabi_f2iz) 52 19 0 > > > exact > > > __fixsfsi (OPTIMIZE_SIZE) 4+__fixsfdi 3+__fixsfdi 0 > > > exact > > > __fixunssfsi (__aeabi_f2uiz) 4+__fixsfsi 3+__fixsfsi 0 > > > exact > > > > > > __extendsfdf2 (__aeabi_f2d) 42+__shared_float 38 8 > > > exact > > > __aeabi_d2f 56+__shared_float 54 – 58 8 > > > <= 0.5 ulp > > > __aeabi_h2f 34+__shared_float 34 8 > > > exact > > > __aeabi_f2h 84 23 – 34 0 > > > <= 0.5 ulp > > > > > > Copyright assignment is on file with the FSF. > > > > > > I've built the gcc-arm-none-eabi cross-compiler using the 20201108 > > > snapshot of GCC plus this patch, and successfully compiled a test > > > program: > > > > > > extern int main (void) > > > { > > > volatile int x = 1; > > > volatile unsigned long long int y = 10; > > > volatile long long int z = x / y; // 64-bit division > > > > > > volatile float a = x; // 32-bit casting > > > volatile float b = y; // 64 bit casting > > > volatile float c = z / b; // float division > > > volatile float d = a + c; // float addition > > > volatile float e = c * b; // float multiplication > > > volatile float f = d - e - c; // float subtraction > > > > > > if (f != c) // float comparison > > > y -= (long long int)d; // float casting > > > } > > > > > > As one point of comparison, the test program links to 876 bytes of > > > libgcc code from the patched toolchain, vs 10276 bytes from the > > > latest released gcc-arm-none-eabi-9-2020-q2 toolchain. That's a > > > 90% size reduction. > > > > This looks awesome! > > > > > > > > I have extensive test vectors, and have passed these tests on an > > > STM32F051. These vectors were derived from UCB [1], Testfloat [2], > > > and IEEECC754 [3] sources, plus some of my own creation. > > > Unfortunately, I'm not sure how "make check" should work for a cross > > > compiler run time library. > > > > > > Although I believe this patch can be incorporated as-is, there are > > > at least two points that might bear discussion: > > > > > > * I'm not sure where or how they would be integrated, but I would be > > > happy to provide sources for my test vectors. > > > > > > * The library is currently built for the ARM v6m architecture only. > > > It is likely that some of the other Cortex variants would benefit > > > from these routines. However, I would need some guidance on this > > > to proceed without introducing regressions. I do not currently > > > have a test strategy for architectures beyond Cortex M0, and I > > > have NOT profiled the existing thumb-2 implementations (ieee754- > > > sf.S) for comparison. > > > > I tried your patch, and I see many regressions in the GCC testsuite > > because many tests fail to link with errors like: > > ld: /gcc/thumb/v6-m/nofp/libgcc.a(_arm_cmpdf2.o): in function > > `__clzdi2': > > /libgcc/config/arm/cm0/clz2.S:39: multiple definition of > > `__clzdi2';/gcc/thumb/v6-m/nofp/libgcc.a(_thumb1_case_sqi.o):/libgcc/config/arm/cm0/clz2.S:39: > > first defined here > > > > This happens with a toolchain configured with --target arm-none-eabi, > > default cpu/fpu/mode, > > --enable-multilib --with-multilib-list=rmprofile and running the tests with > > -mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m > > > > Does it work for you? > > Thanks for the feedback. > > I'm afraid I'm quite ignorant as to the gcc test suite infrastructure, > so I don't know how to use the options you've shared above. I'm cross- > compiling the Windows toolchain on Ubuntu. Would you mind sharing a > full command line you would use for testing? The toolchain is built > with the default options, which includes "--target arm-none-eabi". >
Why put Windows in the picture? This seems unnecessarily complicated... I suggest you build your cross-toolchain on x86_64 ubuntu and run it on x86_64 ubuntu (of course targetting arm) The above options where GCC configure options, except for the last one which I used when running the tests. There is some documentation about how to run the GCC testsuite there: https://gcc.gnu.org/install/test.html Basically 'make check' should mostly work except for execution tests for which you'll need to teach DejaGnu how to run the generated programs on a real board or on a simulator. I didn't analyze your patch, I just submitted it to my validation system: https://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/r11-5993-g159b0bd9ce263dfb791eff5133b0ca0207201c84-cortex-m0-fplib-20201130.patch2/report-build-info.html - the red "regressed" items indicate regressions in the testsuite. You can click on "log" to download the corresponding gcc.log - the dark-red "build broken" items indicate that the toolchain build failed - the orange "interrupted" items indicate an infrastructure problem, so you can ignore such cases - similarly the dark red "ref build failed" indicate that the reference build failed for some infrastructure reason for the arm-none-eabi target, several toolchain versions fail to build, some succeed. This is because I use different multilib configuration flags, it looks like the ones involving --with-multilib=rmprofile are broken with your patch. These ones should be reasonably easy to fix: no 'make check' involved. For instance if you configure GCC with: --target arm-none-eabi --enable-multilib --with-multilib-list=rmprofile you should see the build failure. HTH Christophe > I did see similar errors once before. It turned out then that I omitted > one of the ".S" files from the build. My interpretation at that point > was that gcc had been searching multiple versions of "libgcc.a" and > unable to merge the symbols. In hindsight, that was a really bad > interpretation. I was able to reproduce the error above by simply > adding a line like "volatile double m = 1.0; m += 2;". > > After reviewing the existing asm implementations more closely, I > believe that I have not been using the function guard macros (L_arm_*) > as intended. The make script appears to compile "lib1funcs.S" dozens of > times -- once for each function guard macro listed in LIB1ASMFUNCS -- > with the intent of generating a separate ".o" file for each function. > Because they were unguarded, my new library functions were duplicated > into every ".o" file, which caused the link errors you saw. > > I have attached an updated patch that implements the macros. > > However, I'm not sure whether my usage is really consistent with the > spirit of the make script. If there's a README or HOWTO, I haven't > found it yet. The following points summarize my concerns as I was > making these updates: > > 1. While some of the new functions (e.g. __cmpsf2) are standalone, > there is a common core in the new library shared by several related > functions. That keeps the library small. For now, I've elected to > group all of these related functions together in a single object > file "_arm_addsubsf3.o" to protect the short branches (+/-2KB) > within this unit. Notice that I manually assigned section names in > the code, so there still shouldn't be any unnecessary code linked in > the final build. Does the multiple-".o" files strategy predate "-gc- > sections", or should I be trying harder to break these related > functions into separate compilation units? > > 2. I introduced a few new macro keywords for functions/groups (e.g. > "_arm_f2h" and '_arm_f2h'. My assumption is that some empty ".o" > files compiled for the non-v6m architectures will be benign. > > 3. The "t-elf" make script implies that __mulsf3() should not be > compiled in thumb mode (it's inside a conditional), but this is one > of the new functions. Moot for now, since my __mulsf3() is grouped > with the common core functions (see point 1) and is thus currently > guarded by the "_arm_addsubsf3.o" macro. > > 4. The advice (in "ieee754-sf.S") regarding WEAK symbols does not seem > to be working. I have defined __clzsi2() as a weak symbol to be > overridden by the combined function __clzdi2(). I can also see > (with "nm") that "clzsi2.o" is compiled before "clzdi2.o" in > "libgcc.a". Yet, the full __clzdi2() function (8 bytes larger) is > always linked, even in programs that only call __clzsi2(), A minor > annoyance at this point. > > 5. Is there a permutation of the makefile that compiles libgcc with > __OPTIMIZE_SIZE__? There are a few sections in the patch that can > optimize either way, yet the final product only seems to have the > "fast" code. At this optimization level, the sample program above > pulls in 1012 bytes of library code instead of 836. Perhaps this is > meant to be controlled by the toolchain configuration step, but it > doesn't follow that the optimization for the cross-compiler would > automatically translate to the target runtime libraries. > > Thanks again, > Daniel > > > > > Thanks, > > > > Christophe > > > > > > > > I'm naturally hoping for some action on this patch before the Nov 16th > > > deadline for GCC-11 stage 3. Please review and advise. > > > > > > Thanks, > > > Daniel Engel > > > > > > [1] http://www.netlib.org/fp/ucbtest.tgz > > > [2] http://www.jhauser.us/arithmetic/TestFloat.html > > > [3] http://win-www.uia.ac.be/u/cant/ieeecc754.html > >