Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0

Daniel Engel Thu, 21 Jan 2021 12:35:26 -0800

Hi Christophe,

On Thu, Jan 21, 2021, at 2:29 AM, Christophe Lyon wrote:
> On Sat, 16 Jan 2021 at 17:13, Daniel Engel <lib...@danielengel.com> wrote:
> >
> > Hi Christophe,
> >
> > On Fri, Jan 15, 2021, at 4:30 AM, Christophe Lyon wrote:
> > > On Fri, 15 Jan 2021 at 12:39, Daniel Engel <lib...@danielengel.com> wrote:
> > > >
> > > > Hi Christophe,
> > > >
> > > > On Mon, Jan 11, 2021, at 8:39 AM, Christophe Lyon wrote:
> > > > > On Mon, 11 Jan 2021 at 17:18, Daniel Engel <lib...@danielengel.com> 
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jan 11, 2021, at 8:07 AM, Christophe Lyon wrote:
> > > > > > > On Sat, 9 Jan 2021 at 14:09, Christophe Lyon 
> > > > > > > <christophe.l...@linaro.org> wrote:
> > > > > > > >
> > > > > > > > On Sat, 9 Jan 2021 at 13:27, Daniel Engel 
> > > > > > > > <lib...@danielengel.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Jan 7, 2021, at 4:56 AM, Richard Earnshaw wrote:
> > > > > > > > > > On 07/01/2021 00:59, Daniel Engel wrote:
> > > > > > > > > > > --snip--
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jan 6, 2021, at 9:05 AM, Richard Earnshaw wrote:
> > > > > > > > > > > --snip--
> > > > > > > > > > >
> > > > > > > > > > >> - finally, your popcount implementations have data in 
> > > > > > > > > > >> the code segment.
> > > > > > > > > > >>  That's going to cause problems when we have compilation 
> > > > > > > > > > >> options such as
> > > > > > > > > > >> -mpure-code.
> > > > > > > > > > >
> > > > > > > > > > > I am just following the precedent of existing lib1funcs 
> > > > > > > > > > > (e.g. __clz2si).
> > > > > > > > > > > If this matters, you'll need to point in the right 
> > > > > > > > > > > direction for the
> > > > > > > > > > > fix.  I'm not sure it does matter, since these functions 
> > > > > > > > > > > are PIC anyway.
> > > > > > > > > >
> > > > > > > > > > That might be a bug in the clz implementations - 
> > > > > > > > > > Christophe: Any thoughts?
> > > > > > > > >
> > > > > > > > > __clzsi2() has test coverage in 
> > > > > > > > > "gcc.c-torture/execute/builtin-bitops-1.c"
> > > > > > > > Thanks, I'll have a closer look at why I didn't see problems.
> > > > > > > >
> > > > > > >
> > > > > > > So, that's because the code goes to the .text section (as opposed 
> > > > > > > to
> > > > > > > .text.noread)
> > > > > > > and does not have the PURECODE flag. The compiler takes care of 
> > > > > > > this
> > > > > > > when generating code with -mpure-code.
> > > > > > > And the simulator does not complain because it only checks loads 
> > > > > > > from
> > > > > > > the segment with the PURECODE flag set.
> > > > > > >
> > > > > > This is far out of my depth, but can something like:
> > > > > >
> > > > > > ifeq (,$(findstring __symbian__,$(shell $(gcc_compile_bare) -dM -E 
> > > > > > - </dev/null)))
> > > > > >
> > > > > > be adapted to:
> > > > > >
> > > > > > a) detect the state of the -mpure-code switch, and
> > > > > > b) pass that flag to the preprocessor?
> > > > > >
> > > > > > If so, I can probably fix both the target section and the data 
> > > > > > usage.
> > > > > > Just have to add a few instructions to finish unrolling the loop.
> > > > >
> > > > > I must confess I never checked libgcc's Makefile deeply before,
> > > > > but it looks like you can probably detect whether -mpure-code is
> > > > > part of $CFLAGS.
> > > > >
> > > > > However, it might be better to write pure-code-safe code
> > > > > unconditionally because the toolchain will probably not
> > > > > be rebuilt with -mpure-code as discussed before.
> > > > > Or that could mean adding a -mpure-code multilib....
> > > >
> > > > I have learned a few things since the last update.  I think I know how
> > > > to get -mpure-code out of CFLAGS and into a macro.  However, I have hit
> > > > something of a wall with testing.  I can't seem to compile any flavor of
> > > > libgcc with CFLAGS_FOR_TARGET="-mpure-code".
> > > >
> > > > 1.  Configuring --with-multilib-list=rmprofile results in build failure:
> > > >
> > > >     checking for suffix of object files... configure: error: in 
> > > > `/home/mirdan/gcc-obj/arm-none-eabi/libgcc':
> > > >     configure: error: cannot compute suffix of object files: cannot 
> > > > compile
> > > >     See `config.log' for more details
> > > >
> > > >    cc1: error: -mpure-code only supports non-pic code on M-profile 
> > > > targets
> > > >
> > >
> > > Yes, I did hit that wall too :-)
> > >
> > > Hence what we discussed earlier: the toolchain is not rebuilt with 
> > > -mpure-code.
> > >
> > > Note that there are problems in newlib too, but users of -mpure-code seem
> > > to be able to work around that (eg. using their own startup code and no 
> > > stdlib)
> >
> > Is there a current side project to solve the makefile problems?
> None that I know of.
> 
> 
> > I think I'm back to my original question: If libgcc can't be built
> > with -mpure-code, and users bypass it completely with -nostdlib, then
> > why this conversation about pure-code compatibility of __clzsi2() etc?
> I think Richard noticed this pre-existing problem as part of the review
> of your patches. I don't think I meant fixing this is a prerequisite.
> But maybe I misunderstood :-)


I might have misunderstood too then.  It was certainly a pre-existing
problem, but I took the comments to mean that I had to own it as part of
touching those functions.
 
> > > > 2.  Attempting to filter the multib list results in configuration error.
> > > >     This might have been misguided, but it was something I tried:
> > > >
> > > >     Error: --with-multilib-list=armv6s-m not supported.
> > > >
> > > >     Error: --with-multilib-list=mthumb/march=armv6s-m/mfloat-abi=soft 
> > > > not supported
> > >
> > > I think only 2 values are supported: aprofile and rmprofile.
> >
> > It looks like this might require a custom t-* multilib in gcc/config/arm.
> Or we could add -mpure-code to the rmprofile list.

I have no strong opinions here.  Are you proposing that the "m" versions
of libgcc be built with -mpure-code enabled by default, or are you
proposing a parallel set of multilibs?  -mpure-code by default would
have costs in both size and speed.

> > > > 3.  Attempting to configure a single architecture results in a build 
> > > > error.
> > > >
> > > >     --with-mode=thumb --with-arch=armv6s-m --with-float=soft
> > > >
> > > >     checking for suffix of object files... configure: error: in 
> > > > `/home/mirdan/gcc-obj/arm-none-eabi/arm/autofp/v5te/fpu/libgcc':
> > > >     configure: error: cannot compute suffix of object files: cannot 
> > > > compile
> > > >     See `config.log' for more details
> > > >
> > > >     conftest.c:9:10: fatal error: ac_nonexistent.h: No such file or 
> > > > directory
> > > >         9 | #include <ac_nonexistent.h>
> > > >           |          ^~~~~~~~~~~~~~~~~~
> > > I never saw that error message, but I never build using --with-arch.
> > > I do use --with-cpu though.
> > >
> > > > This has me wondering whether pure-code in libgcc is a real issue ...
> > > > If there's a way to build libgcc with -mpure-code, please enlighten me.
> > > I haven't done so yet. Maybe building the toolchain --with-cpu=cortex-m0
> > > works?
> >
> > No luck with that.  Same error message as before:
> >
> > 4.  --with-mode=thumb --with-arch=armv6s-m --with-float=soft 
> > --with-cpu=cortex-m0
> >
> >     Switch "--with-arch" may not be used with switch "--with-cpu"
> >
> > 5.  Then: --with-mode=thumb --with-float=soft --with-cpu=cortex-m0
> >
> >     checking for suffix of object files... configure: error: in 
> > `/home/mirdan/gcc-obj/arm-none-eabi/arm/autofp/v5te/fpu/libgcc':
> >     configure: error: cannot compute suffix of object files: cannot compile
> >     See `config.log' for more details
> >
> >     cc1: error: -mpure-code only supports non-pic code on M-profile targets
> Yes that's because default multilibs include targets incompatible with
> -mpure-code
> 
> > 6.  Finally! --with-float=soft --with-cpu=cortex-m0 --disable-multilib
> >
> > Once you know this, and read the docs sideways, the previous errors are
> > all probably "works as designed".  But, I can still grumble.
> Yes, I think it's "as designed". I faced the "incompatible multilibs" issue
> too some time ago. Hence testing is not easy.
> 
> > With libgcc compiled with -mpure-code, I can confirm that
> > 'builtin-bitops-1.c' (the test for __clzsi2) passed with libgcc as-is.
> >
> > I then added the SHF_ARM_PURECODE flag to the libgcc assembly functions
> > and re-ran the test.  Still passed.  I then added -mpure-code to
> > RUNTESTFLAGS and re-ran the test.  Still passed.  readelf confirmed that
> > the test program is compiling as expected [1]:
> >
> >     [ 2] .text             PROGBITS        0000800c 00800c 003314 00 AXy  0 
> >   0  4
> >     Key to Flags:
> >     W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
> >     L (link order), O (extra OS processing required), G (group), T (TLS),
> >     C (compressed), x (unknown), o (OS specific), E (exclude),
> >     y (purecode), p (processor specific)
> >
> > It was only when I started inserting pure-code test directives into
> > 'builtin-bitops-1.c' that 'make check' began to report errors.
> >
> >     /* { dg-do compile } */
> >     ...
> >     /* { dg-options "-mpure-code -mfp16-format=ieee" } */
> >     /* { dg-final { scan-assembler-not 
> > "\\.(float|l\\?double|\d?byte|short|int|long|quad|word)\\s+\[^.\]" } } */
> >
> > However, for reasons [2] [3] [4] [5], this wasn't actually useful.  It's
> > sufficient to say that there are many reasons that non-pure-code
> > compatible functions exist in libgcc.
> 
> I filed a PR to better track this discussion:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98779

Possibly worth noting that my patch series addresses __clzsi2, __ctzsi2,
__aeabi_ldivmod, __aeabi_uldivmod, and most of __gnu_float2h_internal as
long as CFLAGS_FOR_TARGET contains -mpure-code.

> 
> >
> > Although I'm not sure how useful this will be in light of the previous
> > findings, I did take the opportunity with a working compile process to
> > modify the relevant assembly functions for -mpure-code compatibility.
> > I can manually disassemble the library and verify correct compilation.
> > I can manually run a non-pure-code builtin-bitops-1 with a pure-code
> > library to verify correct execution.  But, I don't think your standard
> > regression suite will be able to exercise the new paths.
> Thanks for doing that. With the SHF_ARM_PURECODE flag you
> added to clz, I think my simulator would catch problems.

See my more detailed comments following.  Unless you're also using a
custom linker script, I would have expected any simulator capable of
catching errors to have already caught them.  Putting the
SHF_ARM_PURECODE flag on clz actually seems rather cosmetic.

> > The patch is below; you can consider this as 34/33 in the series.
> >
> > Regards,
> > Daniel
> >
> > [1] It's pretty clear that the section flags in libgcc have never really
> >     mattered.  When the linker strings all of the used objects together,
> >     the original sections disappear into a single output object. The
> >     compiler controls those flags regardless of what libgcc does.)
> Not sure what you mean? The linker creates two segments, one
> with and one without the SHF_ARM_PURECODE flag.

When libgcc is compiled "normally", individual objects in libgcc.a are
compiled _without_ SHF_ARM_PURECODE.  Note line 4 below, with flags
"AX" only (no "y").  

`readelf -S arm-none-eabi/thumb/v6-m/nofp/libgcc/libgcc.a`

    File: arm-none-eabi/thumb/v6-m/nofp/libgcc/libgcc.a(_clzsi2.o)
    There are 19 section headers, starting at offset 0x43c:

    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk 
Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   
0  0
      [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   
0  2
      [ 2] .data             PROGBITS        00000000 000034 000000 00  WA  0   
0  1
      [ 3] .bss              NOBITS          00000000 000034 000000 00  WA  0   
0  1
  ==> [ 4] .text.sorted[...] PROGBITS        00000000 000034 000034 00  AX  0   
0  4
      ...
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      y (purecode), p (processor specific)

When this "normal" libgcc is linked into an -mpure-code program
(e.g. with RUNTESTFLAGS), the linker script flattens all of the
sections together into a single output.  The relevant portion of the
linker script:

      .text           :
      {
        *(.text.unlikely .text.*_unlikely .text.unlikely.*)
        *(.text.exit .text.exit.*)
        *(.text.startup .text.startup.*)
        *(.text.hot .text.hot.*)
        *(SORT(.text.sorted.*)) // _clzsi2.o matches here
        *(.text .stub .text.* .gnu.linkonce.t.*) // main.o matches here
        /* .gnu.warning sections are handled specially by elf.em.  */
        *(.gnu.warning)
        *(.glue_7t) *(.glue_7) *(.vfp11_veneer) *(.v4_bx)
      }

I can't pretend to know how the linker merges conflicting flags from the
various input sections, but the final binary has the attributes
"AXy" as expected from the top level compile (line 2):

`readelf -Sl builtin-bitops-1.exe`

    There are 22 section headers, starting at offset 0x10934:

    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk 
Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   
0  0
      [ 1] .init             PROGBITS        00008000 008000 00000c 00  AX  0   
0  4
  ==> [ 2] .text             PROGBITS        0000800c 00800c 00455c 00 AXy  0   
0  4
      [ 3] .fini             PROGBITS        0000c568 00c568 00000c 00  AX  0   
0  4
      [ 4] .rodata           PROGBITS        0000c574 00c574 000050 00   A  0   
0  4
      [ 5] .ARM.exidx        ARM_EXIDX       0000c5c4 00c5c4 000008 00  AL  2   
0  4
      [ 6] .eh_frame         PROGBITS        0000c5cc 00c5cc 000124 00   A  0   
0  4
      [ 7] .init_array       INIT_ARRAY      0001c6f0 00c6f0 000004 04  WA  0   
0  4
      [ 8] .fini_array       FINI_ARRAY      0001c6f4 00c6f4 000004 04  WA  0   
0  4
      [ 9] .data             PROGBITS        0001c6f8 00c6f8 000a30 00  WA  0   
0  8
      [10] .bss              NOBITS          0001d128 00d128 000114 00  WA  0   
0  4
      ...
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      y (purecode), p (processor specific)

    There are 3 program headers, starting at offset 52

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      EXIDX          0x00c5c4 0x0000c5c4 0x0000c5c4 0x00008 0x00008 R   0x4
      LOAD           0x000000 0x00000000 0x00000000 0x0c6f0 0x0c6f0 R E 0x10000
      LOAD           0x00c6f0 0x0001c6f0 0x0001c6f0 0x00a38 0x00b4c RW  0x10000

     Section to Segment mapping:
      Segment Sections...
       00     .ARM.exidx
  ==>   01     .init .text .fini .rodata .ARM.exidx .eh_frame
       02     .init_array .fini_array .data .bss

A segment contains complete sections, not the other way around.  As you
can see above, the first LOAD segment contains the entire ".text" plus
some other sections.   Thus, SHF_ARM_PURECODE flag really appears
to apply to all of "text", even though the bits linked in from libgcc
weren't built or flagged this way.

> 
> > [2] The existing pure-code tests are compile-only and cover just the
> >     disassembled 'main.o'.  There is no test of a complete executable
> >     and there is no execution/simulation.
> That's something I did manually: run the full gcc testsuite, forcing 
> -mpure-code
> in RUNTESTFLAGS. This way, all execution tests are compiled with -mpure-code,
> and this is how I found several bugs.
> 
> > [3] While other parts of binutils may understand SHF_ARM_PURECODE, I
> >     don't think the simulator checks section flags or throws exceptions.
> Indeed, I know of no public simulator that honors this flag. I do have
> one though.
> 
> > [4] builtin-bitops-1 modified this way will always fail due to the array
> >     data definitions (longs, longlongs, etc).  GCC can't translate those
> >     to instructions.  While the ".data" section would presumably be
> >     readable, scan-assembler-not doesn't know the difference.
> Sure, adding such scan-assembler-not is not suitable for any existing 
> testcase.
> That's why it is only in place for testcases dedicated to -mpure-code.
> 
> > [5] Even if the simulator were modified to throw exceptions, this will
> >     continue to fail because _mainCRTStartup uses a literal pool.
> Yes, in general, and that's why I mentioned problems with newlib
> earlier in this thread.
> 
> However, the simulator I use only throws an exception for code executed with
> SHF_ARM_PURECODE. Code in the "regular" code segment is not checked.
> So this does not catch errors in hand-written assembly code using regular 
> .text,
> but it enables to run larger validations (such as the whole GCC testsuite)
> without having to fix all of newlib.
> Not perfect, as it left the issues in libgcc we are discussing, but it
> helped me fix
> several bugs in -mpure-code.

Yet again I suspect that you have a custom linker script, or there's
some other major difference.  Using the public releases of binutils,
newlib, etc, my experiences just aren't lining up with yours.

> Thanks,
> 
> Christophe
> 
> >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > --snip-- 

If the test server farm is free at some point, would you mind running
another set of regression tests on my v5 patch series?

Regards,
Daniel

Re: [PATCH v3] libgcc: Thumb-1 Floating-Point Library for Cortex M0

Reply via email to