https://sourceware.org/bugzilla/show_bug.cgi?id=34135

            Bug ID: 34135
           Summary: ld/riscv: GP relaxation deletes HI20 anchor while an
                    unrelaxed LO12 sibling still uses it
           Product: binutils
           Version: 2.45
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: ld
          Assignee: unassigned at sourceware dot org
          Reporter: l784896635 at gmail dot com
  Target Milestone: ---

Created attachment 16708
  --> https://sourceware.org/bugzilla/attachment.cgi?id=16708&action=edit
Minimal RISC-V reproducer for ld GP-relaxation bug: source, linker script,
prebuilt object, bad linked ELF, repro scripts, and objdump/readelf evidence.

GNU ld for RISC-V appears to emit an invalid linked ELF when GP relaxation is
applied to a small-data access group at the edge of the GP-relative signed
12-bit range.

The input object has one HI20 anchor in a2 and two LO12 loads that use that
anchor:

    lui a2, %hi(g0)
    lw  a3, %lo(g0)(a2)
    lw  a4, %lo(g0+4)(a2)

When linked with --relax, the first access is rewritten to gp-relative form and
the defining "lui a2" is removed, but the second access remains as "lw
a4,0(a2)". The final executable therefore uses a2 after its definition has been
deleted.

Known bad:

    GNU ld (xPack GNU RISC-V Embedded GCC x86_64) 2.45
    riscv-none-elf-gcc.exe (xPack GNU RISC-V Embedded GCC x86_64) 15.2.0
    Target: rv64gc/lp64d
    Host: Windows-10-10.0.26200-SP0

The reproducer can be run either from source or from the attached prebuilt
object.

From source:

    cd reproducer
    sh ./repro-from-source.sh

From the prebuilt object:

    cd reproducer
    sh ./repro-from-prebuilt-object.sh

Equivalent manual commands from source:

    riscv-none-elf-gcc -march=rv64gc -mabi=lp64d -O2 -ffunction-sections \
        -fdata-sections -c case.S -o case.o
    riscv-none-elf-gcc -march=rv64gc -mabi=lp64d case.o -Wl,--gc-sections \
        -Wl,--relax -nostdlib -Wl,-T,linker.ld -Wl,-Map,relax.map -o relax.elf
    riscv-none-elf-objdump -dr case.o
    riscv-none-elf-objdump -d relax.elf
    riscv-none-elf-readelf -s relax.elf

Equivalent direct ld command from the prebuilt object:

    riscv-none-elf-ld --gc-sections --relax -T linker.ld -Map relax.map \
        -o relax.elf case.o
    riscv-none-elf-objdump -d relax.elf

Actual result observed in the linked ELF:

    0000000080000000 <_start>:
        80000000: 7fc1a683           lw  a3,2044(gp) # 80001ffc <g0>
        80000004: 00062703           lw  a4,0(a2)
        80000008: a009               j   8000000a <done>

There is no remaining "lui a2" in _start.
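The two instruction words above can be cross-checked by hand against the standard RISC-V I-type layout. The following is a minimal sketch, not part of the attached reproducer; the helper name decode_i_type and the small ABI register table are illustrative assumptions.

```python
def decode_i_type(word):
    """Split a 32-bit RISC-V I-type instruction word into its fields."""
    imm = word >> 20
    if imm & 0x800:          # sign-extend the 12-bit immediate
        imm -= 0x1000
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "imm": imm,
    }

# Only the registers that occur in this report (x3, x12, x13, x14).
ABI = {3: "gp", 12: "a2", 13: "a3", 14: "a4"}

for word in (0x7FC1A683, 0x00062703):
    f = decode_i_type(word)
    # opcode 0x03 is LOAD, funct3 0x2 selects a 32-bit load (lw).
    assert f["opcode"] == 0x03 and f["funct3"] == 0x2
    print(f"{word:08x}: lw {ABI[f['rd']]},{f['imm']}({ABI[f['rs1']]})")
```

Both words decode as lw: the first uses gp as its base register, the second still uses a2.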

The relevant symbol addresses are:

    g0                 = 0x80001ffc
    __global_pointer$  = 0x80001800

So g0 is at gp + 2044, which fits the signed 12-bit gp-relative offset, but g0
+ 4 is at gp + 2048, which is just outside the signed 12-bit range. GNU ld
appears to partially relax the group and delete the shared HI20 anchor even
though a sibling LO12 access still needs it.
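The offset arithmetic can be reproduced with a few lines of Python. This is only a sketch of the range check using the two symbol values reported above; the helper name fits_simm12 is an assumption and does not correspond to any function in ld.

```python
G0 = 0x80001FFC   # address of g0 from the symbol table above
GP = 0x80001800   # value of __global_pointer$

def fits_simm12(offset):
    """True if offset is encodable as a signed 12-bit immediate."""
    return -2048 <= offset <= 2047

for name, addr in (("g0", G0), ("g0+4", G0 + 4)):
    off = addr - GP
    print(f"{name}: gp{off:+d} fits={fits_simm12(off)}")
```

The first access lands on the last 4-byte-aligned offset that still fits (2044), and the second is 4 bytes further, at 2048.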

The original object relocations are:

    0x0: R_RISCV_HI20     g0
    0x4: R_RISCV_LO12_I   g0
    0x8: R_RISCV_LO12_I   g0 + 4
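To see why both LO12 relocations depend on the same HI20 anchor, and why the surviving load has offset 0, the usual %hi/%lo split can be evaluated for g0 and g0 + 4. This is a sketch assuming the conventional rounding rule (add 0x800 before shifting); the helper names hi20 and lo12 are illustrative.

```python
G0 = 0x80001FFC   # address of g0 as reported in the linked ELF

def hi20(addr):
    """Upper 20 bits, rounded so that the signed low part compensates."""
    return ((addr + 0x800) >> 12) & 0xFFFFF

def lo12(addr):
    """Low 12 bits interpreted as a signed immediate."""
    lo = addr & 0xFFF
    return lo - 0x1000 if lo & 0x800 else lo

for name, addr in (("g0", G0), ("g0+4", G0 + 4)):
    print(f"{name}: %hi=0x{hi20(addr):05x} %lo={lo12(addr)}")
```

Both addresses share %hi = 0x80002, with %lo = -4 for g0 and %lo = 0 for g0 + 4, which matches the "lw a4,0(a2)" left in the linked output.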

Expected result:

The linker should not produce an executable that uses an anchor register after
deleting its definition. I would expect one of these outcomes:

1. keep the address-materialization group consistent;
2. do not delete the HI20 anchor while any sibling LO12 access still uses it;
3. reject the link with a relocation overflow/truncation diagnostic.

For comparison, linking the same object with --no-relax does not silently
produce the invalid code shown above. With the tested 2.45 toolchain, the link
fails with a relocation truncation/overflow error instead.

This also appears inconsistent with the RISC-V psABI linker-relaxation rules.
The psABI describes relocation groups, says a linker should not apply
relaxation to only part of a relocation group, and for GP relaxation
specifically notes that multiple LO12 fragments may share one HI20 but all
relaxed GP offsets must be in range.
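The all-or-nothing behavior described here can be sketched as a single predicate over the group. This is a hypothetical illustration of the rule as paraphrased in this report, not the actual ld implementation; the function name can_relax_group and its arguments are assumptions.

```python
def fits_simm12(offset):
    """True if offset is encodable as a signed 12-bit immediate."""
    return -2048 <= offset <= 2047

def can_relax_group(gp, lo12_targets):
    """Relax the HI20 anchor only if every LO12 user is reachable from gp.

    gp            value of __global_pointer$
    lo12_targets  resolved addresses of all LO12 relocations sharing the HI20
    """
    return all(fits_simm12(target - gp) for target in lo12_targets)

GP = 0x80001800
G0 = 0x80001FFC

# The group from this report: one member is out of range, so under this
# rule the lui must stay and neither load is rewritten.
print(can_relax_group(GP, [G0, G0 + 4]))
# A group containing only the first access would be safe to relax.
print(can_relax_group(GP, [G0]))
```

Under this rule the reported group would be left unrelaxed as a whole, which corresponds to outcomes 1 and 2 listed above.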

Additional campaign evidence:

This was found by a RISC-V post-link metamorphic test campaign. The campaign
produced 86 raw/replayable hits, but all of them collapse to the same minimal
case.S / case.o / linked.elf hashes, so I am filing a single report for one
likely root cause rather than 86 separate bugs.

    60 hits from real-seed GP families
    26 hits from directed GP-boundary templates
    top witness rule: I4, anchor-relative GP group member remains after the
                      anchor definition disappeared
    case.S SHA-256:     008412b7609ba4830600aa8628a1cebac7a7716de1cc38b536496dd8f0955fe3
    case.o SHA-256:     2d059d6fefc30a29bf430ca5b143015d112ea60733366ead5024a110b894d8ee
    linked.elf SHA-256: 7826c49859feefdb913d08d78ed91b5729a9b6059f57c1393f16671fb8003f76

The attached archive includes the source, linker script, prebuilt object,
reference bad linked ELF, reproducer scripts, disassembly/relocation evidence,
and the campaign summary/index.
