https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102785
            Bug ID: 102785
           Summary: [12 Regression] {smul,umul}_highpart changes break bfin-elf
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This change:

commit 555fa3545efe23393ff21fe0928aa3942e1b90ed (refs/bisect/bad)
Author: Roger Sayle <ro...@nextmovesoftware.com>
Date:   Thu Oct 7 15:42:09 2021 +0100

    Introduce smul_highpart and umul_highpart RTX for high-part multiplications

    This patch introduces new RTX codes to allow the RTL passes and
    backends to consistently represent high-part multiplications.
    Currently, the RTL used by different backends for expanding
    smul<mode>3_highpart and umul<mode>3_highpart varies greatly,
    with many but not all choosing to express this something like:
[ ... ]

Is causing execution failures for bfin-elf for this code (taken from the
bfin/builtins tests):

extern void abort (void);

typedef short __v2hi __attribute ((vector_size(4)));
typedef __v2hi fract2x16;
typedef short fract16;

int main ()
{
  fract2x16 a, b, t;
  fract16 t1, t2;

  a = __builtin_bfin_compose_2x16 (0x5000, 0x3000);
  b = __builtin_bfin_compose_2x16 (0x4000, 0x2000);

  t = __builtin_bfin_dspaddsubsat (a, b);
  t1 = __builtin_bfin_extract_hi (t);
  t2 = __builtin_bfin_extract_lo (t);
  if (t1 != 0x7fff || t2 != 0x1000)
    abort ();

  return 0;
}

Before the change the source compiled down to something like this at -O2:

_main:
        R0.H = 20480;
        R1.H = 16384;
        R1.L = 8192;
        R0.L = 12288;
        R0 = R0 +|- R1 (S);
        R1.L = R0.H << 0;
        R1 = R1.L (X);
        R2 = 32767 (X);
        [--SP] = RETS;
        cc =R1==R2;
        SP += -12;
        if !cc jump .L2;
        R0 = R0.L (X);
        R1 = 4096 (X);
        cc =R0==R1;
        if !cc jump .L2;
        SP += 12;
        R0 = 0 (X);
        RETS = [SP++];
        rts;
.L2:
        call _abort;

What's important here is there's real code that tests some values and
conditionally returns with zero status or aborts.
After the [su]mul_highpart changes we get:

_main:
        [--SP] = RETS;  // 36 [c=4 l=2]  *pushsi_insn
        SP += -12;      // 37 [c=4 l=2]  addsi3/0
        call _abort;    // 23 [c=0 l=4]  *call_symbol

i.e., we unconditionally call abort.

Things start to differ in CSE1.  But I haven't really tried to debug it.
Thankfully it should be possible to debug with just a cross compiler.