[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162 --- Comment #32 from deller at gmx dot de --- Fixed in Linux kernel by declaring the extern int32 as char: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c42813b71a06a2ff4a155aa87ac609feeab76cf3
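For readers who do not want to open the commit, the idea of "declare it as char and read it byte-wise" can be sketched as follows. This is only an illustration, not the kernel change itself: the array output_len_bytes and the helper load_le32 are hypothetical names, and the array is defined locally (rather than as an extern inserted by the linker) so that the sketch links on its own.

```c
#include <stdint.h>

/* Stand-in for the linker-inserted symbol (assumption: in the real code
 * this would be an extern declaration). Because it is declared as char,
 * the compiler can only assume 1-byte alignment. */
static const unsigned char output_len_bytes[4] = { 0x78, 0x56, 0x34, 0x12 };

/* Hypothetical helper: assemble a little-endian 32-bit value one byte
 * at a time, so no 4-byte aligned load is ever required. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Calling load_le32(output_len_bytes) yields the 32-bit value regardless of where the bytes happen to be placed.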
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162 --- Comment #31 from deller at gmx dot de --- Richard suggested that adding a compiler optimization barrier (__asm__ ("" : "+r" (__pptr))) might fix the problem. I tested the attached patch and it works nicely.
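The attached patch is not reproduced here. As a rough sketch of how such a barrier can be combined with a packed-struct load: the struct una_u32 and the function load_u32_opaque are hypothetical names, and the __may_alias__ attribute is an addition made so the sketch is safe to call on a plain byte buffer.

```c
#include <stdint.h>

/* Packed wrapper: tells GCC the member may sit at any byte address.
 * __may_alias__ is added here (an assumption, not from the thread) so
 * that reading a char buffer through this type is permitted. */
struct una_u32 {
    uint32_t x;
} __attribute__((__packed__, __may_alias__));

/* Hypothetical helper modelled on the barrier quoted above. The empty
 * asm with a "+r" constraint makes the pointer value opaque to the
 * optimizer, so alignment assumptions attached to the original
 * declaration cannot propagate to the load. */
static uint32_t load_u32_opaque(const void *p)
{
    const struct una_u32 *pptr = (const struct una_u32 *)p;
    __asm__ ("" : "+r" (pptr));
    return pptr->x;
}
```

The barrier emits no instructions; it only hides where the pointer came from.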
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #30 from deller at gmx dot de ---
Created attachment 51405
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51405&action=edit
Linux kernel patch to add compiler optimization barrier

Linux kernel boots successfully with this patch on hppa.
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #29 from Andrew Pinski ---
(In reply to deller from comment #28)
> Arnd,
> there are various calls to the get_unaligned_X() functions in all kernel
> bootloaders, specifically in the kernel decompression routines:

The get_unaligned_* functions are fine and working correctly.
It is only the declaration of output_len (and similar declarations) which is problematic.
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #28 from deller at gmx dot de ---
Arnd,
there are various calls to the get_unaligned_X() functions in all kernel
bootloaders, specifically in the kernel decompression routines:

[deller@ls3530 linux-2.6]$ grep get_unaligned lib/decompress*
lib/decompress_unlz4.c: size_t out_len = get_unaligned_le32(input + in_len);
lib/decompress_unlz4.c: chunksize = get_unaligned_le32(inp);
lib/decompress_unlz4.c: chunksize = get_unaligned_le32(inp);
lib/decompress_unlzo.c: version = get_unaligned_be16(parse);
lib/decompress_unlzo.c: if (get_unaligned_be32(parse) & HEADER_HAS_FILTER)
lib/decompress_unlzo.c: dst_len = get_unaligned_be32(in_buf);
lib/decompress_unlzo.c: src_len = get_unaligned_be32(in_buf);

So sadly it's not possible to work around those cases with linker scripts,
because they work on externally generated compressed files (kernel code) for
which the specs of the compressed files can't be changed.
Same for the output_len variable - externally linked in directly behind the
code and not (easily?) changeable.

Helge
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #27 from Arnd Bergmann ---
The linux kernel instance from arch/parisc/ looks like a bug we fixed in
arch/arm a few years ago, by adding the required alignment directive to the
linker script.

If changing the linker script is not possible because of boot loader
requirements, then this should do as well:

diff --git a/arch/parisc/boot/compressed/misc.c b/arch/parisc/boot/compressed/misc.c
index 2d395998f524..b91d6cf80c06 100644
--- a/arch/parisc/boot/compressed/misc.c
+++ b/arch/parisc/boot/compressed/misc.c
@@ -26,7 +26,7 @@ extern char input_data[];
 extern int input_len;
 /* output_len is inserted by the linker possibly at an unaligned address */
-extern __le32 output_len __aligned(1);
+extern struct { __u8 bytes; } output_len;
 extern char _text, _end;
 extern char _bss, _ebss;
 extern char _startcode_end;
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
Bug 102162 depends on bug 88085, which changed state.

Bug 88085 Summary: User alignments on var decls not respected if smaller than type alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88085

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #26 from Andrew Pinski ---
Just marking this as a dup of bug 88085.

The workaround is to do this:
typedef unsigned int u32a1 __attribute__((__aligned__(1)));
extern u32a1 output_len __attribute__((__aligned__(1)));

*** This bug has been marked as a duplicate of bug 88085 ***
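A small self-contained sketch of why putting the reduced alignment on the type helps. The typedef is the one from the workaround; struct holder, the variable h, and read_len are hypothetical names used only to get an object of that type at an odd address without an external linker symbol.

```c
/* Typedef from the workaround: a 32-bit type whose alignment is 1 byte.
 * On a typedef, GCC's aligned attribute may lower the alignment. */
typedef unsigned int u32a1 __attribute__((__aligned__(1)));

/* Hypothetical container: because u32a1 only needs 1-byte alignment,
 * 'len' can follow 'pad' directly, i.e. at a misaligned address. */
struct holder {
    char  pad;
    u32a1 len;
};

static struct holder h = { 0, 0x12345678u };

/* The access goes through the under-aligned type, so the compiler has
 * to pick a load sequence that is valid for any byte address. */
static unsigned int read_len(void)
{
    return h.len;
}
```

With the alignment carried by the type, the expander takes the 8-bit alignment from TYPE_ALIGN instead of from the declaration alone, which is the part reported as mishandled in this bug.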
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |88085

--- Comment #25 from Andrew Pinski ---
PR 88085 is the same bug.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88085
[Bug 88085] User alignments on var decls not respected if smaller than type alignment
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #24 from dave.anglin at bell dot net ---
On 2021-09-01 8:23 p.m., pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
>
> --- Comment #23 from Andrew Pinski ---
> (In reply to Andrew Pinski from comment #22)
>> The problem is in emit-rtl.c in set_mem_attributes_minus_bitpos:
>>
>>   /* We can set the alignment from the type if we are making an object or if
>>      this is an INDIRECT_REF.  */
>>   if (objectp || TREE_CODE (t) == INDIRECT_REF)
>>     attrs.align = MAX (attrs.align, TYPE_ALIGN (type));
>>
>>
>> The type here is not the correct thing to do.
> This has been a bug since r0-38512 (2001).

Excellent work! I assume attrs.align should only be set from type when it is not set.
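To make the difference concrete, here is a toy model of the two policies. It is not GCC source: align_current mirrors only the quoted MAX expression, align_proposed encodes the assumption stated above, and treating 0 as "not set" is a convention of this sketch, not a claim about how GCC represents it.

```c
/* Toy model only; alignments are in bits, as in the A32 / align:8 dumps. */
static unsigned max_u(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* Policy in the quoted snippet: the type alignment always wins when it
 * is larger, so a user-requested 8-bit alignment is raised to 32. */
static unsigned align_current(unsigned attrs_align, unsigned type_align)
{
    return max_u(attrs_align, type_align);
}

/* Policy assumed in the comment above: fall back to the type alignment
 * only when no alignment has been recorded yet (0 = not set here). */
static unsigned align_proposed(unsigned attrs_align, unsigned type_align)
{
    return attrs_align ? attrs_align : type_align;
}
```

With a declaration alignment of 8 and a type alignment of 32, the first policy yields 32 (matching the A32 in the RTL dump from comment #20) and the second keeps 8.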
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #23 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #22)
> The problem is in emit-rtl.c in set_mem_attributes_minus_bitpos:
>
>   /* We can set the alignment from the type if we are making an object or if
>      this is an INDIRECT_REF.  */
>   if (objectp || TREE_CODE (t) == INDIRECT_REF)
>     attrs.align = MAX (attrs.align, TYPE_ALIGN (type));
>
>
> The type here is not the correct thing to do.

This has been a bug since r0-38512 (2001).
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #22 from Andrew Pinski ---
The problem is in emit-rtl.c in set_mem_attributes_minus_bitpos:

  /* We can set the alignment from the type if we are making an object or if
     this is an INDIRECT_REF.  */
  if (objectp || TREE_CODE (t) == INDIRECT_REF)
    attrs.align = MAX (attrs.align, TYPE_ALIGN (type));


The type here is not the correct thing to do.
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #21 from dave.anglin at bell dot net ---
On 2021-09-01 7:21 p.m., pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
>
> --- Comment #17 from Andrew Pinski ---
> (In reply to dave.anglin from comment #14)
>> On 2021-09-01 6:35 p.m., dave.anglin at bell dot net wrote:
>>> We only get correct code at -O0.
>> Maybe cpymemsi expander is problem.
> It can't be as that is only used for !TARGET_64BIT and this is a TARGET_64BIT
> as obvious by "LEVEL 2.0w".

I changed expanders for both !TARGET_64BIT and TARGET_64BIT. Didn't help. Same error with trunk.

Dave
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #20 from Andrew Pinski ---
tem was the var_decl

      /* If TEM's type is a union of variable size, pass TARGET to the inner
         computation, since it will need a temporary and TARGET is known
         to have to do.  This occurs in unchecked conversion in Ada.  */
      orig_op0 = op0
        = expand_expr_real (tem,
                            (TREE_CODE (TREE_TYPE (tem)) == UNION_TYPE
                             && COMPLETE_TYPE_P (TREE_TYPE (tem))
                             && (TREE_CODE (TYPE_SIZE (TREE_TYPE (tem)))
                                 != INTEGER_CST)
                             && modifier != EXPAND_STACK_PARM
                             ? target : NULL_RTX),
                            VOIDmode,
                            modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier,
                            NULL, true);

produces:
(gdb) p debug_rtx(op0)
(mem/c:SI (reg/f:DI 71) [1 output_len+0 S4 A32])

Note the A32 here. So it is a bug in the expansion of the var_decl.
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #19 from Andrew Pinski ---
Gimple level does look correct:
unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x77315bd0 precision:32 min max context > readonly arg:0 unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x773159d8 attributes > fields context pointer_to_this > arg:0 constant arg:0 t.c:17:9 start: t.c:17:9 finish: t.c:17:39> arg:1 > arg:1 unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x77251690 precision:32 min max context pointer_to_this > unsigned packed SI t.c:12:33 size unit-size align:8 warn_if_not_align:0 offset_align 128 offset bit-offset context > t.c:12:103 start: t.c:12:97 finish: t.c:12:105>

The var_decl too:
(gdb) p debug_tree(0x77ff6120)
unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x77251690 precision:32 min max context pointer_to_this > addressable used public unsigned external read SI t.c:6:14 size unit-size user align:8 warn_if_not_align:0 context attributes value >> chain >
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-09-01
     Ever confirmed|0                           |1

--- Comment #18 from Andrew Pinski ---
I just noticed the original testcase had the wrong line commented out :)
It should have been:
extern u32 output_len __attribute__((__aligned__(1)));

Anyways confirmed on aarch64-linux-gnu with -O1 -mstrict-align too.
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #17 from Andrew Pinski ---
(In reply to dave.anglin from comment #14)
> On 2021-09-01 6:35 p.m., dave.anglin at bell dot net wrote:
> > We only get correct code at -O0.
> Maybe cpymemsi expander is problem.

It can't be, as that is only used for !TARGET_64BIT and this is TARGET_64BIT,
as is obvious from "LEVEL 2.0w".
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #16 from Andrew Pinski ---
I cannot even reproduce the original issue on released gcc 10.3.0 sources.
What configure options are being used?
I used none except for --target:
Configured with: ../configure --target=hppa-linux-gnu

I even tried with -march=2.0 and it still works.

Looks like the target really is hppa*64-linux-gnu :)
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #15 from Andrew Pinski ---
The trunk works:
        .LEVEL 1.1
        .text
        .align 4
.globl test
        .type   test, @function
test:
        .PROC
        .CALLINFO FRAME=0,NO_CALLS
        .ENTRY
        addil LR'output_len-$global$,%r27
        ldo RR'output_len-$global$(%r1),%r20
        ldb RR'output_len-$global$(%r1),%r28
        zdep %r28,7,8,%r28
        ldb 1(%r20),%r19
        zdep %r19,15,16,%r19
        or %r19,%r28,%r19
        ldb 2(%r20),%r28
        zdep %r28,23,24,%r28
        or %r28,%r19,%r28
        ldb 3(%r20),%r19
        bv %r0(%r2)
        or %r19,%r28,%r28
        .EXIT
        .PROCEND
        .size   test, .-test
        .ident  "GCC: (GNU) 12.0.0 20210901 (experimental)"
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #14 from dave.anglin at bell dot net ---
On 2021-09-01 6:35 p.m., dave.anglin at bell dot net wrote:
> We only get correct code at -O0.

Maybe cpymemsi expander is problem.
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #13 from dave.anglin at bell dot net ---
On 2021-09-01 5:52 p.m., pinskia at gcc dot gnu.org wrote:
> This is doing the correct thing in splitting up the load into bytes loads.

We only get correct code at -O0. STRICT_ALIGNMENT is defined to 1.
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #12 from Andrew Pinski ---
Here is what the first testcase looks like at -O1 -mstrict-align on
aarch64-linux-gnu for GCC 10.3.0:
test:
.LFB1:
        .cfi_startproc
        adrp    x0, output_len
        add     x1, x0, :lo12:output_len
        ldrb    w2, [x0, #:lo12:output_len]
        ldrb    w0, [x1, 1]
        orr     x2, x2, x0, lsl 8
        ldrb    w0, [x1, 2]
        orr     x0, x2, x0, lsl 16
        ldrb    w1, [x1, 3]
        orr     w0, w0, w1, lsl 24
        ret
        .cfi_endproc
.LFE1:
        .size   test, .-test
        .ident  "GCC: (GNU) 10.3.0"
        .section        .note.GNU-stack,"",@progbits

This is doing the correct thing in splitting up the load into byte loads.
[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end
           Keywords|                            |wrong-code

--- Comment #11 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #10)
> Does hppa*-*-linux* have STRICT_ALIGNMENT set to true or false?

config/pa/pa.h:#define STRICT_ALIGNMENT 1

Hmm, so it should work. It is definitely something in the expansion between
gimple and rtl which is messing up.