[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-03 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #32 from deller at gmx dot de ---
Fixed in the Linux kernel by declaring the extern int32 as char:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c42813b71a06a2ff4a155aa87ac609feeab76cf3
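
A minimal sketch of the idea behind that fix, assuming the symbol is only ever
seen as bytes and the 32-bit value is assembled by hand. This is not quoted
from the commit: the buffer `image` and the helper `read_le32` are hypothetical
stand-ins for the linker-placed symbol and for the kernel's
get_unaligned_le32().

```c
#include <stdint.h>

/* Hypothetical stand-in for the linker-placed data: the 32-bit
   little-endian length starts at offset 1 of the buffer. */
static const unsigned char image[] = { 0xff, 0x78, 0x56, 0x34, 0x12 };

/* The object is declared as bytes, so the compiler has no 4-byte
   alignment it could assume when expanding these loads. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Calling read_le32(image + 1) assembles the value from four single-byte loads,
whatever the address happens to be.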

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #31 from deller at gmx dot de ---
Richard suggested that adding a compiler optimization barrier (__asm__ ("" :
"+r" (__pptr))) might fix the problem.
I tested the attached patch and it works nicely.
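
For illustration, a sketch of how such a barrier could sit inside a byte-wise
read. Only the asm statement itself is taken from this comment; the helper
name `load_le32_barrier` is hypothetical and this is not the attached kernel
patch. It relies on GCC-style extended asm.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value byte by byte. */
static uint32_t load_le32_barrier(const void *ptr)
{
    const unsigned char *p = ptr;

    /* Empty asm with a read-write register operand: the compiler must
       assume p may have been modified, so whatever it knew about the
       alignment of the original object no longer applies to p. */
    __asm__ ("" : "+r" (p));

    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

The asm emits no instructions; it only limits what the optimizer may assume
about the pointer.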

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #30 from deller at gmx dot de ---
Created attachment 51405
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51405&action=edit
Linux kernel patch to add compiler optimization barrier

Linux kernel boots successfully with this patch on hppa.

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #29 from Andrew Pinski  ---
(In reply to deller from comment #28)
> Arnd,
> there are various calls to the get_unaligned_X() functions in all kernel
> bootloaders, specifically in the kernel decompression routines: 

The get_unaligned_* functions are fine and working correctly.  It is only the
declarations of output_len (and similar declarations) which are problematic.

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #28 from deller at gmx dot de ---
Arnd,
there are various calls to the get_unaligned_X() functions in all kernel
bootloaders, specifically in the kernel decompression routines: 
[deller@ls3530 linux-2.6]$ grep get_unaligned lib/decompress*
lib/decompress_unlz4.c: size_t out_len = get_unaligned_le32(input + in_len);
lib/decompress_unlz4.c: chunksize = get_unaligned_le32(inp);
lib/decompress_unlz4.c: chunksize = get_unaligned_le32(inp);
lib/decompress_unlzo.c: version = get_unaligned_be16(parse);
lib/decompress_unlzo.c: if (get_unaligned_be32(parse) & HEADER_HAS_FILTER)
lib/decompress_unlzo.c: dst_len = get_unaligned_be32(in_buf);
lib/decompress_unlzo.c: src_len = get_unaligned_be32(in_buf);

So sadly it's not possible to work around those cases with linker scripts,
because they work on externally generated compressed files (kernel code) for
which the specs of the compressed files can't be changed.
Same for the output_len variable - externally linked in directly behind the
code and not (easily?) changeable.
Helge

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread arnd at linaro dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #27 from Arnd Bergmann  ---
The linux kernel instance from arch/parisc/ looks like a bug we fixed in
arch/arm a few years ago, by adding the required alignment directive to the
linker script.

If changing the linker script is not possible because of boot loader
requirements, then this should do as well:

diff --git a/arch/parisc/boot/compressed/misc.c b/arch/parisc/boot/compressed/misc.c
index 2d395998f524..b91d6cf80c06 100644
--- a/arch/parisc/boot/compressed/misc.c
+++ b/arch/parisc/boot/compressed/misc.c
@@ -26,7 +26,7 @@
 extern char input_data[];
 extern int input_len;
 /* output_len is inserted by the linker possibly at an unaligned address */
-extern __le32 output_len __aligned(1);
+extern struct { __u8 bytes; } output_len;
 extern char _text, _end;
 extern char _bss, _ebss;
 extern char _startcode_end;

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
Bug 102162 depends on bug 88085, which changed state.

Bug 88085 Summary: User alignments on var decls not respected if smaller than 
type alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88085

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #26 from Andrew Pinski  ---
Just marking this as a dup of bug 88085.

The workaround is to do this:
typedef unsigned int u32a1  __attribute__((__aligned__(1)));

 extern u32a1  output_len __attribute__((__aligned__(1)));
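
A small sketch of why the typedef helps, assuming GCC's documented behaviour
that an aligned attribute on a typedef can lower the alignment of the type.
The struct `blob` and the function `read_value` are hypothetical and exist
only to give the under-aligned type something to live in.

```c
typedef unsigned int u32a1 __attribute__((__aligned__(1)));

/* Hypothetical container: a one-byte pad followed by the under-aligned
   32-bit field, so the field need not sit on a 4-byte boundary. */
struct blob {
    unsigned char pad;
    u32a1 value;
};

/* The access goes through a type whose own alignment is 1 byte, so the
   type alignment and the requested alignment agree. */
static unsigned int read_value(const struct blob *b)
{
    return b->value;
}
```

With the typedef, the type itself carries the 1-byte alignment, rather than
only the variable declaration.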

*** This bug has been marked as a duplicate of bug 88085 ***

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |88085

--- Comment #25 from Andrew Pinski  ---
PR 88085 is the same bug.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88085
[Bug 88085] User alignments on var decls not respected if smaller than type
alignment

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #24 from dave.anglin at bell dot net ---
On 2021-09-01 8:23 p.m., pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
>
> --- Comment #23 from Andrew Pinski  ---
> (In reply to Andrew Pinski from comment #22)
>> The problem is in emit-rtl.c in set_mem_attributes_minus_bitpos:
>>
>>   /* We can set the alignment from the type if we are making an object or if
>>  this is an INDIRECT_REF.  */
>>   if (objectp || TREE_CODE (t) == INDIRECT_REF)
>> attrs.align = MAX (attrs.align, TYPE_ALIGN (type));
>>
>>
>> The type here is not the correct thing to do.
> This has been a bug since r0-38512 (2001).
Excellent work!  I assume attrs.align should only be set from type when it is
not set.
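
To make the difference concrete, here is a toy model of the two policies. It
is a sketch only: these functions are hypothetical, they are not GCC code, and
they just mimic the arithmetic of the quoted MAX() and of the assumption above.
Alignments are in bits, and 0 stands for "not set".

```c
/* Toy model only; not GCC data structures or functions. */

/* Mirrors the quoted MAX(): the type alignment wins whenever it is
   larger than what was recorded so far. */
static unsigned align_max(unsigned decl_align, unsigned type_align)
{
    return decl_align > type_align ? decl_align : type_align;
}

/* The assumption above: use the type alignment only as a fallback
   when no alignment has been recorded yet. */
static unsigned align_if_unset(unsigned decl_align, unsigned type_align)
{
    return decl_align != 0 ? decl_align : type_align;
}
```

For a declaration with user align 8 and a type with align 32, align_max()
yields 32 (the A32 seen in the RTL dump), while align_if_unset() keeps 8.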

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #23 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #22)
> The problem is in emit-rtl.c in set_mem_attributes_minus_bitpos:
> 
>   /* We can set the alignment from the type if we are making an object or if
>  this is an INDIRECT_REF.  */
>   if (objectp || TREE_CODE (t) == INDIRECT_REF)
> attrs.align = MAX (attrs.align, TYPE_ALIGN (type));
> 
> 
> The type here is not the correct thing to do.

This has been a bug since r0-38512 (2001).

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #22 from Andrew Pinski  ---
The problem is in emit-rtl.c in set_mem_attributes_minus_bitpos:

  /* We can set the alignment from the type if we are making an object or if
     this is an INDIRECT_REF.  */
  if (objectp || TREE_CODE (t) == INDIRECT_REF)
    attrs.align = MAX (attrs.align, TYPE_ALIGN (type));


The type here is not the correct thing to do.

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #21 from dave.anglin at bell dot net ---
On 2021-09-01 7:21 p.m., pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
>
> --- Comment #17 from Andrew Pinski  ---
> (In reply to dave.anglin from comment #14)
>> On 2021-09-01 6:35 p.m., dave.anglin at bell dot net wrote:
>>> We only get correct code at -O0.
>> Maybe cpymemsi expander is problem.
> It can't be as that is only used for !TARGET_64BIT and this is a TARGET_64BIT
> as obvious by "LEVEL 2.0w".
I changed expanders for both !TARGET_64BIT and TARGET_64BIT.  Didn't help. 
Same error with trunk.

Dave

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #20 from Andrew Pinski  ---
tem was the var_decl
/* If TEM's type is a union of variable size, pass TARGET to the inner
   computation, since it will need a temporary and TARGET is known
   to have to do.  This occurs in unchecked conversion in Ada.  */
orig_op0 = op0
  = expand_expr_real (tem,
                      (TREE_CODE (TREE_TYPE (tem)) == UNION_TYPE
                       && COMPLETE_TYPE_P (TREE_TYPE (tem))
                       && (TREE_CODE (TYPE_SIZE (TREE_TYPE (tem)))
                           != INTEGER_CST)
                       && modifier != EXPAND_STACK_PARM
                       ? target : NULL_RTX),
                      VOIDmode,
                      modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier,
                      NULL, true);
produces:
(gdb) p debug_rtx(op0)
(mem/c:SI (reg/f:DI 71) [1 output_len+0 S4 A32])

Note the A32 here.

So it is a bug in the expansion of the var_decl.

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #19 from Andrew Pinski  ---
Gimple level does look correct:
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x77315bd0 precision:32 min  max  context >
readonly
arg:0  unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x773159d8
attributes > fields
 context 
pointer_to_this >

arg:0 
constant arg:0 
t.c:17:9 start: t.c:17:9 finish: t.c:17:39>
arg:1 >
arg:1  unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x77251690 precision:32 min  max  context 
pointer_to_this >
unsigned packed SI t.c:12:33 size 
unit-size 
align:8 warn_if_not_align:0 offset_align 128
offset 
bit-offset  context >
t.c:12:103 start: t.c:12:97 finish: t.c:12:105>

The var_decl too:
(gdb) p debug_tree(0x77ff6120)
 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x77251690 precision:32 min  max  context 
pointer_to_this >
addressable used public unsigned external read SI t.c:6:14 size
 unit-size 
user align:8 warn_if_not_align:0 context 
attributes 
value >> chain
>

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-09-01
     Ever confirmed|0                           |1

--- Comment #18 from Andrew Pinski  ---
I just noticed the original testcase had the wrong line commented out :)
It should have been:
extern u32  output_len __attribute__((__aligned__(1)));

Anyways confirmed on aarch64-linux-gnu with -O1 -mstrict-align too.
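
The original attachment is not reproduced in this thread, so the following is
only a hypothetical reconstruction of the kind of testcase being discussed: an
under-aligned declaration read through a packed wrapper struct, as the tree
dumps suggest. The names `una_u32`, `get_unaligned_u32` and `read_output_len`
are assumptions, and the variable is defined rather than declared extern only
so that the sketch links on its own.

```c
typedef unsigned int u32;

/* In the report this is an extern declaration resolved by the linker;
   it is defined here only to keep the sketch self-contained. */
u32 output_len __attribute__((__aligned__(1))) = 0x12345678u;

/* Packed wrapper, assumed to resemble the kernel's unaligned helpers. */
struct una_u32 { u32 x; } __attribute__((packed));

static inline u32 get_unaligned_u32(const void *p)
{
    const struct una_u32 *ptr = p;
    return ptr->x;
}

/* Corresponds to the function the assembly dumps call "test". */
u32 read_output_len(void)
{
    return get_unaligned_u32(&output_len);
}
```

On a strict-alignment target the packed access is expected to expand to four
byte loads; the bug is that the loads are merged into a word load at -O1 and
above.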

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #17 from Andrew Pinski  ---
(In reply to dave.anglin from comment #14)
> On 2021-09-01 6:35 p.m., dave.anglin at bell dot net wrote:
> > We only get correct code at -O0.
> Maybe cpymemsi expander is problem.

It can't be as that is only used for !TARGET_64BIT and this is a TARGET_64BIT
as obvious by "LEVEL 2.0w".

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #16 from Andrew Pinski  ---
I cannot even reproduce the original issue on released gcc 10.3.0 sources.
What configure options are being used? I used none except for --target:
Configured with: ../configure --target=hppa-linux-gnu

I even tried with  -march=2.0 and it still works.
Looks like the target really is hppa*64-linux-gnu :)

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #15 from Andrew Pinski  ---
The trunk works:
.LEVEL 1.1
.text
.align 4
.globl test
.type   test, @function
test:
.PROC
.CALLINFO FRAME=0,NO_CALLS
.ENTRY
addil LR'output_len-$global$,%r27
ldo RR'output_len-$global$(%r1),%r20
ldb RR'output_len-$global$(%r1),%r28
zdep %r28,7,8,%r28
ldb 1(%r20),%r19
zdep %r19,15,16,%r19
or %r19,%r28,%r19
ldb 2(%r20),%r28
zdep %r28,23,24,%r28
or %r28,%r19,%r28
ldb 3(%r20),%r19
bv %r0(%r2)
or %r19,%r28,%r28
.EXIT
.PROCEND
.size   test, .-test
.ident  "GCC: (GNU) 12.0.0 20210901 (experimental)"
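
For readers not fluent in PA-RISC assembly, a rough C rendering of the
ldb/zdep/or sequence above. The helper name `load_be32_bytewise` is
hypothetical; the shifts follow from the zdep operands, which place the first
byte in the most significant position (hppa is big-endian).

```c
#include <stdint.h>

/* Hypothetical helper: C equivalent of the four byte loads above. */
static uint32_t load_be32_bytewise(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] << 24;  /* ldb 0; zdep ...,7,8   */
    v |= (uint32_t)p[1] << 16;          /* ldb 1; zdep ...,15,16 */
    v |= (uint32_t)p[2] << 8;           /* ldb 2; zdep ...,23,24 */
    v |= (uint32_t)p[3];                /* ldb 3; final or       */
    return v;
}
```

No load in this sequence is wider than one byte, which is what an object with
1-byte alignment requires on a strict-alignment target.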

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #14 from dave.anglin at bell dot net ---
On 2021-09-01 6:35 p.m., dave.anglin at bell dot net wrote:
> We only get correct code at -O0.
Maybe cpymemsi expander is problem.

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #13 from dave.anglin at bell dot net ---
On 2021-09-01 5:52 p.m., pinskia at gcc dot gnu.org wrote:
> This is doing the correct thing in splitting up the load into bytes loads.
We only get correct code at -O0.  STRICT_ALIGNMENT is defined to 1.

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #12 from Andrew Pinski  ---
Here is what the first testcase looks like at -O1 -mstrict-align on
aarch64-linux-gnu for GCC 10.3.0:
test:
.LFB1:
.cfi_startproc
adrp x0, output_len
add x1, x0, :lo12:output_len
ldrb w2, [x0, #:lo12:output_len]
ldrb w0, [x1, 1]
orr x2, x2, x0, lsl 8
ldrb w0, [x1, 2]
orr x0, x2, x0, lsl 16
ldrb w1, [x1, 3]
orr w0, w0, w1, lsl 24
ret
.cfi_endproc
.LFE1:
.size   test, .-test
.ident  "GCC: (GNU) 10.3.0"
.section .note.GNU-stack,"",@progbits

This is doing the correct thing in splitting up the load into byte loads.

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end
           Keywords|                            |wrong-code

--- Comment #11 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #10) 
> Does hppa*-*-linux* have STRICT_ALIGNMENT set to true or false?

config/pa/pa.h:#define STRICT_ALIGNMENT 1

Hmm, so it should work.
It is definitely something in the expansion between gimple and rtl which is
messing up.