[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2024-07-30 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|pinskia at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #15 from Andrew Pinski  ---
No longer working on bitfields optimization. Maybe I will be again in a few
years, but not today.

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2021-08-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|rguenth at gcc dot gnu.org |pinskia at gcc dot gnu.org

--- Comment #14 from Andrew Pinski  ---
Taking this for GCC 13.  I am going to finally submit the bit-field lowering
next year.

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2020-02-10 Thread luoxhu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

luoxhu at gcc dot gnu.org changed:

   What|Removed |Added

 CC||linkw at gcc dot gnu.org

--- Comment #13 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #12)
> But it could do just
> 
>   stw r4,0(r3)
> 
> (on LE; and with a rotate first, on BE).

Thanks for catching that; this optimization is not related to load-hit-store.  I
will investigate why the store-merging pass failed to merge the two halfword
stores.
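A minimal sketch of the pattern the store-merging pass is expected to handle (a hypothetical reduction, not the bug's actual test case): two adjacent halfword stores that together cover one aligned word, which should be mergeable into a single word store (a single stw on powerpc64le).

```c
#include <stdint.h>

/* Hypothetical reduction: two adjacent 16-bit stores that the
   store-merging pass should combine into one 32-bit store.  The
   byte split assumes a little-endian layout. */
void store_halves(uint16_t *p, uint32_t v)
{
    p[0] = (uint16_t)v;          /* low halfword  */
    p[1] = (uint16_t)(v >> 16);  /* high halfword */
}
```

The merge is legal here because the two accesses are adjacent, non-volatile, and within the same aligned word.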

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2020-02-10 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

--- Comment #12 from Segher Boessenkool  ---
But it could do just

  stw r4,0(r3)

(on LE; and with a rotate first, on BE).

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2020-02-09 Thread luoxhu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

luoxhu at gcc dot gnu.org changed:

   What|Removed |Added

 CC||luoxhu at gcc dot gnu.org

--- Comment #11 from luoxhu at gcc dot gnu.org ---
(In reply to Anton Blanchard from comment #4)
> Created attachment 39683 [details]
> Another bitop LHS test case
> 
> Here's another issue found in the Linux kernel. Seems like this should be a
> single lwz/stw since the union of counter and the bitops completely overlap.
> 
> The half word store followed by word load is going to prevent it from store
> forwarding.
> 
> 0000000000000000 <set_page_slub_counters>:
>0: 00 00 03 81 lwz r8,0(r3)
>4: 20 00 89 78 clrldi  r9,r4,32
>8: c2 0f 2a 79 rldicl  r10,r9,33,31
>c: 00 f8 48 51 rlwimi  r8,r10,31,0,0
>   10: 5e 00 2a 55 rlwinm  r10,r9,0,1,15
>   14: 00 00 03 91 stw r8,0(r3)
>   18: 00 00 83 b0 sth r4,0(r3)
>   1c: 00 00 42 60 ori r2,r2,0
>   20: 00 00 23 81 lwz r9,0(r3)
>   24: 00 04 29 55 rlwinm  r9,r9,0,16,0
>   28: 78 53 29 7d or  r9,r9,r10
>   2c: 00 00 23 91 stw r9,0(r3)
>   30: 20 00 80 4e blr

This case is already fixed on the latest GCC 10 (the issues in
__skb_decr_checksum_unnecessary from Anton Blanchard and in test2 from Nicholas
Piggin still exist).

gcc version 10.0.1 20200210 

objdump -d set_page_slub_counters.o

set_page_slub_counters.o: file format elf64-powerpcle


Disassembly of section .text:

0000000000000000 <set_page_slub_counters>:
   0:   22 84 89 78 rldicl  r9,r4,48,48
   4:   00 00 83 b0 sth r4,0(r3)
   8:   02 00 23 b1 sth r9,2(r3)
   c:   20 00 80 4e blr
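For context, a sketch of the function being disassembled, modeled on the Linux slub allocator code; the exact field names and widths here are assumptions for illustration, not taken from the attachment.

```c
/* Sketch of set_page_slub_counters (modeled on Linux slub code; field
   layout is an assumption).  The three bitfields exactly overlay
   'counters', so the copy below should reduce to a single word store;
   older GCC emitted the mixed-width store sequence quoted above. */
struct page {
    union {
        unsigned counters;
        struct {
            unsigned inuse   : 16;
            unsigned objects : 15;
            unsigned frozen  : 1;
        };
    };
};

void set_page_slub_counters(struct page *page, unsigned long counters_new)
{
    struct page tmp;
    tmp.counters = (unsigned)counters_new;
    page->inuse   = tmp.inuse;
    page->objects = tmp.objects;
    page->frozen  = tmp.frozen;
}
```

Because the bitfields fully cover the 32-bit `counters` member, copying all three fields is equivalent to copying the word, regardless of bitfield ordering.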

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2019-11-18 Thread luoxhu at cn dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

--- Comment #10 from Xiong Hu XS Luo  ---
(In reply to Andrew Pinski from comment #9)
> (In reply to rguent...@suse.de from comment #8)
> > On Fri, 15 Mar 2019, luoxhu at cn dot ibm.com wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509
> > > 
> > > Xiong Hu XS Luo  changed:
> > > 
> > >What|Removed |Added
> > > 
> > >  CC||guojiufu at gcc dot gnu.org,
> > >||rguenth at gcc dot gnu.org
> > > 
> > > --- Comment #7 from Xiong Hu XS Luo  ---
> > > Hi Richard, trying to figure out the issue recently, but get some questions
> > > need your help. How is the status of the "proposed simple lowering of bitfield
> > > accesses on GIMPLE", please? 
> > 
> > There are finished patches in a few variants but all of them show issues
> > in the testsuite, mostly around missed optimizations IIRC.  It has been
> > quite some time since I last looked at this but IIRC Andrew Pinski said
> > he's got sth for GCC 10 so you might want to ask him.
> 
> Yes I do, I am planning on submitting the new pass once GCC 10 has opened up.

Hi Pinski, has your patch been upstreamed to GCC 10?  Thanks.

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2019-03-15 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

--- Comment #9 from Andrew Pinski  ---
(In reply to rguent...@suse.de from comment #8)
> On Fri, 15 Mar 2019, luoxhu at cn dot ibm.com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509
> > 
> > Xiong Hu XS Luo  changed:
> > 
> >What|Removed |Added
> > 
> >  CC||guojiufu at gcc dot gnu.org,
> >||rguenth at gcc dot gnu.org
> > 
> > --- Comment #7 from Xiong Hu XS Luo  ---
> > Hi Richard, trying to figure out the issue recently, but get some questions
> > need your help. How is the status of the "proposed simple lowering of bitfield
> > accesses on GIMPLE", please? 
> 
> There are finished patches in a few variants but all of them show issues
> in the testsuite, mostly around missed optimizations IIRC.  It has been
> quite some time since I last looked at this but IIRC Andrew Pinski said
> he's got sth for GCC 10 so you might want to ask him.

Yes I do; I am planning on submitting the new pass once GCC 10 has opened up.

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2019-03-15 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

--- Comment #8 from rguenther at suse dot de  ---
On Fri, 15 Mar 2019, luoxhu at cn dot ibm.com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509
> 
> Xiong Hu XS Luo  changed:
> 
>What|Removed |Added
> 
>  CC||guojiufu at gcc dot gnu.org,
>||rguenth at gcc dot gnu.org
> 
> --- Comment #7 from Xiong Hu XS Luo  ---
> Hi Richard, trying to figure out the issue recently, but get some questions
> need your help. How is the status of the "proposed simple lowering of bitfield
> accesses on GIMPLE", please? 

There are finished patches in a few variants but all of them show issues
in the testsuite, mostly around missed optimizations IIRC.  It has been
quite some time since I last looked at this but IIRC Andrew Pinski said
he's got sth for GCC 10 so you might want to ask him.

> for "less conservative about DECL_BIT_FIELD_REPRESENTATIVE", do you mean we
> choose large mode in GIMPLE stage, and make the decision when in target?
> Thanks.

DECL_BIT_FIELD_REPRESENTATIVE was mainly added for strict volatile
bitfield accesses as well as to avoid data races as specified by the
C++ memory model.  So in some cases we might be able to widen the
accesses (and we can shorten them in any case if we compute this is a good
idea).
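A minimal illustration of how the representative constrains access width (a sketch under assumed layout, not one of the bug's test cases):

```c
/* Sketch: the bitfields a..c are grouped under one
   DECL_BIT_FIELD_REPRESENTATIVE.  Because they fit in 8 bits, GCC gives
   them a byte-sized (QImode) representative; widening the access past
   that byte could race with a concurrent write to 'tail' under the C++
   memory model, which is why widening must be done carefully. */
struct bf {
    unsigned a : 2;
    unsigned b : 3;
    unsigned c : 3;   /* 8 bits total -> byte-sized representative */
    char tail;        /* independent member right after the bitfields */
};

void bump(struct bf *p)
{
    p->a += 1;        /* read-modify-write of (at least) the representative */
}
```

The read-modify-write in `bump` must not touch `tail`; shrinking the access is always safe, widening it only sometimes.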

> PS: As a newbie, can you tell how did you do to "Widening the representative"
> :),  I am a bit confused about the best mode and where to process it, 
> sometimes
> big mode is better and sometimes smaller mode is better(from Segher's
> comments).

Yeah, get_best_mode is a big pile of *** and this issue is very hard to
compute locally.  I suppose for the best decision you'd need to
look at the whole program (or at least the whole function) in a
flow-sensitive manner.  One of the lowering tries I did was to integrate
the lowering with the SRA pass which is said to eventually get
flow-sensitive analysis at some point (but at least does whole-function
analysis already).

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2019-03-15 Thread luoxhu at cn dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

Xiong Hu XS Luo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #7 from Xiong Hu XS Luo  ---
Hi Richard, I have been trying to figure out this issue recently, but I have some
questions that need your help.  What is the status of the "proposed simple
lowering of bitfield accesses on GIMPLE", please?
For "less conservative about DECL_BIT_FIELD_REPRESENTATIVE", do you mean we
choose the larger mode at the GIMPLE stage and leave the final decision to the
target?  Thanks.

PS: As a newbie, can you tell me how you did the "widening of the
representative" :)?  I am a bit confused about the best mode and where to choose
it; sometimes a bigger mode is better and sometimes a smaller mode is better
(from Segher's comments).

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2017-05-11 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

--- Comment #6 from Segher Boessenkool  ---
Doing an 8 byte load of something that was stored as 4 byte immediately
before will cause flushes and stalls...  Yeah it could use a 4-byte load
here afaics.
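A sketch of the stalling access pattern described above (a hypothetical reduction; the little-endian byte overlap is an assumption):

```c
#include <stdint.h>

/* Sketch of load-hit-store with mismatched widths: a 4-byte store
   immediately followed by an 8-byte load over the same bytes.  The
   narrower store cannot be forwarded to the wider load, so the load
   must wait for the store to drain, causing a flush/stall. */
union u {
    uint32_t w;
    uint64_t d;
};

uint64_t store_then_wider_load(union u *p, uint32_t v)
{
    p->w = v;      /* 4-byte store */
    return p->d;   /* 8-byte load overlapping it: load-hit-store */
}
```

On a little-endian target the returned value has its low word replaced by `v`; matching the load width to the store (a 4-byte load here) avoids the stall.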

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2017-05-11 Thread npiggin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

Nicholas Piggin  changed:

   What|Removed |Added

 CC||npiggin at gmail dot com

--- Comment #5 from Nicholas Piggin  ---
This test case seems like it may be related.  It does the right thing and uses
all 4-byte ops when the 8-byte alignment is removed.  I post it here because it
may not always be the case that the smallest op is best.

struct s {
    unsigned long align1;
    union {
        unsigned int blah;
        unsigned int a:1;
    };
};

void test2(struct s *s)
{
    s->blah = 100;
    if (s->a)
        asm volatile("#blah");
}

Generates (gcc 7.0.1)

test2:
li 9,100
stw 9,8(3)
ld 9,8(3)
andi. 9,9,0x1
beqlr 0
#APP
 # 29 "a.c" 1
#blah
 # 0 "" 2
#NO_APP
blr

There is a more general case of mismatched load and store sizes in unions with
different size types, but in this case the sizes could be matched I think.

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2016-09-25 Thread anton at samba dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

--- Comment #4 from Anton Blanchard  ---
Created attachment 39683
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39683&action=edit
Another bitop LHS test case

Here's another issue found in the Linux kernel. Seems like this should be a
single lwz/stw since the union of counter and the bitops completely overlap.

The half word store followed by word load is going to prevent it from store
forwarding.

0000000000000000 <set_page_slub_counters>:
   0:   00 00 03 81 lwz r8,0(r3)
   4:   20 00 89 78 clrldi  r9,r4,32
   8:   c2 0f 2a 79 rldicl  r10,r9,33,31
   c:   00 f8 48 51 rlwimi  r8,r10,31,0,0
  10:   5e 00 2a 55 rlwinm  r10,r9,0,1,15
  14:   00 00 03 91 stw r8,0(r3)
  18:   00 00 83 b0 sth r4,0(r3)
  1c:   00 00 42 60 ori r2,r2,0
  20:   00 00 23 81 lwz r9,0(r3)
  24:   00 04 29 55 rlwinm  r9,r9,0,16,0
  28:   78 53 29 7d or  r9,r9,r10
  2c:   00 00 23 91 stw r9,0(r3)
  30:   20 00 80 4e blr

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2016-06-15 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

--- Comment #3 from rguenther at suse dot de  ---
On Tue, 14 Jun 2016, segher at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509
> 
> --- Comment #2 from Segher Boessenkool  ---
> (In reply to Richard Biener from comment #1)
> > It looks like we didn't adjust the bitfield read paths for the mem model
> > because in practice it doesn't matter and it may generate larger/slower code
> > not to do loads in larger types on some archs.
> 
> It still generates correct (or at least working) code, yeah.
> 
> > Note that we conservatively compute the extent for DECL_BIT_FIELD_REPRESENTATIVE
> > by preferring smaller modes.  There's some ??? in finish_bitfield_representative
> > and the above remark about tail padding re-use is only implemented via preferring
> > smaller modes.  Thus when adding a 'long foo' after csum_level the
> > representative doesn't change to 64bit width but stays at 8bits (both are
> > valid from the C++ memory model).
> 
> Smaller modes are slightly better here, they allow more efficient insn
> sequences to manipulate the data, esp. on big endian.  For example on
> PowerPC, on BE, our 16-bit immediates (in e.g. ior) can handle QImode
> and HImode, but not DImode; and e.g. the rlwinm insn can handle SImode
> but not DImode.  In general, with smaller modes you do not need to
> worry about preserving the other bits.
> 
> > Note that the proposed simple lowering of bitfield accesses on GIMPLE would
> > do accesses of DECL_BIT_FIELD_REPRESENTATIVE and thus in this case use byte
> > accesses.
> 
> That would be lovely :-)

On my list for GCC 7 ;-)

> > I suppose we want to be less conservative about DECL_BIT_FIELD_REPRESENTATIVE
> > and leave it up to the target how to do the actual accesses.
> 
> Maybe.  Always smallest is a good choice for PowerPC at least.

Note the middle-end in extract_bit_field says using a larger mode is
always better because it is better for CSE (not sure about that argument
if you only look at a single stmt).

[Bug middle-end/71509] Bitfield causes load hit store with larger store than load

2016-06-14 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71509

--- Comment #2 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #1)
> It looks like we didn't adjust the bitfield read paths for the mem model
> because in practice it doesn't matter and it may generate larger/slower code
> not to do loads in larger types on some archs.

It still generates correct (or at least working) code, yeah.

> Note that we conservatively compute the extent for DECL_BIT_FIELD_REPRESENTATIVE
> by preferring smaller modes.  There's some ??? in finish_bitfield_representative
> and the above remark about tail padding re-use is only implemented via preferring
> smaller modes.  Thus when adding a 'long foo' after csum_level the
> representative doesn't change to 64bit width but stays at 8bits (both are
> valid from the C++ memory model).

Smaller modes are slightly better here, they allow more efficient insn
sequences to manipulate the data, esp. on big endian.  For example on
PowerPC, on BE, our 16-bit immediates (in e.g. ior) can handle QImode
and HImode, but not DImode; and e.g. the rlwinm insn can handle SImode
but not DImode.  In general, with smaller modes you do not need to
worry about preserving the other bits.

> Note that the proposed simple lowering of bitfield accesses on GIMPLE would
> do accesses of DECL_BIT_FIELD_REPRESENTATIVE and thus in this case use byte
> accesses.

That would be lovely :-)

> I suppose we want to be less conservative about DECL_BIT_FIELD_REPRESENTATIVE
> and leave it up to the target how to do the actual accesses.

Maybe.  Always smallest is a good choice for PowerPC at least.

> Widening the representative generates
> 
> __skb_decr_checksum_unnecessary:
> ld 9,8(3)
> addi 10,9,3
> rldicr 9,9,0,61
> rldicl 10,10,0,62
> or 9,9,10
> std 9,8(3)
> blr

Making all accesses use QImode yields

__skb_decr_checksum_unnecessary:
lbz 9,8(3)
addi 10,9,3
rlwinm 9,9,0,0,29
rlwinm 10,10,0,30,31
or 9,9,10
stb 9,8(3)
blr

(the rlwinm's and the or could be a rlwimi together, not sure why they
aren't merged)

and on BE it gives

.L.__skb_decr_checksum_unnecessary:
lbz 9,8(3)
srwi 10,9,6
rlwinm 9,9,0,26,31
addi 10,10,3
rlwinm 10,10,6,24,25
or 9,9,10
stb 9,8(3)
blr

which could be just

lbz 9,8(3)
addi 9,9,192
stb 9,8(3)
blr
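For context, a hypothetical reduction of the function whose code generation is compared above; the 2-bit field width and the preceding 8-byte member are assumptions inferred from the offsets and masks in the assembly, not taken from the kernel source.

```c
/* Hypothetical reduction of __skb_decr_checksum_unnecessary: a 2-bit
   bitfield placed after an 8-byte member, matching the 8(3) offsets and
   the two-bit masks in the listings above.  Decrementing by 1 is the
   same as the compiler's add-3-and-mask, since -1 == 3 (mod 4). */
struct sk_buff {
    unsigned long pad;            /* puts the bitfield at offset 8 */
    unsigned char csum_level : 2; /* assumed width */
};

void __skb_decr_checksum_unnecessary(struct sk_buff *skb)
{
    skb->csum_level--;            /* wraps modulo 4 within the field */
}
```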