RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-06-06 Thread Thomas Preud'homme
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 On Wed, Jun 4, 2014 at 9:04 AM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 
  Great. What about you Andreas? Does it work fine for you? If yes, is this ok
 for trunk?
 
 Ok.
 
 Thanks,
 Richard.

Committed since I got positive feedback from Christophe Lyon and
Rainer (on PR61320).

Best regards,

Thomas





Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-06-04 Thread Richard Biener
On Wed, Jun 4, 2014 at 9:04 AM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 From: Christophe Lyon [mailto:christophe.l...@linaro.org]
 On 29 May 2014 11:58, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 
  Does the patch solve the problem you had? What about you Christophe?
 
 

 Hi Thomas,

 After a quick test, it looks OK to me.

 Great. What about you Andreas? Does it work fine for you? If yes, is this ok 
 for trunk?

Ok.

Thanks,
Richard.

 Best regards,

 Thomas




Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-06-02 Thread Christophe Lyon
On 29 May 2014 11:58, Thomas Preud'homme thomas.preudho...@arm.com wrote:
 From: Andreas Schwab [mailto:sch...@linux-m68k.org]
 Thomas Preud'homme thomas.preudho...@arm.com writes:

  By the way, I couldn't understand how you reached the value
  0x44434241. Can you explain it to me?

 Each byte is composed of the first 7 bits of the original byte.

 Sorry, it seems I wasn't very awake when I checked that. Makes sense now.
 Thanks.

 Does the patch solve the problem you had? What about you Christophe?

 Best regards,

 Thomas


Hi Thomas,

After a quick test, it looks OK to me.

Thanks

Christophe.


RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-29 Thread Thomas Preud'homme
 From: Andreas Schwab [mailto:sch...@linux-m68k.org]
 
 This adds a full byte of padding between each bitfield.  If you want a
 single padding bit you should use :1, but you also need to update the
 test to check for 0x44434241 (0x88868482 is impossible, since that
 requires at least 8 bits per bitfield).

Actually, if I understood C99 correctly, it depends on the storage unit
allocated for the bitfield preceding the zero-length bitfield. Instead of
trying to cover all possible values read from this bitfield, I rewrote the
test to check whether bswap misinterprets the expression and replaces it with
a load or load+bswap. This reduces the number of possible values to 2
and thus makes the test less fragile and easier to understand.

By the way, I couldn't understand how you reached the value
0x44434241. Can you explain it to me?

Here is the ChangeLog:

2014-05-29  Thomas Preud'homme  thomas.preudho...@arm.com

* gcc.c-torture/execute/bswap-2.c: Add alignment constraints to
bitfield and test wrong results instead of correct results to make the
test more portable.

And the patch:

diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
index 38f18fd..a47e01a 100644
--- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
@@ -6,8 +6,11 @@ typedef __UINT32_TYPE__ unsigned;
 
 struct bitfield {
   unsigned char f0:7;
+  unsigned char   :1;
   unsigned char f1:7;
+  unsigned char   :1;
   unsigned char f2:7;
+  unsigned char   :1;
   unsigned char f3:7;
 };
 
@@ -74,11 +77,17 @@ main ()
 return 0;
   bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 };
   out = partial_read_le32 (bfin);
-  if (out != 0x09070503 && out != 0x88868482 && out != 0x78306141)
+  /* Test what bswap would do if its checks are not strict enough, instead of
+ what the expected result is, as there are too many possible results with
+ bitfields.  */
+  if (out == 0x89878583)
 __builtin_abort ();
   bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 };
   out = partial_read_be32 (bfin);
-  if (out != 0x03050709 && out != 0x82848688 && out != 0x41613078)
+  /* Test what bswap would do if its checks are not strict enough, instead of
+ what the expected result is, as there are too many possible results with
+ bitfields.  */
+  if (out == 0x83858789)
 __builtin_abort ();
   out = fake_read_le32 (cin, &cin[2]);
   if (out != 0x89018583)

Best regards,

Thomas

fix_bswap-2.diff
Description: Binary data


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-29 Thread Andreas Schwab
Thomas Preud'homme thomas.preudho...@arm.com writes:

 By the way, I couldn't understand how you reached the value
 0x44434241. Can you explain it to me?

Each byte is composed of the first 7 bits of the original byte.
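
Concretely (a sketch of my own, assuming bitfields are allocated starting
from the most significant bit, as on m68k): with a 7-bit field and a 1-bit
pad per byte, reading a field returns the stored byte shifted right by one.

  unsigned char b  = 0x83;      /* stored byte: 1000 0011 */
  unsigned char f0 = b >> 1;    /* top 7 bits:   100 0001 = 0x41 */
  /* Likewise 0x85 -> 0x42, 0x87 -> 0x43, 0x89 -> 0x44, so
     f3 << 24 | f2 << 16 | f1 << 8 | f0 == 0x44434241.  */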

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-29 Thread Thomas Preud'homme
 From: Andreas Schwab [mailto:sch...@linux-m68k.org]
 Thomas Preud'homme thomas.preudho...@arm.com writes:
 
  By the way, I couldn't understand how you reached the value
  0x44434241. Can you explain it to me?
 
 Each byte is composed of the first 7 bits of the original byte.

Sorry, it seems I wasn't very awake when I checked that. Makes sense now.
Thanks. 

Does the patch solve the problem you had? What about you Christophe?

Best regards,

Thomas





RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-27 Thread Thomas Preud'homme
Hi Christophe and Andreas,

 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 
 I suspect it's the same kind of problem m68k ran into. I already wrote a patch
 to reduce the impact of different bitfield layouts and it's in review right
 now. I'll send it to you tomorrow for testing.

Can you both try the attached patch to see if it solves the bug you experienced?
Andreas, I wrote it based on the correction you did, so it should at least work
for m68k.

ChangeLog is:

2014-05-26  Thomas Preud'homme  thomas.preudho...@arm.com

* gcc.c-torture/execute/bswap-2.c: Add alignment constraints to
bitfield to make it more portable.

Best regards,

Thomas 

fix_bswap-2.diff
Description: Binary data


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-27 Thread Christophe Lyon
On 27 May 2014 09:23, Thomas Preud'homme thomas.preudho...@arm.com wrote:
 Hi Christophe and Andreas,

 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme

 I suspect it's the same kind of problem m68k ran into. I already wrote a
 patch to reduce the impact of different bitfield layouts and it's in review
 right now. I'll send it to you tomorrow for testing.

 Can you both try the attached patch to see if it solves the bug you
 experienced? Andreas, I wrote it based on the correction you did, so it
 should at least work for m68k.

It still doesn't work on armeb.

Christophe.


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-27 Thread Andreas Schwab
Thomas Preud'homme thomas.preudho...@arm.com writes:

 diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
 index 38f18fd..4368d83 100644
 --- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
 +++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
 @@ -6,8 +6,11 @@ typedef __UINT32_TYPE__ unsigned;
  
  struct bitfield {
unsigned char f0:7;
 +  unsigned char   :0;
unsigned char f1:7;
 +  unsigned char   :0;
unsigned char f2:7;
 +  unsigned char   :0;
unsigned char f3:7;
  };

This adds a full byte of padding between each bitfield.  If you want a
single padding bit you should use :1, but you also need to update the
test to check for 0x44434241 (0x88868482 is impossible, since that
requires at least 8 bits per bitfield).
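
For reference, the two variants side by side (illustration only; the names
pad_bit and pad_unit are mine):

struct pad_bit          /* explicit 1-bit padding after each 7-bit field */
{
  unsigned char f0:7;
  unsigned char   :1;
  unsigned char f1:7;
};

struct pad_unit         /* zero width: how much padding this introduces
                           depends on the storage unit allocated for the
                           preceding bitfield */
{
  unsigned char f0:7;
  unsigned char   :0;
  unsigned char f1:7;
};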

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-26 Thread Christophe Lyon
On 23 May 2014 05:36, Thomas Preud'homme thomas.preudho...@arm.com wrote:
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 On Wed, May 21, 2014 at 3:00 AM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:

 
  Updated ChangeLogs:
 
  *** gcc/ChangeLog ***
 
  2014-05-20  Thomas Preud'homme  thomas.preudho...@arm.com
 
  PR tree-optimization/54733
  * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
  (CMPNOP): Define.
  (find_bswap_or_nop_load): New.
  (find_bswap_1): Renamed to ...
  (find_bswap_or_nop_1): This. Also add support for memory source.
  (find_bswap): Renamed to ...
  (find_bswap_or_nop): This. Also add support for memory source and
  detection of bitwise operations equivalent to load in host 
  endianness.
  (execute_optimize_bswap): Likewise. Also move its leading comment
 back
  in place and split statement transformation into ...
  (bswap_replace): This.
 
  *** gcc/testsuite/ChangeLog ***
 
  2014-05-20  Thomas Preud'homme  thomas.preudho...@arm.com
 
  PR tree-optimization/54733
  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
  optimization to support memory sources and bitwise operations
  equivalent to load in host endianness.
  * gcc.dg/optimize-bswaphi-1.c: Likewise.
  * gcc.dg/optimize-bswapsi-2.c: Likewise.
  * gcc.c-torture/execute/bswap-2.c: Likewise.
 
  Best regards,

 This is ok.

 Great. Commited.

 Thanks a lot for your patience in mentoring me to improve this patch.

 Best regards,

 Thomas


I have noticed that the new bswap-2.c test fails at execution on armeb targets.
See:
http://cbuild.validation.linaro.org/build/cross-validation/gcc/210843/report-build-info.html

Could you have a look?
Thanks,

Christophe.


RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-26 Thread Thomas Preud'homme
 From: Christophe Lyon [mailto:christophe.l...@linaro.org]
 
 I have noticed that the new bswap-2.c test fails at execution on armeb
 targets.
 See:
 http://cbuild.validation.linaro.org/build/cross-validation/gcc/210843/report-
 build-info.html
 
 Could you have a look?

Sure.

I suspect it's the same kind of problem m68k ran into. I already wrote a patch
to reduce the impact of different bitfield layouts and it's in review right
now. I'll send it to you tomorrow for testing.

Thanks for the feedback.

Best regards,

Thomas




Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-24 Thread Andreas Schwab
* gcc.c-torture/execute/bswap-2.c (main): Handle more bitfield
layouts.

diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
index e91b487..38f18fd 100644
--- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
@@ -74,11 +74,11 @@ main ()
 return 0;
   bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 };
   out = partial_read_le32 (bfin);
-  if (out != 0x09070503 && out != 0x88868482)
+  if (out != 0x09070503 && out != 0x88868482 && out != 0x78306141)
 __builtin_abort ();
   bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 };
   out = partial_read_be32 (bfin);
-  if (out != 0x03050709 && out != 0x82848688)
+  if (out != 0x03050709 && out != 0x82848688 && out != 0x41613078)
 __builtin_abort ();
   out = fake_read_le32 (cin, &cin[2]);
   if (out != 0x89018583)
-- 
1.9.3

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-22 Thread Richard Biener
On Wed, May 21, 2014 at 3:00 AM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-

 I'll send the new patch as soon as all the tests are done.

 Here you are. I also simplified the tests a bit by having the reference as a
 function parameter instead of a local one.

 Updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-20  Thomas Preud'homme  thomas.preudho...@arm.com

 PR tree-optimization/54733
 * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
 (CMPNOP): Define.
 (find_bswap_or_nop_load): New.
 (find_bswap_1): Renamed to ...
 (find_bswap_or_nop_1): This. Also add support for memory source.
 (find_bswap): Renamed to ...
 (find_bswap_or_nop): This. Also add support for memory source and
 detection of bitwise operations equivalent to load in host endianness.
 (execute_optimize_bswap): Likewise. Also move its leading comment back
 in place and split statement transformation into ...
 (bswap_replace): This.

 *** gcc/testsuite/ChangeLog ***

 2014-05-20  Thomas Preud'homme  thomas.preudho...@arm.com

 PR tree-optimization/54733
 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
 optimization to support memory sources and bitwise operations
 equivalent to load in host endianness.
 * gcc.dg/optimize-bswaphi-1.c: Likewise.
 * gcc.dg/optimize-bswapsi-2.c: Likewise.
 * gcc.c-torture/execute/bswap-2.c: Likewise.

 Best regards,

This is ok.

Thanks,
Richard.

 Thomas


RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-22 Thread Thomas Preud'homme
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 On Wed, May 21, 2014 at 3:00 AM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:

 
  Updated ChangeLogs:
 
  *** gcc/ChangeLog ***
 
  2014-05-20  Thomas Preud'homme  thomas.preudho...@arm.com
 
  PR tree-optimization/54733
  * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
  (CMPNOP): Define.
  (find_bswap_or_nop_load): New.
  (find_bswap_1): Renamed to ...
  (find_bswap_or_nop_1): This. Also add support for memory source.
  (find_bswap): Renamed to ...
  (find_bswap_or_nop): This. Also add support for memory source and
  detection of bitwise operations equivalent to load in host 
  endianness.
  (execute_optimize_bswap): Likewise. Also move its leading comment
 back
  in place and split statement transformation into ...
  (bswap_replace): This.
 
  *** gcc/testsuite/ChangeLog ***
 
  2014-05-20  Thomas Preud'homme  thomas.preudho...@arm.com
 
  PR tree-optimization/54733
  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
  optimization to support memory sources and bitwise operations
  equivalent to load in host endianness.
  * gcc.dg/optimize-bswaphi-1.c: Likewise.
  * gcc.dg/optimize-bswapsi-2.c: Likewise.
  * gcc.c-torture/execute/bswap-2.c: Likewise.
 
  Best regards,
 
 This is ok.

Great. Commited.

Thanks a lot for your patience in mentoring me to improve this patch.

Best regards,

Thomas




Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-20 Thread Richard Biener
On Tue, May 20, 2014 at 4:46 AM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 From: Richard Biener [mailto:richard.guent...@gmail.com]

 Agreed, but I am happy with doing that as a followup.  Btw,
 a very simple one would be to reject unaligned
 SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
 [of course that may be true on MIPS even for the cases where
 a reasonably fast unaligned variant exists - nearly no target
 defines that macro in a too fancy way]

 Indeed, it's defined to 1 without consideration of the mode or alignment on
 at least ARM, alpha, tilegx and tilepro, and on all targets with
 STRICT_ALIGNMENT, since that's the default value for the
 SLOW_UNALIGNED_ACCESS macro. Thus mips should be in there too, for instance.

 However, I fail to see how the code produced to do an unaligned load
 could be worse than the manual load done in the original bitwise
 expression. It might be worse for load + bswap though. Maybe I could
 skip the optimization based on this macro only for bswap?

It may do three aligned loads (char, short, char) and combine them,
while doing an unaligned int load may end up being slower.  Though
very probably the RTL expansion machinery for unaligned loads
is way more clever at emitting an optimal sequence than a programmer is.

Anyway, as said before please consider addressing any cost issues
as followup - just make sure to properly emit unaligned loads via
a sequence I suggested.

Thanks,
Richard.

 Best regards,

 Thomas




RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-20 Thread Thomas Preud'homme
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 
 It may do three aligned loads (char, short, char) and combine them,
 while doing an unaligned int load may end up being slower.  Though
 very probably the RTL expansion machinery for unaligned loads
 is way more clever at emitting an optimal sequence than a programmer is.

That's what I meant. I expect the RTL machinery to emit the optimal sequence
to do an unaligned load. If it doesn't, it should probably be fixed.

 
 Anyway, as said before please consider addressing any cost issues
 as followup - just make sure to properly emit unaligned loads via
 a sequence I suggested.

That's what I did, with the addition of skipping the optimization for targets
with slow unaligned access when a bswap permutation is needed. Since this
affects ARM you can be sure I'll follow up on the cost model. I already
started thinking about it.
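
A sketch of the kind of guard meant here (my paraphrase, not the exact code
of the patch; bswap_needed, load_type and addr_tmp stand in for the
corresponding variables in the pass):

  unsigned int align = get_pointer_alignment (addr_tmp);
  if (bswap_needed
      && align < GET_MODE_ALIGNMENT (TYPE_MODE (load_type))
      && SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
    return false;  /* an unaligned load + bswap is unlikely to pay off */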

I'll send the new patch as soon as all the tests are done.

Best regards,

Thomas





RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-20 Thread Thomas Preud'homme
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 
 I'll send the new patch as soon as all the tests are done.

Here you are. I also simplified the tests a bit by having the reference as a
function parameter instead of a local one.

Updated ChangeLogs:

*** gcc/ChangeLog ***

2014-05-20  Thomas Preud'homme  thomas.preudho...@arm.com

PR tree-optimization/54733
* tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
(CMPNOP): Define.
(find_bswap_or_nop_load): New.
(find_bswap_1): Renamed to ...
(find_bswap_or_nop_1): This. Also add support for memory source.
(find_bswap): Renamed to ...
(find_bswap_or_nop): This. Also add support for memory source and
detection of bitwise operations equivalent to load in host endianness.
(execute_optimize_bswap): Likewise. Also move its leading comment back
in place and split statement transformation into ...
(bswap_replace): This.

*** gcc/testsuite/ChangeLog ***

2014-05-20  Thomas Preud'homme  thomas.preudho...@arm.com

PR tree-optimization/54733
* gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
optimization to support memory sources and bitwise operations
equivalent to load in host endianness.
* gcc.dg/optimize-bswaphi-1.c: Likewise.
* gcc.dg/optimize-bswapsi-2.c: Likewise.
* gcc.c-torture/execute/bswap-2.c: Likewise. 

Best regards,

Thomas

gcc32rm-84.6.0.diff
Description: Binary data


RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-19 Thread Thomas Preud'homme
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 
 Oh, and what happens for
 
 unsigned foo (unsigned char *x)
 {
   return x[0] << 24 | x[2] << 8 | x[3];
 }
 
 ?  We could do an unsigned int load from x and zero byte 3
 with an AND.  Enhancement for a followup, similar to also
 considering vector types for the load (also I'm not sure
 that uint64_type_node always has non-BLKmode for all
 targets).

I implemented this but it makes the statistics more difficult to read
and thus the test more difficult to write. Indeed, when doing a full
bswap many of the intermediate computations are partial bswaps.
I can get around this by using a different string to report a partial
bswap than a full bswap and by testing the partial bswap feature in a
separate file so as not to get noise from the full bswap. However I still
don't like the idea of reporting a partial bswap for statements that
will eventually be eliminated as dead statements. Besides, this
occurs because the recursion is done for each statement even
if it was already examined, which is not very nice in itself.

The solution I see is to keep a map of visited statements but I
don't know if that's fine in this case. After all, it's not strictly
necessary as the pass would work without this. It would just serve
to avoid redundant computation and confusing statistics.

Best regards,

Thomas




Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-19 Thread Richard Biener
On Mon, May 19, 2014 at 11:30 AM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 From: Richard Biener [mailto:richard.guent...@gmail.com]

 Oh, and what happens for

 unsigned foo (unsigned char *x)
 {
   return x[0] << 24 | x[2] << 8 | x[3];
 }

 ?  We could do an unsigned int load from x and zero byte 3
 with an AND.  Enhancement for a followup, similar to also
 considering vector types for the load (also I'm not sure
 that uint64_type_node always has non-BLKmode for all
 targets).

 I implemented this but it makes the statistics more difficult to read
 and thus the test more difficult to write. Indeed, when doing a full
 bswap many of the intermediate computations are partial bswaps.
 I can get around this by using a different string to report a partial
 bswap than a full bswap and by testing the partial bswap feature in a
 separate file so as not to get noise from the full bswap. However I still
 don't like the idea of reporting a partial bswap for statements that
 will eventually be eliminated as dead statements. Besides, this
 occurs because the recursion is done for each statement even
 if it was already examined, which is not very nice in itself.

Ah, indeed ...

 The solution I see is to keep a map of visited statements but I
 don't know if that's fine in this case. After all, it's not strictly
 necessary as the pass would work without this. It would just serve
 to avoid redundant computation and confusing statistics.

... having bswap clean up after itself (thus removing dead code itself)
would be nice.  Let's just keep the above as a possibility for
further enhancements and focus on getting the current patch
correct and committed.

Btw, another enhancement is a memory target.  Thus recognize

void bswap_mem (char *m1, char *m2)
{
  m2[0] = m1[3];
  m2[1] = m1[2];
  m2[2] = m1[1];
  m2[3] = m1[0];
}

with all the complications that arise due to aliasing issues.
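
Assuming m1 and m2 are known not to overlap, the recognition could ideally
produce something like the following (a sketch of the target form, not what
the pass emits today):

void bswap_mem_opt (char *m1, char *m2)
{
  unsigned int v;
  __builtin_memcpy (&v, m1, 4);   /* one 4-byte load, alignment-safe */
  v = __builtin_bswap32 (v);      /* reverse the bytes */
  __builtin_memcpy (m2, &v, 4);   /* one 4-byte store */
}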

Thanks,
Richard.

 Best regards,

 Thomas




RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-19 Thread Thomas Preud'homme
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 
 Agreed, but I am happy with doing that as a followup.  Btw,
 a very simple one would be to reject unaligned
 SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
 [of course that may be true on MIPS even for the cases where
 a reasonably fast unaligned variant exists - nearly no target
 defines that macro in a too fancy way]

Indeed, it's defined to 1 without consideration of the mode or alignment on
at least ARM, alpha, tilegx and tilepro, and on all targets with
STRICT_ALIGNMENT, since that's the default value for the
SLOW_UNALIGNED_ACCESS macro. Thus mips should be in there too, for instance.

However, I fail to see how the code produced to do an unaligned load
could be worse than the manual load done in the original bitwise
expression. It might be worse for load + bswap though. Maybe I could
skip the optimization based on this macro only for bswap?

Best regards,

Thomas




Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-19 Thread Andrew Pinski
On Fri, May 16, 2014 at 9:58 AM,  pins...@gmail.com wrote:


 On May 16, 2014, at 4:13 AM, Richard Biener richard.guent...@gmail.com 
 wrote:

 On Fri, May 16, 2014 at 1:03 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, May 16, 2014 at 12:56 PM,  pins...@gmail.com wrote:


 On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com 
 wrote:

 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 Ping?

 Sorry ...

 Best regards,

 Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current setup
 is actually more difficult to read.

 Here are the updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

 PR tree-optimization/54733
 * expr.c (get_inner_reference): Add a parameter to control whether
 a
 MEM_REF should be split into base + offset.
 * tree.h (get_inner_reference): Default new parameter to false.
 * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
 (CMPNOP): Define.
 (find_bswap_or_nop_load): New.
 (find_bswap_1): Renamed to ...
 (find_bswap_or_nop_1): This. Also add support for memory source.
 (find_bswap): Renamed to ...
 (find_bswap_or_nop): This. Also add support for memory source and
 detection of bitwise operations equivalent to load in host 
 endianness.
 (execute_optimize_bswap): Likewise. Also move its leading
 comment back
 in place and split statement transformation into ...
 (bswap_replace): This. Add assert when updating bswap_stats.

 *** gcc/testsuite/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

 PR tree-optimization/54733
 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
 optimization to support memory sources and bitwise operations
 equivalent to load in host endianness.
 * gcc.dg/optimize-bswaphi-1.c: Likewise.
 * gcc.dg/optimize-bswapsi-2.c: Likewise.
 * gcc.c-torture/execute/bswap-2.c: Likewise.

 Ok for trunk?

 Ok, I now decided otherwise and dislike the new parameter to
 get_inner_reference.  Can you please revert that part and just
 deal with a MEM_REF result in your only caller?

 And (of course) I found another possible issue.  The way you
 compute load_type and use it here:

 +  /* Perform the load.  */
 +  load_offset_ptr = build_int_cst (n->alias_set, 0);
 +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
 + load_offset_ptr);

 makes the load always appear aligned according to the mode of
 load_type.  On strict-alignment targets this may cause faults.

 So what you have to do is either (simpler)

  unsigned int align = get_pointer_alignment (addr_tmp);
  tree al_load_type = load_type;
   if (align < TYPE_ALIGN (load_type))
al_load_type = build_aligned_type (load_type, align);
 ...
   val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
load_offset_ptr);

 or keep track of the first actual load and use

  unsigned int align = get_object_alignment (that_first_load);

 first in the one that corresponds to addr_tmp.  From that there
 is a much better chance to derive good alignment values.

 Of course on STRICT_ALIGNMENT targets an unaligned load
 will be decomposed again, so eventually doing the transformation
 may no longer be profitable(?).

 Not always decomposed. On MIPS, it should use the load/store left/right
 instructions for unaligned load/stores, which is normally better than
 decomposed load/stores. So having a cost model would be nice.

 Agreed, but I am happy with doing that as a followup.  Btw,
 a very simple one would be to reject unaligned
 SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
 [of course that may be true on MIPS even for the cases where
 a reasonably fast unaligned variant exists - nearly no target
 defines that macro in a too fancy way]

 Oh, and what happens for

 unsigned foo (unsigned char *x)
 {
  return x[0] << 24 | x[2] << 8 | x[3];
 }

 ?  We could do an unsigned int load from x and zero byte 3
 with an AND.  Enhancement for a followup, similar to also
 considering vector types for the load (also I'm not sure
 that uint64_type_node always has non-BLKmode for all
 targets).

 No we cannot: if x[4] is on a different page which is not mapped in, we get a
 fault. Not something we want.

I was reading that code wrong.  I was trying to say that if we don't load from
x[3] then we can't do it.  But with the example above we can.

Thanks,
Andrew Pinski



 Thanks,
 Andrew



 Richard.

 Richard.

 Thanks,
 Andrew


 Thanks and sorry

RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-18 Thread Thomas Preud'homme
 From: Richard Biener [mailto:richard.guent...@gmail.com]
 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
  Ping?
 
 Sorry ...

 
 Thanks and sorry again for the delay.
 

No need to be sorry, it was really not meant as a complaint. I understand very
well that patches sometimes go under the radar and I just wanted to make
sure someone saw it.


 From: pins...@gmail.com [mailto:pins...@gmail.com]
 
 Not always decomposed. On MIPS, it should use the load/store left/right
 instructions for unaligned load/stores, which is normally better than
 decomposed load/stores. So having a cost model would be nice.
 

This makes me think about the following situation:

uint32_t
read_le (unsigned char data[])
{
  uint16_t lo, hi;

  hi = data[0] | (data[1] << 8);
  lo = (data[2] << 16) | (data[3] << 24);
  printf ("lo + hi: %d\n", lo + hi);
  return hi | lo;
}

Currently this will do a load of 4 bytes and a bswap on a big endian target,
but it would be better to just handle hi and lo separately, doing two 2-byte
loads and a bitwise OR of the two. So check if an SSA_NAME is used by other
statements and, if yes, stop there.
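
A sketch of that check in the style of the pass (variable name and exact
placement are assumptions on my side):

  /* Stop growing the recognized expression when an intermediate value
     feeds statements outside the chain.  */
  if (TREE_CODE (rhs) == SSA_NAME && !has_single_use (rhs))
    return NULL;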


 From: Richard Biener [mailto:richard.guent...@gmail.com]
 
 Oh, and what happens for
 
 unsigned foo (unsigned char *x)
 {
   return x[0] << 24 | x[2] << 8 | x[3];
 }
 
 ?  We could do an unsigned int load from x and zero byte 3
 with an AND.  Enhancement for a followup, similar to also
 considering vector types for the load (also I'm not sure
 that uint64_type_node always has non-BLKmode for all
 targets).
 

Looks like a nice improvement to the patch indeed.
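
On a big-endian target the transform could look like this (my illustration;
foo_be and the memcpy-based load are assumptions, not part of the patch):

unsigned foo_be (unsigned char *x)
{
  unsigned int w;

  /* One 4-byte load; safe because the original expression reads x[3],
     so the whole word lies within accessible memory.  */
  __builtin_memcpy (&w, x, 4);
  /* Zero the byte that held x[1], i.e. bits 23..16 of the BE word.  */
  return w & 0xff00ffffu;
}

On little endian the loaded word would additionally need a bswap first.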


 From: pins...@gmail.com [mailto:pins...@gmail.com]
 
 No we cannot: if x[4] is on a different page which is not mapped in, we get a
 fault. Not something we want.
 

Why would x[4] be loaded? I guess Richard was suggesting doing a single
load + zeroing only when the untouched array entry is neither the first nor the
last, that is, when there is a load on each side of it.

Best regards,

Thomas Preud'homme





RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Thomas Preud'homme
Ping?

Best regards,

Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
 
 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current setup
 is actually more difficult to read.
 
 Here are the updated ChangeLogs:
 
 *** gcc/ChangeLog ***
 
 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com
 
   PR tree-optimization/54733
   * expr.c (get_inner_reference): Add a parameter to control whether
 a
   MEM_REF should be split into base + offset.
   * tree.h (get_inner_reference): Default new parameter to false.
   * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
   (CMPNOP): Define.
   (find_bswap_or_nop_load): New.
   (find_bswap_1): Renamed to ...
   (find_bswap_or_nop_1): This. Also add support for memory source.
   (find_bswap): Renamed to ...
   (find_bswap_or_nop): This. Also add support for memory source and
   detection of bitwise operations equivalent to load in host endianness.
   (execute_optimize_bswap): Likewise. Also move its leading
 comment back
   in place and split statement transformation into ...
   (bswap_replace): This. Add assert when updating bswap_stats.
 
 *** gcc/testsuite/ChangeLog ***
 
 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com
 
   PR tree-optimization/54733
   * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
   optimization to support memory sources and bitwise operations
   equivalent to load in host endianness.
   * gcc.dg/optimize-bswaphi-1.c: Likewise.
   * gcc.dg/optimize-bswapsi-2.c: Likewise.
   * gcc.c-torture/execute/bswap-2.c: Likewise.
 
 Ok for trunk?
 
 Best regards,
 
 Thomas
 
  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
  ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
  Sent: Monday, May 05, 2014 7:30 PM
  To: GCC Patches
  Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
  load/store
 
  I found a way to improve the function find_bswap/find_bswap_or_nop
  and reduce its size. Please hold for the review, I will post an updated
  version as soon as I finish testing.
 
  Best regards,
 
  Thomas Preud'homme
 
 
 




Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
thomas.preudho...@arm.com wrote:
 Ping?

Sorry ...

 Best regards,

 Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current setup
 is actually more difficult to read.

 Here are the updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

   PR tree-optimization/54733
   * expr.c (get_inner_reference): Add a parameter to control whether
 a
   MEM_REF should be split into base + offset.
   * tree.h (get_inner_reference): Default new parameter to false.
   * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
   (CMPNOP): Define.
   (find_bswap_or_nop_load): New.
   (find_bswap_1): Renamed to ...
   (find_bswap_or_nop_1): This. Also add support for memory source.
   (find_bswap): Renamed to ...
   (find_bswap_or_nop): This. Also add support for memory source and
   detection of bitwise operations equivalent to load in host endianness.
   (execute_optimize_bswap): Likewise. Also move its leading
 comment back
   in place and split statement transformation into ...
   (bswap_replace): This. Add assert when updating bswap_stats.

 *** gcc/testsuite/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

   PR tree-optimization/54733
   * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
   optimization to support memory sources and bitwise operations
   equivalent to load in host endianness.
   * gcc.dg/optimize-bswaphi-1.c: Likewise.
   * gcc.dg/optimize-bswapsi-2.c: Likewise.
   * gcc.c-torture/execute/bswap-2.c: Likewise.

 Ok for trunk?

Ok, I now decided otherwise and dislike the new parameter to
get_inner_reference.  Can you please revert that part and just
deal with a MEM_REF result in your only caller?

And (of course) I found another possible issue.  The way you
compute load_type and use it here:

+  /* Perform the load.  */
+  load_offset_ptr = build_int_cst (n->alias_set, 0);
+  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
+ load_offset_ptr);

makes the load always appear aligned according to the mode of
load_type.  On strict-alignment targets this may cause faults.

So what you have to do is either (simpler)

   unsigned int align = get_pointer_alignment (addr_tmp);
   tree al_load_type = load_type;
    if (align < TYPE_ALIGN (load_type))
 al_load_type = build_aligned_type (load_type, align);
...
val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
 load_offset_ptr);

or keep track of the first actual load and use

   unsigned int align = get_object_alignment (that_first_load);

first in the one that corresponds to addr_tmp.  From that there
is a much better chance to derive good alignment values.

Of course on STRICT_ALIGNMENT targets an unaligned load
will be decomposed again, so eventually doing the transformation
may no longer be profitable(?).

Thanks and sorry again for the delay.

Otherwise the patch looks good to me.

Richard.

 Best regards,

 Thomas

  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
  ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
  Sent: Monday, May 05, 2014 7:30 PM
  To: GCC Patches
  Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
  load/store
 
  I found a way to improve the function find_bswap/find_bswap_or_nop
  and reduce its size. Please hold for the review, I will post an updated
  version as soon as I finish testing.
 
  Best regards,
 
  Thomas Preud'homme
 
 
 




Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 12:56 PM,  pins...@gmail.com wrote:


 On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com 
 wrote:

 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 Ping?

 Sorry ...

 Best regards,

 Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current setup
 is actually more difficult to read.

 Here are the updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

  PR tree-optimization/54733
  * expr.c (get_inner_reference): Add a parameter to control whether
 a
  MEM_REF should be split into base + offset.
  * tree.h (get_inner_reference): Default new parameter to false.
  * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
  (CMPNOP): Define.
  (find_bswap_or_nop_load): New.
  (find_bswap_1): Renamed to ...
  (find_bswap_or_nop_1): This. Also add support for memory source.
  (find_bswap): Renamed to ...
  (find_bswap_or_nop): This. Also add support for memory source and
  detection of bitwise operations equivalent to load in host endianness.
  (execute_optimize_bswap): Likewise. Also move its leading
 comment back
  in place and split statement transformation into ...
  (bswap_replace): This. Add assert when updating bswap_stats.

 *** gcc/testsuite/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

  PR tree-optimization/54733
  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
  optimization to support memory sources and bitwise operations
  equivalent to load in host endianness.
  * gcc.dg/optimize-bswaphi-1.c: Likewise.
  * gcc.dg/optimize-bswapsi-2.c: Likewise.
  * gcc.c-torture/execute/bswap-2.c: Likewise.

 Ok for trunk?

 Ok, I now decided otherwise and dislike the new parameter to
 get_inner_reference.  Can you please revert that part and just
 deal with a MEM_REF result in your only caller?

 And (of course) I found another possible issue.  The way you
 compute load_type and use it here:

 +  /* Perform the load.  */
 +  load_offset_ptr = build_int_cst (n->alias_set, 0);
 +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
 + load_offset_ptr);

 makes the load always appear aligned according to the mode of
 load_type.  On strict-alignment targets this may cause faults.

 So what you have to do is either (simpler)

   unsigned int align = get_pointer_alignment (addr_tmp);
   tree al_load_type = load_type;
    if (align < TYPE_ALIGN (load_type))
 al_load_type = build_aligned_type (load_type, align);
 ...
val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
 load_offset_ptr);

 or keep track of the first actual load and use

   unsigned int align = get_object_alignment (that_first_load);

 first in the one that corresponds to addr_tmp.  From that there
 is a much better chance to derive good alignment values.

 Of course on STRICT_ALIGNMENT targets an unaligned load
 will be decomposed again, so eventually doing the transformation
 may no longer be profitable(?).

 Not always decomposed. On MIPS, it should use the load/store left/right
 instructions for unaligned load/stores, which is normally better than
 decomposed load/stores. So having a cost model would be nice.

Agreed, but I am happy with doing that as a followup.  Btw,
a very simple one would be to reject unaligned
SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
[of course that may be true on MIPS even for the cases where
a reasonably fast unaligned variant exists - nearly no target
defines that macro in a too fancy way]

Richard.

 Thanks,
 Andrew


 Thanks and sorry again for the delay.

 Otherwise the patch looks good to me.

 Richard.

 Best regards,

 Thomas

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Monday, May 05, 2014 7:30 PM
 To: GCC Patches
 Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
 load/store

 I found a way to improve the function find_bswap/find_bswap_or_nop
 and reduce its size. Please hold for the review, I will post an updated
 version as soon as I finish testing.

 Best regards,

 Thomas Preud'homme




Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread Richard Biener
On Fri, May 16, 2014 at 1:03 PM, Richard Biener
richard.guent...@gmail.com wrote:
 On Fri, May 16, 2014 at 12:56 PM,  pins...@gmail.com wrote:


 On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com 
 wrote:

 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 Ping?

 Sorry ...

 Best regards,

 Thomas Preud'homme

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current setup
 is actually more difficult to read.

 Here are the updated ChangeLogs:

 *** gcc/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

  PR tree-optimization/54733
  * expr.c (get_inner_reference): Add a parameter to control whether
 a
  MEM_REF should be split into base + offset.
  * tree.h (get_inner_reference): Default new parameter to false.
  * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
  (CMPNOP): Define.
  (find_bswap_or_nop_load): New.
  (find_bswap_1): Renamed to ...
  (find_bswap_or_nop_1): This. Also add support for memory source.
  (find_bswap): Renamed to ...
  (find_bswap_or_nop): This. Also add support for memory source and
  detection of bitwise operations equivalent to load in host 
 endianness.
  (execute_optimize_bswap): Likewise. Also move its leading
 comment back
  in place and split statement transformation into ...
  (bswap_replace): This. Add assert when updating bswap_stats.

 *** gcc/testsuite/ChangeLog ***

 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

  PR tree-optimization/54733
  * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
  optimization to support memory sources and bitwise operations
  equivalent to load in host endianness.
  * gcc.dg/optimize-bswaphi-1.c: Likewise.
  * gcc.dg/optimize-bswapsi-2.c: Likewise.
  * gcc.c-torture/execute/bswap-2.c: Likewise.

 Ok for trunk?

 Ok, I now decided otherwise and dislike the new parameter to
 get_inner_reference.  Can you please revert that part and just
 deal with a MEM_REF result in your only caller?

 And (of course) I found another possible issue.  The way you
 compute load_type and use it here:

 +  /* Perform the load.  */
 +  load_offset_ptr = build_int_cst (n->alias_set, 0);
 +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
 + load_offset_ptr);

 makes the load always appear aligned according to the mode of
 load_type.  On strict-alignment targets this may cause faults.

 So what you have to do is either (simpler)

   unsigned int align = get_pointer_alignment (addr_tmp);
   tree al_load_type = load_type;
    if (align < TYPE_ALIGN (load_type))
 al_load_type = build_aligned_type (load_type, align);
 ...
val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
 load_offset_ptr);

 or keep track of the first actual load and use

   unsigned int align = get_object_alignment (that_first_load);

 first in the one that corresponds to addr_tmp.  From that there
 is a much better chance to derive good alignment values.

 Of course on STRICT_ALIGNMENT targets an unaligned load
 will be decomposed again, so eventually doing the transformation
 may no longer be profitable(?).

 Not always decomposed. On MIPS, it should use the load/store left/right
 instructions for unaligned load/stores, which is normally better than
 decomposed load/stores. So having a cost model would be nice.

 Agreed, but I am happy with doing that as a followup.  Btw,
 a very simple one would be to reject unaligned
 SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
 [of course that may be true on MIPS even for the cases where
 a reasonably fast unaligned variant exists - nearly no target
 defines that macro in a too fancy way]

Oh, and what happens for

unsigned foo (unsigned char *x)
{
  return x[0] << 24 | x[2] << 8 | x[3];
}

?  We could do an unsigned int load from x and zero byte 3
with an AND.  Enhancement for a followup, similar to also
considering vector types for the load (also I'm not sure
that uint64_type_node always has non-BLKmode for all
targets).

Richard.

 Richard.

 Thanks,
 Andrew


 Thanks and sorry again for the delay.

 Otherwise the patch looks good to me.

 Richard.

 Best regards,

 Thomas

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Monday, May 05, 2014 7:30 PM
 To: GCC Patches
 Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
 load/store

 I found a way to improve the function find_bswap

Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-16 Thread pinskia


 On May 16, 2014, at 4:13 AM, Richard Biener richard.guent...@gmail.com 
 wrote:
 
 On Fri, May 16, 2014 at 1:03 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Fri, May 16, 2014 at 12:56 PM,  pins...@gmail.com wrote:
 
 
 On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com 
 wrote:
 
 On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
 thomas.preudho...@arm.com wrote:
 Ping?
 
 Sorry ...
 
 Best regards,
 
 Thomas Preud'homme
 
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Friday, May 09, 2014 6:26 PM
 To: GCC Patches
 Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
 
 Sorry, took longer than expected as I got distracted by some other patch.
 I merged the whole patchset in a single patch as I was told the current setup
 is actually more difficult to read.
 
 Here are the updated ChangeLogs:
 
 *** gcc/ChangeLog ***
 
 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com
 
 PR tree-optimization/54733
 * expr.c (get_inner_reference): Add a parameter to control whether
 a
 MEM_REF should be split into base + offset.
 * tree.h (get_inner_reference): Default new parameter to false.
 * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
 (CMPNOP): Define.
 (find_bswap_or_nop_load): New.
 (find_bswap_1): Renamed to ...
 (find_bswap_or_nop_1): This. Also add support for memory source.
 (find_bswap): Renamed to ...
 (find_bswap_or_nop): This. Also add support for memory source and
 detection of bitwise operations equivalent to load in host 
 endianness.
 (execute_optimize_bswap): Likewise. Also move its leading
 comment back
 in place and split statement transformation into ...
 (bswap_replace): This. Add assert when updating bswap_stats.
 
 *** gcc/testsuite/ChangeLog ***
 
 2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com
 
 PR tree-optimization/54733
 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of
 bswap
 optimization to support memory sources and bitwise operations
 equivalent to load in host endianness.
 * gcc.dg/optimize-bswaphi-1.c: Likewise.
 * gcc.dg/optimize-bswapsi-2.c: Likewise.
 * gcc.c-torture/execute/bswap-2.c: Likewise.
 
 Ok for trunk?
 
 Ok, I now decided otherwise and dislike the new parameter to
 get_inner_reference.  Can you please revert that part and just
 deal with a MEM_REF result in your only caller?
 
 And (of course) I found another possible issue.  The way you
 compute load_type and use it here:
 
 +  /* Perform the load.  */
 +  load_offset_ptr = build_int_cst (n->alias_set, 0);
 +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
 + load_offset_ptr);
 
 makes the load always appear aligned according to the mode of
 load_type.  On strict-alignment targets this may cause faults.
 
 So what you have to do is either (simpler)
 
  unsigned int align = get_pointer_alignment (addr_tmp);
  tree al_load_type = load_type;
   if (align < TYPE_ALIGN (load_type))
al_load_type = build_aligned_type (load_type, align);
 ...
   val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
load_offset_ptr);
 
 or keep track of the first actual load and use
 
  unsigned int align = get_object_alignment (that_first_load);
 
 first in the one that corresponds to addr_tmp.  From that there
 is a much better chance to derive good alignment values.
 
 Of course on STRICT_ALIGNMENT targets an unaligned load
 will be decomposed again, so eventually doing the transformation
 may no longer be profitable(?).
 
 Not always decomposed. On MIPS, it should use the load/store left/right
 instructions for unaligned load/stores, which is normally better than
 decomposed load/stores. So having a cost model would be nice.
 
 Agreed, but I am happy with doing that as a followup.  Btw,
 a very simple one would be to reject unaligned
 SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align).
 [of course that may be true on MIPS even for the cases where
 a reasonably fast unaligned variant exists - nearly no target
 defines that macro in a too fancy way]
 
 Oh, and what happens for
 
 unsigned foo (unsigned char *x)
 {
  return x[0] << 24 | x[2] << 8 | x[3];
 }
 
 ?  We could do an unsigned int load from x and zero byte 3
 with an AND.  Enhancement for a followup, similar to also
 considering vector types for the load (also I'm not sure
 that uint64_type_node always has non-BLKmode for all
 targets).

No we cannot: if x[4] is on a different page which is not mapped in, we get a
fault. Not something we want.

Thanks,
Andrew


 
 Richard.
 
 Richard.
 
 Thanks,
 Andrew
 
 
 Thanks and sorry again for the delay.
 
 Otherwise the patch looks good to me.
 
 Richard.
 
 Best regards,
 
 Thomas
 
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches

RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-09 Thread Thomas Preud'homme
Sorry, took longer than expected as I got distracted by some other patch.
I merged the whole patchset in a single patch as I was told the current setup
is actually more difficult to read.

Here are the updated ChangeLogs:

*** gcc/ChangeLog ***

2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

PR tree-optimization/54733
* expr.c (get_inner_reference): Add a parameter to control whether a
MEM_REF should be split into base + offset.
* tree.h (get_inner_reference): Default new parameter to false.
* tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
(CMPNOP): Define.
(find_bswap_or_nop_load): New.
(find_bswap_1): Renamed to ...
(find_bswap_or_nop_1): This. Also add support for memory source.
(find_bswap): Renamed to ...
(find_bswap_or_nop): This. Also add support for memory source and
detection of bitwise operations equivalent to load in host endianness.
(execute_optimize_bswap): Likewise. Also move its leading comment back
in place and split statement transformation into ...
(bswap_replace): This. Add assert when updating bswap_stats.

*** gcc/testsuite/ChangeLog ***

2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

PR tree-optimization/54733
* gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
optimization to support memory sources and bitwise operations
equivalent to load in host endianness.
* gcc.dg/optimize-bswaphi-1.c: Likewise.
* gcc.dg/optimize-bswapsi-2.c: Likewise.
* gcc.c-torture/execute/bswap-2.c: Likewise.

Ok for trunk?

Best regards,

Thomas

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
 Sent: Monday, May 05, 2014 7:30 PM
 To: GCC Patches
 Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
 load/store
 
 I found a way to improve the function find_bswap/find_bswap_or_nop
 and reduce its size. Please hold for the review, I will post an updated
 version as soon as I finish testing.
 
 Best regards,
 
 Thomas Preud'homme
 
 
 


gcc32rm-84.5.4.diff
Description: Binary data