RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Richard Biener [mailto:richard.guent...@gmail.com] On Wed, Jun 4, 2014 at 9:04 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Great. What about you, Andreas? Does it work fine for you? If yes, is this ok for trunk? Ok. Thanks, Richard. Committed since I got positive feedback from Christophe Lyon and Rainer (on PR61320). Best regards, Thomas
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Wed, Jun 4, 2014 at 9:04 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Christophe Lyon [mailto:christophe.l...@linaro.org] On 29 May 2014 11:58, Thomas Preud'homme thomas.preudho...@arm.com wrote: Does the patch solve the problem you had? What about you Christophe? Hi Thomas, After a quick test, it looks OK to me. Great. What about you Andreas? Does it work fine for you? If yes, is this ok for trunk? Ok. Thanks, Richard. Best regards, Thomas
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On 29 May 2014 11:58, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Andreas Schwab [mailto:sch...@linux-m68k.org] Thomas Preud'homme thomas.preudho...@arm.com writes: By the way, I couldn't understand how you reached the value 0x44434241. Can you explain it to me? Each byte is composed of the first 7 bits of the original byte. Sorry, it seems I wasn't very awake when I checked that. Makes sense now. Thanks. Does the patch solve the problem you had? What about you, Christophe? Best regards, Thomas Hi Thomas, After a quick test, it looks OK to me. Thanks, Christophe.
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Andreas Schwab [mailto:sch...@linux-m68k.org] This adds a full byte of padding between each bitfield. If you want a single padding bit you should use :1, but you also need to update the test to check for 0x44434241 (0x88868482 is impossible, since that requires at least 8 bits per bitfield). Actually, if I understood C99 correctly, it depends on the storage unit allocated for the bitfield preceding the 0-length bitfield. Instead of trying to cover all possible values read from this bitfield, I rewrote the test to check whether bswap misinterprets the expression and replaces it with a load or load+bswap. This reduces the number of possible values to 2 and thus makes the test less fragile and easier to understand. By the way, I couldn't understand how you reached the value 0x44434241. Can you explain it to me? Here is the ChangeLog: 2014-05-29 Thomas Preud'homme thomas.preudho...@arm.com * gcc.c-torture/execute/bswap-2.c: Add alignment constraints to bitfield and test wrong results instead of correct results to make the test more portable. And the patch: diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c index 38f18fd..a47e01a 100644 --- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c @@ -6,8 +6,11 @@ typedef __UINT32_TYPE__ unsigned; struct bitfield { unsigned char f0:7; + unsigned char :1; unsigned char f1:7; + unsigned char :1; unsigned char f2:7; + unsigned char :1; unsigned char f3:7; }; @@ -74,11 +77,17 @@ main () return 0; bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 }; out = partial_read_le32 (bfin); - if (out != 0x09070503 && out != 0x88868482 && out != 0x78306141) + /* Test what bswap would do if its checks are not strict enough instead of + what the expected result is, as there are too many possible results with + bitfields. 
*/ + if (out == 0x89878583) __builtin_abort (); bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 }; out = partial_read_be32 (bfin); - if (out != 0x03050709 && out != 0x82848688 && out != 0x41613078) + /* Test what bswap would do if its checks are not strict enough instead of + what the expected result is, as there are too many possible results with + bitfields. */ + if (out == 0x83858789) __builtin_abort (); out = fake_read_le32 (cin, cin[2]); if (out != 0x89018583) Best regards, Thomas fix_bswap-2.diff Description: Binary data
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
Thomas Preud'homme thomas.preudho...@arm.com writes: By the way, I couldn't understand how you reached the value 0x44434241. Can you explain it to me? Each byte is composed of the first 7 bits of the original byte. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Andreas Schwab [mailto:sch...@linux-m68k.org] Thomas Preud'homme thomas.preudho...@arm.com writes: By the way, I couldn't understand how you reached the value 0x44434241. Can you explain it to me? Each byte is composed of the first 7 bits of the original byte. Sorry, it seems I wasn't very awake when I checked that. Makes sense now. Thanks. Does the patch solve the problem you had? What about you, Christophe? Best regards, Thomas
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
Hi Christophe and Andreas, From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme I suspect it's the same kind of problem m68k ran into. I already wrote a patch to reduce the impact of different bitfield layouts and it's in review right now. I'll send it to you tomorrow for testing. Can you both try the attached patch to see if it solves the bug you experienced? Andreas, I wrote it based on the correction you did so it should at least work for m68k. ChangeLog is: 2014-05-26 Thomas Preud'homme thomas.preudho...@arm.com * gcc.c-torture/execute/bswap-2.c: Add alignment constraints to bitfield to make it more portable. Best regards, Thomas fix_bswap-2.diff Description: Binary data
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On 27 May 2014 09:23, Thomas Preud'homme thomas.preudho...@arm.com wrote: Hi Christophe and Andreas, From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme I suspect it's the same kind of problem m68k ran into. I already wrote a patch to reduce the impact of different bitfield layouts and it's in review right now. I'll send it to you tomorrow for testing. Can you both try the attached patch to see if it solves the bug you experienced? Andreas, I wrote it based on the correction you did so it should at least work for m68k. It still doesn't work on armeb. Christophe.
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
Thomas Preud'homme thomas.preudho...@arm.com writes: diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c index 38f18fd..4368d83 100644 --- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c @@ -6,8 +6,11 @@ typedef __UINT32_TYPE__ unsigned; struct bitfield { unsigned char f0:7; + unsigned char :0; unsigned char f1:7; + unsigned char :0; unsigned char f2:7; + unsigned char :0; unsigned char f3:7; }; This adds a full byte of padding between each bitfield. If you want a single padding bit you should use :1, but you also need to update the test to check for 0x44434241 (0x88868482 is impossible, since that requires at least 8 bits per bitfield). Andreas. -- Andreas Schwab, sch...@linux-m68k.org
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On 23 May 2014 05:36, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Richard Biener [mailto:richard.guent...@gmail.com] On Wed, May 21, 2014 at 3:00 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-20 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. *** gcc/testsuite/ChangeLog *** 2014-05-20 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Best regards, This is ok. Great. Committed. Thanks a lot for your patience in mentoring me to improve this patch. Best regards, Thomas I have noticed that the new bswap-2.c test fails at execution on armeb targets. See: http://cbuild.validation.linaro.org/build/cross-validation/gcc/210843/report-build-info.html Could you have a look? Thanks, Christophe.
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Christophe Lyon [mailto:christophe.l...@linaro.org] I have noticed that the new bswap-2.c test fails at execution on armeb targets. See: http://cbuild.validation.linaro.org/build/cross-validation/gcc/210843/report-build-info.html Could you have a look? Sure. I suspect it's the same kind of problem m68k ran into. I already wrote a patch to reduce the impact of different bitfield layouts and it's in review right now. I'll send it to you tomorrow for testing. Thanks for the feedback. Best regards, Thomas
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
* gcc.c-torture/execute/bswap-2.c (main): Handle more bitfield layouts. diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c index e91b487..38f18fd 100644 --- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c @@ -74,11 +74,11 @@ main () return 0; bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 }; out = partial_read_le32 (bfin); - if (out != 0x09070503 && out != 0x88868482) + if (out != 0x09070503 && out != 0x88868482 && out != 0x78306141) __builtin_abort (); bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 }; out = partial_read_be32 (bfin); - if (out != 0x03050709 && out != 0x82848688) + if (out != 0x03050709 && out != 0x82848688 && out != 0x41613078) __builtin_abort (); out = fake_read_le32 (cin, cin[2]); if (out != 0x89018583) -- 1.9.3 -- Andreas Schwab, sch...@linux-m68k.org
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Wed, May 21, 2014 at 3:00 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- I'll send the new patch as soon as all the tests are done. Here you are. I also simplified the tests a bit having the reference as a function parameter instead of a local one. Updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-20 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. *** gcc/testsuite/ChangeLog *** 2014-05-20 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Best regards, This is ok. Thanks, Richard. Thomas
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Richard Biener [mailto:richard.guent...@gmail.com] On Wed, May 21, 2014 at 3:00 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-20 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. *** gcc/testsuite/ChangeLog *** 2014-05-20 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Best regards, This is ok. Great. Committed. Thanks a lot for your patience in mentoring me to improve this patch. Best regards, Thomas
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Tue, May 20, 2014 at 4:46 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Richard Biener [mailto:richard.guent...@gmail.com] Agreed, but I am happy with doing that as a followup. Btw, a very simple one would be to reject unaligned SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align). [of course that may be true on MIPS even for the cases where a reasonably fast unaligned variant exists - nearly no target defines that macro in a too fancy way] Indeed, it's defined to 1 without consideration of the mode or alignment on at least ARM, alpha, tilegx, tilepro and all targets with STRICT_ALIGNMENT, since that's the default value of the SLOW_UNALIGNED_ACCESS macro. Thus MIPS should be in there too, for instance. However, I fail to see how the code produced to do an unaligned load could be worse than the manual load done in the original bitwise expression. It might be worse for load + bswap though. Maybe I could skip the optimization based on this macro only for bswap? It may do three aligned loads, char, short, char and combine them, while doing an unaligned int load may end up being slower. Though very probably the RTL expansion machinery for unaligned loads is way more clever at emitting an optimal sequence than a programmer is. Anyway, as said before please consider addressing any cost issues as a followup - just make sure to properly emit unaligned loads via a sequence I suggested. Thanks, Richard. Best regards, Thomas
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Richard Biener [mailto:richard.guent...@gmail.com] It may do three aligned loads, char, short, char and combine them, while doing an unaligned int load may end up being slower. Though very probably the RTL expansion machinery for unaligned loads is way more clever at emitting an optimal sequence than a programmer is. That's what I meant. I expect the RTL machinery to emit the optimal sequence to do an unaligned load. If it doesn't, it should probably be fixed. Anyway, as said before please consider addressing any cost issues as a followup - just make sure to properly emit unaligned loads via a sequence I suggested. That's what I did with the addition of skipping the optimization for targets with slow unaligned access for bswap permutation. Since this affects ARM you can be sure I'll follow up on the cost model. I already started thinking about it. I'll send the new patch as soon as all the tests are done. Best regards, Thomas
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- I'll send the new patch as soon as all the tests are done. Here you are. I also simplified the tests a bit having the reference as a function parameter instead of a local one. Updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-20 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. *** gcc/testsuite/ChangeLog *** 2014-05-20 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Best regards, Thomas gcc32rm-84.6.0.diff Description: Binary data
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Richard Biener [mailto:richard.guent...@gmail.com] Oh, and what happens for unsigned foo (unsigned char *x) { return x[0] << 24 | x[2] << 8 | x[3]; } ? We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets). I implemented this but it makes the statistics more difficult to read and thus the test more difficult to write. Indeed, when doing a full bswap many of the intermediate computations are partial bswaps. I can get on with it by using a different string to notify of a partial bswap than of a full bswap and testing the partial bswap feature in a separate file so as not to get noise from the full bswap. However, I still don't like the idea of notifying for partial bswap in statements that will eventually be eliminated as dead statements. Besides, this occurs because some recursion is done for each statement even if it was already examined, which is not very nice in itself. The solution I see is to keep a map of visited statements but I don't know if that's fine in this case. After all, it's not strictly necessary as the pass would work without this. It would just serve to avoid redundant computation and confusing statistics. Best regards, Thomas
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Mon, May 19, 2014 at 11:30 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Richard Biener [mailto:richard.guent...@gmail.com] Oh, and what happens for unsigned foo (unsigned char *x) { return x[0] << 24 | x[2] << 8 | x[3]; } ? We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets). I implemented this but it makes the statistics more difficult to read and thus the test more difficult to write. Indeed, when doing a full bswap many of the intermediate computations are partial bswaps. I can get on with it by using a different string to notify of a partial bswap than of a full bswap and testing the partial bswap feature in a separate file so as not to get noise from the full bswap. However, I still don't like the idea of notifying for partial bswap in statements that will eventually be eliminated as dead statements. Besides, this occurs because some recursion is done for each statement even if it was already examined, which is not very nice in itself. Ah, indeed ... The solution I see is to keep a map of visited statements but I don't know if that's fine in this case. After all, it's not strictly necessary as the pass would work without this. It would just serve to avoid redundant computation and confusing statistics. ... having bswap clean up after itself (thus removing dead code itself) would be nice. Let's just keep the above as a possibility for further enhancements and focus on getting the current patch correct and committed. Btw, another enhancement is a memory target. Thus recognize void bswap_mem (char *m1, char *m2) { m2[0] = m1[3]; m2[1] = m1[2]; m2[2] = m1[1]; m2[3] = m1[0]; } with all the complications that arise due to aliasing issues. Thanks, Richard. Best regards, Thomas
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Richard Biener [mailto:richard.guent...@gmail.com] Agreed, but I am happy with doing that as a followup. Btw, a very simple one would be to reject unaligned SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align). [of course that may be true on MIPS even for the cases where a reasonably fast unaligned variant exists - nearly no target defines that macro in a too fancy way] Indeed, it's defined to 1 without consideration of the mode or alignment on at least ARM, alpha, tilegx, tilepro and all targets with STRICT_ALIGNMENT, since that's the default value of the SLOW_UNALIGNED_ACCESS macro. Thus MIPS should be in there too, for instance. However, I fail to see how the code produced to do an unaligned load could be worse than the manual load done in the original bitwise expression. It might be worse for load + bswap though. Maybe I could skip the optimization based on this macro only for bswap? Best regards, Thomas
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Fri, May 16, 2014 at 9:58 AM, pins...@gmail.com wrote: On May 16, 2014, at 4:13 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 1:03 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 12:56 PM, pins...@gmail.com wrote: On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Ping? Sorry ... Best regards, Thomas Preud'homme -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Friday, May 09, 2014 6:26 PM To: GCC Patches Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store Sorry, took longer than expected as I got distracted by some other patch. I merged the whole patchset in a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * expr.c (get_inner_reference): Add a parameter to control whether a MEM_REF should be split into base + offset. * tree.h (get_inner_reference): Default new parameter to false. * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. Add assert when updating bswap_stats. 
*** gcc/testsuite/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Ok for trunk? Ok, I now decided otherwise and dislike the new parameter to get_inner_reference. Can you please revert that part and just deal with a MEM_REF result in your only caller? And (of course) I found another possible issue. The way you compute load_type and use it here: + /* Perform the load. */ + load_offset_ptr = build_int_cst (n->alias_set, 0); + val_expr = fold_build2 (MEM_REF, load_type, addr_tmp, + load_offset_ptr); makes the load always appear aligned according to the mode of load_type. On strict-alignment targets this may cause faults. So what you have to do is either (simpler) unsigned int align = get_pointer_alignment (addr_tmp); tree al_load_type = load_type; if (align < TYPE_ALIGN (load_type)) al_load_type = build_aligned_type (load_type, align); ... val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp, load_offset_ptr); or keep track of the first actual load and use unsigned int align = get_object_alignment (that_first_load); first in the one that corresponds to addr_tmp. From that there is a much better chance to derive good alignment values. Of course on STRICT_ALIGNMENT targets an unaligned load will be decomposed again, so eventually doing the transformation may no longer be profitable(?). Not always decomposed. On MIPS, it should be using the load/store left/right instructions for unaligned loads/stores, which is normally better than decomposed loads/stores. So having a cost model would be nice. Agreed, but I am happy with doing that as a followup. 
Btw, a very simple one would be to reject unaligned SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align). [of course that may be true on MIPS even for the cases where a reasonably fast unaligned variant exists - nearly no target defines that macro in a too fancy way] Oh, and what happens for unsigned foo (unsigned char *x) { return x[0] << 24 | x[2] << 8 | x[3]; } ? We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets). No, we cannot: if x[4] is on a different page which is not mapped in, we get a fault. Not something we want. I was reading that code wrong. I was trying to say that if we don't load from x[3] then we can't do it. But with the example above we can. Thanks, Andrew Pinski Richard. Thanks, Andrew Thanks and sorry
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Richard Biener [mailto:richard.guent...@gmail.com] On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Ping? Sorry ... Thanks and sorry again for the delay. No need to be sorry, it was really not meant as a complaint. I understand very well that patches sometimes go under the radar and I just wanted to make sure someone saw it. From: pins...@gmail.com [mailto:pins...@gmail.com] Not always decomposed. On MIPS, it should be using the load/store left/right instructions for unaligned loads/stores, which is normally better than decomposed loads/stores. So having a cost model would be nice. This makes me think about the following situation: uint32_t read_le (unsigned char data[]) { uint16_t lo, hi; hi = data[0] | (data[1] << 8); lo = (data[2] << 16) | (data[3] << 24); printf ("lo + hi: %d\n", lo + hi); return hi | lo; } Currently this will do a load of 4 bytes and do a bswap on big-endian targets, but it would be better to just handle hi and lo separately, doing two 2-byte loads and a bitwise OR of the two. So check if the two SSA_NAMEs are used by other statements and if yes stop there. From: Richard Biener [mailto:richard.guent...@gmail.com] Oh, and what happens for unsigned foo (unsigned char *x) { return x[0] << 24 | x[2] << 8 | x[3]; } ? We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets). Looks like a nice improvement to the patch indeed. From: pins...@gmail.com [mailto:pins...@gmail.com] No, we cannot: if x[4] is on a different page which is not mapped in, we get a fault. Not something we want. Why would x[4] be loaded? I guess Richard was only suggesting doing a single load + zeroing only when the untouched array entry is neither the first nor the last, that is when there is a load. Best regards, Thomas Preud'homme
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
Ping? Best regards, Thomas Preud'homme -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Friday, May 09, 2014 6:26 PM To: GCC Patches Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store Sorry, took longer than expected as I got distracted by some other patch. I merged the whole patchset in a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * expr.c (get_inner_reference): Add a parameter to control whether a MEM_REF should be split into base + offset. * tree.h (get_inner_reference): Default new parameter to false. * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. Add assert when updating bswap_stats. *** gcc/testsuite/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Ok for trunk? 
Best regards, Thomas -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Monday, May 05, 2014 7:30 PM To: GCC Patches Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store I found a way to improve the function find_bswap/find_bswap_or_nop and reduce its size. Please hold for the review, I will post an updated version as soon as I finish testing. Best regards, Thomas Preud'homme
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Ping? Sorry ... Best regards, Thomas Preud'homme -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Friday, May 09, 2014 6:26 PM To: GCC Patches Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store Sorry, took longer than expected as I got distracted by some other patch. I merged the whole patchset in a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * expr.c (get_inner_reference): Add a parameter to control whether a MEM_REF should be split into base + offset. * tree.h (get_inner_reference): Default new parameter to false. * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. Add assert when updating bswap_stats. *** gcc/testsuite/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Ok for trunk? Ok, I now decided otherwise and dislike the new parameter to get_inner_reference. 
Can you please revert that part and just deal with a MEM_REF result in your only caller? And (of course) I found another possible issue. The way you compute load_type and use it here:

+  /* Perform the load.  */
+  load_offset_ptr = build_int_cst (n->alias_set, 0);
+  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
+			  load_offset_ptr);

makes the load always appear aligned according to the mode of load_type. On strict-alignment targets this may cause faults.

So what you have to do is either (simpler)

  unsigned int align = get_pointer_alignment (addr_tmp);
  tree al_load_type = load_type;
  if (align < TYPE_ALIGN (load_type))
    al_load_type = build_aligned_type (load_type, align);
  ...
  val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
			  load_offset_ptr);

or keep track of the first actual load and use

  unsigned int align = get_object_alignment (that_first_load);

"first" as in the one that corresponds to addr_tmp. From that there is a much better chance to derive good alignment values.

Of course on STRICT_ALIGNMENT targets an unaligned load will be decomposed again, so eventually doing the transformation may no longer be profitable(?).

Thanks and sorry again for the delay. Otherwise the patch looks good to me.

Richard.

Best regards,
Thomas

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Monday, May 05, 2014 7:30 PM
To: GCC Patches
Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store

I found a way to improve the function find_bswap/find_bswap_or_nop and reduce its size. Please hold for the review, I will post an updated version as soon as I finish testing.

Best regards,
Thomas Preud'homme
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Fri, May 16, 2014 at 12:56 PM, pins...@gmail.com wrote:

On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com wrote:

[quoted review snipped; see the previous message in this thread]

Of course on STRICT_ALIGNMENT targets an unaligned load will be decomposed again, so eventually doing the transformation may no longer be profitable(?).

Not always decomposed. On MIPS, it should be using the load/store left/right instructions for unaligned loads/stores, which is normally better than decomposed loads/stores. So having a cost model would be nice.

Agreed, but I am happy with doing that as a followup. Btw, a very simple one would be to reject unaligned loads when SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align) is true. [of course that may be true on MIPS even for the cases where a reasonably fast unaligned variant exists - nearly no target defines that macro in a too fancy way]

Richard.

Thanks,
Andrew

[earlier quoted messages snipped]
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Fri, May 16, 2014 at 1:03 PM, Richard Biener richard.guent...@gmail.com wrote:

[quoted review and MIPS cost-model discussion snipped; see the previous messages in this thread]

Oh, and what happens for

unsigned foo (unsigned char *x)
{
  return x[0] << 24 | x[2] << 8 | x[3];
}

? We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets).

Richard.

[earlier quoted messages snipped]
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On May 16, 2014, at 4:13 AM, Richard Biener richard.guent...@gmail.com wrote:

On Fri, May 16, 2014 at 1:03 PM, Richard Biener richard.guent...@gmail.com wrote:

[quoted review snipped; see the previous messages in this thread]

Oh, and what happens for

unsigned foo (unsigned char *x)
{
  return x[0] << 24 | x[2] << 8 | x[3];
}

? We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets).

No we cannot: if x[4] is on a different page which is not mapped in, we get a fault. Not something we want.

Thanks,
Andrew

[earlier quoted messages snipped]
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
Sorry, it took longer than expected as I got distracted by some other patch. I merged the whole patchset into a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs:

*** gcc/ChangeLog ***

2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

	PR tree-optimization/54733
	* expr.c (get_inner_reference): Add a parameter to control whether
	a MEM_REF should be split into base + offset.
	* tree.h (get_inner_reference): Default new parameter to false.
	* tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
	(CMPNOP): Define.
	(find_bswap_or_nop_load): New.
	(find_bswap_1): Renamed to ...
	(find_bswap_or_nop_1): This. Also add support for memory source.
	(find_bswap): Renamed to ...
	(find_bswap_or_nop): This. Also add support for memory source and
	detection of bitwise operations equivalent to load in host endianness.
	(execute_optimize_bswap): Likewise. Also move its leading comment back
	in place and split statement transformation into ...
	(bswap_replace): This. Add assert when updating bswap_stats.

*** gcc/testsuite/ChangeLog ***

2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

	PR tree-optimization/54733
	* gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
	optimization to support memory sources and bitwise operations
	equivalent to load in host endianness.
	* gcc.dg/optimize-bswaphi-1.c: Likewise.
	* gcc.dg/optimize-bswapsi-2.c: Likewise.
	* gcc.c-torture/execute/bswap-2.c: Likewise.

Ok for trunk?

Best regards,
Thomas

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Monday, May 05, 2014 7:30 PM
To: GCC Patches
Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store

I found a way to improve the function find_bswap/find_bswap_or_nop and reduce its size. Please hold for the review, I will post an updated version as soon as I finish testing.
Best regards,
Thomas Preud'homme

Attachment: gcc32rm-84.5.4.diff (binary data)