https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65369

Thomas Preud'homme <thopre01 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
   Last reconfirmed|2015-03-10 00:00:00         |2015-03-12 0:00
           Assignee|thopre01 at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #27 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #26)
> So, on my version of the testcase with r210843 -O3 -mcpu=power8 there are
> like 49
> 32 bit load in host endianness found at: _105 = MEM[(const unsigned char
> *)load_src_25];
> occurrences, so I've added a quick hack (should have used dbg counters
> parhaps), and
> with BSWAPCNT=16 it works fine, with BSWAPCNT=17 it fails.
> In the *.optimized dump, I've noticed that this single load matters for
> vectorization in md4_update function, with BSWAPCNT=16 a chunk of code isn't
> vectorized, with BSWAPCNT=17 it is.
>
> So very well this might just trigger a latent bug in the vectorizer or
> powerpc backend.

Using trunk I get the following difference for bswap:

@@ -1110,10 +1111,10 @@ nettle_md4_update (struct md4_ctx * ctx,
   _100 = MEM[(const uint8_t *)data_149 + 1B];
   _101 = (unsigned int) _100;
   _102 = _101 << 8;
+  _106 = MEM[(const uint8_t *)data_149];
   _104 = *data_149;
   _105 = (unsigned int) _104;
   _123 = _99 | _105;
-  _106 = _102 | _123;

Which looks perfectly fine. So yeah, I guess the problem is at a different
level.
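
For readers not familiar with what the bswap pass does here, the following is a minimal C sketch of the general pattern, not the nettle source and not GCC's implementation. It shows a 32-bit value assembled from single-byte loads with shifts and ORs (the shape visible in the dump above) next to the equivalent single 32-bit load, with a byte swap where the host byte order differs. All function names (load_le32_bytes, load_host32, bswap32, host_is_little_endian, load_le32_word) are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise little-endian load: four single-byte loads combined with
   shifts and ORs.  This is the kind of sequence the pass recognizes. */
static uint32_t load_le32_bytes(const uint8_t *data)
{
    return (uint32_t)data[0]
         | ((uint32_t)data[1] << 8)
         | ((uint32_t)data[2] << 16)
         | ((uint32_t)data[3] << 24);
}

/* One 32-bit load in host endianness.  memcpy avoids alignment and
   aliasing problems that a pointer cast would introduce. */
static uint32_t load_host32(const uint8_t *data)
{
    uint32_t v;
    memcpy(&v, data, sizeof v);
    return v;
}

/* Portable 32-bit byte swap (hypothetical helper for this sketch). */
static uint32_t bswap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24)
         | ((v & 0x0000ff00u) << 8)
         | ((v >> 8) & 0x0000ff00u)
         | (v >> 24);
}

/* Runtime probe of the host byte order. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* What the replacement amounts to: a single wide load, followed by a
   byte swap only when the host byte order is not little-endian. */
static uint32_t load_le32_word(const uint8_t *data)
{
    uint32_t v = load_host32(data);
    return host_is_little_endian() ? v : bswap32(v);
}
```

Both load_le32_bytes and load_le32_word should return the same value for any 4-byte buffer on any host, which is why a replacement of this kind looks fine in isolation and why the failure is more likely to come from a later pass.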