https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65369

Thomas Preud'homme <thopre01 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
   Last reconfirmed|2015-03-10 00:00:00         |2015-03-12 0:00
           Assignee|thopre01 at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #27 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #26)
> So, on my version of the testcase with r210843 -O3 -mcpu=power8 there are
> like 49
> 32 bit load in host endianness found at: _105 = MEM[(const unsigned char *)load_src_25];
> occurrences, so I've added a quick hack (should have used dbg counters
> perhaps), and with BSWAPCNT=16 it works fine, with BSWAPCNT=17 it fails.
> In the *.optimized dump, I've noticed that this single load matters for
> vectorization in md4_update function, with BSWAPCNT=16 a chunk of code isn't
> vectorized, with BSWAPCNT=17 it is.
>
> So very well this might just trigger a latent bug in the vectorizer or
> powerpc backend.


Using trunk, I get the following difference in the dump for bswap:

@@ -1110,10 +1111,10 @@ nettle_md4_update (struct md4_ctx * ctx,
   _100 = MEM[(const uint8_t *)data_149 + 1B];
   _101 = (unsigned int) _100;
   _102 = _101 << 8;
+  _106 = MEM[(const uint8_t *)data_149];
   _104 = *data_149;
   _105 = (unsigned int) _104;
   _123 = _99 | _105;
-  _106 = _102 | _123;

Which looks perfectly fine. So yeah, I guess the problem is at a different
level.
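
For readers following along, here is a rough C sketch of the kind of source
pattern involved. It is not taken from the nettle testcase: the function
names read_le32_bytes, read_host32 and host_is_little_endian are made up for
illustration. The first function assembles a 32-bit value byte by byte, which
is the shape of the shift-and-or chain in the dump above; the second is the
single host-endian load such a chain can be replaced with, assuming a
little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte-wise little-endian read, i.e. a chain of
   zero-extended byte loads combined with shifts and ORs.  */
uint32_t read_le32_bytes(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: one 32-bit load in host endianness.  memcpy is
   used so the access is valid for any alignment of p.  */
uint32_t read_host32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: returns 1 when the lowest-addressed byte of a
   multi-byte integer holds its least significant bits.  */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host both readers return the same value for the same four
bytes, which is why replacing the chain with a single load looks harmless on
its own.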
