[Bug target/88998] bad codegen with mmx instructions for unordered_map

2019-01-22 Thread dpzmick at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88998

David Zmick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dpzmick at gmail dot com

--- Comment #2 from David Zmick ---
I'd like to add that we are seeing an assertion fail because of the interaction
between MMX and the "long double" used in unordered_map's _M_need_rehash.

long double forces the use of the x87 FPU, but the MMX instructions emitted
wipe out the state of the FPU registers. Adding an explicit _mm_empty (or
disabling MMX) solves the problem because it eliminates the bad interaction
with the long double in _M_need_rehash.
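
For illustration, here is a minimal sketch of the workaround described above. It
is not the reproducer from this bug: the helper names sum_as_doubles and
map_load_factor_ok are hypothetical, and the MMX intrinsics are written out by
hand so the transition is visible. On targets without SSE2 the sketch falls
back to scalar code.

```cpp
#include <cassert>
#include <unordered_map>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics; also provides the MMX ones
#endif

// Hypothetical helper: converts two ints to doubles and returns their sum.
double sum_as_doubles(int a, int b) {
#if defined(__SSE2__)
    __m64 packed = _mm_set_pi32(b, a);    // MMX value holding {a, b}
    __m128d ds = _mm_cvtpi32_pd(packed);  // cvtpi2pd: two ints -> two doubles
    _mm_empty();                          // emms: release the x87 registers
    double out[2];
    _mm_storeu_pd(out, ds);
    return out[0] + out[1];
#else
    // Assumption: no SSE2 available, so use plain scalar conversion.
    return static_cast<double>(a) + static_cast<double>(b);
#endif
}

// Hypothetical check: inserts trigger the rehash policy, which (per the
// comment above) does long double arithmetic in _M_need_rehash.
bool map_load_factor_ok() {
    std::unordered_map<int, int> m;
    for (int i = 0; i < 100; ++i) {
        m.emplace(i, i);
    }
    return m.load_factor() <= m.max_load_factor();
}
```

Calling sum_as_doubles(1, 2) and then map_load_factor_ok() mirrors the order of
events in the report: MMX use first, x87 use second. Whether omitting
_mm_empty actually corrupts the result depends on the target and on how the
compiler implements the MMX intrinsics.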

I'm not sure there's a good way for developers to know when they need to add
(potentially expensive) calls to _mm_empty to vector code like this. I would
be less surprised if the compiler cleaned up after itself when it uses MMX for
code like this (leaving me with a performance problem to debug) than if I
unexpectedly ended up with a polluted FPU.

[Bug target/88998] bad codegen with mmx instructions for unordered_map

2019-01-22 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88998

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-23
     Ever confirmed|0                           |1

--- Comment #1 from Marc Glisse ---
Indeed. Constructing {a,b,0,0} is done by constructing {a,b} and {0,0} and
concatenating them. _mm_cvtepi32_pd starts by selecting the low V2SI half of a
V4SI. Naturally, the compiler tries to combine the two, and finds sse2_cvtpi2pd
to convert directly from V2SI to V2DF. Without this simplification, everything
would have used SSE.
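
As a rough sketch of the source pattern being described (not the exact test
case attached to the bug; the helper name convert_low_pair is hypothetical),
the code below uses only SSE2 intrinsics at the source level. Whether a given
GCC version and option set combines the two steps into cvtpi2pd is an
assumption to verify by inspecting the assembly, for example with g++ -O2 -S.

```cpp
#include <cassert>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: builds {a, b, 0, 0} and converts the low two ints.
double convert_low_pair(int a, int b) {
#if defined(__SSE2__)
    __m128i is = _mm_setr_epi32(a, b, 0, 0);  // V4SI: {a, b} joined with {0, 0}
    __m128d ds = _mm_cvtepi32_pd(is);         // low V2SI of the V4SI -> V2DF
    double out[2];
    _mm_storeu_pd(out, ds);
    return out[0] + out[1];
#else
    // Assumption: no SSE2 available, so use plain scalar conversion.
    return static_cast<double>(a) + static_cast<double>(b);
#endif
}
```

Nothing here mentions MMX, which is why an MMX instruction showing up in the
generated code is surprising to the caller.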