[Bug target/88998] bad codegen with mmx instructions for unordered_map
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88998

David Zmick changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dpzmick at gmail dot com

--- Comment #2 from David Zmick ---
I'd like to add that we are seeing an assertion fail because of the interaction between MMX and the "long double" used in unordered_map's _M_need_rehash. The long double forces the use of the x87 FPU, but the MMX instructions that were emitted wipe out the state of the FPU registers.

Adding an explicit _mm_empty (or disabling MMX) solves the problem because it eliminates the bad interaction with the long double in _M_need_rehash.

I'm not sure there is a good way for developers to know when they need to add (potentially expensive) calls to _mm_empty to vector code like this. I would be less surprised if the compiler cleaned up after itself when it uses MMX for code like this (leaving me with a performance problem to debug) than if I unexpectedly ended up with a polluted FPU.
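
For illustration, the workaround mentioned in Comment #2 might look roughly like the sketch below. It is not the reporter's code: the function name scale_low_lane is hypothetical, the MMX path is forced with explicit intrinsics rather than chosen by the compiler, and an x86 target with SSE2 is assumed. The assert relies on the assumption that a polluted x87 register stack typically shows up as a NaN result.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics; also provides the MMX ones

// Hypothetical helper: converts two ints to doubles through an MMX register,
// then does long double (x87) arithmetic on the low lane.
// Assumes `factor` is finite.
double scale_low_lane(int a, int b, long double factor) {
    __m64 pair = _mm_set_pi32(b, a);   // MMX value: low lane a, high lane b
    __m128d d = _mm_cvtpi32_pd(pair);  // cvtpi2pd: two ints in MMX -> two doubles in SSE
    _mm_empty();                       // emms: hand the register file back to the x87 FPU

    double lo = _mm_cvtsd_f64(d);      // extract the low double
    long double scaled = lo * factor;  // long double arithmetic runs on the x87 FPU
    assert(scaled == scaled);          // a NaN here would suggest polluted x87 state
    return static_cast<double>(scaled);
}
```

The _mm_empty call is placed directly after the last MMX instruction, before any code (such as a library call into unordered_map) that could touch the x87 FPU.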
[Bug target/88998] bad codegen with mmx instructions for unordered_map
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88998

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-23
     Ever confirmed|0                           |1

--- Comment #1 from Marc Glisse ---
Indeed. Constructing {a,b,0,0} is done by constructing {a,b}, {0,0}, and concatenating them. _mm_cvtepi32_pd starts with selecting the initial V2SI of a V4SI. Naturally, the compiler tries to combine them, and finds sse2_cvtpi2pd to convert directly from V2SI to V2DF. Without this simplification, everything would have used SSE.
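
The source pattern described in Comment #1 can be sketched as follows. This is an assumed reconstruction for illustration, not the testcase attached to the bug; the function name ints_to_doubles is hypothetical, and whether a given GCC version actually selects sse2_cvtpi2pd for it depends on the compiler version and flags.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: builds the vector {a, b, 0, 0} and converts its two
// low 32-bit lanes to doubles, written using SSE2 intrinsics only.
void ints_to_doubles(int a, int b, double* out) {
    assert(out != nullptr);
    __m128i v = _mm_set_epi32(0, 0, b, a);  // lanes from low to high: a, b, 0, 0
    __m128d d = _mm_cvtepi32_pd(v);         // converts only the low V2SI half to V2DF
    _mm_storeu_pd(out, d);                  // out[0] = a, out[1] = b
}
```

Nothing in this source mentions MMX. The MMX instruction only appears if the compiler combines the {a,b} construction with the low-half selection and matches the V2SI to V2DF conversion pattern.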