[Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671
[Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
--- Comment #6 from tim at klingt dot org  2008-12-31 09:20 ---
> sys_perf_counter_open always returns less than zero for me.
> This is with:
> Linux gcc13 2.6.18-6-vserver-amd64 #1 SMP Sun Feb 10 17:55:04 UTC 2008 x86_64
> GNU/Linux
>
> What system call is it trying to do and why?

It is trying to open the performance counters
(http://lwn.net/Articles/310176/). It requires a patched kernel, though.

(In reply to comment #3)
> t.cc: In function 'float __vector__ nova::detail::gen_one()':
> t.cc:34160: warning: 'x' is used uninitialized in this function
>
> inline __m128 gen_one(void)
> {
>     __m128i x;
>     __m128i ones = _mm_cmpeq_epi32(x, x);
>     return (__m128)_mm_slli_epi32 (_mm_srli_epi32(ones, 25), 23);
> }
>
> This is undefined code, I think.

This code is valid. The uninitialized xmm register x is compared with itself
in order to set every bit of the register ones to 1 (all ones).
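The constant that gen_one builds can be checked with plain scalar arithmetic. This is a minimal sketch, assuming each 32-bit lane behaves like a uint32_t (the SSE shifts used here are logical per-lane shifts) and that float is IEEE-754 single precision. The helper names lane_gen_one_bits and bits_to_float are hypothetical and only model one lane of the intrinsic sequence.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one 32-bit lane of gen_one(). */
uint32_t lane_gen_one_bits(void)
{
    uint32_t ones = 0xFFFFFFFFu;  /* _mm_cmpeq_epi32(x, x): an equal lane becomes all ones */
    uint32_t t = ones >> 25;      /* _mm_srli_epi32(ones, 25): 0x0000007F */
    return t << 23;               /* _mm_slli_epi32(t, 23):    0x3F800000 */
}

/* Models the (__m128) cast: reinterpret the 32 bits as a float. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);  /* bit copy, avoids strict-aliasing problems */
    return f;
}
```

For example, bits_to_float(lane_gen_one_bits()) evaluates to 1.0f: 0x7F is the biased exponent 127 and the mantissa bits are zero, so every lane of the result holds 1.0f regardless of what x contained.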
[Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits)
--- Comment #5 from pinskia at gcc dot gnu dot org  2008-12-31 08:12 ---
Confirmed, though I don't have a fully reduced testcase yet. Basically it
comes down to using unsigned int rather than size_t as the loop index. If you
had used size_t as the index, the loop setup would not have had the extra code.

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-12-31 08:12:50
               date|                            |
            Summary|[4.4 Regression] extra code |[4.4 Regression] extra code
                   |for setting up loops        |for setting up loops (IV-
                   |                            |opts and 32bits vs 64bits)
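To make the unsigned int versus size_t distinction concrete, here is a sketch of the two index choices. It assumes a loop shape that consumes n floats four at a time (n >> 2 iterations, 16 bytes per iteration, matching the "* 16" in comment #4). The functions sum4_uint and sum4_size are hypothetical; they are not the original t.cc testcase, and whether they reproduce the extra setup code depends on the GCC version and flags.

```c
#include <stddef.h>

/* Hypothetical loop shape: n floats consumed four at a time. */

/* 32-bit unsigned index: on a 64-bit target the trip count is computed in
   32 bits, so the compiler must respect 32-bit wraparound when it widens
   the count for 64-bit pointer arithmetic. */
float sum4_uint(const float *in, unsigned int n)
{
    float acc = 0.0f;
    for (unsigned int i = 0; i < (n >> 2); ++i)
        acc += in[4 * i] + in[4 * i + 1] + in[4 * i + 2] + in[4 * i + 3];
    return acc;
}

/* size_t index, as suggested in comment #5: the trip count already has
   pointer width, so no widening of a 32-bit value is needed. */
float sum4_size(const float *in, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < (n >> 2); ++i)
        acc += in[4 * i] + in[4 * i + 1] + in[4 * i + 2] + in[4 * i + 3];
    return acc;
}
```

Both functions compute the same result; the difference the comment refers to is only in the code GCC generates to set up the loop, which can be compared with gcc -O2 -S on an x86_64 target.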
[Bug middle-end/38671] [4.4 Regression] extra code for setting up loops
--- Comment #4 from pinskia at gcc dot gnu dot org  2008-12-31 08:10 ---
  D.45587 = VIEW_CONVERT_EXPR<__v4si>(x);
  D.45589 = __builtin_ia32_pcmpeqd128 (D.45587, D.45587);
  D.45591 = __builtin_ia32_psrldi128 (D.45589, 25);
  D.45594 = __builtin_ia32_pslldi128 (D.45591, 23);
  one = VIEW_CONVERT_EXPR<__m128>(VIEW_CONVERT_EXPR<__m128i>(D.45594));
  D.45644 = (long unsigned int) ((n >> 2) + 4294967295) + 1 * 16;
  ivtmp.516 = 0;

So the inner loop is not the issue, only the setup code. The extra
subtract/add comes from D.45644 (adding 4294967295 is a subtraction of 1 in
32-bit unsigned arithmetic).

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |middle-end
            Summary|[4.4 Regression] speed      |[4.4 Regression] extra code
                   |regression with sse         |for setting up loops
                   |intrinsics                  |
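One way to see why the subtract/add in D.45644 cannot simply be folded away is to evaluate both forms for small n. This sketch assumes n is a 32-bit unsigned int, that the flattened dump line reads as ((long unsigned int) ((n >> 2) + 4294967295) + 1) * 16 (the parenthesization is an assumption), and uses uint64_t in place of long unsigned int. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of D.45644: widen (m - 1) computed in 32 bits, add 1, scale. */
uint64_t offset_as_dumped(uint32_t n)
{
    uint32_t m = n >> 2;                  /* number of 4-float groups */
    uint32_t m_minus_1 = m + 4294967295u; /* m - 1, wrapping modulo 2^32 */
    return ((uint64_t)m_minus_1 + 1) * 16;
}

/* The simpler form one might hope for: widen m directly, then scale. */
uint64_t offset_simplified(uint32_t n)
{
    uint32_t m = n >> 2;
    return (uint64_t)m * 16;
}
```

For every m > 0 the two agree (n = 8 gives 32 in both), but for m == 0 the 32-bit subtraction wraps to 4294967295, so offset_as_dumped returns 68719476736 while offset_simplified returns 0. Unless the compiler can prove m != 0 at that point, the fold is not valid. If the whole computation were done in 64 bits, as with a size_t index, (m - 1) + 1 equals m modulo 2^64 for every m, including 0, so the fold would be valid.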