[Bug bootstrap/19607] Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR
--- Additional Comments From guardia at sympatico dot ca 2005-06-09 20:44 ---
So where exactly should we specify such a directory? I was not able to find any other configuration variables that we can change and that would do the job...

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19607
[Bug bootstrap/19607] Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR
--- Additional Comments From guardia at sympatico dot ca 2005-06-10 03:15 ---
Through fstab, yes, but the problem is that it only works with specially compiled binaries. Right off the tarball, gcc compiles to a native Win32 program and does not honor MSYS's fstab... so no, for a Win32 program there is no way to mount stuff in a Unixy manner...

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19607
[Bug regression/19530] New: MMX load intrinsic produces SSE superfluous instructions (movlps)
When I compile this code:

    #include <mmintrin.h>

    __m64 moo(int i) {
        __m64 tmp = _mm_cvtsi32_si64(i);
        return tmp;
    }

with (GCC) 4.0.0 20050116 like so:

    gcc -O3 -S -mmmx moo.c

I get this (without the function push/pop etc.):

    movd 12(%ebp), %mm0
    movq %mm0, (%eax)

However, if I use the -msse flag instead of -mmmx, I get this:

    movd 12(%ebp), %mm0
    movq %mm0, -8(%ebp)
    movlps -8(%ebp), %xmm1
    movlps %xmm1, (%eax)

gcc 3.4.2 does not display this behavior. I didn't get the chance to test it on my Linux installation yet, but I'm pretty sure it's going to give the same results. I didn't use any special flags configuring or building gcc (just ../gcc-4.0-20050116/configure --enable-languages=c,c++ and make bootstrap).

With the -O0 flag instead of -O3, we see that gcc seems to have replaced some movq's by movlps's (why??), and they do not get cancelled out during optimization. I will attach the .i file generated by "gcc -O3 -S -msse moo.c".

I also tried a "direct conversion":

    __m64 tmp = (__m64) (long long) i;

but I get a compiler error:

    internal compiler error: in convert_move, at expr.c:367

--
Summary: MMX load intrinsic produces SSE superfluous instructions (movlps)
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: guardia at sympatico dot ca
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-mingw32
GCC host triplet: i686-pc-mingw32
GCC target triplet: i686-pc-mingw32

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug regression/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-19 14:59 ---
Created an attachment (id=7991)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7991&action=view)
gcc -O3 -S -msse moo.c --save-temps

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug bootstrap/16024] gengtype crashes with mingw and c++ extension
--- Additional Comments From guardia at sympatico dot ca 2005-01-24 17:20 ---
I think the "relative path" issue with MSYS and MinGW should be added, for example, to the notes at:

    http://gcc.gnu.org/install/specific.html

It would save people a lot of grief when trying to build on MSYS.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16024
[Bug bootstrap/19607] New: Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR
On MSYS, the system headers are found in /mingw/include... I made a modified gcc/config/i386/t-mingw32 adding:

    NATIVE_SYSTEM_HEADER_DIR = /mingw/include

and it works here. Also, the configure script needs to be started with a relative path ("srcdir" needs to be relative anyway) or gengtype will fail. This problem is documented in bug #16024 (gengtype doesn't crash here, but it still fails by not finding a path).

--
Summary: Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: guardia at sympatico dot ca
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: i686-pc-mingw32
GCC host triplet: i686-pc-mingw32
GCC target triplet: i686-pc-mingw32

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19607
[Bug bootstrap/19607] Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR
--- Additional Comments From guardia at sympatico dot ca 2005-01-24 17:31 ---
Created an attachment (id=8054)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8054&action=view)
new t-mingw32

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19607
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-24 18:09 ---
MMX intrinsics don't seem to be a standard (?), but I'm under the impression that _mm_cvtsi32_si64 is supposed to generate MMX code. I just tested with (GCC) 4.0.0 20050123: with the -mmmx flag the result is still the same, and with the -msse flag I now get:

    movss 12(%ebp), %xmm0
    movlps %xmm0, (%eax)

which is correct, but what I'm trying to get is a MOVD, so I don't have to fish back into memory to use the integer I wanted to load into an MMX register. Or is there another way to generate a MOVD?

Also, _mm_unpacklo_pi8 (check moo2.i) still generates superfluous movlps:

    punpcklbw %mm0, %mm0
    movl %esp, %ebp
    subl $8, %esp
    movl 8(%ebp), %eax
    movq %mm0, -8(%ebp)
    movlps -8(%ebp), %xmm1
    movlps %xmm1, (%eax)

I guess any MMX intrinsic that makes use of the (__m64) cast conversion will suffer from the same problem. I think the fix to all these problems would be to prevent the register allocator from using SSE registers when compiling MMX intrinsics... ?

--
What       | Removed  | Added
Status     | RESOLVED | REOPENED
Resolution | FIXED    |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-26 23:51 ---
I'm wondering, is there a #pragma directive that we could use to surround the MMX intrinsics function, and that would prevent the compiler from using the XMM registers??

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-26 23:59 ---
Even stranger, it doesn't do it with -march=athlon either... only -march=pentium, pentium2 or pentium3... ? That seems like some weird bug here. There can't be that big a difference between the code for pentium3 and the one for athlon, right?

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-27 00:08 ---
Oh oh, I think I'm getting somewhere... if I use both the -march=athlon and -msse flags, I get the "bad" code. Let me summarize this:

    -march=pentium3     = bad
    -msse               = bad
    -march=athlon       = good (ie.: no weird movss or movlps, everything looks good)
    -march=athlon -msse = bad

hum...

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-27 02:30 ---
Ok ok, SSE is not enabled by default on Athlon... So, is there some sort of "pragma" that could be used to disable SSE registers (force -mmmx, sort of) for only part of some code?

The way I see it, the problem seems to be that gcc views __m64 and __m128 as the same kind of variable, when they are not: __m64 should always be in MMX registers, and __m128 should always be in XMM registers. Actually, Intel created a new type, __m128d, instead of trying to guess which of integer or float instructions one should use for stuff like MOVDQA. We can easily see that gcc is trying to put an __m64 variable in XMM registers in moo2.i. I can also prevent it from using an XMM register by using only __v8qi variables (which are invalid, i.e. too small, in XMM registers):

    __v8qi moo(__v8qi mmx1) {
        mmx1 = __builtin_ia32_punpcklbw (mmx1, mmx1);
        return mmx1;
    }

tadam! no movss or movlps...

Shouldn't gcc avoid trying to place __m64 variables in XMM registers? If one wants to use an XMM register, one should use __m128 or __m128d (or at least a cast from a __m64 pointer); even on the Pentium 4, I think it makes sense, because moving stuff from MMX registers to XMM registers is not so cheap either. If one wants to move a 32-bit integer into an MMX register, that should be the job of a specialized intrinsic (_mm_cvtsi32_si64) which maps to a MOVD instruction. And if one wants to load a 64-bit something into an XMM register, that should be the job of _mm_load_ss (and other such functions). At the moment, these intrinsics (_mm_cvtsi32_si64, _mm_load_ss) do NOT generate a mov instruction by themselves; they go through a process (from what I can understand of i386.c) of "vector initialization" which starts generating mov instructions from the MMX, SSE, or SSE2 sets without discrimination... In my mind, _mm_cvtsi32_si64 should generate a MOVD, and _mm_load_ss a MOVSS, period. Just like __builtin_ia32_punpcklbw generates a PUNPCKLBW.

Does it make sense? Is this what you mean by a complete rewrite, or were you thinking of something else?

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-27 06:19 ---
Ok, so from what I gather, the backend is being designed for the autovectorizer, which will probably only work right with SSE2 (on x86, that is), as mucking with emms would probably bring too much trouble. Second, in SSE2 we can do any MMX operation on XMM registers, so the SSE2 code does not need to be changed, optimization-wise, for intrinsics.

As for a pragma or something: could we, for example, disable the automatic use of instructions such as movss, movhps, movlps, and the like on SSE1 (if I may call it that)? That would most certainly prevent gcc from trying to put __m64 in XMM registers, however eager it might be to mov it there... (would it?) And supply a few built-ins to implement manual use of those instructions. I guess such a solution would be nice, although I realize it might not be too kosher ;)

I use MMX to load char * arrays into shorts and convert them into floats in SSE registers, to process them with float * arrays, so I can't separate the MMX code from the SSE code... Of course, with the way things look at the moment, I might end up writing everything in assembler by hand, but scheduling 200+ instructions (yup yup, I have some pretty funky code here) by hand is no fun at all, especially if (ugh, when) the algorithm changes. Also, the same code in C with intrinsics can target x86-64 :) yah, that's cool

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-27 22:55 ---
I think I'm starting to see the problem here... I tried to understand more of the code, and from this and what you tell me, gcc finds registers to use and then finds instructions that fit the bill. So preventing gcc from using some instructions will only end up in an "instruction not found" error. The register allocator is the one that shouldn't allocate them in the first place, right?

Well, let's forget this for now... maybe we should look at the optimization stages:

    movq %mm0, -8(%ebp)
    movlps -8(%ebp), %xmm1
    movlps %xmm1, (%eax)

If movlps merely moves 64-bit stuff around, why wasn't this optimized down to one equivalent movq, which also moves 64-bit stuff around? Would that be an optimizer problem instead?

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-29 04:47 ---
Hum, there does seem to be a problem with the optimization stages. I cooked up another snippet:

    void moo(__m64 i, unsigned int *r) {
        unsigned int tmp = __builtin_ia32_vec_ext_v2si (i, 0);
        *r = tmp;
    }

With -O0 -mmmx we get:

    movd %mm0, -4(%ebp)
    movl 8(%ebp), %edx
    movl -4(%ebp), %eax
    movl %eax, (%edx)

which with -O3 gets reduced to:

    movl 8(%ebp), %eax
    movd %mm0, (%eax)

Now, clearly it understands that "movd" is the same as "movl", except that they work on different registers, on an MMX-only machine. It should do the same with "movlps" and "movq", I think? If the optimization stages can work this out, maybe we wouldn't need to rewrite the MMX/SSE1 support...

(BTW, a correction: when I said 200+ instructions to schedule, I meant per function. I have a dozen such functions with 200+ instructions, and it isn't going to get any smaller.)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-01-29 19:21 ---
Hum, ok, we can do a "movd %mm0, %eax"; that's why it gets combined... Well, I give up.

The V8QI (and whatever) -> V2SI conversion seems to be causing all the trouble here, if we look at the RTL of something like:

    __m64 moo(__v8qi mmx1) {
        mmx1 = __builtin_ia32_punpcklbw (mmx1, mmx1);
        return mmx1;
    }

It explicitly asks for a conversion to V2SI (__m64) that gets assigned to an XMM register afterwards:

    (insn 15 14 17 1 (set (reg:V8QI 58 [ D.2201 ])
            (reg:V8QI 62)) -1 (nil)
        (nil))

    (insn 17 15 18 1 (set (reg:V2SI 63)
            (subreg:V2SI (reg:V8QI 58 [ D.2201 ]) 0)) -1 (nil)
        (nil))

    (insn 18 17 19 1 (set (mem/i:V2SI (reg/f:SI 60 [ D.2206 ]) [0 +0 S8 A64])
            (reg:V2SI 63)) -1 (nil)
        (nil))

So... the only way to fix this would be either to make the register allocator more intelligent (bug 19161), or to provide intrinsics, like the Intel compiler does, with a one-to-one mapping to instructions. Right? That wouldn't be such a bad idea, I think... instead of using the current __builtins for the stuff in *mmintrin.h, we could use a different set of builtins that only supports V2SI and nothing else..? Well, that's going to be for another time ;)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-06-21 13:26 ---
Hum, it will be interesting to test this (it will have to wait a couple of weeks), but the problem here is that there is no "mov" instruction that can move stuff between MMX registers and SSE registers (MOVQ can't do it). In SSE2 there is one (MOVQ), but not in the original SSE. So the compiler generates movlps instructions from/to memory from/to SSE registers alongside MMX calculations and, in the original SSE case, ends up unable to reduce anything more than MMx->memory->XMMx->memory->MMx again for data that should have stayed in MMX registers all along... it does not realize up front how expensive it is to use XMM registers on "SSE1" along with MMX instructions.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-06-22 21:59 ---
Yup, excited, today I compiled the main branch to check this out (gcc-4.1-20050618), and it seems to be fixed! I don't see any strange movlps in any of the code I tried to compile with it. This can be moved to FIXED (I'm not sure I should be the one to switch it??)

--
What   | Removed   | Added
Status | SUSPENDED | WAITING

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
--- Additional Comments From guardia at sympatico dot ca 2005-06-22 22:59 ---
Thanks to Uros and everybody!

--
What   | Removed  | Added
Status | RESOLVED | VERIFIED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530