[Bug bootstrap/19607] Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR

2005-06-09 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-06-09 20:44 
---
So where exactly should we specify such a directory? I was not able to find any
other configuration variables that we can change and that would do the job...

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19607


[Bug bootstrap/19607] Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR

2005-06-09 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-06-10 03:15 
---
Through fstab, yes, but the problem is that it only works with specially compiled
binaries. Right off the tarball, gcc compiles to a native Win32 program and
does not honor MSYS's fstab. So no, for a Win32 program, there is no way to
mount things in a Unix-like manner.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19607


[Bug regression/19530] New: MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-19 Thread guardia at sympatico dot ca
When I compile this code:
#include <mmintrin.h>
__m64 moo(int i) {
   __m64 tmp = _mm_cvtsi32_si64(i);
   return tmp;
}
With (GCC) 4.0.0 20050116 like so:
   gcc -O3 -S -mmmx moo.c
I get this (without the function pop/push etc)
movd    12(%ebp), %mm0
movq    %mm0, (%eax)
However, if I use the -msse flag instead of -mmmx, I get this:
movd    12(%ebp), %mm0
movq    %mm0, -8(%ebp)
movlps  -8(%ebp), %xmm1
movlps  %xmm1, (%eax)

gcc 3.4.2 does not display this behavior. I haven't had the chance to test it on
my Linux installation yet, but I'm pretty sure it's going to give the same
results. I didn't use any special flags when configuring or building gcc (just
../gcc-4.0-20050116/configure --enable-languages=c,c++ and make bootstrap).
With the -O0 flag instead of -O3, it looks like gcc replaced some movq's
with movlps's (why??), and they do not get cancelled out during optimization.

I will attach the .i file generated by "gcc -O3 -S -msse moo.c".


I also tried a "direct conversion":
__m64 tmp = (__m64) (long long) i;
But I get a compiler error:
   internal compiler error: in convert_move, at expr.c:367

-- 
           Summary: MMX load intrinsic produces SSE superfluous instructions
                    (movlps)
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: regression
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: guardia at sympatico dot ca
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-mingw32
  GCC host triplet: i686-pc-mingw32
GCC target triplet: i686-pc-mingw32


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug regression/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-19 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-19 14:59 
---
Created an attachment (id=7991)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=7991&action=view)
gcc -O3 -S -msse moo.c  --save-temps


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug bootstrap/16024] gengtype crashes with mingw and c++ extension

2005-01-24 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-24 17:20 
---
I think the "relative path" issue with MSYS and MinGW should be added, for
example, to the notes at:

http://gcc.gnu.org/install/specific.html

It would save a lot of grief for people trying to build it on MSYS.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16024


[Bug bootstrap/19607] New: Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR

2005-01-24 Thread guardia at sympatico dot ca
On MSYS, the system headers are found in /mingw/include. I modified
gcc/config/i386/t-mingw32 by adding:

NATIVE_SYSTEM_HEADER_DIR = /mingw/include

and the build works here.

Also, the configure script needs to be started with a relative path (that is,
"srcdir" needs to be relative), or gengtype will fail. This problem is
documented in bug #16024 (gengtype doesn't crash here, but it still fails
because it cannot find a path).

-- 
           Summary: Build fails on MSYS/MinGW because of incorrect
                    SYSTEM_HEADER_DIR
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: bootstrap
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: guardia at sympatico dot ca
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-mingw32
  GCC host triplet: i686-pc-mingw32
GCC target triplet: i686-pc-mingw32


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19607


[Bug bootstrap/19607] Build fails on MSYS/MinGW because of incorrect SYSTEM_HEADER_DIR

2005-01-24 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-24 17:31 
---
Created an attachment (id=8054)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8054&action=view)
new t-mingw32


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19607


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-24 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-24 18:09 
---
MMX intrinsics don't seem to be a standard (?), but I'm under the impression
that _mm_cvtsi32_si64 is supposed to generate MMX code. I just tested with (GCC)
4.0.0 20050123: with the -mmmx flag the result is still the same, and with the
-msse flag I now get:

movss   12(%ebp), %xmm0
movlps  %xmm0, (%eax)

Which is correct, but what I'm trying to get is a MOVD so I don't have to fish
back into memory to use the integer I wanted to load in an mmx register.

Or is there another way to generate a MOVD?

Also, _mm_unpacklo_pi8 (check moo2.i) still generates superfluous movlps:
punpcklbw   %mm0, %mm0
movl    %esp, %ebp
subl    $8, %esp
movl    8(%ebp), %eax
movq    %mm0, -8(%ebp)
movlps  -8(%ebp), %xmm1
movlps  %xmm1, (%eax)

I guess any MMX intrinsic that makes use of the (__m64) cast conversion will
suffer from the same problem. I think the fix for all these problems would be
to prevent the register allocator from using SSE registers when compiling MMX
intrinsics?

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-26 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-26 23:51 
---
I'm wondering, would there be a #pragma directive that we could use to
surround the MMX intrinsics functions, and that would prevent the compiler from
using the XMM registers?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-26 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-26 23:59 
---
Even stranger, it doesn't do it with -march=athlon either... only with
-march=pentium, pentium2 or pentium3... ?

That seems like some weird bug here. There mustn't be THAT big a difference
between the code for pentium3 and the one for athlon, right?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-26 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-27 00:08 
---
Oh oh, I think I'm getting somewhere... if I use both -march=athlon and -msse
flags I get the "bad" code.  Let me summarize this :

-march=pentium3 = bad
-msse = bad
-march=athlon = good   (i.e., no weird movss or movlps, everything looks good)
however 
-march=athlon -msse = bad

hum...

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-26 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-27 02:30 
---
Ok ok, SSE is not enabled by default on Athlon... 

So, is there some sort of "pragma" that could be used to disable SSE registers
(force -mmmx sort of) for only part of some code? 

The way I see it, the problem seems to be that gcc views __m64 and __m128 as the
same kind of variable, when they are not. __m64 should always be in mmx
registers, and __m128 should always be in xmm registers. Actually, Intel created
a new type, __m128d, instead of trying to guess which of the integer or float
instructions one should use for stuff like MOVDQA.

We can easily see that gcc is trying to put an __m64 variable in xmm registers
in moo2.i. I can also prevent it from using an xmm register by using only
__v8qi variables (which are invalid, i.e. too small, for xmm registers):
__v8qi moo(__v8qi mmx1)
{
   mmx1 = __builtin_ia32_punpcklbw (mmx1, mmx1);
   return mmx1;
}
tadam! no movss or movlps...

Shouldn't gcc avoid placing __m64 variables in xmm registers? If one wants to
use an xmm register, one should use __m128 or __m128d (or at least a cast from a
__m64 pointer). Even on the Pentium 4, I think it makes sense, because moving
stuff from mmx registers to xmm registers is not so cheap either.

If one wants to move one 32 bit integer to an mmx register, that should be the
job of a specialized intrinsic (_mm_cvtsi32_si64) which maps to a MOVD
instruction. And if one wants to load a 64 bit something into an xmm register,
that should be the job of _mm_load_ss (and other such functions). At the moment,
these intrinsics (_mm_cvtsi32_si64, _mm_load_ss) do NOT generate a mov
instruction by themselves. They go through a process (from what I can
understand of i386.c) of "vector initialization" which starts generating mov
instructions from the MMX, SSE or SSE2 sets without discrimination. In my mind
_mm_cvtsi32_si64 should generate a MOVD, and _mm_load_ss a MOVSS, period. Just
like __builtin_ia32_punpcklbw generates a PUNPCKLBW.

Does it make sense? Is this what you mean by a complete rewrite or were you
thinking of something else?
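
To make the expectation above concrete, here is a minimal sketch (not taken
from the attached test cases) of the two MOVD-style intrinsics used back to
back. The function name mmx_roundtrip is made up for illustration; the
intrinsics themselves come from <mmintrin.h>. Whether each call really ends up
as a single MOVD depends on the compiler version and flags, which is the whole
point of this report.

```c
#include <mmintrin.h>

/* Hypothetical helper: move a 32-bit integer into an MMX value and read the
   low 32 bits back. The hope is that each intrinsic maps to one MOVD. */
static int mmx_roundtrip(int i)
{
    __m64 tmp = _mm_cvtsi32_si64(i);   /* low 32 bits = i, upper 32 bits = 0 */
    int low = _mm_cvtsi64_si32(tmp);   /* extract the low 32 bits again */
    _mm_empty();                       /* EMMS: clear MMX state before any x87 code */
    return low;
}
```

Compiling this with "gcc -O3 -S -mmmx" and again with "-msse" and comparing
the two .s files is a quick way to see whether any movss/movlps sneaks in.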

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-26 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-27 06:19 
---
Ok, so from what I gather, the backend is being designed for the autovectorizer,
which will probably only work right with SSE2 (on x86 that is), as mucking with
emms will probably bring too much trouble. Second, we can do any MMX operation
on XMM registers in SSE2. So the code for SSE2 does not need to be changed
optimization-wise for intrinsics.

As for a pragma or something, could we for example disable the automatic use of
such instructions as movss, movhps, movlps, and the like on SSE1 (if I may call
it that)? That would most certainly prevent gcc from trying to put __m64 in
xmm registers, however eager it might be to mov it there... (would it?) And we
could supply a few built-ins to allow manual use of those instructions. I guess
such a solution would be nice, although I realize it might not be too kosher ;)

I use MMX to load char * arrays into shorts and convert them into float in SSE
registers, to process them with float * arrays, so I can't separate the MMX code
from the SSE code... 
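
For readers who wonder what that kind of mixed code looks like, here is a
rough sketch of the pattern (not my actual code): four unsigned chars are
widened to 16-bit words with an MMX unpack and then converted to floats with
an SSE intrinsic. The name bytes4_to_floats is made up for this example; the
intrinsics are the standard ones from <mmintrin.h> and <xmmintrin.h>.

```c
#include <mmintrin.h>
#include <xmmintrin.h>
#include <string.h>

/* Hypothetical helper: convert four unsigned bytes to four floats,
   going through an MMX value on the way to an SSE register. */
static void bytes4_to_floats(const unsigned char *src, float *dst)
{
    int packed;
    __m64 bytes, words;
    __m128 f;

    memcpy(&packed, src, sizeof packed);           /* load 4 bytes safely */

    bytes = _mm_cvtsi32_si64(packed);              /* bytes in the low half of an __m64 */
    words = _mm_unpacklo_pi8(bytes, _mm_setzero_si64()); /* zero-extend to 4 x 16 bit */
    f = _mm_cvtpi16_ps(words);                     /* 4 x int16 -> 4 x float */

    _mm_empty();                                   /* EMMS before leaving MMX code */
    _mm_storeu_ps(dst, f);                         /* unaligned store of the 4 floats */
}
```

The __m64 value is needed right up to the conversion, so the MMX part and the
SSE part end up in the same function and under the same compiler flags.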

Of course, with the way things look at the moment, I might end up writing
everything in assembler by hand, but scheduling 200+ instructions (yup yup I
have some pretty funky code here) by hand is no fun at all, especially if (ugh
when) the algorithm changes. Also, the same code in C with intrinsics can target
x86-64 :) yah, that's cool

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-27 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-27 22:55 
---
I think I'm starting to see the problem here... I tried to understand more of
the code, and from this and what you tell me, gcc finds registers to use and then
finds instructions that fit the bill. So preventing gcc from using some
instructions will only end up in an "instruction not found" error. The register
allocator is the one that shouldn't allocate them in the first place, right?

Well, let's forget this for now... maybe we should look at the optimization
stages:
movq    %mm0, -8(%ebp)
movlps  -8(%ebp), %xmm1
movlps  %xmm1, (%eax)
<- If movlps merely moves 64 bit stuff around, why wasn't this optimized down to
one equivalent movq that also moves 64 bit stuff around? Would that be an
optimizer problem instead?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-28 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-29 04:47 
---
Hum, there apparently seems to be a problem with the optimization stages.. I
cooked up another snippet :

void moo(__m64 i, unsigned int *r)
{
   unsigned int tmp = __builtin_ia32_vec_ext_v2si (i, 0);
   *r = tmp;
}

With -O0 -mmmx we get:
movd    %mm0, -4(%ebp)
movl    8(%ebp), %edx
movl    -4(%ebp), %eax
movl    %eax, (%edx)
Which with -O3 gets reduced to:
movl    8(%ebp), %eax
movd    %mm0, (%eax)

Now, clearly it understands that "movd" is the same as "movl", except they work
on different registers on an MMX only machine. With "movlps" and "movq" it
should do the same I think? If the optimization stages can work this out, maybe
we wouldn't need to rewrite the MMX/SSE1 support...

(BTW, correction, when I said 200+ instructions to schedule, I meant per
function. I have a dozen such functions with 200+ instructions, and it ain't
going to get any smaller)

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-01-29 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-01-29 19:21 
---
Hum, ok we can do a "movd %mm0, %eax", that's why it gets combined... 

Well, I give up. The V8QI (and whatever) -> V2SI conversion seems to be causing
all the trouble here if we look at the RTL of something like:
__m64 moo(__v8qi mmx1)
{
   mmx1 = __builtin_ia32_punpcklbw (mmx1, mmx1);
   return mmx1;
}

It explicitly asks for a conversion to V2SI (__m64) that gets assigned to an xmm
register afterwards:
(insn 15 14 17 1 (set (reg:V8QI 58 [ D.2201 ])
        (reg:V8QI 62)) -1 (nil)
    (nil))

(insn 17 15 18 1 (set (reg:V2SI 63)
        (subreg:V2SI (reg:V8QI 58 [ D.2201 ]) 0)) -1 (nil)
    (nil))

(insn 18 17 19 1 (set (mem/i:V2SI (reg/f:SI 60 [ D.2206 ]) [0 +0 S8 A64])
        (reg:V2SI 63)) -1 (nil)
    (nil))

So... the only way to fix this would be either to make the register allocator
more intelligent (bug 19161), or to provide intrinsics, like the Intel compiler
does, that map one-to-one to instructions. Right? That wouldn't be
such a bad idea, I think: instead of using the current __builtins for the stuff in
*mmintrin.h, we could use a different set of builtins that only supports V2SI
and nothing else..? Well, that's going to be for another time ;)
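
In the meantime, one thing that can be tried by hand is an inline asm wrapper
that pins the operands to MMX registers. This is only a sketch of a possible
workaround, not something proposed in this bug: mmx_punpcklbw and
dup_low_bytes are made-up names, and the "y" operand constraint is GCC's x86
constraint for "any MMX register". It guarantees the PUNPCKLBW itself runs on
MMX registers; it says nothing about how gcc moves the result around
afterwards.

```c
#include <mmintrin.h>

/* Hypothetical wrapper: emits exactly one PUNPCKLBW, with both operands
   forced into MMX registers by the "y" constraint. */
static __m64 mmx_punpcklbw(__m64 a, __m64 b)
{
    __asm__ ("punpcklbw %1, %0" : "+y" (a) : "y" (b));
    return a;
}

/* Small demo: duplicate each of the low bytes of x and return the
   low 32 bits of the interleaved result. */
static int dup_low_bytes(int x)
{
    __m64 v = _mm_cvtsi32_si64(x);
    int low = _mm_cvtsi64_si32(mmx_punpcklbw(v, v));
    _mm_empty();                       /* EMMS before returning to normal code */
    return low;
}
```

The obvious downside is that the optimizer cannot see through the asm, so
this trades scheduling freedom for control over the register class.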

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-06-21 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-06-21 13:26 
---
Hum, it will be interesting to test this (it will have to wait a couple of
weeks), but the problem here is that the original SSE has no "mov" instruction
that can move data directly between MMX registers and SSE registers (MOVQ
can't do it). SSE2 added such instructions (MOVQ2DQ and MOVDQ2Q), but the
original SSE has none. So the compiler generates movlps instructions from/to
memory from/to SSE registers alongside the MMX calculations and, in the
original SSE case, ends up unable to reduce things any further than
MMx->memory->XMMx->memory->MMx for data that should have stayed in MMX
registers all along. It does not realize up front how expensive it is to use
XMM registers on "SSE1" together with MMX instructions.
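
For comparison, this is roughly what the register-to-register path looks like
when SSE2 is available. It is a sketch only: mmx_to_xmm_low32 is a made-up
name, and the two intrinsics from <emmintrin.h> are the ones documented as
corresponding to MOVQ2DQ and MOVDQ2Q. On a 32-bit target it needs -msse2; with
plain -msse these intrinsics are simply not available, which is the situation
described above.

```c
#include <mmintrin.h>
#include <emmintrin.h>

/* Hypothetical helper: send a value MMX -> XMM -> MMX without going
   through memory, then return its low 32 bits. Requires SSE2. */
static int mmx_to_xmm_low32(int i)
{
    __m64 m = _mm_cvtsi32_si64(i);        /* integer -> MMX value */
    __m128i x = _mm_movpi64_epi64(m);     /* MMX -> XMM (MOVQ2DQ) */
    __m64 back = _mm_movepi64_pi64(x);    /* XMM -> MMX (MOVDQ2Q) */
    int low = _mm_cvtsi64_si32(back);     /* low 32 bits back to an integer */
    _mm_empty();                          /* EMMS before any x87 code */
    return low;
}
```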

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-06-22 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-06-22 21:59 
---
Yup, I got excited and compiled the mainline today to check this out
(gcc-4.1-20050618), and it seems to be fixed! I don't see any strange movlps in
any of the code I tried to compile with it. This can be moved to FIXED (I'm not
sure I should be the one to switch it??)

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|SUSPENDED                   |WAITING


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530


[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

2005-06-22 Thread guardia at sympatico dot ca

--- Additional Comments From guardia at sympatico dot ca  2005-06-22 22:59 
---
Thanks to Uros and everybody!

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530