http://llvm.org/bugs/show_bug.cgi?id=2647
Summary: extractps selected too eagerly
Product: new-bugs
Version: unspecified
Platform: PC
OS/Version: Windows NT
Status: NEW
Severity: enhancement
Priority: P2
Component: new bugs
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]
CC: [email protected]
The following LLVM IR compiles to suboptimal code when targeting x86 CPUs with
SSE4.1 support, but compiles to optimal code for older CPUs:
external global float, align 16 ; <float*>:0 [#uses=2]
define internal void @""() {
load float* @0, align 16 ; <float>:1 [#uses=1]
insertelement <4 x float> undef, float %1, i32 0 ; <<4 x float>>:2 [#uses=1]
call <4 x float> @llvm.x86.sse.rsqrt.ss( <4 x float> %2 ) ; <<4 x float>>:3 [#uses=1]
extractelement <4 x float> %3, i32 0 ; <float>:4 [#uses=1]
store float %4, float* @0, align 16
ret void
}
declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone
Here's the result on a Penryn CPU:
push ebp
mov ebp,esp
and esp,0FFFFFFF0h
rsqrtss xmm0,dword ptr ds:[1762ED0h]
extractps eax,xmm0,0
movd xmm0,eax
movss dword ptr ds:[1762ED0h],xmm0
mov esp,ebp
pop ebp
ret
And this is the lovable code I get on Conroe:
rsqrtss xmm0,dword ptr ds:[1762ED0h]
movss dword ptr ds:[1762ED0h],xmm0
ret
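Ignoring the stack setup for now, it looks like extractps is selected too
eagerly for an extractelement v4f32, 0. In the Penryn output the result is
copied from xmm0 to eax and straight back into xmm0 before the store, whereas
the Conroe output stores the low element of xmm0 directly with a single movss.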
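
For anyone who would rather reproduce the pattern from C than from hand-written
IR, here is a minimal sketch using the standard SSE intrinsics from
<xmmintrin.h>. The function name rsqrt_in_place is hypothetical and not part of
the report. Note also that _mm_set_ss zeroes the upper lanes, whereas the IR
above inserts into undef, so a front end may not emit exactly the same IR.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set_ss, _mm_rsqrt_ss, _mm_cvtss_f32 */

/* Hypothetical helper (not from the report): replaces *p with an
   approximation of 1/sqrt(*p), mirroring the load, insertelement,
   rsqrt.ss, extractelement, store sequence in the IR. */
void rsqrt_in_place(float *p)
{
    __m128 v = _mm_set_ss(*p);    /* lane 0 = *p (upper lanes are zeroed)  */
    __m128 r = _mm_rsqrt_ss(v);   /* approximate reciprocal sqrt of lane 0 */
    *p = _mm_cvtss_f32(r);        /* read lane 0 back out and store it     */
}
```

Compiling this with and without SSE4.1 enabled and comparing the generated
assembly should show whether the extra extractps/movd pair appears; the result
itself is only an approximation, since rsqrtss has limited precision.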
P.S.: To quickly test with and without SSE4 support, force X86SSELevel to
the desired value in X86Subtarget::AutoDetectSubtargetFeatures().