[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.168 -> 1.169
---
Log message:

add a note

---
Diffs of the changes:  (+9 -0)

 README.txt |    9 +
 1 files changed, 9 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.168 llvm/lib/Target/X86/README.txt:1.169
--- llvm/lib/Target/X86/README.txt:1.168	Wed May  9 19:08:04 2007
+++ llvm/lib/Target/X86/README.txt	Fri May 18 15:18:14 2007
@@ -26,6 +26,15 @@
 
 ... which should only be one imul instruction.
 
+or:
+
+unsigned long long int t2(unsigned int a, unsigned int b) {
+  return (unsigned long long)a * b;
+}
+
+... which should be one mul instruction.
+
+
 This can be done with a custom expander, but it would be nice to move this to
 generic code.

_______________________________________________
llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.167 -> 1.168
---
Log message:

add some notes

---
Diffs of the changes:  (+28 -0)

 README.txt |   28
 1 files changed, 28 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.167 llvm/lib/Target/X86/README.txt:1.168
--- llvm/lib/Target/X86/README.txt:1.167	Sat May  5 17:10:24 2007
+++ llvm/lib/Target/X86/README.txt	Wed May  9 19:08:04 2007
@@ -1094,5 +1094,33 @@
 has this xform, but it is currently disabled until the alignment fields of
 the load/store nodes are trustworthy.
 
+//===-===//
+Sometimes it is better to codegen subtractions from a constant (e.g. 7-x) with
+a neg instead of a sub instruction.  Consider:
+
+int test(char X) { return 7-X; }
+
+we currently produce:
+_test:
+        movl $7, %eax
+        movsbl 4(%esp), %ecx
+        subl %ecx, %eax
+        ret
+
+We would use one fewer register if codegen'd as:
+
+        movsbl 4(%esp), %eax
+        neg %eax
+        add $7, %eax
+        ret
+
+Note that this isn't beneficial if the load can be folded into the sub.  In
+this case, we want a sub:
+
+int test(int X) { return 7-X; }
+_test:
+        movl $7, %eax
+        subl 4(%esp), %eax
+        ret
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.166 -> 1.167
---
Log message:

move CodeGen/X86/overlap-add.ll here.

---
Diffs of the changes:  (+27 -0)

 README.txt |   27 +++
 1 files changed, 27 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.166 llvm/lib/Target/X86/README.txt:1.167
--- llvm/lib/Target/X86/README.txt:1.166	Mon Apr 16 19:02:37 2007
+++ llvm/lib/Target/X86/README.txt	Sat May  5 17:10:24 2007
@@ -1004,6 +1004,33 @@
         movl %edi, %eax
         ret
 
+Another example is:
+
+;; X's live range extends beyond the shift, so the register allocator
+;; cannot coalesce it with Y.  Because of this, a copy needs to be
+;; emitted before the shift to save the register value before it is
+;; clobbered.  However, this copy is not needed if the register
+;; allocator turns the shift into an LEA.  This also occurs for ADD.
+
+; Check that the shift gets turned into an LEA.
+; RUN: llvm-upgrade < %s | llvm-as | llc -march=x86 -x86-asm-syntax=intel | \
+; RUN:   not grep {mov E.X, E.X}
+
+%G = external global int
+
+int %test1(int %X, int %Y) {
+        %Z = add int %X, %Y
+        volatile store int %Y, int* %G
+        volatile store int %Z, int* %G
+        ret int %X
+}
+
+int %test2(int %X) {
+        %Z = add int %X, 1  ;; inc
+        volatile store int %Z, int* %G
+        ret int %X
+}
+
 //===-===//
 
 We use push/pop of stack space around calls in situations where we don't have
 to.
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.165 -> 1.166
---
Log message:

SSE4 is apparently public now.

---
Diffs of the changes:  (+2 -0)

 README.txt |    2 ++
 1 files changed, 2 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.165 llvm/lib/Target/X86/README.txt:1.166
--- llvm/lib/Target/X86/README.txt:1.165	Sat Apr 14 18:06:09 2007
+++ llvm/lib/Target/X86/README.txt	Mon Apr 16 19:02:37 2007
@@ -3,6 +3,8 @@
 //===-===//
 
 Missing features:
+  - Support for SSE4: http://www.intel.com/software/penryn
+    http://softwarecommunity.intel.com/isn/Downloads/Intel%20SSE4%20Programming%20Reference.pdf
   - support for 3DNow!
   - weird abis?
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.164 -> 1.165
---
Log message:

add a note

---
Diffs of the changes:  (+18 -0)

 README.txt |   18 ++
 1 files changed, 18 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.164 llvm/lib/Target/X86/README.txt:1.165
--- llvm/lib/Target/X86/README.txt:1.164	Wed Apr 11 00:34:00 2007
+++ llvm/lib/Target/X86/README.txt	Sat Apr 14 18:06:09 2007
@@ -1049,3 +1049,21 @@
 
 //===-===//
 
+This:
+#include
+unsigned test(float f) {
+  return _mm_cvtsi128_si32( (__m128i) _mm_set_ss( f ));
+}
+
+Compiles to:
+_test:
+        movss 4(%esp), %xmm0
+        movd %xmm0, %eax
+        ret
+
+it should compile to a move from the stack slot directly into eax.  DAGCombine
+has this xform, but it is currently disabled until the alignment fields of
+the load/store nodes are trustworthy.
+
+
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.163 -> 1.164
---
Log message:

done

---
Diffs of the changes:  (+0 -28)

 README.txt |   28
 1 files changed, 28 deletions(-)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.163 llvm/lib/Target/X86/README.txt:1.164
--- llvm/lib/Target/X86/README.txt:1.163	Tue Apr 10 16:14:01 2007
+++ llvm/lib/Target/X86/README.txt	Wed Apr 11 00:34:00 2007
@@ -1049,31 +1049,3 @@
 
 //===-===//
 
-Consider:
-
-int isnegative(unsigned int X) {
-   return !(X < 2147483648U);
-}
-
-We currently compile this to:
-
-define i32 @isnegative(i32 %X) {
-        icmp slt i32 %X, 0              ; <i1>:0 [#uses=1]
-        %retval = zext i1 %0 to i32             ; <i32> [#uses=1]
-        ret i32 %retval
-}
-
-and:
-
-_isnegative:
-        cmpl $0, 4(%esp)
-        sets %al
-        movzbl %al, %eax
-        ret
-
-We should produce:
-
-        movl    4(%esp), %eax
-        shrl    $31, %eax
-        ret
-
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.162 -> 1.163
---
Log message:

new micro optzn

---
Diffs of the changes:  (+30 -0)

 README.txt |   30 ++++++++++++++++++++++++++++++
 1 files changed, 30 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.162 llvm/lib/Target/X86/README.txt:1.163
--- llvm/lib/Target/X86/README.txt:1.162	Tue Apr  3 18:41:34 2007
+++ llvm/lib/Target/X86/README.txt	Tue Apr 10 16:14:01 2007
@@ -1047,3 +1047,33 @@
 }
 
+//===-===//
+
+Consider:
+
+int isnegative(unsigned int X) {
+   return !(X < 2147483648U);
+}
+
+We currently compile this to:
+
+define i32 @isnegative(i32 %X) {
+        icmp slt i32 %X, 0              ; <i1>:0 [#uses=1]
+        %retval = zext i1 %0 to i32             ; <i32> [#uses=1]
+        ret i32 %retval
+}
+
+and:
+
+_isnegative:
+        cmpl $0, 4(%esp)
+        sets %al
+        movzbl %al, %eax
+        ret
+
+We should produce:
+
+        movl    4(%esp), %eax
+        shrl    $31, %eax
+        ret
+
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.161 -> 1.162
---
Log message:

make a new missing features section

---
Diffs of the changes:  (+6 -2)

 README.txt |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.161 llvm/lib/Target/X86/README.txt:1.162
--- llvm/lib/Target/X86/README.txt:1.161	Tue Apr  3 18:37:20 2007
+++ llvm/lib/Target/X86/README.txt	Tue Apr  3 18:41:34 2007
@@ -2,6 +2,12 @@
 // Random ideas for the X86 backend.
 //===-===//
 
+Missing features:
+  - support for 3DNow!
+  - weird abis?
+
+//===-===//
+
 Add a MUL2U and MUL2S nodes to represent a multiply that returns both the Hi
 and Lo parts (combination of MUL and MULH[SU] into one node).  Add this to
 X86, & make the dag combiner produce it when needed.  This will eliminate one
@@ -1040,6 +1046,4 @@
         return 0;
 }
 
-//===-===//
-Add support for 3DNow!
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.160 -> 1.161
---
Log message:

Updated

---
Diffs of the changes:  (+4 -0)

 README.txt |    4
 1 files changed, 4 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.160 llvm/lib/Target/X86/README.txt:1.161
--- llvm/lib/Target/X86/README.txt:1.160	Wed Mar 28 13:17:19 2007
+++ llvm/lib/Target/X86/README.txt	Tue Apr  3 18:37:20 2007
@@ -1039,3 +1039,7 @@
         return f(decode);
     return 0;
 }
+
+//===-===//
+
+Add support for 3DNow!
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.159 -> 1.160
---
Log message:

add a note

---
Diffs of the changes:  (+8 -0)

 README.txt |    8
 1 files changed, 8 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.159 llvm/lib/Target/X86/README.txt:1.160
--- llvm/lib/Target/X86/README.txt:1.159	Wed Mar 21 16:16:39 2007
+++ llvm/lib/Target/X86/README.txt	Wed Mar 28 13:17:19 2007
@@ -23,6 +23,14 @@
 
 //===-===//
 
+CodeGen/X86/lea-3.ll:test3 should be a single LEA, not a shift/move.  The X86
+backend knows how to three-addressify this shift, but it appears the register
+allocator isn't even asking it to do so in this case.  We should investigate
+why this isn't happening, it could have significant impact on other important
+cases for X86 as well.
+
+//===-===//
+
 This should be one DIV/IDIV instruction, not a libcall:
 
 unsigned test(unsigned long long X, unsigned Y) {
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.158 -> 1.159
---
Log message:

add generation of unnecessary push/pop around calls

---
Diffs of the changes:  (+42 -0)

 README.txt |   42 ++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 42 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.158 llvm/lib/Target/X86/README.txt:1.159
--- llvm/lib/Target/X86/README.txt:1.158	Wed Mar 14 16:03:53 2007
+++ llvm/lib/Target/X86/README.txt	Wed Mar 21 16:16:39 2007
@@ -989,3 +989,45 @@
         ret
 
 //===-===//
+
+We use push/pop of stack space around calls in situations where we don't have
+to.  Call to f below produces:
+        subl $16, %esp      <
+        movl %eax, (%esp)
+        call L_f$stub
+        addl $16, %esp      <
+The stack push/pop can be moved into the prolog/epilog.  It does this because
+it's building the frame pointer, but this should not be sufficient, only the
+use of alloca should cause it to do this.
+(There are other issues shown by this code, but this is one.)
+
+typedef struct _range_t {
+    float fbias;
+    float fscale;
+    int ibias;
+    int iscale;
+    int ishift;
+    unsigned char lut[];
+} range_t;
+
+struct _decode_t {
+    int type:4;
+    int unit:4;
+    int alpha:8;
+    int N:8;
+    int bpc:8;
+    int bpp:16;
+    int skip:8;
+    int swap:8;
+    const range_t*const*range;
+};
+
+typedef struct _decode_t decode_t;
+
+extern int f(const decode_t* decode);
+
+int decode_byte (const decode_t* decode) {
+  if (decode->swap != 0)
+    return f(decode);
+  return 0;
+}
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.157 -> 1.158
---
Log message:

Notes about codegen issues.

---
Diffs of the changes:  (+47 -0)

 README.txt |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 47 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.157 llvm/lib/Target/X86/README.txt:1.158
--- llvm/lib/Target/X86/README.txt:1.157	Thu Mar  1 23:04:52 2007
+++ llvm/lib/Target/X86/README.txt	Wed Mar 14 16:03:53 2007
@@ -339,6 +339,53 @@
 
 //===-===//
 
+We are generating far worse code than gcc:
+
+volatile short X, Y;
+
+void foo(int N) {
+  int i;
+  for (i = 0; i < N; i++) { X = i; Y = i*4; }
+}
+
+LBB1_1: #bb.preheader
+        xorl %ecx, %ecx
+        xorw %dx, %dx
+LBB1_2: #bb
+        movl L_X$non_lazy_ptr, %esi
+        movw %dx, (%esi)
+        movw %dx, %si
+        shlw $2, %si
+        movl L_Y$non_lazy_ptr, %edi
+        movw %si, (%edi)
+        incl %ecx
+        incw %dx
+        cmpl %eax, %ecx
+        jne LBB1_2      #bb
+
+vs.
+
+        xorl    %edx, %edx
+        movl    L_X$non_lazy_ptr-"L001$pb"(%ebx), %esi
+        movl    L_Y$non_lazy_ptr-"L001$pb"(%ebx), %ecx
+L4:
+        movw    %dx, (%esi)
+        leal    0(,%edx,4), %eax
+        movw    %ax, (%ecx)
+        addl    $1, %edx
+        cmpl    %edx, %edi
+        jne     L4
+
+There are 3 issues:
+
+1. Lack of post regalloc LICM.
+2. Poor sub-regclass support.  That leads to inability to promote the 16-bit
+   arithmetic op to 32-bit and making use of leal.
+3. LSR unable to reuse IV for a different type (i16 vs. i32) even though
+   the cast would be free.
+
+//===-===//
+
 Teach the coalescer to coalesce vregs of different register classes.  e.g.
 FR32 / FR64 to VR128.
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.156 -> 1.157
---
Log message:

add a note

---
Diffs of the changes:  (+22 -0)

 README.txt |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.156 llvm/lib/Target/X86/README.txt:1.157
--- llvm/lib/Target/X86/README.txt:1.156	Mon Feb 12 15:20:26 2007
+++ llvm/lib/Target/X86/README.txt	Thu Mar  1 23:04:52 2007
@@ -920,3 +920,25 @@
 Though this probably isn't worth it.
 
 //===-===//
+
+We need to teach the codegen to convert two-address INC instructions to LEA
+when the flags are dead.  For example, on X86-64, compile:
+
+int foo(int A, int B) {
+  return A+1;
+}
+
+to:
+
+_foo:
+        leal    1(%edi), %eax
+        ret
+
+instead of:
+
+_foo:
+        incl %edi
+        movl %edi, %eax
+        ret
+
+//===-===//
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.155 -> 1.156
---
Log message:

more notes

---
Diffs of the changes:  (+26 -3)

 README.txt |   29 ++++++++++++++++++++++++++---
 1 files changed, 26 insertions(+), 3 deletions(-)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.155 llvm/lib/Target/X86/README.txt:1.156
--- llvm/lib/Target/X86/README.txt:1.155	Mon Feb 12 14:26:34 2007
+++ llvm/lib/Target/X86/README.txt	Mon Feb 12 15:20:26 2007
@@ -874,15 +874,15 @@
     if (X) abort();
 }
 
-is currently compiled to (with -static):
+is currently compiled to:
 
 _test:
         subl $12, %esp
         cmpl $0, 16(%esp)
-        jne LBB1_1      #cond_true
+        jne LBB1_1
         addl $12, %esp
         ret
-LBB1_1: #cond_true
+LBB1_1:
         call L_abort$stub
 
 It would be better to produce:
@@ -895,5 +895,28 @@
         ret
 
 This can be applied to any no-return function call that takes no arguments etc.
+Alternatively, the stack save/restore logic could be shrink-wrapped, producing
+something like this:
+
+_test:
+        cmpl $0, 4(%esp)
+        jne LBB1_1
+        ret
+LBB1_1:
+        subl $12, %esp
+        call L_abort$stub
+
+Both are useful in different situations.  Finally, it could be shrink-wrapped
+and tail called, like this:
+
+_test:
+        cmpl $0, 4(%esp)
+        jne LBB1_1
+        ret
+LBB1_1:
+        pop %eax   # realign stack.
+        call L_abort$stub
+
+Though this probably isn't worth it.
 
 //===-===//
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.154 -> 1.155
---
Log message:

add a note

---
Diffs of the changes:  (+29 -0)

 README.txt |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.154 llvm/lib/Target/X86/README.txt:1.155
--- llvm/lib/Target/X86/README.txt:1.154	Thu Feb  8 17:53:38 2007
+++ llvm/lib/Target/X86/README.txt	Mon Feb 12 14:26:34 2007
@@ -868,3 +868,32 @@
 
 //===-===//
+
+This code:
+
+void test(int X) {
+  if (X) abort();
+}
+
+is currently compiled to (with -static):
+
+_test:
+        subl $12, %esp
+        cmpl $0, 16(%esp)
+        jne LBB1_1      #cond_true
+        addl $12, %esp
+        ret
+LBB1_1: #cond_true
+        call L_abort$stub
+
+It would be better to produce:
+
+_test:
+        subl $12, %esp
+        cmpl $0, 16(%esp)
+        jne L_abort$stub
+        addl $12, %esp
+        ret
+
+This can be applied to any no-return function call that takes no arguments etc.
+
+//===-===//
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.153 -> 1.154
---
Log message:

This is done.

---
Diffs of the changes:  (+0 -14)

 README.txt |   14 --------------
 1 files changed, 14 deletions(-)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.153 llvm/lib/Target/X86/README.txt:1.154
--- llvm/lib/Target/X86/README.txt:1.153	Sun Jan 21 01:03:37 2007
+++ llvm/lib/Target/X86/README.txt	Thu Feb  8 17:53:38 2007
@@ -665,20 +665,6 @@
 
 //===-===//
 
-We generate really bad code in some cases due to lowering SETCC/SELECT at
-legalize time, which prevents the post-legalize dag combine pass from
-understanding the code.  As a silly example, this prevents us from folding
-stuff like this:
-
-bool %test(ulong %x) {
-  %tmp = setlt ulong %x, 4294967296
-  ret bool %tmp
-}
-
-into x.h == 0
-
-//===-===//
-
 We currently compile sign_extend_inreg into two shifts:
 
 long foo(long X) {
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.152 -> 1.153
---
Log message:

add a note

---
Diffs of the changes:  (+52 -0)

 README.txt |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.152 llvm/lib/Target/X86/README.txt:1.153
--- llvm/lib/Target/X86/README.txt:1.152	Mon Jan 15 00:25:39 2007
+++ llvm/lib/Target/X86/README.txt	Sun Jan 21 01:03:37 2007
@@ -830,3 +830,55 @@
 
 the pxor is not needed, we could compare the value against itself.
 
+//===-===//
+
+These two functions have identical effects:
+
+unsigned int f(unsigned int i, unsigned int n) {++i; if (i == n) ++i; return i;}
+unsigned int f2(unsigned int i, unsigned int n) {++i; i += i == n; return i;}
+
+We currently compile them to:
+
+_f:
+        movl 4(%esp), %eax
+        movl %eax, %ecx
+        incl %ecx
+        movl 8(%esp), %edx
+        cmpl %edx, %ecx
+        jne LBB1_2      #UnifiedReturnBlock
+LBB1_1: #cond_true
+        addl $2, %eax
+        ret
+LBB1_2: #UnifiedReturnBlock
+        movl %ecx, %eax
+        ret
+_f2:
+        movl 4(%esp), %eax
+        movl %eax, %ecx
+        incl %ecx
+        cmpl 8(%esp), %ecx
+        sete %cl
+        movzbl %cl, %ecx
+        leal 1(%ecx,%eax), %eax
+        ret
+
+both of which are inferior to GCC's:
+
+_f:
+        movl    4(%esp), %edx
+        leal    1(%edx), %eax
+        addl    $2, %edx
+        cmpl    8(%esp), %eax
+        cmove   %edx, %eax
+        ret
+_f2:
+        movl    4(%esp), %eax
+        addl    $1, %eax
+        xorl    %edx, %edx
+        cmpl    8(%esp), %eax
+        sete    %dl
+        addl    %edx, %eax
+        ret
+
+//===-===//
+
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.151 -> 1.152
---
Log message:

add some notes

---
Diffs of the changes:  (+68 -0)

 README.txt |   68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 68 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.151 llvm/lib/Target/X86/README.txt:1.152
--- llvm/lib/Target/X86/README.txt:1.151	Fri Jan 12 13:20:47 2007
+++ llvm/lib/Target/X86/README.txt	Mon Jan 15 00:25:39 2007
@@ -762,3 +762,71 @@
 We should inline lrintf and probably other libc functions.
 
 //===-===//
+
+Start using the flags more.  For example, compile:
+
+int add_zf(int *x, int y, int a, int b) {
+     if ((*x += y) == 0)
+          return a;
+     else
+          return b;
+}
+
+to:
+       addl    %esi, (%rdi)
+       movl    %edx, %eax
+       cmovne  %ecx, %eax
+       ret
+instead of:
+
+_add_zf:
+        addl (%rdi), %esi
+        movl %esi, (%rdi)
+        testl %esi, %esi
+        cmove %edx, %ecx
+        movl %ecx, %eax
+        ret
+
+and:
+
+int add_zf(int *x, int y, int a, int b) {
+     if ((*x + y) < 0)
+          return a;
+     else
+          return b;
+}
+
+to:
+
+add_zf:
+        addl    (%rdi), %esi
+        movl    %edx, %eax
+        cmovns  %ecx, %eax
+        ret
+
+instead of:
+
+_add_zf:
+        addl (%rdi), %esi
+        testl %esi, %esi
+        cmovs %edx, %ecx
+        movl %ecx, %eax
+        ret
+
+//===-===//
+
+This:
+#include
+int foo(double X) { return isnan(X); }
+
+compiles to (-m64):
+
+_foo:
+        pxor %xmm1, %xmm1
+        ucomisd %xmm1, %xmm0
+        setp %al
+        movzbl %al, %eax
+        ret
+
+the pxor is not needed, we could compare the value against itself.
+
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt X86ATTAsmPrinter.cpp X86AsmPrinter.cpp X86AsmPrinter.h X86ISelDAGToDAG.cpp X86ISelLowering.cpp X86RegisterInfo.cpp X86Subtarget.cpp X86Subtarget.h X8
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.150 -> 1.151
X86ATTAsmPrinter.cpp updated: 1.83 -> 1.84
X86AsmPrinter.cpp updated: 1.224 -> 1.225
X86AsmPrinter.h updated: 1.41 -> 1.42
X86ISelDAGToDAG.cpp updated: 1.141 -> 1.142
X86ISelLowering.cpp updated: 1.313 -> 1.314
X86RegisterInfo.cpp updated: 1.188 -> 1.189
X86Subtarget.cpp updated: 1.47 -> 1.48
X86Subtarget.h updated: 1.25 -> 1.26
X86TargetMachine.cpp updated: 1.134 -> 1.135
---
Log message:

* PIC codegen for X86/Linux has been implemented
* PIC-aware internal structures in X86 Codegen have been refactored
* Visibility (default/weak) has been added
* Docs fixes (external weak linkage, visibility, formatting)

---
Diffs of the changes:  (+201 -97)

 README.txt           |    4 -
 X86ATTAsmPrinter.cpp |  109 +++
 X86AsmPrinter.cpp    |   13 ++
 X86AsmPrinter.h      |   11 -
 X86ISelDAGToDAG.cpp  |   17 +++
 X86ISelLowering.cpp  |   79 +++-
 X86RegisterInfo.cpp  |    4 +
 X86Subtarget.cpp     |   15 +++
 X86Subtarget.h       |   25 +--
 X86TargetMachine.cpp |   21 +
 10 files changed, 201 insertions(+), 97 deletions(-)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.150 llvm/lib/Target/X86/README.txt:1.151
--- llvm/lib/Target/X86/README.txt:1.150	Fri Jan  5 19:30:45 2007
+++ llvm/lib/Target/X86/README.txt	Fri Jan 12 13:20:47 2007
@@ -534,10 +534,6 @@
 
 //===-===//
 
-We should handle __attribute__ ((__visibility__ ("hidden"))).
-
-//===-===//
-
 int %foo(int* %a, int %t) {
 entry:
         br label %cond_true

Index: llvm/lib/Target/X86/X86ATTAsmPrinter.cpp
diff -u llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.83 llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.84
--- llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.83	Sat Jan  6 18:41:20 2007
+++ llvm/lib/Target/X86/X86ATTAsmPrinter.cpp	Fri Jan 12 13:20:47 2007
@@ -19,6 +19,7 @@
 #include "X86MachineFunctionInfo.h"
 #include "X86TargetMachine.h"
 #include "X86TargetAsmInfo.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/CallingConv.h"
 #include "llvm/Module.h"
 #include "llvm/Support/Mangler.h"
@@ -29,6 +30,21 @@
 
 STATISTIC(EmittedInsts, "Number of machine instrs printed");
 
+static std::string computePICLabel(unsigned fnNumber,
+                                   const X86Subtarget* Subtarget)
+{
+  std::string label;
+
+  if (Subtarget->isTargetDarwin()) {
+    label = "\"L" + utostr_32(fnNumber) + "$pb\"";
+  } else if (Subtarget->isTargetELF()) {
+    label = ".Lllvm$" + utostr_32(fnNumber) + "$piclabel";
+  } else
+    assert(0 && "Don't know how to print PIC label!\n");
+
+  return label;
+}
+
 /// getSectionForFunction - Return the section that we should emit the
 /// specified function body into.
 std::string X86ATTAsmPrinter::getSectionForFunction(const Function &F) const {
@@ -109,12 +125,15 @@
     }
     break;
   }
+  if (F->hasHiddenVisibility())
+    O << "\t.hidden " << CurrentFnName << "\n";
+
   O << CurrentFnName << ":\n";
   // Add some workaround for linkonce linkage on Cygwin\MinGW
   if (Subtarget->isTargetCygMing() &&
       (F->getLinkage() == Function::LinkOnceLinkage ||
       F->getLinkage() == Function::WeakLinkage))
-    O << "_llvm$workaround$fake$stub_" << CurrentFnName << ":\n";
+    O << "Lllvm$workaround$fake$stub$" << CurrentFnName << ":\n";
 
   if (Subtarget->isTargetDarwin() ||
       Subtarget->isTargetELF() ||
@@ -193,9 +212,14 @@
     if (!isMemOp) O << '$';
     O << TAI->getPrivateGlobalPrefix() << "JTI" << getFunctionNumber() << "_"
       << MO.getJumpTableIndex();
-    if (X86PICStyle == PICStyle::Stub &&
-        TM.getRelocationModel() == Reloc::PIC_)
-      O << "-\"L" << getFunctionNumber() << "$pb\"";
+
+    if (TM.getRelocationModel() == Reloc::PIC_) {
+      if (Subtarget->isPICStyleStub())
+        O << "-\"L" << getFunctionNumber() << "$pb\"";
+      else if (Subtarget->isPICStyleGOT())
+        O << "@GOTOFF";
+    }
+
     if (isMemOp && Subtarget->is64Bit() && !NotRIPRel)
       O << "(%rip)";
     return;
@@ -205,9 +229,14 @@
     if (!isMemOp) O << '$';
     O << TAI->getPrivateGlobalPrefix() << "CPI" << getFunctionNumber() << "_"
       << MO.getConstantPoolIndex();
-    if (X86PICStyle == PICStyle::Stub &&
-        TM.getRelocationModel() == Reloc::PIC_)
-      O << "-\"L" << getFunctionNumber() << "$pb\"";
+
+    if (TM.getRelocationModel() == Reloc::PIC_) {
+      if (Subtarget->isPICStyleStub())
+        O << "-\"L" << getFunctionNumber() << "$pb\"";
+      if (Subtarget->isPICStyleGOT())
+        O << "@GOTOFF";
+    }
+
     int Offset = MO.getOffset();
     if (Offset > 0)
       O << "+" << Offset;
@@ -228,11 +257,11 @@
     bool isExt = (GV->isExternal() || GV->hasWeakLinkage() ||
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.149 -> 1.150
---
Log message:

new note

---
Diffs of the changes:  (+5 -0)

 README.txt |    5 +
 1 files changed, 5 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.149 llvm/lib/Target/X86/README.txt:1.150
--- llvm/lib/Target/X86/README.txt:1.149	Wed Jan  3 13:12:31 2007
+++ llvm/lib/Target/X86/README.txt	Fri Jan  5 19:30:45 2007
@@ -761,3 +761,8 @@
     return 0;
 }
 
+//===-===//
+
+We should inline lrintf and probably other libc functions.
+
+//===-===//
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.148 -> 1.149
---
Log message:

fix testcase.  It's not safe to strictly evaluate a load that should be lazy.

---
Diffs of the changes:  (+2 -1)

 README.txt |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.148 llvm/lib/Target/X86/README.txt:1.149
--- llvm/lib/Target/X86/README.txt:1.148	Thu Dec 21 19:03:22 2006
+++ llvm/lib/Target/X86/README.txt	Wed Jan  3 13:12:31 2007
@@ -755,8 +755,9 @@
 //===-===//
 
 This could be a single 16-bit load.
+
 int f(char *p) {
-    if (p[0] == 1 && p[1] == 2) return 1;
+    if ((p[0] == 1) & (p[1] == 2)) return 1;
     return 0;
 }
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.147 -> 1.148
---
Log message:

add a note

---
Diffs of the changes:  (+9 -0)

 README.txt |    9 +
 1 files changed, 9 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.147 llvm/lib/Target/X86/README.txt:1.148
--- llvm/lib/Target/X86/README.txt:1.147	Sun Dec 10 19:20:25 2006
+++ llvm/lib/Target/X86/README.txt	Thu Dec 21 19:03:22 2006
@@ -751,3 +751,12 @@
 //===-===//
 
 In c99 mode, the preprocessor doesn't like assembly comments like #TRUNCATE.
+
+//===-===//
+
+This could be a single 16-bit load.
+int f(char *p) {
+    if (p[0] == 1 && p[1] == 2) return 1;
+    return 0;
+}
+
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.145 -> 1.146
---
Log message:

New entries.

---
Diffs of the changes:  (+20 -0)

 README.txt |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.145 llvm/lib/Target/X86/README.txt:1.146
--- llvm/lib/Target/X86/README.txt:1.145	Tue Nov 14 02:08:46 2006
+++ llvm/lib/Target/X86/README.txt	Tue Nov 28 13:59:25 2006
@@ -730,3 +730,23 @@
 except that mul isn't a commutative 2-addr instruction.  I guess this has
 to be done at isel time based on the #uses to mul?
 
+//===-===//
+
+Make sure the instruction which starts a loop does not cross a cacheline
+boundary.  This requires knowing the exact length of each machine instruction.
+That is somewhat complicated, but doable.  Example 256.bzip2:
+
+In the new trace, the hot loop has an instruction which crosses a cacheline
+boundary.  In addition to potential cache misses, this can't help decoding as I
+imagine there has to be some kind of complicated decoder reset and realignment
+to grab the bytes from the next cacheline.
+
+532  532 0x3cfc movb     (1809(%esp, %esi), %bl   <<<--- spans 2 64 byte lines
+942  942 0x3d03 movl     %dh, (1809(%esp, %esi)
+937  937 0x3d0a incl     %esi
+  3    3 0x3d0b cmpb     %bl, %dl
+ 27   27 0x3d0d jnz      0x62db
+
+//===-===//
+
+In c99 mode, the preprocessor doesn't like assembly comments like #TRUNCATE.
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.144 -> 1.145
---
Log message:

it would be nice if ctlz were lowered to bsf etc.

---
Diffs of the changes:  (+9 -0)

 README.txt |    9 +
 1 files changed, 9 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.144 llvm/lib/Target/X86/README.txt:1.145
--- llvm/lib/Target/X86/README.txt:1.144	Fri Nov 10 16:03:35 2006
+++ llvm/lib/Target/X86/README.txt	Tue Nov 14 02:08:46 2006
@@ -114,6 +114,15 @@
 however, check that these are defined for 0 and 32.  Our intrinsics are, GCC's
 aren't.
 
+Another example (use predsimplify to eliminate a select):
+
+int foo (unsigned long j) {
+  if (j)
+    return __builtin_ffs (j) - 1;
+  else
+    return 0;
+}
+
 //===-===//
 
 Use push/pop instructions in prolog/epilog sequences instead of stores off
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.142 -> 1.143
---
Log message:

this part implemented.

---
Diffs of the changes:  (+0 -29)

 README.txt |   29 -----------------------------
 1 files changed, 29 deletions(-)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.142 llvm/lib/Target/X86/README.txt:1.143
--- llvm/lib/Target/X86/README.txt:1.142	Thu Oct 12 17:01:26 2006
+++ llvm/lib/Target/X86/README.txt	Sun Oct 22 16:40:12 2006
@@ -607,35 +607,6 @@
         cmp eax, 6
         jz label
 
-If we aren't going to do this, we should lower the switch better.  We compile
-the code to:
-
-_f:
-        movl 8(%esp), %eax
-        movl 4(%esp), %ecx
-        cmpl $6, %ecx
-        jl LBB1_4       #entry
-        jmp LBB1_3      #entry
-LBB1_3: #entry
-        cmpl $6, %ecx
-        je LBB1_1       #bb
-        jmp LBB1_2      #UnifiedReturnBlock
-LBB1_4: #entry
-        cmpl $4, %ecx
-        jne LBB1_2      #UnifiedReturnBlock
-LBB1_1: #bb
-        incl %eax
-        ret
-LBB1_2: #UnifiedReturnBlock
-        ret
-
-In the code above, the 'if' is turned into a 'switch' at the mid-level.  It
-looks like the 'lower to branches' mode could be improved a little here.  In
-particular, the fall-through to LBB1_3 doesn't need a branch.  It would also be
-nice to eliminate the redundant "cmp 6", maybe by lowering to a linear sequence
-of compares if there are below a certain number of cases (instead of a binary
-sequence)?
-
 //===-===//
 
 GCC's ix86_expand_int_movcc function (in i386.c) has a ton of interesting
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.141 -> 1.142 --- Log message: add a note --- Diffs of the changes: (+27 -0) README.txt | 27 +++ 1 files changed, 27 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.141 llvm/lib/Target/X86/README.txt:1.142 --- llvm/lib/Target/X86/README.txt:1.141Fri Oct 6 12:39:34 2006 +++ llvm/lib/Target/X86/README.txt Thu Oct 12 17:01:26 2006 @@ -734,3 +734,30 @@ ret //===-===// + +Consider the expansion of: + +uint %test3(uint %X) { +%tmp1 = rem uint %X, 255 +ret uint %tmp1 +} + +Currently it compiles to: + +... +movl $2155905153, %ecx +movl 8(%esp), %esi +movl %esi, %eax +mull %ecx +... + +This could be "reassociated" into: + +movl $2155905153, %eax +movl 8(%esp), %ecx +mull %ecx + +to avoid the copy. In fact, the existing two-address stuff would do this +except that mul isn't a commutative 2-addr instruction. I guess this has +to be done at isel time based on the #uses to mul? + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
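For reference, the movl $2155905153 / mull pair in the note above comes from the standard magic-number expansion of unsigned remainder: 2155905153 is 0x80808081 = ceil(2^39 / 255). A hedged C sketch of that expansion (this illustrates the transform, not LLVM's implementation of it):

```c
#include <assert.h>
#include <stdint.h>

/* X % 255 without a divide: multiply by ceil(2^39/255) and shift to
   get the quotient, then subtract q*255 to recover the remainder. */
static uint32_t rem255(uint32_t x) {
    uint32_t q = (uint32_t)(((uint64_t)x * 2155905153u) >> 39); /* x / 255 */
    return x - q * 255u;                                        /* x % 255 */
}
```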
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.140 -> 1.141 --- Log message: ugly codegen --- Diffs of the changes: (+22 -0) README.txt | 22 ++ 1 files changed, 22 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.140 llvm/lib/Target/X86/README.txt:1.141 --- llvm/lib/Target/X86/README.txt:1.140Fri Oct 6 03:21:07 2006 +++ llvm/lib/Target/X86/README.txt Fri Oct 6 12:39:34 2006 @@ -711,4 +711,26 @@ into x.h == 0 +//===-===// +We currently compile sign_extend_inreg into two shifts: + +long foo(long X) { + return (long)(signed char)X; +} + +becomes: + +_foo: +movl 4(%esp), %eax +shll $24, %eax +sarl $24, %eax +ret + +This could be: + +_foo: +movsbl 4(%esp),%eax +ret + +//===-===// ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
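The two-shift sequence and the movsbl form in the note above compute the same value: both sign-extend the low byte. A sketch with fixed-width types (the note's `long` is 32-bit here); note the arithmetic right shift of a signed value is implementation-defined in C, but it is exactly what the shll/sarl pair relies on:

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low byte of x two ways. */
static int32_t sext_shifts(int32_t x) {
    return (int32_t)((uint32_t)x << 24) >> 24;  /* shll $24 ; sarl $24 */
}

static int32_t sext_byte(int32_t x) {
    return (int32_t)(int8_t)x;                  /* movsbl */
}
```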
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.138 -> 1.139 --- Log message: implemented --- Diffs of the changes: (+0 -35) README.txt | 35 --- 1 files changed, 35 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.138 llvm/lib/Target/X86/README.txt:1.139 --- llvm/lib/Target/X86/README.txt:1.138Thu Sep 21 00:46:00 2006 +++ llvm/lib/Target/X86/README.txt Thu Sep 21 01:14:54 2006 @@ -642,41 +642,6 @@ //===-===// -Compile: -int %test(ulong *%tmp) { -%tmp = load ulong* %tmp ; [#uses=1] -%tmp.mask = shr ulong %tmp, ubyte 50; [#uses=1] -%tmp.mask = cast ulong %tmp.mask to ubyte -%tmp2 = and ubyte %tmp.mask, 3 ; [#uses=1] -%tmp2 = cast ubyte %tmp2 to int ; [#uses=1] -ret int %tmp2 -} - -to: - -_test: -movl 4(%esp), %eax -movl 4(%eax), %eax -shrl $18, %eax -andl $3, %eax -ret - -instead of: - -_test: -movl 4(%esp), %eax -movl 4(%eax), %eax -shrl $18, %eax -# TRUNCATE movb %al, %al -andb $3, %al -movzbl %al, %eax -ret - -This saves a movzbl, and saves a truncate if it doesn't get coallesced right. -This is a simple DAGCombine to propagate the zext through the and. - -//===-===// - GCC's ix86_expand_int_movcc function (in i386.c) has a ton of interesting simplifications for integer "x cmp y ? a : b". For example, instead of: ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.137 -> 1.138 --- Log message: Fit in 80-cols --- Diffs of the changes: (+10 -9) README.txt | 19 ++- 1 files changed, 10 insertions(+), 9 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.137 llvm/lib/Target/X86/README.txt:1.138 --- llvm/lib/Target/X86/README.txt:1.137Wed Sep 20 01:32:10 2006 +++ llvm/lib/Target/X86/README.txt Thu Sep 21 00:46:00 2006 @@ -544,9 +544,9 @@ br label %cond_true cond_true: ; preds = %cond_true, %entry -%x.0.0 = phi int [ 0, %entry ], [ %tmp9, %cond_true ] ; [#uses=3] -%t_addr.0.0 = phi int [ %t, %entry ], [ %tmp7, %cond_true ] ; [#uses=1] -%tmp2 = getelementptr int* %a, int %x.0.0 ; [#uses=1] +%x.0.0 = phi int [ 0, %entry ], [ %tmp9, %cond_true ] +%t_addr.0.0 = phi int [ %t, %entry ], [ %tmp7, %cond_true ] +%tmp2 = getelementptr int* %a, int %x.0.0 %tmp3 = load int* %tmp2 ; [#uses=1] %tmp5 = add int %t_addr.0.0, %x.0.0 ; [#uses=1] %tmp7 = add int %tmp5, %tmp3; [#uses=2] @@ -633,11 +633,12 @@ LBB1_2: #UnifiedReturnBlock ret -In the code above, the 'if' is turned into a 'switch' at the mid-level. It looks -like the 'lower to branches' mode could be improved a little here. In particular, -the fall-through to LBB1_3 doesn't need a branch. It would also be nice to -eliminate the redundant "cmp 6", maybe by lowering to a linear sequence of -compares if there are below a certain number of cases (instead of a binary sequence)? +In the code above, the 'if' is turned into a 'switch' at the mid-level. It +looks like the 'lower to branches' mode could be improved a little here. In +particular, the fall-through to LBB1_3 doesn't need a branch. It would also be +nice to eliminate the redundant "cmp 6", maybe by lowering to a linear sequence +of compares if there are below a certain number of cases (instead of a binary +sequence)? 
//===-===// @@ -645,7 +646,7 @@ int %test(ulong *%tmp) { %tmp = load ulong* %tmp ; [#uses=1] %tmp.mask = shr ulong %tmp, ubyte 50; [#uses=1] -%tmp.mask = cast ulong %tmp.mask to ubyte ; [#uses=1] +%tmp.mask = cast ulong %tmp.mask to ubyte %tmp2 = and ubyte %tmp.mask, 3 ; [#uses=1] %tmp2 = cast ubyte %tmp2 to int ; [#uses=1] ret int %tmp2 ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.136 -> 1.137 --- Log message: add a note --- Diffs of the changes: (+16 -0) README.txt | 16 1 files changed, 16 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.136 llvm/lib/Target/X86/README.txt:1.137 --- llvm/lib/Target/X86/README.txt:1.136Mon Sep 18 00:36:54 2006 +++ llvm/lib/Target/X86/README.txt Wed Sep 20 01:32:10 2006 @@ -734,3 +734,19 @@ addl $12, %esp The add\sub pair is really unneeded here. + +//===-===// + +We generate really bad code in some cases due to lowering SETCC/SELECT at +legalize time, which prevents the post-legalize dag combine pass from +understanding the code. As a silly example, this prevents us from folding +stuff like this: + +bool %test(ulong %x) { + %tmp = setlt ulong %x, 4294967296 + ret bool %tmp +} + +into x.h == 0 + + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
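The fold the note above asks for, stated as a C identity: an unsigned 64-bit compare against 2^32 depends only on the high half, i.e. x < 2^32 exactly when x.h == 0. A small sketch to check the equivalence:

```c
#include <assert.h>
#include <stdint.h>

/* setlt ulong %x, 4294967296  vs.  testing the high dword for zero. */
static int lt_2_32(uint64_t x) {
    return x < 4294967296ULL;
}

static int high_is_zero(uint64_t x) {
    return (uint32_t)(x >> 32) == 0;  /* x.h == 0 */
}
```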
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.135 -> 1.136 --- Log message: add a note. Our 64-bit shifts are ~30% slower than gcc's --- Diffs of the changes: (+2 -1) README.txt |3 ++- 1 files changed, 2 insertions(+), 1 deletion(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.135 llvm/lib/Target/X86/README.txt:1.136 --- llvm/lib/Target/X86/README.txt:1.135Sun Sep 17 15:25:45 2006 +++ llvm/lib/Target/X86/README.txt Mon Sep 18 00:36:54 2006 @@ -59,7 +59,8 @@ But that requires good 8-bit subreg support. - +64-bit shifts (in general) expand to really bad code. Instead of using +cmovs, we should expand to a conditional branch like GCC produces. //===-===// ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt X86ATTAsmPrinter.cpp X86AsmPrinter.cpp X86ISelDAGToDAG.cpp X86ISelLowering.cpp X86RegisterInfo.cpp
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.134 -> 1.135 X86ATTAsmPrinter.cpp updated: 1.62 -> 1.63 X86AsmPrinter.cpp updated: 1.197 -> 1.198 X86ISelDAGToDAG.cpp updated: 1.108 -> 1.109 X86ISelLowering.cpp updated: 1.260 -> 1.261 X86RegisterInfo.cpp updated: 1.169 -> 1.170 --- Log message: Added some eye-candy for Subtarget type checking Added X86 StdCall & FastCall calling conventions. Codegen will follow. --- Diffs of the changes: (+33 -7) README.txt | 26 ++ X86ATTAsmPrinter.cpp |2 +- X86AsmPrinter.cpp|4 ++-- X86ISelDAGToDAG.cpp |2 +- X86ISelLowering.cpp |2 +- X86RegisterInfo.cpp |4 ++-- 6 files changed, 33 insertions(+), 7 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.134 llvm/lib/Target/X86/README.txt:1.135 --- llvm/lib/Target/X86/README.txt:1.134Fri Sep 15 22:30:19 2006 +++ llvm/lib/Target/X86/README.txt Sun Sep 17 15:25:45 2006 @@ -707,3 +707,29 @@ //===-===// +Currently we don't have elimination of redundant stack manipulations. Consider +the code: + +int %main() { +entry: + call fastcc void %test1( ) + call fastcc void %test2( sbyte* cast (void ()* %test1 to sbyte*) ) + ret int 0 +} + +declare fastcc void %test1() + +declare fastcc void %test2(sbyte*) + + +This currently compiles to: + + subl $16, %esp + call _test5 + addl $12, %esp + subl $16, %esp + movl $_test5, (%esp) + call _test6 + addl $12, %esp + +The add\sub pair is really unneeded here. 
Index: llvm/lib/Target/X86/X86ATTAsmPrinter.cpp diff -u llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.62 llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.63 --- llvm/lib/Target/X86/X86ATTAsmPrinter.cpp:1.62 Thu Sep 14 13:23:27 2006 +++ llvm/lib/Target/X86/X86ATTAsmPrinter.cppSun Sep 17 15:25:45 2006 @@ -63,7 +63,7 @@ ".section __TEXT,__textcoal_nt,coalesced,pure_instructions", F); O << "\t.globl\t" << CurrentFnName << "\n"; O << "\t.weak_definition\t" << CurrentFnName << "\n"; -} else if (Subtarget->TargetType == X86Subtarget::isCygwin) { +} else if (Subtarget->isTargetCygwin()) { EmitAlignment(4, F); // FIXME: This should be parameterized somewhere. O << "\t.section\t.llvm.linkonce.t." << CurrentFnName << ",\"ax\"\n"; Index: llvm/lib/Target/X86/X86AsmPrinter.cpp diff -u llvm/lib/Target/X86/X86AsmPrinter.cpp:1.197 llvm/lib/Target/X86/X86AsmPrinter.cpp:1.198 --- llvm/lib/Target/X86/X86AsmPrinter.cpp:1.197 Thu Sep 14 13:23:27 2006 +++ llvm/lib/Target/X86/X86AsmPrinter.cpp Sun Sep 17 15:25:45 2006 @@ -83,7 +83,7 @@ } else O << TAI->getCOMMDirective() << name << "," << Size; } else { - if (Subtarget->TargetType != X86Subtarget::isCygwin) { + if (!Subtarget->isTargetCygwin()) { if (I->hasInternalLinkage()) O << "\t.local\t" << name << "\n"; } @@ -101,7 +101,7 @@ O << "\t.globl " << name << "\n" << "\t.weak_definition " << name << "\n"; SwitchToDataSection(".section __DATA,__const_coal,coalesced", I); -} else if (Subtarget->TargetType == X86Subtarget::isCygwin) { +} else if (Subtarget->isTargetCygwin()) { O << "\t.section\t.llvm.linkonce.d." << name << ",\"aw\"\n" << "\t.weak " << name << "\n"; } else { Index: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp diff -u llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:1.108 llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:1.109 --- llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:1.108 Thu Sep 14 18:55:02 2006 +++ llvm/lib/Target/X86/X86ISelDAGToDAG.cpp Sun Sep 17 15:25:45 2006 @@ -468,7 +468,7 @@ /// the main function. 
void X86DAGToDAGISel::EmitSpecialCodeForMain(MachineBasicBlock *BB, MachineFrameInfo *MFI) { - if (Subtarget->TargetType == X86Subtarget::isCygwin) + if (Subtarget->isTargetCygwin()) BuildMI(BB, X86::CALLpcrel32, 1).addExternalSymbol("__main"); // Switch the FPU to 64-bit precision mode for better compatibility and speed. Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.260 llvm/lib/Target/X86/X86ISelLowering.cpp:1.261 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.260 Sun Sep 17 08:06:18 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Sun Sep 17 15:25:45 2006 @@ -3907,7 +3907,7 @@ MachineFunction &MF = DAG.getMachineFunction(); const Function* Fn = MF.getFunction(); if (Fn->hasExternalLinkage() && - Subtarget->TargetType == X86Subtarget::isCygwin && + Subtarget->isTargetCygwin() && Fn->getName() == "main") MF.getInfo()->setForceFramePointer(true); Index: llvm/lib/Target/X86/X86RegisterInfo.cpp diff -u llvm/lib/Target/X86/X86Regis
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.133 -> 1.134 --- Log message: add a note --- Diffs of the changes: (+3 -0) README.txt |3 +++ 1 files changed, 3 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.133 llvm/lib/Target/X86/README.txt:1.134 --- llvm/lib/Target/X86/README.txt:1.133Wed Sep 13 18:37:16 2006 +++ llvm/lib/Target/X86/README.txt Fri Sep 15 22:30:19 2006 @@ -18,6 +18,9 @@ ... which should only be one imul instruction. +This can be done with a custom expander, but it would be nice to move this to +generic code. + //===-===// This should be one DIV/IDIV instruction, not a libcall: ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.132 -> 1.133 --- Log message: add note about switch lowering --- Diffs of the changes: (+29 -0) README.txt | 29 + 1 files changed, 29 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.132 llvm/lib/Target/X86/README.txt:1.133 --- llvm/lib/Target/X86/README.txt:1.132Tue Sep 12 23:19:50 2006 +++ llvm/lib/Target/X86/README.txt Wed Sep 13 18:37:16 2006 @@ -607,6 +607,34 @@ cmp eax, 6 jz label +If we aren't going to do this, we should lower the switch better. We compile +the code to: + +_f: +movl 8(%esp), %eax +movl 4(%esp), %ecx +cmpl $6, %ecx +jl LBB1_4 #entry +jmp LBB1_3 #entry +LBB1_3: #entry +cmpl $6, %ecx +je LBB1_1 #bb +jmp LBB1_2 #UnifiedReturnBlock +LBB1_4: #entry +cmpl $4, %ecx +jne LBB1_2 #UnifiedReturnBlock +LBB1_1: #bb +incl %eax +ret +LBB1_2: #UnifiedReturnBlock +ret + +In the code above, the 'if' is turned into a 'switch' at the mid-level. It looks +like the 'lower to branches' mode could be improved a little here. In particular, +the fall-through to LBB1_3 doesn't need a branch. It would also be nice to +eliminate the redundant "cmp 6", maybe by lowering to a linear sequence of +compares if there are below a certain number of cases (instead of a binary sequence)? + //===-===// Compile: @@ -675,3 +703,4 @@ etc. //===-===// + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
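The or/cmp trick at the top of the diff above, written out in C: 4 and 6 differ only in bit 1, so forcing that bit with `a | 2` collapses both cases into a single compare. A sketch of the equivalence:

```c
#include <assert.h>

/* Reference form from the note, and the single-compare form. */
static int f_ref(int a, int b) {
    if (a == 4 || a == 6)
        b++;
    return b;
}

static int f_or(int a, int b) {
    if ((a | 2) == 6)   /* or eax, 2 ; cmp eax, 6 ; jz */
        b++;
    return b;
}
```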
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.131 -> 1.132 --- Log message: new note --- Diffs of the changes: (+32 -0) README.txt | 32 1 files changed, 32 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.131 llvm/lib/Target/X86/README.txt:1.132 --- llvm/lib/Target/X86/README.txt:1.131Tue Sep 12 22:54:54 2006 +++ llvm/lib/Target/X86/README.txt Tue Sep 12 23:19:50 2006 @@ -643,3 +643,35 @@ This is a simple DAGCombine to propagate the zext through the and. //===-===// + +GCC's ix86_expand_int_movcc function (in i386.c) has a ton of interesting +simplifications for integer "x cmp y ? a : b". For example, instead of: + +int G; +void f(int X, int Y) { + G = X < 0 ? 14 : 13; +} + +compiling to: + +_f: +movl $14, %eax +movl $13, %ecx +movl 4(%esp), %edx +testl %edx, %edx +cmovl %eax, %ecx +movl %ecx, _G +ret + +it could be: +_f: +movl4(%esp), %eax +sarl$31, %eax +notl%eax +addl$14, %eax +movl%eax, _G +ret + +etc. + +//===-===// ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
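The sarl/notl/addl sequence from the note above, written out in C: it computes X < 0 ? 14 : 13 with no compare and no cmov. This sketch assumes the usual arithmetic right shift for signed values (implementation-defined in C, but what the sarl relies on):

```c
#include <assert.h>
#include <stdint.h>

/* Branchless X < 0 ? 14 : 13, mirroring sarl $31 ; notl ; addl $14. */
static int32_t select_14_13(int32_t x) {
    int32_t m = ~(x >> 31);  /* 0 when x < 0, -1 otherwise */
    return m + 14;           /* 14 when x < 0, 13 otherwise */
}
```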
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.130 -> 1.131 --- Log message: new note --- Diffs of the changes: (+13 -0) README.txt | 13 + 1 files changed, 13 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.130 llvm/lib/Target/X86/README.txt:1.131 --- llvm/lib/Target/X86/README.txt:1.130Tue Sep 12 22:22:10 2006 +++ llvm/lib/Target/X86/README.txt Tue Sep 12 22:54:54 2006 @@ -45,6 +45,19 @@ Another useful one would be ~0ULL >> X and ~0ULL << X. +One better solution for 1LL << x is: +xorl%eax, %eax +xorl%edx, %edx +testb $32, %cl +sete%al +setne %dl +sall%cl, %eax +sall%cl, %edx + +But that requires good 8-bit subreg support. + + + //===-===// Compile this: ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
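The sete/setne sequence from the note above, modeled in C: on ia32 the low word of 1LL << x is (x < 32) shifted by x & 31, and the high word is (x >= 32) shifted by the same amount — exactly what the two set instructions feed into the two sall instructions (sall masks the count to 31 in hardware; the mask is explicit here):

```c
#include <assert.h>

/* 1LL << x for 0 <= x <= 63, via the sete/setne expansion. */
static unsigned long long shift1(unsigned x) {
    unsigned lo = (unsigned)((x & 32) == 0) << (x & 31); /* sete %al ; sall %cl, %eax */
    unsigned hi = (unsigned)((x & 32) != 0) << (x & 31); /* setne %dl ; sall %cl, %edx */
    return ((unsigned long long)hi << 32) | lo;
}
```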
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt X86ISelLowering.cpp
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.129 -> 1.130 X86ISelLowering.cpp updated: 1.255 -> 1.256 --- Log message: Compile X > -1 -> test X,X; js dest This implements CodeGen/X86/jump_sign.ll. --- Diffs of the changes: (+23 -28) README.txt | 12 X86ISelLowering.cpp | 39 +++ 2 files changed, 23 insertions(+), 28 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.129 llvm/lib/Target/X86/README.txt:1.130 --- llvm/lib/Target/X86/README.txt:1.129Tue Sep 12 01:36:01 2006 +++ llvm/lib/Target/X86/README.txt Tue Sep 12 22:22:10 2006 @@ -630,15 +630,3 @@ This is a simple DAGCombine to propagate the zext through the and. //===-===// - -Instead of: - - cmpl $4294967295, %edx - jg LBB1_8 #cond_false49 - -emit: - - testl %edx, %edx - js LBB1_8 - -This saves a byte of code space. Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.255 llvm/lib/Target/X86/X86ISelLowering.cpp:1.256 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.255 Tue Sep 12 16:03:39 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Tue Sep 12 22:22:10 2006 @@ -1866,13 +1866,23 @@ /// translateX86CC - do a one to one translation of a ISD::CondCode to the X86 /// specific condition code. It returns a false if it cannot do a direct -/// translation. X86CC is the translated CondCode. Flip is set to true if the -/// the order of comparison operands should be flipped. +/// translation. X86CC is the translated CondCode. LHS/RHS are modified as +/// needed. static bool translateX86CC(ISD::CondCode SetCCOpcode, bool isFP, - unsigned &X86CC, bool &Flip) { - Flip = false; + unsigned &X86CC, SDOperand &LHS, SDOperand &RHS, + SelectionDAG &DAG) { X86CC = X86ISD::COND_INVALID; if (!isFP) { +if (SetCCOpcode == ISD::SETGT) { + if (ConstantSDNode *RHSC = dyn_cast(RHS)) +if (RHSC->isAllOnesValue()) { + // X > -1 -> X == 0, jump on sign. 
+ RHS = DAG.getConstant(0, RHS.getValueType()); + X86CC = X86ISD::COND_S; + return true; +} +} + switch (SetCCOpcode) { default: break; case ISD::SETEQ: X86CC = X86ISD::COND_E; break; @@ -1893,6 +1903,7 @@ // 0 | 0 | 1 | X < Y // 1 | 0 | 0 | X == Y // 1 | 1 | 1 | unordered +bool Flip = false; switch (SetCCOpcode) { default: break; case ISD::SETUEQ: @@ -1914,16 +1925,13 @@ case ISD::SETUO: X86CC = X86ISD::COND_P; break; case ISD::SETO: X86CC = X86ISD::COND_NP; break; } +if (Flip) + std::swap(LHS, RHS); } return X86CC != X86ISD::COND_INVALID; } -static bool translateX86CC(SDOperand CC, bool isFP, unsigned &X86CC, - bool &Flip) { - return translateX86CC(cast(CC)->get(), isFP, X86CC, Flip); -} - /// hasFPCMov - is there a floating point cmov for the specific X86 condition /// code. Current x86 isa includes the following FP cmov instructions: /// fcmovb, fcomvbe, fcomve, fcmovu, fcmovae, fcmova, fcmovne, fcmovnu. @@ -3620,12 +3628,11 @@ ISD::CondCode SetCCOpcode = cast(CC)->get(); const MVT::ValueType *VTs = DAG.getNodeValueTypes(MVT::Other, MVT::Flag); bool isFP = MVT::isFloatingPoint(Op.getOperand(1).getValueType()); - bool Flip; unsigned X86CC; VTs = DAG.getNodeValueTypes(MVT::i8, MVT::Flag); - if (translateX86CC(CC, isFP, X86CC, Flip)) { -if (Flip) std::swap(Op0, Op1); + if (translateX86CC(cast(CC)->get(), isFP, X86CC, + Op0, Op1, DAG)) { SDOperand Ops1[] = { Chain, Op0, Op1 }; Cond = DAG.getNode(X86ISD::CMP, VTs, 2, Ops1, 3).getValue(1); SDOperand Ops2[] = { DAG.getConstant(X86CC, MVT::i8), Cond }; @@ -4356,13 +4363,13 @@ break; } -bool Flip; unsigned X86CC; -translateX86CC(CC, true, X86CC, Flip); +SDOperand LHS = Op.getOperand(1); +SDOperand RHS = Op.getOperand(2); +translateX86CC(CC, true, X86CC, LHS, RHS, DAG); const MVT::ValueType *VTs = DAG.getNodeValueTypes(MVT::Other, MVT::Flag); -SDOperand Ops1[] = { DAG.getEntryNode(), Op.getOperand(Flip?2:1), - Op.getOperand(Flip?1:2) }; +SDOperand Ops1[] = { DAG.getEntryNode(), LHS, RHS }; SDOperand Cond = 
DAG.getNode(Opc, VTs, 2, Ops1, 3); VTs = DAG.getNodeValueTypes(MVT::i8, MVT::Flag); SDOperand Ops2[] = { DAG.getConstant(X86CC, MVT::i8), Cond }; ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
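The transform this commit implements, as a C identity: for signed x, x > -1 holds exactly when the sign bit is clear, which is why the cmp against -1 can become test x,x followed by a sign-flag jump. A small equivalence check:

```c
#include <assert.h>
#include <stdint.h>

/* x > -1  vs.  "sign bit of x is zero". */
static int gt_minus_one(int32_t x) {
    return x > -1;
}

static int sign_clear(int32_t x) {
    return ((uint32_t)x >> 31) == 0;  /* test %x, %x ; jns */
}
```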
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.128 -> 1.129 --- Log message: add a note --- Diffs of the changes: (+14 -0) README.txt | 14 ++ 1 files changed, 14 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.128 llvm/lib/Target/X86/README.txt:1.129 --- llvm/lib/Target/X86/README.txt:1.128Mon Sep 11 18:00:56 2006 +++ llvm/lib/Target/X86/README.txt Tue Sep 12 01:36:01 2006 @@ -628,3 +628,17 @@ This saves a movzbl, and saves a truncate if it doesn't get coallesced right. This is a simple DAGCombine to propagate the zext through the and. + +//===-===// + +Instead of: + + cmpl $4294967295, %edx + jg LBB1_8 #cond_false49 + +emit: + + testl %edx, %edx + js LBB1_8 + +This saves a byte of code space. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.127 -> 1.128 --- Log message: Testcase noticed from PR906: http://llvm.org/PR906 --- Diffs of the changes: (+34 -0) README.txt | 34 ++ 1 files changed, 34 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.127 llvm/lib/Target/X86/README.txt:1.128 --- llvm/lib/Target/X86/README.txt:1.127Mon Sep 11 17:57:51 2006 +++ llvm/lib/Target/X86/README.txt Mon Sep 11 18:00:56 2006 @@ -594,3 +594,37 @@ cmp eax, 6 jz label +//===-===// + +Compile: +int %test(ulong *%tmp) { +%tmp = load ulong* %tmp ; [#uses=1] +%tmp.mask = shr ulong %tmp, ubyte 50; [#uses=1] +%tmp.mask = cast ulong %tmp.mask to ubyte ; [#uses=1] +%tmp2 = and ubyte %tmp.mask, 3 ; [#uses=1] +%tmp2 = cast ubyte %tmp2 to int ; [#uses=1] +ret int %tmp2 +} + +to: + +_test: +movl 4(%esp), %eax +movl 4(%eax), %eax +shrl $18, %eax +andl $3, %eax +ret + +instead of: + +_test: +movl 4(%esp), %eax +movl 4(%eax), %eax +shrl $18, %eax +# TRUNCATE movb %al, %al +andb $3, %al +movzbl %al, %eax +ret + +This saves a movzbl, and saves a truncate if it doesn't get coallesced right. +This is a simple DAGCombine to propagate the zext through the and. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
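The DAGCombine the note above asks for, as a C identity: the truncate to a byte is dead because the following `and 3` keeps only bits 0-1, so the whole expression is one shift of the high dword plus a mask. A sketch of the equivalence:

```c
#include <assert.h>
#include <stdint.h>

/* (tmp >> 50) & 3 with and without the intermediate byte truncate. */
static uint32_t extract_ref(uint64_t v) {
    return (uint32_t)((uint8_t)(v >> 50) & 3);  /* with the truncate */
}

static uint32_t extract_folded(uint64_t v) {
    return ((uint32_t)(v >> 32) >> 18) & 3;     /* shrl $18 ; andl $3 */
}
```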
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.126 -> 1.127 --- Log message: add compilable testcase --- Diffs of the changes: (+6 -1) README.txt |7 ++- 1 files changed, 6 insertions(+), 1 deletion(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.126 llvm/lib/Target/X86/README.txt:1.127 --- llvm/lib/Target/X86/README.txt:1.126Mon Sep 11 00:35:17 2006 +++ llvm/lib/Target/X86/README.txt Mon Sep 11 17:57:51 2006 @@ -581,7 +581,12 @@ Codegen: -if ((variable == 4) || (variable == 6)) { stuff } +int f(int a, int b) { + if (a == 4 || a == 6) +b++; + return b; +} + as: ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.124 -> 1.125 --- Log message: Update README file. --- Diffs of the changes: (+3 -98) README.txt | 101 + 1 files changed, 3 insertions(+), 98 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.124 llvm/lib/Target/X86/README.txt:1.125 --- llvm/lib/Target/X86/README.txt:1.124Tue Aug 15 21:47:44 2006 +++ llvm/lib/Target/X86/README.txt Mon Sep 11 00:25:15 2006 @@ -80,15 +80,6 @@ //===-===// -Model X86 EFLAGS as a real register to avoid redudant cmp / test. e.g. - - cmpl $1, %eax - setg %al - testb %al, %al # unnecessary - jne .BB7 - -//===-===// - Count leading zeros and count trailing zeros: int clz(int X) { return __builtin_clz(X); } @@ -126,6 +117,8 @@ should be made smart enough to cannonicalize the load into the RHS of a compare when it can invert the result of the compare for free. +//===-===// + How about intrinsics? An example is: *res = _mm_mulhi_epu16(*A, _mm_mul_epu32(*B, *C)); @@ -140,51 +133,6 @@ //===-===// -The DAG Isel doesn't fold the loads into the adds in this testcase. The -pattern selector does. This is because the chain value of the load gets -selected first, and the loads aren't checking to see if they are only used by -and add. - -.ll: - -int %test(int* %x, int* %y, int* %z) { -%X = load int* %x -%Y = load int* %y -%Z = load int* %z -%a = add int %X, %Y -%b = add int %a, %Z -ret int %b -} - -dag isel: - -_test: -movl 4(%esp), %eax -movl (%eax), %eax -movl 8(%esp), %ecx -movl (%ecx), %ecx -addl %ecx, %eax -movl 12(%esp), %ecx -movl (%ecx), %ecx -addl %ecx, %eax -ret - -pattern isel: - -_test: -movl 12(%esp), %ecx -movl 4(%esp), %edx -movl 8(%esp), %eax -movl (%eax), %eax -addl (%edx), %eax -addl (%ecx), %eax -ret - -This is bad for register pressure, though the dag isel is producing a -better schedule. 
:) - -//===-===// - In many cases, LLVM generates code like this: _test: @@ -198,7 +146,7 @@ _test: movl 8(%esp), %ebx - xor %eax, %eax +xor %eax, %eax cmpl %ebx, 4(%esp) setl %al ret @@ -207,38 +155,6 @@ //===-===// -We should generate 'test' instead of 'cmp' in various cases, e.g.: - -bool %test(int %X) { -%Y = shl int %X, ubyte 1 -%C = seteq int %Y, 0 -ret bool %C -} -bool %test(int %X) { -%Y = and int %X, 8 -%C = seteq int %Y, 0 -ret bool %C -} - -This may just be a matter of using 'test' to write bigger patterns for X86cmp. - -An important case is comparison against zero: - -if (X == 0) ... - -instead of: - - cmpl $0, %eax - je LBB4_2 #cond_next - -use: - test %eax, %eax - jz LBB4_2 - -which is smaller. - -//===-===// - We should generate bts/btr/etc instructions on targets where they are cheap or when codesize is important. e.g., for: @@ -564,17 +480,6 @@ //===-===// -Some ideas for instruction selection code simplification: 1. A pre-pass to -determine which chain producing node can or cannot be folded. The generated -isel code would then use the information. 2. The same pre-pass can force -ordering of TokenFactor operands to allow load / store folding. 3. During isel, -instead of recursively going up the chain operand chain, mark the chain operand -as available and put it in some work list. Select other nodes in the normal -manner. The chain operands are selected after all other nodes are selected. Uses -of chain nodes are modified after instruction selection is completed. - -//===-===// - Another instruction selector deficiency: void %bar() { ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.123 -> 1.124 --- Log message: add a note --- Diffs of the changes: (+13 -0) README.txt | 13 + 1 files changed, 13 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.123 llvm/lib/Target/X86/README.txt:1.124 --- llvm/lib/Target/X86/README.txt:1.123Wed Aug 2 00:31:20 2006 +++ llvm/lib/Target/X86/README.txt Tue Aug 15 21:47:44 2006 @@ -709,3 +709,16 @@ When using fastcc abi, align stack slot of argument of type double on 8 byte boundary to improve performance. + +//===-===// + +Codegen: + +if ((variable == 4) || (variable == 6)) { stuff } + +as: + +or eax, 2 +cmp eax, 6 +jz label + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.122 -> 1.123 --- Log message: Update the readme to remove duplicate information and clarify the loop problem. --- Diffs of the changes: (+20 -45) README.txt | 65 ++--- 1 files changed, 20 insertions(+), 45 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.122 llvm/lib/Target/X86/README.txt:1.123 --- llvm/lib/Target/X86/README.txt:1.122Wed Jul 26 16:49:52 2006 +++ llvm/lib/Target/X86/README.txt Wed Aug 2 00:31:20 2006 @@ -198,7 +198,7 @@ _test: movl 8(%esp), %ebx - xor %eax, %eax + xor %eax, %eax cmpl %ebx, 4(%esp) setl %al ret @@ -340,22 +340,6 @@ //===-===// -Investigate whether it is better to codegen the following - -%tmp.1 = mul int %x, 9 -to - - movl4(%esp), %eax - leal(%eax,%eax,8), %eax - -as opposed to what llc is currently generating: - - imull $9, 4(%esp), %eax - -Currently the load folding imull has a higher complexity than the LEA32 pattern. - -//===-===// - We are currently lowering large (1MB+) memmove/memcpy to rep/stosl and rep/movsl We should leave these as libcalls for everything over a much lower threshold, since libc is hand tuned for medium and large mem ops (avoiding RFO for large @@ -671,35 +655,26 @@ //===-===// -Consider: -int foo(int *a, int t) { -int x; -for (x=0; x<40; ++x) - t = t + a[x] + x; -return t; -} - -We generate: -LBB1_1: #cond_true -movl %ecx, %esi -movl (%edx,%eax,4), %edi -movl %esi, %ecx -addl %edi, %ecx -addl %eax, %ecx -incl %eax -cmpl $40, %eax -jne LBB1_1 #cond_true - -GCC generates: - -L2: -addl(%ecx,%edx,4), %eax -addl%edx, %eax -addl$1, %edx -cmpl$40, %edx -jne L2 +int %foo(int* %a, int %t) { +entry: +br label %cond_true + +cond_true: ; preds = %cond_true, %entry +%x.0.0 = phi int [ 0, %entry ], [ %tmp9, %cond_true ] ; [#uses=3] +%t_addr.0.0 = phi int [ %t, %entry ], [ %tmp7, %cond_true ] ; [#uses=1] +%tmp2 = getelementptr int* %a, int %x.0.0 ; [#uses=1] +%tmp3 = load int* %tmp2 ; [#uses=1] +%tmp5 = add int 
%t_addr.0.0, %x.0.0 ; [#uses=1] +%tmp7 = add int %tmp5, %tmp3; [#uses=2] +%tmp9 = add int %x.0.0, 1 ; [#uses=2] +%tmp = setgt int %tmp9, 39 ; [#uses=1] +br bool %tmp, label %bb12, label %cond_true + +bb12: ; preds = %cond_true +ret int %tmp7 +} -Smells like a register coallescing/reassociation issue. +is pessimized by -loop-reduce and -indvars //===-===// ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.121 -> 1.122 --- Log message: New entry. --- Diffs of the changes: (+5 -0) README.txt |5 + 1 files changed, 5 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.121 llvm/lib/Target/X86/README.txt:1.122 --- llvm/lib/Target/X86/README.txt:1.121Fri Jul 21 18:07:23 2006 +++ llvm/lib/Target/X86/README.txt Wed Jul 26 16:49:52 2006 @@ -729,3 +729,8 @@ 002aflds(%esp,1) 002daddl$0x04,%esp 0030ret + +//===-===// + +When using fastcc abi, align stack slot of argument of type double on 8 byte +boundary to improve performance. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.120 -> 1.121 --- Log message: Done. --- Diffs of the changes: (+0 -5) README.txt |5 - 1 files changed, 5 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.120 llvm/lib/Target/X86/README.txt:1.121 --- llvm/lib/Target/X86/README.txt:1.120Wed Jul 19 16:29:30 2006 +++ llvm/lib/Target/X86/README.txt Fri Jul 21 18:07:23 2006 @@ -707,11 +707,6 @@ //===-===// -JIT should resolve __cxa_atexit on Mac OS X. In a non-jit environment, the -symbol is dynamically resolved by the linker. - -//===-===// - u32 to float conversion improvement: float uint32_2_float( unsigned u ) { ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.119 -> 1.120 --- Log message: New entry. --- Diffs of the changes: (+25 -0) README.txt | 25 + 1 files changed, 25 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.119 llvm/lib/Target/X86/README.txt:1.120 --- llvm/lib/Target/X86/README.txt:1.119Wed Jul 19 01:06:24 2006 +++ llvm/lib/Target/X86/README.txt Wed Jul 19 16:29:30 2006 @@ -709,3 +709,28 @@ JIT should resolve __cxa_atexit on Mac OS X. In a non-jit environment, the symbol is a dynamically resolved by the linker. + +//===-===// + +u32 to float conversion improvement: + +float uint32_2_float( unsigned u ) { + float fl = (int) (u & 0x); + float fh = (int) (u >> 16); + fh *= 0x1.0p16f; + return fh + fl; +} + +subl$0x04,%esp +0003movl0x08(%esp,1),%eax +0007movl%eax,%ecx +0009shrl$0x10,%ecx +000ccvtsi2ss%ecx,%xmm0 +0010andl$0x,%eax +0015cvtsi2ss%eax,%xmm1 +0019mulss 0x0078,%xmm0 +0021addss %xmm1,%xmm0 +0025movss %xmm0,(%esp,1) +002aflds(%esp,1) +002daddl$0x04,%esp +0030ret ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
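The conversion trick in that entry splits the u32 into two 16-bit halves that are each exact in float, so only the final add can round. A standalone C sketch of it follows; the mask constant was truncated in the archived diff above, so the 0xFFFF below is assumed from the low/high 16-bit split the note describes:

```c
#include <assert.h>

/* u32 -> float via two exact 16-bit halves; only the final add rounds. */
static float uint32_2_float(unsigned u) {
    float fl = (int)(u & 0xFFFF); /* low 16 bits, exact in float */
    float fh = (int)(u >> 16);    /* high 16 bits, exact in float */
    fh *= 0x1.0p16f;              /* scale the high half by 2^16 */
    return fh + fl;
}
```

Both casts go through int so the compiler can use plain cvtsi2ss, which is the point of the transformation.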
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.118 -> 1.119 --- Log message: Misc. new entry. --- Diffs of the changes: (+5 -0) README.txt |5 + 1 files changed, 5 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.118 llvm/lib/Target/X86/README.txt:1.119 --- llvm/lib/Target/X86/README.txt:1.118Fri Jun 16 19:45:49 2006 +++ llvm/lib/Target/X86/README.txt Wed Jul 19 01:06:24 2006 @@ -704,3 +704,8 @@ //===-===// Use cpuid to auto-detect CPU features such as SSE, SSE2, and SSE3. + +//===-===// + +JIT should resolve __cxa_atexit on Mac OS X. In a non-JIT environment, the +symbol is dynamically resolved by the linker.
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.117 -> 1.118 --- Log message: A new entry. --- Diffs of the changes: (+2 -0) README.txt |2 ++ 1 files changed, 2 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.117 llvm/lib/Target/X86/README.txt:1.118 --- llvm/lib/Target/X86/README.txt:1.117Thu Jun 15 16:33:31 2006 +++ llvm/lib/Target/X86/README.txt Fri Jun 16 19:45:49 2006 @@ -702,3 +702,5 @@ Smells like a register coalescing/reassociation issue. //===-===// + +Use cpuid to auto-detect CPU features such as SSE, SSE2, and SSE3.
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.116 -> 1.117 --- Log message: Add a note that Nate noticed. --- Diffs of the changes: (+34 -0) README.txt | 34 ++ 1 files changed, 34 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.116 llvm/lib/Target/X86/README.txt:1.117 --- llvm/lib/Target/X86/README.txt:1.116Sun Jun 4 04:08:00 2006 +++ llvm/lib/Target/X86/README.txt Thu Jun 15 16:33:31 2006 @@ -668,3 +668,37 @@ //===-===// We should handle __attribute__ ((__visibility__ ("hidden"))). + +//===-===// + +Consider: +int foo(int *a, int t) { +int x; +for (x=0; x<40; ++x) + t = t + a[x] + x; +return t; +} + +We generate: +LBB1_1: #cond_true +movl %ecx, %esi +movl (%edx,%eax,4), %edi +movl %esi, %ecx +addl %edi, %ecx +addl %eax, %ecx +incl %eax +cmpl $40, %eax +jne LBB1_1 #cond_true + +GCC generates: + +L2: +addl(%ecx,%edx,4), %eax +addl%edx, %eax +addl$1, %edx +cmpl$40, %edx +jne L2 + +Smells like a register coalescing/reassociation issue. + +//===-===//
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.115 -> 1.116 --- Log message: A few new entries. --- Diffs of the changes: (+19 -0) README.txt | 19 +++ 1 files changed, 19 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.115 llvm/lib/Target/X86/README.txt:1.116 --- llvm/lib/Target/X86/README.txt:1.115Tue May 30 02:37:37 2006 +++ llvm/lib/Target/X86/README.txt Sun Jun 4 04:08:00 2006 @@ -538,6 +538,9 @@ sarl $24, %eax ret +SIGN_EXTEND_INREG can be implemented as (sext (trunc)) to take advantage of +sub-registers. + //===-===// Consider this: @@ -649,3 +652,19 @@ However, if we care more about code size, then imull is better. It's two bytes shorter than movl + leal. + +//===-===// + +Implement CTTZ, CTLZ with bsf and bsr. + +//===-===// + +It appears gcc place string data with linkonce linkage in +.section __TEXT,__const_coal,coalesced instead of +.section __DATA,__const_coal,coalesced. +Take a look at darwin.h, there are other Darwin assembler directives that we +do not make use of. + +//===-===// + +We should handle __attribute__ ((__visibility__ ("hidden"))). ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
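On the CTTZ/CTLZ entry in that diff: bsf and bsr return the index of the lowest/highest set bit, which maps directly onto count-trailing-zeros and count-leading-zeros. A hedged C sketch using the GCC-style builtins (like bsf/bsr, they are undefined for a zero input):

```c
#include <assert.h>

/* CTTZ: index of the lowest set bit, which is exactly what bsf returns.
   CTLZ: leading zeros; bsr gives the highest set bit's index, so for a
   32-bit value ctlz(x) == 31 - bsr(x). x must be nonzero in both cases. */
static int cttz32(unsigned x) { return __builtin_ctz(x); }
static int ctlz32(unsigned x) { return __builtin_clz(x); }
```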
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.114 -> 1.115 --- Log message: Add a note about integer multiplication by constants. --- Diffs of the changes: (+27 -0) README.txt | 27 +++ 1 files changed, 27 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.114 llvm/lib/Target/X86/README.txt:1.115 --- llvm/lib/Target/X86/README.txt:1.114Tue May 30 01:23:50 2006 +++ llvm/lib/Target/X86/README.txt Tue May 30 02:37:37 2006 @@ -622,3 +622,30 @@ operand? i.e. Print as 32-bit super-class register / 16-bit sub-class register. Do this for the cases where a truncate / anyext is guaranteed to be eliminated. For IA32 that is truncate from 32 to 16 and anyext from 16 to 32. + +//===-===// + +For this: + +int test(int a) +{ + return a * 3; +} + +We currently emit + imull $3, 4(%esp), %eax + +Perhaps this is what we really should generate? Is imull three or four +cycles? Note: ICC generates this: + movl4(%esp), %eax + leal(%eax,%eax,2), %eax + +The current instruction priority is based on pattern complexity. The former is +more "complex" because it folds a load so the latter will not be emitted. + +Perhaps we should use AddedComplexity to give LEA32r a higher priority? We +should always try to match LEA first since the LEA matching code does some +estimate to determine whether the match is profitable. + +However, if we care more about code size, then imull is better. It's two bytes +shorter than movl + leal.
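Both sequences in that entry compute a*3; the leal form folds the multiply into the scaled-index address computation. A small C equivalence check (a reasonable compiler will turn either body into one of the two instruction sequences):

```c
#include <assert.h>

/* leal (%eax,%eax,2), %eax computes eax + eax*2, i.e. a*3. */
static int mul3_lea(int a)  { return a + 2 * a; }
static int mul3_imul(int a) { return a * 3; }
```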
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.113 -> 1.114 --- Log message: Some new entries about truncate / anyext --- Diffs of the changes: (+16 -0) README.txt | 16 1 files changed, 16 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.113 llvm/lib/Target/X86/README.txt:1.114 --- llvm/lib/Target/X86/README.txt:1.113Mon May 22 00:54:49 2006 +++ llvm/lib/Target/X86/README.txt Tue May 30 01:23:50 2006 @@ -606,3 +606,19 @@ The current isel scheme will not allow the load to be folded in the call since the load's chain result is read by the callseq_start. + +//===-===// + +Don't forget to find a way to squash noop truncates in the JIT environment. + +//===-===// + +Implement anyext in the same manner as truncate that would allow them to be +eliminated. + +//===-===// + +How about implementing truncate / anyext as a property of machine instruction +operand? i.e. Print as 32-bit super-class register / 16-bit sub-class register. +Do this for the cases where a truncate / anyext is guaranteed to be eliminated. +For IA32 that is truncate from 32 to 16 and anyext from 16 to 32. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.111 -> 1.112 --- Log message: A new entry --- Diffs of the changes: (+10 -0) README.txt | 10 ++ 1 files changed, 10 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.111 llvm/lib/Target/X86/README.txt:1.112 --- llvm/lib/Target/X86/README.txt:1.111Fri May 19 15:55:31 2006 +++ llvm/lib/Target/X86/README.txt Sat May 20 02:44:53 2006 @@ -577,3 +577,13 @@ //===-===// +Some ideas for instruction selection code simplification: 1. A pre-pass to +determine which chain producing node can or cannot be folded. The generated +isel code would then use the information. 2. The same pre-pass can force +ordering of TokenFactor operands to allow load / store folding. 3. During isel, +instead of recursively going up the chain operand chain, mark the chain operand +as available and put it in some work list. Select other nodes in the normal +manner. The chain operands are selected after all other nodes are selected. Uses +of chain nodes are modified after instruction selection is completed. + + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.110 -> 1.111 --- Log message: Add a note --- Diffs of the changes: (+38 -0) README.txt | 38 ++ 1 files changed, 38 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.110 llvm/lib/Target/X86/README.txt:1.111 --- llvm/lib/Target/X86/README.txt:1.110Fri May 19 15:51:43 2006 +++ llvm/lib/Target/X86/README.txt Fri May 19 15:55:31 2006 @@ -539,3 +539,41 @@ ret //===-===// + +Consider this: + +typedef struct pair { float A, B; } pair; +void pairtest(pair P, float *FP) { +*FP = P.A+P.B; +} + +We currently generate this code with llvmgcc4: + +_pairtest: +subl $12, %esp +movl 20(%esp), %eax +movl %eax, 4(%esp) +movl 16(%esp), %eax +movl %eax, (%esp) +movss (%esp), %xmm0 +addss 4(%esp), %xmm0 +movl 24(%esp), %eax +movss %xmm0, (%eax) +addl $12, %esp +ret + +we should be able to generate: +_pairtest: +movss 4(%esp), %xmm0 +movl 12(%esp), %eax +addss 8(%esp), %xmm0 +movss %xmm0, (%eax) +ret + +The issue is that llvmgcc4 is forcing the struct to memory, then passing it as +integer chunks. It does this so that structs like {short,short} are passed in +a single 32-bit integer stack slot. We should handle the safe cases above much +nicer, while still handling the hard cases. + +//===-===// + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.107 -> 1.108 --- Log message: Particularly ugly code. --- Diffs of the changes: (+14 -0) README.txt | 14 ++ 1 files changed, 14 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.107 llvm/lib/Target/X86/README.txt:1.108 --- llvm/lib/Target/X86/README.txt:1.107Thu May 18 12:38:16 2006 +++ llvm/lib/Target/X86/README.txt Fri May 19 14:41:33 2006 @@ -36,6 +36,20 @@ //===-===// +On darwin/x86, we should codegen: + +ret double 0.00e+00 + +as fld0/ret, not as: + +movl $0, 4(%esp) +movl $0, (%esp) +fldl (%esp) + ... +ret + +//===-===// + This should use fiadd on chips where it is profitable: double foo(double P, int *I) { return P+*I; } ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.106 -> 1.107 --- Log message: add a note --- Diffs of the changes: (+15 -0) README.txt | 15 +++ 1 files changed, 15 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.106 llvm/lib/Target/X86/README.txt:1.107 --- llvm/lib/Target/X86/README.txt:1.106Wed May 17 16:20:51 2006 +++ llvm/lib/Target/X86/README.txt Thu May 18 12:38:16 2006 @@ -380,6 +380,21 @@ This may just be a matter of using 'test' to write bigger patterns for X86cmp. +An important case is comparison against zero: + +if (X == 0) ... + +instead of: + + cmpl $0, %eax + je LBB4_2 #cond_next + +use: + test %eax, %eax + jz LBB4_2 + +which is smaller. + //===-===// SSE should implement 'select_cc' using 'emulated conditional moves' that use ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.105 -> 1.106 --- Log message: Another entry --- Diffs of the changes: (+9 -0) README.txt |9 + 1 files changed, 9 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.105 llvm/lib/Target/X86/README.txt:1.106 --- llvm/lib/Target/X86/README.txt:1.105Wed May 17 14:05:31 2006 +++ llvm/lib/Target/X86/README.txt Wed May 17 16:20:51 2006 @@ -1183,3 +1183,12 @@ shll $24, %eax sarl $24, %eax ret + +//===-===// + +Some useful information in the Apple Altivec / SSE Migration Guide: + +http://developer.apple.com/documentation/Performance/Conceptual/ +Accelerate_sse_migration/index.html + +e.g. SSE select using and, andnot, or. Various SSE compare translations. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.104 -> 1.105 --- Log message: Another entry --- Diffs of the changes: (+12 -0) README.txt | 12 1 files changed, 12 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.104 llvm/lib/Target/X86/README.txt:1.105 --- llvm/lib/Target/X86/README.txt:1.104Tue May 9 01:54:05 2006 +++ llvm/lib/Target/X86/README.txt Wed May 17 14:05:31 2006 @@ -1171,3 +1171,15 @@ ret or use pxor (to make a zero vector) and shuffle (to insert it). + +//===-===// + +Bad codegen: + +char foo(int x) { return x; } + +_foo: + movl 4(%esp), %eax + shll $24, %eax + sarl $24, %eax + ret ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
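The shll $24 / sarl $24 pair in that entry is the generic lowering of a 32-to-8-bit truncate followed by sign extension. In C the same value comes from a cast through signed char; the narrowing conversion is implementation-defined for out-of-range values, so two's-complement wrap-around (as on x86) is assumed below:

```c
#include <assert.h>

/* Sign-extend the low 8 bits of x, which is what foo's
   shll $24 / sarl $24 pair computes. The int -> signed char narrowing
   is implementation-defined for values above 127; two's-complement
   wrap (as x86 compilers implement it) is assumed. */
static int sext8(int x) { return (signed char)(x & 0xFF); }
```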
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.103 -> 1.104 --- Log message: Remove a completed entry. --- Diffs of the changes: (+0 -42) README.txt | 42 -- 1 files changed, 42 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.103 llvm/lib/Target/X86/README.txt:1.104 --- llvm/lib/Target/X86/README.txt:1.103Mon May 8 16:39:45 2006 +++ llvm/lib/Target/X86/README.txt Tue May 9 01:54:05 2006 @@ -1126,48 +1126,6 @@ //===-===// -This testcase: - -%G1 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] -%G2 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] -%G3 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] -%G4 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] - -implementation ; Functions: - -void %test() { -%tmp = load <4 x float>* %G1; <<4 x float>> [#uses=2] -%tmp2 = load <4 x float>* %G2 ; <<4 x float>> [#uses=2] -%tmp135 = shufflevector <4 x float> %tmp, <4 x float> %tmp2, <4 x uint> < uint 0, uint 4, uint 1, uint 5 >; <<4 x float>> [#uses=1] -store <4 x float> %tmp135, <4 x float>* %G3 -%tmp293 = shufflevector <4 x float> %tmp, <4 x float> %tmp2, <4 x uint> < uint 1, uint undef, uint 3, uint 4 >; <<4 x float>> [#uses=1] -store <4 x float> %tmp293, <4 x float>* %G4 -ret void -} - -Compiles (llc -march=x86 -mcpu=yonah -relocation-model=static) to: - -_test: -movaps _G2, %xmm0 -movaps _G1, %xmm1 -movaps %xmm1, %xmm2 -2) shufps $3, %xmm0, %xmm2 -movaps %xmm1, %xmm3 -2) shufps $1, %xmm0, %xmm3 -1) unpcklps %xmm0, %xmm1 -2) shufps $128, %xmm2, %xmm3 -1) movaps %xmm1, _G3 -movaps %xmm3, _G4 -ret - -The 1) marked instructions could be scheduled better for reduced register -pressure. The scheduling issue is more pronounced without -static. - -The 2) marked instructions are the lowered form of the 1,undef,3,4 -shufflevector. 
It seems that there should be a better way to do it :) - -//===-===// - If shorter, we should use things like: movzwl %ax, %eax instead of: ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.102 -> 1.103 --- Log message: Another bad case I noticed --- Diffs of the changes: (+37 -0) README.txt | 37 + 1 files changed, 37 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.102 llvm/lib/Target/X86/README.txt:1.103 --- llvm/lib/Target/X86/README.txt:1.102Mon May 8 16:24:21 2006 +++ llvm/lib/Target/X86/README.txt Mon May 8 16:39:45 2006 @@ -1176,3 +1176,40 @@ The former can also be used when the two-addressy nature of the 'and' would require a copy to be inserted (in X86InstrInfo::convertToThreeAddress). +//===-===// + +This code generates ugly code, probably due to costs being off or something: + +void %test(float* %P, <4 x float>* %P2 ) { +%xFloat0.688 = load float* %P +%loadVector37.712 = load <4 x float>* %P2 +%inFloat3.713 = insertelement <4 x float> %loadVector37.712, float 0.00e+00, uint 3 +store <4 x float> %inFloat3.713, <4 x float>* %P2 +ret void +} + +Generates: + +_test: +pxor %xmm0, %xmm0 +movd %xmm0, %eax;; EAX = 0! +movl 8(%esp), %ecx +movaps (%ecx), %xmm0 +pinsrw $6, %eax, %xmm0 +shrl $16, %eax ;; EAX = 0 again! +pinsrw $7, %eax, %xmm0 +movaps %xmm0, (%ecx) +ret + +It would be better to generate: + +_test: +movl 8(%esp), %ecx +movaps (%ecx), %xmm0 + xor %eax, %eax +pinsrw $6, %eax, %xmm0 +pinsrw $7, %eax, %xmm0 +movaps %xmm0, (%ecx) +ret + +or use pxor (to make a zero vector) and shuffle (to insert it). ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.101 -> 1.102 --- Log message: add a note --- Diffs of the changes: (+9 -0) README.txt |9 + 1 files changed, 9 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.101 llvm/lib/Target/X86/README.txt:1.102 --- llvm/lib/Target/X86/README.txt:1.101Tue May 2 17:43:31 2006 +++ llvm/lib/Target/X86/README.txt Mon May 8 16:24:21 2006 @@ -1166,4 +1166,13 @@ The 2) marked instructions are the lowered form of the 1,undef,3,4 shufflevector. It seems that there should be a better way to do it :) +//===-===// + +If shorter, we should use things like: +movzwl %ax, %eax +instead of: +andl $65535, %EAX + +The former can also be used when the two-addressy nature of the 'and' would +require a copy to be inserted (in X86InstrInfo::convertToThreeAddress). ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
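The two instructions in that note are interchangeable zero-extensions; in C terms, masking with 65535 and casting through a 16-bit unsigned short (as on IA-32) produce the same value:

```c
#include <assert.h>

/* andl $65535, %eax and movzwl %ax, %eax both compute x & 0xFFFF;
   the cast form is what codegen lowers to movzwl. */
static unsigned zext16_and(unsigned x)  { return x & 65535u; }
static unsigned zext16_cast(unsigned x) { return (unsigned short)x; }
```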
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.100 -> 1.101 --- Log message: Remove some stuff from the README --- Diffs of the changes: (+0 -21) README.txt | 21 - 1 files changed, 21 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.100 llvm/lib/Target/X86/README.txt:1.101 --- llvm/lib/Target/X86/README.txt:1.100Thu Apr 27 16:40:57 2006 +++ llvm/lib/Target/X86/README.txt Tue May 2 17:43:31 2006 @@ -1126,27 +1126,6 @@ //===-===// -typedef short v8i16 __attribute__ ((__vector_size__ (16))); -v8i16 test(v8i16 x, v8i16 y) { - return x + y; -} - -compiles to - -_test: - paddw %xmm0, %xmm1 - movaps %xmm1, %xmm0 - ret - -It should be - - paddw %xmm1, %xmm0 - ret - -since paddw is commutative. - -//===-===// - This testcase: %G1 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.99 -> 1.100 --- Log message: Add a note --- Diffs of the changes: (+44 -0) README.txt | 44 1 files changed, 44 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.99 llvm/lib/Target/X86/README.txt:1.100 --- llvm/lib/Target/X86/README.txt:1.99 Thu Apr 27 03:31:33 2006 +++ llvm/lib/Target/X86/README.txt Thu Apr 27 16:40:57 2006 @@ -1144,3 +1144,47 @@ ret since paddw is commutative. + +//===-===// + +This testcase: + +%G1 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] +%G2 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] +%G3 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] +%G4 = weak global <4 x float> zeroinitializer ; <<4 x float>*> [#uses=1] + +implementation ; Functions: + +void %test() { +%tmp = load <4 x float>* %G1; <<4 x float>> [#uses=2] +%tmp2 = load <4 x float>* %G2 ; <<4 x float>> [#uses=2] +%tmp135 = shufflevector <4 x float> %tmp, <4 x float> %tmp2, <4 x uint> < uint 0, uint 4, uint 1, uint 5 >; <<4 x float>> [#uses=1] +store <4 x float> %tmp135, <4 x float>* %G3 +%tmp293 = shufflevector <4 x float> %tmp, <4 x float> %tmp2, <4 x uint> < uint 1, uint undef, uint 3, uint 4 >; <<4 x float>> [#uses=1] +store <4 x float> %tmp293, <4 x float>* %G4 +ret void +} + +Compiles (llc -march=x86 -mcpu=yonah -relocation-model=static) to: + +_test: +movaps _G2, %xmm0 +movaps _G1, %xmm1 +movaps %xmm1, %xmm2 +2) shufps $3, %xmm0, %xmm2 +movaps %xmm1, %xmm3 +2) shufps $1, %xmm0, %xmm3 +1) unpcklps %xmm0, %xmm1 +2) shufps $128, %xmm2, %xmm3 +1) movaps %xmm1, _G3 +movaps %xmm3, _G4 +ret + +The 1) marked instructions could be scheduled better for reduced register +pressure. The scheduling issue is more pronounced without -static. + +The 2) marked instructions are the lowered form of the 1,undef,3,4 +shufflevector. 
It seems that there should be a better way to do it :) + + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.98 -> 1.99 --- Log message: A couple of new entries. --- Diffs of the changes: (+37 -0) README.txt | 37 + 1 files changed, 37 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.98 llvm/lib/Target/X86/README.txt:1.99 --- llvm/lib/Target/X86/README.txt:1.98 Mon Apr 24 18:30:10 2006 +++ llvm/lib/Target/X86/README.txt Thu Apr 27 03:31:33 2006 @@ -1107,3 +1107,40 @@ So icc is smart enough to know that B is in memory so it doesn't load it and store it back to stack. + +//===-===// + +__m128d test1( __m128d A, __m128d B) { + return _mm_shuffle_pd(A, B, 0x3); +} + +compiles to + +shufpd $3, %xmm1, %xmm0 + +Perhaps it's better to use unpckhpd instead? + +unpckhpd %xmm1, %xmm0 + +Don't know if unpckhpd is faster. But it is shorter. + +//===-===// + +typedef short v8i16 __attribute__ ((__vector_size__ (16))); +v8i16 test(v8i16 x, v8i16 y) { + return x + y; +} + +compiles to + +_test: + paddw %xmm0, %xmm1 + movaps %xmm1, %xmm0 + ret + +It should be + + paddw %xmm1, %xmm0 + ret + +since paddw is commutative. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.97 -> 1.98 --- Log message: Add a new entry. --- Diffs of the changes: (+32 -0) README.txt | 32 1 files changed, 32 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.97 llvm/lib/Target/X86/README.txt:1.98 --- llvm/lib/Target/X86/README.txt:1.97 Mon Apr 24 12:38:16 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 24 18:30:10 2006 @@ -1075,3 +1075,35 @@ There is also one case we do worse on PPC. //===-===// + +For this: + +#include +void test(__m128d *r, __m128d *A, double B) { + *r = _mm_loadl_pd(*A, &B); +} + +We generate: + + subl $12, %esp + movsd 24(%esp), %xmm0 + movsd %xmm0, (%esp) + movl 20(%esp), %eax + movapd (%eax), %xmm0 + movlpd (%esp), %xmm0 + movl 16(%esp), %eax + movapd %xmm0, (%eax) + addl $12, %esp + ret + +icc generates: + +movl 4(%esp), %edx #3.6 +movl 8(%esp), %eax #3.6 +movapd(%eax), %xmm0 #4.22 +movlpd12(%esp), %xmm0 #4.8 +movapd%xmm0, (%edx) #4.3 +ret #5.1 + +So icc is smart enough to know that B is in memory so it doesn't load it and +store it back to stack.
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.96 -> 1.97 --- Log message: Remove a completed entry. --- Diffs of the changes: (+0 -55) README.txt | 55 --- 1 files changed, 55 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.96 llvm/lib/Target/X86/README.txt:1.97 --- llvm/lib/Target/X86/README.txt:1.96 Sun Apr 23 14:47:09 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 24 12:38:16 2006 @@ -999,61 +999,6 @@ //===-===// -Use the 0's in the top part of movss from memory (and from other instructions -that generate them) to build vectors more efficiently. Consider: - -vector float test(float a) { - return (vector float){ 0.0, a, 0.0, 0.0}; -} - -We currently generate this as: - -_test: -sub %ESP, 28 -movss %XMM0, DWORD PTR [%ESP + 32] -movss DWORD PTR [%ESP + 4], %XMM0 -mov DWORD PTR [%ESP + 12], 0 -mov DWORD PTR [%ESP + 8], 0 -mov DWORD PTR [%ESP], 0 -movaps %XMM0, XMMWORD PTR [%ESP] -add %ESP, 28 -ret - -Something like this should be sufficient: - -_test: - movss %XMM0, DWORD PTR [%ESP + 4] - shufps %XMM0, %XMM0, 81 - ret - -... which takes advantage of the zero elements provided by movss. -Even xoring a register and shufps'ing IT would be better than the -above code. - -Likewise, for this: - -vector float test(float a, float b) { - return (vector float){ b, a, 0.0, 0.0}; -} - -_test: -pxor %XMM0, %XMM0 -movss %XMM1, %XMM0 -movss %XMM2, DWORD PTR [%ESP + 4] -unpcklps %XMM2, %XMM1 -movss %XMM0, DWORD PTR [%ESP + 8] -unpcklps %XMM0, %XMM1 -unpcklps %XMM0, %XMM2 -ret - -... where we do use pxor, it would be better to use the zero'd -elements that movss provides to turn this into 2 shufps's instead -of 3 unpcklps's. - -Another example: {0.0, 0.0, a, b } - -//===-===// - Consider: __m128 test(float a) { ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.95 -> 1.96 --- Log message: Add a note --- Diffs of the changes: (+7 -0) README.txt |7 +++ 1 files changed, 7 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.95 llvm/lib/Target/X86/README.txt:1.96 --- llvm/lib/Target/X86/README.txt:1.95 Fri Apr 21 16:05:22 2006 +++ llvm/lib/Target/X86/README.txt Sun Apr 23 14:47:09 2006 @@ -1123,3 +1123,10 @@ //===-===// +We generate significantly worse code for this than GCC: +http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21150 +http://gcc.gnu.org/bugzilla/attachment.cgi?id=8701 + +There is also one case we do worse on PPC. + +//===-===// ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.94 -> 1.95 --- Log message: fix thinko --- Diffs of the changes: (+2 -2) README.txt |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.94 llvm/lib/Target/X86/README.txt:1.95 --- llvm/lib/Target/X86/README.txt:1.94 Fri Apr 21 16:03:21 2006 +++ llvm/lib/Target/X86/README.txt Fri Apr 21 16:05:22 2006 @@ -1068,8 +1068,8 @@ movss %xmm1, %xmm0 ret -Because mulss multiplies 0*0 = 0.0, the top elements of xmm1 are already zerod. -We could compile this to: +Because mulss doesn't modify the top 3 elements, the top elements of +xmm1 are already zero'd. We could compile this to: movss 4(%esp), %xmm0 mulss %xmm0, %xmm0 ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.93 -> 1.94 --- Log message: add some low-prio notes --- Diffs of the changes: (+69 -0) README.txt | 69 + 1 files changed, 69 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.93 llvm/lib/Target/X86/README.txt:1.94 --- llvm/lib/Target/X86/README.txt:1.93 Wed Apr 19 00:53:27 2006 +++ llvm/lib/Target/X86/README.txt Fri Apr 21 16:03:21 2006 @@ -1054,3 +1054,72 @@ //===-===// +Consider: + +__m128 test(float a) { + return _mm_set_ps(0.0, 0.0, 0.0, a*a); +} + +This compiles into: + +movss 4(%esp), %xmm1 +mulss %xmm1, %xmm1 +xorps %xmm0, %xmm0 +movss %xmm1, %xmm0 +ret + +Because mulss multiplies 0*0 = 0.0, the top elements of xmm1 are already zerod. +We could compile this to: + +movss 4(%esp), %xmm0 +mulss %xmm0, %xmm0 +ret + +//===-===// + +Here's a sick and twisted idea. Consider code like this: + +__m128 test(__m128 a) { + float b = *(float*)&A; + ... + return _mm_set_ps(0.0, 0.0, 0.0, b); +} + +This might compile to this code: + +movaps c(%esp), %xmm1 +xorps %xmm0, %xmm0 +movss %xmm1, %xmm0 +ret + +Now consider if the ... code caused xmm1 to get spilled. This might produce +this code: + +movaps c(%esp), %xmm1 +movaps %xmm1, c2(%esp) +... + +xorps %xmm0, %xmm0 +movaps c2(%esp), %xmm1 +movss %xmm1, %xmm0 +ret + +However, since the reload is only used by these instructions, we could +"fold" it into the uses, producing something like this: + +movaps c(%esp), %xmm1 +movaps %xmm1, c2(%esp) +... + +movss c2(%esp), %xmm0 +ret + +... saving two instructions. + +The basic idea is that a reload from a spill slot, can, if only one 4-byte +chunk is used, bring in 3 zeros to the one element instead of 4 elements. +This can be used to simplify a variety of shuffle operations, where the +elements are fixed zeros. + +//===-===// +
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.92 -> 1.93 --- Log message: Add a note. --- Diffs of the changes: (+58 -0) README.txt | 58 ++ 1 files changed, 58 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.92 llvm/lib/Target/X86/README.txt:1.93 --- llvm/lib/Target/X86/README.txt:1.92 Mon Apr 17 22:45:01 2006 +++ llvm/lib/Target/X86/README.txt Wed Apr 19 00:53:27 2006 @@ -996,3 +996,61 @@ movaps %xmm3, %xmm2 movaps %xmm4, %xmm3 jne LBB_main_4 # cond_true44 + +//===-===// + +Use the 0's in the top part of movss from memory (and from other instructions +that generate them) to build vectors more efficiently. Consider: + +vector float test(float a) { + return (vector float){ 0.0, a, 0.0, 0.0}; +} + +We currently generate this as: + +_test: +sub %ESP, 28 +movss %XMM0, DWORD PTR [%ESP + 32] +movss DWORD PTR [%ESP + 4], %XMM0 +mov DWORD PTR [%ESP + 12], 0 +mov DWORD PTR [%ESP + 8], 0 +mov DWORD PTR [%ESP], 0 +movaps %XMM0, XMMWORD PTR [%ESP] +add %ESP, 28 +ret + +Something like this should be sufficient: + +_test: + movss %XMM0, DWORD PTR [%ESP + 4] + shufps %XMM0, %XMM0, 81 + ret + +... which takes advantage of the zero elements provided by movss. +Even xoring a register and shufps'ing IT would be better than the +above code. + +Likewise, for this: + +vector float test(float a, float b) { + return (vector float){ b, a, 0.0, 0.0}; +} + +_test: +pxor %XMM0, %XMM0 +movss %XMM1, %XMM0 +movss %XMM2, DWORD PTR [%ESP + 4] +unpcklps %XMM2, %XMM1 +movss %XMM0, DWORD PTR [%ESP + 8] +unpcklps %XMM0, %XMM1 +unpcklps %XMM0, %XMM2 +ret + +... where we do use pxor, it would be better to use the zero'd +elements that movss provides to turn this into 2 shufps's instead +of 3 unpcklps's. + +Another example: {0.0, 0.0, a, b } + +//===-===// + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.91 -> 1.92 --- Log message: Correct comments --- Diffs of the changes: (+6 -6) README.txt | 12 ++-- 1 files changed, 6 insertions(+), 6 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.91 llvm/lib/Target/X86/README.txt:1.92 --- llvm/lib/Target/X86/README.txt:1.91 Mon Apr 17 20:22:57 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 17 22:45:01 2006 @@ -982,17 +982,17 @@ jne LBB_main_4 # cond_true44 There are two problems. 1) No need to two loop induction variables. We can -compare against 262144 * 16. 2) Poor register allocation decisions. We should +compare against 262144 * 16. 2) Known register coalescer issue. We should be able eliminate one of the movaps: - addps %xmm1, %xmm2 - subps %xmm3, %xmm2 + addps %xmm2, %xmm1<=== Commute! + subps %xmm3, %xmm1 movaps (%ecx), %xmm4 - movaps %xmm2, %xmm2 <=== Eliminate! - addps %xmm4, %xmm2 + movaps %xmm1, %xmm1 <=== Eliminate! + addps %xmm4, %xmm1 addl $16, %ecx incl %edx cmpl $262144, %edx - movaps %xmm3, %xmm1 + movaps %xmm3, %xmm2 movaps %xmm4, %xmm3 jne LBB_main_4 # cond_true44 ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.90 -> 1.91 --- Log message: Another entry --- Diffs of the changes: (+35 -0) README.txt | 35 +++ 1 files changed, 35 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.90 llvm/lib/Target/X86/README.txt:1.91 --- llvm/lib/Target/X86/README.txt:1.90 Mon Apr 17 19:21:01 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 17 20:22:57 2006 @@ -961,3 +961,38 @@ to three-address transformation. It also exposes some other problems. See MOV32ri -3 and the spills. + +//===-===// + +http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25500 + +LLVM is producing bad code. + +LBB_main_4:# cond_true44 + addps %xmm1, %xmm2 + subps %xmm3, %xmm2 + movaps (%ecx), %xmm4 + movaps %xmm2, %xmm1 + addps %xmm4, %xmm1 + addl $16, %ecx + incl %edx + cmpl $262144, %edx + movaps %xmm3, %xmm2 + movaps %xmm4, %xmm3 + jne LBB_main_4 # cond_true44 + +There are two problems. 1) No need to two loop induction variables. We can +compare against 262144 * 16. 2) Poor register allocation decisions. We should +be able eliminate one of the movaps: + + addps %xmm1, %xmm2 + subps %xmm3, %xmm2 + movaps (%ecx), %xmm4 + movaps %xmm2, %xmm2 <=== Eliminate! + addps %xmm4, %xmm2 + addl $16, %ecx + incl %edx + cmpl $262144, %edx + movaps %xmm3, %xmm1 + movaps %xmm4, %xmm3 + jne LBB_main_4 # cond_true44 ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
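Fix (1) from the note — fold the counter into the pointer by comparing against base plus 262144*16 bytes — can be sketched in C (scaled down and parameterized here; the function name and block layout are illustrative):

```c
/* Single-induction-variable form of the loop: the pointer both advances
   and serves as the exit test, so the separate incl/cmpl counter pair
   disappears.  Each "block" is 16 bytes (4 floats), matching addl $16. */
float sum_first_of_each_block(const float *base, int nblocks) {
    const float *p = base;
    const float *end = base + 4 * nblocks;  /* base + nblocks*16 bytes */
    float acc = 0.0f;
    for (; p != end; p += 4)                /* one IV: the pointer itself */
        acc += *p;
    return acc;
}
```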
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.89 -> 1.90 --- Log message: Another entry. --- Diffs of the changes: (+151 -0) README.txt | 151 + 1 files changed, 151 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.89 llvm/lib/Target/X86/README.txt:1.90 --- llvm/lib/Target/X86/README.txt:1.89 Sat Apr 15 00:37:34 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 17 19:21:01 2006 @@ -810,3 +810,154 @@ How about andps, andpd, and pand? Do we really care about the type of the packed elements? If not, why not always use the "ps" variants which are likely to be shorter. + +//===-===// + +We are emitting bad code for this: + +float %test(float* %V, int %I, int %D, float %V) { +entry: + %tmp = seteq int %D, 0 + br bool %tmp, label %cond_true, label %cond_false23 + +cond_true: + %tmp3 = getelementptr float* %V, int %I + %tmp = load float* %tmp3 + %tmp5 = setgt float %tmp, %V + %tmp6 = tail call bool %llvm.isunordered.f32( float %tmp, float %V ) + %tmp7 = or bool %tmp5, %tmp6 + br bool %tmp7, label %UnifiedReturnBlock, label %cond_next + +cond_next: + %tmp10 = add int %I, 1 + %tmp12 = getelementptr float* %V, int %tmp10 + %tmp13 = load float* %tmp12 + %tmp15 = setle float %tmp13, %V + %tmp16 = tail call bool %llvm.isunordered.f32( float %tmp13, float %V ) + %tmp17 = or bool %tmp15, %tmp16 + %retval = select bool %tmp17, float 0.00e+00, float 1.00e+00 + ret float %retval + +cond_false23: + %tmp28 = tail call float %foo( float* %V, int %I, int %D, float %V ) + ret float %tmp28 + +UnifiedReturnBlock:; preds = %cond_true + ret float 0.00e+00 +} + +declare bool %llvm.isunordered.f32(float, float) + +declare float %foo(float*, int, int, float) + + +It exposes a known load folding problem: + + movss (%edx,%ecx,4), %xmm1 + ucomiss %xmm1, %xmm0 + +As well as this: + +LBB_test_2:# cond_next + movss LCPI1_0, %xmm2 + pxor %xmm3, %xmm3 + ucomiss %xmm0, %xmm1 + jbe LBB_test_6 # cond_next +LBB_test_5:# cond_next + movaps %xmm2, 
%xmm3 +LBB_test_6:# cond_next + movss %xmm3, 40(%esp) + flds 40(%esp) + addl $44, %esp + ret + +Clearly it's unnecessary to clear %xmm3. It's also not clear why we are emitting +three moves (movss, movaps, movss). + +//===-===// + +External test Nurbs exposed some problems. Look for +__ZN15Nurbs_SSE_Cubic17TessellateSurfaceE, bb cond_next140. This is what icc +emits: + +movaps(%edx), %xmm2 #59.21 +movaps(%edx), %xmm5 #60.21 +movaps(%edx), %xmm4 #61.21 +movaps(%edx), %xmm3 #62.21 +movl 40(%ecx), %ebp#69.49 +shufps$0, %xmm2, %xmm5 #60.21 +movl 100(%esp), %ebx #69.20 +movl (%ebx), %edi #69.20 +imull %ebp, %edi#69.49 +addl (%eax), %edi #70.33 +shufps$85, %xmm2, %xmm4 #61.21 +shufps$170, %xmm2, %xmm3#62.21 +shufps$255, %xmm2, %xmm2#63.21 +lea (%ebp,%ebp,2), %ebx #69.49 +negl %ebx #69.49 +lea -3(%edi,%ebx), %ebx #70.33 +shll $4, %ebx #68.37 +addl 32(%ecx), %ebx#68.37 +testb $15, %bl #91.13 +jne L_B1.24 # Prob 5% #91.13 + +This is the llvm code after instruction scheduling: + +cond_next140 (0xa910740, LLVM BB @0xa90beb0): + %reg1078 = MOV32ri -3 + %reg1079 = ADD32rm %reg1078, %reg1068, 1, %NOREG, 0 + %reg1037 = MOV32rm %reg1024, 1, %NOREG, 40 + %reg1080 = IMUL32rr %reg1079, %reg1037 + %reg1081 = MOV32rm %reg1058, 1, %NOREG, 0 + %reg1038 = LEA32r %reg1081, 1, %reg1080, -3 + %reg1036 = MOV32rm %reg1024, 1, %NOREG, 32 + %reg1082 = SHL32ri %reg1038, 4 + %reg1039 = ADD32rr %reg1036, %reg1082 + %reg1083 = MOVAPSrm %reg1059, 1, %NOREG, 0 + %reg1034 = SHUFPSrr %reg1083, %reg1083, 170 + %reg1032 = SHUFPSrr %reg1083, %reg1083, 0 + %reg1035 = SHUFPSrr %reg1083, %reg1083, 255 + %reg1033 = S
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt X86ISelLowering.cpp X86InstrSSE.td
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.88 -> 1.89 X86ISelLowering.cpp updated: 1.167 -> 1.168 X86InstrSSE.td updated: 1.91 -> 1.92 --- Log message: Silly bug --- Diffs of the changes: (+11 -18) README.txt |5 - X86ISelLowering.cpp | 22 ++ X86InstrSSE.td |2 +- 3 files changed, 11 insertions(+), 18 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.88 llvm/lib/Target/X86/README.txt:1.89 --- llvm/lib/Target/X86/README.txt:1.88 Fri Apr 14 02:24:04 2006 +++ llvm/lib/Target/X86/README.txt Sat Apr 15 00:37:34 2006 @@ -810,8 +810,3 @@ How about andps, andpd, and pand? Do we really care about the type of the packed elements? If not, why not always use the "ps" variants which are likely to be shorter. - -//===-===// - -Make sure XMM registers are spilled to 128-bit locations (if not already) and -add vector SSE opcodes to X86RegisterInfo::foldMemoryOperand(). Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.167 llvm/lib/Target/X86/X86ISelLowering.cpp:1.168 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.167 Fri Apr 14 22:13:24 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Sat Apr 15 00:37:34 2006 @@ -1724,27 +1724,26 @@ return false; // Expect 1, 1, 3, 3 - unsigned NumNodes = 0; for (unsigned i = 0; i < 2; ++i) { SDOperand Arg = N->getOperand(i); if (Arg.getOpcode() == ISD::UNDEF) continue; assert(isa(Arg) && "Invalid VECTOR_SHUFFLE mask!"); unsigned Val = cast(Arg)->getValue(); if (Val != 1) return false; -NumNodes++; } + + bool HasHi = false; for (unsigned i = 2; i < 4; ++i) { SDOperand Arg = N->getOperand(i); if (Arg.getOpcode() == ISD::UNDEF) continue; assert(isa(Arg) && "Invalid VECTOR_SHUFFLE mask!"); unsigned Val = cast(Arg)->getValue(); if (Val != 3) return false; -NumNodes++; +HasHi = true; } - // Don't use movshdup if the resulting vector contains only one undef node. - // Use {p}shuf* instead. 
- return NumNodes > 1; + // Don't use movshdup if it can be done with a shufps. + return HasHi; } /// isMOVSLDUPMask - Return true if the specified VECTOR_SHUFFLE operand @@ -1756,27 +1755,26 @@ return false; // Expect 0, 0, 2, 2 - unsigned NumNodes = 0; for (unsigned i = 0; i < 2; ++i) { SDOperand Arg = N->getOperand(i); if (Arg.getOpcode() == ISD::UNDEF) continue; assert(isa(Arg) && "Invalid VECTOR_SHUFFLE mask!"); unsigned Val = cast(Arg)->getValue(); if (Val != 0) return false; -NumNodes++; } + + bool HasHi = false; for (unsigned i = 2; i < 4; ++i) { SDOperand Arg = N->getOperand(i); if (Arg.getOpcode() == ISD::UNDEF) continue; assert(isa(Arg) && "Invalid VECTOR_SHUFFLE mask!"); unsigned Val = cast(Arg)->getValue(); if (Val != 2) return false; -NumNodes++; +HasHi = true; } - // Don't use movsldup if the resulting vector contains only one undef node. - // Use {p}shuf* instead. - return NumNodes > 1; + // Don't use movshdup if it can be done with a shufps. + return HasHi; } /// isSplatMask - Return true if the specified VECTOR_SHUFFLE operand specifies Index: llvm/lib/Target/X86/X86InstrSSE.td diff -u llvm/lib/Target/X86/X86InstrSSE.td:1.91 llvm/lib/Target/X86/X86InstrSSE.td:1.92 --- llvm/lib/Target/X86/X86InstrSSE.td:1.91 Fri Apr 14 18:32:40 2006 +++ llvm/lib/Target/X86/X86InstrSSE.td Sat Apr 15 00:37:34 2006 @@ -169,7 +169,7 @@ // PDIi8 - SSE2 instructions with ImmT == Imm8 and TB and OpSize prefixes. // S3I - SSE3 instructions with TB and OpSize prefixes. // S3SI - SSE3 instructions with XS prefix. -// S3SI - SSE3 instructions with XD prefix. +// S3DI - SSE3 instructions with XD prefix. class SSI o, Format F, dag ops, string asm, list pattern> : I, XS, Requires<[HasSSE1]>; class SDI o, Format F, dag ops, string asm, list pattern> ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.87 -> 1.88 --- Log message: New entry --- Diffs of the changes: (+5 -0) README.txt |5 + 1 files changed, 5 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.87 llvm/lib/Target/X86/README.txt:1.88 --- llvm/lib/Target/X86/README.txt:1.87 Thu Apr 13 00:09:45 2006 +++ llvm/lib/Target/X86/README.txt Fri Apr 14 02:24:04 2006 @@ -810,3 +810,8 @@ How about andps, andpd, and pand? Do we really care about the type of the packed elements? If not, why not always use the "ps" variants which are likely to be shorter. + +//===-===// + +Make sure XMM registers are spilled to 128-bit locations (if not already) and +add vector SSE opcodes to X86RegisterInfo::foldMemoryOperand(). ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.86 -> 1.87 --- Log message: Update --- Diffs of the changes: (+12 -0) README.txt | 12 1 files changed, 12 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.86 llvm/lib/Target/X86/README.txt:1.87 --- llvm/lib/Target/X86/README.txt:1.86 Wed Apr 12 16:21:57 2006 +++ llvm/lib/Target/X86/README.txt Thu Apr 13 00:09:45 2006 @@ -191,6 +191,18 @@ should be made smart enough to cannonicalize the load into the RHS of a compare when it can invert the result of the compare for free. +How about intrinsics? An example is: + *res = _mm_mulhi_epu16(*A, _mm_mul_epu32(*B, *C)); + +compiles to + pmuludq (%eax), %xmm0 + movl 8(%esp), %eax + movdqa (%eax), %xmm1 + pmulhuw %xmm0, %xmm1 + +The transformation probably requires a X86 specific pass or a DAG combiner +target specific hook. + //===-===// LSR should be turned on for the X86 backend and tuned to take advantage of its ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt X86ISelLowering.cpp X86InstrSSE.td
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.85 -> 1.86 X86ISelLowering.cpp updated: 1.163 -> 1.164 X86InstrSSE.td updated: 1.79 -> 1.80 --- Log message: All "integer" logical ops (pand, por, pxor) are now promoted to v2i64. Clean up and fix various logical ops issues. --- Diffs of the changes: (+71 -146) README.txt |4 + X86ISelLowering.cpp | 45 - X86InstrSSE.td | 168 3 files changed, 71 insertions(+), 146 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.85 llvm/lib/Target/X86/README.txt:1.86 --- llvm/lib/Target/X86/README.txt:1.85 Mon Apr 10 16:51:03 2006 +++ llvm/lib/Target/X86/README.txt Wed Apr 12 16:21:57 2006 @@ -794,3 +794,7 @@ X86RegisterInfo::copyRegToReg() returns X86::MOVAPSrr for VR128. Is it possible to choose between movaps, movapd, and movdqa based on types of source and destination? + +How about andps, andpd, and pand? Do we really care about the type of the packed +elements? If not, why not always use the "ps" variants which are likely to be +shorter. 
Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.163 llvm/lib/Target/X86/X86ISelLowering.cpp:1.164 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.163 Wed Apr 12 12:12:36 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Wed Apr 12 16:21:57 2006 @@ -275,6 +275,9 @@ if (Subtarget->hasSSE1()) { addRegisterClass(MVT::v4f32, X86::VR128RegisterClass); +setOperationAction(ISD::AND,MVT::v4f32, Legal); +setOperationAction(ISD::OR, MVT::v4f32, Legal); +setOperationAction(ISD::XOR,MVT::v4f32, Legal); setOperationAction(ISD::ADD,MVT::v4f32, Legal); setOperationAction(ISD::SUB,MVT::v4f32, Legal); setOperationAction(ISD::MUL,MVT::v4f32, Legal); @@ -301,36 +304,43 @@ setOperationAction(ISD::SUB,MVT::v8i16, Legal); setOperationAction(ISD::SUB,MVT::v4i32, Legal); setOperationAction(ISD::MUL,MVT::v2f64, Legal); -setOperationAction(ISD::LOAD, MVT::v2f64, Legal); + setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v16i8, Custom); setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v8i16, Custom); +setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v8i16, Custom); + +// Custom lower build_vector, vector_shuffle, and extract_vector_elt. 
+for (unsigned VT = (unsigned)MVT::v16i8; VT != (unsigned)MVT::v2i64; VT++) { + setOperationAction(ISD::BUILD_VECTOR,(MVT::ValueType)VT, Custom); + setOperationAction(ISD::VECTOR_SHUFFLE, (MVT::ValueType)VT, Custom); + setOperationAction(ISD::EXTRACT_VECTOR_ELT, (MVT::ValueType)VT, Custom); +} setOperationAction(ISD::BUILD_VECTOR, MVT::v2f64, Custom); -setOperationAction(ISD::BUILD_VECTOR, MVT::v16i8, Custom); -setOperationAction(ISD::BUILD_VECTOR, MVT::v8i16, Custom); -setOperationAction(ISD::BUILD_VECTOR, MVT::v4i32, Custom); setOperationAction(ISD::BUILD_VECTOR, MVT::v2i64, Custom); setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2f64, Custom); -setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v16i8, Custom); -setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v8i16, Custom); -setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v4i32, Custom); setOperationAction(ISD::VECTOR_SHUFFLE, MVT::v2i64, Custom); setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2f64, Custom); -setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v8i16, Custom); -setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4i32, Custom); -setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v8i16, Custom); +setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2i64, Custom); -// Promote v16i8, v8i16, v4i32 selects to v2i64. Custom lower v2i64, v2f64, -// and v4f32 selects. -for (unsigned VT = (unsigned)MVT::v16i8; - VT != (unsigned)MVT::v2i64; VT++) { - setOperationAction(ISD::SELECT, (MVT::ValueType)VT, Promote); - AddPromotedToType (ISD::SELECT, (MVT::ValueType)VT, MVT::v2i64); +// Promote v16i8, v8i16, v4i32 load, select, and, or, xor to v2i64. 
+for (unsigned VT = (unsigned)MVT::v16i8; VT != (unsigned)MVT::v2i64; VT++) { + setOperationAction(ISD::AND,(MVT::ValueType)VT, Promote); + AddPromotedToType (ISD::AND,(MVT::ValueType)VT, MVT::v2i64); + setOperationAction(ISD::OR, (MVT::ValueType)VT, Promote); + AddPromotedToType (ISD::OR, (MVT::ValueType)VT, MVT::v2i64); + setOperationAction(ISD::XOR,(MVT::ValueType)VT, Promote); + AddPromotedToType (ISD::XOR,(MVT::ValueType)VT, MVT::v2i64); setOperationAction(ISD::LOAD, (MVT::ValueType)VT, Promote); AddPromotedToType (ISD::LOAD, (MVT::ValueTy
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.84 -> 1.85 --- Log message: add a note --- Diffs of the changes: (+23 -0) README.txt | 23 +++ 1 files changed, 23 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.84 llvm/lib/Target/X86/README.txt:1.85 --- llvm/lib/Target/X86/README.txt:1.84 Mon Apr 10 16:42:57 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 10 16:51:03 2006 @@ -675,6 +675,29 @@ //===-===// +Better codegen for: + +void f(float a, float b, vector float * out) { *out = (vector float){ a, 0.0, 0.0, b}; } +void f(float a, float b, vector float * out) { *out = (vector float){ a, b, 0.0, 0}; } + +For the later we generate: + +_f: +pxor %xmm0, %xmm0 +movss 8(%esp), %xmm1 +movaps %xmm0, %xmm2 +unpcklps %xmm1, %xmm2 +movss 4(%esp), %xmm1 +unpcklps %xmm0, %xmm1 +unpcklps %xmm2, %xmm1 +movl 12(%esp), %eax +movaps %xmm1, (%eax) +ret + +This seems like it should use shufps, one for each of a & b. + +//===-===// + Adding to the list of cmp / test poor codegen issues: int test(__m128 *A, __m128 *B) { ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
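One way to see the "one shufps for each of a & b" suggestion: each scalar arrives via movss with three zero lanes already in place, so a single shufps can merge them. An intrinsic sketch (the function name and the immediate choice 0x14 are ours) for the `{a, 0.0, 0.0, b}` case:

```c
#include <xmmintrin.h>

/* Sketch of a shufps-based build for {a, 0.0, 0.0, b}: the two movss
   loads already carry the zeros, so one shufps finishes the vector.
   Immediate 0x14 selects {a, 0} from va and {0, b} from vb. */
void build_a00b(float a, float b, float out[4]) {
    __m128 va = _mm_set_ss(a);                         /* {a, 0, 0, 0} */
    __m128 vb = _mm_set_ss(b);                         /* {b, 0, 0, 0} */
    _mm_storeu_ps(out, _mm_shuffle_ps(va, vb, 0x14));  /* {a, 0, 0, b} */
}
```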
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.82 -> 1.83 --- Log message: Correct an entry --- Diffs of the changes: (+2 -2) README.txt |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.82 llvm/lib/Target/X86/README.txt:1.83 --- llvm/lib/Target/X86/README.txt:1.82 Mon Apr 10 02:22:03 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 10 16:41:39 2006 @@ -770,8 +770,8 @@ #include -void test(__m128 *res, __m128 *A) { - *res = _mm_shuffle_ps(*A, *A, 0xF0); +void test(__m128 *res, __m128 *A, __m128 *B) { + *res = _mm_shuffle_ps(*A, *B, 0xF0); } We should emit ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.83 -> 1.84 --- Log message: Remove an entry that is now done. --- Diffs of the changes: (+0 -13) README.txt | 13 - 1 files changed, 13 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.83 llvm/lib/Target/X86/README.txt:1.84 --- llvm/lib/Target/X86/README.txt:1.83 Mon Apr 10 16:41:39 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 10 16:42:57 2006 @@ -768,19 +768,6 @@ //===-===// -#include - -void test(__m128 *res, __m128 *A, __m128 *B) { - *res = _mm_shuffle_ps(*A, *B, 0xF0); -} - -We should emit - shufps $240, (%eax), %xmm0 -instead of - pshufd $240, (%eax), %xmm0 - -//===-===// - X86RegisterInfo::copyRegToReg() returns X86::MOVAPSrr for VR128. Is it possible to choose between movaps, movapd, and movdqa based on types of source and destination? ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.81 -> 1.82 --- Log message: New entries --- Diffs of the changes: (+19 -0) README.txt | 19 +++ 1 files changed, 19 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.81 llvm/lib/Target/X86/README.txt:1.82 --- llvm/lib/Target/X86/README.txt:1.81 Fri Apr 7 16:19:53 2006 +++ llvm/lib/Target/X86/README.txt Mon Apr 10 02:22:03 2006 @@ -765,3 +765,22 @@ A Mac OS X IA-32 specific ABI bug wrt returning value > 8 bytes: http://llvm.org/bugs/show_bug.cgi?id=729 + +//===-===// + +#include + +void test(__m128 *res, __m128 *A) { + *res = _mm_shuffle_ps(*A, *A, 0xF0); +} + +We should emit + shufps $240, (%eax), %xmm0 +instead of + pshufd $240, (%eax), %xmm0 + +//===-===// + +X86RegisterInfo::copyRegToReg() returns X86::MOVAPSrr for VR128. Is it possible +to choose between movaps, movapd, and movdqa based on types of source and +destination? ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.80 -> 1.81 --- Log message: Keep track of an Mac OS X / x86 ABI bug. --- Diffs of the changes: (+5 -0) README.txt |5 + 1 files changed, 5 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.80 llvm/lib/Target/X86/README.txt:1.81 --- llvm/lib/Target/X86/README.txt:1.80 Thu Apr 6 18:21:24 2006 +++ llvm/lib/Target/X86/README.txt Fri Apr 7 16:19:53 2006 @@ -760,3 +760,8 @@ movddup 8(%esp), %xmm0 movapd %xmm0, (%eax) ret + +//===-===// + +A Mac OS X IA-32 specific ABI bug wrt returning value > 8 bytes: +http://llvm.org/bugs/show_bug.cgi?id=729 ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.79 -> 1.80 --- Log message: New entries. --- Diffs of the changes: (+56 -0) README.txt | 56 1 files changed, 56 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.79 llvm/lib/Target/X86/README.txt:1.80 --- llvm/lib/Target/X86/README.txt:1.79 Wed Apr 5 18:46:04 2006 +++ llvm/lib/Target/X86/README.txt Thu Apr 6 18:21:24 2006 @@ -704,3 +704,59 @@ so a any extend (which becomes a zero extend) is added. We probably need some kind of target DAG combine hook to fix this. + +//===-===// + +How to decide when to use the "floating point version" of logical ops? Here are +some code fragments: + + movaps LCPI5_5, %xmm2 + divps %xmm1, %xmm2 + mulps %xmm2, %xmm3 + mulps 8656(%ecx), %xmm3 + addps 8672(%ecx), %xmm3 + andps LCPI5_6, %xmm2 + andps LCPI5_1, %xmm3 + por %xmm2, %xmm3 + movdqa %xmm3, (%edi) + + movaps LCPI5_5, %xmm1 + divps %xmm0, %xmm1 + mulps %xmm1, %xmm3 + mulps 8656(%ecx), %xmm3 + addps 8672(%ecx), %xmm3 + andps LCPI5_6, %xmm1 + andps LCPI5_1, %xmm3 + orps %xmm1, %xmm3 + movaps %xmm3, 112(%esp) + movaps %xmm3, (%ebx) + +Due to some minor source change, the later case ended up using orps and movaps +instead of por and movdqa. Does it matter? + +//===-===// + +Use movddup to splat a v2f64 directly from a memory source. e.g. + +#include + +void test(__m128d *r, double A) { + *r = _mm_set1_pd(A); +} + +llc: + +_test: + movsd 8(%esp), %xmm0 + unpcklpd %xmm0, %xmm0 + movl 4(%esp), %eax + movapd %xmm0, (%eax) + ret + +icc: + +_test: + movl 4(%esp), %eax + movddup 8(%esp), %xmm0 + movapd %xmm0, (%eax) + ret ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
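The source pattern in the movddup entry is just a v2f64 splat; a runnable intrinsic form (SSE2 semantics — movddup itself needs SSE3, and whether the compiler emits it is the point of the note):

```c
#include <emmintrin.h>

/* The splat from the note: icc covers this with a single movddup (SSE3);
   _mm_set1_pd expresses the same {a, a} semantics portably on SSE2. */
void splat_pd(double *out2, double a) {
    __m128d v = _mm_set1_pd(a);   /* {a, a} */
    _mm_storeu_pd(out2, v);
}
```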
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.78 -> 1.79 --- Log message: An entry about comi / ucomi intrinsics. --- Diffs of the changes: (+31 -0) README.txt | 31 +++ 1 files changed, 31 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.78 llvm/lib/Target/X86/README.txt:1.79 --- llvm/lib/Target/X86/README.txt:1.78 Tue Mar 28 21:03:46 2006 +++ llvm/lib/Target/X86/README.txt Wed Apr 5 18:46:04 2006 @@ -673,3 +673,34 @@ Better codegen for vector_shuffles like this { x, 0, 0, 0 } or { x, 0, x, 0}. Perhaps use pxor / xorp* to clear a XMM register first? +//===-===// + +Adding to the list of cmp / test poor codegen issues: + +int test(__m128 *A, __m128 *B) { + if (_mm_comige_ss(*A, *B)) +return 3; + else +return 4; +} + +_test: + movl 8(%esp), %eax + movaps (%eax), %xmm0 + movl 4(%esp), %eax + movaps (%eax), %xmm1 + comiss %xmm0, %xmm1 + setae %al + movzbl %al, %ecx + movl $3, %eax + movl $4, %edx + cmpl $0, %ecx + cmove %edx, %eax + ret + +Note the setae, movzbl, cmpl, cmove can be replaced with a single cmovae. There +are a number of issues. 1) We are introducing a setcc between the result of the +intrisic call and select. 2) The intrinsic is expected to produce a i32 value +so a any extend (which becomes a zero extend) is added. + +We probably need some kind of target DAG combine hook to fix this. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
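For reference, the entry's test case behaves like this compilable version (the test values below are ours): `_mm_comige_ss` compares only the low lanes, which is why the whole select should reduce to comiss + cmovae rather than setae/movzbl/cmpl/cmove.

```c
#include <xmmintrin.h>

/* Runnable form of the entry's source: a low-lane >= compare feeding a
   select between two constants. */
int comige_select(__m128 *a, __m128 *b) {
    return _mm_comige_ss(*a, *b) ? 3 : 4;
}
```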
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.77 -> 1.78 --- Log message: Another entry about shuffles. --- Diffs of the changes: (+6 -0) README.txt |6 ++ 1 files changed, 6 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.77 llvm/lib/Target/X86/README.txt:1.78 --- llvm/lib/Target/X86/README.txt:1.77 Tue Mar 28 00:55:45 2006 +++ llvm/lib/Target/X86/README.txt Tue Mar 28 21:03:46 2006 @@ -667,3 +667,9 @@ Use movhps to update upper 64-bits of a v4sf value. Also movlps on lower half of a v4sf value. + +//===-===// + +Better codegen for vector_shuffles like this { x, 0, 0, 0 } or { x, 0, x, 0}. +Perhaps use pxor / xorp* to clear a XMM register first? + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.76 -> 1.77 --- Log message: Update --- Diffs of the changes: (+2 -23) README.txt | 25 ++--- 1 files changed, 2 insertions(+), 23 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.76 llvm/lib/Target/X86/README.txt:1.77 --- llvm/lib/Target/X86/README.txt:1.76 Mon Mar 27 20:49:12 2006 +++ llvm/lib/Target/X86/README.txt Tue Mar 28 00:55:45 2006 @@ -665,26 +665,5 @@ //===-===// -Is it really a good idea to use movlhps to move 1 double-precision FP value from -low quadword of source to high quadword of destination? - -e.g. - -void test2 (v2sd *b, double X, double Y) { - v2sd a = (v2sd) {X, X*Y}; - *b = a; -} - - movsd 8(%esp), %xmm0 - movapd %xmm0, %xmm1 - mulsd 16(%esp), %xmm1 - movlhps %xmm1, %xmm0 - movl 4(%esp), %eax - movapd %xmm0, (%eax) - ret - -icc uses unpcklpd instead. - -//===-===// - -Use movhps and movlhps to update upper 64-bits of a v4sf value. +Use movhps to update upper 64-bits of a v4sf value. Also movlps on lower half +of a v4sf value. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.75 -> 1.76 --- Log message: Added a couple of entries about movhps and movlhps. --- Diffs of the changes: (+26 -0) README.txt | 26 ++ 1 files changed, 26 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.75 llvm/lib/Target/X86/README.txt:1.76 --- llvm/lib/Target/X86/README.txt:1.75 Mon Mar 27 20:44:05 2006 +++ llvm/lib/Target/X86/README.txt Mon Mar 27 20:49:12 2006 @@ -662,3 +662,29 @@ Obviously it would have been better for the first mov (or any op) to store directly %esp[0] if there are no other uses. + +//===-===// + +Is it really a good idea to use movlhps to move 1 double-precision FP value from +low quadword of source to high quadword of destination? + +e.g. + +void test2 (v2sd *b, double X, double Y) { + v2sd a = (v2sd) {X, X*Y}; + *b = a; +} + + movsd 8(%esp), %xmm0 + movapd %xmm0, %xmm1 + mulsd 16(%esp), %xmm1 + movlhps %xmm1, %xmm0 + movl 4(%esp), %eax + movapd %xmm0, (%eax) + ret + +icc uses unpcklpd instead. + +//===-===// + +Use movhps and movlhps to update upper 64-bits of a v4sf value. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
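The `test2` fragment in the entry uses a GCC vector typedef; an equivalent intrinsic form (the helper name and values are ours) makes the `{X, X*Y}` layout explicit — this is the vector the movlhps-vs-unpcklpd choice is about:

```c
#include <emmintrin.h>

/* Intrinsic form of test2 from the entry: build {x, x*y} in one v2sd.
   Note _mm_set_pd takes the HIGH element first, so the low lane is x. */
void test2_pd(double *out2, double x, double y) {
    __m128d a = _mm_set_pd(x * y, x);   /* {x, x*y} */
    _mm_storeu_pd(out2, a);
}
```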
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.74 -> 1.75 --- Log message: All unpack cases are now being handled. --- Diffs of the changes: (+0 -4) README.txt |4 1 files changed, 4 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.74 llvm/lib/Target/X86/README.txt:1.75 --- llvm/lib/Target/X86/README.txt:1.74 Sun Mar 26 13:19:27 2006 +++ llvm/lib/Target/X86/README.txt Mon Mar 27 20:44:05 2006 @@ -662,7 +662,3 @@ Obviously it would have been better for the first mov (or any op) to store directly %esp[0] if there are no other uses. - -//===-===// - -Add more vector shuffle special cases using unpckhps and unpcklps. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
Re: [llvm-commits] CVS: llvm/lib/Target/X86/README.txt
The original note for implementing this (which I wrote) indicated that this should only be done for very small memory blocks, probably < 8 bytes, but certainly less than 64. I don't know what the magic number is where there's a tradeoff, and it's probably different for different targets, but certainly megabytes is WAAAY too big :)

Reid.

On Sun, 2006-03-26 at 13:19 -0600, Nate Begeman wrote:
> Changes in directory llvm/lib/Target/X86:
> README.txt updated: 1.73 -> 1.74
> Log message: Readme note
>
> +We are currently lowering large (1MB+) memmove/memcpy to rep/stosl and rep/movsl
> +We should leave these as libcalls for everything over a much lower threshold,
> +since libc is hand tuned for medium and large mem ops (avoiding RFO for large
> +stores, TLB preheating, etc)
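Reid's point amounts to a size-threshold dispatch. A hedged sketch of that policy in C (the 64-byte cutoff and the function name are illustrative, not measured values from the thread):

```c
#include <string.h>

/* Hypothetical threshold dispatch: expand tiny copies inline (standing in
   for the unrolled load/store expansion a compiler would emit), and call
   the hand-tuned libc memcpy for everything larger, since libc handles
   the medium/large cases (RFO avoidance, TLB preheating, etc.). */
enum { INLINE_COPY_MAX = 64 };   /* illustrative cutoff, not a tuned value */

void *copy(void *dst, const void *src, size_t n) {
    if (n <= INLINE_COPY_MAX) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--) *d++ = *s++;   /* inline expansion for small blocks */
        return dst;
    }
    return memcpy(dst, src, n);    /* libcall for everything else */
}
```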
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.73 -> 1.74 --- Log message: Readme note --- Diffs of the changes: (+7 -0) README.txt |7 +++ 1 files changed, 7 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.73 llvm/lib/Target/X86/README.txt:1.74 --- llvm/lib/Target/X86/README.txt:1.73 Fri Mar 24 01:12:19 2006 +++ llvm/lib/Target/X86/README.txt Sun Mar 26 13:19:27 2006 @@ -542,6 +542,13 @@ //===-===// +We are currently lowering large (1MB+) memmove/memcpy to rep/stosl and rep/movsl +We should leave these as libcalls for everything over a much lower threshold, +since libc is hand tuned for medium and large mem ops (avoiding RFO for large +stores, TLB preheating, etc) + +//===-===// + Lower memcpy / memset to a series of SSE 128 bit move instructions when it's feasible. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt X86ISelLowering.cpp
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.72 -> 1.73 X86ISelLowering.cpp updated: 1.127 -> 1.128 --- Log message: Gabor points out that we can't spell. :) --- Diffs of the changes: (+4 -4) README.txt |4 ++-- X86ISelLowering.cpp |4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.72 llvm/lib/Target/X86/README.txt:1.73 --- llvm/lib/Target/X86/README.txt:1.72 Fri Mar 24 00:40:32 2006 +++ llvm/lib/Target/X86/README.txt Fri Mar 24 01:12:19 2006 @@ -547,7 +547,7 @@ //===-===// -Teach the coallescer to commute 2-addr instructions, allowing us to eliminate +Teach the coalescer to commute 2-addr instructions, allowing us to eliminate the reg-reg copy in this example: float foo(int *x, float *y, unsigned c) { @@ -642,7 +642,7 @@ //===-===// -Teach the coallescer to coales vregs of different register classes. e.g. FR32 / +Teach the coalescer to coalesce vregs of different register classes. e.g. FR32 / FR64 to VR128. //===-===// Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.127 llvm/lib/Target/X86/X86ISelLowering.cpp:1.128 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.127 Fri Mar 24 00:40:32 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Fri Mar 24 01:12:19 2006 @@ -660,10 +660,10 @@ // EDX". Anything more is illegal. // // FIXME: The linscan register allocator currently has problem with -// coallescing. At the time of this writing, whenever it decides to coallesce +// coalescing. At the time of this writing, whenever it decides to coalesce // a physreg with a virtreg, this increases the size of the physreg's live // range, and the live range cannot ever be reduced. This causes problems if -// too many physregs are coalleced with virtregs, which can cause the register +// too many physregs are coaleced with virtregs, which can cause the register // allocator to wedge itself. 
// // This code triggers this problem more often if we pass args in registers, ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt X86ISelLowering.cpp
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.71 -> 1.72 X86ISelLowering.cpp updated: 1.126 -> 1.127 --- Log message: All v2f64 shuffle cases can be handled. --- Diffs of the changes: (+8 -1) README.txt |4 X86ISelLowering.cpp |5 - 2 files changed, 8 insertions(+), 1 deletion(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.71 llvm/lib/Target/X86/README.txt:1.72 --- llvm/lib/Target/X86/README.txt:1.71 Thu Mar 23 20:57:03 2006 +++ llvm/lib/Target/X86/README.txt Fri Mar 24 00:40:32 2006 @@ -655,3 +655,7 @@ Obviously it would have been better for the first mov (or any op) to store directly %esp[0] if there are no other uses. + +//===-===// + +Add more vector shuffle special cases using unpckhps and unpcklps. Index: llvm/lib/Target/X86/X86ISelLowering.cpp diff -u llvm/lib/Target/X86/X86ISelLowering.cpp:1.126 llvm/lib/Target/X86/X86ISelLowering.cpp:1.127 --- llvm/lib/Target/X86/X86ISelLowering.cpp:1.126 Thu Mar 23 20:58:06 2006 +++ llvm/lib/Target/X86/X86ISelLowering.cpp Fri Mar 24 00:40:32 2006 @@ -2329,7 +2329,10 @@ return DAG.getNode(ISD::VECTOR_SHUFFLE, VT, V1, DAG.getNode(ISD::UNDEF, V1.getValueType()), PermMask); -} else if (NumElems == 2 || X86::isSHUFPMask(PermMask.Val)) { +} else if (NumElems == 2) { + // All v2f64 cases are handled. + return SDOperand(); +} else if (X86::isSHUFPMask(PermMask.Val)) { SDOperand Elt = PermMask.getOperand(0); if (cast(Elt)->getValue() >= NumElems) { // Swap the operands and change mask. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.70 -> 1.71 --- Log message: A new entry --- Diffs of the changes: (+11 -0) README.txt | 11 +++ 1 files changed, 11 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.70 llvm/lib/Target/X86/README.txt:1.71 --- llvm/lib/Target/X86/README.txt:1.70 Tue Mar 21 01:18:26 2006 +++ llvm/lib/Target/X86/README.txt Thu Mar 23 20:57:03 2006 @@ -644,3 +644,14 @@ Teach the coallescer to coales vregs of different register classes. e.g. FR32 / FR64 to VR128. + +//===-===// + +mov $reg, 48(%esp) +... +leal 48(%esp), %eax +mov %eax, (%esp) +call _foo + +Obviously it would have been better for the first mov (or any op) to store +directly %esp[0] if there are no other uses. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.69 -> 1.70 --- Log message: Combine 2 entries --- Diffs of the changes: (+6 -8) README.txt | 14 ++ 1 files changed, 6 insertions(+), 8 deletions(-) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.69 llvm/lib/Target/X86/README.txt:1.70 --- llvm/lib/Target/X86/README.txt:1.69 Tue Mar 21 01:12:57 2006 +++ llvm/lib/Target/X86/README.txt Tue Mar 21 01:18:26 2006 @@ -485,6 +485,12 @@ //===-===// +Should generate min/max for stuff like: + +void minf(float a, float b, float *X) { + *X = a <= b ? a : b; +} + Make use of floating point min / max instructions. Perhaps introduce ISD::FMIN and ISD::FMAX node types? @@ -636,13 +642,5 @@ //===-===// -Should generate min/max for stuff like: - -void minf(float a, float b, float *X) { - *X = a <= b ? a : b; -} - -//===-===// - Teach the coallescer to coales vregs of different register classes. e.g. FR32 / FR64 to VR128. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.68 -> 1.69 --- Log message: Add a note about x86 register coallescing --- Diffs of the changes: (+2 -0) README.txt |2 ++ 1 files changed, 2 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.68 llvm/lib/Target/X86/README.txt:1.69 --- llvm/lib/Target/X86/README.txt:1.68 Sun Mar 19 16:27:41 2006 +++ llvm/lib/Target/X86/README.txt Tue Mar 21 01:12:57 2006 @@ -644,3 +644,5 @@ //===-===// +Teach the coallescer to coales vregs of different register classes. e.g. FR32 / +FR64 to VR128. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.67 -> 1.68 --- Log message: add a note with a testcase --- Diffs of the changes: (+11 -0) README.txt | 11 +++ 1 files changed, 11 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.67 llvm/lib/Target/X86/README.txt:1.68 --- llvm/lib/Target/X86/README.txt:1.67 Sun Mar 19 00:08:11 2006 +++ llvm/lib/Target/X86/README.txt Sun Mar 19 16:27:41 2006 @@ -633,3 +633,14 @@ The following tests perform worse with LSR: lambda, siod, optimizer-eval, ackermann, hash2, nestedloop, strcat, and Treesor. + +//===-===// + +Should generate min/max for stuff like: + +void minf(float a, float b, float *X) { + *X = a <= b ? a : b; +} + +//===-===// + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.66 -> 1.67 --- Log message: Remember which tests are hurt by LSR. --- Diffs of the changes: (+4 -0) README.txt |4 1 files changed, 4 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.66 llvm/lib/Target/X86/README.txt:1.67 --- llvm/lib/Target/X86/README.txt:1.66 Thu Mar 16 16:44:22 2006 +++ llvm/lib/Target/X86/README.txt Sun Mar 19 00:08:11 2006 @@ -629,3 +629,7 @@ dependent LICM pass or 2) makeing SelectDAG represent the whole function. //===-===// + +The following tests perform worse with LSR: + +lambda, siod, optimizer-eval, ackermann, hash2, nestedloop, strcat, and Treesor. ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.65 -> 1.66 --- Log message: A new entry. --- Diffs of the changes: (+45 -0) README.txt | 45 + 1 files changed, 45 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.65 llvm/lib/Target/X86/README.txt:1.66 --- llvm/lib/Target/X86/README.txt:1.65 Wed Mar 8 19:39:46 2006 +++ llvm/lib/Target/X86/README.txt Thu Mar 16 16:44:22 2006 @@ -584,3 +584,48 @@ //===-===// +%X = weak global int 0 + +void %foo(int %N) { + %N = cast int %N to uint + %tmp.24 = setgt int %N, 0 + br bool %tmp.24, label %no_exit, label %return + +no_exit: + %indvar = phi uint [ 0, %entry ], [ %indvar.next, %no_exit ] + %i.0.0 = cast uint %indvar to int + volatile store int %i.0.0, int* %X + %indvar.next = add uint %indvar, 1 + %exitcond = seteq uint %indvar.next, %N + br bool %exitcond, label %return, label %no_exit + +return: + ret void +} + +compiles into: + + .text + .align 4 + .globl _foo +_foo: + movl 4(%esp), %eax + cmpl $1, %eax + jl LBB_foo_4# return +LBB_foo_1: # no_exit.preheader + xorl %ecx, %ecx +LBB_foo_2: # no_exit + movl L_X$non_lazy_ptr, %edx + movl %ecx, (%edx) + incl %ecx + cmpl %eax, %ecx + jne LBB_foo_2 # no_exit +LBB_foo_3: # return.loopexit +LBB_foo_4: # return + ret + +We should hoist "movl L_X$non_lazy_ptr, %edx" out of the loop after +remateralization is implemented. This can be accomplished with 1) a target +dependent LICM pass or 2) makeing SelectDAG represent the whole function. + +//===-===// ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.64 -> 1.65 --- Log message: a couple of miscellaneous things. --- Diffs of the changes: (+18 -0) README.txt | 18 ++ 1 files changed, 18 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.64 llvm/lib/Target/X86/README.txt:1.65 --- llvm/lib/Target/X86/README.txt:1.64 Sat Mar 4 19:15:18 2006 +++ llvm/lib/Target/X86/README.txt Wed Mar 8 19:39:46 2006 @@ -566,3 +566,21 @@ jb LBB_foo_3# no_exit //===-===// + +Codegen: + if (copysign(1.0, x) == copysign(1.0, y)) +into: + if (x^y & mask) +when using SSE. + +//===-===// + +Optimize this into something reasonable: + x * copysign(1.0, y) * copysign(1.0, z) + +//===-===// + +Optimize copysign(x, *y) to use an integer load from y. + +//===-===// + ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
[llvm-commits] CVS: llvm/lib/Target/X86/README.txt
Changes in directory llvm/lib/Target/X86: README.txt updated: 1.63 -> 1.64 --- Log message: add a note for something evan noticed --- Diffs of the changes: (+28 -0) README.txt | 28 1 files changed, 28 insertions(+) Index: llvm/lib/Target/X86/README.txt diff -u llvm/lib/Target/X86/README.txt:1.63 llvm/lib/Target/X86/README.txt:1.64 --- llvm/lib/Target/X86/README.txt:1.63 Sat Mar 4 01:49:50 2006 +++ llvm/lib/Target/X86/README.txt Sat Mar 4 19:15:18 2006 @@ -538,3 +538,31 @@ Lower memcpy / memset to a series of SSE 128 bit move instructions when it's feasible. + +//===-===// + +Teach the coallescer to commute 2-addr instructions, allowing us to eliminate +the reg-reg copy in this example: + +float foo(int *x, float *y, unsigned c) { + float res = 0.0; + unsigned i; + for (i = 0; i < c; i++) { +float xx = (float)x[i]; +xx = xx * y[i]; +xx += res; +res = xx; + } + return res; +} + +LBB_foo_3: # no_exit +cvtsi2ss %XMM0, DWORD PTR [%EDX + 4*%ESI] +mulss %XMM0, DWORD PTR [%EAX + 4*%ESI] +addss %XMM0, %XMM1 +inc %ESI +cmp %ESI, %ECX +movaps %XMM1, %XMM0 +jb LBB_foo_3# no_exit + +//===-===// ___ llvm-commits mailing list llvm-commits@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits