Am 15.08.2013 20:27, schrieb Zack Rusin: >> I realize this function isn't used but it looks unnecessarily >> complicated - two constants one AND plus one comparison when you >> could simply do a single comparison (compare x with x with >> unordered not equal). This is actually doubly bad with AVX because >> the int comparison is going to use 4 instructions instead of 1 >> (extract/2 cmp/1 insert), well if this runs 8-wide at least. > > I'm going to kill that function, we already have lp_build_isnan that > does the correct thing. Ah right. It actually uses OEQ then does a not, I think it would be easier to understand if it would just use UNE. I think at some point we might want to optimize these functions a bit, even if it's really llvm's fault. Consider this (a trivial "select 1.0 if input is a nan otherwise 0.0")
define <4 x float> @une(<4 x float> %in) { entry: %0 = fcmp une <4 x float> %in, %in %1 = sext <4 x i1> %0 to <4 x i32> %2 = and <4 x i32> %1, <i32 1065353216,i32 1065353216,i32 1065353216,i32 1065353216> %3 = bitcast <4 x i32> %2 to <4 x float> ret <4 x float> %3 } This generates some neat code: vcmpunordps %xmm0, %xmm0, %xmm0 vandps .LCPI0_0(%rip), %xmm0, %xmm0 ret (FWIW the OEQ version using a xor generates much the same just vcmpordps and vandnotps instead). But now do it with a 1-vec (which we will generally never do, just for illustration): define <1 x float> @une(<1 x float> %in) { entry: %0 = fcmp une <1 x float> %in, %in %1 = sext <1 x i1> %0 to <1 x i32> %2 = and <1 x i32> %1, <i32 1065353216> %3 = bitcast <1 x i32> %2 to <1 x float> ret <1 x float> %3 } Generates not-so-hot-code: vucomiss %xmm0, %xmm0 setp %al movzbl %al, %eax shll $31, %eax sarl $31, %eax andl $1065353216, %eax # imm = 0x3F800000 vmovd %eax, %xmm0 ret We generally use true scalars instead: define float @une(float %in) { entry: %0 = fcmp une float %in, %in %1 = sext i1 %0 to i32 %2 = and i32 %1, 1065353216 %3 = bitcast i32 %2 to float ret float %3 } Which generates this abomination (don't ask me why <1 x float> vs. float makes a difference in generated code): vucomiss %xmm0, %xmm0 setp %cl setne %al orb %cl, %al movl $-1, %ecx testb %al, %al movl $0, %eax cmovnel %ecx, %eax andl $1065353216, %eax # imm = 0x3F800000 vmovd %eax, %xmm0 ret So, it looks like for the scalar case using int comparisons would be definitely better as it should avoid most of that ugliness (but I couldn't be bothered to write the test case for that). Now granted my guess is llvm simply should have used the simd unit no matter which version is used in the first place - nothing forbids it to use vector instructions for scalars. But no luck unfortunately... (It is possible newer llvm versions handle this better, didn't check, but I suspect shit like that could potentially make a performance difference if these functions are used enough.) > >> Otherwise looks good. Though I'm not sure you really need to kill >> the prims if the clip distances are infinite? > > The d3d10 spec says "Coordinates coming in to clipping with infinites > at x, y, z may or may not result in a discarded primitive.". I liked > handling them the same way as nan, otherwise we're just generating > pointless primitives. I don't have a strong opinion though, wlk > doesn't seem to test infinites. Ok whatever works. I just thought the test may be simpler but maybe not (given the above...) so that's just fine. Roland _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev