On Monday, 8 December 2014 at 17:05:09 UTC, John Colvin wrote:
On Monday, 8 December 2014 at 16:32:50 UTC, Martin Nowak wrote:
I want to do bounds checking of 2 (4 on avx) ulongs (64-bit) at a time.

ulong2 vval = [v0, v1];
ulong2 vlow = [low, low];
ulong2 vhigh = [high, high];

int res = PMOVMSKB((vval >= vlow) & (vval < vhigh));

I figured out sort of a solution, but it seems way too complicated, because there is only signed comparison.

Usually (scalar) I'd use this, which makes use of unsigned wrap to save one conditional

immutable size = cast(ulong)(high - low);
if (cast(ulong)(v0 - low) < size) {}
if (cast(ulong)(v1 - low) < size) {}

over

if (v0 >= low && v0 < high) {}

Maybe this can be used on SIMD too (saturated sub or so)?

-Martin
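
For reference, the scalar wrap-around trick quoted above can be sketched in C as follows. This is only an illustration, assuming low <= high; the function names are made up for the example and are not from the original post. If v is below low, v - low wraps around to a very large unsigned value that can never be smaller than high - low, so a single comparison covers both bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Straightforward half-open range check [low, high): two comparisons. */
bool in_range_two_cmp(uint64_t v, uint64_t low, uint64_t high)
{
    return v >= low && v < high;
}

/* Same check with one comparison (hypothetical helper name).
 * Precondition: low <= high.
 * If v < low, the subtraction wraps modulo 2^64 to at least 2^64 - low,
 * which is always greater than high - low, so the test fails as it should. */
bool in_range_one_cmp(uint64_t v, uint64_t low, uint64_t high)
{
    assert(low <= high);
    return (v - low) < (high - low);
}
```

Both functions should agree for every v as long as the precondition holds; when checking many values against the same bounds, high - low can be computed once up front.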

Well gcc gives me:

typedef unsigned long ulong4 __attribute__ ((vector_size (32)));

ulong4 foo(ulong4 a, ulong4 l, ulong4 h)
{
    return (a >= l) & (a < h);
}


foo(unsigned long __vector, unsigned long __vector, unsigned long __vector):
        vmovdqa .LC0(%rip), %ymm3
        vpsubq  %ymm3, %ymm0, %ymm0
        vpsubq  %ymm3, %ymm2, %ymm2
        vpsubq  %ymm3, %ymm1, %ymm1
        vpcmpgtq        %ymm0, %ymm2, %ymm2
        vpcmpgtq        %ymm0, %ymm1, %ymm1
        vpandn  %ymm2, %ymm1, %ymm0
        ret
.LC0:
        .quad   -9223372036854775808
        .quad   -9223372036854775808
        .quad   -9223372036854775808
        .quad   -9223372036854775808

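Conceptually, the trick is that it subtracts 2^63 (the .LC0 constant) from every operand, which flips the sign bit, so that the signed comparison instruction vpcmpgtq gives the same result an unsigned comparison would. The final vpandn then combines !(l > a) with (h > a).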
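
To make that concrete, here is a portable C sketch that emulates the instruction sequence one lane at a time, without intrinsics. The names bias and in_range_mask are invented for this example. It assumes the usual two's-complement conversion from uint64_t to int64_t, which is what GCC and Clang do, although older C standards leave it implementation-defined.

```c
#include <stdint.h>

#define LANES 4  /* a 256-bit register holds four 64-bit lanes */
#define SIGN_BIT UINT64_C(0x8000000000000000)

/* Flip the sign bit and reinterpret as signed. Modulo 2^64 this is the
 * same as subtracting 2^63, which is what vpsubq does with .LC0.
 * Unsigned order of x then matches signed order of bias(x). */
int64_t bias(uint64_t x)
{
    return (int64_t)(x ^ SIGN_BIT);
}

/* Lane-wise (a >= l) & (a < h) on unsigned values, using only signed
 * greater-than. Each output lane is all ones (true) or zero (false). */
void in_range_mask(const uint64_t a[LANES], const uint64_t l[LANES],
                   const uint64_t h[LANES], uint64_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        int64_t sa = bias(a[i]);
        int64_t sl = bias(l[i]);
        int64_t sh = bias(h[i]);
        uint64_t h_gt_a = (sh > sa) ? UINT64_MAX : 0;  /* vpcmpgtq: h > a */
        uint64_t l_gt_a = (sl > sa) ? UINT64_MAX : 0;  /* vpcmpgtq: l > a */
        out[i] = ~l_gt_a & h_gt_a;                     /* vpandn: !(l > a) & (h > a) */
    }
}
```

Since a >= l is the same as !(l > a), only the greater-than form of the comparison is needed, which is why the compiler output finishes with an and-not rather than a plain and.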
