This, by the way, is my workaround at the moment, but it seems a bit nuts. The
logic fundamentally just forces the numerically larger of the two products to
always be on the left-hand side of the subtraction, followed by a negate if
there was a swap. It also forces the compiler not to use FMA.
inline const float SymmDiff(const float a, const float b)
{
    // all-ones mask when |a| > |b|, zero otherwise
    const unsigned int mask = sign_extend( abs(a) > abs(b) );
    unsigned int x = intbits(a), y = intbits(b);
    // masked XOR swap: exchanges x and y only when mask is all ones
    x ^= (mask&y); y ^= (mask&x); x ^= (mask&y);
    const float diff = floatbits(x) - floatbits(y);
    // flip the sign of the result if the inputs were swapped
    return floatbits(intbits(diff) ^ (0x80000000 & mask));
}
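Since sign_extend, intbits, and floatbits are ISPC standard-library calls, here is a sketch of the same routine in portable C for sanity-checking the swap-and-negate logic on the host. The memcpy bit-casts and the ternary mask are my stand-ins for those intrinsics; the function name and helpers are otherwise hypothetical:

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* portable stand-ins for ISPC's intbits()/floatbits() */
static uint32_t intbits_f(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    floatbits_u(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* C sketch of the ISPC SymmDiff above: conditionally swap so the
   larger-magnitude operand is always on the left of the subtraction,
   then restore the sign if a swap happened. */
static float SymmDiff(float a, float b)
{
    /* stand-in for sign_extend(abs(a) > abs(b)) */
    const uint32_t mask = (fabsf(a) > fabsf(b)) ? 0xFFFFFFFFu : 0u;
    uint32_t x = intbits_f(a), y = intbits_f(b);
    /* masked XOR swap: a no-op when mask is zero */
    x ^= (mask & y); y ^= (mask & x); x ^= (mask & y);
    const float diff = floatbits_u(x) - floatbits_u(y);
    /* negate the result if the inputs were swapped */
    return floatbits_u(intbits_f(diff) ^ (0x80000000u & mask));
}
```

The net effect is that SymmDiff(a,b) still computes a - b, but the operands always reach the subtraction in a canonical order, so SymmDiff(a,b) and -SymmDiff(b,a) come out bit-identical.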
inline const Vec3 Cross(const Vec3 v1, const Vec3 v2)
{
    Vec3 v;
    // doing this to solve an ispc issue where Cross(v0,v1) != -Cross(v1,v0)
    // for the AVX2 target. This way we know cross products between adjacent
    // faces will become numerically identical.
    v.x = SymmDiff(v1.y*v2.z, v2.y*v1.z);
    v.y = SymmDiff(v1.z*v2.x, v2.z*v1.x);
    v.z = SymmDiff(v1.x*v2.y, v2.x*v1.y);
    return v;
}
On Saturday, September 24, 2016 at 4:45:46 PM UTC-7, Morten Mikkelsen wrote:
>
> >You can use --opt=disable-fma if you need to avoid FMAs.
>
> Is there an option that's less dramatic than disabling it globally for my
> kernel? I only really need it for this one function. I have a similar
> question in regards to --addressing=64:
>
> Is there a way (like an intrinsic) to tell a specific read to use 64-bit
> address calculation while allowing others within the kernel to remain 32-bit?
>
> Thank you.
>
> Morten.
>
> On Monday, September 12, 2016 at 9:21:20 AM UTC-7, Dmitry Babokin wrote:
>>
>> If I understand the problem correctly, on AVX2 ISPC has generated FMA
>> (a*b+c) instructions, which leads to the problem you're seeing. The code
>> is still numerically correct, but you don't get *exactly* the same result
>> for cross(v0,v1) and -cross(v1,v0).
>>
>> The "problem" comes from the fact that an FMA operation has no
>> intermediate rounding to register precision, which does happen between
>> consecutive multiply and add instructions; you can think of the multiply
>> as being done in infinite precision.
>>
>> You can use --opt=disable-fma if you need to avoid FMAs.
>>
>> Dmitry.
>>
>>
>> On Mon, Sep 12, 2016 at 6:36 PM, Morten Mikkelsen <[email protected]>
>> wrote:
>>
>>> I have this basic cross product declared like this in my kernel:
>>>
>>> inline const Vec3 Cross(const Vec3 v1, const Vec3 v2)
>>> {
>>>     Vec3 v;
>>>     v.x = v1.y*v2.z - v2.y*v1.z;
>>>     v.y = v1.z*v2.x - v2.z*v1.x;
>>>     v.z = v1.x*v2.y - v2.x*v1.y;
>>>     return v;
>>> }
>>>
>>> Though I get that floating point operations are, generally speaking,
>>> order-dependent, I was expecting, given the simplicity of a cross
>>> product, that we'd still find Cross(v0,v1) = -Cross(v1,v0). Yet when I
>>> build for the avx2 target this is not what I'm seeing. For all other
>>> targets, SSE2-AVX1, the above property holds true.
>>>
>>> Right now I'm working around it with sort of a 96-bit check of whether
>>> v0 > v1, swapping the inputs if it is, but that adds overhead. So I
>>> wanted to ask whether you guys would consider this a bug on the ispc
>>> side, or whether I simply can't reasonably expect
>>> Cross(v0,v1) = -Cross(v1,v0) from the generated code.
>>>
>>>
>>> Thank you,
>>>
>>> Morten S. Mikkelsen
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Intel SPMD Program Compiler Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>