Edouard Gomez ([EMAIL PROTECTED]) wrote:
> Here is an updated patch... still no SIMD... might come later.

OK, I found some time to write the kernel of the computation in SSE:

        asm volatile (
                "movaps (%1), %%xmm0\n\t"   // xmm0 = Lr, Lg, Lb, 0
                "movaps (%2), %%xmm1\n\t"   // xmm1 = R, G, B, G2
                "movaps %%xmm1, %%xmm4\n\t" // xmm4 = R, G, B, G2
                "mulps %%xmm0, %%xmm1\n\t"  // xmm1 = Lr*R, Lg*G, Lb*B, 0
                "movhlps %%xmm1, %%xmm0\n\t"// xmm0 = LbB, 0, x, x
                "addps %%xmm0, %%xmm1\n\t"  // xmm1 = LrR + LbB, LgG, x, x
                "movaps %%xmm1, %%xmm0\n\t" // xmm0 = LrR + LbB, LgG, x, x
                "shufps $0x1, %%xmm0, %%xmm0\n\t" // xmm0 = LgG, x, x, x
                "addps %%xmm1, %%xmm0\n\t"  // xmm0 = Y = LrR + LbB + LgG, x, x, x
                "movaps %%xmm0, %%xmm1\n\t" // xmm1 = LrR + LbB + LgG, x, x, x
                "maxss %%xmm2, %%xmm1\n\t"  // xmm1 = max(Y, 0)
                "minss %%xmm3, %%xmm1\n\t"  // xmm1 = min(Y, 65535)
                "cvtss2si %%xmm1, %%rax\n\t"// rax = Y rounded per MXCSR (nearest by default)
                "movss (%3,%%rax, 4), %%xmm1\n\t" // xmm1 = curve[(int)Y]
                "mulss %%xmm3, %%xmm1\n\t"  // xmm1 = curve[(int)Y]*65535.f = Y'
                "maxss %%xmm2, %%xmm1\n\t"  // xmm1 = max(Y', 0)
                "minss %%xmm3, %%xmm1\n\t"  // xmm1 = min(Y', 65535)
                "divss %%xmm0, %%xmm1\n\t"  // xmm1 = Y'/Y = a
                "shufps $0x0, %%xmm1, %%xmm1\n\t" // xmm1 = a, a, a, a
                "mulps %%xmm4, %%xmm1\n\t"  // xmm1 = a*R, a*G, a*B, a*G2
                "maxps %%xmm2, %%xmm1\n\t"  // xmm1 = max(xmm1, 0)
                "minps %%xmm3, %%xmm1\n\t"  // xmm1 = min(xmm1, 65535)
                "movaps %%xmm1, %0\n\t"
                : "=m" (result)
                : "r" (luminance),
                  "r" (rgbg),
                  "r" (curve)
                : "%xmm0", "%xmm1", "%xmm4", "%rax", "memory");
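
For review, a plain C restatement of what the asm block computes per pixel might help. This is a sketch under the same assumptions (luminance = {Lr, Lg, Lb, 0}, rgbg = {R, G, B, G2}, curve with at least 65536 entries); the names tone_curve_scalar and clampf are hypothetical, and the index rounding only approximates cvtss2si.

```c
/* Hypothetical helper: clamp v into [lo, hi] (non-NaN inputs assumed). */
static float clampf(float v, float lo, float hi)
{
    if (v < lo) v = lo;
    if (v > hi) v = hi;
    return v;
}

/* Scalar sketch of the SSE kernel.
   luminance = {Lr, Lg, Lb, 0}, rgbg = {R, G, B, G2},
   curve must hold at least 65536 entries. */
static void tone_curve_scalar(const float luminance[4], const float rgbg[4],
                              const float *curve, float result[4])
{
    /* Y = (LrR + LbB) + LgG, summed in the same order as the asm. */
    float y = (luminance[0] * rgbg[0] + luminance[2] * rgbg[2])
            + luminance[1] * rgbg[1];
    /* Clamp Y to [0, 65535] and round: halves round up here, whereas
       cvtss2si rounds halves to even under the default MXCSR. */
    int idx = (int)(clampf(y, 0.f, 65535.f) + 0.5f);
    /* Y' = curve[idx] * 65535, clamped to [0, 65535]. */
    float y_new = clampf(curve[idx] * 65535.f, 0.f, 65535.f);
    /* a = Y'/Y uses the unclamped Y; like the asm, Y == 0 is not handled. */
    float a = y_new / y;
    for (int i = 0; i < 4; i++)
        result[i] = clampf(a * rgbg[i], 0.f, 65535.f);
}
```

With a flat curve of 0.5 and luminance {0.25, 0.5, 0.25, 0}, a pixel {1000, 2000, 3000, 2000} has Y = 2000, so a = 32767.5/2000 and G comes out close to 32767.5.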

Assumptions:
1 - xmm2 must already hold a zero vector
2 - xmm3 must already hold 65535.f in all four lanes
3 - curve values are expected to lie in [0.f, 1.f], but as they are not
    clipped to that range before rendering, I included some max/min
    clipping. Since the index is clamped to [0, 65535], curve must hold
    at least 65536 entries.
4 - luminance points to a 16-byte aligned vector holding the 3 Y factors
    for R, G and B plus a fourth 0 value
5 - rgbg and result are 16-byte aligned float[4] arrays (both are
    accessed with movaps)
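
Assumptions 1 and 2 rely on xmm2 and xmm3 keeping their values between asm statements, which GCC does not promise because those registers are not declared as operands. One possible alternative is to let the compiler allocate registers via the SSE intrinsics in <xmmintrin.h>. The sketch below assumes an x86/x86-64 target with SSE; tone_curve_sse is a hypothetical name, and it keeps aligned loads to match assumptions 4 and 5 (_mm_loadu_ps would drop that requirement).

```c
#include <xmmintrin.h>

/* Intrinsics sketch of the same kernel. The 0 and 65535 vectors are built
   explicitly instead of assuming xmm2/xmm3 were preloaded. */
static void tone_curve_sse(const float *luminance, const float *rgbg,
                           const float *curve, float *result)
{
    const __m128 zero = _mm_setzero_ps();
    const __m128 hi = _mm_set1_ps(65535.f);
    __m128 lum = _mm_load_ps(luminance);            /* Lr, Lg, Lb, 0 */
    __m128 px = _mm_load_ps(rgbg);                  /* R, G, B, G2 */
    __m128 prod = _mm_mul_ps(lum, px);              /* LrR, LgG, LbB, 0 */
    /* Lane 0: LrR + LbB, lane 1: LgG + 0 (movhlps step of the asm). */
    __m128 sum = _mm_add_ps(prod, _mm_movehl_ps(prod, prod));
    /* Lane 0: Y = (LrR + LbB) + LgG (shufps $0x1 + add step). */
    __m128 y = _mm_add_ss(sum, _mm_shuffle_ps(sum, sum, 0x1));
    __m128 yc = _mm_min_ss(_mm_max_ss(y, zero), hi);
    int idx = _mm_cvtss_si32(yc);                   /* same rounding as cvtss2si */
    __m128 yn = _mm_mul_ss(_mm_load_ss(&curve[idx]), hi);
    yn = _mm_min_ss(_mm_max_ss(yn, zero), hi);      /* Y' clamped to [0, 65535] */
    __m128 a = _mm_div_ss(yn, y);                   /* a = Y'/Y, Y == 0 not handled */
    a = _mm_shuffle_ps(a, a, 0x0);                  /* broadcast a to all lanes */
    __m128 out = _mm_min_ps(_mm_max_ps(_mm_mul_ps(a, px), zero), hi);
    _mm_store_ps(result, out);
}
```

The instruction sequence the compiler emits may differ from the hand-written asm, but register allocation and the constant vectors become its responsibility.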

Possible changes:
1 - All alignment requirements can be dropped if necessary: just use the
    unaligned movups instruction instead of movaps.
2 - The SSE division can be avoided if we precompute a curve[Y]/Y table.
    This removes the need to clip the result to [0, 65535] and avoids the
    annoying Y=0 case, which this code does not handle.
    I just didn't know the policy for precomputed tables in the code when
    they're not used on all platforms and the platform is detected at
    runtime.
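
Change 2 could look roughly like the following sketch, assuming a 65536-entry curve. The names CURVE_SIZE, build_gain_table and apply_gain are hypothetical, and gain[0] = 0 is an arbitrary choice for the Y == 0 case (a black pixel stays black). Because the table is indexed by the rounded Y, the gain divides by the index rather than the exact Y, so results can differ slightly from the asm kernel.

```c
#include <stddef.h>

#define CURVE_SIZE 65536  /* hypothetical: one entry per integer Y in [0, 65535] */

static float clamp65535(float v)
{
    if (v < 0.f) v = 0.f;
    if (v > 65535.f) v = 65535.f;
    return v;
}

/* gain[i] = clamp(curve[i] * 65535) / i for i >= 1; gain[0] = 0 by choice. */
static void build_gain_table(const float *curve, float *gain)
{
    gain[0] = 0.f;
    for (size_t i = 1; i < CURVE_SIZE; i++)
        gain[i] = clamp65535(curve[i] * 65535.f) / (float)i;
}

/* Per pixel, one table lookup replaces the clamp of Y' and the division. */
static void apply_gain(const float luminance[4], const float rgbg[4],
                       const float *gain, float result[4])
{
    float y = (luminance[0] * rgbg[0] + luminance[2] * rgbg[2])
            + luminance[1] * rgbg[1];
    int idx = (int)(clamp65535(y) + 0.5f);  /* rounds halves up */
    float a = gain[idx];
    /* Final clamp kept as a precaution; whether it can be dropped
       depends on the range of the input channels. */
    for (int i = 0; i < 4; i++)
        result[i] = clamp65535(a * rgbg[i]);
}
```

Whether such a table fits the project depends on the policy question above; it costs 256 KiB of floats.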

OK, time to get some rest :-) It's now 2 AM.

-- 
Edouard Gomez

_______________________________________________
Rawstudio-dev mailing list
[email protected]
http://rawstudio.org/cgi-bin/mailman/listinfo/rawstudio-dev
