On Wednesday, 1 June 2016 at 23:23:49 UTC, ZILtoid1991 wrote:
Here's the assembly code for my alpha-blending routine:
ubyte[4] src = *cast(ubyte[4]*)(palette.ptr + 4 * *c);
ubyte[4] *p = cast(ubyte[4]*)(workpad + (offsetX + x)*4 + offsetY);
asm{ //moving the values to their destinations
movd MM0, p;
movd MM1, src;
movq MM5, alpha;
movq MM7, alphaMMXmul_const1;
movq MM6, alphaMMXmul_const2;
punpcklbw MM2, MM0;
punpcklbw MM3, MM1;
paddw MM6, MM5; //1 + alpha
psubw MM7, MM5; //256 - alpha
pmulhuw MM2, MM6; //src * (1 + alpha)
pmulhuw MM3, MM7; //dest * (256 - alpha)
paddw MM3, MM2; //(src * (1 + alpha)) + (dest * (256 - alpha))
psrlw MM3, 8; //((src * (1 + alpha)) + (dest * (256 - alpha))) / 256
//moving the result to its place;
packuswb MM4, MM3;
movd p, MM4;
emms;
}
The two constants referred to here:
static immutable ushort[4] alphaMMXmul_const1 = [256,256,256,256];
static immutable ushort[4] alphaMMXmul_const2 = [1,1,1,1];
alpha is a ushort[4] containing the alpha value four times.
After some debugging, I found that the pointer p is null at the
end instead of pointing to a value. I have no experience with
inline assemblers (although I have written a few Hello World
programs for MS-DOS with a stand-alone assembler), so I don't
know when and how the compiler interprets the D types.
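One plausible reading of the null pointer, offered as a sketch rather than a confirmed diagnosis: in D's inline assembler a local variable used as an operand names that variable's own storage, so `movd p, MM4` writes over the pointer p itself rather than the pixel it points to. In addition, `packuswb` places the packed words of its destination operand in the low four bytes, and MM4 is never loaded in the listing above, so those four bytes are whatever MM4 happened to hold (zero is an assumption). The hypothetical helper `packuswbModel` below models the instruction in plain D to show the operand order.

```d
// Scalar model of MMX packuswb (hypothetical helper, not part of the routine).
// Eight signed words are saturated to unsigned bytes.
ubyte[] packuswbModel(short[4] dst, short[4] src)
{
    // Clamp a signed word to the range 0 .. 255.
    ubyte sat(short v)
    {
        if (v < 0) return 0;
        if (v > 255) return 255;
        return cast(ubyte) v;
    }

    ubyte[] r = new ubyte[8];
    foreach (i; 0 .. 4)
    {
        r[i] = sat(dst[i]);     // low four bytes: destination operand
        r[i + 4] = sat(src[i]); // high four bytes: source operand
    }
    return r;
}
```

With a zeroed destination, `packuswbModel([0, 0, 0, 0], [10, 300, -5, 255])` leaves the low four bytes at zero and puts the saturated values 10, 255, 0, 255 in the high four, which are the bytes a following `movd` does not store.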
Problem solved. Current assembly code:
asm{
//moving the values to their destinations
mov EBX, p[EBP];
movd MM0, src;
movd MM1, [EBX];
movq MM5, alpha;
movq MM7, alphaMMXmul_const256;
movq MM6, alphaMMXmul_const1;
pxor MM2, MM2;
punpcklbw MM0, MM2;
punpcklbw MM1, MM2;
paddusw MM6, MM5; //1 + alpha
psubusw MM7, MM5; //256 - alpha
pmullw MM0, MM6; //src * (1 + alpha)
pmullw MM1, MM7; //dest * (256 - alpha)
paddusw MM0, MM1; //(src * (1 + alpha)) + (dest * (256 - alpha))
psrlw MM0, 8; //((src * (1 + alpha)) + (dest * (256 - alpha))) / 256
//moving the result to its place;
//pxor MM2, MM2;
packuswb MM0, MM2;
movd [EBX], MM0;
emms;
}
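For checking the MMX version against something simpler, here is a minimal scalar sketch of the same per-channel formula from the comments above. The names `blendChannel` and `blendPixel` are hypothetical, and a single 8-bit alpha shared by all four channels is an assumption (the routine itself takes alpha as a ushort[4]).

```d
// Scalar reference for one 8-bit channel, following the formula in the
// asm comments: (src * (1 + alpha) + dest * (256 - alpha)) / 256.
ubyte blendChannel(ubyte src, ubyte dest, ubyte alpha)
{
    // The sum is at most 255 * 257 = 65535, so it fits in 16 bits,
    // which is why the word-sized MMX lanes are wide enough.
    uint sum = src * (1u + alpha) + dest * (256u - alpha);
    return cast(ubyte)(sum >> 8);
}

// Blend one four-byte pixel, using the same alpha for every channel
// (an assumption made for this sketch).
ubyte[4] blendPixel(ubyte[4] src, ubyte[4] dest, ubyte alpha)
{
    ubyte[4] result;
    foreach (i; 0 .. 4)
        result[i] = blendChannel(src[i], dest[i], alpha);
    return result;
}
```

With this formula an alpha of 255 returns src unchanged and an alpha of 0 returns dest unchanged; for example, `blendChannel(200, 100, 128)` gives 150.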
The actual problem was the poor documentation of the MMX
instructions, since MMX never really caught on, together with the
disappearance of assembly programming from the mainstream. The end
result is a quick alpha-blending routine that has barely any
performance penalty compared to just copying the pixels. I
currently have no plans to translate the whole sprite-display
algorithm to assembly; instead I'll work on the editor for the
game engine.