On Wednesday, 1 June 2016 at 23:23:49 UTC, ZILtoid1991 wrote:
Here's the assembly code for my alpha-blending routine:
ubyte[4] src = *cast(ubyte[4]*)(palette.ptr + 4 * *c);
ubyte[4] *p = cast(ubyte[4]*)(workpad + (offsetX + x)*4 + offsetY);
asm{    //moving the values to their destinations
movd    MM0, p;
movd    MM1, src;
movq    MM5, alpha;
movq    MM7, alphaMMXmul_const1;
movq    MM6, alphaMMXmul_const2;
                                                                        
punpcklbw       MM2, MM0;
punpcklbw       MM3, MM1;

paddw   MM6, MM5;       //1 + alpha
psubw   MM7, MM5;       //256 - alpha

pmulhuw MM2, MM6;       //src * (1 + alpha)
pmulhuw MM3, MM7;       //dest * (256 - alpha)
paddw   MM3, MM2;       //(src * (1 + alpha)) + (dest * (256 - alpha))
psrlw MM3, 8; //((src * (1 + alpha)) + (dest * (256 - alpha))) / 256
                                                                        
//moving the result to its place;
                                                                        
packuswb        MM4, MM3;
movd    p, MM4;
emms;
}

The two constants referred to here:
static immutable ushort[4] alphaMMXmul_const1 = [256,256,256,256];
static immutable ushort[4] alphaMMXmul_const2 = [1,1,1,1];

alpha is a ushort[4] containing the alpha value four times.
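
For illustration, one way to fill such an array is sketched below. The helper name splatAlpha is hypothetical and not part of the original code; it only shows the layout that movq MM5, alpha expects, with one 16-bit copy of the alpha value per lane.

```d
// Hypothetical helper (not from the original post): replicate one 8-bit
// alpha value into the four 16-bit lanes that movq loads into an MMX register.
ushort[4] splatAlpha(ubyte a) pure nothrow @nogc @safe
{
    ushort[4] lanes;
    lanes[] = a;    // slice assignment writes the same value to every element
    return lanes;
}
```

With this sketch, splatAlpha(128) yields the same contents as writing [128, 128, 128, 128] by hand.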

After some debugging, I found that the p pointer ends up null instead of pointing to a value. I have no experience with inline assemblers (although I have written a few Hello World programs for MS-DOS with a stand-alone assembler), so I don't know when and how the compiler interprets D types.

Problem solved. Current assembly code:

asm{
                                                                        
//moving the values to their destinations
mov             EBX, p[EBP];
movd    MM0, src;
movd    MM1, [EBX];

movq    MM5, alpha;                     
movq    MM7, alphaMMXmul_const256;
movq    MM6, alphaMMXmul_const1;
pxor    MM2, MM2;
punpcklbw       MM0, MM2;
punpcklbw       MM1, MM2;

paddusw MM6, MM5;       //1 + alpha
psubusw MM7, MM5;       //256 - alpha

pmullw  MM0, MM6;       //src * (1 + alpha)
pmullw  MM1, MM7;       //dest * (256 - alpha)
paddusw MM0, MM1;       //(src * (1 + alpha)) + (dest * (256 - alpha))
psrlw MM0, 8; //((src * (1 + alpha)) + (dest * (256 - alpha))) / 256
                                                                        
//moving the result to its place;
//pxor  MM2, MM2;
packuswb        MM0, MM2;

movd    [EBX], MM0;

emms;
}
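
Comparing the two listings, the working version reads and writes the pixel through EBX, zeroes MM2 before unpacking, and uses pmullw where the first version used pmulhuw. The multiply change can be modelled in plain D, as a rough sketch: pmulhuw keeps the upper 16 bits of each 32-bit product, pmullw keeps the lower 16 bits. The function names mulHigh and mulLow are made up for this example.

```d
// Scalar models of a single 16-bit lane; names are illustrative only.

// Like pmulhuw: unsigned multiply, keep the upper 16 bits of the product.
ushort mulHigh(ushort a, ushort b) pure nothrow @nogc @safe
{
    return cast(ushort)((cast(uint) a * b) >> 16);
}

// Like pmullw: multiply, keep the lower 16 bits of the product.
ushort mulLow(ushort a, ushort b) pure nothrow @nogc @safe
{
    return cast(ushort)((cast(uint) a * b) & 0xFFFF);
}
```

An 8-bit channel times a factor of at most 256 is at most 255 * 256 = 65280, which fits in 16 bits. The upper half of such a product is therefore always zero, and the lower half holds the whole result.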
The actual problem was the poor documentation of MMX instructions, since MMX never really caught on, together with the disappearance of assembly programming from the mainstream. The end result is a quick alpha-blending routine with barely any performance penalty compared to just copying the pixels. I currently have no plans to translate the whole sprite-displaying algorithm to assembly; instead, I'll work on the editor for the game engine.
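
For checking the MMX routine against known values, a scalar version of the formula from the code comments may be handy. This is only a sketch: blendChannel and blendPixel are hypothetical names, and it assumes the same alpha is applied to all four bytes of the pixel, as the ushort[4] alpha array does.

```d
// Scalar reference: ((src * (1 + alpha)) + (dest * (256 - alpha))) / 256.
// Names are illustrative and not part of the original engine code.
ubyte blendChannel(ubyte src, ubyte dest, ubyte alpha) pure nothrow @nogc @safe
{
    // The operands are promoted to int; the largest possible sum is
    // 255 * 257 = 65535, so it would also fit in one 16-bit MMX lane.
    uint sum = src * (1 + alpha) + dest * (256 - alpha);
    return cast(ubyte)(sum >> 8);
}

// Applies the same alpha to all four bytes of a pixel.
ubyte[4] blendPixel(ubyte[4] src, ubyte[4] dest, ubyte alpha) pure nothrow @nogc @safe
{
    ubyte[4] result;
    foreach (i; 0 .. 4)
        result[i] = blendChannel(src[i], dest[i], alpha);
    return result;
}
```

With alpha = 255 the result equals src, and with alpha = 0 it equals dest, which gives two easy cases to compare against the assembly output.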
