Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On 2002.04.10 19:40 José Fonseca wrote:
> ... since the specs give some tolerance it would be nice to run the conformance tests with different settings in mmx_blend.S, especially the single multiply w/o rounding ...

I've started to play with glean and I tried to check this myself, but it seems there is no effect in the results no matter what changes I make in mmx_blend.S. The command line I use to run glean is:

    LD_LIBRARY_PATH=/home/jfonseca/projects/dri/mesa3d/Mesa/lib ./glean -r mesa

Am I doing something wrong here?

Also, the whole test takes a bunch of time. Which of the tests should I look for? Is it just blendFunc?

José Fonseca

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
José Fonseca wrote:
> On 2002.04.10 19:40 José Fonseca wrote:
>> ... since the specs give some tolerance it would be nice to run the conformance tests with different settings in mmx_blend.S, especially the single multiply w/o rounding ...
>
> I've started to play with glean and I tried to check this myself, but it seems there is no effect in the results no matter what changes I make in mmx_blend.S. The command line I use to run glean is:
>
>     LD_LIBRARY_PATH=/home/jfonseca/projects/dri/mesa3d/Mesa/lib ./glean -r mesa
>
> Am I doing something wrong here?

Hmmm, I had put a printf in the _swrast_choose_blend_func() function to be sure the MMX routine was being chosen, and it was. Are you sure you're compiling Mesa with -DUSE_MMX_ASM? There's also a runtime check for MMX support. When compiled with the DEBUG token defined, Mesa will print a message to stdout to indicate if MMX, SSE and 3DNow are being used.

> Also, the whole test takes a bunch of time. Which of the tests should I look for? Is it just blendFunc?

Yes, just blendFunc is sufficient. I usually run glean like this:

    glean -r res --visuals id==35 -t blendFunc

where '35' is a typical GLX visual. Otherwise all the visuals are tested - which is overkill if you're focused on one particular test case.

-Brian
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On 2002.04.12 14:43 José Fonseca wrote:
> On 2002.04.10 19:40 José Fonseca wrote:
>> ... since the specs give some tolerance it would be nice to run the conformance tests with different settings in mmx_blend.S, especially the single multiply w/o rounding ...
>
> I've started to play with glean and I tried to check this myself, but it seems there is no effect in the results no matter what changes I make in mmx_blend.S. ...

I've come to the conclusion that glean's requirements have been lowered quite a bit! It even passes when making

    s = (p*a + q*(255 - a)) >> 8

without a warning, though it does recognize the lower accuracy when comparing with the default implementation. The only way I managed to make it fail was setting a = 0xf0, to heavily reduce the precision of the results!!

It seems that there is a bug in glean. I'm using the latest CVS. I'll try to run the glperf tests now and see if I can get to the bottom of this issue.

José Fonseca
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On Fri, Apr 12, 2002 at 08:33:17AM -0600, Brian Paul wrote:
> Yes, just blendFunc is sufficient. I usually run glean like this:
>
>     glean -r res --visuals id==35 -t blendFunc
>
> where '35' is a typical GLX visual. Otherwise all the visuals are tested - which is overkill if you're focused on one particular test case.

iirc, there's a bug in tblend.cpp: when it does the check, it doesn't increment ePix, aPix, so some pixels aren't checked. Unless I've not got the latest version, which is possible.

-- 
Michael.
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On Fri, Apr 12, 2002 at 02:43:00PM +0100, José Fonseca wrote:
|
| I've started to play with glean and I tried to check this myself, but it
| seems there is no effect in the results no matter what changes I make in
| mmx_blend.S. ...

The blend test only fails if an error is greater than one least-significant bit in the framebuffer color channels. If I understood your earlier messages correctly, none of the methods you're investigating has more than 1 LSB error; they differ mainly in rounding behavior, which I'd expect to introduce errors on the order of 0.5 LSB.

So it's possible that glean is too lenient to tell the difference between the methods you're testing.

| ...The command line I use to run glean is:
|
| LD_LIBRARY_PATH=/home/jfonseca/projects/dri/mesa3d/Mesa/lib
| ./glean -r mesa
|
| Am I doing something wrong here?
|
| Also, the whole test takes a bunch of time. Which of the tests should I
| look for? Is it just blendFunc?

Sure, just run the test you need, on the Visuals you need. If there are dependencies between tests, glean will automatically run the prerequisite tests first.

You can select the Visual number like Brian suggested, or you can specify a Visual filter string on the glean command line. For casual testing I normally use something like this:

    glean -r mesa --visuals max rgb, z, s, db -t blendFunc

Allen
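To illustrate the tolerance argument above, here is a minimal sketch in C. It is not glean's code; the function names are made up for this illustration. It compares a blend that rounds the division by 255 with one that truncates it, over every 8-bit (p, q, a) triple. Because the two results never differ by more than 1, a test that only fails above 1 LSB cannot tell them apart.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not taken from Mesa or glean. */

/* Blend p over q with alpha a; the division by 255 rounds to nearest. */
static unsigned blend_rounded(unsigned p, unsigned q, unsigned a)
{
    return (p * a + q * (255 - a) + 127) / 255;
}

/* Same blend, but the division by 255 truncates. */
static unsigned blend_truncated(unsigned p, unsigned q, unsigned a)
{
    return (p * a + q * (255 - a)) / 255;
}

/* Largest absolute difference between the two over all 8-bit inputs. */
static unsigned max_rounding_difference(void)
{
    unsigned p, q, a, worst = 0;

    for (p = 0; p <= 255; ++p)
        for (q = 0; q <= 255; ++q)
            for (a = 0; a <= 255; ++a) {
                unsigned r = blend_rounded(p, q, a);
                unsigned t = blend_truncated(p, q, a);
                unsigned d = r > t ? r - t : t - r;
                if (d > worst)
                    worst = d;
            }
    return worst;
}

/* A checker that tolerates 1 LSB accepts either variant. */
static void check_tolerance(void)
{
    assert(max_rounding_difference() <= 1);
}
```

Calling check_tolerance() from a small main() walks all 16777216 combinations. A stricter test would have to compare against the rounded result exactly, which is the trade-off discussed in this thread.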
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On Fri, Apr 12, 2002 at 05:12:33PM +0100, Michael wrote:
|
| iirc, there's a bug in tblend.cpp, when it does the
| check, it doesn't increment ePix, aPix so some pixels aren't checked.

Yep, that's definitely a bug. Why haven't I heard a bug report before now? :-)

Fix is checked in.

Allen
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On 2002.04.12 18:26 Allen Akin wrote:
> On Fri, Apr 12, 2002 at 02:43:00PM +0100, José Fonseca wrote:
> |
> | I've started to play with glean and I tried to check this myself, but it
> | seems there is no effect in the results no matter what changes I make in
> | mmx_blend.S. ...
>
> The blend test only fails if an error is greater than one least-significant bit in the framebuffer color channels. If I understood your earlier messages correctly, none of the methods you're investigating has more than 1 LSB error; they differ mainly in rounding behavior, which I'd expect to introduce errors on the order of 0.5 LSB.

Yes, in general that's true. Although doing (p*a+q*(1-a)) >> 8 can introduce up to 1 LSB error and, worse, it doesn't obey the rule of 255*255 = 255, as 255*255/256 = 254. I know that in Mesa's C blending code this special case of a=255 is always checked, but in the MMX code it isn't, and glean doesn't complain about that.

> So it's possible that glean is too lenient to tell the difference between the methods you're testing.

So how come the Mesa blending code in s_blend.c has comments such as "This is pretty close, but Glean complains", "This is slower but satisfies Glean", and "This satisfies Glean and should be reasonably fast"...? I can only understand these statements if Mesa was being compared to some reference implementation...

> ... You can select the Visual number like Brian suggested, or you can specify a Visual filter string on the glean command line. For casual testing I normally use something like this:
>
>     glean -r mesa --visuals max rgb, z, s, db -t blendFunc

Ok. Thanks!

> Allen

José Fonseca
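The a=255 case mentioned above can be shown with a short C sketch. The function names are hypothetical and this is not the code from Mesa's s_blend.c; it only shows how a shift-by-8 blend loses the opaque end point and how an explicit check on alpha restores it.

```c
#include <assert.h>

/* Hypothetical names; a sketch of the idea, not Mesa's s_blend.c code. */

/* Truncating blend: divides by 256 (a shift) instead of 255. */
static unsigned blend_shift8(unsigned p, unsigned q, unsigned a)
{
    return (p * a + q * (255 - a)) >> 8;
}

/* Same blend, with the two end points of alpha handled explicitly. */
static unsigned blend_shift8_checked(unsigned p, unsigned q, unsigned a)
{
    if (a == 0)
        return q;   /* fully transparent: keep the destination */
    if (a == 255)
        return p;   /* fully opaque: keep the source */
    return blend_shift8(p, q, a);
}

static void check_opaque_white(void)
{
    /* 255*255 = 65025, and 65025 >> 8 = 254, one LSB short of 255. */
    assert(blend_shift8(255, 0, 255) == 254);
    assert(blend_shift8_checked(255, 0, 255) == 255);
}
```

The extra branches cost little in scalar C, but in MMX code they would have to be applied per pixel inside a packed operation, which may be why a branch-free formula is preferable there.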
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On Fri, Apr 12, 2002 at 07:01:08PM +0100, José Fonseca wrote:
| ... Although doing (p*a+q*(1-a)) >> 8 can
| introduce up to 1 LSB error and, worse, it doesn't obey the rule of
| 255*255 = 255, as 255*255/256 = 254. I know that in Mesa's C blending code
| this special case of a=255 is always checked, but in the MMX code it
| isn't, and glean doesn't complain about that.

If the expected value is 255 and the OpenGL implementation yields 254, that's only one LSB of error, so glean probably won't complain about it.

We could make the test more stringent, but then some reasonable implementations (especially some hardware implementations) would fail. Also, maintaining enough accuracy to yield results correct to 1 LSB is already pretty challenging when color channels are deeper than 8 bits.

| So how come the Mesa blending code in s_blend.c has comments such as "This
| is pretty close, but Glean complains", "This is slower but satisfies
| Glean", and "This satisfies Glean and should be reasonably fast"...?

I don't know. It might be related to deep color channels, or possibly to glean tests other than the basic blending tests. Brian might remember.

Thanks for all your good work, by the way!

Allen
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
Allen Akin wrote:
> On Fri, Apr 12, 2002 at 07:01:08PM +0100, José Fonseca wrote:
> | ... Although doing (p*a+q*(1-a)) >> 8 can
> | introduce up to 1 LSB error and, worse, it doesn't obey the rule of
> | 255*255 = 255, as 255*255/256 = 254. I know that in Mesa's C blending code
> | this special case of a=255 is always checked, but in the MMX code it
> | isn't, and glean doesn't complain about that.
>
> If the expected value is 255 and the OpenGL implementation yields 254, that's only one LSB of error, so glean probably won't complain about it.
>
> We could make the test more stringent, but then some reasonable implementations (especially some hardware implementations) would fail. Also, maintaining enough accuracy to yield results correct to 1 LSB is already pretty challenging when color channels are deeper than 8 bits.

This brings up some interesting questions about blending that aren't directly addressed in the OpenGL specification. One might expect that the following identities be true for blending terms:

    1.0 * 1.0 == 1.0
    1.0 * x == x * 1.0 == x
    0.0 * x == x * 0.0 == 0.0

So for 8-bit channels, in fixed point:

    255 * 255 == 255

I can easily imagine cases in which applications would depend on these identities being true (in blending and elsewhere). In fact, I have a vague recollection of someone bringing up this issue a few years ago.

I'd like to see Mesa satisfy the 255*255=255 identity. Is it hard to implement that in the MMX code? If it is, we could let it go for now and see if anyone complains.

> | So how come the Mesa blending code in s_blend.c has comments such as "This
> | is pretty close, but Glean complains", "This is slower but satisfies
> | Glean", and "This satisfies Glean and should be reasonably fast"...?
>
> I don't know. It might be related to deep color channels, or possibly to glean tests other than the basic blending tests. Brian might remember.

It's been at least a year since I touched that code. As far as I can remember the comments are correct. Though I don't remember if it was an issue at 5/6/5 or 8/8/8 color depth, or both. I don't know what else might have changed since then to cause different results with Glean.

> Thanks for all your good work, by the way!

Yes!

-Brian
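The identities listed above can be checked mechanically for 8-bit fixed point. The following C sketch is only an illustration with hypothetical function names; it is neither Mesa code nor a glean test. It tries three ways of approximating c*a/255 that come up in this thread: a plain shift, the "alpha plus one" form, and the geometric series with roundoff.

```c
#include <assert.h>

/* Three ways to scale an 8-bit channel c by an 8-bit factor a (both 0..255).
 * Function names are hypothetical, chosen for this illustration. */

/* Plain shift: divides by 256 instead of 255. */
static unsigned mul_shift8(unsigned c, unsigned a)
{
    return (c * a) >> 8;
}

/* "alpha plus one": (c * (a + 1)) >> 8. */
static unsigned mul_alpha_plus_one(unsigned c, unsigned a)
{
    return (c * (a + 1)) >> 8;
}

/* Two-term geometric series with roundoff. */
static unsigned mul_series_rounded(unsigned c, unsigned a)
{
    unsigned t = c * a + 128;
    return (t + (t >> 8)) >> 8;
}

typedef unsigned (*mul_fn)(unsigned, unsigned);

/* Returns 1 if 1.0*x == x*1.0 == x and 0.0*x == x*0.0 == 0.0 for every x,
 * with 255 standing for 1.0 and 0 for 0.0. */
static int satisfies_identities(mul_fn mul)
{
    unsigned x;

    for (x = 0; x <= 255; ++x) {
        if (mul(255, x) != x || mul(x, 255) != x)
            return 0;
        if (mul(0, x) != 0 || mul(x, 0) != 0)
            return 0;
    }
    return 1;
}

static void check_identities(void)
{
    assert(!satisfies_identities(mul_shift8));
    assert(satisfies_identities(mul_alpha_plus_one));
    assert(satisfies_identities(mul_series_rounded));
}
```

Satisfying the identities says nothing about accuracy in between: the alpha-plus-one form meets them but is still an approximation elsewhere in the range.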
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
Allen Akin wrote:
> If the expected value is 255 and the OpenGL implementation yields 254, that's only one LSB of error, so glean probably won't complain about it.
>
> We could make the test more stringent, but then some reasonable implementations (especially some hardware implementations) would fail. Also, maintaining enough accuracy to yield results correct to 1 LSB is already pretty challenging when color channels are deeper than 8 bits.

Perhaps we need a -pedantic-like command-line option to force more stringent tests? It could be used when testing software implementations and (perhaps) newer hardware implementations, but not used on older cards (I seem to recall the 3dfx cards were a major source of problems here...).

-- Gareth
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On Fri, Apr 12, 2002 at 01:14:30PM -0600, Brian Paul wrote:
|
| One might expect that the following identities be true for blending terms:
|
| 1.0 * 1.0 == 1.0
| 1.0 * x == x * 1.0 == x
| 0.0 * x == x * 0.0 == 0.0
|
| So for 8-bit channels, in fixed point:
|
| 255 * 255 == 255
|
| I can easily imagine cases in which applications would depend on these
| identities being true (in blending and elsewhere).

Sounds like a good candidate for a glean test. Anyone out there want to give it a try?

Allen
Re: [Dri-devel] Mesa MMX blend code finished
> I've finally (hopefully) finished the rewrite of Mesa's MMX blend code.

Is it already in binary snapshots?

Cheers,

Sergey
Re: [Dri-devel] Mesa MMX blend code finished
On 2002.04.10 09:03 Sergey V. Udaltsov wrote:
>> I've finally (hopefully) finished the rewrite of Mesa's MMX blend code.
>
> Is it already in binary snapshots?
>
> Cheers,
>
> Sergey

Nope. It's really a small drop in the ocean so there is no need to rush. I hope Brian will integrate it on the Mesa trunk soon. This way there are fewer places to fix eventual bugs.

Regards,

José Fonseca
Re: [Dri-devel] Mesa MMX blend code finished
José Fonseca wrote:
> On 2002.04.10 09:03 Sergey V. Udaltsov wrote:
>>> I've finally (hopefully) finished the rewrite of Mesa's MMX blend code.
>>
>> Is it already in binary snapshots?
>>
>> Cheers,
>>
>> Sergey
>
> Nope. It's really a small drop in the ocean so there is no need to rush. I hope Brian will integrate it on the Mesa trunk soon. This way there are fewer places to fix eventual bugs.

I've checked it into Mesa CVS, both the trunk and the mesa_4_0_branch in case there's a 4.0.3 release.

-Brian
Re: [Dri-devel] Mesa MMX blend code finished
José,

I've checked in the code after testing with Glean and the OpenGL conformance tests.

Was I supposed to change something in the C code? It passes the conformance tests as-is.

Thanks for your work!

-Brian
Re: [Mesa3d-dev] Re: [Dri-devel] Mesa MMX blend code finished
On 2002.04.10 17:42 Brian Paul wrote:
> José,
>
> I've checked in the code after testing with Glean and the OpenGL conformance tests.

Great.

> Was I supposed to change something in the C code? It passes the conformance tests as-is.

I was surprised that the C code passed the conformance tests, because with the signed arithmetic it doesn't give the same results as before. So I've made a small comparison of the several methods (test program attached):

    // Nathan's method - unsigned 24bit arithmetic
    // NOTE: this was the original Mesa code
    t1 = p*a + q*(255 - a);
    s1 = (t1 + (t1 << 8) + 256) >> 16;

    // Nathan's method - signed 24bit arithmetic (less one multiply)
    // NOTE: this is how I changed it and how it is now
    t2 = (p - q)*a;
    s2 = (t2 + (t2 << 8) + 256) >> 16;
    s2 += q;
    s2 &= 0xff;

    // Blinn's method - unsigned 16bit arithmetic
    // NOTE: is exact
    t3 = p*a + q*(255-a) + 128;
    s3 = (t3 + (t3 >> 8)) >> 8;

    // Blinn's method - signed 16bit arithmetic (less one multiply)
    // NOTE: is exact because the negative sign is considered
    t4 = ((p - q)*a + (p > q ? 128 : -128)) & 0xffff;
    s4 = (t4 + (t4 >> 8)) >> 8;
    s4 += q;
    s4 &= 0xff;

When one compares with the exact result

    // exact result - rounded
    s = (unsigned) (((double)p)*(((double)a)/255.0) + ((double)q)*(1.0-((double)a)/255.0) + 0.5);

one gets:

    1: 8164890 differences in 16777216
    2: 8148697 differences in 16777216
    3: 0 differences in 16777216
    4: 0 differences in 16777216

So in spite of the different results between 1 and 2, 2 gives better results overall!! What happens is that method 1 is aimed at following the truncated results and not the rounded ones. If one compares with the truncated result

    // truncated result
    s = (unsigned) (((double)p)*(((double)a)/255.0) + ((double)q)*(1.0-((double)a)/255.0));

one gets:

    1: 15467 differences in 16777216
    2: 31660 differences in 16777216
    3: 8180357 differences in 16777216
    4: 8180357 differences in 16777216

Notice that, from this point of view, method 2 is indeed worse, but this really doesn't matter because it is the wrong point of view.

This explains why the current C code passes the conformance tests.

At this moment the MMX code implements method 4, which is very fast. There is no point in implementing method 2, in spite of it being a little faster than method 4 (because of the simpler rounding), because it would require 24bit arithmetic instead of 16, so fewer numbers could be multiplied at the same time.

So, contrary to what I thought, there is no need to switch to method 1. When I implement the double blend trick I will have to use method 4, again for the same reasons as above.

But since the specs give some tolerance it would be nice to run the conformance tests with different settings in mmx_blend.S, especially the single multiply w/o rounding, which would give at least a 5% improvement (it will be a little more because it would free some registers, allowing some necessary constants to be kept there). For that it is just necessary to change

    #define GMBT_ROUNDOFF 0

leaving the rest as before

    #define GMBT_ALPHA_PLUS_ONE 0
    #define GMBT_GEOMETRIC_SERIES 1
    #define GMBT_SIGNED_ARITHMETIC 1

Using the alpha+1 method and not using the geometric series would be even faster, but it is already marked in the C code as rejected by glean...

> Thanks for your work!
>
> -Brian

Regards,

José Fonseca

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        unsigned short p, q, a;
        unsigned c1 = 0, c2 = 0, c3 = 0, c4 = 0;

        for (p = 0; p <= 255; ++p)
        for (q = 0; q <= 255; ++q)
        for (a = 0; a <= 255; ++a)
        {
            unsigned s;
            unsigned s1, s2, s3, s4;
            unsigned t1, t2, t3, t4;

    #if 1
            // exact result - rounded
            s = (unsigned) (((double)p)*(((double)a)/255.0) + ((double)q)*(1.0-((double)a)/255.0) + 0.5);
    #else
            // truncated result
            s = (unsigned) (((double)p)*(((double)a)/255.0) + ((double)q)*(1.0-((double)a)/255.0));
    #endif

            // Nathan's method - unsigned 24bit arithmetic
            t1 = p*a + q*(255 - a);
            s1 = (t1 + (t1 << 8) + 256) >> 16;

            // Nathan's method - signed 24bit arithmetic
            t2 = (p - q)*a;
            s2 = (t2 + (t2 << 8) + 256) >> 16;
            s2 += q;
            s2 &= 0xff;

            // Blinn's method - unsigned 16bit arithmetic
            // NOTE: is exact
            t3 = p*a + q*(255-a) + 128;
            s3 = (t3 + (t3 >> 8)) >> 8;

            // Blinn's method - signed 16bit arithmetic
            // NOTE: is exact because the negative sign is considered
            t4 = ((p - q)*a + (p > q ? 128 : -128)) & 0xffff;
            s4 = (t4 + (t4 >> 8)) >> 8;
            s4 += q;
            s4 &= 0xff;

            if (s1 != s) ++c1;
            if (s2 != s) ++c2;
            if (s3 != s) ++c3;
            if (s4 != s) ++c4;

            if (s1 != s || s2 != s || s3 != s || s4 != s)
            {
                // printf("%3ux%3ux%3u:\t(%3u)\t%3u\t%3u\t%3u\t%3u\n", p, a, q, s, s1, s2,
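The exactness claimed for the signed single-multiply variant (method 4 in the message above) can also be checked with integers only. The sketch below is an illustration under stated assumptions, not the attached test program: the function names are hypothetical, and the reference result uses integer rounding, (v + 127) / 255, in place of the double-precision expression.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not the attached test program. */

/* Exact rounded blend, computed with integers only. */
static unsigned blend_exact(unsigned p, unsigned q, unsigned a)
{
    return (p * a + q * (255 - a) + 127) / 255;
}

/* Signed single-multiply variant: the intermediate is masked to 16 bits,
 * and the rounding constant follows the sign of (p - q). */
static unsigned blend_signed16(int p, int q, int a)
{
    unsigned t = (unsigned)((p - q) * a + (p > q ? 128 : -128)) & 0xffff;
    unsigned s = (t + (t >> 8)) >> 8;

    return (s + (unsigned)q) & 0xff;
}

/* Counts the 8-bit (p, q, a) triples where the two results disagree. */
static unsigned long count_mismatches(void)
{
    int p, q, a;
    unsigned long n = 0;

    for (p = 0; p <= 255; ++p)
        for (q = 0; q <= 255; ++q)
            for (a = 0; a <= 255; ++a)
                if (blend_signed16(p, q, a) !=
                    blend_exact((unsigned)p, (unsigned)q, (unsigned)a))
                    ++n;
    return n;
}

static void check_signed_variant(void)
{
    assert(count_mismatches() == 0);
}
```

The key detail is the negative branch: when p <= q the product wraps modulo 65536, and subtracting 128 instead of adding it keeps the rounding symmetric once the final result is masked to 8 bits.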
Re: [Dri-devel] Mesa MMX blend code finished
On 2002.04.10 11:41 Sergey V. Udaltsov wrote:
>> Nope. It's really a small drop in the ocean so there is no need to rush. I hope Brian will integrate it on the Mesa trunk soon. This way there are fewer places to fix eventual bugs.
>
> I see. Actually, AFAIU it would be not exactly a mach64 snapshot but rather a libGL snapshot (since it is about indirect rendering speedup). Am I right?

Not really. It would speed up the indirect rendering and the mach64 when it falls back to software, which doesn't happen so often yet because we're not really striving for OpenGL conformance *yet*.

> Looking forward to hearing some news from the mach64 front.
>
> Regards,
>
> Sergey

José Fonseca
[Dri-devel] Mesa MMX blend code finished
Ufh!! 8-O

I've finally (hopefully) finished the rewrite of Mesa's MMX blend code.

The code is configurable - it allows choosing among several methods for the blending equation. Here are the benchmarks that I made (on a Pentium III 700MHz):

    C code:                                  8.142382 sec
    Old MMX code:                            4.363946 sec
    exact (single multiply w/o rounding):    3.637088 sec  <= fastest that will satisfy glean
    exact (two multiplies w/o rounding):     3.817152 sec
    approx (single multiply w/o rounding):   3.476336 sec
    approx (two multiplies w/o rounding):    3.629378 sec
    approx (single multiply with alpha+1):   3.250325 sec

Attached is the file and the testsuite I used (note that to use the testsuite you need to remove the static and ASSERTs from _mesa_mmx_blend_transparency, since this function is called directly).

I'll eventually make something very similar to this along the lines of the double blend trick in the C code, but it will stay on hold because I feel I have to dedicate myself to mach64 again.

Regards,

José Fonseca

PS: I've CC'd dri-devel because this subject was first raised at the DRI meeting, but I'm not sure if there is real interest since most of its subscribers eventually subscribe to mesa3d-dev as well... or am I wrong?

PPS: I didn't name this thread "Mesa software blending" for the obvious reasons... :-)

    /*
     * Written by José Fonseca [EMAIL PROTECTED]
     */

    #include "matypes.h"

    /*
     * make the following approximation to the division (Sree)
     *
     *   rgb*a/255 ~= (rgb*(a+1)) >> 8
     *
     * which is the fastest method that satisfies the following OpenGL criteria
     *
     *   0*0 = 0 and 255*255 = 255
     *
     * note this one should be used alone
     */
    #define GMBT_ALPHA_PLUS_ONE 0

    /*
     * take the geometric series approximation to the division
     *
     *   t/255 = (t >> 8) + (t >> 16) + (t >> 24) ..
     *
     * in this case just the first two terms to fit in 16bit arithmetic
     *
     *   t/255 ~= (t + (t >> 8)) >> 8
     *
     * note that just by itself it doesn't satisfy the OpenGL criteria, as 255*255 = 254,
     * so the special case a = 255 must be accounted for or roundoff must be used
     */
    #define GMBT_GEOMETRIC_SERIES 1

    /*
     * when using a geometric series division, instead of truncating the result
     * use roundoff in the approximation (Jim Blinn)
     *
     *   t = rgb*a + 0x80
     *
     * achieving the exact results
     */
    #define GMBT_ROUNDOFF 1

    /*
     * do
     *
     *   s = (p - q)*a + q
     *
     * instead of
     *
     *   s = p*a + q*(1-a)
     *
     * this eliminates a multiply at the expense of
     * complicating the roundoff but is generally worth it
     */
    #define GMBT_SIGNED_ARITHMETIC 1

    #if GMBT_ROUNDOFF
        SEG_DATA

    ALIGNDATA8
    const_80:
        D_LONG 0x00800080, 0x00800080
    #endif

        SEG_TEXT

    ALIGNTEXT16
    GLOBL GLNAME(_mesa_mmx_blend_transparency)

    /*
     * void blend_transparency( GLcontext *ctx,
     *                          GLuint n,
     *                          const GLubyte mask[],
     *                          GLchan rgba[][4],
     *                          CONST GLchan dest[][4] )
     *
     * Common transparency blending mode.
     */
    GLNAME( _mesa_mmx_blend_transparency ):

        PUSH_L    ( EBP )
        MOV_L     ( ESP, EBP )
        PUSH_L    ( ESI )
        PUSH_L    ( EDI )
        PUSH_L    ( EBX )

        MOV_L     ( REGOFF(12, EBP), ECX )    /* n */
        CMP_L     ( CONST(0), ECX)
        JE        ( LLBL (GMBT_return) )

        MOV_L     ( REGOFF(16, EBP), EBX )    /* mask */
        MOV_L     ( REGOFF(20, EBP), EDI )    /* rgba */
        MOV_L     ( REGOFF(24, EBP), ESI )    /* dest */

        TEST_L    ( CONST(4), EDI )           /* align rgba on an 8-byte boundary */
        JZ        ( LLBL (GMBT_align_end) )

        CMP_B     ( CONST(0), REGIND(EBX) )   /* *mask == 0 */
        JE        ( LLBL (GMBT_align_continue) )

        PXOR      ( MM0, MM0 )                /* 0x0000 | 0x0000 | 0x0000 | 0x0000 */

        MOVD      ( REGIND(ESI), MM1 )        /*     |     |     |     | qa1 | qb1 | qg1 | qr1 */
        MOVD      ( REGIND(EDI), MM2 )        /*     |     |     |     | pa1 | pb1 | pg1 | pr1 */

        PUNPCKLBW ( MM0, MM1 )                /* qa1 | qb1 | qg1 | qr1 */
        PUNPCKLBW ( MM0, MM2 )                /* pa1 | pb1 | pg1 | pr1 */

        MOVQ      ( MM2, MM3 )

        PUNPCKHWD ( MM3, MM3 )                /* pa1 | pa1 |     |     */
        PUNPCKHDQ ( MM3, MM3 )                /* pa1 | pa1 | pa1 | pa1 */

    #if GMBT_ALPHA_PLUS_ONE
        PCMPEQW   ( MM4, MM4 )                /* 0xffff | 0xffff | 0xffff | 0xffff */

        PSUBW     ( MM4, MM3 )                /* pa1 + 1 | pa1 + 1 | pa1 + 1 | pa1 + 1 */
    #endif

    #if GMBT_SIGNED_ARITHMETIC
        PSUBW     ( MM1, MM2 )                /* pa1 - qa1 | pb1 - qb1 | pg1 - qg1 | pr1 - qr1 */

        PSLLW     ( CONST(8), MM1 )           /* q1 << 8 */

    #if GMBT_ROUNDOFF
        MOVQ      ( MM2, MM4 )
    #endif

        PMULLW    ( MM3, MM2 )                /* t1 = (p1 - q1)*pa1 */

    #if GMBT_ROUNDOFF
        PSRLW     ( CONST(15), MM4 )          /* q1 > p1 ? 1 : 0 */
        PSLLW