On Friday, 06.01.2006, at 17:22 +0100, Rune Petersen wrote:
> - Missing Commit from r300.sf.net:
> When trying to implement the KIL ops I found a commit by Ben Skeggs on 
> r300.sf.net that was lost in the Mesa tree:
> 
> http://sourceforge.net/mailarchive/forum.php?thread_id=7728162&forum_id=42268
> 
> At the very least the changes for r300_reg.h should be included in Mesa.
That commit fixed quite a few issues and added all the remaining
opcodes except the trig opcodes and LIT.  There are a few things I'd
like to clean up in that code soon, though.  Also, the depth-write
support in there didn't work at all, if I recall correctly.

> 
> - What's with the DP3 op?
> 
>       if (fpi->DstReg.WriteMask & WRITEMASK_W) {
>               /* I assume these need to share the same alu slot */
>               sync_streams(rp);
>               emit_arith(rp, PFS_OP_DP4, dest, WRITEMASK_W,
>                       pfs_zero, pfs_zero, pfs_zero,
>                       flags);
>       }
>       emit_arith(rp, PFS_OP_DP3, t_dst(rp, fpi->DstReg),
>               fpi->DstReg.WriteMask & WRITEMASK_XYZ,
>               t_src(rp, fpi->SrcReg[0]),
>               t_src(rp, fpi->SrcReg[1]),
>               pfs_zero, flags);
Okay, if I recall this properly, it would appear to be correct.  Though,
some other parts of the code in Mesa CVS may cause it to fail in some
circumstances.  In my tests, the code from r300.sf.net CVS seemed to
produce the correct output for DP3.

> 
> Why is DP4 called for W, and why does DP3 exclude W?
The programmable fragment unit on r300 is split into two separate units.
One of them performs operations on the XYZ components of a register, the
other on the W component.  However, there needs to be interaction
between the two units.  An ALU instruction looks roughly like this:

        FPI0: XYZ opcode + input swizzling
        FPI1: XYZ register selection (inputs and output)
        FPI2: W opcode + input swizzling
        FPI3: W register selection (inputs and output)

So, when you tell r300 to perform a DP4 operation in the W unit, it
takes the XYZ components from the registers mentioned in FPI1, and the W
component from the registers mentioned in FPI3.

Thus, you get the following for the DP4 in the code fragment above:
    dot = (X from FPI1) + (Y from FPI1) + (Z from FPI1) + (W from FPI3)
        = (X from FPI1) + (Y from FPI1) + (Z from FPI1) + (0 * 0)
                        .. because all FPI3 is pfs_zero...^^^^^^^
        = the same result as the DP3 in the XYZ unit.

That's also why the DP3 and DP4 are forced into the same ALU slot as per
the comment.  It made register selection a little easier to deal with.
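
To make the arithmetic concrete, here is a rough software model of the two
units as I understand them.  It is only a sketch: the vec4 type and the
function names are made up for illustration and do not exist in the driver,
and the real hardware may differ in details such as rounding.

```c
#include <assert.h>

/* Hypothetical four-component vector; not a type from the r300 driver. */
typedef struct { float x, y, z, w; } vec4;

/* XYZ unit running DP3: only the FPI1 (XYZ) source registers contribute. */
static float xyz_unit_dp3(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* W unit running DP4: the XYZ products come from the FPI1 registers,
 * the W product comes from the FPI3 registers. */
static float w_unit_dp4(vec4 fpi1_a, vec4 fpi1_b, float fpi3_a_w, float fpi3_b_w)
{
    return fpi1_a.x * fpi1_b.x + fpi1_a.y * fpi1_b.y + fpi1_a.z * fpi1_b.z
         + fpi3_a_w * fpi3_b_w;
}

/* Model of the two emit_arith() calls sharing one ALU slot: DP3 fills XYZ,
 * and DP4 with both W inputs set to zero (pfs_zero) fills W. */
static vec4 emulate_dp3(vec4 a, vec4 b)
{
    float xyz = xyz_unit_dp3(a, b);
    float w = w_unit_dp4(a, b, 0.0f, 0.0f);
    vec4 r = { xyz, xyz, xyz, w };
    return r;
}

/* Small self-check: W must carry the same dot product as XYZ. */
static void check_dp3_broadcast(void)
{
    vec4 a = { 1.0f, 2.0f, 3.0f, 4.0f };
    vec4 b = { 5.0f, 6.0f, 7.0f, 8.0f };
    vec4 r = emulate_dp3(a, b);
    assert(r.x == r.w);
}
```

Calling emulate_dp3() with any two vectors should give the same value in all
four components, because the W term added by the DP4 is 0 * 0.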

Sorry for the bad explanation; some of it is possibly incorrect, as
it's been a while since I looked at it.  For more information, check
out the comments in r300_reg.h above the fragment program stuff; they
explain this in a little more depth.

Cheers,
Ben Skeggs.

> I don't see how it can conform to the specs:
> 
>        tmp0 = VectorLoad(op0);
>        tmp1 = VectorLoad(op1);
>        dot = (tmp0.x * tmp1.x) + (tmp0.y * tmp1.y) + (tmp0.z * tmp1.z);
>        result.x = dot;
>        result.y = dot;
>        result.z = dot;
>        result.w = dot;

