Re: R300 swizzle table

2005-05-22 Thread Jerome Glisse
Okay, I finally got past a stupid bug (as all bugs are...).
So I committed the table to r300, and here is what swizzle
and the modified emit_arith look like (there is some debug
code to test swizzling)...

Note that I changed pfs_reg_t so that swizzling is done
in emit_arith and not in t_src. This way we can have
multiple constants as args for emit_arith, and swizzling then
allocates and copies the consts for us (I have to add 7 native
cases to the table for that).

If you think that I removed an important field in
pfs_reg, tell me. I am wondering if we can drop
the valid field?

I haven't yet done individual or global neg, but as I said,
I think that the best solution is to first swizzle and then
do a
MAD t, -t, 1, 0 with the appropriate write mask.
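
The masked-MAD negate idea can be modelled on the CPU roughly as follows. This is only a sketch, not driver code: the MASK_* names and the mad_masked/negate_components helpers are made up for illustration, and a register is modelled as four floats.

```c
#include <assert.h>

/* Write-mask bits, one per component (illustrative names, not r300 defines). */
enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8 };

/* dst = a * b + c on the components selected by mask; the rest are untouched. */
static void mad_masked(float dst[4], const float a[4], const float b[4],
                       const float c[4], unsigned mask)
{
    float r[4];
    int i;

    assert(mask <= 0xF);
    /* Compute into a scratch vector first so dst may alias a source. */
    for (i = 0; i < 4; i++)
        r[i] = a[i] * b[i] + c[i];
    for (i = 0; i < 4; i++)
        if (mask & (1u << i))
            dst[i] = r[i];
}

/* Negate only the masked components of t, i.e. MAD t, -t, 1, 0 with a write mask. */
static void negate_components(float t[4], unsigned mask)
{
    static const float one[4]  = { 1.0f, 1.0f, 1.0f, 1.0f };
    static const float zero[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    float neg[4];
    int i;

    for (i = 0; i < 4; i++)
        neg[i] = -t[i];
    mad_masked(t, neg, one, zero, mask);
}
```

Calling negate_components(t, MASK_X | MASK_W) on an already swizzled t flips x and w and leaves y and z alone, at the cost of one extra instruction after the swizzle.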

Anyway, once Keith has committed your patch and you
have committed your change in r300, I will commit the change
to use the table with individual neg support...

Jerome Glisse

typedef struct _pfs_reg_t {
    enum {
        REG_TYPE_INPUT,
        REG_TYPE_OUTPUT,
        REG_TYPE_TEMP,
        REG_TYPE_CONST
    } type:2;
    GLuint index:6;
    GLuint xyzw:12;
    GLuint negate:4;
    GLboolean has_w:1;
    GLboolean valid:1;
} pfs_reg_t;


GLuint swizzle( struct r300_fragment_program *rp,
                pfs_reg_t swz_src )
{
    GLuint src[3] = { 0, 0, 0 };
    GLuint inst[4] = { 0, 0, 0, 0 };
    GLuint i, xyz, w, j;
    pfs_reg_t tmp;

    switch (swz_src.type) {
    case REG_TYPE_INPUT:
        src[0] = rp->inputs[swz_src.index];
        break;
    case REG_TYPE_TEMP:
        src[0] = rp->temps[swz_src.index];
        src[0] = swz_src.index;
        rp->used_in_node |= (1 << src[0]);
        break;
    case REG_TYPE_CONST:
        src[0] = swz_src.index;
        break;
    default:
        ERROR("invalid source reg\n");
        return 0;
    }

    /* Allocate temp reg for swizzling */
    tmp = get_temp_reg(rp);
    src[1] = tmp.index;

    xyz = swz_src.xyzw & 511;
    w = (swz_src.xyzw >> 9) & 7;

    printf("w  : %d\n", w);
    inst[2] = r300_swz_srca_mask[0][w] |
              (R300_FPI2_ARGA_ONE  << R300_FPI2_ARG1A_SHIFT) |
              (R300_FPI2_ARGA_ZERO << R300_FPI2_ARG2A_SHIFT) |
              R300_FPI0_OUTC_MAD;
    inst[3] = src[0] |
              R300_FPI3_SRC1A_CONST |
              R300_FPI3_SRC2A_CONST |
              (src[1] << R300_FPI3_DSTA_SHIFT);
    inst[3] |= R300_FPI3_DSTA_REG;

    for (i = 0; i < r300_swizzle[xyz].length; i++) {
        inst[0]  = r300_swizzle[xyz].inst[(i << 1)];
        inst[1]  = r300_swizzle[xyz].inst[(i << 1) + 1];
        inst[1] |= src[r300_swizzle[xyz].src[i]];
        inst[1] |= src[1] << R300_FPI1_DSTC_SHIFT;

        rp->alu.inst[rp->v_pos].inst0 = inst[0];
        rp->alu.inst[rp->v_pos].inst1 = inst[1];
        rp->alu.inst[rp->s_pos].inst2 = inst[2];
        rp->alu.inst[rp->s_pos].inst3 = inst[3];
        rp->v_pos += 1;
        rp->s_pos += 1;

        j = rp->v_pos > rp->s_pos ? rp->v_pos : rp->s_pos;
        if (j > rp->alu.length) {
            rp->alu.length++;
            rp->node[rp->cur_node].alu_end++;
        }
    }

    return src[1];
}
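
As a rough illustration of how swizzle() splits the 12-bit xyzw field: the low 9 bits pick the colour (xyz) table entry and the top 3 bits pick the alpha (w) source. The sketch below assumes 3 bits per component and the selector numbering shown. Both are assumptions made for illustration; the SEL_* names and helpers are hypothetical and the real encoding is whatever the r300 table code defines.

```c
#include <assert.h>

/* Hypothetical selector codes, 3 bits each (assumed numbering). */
enum { SEL_X, SEL_Y, SEL_Z, SEL_W, SEL_ZERO, SEL_ONE };

/* Pack four selectors into a 12-bit value: x in bits 0-2, w in bits 9-11. */
static unsigned pack_xyzw(unsigned x, unsigned y, unsigned z, unsigned w)
{
    assert(x <= SEL_ONE && y <= SEL_ONE && z <= SEL_ONE && w <= SEL_ONE);
    return x | (y << 3) | (z << 6) | (w << 9);
}

/* Low 9 bits: index used for the colour (xyz) swizzle table. */
static unsigned xyz_part(unsigned xyzw)
{
    return xyzw & 511;
}

/* Top 3 bits: selector used for the alpha (w) stream. */
static unsigned w_part(unsigned xyzw)
{
    return (xyzw >> 9) & 7;
}
```

With this layout, an identity swizzle packed as pack_xyzw(SEL_X, SEL_Y, SEL_Z, SEL_W) splits back into an xyz index and a w selector of SEL_W.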



static void emit_arith( struct r300_fragment_program *rp,
                        int op,
                        pfs_reg_t dest,
                        int mask,
                        pfs_reg_t src0,
                        pfs_reg_t src1,
                        pfs_reg_t src2,
                        int flags )
{
    pfs_reg_t src[3] = { src0, src1, src2 };
    int hwdest, hwsrc[3];
    int argc;
    int v_idx = rp->v_pos, s_idx = rp->s_pos;
    GLuint inst[4] = { 0, 0, 0, 0 };
    GLuint srcc_mask, srca_mask;
    int i;

    pfs_reg_t tt_reg = get_temp_reg(rp);
    GLuint tt_id = tt_reg.index;

    /* check opcode */
    if (op > MAX_PFS_OP) {
        ERROR("unknown opcode!\n");
        return;
    }
    argc = r300_fpop[op].argc;

    /* grab hwregs of sources */
    for (i = 0; i < argc; i++) {
        switch (src[i].type) {
        case REG_TYPE_INPUT:
            hwsrc[i] = rp->inputs[src[i].index];
            break;
        case REG_TYPE_TEMP:
            hwsrc[i] = rp->temps[src[i].index];
            rp->used_in_node |= (1 << hwsrc[i]);
            break;
        case REG_TYPE_CONST:
            hwsrc[i] = src[i].index;
            break;
        default:
            ERROR("invalid source reg\n");
            return;
        }
    }

    /* grab hwregs of dest */
    switch (dest.type) {
    case REG_TYPE_TEMP:
        hwdest =

Re: R300 swizzle table

2005-05-22 Thread Jerome Glisse
On 5/22/05, Ben Skeggs [EMAIL PROTECTED] wrote:
> The reason I was doing swizzling in t_src is that some ARB_f_p opcodes
> aren't native on r300 and we need to emit multiple instructions to
> emulate them (see LRP). If one of the sources used a non-native
> swizzle, we'd waste alu instructions re-doing the swizzle at each
> emit.  A case where this may be very important is the SIN/COS
> instructions, a document in the Radeon SDK says that COS is 11
> instructions..

I wasn't aware of that, so swizzling is better done in t_src. I will
change this (not a big change in terms of code anyway).
 
> The most important thing missing is the v_cross/s_cross fields.  These
> are used to say that the source swizzle depends on the result of the
> other instruction stream.
> ie. WZYW (v_cross=1), colour instruction depends on result of alpha
> instruction, XYZX (s_cross=1), alpha insn depends on result of colour
> instruction.  WZYX (v_cross=1, s_cross=1), both depend on opposite
> stream.
>
> This allows for an extremely primitive form of instruction reordering
> so that we make use of the split xyz/w units, instead of leaving a
> whole load of NOPS when an ARB_f_p instruction only writes xyz or w.

I haven't had time to dig into your code enough, so at first I did
not see the use of that. Having it is indeed useful...
I may have to handle this in swizzling too (not too hard).
 
> The valid field comes in useful occasionally when testing some things.
> The has_w field was only used by my swizzling code to say whether or
> not the W coord had to be copied over to the resulting swizzle, so you
> could probably drop that if you don't need it for your code.

I will see. This may be useful: if we don't want w, then I don't emit
a nop w instruction (right now I am doing useless work in the w stream).

> > I haven't yet done individual or global neg but as i said
> > i think that the best solution is to first swizzle and then
> > do a
> > MAD t, -t, 1, 0 with appropriate write mask.
> >
> > Anyway once Keith commited your patch and you
> > commited your change in r300, i will commit change
> > to use table with individual neg support...
>
> Cool.  I'll have a closer look at your code when I get home again in 12
> or so hours.

Okay, tell me if you find anything.

Jerome Glisse


___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: R300 swizzle table

2005-05-22 Thread Ben Skeggs

Jerome Glisse wrote:

> Okay, I finally got past a stupid bug (as all bugs are...).
> So I committed the table to r300, and here is what swizzle
> and the modified emit_arith look like (there is some debug
> code to test swizzling)...
>
> Note that I changed pfs_reg_t so that swizzling is done
> in emit_arith and not in t_src. This way we can have
> multiple constants as args for emit_arith, and swizzling then
> allocates and copies the consts for us (I have to add 7 native
> cases to the table for that).
 

The reason I was doing swizzling in t_src is that some ARB_f_p opcodes
aren't native on r300 and we need to emit multiple instructions to
emulate them (see LRP). If one of the sources used a non-native
swizzle, we'd waste alu instructions re-doing the swizzle at each
emit.  A case where this may be very important is the SIN/COS
instructions, a document in the Radeon SDK says that COS is 11
instructions..

Also, TEX sources can be swizzled.  So putting swizzling/negation into
t_src made sense in my mind.


> If you think that I removed an important field in
> pfs_reg, tell me. I am wondering if we can drop
> the valid field?
 

The most important thing missing is the v_cross/s_cross fields.  These
are used to say that the source swizzle depends on the result of the
other instruction stream.
ie. WZYW (v_cross=1), colour instruction depends on result of alpha
instruction, XYZX (s_cross=1), alpha insn depends on result of colour
instruction.  WZYX (v_cross=1, s_cross=1), both depend on opposite
stream.

This allows for an extremely primitive form of instruction reordering
so that we make use of the split xyz/w units, instead of leaving a
whole load of NOPS when an ARB_f_p instruction only writes xyz or w.
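
The three examples given for v_cross/s_cross suggest a simple rule, sketched below. This is an interpretation of those examples rather than the actual driver logic: the struct and function names are hypothetical, and the swizzle is written as four letters for readability.

```c
#include <assert.h>

/* Hypothetical holder for the two dependency flags. */
struct cross_flags {
    int v_cross;   /* colour (xyz) stream reads the alpha result  */
    int s_cross;   /* alpha (w) stream reads the colour result    */
};

/* swz holds four selector letters, e.g. "WZYW"; only X, Y, Z, W are handled. */
static struct cross_flags swizzle_cross(const char swz[4])
{
    struct cross_flags f = { 0, 0 };
    int i;

    /* Any of the three colour components sourcing W crosses into alpha. */
    for (i = 0; i < 3; i++)
        if (swz[i] == 'W')
            f.v_cross = 1;

    /* The alpha component sourcing X, Y or Z crosses into colour. */
    if (swz[3] == 'X' || swz[3] == 'Y' || swz[3] == 'Z')
        f.s_cross = 1;

    return f;
}
```

An instruction whose source has neither flag set can be paired freely with work in the other stream, which is what makes the primitive reordering possible.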

The valid field comes in useful occasionally when testing some things.
The has_w field was only used by my swizzling code to say whether or
not the W coord had to be copied over to the resulting swizzle, so you
could probably drop that if you don't need it for your code.


> I haven't yet done individual or global neg, but as I said,
> I think that the best solution is to first swizzle and then
> do a
> MAD t, -t, 1, 0 with the appropriate write mask.
>
> Anyway, once Keith has committed your patch and you
> have committed your change in r300, I will commit the change
> to use the table with individual neg support...

Cool.  I'll have a closer look at your code when I get home again in 12
or so hours.


Cheers,
Ben Skeggs.



Re: R300 swizzle table

2005-05-21 Thread Jerome Glisse
On 5/21/05, Ben Skeggs [EMAIL PROTECTED] wrote:
> Hello,
>
> I see what you mean about the tables not taking much space at all :)

The version I sent is not the one I was working on; as Vladimir said,
macros come in handy for such things... Right now I am integrating my
table with your code. I have to change the pfs_reg struct so that
it contains the xyzw swizzle info and no s_swz (I don't recall the
name you gave it).

I have also added support for 0 and 1 as components; after rereading
the spec, I see that someone may ask for such a swizzle (a pain
for us). Thus my table is a little bit bigger, 216 entries, but not
by much.

One nice thing I came up with for individual negate is the following:
you get your positive swizzle into t, then you do:
MAD t, t, -1, 0 with an output mask selecting only the places where
you want a negated component.

Any other ideas about this negate stuff? A table for
each swizzle/negate combination could be generated, but
then the table becomes quite big (1728 entries, I think).

With this solution you only lose 1 instruction compared with
a totally optimized one, which doesn't sound too bad, as we
shouldn't often have a fragment program doing ten thousand
swizzles with negated components...
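
For reference, the two table sizes mentioned here are consistent with six possible selectors per component (x, y, z, w, 0, 1) over the three colour components, and two negate states per component. That reading is an assumption, as are the helper names; the arithmetic can be checked with a small sketch:

```c
#include <assert.h>

/* Small integer power; the inputs here are tiny, so overflow is not a concern. */
static unsigned ipow(unsigned base, unsigned exp)
{
    unsigned r = 1;

    while (exp--)
        r *= base;
    return r;
}

/* Entries in a table with one row per xyz swizzle. */
static unsigned swizzle_entries(unsigned selectors)
{
    assert(selectors > 0 && selectors <= 16);
    return ipow(selectors, 3);
}

/* Entries if every per-component negate pattern of xyz also gets its own row. */
static unsigned swizzle_negate_entries(unsigned selectors)
{
    return swizzle_entries(selectors) * ipow(2, 3);
}
```

With six selectors this gives 216 rows for the plain table and 1728 rows for the combined one, so the extra MAD trades an eightfold larger table for one instruction per negated source.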

> So, each entry in the table describes the instructions needed to get the
> desired swizzle?  Sounds like a workable idea.

Yes, this is what I meant.
 
> The diff I posted to dri_devel had some bugs that showed up in UT2004, and
> there's some missing support in Mesa.  I've sent a patch to Keith that he'll
> hopefully apply, and then UT2004 should render very well.
>
> I'll probably commit my code to cvs once the changes have been made to
> Mesa, and note in the cvs log that the swizzling code will be changed at
> a later date.  Does that sound okay?

Okay, I will probably finish my cleanup and my changes on top of your
version, then I will rediff against the latest cvs.

> Also, while I was debugging some problems in ut2004, I noticed that it
> re-uses the same few programs over and over, but they are translated
> each time.  I'm thinking about adding a cache for the last 5 or so
> texenv programs so that we don't need to translate all the time.
> Should get a nice speedup in the more complex areas.  Any thoughts on
> this?

ut2004 has a bad ogl attitude if so (I don't have it, as I don't think
there is a PPC linux version :)). But yes, caching programs could be
useful. Moreover, IIRC r300 can have 2 fragment programs in memory?

Anyway, the issue here is to find a way to identify a program, or I may
be missing something in mesa that can give us this information :)
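
One way the "last 5 or so programs" cache could look is sketched below. It sidesteps the open question of how to identify a program by assuming a fixed-size key (for example a snapshot or hash of the relevant texenv state) is available. The key size, the struct layout and all names here are hypothetical, not Mesa or r300 API.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SLOTS 5    /* "last 5 or so" programs */
#define KEY_BYTES   16   /* hypothetical size of the identifying key */

struct prog_cache_entry {
    int valid;
    unsigned last_used;              /* logical timestamp for LRU eviction */
    unsigned char key[KEY_BYTES];
    int program_id;                  /* stands in for the translated program */
};

struct prog_cache {
    struct prog_cache_entry slot[CACHE_SLOTS];
    unsigned clock;                  /* wrap-around is ignored in this sketch */
};

/* Return the cached program id for key, or -1 on a miss. */
static int cache_lookup(struct prog_cache *c, const unsigned char *key)
{
    int i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        struct prog_cache_entry *e = &c->slot[i];
        if (e->valid && memcmp(e->key, key, KEY_BYTES) == 0) {
            e->last_used = ++c->clock;
            return e->program_id;
        }
    }
    return -1;
}

/* Store a program, reusing a free slot or evicting the least recently used.
 * Callers are expected to try cache_lookup() first. */
static void cache_insert(struct prog_cache *c, const unsigned char *key,
                         int program_id)
{
    struct prog_cache_entry *victim = &c->slot[0];
    int i;

    assert(program_id >= 0);
    for (i = 0; i < CACHE_SLOTS; i++) {
        struct prog_cache_entry *e = &c->slot[i];
        if (!e->valid) {
            victim = e;
            break;
        }
        if (e->last_used < victim->last_used)
            victim = e;
    }
    victim->valid = 1;
    victim->last_used = ++c->clock;
    memcpy(victim->key, key, KEY_BYTES);
    victim->program_id = program_id;
}
```

A zero-initialised struct prog_cache is an empty cache. On a miss the driver would translate the program as it does today and then call cache_insert().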

Jerome Glisse




Re: R300 swizzle table

2005-05-21 Thread Nicolai Haehnle
On Saturday 21 May 2005 17:42, Jerome Glisse wrote:
> On 5/21/05, Ben Skeggs [EMAIL PROTECTED] wrote:
> > Also, while I was debugging some problems in ut2004, I noticed that it
> > re-uses the same few programs over and over, but they are translated
> > each time.  I'm thinking about adding a cache for the last 5 or so
> > texenv programs so that we don't need to translated all the time.
> > Should get a nice speedup in the more complex areas.  Any thoughts on
> > this?

That would mean that either ut2004 rewrites different TexEnv settings 
multiple times between rendering calls, or the Mesa core fails to detect 
some redundant state setting.

> ut2004 has a bad ogl attitude if so,  (don't have it as i don't think
> there is a PPC linux version :)) But yes caching program could be
> usefull. Moreover IIRC r300 can have 2 fragment program in memory ?

Well, there are 64 slots for ALU instructions, and it seems to be possible 
to set pretty arbitrary program start offsets. So you could write two 
programs' ALU instructions into the chip at the same time, but I don't 
think you can do the same for TEX instructions, so it has very limited 
usability.

cu,
Nicolai

