Re: R300 swizzle table
Okay, I finally got past a stupid bug (as all bugs are...). So I committed the table to r300, and here is what the swizzle-modified emit_arith looks like (there is some debug code to test swizzling)...

Note that I changed pfs_reg_t so that swizzling is done in emit_arith and not in t_src. This way we can have multiple constants as arguments to emit_arith, and the swizzling then allocates and copies the constants for us (I have to add the 7 native cases to the table for that). If you think I removed an important field from pfs_reg_t, tell me. I am wondering whether we can drop the valid field?

I haven't yet done individual or global negate, but as I said, I think the best solution is to swizzle first and then do a MAD t, -t, 1, 0 with the appropriate write mask. Anyway, once Keith has committed your patch and you have committed your change to r300, I will commit the change to use the table with individual negate support...

Jerome Glisse

typedef struct _pfs_reg_t {
    enum {
        REG_TYPE_INPUT,
        REG_TYPE_OUTPUT,
        REG_TYPE_TEMP,
        REG_TYPE_CONST
    } type:2;
    GLuint index:6;
    GLuint xyzw:12;
    GLuint negate:4;
    GLboolean has_w:1;
    GLboolean valid:1;
} pfs_reg_t;

GLuint swizzle(struct r300_fragment_program *rp, pfs_reg_t swz_src)
{
    GLuint src[3] = { 0, 0, 0 };
    GLuint inst[4] = { 0, 0, 0, 0 };
    GLuint i, xyz, w, j;
    pfs_reg_t tmp;

    switch (swz_src.type) {
    case REG_TYPE_INPUT:
        src[0] = rp->inputs[swz_src.index];
        break;
    case REG_TYPE_TEMP:
        src[0] = rp->temps[swz_src.index];
        src[0] = swz_src.index;
        rp->used_in_node |= (1 << src[0]);
        break;
    case REG_TYPE_CONST:
        src[0] = swz_src.index;
        break;
    default:
        ERROR("invalid source reg\n");
        return 0;
    }

    /* Allocate temp reg for swizzling */
    tmp = get_temp_reg(rp);
    src[1] = tmp.index;

    /* low 9 bits select the xyz swizzle, the next 3 bits the w swizzle */
    xyz = swz_src.xyzw & 511;
    w = (swz_src.xyzw >> 9) & 7;
    printf("w : %d\n", w);

    inst[2] = r300_swz_srca_mask[0][w] |
        (R300_FPI2_ARGA_ONE << R300_FPI2_ARG1A_SHIFT) |
        (R300_FPI2_ARGA_ZERO << R300_FPI2_ARG2A_SHIFT) |
        R300_FPI0_OUTC_MAD;
    inst[3] = src[0] | R300_FPI3_SRC1A_CONST | R300_FPI3_SRC2A_CONST |
        (src[1] << R300_FPI3_DSTA_SHIFT);
    inst[3] |= R300_FPI3_DSTA_REG;

    /* emit every instruction the table lists for this xyz swizzle */
    for (i = 0; i < r300_swizzle[xyz].length; i++) {
        inst[0] = r300_swizzle[xyz].inst[(i << 1)];
        inst[1] = r300_swizzle[xyz].inst[(i << 1) + 1];
        inst[1] |= src[r300_swizzle[xyz].src[i]];
        inst[1] |= src[1] << R300_FPI1_DSTC_SHIFT;
        rp->alu.inst[rp->v_pos].inst0 = inst[0];
        rp->alu.inst[rp->v_pos].inst1 = inst[1];
        rp->alu.inst[rp->s_pos].inst2 = inst[2];
        rp->alu.inst[rp->s_pos].inst3 = inst[3];
        rp->v_pos += 1;
        rp->s_pos += 1;
        j = rp->v_pos > rp->s_pos ? rp->v_pos : rp->s_pos;
        if (j > rp->alu.length) {
            rp->alu.length++;
            rp->node[rp->cur_node].alu_end++;
        }
    }
    return src[1];
}

static void emit_arith(struct r300_fragment_program *rp, int op,
                       pfs_reg_t dest, int mask,
                       pfs_reg_t src0, pfs_reg_t src1, pfs_reg_t src2,
                       int flags)
{
    pfs_reg_t src[3] = { src0, src1, src2 };
    int hwdest, hwsrc[3];
    int argc;
    int v_idx = rp->v_pos, s_idx = rp->s_pos;
    GLuint inst[4] = { 0, 0, 0, 0 };
    GLuint srcc_mask, srca_mask;
    int i;
    pfs_reg_t tt_reg = get_temp_reg(rp);
    GLuint tt_id = tt_reg.index;

    /* check opcode */
    if (op > MAX_PFS_OP) {
        ERROR("unknown opcode!\n");
        return;
    }
    argc = r300_fpop[op].argc;

    /* grab hwregs of sources */
    for (i = 0; i < argc; i++) {
        switch (src[i].type) {
        case REG_TYPE_INPUT:
            hwsrc[i] = rp->inputs[src[i].index];
            break;
        case REG_TYPE_TEMP:
            hwsrc[i] = rp->temps[src[i].index];
            rp->used_in_node |= (1 << hwsrc[i]);
            break;
        case REG_TYPE_CONST:
            hwsrc[i] = src[i].index;
            break;
        default:
            ERROR("invalid source reg\n");
            return;
        }
    }

    /* grab hwregs of dest */
    switch (dest.type) {
    case REG_TYPE_TEMP:
        hwdest =
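The masked-negate idea from the message above (swizzle first, then MAD t, -t, 1, 0 with a write mask) can be illustrated on plain floats. This is only a sketch of the arithmetic, not driver code: vec4, the WRITE_* bits, mad_masked and negate_components are hypothetical names made up for the example and do not exist in r300.

```c
#include <assert.h>

/* Hypothetical illustration types; none of these names come from r300. */
typedef struct { float c[4]; } vec4;   /* c[0..3] = x, y, z, w */

enum { WRITE_X = 1, WRITE_Y = 2, WRITE_Z = 4, WRITE_W = 8 };

/* dst = a * b + c, but only for the components enabled in write_mask.
 * Components outside the mask keep the value they had in dst. */
static vec4 mad_masked(vec4 dst, vec4 a, vec4 b, vec4 c, unsigned write_mask)
{
    assert(write_mask <= 15u);
    for (int i = 0; i < 4; i++)
        if (write_mask & (1u << i))
            dst.c[i] = a.c[i] * b.c[i] + c.c[i];
    return dst;
}

/* Negate only the components selected by negate_mask, expressed as
 * MAD t, -t, 1, 0 with negate_mask used as the write mask. */
static vec4 negate_components(vec4 t, unsigned negate_mask)
{
    const vec4 one  = { { 1.0f, 1.0f, 1.0f, 1.0f } };
    const vec4 zero = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    vec4 neg_t = { { -t.c[0], -t.c[1], -t.c[2], -t.c[3] } };
    return mad_masked(t, neg_t, one, zero, negate_mask);
}
```

Calling negate_components(t, WRITE_Y | WRITE_W) on an already swizzled t flips the sign of y and w only, which is why a single extra instruction is enough whatever the negate pattern is.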
Re: R300 swizzle table
On 5/22/05, Ben Skeggs wrote:
> The reason I was doing swizzling in t_src is that some ARB_f_p opcodes aren't native on r300 and we need to emit multiple instructions to emulate them (see LRP). If one of the sources used a non-native swizzle, we'd waste ALU instructions re-doing the swizzle at each emit. A case where this may be very important is the SIN/COS instructions; a document in the Radeon SDK says that COS is 11 instructions.

I wasn't aware of that, so swizzling is indeed better done in t_src. I will change this (not a big change in terms of code anyway).

> The most important thing missing is the v_cross/s_cross fields. These are used to say that the source swizzle depends on the result of the other instruction stream, i.e. WZYW (v_cross=1), colour instruction depends on result of alpha instruction; XYZX (s_cross=1), alpha insn depends on result of colour instruction; WZYX (v_cross=1, s_cross=1), both depend on the opposite stream. This allows for an extremely primitive form of instruction reordering so that we make use of the split xyz/w units, instead of leaving a whole load of NOPs when an ARB_f_p instruction only writes xyz or w.

I haven't had time to dig into your code enough, so at first I did not see the use of that. Having it is indeed useful... I may have to handle this in the swizzling too (not too hard).

> The valid field comes in useful occasionally when testing some things. The has_w field was only used by my swizzling code to say whether or not the W coord had to be copied over to the resulting swizzle, so you could probably drop that if you don't need it for your code.

I will see; this may be useful: if we don't want w, then I don't emit a NOP instruction in the w stream (right now I am doing useless work in the w stream).

> > I haven't yet done individual or global negate, but as I said, I think the best solution is to swizzle first and then do a MAD t, -t, 1, 0 with the appropriate write mask.
> > Anyway, once Keith has committed your patch and you have committed your change to r300, I will commit the change to use the table with individual negate support...
> Cool. I'll have a closer look at your code when I get home again in 12 or so hours.

Okay, tell me if you find anything.

Jerome Glisse
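The v_cross/s_cross rule quoted above can be written down as a small classifier. This is a sketch under assumptions: the 3-bits-per-component packing matches the xyzw:12 field of pfs_reg_t, but the SWZ_* values, PACK_SWZ and swizzle_cross are hypothetical names chosen for the example, not the encoding the driver actually uses.

```c
#include <assert.h>

/* Assumed component encoding, 3 bits each (hypothetical values). */
enum { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Pack four component selectors into 12 bits: x in bits 0-2, w in bits 9-11. */
#define PACK_SWZ(x, y, z, w) \
    ((unsigned)(x) | ((unsigned)(y) << 3) | ((unsigned)(z) << 6) | ((unsigned)(w) << 9))

typedef struct { int v_cross; int s_cross; } cross_flags;

/* v_cross: the colour (xyz) instruction reads W, so it depends on the
 *          alpha stream.
 * s_cross: the alpha (w) instruction reads X, Y or Z, so it depends on
 *          the colour stream. */
static cross_flags swizzle_cross(unsigned xyzw)
{
    cross_flags f = { 0, 0 };
    assert(xyzw < (1u << 12));

    for (int i = 0; i < 3; i++)
        if (((xyzw >> (3 * i)) & 7u) == SWZ_W)
            f.v_cross = 1;

    unsigned w = (xyzw >> 9) & 7u;
    if (w == SWZ_X || w == SWZ_Y || w == SWZ_Z)
        f.s_cross = 1;

    return f;
}
```

With this encoding, WZYW gives v_cross only, XYZX gives s_cross only, and WZYX gives both, matching the three cases listed in the quoted text. Constant selectors (0 and 1) never set either flag, since they read neither stream.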
Re: R300 swizzle table
Jerome Glisse wrote:
> Okay, I finally got past a stupid bug (as all bugs are...). So I committed the table to r300, and here is what the swizzle-modified emit_arith looks like (there is some debug code to test swizzling)...
> Note that I changed pfs_reg_t so that swizzling is done in emit_arith and not in t_src. This way we can have multiple constants as arguments to emit_arith, and the swizzling then allocates and copies the constants for us (I have to add the 7 native cases to the table for that).

The reason I was doing swizzling in t_src is that some ARB_f_p opcodes aren't native on r300 and we need to emit multiple instructions to emulate them (see LRP). If one of the sources used a non-native swizzle, we'd waste ALU instructions re-doing the swizzle at each emit. A case where this may be very important is the SIN/COS instructions; a document in the Radeon SDK says that COS is 11 instructions. Also, TEX sources can be swizzled. So putting swizzling/negation into t_src made sense in my mind.

> If you think I removed an important field from pfs_reg_t, tell me. I am wondering whether we can drop the valid field?

The most important thing missing is the v_cross/s_cross fields. These are used to say that the source swizzle depends on the result of the other instruction stream, i.e. WZYW (v_cross=1), colour instruction depends on result of alpha instruction; XYZX (s_cross=1), alpha insn depends on result of colour instruction; WZYX (v_cross=1, s_cross=1), both depend on the opposite stream. This allows for an extremely primitive form of instruction reordering so that we make use of the split xyz/w units, instead of leaving a whole load of NOPs when an ARB_f_p instruction only writes xyz or w.

The valid field comes in useful occasionally when testing some things. The has_w field was only used by my swizzling code to say whether or not the W coord had to be copied over to the resulting swizzle, so you could probably drop that if you don't need it for your code.
> I haven't yet done individual or global negate, but as I said, I think the best solution is to swizzle first and then do a MAD t, -t, 1, 0 with the appropriate write mask. Anyway, once Keith has committed your patch and you have committed your change to r300, I will commit the change to use the table with individual negate support...

Cool. I'll have a closer look at your code when I get home again in 12 or so hours.

Cheers,
Ben Skeggs.
Re: R300 swizzle table
On 5/21/05, Ben Skeggs wrote:
> Hello,
> I see what you mean about the tables not taking much space at all :)

The version I sent is not the one I was working on; like Vladimir said, macros come in handy for such things... Right now I am integrating my table with your code. I have to change the pfs_reg struct so that it contains the xyzw swizzle info and no s_swz (I don't recall the name you gave it). I have also added support for 0 and 1 as components: after rereading the spec I see that someone may ask for such a swizzle (a pain for us). So my table is a little bit bigger, 216 entries, but not by much.

One nice thing I came up with for individual negate is the following: you have your swizzled, positive value in t, then you do MAD t, t, -1, 0 with an output mask that selects only the components you want negated. Any other ideas about this negate stuff? A table for each swizzle/negate combination could be generated, but then the table becomes quite big (1728 entries, I think). With this solution you only lose 1 instruction against a totally optimized one, which doesn't sound too bad, as we shouldn't often see a fragment program doing ten thousand swizzles with component negates...

> So, each entry in the table describes the instructions needed to get the desired swizzle? Sounds like a workable idea.

Yes, this is what I meant.

> The diff I posted to dri_devel had some bugs that showed up in UT2004, and there's some missing support in Mesa. I've sent a patch to Keith that he'll hopefully apply, and then UT2004 should render very well. I'll probably commit my code to cvs once the changes have been made to Mesa, and note in the cvs log that the swizzling code will be changed at a later date. Does that sound okay?

Okay, I will probably finish my cleanup and my changes on top of your version, then I will rediff against the latest cvs.

> Also, while I was debugging some problems in ut2004, I noticed that it re-uses the same few programs over and over, but they are translated each time. I'm thinking about adding a cache for the last 5 or so texenv programs so that we don't need to translate them all the time. Should get a nice speedup in the more complex areas. Any thoughts on this?

ut2004 has a bad OpenGL attitude if so (I don't have it, as I don't think there is a PPC Linux version :)). But yes, caching programs could be useful. Moreover, IIRC r300 can have 2 fragment programs in memory? Anyway, the issue here is to find a way to identify a program, or I may be missing something in Mesa that can give us this information :)

Jerome Glisse
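The two table sizes mentioned above (216 and 1728 entries) can be reproduced with a short count. The reading used here is an assumption, although it is consistent with both figures: the table covers the three colour components, each choosing among six sources (x, y, z, w, 0, 1), and the larger table adds one independent negate bit per component. The function names are hypothetical.

```c
#include <assert.h>

/* base raised to exp, for small unsigned values */
static unsigned ipow(unsigned base, unsigned exp)
{
    unsigned r = 1;
    while (exp--)
        r *= base;
    return r;
}

/* Number of distinct swizzles when each of `components` slots can pick
 * any of `choices` sources (x, y, z, w, 0, 1 gives choices = 6). */
static unsigned swizzle_entries(unsigned choices, unsigned components)
{
    assert(choices > 0 && components > 0);
    return ipow(choices, components);
}

/* Same count, with an independent negate bit per slot. */
static unsigned swizzle_negate_entries(unsigned choices, unsigned components)
{
    return swizzle_entries(choices, components) * (1u << components);
}
```

Under that assumption, swizzle_entries(6, 3) is 6*6*6 = 216 and swizzle_negate_entries(6, 3) is 216*8 = 1728, an eightfold growth that the extra MAD avoids at the cost of one instruction.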
Re: R300 swizzle table
On Saturday 21 May 2005 17:42, Jerome Glisse wrote:
> On 5/21/05, Ben Skeggs wrote:
> > Also, while I was debugging some problems in ut2004, I noticed that it re-uses the same few programs over and over, but they are translated each time. I'm thinking about adding a cache for the last 5 or so texenv programs so that we don't need to translate them all the time. Should get a nice speedup in the more complex areas. Any thoughts on this?

That would mean that either ut2004 rewrites different TexEnv settings multiple times between rendering calls, or the Mesa core fails to detect some redundant state setting.

> ut2004 has a bad OpenGL attitude if so (I don't have it, as I don't think there is a PPC Linux version :)). But yes, caching programs could be useful. Moreover, IIRC r300 can have 2 fragment programs in memory?

Well, there are 64 slots for ALU instructions, and it seems to be possible to set pretty arbitrary program start offsets. So you could write two programs' ALU instructions into the chip at the same time, but I don't think you can do the same for TEX instructions, so it has very limited usability.

cu,
Nicolai
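The "cache the last 5 or so texenv programs" idea from the quoted message could look roughly like the sketch below. Everything here is hypothetical: the thread leaves open how a program would be identified, so the 32-bit key stands in for whatever hash of the texenv state ends up being used, program_id stands in for the translated program, and none of the names are Mesa or r300 symbols. A real cache would also compare the full state on a key match to guard against hash collisions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROG_CACHE_SIZE 5   /* "the last 5 or so" programs */

typedef struct {
    uint32_t key;        /* hypothetical hash of the texenv state */
    int      valid;      /* slot holds a translated program */
    int      program_id; /* stand-in for the translated program */
} cache_entry;

typedef struct {
    cache_entry e[PROG_CACHE_SIZE];
    unsigned    next;    /* next slot to overwrite (round robin) */
} prog_cache;

static void cache_init(prog_cache *c)
{
    assert(c != NULL);
    for (int i = 0; i < PROG_CACHE_SIZE; i++)
        c->e[i].valid = 0;
    c->next = 0;
}

/* Return the cached entry for key, or NULL if it must be translated. */
static const cache_entry *cache_lookup(const prog_cache *c, uint32_t key)
{
    for (int i = 0; i < PROG_CACHE_SIZE; i++)
        if (c->e[i].valid && c->e[i].key == key)
            return &c->e[i];
    return NULL;
}

/* Store a freshly translated program, evicting the oldest slot. */
static void cache_insert(prog_cache *c, uint32_t key, int program_id)
{
    cache_entry *slot = &c->e[c->next];
    slot->key = key;
    slot->valid = 1;
    slot->program_id = program_id;
    c->next = (c->next + 1) % PROG_CACHE_SIZE;
}
```

Round-robin replacement is used only because it is the simplest policy that keeps the most recent five entries; if the Mesa core can be made to detect the redundant state changes, as suggested above, such a cache may not be needed at all.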