Re: [PATCH] [r300] Fix reordering of fragment program instructions and register allocation

2007-03-18 Thread Nicolai Haehnle

I just realized I didn't send it to the list:

There was yet another problem with reordering of instructions. The
attached patch (which is against my earlier patch) should fix this.

~Nicolai


On 3/18/07, Oliver McFadden [EMAIL PROTECTED] wrote:

> Another thought; the same changes are probably needed for the vertprog code. I
> think there are also a lot of bugs there.
>
>
> On 3/18/07, Oliver McFadden [EMAIL PROTECTED] wrote:
> > This patch seems to break one of my longer fragment programs. I believe this
> > is because it's running out of registers, but I haven't looked into it in
> > detail yet.
> >
> > I think this patch should be committed, but directly followed by a patch to
> > reduce the number of registers used.
> >
> >
> > On 3/18/07, Nicolai Haehnle [EMAIL PROTECTED] wrote:
> > > There were a number of bugs related to the pairing of vector and
> > > scalar operations where swizzles ended up using the wrong source
> > > register, or an instruction was moved forward and ended up overwriting
> > > an aliased register.
> > >
> > > The new algorithm for register allocation is slightly conservative and
> > > may run out of registers before it's strictly necessary. On the plus
> > > side, it Just Works.
> > >
> > > Pairing of instructions is done whenever possible, and in more cases
> > > than before, so in practice this change should be a net win.
> > >
> > > The patch mostly fixes glean/texCombine. One remaining problem is that
> > > the code duplicates constants and parameters all over the place and
> > > therefore quickly runs out of resources and falls back to software.
> > > I'm going to look into that as well.
> > >
> > > Please test and commit this patch. If you notice any regressions,
> > > please tell me (but the tests are looking good).
> > >
> > > ~Nicolai
 


commit 1ec4703585171f504180425b65dfab92be2a7782
Author: Nicolai Haehnle [EMAIL PROTECTED]
Date:   Sun Mar 18 13:29:18 2007 +0100

r300: Fix fragment program reordering

Do not move an instruction that writes to a temp forward past an instruction
that reads the same temporary.
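
The rule in the commit message can be illustrated with a small standalone sketch. It is not the driver code: the struct, the mask values and the helper names below are simplified stand-ins invented for illustration, loosely following the vector_lastread/scalar_lastread fields that the patch adds to struct reg_lifetime.

```c
#include <assert.h>

/* Hypothetical write-mask bits for this sketch (not Mesa's WRITEMASK_* values). */
#define MASK_XYZ 0x7
#define MASK_W   0x8

/* Simplified per-register tracking, loosely modelled on struct reg_lifetime. */
struct temp_track {
	int reserved;        /* first slot in which the register may be written */
	int vector_lastread; /* last slot that read the xyz part */
	int scalar_lastread; /* last slot that read the w part */
};

/* Record that the instruction placed in slot `pos` reads the parts in `mask`. */
static void note_read(struct temp_track *t, int mask, int pos)
{
	assert(pos >= 0);
	if ((mask & MASK_XYZ) && t->vector_lastread < pos)
		t->vector_lastread = pos;
	if ((mask & MASK_W) && t->scalar_lastread < pos)
		t->scalar_lastread = pos;
}

/* Earliest slot in which a write covering `mask` may be placed
 * without clobbering a value that an earlier slot still reads. */
static int earliest_write(const struct temp_track *t, int mask)
{
	int pos = t->reserved;

	if ((mask & MASK_XYZ) && pos < t->vector_lastread)
		pos = t->vector_lastread;
	if ((mask & MASK_W) && pos < t->scalar_lastread)
		pos = t->scalar_lastread;
	return pos;
}
```

In this sketch a write to the w component is still free to move ahead of a slot that only reads xyz, which is why the two halves are tracked separately.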

diff --git a/src/mesa/drivers/dri/r300/r300_context.h b/src/mesa/drivers/dri/r300/r300_context.h
index bc43953..29436ab 100644
--- a/src/mesa/drivers/dri/r300/r300_context.h
+++ b/src/mesa/drivers/dri/r300/r300_context.h
@@ -674,6 +674,11 @@ struct reg_lifetime {
 	   emitted instruction that writes to the register */
 	int vector_valid;
 	int scalar_valid;
+	
+	/* Index to the slot where the register was last read.
+	   This is also the first slot in which the register may be written again */
+	int vector_lastread;
+	int scalar_lastread;
 };
 
 
diff --git a/src/mesa/drivers/dri/r300/r300_fragprog.c b/src/mesa/drivers/dri/r300/r300_fragprog.c
index 3c54830..89e9f65 100644
--- a/src/mesa/drivers/dri/r300/r300_fragprog.c
+++ b/src/mesa/drivers/dri/r300/r300_fragprog.c
@@ -1026,10 +1026,11 @@ static void emit_tex(struct r300_fragment_program *rp,
  */
 static int get_earliest_allowed_write(
 		struct r300_fragment_program* rp,
-		GLuint dest)
+		GLuint dest, int mask)
 {
 	COMPILE_STATE;
 	int idx;
+	int pos;
 	GLuint index = REG_GET_INDEX(dest);
 	assert(REG_GET_VALID(dest));
 
@@ -1047,7 +1048,17 @@ static int get_earliest_allowed_write(
 			return 0;
 	}
 	
-	return cs->hwtemps[idx].reserved;
+	pos = cs->hwtemps[idx].reserved;
+	if (mask & WRITEMASK_XYZ) {
+		if (pos < cs->hwtemps[idx].vector_lastread)
+			pos = cs->hwtemps[idx].vector_lastread;
+	}
+	if (mask & WRITEMASK_W) {
+		if (pos < cs->hwtemps[idx].scalar_lastread)
+			pos = cs->hwtemps[idx].scalar_lastread;
+	}
+	
+	return pos;
 }
 
 
@@ -1070,7 +1081,8 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
 		GLboolean emit_sop,
 		int argc,
 		GLuint* src,
-		GLuint dest)
+		GLuint dest,
+		int mask)
 {
 	COMPILE_STATE;
 	int hwsrc[3];
@@ -1092,7 +1104,7 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
 	if (emit_sop)
 		used |= SLOT_OP_SCALAR;
 	
-	pos = get_earliest_allowed_write(rp, dest);
+	pos = get_earliest_allowed_write(rp, dest, mask);
 	
 	if (rp->node[rp->cur_node].alu_offset > pos)
 		pos = rp->node[rp->cur_node].alu_offset;
@@ -1191,6 +1203,21 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
 		cs->slot[pos].ssrc[i] = tempssrc[i];
 	}
 	
+	for(i = 0; i < argc; ++i) {
+		if (REG_GET_TYPE(src[i]) == REG_TYPE_TEMP) {
+			int regnr = hwsrc[i] & 31;
+			
+			if (used & (SLOT_SRC_VECTOR << i)) {
+				if (cs->hwtemps[regnr].vector_lastread < pos)
+					cs->hwtemps[regnr].vector_lastread = pos;
+			}
+			if (used & (SLOT_SRC_SCALAR << i)) {
+				if (cs->hwtemps[regnr].scalar_lastread < pos)
+					cs->hwtemps[regnr].scalar_lastread = pos;
+			}
+		}
+	}
+	
 	// Emit the source fetch code
 	rp->alu.inst[pos].inst1 &= ~R300_FPI1_SRC_MASK;
 	rp->alu.inst[pos].inst1 |=
@@ -1287,7 +1314,7 @@ static void emit_arith(struct r300_fragment_program *rp,
 	if ((mask & WRITEMASK_W) || vop == R300_FPI0_OUTC_REPL_ALPHA)
 		emit_sop = GL_TRUE;
 
-	pos = find_and_prepare_slot(rp, emit_vop, emit_sop, argc, src, dest);
+	pos = find_and_prepare_slot(rp, emit_vop, emit_sop, argc, src, dest, 

Re: [PATCH] [r300] Fix reordering of fragment program instructions and register allocation

2007-03-18 Thread Oliver McFadden
On 3/18/07, Nicolai Haehnle [EMAIL PROTECTED] wrote:
> I just realized I didn't send it to the list:
>
> There was yet another problem with reordering of instructions. The
> attached patch (which is against my earlier patch) should fix this.

I can confirm this fixes my problems with the first patch. :)

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[PATCH] [r300] Fix reordering of fragment program instructions and register allocation

2007-03-17 Thread Nicolai Haehnle

There were a number of bugs related to the pairing of vector and
scalar operations where swizzles ended up using the wrong source
register, or an instruction was moved forward and ended up overwriting
an aliased register.

The new algorithm for register allocation is slightly conservative and
may run out of registers before it's strictly necessary. On the plus
side, it Just Works.
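
As an illustration of what "slightly conservative" can mean here, the following standalone sketch hands out a register only when its recorded `free` slot is at or before the slot being filled. It is a simplified model, not the allocator from the patch: the struct, the function names and the tiny register count are assumptions made for the example; only the meaning of `free` (first slot where the register is free, -1 while in use) follows the comment in struct reg_lifetime.

```c
#include <assert.h>

#define NUM_HW_TEMPS 4 /* small on purpose; the patch context uses PFS_NUM_TEMP_REGS (32) */

struct lifetime {
	int free; /* first slot where the register is free, or -1 while in use */
};

/* Mark every register as free from slot 0 onward. */
static void init_temps(struct lifetime *regs)
{
	for (int r = 0; r < NUM_HW_TEMPS; ++r)
		regs[r].free = 0;
}

/* Return a register that is free at or before `slot`, or -1 if there is none. */
static int alloc_temp(struct lifetime *regs, int slot)
{
	for (int r = 0; r < NUM_HW_TEMPS; ++r) {
		if (regs[r].free >= 0 && regs[r].free <= slot) {
			regs[r].free = -1;
			return r;
		}
	}
	return -1;
}

/* Mark register `r` as reusable from slot `slot` onward (after its last access). */
static void release_temp(struct lifetime *regs, int r, int slot)
{
	assert(r >= 0 && r < NUM_HW_TEMPS);
	regs[r].free = slot;
}
```

In this model a register released at slot 7 is never reused by an instruction placed at slot 3, even if a finer-grained analysis could prove the reuse safe, so the pool can run dry earlier than strictly necessary.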

Pairing of instructions is done whenever possible, and in more cases
than before, so in practice this change should be a net win.
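
A rough sketch of the pairing idea, assuming each ALU slot has one vector unit and one scalar unit that can be filled independently. The flag names, the slot count and place_op() are hypothetical and chosen for the example; the patch itself tracks usage with its own SLOT_ constants in struct r300_pfs_compile_slot.

```c
#include <assert.h>

#define MAX_SLOTS 8          /* hypothetical; small for illustration */
#define OP_VECTOR (1u << 0)  /* hypothetical flag: vector (RGB) unit occupied */
#define OP_SCALAR (1u << 1)  /* hypothetical flag: scalar (alpha) unit occupied */

struct slot {
	unsigned used; /* bitmask of OP_* units already filled in this slot */
};

/* Place an operation that needs `units` in the first slot at or after
 * `earliest` whose requested units are all still unused.
 * Returns the slot index, or -1 if no slot is available. */
static int place_op(struct slot *slots, int earliest, unsigned units)
{
	assert(earliest >= 0);
	for (int pos = earliest; pos < MAX_SLOTS; ++pos) {
		if ((slots[pos].used & units) == 0) {
			slots[pos].used |= units;
			return pos;
		}
	}
	return -1;
}
```

With this model a scalar operation emitted after a vector operation can land in the same slot, as long as its earliest allowed position permits it, which is where the saving in instruction slots comes from.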

The patch mostly fixes glean/texCombine. One remaining problem is that
the code duplicates constants and parameters all over the place and
therefore quickly runs out of resources and falls back to software.
I'm going to look into that as well.
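
One way such duplication could be avoided is to look for an existing slot before allocating a new one. The sketch below is only an illustration of that idea and not necessarily the fix that was planned; const_pool and intern_const() are hypothetical names, and the limit of 16 mirrors PFS_NUM_CONST_REGS from the patch context.

```c
#include <assert.h>
#include <string.h>

#define NUM_CONST_SLOTS 16 /* mirrors PFS_NUM_CONST_REGS in the patch context */

struct const_pool {
	float value[NUM_CONST_SLOTS][4];
	int count; /* number of slots in use */
};

/* Return the slot that holds `v`, reusing an existing slot when the same
 * bit pattern is already stored. Returns -1 when the pool is full. */
static int intern_const(struct const_pool *pool, const float v[4])
{
	assert(pool->count >= 0 && pool->count <= NUM_CONST_SLOTS);

	for (int i = 0; i < pool->count; ++i) {
		if (memcmp(pool->value[i], v, sizeof(pool->value[i])) == 0)
			return i;
	}

	if (pool->count == NUM_CONST_SLOTS)
		return -1; /* out of slots; a real driver would have to fall back */

	memcpy(pool->value[pool->count], v, sizeof(pool->value[0]));
	return pool->count++;
}
```

The comparison is bitwise, so 0.0 and -0.0 occupy separate slots in this sketch; that keeps the lookup simple at the cost of an occasional extra slot.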

Please test and commit this patch. If you notice any regressions,
please tell me (but the tests are looking good).

~Nicolai
diff --git a/src/mesa/drivers/dri/r300/r300_context.h b/src/mesa/drivers/dri/r300/r300_context.h
index bd9ed6f..bc43953 100644
--- a/src/mesa/drivers/dri/r300/r300_context.h
+++ b/src/mesa/drivers/dri/r300/r300_context.h
@@ -647,38 +647,84 @@ struct r300_vertex_program_cont {
 #define PFS_NUM_TEMP_REGS	32
 #define PFS_NUM_CONST_REGS	16
 
-/* Tracking data for Mesa registers */
+/* Mapping Mesa registers to R300 temporaries */
 struct reg_acc {
    int reg;                /* Assigned hw temp */
    unsigned int refcount;  /* Number of uses by mesa program */
 };
 
+/**
+ * Describe the current lifetime information for an R300 temporary
+ */
+struct reg_lifetime {
+	/* Index of the first slot where this register is free in the sense
+	   that it can be used as a new destination register.
+	   This is -1 if the register has been assigned to a Mesa register
+	   and the last access to the register has not yet been emitted */
+	int free;
+	
+	/* Index of the first slot where this register is currently reserved.
+	   This is used to stop e.g. a scalar operation from being moved
+	   before the allocation time of a register that was first allocated
+	   for a vector operation. */
+	int reserved;
+	
+	/* Index of the first slot in which the register can be used as a
+	   source without losing the value that is written by the last
+	   emitted instruction that writes to the register */
+	int vector_valid;
+	int scalar_valid;
+};
+
+
+/**
+ * Store usage information about an ALU instruction slot during the
+ * compilation of a fragment program.
+ */
+#define SLOT_SRC_VECTOR  (1<<0)
+#define SLOT_SRC_SCALAR  (1<<3)
+#define SLOT_SRC_BOTH    (SLOT_SRC_VECTOR | SLOT_SRC_SCALAR)
+#define SLOT_OP_VECTOR   (1<<16)
+#define SLOT_OP_SCALAR   (1<<17)
+#define SLOT_OP_BOTH     (SLOT_OP_VECTOR | SLOT_OP_SCALAR)
+
+struct r300_pfs_compile_slot {
+	/* Bitmask indicating which parts of the slot are used, using SLOT_ constants 
+	   defined above */
+	unsigned int used;
+
+	/* Selected sources */
+	int vsrc[3];
+	int ssrc[3];
+};
+
+/**
+ * Store information during compilation of fragment programs.
+ */
 struct r300_pfs_compile_state {
-   int v_pos, s_pos;   /* highest ALU slots used */
-
-   /* Track some information gathered during opcode
-    * construction.
-    *
-    * NOTE: Data is only set by the code, and isn't used yet.
-    */
-   struct {
-      int vsrc[3];
-      int ssrc[3];
-      int umask;
-   } slot[PFS_MAX_ALU_INST];
-
-   /* Used to map Mesa's inputs/temps onto hardware temps */
-   int temp_in_use;
-   struct reg_acc temps[PFS_NUM_TEMP_REGS];
-   struct reg_acc inputs[32]; /* don't actually need 32... */
-
-   /* Track usage of hardware temps, for register allocation,
-    * indirection detection, etc. */
-   int hwreg_in_use;
-   GLuint used_in_node;
-   GLuint dest_in_node;
+	int nrslots;   /* number of ALU slots used so far */
+	
+	/* Track which (parts of) slots are already filled with instructions */
+	struct r300_pfs_compile_slot slot[PFS_MAX_ALU_INST];
+	
+	/* Track the validity of R300 temporaries */
+	struct reg_lifetime hwtemps[PFS_NUM_TEMP_REGS];
+	
+	/* Used to map Mesa's inputs/temps onto hardware temps */
+	int temp_in_use;
+	struct reg_acc temps[PFS_NUM_TEMP_REGS];
+	struct reg_acc inputs[32]; /* don't actually need 32... */
+	
+	/* Track usage of hardware temps, for register allocation,
+	 * indirection detection, etc. */
+	GLuint used_in_node;
+	GLuint dest_in_node;
 };
 
+/**
+ * Store everything about a fragment program that is needed
+ * to render with that program.
+ */
 struct r300_fragment_program {
 	struct gl_fragment_program mesa_program;
 
diff --git a/src/mesa/drivers/dri/r300/r300_fragprog.c b/src/mesa/drivers/dri/r300/r300_fragprog.c
index 251fd26..b2c89cc 100644
--- a/src/mesa/drivers/dri/r300/r300_fragprog.c
+++ b/src/mesa/drivers/dri/r300/r300_fragprog.c
@@ -94,8 +94,9 @@
 #define REG_NEGV_SHIFT		18
 #define REG_NEGS_SHIFT		19
 #define REG_ABS_SHIFT		20
-#define REG_NO_USE_SHIFT	21
-#define REG_VALID_SHIFT		22

Re: [PATCH] [r300] Fix reordering of fragment program instructions and register allocation

2007-03-17 Thread Oliver McFadden
This patch seems to break one of my longer fragment programs. I believe this is
because it's running out of registers, but I haven't looked into it in detail
yet.

I think this patch should be committed, but directly followed by a patch to
reduce the number of registers used.


On 3/18/07, Nicolai Haehnle [EMAIL PROTECTED] wrote:
> There were a number of bugs related to the pairing of vector and
> scalar operations where swizzles ended up using the wrong source
> register, or an instruction was moved forward and ended up overwriting
> an aliased register.
>
> The new algorithm for register allocation is slightly conservative and
> may run out of registers before it's strictly necessary. On the plus
> side, it Just Works.
>
> Pairing of instructions is done whenever possible, and in more cases
> than before, so in practice this change should be a net win.
>
> The patch mostly fixes glean/texCombine. One remaining problem is that
> the code duplicates constants and parameters all over the place and
> therefore quickly runs out of resources and falls back to software.
> I'm going to look into that as well.
>
> Please test and commit this patch. If you notice any regressions,
> please tell me (but the tests are looking good).
>
> ~Nicolai




Re: [PATCH] [r300] Fix reordering of fragment program instructions and register allocation

2007-03-17 Thread Oliver McFadden
Another thought; the same changes are probably needed for the vertprog code. I
think there are also a lot of bugs there.


On 3/18/07, Oliver McFadden [EMAIL PROTECTED] wrote:
> This patch seems to break one of my longer fragment programs. I believe this
> is because it's running out of registers, but I haven't looked into it in
> detail yet.
>
> I think this patch should be committed, but directly followed by a patch to
> reduce the number of registers used.
>
>
> On 3/18/07, Nicolai Haehnle [EMAIL PROTECTED] wrote:
> > There were a number of bugs related to the pairing of vector and
> > scalar operations where swizzles ended up using the wrong source
> > register, or an instruction was moved forward and ended up overwriting
> > an aliased register.
> >
> > The new algorithm for register allocation is slightly conservative and
> > may run out of registers before it's strictly necessary. On the plus
> > side, it Just Works.
> >
> > Pairing of instructions is done whenever possible, and in more cases
> > than before, so in practice this change should be a net win.
> >
> > The patch mostly fixes glean/texCombine. One remaining problem is that
> > the code duplicates constants and parameters all over the place and
> > therefore quickly runs out of resources and falls back to software.
> > I'm going to look into that as well.
> >
> > Please test and commit this patch. If you notice any regressions,
> > please tell me (but the tests are looking good).
> >
> > ~Nicolai
 

