Re: [Dri-devel] R200 TexCoord3f patch for cubemaps

2004-05-05 Thread Michel Dänzer
On Wed, 2004-05-05 at 04:48, Ian Romanick wrote:
> This patch enables proper handling of 3f texture coordinates for 
> cubemaps.  Up till now, cubemaps only worked if texgen was enabled.  As 
> far as I know, this works with all tcl_mode settings (0 through 3 were 
> tested), but I have only tested with progs/demos/cubemap.
> 
> With this patch, stex3d also seems to work.  I'm a bit confused about 
> this.  When I enable various driver debug messages, *nothing* gets 
> printed during this test.  I thought maybe it was falling back to 
> software, but it looks different than SW & R200_DEBUG=fall gives no 
> output either.

It's also way too fast for software. :)


> If this patch works for people (in apps other than cubemap), I will 
> commit it.  

Looks good with foobillard --cuberef on (too bad it's unplayably slow
here); the balls initially show cubemap-like reflections, but that might
be a foobillard issue.

> My next step will be to get the x86 & SSE codegen working 
> again.  After that I'll try to get point size working on R200.

Sounds good. :)


> Anyone have any ideas about the fog coordinate stuff???

Not really I'm afraid, but some things seem odd to me (I'm using the
register names from the register reference here, those in the driver
differ slightly):

The register reference says 'post-TCL only' about the
VTX_DISCRETE_FOG_PRESENT bit in the SE_VTX_FMT_0 register. No idea if
that's significant for this though.

There's a VTX_DISCRETE_FOG_SEL bit in the SE_TCL_OUTPUT_VTX_COMP_SEL
register that the code doesn't seem to handle yet?


PS: Is it just me, or is the r200 driver broken with no_rast=true?

-- 
Earthling Michel Dänzer  | Debian (powerpc), X and DRI developer
Libre software enthusiast|   http://svcs.affero.net/rm.php?r=daenzer



___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [Dri-devel] R200 TexCoord3f patch for cubemaps

2004-05-05 Thread Ian Romanick
Ian Romanick wrote:

> The one caveat with this patch is the x86 & SSE codegen is disabled for
> all TexCoord and MultiTexCoord commands.  If you look at the changes to
> r200_vtxfmt_c.c, you'll see that I had to make some changes to the way
> those routines work.

The previous patch is committed.  The attached patch adds x86 & SSE
codegen back.  I've changed the way the codegen works just slightly.

Each codegen stub consists of a bit of assembly code that needs to be
reloced / fixed-up at run-time.  Prepended to the assembly code is a
small preamble that describes how to do this.  The preamble contains the
size of the assembly stub and an array of "fix-ups" that need to be done.
The stub code follows immediately after the array of fix-ups.

At run-time, the function r200_do_codegen is called to create the 
executable stub.  It is passed a pointer to the stub's preamble and an 
array of fix-up values.  Each entry in the stub's fix-up array specifies 
a size, an offset in the stub, and an element index to use for the 
fix-up.  This is similar to how a reloc table works in an object file.
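
The fix-up table described above can be sketched in C roughly as
follows.  This is only an illustration under stated assumptions: the
thread does not show the real codegen_stub structure or the signature of
r200_do_codegen, so the names sketch_stub, sketch_fixup and
sketch_do_codegen, the field order, and the use of 64-bit fix-up values
are all hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the thread does not show the real codegen_stub. */
struct sketch_fixup {
    uint32_t size;    /* number of bytes to patch (1, 2, 4 or 8) */
    uint32_t offset;  /* byte offset of the patch site inside the stub code */
    uint32_t entry;   /* index into the caller-supplied value array */
};

struct sketch_stub {
    uint32_t code_size;            /* size of the assembly stub in bytes */
    uint32_t num_fixups;           /* number of entries in fixups[] */
    struct sketch_fixup fixups[];  /* the stub code follows the last entry */
};

/*
 * Copy the stub code and apply every fix-up.  Returns a malloc'ed buffer,
 * or NULL on allocation failure or a malformed table.  A real driver would
 * also need the copy to live in executable memory; that part is omitted.
 */
static unsigned char *
sketch_do_codegen(const struct sketch_stub *stub,
                  const uint64_t *values, uint32_t num_values)
{
    const unsigned char *code =
        (const unsigned char *) &stub->fixups[stub->num_fixups];
    unsigned char *out = malloc(stub->code_size);
    uint32_t i;

    if (out == NULL)
        return NULL;
    memcpy(out, code, stub->code_size);

    for (i = 0; i < stub->num_fixups; i++) {
        const struct sketch_fixup *f = &stub->fixups[i];

        /* Reject entries that index past the values or past the code. */
        if (f->entry >= num_values || f->size > sizeof(values[0]) ||
            f->offset > stub->code_size ||
            f->size > stub->code_size - f->offset) {
            free(out);
            return NULL;
        }
        /* Little-endian host assumed (true for x86): the low-order
         * f->size bytes of the value are its first bytes in memory. */
        memcpy(out + f->offset, &values[f->entry], f->size);
    }
    return out;
}
```

Because each entry carries its own size, the same loop handles 1, 2, 4
and 8-byte patch sites, and an architecture-specific caller only has to
pass a different stub pointer.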

There are two obvious advantages.  If a stub is modified, it is likely 
that only one file (the file containing the stub) needs to be updated.
Code size (in the form of FIXUP macros) is cut way down.

There are a couple of advantages to this that aren't fully realized in 
this code.  This is a *lot* more cross-platform.  The only difference 
between r200_makeX86TexCoord2f and r200_makeSSETexCoord2f (and the 
non-existent r200_makePowerPCTexCoord2f) is a single pointer passed to 
r200_do_codegen.  This should make it possible to cut down on a lot of 
redundant code.  Additionally, since the codegen stubs contain all the
information needed to do the fix-ups, it should be possible to share
common assembly stubs in multiple places (e.g., _x86_Vertex3f in r200,
radeon, and t_vertex).

One disadvantage is if the codegen_stub structure is changed.  If that 
structure is changed, all of the assembly files will also have to 
change.  However, there won't be any compiler warnings for any that are 
"missed."  We'll just get mysterious codegen related bugs. :(

Another disadvantage is that this code seems to be more prone to 
cut-and-paste type errors.

If this new method is acceptable to everyone, I'll modify the rest of 
the codegen stubs in the R200 driver to use it.  I'd really like to put 
some form of r200_do_codegen in a shared location so that other places 
that do codegen can re-use it.

? src/mesa/drivers/dri/r200/depend
Index: src/mesa/drivers/dri/r200/r200_context.h
===
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_context.h,v
retrieving revision 1.15
diff -w -u -d -r1.15 r200_context.h
--- a/src/mesa/drivers/dri/r200/r200_context.h  5 May 2004 20:16:17 -   1.15
+++ b/src/mesa/drivers/dri/r200/r200_context.h  5 May 2004 22:19:34 -
@@ -788,6 +788,7 @@
r200_color_t *specptr;
GLfloat *texcoordptr[2];
 
+   GLint texcoordsize[2];   /**< Number of elements in each tex coord. */
 
GLenum *prim;   /* &ctx->Driver.CurrentExecPrimitive */
GLuint primflags;
Index: src/mesa/drivers/dri/r200/r200_vtxfmt.c
===
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_vtxfmt.c,v
retrieving revision 1.9
diff -w -u -d -r1.9 r200_vtxfmt.c
--- a/src/mesa/drivers/dri/r200/r200_vtxfmt.c   5 May 2004 21:32:16 -   1.9
+++ b/src/mesa/drivers/dri/r200/r200_vtxfmt.c   5 May 2004 22:19:34 -
@@ -808,6 +808,8 @@
 
 rmesa->vb.vertex_size += count[i];
   }
+
+  rmesa->vb.texcoordsize[i] = count[i];
}
 
if (rmesa->vb.installed_vertex_format != rmesa->vb.vtxfmt_0) {
Index: src/mesa/drivers/dri/r200/r200_vtxfmt_sse.c
===
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/r200/r200_vtxfmt_sse.c,v
retrieving revision 1.5
diff -w -u -d -r1.5 r200_vtxfmt_sse.c
--- a/src/mesa/drivers/dri/r200/r200_vtxfmt_sse.c   5 May 2004 20:16:17 -   1.5
+++ b/src/mesa/drivers/dri/r200/r200_vtxfmt_sse.c   5 May 2004 22:19:34 -
@@ -45,44 +45,18 @@
 extern const char *FUNC;   \
 extern const char *FUNC##_end
 
-EXTERN( _sse_Attribute2fv );
-EXTERN( _sse_Attribute2f );
 EXTERN( _sse_Attribute3fv );
 EXTERN( _sse_Attribute3f );
-EXTERN( _sse_MultiTexCoord2fv );
-EXTERN( _sse_MultiTexCoord2f );
-EXTERN( _sse_MultiTexCoord2fv_2 );
-EXTERN( _sse_MultiTexCoord2f_2 );
 
-/* Build specialized versions of the immediate calls on the fly for
- * the current state.
- */
-
-static struct dynfn *r200_makeSSEAttribute2fv( struct dynfn * cache, const int * key,
-  const char * name, void * dest)
-{
-   struct dynfn *dfn = MALLOC_STRUCT( dynfn );
-
-   if (R200_DEBUG & DEBUG_CODEGEN)
-  fprintf(stderr, "%s 0x%08x\n", name, key[0] );
-
-   DFN ( _sse_Attr

Re: [Dri-devel] R200 TexCoord3f patch for cubemaps

2004-05-05 Thread Ian Romanick
Michel Dänzer wrote:
> On Wed, 2004-05-05 at 04:48, Ian Romanick wrote:
>
> > Anyone have any ideas about the fog coordinate stuff???
>
> Not really I'm afraid, but some things seem odd to me (I'm using the
> register names from the register reference here, those in the driver
> differ slightly):
>
> The register reference says 'post-TCL only' about the
> VTX_DISCRETE_FOG_PRESENT bit in the SE_VTX_FMT_0 register. No idea if
> that's significant for this though.
>
> There's a VTX_DISCRETE_FOG_SEL bit in the SE_TCL_OUTPUT_VTX_COMP_SEL
> register that the code doesn't seem to handle yet?

SE_TCL_OUTPUT_VTX_COMP_SEL or SE_TCL_OUTPUT_VTX_FMT_0?  There isn't even
a bit for fog in SE_TCL_OUTPUT_VTX_COMP_SEL in r200_reg.h.  Hmm...

> PS: Is it just me, or is the r200 driver broken with no_rast=true?

The R200 driver is totally hosed any time there's a software
rasterization fallback (e.g., stencil operations in 16-bit depth mode).
It seems to be t_vertex related. :(





Re: [Dri-devel] R200 TexCoord3f patch for cubemaps

2004-05-06 Thread Keith Whitwell
Ian Romanick wrote:
> Ian Romanick wrote:
>
> > The one caveat with this patch is the x86 & SSE codegen is disabled
> > for all TexCoord and MultiTexCoord commands.  If you look at the
> > changes to r200_vtxfmt_c.c, you'll see that I had to make some changes
> > to the way those routines work.
>
> The previous patch is committed.  The attached patch adds x86 & SSE
> codegen back.  I've changed the way the codegen works just slightly.
>
> Each codegen stub consists of a bit of assembly code that needs to be
> reloced / fixed-up at run-time.  Prepended to the assembly code is a
> small preamble that describes how to do this.  The preamble contains the
> size of the assembly stub and an array of "fix-ups" that need to be done.
> The stub code follows immediately after the array of fix-ups.
>
> At run-time, the function r200_do_codegen is called to create the
> executable stub.  It is passed a pointer to the stub's preamble and an
> array of fix-up values.  Each entry in the stub's fix-up array specifies
> a size, an offset in the stub, and an element index to use for the
> fix-up.  This is similar to how a reloc table works in an object file.
>
> There are two obvious advantages.  If a stub is modified, it is likely
> that only one file (the file containing the stub) needs to be updated.
> Code size (in the form of FIXUP macros) is cut way down.
>
> There are a couple of advantages to this that aren't fully realized in
> this code.  This is a *lot* more cross-platform.  The only difference
> between r200_makeX86TexCoord2f and r200_makeSSETexCoord2f (and the
> non-existent r200_makePowerPCTexCoord2f) is a single pointer passed to
> r200_do_codegen.  This should make it possible to cut down on a lot of
> redundant code.  Additionally, since the codegen stubs contain all the
> information needed to do the fix-ups, it should be possible to share
> common assembly stubs in multiple places (e.g., _x86_Vertex3f in r200,
> radeon, and t_vertex).
>
> One disadvantage is if the codegen_stub structure is changed.  If that
> structure is changed, all of the assembly files will also have to
> change.  However, there won't be any compiler warnings for any that are
> "missed."  We'll just get mysterious codegen related bugs. :(
>
> Another disadvantage is that this code seems to be more prone to
> cut-and-paste type errors.
>
> If this new method is acceptable to everyone, I'll modify the rest of
> the codegen stubs in the R200 driver to use it.  I'd really like to put
> some form of r200_do_codegen in a shared location so that other places
> that do codegen can re-use it.

I can't help thinking there's a "right" way to do these fixups and we're just
not using it.  For instance, I don't know how the assembler "marks" addresses
requiring relocation so that ld.so can find them efficiently later on - or
whether we could use the same or similar mechanism.  I know Linux does a
related trick with its copy_from_user code by emitting labels or pointers to
another section of the object file.

It seems like your approach still involves guessing (or pre-calculating) 
offsets into the generated machine-code.  We've done a slightly different 
thing in the t_vtx_* codegen by using a distinctive dword (0x10101010+n, as it 
turns out), and basically doing a search & replace on that value, which seems 
to work.  The C code still knows which order the fixups are supposed to occur 
in the code being fixed up.  I guess the approaches could be combined, so that 
the 'n' value took on the same meaning as the 'entry' field in your structs, 
so that you might get something like:

GLOBL( _x86_MultiTexCoord2fv_stub )
    .long   _x86_MultiTexCoord2fv_end - _x86_MultiTexCoord2fv
    .long   2
    .long   4, FIXUP(0), 0
    .long   4, FIXUP(1), 1
_x86_MultiTexCoord2fv:
    movl    4(%esp), %eax
    movl    8(%esp), %ecx
    and     $TEX_TARGET_MASK, %eax
    movl    FIXUP(0)(,%eax,4), %edx # texcoord_size[unit] is 1, 2, or 3
    movl    FIXUP(1)(,%eax,4), %eax # texcoord_ptr[unit]
    decl    %edx
    jne     .3_2fv
    movl    (%ecx), %edx
    movl    %edx, (%eax)
    ret
etc.
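
The search-and-replace scheme mentioned above (a distinctive dword,
0x10101010+n, patched wherever it is found) might look something like
this in C.  It is a sketch only: the helper name patch_marker and its
signature are hypothetical and are not taken from the t_vtx_* code, and
it patches just the first match.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MARKER_BASE 0x10101010u  /* value quoted in the thread */

/*
 * Find the first occurrence of the dword (SKETCH_MARKER_BASE + n) in the
 * copied stub and overwrite it with `value`.  Returns 1 if a site was
 * patched and 0 if the marker was not found.  Hypothetical helper.
 */
static int
patch_marker(unsigned char *code, size_t code_size, uint32_t n,
             uint32_t value)
{
    const uint32_t marker = SKETCH_MARKER_BASE + n;
    size_t i;

    /* Scan at byte granularity: immediates are not aligned in x86 code. */
    for (i = 0; i + sizeof(marker) <= code_size; i++) {
        if (memcmp(code + i, &marker, sizeof(marker)) == 0) {
            memcpy(code + i, &value, sizeof(value));
            return 1;
        }
    }
    return 0;
}
```

No offsets have to be computed by hand here, but every patch site must
be exactly four bytes wide, and the marker value must not occur anywhere
else in the stub by accident.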



Secondly, I see you're using a single TexCoord2f function to cope with all
possible sizes of the texcoord in the actual emitted vertex.  This is
certainly the simplest approach, but it's worth pointing out that it's
possible to have multiple versions of TexCoord2f, etc. which are specialized
for each emitted texcoord size, and thereby eliminate the branches in your
code.  It's probably not significant, though.
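
As a rough C illustration of the specialization idea above, one variant
per emitted texcoord size can be written so that the size is decided
once, when the vertex format is chosen, rather than branched on for
every call.  The function names and the selection helper are
hypothetical and do not come from the driver.

```c
#include <stddef.h>

typedef void (*texcoord2f_fn)(float *dest, float s, float t);

/* One branch-free variant per emitted size.  When three components are
 * emitted, the missing r coordinate takes the GL default of 0. */
static void texcoord2f_emit1(float *dest, float s, float t)
{
    (void) t;
    dest[0] = s;
}

static void texcoord2f_emit2(float *dest, float s, float t)
{
    dest[0] = s;
    dest[1] = t;
}

static void texcoord2f_emit3(float *dest, float s, float t)
{
    dest[0] = s;
    dest[1] = t;
    dest[2] = 0.0f;
}

/* Pick the variant once per vertex format.  Returns NULL for a size this
 * sketch does not handle. */
static texcoord2f_fn
choose_texcoord2f(int emitted_size)
{
    switch (emitted_size) {
    case 1:  return texcoord2f_emit1;
    case 2:  return texcoord2f_emit2;
    case 3:  return texcoord2f_emit3;
    default: return NULL;
    }
}
```

The cost is one extra function (or generated stub) per size, which is
the trade-off against the single branching routine.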

Keith







Re: [Dri-devel] R200 TexCoord3f patch for cubemaps

2004-05-07 Thread Michel Dänzer
On Thu, 2004-05-06 at 02:54, Ian Romanick wrote:
> Michel Dänzer wrote:
> > On Wed, 2004-05-05 at 04:48, Ian Romanick wrote:
> > 
> > There's a VTX_DISCRETE_FOG_SEL bit in the SE_TCL_OUTPUT_VTX_COMP_SEL
> > register that the code doesn't seem to handle yet?
> 
> SE_TCL_OUTPUT_VTX_COMP_SEL or SE_TCL_OUTPUT_VTX_FMT_0?  There isn't even 
> a bit for fog in SE_TCL_OUTPUT_VTX_COMP_SEL in r200_reg.h.

It's bit 24. The description says 'Select the computer descrete (sic)
fog value'.


-- 
Earthling Michel Dänzer  | Debian (powerpc), X and DRI developer
Libre software enthusiast|   http://svcs.affero.net/rm.php?r=daenzer





Re: [Dri-devel] R200 TexCoord3f patch for cubemaps

2004-06-23 Thread Ian Romanick
I'm *finally* getting back to this.  Sheesh...

Keith Whitwell wrote:
> I can't help thinking there's a "right" way to do these fixups and we're
> just not using it.  For instance, I don't know how the assembler "marks"
> addresses requiring relocation so that ld.so can find them efficiently
> later on - or whether we could use the same or similar mechanism.  I know
> Linux does a related trick with its copy_from_user code by emitting
> labels or pointers to another section of the object file.

It would be nice if we somehow had access to that data.  To do that we'd
need some sort of custom tool (probably) that could pre-process the .S
file and generate a table of offsets.

> It seems like your approach still involves guessing (or pre-calculating)
> offsets into the generated machine-code.  We've done a slightly

There is still some error-prone human work involved.  A cycle of
assemble -> objdump -D -> examine code works well for short stubs like
this.  I can see it becoming very unwieldy for larger stubs.
Hmm...perhaps a Python script could be written to automate that process...

> different thing in the t_vtx_* codegen by using a distinctive dword
> (0x10101010+n, as it turns out), and basically doing a search & replace
> on that value, which seems to work.  The C code still knows which order
> the fixups are supposed to occur in the code being fixed up.  I guess
> the approaches could be combined, so that the 'n' value took on the same
> meaning as the 'entry' field in your structs, so that you might get
> something like:

Right.  I saw that code when I was part way through writing these stubs.
The problem is that you can't fix-up anything except 4-byte values.  I
don't see a clear way to extend that setup to 1 or 2-byte values or
(more importantly) 8-byte values.

> Secondly, I see you're using a single TexCoord2f function to cope with
> all possible sizes of the texcoord in the actual emitted vertex.  This
> is certainly the simplest approach, but it's worth pointing out that
> it's possible to have multiple versions of TexCoord2f, etc. which are
> specialized for each emitted texcoord size, and thereby eliminate the
> branches in your code.  It's probably not significant, though.

Yeah, I took the easy route. :)  I wasn't too worried about TexCoord2f,
and specializing for MultiTexCoord would require examining the texture
unit number to figure out how to emit.  That was the nice thing when
everything was emitted as 2f.  Now that unit 0 can be 2f, unit 1 can be
3f, and unit 2 be 1f, things get messy. :(

Part of the reason getting this codegen stuff done right is important to
me is I have some codegen for sw Mesa that I've had kicking around for a
number of months.  It's currently written more like my R200 patch, but
it could be done in any style.  Either way, it will need some updating
to get working again.  Basically, I wrote stubs to fetch texel data from
texture maps.  Each gl_texture_image got a stub generated that was
hard-coded for its format, height, and width.  The codegen was used in
place of the texture's FetchTexelFunc function.  That alone gave a little
more than a 5% speed up to tunnel.  My ultimate (unrealized) goal was to
codegen the entire texel-fetch (coordinate clamp, fetch, filter) stack.
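
To make the texel-fetch specialization above concrete, here is a
hand-written C sketch of the difference between a generic fetch and one
hard-coded for a particular image layout.  It only illustrates what a
generated stub would bake in; the sketch_image structure and both
function names are hypothetical and are not Mesa's gl_texture_image or
FetchTexelFunc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image description, not Mesa's gl_texture_image. */
struct sketch_image {
    const uint8_t *data;
    int width;            /* texels per row */
    int height;           /* rows */
    int bytes_per_texel;
};

/* Generic fetch: width and texel size are loaded from memory per call. */
static const uint8_t *
fetch_generic(const struct sketch_image *img, int col, int row)
{
    return img->data +
           ((size_t) row * (size_t) img->width + (size_t) col) *
           (size_t) img->bytes_per_texel;
}

/* Specialized for a 256-texel-wide image with 4 bytes per texel: the
 * multiply by width becomes a shift and the texel size is a constant. */
static const uint8_t *
fetch_rgba8_w256(const uint8_t *data, int col, int row)
{
    return data + ((((size_t) row) << 8) + (size_t) col) * 4;
}
```

A run-time generated stub would go one step further and embed the data
pointer itself as an immediate, which is where fix-ups of pointer size
come in.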

The generated "fetch a filtered texel" functions could be used from the
fragment program compiler, too.

