Re: [R300][PATCH] Add/fix COS & SIN + FP fixes

2007-02-11 Thread Jerome Glisse
On 2/11/07, Rune Petersen <[EMAIL PROTECTED]> wrote:
> How I managed to compile that is beyond me.
> Here is a proper patch.
>
> Rune Petersen
>

OK, committed.

best,
Jerome Glisse

-
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier.
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [R300][PATCH] Add/fix COS & SIN + FP fixes

2007-02-11 Thread Rune Petersen
How I managed to compile that is beyond me.
Here is a proper patch.

Rune Petersen
diff --git a/src/mesa/drivers/dri/r300/r300_context.h b/src/mesa/drivers/dri/r300/r300_context.h
index 02f8e91..b140235 100644
--- a/src/mesa/drivers/dri/r300/r300_context.h
+++ b/src/mesa/drivers/dri/r300/r300_context.h
@@ -729,6 +729,11 @@ struct r300_fragment_program {
 	GLboolean params_uptodate;
 
 	int max_temp_idx;
+
+	/* the index of the sin constant is stored here */
+	GLint const_sin;
+	
+	GLuint optimization;
 };
 
 #define R300_MAX_AOS_ARRAYS		16
diff --git a/src/mesa/drivers/dri/r300/r300_fragprog.c b/src/mesa/drivers/dri/r300/r300_fragprog.c
index 6e85f0b..b00cf9e 100644
--- a/src/mesa/drivers/dri/r300/r300_fragprog.c
+++ b/src/mesa/drivers/dri/r300/r300_fragprog.c
@@ -33,7 +33,7 @@
 
 /*TODO'S
  *
- * - COS/SIN/SCS instructions
+ * - SCS instructions
  * - Depth write, WPOS/FOGC inputs
  * - FogOption
  * - Verify results of opcodes for accuracy, I've only checked them
@@ -187,6 +187,10 @@ static const struct {
 #define SLOT_VECTOR	(1<<0)
 #define SLOT_SCALAR	(1<<3)
 #define SLOT_BOTH	(SLOT_VECTOR | SLOT_SCALAR)
+
+/* mapping from SWIZZLE_* to r300 native values for scalar insns */
+#define SWIZZLE_HALF 6
+
 #define MAKE_SWZ3(x, y, z) (MAKE_SWIZZLE4(SWIZZLE_##x, \
 	  SWIZZLE_##y, \
 	  SWIZZLE_##z, \
@@ -208,7 +212,7 @@ static const struct r300_pfs_swizzle {
 	{ MAKE_SWZ3(W, Z, Y), R300_FPI0_ARGC_SRC0CA_WZY, 1, SLOT_BOTH },
 	{ MAKE_SWZ3(ONE, ONE, ONE), R300_FPI0_ARGC_ONE, 0, 0},
 	{ MAKE_SWZ3(ZERO, ZERO, ZERO), R300_FPI0_ARGC_ZERO, 0, 0},
-	{ PFS_INVAL, R300_FPI0_ARGC_HALF, 0, 0},
+	{ MAKE_SWZ3(HALF, HALF, HALF), R300_FPI0_ARGC_HALF, 0, 0},
 	{ PFS_INVAL, 0, 0, 0},
 };
 
@@ -232,8 +236,6 @@ static const struct {
 	{ PFS_INVAL, PFS_INVAL, PFS_INVAL}
 };
 
-/* mapping from SWIZZLE_* to r300 native values for scalar insns */
-#define SWIZZLE_HALF 6
 static const struct {
 	int base;	/* hw value of swizzle */
 	int stride;	/* difference between SRC0/1/2 */
@@ -590,6 +592,7 @@ static GLuint do_swizzle(struct r300_fragment_program *rp,
 	/* 
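The swizzle-table hunk above replaces a `PFS_INVAL` key with a real HALF/HALF/HALF key, which is what lets an HHH swizzle be matched for vector instructions. A minimal sketch of that kind of lookup follows. The selector values 0..5 follow Mesa's `SWIZZLE_X`..`SWIZZLE_ONE`, and 6 is the `SWIZZLE_HALF` value from the patch. Everything else (`pack3`, `swz_table`, `lookup_swz3`, the `HW_*` codes) is a hypothetical stand-in, not the driver's actual table.

```c
#include <assert.h>
#include <stddef.h>

/* Selectors 0..5 match Mesa's SWIZZLE_X..SWIZZLE_ONE; 6 is the value the
 * patch gives SWIZZLE_HALF. */
enum { SEL_X, SEL_Y, SEL_Z, SEL_W, SEL_ZERO, SEL_ONE, SEL_HALF };

/* Hypothetical stand-ins for the hardware ARGC encodings. */
enum { HW_NONE = -1, HW_XYZ = 1, HW_ONE = 2, HW_ZERO = 3, HW_HALF = 4 };

/* Pack three 3-bit selectors, x in the low bits (MAKE_SWIZZLE4-style). */
static unsigned pack3(unsigned x, unsigned y, unsigned z)
{
    assert(x < 8 && y < 8 && z < 8);
    return x | (y << 3) | (z << 6);
}

struct swz_entry {
    unsigned sel[3]; /* selectors this entry matches */
    int hw;          /* value to program when it matches */
};

/* Shortened, illustrative table. */
static const struct swz_entry swz_table[] = {
    { { SEL_X, SEL_Y, SEL_Z }, HW_XYZ },
    { { SEL_ONE, SEL_ONE, SEL_ONE }, HW_ONE },
    { { SEL_ZERO, SEL_ZERO, SEL_ZERO }, HW_ZERO },
    /* With an invalid key here, HHH could never be found. */
    { { SEL_HALF, SEL_HALF, SEL_HALF }, HW_HALF },
};

/* Return the native encoding for a packed xyz swizzle, or HW_NONE. */
static int lookup_swz3(unsigned packed)
{
    size_t i;

    for (i = 0; i < sizeof(swz_table) / sizeof(swz_table[0]); i++) {
        const unsigned *s = swz_table[i].sel;

        if (pack3(s[0], s[1], s[2]) == packed)
            return swz_table[i].hw;
    }
    return HW_NONE;
}
```

A swizzle that is not in the table (for example X, HALF, Z) returns `HW_NONE` in this sketch; the real driver has to split such a swizzle into several instructions.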

[PATCH] Fix some locking issues.

2007-02-11 Thread Thomas Hellström

Hi!

I'm posting a patch relating to bugzilla Bug #9457
https://bugs.freedesktop.org/show_bug.cgi?id=9457

Since it involves the hardware lock operation I'd like it to be reviewed 
before committing.


/Thomas



From 44476a61912eb83e2e77680b400a383efa7f11bd Mon Sep 17 00:00:00 2001
From: Thomas Hellstrom 
Date: Sun, 11 Feb 2007 20:33:57 +0100
Subject: [PATCH] Bugzilla Bug #9457
Add refcounting of user waiters to the DRM hardware lock, so that we can use the
DRM_LOCK_CONT flag more conservatively.

Also add a kernel waiter refcount that, if nonzero, transfers the lock to the
kernel context when it is released. This is useful when waiting for idle and
can be used for very simple fence object driver implementations for the new
memory manager.

It also resolves the AIGLX startup deadlock for the sis and via drivers.
i810 and i830 still require that the hardware lock is really taken, so the
deadlock remains for those two. I'm not sure about ffb. Anyone familiar with
that code?
---
 linux-core/drmP.h |   14 +++-
 linux-core/drm_fops.c |   57 +++--
 linux-core/drm_irq.c  |4 +
 linux-core/drm_lock.c |  162 -
 linux-core/drm_stub.c |1 
 linux-core/sis_drv.c  |2 -
 linux-core/via_mm.c   |1 
 shared-core/via_drv.c |3 +
 8 files changed, 144 insertions(+), 100 deletions(-)

diff --git a/linux-core/drmP.h b/linux-core/drmP.h
index 9c748e6..516e69d 100644
--- a/linux-core/drmP.h
+++ b/linux-core/drmP.h
@@ -458,6 +458,10 @@ typedef struct drm_lock_data {
struct file *filp;  /**< File descr of lock holder (0=kernel) */
wait_queue_head_t lock_queue;   /**< Queue of blocked processes */
unsigned long lock_time;/**< Time of last lock in jiffies */
+   spinlock_t spinlock;
+   uint32_t kernel_waiters;
+   uint32_t user_waiters;
+   int idle_has_lock;
 } drm_lock_data_t;
 
 /**
@@ -712,6 +716,8 @@ struct drm_driver {
void (*reclaim_buffers) (struct drm_device *dev, struct file * filp);
void (*reclaim_buffers_locked) (struct drm_device *dev,
struct file * filp);
+   void (*reclaim_buffers_idlelocked) (struct drm_device *dev,
+   struct file * filp);
unsigned long (*get_map_ofs) (drm_map_t * map);
unsigned long (*get_reg_ofs) (struct drm_device * dev);
void (*set_version) (struct drm_device * dev, drm_set_version_t * sv);
@@ -1193,9 +1199,11 @@ extern int drm_lock(struct inode *inode,
unsigned int cmd, unsigned long arg);
 extern int drm_unlock(struct inode *inode, struct file *filp,
  unsigned int cmd, unsigned long arg);
-extern int drm_lock_take(__volatile__ unsigned int *lock, unsigned int context);
-extern int drm_lock_free(drm_device_t * dev,
-__volatile__ unsigned int *lock, unsigned int context);
+extern int drm_lock_take(drm_lock_data_t *lock_data, unsigned int context);
+extern int drm_lock_free(drm_lock_data_t *lock_data, unsigned int context);
+extern void drm_idlelock_take(drm_lock_data_t *lock_data);
+extern void drm_idlelock_release(drm_lock_data_t *lock_data);
+
 /*
  * These are exported to drivers so that they can implement fencing using
  * DMA quiscent + idle. DMA quiescent usually requires the hardware lock. 
diff --git a/linux-core/drm_fops.c b/linux-core/drm_fops.c
index 84e06c8..6555edb 100644
--- a/linux-core/drm_fops.c
+++ b/linux-core/drm_fops.c
@@ -427,38 +427,51 @@ int drm_release(struct inode *inode, str
  dev->open_count);
 
if (dev->driver->reclaim_buffers_locked && dev->lock.hw_lock) {
-   unsigned long _end = jiffies + DRM_HZ*3;
-
-   do {
-   retcode = drm_kernel_take_hw_lock(filp);
-   } while(retcode && !time_after_eq(jiffies,_end));
-
-   if (!retcode) {
+   if (drm_i_have_hw_lock(filp)) {
dev->driver->reclaim_buffers_locked(dev, filp);
-
-   drm_lock_free(dev, &dev->lock.hw_lock->lock,
-   _DRM_LOCKING_CONTEXT(dev->lock.hw_lock->lock));
} else {
+   unsigned long _end=jiffies + 3*DRM_HZ;
+   int locked = 0;
+
+   drm_idlelock_take(&dev->lock);
 
/*
-* FIXME: This is not a good solution. We should perhaps associate the
-* DRM lock with a process context, and check whether the current process
-* holds the lock. Then we can run reclaim buffers locked anyway.
+* Wait for a while.
 */
+   
+   do{
+   spin_lock(&dev->lock.spinlock);
+   locked = dev->lock.idle_has_lock;
+   
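The kernel-waiter handoff described in the commit message can be illustrated with a small single-threaded model. This is only a sketch under assumptions. The field names `kernel_waiters` and `idle_has_lock` come from the diff, but the `model_*` functions, `KERNEL_CTX`, and the absence of any spinlock or wait queue are simplifications, not the code in drm_lock.c. User-waiter counting is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define KERNEL_CTX 0 /* hypothetical id standing in for the kernel context */

struct lock_model {
    bool held;               /* somebody owns the hardware lock */
    int holder;              /* context id of the owner */
    unsigned kernel_waiters; /* kernel paths that want the lock for idling */
    bool idle_has_lock;      /* lock is currently owned on behalf of them */
};

/* Try to take the lock for a context; fails if it is already held. */
static bool model_lock_take(struct lock_model *l, int ctx)
{
    if (l->held)
        return false;
    l->held = true;
    l->holder = ctx;
    return true;
}

/* Register a kernel waiter; grab the lock right away if it is free. */
static void model_idlelock_take(struct lock_model *l)
{
    if (++l->kernel_waiters == 1 && model_lock_take(l, KERNEL_CTX))
        l->idle_has_lock = true;
}

/* Release by the current holder: hand over to the kernel if it is waiting. */
static void model_lock_free(struct lock_model *l, int ctx)
{
    assert(l->held && l->holder == ctx);
    if (l->kernel_waiters != 0) {
        l->holder = KERNEL_CTX; /* transfer instead of unlocking */
        l->idle_has_lock = true;
        return;
    }
    l->held = false;
}

/* Drop a kernel waiter; the last one really releases the lock. */
static void model_idlelock_release(struct lock_model *l)
{
    assert(l->kernel_waiters > 0);
    if (--l->kernel_waiters == 0 && l->idle_has_lock) {
        l->idle_has_lock = false;
        l->held = false;
    }
}
```

In this model a client that holds the lock while the kernel registers as a waiter does not unlock on release. Ownership moves to the kernel context, and other clients can take the lock again only after the last kernel waiter is gone.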

Re: [R300][PATCH] Add/fix COS & SIN + FP fixes

2007-02-11 Thread Rune Petersen
.

Rune Petersen wrote:
> I have found that the main reason for my problem was that I forgot to use
> keep() on the source.
> 
> I think your patch is too intrusive. As long as keep() is used in the
> right places, you could move the refcount inside emit_arith(), making the
> change more contained.
> 
> Updated patch attached.
> 
> Could I get you to commit this, since I will not be able to find the
> time to figure Git out any time soon (and I lost my sig, don't ask).
> 
> 
> Rune Petersen
> 
> 

diff --git a/src/mesa/drivers/dri/r300/r300_context.c b/src/mesa/drivers/dri/r300/r300_context.c
diff --git a/src/mesa/drivers/dri/r300/r300_context.h b/src/mesa/drivers/dri/r300/r300_context.h
index 02f8e91..b140235 100644
--- a/src/mesa/drivers/dri/r300/r300_context.h
+++ b/src/mesa/drivers/dri/r300/r300_context.h
@@ -729,6 +729,11 @@ struct r300_fragment_program {
 	GLboolean params_uptodate;
 
 	int max_temp_idx;
+
+	/* the index of the sin constant is stored here */
+	GLint const_sin;
+	
+	GLuint optimization;
 };
 
 #define R300_MAX_AOS_ARRAYS		16
diff --git a/src/mesa/drivers/dri/r300/r300_fragprog.c b/src/mesa/drivers/dri/r300/r300_fragprog.c
index 6e85f0b..cb250ca 100644
--- a/src/mesa/drivers/dri/r300/r300_fragprog.c
+++ b/src/mesa/drivers/dri/r300/r300_fragprog.c
@@ -33,7 +33,7 @@
 
 /*TODO'S
  *
- * - COS/SIN/SCS instructions
+ * - SCS instructions
  * - Depth write, WPOS/FOGC inputs
  * - FogOption
  * - Verify results of opcodes for accuracy, I've only checked them
@@ -51,6 +51,8 @@
 #include "r300_fragprog.h"
 #include "r300_reg.h"
 
+#define FAST_SIN
+
 /*
  * Usefull macros and values
  */
@@ -187,6 +189,10 @@ static const struct {
 #define SLOT_VECTOR	(1<<0)
 #define SLOT_SCALAR	(1<<3)
 #define SLOT_BOTH	(SLOT_VECTOR | SLOT_SCALAR)
+
+/* mapping from SWIZZLE_* to r300 native values for scalar insns */
+#define SWIZZLE_HALF 6
+
 #define MAKE_SWZ3(x, y, z) (MAKE_SWIZZLE4(SWIZZLE_##x, \
 	  SWIZZLE_##y, \
 	  SWIZZLE_##z, \
@@ -208,7 +214,7 @@ static const struct r300_pfs_swizzle {
 	{ MAKE_SWZ3(W, Z, Y), R300_FPI0_ARGC_SRC0CA_WZY, 1, SLOT_BOTH },
 	{ MAKE_SWZ3(ONE, ONE, ONE), R300_FPI0_ARGC_ONE, 0, 0},
 	{ MAKE_SWZ3(ZERO, ZERO, ZERO), R300_FPI0_ARGC_ZERO, 0, 0},
-	{ PFS_INVAL, R300_FPI0_ARGC_HALF, 0, 0},
+	{ MAKE_SWZ3(HALF, HALF, HALF), R300_FPI0_ARGC_HALF, 0, 0},
 	{ PFS_INVAL, 0, 0, 0},
 };
 
@@ -232,8 +238,6 @@ static const struct {
 	{ PFS_INVAL, PFS_INVAL, PFS_INVAL}
 };
 
-/* mapping from SWIZZLE_* to r300 native values for scalar insns */
-#define SWIZZLE_HALF 6
 static const struct {
 	int base;	/* hw value of swizzle */
 	int stride;	/* difference between SRC0/1/2 */
@@ -590,6 +5
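One of the general changes in this patch is to stop copying a swizzled source into a temp and instead combine the two swizzles. The patch hunk that does this is cut off above, so the following is only a sketch of what combining two swizzles means, assuming the Mesa-style packing of four 3-bit selectors. The helper names (`pack4`, `get_sel`, `compose_swizzle`) are hypothetical.

```c
#include <assert.h>

/* Selector values as in Mesa's SWIZZLE_X..SWIZZLE_ONE. */
enum { SEL_X, SEL_Y, SEL_Z, SEL_W, SEL_ZERO, SEL_ONE };

/* Pack four 3-bit selectors, x in the low bits. */
static unsigned pack4(unsigned x, unsigned y, unsigned z, unsigned w)
{
    assert(x < 8 && y < 8 && z < 8 && w < 8);
    return x | (y << 3) | (z << 6) | (w << 9);
}

/* Extract the selector for one channel (0 = x ... 3 = w). */
static unsigned get_sel(unsigned swz, int chan)
{
    return (swz >> (3 * chan)) & 7u;
}

/*
 * Apply `outer` on top of a source that already carries `inner`.
 * Component selectors are looked up through `inner`; constant selectors
 * (ZERO, ONE) do not read the source and pass through unchanged.
 */
static unsigned compose_swizzle(unsigned inner, unsigned outer)
{
    unsigned result = 0;
    int chan;

    for (chan = 0; chan < 4; chan++) {
        unsigned sel = get_sel(outer, chan);

        if (sel <= (unsigned)SEL_W)
            sel = get_sel(inner, (int)sel);
        result |= sel << (3 * chan);
    }
    return result;
}
```

For example, a source already swizzled as YZXW and then read as XXZW ends up as YYXW, so no intermediate temp is needed as long as the hardware can express the combined swizzle.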

Re: [R300][PATCH] Add/fix COS & SIN + FP fixes

2007-02-11 Thread Rune Petersen
Jerome Glisse wrote:
> Attached is a patch to fix refcounting. Basically, whenever a temporary
> source was used multiple times inside an instruction, that led to
> multiple calls to t_hw_src, which is correct, but as we also decrement the
> use counter in that function, we over-decremented the refcount.
> 
> The patch decrements the refcount after instruction decoding and avoids
> over-decrementing it.
> 
> (The patch applies over yours)
> 
> best,
> Jerome

I have found that the main reason for my problem was that I forgot to use
keep() on the source.

I think your patch is too intrusive. As long as keep() is used in the
right places, you could move the refcount inside emit_arith(), making the
change more contained.

Updated patch attached.

Could I get you to commit this, since I will not be able to find the
time to figure Git out any time soon (and I lost my sig, don't ask).


Rune Petersen
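The keep() approach from this message can be pictured with a toy model. It assumes that keep() tags a register reference so that reading it does not consume one of the temp's counted uses, as the `REG_GET_NO_USE` check elsewhere in this thread suggests. All names here (`REF_NO_USE`, `keep_ref`, `emit_arith_model`, `struct temp_model`) are hypothetical and this is not the driver's emit_arith().

```c
#include <assert.h>
#include <stdbool.h>

#define REF_NO_USE 0x100u          /* hypothetical "do not count this use" bit */
#define REF_INDEX(r) ((r) & 0xffu) /* temp index stored in the low bits */

struct temp_model {
    int refcount; /* remaining program-level uses */
    bool live;    /* still allocated */
};

/* Tag a reference so that emitting it leaves the refcount alone. */
static unsigned keep_ref(unsigned ref)
{
    return ref | REF_NO_USE;
}

/* Model of the emit step: each untagged source consumes one use. */
static void emit_arith_model(struct temp_model *temps,
                             const unsigned *src, int nsrc)
{
    int i;

    for (i = 0; i < nsrc; i++) {
        struct temp_model *t = &temps[REF_INDEX(src[i])];

        if (src[i] & REF_NO_USE)
            continue; /* caller will read this temp again later */
        assert(t->live && t->refcount > 0);
        if (--t->refcount == 0)
            t->live = false; /* stands in for free_temp() */
    }
}
```

When one program-level opcode expands into several hardware instructions that all read the same source, every read except the last would be wrapped in `keep_ref()`, so the temp is released exactly once.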




Re: [R300][PATCH] Add/fix COS & SIN + FP fixes

2007-02-11 Thread Jerome Glisse

On 2/11/07, Jerome Glisse <[EMAIL PROTECTED]> wrote:

> On 2/10/07, Rune Petersen <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Getting proper SIN and COS wasn't as easy as it appeared. I had to make
> > some changes to the fragment program code.
> >
> > General FP changes:
> > - support HHH swizzle for vector instructions.
> > - don't copy a source to a temp when it is not XYZW swizzled, but
> >   combine the two and have the swizzle resolve any issues.
> >   (saves temps/instructions with more elaborate shader code)
> > - Disable refcounting of temps.
> >   The temp R0 in progs/fp/tri-cos.c is freed prematurely.
> >   This should be resolved properly.
> > - fix overflow in cnstv[].
> >
> >
> > SIN & COS:
> > They are based on:
> > http://www.devmaster.net/forums/showthread.php?t=5784
> >
> > There is a fast and a slow (high-precision) version of SIN & COS.
> >
> > For SIN:
> > fast = 2 vector instructions
> > slow = 5 vector instructions
> >
> > For COS:
> > fast = 5 vector instructions + 2 scalar instructions
> > slow = 8 vector instructions + 2 scalar instructions
> >
> > The fast version appears to do a fine enough job, at least with the
> > simple test I have made.
> >
> >
> > Rune Petersen
>
> Nice of you to tackle this :) A few comments: maybe we could make a driconf
> option to switch between the fast and slow versions (or a more general conf
> option to enable or disable fragprog optimizations, in case we come up
> with more optimizations like that in the future).
>
> For the refcounting, I am wondering if I didn't bump into that in
> the past. I did use gdb to trace fragprog construction at that
> time and found some strange interaction (which led me to
> the rework I did on fragprog).
>
> Anyway, from limited testing here, your patch seems good;
> you should commit it.
>
> best,
> Jerome Glisse
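For readers without access to the linked forum thread, the fast and high-precision variants can be sketched in plain C. This assumes the patch follows the parabola approximation from that thread, sin(x) ~ (4/pi)x - (4/pi^2)x|x| on [-pi, pi], with one extra correction step for the precise variant. The constant 1.273239545 (4/pi) does appear in the patch, but the function names, the `sin_mode` switch, and the correction weight 0.225 are assumptions here, and the mapping onto R300 vector and scalar instructions is not shown.

```c
#include <assert.h>

#define PI_F 3.14159265358979f

enum sin_mode { SIN_FAST, SIN_PRECISE }; /* hypothetical config switch */

static float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/* Parabola through (-pi, 0), (0, 0), (pi, 0) with peaks of +/-1. */
static float fast_sin(float x)
{
    const float B = 1.273239545f;  /* 4 / pi */
    const float C = -0.405284735f; /* -4 / pi^2 */

    assert(absf(x) <= PI_F + 1e-5f); /* only valid on [-pi, pi] */
    return B * x + C * x * absf(x);
}

/* One correction step: blend the parabola with its square. */
static float precise_sin(float x)
{
    const float P = 0.225f;
    float y = fast_sin(x);

    return P * (y * absf(y) - y) + y;
}

/* Pick the variant, as a driconf-style option might. */
static float approx_sin(float x, enum sin_mode mode)
{
    return mode == SIN_PRECISE ? precise_sin(x) : fast_sin(x);
}

/* cos(x) = sin(x + pi/2), wrapped back into [-pi, pi]. */
static float approx_cos(float x, enum sin_mode mode)
{
    x += PI_F / 2.0f;
    if (x > PI_F)
        x -= 2.0f * PI_F;
    return approx_sin(x, mode);
}
```

The extra range wrap in `approx_cos` is one plausible reason COS costs more instructions than SIN in the counts quoted above. Inputs outside [-pi, pi] would need a separate range reduction that this sketch leaves out.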



Attached is a patch to fix refcounting. Basically, whenever a temporary
source was used multiple times inside an instruction, that led to
multiple calls to t_hw_src, which is correct, but as we also decrement the
use counter in that function, we over-decremented the refcount.

The patch decrements the refcount after instruction decoding and avoids
over-decrementing it.

(The patch applies over yours)

best,
Jerome
--- r300_fragprog.c	2007-02-11 14:26:42.0 +0100
+++ /home/glisse/code/r300/mesa/src/mesa/drivers/dri/r300/r300_fragprog.c	2007-02-11 14:25:15.0 +0100
@@ -773,12 +773,6 @@
 			cs->temps[index].reg = get_hw_temp(rp);
 
 		idx = cs->temps[index].reg;
-
-/*
-		if (!REG_GET_NO_USE(src) &&
-		(--cs->temps[index].refcount == 0))
-			free_temp(rp, src);
-*/
 		break;
 	case REG_TYPE_INPUT:
 		idx = cs->inputs[index].reg;
@@ -819,13 +813,6 @@
 			}
 		}
 		idx = cs->temps[index].reg;
-
-/*
-		if (!REG_GET_NO_USE(dest) &&
-		(--cs->temps[index].refcount == 0))
-			free_temp(rp, dest);
-*/
-
 		cs->dest_in_node |= (1 << idx);
 		cs->used_in_node |= (1 << idx);
 		break;
@@ -1215,10 +1202,11 @@
 
 static GLboolean parse_program(struct r300_fragment_program *rp)
 {	
+	COMPILE_STATE;
 	struct gl_fragment_program *mp = &rp->mesa_program;
 	const struct prog_instruction *inst = mp->Base.Instructions;
 	struct prog_instruction *fpi;
-	GLuint src[3], dest, temp;
+	GLuint src[3], dest, temp, srccount;
 	GLuint cnst;
 	int flags, mask = 0;
 	GLfloat cnstv[4] = {0.0, 0.0, 0.0, 0.0};
@@ -1228,6 +1216,7 @@
 		return GL_FALSE;
 	}
 
+
 	for (fpi=mp->Base.Instructions; fpi->Opcode != OPCODE_END; fpi++) {
 		if (fpi->SaturateMode == SATURATE_ZERO_ONE)
 			flags = PFS_FLAG_SAT;
@@ -1238,14 +1227,17 @@
 			mask = fpi->DstReg.WriteMask;
 		}
 
+		srccount = 0;
 		switch (fpi->Opcode) {
 		case OPCODE_ABS:
+			srccount = 1;
 			src[0] = t_src(rp, fpi->SrcReg[0]);
 			emit_arith(rp, PFS_OP_MAD, dest, mask,
    absolute(src[0]), pfs_one, pfs_zero,
    flags);
 			break;
 		case OPCODE_ADD:
+			srccount = 2;
 			src[0] = t_src(rp, fpi->SrcReg[0]);
 			src[1] = t_src(rp, fpi->SrcReg[1]);
 			emit_arith(rp, PFS_OP_MAD, dest, mask,
@@ -1253,6 +1245,7 @@
    flags);
 			break;
 		case OPCODE_CMP:
+			srccount = 3;
 			src[0] = t_src(rp, fpi->SrcReg[0]);
 			src[1] = t_src(rp, fpi->SrcReg[1]);
 			src[2] = t_src(rp, fpi->SrcReg[2]);
@@ -1271,6 +1264,7 @@
 			 *   x = (x < PI)?x : x-2*PI
 			 *   result = sin(x)
 			 */
+			srccount = 1;
 			temp = get_temp_reg(rp);
 			if(rp->const_sin == -1){
 			cnstv[0] = 1.273239545;
@@ -1350,6 +1344,7 @@
 			free_temp(rp, temp);
 			break;
 		case OPCODE_DP3:
+			srccount = 2;
 			src[0] = t_src(rp, fpi->SrcReg[0]);
 			src[1] = t_src(rp, fpi->SrcReg[1]);
 			emit_arith(rp, PFS_OP_DP3, dest, mask,
@@ -1357,6 +1352,7 @@
    flags);
 			break;
 		case OPCODE_DP4:
+			srccount = 2;
 			src[0] = t_src(rp, fpi->SrcReg[0]);
 			src[1] = t_src(rp, fpi->SrcReg[1]);
 			emit_arith(rp, PFS_OP_DP4, dest, mask,
@@ -1364,6 +1360,7 @@
    flags);
 			break;
 		case OPCODE_DPH:
+			srccount = 2;
 			src[0] = t_src(rp, fpi->SrcReg[0]);
 			src[1] = t_src(rp, fpi->SrcReg[1]);
 			/* src0.xyz1 -> temp
@@ -1386,6 +1383,7 @@
 #endif
 			break;
 		case OPCODE_DST:
+			srccount = 2;
 			src[0]
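The over-decrement problem this patch addresses can be reproduced with a toy refcount array. The sketch below assumes that the refcount counts instructions that read a temp and that the fix releases each distinct source once after the instruction is decoded. The function names are hypothetical, and the real patch tracks sources through `srccount` rather than scanning for duplicates.

```c
#include <assert.h>

/* Buggy variant: one decrement per operand slot, as happens when the
 * decrement lives in the per-operand translation function. */
static void release_per_operand(int *refcount, const int *src, int nsrc)
{
    int i;

    for (i = 0; i < nsrc; i++)
        refcount[src[i]]--;
}

/* Fixed variant: one decrement per distinct temp per instruction. */
static void release_per_instruction(int *refcount, const int *src, int nsrc)
{
    int i, j;

    for (i = 0; i < nsrc; i++) {
        int seen = 0;

        for (j = 0; j < i; j++)
            if (src[j] == src[i])
                seen = 1; /* already released for this instruction */
        if (seen)
            continue;
        assert(refcount[src[i]] > 0);
        refcount[src[i]]--;
    }
}
```

With a temp that two instructions read (refcount 2), an instruction such as MUL t0, t0 drops the count to 0 in the buggy variant, so the temp is freed before the second instruction runs. The fixed variant leaves it at 1.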