Re: [Mesa3d-dev] A tiny problem regarding with the mesa source layout and DRI driver development

2010-03-22 Thread Nicolai Haehnle
On Mon, Mar 22, 2010 at 1:29 PM, LiYe omni.l...@gmail.com wrote:
 I'm interested in the OpenGL implementation and DRI driver development.
 Specifically, I want to learn how an OpenGL command is implemented and
 how it is converted into a direct rendering context and transferred to
 the hardware. I know this is a quite complicated and time-consuming
 task, but it would be great if I can start the learning curve with my
 newbie background. So I'm trying to look into the Mesa code. However,
 it seems quite large and monolithic and I cannot find a suitable
 breaking point. So I wrote this to ask for some experienced advice. For
 an overview of how DRI works in code (not in theory as explained in
 documents), where should I start?

I would suggest getting an IDE that has decent code browsing
capabilities (I personally like to play with the KDevelop4 beta even
though it's still a bit flaky) and just start stepping through your
favourite driver in a debugger.

Note that your life will be less painful if you have a second machine
from which you can SSH in, so that your gdb session doesn't live on
the same X server as the OpenGL application that you're debugging.

cu,
Nicolai

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [Mesa3d-dev] DRI SDK and modularized drivers.

2010-03-22 Thread Nicolai Haehnle
On Mon, Mar 22, 2010 at 2:24 AM, Luc Verhaegen l...@skynet.be wrote:
 In
 particular, the Mesa core - classic driver split only makes sense if
 there are enough people who are actually working on those drivers who
 would support the split. Otherwise, this is bound to lead straight
 into hell.

 In a way, the kernel people got it right: put all the drivers in one
 repository, and make building the whole package and having parallel

 put all the drivers in one repository?

 So, all of:
        * drm
        * firmware
        * libdrm
        * xorg
        * mesa/dri
        * mesa/gallium
        * libxvmc
        * libvdpau
        (add more here)
 of the same driver stack, in one repository?

Why not?

Mind you, I'm not advocating for any change at all, but as long as you
feel the need to move stuff around, why not try finding a goal that
people actually find useful? Of course, my suggestion is probably
crap, too.


[snip]
 The real question is: where is the most pain, and how can we reduce it.
 And the most pain is between the driver specific parts.

Nobody has ever had to feel the pain of a separation between Mesa core
and drivers. And since a git log I've just done tells me that you have
committed only twice to the Mesa repository within the last year or
so, maybe you should listen to the opinion of people who *have* been
active in the Mesa tree when it comes to that subject, and are working
on drivers that are probably significantly more involved than whatever
Unichrome does.


 2) it wouldn't actually solve the DRM problems, because we want to
 have the DRM in our codebase, and the kernel people want to have it in
 theirs.

 The kernel people can have theirs. What stops anyone from getting the
 drm code of a released driver stack into the next kernel version?

 But when anyone decides they need a new driver stack which requires a
 new drm module, it should be easy to replace the stock kernel module.

And that has worked so well in the past.

cu,
Nicolai



Re: [Mesa3d-dev] DRI SDK and modularized drivers.

2010-03-19 Thread Nicolai Haehnle
On Thu, Mar 18, 2010 at 5:38 PM, Luc Verhaegen l...@skynet.be wrote:
 So, identify the volatile interfaces, and the more stable interfaces,
 and then isolate the volatile ones, and then you come to only one
 conclusion.

Except that the Mesa core - classic driver interface also wants to
change from time to time in non-trivial ways, and trying to force a
separation there on people who don't want an additional set of
compatibility issues to deal with is not exactly a friendly move.

It may seem e.g. like the DRM interface is the worst because of rather
large threads caused by certain kernel developer's problems, but that
doesn't mean problems wouldn't be created by splitting other areas. In
particular, the Mesa core - classic driver split only makes sense if
there are enough people who are actually working on those drivers who
would support the split. Otherwise, this is bound to lead straight
into hell.

In a way, the kernel people got it right: put all the drivers in one
repository, and make building the whole package and having parallel
installations trivial. The (only?) issues with that in X.org are that:
1) there is a cultural aversion due to the bad experience with the
horrible pre-modularization setup, and
2) it wouldn't actually solve the DRM problems, because we want to
have the DRM in our codebase, and the kernel people want to have it in
theirs.

cu,
Nicolai



Re: [PATCH] [r300] Fix reordering of fragment program instructions and register allocation

2007-03-18 Thread Nicolai Haehnle

I just realized I didn't send it to the list:

There was yet another problem with reordering of instructions. The
attached patch (which is against my earlier patch) should fix this.

~Nicolai


On 3/18/07, Oliver McFadden [EMAIL PROTECTED] wrote:

Another thought: the same changes are probably needed for the vertprog code. I
think there are also a lot of bugs there.


On 3/18/07, Oliver McFadden [EMAIL PROTECTED] wrote:
 This patch seems to break one of my longer fragment programs. I believe
 this is because it's running out of registers, but I haven't looked into
 it in detail yet.

 I think this patch should be committed, but directly followed by a patch to
 reduce the number of registers used.


 On 3/18/07, Nicolai Haehnle [EMAIL PROTECTED] wrote:
  There were a number of bugs related to the pairing of vector and
  scalar operations where swizzles ended up using the wrong source
  register, or an instruction was moved forward and ended up overwriting
  an aliased register.
 
  The new algorithm for register allocation is slightly conservative and
  may run out of registers before it's strictly necessary. On the plus
  side, it Just Works.
 
  Pairing of instructions is done whenever possible, and in more cases
  than before, so in practice this change should be a net win.
 
  The patch mostly fixes glean/texCombine. One remaining problem is that
  the code duplicates constants and parameters all over the place and
  therefore quickly runs out of resources and falls back to software.
  I'm going to look into that as well.
 
  Please test and commit this patch. If you notice any regressions,
  please tell me (but the tests are looking good).
 
  ~Nicolai
 


commit 1ec4703585171f504180425b65dfab92be2a7782
Author: Nicolai Haehnle [EMAIL PROTECTED]
Date:   Sun Mar 18 13:29:18 2007 +0100

r300: Fix fragment program reordering

Do not move an instruction that writes to a temp forward past an instruction
that reads the same temporary.

diff --git a/src/mesa/drivers/dri/r300/r300_context.h b/src/mesa/drivers/dri/r300/r300_context.h
index bc43953..29436ab 100644
--- a/src/mesa/drivers/dri/r300/r300_context.h
+++ b/src/mesa/drivers/dri/r300/r300_context.h
@@ -674,6 +674,11 @@ struct reg_lifetime {
 	   emitted instruction that writes to the register */
 	int vector_valid;
 	int scalar_valid;
+	
+	/* Index to the slot where the register was last read.
+	   This is also the first slot in which the register may be written again */
+	int vector_lastread;
+	int scalar_lastread;
 };
 
 
diff --git a/src/mesa/drivers/dri/r300/r300_fragprog.c b/src/mesa/drivers/dri/r300/r300_fragprog.c
index 3c54830..89e9f65 100644
--- a/src/mesa/drivers/dri/r300/r300_fragprog.c
+++ b/src/mesa/drivers/dri/r300/r300_fragprog.c
@@ -1026,10 +1026,11 @@ static void emit_tex(struct r300_fragment_program *rp,
  */
 static int get_earliest_allowed_write(
 		struct r300_fragment_program* rp,
-		GLuint dest)
+		GLuint dest, int mask)
 {
 	COMPILE_STATE;
 	int idx;
+	int pos;
 	GLuint index = REG_GET_INDEX(dest);
 	assert(REG_GET_VALID(dest));
 
@@ -1047,7 +1048,17 @@ static int get_earliest_allowed_write(
 			return 0;
 	}
 	
-	return cs->hwtemps[idx].reserved;
+	pos = cs->hwtemps[idx].reserved;
+	if (mask & WRITEMASK_XYZ) {
+		if (pos < cs->hwtemps[idx].vector_lastread)
+			pos = cs->hwtemps[idx].vector_lastread;
+	}
+	if (mask & WRITEMASK_W) {
+		if (pos < cs->hwtemps[idx].scalar_lastread)
+			pos = cs->hwtemps[idx].scalar_lastread;
+	}
+	
+	return pos;
 }
 
 
@@ -1070,7 +1081,8 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
 		GLboolean emit_sop,
 		int argc,
 		GLuint* src,
-		GLuint dest)
+		GLuint dest,
+		int mask)
 {
 	COMPILE_STATE;
 	int hwsrc[3];
@@ -1092,7 +1104,7 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
 	if (emit_sop)
 		used |= SLOT_OP_SCALAR;
 	
-	pos = get_earliest_allowed_write(rp, dest);
+	pos = get_earliest_allowed_write(rp, dest, mask);
 	
 	if (rp->node[rp->cur_node].alu_offset > pos)
 		pos = rp->node[rp->cur_node].alu_offset;
@@ -1191,6 +1203,21 @@ static int find_and_prepare_slot(struct r300_fragment_program* rp,
 		cs->slot[pos].ssrc[i] = tempssrc[i];
 	}
 	
+	for (i = 0; i < argc; ++i) {
+		if (REG_GET_TYPE(src[i]) == REG_TYPE_TEMP) {
+			int regnr = hwsrc[i] & 31;
+			
+			if (used & (SLOT_SRC_VECTOR << i)) {
+				if (cs->hwtemps[regnr].vector_lastread < pos)
+					cs->hwtemps[regnr].vector_lastread = pos;
+			}
+			if (used & (SLOT_SRC_SCALAR << i)) {
+				if (cs->hwtemps[regnr].scalar_lastread < pos)
+					cs->hwtemps[regnr].scalar_lastread = pos;
+			}
+		}
+	}
+	
 	// Emit the source fetch code
 	rp->alu.inst[pos].inst1 &= ~R300_FPI1_SRC_MASK;
 	rp->alu.inst[pos].inst1 |=
@@ -1287,7 +1314,7 @@ static void emit_arith(struct r300_fragment_program *rp,
 	if ((mask & WRITEMASK_W) || vop == R300_FPI0_OUTC_REPL_ALPHA)
 		emit_sop = GL_TRUE;
 
-	pos = find_and_prepare_slot(rp, emit_vop, emit_sop, argc, src, dest);
+	pos = find_and_prepare_slot(rp, emit_vop, emit_sop, argc, src, dest, mask);
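The hazard this patch guards against can be sketched outside the driver. The struct and function below are illustrative stand-ins, not the actual r300 code: a write to a temporary must not be scheduled into a slot earlier than the last read of the half it clobbers.

```c
#include <assert.h>

/* Illustrative stand-ins for the driver's per-temporary lifetime
   tracking: 'reserved' is the first slot the register may be written
   at all, '*_lastread' is the last slot that still reads the old
   value of the vector (xyz) or scalar (w) half. */
struct lifetime {
	int reserved;
	int vector_lastread;
	int scalar_lastread;
};

#define WRITEMASK_XYZ 0x7
#define WRITEMASK_W   0x8

/* A write must not land before the last read of any half it
   overwrites: take the maximum of all applicable constraints. */
static int earliest_allowed_write(const struct lifetime *t, int mask)
{
	int pos = t->reserved;

	if ((mask & WRITEMASK_XYZ) && pos < t->vector_lastread)
		pos = t->vector_lastread;
	if ((mask & WRITEMASK_W) && pos < t->scalar_lastread)
		pos = t->scalar_lastread;
	return pos;
}
```

This is exactly why `get_earliest_allowed_write` gains a `mask` parameter in the patch: a vector-only write is unconstrained by scalar reads, and vice versa.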

[PATCH] [r300] Fix reordering of fragment program instructions and register allocation

2007-03-17 Thread Nicolai Haehnle

There were a number of bugs related to the pairing of vector and
scalar operations where swizzles ended up using the wrong source
register, or an instruction was moved forward and ended up overwriting
an aliased register.

The new algorithm for register allocation is slightly conservative and
may run out of registers before it's strictly necessary. On the plus
side, it Just Works.

Pairing of instructions is done whenever possible, and in more cases
than before, so in practice this change should be a net win.

The patch mostly fixes glean/texCombine. One remaining problem is that
the code duplicates constants and parameters all over the place and
therefore quickly runs out of resources and falls back to software.
I'm going to look into that as well.

Please test and commit this patch. If you notice any regressions,
please tell me (but the tests are looking good).

~Nicolai
diff --git a/src/mesa/drivers/dri/r300/r300_context.h b/src/mesa/drivers/dri/r300/r300_context.h
index bd9ed6f..bc43953 100644
--- a/src/mesa/drivers/dri/r300/r300_context.h
+++ b/src/mesa/drivers/dri/r300/r300_context.h
@@ -647,38 +647,84 @@ struct r300_vertex_program_cont {
 #define PFS_NUM_TEMP_REGS	32
 #define PFS_NUM_CONST_REGS	16
 
-/* Tracking data for Mesa registers */
+/* Mapping Mesa registers to R300 temporaries */
 struct reg_acc {
int reg;/* Assigned hw temp */
unsigned int refcount; /* Number of uses by mesa program */
 };
 
+/**
+ * Describe the current lifetime information for an R300 temporary
+ */
+struct reg_lifetime {
+	/* Index of the first slot where this register is free in the sense
+	   that it can be used as a new destination register.
+	   This is -1 if the register has been assigned to a Mesa register
+	   and the last access to the register has not yet been emitted */
+	int free;
+	
+	/* Index of the first slot where this register is currently reserved.
+	   This is used to stop e.g. a scalar operation from being moved
+	   before the allocation time of a register that was first allocated
+	   for a vector operation. */
+	int reserved;
+	
+	/* Index of the first slot in which the register can be used as a
+	   source without losing the value that is written by the last
+	   emitted instruction that writes to the register */
+	int vector_valid;
+	int scalar_valid;
+};
+
+
+/**
+ * Store usage information about an ALU instruction slot during the
+ * compilation of a fragment program.
+ */
+#define SLOT_SRC_VECTOR  (1 << 0)
+#define SLOT_SRC_SCALAR  (1 << 3)
+#define SLOT_SRC_BOTH    (SLOT_SRC_VECTOR | SLOT_SRC_SCALAR)
+#define SLOT_OP_VECTOR   (1 << 16)
+#define SLOT_OP_SCALAR   (1 << 17)
+#define SLOT_OP_BOTH     (SLOT_OP_VECTOR | SLOT_OP_SCALAR)
+
+struct r300_pfs_compile_slot {
+	/* Bitmask indicating which parts of the slot are used, using SLOT_ constants 
+	   defined above */
+	unsigned int used;
+
+	/* Selected sources */
+	int vsrc[3];
+	int ssrc[3];
+};
+
+/**
+ * Store information during compilation of fragment programs.
+ */
 struct r300_pfs_compile_state {
-   int v_pos, s_pos;   /* highest ALU slots used */
-
-   /* Track some information gathered during opcode
-* construction.
-* 
-* NOTE: Data is only set by the code, and isn't used yet.
-*/
-   struct {
-   int vsrc[3];
-   int ssrc[3];
-   int umask;
-   } slot[PFS_MAX_ALU_INST];
-
-   /* Used to map Mesa's inputs/temps onto hardware temps */
-   int temp_in_use;
-   struct reg_acc temps[PFS_NUM_TEMP_REGS];
-   struct reg_acc inputs[32]; /* don't actually need 32... */
-
-   /* Track usage of hardware temps, for register allocation,
-* indirection detection, etc. */
-   int hwreg_in_use;
-   GLuint used_in_node;
-   GLuint dest_in_node;
+	int nrslots;   /* number of ALU slots used so far */
+	
+	/* Track which (parts of) slots are already filled with instructions */
+	struct r300_pfs_compile_slot slot[PFS_MAX_ALU_INST];
+	
+	/* Track the validity of R300 temporaries */
+	struct reg_lifetime hwtemps[PFS_NUM_TEMP_REGS];
+	
+	/* Used to map Mesa's inputs/temps onto hardware temps */
+	int temp_in_use;
+	struct reg_acc temps[PFS_NUM_TEMP_REGS];
+	struct reg_acc inputs[32]; /* don't actually need 32... */
+	
+	/* Track usage of hardware temps, for register allocation,
+	 * indirection detection, etc. */
+	GLuint used_in_node;
+	GLuint dest_in_node;
 };
 
+/**
+ * Store everything about a fragment program that is needed
+ * to render with that program.
+ */
 struct r300_fragment_program {
 	struct gl_fragment_program mesa_program;
 
diff --git a/src/mesa/drivers/dri/r300/r300_fragprog.c b/src/mesa/drivers/dri/r300/r300_fragprog.c
index 251fd26..b2c89cc 100644
--- a/src/mesa/drivers/dri/r300/r300_fragprog.c
+++ b/src/mesa/drivers/dri/r300/r300_fragprog.c
@@ -94,8 +94,9 @@
 #define REG_NEGV_SHIFT		18
 #define REG_NEGS_SHIFT		19
 #define REG_ABS_SHIFT		20
-#define REG_NO_USE_SHIFT	21
-#define REG_VALID_SHIFT		22
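The slot-pairing bookkeeping from the r300_context.h hunk above boils down to bitmask tests. This is a minimal sketch with the same bit layout; `can_pair` is a made-up helper, not a function from the driver:

```c
#include <assert.h>

/* Same bit layout as the SLOT_ constants in the patch: three vector
   source selects in bits 0..2, three scalar source selects in bits
   3..5, and one bit each for the vector and scalar op halves. */
#define SLOT_SRC_VECTOR (1 << 0)
#define SLOT_SRC_SCALAR (1 << 3)
#define SLOT_OP_VECTOR  (1 << 16)
#define SLOT_OP_SCALAR  (1 << 17)
#define SLOT_OP_BOTH    (SLOT_OP_VECTOR | SLOT_OP_SCALAR)

/* can_pair is a hypothetical helper: a new instruction fits into an
   existing ALU slot iff none of the halves it needs are taken. */
static int can_pair(unsigned int used, unsigned int needed)
{
	return (used & needed) == 0;
}
```

This is why a pure scalar op can be co-issued into a slot that already holds a vector op, which is the pairing the patch does "whenever possible, and in more cases than before".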

Announcing Piglit, an automated testing framework

2007-03-16 Thread Nicolai Haehnle
Hello,

back when I was actively working on DRI drivers almost three years
ago, I always felt uneasy about the fact that I didn't have an
extensive array of tests that I could rely on to test for regressions.

Now I've decided to do something about it. I've taken Glean and some
code from Mesa and wrapped it with Python and cmake glue to
- execute OpenGL tests without user interaction and
- neatly format the results in HTML

You can find the current version (and a sample HTML summary, to get an
idea of what they look like at the moment) at
http://homepages.upb.de/prefect/piglit/

The idea is to make testing dead simple for driver developers. I
believe that Piglit already makes it quite simple, but I'm sure
there's still room for improvement.

My current plans are:
- Hunt some bugs in R300, to get a better feeling for how the tool
fares in practice
- Integrate tests from Mesa; unfortunately, this needs manual work
because those tests are mainly interactive, but it's definitely
necessary to make this useful

I'm also considering setting up a public repository somewhere, perhaps
on Sourceforge.

Please give it a try when you have a little time to spare and tell me
if you find it useful (or more importantly, why you don't find it
useful), and where it could be improved.

Thanks,
Nicolai



Re: Linux OpenGL ABI discussion

2005-09-29 Thread Nicolai Haehnle
On Thursday 29 September 2005 18:30, Alan Cox wrote:
 On Iau, 2005-09-29 at 09:49 +0200, Christoph Hellwig wrote:
  On Wed, Sep 28, 2005 at 04:07:56PM -0700, Andy Ritger wrote:
   Some of the topics raised include:
   
   - minimum OpenGL version required by libGL
   - SONAME change to libGL
   - libGL installation path
  
  I think the single most important point is to explicitly disallow
  vendor-supplied libGL binaries in the LSB.  Every other LSB componenet
  relies on a single backing implementation for a reason, and in practic
 
 That is not actually true. It defines a set of API and ABI behaviours
 which are generally based on a single existing common implementation.
 
  the Nvidia libGL just causes endless pain where people accidentally
  link against it.  The DRI libGL should be declared the one and official
  one, and people who need extended features over it that aren't in the
  driver-specific backend will need to contribute them back.
 
 If the LSB standard deals with libGL API/ABI interfaces then any
 application using other interfaces/feature set items would not be LSB
 compliant. Educating users to link with the base libGL is an education
 problem not directly inside the LSB remit beyond the LSB test tools.
 
 In addition the way GL extensions work mean its fairly sane for an
 application to ask for extensions and continue using different
 approaches if they are not available. In fact this is done anyway for
 hardware reasons. There is a lack of an "is XYZ accelerated" API, but
 that is an upstream flaw.

The real issue with an IHV-supplied libGL.so is mixing vendors' graphics 
cards. As an OpenGL user (i.e. a developer of applications that link 
against libGL), I regularly switch graphics cards around to make sure 
things work with all the relevant major vendors. Having a vendor-supplied 
libGL.so makes this unnecessarily difficult on the software side (add to 
that the custom-installed header files that have ever so slightly different 
semantics, and there is a whole lot of fun to be had).

Not to mention the use case with two graphics cards installed at the same 
time, from different vendors. While the above problem is annoying but 
acceptable, there's simply no reasonable way to use two graphics cards from 
vendors that insist on their custom libGL.so. Having to hack around with 
LD_LIBRARY_PATH and the likes is ridiculous.

I'm not too familiar with the exact details of the DRI client-server 
protocol, so maybe it may be necessary to turn the libGL.so into even more 
of a skeleton, and reduce the basic DRI protocol to a simple "tell me the 
client-side driver name", so that IHVs can combine (for example) custom GLX 
extensions with direct rendering.

cu,
Nicolai




Re: dual-TMU support

2005-09-17 Thread Nicolai Haehnle
On Saturday 17 September 2005 16:04, Aapo Tahkola wrote:
 On Sat, 17 Sep 2005 09:48:37 -0400 (EDT)
 Vladimir Dergachev [EMAIL PROTECTED] wrote:
  The user error message is due to the fact that glxgears sometimes
  outputs an insufficient number of vertices to draw a primitive - for
  example only 2 vertices for a quad.
 
 This is normal AFAIK, and since Mesa doesn't do it, we need to.
 For all I care this message should be removed.

This is by no means normal, and is the symptom of a bug in Mesa's recording 
of vertex arrays in display lists. At least the last time I looked at it it 
was.

https://bugs.freedesktop.org/show_bug.cgi?id=3129 is the relevant bug 
report.

cu,
Nicolai




Re: [r300/ppc] lockups

2005-06-21 Thread Nicolai Haehnle
On Tuesday 21 June 2005 10:54, Jerome Glisse wrote:
 On 6/21/05, Vladimir Dergachev [EMAIL PROTECTED] wrote:
  On Sat, 18 Jun 2005, Johannes Berg wrote:
   Any idea where I should start looking for the source of the lockups or 
what else to do?
  
  The problem is likely either due to the radeon memory controller - in
  particular registers like MC_FB_LOCATION MC_AGP_LOCATION - or some
  sort of AGP issue with ring buffer not working properly.
  
 
 IIRC Paul has a similar problem with his powerbook and Ben provided a
 patch against xorg & drm for correcting the way this reg is set up. But
 this patch was for normal drm. Maybe once we get time we should look
 at that and program this reg properly in r300...

Not knowing this particular patch or whether anybody has tried our driver on 
PPC, what about endianness issues? I know it's obvious, but who knows...

cu,
Nicolai




Re: [R300-commit] r300_driver/r300 r300_reg.h,1.44,1.45 r300_state.c,1.112,1.113

2005-06-21 Thread Nicolai Haehnle
On Tuesday 21 June 2005 18:06, Aapo Tahkola wrote:
 On Thu, 16 Jun 2005 14:22:36 +0200
 Nicolai Haehnle [EMAIL PROTECTED] wrote:
 
  On Thursday 16 June 2005 13:41, Aapo Tahkola wrote:
   Update of /cvsroot/r300/r300_driver/r300
   In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6333
   
   Modified Files:
r300_reg.h r300_state.c 
   Log Message:
   Use depth tiling.
  
  Will this work with software fallbacks?
 
 I'm not quite sure, but the more recent r200_span.c has a few words about it.
 Attached patch enables color tiling in case someone wants to play with it.

You *will* have to update radeon_span.c accordingly. I haven't looked into 
how this surface business works, that might help a bit, but I doubt you get 
away without changing anything in radeon_span.c

In fact, enabling the depth tiling did break software fallbacks, which 
includes depth readbacks. So stuff like glean and Cube (which uses depth 
readback to figure out your line of fire) is broken with depth tiling, 
which is why I backed that change out.

We really, really need a working fallback path. I can't stress this enough.

cu,
Nicolai




Re: [R300-commit] r300_driver/r300 r300_reg.h,1.44,1.45 r300_state.c,1.112,1.113

2005-06-21 Thread Nicolai Haehnle
On Tuesday 21 June 2005 21:15, Rune Petersen wrote:
 Aapo Tahkola wrote:
 *snip*
 +	if (info->ChipFamily >= CHIP_FAMILY_R300) {
 +		unsigned char *RADEONMMIO = info->MMIO;
 +		OUTREG(0x180, INREG(0x180) | 0x1100);
 +	}
 +
 0x180 is defined as R300_MC_INIT_MISC_LAT_TIMER in r300_reg.h.
 This seems unrelated to tiling.

I agree that
a) the appropriate #defines should be added in the 2D driver instead of 
putting magic values everywhere when we can do better and
b) this should be split out into a different patch (note that you can do 
this kind of splitting with a simple editor; you just have to make sure 
that you do not modify the patch chunks themselves; be especially careful 
with the whitespace)

 Also I remember seeing that the values  
 are different depending on chip family. Is this safe?

Well, I have tested this on three different chips (R300, rv350 (mobile) and 
R420, which is quite a nice sample), and:
- fglrx sets this on all the chips and
- setting it in our driver caused no regressions.

Of course, it would be even better if people could test it on their hardware 
(use hw_script from r300 CVS to query the register value while fglrx is 
running, as well as test the patch).

cu,
Nicolai




Re: [R300] securing r300 drm

2005-06-21 Thread Nicolai Haehnle
On Tuesday 21 June 2005 20:57, Vladimir Dergachev wrote:
 Now that the driver paints usable pictures without lockups on many cards,
 including AGP versions of X800 and Mobility M10, it would make sense to 
 ready it for inclusion into main DRI codebase.
 
 I do not think that elusive lockups of Radeon 9800 cards, or issues with 
 PowerPC will require any drastic changes.
 
 As we discussed earlier, the major reason against inclusion into 
 mainstream DRI CVS is that the driver is not secure in its current state.
 
 Below, I will attempt to list current known issues - please reply with 
 your additions.
 
* r300_emit_unchecked_state - it is not as unchecked as it has been
  initially, however a few poorly checked registers remain:

Those poorly checked registers should be moved out of unchecked_state and 
into their own function. Adding these checks to unchecked_state would just 
add overhead to what should be a fast path.

The idea would be to add something like r300_emit_special_state which 
doesn't use register addresses but has subcommands like the state setting 
for radeon/r200.

  from r300_cmdbuf.c:
 
   ADD_RANGE(R300_RB3D_COLOROFFSET0, 1); /* Dangerous */
   ADD_RANGE(R300_RB3D_COLORPITCH0, 1); /* Dangerous */
/* .. snip ... */
   ADD_RANGE(R300_RB3D_DEPTHOFFSET, 2); /* Dangerous */
 
  In principle an attacker can set these to point to AGP or system
  RAM and then cause a paint operation to overwrite particular
  memory range.
 
  Ideally we should check that these point inside the framebuffer,
  i.e. are within range specified by MC_FB_LOCATION register.

Right. Actually, to be on the safe side, we'd have to set min/max clipping 
rects at the same time as setting those buffer offsets. This is currently 
not a problem, but it will become one when (if? ;)) we implement 
framebuffer_object etc.

/* Texture offset is dangerous and needs more checking */
   ADD_RANGE(R300_TX_OFFSET_0, 16);
 
  I don't think texture offsets are ever written to, however if they
  point in the wrong place they can be used to read memory directly.

Setting those texture offsets wrong can actually lock up the machine as I 
found out when I temporarily put MC_FB_LOCATION into its natural position 
(i.e. where it's put on older radeons).

  ideally we would check these to be either within MC_FB_LOCATION
  or MC_AGP_LOCATION ranges. Problem is what do we do on PCI cards?
  Use AIC controller settings?

Unfortunately, I don't know enough about PCI to comment, and what's AIC 
anyway? I've seen some register names with AIC in it, but they don't really 
seem to be used.

* r300_emit_raw - we do not have code that checks any of bufferred 3d
  packets, in particular VBUF_2, IMMD_2, INDX_2 and INDX_BUFFER.
 
  I think that none of these can be exploited except to cause a lockup -
  please correct me if I am wrong

* r300_emit_raw - RADEON_3D_LOAD_VBPNTR - this sets offsets and so
  like texture offset registers could be exploited to read protected
  memory locations.
 
  Again, we need to check the offsets against something reasonable.

Note that by putting the offset at the end of allowed memory and setting the 
number of vertices very high, you could read memory that you shouldn't have 
access to.

But the more important thing is: What's up with r300_emit_raw anyway? It was 
originally supposed to do what the name suggests: Emit raw data into the 
ring buffer, purely as a hack for experimentation.

People have bent it in extreme ways, so it has clearly gone beyond that, and 
that's a Bad Thing. For one thing, r300_emit_raw doesn't get cliprects 
right. If you have more than four cliprects, you need to emit rendering 
commands multiple times.

Seriously, all the stuff that uses emit_raw should just be migrated to use 
r300_emit_packet3, which will clean this up a lot.

 
* anything I forgot ?

Talking about security, have a look at radeon_state.c. Where on earth does 
*this* come from:

/* Allocate an in-kernel area and copy in the cmdbuf.  Do this to avoid
 * races between checking values and using those values in other code,
 * and simply to avoid a lot of function calls to copy in data.
 */
orig_bufsz = cmdbuf.bufsz;
if (orig_bufsz != 0) {
kbuf = drm_alloc(cmdbuf.bufsz, DRM_MEM_DRIVER);
if (kbuf == NULL)
return DRM_ERR(ENOMEM);
if (DRM_COPY_FROM_USER(kbuf, cmdbuf.buf, cmdbuf.bufsz)) {
drm_free(kbuf, orig_bufsz, DRM_MEM_DRIVER);
return DRM_ERR(EFAULT);
}
cmdbuf.buf = kbuf;
}

This just shouts insanity. It calls kmalloc every single time you emit a 
command buffer!

The security issue mentioned in the comment is real, but there's a really 
simple rule of thumb that eliminates it 
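For reference, the pattern the comment describes (snapshot the untrusted buffer once, then validate and use only the snapshot) looks like this in a user-space sketch; `process_cmdbuf` and the 0x42 opcode check are made up, the structure mirrors the kernel code quoted above:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* User-space sketch of the kmalloc + copy_from_user pattern: snapshot
   the untrusted buffer once, then validate and use only the snapshot,
   so the submitter cannot change values between check and use. */
static int process_cmdbuf(const unsigned char *user_buf, size_t len)
{
	unsigned char *kbuf;
	int ok;

	kbuf = malloc(len);		/* stands in for drm_alloc() */
	if (!kbuf)
		return -1;
	memcpy(kbuf, user_buf, len);	/* stands in for DRM_COPY_FROM_USER() */

	/* validate the private copy; the 0x42 opcode check is made up */
	ok = (len >= 1 && kbuf[0] == 0x42);

	free(kbuf);
	return ok ? 0 : -1;
}
```

The per-submit allocation cost being complained about is a separate question from the correctness of the pattern itself.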

Re: [R300-commit] r300_driver/r300 r300_reg.h,1.44,1.45 r300_state.c,1.112,1.113

2005-06-21 Thread Nicolai Haehnle
On Wednesday 22 June 2005 03:09, Rune Petersen wrote:
 Nicolai Haehnle wrote:
 Also I remember seeing that the values  
 are different depending on chip family. Is this safe?
  
  
  Well, I have tested this on three different chips (R300, rv350 (mobile) 
and 
  R420, which is quite a nice sample), and:
  - fglrx sets this on all the chips and
  - setting it in our driver caused no regressions.
  
  Of course, it would be even better if people could test it on their 
hardware 
  (use hw_script from r300 CVS to query the register value while fglrx is 
  running, as well as test the patch).
  
 I just had a quick try, it doesn't seem to cause any regressions.
 Am I right in assuming that it should reduce lockups on Radeon 9800?

No. At least it didn't for me, and it didn't help for Jerome either if I 
recall correctly.

However, it apparently fixes some display issues (white horizontal lines?) 
that some people were seeing.

cu,
Nicolai




Re: [R300] radeon 9800 lockup : guilty reg list

2005-06-18 Thread Nicolai Haehnle
On Saturday 18 June 2005 08:20, Benjamin Herrenschmidt wrote:
 On Fri, 2005-06-17 at 18:37 +0200, Jerome Glisse wrote:
  Correct value (previous were ones of a dumb test :)):
  
  0x0148	0xf7fff000	RADEON_MC_FB_LOCATION
  0x014c	0xfdfffc00	RADEON_MC_AGP_LOCATION
 
 Those look much better. If changing those help for us, then I was right
 saying that our hacks are no good :) More specifically, for r300, for
 some reason, we still put the FB at 0 in card space, which isn't a
 terrific idea, and for both r200 and r300, we incorrectly use
 CONFIG_APER_SIZE for sizing the memory controller apertures instead of
 the actual memory size.

Consider the following steps:
1. Load fglrx
2. Unload fglrx
3. Load r300 (without reboot)
4. r300 runs just fine without lockups

However, r300 obviously overwrites the RADEON_MC_FB/AGP_LOCATION registers. 
So while it is obviously a good idea to fix our behaviour here, I'm afraid 
it would be highly surprising if those registers were the cause of lockups.
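For what it's worth, the register values quoted above decode neatly if one assumes the usual radeon memory-controller layout (low 16 bits = aperture base >> 16, high 16 bits = inclusive upper bound >> 16); this layout is an assumption here, not something stated in the thread:

```c
#include <assert.h>

typedef unsigned int u32;

/* Assumed layout (not taken from this thread): the low 16 bits hold
   the aperture base >> 16 and the high 16 bits hold the inclusive
   upper bound >> 16. */
static void decode_mc_location(u32 reg, u32 *base, u32 *end)
{
	*base = (reg & 0xffffu) << 16;
	*end  = (reg & 0xffff0000u) | 0xffffu;
}
```

Under that assumption, 0xf7fff000 would describe a 128 MB framebuffer aperture at 0xf0000000 and 0xfdfffc00 a 32 MB AGP aperture at 0xfc000000, which at least looks self-consistent.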

cu,
Nicolai




Re: [R300-commit] r300_driver/r300 r300_reg.h,1.44,1.45 r300_state.c,1.112,1.113

2005-06-16 Thread Nicolai Haehnle
On Thursday 16 June 2005 13:41, Aapo Tahkola wrote:
 Update of /cvsroot/r300/r300_driver/r300
 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv6333
 
 Modified Files:
   r300_reg.h r300_state.c 
 Log Message:
 Use depth tiling.

Will this work with software fallbacks?

cu,
Nicolai




Re: [R300] new snapshot ?

2005-06-12 Thread Nicolai Haehnle
On Friday 10 June 2005 18:10, Vladimir Dergachev wrote:
 
 On Fri, 10 Jun 2005, Aapo Tahkola wrote:
 
  Someone, I believe it was Aapo, said that they see white lines across 
the
  screen when the framerate is fairly high. I didn't see this up until 
yesterday
  when I had to change from my 9600pro to a 9600XT (I killed the card 
moving
  it between machines somehow).
 
  Are you using SiS based motherboard by any chance?
  Following patch should fix this at the cost of some speed...
 
 I just committed the following patch to r300_reg.h:

Thanks. By the way, I confirmed that fglrx sets those bits in 0x180 on the 
following cards:
- 0x4E44 (R300)
- 0x4E50 (RV350)
- 0x4A49 (R420)
... i.e. pretty much across the board. However, there are many other 
registers that it touches, and I couldn't test how it affects lockups yet.

 ===
 RCS file: /cvsroot/r300/r300_driver/r300/r300_reg.h,v
 retrieving revision 1.41
 diff -u -r1.41 r300_reg.h
 --- r300_reg.h  8 Jun 2005 15:05:24 -   1.41
 +++ r300_reg.h  10 Jun 2005 16:09:22 -
 @@ -1,6 +1,27 @@
   #ifndef _R300_REG_H
   #define _R300_REG_H
 
 +#define R300_MC_INIT_MISC_LAT_TIMER	0x180
 +#  define R300_MC_MISC__MC_CPR_INIT_LAT_SHIFT	0
 +#  define R300_MC_MISC__MC_VF_INIT_LAT_SHIFT	4
 +#  define R300_MC_MISC__MC_DISP0R_INIT_LAT_SHIFT	8
 +#  define R300_MC_MISC__MC_DISP1R_INIT_LAT_SHIFT	12
 +#  define R300_MC_MISC__MC_FIXED_INIT_LAT_SHIFT	16
 +#  define R300_MC_MISC__MC_E2R_INIT_LAT_SHIFT	20
 +#  define R300_MC_MISC__MC_SAME_PAGE_PRIO_SHIFT	24
 +#  define R300_MC_MISC__MC_GLOBW_INIT_LAT_SHIFT	24

Is the last 24 supposed to be a 28?

 +
 +
 +#define R300_MC_INIT_GFX_LAT_TIMER	0x154
 +#  define R300_MC_MISC__MC_G3D0R_INIT_LAT_SHIFT	0
 +#  define R300_MC_MISC__MC_G3D1R_INIT_LAT_SHIFT	4
 +#  define R300_MC_MISC__MC_G3D2R_INIT_LAT_SHIFT	8
 +#  define R300_MC_MISC__MC_G3D3R_INIT_LAT_SHIFT	12
 +#  define R300_MC_MISC__MC_TX0R_INIT_LAT_SHIFT	16
 +#  define R300_MC_MISC__MC_TX1R_INIT_LAT_SHIFT	20
 +#  define R300_MC_MISC__MC_GLOBR_INIT_LAT_SHIFT	24
 +#  define R300_MC_MISC__MC_GLOBW_FULL_LAT_SHIFT	0

Is the last 0 supposed to be a 28?

cu,
Nicolai




Re: [R300] new snapshot ?

2005-06-10 Thread Nicolai Haehnle
On Friday 10 June 2005 16:52, Aapo Tahkola wrote:
 On Fri, 10 Jun 2005 14:31:48 +1000
 Ben Skeggs [EMAIL PROTECTED] wrote:
 
  Aapo Tahkola wrote:
  
  Someone, I believe it was Aapo, said that they see white lines across 
the
  screen when the framerate is fairly high. I didn't see this up until 
yesterday
  when I had to change from my 9600pro to a 9600XT (I killed the card 
moving
  it between machines somehow).
  
  
  
  Are you using SiS based motherboard by any chance?

  
  Nope, I'm using an nforce3 based board (Gigabyte GA-K8NS Ultra-939)
  
  Following patch should fix this at the cost of some speed...

  
  This does indeed seem to correct the problem, and I don't notice a loss 
  of speed.
  glxgears rose by about 20fps, and quake3 by 5-10 fps..  I updated xorg 
  in the
  process of applying the patch, so it could be something from there.
  
  What exactly does the patch do?  Or is it some magic we don't know about yet?
 
 Perhaps ATI guys could answer that.

Umm... you *must* have that piece of code from *somewhere*, it can't just 
have fallen out of the sky. And that alone could provide at least some clue 
as to what this does...

cu,
Nicolai




Re: [R300] on lockups

2005-06-05 Thread Nicolai Haehnle
On Sunday 05 June 2005 15:55, Vladimir Dergachev wrote:
 On Sat, 4 Jun 2005, Nicolai Haehnle wrote:
 
The mirroring works as follows: each time scratch register is written
  the
  radeon controller uses PCI to write their value to a specific location 
in
  system memory.
 
  Are you sure it uses PCI? I'm assuming that the destination address for
  scratch writeback is controlled by the RADEON_SCRATCH_ADDR register. 
This
  register is programmed to a value that falls within the AGP area (as
  defined by RADEON_MC_AGP_LOCATION) if I understand the code correctly.
 
 My understanding is that AGP only does transfers system RAM -> video RAM
 and all transfers in the opposite direction have to use plain PCI 
 transfers at least as far as the bus is concerned.

You mean system RAM -> graphics card, right? Does this mean that the 
graphics card cannot always write into memory that falls within 
RADEON_MC_AGP_LOCATION?

 It could be that AGP GART can still decode addresses for writes to system 
 memory, I guess this depends on a particular architecture.
 
 One of the reasons to look forward to PCI Express is that it is 
 bi-directional, unlike AGP.
 
 
This, of course, would not work if the memory controller is
  misprogrammed
  - which was the cause of failures.
 
Which way can memory controller be misprogrammed ? The part that
  concerns
  us are positions of Video RAM, AGP and System Ram in Radeon address 
space.
  (these are specified by RADEON_MC_AGP_LOCATION, RADEON_MC_FB_LOCATION).
 
  What's the meaning of RADEON_AGP_BASE, by the way? It is programmed to
  dev->agp->base, which is AFAIK an address from the kernel's address 
space.
  That doesn't make much sense to me.
 
 It could be anything. However, the recommended way to program the memory 
 controller is to set the BASE of video memory to its physical PCI address 
 and to put AGP memory where it is mirrored by the AGP GART, as, 
 presumably, this does not overlap with system RAM or any of other 
 sensitive areas.
 
 My understanding is that dev->agp->base is the address where the AGP GART 
 mirrors the pieces of system RAM comprising AGP space.

Yes, that's my understanding, too. But what is the Radeon's business knowing 
that address? Why does it need to know this address? I thought this was CPU 
address space, not card address space.

cu,
Nicolai




Re: [R300] on lockups

2005-06-05 Thread Nicolai Haehnle
On Sunday 05 June 2005 20:07, Vladimir Dergachev wrote:
  My understanding is that dev->agp->base is the address where the AGP 
GART
  mirrors the pieces of system RAM comprising AGP space.
 
  Yes, that's my understanding, too. But what is the Radeon's business 
knowing
  that address? Why does it need to know this address? I thought this was 
CPU
  address space, not card address space.
 
 Yes, however it is convenient to do so.
 
 The point is that AGP base address will not normally overlap the location 
 of system RAM. This is, of course, only reasonable for 32 bit systems..

I understand that part, but it's not what I meant. What I mean is this: You 
said, RADEON_MC_AGP_LOCATION is used to program where AGP is in the card's 
address space, and that's all fine and makes sense.

However, we are *also* programming dev->agp->base into a register called 
RADEON_AGP_BASE. What is the meaning of that register?

cu,
Nicolai




Re: [R300] on lockups

2005-06-04 Thread Nicolai Haehnle
On Saturday 04 June 2005 15:01, Vladimir Dergachev wrote:
   I just wanted to contribute the following piece of information that 
might 
 help with R300 lockups. I do not know whether it applies or not in this 
 case, but just something to be aware about.
 
   Radeon has a memory controller which translates internal address space 
of 
 the chip into accesses of different memory - framebuffer, agp, system ram.
 
   So from the point of view of Radeon chip there is a single flat 32 bit 
 address space which contains everything. This is nice because you can 
 simply set texture offset to a particular number and the chip will pull it 
 from appropriate memory - be it video memory, agp or system ram. (albeit 
 system ram access is done via PCI, not AGP commands and thus is much 
 slower).
 
   It used to be that Radeon DRM driver had two modes for usage of scratch 
 registers - a mode when it polled Radeon chip directly and a mode when the 
 contents of the registers were mirrored in the system RAM. The driver 
 would try mirroring during startup and, if it fails, use the polling method.
 
   The mirroring works as follows: each time scratch register is written 
the 
 radeon controller uses PCI to write their value to a specific location in 
 system memory.

Are you sure it uses PCI? I'm assuming that the destination address for 
scratch writeback is controlled by the RADEON_SCRATCH_ADDR register. This 
register is programmed to a value that falls within the AGP area (as 
defined by RADEON_MC_AGP_LOCATION) if I understand the code correctly.

   This, of course, would not work if the memory controller is 
misprogrammed 
 - which was the cause of failures.
 
   Which way can memory controller be misprogrammed ? The part that 
concerns 
 us are positions of Video RAM, AGP and System Ram in Radeon address space.
 (these are specified by RADEON_MC_AGP_LOCATION, RADEON_MC_FB_LOCATION).

What's the meaning of RADEON_AGP_BASE, by the way? It is programmed to 
dev->agp->base, which is AFAIK an address from the kernel's address space. 
That doesn't make much sense to me.

   The memory controller *always* assumes that system RAM (accessible via 
 PCI) starts at 0. So, if RADEON_MC_FB_LOCATION, for example, is set to 0
 then we have video RAM overlapping system RAM. However, the size of video 
 RAM is usually much smaller than the size of system RAM. So if the scratch 
 registers image in system memory had small physical address you might get 
 a lockup and if it was high you don't. You also would be more likely to 
get 
 a lockup when load on system memory increased.

Hmm. The way RADEON_MC_(FB|AGP)_LOCATION are programmed, it seems to be like 
they actually consist of two 16 bit fields, one indicating the start of the 
FB/AGP area, the other indicating the end.

Do you know what happens when the programmed size of the FB area is larger 
than the physical size of video RAM? What happens when the programmed size 
of the AGP area is larger than the size of the AGP aperture?

   This problem has been fixed for plain Radeon drivers, but it could be 
 that something similar is manifesting again on R300..

How did that fix work?

cu,
Nicolai




Re: [r300] [patches] debugging lockups

2005-06-03 Thread Nicolai Haehnle
On Friday 03 June 2005 10:28, Aapo Tahkola wrote:
  On Thursday 02 June 2005 13:08, Boris Peterbarg wrote:
  Aapo Tahkola wrote:
   I did some figuring on the CB_DPATH problem.
   After little testing it turns out that the lock up with
   progs/demos/isosurf goes away when the pacifier sequences are applied
  to
   clearbuffer.
  
    I'm starting to think that this sequence is needed whenever 
overwriting
   certain states rather than whenever 3d operation begins and ends.
 
  Perhaps. I don't think it's just the pacifier sequence, though. I've 
been
  running applications with RADEON_DEBUG=sync, which causes idle calls
  between cmdbuf calls, and I've been seeing lockups where the read 
pointer
  sits after the beginning of what cmdbuf emits and long before the first
  rendering commands.
 
 I dont know if packet3 was issued before since I tweaked isosurf to dump
 each frame portion of RADEON_DEBUG info into files using freopen.
 But DISCARD BUF was really the only key difference in these logs.
 
  This indicates that at least one lockup is cause by
  one
  of the following:
  - intial pacifier sequence in r300_cp_cmdbuf
  - emission of cliprects or
 Cliprects seem to be little off scale when compairing progs/samples/logo
 against software rendering.
 Perhaps near plane is negative ?
 
  - initial unchecked_state commands sent by the client
 This is bad as you can see from first frame drawn by texwrap...
 Sticking:
   r300_setup_textures(ctx);
   r300_setup_rs_unit(ctx);
 
   r300SetupVertexShader(rmesa);
   r300SetupPixelShader(rmesa);
 or resethwstate to r300_run_vb_render should fix it.

I'm not sure we're talking about the same thing here. This happens when the 
client sends a command buffer where all the state blocks (from r300->hw) 
are sent to the hardware *anyway*. It's actually the *order* of emission 
(e.g. the order in which insert_at_tail is called for state bits) that can 
make a difference. The thing is, while the order definitely *does* affect 
the probability of lockups, lockups will not go away completely even if I 
use the exact same order that fglrx uses. So I'm beginning to believe that 
we can't trust radeon_do_cp_idle to completely idle the chip, or that 
whatever is wrong is pretty fundamentally wrong (some wrong bits in how the 
memory is configured?).

cu,
Nicolai




Re: [r300] [patches] debugging lockups

2005-06-03 Thread Nicolai Haehnle
On Friday 03 June 2005 00:25, Benjamin Herrenschmidt wrote:
  
   You guys seem to be getting closer...
   When I had X + xfce4 + quake3 running (with this patch + 
   patch.drm-cmdbuf-more-pacifiers + patch.remove-userspace-pacifiers) X 
   locked up within 2 minutes.
   However, X + quake3 (no window manager), I went thirty minutes before 
   my first problem.  Quake3Arena crashed, and X quit.  There was some 
   message on the terminal about radeon_wait and IRQ 16.  
  
  
  Here it is:
  
  radeonWaitIrq: drmRadeonIrqWait: -16
 
 Have you tried David Airlie's or my latest DRM IRQ fixes ?

It is unlikely that the problem is related, especially when X locks up, too. 
If you see this message and X is continuing to run fine (i.e. no complete 
lockup), you should indeed consider looking at the IRQ code.

However, if X locks up completely, the most likely reason for this message 
is simply that the R300 locks up before it encounters the IRQ EMIT command 
in the ring buffer. It just happens then that the DRI client waits for an 
IRQ instead of busy-looping for the chip to idle.

So it is perfectly likely that this message appears even when the IRQ 
handling code is working fine. Nevertheless, testing that patch can't hurt.

cu,
Nicolai




[r300] [patches] debugging lockups

2005-05-31 Thread Nicolai Haehnle
Hello everybody,

today's lockup-chasing wrapup follows :)

Two observations about the lockups I've been seeing:

(1) Lockups are more likely to occur when the ring buffer is filled with 
packet2s for alignment (see the attached experimental 
patch.drm-align-ring).

(2) Lockups are a lot less likely to occur when additional synchronisation 
measures are taken (like waiting for the read pointer to catch up with the 
write pointer after every ADVANCE_RING).

If we assume that the (most) lockups are caused by a race in memory 
accesses, then both observations make sense: Filling the ring buffer with 
packet2s causes the CP to request new batches from the ring buffer more 
often, and waiting for the ring buffer to catch up means that less stuff 
happens in parallel. Of course there may be a number of other 
interpretations.

Another observation:

(3) On my system, lockups involving simple programs (like glxgears) are a 
lot more likely to happen when multiple 3D clients are running in parallel. 
In particular, starting glean while running glxgears means an almost 
certain lockup, at least with the patch.remove-userspace-pacifiers that I 
posted earlier. (The background for that patch was that fglrx never emits a 
pacifier sequence in between 3D commands)

I have written a very unintrusive debugging facility for the DRM that 
basically logs which parts of the code emit commands to the ring buffer. 
When a lockup is detected, it prints this information out (via printk) 
along with a dump of the relevant part of the ring buffer. I have attached 
this patch, it is called patch.drm-debug-lockups-enabled (this logging 
facility can be disabled at compile time via the RADEON_DEBUG_LOCKUPS 
define in radeon_drv.h).

Using this patch, I have made another observation:

(4) All the lockups that happen for me occur when two cmdbuf ioctls are 
processed immediately after another, without an idle ioctl or similar 
inbetween.

So I have compared what our driver does at the boundary of 3D commands to 
what fglrx does, and I've come up with the attached 
patch.drm-cmdbuf-more-pacifiers, which adds an additional wait command to 
the end of r300_do_cp_cmdbuf. Using this patch, glean no longer locks up 
immediately when glxgears is running at the same time. Unfortunately, not 
all lockups have gone away yet...

What you can do: Please, test the attached patch.drm-cmdbuf-more-pacifiers, 
and report if there are any regressions (I don't believe there are any) 
and/or if it removes certain lockups you are seeing.

If you're feeling adventurous, I'd appreciate it if you could also try this 
patch in combination with the patch.remove-userspace-pacifiers patch I 
posted earlier, though this patch appears to be dangerous still (even 
though I do not understand why).

cu,
Nicolai
Index: drm/shared-core/radeon_drv.h
===
RCS file: /cvsroot/r300/r300_driver/drm/shared-core/radeon_drv.h,v
retrieving revision 1.12
diff -u -3 -p -r1.12 radeon_drv.h
--- drm/shared-core/radeon_drv.h	3 Mar 2005 04:40:21 -	1.12
+++ drm/shared-core/radeon_drv.h	31 May 2005 17:36:01 -
@@ -985,14 +985,37 @@ do {	\
 
 #define RING_LOCALS	int write, _nr; unsigned int mask; u32 *ring;
 
+#define ALIGN_RING() do {		\
+ 	int _nr = 32 - (dev_priv->ring.tail & 31);			\
+	int _write;			\
+	if (dev_priv->ring.space <= (_nr+1) * sizeof(u32)) {		\
+		COMMIT_RING();		\
+		radeon_wait_ring( dev_priv, (_nr+1) * sizeof(u32) );	\
+	}\
+	_write = dev_priv->ring.tail;	\
+	if (_write & 1)	{		\
+		dev_priv->ring.start[_write++] = RADEON_CP_PACKET2;	\
+		_write = _write % dev_priv->ring.tail_mask;		\
+		_nr--;			\
+	}\
+	while( _nr >= 2 ) {		\
+		dev_priv->ring.start[_write++] = RADEON_CP_PACKET2; 	\
+		dev_priv->ring.start[_write++] = RADEON_CP_PACKET2;	\
+		_write = _write % dev_priv->ring.tail_mask;		\
+		_nr -= 2;		\
+	}\
+	dev_priv->ring.tail = _write;	\
+} while (0)
+
 #define BEGIN_RING( n ) do {		\
+	ALIGN_RING(); /* TEST TEST */	\
 	if ( RADEON_VERBOSE ) {		\
 		DRM_INFO( "BEGIN_RING( %d ) in %s\n",			\
 			   n, __FUNCTION__ );\
 	}\
-	if ( dev_priv->ring.space <= (n) * sizeof(u32) ) {		\
+	if ( dev_priv->ring.space <= dev_priv->ring.size/2 /*(n+1) * sizeof(u32)*/ ) {		\
 		COMMIT_RING();		\
-		radeon_wait_ring( dev_priv, (n) * sizeof(u32) );	\
+		radeon_wait_ring( dev_priv, dev_priv->ring.size/2/*(n+1) * sizeof(u32)*/ );	\
 	}\
 	_nr = n; dev_priv->ring.space -= (n) * sizeof(u32);		\
 	ring = dev_priv->ring.start;	\
Index: drm/shared-core/radeon_cp.c
===
RCS file: /cvsroot/r300/r300_driver/drm/shared-core/radeon_cp.c,v
retrieving revision 1.7
diff -u -3 -p -r1.7 radeon_cp.c
--- drm/shared-core/radeon_cp.c	19 Apr 2005 21:05:18 -	1.7
+++ drm/shared-core/radeon_cp.c	31 May 2005 19:29:50 -
@@ -846,6 +847,126 @@ static void 

Re: r300 bugs

2005-05-30 Thread Nicolai Haehnle
Hi,

On Monday 30 May 2005 08:51, Vladimir Dergachev wrote:
 On Mon, 30 May 2005, Bernhard Rosenkraenzer wrote:
 
  Hi,
  I've just tried out the r300 driver - works remarkably well for 
untested and
  broken code.
 
 :))
 
 
  I've run into 2 bugs though:
  It doesn't work well if the display uses 16 bpp (24 bpp works perfectly) 
-- 3D
  in 16 bpp is pretty badly misrendered (sample attached; 2D works well w/ 
r300
  DRI even at 16 bpp) -- mixed with a random section of the rest of the 
screen,
  wrong colors, and drawn way too large (but close enough to the expected
  output to recognize it).
 
 I don't think we ever focused on getting 16bpp right - having 32bpp 
 working is fun enough :) Also, all of r300 and later cards have more than 
 enough RAM for 32bpp modes.
 
 That said, it is probably just a matter of making sure some constants are 
 set properly (like colorbuffer parameters), I don't think anything else in 
 the driver is tied to that.

If fglrx supports 16 bits (seriously, I've never tried that - who wants 16 
bits anyway ;)), it's a matter of using glxtest to figure out the necessary 
color buffer setup code. Some other constants may be different, but it's 
unlikely.

In addition to that, you will have to change radeon_span.c (for software 
fallbacks, as well as Read/DrawPixels functionality) accordingly, as well 
as probably some context creation related stuff.

Also, you might want to look into the code that selects texture formats. It 
probably doesn't make too much sense to select a 32 bit texture format at a 
16 bit screen resolution unless the user explicitly requests it.

cu,
Nicolai




Re: [patches] Re: r300 radeon 9800 lockup

2005-05-29 Thread Nicolai Haehnle
On Sunday 29 May 2005 02:31, Ben Skeggs wrote:
 Morning,
 
 After playing UT2004 for 10 or so minutes, and then quickly checking 
 some other
 apps known to work, I see no regressions with either patch.
 
 I'll be putting it through some more rigorous testing as the day 
 progresses, will
 report back if I find anything.
 
 Also, out of interest, what triggered the lockup you saw?

Pretty much anything could trigger it, from glxgears (unlikely lockup) over 
glean (regular lockups) to cube (almost instant lockup).

Unfortunately, just like others are reporting, I'm still getting lockups, 
too. This time, however, they are more elusive as the lockups disappear as 
soon as I start looking for them:

I used the attached patch (in variations) to hunt for the previous lockup. 
What this patch does is that it basically commits the ring buffer after 
every ADVANCE_RING and waits for the read ptr to catch up with the write 
ptr. Feel free to try this patch if you're seeing lockups, perhaps you can 
find something out that way. However, as soon as I enable the call to 
commit_and_wait, the lockups disappear for me.

I'm still going to try to find this thing, but it looks like it's going to 
be difficult.

cu,
Nicolai
Index: drm/shared-core/radeon_cp.c
===
RCS file: /cvsroot/r300/r300_driver/drm/shared-core/radeon_cp.c,v
retrieving revision 1.7
diff -u -3 -p -r1.7 radeon_cp.c
--- drm/shared-core/radeon_cp.c	19 Apr 2005 21:05:18 -	1.7
+++ drm/shared-core/radeon_cp.c	28 May 2005 20:34:04 -
@@ -923,6 +923,56 @@ static int radeon_do_wait_for_idle(drm_r
 	return DRM_ERR(EBUSY);
 }

+
+/* Debugging function:
+ *
+ * Commit the ring immediately and verify that the hardware is making
+ * progress on the ring.
+ */
+static int failed_once = 0;
+static const char* prev_inflight_caller = 0;
+static const char* prev2_inflight_caller = 0;
+static const char* prev3_inflight_caller = 0;
+
+void radeon_do_inflight_commit_and_wait(drm_radeon_private_t * dev_priv, const char* caller)
+{
+	const char* prev3_caller = prev3_inflight_caller;
+	const char* prev2_caller = prev2_inflight_caller;
+	const char* prev_caller = prev_inflight_caller;
+	const char* cur_caller = caller;
+	u32 old_tail = RADEON_READ(RADEON_CP_RB_WPTR);
+	u32 new_tail = dev_priv->ring.tail;
+	int i;
+
+	prev3_inflight_caller = prev2_caller;
+	prev2_inflight_caller = prev_caller;
+	prev_inflight_caller = caller;
+
+	if (failed_once)
+		return;
+
+	COMMIT_RING();
+
+	for(i = 0; i < dev_priv->usec_timeout; i++) {
+		u32 head = GET_RING_HEAD(dev_priv);
+
+		if (new_tail > old_tail) {
+			if (head > old_tail && head <= new_tail)
+				return;
+		} else {
+			if (head <= new_tail || head > old_tail)
+				return;
+		}
+
+		DRM_UDELAY(1);
+	}
+
+	DRM_ERROR("failed! (caller = %s, prev = %s <- %s <- %s)\n", cur_caller, prev_caller, prev2_caller, prev3_caller);
+	radeon_status(dev_priv);
+	failed_once = 1;
+}
+
+
 /* 
  * CP control, initialization
  */
Index: drm/shared-core/radeon_drv.h
===
RCS file: /cvsroot/r300/r300_driver/drm/shared-core/radeon_drv.h,v
retrieving revision 1.12
diff -u -3 -p -r1.12 radeon_drv.h
--- drm/shared-core/radeon_drv.h	3 Mar 2005 04:40:21 -	1.12
+++ drm/shared-core/radeon_drv.h	28 May 2005 20:34:05 -
@@ -985,14 +985,39 @@ do {	\

 #define RING_LOCALS	int write, _nr; unsigned int mask; u32 *ring;

+#define ALIGN_RING() do {		\
+	int _nr = 32 - (dev_priv->ring.tail & 31);			\
+	int _write;			\
+	if (dev_priv->ring.space <= (_nr+1) * sizeof(u32)) {		\
+		COMMIT_RING();		\
+		radeon_wait_ring( dev_priv, (_nr+1) * sizeof(u32) );	\
+	}\
+	_write = dev_priv->ring.tail;	\
+	if (_write & 1)	{		\
+		dev_priv->ring.start[_write++] = RADEON_CP_PACKET2;	\
+		_write = _write % dev_priv->ring.tail_mask;		\
+		_nr--;			\
+	}\
+	while( _nr >= 2 ) {		\
+		/*dev_priv->ring.start[_write++] = CP_PACKET3( RADEON_CP_NOP, 0 );*/ \
+		/*dev_priv->ring.start[_write++] = CP_PACKET0( 0x1438, 0 );*/ \
+		dev_priv->ring.start[_write++] = RADEON_CP_PACKET2; 	\
+		dev_priv->ring.start[_write++] = RADEON_CP_PACKET2; 	\
+		_write = _write % dev_priv->ring.tail_mask;		\
+		_nr -= 2;		\
+	}\
+	dev_priv->ring.tail = _write;	\
+} while (0)
+
 #define BEGIN_RING( n ) do {		\
+	ALIGN_RING(); /* TEST TEST */	\
 	if ( RADEON_VERBOSE ) {		\
 		DRM_INFO( "BEGIN_RING( %d ) in %s\n",			\
 			   n, __FUNCTION__ );\
 	}\
-	if ( dev_priv->ring.space <= (n) * sizeof(u32) ) {		\
+	if ( dev_priv->ring.space <= dev_priv->ring.size/2 /*(n+1) * sizeof(u32)*/ ) {		\
 		COMMIT_RING();		\
-		radeon_wait_ring( dev_priv, (n) * sizeof(u32) );	\
+		radeon_wait_ring( dev_priv, dev_priv->ring.size/2/*(n+1) * sizeof(u32)*/ );	\
 	}\
 	_nr = n; dev_priv->ring.space -= (n) * sizeof(u32);		\
 	ring = dev_priv->ring.start;	\
@@ 

Re: r300 radeon 9800 lockup

2005-05-25 Thread Nicolai Haehnle
On Tuesday 24 May 2005 22:54, Jerome Glisse wrote:
 On 5/24/05, Nicolai Haehnle [EMAIL PROTECTED] wrote:
  Unfortunately, I don't think so. The thing is, all those OUT_RING and
  ADVANCE_RING commands do not really call into the hardware immediately; 
all
  they do is write stuff to the ring buffer, but the ring buffer is just 
some
  memory area without any magic of its own.
  
  Only a call to COMMIT_RING will tell the hardware that new commands are
  waiting in the ring buffer, and the only thing we do know is that
  *something* in the ring buffer before the last COMMIT_RING causes the 
chip
  to hang.
  
  So another possible way to investigate this could be:
  - Call radeon_do_wait_for_idle() at the end of the COMMIT_RING macro, 
and
  define RADEON_FIFO_DEBUG (this will print out additional information 
when
  wait_for_idle fails)
  - Increasingly add COMMIT_RING macros into r300_cmdbuf.c to pinpoint the
  exact location of the problem, if at all possible.
  
  It would be very helpful if you could single out one command we send 
using
  this procedure.
  
  Note that in the worst case (depending on the actual nature of the 
lockup in
  hardware), those debugging changes could actually *remove* the lockup 
(e.g.
  because they remove a race condition that caused the lockup in the first
  place).
  
 
 Below a sample of what i get when a lockup occur. There is something
 that seems strange to me, i saw CP_RB_RTPR change while i am in a
 lockup and CP_RB_WTPR increase 6 by 6, I haven't let the things live
 for too much time (about 2mins before reboot) but i looks like it
 still process ring buffer but slowly.

The increase of the write pointer by 6 dwords is easily explained 
by radeon_do_cp_idle: This function always emits a series of 6 dwords 
(cache flushes and stuff) before calling wait_for_idle. My understanding is 
that these commands make sure the chip is in a completely clean state.

Are you sure the read pointer is still moving 2mins after the lockup? That 
would be rather surprising, to say the least.

 Anyway I must have misunderstood this; 
 I have to dig into this drm code to understand it a little more.

 By the way, why is radeon_cp_flush deactivated?

The only thing that calls radeon_cp_flush is radeon_cp_stop, which is never 
called during normal 3D operation and COMMIT_RING should take care of 
posting the write pointer.

I don't know the meaning of bit 31 of WPTR.

cu,
Nicolai

 May 24 21:33:25 localhost kernel: [drm:radeon_do_wait_for_idle] *ERROR* 
failed!
 May 24 21:33:25 localhost kernel: radeon_status:
 May 24 21:33:25 localhost kernel: RBBM_STATUS = 0x80010140
 May 24 21:33:25 localhost kernel: CP_RB_RTPR = 0x0003fdf0
 May 24 21:33:25 localhost kernel: CP_RB_WTPR = 0x0d95
 May 24 21:33:25 localhost kernel: AIC_CNTL = 0x
 May 24 21:33:25 localhost kernel: AIC_STAT = 0x0004
 May 24 21:33:25 localhost kernel: AIC_PT_BASE = 0x
 May 24 21:33:25 localhost kernel: TLB_ADDR = 0x
 May 24 21:33:25 localhost kernel: TLB_DATA = 0x
2




Re: r300 radeon 9800 lockup

2005-05-25 Thread Nicolai Haehnle
On Wednesday 25 May 2005 17:01, Vladimir Dergachev wrote:
  Are you sure the read pointer is still moving 2mins after the lockup? 
That
  would be rather surprising, to say the least.
 
 
 I think I can imagine how this might be happenning. You see a lockup from
 the driver point of view is when the 3d engine busy bit is constantly on.
 
 The read pointer is updated by the CP engine, not the 3d engine. It could 
 be that something would cause the CP engine to loop around sending 
 commands to 3d engine forever. This would keep the 3d engine bit on, 
 update the read pointer and appear to be a lockup to the driver.

What you're saying is, some command that we sent could be misinterpreted by 
the 3D engine (or we sent something that we didn't intend to send, 
considering lack of docs etc.) as a command that takes insanely long to 
complete.

 One way to try to make sure this does not happen is to put code in the DRM 
 driver to control the active size of the ring buffer.

That could be useful for debugging, but that's about it. The thing is, we 
*want* to have the ring buffer full. If we didn't want that, we could just 
make the ring buffer smaller. But that doesn't really *solve* the problem 
either because even very small commands can take an insane amount of time 
to finish.

In any case, it would be interesting to know how fast the RPTR still moves 
and if it becomes unstuck at some point. You also need to watch out for 
when the X server finally decides to reset the CP. I believe there's a bug 
where the X server waits much longer than intended to do this, but the 
reset could still mess with results if you're waiting for too long.


 Also, there might be an issue where the CP engine expects the ring buffer 
 to be padded with NOPs in a certain way (say to have pointers always on 
 256 bit boundaries) - I don't think we are doing this.

Yes, that's what I mentioned in an earlier mail.

cu,
Nicolai




Re: r300 radeon 9800 lockup

2005-05-24 Thread Nicolai Haehnle
On Tuesday 24 May 2005 18:33, Adam K Kirchhoff wrote:
 Vladimir Dergachev wrote:
 
  Vladimir Dergachev wrote:
 
 
  In the past I found useful not to turn drm debugging on, but, rather,
  insert printk statements in various place in radeon code. This should
  also provide more information about what is actually going on. 
 
 
 
  I can't make any promises.  My partner already thinks I spend too much
  time in front of the computer :-)  I'll see what I can do, though.  
  Think a
  printk statement at the start and end of every function?  Have any
 
 
  This is probably overkill and might not be very useful
 
  Rather try, at first, to just print a printk logging which command is 
  being executed (r300_cmd.c) - this is not very thorough, but, maybe, 
  there is a pattern.
 
 
 I added a printk for each function in r300_cmdbuf.c...  When Q3A locked 
 up, and the last thing showing up in syslog was r300_pacify.  So I added 
 printk's after every line in r300_pacify :-)  The last thing in syslog 
was:
 
 OUT_RING( CP_PACKET3( RADEON_CP_NOP, 0 ) )
 OUT_RING( 0x0 )
 ADVANCE_RING()
 
 So it seems to be making it all the way through r300_pacify, which had 
 been called from r300_check_range, from r300_emit_unchecked_state.
 
 Here's the sequence:
 r300_emit_raw
 r300_emit_packet3
 r300_emit_raw
 r300_emit_unchecked_state
 r300_check_range
 r300_emit_unchecked_state
 r300_check_range
 r300_pacify
 RING_LOCALS
 BEGIN_RING(6)
 OUT_RING( CP_PACKET0( R300_RB3D_DSTCACHE_CTLSTAT, 0 ) )
 OUT_RING( 0xa )
 OUT_RING( CP_PACKET0( 0x4f18, 0 ) )
 OUT_RING( 0x3 )
 OUT_RING( CP_PACKET3( RADEON_CP_NOP, 0 ) )
 OUT_RING( 0x0 )
 ADVANCE_RING()
  
 
 Does this tell us anything?

Unfortunately, I don't think so. The thing is, all those OUT_RING and 
ADVANCE_RING commands do not really call into the hardware immediately; all 
they do is write stuff to the ring buffer, but the ring buffer is just some 
memory area without any magic of its own.

Only a call to COMMIT_RING will tell the hardware that new commands are 
waiting in the ring buffer, and the only thing we do know is that 
*something* in the ring buffer before the last COMMIT_RING causes the chip 
to hang.

So another possible way to investigate this could be:
- Call radeon_do_wait_for_idle() at the end of the COMMIT_RING macro, and 
define RADEON_FIFO_DEBUG (this will print out additional information when 
wait_for_idle fails)
- Increasingly add COMMIT_RING macros into r300_cmdbuf.c to pinpoint the 
exact location of the problem, if at all possible.

It would be very helpful if you could single out one command we send using 
this procedure.

Note that in the worst case (depending on the actual nature of the lockup in 
hardware), those debugging changes could actually *remove* the lockup (e.g. 
because they remove a race condition that caused the lockup in the first 
place).

cu,
Nicolai




Re: r300 radeon 9800 lockup

2005-05-23 Thread Nicolai Haehnle
On Sunday 22 May 2005 21:00, Jerome Glisse wrote:
 Hi,
 
 I setup a x86 with radeon 9800 pro or xt, trying to find
 why it locks. I see a little improvement with the no-silken-mouse
 option; can you test and tell me if it does anything for
 you (X -nosilk).
 
 My thought on this lockups is that it's similar to the one
 r200 users report, X taking 100% of CPU waiting for
 something. I saw a mail from Felix about a lock holding
 issue will try to dig in mail archive.

If I interpret the logs correctly, all those lockups are of the form where 
the R300 fails to process the ring buffer any further, i.e. the R300 locks 
up. This in turn causes the 3D driver or the X server (depending on the 
exact circumstances, and probably in a rather random fashion) to wait for 
the R300 to become idle in an endless loop.

The 100% CPU usage is merely caused by the fact that we're polling the chip 
instead of doing proper IRQ-based wait-for-idle.

 Anyone have any idea on that ? Could it be the mouse
 code in xorg ? Or is it in r300_mesa or drm ? I really
 suspect xorg radeon code...

It is easy to blame the DDX, but the truth is, we just don't know. The 
people seeing lockups should try to figure out whether there is a direct 
causal connection between e.g. mouse movements and lockups. If you are in a 
fullscreen OpenGL application, not moving the mouse, with no popups occurring 
from something like a panel applet, and the chip *still* locks up, it is 
highly unlikely that the DDX is at fault.

It is equally likely that the lockup is caused by, say, alignment or 
wraparound issues of the ring buffer.
Note that fglrx always submits commands in indirect buffers, which are 
stored linearly in physical memory. We, on the other hand, always submit 
commands into the ring buffer, which is not linear (because it wraps 
around). Also, fglrx likes to emit NOPs into the command stream sometimes, 
though I haven't been able to find an exact pattern in those NOPs. We never 
emit NOPs (or do we?).

So the fact is: We just don't know whether alignment/wraparound can cause 
trouble. The emission of NOPs by fglrx is IMO significant evidence that 
there *are* issues in this area, at least on some chipsets, but it could 
just be some weird artifact of the fglrx codebase.

cu,
Nicolai




Re: R300 swizzle table

2005-05-21 Thread Nicolai Haehnle
On Saturday 21 May 2005 17:42, Jerome Glisse wrote:
 On 5/21/05, Ben Skeggs [EMAIL PROTECTED] wrote:
  Also, while I was debugging some problems in ut2004, I noticed that it
  re-uses
  the same few programs over and over, but they are translated each time.  
I'm
  thinking about adding a cache for the last 5 or so texenv programs so
  that we
  don't need to translated all the time.  Should get a nice speedup in the
  more
  complex areas.  Any thoughts on this?

That would mean that either ut2004 rewrites different TexEnv settings 
multiple times between rendering calls, or the Mesa core fails to detect 
some redundant state setting.

 ut2004 has a bad ogl attitude if so (don't have it as I don't think there is
 a PPC linux version :)). But yes, caching programs could be useful. Moreover,
 IIRC r300 can have 2 fragment programs in memory?

Well, there are 64 slots for ALU instructions, and it seems to be possible 
to set pretty arbitrary program start offsets. So you could write two 
programs' ALU instructions into the chip at the same time, but I don't 
think you can do the same for TEX instructions, so it has very limited 
usability.

cu,
Nicolai




Re: [R300] new snapshot ?

2005-05-19 Thread Nicolai Haehnle
On Thursday 19 May 2005 09:20, Keith Whitwell wrote:
 Vladimir Dergachev wrote:
  
  Hi Aapo, Ben, Jerome, Nicolai:
  
 I recently checked fresh code from CVS and was pleasantly surprised 
  to see that all Quake3 levels that were broken are now perfect - in fact 
  I cannot find anything that is amiss !
  
 Do you think it would be a good idea to tag the current code and make 
  a snapshot ?

Sure, anytime :)

 So have you guys given any consideration to moving the r300 driver into 
 mesa proper?  CVS access shouldn't be a problem, fwiw...

There are two main points that have stopped me from pushing for the 
inclusion of the driver into Mesa proper:

1. Kernel-level security holes
We should take care of full command-stream verification before moving the 
driver into Mesa CVS. It's easy to say "we can do that later", but if we 
say that, it's likely that it won't be done for a long time.

2. DRM binary compatibility
We still don't know the meaning of many of the registers. Some registers are 
labelled "dangerous", which means we might have to do some more checks in 
the kernel to make sure user processes can't do harmful stuff. This means 
that we might have to *remove* some of the cmdbuf commands that exist today 
in the future.

If the others believe moving r300 to Mesa is a good idea, then I'll do some 
auditing of the DRM code. Once that auditing is done, I'm okay 
with moving the driver as long as we don't enforce DRM binary compatibility 
yet.

cu,
Nicolai




Re: [Mesa3d-dev] update on GL_EXT_framebuffer_object work

2005-05-17 Thread Nicolai Haehnle
On Tuesday 17 May 2005 15:47, Brian Paul wrote:
 Note that it can be easy to miss this problem. One way that should 
  trigger the issue in all drivers is:
  1. Make sure that you hit software rasterization fallbacks (e.g. 
  no_rast=true).
  2. Run any GL application in a window and resize the window. If you make 
the 
  window larger than its initial size, the framebuffer will be clipped 
  incorrectly.
  
  I've fixed this by calling _mesa_resize_framebuffer in the same place 
where 
  clip rectangles are recalculated after the DRI lock has been regained. 
  However, I'd like to know if this is the correct/canonical/preferred way 
of 
  doing it.
 
 That actually sounds like the right thing.
 
 The idea is that when the driver learns that the window has been 
 resized we need to call _mesa_resize_framebuffer() on the framebuffer 
 that corresponds to the window.  Wherever we recompute the cliprects 
 in response to a window size change, is also the right place to resize 
 the Mesa framebuffer.
 
 This should be addressed in all the DRI drivers.
 
 If you can provide the details of how/where you're doing this in the 
 r300 driver, we can look at doing the same in the other drivers.

In the r300 driver, the function radeonGetLock() is the function that 
handles all the non-fast cases of LOCK_HARDWARE. In this function, we call 
r300RegainedLock() after validating the drawable information.

I have changed r300RegainedLock() to look like this:

static void r300RegainedLock(radeonContextPtr radeon)
{
	__DRIdrawablePrivate *dPriv = radeon->dri.drawable;

	if (radeon->lastStamp != dPriv->lastStamp) {
		/* --- Here is the interesting part --- */
		_mesa_resize_framebuffer(radeon->glCtx,
			(GLframebuffer*)dPriv->driverPrivate,
			dPriv->w, dPriv->h);

		... recalculate cliprects and scissor stuff here ...
	}
}

Inserting this call to _mesa_resize_framebuffer was the only relevant 
change.

cu,
Nicolai




Re: r300 patch: correct a format/argument mismatch

2005-05-13 Thread Nicolai Haehnle
On Friday 13 May 2005 04:45, Jeff Smith wrote:
 There is a format/argument mismatch in r300_texprog.c.  The format given 
is '%d' while
 the argument is a char*.  This patch corrects the format to '%s'.

Applied to CVS.

cu,
Nicolai

  -- Jeff Smith




Re: r300 patch: change some parameters to GLvoid*

2005-05-13 Thread Nicolai Haehnle
On Friday 13 May 2005 04:43, Jeff Smith wrote:
 There are several places in r300_maos.c where a GLvoid* parameter is more 
appropriate
 than char*.  This patch makes these changes (which also fixes a compiler 
warning for me).

Applied to CVS.

cu,
Nicolai

  -- Jeff Smith
 
 




Re: [Mesa3d-dev] update on GL_EXT_framebuffer_object work

2005-05-13 Thread Nicolai Haehnle
On Monday 02 May 2005 16:56, Brian Paul wrote:
 
 This weekend I finished updating the DRI drivers to work with the new 
 framebuffer/renderbuffer changes.  My DRI test system is terribly out 
 of date so I haven't run any tests.  I'm tempted to just check in the 
 changes now and help people fix any problems that arise, rather than 
 spend a few days updating my test box.  I think the code changes are 
 pretty safe though.
 
 Here's a summary of changes to the DRI drivers:
[snip]
 Are there any questions or concerns?

Working on the experimental R300 driver, I did come upon a question:

How are DRI drivers supposed to handle window resizes? If I understand the 
code correctly, _mesa_resize_framebuffer would have to be called at some 
point when the window is resized, but I don't see when that happens in any 
of the DRI drivers in Mesa CVS.

Note that it can be easy to miss this problem. One way that should 
trigger the issue in all drivers is:
1. Make sure that you hit software rasterization fallbacks (e.g. 
no_rast=true).
2. Run any GL application in a window and resize the window. If you make the 
window larger than its initial size, the framebuffer will be clipped 
incorrectly.

I've fixed this by calling _mesa_resize_framebuffer in the same place where 
clip rectangles are recalculated after the DRI lock has been regained. 
However, I'd like to know if this is the correct/canonical/preferred way of 
doing it.

cu,
Nicolai




Re: licenses, R300 code, etc

2005-05-01 Thread Nicolai Haehnle
On Sunday 01 May 2005 06:41, Vladimir Dergachev wrote:
 On Sun, 1 May 2005, Paul Mackerras wrote:
  Vladimir Dergachev writes:
 
  * the R300 driver derived from it appears under the same
license due to the notices left over from R200 files
(as we originally thought to merge the code in R200).
 
This needs approval from everyone who contributed to R300 -
please let me know !
 
  What exactly needs approval?  The current license, or are you
  proposing a change to the license?
 
 Just wanted to confirm that everyone is ok with MIT/X11 license.
 It was never explicit before - my fault, I was having too much fun playing 
 with the code :)

I always thought it was explicit, at least for me - I didn't just copypaste 
blindly ;)
So yes, I'm obviously okay with that license.

cu,
Nicolai




Re: Proprosed break in libGL / DRI driver ABI

2005-04-05 Thread Nicolai Haehnle
On Tuesday 05 April 2005 22:11, Brian Paul wrote:
 If you increase MAX_WIDTH/HEIGHT too far, you'll start to see 
 interpolation errors in triangle rasterization (the software 
 routines).  The full explanation is long, but basically there needs to 
 be enough fractional bits in the GLfixed datatype to accomodate 
 interpolation across the full viewport width/height.
 
 In fact, I'm not sure that we've already gone too far by setting 
 MAX_WIDTH/HEIGHT to 4096 while the GLfixed type only has 11 fractional 
 bits.  I haven't heard any reports of bad triangles so far though. 
 But there probably aren't too many people generating 4Kx4K images.
 
 Before increasing MAX_WIDTH/HEIGHT, someone should do an analysis of 
 the interpolation issues to see what side-effects might pop up.
 
 Finally, Mesa has a number of scratch arrays that get dimensioned to 
 [MAX_WIDTH].  Some of those arrays/structs are rather large already.

Slightly off-topic, but a thought that occurred to me in this regard was 
tiled rendering. Basically, do a logical divide of the framebuffer into 
rectangles of, say, 64x64 pixels. During rasterization, all primitives are 
split according to those tiles and rendered separately. This has some 
advantages:

a) It could help reduce the interpolation issues you mentioned. It's 
obviously not a magic bullet, but it can avoid the need for insane 
precision in inner loops.
b) Better control of the size of scratch structures, possibly even better 
caching behaviour.
c) One could build a multi-threaded rasterizer (where work queues are per 
framebuffer tile), which is going to become all the more interesting once 
dual-core CPUs are widespread.

cu,
Nicolai




Re: r300 - alpha test

2005-03-21 Thread Nicolai Haehnle
Meh, I originally sent this from the wrong email address, sorry...

On Monday 21 March 2005 12:50, Peter Zubaj wrote:
 I just realized something - isn't the application supposed to change
 Z test for that ?
 
 I don't know, but all applications I tested which use the alpha test
 have the z test enabled, and all display errors (tuxracer, enemy territory,
 fire - from mesa/demos)
 
 Maybe what really happens is that disabling Z test is broken.
 
 Z test is not disabled - it is enabled. The problem is - even if the alpha test
 fails (and the fragment is discarded), the z value is still written (and this
 looks wrong).

Bingo.
If setting 0x4F14 to 1 does indeed enable early Z testing, this is easily 
explained: For every fragment, the card *first* does the Z test. The Z test 
passes, so the new depth is written out. The fragment program is probably 
run after the Z test, but this detail doesn't matter. What matters is that 
the alpha test discards the fragment, but only after Z has already been 
written.

If, on the other hand, 0x4F14 is set to 0, Z testing happens *after* the 
alpha test and everything's fine.

 On the other hand, as Nicolai points out it would be nice to know
 what that register does and whether other bits have any function.
 
 AFAIK:
 
 fglrx initializes the 0x4f14 register to 0x00000001, but when the alpha test is
 enabled it sets it to 0x00000000. I have to do more tests to see if
 fglrx sets this register back to 0x00000001 (for now it looks like it is
 not set back, but I need to make a test program for it).

Yes, that would need further testing. If fglrx does not set the register 
back to 1, that would indicate that there's more to this bit than just 
early Z. Possible explanations could be (a) a relation to other Z 
acceleration tricks, (b) fglrx is just being stupid or (c) switching 
between early and late Z testing is very slow or broken in hardware. But if 
fglrx *does* reset the register to 1 when alpha test is disabled, we can 
pretty much say with certainty that it enables early Z testing.

cu,
Nicolai






Re: [r200] Lockups...

2005-03-14 Thread Nicolai Haehnle
On Sunday 13 March 2005 23:46, Adam K Kirchhoff wrote:
 Adam K Kirchhoff wrote:
  I really am confused.  This was all working (with my 9200) prior to 
  reinstalling Debian on my system on Friday.  Thankfully it still works 
  (with drm 1.15.0) on my FreeBSD installation.  Not really sure if that 
  tells you anything.
 
 
 Alright...  So drm from both February 14th and January 1st are locking 
 up as well...  Which is odd since I never had any of these problem till 
 this weekend.  I'll start rolling back changes to the Mesa dri 
 driver...  Perhaps this isn't directly related to the drm.
 
 Oh, I've also flashed my BIOS to the latest from the motherboard 
 manufacturer..  Thought it was worth a shot, but it didn't help.

If rolling back the dri driver doesn't help, what about the DDX or even the 
kernel?

cu,
Nicolai

 Adam




Re: [r200] Lockups...

2005-03-13 Thread Nicolai Haehnle
On Sunday 13 March 2005 03:10, Adam K Kirchhoff wrote:
 Was it always shared with the USB controller? Can you try changing that?
 
 Some more info for both of you...
 
 I remarked, in an earlier e-mail, that glxgears wouldn't cause the 
 lockups.  That's not true.  For whatever reason, it doesn't seem to 
 cause the lockups if I load the drm module with debug=1...  At least not 
 immediately.  However, if I don't load drm that way, glxgears will 
 lockup my machine rather quickly.  Some lockups are hard lockups, unable 
 to even get the serial console to respond.  With others, I can even ssh 
 in still.  In all cases, iirc, my machine will lockup hard if I actually 
 try and 'reboot' the box after logging in.

Wait a minute... isn't that a very similar lockup to the one you got with 
R300? I don't understand what's going on with radeon_cp_reset though.

cu,
Nicolai




Re: [R300] gliding_penguin snapshot

2005-03-06 Thread Nicolai Haehnle
On Sunday 06 March 2005 14:15, Adam K Kirchhoff wrote:
 Unfortunately, I'm still getting pretty constant lockups that seem to be 
 related to high framerates.  From ppracer with RADEON_DEBUG set to all:
 
 http://68.44.156.246/ppracer.txt.gz
 
 On the plus side, the texture problem that I had seen with 
 neverputt/neverball seems to be resolved.

This is probably the same lockup that I have seen. Unfortunately, I can't 
test anything for another two weeks or so.

It may be worth it to test whether the lockup is due to some (race?) 
interaction between the 3D client and the X server. In particular, test if 
the lockup also happens with Solo. If the lockup happens with Solo as well, 
we at least know that it's not caused by the X server doing something.

cu,
Nicolai

 Adam




Re: r300 - Saphire 9600

2005-02-27 Thread Nicolai Haehnle
On Sunday 27 February 2005 23:10, Hamie wrote:
 I've added in the pci-id's for the Sapphire 9600 AGP card. As it has 2 
 pci-id's, I've added both to the pciids file, and added it into 
 radeon_screen, but left the second head commented out in radeon_screen.c 
 as I'm unsure whether or not it should be treated separately...
 
 Why does it appear as two pci id's anyway? Can you treat it as a second 
 card?

To the best of my knowledge, we don't add the second head PCI ID to 
drm_pciids.txt because the driver only looks for the first PCI device (in 
fact, loading two driver instances, one for each device, would be certain 
to cause lockups). So please remove the second ID.

I don't know why ATI decided to publish two PCI devices, and I don't have 
any related documentation. However, all the features like dual head can be 
used by only considering the first PCI device, as far as I know.

cu,
Nicolai

 H




Re: [r300] Radeon 9600se mostly working..

2005-02-22 Thread Nicolai Haehnle
On Monday 21 February 2005 17:40, John Clemens wrote:
  On Mon, 21 Feb 2005, John Clemens wrote:
 
  give it a go on my fanless 9600se (RV350 AP).
 
  How much memory do you have ? What kind of CPU and motherboard ?
 
 Duron 1.8G, 256MB ddr, old(ish) via km266 motherboard in a shuttle sk41g. 
 Gentoo.  The card has 128Mb ram.
 
  - glxinfo states r300 DRI is enabled. (AGP4x, NO-TCL)
  - glxgears gives me about 250fps with drm debug=1, ~625fps without 
debug
   on.
 
 should I be concerned that these fps are too low?  others seem to be 
 reporting around 1000..

Well, I'm not sure about the value with debug off, it does seem rather low, 
but perhaps reasonable if you are using immediate mode (which is still the 
default in CVS, I believe - check r300_run_render in r300_render.c).
Your debug FPS is rather high, actually - I only get around 50fps in 
glxgears with enabled DRM debugging (even less if I also enable debug 
messages from the userspace driver).

  - tuxracer runs ok at 640x480 fullscreen
   - ice textures look psychedelically blue
   - at 1280x1024, (and somewhat at 800x600 windowed), i get these
 errors:
  [drm:radeon_cp_dispatch_swap] *ERROR* Engine timed out before swap 
buffer 
  blit
 
 ...
 
  The swap buffer blit is just a copy - for example a copy from back 
buffer to 
  front buffer. Since the engine timed out before swap buffer blit it 
means 
  that the commands before it were at fault. Which is puzzling as you 
point out 
  that everything works in 640x480.
 
 Just to elaborate:  640x480 runs fine.  at 800x600 windowed, it plays 
 fine, but if a scene gets more complicated i see some jerkyness.. i.e., 
 the scene freezes for a second or two and then jumps ahead, and i get a 
 few messages in the log.  At 1280x1024, this happens all the time, so it 
 appears the game is locked, and I get a stream of those messages in the 
 log file.  alt-F switching to the console works, and switching back i get 
 about 2 seconds more of movement, and then soft-lock again (presumably 
 because the card re-inits on VC switch).  I can switch to the VC and kill 
 it and all's fine.  Judging from what you're saying, the card isn't 
 locked, it just isn't able to draw a full scene before it times out.

Well, this is certainly interesting, and it does sound like userspace is 
generating so many drawing commands that the card is simply too slow to 
process them all. My guess is that the one- to two-second freezes are caused by 
the X server when it, too, thinks that the engine has timed out and 
initiates a reset sequence.

This is actually an interesting problem. Here are some issues to think 
about:
1) The SWAP ioctl should really report an error to userspace when the engine 
has timed out.
2) I agree that it would make sense to monitor the ring buffer somehow. 
Perhaps a wait_for_ringbuffer that is called at the top of wait_for_fifo? 
In the fast path, this costs an additional I/O read operation; otherwise 
it should essentially be no different performance-wise.
3) Come to think of it, couldn't the card just issue an IRQ when it's done?
4) If a drawing command takes very long, can we identify the userspace 
process that is responsible for sending the command buffer that caused the 
delay, and can we deal with this process somehow? Perhaps we could insert 
an age marker before and after the processing in the command buffer ioctl.

The last point actually touches on a bigger subject: scheduling access to 
the graphics card. To get an idea of what I'm talking about, launch a 
terminal emulator and glxgears side by side. Then run yes in the terminal 
emulator. glxgears will essentially lock up.

cu,
Nicolai




Re: R300 lockups...

2005-02-22 Thread Nicolai Haehnle
On Tuesday 22 February 2005 21:57, Adam K Kirchhoff wrote:
 No luck.  I setup my xorg.conf file to limit X to 640x480, and used 
 xrandr to drop the refresh rate to 60...  Launched neverputt at 640x480, 
 fullscreen.  Lockup was nearly instantaneous...  The music continues, at 
 least till neverputt dies, and the mouse moves around.  Rebooted and 
 tried again...  Exact same result.  At least when I was running it at 
 1024x768 on a mergedfb desktop of 2560x1024, I was able to play a hole 
 or two of golf...
 
 Two times now, I've tried running it at 640x480 on my large mergedfb 
 desktop.  I get further than I did when the screen resolution was 
 640x480, but not much.
 
 I just tried two times now running it at 1280x1024 on my large mergedfb 
 desktop, and it plays fine for a number of holes.  Usually locks up 
 between holes.
 
 My conclusion is that these lockups are occurring when the framerate is 
 at its highest (i.e. low resolution, low texture, low activity), which I 
 believe is a situation someone else described on here not too long ago.

That was me, so I can confirm that, and it *is* different from the problem 
reported by John Clemens in the other thread (the one called [r300] Radeon 
9600se mostly working).

Unfortunately, I won't have access to my test setup for the next weeks, so I 
don't have anything new.

cu,
Nicolai

 Adam




Re: [R300 and other radeons] MergedFB lockups

2005-02-19 Thread Nicolai Haehnle
On Saturday 19 February 2005 02:06, Vladimir Dergachev wrote:
 I think I found the cause of lockups in VB mode - they were due to cursor 
 updating function in radeon_mergedfb.c calling OUTREGP() which in turn 
 called INREG.
 
 When silken mouse is enabled this function could be called at any time 
 which would do *bad* things when CP engine is active.
 
 The fix of putting RADEONWaitForIdleMMIO() works fine on my setup.
 
 I have *no* idea why this worked with immediate mode at all and why no 
 issues were reported by R200 and Radeon users (well, I only looked through 
 the mailing lists, perhaps there is something on bugzilla but I don't know 
 how to use that efficiently)
 
 Also, I have no idea why the code in radeon_cursor.c that writes images 
 directly into framebuffer memory works - according to the manual any 
 writes into framebuffer while GUI is active should cause a hard lock.
 
 However, I could not produce any lockups with it, and so left it as is.

I can see no difference at all with this latest change, i.e. no regressions, 
but the lockup is still there.

cu,
Nicolai

   best
 
  Vladimir Dergachev




Re: [r300] VB mode success

2005-02-18 Thread Nicolai Haehnle
On Thursday 17 February 2005 22:34, Rune Petersen wrote:
 On my system it works on my X800 with no lockups.
 For now I have only tested with glxgears and q3demo.
 So I won't be of much help fixing this apart from being a success-vector.
 
 Is there any pattern in which systems work with VB? 

I have an R300 ND (PCI ID 0x4E44), and it doesn't work (i.e. it locks up 
after some time).

 Also an observation:
 With VB mode running glxgears I can get an extra 100 fps by moving the 
 window left. If I move it too much it goes back to the initial fps.
 This doesn't happen with immediate mode.
 is there a good reason for this?
 
 14345 frames in 5.0 seconds = 2868.983 FPS
 14357 frames in 5.0 seconds = 2871.259 FPS
 14332 frames in 5.0 seconds = 2866.250 FPS
 14324 frames in 5.0 seconds = 2864.621 FPS
 move
 14837 frames in 5.0 seconds = 2967.306 FPS
 14905 frames in 5.0 seconds = 2980.958 FPS
 14913 frames in 5.0 seconds = 2982.533 FPS
 14897 frames in 5.0 seconds = 2979.251 FPS

Huh, that's really weird :)
One possible cause (though this is really wild speculation) is that you're 
outputting debug messages somewhere, and moving the window reduces the 
volume of debug messages.

I also have news regarding the (I hope there's only one) VB lockup. I can 
launch and exit (on 0x4E44 as stated above) glxgears basically as often as 
I want, as long as I don't let it run for more than a few seconds. During 
that timeframe I can even do some things that are (used to be) notoriously 
lockup-prone, such as dragging windows around. But when I let it run more 
than a few seconds, it invariably locks up.

Now the really interesting thing is that I captured the full libGL debug 
output from several lockup runs, and both the line count and the word count 
of all the logs is *exactly* the same. The byte counts differ by a very 
small amount, which appears to be due to some buffer indices being smaller 
(only one digit vs. two digits) in some cases.

So I assume that there is some kind of timebomb, at least on my machine, 
that reliably and reproducibly causes the lockup, most likely when some 
counter or pointer wraps around. I haven't found the exact cause yet, but 
I'll look further into it.

cu,
Nicolai

 
 Rune Petersen
 




[r300] VB lockup found and fixed

2005-02-18 Thread Nicolai Haehnle
Hi everybody,

As reported earlier, I had a perfectly repeatable lockup in VB mode that 
always happened after the exact same number of frames in glxgears. I can't 
explain everything about the lockup, mostly because I still don't know what 
the two registers in the begin3d/end3d sequence actually mean, but here's 
what I know:

It turns out that after the first 4 DMA buffers had been used to completion, 
r300FlushCmdBuf() was called from r300RefillCurrentDmaRegion(). This emitted 
only simple state-setting commands plus an upload of the current 
vertex program into the VAP. There was no rendering going on, and neither 
the begin3d nor the end3d sequence was part of the commands that were sent 
to the card.
However for some reason, it was this sequence that caused the lockup.

This leads me to believe that there's somehow more magic to the 
begin3d/end3d sequence than just cache control as I originally assumed (or 
maybe it *is* cache control, but there's something weird going on in 
connection with it, I simply don't know).

In any case, what I did is *always* emit the begin3d sequence at the top of 
r300_do_cp_cmdbuf and end3d at the bottom of r300_do_cp_cmdbuf (it is also 
emitted in the case of an error). This works for me, I can run glxgears for 
several minutes, even doing some stuff that sometimes tends to produce 
lockups without any problems.

Please, everybody, get the latest CVS (anonymous will take some time to 
catch up...) and test vertex buffer mode with it (go to r300_run_render() 
in r300_render.c and change the #if so that r300_run_vb_render() is 
called). I want to be really sure that this fixes it for other people as 
well (after all, there may be other causes for lockups that haven't occurred 
on my machine yet), and that there are no regressions for those who already 
had working VB mode.

Once we can be fairly certain that VB mode is stable (i.e. crash and 
lockup-free), let's talk about removing any mention of the begin3d and 
end3d sequence from the userspace driver. This is really far too subtle an 
issue to allow userspace to mess with it. This counts for the X server as 
well - if anybody feels like implementing Render acceleration, which I 
doubt at this stage, please leave the begin3d/end3d handling to the kernel 
module, as it's the only instance that really knows what's going on.
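A minimal sketch of that proposal, with invented command tokens and an invented function name (this is not the actual DRM code): the kernel unconditionally brackets every submitted command buffer, so userspace can never get the begin3d/end3d sequences wrong.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical model: the DRM copies the user's command words into the
 * ring, always preceded by a BEGIN3D token and followed by END3D. */
enum { CMD_BEGIN3D = 1, CMD_USER = 2, CMD_END3D = 3 };

static int submit_cmdbuf(const int *user, int n, int *ring, int cap)
{
    if (n + 2 > cap)
        return -1;                 /* would overflow the ring buffer */
    ring[0] = CMD_BEGIN3D;         /* always emitted at the top */
    memcpy(ring + 1, user, (size_t)n * sizeof *user);
    ring[n + 1] = CMD_END3D;       /* always emitted at the bottom */
    return n + 2;                  /* words actually queued */
}
```

Since the bracketing happens on every submission, even an error path that queues nothing after BEGIN3D still closes with END3D, which matches the "emitted in the case of an error" behaviour described above.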

cu,
Nicolai




Re: [r300] VB lockup found and fixed

2005-02-18 Thread Nicolai Haehnle
On Friday 18 February 2005 16:03, Keith Whitwell wrote:
 Ben Skeggs wrote:
  I still have a 100% reproducible bug which I need to find the cause of,
  but time is once again a problem for me.  If I move a window over the 
top
  of a glxgears window my machine locks up immediately, but sysrq still 
  works
  fine.
  
  
  I just discovered (and should've checked before), that I can ssh in and 
  successfully
  kill glxgears, then X returns to normal.  I can have a partially covered 
  glxgears
  window and everything is fine, but as soon as the entire window (not 
  incl. window
  decorations) is covered, it seems that the 2d driver is unable to update 
  the screen.
 
 I think some of the other drivers do a 'sched_yield()' or 'usleep(0)' in 
 the zero cliprect case to get away from this sort of behaviour.

Well, I can reproduce this bug and I tracked it down. There are a number of 
problems here, and they all have to do with DMA buffer accounting.
The first (trivial) problem is that nr_released_bufs was never reset to 0. 
I've already fixed that in CVS.
The real problem is that the following situation can occur when we have zero 
cliprects:
1. The command buffer contains a DISCARD command for a DMA buffer.
2. We simply drop that command buffer because there are no cliprects, i.e. 
nothing can be drawn.
3. As a consequence, DMA buffers aren't freed.
4. The rendering loop continues even though DMA buffers have been leaked, 
which eventually causes all DMA buffers to be exhausted, and this causes an 
infinite loop in r300RefillCurrentDmaRegion.
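The leak can be reduced to a toy accounting model (plain C, with invented names — this is not the driver's real bookkeeping): each frame takes a buffer, and only a processed DISCARD returns it to the freelist.

```c
/* free_bufs models the DMA freelist; a DISCARD in a submitted command
 * buffer is what normally returns a buffer to it.  NUM_BUFS matches
 * the "first 4 DMA buffers" mentioned earlier, purely for illustration. */
#define NUM_BUFS 4

static int free_bufs = NUM_BUFS;

static int render_frame(int have_cliprects)
{
    if (free_bufs == 0)
        return -1;      /* r300RefillCurrentDmaRegion would loop forever here */
    free_bufs--;        /* buffer handed out for this frame's vertices */
    if (have_cliprects)
        free_bufs++;    /* kernel processed the DISCARD: buffer freed */
    /* else: command buffer dropped, DISCARD never reaches the kernel,
     * so the buffer is leaked */
    return 0;
}
```

Each fully obscured frame leaks one buffer, so after NUM_BUFS such frames the client spins waiting for a free buffer that will never come.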

The root cause is that we drop the command buffers with the DISCARD. I can 
see the following possible solutions to this problem:
1. Wait until we have a cliprect again before submitting command buffers.
2. Submit command buffers even when we have no cliprects. The kernel module 
would basically ignore everything but the DISCARD commands.
3. Something else?

I don't like option (1) because it somehow assumes that the user program 
only cares about OpenGL (and that's quite selfish). There are many use 
cases where it is plainly the incorrect thing to do:
- A user running something like Quake in listenserver mode; if they switch 
away from Quake for some reason (incoming messages, whatever), the server 
will stop and eventually all clients will time out.
- Imagine a chat application that uses some fancy 3D graphics for whatever 
reason (glitz, for example). Now this application may just be in the middle 
of drawing something when the user moves some other application above it. 
The end result will be that the application essentially becomes locked up 
until it becomes visible again; in the meantime, the chat might time out 
and disconnect the user.
So (1) clearly isn't a good solution.

Option (2) is more correct, but it does seem a little bit hackish.

Any better ideas? Perhaps tracking which buffers were discarded? That's not 
exactly beautiful either.

cu,
Nicolai

 
 Keith




Re: [r300] VB lockup found and fixed

2005-02-18 Thread Nicolai Haehnle
On Friday 18 February 2005 18:17, Keith Whitwell wrote:
 Ben Skeggs wrote:
  I'm still rather new at this, so forgive me if this is a bad suggestion.
  How about going with option 2, but only submitting the command buffer
  anyway if nr_released_bufs != 0.
  
  Would this cause any unwanted side effects?  It seems better than just
  always submitting buffers with no cliprects anyhow.
 
 Oh, btw - note that if you start throwing buffers out, you have to 
 account for the fact that the hardware hasn't been programmed with the 
 state that you thought it had - probably by setting a dirty flag or lost 
 context flag.

The command buffer is always sent to the kernel now (and clipping is used to 
prevent any real rendering from happening), so this particular bug should 
be gone.

There's still at least one hardware lockup bug that can be triggered with 
glxgears; unfortunately, this one doesn't seem to be so easily 
reproducible.

cu,
Nicolai

 Keith




Re: [r300] VB lockup found and fixed

2005-02-18 Thread Nicolai Haehnle
On Friday 18 February 2005 20:04, Nicolai Haehnle wrote:
 There's still at least one hardware lockup bug that can be triggered with 
 glxgears; unfortunately, this one doesn't seem to be so easily 
 reproducible.

This bug can be triggered on my machine by a single instance of glxgears. It 
seems to be unrelated to 2D activity.

The lockup is a lot more likely to occur for me at high framerates. This is 
unfortunate, because it means that I need to turn down debug message 
volume, otherwise the lockup is actually very unlikely to appear at all.

However, the lockup seems to be unrelated to the use of sync: It has 
happened both with RADEON_DEBUG= empty and with RADEON_DEBUG=sync.

When I don't issue any of the magic begin3d sequences from the userspace 
driver, the lockup always happens just after one DMA block has been 
discarded, but I can't find a pattern as to when it happens exactly.

Emitting some of those magic sequences changes the time when the lockup 
happens a bit, but I have no idea what's really going on.

cu,
Nicolai




Re: radeon unified driver

2005-02-18 Thread Nicolai Haehnle
Hi,

On Saturday 19 February 2005 00:46, Roland Scheidegger wrote:
 There is some problem with driconf, it seems to have some problems 
 because the driver's name (radeon) does not match what it expects 
 (r200). Likewise, I couldn't figure out how you'd have 2 separate config 
 sections for both r100 and r200, currently you'll get all options of the 
 r200 (though it won't work for that chip family...), some options just 
 won't do anything on r100.

When I started working on the R300 driver, I did some similar work so that 
the R300 driver should in theory be able to handle R200 as well (this R200 
support has certainly gone to hell by now because of all the hacking that 
has been going on).

The point is, I also faced the driconf issue, and you can see how I 
attempted to tackle it at 
http://cvs.sourceforge.net/viewcvs.py/r300/r300_driver/r300/radeon_screen.c?rev=1.7&view=auto
My solution is probably not that good, but it might give you some ideas.

cu,
Nicolai




Re: [r300] VB lockup found and fixed

2005-02-18 Thread Nicolai Haehnle
On Saturday 19 February 2005 01:05, Adam K Kirchhoff wrote:
 Nicolai Haehnle wrote:
 Please, everybody, get the latest CVS (anonymous will take some time to 
 catch up...) and test vertex buffer mode with it (go to r300_run_render() 
 in r300_render.c and change the #if so that r300_run_vb_render() is 
 called). I want to be really sure that this fixes it for other people as 
 well (after all, there may be other causes for lockups that haven't 
occurred 
 on my machine yet), and that there are no regressions for those who 
already 
 had working VB mode.
   
 
 
 Correct me if I'm wrong, but to get the driver to automatically use vb 
 mode, all you have to do is to change:
 
 #if 1
 return r300_run_immediate_render(ctx, stage);
 #else
 return r300_run_vb_render(ctx, stage);
 #endif
 
 to
 
 #if 1
 return r300_run_vb_render(ctx, stage);
 #else
 return r300_run_vb_render(ctx, stage);
 #endif
 
 Correct?

That's correct, although it would be easier to just change the 1 into a 0 ;)

 If that's the case, I'm experiencing lockups with neverputt in both 
 immediate and vb modes, though the symptoms are slightly different.  In 
 both cases, I have to ssh in and reboot.  Simply killing neverputt 
 doesn't bring back the machine.  With immediate mode, the lockup seems 
 to happen quicker.  I can't get past the first hole.  The mouse still 
 responds..  I can move it around though, of course, it does no good.  In 
 vb mode, the mouse locks up, too.

 Any ideas?

Interesting, I didn't have lockups that hard for quite some time. Then 
again, I'm only trying to get glxgears to run without lockups...
So this could really be anything.

The first rule of thumb is to run with the environment variable 
RADEON_DEBUG=all set and pipe stderr into a file (beware that this will 
reduce performance a lot), make sure you capture the entire file and 
examine that. The last line should be something like "R200 timed out... 
exiting" in normal lockups.

cu,
Nicolai




Re: [R300] jump_and_click retagged.

2005-02-04 Thread Nicolai Haehnle
On Friday 04 February 2005 21:52, Vladimir Dergachev wrote:
 On Fri, 4 Feb 2005, Adam Jackson wrote:
  Here again, ideally this would get folded upstream too, once it's at
  least secure.
 
  I can't really mandate a policy since I haven't been contributing much
  to r300, but I would like to hear how people think this should progress.
 
 Folding DRM driver is not difficult, in fact currently there is just one 
 extra file with r300-specific code.
 
 As for folding R300 driver, we'll see how things turn out. It is hard 
 for me to imagine how this folding could take place - although it might turn 
 out to be not too bad.

You know, I actually started the r300 driver with this in mind, which is why 
you'll still see all those r200_* files around. The thing is, I neither 
have the hardware to test whether it still works on R200, nor can I 
currently contribute much to development.

It really shouldn't require a complete rewrite, just a lot of careful (and 
tedious) refactoring.

cu,
Nicolai




Re: ARB_vertex_program and r200...

2005-01-29 Thread Nicolai Haehnle
On Saturday 29 January 2005 02:47, Dave Airlie wrote:
 
 I've noticed fglrx advertises this for the r200, and doom 3 wants it...
 
 So after I manage to beat fragment_shader into shape, going to have a look
 at how to get ARB_vp working.. r300 guys you have something going on this
 already?

I don't have an R200, but the R200 registers related to vertex processing 
are *completely* different from those on the R300. Now maybe the R200 has 
both a dedicated fixed function pipeline *and* a programmable processor, 
but unless that is the case, I assume fglrx on R200 tries to map ARB_vp 
onto fixed function when it can, and falls back to software otherwise.

Somebody with R200 hardware would have to test fglrx with the glxtest 
dumping tool to find out for sure.

cu,
Nicolai




Re: [R300-commit] r300_driver/r300 r300_render.c,1.29,1.30

2005-01-10 Thread Nicolai Haehnle
On Monday 10 January 2005 04:42, Vladimir Dergachev wrote:
 Update of /cvsroot/r300/r300_driver/r300
 In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv1824
 
 Modified Files:
  r300_render.c 
 Log Message:
 For some reason we need r300Flush when using textures. Perhaps the problem 
is 
 with the BITBLT_MULTI call?

I haven't looked at how texturing is implemented yet, but are the GPU caches 
flushed after the texture upload and before the rendering? I think 
r300Flush() does this implicitly.

cu,
Nicolai




Re: [R300][PATCH] Allow use of custom compiler in drm

2004-12-16 Thread Nicolai Haehnle
Hi,

On Monday 13 December 2004 18:29, Kronos wrote:
 Hi,
 Makefile in drm/linux-core/ doesn't pass CC to linux kernel build
 system. This prevents loading the modules if kernel has been compiled
 with a compiler different from the default (ie. gcc).
 
 The following patch add CC to kernel Makefile:
 
 --- a/drm/linux-core/Makefile 2004-10-23 14:43:44.0 +0200
 +++ b/drm/linux-core/Makefile 2004-12-13 18:20:16.0 +0100
 @@ -172,7 +172,7 @@
  all: modules
  
  modules: includes
 - make -C $(LINUXDIR) $(GETCONFIG) SUBDIRS=`pwd` DRMSRCDIR=`pwd` modules
 + make -C $(LINUXDIR) $(GETCONFIG) CC=$(CC) SUBDIRS=`pwd` DRMSRCDIR=`pwd` modules
  
  ifeq ($(HEADERFROMBOOT),1)
  
 
 In this way calling:
 CC=gcc-3.4 make
 does the Right Thing

The base DRM Makefile doesn't pass the CC on either, and this may or may not 
be with good reason. AFAIK kernel code can be rather dependent on the exact 
compiler version used, so it's probably a good idea to always use the same 
compiler for both the kernel itself and all modules.

Perhaps somebody with more experience in this area wants to comment?

cu,
Nicolai




Re: Problems with r300 Mesa code

2004-11-21 Thread Nicolai Haehnle
On Sunday 21 November 2004 04:36, Michael Lothian wrote:
 Hi
 
 I'm having problems compiling the r300 Mesa stuff
 
 I get the following errors:
 
[snip]
 
 Anyone know what's causing this?

You have to make sure that the compiler uses the radeon_drm.h from 
drm/shared-core in r300 CVS.

cu,
Nicolai




Re: R300 with xorg-x11-6.8.0

2004-11-10 Thread Nicolai Haehnle
On Wednesday 10 November 2004 08:10, eGore wrote:
 Hi list,
 
 I ran into some trouble getting DRI running with r300 (I have no idea if
 it is already supported or not), but it didn't work. I looked at xorg's
 logfile and found out that DRI was disabled and I also found out that
 this is caused by radeon_accelfuncs.c. So I wrote the attached patch
 to get around that. DRI does still not work, but at least my xorg log
 tells me it does :-)
 
 !! WARNING !!
 I have no idea what I'm doing, so this might be completely wrong :-)
 !! WARNING !!

You're confusing things. On the one hand, there's general support for DRI 
(and support for client side 3D acceleration), and on the other hand 
there's hardware acceleration for the Render extension. The two things are 
only loosely connected.
The R300 efforts are directed towards creating client side 3D acceleration 
(Render acceleration is a very specialised and limited subset of what the 
3D driver does), so your patch is both unnecessary and wrong, because both 
the R100 and the R200 Render acceleration paths cannot possibly work on an 
R300.

Client side 3D acceleration works without Render acceleration, where 
"works" means that Clear() operations are accelerated, and I have some code 
for hardware rasterization of untextured primitives which always locks up 
and which I haven't found the time to fix yet.

cu,
Nicolai

 PS: The webpage of r300 is missing a tutorial ;)
 PPS: Patch has been applied for an already patched file, I guess, so line
 numbers might be completely wrong.
 PPPS: I used xorg-x11-6.8.1 from gentoo Linux (xorg-x11-6.8.0-r1 to be
 exact)
 PPPPS: I have a Radeon 9700 Pro from ELSA.
 
 That's it for now, regards,
   Christoph Brill aka egore




Re: [Fwd: r300 Report.]

2004-11-06 Thread Nicolai Haehnle
Hi,

On Saturday 06 November 2004 10:09, Ben Skeggs wrote:
 I think the AGP issues *are* related to the lockup.  I've just switched 
 sysloggers, and switched to CVS XServer (was using release 6.8 before).  
 My previous problems still occurred, but I now seem to have a lot more 
 debugging information in my syslog.  I have the same AGP problem.  If I 
 set AGP to 4x in my BIOS (rather than 8x), the corruption, and the 
 lockup don't occur.

Okay, the new syslog has all the debug info. I notice the following line:

Nov  7 07:37:58 disoft-dc [drm:radeon_cp_init_ring_buffer] writeback test 
failed

The Radeon DRM source code has a comment indicating that writeback doesn't 
work everywhere, but I think it's safe to assume that all R300-based chips 
should be capable of writeback. This would indeed point towards a problem 
in the AGP setup in one way or another, and that means that the ring buffer 
won't work properly. Without a working ring buffer setup, it's only natural 
for a lockup to occur.
Perhaps we should fail completely when the writeback test failed on 
R300-based hardware.

Unfortunately, my AGP-fu isn't strong enough to know what's really going on 
here.

 I've attached my syslog from when the lockup occurred, in case it helps.
 
  I also have the same problem as Pat, where glxinfo reports "direct rendering: 
  No", but my Xorg log says it is.

As long as the X server works and uses the ring buffer, that would point 
towards a simple configuration problem. Perhaps you could post a log of 
glxinfo with LIBGL_DEBUG=all and RADEON_DEBUG=all?

cu,
Nicolai

 Ben Skeggs




Re: r300.sf.net lockup

2004-11-05 Thread Nicolai Haehnle
Hi,

On Saturday 06 November 2004 01:04, Ben Skeggs wrote:
 Hello,
 
 I've been trying to get the experimental r300.sf.net driver to work on
 my machine.  I've compiled and installed it ok, but everytime I start
 the X server with DRI enabled, the top of the screen is corrupted, which
 I'm assuming is the xterms that are supposed to be showing.  However,
 the mouse pointer is draw correctly, and I'm still able to move it.
 
 I've posted what I captured in syslog, and my xorg log below.
 
 The card is a Powercolor Radeon 9600 256MB (RV350 AP), I tested with
 vanilla 2.6.9 with reiser4 patched in.

Thanks for testing on AFAIK untested hardware.

(from just before the SysRq message)
 Nov  6 06:05:44 [kernel] [drm:drm_ioctl] pid=9718, cmd=0x4008642a, 
 nr=0x2a, dev 0xe200, auth=1
 Nov  6 06:05:44 [kernel] [drm:drm_ioctl] pid=9718, cmd=0xc010644d, 
 nr=0x4d, dev 0xe200, auth=1
 Nov  6 06:05:44 [kernel] [drm:drm_ioctl] pid=9718, cmd=0xc0286429, 
 nr=0x29, dev 0xe200, auth=1
 Nov  6 06:05:44 [kernel] [drm:drm_ioctl] pid=9718, cmd=0xc010644d, 
 nr=0x4d, dev 0xe200, auth=1
 Nov  6 06:05:44 [kernel] [drm:drm_ioctl] pid=9718, cmd=0xc0286429, 
 nr=0x29, dev 0xe200, auth=1
 Nov  6 06:05:44 [kernel] [drm:drm_ioctl] pid=9718, cmd=0xc010644d, 
 nr=0x4d, dev 0xe200, auth=1

This is a lock, followed by indirect and freelist_get calls. There are two 
things that concern me:

1. There's a lot less debug output than I get. Also, it's interesting that 
radeon_cp_dispatch_indirect itself appears to hang. That's not completely 
impossible, but I've never seen it happen. (or maybe it just seems that way 
in your syslog because we don't get full debug messages).

2. There are no calls to cp_idle between indirect buffer emits. This 
indicates that you are running an old DDX, and not X.Org CVS + patch from 
r300_driver CVS. The latest patch contains a workaround for a known lockup 
problem. Said problem shouldn't cause a lockup unless a 3D client is 
running, but you never know...

Some more things:
3. A lockup on X server startup is usually a sign for bad microcode, though 
you do have the correct log message in syslog.

4. Your card has 256MB memory while I can only test with 128MB. Has anybody 
successfully experimented in any way (r300_demo or r300_driver) with 256MB 
cards? I remember seeing that the large memory versions had some paging 
hacks, and there might be related differences that cause the lockup here.

cu,
Nicolai




Re: [Fwd: r300 Report.]

2004-11-05 Thread Nicolai Haehnle
Hi,

On Friday 05 November 2004 23:12, Pat Suwalski wrote:
[snip]
 I am running the following system:
   - AMD 64 fx-51

I'm afraid that this is a very likely culprit, assuming you're running in 64 
bit mode. The trouble is that parts of the DRM interface and also some of 
the code interfacing the hardware might be broken when it comes to 64 bit.

I'm trying to fix code that is obviously 64bit unsafe when I notice 
something, but since I don't have the hardware to test it, I really can't 
promise anything.

[snip]
 One way or another, the PCI id of my card is 1002:4e48, and it seems to
 have no negative effects on the hardware, so it might as well be added
 to the list of pci id's.

The PCI ID is already added in the DRM branch of r300_driver.

As for the AGP issues, I have no idea. Maybe they're related to the lockup, 
maybe not.

cu,
Nicolai

 If anyone has any insight into what's up with all of this, I'm all ears.
 Again, I'll help out as much as I can. Thanks.
 
 --Pat




Re: Multiple hardware locks

2004-11-01 Thread Nicolai Haehnle
On Monday 01 November 2004 07:01, Thomas Hellström wrote:
 You are probably right, and it would be quite easy to implement such
 checks in the via command verifier as long as each lock is associated with
 a certain hardware address range.
 
 However, I don't quite see the point in plugging such a security hole when
 there are a similar ways to accomplish DOS, hardware crashes and even
 complete lockups using DRI.
 
 On via, for example, writing random data to the framebuffer, writing
 random data to the sarea, taking the hardware lock and sleeping for an
 indefinite amount of time. Writing certain data sequences to the HQV locks
 the north bridge etc.
 
 Seems like DRI allow authorized clients to do these things by design?
 
From what I've learned, the DRI isn't exactly designed for robustness. 
Still, an authorized client should never be able to cause a hardware 
crash/lockup, and an authorized client must not be able to issue arbitrary 
DMA requests. As far as I know, all DRMs that are enabled by default 
enforce at least the latter.

Personally I believe that in the long term, the DRI should have (at least) 
the following security properties:
1. Protect against arbitrary DMA (arbitrary DMA trivially allows 
circumvention of process boundaries)
This can be done via command-stream checks.

2. Prevent hardware lockup or provide a robust recovery mechanism 
(protection of multi-user systems, as well as data protection)
Should be relatively cheap via command-stream checks on most hardware 
(unless there are crazy hardware problems with command ordering like there 
seem to be on some Radeons). I believe that in the long term, recovery 
should be in the kernel rather than the X server.

3. Make sure that no client can cause another client to crash 
(malfunctioning clients shouldn't cause data loss in other applications)
In other words, make sure that a DRI client can continue even if the shared 
memory areas are overwritten with entirely random values. That does seem 
like a daunting task.

4. Make sure that no client can block access to the hardware forever (don't 
force the user to reboot)
I have posted a watchdog patch that protects against the "take lock, sleep 
forever" problem a long time ago. The patch has recently been updated by 
Dieter Nützel (search for "updated drm.watchdog.3"). However, I have to admit 
that the patch doesn't feel quite right to me.

5. Enable the user to kill/suspend resource hogs
Even if we protect against lock abuse, a client could still use excessive 
amounts of texture memory (thus causing lots of swap) or emit rendering 
calls that take extremely long to execute. That kills latency and makes the 
system virtually unusable. Perhaps the process that authorizes DRI clients 
should be able to revoke or suspend that authorization. A suspend would 
essentially mean that drmGetLock waits until the suspend is lifted.

I know that actually implementing these things in such a way that they Just 
Work is not a pleasant task. I just felt like sharing a brain dump.
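As an illustration of the command-stream checks mentioned in points 1 and 2, a verifier can be as simple as a whitelist walk over the submitted register writes. The register range and struct layout below are invented for the sketch; real verifiers (like the r300 cmdbuf checks) are per-register and per-packet.

```c
#include <stddef.h>

/* Hypothetical "safe" register window: writes outside it (e.g. DMA
 * engine address registers) are rejected before reaching the ring. */
#define REG_SAFE_MIN 0x2000u
#define REG_SAFE_MAX 0x4FFFu

struct reg_write { unsigned reg; unsigned value; };

static int verify_cmdbuf(const struct reg_write *cmds, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (cmds[i].reg < REG_SAFE_MIN || cmds[i].reg > REG_SAFE_MAX)
            return -1;  /* reject: could program an arbitrary DMA address */
    return 0;           /* whole buffer is within the whitelist */
}
```

The same walk is a natural place to enforce lockup-avoiding ordering rules (property 2), since the kernel sees every command before the hardware does.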

cu,
Nicolai




Re: R300 depth buffer

2004-10-28 Thread Nicolai Haehnle
Hi,

First of all, you should really check out the r300_driver module from CVS of 
the r300 project on SourceForge, and especially have a look at 
docs/r300_reg.h, which is where I put all register information that I and 
others have found so far.

On Tuesday 26 October 2004 14:18, Jerome Glisse wrote:
 Hi,
 
 I was playing a little around with r300 mainly looking at depth buffer. 
 I'm still unable to make it work properly.
 Thus I have a few questions.  In the radeon driver it seems that the default value 
 for z_scale & z_offset is 0 (radeon/radeon_state_init.c)
 Why are they set like that?
 
 I changed the depth in order to have something more convenient on screen:
 
 adaptor.depth_pitch=display_width | (0x8 << 16);
 maybe better to write it as:
 adaptor.depth_pitch=display_width | (0x4 << 17);

As long as we don't know what these constants mean, is there really a 
difference?

 in void Emit3dRegs(void) I used information from the radeon registers.
 
 /* do not enable either of stencil, Z or anything else */
 e32(CP_PACKET0(RADEON_RB3D_CNTL,0));
 e32(RADEON_COLOR_FORMAT_ARGB | RADEON_Z_ENABLE);
 
 e32(CP_PACKET0(RADEON_RB3D_ZSTENCILCNTL,0));
 e32(RADEON_Z_WRITE_ENABLE | RADEON_DEPTH_FORMAT_32BIT_FLOAT_Z | 
 RADEON_Z_TEST_LESS);

Basically everything in the 3D hardware interface has changed from R200 to 
R300, so the above almost certainly doesn't do what you want. Again, have a 
look at r300_reg.h

Also, my work-in-progress driver can already clear the depth buffer in 
hardware in a way that is consistent with the software fallback, so you can 
have a look at how the registers are set up there, in r300_ioctl.c and 
r300_state.c.

  Maybe we should put somewhere a list of things to find and who is 
 working on it, thus people won't work
on the same things in the meantime, or they could work together more 
efficiently. Also maybe it could
be useful to make a plan of things we want to discover.

 z buffer & stencil buffer

That would be very helpful. I haven't looked at stencil settings at all, and 
I'm kind of confused about the Z-buffer format. The R300 seems to use ZZZS 
format for 24-bit depth/8-bit stencil where the R200 used SZZZ.

 matrix stack for modelview, projection & texture (is the information 
 of radeon enough?)

I think I've pretty much got that down. The R300 has a very flexible 
programmable vertex processor, and the driver is responsible for setting up 
the correct matrices.

 T&L route

Again, I think I've got most of that down.

If you could help with texture specification/formats, I'd be very thankful. 
The register addresses are already in r300_reg.h, but the texture format 
itself, how mipmaps work etc. is still a mystery.

 I think with this feature we could make a quite good first hardware 
 accelerated driver.
 
 By the way I find that clear_depth_buffer & clear_color_buffer are quite
 complex. Is all the stuff they have in really necessary? (I will try to 
 look at that later, but maybe someone has already done it).

No, most of those are redundant state updates produced by ATI's proprietary 
driver. Again, look at how the work-in-progress DRI driver does it.

 Oh yes i almost forgot :) my device id is 0x4e4a (it is a radeon 9800)

I've added this and other IDs from pciids.sf.net to the experimental driver 
in r300_driver/drm

cu,
Nicolai

 Jerome Glisse




Re: glxtest with fglrx / r200

2004-10-28 Thread Nicolai Haehnle
On Thursday 28 October 2004 20:11, Roland Scheidegger wrote:
 - 0x2284. This one is interesting. The script gives this the name 
 X_VAP_PVS_WAITIDLE, the driver always emits this right before 
 R200_SE_VAP_CNTL. Apparently it exists on r200 too. Looks like it forces 
 the VAP (whatever that stands for...) to wait. Would we need to emit 
 that too?

The guessed register name is from me. On the R300, fglrx almost always 
writes 0 to this register before changing any vertex processor-related 
state, so I assume that it has some kind of serialising purpose. However, I 
have yet to run into any situation where emitting it makes any difference 
in my own code, so I don't know this for sure.

cu,
Nicolai




Re: [Mesa3d-dev] Doom3 works on R200!

2004-10-24 Thread Nicolai Haehnle
On Sunday 24 October 2004 19:38, Bernardo Innocenti wrote:
 Even though I just have a Radeon 9200, I'm very excited about the
 ongoing R300 effort and with there was a similar project for NVidia
 cards too.

If that "with" above is a wish, as I think it probably is, you might want 
to have a look at Utah-GLX which has rudimentary hw accel support. Also, 
somebody, somewhere (possibly in the nv driver in X, but I'm not sure) 
figured out how to do DMA.
Of course, what's really needed is the equivalent of glxtest for NVidia and 
somebody with NVidia hardware who has a few weeks to spare for long nights 
of puzzling over register dumps :)

cu,
Nicolai




DRM linux-core: inter_module_get(agp)

2004-10-23 Thread Nicolai Haehnle
Hi,

shouldn't the inter_module_get("agp") in drm_core_init() be 
inter_module_get("drm_agp") instead? "drm_agp" is what the old (non-core) 
DRM uses, and it works for me (unlike "agp").

Also, which kernel version do I need for the symbol_get() thing to work?

cu,
Nicolai




Re: Radeon 9600 with radeon DRM module

2004-10-19 Thread Nicolai Haehnle
On Monday 18 October 2004 16:04, Tino Keitel wrote:
 On Mon, Oct 18, 2004 at 09:13:57 +0200, Tino Keitel wrote:
 
 [...]
 
  Thanks again. Looks like I used the wrong 2d driver patch before
  (xorg680.atipatch.r300). Now the glxinfo output looks right:
  
  OpenGL renderer string: Mesa DRI R300 20040924 AGP 4x NO-TCL
  
  However, glxgears only prints out this messages and exits:
  
  disabling 3D acceleration
  drmCommandWrite: -22

You're probably using the main-kernel or DRI version of the DRM. You need 
the DRM from r300_driver/drm, because only that version of the DRM 
implements the new ioctls.

 Hm, this looks funny (from r300_context.c):
 
 if (1 ||
 driQueryOptionb(r300->radeon.optionCache, "no_rast")) {
 fprintf(stderr, "disabling 3D acceleration\n");
 
 Is this intended behaviour? I thought the r300 only exists to provide
 3D acceleration.

That's perfectly correct behaviour. My intention is to start with a purely 
software rendered driver and go from there. Right now, no primitives will 
be hardware accelerated, only glClear() actually uses the hardware path.

Yes, that's a disappointment, but at least the driver is actually very 
stable (for me, that is ;)).
If you think there should be more features, your help is always welcome :)

cu,
Nicolai




SW fallback: clipping bug [patch]

2004-10-15 Thread Nicolai Haehnle
Hi,

There is disagreement about the meaning of the CLIPSPAN _n parameter in CVS.

The drivers I have looked at and drivers/dri/common/spantmp.h treat _n as 
the number of pixels in the span after clipping.
depthtmp.h and stenciltmp.h treat _n as the end+1 x coordinate of the span.

This inconsistency leads to artifacts when software fallbacks are hit while 
clipping is used, especially with partially obscured clients. The attached 
patch should fix these artifacts by changing depthtmp.h and stenciltmp.h 
appropriately.

cu,
Nicolai
Index: drivers/dri/common/depthtmp.h
===
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/common/depthtmp.h,v
retrieving revision 1.5
diff -u -p -b -r1.5 depthtmp.h
--- drivers/dri/common/depthtmp.h	8 Oct 2004 22:21:09 -	1.5
+++ drivers/dri/common/depthtmp.h	15 Oct 2004 19:48:14 -
@@ -45,15 +45,15 @@ static void TAG(WriteDepthSpan)( GLconte
 	   GLint i = 0;
 	   CLIPSPAN( x, y, n, x1, n1, i );
 
-	   if ( DBG ) fprintf( stderr, "WriteDepthSpan %d..%d (x1 %d)\n",
-   (int)i, (int)n1, (int)x1 );
+	   if ( DBG ) fprintf( stderr, "WriteDepthSpan %d..%d (x1 %d) (mask %p)\n",
+   (int)i, (int)n1, (int)x1, mask );
 
 	   if ( mask ) {
-		  for ( ; i < n1 ; i++, x1++ ) {
+		  for ( ; n1>0 ; i++, x1++, n1-- ) {
 		 if ( mask[i] ) WRITE_DEPTH( x1, y, depth[i] );
 		  }
 	   } else {
-		  for ( ; i < n1 ; i++, x1++ ) {
+		  for ( ; n1>0 ; i++, x1++, n1-- ) {
 		 WRITE_DEPTH( x1, y, depth[i] );
 		  }
 	   }
@@ -87,11 +87,11 @@ static void TAG(WriteMonoDepthSpan)( GLc
    __FUNCTION__, (int)i, (int)n1, (int)x1, (GLuint)depth );
 
 	   if ( mask ) {
-		  for ( ; i < n1 ; i++, x1++ ) {
+		  for ( ; n1>0 ; i++, x1++, n1-- ) {
 		 if ( mask[i] ) WRITE_DEPTH( x1, y, depth );
 		  }
 	   } else {
-		  for ( ; i < n1 ; i++, x1++ ) {
+		  for ( ; n1>0 ; x1++, n1-- ) {
 		 WRITE_DEPTH( x1, y, depth );
 		  }
 	   }
@@ -162,8 +162,9 @@ static void TAG(ReadDepthSpan)( GLcontex
 	{
 	   GLint i = 0;
 	   CLIPSPAN( x, y, n, x1, n1, i );
-	   for ( ; i < n1 ; i++ )
-		  READ_DEPTH( depth[i], (x1+i), y );
+	   for ( ; n1>0 ; i++, n1-- ) {
+		  READ_DEPTH( depth[i], x+i, y );
+	   }
 	}
 	 HW_ENDCLIPLOOP();
 #endif
Index: drivers/dri/common/stenciltmp.h
===
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/common/stenciltmp.h,v
retrieving revision 1.2
diff -u -p -b -r1.2 stenciltmp.h
--- drivers/dri/common/stenciltmp.h	6 Aug 2003 18:12:22 -	1.2
+++ drivers/dri/common/stenciltmp.h	15 Oct 2004 19:48:15 -
@@ -41,13 +41,13 @@ static void TAG(WriteStencilSpan)( GLcon
 
 	   if (mask)
 	   {
-		  for (;i<n1;i++,x1++)
+		  for (;n1>0;i++,x1++,n1--)
 		 if (mask[i])
 			WRITE_STENCIL( x1, y, stencil[i] );
 	   }
 	   else
 	   {
-		  for (;i<n1;i++,x1++)
+		  for (;n1>0;i++,x1++,n1--)
 		 WRITE_STENCIL( x1, y, stencil[i] );
 	   }
 	}
@@ -107,8 +107,8 @@ static void TAG(ReadStencilSpan)( GLcont
 	{
 	   GLint i = 0;
 	   CLIPSPAN(x,y,n,x1,n1,i);
-	   for (;i<n1;i++)
-		  READ_STENCIL( stencil[i], (x1+i), y );
+	   for (;n1>0;i++,n1--)
+		  READ_STENCIL( stencil[i], (x+i), y );
 	}
 	 HW_ENDCLIPLOOP();
   }




[r300] r300_driver update

2004-10-15 Thread Nicolai Haehnle
Hi,

I have uploaded my changes to the r300_driver CVS. I haven't merged any 
changes to the R200 driver that might apply, and I haven't merged the 
drm-core changes. I will do that within the next days.
Accelerated color buffer clear and basic clipping (without GL scissors) 
works, although I have noticed a flickering regression - this might be an 
interaction with updated Mesa CVS or a stupid merging mistake on my part, 
I'm not sure.

I have also uploaded a new patch for the DDX, which is especially necessary 
for stability. With this patch, there are currently no known lockups, and I 
have tested running multiple DRI clients thoroughly.

Securing against lockups comes at a price. The basic problem is that there 
is too little communication between what the DRM writes to the ring buffer 
and what the X server sends via indirect buffers. For now, the X server 
will always issue a single cp_idle ioctl before sending indirect buffers. 
To quote my comment in radeon_accel.c:

/* TODO: Fix this more elegantly.
 * Sometimes (especially with multiple DRI clients), this code
 * runs immediately after a DRI client issues a rendering command.
 *
 * The accel code regularly inserts WAIT_UNTIL_IDLE into the
 * command buffer that is sent with the indirect buffer below.
 * The accel code fails to set the 3D cache flush registers for
 * the R300 before sending WAIT_UNTIL_IDLE. Sending a cache flush
 * on these new registers is not necessary for pure 2D functionality,
 * but it *is* necessary after 3D operations.
 * Without the cache flushes before WAIT_UNTIL_IDLE, the R300 locks up.
 *
 * The CP_IDLE call into the DRM indirectly flushes all caches and
 * thus avoids the lockup problem, but the solution is far from ideal.
 * Better solutions could be:
 *  - always flush caches when entering the X server
 *  - track the type of rendering commands somewhere and issue
 *cache flushes when they change
 * However, I don't feel confident enough with the control flow
 * inside the X server to implement either fix. -- nh
 */

My hope is that the X server will become just another DRM client as far as 
accelerated rendering is concerned, which will eventually allow this to be 
dealt with cleanly in the DRM. I am not interested in performance right 
now, and since these idle calls seem to be the most foolproof thing to do, 
I will leave them in.

cu,
Nicolai




R300 driver update

2004-09-28 Thread Nicolai Haehnle
Hi,

I decided to commit what I have in terms of an R300 driver so far. You can 
find it in the r300 project on SourceForge in the r300_driver 
module:
cvs -d:pserver:[EMAIL PROTECTED]:/cvsroot/r300 checkout 
r300_driver

As you can easily see I started with the R200 driver. Since I didn't know 
what to rip out and what to keep in at first, I decided to take the extra 
time and separate stuff out into Radeon generic and R200/R300 code. So in 
theory, the driver should still work on R200 hardware, although a) I 
couldn't test this and b) it's not quite up to date (about a week old).

I have written state emission code which works, and I have started 
implementing hardware accelerated clear. Something does happen on the 
screen, but immediately afterwards I get a hard lockup for now. I'll let 
you know when the driver is more usable.

cu,
Nicolai




Re: R300 driver update

2004-09-28 Thread Nicolai Haehnle
On Tuesday 28 September 2004 15:02, Vladimir Dergachev wrote:
 On Tue, 28 Sep 2004, Nicolai Haehnle wrote:
 Hi Nicolai, you can just rename the driver so it produces r300_dri.so - 
 the 2d driver is in fact configured to tell DRI clients to use that 
 binary.

Well, it does produce r300_dri.so right now. The reason I am doing the work 
in an r300/ subdirectory is that development is going on in a different CVS 
repository, and I really don't fancy worrying about complicated merges with 
Mesa CVS (across file renames, at that!). Or maybe I somehow misunderstood 
what you meant?

On a more happy note, Clear doesn't lock up anymore, but the coordinate 
calculation seems to be all wrong.

cu,
Nicolai




Re: Where is the source for DRM 1.5?

2004-09-27 Thread Nicolai Haehnle
On Monday 27 September 2004 19:00, Barry Scott wrote:
 I have failed to find a tar ball of CVS tag that names any
 specific version of DRM. What did I miss?

To my knowledge, there is no global DRM version like this. Each driver has 
an interface version number, but this number does not necessarily mark a 
particular release of DRM.
Basically, if your X windows driver or the 3D client driver complains that 
the DRM is too old, just upgrade to the latest version from your 
distribution. You can do this by upgrading the kernel. If the 
distribution's kernel is outdated, either compile your own kernel from 
stock sources or, if you're feeling adventurous, get the DRM CVS (I would 
link you to the download page on http://dri.sourceforge.net/ , but it 
appears to be down right now).

cu,
Nicolai




R200 - save_on_next_unlock

2004-09-26 Thread Nicolai Haehnle
Hi,

I'm trying to completely understand the command buffer stuff for my R300 
driver work, and I noticed something about the save_on_next_unlock code. 
I'm concerned about the state_atom::check function.
The check functions use the current state of the context to figure out which 
atoms must be emitted. Now, consider the following scenario: 
1. The driver unlocks and saves the state. 
2. The application issues a rendering command. The buffer is not full yet, 
so FlushCmdBuf isn't called.
3. The application changes some state (e.g. texture enable/disable) and 
issues some more rendering commands.
4. This time, FlushCmdBuf is called. It sees the lost_context flag and 
triggers the state backup code.
5. The state backup code still uses check() to see which state atoms are 
active, but this produces wrong results because the state at time (2) at 
the beginning of the command buffer is different from the state now.

So either I haven't fully understood the mechanisms yet (please enlighten 
me), or I found your bug ;)
Unfortunately, I can't verify this because I don't have R200 hardware.

cu,
Nicolai




Software fallback fixes and R300 driver work

2004-09-23 Thread Nicolai Haehnle
Hi,

apparently I'm the first to use a full software fallback for glClear(), as I 
ran into a few problems that the attached patch should fix:
- spantmp.h doesn't check for NULL masks
- add a WriteMonoDepthSpan function to the swrast to driver interface
- use this function to clear the depth buffer in swrast when available 
(swrast depth buffer clearing completely ignores driver functions right 
now)

I decided to take it to the next level and actually start hacking on a DRI 
driver for the R300. So far I modified the R200 driver to recognize the 
R300 family and use 100% software fallbacks in this case. I will put source 
up as soon as some rasterization is actually hardware accelerated.

One thing I noticed in the process: r200Flush() unconditionally calls 
r200EmitState(). Is this really necessary? I was assuming that glFlush() 
could be a noop when it's not preceded by any rendering commands.

cu,
Nicolai
Index: drivers/dri/common/depthtmp.h
===
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/common/depthtmp.h,v
retrieving revision 1.2
diff -u -p -r1.2 depthtmp.h
--- drivers/dri/common/depthtmp.h	6 Aug 2003 18:12:22 -	1.2
+++ drivers/dri/common/depthtmp.h	23 Sep 2004 23:27:25 -
@@ -64,6 +64,42 @@ static void TAG(WriteDepthSpan)( GLconte
HW_WRITE_UNLOCK();
 }
 
+static void TAG(WriteMonoDepthSpan)( GLcontext *ctx,
+ GLuint n, GLint x, GLint y,
+ const GLdepth depth,
+ const GLubyte mask[] )
+{
+   HW_WRITE_LOCK()
+  {
+	 GLint x1;
+	 GLint n1;
+	 LOCAL_DEPTH_VARS;
+
+	 y = Y_FLIP( y );
+
+	 HW_CLIPLOOP()
+	{
+	   GLint i = 0;
+	   CLIPSPAN( x, y, n, x1, n1, i );
+
+	   if ( DBG ) fprintf( stderr, "%s %d..%d (x1 %d) = %u\n",
+   __FUNCTION__, (int)i, (int)n1, (int)x1, (uint)depth );
+
+	   if ( mask ) {
+		  for ( ; i < n1 ; i++, x1++ ) {
+		 if ( mask[i] ) WRITE_DEPTH( x1, y, depth );
+		  }
+	   } else {
+		  for ( ; i < n1 ; i++, x1++ ) {
+		 WRITE_DEPTH( x1, y, depth );
+		  }
+	   }
+	}
+	 HW_ENDCLIPLOOP();
+  }
+   HW_WRITE_UNLOCK();
+}
+
 static void TAG(WriteDepthPixels)( GLcontext *ctx,
    GLuint n,
    const GLint x[],
Index: drivers/dri/common/spantmp.h
===
RCS file: /cvs/mesa/Mesa/src/mesa/drivers/dri/common/spantmp.h,v
retrieving revision 1.2
diff -u -p -r1.2 spantmp.h
--- drivers/dri/common/spantmp.h	6 Aug 2003 18:12:22 -	1.2
+++ drivers/dri/common/spantmp.h	23 Sep 2004 23:27:25 -
@@ -123,15 +123,29 @@ static void TAG(WriteRGBAPixels)( const 
 
 	 HW_WRITE_CLIPLOOP()
 	{
-	   for (i=0;i<n;i++)
+	   if (mask)
 	   {
-		  if (mask[i]) {
+	  for (i=0;i<n;i++)
+	  {
+		 if (mask[i]) {
+		const int fy = Y_FLIP(y[i]);
+		if (CLIPPIXEL(x[i],fy))
+			   WRITE_RGBA( x[i], fy,
+   rgba[i][0], rgba[i][1],
+   rgba[i][2], rgba[i][3] );
+		 }
+	  }
+	   }
+	   else
+	   {
+	  for (i=0;i<n;i++)
+	  {
 		 const int fy = Y_FLIP(y[i]);
 		 if (CLIPPIXEL(x[i],fy))
 			WRITE_RGBA( x[i], fy,
 rgba[i][0], rgba[i][1],
 rgba[i][2], rgba[i][3] );
-		  }
+	  }
 	   }
 	}
 	 HW_ENDCLIPLOOP();
@@ -160,9 +174,17 @@ static void TAG(WriteMonoRGBASpan)( cons
 	{
 	   GLint i = 0;
 	   CLIPSPAN(x,y,n,x1,n1,i);
-	   for (;n1>0;i++,x1++,n1--)
-		  if (mask[i])
+	   if (mask)
+	   {
+	  for (;n1>0;i++,x1++,n1--)
+		 if (mask[i])
+		WRITE_PIXEL( x1, y, p );
+	   }
+	   else
+	   {
+	  for (;n1>0;i++,x1++,n1--)
 		 WRITE_PIXEL( x1, y, p );
+	   }
 	}
 	 HW_ENDCLIPLOOP();
   }
@@ -186,12 +208,23 @@ static void TAG(WriteMonoRGBAPixels)( co
 
 	 HW_WRITE_CLIPLOOP()
 	{
-	   for (i=0;i<n;i++)
-		  if (mask[i]) {
+	   if (mask)
+	   {
+		  for (i=0;i<n;i++)
+		 if (mask[i]) {
+			int fy = Y_FLIP(y[i]);
+			if (CLIPPIXEL( x[i], fy ))
+			   WRITE_PIXEL( x[i], fy, p );
+		 }
+	   }
+	   else
+	   {
+		  for (i=0;i<n;i++) {
 		 int fy = Y_FLIP(y[i]);
 		 if (CLIPPIXEL( x[i], fy ))
 			WRITE_PIXEL( x[i], fy, p );
 		  }
+	   }
 	}
 	 HW_ENDCLIPLOOP();
   }
@@ -238,12 +271,23 @@ static void TAG(ReadRGBAPixels)( const G
 
 	 HW_READ_CLIPLOOP()
 	{
-	   for (i=0;i<n;i++)
-		  if (mask[i]) {
+	   if (mask)
+	   {
+		  for (i=0;i<n;i++)
+		 if (mask[i]) {
+			int fy = Y_FLIP( y[i] );
+			if (CLIPPIXEL( x[i], fy ))
+			   READ_RGBA( rgba[i], x[i], fy );
+		 }
+	   }
+	   else
+	   {
+		  for (i=0;i<n;i++) {
 		 int fy = Y_FLIP( y[i] );
 		 if (CLIPPIXEL( x[i], fy ))
 			READ_RGBA( rgba[i], x[i], fy );
 		  }
+	   }
 	}
 	 HW_ENDCLIPLOOP();
   }
Index: swrast/s_depth.c
===
RCS file: 

Re: [r300] - likely compatibility w rv360?

2004-09-22 Thread Nicolai Haehnle
Hi,

On Wednesday 22 September 2004 00:45, Dag Bakke wrote:
 If I load dri in my xorg.conf, the graphics gets wedged as soon
 as I start X. I get more or less garbled stuff from the previous session. 
I
 can move the cursor, but that's it. I can not exit from X with
 ctrl-alt-backspace, or shift to the console with ctrl-alt-f1. I also tried
 without radeonfb, just in case. I see no evidence of problems in the Xorg
 log with dri, which I can review after rebooting via sysrq. Of course, if
 the machine panicked, nothing got to the kernel log. And  no, my wireless
 keyboard does not have keyboard leds..

Whenever this has happened to me, it was caused by bad microcode. Check your 
syslog for a message "Loading Rx00 microcode", and make sure it says R300.
If it does, then maybe this chip does need different microcode, as Vladimir 
said.

cu,
Nicolai




Re: [R300] pixel shader

2004-09-18 Thread Nicolai Haehnle
On Sunday 19 September 2004 03:53, Vladimir Dergachev wrote:
 Hi Nicolai,
 
 I committed a modification of pretty_print_command_stream.tcl that
 decodes most of PFS_INSTR* registers.
 
 It still prints the actual value written - as a last value after "equals" 
 sign. So, I am hoping that even if this messed up your disassembler it is 
 easy to fix - I am not that proficient in Python to venture modifying your 
 code :)
 
 Also, r300_demo now have headers for both vertex shader and pixel 
 shader.

It did mess up the disassembler which uses a simple regex to catch the data, 
but it's an easy fix.
 
 Lastly, I think it would be useful to have an assembler for vertex 
 shaders and pixel shaders that does the job similar to those DirectX 
 functions that translate textual description into coded on (I also believe 
 that OpenGL 2.0 should have something like this as well).

Doesn't Mesa already support ARB_vertex_program and ARB_fragment_program? I 
think it would be best to add R300 programs as an additional backend for 
the already existing infrastructure, though I have no idea how flexible the 
existing code is - I haven't looked at it in detail.

cu,
Nicolai




Re: R300 development

2004-09-14 Thread Nicolai Haehnle
On Tuesday 14 September 2004 17:01, Vladimir Dergachev wrote:
 Hi all,
 
 The new project name on SF is R300, the registration just went 
through,
 so I am in the process of setting things up.
 
 Everyone is welcome ! Also, despite the name, this project is *not*
 just about R300. If are interested in finding out more about earlier 
 radeons this is a place to exchange (public !) info about them as well.

I have committed my latest registers file into the r300_demo directory of 
CVS. Changes from the version posted to the mailing list:

- better understanding of how vertex program and data is uploaded
- some work towards input/output control of vertex programs
- start with decoding fragment programs

cu,
Nicolai




R300 registers

2004-09-13 Thread Nicolai Haehnle
Hi,

while I've had less success (read: hard locks and reboots) with the recent 
drmtest and r300_demo, I did use glxtest to find out registers of the R300.

Basically, what I did is run a small GL program, get the command buffer, 
make some small changes and rerun. Often, this results in only a small 
change in the command buffer (found using diff), which makes it possible to 
guess register addresses and constants. So while I haven't been able to 
crosstest my results by _sending_ commands using my new knowledge, I am 
pretty certain that they are mostly correct (as long as there is no 
explicit comment stating otherwise).

So far, I have found registers for alpha blending and testing among other 
things. I have also decoded most of the vertex program instruction set. 
However, I do not have the registers for vertex program *setup* yet. All I 
know is that both the program and its environment/parameters (whatever you 
want to call it) are uploaded via 0x2208.

All my findings are documented in the attached header file.

cu,
Nicolai
// The entire range from 0x2300 to 0x24AC inclusive seems to be used for
// immediate vertices
#define R300_VAP_VTX_COLOR_R    0x2464
#define R300_VAP_VTX_COLOR_G    0x2468
#define R300_VAP_VTX_COLOR_B    0x246C
#define R300_VAP_VTX_POS_0_X_1  0x2490 // used for glVertex2*()
#define R300_VAP_VTX_POS_0_Y_1  0x2494
#define R300_VAP_VTX_COLOR_PKD  0x249C // RGBA
#define R300_VAP_VTX_POS_0_X_2  0x24A0 // used for glVertex3*()
#define R300_VAP_VTX_POS_0_Y_2  0x24A4
#define R300_VAP_VTX_POS_0_Z_2  0x24A8
#define R300_VAP_VTX_END_OF_PKT 0x24AC // write 0 to indicate end of packet?

/* gap */
#define R300_PP_ALPHA_TEST  0x4BD4
#   define R300_REF_ALPHA_MASK   0x00ff
#   define R300_ALPHA_TEST_FAIL  (0 << 8)
#   define R300_ALPHA_TEST_LESS  (1 << 8)
#   define R300_ALPHA_TEST_LEQUAL(2 << 8)
#   define R300_ALPHA_TEST_EQUAL (3 << 8)
#   define R300_ALPHA_TEST_GEQUAL(4 << 8)
#   define R300_ALPHA_TEST_GREATER   (5 << 8)
#   define R300_ALPHA_TEST_NEQUAL(6 << 8)
#   define R300_ALPHA_TEST_PASS  (7 << 8)
#   define R300_ALPHA_TEST_OP_MASK   (7 << 8)
#   define R300_ALPHA_TEST_ENABLE(1 << 11)
/* gap */

// Notes:
// - AFAIK fglrx always sets BLEND_UNKNOWN when blending is used in the application
// - AFAIK fglrx always sets BLEND_NO_SEPARATE when CBLEND and ABLEND are set to the same
//   function (both registers are always set up completely in any case)
// - Most blend flags are simply copied from R200 and not tested yet
#define R300_RB3D_CBLEND    0x4E04
#define R300_RB3D_ABLEND    0x4E08
 /* the following only appear in CBLEND */
#   define R300_BLEND_ENABLE (1 << 0)
#   define R300_BLEND_UNKNOWN(3 << 1)
#   define R300_BLEND_NO_SEPARATE(1 << 3)
 /* the following are shared between CBLEND and ABLEND */
#   define R300_FCN_MASK (3 << 12)
#   define R300_COMB_FCN_ADD_CLAMP   (0 << 12)
#   define R300_COMB_FCN_ADD_NOCLAMP (1 << 12)
#   define R300_COMB_FCN_SUB_CLAMP   (2 << 12)
#   define R300_COMB_FCN_SUB_NOCLAMP (3 << 12)
#   define R300_SRC_BLEND_GL_ZERO(32 << 16)
#   define R300_SRC_BLEND_GL_ONE (33 << 16)
#   define R300_SRC_BLEND_GL_SRC_COLOR   (34 << 16)
#   define R300_SRC_BLEND_GL_ONE_MINUS_SRC_COLOR (35 << 16)
#   define R300_SRC_BLEND_GL_DST_COLOR   (36 << 16)
#   define R300_SRC_BLEND_GL_ONE_MINUS_DST_COLOR (37 << 16)
#   define R300_SRC_BLEND_GL_SRC_ALPHA   (38 << 16)
#   define R300_SRC_BLEND_GL_ONE_MINUS_SRC_ALPHA (39 << 16)
#   define R300_SRC_BLEND_GL_DST_ALPHA   (40 << 16)
#   define R300_SRC_BLEND_GL_ONE_MINUS_DST_ALPHA (41 << 16)
#   define R300_SRC_BLEND_GL_SRC_ALPHA_SATURATE  (42 << 16)
#   define R300_SRC_BLEND_MASK   (63 << 16)
#   define R300_DST_BLEND_GL_ZERO(32 << 24)
#   define R300_DST_BLEND_GL_ONE (33 << 24)
#   define R300_DST_BLEND_GL_SRC_COLOR   (34 << 24)
#   define R300_DST_BLEND_GL_ONE_MINUS_SRC_COLOR (35 << 24)
#   define R300_DST_BLEND_GL_DST_COLOR   (36 << 24)
#   define R300_DST_BLEND_GL_ONE_MINUS_DST_COLOR (37 << 24)
#   define R300_DST_BLEND_GL_SRC_ALPHA   (38 << 24)
#   define R300_DST_BLEND_GL_ONE_MINUS_SRC_ALPHA (39 << 24)
#   define R300_DST_BLEND_GL_DST_ALPHA   (40 << 24)
#   define R300_DST_BLEND_GL_ONE_MINUS_DST_ALPHA (41 << 24)
#   define R300_DST_BLEND_MASK   (63 << 24)
#define R300_RB3D_COLORMASK 0x4E0C
#   define R300_COLORMASK0_B (1 << 0)
#   

Re: Radeon 7200 problems

2004-06-04 Thread Nicolai Haehnle

On Friday 04 June 2004 12:22, Michel Dänzer wrote:
  Currently, if you set the gart size manually higher than what's possible 
  (set in bios), dri will just get disabled due to missing agp support, 
  which I consider bad behaviour, and that you get a useless error message 
  in that case doesn't help neither.
  (II) RADEON(0): [agp] 262144 kB allocated with handle 0x0001
  (EE) RADEON(0): [agp] Could not bind
  (EE) RADEON(0): [agp] AGP failed to initialize. Disabling the DRI.
  (II) RADEON(0): [agp] You may want to make sure the agpgart kernel 
  module is loaded before the radeon kernel module.
 
 IMHO only the 'Could not bind' error could use some clarification,
 otherwise I find this the only sane way to deal with an impossible
 configuration.

Would it be possible to do an automatic fallback to the largest allowed gart 
size, along with an appropriate warning/error message?
Tell me to shut up if it's not possible to query the maximum size ;)

cu,
Nicolai




Re: Development setup

2004-05-26 Thread Nicolai Haehnle

On Tuesday 25 May 2004 23:43, Maurice van der Pot wrote:
 The modifications I made to the driver were visible when I executed an
 OpenGL app, so I knew it was using the right r200_dri.so. Strangely, 
 I was unable to get most of the debug prints working. The general ones
 with LIBGL_DEBUG seemed to work, but not the r200 specific ones.

Look in r200_context.h for DO_DEBUG, and set it to 1. I assume you've 
already set the environment variable R200_DEBUG, as well.

cu,
Nicolai




Re: [patch] Re: Some questions regarding locks

2004-05-25 Thread Nicolai Haehnle

I've attached a new version of the patch. This should fix a minor bug: I put 
the call to init_timer() too late, which resulted in a kernel warning when 
the module was loaded/unloaded without actually being used.

On Sunday 23 May 2004 14:37, Michel Dänzer wrote:
  2. The timeout cannot be configured yet. I didn't find prior art as to how 
  something like it should be configured, so I'm open for input. For a Linux 
  driver, adding to the /proc entries seems to be the logical way to go, but 
  the DRI is very ioctl-centric. Maybe both?
 
 What's the goal of making it configurable at all, to allow for driver
 debugging? Maybe that could be dealt with better, see below.

This is actually a good point :)
 
 Is there a way to tell that a process is being debugged? If so, maybe it
 could be handled sanely by default? E.g., release the lock while the
 process is stopped? (That might wreak havoc once execution is resumed
 though) ...

Could be possible, but it *is* bound to wreak havoc. Now that you talk about 
it... in the far future, it would be *very* useful if clients could deal 
with temporary loss of access to the DRM. I'm thinking of the recent 
discussions about the possible future of fbdev, DRI, etc. where all 
graphics access eventually goes through the DRM, or something similar.
In that scenario, we need to have a way to establish a secure terminal that 
is safe against
a) fake messages / dialogs created by DRI clients running in the background 
and
b) screen scraping by background clients

I don't see how this could be done without revoking authorization 
temporarily, including unmapping memory regions.
Once DRI clients can deal with this, running them in a debugger should be a 
piece of cake, really :)

cu,
Nicolai
diff -ur drm-base/linux/drm_drv.h drm/linux/drm_drv.h
--- drm-base/linux/drm_drv.h	2004-05-22 21:41:28.0 +0200
+++ drm/linux/drm_drv.h	2004-05-25 19:51:21.0 +0200
@@ -273,6 +273,8 @@
 MODULE_PARM( drm_opts, "s" );
 MODULE_LICENSE("GPL and additional rights");
 
+static void drm_lock_watchdog( unsigned long __data );
+
 static int DRM(setup)( drm_device_t *dev )
 {
 	int i;
@@ -415,6 +417,7 @@
 
 	down( &dev->struct_sem );
 	del_timer( &dev->timer );
+	del_timer_sync( &dev->lock.watchdog );
 
 	if ( dev->devname ) {
 		DRM(free)( dev->devname, strlen( dev->devname ) + 1,
@@ -545,6 +548,7 @@
 	if ( dev->lock.hw_lock ) {
 		dev->sigdata.lock = dev->lock.hw_lock = NULL; /* SHM removed */
 		dev->lock.filp = 0;
+		dev->lock.dontbreak = 1;
 		wake_up_interruptible( &dev->lock.lock_queue );
 	}
 	
@@ -581,6 +585,10 @@
 	sema_init( &dev->struct_sem, 1 );
 	sema_init( &dev->ctxlist_sem, 1 );
 
+	init_timer( &dev->lock.watchdog );
+	dev->lock.watchdog.data = (unsigned long) dev;
+	dev->lock.watchdog.function = drm_lock_watchdog;
+
 	if ((dev->minor = DRM(stub_register)(DRIVER_NAME, &DRM(fops),dev)) < 0)
 	{
 		retcode = -EPERM;
@@ -928,6 +936,11 @@
 #if __HAVE_RELEASE
 		DRIVER_RELEASE();
 #endif
+		/* Avoid potential race where the watchdog callback is still
+		 * running when filp is freed.
+		 */
+		del_timer_sync( &dev->lock.watchdog );
+
 		DRM(lock_free)( dev, &dev->lock.hw_lock->lock,
 _DRM_LOCKING_CONTEXT(dev->lock.hw_lock->lock) );
 
@@ -951,6 +964,7 @@
 			}
 			if ( DRM(lock_take)( &dev->lock.hw_lock->lock,
 	 DRM_KERNEL_CONTEXT ) ) {
+dev->lock.dontbreak = 1;
dev->lock.filp	= filp;
dev->lock.lock_time = jiffies;
atomic_inc( &dev->counts[_DRM_STAT_LOCKS] );
@@ -1096,6 +1110,40 @@
 	return retcode;
 }
 
+
+/**
+ * Lock watchdog callback function.
+ *
+ * Whenever a privileged client must sleep on the lock waitqueue
+ * in the LOCK ioctl, the watchdog timer is started.
+ * When the UNLOCK ioctl is called, the timer is stopped.
+ *
+ * When the watchdog timer expires, the process holding the lock
+ * is killed. Privileged clients set lock.dontbreak and are exempt
+ * from this rule.
+ */
+static void drm_lock_watchdog( unsigned long __data )
+{
+	drm_device_t *dev = (drm_device_t *)__data;
+	drm_file_t *priv;
+	
+	if ( !dev->lock.filp ) {
+		DRM_DEBUG( "held by kernel\n" );
+		return;
+	}
+	
+	if ( dev->lock.dontbreak ) {
+		DRM_DEBUG( "privileged lock\n" );
+		return;
+	}
+	
+	priv = dev->lock.filp->private_data;
+	DRM_DEBUG( "Kill pid=%d\n", priv->pid );
+	
+	kill_proc( priv->pid, SIGKILL, 1 );
+}
+
+
 /** 
  * Lock ioctl.
  *
@@ -1115,6 +1163,7 @@
 DECLARE_WAITQUEUE( entry, current );
 drm_lock_t lock;
 int ret = 0;
+int privileged = capable( CAP_SYS_ADMIN );
 #if __HAVE_MULTIPLE_DMA_QUEUES
 	drm_queue_t *q;
 #endif
@@ -1157,6 +1206,7 @@
 }
 if ( DRM(lock_take)( &dev->lock.hw_lock->lock,
 	 lock.context ) ) {
+

R300: Recovering from lockups

2004-05-25 Thread Nicolai Haehnle

As you may be aware, I was trying to get R300 support into a state where it 
is possible to start OpenGL applications, let them hang the CP and *not* 
bring down the entire machine.

Looks like I was successful :)

The attached patch ati.unlock.1.patch against the DDX makes sure the RBBM 
(whatever that means; I'm guessing Ring Buffer something or other) is reset 
in RADEONEngineReset(), before any other register is accessed that could 
potentially cause a final crash (DSTCACHE_* is the major offender in this 
category).

Now since I don't have any Radeon-related documentation at all, I have no 
idea whether this patch will work on any other chip. For all that I know, 
it might totally break the driver on R100/R200. I'm especially confused by 
the fact that the bottom half of EngineReset() treats RBBM_SOFT_RESET 
differently for the R300. Can anybody explain why?
Maybe it would even be safest/cleanest to move the entire RBBM_SOFT_RESET 
block to the top of the function?

I can now launch glxgears several times in a row. It will be killed a few 
seconds later (during this time the GUI will hang), and as far as I can 
tell, everything continues to work normally.
Of course, for all I know the 3D part of the chip might still be wedged 
internally, which would make this patch (partially) useless for working on 
the driver. I guess I'll find out soon enough.

Important: You'll need my watchdog patch for the DRM from that other thread. 
Otherwise, the reset code in the X server will never be called, and this 
patch will have no effect.

I would also like to point out that the modified xf86 driver that was posted 
on this list (see http://volodya-project.sourceforge.net/R300.php)  does 
not check the version of the DRM. I know, this is really a silly, minor 
point to make at this time, but I've attached a small patch to fix this 
anyway.

cu,
Nicolai
diff -ur -x '*.o' ati-vladimir/radeon_accel.c ati/radeon_accel.c
--- ati-vladimir/radeon_accel.c	2004-05-20 16:02:24.0 +0200
+++ ati/radeon_accel.c	2004-05-25 21:14:24.0 +0200
@@ -170,6 +170,31 @@
 CARD32 rbbm_soft_reset;
 CARD32 host_path_cntl;
 
+/* The following RBBM_SOFT_RESET sequence can help un-wedge
+ * an R300 after the command processor got stuck.
+ */
+rbbm_soft_reset = INREG(RADEON_RBBM_SOFT_RESET);
+OUTREG(RADEON_RBBM_SOFT_RESET, (rbbm_soft_reset |
+RADEON_SOFT_RESET_CP |
+RADEON_SOFT_RESET_HI |
+RADEON_SOFT_RESET_SE |
+RADEON_SOFT_RESET_RE |
+RADEON_SOFT_RESET_PP |
+RADEON_SOFT_RESET_E2 |
+RADEON_SOFT_RESET_RB));
+INREG(RADEON_RBBM_SOFT_RESET);
+OUTREG(RADEON_RBBM_SOFT_RESET, (rbbm_soft_reset & (CARD32)
+~(RADEON_SOFT_RESET_CP |
+  RADEON_SOFT_RESET_HI |
+  RADEON_SOFT_RESET_SE |
+  RADEON_SOFT_RESET_RE |
+  RADEON_SOFT_RESET_PP |
+  RADEON_SOFT_RESET_E2 |
+  RADEON_SOFT_RESET_RB)));
+INREG(RADEON_RBBM_SOFT_RESET);
+OUTREG(RADEON_RBBM_SOFT_RESET, rbbm_soft_reset);
+INREG(RADEON_RBBM_SOFT_RESET);
+
 RADEONEngineFlush(pScrn);
 
 clock_cntl_index = INREG(RADEON_CLOCK_CNTL_INDEX);
diff -ur -x '*.o' ati-vladimir/radeon_accelfuncs.c ati/radeon_accelfuncs.c
--- ati-vladimir/radeon_accelfuncs.c	2004-05-20 16:02:24.0 +0200
+++ ati/radeon_accelfuncs.c	2004-05-25 21:13:37.0 +0200
@@ -122,7 +122,7 @@
 		xf86DrvMsg(pScrn->scrnIndex, X_ERROR,
 			   "%s: CP idle %d\n", __FUNCTION__, ret);
 		}
-	} while ((ret == -EBUSY) && (i++ < RADEON_TIMEOUT));
+	} while ((ret == -EBUSY) && (i++ < RADEON_TIMEOUT/1)); /* the ioctl has an internal delay */
 
 	if (ret == 0) return;
 
--- ati-vladimir/radeon_dri.c	2004-05-20 16:02:24.0 +0200
+++ ati/radeon_dri.c	2004-05-20 16:13:47.0 +0200
@@ -1369,6 +1369,9 @@
 	if (info->IsIGP) {
 	req_minor = 10;
 	req_patch = 0;
+	} else if (info->ChipFamily >= CHIP_FAMILY_R300) {
+	req_minor = 11;
+	req_patch = 1;
 	} else if (info->ChipFamily >= CHIP_FAMILY_R200) {
 	req_minor = 5;
 	req_patch = 0;	


[patch] Re: Some questions regarding locks

2004-05-23 Thread Nicolai Haehnle

On Saturday 22 May 2004 16:04, Michel Dänzer wrote:
 On Sat, 2004-05-22 at 14:04, Nicolai Haehnle wrote:
  It seems to me as if DRM(unlock) in drm_drv.h unlocks without checking 
  whether the caller actually holds the global lock. There is no 
  LOCK_TEST_WITH_RETURN or similar, and the helper function lock_transfer 
has 
  no check in it either.
  Did I miss something, or is this intended behaviour? It certainly seems 
  strange to me.
 
 True. Note that the lock ioctls are only used on contention, but still.

Unless I'm mistaken, DRM(lock) is always called when a client wants the lock 
for the first time (or when it needs to re-grab after it lost the lock). 
This is necessary because the DRM makes sure that dev-lock.filp matches 
the calling file. Afterwards, the ioctls are only used on contention.
The entire locking can be subverted anyway, because part of the lock is in 
userspace. I believe the important thing is to make sure that the X server 
can force a return into a sane locking state.

  Side question: Is killing the offending DRI client enough? When the 
process 
  is killed, the /dev/drm fd is closed, which should automatically release 
  the lock. On the other hand, I'm pretty sure that we can't just kill a 
  process immediately (unfortunately, I'm not familiar with process 
handling 
  in the kernel). What if, for some reason, the process is in a state 
where 
  it can't be killed yet?
 
 We're screwed? :)

Looks like it...

 This sounds like an idea for you to play with, but I'm afraid it won't
 be useful very often in my experience:
 
   * getting rid of the offending client doesn't help with a wedged
 chip (some way to recover from that would be nice...)
   * it doesn't help if the X server itself spins with the lock held

You were right, of course, which only shows my lack of experience with 
driver writing. In my case I can get the X server's reset code to run, but some 
way through the reset the machine finally locks up completely (no more 
networking, no more disk I/O).

I'm curious though, how can a complete lockup like this be caused by the 
graphics card? My guess would be that it grabs the PCI/AGP bus forever for 
some reason (the dark side of bus mastering, so to speak). Is there 
anything else that could be the cause?

  Side question #2: Is it safe to release the DRM lock in the watchdog? 
There 
  might be races where the offending DRI client is currently executing a 
DRM 
  ioctl when the watchdog fires.
 
 Not sure, but this might not be a problem when just killing the
 offending process?

You're right.
On the other hand, it might sometimes be useful to be a little bit nicer to 
the offending process (see point 4 below).

I had a go at implementing my watchdog idea for Linux, see the attached 
patch. It basically works, but I couldn't test it on a system where the DRI 
actually works without locking up... *sigh*

Now for some notes:
1. This only affects the DRM for Linux. I don't have an installation of BSD, 
and while I know a little bit about the Linux kernel, I don't know anything 
about the BSD kernel(s).

2. The timeout cannot be configured yet. I didn't find prior art as to how 
something like it should be configured, so I'm open for input. For a Linux 
driver, adding to the /proc entries seems to be the logical way to go, but 
the DRI is very ioctl-centric. Maybe both?

3. Privileged processes may take the hardware lock for an infinite amount of 
time. This is necessary because the X server holds the lock when VT is 
switched away. 
Currently, "privileged" means capable(CAP_SYS_ADMIN). I would prefer if it 
meant the "multiplexing controller" process, i.e. the one that 
authenticates other processes. Unfortunately, this distinction isn't made 
anywhere in the DRM as far as I can see. This means that runaway DRI 
clients owned by root aren't killed by the watchdog, either.

4. Keith mentioned single-stepping through a driver, and he does have a 
point. Unfortunately, I also believe that it's not that simple.
Suppose an application developer debugs a windowed OpenGL application, on 
the local machine, without a dual-head setup. It may sound like a naive 
thing to do, but this actually works on Windows (yes, Windows is *a lot* 
more stable than Linux/BSD in that respect).
Now suppose she's got a bug in her application (e.g. bad vertex array) that 
triggers a segmentation fault inside the GL driver, while the hardware lock 
is held. GDB will catch that signal, so the process won't die, which in 
turn means that the lock is not released. Thus the developer's machine 
locks up unless the watchdog kicks in (of course, the watchdog in its 
current form will also frustrate her to no end).

cu,
Nicolai

 
 -- 
 Earthling Michel Dänzer  | Debian (powerpc), X and DRI developer
 Libre software enthusiast|   http://svcs.affero.net/rm.php?r=daenzer
Some questions regarding locks

2004-05-22 Thread Nicolai Haehnle

It seems to me as if DRM(unlock) in drm_drv.h unlocks without checking 
whether the caller actually holds the global lock. There is no 
LOCK_TEST_WITH_RETURN or similar, and the helper function lock_transfer has 
no check in it either.
Did I miss something, or is this intended behaviour? It certainly seems 
strange to me.

Also, it is possible for a DRI client to effectively lock up the entire 
machine simply by entering an endless loop after taking the lock. I suppose 
one could still log in remotely and kill the offending process, but that's 
not a realistic option for most people. Switching to a different VT or 
killing the X server does not work, because the X server has to take the 
DRI lock in the process.

This is a problem that I want to fix (it makes playing around with the R300 
hack Vladimir Dergachev posted an infinite-rebooting nightmare), but I am 
unsure what the best solution would be.

As far as I can see, the problem is two-fold: One, the X server must be able 
to break the lock, and two, it (or the DRM) must somehow disable the 
offending DRI client to prevent the problem from reoccurring.

I think the simplest solution would look something like this:
Whenever DRM(lock) is called by a privileged client (i.e. the X server), and 
it needs to sleep because the lock is held by an unprivileged client, a 
watchdog timer is started before we schedule. DRM(unlock) unconditionally 
stops this watchdog timer.
When the watchdog timer fires, it releases the lock and/or kills the 
offending DRI client.

Side question: Is killing the offending DRI client enough? When the process 
is killed, the /dev/drm fd is closed, which should automatically release 
the lock. On the other hand, I'm pretty sure that we can't just kill a 
process immediately (unfortunately, I'm not familiar with process handling 
in the kernel). What if, for some reason, the process is in a state where 
it can't be killed yet? I guess this isn't a problem when we're dealing 
with a faulty 3D driver, but it might be a problem when dealing with 
malicious code.

Side question #2: Is it safe to release the DRM lock in the watchdog? There 
might be races where the offending DRI client is currently executing a DRM 
ioctl when the watchdog fires.

This solution involves no ABI changes. Since all changes are kernel side and 
affect only code that is shared between all drivers, everybody would 
benefit immediately.
Does this all look reasonable to the DRI gurus?

cu,
Nicolai


---
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g.
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id149alloc_id66op=click
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


[Dri-devel] Typo in drm/drm_drv.h or drmP.h?

2003-08-14 Thread Nicolai Haehnle
Hi,

While browsing the DRI source, I stumbled upon this. I have absolutely no idea 
what it does, but this just doesn't look right (drm_drv.h:317):

#ifdef __HAVE_COUNTER15
dev->types[14] = __HAVE_COUNTER14;
#endif

Looks like this should be 15, i.e. 

#ifdef __HAVE_COUNTER15
dev->types[15] = __HAVE_COUNTER15;
#endif

However, in drmP.h, dev->types is defined to have only 15 fields, so 
dev->types[15] would be out of bounds.

Looks like either those three lines should be removed, or the lines should 
be changed like above, and struct drm_device in drmP.h should be changed 
appropriately.

cu,
Nicolai


