On Mon, 2004-09-20 at 15:00, Roland Scheidegger wrote:
> Eric Anholt wrote:
> > The attached patch removes the mandatory emits of all state which were
> > happening after each cmdbuf flush. Instead, we set a flag after a
> > cmdbuf flush saying "save the state at the next unlock," which means
> > memcpying the state atoms off. When we actually see the context get
> > lost, then we "back up" and restore state -- make a new cmdbuf, dirty
> > all state, emit it, flush it, then put the old cmdbuf back.
> I like it ;-). I thought the locking really was inefficient (but never
> did anything about it...).
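
To spell the scheme out in code, it ends up looking roughly like this
(a sketch with stand-in types and names, not the actual code from the
patch):

/* Sketch only: stand-in types and names, not the actual r200 code. */
#include <string.h>

#define STATE_SIZE 256

struct drv {
   int  save_on_unlock;     /* set by each cmdbuf flush            */
   int  lost_context;       /* noticed once we hold the lock again */
   char atoms[STATE_SIZE];  /* live state atoms                    */
   char backup[STATE_SIZE]; /* snapshot taken at unlock time       */
};

static void flush_cmdbuf(struct drv *d)
{
   /* ... hand the cmdbuf to the DRM ... */
   d->save_on_unlock = 1;   /* snapshot the atoms at the next unlock */
}

static void unlock_hardware(struct drv *d)
{
   if (d->save_on_unlock) {
      memcpy(d->backup, d->atoms, STATE_SIZE);
      d->save_on_unlock = 0;
   }
   /* ... drop the hardware lock ... */
}

static void lock_hardware(struct drv *d)
{
   /* ... take the hardware lock ... */
   if (d->lost_context) {
      /* "Back up": swap in a fresh cmdbuf, restore the snapshot,
       * dirty all state, emit it, flush it, then put the old
       * cmdbuf back. */
      memcpy(d->atoms, d->backup, STATE_SIZE);
      /* dirty_all_state(d); r200EmitState(d); */
      flush_cmdbuf(d);
      d->lost_context = 0;
   }
}
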
It was that state emit to ensure lost-context recovery that was the
real killer. While working on it, I thought, "Man, all of this locking
and unlocking going on has to have bad effects on performance." So I
made (UN)LOCK_HARDWARE into inlines, and they were only around 0.01% of
CPU time according to oprofile. So I'm not too worried about locking.

One thing that I had talked about with Keith when working on the race
fixes was the possibility of making the DRI locks recursive. This would
let us be lazier about coding sometimes, but would also make
integrating the DRI modules into the server (for hardware indirect)
easier. While recursive acquires are more expensive, it looks like the
locking costs aren't really an issue.

> > I also removed the dirty/clean state lists and made a single one.
> > The reasoning was that we have to walk the entire list on emit (and
> > twice when the all-dirty flag is set) anyway, and I felt that this
> > was cleaner.
> It was not always that inefficient in r200EmitState, only since the
> fixed emit order was introduced (and still no one understands why the
> fixed order is needed). Didn't seem to make a performance difference
> though (profiling showed it really didn't use much CPU time).

Yeah, it was clear that we used to emit in whatever order, and that
would have been nicer. At this point I'm seeing about 5% of CPU time in
EmitState for ipers, which seems pretty hefty for such a small bit of
code, but I didn't see much obvious room for improvement. A fixed order
(at least within limits) being required certainly makes sense to me.
I've found that docs sometimes say things like, "Writes to this
register have no effect if bits X of register Y are not set to Z." It
may be that those dependencies were just not recorded.

> > This gets about a 5% speedup for me in ipers (which I wish was more
> > accurate in its reporting), and doesn't touch glxgears. I didn't
> > have any interesting apps besides glxgears handy to benchmark with.
> > Any thoughts on this? If people think it's a good idea, I'll do it
> > for radeon as well.
> I certainly think it's a good idea.
> However, I still think it should be possible to lock across multiple
> buffers, to make it possible to emit larger numbers of vertices at
> once (for instance for things like glDrawElements), which, as far as
> I understand, just cannot work if you're restricted to one buffer.

Multiples of which buffers are you talking about here?

-- 
Eric Anholt                                [EMAIL PROTECTED]
http://people.freebsd.org/~anholt/         [EMAIL PROTECTED]
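
P.S. For the recursive locks mentioned above, what I have in mind is
roughly a depth count wrapped around the existing lock -- a sketch,
assuming the single thread that owns the context is the only one that
can re-acquire (hypothetical names, not an existing interface):

/* Sketch only: not an existing DRI interface. */
static int lock_depth;  /* nonzero only while we own the lock */

static void lock_hardware_recursive(void)
{
   if (lock_depth++ == 0) {
      /* the real LOCK_HARDWARE() body goes here */
   }
}

static void unlock_hardware_recursive(void)
{
   if (--lock_depth == 0) {
      /* the real UNLOCK_HARDWARE() body goes here */
   }
}

The inner acquires then cost only the counter bump, so whatever extra
expense there is stays confined to the outermost lock.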