On Fri, 29 Nov 2002 13:20:45 +0000 José Fonseca <[EMAIL PROTECTED]> wrote:
[snip]

> Let me illustrate with an example. Imagine you have 1000 polygons to
> process (i.e., transform, clip, build vertex buffers, and render). If
> you have an SMP computer with 4 processors you can make use of
> parallelism in at least two ways:
>
> a) Have a processor do the transform of the 1000 polygons, another do
>    the clipping, ... etc.
>
> b) Have a processor do the transform+clipping+... of 250 polygons,
>    have another do the same for another 250 polygons, ... etc.

Ok, now I get the point.

> > > should partition a stage for each thread (as you're suggesting),
> > > or if there is enough data running through the pipeline to make it
> > > worth having a whole pipeline for each thread, each processing a
> > > part of the primitives.
> >
> > Having one thread for each pipeline stage was my first idea. The
> > alternative approach I tried to explain could be implemented in a
> > way that, once data is processed by a free thread, it runs through
> > all remaining pipeline stages in that same thread. Only if no free
> > thread is available does processing start on the main thread.
>
> I understood your second proposal, but you still have one thread doing
> a pipeline stage processing all the data (case a) above).

Sorry, my fault. I should have read more carefully. The Chromium
approach was new to me. I guess it took me a while to digest.

> > Still, semaphores have to be used to synchronize the threads
> > (including the main thread) so that data packets cannot "overtake"
> > each other. In the end, the drawing has to occur in the same order
> > as data was fed into the pipeline.
> >
> > > The latter is the approach taken e.g. in Chromium
> > > (http://chromium.sourceforge.net), but actually I don't know if,
> > > for any application besides scientific visualization, a pipeline
> > > handles so many primitives at a time. For applications such as
> > > games, state changes (like texture changes) seem to happen too
> > > often for that.
> > I just had a look at their web page. They take a very different
> > approach to parallelizing the rendering task. They are targeting
> > clusters, not SMP systems.
>
> Why do you dismiss it so quickly?

I wasn't dismissing it. It's just what they say on their web page:

<quote>
Chromium is a system for interactive rendering on clusters of
workstations.
</quote>

> Have you seen
> http://www.cs.virginia.edu/~humper/chromium_documentation/threadedapplication.html
> ?

Not yet. I'll read it.

> There is nothing in their approach specific to clusters. Using
> approach b) yields much better parallelism than a) because each thread
> can work independently of the others, and therefore there are fewer
> waits/lock contentions/etc.

If they want it to run on a cluster they cannot use shared memory. They
also have to optimize their system so that it does not suffer too much
from network latencies. Sure, their system will also work and yield a
speedup on a single SMP system. But my approach is limited to single SMP
systems. So in a way, my approach is the more limited one, if that's
what was bothering you ;-)

> Nevertheless, it isn't worthwhile if you're processing 50 polygons at
> a time. Still, if you have 50 polygons in blue, 50 in red, and 50 in
> texture A, and you run them all separately on different processors,
> you will probably still get better parallelism.

On an SMP system my approach could exploit this. It would render, say,
the blue polygons in a free thread while the OpenGL client continues
queuing up the red polygons in the main thread. When the red polygons
are to be run through the pipeline and the blue ones aren't finished
yet, they would be rendered in the main thread in parallel with the blue
polygons. If you have more threads, you can have more vertex buffers
processed in parallel in this way. If the synchronization is implemented
as unrestrictively as possible, the only reason to wait should be to
force the drawing stages to occur in the correct order.
> I'm not sure which approach will give better speedups, but this has to
> be considered. Perhaps it would be a good idea to talk with the
> Chromium guys to find out what order of speedups they achieve on SMP
> systems with usual applications.

My approach was basically inspired by the fact that there is something
in Mesa that is called a "pipeline". So I thought, why not implement it
like a real pipeline. If we really want to parallelize Mesa, then we
should consider all options. I'm probably biased towards my proposal ;-)

Best regards,
Felix

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel