On Fri, 29 Nov 2002 13:20:45 +0000
José Fonseca <[EMAIL PROTECTED]> wrote:

[snip]
> 
> Let me illustrate with an example. Imagine you have 1000 polygons to
> process (i.e., transform, clip, build vertex buffers, and render). If
> you have a SMP computer with 4 processors you can make use of
> parallelism in at least two ways:
> 
> a) Have a processor do the transform of the 1000 polygons, another do
> the clipping, ... etc.
> 
> b) Have a processor do the transform+clipping+... of 250 polygons, have
> another do the same for another 250 polygons, ... etc.

Ok, now I get the point.
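
Just to make sure I read it right, here is a rough pthreads sketch of
b). The poly_t struct and the transform/clip/render stubs are made up
for illustration only and don't correspond to anything in Mesa;
approach a) would instead give each worker a single stage and pass the
data between workers through queues.

#include <pthread.h>
#include <stdio.h>

#define NUM_POLYS   1000
#define NUM_THREADS 4

typedef struct { float x, y, z; int clipped; } poly_t;

static poly_t polys[NUM_POLYS];

/* Placeholder stages -- the real transform/clip/render live in Mesa. */
static void transform(poly_t *p) { p->x += 1.0f; }
static void clip(poly_t *p)      { p->clipped = (p->x > 1000.0f); }
static void render(poly_t *p)    { (void)p; /* would emit HW commands */ }

/* b) one thread per chunk of data: each worker runs the whole pipeline
 * (transform+clip+render) on its own 250 polygons, independently of
 * the other workers.                                                  */
static void *chunk_worker(void *arg)
{
    int chunk = (int)(long)arg;
    int per   = NUM_POLYS / NUM_THREADS;
    int i;

    for (i = chunk * per; i < (chunk + 1) * per; i++) {
        transform(&polys[i]);
        clip(&polys[i]);
        render(&polys[i]);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NUM_THREADS];
    long t;

    for (t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, chunk_worker, (void *)t);
    for (t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);

    printf("processed %d polygons\n", NUM_POLYS);
    return 0;
}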

> > > should partition a stage for each thread (as you're suggesting), or if
> > > there is enough data running through the pipeline to make it worthwhile
> > > to have a whole pipeline for each thread, each processing a part of the
> > > primitives.
> > 
> > Having one thread for each pipeline stage was my first idea. The
> > alternative approach I tried to explain could be implemented in a way
> > that, once data is processed by a free thread, it runs through all
> > remaining pipeline stages in that same thread. Only, if no free thread
> > is available, processing starts on the main thread.
> 
> I understood your second proposal, but you still have one thread doing a
> pipeline stage, processing all the data (case a) above).

Sorry, my fault. I should have read more carefully. The Chromium
approach was new to me. I guess it took me a while to digest it.
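
For reference, the "free thread, otherwise main thread" part of my
proposal would look roughly like the sketch below. It is just a single
worker with a one-slot mailbox, and process_buffer() is only a stand-in
for running transform+clip+...+render on one vertex buffer; none of
these names exist in Mesa.

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

typedef struct { int id; /* ... vertex data would go here ... */ } vbuf_t;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static vbuf_t *pending     = NULL;   /* one-slot "mailbox" for the worker */
static int     worker_busy = 0;

/* Stand-in for running the whole pipeline on one vertex buffer. */
static void process_buffer(vbuf_t *vb)
{
    printf("buffer %d processed by thread %lu\n", vb->id,
           (unsigned long)pthread_self());
    free(vb);
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        vbuf_t *vb;

        pthread_mutex_lock(&lock);
        while (pending == NULL)
            pthread_cond_wait(&wake, &lock);
        vb = pending;
        pending = NULL;
        worker_busy = 1;
        pthread_mutex_unlock(&lock);

        process_buffer(vb);           /* whole pipeline in this thread */

        pthread_mutex_lock(&lock);
        worker_busy = 0;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Called from the main (OpenGL client) thread whenever a buffer is full. */
static void submit(vbuf_t *vb)
{
    pthread_mutex_lock(&lock);
    if (!worker_busy && pending == NULL) {
        pending = vb;                 /* a thread is free: hand it over  */
        pthread_cond_signal(&wake);
        pthread_mutex_unlock(&lock);
    } else {
        pthread_mutex_unlock(&lock);
        process_buffer(vb);           /* no free thread: do it ourselves */
    }
}

int main(void)
{
    pthread_t tid;
    int i;

    pthread_create(&tid, NULL, worker, NULL);
    for (i = 0; i < 8; i++) {
        vbuf_t *vb = malloc(sizeof *vb);
        vb->id = i;
        submit(vb);
    }
    return 0;   /* sketch only: a real driver would shut the worker down */
}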

> > Still, semaphores have to be used to synchronize the threads (including
> > the main thread) so that data packets cannot "overtake" each other. In the
> > end, the drawing has to occur in the same order in which the data was fed
> > into the pipeline.
> > 
> > > The latter is the approach taken, e.g., in Chromium
> > > (http://chromium.sourceforge.net), but actually I don't know whether, for
> > > any application besides scientific visualization, a pipeline handles so
> > > many primitives at a time. For applications such as games, state changes
> > > (like texture changes) seem to happen too often for that.
> > 
> > I just had a look at their web page. They take a very different approach
> > to parallelizing the rendering task. They are targeting clusters,
> > not SMP systems.
> 
> Why do you dismiss it so quickly? Have you seen

I wasn't dismissing it. It's just what they say on their web page:
<quote>
Chromium is a system for interactive rendering on clusters of workstations.
</quote>

> http://www.cs.virginia.edu/~humper/chromium_documentation/threadedapplication.html
> ?

Not yet. I'll read it.

> There is nothing in their approach specific to clusters. Using approach
> b) yields much better parallelism than a) because each thread can work
> independently of the others, and therefore there are fewer waits, lock
> contentions, etc.

If they want it to run on a cluster, they cannot use shared memory. They
also have to optimize their system so that it does not suffer too much
from network latencies. Sure, their system will also work and yield a
speedup on a single SMP system, whereas my approach is limited to single
SMP systems. So in a way, my approach is the more limited one, if that's
what was bothering you ;-)

> Nevertheless, it isn't worth it if you're processing 50 polygons at a
> time. Still, if you have 50 polygons in blue, 50 in red, and 50 in
> texture A, and you run them all separately on different processors, you
> will probably still get better parallelism.

On an SMP system my system could exploit this. It would render, say, the
blue polygons in a free thread while, in the main thread, the OpenGL
client continues queuing up the red polygons. When the red polygons are
to be run through the pipeline and the blue ones aren't finished yet,
they would be rendered in the main thread in parallel with the blue
polygons. If you have more threads, more vertex buffers can be processed
in parallel in this way. If the synchronization is implemented as
unrestrictively as possible, the only reason to wait should be to force
the drawing stages to occur in the correct order.
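
That ordering constraint itself can be kept quite small. The earlier
mail mentioned semaphores; the sketch below uses a mutex and condition
variable as a "ticket" counter instead, purely to illustrate the idea:
every vertex buffer gets a sequence number when the client queues it,
transform/clip can finish in any thread and in any order, and only the
final draw step waits for its turn. All the names here are invented for
the example.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t draw_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  draw_turn = PTHREAD_COND_INITIALIZER;
static unsigned next_to_draw = 0;   /* sequence number allowed to draw next */

/* Only the final drawing step is serialized, in submission order. */
static void draw_in_order(unsigned seq)
{
    pthread_mutex_lock(&draw_lock);
    while (seq != next_to_draw)              /* earlier buffers not drawn yet */
        pthread_cond_wait(&draw_turn, &draw_lock);

    /* stand-in for issuing the real drawing commands for this buffer */
    printf("drawing buffer %u\n", seq);

    next_to_draw++;                          /* let the next buffer proceed */
    pthread_cond_broadcast(&draw_turn);
    pthread_mutex_unlock(&draw_lock);
}

static void *pipeline_worker(void *arg)
{
    unsigned seq = (unsigned)(long)arg;
    /* ... transform, clip, build vertex buffer (unordered) ... */
    draw_in_order(seq);
    return NULL;
}

int main(void)
{
    pthread_t tid[4];
    long i;

    for (i = 3; i >= 0; i--)                 /* start in "wrong" order */
        pthread_create(&tid[i], NULL, pipeline_worker, (void *)i);
    for (i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    return 0;                                /* output is always 0,1,2,3 */
}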

> I'm not sure which approach will give better speedups, but this has to
> be considered. Perhaps it would be a good idea to talk with the Chromium
> guys to find out what order of speedup they achieve on SMP systems with
> typical applications.

My approach was basically inspired by the fact that there is something
in Mesa that is called a "pipeline". So I thought, why not implement it
like a real pipeline? If we really want to parallelize Mesa, then we
should consider all the options. I'm probably biased towards my own
proposal ;-)

Best regards,
  Felix

               __\|/__    ___     ___     ___
__Tschüß_______\_6 6_/___/__ \___/__ \___/___\___You can do anything,___
_____Felix_______\Ä/\ \_____\ \_____\ \______U___just not everything____
  [EMAIL PROTECTED]    >o<__/   \___/   \___/        at the same time!

