On May 16, 2018, at 9:25 AM, Alexandre <alexandre.gauthier-foic...@inria.fr> wrote:

Thank you for your answer.
I understand that I can control the number of threads and prevent them from being bound to actual hardware threads.
Preventing oversubscription of the hardware threads is challenging when using OpenMP/TBB/OpenSWR in hybrid environments.

I am wondering whether having N single-threaded SWR contexts (where N corresponds to the number of hardware threads) is *good enough*, i.e. not much slower than a single SWR context rendering the tasks serially.
Do you have a take on this?
That might do the trick.

A single-threaded SWR context would not give high performance; SWR was 
architected to parallelize the pipeline stages and depends on multiple 
threads/CPUs to deliver high performance.  Notably, compared to llvmpipe we can 
parallelize the geometry frontend and thus achieve much higher throughput.


Similar oversubscription problems occur with all applications that use multiple 
threading technologies (Cilk, TBB, OpenMP …), and there are few solutions to 
prevent it besides rewriting the code to use only one of them.


Yes, getting different threading libraries to agree can be tricky.  Does your 
application overlap heavy compute with graphics rendering?  If not, the 
oversubscription point might be moot.  One bit of advice we give to TBB library 
users is to initialize the TBB library before creating an OpenGL/SWR context.  
This allows TBB to size its thread pool to the entire machine, and then SWR 
will come in and create all its threads.  In the other order, SWR binds its 
threads to cores first, which TBB interprets as unavailable resources, 
resulting in a thread pool of size one.

If your concern is multiple SWR contexts running simultaneously and 
oversubscribing, it’s true that the SWR thread pool creation is per-context, 
and as Bruce says the only way to prevent that currently is setting the 
environment variable to limit the number of worker threads.  This number 
should be greater than 1 for good performance, though.

-Tim

An alternative solution would be a callback mechanism in OpenSWR that lets it 
hand its tasks to the application's own scheduler.

Cheers

Alex


On 16 May 2018, at 14:34, Cherniak, Bruce <bruce.chern...@intel.com> wrote:


On May 14, 2018, at 8:59 AM, Alexandre <alexandre.gauthier-foic...@inria.fr> wrote:

Hello,

Apologies if this message is not appropriate for this mailing list.

The following is a question for the developers of Gallium's SWR driver.

I am the main developer of a motion graphics application.
Our application internally has a dependency graph where each node may run 
concurrently.
We use OpenGL extensively in the implementation of the nodes (for example with 
Shadertoy).

Our application has 2 main requirements:
- A GPU backend, mainly for user interaction and fast results
- A CPU backend for batch rendering

Internally we use OSMesa for CPU backend so that our code is mostly identical 
for both GPU and CPU paths.
However, when it comes to the CPU, our application is heavily multi-threaded: 
each processing node can potentially run in parallel with the others, following 
the dependency graph.
We use Intel TBB to schedule the CPU threads.

For each actual hardware thread (not task) we allocate a new OSMesa context so 
that we can freely multi-thread operator rendering. It works fine with 
llvmpipe and also SWR so far (with a patch to fix some static variables inside 
state_trackers/osmesa.c).

However, with SWR using its own thread pool, I’m afraid of over-threading 
introducing a bottleneck in thread scheduling.
E.g. on a 32-core processor, we may already have, say, 24 threads busy on TBB 
tasks, each with its own OSMesa context.
I looked at the code: each of those concurrent OSMesa contexts will create an 
SWR context, and each will try to initialize its own thread pool in 
CreateThreadPool in swr/rasterizer/core/api.cpp.

Is there a way to have a single “static” thread pool shared across all contexts?

There is not currently a way to create a single thread-pool shared across all 
contexts.  Each context creates unique worker threads.

However, OpenSWR provides an environment variable, KNOB_MAX_WORKER_THREADS, 
that overrides the default thread allocation.
Setting this will limit the number of threads created by an OpenSWR context 
*and* prevent the threads from being bound to physical cores.

Please give this a try.  By adjusting the value, you may find the optimum for 
your situation.

Cheers,
Bruce

Thank you

Alexandre
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
