On 20.04.2013 09:27, Mathias Fröhlich wrote:
Hi Tom,

Maybe I should explain where the problem really appears in real life.
OpenSceneGraph has some nifty features regarding multi-channel rendering.
Assume a setup of multiple full-screen views running on different graphics
boards in a single machine, together composing a view into a single scene.
The recommended way to do this with OSG is to set up one X screen per
graphics board, even if it spans multiple monitors/projectors, then one GL
graphics context per graphics board and one viewport per projector within
those contexts. Rendering then happens in parallel, one thread per
graphics context. I drive such a setup here with two Radeons and three
monitors for testing, and this is where the problem appears.

When I start my favourite flight simulation application with this setup,
it crashes almost immediately unless llvm_start_multithreaded has been
called, whereas it runs stably if we ensure LLVM is in multithreaded mode.

So I tried to distill a piglit testcase out of this somewhat larger setup
with FlightGear, OpenSceneGraph, multiple GPUs and whatnot.

On Friday, April 19, 2013 20:08:54 Tom Stellard wrote:
On Wed, Apr 17, 2013 at 07:54:32AM +0200, Mathias Fröhlich wrote:
Tom,

-class LLVMEnsureMultithreaded {
-public:
-   LLVMEnsureMultithreaded()
-   {
-      llvm_start_multithreaded();
-   }
-};
-
-static LLVMEnsureMultithreaded lLVMEnsureMultithreaded;
Removing this leads to crashes in LLVM with applications that work
concurrently on different GL contexts.
The test you wrote still passes with this patch.  Do you see that
we are now calling the C API version of llvm_start_multithreaded(),
LLVMStartMultithreaded(), from inside radeon_llvm_compile(), protected by
a static variable?
Oh, no, I did not see that. I did not realize that the
llvm_start_multithreaded call is not just plain C, so I thought grepping
for the call I used would be sufficient.

But no luck: if I actually apply your patch and run the above setup, I
still get the crashes. The same with the piglit test here.

Too bad that reproducing races is racy in itself.
With the piglit test, about two thirds of the runs end in either glibc
memory-corruption aborts or one of the following LLVM asserts:

bool llvm::llvm_start_multithreaded(): Assertion `!multithreaded_mode &&
"Already multithreaded!"' failed.

void
llvm::PassRegistry::removeRegistrationListener(llvm::PassRegistrationListener*):
Assertion `I != Impl->Listeners.end() && "PassRegistrationListener not
registered!"' failed.

bool llvm::sys::SmartMutex<mt_only>::release() [with bool mt_only = true]:
Assertion `((recursive && acquired) || (acquired == 1)) && "Lock not acquired
before release!"' failed.

IIRC the biggest problem was the use of llvm::sys::SmartMutex<mt_only>,
which is spread around here and there in LLVM; the pass registry was (is?)
one of its users. If you did not tell LLVM to run multithreaded, these
locks become no-ops, and you end up accessing containers and the like
concurrently, as sketched below.
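
As an illustration, here is a minimal C++ sketch (not LLVM's actual
source) of how such an mt_only lock degenerates. Note that if
multithreaded mode gets enabled between acquire() and release(), you get
exactly the "Lock not acquired before release!" assert quoted above:

#include <mutex>

static bool multithreaded_mode = false;  /* what llvm_start_multithreaded() flips */

template <bool mt_only>
struct SmartMutexSketch {
   std::mutex impl;
   void acquire() {
      if (!mt_only || multithreaded_mode)
         impl.lock();       /* real serialization */
      /* else: no-op -- callers race on whatever this lock "guards" */
   }
   void release() {
      if (!mt_only || multithreaded_mode)
         impl.unlock();     /* asserts if acquire() was still a no-op */
   }
};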

Looking at the first assert, the LLVM guys have IMO made this problem even
worse since I last looked at it. We now need to check whether
multithreading is enabled before trying to enable it, but the check and
the switch are each racy in themselves, and neither is safe against LLVM
already being accessed from another thread or by some other, foreign user,
as the sketch below illustrates.
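
A hedged sketch of that check-then-set pattern (the wrapper name is
hypothetical): two threads can both observe "not multithreaded" and then
both call the start function, tripping the "Already multithreaded!"
assert:

#include <llvm-c/Core.h>

static void ensure_llvm_multithreaded(void)
{
   if (!LLVMIsMultithreaded())   /* thread B may read "false" here too... */
      LLVMStartMultithreaded();  /* ...so both threads reach this call    */
}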

Sorry about that. I didn't have piglit commit access at the time, and
I forgot about the patch.  I fixed a few things and sent v3 to the list.
The same here. Thanks for this.

Regarding the point where this function is called: I had chosen static
initialization time since LLVM requires the function to be called single
threaded, which we cannot guarantee in the general case. Keep in mind that
you need to ensure this function is not called concurrently even with
applications that themselves already use the LLVM libs in some way while
the driver is loaded. The best bet is to do this in the dynamic loader,
which is itself serialized; that way I could at least avoid calling the
function concurrently from the initialization of different contexts, and
it should also shield against applications that play the same trick of
calling this function from a static initializer during their dlopen
phase ...
We may get around part of this problem by dlopening the driver with better
isolation, but as things stand the problem really does reach that far.
This is a tricky problem, and I'm not sure that radeon_llvm_compile() is
the best place to call llvm_start_multithreaded().  Maybe it would be
better to move this into gallivm, because this problem could affect any
driver that uses the gallivm code, which includes: llvmpipe, r300g, r600g,
radeonsi, and i915g.  What do you think?
Yep, another place would be better.

I do not know the LLVM internals well enough, but if I move the current
C++ code into src/gallium/auxiliary/gallivm/lp_bld_misc.cpp, it works for
me (TM). Seriously, I know of one guy who wants to use llvmpipe on
Windows, and he would benefit from the C++ solution, since a GCC
constructor-attribute solution is clearly not available with the MSVC (cl)
Windows build. And lacking a Windows test system, I am not going to start
implementing DllMain either.
That could be a short-term solution; see the sketch below ...
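
For reference, a sketch of that short-term variant: the same
static-initializer trick the removed hunk above used, just living in
lp_bld_misc.cpp so every gallivm-based driver inherits it. The dynamic
loader runs static constructors under its own lock, which is what
serializes this in practice:

/* in src/gallium/auxiliary/gallivm/lp_bld_misc.cpp (sketch) */
#include <llvm/Support/Threading.h>

namespace {
   struct LLVMEnsureMultithreaded {
      LLVMEnsureMultithreaded()
      {
         /* The check avoids the "Already multithreaded!" assert, but is
          * itself racy against foreign LLVM users, as discussed above. */
         if (!llvm::llvm_is_multithreaded())
            llvm::llvm_start_multithreaded();
      }
   };
   LLVMEnsureMultithreaded lLVMEnsureMultithreaded;  /* runs at dlopen time */
}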


Apart from that, I mentioned better symbol isolation above.

How would this improve the initialization situation?
If we knew that the current driver module is the only user of this
particular LLVM instance, then we could place a mutex somewhere and
serialize the call to llvm::llvm_start_multithreaded() ourselves, since we
would know we are the only user. This would happen before the driver
module's first use of LLVM, with no dlopen-time static initializers that
sooner or later lead to ordering problems (sketched below) ...
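
A sketch of what that could look like once the module really owns its LLVM
copy (driver_ensure_llvm_mt is a hypothetical name); an ordinary mutex in
the driver is then sufficient, because no foreign user can touch this LLVM
instance:

#include <pthread.h>
#include <llvm/Support/Threading.h>

static pthread_mutex_t llvm_init_mutex = PTHREAD_MUTEX_INITIALIZER;

/* call before the driver's first use of LLVM, e.g. at context creation */
static void driver_ensure_llvm_mt(void)
{
   pthread_mutex_lock(&llvm_init_mutex);
   if (!llvm::llvm_is_multithreaded())
      llvm::llvm_start_multithreaded();
   pthread_mutex_unlock(&llvm_init_mutex);
}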

So, I have had a patch set floating around here for about half a year that
changes the RTLD_GLOBAL argument to dlopen into RTLD_LOCAL|RTLD_DEEPBIND,
which should (? I still have to investigate this properly) give us a
private copy of the libraries/modules used. I have also not yet taken the
time to analyze whether we rely on some globals actually being shared
between driver modules, or between different types of drivers that would
then also sit in memory multiple times. Note that this does *not* affect
the shared glapi topics, as those all live above the loaded driver
modules, at the level of libGL.so.1.
For example, I could imagine we need a single libdrm_radeon to make CL-GL
buffer sharing work? Do we?
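
The flags change itself is tiny; here is a sketch of a hypothetical
pipe-loader call site (RTLD_DEEPBIND is a glibc extension):

#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* for RTLD_DEEPBIND */
#endif
#include <dlfcn.h>

static void *load_pipe_driver(const char *path)
{
   /* RTLD_LOCAL: keep the module's symbols out of the global namespace.
    * RTLD_DEEPBIND: resolve the module's references against its own
    * dependencies first, so the driver's LLVM wins over any LLVM the
    * application itself may have loaded. */
   return dlopen(path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
}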

Alternatively, you might be able to pack every use of LLVM, at least for
radeon, statically into libllvmradeon and again hide everything that is
not part of the driver API from the outside. Then you would also have a
private LLVM instance that you can initialize under your own lock. That is
more of a radeon-only solution, though, as the other drivers would then
need to do something similar themselves. And I realize that your recent
code-move work could be pointing in this direction, at least for the
radeon-type drivers.

Both symbol-isolation methods would also make the LLVM version used by the
driver independent of any version possibly used by the application, which
can otherwise lead to incompatible symbol clashes. So you gain even more
with that.

Opinions?

Hi Tom & Mathias,

I completely agree with Mathias here. I also suggested on IRC a couple of weeks ago that libllvmradeon should definitely be static, hide all internal symbols, and only export those needed by the drivers. That should mostly isolate us from the mess that comes with having multiple LLVM instances (and probably also different versions) around at the same time.
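
A sketch of the symbol-hiding part (the PUBLIC macro is hypothetical, and
radeon_llvm_compile() just stands in as an example export with its
parameters elided): build libllvmradeon with -fvisibility=hidden and
re-export only the driver-facing entry points:

/* build libllvmradeon with -fvisibility=hidden, then: */
#define PUBLIC __attribute__((visibility("default")))

/* exported to the drivers */
PUBLIC unsigned radeon_llvm_compile(/* ... */);

/* everything else, including the linked-in LLVM symbols, stays hidden */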

That would probably also remove the need to call llvm_start_multithreaded from any library load routine, since then we can control the order of initialization.

I would look into it myself, but I'm a bit busy with UVD right now.

Christian.

Mathias
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

