On 2008-12-07 03:48:40 +0100, Sean Kelly <[EMAIL PROTECTED]> said:

Fawzi Mohamed wrote:
On 2008-12-06 17:13:34 +0100, Sean Kelly <[EMAIL PROTECTED]> said:

Fawzi Mohamed wrote:

a memory barrier would be needed, and atomic decrements, but I see that it is not portable...

It would also somewhat defeat the purpose of thread_needLock, since IMO this routine should be fast. If memory barriers are involved then it may as well simply use a mutex itself, and this is exactly what it's intended to avoid.

The memory barrier would be needed in the code that decrements the number of active threads, so that you are sure no pending writes remain (that is the problem you said led you to switch to a multithreaded flag), not in the code of thread_needLock...

Not true. You would need an acquire barrier in thread_needLock. However, on x86 the point is probably moot since loads have acquire semantics anyway.

You would need a very good processor to reorder speculative loads before a function call and a branch. As far as I know even Alpha did not do it. A volatile statement will probably be enough in all cases, but you are right that to be really correct a load barrier should be used, and even on a processor where this might matter its cost in the fast path will be basically 0 (so still better than a lock).
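
To make the two barriers concrete, here is a minimal sketch of the counter-based scheme being discussed, written in C++ with std::atomic rather than in D. It is not the runtime's actual code: g_activeThreads, threadStarted, threadFinished and needLock are hypothetical names chosen for illustration, and the sketch assumes a thread's last action is the decrement.

```cpp
#include <atomic>

// Hypothetical stand-in for the runtime's count of live threads.
// It starts at 1 for the main thread.
static std::atomic<int> g_activeThreads{1};

// Assumed to be called by the parent before the new thread begins running,
// so no extra ordering is requested here.
void threadStarted() {
    g_activeThreads.fetch_add(1, std::memory_order_relaxed);
}

// Assumed to be the last thing a terminating thread does.
// The release ordering ensures that all writes the thread made earlier
// are visible to whoever later observes the decremented count with an
// acquire load.
void threadFinished() {
    g_activeThreads.fetch_sub(1, std::memory_order_release);
}

// Fast path, in the spirit of thread_needLock: no mutex, only an acquire
// load. On x86 this compiles to a plain load.
bool needLock() {
    return g_activeThreads.load(std::memory_order_acquire) > 1;
}
```

The release on the decrement pairs with the acquire on the load: if needLock() sees the count back at 1, the writes of the thread that just finished are guaranteed to be visible to the caller as well.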


But again I would say that this optimization is not really worth it (as you also said), even if it is relevant for GUI applications.

:-)


Sean

