On Sun, 18 Mar 2012, Juan Jose Garcia-Ripoll wrote:

> The current philosophy is as follows:
> * The lock is a word-sized field in a structure, atomically updated by the
>   locking thread.
> * The no-wait locking amounts to trying to reset the field atomically (using
>   libatomics), and returning NIL if the field is already set.
> * The wait mechanism consists of three phases:
>   ++ try to acquire the lock once
>   ++ enter ecl_wait_on(), which will
>   +++ first perform a brief spin lock
>   +++ then enter a loop with increasing wait times
> * The unlocking routine is now designed to wake at least one thread. This is
>   done by issuing an interrupt that will disrupt the waiting loop in
>   ecl_wait_on().
>
> As discussed earlier, the advantage of this user-space implementation is
> that we have full control of the status of the lock even when interrupts
> occur.
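For concreteness, here is a rough sketch of the scheme you describe: a word-sized field updated atomically, a no-wait try-lock that fails if the field is already set, and a waiting loop that spins briefly and then sleeps with increasing delays. The names (spinlock_t, try_lock, etc.) and the backoff constants are illustrative, not ECL's actual code, and C11 atomics stand in for libatomics:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Illustrative sketch only -- not the ECL implementation. */
typedef struct { atomic_long owner; } spinlock_t;

/* No-wait acquisition: atomically swap in 1; fail (the "return NIL"
 * case) if the field was already set. */
static bool try_lock(spinlock_t *l)
{
    return atomic_exchange(&l->owner, 1) == 0;
}

static void lock(spinlock_t *l)
{
    /* Phase 1: a brief spin. */
    for (int i = 0; i < 100; i++)
        if (try_lock(l))
            return;

    /* Phase 2: loop with increasing wait times.  In the scheme
     * described above, an interrupt from the unlocking thread would
     * cut the nanosleep() short. */
    long ns = 1000;                       /* start at 1 us */
    while (!try_lock(l)) {
        struct timespec ts = { 0, ns };
        nanosleep(&ts, NULL);
        if (ns < 1000000)                 /* cap the backoff at 1 ms */
            ns *= 2;
    }
}

static void unlock(spinlock_t *l)
{
    atomic_store(&l->owner, 0);
    /* The real routine would also interrupt at least one waiting
     * thread here to break it out of its sleep. */
}
```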
... and later in the thread you mention reimplementing this with a FIFO queue
and adding logic for threads to place themselves on the end of the queue.

I'm no expert, but your approach above matches the pattern I use for portable
code. On Linux, I've started using the futex API directly. The futex
philosophy is roughly:

- use a word-sized field with atomic operations
- when contention is detected, invoke the kernel with parameters for the
  expected value of the field and what to do if the kernel sees it:
  - block this thread
  - wake waiting threads
  - wake some waiting threads
  - etc.
- the kernel has a fancy algorithm to dynamically manage waiting lists in a
  known location, keyed on the lock word's address

One phrase they use is the "thundering herd": if a large number of threads
are waiting on a single lock and you wake all of them, then they will all
compete for the lock, and all but one will be blocked again. The futex API
has a strategy or two to address this, but it is unclear to me how successful
it is in practice.

> Time-wise, it seems to be slower than the pthreads implementation, reaching
> 400 connections / second or 2.5 ms / connection in Matthew's tests, vs. 1.5
> ms / connection for pthreads (ECL 12.1.1). On the other hand, I have not
> seen any segfaults, and the difference still puzzles me: using a profiler I
> do not see any more time being spent in the waiting loop (it is 0.4% of the
> total time).

Kernel-based approaches have the benefit of tighter integration with the
scheduler. To me, the futex API looks about right, and you are emulating the
kernel side of a futex by polling and sending signals in userspace. The NPTL
now uses futexes on Linux, so it's actually getting hard to outperform
pthreads on that platform. Occasionally a polling loop still beats other
kernel calls, but I think that is mostly in cases where contention is very
short and you don't mind pegging the CPU while waiting
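To make the philosophy above concrete, here is a deliberately naive two-state futex lock (Linux-only, no error handling, and without the three-state refinement real implementations use to skip the wake call when there are no waiters). FUTEX_WAIT blocks only if the word still holds the value we pass, which is what prevents lost wakeups:

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

/* Thin wrapper: glibc provides no futex() function, only the syscall. */
static long futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void futex_lock(atomic_int *word)
{
    int expected = 0;
    /* Fast path: 0 -> 1 entirely in userspace. */
    while (!atomic_compare_exchange_strong(word, &expected, 1)) {
        /* Contended: ask the kernel to block us, but only if the word
         * still reads 1; otherwise FUTEX_WAIT returns immediately and
         * we retry the CAS. */
        futex(word, FUTEX_WAIT, 1);
        expected = 0;
    }
}

static void futex_unlock(atomic_int *word)
{
    atomic_store(word, 0);
    /* Wake at most one waiter -- not the whole thundering herd. */
    futex(word, FUTEX_WAKE, 1);
}
```

The naive part: this version issues a FUTEX_WAKE syscall on every unlock even when nobody is waiting; the usual fix is a third lock-word state meaning "locked, with waiters".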
(or in cases where polling avoids combining heavy constructs like a mutex and
a condition variable).

I don't know of any portable futex emulation. SBCL had their lutexes for a
while, but I don't think they were ever fully debugged; I believe these are
now being replaced by safe points. The BSDs have Linux emulation; you might
find some good ideas in their documentation, e.g.
http://www.freebsd.org/doc/en/articles/linux-emulation/mi.html#FUTEXES

Later,
Daniel

P.S. Just found this. Might be useful, and hopefully faster than strace and
friends. http://nptltracetool.sourceforge.net/

_______________________________________________
Ecls-list mailing list
Ecls-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ecls-list