On Sun, 18 Mar 2012, Juan Jose Garcia-Ripoll wrote:

> The current philosophy is as follows:
> * The lock is a word-sized field in a structure, atomically updated by the 
> locking thread.
> * The no-wait locking amounts to trying to reset the field atomically (using 
> libatomics), and returning NIL if the field is already set.
> * The wait mechanism consists of three phases
> ++ try to acquire the lock once
> ++ enter ecl_wait_on() which will
> +++ first perform a brief spin lock
> +++ then enter a loop with increasing wait times
> * The unlocking routine is now designed to awake at least one thread. This is 
> done by issuing an interrupt that will disrupt the waiting loop from 
> ecl_wait_on().
> 
> As discussed earlier, the advantage of this user space implementation is that 
> we have full control of the status of the lock even when interrupts occur.

... and later in the thread you mention reimplementing this with a FIFO 
queue and adding logic for threads to place themselves on the end of the 
queue.

I'm no expert, but your approach above matches the pattern I use for 
portable code.  On Linux, I've started using the futex API directly.
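To make the comparison concrete, here is a minimal sketch of that 
pattern using GCC's __sync builtins (the names are illustrative; I 
don't know which atomic-ops calls ECL actually uses):

   /* Lock word: 0 when free, nonzero when held. */
   typedef volatile long lock_word;

   /* No-wait acquire: atomically swap 1 into the word and look at
      what was there before.  If it was already 1, someone else holds
      the lock, so return failure instead of blocking. */
   static int
   try_lock(lock_word *l)
   {
           return __sync_lock_test_and_set(l, 1) == 0;
   }

   static void
   unlock(lock_word *l)
   {
           __sync_lock_release(l);  /* store 0, release semantics */
           /* ...then wake at least one waiter, e.g. by interrupt. */
   }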

The futex philosophy is roughly (see the sketch below)
- use a word-sized field with atomic operations
- when contention is detected, invoke the kernel with the address of
   the word, an expected value, and an operation to perform, e.g.
   - block this thread (only if the word still holds the expected value)
   - wake one waiting thread
   - wake up to n waiting threads
   - etc.
- the kernel dynamically manages the wait queues itself, keyed on the
   address of the lock word, so user space needs no extra bookkeeping
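
In code, the user-facing half of that is surprisingly small 
(Linux-only sketch; error handling and the timeout argument omitted):

   #include <linux/futex.h>
   #include <sys/syscall.h>
   #include <unistd.h>

   /* glibc provides no futex() wrapper; invoke the syscall directly. */
   static long
   futex(int *uaddr, int op, int val)
   {
           return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
   }

   /* Sleep while *word == expected.  The kernel re-checks the word
      under its own lock, so there is no lost-wakeup race between the
      user-space test and going to sleep. */
   static void
   futex_wait(int *word, int expected)
   {
           futex(word, FUTEX_WAIT, expected);
   }

   /* Wake up to n threads blocked on word. */
   static void
   futex_wake(int *word, int n)
   {
           futex(word, FUTEX_WAKE, n);
   }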

One phrase they use is the "thundering herd".  If a large number of 
threads are waiting on a single lock and you wake them all, they will 
all compete for the lock, and all but one will immediately block 
again.  The futex API has a strategy or two to address this (waking a 
bounded number of threads, and requeueing waiters onto another futex), 
but it is unclear to me how successful they are in practice.
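
To illustrate with the futex_wake() helper above: the unlock path 
normally wakes a single waiter, and it is the broadcast that 
stampedes (INT_MAX comes from <limits.h>):

   futex_wake(&word, 1);        /* unlock: wake exactly one waiter  */
   futex_wake(&word, INT_MAX);  /* broadcast: the thundering herd   */

The requeue operations (FUTEX_REQUEUE and the race-free 
FUTEX_CMP_REQUEUE) are the mitigation for broadcasts: wake one waiter 
and transplant the rest onto another futex's wait queue, e.g. from a 
condition word onto its mutex, instead of waking them all.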


> Time-wise, it seems to be slower than the pthreads implementation, reaching 
> 400 connections / second or 2.5 ms / connection in Matthew's tests, vs. 1.5 
> ms / connection for pthreads (ECL 12.1.1). On the other hand, I have not seen 
> any segfaults, and the difference still puzzles me: using a profiler I do 
> not see any more time being spent in the waiting loop (it is 0.4% of the 
> total time)

Kernel-based approaches have the benefit of tighter integration with the 
scheduler.  To me, the futex API looks about right, and you are emulating 
the kernel side of a futex by polling and sending signals in userspace. 
NPTL now uses futexes on Linux, so it's actually getting hard to 
outperform pthreads on that platform.  Occasionally a polling loop still 
beats other kernel calls, but I think that is mostly in cases where 
contention is very short and you don't mind pegging the CPU while 
waiting, or where polling avoids combining heavy constructs like a mutex 
and a condition variable.

I don't know of any portable futex emulation.  SBCL had their lutexes for 
a while, but I don't think they were ever fully debugged; I believe they 
are now being replaced by safepoints.  The BSDs have Linux emulation; you 
might find some good ideas in their documentation.

e.g. 
http://www.freebsd.org/doc/en/articles/linux-emulation/mi.html#FUTEXES


Later,
Daniel

P.S.  Just found this.  Might be useful.  Hopefully faster than strace and 
friends.
http://nptltracetool.sourceforge.net/
