On Thu, Mar 17, 2005 at 03:20:31PM +0000, Jamie Lokier wrote:
> If you change futex_wait to be "atomic", and then have userspace locks
> which _depend_ on that atomicity, it becomes impossible to wait on
> multiple of those locks, or make poll-driven state machines which can
> wait on those locks.

The futex man pages that have been around for years (certainly since
mid 2002) don't document FUTEX_WAIT as a token-passing operation, but
as an atomic operation.  See e.g.
http://www.icewalkers.com/Linux/ManPages/futex-2.html

  FUTEX_WAIT
    This operation atomically verifies that the futex address still
    contains the value given, and sleeps awaiting FUTEX_WAKE on this
    futex address.  If the timeout argument is non-NULL, its contents
    describe the maximum duration of the wait, which is infinite
    otherwise.  For futex(4), this call is executed if decrementing
    the count gave a negative value (indicating contention), and will
    sleep until another process releases the futex and executes the
    FUTEX_WAKE operation.

  RETURN VALUE
    FUTEX_WAIT
      Returns 0 if the process was woken by a FUTEX_WAKE call.  In
      case of timeout, ETIMEDOUT is returned.  If the futex was not
      equal to the expected value, the operation returns EWOULDBLOCK.
      Signals (or other spurious wakeups) cause FUTEX_WAIT to return
      EINTR.

so there very well might be programs other than glibc that depend on
this behaviour.  Given that in most cases the race is not hit every
day (after all, we have been living with it for several years), they
probably wouldn't know there is a problem like that.
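Just to illustrate what such a program relies on, here is a minimal
sketch of a userspace lock whose wait path depends on the documented
compare-and-sleep atomicity.  This is not glibc's code; the syscall
wrapper and the modern GCC __atomic builtins are only used here for
brevity:

    /* 0 = unlocked, 1 = locked, 2 = locked and (possibly) contended */
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static long sys_futex(int *uaddr, int op, int val)
    {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    static void futex_lock(int *f)
    {
        int zero = 0;

        /* uncontended fast path: 0 -> 1 */
        if (__atomic_compare_exchange_n(f, &zero, 1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return;
        /* mark contended and sleep; the kernel rechecks *f == 2 under
           its own lock before queueing us */
        while (__atomic_exchange_n(f, 2, __ATOMIC_ACQUIRE) != 0)
            sys_futex(f, FUTEX_WAIT, 2);
    }

    static void futex_unlock(int *f)
    {
        /* release; wake one waiter only if contention was marked */
        if (__atomic_exchange_n(f, 0, __ATOMIC_RELEASE) == 2)
            sys_futex(f, FUTEX_WAKE, 1);
    }

If FUTEX_WAIT did not recheck the value under the kernel's lock, a
FUTEX_WAKE issued between the exchange and the syscall could be lost
and the waiter could sleep forever.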
> You can do userspace threading and simulate most blocking system calls
> (by making them non-blocking and using poll).

Sure, but then you need to write your own locking as well and can just
use the token-passing property of futexes there.

> It's not a _huge_ loss, but considering it's only Glibc which is
> demanding this and futexes have another property, token-passing, which
> Glibc could be using instead - why not use it?

Because that requires requeue being done with the cv lock held, which
means an extra context switch.

> > @@ -265,7 +264,6 @@ static inline int get_futex_value_locked
> >  	inc_preempt_count();
> >  	ret = __copy_from_user_inatomic(dest, from, sizeof(int));
> >  	dec_preempt_count();
> > -	preempt_check_resched();
> >  
> >  	return ret ? -EFAULT : 0;
> >  }
> 
> inc_preempt_count() and dec_preempt_count() aren't needed, as
> preemption is disabled by the queue spinlocks.  So
> get_futex_value_locked isn't needed any more: with the spinlocks held,
> __get_user will do.

They aren't needed if CONFIG_PREEMPT.  But with !CONFIG_PREEMPT they
are IMHO still needed, as spin_lock/spin_unlock call
preempt_{disable,enable}(), which are nops if !CONFIG_PREEMPT.
__get_user can't be used, though; it should be __get_user_inatomic
(or __copy_from_user_inatomic if the former doesn't exist).

> > [numerous instances of...]
> > +	preempt_check_resched();
> 
> Not required.  The spin unlocks will do this.

True, preempt_check_resched() is a nop if !CONFIG_PREEMPT and for
CONFIG_PREEMPT spin_unlock will handle it.  Will remove them from the
patch.
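With those changes the helper would end up looking roughly like this
(a sketch reconstructed from the hunk quoted above, not a verbatim
copy of any tree):

    /* The explicit preempt count bump is what tells the fault path
     * "we are atomic, do not sleep"; with !CONFIG_PREEMPT the queue
     * spinlocks do not touch the preempt count, so dropping
     * inc/dec_preempt_count() would let a fault try to sleep while
     * the futex queue locks are held. */
    static inline int get_futex_value_locked(int *dest, int __user *from)
    {
        int ret;

        inc_preempt_count();
        ret = __copy_from_user_inatomic(dest, from, sizeof(int));
        dec_preempt_count();
        /* no preempt_check_resched() here: it is a nop for
           !CONFIG_PREEMPT, and for CONFIG_PREEMPT the later
           spin_unlock() reschedules if needed */

        return ret ? -EFAULT : 0;
    }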
> > But with the recent changes to futex.c I think kernel can ensure
> > atomicity for free.
> 
> I agree it would probably not slow the kernel, but I would _strongly_
> prefer that Glibc were fixed to use the token-passing property, if
> Glibc is the driving intention behind this patch - instead of this
> becoming a semantic that application-level users of futex (like
> database and IPC libraries) come to depend on and which can't be
> decomposed into a multiple-waiting form.
> 
> (I admit that the kernel code does look nicer with
> get_futex_value_locked gone, though).
> 
> By the way, do you know of Scott Snyder's recent work on fixing Glibc
> in this way?  He bumped into one of Glibc's currently broken corner
> cases, fixed it (according to the algorithm I gave in November), and
> reported that it works fine with the fix.

I certainly haven't seen his patch.

	Jakub