Re: [PATCH] FUTEX : new PRIVATE futexes
On Apr 6 2007 07:05, Hugh Dickins wrote: >On Fri, 6 Apr 2007, Nick Piggin wrote: >> >> I like |= for adding flags, it seems less ambiguous. But I guess that's >> a matter of opinion. Hugh seems to like +=, > >Do I? You probably have a shaming example in mind (PAGE_MAPPING_ANON? >that's a hybrid case where using + and - helped minimize the casting); >but in general I'd agree with you that it's |= for setting flag bits. > >Hmm, Eric's FUT_OFF_INODE is hybrid too, that might justify the += If a bit is already set, you can't set it again using +=. Jan -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
On Thu, 2007-04-05 at 19:49 +0200, Eric Dumazet wrote: > Hi all > > I'm pleased to present this patch which improves linux futexes performance > and > scalability, merely avoiding taking mmap_sem rwlock. > > Ulrich agreed with the API and said glibc work could start as soon > as he gets a Fedora kernel with it :) > > Andrew, could we get this in mm as well ? This version is against > 2.6.21-rc5-mm4 > (so supports 64bit futexes) > > In this third version I dropped the NUMA optims and process private hash > table, > to let new API come in and be tested. Good work, Thanks! Acked-by: Peter Zijlstra <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
Eric Dumazet wrote: Nick Piggin a écrit : Did the whole thing just turn out neater when you passed the rwsem? We always know to use current->mm->mmap_sem, so it doesn't seem like a boolean flag would hurt? That's a good question current->mm->mmap_sem being calculated once is a win in itself, because current access is not cheap. It also does the memory access to go through part of the chain in advance, before its use. It does a prefetch() equivalent for free : If current->mm is not in CPU cache, CPU wont stall because next instructions dont depend on it. Fair enough. Current access I think should be cheap though (it is effectively a constant), but I guess it is still improvement. Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it should depend on the size of the operation. Maybe the access_ok check should go outside get_futex_key? If you check again, you'll see that address points to the start of the PAGE, not the real u32/u64 futex address. This checks the PAGE. We can use char, short, int, long, or char[PAGE_SIZE] as long as we know a futex cannot span two pages. Ah, that works. */ key->shared.inode = vma->vm_file->f_path.dentry->d_inode; -key->both.offset++; /* Bit 0 of offset indicates inode-based key. */ +key->both.offset += FUT_OFF_INODE; /* inode-based key. */ if (likely(!(vma->vm_flags & VM_NONLINEAR))) { key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff); I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, and I can't argue with him about style issues ;) Previous code was doing offset++ wich means offset += 1; But it doesn't mean you have to ;) @@ -1598,6 +1656,8 @@ static int futex_wait(unsigned long __us restart->arg1 = val; restart->arg2 = (unsigned long)abs_time; restart->arg3 = (unsigned long)futex64; +if (shared) +restart->arg3 |= 2; Could you make this into a proper flags argument and use #define CONSTANTs for it? Yes, but I'm not sure it will improve readability. Well that bit of code alone is obviously unreadable. restart->arg3 = 0; if (futex64) restart->arg3 |= FUTEX_64; if (shared) restart->arg3 |= FUTEX_SHARED; Maybe a matter of taste. @@ -2377,23 +2455,24 @@ sys_futex64(u64 __user *uaddr, int op, u struct timespec ts; ktime_t t, *tp = NULL; u64 val2 = 0; +int opm = op & FUTEX_CMD_MASK; What's opm stand for? I guess 'm' stands for 'mask' or 'masked' ? Why not call it cmd? (ie. what it is, rather than what you have done to derive it). -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
On Fri, 6 Apr 2007, Nick Piggin wrote: > > I like |= for adding flags, it seems less ambiguous. But I guess that's > a matter of opinion. Hugh seems to like +=, Do I? You probably have a shaming example in mind (PAGE_MAPPING_ANON? that's a hybrid case where using + and - helped minimize the casting); but in general I'd agree with you that it's |= for setting flag bits. Hmm, Eric's FUT_OFF_INODE is hybrid too, that might justify the += > and I can't argue with him about style issues ;) I feel a warm glow. Nobody ever called me a style guru before. Hugh p.s. Please don't interpret this as any useful contribution to reviewing Eric's futex work: seems sensible, but I've hardly looked. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
On Fri, 6 Apr 2007, Nick Piggin wrote: I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, Do I? You probably have a shaming example in mind (PAGE_MAPPING_ANON? that's a hybrid case where using + and - helped minimize the casting); but in general I'd agree with you that it's |= for setting flag bits. Hmm, Eric's FUT_OFF_INODE is hybrid too, that might justify the += and I can't argue with him about style issues ;) I feel a warm glow. Nobody ever called me a style guru before. Hugh p.s. Please don't interpret this as any useful contribution to reviewing Eric's futex work: seems sensible, but I've hardly looked. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
Eric Dumazet wrote: Nick Piggin a écrit : Did the whole thing just turn out neater when you passed the rwsem? We always know to use current-mm-mmap_sem, so it doesn't seem like a boolean flag would hurt? That's a good question current-mm-mmap_sem being calculated once is a win in itself, because current access is not cheap. It also does the memory access to go through part of the chain in advance, before its use. It does a prefetch() equivalent for free : If current-mm is not in CPU cache, CPU wont stall because next instructions dont depend on it. Fair enough. Current access I think should be cheap though (it is effectively a constant), but I guess it is still improvement. Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it should depend on the size of the operation. Maybe the access_ok check should go outside get_futex_key? If you check again, you'll see that address points to the start of the PAGE, not the real u32/u64 futex address. This checks the PAGE. We can use char, short, int, long, or char[PAGE_SIZE] as long as we know a futex cannot span two pages. Ah, that works. */ key-shared.inode = vma-vm_file-f_path.dentry-d_inode; -key-both.offset++; /* Bit 0 of offset indicates inode-based key. */ +key-both.offset += FUT_OFF_INODE; /* inode-based key. */ if (likely(!(vma-vm_flags VM_NONLINEAR))) { key-shared.pgoff = (((address - vma-vm_start) PAGE_SHIFT) + vma-vm_pgoff); I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, and I can't argue with him about style issues ;) Previous code was doing offset++ wich means offset += 1; But it doesn't mean you have to ;) @@ -1598,6 +1656,8 @@ static int futex_wait(unsigned long __us restart-arg1 = val; restart-arg2 = (unsigned long)abs_time; restart-arg3 = (unsigned long)futex64; +if (shared) +restart-arg3 |= 2; Could you make this into a proper flags argument and use #define CONSTANTs for it? Yes, but I'm not sure it will improve readability. Well that bit of code alone is obviously unreadable. restart-arg3 = 0; if (futex64) restart-arg3 |= FUTEX_64; if (shared) restart-arg3 |= FUTEX_SHARED; Maybe a matter of taste. @@ -2377,23 +2455,24 @@ sys_futex64(u64 __user *uaddr, int op, u struct timespec ts; ktime_t t, *tp = NULL; u64 val2 = 0; +int opm = op FUTEX_CMD_MASK; What's opm stand for? I guess 'm' stands for 'mask' or 'masked' ? Why not call it cmd? (ie. what it is, rather than what you have done to derive it). -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
On Thu, 2007-04-05 at 19:49 +0200, Eric Dumazet wrote: Hi all I'm pleased to present this patch which improves linux futexes performance and scalability, merely avoiding taking mmap_sem rwlock. Ulrich agreed with the API and said glibc work could start as soon as he gets a Fedora kernel with it :) Andrew, could we get this in mm as well ? This version is against 2.6.21-rc5-mm4 (so supports 64bit futexes) In this third version I dropped the NUMA optims and process private hash table, to let new API come in and be tested. Good work, Thanks! Acked-by: Peter Zijlstra [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
On Apr 6 2007 07:05, Hugh Dickins wrote: On Fri, 6 Apr 2007, Nick Piggin wrote: I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, Do I? You probably have a shaming example in mind (PAGE_MAPPING_ANON? that's a hybrid case where using + and - helped minimize the casting); but in general I'd agree with you that it's |= for setting flag bits. Hmm, Eric's FUT_OFF_INODE is hybrid too, that might justify the += If a bit is already set, you can't set it again using +=. Jan -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
Nick Piggin a écrit : Hi Eric, Thanks for doing this... It's looking good, I just have some minor comments: Hi Nick, thanks for reviewing. Eric Dumazet wrote: */ -int get_futex_key(void __user *uaddr, union futex_key *key) +int get_futex_key(void __user *uaddr, union futex_key *key, +struct rw_semaphore *shared) Can we pass in something other than the rw_semaphore here? Seeing as it only actually gets used as a flag, it might be nicer just to pass a 0 or 1? And all through the call stack... Did the whole thing just turn out neater when you passed the rwsem? We always know to use current->mm->mmap_sem, so it doesn't seem like a boolean flag would hurt? That's a good question current->mm->mmap_sem being calculated once is a win in itself, because current access is not cheap. It also does the memory access to go through part of the chain in advance, before its use. It does a prefetch() equivalent for free : If current->mm is not in CPU cache, CPU wont stall because next instructions dont depend on it. This means less CPU stall in case current->mm is not in CPU cache. Thats difficult to benchmark it, but you can trust me. A flag means : if (flag) up_read(>mm->mmap_sem) This generates quite a bad code. if (ptr) up_read(ptr) generates *much* better code. So this is a cleanup and a runtime optimization. I dit a similar optimization on commit 163da958ba5282cbf85e8b3dc08e4f51f8b01c5e I invite you to check it : http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=163da958ba5282cbf85e8b3dc08e4f51f8b01c5e { unsigned long address = (unsigned long)uaddr; struct mm_struct *mm = current->mm; @@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un address -= key->both.offset; /* + * PROCESS_PRIVATE futexes are fast. + * As the mm cannot disappear under us and the 'key' only needs + * virtual address, we dont even have to find the underlying vma. + * Note : We do have to check 'address' is a valid user address, + *but access_ok() should be faster than find_vma() + * Note : At this point, address points to the start of page, + *not the real futex address, this is ok. + */ +if (!shared) { +if (!access_ok(VERIFY_WRITE, address, sizeof(int))) +return -EFAULT; Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it should depend on the size of the operation. Maybe the access_ok check should go outside get_futex_key? If you check again, you'll see that address points to the start of the PAGE, not the real u32/u64 futex address. This checks the PAGE. We can use char, short, int, long, or char[PAGE_SIZE] as long as we know a futex cannot span two pages. */ key->shared.inode = vma->vm_file->f_path.dentry->d_inode; -key->both.offset++; /* Bit 0 of offset indicates inode-based key. */ +key->both.offset += FUT_OFF_INODE; /* inode-based key. */ if (likely(!(vma->vm_flags & VM_NONLINEAR))) { key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff); I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, and I can't argue with him about style issues ;) Previous code was doing offset++ wich means offset += 1; I didnt want to hurt Hugh :) EXPORT_SYMBOL_GPL(drop_futex_key_refs); I wonder if it would be worthwhile inlining and likley()ing the private fastpath? Might make it pretty compact... I guess that's something to worry about after glibc gets support. Yes, in a future patch, in about one year :) + +if (!(vma = find_vma(mm, address)) || +vma->vm_start > address || !(vma->vm_flags & VM_WRITE)) +ret = -EFAULT; + +else +switch (handle_mm_fault(mm, vma, address, 1)) { +case VM_FAULT_MINOR: +current->min_flt++; +break; +case VM_FAULT_MAJOR: +current->maj_flt++; +break; +default: +ret = -EFAULT; +} +if (!shared) +up_read(>mmap_sem); +return ret; } /* You've got an extra space after the if (maybe for clarity?). In this situation I prefer putting braces around both the if and the else, and if you get rid of that blank line, it doesn't cost you anything more ;) Oh well... @@ -1598,6 +1656,8 @@ static int futex_wait(unsigned long __us restart->arg1 = val; restart->arg2 = (unsigned long)abs_time; restart->arg3 = (unsigned long)futex64; +if (shared) +restart->arg3 |= 2; Could you make this into a proper flags argument and use #define CONSTANTs for it? Yes, but I'm not sure it will improve readability. @@ -2377,23 +2455,24 @@ sys_futex64(u64 __user *uaddr, int op, u struct timespec ts; ktime_t t, *tp = NULL; u64 val2 = 0; +int opm = op & FUTEX_CMD_MASK; What's opm stand for?
Re: [PATCH] FUTEX : new PRIVATE futexes
Hi Eric, Thanks for doing this... It's looking good, I just have some minor comments: Eric Dumazet wrote: Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]> --- linux-2.6.21-rc5-mm4/kernel/futex.c +++ linux-2.6.21-rc5-mm4-ed/kernel/futex.c @@ -16,6 +16,9 @@ * Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <[EMAIL PROTECTED]> * Copyright (C) 2006 Timesys Corp., Thomas Gleixner <[EMAIL PROTECTED]> * + * PRIVATE futexes by Eric Dumazet + * Copyright (C) 2007 Eric Dumazet <[EMAIL PROTECTED]> + * * Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly * enough at me, Linus for the original (flawed) idea, Matthew * Kirkwood for proof-of-concept implementation. @@ -199,9 +202,12 @@ static inline int match_futex(union fute * Returns: 0, or negative error code. * The key words are stored in *key on success. * - * Should be called with >mm->mmap_sem but NOT any spinlocks. + * shared is NULL for PROCESS_PRIVATE futexes + * For other futexes, it points to >mm->mmap_sem and + * caller must have taken the reader lock. but NOT any spinlocks. */ -int get_futex_key(void __user *uaddr, union futex_key *key) +int get_futex_key(void __user *uaddr, union futex_key *key, + struct rw_semaphore *shared) Can we pass in something other than the rw_semaphore here? Seeing as it only actually gets used as a flag, it might be nicer just to pass a 0 or 1? And all through the call stack... Did the whole thing just turn out neater when you passed the rwsem? We always know to use current->mm->mmap_sem, so it doesn't seem like a boolean flag would hurt? { unsigned long address = (unsigned long)uaddr; struct mm_struct *mm = current->mm; @@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un address -= key->both.offset; /* +* PROCESS_PRIVATE futexes are fast. +* As the mm cannot disappear under us and the 'key' only needs +* virtual address, we dont even have to find the underlying vma. +* Note : We do have to check 'address' is a valid user address, +*but access_ok() should be faster than find_vma() +* Note : At this point, address points to the start of page, +*not the real futex address, this is ok. +*/ + if (!shared) { + if (!access_ok(VERIFY_WRITE, address, sizeof(int))) + return -EFAULT; Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it should depend on the size of the operation. Maybe the access_ok check should go outside get_futex_key? + key->private.mm = mm; + key->private.address = address; + return 0; + } + /* * The futex is hashed differently depending on whether * it's in a shared or private mapping. So check vma first. */ @@ -244,6 +266,7 @@ int get_futex_key(void __user *uaddr, un * mappings of _writable_ handles. */ if (likely(!(vma->vm_flags & VM_MAYSHARE))) { + key->both.offset += FUT_OFF_MMSHARED; /* reference taken on mm */ key->private.mm = mm; key->private.address = address; return 0; @@ -253,7 +276,7 @@ int get_futex_key(void __user *uaddr, un * Linear file mappings are also simple. */ key->shared.inode = vma->vm_file->f_path.dentry->d_inode; - key->both.offset++; /* Bit 0 of offset indicates inode-based key. */ + key->both.offset += FUT_OFF_INODE; /* inode-based key. */ if (likely(!(vma->vm_flags & VM_NONLINEAR))) { key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff); I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, and I can't argue with him about style issues ;) @@ -281,17 +304,19 @@ EXPORT_SYMBOL_GPL(get_futex_key); * Take a reference to the resource addressed by a key. * Can be called while holding spinlocks. * - * NOTE: mmap_sem MUST be held between get_futex_key() and calling this - * function, if it is called at all. mmap_sem keeps key->shared.inode valid. */ inline void get_futex_key_refs(union futex_key *key) { - if (key->both.ptr != 0) { - if (key->both.offset & 1) + if (key->both.ptr == 0) + return; + switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) { + case FUT_OFF_INODE: atomic_inc(>shared.inode->i_count); - else + break; + case FUT_OFF_MMSHARED: atomic_inc(>private.mm->mm_count); - } + break; + } } EXPORT_SYMBOL_GPL(get_futex_key_refs); @@ -301,11 +326,15 @@ EXPORT_SYMBOL_GPL(get_futex_key_refs); */ void drop_futex_key_refs(union futex_key *key) { - if (key->both.ptr != 0) { -
Re: [PATCH] FUTEX : new PRIVATE futexes
On 4/5/07, Eric Dumazet <[EMAIL PROTECTED]> wrote: Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]> --- include/linux/futex.h | 45 +- kernel/futex.c| 294 +--- 2 files changed, 230 insertions(+), 109 deletions(-) I cannot vouch for the whole code but the concept is very sound and definitely along the way the specification allows it: Acked-by: Ulrich Drepper <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
On 4/5/07, Eric Dumazet [EMAIL PROTECTED] wrote: Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- include/linux/futex.h | 45 +- kernel/futex.c| 294 +--- 2 files changed, 230 insertions(+), 109 deletions(-) I cannot vouch for the whole code but the concept is very sound and definitely along the way the specification allows it: Acked-by: Ulrich Drepper [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] FUTEX : new PRIVATE futexes
Hi Eric, Thanks for doing this... It's looking good, I just have some minor comments: Eric Dumazet wrote: Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.21-rc5-mm4/kernel/futex.c +++ linux-2.6.21-rc5-mm4-ed/kernel/futex.c @@ -16,6 +16,9 @@ * Copyright (C) 2006 Red Hat, Inc., Ingo Molnar [EMAIL PROTECTED] * Copyright (C) 2006 Timesys Corp., Thomas Gleixner [EMAIL PROTECTED] * + * PRIVATE futexes by Eric Dumazet + * Copyright (C) 2007 Eric Dumazet [EMAIL PROTECTED] + * * Thanks to Ben LaHaise for yelling hashed waitqueues loudly * enough at me, Linus for the original (flawed) idea, Matthew * Kirkwood for proof-of-concept implementation. @@ -199,9 +202,12 @@ static inline int match_futex(union fute * Returns: 0, or negative error code. * The key words are stored in *key on success. * - * Should be called with current-mm-mmap_sem but NOT any spinlocks. + * shared is NULL for PROCESS_PRIVATE futexes + * For other futexes, it points to current-mm-mmap_sem and + * caller must have taken the reader lock. but NOT any spinlocks. */ -int get_futex_key(void __user *uaddr, union futex_key *key) +int get_futex_key(void __user *uaddr, union futex_key *key, + struct rw_semaphore *shared) Can we pass in something other than the rw_semaphore here? Seeing as it only actually gets used as a flag, it might be nicer just to pass a 0 or 1? And all through the call stack... Did the whole thing just turn out neater when you passed the rwsem? We always know to use current-mm-mmap_sem, so it doesn't seem like a boolean flag would hurt? { unsigned long address = (unsigned long)uaddr; struct mm_struct *mm = current-mm; @@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un address -= key-both.offset; /* +* PROCESS_PRIVATE futexes are fast. +* As the mm cannot disappear under us and the 'key' only needs +* virtual address, we dont even have to find the underlying vma. +* Note : We do have to check 'address' is a valid user address, +*but access_ok() should be faster than find_vma() +* Note : At this point, address points to the start of page, +*not the real futex address, this is ok. +*/ + if (!shared) { + if (!access_ok(VERIFY_WRITE, address, sizeof(int))) + return -EFAULT; Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it should depend on the size of the operation. Maybe the access_ok check should go outside get_futex_key? + key-private.mm = mm; + key-private.address = address; + return 0; + } + /* * The futex is hashed differently depending on whether * it's in a shared or private mapping. So check vma first. */ @@ -244,6 +266,7 @@ int get_futex_key(void __user *uaddr, un * mappings of _writable_ handles. */ if (likely(!(vma-vm_flags VM_MAYSHARE))) { + key-both.offset += FUT_OFF_MMSHARED; /* reference taken on mm */ key-private.mm = mm; key-private.address = address; return 0; @@ -253,7 +276,7 @@ int get_futex_key(void __user *uaddr, un * Linear file mappings are also simple. */ key-shared.inode = vma-vm_file-f_path.dentry-d_inode; - key-both.offset++; /* Bit 0 of offset indicates inode-based key. */ + key-both.offset += FUT_OFF_INODE; /* inode-based key. */ if (likely(!(vma-vm_flags VM_NONLINEAR))) { key-shared.pgoff = (((address - vma-vm_start) PAGE_SHIFT) + vma-vm_pgoff); I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, and I can't argue with him about style issues ;) @@ -281,17 +304,19 @@ EXPORT_SYMBOL_GPL(get_futex_key); * Take a reference to the resource addressed by a key. * Can be called while holding spinlocks. * - * NOTE: mmap_sem MUST be held between get_futex_key() and calling this - * function, if it is called at all. mmap_sem keeps key-shared.inode valid. */ inline void get_futex_key_refs(union futex_key *key) { - if (key-both.ptr != 0) { - if (key-both.offset 1) + if (key-both.ptr == 0) + return; + switch (key-both.offset (FUT_OFF_INODE|FUT_OFF_MMSHARED)) { + case FUT_OFF_INODE: atomic_inc(key-shared.inode-i_count); - else + break; + case FUT_OFF_MMSHARED: atomic_inc(key-private.mm-mm_count); - } + break; + } } EXPORT_SYMBOL_GPL(get_futex_key_refs); @@ -301,11 +326,15 @@ EXPORT_SYMBOL_GPL(get_futex_key_refs); */ void drop_futex_key_refs(union futex_key *key) { - if (key-both.ptr != 0) { - if (key-both.offset
Re: [PATCH] FUTEX : new PRIVATE futexes
Nick Piggin a écrit : Hi Eric, Thanks for doing this... It's looking good, I just have some minor comments: Hi Nick, thanks for reviewing. Eric Dumazet wrote: */ -int get_futex_key(void __user *uaddr, union futex_key *key) +int get_futex_key(void __user *uaddr, union futex_key *key, +struct rw_semaphore *shared) Can we pass in something other than the rw_semaphore here? Seeing as it only actually gets used as a flag, it might be nicer just to pass a 0 or 1? And all through the call stack... Did the whole thing just turn out neater when you passed the rwsem? We always know to use current-mm-mmap_sem, so it doesn't seem like a boolean flag would hurt? That's a good question current-mm-mmap_sem being calculated once is a win in itself, because current access is not cheap. It also does the memory access to go through part of the chain in advance, before its use. It does a prefetch() equivalent for free : If current-mm is not in CPU cache, CPU wont stall because next instructions dont depend on it. This means less CPU stall in case current-mm is not in CPU cache. Thats difficult to benchmark it, but you can trust me. A flag means : if (flag) up_read(current-mm-mmap_sem) This generates quite a bad code. if (ptr) up_read(ptr) generates *much* better code. So this is a cleanup and a runtime optimization. I dit a similar optimization on commit 163da958ba5282cbf85e8b3dc08e4f51f8b01c5e I invite you to check it : http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=163da958ba5282cbf85e8b3dc08e4f51f8b01c5e { unsigned long address = (unsigned long)uaddr; struct mm_struct *mm = current-mm; @@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un address -= key-both.offset; /* + * PROCESS_PRIVATE futexes are fast. + * As the mm cannot disappear under us and the 'key' only needs + * virtual address, we dont even have to find the underlying vma. + * Note : We do have to check 'address' is a valid user address, + *but access_ok() should be faster than find_vma() + * Note : At this point, address points to the start of page, + *not the real futex address, this is ok. + */ +if (!shared) { +if (!access_ok(VERIFY_WRITE, address, sizeof(int))) +return -EFAULT; Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it should depend on the size of the operation. Maybe the access_ok check should go outside get_futex_key? If you check again, you'll see that address points to the start of the PAGE, not the real u32/u64 futex address. This checks the PAGE. We can use char, short, int, long, or char[PAGE_SIZE] as long as we know a futex cannot span two pages. */ key-shared.inode = vma-vm_file-f_path.dentry-d_inode; -key-both.offset++; /* Bit 0 of offset indicates inode-based key. */ +key-both.offset += FUT_OFF_INODE; /* inode-based key. */ if (likely(!(vma-vm_flags VM_NONLINEAR))) { key-shared.pgoff = (((address - vma-vm_start) PAGE_SHIFT) + vma-vm_pgoff); I like |= for adding flags, it seems less ambiguous. But I guess that's a matter of opinion. Hugh seems to like +=, and I can't argue with him about style issues ;) Previous code was doing offset++ wich means offset += 1; I didnt want to hurt Hugh :) EXPORT_SYMBOL_GPL(drop_futex_key_refs); I wonder if it would be worthwhile inlining and likley()ing the private fastpath? Might make it pretty compact... I guess that's something to worry about after glibc gets support. Yes, in a future patch, in about one year :) + +if (!(vma = find_vma(mm, address)) || +vma-vm_start address || !(vma-vm_flags VM_WRITE)) +ret = -EFAULT; + +else +switch (handle_mm_fault(mm, vma, address, 1)) { +case VM_FAULT_MINOR: +current-min_flt++; +break; +case VM_FAULT_MAJOR: +current-maj_flt++; +break; +default: +ret = -EFAULT; +} +if (!shared) +up_read(mm-mmap_sem); +return ret; } /* You've got an extra space after the if (maybe for clarity?). In this situation I prefer putting braces around both the if and the else, and if you get rid of that blank line, it doesn't cost you anything more ;) Oh well... @@ -1598,6 +1656,8 @@ static int futex_wait(unsigned long __us restart-arg1 = val; restart-arg2 = (unsigned long)abs_time; restart-arg3 = (unsigned long)futex64; +if (shared) +restart-arg3 |= 2; Could you make this into a proper flags argument and use #define CONSTANTs for it? Yes, but I'm not sure it will improve readability. @@ -2377,23 +2455,24 @@ sys_futex64(u64 __user *uaddr, int op, u struct timespec ts; ktime_t t, *tp = NULL; u64 val2 = 0; +int opm = op FUTEX_CMD_MASK; What's opm stand for? I guess 'm' stands for