Re: [PATCH 2/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel
--- Al Viro <[EMAIL PROTECTED]> wrote: > On Sat, Oct 27, 2007 at 11:01:12AM +0200, Ahmed S. Darwish wrote: > > The problem here (As discussed in private mails) is that the for loop > > assumes that the beginning of given user-space buffer is the beginning > > of a rule. This leads to situations where the rule becomes "ecret 20", > > or "cret 20" instead of "Secret 20". Big input buffers/files leads > > smack to recieve a rule like "Secret 20" in fragmented chunks like: > > > > write("\nSec", ..) > > write("r", 1, ..) > > write("et 20\n", ..) > > > > Parsing a rule in such tough conditions in _kernel space_ is very > > hard. I began to feel that it will be much easier if we do the parsing > > in a userspace utility and let smack accept only small buffers (80 char). > > For crying out louf, all it takes is a finite state machine... BTW, folks, > your parser *and* input language suck. Really. Silently allowing noise > is Dumb(tm). I was planning to patient Dumb, but if you've already got a copywrite on it I'll have to go a different route. Carp. > Please, write the grammar down and _follow_ _it_. Good idea. I'll do it. > ... > > Come on, people, this is ridiculous - why bother reinventing the wheels for > the stuff that belongs to exercises halfway through any self-respecting > introductory textbook? Scary parser, my arse... That code is in need of cleansing. With bleach. Yes, it's on my list. I managed to get my virtual disk back, found how I messed up on v9, and now have a (long) list of improvements and fixes. No rest for the wicked. Casey Schaufler [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fujitsu application panel driver
Hi Stephen, On Tuesday 23 October 2007 15:55, Stephen Hemminger wrote: > + > +static int apanel_setkeycode(struct input_dev *idev, int scancode, int > keycode) > +{ > + struct apanel *ap = idev->private; > + > + if (keycode < 0 || keycode > KEY_MAX) > + return -EINVAL; > + > + if (scancode < 0 || scancode >= MAX_PANEL_KEYS) > + return -EINVAL; scancode >= idev->keycodemax is prbably better here - we don't want to allow setting keycode for unsupported buttons. > + > + clear_bit(ap->keymap[scancode], idev->keybit); This will not work if one has same code assigned to 2 buttons. Pretty degenerate case, I know... > + ap->keymap[scancode] = keycode; > + set_bit(keycode, idev->keybit); > + return 0; > +} -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: UML building failed in current Linus-tree
On Sun, Oct 28, 2007 at 03:43:43AM +, Al Viro wrote: >On Sun, Oct 28, 2007 at 11:24:41AM +0800, WANG Cong wrote: >> >> Hi, Jeff, Sam! >> >> I just pulled from Linus-tree, and got the following error when building uml. >> >> $ make defconfig ARCH=um >> /home/wangcong/projects/linux-2.6/arch/um/Makefile-i386:32: >> /home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu: No such file or >> directory >> make: *** No rule to make target >> `/home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu'. Stop. >> >> Is this a known problem? Yesterday's Linus-tree was fine. > >diff --git a/arch/um/Kconfig.i386 b/arch/um/Kconfig.i386 >index 9876d80..e0ac74e 100644 >--- a/arch/um/Kconfig.i386 >+++ b/arch/um/Kconfig.i386 Thanks, Al. With your patch, it works fine. Jeff, could you please consider pushing this and Al's another patch[1] into Linus-tree? UML in Linus-tree can't work for a long time. ;( [1] http://lkml.org/lkml/2007/10/21/118 Regards. WANG Cong - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] scatterlist fallout: drivers/scsi/arm/
On Sat, 27 Oct 2007, Russell King wrote: > > FYI, here's what I have queued (cut'n'pasted so probably won't apply): Yes, much nicer. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Sanitize the type of struct user.u_ar0
struct user.u_ar0 is defined to contain a pointer offset on all architectures in which it is defined (all architectures which define an a.out format except SPARC.) However, it has a pointer type in the headers, which is pointless -- is not exported to userspace, and it just makes the code messy. Redefine the field as "unsigned long" (which is the same size as a pointer on all Linux architectures) and change the setting code to user offsetof() instead of hand-coded arithmetic. Cc: Linux Arch Mailing List <[EMAIL PROTECTED]> Cc: Bryan Wu <[EMAIL PROTECTED]> Cc: Roman Zippel <[EMAIL PROTECTED]> Cc: Thomas Gleixner <[EMAIL PROTECTED]> Cc: Ingo Molnar <[EMAIL PROTECTED]> Cc: Richard Henderson <[EMAIL PROTECTED]> Cc: Ivan Kokshaysky <[EMAIL PROTECTED]> Cc: Russell King <[EMAIL PROTECTED]> Cc: Lennert Buytenhek <[EMAIL PROTECTED]> Cc: Håvard Skinnemoen <[EMAIL PROTECTED]> Cc: Mikael Starvik <[EMAIL PROTECTED]> Cc: Yoshinori Sato <[EMAIL PROTECTED]> Cc: Tony Luck <[EMAIL PROTECTED]> Cc: Hirokazu Takata <[EMAIL PROTECTED]> Cc: Ralf Baechle <[EMAIL PROTECTED]> Cc: Paul Mackerras <[EMAIL PROTECTED]> Cc: Martin Schwidefsky <[EMAIL PROTECTED]> Cc: Heiko Carstens <[EMAIL PROTECTED]> Cc: Paul Mundt <[EMAIL PROTECTED]> Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]> --- arch/blackfin/kernel/process.c |2 +- arch/m68k/kernel/process.c |2 +- arch/x86/ia32/ia32_aout.c |2 +- fs/binfmt_aout.c |2 +- include/asm-alpha/user.h |2 +- include/asm-arm/user.h |2 +- include/asm-avr32/user.h |2 +- include/asm-blackfin/user.h|2 +- include/asm-cris/user.h|2 +- include/asm-h8300/user.h |3 +-- include/asm-ia64/user.h|2 +- include/asm-m32r/user.h|2 +- include/asm-m68k/user.h|3 +-- include/asm-mips/user.h|2 +- include/asm-powerpc/user.h |2 +- include/asm-s390/user.h|3 +-- include/asm-sh/user.h |2 +- include/asm-sh64/user.h|2 +- include/asm-v850/user.h|2 +- include/asm-x86/user_32.h |2 +- include/asm-x86/user_64.h |2 +- 21 files changed, 21 insertions(+), 24 deletions(-) diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c index 9124467..5c08004 100644 --- a/arch/blackfin/kernel/process.c +++ b/arch/blackfin/kernel/process.c @@ -257,7 +257,7 @@ void dump_thread(struct pt_regs *regs, struct user *dump) ((unsigned long)(TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT; - dump->u_ar0 = (struct user_regs_struct *)((int)>regs - (int)dump); + dump->u_ar0 = offsetof(struct user, regs); dump->regs.r0 = regs->r0; dump->regs.r1 = regs->r1; diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c index 3ee9186..f85b928 100644 --- a/arch/m68k/kernel/process.c +++ b/arch/m68k/kernel/process.c @@ -335,7 +335,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump) if (dump->start_stack < TASK_SIZE) dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT; - dump->u_ar0 = (struct user_regs_struct *)((int)>regs - (int)dump); + dump->u_ar0 = offsetof(struct user, regs); sw = ((struct switch_stack *)regs) - 1; dump->regs.d1 = regs->d1; dump->regs.d2 = regs->d2; diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c index 731aac1..c489fee 100644 --- a/arch/x86/ia32/ia32_aout.c +++ b/arch/x86/ia32/ia32_aout.c @@ -162,7 +162,7 @@ static int aout_core_dump(long signr, struct pt_regs *regs, struct file *file, u has_dumped = 1; current->flags |= PF_DUMPCORE; strncpy(dump.u_comm, current->comm, sizeof(current->comm)); - dump.u_ar0 = (u32)(((unsigned long)()) - ((unsigned long)())); + dump.u_ar0 = offsetof(struct user32, regs); dump.signal = signr; dump_thread32(regs, ); diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c index e176d19..9a00063 100644 --- a/fs/binfmt_aout.c +++ b/fs/binfmt_aout.c @@ -115,7 +115,7 @@ static int aout_core_dump(long signr, struct pt_regs *regs, struct file *file, u current->flags |= PF_DUMPCORE; strncpy(dump.u_comm, current->comm, sizeof(dump.u_comm)); #ifndef __sparc__ - dump.u_ar0 = (void *)(((unsigned long)()) - ((unsigned long)())); + dump.u_ar0 = offsetof(struct user, regs); #endif dump.signal = signr; dump_thread(regs, ); diff --git a/include/asm-alpha/user.h b/include/asm-alpha/user.h index 7e417fc..a4eb6a4 100644 --- a/include/asm-alpha/user.h +++ b/include/asm-alpha/user.h @@ -39,7 +39,7 @@ struct user { unsigned long start_data; /* data starting address */ unsigned long start_stack;/* stack starting address */ long intsignal; /* signal causing core dump */ -
Re: [2.6 patch] input/serio/hp_sdc.c section fix
On Wednesday 24 October 2007 12:26, Adrian Bunk wrote: > hp_sdc_exit() mustn't be __exit since it's called from the > __init hp_sdc_register(). > Applied, thank you Adrian. -- Dmitry - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: epoll design problems with common fork/exec patterns
> 6) Epoll removes the file from the set, when the *kernel* object gets >closed (internal use-count goes to zero) > > With that in mind, how can the code snippet above trigger a removal from > the epoll set? I don't see how that can be. Suppose I add fd 8 to an epoll set. Suppose fd 5 is a dup of fd 8. Now, I close fd 8. How can fd 8 remain in my epoll set, since there no longer is an fd 8? Events on files registered for epoll notification are reported by descriptor, so the set membership has to be associated (as reflected into userspace) with the descriptor, not the file. For example, consider: 1) Process creates an epoll set, the set gets fd 4. 2) Process creates a socket, it gets fd 5. 3) The process adds fd 5 to set 4. 4) The process forks. 5) The child inherits the epoll set but not the socket. Here the kernel cannot quite do the right thing. Ideally, the parent would still have fd 5 in its version of the epoll set. After all, it has not closed fd 5. However, the child *cannot* see fd 5 in its version of the epoll set since it has no fd 5. An event reported for fd 5 would be nonsense. So it seems the kernel either has to break one of these "would/cannot" requirements, or it has to split the epoll set in two. However, splitting the set into two sets is clearly wrong since the processes should share it. Q6 Will the close of an fd cause it to be removed from all epoll sets automatically? A6 Yes. Note that this talks of the close of an "fd", not a file. The 'close' function in fact closes an fd, as that fd is then reusable. So it sounds like the problem above is solved by removing the fd from the set, but in practice this doesn't happen. I have programs that call 'close' between 'fork' and 'exec' and do not see the socket removed from the poll set. DS - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Dump stack during sysctl registration failure
Alexey Dobriyan <[EMAIL PROTECTED]> writes: > On Sat, Oct 27, 2007 at 10:34:53PM +0400, wrote: >> --- a/arch/powerpc/kernel/idle.c >> +++ b/arch/powerpc/kernel/idle.c >> @@ -122,7 +122,7 @@ static ctl_table powersave_nap_sysctl_root[] = { >> { >> .ctl_name = CTL_KERN, >> .procname = "kernel", >> -.mode = 0755, >> +.mode = 0555, > > Speaking of... > > > [PATCH] Dump stack during sysctl registration failure > > Let's make immediately obvious from where sysctl comes from and > messages itself more noticeable. > > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]> Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]> For ambiguous cases like directories this looks like a serious help. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: UML building failed in current Linus-tree
On Sun, Oct 28, 2007 at 11:24:41AM +0800, WANG Cong wrote: > > Hi, Jeff, Sam! > > I just pulled from Linus-tree, and got the following error when building uml. > > $ make defconfig ARCH=um > /home/wangcong/projects/linux-2.6/arch/um/Makefile-i386:32: > /home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu: No such file or > directory > make: *** No rule to make target > `/home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu'. Stop. > > Is this a known problem? Yesterday's Linus-tree was fine. diff --git a/arch/um/Kconfig.i386 b/arch/um/Kconfig.i386 index 9876d80..e0ac74e 100644 --- a/arch/um/Kconfig.i386 +++ b/arch/um/Kconfig.i386 @@ -1,6 +1,6 @@ menu "Host processor type and features" -source "arch/i386/Kconfig.cpu" +source "arch/x86/Kconfig.cpu" endmenu diff --git a/arch/um/Makefile-i386 b/arch/um/Makefile-i386 index 08433f8..b01dfb0 100644 --- a/arch/um/Makefile-i386 +++ b/arch/um/Makefile-i386 @@ -28,7 +28,7 @@ CONFIG_X86_32 := y export CONFIG_X86_32 # First of all, tune CFLAGS for the specific CPU. This actually sets cflags-y. -include $(srctree)/arch/i386/Makefile.cpu +include $(srctree)/arch/x86/Makefile_32.cpu # prevent gcc from keeping the stack 16 byte aligned. Taken from i386. cflags-y += $(call cc-option,-mpreferred-stack-boundary=2) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 09/10] SLUB: Do our own locking via slab_lock and slab_unlock.
Too many troubles with the bitlocks and we really do not need to do any bitops. Bitops do not effectively retrieve the old value which we want. So use a cmpxchg instead on the arches that allow it. Instead of modifying the page->flags with fake atomic operations we pass the page state as a parameter to functions. No function uses the slab state if the page lock is held. Function must wait until the lock is cleared. Thus we can defer the update of page->flags until slab processing is complete. The "unlock" operation is then simply updating page->flags. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 324 +++--- 1 file changed, 187 insertions(+), 137 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-27 07:56:03.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-27 07:56:52.0 -0700 @@ -101,6 +101,7 @@ */ #define FROZEN (1 << PG_active) +#define LOCKED (1 << PG_locked) #ifdef CONFIG_SLUB_DEBUG #define SLABDEBUG (1 << PG_error) @@ -108,36 +109,6 @@ #define SLABDEBUG 0 #endif -static inline int SlabFrozen(struct page *page) -{ - return page->flags & FROZEN; -} - -static inline void SetSlabFrozen(struct page *page) -{ - page->flags |= FROZEN; -} - -static inline void ClearSlabFrozen(struct page *page) -{ - page->flags &= ~FROZEN; -} - -static inline int SlabDebug(struct page *page) -{ - return page->flags & SLABDEBUG; -} - -static inline void SetSlabDebug(struct page *page) -{ - page->flags |= SLABDEBUG; -} - -static inline void ClearSlabDebug(struct page *page) -{ - page->flags &= ~SLABDEBUG; -} - /* * Issues still to be resolved: * @@ -818,11 +789,12 @@ static void trace(struct kmem_cache *s, /* * Tracking of fully allocated slabs for debugging purposes. */ -static void add_full(struct kmem_cache *s, struct page *page) +static void add_full(struct kmem_cache *s, struct page *page, + unsigned long state) { struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - if (!SlabDebug(page) || !(s->flags & SLAB_STORE_USER)) + if (!(state & SLABDEBUG) || !(s->flags & SLAB_STORE_USER)) return; spin_lock(>list_lock); list_add(>lru, >full); @@ -894,7 +866,7 @@ bad: } static noinline int free_debug_processing(struct kmem_cache *s, - struct page *page, void *object, void *addr) + struct page *page, void *object, void *addr, unsigned long state) { if (!check_slab(s, page)) goto fail; @@ -930,7 +902,7 @@ static noinline int free_debug_processin } /* Special debug activities for freeing objects */ - if (!SlabFrozen(page) && page->freelist == page->end) + if (!(state & FROZEN) && page->freelist == page->end) remove_full(s, page); if (s->flags & SLAB_STORE_USER) set_track(s, object, TRACK_FREE, addr); @@ -1047,7 +1019,8 @@ static inline int slab_pad_check(struct { return 1; } static inline int check_object(struct kmem_cache *s, struct page *page, void *object, int active) { return 1; } -static inline void add_full(struct kmem_cache *s, struct page *page) {} +static inline void add_full(struct kmem_cache *s, struct page *page, + unsigned long state) {} static inline unsigned long kmem_cache_flags(unsigned long objsize, unsigned long flags, const char *name, void (*ctor)(struct kmem_cache *, void *)) @@ -1105,6 +1078,7 @@ static noinline struct page *new_slab(st void *start; void *last; void *p; + unsigned long state; BUG_ON(flags & GFP_SLAB_BUG_MASK); @@ -1117,11 +1091,12 @@ static noinline struct page *new_slab(st if (n) atomic_long_inc(>nr_slabs); page->slab = s; - page->flags |= 1 << PG_slab; + state = 1 << PG_slab; if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | SLAB_TRACE)) - SetSlabDebug(page); + state |= SLABDEBUG; + page->flags |= state; start = page_address(page); page->end = start + 1; @@ -1147,13 +1122,13 @@ static void __free_slab(struct kmem_cach { int pages = 1 << s->order; - if (unlikely(SlabDebug(page))) { + if (unlikely(page->flags & SLABDEBUG)) { void *p; slab_pad_check(s, page); for_each_object(p, s, slab_address(page)) check_object(s, page, p, 0); - ClearSlabDebug(page); + page->flags &= ~SLABDEBUG; } mod_zone_page_state(page_zone(page), @@ -1196,27 +1171,73 @@ static void discard_slab(struct kmem_cac free_slab(s,
[patch 10/10] SLUB: Restructure slab alloc
Restructure slab_alloc so that the code flows in the sequence it is usually executed. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 40 1 file changed, 24 insertions(+), 16 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-27 07:58:07.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-27 07:58:36.0 -0700 @@ -1580,16 +1580,28 @@ static void *__slab_alloc(struct kmem_ca local_irq_save(flags); preempt_enable_no_resched(); #endif - if (!c->page) - goto new_slab; + if (likely(c->page)) { + state = slab_lock(c->page); + + if (unlikely(node_match(c, node) && + c->page->freelist != c->page->end)) + goto load_freelist; + + deactivate_slab(s, c, state); + } + +another_slab: + state = get_partial(s, c, gfpflags, node); + if (!state) + goto grow_slab; - state = slab_lock(c->page); - if (unlikely(!node_match(c, node))) - goto another_slab; load_freelist: - object = c->page->freelist; - if (unlikely(object == c->page->end)) - goto another_slab; + /* +* slabs from the partial list must have at least +* one free object. +*/ + VM_BUG_ON(c->page->freelist == c->page->end); + if (unlikely(state & SLABDEBUG)) goto debug; @@ -1607,20 +1619,16 @@ out: #endif return object; -another_slab: - deactivate_slab(s, c, state); - -new_slab: - state = get_partial(s, c, gfpflags, node); - if (state) - goto load_freelist; - +/* Extend the slabcache with a new slab */ +grow_slab: state = get_new_slab(s, , gfpflags, node); if (state) goto load_freelist; object = NULL; goto out; + +/* Perform debugging */ debug: object = c->page->freelist; if (!alloc_debug_processing(s, c->page, object, addr)) -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 07/10] SLUB: Avoid referencing kmem_cache structure in __slab_alloc
There is the need to use the objects per slab in the first part of __slab_alloc() which is still pretty hot. Copy the number of objects per slab into the kmem_cache_cpu structure. That way we can get the value from a cache line that we already need to touch. This brings the kmem_cache_cpu structure up to 4 even words. There is no increase in the size of kmem_cache_cpu since the size of object is rounded to the next word. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/slub_def.h |1 + mm/slub.c|3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6/include/linux/slub_def.h === --- linux-2.6.orig/include/linux/slub_def.h 2007-10-26 19:09:16.0 -0700 +++ linux-2.6/include/linux/slub_def.h 2007-10-27 07:55:12.0 -0700 @@ -17,6 +17,7 @@ struct kmem_cache_cpu { int node; unsigned int offset; unsigned int objsize; + unsigned int objects; }; struct kmem_cache_node { Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-27 07:52:12.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-27 07:55:12.0 -0700 @@ -1512,7 +1512,7 @@ load_freelist: object = c->page->freelist; c->freelist = object[c->offset]; - c->page->inuse = s->objects; + c->page->inuse = c->objects; c->page->freelist = c->page->end; c->node = page_to_nid(c->page); unlock_out: @@ -1896,6 +1896,7 @@ static void init_kmem_cache_cpu(struct k c->node = 0; c->offset = s->offset / sizeof(void *); c->objsize = s->objsize; + c->objects = s->objects; } static void init_kmem_cache_node(struct kmem_cache_node *n) -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 08/10] SLUB: Optional fast path using cmpxchg_local
Provide an alternate implementation of the SLUB fast paths for alloc and free using cmpxchg_local. The cmpxchg_local fast path is selected for arches that have CONFIG_FAST_CMPXCHG_LOCAL set. An arch should only set CONFIG_FAST_CMPXCHG_LOCAL if the cmpxchg_local is faster than an interrupt enable/disable sequence. This is known to be true for both x86 platforms so set FAST_CMPXCHG_LOCAL for both arches. Not all arches can support fast cmpxchg operations. Typically the architecture must have an optimized cmpxchg instruction. The cmpxchg fast path makes no sense on platforms whose cmpxchg is slower than interrupt enable/disable (like f.e. IA64). The advantages of a cmpxchg_local based fast path are: 1. Lower cycle count (30%-60% faster) 2. There is no need to disable and enable interrupts on the fast path. Currently interrupts have to be disabled and enabled on every slab operation. This is likely saving a significant percentage of interrupt off / on sequences in the kernel. 3. The disposal of freed slabs can occur with interrupts enabled. The alternate path is realized using #ifdef's. Several attempts to do the same with macros and in line functions resulted in a mess (in particular due to the strange way that local_interrupt_save() handles its argument and due to the need to define macros/functions that sometimes disable interrupts and sometimes do something else. The macro based approaches made it also difficult to preserve the optimizations for the non cmpxchg paths). #ifdef seems to be the way to go here to have a readable source. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- arch/x86/Kconfig.i386 |4 ++ arch/x86/Kconfig.x86_64 |4 ++ mm/slub.c | 71 ++-- 3 files changed, 77 insertions(+), 2 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-27 10:39:07.583665939 -0700 +++ linux-2.6/mm/slub.c 2007-10-27 10:40:19.710415861 -0700 @@ -1496,7 +1496,12 @@ static void *__slab_alloc(struct kmem_ca { void **object; struct page *new; +#ifdef CONFIG_FAST_CMPXCHG_LOCAL + unsigned long flags; + local_irq_save(flags); + preempt_enable_no_resched(); +#endif if (!c->page) goto new_slab; @@ -1518,6 +1523,10 @@ load_freelist: unlock_out: slab_unlock(c->page); out: +#ifdef CONFIG_FAST_CMPXCHG_LOCAL + preempt_disable(); + local_irq_restore(flags); +#endif return object; another_slab: @@ -1592,9 +1601,26 @@ static void __always_inline *slab_alloc( gfp_t gfpflags, int node, void *addr) { void **object; - unsigned long flags; struct kmem_cache_cpu *c; +#ifdef CONFIG_FAST_CMPXCHG_LOCAL + c = get_cpu_slab(s, get_cpu()); + do { + object = c->freelist; + if (unlikely(is_end(object) || !node_match(c, node))) { + object = __slab_alloc(s, gfpflags, node, addr, c); + if (unlikely(!object)) { + put_cpu(); + goto out; + } + break; + } + } while (cmpxchg_local(>freelist, object, object[c->offset]) + != object); + put_cpu(); +#else + unsigned long flags; + local_irq_save(flags); c = get_cpu_slab(s, smp_processor_id()); if (unlikely((is_end(c->freelist)) || !node_match(c, node))) { @@ -1609,6 +1635,7 @@ static void __always_inline *slab_alloc( c->freelist = object[c->offset]; } local_irq_restore(flags); +#endif if (unlikely((gfpflags & __GFP_ZERO))) memset(object, 0, c->objsize); @@ -1644,6 +1671,11 @@ static void __slab_free(struct kmem_cach void *prior; void **object = (void *)x; +#ifdef CONFIG_FAST_CMPXCHG_LOCAL + unsigned long flags; + + local_irq_save(flags); +#endif slab_lock(page); if (unlikely(SlabDebug(page))) @@ -1669,6 +1701,9 @@ checks_ok: out_unlock: slab_unlock(page); +#ifdef CONFIG_FAST_CMPXCHG_LOCAL + local_irq_restore(flags); +#endif return; slab_empty: @@ -1679,6 +1714,9 @@ slab_empty: remove_partial(s, page); slab_unlock(page); +#ifdef CONFIG_FAST_CMPXCHG_LOCAL + local_irq_restore(flags); +#endif discard_slab(s, page); return; @@ -1703,9 +1741,37 @@ static void __always_inline slab_free(st struct page *page, void *x, void *addr) { void **object = (void *)x; - unsigned long flags; struct kmem_cache_cpu *c; +#ifdef CONFIG_FAST_CMPXCHG_LOCAL + void **freelist; + + c = get_cpu_slab(s, get_cpu()); + debug_check_no_locks_freed(object, s->objsize); +
[patch 06/10] SLUB: Provide unique end marker for each slab
Currently we use the NULL pointer to signal that there are no more objects. However the NULL pointers of all slabs match in contrast to the pointers to the real objects which are distinctive for each slab page. Change the end pointer to be simply a pointer to the first object with bit 0 set. That way all end pointers are different for each slab. This is necessary to allow reliable end marker identification for the next patch that implements a fast path without the need to disable interrupts. Bring back the use of the mapping field by SLUB since we would otherwise have to call a relatively expensive function page_address() in __slab_alloc(). Use of the mapping field allows avoiding calling page_address() in various other functions as well. There is no need to change the page_mapping() function since bit 0 is set on the mapping as also for anonymous pages. page_mapping(slab_page) will therefore still return NULL although the mapping field is overloaded. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- include/linux/mm_types.h |5 ++- mm/slub.c| 72 ++- 2 files changed, 51 insertions(+), 26 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-26 19:09:03.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-27 07:52:12.0 -0700 @@ -277,15 +277,32 @@ static inline struct kmem_cache_cpu *get #endif } +/* + * The end pointer in a slab is special. It points to the first object in the + * slab but has bit 0 set to mark it. + * + * Note that SLUB relies on page_mapping returning NULL for pages with bit 0 + * in the mapping set. + */ +static inline int is_end(void *addr) +{ + return (unsigned long)addr & PAGE_MAPPING_ANON; +} + +void *slab_address(struct page *page) +{ + return page->end - PAGE_MAPPING_ANON; +} + static inline int check_valid_pointer(struct kmem_cache *s, struct page *page, const void *object) { void *base; - if (!object) + if (object == page->end) return 1; - base = page_address(page); + base = slab_address(page); if (object < base || object >= base + s->objects * s->size || (object - base) % s->size) { return 0; @@ -318,7 +335,8 @@ static inline void set_freepointer(struc /* Scan freelist */ #define for_each_free_object(__p, __s, __free) \ - for (__p = (__free); __p; __p = get_freepointer((__s), __p)) + for (__p = (__free); (__p) != page->end; __p = get_freepointer((__s),\ + __p)) /* Determine object index from a given position */ static inline int slab_index(void *p, struct kmem_cache *s, void *addr) @@ -470,7 +488,7 @@ static void slab_fix(struct kmem_cache * static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p) { unsigned int off; /* Offset of last byte */ - u8 *addr = page_address(page); + u8 *addr = slab_address(page); print_tracking(s, p); @@ -648,7 +666,7 @@ static int slab_pad_check(struct kmem_ca if (!(s->flags & SLAB_POISON)) return 1; - start = page_address(page); + start = slab_address(page); end = start + (PAGE_SIZE << s->order); length = s->objects * s->size; remainder = end - (start + length); @@ -715,7 +733,7 @@ static int check_object(struct kmem_cach * of the free objects in this slab. May cause * another error because the object count is now wrong. */ - set_freepointer(s, p, NULL); + set_freepointer(s, p, page->end); return 0; } return 1; @@ -749,18 +767,18 @@ static int on_freelist(struct kmem_cache void *fp = page->freelist; void *object = NULL; - while (fp && nr <= s->objects) { + while (fp != page->end && nr <= s->objects) { if (fp == search) return 1; if (!check_valid_pointer(s, page, fp)) { if (object) { object_err(s, page, object, "Freechain corrupt"); - set_freepointer(s, object, NULL); + set_freepointer(s, object, page->end); break; } else { slab_err(s, page, "Freepointer corrupt"); - page->freelist = NULL; + page->freelist = page->end; page->inuse = s->objects; slab_fix(s, "Freelist cleared"); return 0; @@ -870,7 +888,7 @@ bad: */ slab_fix(s, "Marking all objects used");
[patch 00/10] SLUB: SMP regression tests on Dual Xeon E5345 (8p) and new performance patches
Recent reports from Intel indicated that there were regression on SMP benchmarks vs. SLAB. This is a discussion of performance results and some patches are attached to fix various issues. The patches are also available via git pull from git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git performance SLAB and SLUB are fundamentally different architectures. SLAB batches multiple objects on queues. The movement between queues is protected by a single lock (at least in the SMP configuration). SLAB can move an arbitrary amount of objects by taking the list_lock. Integration of objects into the slabs is deferred as much as possible while objects circle on various slab queues. SLUB's design is to directly integrate or extract the objects from the slabs without going through intermediate queues. Thus the overhead is eliminated. SLUB has a lock in each slab allowing fine grained locking. Centralized locks are rarely taken. SLUB cannot batch objects to optimize lock use. Instead a whole slab is assigned to a processor. Allocations and frees can then occur from the CPU slab without taking the slab lock. However, that is limited to the number of objects that fit into a slab in contrast to SLAB which can extract objects from multiple slabs and put them on a per CPU queue. If SLUB is freeing an objects then the per CPU slab can only be used if the object is part of the CPU slab. This is usually the case for short lived allocations. Long lived allocations and objects allocated on other CPUs will need to use the slow path where the slab_lock must be taken to synchronize the free. This makes the slab_free() path particularly problematic in SMP contexts. Optimization in SLUB is therefore mainly optimization of locking and of the execution code paths. The following patches optimize locking further by using a cmpxchg_local in the fast path and by avoiding stores to page struct fields etc to address regressions that we see under SMP. Another fundamental distinction between SLAB and SLUB is that SLAB was designed with SMP in mind. NUMA was a later add-on that added a significant complexity. SLUB was written for NUMA. NUMA support is native. The same slab_free() path that is problematic under SMP is effectively dealing with the alien cache problem that SLAB has under NUMA and is increasing performance of remote free operations significantly. The cpu_slab concept makes the determination of NUMA locality of objects simpler since we can match on the page that an object belongs to and move the whole page of objects in a NUMA aware fashion instead of the individual objects in the queues of SLAB. The fine grained locking is also important for SMP system with a large number of processors. SLAB can put lots of objects on its queues. However, current processors can take a large number of objects off the queues in a short time period. As a result we see significant lock contention using SLAB during parallel operations on the 8p SMP machine that is investigated here. SLAB has less problems scaling on NUMA with a more limited number of processors per node because SLAB will then use node based locks instead of global locks. Tests were run with 4 different kernels: SLAB = 2.6.24-rc1 configured to run SLAB SLUB = 2.6.24-rc1 configured to run SLUB SLUB+ = 2.6.24-rc1 patched with the following patches. SLUB-o = SLUB+ booted with slub_max_order=3 slub_min_objects=20 The SLAB tests result in the baseline to work against. SLUB is the current state of 2.6.24. SLUB+ is an version of SLUB that was optimized to run on the 8p SMP box after observing some of the performance issues. SLUB-o is useful to see what effect the use of higher order pages has on performance. All tests are done by running 1 operations on each processor. The time needed is measured using TSC times tamps. All measurements are in cycle counts. The higher the cycle count the more time the allocator needs to perform an operation. The lower the count the better the performance of the allocator. Test A: Single threaded kmalloc === A single cpu is running and is allocating 1 objects of various sizes. SLABSLUBSLUB+ SLUB-o 896 86 45 44 2 * 1684 92 49 48 3284 106 61 59 +++ 64102 129 82 88 ++ 128147 226 188 181 -- 256200 248 207 285 - 512300 301 260 209 ++ 1024416 440 398 264 ++ 2048720 542 530 390 +++ 40961254342 342 336 3 * SLUB passes 4k allocations directly through to the page allocator which is more efficient at handling page sized allocations than SLABs handling of them. 4k (or page sized) allocations will be special throughout these tests. We see a performance degradation vs. SLAB in the middle range that is reduced by the patch set. The cmpxchg_local operation used in SLUB+
[patch 05/10] SLUB: __slab_alloc() exit path consolidation
Use a single exit path by using goto's to the hottest exit path. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c |8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-25 19:38:14.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-25 19:38:47.0 -0700 @@ -1493,7 +1493,9 @@ load_freelist: c->page->inuse = s->objects; c->page->freelist = NULL; c->node = page_to_nid(c->page); +unlock_out: slab_unlock(c->page); +out: return object; another_slab: @@ -1541,7 +1543,8 @@ new_slab: c->page = new; goto load_freelist; } - return NULL; + object = NULL; + goto out; debug: object = c->page->freelist; if (!alloc_debug_processing(s, c->page, object, addr)) @@ -1550,8 +1553,7 @@ debug: c->page->inuse++; c->page->freelist = object[c->offset]; c->node = -1; - slab_unlock(c->page); - return object; + goto unlock_out; } /* -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 04/10] SLUB: Avoid checking for a valid object before zeroing on the fast path
The fast path always results in a valid object. Move the check for the NULL pointer to the slow branch that calls __slab_alloc. Only __slab_alloc can return NULL if there is no memory available anymore and that case is exceedingly rare. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-25 19:37:38.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-25 19:38:14.0 -0700 @@ -1573,19 +1573,22 @@ static void __always_inline *slab_alloc( local_irq_save(flags); c = get_cpu_slab(s, smp_processor_id()); - if (unlikely(!c->freelist || !node_match(c, node))) + if (unlikely(!c->freelist || !node_match(c, node))) { object = __slab_alloc(s, gfpflags, node, addr, c); - - else { + if (unlikely(!object)) { + local_irq_restore(flags); + goto out; + } + } else { object = c->freelist; c->freelist = object[c->offset]; } local_irq_restore(flags); - if (unlikely((gfpflags & __GFP_ZERO) && object)) + if (unlikely((gfpflags & __GFP_ZERO))) memset(object, 0, c->objsize); - +out: return object; } -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 03/10] SLUB: Move kmem_cache_node determination into add_full and add_partial
The kmem_cache_node determination can be moved into add_full() and add_partial(). This removes some code from the slab_free() slow path and reduces the register overhead that has to be managed in the slow path. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 29 + 1 file changed, 17 insertions(+), 12 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-25 19:36:59.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-25 19:37:38.0 -0700 @@ -800,8 +800,12 @@ static void trace(struct kmem_cache *s, /* * Tracking of fully allocated slabs for debugging purposes. */ -static void add_full(struct kmem_cache_node *n, struct page *page) +static void add_full(struct kmem_cache *s, struct page *page) { + struct kmem_cache_node *n = get_node(s, page_to_nid(page)); + + if (!SlabDebug(page) || !(s->flags & SLAB_STORE_USER)) + return; spin_lock(>list_lock); list_add(>lru, >full); spin_unlock(>list_lock); @@ -1025,7 +1029,7 @@ static inline int slab_pad_check(struct { return 1; } static inline int check_object(struct kmem_cache *s, struct page *page, void *object, int active) { return 1; } -static inline void add_full(struct kmem_cache_node *n, struct page *page) {} +static inline void add_full(struct kmem_cache *s, struct page *page) {} static inline unsigned long kmem_cache_flags(unsigned long objsize, unsigned long flags, const char *name, void (*ctor)(struct kmem_cache *, void *)) @@ -1198,9 +1202,11 @@ static __always_inline int slab_trylock( /* * Management of partially allocated slabs */ -static void add_partial(struct kmem_cache_node *n, +static void add_partial(struct kmem_cache *s, struct page *page, int tail) { + struct kmem_cache_node *n = get_node(s, page_to_nid(page)); + spin_lock(>list_lock); n->nr_partial++; if (tail) @@ -1336,19 +1342,18 @@ static struct page *get_partial(struct k */ static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail) { - struct kmem_cache_node *n = get_node(s, page_to_nid(page)); - ClearSlabFrozen(page); if (page->inuse) { if (page->freelist) - add_partial(n, page, tail); - else if (SlabDebug(page) && (s->flags & SLAB_STORE_USER)) - add_full(n, page); + add_partial(s, page, tail); + else + add_full(s, page); slab_unlock(page); } else { - if (n->nr_partial < MIN_PARTIAL) { + if (get_node(s, page_to_nid(page))->nr_partial + < MIN_PARTIAL) { /* * Adding an empty slab to the partial slabs in order * to avoid page allocator overhead. This slab needs @@ -1357,7 +1362,7 @@ static void unfreeze_slab(struct kmem_ca * partial list stays small. kmem_cache_shrink can * reclaim empty slabs from the partial list. */ - add_partial(n, page, 1); + add_partial(s, page, 1); slab_unlock(page); } else { slab_unlock(page); @@ -1633,7 +1638,7 @@ checks_ok: * then add it. */ if (unlikely(!prior)) - add_partial(get_node(s, page_to_nid(page)), page, 0); + add_partial(s, page, 0); out_unlock: slab_unlock(page); @@ -2041,7 +2046,7 @@ static struct kmem_cache_node *early_kme #endif init_kmem_cache_node(n); atomic_long_inc(>nr_slabs); - add_partial(n, page, 0); + add_partial(kmalloc_caches, page, 0); return n; } -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 02/10] SLUB: Noinline some functions to avoid them being folded into alloc/free
Some function tend to get folded into __slab_free and __slab_alloc although they are rarely called. They cause register pressure that leads to bad code generation. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-25 19:36:10.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-25 19:36:59.0 -0700 @@ -831,8 +831,8 @@ static void setup_object_debug(struct km init_tracking(s, object); } -static int alloc_debug_processing(struct kmem_cache *s, struct page *page, - void *object, void *addr) +static noinline int alloc_debug_processing(struct kmem_cache *s, + struct page *page, void *object, void *addr) { if (!check_slab(s, page)) goto bad; @@ -871,8 +871,8 @@ bad: return 0; } -static int free_debug_processing(struct kmem_cache *s, struct page *page, - void *object, void *addr) +static noinline int free_debug_processing(struct kmem_cache *s, + struct page *page, void *object, void *addr) { if (!check_slab(s, page)) goto fail; @@ -1075,7 +1075,8 @@ static void setup_object(struct kmem_cac s->ctor(s, object); } -static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node) +static noinline struct page *new_slab(struct kmem_cache *s, + gfp_t flags, int node) { struct page *page; struct kmem_cache_node *n; @@ -1209,7 +1210,7 @@ static void add_partial(struct kmem_cach spin_unlock(>list_lock); } -static void remove_partial(struct kmem_cache *s, +static noinline void remove_partial(struct kmem_cache *s, struct page *page) { struct kmem_cache_node *n = get_node(s, page_to_nid(page)); -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch 01/10] SLUB: Consolidate add_partial and add_partial_tail to one function
Add a parameter to add_partial instead of having separate functions. That allows the detailed control from multiple places when putting slabs back to the partial list. If we put slabs back to the front then they are likely immediately used for allocations. If they are put at the end then we can maximize the time that the partial slabs spent without allocations. When deactivating slab we can put the slabs that had remote objects freed to them at the end of the list so that the cache lines can cool down. Slabs that had objects from the local cpu freed to them are put in the front of the list to be reused ASAP in order to exploit the cache hot state. [This patch is already in mm] Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> --- mm/slub.c | 31 +++ 1 file changed, 15 insertions(+), 16 deletions(-) Index: linux-2.6/mm/slub.c === --- linux-2.6.orig/mm/slub.c2007-10-24 08:33:01.0 -0700 +++ linux-2.6/mm/slub.c 2007-10-24 09:19:52.0 -0700 @@ -1197,19 +1197,15 @@ /* * Management of partially allocated slabs */ -static void add_partial_tail(struct kmem_cache_node *n, struct page *page) +static void add_partial(struct kmem_cache_node *n, + struct page *page, int tail) { spin_lock(>list_lock); n->nr_partial++; - list_add_tail(>lru, >partial); - spin_unlock(>list_lock); -} - -static void add_partial(struct kmem_cache_node *n, struct page *page) -{ - spin_lock(>list_lock); - n->nr_partial++; - list_add(>lru, >partial); + if (tail) + list_add_tail(>lru, >partial); + else + list_add(>lru, >partial); spin_unlock(>list_lock); } @@ -1337,7 +1333,7 @@ * * On exit the slab lock will have been dropped. */ -static void unfreeze_slab(struct kmem_cache *s, struct page *page) +static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail) { struct kmem_cache_node *n = get_node(s, page_to_nid(page)); @@ -1345,7 +1341,7 @@ if (page->inuse) { if (page->freelist) - add_partial(n, page); + add_partial(n, page, tail); else if (SlabDebug(page) && (s->flags & SLAB_STORE_USER)) add_full(n, page); slab_unlock(page); @@ -1360,7 +1356,7 @@ * partial list stays small. kmem_cache_shrink can * reclaim empty slabs from the partial list. */ - add_partial_tail(n, page); + add_partial(n, page, 1); slab_unlock(page); } else { slab_unlock(page); @@ -1375,6 +1371,7 @@ static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c) { struct page *page = c->page; + int tail = 1; /* * Merge cpu freelist into freelist. Typically we get here * because both freelists are empty. So this is unlikely @@ -1383,6 +1380,8 @@ while (unlikely(c->freelist)) { void **object; + tail = 0; /* Hot objects. Put the slab first */ + /* Retrieve object from cpu_freelist */ object = c->freelist; c->freelist = c->freelist[c->offset]; @@ -1393,7 +1392,7 @@ page->inuse--; } c->page = NULL; - unfreeze_slab(s, page); + unfreeze_slab(s, page, tail); } static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c) @@ -1633,7 +1632,7 @@ * then add it. */ if (unlikely(!prior)) - add_partial(get_node(s, page_to_nid(page)), page); + add_partial(get_node(s, page_to_nid(page)), page, 0); out_unlock: slab_unlock(page); @@ -2041,7 +2040,7 @@ #endif init_kmem_cache_node(n); atomic_long_inc(>nr_slabs); - add_partial(n, page); + add_partial(n, page, 0); return n; } -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
UML building failed in current Linus-tree
Hi, Jeff, Sam! I just pulled from Linus-tree, and got the following error when building uml. $ make defconfig ARCH=um /home/wangcong/projects/linux-2.6/arch/um/Makefile-i386:32: /home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu: No such file or directory make: *** No rule to make target `/home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu'. Stop. Is this a known problem? Yesterday's Linus-tree was fine. Regards. -- May the Source Be With You. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Mapping PCI memory to user-space
> I am writing a driver to map a PCI board memory space (pcibar2) into a > user-space vma via 'mmap'. What is the relationship between the address > returned from ioremap and the type of address needed in the > 'io_remap_page_range' or 'remap_pfn_range' functions? How about the > following? (I am developing under RHEL4 and a 2.6.9 kernel) There is no relationship between the address returned from ioremap and what you pass into io_remap_page_range(). ioremap gives you a kernel virtual address for the PCI address you remap. io_remap_page_range() creates a userspace mapping in the same way, and you should pass in the PCI address exactly the same way you pass in the PCI address into ioremap. io_remap_pfn_range() takes a PFN ("page frame number"), which is basically the PCI address you want to map divided by PAGE_SIZE. The main reason for using PFNs is that they allow you to map addresses above 4G even if sizeof long is only 4. In your code: > dev.pcibar2 = ioremap_nocache(resource,size); > dev.region_start = dev.pcibar2 + offset; // RAM is at some offset > from base This gives you a kernel mapping that you can use with readl(), writel() etc to access the PCI memory from the kernel. To map to userspace, this: > if (io_remap_page_range(vma, phyaddr, vma->vm_start, vsize, > vma->vm_page_prot)) should use phyaddr as you have it here: > // phyaddr = physical address of PCI memory area This is just wrong: > unsigned long phy= __pa(dev->region_start + off); __pa() doesn't work on addresses returned from ioremap. Just use the same resource address you passed into ioremap. - R. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] [POWERPC] Fix CONFIG_SMP=n build break
On Sun, Oct 28, 2007 at 12:38:50PM +1100, Stephen Rothwell wrote: > Hi Olof, > > Just a trivial thing ... > > On Sat, 27 Oct 2007 12:28:51 -0500 Olof Johansson <[EMAIL PROTECTED]> wrote: > > > > +u8 iic_get_target_id(int cpu) > > +{ > > + return per_cpu(iic, cpu).target_id; > > +} > > + > > +EXPORT_SYMBOL_GPL(iic_get_target_id); > > We don't normally put a blank line between a function and its EXPORT... Yeah, sloppy of me, I thought I just copied and pasted. Paulus: feel free to fix up before applying. Thanks for your feedback, -Olof - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] [POWERPC] Fix CONFIG_SMP=n build break
Hi Olof, Just a trivial thing ... On Sat, 27 Oct 2007 12:28:51 -0500 Olof Johansson <[EMAIL PROTECTED]> wrote: > > +u8 iic_get_target_id(int cpu) > +{ > + return per_cpu(iic, cpu).target_id; > +} > + > +EXPORT_SYMBOL_GPL(iic_get_target_id); We don't normally put a blank line between a function and its EXPORT... -- Cheers, Stephen Rothwell[EMAIL PROTECTED] http://www.canb.auug.org.au/~sfr/ pgpTqpBpC9CIc.pgp Description: PGP signature
Re: [PATCH 1/2] irq_flags_t: intro and core annotations
Alexey Dobriyan <[EMAIL PROTECTED]> wrote: > >irq_flags_t flags; > >flags = spin_lock_irqXXX(); >spin_unlock_irqYYY(, flags); > > where XXX and YYY are still to be found good names :^) It's also a solution How about flags? flags = spin_lock_irqflags(); spin_unlock_irqflags(, flags); Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: eradicating out of tree modules (was: Linux Security *Module* Framework)
On Sat, Oct 27, 2007 at 04:07:41PM +0200, Tilman Schmidt wrote: > Greg KH schrieb: > > On Fri, Oct 26, 2007 at 11:46:39AM +0200, Tilman Schmidt wrote: > >> [...] I still think there will always be > >> a number of external modules that cannot be merged right now or at > >> all, and deliberately making life difficult for out-of-tree code > >> maintainers in order to coerce them into submitting their code for > >> inclusion in the kernel will not work, it'll only create bad > >> feelings. > > > > Do you have examples of proof of this? > > No proof in the legal, mathematical or scientific sense of the > term, but examples: > > - at least one talented kernel developer giving up his work, > until then maintained out of tree, after submitting it for > inclusion in the kernel and taking the ensuing fla^Wdiscussion > on LKML (nothing extraordinary, just the usual lack of > courtesy and respect) too much to heart >... There's one important point to note: In a project of the size of the Linux kernel (at about 2000 distinct people contributing code within one year) you will always lose developers: If you require too much from code for getting it included you lose some of the people who develop code. If you accept code of dubious quality you lose some of the people who care about the quality of the kernel. And if you add a stable API for modules with not GPL compatible licences at least one untalented kernel developer (me) might give up his work. If your goal is to please all developers you have a goal you can't achieve. The only reasonable way is to accept that whatever you do you'll lose some people and go in the direction you consider the right one. And the power of open source is that when an open source project gets into a direction many people dislike they can simply fork it. Consider e.g. XFree86->X.Org or NetBSD->OpenBSD. And that's nothing bad - either the forks develop in different directions creating different useful software or there's an evolutionary contest for the best software. > T. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: eradicating out of tree modules (was: : Linux Security *Module* Framework)
On Sat, Oct 27, 2007 at 04:47:15PM +0200, Tilman Schmidt wrote: > Adrian Bunk schrieb: > > On Fri, Oct 26, 2007 at 11:46:39AM +0200, Tilman Schmidt wrote: > >> On Thu, 25 Oct 2007 19:56:47 -0700, Greg KH wrote: > >>> On Fri, Oct 26, 2007 at 01:09:14AM +0200, Tilman Schmidt wrote: > [...] Once you admit that there is code which, for very good > reasons, won't ever be accepted into the mainline kernel tree, what you > are saying amounts to: "Code that isn't fit to be included in the > mainline kernel isn't fit to exist at all." > >>> What kind of code is not accepted into the mainline kernel tree for good > >>> reasons? > >> - proprietary code > > > > It's unclear whether distributing not GPL compatible modules is legal > > at all. > > We're neither talking about distribution nor legal aspects, but > about existence. But anyway, you seem to agree with me that there > are very good reasons for not including these in the kernel. > > > And they are definitely not "very good reasons" for doing anything in > > the kernel. > > There is a big difference between "not doing anything to help" > and "actively doing something to make life difficult for". The > former is undoubtedly legitimate. It's the latter we're > discussing here. Justifying anything with code with not GPL compatible licences has zero relevance here. And there's value in making life harder for such modules with questionable legality. As an example, consider people who experienced crashes of "the Linux kernel" caused by some binary-only driver. Not that uncommon e.g. with some graphics drivers. This harms the reputation of Linux as being stable. The solution is not to support proprietary drivers, the solution is to get open source replacements. >... > >> - code conflicting with existing kernel structure or policy > >> - code in which the concerned subsystem maintainers see no benefit > > > > Let's fix the problems, not work around them. > > That's certainly better, but not always possible. Do you > agree with me that if it isn't, then that's a very good > reason for not including that code in the kernel? No, it's still a reason for fixing the real problem. > > There is a conflict between getting code included and ensuring some > > minimum quality of the kernel, but in many cases we could try better. > > Correct. Again, you appear to agree with my statement that > for some code there are very good reasons not to include it > in the kernel. But this does not result in any obligation of supporting low quality external code that destabilizes the kernel of people using it. If it's low quality code doing something useful - well, how many hundred people are on Greg's list only waiting for some driver they could write? > > And when there's a good reason for a kernel policy, then code that > > violates this policy is not a "very good reason" for anything. > > >> - code which its author is unable and/or unwilling to convert to > >> kernel coding standards > >> - code whose author is unable and/or unwilling to defend it on LKML > >> ... > > > > That's their fault, and definitely not a "very good reason" for making > > life easier for them. > > Putting aside the fruitless question of whose fault it is, > is it a "very good reason" for actively making life more > difficult for them than it is already, eg. by gratuitiously > breaking interfaces they rely on for no other "very good > reason" than to discourage out-of-tree development? In other > words, do you think it benefits the Linux community if you > discourage those programmers you've already scared away from > submitting their code to the kernel from continuing their > work off-tree, too? In summary, do you think the world would > be a better place if all the existing out-of-tree modules > just ceased to exist, without any replacement? With your "without any replacement" you needlessly excluded the reasonable solution: The solution is that someone other than the author either takes the existing external code or rewrites it from scratch, submits it for inclusion into the kernel, and maintains it there. Let me repeat that Greg has said he has hundreds of volunteers for such tasks. > T. cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
struct user . u_ar0
I recently noticed that all architectures appear to have an entry n struct user called u_ar0: struct user_pt_regs * u_ar0; /* Used by gdb to help find the values for */ /* the registers. */ In all cases, u_ar0 is a pointer type, although the type of pointer varies with the architecture. However, under no conditions does this field ever contain a pointer value! It is set by the a.out code and its derivatives as an offset, not a pointer value (there are a total of four references in the kernel, in arch/{m68k,blackfin}/kernel/process.c, arch/x86/ia32/ia32_aout.c and fs/binfmt_aout.c -- they are all functionally identical and write-only): dump.u_ar0 = (void *)(((unsigned long)()) - ((unsigned long)())); Any reason to *NOT* change this field to "unsigned long"? , where struct user is defined, is not exported to userspace in any architecture as far as I can tell, although , which just contains #include , *is* exported (clearly a bug.) -hpa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 0/6][RFC] Cleanup FIBMAP
Mike Waychison wrote: The following series is meant to clean up FIBMAP paths with the eventual goal of allowing users to be able to FIBMAP their data. Keep in mind FIBMAP is currently extremely expensive on some filesystems, e.g. ext3. Therefore, additional filesystem-level work would have to be done in order for this not become a DoS issue. -hpa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kconfig: conf segfault (with invalid kconfig contents)
Hi, On Thursday 25 October 2007, Sam Ravnborg wrote: > > It's clearly invalid in that it depends on what it selects, but it should > > just abort instead. > > Thanks for the report and especially for the testcase! > I will try to look at it a bit later if noone bites me (I'm afraid not). Well, you're also responsible for it. :) http://lkml.org/lkml/2007/5/6/14 It actually finds the recursive dependency, but it crashes because sym->prop is NULL. bye, Roman - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
USB_SUSPEND needs SUSPEND?
Hello, it seems that there is some dependency missing for CONFIG_USB_SUSPEND. When I compile a kernel without CONFIG_SUSPEND, USB suspend doesn't work (or at least doesn't expose its interface to userspace). /sys/bus/usb/devices/.../power doesn't exist, but it is there when I enable CONFIG_SUSPEND. So should CONFIG_USB_SUSPEND depend on CONFIG_SUSPEND (or something selected by CONFIG_SUSPEND)? diff between the two configs follows, full config and dmesg for both kernels is available at http://stud4.tuwien.ac.at/~e0626486/usb_suspend/ (though the dmesg diff doesn't seem to be all that interesting). --- /boot/config-2.6.23.1-nosuspend +++ /boot/config-2.6.23.1-suspend @@ -1,7 +1,7 @@ # # Automatically generated make config: don't edit # Linux kernel version: 2.6.23.1 -# Sat Oct 27 23:59:11 2007 +# Sat Oct 27 22:45:27 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y @@ -30,7 +30,7 @@ CONFIG_EXPERIMENTAL=y CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 -CONFIG_LOCALVERSION="-nosuspend" +CONFIG_LOCALVERSION="-suspend" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y @@ -229,11 +229,13 @@ CONFIG_PM=y CONFIG_PM_LEGACY=y # CONFIG_PM_DEBUG is not set +CONFIG_PM_SLEEP=y CONFIG_SUSPEND_UP_POSSIBLE=y -# CONFIG_SUSPEND is not set +CONFIG_SUSPEND=y CONFIG_HIBERNATION_UP_POSSIBLE=y # CONFIG_HIBERNATION is not set CONFIG_ACPI=y +CONFIG_ACPI_SLEEP=y CONFIG_ACPI_PROCFS=y CONFIG_ACPI_PROC_EVENT=y CONFIG_ACPI_AC=m @@ -255,6 +257,7 @@ CONFIG_X86_PM_TIMER=y CONFIG_ACPI_CONTAINER=m CONFIG_ACPI_SBS=m +# CONFIG_APM is not set # # CPU Frequency scaling -- Oliver - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel
On Sat, Oct 27, 2007 at 11:01:12AM +0200, Ahmed S. Darwish wrote: > The problem here (As discussed in private mails) is that the for loop > assumes that the beginning of given user-space buffer is the beginning > of a rule. This leads to situations where the rule becomes "ecret 20", > or "cret 20" instead of "Secret 20". Big input buffers/files leads > smack to recieve a rule like "Secret 20" in fragmented chunks like: > > write("\nSec", ..) > write("r", 1, ..) > write("et 20\n", ..) > > Parsing a rule in such tough conditions in _kernel space_ is very > hard. I began to feel that it will be much easier if we do the parsing > in a userspace utility and let smack accept only small buffers (80 char). For crying out louf, all it takes is a finite state machine... BTW, folks, your parser *and* input language suck. Really. Silently allowing noise is Dumb(tm). Please, write the grammar down and _follow_ _it_. AFAICS, trimming the crap ("we have a number, skip everything until '/', whatever noise we have there doesn't matter") leads to something like text: (whitespace line? \n)* whitespace: [ \t]* line: label whitespace number whitespace (/ whitespace set whitespace)? set: number (whitespace , whitespace number)* label: [!-.0-~]{1,23} number: [0-9]+ and even that might be too liberal. For fsck sake, all you need is to keep a struct that would contain * state * (partial) number * list of smack_known, with the first element being the partial one. * number of characters already seen in label (label itself stored in the list head ->smk_known) and that's it - just have a switch by ->state to handle the next character. Allocate that struct in ->open(), modify in ->write(), apply the entire thing at once in ->release(). Come on, people, this is ridiculous - why bother reinventing the wheels for the stuff that belongs to exercises halfway through any self-respecting introductory textbook? Scary parser, my arse... - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [build bug, 2.6.24-rc1] CONFIG_VIDEO_DEV=m & CONFIG_VIDEO_SAA7146_VV=y
On Sat, 27 Oct 2007 23:30:57 +0200 Johannes Stezenbach wrote: > On Fri, Oct 26, 2007, Ingo Molnar wrote: > > > > the attached config (generated via make randconfig) fails to build due > > to the combination of these config entries: > > > > CONFIG_VIDEO_DEV=m > > CONFIG_VIDEO_SAA7146_VV=y > > > > i found no obvious Kconfig way to force VIDEO_SAA7146_VV to be modular > > when VIDEO_DEV is modular - is there a good solution for this? > > According to http://lkml.org/lkml/2007/10/21/226 : > > config VIDEO_SAA7146_VV > tristate > depends on VIDEO_DEV = y || VIDEO_DEV = VIDEO_SAA7146_VV > select VIDEOBUF_DMA_SG > select VIDEO_SAA7146 > > (untested) Nope, won't work. I tried that last night. VIDEO_DEV_SAA7146_VV has too many "select"s involved, but select doesn't follow the dependency chains. IOW, as written in Documentation/kbuild/kconfig-language.txt: select is evil select will by brute force set a symbol equal to 'y' without visiting the dependencies. So abusing select you are able to select a symbol FOO even if FOO depends on BAR that is not set. --- ~Randy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: tristate and bool not enogh for Kconfig anymore
Hi, On Monday 22 October 2007, Randy Dunlap wrote: > ~ > Another common idiom that we see (and sometimes have problems > with) is this: > > When B (module or subsystem) uses interfaces from A (module or > subsystem), A can be linked statically into the kernel image or > can be built as loadable module(s). This limits how B can be > built. If A is linked statically into the kernel image, B can be > built statically or as loadable module(s). However, if A is built > as loadable module(s), then B must be restricted to loadable > module(s) also. This can be expressed in kconfig language as: > > config B > depends on A = y || A = B What you describe is a simple "depends on A" and your example won't work because it adds a recursive dependency. bye, Roman - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/2] cpusets: add interleave_over_allowed option
David wrote: > I think there's a mixup in the flag name [MPOL_MF_RELATIVE] there Most likely. The discussion involving that flag name was kinda mixed up ;). > but I actually would recommend against any flag to effect Choice A. > It's simply going to be too complex to describe and is going to be a > headache to code and support. While I am sorely tempted to agree entirely with this, I suspect that Christoph has a point when he cautions against breaking this kernel API. Especially for users of the set/get mempolicy calls coming in via libnuma, we have to be very careful not to break the current behaviour, whether it is documented API or just an accident of the implementation. There is a fairly deep and important stack of software, involving a well known DBMS product whose name begins with 'O', sitting on that libnuma software stack. Steering that solution stack is like steering a giant oil tanker near shore. You take it slow and easy, and listen closely to the advice of the ancient harbor master. The harbor masters in this case are or were Andi Kleen and Christoph Lameter. > It's simply going to be too complex to describe and is going to be a > headache to code and support. True, which is why I am hoping we can keep this modal flag, if such be, from having to be used on every set/get mempolicy call. The ordinary coder of new code using these calls directly should just see Choice B behaviour. However the user of libnuma should continue to see whatever API libnuma supports, with no change whatsoever, and various versions of libnuma, including those already shipped years ago, must continue to behave without any changes in node numbering. There are two decent looking ways (and some ugly ways) that I can see to accomplish this: 1) One could claim that no important use of Oracle over libnuma over these memory policy calls is happening on a system using cpusets. There would be a fair bit of circumstantial evidence for this claim, but I don't know it for a fact, and would not be the expert to determine this. On systems making no use of cpusets, these two Choices A and B are identical, and this is a non-issue. Those systems will see no API changes whatsoever from any of this. 2) We have a per-task mode flag selecting whether Choice A or B node numbering apply to the masks passed in to set_mempolicy. The kernel implementation is fairly easy. (Yeah, I know, I too cringe everytime I read that line ;) If Choice A is active, then we continue to enforce the current check, in mm/mempolicy.c: contextualize_policy(), that the passed in mask be a subset of the current cpusets allowed memory nodes, and we add a call to nodes_remap(), to map the passed in nodemask from cpuset centric to system centric (if they asked for the fourth node in their current cpuset, they get the fourth node in the entire system.) Similarly, for masks passed back by get_mempolicy, if Choice A is active, we use nodes_remap() to convert the mask from system centric back to cpuset centric. There is a subtle change in the kernel API's here: In current kernels, which are Choice A, if a task is moved from a big cpuset (many nodes) to a small cpuset and then -back- to a big cpuset, the nodemasks returned by get_mempolicy will still show the smaller masks (fewer set nodes) imposed by the smaller cpuset. In todays kernels, once scrunched or folded down, the masks don't recover their original size after the task is moved back to a large cpuset. With this change, even a task asking for Choice A would, once back on a larger cpuset, again see the larger masks from get_mempolicy queries. This is a change in the kernel API's visible to user space; but I really do not think that there is sufficient use of Oracle over libnuma on systems actively moving tasks between differing size cpusets for this to be a problem. Indeed, if there was much such usage, I suspect they'd be complaining that the current kernel API was borked, and they'd be filing a request for enhancement -asking- for just this subtle change in the kernel API's here. In other words, this subtle API change is a feature, not a bug ;) The bulk of the kernel's mempolicy code is coded for Choice B. If Choice B is active, we don't enforce the subset check in contextualize_policy(), and we don't invoke nodes_remap() in either of the set or get mempolicy code paths. A new option to get_mempolicy() would query the current state of this mode flag, and a new option to set_mempolicy() would set and clear this mode flag. Perhaps Christoph had this in mind when he wrote in an earlier message "The alternative is to add new set/get mempolicy functions." The default kernel API for each task would be Choice B
[PATCH 1/3] VFS: apply coding standards to fs/ioctl.c
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]> --- fs/ioctl.c | 164 +++- 1 files changed, 84 insertions(+), 80 deletions(-) diff --git a/fs/ioctl.c b/fs/ioctl.c index c2a773e..652cacf 100644 --- a/fs/ioctl.c +++ b/fs/ioctl.c @@ -12,8 +12,8 @@ #include #include #include +#include -#include #include static long do_ioctl(struct file *filp, unsigned int cmd, @@ -45,31 +45,31 @@ static int file_ioctl(struct file *filp, unsigned int cmd, { int error; int block; - struct inode * inode = filp->f_path.dentry->d_inode; + struct inode *inode = filp->f_path.dentry->d_inode; int __user *p = (int __user *)arg; switch (cmd) { - case FIBMAP: - { - struct address_space *mapping = filp->f_mapping; - int res; - /* do we support this mess? */ - if (!mapping->a_ops->bmap) - return -EINVAL; - if (!capable(CAP_SYS_RAWIO)) - return -EPERM; - if ((error = get_user(block, p)) != 0) - return error; - - lock_kernel(); - res = mapping->a_ops->bmap(mapping, block); - unlock_kernel(); - return put_user(res, p); - } - case FIGETBSZ: - return put_user(inode->i_sb->s_blocksize, p); - case FIONREAD: - return put_user(i_size_read(inode) - filp->f_pos, p); + case FIBMAP: + { + struct address_space *mapping = filp->f_mapping; + int res; + /* do we support this mess? */ + if (!mapping->a_ops->bmap) + return -EINVAL; + if (!capable(CAP_SYS_RAWIO)) + return -EPERM; + error = get_user(block, p); + if (error) + return error; + lock_kernel(); + res = mapping->a_ops->bmap(mapping, block); + unlock_kernel(); + return put_user(res, p); + } + case FIGETBSZ: + return put_user(inode->i_sb->s_blocksize, p); + case FIONREAD: + return put_user(i_size_read(inode) - filp->f_pos, p); } return do_ioctl(filp, cmd, arg); @@ -82,81 +82,85 @@ static int file_ioctl(struct file *filp, unsigned int cmd, * vfs_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d. * It's just a simple helper for sys_ioctl and compat_sys_ioctl. */ -int vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, unsigned long arg) +int vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, + unsigned long arg) { unsigned int flag; int on, error = 0; switch (cmd) { - case FIOCLEX: - set_close_on_exec(fd, 1); - break; + case FIOCLEX: + set_close_on_exec(fd, 1); + break; - case FIONCLEX: - set_close_on_exec(fd, 0); - break; + case FIONCLEX: + set_close_on_exec(fd, 0); + break; - case FIONBIO: - if ((error = get_user(on, (int __user *)arg)) != 0) - break; - flag = O_NONBLOCK; + case FIONBIO: + error = get_user(on, (int __user *)arg); + if (error) + break; + flag = O_NONBLOCK; #ifdef __sparc__ - /* SunOS compatibility item. */ - if(O_NONBLOCK != O_NDELAY) - flag |= O_NDELAY; + /* SunOS compatibility item. */ + if (O_NONBLOCK != O_NDELAY) + flag |= O_NDELAY; #endif - if (on) - filp->f_flags |= flag; - else - filp->f_flags &= ~flag; + if (on) + filp->f_flags |= flag; + else + filp->f_flags &= ~flag; + break; + + case FIOASYNC: + error = get_user(on, (int __user *)arg); + if (error) break; - - case FIOASYNC: - if ((error = get_user(on, (int __user *)arg)) != 0) - break; - flag = on ? FASYNC : 0; - - /* Did FASYNC state change ? */ - if ((flag ^ filp->f_flags) & FASYNC) { - if (filp->f_op && filp->f_op->fasync) { -
[PATCH 3/3] Unionfs: use vfs_ioctl
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]> --- fs/unionfs/commonfops.c | 32 ++-- 1 files changed, 6 insertions(+), 26 deletions(-) diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c index 50e5775..c99b519 100644 --- a/fs/unionfs/commonfops.c +++ b/fs/unionfs/commonfops.c @@ -661,31 +661,6 @@ out: return err; } -/* pass the ioctl to the lower fs */ -static long do_ioctl(struct file *file, unsigned int cmd, unsigned long arg) -{ - struct file *lower_file; - int err; - - lower_file = unionfs_lower_file(file); - - err = -ENOTTY; - if (!lower_file || !lower_file->f_op) - goto out; - if (lower_file->f_op->unlocked_ioctl) { - err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg); - } else if (lower_file->f_op->ioctl) { - lock_kernel(); - err = lower_file->f_op->ioctl( - lower_file->f_path.dentry->d_inode, - lower_file, cmd, arg); - unlock_kernel(); - } - -out: - return err; -} - /* * return to user-space the branch indices containing the file in question * @@ -752,6 +727,7 @@ out: long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { long err; + struct file *lower_file; unionfs_read_lock(file->f_path.dentry->d_sb); @@ -775,7 +751,11 @@ long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) default: /* pass the ioctl down */ - err = do_ioctl(file, cmd, arg); + lower_file = unionfs_lower_file(file); + if (lower_file) + err = vfs_ioctl(lower_file, cmd, arg); + else + err = -ENOTTY; break; } -- 1.5.2.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] VFS: swap do_ioctl and vfs_ioctl names
Rename old vfs_ioctl to do_ioctl, because the comment above it clearly indicates that it is an internal function not to be exported to modules; therefore it should have a more traditional do_XXX name. The new do_ioctl is exported in fs.h but not to modules. Rename the old do_ioctl to vfs_ioctl because the names vfs_XXX should preferably be reserved to callable VFS functions which modules may call, as many other vfs_XXX functions already do. Export the new vfs_ioctl to modules so others can use it (including Unionfs and eCryptfs). Signed-off-by: Erez Zadok <[EMAIL PROTECTED]> --- fs/compat_ioctl.c |2 +- fs/ioctl.c | 18 ++ include/linux/fs.h |3 ++- 3 files changed, 13 insertions(+), 10 deletions(-) diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c index a4284cc..a1604ce 100644 --- a/fs/compat_ioctl.c +++ b/fs/compat_ioctl.c @@ -2972,7 +2972,7 @@ asmlinkage long compat_sys_ioctl(unsigned int fd, unsigned int cmd, } do_ioctl: - error = vfs_ioctl(filp, fd, cmd, arg); + error = do_ioctl(filp, fd, cmd, arg); out_fput: fput_light(filp, fput_needed); out: diff --git a/fs/ioctl.c b/fs/ioctl.c index 652cacf..00abbbf 100644 --- a/fs/ioctl.c +++ b/fs/ioctl.c @@ -16,8 +16,9 @@ #include -static long do_ioctl(struct file *filp, unsigned int cmd, - unsigned long arg) +/* vfs_ioctl can be called by other file systems or modules */ +long vfs_ioctl(struct file *filp, unsigned int cmd, + unsigned long arg) { int error = -ENOTTY; @@ -39,6 +40,7 @@ static long do_ioctl(struct file *filp, unsigned int cmd, out: return error; } +EXPORT_SYMBOL(vfs_ioctl); static int file_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) @@ -72,18 +74,18 @@ static int file_ioctl(struct file *filp, unsigned int cmd, return put_user(i_size_read(inode) - filp->f_pos, p); } - return do_ioctl(filp, cmd, arg); + return vfs_ioctl(filp, cmd, arg); } /* * When you add any new common ioctls to the switches above and below * please update compat_sys_ioctl() too. * - * vfs_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d. + * do_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d. * It's just a simple helper for sys_ioctl and compat_sys_ioctl. */ -int vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, - unsigned long arg) +int do_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, +unsigned long arg) { unsigned int flag; int on, error = 0; @@ -152,7 +154,7 @@ int vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, if (S_ISREG(filp->f_path.dentry->d_inode->i_mode)) error = file_ioctl(filp, cmd, arg); else - error = do_ioctl(filp, cmd, arg); + error = vfs_ioctl(filp, cmd, arg); break; } return error; @@ -172,7 +174,7 @@ asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg) if (error) goto out_fput; - error = vfs_ioctl(filp, fd, cmd, arg); + error = do_ioctl(filp, fd, cmd, arg); out_fput: fput_light(filp, fput_needed); out: diff --git a/include/linux/fs.h b/include/linux/fs.h index b3ec4a4..c0c5d36 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1924,7 +1924,8 @@ extern int vfs_stat_fd(int dfd, char __user *, struct kstat *); extern int vfs_lstat_fd(int dfd, char __user *, struct kstat *); extern int vfs_fstat(unsigned int, struct kstat *); -extern int vfs_ioctl(struct file *, unsigned int, unsigned int, unsigned long); +extern long vfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); +extern int do_ioctl(struct file *, unsigned int, unsigned int, unsigned long); extern void get_filesystem(struct file_system_type *fs); extern void put_filesystem(struct file_system_type *fs); -- 1.5.2.2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 0/3 fs/ioctl.c coding style, rename vfs_ioctl/do_ioctl
This series of three proposed patches changes fs/ioctl.c and Unionfs as follows. This series is against v2.6.24-rc1-192-gef49c32. Patch 1: just applies coding standards to fs/ioctl.c (while I'm at it, I figured it's worth cleaning VFS files one at a time). Patch 2: does two things: (a) Renames the old vfs_ioctl to do_ioctl, because the comment above the old vfs_ioctl clearly indicates that it is an internal function not to be exported to modules; therefore it should have a more traditional do_XXX "internal function" name. The new do_ioctl is exported in fs.h but not to modules. (b) Renames the old (static) do_ioctl to vfs_ioctl because the names vfs_XXX should preferably be reserved to callable VFS functions which modules may call, as other vfs_XXX functions already do. Export the new vfs_ioctl to modules so others can use it (including Unionfs and eCryptfs). Patch 3: demonstrates how Unionfs can use the new vfs_ioctl. I successfully tested unionfs with this new exported vfs_ioctl. (eCryptfs could do the same.) I'd like to propose that the first two patches be merged in -mm and even mainline, pending review. Erez Zadok (3): VFS: apply coding standards to fs/ioctl.c VFS: swap do_ioctl and vfs_ioctl names Unionfs: use vfs_ioctl fs/compat_ioctl.c |2 fs/ioctl.c | 176 fs/unionfs/commonfops.c | 22 +- include/linux/fs.h |3 Cheers, Erez. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)
On Friday 26 October 2007 10:44, Peter wrote: > > ...the way the watermarks work they will be evenly distributed > > over the appropriate zones. .. Hi Peter, The term is "highwater mark" not "high watermark". A watermark is an anti-counterfeiting device printed on paper money. "Highwater" is how high water gets, which I believe is the sense we intend in Linux. Therefore any occurrence of "watermark" in the kernel source is a spelling mistake, unless it has something to do with printing paper money. While fixing this entrenched terminology abuse in our kernel source may be difficult, sticking to the correct English on lkml is quite easy :-) Regards, Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/9] Unionfs: security convert lsm into a static interface fix
In message <[EMAIL PROTECTED]>, Christoph Hellwig writes: > On Mon, Oct 22, 2007 at 08:48:04PM -0400, Erez Zadok wrote: > > Why? Are you concerned that the security policy may change after a module > > is loaded? > > No, it's a matter of proper layering. We generally don't want modules > like stackabke filesystems to call directly into methods but rather use > proper highlevel VFS helpers to isolate them from details and possible > changes. The move to out of line security_ helpers just put this on the > radard. OK, I'll be shortly posting a couple of patches to fs/ioctl.c. > > I can probably get rid of having unionfs call security_inode_permission, > > by calling permission() myself and carefully post-process its return > > code (unionfs needs to "ignore" EROFS initially, to allow copyup to take > > place). > > Sounds fine. I was able to test this idea and it works fine. Now unionfs calls permission(), post-processes the return value, and I don't need my own modified version of permission() in unionfs. This saved me ~50 LoC and reduced stack pressure a little. > > But security_file_ioctl doesn't have any existing helper I can call. I > > can introduce a trivial vfs_security_file_ioctl wrapper to > > security_file_ioctl, but what about the already existing *19* exported > > security_* functions in security/security.c? Do you want to see simple > > wrappers for all of them? It seems redundant to add a one-line wrapper > > around an already one-line function around security_ops->XXX. Plus, > > some of the existing exported security_* functions are file-system > > related, others are networking, etc. So we'll need wrappers whose names > > are prefixed appropriately: vfs_*, net_*, etc. > > The fix for security_file_ioctl is probably to either not do it at all > or move it the call to security_file_ioctl into vfs_ioctl and get it by > using that helper. I suspect most other security_ exports should be > avoided similarly. Christoph, I looked more closely at that and the selinux code. Only sys_ioctl calls security_file_ioctl. And security_file_ioctl performs all sorts of checks that mostly have to do with the currently running task or the open file. The running task is still the same, whether filesystem stacking is involved or not. Also, the unionfs-level struct file is logically the same file at the lower level: they refer to the same object, just at two layers. I can't see any reason why unionfs_ioctl should have to call security_file_ioctl(lower_file) the way sys_ioctl does: that check is already done well before the file system's ->ioctl is invoked. I also don't see how it would ever be possible that sys_ioctl will succeed in its call to security_file_ioctl(upper_file), but unionfs will fail the same security check on the lower file. So I commented out unionfs's call to security_file_ioctl(lower_file) and tested it on a bunch of things, including an selinux-enabled livecd. Everything seemed to work just fine, so I'll be sending some patches to that effect, and we can drop the -mm patch which exports security_file_ioctl(). BTW, ecryptfs doesn't call security_file_ioctl(). Cheers, Erez. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)
On Friday 26 October 2007 10:55, Christoph Lameter wrote: > On Fri, 26 Oct 2007, Pavel Machek wrote: > > > And, _no_, it does not necessarily mean global serialisation. By > > > simply saying there must be N pages available I say nothing about > > > on which node they should be available, and the way the > > > watermarks work they will be evenly distributed over the > > > appropriate zones. > > > > Agreed. Scalability of emergency swapping reserved is simply > > unimportant. Please, lets get swapping to _work_ first, then we can > > make it faster. > > Global reserve means that any cpuset that runs out of memory may > exhaust the global reserve and thereby impact the rest of the system. > The emergencies that are currently localized to a subset of the > system and may lead to the failure of a job may now become global and > lead to the failure of all jobs running on it. If it does, it is a bug in the reserve accounting. That said, I still agree with you that per-node reserve is a desirable goal for numa. I would just like to be clear that it is not necessary, even for numa, just nice. By all means somebody should be hacking on a numa feature for per-node emergency reserves, but as far as fixing the immediate, serious kernel block IO deadlocks goes, it does not matter. Pavel, I do not agree that efficiency is unimportant on the under-pressure path. I do not even like to call that the "emergency" path, because under heavy load it is normal for a machine to spend a significant fraction of its time in that state. However, the efficiency goal there does not need to be quite the same as normal mode. To illustrate, I would expect to see something like 95% of normal block IO performance on a numa machine in the case that "emergency" (aka memalloc memory) is allocated globally instead of locally, thus paying a (modest compared to the disk transfer itself) penalty for transfer of disk data over the numa interconnect. 95% of normal throughput on the block IO path is not a problem: if the machine spends 5% of its time on the "emergency" (aka memalloc) path, then overall efficiency will be 95% * 95% = 99.75%. Moral of this story: let's get the memory recursion fixes done in the most obviously correct way and not get distracted by illusory efficiency requirements for numa, that do not have a big bottom line impact. I'm glad to see everybody still interested in these problems. Though we have been a little quiet on this issue over here for a while, it does not mean that progress has stopped. In fact, we are testing our solutions more heavily than ever, and getting closer to a solution that not only works solidly, but that should enable mass deletion of the whole creaky notion of dirty page limits in favor of nice, tight per-device control of in flight write traffic as I have described previously. Regards, Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND] SCSI not showing tray status correctly
> This patch is too simplistic. ide-cd.c:ide_cdrom_drive_status() looks > to be a reasonable implementation. However, the worry is that > GET_EVENT_NOTIFICATION is a MMC command; devices not conforming to MMC > won't support it. In theory, they should just return ILLEGAL_REQUEST, > but USB devices have been known to crash when given commands they don't > understand. How widely tested has this been (if it's been in Gentoo > since 2004, then it's probably widely tested enough)? > > James > This patch hasn't been tested at all, I'm sorry if I gave you that impression, it isn't in Gentoo's patches, it's just been brought to our attention in the bug I linked too. I created another patch, based on your recommendations, and with a lot of help from Daniel Drake ([EMAIL PROTECTED]), that's included below. Some changes are made to test_unit_ready() to be able to pass the sense data to sr_drive_status(). Currently, sr_do_ioctl() swallows this data and makes its own interpretations of sense codes, which isn't what we want here. Is this what you had in mind? Do you think the possible problems with USB drives that you mentioned will prevent this patch from going in? The patch has only been compile tested right now, we can do some real testing later, for now I'd just like to get your feedback on it. Maarten --- --- a/drivers/scsi/sr_ioctl.c 2007-10-26 22:40:41.0 +0200 +++ b/drivers/scsi/sr_ioctl.c 2007-10-27 23:56:16.0 +0200 @@ -275,16 +275,21 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack /* -- */ /* interface to cdrom.c */ -static int test_unit_ready(Scsi_CD *cd) +static int test_unit_ready(Scsi_CD *cd, struct request_sense *sense) { - struct packet_command cgc; + struct scsi_device *SDev = cd->device; + unsigned char cmd[CDROM_PACKET_SIZE] = { GPCMD_TEST_UNIT_READY }; + int result; - memset(, 0, sizeof(struct packet_command)); - cgc.cmd[0] = GPCMD_TEST_UNIT_READY; - cgc.quiet = 1; - cgc.data_direction = DMA_NONE; - cgc.timeout = IOCTL_TIMEOUT; - return sr_do_ioctl(cd, ); + if (!scsi_block_when_processing_errors(SDev)) + return -ENODEV; + + memset(sense, 0, sizeof(*sense)); + result = scsi_execute(SDev, cmd, DMA_NONE, + NULL, 0, (char *)sense, + IOCTL_TIMEOUT, IOCTL_RETRIES, 0); + + return driver_byte(result); } int sr_tray_move(struct cdrom_device_info *cdi, int pos) @@ -310,14 +315,41 @@ int sr_lock_door(struct cdrom_device_inf int sr_drive_status(struct cdrom_device_info *cdi, int slot) { + struct request_sense sense; + struct media_event_desc med; + if (CDSL_CURRENT != slot) { /* we have no changer support */ return -EINVAL; } - if (0 == test_unit_ready(cdi->handle)) + if ((0 == test_unit_ready(cdi->handle, )) || + sense.sense_key == UNIT_ATTENTION) return CDS_DISC_OK; - return CDS_TRAY_OPEN; + if (!cdrom_get_media_event(cdi, )) { + if (med.media_present) + return CDS_DISC_OK; + else if (med.door_open) + return CDS_TRAY_OPEN; + else + return CDS_NO_DISC; + } + + if (sense.sense_key == NOT_READY && sense.asc == 0x04 && sense.ascq == 0x04) + return CDS_DISC_OK; + + /* +* If not using Mt Fuji extended media tray reports, +* just return TRAY_OPEN since ATAPI doesn't provide +* any other way to detect this... +*/ + if (sense.sense_key == NOT_READY) { + if (sense.asc == 0x3a && sense.ascq == 1) + return CDS_NO_DISC; + else + return CDS_TRAY_OPEN; + } + return CDS_DRIVE_NOT_READY; } int sr_disk_status(struct cdrom_device_info *cdi) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: lock held when returning to user space
On Sat, 27 Oct 2007, Peter Zijlstra wrote: > > > [ 592.752781] > > > [ 592.753478] [ BUG: lock held when returning to user space! ] > > > [ 592.753880] > > > [ 592.754262] hwclock/1452 is leaving the kernel with locks still held! > > > [ 592.754655] 1 lock held by hwclock/1452: > > > [ 592.755007] #0: (>char_lock){--..}, at: [] > > > rtc_dev_open+0x2e/0x7e > > Yes, this is because rtc keeps a char_lock mutex locked as long as the > > device is open, to avoid concurrent accessess. > I think, in this case, the lock is associated with a kernel object that > is properly cleaned up if the holding tasks gets a SIGKILL. But in > general I'd like to see this kind of thing go away. Yes, but the fact is that is really is invalid use of mutex -- because the mutex owner could become seriously wrong after fork() or sending the filedescriptor through unix socket ... this easily leads to broken situation. This seems to have been introduced in e824290e5d ... Alessandro, could you convert this to test_and_set_bit()/clear_bit() semantics instead of a mutex please? Thanks, -- Jiri Kosina - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] proc_fs.h redux
On Sat, 2007-10-27 at 23:47 +0400, Alexey Dobriyan wrote: > Remove proc_fs.h from headers that doesn't really need it. > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -17,6 +17,7 @@ > #include > #include > #include > +#include Your code doesn't match your patch description. You've got new includes of: and forward declarations of struct proc_dir_entry; struct file_operations; As a general rule, I think it better to use includes than use naked forward declarations. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] select PARAVIRT when PARAVIRT_GUEST on
On Sunday 28 October 2007 00:07:12 Glauber de Oliveira Costa wrote: > Commit d3d1c4bdf16bd154d9f27f34fca28edca90465eb introduces a new config > knob that, according to its changelog, should select CONFIG_PARAVIRT. Hi Glauber, PARAVIRT is now selected by the options which need it. Merely turning on the menu "does not add any kernel code" as promised in the text... Rusty. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer
On Sat, 27 Oct 2007, Maxim Levitsky wrote: > > Use del_timer_sync(). It guarantees that when it returns, the timer > > will be stopped and the timer routine will no longer be running on any > > CPU. > > > Even if the timer re-enables itself, are you sure? Last time I looked at the source code, that's what it did. I'll look again... Yep, it still does. It checks to see if the timer routine is currently running; if so then it waits a little while and tries again. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 0/6][RFC] Cleanup FIBMAP
On Sat, 27 Oct 2007, Anton Altaparmakov wrote: > And another of my pet peeves with ->bmap is that it uses 0 to mean "sparse" > which causes a conflict on NTFS at least as block zero is part of the $Boot > system file so it is a real, valid block... NTFS uses -1 to denote sparse > blocks internally. In practice, the meaning of 0 is file system [driver] dependent. For example in case of NTFS-3G it means that the block is sparse or the file is encrypted or compressed, or resident, or it's the $Boot file, or an error happened. Thankfully the widely used FIBMAP users (swapon and the ever less used lilo) are only interested in the non-zero values and they report an error if the driver returns 0 for some reason. Which is perfectly ok since both swaping and Linux booting would fail using a sparse, encrypted, compressed, resident, or the NTFS $Boot file. But in real, both swap files and lilo work fine with NTFS if the needed files were created the way these softwares expect. If not then swapon or lilo will catch and report the file creation error. Afair, somebody is doing (has done?) an indeed much needed, better alternative. Bmap is legacy, thank you Mike for maintaining it. Szaka -- NTFS-3G Lead Developer: http://ntfs-3g.org - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Networked filesystems vs backing_dev_info
On Sat, 2007-10-27 at 23:30 +0200, Peter Zijlstra wrote: > On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote: > > On 10/27/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > > Hi, > > > > > > I had me a little look at bdi usage in networked filesystems. > > > > > > NFS, CIFS, (smbfs), AFS, CODA and NCP > > > > > > And of those, NFS is the only one that I could find that creates > > > backing_dev_info structures. The rest seems to fall back to > > > default_backing_dev_info. > > > > > > With my recent per bdi dirty limit patches the bdi has become more > > > important than it has been in the past. While falling back to the > > > default_backing_dev_info isn't wrong per-se, it isn't right either. > > > > > > Could I implore the various maintainers to look into this issue for > > > their respective filesystem. I'll try and come up with some patches to > > > address this, but feel free to beat me to it. > > > > I would like to understand more about your patches to see what bdi > > values makes sense for CIFS and how to report possible congestion back > > to the page manager. > > So, what my recent patches do is carve up the total writeback cache > size, or dirty page limit as we call it, proportionally to a BDIs > writeout speed. So a fast device gets more than a slow device, but will > not starve it. > > However, for this to work, each device, or remote backing store in the > case of networked filesystems, need to have a BDI. > > > I had been thinking about setting bdi->ra_pages > > so that we do more sensible readahead and writebehind - better > > matching what is possible over the network and what the server > > prefers. > > Well, you'd first have to create backing_dev_info instances before > setting that value :-) > > > SMB/CIFS Servers typically allow a maximum of 50 requests > > in parallel at one time from one client (although this is adjustable > > for some). > > That seems like a perfect point to set congestion. > > So in short, stick a struct backing_dev_info into whatever represents a > client, initialize it using bdi_init(), destroy using bdi_destroy(). Oh, and the most important point, make your fresh I_NEW inodes point to this bdi struct. > Mark it congested once you have 50 (or more) outstanding requests, clear > congestion when you drop below 50. > > and you should be set. > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
On Sat, 2007-10-27 at 23:08 +0200, Kay Sievers wrote: > On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote: > > Ah, I see a few problems. Here, try this version instead. It's > > compile-tested only, and should be a lot simpler. > > > > Note, we still are not setting the parent to the new bdi structure > > properly, so the devices will show up in /sys/devices/virtual/ instead > > of in their proper location. To do this, we need the parent of the > > device, which I'm not so sure what it should be (block device? block > > device controller?) > > Assigning a parent device will only work with the upcoming conversion of > the raw kobjects in the block subsystem to "struct device". > > A few comments to the patch: > > > --- a/include/linux/string.h > > +++ b/include/linux/string.h > > @@ -8,6 +8,7 @@ > > #include /* for inline */ > > #include/* for size_t */ > > #include /* for NULL */ > > +#include > > > > #ifdef __cplusplus > > extern "C" { > > @@ -111,6 +112,9 @@ extern void *kmemdup(const void *src, si > > extern char **argv_split(gfp_t gfp, const char *str, int *argcp); > > extern void argv_free(char **argv); > > > > +char *kvprintf(const char *fmt, va_list args); > > +char *kprintf(const char *fmt, ...); > > Why is that here? I don't think we need this when we use the existing: > kvasprintf(GFP_KERNEL, fmt, args) Ignorance of the existance of said function. Thanks for pointing it out. (kobject_set_name ought to use it too I guess) > > --- a/mm/backing-dev.c > > +++ b/mm/backing-dev.c > > > + > > +static struct device_attribute bdi_dev_attrs[] = { > > + __ATTR(readahead, 0644, readahead_show, readahead_store), > > + __ATTR_RO(reclaimable), > > + __ATTR_RO(writeback), > > + __ATTR_RO(dirty), > > + __ATTR_RO(bdi_dirty), > > +}; > > Default attributes will need the NULL termination back (see below). > > > +static __init int bdi_class_init(void) > > +{ > > + bdi_class = class_create(THIS_MODULE, "bdi"); > > + return 0; > > +} > > + > > +__initcall(bdi_class_init); > > + > > +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...) > > This function should accept a: "struct device *parent" and all callers > just pass NULL until the block layer conversion gets merged. Yeah, you're right, but I wanted to just get something working before bothering with the parent thing. > > +{ > > + char *name; > > + va_list args; > > + int ret = -ENOMEM; > > + int i; > > + > > + va_start(args, fmt); > > + name = kvprintf(fmt, args); > > kvasprintf(GFP_KERNEL, fmt, args); > > > + va_end(args); > > + > > + if (!name) > > + return -ENOMEM; > > + > > + bdi->dev = device_create(bdi_class, NULL, MKDEV(0,0), name); > > The parent should be passed here. > > > + for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) { > > + ret = device_create_file(bdi->dev, _dev_attrs[i]); > > + if (ret) > > + break; > > + } > > + if (ret) { > > + while (--i >= 0) > > + device_remove_file(bdi->dev, _dev_attrs[i]); > > + device_unregister(bdi->dev); > > + bdi->dev = NULL; > > + } > > All this open-coded attribute stuff should go away and be replaced by: > bdi_class->dev_attrs = bdi_dev_attrs; > Otherwise at event time the attributes are not created and stuff hooking > into the events will not be able to set values. Also, the core will do > proper add/remove and error handling then. ok, that's good to know. someone ought to write a book on how to use all this... really, even the functions are bare of documentation or comments. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [build bug, 2.6.24-rc1] CONFIG_VIDEO_DEV=m & CONFIG_VIDEO_SAA7146_VV=y
On Fri, Oct 26, 2007, Ingo Molnar wrote: > > the attached config (generated via make randconfig) fails to build due > to the combination of these config entries: > > CONFIG_VIDEO_DEV=m > CONFIG_VIDEO_SAA7146_VV=y > > i found no obvious Kconfig way to force VIDEO_SAA7146_VV to be modular > when VIDEO_DEV is modular - is there a good solution for this? According to http://lkml.org/lkml/2007/10/21/226 : config VIDEO_SAA7146_VV tristate depends on VIDEO_DEV = y || VIDEO_DEV = VIDEO_SAA7146_VV select VIDEOBUF_DMA_SG select VIDEO_SAA7146 (untested) Johannes - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Networked filesystems vs backing_dev_info
On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote: > On 10/27/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > Hi, > > > > I had me a little look at bdi usage in networked filesystems. > > > > NFS, CIFS, (smbfs), AFS, CODA and NCP > > > > And of those, NFS is the only one that I could find that creates > > backing_dev_info structures. The rest seems to fall back to > > default_backing_dev_info. > > > > With my recent per bdi dirty limit patches the bdi has become more > > important than it has been in the past. While falling back to the > > default_backing_dev_info isn't wrong per-se, it isn't right either. > > > > Could I implore the various maintainers to look into this issue for > > their respective filesystem. I'll try and come up with some patches to > > address this, but feel free to beat me to it. > > I would like to understand more about your patches to see what bdi > values makes sense for CIFS and how to report possible congestion back > to the page manager. So, what my recent patches do is carve up the total writeback cache size, or dirty page limit as we call it, proportionally to a BDIs writeout speed. So a fast device gets more than a slow device, but will not starve it. However, for this to work, each device, or remote backing store in the case of networked filesystems, need to have a BDI. > I had been thinking about setting bdi->ra_pages > so that we do more sensible readahead and writebehind - better > matching what is possible over the network and what the server > prefers. Well, you'd first have to create backing_dev_info instances before setting that value :-) > SMB/CIFS Servers typically allow a maximum of 50 requests > in parallel at one time from one client (although this is adjustable > for some). That seems like a perfect point to set congestion. So in short, stick a struct backing_dev_info into whatever represents a client, initialize it using bdi_init(), destroy using bdi_destroy(). Mark it congested once you have 50 (or more) outstanding requests, clear congestion when you drop below 50. and you should be set. signature.asc Description: This is a digitally signed message part
Re: [BUG] [linux-pm] Commit "Hibernation: Enter platform hibernation state in a consistent way)" makes my system to resume instantly from S4
On Saturday, 27 October 2007 14:05, Maxim Levitsky wrote: > Hi, > > Recently I noticed that my system resumes just after suspend to disk. > > I traced this to commit 9cd9a0058dd35268b24fa16795a92c800f4086d4. > > Note: > > This happens only if I enable WOL using /proc/acpi/wakeup > (echo "ILAN" > /proc/acpi/wakeup) What happens after a suspend to RAM? > and have > "ACPI-Hibernate-erroneously-disabled-Suspend-wakeup" applied, since otherwise > all wake-up sources are disabled in S4. > > > Clearly the above commit confuses the BIOS. > Using latest -git with the above patch reverted makes everything work again > fine. Well, this patch is needed to make wakeup from peripherals (eg. RTC alarm) work on some boxes. The symptom that you describe is similar to what I'm observing after a suspend to RAM on one test box. Can you produce a log of kernel messages printed before powering off the system? Greetings, Rafael - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/2] cpusets: add interleave_over_allowed option
David wrote: > I prefer Choice B because it does not force mempolicies to have any > dependence on cpusets with regard to what nodemask is passed. Yes, well said. > It would be very good to store the passed nodemask to set_mempolicy in > struct mempolicy, Yes - that's what I'm intending to do. > If the cpuset has fewer than four nodes, the behavior > should be undefined (probably implemented to just cycle the set of > mems_allowed until you reach the fourth entry). I do intend to implement it as you suggest. See the lib/bitmap.c routines bitmap_remap() and bitmap_bitremap(), and the nodemask wrappers for these, nodes_remap() and node_remap(). They will define the cycling, or I sometimes call it folding. I would have tended to make this folding a defined part of the API, though I will grant that the possibility of being lazy and forgetting to document it seems attractive (less to document ;). > That [running in a cpuset with fewer nodes than used in a memory policy > mask] is the result of constraining a task to a cpuset that obviously > wants access to more nodes -- it's a userspace mistake and abusing > cpusets so that the task does not get what it expects. Nah - I wouldn't put it that way. It's no mistake or abuse. It's just one more example of a kernel making too few resources look sufficient by sharing, multiplexing and virtualizing them. That's what kernels do. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] irq_flags_t: intro and core annotations
Hi, On Sun, 28 Oct 2007, Alexey Dobriyan wrote: > > If it's just about the type checking, something like below should pretty > > much do the same. > > It won't catch, the following if both variables are unsigned long: > > spin_lock_irqsave(, flags); > [stuff] > spin_unlock_irqrestore(, foo->flags); > > It won't catch "static unsigned long flags;". With sparse, we can > eventually mark type as "on-stack only". It should be on the stack, but we have cases where a pointer to it is used (e.g. lock_timer_base). How do you want to deal with this? bye, Roman - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: fixing up DRM device model usage
On Fri, 2007-10-26 at 14:31 -0700, Jesse Barnes wrote: > On Friday, October 26, 2007 12:08 pm Kay Sievers wrote: > > The open coded: device_create_file(>dev, _attrs[i]) > > should probably replaced by passing the array to the class, and the > > core will do that for you. > > You mean just set drm_class->dev_attrs = device_attrs? I didn't see in > the core device model code where that would create the files... Yeah, it is in device_add_attrs() in drivers/base/core.c. Kay - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] irq_flags_t: intro and core annotations
On Sun, 2007-10-28 at 00:14 +0400, Alexey Dobriyan wrote: > On Sat, Oct 27, 2007 at 09:20:43PM +0200, Roman Zippel wrote: > > On Sun, 21 Oct 2007, Alexey Dobriyan wrote: > > > > > So far remedies were: > > > a) grep(1) -- obviously fragile. I tried at some point grepping for > > >spin_lock_irqsave(), found quite a few, but it became bring > > > quickly. > > > b) BUILD_BUG_ON(sizeof(flags) != sizeof(unsigned long)) -- was tried, > > >brutally broke some arches, survived one commit before revert :^) > > >Doesn't work on i386 where sizeof(unsigned int) == sizeof(unsigned > > > long). > > > > > > So it would be nice to have something more robust. > > > > If it's just about the type checking, something like below should pretty > > much do the same. > > It won't catch, the following if both variables are unsigned long: > > spin_lock_irqsave(, flags); > [stuff] > spin_unlock_irqrestore(, foo->flags); > > It won't catch "static unsigned long flags;". With sparse, we can > eventually mark type as "on-stack only". > > +static __always_inline void __irq_flags_check(unsigned long *flags) > > +{ + BUILD_BUG_ON(!__builtin_stack_addr(flags)); > > +} > > + obviously gcc doesn't (yet) support that __builtin function, but you could make it work for sparse and define a dummy for gcc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)
On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote: > On Sat, Oct 27, 2007 at 10:39:59AM +0200, Peter Zijlstra wrote: > > On Fri, 2007-10-26 at 19:40 -0700, Greg KH wrote: > > > On Sat, Oct 27, 2007 at 03:18:08AM +0200, Peter Zijlstra wrote: > > > > On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote: > > > > > This crashes and burns on bootup, but I'm too tired to figure out > > > > > what I > > > > > did wrong... will give it another try tomorrow.. > > > > > > > > Ok, can't sleep.. took a look. I have several problems here. > > > > > > > > The thing that makes it go *boom* is the __ATTR_NULL. Removing that > > > > makes it boot. Albeit it then warns me of multiple duplicate sysfs > > > > objects, all named "bdi". > > > I'll look at this and see what I can come up with. Would you just like > > > a whole new patch, or one against this one? > > > > Sorry for the grumpy note, I get that way at 3.30 am. Maybe I ought not > > have mailed :-/ > > > > This is the code I had at that time. > > Ah, I see a few problems. Here, try this version instead. It's > compile-tested only, and should be a lot simpler. > > Note, we still are not setting the parent to the new bdi structure > properly, so the devices will show up in /sys/devices/virtual/ instead > of in their proper location. To do this, we need the parent of the > device, which I'm not so sure what it should be (block device? block > device controller?) Assigning a parent device will only work with the upcoming conversion of the raw kobjects in the block subsystem to "struct device". A few comments to the patch: > --- a/include/linux/string.h > +++ b/include/linux/string.h > @@ -8,6 +8,7 @@ > #include /* for inline */ > #include /* for size_t */ > #include /* for NULL */ > +#include > > #ifdef __cplusplus > extern "C" { > @@ -111,6 +112,9 @@ extern void *kmemdup(const void *src, si > extern char **argv_split(gfp_t gfp, const char *str, int *argcp); > extern void argv_free(char **argv); > > +char *kvprintf(const char *fmt, va_list args); > +char *kprintf(const char *fmt, ...); Why is that here? I don't think we need this when we use the existing: kvasprintf(GFP_KERNEL, fmt, args) > --- a/mm/backing-dev.c > +++ b/mm/backing-dev.c > + > +static struct device_attribute bdi_dev_attrs[] = { > + __ATTR(readahead, 0644, readahead_show, readahead_store), > + __ATTR_RO(reclaimable), > + __ATTR_RO(writeback), > + __ATTR_RO(dirty), > + __ATTR_RO(bdi_dirty), > +}; Default attributes will need the NULL termination back (see below). > +static __init int bdi_class_init(void) > +{ > + bdi_class = class_create(THIS_MODULE, "bdi"); > + return 0; > +} > + > +__initcall(bdi_class_init); > + > +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...) This function should accept a: "struct device *parent" and all callers just pass NULL until the block layer conversion gets merged. > +{ > + char *name; > + va_list args; > + int ret = -ENOMEM; > + int i; > + > + va_start(args, fmt); > + name = kvprintf(fmt, args); kvasprintf(GFP_KERNEL, fmt, args); > + va_end(args); > + > + if (!name) > + return -ENOMEM; > + > + bdi->dev = device_create(bdi_class, NULL, MKDEV(0,0), name); The parent should be passed here. > + for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) { > + ret = device_create_file(bdi->dev, _dev_attrs[i]); > + if (ret) > + break; > + } > + if (ret) { > + while (--i >= 0) > + device_remove_file(bdi->dev, _dev_attrs[i]); > + device_unregister(bdi->dev); > + bdi->dev = NULL; > + } All this open-coded attribute stuff should go away and be replaced by: bdi_class->dev_attrs = bdi_dev_attrs; Otherwise at event time the attributes are not created and stuff hooking into the events will not be able to set values. Also, the core will do proper add/remove and error handling then. Thanks, Kay - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Networked filesystems vs backing_dev_info
On 10/27/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > Hi, > > I had me a little look at bdi usage in networked filesystems. > > NFS, CIFS, (smbfs), AFS, CODA and NCP > > And of those, NFS is the only one that I could find that creates > backing_dev_info structures. The rest seems to fall back to > default_backing_dev_info. > > With my recent per bdi dirty limit patches the bdi has become more > important than it has been in the past. While falling back to the > default_backing_dev_info isn't wrong per-se, it isn't right either. > > Could I implore the various maintainers to look into this issue for > their respective filesystem. I'll try and come up with some patches to > address this, but feel free to beat me to it. I would like to understand more about your patches to see what bdi values makes sense for CIFS and how to report possible congestion back to the page manager. I had been thinking about setting bdi->ra_pages so that we do more sensible readahead and writebehind - better matching what is possible over the network and what the server prefers.SMB/CIFS Servers typically allow a maximum of 50 requests in parallel at one time from one client (although this is adjustable for some). The CIFS client prefers to do writes 14 pages (an iovec of 56K) at a time (although many servers can efficiently handle multiple of these 56K writes in parallel). With minor changes CIFS could handle even larger writes (to just under 64K for Windows and just under 128K for Samba - the current CIFS Unix Extensions allow servers to negotiate much larger writes, but lacking a "receivepage" equivalent Samba does not currently support larger than 128K). Ideally, to improve large file copy utilization, I would like to see from 3-10 writes of 56K (or larger in the future) in parallel. The read path is harder since we only do 16K reads to Windows and Samba - but we need to increase the number of these that are done in parallel on the same inode. There is a large Google Summer of Code patch for this which needs more review. -- Thanks, Steve - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/2] cpusets: add interleave_over_allowed option
> > You have chosen (1) above, which keeps Choice A as the default. > > There can be different defaults for the user space API via libnuma that > are indepdent from the kernel API which needs to remain stable. The kernel > API can be extended but not changed. Yes - the user level code can have different defaults too. I was discussing what should be the default kernel API. > None of those [alternatives] sound appealing. Multiple processes may run > in one cpuset. Well, that would justify keeping this choice per-task. I tend to agree with that. But that doesn't justify having to specify it on each system call. In another reply David recommends against supporting Choice A at all. I'm inclined to agree with him. I'll reply there, with more thoughts. But if we did support Choice A, as a backwards compatible alternative to Choice B, I'd suggest a per-task mode, not per-system call mode. This would reduce the impact on the API of the ugly, unobvious, modal flag needed to select the optional, non kernel default, Choice B semantics. I still have low confidence that you (Christoph) and I have the same understanding of what these Choice A and B are. Hopefully you can address that, perhaps by briefly describing these choices in your words. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: gfs2_fh_to_parent() array overflow
On Wed, Oct 24, 2007 at 06:26:26PM +0200, Adrian Bunk wrote: > The Coverity checker spotted the following array overflow caused by > commit 34c0d154243dd913c5690ae6ceb9557017429b9c: The line is a left-over from times when gfs stored the mode of the inode in the file handle. It can simply be deleted. Steve, do you want a patch for that or could you commit that one-liner directly? > fs/gfs2/ops_export.c contains: > > <-- snip --> > > ... > static struct dentry *gfs2_fh_to_parent(struct super_block *sb, struct fid > *fid, > int fh_len, int fh_type) > { > struct gfs2_inum_host parent; > __be32 *fh = (__force __be32 *)fid->raw; < > > switch (fh_type) { > case GFS2_LARGE_FH_SIZE: > case GFS2_OLD_FH_SIZE: > parent.no_formal_ino = ((u64)be32_to_cpu(fh[4])) << 32; > parent.no_formal_ino |= be32_to_cpu(fh[5]); > parent.no_addr = ((u64)be32_to_cpu(fh[6])) << 32; >^ > parent.no_addr |= be32_to_cpu(fh[7]); > ... ^ > > <-- snip --> > > > cu > Adrian > > -- > >"Is there not promise of rain?" Ling Tan asked suddenly out > of the darkness. There had been need of rain for many days. >"Only a promise," Lao Er said. >Pearl S. Buck - Dragon Seed ---end quoted text--- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel
> +/** > + * smk_write_cipso - write() for /smack/cipso > + * @filp: file pointer, not actually used > + * @buf: where to get the data from > + * @count: bytes sent > + * @ppos: where to start > + * > + * Returns number of bytes written or error code, as appropriate > + */ > +static ssize_t smk_write_cipso(struct file *file, const char __user *buf, > +size_t count, loff_t *ppos) > +{ [...] > + > + /* > + * Only allow one writer at a time. Writes should be > + * quite rare and small in any case. > + */ > + mutex_lock(_cipso_lock); > + > + *(data + count) = '\0'; > + > + for (eolp = strchr(data, '\n'), linep = data; > + eolp != NULL && rc >= 0; > + linep = eolp + 1, eolp = strchr(linep, '\n')) { > + The problem here (As discussed in private mails) is that the for loop assumes that the beginning of given user-space buffer is the beginning of a rule. This leads to situations where the rule becomes "ecret 20", or "cret 20" instead of "Secret 20". Big input buffers/files leads smack to recieve a rule like "Secret 20" in fragmented chunks like: write("\nSec", ..) write("r", 1, ..) write("et 20\n", ..) Parsing a rule in such tough conditions in _kernel space_ is very hard. I began to feel that it will be much easier if we do the parsing in a userspace utility and let smack accept only small buffers (80 char). i.e. A user space utility that takes a big input file like exit 10/3,7,4 exit 10/3,7,4 exit 10/3,7,4 <100 times> And transform it to 100 small write() calls. By this way we can return -EINVAL if write()'s count field > 80, or if input contains no \n or more than one. Any Ideas ?. Casey, I can begin modifying cipso_write() and writing this small user-space utility now if you agree on this. Regards, -- Ahmed S. Darwish HomePage: http://darwish.07.googlepages.com Blog: http://darwish-07.blogspot.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
In message <[EMAIL PROTECTED]>, Hugh Dickins writes: > On Mon, 22 Oct 2007, Erez Zadok wrote: [...] > > If you've got suggestions how I can handle unionfs_write more cleanly, or > > comments on the above possibilities, I'd love to hear them. > > For now I think you should pursue the ~(__GFP_FS|__GFP_IO) idea somehow. > > Hugh Hugh, thanks for the great explanations and suggestions (in multiple emails). I'm going to test all of those soon. Erez. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [AppArmor 00/45] AppArmor security module overview
On Fri, Oct 26, 2007 at 07:37:21AM -0700, Arjan van de Ven wrote: > before going into the LSM / security side of things, I'd like to get > the VFS guys to look at your VFS interaction code. It's been NACKed a few times, and just reposting it won't help. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] irq_flags_t: intro and core annotations
On Oct 28 2007 00:24, Alexey Dobriyan wrote: > >> (Which would be the logical choice if it were a function and not a >> macro...) That would flag up all violations ("without cast to different >> pointer" or so) while usually not breaking compilation. >> >> Of course, irq_flags_t is probably the best long-term solution if one >> wants to hide a struct. (Even then perhaps, use a pointer instead?) > >IIRC, Christoph mentioned: > > irq_flags_t flags; > > flags = spin_lock_irqXXX(); > spin_unlock_irqYYY(, flags); > >where XXX and YYY are still to be found good names :^) It's also a solution >without flag day and with more sane lock part -- "how flags are modified >if they are passed by value?" > >I start to like this proposal but I can't come up with good names. > The BSD way: just add a number -- spin_lock_irq2() Of course names are preferable. irq_spin_lock and irq_spin_unlock? spinirq_lock, spinirq_unlock? spin_lock_irq_disable, spin_lock_irq_enable (a bit verbose...) Maybe the 'irq' part should be completely dropped from the name, as with -rt extensions, it behaves differently, does not it? (Or was it preemption?) If it was a home project, I'd just flag it, if it was a business project, I'd add the number, for Linux - ha, too big for me :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] irq_flags_t: intro and core annotations
On Thu, Oct 25, 2007 at 05:40:03PM +0200, Jan Engelhardt wrote: > On Oct 21 2007, Alexey Dobriyan wrote: > > > >One of type of bugs steadily being fought is the following one: > > > > unsigned int flags; > > spin_lock_irqsave(, flags); > > > >where "flags" should be "unsigned long". Here is far from complete list > >of commits fixing such bugs: > > > > How about making spin_lock_irqsave actually take a pointer to flags? How would you do it without flag day? > (Which would be the logical choice if it were a function and not a > macro...) That would flag up all violations ("without cast to different > pointer" or so) while usually not breaking compilation. > > Of course, irq_flags_t is probably the best long-term solution if one > wants to hide a struct. (Even then perhaps, use a pointer instead?) IIRC, Christoph mentioned: irq_flags_t flags; flags = spin_lock_irqXXX(); spin_unlock_irqYYY(, flags); where XXX and YYY are still to be found good names :^) It's also a solution without flag day and with more sane lock part -- "how flags are modified if they are passed by value?" I start to like this proposal but I can't come up with good names. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] scatterlist fallout: drivers/scsi/arm/
On Sat, Oct 27, 2007 at 07:44:04PM +0100, Al Viro wrote: > > Signed-off-by: Al Viro <[EMAIL PROTECTED]> > --- > diff --git a/drivers/scsi/arm/scsi.h b/drivers/scsi/arm/scsi.h > index 21ba571..2d09fef 100644 > --- a/drivers/scsi/arm/scsi.h > +++ b/drivers/scsi/arm/scsi.h > @@ -39,7 +39,7 @@ static inline int next_SCp(struct scsi_pointer *SCp) > SCp->buffer++; > SCp->buffers_residual--; > SCp->ptr = (char *) > - (page_address(SCp->buffer->page) + > + (page_address(sg_page(SCp->buffer)) + > SCp->buffer->offset); > SCp->this_residual = SCp->buffer->length; > } else { > @@ -77,7 +77,7 @@ static inline void init_SCp(struct scsi_cmnd *SCpnt) > SCpnt->SCp.buffer = (struct scatterlist *) > SCpnt->request_buffer; > SCpnt->SCp.buffers_residual = SCpnt->use_sg - 1; > SCpnt->SCp.ptr = (char *) > - (page_address(SCpnt->SCp.buffer->page) + > + (page_address(sg_page(SCpnt->SCp.buffer)) + > SCpnt->SCp.buffer->offset); > SCpnt->SCp.this_residual = SCpnt->SCp.buffer->length; > SCpnt->SCp.phase = SCpnt->request_bufflen; FYI, here's what I have queued (cut'n'pasted so probably won't apply): diff --git a/drivers/scsi/arm/scsi.h b/drivers/scsi/arm/scsi.h index 21ba571..bb6550e 100644 --- a/drivers/scsi/arm/scsi.h +++ b/drivers/scsi/arm/scsi.h @@ -38,9 +38,7 @@ static inline int next_SCp(struct scsi_pointer *SCp) if (ret) { SCp->buffer++; SCp->buffers_residual--; - SCp->ptr = (char *) -(page_address(SCp->buffer->page) + - SCp->buffer->offset); + SCp->ptr = sg_virt(SCp->buffer); SCp->this_residual = SCp->buffer->length; } else { SCp->ptr = NULL; @@ -76,9 +74,7 @@ static inline void init_SCp(struct scsi_cmnd *SCpnt) SCpnt->SCp.buffer = (struct scatterlist *) SCpnt->request_buffer; SCpnt->SCp.buffers_residual = SCpnt->use_sg - 1; - SCpnt->SCp.ptr = (char *) -(page_address(SCpnt->SCp.buffer->page) + - SCpnt->SCp.buffer->offset); + SCpnt->SCp.ptr = sg_virt(SCpnt->SCp.buffer); SCpnt->SCp.this_residual = SCpnt->SCp.buffer->length; SCpnt->SCp.phase = SCpnt->request_bufflen; -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] duplicate initializer in sound/pci/hda/patch_realtek.c
Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index d9f78c8..1c50278 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -9299,7 +9299,6 @@ static struct alc_config_preset alc268_presets[] = { .num_channel_mode = ARRAY_SIZE(alc268_modes), .channel_mode = alc268_modes, .input_mux = _capture_source, - .input_mux = _capture_source, .unsol_event = alc268_toshiba_unsol_event, .init_hook = alc268_toshiba_automute, }, - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] irq_flags_t: intro and core annotations
On Sat, Oct 27, 2007 at 09:20:43PM +0200, Roman Zippel wrote: > On Sun, 21 Oct 2007, Alexey Dobriyan wrote: > > > So far remedies were: > > a) grep(1) -- obviously fragile. I tried at some point grepping for > >spin_lock_irqsave(), found quite a few, but it became bring quickly. > > b) BUILD_BUG_ON(sizeof(flags) != sizeof(unsigned long)) -- was tried, > >brutally broke some arches, survived one commit before revert :^) > >Doesn't work on i386 where sizeof(unsigned int) == sizeof(unsigned long). > > > > So it would be nice to have something more robust. > > If it's just about the type checking, something like below should pretty > much do the same. It won't catch, the following if both variables are unsigned long: spin_lock_irqsave(, flags); [stuff] spin_unlock_irqrestore(, foo->flags); It won't catch "static unsigned long flags;". With sparse, we can eventually mark type as "on-stack only". > > * irq_flags_t allows arch maintainers to eventually switch to something > > smaller than "unsigned long" if they want to. > > Considering how painful this conversion could be, the question is whether > this is really needed. In the long run, I believe, this is needed. We want more bugs to be found automatically, so we can have more time for non-trivial bugs. > Comments so far suggest some archs don't need all > of it, but does someone need more? I don't know. > --- linux-2.6.orig/include/linux/irqflags.h > +++ linux-2.6/include/linux/irqflags.h > @@ -41,6 +41,10 @@ > # define INIT_TRACE_IRQFLAGS > #endif > > +static __always_inline void __irq_flags_check(unsigned long *flags) > +{ > +} > + > #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT > > #include > @@ -50,10 +54,11 @@ > #define local_irq_disable() \ > do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0) > #define local_irq_save(flags) \ > - do { raw_local_irq_save(flags); trace_hardirqs_off(); } while (0) > + do { __irq_flags_check(); raw_local_irq_save(flags); > trace_hardirqs_off(); } while (0) > > #define local_irq_restore(flags) \ > do {\ > + __irq_flags_check(); \ > if (raw_irqs_disabled_flags(flags)) { \ > raw_local_irq_restore(flags); \ > trace_hardirqs_off(); \ > @@ -69,8 +74,8 @@ > */ > # define raw_local_irq_disable() local_irq_disable() > # define raw_local_irq_enable() local_irq_enable() > -# define raw_local_irq_save(flags) local_irq_save(flags) > -# define raw_local_irq_restore(flags)local_irq_restore(flags) > +# define raw_local_irq_save(flags) ({ __irq_flags_check(); > local_irq_save(flags); }) > +# define raw_local_irq_restore(flags)({ __irq_flags_check(); > local_irq_restore(flags); }) > #endif /* CONFIG_TRACE_IRQFLAGS_SUPPORT */ > > #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.23-rt4 (was 2.6.23-rt1 trouble)
Steven Rostedt wrote: > -- > > On Sat, 27 Oct 2007, Rui Nuno Capela wrote: > >>> On Mon, October 15, 2007 11:49, Rui Nuno Capela wrote: I am experiencing some highly annoying but intermitent freezing on a pentium4 2.80G HT/SMT box, when doing normal desktop work with 2.6.23-rt1. The same crippling behavior does not occur on a Core 2 Due T7200 2.0G SMP, so I suspect it's something due specific to the SMT scheduling support (Hyper-Threading). But can't tell for sure, obviously :) >>> I was wrong. After several trials the same behavior also occurs on the >>> Core2 Duo T7200. It just took longer to show its nasty. >>> >>> The symptoms are noticeable primarily as some X/GUI intermitent freezing, sometimes only one application, then several and ultimately the whole X desktop becomes completely unresponsive. It looks like scheduling > > When things start to freeze, could you capture the output of a sysrq-t. > yes, you can find a complete serial console capture here, where it holds the final SysRq-T output: http://www.rncbc.org/datahub/console-2.6.23.1-rt4.1-1.log the corresponding .config is also there: http://www.rncbc.org/datahub/config-2.6.23.1-rt4.1 the reason this is a new kernel build, following a shot in the dark from nick mainsbridge, which let out the ntfs module build (CONFIG_NTFS_FS is not set) suggesting that would mitigate similar freezes. in deed the general feel is that it seems to run longer and less prone to those incidents, but that is just a gut feeling, nothing more. hope this be any useful. > problems. There is this hint that switching to a spare console terminal (via Ctrl+Alt+Fn) might cause later recovery. But its just a question of some more time for it just happens again and again, one after another, several applications becoming temporarily frozen and just by luck the system gets back to normal, probably due to some incidental shake-up :) but there are other times that nothing seems to help with no alternative to the power-reset switch. I could not find any evidence on dmesg or in the system logs, of any apparent trouble. No BUGs, no oops, no panics, no nothing. It just freezes, this and that, now and then. It just makes it all unworkable and obviously subject to ditching. Again, this only happens on this P4/HT box. On a Core2 Duo laptop, with same 2.6.23-rt1 with the very same kernel configuration, it does not show any illness and is running quite fine. >>> False. It used to run fine, until the creeps happen first time :( >>> >>> Remember one report I had about a similar freezing behavior? Now it's happening the other way around: the core2 is OK, the pentium4 is KO. >>> Now it applies to all 2.6.23-rt1 images I could test upon. >>> >>> One naive suspicion goes like the new rcu-preempt code is to blame, since I don't remember having this or any other trouble with 2.6.23-rc8-rt1. >>> Not be sure anymore, but this seems to be still a valid assumption. >>> >> just to let you know that still the same trouble persists with 2.6.23.1-rt4 >> >> .config can be found here: >>http://www.rncbc.org/datahub/config-2.6.23.1-rt4.0 >> > > I have a P4HT laptop (unfortunately with no serial). I use it as one of my > main machines, so it will suck for me when it freezes ;-). I'll take your > config and try it out. > > I'll most likely do this on Monday since process Wife has the highest > priority over the weekend ;-) > that is also true here :) otoh, as reported before, the freezes are not exclusive to P4HT, which is the main box I'm been reporting here (the one which has a good old serial port anyway), but also applies to my other laptop, an core2 duo t7200. bye -- rncbc aka Rui Nuno Capela [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] fix breakage in pegasos_eth (fallout from commit b45d9147f1582333e180e1023624c003874b7312)
Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- diff --git a/include/linux/mv643xx_eth.h b/include/linux/mv643xx_eth.h index 3f27239..8df230a 100644 --- a/include/linux/mv643xx_eth.h +++ b/include/linux/mv643xx_eth.h @@ -8,6 +8,9 @@ #define MV643XX_ETH_NAME "mv643xx_eth" #define MV643XX_ETH_SHARED_REGS0x2000 #define MV643XX_ETH_SHARED_REGS_SIZE 0x2000 +#define MV643XX_ETH_BAR_4 0x220 +#define MV643XX_ETH_SIZE_REG_4 0x224 +#define MV643XX_ETH_BASE_ADDR_ENABLE_REG 0x0290 struct mv643xx_eth_platform_data { int port_number; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Git pull] x86 bugfixes
Linus, please pull x86 bugfixes from ssh://master.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git Some more voyager fixes and other compile warning fixups than in the yesterday pull request. Thanks, tglx Eric W. Biederman (1): x86: Fix boot protocol KEEP_SEGMENTS check. James Bottomley (1): x86: voyager: fix bogus conversion to per_cpu for boot_cpu_info Jeff Garzik (2): x86: fix !SMP compiler warning in arch/x86/kernel/acpi/processor.c x86: fix compiler warnings in arch/x86/kernel/early-quirks.c Ken'ichi Ohmichi (1): x86: Dump filtering supports x86_64 sparsemem Thomas Gleixner (2): Revert "i386: export i386 smp_call_function_mask() to modules" x86: export smp_ops to allow modular build of KVM arch/x86/boot/compressed/head_32.S | 12 arch/x86/boot/compressed/head_64.S |7 +++ arch/x86/kernel/acpi/processor.c|3 +-- arch/x86/kernel/asm-offsets_64.c| 10 ++ arch/x86/kernel/early-quirks.c |4 +++- arch/x86/kernel/machine_kexec_64.c |2 ++ arch/x86/kernel/smp_32.c|8 +--- arch/x86/mach-voyager/voyager_smp.c |4 ++-- include/asm-x86/smp_32.h|9 ++--- 9 files changed, 36 insertions(+), 23 deletions(-) diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S index a0ae2e7..036e635 100644 --- a/arch/x86/boot/compressed/head_32.S +++ b/arch/x86/boot/compressed/head_32.S @@ -33,24 +33,20 @@ .globl startup_32 startup_32: - /* check to see if KEEP_SEGMENTS flag is meaningful */ - cmpw $0x207, BP_version(%esi) - jb 1f - + cld /* test KEEP_SEGMENTS flag to see if the bootloader is asking * us to not reload segments */ testb $(1<<6), BP_loadflags(%esi) - jnz 2f + jnz 1f -1: cli + cli movl $(__BOOT_DS),%eax movl %eax,%ds movl %eax,%es movl %eax,%fs movl %eax,%gs movl %eax,%ss - -2: cld +1: /* Calculate the delta between where we were compiled to run * at and where we were actually loaded at. This can only be done diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index 4946764..1ccb38a 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -29,6 +29,7 @@ #include #include #include +#include .section ".text.head" .code32 @@ -36,11 +37,17 @@ startup_32: cld + /* test KEEP_SEGMENTS flag to see if the bootloader is asking +* us to not reload segments */ + testb $(1<<6), BP_loadflags(%esi) + jnz 1f + cli movl$(__KERNEL_DS), %eax movl%eax, %ds movl%eax, %es movl%eax, %ss +1: /* Calculate the delta between where we were compiled to run * at and where we were actually loaded at. This can only be done diff --git a/arch/x86/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c index 2ed0a4c..f63e5ff 100644 --- a/arch/x86/kernel/acpi/processor.c +++ b/arch/x86/kernel/acpi/processor.c @@ -62,8 +62,7 @@ static void init_intel_pdc(struct acpi_processor *pr, struct cpuinfo_x86 *c) /* Initialize _PDC data based on the CPU vendor */ void arch_acpi_processor_init_pdc(struct acpi_processor *pr) { - unsigned int cpu = pr->id; - struct cpuinfo_x86 *c = _data(cpu); + struct cpuinfo_x86 *c = _data(pr->id); pr->pdc = NULL; if (c->x86_vendor == X86_VENDOR_INTEL) diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c index 7e50bda..d1b6ed9 100644 --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -15,12 +15,16 @@ #include #include #include +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) #define BLANK() asm volatile("\n->" : : ) +#define OFFSET(sym, str, mem) \ + DEFINE(sym, offsetof(struct str, mem)) + #define __NO_STUBS 1 #undef __SYSCALL #undef _ASM_X86_64_UNISTD_H_ @@ -109,5 +113,11 @@ int main(void) DEFINE(crypto_tfm_ctx_offset, offsetof(struct crypto_tfm, __crt_ctx)); BLANK(); DEFINE(__NR_syscall_max, sizeof(syscalls) - 1); + + BLANK(); + OFFSET(BP_scratch, boot_params, scratch); + OFFSET(BP_loadflags, boot_params, hdr.loadflags); + OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch); + OFFSET(BP_version, boot_params, hdr.version); return 0; } diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c index dc34acb..639e632 100644 --- a/arch/x86/kernel/early-quirks.c +++ b/arch/x86/kernel/early-quirks.c @@ -35,12 +35,14 @@ static void __init via_bugs(void) } #ifdef CONFIG_ACPI +#ifdef CONFIG_X86_IO_APIC static int __init nvidia_hpet_check(struct acpi_table_header *header) { return 0; } -#endif +#endif /* CONFIG_X86_IO_APIC */ +#endif
Re: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery is absent
On Sat, Oct 27, 2007 at 03:32:04PM -0400, David Woodhouse wrote: > On Sat, 2007-10-27 at 22:42 +0400, Anton Vorontsov wrote: > > Well, PROP_PRESENT wasn't my idea, currently it's used by pmu and > > olpc drivers becuase it's not trivial to register/unregister their > > batteries on physical insertion/removal. I have some plans to teach > > at least pmu batteries to not use PROP_PRESENT. I don't have any > > OLPC, thus I can't convert it. > > Actually it's not hard to do that. I didn't say it's hard. But we don't have any interrupts for the battery events, thus we have to implement polling. > It was done this way in response to a > request from the userspace side. IIRC. Oh. Userspace tried to do something weird then, I think. Generally it's just sane to unregister absent hardware. -- Anton Vorontsov email: [EMAIL PROTECTED] backup email: [EMAIL PROTECTED] irc://irc.freenode.net/bd2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] proc_fs.h redux
Remove proc_fs.h from headers that doesn't really need it. Typical overkill is including full header when one can get away with just forward declaration of "struct proc_dir_entry". Number of files that are recompiled after touching proc_fs.h drops from 1100 to 513(!) on x86_64 allmodconfig. Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]> --- arch/powerpc/kernel/process.c |1 + arch/powerpc/kernel/prom.c |1 + arch/powerpc/mm/init_32.c |1 + arch/powerpc/mm/init_64.c |1 + arch/powerpc/platforms/iseries/mf.c |1 + arch/powerpc/platforms/pasemi/electra_ide.c |2 +- arch/powerpc/sysdev/fsl_soc.c |1 + arch/powerpc/sysdev/mv64x60_dev.c |2 +- arch/powerpc/sysdev/qe_lib/qe.c |1 + arch/sparc/kernel/process.c |3 ++- arch/sparc64/kernel/ldc.c |1 + arch/sparc64/kernel/mdesc.c |1 + arch/sparc64/kernel/viohs.c |2 +- drivers/acpi/bay.c |1 + drivers/acpi/dock.c |2 +- drivers/block/cpqarray.h|2 +- drivers/hwmon/ibmpex.c |2 +- drivers/isdn/hardware/eicon/platform.h |1 - drivers/md/md.c |1 + drivers/misc/fujitsu-laptop.c |1 + drivers/misc/msi-laptop.c |1 + drivers/misc/thinkpad_acpi.c|2 +- drivers/misc/thinkpad_acpi.h|3 ++- drivers/net/bonding/bonding.h |3 ++- drivers/net/cxgb3/t3cdev.h |2 +- drivers/parport/parport_serial.c|2 +- drivers/sbus/char/envctrl.c |2 +- drivers/sbus/char/uctrl.c |1 + drivers/scsi/imm.h |1 - drivers/scsi/ppa.h |1 - drivers/usb/gadget/lh7a40x_udc.h|1 - fs/nfs/client.c |1 + include/acpi/acpi_bus.h |2 +- include/asm-powerpc/prom.h |3 ++- include/asm-sparc/prom.h|4 +++- include/asm-sparc64/prom.h |4 +++- include/linux/atmdev.h |2 +- include/linux/crash_dump.h |3 ++- include/linux/efi.h |1 - include/linux/ipmi.h|1 - include/linux/netfilter.h |2 +- include/linux/parport.h |1 - include/linux/raid/md.h |1 - include/linux/sunrpc/cache.h|4 +++- include/linux/sunrpc/rpc_pipe_fs.h |2 ++ include/linux/sunrpc/stats.h|3 ++- include/linux/wanrouter.h |3 ++- include/net/sctp/sctp.h |2 +- net/atm/br2684.c|1 + net/atm/resources.h |2 -- net/irda/irnet/irnet.h |1 - net/irda/irnet/irnet_irda.c |2 +- net/sctp/objcnt.c |1 + net/sctp/proc.c |1 + net/sctp/protocol.c |1 + net/sunrpc/auth.c |2 ++ net/sunrpc/auth_unix.c |2 +- net/sunrpc/rpcb_clnt.c |1 + net/sunrpc/xprt.c |2 +- net/wanrouter/wanmain.c |1 + net/wanrouter/wanproc.c |1 + 61 files changed, 66 insertions(+), 38 deletions(-) --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include --- a/arch/powerpc/mm/init_32.c +++ b/arch/powerpc/mm/init_32.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include --- a/arch/powerpc/platforms/iseries/mf.c +++ b/arch/powerpc/platforms/iseries/mf.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include --- a/arch/powerpc/platforms/pasemi/electra_ide.c +++ b/arch/powerpc/platforms/pasemi/electra_ide.c @@ -16,7 +16,7 @@ * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ - +#include #include #include --- a/arch/powerpc/sysdev/fsl_soc.c +++ b/arch/powerpc/sysdev/fsl_soc.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include
Re: 2.6.24-rc1 sysctl table check failed on PowerMac
On Sat, 27 Oct 2007 22:34:53 +0400, Alexey Dobriyan wrote: > On Sat, Oct 27, 2007 at 07:57:38PM +0200, Mikael Pettersson wrote: > > Booting 2.6.24-rc1 on my PowerMac the kernel now spits > > out a sysctl warning early in the boot sequence: > > > > --- dmesg-2.6.23 > > +++ dmesg-2.6.24-rc1 > > ... > > IP route cache hash table entries: 32768 (order: 5, 131072 bytes) > > TCP established hash table entries: 131072 (order: 8, 1048576 bytes) > > TCP bind hash table entries: 65536 (order: 6, 262144 bytes) > > TCP: Hash tables configured (established 131072 bind 65536) > > TCP reno registered > > +sysctl table check failed: /kernel .1 Writable sysctl directory > > [PATCH] powerpc: fix sysctl whining re kernel.powersave-nap > > kernel was marked with 0755. Everywhere else it's 0555. > > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]> > --- > > arch/powerpc/kernel/idle.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/arch/powerpc/kernel/idle.c > +++ b/arch/powerpc/kernel/idle.c > @@ -122,7 +122,7 @@ static ctl_table powersave_nap_sysctl_root[] = { > { > .ctl_name = CTL_KERN, > .procname = "kernel", > - .mode = 0755, > + .mode = 0555, > .child = powersave_nap_ctl_table, > }, > {} > This eliminated the warning. Thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery is absent
On Sat, 2007-10-27 at 22:42 +0400, Anton Vorontsov wrote: > Well, PROP_PRESENT wasn't my idea, currently it's used by pmu and > olpc drivers becuase it's not trivial to register/unregister their > batteries on physical insertion/removal. I have some plans to teach > at least pmu batteries to not use PROP_PRESENT. I don't have any > OLPC, thus I can't convert it. Actually it's not hard to do that. It was done this way in response to a request from the userspace side. IIRC. -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] scatterlist fallout: drivers/scsi/arm/
Already have this one queued for Linus in my tree. On Sat, Oct 27, 2007 at 07:44:04PM +0100, Al Viro wrote: > > Signed-off-by: Al Viro <[EMAIL PROTECTED]> > --- > diff --git a/drivers/scsi/arm/scsi.h b/drivers/scsi/arm/scsi.h > index 21ba571..2d09fef 100644 > --- a/drivers/scsi/arm/scsi.h > +++ b/drivers/scsi/arm/scsi.h > @@ -39,7 +39,7 @@ static inline int next_SCp(struct scsi_pointer *SCp) > SCp->buffer++; > SCp->buffers_residual--; > SCp->ptr = (char *) > - (page_address(SCp->buffer->page) + > + (page_address(sg_page(SCp->buffer)) + > SCp->buffer->offset); > SCp->this_residual = SCp->buffer->length; > } else { > @@ -77,7 +77,7 @@ static inline void init_SCp(struct scsi_cmnd *SCpnt) > SCpnt->SCp.buffer = (struct scatterlist *) > SCpnt->request_buffer; > SCpnt->SCp.buffers_residual = SCpnt->use_sg - 1; > SCpnt->SCp.ptr = (char *) > - (page_address(SCpnt->SCp.buffer->page) + > + (page_address(sg_page(SCpnt->SCp.buffer)) + > SCpnt->SCp.buffer->offset); > SCpnt->SCp.this_residual = SCpnt->SCp.buffer->length; > SCpnt->SCp.phase = SCpnt->request_bufflen; -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer
On Saturday 27 October 2007 21:17:55 Alan Stern wrote: > On Fri, 26 Oct 2007, Maxim Levitsky wrote: > > > > > Looking through the dmfe code, I noticed yet another possible race. > > > > A race between the .suspend, and a timer that serves both as a > > > > watchdog, and link state detector. > > > > Again I need to prevent it from running during the suspend/resume, but > > > > how? > > > > > > > > I can use del_timer in .suspend, and mod_timer in .resume, but that > > > > doesn't protect against > > > > race with already running timer. > > > > I can use del_timer_sync, but it states that it is useless if timer > > > > re-enables itself, and I agree with that. > > > > In dmfe case the timer does re-enable itself. > > > > > > That comment isn't right. del_timer_sync works perfectly well even if > > > the timer routine re-enables itself, provided it stops doing so after a > > > small number of iterations. > > Thanks for the info. but > > Due to the "don't access the hardware, while powered-off" rule, I must know > > that the timer isn't running. > > and won't be. > > So what function to use (if possible) to be sure that the timer won't run > > anymore? > > (Taking in the account the fact that it re-enables itself) > > Use del_timer_sync(). It guarantees that when it returns, the timer > will be stopped and the timer routine will no longer be running on any > CPU. > Even if the timer re-enables itself, are you sure? > Alan Stern > > Best regards, Maxim Levitsky - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/2] irq_flags_t: intro and core annotations
Hi, On Sun, 21 Oct 2007, Alexey Dobriyan wrote: > So far remedies were: > a) grep(1) -- obviously fragile. I tried at some point grepping for >spin_lock_irqsave(), found quite a few, but it became bring quickly. > b) BUILD_BUG_ON(sizeof(flags) != sizeof(unsigned long)) -- was tried, >brutally broke some arches, survived one commit before revert :^) >Doesn't work on i386 where sizeof(unsigned int) == sizeof(unsigned long). > > So it would be nice to have something more robust. If it's just about the type checking, something like below should pretty much do the same. > * irq_flags_t allows arch maintainers to eventually switch to something > smaller than "unsigned long" if they want to. Considering how painful this conversion could be, the question is whether this is really needed. Comments so far suggest some archs don't need all of it, but does someone need more? bye, Roman --- include/linux/irqflags.h | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) Index: linux-2.6/include/linux/irqflags.h === --- linux-2.6.orig/include/linux/irqflags.h +++ linux-2.6/include/linux/irqflags.h @@ -41,6 +41,10 @@ # define INIT_TRACE_IRQFLAGS #endif +static __always_inline void __irq_flags_check(unsigned long *flags) +{ +} + #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT #include @@ -50,10 +54,11 @@ #define local_irq_disable() \ do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0) #define local_irq_save(flags) \ - do { raw_local_irq_save(flags); trace_hardirqs_off(); } while (0) + do { __irq_flags_check(); raw_local_irq_save(flags); trace_hardirqs_off(); } while (0) #define local_irq_restore(flags) \ do {\ + __irq_flags_check(); \ if (raw_irqs_disabled_flags(flags)) { \ raw_local_irq_restore(flags); \ trace_hardirqs_off(); \ @@ -69,8 +74,8 @@ */ # define raw_local_irq_disable() local_irq_disable() # define raw_local_irq_enable()local_irq_enable() -# define raw_local_irq_save(flags) local_irq_save(flags) -# define raw_local_irq_restore(flags) local_irq_restore(flags) +# define raw_local_irq_save(flags) ({ __irq_flags_check(); local_irq_save(flags); }) +# define raw_local_irq_restore(flags) ({ __irq_flags_check(); local_irq_restore(flags); }) #endif /* CONFIG_TRACE_IRQFLAGS_SUPPORT */ #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel
--- "Ahmed S. Darwish" <[EMAIL PROTECTED]> wrote: > Hi Casey, > > Casey <[EMAIL PROTECTED]> wrote: > > > > This version is again aimed at addressing Al Viro's issues in > > smackfs. Ahmed Darwish has again contributed in the repair of the > > locking issues there. The move to 2.6.24 was also an important > > release incentive. > > > > My patches mentiond above is not applied in this version. The same > lock issues remain. Did you forget applying them ? The patch I sent out was one version older than the one I intended to send. If I can recover the #$%^&*( disk I should have an update that includes those fixes presently. If not it may take a day or two longer. You have not been forgotten. Thank you for your contribution. Casey Schaufler [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer
On Fri, 26 Oct 2007, Maxim Levitsky wrote: > > > Looking through the dmfe code, I noticed yet another possible race. > > > A race between the .suspend, and a timer that serves both as a watchdog, > > > and link state detector. > > > Again I need to prevent it from running during the suspend/resume, but > > > how? > > > > > > I can use del_timer in .suspend, and mod_timer in .resume, but that > > > doesn't protect against > > > race with already running timer. > > > I can use del_timer_sync, but it states that it is useless if timer > > > re-enables itself, and I agree with that. > > > In dmfe case the timer does re-enable itself. > > > > That comment isn't right. del_timer_sync works perfectly well even if > > the timer routine re-enables itself, provided it stops doing so after a > > small number of iterations. > Thanks for the info. but > Due to the "don't access the hardware, while powered-off" rule, I must know > that the timer isn't running. > and won't be. > So what function to use (if possible) to be sure that the timer won't run > anymore? > (Taking in the account the fact that it re-enables itself) Use del_timer_sync(). It guarantees that when it returns, the timer will be stopped and the timer routine will no longer be running on any CPU. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/2] cpusets: add interleave_over_allowed option
On Fri, 26 Oct 2007, David Rientjes wrote: > Hacking and requiring an updated version of libnuma to allow empty > nodemasks to be passed is a poor solution; if mempolicy's are supposed to > be independent from cpusets, then what semantics does an empty nodemask > actually imply when using MPOL_INTERLEAVE? To me, it means the entire > set_mempolicy() should be a no-op, and that's exactly how mainline > currently treats it _as_well_ as libnuma. So justifying this change in > the man page is respectible, but passing an empty nodemask just doesn't > make sense. > Another reason that passing an empty nodemask to set_mempolicy() doesn't make sense is that libnuma uses numa_set_interleave_mask(_no_nodes) to disable interleaving completely. David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] DMA: Fix broken device refcounting
On Sat, 2007-10-27 at 06:49 -0700, Haavard Skinnemoen wrote: > On Fri, 26 Oct 2007 09:36:17 -0700 > Dan Williams <[EMAIL PROTECTED]> wrote: > > > @@ -221,7 +220,6 @@ void dma_chan_cleanup(struct kref *kref) > > { > > struct dma_chan *chan = container_of(kref, struct dma_chan, refcount); > > chan->device->device_free_chan_resources(chan); > > - kref_put(>device->refcount, dma_async_device_cleanup); > > } > > EXPORT_SYMBOL(dma_chan_cleanup); > > While I can't see any problems with the rest of the patch, I think this > part is wrong for the same reasons removing the kref_put() from the > class device cleanup function is. I don't see any constraint that > guarantees that dma_chan_cleanup() will always be called before > dma_dev_release(), which means that "chan" may have been freed before > this function gets a chance to run. Please correct me if I'm wrong. Absolutely right, the driver, not dmaengine, frees the memory so there must be a per channel reference on the device to hold off the driver's remove routine. > > Håvard So how about this... ---snip--- dmaengine: Fix broken device refcounting From: Haavard Skinnemoen <[EMAIL PROTECTED]> When a DMA device is unregistered, its reference count is decremented twice for each channel: Once dma_class_dev_release() and once in dma_chan_cleanup(). This may result in the DMA device driver's remove() function completing before all channels have been cleaned up, causing lots of use-after-free fun. Fix it by incrementing the device's reference count twice for each channel during registration. Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]> [EMAIL PROTECTED]: kill unnecessary client refcounting] Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/dma/dmaengine.c | 17 ++--- 1 files changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index 84257f7..ec7e871 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -186,10 +186,9 @@ static void dma_client_chan_alloc(struct dma_client *client) /* we are done once this client rejects * an available resource */ - if (ack == DMA_ACK) { + if (ack == DMA_ACK) dma_chan_get(chan); - kref_get(>refcount); - } else if (ack == DMA_NAK) + else if (ack == DMA_NAK) return; } } @@ -276,11 +275,8 @@ static void dma_clients_notify_removed(struct dma_chan *chan) /* client was holding resources for this channel so * free it */ - if (ack == DMA_ACK) { + if (ack == DMA_ACK) dma_chan_put(chan); - kref_put(>device->refcount, - dma_async_device_cleanup); - } } mutex_unlock(_list_mutex); @@ -320,11 +316,8 @@ void dma_async_client_unregister(struct dma_client *client) ack = client->event_callback(client, chan, DMA_RESOURCE_REMOVED); - if (ack == DMA_ACK) { + if (ack == DMA_ACK) dma_chan_put(chan); - kref_put(>device->refcount, - dma_async_device_cleanup); - } } list_del(>global_node); @@ -401,6 +394,8 @@ int dma_async_device_register(struct dma_device *device) goto err_out; } + /* One for the channel, one of the class device */ + kref_get(>refcount); kref_get(>refcount); kref_init(>refcount); chan->slow_ref = 0; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [linux-usb-devel] USB HCD: avoid duplicate local_irq_disable()
On Thu, 25 Oct 2007, Greg KH wrote: > On Tue, Oct 23, 2007 at 12:01:37AM -0400, Alan Stern wrote: > > On Tue, 23 Oct 2007, Arnd Bergmann wrote: > > > > > usb_hcd_flush_endpoint() has a retry loop that starts with a > > > spin_lock_irq(), > > > but only gives up the spinlock, not the irq_disable before jumping to the > > > rescan label. > > > > > > Split the spin_lock_irq into the retryable part and the > > > local_irq_disable() > > > that is only done once as a micro-optimization and slight cleanup. > > > > I agree with your sentiment, but it would be better to solve this > > problem without using local_irq_disable(). The patch below does this. > > > > --- > > > > Signed-off-by: Alan Stern <[EMAIL PROTECTED]> > > Alan, is this something you want added to the tree and in before 2.6.24 > is out? Yes. It's a small thing, but we're better off keeping IRQ enable/disable calls properly balanced. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix x86_64 TIF_SYSCALL_TRACE race in entry.S
Mathieu Desnoyers <[EMAIL PROTECTED]> writes: > We make sure that the thread flag read is coherent between our new test and > the ALLWORK_MASK test by first saving it in a register used for both > comparisons. > That doesn't make sense. If someone is setting those asynchronously you can always race. You should really just stop the process like ptrace does before changing such things. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc1: First impressions
Andrew Morton wrote: On Fri, 26 Oct 2007 21:33:40 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote: * Andrew Morton <[EMAIL PROTECTED]> wrote: so a final 'sync' should be added to the test too, and the time it takes factored into the bandwidth numbers? That's one way of doing it. Or just run the test for a "long" time. ie: much longer than (total-memory / disk-bandwidth). Probably the latter will give a more accurate result, but it can get boring. Longer might be less inaccurate, but without flushing the last data you really don't get best accuracy, you just reduce the error. Clearly doing fdatasync() is best, since other i/o caused by sync() can skew the results. -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Adding TIF_TRACE_KERNEL to x86_64
On Fri, Oct 26, 2007 at 03:37:38PM -0400, Mathieu Desnoyers wrote: > > 1 - process A enters in a syscall, TIF_KERNEL_TRACE is cleared > 2 - we activate TIF_KERNEL_TRACE > 3 - process A returns from syscall (with wrong top of stack ?) -> segfault. > > Am I on the right track ? Yes. The code was not designed to allow that. The syscall path has been optimized to go as fast as possible. > > Can this be a concern with TIF_SYSCALL_TRACE also ? (potential race in > ptrace ?) ptrace only changes state in stopped processes; and those are not hanging in syscalls. So no there is no race in a standard kernel. If you wanted to change it the right way would be probably to test for SYSCALL_TRACE too in the work flags and handle it there -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] ide/arm/icside: fallout from commit 86f3a492bb09eee5745b93af35f2212179c251fd
On Saturday 27 October 2007, Al Viro wrote: > struct device doesn't have ->dma; it's in struct expansion_card where > that struct device is embedded into. Yeah, your fix is obviously correct, thanks! Bart - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] bitops kernel-doc: inline instead of macro
Andy Whitcroft wrote: On Thu, Oct 25, 2007 at 01:48:14PM -0700, Andrew Morton wrote: Andy, I thought we were going to whine about __inline__ and __inline, too? Hmmm, I don't remember that coming up, but I'll add it to the todo. I am assuming plain 'inline' is preferred over both of these -- yell if you meant something else. Yes, use __inline__ if and only if it is exported to userspace. -hpa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel
--- Joshua Brindle <[EMAIL PROTECTED]> wrote: > Casey Schaufler wrote: > > The Smack patch and Paul Moore's netlabel API patch, > > together for 2.6.24-rc1. Paul's changes are identical > > to the previous posting, but it's been a while so they're > > here again. > > > > The sole intent of change has been to address locking > > and/or list processing issues. Please don't hesitate to > > point out any problems that you might see or suggest > > alternatives where things might not be to your liking. > > > > This version is aimed at 2.6.24, and has been tested > > against 2.6.24-rc1. > > > with both of these patches applied to 2.6.24-rc1 I get the following > oops when nfsd starts: > > BUG: unable to handle kernel NULL pointer dereference at virtual address > 013c > printing eip: c01d7e39 *pde = > Oops: [#1] SMP > > Thanks for the bug report. I have just discovered that it is possible to have a virtual disk go virtually bad under VMware. I will be looking at this as soon as I recover. Casey Schaufler [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] cirrusfb nonsense
(pointer > 0) is deeply weird; (pointer >= 0) is even dumber... Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- diff --git a/drivers/video/cirrusfb.c b/drivers/video/cirrusfb.c index f99cb77..f7e2d5a 100644 --- a/drivers/video/cirrusfb.c +++ b/drivers/video/cirrusfb.c @@ -2509,8 +2509,7 @@ static int cirrusfb_zorro_register(struct zorro_dev *z, cinfo = info->par; cinfo->btype = btype; - assert(z > 0); - assert(z2 >= 0); + assert(z); assert(btype != BT_NONE); cinfo->zdev = z; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] scatterlist fallout: drivers/scsi/arm/
Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- diff --git a/drivers/scsi/arm/scsi.h b/drivers/scsi/arm/scsi.h index 21ba571..2d09fef 100644 --- a/drivers/scsi/arm/scsi.h +++ b/drivers/scsi/arm/scsi.h @@ -39,7 +39,7 @@ static inline int next_SCp(struct scsi_pointer *SCp) SCp->buffer++; SCp->buffers_residual--; SCp->ptr = (char *) -(page_address(SCp->buffer->page) + +(page_address(sg_page(SCp->buffer)) + SCp->buffer->offset); SCp->this_residual = SCp->buffer->length; } else { @@ -77,7 +77,7 @@ static inline void init_SCp(struct scsi_cmnd *SCpnt) SCpnt->SCp.buffer = (struct scatterlist *) SCpnt->request_buffer; SCpnt->SCp.buffers_residual = SCpnt->use_sg - 1; SCpnt->SCp.ptr = (char *) -(page_address(SCpnt->SCp.buffer->page) + +(page_address(sg_page(SCpnt->SCp.buffer)) + SCpnt->SCp.buffer->offset); SCpnt->SCp.this_residual = SCpnt->SCp.buffer->length; SCpnt->SCp.phase = SCpnt->request_bufflen; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery is absent
On Sat, Oct 27, 2007 at 08:54:30PM +0400, Andrey Borzenkov wrote: > I am not exactly sure about this one ... what other power_supply class > drivers > do? Should I fix HAL instead (but then, I do not know whether HAL is the only > application that is using this interface). Well, PROP_PRESENT wasn't my idea, currently it's used by pmu and olpc drivers becuase it's not trivial to register/unregister their batteries on physical insertion/removal. I have some plans to teach at least pmu batteries to not use PROP_PRESENT. I don't have any OLPC, thus I can't convert it. To sum this: the good way to handle "missing" batteries is to unregister them, so they'll not show up in the /sys/class/power_supply. Is that possible with the ACPI? The good example is ds2760 batteries: drivers/power/ds2760_battery.c - is platform device drivers/w1/slaves/w1_ds2760.c - is w1 slave, which registers/unregisters pdevs on the detection/removal. > Subject: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery > is absent Bad idea. Don't use present attribute, if possible. > From: Andrey Borzenkov <[EMAIL PROTECTED]> > > Ensure that we always have "present" attribute in sysfs. This is compatible > with procfs case where we had "present: no" if battery was not available. > > This fixes HAL battery detection where it does pretend battery is present > but canot provide any value for it. > > Signed-off-by: Andrey Borzenkov <[EMAIL PROTECTED]> > > --- > > drivers/acpi/battery.c | 11 ++- > 1 files changed, 10 insertions(+), 1 deletions(-) > > diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c > index 681e26b..6e67fcd 100644 > --- a/drivers/acpi/battery.c > +++ b/drivers/acpi/battery.c > @@ -187,6 +187,10 @@ static int acpi_battery_get_property(struct power_supply > *psy, > return 0; > } > > +static enum power_supply_property missing_battery_props[] = { > + POWER_SUPPLY_PROP_PRESENT, > +}; > + > static enum power_supply_property charge_battery_props[] = { > POWER_SUPPLY_PROP_STATUS, > POWER_SUPPLY_PROP_PRESENT, > @@ -389,8 +393,13 @@ static int acpi_battery_update(struct acpi_battery > *battery) > { > int saved_present = acpi_battery_present(battery); > int result = acpi_battery_get_status(battery); > - if (result || !acpi_battery_present(battery)) > + if (result) > return result; > + if (!acpi_battery_present(battery)) { > + battery->bat.properties = missing_battery_props; > + battery->bat.num_properties = ARRAY_SIZE(missing_battery_props); > + return result; > + } > if (saved_present != acpi_battery_present(battery) || > !battery->update_time) { > battery->update_time = 0; -- Anton Vorontsov email: [EMAIL PROTECTED] backup email: [EMAIL PROTECTED] irc://irc.freenode.net/bd2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Dump stack during sysctl registration failure
On Sat, Oct 27, 2007 at 10:34:53PM +0400, wrote: > --- a/arch/powerpc/kernel/idle.c > +++ b/arch/powerpc/kernel/idle.c > @@ -122,7 +122,7 @@ static ctl_table powersave_nap_sysctl_root[] = { > { > .ctl_name = CTL_KERN, > .procname = "kernel", > - .mode = 0755, > + .mode = 0555, Speaking of... [PATCH] Dump stack during sysctl registration failure Let's make immediately obvious from where sysctl comes from and messages itself more noticeable. Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]> --- kernel/sysctl_check.c |1 + 1 file changed, 1 insertion(+) --- a/kernel/sysctl_check.c +++ b/kernel/sysctl_check.c @@ -1432,6 +1432,7 @@ static void set_fail(const char **fail, struct ctl_table *table, const char *str printk(KERN_ERR "sysctl table check failed: "); sysctl_print_path(table); printk(" %s\n", *fail); + dump_stack(); } *fail = str; } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] scatterlist fallout: mmc
#include is an odd thing to do... Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- drivers/mmc/host/au1xmmc.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/mmc/host/au1xmmc.c b/drivers/mmc/host/au1xmmc.c index b2104d4..c3926eb 100644 --- a/drivers/mmc/host/au1xmmc.c +++ b/drivers/mmc/host/au1xmmc.c @@ -40,7 +40,7 @@ #include #include #include -#include +#include #include #include -- 1.5.3.GIT - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ide/arm/icside: fallout from commit 86f3a492bb09eee5745b93af35f2212179c251fd
struct device doesn't have ->dma; it's in struct expansion_card where that struct device is embedded into. Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- diff --git a/drivers/ide/arm/icside.c b/drivers/ide/arm/icside.c index 410a0d1..93f71fc 100644 --- a/drivers/ide/arm/icside.c +++ b/drivers/ide/arm/icside.c @@ -316,13 +316,13 @@ static int icside_dma_end(ide_drive_t *drive) drive->waiting_for_dma = 0; - disable_dma(state->dev->dma); + disable_dma(ECARD_DEV(state->dev)->dma); /* Teardown mappings after DMA has completed. */ dma_unmap_sg(state->dev, hwif->sg_table, hwif->sg_nents, hwif->sg_dma_direction); - return get_dma_residue(state->dev->dma) != 0; + return get_dma_residue(ECARD_DEV(state->dev)->dma) != 0; } static void icside_dma_start(ide_drive_t *drive) @@ -331,8 +331,8 @@ static void icside_dma_start(ide_drive_t *drive) struct icside_state *state = hwif->hwif_data; /* We can not enable DMA on both channels simultaneously. */ - BUG_ON(dma_channel_active(state->dev->dma)); - enable_dma(state->dev->dma); + BUG_ON(dma_channel_active(ECARD_DEV(state->dev)->dma)); + enable_dma(ECARD_DEV(state->dev)->dma); } static int icside_dma_setup(ide_drive_t *drive) @@ -350,7 +350,7 @@ static int icside_dma_setup(ide_drive_t *drive) /* * We can not enable DMA on both channels. */ - BUG_ON(dma_channel_active(state->dev->dma)); + BUG_ON(dma_channel_active(ECARD_DEV(state->dev)->dma)); icside_build_sglist(drive, rq); @@ -367,14 +367,14 @@ static int icside_dma_setup(ide_drive_t *drive) /* * Select the correct timing for this drive. */ - set_dma_speed(state->dev->dma, drive->drive_data); + set_dma_speed(ECARD_DEV(state->dev)->dma, drive->drive_data); /* * Tell the DMA engine about the SG table and * data direction. */ - set_dma_sg(state->dev->dma, hwif->sg_table, hwif->sg_nents); - set_dma_mode(state->dev->dma, dma_mode); + set_dma_sg(ECARD_DEV(state->dev)->dma, hwif->sg_table, hwif->sg_nents); + set_dma_mode(ECARD_DEV(state->dev)->dma, dma_mode); drive->waiting_for_dma = 1; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc1 sysctl table check failed on PowerMac
On Sat, Oct 27, 2007 at 07:57:38PM +0200, Mikael Pettersson wrote: > Booting 2.6.24-rc1 on my PowerMac the kernel now spits > out a sysctl warning early in the boot sequence: > > --- dmesg-2.6.23 > +++ dmesg-2.6.24-rc1 > ... > IP route cache hash table entries: 32768 (order: 5, 131072 bytes) > TCP established hash table entries: 131072 (order: 8, 1048576 bytes) > TCP bind hash table entries: 65536 (order: 6, 262144 bytes) > TCP: Hash tables configured (established 131072 bind 65536) > TCP reno registered > +sysctl table check failed: /kernel .1 Writable sysctl directory [PATCH] powerpc: fix sysctl whining re kernel.powersave-nap kernel was marked with 0755. Everywhere else it's 0555. Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]> --- arch/powerpc/kernel/idle.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/powerpc/kernel/idle.c +++ b/arch/powerpc/kernel/idle.c @@ -122,7 +122,7 @@ static ctl_table powersave_nap_sysctl_root[] = { { .ctl_name = CTL_KERN, .procname = "kernel", - .mode = 0755, + .mode = 0555, .child = powersave_nap_ctl_table, }, {} - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] scatterlist fallout: frv
Signed-off-by: Al Viro <[EMAIL PROTECTED]> --- diff --git a/arch/frv/mb93090-mb00/pci-dma.c b/arch/frv/mb93090-mb00/pci-dma.c index 671ce1e..662f7b1 100644 --- a/arch/frv/mb93090-mb00/pci-dma.c +++ b/arch/frv/mb93090-mb00/pci-dma.c @@ -15,6 +15,7 @@ #include #include #include +#include #include void *dma_alloc_coherent(struct device *hwdev, size_t size, dma_addr_t *dma_handle, gfp_t gfp) @@ -86,7 +87,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, dampr2 = __get_DAMPR(2); for (i = 0; i < nents; i++) { - vaddr = kmap_atomic(sg[i].page, __KM_CACHE); + vaddr = kmap_atomic(sg_page([i]), __KM_CACHE); frv_dcache_writeback((unsigned long) vaddr, (unsigned long) vaddr + PAGE_SIZE); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: RAID 10 w AHCI w NCQ = Spurius I/O error
Nestor A. Diaz wrote: Hello People, I need your help, this problem is turning me crazy. Did you know there is a raid list? I have created a RAID 10 using a RAID0 configuration on top of a two RAID1 devices (all software raid), like this: You have created a raid 0+1, raid10 is a different thing. Given your setup, raid10 is probably what you *should* have created. Personalities : [raid0] [raid1] md4 : active raid0 md2[0] md3[1] 605071872 blocks 64k chunks md0 : active raid1 sdd3[3] sda3[0] sdc3[2] sdb3[1] 9791552 blocks [4/4] [] md3 : active raid1 sdd2[2](F) sdb2[0] 302536000 blocks [2/1] [U_] md1 : active raid1 sdd1[3] sda1[0] sdc1[2] sdb1[1] 240832 blocks [4/4] [] md2 : active raid1 sda2[0] sdc2[1] 302536000 blocks [2/2] [UU] unused devices: But the sdd device sometimes fail, i have changed the hard disk, check the older sata drive, reformat using mke2fs -c -c (to check for media errrors both read and write, no media problems found, change the sata disk and the problem remains, also with a new sata hard disk). The systema is a supermicro server 5015-mt+ with an ich7 ahci controller [___snip___] The RAID 1 builds perfectly, but five days after that, the system shows a: end_request: I/O error, dev sdd, sector 144006110 raid1: Disk failure on sdd2, disabling device. Operation continuing on 1 devices end_request: I/O error, dev sdd, sector 144006222 end_request: I/O error, dev sdd, sector 144268814 RAID1 conf printout: --- wd:1 rd:2 disk 0, wo:0, o:1, dev:sdb2 disk 1, wo:1, o:0, dev:sdd2 RAID1 conf printout: --- wd:1 rd:2 disk 0, wo:0, o:1, dev:sdb2 Hardware error, almost certainly. If you're using a hub, I suspect that first, then cables and heat problems, then the controller, in rough order of likelyhood. a week before i get (under 2.6.18) the following message: [___lots more snip___] I have updated from 2.6.18 to 2.6.22 expecting to not have the problem, but the problem remains and i didn't know what could be the problem, the problem always happen on /dev/sdd, i use LVM on top of the RAID 10 software device. I am not sure if the problem was because i create the RAID10 using two RAID1 devices and then do a RAID0, or should i have to be used mdadm and the level 10 option ? Any suggestions will be welcome. Do you ever get errors in partitions which are not part of the raid0+1 setup, like md1? If not, look at your partition tables to see if you have any strange values there. Are all drives at the same firmware level? -- Bill Davidsen <[EMAIL PROTECTED]> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery is absent
Andrey Borzenkov wrote: > On Saturday 27 October 2007, Alexey Starikovskiy wrote: >> Andrey Borzenkov wrote: >>> I am not exactly sure about this one ... what other power_supply class >>> drivers do? Should I fix HAL instead (but then, I do not know whether HAL >>> is the only application that is using this interface). >> Hm, do you need separate set of properties for that? You could register >> either of existing two, and read function will not allow read of anything >> but "present". IMHO, this is what other modules do (/drivers/power) > > Do they have different set of properties depending on underlying hardware > that > you can't query unless hardware is present? I'd rather avoid adding fake > attributes; but I do not actually care so which one do you prefer? :) I prefer less code :) > >> One remaining trick here, you need to call unregister/register for >> power_supply if you change attributes -- so please check if your patched >> driver survives insertion of the battery. >> > > > Neither does your code (nor kpowersave :) ) Remove battery and set of > attributes is "stuck" instead of being reset to only fixed set of power > device attributes (basically "info"). The only call to power_supply_register > is in acpi_battery_add and as far as I can tell this is executed on adding > *slot* not when content of this slot changes. Point is -- it does not break :) If you start to play with shorter length of attributes table and don't call unregister/register, power_supply may go past the end of table. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/