Re: [PATCH 2/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel

2007-10-27 Thread Casey Schaufler

--- Al Viro <[EMAIL PROTECTED]> wrote:

> On Sat, Oct 27, 2007 at 11:01:12AM +0200, Ahmed S. Darwish wrote:
> > The problem here (As discussed in private mails) is that the for loop 
> > assumes that the beginning of given user-space buffer is the beginning
> > of a rule. This leads to situations where the rule becomes "ecret 20",
> > or "cret 20" instead of "Secret 20". Big input buffers/files leads 
> > smack to recieve a rule like "Secret 20" in fragmented chunks like:
> > 
> > write("\nSec", ..)
> > write("r", 1, ..)
> > write("et 20\n", ..)
> > 
> > Parsing a rule in such tough conditions in _kernel space_ is very
> > hard. I began to feel that it will be much easier if we do the parsing 
> > in a userspace utility and let smack accept only small buffers (80 char). 
> 
> For crying out louf, all it takes is a finite state machine...  BTW, folks,
> your parser *and* input language suck.  Really.  Silently allowing noise
> is Dumb(tm).

I was planning to patient Dumb, but if you've already got a copywrite
on it I'll have to go a different route. Carp.

> Please, write the grammar down and _follow_ _it_.

Good idea. I'll do it.

>  ...
> 
> Come on, people, this is ridiculous - why bother reinventing the wheels for
> the stuff that belongs to exercises halfway through any self-respecting
> introductory textbook?  Scary parser, my arse...

That code is in need of cleansing. With bleach. Yes, it's on my list.
I managed to get my virtual disk back, found how I messed up on v9,
and now have a (long) list of improvements and fixes.

No rest for the wicked.


Casey Schaufler
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fujitsu application panel driver

2007-10-27 Thread Dmitry Torokhov
Hi Stephen,

On Tuesday 23 October 2007 15:55, Stephen Hemminger wrote:
> +
> +static int apanel_setkeycode(struct input_dev *idev, int scancode, int 
> keycode)
> +{
> + struct apanel *ap = idev->private;
> +
> + if (keycode < 0 || keycode > KEY_MAX)
> + return -EINVAL;
> +
> + if (scancode < 0 || scancode >= MAX_PANEL_KEYS)
> + return -EINVAL;

scancode >= idev->keycodemax is prbably better here - we don't want to
allow setting keycode for unsupported buttons.

> +
> + clear_bit(ap->keymap[scancode], idev->keybit);

This will not work if one has same code assigned to 2 buttons. Pretty
degenerate case, I know...

> + ap->keymap[scancode] = keycode;
> + set_bit(keycode, idev->keybit);
> + return 0;
> +}

-- 
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: UML building failed in current Linus-tree

2007-10-27 Thread WANG Cong
On Sun, Oct 28, 2007 at 03:43:43AM +, Al Viro wrote:
>On Sun, Oct 28, 2007 at 11:24:41AM +0800, WANG Cong wrote:
>> 
>> Hi, Jeff, Sam!
>> 
>> I just pulled from Linus-tree, and got the following error when building uml.
>> 
>> $ make defconfig ARCH=um
>> /home/wangcong/projects/linux-2.6/arch/um/Makefile-i386:32: 
>> /home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu: No such file or 
>> directory
>> make: *** No rule to make target 
>> `/home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu'.  Stop.
>> 
>> Is this a known problem? Yesterday's Linus-tree was fine.
>
>diff --git a/arch/um/Kconfig.i386 b/arch/um/Kconfig.i386
>index 9876d80..e0ac74e 100644
>--- a/arch/um/Kconfig.i386
>+++ b/arch/um/Kconfig.i386

Thanks, Al.

With your patch, it works fine.

Jeff, could you please consider pushing this and Al's another patch[1] into
Linus-tree? UML in Linus-tree can't work for a long time. ;(

[1] http://lkml.org/lkml/2007/10/21/118

Regards.

WANG Cong

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] scatterlist fallout: drivers/scsi/arm/

2007-10-27 Thread Linus Torvalds


On Sat, 27 Oct 2007, Russell King wrote:
> 
> FYI, here's what I have queued (cut'n'pasted so probably won't apply):

Yes, much nicer.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Sanitize the type of struct user.u_ar0

2007-10-27 Thread H. Peter Anvin
struct user.u_ar0 is defined to contain a pointer offset on all
architectures in which it is defined (all architectures which define
an a.out format except SPARC.)  However, it has a pointer type in the
headers, which is pointless --  is not exported to
userspace, and it just makes the code messy.

Redefine the field as "unsigned long" (which is the same size as a
pointer on all Linux architectures) and change the setting code to
user offsetof() instead of hand-coded arithmetic.

Cc: Linux Arch Mailing List <[EMAIL PROTECTED]>
Cc: Bryan Wu <[EMAIL PROTECTED]>
Cc: Roman Zippel <[EMAIL PROTECTED]>
Cc: Thomas Gleixner <[EMAIL PROTECTED]>
Cc: Ingo Molnar <[EMAIL PROTECTED]>
Cc: Richard Henderson <[EMAIL PROTECTED]>
Cc: Ivan Kokshaysky <[EMAIL PROTECTED]>
Cc: Russell King <[EMAIL PROTECTED]>
Cc: Lennert Buytenhek <[EMAIL PROTECTED]>
Cc: Håvard Skinnemoen <[EMAIL PROTECTED]>
Cc: Mikael Starvik <[EMAIL PROTECTED]>
Cc: Yoshinori Sato <[EMAIL PROTECTED]>
Cc: Tony Luck <[EMAIL PROTECTED]>
Cc: Hirokazu Takata <[EMAIL PROTECTED]>
Cc: Ralf Baechle <[EMAIL PROTECTED]>
Cc: Paul Mackerras <[EMAIL PROTECTED]>
Cc: Martin Schwidefsky <[EMAIL PROTECTED]>
Cc: Heiko Carstens <[EMAIL PROTECTED]>
Cc: Paul Mundt <[EMAIL PROTECTED]>
Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
---
 arch/blackfin/kernel/process.c |2 +-
 arch/m68k/kernel/process.c |2 +-
 arch/x86/ia32/ia32_aout.c  |2 +-
 fs/binfmt_aout.c   |2 +-
 include/asm-alpha/user.h   |2 +-
 include/asm-arm/user.h |2 +-
 include/asm-avr32/user.h   |2 +-
 include/asm-blackfin/user.h|2 +-
 include/asm-cris/user.h|2 +-
 include/asm-h8300/user.h   |3 +--
 include/asm-ia64/user.h|2 +-
 include/asm-m32r/user.h|2 +-
 include/asm-m68k/user.h|3 +--
 include/asm-mips/user.h|2 +-
 include/asm-powerpc/user.h |2 +-
 include/asm-s390/user.h|3 +--
 include/asm-sh/user.h  |2 +-
 include/asm-sh64/user.h|2 +-
 include/asm-v850/user.h|2 +-
 include/asm-x86/user_32.h  |2 +-
 include/asm-x86/user_64.h  |2 +-
 21 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 9124467..5c08004 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -257,7 +257,7 @@ void dump_thread(struct pt_regs *regs, struct user *dump)
((unsigned long)(TASK_SIZE -
 dump->start_stack)) >> PAGE_SHIFT;
 
-   dump->u_ar0 = (struct user_regs_struct *)((int)>regs - (int)dump);
+   dump->u_ar0 = offsetof(struct user, regs);
 
dump->regs.r0 = regs->r0;
dump->regs.r1 = regs->r1;
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index 3ee9186..f85b928 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -335,7 +335,7 @@ void dump_thread(struct pt_regs * regs, struct user * dump)
if (dump->start_stack < TASK_SIZE)
dump->u_ssize = ((unsigned long) (TASK_SIZE - 
dump->start_stack)) >> PAGE_SHIFT;
 
-   dump->u_ar0 = (struct user_regs_struct *)((int)>regs - (int)dump);
+   dump->u_ar0 = offsetof(struct user, regs);
sw = ((struct switch_stack *)regs) - 1;
dump->regs.d1 = regs->d1;
dump->regs.d2 = regs->d2;
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 731aac1..c489fee 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -162,7 +162,7 @@ static int aout_core_dump(long signr, struct pt_regs *regs, 
struct file *file, u
has_dumped = 1;
current->flags |= PF_DUMPCORE;
strncpy(dump.u_comm, current->comm, sizeof(current->comm));
-   dump.u_ar0 = (u32)(((unsigned long)()) - ((unsigned 
long)()));
+   dump.u_ar0 = offsetof(struct user32, regs);
dump.signal = signr;
dump_thread32(regs, );
 
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index e176d19..9a00063 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -115,7 +115,7 @@ static int aout_core_dump(long signr, struct pt_regs *regs, 
struct file *file, u
current->flags |= PF_DUMPCORE;
strncpy(dump.u_comm, current->comm, sizeof(dump.u_comm));
 #ifndef __sparc__
-   dump.u_ar0 = (void *)(((unsigned long)()) - ((unsigned 
long)()));
+   dump.u_ar0 = offsetof(struct user, regs);
 #endif
dump.signal = signr;
dump_thread(regs, );
diff --git a/include/asm-alpha/user.h b/include/asm-alpha/user.h
index 7e417fc..a4eb6a4 100644
--- a/include/asm-alpha/user.h
+++ b/include/asm-alpha/user.h
@@ -39,7 +39,7 @@ struct user {
unsigned long   start_data; /* data starting address */
unsigned long   start_stack;/* stack starting address */
long intsignal; /* signal causing core dump */
-   

Re: [2.6 patch] input/serio/hp_sdc.c section fix

2007-10-27 Thread Dmitry Torokhov
On Wednesday 24 October 2007 12:26, Adrian Bunk wrote:
> hp_sdc_exit() mustn't be __exit since it's called from the
> __init hp_sdc_register().
> 

Applied, thank you Adrian.

-- 
Dmitry
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: epoll design problems with common fork/exec patterns

2007-10-27 Thread David Schwartz

> 6) Epoll removes the file from the set, when the *kernel* object gets
>closed (internal use-count goes to zero)
>
> With that in mind, how can the code snippet above trigger a removal from
> the epoll set?

I don't see how that can be. Suppose I add fd 8 to an epoll set. 
Suppose fd
5 is a dup of fd 8. Now, I close fd 8. How can fd 8 remain in my epoll set,
since there no longer is an fd 8? Events on files registered for epoll
notification are reported by descriptor, so the set membership has to be
associated (as reflected into userspace) with the descriptor, not the file.

For example, consider:

1) Process creates an epoll set, the set gets fd 4.

2) Process creates a socket, it gets fd 5.

3) The process adds fd 5 to set 4.

4) The process forks.

5) The child inherits the epoll set but not the socket.

Here the kernel cannot quite do the right thing. Ideally, the parent 
would
still have fd 5 in its version of the epoll set. After all, it has not
closed fd 5. However, the child *cannot* see fd 5 in its version of the
epoll set since it has no fd 5. An event reported for fd 5 would be
nonsense.

So it seems the kernel either has to break one of these "would/cannot"
requirements, or it has to split the epoll set in two. However, splitting
the set into two sets is clearly wrong since the processes should share it.

  Q6 Will  the  close of an fd cause it to be removed from
all
 epoll sets automatically?

  A6 Yes.

Note that this talks of the close of an "fd", not a file. The 'close'
function in fact closes an fd, as that fd is then reusable. So it sounds
like the problem above is solved by removing the fd from the set, but in
practice this doesn't happen. I have programs that call 'close' between
'fork' and 'exec' and do not see the socket removed from the poll set.

DS


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Dump stack during sysctl registration failure

2007-10-27 Thread Eric W. Biederman
Alexey Dobriyan <[EMAIL PROTECTED]> writes:

> On Sat, Oct 27, 2007 at 10:34:53PM +0400,  wrote:
>> --- a/arch/powerpc/kernel/idle.c
>> +++ b/arch/powerpc/kernel/idle.c
>> @@ -122,7 +122,7 @@ static ctl_table powersave_nap_sysctl_root[] = {
>>  {
>>  .ctl_name   = CTL_KERN,
>>  .procname   = "kernel",
>> -.mode   = 0755,
>> +.mode   = 0555,
>
> Speaking of...
>
>
> [PATCH] Dump stack during sysctl registration failure
>
> Let's make immediately obvious from where sysctl comes from and
> messages itself more noticeable.
>
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>

Acked-by: "Eric W. Biederman" <[EMAIL PROTECTED]>

For ambiguous cases like directories this looks like a serious
help.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: UML building failed in current Linus-tree

2007-10-27 Thread Al Viro
On Sun, Oct 28, 2007 at 11:24:41AM +0800, WANG Cong wrote:
> 
> Hi, Jeff, Sam!
> 
> I just pulled from Linus-tree, and got the following error when building uml.
> 
> $ make defconfig ARCH=um
> /home/wangcong/projects/linux-2.6/arch/um/Makefile-i386:32: 
> /home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu: No such file or 
> directory
> make: *** No rule to make target 
> `/home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu'.  Stop.
> 
> Is this a known problem? Yesterday's Linus-tree was fine.

diff --git a/arch/um/Kconfig.i386 b/arch/um/Kconfig.i386
index 9876d80..e0ac74e 100644
--- a/arch/um/Kconfig.i386
+++ b/arch/um/Kconfig.i386
@@ -1,6 +1,6 @@
 menu "Host processor type and features"
 
-source "arch/i386/Kconfig.cpu"
+source "arch/x86/Kconfig.cpu"
 
 endmenu
 
diff --git a/arch/um/Makefile-i386 b/arch/um/Makefile-i386
index 08433f8..b01dfb0 100644
--- a/arch/um/Makefile-i386
+++ b/arch/um/Makefile-i386
@@ -28,7 +28,7 @@ CONFIG_X86_32 := y
 export CONFIG_X86_32
 
 # First of all, tune CFLAGS for the specific CPU. This actually sets cflags-y.
-include $(srctree)/arch/i386/Makefile.cpu
+include $(srctree)/arch/x86/Makefile_32.cpu
 
 # prevent gcc from keeping the stack 16 byte aligned. Taken from i386.
 cflags-y += $(call cc-option,-mpreferred-stack-boundary=2)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 09/10] SLUB: Do our own locking via slab_lock and slab_unlock.

2007-10-27 Thread Christoph Lameter
Too many troubles with the bitlocks and we really do not need
to do any bitops. Bitops do not effectively retrieve the old
value which we want. So use a cmpxchg instead on the arches
that allow it.

Instead of modifying the page->flags with fake atomic operations
we pass the page state as a parameter to functions. No function uses
the slab state if the page lock is held. Function must wait until the
lock is cleared. Thus we can defer the update of page->flags until slab
processing is complete. The "unlock" operation is then simply updating
page->flags.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>


---
 mm/slub.c |  324 +++---
 1 file changed, 187 insertions(+), 137 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-27 07:56:03.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-27 07:56:52.0 -0700
@@ -101,6 +101,7 @@
  */
 
 #define FROZEN (1 << PG_active)
+#define LOCKED (1 << PG_locked)
 
 #ifdef CONFIG_SLUB_DEBUG
 #define SLABDEBUG (1 << PG_error)
@@ -108,36 +109,6 @@
 #define SLABDEBUG 0
 #endif
 
-static inline int SlabFrozen(struct page *page)
-{
-   return page->flags & FROZEN;
-}
-
-static inline void SetSlabFrozen(struct page *page)
-{
-   page->flags |= FROZEN;
-}
-
-static inline void ClearSlabFrozen(struct page *page)
-{
-   page->flags &= ~FROZEN;
-}
-
-static inline int SlabDebug(struct page *page)
-{
-   return page->flags & SLABDEBUG;
-}
-
-static inline void SetSlabDebug(struct page *page)
-{
-   page->flags |= SLABDEBUG;
-}
-
-static inline void ClearSlabDebug(struct page *page)
-{
-   page->flags &= ~SLABDEBUG;
-}
-
 /*
  * Issues still to be resolved:
  *
@@ -818,11 +789,12 @@ static void trace(struct kmem_cache *s, 
 /*
  * Tracking of fully allocated slabs for debugging purposes.
  */
-static void add_full(struct kmem_cache *s, struct page *page)
+static void add_full(struct kmem_cache *s, struct page *page,
+   unsigned long state)
 {
struct kmem_cache_node *n = get_node(s, page_to_nid(page));
 
-   if (!SlabDebug(page) || !(s->flags & SLAB_STORE_USER))
+   if (!(state & SLABDEBUG) || !(s->flags & SLAB_STORE_USER))
return;
spin_lock(>list_lock);
list_add(>lru, >full);
@@ -894,7 +866,7 @@ bad:
 }
 
 static noinline int free_debug_processing(struct kmem_cache *s,
-   struct page *page, void *object, void *addr)
+   struct page *page, void *object, void *addr, unsigned long state)
 {
if (!check_slab(s, page))
goto fail;
@@ -930,7 +902,7 @@ static noinline int free_debug_processin
}
 
/* Special debug activities for freeing objects */
-   if (!SlabFrozen(page) && page->freelist == page->end)
+   if (!(state & FROZEN) && page->freelist == page->end)
remove_full(s, page);
if (s->flags & SLAB_STORE_USER)
set_track(s, object, TRACK_FREE, addr);
@@ -1047,7 +1019,8 @@ static inline int slab_pad_check(struct 
{ return 1; }
 static inline int check_object(struct kmem_cache *s, struct page *page,
void *object, int active) { return 1; }
-static inline void add_full(struct kmem_cache *s, struct page *page) {}
+static inline void add_full(struct kmem_cache *s, struct page *page,
+   unsigned long state) {}
 static inline unsigned long kmem_cache_flags(unsigned long objsize,
unsigned long flags, const char *name,
void (*ctor)(struct kmem_cache *, void *))
@@ -1105,6 +1078,7 @@ static noinline struct page *new_slab(st
void *start;
void *last;
void *p;
+   unsigned long state;
 
BUG_ON(flags & GFP_SLAB_BUG_MASK);
 
@@ -1117,11 +1091,12 @@ static noinline struct page *new_slab(st
if (n)
atomic_long_inc(>nr_slabs);
page->slab = s;
-   page->flags |= 1 << PG_slab;
+   state = 1 << PG_slab;
if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
SLAB_STORE_USER | SLAB_TRACE))
-   SetSlabDebug(page);
+   state |= SLABDEBUG;
 
+   page->flags |= state;
start = page_address(page);
page->end = start + 1;
 
@@ -1147,13 +1122,13 @@ static void __free_slab(struct kmem_cach
 {
int pages = 1 << s->order;
 
-   if (unlikely(SlabDebug(page))) {
+   if (unlikely(page->flags & SLABDEBUG)) {
void *p;
 
slab_pad_check(s, page);
for_each_object(p, s, slab_address(page))
check_object(s, page, p, 0);
-   ClearSlabDebug(page);
+   page->flags &= ~SLABDEBUG;
}
 
mod_zone_page_state(page_zone(page),
@@ -1196,27 +1171,73 @@ static void discard_slab(struct kmem_cac
free_slab(s, 

[patch 10/10] SLUB: Restructure slab alloc

2007-10-27 Thread Christoph Lameter
Restructure slab_alloc so that the code flows in the sequence
it is usually executed.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/slub.c |   40 
 1 file changed, 24 insertions(+), 16 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-27 07:58:07.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-27 07:58:36.0 -0700
@@ -1580,16 +1580,28 @@ static void *__slab_alloc(struct kmem_ca
local_irq_save(flags);
preempt_enable_no_resched();
 #endif
-   if (!c->page)
-   goto new_slab;
+   if (likely(c->page)) {
+   state = slab_lock(c->page);
+
+   if (unlikely(node_match(c, node) &&
+   c->page->freelist != c->page->end))
+   goto load_freelist;
+
+   deactivate_slab(s, c, state);
+   }
+
+another_slab:
+   state = get_partial(s, c, gfpflags, node);
+   if (!state)
+   goto grow_slab;
 
-   state = slab_lock(c->page);
-   if (unlikely(!node_match(c, node)))
-   goto another_slab;
 load_freelist:
-   object = c->page->freelist;
-   if (unlikely(object == c->page->end))
-   goto another_slab;
+   /*
+* slabs from the partial list must have at least
+* one free object.
+*/
+   VM_BUG_ON(c->page->freelist == c->page->end);
+
if (unlikely(state & SLABDEBUG))
goto debug;
 
@@ -1607,20 +1619,16 @@ out:
 #endif
return object;
 
-another_slab:
-   deactivate_slab(s, c, state);
-
-new_slab:
-   state = get_partial(s, c, gfpflags, node);
-   if (state)
-   goto load_freelist;
-
+/* Extend the slabcache with a new slab */
+grow_slab:
state = get_new_slab(s, , gfpflags, node);
if (state)
goto load_freelist;
 
object = NULL;
goto out;
+
+/* Perform debugging */
 debug:
object = c->page->freelist;
if (!alloc_debug_processing(s, c->page, object, addr))

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 07/10] SLUB: Avoid referencing kmem_cache structure in __slab_alloc

2007-10-27 Thread Christoph Lameter
There is the need to use the objects per slab in the first part of
__slab_alloc() which is still pretty hot. Copy the number of objects
per slab into the kmem_cache_cpu structure. That way we can get the
value from a cache line that we already need to touch. This brings
the kmem_cache_cpu structure up to 4 even words.

There is no increase in the size of kmem_cache_cpu since the size of object
is rounded to the next word.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/slub_def.h |1 +
 mm/slub.c|3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/slub_def.h
===
--- linux-2.6.orig/include/linux/slub_def.h 2007-10-26 19:09:16.0 
-0700
+++ linux-2.6/include/linux/slub_def.h  2007-10-27 07:55:12.0 -0700
@@ -17,6 +17,7 @@ struct kmem_cache_cpu {
int node;
unsigned int offset;
unsigned int objsize;
+   unsigned int objects;
 };
 
 struct kmem_cache_node {
Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-27 07:52:12.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-27 07:55:12.0 -0700
@@ -1512,7 +1512,7 @@ load_freelist:
 
object = c->page->freelist;
c->freelist = object[c->offset];
-   c->page->inuse = s->objects;
+   c->page->inuse = c->objects;
c->page->freelist = c->page->end;
c->node = page_to_nid(c->page);
 unlock_out:
@@ -1896,6 +1896,7 @@ static void init_kmem_cache_cpu(struct k
c->node = 0;
c->offset = s->offset / sizeof(void *);
c->objsize = s->objsize;
+   c->objects = s->objects;
 }
 
 static void init_kmem_cache_node(struct kmem_cache_node *n)

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 08/10] SLUB: Optional fast path using cmpxchg_local

2007-10-27 Thread Christoph Lameter
Provide an alternate implementation of the SLUB fast paths for alloc
and free using cmpxchg_local. The cmpxchg_local fast path is selected
for arches that have CONFIG_FAST_CMPXCHG_LOCAL set. An arch should only
set CONFIG_FAST_CMPXCHG_LOCAL if the cmpxchg_local is faster than an
interrupt enable/disable sequence. This is known to be true for both
x86 platforms so set FAST_CMPXCHG_LOCAL for both arches.

Not all arches can support fast cmpxchg operations. Typically the
architecture must have an optimized cmpxchg instruction. The
cmpxchg fast path makes no sense on platforms whose cmpxchg is
slower than interrupt enable/disable (like f.e. IA64).

The advantages of a cmpxchg_local based fast path are:

1. Lower cycle count (30%-60% faster)

2. There is no need to disable and enable interrupts on the fast path.
   Currently interrupts have to be disabled and enabled on every
   slab operation. This is likely saving a significant percentage
   of interrupt off / on sequences in the kernel.

3. The disposal of freed slabs can occur with interrupts enabled.

The alternate path is realized using #ifdef's. Several attempts to do the
same with macros and in line functions resulted in a mess (in particular due
to the strange way that local_interrupt_save() handles its argument and due
to the need to define macros/functions that sometimes disable interrupts
and sometimes do something else. The macro based approaches made it also
difficult to preserve the optimizations for the non cmpxchg paths).

#ifdef seems to be the way to go here to have a readable source.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 arch/x86/Kconfig.i386   |4 ++
 arch/x86/Kconfig.x86_64 |4 ++
 mm/slub.c   |   71 ++--
 3 files changed, 77 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-27 10:39:07.583665939 -0700
+++ linux-2.6/mm/slub.c 2007-10-27 10:40:19.710415861 -0700
@@ -1496,7 +1496,12 @@ static void *__slab_alloc(struct kmem_ca
 {
void **object;
struct page *new;
+#ifdef CONFIG_FAST_CMPXCHG_LOCAL
+   unsigned long flags;
 
+   local_irq_save(flags);
+   preempt_enable_no_resched();
+#endif
if (!c->page)
goto new_slab;
 
@@ -1518,6 +1523,10 @@ load_freelist:
 unlock_out:
slab_unlock(c->page);
 out:
+#ifdef CONFIG_FAST_CMPXCHG_LOCAL
+   preempt_disable();
+   local_irq_restore(flags);
+#endif
return object;
 
 another_slab:
@@ -1592,9 +1601,26 @@ static void __always_inline *slab_alloc(
gfp_t gfpflags, int node, void *addr)
 {
void **object;
-   unsigned long flags;
struct kmem_cache_cpu *c;
 
+#ifdef CONFIG_FAST_CMPXCHG_LOCAL
+   c = get_cpu_slab(s, get_cpu());
+   do {
+   object = c->freelist;
+   if (unlikely(is_end(object) || !node_match(c, node))) {
+   object = __slab_alloc(s, gfpflags, node, addr, c);
+   if (unlikely(!object)) {
+   put_cpu();
+   goto out;
+   }
+   break;
+   }
+   } while (cmpxchg_local(>freelist, object, object[c->offset])
+   != object);
+   put_cpu();
+#else
+   unsigned long flags;
+
local_irq_save(flags);
c = get_cpu_slab(s, smp_processor_id());
if (unlikely((is_end(c->freelist)) || !node_match(c, node))) {
@@ -1609,6 +1635,7 @@ static void __always_inline *slab_alloc(
c->freelist = object[c->offset];
}
local_irq_restore(flags);
+#endif
 
if (unlikely((gfpflags & __GFP_ZERO)))
memset(object, 0, c->objsize);
@@ -1644,6 +1671,11 @@ static void __slab_free(struct kmem_cach
void *prior;
void **object = (void *)x;
 
+#ifdef CONFIG_FAST_CMPXCHG_LOCAL
+   unsigned long flags;
+
+   local_irq_save(flags);
+#endif
slab_lock(page);
 
if (unlikely(SlabDebug(page)))
@@ -1669,6 +1701,9 @@ checks_ok:
 
 out_unlock:
slab_unlock(page);
+#ifdef CONFIG_FAST_CMPXCHG_LOCAL
+   local_irq_restore(flags);
+#endif
return;
 
 slab_empty:
@@ -1679,6 +1714,9 @@ slab_empty:
remove_partial(s, page);
 
slab_unlock(page);
+#ifdef CONFIG_FAST_CMPXCHG_LOCAL
+   local_irq_restore(flags);
+#endif
discard_slab(s, page);
return;
 
@@ -1703,9 +1741,37 @@ static void __always_inline slab_free(st
struct page *page, void *x, void *addr)
 {
void **object = (void *)x;
-   unsigned long flags;
struct kmem_cache_cpu *c;
 
+#ifdef CONFIG_FAST_CMPXCHG_LOCAL
+   void **freelist;
+
+   c = get_cpu_slab(s, get_cpu());
+   debug_check_no_locks_freed(object, s->objsize);
+  

[patch 06/10] SLUB: Provide unique end marker for each slab

2007-10-27 Thread Christoph Lameter
Currently we use the NULL pointer to signal that there are no more objects.
However the NULL pointers of all slabs match in contrast to the
pointers to the real objects which are distinctive for each slab page.

Change the end pointer to be simply a pointer to the first object with
bit 0 set. That way all end pointers are different for each slab. This
is necessary to allow reliable end marker identification for the
next patch that implements a fast path without the need to disable interrupts.

Bring back the use of the mapping field by SLUB since we would otherwise
have to call a relatively expensive function page_address() in __slab_alloc().
Use of the mapping field allows avoiding calling page_address() in various
other functions as well.

There is no need to change the page_mapping() function since bit 0 is
set on the mapping as also for anonymous pages. page_mapping(slab_page)
will therefore still return NULL although the mapping field is overloaded.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/mm_types.h |5 ++-
 mm/slub.c|   72 ++-
 2 files changed, 51 insertions(+), 26 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-26 19:09:03.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-27 07:52:12.0 -0700
@@ -277,15 +277,32 @@ static inline struct kmem_cache_cpu *get
 #endif
 }
 
+/*
+ * The end pointer in a slab is special. It points to the first object in the
+ * slab but has bit 0 set to mark it.
+ *
+ * Note that SLUB relies on page_mapping returning NULL for pages with bit 0
+ * in the mapping set.
+ */
+static inline int is_end(void *addr)
+{
+   return (unsigned long)addr & PAGE_MAPPING_ANON;
+}
+
+void *slab_address(struct page *page)
+{
+   return page->end - PAGE_MAPPING_ANON;
+}
+
 static inline int check_valid_pointer(struct kmem_cache *s,
struct page *page, const void *object)
 {
void *base;
 
-   if (!object)
+   if (object == page->end)
return 1;
 
-   base = page_address(page);
+   base = slab_address(page);
if (object < base || object >= base + s->objects * s->size ||
(object - base) % s->size) {
return 0;
@@ -318,7 +335,8 @@ static inline void set_freepointer(struc
 
 /* Scan freelist */
 #define for_each_free_object(__p, __s, __free) \
-   for (__p = (__free); __p; __p = get_freepointer((__s), __p))
+   for (__p = (__free); (__p) != page->end; __p = get_freepointer((__s),\
+   __p))
 
 /* Determine object index from a given position */
 static inline int slab_index(void *p, struct kmem_cache *s, void *addr)
@@ -470,7 +488,7 @@ static void slab_fix(struct kmem_cache *
 static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
 {
unsigned int off;   /* Offset of last byte */
-   u8 *addr = page_address(page);
+   u8 *addr = slab_address(page);
 
print_tracking(s, p);
 
@@ -648,7 +666,7 @@ static int slab_pad_check(struct kmem_ca
if (!(s->flags & SLAB_POISON))
return 1;
 
-   start = page_address(page);
+   start = slab_address(page);
end = start + (PAGE_SIZE << s->order);
length = s->objects * s->size;
remainder = end - (start + length);
@@ -715,7 +733,7 @@ static int check_object(struct kmem_cach
 * of the free objects in this slab. May cause
 * another error because the object count is now wrong.
 */
-   set_freepointer(s, p, NULL);
+   set_freepointer(s, p, page->end);
return 0;
}
return 1;
@@ -749,18 +767,18 @@ static int on_freelist(struct kmem_cache
void *fp = page->freelist;
void *object = NULL;
 
-   while (fp && nr <= s->objects) {
+   while (fp != page->end && nr <= s->objects) {
if (fp == search)
return 1;
if (!check_valid_pointer(s, page, fp)) {
if (object) {
object_err(s, page, object,
"Freechain corrupt");
-   set_freepointer(s, object, NULL);
+   set_freepointer(s, object, page->end);
break;
} else {
slab_err(s, page, "Freepointer corrupt");
-   page->freelist = NULL;
+   page->freelist = page->end;
page->inuse = s->objects;
slab_fix(s, "Freelist cleared");
return 0;
@@ -870,7 +888,7 @@ bad:
 */
slab_fix(s, "Marking all objects used");
  

[patch 00/10] SLUB: SMP regression tests on Dual Xeon E5345 (8p) and new performance patches

2007-10-27 Thread Christoph Lameter
Recent reports from Intel indicated that there were regression on
SMP benchmarks vs. SLAB. This is a discussion of performance results
and some patches are attached to fix various issues.
The patches are also available via git pull from

git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git performance


SLAB and SLUB are fundamentally different architectures. SLAB batches multiple
objects on queues. The movement between queues is protected by a single lock
(at least in the SMP configuration). SLAB can move an arbitrary amount of
objects by taking the list_lock. Integration of objects into the slabs
is deferred as much as possible while objects circle on various slab
queues.

SLUB's design is to directly integrate or extract the objects from the
slabs without going through intermediate queues. Thus the overhead is
eliminated. SLUB has a lock in each slab allowing fine grained locking.
Centralized locks are rarely taken. SLUB cannot batch objects to
optimize lock use. Instead a whole slab is assigned to a processor.
Allocations and frees can then occur from the CPU slab without taking
the slab lock. However, that is limited to the number of objects that
fit into a slab in contrast to SLAB which can extract objects from
multiple slabs and put them on a per CPU queue.

If SLUB is freeing an objects then the per CPU slab can only be used if the
object is part of the CPU slab. This is usually the case for short lived
allocations. Long lived allocations and objects allocated on other CPUs
will need to use the slow path where the slab_lock must be taken to
synchronize the free. This makes the slab_free() path particularly
problematic in SMP contexts.

Optimization in SLUB is therefore mainly optimization of locking
and of the execution code paths. The following patches optimize
locking further by using a cmpxchg_local in the fast path and
by avoiding stores to page struct fields etc to address regressions
that we see under SMP.

Another fundamental distinction between SLAB and SLUB is that SLAB
was designed with SMP in mind. NUMA was a later add-on that added
a significant complexity. SLUB was written for NUMA. NUMA support is
native. The same slab_free() path that is problematic under SMP is
effectively dealing with the alien cache problem that SLAB has under NUMA
and is increasing performance of remote free operations significantly.
The cpu_slab concept makes the determination of NUMA locality of objects
simpler since we can match on the page that an object belongs to and move
the whole page of objects in a NUMA aware fashion instead of the individual
objects in the queues of SLAB.

The fine grained locking is also important for SMP system with a large number
of processors. SLAB can put lots of objects on its queues. However, current
processors can take a large number of objects off the queues in a short
time period. As a result we see significant lock contention using SLAB during
parallel operations on the 8p SMP machine that is investigated here. SLAB has
less problems scaling on NUMA with a more limited number of processors per node
because SLAB will then use node based locks instead of global locks.

Tests were run with 4 different kernels:

SLAB = 2.6.24-rc1 configured to run SLAB
SLUB = 2.6.24-rc1 configured to run SLUB
SLUB+ = 2.6.24-rc1 patched with the following patches.
SLUB-o = SLUB+ booted with slub_max_order=3 slub_min_objects=20

The SLAB tests result in the baseline to work against. SLUB is the
current state of 2.6.24. SLUB+ is an version of SLUB that was optimized
to run on the 8p SMP box after observing some of the performance issues.
SLUB-o is useful to see what effect the use of higher order pages has
on performance.

All tests are done by running 1 operations on each processor. The time
needed is measured using TSC times tamps.

All measurements are in cycle counts. The higher the cycle count the more
time the allocator needs to perform an operation. The lower the count the
better the performance of the allocator.

Test A: Single threaded kmalloc
===

A single cpu is running and is allocating 1 objects of various
sizes.

SLABSLUBSLUB+   SLUB-o
   896  86  45  44  2 *
  1684  92  49  48  
  3284  106 61  59  +++
  64102 129 82  88  ++
 128147 226 188 181 --
 256200 248 207 285 -
 512300 301 260 209 ++
1024416 440 398 264 ++
2048720 542 530 390 +++
40961254342 342 336 3 *

SLUB passes 4k allocations directly through to the page allocator which
is more efficient at handling page sized allocations than SLABs handling
of them. 4k (or page sized) allocations will be special throughout these
tests.

We see a performance degradation vs. SLAB in the middle range that
is reduced by the patch set.

The cmpxchg_local operation used in SLUB+ 

[patch 05/10] SLUB: __slab_alloc() exit path consolidation

2007-10-27 Thread Christoph Lameter
Use a single exit path by using goto's to the hottest exit path.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/slub.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-25 19:38:14.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-25 19:38:47.0 -0700
@@ -1493,7 +1493,9 @@ load_freelist:
c->page->inuse = s->objects;
c->page->freelist = NULL;
c->node = page_to_nid(c->page);
+unlock_out:
slab_unlock(c->page);
+out:
return object;
 
 another_slab:
@@ -1541,7 +1543,8 @@ new_slab:
c->page = new;
goto load_freelist;
}
-   return NULL;
+   object = NULL;
+   goto out;
 debug:
object = c->page->freelist;
if (!alloc_debug_processing(s, c->page, object, addr))
@@ -1550,8 +1553,7 @@ debug:
c->page->inuse++;
c->page->freelist = object[c->offset];
c->node = -1;
-   slab_unlock(c->page);
-   return object;
+   goto unlock_out;
 }
 
 /*

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 04/10] SLUB: Avoid checking for a valid object before zeroing on the fast path

2007-10-27 Thread Christoph Lameter
The fast path always results in a valid object. Move the check
for the NULL pointer to the slow branch that calls
__slab_alloc. Only __slab_alloc can return NULL if there is no
memory available anymore and that case is exceedingly rare.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/slub.c |   13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-25 19:37:38.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-25 19:38:14.0 -0700
@@ -1573,19 +1573,22 @@ static void __always_inline *slab_alloc(
 
local_irq_save(flags);
c = get_cpu_slab(s, smp_processor_id());
-   if (unlikely(!c->freelist || !node_match(c, node)))
+   if (unlikely(!c->freelist || !node_match(c, node))) {
 
object = __slab_alloc(s, gfpflags, node, addr, c);
-
-   else {
+   if (unlikely(!object)) {
+   local_irq_restore(flags);
+   goto out;
+   }
+   } else {
object = c->freelist;
c->freelist = object[c->offset];
}
local_irq_restore(flags);
 
-   if (unlikely((gfpflags & __GFP_ZERO) && object))
+   if (unlikely((gfpflags & __GFP_ZERO)))
memset(object, 0, c->objsize);
-
+out:
return object;
 }
 

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 03/10] SLUB: Move kmem_cache_node determination into add_full and add_partial

2007-10-27 Thread Christoph Lameter
The kmem_cache_node determination can be moved into add_full()
and add_partial(). This removes some code from the slab_free()
slow path and reduces the register overhead that has to be managed
in the slow path.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/slub.c |   29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-25 19:36:59.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-25 19:37:38.0 -0700
@@ -800,8 +800,12 @@ static void trace(struct kmem_cache *s, 
 /*
  * Tracking of fully allocated slabs for debugging purposes.
  */
-static void add_full(struct kmem_cache_node *n, struct page *page)
+static void add_full(struct kmem_cache *s, struct page *page)
 {
+   struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+
+   if (!SlabDebug(page) || !(s->flags & SLAB_STORE_USER))
+   return;
spin_lock(>list_lock);
list_add(>lru, >full);
spin_unlock(>list_lock);
@@ -1025,7 +1029,7 @@ static inline int slab_pad_check(struct 
{ return 1; }
 static inline int check_object(struct kmem_cache *s, struct page *page,
void *object, int active) { return 1; }
-static inline void add_full(struct kmem_cache_node *n, struct page *page) {}
+static inline void add_full(struct kmem_cache *s, struct page *page) {}
 static inline unsigned long kmem_cache_flags(unsigned long objsize,
unsigned long flags, const char *name,
void (*ctor)(struct kmem_cache *, void *))
@@ -1198,9 +1202,11 @@ static __always_inline int slab_trylock(
 /*
  * Management of partially allocated slabs
  */
-static void add_partial(struct kmem_cache_node *n,
+static void add_partial(struct kmem_cache *s,
struct page *page, int tail)
 {
+   struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+
spin_lock(>list_lock);
n->nr_partial++;
if (tail)
@@ -1336,19 +1342,18 @@ static struct page *get_partial(struct k
  */
 static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
 {
-   struct kmem_cache_node *n = get_node(s, page_to_nid(page));
-
ClearSlabFrozen(page);
if (page->inuse) {
 
if (page->freelist)
-   add_partial(n, page, tail);
-   else if (SlabDebug(page) && (s->flags & SLAB_STORE_USER))
-   add_full(n, page);
+   add_partial(s, page, tail);
+   else
+   add_full(s, page);
slab_unlock(page);
 
} else {
-   if (n->nr_partial < MIN_PARTIAL) {
+   if (get_node(s, page_to_nid(page))->nr_partial
+   < MIN_PARTIAL) {
/*
 * Adding an empty slab to the partial slabs in order
 * to avoid page allocator overhead. This slab needs
@@ -1357,7 +1362,7 @@ static void unfreeze_slab(struct kmem_ca
 * partial list stays small. kmem_cache_shrink can
 * reclaim empty slabs from the partial list.
 */
-   add_partial(n, page, 1);
+   add_partial(s, page, 1);
slab_unlock(page);
} else {
slab_unlock(page);
@@ -1633,7 +1638,7 @@ checks_ok:
 * then add it.
 */
if (unlikely(!prior))
-   add_partial(get_node(s, page_to_nid(page)), page, 0);
+   add_partial(s, page, 0);
 
 out_unlock:
slab_unlock(page);
@@ -2041,7 +2046,7 @@ static struct kmem_cache_node *early_kme
 #endif
init_kmem_cache_node(n);
atomic_long_inc(>nr_slabs);
-   add_partial(n, page, 0);
+   add_partial(kmalloc_caches, page, 0);
return n;
 }
 

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 02/10] SLUB: Noinline some functions to avoid them being folded into alloc/free

2007-10-27 Thread Christoph Lameter
Some function tend to get folded into __slab_free and __slab_alloc
although they are rarely called. They cause register pressure that
leads to bad code generation.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/slub.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-25 19:36:10.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-25 19:36:59.0 -0700
@@ -831,8 +831,8 @@ static void setup_object_debug(struct km
init_tracking(s, object);
 }
 
-static int alloc_debug_processing(struct kmem_cache *s, struct page *page,
-   void *object, void *addr)
+static noinline int alloc_debug_processing(struct kmem_cache *s,
+   struct page *page, void *object, void *addr)
 {
if (!check_slab(s, page))
goto bad;
@@ -871,8 +871,8 @@ bad:
return 0;
 }
 
-static int free_debug_processing(struct kmem_cache *s, struct page *page,
-   void *object, void *addr)
+static noinline int free_debug_processing(struct kmem_cache *s,
+   struct page *page, void *object, void *addr)
 {
if (!check_slab(s, page))
goto fail;
@@ -1075,7 +1075,8 @@ static void setup_object(struct kmem_cac
s->ctor(s, object);
 }
 
-static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
+static noinline struct page *new_slab(struct kmem_cache *s,
+   gfp_t flags, int node)
 {
struct page *page;
struct kmem_cache_node *n;
@@ -1209,7 +1210,7 @@ static void add_partial(struct kmem_cach
spin_unlock(>list_lock);
 }
 
-static void remove_partial(struct kmem_cache *s,
+static noinline void remove_partial(struct kmem_cache *s,
struct page *page)
 {
struct kmem_cache_node *n = get_node(s, page_to_nid(page));

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 01/10] SLUB: Consolidate add_partial and add_partial_tail to one function

2007-10-27 Thread Christoph Lameter
Add a parameter to add_partial instead of having separate functions.
That allows the detailed control from multiple places when putting
slabs back to the partial list. If we put slabs back to the front
then they are likely immediately used for allocations. If they are
put at the end then we can maximize the time that the partial slabs
spent without allocations.

When deactivating slab we can put the slabs that had remote objects freed
to them at the end of the list so that the cache lines can cool down.
Slabs that had objects from the local cpu freed to them are put in the
front of the list to be reused ASAP in order to exploit the cache hot state.

[This patch is already in mm]

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
---
 mm/slub.c |   31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

Index: linux-2.6/mm/slub.c
===
--- linux-2.6.orig/mm/slub.c2007-10-24 08:33:01.0 -0700
+++ linux-2.6/mm/slub.c 2007-10-24 09:19:52.0 -0700
@@ -1197,19 +1197,15 @@
 /*
  * Management of partially allocated slabs
  */
-static void add_partial_tail(struct kmem_cache_node *n, struct page *page)
+static void add_partial(struct kmem_cache_node *n,
+   struct page *page, int tail)
 {
spin_lock(>list_lock);
n->nr_partial++;
-   list_add_tail(>lru, >partial);
-   spin_unlock(>list_lock);
-}
-
-static void add_partial(struct kmem_cache_node *n, struct page *page)
-{
-   spin_lock(>list_lock);
-   n->nr_partial++;
-   list_add(>lru, >partial);
+   if (tail)
+   list_add_tail(>lru, >partial);
+   else
+   list_add(>lru, >partial);
spin_unlock(>list_lock);
 }
 
@@ -1337,7 +1333,7 @@
  *
  * On exit the slab lock will have been dropped.
  */
-static void unfreeze_slab(struct kmem_cache *s, struct page *page)
+static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
 {
struct kmem_cache_node *n = get_node(s, page_to_nid(page));
 
@@ -1345,7 +1341,7 @@
if (page->inuse) {
 
if (page->freelist)
-   add_partial(n, page);
+   add_partial(n, page, tail);
else if (SlabDebug(page) && (s->flags & SLAB_STORE_USER))
add_full(n, page);
slab_unlock(page);
@@ -1360,7 +1356,7 @@
 * partial list stays small. kmem_cache_shrink can
 * reclaim empty slabs from the partial list.
 */
-   add_partial_tail(n, page);
+   add_partial(n, page, 1);
slab_unlock(page);
} else {
slab_unlock(page);
@@ -1375,6 +1371,7 @@
 static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
struct page *page = c->page;
+   int tail = 1;
/*
 * Merge cpu freelist into freelist. Typically we get here
 * because both freelists are empty. So this is unlikely
@@ -1383,6 +1380,8 @@
while (unlikely(c->freelist)) {
void **object;
 
+   tail = 0;   /* Hot objects. Put the slab first */
+
/* Retrieve object from cpu_freelist */
object = c->freelist;
c->freelist = c->freelist[c->offset];
@@ -1393,7 +1392,7 @@
page->inuse--;
}
c->page = NULL;
-   unfreeze_slab(s, page);
+   unfreeze_slab(s, page, tail);
 }
 
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
@@ -1633,7 +1632,7 @@
 * then add it.
 */
if (unlikely(!prior))
-   add_partial(get_node(s, page_to_nid(page)), page);
+   add_partial(get_node(s, page_to_nid(page)), page, 0);
 
 out_unlock:
slab_unlock(page);
@@ -2041,7 +2040,7 @@
 #endif
init_kmem_cache_node(n);
atomic_long_inc(>nr_slabs);
-   add_partial(n, page);
+   add_partial(n, page, 0);
return n;
 }
 

-- 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


UML building failed in current Linus-tree

2007-10-27 Thread WANG Cong

Hi, Jeff, Sam!

I just pulled from Linus-tree, and got the following error when building uml.

$ make defconfig ARCH=um
/home/wangcong/projects/linux-2.6/arch/um/Makefile-i386:32: 
/home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu: No such file or 
directory
make: *** No rule to make target 
`/home/wangcong/projects/linux-2.6/arch/i386/Makefile.cpu'.  Stop.

Is this a known problem? Yesterday's Linus-tree was fine.

Regards.

-- 
May the Source Be With You.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Mapping PCI memory to user-space

2007-10-27 Thread Roland Dreier
 > I am writing a driver to map a PCI board memory space (pcibar2) into a
 > user-space vma via 'mmap'.  What is the relationship between the address
 > returned from ioremap and the type of address needed in the
 > 'io_remap_page_range' or 'remap_pfn_range' functions?  How about the
 > following? (I am developing under RHEL4 and a 2.6.9 kernel)

There is no relationship between the address returned from ioremap and
what you pass into io_remap_page_range().  ioremap gives you a kernel
virtual address for the PCI address you remap.  io_remap_page_range()
creates a userspace mapping in the same way, and you should pass in
the PCI address exactly the same way you pass in the PCI address into
ioremap.  io_remap_pfn_range() takes a PFN ("page frame number"),
which is basically the PCI address you want to map divided by
PAGE_SIZE.  The main reason for using PFNs is that they allow you to
map addresses above 4G even if sizeof long is only 4.

In your code:

 >  dev.pcibar2 = ioremap_nocache(resource,size);
 >  dev.region_start = dev.pcibar2 + offset; // RAM is at some offset 
 > from base

This gives you a kernel mapping that you can use with readl(),
writel() etc to access the PCI memory from the kernel.

To map to userspace, this:

 >   if (io_remap_page_range(vma, phyaddr, vma->vm_start, vsize, 
 > vma->vm_page_prot))

should use phyaddr as you have it here:

 >   // phyaddr   = physical address of PCI memory area

This is just wrong:

 >   unsigned long phy= __pa(dev->region_start + off);

__pa() doesn't work on addresses returned from ioremap.  Just use the
same resource address you passed into ioremap.

 - R.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] [POWERPC] Fix CONFIG_SMP=n build break

2007-10-27 Thread Olof Johansson
On Sun, Oct 28, 2007 at 12:38:50PM +1100, Stephen Rothwell wrote:
> Hi Olof,
> 
> Just a trivial thing ...
> 
> On Sat, 27 Oct 2007 12:28:51 -0500 Olof Johansson <[EMAIL PROTECTED]> wrote:
> >
> > +u8 iic_get_target_id(int cpu)
> > +{
> > +   return per_cpu(iic, cpu).target_id;
> > +}
> > +
> > +EXPORT_SYMBOL_GPL(iic_get_target_id);
> 
> We don't normally put a blank line between a function and its EXPORT...

Yeah, sloppy of me, I thought I just copied and pasted. Paulus: feel
free to fix up before applying.


Thanks for your feedback,

-Olof
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] [POWERPC] Fix CONFIG_SMP=n build break

2007-10-27 Thread Stephen Rothwell
Hi Olof,

Just a trivial thing ...

On Sat, 27 Oct 2007 12:28:51 -0500 Olof Johansson <[EMAIL PROTECTED]> wrote:
>
> +u8 iic_get_target_id(int cpu)
> +{
> + return per_cpu(iic, cpu).target_id;
> +}
> +
> +EXPORT_SYMBOL_GPL(iic_get_target_id);

We don't normally put a blank line between a function and its EXPORT...

-- 
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/


pgpTqpBpC9CIc.pgp
Description: PGP signature


Re: [PATCH 1/2] irq_flags_t: intro and core annotations

2007-10-27 Thread Herbert Xu
Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
>
>irq_flags_t flags;
> 
>flags = spin_lock_irqXXX();
>spin_unlock_irqYYY(, flags);
> 
> where XXX and YYY are still to be found good names :^) It's also a solution

How about flags?

flags = spin_lock_irqflags();
spin_unlock_irqflags(, flags);

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: eradicating out of tree modules (was: Linux Security *Module* Framework)

2007-10-27 Thread Adrian Bunk
On Sat, Oct 27, 2007 at 04:07:41PM +0200, Tilman Schmidt wrote:
> Greg KH schrieb:
> > On Fri, Oct 26, 2007 at 11:46:39AM +0200, Tilman Schmidt wrote:
> >> [...] I still think there will always be
> >> a number of external modules that cannot be merged right now or at
> >> all, and deliberately making life difficult for out-of-tree code
> >> maintainers in order to coerce them into submitting their code for
> >> inclusion in the kernel will not work, it'll only create bad
> >> feelings.
> > 
> > Do you have examples of proof of this?
> 
> No proof in the legal, mathematical or scientific sense of the
> term, but examples:
> 
> - at least one talented kernel developer giving up his work,
> until then maintained out of tree, after submitting it for
> inclusion in the kernel and taking the ensuing fla^Wdiscussion
> on LKML (nothing extraordinary, just the usual lack of
> courtesy and respect) too much to heart
>...

There's one important point to note:

In a project of the size of the Linux kernel (at about 2000 distinct 
people contributing code within one year) you will always lose 
developers:

If you require too much from code for getting it included you lose some 
of the people who develop code.

If you accept code of dubious quality you lose some of the people who 
care about the quality of the kernel.

And if you add a stable API for modules with not GPL compatible licences 
at least one untalented kernel developer (me) might give up his work.

If your goal is to please all developers you have a goal you can't 
achieve.

The only reasonable way is to accept that whatever you do you'll lose 
some people and go in the direction you consider the right one.

And the power of open source is that when an open source project gets 
into a direction many people dislike they can simply fork it. Consider 
e.g. XFree86->X.Org or NetBSD->OpenBSD. And that's nothing bad - either 
the forks develop in different directions creating different useful 
software or there's an evolutionary contest for the best software.

> T.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: eradicating out of tree modules (was: : Linux Security *Module* Framework)

2007-10-27 Thread Adrian Bunk
On Sat, Oct 27, 2007 at 04:47:15PM +0200, Tilman Schmidt wrote:
> Adrian Bunk schrieb:
> > On Fri, Oct 26, 2007 at 11:46:39AM +0200, Tilman Schmidt wrote:
> >> On Thu, 25 Oct 2007 19:56:47 -0700, Greg KH wrote:
> >>> On Fri, Oct 26, 2007 at 01:09:14AM +0200, Tilman Schmidt wrote:
>  [...] Once you admit that there is code which, for very good
>  reasons, won't ever be accepted into the mainline kernel tree, what you
>  are saying amounts to: "Code that isn't fit to be included in the
>  mainline kernel isn't fit to exist at all."
> >>> What kind of code is not accepted into the mainline kernel tree for good
> >>> reasons?
> >> - proprietary code
> > 
> > It's unclear whether distributing not GPL compatible modules is legal
> > at all.
> 
> We're neither talking about distribution nor legal aspects, but
> about existence. But anyway, you seem to agree with me that there
> are very good reasons for not including these in the kernel.
> 
> > And they are definitely not "very good reasons" for doing anything in 
> > the kernel.
> 
> There is a big difference between "not doing anything to help"
> and "actively doing something to make life difficult for". The
> former is undoubtedly legitimate. It's the latter we're
> discussing here.

Justifying anything with code with not GPL compatible licences has zero 
relevance here.

And there's value in making life harder for such modules with 
questionable legality. As an example, consider people who experienced 
crashes of "the Linux kernel" caused by some binary-only driver.
Not that uncommon e.g. with some graphics drivers.
This harms the reputation of Linux as being stable.

The solution is not to support proprietary drivers, the solution is to 
get open source replacements.

>...
> >> - code conflicting with existing kernel structure or policy
> >> - code in which the concerned subsystem maintainers see no benefit
> > 
> > Let's fix the problems, not work around them.
> 
> That's certainly better, but not always possible. Do you
> agree with me that if it isn't, then that's a very good
> reason for not including that code in the kernel?

No, it's still a reason for fixing the real problem.

> > There is a conflict between getting code included and ensuring some 
> > minimum quality of the kernel, but in many cases we could try better.
> 
> Correct. Again, you appear to agree with my statement that
> for some code there are very good reasons not to include it
> in the kernel.

But this does not result in any obligation of supporting low quality 
external code that destabilizes the kernel of people using it.

If it's low quality code doing something useful - well, how many hundred 
people are on Greg's list only waiting for some driver they could write?

> > And when there's a good reason for a kernel policy, then code that
> > violates this policy is not a "very good reason" for anything.
> 
> >> - code which its author is unable and/or unwilling to convert to
> >>   kernel coding standards
> >> - code whose author is unable and/or unwilling to defend it on LKML
> >> ...
> > 
> > That's their fault, and definitely not a "very good reason" for making 
> > life easier for them.
> 
> Putting aside the fruitless question of whose fault it is,
> is it a "very good reason" for actively making life more
> difficult for them than it is already, eg. by gratuitiously
> breaking interfaces they rely on for no other "very good
> reason" than to discourage out-of-tree development? In other
> words, do you think it benefits the Linux community if you
> discourage those programmers you've already scared away from
> submitting their code to the kernel from continuing their
> work off-tree, too? In summary, do you think the world would
> be a better place if all the existing out-of-tree modules
> just ceased to exist, without any replacement?

With your "without any replacement" you needlessly excluded the 
reasonable solution:

The solution is that someone other than the author either takes the 
existing external code or rewrites it from scratch, submits it for 
inclusion into the kernel, and maintains it there.

Let me repeat that Greg has said he has hundreds of volunteers for such 
tasks.

> T.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


struct user . u_ar0

2007-10-27 Thread H. Peter Anvin
I recently noticed that all architectures appear to have an entry n 
struct user called u_ar0:


  struct user_pt_regs * u_ar0;  /* Used by gdb to help find the values 
for */

/* the registers. */

In all cases, u_ar0 is a pointer type, although the type of pointer 
varies with the architecture.


However, under no conditions does this field ever contain a pointer 
value!  It is set by the a.out code and its derivatives as an offset, 
not a pointer value (there are a total of four references in the kernel, 
in arch/{m68k,blackfin}/kernel/process.c, arch/x86/ia32/ia32_aout.c and 
fs/binfmt_aout.c -- they are all functionally identical and write-only):


dump.u_ar0 = (void *)(((unsigned long)()) - ((unsigned 
long)()));


Any reason to *NOT* change this field to "unsigned long"?  , 
where struct user is defined, is not exported to userspace in any 
architecture as far as I can tell, although , which just 
contains #include , *is* exported (clearly a bug.)


-hpa


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-27 Thread H. Peter Anvin

Mike Waychison wrote:

The following series is meant to clean up FIBMAP paths with the eventual goal 
of allowing users to be able to FIBMAP their data.


Keep in mind FIBMAP is currently extremely expensive on some 
filesystems, e.g. ext3.  Therefore, additional filesystem-level work 
would have to be done in order for this not become a DoS issue.


-hpa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kconfig: conf segfault (with invalid kconfig contents)

2007-10-27 Thread Roman Zippel
Hi,

On Thursday 25 October 2007, Sam Ravnborg wrote:

> > It's clearly invalid in that it depends on what it selects, but it should
> > just abort instead.
>
> Thanks for the report and especially for the testcase!
> I will try to look at it a bit later if noone bites me (I'm afraid not).

Well, you're also responsible for it. :)
http://lkml.org/lkml/2007/5/6/14

It actually finds the recursive dependency, but it crashes because sym->prop 
is NULL.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


USB_SUSPEND needs SUSPEND?

2007-10-27 Thread Oliver Martin
Hello,

it seems that there is some dependency missing for CONFIG_USB_SUSPEND.
When I compile a kernel without CONFIG_SUSPEND, USB suspend doesn't work
(or at least doesn't expose its interface to userspace).
/sys/bus/usb/devices/.../power doesn't exist, but it is there when I
enable CONFIG_SUSPEND. So should CONFIG_USB_SUSPEND depend on
CONFIG_SUSPEND (or something selected by CONFIG_SUSPEND)?

diff between the two configs follows, full config and dmesg for both
kernels is available at http://stud4.tuwien.ac.at/~e0626486/usb_suspend/
(though the dmesg diff doesn't seem to be all that interesting).


--- /boot/config-2.6.23.1-nosuspend
+++ /boot/config-2.6.23.1-suspend
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux kernel version: 2.6.23.1
-# Sat Oct 27 23:59:11 2007
+# Sat Oct 27 22:45:27 2007
 #
 CONFIG_X86_32=y
 CONFIG_GENERIC_TIME=y
@@ -30,7 +30,7 @@
 CONFIG_EXPERIMENTAL=y
 CONFIG_BROKEN_ON_SMP=y
 CONFIG_INIT_ENV_ARG_LIMIT=32
-CONFIG_LOCALVERSION="-nosuspend"
+CONFIG_LOCALVERSION="-suspend"
 # CONFIG_LOCALVERSION_AUTO is not set
 CONFIG_SWAP=y
 CONFIG_SYSVIPC=y
@@ -229,11 +229,13 @@
 CONFIG_PM=y
 CONFIG_PM_LEGACY=y
 # CONFIG_PM_DEBUG is not set
+CONFIG_PM_SLEEP=y
 CONFIG_SUSPEND_UP_POSSIBLE=y
-# CONFIG_SUSPEND is not set
+CONFIG_SUSPEND=y
 CONFIG_HIBERNATION_UP_POSSIBLE=y
 # CONFIG_HIBERNATION is not set
 CONFIG_ACPI=y
+CONFIG_ACPI_SLEEP=y
 CONFIG_ACPI_PROCFS=y
 CONFIG_ACPI_PROC_EVENT=y
 CONFIG_ACPI_AC=m
@@ -255,6 +257,7 @@
 CONFIG_X86_PM_TIMER=y
 CONFIG_ACPI_CONTAINER=m
 CONFIG_ACPI_SBS=m
+# CONFIG_APM is not set

 #
 # CPU Frequency scaling


-- 
Oliver
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel

2007-10-27 Thread Al Viro
On Sat, Oct 27, 2007 at 11:01:12AM +0200, Ahmed S. Darwish wrote:
> The problem here (As discussed in private mails) is that the for loop 
> assumes that the beginning of given user-space buffer is the beginning
> of a rule. This leads to situations where the rule becomes "ecret 20",
> or "cret 20" instead of "Secret 20". Big input buffers/files leads 
> smack to recieve a rule like "Secret 20" in fragmented chunks like:
> 
> write("\nSec", ..)
> write("r", 1, ..)
> write("et 20\n", ..)
> 
> Parsing a rule in such tough conditions in _kernel space_ is very
> hard. I began to feel that it will be much easier if we do the parsing 
> in a userspace utility and let smack accept only small buffers (80 char). 

For crying out louf, all it takes is a finite state machine...  BTW, folks,
your parser *and* input language suck.  Really.  Silently allowing noise
is Dumb(tm).

Please, write the grammar down and _follow_ _it_.  AFAICS, trimming the
crap ("we have a number, skip everything until '/', whatever noise we
have there doesn't matter") leads to something like

text: (whitespace line? \n)*
whitespace: [ \t]*
line: label whitespace number whitespace (/ whitespace set whitespace)?
set: number (whitespace , whitespace number)*
label: [!-.0-~]{1,23}
number: [0-9]+

and even that might be too liberal.  For fsck sake, all you need is to
keep a struct that would contain
* state
* (partial) number
* list of smack_known, with the first element being the partial one.
* number of characters already seen in label (label itself stored
in the list head ->smk_known)
and that's it - just have a switch by ->state to handle the next character.
Allocate that struct in ->open(), modify in ->write(), apply the entire thing
at once in ->release().

Come on, people, this is ridiculous - why bother reinventing the wheels for
the stuff that belongs to exercises halfway through any self-respecting
introductory textbook?  Scary parser, my arse...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [build bug, 2.6.24-rc1] CONFIG_VIDEO_DEV=m & CONFIG_VIDEO_SAA7146_VV=y

2007-10-27 Thread Randy Dunlap
On Sat, 27 Oct 2007 23:30:57 +0200 Johannes Stezenbach wrote:

> On Fri, Oct 26, 2007, Ingo Molnar wrote:
> > 
> > the attached config (generated via make randconfig) fails to build due 
> > to the combination of these config entries:
> > 
> >   CONFIG_VIDEO_DEV=m
> >   CONFIG_VIDEO_SAA7146_VV=y
> > 
> > i found no obvious Kconfig way to force VIDEO_SAA7146_VV to be modular 
> > when VIDEO_DEV is modular - is there a good solution for this?
> 
> According to http://lkml.org/lkml/2007/10/21/226 :
> 
> config VIDEO_SAA7146_VV
>   tristate
>   depends on VIDEO_DEV = y || VIDEO_DEV = VIDEO_SAA7146_VV
>   select VIDEOBUF_DMA_SG
>   select VIDEO_SAA7146
> 
> (untested)

Nope, won't work.  I tried that last night.  VIDEO_DEV_SAA7146_VV
has too many "select"s involved, but select doesn't follow the dependency
chains.  IOW, as written in Documentation/kbuild/kconfig-language.txt:

select is evil select will by brute force set a symbol
equal to 'y' without visiting the dependencies. So abusing
select you are able to select a symbol FOO even if FOO depends
on BAR that is not set.


---
~Randy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: tristate and bool not enogh for Kconfig anymore

2007-10-27 Thread Roman Zippel
Hi,

On Monday 22 October 2007, Randy Dunlap wrote:

> ~
> Another common idiom that we see (and sometimes have problems
> with) is this:
>
> When B (module or subsystem) uses interfaces from A (module or
> subsystem), A can be linked statically into the kernel image or
> can be built as loadable module(s).   This limits how B can be
> built.  If A is linked statically into the kernel image, B can be
> built statically or as loadable module(s).  However, if A is built
> as loadable module(s), then B must be restricted to loadable
> module(s) also.  This can be expressed in kconfig language as:
>
> config B
>   depends on A = y || A = B

What you describe is a simple "depends on A" and your example won't work 
because it adds a recursive dependency.

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread Paul Jackson
David wrote:
> I think there's a mixup in the flag name [MPOL_MF_RELATIVE] there

Most likely.  The discussion involving that flag name was kinda mixed up ;).

> but I actually would recommend against any flag to effect Choice A.
> It's simply going to be too complex to describe and is going to be a
> headache to code and support. 

While I am sorely tempted to agree entirely with this, I suspect that
Christoph has a point when he cautions against breaking this kernel API.

Especially for users of the set/get mempolicy calls coming in via
libnuma, we have to be very careful not to break the current behaviour,
whether it is documented API or just an accident of the implementation.

There is a fairly deep and important stack of software, involving a
well known DBMS product whose name begins with 'O', sitting on that
libnuma software stack.  Steering that solution stack is like steering
a giant oil tanker near shore.  You take it slow and easy, and listen
closely to the advice of the ancient harbor master.  The harbor masters
in this case are or were Andi Kleen and Christoph Lameter.

> It's simply going to be too complex to describe and is going to be a
> headache to code and support.

True, which is why I am hoping we can keep this modal flag, if such be,
from having to be used on every set/get mempolicy call.  The ordinary
coder of new code using these calls directly should just see Choice B
behaviour.  However the user of libnuma should continue to see whatever
API libnuma supports, with no change whatsoever, and various versions of
libnuma, including those already shipped years ago, must continue to
behave without any changes in node numbering.

There are two decent looking ways (and some ugly ways) that I can see
to accomplish this:

 1) One could claim that no important use of Oracle over libnuma over
these memory policy calls is happening on a system using cpusets.
There would be a fair bit of circumstantial evidence for this
claim, but I don't know it for a fact, and would not be the
expert to determine this.  On systems making no use of cpusets,
these two Choices A and B are identical, and this is a non-issue.
Those systems will see no API changes whatsoever from any of this.

 2) We have a per-task mode flag selecting whether Choice A or B
node numbering apply to the masks passed in to set_mempolicy.

The kernel implementation is fairly easy.  (Yeah, I know, I
too cringe everytime I read that line ;)

If Choice A is active, then we continue to enforce the current
check, in mm/mempolicy.c: contextualize_policy(), that the passed
in mask be a subset of the current cpusets allowed memory nodes,
and we add a call to nodes_remap(), to map the passed in nodemask
from cpuset centric to system centric (if they asked for the
fourth node in their current cpuset, they get the fourth node in
the entire system.)

Similarly, for masks passed back by get_mempolicy, if Choice A
is active, we use nodes_remap() to convert the mask from system
centric back to cpuset centric.

There is a subtle change in the kernel API's here:

In current kernels, which are Choice A, if a task is moved from
a big cpuset (many nodes) to a small cpuset and then -back-
to a big cpuset, the nodemasks returned by get_mempolicy
will still show the smaller masks (fewer set nodes) imposed
by the smaller cpuset.

In todays kernels, once scrunched or folded down, the masks
don't recover their original size after the task is moved
back to a large cpuset.

With this change, even a task asking for Choice A would,
once back on a larger cpuset, again see the larger masks from
get_mempolicy queries.  This is a change in the kernel API's
visible to user space; but I really do not think that there
is sufficient use of Oracle over libnuma on systems actively
moving tasks between differing size cpusets for this to be
a problem.

Indeed, if there was much such usage, I suspect they'd
be complaining that the current kernel API was borked, and
they'd be filing a request for enhancement -asking- for just
this subtle change in the kernel API's here.  In other words,
this subtle API change is a feature, not a bug ;)

The bulk of the kernel's mempolicy code is coded for Choice B.

If Choice B is active, we don't enforce the subset check in
contextualize_policy(), and we don't invoke nodes_remap() in either
of the set or get mempolicy code paths.

A new option to get_mempolicy() would query the current state of
this mode flag, and a new option to set_mempolicy() would set
and clear this mode flag.  Perhaps Christoph had this in mind
when he wrote in an earlier message "The alternative is to add
new set/get mempolicy functions."

The default kernel API for each task would be Choice B 

[PATCH 1/3] VFS: apply coding standards to fs/ioctl.c

2007-10-27 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/ioctl.c |  164 +++-
 1 files changed, 84 insertions(+), 80 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index c2a773e..652cacf 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -12,8 +12,8 @@
 #include 
 #include 
 #include 
+#include 
 
-#include 
 #include 
 
 static long do_ioctl(struct file *filp, unsigned int cmd,
@@ -45,31 +45,31 @@ static int file_ioctl(struct file *filp, unsigned int cmd,
 {
int error;
int block;
-   struct inode * inode = filp->f_path.dentry->d_inode;
+   struct inode *inode = filp->f_path.dentry->d_inode;
int __user *p = (int __user *)arg;
 
switch (cmd) {
-   case FIBMAP:
-   {
-   struct address_space *mapping = filp->f_mapping;
-   int res;
-   /* do we support this mess? */
-   if (!mapping->a_ops->bmap)
-   return -EINVAL;
-   if (!capable(CAP_SYS_RAWIO))
-   return -EPERM;
-   if ((error = get_user(block, p)) != 0)
-   return error;
-
-   lock_kernel();
-   res = mapping->a_ops->bmap(mapping, block);
-   unlock_kernel();
-   return put_user(res, p);
-   }
-   case FIGETBSZ:
-   return put_user(inode->i_sb->s_blocksize, p);
-   case FIONREAD:
-   return put_user(i_size_read(inode) - filp->f_pos, p);
+   case FIBMAP:
+   {
+   struct address_space *mapping = filp->f_mapping;
+   int res;
+   /* do we support this mess? */
+   if (!mapping->a_ops->bmap)
+   return -EINVAL;
+   if (!capable(CAP_SYS_RAWIO))
+   return -EPERM;
+   error = get_user(block, p);
+   if (error)
+   return error;
+   lock_kernel();
+   res = mapping->a_ops->bmap(mapping, block);
+   unlock_kernel();
+   return put_user(res, p);
+   }
+   case FIGETBSZ:
+   return put_user(inode->i_sb->s_blocksize, p);
+   case FIONREAD:
+   return put_user(i_size_read(inode) - filp->f_pos, p);
}
 
return do_ioctl(filp, cmd, arg);
@@ -82,81 +82,85 @@ static int file_ioctl(struct file *filp, unsigned int cmd,
  * vfs_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d.
  * It's just a simple helper for sys_ioctl and compat_sys_ioctl.
  */
-int vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd, unsigned 
long arg)
+int vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
+ unsigned long arg)
 {
unsigned int flag;
int on, error = 0;
 
switch (cmd) {
-   case FIOCLEX:
-   set_close_on_exec(fd, 1);
-   break;
+   case FIOCLEX:
+   set_close_on_exec(fd, 1);
+   break;
 
-   case FIONCLEX:
-   set_close_on_exec(fd, 0);
-   break;
+   case FIONCLEX:
+   set_close_on_exec(fd, 0);
+   break;
 
-   case FIONBIO:
-   if ((error = get_user(on, (int __user *)arg)) != 0)
-   break;
-   flag = O_NONBLOCK;
+   case FIONBIO:
+   error = get_user(on, (int __user *)arg);
+   if (error)
+   break;
+   flag = O_NONBLOCK;
 #ifdef __sparc__
-   /* SunOS compatibility item. */
-   if(O_NONBLOCK != O_NDELAY)
-   flag |= O_NDELAY;
+   /* SunOS compatibility item. */
+   if (O_NONBLOCK != O_NDELAY)
+   flag |= O_NDELAY;
 #endif
-   if (on)
-   filp->f_flags |= flag;
-   else
-   filp->f_flags &= ~flag;
+   if (on)
+   filp->f_flags |= flag;
+   else
+   filp->f_flags &= ~flag;
+   break;
+
+   case FIOASYNC:
+   error = get_user(on, (int __user *)arg);
+   if (error)
break;
-
-   case FIOASYNC:
-   if ((error = get_user(on, (int __user *)arg)) != 0)
-   break;
-   flag = on ? FASYNC : 0;
-
-   /* Did FASYNC state change ? */
-   if ((flag ^ filp->f_flags) & FASYNC) {
-   if (filp->f_op && filp->f_op->fasync) {
-  

[PATCH 3/3] Unionfs: use vfs_ioctl

2007-10-27 Thread Erez Zadok
Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/unionfs/commonfops.c |   32 ++--
 1 files changed, 6 insertions(+), 26 deletions(-)

diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
index 50e5775..c99b519 100644
--- a/fs/unionfs/commonfops.c
+++ b/fs/unionfs/commonfops.c
@@ -661,31 +661,6 @@ out:
return err;
 }
 
-/* pass the ioctl to the lower fs */
-static long do_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-   struct file *lower_file;
-   int err;
-
-   lower_file = unionfs_lower_file(file);
-
-   err = -ENOTTY;
-   if (!lower_file || !lower_file->f_op)
-   goto out;
-   if (lower_file->f_op->unlocked_ioctl) {
-   err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg);
-   } else if (lower_file->f_op->ioctl) {
-   lock_kernel();
-   err = lower_file->f_op->ioctl(
-   lower_file->f_path.dentry->d_inode,
-   lower_file, cmd, arg);
-   unlock_kernel();
-   }
-
-out:
-   return err;
-}
-
 /*
  * return to user-space the branch indices containing the file in question
  *
@@ -752,6 +727,7 @@ out:
 long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
long err;
+   struct file *lower_file;
 
unionfs_read_lock(file->f_path.dentry->d_sb);
 
@@ -775,7 +751,11 @@ long unionfs_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
 
default:
/* pass the ioctl down */
-   err = do_ioctl(file, cmd, arg);
+   lower_file = unionfs_lower_file(file);
+   if (lower_file)
+   err = vfs_ioctl(lower_file, cmd, arg);
+   else
+   err = -ENOTTY;
break;
}
 
-- 
1.5.2.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2/3] VFS: swap do_ioctl and vfs_ioctl names

2007-10-27 Thread Erez Zadok
Rename old vfs_ioctl to do_ioctl, because the comment above it clearly
indicates that it is an internal function not to be exported to modules;
therefore it should have a more traditional do_XXX name.  The new do_ioctl
is exported in fs.h but not to modules.

Rename the old do_ioctl to vfs_ioctl because the names vfs_XXX should
preferably be reserved to callable VFS functions which modules may call,
as many other vfs_XXX functions already do.  Export the new vfs_ioctl to
modules so others can use it (including Unionfs and eCryptfs).

Signed-off-by: Erez Zadok <[EMAIL PROTECTED]>
---
 fs/compat_ioctl.c  |2 +-
 fs/ioctl.c |   18 ++
 include/linux/fs.h |3 ++-
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index a4284cc..a1604ce 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -2972,7 +2972,7 @@ asmlinkage long compat_sys_ioctl(unsigned int fd, 
unsigned int cmd,
}
 
  do_ioctl:
-   error = vfs_ioctl(filp, fd, cmd, arg);
+   error = do_ioctl(filp, fd, cmd, arg);
  out_fput:
fput_light(filp, fput_needed);
  out:
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 652cacf..00abbbf 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -16,8 +16,9 @@
 
 #include 
 
-static long do_ioctl(struct file *filp, unsigned int cmd,
-   unsigned long arg)
+/* vfs_ioctl can be called by other file systems or modules */
+long vfs_ioctl(struct file *filp, unsigned int cmd,
+  unsigned long arg)
 {
int error = -ENOTTY;
 
@@ -39,6 +40,7 @@ static long do_ioctl(struct file *filp, unsigned int cmd,
  out:
return error;
 }
+EXPORT_SYMBOL(vfs_ioctl);
 
 static int file_ioctl(struct file *filp, unsigned int cmd,
unsigned long arg)
@@ -72,18 +74,18 @@ static int file_ioctl(struct file *filp, unsigned int cmd,
return put_user(i_size_read(inode) - filp->f_pos, p);
}
 
-   return do_ioctl(filp, cmd, arg);
+   return vfs_ioctl(filp, cmd, arg);
 }
 
 /*
  * When you add any new common ioctls to the switches above and below
  * please update compat_sys_ioctl() too.
  *
- * vfs_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d.
+ * do_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d.
  * It's just a simple helper for sys_ioctl and compat_sys_ioctl.
  */
-int vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
- unsigned long arg)
+int do_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
+unsigned long arg)
 {
unsigned int flag;
int on, error = 0;
@@ -152,7 +154,7 @@ int vfs_ioctl(struct file *filp, unsigned int fd, unsigned 
int cmd,
if (S_ISREG(filp->f_path.dentry->d_inode->i_mode))
error = file_ioctl(filp, cmd, arg);
else
-   error = do_ioctl(filp, cmd, arg);
+   error = vfs_ioctl(filp, cmd, arg);
break;
}
return error;
@@ -172,7 +174,7 @@ asmlinkage long sys_ioctl(unsigned int fd, unsigned int 
cmd, unsigned long arg)
if (error)
goto out_fput;
 
-   error = vfs_ioctl(filp, fd, cmd, arg);
+   error = do_ioctl(filp, fd, cmd, arg);
  out_fput:
fput_light(filp, fput_needed);
  out:
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b3ec4a4..c0c5d36 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1924,7 +1924,8 @@ extern int vfs_stat_fd(int dfd, char __user *, struct 
kstat *);
 extern int vfs_lstat_fd(int dfd, char __user *, struct kstat *);
 extern int vfs_fstat(unsigned int, struct kstat *);
 
-extern int vfs_ioctl(struct file *, unsigned int, unsigned int, unsigned long);
+extern long vfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+extern int do_ioctl(struct file *, unsigned int, unsigned int, unsigned long);
 
 extern void get_filesystem(struct file_system_type *fs);
 extern void put_filesystem(struct file_system_type *fs);
-- 
1.5.2.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] 0/3 fs/ioctl.c coding style, rename vfs_ioctl/do_ioctl

2007-10-27 Thread Erez Zadok

This series of three proposed patches changes fs/ioctl.c and Unionfs as
follows.  This series is against v2.6.24-rc1-192-gef49c32.

Patch 1: just applies coding standards to fs/ioctl.c (while I'm at it, I
figured it's worth cleaning VFS files one at a time).

Patch 2: does two things:

(a) Renames the old vfs_ioctl to do_ioctl, because the comment above the old
vfs_ioctl clearly indicates that it is an internal function not to be
exported to modules; therefore it should have a more traditional do_XXX
"internal function" name.  The new do_ioctl is exported in fs.h but not
to modules.

(b) Renames the old (static) do_ioctl to vfs_ioctl because the names vfs_XXX
should preferably be reserved to callable VFS functions which modules
may call, as other vfs_XXX functions already do.  Export the new
vfs_ioctl to modules so others can use it (including Unionfs and
eCryptfs).

Patch 3: demonstrates how Unionfs can use the new vfs_ioctl.  I successfully
tested unionfs with this new exported vfs_ioctl.  (eCryptfs could do the
same.)

I'd like to propose that the first two patches be merged in -mm and even
mainline, pending review.

Erez Zadok (3):
  VFS: apply coding standards to fs/ioctl.c
  VFS: swap do_ioctl and vfs_ioctl names
  Unionfs: use vfs_ioctl

 fs/compat_ioctl.c   |2 
 fs/ioctl.c  |  176 
 fs/unionfs/commonfops.c |   22 +-
 include/linux/fs.h  |3 

Cheers,
Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-10-27 Thread Daniel Phillips
On Friday 26 October 2007 10:44, Peter wrote:
> > ...the way the watermarks work they will be evenly distributed
> > over the appropriate zones. ..

Hi Peter,

The term is "highwater mark" not "high watermark".  A watermark is an 
anti-counterfeiting device printed on paper money.  "Highwater" is how 
high water gets, which I believe is the sense we intend in Linux.  
Therefore any occurrence of "watermark" in the kernel source is a 
spelling mistake, unless it has something to do with printing paper 
money.

While fixing this entrenched terminology abuse in our kernel source may 
be difficult, sticking to the correct English on lkml is quite easy :-)

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/9] Unionfs: security convert lsm into a static interface fix

2007-10-27 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Christoph Hellwig writes:
> On Mon, Oct 22, 2007 at 08:48:04PM -0400, Erez Zadok wrote:
> > Why?  Are you concerned that the security policy may change after a module
> > is loaded?
> 
> No, it's a matter of proper layering.  We generally don't want modules
> like stackabke filesystems to call directly into methods but rather use
> proper highlevel VFS helpers to isolate them from details and possible
> changes.  The move to out of line security_ helpers just put this on the
> radard.

OK, I'll be shortly posting a couple of patches to fs/ioctl.c.

> > I can probably get rid of having unionfs call security_inode_permission,
> > by calling permission() myself and carefully post-process its return
> > code (unionfs needs to "ignore" EROFS initially, to allow copyup to take
> > place).
> 
> Sounds fine.

I was able to test this idea and it works fine.  Now unionfs calls
permission(), post-processes the return value, and I don't need my own
modified version of permission() in unionfs.  This saved me ~50 LoC and
reduced stack pressure a little.

> > But security_file_ioctl doesn't have any existing helper I can call.  I
> > can introduce a trivial vfs_security_file_ioctl wrapper to
> > security_file_ioctl, but what about the already existing *19* exported
> > security_* functions in security/security.c?  Do you want to see simple
> > wrappers for all of them?  It seems redundant to add a one-line wrapper
> > around an already one-line function around security_ops->XXX.  Plus,
> > some of the existing exported security_* functions are file-system
> > related, others are networking, etc.  So we'll need wrappers whose names
> > are prefixed appropriately: vfs_*, net_*, etc.
> 
> The fix for security_file_ioctl is probably to either not do it at all
> or move it the call to security_file_ioctl into vfs_ioctl and get it by
> using that helper.  I suspect most other security_ exports should be
> avoided similarly.

Christoph, I looked more closely at that and the selinux code.  Only
sys_ioctl calls security_file_ioctl.  And security_file_ioctl performs all
sorts of checks that mostly have to do with the currently running task or
the open file.  The running task is still the same, whether filesystem
stacking is involved or not.  Also, the unionfs-level struct file is
logically the same file at the lower level: they refer to the same object,
just at two layers.  I can't see any reason why unionfs_ioctl should have to
call security_file_ioctl(lower_file) the way sys_ioctl does: that check is
already done well before the file system's ->ioctl is invoked.  I also don't
see how it would ever be possible that sys_ioctl will succeed in its call to
security_file_ioctl(upper_file), but unionfs will fail the same security
check on the lower file.

So I commented out unionfs's call to security_file_ioctl(lower_file) and
tested it on a bunch of things, including an selinux-enabled livecd.
Everything seemed to work just fine, so I'll be sending some patches to that
effect, and we can drop the -mm patch which exports security_file_ioctl().
BTW, ecryptfs doesn't call security_file_ioctl().

Cheers,
Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-10-27 Thread Daniel Phillips
On Friday 26 October 2007 10:55, Christoph Lameter wrote:
> On Fri, 26 Oct 2007, Pavel Machek wrote:
> > > And, _no_, it does not necessarily mean global serialisation. By
> > > simply saying there must be N pages available I say nothing about
> > > on which node they should be available, and the way the
> > > watermarks work they will be evenly distributed over the
> > > appropriate zones.
> >
> > Agreed. Scalability of emergency swapping reserved is simply
> > unimportant. Please, lets get swapping to _work_ first, then we can
> > make it faster.
>
> Global reserve means that any cpuset that runs out of memory may
> exhaust the global reserve and thereby impact the rest of the system.
> The emergencies that are currently localized to a subset of the
> system and may lead to the failure of a job may now become global and
> lead to the failure of all jobs running on it.

If it does, it is a bug in the reserve accounting.  That said, I still 
agree with you that per-node reserve is a desirable goal for numa.  I 
would just like to be clear that it is not necessary, even for numa, 
just nice.  By all means somebody should be hacking on a numa feature 
for per-node emergency reserves, but as far as fixing the immediate, 
serious kernel block IO deadlocks goes, it does not matter.

Pavel, I do not agree that efficiency is unimportant on the 
under-pressure path.  I do not even like to call that the "emergency" 
path, because under heavy load it is normal for a machine to spend a 
significant fraction of its time in that state.  However, the 
efficiency goal there does not need to be quite the same as normal 
mode.

To illustrate, I would expect to see something like 95% of normal block 
IO performance on a numa machine in the case that "emergency" (aka 
memalloc memory) is allocated globally instead of locally, thus paying 
a (modest compared to the disk transfer itself) penalty for transfer of 
disk data over the numa interconnect.  95% of normal throughput on the 
block IO path is not a problem: if the machine spends 5% of its time on 
the "emergency" (aka memalloc) path, then overall efficiency will be 
95% * 95% = 99.75%.

Moral of this story: let's get the memory recursion fixes done in the 
most obviously correct way and not get distracted by illusory 
efficiency requirements for numa, that do not have a big bottom line 
impact.

I'm glad to see everybody still interested in these problems.  Though we 
have been a little quiet on this issue over here for a while, it does 
not mean that progress has stopped.  In fact, we are testing our 
solutions more heavily than ever, and getting closer to a solution that 
not only works solidly, but that should enable mass deletion of the 
whole creaky notion of dirty page limits in favor of nice, tight 
per-device control of in flight write traffic as I have described 
previously.

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RESEND] SCSI not showing tray status correctly

2007-10-27 Thread Maarten Bressers
> This patch is too simplistic.  ide-cd.c:ide_cdrom_drive_status() looks
> to be a reasonable implementation.  However, the worry is that
> GET_EVENT_NOTIFICATION is a MMC command; devices not conforming to MMC
> won't support it.  In theory, they should just return ILLEGAL_REQUEST,
> but USB devices have been known to crash when given commands they don't
> understand.  How widely tested has this been (if it's been in Gentoo
> since 2004, then it's probably widely tested enough)?
>
> James
>
This patch hasn't been tested at all, I'm sorry if I gave you that
impression, it isn't in Gentoo's patches, it's just been brought to our
attention in the bug I linked too.

I created another patch, based on your recommendations, and with a lot of
help from Daniel Drake ([EMAIL PROTECTED]), that's included below. Some
changes are made to test_unit_ready() to be able to pass the sense data
to sr_drive_status(). Currently, sr_do_ioctl() swallows this data and
makes its own interpretations of sense codes, which isn't what we want
here.

Is this what you had in mind? Do you think the possible problems with
USB drives that you mentioned will prevent this patch from going in? 
The patch has only been compile tested right now, we can do some real
testing later, for now I'd just like to get your feedback on it.

Maarten

---

--- a/drivers/scsi/sr_ioctl.c   2007-10-26 22:40:41.0 +0200
+++ b/drivers/scsi/sr_ioctl.c   2007-10-27 23:56:16.0 +0200
@@ -275,16 +275,21 @@ int sr_do_ioctl(Scsi_CD *cd, struct pack
 /* -- */
 /* interface to cdrom.c   */
 
-static int test_unit_ready(Scsi_CD *cd)
+static int test_unit_ready(Scsi_CD *cd, struct request_sense *sense)
 {
-   struct packet_command cgc;
+   struct scsi_device *SDev = cd->device;
+   unsigned char cmd[CDROM_PACKET_SIZE] = { GPCMD_TEST_UNIT_READY };
+   int result;
 
-   memset(, 0, sizeof(struct packet_command));
-   cgc.cmd[0] = GPCMD_TEST_UNIT_READY;
-   cgc.quiet = 1;
-   cgc.data_direction = DMA_NONE;
-   cgc.timeout = IOCTL_TIMEOUT;
-   return sr_do_ioctl(cd, );
+   if (!scsi_block_when_processing_errors(SDev))
+   return -ENODEV;
+
+   memset(sense, 0, sizeof(*sense));
+   result = scsi_execute(SDev, cmd, DMA_NONE,
+ NULL, 0, (char *)sense,
+ IOCTL_TIMEOUT, IOCTL_RETRIES, 0);
+
+   return driver_byte(result);
 }
 
 int sr_tray_move(struct cdrom_device_info *cdi, int pos)
@@ -310,14 +315,41 @@ int sr_lock_door(struct cdrom_device_inf
 
 int sr_drive_status(struct cdrom_device_info *cdi, int slot)
 {
+   struct request_sense sense;
+   struct media_event_desc med;
+
if (CDSL_CURRENT != slot) {
/* we have no changer support */
return -EINVAL;
}
-   if (0 == test_unit_ready(cdi->handle))
+   if ((0 == test_unit_ready(cdi->handle, )) || 
+   sense.sense_key == UNIT_ATTENTION)
return CDS_DISC_OK;
 
-   return CDS_TRAY_OPEN;
+   if (!cdrom_get_media_event(cdi, )) {
+   if (med.media_present)
+   return CDS_DISC_OK;
+   else if (med.door_open)
+   return CDS_TRAY_OPEN;
+   else
+   return CDS_NO_DISC;
+   }
+
+   if (sense.sense_key == NOT_READY && sense.asc == 0x04 && sense.ascq == 
0x04)
+   return CDS_DISC_OK;
+
+   /*
+* If not using Mt Fuji extended media tray reports,
+* just return TRAY_OPEN since ATAPI doesn't provide
+* any other way to detect this...
+*/
+   if (sense.sense_key == NOT_READY) {
+   if (sense.asc == 0x3a && sense.ascq == 1)
+   return CDS_NO_DISC;
+   else
+   return CDS_TRAY_OPEN;
+   }
+   return CDS_DRIVE_NOT_READY;
 }
 
 int sr_disk_status(struct cdrom_device_info *cdi)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: lock held when returning to user space

2007-10-27 Thread Jiri Kosina
On Sat, 27 Oct 2007, Peter Zijlstra wrote:

> > > [  592.752781] 
> > > [  592.753478] [ BUG: lock held when returning to user space! ]
> > > [  592.753880] 
> > > [  592.754262] hwclock/1452 is leaving the kernel with locks still held!
> > > [  592.754655] 1 lock held by hwclock/1452:
> > > [  592.755007]  #0:  (>char_lock){--..}, at: [] 
> > > rtc_dev_open+0x2e/0x7e
> > Yes, this is because rtc keeps a char_lock mutex locked as long as the 
> > device is open, to avoid concurrent accessess.
> I think, in this case, the lock is associated with a kernel object that
> is properly cleaned up if the holding tasks gets a SIGKILL. But in
> general I'd like to see this kind of thing go away.

Yes, but the fact is that is really is invalid use of mutex -- because the 
mutex owner could become seriously wrong after fork() or sending the 
filedescriptor through unix socket ... this easily leads to broken 
situation.

This seems to have been introduced in e824290e5d ... Alessandro, could you 
convert this to test_and_set_bit()/clear_bit() semantics instead of a 
mutex please?

Thanks,

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] proc_fs.h redux

2007-10-27 Thread Joe Perches
On Sat, 2007-10-27 at 23:47 +0400, Alexey Dobriyan wrote:
> Remove proc_fs.h from headers that doesn't really need it.

> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Your code doesn't match your patch description.

You've got new includes of:






and forward declarations of

struct proc_dir_entry;
struct file_operations;

As a general rule, I think it better to use includes
than use naked forward declarations.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] select PARAVIRT when PARAVIRT_GUEST on

2007-10-27 Thread Rusty Russell
On Sunday 28 October 2007 00:07:12 Glauber de Oliveira Costa wrote:
> Commit d3d1c4bdf16bd154d9f27f34fca28edca90465eb introduces a new config
> knob that, according to its changelog, should select CONFIG_PARAVIRT.

Hi Glauber,

PARAVIRT is now selected by the options which need it.  Merely turning 
on the 
menu "does not add any kernel code" as promised in the text...

Rusty.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-27 Thread Alan Stern
On Sat, 27 Oct 2007, Maxim Levitsky wrote:

> > Use del_timer_sync().  It guarantees that when it returns, the timer 
> > will be stopped and the timer routine will no longer be running on any 
> > CPU.
> > 
> Even if the timer re-enables itself, are you sure?

Last time I looked at the source code, that's what it did.  I'll look
again...  Yep, it still does.  It checks to see if the timer routine is
currently running; if so then it waits a little while and tries again.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-27 Thread Szabolcs Szakacsits

On Sat, 27 Oct 2007, Anton Altaparmakov wrote:

> And another of my pet peeves with ->bmap is that it uses 0 to mean "sparse"
> which causes a conflict on NTFS at least as block zero is part of the $Boot
> system file so it is a real, valid block...  NTFS uses -1 to denote sparse
> blocks internally.

In practice, the meaning of 0 is file system [driver] dependent. For 
example in case of NTFS-3G it means that the block is sparse or the file is 
encrypted or compressed, or resident, or it's the $Boot file, or an error 
happened.

Thankfully the widely used FIBMAP users (swapon and the ever less used 
lilo) are only interested in the non-zero values and they report an error 
if the driver returns 0 for some reason. Which is perfectly ok since both 
swaping and Linux booting would fail using a sparse, encrypted, compressed, 
resident, or the NTFS $Boot file. 

But in real, both swap files and lilo work fine with NTFS if the needed 
files were created the way these softwares expect. If not then swapon or 
lilo will catch and report the file creation error.

Afair, somebody is doing (has done?) an indeed much needed, better 
alternative. Bmap is legacy, thank you Mike for maintaining it.

Szaka

--
NTFS-3G Lead Developer:  http://ntfs-3g.org
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Networked filesystems vs backing_dev_info

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 23:30 +0200, Peter Zijlstra wrote:
> On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote:
> > On 10/27/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I had me a little look at bdi usage in networked filesystems.
> > >
> > >  NFS, CIFS, (smbfs), AFS, CODA and NCP
> > >
> > > And of those, NFS is the only one that I could find that creates
> > > backing_dev_info structures. The rest seems to fall back to
> > > default_backing_dev_info.
> > >
> > > With my recent per bdi dirty limit patches the bdi has become more
> > > important than it has been in the past. While falling back to the
> > > default_backing_dev_info isn't wrong per-se, it isn't right either.
> > >
> > > Could I implore the various maintainers to look into this issue for
> > > their respective filesystem. I'll try and come up with some patches to
> > > address this, but feel free to beat me to it.
> > 
> > I would like to understand more about your patches to see what bdi
> > values makes sense for CIFS and how to report possible congestion back
> > to the page manager. 
> 
> So, what my recent patches do is carve up the total writeback cache
> size, or dirty page limit as we call it, proportionally to a BDIs
> writeout speed. So a fast device gets more than a slow device, but will
> not starve it.
> 
> However, for this to work, each device, or remote backing store in the
> case of networked filesystems, need to have a BDI.
> 
> >   I had been thinking about setting bdi->ra_pages
> > so that we do more sensible readahead and writebehind - better
> > matching what is possible over the network and what the server
> > prefers.  
> 
> Well, you'd first have to create backing_dev_info instances before
> setting that value :-)
> 
> >   SMB/CIFS Servers typically allow a maximum of 50 requests
> > in parallel at one time from one client (although this is adjustable
> > for some).
> 
> That seems like a perfect point to set congestion.
> 
> So in short, stick a struct backing_dev_info into whatever represents a
> client, initialize it using bdi_init(), destroy using bdi_destroy().

Oh, and the most important point, make your fresh I_NEW inodes point to
this bdi struct.

> Mark it congested once you have 50 (or more) outstanding requests, clear
> congestion when you drop below 50.
> 
> and you should be set.
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 23:08 +0200, Kay Sievers wrote:
> On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote:

> > Ah, I see a few problems.  Here, try this version instead.  It's
> > compile-tested only, and should be a lot simpler.
> > 
> > Note, we still are not setting the parent to the new bdi structure
> > properly, so the devices will show up in /sys/devices/virtual/ instead
> > of in their proper location.  To do this, we need the parent of the
> > device, which I'm not so sure what it should be (block device?  block
> > device controller?)
> 
> Assigning a parent device will only work with the upcoming conversion of
> the raw kobjects in the block subsystem to "struct device".
> 
> A few comments to the patch:
> 
> > --- a/include/linux/string.h
> > +++ b/include/linux/string.h
> > @@ -8,6 +8,7 @@
> >  #include /* for inline */
> >  #include/* for size_t */
> >  #include   /* for NULL */
> > +#include 
> >  
> >  #ifdef __cplusplus
> >  extern "C" {
> > @@ -111,6 +112,9 @@ extern void *kmemdup(const void *src, si
> >  extern char **argv_split(gfp_t gfp, const char *str, int *argcp);
> >  extern void argv_free(char **argv);
> >  
> > +char *kvprintf(const char *fmt, va_list args);
> > +char *kprintf(const char *fmt, ...);
> 
> Why is that here? I don't think we need this when we use the existing:
>   kvasprintf(GFP_KERNEL, fmt, args)

Ignorance of the existance of said function. Thanks for pointing it out.
(kobject_set_name ought to use it too I guess)

> > --- a/mm/backing-dev.c
> > +++ b/mm/backing-dev.c
> 
> > +
> > +static struct device_attribute bdi_dev_attrs[] = {
> > +   __ATTR(readahead, 0644, readahead_show, readahead_store),
> > +   __ATTR_RO(reclaimable),
> > +   __ATTR_RO(writeback),
> > +   __ATTR_RO(dirty),
> > +   __ATTR_RO(bdi_dirty),
> > +};
> 
> Default attributes will need the NULL termination back (see below).
> 
> > +static __init int bdi_class_init(void)
> > +{
> > +   bdi_class = class_create(THIS_MODULE, "bdi");
> > +   return 0;
> > +}
> > +
> > +__initcall(bdi_class_init);
> > +
> > +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...)
> 
> This function should accept a: "struct device *parent" and all callers
> just pass NULL until the block layer conversion gets merged.

Yeah, you're right, but I wanted to just get something working before
bothering with the parent thing.

> > +{
> > +   char *name;
> > +   va_list args;
> > +   int ret = -ENOMEM;
> > +   int i;
> > +
> > +   va_start(args, fmt);
> > +   name = kvprintf(fmt, args);
> 
> kvasprintf(GFP_KERNEL, fmt, args);
> 
> > +   va_end(args);
> > +
> > +   if (!name)
> > +   return -ENOMEM;
> > +
> > +   bdi->dev = device_create(bdi_class, NULL, MKDEV(0,0), name);
> 
> The parent should be passed here.
> 
> > +   for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) {
> > +   ret = device_create_file(bdi->dev, _dev_attrs[i]);
> > +   if (ret)
> > +   break;
> > +   }
> > +   if (ret) {
> > +   while (--i >= 0)
> > +   device_remove_file(bdi->dev, _dev_attrs[i]);
> > +   device_unregister(bdi->dev);
> > +   bdi->dev = NULL;
> > +   }
> 
> All this open-coded attribute stuff should go away and be replaced by:
>   bdi_class->dev_attrs = bdi_dev_attrs;
> Otherwise at event time the attributes are not created and stuff hooking
> into the events will not be able to set values. Also, the core will do
> proper add/remove and error handling then.

ok, that's good to know. someone ought to write a book on how to use all
this... really, even the functions are bare of documentation or
comments.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [build bug, 2.6.24-rc1] CONFIG_VIDEO_DEV=m & CONFIG_VIDEO_SAA7146_VV=y

2007-10-27 Thread Johannes Stezenbach
On Fri, Oct 26, 2007, Ingo Molnar wrote:
> 
> the attached config (generated via make randconfig) fails to build due 
> to the combination of these config entries:
> 
>   CONFIG_VIDEO_DEV=m
>   CONFIG_VIDEO_SAA7146_VV=y
> 
> i found no obvious Kconfig way to force VIDEO_SAA7146_VV to be modular 
> when VIDEO_DEV is modular - is there a good solution for this?

According to http://lkml.org/lkml/2007/10/21/226 :

config VIDEO_SAA7146_VV
tristate
depends on VIDEO_DEV = y || VIDEO_DEV = VIDEO_SAA7146_VV
select VIDEOBUF_DMA_SG
select VIDEO_SAA7146

(untested)


Johannes
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Networked filesystems vs backing_dev_info

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote:
> On 10/27/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I had me a little look at bdi usage in networked filesystems.
> >
> >  NFS, CIFS, (smbfs), AFS, CODA and NCP
> >
> > And of those, NFS is the only one that I could find that creates
> > backing_dev_info structures. The rest seems to fall back to
> > default_backing_dev_info.
> >
> > With my recent per bdi dirty limit patches the bdi has become more
> > important than it has been in the past. While falling back to the
> > default_backing_dev_info isn't wrong per-se, it isn't right either.
> >
> > Could I implore the various maintainers to look into this issue for
> > their respective filesystem. I'll try and come up with some patches to
> > address this, but feel free to beat me to it.
> 
> I would like to understand more about your patches to see what bdi
> values makes sense for CIFS and how to report possible congestion back
> to the page manager. 

So, what my recent patches do is carve up the total writeback cache
size, or dirty page limit as we call it, proportionally to a BDIs
writeout speed. So a fast device gets more than a slow device, but will
not starve it.

However, for this to work, each device, or remote backing store in the
case of networked filesystems, need to have a BDI.

>   I had been thinking about setting bdi->ra_pages
> so that we do more sensible readahead and writebehind - better
> matching what is possible over the network and what the server
> prefers.  

Well, you'd first have to create backing_dev_info instances before
setting that value :-)

>   SMB/CIFS Servers typically allow a maximum of 50 requests
> in parallel at one time from one client (although this is adjustable
> for some).

That seems like a perfect point to set congestion.

So in short, stick a struct backing_dev_info into whatever represents a
client, initialize it using bdi_init(), destroy using bdi_destroy().

Mark it congested once you have 50 (or more) outstanding requests, clear
congestion when you drop below 50.

and you should be set.



signature.asc
Description: This is a digitally signed message part


Re: [BUG] [linux-pm] Commit "Hibernation: Enter platform hibernation state in a consistent way)" makes my system to resume instantly from S4

2007-10-27 Thread Rafael J. Wysocki
On Saturday, 27 October 2007 14:05, Maxim Levitsky wrote:
> Hi,
> 
> Recently I noticed that my system resumes just after suspend to disk.
> 
> I traced this to commit 9cd9a0058dd35268b24fa16795a92c800f4086d4.
> 
> Note:
> 
> This happens only if I enable WOL using /proc/acpi/wakeup
> (echo "ILAN" > /proc/acpi/wakeup)

What happens after a suspend to RAM?

> and have 
> "ACPI-Hibernate-erroneously-disabled-Suspend-wakeup" applied, since otherwise
> all wake-up sources are disabled in S4.
> 
> 
> Clearly the above commit confuses the BIOS.
> Using latest -git with the above patch reverted makes everything work again 
> fine.

Well, this patch is needed to make wakeup from peripherals (eg. RTC alarm) work
on some boxes.

The symptom that you describe is similar to what I'm observing after a suspend
to RAM on one test box.

Can you produce a log of kernel messages printed before powering off the
system?

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread Paul Jackson
David wrote:
> I prefer Choice B because it does not force mempolicies to have any 
> dependence on cpusets with regard to what nodemask is passed.

Yes, well said.

> It would be very good to store the passed nodemask to set_mempolicy in 
> struct mempolicy, 

Yes - that's what I'm intending to do.

> If the cpuset has fewer than four nodes, the behavior 
> should be undefined (probably implemented to just cycle the set of 
> mems_allowed until you reach the fourth entry).

I do intend to implement it as you suggest.  See the lib/bitmap.c
routines bitmap_remap() and bitmap_bitremap(), and the nodemask
wrappers for these, nodes_remap() and node_remap().  They will
define the cycling, or I sometimes call it folding.

I would have tended to make this folding a defined part of the API,
though I will grant that the possibility of being lazy and forgetting
to document it seems attractive (less to document ;).

> That [running in a cpuset with fewer nodes than used in a memory policy
> mask] is the result of constraining a task to a cpuset that obviously
> wants access to more nodes -- it's a userspace mistake and abusing
> cpusets so that the task does not get what it expects.

Nah - I wouldn't put it that way.  It's no mistake or abuse.  It's just
one more example of a kernel making too few resources look sufficient
by sharing, multiplexing and virtualizing them.  That's what kernels do.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] irq_flags_t: intro and core annotations

2007-10-27 Thread Roman Zippel
Hi,

On Sun, 28 Oct 2007, Alexey Dobriyan wrote:

> > If it's just about the type checking, something like below should pretty 
> > much do the same.
> 
> It won't catch, the following if both variables are unsigned long:
> 
>   spin_lock_irqsave(, flags);
>   [stuff]
>   spin_unlock_irqrestore(, foo->flags);
> 
> It won't catch "static unsigned long flags;". With sparse, we can
> eventually mark type as "on-stack only".

It should be on the stack, but we have cases where a pointer to it is used 
(e.g. lock_timer_base). How do you want to deal with this?

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: fixing up DRM device model usage

2007-10-27 Thread Kay Sievers
On Fri, 2007-10-26 at 14:31 -0700, Jesse Barnes wrote:
> On Friday, October 26, 2007 12:08 pm Kay Sievers wrote:

> > The open coded: device_create_file(>dev, _attrs[i])
> > should probably replaced by passing the array to the class, and the
> > core will do that for you.
> 
> You mean just set drm_class->dev_attrs = device_attrs?  I didn't see in
> the core device model code where that would create the files...

Yeah, it is in device_add_attrs() in drivers/base/core.c.

Kay

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] irq_flags_t: intro and core annotations

2007-10-27 Thread Peter Zijlstra
On Sun, 2007-10-28 at 00:14 +0400, Alexey Dobriyan wrote:
> On Sat, Oct 27, 2007 at 09:20:43PM +0200, Roman Zippel wrote:
> > On Sun, 21 Oct 2007, Alexey Dobriyan wrote:
> > 
> > > So far remedies were:
> > > a) grep(1) -- obviously fragile. I tried at some point grepping for
> > >spin_lock_irqsave(), found quite a few, but it became bring 
> > > quickly.
> > > b) BUILD_BUG_ON(sizeof(flags) != sizeof(unsigned long)) -- was tried,
> > >brutally broke some arches, survived one commit before revert :^)
> > >Doesn't work on i386 where sizeof(unsigned int) == sizeof(unsigned 
> > > long).
> > > 
> > > So it would be nice to have something more robust.
> > 
> > If it's just about the type checking, something like below should pretty 
> > much do the same.
> 
> It won't catch, the following if both variables are unsigned long:
> 
>   spin_lock_irqsave(, flags);
>   [stuff]
>   spin_unlock_irqrestore(, foo->flags);
> 
> It won't catch "static unsigned long flags;". With sparse, we can
> eventually mark type as "on-stack only".

> > +static __always_inline void __irq_flags_check(unsigned long *flags)
> > +{
+   BUILD_BUG_ON(!__builtin_stack_addr(flags));
> > +}
> > +

obviously gcc doesn't (yet) support that __builtin function, but you
could make it work for sparse and define a dummy for gcc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-27 Thread Kay Sievers
On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote:
> On Sat, Oct 27, 2007 at 10:39:59AM +0200, Peter Zijlstra wrote:
> > On Fri, 2007-10-26 at 19:40 -0700, Greg KH wrote:
> > > On Sat, Oct 27, 2007 at 03:18:08AM +0200, Peter Zijlstra wrote:
> > > > On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote:
> > > > > This crashes and burns on bootup, but I'm too tired to figure out 
> > > > > what I
> > > > > did wrong... will give it another try tomorrow..
> > > > 
> > > > Ok, can't sleep.. took a look. I have several problems here.
> > > > 
> > > > The thing that makes it go *boom* is the __ATTR_NULL. Removing that
> > > > makes it boot. Albeit it then warns me of multiple duplicate sysfs
> > > > objects, all named "bdi".

> > > I'll look at this and see what I can come up with.  Would you just like
> > > a whole new patch, or one against this one?
> > 
> > Sorry for the grumpy note, I get that way at 3.30 am. Maybe I ought not
> > have mailed :-/
> > 
> > This is the code I had at that time.
> 
> Ah, I see a few problems.  Here, try this version instead.  It's
> compile-tested only, and should be a lot simpler.
> 
> Note, we still are not setting the parent to the new bdi structure
> properly, so the devices will show up in /sys/devices/virtual/ instead
> of in their proper location.  To do this, we need the parent of the
> device, which I'm not so sure what it should be (block device?  block
> device controller?)

Assigning a parent device will only work with the upcoming conversion of
the raw kobjects in the block subsystem to "struct device".

A few comments to the patch:

> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -8,6 +8,7 @@
>  #include   /* for inline */
>  #include  /* for size_t */
>  #include /* for NULL */
> +#include 
>  
>  #ifdef __cplusplus
>  extern "C" {
> @@ -111,6 +112,9 @@ extern void *kmemdup(const void *src, si
>  extern char **argv_split(gfp_t gfp, const char *str, int *argcp);
>  extern void argv_free(char **argv);
>  
> +char *kvprintf(const char *fmt, va_list args);
> +char *kprintf(const char *fmt, ...);

Why is that here? I don't think we need this when we use the existing:
  kvasprintf(GFP_KERNEL, fmt, args)

> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c

> +
> +static struct device_attribute bdi_dev_attrs[] = {
> + __ATTR(readahead, 0644, readahead_show, readahead_store),
> + __ATTR_RO(reclaimable),
> + __ATTR_RO(writeback),
> + __ATTR_RO(dirty),
> + __ATTR_RO(bdi_dirty),
> +};

Default attributes will need the NULL termination back (see below).

> +static __init int bdi_class_init(void)
> +{
> + bdi_class = class_create(THIS_MODULE, "bdi");
> + return 0;
> +}
> +
> +__initcall(bdi_class_init);
> +
> +int bdi_register(struct backing_dev_info *bdi, const char *fmt, ...)

This function should accept a: "struct device *parent" and all callers
just pass NULL until the block layer conversion gets merged.

> +{
> + char *name;
> + va_list args;
> + int ret = -ENOMEM;
> + int i;
> +
> + va_start(args, fmt);
> + name = kvprintf(fmt, args);

kvasprintf(GFP_KERNEL, fmt, args);

> + va_end(args);
> +
> + if (!name)
> + return -ENOMEM;
> +
> + bdi->dev = device_create(bdi_class, NULL, MKDEV(0,0), name);

The parent should be passed here.

> + for (i = 0; i < ARRAY_SIZE(bdi_dev_attrs); i++) {
> + ret = device_create_file(bdi->dev, _dev_attrs[i]);
> + if (ret)
> + break;
> + }
> + if (ret) {
> + while (--i >= 0)
> + device_remove_file(bdi->dev, _dev_attrs[i]);
> + device_unregister(bdi->dev);
> + bdi->dev = NULL;
> + }

All this open-coded attribute stuff should go away and be replaced by:
  bdi_class->dev_attrs = bdi_dev_attrs;
Otherwise at event time the attributes are not created and stuff hooking
into the events will not be able to set values. Also, the core will do
proper add/remove and error handling then.

Thanks,
Kay

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Networked filesystems vs backing_dev_info

2007-10-27 Thread Steve French
On 10/27/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I had me a little look at bdi usage in networked filesystems.
>
>  NFS, CIFS, (smbfs), AFS, CODA and NCP
>
> And of those, NFS is the only one that I could find that creates
> backing_dev_info structures. The rest seems to fall back to
> default_backing_dev_info.
>
> With my recent per bdi dirty limit patches the bdi has become more
> important than it has been in the past. While falling back to the
> default_backing_dev_info isn't wrong per-se, it isn't right either.
>
> Could I implore the various maintainers to look into this issue for
> their respective filesystem. I'll try and come up with some patches to
> address this, but feel free to beat me to it.

I would like to understand more about your patches to see what bdi
values makes sense for CIFS and how to report possible congestion back
to the page manager.   I had been thinking about setting bdi->ra_pages
so that we do more sensible readahead and writebehind - better
matching what is possible over the network and what the server
prefers.SMB/CIFS Servers typically allow a maximum of 50 requests
in parallel at one time from one client (although this is adjustable
for some). The CIFS client prefers to do writes 14 pages (an iovec of
56K) at a time (although many servers can efficiently handle multiple
of these 56K writes in parallel).  With minor changes CIFS could
handle even larger writes (to just under 64K for Windows and just
under 128K for Samba - the current CIFS Unix Extensions allow servers
to negotiate much larger writes, but lacking a "receivepage"
equivalent Samba does not currently support larger than 128K).
Ideally, to improve large file copy utilization, I would like to see
from 3-10 writes of 56K (or larger in the future) in parallel.   The
read path is harder since we only do 16K reads to Windows and Samba -
but we need to increase the number of these that are done in parallel
on the same inode.  There is a large Google Summer of Code patch for
this which needs more review.


-- 
Thanks,

Steve
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread Paul Jackson
> > You have chosen (1) above, which keeps Choice A as the default.
> 
> There can be different defaults for the user space API via libnuma that 
> are indepdent from the kernel API which needs to remain stable. The kernel 
> API can be extended but not changed.

Yes - the user level code can have different defaults too.

I was discussing what should be the default kernel API.

> None of those [alternatives] sound appealing.  Multiple processes may run
> in one cpuset.

Well, that would justify keeping this choice per-task.  I tend to
agree with that.

But that doesn't justify having to specify it on each system call.

In another reply David recommends against supporting Choice A at all.
I'm inclined to agree with him.  I'll reply there, with more thoughts.

But if we did support Choice A, as a backwards compatible alternative
to Choice B, I'd suggest a per-task mode, not per-system call mode.
This would reduce the impact on the API of the ugly, unobvious, modal
flag needed to select the optional, non kernel default, Choice B
semantics.

I still have low confidence that you (Christoph) and I have the same
understanding of what these Choice A and B are.  Hopefully you can
address that, perhaps by briefly describing these choices in your words.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: gfs2_fh_to_parent() array overflow

2007-10-27 Thread Christoph Hellwig
On Wed, Oct 24, 2007 at 06:26:26PM +0200, Adrian Bunk wrote:
> The Coverity checker spotted the following array overflow caused by
> commit 34c0d154243dd913c5690ae6ceb9557017429b9c:

The line is a left-over from times when gfs stored the mode of the
inode in the file handle.  It can simply be deleted.  Steve, do you
want a patch for that or could you commit that one-liner directly?

> fs/gfs2/ops_export.c contains:
> 
> <--  snip  -->
> 
> ...
> static struct dentry *gfs2_fh_to_parent(struct super_block *sb, struct fid 
> *fid,
> int fh_len, int fh_type)
> {
> struct gfs2_inum_host parent;
> __be32 *fh = (__force __be32 *)fid->raw;  <
> 
> switch (fh_type) {
> case GFS2_LARGE_FH_SIZE:
> case GFS2_OLD_FH_SIZE:
> parent.no_formal_ino = ((u64)be32_to_cpu(fh[4])) << 32;
> parent.no_formal_ino |= be32_to_cpu(fh[5]);
> parent.no_addr = ((u64)be32_to_cpu(fh[6])) << 32;
>^
> parent.no_addr |= be32_to_cpu(fh[7]);
> ...   ^
> 
> <--  snip  -->
> 
> 
> cu
> Adrian
> 
> -- 
> 
>"Is there not promise of rain?" Ling Tan asked suddenly out
> of the darkness. There had been need of rain for many days.
>"Only a promise," Lao Er said.
>Pearl S. Buck - Dragon Seed
---end quoted text---
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel

2007-10-27 Thread Ahmed S. Darwish
> +/**
> + * smk_write_cipso - write() for /smack/cipso
> + * @filp: file pointer, not actually used
> + * @buf: where to get the data from
> + * @count: bytes sent
> + * @ppos: where to start
> + *
> + * Returns number of bytes written or error code, as appropriate
> + */
> +static ssize_t smk_write_cipso(struct file *file, const char __user *buf,
> +size_t count, loff_t *ppos)
> +{

[...]

> +
> + /*
> +  * Only allow one writer at a time. Writes should be
> +  * quite rare and small in any case.
> +  */
> + mutex_lock(_cipso_lock);
> +
> + *(data + count) = '\0';
> +
> + for (eolp = strchr(data, '\n'), linep = data;
> + eolp != NULL && rc >= 0;
> + linep = eolp + 1, eolp = strchr(linep, '\n')) {
> +

The problem here (As discussed in private mails) is that the for loop 
assumes that the beginning of given user-space buffer is the beginning
of a rule. This leads to situations where the rule becomes "ecret 20",
or "cret 20" instead of "Secret 20". Big input buffers/files leads 
smack to recieve a rule like "Secret 20" in fragmented chunks like:

write("\nSec", ..)
write("r", 1, ..)
write("et 20\n", ..)

Parsing a rule in such tough conditions in _kernel space_ is very
hard. I began to feel that it will be much easier if we do the parsing 
in a userspace utility and let smack accept only small buffers (80 char). 

i.e. A user space utility that takes a big input file like
exit 10/3,7,4
exit 10/3,7,4
exit 10/3,7,4
<100 times>

And transform it to 100 small write() calls. By this way we can return
-EINVAL if write()'s count field > 80, or if input contains no \n or
more than one.

Any Ideas ?. 

Casey, I can begin modifying cipso_write() and writing this small 
user-space utility now if you agree on this.

Regards,

-- 
Ahmed S. Darwish
HomePage: http://darwish.07.googlepages.com
Blog: http://darwish-07.blogspot.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

2007-10-27 Thread Erez Zadok
In message <[EMAIL PROTECTED]>, Hugh Dickins writes:
> On Mon, 22 Oct 2007, Erez Zadok wrote:
[...]
> > If you've got suggestions how I can handle unionfs_write more cleanly, or
> > comments on the above possibilities, I'd love to hear them.
> 
> For now I think you should pursue the ~(__GFP_FS|__GFP_IO) idea somehow.
> 
> Hugh

Hugh, thanks for the great explanations and suggestions (in multiple
emails).  I'm going to test all of those soon.

Erez.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [AppArmor 00/45] AppArmor security module overview

2007-10-27 Thread Christoph Hellwig
On Fri, Oct 26, 2007 at 07:37:21AM -0700, Arjan van de Ven wrote:
> before going into the LSM / security side of things, I'd like to get
> the VFS guys to look at your VFS interaction code.

It's been NACKed a few times, and just reposting it won't help.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] irq_flags_t: intro and core annotations

2007-10-27 Thread Jan Engelhardt

On Oct 28 2007 00:24, Alexey Dobriyan wrote:
>
>> (Which would be the logical choice if it were a function and not a 
>> macro...) That would flag up all violations ("without cast to different 
>> pointer" or so) while usually not breaking compilation.
>> 
>> Of course, irq_flags_t is probably the best long-term solution if one 
>> wants to hide a struct. (Even then perhaps, use a pointer instead?)
>
>IIRC, Christoph mentioned:
>
>   irq_flags_t flags;
>
>   flags = spin_lock_irqXXX();
>   spin_unlock_irqYYY(, flags);
>
>where XXX and YYY are still to be found good names :^) It's also a solution
>without flag day and with more sane lock part -- "how flags are modified
>if they are passed by value?"
>
>I start to like this proposal but I can't come up with good names.
>

The BSD way: just add a number -- spin_lock_irq2()
Of course names are preferable.

irq_spin_lock and irq_spin_unlock?
spinirq_lock, spinirq_unlock?
spin_lock_irq_disable, spin_lock_irq_enable (a bit verbose...)

Maybe the 'irq' part should be completely dropped from the name,
as with -rt extensions, it behaves differently, does not it?
(Or was it preemption?)
If it was a home project, I'd just flag it, if it was a business project,
I'd add the number, for Linux - ha, too big for me :-)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] irq_flags_t: intro and core annotations

2007-10-27 Thread Alexey Dobriyan
On Thu, Oct 25, 2007 at 05:40:03PM +0200, Jan Engelhardt wrote:
> On Oct 21 2007, Alexey Dobriyan wrote:
> >
> >One of type of bugs steadily being fought is the following one:
> >
> > unsigned int flags;
> > spin_lock_irqsave(, flags);
> >
> >where "flags" should be "unsigned long". Here is far from complete list 
> >of commits fixing such bugs:
> >
> 
> How about making spin_lock_irqsave actually take a pointer to flags? 

How would you do it without flag day?

> (Which would be the logical choice if it were a function and not a 
> macro...) That would flag up all violations ("without cast to different 
> pointer" or so) while usually not breaking compilation.
> 
> Of course, irq_flags_t is probably the best long-term solution if one 
> wants to hide a struct. (Even then perhaps, use a pointer instead?)

IIRC, Christoph mentioned:

irq_flags_t flags;

flags = spin_lock_irqXXX();
spin_unlock_irqYYY(, flags);

where XXX and YYY are still to be found good names :^) It's also a solution
without flag day and with more sane lock part -- "how flags are modified
if they are passed by value?"

I start to like this proposal but I can't come up with good names.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] scatterlist fallout: drivers/scsi/arm/

2007-10-27 Thread Russell King
On Sat, Oct 27, 2007 at 07:44:04PM +0100, Al Viro wrote:
> 
> Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/scsi/arm/scsi.h b/drivers/scsi/arm/scsi.h
> index 21ba571..2d09fef 100644
> --- a/drivers/scsi/arm/scsi.h
> +++ b/drivers/scsi/arm/scsi.h
> @@ -39,7 +39,7 @@ static inline int next_SCp(struct scsi_pointer *SCp)
>   SCp->buffer++;
>   SCp->buffers_residual--;
>   SCp->ptr = (char *)
> -  (page_address(SCp->buffer->page) +
> +  (page_address(sg_page(SCp->buffer)) +
> SCp->buffer->offset);
>   SCp->this_residual = SCp->buffer->length;
>   } else {
> @@ -77,7 +77,7 @@ static inline void init_SCp(struct scsi_cmnd *SCpnt)
>   SCpnt->SCp.buffer = (struct scatterlist *) 
> SCpnt->request_buffer;
>   SCpnt->SCp.buffers_residual = SCpnt->use_sg - 1;
>   SCpnt->SCp.ptr = (char *)
> -  (page_address(SCpnt->SCp.buffer->page) +
> +  (page_address(sg_page(SCpnt->SCp.buffer)) +
> SCpnt->SCp.buffer->offset);
>   SCpnt->SCp.this_residual = SCpnt->SCp.buffer->length;
>   SCpnt->SCp.phase = SCpnt->request_bufflen;

FYI, here's what I have queued (cut'n'pasted so probably won't apply):

diff --git a/drivers/scsi/arm/scsi.h b/drivers/scsi/arm/scsi.h
index 21ba571..bb6550e 100644
--- a/drivers/scsi/arm/scsi.h
+++ b/drivers/scsi/arm/scsi.h
@@ -38,9 +38,7 @@ static inline int next_SCp(struct scsi_pointer *SCp)
if (ret) {
SCp->buffer++;
SCp->buffers_residual--;
-   SCp->ptr = (char *)
-(page_address(SCp->buffer->page) +
- SCp->buffer->offset);
+   SCp->ptr = sg_virt(SCp->buffer);
SCp->this_residual = SCp->buffer->length;
} else {
SCp->ptr = NULL;
@@ -76,9 +74,7 @@ static inline void init_SCp(struct scsi_cmnd *SCpnt)

SCpnt->SCp.buffer = (struct scatterlist *) 
SCpnt->request_buffer;
SCpnt->SCp.buffers_residual = SCpnt->use_sg - 1;
-   SCpnt->SCp.ptr = (char *)
-(page_address(SCpnt->SCp.buffer->page) +
- SCpnt->SCp.buffer->offset);
+   SCpnt->SCp.ptr = sg_virt(SCpnt->SCp.buffer);
SCpnt->SCp.this_residual = SCpnt->SCp.buffer->length;
SCpnt->SCp.phase = SCpnt->request_bufflen;



-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] duplicate initializer in sound/pci/hda/patch_realtek.c

2007-10-27 Thread Al Viro

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index d9f78c8..1c50278 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -9299,7 +9299,6 @@ static struct alc_config_preset alc268_presets[] = {
.num_channel_mode = ARRAY_SIZE(alc268_modes),
.channel_mode = alc268_modes,
.input_mux = _capture_source,
-   .input_mux = _capture_source,
.unsol_event = alc268_toshiba_unsol_event,
.init_hook = alc268_toshiba_automute,
},
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] irq_flags_t: intro and core annotations

2007-10-27 Thread Alexey Dobriyan
On Sat, Oct 27, 2007 at 09:20:43PM +0200, Roman Zippel wrote:
> On Sun, 21 Oct 2007, Alexey Dobriyan wrote:
> 
> > So far remedies were:
> > a) grep(1) -- obviously fragile. I tried at some point grepping for
> >spin_lock_irqsave(), found quite a few, but it became bring quickly.
> > b) BUILD_BUG_ON(sizeof(flags) != sizeof(unsigned long)) -- was tried,
> >brutally broke some arches, survived one commit before revert :^)
> >Doesn't work on i386 where sizeof(unsigned int) == sizeof(unsigned long).
> > 
> > So it would be nice to have something more robust.
> 
> If it's just about the type checking, something like below should pretty 
> much do the same.

It won't catch, the following if both variables are unsigned long:

spin_lock_irqsave(, flags);
[stuff]
spin_unlock_irqrestore(, foo->flags);

It won't catch "static unsigned long flags;". With sparse, we can
eventually mark type as "on-stack only".

> > * irq_flags_t allows arch maintainers to eventually switch to something
> >   smaller than "unsigned long" if they want to.
> 
> Considering how painful this conversion could be, the question is whether 
> this is really needed.

In the long run, I believe, this is needed. We want more bugs to be
found automatically, so we can have more time for non-trivial bugs.

> Comments so far suggest some archs don't need all 
> of it, but does someone need more?

I don't know.

> --- linux-2.6.orig/include/linux/irqflags.h
> +++ linux-2.6/include/linux/irqflags.h
> @@ -41,6 +41,10 @@
>  # define INIT_TRACE_IRQFLAGS
>  #endif
>  
> +static __always_inline void __irq_flags_check(unsigned long *flags)
> +{
> +}
> +
>  #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
>  
>  #include 
> @@ -50,10 +54,11 @@
>  #define local_irq_disable() \
>   do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0)
>  #define local_irq_save(flags) \
> - do { raw_local_irq_save(flags); trace_hardirqs_off(); } while (0)
> + do { __irq_flags_check(); raw_local_irq_save(flags); 
> trace_hardirqs_off(); } while (0)
>  
>  #define local_irq_restore(flags) \
>   do {\
> + __irq_flags_check();  \
>   if (raw_irqs_disabled_flags(flags)) {   \
>   raw_local_irq_restore(flags);   \
>   trace_hardirqs_off();   \
> @@ -69,8 +74,8 @@
>   */
>  # define raw_local_irq_disable() local_irq_disable()
>  # define raw_local_irq_enable()  local_irq_enable()
> -# define raw_local_irq_save(flags)   local_irq_save(flags)
> -# define raw_local_irq_restore(flags)local_irq_restore(flags)
> +# define raw_local_irq_save(flags)   ({ __irq_flags_check(); 
> local_irq_save(flags); })
> +# define raw_local_irq_restore(flags)({ __irq_flags_check(); 
> local_irq_restore(flags); })
>  #endif /* CONFIG_TRACE_IRQFLAGS_SUPPORT */
>  
>  #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.23-rt4 (was 2.6.23-rt1 trouble)

2007-10-27 Thread Rui Nuno Capela
Steven Rostedt wrote:
> --
> 
> On Sat, 27 Oct 2007, Rui Nuno Capela wrote:
> 
>>> On Mon, October 15, 2007 11:49, Rui Nuno Capela wrote:
 I am experiencing some highly annoying but intermitent freezing on a
 pentium4 2.80G HT/SMT box, when doing normal desktop work with 2.6.23-rt1.


 The same crippling behavior does not occur on a Core 2 Due T7200 2.0G
 SMP, so I suspect it's something due specific to the SMT scheduling
 support (Hyper-Threading). But can't tell for sure, obviously :)

>>> I was wrong. After several trials the same behavior also occurs on the
>>> Core2 Duo T7200. It just took longer to show its nasty.
>>>
>>>
 The symptoms are noticeable primarily as some X/GUI intermitent freezing,
 sometimes only one application, then several and ultimately the whole X
 desktop becomes completely unresponsive. It looks like scheduling
> 
> When things start to freeze, could you capture the output of a sysrq-t.
> 

yes, you can find a complete serial console capture here, where it holds
the final SysRq-T output:
  http://www.rncbc.org/datahub/console-2.6.23.1-rt4.1-1.log

the corresponding .config is also there:
  http://www.rncbc.org/datahub/config-2.6.23.1-rt4.1

the reason this is a new kernel build, following a shot in the dark from
nick mainsbridge, which let out the ntfs module build (CONFIG_NTFS_FS is
not set) suggesting that would mitigate similar freezes.

in deed the general feel is that it seems to run longer and less prone
to those incidents, but that is just a gut feeling, nothing more.

hope this be any useful.

> 
 problems. There is this hint that switching to a spare console terminal
 (via Ctrl+Alt+Fn) might cause later recovery. But its just a question of
 some more time for it just happens again and again, one after another,
 several applications becoming temporarily frozen and just by luck the
 system gets back to normal, probably due to some incidental shake-up :)
 but there are other times that nothing seems to help with no alternative
 to the power-reset switch.

 I could not find any evidence on dmesg or in the system logs, of any
 apparent trouble. No BUGs, no oops, no panics, no nothing. It just
 freezes, this and that, now and then. It just makes it all unworkable
 and obviously subject to ditching.

 Again, this only happens on this P4/HT box. On a Core2 Duo laptop, with
 same 2.6.23-rt1 with the very same kernel configuration, it does not show
 any illness and is running quite fine.

>>> False. It used to run fine, until the creeps happen first time :(
>>>
>>>
 Remember one report I had about a similar freezing behavior? Now it's
 happening the other way around: the core2 is OK, the pentium4 is KO.

>>> Now it applies to all 2.6.23-rt1 images I could test upon.
>>>
>>>
 One naive suspicion goes like the new rcu-preempt code is to blame, since
 I don't remember having this or any other trouble with 2.6.23-rc8-rt1.

>>> Not be sure anymore, but this seems to be still a valid assumption.
>>>
>> just to let you know that still the same trouble persists with 2.6.23.1-rt4
>>
>> .config can be found here:
>>http://www.rncbc.org/datahub/config-2.6.23.1-rt4.0
>>
> 
> I have a P4HT laptop (unfortunately with no serial). I use it as one of my
> main machines, so it will suck for me when it freezes ;-). I'll take your
> config and try it out.
> 
> I'll most likely do this on Monday since process Wife has the highest
> priority over the weekend ;-)
> 

that is also true here :)

otoh, as reported before, the freezes are not exclusive to P4HT, which
is the main box I'm been reporting here (the one which has a good old
serial port anyway), but also applies to my other laptop, an core2 duo
t7200.

bye
-- 
rncbc aka Rui Nuno Capela
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] fix breakage in pegasos_eth (fallout from commit b45d9147f1582333e180e1023624c003874b7312)

2007-10-27 Thread Al Viro

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/include/linux/mv643xx_eth.h b/include/linux/mv643xx_eth.h
index 3f27239..8df230a 100644
--- a/include/linux/mv643xx_eth.h
+++ b/include/linux/mv643xx_eth.h
@@ -8,6 +8,9 @@
 #define MV643XX_ETH_NAME   "mv643xx_eth"
 #define MV643XX_ETH_SHARED_REGS0x2000
 #define MV643XX_ETH_SHARED_REGS_SIZE   0x2000
+#define MV643XX_ETH_BAR_4  0x220
+#define MV643XX_ETH_SIZE_REG_4 0x224
+#define MV643XX_ETH_BASE_ADDR_ENABLE_REG   0x0290
 
 struct mv643xx_eth_platform_data {
int port_number;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Git pull] x86 bugfixes

2007-10-27 Thread Thomas Gleixner
Linus,

please pull x86 bugfixes from

  ssh://master.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git

Some more voyager fixes and other compile warning fixups than in the
yesterday pull request.

Thanks,
tglx

Eric W. Biederman (1):
  x86: Fix boot protocol KEEP_SEGMENTS check.

James Bottomley (1):
  x86: voyager: fix bogus conversion to per_cpu for boot_cpu_info

Jeff Garzik (2):
  x86: fix !SMP compiler warning in arch/x86/kernel/acpi/processor.c
  x86: fix compiler warnings in arch/x86/kernel/early-quirks.c

Ken'ichi Ohmichi (1):
  x86: Dump filtering supports x86_64 sparsemem

Thomas Gleixner (2):
  Revert "i386: export i386 smp_call_function_mask() to modules"
  x86: export smp_ops to allow modular build of KVM

 arch/x86/boot/compressed/head_32.S  |   12 
 arch/x86/boot/compressed/head_64.S  |7 +++
 arch/x86/kernel/acpi/processor.c|3 +--
 arch/x86/kernel/asm-offsets_64.c|   10 ++
 arch/x86/kernel/early-quirks.c  |4 +++-
 arch/x86/kernel/machine_kexec_64.c  |2 ++
 arch/x86/kernel/smp_32.c|8 +---
 arch/x86/mach-voyager/voyager_smp.c |4 ++--
 include/asm-x86/smp_32.h|9 ++---
 9 files changed, 36 insertions(+), 23 deletions(-)
diff --git a/arch/x86/boot/compressed/head_32.S 
b/arch/x86/boot/compressed/head_32.S
index a0ae2e7..036e635 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -33,24 +33,20 @@
.globl startup_32
 
 startup_32:
-   /* check to see if KEEP_SEGMENTS flag is meaningful */
-   cmpw $0x207, BP_version(%esi)
-   jb 1f
-
+   cld
/* test KEEP_SEGMENTS flag to see if the bootloader is asking
 * us to not reload segments */
testb $(1<<6), BP_loadflags(%esi)
-   jnz 2f
+   jnz 1f
 
-1: cli
+   cli
movl $(__BOOT_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
movl %eax,%ss
-
-2: cld
+1:
 
 /* Calculate the delta between where we were compiled to run
  * at and where we were actually loaded at.  This can only be done
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 4946764..1ccb38a 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 .section ".text.head"
.code32
@@ -36,11 +37,17 @@
 
 startup_32:
cld
+   /* test KEEP_SEGMENTS flag to see if the bootloader is asking
+* us to not reload segments */
+   testb $(1<<6), BP_loadflags(%esi)
+   jnz 1f
+
cli
movl$(__KERNEL_DS), %eax
movl%eax, %ds
movl%eax, %es
movl%eax, %ss
+1:
 
 /* Calculate the delta between where we were compiled to run
  * at and where we were actually loaded at.  This can only be done
diff --git a/arch/x86/kernel/acpi/processor.c b/arch/x86/kernel/acpi/processor.c
index 2ed0a4c..f63e5ff 100644
--- a/arch/x86/kernel/acpi/processor.c
+++ b/arch/x86/kernel/acpi/processor.c
@@ -62,8 +62,7 @@ static void init_intel_pdc(struct acpi_processor *pr, struct 
cpuinfo_x86 *c)
 /* Initialize _PDC data based on the CPU vendor */
 void arch_acpi_processor_init_pdc(struct acpi_processor *pr)
 {
-   unsigned int cpu = pr->id;
-   struct cpuinfo_x86 *c = _data(cpu);
+   struct cpuinfo_x86 *c = _data(pr->id);
 
pr->pdc = NULL;
if (c->x86_vendor == X86_VENDOR_INTEL)
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index 7e50bda..d1b6ed9 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -15,12 +15,16 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DEFINE(sym, val) \
 asm volatile("\n->" #sym " %0 " #val : : "i" (val))
 
 #define BLANK() asm volatile("\n->" : : )
 
+#define OFFSET(sym, str, mem) \
+   DEFINE(sym, offsetof(struct str, mem))
+
 #define __NO_STUBS 1
 #undef __SYSCALL
 #undef _ASM_X86_64_UNISTD_H_
@@ -109,5 +113,11 @@ int main(void)
DEFINE(crypto_tfm_ctx_offset, offsetof(struct crypto_tfm, __crt_ctx));
BLANK();
DEFINE(__NR_syscall_max, sizeof(syscalls) - 1);
+
+   BLANK();
+   OFFSET(BP_scratch, boot_params, scratch);
+   OFFSET(BP_loadflags, boot_params, hdr.loadflags);
+   OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
+   OFFSET(BP_version, boot_params, hdr.version);
return 0;
 }
diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
index dc34acb..639e632 100644
--- a/arch/x86/kernel/early-quirks.c
+++ b/arch/x86/kernel/early-quirks.c
@@ -35,12 +35,14 @@ static void __init via_bugs(void)
 }
 
 #ifdef CONFIG_ACPI
+#ifdef CONFIG_X86_IO_APIC
 
 static int __init nvidia_hpet_check(struct acpi_table_header *header)
 {
return 0;
 }
-#endif
+#endif /* CONFIG_X86_IO_APIC */
+#endif 

Re: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery is absent

2007-10-27 Thread Anton Vorontsov
On Sat, Oct 27, 2007 at 03:32:04PM -0400, David Woodhouse wrote:
> On Sat, 2007-10-27 at 22:42 +0400, Anton Vorontsov wrote:
> > Well, PROP_PRESENT wasn't my idea, currently it's used by pmu and
> > olpc drivers becuase it's not trivial to register/unregister their
> > batteries on physical insertion/removal. I have some plans to teach
> > at least pmu batteries to not use PROP_PRESENT. I don't have any
> > OLPC, thus I can't convert it.
> 
> Actually it's not hard to do that.

I didn't say it's hard. But we don't have any interrupts for the
battery events, thus we have to implement polling.

> It was done this way in response to a
> request from the userspace side. IIRC.

Oh. Userspace tried to do something weird then, I think. Generally
it's just sane to unregister absent hardware.

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.net/bd2
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] proc_fs.h redux

2007-10-27 Thread Alexey Dobriyan
Remove proc_fs.h from headers that doesn't really need it. Typical overkill is
including full header when one can get away with just forward declaration of
"struct proc_dir_entry".

Number of files that are recompiled after touching proc_fs.h drops from 1100
to 513(!) on x86_64 allmodconfig.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 arch/powerpc/kernel/process.c   |1 +
 arch/powerpc/kernel/prom.c  |1 +
 arch/powerpc/mm/init_32.c   |1 +
 arch/powerpc/mm/init_64.c   |1 +
 arch/powerpc/platforms/iseries/mf.c |1 +
 arch/powerpc/platforms/pasemi/electra_ide.c |2 +-
 arch/powerpc/sysdev/fsl_soc.c   |1 +
 arch/powerpc/sysdev/mv64x60_dev.c   |2 +-
 arch/powerpc/sysdev/qe_lib/qe.c |1 +
 arch/sparc/kernel/process.c |3 ++-
 arch/sparc64/kernel/ldc.c   |1 +
 arch/sparc64/kernel/mdesc.c |1 +
 arch/sparc64/kernel/viohs.c |2 +-
 drivers/acpi/bay.c  |1 +
 drivers/acpi/dock.c |2 +-
 drivers/block/cpqarray.h|2 +-
 drivers/hwmon/ibmpex.c  |2 +-
 drivers/isdn/hardware/eicon/platform.h  |1 -
 drivers/md/md.c |1 +
 drivers/misc/fujitsu-laptop.c   |1 +
 drivers/misc/msi-laptop.c   |1 +
 drivers/misc/thinkpad_acpi.c|2 +-
 drivers/misc/thinkpad_acpi.h|3 ++-
 drivers/net/bonding/bonding.h   |3 ++-
 drivers/net/cxgb3/t3cdev.h  |2 +-
 drivers/parport/parport_serial.c|2 +-
 drivers/sbus/char/envctrl.c |2 +-
 drivers/sbus/char/uctrl.c   |1 +
 drivers/scsi/imm.h  |1 -
 drivers/scsi/ppa.h  |1 -
 drivers/usb/gadget/lh7a40x_udc.h|1 -
 fs/nfs/client.c |1 +
 include/acpi/acpi_bus.h |2 +-
 include/asm-powerpc/prom.h  |3 ++-
 include/asm-sparc/prom.h|4 +++-
 include/asm-sparc64/prom.h  |4 +++-
 include/linux/atmdev.h  |2 +-
 include/linux/crash_dump.h  |3 ++-
 include/linux/efi.h |1 -
 include/linux/ipmi.h|1 -
 include/linux/netfilter.h   |2 +-
 include/linux/parport.h |1 -
 include/linux/raid/md.h |1 -
 include/linux/sunrpc/cache.h|4 +++-
 include/linux/sunrpc/rpc_pipe_fs.h  |2 ++
 include/linux/sunrpc/stats.h|3 ++-
 include/linux/wanrouter.h   |3 ++-
 include/net/sctp/sctp.h |2 +-
 net/atm/br2684.c|1 +
 net/atm/resources.h |2 --
 net/irda/irnet/irnet.h  |1 -
 net/irda/irnet/irnet_irda.c |2 +-
 net/sctp/objcnt.c   |1 +
 net/sctp/proc.c |1 +
 net/sctp/protocol.c |1 +
 net/sunrpc/auth.c   |2 ++
 net/sunrpc/auth_unix.c  |2 +-
 net/sunrpc/rpcb_clnt.c  |1 +
 net/sunrpc/xprt.c   |2 +-
 net/wanrouter/wanmain.c |1 +
 net/wanrouter/wanproc.c |1 +
 61 files changed, 66 insertions(+), 38 deletions(-)

--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
--- a/arch/powerpc/platforms/iseries/mf.c
+++ b/arch/powerpc/platforms/iseries/mf.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
--- a/arch/powerpc/platforms/pasemi/electra_ide.c
+++ b/arch/powerpc/platforms/pasemi/electra_ide.c
@@ -16,7 +16,7 @@
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
-
+#include 
 #include 
 
 #include 
--- a/arch/powerpc/sysdev/fsl_soc.c
+++ b/arch/powerpc/sysdev/fsl_soc.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 

Re: 2.6.24-rc1 sysctl table check failed on PowerMac

2007-10-27 Thread Mikael Pettersson
On Sat, 27 Oct 2007 22:34:53 +0400, Alexey Dobriyan wrote:
> On Sat, Oct 27, 2007 at 07:57:38PM +0200, Mikael Pettersson wrote:
> > Booting 2.6.24-rc1 on my PowerMac the kernel now spits
> > out a sysctl warning early in the boot sequence:
> > 
> > --- dmesg-2.6.23
> > +++ dmesg-2.6.24-rc1
> > ...
> >  IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
> >  TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
> >  TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
> >  TCP: Hash tables configured (established 131072 bind 65536)
> >  TCP reno registered
> > +sysctl table check failed: /kernel .1 Writable sysctl directory
> 
> [PATCH] powerpc: fix sysctl whining re kernel.powersave-nap
> 
> kernel was marked with 0755. Everywhere else it's 0555.
> 
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> ---
> 
>  arch/powerpc/kernel/idle.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -122,7 +122,7 @@ static ctl_table powersave_nap_sysctl_root[] = {
>   {
>   .ctl_name   = CTL_KERN,
>   .procname   = "kernel",
> - .mode   = 0755,
> + .mode   = 0555,
>   .child  = powersave_nap_ctl_table,
>   },
>   {}
> 

This eliminated the warning. Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery is absent

2007-10-27 Thread David Woodhouse
On Sat, 2007-10-27 at 22:42 +0400, Anton Vorontsov wrote:
> Well, PROP_PRESENT wasn't my idea, currently it's used by pmu and
> olpc drivers becuase it's not trivial to register/unregister their
> batteries on physical insertion/removal. I have some plans to teach
> at least pmu batteries to not use PROP_PRESENT. I don't have any
> OLPC, thus I can't convert it.

Actually it's not hard to do that. It was done this way in response to a
request from the userspace side. IIRC.

-- 
dwmw2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] scatterlist fallout: drivers/scsi/arm/

2007-10-27 Thread Russell King
Already have this one queued for Linus in my tree.

On Sat, Oct 27, 2007 at 07:44:04PM +0100, Al Viro wrote:
> 
> Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/scsi/arm/scsi.h b/drivers/scsi/arm/scsi.h
> index 21ba571..2d09fef 100644
> --- a/drivers/scsi/arm/scsi.h
> +++ b/drivers/scsi/arm/scsi.h
> @@ -39,7 +39,7 @@ static inline int next_SCp(struct scsi_pointer *SCp)
>   SCp->buffer++;
>   SCp->buffers_residual--;
>   SCp->ptr = (char *)
> -  (page_address(SCp->buffer->page) +
> +  (page_address(sg_page(SCp->buffer)) +
> SCp->buffer->offset);
>   SCp->this_residual = SCp->buffer->length;
>   } else {
> @@ -77,7 +77,7 @@ static inline void init_SCp(struct scsi_cmnd *SCpnt)
>   SCpnt->SCp.buffer = (struct scatterlist *) 
> SCpnt->request_buffer;
>   SCpnt->SCp.buffers_residual = SCpnt->use_sg - 1;
>   SCpnt->SCp.ptr = (char *)
> -  (page_address(SCpnt->SCp.buffer->page) +
> +  (page_address(sg_page(SCpnt->SCp.buffer)) +
> SCpnt->SCp.buffer->offset);
>   SCpnt->SCp.this_residual = SCpnt->SCp.buffer->length;
>   SCpnt->SCp.phase = SCpnt->request_bufflen;

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-27 Thread Maxim Levitsky
On Saturday 27 October 2007 21:17:55 Alan Stern wrote:
> On Fri, 26 Oct 2007, Maxim Levitsky wrote:
> 
> > > > Looking through the dmfe code, I noticed yet another possible race.
> > > > A race between the .suspend, and a timer that serves both as a 
> > > > watchdog, and link state detector.
> > > > Again I need to prevent it from running during the suspend/resume, but 
> > > > how?
> > > > 
> > > > I can use del_timer in .suspend, and mod_timer in .resume, but that 
> > > > doesn't protect against
> > > > race with already running timer.
> > > > I can use del_timer_sync, but it states that it is useless if timer 
> > > > re-enables itself, and I agree with that.
> > > > In dmfe case the timer does re-enable itself.
> > > 
> > > That comment isn't right.  del_timer_sync works perfectly well even if
> > > the timer routine re-enables itself, provided it stops doing so after a
> > > small number of iterations.
> > Thanks for the info. but
> > Due to the "don't access the hardware, while powered-off" rule, I must know 
> > that the timer isn't running.
> > and won't be.
> > So what function to use (if possible) to be sure that the timer won't run 
> > anymore?
> > (Taking in the account the fact that it re-enables itself)
> 
> Use del_timer_sync().  It guarantees that when it returns, the timer 
> will be stopped and the timer routine will no longer be running on any 
> CPU.
> 
Even if the timer re-enables itself, are you sure?

> Alan Stern
> 
> 

Best regards,
Maxim Levitsky
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] irq_flags_t: intro and core annotations

2007-10-27 Thread Roman Zippel
Hi,

On Sun, 21 Oct 2007, Alexey Dobriyan wrote:

> So far remedies were:
> a) grep(1) -- obviously fragile. I tried at some point grepping for
>spin_lock_irqsave(), found quite a few, but it became bring quickly.
> b) BUILD_BUG_ON(sizeof(flags) != sizeof(unsigned long)) -- was tried,
>brutally broke some arches, survived one commit before revert :^)
>Doesn't work on i386 where sizeof(unsigned int) == sizeof(unsigned long).
> 
> So it would be nice to have something more robust.

If it's just about the type checking, something like below should pretty 
much do the same.

> * irq_flags_t allows arch maintainers to eventually switch to something
>   smaller than "unsigned long" if they want to.

Considering how painful this conversion could be, the question is whether 
this is really needed. Comments so far suggest some archs don't need all 
of it, but does someone need more?

bye, Roman

---
 include/linux/irqflags.h |   11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6/include/linux/irqflags.h
===
--- linux-2.6.orig/include/linux/irqflags.h
+++ linux-2.6/include/linux/irqflags.h
@@ -41,6 +41,10 @@
 # define INIT_TRACE_IRQFLAGS
 #endif
 
+static __always_inline void __irq_flags_check(unsigned long *flags)
+{
+}
+
 #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
 
 #include 
@@ -50,10 +54,11 @@
 #define local_irq_disable() \
do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0)
 #define local_irq_save(flags) \
-   do { raw_local_irq_save(flags); trace_hardirqs_off(); } while (0)
+   do { __irq_flags_check(); raw_local_irq_save(flags); 
trace_hardirqs_off(); } while (0)
 
 #define local_irq_restore(flags)   \
do {\
+   __irq_flags_check();  \
if (raw_irqs_disabled_flags(flags)) {   \
raw_local_irq_restore(flags);   \
trace_hardirqs_off();   \
@@ -69,8 +74,8 @@
  */
 # define raw_local_irq_disable()   local_irq_disable()
 # define raw_local_irq_enable()local_irq_enable()
-# define raw_local_irq_save(flags) local_irq_save(flags)
-# define raw_local_irq_restore(flags)  local_irq_restore(flags)
+# define raw_local_irq_save(flags) ({ __irq_flags_check(); 
local_irq_save(flags); })
+# define raw_local_irq_restore(flags)  ({ __irq_flags_check(); 
local_irq_restore(flags); })
 #endif /* CONFIG_TRACE_IRQFLAGS_SUPPORT */
 
 #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel

2007-10-27 Thread Casey Schaufler

--- "Ahmed S. Darwish" <[EMAIL PROTECTED]> wrote:

> Hi Casey,
> 
> Casey <[EMAIL PROTECTED]> wrote:
> >
> > This version is again aimed at addressing Al Viro's issues in
> > smackfs. Ahmed Darwish has again contributed in the repair of the
> > locking issues there. The move to 2.6.24 was also an important
> > release incentive.
> >
> 
> My patches mentiond above is not applied in this version. The same 
> lock issues remain. Did you forget applying them ?

The patch I sent out was one version older than the one I
intended to send. If I can recover the #$%^&*( disk I should
have an update that includes those fixes presently. If not it
may take a day or two longer. You have not been forgotten.
Thank you for your contribution.


Casey Schaufler
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] QUESTION: How to fix race between .suspend routine and watchdog timer

2007-10-27 Thread Alan Stern
On Fri, 26 Oct 2007, Maxim Levitsky wrote:

> > > Looking through the dmfe code, I noticed yet another possible race.
> > > A race between the .suspend, and a timer that serves both as a watchdog, 
> > > and link state detector.
> > > Again I need to prevent it from running during the suspend/resume, but 
> > > how?
> > > 
> > > I can use del_timer in .suspend, and mod_timer in .resume, but that 
> > > doesn't protect against
> > > race with already running timer.
> > > I can use del_timer_sync, but it states that it is useless if timer 
> > > re-enables itself, and I agree with that.
> > > In dmfe case the timer does re-enable itself.
> > 
> > That comment isn't right.  del_timer_sync works perfectly well even if
> > the timer routine re-enables itself, provided it stops doing so after a
> > small number of iterations.
> Thanks for the info. but
> Due to the "don't access the hardware, while powered-off" rule, I must know 
> that the timer isn't running.
> and won't be.
> So what function to use (if possible) to be sure that the timer won't run 
> anymore?
> (Taking in the account the fact that it re-enables itself)

Use del_timer_sync().  It guarantees that when it returns, the timer 
will be stopped and the timer routine will no longer be running on any 
CPU.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/2] cpusets: add interleave_over_allowed option

2007-10-27 Thread David Rientjes
On Fri, 26 Oct 2007, David Rientjes wrote:

> Hacking and requiring an updated version of libnuma to allow empty 
> nodemasks to be passed is a poor solution; if mempolicy's are supposed to 
> be independent from cpusets, then what semantics does an empty nodemask 
> actually imply when using MPOL_INTERLEAVE?  To me, it means the entire 
> set_mempolicy() should be a no-op, and that's exactly how mainline 
> currently treats it _as_well_ as libnuma.  So justifying this change in 
> the man page is respectible, but passing an empty nodemask just doesn't 
> make sense.
> 

Another reason that passing an empty nodemask to set_mempolicy() doesn't 
make sense is that libnuma uses numa_set_interleave_mask(_no_nodes)
to disable interleaving completely.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] DMA: Fix broken device refcounting

2007-10-27 Thread Dan Williams
On Sat, 2007-10-27 at 06:49 -0700, Haavard Skinnemoen wrote:
> On Fri, 26 Oct 2007 09:36:17 -0700
> Dan Williams <[EMAIL PROTECTED]> wrote:
> 
> > @@ -221,7 +220,6 @@ void dma_chan_cleanup(struct kref *kref)
> >  {
> >   struct dma_chan *chan = container_of(kref, struct dma_chan, refcount);
> >   chan->device->device_free_chan_resources(chan);
> > - kref_put(>device->refcount, dma_async_device_cleanup);
> >  }
> >  EXPORT_SYMBOL(dma_chan_cleanup);
> 
> While I can't see any problems with the rest of the patch, I think this
> part is wrong for the same reasons removing the kref_put() from the
> class device cleanup function is. I don't see any constraint that
> guarantees that dma_chan_cleanup() will always be called before
> dma_dev_release(), which means that "chan" may have been freed before
> this function gets a chance to run. Please correct me if I'm wrong.

Absolutely right, the driver, not dmaengine, frees the memory so there
must be a per channel reference on the device to hold off the driver's
remove routine.
> 
> Håvard

So how about this...

---snip---
dmaengine: Fix broken device refcounting

From: Haavard Skinnemoen <[EMAIL PROTECTED]>

When a DMA device is unregistered, its reference count is decremented
twice for each channel: Once dma_class_dev_release() and once in
dma_chan_cleanup(). This may result in the DMA device driver's
remove() function completing before all channels have been cleaned
up, causing lots of use-after-free fun.

Fix it by incrementing the device's reference count twice for each
channel during registration.

Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]>
[EMAIL PROTECTED]: kill unnecessary client refcounting]
Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/dma/dmaengine.c |   17 ++---
 1 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 84257f7..ec7e871 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -186,10 +186,9 @@ static void dma_client_chan_alloc(struct dma_client 
*client)
/* we are done once this client rejects
 * an available resource
 */
-   if (ack == DMA_ACK) {
+   if (ack == DMA_ACK)
dma_chan_get(chan);
-   kref_get(>refcount);
-   } else if (ack == DMA_NAK)
+   else if (ack == DMA_NAK)
return;
}
}
@@ -276,11 +275,8 @@ static void dma_clients_notify_removed(struct dma_chan 
*chan)
/* client was holding resources for this channel so
 * free it
 */
-   if (ack == DMA_ACK) {
+   if (ack == DMA_ACK)
dma_chan_put(chan);
-   kref_put(>device->refcount,
-   dma_async_device_cleanup);
-   }
}
 
mutex_unlock(_list_mutex);
@@ -320,11 +316,8 @@ void dma_async_client_unregister(struct dma_client *client)
ack = client->event_callback(client, chan,
DMA_RESOURCE_REMOVED);
 
-   if (ack == DMA_ACK) {
+   if (ack == DMA_ACK)
dma_chan_put(chan);
-   kref_put(>device->refcount,
-   dma_async_device_cleanup);
-   }
}
 
list_del(>global_node);
@@ -401,6 +394,8 @@ int dma_async_device_register(struct dma_device *device)
goto err_out;
}
 
+   /* One for the channel, one of the class device */
+   kref_get(>refcount);
kref_get(>refcount);
kref_init(>refcount);
chan->slow_ref = 0;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-usb-devel] USB HCD: avoid duplicate local_irq_disable()

2007-10-27 Thread Alan Stern
On Thu, 25 Oct 2007, Greg KH wrote:

> On Tue, Oct 23, 2007 at 12:01:37AM -0400, Alan Stern wrote:
> > On Tue, 23 Oct 2007, Arnd Bergmann wrote:
> > 
> > > usb_hcd_flush_endpoint() has a retry loop that starts with a 
> > > spin_lock_irq(),
> > > but only gives up the spinlock, not the irq_disable before jumping to the
> > > rescan label.
> > > 
> > > Split the spin_lock_irq into the retryable part and the 
> > > local_irq_disable()
> > > that is only done once as a micro-optimization and slight cleanup.
> > 
> > I agree with your sentiment, but it would be better to solve this 
> > problem without using local_irq_disable().  The patch below does this.
> > 
> > ---
> > 
> > Signed-off-by: Alan Stern <[EMAIL PROTECTED]>
> 
> Alan, is this something you want added to the tree and in before 2.6.24
> is out?

Yes.  It's a small thing, but we're better off keeping IRQ
enable/disable calls properly balanced.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Fix x86_64 TIF_SYSCALL_TRACE race in entry.S

2007-10-27 Thread Andi Kleen
Mathieu Desnoyers <[EMAIL PROTECTED]> writes:

> We make sure that the thread flag read is coherent between our new test and 
> the ALLWORK_MASK test by first saving it in a register used for both 
> comparisons.
>

That doesn't make sense. If someone is setting those asynchronously you 
can always race.

You should really just stop the process like ptrace does before changing
such things.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc1: First impressions

2007-10-27 Thread Bill Davidsen

Andrew Morton wrote:

On Fri, 26 Oct 2007 21:33:40 +0200
Ingo Molnar <[EMAIL PROTECTED]> wrote:


* Andrew Morton <[EMAIL PROTECTED]> wrote:



so a final 'sync' should be added to the test too, and the time it takes 
factored into the bandwidth numbers?


That's one way of doing it.  Or just run the test for a "long" time.  ie:
much longer than (total-memory / disk-bandwidth).  Probably the latter
will give a more accurate result, but it can get boring.

Longer might be less inaccurate, but without flushing the last data you 
really don't get best accuracy, you just reduce the error. Clearly doing 
fdatasync() is best, since other i/o caused by sync() can skew the results.


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Adding TIF_TRACE_KERNEL to x86_64

2007-10-27 Thread Andi Kleen
On Fri, Oct 26, 2007 at 03:37:38PM -0400, Mathieu Desnoyers wrote:
> 
> 1 - process A enters in a syscall, TIF_KERNEL_TRACE is cleared
> 2 - we activate TIF_KERNEL_TRACE
> 3 - process A returns from syscall (with wrong top of stack ?) -> segfault.
> 
> Am I on the right track ?

Yes. The code was not designed to allow that. The syscall path
has been optimized to go as fast as possible.

> 
> Can this be a concern with TIF_SYSCALL_TRACE also ? (potential race in
> ptrace ?)

ptrace only changes state in stopped processes; and those are not hanging
in syscalls. So no there is no race in a standard kernel.

If you wanted to change it the right way would be probably to test
for SYSCALL_TRACE too in the work flags and handle it there

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ide/arm/icside: fallout from commit 86f3a492bb09eee5745b93af35f2212179c251fd

2007-10-27 Thread Bartlomiej Zolnierkiewicz
On Saturday 27 October 2007, Al Viro wrote:
> struct device doesn't have ->dma; it's in struct expansion_card where
> that struct device is embedded into.

Yeah, your fix is obviously correct, thanks!

Bart
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] bitops kernel-doc: inline instead of macro

2007-10-27 Thread H. Peter Anvin

Andy Whitcroft wrote:

On Thu, Oct 25, 2007 at 01:48:14PM -0700, Andrew Morton wrote:


Andy, I thought we were going to whine about __inline__ and __inline, too?


Hmmm, I don't remember that coming up, but I'll add it to the todo.  I
am assuming plain 'inline' is preferred over both of these -- yell if
you meant something else.


Yes, use __inline__ if and only if it is exported to userspace.

-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] Version 9 (2.6.24-rc1) Smack: Simplified Mandatory Access Control Kernel

2007-10-27 Thread Casey Schaufler

--- Joshua Brindle <[EMAIL PROTECTED]> wrote:

> Casey Schaufler wrote:
> > The Smack patch and Paul Moore's netlabel API patch,
> > together for 2.6.24-rc1. Paul's changes are identical
> > to the previous posting, but it's been a while so they're
> > here again.
> >
> > The sole intent of change has been to address locking
> > and/or list processing issues. Please don't hesitate to
> > point out any problems that you might see or suggest
> > alternatives where things might not be to your liking.
> >
> > This version is aimed at 2.6.24, and has been tested
> > against 2.6.24-rc1.
> >   
> with both of these patches applied to 2.6.24-rc1 I get the following 
> oops when nfsd starts:
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 013c
> printing eip: c01d7e39 *pde = 
> Oops:  [#1] SMP
>
> 

Thanks for the bug report. I have just discovered that it is possible
to have a virtual disk go virtually bad under VMware. I will be
looking at this as soon as I recover.


Casey Schaufler
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] cirrusfb nonsense

2007-10-27 Thread Al Viro
(pointer > 0) is deeply weird; (pointer >= 0) is even dumber...

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/drivers/video/cirrusfb.c b/drivers/video/cirrusfb.c
index f99cb77..f7e2d5a 100644
--- a/drivers/video/cirrusfb.c
+++ b/drivers/video/cirrusfb.c
@@ -2509,8 +2509,7 @@ static int cirrusfb_zorro_register(struct zorro_dev *z,
cinfo = info->par;
cinfo->btype = btype;
 
-   assert(z > 0);
-   assert(z2 >= 0);
+   assert(z);
assert(btype != BT_NONE);
 
cinfo->zdev = z;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] scatterlist fallout: drivers/scsi/arm/

2007-10-27 Thread Al Viro

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/drivers/scsi/arm/scsi.h b/drivers/scsi/arm/scsi.h
index 21ba571..2d09fef 100644
--- a/drivers/scsi/arm/scsi.h
+++ b/drivers/scsi/arm/scsi.h
@@ -39,7 +39,7 @@ static inline int next_SCp(struct scsi_pointer *SCp)
SCp->buffer++;
SCp->buffers_residual--;
SCp->ptr = (char *)
-(page_address(SCp->buffer->page) +
+(page_address(sg_page(SCp->buffer)) +
  SCp->buffer->offset);
SCp->this_residual = SCp->buffer->length;
} else {
@@ -77,7 +77,7 @@ static inline void init_SCp(struct scsi_cmnd *SCpnt)
SCpnt->SCp.buffer = (struct scatterlist *) 
SCpnt->request_buffer;
SCpnt->SCp.buffers_residual = SCpnt->use_sg - 1;
SCpnt->SCp.ptr = (char *)
-(page_address(SCpnt->SCp.buffer->page) +
+(page_address(sg_page(SCpnt->SCp.buffer)) +
  SCpnt->SCp.buffer->offset);
SCpnt->SCp.this_residual = SCpnt->SCp.buffer->length;
SCpnt->SCp.phase = SCpnt->request_bufflen;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery is absent

2007-10-27 Thread Anton Vorontsov
On Sat, Oct 27, 2007 at 08:54:30PM +0400, Andrey Borzenkov wrote:
> I am not exactly sure about this one ... what other power_supply class 
> drivers 
> do? Should I fix HAL instead (but then, I do not know whether HAL is the only 
> application that is using this interface).

Well, PROP_PRESENT wasn't my idea, currently it's used by pmu and
olpc drivers becuase it's not trivial to register/unregister their
batteries on physical insertion/removal. I have some plans to teach
at least pmu batteries to not use PROP_PRESENT. I don't have any
OLPC, thus I can't convert it.

To sum this: the good way to handle "missing" batteries is to
unregister them, so they'll not show up in the /sys/class/power_supply.
Is that possible with the ACPI?

The good example is ds2760 batteries:

drivers/power/ds2760_battery.c - is platform device
drivers/w1/slaves/w1_ds2760.c - is w1 slave, which
registers/unregisters pdevs on the detection/removal.

> Subject: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery 
> is absent

Bad idea. Don't use present attribute, if possible.

> From: Andrey Borzenkov <[EMAIL PROTECTED]>
> 
> Ensure that we always have "present" attribute in sysfs. This is compatible
> with procfs case where we had "present: no" if battery was not available.
> 
> This fixes HAL battery detection where it does pretend battery is present
> but canot provide any value for it.
> 
> Signed-off-by: Andrey Borzenkov <[EMAIL PROTECTED]>
> 
> ---
> 
>  drivers/acpi/battery.c |   11 ++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
> index 681e26b..6e67fcd 100644
> --- a/drivers/acpi/battery.c
> +++ b/drivers/acpi/battery.c
> @@ -187,6 +187,10 @@ static int acpi_battery_get_property(struct power_supply 
> *psy,
>   return 0;
>  }
>  
> +static enum power_supply_property missing_battery_props[] = {
> + POWER_SUPPLY_PROP_PRESENT,
> +};
> +
>  static enum power_supply_property charge_battery_props[] = {
>   POWER_SUPPLY_PROP_STATUS,
>   POWER_SUPPLY_PROP_PRESENT,
> @@ -389,8 +393,13 @@ static int acpi_battery_update(struct acpi_battery 
> *battery)
>  {
>   int saved_present = acpi_battery_present(battery);
>   int result = acpi_battery_get_status(battery);
> - if (result || !acpi_battery_present(battery))
> + if (result)
>   return result;
> + if (!acpi_battery_present(battery)) {
> + battery->bat.properties = missing_battery_props;
> + battery->bat.num_properties = ARRAY_SIZE(missing_battery_props);
> + return result;
> + }
>   if (saved_present != acpi_battery_present(battery) ||
>   !battery->update_time) {
>   battery->update_time = 0;

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.net/bd2
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Dump stack during sysctl registration failure

2007-10-27 Thread Alexey Dobriyan
On Sat, Oct 27, 2007 at 10:34:53PM +0400,  wrote:
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -122,7 +122,7 @@ static ctl_table powersave_nap_sysctl_root[] = {
>   {
>   .ctl_name   = CTL_KERN,
>   .procname   = "kernel",
> - .mode   = 0755,
> + .mode   = 0555,

Speaking of...


[PATCH] Dump stack during sysctl registration failure

Let's make immediately obvious from where sysctl comes from and
messages itself more noticeable.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 kernel/sysctl_check.c |1 +
 1 file changed, 1 insertion(+)

--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1432,6 +1432,7 @@ static void set_fail(const char **fail, struct ctl_table 
*table, const char *str
printk(KERN_ERR "sysctl table check failed: ");
sysctl_print_path(table);
printk(" %s\n", *fail);
+   dump_stack();
}
*fail = str;
 }
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] scatterlist fallout: mmc

2007-10-27 Thread Al Viro
 #include 
is an odd thing to do...

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
 drivers/mmc/host/au1xmmc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/mmc/host/au1xmmc.c b/drivers/mmc/host/au1xmmc.c
index b2104d4..c3926eb 100644
--- a/drivers/mmc/host/au1xmmc.c
+++ b/drivers/mmc/host/au1xmmc.c
@@ -40,7 +40,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
-- 
1.5.3.GIT

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ide/arm/icside: fallout from commit 86f3a492bb09eee5745b93af35f2212179c251fd

2007-10-27 Thread Al Viro
struct device doesn't have ->dma; it's in struct expansion_card where
that struct device is embedded into.

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/drivers/ide/arm/icside.c b/drivers/ide/arm/icside.c
index 410a0d1..93f71fc 100644
--- a/drivers/ide/arm/icside.c
+++ b/drivers/ide/arm/icside.c
@@ -316,13 +316,13 @@ static int icside_dma_end(ide_drive_t *drive)
 
drive->waiting_for_dma = 0;
 
-   disable_dma(state->dev->dma);
+   disable_dma(ECARD_DEV(state->dev)->dma);
 
/* Teardown mappings after DMA has completed. */
dma_unmap_sg(state->dev, hwif->sg_table, hwif->sg_nents,
 hwif->sg_dma_direction);
 
-   return get_dma_residue(state->dev->dma) != 0;
+   return get_dma_residue(ECARD_DEV(state->dev)->dma) != 0;
 }
 
 static void icside_dma_start(ide_drive_t *drive)
@@ -331,8 +331,8 @@ static void icside_dma_start(ide_drive_t *drive)
struct icside_state *state = hwif->hwif_data;
 
/* We can not enable DMA on both channels simultaneously. */
-   BUG_ON(dma_channel_active(state->dev->dma));
-   enable_dma(state->dev->dma);
+   BUG_ON(dma_channel_active(ECARD_DEV(state->dev)->dma));
+   enable_dma(ECARD_DEV(state->dev)->dma);
 }
 
 static int icside_dma_setup(ide_drive_t *drive)
@@ -350,7 +350,7 @@ static int icside_dma_setup(ide_drive_t *drive)
/*
 * We can not enable DMA on both channels.
 */
-   BUG_ON(dma_channel_active(state->dev->dma));
+   BUG_ON(dma_channel_active(ECARD_DEV(state->dev)->dma));
 
icside_build_sglist(drive, rq);
 
@@ -367,14 +367,14 @@ static int icside_dma_setup(ide_drive_t *drive)
/*
 * Select the correct timing for this drive.
 */
-   set_dma_speed(state->dev->dma, drive->drive_data);
+   set_dma_speed(ECARD_DEV(state->dev)->dma, drive->drive_data);
 
/*
 * Tell the DMA engine about the SG table and
 * data direction.
 */
-   set_dma_sg(state->dev->dma, hwif->sg_table, hwif->sg_nents);
-   set_dma_mode(state->dev->dma, dma_mode);
+   set_dma_sg(ECARD_DEV(state->dev)->dma, hwif->sg_table, hwif->sg_nents);
+   set_dma_mode(ECARD_DEV(state->dev)->dma, dma_mode);
 
drive->waiting_for_dma = 1;
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc1 sysctl table check failed on PowerMac

2007-10-27 Thread Alexey Dobriyan
On Sat, Oct 27, 2007 at 07:57:38PM +0200, Mikael Pettersson wrote:
> Booting 2.6.24-rc1 on my PowerMac the kernel now spits
> out a sysctl warning early in the boot sequence:
> 
> --- dmesg-2.6.23
> +++ dmesg-2.6.24-rc1
> ...
>  IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
>  TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
>  TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
>  TCP: Hash tables configured (established 131072 bind 65536)
>  TCP reno registered
> +sysctl table check failed: /kernel .1 Writable sysctl directory

[PATCH] powerpc: fix sysctl whining re kernel.powersave-nap

kernel was marked with 0755. Everywhere else it's 0555.

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 arch/powerpc/kernel/idle.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -122,7 +122,7 @@ static ctl_table powersave_nap_sysctl_root[] = {
{
.ctl_name   = CTL_KERN,
.procname   = "kernel",
-   .mode   = 0755,
+   .mode   = 0555,
.child  = powersave_nap_ctl_table,
},
{}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] scatterlist fallout: frv

2007-10-27 Thread Al Viro

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/arch/frv/mb93090-mb00/pci-dma.c b/arch/frv/mb93090-mb00/pci-dma.c
index 671ce1e..662f7b1 100644
--- a/arch/frv/mb93090-mb00/pci-dma.c
+++ b/arch/frv/mb93090-mb00/pci-dma.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 void *dma_alloc_coherent(struct device *hwdev, size_t size, dma_addr_t 
*dma_handle, gfp_t gfp)
@@ -86,7 +87,7 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, 
int nents,
dampr2 = __get_DAMPR(2);
 
for (i = 0; i < nents; i++) {
-   vaddr = kmap_atomic(sg[i].page, __KM_CACHE);
+   vaddr = kmap_atomic(sg_page([i]), __KM_CACHE);
 
frv_dcache_writeback((unsigned long) vaddr,
 (unsigned long) vaddr + PAGE_SIZE);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RAID 10 w AHCI w NCQ = Spurius I/O error

2007-10-27 Thread Bill Davidsen

Nestor A. Diaz wrote:

Hello People,

I need your help, this problem is turning me crazy.


Did you know there is a raid list?


I have created a RAID 10 using  a RAID0 configuration on top of a two 
RAID1 devices (all software raid), like this:


You have created a raid 0+1, raid10 is a different thing. Given your 
setup, raid10 is probably what you *should* have created.


Personalities : [raid0] [raid1]
md4 : active raid0 md2[0] md3[1]
 605071872 blocks 64k chunks

md0 : active raid1 sdd3[3] sda3[0] sdc3[2] sdb3[1]
 9791552 blocks [4/4] []

md3 : active raid1 sdd2[2](F) sdb2[0]
 302536000 blocks [2/1] [U_]

md1 : active raid1 sdd1[3] sda1[0] sdc1[2] sdb1[1]
 240832 blocks [4/4] []

md2 : active raid1 sda2[0] sdc2[1]
 302536000 blocks [2/2] [UU]

unused devices: 

But the sdd device sometimes fail, i have changed the hard disk, check 
the older sata drive, reformat using mke2fs -c -c (to check for media 
errrors both read and write, no media problems found, change the sata 
disk and the problem remains, also with a new sata hard disk).


The systema is a supermicro server 5015-mt+ with an ich7 ahci controller


[___snip___]


The RAID 1 builds perfectly, but five days after that, the system shows a:

end_request: I/O error, dev sdd, sector 144006110
raid1: Disk failure on sdd2, disabling device.
Operation continuing on 1 devices
end_request: I/O error, dev sdd, sector 144006222
end_request: I/O error, dev sdd, sector 144268814
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sdb2
disk 1, wo:1, o:0, dev:sdd2
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sdb2


Hardware error, almost certainly. If you're using a hub, I suspect that 
first, then cables and heat problems, then the controller, in rough 
order of likelyhood.


a week before i get (under 2.6.18) the following message:


[___lots more snip___]



I have updated from 2.6.18 to 2.6.22 expecting to not have the problem, 
but the problem remains and i didn't know what could be the problem, the 
problem  always happen on /dev/sdd, i use LVM on top of the RAID 10 
software device.


I am not sure if the problem was because i create the RAID10 using two 
RAID1 devices and then do a RAID0, or should i have to be used mdadm and 
the level 10 option ?


Any suggestions will be welcome.

Do you ever get errors in partitions which are not part of the raid0+1 
setup, like md1? If not, look at your partition tables to see if you 
have any strange values there.


Are all drives at the same firmware level?

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] 2.6.24-rc1: ensure "present" sysfs attribute even if battery is absent

2007-10-27 Thread Alexey Starikovskiy
Andrey Borzenkov wrote:
> On Saturday 27 October 2007, Alexey Starikovskiy wrote:
>> Andrey Borzenkov wrote:
>>> I am not exactly sure about this one ... what other power_supply class
>>> drivers do? Should I fix HAL instead (but then, I do not know whether HAL
>>> is the only application that is using this interface).
>> Hm, do you need separate set of properties for that? You could register
>> either of existing two, and read function will not allow read of anything
>> but "present". IMHO, this is what other modules do (/drivers/power)
> 
> Do they have different set of properties depending on underlying hardware 
> that 
> you can't query unless hardware is present? I'd rather avoid adding fake 
> attributes; but I do not actually care so which one do you prefer? :)
I prefer less code :) 
> 
>> One remaining trick here, you need to call unregister/register for
>> power_supply if you change attributes -- so please check if your patched
>> driver survives insertion of the battery.
>>
> 
> 
> Neither does your code (nor kpowersave :) ) Remove battery and set of 
> attributes is "stuck" instead of being reset to only fixed set of power 
> device attributes (basically "info"). The only call to power_supply_register 
> is in acpi_battery_add and as far as I can tell this is executed on adding 
> *slot* not when content of this slot changes.
Point is -- it does not break :) If you start to play with shorter length of 
attributes table and don't call unregister/register, power_supply may 
go past the end of table.





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   >