date:20070805

Re: [RFC 16/26] union-mount: Introduce union_mount structure

2007-08-05 Thread Bharata B Rao

On Mon, Jul 30, 2007 at 06:13:39PM +0200, Jan Blunck wrote:
> +
> +int append_to_union(struct vfsmount *mnt, struct dentry *dentry,
> + struct vfsmount *dest_mnt, struct dentry *dest_dentry)
> +{
> + struct union_mount *this, *um;
> +
> + BUG_ON(!IS_MNT_UNION(mnt));
> +
> + this = union_alloc(dentry, mnt, dest_dentry, dest_mnt);
> + if (!this)
> + return -ENOMEM;
> +
> + spin_lock(&union_lock);
> + um = union_lookup(dentry, mnt);
> + if (um) {
> + BUG_ON((um->u_next.dentry != dest_dentry) ||
> +(um->u_next.mnt != dest_mnt));
> + spin_unlock(&union_lock);
> + union_put(this);
> + return 0;
> + }
> + __union_hash(this);
> + spin_unlock(&union_lock);
> + return 0;
> +}

This breaks if we append to a union stack from outside the union.
A particular case I hit is a 3-layer union with a subdir union
between the topmost and bottom layers. Now if you create a same-named
directory in the middle layer from outside this union, you hit the
above BUG_ON. The below patch fixes this and it applies on top of all of
your patches.

From: Bharata B Rao <[EMAIL PROTECTED]>

Direct additions to a union stack from outside the union result in a
BUG_ON. But this is a valid case and hence needs to be supported. Modify
append_to_union() to correctly handle this case.

Signed-off-by: Bharata B Rao <[EMAIL PROTECTED]>
---
 fs/namei.c            |    8 +++++---
 fs/union.c            |   32 ++++++++++++++++++++++++--------
 include/linux/union.h |    4 ++--
 3 files changed, 31 insertions(+), 13 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -512,7 +512,7 @@ static int __cache_lookup_union(struct n
}
 
/* now we know we found something "real"  */
-   append_to_union(last.mnt, last.dentry, nd->mnt, dentry);
+   append_to_union(last.mnt, last.dentry, nd->mnt, dentry, 1);
 
if (last.dentry != path->dentry)
pathput(&last);
@@ -789,7 +789,8 @@ static int __real_lookup_union(struct na
}
 
/* now we know we found something "real" */
-   append_to_union(last.mnt, last.dentry, next.mnt, next.dentry);
+   append_to_union(last.mnt, last.dentry,
+   next.mnt, next.dentry, 1);
 
if (last.dentry != path->dentry)
pathput(&last);
@@ -1775,7 +1776,8 @@ static int __hash_lookup_union(struct na
}
 
/* now we know we found something "real" */
-   append_to_union(last.mnt, last.dentry, next.mnt, next.dentry);
+   append_to_union(last.mnt, last.dentry,
+   next.mnt, next.dentry, 1);
 
if (last.dentry != path->dentry)
pathput(&last);
--- a/fs/union.c
+++ b/fs/union.c
@@ -248,7 +248,8 @@ int is_unionized(struct dentry *dentry, 
 }
 
 int append_to_union(struct vfsmount *mnt, struct dentry *dentry,
-   struct vfsmount *dest_mnt, struct dentry *dest_dentry)
+   struct vfsmount *dest_mnt, struct dentry *dest_dentry,
+   int from_lookup)
 {
struct union_mount *this, *um;
 
@@ -264,11 +265,26 @@ int append_to_union(struct vfsmount *mnt
spin_lock(&union_lock);
um = union_lookup(dentry, mnt);
if (um) {
-   BUG_ON((um->u_next.dentry != dest_dentry) ||
-  (um->u_next.mnt != dest_mnt));
-   spin_unlock(&union_lock);
-   union_put(this);
-   return 0;
+   if (um->u_next.dentry == dest_dentry &&
+   um->u_next.mnt == dest_mnt) {
+   spin_unlock(&union_lock);
+   union_put(this);
+   return 0;
+   }
+   if (from_lookup) {
+   __union_unhash(um);
+   list_del(&um->u_list);
+   list_del(&um->u_unions);
+   um->u_next.dentry->d_unionized--;
+   spin_unlock(&union_lock);
+   union_put(um);
+   spin_lock(&union_lock);
+   } else {
+   BUG();
+   spin_unlock(&union_lock);
+   union_put(this);
+   return 0;
+   }
}
list_add(>u_list, >mnt_unions);
list_add(>u_unions, >d_unions);
@@ -451,7 +467,7 @@ int attach_mnt_union(struct vfsmount *mn
if (!IS_MNT_UNION(mnt))
return 0;
 
-   return append_to_union(mnt, mnt->mnt_root, dest_mnt, dest_dentry);
+   return append_to_union(mnt, mnt->mnt_root, dest_mnt, dest_dentry, 0);
 }
 
 void detach_mnt_union(struct vfsmount *mnt)
@@ -941,7 +957,7 @@ struct dentry *union_create_topmost(stru
UM_DEBUG_DENTRY(dentry);
 
res =

Re: Possible error in 2.6.23-rc2-rt1 series

2007-08-05 Thread Ingo Molnar


* Peter Williams <[EMAIL PROTECTED]> wrote:

> I've just been reviewing these patches and have spotted a possible 
> error in the file arch/ia64/kernel/time.c in that the scope of the
> #ifdef on CONFIG_TIME_INTERPOLATION seems to have grown quite a lot
> since 2.6.23-rc1-rt7.  It used to chop out one if statement and now it 
> chops out half the file.

i have not got much feedback about the ia64 -rt code. Does it even 
compile? The above thing could be a merge artifact - TIME_INTERPOLATION 
has been removed from upstream recently.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] msleep() with hrtimers

2007-08-05 Thread Arjan van de Ven

On Mon, 2007-08-06 at 03:03 +0200, Roman Zippel wrote:

> There's no problem to provide a high resolution sleep, but there is also 
> no reason to mess with msleep, don't fix what ain't broken...

John Corbet provided the patch because he had a problem with the current
msleep... in that it didn't provide as good a common case as he
wanted... so I think your statement is wrong ;)

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org


Re: [PATCH 0/5] x86_64 EFI support -v3

2007-08-05 Thread Huang, Ying

On Tue, 2007-07-31 at 12:47 +0800, Eric W. Biederman wrote:
> Using efi_set_virtual means kdump doesn't work which means that no
> one is going to use this in a prebuilt kernel.

It is possible to make kexec/kdump work with EFI virtual mode in the
following ways:

1. Do not turn on EFI in the kexeced kernel. That is, when kexec prepares
the boot parameters for the kexeced kernel, do not set the boot parameter
"EFI_LOADER_SIG" to "EFIL". If the boot parameter
"screen_info.orig_video_isVGA" is set to VIDEO_TYPE_EFI and the other
members of "screen_info" are set properly, the EFI framebuffer can work
properly too. With this method, an EFI-disabled kernel can be kexeced
from an EFI-enabled kernel. This is enough for kdump to work.

2. If the intent is to kexec an EFI-enabled kernel from an EFI-enabled
kernel, the same method as IA64 EFI virtual mode support can be used.
That is, the memory area used by the EFI runtime service is mapped to
exactly the same address in both kernels, and "efi_set_virtual" is not
called in the kexeced kernel. A fixmap area can be used to map the
memory-mapped IO area of the EFI runtime service; the code and data areas
of the EFI runtime service are always mapped to the same address in the
direct map area.

Best Regards,
Huang Ying

Re: [patch] implement smarter atime updates support, v2

2007-08-05 Thread Ingo Molnar


* Theodore Tso <[EMAIL PROTECTED]> wrote:

> On Sun, Aug 05, 2007 at 09:28:38PM +0200, Ingo Molnar wrote:
> > 
> > added the relatime_interval sysctl that allows the changing of the 
> > atime update frequency. (default: 1 day / 86400 seconds)
> 
> What if you specify the interval as a per-mount option?  i.e.,
> 
>   mount -o relatime=86400 /dev/sda2 /u1
> 
> If you had this, I don't think we would need the sysctl tuning 
> parameter.

it's much more flexible if there are _more_ options available. People 
can thus make use of the feature earlier, use it even on distros that 
don't support it yet, etc.

Ingo

Re: [PATCH] Off-by-one in /sys/module/*/refcnt

2007-08-05 Thread Tejun Heo

Kay Sievers wrote:
>> @@ -785,7 +785,7 @@ static ssize_t show_refcnt(struct module_attribute 
>> *mattr,
>>struct module *mod, char *buffer)
>>  {
>> /* sysfs holds a reference */
>> -   return sprintf(buffer, "%u\n", module_refcount(mod)-1);
>> +   return sprintf(buffer, "%u\n", module_refcount(mod));
>>  }
> 
> It's likely caused by sysfs core changes, that opened attributes are
> no longer coupled to the refcount of modules. They used to take a
> reference.
> 
> The "holds a reference" comment should be removed along with your fix.
> Adding Tejun, to confirm this.

Yeap, that's correct.  Opening a sysfs node doesn't hold the module anymore.

Thanks.

-- 
tejun

Re: Disk spin down issue on shut down/suspend to disk

2007-08-05 Thread Tejun Heo

Cc'ing Henrique.  Any ideas?

Michał sed wrote:
> Greetings
> 
> I'm experiencing a double disk spin-down issue on my HP nx6310 laptop
> during shutdown and suspend to disk. The drive is powered down at the "Will
> now halt" message, then turned back on and off again with the laptop
> itself. I'm using the newest BIOS available (F.0D) and a 2.6.23-rc1-mm kernel
> along with Debian Unstable plus fixes from the Sidux repository, so I have
> the updated shutdown script. I have also verified on two other systems
> [AMD/Nforce based] that the spin-down issue has been resolved by the
> Sidux update, and I'm certain that this is an HP BIOS bug or a piix kernel
> module problem.
> 
> Michael



-- 
tejun

Re: [PATCH] Fix /proc/pid/pagemap return length calculation

2007-08-05 Thread Dave Boutcher

On Sun, 5 Aug 2007 22:34:46 -0500, Matt Mackall <[EMAIL PROTECTED]> said:
> 
> On Sun, Aug 05, 2007 at 09:03:23PM -0500, Dave Boutcher wrote:
>> 
>> /proc/pid/pagemap has a header (usually 8 bytes) the length
>> of which needs to be compensated for when converting from
>> proc file offset to page number.  The calculation of the
>> starting page number (svpfn) compensates for this, but the
>> calculation of the ending page number (evpfn) does not, resulting
>> in reads returning 8 bytes more than were asked for and
>> nastily overwriting userspace memory.
> 
> Does this mean you're running on a 64-bit arch? I'd already fixed this
> locally, but it was off by 4 for me.
> 
> Acked-by: Matt Mackall <[EMAIL PROTECTED]>

Yeah, and there is going to be at least one more patch coming, since
with this fix, which is a righteous fix, things don't get copied up to
user space correctly since some other code was dependent on the borken 
length :-)

I like the /proc/xxx/pagemap function though...thanks for writing it.

Dave B

Re: why are some atomic_t's not volatile, while most are?

2007-08-05 Thread Jerry Jiang

Is there any feedback on this point?

Thank you
./Jerry

On Sun, 1 Jul 2007 08:49:37 -0400 (EDT)
"Robert P. J. Day" <[EMAIL PROTECTED]> wrote:

> 
>   prompted by the earlier post on "volatile"s, is there a reason that
> most atomic_t typedefs use volatile int's, while the rest don't?
> 
> $ grep "typedef.*struct"  $(find . -name atomic.h)
> ./include/asm-v850/atomic.h:typedef struct { int counter; } atomic_t;
> ./include/asm-mips/atomic.h:typedef struct { volatile int counter; } atomic_t;
> ./include/asm-mips/atomic.h:typedef struct { volatile long counter; } atomic64_t;
> ...
> 
>   etc, etc.  just curious.
> 
> rday
> -- 
> 
> Robert P. J. Day
> Linux Consulting, Training and Annoying Kernel Pedantry
> Waterloo, Ontario, CANADA
> 
> http://fsdev.net/wiki/index.php?title=Main_Page
> 

Re: Memory leaking behaviour in 2.6.20.11, reiserfs related?

2007-08-05 Thread Rob Mueller


This is pretty much a vanilla kernel, with just one patch to work
around a deadlock problem in the reiserfs_file_write code that I
think isn't fixed.

http://lists.linuxcoding.com/kernel/2006-q1/msg32508.html


So that sounds like a reiserfs bug.


Yes, and it was definitely there still in 2.6.16. I don't know if it's been 
fixed since or not, haven't heard anything more from anyone on it.



BUG: at fs/reiserfs/inode.c:2868 reiserfs_releasepage()
 [] reiserfs_releasepage+0xa3/0xa8

...

 [] kernel_thread_helper+0x7/0x1c
 ===


And so does that.


These messages are "new", in that I think we've only seen them since 
upgrading to 2.6.20; they weren't in 2.6.16. They do seem like a reiserfs 
bug, but haven't seen any confirmation of that either. I thought they might 
be related to the leak behaviour.



Re: 2.6.22-rc6-mm1 + leak patches


I haven't had a chance to test these. Given that you're not sure they'll 
even be helpful, is there something else we can test first before going down 
this path?



Quite a few people are using reiserfs and yours is the only report of this
which I can recall.  Can you think of any reason why your setup differs
from most other people's?


No, not really, that 1 patch I mentioned previously is the only difference 
to a vanilla kernel.


Some other things that are interesting/strange

1. After rebooting that machine, I found it in the same memory leaked state 
17 hours later, so it doesn't even take a day to leak all that memory.
2. Although it ended up in the same leaked state after just 17 hours, even a 
week later, the machine is still running fine. It seems to reach a "steady 
state" where it has lots of leaked memory, but it doesn't cause the machine 
to swap or do anything particularly crazy, it just sits in that state.
3. We use the exact same kernel on some Prescott Xeon based machines with 8G 
of memory, and they don't display the same problem at all. The problem only 
seems to be occurring on our newer 12G Woodcrest Xeon based machines. For 
example.


[EMAIL PROTECTED] ~]$ uname -a
Linux imap8 2.6.20.11-reiserfix-fai #1 SMP Wed May 23 09:40:20 UTC 2007 i686 GNU/Linux

[EMAIL PROTECTED] ~]$ cat /proc/cpuinfo | grep 'model name'
model name  : Intel(R) Xeon(TM) CPU 3.00GHz
model name  : Intel(R) Xeon(TM) CPU 3.00GHz
[EMAIL PROTECTED] ~]$ free
             total       used       free     shared    buffers     cached
Mem:       8308908    8034260     274648          0     493724    2008128
-/+ buffers/cache:    5532408    2776500
Swap:      2048276      61820    1986456
[EMAIL PROTECTED] ~]$ ps auxw | wc -l
1538

[EMAIL PROTECTED] ~]$ uname -a
Linux imap9 2.6.20.11-reiserfix-fai #1 SMP Thu May 10 01:57:03 UTC 2007 i686 GNU/Linux

[EMAIL PROTECTED] ~]$ cat /proc/cpuinfo | grep 'model name'
model name  : Intel(R) Xeon(R) CPU 5130  @ 2.00GHz
model name  : Intel(R) Xeon(R) CPU 5130  @ 2.00GHz
[EMAIL PROTECTED] ~]$ free
             total       used       free     shared    buffers     cached
Mem:      12466848   12419764      47084          0     463564    1550232
-/+ buffers/cache:   10405968    2060880
Swap:      2048276      69828    1978448
[EMAIL PROTECTED] ~]$ ps auxw | wc -l
1523

Actually, maybe the other machines are displaying the same problem, I just 
wasn't as aware of it because it doesn't actually make the machine seem to 
do anything crazy. I guess I realised that there was definitely a problem 
with the new machines, because they were using a similar number and mix of 
processes to the other machines, but seemed to be using twice as much 
memory!


Rob


Possible error in 2.6.23-rc2-rt1 series

2007-08-05 Thread Peter Williams

I've just been reviewing these patches and have spotted a possible
error in the file arch/ia64/kernel/time.c in that the scope of the
#ifdef on CONFIG_TIME_INTERPOLATION seems to have grown quite a lot
since 2.6.23-rc1-rt7.  It used to chop out one if statement and now it
chops out half the file.

Is it correct?
Peter
-- 
Peter Williams   [EMAIL PROTECTED]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce



Re: [PATCH] Fix /proc/pid/pagemap return length calculation

2007-08-05 Thread Matt Mackall

On Sun, Aug 05, 2007 at 09:03:23PM -0500, Dave Boutcher wrote:
> 
> /proc/pid/pagemap has a header (usually 8 bytes) the length
> of which needs to be compensated for when converting from
> proc file offset to page number.  The calculation of the
> starting page number (svpfn) compensates for this, but the
> calculation of the ending page number (evpfn) does not, resulting
> in reads returning 8 bytes more than were asked for and
> nastily overwriting userspace memory.

Does this mean you're running on a 64-bit arch? I'd already fixed this
locally, but it was off by 4 for me.

Acked-by: Matt Mackall <[EMAIL PROTECTED]>
 
> Diffed against 2.6.23-rc1-mm2
> 
> Signed-off-by: Dave Boutcher <[EMAIL PROTECTED]>
> ---
>  fs/proc/task_mmu.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 4594f15..b2baeab 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -627,7 +627,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>   addr = PAGE_SIZE * svpfn;
>   if ((svpfn + 1) * sizeof(unsigned long) != src)
>   goto out;
> - evpfn = min((src + count) / sizeof(unsigned long),
> + evpfn = min((src + count) / sizeof(unsigned long) - 1,
>   ((~0UL) >> PAGE_SHIFT) + 1);
>   count = (evpfn - svpfn) * sizeof(unsigned long);
>   end = PAGE_SIZE * evpfn;
> -- 
> 1.4.4.2

-- 
Mathematics is the supreme nostalgia of our time.

Re: lmbench ctxsw regression with CFS

2007-08-05 Thread Nick Piggin

On Sat, Aug 04, 2007 at 08:50:37AM +0200, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > Oh good. Thanks for getting to the bottom of it. We have normally 
> > disliked too many runtime tunables in the scheduler, so I assume these 
> > are mostly going away or under a CONFIG option for 2.6.23? Or...?
> 
> yeah, they are all already under CONFIG_SCHED_DEBUG. (it's just that the 
> add-on optimization is not upstream yet - the tunings are still being 

Ah, OK. So long as that goes upstream I'm happy... and it is good
to see that with that patch, the base context switching performance
_has_ actually gone up like I had hoped. Nice.

> tested) Btw., with SCHED_DEBUG we now also have your domain-tree sysctl 
> patch upstream, which has been in -mm for a near eternity.
> 
> > What CPU did you get these numbers on? Do the indirect calls hurt much 
> > on those without an indirect predictor? (I'll try running some tests).
> 
> it was on an older Athlon64 X2. I never saw indirect calls really 
> hurting on modern x86 CPUs - don't both CPU makers optimize them pretty 
> efficiently? (as long as the target function is always the same - which 
> it is here.)

I think a lot of CPUs do. I think ia64 does not. It predicts
based on the contents of a branch target register, which I presume has to
be loaded before instruction fetch reaches the branch.
I don't know if this would hurt or not.

> > I must say that I don't really like the indirect calls a great deal, 
> > and they could be eliminated just with a couple of branches and direct 
> > calls.
> 
> yeah - i'll try that too. We can make the indirect call the uncommon 
> case and a NULL pointer be the common case, combined with a 'default', 
> direct function call. But i doubt it makes a big (or even measurable) 
> difference.

You might be right there.

[PATCH] take sched_debug.c out of nasal demon territory

2007-08-05 Thread Al Viro

C99 6.10.3[11]: preprocessing directive within the argument list
of macro invocation => undefined behaviour.  Don't do that...

Signed-off-by: Al Viro <[EMAIL PROTECTED]>
---
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index 1c61e53..8421b93 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -36,24 +36,24 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p, u64 now)
else
SEQ_printf(m, " ");
 
-   SEQ_printf(m, "%15s %5d %15Ld %13Ld %13Ld %9Ld %5d "
- "%15Ld %15Ld %15Ld %15Ld %15Ld\n",
+   SEQ_printf(m, "%15s %5d %15Ld %13Ld %13Ld %9Ld %5d ",
p->comm, p->pid,
(long long)p->se.fair_key,
(long long)(p->se.fair_key - rq->cfs.fair_clock),
(long long)p->se.wait_runtime,
(long long)(p->nvcsw + p->nivcsw),
-   p->prio,
+   p->prio);
 #ifdef CONFIG_SCHEDSTATS
+   SEQ_printf(m, "%15Ld %15Ld %15Ld %15Ld %15Ld\n",
(long long)p->se.sum_exec_runtime,
(long long)p->se.sum_wait_runtime,
(long long)p->se.sum_sleep_runtime,
(long long)p->se.wait_runtime_overruns,
-   (long long)p->se.wait_runtime_underruns
+   (long long)p->se.wait_runtime_underruns);
 #else
-   0LL, 0LL, 0LL, 0LL, 0LL
+   SEQ_printf(m, "%15Ld %15Ld %15Ld %15Ld %15Ld\n",
+   0LL, 0LL, 0LL, 0LL, 0LL);
 #endif
-   );
 }
 
 static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu, u64 now)

[RFC] VFS: mnotify (was: [PATCH 00/23] per device dirty throttling -v8)

2007-08-05 Thread Al Boldi

Jakob Oestergaard wrote:
> Why on earth would you cripple the kernel defaults for ext3 (which is a
> fine FS for boot/root filesystems), when the *fundamental* problem you
> really want to solve lies much deeper in the implementation of the
> filesystem?  Noatime doesn't solve the problem, it just makes it "less
> horrible".

inotify could easily solve the atime problem, but it's got the drawback of 
forcing the user to register each and every file/dir of interest, which 
isn't really reasonable on TB-filesystems.

It could be feasible to introduce mnotify, which would notify the user of 
metadata changes, like atime, across the filesystem.  Something like mnotify 
could also be helpful in CoW situations, provided it supported an in-sync 
interface.

Thanks!

--
Al


Re: [PATCH RT] put in a relatively high number for rcu read lock upper limit.

2007-08-05 Thread Paul E. McKenney

On Sun, Aug 05, 2007 at 07:53:10PM +0200, Ingo Molnar wrote:
> 
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
> 
> > Paul and Ingo,
> > 
> > Should we just remove the upper limit check, or is something like this 
> > patch sound?
> 
> i've changed the limit to 30 (the same depth limit is used by lockdep).
> 
> beyond that we could get stack overflow, etc.

Works for me!

Thanx, Paul

[PATCH] Fix /proc/pid/pagemap return length calculation

2007-08-05 Thread Dave Boutcher


/proc/pid/pagemap has a header (usually 8 bytes) the length
of which needs to be compensated for when converting from
proc file offset to page number.  The calculation of the
starting page number (svpfn) compensates for this, but the
calculation of the ending page number (evpfn) does not, resulting
in reads returning 8 bytes more than were asked for and
nastily overwriting userspace memory.

Diffed against 2.6.23-rc1-mm2

Signed-off-by: Dave Boutcher <[EMAIL PROTECTED]>
---
 fs/proc/task_mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 4594f15..b2baeab 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -627,7 +627,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
addr = PAGE_SIZE * svpfn;
if ((svpfn + 1) * sizeof(unsigned long) != src)
goto out;
-   evpfn = min((src + count) / sizeof(unsigned long),
+   evpfn = min((src + count) / sizeof(unsigned long) - 1,
((~0UL) >> PAGE_SHIFT) + 1);
count = (evpfn - svpfn) * sizeof(unsigned long);
end = PAGE_SIZE * evpfn;
-- 
1.4.4.2


Re: e1000 doesn't resume properly from standby (2.6.23-rc2)

2007-08-05 Thread Kok, Auke


Simon Arlott wrote:

00:0a.0 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
Subsystem: Intel Corp.: Unknown device 1012
Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 5
Memory at e302 (64-bit, non-prefetchable) [size=128K]
I/O ports at b000 [size=64]
Capabilities: [dc] Power Management version 2
Capabilities: [e4] PCI-X non-bridge device.
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

00:0a.1 Ethernet controller: Intel Corp. 82546EB Gigabit Ethernet Controller (Copper) (rev 01)
Subsystem: Intel Corp.: Unknown device 1012
Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 12
Memory at e300 (64-bit, non-prefetchable) [size=128K]
I/O ports at b400 [size=64]
Capabilities: [dc] Power Management version 2
Capabilities: [e4] PCI-X non-bridge device.
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

[  950.132046] Stopping tasks ... done.
[  950.459794] Suspending console(s)
[  951.776277] pnp: Device 00:0c disabled.
[  951.776673] pnp: Device 00:0a disabled.
[  951.776984] pnp: Device 00:09 disabled.
[  951.777306] pnp: Device 00:08 disabled.
[  951.86] ACPI: PCI interrupt for device :00:11.5 disabled
[  951.995359] ACPI: PCI interrupt for device :00:11.3 disabled
[  952.006094] ACPI: PCI interrupt for device :00:11.2 disabled
[  952.022243] ACPI handle has no context!
[  952.033068] ACPI: PCI interrupt for device :00:0c.2 disabled
[  952.044086] ACPI: PCI interrupt for device :00:0c.1 disabled
[  952.055083] ACPI: PCI interrupt for device :00:0c.0 disabled
[  952.282211] ACPI: PCI interrupt for device :00:0a.1 disabled
[  952.282221] ACPI handle has no context!
[  952.537474] ACPI: PCI interrupt for device :00:0a.0 disabled
[  952.537495] ACPI handle has no context!

[  956.857085] Back to C!
[  957.295035] ACPI: Unable to turn cooling device [b18d0e00] 'off'
[  957.521400] PCI: Setting latency timer of device :00:01.0 to 64
[  957.521478] ACPI: PCI Interrupt :00:09.0[A] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
[  957.532256] PM: Writing back config space on device :00:0a.0 at offset f (was ff0100, writing ff0105)
[  957.532277] PM: Writing back config space on device :00:0a.0 at offset 8 (was 1, writing b001)
[  957.532291] PM: Writing back config space on device :00:0a.0 at offset 4 (was 4, writing e3020004)
[  957.532299] PM: Writing back config space on device :00:0a.0 at offset 3 (was 80, writing 802008)
[  957.532309] PM: Writing back config space on device :00:0a.0 at offset 1 (was 230, writing 237)
[  957.532339] ACPI: PCI Interrupt :00:0a.0[A] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
[  957.567251] PM: Writing back config space on device :00:0a.1 at offset f (was ff0200, writing ff020c)
[  957.567275] PM: Writing back config space on device :00:0a.1 at offset 8 (was 1, writing b401)
[  957.567290] PM: Writing back config space on device :00:0a.1 at offset 4 (was 4, writing e304)
[  957.567298] PM: Writing back config space on device :00:0a.1 at offset 3 (was 80, writing 802008)
[  957.567308] PM: Writing back config space on device :00:0a.1 at offset 1 (was 230, writing 237)
[  957.567346] ACPI: PCI Interrupt :00:0a.1[B] -> Link [LNKD] -> GSI 12 (level, low) -> IRQ 12
[  957.589975] ACPI: PCI Interrupt :00:0b.0[A] -> Link [LNKD] -> GSI 12 (level, low) -> IRQ 12
[  957.600217] ACPI: PCI Interrupt :00:0c.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[  957.611230] ACPI: PCI Interrupt :00:0c.1[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
[  957.838282] ACPI: PCI Interrupt :00:0c.2[C] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
[  957.902166] ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[11]  MMIO=[e3046000-e30467ff]  Max Packet=[512]  IR/IT contexts=[4/8]
[  957.911656] ACPI: PCI Interrupt :00:11.1[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[  957.911666] PCI: VIA VLink IRQ fixup for :00:11.1, from 255 to 11
[  957.922034] ACPI: PCI Interrupt :00:11.2[D] -> Link [LNKD] -> GSI 12 (level, low) -> IRQ 12
[  957.933028] ACPI: PCI Interrupt :00:11.3[D] -> Link [LNKD] -> GSI 12 (level, low) -> IRQ 12
[  957.944076] ACPI: PCI Interrupt :00:11.5[C] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
[  957.944091] PCI: Setting latency timer of device :00:11.5 to 64
[  957.946061] ACPI: PCI Interrupt :01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
[  957.947464] pnp: Device 00:08 activated.
[  957.948724] pnp: Device 00:09 activated.
[  957.950635] pnp: Device 00:0a activated.
[  957.950664] pnp: Failed to activate device 00:0b.
[  957.951942] pnp: Device 00:0c activated.
[  959.883939] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow

Re: [PATCH] remove hugetlb_instantiation_mutex

2007-08-05 Thread Zhang, Yanmin

On Fri, 2007-08-03 at 09:53 -0700, Nish Aravamudan wrote:
> On 8/3/07, Adam Litke <[EMAIL PROTECTED]> wrote:
> > On Mon, 2007-07-30 at 15:15 +0800, Zhang, Yanmin wrote:
> > > On Fri, 2007-07-27 at 11:37 -0500, Adam Litke wrote:
> > > > Hey... I am amazed at how quickly you came back with a patch for this :)
> > > > Thanks for looking at it.  Unfortunately there is one show-stopper and I
> > > > have some reservations (pun definitely intended) with your approach:
> > > Thanks for your great comments.
> >
> > Sorry for such a long delay in responding.  I have been pretty busy
> > lately.
> >
> > > > First, your patch does not pass the libhugetlbfs test
> > > > 'alloc-instantiate-race' which was written to tickle the race which
> > > > the mutex was introduced to solve.  Your patch works for shared
> > > > mappings, but not for the private case.
> > > My testing about private might not be thorough. Function hugetlb_cow has a race
> > > for multi-thread to fault on the same private page index. But after I fixed it,
> > > alloc-instantiate-race still failed.
> > >
> > > I tried to google the source code tarball of libhugetlbfs test suite, but couldn't
> > > find it. Would you like to send me a copy of the test source codes?
> >
> > http://libhugetlbfs.ozlabs.org/releases/libhugetlbfs-1.2-pre1.tar.gz
> >
> > The tarball will contain a test called alloc-instantiate-race.  Make
> > sure to run it in private and shared mode.  Let me know what you find
> > out.
> 
> Actually, please use
> http://libhugetlbfs.ozlabs.org/snapshots/libhugetlbfs-dev-20070718.tar.gz.
> 1.2-pre1 had a build error that is fixed in the development snapshot.
Sorry for the late reply; I had a fever last week.

The test case is very nice. I located the root cause.

In hugetlb_no_page(), if a thread cannot get a quota/huge page while no
page is in flight, it returns VM_FAULT_SIGBUS or VM_FAULT_OOM. If the
mapping is private, there can be a narrow race when multiple threads fault
on the same private mapping index.

I added a check, "if (!pte_none(*ptep))", for the case where the thread
cannot get a quota/huge page while no page is in flight.

alloc-instantiate-race now passes in both private and shared mode.

Thanks to all of you for the good pointers!

--Yanmin
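The in-flight accounting that the patch below adds (flight_blocks plus hugetlb_commit_quota() and hugetlb_rollback_quota()) can be modeled in user space. This is a sketch under stated assumptions, not kernel code: struct quota_model and the model_* functions are hypothetical, a pthread mutex stands in for stat_lock, and what the fault path does after -EAGAIN is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical user-space model of the hugetlbfs quota fields the patch
 * touches; NOT kernel code. free_blocks == -1 means "no limit",
 * mirroring the "free_blocks > -1" checks in the patch.
 */
struct quota_model {
	long free_blocks;	/* blocks still available */
	long flight_blocks;	/* reserved by a fault, not yet committed */
	pthread_mutex_t lock;	/* stands in for sbinfo->stat_lock */
};

/* Reserve one block: 0 on success, -EAGAIN if the only blocks left are
 * in flight (another fault may still give its block back), -ENOMEM if
 * the quota is truly exhausted. */
static int model_get_quota(struct quota_model *q)
{
	int ret = 0;

	if (q->free_blocks > -1) {
		pthread_mutex_lock(&q->lock);
		if (q->free_blocks > 0) {
			q->free_blocks--;
			q->flight_blocks++;
		} else if (q->flight_blocks) {
			ret = -EAGAIN;
		} else {
			ret = -ENOMEM;
		}
		pthread_mutex_unlock(&q->lock);
	}
	return ret;
}

/* The faulting thread installed its page: the reservation becomes final. */
static void model_commit_quota(struct quota_model *q)
{
	if (q->free_blocks > -1) {
		pthread_mutex_lock(&q->lock);
		assert(q->flight_blocks > 0);
		q->flight_blocks--;
		pthread_mutex_unlock(&q->lock);
	}
}

/* The fault lost the race or failed: hand the block back. */
static void model_rollback_quota(struct quota_model *q)
{
	if (q->free_blocks > -1) {
		pthread_mutex_lock(&q->lock);
		assert(q->flight_blocks > 0);
		q->free_blocks++;
		q->flight_blocks--;
		pthread_mutex_unlock(&q->lock);
	}
}
```

With a quota of one block, a first reservation succeeds, a racing second one gets -EAGAIN instead of -ENOMEM while the first is still in flight, and -ENOMEM is reported only once nothing is in flight.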

The 3rd version of the patch:

--- linux-2.6.22/fs/hugetlbfs/inode.c	2007-07-09 07:32:17.0 +0800
+++ linux-2.6.22_hugetlb/fs/hugetlbfs/inode.c	2007-07-26 14:52:04.0 +0800
@@ -662,6 +662,7 @@ hugetlbfs_fill_super(struct super_block 
 	spin_lock_init(&sbinfo->stat_lock);
 	sbinfo->max_blocks = config.nr_blocks;
 	sbinfo->free_blocks = config.nr_blocks;
+	sbinfo->flight_blocks = 0;
 	sbinfo->max_inodes = config.nr_inodes;
 	sbinfo->free_inodes = config.nr_inodes;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
@@ -694,8 +695,11 @@ int hugetlb_get_quota(struct address_spa
 
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
-		if (sbinfo->free_blocks > 0)
+		if (sbinfo->free_blocks > 0) {
 			sbinfo->free_blocks--;
+			sbinfo->flight_blocks ++;
+		} else if (sbinfo->flight_blocks)
+			ret = -EAGAIN;
 		else
 			ret = -ENOMEM;
 		spin_unlock(&sbinfo->stat_lock);
@@ -710,7 +714,30 @@ void hugetlb_put_quota(struct address_sp
 
 	if (sbinfo->free_blocks > -1) {
 		spin_lock(&sbinfo->stat_lock);
-		sbinfo->free_blocks++;
+		sbinfo->free_blocks ++;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+}
+
+void hugetlb_commit_quota(struct address_space *mapping)
+{
+	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+	if (sbinfo->free_blocks > -1) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->flight_blocks --;
+		spin_unlock(&sbinfo->stat_lock);
+	}
+}
+
+void hugetlb_rollback_quota(struct address_space *mapping)
+{
+	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb);
+
+	if (sbinfo->free_blocks > -1) {
+		spin_lock(&sbinfo->stat_lock);
+		sbinfo->free_blocks ++;
+		sbinfo->flight_blocks --;
 		spin_unlock(&sbinfo->stat_lock);
 	}
  }
--- linux-2.6.22/include/linux/hugetlb.h	2007-07-09 07:32:17.0 +0800
+++ linux-2.6.22_hugetlb/include/linux/hugetlb.h	2007-07-24 16:54:39.0 +0800
@@ -140,6 +140,7 @@ struct hugetlbfs_config {
 struct hugetlbfs_sb_info {
 	long	max_blocks;   /* blocks allowed */
 	long	free_blocks;  /* blocks free */
+	long	flight_blocks;/* blocks allocated but not yet used */
 	long	max_inodes;   /* inodes allowed */
 	long	free_inodes;  /* inodes free */
 	spinlock_t	stat_lock;
@@ -166,6 +167,8 @@ extern struct vm_operations_struct huget
 struct file

Re: [PATCH] Fix /proc/pid/pagemap end address calculation

2007-08-05 Thread Matt Mackall

On Sun, Aug 05, 2007 at 09:03:28PM -0500, Dave Boutcher wrote:
> 
> When dumping vma information the pagemap_read routine calculates
> the minimum of what the user asks for and the end of the vma.
> Unfortunately the code uses vma->vm_start rather than vma->vm_end
> which can result in the end address being before the start, and
> a nasty never-ending loop in the kernel.
> 
> Diffed against 2.6.23-rc1-mm2
> 
> Signed-off-by: Dave Boutcher <[EMAIL PROTECTED]>

Thanks, Dave. I've added this fix to my local tree. It's still in a
broken state at the moment, so Andrew, feel free to pick this up.

Acked-by: Matt Mackall <[EMAIL PROTECTED]>

> ---
>  fs/proc/task_mmu.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index b2baeab..b12740c 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -671,7 +671,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
>   ret = -EIO;
>   goto out_mm;
>   }
> - vend = min(vma->vm_start - 1, end - 1) + 1;
> + vend = min(vma->vm_end - 1, end - 1) + 1;
>   ret = pagemap_fill(&pm, vend);
>   if (ret || !pm.count)
>   break;
> -- 
> 1.4.4.2
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Mathematics is the supreme nostalgia of our time.

Re: [GIT PULL] Blackfin arch update for 2.6.23

2007-08-05 Thread Bryan Wu

On Sun, 2007-08-05 at 22:26 -0400, Mike Frysinger wrote:
> On 8/5/07, Bryan Wu <[EMAIL PROTECTED]> wrote:
> > On Sun, 2007-08-05 at 22:04 -0400, Mike Frysinger wrote:
> > > On 8/5/07, Bryan Wu <[EMAIL PROTECTED]> wrote:
> > > > Bryan Wu (4):
> > > >   Blackfin SPI driver: Initial supporting BF54x in SPI driver
> > > >
> > > > Michael Hennerich (11):
> > > >   Blackfin arch: store labels so we later know who allocated GPIO/Peripheral resources
> > > >   Blackfin arch: add peripheral resource allocation support
> > > >   Blackfin arch: Add label to call new GPIO API
> > > >   Blackfin SPI driver: Make BF54x SPI work and add support for portmux API
> > > >   Blackfin SPI driver: use new GPIO API and add error handling
> > >
> > > i think this is the sort of thing Linus wants left for initial merge windows ?
> > > -mike
> >
> > Actually, this GPIO API has been added to the upstream in -RC1. In this
> > pull, Michael's patch just enable it in arch code and driver. And it is
> > tested at least 2-3 weeks, I think it is OK for the -RC merge.
> >
> > And most our driver things are moved to depend on this new GPIO API. I
> > just wanna make thing easier to maintain.
> 
> i was referring to the SPI stuff, not GPIO
> -mike

This GIT-PULL is for the new Blackfin GPIO update, so I included the SPI
driver patches related to it. Some other SPI patches are not included in
this GIT-PULL because they are not related to the GPIO update.

In the next GIT-PULL, I will try to send out the anomaly updates from you
and Robin.

Regards,
- Bryan

[PATCH] Fix /proc/pid/pagemap end address calculation

2007-08-05 Thread Dave Boutcher


When dumping vma information the pagemap_read routine calculates
the minimum of what the user asks for and the end of the vma.
Unfortunately the code uses vma->vm_start rather than vma->vm_end
which can result in the end address being before the start, and
a nasty never-ending loop in the kernel.

Diffed against 2.6.23-rc1-mm2

Signed-off-by: Dave Boutcher <[EMAIL PROTECTED]>
---
 fs/proc/task_mmu.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index b2baeab..b12740c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -671,7 +671,7 @@ static ssize_t pagemap_read(struct file *file, char __user *buf,
ret = -EIO;
goto out_mm;
}
-   vend = min(vma->vm_start - 1, end - 1) + 1;
+   vend = min(vma->vm_end - 1, end - 1) + 1;
 ret = pagemap_fill(&pm, vend);
if (ret || !pm.count)
break;
-- 
1.4.4.2
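The effect of the one-line fix can be checked with a small user-space sketch. The helper name clamp_vend() and its start parameter are hypothetical; the min(x - 1, y - 1) + 1 form is copied from the patch. One plausible reading of that form (an assumption, not stated in the patch) is that it lets an exclusive end of 0, i.e. a range reaching the top of the address space, compare as the largest value instead of the smallest.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the fixed expression
 *     vend = min(vma->vm_end - 1, end - 1) + 1;
 * Subtracting 1 before the comparison and adding it back afterwards
 * makes end == 0 behave like "top of the address space" (end - 1 wraps
 * to ULONG_MAX), so the VMA end wins the comparison.
 */
static unsigned long clamp_vend(unsigned long start, unsigned long vm_end,
				unsigned long end)
{
	unsigned long a = vm_end - 1;
	unsigned long b = end - 1;
	unsigned long vend = (a < b ? a : b) + 1;

	/* With vm_end (not vm_start), the window always ends after start,
	 * assuming start lies inside the VMA and before the requested end. */
	assert(vend > start);
	return vend;
}
```

Substituting vm_start for vm_end makes vend at most vm_start, which is never past start when start lies inside the VMA; the read window is then empty and the loop cannot make progress, matching the never-ending loop described in the changelog.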

Re: [PATH 0/1] Kexec jump - v2 - the first step to kexec based hibernation

2007-08-05 Thread Huang, Ying

On Sun, 2007-08-05 at 20:55 +0200, Pavel Machek wrote:
> Did the trick, I got the kernel to load, and it even attempted
> exec... but I got doublefault (or what is it?)
> 
> Int 6: ... EIP: c4739906. Address is in reserve_bootmem_core.
> 
> Do I have to disable ACPI completely? I tried with acpi=off,
> nosmp... but problem does not seem device related.

It seems that the problem has nothing to do with devices or ACPI. Does a
normal kexec work? That is:

kexec -l <...>
kexec -e

or 

kexec -p <...>
ALT-SysRq-c to trigger a crash dump.

Best Regards,
Huang Ying

Re: [GIT PULL] Blackfin arch update for 2.6.23

2007-08-05 Thread Mike Frysinger

On 8/5/07, Bryan Wu <[EMAIL PROTECTED]> wrote:
> On Sun, 2007-08-05 at 22:04 -0400, Mike Frysinger wrote:
> > On 8/5/07, Bryan Wu <[EMAIL PROTECTED]> wrote:
> > > Bryan Wu (4):
> > >   Blackfin SPI driver: Initial supporting BF54x in SPI driver
> > >
> > > Michael Hennerich (11):
> > >   Blackfin arch: store labels so we later know who allocated GPIO/Peripheral resources
> > >   Blackfin arch: add peripheral resource allocation support
> > >   Blackfin arch: Add label to call new GPIO API
> > >   Blackfin SPI driver: Make BF54x SPI work and add support for portmux API
> > >   Blackfin SPI driver: use new GPIO API and add error handling
> >
> > i think this is the sort of thing Linus wants left for initial merge windows ?
> > -mike
>
> Actually, this GPIO API has been added to the upstream in -RC1. In this
> pull, Michael's patch just enable it in arch code and driver. And it is
> tested at least 2-3 weeks, I think it is OK for the -RC merge.
>
> And most our driver things are moved to depend on this new GPIO API. I
> just wanna make thing easier to maintain.

i was referring to the SPI stuff, not GPIO
-mike

Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-08-05 Thread david

On Mon, 6 Aug 2007, Nick Piggin wrote:

> [EMAIL PROTECTED] wrote:
>
> > On Sun, 29 Jul 2007, Rene Herman wrote:
> >
> > > On 07/29/2007 01:41 PM, [EMAIL PROTECTED] wrote:
> > >
> > > > I agree that tinkering with the core VM code should not be done
> > > > lightly, but this has been put through the proper process and is
> > > > stalled with no hints on how to move forward.
> > >
> > > It has not. Concerns that were raised (by specifically Nick Piggin)
> > > weren't being addressed.
> >
> > I may have missed them, but what I saw from him weren't specific issues,
> > but instead a nebulous 'something better may come along later'
>
> Something better, ie. the problems with page reclaim being fixed.
> Why is that nebulous?

Because that doesn't begin to address all the benefits.

The approach of fixing page reclaim and updatedb pretends that if you
only do everything right, pages won't get pushed to swap in the first
place, and therefore swap prefetch won't be needed.

This completely ignores the use case where swapping was exactly the
right thing to do, but memory has since been freed up by a program
exiting, so that you could now fill that empty RAM with the data that
was swapped out.

David Lang

suspend-to-disk using a SAS drive

2007-08-05 Thread Nevine AbouGhazaleh


I am trying to suspend to disk using the Suspend2 modules. I am using an SSD
connected through a SAS interface; the SSD is the boot disk. All reads and
writes to the drive work well during normal operation.

When I hibernate (suspend to disk), I get the following errors:
--
mptbase: ioc0: ERROR – Invalid IOC facts reply, msgLength=0 offset=6
pnp: Failed to activate device 00:09.
pnp: Failed to activate device 00:0a.


Immediately after these messages, Linux hangs as soon as it starts writing the
image to the drive, showing the following output:
--
Writing Kernel and process data ...
20%...
--


Using KDB to trace the problem after the kernel hung shows a running
process: Ks2io.

A backtrace of Ks2io taken when the kernel hangs gives the following:

_spin_unlock_irq+0xb
thread_return+0x64
_raw_spin_lock+0x90
__mutex_up_process+0x10
wake_up_process+0x10
suspend_bio_write_page
mutex_lock+0x2a
flush_workqueue+0x51
kblockd_flush+0x10
do_bio_wait+0x1b
suspend_bio_write_page
suspend_bio_write_page+0x41
suspend_compress_write_page+0x137
worker_rw_loop
worker_rw_loop+0x133
worker_rw_loop
kthread+0xf5
schedule_tail+0x45
child_rip+0x45
worker_thread
kthread
child_rip 



The problem seems to occur when Suspend2 tries to write to the SSD.
I am not sure whether the problem is in the Suspend2 module or in the MPT driver.
I am using kernel 2.6.21.1 x86_64 with Fedora Core 5 x86_64 and MPT driver 3.04.
Any insights or recommendations would be greatly appreciated.


Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]

2007-08-05 Thread Nick Piggin


[EMAIL PROTECTED] wrote:

> On Sun, 29 Jul 2007, Rene Herman wrote:
>
> > On 07/29/2007 01:41 PM, [EMAIL PROTECTED] wrote:
> >
> > > I agree that tinkering with the core VM code should not be done
> > > lightly, but this has been put through the proper process and is
> > > stalled with no hints on how to move forward.
> >
> > It has not. Concerns that were raised (by specifically Nick Piggin)
> > weren't being addressed.
>
> I may have missed them, but what I saw from him weren't specific issues,
> but instead a nebulous 'something better may come along later'

Something better, ie. the problems with page reclaim being fixed.
Why is that nebulous?

--
SUSE Labs, Novell Inc.

Re: [ck] Re: -mm merge plans for 2.6.23

2007-08-05 Thread Nick Piggin


Matthew Hawkins wrote:

> On 7/25/07, Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > I guess /proc/meminfo, /proc/zoneinfo, /proc/vmstat, /proc/slabinfo
> > before and after the updatedb run with the latest kernel would be a
> > first step. top and vmstat output during the run wouldn't hurt either.
>
> Hi Nick,
>
> I've attached two files with this kind of info.  Being up at the cron
> hours of the morning meant I got a better picture of what my system is
> doing.  Here's a short summary of what I saw in top:
>
> beagleindexer used gobs of ram.  600M or so (I have 1G)

Hmm OK, beagleindexer. I thought beagle didn't need frequent reindexing
because of inotify? Oh well...

> updatedb didn't use much ram, but while it was running kswapd kept on
> frequenting the top 10 cpu hogs - it would stick around for 5 seconds
> or so then disappear for no more than 10 seconds, then come back
> again.  This behaviour persisted during the run.  updatedb ran third
> (beagleindexer was first, then update-dlocatedb)


Kswapd will use CPU when memory is low, even if there is no swapping.

Your "buffers" grew by 600% (from 50MB to 350MB), and slab also grew
by a few thousand entries. This is not just a problem when it pushes
out swap; it will also harm the file-backed working set.

This (which Ray's traces also show) is a bit of a problem. As Andrew
noticed, use-once isn't working well for the buffer cache, and it doesn't
really work for the dentry and inode caches either (although those don't
seem to be as much of a problem on your workload).

Andrew has done a little test patch for this in -mm, but it probably
wants more work and testing. If you can test the -mm kernel and see
if things are improved, that would help.

--
SUSE Labs, Novell Inc.

Re: [GIT PULL] Blackfin arch update for 2.6.23

2007-08-05 Thread Bryan Wu

On Sun, 2007-08-05 at 22:04 -0400, Mike Frysinger wrote:
> On 8/5/07, Bryan Wu <[EMAIL PROTECTED]> wrote:
> > Bryan Wu (4):
> >   Blackfin SPI driver: Initial supporting BF54x in SPI driver
> >
> > Michael Hennerich (11):
> >   Blackfin arch: store labels so we later know who allocated GPIO/Peripheral resources
> >   Blackfin arch: add peripheral resource allocation support
> >   Blackfin arch: Add label to call new GPIO API
> >   Blackfin SPI driver: Make BF54x SPI work and add support for portmux API
> >   Blackfin SPI driver: use new GPIO API and add error handling
> 
> i think this is the sort of thing Linus wants left for initial merge windows ?
> -mike

Actually, this GPIO API was already merged upstream in -RC1. In this
pull, Michael's patches just enable it in the arch code and drivers. It
has been tested for at least 2-3 weeks, so I think it is OK for the -RC
merge.

Most of our drivers have been moved to depend on this new GPIO API. I
just want to make things easier to maintain.

Thanks Mike
- Bryan Wu

Re: [GIT PULL] Blackfin arch update for 2.6.23

2007-08-05 Thread Mike Frysinger

On 8/5/07, Bryan Wu <[EMAIL PROTECTED]> wrote:
> Bryan Wu (4):
>   Blackfin SPI driver: Initial supporting BF54x in SPI driver
>
> Michael Hennerich (11):
>   Blackfin arch: store labels so we later know who allocated GPIO/Peripheral resources
>   Blackfin arch: add peripheral resource allocation support
>   Blackfin arch: Add label to call new GPIO API
>   Blackfin SPI driver: Make BF54x SPI work and add support for portmux API
>   Blackfin SPI driver: use new GPIO API and add error handling

i think this is the sort of thing Linus wants left for initial merge windows ?
-mike

Re: high system cpu load during intense disk i/o

2007-08-05 Thread Andrew Morton

On Sun, 5 Aug 2007 19:03:12 +0300 Dimitrios Apostolou <[EMAIL PROTECTED]> wrote:

> was my report so complicated?

We're bad.

Seems that your context switch rate when running two instances of
badblocks against two different disks went batshit insane.  It doesn't
happen here.

Please capture the `vmstat 1' output while running the problematic
workload.

The oom-killing could have been unrelated to the CPU load problem.  iirc
badblocks uses a lot of memory, so it might have been genuine.  Keep an eye
on the /proc/meminfo output and send the kernel dmesg output from the
oom-killing event.

Re: [rfc] balance-on-fork NUMA placement

2007-08-05 Thread Nick Piggin

On Fri, Aug 03, 2007 at 01:10:13PM -0700, Suresh B wrote:
> On Fri, Aug 03, 2007 at 02:20:10AM +0200, Nick Piggin wrote:
> > On Thu, Aug 02, 2007 at 11:33:39AM -0700, Martin Bligh wrote:
> > > Nick Piggin wrote:
> > > >On Wed, Aug 01, 2007 at 03:52:11PM -0700, Martin Bligh wrote:
> > > >>>And so forth.  Initial forks will balance.  If the children refuse to
> > > >>>die, forks will continue to balance.  If the parent starts seeing short
> > > >>>lived children, fork()s will eventually start to stay local.  
> > > >>Fork without exec is much more rare than without. Optimising for
> > > >>the uncommon case is the Wrong Thing to Do (tm). What we decided
> > > >
> > > >It's only the wrong thing to do if it hurts the common case too
> > > >much. Considering we _already_ balance on exec, then adding another
> > > >balance on fork is not going to introduce some order of magnitude
> > > >problem -- at worst it would be 2x but it really isn't too slow
> > > >anyway (at least nobody complained when we added it).
> > > >
> > > >One place where we found it helps is clone for threads.
> > > >
> > > >If we didn't do such a bad job at keeping tasks together with their
> > > >local memory, then we might indeed reduce some of the balance-on-crap
> > > >and increase the aggressiveness of periodic balancing.
> > > >
> > > >Considering we _already_ balance on fork/clone, I don't know what
> > > >your argument is against this patch is? Doing the balance earlier
> > > >and allocating more stuff on the local node is surely not a bad
> > > >idea.
> > > 
> > > I don't know who turned that on ;-( I suspect nobody bothered
> > > actually measuring it at the time though, or used some crap
> > > benchmark like stream to do so. It should get reverted.
> > 
> > So you have numbers to show it hurts? I tested some things where it
> > is not supposed to help, and it didn't make any difference. Nobody
> > else noticed either.
> > 
> > If the cost of doing the double balance is _really_ that painful,
> > then we could skip balance-on-exec for domains with balance-on-fork
> > set.
> 
> Nick, Even if it is not painful, can we skip balance-on-exec if
> balance-on-fork is set. There is no need for double balance, right?

I guess we could. There is no need for the double balance if the exec
happens immediately after the fork, which is surely the common case. I
think there can be some other weird cases (e.g. multi-threaded code) that
do funny things, though...


> Especially with the optimization you are trying to do with this patch,
> balance-on-exec may lead to wrong decision making this optimization
> not work as expected.

That's true.
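Suresh's suggestion (skip the exec-time balance in domains that already balance on fork) amounts to a flag check. A minimal sketch, assuming a plain bitmask of per-domain flags: the MODEL_SD_* constants are hypothetical stand-ins that only mimic the kernel's SD_BALANCE_EXEC/SD_BALANCE_FORK sched-domain flags, and where such a check would sit in the real exec path is not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the sched-domain flag bits. */
#define MODEL_SD_BALANCE_EXEC	0x1u
#define MODEL_SD_BALANCE_FORK	0x2u

/*
 * Balance at exec only if the domain asks for it AND did not already
 * balance at fork: when exec follows fork immediately (the common case),
 * a second balance would just second-guess the fork-time placement.
 */
static bool model_balance_on_exec(unsigned int sd_flags)
{
	/* this model only knows about the two bits above */
	assert((sd_flags & ~(MODEL_SD_BALANCE_EXEC | MODEL_SD_BALANCE_FORK)) == 0);

	return (sd_flags & MODEL_SD_BALANCE_EXEC) &&
	       !(sd_flags & MODEL_SD_BALANCE_FORK);
}
```

Nick's caveat still applies to this rule: a task that execs long after its fork, or from a multi-threaded parent, would lose the exec-time rebalance.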

Re: [PATCH] msleep() with hrtimers

2007-08-05 Thread Roman Zippel

Hi,

On Sun, 5 Aug 2007, Arjan van de Ven wrote:

> Timers are course resolution that is highly HZ-value dependent. For
> cases where you want a finer resolution, the kernel now has a way to
> provide that functionality... so why not use the quality of service this
> provides..

We're going in circles here. We have two different timer APIs for a 
reason; just because hrtimers provide better resolution doesn't 
automatically make them the better generic timer.
There's no problem with providing a high-resolution sleep, but there is 
also no reason to mess with msleep. Don't fix what ain't broken...

bye, Roman

Re: [PATCH][RESEND] fix a potential NULL pointer deref in XFS on failed mount.

2007-08-05 Thread David Chinner

On Sat, Aug 04, 2007 at 08:30:21PM +0200, Jesper Juhl wrote:
> Back in 2006 (2006-10-31 to be specific, reposted on 2006-11-16), I 
> submitted a patch to fix a potential NULL pointer deref in XFS on 
> failed mount.

Already checked into xfs-dev tree. Will go to next mainline merge.

http://oss.sgi.com/archives/xfs/2007-08/msg00030.html

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

[PATCH] Enable lguest drivers in Kconfig

2007-08-05 Thread Rusty Russell

Lguest drivers need to default to "Y"; otherwise they're never selected
for new builds.  (We don't bother prompting, because they're less than
4k combined and implied by selecting lguest support.)

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/lguest/Kconfig |2 ++
 1 file changed, 2 insertions(+)

===
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -21,8 +21,10 @@ config LGUEST_GUEST
 
 config LGUEST_NET
tristate
+   default y
depends on LGUEST_GUEST && NET
 
 config LGUEST_BLOCK
tristate
+   default y
depends on LGUEST_GUEST && BLOCK


Re: [PATCH] msleep() with hrtimers

2007-08-05 Thread Arjan van de Ven

> > because a lot of parts of the kernel think and work in milliseconds,
> > it's logical and USEFUL to at least provide an interface that works on
> > milliseconds.
> 
> If the millisecond resolution is enough for these users, that means the 
> current msleep will work fine for them.

except that you get a 20ms minimum and 10ms increments.
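The 20ms/10ms figures follow from how msleep() converts milliseconds to jiffies. A minimal sketch, assuming the 2.6.22-era behaviour (msecs_to_jiffies() rounds up, and msleep() adds one extra jiffy so it never returns early) and an HZ that divides 1000 evenly; msleep_jiffies() and msleep_worst_case_ms() are hypothetical names.

```c
#include <assert.h>

#define MSEC_PER_SEC 1000UL

/* Hypothetical model of msecs_to_jiffies() followed by msleep()'s "+ 1",
 * valid only when hz divides 1000 evenly (e.g. 100, 250, 1000). */
static unsigned long msleep_jiffies(unsigned long msecs, unsigned long hz)
{
	unsigned long ms_per_jiffy;

	assert(hz > 0 && MSEC_PER_SEC % hz == 0);
	ms_per_jiffy = MSEC_PER_SEC / hz;

	/* round the request up to whole jiffies ... */
	unsigned long j = (msecs + ms_per_jiffy - 1) / ms_per_jiffy;
	/* ... then add one, because the current jiffy may be almost over */
	return j + 1;
}

/* Nominal upper bound of the sleep in milliseconds (ignores scheduling
 * latency after the timer fires). */
static unsigned long msleep_worst_case_ms(unsigned long msecs, unsigned long hz)
{
	return msleep_jiffies(msecs, hz) * (MSEC_PER_SEC / hz);
}
```

At HZ=100, msleep(1) asks for 2 jiffies (up to 20ms) and each further 10ms of request adds one jiffy; at HZ=1000 the same call asks for 2 jiffies, or up to 2ms.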

> This generalization is simply not true. First it requires the 
> HIGH_RES_TIMERS option to be enabled to really make a real difference.

So? You provide the best possible for the config options selected...


> > > If you don't like the hrsleep name, we can also call it nanosleep and so 
> > > match what we already do for userspace.
> > 
> > having a nanosleep *in addition* to msleep (or maybe nsleep() and
> > usleep() to have consistent naming) sounds reasonable to me.
> 
> We only need one sleep implementation of both and msleep is a fine name 
> for the current implementation - not only does it describe the unit, but 
> it also describe the best resolution one can expect from it.

that's... combining 2 independent things into one. That's not a really
good idea.


> I can give the question back, what do you have against simple timers, that 
> you want to make them as awkward as possible to use?

msleep() isn't about timers. The timer type used is an implementation
detail behind the interface.

Timers have coarse resolution that is highly HZ-dependent. For cases
where you want a finer resolution, the kernel now has a way to provide
that functionality... so why not use the quality of service this
provides?

> hrtimer have a higher usage cost depending on the clock source, so simply 
> using them only because they are the new cool kid in town doesn't make 
> sense.

No, but they DO provide a much better quality of API implementation;
instead of a 20ms timeout you get really close to what you asked for!

There have been drivers that did if (HZ<1000) mdelay(x); else msleep(x);
Yes that's horrible, but it's a clear sign that this matters.

>  It may not be that critical for a simple sleep implementation, but 
> that only means we should keep the API as simple as possible, that means 
> one low resolution, cheap msleep and one high resolution nanosleep is 
> enough. Why do you insist on making more complex than necessary?

Ehm, it was you who insisted on adding complexity to this; the initial
proposal was to just replace the msleep() implementation with one that
has gentler behavior (you ask for 1ms, you get 1ms, not 20ms).

What really is your problem with that? "It may be more expensive on some
hardware?"

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

Re: Linux 2.6.23-rc2

2007-08-05 Thread Jeff Chua

On 8/6/07, Rafael J. Wysocki <[EMAIL PROTECTED]> wrote:

> What does 's2ram -i' say about your machine?

This machine can be identified by:
sys_vendor   = "LENOVO"
sys_product  = "1702E7A"
sys_version  = "ThinkPad X60s"
bios_version = "7BETD0WW (2.11 )"


Thanks,
Jeff.

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread David Chinner

On Sat, Aug 04, 2007 at 09:16:35PM +0200, Florian Weimer wrote:
> * Andrew Morton:
> 
> > The easy preventive is to mount with data=writeback.  Maybe that should
> > have been the default.
> 
> The documentation I could find suggests that this may lead to a
> security weakness (old data in blocks of a file that was grown just
> before the crash leaks to a different user).  XFS overwrites that data
> with zeros upon reboot, which tends to irritate users when it happens.

XFS has never overwritten data on reboot. It leaves holes when the kernel has
failed to write out data. A hole == zeros, so XFS does not expose stale data in
this situation. As it is, the underlying XFS problem (lack of synchronisation
between inode size updates and data writes) has been mostly fixed in 2.6.22 by
only updating the file size to be written to disk on data I/O completion.

FWIW, fsync() would prevent this from happening, but many application writers
seem strangely reluctant to put fsync() calls into code to ensure the data
they write is safely on disk.
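For application writers, the pattern referred to here is to call fsync() after the last write() and check its result. A minimal POSIX sketch; write_durably() is a hypothetical helper, and for a newly created file a fully durable version would also fsync() the containing directory, which is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and ask the kernel to put
 * them on stable storage before returning.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd;

	assert(buf != NULL || len == 0);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before writing; retry */
			goto fail;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Without this, the data may exist only in the page cache when
	 * write() returns, and can be lost if the machine crashes. */
	if (fsync(fd) < 0)
		goto fail;

	return close(fd);

fail:
	{
		int saved_errno = errno;

		close(fd);
		errno = saved_errno;
	}
	return -1;
}
```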

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: [PATCH] msleep() with hrtimers

2007-08-05 Thread Roman Zippel

Hi,

On Sat, 4 Aug 2007, Arjan van de Ven wrote:

> > hr_msleep makes no sense. Why should we tie this interface to millisecond 
> > resolution?
> 
> because a lot of parts of the kernel think and work in milliseconds,
> it's logical and USEFUL to at least provide an interface that works on
> milliseconds.

If the millisecond resolution is enough for these users, that means the 
current msleep will work fine for them.

> > Your suggested msleep_approx makes not much sense to me either, since 
> > neither interface guarantees anything and may "approximate" the sleep 
> > (and if the user is surprised by that something else already went wrong).
> 
> an interface should try to map to the implementation that provides the
> best implementation quality of the requested thing in general. That's
> the hrtimers based msleep().

This generalization is simply not true. First, it requires the 
HIGH_RES_TIMERS option to be enabled to make a real difference. 
Second, an hrtimer-based msleep has a higher setup cost, which can't be 
completely ignored. "Best" is a subjective term here and can't be that 
easily generalized to all current users.

> > If you don't like the hrsleep name, we can also call it nanosleep and so 
> > match what we already do for userspace.
> 
> having a nanosleep *in addition* to msleep (or maybe nsleep() and
> usleep() to have consistent naming) sounds reasonable to me.

We only need one implementation of each, and msleep is a fine name 
for the current implementation: not only does it describe the unit, it 
also describes the best resolution one can expect from it.
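To make the resolution point concrete, here is a small user-space model (not kernel code) of a jiffy-based msleep(). It assumes that msleep(ms) in kernels of this era sleeps for msecs_to_jiffies(ms) + 1 jiffies via schedule_timeout_uninterruptible(), and that HZ divides 1000 evenly. The model_* names are hypothetical.

```c
#include <assert.h>

/* Round-up ms -> jiffies conversion, as msecs_to_jiffies() does when HZ divides 1000. */
unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	unsigned int period;

	assert(hz > 0 && 1000 % hz == 0);
	period = 1000 / hz;                     /* milliseconds per jiffy */
	return (ms + period - 1) / period;
}

/*
 * Best/worst-case length of msleep(ms), in milliseconds, ignoring
 * scheduling latency. The timer expires on a tick boundary and the current
 * jiffy is already partly over, hence the extra jiffy msleep() adds.
 */
void model_msleep_bounds(unsigned int ms, unsigned int hz,
			 unsigned long *min_ms, unsigned long *max_ms)
{
	unsigned long timeout = model_msecs_to_jiffies(ms, hz) + 1;
	unsigned long period = 1000 / hz;

	*min_ms = (timeout - 1) * period;
	*max_ms = timeout * period;
}
```

With HZ=100, for example, a 1 ms request models out to somewhere between 10 and 20 ms, so on such configurations msleep(1) really means "sleep for at least one tick".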

> Do you have something against hrtimer use in general? From your emails
> on this msleep topic it sort of seems you do 

I could turn the question around: what do you have against simple timers, 
that you want to make them as awkward as possible to use?
hrtimers have a higher usage cost depending on the clock source, so 
using them only because they are the new cool kid in town doesn't make 
sense. It may not be that critical for a simple sleep implementation, but 
that only means we should keep the API as simple as possible: one 
low-resolution, cheap msleep and one high-resolution nanosleep is 
enough. Why do you insist on making it more complex than necessary?
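For comparison, the user-space interface Roman alludes to already accepts nanosecond-granularity requests. Below is a minimal sketch using POSIX clock_nanosleep() on CLOCK_MONOTONIC that measures the actual sleep on the same clock. sleep_ns_measured() is a hypothetical helper. The result is never shorter than the request, and how close it gets depends on whether high-resolution timers are enabled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

static long long ts_to_ns(const struct timespec *ts)
{
	return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Sleep for ns nanoseconds (must be < 1 s) on CLOCK_MONOTONIC and return
 * how long the sleep actually took, measured on the same clock.
 */
long long sleep_ns_measured(long ns)
{
	struct timespec req = { .tv_sec = 0, .tv_nsec = ns };
	struct timespec rem, start, end;
	int err;

	assert(ns >= 0 && ns < 1000000000L);
	clock_gettime(CLOCK_MONOTONIC, &start);

	/* clock_nanosleep() returns an error number directly; it does not set errno. */
	while ((err = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem)) == EINTR)
		req = rem;      /* interrupted by a signal: sleep for what is left */
	assert(err == 0);

	clock_gettime(CLOCK_MONOTONIC, &end);
	return ts_to_ns(&end) - ts_to_ns(&start);
}
```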

bye, Roman

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread David Chinner

On Sun, Aug 05, 2007 at 06:42:30AM -0400, Jeff Garzik wrote:
> Jakob Oestergaard wrote:
> >Oh dear.
> >
> >Why not just make ext3 fsync() a no-op while you're at it?
> >
> >Distros can turn it back on if it's needed...
> >
> >Of course I'm not serious, but like atime, fsync() is something one
> 
> No, they are nothing alike, and you are just making yourself look silly 
> if you compare them.  fsync has to do with fundamental guarantees about 
> data.

Hi Jeff - just as a point to note, I think you should check the spec
for fsync before stating that:

"It is explicitly intended that a null implementation is permitted."

and

"... fsync() might or might not actually cause data to be written where it is
safe from a power failure."

http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html

So fsync() does not have to provide the fundamental guarantees you think
it should.

Note: I'm not saying that this is at all sane (it's crazy, IMO), I'm just
pointing out that a "nofsync" mount option to avoid fsync overhead is a
legal thing to do.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

Re: rtc max frequency setting

2007-08-05 Thread Michael Chang

On 8/4/07, H. Peter Anvin <[EMAIL PROTECTED]> wrote:
> Jan Engelhardt wrote:
> > Hi,
> >
> > with the old rtc.ko module, there was a /proc/sys/dev/rtc/max-user-freq
> > that could be set. With rtc_cmos.ko (or the new rtc infrastructure in
> > general), I am missing this file. Where can I set the max-user-freq now,
> > or is this obsolete now? (mplayer prefers to have user-freq to be >= 1024.)
> >
>
> Qemu wants something like this too.  Both of these really want something
> else, which is a high-frequency userspace timer.

For mplayer, you can use -softsleep, but that uses a lot of CPU, and
you're probably already using a great deal of CPU for video decoding,
so it might be less than optimal.

-softsleep
Time frames by repeatedly checking the current time instead of
asking the kernel to wake up MPlayer at the correct time.  Useful if
your kernel timing is imprecise and you cannot use the RTC either.
Comes at the price of higher CPU consumption.
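The -softsleep approach described above amounts to polling the clock in a loop. Here is a rough sketch of that idea. It is not MPlayer's actual code, and spin_until() and now_ns() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Wait until deadline_ns (CLOCK_MONOTONIC) by polling the clock instead of
 * asking the kernel to wake us. Precise, but it keeps a CPU busy for the
 * whole wait. Returns how many extra clock reads the loop made.
 */
unsigned long spin_until(long long deadline_ns)
{
	unsigned long polls = 0;

	while (now_ns() < deadline_ns)
		polls++;
	return polls;
}
```

Every iteration is a clock read, so the CPU stays busy for the entire wait. That is the "higher CPU consumption" the man page warns about.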

And apparently, the build of MPlayer[1] that I have doesn't need rtc,
except on slower machines, according to the man page:

-rtc (RTC only)
Turns on usage of the Linux RTC (realtime clock - /dev/rtc) as
timing mechanism.  This wakes up the process every 1/1024 seconds to
check the current time.  Useless with modern Linux kernels configured
for desktop use as they already wake up the process with similar
accuracy when using normal timed sleep.

[1] MPlayer dev-SVN-r23777-4.1.2 (C) 2000-2007 MPlayer Team

--
Michael Chang

Please avoid sending me Word or PowerPoint attachments. Send me ODT,
RTF, or HTML instead.
See http://www.gnu.org/philosophy/no-word-attachments.html
Thank you.

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Theodore Tso

On Sat, Aug 04, 2007 at 09:16:35PM +0200, Florian Weimer wrote:
> * Andrew Morton:
> 
> > The easy preventive is to mount with data=writeback.  Maybe that should
> > have been the default.
> 
> The documentation I could find suggests that this may lead to a
> security weakness (old data in blocks of a file that was grown just
> before the crash leaks to a different user).  XFS overwrites that data
> with zeros upon reboot, which tends to irritate users when it happens.
> 
> From this point of view, data=ordered doesn't seem too bad.

The other alternative which addresses the security concern is
data=journal, which, if you have a big enough journal, can sometimes be
*faster* than data=ordered or even data=writeback, because it reduces
seeking.  The problem is that which one is better depends on the workload:
if the workload is very, very heavy on data writes, each data block
ends up getting written twice, once to the journal and once to the
final location on disk, and so this halves your total max write
bandwidth.  But if the workload doesn't do as much writing, and is
very seeky and/or very, very fsync()-centric (like a mailhub),
data=journal is probably the right answer.

- Ted

Re: [BUG RT] WARNING: at kernel/sched.c:5071 2.6.23-rc1-rt7

2007-08-05 Thread Steven Rostedt

On Sun, 2007-08-05 at 08:56 +0200, Ingo Molnar wrote:
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:

> 
> > P.S. I really found out that the system becomes VERY non-responsive 
> > when you run with both hard and softirqs as threads, but with 
> > PREEMPT_NONE ;-)
> 
> hm. That's not supposed to happen. Could there be some wakeup or softirq 
> processing problem?
> 
>   Ingo

Well, it happened right after I installed the ath driver (from
http://svn.madwifi.org/trunk ).  But I've recompiled my system with
PREEMPT_RT and it works fine now.  I can investigate it later, but I
currently need this system working with a PREEMPT_RT kernel.

-- Steve



Re: suspend/hibernation regression between 2.6.19 and 2.6.20 w/ Thinkpad T41

2007-08-05 Thread Pavel Machek

Hi!

> It is a small - but IMHO nagging - regression between these 2 kernel versions.
> 
> To make a "software suspend" at this notebook ("suspend to RAM") you have
> to press + . Pressing the -Key after that wakes up the notenbook.
> 

> If you hibernated the system ("suspend to disc"), you have to press the power
> button to wake up the notebook.

Yes, I've seen similar reports. Does it happen in all shutdown modes, and
with 2.6.22? Does it happen in platform mode in 2.6.19?


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH 2.6.22.y] ieee1394: revert "sbp2: enforce 32bit DMA mapping"

2007-08-05 Thread Benjamin Herrenschmidt

On Sun, 2007-08-05 at 09:54 +0200, Stefan Richter wrote:
> Benjamin Herrenschmidt wrote:
> >>> If setting 32-bit DMA mask fails on ppc64, that sounds like a problem
> >>> with the DMA implementation on that architecture. There are enough cards
> >>> out there that only support 32-bit DMA that this really needs to work..
> >> Yes, could the PPC folks please have a look at it?  Thanks.
> > 
> > Smells like we may have a bug there. No worries though, all current PPC
> > machines have an iommu that will not give out addresses above 32 bits
> > anyway, but I'll double check what's up.
> > 
> > Do you see something in dmesg when that happens ?
> 
> There was nothing in Olaf's report, except for trouble in sbp2 _after_
> the failure.  http://lkml.org/lkml/2007/8/1/344  (I don't have a PMac.)

Hrm, alright, that's a bit weird. Olaf's machine has only 256M of RAM
according to that dmesg, and thus the kernel isn't enabling the iommu;
we use the "trivial" version of the dma mapping ops.

I suspect we have a bug in our implementation of set_dma_mask though, in
that it does the "dma_supported" check using the previous mask and not
the one passed in :-)

The implementation of dma_supported that we hit in the no-iommu case
looks like this:

static int dma_direct_dma_supported(struct device *dev, u64 mask)
{
	/* Could be improved to check for memory though it better be
	 * done via some global so platforms can set the limit in case
	 * they have limited DMA windows
	 */
	return mask >= DMA_32BIT_MASK;
}

So that should have worked. (The comment is a bit obscure, just ignore
it for now).

However, as I said above, our dma_set_mask() wrapper uses the wrong
value (the old, not the new mask). But that still should have worked
since the default dma mask for a PCI device is 0x
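The suspected wrapper bug can be reproduced in a small user-space model. Everything here is a stand-in (model_device, model_set_dma_mask_*). The buggy variant follows the description above of validating the old mask instead of the new one, and both variants return -EIO on failure, as the usual dma_set_mask() pattern does. The model also shows why the bug stays hidden as long as the device's existing mask already passes the check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DMA_32BIT_MASK 0xffffffffULL

/* Hypothetical stand-in for struct device: only the DMA mask pointer. */
struct model_device {
	uint64_t *dma_mask;
};

/* Same test as dma_direct_dma_supported() quoted above. */
static int model_dma_supported(const struct model_device *dev, uint64_t mask)
{
	(void)dev;
	return mask >= MODEL_DMA_32BIT_MASK;
}

/* Suspected bug: checks the device's current mask, not the requested one. */
int model_set_dma_mask_buggy(struct model_device *dev, uint64_t mask)
{
	assert(dev != NULL);
	if (!dev->dma_mask || !model_dma_supported(dev, *dev->dma_mask))
		return -EIO;
	*dev->dma_mask = mask;
	return 0;
}

/* Fixed: validates the mask the caller passed in. */
int model_set_dma_mask_fixed(struct model_device *dev, uint64_t mask)
{
	assert(dev != NULL);
	if (!dev->dma_mask || !model_dma_supported(dev, mask))
		return -EIO;
	*dev->dma_mask = mask;
	return 0;
}
```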

So at this stage, I'm a bit at a loss as to why it didn't work. I'll
have to test on one of those machines with some printk's in when I
manage to get to work (dunno when, kid's sick so I may have to stay home
today).

BTW, any reason why you don't set the DMA mask in the ohci driver rather
than the sbp2 one?

Cheers,
Ben.


Re: "Badness at kernel/irq/resend.c:70" on boot - via-pmu?

2007-08-05 Thread Benjamin Herrenschmidt

On Sat, 2007-08-04 at 21:41 -0700, Linus Torvalds wrote:
> 
> On Sun, 5 Aug 2007, Paul Collins wrote:
> > 
> > I got the message below on boot with 2.6.23-rc2 on my PowerBook.
> 
> It's a debug message, I think we need to remove it. It's trying to figure 
> out what goes wrong with one particular machine, and I probably shouldn't 
> have merged it for mainline.
> 
> Ignore it, it will be gone soon enough, and it should happen just once per 
> boot.

Actually, it's interesting as that irq shouldn't hit that path :-) Not
critical (won't break anything), but still something I'll look into just
in case it hides something bad.

Cheers,
Ben.



Re: Possible bug in realtek 8169 ethernet driver

2007-08-05 Thread Francois Romieu

Bram <[EMAIL PROTECTED]> :
[...]
> The router attached to it indicates a 100mbit link. But that's about it.
> I cannot get any data over it. I can manually configure it to have an IP
> address and netmask, but it won't see anything on the local net. DHCP
> doesn't work either. Nothing out of the ordinary is logged in dmesg or
> anywhere else. An usb ethernet dongle on the system works just fine,
> ruling out (absent anyway) firewall or similar trouble.  The device works
> well in windows XP.

Could you try the patch below on top of 2.6.23-rc2?

> Relevant system specs:
> -Gigabyte GA-G33m-DS2R motherboard, with the integrated realtec nic

Ok, unknown beast.

[...]
> dmesg output:
> 
> r8169 Gigabit Ethernet driver 2.2LK loaded
> ACPI: PCI Interrupt :04:00.0[A] -> GSI 17 (level, low) -> IRQ 18
> PCI: Setting latency timer of device :04:00.0 to 64
> eth0: RTL8168b/8111b at 0xf8854000, 00:1a:4d:44:a1:1f, IRQ 18
> ...
> r8169: eth2: link up
> r8169: eth2: link up

Do not hesitate to send a whole dmesg. More context does not hurt.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 8be51c4..fecedef 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -789,6 +789,12 @@ static int rtl8169_set_speed_xmii(struct net_device *dev,
 
auto_nego |= ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM;
 
+   if (tp->mac_version == RTL_GIGA_MAC_VER_12) {
+   /* Vendor specific (0x1f) and reserved (0x0e) MII registers. */
+   mdio_write(ioaddr, 0x1f, 0x);
+   mdio_write(ioaddr, 0x0e, 0x);
+   }
+
tp->phy_auto_nego_reg = auto_nego;
tp->phy_1000_ctrl_reg = giga_ctrl;
 
-- 
1.4.4.2

-- 
Ueimor

Re: SysV IPC: shmctl/msgctl/semctl returns EIDRM instead of EINVAL

2007-08-05 Thread Anton Arapov


Please, fellas, take a look at my post!

Thanks in advance.

Anton Arapov <[EMAIL PROTECTED]> writes:
> Hi!
>
>   SysV code returns EIDRM for collision of IDs. I sure it should return 
> EINVAL.
>
>   Steps to reproduce: (this for shared memory code, for msg/sem it is the 
> same)
>1. Create then drop 2 shmem segments, then create a third.
>2. Try to shmctl(IPC_STAT) the two now-invalid shm IDs.
>3. Note error codes returned.
>
>One call gives EINVAL, one gives EIDRM due to collision with the third 
> shmem segment.
>Should both give EINVAL, this is what I've got on every other Unix I've 
> tried it on. 
>
>   IPC code is good, EIDRM is justification of EINVAL. But neither SVr4 nor 
> SVID documents EIDRM. 
>   Single Unix Specification mentions EINVAL but not EIDRM as a possible 
> failure for shmctl(), so the current kernel behavior is not merely 
> self-inconsistent but a flat violation of the spec. 
>
>   Can somebody explain why do we have EIDRM?
>
> Anton.
> SUS: http://www.opengroup.org/onlinepubs/007908799/xsh/shmctl.html
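Here are the reproduction steps above written out as a minimal C sketch for shared memory. reproduce_stale_shmid() and shm_stat_errno() are hypothetical helper names. On the kernels under discussion, the report says one stale ID fails with EIDRM and the other with EINVAL. The sketch records whichever errno each IPC_STAT returns.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* IPC_STAT the segment: returns 0 on success, else the errno from shmctl(). */
int shm_stat_errno(int shmid)
{
	struct shmid_ds ds;

	return shmctl(shmid, IPC_STAT, &ds) == 0 ? 0 : errno;
}

/*
 * Steps from the report: create and remove two segments, create a third,
 * then IPC_STAT the two now-invalid IDs. Returns -1 if setup fails.
 */
int reproduce_stale_shmid(int *err_first, int *err_second)
{
	int id1, id2, id3;

	assert(err_first != NULL && err_second != NULL);

	id1 = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (id1 < 0)
		return -1;
	id2 = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (id2 < 0) {
		shmctl(id1, IPC_RMID, NULL);
		return -1;
	}

	/* Nothing is attached, so IPC_RMID destroys both segments right away. */
	shmctl(id1, IPC_RMID, NULL);
	shmctl(id2, IPC_RMID, NULL);

	/* The third segment may reuse one of the freed slots. */
	id3 = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
	if (id3 < 0)
		return -1;

	*err_first = shm_stat_errno(id1);
	*err_second = shm_stat_errno(id2);

	shmctl(id3, IPC_RMID, NULL);
	return 0;
}
```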

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips

On Sunday 05 August 2007 08:01, Evgeniy Polyakov wrote:
> On Sun, Aug 05, 2007 at 01:06:58AM -0700, Daniel Phillips wrote:
> > > DST original code worked as device mapper plugin too, but its two
> > > additional allocations (io and clone) per block request ended up
> > > for me as a show stopper.
> >
> > Ah, sorry, I misread.  A show stopper in terms of efficiency, or in
> > terms of deadlock?
>
> At least as in terms of efficiency. Device mapper lives in happy
> world where memory does not end and allocations are fast.

Are you saying that things are different for a network block device 
because it needs to do GFP_ATOMIC allocations?  If so then that is just 
a misunderstanding.  The global page reserve Peter and I use is 
available in interrupt context just like GFP_ATOMIC.

Regards,

Daniel

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips

On Sunday 05 August 2007 08:08, Evgeniy Polyakov wrote:
> If we are sleeping in memory pool, then we already do not have memory
> to complete previous requests, so we are in trouble.

Not at all.  Any requests in flight are guaranteed to get the resources 
they need to complete.  This is guaranteed by the combination of memory 
reserve management and request queue throttling.  In logical terms, 
reserve management plus queue throttling is necessary and sufficient to 
prevent these deadlocks.  Conversely, the absence of either one allows 
deadlock.

> This can work 
> for devices which do not require additional allocations (like usual
> local storage), but not for network connected ones.

It works for network devices too, and also for a fancy device like 
ddsnap, which is the moral equivalent of a filesystem implemented in 
user space.

> If not in device, then at least it should say to block layer about
> its limits. What about new function to register queue...

Yes, a new internal API is needed eventually.  However, no new API is 
needed right at the moment because we can just hard code the reserve 
sizes and queue limits and audit them by hand, which is not any more 
sloppy than several other kernel subsystems.  The thing is, we need to 
keep any obfuscating detail out of the initial patches because these 
principles are hard enough to explain already without burying them in 
hundreds of lines of API fluff.

That said, the new improved API should probably not be a new way to 
register, but a set of function calls you can use after the queue is 
created, which follows the pattern of the existing queue API.

> ...which will get 
> maximum number of bios in flight and sleep in generic_make_request()
> when new bio is going to be submitted and it is about to exceed the
> limit?

Exactly.  This is what ddsnap currently does and it works.  But we did 
not change generic_make_request for this driver, instead we throttled 
the driver from the time it makes a request to its user space server, 
until the reply comes back.  We did it that way because it was easy and 
was the only segment of the request lifeline that could not be fixed by 
other means.  A proper solution for all block devices will move the 
throttling up into generic_make_request, as you say below.

> By default things will be like they are now, except additional
> non-atomic increment and branch in generic_make_request() and
> decrement and wake in bio_end_io()?

->endio is called in interrupt context, so the accounting needs to be 
atomic as far as I can see.

We actually account the total number of bio pages in flight; otherwise 
you would need to assume the largest possible bio and waste a huge 
amount of reserve memory.  A counting semaphore works fine for this 
purpose, with some slight inefficiency that is nigh on unmeasurable in 
the block IO path.  What the semaphore does is make the patch small and 
easy to understand, which is important at this point.
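Below is a user-space sketch of the accounting described above. It keeps a per-device limit on bio pages in flight, charges pages on submission, and credits them on completion. This is only a model, and all names are hypothetical. A pthread mutex and condition variable stand in for the kernel counting semaphore, and all pages of a request are reserved at once, so two submitters cannot each hold a partial reservation and wait on each other. In the kernel the completion side runs from ->endio in interrupt context, so it could not take a sleeping lock like this. On older toolchains, link with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-device throttle on bio pages in flight. */
struct page_throttle {
	pthread_mutex_t lock;
	pthread_cond_t  space;
	unsigned int    limit;      /* max pages in flight */
	unsigned int    in_flight;  /* pages currently submitted */
};

void throttle_init(struct page_throttle *t, unsigned int limit)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->space, NULL);
	t->limit = limit;
	t->in_flight = 0;
}

/* Submission path: sleep until `pages` more pages fit under the limit. */
void throttle_submit(struct page_throttle *t, unsigned int pages)
{
	assert(pages <= t->limit);  /* otherwise we would wait forever */
	pthread_mutex_lock(&t->lock);
	while (t->in_flight + pages > t->limit)
		pthread_cond_wait(&t->space, &t->lock);
	t->in_flight += pages;
	pthread_mutex_unlock(&t->lock);
}

/* Non-blocking variant: returns 1 if reserved, 0 if it would have slept. */
int throttle_try_submit(struct page_throttle *t, unsigned int pages)
{
	int ok;

	pthread_mutex_lock(&t->lock);
	ok = t->in_flight + pages <= t->limit;
	if (ok)
		t->in_flight += pages;
	pthread_mutex_unlock(&t->lock);
	return ok;
}

/* Completion path (the ->endio side): return the pages and wake waiters. */
void throttle_complete(struct page_throttle *t, unsigned int pages)
{
	pthread_mutex_lock(&t->lock);
	assert(pages <= t->in_flight);
	t->in_flight -= pages;
	pthread_cond_broadcast(&t->space);
	pthread_mutex_unlock(&t->lock);
}
```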

> I can cook up such a patch if idea worth efforts.

It is.  There are some messy details...  You need a place to store the 
accounting variable/semaphore and need to be able to find that place 
again in ->endio.  Trickier than it sounds, because of the unstructured 
way drivers rewrite ->bi_bdev.   Peterz has already poked at this in a 
number of different ways, typically involving backing_dev_info, which 
seems like a good idea to me.

A simple way to solve the stable accounting field issue is to add a new 
pointer to struct bio that is owned by the top level submitter 
(normally generic_make_request but not always) and is not affected by 
any recursive resubmission.  Then getting rid of that field later 
becomes somebody's summer project, which is not all that urgent because 
struct bio is already bloated up with a bunch of dubious fields and is 
a transient structure anyway.

Regards,

Daniel

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Andi Kleen

Brice Figureau <[EMAIL PROTECTED]> writes:
> 
>  2) I _still_ don't get the "performances" of 2.6.17, but since that's the
> better combination I could get, I think there is IMHO progress in the right
> direction (to be compared to no progress since 2.6.18, that's better :-)).

If you could characterize your workload well (e.g. how many disks,
what file systems, what load on mysql) perhaps it would be possible
to reproduce the problem with a test program or a mysql driver.
Then it could be bisected.

-Andi

Re: [PATCH 1/2] wait_task_zombie: remove unneeded child->signal check

2007-08-05 Thread Oleg Nesterov

On 08/05, Roland McGrath wrote:
>
> > A zombie must have a valid ->signal, we are going to release it and
> > __exit_signal() starts with BUG_ON(!sig).
> 
> Yes, this is safe because it's after the EXIT_DEAD check under tasklist_lock.

Yes, thanks; the changelog could be better.

We "own" this child (it was us who set EXIT_DEAD), so nobody can release it,
including the child itself (it has already passed exit_notify()). We could even
drop tasklist_lock, but we need ->parent->sighand->siglock.

Oleg.


Re: [PATCH 4/5] UML - Simplify helper stack handling

2007-08-05 Thread Andrew Morton

On Sun, 5 Aug 2007 22:41:14 +0200 Luca Tettamanti <[EMAIL PROTECTED]> wrote:

> Il Wed, Jun 27, 2007 at 11:37:01PM -0700, Andrew Morton ha scritto: 
> > 
> > So I'm running the generic version of this on i386 with 8k stacks (below),
> > with a quick LTP run.
> > 
> > Holy cow, either we use a _lot_ of stack or these numbers are off:
> > 
> > vmm:/home/akpm> dmesg -s 100|grep 'bytes left' 
> > khelper used greatest stack depth: 7176 bytes left
> > khelper used greatest stack depth: 7064 bytes left
> > khelper used greatest stack depth: 6840 bytes left
> > khelper used greatest stack depth: 6812 bytes left
> > hostname used greatest stack depth: 6636 bytes left
> > uname used greatest stack depth: 6592 bytes left
> > uname used greatest stack depth: 6284 bytes left
> > hotplug used greatest stack depth: 5568 bytes left
> > rpc.nfsd used greatest stack depth: 5136 bytes left
> > chown02 used greatest stack depth: 4956 bytes left
> > fchown01 used greatest stack depth: 4892 bytes left
> > 
> > That's the sum of process stack and interrupt stack, but I doubt if this
> > little box is using much interrupt stack space.
> > 
> > No wonder people are still getting stack overflows with 4k stacks...
> 
> Hi Andrew,
> I was a bit worried about stack usage on my setup and google found your
> mail :P
> 
> FYI:
> 
> khelper used greatest stack depth: 3228 bytes left
> khelper used greatest stack depth: 3124 bytes left
> busybox used greatest stack depth: 2808 bytes left
> modprobe used greatest stack depth: 2744 bytes left
> busybox used greatest stack depth: 2644 bytes left
> modprobe used greatest stack depth: 1836 bytes left
> modprobe used greatest stack depth: 1176 bytes left
> java used greatest stack depth: 932 bytes left
> java used greatest stack depth: 540 bytes left
> 
> I'm running git-current, with 4KiB stacks; filesystems are ext3 and XFS
> on LVM (on libata devices).
> Does it make sense to raise STACK_WARN to get a stack trace in do_IRQ?
> Or is 540 bytes still "safe" taking into account the separate IRQ stack?
> 

540 bytes free means that we've used about 87% of the stack (3556 of 4096
bytes).  I'd say it is extremely unsafe.

Unbelievably unsafe.  I'm suspecting that the instrumentation is lying to
us for some reason.


Re: [PATCH 1/2] wait_task_zombie: remove unneeded child->signal check

2007-08-05 Thread Roland McGrath

> A zombie must have a valid ->signal, we are going to release it and
> __exit_signal() starts with BUG_ON(!sig).

Yes, this is safe because it's after the EXIT_DEAD check under tasklist_lock.


Thanks,
Roland

Re: [PATCH] exit_notify: don't take tasklist for TIF_SIGPENDING re-targeting

2007-08-05 Thread Roland McGrath

Looks fine to me.

Thanks,
Roland

Re: [PATCH] zap_other_threads: don't optimize thread_group_empty() case

2007-08-05 Thread Roland McGrath

Looks fine to me.

Thanks,
Roland

Re: [1/3] 2.6.23-rc2: known regressions

2007-08-05 Thread Henrique de Moraes Holschuh

On Sun, 05 Aug 2007, Michal Piotrowski wrote:
> Subject : T60 ACPI issues/THINKPAD_ACPI_INPUT_ENABLED seems regressive
> References  : http://lkml.org/lkml/2007/8/1/198
>   http://lkml.org/lkml/2007/8/1/176
> Last known good : ?
2.6.22
> Submitter   : Michael S. Tsirkin <[EMAIL PROTECTED]>
>   Hugh Dickins <[EMAIL PROTECTED]>
> Caused-By   : ?
1a343760b516ca5466d201bec32b1794858b18a5
> Handled-By  : Henrique de Moraes Holschuh <[EMAIL PROTECTED]>
> Status  : problem is being debugged
Patch available, and sent upstream for merge
http://thread.gmane.org/gmane.linux.acpi.devel/24646

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [patch] implement smarter atime updates support, v2

2007-08-05 Thread Theodore Tso

On Sun, Aug 05, 2007 at 09:28:38PM +0200, Ingo Molnar wrote:
> 
> added the relatime_interval sysctl that allows the changing of the atime 
> update frequency. (default: 1 day / 86400 seconds)

What if you specify the interval as a per-mount option?  i.e., 

mount -o relatime=86400 /dev/sda2 /u1

If you had this, I don't think we would need the sysctl tuning parameter.
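Here is a sketch of the update rule under discussion. It assumes the usual relatime semantics (update atime when mtime or ctime is not older than the stored atime) plus the age limit from the v2 patch. In this sketch, `interval` would come from the hypothetical per-mount relatime=<seconds> option suggested above instead of a global sysctl. relatime_need_update() is a made-up name.

```c
#include <assert.h>
#include <time.h>

/*
 * Hypothetical relatime check with a tunable interval (seconds).
 * Update atime if the file was modified or changed since the last access,
 * or if the stored atime is at least `interval` seconds old.
 */
int relatime_need_update(time_t atime, time_t mtime, time_t chtime,
			 time_t now, time_t interval)
{
	assert(interval >= 0);
	if (mtime >= atime || chtime >= atime)
		return 1;
	return now - atime >= interval;
}
```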

- Ted

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Christoph Hellwig

On Sun, Aug 05, 2007 at 09:57:02AM +0200, Florian Weimer wrote:
> For instance, some editors don't perform fsync-then-rename, but simply
> truncate the file when saving (because they want to preserve hard
> links).  With XFS, this tends to cause null bytes on crashes.  Since
> ext3 has got a much larger install base, this would result in lots of
> bug reports, I fear.

XFS has recently been changed to only update the on-disk i_size after
data writeback has finished, to get rid of this irritation.
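For contrast with the truncate-in-place saving Florian describes, the fsync-then-rename pattern looks roughly like the minimal sketch below. The helper names and the ".tmp" suffix are hypothetical. As the comment notes, the pattern deliberately does not preserve hard links.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Replace `path` with new contents: write a temporary file, fsync() it,
 * then rename() it over the original, so a crash should leave either the
 * old or the new contents rather than a truncated file. This breaks hard
 * links to `path`, which is why some editors truncate in place instead.
 */
int save_file_atomically(const char *path, const char *buf, size_t len)
{
	char tmp[4096];
	int fd, saved;

	assert(path != NULL && buf != NULL);
	if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp) {
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
		saved = errno;
		close(fd);
		unlink(tmp);
		errno = saved;
		return -1;
	}
	if (close(fd) < 0 || rename(tmp, path) < 0) {
		saved = errno;
		unlink(tmp);
		errno = saved;
		return -1;
	}
	/* Full durability would also fsync() the containing directory (omitted). */
	return 0;
}
```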


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Christoph Hellwig

On Sun, Aug 05, 2007 at 02:26:53AM +0200, Andi Kleen wrote:
> I always thought the right solution would be to just sync atime only
> very very lazily. This means if a inode is only dirty because of an
> atime update put it on a "only write out when there is nothing to do
> or the memory is really needed" list.

Which is the policy I implemented for XFS a while ago.


Re: [PATCH/RFT] finish i386 and x86-64 sysdata conversion

2007-08-05 Thread Yinghai Lu

On 8/5/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
> Yinghai Lu wrote:
> > pci_scan_bus_on_node(int bus, struct pci_ops *ops, int node)
> > x86_pci_scan_root_bus(int bus)
> > {
> >   pci_scan_bus_on_node(bus, _root_ops, -1);
> > }
> >
> > i need node as one param for my patch later in irq.c and legacy.c
>
>
> It is a mistake to start coding NUMA details into pci scan functions.
>
> Anywhere the current code does not set the NUMA node, set it to -1 or
> some other default value.

Can you check
http://lkml.org/lkml/2007/7/26/377
http://lkml.org/lkml/2007/7/26/378
http://lkml.org/lkml/2007/7/26/379

They make sure numa_node on each device gets the correct value after the
PCI scan, especially for K8 systems with a second peer root bus on the second node.

Thanks

Yinghai Lu

Re: [PATCH 4/5] UML - Simplify helper stack handling

2007-08-05 Thread Luca Tettamanti

Il Wed, Jun 27, 2007 at 11:37:01PM -0700, Andrew Morton ha scritto: 
> 
> So I'm running the generic version of this on i386 with 8k stacks (below),
> with a quick LTP run.
> 
> Holy cow, either we use a _lot_ of stack or these numbers are off:
> 
> vmm:/home/akpm> dmesg -s 100|grep 'bytes left' 
> khelper used greatest stack depth: 7176 bytes left
> khelper used greatest stack depth: 7064 bytes left
> khelper used greatest stack depth: 6840 bytes left
> khelper used greatest stack depth: 6812 bytes left
> hostname used greatest stack depth: 6636 bytes left
> uname used greatest stack depth: 6592 bytes left
> uname used greatest stack depth: 6284 bytes left
> hotplug used greatest stack depth: 5568 bytes left
> rpc.nfsd used greatest stack depth: 5136 bytes left
> chown02 used greatest stack depth: 4956 bytes left
> fchown01 used greatest stack depth: 4892 bytes left
> 
> That's the sum of process stack and interrupt stack, but I doubt if this
> little box is using much interrupt stack space.
> 
> No wonder people are still getting stack overflows with 4k stacks...

Hi Andrew,
I was a bit worried about stack usage on my setup and google found your
mail :P

FYI:

khelper used greatest stack depth: 3228 bytes left
khelper used greatest stack depth: 3124 bytes left
busybox used greatest stack depth: 2808 bytes left
modprobe used greatest stack depth: 2744 bytes left
busybox used greatest stack depth: 2644 bytes left
modprobe used greatest stack depth: 1836 bytes left
modprobe used greatest stack depth: 1176 bytes left
java used greatest stack depth: 932 bytes left
java used greatest stack depth: 540 bytes left

I'm running git-current, with 4KiB stacks; filesystems are ext3 and XFS
on LVM (on libata devices).
Does it make sense to raise STACK_WARN to get a stack trace in do_IRQ?
Or is 540 bytes still "safe" taking into account the separate IRQ stack?

Luca
-- 
42

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Andrew Morton

On Sun, 5 Aug 2007 22:21:12 +0200 Jörn Engel <[EMAIL PROTECTED]> wrote:

> On Sun, 5 August 2007 20:37:14 +0200, Jörn Engel wrote:
> > 
> > Guess I should throw in a kernel compile test as well, just to get a
> > feel for the performance.
> 
> Three runs each of noatime, relatime and atime, both with cold caches
> and with warm caches.  Scripts below.  Run on a Thinkpad T40, 1.5GHz,
> 2GiB RAM, 60GB 2.5" IDE disk, ext3.
> 
> Biggest difference between atime and noatime (median run, cold cache) is
> ~2.3%, nowhere near the numbers claimed by Ingo.  Ingo, how did you
> measure 10% and more?

Ingo had CONFIG_DEBUG_INFO=y, which generates heaps more writeout,
but no additional atime updates.

Ingo had a faster computer ;)  That will generate many more MB/sec
write traffic, so the cost of those atime seeks becomes proportionally
higher.  Basically: you're CPU-limited, Ingo is seek-limited.

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Christoph Hellwig

On Sat, Aug 04, 2007 at 09:42:59PM +0200, Jörn Engel wrote:
> On Sat, 4 August 2007 21:26:15 +0200, Jörn Engel wrote:
> > 
> > Given the choice between only "atime" and "noatime" I'd agree with you.
> > Heck, I use it myself.  But "relatime" seems to combine the best of both
> > worlds.  It currently just suffers from mount not supporting it in any
> > relevant distro.
> 
> And here is a completely untested patch to enable it by default.  Ingo,
> can you see how good this fares compared to "atime" and
> "noatime,nodiratime"?

Umm, no f**king way.  atime selection is 100% policy and belongs in
userspace.  Add to that the problem that we can't actually re-enable
atimes because of the way the vfs-level mount flags API is designed.
Instead of doing such a fugly kernel patch, just talk to the handful
of distributions that matter and get them to update their defaults.


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Christoph Hellwig

On Sun, Aug 05, 2007 at 11:01:18AM -0700, Arjan van de Ven wrote:
> 
> on the journalling side this would be one transaction (not 5 million)
> and... since inodes are grouped on disk, you can even get some better
> coalescing this way... 
> 
> Wonder if we could do inode-grouping smartly; eg if we HAVE to write
> inode X, also write out the atime-dirty inodes in range X-Y to X+Y
> (where Y is some tunable) in the same IO..

We already have filesystems in the tree that have been doing such advanced
things as inode writeback clustering for more than ten years :)

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Jörn Engel

On Sun, 5 August 2007 20:37:14 +0200, Jörn Engel wrote:
> 
> Guess I should throw in a kernel compile test as well, just to get a
> feel for the performance.

Three runs each of noatime, relatime and atime, both with cold caches
and with warm caches.  Scripts below.  Run on a Thinkpad T40, 1.5GHz,
2GiB RAM, 60GB 2.5" IDE disk, ext3.

Biggest difference between atime and noatime (median run, cold cache) is
~2.3%, nowhere near the numbers claimed by Ingo.  Ingo, how did you
measure 10% and more?

        noatime, cold cache   relatime, cold cache  atime, cold cache

real    2m10.242s             2m10.549s             2m10.388s
user    1m46.886s             1m46.680s             1m47.000s
sys     0m8.243s              0m8.423s              0m8.239s

real    2m11.270s             2m11.212s             2m14.280s
user    1m46.940s             1m46.776s             1m46.670s
sys     0m8.139s              0m8.283s              0m8.503s

real    2m11.601s             2m14.861s             2m14.335s
user    1m46.920s             1m47.103s             1m46.846s
sys     0m8.246s              0m8.266s              0m8.349s

        noatime, warm cache   relatime, warm cache  atime, warm cache

real    1m55.894s             1m56.053s             1m56.905s
user    1m46.683s             1m46.600s             1m46.853s
sys     0m8.186s              0m8.349s              0m8.249s

real    1m55.823s             1m56.093s             1m57.077s
user    1m46.583s             1m46.913s             1m46.590s
sys     0m8.259s              0m7.966s              0m8.523s

real    1m55.789s             1m56.214s             1m57.224s
user    1m46.803s             1m46.753s             1m46.953s
sys     0m8.053s              0m8.113s              0m8.113s

Jörn

-- 
Data expands to fill the space available for storage.
-- Parkinson's Law

Cold cache script:
#!/bin/sh
make distclean
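# Drop the clean page cache (1), dentries and inodes (2), then both (3)
echo 1 > /proc/sys/vm/drop_caches
echo 2 > /proc/sys/vm/drop_caches
echo 3 > /proc/sys/vm/drop_caches
make allnoconfig
time make

Warm cache script:
#!/bin/sh
make distclean
make allnoconfig
# Read every file in the tree once (the pattern never matches) so the
# sources are in the page cache before the timed build
rgrep laksdflkdsaflkadsfja .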
time make

Re: [patch] implement smarter atime updates support

2007-08-05 Thread Arjan van de Ven

On Sun, 2007-08-05 at 21:04 +0100, Alan Cox wrote:
> you might want to add
> > 
> > /* 
> >  * if the inode is dirty already, do the atime update since
> >  * we'll be doing the disk IO anyway to clean the inode.
> >  */
> > if (inode->i_state & I_DIRTY)
> > return 1;
> 
> This makes the actual result somewhat less predictable. Is that wise ?
> Right now it's clear what happens based on what user sequence of events
> and that this is easily repeatable.

I can see the repeatability argument; on the flipside, having a system
of "opportunistic atime", eg as good as you can go cheaply, but with
minimum guarantees has some attraction as well. For example one could
imagine a system where the inode gets its atime updated anyway, just
not flagged for writing back to disk. If it later undergoes some event
that would cause it to go to disk, it gets preserved...

otoh that's even more unpredictable since VM pressure could drop this
update early.

For the dirty case, such drawbacks don't exist; it's just one more step
of "when we can cheaply".

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org


Re: [2.6.23 regression fix] fix thinkpad_acpi without hardware

2007-08-05 Thread Henrique de Moraes Holschuh

On Sun, 05 Aug 2007, Adrian Bunk wrote:
> René Treffer reported that booting a CONFIG_THINKPAD_ACPI=y kernel on a 
> machine without the hardware results in an Oops.
> 
> The trace is thinkpad_acpi_module_init -> thinkpad_acpi_module_exit -> 
> driver_remove_file -> sysfs_hash_and_remove.
> 
> The error handling if thinkpad_acpi_module_init() fails generally looks 
> suspicious, but this patch at least fixes the common case if no hardware 
> was found, and it seems in this case there isn't any cleanup 
> actually required.
> 
> Broken by commit d5a2f2f1d68e2da538ac28540cddd9ccc733b001.
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

NAK

Proper fix already sent to Len Brown, and already queued to be pulled by
Linus.

Len, that'd be "ACPI: thinkpad-acpi: fix the module init failure path",
http://thread.gmane.org/gmane.linux.acpi.devel/24413

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [Patch] panic.c

2007-08-05 Thread Heiko Carstens

> >The idea behind this is to keep the power usage on panicked machines
> >(without auto-reboot) low. Another point is that in a virtual machine
> >environment the VM's process uses 100% of the host CPU. This would
> >starve other programs or VMs. This patch brings the VM to a stop and
> >keeps the CPU usage below 1%.

For VM environments it would be better to have an interface that tells
the hypervisor that your guest is dead. That's what the disabled_wait()
line in panic() is good for on s390.

Re: suspend/hibernation regression between 2.6.19 and 2.6.20 w/ Thinkpad T41

2007-08-05 Thread Henrique de Moraes Holschuh

On Sun, 05 Aug 2007, Toralf Förster wrote:
> It is a small - but IMHO nagging - regression between these 2 kernel versions.
> 
> To make a "software suspend" at this notebook ("suspend to RAM") you have
> to press + . Pressing the -key after that wakes up the notebook.
> 
> If you hibernated the system ("suspend to disc"), you have to press the power
> button to wake up the notebook.
> 
> But now there is an issue if you want to wake up this notebook, after it was
> suspended with + again. It is now not possible to wake it up with 
> ,
> instead you have to press the power button as you would have it to do after a
> hibernation.
> 
> This issue occurs in the current 2.6.21 kernel too.
> (all tested at a stable Gentoo system - also with git-kernel-versions).

I am at a loss as to how thinkpad-acpi could in any way cause, or change, the
firmware wake-up notification behaviour.

That said, my T43 with 2.6.21 and latest-of-the-latest thinkpad-acpi wakes
up from S3 just fine by pressing the "Fn" key and holding it down for ~2s.
I didn't know it did that :-)  I always use the power button.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [PATCH] ufs: move non-layout parts of ufs_fs.h to fs/ufs/

2007-08-05 Thread Christoph Hellwig

On Sat, Aug 04, 2007 at 08:36:49PM +0100, Al Viro wrote:
> On Sat, Aug 04, 2007 at 11:24:31PM +0400, Evgeniy Dushistov wrote:
> > Move prototypes and in-core structures to fs/ufs/ similar to what most
> > other filesystems already do.
> > 
> > I made little modifications: move also ufs debug macros and
> > mount options constants into fs/ufs/ufs.h, this stuff
> > also private for ufs.
> 
> Is there any reason to have util.h included directly?  Or to have
> it as a separate file and not a part of ufs.h, while we are at it...

I didn't want to do too many different things at once, but getting
rid of util.h sounds fine to me.

Re: high system cpu load during intense disk i/o

2007-08-05 Thread Rafał Bilski

Hello and thanks for your reply. 
Hi, 
> The cron job that is running every 10 min on my system is mpop (a
> fetchmail-like program) and another running every 5 min is mrtg. Both
> normally finish within 1-2 seconds.
>
> The fact that these simple cron jobs don't finish ever is certainly because of
> the high system CPU load. If you see the two_discs_bad.txt which I attached
> on my original message, you'll see that *vmlinux*, and specifically the
> *scheduler*, take up most time.
>
> And the fact that this happens only when running two i/o processes but when
> running only one everything is absolutely snappy (not at all slow, see
> one_disc.txt), makes me sure that this is a kernel bug. I'd be happy to help
> but I need some guidance to pinpoint the problem.
OK, but first can you try to fix your cron setup? Just make sure that if mpop
is already running it won't be started again. Maybe something like "pgrep mpop"
and "if [ $?".
I don't remember exactly, but some time ago somebody had a problem with too
large disk buffers and sync(). Check the LKML archives. mpop is doing fsync().
You have a VIA chipset. Me too. It isn't very reliable. Do you have something
like "error { d0 BUSY }" in dmesg? That would explain the high CPU load: after
such an error DMA is no longer used and the disk falls back to PIO mode. On a
two-disk system the load is about 4.0 in this case, and a simple program takes
hours to complete if there is heavy I/O in progress. Btw. SLUB seems to behave
better in this situation (at least up to 8.0).
> Thanks,
> Dimitris

Regards
Rafał





Re: [patch] implement smarter atime updates support

2007-08-05 Thread Alan Cox

> you might want to add
> 
>   /* 
>* if the inode is dirty already, do the atime update since
>* we'll be doing the disk IO anyway to clean the inode.
>*/
>   if (inode->i_state & I_DIRTY)
>   return 1;

This makes the actual result somewhat less predictable. Is that wise ?
Right now it's clear what happens based on what user sequence of events
and that this is easily repeatable.

Re: [PATCH] sonypi: Fix initialization warning

2007-08-05 Thread Richard Knutsson


Thomas Renninger wrote:
> On Sun, 2007-08-05 at 21:05 +0200, Richard Knutsson wrote:
> > Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]>
> > ---
> > Got this from the compiler (gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)):
> > drivers/char/sonypi.c:1153: warning: initialization from incompatible pointer type
> >
> >
> > diff --git a/drivers/char/sonypi.c b/drivers/char/sonypi.c
> > index 73037a4..2dcd519 100644
> > --- a/drivers/char/sonypi.c
> > +++ b/drivers/char/sonypi.c
> > @@ -1147,10 +1147,15 @@ static int sonypi_acpi_remove(struct acpi_device *device, int type)
> > return 0;
> >  }
> >  
> > +static const struct acpi_device_id sonypi_acpi_driver_ids[] = {
> > +{ACPI_PROCESSOR_HID, 0},
>
> This is wrong. You need to take the HID the driver should match for, in
> this case: {"SNY6001", 0 },

Oh bugger, a cut 'n' paste error...
Thanks for reviewing it.


encrypted hibernation (was Re: Hibernation considerations)

2007-08-05 Thread Pavel Machek

Hi!

> > > Two things which I think would be nice to consider are:
> > >1) Encryption - I'd actually prefer if my luks device did not
> > >remember the key accross a hibernation; I want to be forced to
> > >reenter the phrase.  However I don't know what the best thing
> > >to do to partitions/applications using the luks device is.
> > 
> > Encryption is possible with both the userland hibernation (aka uswsusp) and
> > TuxOnIce (formerly known as suspend2).  Still, I don't consider it as a
> > "must have" feature for a framework to be generally useful (many users
> > don't use it anyway).
> 
> If a user uses an encrypted filesystem, then he also needs an encrypted
> swap and encrypted hibernation image: otherwise the filesystem encryption
> is not very useful.

Actually, we can do most of that stuff already. 

We can encrypt filesystems, encrypt swaps (LVM), and encrypt hibernation.

What we _can't_ do is hibernate to an LVM encrypted partition, and we
can only suspend to a swap partition. Bad combination, but here's a way
out: just use a separate (raw) partition for hibernation.

Ok, that needs re-partitioning; if that's bad, just swapoff before
hibernation and mkswap/swapon after it's done.

Index: suspend.c
===
RCS file: /cvsroot/suspend/suspend/suspend.c,v
retrieving revision 1.82
diff -u -u -r1.82 suspend.c
--- suspend.c   29 Jul 2007 12:48:10 -  1.82
+++ suspend.c   5 Aug 2007 19:49:05 -
@@ -59,6 +59,7 @@
 static unsigned long pref_image_size = IMAGE_SIZE;
 static int suspend_loglevel = SUSPEND_LOGLEVEL;
 static char compute_checksum;
+static int raw_partition = 1;
 #ifdef CONFIG_COMPRESS
 static char compress;
 #else
@@ -184,6 +185,9 @@
int error;
loff_t free_swap;
 
+   if (raw_partition)
+   return 1*1024*1024*1024;
+
error = ioctl(dev, SNAPSHOT_AVAIL_SWAP, &free_swap);
if (!error)
return free_swap;
@@ -197,6 +201,12 @@
int error;
loff_t offset;
 
+   if (raw_partition) {
+   static int cur_offset = 0;
+   cur_offset += page_size;
+   return cur_offset;
+   }
+
error = ioctl(dev, SNAPSHOT_GET_SWAP_PAGE, &offset);
if (!error)
return offset;
@@ -205,6 +215,8 @@
 
 static inline int free_swap_pages(int dev)
 {
+   if (raw_partition)
+   return 0;
return ioctl(dev, SNAPSHOT_FREE_SWAP_PAGES, 0);
 }
 
@@ -213,6 +225,8 @@
struct resume_swap_area swap;
int error;
 
+   if (raw_partition)
+   return 0;
swap.dev = blkdev;
swap.offset = offset;
error = ioctl(dev, SNAPSHOT_SET_SWAP_AREA, &swap);



-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [patch] implement smarter atime updates support

2007-08-05 Thread Arjan van de Ven


> +static int relatime_need_update(struct inode *inode, struct timespec now)
> +{
> + /*
> +  * Is mtime younger than atime? If yes, update atime:
> +  */
> + if (timespec_compare(&inode->i_mtime, &inode->i_atime) >= 0)
> + return 1;
> + /*
> +  * Is ctime younger than atime? If yes, update atime:
> +  */
> + if (timespec_compare(&inode->i_ctime, &inode->i_atime) >= 0)
> + return 1;
> +
> + /*
> +  * Is the previous atime value older than a day? If yes,
> +  * update atime:
> +  */
> + if ((long)(now.tv_sec - inode->i_atime.tv_sec) >= 24*60*60)
> + return 1;


you might want to add

/* 
 * if the inode is dirty already, do the atime update since
 * we'll be doing the disk IO anyway to clean the inode.
 */
if (inode->i_state & I_DIRTY)
return 1;



[2.6.23 regression fix] fix thinkpad_acpi without hardware

2007-08-05 Thread Adrian Bunk

René Treffer reported that booting a CONFIG_THINKPAD_ACPI=y kernel on a 
machine without the hardware results in an Oops.

The trace is thinkpad_acpi_module_init -> thinkpad_acpi_module_exit -> 
driver_remove_file -> sysfs_hash_and_remove.

The error handling if thinkpad_acpi_module_init() fails generally looks 
suspicious, but this patch at least fixes the common case if no hardware 
was found, and it seems in this case there isn't any cleanup 
actually required.

Broken by commit d5a2f2f1d68e2da538ac28540cddd9ccc733b001.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
bfa7bcd2b872f2c20afa7f7260d9be7dffe92d2e 
diff --git a/drivers/misc/thinkpad_acpi.c b/drivers/misc/thinkpad_acpi.c
index fa80f35..c7432a7 100644
--- a/drivers/misc/thinkpad_acpi.c
+++ b/drivers/misc/thinkpad_acpi.c
@@ -4644,10 +4644,8 @@ static int __init thinkpad_acpi_module_init(void)
 
get_thinkpad_model_data(_id);
ret = probe_for_thinkpad();
-   if (ret) {
-   thinkpad_acpi_module_exit();
+   if (ret)
return ret;
-   }
 
/* Driver initialization */
 


Re: lvcreate on 2.6.22.1: kernel tried to execute NX-protected page

2007-08-05 Thread Juergen Kreileder

Arjan van de Ven wrote:
> On Sun, 2007-08-05 at 21:03 +0200, Juergen Kreileder wrote:
>> I've upgraded devmapper to 1.02.20 and lvm2 to 2.02.26.  Didn't help much,
>> I just got the same BUG again:
>>
>> kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
>> BUG: unable to handle kernel paging request at virtual address f492c1f8
> 
> 
> I suspect this is a module that got unloaded but still had some function
> pointer registered somewhere.
> 
> do you know if/which module that could be?
> (one trick is to compile the kernel such that you don't allow modules to
> be unloaded at all; if that makes it work it's clearly the type of bug I
> described above)

It's a static kernel, no modules.  I've attached the config.


Juergen




config-2.6.22.1-jk1-exec-shield.gz
Description: GNU Zip compressed data

Re: [2/3] 2.6.23-rc2: known regressions

2007-08-05 Thread Michal Piotrowski

Rafael J. Wysocki pisze:
> On Sunday, 5 August 2007 18:26, Michal Piotrowski wrote:
>> Hi all,
>>
>> Here is a list of some known regressions in 2.6.23-rc2.
>>
>> Feel free to add new regressions/remove fixed etc.
>> http://kernelnewbies.org/known_regressions
>>
>  
>> Power management
>>
>> Subject : Kconfig: 'SUSPEND_SMP' refers to undefined symbol 
>> 'HOTPLUG_CPU'
>> References  : http://lkml.org/lkml/2007/8/4/39
>> Last known good : ?
>> Submitter   : Meelis Roos <[EMAIL PROTECTED]>
>> Caused-By   : ?
>> Handled-By  : Rafael J. Wysocki <[EMAIL PROTECTED]>
>> Status  : unknown
> 
> Is being debugged.
> 
> Frankly, I don't know what to think about that.  With Meelis' .config
> SUSPEND_SMP can't even be set ...

Meelis, have you used a "make randconfig" without "make oldconfig"?

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: [PATCH] sonypi: Fix initialization warning

2007-08-05 Thread Thomas Renninger

On Sun, 2007-08-05 at 21:05 +0200, Richard Knutsson wrote:
> Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]>
> ---
> Got this from the compiler (gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)):
> drivers/char/sonypi.c:1153: warning: initialization from incompatible pointer type
> 
> 
> diff --git a/drivers/char/sonypi.c b/drivers/char/sonypi.c
> index 73037a4..2dcd519 100644
> --- a/drivers/char/sonypi.c
> +++ b/drivers/char/sonypi.c
> @@ -1147,10 +1147,15 @@ static int sonypi_acpi_remove(struct acpi_device *device, int type)
>   return 0;
>  }
>  
> +static const struct acpi_device_id sonypi_acpi_driver_ids[] = {
> +{ACPI_PROCESSOR_HID, 0},
This is wrong. You need to take the HID the driver should match for, in
this case: {"SNY6001", 0 },

> +{"", 0},
> +};
> +
>  static struct acpi_driver sonypi_acpi_driver = {
>   .name   = "sonypi",
>   .class  = "hkey",
> - .ids= "SNY6001",
> + .ids= sonypi_acpi_driver_ids,
>   .ops= {
>  .add = sonypi_acpi_add,
>  .remove = sonypi_acpi_remove,

A patch from Eugene Teo fixing this is already on the list of Len's
patches.
See subject: [PATCH 04/12] sonypi: fix ids member of struct acpi_driver
posted yesterday on linux-acpi.
Anyway, thanks for the heads up,

Thomas


Re: lvcreate on 2.6.22.1: kernel tried to execute NX-protected page

2007-08-05 Thread Arjan van de Ven

On Sun, 2007-08-05 at 21:03 +0200, Juergen Kreileder wrote:
> I've upgraded devmapper to 1.02.20 and lvm2 to 2.02.26.  Didn't help much,
> I just got the same BUG again:
> 
> kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> BUG: unable to handle kernel paging request at virtual address f492c1f8

I suspect this is a module that got unloaded but still had some function
pointer registered somewhere.

do you know if/which module that could be?
(one trick is to compile the kernel such that you don't allow modules to
be unloaded at all; if that makes it work it's clearly the type of bug I
described above)


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Ingo Molnar


* Alan Cox <[EMAIL PROTECTED]> wrote:

> > also add the CONFIG_DEFAULT_RELATIME kernel option, which makes 
> > "norelatime" the default for all mounts without an extra kernel boot 
> > option.
> 
> Should be a mount option.

it is already a mount option too.

> > +   relatime[FS] default to enabled relatime updates on all
> > +   filesystems.
> > +
> > +   relatime=   [FS] default to enabled/disabled relatime updates on
> > +   all filesystems.
> > +
> 
> Double patch

no - it was not a double patch, i made all the common variants valid
boot options: "relatime", "relatime=0/1", "norelatime" and
"norelatime=0/1". Anyway, this is moot: in the latest (v2) version
there's only a single boot parameter.

> > +config DEFAULT_RELATIME
> > +   bool "Mount all filesystems with relatime by default"
> > +   default y
> 
> Changes behaviour so probably should default n. Better yet it should 
> be the mount option so it's flexible and strongly encouraged for
> vendors.

relatime is a mount option already. And distros can disable it if they 
want. (they are conscious about their kernel config selections anyway.)

> > +0
> > +#endif
> > +;
> 
> This ifdef mess would go away for a mount option

i fixed that in v2.

Ingo

[patch] implement smarter atime updates support, v2

2007-08-05 Thread Ingo Molnar


new version:

added the relatime_interval sysctl that allows the changing of the atime 
update frequency. (default: 1 day / 86400 seconds)

Ingo

-->
Subject: [patch] implement smarter atime updates support
From: Ingo Molnar <[EMAIL PROTECTED]>

change relatime updates to be performed once per day. This makes
relatime a compatible solution for HSM, mailer-notification and
tmpwatch applications too.

also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
"relatime" the default for all mounts without an extra kernel
boot option.

add the "default_relatime=0" boot option to turn this off.

also add the /proc/sys/kernel/default_relatime flag which can be changed
runtime to modify the behavior of subsequent new mounts.

tested by moving the date forward:

   # date
   Sun Aug  5 22:55:14 CEST 2007
   # date -s "Tue Aug  7 22:55:14 CEST 2007"
   Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and
it generated exactly one IO after the date was set.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |8 +
 fs/Kconfig  |   22 ++
 fs/inode.c  |   53 +++-
 fs/namespace.c  |   24 
 include/linux/mount.h   |3 ++
 kernel/sysctl.c |   17 +++
 6 files changed, 114 insertions(+), 13 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -525,6 +525,10 @@ and is between 256 and 4096 characters. 
This is a 16-member array composed of values
ranging from 0-255.
 
+   default_relatime=
+   [FS] mount all filesystems with relative atime
+   updates by default.
+
default_utf8=   [VT]
Format=<0|1>
Set system-wide default UTF-8 mode for all tty's.
@@ -1468,6 +1472,10 @@ and is between 256 and 4096 characters. 
Format: [,[,...]]
See arch/*/kernel/reboot.c or arch/*/kernel/process.c   

 
+   relatime_interval=
+   [FS] relative atime update frequency, in seconds.
+   (default: 1 day: 86400 seconds)
+
reserve=[KNL,BUGS] Force the kernel to ignore some iomem area
 
reservetop= [X86-32]
Index: linux/fs/Kconfig
===
--- linux.orig/fs/Kconfig
+++ linux/fs/Kconfig
@@ -2060,6 +2060,28 @@ config 9P_FS
 
 endmenu
 
+config DEFAULT_RELATIME
+   bool "Mount all filesystems with relatime by default"
+   default y
+   help
+ If you say Y here, all your filesystems will be mounted
+ with the "relatime" mount option. This eliminates many atime
+ ('file last accessed' timestamp) updates (which otherwise
+ is performed on every file access and generates a write
+ IO to the inode) and thus speeds up IO. Atime is still updated,
+ but only once per day.
+
+ The mtime ('file last modified') and ctime ('file created')
+ timestamp are unaffected by this change.
+
+ Use the "norelatime" kernel boot option to turn off this
+ feature.
+
+config DEFAULT_RELATIME_VAL
+   int
+   default "1" if DEFAULT_RELATIME
+   default "0"
+
 if BLOCK
 menu "Partition Types"
 
Index: linux/fs/inode.c
===
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -1162,6 +1162,41 @@ sector_t bmap(struct inode * inode, sect
 }
 EXPORT_SYMBOL(bmap);
 
+/*
+ * Relative atime updates frequency (default: 1 day):
+ */
+int relatime_interval __read_mostly = 24*60*60;
+
+/*
+ * With relative atime, only update atime if the
+ * previous atime is earlier than either the ctime or
+ * mtime, or is older than relatime_interval seconds.
+ */
+static int relatime_need_update(struct inode *inode, struct timespec now)
+{
+   /*
+* Is mtime younger than atime? If yes, update atime:
+*/
+   if (timespec_compare(&inode->i_mtime, &inode->i_atime) >= 0)
+   return 1;
+   /*
+* Is ctime younger than atime? If yes, update atime:
+*/
+   if (timespec_compare(&inode->i_ctime, &inode->i_atime) >= 0)
+   return 1;
+
+   /*
+* Is the previous atime value older than relatime_interval
+* seconds (default: one day)? If yes, update atime:
+*/
+   if ((long)(now.tv_sec - inode->i_atime.tv_sec) >= relatime_interval)
+   return 1;
+   /*
+* Good, we can skip the atime update:
+*/
+   return 0;
+}
+
 /**
  * touch_atime -   update the access time
  * @mnt: mount the inode is accessed

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Alan Cox

> change relatime updates to be performed once per day. This makes
> relatime a compatible solution for HSM, mailer-notification and
> tmpwatch applications too.

Sweet
> 

> also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
> "relatime" the default for all mounts without an extra kernel
> boot option.

Should be a mount option.


> + relatime[FS] default to enabled relatime updates on all
> + filesystems.
> +
> + relatime=   [FS] default to enabled/disabled relatime updates on
> + all filesystems.
> +

Double patch

>   atkbd.extra=[HW] Enable extra LEDs and keys on IBM RapidAccess,
>   EzKey and similar keyboards
>  
> @@ -1100,6 +1106,12 @@ and is between 256 and 4096 characters. 
>   noasync [HW,M68K] Disables async and sync negotiation for
>   all devices.
>  
> + norelatime  [FS] default to disabled relatime updates on all
> + filesystems.
> +
> + norelatime= [FS] default to disabled/enabled relatime updates
> + on all filesystems.
> +

Double patch

> +config DEFAULT_RELATIME
> + bool "Mount all filesystems with relatime by default"
> + default y

It changes behaviour, so it should probably default to n. Better yet, it should be
a mount option so it's flexible and strongly encouraged for vendors.

>  /*
> + * Allow users to disable (or enable) atime updates via a .config
> + * option or via the boot line, or via /proc/sys/fs/mount_with_relatime:
> + */
> +int mount_with_relatime __read_mostly =
> +#ifdef CONFIG_DEFAULT_RELATIME
> +1
> +#else
> +0
> +#endif
> +;

This ifdef mess would go away for a mount option

> +/*
> + * The "norelatime=", "atime=", "norelatime" and "relatime" boot parameters:
> + */
> +static int toggle_relatime_updates(int val)
> +{
> + mount_with_relatime = val;
> +
> + printk("Relative atime updates are: %s\n", val ? "on" : "off");
> +
> + return 1;
> +}
> +
> +static int __init set_relatime_setup(char *str)
> +{
> + int val;
> +
> + get_option(&str, &val);
> + return toggle_relatime_updates(val);
> +}
> +__setup("relatime=", set_relatime_setup);
> +
> +static int __init set_norelatime_setup(char *str)
> +{
> + int val;
> +
> + get_option(&str, &val);
> + return toggle_relatime_updates(!val);
> +}
> +__setup("norelatime=", set_norelatime_setup);
> +
> +static int __init set_relatime(char *str)
> +{
> + return toggle_relatime_updates(1);
> +}
> +__setup("relatime", set_relatime);
> +
> +static int __init set_norelatime(char *str)
> +{
> + return toggle_relatime_updates(0);
> +}
> +__setup("norelatime", set_norelatime);


All of the above chunk is unnecessary, as it can be a mount option. That
avoids tons of messy extra code and complication. Users are far safer
editing fstab than grub.conf.
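A minimal sketch of the mount-option route, using the MS_RELATIME flag that mount(2) already accepts. The helper name remount_relatime() is hypothetical; in practice one would simply add relatime to the options column of /etc/fstab.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

#ifndef MS_RELATIME
#define MS_RELATIME (1 << 21) /* same value as in <linux/fs.h> */
#endif

/*
 * Hypothetical helper: remount an already-mounted filesystem with
 * relatime through the existing mount(2) interface.
 *
 * Caveat: with MS_REMOUNT the flags passed replace the current
 * per-mount flags, so a real tool would read the present flags
 * first (e.g. from /proc/mounts) and OR MS_RELATIME into them.
 */
static int remount_relatime(const char *target)
{
    if (mount(NULL, target, NULL, MS_REMOUNT | MS_RELATIME, NULL) != 0) {
        fprintf(stderr, "remount %s: %s\n", target, strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling remount_relatime("/home") as root switches that one mount; nothing else in the system changes, which is the flexibility Alan is arguing for.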

> + {
> + .ctl_name   = CTL_UNNUMBERED,
> + .procname   = "mount_with_relatime",
> + .data   = &mount_with_relatime,
> + .maxlen = sizeof(int),
> + .mode   = 0644,
> + .proc_handler   = &proc_dointvec,
> + },

More code you don't need if you just leave it as a mount option.

I'd much rather see the small clean patch for this as a mount option.
Leave the rest to users/distros/lwn and it'll just happen now you've
sorted the compatibility problems.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[patch] implement smarter atime updates support

2007-08-05 Thread Ingo Molnar


* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> tested it by moving the date forward:
> 
>   # date
>   Sun Aug  5 22:55:14 CEST 2007
>   # date -s "Tue Aug  7 22:55:14 CEST 2007"
>   Tue Aug  7 22:55:14 CEST 2007
> 
> access to a file did not generate disk IO before the date was set, and 
> it generated exactly one IO after the date was set.
> 
> ( should i perhaps reduce the number of boot options and only use a
>   single "norelatime_default" boot option to turn this off? )

ok, cleaned it up some more: only a single, consistent boot option and 
all the switches (be that config, boot or sysctl) are now called 
"default_relatime". Also, got rid of that #ifdef ugliness in namespace.c 
via a cleaner Kconfig solution (suggested by Peter Zijlstra).
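The Kconfig trick can be sketched as below, assuming the generated configuration header defines CONFIG_DEFAULT_RELATIME_VAL. An int symbol is always emitted as 0 or 1, whereas a bool set to n is left undefined, which is what forced the #ifdef block. The variable name default_relatime, the flag name DEMO_MNT_RELATIME and its value are illustrative assumptions, since the fs/namespace.c hunk is not quoted here.

```c
/*
 * Stand-in for the Kconfig-generated header (assumed DEFAULT_RELATIME=y).
 * An int symbol is always emitted, as 0 or 1, so no #ifdef is needed
 * where it is used; a bool symbol set to n is simply not defined.
 */
#ifndef CONFIG_DEFAULT_RELATIME_VAL
#define CONFIG_DEFAULT_RELATIME_VAL 1
#endif

/* Hypothetical flag bit standing in for the kernel's per-mount flag. */
#define DEMO_MNT_RELATIME 0x20

/* Initialised straight from the config value; boot option/sysctl may change it. */
int default_relatime = CONFIG_DEFAULT_RELATIME_VAL;

/* New mounts pick up relatime when the default is set. */
static int apply_default_relatime(int mnt_flags)
{
    if (default_relatime)
        mnt_flags |= DEMO_MNT_RELATIME;
    return mnt_flags;
}
```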

Ingo

>
Subject: [patch] implement smarter atime updates support
From: Ingo Molnar <[EMAIL PROTECTED]>

change relatime updates to be performed once per day. This makes
relatime a compatible solution for HSM, mailer-notification and
tmpwatch applications too.

also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
"relatime" the default for all mounts without an extra kernel
boot option.

add the "default_relatime=0" boot option to turn this off.

also add the /proc/sys/kernel/default_relatime flag, which can be changed
at runtime to modify the behavior of subsequent new mounts.

tested by moving the date forward:

   # date
   Sun Aug  5 22:55:14 CEST 2007
   # date -s "Tue Aug  7 22:55:14 CEST 2007"
   Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and
it generated exactly one IO after the date was set.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |4 +++
 fs/Kconfig  |   22 
 fs/inode.c  |   48 ++--
 fs/namespace.c  |   25 ++
 include/linux/mount.h   |2 +
 kernel/sysctl.c |9 ++
 6 files changed, 97 insertions(+), 13 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -525,6 +525,10 @@ and is between 256 and 4096 characters. 
This is a 16-member array composed of values
ranging from 0-255.
 
+   default_relatime=
+   [FS] mount all filesystems with relative atime
+   updates by default.
+
default_utf8=   [VT]
Format=<0|1>
Set system-wide default UTF-8 mode for all tty's.
Index: linux/fs/Kconfig
===
--- linux.orig/fs/Kconfig
+++ linux/fs/Kconfig
@@ -2060,6 +2060,28 @@ config 9P_FS
 
 endmenu
 
+config DEFAULT_RELATIME
+   bool "Mount all filesystems with relatime by default"
+   default y
+   help
+ If you say Y here, all your filesystems will be mounted
+ with the "relatime" mount option. This eliminates many atime
+ ('file last accessed' timestamp) updates (which are otherwise
+ performed on every file access and generate a write
+ IO to the inode) and thus speeds up IO. Atime is still updated,
+ but only once per day.
+
+ The mtime ('file last modified') and ctime ('inode last changed')
+ timestamps are unaffected by this change.
+
+ Use the "default_relatime=0" kernel boot option to turn off this
+ feature.
+
+config DEFAULT_RELATIME_VAL
+   int
+   default "1" if DEFAULT_RELATIME
+   default "0"
+
 if BLOCK
 menu "Partition Types"
 
Index: linux/fs/inode.c
===
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -1162,6 +1162,36 @@ sector_t bmap(struct inode * inode, sect
 }
 EXPORT_SYMBOL(bmap);
 
+/*
+ * With relative atime, only update atime if the
+ * previous atime is earlier than either the ctime or
+ * mtime.
+ */
+static int relatime_need_update(struct inode *inode, struct timespec now)
+{
+   /*
+* Is mtime younger than atime? If yes, update atime:
+*/
+   if (timespec_compare(&inode->i_mtime, &inode->i_atime) >= 0)
+   return 1;
+   /*
+* Is ctime younger than atime? If yes, update atime:
+*/
+   if (timespec_compare(&inode->i_ctime, &inode->i_atime) >= 0)
+   return 1;
+
+   /*
+* Is the previous atime value older than a day? If yes,
+* update atime:
+*/
+   if ((long)(now.tv_sec - inode->i_atime.tv_sec) >= 24*60*60)
+   return 1;
+   /*
+* Good, we can skip the atime update:
+*/
+   return 0;
+}
+
 /**
  * touch_atime -   update the access time
  * @mnt: mount the inode is

Re: Kernel Bug in 2.4.35 when compiled gcc>=4.2.0 and -march=c3

2007-08-05 Thread Willy Tarreau

On Sun, Aug 05, 2007 at 05:43:37PM +0200, Willy Tarreau wrote:
> On Sun, Aug 05, 2007 at 10:56:04AM +0200, Axel Reinhold wrote:
> > i found a bug in linux-2.4.35.
> > 
> > the bug produces a crashing kernel when compiled
> > with gcc >=4.2.0 and VIA C3 optimized -march=c3
> > (CONFIG_MCYRIXIII=y)
> > 
> > this issue was first discussed on the gcc bugzilla:
> >  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32264
> > 
> > and tracked down to the include/asm-i386/hw_irq.h
> > module with the help of the gcc guys:
> > 
> > (pluto at agmk dot net) wrote:
> > >yup, i see something new :)
> > >
> > >please look at line 12137 of i8259.i:
> > >
> > >__attribute__((regparm(0))) void call_do_IRQ(void); __asm__(...
> > >
> > >as you can see there is a semicolon after call_do_IRQ(void)
> > >and following asm statement isn't treated as a function body.
> > >in this way -O1 -f{no-}unit-at-a-time accidentally produces
> > >different code. it's not a gcc bug.
> > >
> > >linux-2.4.35/include/asm-i386/hw_irq.h
> > >contains these evil macros.
> > 
> > is there a chance to fix this?
> > these macros are far beyond my capabilities to fix.
> 
> Axel,
> 
> I've reproduced it and posted the following explanation to GCC's
> bugzilla ; I think I can provide you with a simple fix very soon.

OK Axel,

I have a fix now. Three pieces of good news:

  - there were other symbols which were affected by -fno-unit-at-a-time under
gcc-4.2

  - gcc-4.1 was also slightly affected, but not seriously, since the
difference only lies in data <-> rodata

  - vmlinux is smaller by 65 kB on my machine with both gcc-4.1 and gcc-4.2,
and bzImage is smaller by 7 kB. gcc-4.2's bzImage was 2.5 kB larger and
is now 100 bytes smaller.

The fix simply consists of removing -fno-unit-at-a-time with gcc-4, as is
done in 2.6. This flag was added for gcc-3.4 and is not appropriate for 4.x.

Here's the list of the other affected symbols. The first column is the symbol
type as reported by nm ('d' = data section, 't' = text section) and the second
one is the symbol name. It's a diff -y of the vmlinux symbols of the two builds:

2.4.35 + gcc-4.2.1| 2.4.35-git + gcc-4.2.1
d IRQ0x00_interrupt   | t IRQ0x00_interrupt
d IRQ0x01_interrupt   | t IRQ0x01_interrupt
d IRQ0x02_interrupt   | t IRQ0x02_interrupt
d IRQ0x03_interrupt   | t IRQ0x03_interrupt
d IRQ0x04_interrupt   | t IRQ0x04_interrupt
d IRQ0x05_interrupt   | t IRQ0x05_interrupt
d IRQ0x06_interrupt   | t IRQ0x06_interrupt
d IRQ0x07_interrupt   | t IRQ0x07_interrupt
d IRQ0x08_interrupt   | t IRQ0x08_interrupt
d IRQ0x09_interrupt   | t IRQ0x09_interrupt
d IRQ0x0a_interrupt   | t IRQ0x0a_interrupt
d IRQ0x0b_interrupt   | t IRQ0x0b_interrupt
d IRQ0x0c_interrupt   | t IRQ0x0c_interrupt
d IRQ0x0d_interrupt   | t IRQ0x0d_interrupt
d IRQ0x0e_interrupt   | t IRQ0x0e_interrupt
d IRQ0x0f_interrupt   | t IRQ0x0f_interrupt
d IRQ0x10_interrupt   | t IRQ0x10_interrupt
d IRQ0x11_interrupt   | t IRQ0x11_interrupt
d IRQ0x12_interrupt   | t IRQ0x12_interrupt
d IRQ0x13_interrupt   | t IRQ0x13_interrupt
d IRQ0x14_interrupt   | t IRQ0x14_interrupt
d IRQ0x15_interrupt   | t IRQ0x15_interrupt
d IRQ0x16_interrupt   | t IRQ0x16_interrupt
d IRQ0x17_interrupt   | t IRQ0x17_interrupt
d IRQ0x18_interrupt   | t IRQ0x18_interrupt
d IRQ0x19_interrupt   | t IRQ0x19_interrupt
d IRQ0x1a_interrupt   | t IRQ0x1a_interrupt
d IRQ0x1b_interrupt   | t IRQ0x1b_interrupt
d IRQ0x1c_interrupt   | t IRQ0x1c_interrupt
d IRQ0x1d_interrupt   | t IRQ0x1d_interrupt
d IRQ0x1e_interrupt   | t IRQ0x1e_interrupt
d IRQ0x1f_interrupt   | t IRQ0x1f_interrupt
d IRQ0x20_interrupt   | t IRQ0x20_interrupt
d IRQ0x21_interrupt   | t IRQ0x21_interrupt
d IRQ0x22_interrupt   | t IRQ0x22_interrupt
d IRQ0x23_interrupt   | t IRQ0x23_interrupt
d IRQ0x24_interrupt   | t IRQ0x24_interrupt
d IRQ0x25_interrupt   | t IRQ0x25_interrupt
d IRQ0x26_interrupt   | t IRQ0x26_interrupt
d IRQ0x27_interrupt   | t IRQ0x27_interrupt
d IRQ0x28_interrupt   | t IRQ0x28_interrupt
d IRQ0x29_interrupt   | t IRQ0x29_interrupt
d IRQ0x2a_interrupt   | t IRQ0x2a_interrupt
d IRQ0x2b_interrupt   | t IRQ0x2b_interrupt
d IRQ0x2c_interrupt   | t IRQ0x2c_interrupt
d IRQ0x2d_interrupt   | t IRQ0x2d_interrupt
d IRQ0x2e_interrupt   | t IRQ0x2e_interrupt
d IRQ0x2f_interrupt   | t IRQ0x2f_interrupt
d IRQ0x30_interrupt   | t IRQ0x30_interrupt
d IRQ0x31_interrupt   | t IRQ0x31_interrupt
d IRQ0x32_interrupt   | t IRQ0x32_interrupt
d IRQ0x33_interrupt   | t IRQ0x33_interrupt
d IRQ0x34_interrupt   | t IRQ0x34_interrupt
d IRQ0x35_interrupt   | t IRQ0x35_interrupt
d IRQ0x36_interrupt   | t IRQ0x36_interrupt
d

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Alan Cox

On Sun, 5 Aug 2007 20:08:26 +0200
Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Alan Cox <[EMAIL PROTECTED]> wrote:
> 
> > And you honestly think that putting it in Kconfig as well as allowing 
> > users to screw up horribly and creating incompatible defaults you
> 
> So far you've not offered one realistic scenario of "screw up horribly". 
> People have been using noatime for a long time and there are no horror 
> stories about that. _Which_ OSS HSM software relies on atime?

What's this about "OSS"? OSS or proprietary. And you've been given one
example already: tmpwatch. Although it's more of a trash compactor than
HSM.

> > can't test for in a user space app where it matters is going to 
> > *change* this.
> 
> The patch i posted today adds /proc/sys/kernel/mount_with_atime. That 
> can be tested by user-space, if it truly cares about atime.

We have an existing API and ABI thank you. See man mount.

> > Do you really think anyone who said "noatime, compatibility, umm errr" 
> > is going to say "noatime, compatibility, but hey its in Kconfig lets 
> > do it". You argument doesn't hold up to minimal rational 
> > consideration. Posting to the distribution devel list with: "Its a 50% 
> > performance win, we need to fix these corner cases, here's a tmpwatch 
> > patch" is *exactly* what is needed to change it, and Kconfig options 
> > are irrelevant to that.
> 
> i did exactly that 6 months ago, check your email folders. I went by the 
> "process". But it doesnt really matter anymore, Ubuntu has done the step 

And your Kconfig argument is still not rational, a question I note you
chose not to answer. Anyway, if Ubuntu has switched to noatime (or relatime)
by default and hasn't used a Kconfig line, that proves my whole point:
we don't need one and it's pointless to add one.

> we really have to ask ourselves whether the "process" is correct if 
> advantages to the user of this order of magnitude can be brushed aside 
> with simple "this breaks binary-only HSM" and "it's not standards 
> compliant" arguments.

That's a discussion to have with your distribution development team. The
kernel provides the required facilities already. Open source means
everyone can do cool stuff as they see fit and natural selection will do
the rest.

Look I agree entirely with you that relatime, or noatime + minor package
patches is the right thing to do for FC8. I've also pointed out you can
build and release tuning packages for FC 7 and they'll make the
distribution. FC8 beta 1 approaches so now is the time to be talking to
the distribution people and to the ever kernel building Dave Jones about
it.

But none of this makes stupid Kconfig hacks the right answer.

Alan

Re: 2.6.23-rc1: USB hard disk broken

2007-08-05 Thread David Brownell

On Sunday 05 August 2007, Oliver Neukum wrote:
> > 
> > 2007-08-05_10:30:27.75572 kern.err:
> > ehci_hcd :00:1d.7: dev 6 ep1in scatterlist error 0/-121

That's rather strange, since it means a *success* (urb->status 0) was
reported after a short read (scatterlist status -121, -EREMOTEIO).

The hardware should have stopped queue processing after the short
read, because of how qtd->hw_alt_next gets set up ... at least,
that's how I remember it, these many years after writing that code.

It might be that because of the issue noted below, it was wrongly
restarted by the software.

> > 2007-08-05_10:30:27.86576 kern.info: usb 1-6: reset high speed USB device 
> > using ehci_hcd and address 5
> > 2007-08-05_10:30:55.95293 kern.info: usb 1-6: USB disconnect, address 5
> > 2007-08-05_10:30:55.95300 kern.err:
> > ehci_hcd :00:1d.7: dev 6 ep1in scatterlist error -108/-108 

That one just means nobody updated that test to recognize that
the -ESHUTDOWN (-108) triggered after disconnect is a "clean"
failure like the ones triggered by unlinking.
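A sketch of the status test described here, treating -ESHUTDOWN like the unlink codes. The helper name is hypothetical and this is not the actual usbcore code; the meanings follow the kernel's USB error-code documentation (-ENOENT from usb_kill_urb(), -ECONNRESET from usb_unlink_urb(), -ESHUTDOWN after a disconnect or a disabled host controller).

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true if an URB completion status is a "clean"
 * end of the transfer (unlinked, killed, or device gone) rather than
 * a transfer error worth logging.
 */
static bool urb_status_is_clean_unlink(int status)
{
    switch (status) {
    case -ENOENT:      /* synchronously killed */
    case -ECONNRESET:  /* asynchronously unlinked */
    case -ESHUTDOWN:   /* device disconnected / controller stopped */
        return true;
    default:
        return false;
    }
}
```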

However, it also indicates that something changed in the unlink
code paths, since I see the *expected* code (-ECONNRESET) is no
longer being set by usbcore during unlinks ... it's not quite
clear to me what else that change will have broken, including
whether it might explain how the hardware queue got wrongly
restarted after the short read above.

- Dave

> David, does this error say anything to you?


Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread david


On Sun, 5 Aug 2007, Diego Calleja wrote:

> On Sun, 5 Aug 2007 09:13:20 +0200, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>> Measurements show that noatime helps 20-30% on regular desktop
>> workloads, easily 50% for kernel builds and much more than that (in
>> excess of 100%) for file-read-intense workloads. We cannot just walk
>
> And as everybody knows, on servers it is a popular practice to disable it.
> According to an interview with the kernel.org admins:
>
> "Beyond that, Peter noted, "very little fancy is going on, and that is good
> because fancy is hard to maintain." He explained that the only fancy thing
> being done is that all filesystems are mounted noatime meaning that the
> system doesn't have to make writes to the filesystem for files which are
> simply being read, "that cut the load average in half."
>
> I bet that some people would consider such a performance hit a bug...

Actually, it's popular practice to disable it among people who know how big a
hit it is and know how few programs use it.

I've been a Linux sysadmin for 10 years, and have known about noatime for
at least 7 years, but I always thought of it in the category of 'use it
only on your performance-critical machines where you are trying to extract
every ounce of performance, and keep an eye out for things misbehaving'.

I never imagined that it was the 20%+ hit that is being described, and with
so little impact, or I would have switched to it across the board years
ago.


I'll bet there are a lot of admins out there in the same boat.

adding an option in the kernel to change the default sounds like a very 
good first step, even if the default isn't changed today.


David Lang

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Ingo Molnar


* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Sun, 5 Aug 2007, Ingo Molnar wrote:
> > 
> > you mean tmpwatch? The trivial change below fixes this. And with that 
> > we've come to the end of an extremely short list of atime dependencies.
> 
> You wouldn't even need these kinds of games.
> 
> What we could do is to make "relatime" updates a bit smarter.
> 
> A bit smarter would be:
> 
>  - update atime if the old atime is <= than mtime/ctime
> 
>Logic: things like mailers can care about whether some new state has 
>been read or not. This is the current relatime.
> 
>  - update atime if the old atime is more than X seconds in the past 
>(defaulting to one day or something)
> 
>Logic: things like tmpwatch and backup software may want to remove 
>stuff that hasn't been touched in a long time, but they sure don't care 
>about "exact" atime.

ok, i've implemented this and it's working fine. Check out the 
relatime_need_update() function for the details of the logic. Atime 
update frequency is 1 day with that, and we update at least once after 
every modification as well, for the mailer logic.
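The two rules can be modelled in user space as below, with struct timespec standing in for the inode timestamps. This is a sketch: the function names are illustrative, and the one-day threshold is the default used by the patch.

```c
#include <time.h>

/* One day, the default update interval used by the patch. */
#define RELATIME_INTERVAL (24 * 60 * 60)

/* Three-way compare, in the spirit of the kernel's timespec_compare(). */
static int ts_cmp(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/*
 * Returns 1 if an access at time 'now' should write a new atime,
 * given the file's current atime (at), mtime (mt) and ctime (ct).
 */
static int relatime_should_update(const struct timespec *at,
                                  const struct timespec *mt,
                                  const struct timespec *ct,
                                  const struct timespec *now)
{
    /* Modified or changed since the last recorded read: mailers care. */
    if (ts_cmp(mt, at) >= 0 || ts_cmp(ct, at) >= 0)
        return 1;
    /* atime is stale: keep tmpwatch and backup tools roughly informed. */
    if ((long)(now->tv_sec - at->tv_sec) >= RELATIME_INTERVAL)
        return 1;
    /* Otherwise the atime write (and its disk IO) is skipped. */
    return 0;
}
```

With atime newer than both mtime and ctime, an access one hour later writes nothing, while an access two days later writes atime once; after that the atime is fresh again, which matches the single IO seen in the date test.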

tested it by moving the date forward:

  # date
  Sun Aug  5 22:55:14 CEST 2007
  # date -s "Tue Aug  7 22:55:14 CEST 2007"
  Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and 
it generated exactly one IO after the date was set.

( should i perhaps reduce the number of boot options and only use a
  single "norelatime_default" boot option to turn this off? )

Ingo

>
Subject: [patch] add norelatime/relatime boot options, CONFIG_DEFAULT_RELATIME
From: Ingo Molnar <[EMAIL PROTECTED]>

change relatime updates to be performed once per day. This makes
relatime a compatible solution for HSM, mailer-notification and
tmpwatch applications too.

also add the CONFIG_DEFAULT_RELATIME kernel option, which makes
"relatime" the default for all mounts without an extra kernel
boot option.

add the "relatime" (and "norelatime") boot options to enable/disable
relatime updates for all filesystems.

also add the /proc/sys/kernel/mount_with_relatime flag, which can be changed
at runtime to modify the behavior of subsequent new mounts.

tested by moving the date forward:

   # date
   Sun Aug  5 22:55:14 CEST 2007
   # date -s "Tue Aug  7 22:55:14 CEST 2007"
   Tue Aug  7 22:55:14 CEST 2007

access to a file did not generate disk IO before the date was set, and
it generated exactly one IO after the date was set.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 Documentation/kernel-parameters.txt |   12 +++
 fs/Kconfig  |   17 ++
 fs/inode.c  |   48 
 fs/namespace.c  |   61 
 include/linux/mount.h   |2 +
 kernel/sysctl.c |9 +
 6 files changed, 136 insertions(+), 13 deletions(-)

Index: linux/Documentation/kernel-parameters.txt
===
--- linux.orig/Documentation/kernel-parameters.txt
+++ linux/Documentation/kernel-parameters.txt
@@ -303,6 +303,12 @@ and is between 256 and 4096 characters. 
 
atascsi=[HW,SCSI] Atari SCSI
 
+   relatime[FS] default to enabled relatime updates on all
+   filesystems.
+
+   relatime=   [FS] default to enabled/disabled relatime updates on
+   all filesystems.
+
atkbd.extra=[HW] Enable extra LEDs and keys on IBM RapidAccess,
EzKey and similar keyboards
 
@@ -1100,6 +1106,12 @@ and is between 256 and 4096 characters. 
noasync [HW,M68K] Disables async and sync negotiation for
all devices.
 
+   norelatime  [FS] default to disabled relatime updates on all
+   filesystems.
+
+   norelatime= [FS] default to disabled/enabled relatime updates
+   on all filesystems.
+
nobats  [PPC] Do not use BATs for mapping kernel lowmem
on "Classic" PPC cores.
 
Index: linux/fs/Kconfig
===
--- linux.orig/fs/Kconfig
+++ linux/fs/Kconfig
@@ -2060,6 +2060,23 @@ config 9P_FS
 
 endmenu
 
+config DEFAULT_RELATIME
+   bool "Mount all filesystems with relatime by default"
+   default y
+   help
+ If you say Y here, all your filesystems will be mounted
+ with the "relatime" mount option. This eliminates many atime
+ ('file last accessed' timestamp) updates (which are otherwise
+ performed on every file access and generate a write
+ IO to the inode) and thus speeds up IO. Atime is still updated,
+ but only once per day.
+
+ The mtime ('file last modified') and ctime ('inode last changed')
+ timestamp are

[PATCH] sonypi: Fix initialization warning

2007-08-05 Thread Richard Knutsson


Signed-off-by: Richard Knutsson <[EMAIL PROTECTED]>
---
Got this from the compiler (gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)):
drivers/char/sonypi.c:1153: warning: initialization from incompatible pointer type


diff --git a/drivers/char/sonypi.c b/drivers/char/sonypi.c
index 73037a4..2dcd519 100644
--- a/drivers/char/sonypi.c
+++ b/drivers/char/sonypi.c
@@ -1147,10 +1147,15 @@ static int sonypi_acpi_remove(struct acpi_device *device, int type)
return 0;
}

+static const struct acpi_device_id sonypi_acpi_driver_ids[] = {
+{"SNY6001", 0},
+{"", 0},
+};
+
static struct acpi_driver sonypi_acpi_driver = {
.name   = "sonypi",
.class  = "hkey",
-   .ids= "SNY6001",
+   .ids= sonypi_acpi_driver_ids,
.ops= {
   .add = sonypi_acpi_add,
   .remove = sonypi_acpi_remove,
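The new .ids field points at a sentinel-terminated table: matching walks it until the entry with an empty id and binds the driver only if the device's hardware ID equals one of the entries, so the table must carry the Sony device's HID ("SNY6001", as the old .ids string had it). A minimal user-space sketch of that matching, with a hypothetical struct standing in for struct acpi_device_id:

```c
#include <string.h>

/* Hypothetical mirror of the kernel's struct acpi_device_id. */
struct demo_acpi_device_id {
    char id[16];
    unsigned long driver_data;
};

static const struct demo_acpi_device_id demo_ids[] = {
    {"SNY6001", 0},
    {"", 0},            /* empty id terminates the table */
};

/* Return the matching entry, or NULL if the HID is not listed. */
static const struct demo_acpi_device_id *
demo_match_id(const char *hid, const struct demo_acpi_device_id *ids)
{
    for (; ids->id[0] != '\0'; ids++)
        if (strcmp(ids->id, hid) == 0)
            return ids;
    return NULL;
}
```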



[RFC][PATCH] uli526x: Add suspend and resume routines

2007-08-05 Thread Rafael J. Wysocki

[Sorry for the excessive CCs, but I don't know who's the maintainer. ;-)]
---
From: Rafael J. Wysocki <[EMAIL PROTECTED]>

Add suspend/resume support to the uli526x network driver (tested on x86_64,
with "Ethernet controller: ALi Corporation M5263 Ethernet Controller, rev 40").

This patch is based on the suspend/resume code in the tg3 driver.

Signed-off-by: Rafael J. Wysocki <[EMAIL PROTECTED]>
---
 drivers/net/tulip/uli526x.c |  108 +---
 1 file changed, 102 insertions(+), 6 deletions(-)

Index: linux-2.6.23-rc2/drivers/net/tulip/uli526x.c
===
--- linux-2.6.23-rc2.orig/drivers/net/tulip/uli526x.c   2007-08-05 20:18:57.0 +0200
+++ linux-2.6.23-rc2/drivers/net/tulip/uli526x.c   2007-08-05 20:47:07.0 +0200
@@ -1110,19 +1110,15 @@ static void uli526x_timer(unsigned long 
 
 
 /*
- * Dynamic reset the ULI526X board
  * Stop ULI526X board
  * Free Tx/Rx allocated memory
- * Reset ULI526X board
- * Re-initialize ULI526X board
+ * Init system variable
  */
 
-static void uli526x_dynamic_reset(struct net_device *dev)
+static void uli526x_reset_prepare(struct net_device *dev)
 {
struct uli526x_board_info *db = netdev_priv(dev);
 
-   ULI526X_DBUG(0, "uli526x_dynamic_reset()", 0);
-
/* Sopt MAC controller */
db->cr6_data &= ~(CR6_RXSC | CR6_TXSC); /* Disable Tx/Rx */
update_cr6(db->cr6_data, dev->base_addr);
@@ -1141,6 +1137,22 @@ static void uli526x_dynamic_reset(struct
db->link_failed = 1;
db->init=1;
db->wait_reset = 0;
+}
+
+
+/*
+ * Dynamic reset the ULI526X board
+ * Stop ULI526X board
+ * Free Tx/Rx allocated memory
+ * Reset ULI526X board
+ * Re-initialize ULI526X board
+ */
+
+static void uli526x_dynamic_reset(struct net_device *dev)
+{
+   ULI526X_DBUG(0, "uli526x_dynamic_reset()", 0);
+
+   uli526x_reset_prepare(dev);
 
/* Re-initialize ULI526X board */
uli526x_init(dev);
@@ -1150,6 +1162,88 @@ static void uli526x_dynamic_reset(struct
 }
 
 
+#ifdef CONFIG_PM_SLEEP
+
+/*
+ * Suspend the interface.
+ */
+
+static int uli526x_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+   struct net_device *dev = pci_get_drvdata(pdev);
+   int err = 0;
+
+   ULI526X_DBUG(0, "uli526x_suspend", 0);
+
+   if (dev && netdev_priv(dev)) {
+   pci_power_t power_state;
+
+   pci_save_state(pdev);
+
+   if (!netif_running(dev))
+   return 0;
+
+   netif_device_detach(dev);
+   uli526x_reset_prepare(dev);
+
+   power_state = pci_choose_state(pdev, state);
+   pci_enable_wake(pdev, power_state, 0);
+   err = pci_set_power_state(pdev, power_state);
+   if (err) {
+   netif_device_attach(dev);
+   /* Re-initialize ULI526X board */
+   uli526x_init(dev);
+   /* Restart upper layer interface */
+   netif_wake_queue(dev);
+   }
+   }
+   return err;
+}
+
+/*
+ * Resume the interface.
+ */
+
+static int uli526x_resume(struct pci_dev *pdev)
+{
+   struct net_device *dev = pci_get_drvdata(pdev);
+   struct uli526x_board_info *db = netdev_priv(dev);
+
+   ULI526X_DBUG(0, "uli526x_resume", 0);
+
+   if (dev && db) {
+   int err;
+
+   pci_restore_state(pdev);
+
+   if (!netif_running(dev))
+   return 0;
+
+   err = pci_set_power_state(pdev, PCI_D0);
+   if (err) {
+   printk(KERN_WARNING
+   "%s: Could not put device into D0\n",
+   dev->name);
+   return err;
+   }
+
+   netif_device_attach(dev);
+   /* Re-initialize ULI526X board */
+   uli526x_init(dev);
+   /* Restart upper layer interface */
+   netif_wake_queue(dev);
+   }
+   return 0;
+}
+
+#else /* !CONFIG_PM_SLEEP */
+
+#define uli526x_suspend NULL
+#define uli526x_resume NULL
+
+#endif /* !CONFIG_PM_SLEEP */
+
+
 /*
  * free all allocated rx buffer
  */
@@ -1689,6 +1783,8 @@ static struct pci_driver uli526x_driver 
.id_table   = uli526x_pci_tbl,
.probe  = uli526x_init_one,
.remove = __devexit_p(uli526x_remove_one),
+   .suspend= uli526x_suspend,
+   .resume = uli526x_resume,
 };
 
 MODULE_AUTHOR("Peer Chen, [EMAIL PROTECTED]");

Re: lvcreate on 2.6.22.1: kernel tried to execute NX-protected page

2007-08-05 Thread Juergen Kreileder


I've upgraded devmapper to 1.02.20 and lvm2 to 2.02.26. It didn't help much;
I just got the same BUG again:

kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at virtual address f492c1f8
 printing eip:
f492c1f8
*pdpt = 1001
*pde = 8000348001e3
*pte = ec1c7da0ec1c7da0
Oops: 0011 [#1]
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010282   (2.6.22.1-jk1-exec-shield #1)
EIP is at 0xf492c1f8
eax: f492c1cc   ebx: f492c1cc   ecx:    edx: f492c1f8
esi: f492c1f8   edi:    ebp:    esp: d4177db4
ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
Process lvcreate (pid: 2303, ti=d4176000 task=d1c88a00 task.ti=d4176000)
Stack: c02088c4 f492c1e4 c02088d0 c03dd95e c2969f60 c0209118 ce684800 e5436280 
   0287 c03dd952 c03dd952 f6a44548 c018e393 c19ea90c  c19eb900 
   c03dd952 c18f9780 f45cb300  c0157664 c18f97cc c18f9780 c01575be 
Call Trace:
 [kobject_cleanup+116/128] kobject_cleanup+0x74/0x80
 [kobject_release+0/16] kobject_release+0x0/0x10
 [kref_put+56/160] kref_put+0x38/0xa0
 [sysfs_hash_and_remove+275/320] sysfs_hash_and_remove+0x113/0x140
 [sysfs_slab_alias+100/128] sysfs_slab_alias+0x64/0x80
 [sysfs_slab_add+174/208] sysfs_slab_add+0xae/0xd0
 [kmem_cache_create+236/320] kmem_cache_create+0xec/0x140
 [jobs_init+46/128] jobs_init+0x2e/0x80
 [kcopyd_init+45/176] kcopyd_init+0x2d/0xb0
 [kcopyd_client_create+28/208] kcopyd_client_create+0x1c/0xd0
 [init_hash_tables+142/192] init_hash_tables+0x8e/0xc0
 [snapshot_ctr+506/752] snapshot_ctr+0x1fa/0x2f0
 [dm_split_args+47/272] dm_split_args+0x2f/0x110
 [dm_table_add_target+252/400] dm_table_add_target+0xfc/0x190
 [vmalloc+32/48] vmalloc+0x20/0x30
 [populate_table+98/192] populate_table+0x62/0xc0
 [table_load+82/240] table_load+0x52/0xf0
 [table_load+0/240] table_load+0x0/0xf0
 [ctl_ioctl+209/288] ctl_ioctl+0xd1/0x120
 [ctl_ioctl+0/288] ctl_ioctl+0x0/0x120
 [do_ioctl+59/96] do_ioctl+0x3b/0x60
 [vfs_ioctl+94/416] vfs_ioctl+0x5e/0x1a0
 [sys_ioctl+61/128] sys_ioctl+0x3d/0x80
 [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
 ===
Code: 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00 20 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  c1 92 f4 f8 c1 92 
f4 4c 16 b6 f5 00 00 00 00 00 00 00 00 00 
EIP: [] 0xf492c1f8 SS:ESP 0068:d4177db4


> I got the appended BUG from a 32-bit 2.6.22.1 kernel (with exec-shield
> patch and PAE enabled) on an Athlon64 with dmsetup 1.02.03 and lvm2
> v2.02.02.
> (Note: the message comes from the vanilla kernel, not from the
> exec-shield patch.)
> 
> I wasn't able to reproduce the problem so far.  The machine creates
> several snapshot volumes every 4 hours and worked fine with the new
> kernel for several days.  It had 2.6.16.12+exec-shield before and ran
> flawlessly for over a year.
> 
> 
> kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> BUG: unable to handle kernel paging request at virtual address f551df78
>  printing eip:
> f551df78
> *pdpt = 1001
> *pde = 8000354001e3
> *pte = 9293396c5d22e546
> Oops: 0011 [#1]
> CPU:0
> EIP:0060:[]Not tainted VLI
> EFLAGS: 00010286   (2.6.22.1-jk1-exec-shield #1)
> EIP is at 0xf551df78
> eax: f551df4c   ebx: f551df4c   ecx:    edx: f551df78
> esi: f551df78   edi:    ebp:    esp: e8ee5db4
> ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
> Process lvcreate (pid: 25916, ti=e8ee4000 task=f7358a00 task.ti=e8ee4000)
> Stack: c02088c4 f551df64 c02088d0 c03dd95e c64ccf00 c0209118 0287 c03dd95e
>0287 c018e38b c03dd952 d3e460e8 c018e393 c192a90c  c192b900
>c03dd952 f557a600 f59bbcc0  c0157664 f557a64c f557a600 c01575be
> Call Trace:
>  [kobject_cleanup+116/128] kobject_cleanup+0x74/0x80
>  [kobject_release+0/16] kobject_release+0x0/0x10
>  [kref_put+56/160] kref_put+0x38/0xa0
>  [sysfs_hash_and_remove+267/320] sysfs_hash_and_remove+0x10b/0x140
>  [sysfs_hash_and_remove+275/320] sysfs_hash_and_remove+0x113/0x140
>  [sysfs_slab_alias+100/128] sysfs_slab_alias+0x64/0x80
>  [sysfs_slab_add+174/208] sysfs_slab_add+0xae/0xd0
>  [kmem_cache_create+236/320] kmem_cache_create+0xec/0x140
>  [jobs_init+46/128] jobs_init+0x2e/0x80
>  [kcopyd_init+45/176] kcopyd_init+0x2d/0xb0
>  [kcopyd_client_create+28/208] kcopyd_client_create+0x1c/0xd0
>  [init_hash_tables+142/192] init_hash_tables+0x8e/0xc0
>  [snapshot_ctr+506/752] snapshot_ctr+0x1fa/0x2f0
>  [dm_split_args+47/272] dm_split_args+0x2f/0x110
>  [dm_table_add_target+252/400] dm_table_add_target+0xfc/0x190
>  [vmalloc+32/48] vmalloc+0x20/0x30
>  [populate_table+98/192] populate_table+0x62/0xc0
>  [table_load+82/240] table_load+0x52/0xf0
>  [table_load+0/240] table_load+0x0/0xf0
>  [ctl_ioctl+209/288] ctl_ioctl+0xd1/0x120
>  [ctl_ioctl+0/288] ctl_ioctl+0x0/0x120
>  [do_ioctl+59/96] do_ioctl+0x3b/0x60
>  [vfs_ioctl+94/416] vfs_ioctl+0x5e/0x1a0
>  [sys_ioctl+61/128]

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread adi

On Sun, Aug 05, 2007 at 02:44:08PM -0400, Dave Jones wrote:
> It still fails miserably for me.
> 
> If I hit 'C' and '?' I get a list of my mail folders, with some of them
> marked 'N' if they have new mail.  Without atime, those N's never show
> up and every mbox looks like it has no new mail.

This is true for anyone using mbox_type=mbox (i.e. the native Unix mailbox
format). The Maildir type should work just fine, as mutt will notice
that new mail has arrived in the 'new' subdirectory (according to the maildir spec).

So yes, it is configuration dependent.
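The difference between the two formats comes down to what mutt looks at. For mbox, mutt's check is timestamp based: roughly, a non-empty mbox is flagged as having new mail when its mtime (last delivery) is newer than its atime (last read), so the result depends on how the filesystem maintains atime. For Maildir, unseen mail is simply any file in new/, and no timestamps are involved. The C sketch below illustrates both checks under those assumptions; it is a simplification (it ignores mutt's special case for newly created mailboxes, among other things), not mutt's actual code, and all function names are hypothetical.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* mbox (simplified): a non-empty mailbox has new mail if it was modified
 * (mail delivered) after it was last accessed (read): mtime > atime. */
bool mbox_times_say_new(off_t size, time_t mtime, time_t atime)
{
    return size > 0 && mtime > atime;
}

bool mbox_has_new_mail(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return false;           /* missing or unreadable: report nothing */
    return mbox_times_say_new(sb.st_size, sb.st_mtime, sb.st_atime);
}

/* Maildir: every entry in new/ is unseen mail, except names that start
 * with '.' ("." and ".." included). */
bool maildir_entry_is_mail(const char *name)
{
    return name[0] != '.';
}

bool maildir_has_new_mail(const char *maildir)
{
    char path[4096];
    DIR *d;
    struct dirent *de;
    bool found = false;

    snprintf(path, sizeof(path), "%s/new", maildir);
    d = opendir(path);
    if (d == NULL)
        return false;
    while ((de = readdir(d)) != NULL) {
        if (maildir_entry_is_mail(de->d_name)) {
            found = true;       /* one message is enough to flag 'N' */
            break;
        }
    }
    closedir(d);
    return found;
}
```

Only the mbox path reads st_atime, which is why a noatime mount can change what mutt shows for mbox folders while leaving Maildir folders unaffected.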

Regards,

P.Y. Adi Prasaja
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATH 0/1] Kexec jump - v2 - the first step to kexec based hibernation

2007-08-05 Thread Pavel Machek

Hi!

> > [EMAIL PROTECTED]:~# kexec -p /data/l/linux/arch/i386/boot/bzImage 
> > --append="init=/bin/bash kexec_jump_buf_pfn=`cat 
> > /sys/kernel/kexec_jump_buf_pfn`"
> > Could not find a free area of memory of 9000 bytes...
> > locate_hole failed
> > [EMAIL PROTECTED]:~#
> > 
> > What am I doing wrong?
> 
> The kexec-tools version 1.101 does not work perfectly with a relocatable
> kernel. This would have been solved if I had worked against the kexec-tools
> testing tree. I will work against the testing tree in the next version.
> 
> But with a trick, it can work. When configuring the kernel, make sure the
> following option is set:
> 
> CONFIG_PHYSICAL_START=0x400 # if crashkernel=[EMAIL PROTECTED]

That did the trick: I got the kernel to load, and it even attempted
exec... but then I got a double fault (or is it something else?)

Int 6: ... EIP: c4739906. Address is in reserve_bootmem_core.

Do I have to disable ACPI completely? I tried with acpi=off,
nosmp... but the problem does not seem device-related.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Dave Jones

On Sun, Aug 05, 2007 at 09:21:41AM +0200, Ingo Molnar wrote:
 > * Alan Cox <[EMAIL PROTECTED]> wrote:
 > 
 > > With a Red Hat on if we can move from /dev/hda to /dev/sda in FC7 then 
 > > we can move from atime to noatime by default on FC8 with appropriate 
 > > release note warnings and having a couple of betas to find out what 
 > > other than mutt goes boom.
 > 
 > btw., Mutt does not go boom, i use it myself. It works just fine and 
 > notices new mails even on a noatime,nodiratime filesystem.
 
It still fails miserably for me.

If I hit 'C' and '?' I get a list of my mail folders, with some of them
marked 'N' if they have new mail.  Without atime, those N's never show
up and every mbox looks like it has no new mail.

Dave

-- 
http://www.codemonkey.org.uk

Re: [PATH 1/1] Kexec jump - v2 - kexec jump

2007-08-05 Thread Pavel Machek

Hi!

> > > This patch implements the functionality of jumping from the kexeced
> > > kernel back to the original kernel.
> > > 
> > > A new reboot command named LINUX_REBOOT_CMD_KJUMP is defined to
> > > trigger either jumping to (executing) the new kernel or jumping back
> > > to the original kernel.
> > 
> > Could we get two reboot commands? Executing the loaded kernel seems to
> > be quite a different operation from "jump back".
> 
> Executing the loaded kernel in order to jump back is also different from
> executing a loaded kernel normally.
> 
> Maybe the two reboot operations in kexec jump can be seen as "jump to"
> and "jump back"?

Yes, that would be better.

> Documentation will be added in the next version too.
> 
> This parameter is used for kernel-to-kernel communication. For now,
> though, it must be set up by the user, as described in the usage guide.

Thanks for your help :-).
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: high system cpu load during intense disk i/o

2007-08-05 Thread Dimitrios Apostolou

On Sunday 05 August 2007 20:58:15 Rafał Bilski wrote:
> > Hello again,
>
> Hi!
>
> > Was my report so complicated? Perhaps I shouldn't have included so many
> > oprofile outputs. Anyway, if anyone wants to have a look, the most
> > important one is the two_discs_bad.txt oprofile output, attached to my
> > original message. The problem is 100% reproducible for me, so I would
> > appreciate it if anyone told me whether they have had similar experiences.
>
> Probably nobody replied to your message because people on this list think
> that your problem isn't kernel related. At the moment I'm using "Arch
> Linux" too, so I checked the /etc/cron directory. The simple jobs you are
> talking about are not so simple:
> - update the "locate" database,
> - update the "whatis" database.
> Both jobs scan the "/" partition. I don't know how dcron works, but I
> can imagine a situation in which it polls cron.daily, says "hey, it
> wasn't done today yet", and starts the same jobs over and over again.
> More and more tasks scan the "/" partition, and as a result access gets
> slower and slower.

Hello and thanks for your reply. 

The cron job that runs every 10 min on my system is mpop (a 
fetchmail-like program), and another one running every 5 min is mrtg. Both 
normally finish within 1-2 seconds. 

The fact that these simple cron jobs never finish is certainly because of 
the high system CPU load. If you look at two_discs_bad.txt, which I attached 
to my original message, you'll see that *vmlinux*, and specifically the 
*scheduler*, takes up most of the time. 

And the fact that this happens only when running two I/O processes, while 
with only one everything is absolutely snappy (not at all slow, see 
one_disc.txt), makes me sure that this is a kernel bug. I'd be happy to help, 
but I need some guidance to pinpoint the problem. 

Thanks, 
Dimitris

Re: [PATCH 00/23] per device dirty throttling -v8

2007-08-05 Thread Jörn Engel

On Sun, 5 August 2007 11:02:33 -0700, Arjan van de Ven wrote:
> 
> but does it work with relatime ?

Like a greased penguin.  I had to reboot with my ugly patch posted
earlier in the thread to actually test it, though.  Relatime suffers from
a distribution problem, nothing else.

Guess I should throw in a kernel compile test as well, just to get a
feel for the performance.
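Why relatime keeps mbox new-mail detection working follows from its update rule. As introduced in 2.6.20, relatime updates atime on a read only if the existing atime is not later than mtime or ctime, i.e. only if the file changed since it was last read. The first read after a delivery therefore still moves atime past mtime (which an mtime > atime check needs), while repeated reads of an unchanged file skip the write. A minimal C sketch of that decision, assuming whole-second timestamps (the kernel compares struct timespec values); the enum and function names are hypothetical, not kernel identifiers.

```c
#include <stdbool.h>
#include <time.h>

/* Atime policy of a mount (names are for this sketch only). */
enum atime_policy { ATIME_STRICT, ATIME_NOATIME, ATIME_RELATIME };

/* Should a read of a file with these timestamps write a new atime? */
bool should_update_atime(enum atime_policy policy,
                         time_t atime, time_t mtime, time_t ctime)
{
    switch (policy) {
    case ATIME_NOATIME:
        return false;                   /* never write atime */
    case ATIME_RELATIME:
        /* skip only if atime is already later than both mtime and ctime */
        return !(mtime < atime && ctime < atime);
    case ATIME_STRICT:
    default:
        return true;                    /* classic behaviour: every read */
    }
}
```

For example, a mailbox with atime 100 that receives mail (mtime 200) gets its atime updated on the next read under relatime; once atime is past both mtime and ctime, further reads do not write until the next delivery.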

Jörn

-- 
Homo Sapiens is a goal, not a description.
-- unknown

Re: [PATCH] create CONFIG_SUSPEND_UP_POSSIBLE

2007-08-05 Thread Pavel Machek

On Fri 2007-08-03 15:23:19, Len Brown wrote:
> On Tuesday 31 July 2007 02:38, Pavel Machek wrote:
> > Hi!
> > 
> > > Without this change, it is possible to build CONFIG_HIBERNATE
> > > on all !SMP architectures, but not necessarily their SMP versions.
> > 
> > Did you want to say "CONFIG_SUSPEND"?
> 
> Yes.
> 
> > > I don't know for sure if the architecture list under SUSPEND_UP_POSSIBLE
> > > is correct.  For now it simply matches the list for
> > > SUSPEND_SMP_POSSIBLE.
> > 
> > I do not think it is.
> > 
> > > Signed-off-by: Len Brown <[EMAIL PROTECTED]>
> > > ---
> > >  Kconfig |    7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> > > index 412859f..ccf6576 100644
> > > --- a/kernel/power/Kconfig
> > > +++ b/kernel/power/Kconfig
> > > @@ -72,6 +72,11 @@ config PM_TRACE
> > >   CAUTION: this option will cause your machine's real-time clock to be
> > >   set to an invalid time after a resume.
> > >  
> > > +config SUSPEND_UP_POSSIBLE
> > > + bool
> > > + depends on (X86 && !X86_VOYAGER) || (PPC64 && (PPC_PSERIES ||
> > 
> > At least ARM can do suspend, too... probably others. I was under the
> > impression that SUSPEND is "supported" by all the architectures, just
> > some of them veto it at runtime (using pm_ops or whatever it was renamed to).
> 
> The reason this entire thread started is because Linus, Jeff and others
> said that they didn't want code magically compiled into their kernel
> that they did not explicitly ask for -- even if the savings were small
> and that kernel was already something rather beefy, such as ACPI+SMP.
> 
> The current code is simply broken, because it allows SUSPEND
> on IA64 if UP, but not on SMP.  It should really be neither.

Actually, it should be both, AFAICT. Suspend infrastructure should be
there, just returning -EINVAL... that's how it worked in 2.6.22 IIRC.
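Pavel's alternative to per-architecture Kconfig lists is to always build the generic suspend path and let the platform veto unsupported states at run time. A minimal C sketch of that pattern, assuming a pm_ops-like table of callbacks; struct suspend_hooks, try_suspend() and the demo platform are hypothetical names for illustration, not the kernel's code, and the -EINVAL return follows Pavel's description rather than any particular kernel version.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical platform hooks in the spirit of pm_ops. A platform that
 * cannot suspend passes NULL, or hooks that reject every state. */
struct suspend_hooks {
    bool (*valid)(int state);   /* can this platform enter @state? */
    int  (*enter)(int state);   /* actually enter it; 0 on success */
};

/* Generic suspend path: always compiled in, never hidden by Kconfig.
 * Unsupported platforms or states are refused at run time. */
int try_suspend(const struct suspend_hooks *hooks, int state)
{
    if (hooks == NULL || hooks->valid == NULL || hooks->enter == NULL)
        return -EINVAL;
    if (!hooks->valid(state))
        return -EINVAL;
    return hooks->enter(state);
}

/* Example platform that supports a single state (hypothetical). */
#define DEMO_STATE_MEM 3

static bool demo_valid(int state)
{
    return state == DEMO_STATE_MEM;
}

static int demo_enter(int state)
{
    (void)state;    /* a real platform would power down here */
    return 0;
}

const struct suspend_hooks demo_hooks = { demo_valid, demo_enter };
```

The trade-off is the one Len raises: the generic code is built even where it can never succeed, in exchange for not maintaining an architecture list that can drift out of date.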

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
