Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-15 Thread David Chinner
On Tue, Jan 15, 2008 at 07:44:15PM -0800, Andrew Morton wrote:
> On Wed, 16 Jan 2008 11:01:08 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, Jan 15, 2008 at 09:53:42AM -0800, Michael Rubin wrote:
> > > On Jan 15, 2008 12:46 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > > Just a quick question, how does this interact/depend-uppon etc.. with
> > > > Fengguangs patches I still have in my mailbox? (Those from Dec 28th)
> > > 
> > > They don't. They apply to a 2.6.24rc7 tree. This is a candidte for 2.6.25.
> > > 
> > > This work was done before Fengguang's patches. I am trying to test
> > > Fengguang's for comparison but am having problems with getting mm1 to
> > > boot on my systems.
> > 
> > Yeah, they are independent ones. The initial motivation is to fix the
> > bug "sluggish writeback on small+large files". Michael introduced
> > a new rbtree, and me introduced a new list(s_more_io_wait).
> > 
> > Basically I think rbtree is an overkill to do time based ordering.
> > Sorry, Michael. But s_dirty would be enough for that. Plus, s_more_io
> > provides fair queuing between small/large files, and s_more_io_wait
> > provides waiting mechanism for blocked inodes.
> > 
> > The time ordered rbtree may delay io for a blocked inode simply by
> > modifying its dirtied_when and reinsert it. But it would no longer be
> > that easy if it is to be ordered by location.
> 
> What does the term "ordered by location" mean?  Attemting to sort inodes by
> physical disk address?  By using their i_ino as a key?
> 
> That sounds optimistic.

In XFS, inode number is an encoding of it's location on disk, so
ordering inode writeback by inode number *does* make sense.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: What's in sh-2.6.git for 2.6.25?

2008-01-15 Thread Sam Ravnborg
On Wed, Jan 16, 2008 at 04:04:03PM +0900, Paul Mundt wrote:
> This is a brief summary of the changes that are sitting in the sh queue
> for 2.6.25.
> 
> The main points to note are as follows:
> 
>   - sh64->sh integration.
>   - A handful of new CPUs (SH7721, SH7763, SH7203, SH7263).
>   - SH-2A FPU support.
>   - Board support updates (R2D, R7785RP).
> 
> The sh64->sh integration is basically the only thing that's really
> interesting, and so it's worth summarizing that a bit.

Any kbuild bits that needs an extra pair of eyes before
integration or can I postpone my review until it hits mainline?
If you feel confident I prefer to wait as I'm busy atm.

Sam
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 4/4] x86: PAT followup - use ioremap for devmem read of reserved regions

2008-01-15 Thread Ingo Molnar

* [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> --- linux-2.6.git.orig/drivers/char/mem.c 2008-01-15 10:05:13.0 
> -0800
> +++ linux-2.6.git/drivers/char/mem.c  2008-01-15 10:05:51.0 -0800
> @@ -127,9 +127,14 @@
>* by the kernel or data corruption may occur
>*/
>   ptr = xlate_dev_mem_ptr(p);
> + if (!ptr)
> + return -EFAULT;
>  
>   if (copy_to_user(buf, ptr, sz))
>   return -EFAULT;
> +
> + unxlate_dev_mem_ptr(p, ptr);

sidenote: drivers/char/mem.c has no locking here, are you sure it's safe 
to create a possibly large number of aliases here? At least on 32-bit it 
could deplete the vmalloc area. (where all ioremaps go)

since /dev/mem access is strongly discouraged anyway (except perhaps for 
debugging purposes), wouldnt it be safer to stick a mutex around these 
areas?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs

2008-01-15 Thread Nick Piggin
On Monday 14 January 2008 22:30, Andi Kleen wrote:

> In general there are more scaling problems like this (e.g. it also doesn't
> make sense to scale kernel threads for each CPU thread for example).

I think in a lot of ways, per-CPU kernel threads scale OK. At least
they should mostly be dynamic, so they don't require overhead on
smaller systems. On larger systems, I don't know if there are too
many kernel problems with all those threads (except for userspace
tools sometimes don't report well).

And I think making them per-CPU can be much easier than tuning some
arbitrary algorithm to get a mix between parallelism and footprint.

For example, I'm finding that it might actually be worthwhile to move
some per-node and dynamically-controlled thread creation over to the
basic per-CPU scheme because of differences in topologies...

Anyway, that's just an aside.

Oh, just while I remember it also, something funny is that MAX_NUMNODES
can be bigger than NR_CPUS on x86. I guess one can have CPUless nodes,
but wouldn't it make sense to have an upper bound of NR_CPUS by default?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] rlim in proc//status (2nd rev.)

2008-01-15 Thread KOSAKI Motohiro
Hi Clifford,

> +static inline char *task_rlim(struct task_struct *p, char *buffer)
> +{
> + unsigned long flags;
> + struct rlimit rlim[RLIM_NLIMITS];
> + int i;
> + 
> + rcu_read_lock();
> + if (lock_task_sighand(p, )) {
> + for (i=0; i + rlim[i] = p->signal->rlim[i];
> + unlock_task_sighand(p, );
> + }

lock_task_sighand is possible return NULL?
if so, rlim is uninitialized when NULL.


- kosaki


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 0/4] x86: PAT followup - Incremental changes and bug fixes

2008-01-15 Thread Ingo Molnar

* [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> Some incremental changes and bug fixes for PAT patchset. The changes 
> are from the feedback we received earlier. There are few more pending 
> changes that will follow soon.

thanks, applied them to x86.git.

Note that PAT is still hardcoded to disabled in arch/x86/mm/pat.c:

  int __read_mostly pat_disabled = 1;

because one of my testsystems failed during bootup. I'll re-check 
whether these fixes resolve that, and if it passes then we could enable 
PAT.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0 of 4] x86: some more patches

2008-01-15 Thread Ingo Molnar

* Andi Kleen <[EMAIL PROTECTED]> wrote:

> [...] I found that Jan's ioremap fix also causes silent hangs here 
> during earlier bisecting (not sure why)

please send a fuller bugreport - all these problems on your testbox 
might be interrelated. Jan's patch is undone in a later part of the 
series so it's a bisection artifact. I've fixed that up.

Ingo

--->
Subject: x86: fix 64-bit ioremap()
From: From: Jan Beulich <[EMAIL PROTECTED]>

> Yeah, that may be true, but this particular tree is weird, and I'm trying
> to understand what's going on here.  Specifically, 64-bit ioremap()s
> *don't* set _PAGE_GLOBAL, which appears to be an accident resulting from
> the strange definitions of __PAGE_KERNEL_* vs PAGE_KERNEL_*.

ioremap() should set G agreed.

> For example, ioremap_64.c:__ioremap() creates a vma for the io mapping, and
> explicitly sets _PAGE_GLOBAL in the vma's version of pgprot - but then it
> calls ioremap_page_range() to actually create the mapping, which ends up
> making a non-global mapping, because its rolling its own version of
> PAGE_KERNEL by using pgprot(__PAGE_KERNEL) - which is not the actual
> definition of PAGE_KERNEL.

That should not really matter because ioremap_change_attr()->c_p_a is only 
called
when flags is != 0 and that means it is already different from PAGE_KERNEL.

>
> I think there's a bug around here, but I think its currently being hidden

There's one Jan pointed out: iounmap does not subtract the guard page size
so it ends up resetting one page too much. That is probably what causes your
problem. But again you should be passing in G in the first place.

Additionally I found it necessary to fix ioremap_64.c's use of
change_page_attr_addr():

[ [EMAIL PROTECTED]: fixed coding style errors ]

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 arch/x86/mm/ioremap_64.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

Index: linux-x86.q/arch/x86/mm/ioremap_64.c
===
--- linux-x86.q.orig/arch/x86/mm/ioremap_64.c
+++ linux-x86.q/arch/x86/mm/ioremap_64.c
@@ -45,10 +45,12 @@ ioremap_change_attr(unsigned long phys_a
unsigned long vaddr = (unsigned long) __va(phys_addr);
 
/*
-* Must use a address here and not struct page because the phys 
addr
-* can be a in hole between nodes and not have an memmap entry.
+* Must use an address here and not struct page because the
+* phys addr can be a in hole between nodes and not have an
+* memmap entry:
 */
-   err = 
change_page_attr_addr(vaddr,npages,__pgprot(__PAGE_KERNEL|flags));
+   err = change_page_attr_addr(vaddr, npages,
+   MAKE_GLOBAL(__PAGE_KERNEL|flags));
if (!err)
global_flush_tlb();
}
@@ -181,7 +183,7 @@ void iounmap(volatile void __iomem *addr
 
/* Reset the direct mapping. Can block */
if (p->flags >> 20)
-   ioremap_change_attr(p->phys_addr, p->size, 0);
+   ioremap_change_attr(p->phys_addr, get_vm_area_size(p), 0);
 
/* Finally remove it */
o = remove_vm_area((void *)addr);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22

2008-01-15 Thread Zhang, Yanmin
On Wed, 2008-01-16 at 08:34 +0800, Zhang, Yanmin wrote:
> On Mon, 2008-01-14 at 21:53 +1100, Herbert Xu wrote:
> > On Mon, Jan 14, 2008 at 08:44:40AM +, Ilpo Jrvinen wrote:
> > >
> > > > > I tried to use bisect to locate the bad patch between 2.6.22 and 
> > > > > 2.6.23-rc1,
> > > > > but the bisected kernel wasn't stable and went crazy.
> > > 
> > > TCP work between that is very much non-existing.
> > 
> > Make sure you haven't switched between SLAB/SLUB while testing this.
> I can make sure. In addition, I tried both SLAB and SLUB and make sure the 
> regression is still there if CONFIG_SLAB=y.
I retried bisect between 2.6.22 and 2.6.23-rc1. This time, I enabled 
CONFIG_SLAB=y,
and deleted the warmup procedure in the testing scripts. In addition, bind the 2
processes on the same logical processor. The regression is about 20% which is 
larger
than the one when binding 2 processes to different core.

The new bisect reported cfs core patch causes it. The results of every step look
stable.

dd41f596cda0d7d6e4a8b139ffdfabcefdd46528 is first bad commit
commit dd41f596cda0d7d6e4a8b139ffdfabcefdd46528
Author: Ingo Molnar <[EMAIL PROTECTED]>
Date:   Mon Jul 9 18:51:59 2007 +0200

sched: cfs core code

apply the CFS core code.

this change switches over the scheduler core to CFS's modular
design and makes use of kernel/sched_fair/rt/idletask.c to implement
Linux's scheduling policies.

thanks to Andrew Morton and Thomas Gleixner for lots of detailed review
feedback and for fixlets.

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
Signed-off-by: Mike Galbraith <[EMAIL PROTECTED]>
Signed-off-by: Dmitry Adamushko <[EMAIL PROTECTED]>
Signed-off-by: Srivatsa Vaddagiri <[EMAIL PROTECTED]>


-yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread Andrew Morton
On Wed, 16 Jan 2008 00:09:31 -0700 "Dan Williams" <[EMAIL PROTECTED]> wrote:

> > heheh.
> >
> > it's really easy to reproduce the hang without the patch -- i could
> > hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB.
> > i'll try with ext3... Dan's experiences suggest it won't happen with ext3
> > (or is even more rare), which would explain why this has is overall a
> > rare problem.
> >
> 
> Hmmm... how rare?
> 
> http://marc.info/?l=linux-kernel=119461747005776=2
> 
> There is nothing specific that prevents other filesystems from hitting
> it, perhaps XFS is just better at submitting large i/o's.  -stable
> should get some kind of treatment.  I'll take altered performance over
> a hung system.

We can always target 2.6.25-rc1 then 2.6.24.1 if Neil is still feeling
wimpy.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc7-rt2

2008-01-15 Thread S.Çağlar Onur
Hi,

16 Oca 2008 Çar tarihinde, Steven Rostedt şunları yazmıştı: 
> On Tue, 15 Jan 2008, [utf-8] S.Ã^GaÄ^_lar Onur wrote:
> > 2.6.24-rc7-rt2 (-rt2 patchset on top of Linus's current git commit
> > 031f2dcd7075e218e74dd7f942ad015cf82dffab) starts to complain like
> > following (full dmesg can be found @ [1]) when try to login from console
> > (the other acpi related errors also existed in 2.6.24-rc5-rt1) and FYI,
> > plain 2.6.24-rc7 (again commit 031f2dcd7075e218e74dd7f942ad015cf82dffab)
> > has no issues.
>
> Do you get the same issues if you add to -rc7 and not git.

I'll try after trying -rt3.

> > [...]
> > sysfs: duplicate filename 'vcs1' can not be created
> > WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> > Pid: 1298, comm: mingetty Not tainted 2.6.24-rc7-rt2-99 #1
> > [...]
> >
> > And because of mcount-add-basic-support-for-gcc-profiler-instrum.patch,
> > closed source nvidia-new module cannot be used with this release (mcount
> > is exported GPL only), i know this is not supported but i used it with
> > that [2] patch up until now without a single problem.
>
> Ah, sorry about that. I'll try to fix that later on. You should still be
> able to use NVidia by turning off function trace.

Wonderfull news :) [and yes turning off function trace works]

> > Please don't misunderstand this, i really do not want to start a
> > discussion for this, i just want to ask the possibility of converting
> > this into EXPORT_SYMBOL cause i thought some of the possible -rt users
> > may need this closed source module explicitly because of its 3D
> > performance.
> >
> > If anything else needed for sysfs warnings please just say it...
> >
> > [1] http://cekirdek.pardus.org.tr/~caglar/dmesg.rt
> > [2]
> > http://svn.pardus.org.tr/pardus/devel/kernel/drivers/nvidia-new/files/rt.
> >patch
>
> Thanks for the report. I'll see what I can do for the next release. But
> for now this will have to wait till after -rt3.

Thanks...

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!


signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] rlim in proc//status (2nd rev.)

2008-01-15 Thread Clifford Wolf
Hi,

On Tue, Jan 15, 2008 at 02:36:59PM -0600, [EMAIL PROTECTED] wrote:
> > +   rcu_read_lock();
> > +   if (lock_task_sighand(p, )) {
> > +   for (i=0; i > +   rlim[i] = p->signal->rlim[i];
> 
> I'm confused - where do you unlock_task_sighand()?

oh fsck! thanks for that pointer..

Here is a new version of the patch which solves this issue and the issues
adressed earlier in this thread by kosaki.

yours,
 - clifford

Signed-off-by: Clifford Wolf <[EMAIL PROTECTED]>

--- linux/fs/proc/array.c   (revision 750)
+++ linux/fs/proc/array.c   (revision 764)
@@ -239,6 +239,58 @@
}
 }
 
+static char *rlim_names[RLIM_NLIMITS] = {
+   [RLIMIT_CPU]= "CPU",
+   [RLIMIT_FSIZE]  = "FSize",
+   [RLIMIT_DATA]   = "Data",
+   [RLIMIT_STACK]  = "Stack",
+   [RLIMIT_CORE]   = "Core",
+   [RLIMIT_RSS]= "RSS",
+   [RLIMIT_NPROC]  = "NProc",
+   [RLIMIT_NOFILE] = "NoFile",
+   [RLIMIT_MEMLOCK]= "MemLock",
+   [RLIMIT_AS] = "AddrSpace",
+   [RLIMIT_LOCKS]  = "Locks",
+   [RLIMIT_SIGPENDING] = "SigPending",
+   [RLIMIT_MSGQUEUE]   = "MsgQueue",
+   [RLIMIT_NICE]   = "Nice",
+   [RLIMIT_RTPRIO] = "RTPrio"
+};
+
+#if RLIM_NLIMITS != 15
+#  error Value of RLIM_NLIMITS changed. \
+ Please update rlim_names in fs/proc/array.c
+#endif
+
+static inline char *task_rlim(struct task_struct *p, char *buffer)
+{
+   unsigned long flags;
+   struct rlimit rlim[RLIM_NLIMITS];
+   int i;
+   
+   rcu_read_lock();
+   if (lock_task_sighand(p, )) {
+   for (i=0; isignal->rlim[i];
+   unlock_task_sighand(p, );
+   }
+   rcu_read_unlock();
+
+   for (i=0; ihttp://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread Dan Williams
> heheh.
>
> it's really easy to reproduce the hang without the patch -- i could
> hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB.
> i'll try with ext3... Dan's experiences suggest it won't happen with ext3
> (or is even more rare), which would explain why this has is overall a
> rare problem.
>

Hmmm... how rare?

http://marc.info/?l=linux-kernel=119461747005776=2

There is nothing specific that prevents other filesystems from hitting
it, perhaps XFS is just better at submitting large i/o's.  -stable
should get some kind of treatment.  I'll take altered performance over
a hung system.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


What's in sh-2.6.git for 2.6.25?

2008-01-15 Thread Paul Mundt
This is a brief summary of the changes that are sitting in the sh queue
for 2.6.25.

The main points to note are as follows:

- sh64->sh integration.
- A handful of new CPUs (SH7721, SH7763, SH7203, SH7263).
- SH-2A FPU support.
- Board support updates (R2D, R7785RP).

The sh64->sh integration is basically the only thing that's really
interesting, and so it's worth summarizing that a bit.

The sh64 and sh ports had both areas of considerable divergence, and
areas of considerable overlap. In the past consolidation has not been
possible since there was no clean way to abstract the differences with
a common implementation (the obvious case is that the SH-5 and the other
parts use a totally different instruction and register set). With the
work in the nommu area, we've already had to abstract most of the
exception handling code, disjoint syscall ABIs, incompatible instruction
sets, and so on. As a result of that work, the SH-5 integration was
finally at a point where it could be done with minimal pain. The fact
that the sh64 port itself was bitrotting was also a motivator for just
getting the integration done and over with.

There is still more work to do on unifying the _32/_64 splits, especially
as we have to start supporting new CPUs that sit somewhere between the
SH-4A and the SH-5 architecturally. The integration work is an ongoing
effort, and there will likely still be a bit of churn in this area
throughout 2.6.25 and in to 2.6.26.

These changes have basically been in -mm for a few iterations, and so
nothing here should be much of a surprise. We do manage to kill off quite
a bit of code in the process, and this is obviously a number that will go
up considerably as more _32/_64 split unification is done going in to
2.6.25 proper. Most of this is just a reorganization of existing in-tree
code, so there's very little in the way of new code or surprises here.

The tree in question can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6.git

Which contains:

Andrew Murray (2):
  sh: sh7712 clock support
  sh: Update SE7712 PCLK definition.

Harvey Harrison (1):
  sh: Use def_bool where possible.

Joe Perches (2):
  sh: arch/sh/: Spelling fixes.
  sh: include/asm-sh/: Spelling fixes.

Magnus Damm (4):
  sh: intc - remove default interrupt priority tables
  sh: r2d - add lcd planel timings to sm501 platform data
  sh: remove voyagergx
  sh: r2d - enable sm501 usb host function

Nobuhiro Iwamatsu (1):
  sh: Fix compile error of arch/sh/mm/pmb.c

Paul Mundt (160):
  rtc: rtc-sh: SH-5 support.
  sh64: Use the generic rtc-sh driver.
  sh: Rename Kconfig to Kconfig.sh.
  sh: Move CPU subtypes to Kconfig.sh.
  sh: Add a SUPERH32 config symbol.
  sh: Split out PXSEG segmentation per-CPU family.
  sh: Split out cache status bits per-CPU family.
  sh: Move the sh64 Kconfig to arch/sh/Kconfig.sh64.
  sh: Move arch/sh64/lib to arch/sh/lib64.
  sh: Plug SH-5 in to arch/sh/Makefile.
  sh: Switch Kconfig.sh64 to use arch/sh/mm/Kconfig.
  sh: Add SH-5 support to asm/module.h.
  sh: Fix up fixmap location for SH-5.
  sh: BUGFLAG_WARNING needs GENERIC_BUG.
  sh: Add addrspace.h segmentation stub for SH-5.
  sh: Add cache definitions for SH-5.
  sh: Correct SH-5 instruction size value.
  sh: Move sh64 boards to arch/sh/.
  sh: Move sh64 board defconfigs to arch/sh/configs.
  sh64: Kill off arch/sh64/oprofile.
  sh: Add in cacheflush and DMA headers for SH-5.
  sh: Add SH-5 support to io.h.
  sh: Split out asm/string.h for sh32 and sh64.
  sh: Split out irqflags.h in to _32 and _64 variants.
  sh: SH-5 version of current_thread_info().
  sh: Consolidate CPU features in Kconfig.cpu.
  sh: SH-5 byteorder routines.
  sh: Move sh32 optimized I/O routines to arch/sh/lib/
  sh: Kill off lib64 version of io.c.
  sh: Move in the SH-5 mmu_context headers.
  sh: Have 32-bit use arch/sh/kernel/Makefile_32.
  sh: Split out arch/sh/kernel/process.c for _32 and _64 variants.
  sh: SH-5 pt_regs.
  sh: Split out processor.h in to _32 and _64 variants.
  sh: Split out 29-bit and 32-bit physical mode definitions.
  sh: Split out system.h in to _32 and _64 variants.
  sh: Move in the SH-5 ptrace impl.
  sh: SH-5 also uses the ASID cache.
  sh: Split out uaccess.h in to _32 and _64 variants.
  sh: Consolidate slab/kmalloc minalign values.
  sh: More SH-5 cpuinfo tidying.
  sh: Move in the SH-5 signal trampoline impl.
  sh: Move arch/sh64/kernel/sys_sh64.c to arch/sh/kernel/
  sh: timer.h stub for SH-5.
  sh: Move in the SH-5 traps.c impl.
  sh: imask IRQ depends on sh32.
  sh: Don't reference UBC code in CPU init on sh64.
  sh: Disable initial cache flush on SH-5.
  sh: Have SH-5 provide an {en,dis}able_fpu() impl.
  sh: Move over the SH-5 head.S and 

Re: 2.6.24-rc7, intel audio: alsa doesn't say a beep

2008-01-15 Thread Harald Dunkel

Takashi Iwai wrote:


Linus, please revert the commit 57a04513cb3 as now.
The life can go well without this patch.



hda_intel.c works for me in rc8.


Many thanx to all

Harri

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: questions on NAPI processing latency and dropped network packets

2008-01-15 Thread Jarek Poplawski
On Wed, Jan 16, 2008 at 11:17:08AM +1100, Herbert Xu wrote:
...
> Well people are always going to operate on this model for commercial
> reasons.  FWIW I used to work for a company that stuck to a specific
> version of the Linux kernel, and I suppose I still do even now :)
> 
> But the important thing is that if you're going to do that, then the
> cost that comes with it should be borne by the company and not the
> community.

Sure. But the most sad thing is there seems to be not so much savings
in this (unless a company isn't sure of its near future). Trying to
upgrade and test current products with current kernels, even if not
necessary, should be always useful and make developing of new products
faster and better fit (and of course, BTW, make the kernel better on
time).

Regards,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: hpet_late_init hang

2008-01-15 Thread Ingo Molnar

(Balaji Cc:-ed)

* Yinghai Lu <[EMAIL PROTECTED]> wrote:

> "
> commit e5ed385fa0d6f35406e3e3ed75e5eb9adeb811df
> Author: Balaji Rao <[EMAIL PROTECTED]>
> Date:   Tue Jan 15 16:53:29 2008 +0100
> 
> Assign IRQs to HPET Timers
> "
> in x86.git
> 
> cause my servers hang
> after
> Calling initcall 0x80b9a465: hpet_late_init+0x0/0x100()

i'm wondering, where does it hang exactly and why?

> after reverting that I got:
> 
> initcall 0x80b947d1 ran for 19 msecs: pci_iommu_init+0x0/0x13()
> Calling initcall 0x80b9a465: hpet_late_init+0x0/0x100()
> hpet0: at MMIO 0xfed0, IRQs 2, 8, 31
> hpet0: 3 32-bit timers, 2500 Hz
> initcall 0x80b9a465: hpet_late_init+0x0/0x100() returned 0.
> initcall 0x80b9a465 ran for 7 msecs: hpet_late_init+0x0/0x100()
> 
>CPU0   CPU1   CPU2   CPU3   CPU4   CPU5
>   CPU6   CPU7
>   0: 86  0  0  0  0  0
>  1  0   IO-APIC-edge  timer
>   4:  0  0  0  0  0  0
>  1838   IO-APIC-edge  serial
>   7:  1  0  0  0  0  0
>  0  0   IO-APIC-edge
>   8:  0  0  0  0  0  0
>  0  0   IO-APIC-edge  rtc0
> 
> for mcp55, it should already route hpet to ioapic pin2 or the irq0.

hm, these new bits:

+   /* Assign IRQs statically for legacy devices */
+   hpetp->hp_dev[0].hd_hdwirq = hdp->hd_irq[0];
+   hpetp->hp_dev[1].hd_hdwirq = hdp->hd_irq[1];

seem to be different from where we came from:

-   for (i = 2; i < nrtimers; timer++, i++)
-   hd.hd_irq[i] = (timer->hpet_config & Tn_INT_ROUTE_CNF_MASK) >>
-   Tn_INT_ROUTE_CNF_SHIFT;

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

2008-01-15 Thread Tejun Heo
Linus Torvalds wrote:
> 
> On Wed, 16 Jan 2008, Tejun Heo wrote:
>> * sysfs_move_dir() has an extra dput() on success path.
> 
> Are you sure? How did this ever work?

I'm pretty sure.  I've seen dentry blowing up due to early release &&
compared it with older code.  It was my mistake during restructuring
error path.  The only user of sysfs_move_dir() was S390 Cornelia works
on (cc'd).  Cornelia is usually very good at spotting and debugging
sysfs bugs.  Dunno how it got slipped this time.

> Also, looking at this, I think the "how did this ever work" question is 
> answered by "it didn't",

Before dput() bug was introduced, it worked although error handling path
was broken.

> but I also think there are still serious problems 
> there. Look at
> 
>   again:
>   mutex_lock(_parent->d_inode->i_mutex);
>   if (!mutex_trylock(_parent->d_inode->i_mutex)) {
>   mutex_unlock(_parent->d_inode->i_mutex);
>   goto again;
>   }
> 
> and wonder what happen sif old_parent == new_parent. Is that trying to 
> avoid an ABBA deadlock?

It will fall in infinite loop if old_parent == new_parent and for the
question, I suppose so.  Cornelia, right?

> Normally you'd do it by ordering the locks, or by 
> taking a third lock to guarantee serialization at a higher level (ie the 
> "s_vfs_rename_mutex" on the VFS layer)

sysfs currently doesn't depend on VFS locking.  VFS locking is done just
to keep VFS layer happy.  sysfs_dirent hierarchy is protected by
sysfs_mutex and renaming/moving are protected by sysfs_rename_mutex.  As
both ops are under rename_mutex, I think the above code just can grab
both mutexes in any order.  It's probably a remnant of the days when
sysfs used VFS locking to protect internal structures.

s390 was the only user of the move interface till now and through all
the recent sysfs change, it didn't receive enough attention other than
Cornelia's testing.  Eventually, I think sysfs_rename_dir() and
sysfs_move_dir() should be merged into sysfs_move() but for the current
two users, I don't see anything wrong with the locking.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc7-rt2

2008-01-15 Thread Valdis . Kletnieks
On Tue, 15 Jan 2008 23:04:39 EST, Steven Rostedt said:
> 
> On Tue, 15 Jan 2008 [EMAIL PROTECTED] wrote:
> 
> > On Tue, 15 Jan 2008 02:37:37 +0200, =?utf-8?q?S=2E=C3=87a=C4=9Flar?= Onur 
> > said:
> > > And because of mcount-add-basic-support-for-gcc-profiler-instrum.patch, 
> > > closed
> > > source nvidia-new module cannot be used with this release (mcount is 
> > > exported
> > > GPL only), i know this is not supported but i used it with that [2] patch 
> > > up
> > > until now without a single problem.
> >
> > Playing devil's advocate here - the claim is that EXPORT_SYMBOL_GPL is to
> > indicate that code is getting too chummy with Linux internals.
> >
> > However, in *this* case, isn't it "code that is too chummy with *GCC* 
> > internals",
> > and thus it isn't our place to say what can and can't be done with code that
> > is derivative of the GCC compiler? ;)
> 
> Actually, it got put in there by accident. I usually default all my
> exports as GPL.  But this breaks pretty much everything, so I'll leave it
> as EXPORT_SYMBOL.

OK, I can live with that. ;)


pgpEQkhkYNbPh.pgp
Description: PGP signature


rvr split LRU minor regression ?

2008-01-15 Thread KOSAKI Motohiro
Hi Rik

I tested new hackbench on rvr split LRU patch.

new hackbench URL is
   http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c


method of test

(1) $ ./hackbench 150 process 1000
(2) # sync; echo 3 > /proc/sys/vm/drop_caches
$ dd if=tmp10G of=/dev/null
$ ./hackbench 150 process 1000

test machine:
  CPU:x86_64 1.86GHz x2
  memory: 6GB


result:

 2.6.24-rc6-mm1  +rvr-split-lru  ratio
(small is faster)
---
(1)  364.981 359.38698.47%
(2)  364.461 387.471   106.31%


more detail:
1. /usr/bin/time command output

vanilla 2.6.24-rc6-mm1
33.74user 703.10system 6:09.56elapsed 199%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (0major+372467minor)pagefaults 0swaps

2.6.24-rc6-mm1 + rvr-split-lru
36.22user 731.30system 6:35.16elapsed 194%CPU (0avgtext+0avgdata 
0maxresident)k
0inputs+0outputs (804major+389524minor)pagefaults 0swaps

It seems increase page fault.


2.
after test (2), cat /proc/meminfo

vanilla 2.6.24-rc6-mm1

MemTotal:  5931808 kB
MemFree:   1751632 kB
Buffers:  4360 kB
Cached:3930020 kB
SwapCached:  0 kB
Active:  46396 kB
Inactive:  3924108 kB
SwapTotal:20972848 kB
SwapFree: 20972720 kB
Dirty:   0 kB
Writeback:   0 kB
AnonPages:   36140 kB
Mapped:  10104 kB
Slab:   160020 kB
SReclaimable: 3460 kB
SUnreclaim: 156560 kB
PageTables:   3712 kB
NFS_Unstable:0 kB
Bounce:  0 kB
CommitLimit:  23938752 kB
Committed_AS:78940 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 57220 kB
VmallocChunk: 34359680999 kB
HugePages_Total: 0
HugePages_Free:  0
HugePages_Rsvd:  0
HugePages_Surp:  0
Hugepagesize: 2048 kB


2.6.24-rc6-mm1 + rvr-split-lru

MemTotal:5931356 kB
MemFree: 1771800 kB
Buffers:2776 kB
Cached:  3914800 kB
SwapCached: 7940 kB
Active(anon):  21868 kB
Inactive(anon): 6560 kB
Active(file):1722888 kB
Inactive(file):  2192128 kB
Noreclaim:3472 kB
Mlocked:  3724 kB
SwapTotal:  20972848 kB
SwapFree:   20935032 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 23912 kB
Mapped: 9500 kB
Slab: 162188 kB
SReclaimable:   5544 kB
SUnreclaim:   156644 kB
PageTables:  kB
NFS_Unstable:  0 kB
Bounce:0 kB
CommitLimit:23938524 kB
Committed_AS: 106816 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   57220 kB
VmallocChunk:   34359680999 kB
HugePages_Total: 0
HugePages_Free:  0
HugePages_Rsvd:  0
HugePages_Surp:  0
Hugepagesize: 2048 kB


It seems used once memory incorrect activation increased.
What do you think it?



- kosaki


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread dean gaudet
On Tue, 15 Jan 2008, Andrew Morton wrote:

> On Tue, 15 Jan 2008 21:01:17 -0800 (PST) dean gaudet <[EMAIL PROTECTED]> 
> wrote:
> 
> > On Mon, 14 Jan 2008, NeilBrown wrote:
> > 
> > > 
> > > raid5's 'make_request' function calls generic_make_request on
> > > underlying devices and if we run out of stripe heads, it could end up
> > > waiting for one of those requests to complete.
> > > This is bad as recursive calls to generic_make_request go on a queue
> > > and are not even attempted until make_request completes.
> > > 
> > > So: don't make any generic_make_request calls in raid5 make_request
> > > until all waiting has been done.  We do this by simply setting
> > > STRIPE_HANDLE instead of calling handle_stripe().
> > > 
> > > If we need more stripe_heads, raid5d will get called to process the
> > > pending stripe_heads which will call generic_make_request from a
> > > different thread where no deadlock will happen.
> > > 
> > > 
> > > This change by itself causes a performance hit.  So add a change so
> > > that raid5_activate_delayed is only called at unplug time, never in
> > > raid5.  This seems to bring back the performance numbers.  Calling it
> > > in raid5d was sometimes too soon...
> > > 
> > > Cc: "Dan Williams" <[EMAIL PROTECTED]>
> > > Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> > 
> > probably doesn't matter, but for the record:
> > 
> > Tested-by: dean gaudet <[EMAIL PROTECTED]>
> > 
> > this time i tested with internal and external bitmaps and it survived 8h 
> > and 14h resp. under the parallel tar workload i used to reproduce the 
> > hang.
> > 
> > btw this should probably be a candidate for 2.6.22 and .23 stable.
> > 
> 
> hm, Neil said
> 
>   The first fixes a bug which could make it a candidate for 24-final. 
>   However it is a deadlock that seems to occur very rarely, and has been in
>   mainline since 2.6.22.  So letting it into one more release shouldn't be
>   a big problem.  While the fix is fairly simple, it could have some
>   unexpected consequences, so I'd rather go for the next cycle.
> 
> food fight!
> 

heheh.

it's really easy to reproduce the hang without the patch -- i could
hang the box in under 20 min on 2.6.22+ w/XFS and raid5 on 7x750GB.
i'll try with ext3... Dan's experiences suggest it won't happen with ext3
(or is even more rare), which would explain why this has is overall a
rare problem.

but it doesn't result in dataloss or permanent system hangups as long
as you can become root and raise the size of the stripe cache...

so OK i agree with Neil, let's test more... food fight over! :)

-dean
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: INITIO scsi driver fails to work properly

2008-01-15 Thread FUJITA Tomonori
On Tue, 15 Jan 2008 09:16:06 -0600
James Bottomley <[EMAIL PROTECTED]> wrote:

> 
> On Sun, 2008-01-13 at 14:28 +0200, Filippos Papadopoulos wrote:
> > On 1/11/08, James Bottomley <[EMAIL PROTECTED]> wrote:
> > >
> > > On Fri, 2008-01-11 at 18:44 +0200, Filippos Papadopoulos wrote:
> > > > On Jan 11, 2008 5:44 PM, James Bottomley
> > > > <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > I havent reported "initio: I/O port range 0x0 is busy."
> > > > >
> > > > > Sorry ... we appear to have several reporters of different bugs in 
> > > > > this
> > > > > thread.  That message was copied by Chuck Ebbert from a Red Hat
> > > > > bugzilla ... I was assuming it was the same problem.
> > > > >
> > > > > > I applied the patch on 2.6.24-rc6-git9 but unfortunatelly same 
> > > > > > thing happens.
> > > > >
> > > > > First off, has this driver ever worked for you in 2.6?  Just booting
> > > > > SLES9 (2.6.5) or RHEL4 (2.6.9) ... or one of their open equivalents to
> > > > > check a really old kernel would be helpful.  If you can get it to 
> > > > > work,
> > > > > then we can proceed with a patch reversion regime based on the
> > > > > assumption that the problem is a recent commit.
> > > >
> > > > Yes it works under 2.6.16.13.  See the beginning of this thread, i
> > > > mention there some things about newer versions.
> > >
> > > Thanks, actually, I see this:
> > >
> > > > I tried to install OpenSUSE 10.3 (kernel 2.6.22.5) and the latest
> > > > OpenSUSE 11.0 Alpha 0  (kernel 2.6.24-rc4) but although the initio
> > > > drivergets loaded during the installation process, yast reports that no 
> > > > hard
> > > > disk is found.
> > >
> > > Could you try with a vanilla 2.6.22 kernel?  The reason for all of this
> > > is that 2.6.22 predates Alan's conversion of this driver (which was my
> > > 95% candidate for the source of the bug).  I want you to try the vanilla
> > > kernel just in case the opensuse one contains a backport.
> > 
> > 
> > Yes you are right. I compiled the vanilla 2.6.22 and initio driver works.
> > Tell me if you want to apply any patch to it.
> 
> 
> That's good news ... at least we know where the issue lies; now the
> problem comes: there are two candidate patches for this issue: Alan's
> driver update patch and Tomo's accessors patch.  Unfortunately, due to
> merge conflicts the two are pretty hopelessly intertwined.  I think I
> already spotted one bug in the accessor conversion, so I'll look at that
> again.  Alan's also going to acquire an inito board and retest his
> conversions.
> 
> I'm afraid it might be a while before we have anything for you to test.

Can you try this patch?

Thanks,

diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c
index 01bf018..6891d2b 100644
--- a/drivers/scsi/initio.c
+++ b/drivers/scsi/initio.c
@@ -2609,6 +2609,7 @@ static void initio_build_scb(struct initio_host * host, 
struct scsi_ctrl_blk * c
cblk->bufptr = cpu_to_le32((u32)dma_addr);
cmnd->SCp.dma_handle = dma_addr;
 
+   cblk->sglen = nseg;
 
cblk->flags |= SCF_SG;  /* Turn on SG list flag   */
total_len = 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pnpacpi : exceeded the max number of IO resources

2008-01-15 Thread Dave Young
On Jan 9, 2008 10:47 PM, Rene Herman <[EMAIL PROTECTED]> wrote:
> On 09-01-08 10:34, Frans Pop wrote:
>
> Bjorn:
>
> > Len Brown wrote:
>
>  Well, yes, the warning is actually new as well. Previously your kernel
>  just silently ignored 8 more mem resources than it does now it seems.
> 
>  Given that people are hitting these limits, it might make sense to just
>  do away with the warning for 2.6.24 again while waiting for the dynamic
>  code?
> >>> Ping. Should these warnings be reverted for 2.6.24?
> >> No. I don't think hiding this issue again is a good idea.
> >> I'd rather live with people complaining about an addition dmesg line.
> >
> > We're not talking about "a" additional line. In my case [1] we're talking
> > about 22 (!) additional identical lines.
>
> You lucky devil. Someone else reported 92 if I remember rightly. This really
> needs to be called a 2.6.24 bug. Stick the word "regression" in the subject
> line and someone will notice...
>
> The warning might provide useful information to someone looking at a dmesg
> but given that people are hitting them way too hard with the only difference
> versus 2.6.23 being tke kernel now complaining about it, they're not useful
> enough to be printed more than once, or at more then DEBUG level or even at
> all in fact since we already know the static limit isn't enough for everyone
> and needs be turned dynamic -- really, what else is someone going to debug
> with it?
>
> I'd consider Bjorn Helgaas the PnP maintainer and he earlier agreed that
> this needed something:
>
> http://lkml.org/lkml/2007/12/5/301
>
> Printing the warning only once per type as per attached fixes the problenm
> as well.
>
> Bjorn, could you push your preference into 2.6.24?
>
>
> > Not fixing this before 2.6.24 seems completely inconsistent:
> > - either this is a real bug and the ERR level message is correct, in which
> >   case the limits should be increased;
> > - or hitting the limits is harmless and the message should be changed to
> >   DEBUG level.
> >
> > It is great to hear that the memory allocation will become dynamic in the
> > future and maybe that could just justify your standpoint, but having the
> > messages is damn ugly and alarming from a user point of view.
> >
> > Please keep in mind that depending on distro release schedules, 2.6.24 could
> > live for quite a bit longer than just the period needed to release 2.6.25
> > (if that is when the dynamic allocation will be implemented).
> >
> > Cheers,
> > FJP
> >
> > [1] http://lkml.org/lkml/2008/1/6/279
>
>
>
> diff --git a/drivers/pnp/pnpacpi/rsparser.c b/drivers/pnp/pnpacpi/rsparser.c
> index 3c5eb37..cd9d4a8 100644
> --- a/drivers/pnp/pnpacpi/rsparser.c
> +++ b/drivers/pnp/pnpacpi/rsparser.c
> @@ -73,6 +73,7 @@ static void pnpacpi_parse_allocated_irqresource(struct 
> pnp_resource_table *res,
> u32 gsi, int triggering,
> int polarity, int shareable)
>  {
> +   static int warned;
> int i = 0;
> int irq;
> int p, t;
> @@ -84,8 +85,9 @@ static void pnpacpi_parse_allocated_irqresource(struct 
> pnp_resource_table *res,
>i < PNP_MAX_IRQ)
> i++;
> if (i >= PNP_MAX_IRQ) {
> -   printk(KERN_ERR "pnpacpi: exceeded the max number of IRQ "
> -   "resources: %d \n", PNP_MAX_IRQ);
> +   if (!warned++)
> +   printk(KERN_ERR "pnpacpi: exceeded the max number of 
> IRQ "
> +   "resources: %d \n", PNP_MAX_IRQ);
> return;
> }
> /*
> @@ -168,6 +170,7 @@ static void pnpacpi_parse_allocated_dmaresource(struct 
> pnp_resource_table *res,
> u32 dma, int type,
> int bus_master, int transfer)
>  {
> +   static int warned;
> int i = 0;
>
> while (i < PNP_MAX_DMA &&
> @@ -183,7 +186,7 @@ static void pnpacpi_parse_allocated_dmaresource(struct 
> pnp_resource_table *res,
> }
> res->dma_resource[i].start = dma;
> res->dma_resource[i].end = dma;
> -   } else {
> +   } else if (!warned++) {
> printk(KERN_ERR "pnpacpi: exceeded the max number of DMA "
> "resources: %d \n", PNP_MAX_DMA);
> }
> @@ -192,6 +195,7 @@ static void pnpacpi_parse_allocated_dmaresource(struct 
> pnp_resource_table *res,
>  static void pnpacpi_parse_allocated_ioresource(struct pnp_resource_table 
> *res,
>u64 io, u64 len, int io_decode)
>  {
> +   static int warned;
> int i = 0;
>
> while (!(res->port_resource[i].flags & IORESOURCE_UNSET) &&
> @@ -207,7 +211,7 @@ static void pnpacpi_parse_allocated_ioresource(struct 
> pnp_resource_table *res,
>  

Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread Andrew Morton
On Tue, 15 Jan 2008 21:01:17 -0800 (PST) dean gaudet <[EMAIL PROTECTED]> wrote:

> On Mon, 14 Jan 2008, NeilBrown wrote:
> 
> > 
> > raid5's 'make_request' function calls generic_make_request on
> > underlying devices and if we run out of stripe heads, it could end up
> > waiting for one of those requests to complete.
> > This is bad as recursive calls to generic_make_request go on a queue
> > and are not even attempted until make_request completes.
> > 
> > So: don't make any generic_make_request calls in raid5 make_request
> > until all waiting has been done.  We do this by simply setting
> > STRIPE_HANDLE instead of calling handle_stripe().
> > 
> > If we need more stripe_heads, raid5d will get called to process the
> > pending stripe_heads which will call generic_make_request from a
> > different thread where no deadlock will happen.
> > 
> > 
> > This change by itself causes a performance hit.  So add a change so
> > that raid5_activate_delayed is only called at unplug time, never in
> > raid5.  This seems to bring back the performance numbers.  Calling it
> > in raid5d was sometimes too soon...
> > 
> > Cc: "Dan Williams" <[EMAIL PROTECTED]>
> > Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> 
> probably doesn't matter, but for the record:
> 
> Tested-by: dean gaudet <[EMAIL PROTECTED]>
> 
> this time i tested with internal and external bitmaps and it survived 8h 
> and 14h resp. under the parallel tar workload i used to reproduce the 
> hang.
> 
> btw this should probably be a candidate for 2.6.22 and .23 stable.
> 

hm, Neil said

  The first fixes a bug which could make it a candidate for 24-final. 
  However it is a deadlock that seems to occur very rarely, and has been in
  mainline since 2.6.22.  So letting it into one more release shouldn't be
  a big problem.  While the fix is fairly simple, it could have some
  unexpected consequences, so I'd rather go for the next cycle.

food fight!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-15 Thread Andrew Morton
On Wed, 16 Jan 2008 12:55:07 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:

> On Tue, Jan 15, 2008 at 08:42:36PM -0800, Andrew Morton wrote:
> > On Wed, 16 Jan 2008 12:25:53 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:
> > 
> > > list_heads are OK if we use them for one and only function.
> > 
> > Not really.  They're inappropriate when you wish to remember your
> > position in the list while you dropped the lock (as we must do in
> > writeback).
> > 
> > A data structure which permits us to interate across the search key rather
> > than across the actual storage locations is more appropriate.
> 
> I totally agree with you. What I mean is to first do the split of
> functions - into three: ordering, starvation prevention, and blockade
> waiting.

Does "ordering" here refer to ordering bt time-of-first-dirty?

What is "blockade waiting"?

> Then to do better ordering by adopting radix tree(or rbtree
> if radix tree is not enough),

ordering of what?

> and lastly get rid of the list_heads to
> avoid locking. Does it sound like a good path?

I'd have thaought that replacing list_heads with another data structure
would be a simgle commit.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression: 100% io-wait with 2.6.24-rcX

2008-01-15 Thread Fengguang Wu
On Tue, Jan 15, 2008 at 04:13:22PM -0500, Mike Snitzer wrote:
> On Jan 14, 2008 7:50 AM, Fengguang Wu <[EMAIL PROTECTED]> wrote:
> > On Mon, Jan 14, 2008 at 12:41:26PM +0100, Peter Zijlstra wrote:
> > >
> > > On Mon, 2008-01-14 at 12:30 +0100, Joerg Platte wrote:
> > > > Am Montag, 14. Januar 2008 schrieb Fengguang Wu:
> > > >
> > > > > Joerg, this patch fixed the bug for me :-)
> > > >
> > > > Fengguang, congratulations, I can confirm that your patch fixed the 
> > > > bug! With
> > > > previous kernels the bug showed up after each reboot. Now, when booting 
> > > > the
> > > > patched kernel everything is fine and there is no longer any suspicious
> > > > iowait!
> > > >
> > > > Do you have an idea why this problem appeared in 2.6.24? Did somebody 
> > > > change
> > > > the ext2 code or is it related to the changes in the scheduler?
> > >
> > > It was Fengguang who changed the inode writeback code, and I guess the
> > > new and improved code was less able do deal with these funny corner
> > > cases. But he has been very good in tracking them down and solving them,
> > > kudos to him for that work!
> >
> > Thank you.
> >
> > In particular the bug is triggered by the patch named:
> > "writeback: introduce writeback_control.more_io to indicate more io"
> > That patch means to speed up writeback, but unfortunately its
> > aggressiveness has disclosed bugs in reiserfs, jfs and now ext2.
> >
> > Linus, given the number of bugs it triggered, I'd recommend revert
> > this patch(git commit 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b). Let's
> > push it back to -mm tree for more testings?
> 
> Fengguang,
> 
> I'd like to better understand where your writeback work stands
> relative to 2.6.24-rcX and -mm.  To be clear, your changes in
> 2.6.24-rc7 have been benchmarked to provide a ~33% sequential write
> performance improvement with ext3 (as compared to 2.6.22, CFS could be
> helping, etc but...).  Very impressive!

Wow, glad to hear that.

> Given this improvement it is unfortunate to see your request to revert
> 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b but it is understandable if
> you're not confident in it for 2.6.24.
> 
> That said, you recently posted an -mm patchset that first reverts
> 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b and then goes on to address
> the "slow writes for concurrent large and small file writes" bug:
> http://lkml.org/lkml/2008/1/15/132
> 
> For those interested in using your writeback improvements in
> production sooner rather than later (primarily with ext3); what
> recommendations do you have?  Just heavily test our own 2.6.24 + your
> evolving "close, but not ready for merge" -mm writeback patchset?

It's not ready mainly because it is fresh made and need more
feedbacks. It's doing OK on my desktop :-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]PCIE ASPM support - takes 2

2008-01-15 Thread Shaohua Li

On Tue, 2008-01-15 at 22:56 -0500, [EMAIL PROTECTED] wrote:
> On Tue, 15 Jan 2008 13:02:26 +0800, Shaohua Li said:
> 
> > In my test, power difference between powersave mode and performance mode
> > is about 1.3w in a system with 3 PCIE links.
> 
> Do you have any numbers on what the added latency is for powersave mode, and
> a rough idea of how quickly chipsets will drop to low-power? It may affect
> usability a lot if it's "adds 10ms latency after 100ms idle" or "adds 100ms
> latency after 5 seconds idle" or some other pattern...
> 
> (The chipset in my laptop claims to be an 82801G with 4 PCI-Express ports on
> it - I'm trying to get a rough idea what usage I'd get out of that feature..)
No, I thought to get the latency impact with ASPM enabled, but haven't
found a way to measure it. This is why the default setting of ASPM
currently is using BIOS setting.

Thanks,
Shaohua

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression: 100% io-wait with 2.6.24-rcX

2008-01-15 Thread Fengguang Wu
On Tue, Jan 15, 2008 at 10:42:13PM +0100, Ingo Molnar wrote:
> 
> * Fengguang Wu <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, Jan 14, 2008 at 12:41:26PM +0100, Peter Zijlstra wrote:
> > > 
> > > On Mon, 2008-01-14 at 12:30 +0100, Joerg Platte wrote:
> > > > Am Montag, 14. Januar 2008 schrieb Fengguang Wu:
> > > > 
> > > > > Joerg, this patch fixed the bug for me :-)
> > > > 
> > > > Fengguang, congratulations, I can confirm that your patch fixed the 
> > > > bug! With 
> > > > previous kernels the bug showed up after each reboot. Now, when booting 
> > > > the 
> > > > patched kernel everything is fine and there is no longer any suspicious 
> > > > iowait!
> > > > 
> > > > Do you have an idea why this problem appeared in 2.6.24? Did somebody 
> > > > change 
> > > > the ext2 code or is it related to the changes in the scheduler?
> > > 
> > > It was Fengguang who changed the inode writeback code, and I guess the
> > > new and improved code was less able do deal with these funny corner
> > > cases. But he has been very good in tracking them down and solving them,
> > > kudos to him for that work!
> > 
> > Thank you.
> > 
> > In particular the bug is triggered by the patch named:
> > "writeback: introduce writeback_control.more_io to indicate more io"
> > That patch means to speed up writeback, but unfortunately its
> > aggressiveness has disclosed bugs in reiserfs, jfs and now ext2.
> > 
> > Linus, given the number of bugs it triggered, I'd recommend revert 
> > this patch(git commit 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b). Let's 
> > push it back to -mm tree for more testings?
> 
> i dont think a revert at this stage is a good idea and i'm not sure 
> pushing it back into -mm would really expose more of these bugs. And 
> these are real bugs in filesystems - bugs which we want to see fixed 
> anyway. You are also tracking down those bugs very fast.
> 
> [ perhaps, if it's possible technically (and if it is clean enough), you
>   might want to offer a runtime debug tunable that can be used to switch
>   off the new aspects of your code. That would speed up testing, in case
>   anyone suspects the new writeback code. ]

The patch is too aggressive in itself. We'd better not risk on it.
The iowait is only unpleasant not destructive. But it will hurt if
many users complaints. Comment says that "nfs_writepages() sometimes
bales out without doing anything."

However I have an improved and more safe patch now. It won't iowait
when nfs_writepages() bale out without increasing pages_skipped, or
even when some buggy filesystem forget to clear PAGECACHE_TAG_DIRTY.
(The magic lies in the first chunk below.)

Mike, you can use this one on 2.6.24.


---
 fs/fs-writeback.c |   17 +++--
 include/linux/writeback.h |1 +
 mm/page-writeback.c   |9 ++---
 3 files changed, 22 insertions(+), 5 deletions(-)

--- linux.orig/fs/fs-writeback.c
+++ linux/fs/fs-writeback.c
@@ -284,7 +284,16 @@ __sync_single_inode(struct inode *inode,
 * soon as the queue becomes uncongested.
 */
inode->i_state |= I_DIRTY_PAGES;
-   requeue_io(inode);
+   if (wbc->nr_to_write <= 0)
+   /*
+* slice used up: queue for next turn
+*/
+   requeue_io(inode);
+   else
+   /*
+* somehow blocked: retry later
+*/
+   redirty_tail(inode);
} else {
/*
 * Otherwise fully redirty the inode so that
@@ -479,8 +488,12 @@ sync_sb_inodes(struct super_block *sb, s
iput(inode);
cond_resched();
spin_lock(_lock);
-   if (wbc->nr_to_write <= 0)
+   if (wbc->nr_to_write <= 0) {
+   wbc->more_io = 1;
break;
+   }
+   if (!list_empty(>s_more_io))
+   wbc->more_io = 1;
}
return; /* Leave any unwritten inodes on s_io */
 }
--- linux.orig/include/linux/writeback.h
+++ linux/include/linux/writeback.h
@@ -62,6 +62,7 @@ struct writeback_control {
unsigned for_reclaim:1; /* Invoked from the page allocator */
unsigned for_writepages:1;  /* This is a writepages() call */
unsigned range_cyclic:1;/* range_start is cyclic */
+   unsigned more_io:1; /* more io to be dispatched */
 };
 
 /*
--- linux.orig/mm/page-writeback.c
+++ linux/mm/page-writeback.c
@@ -558,6 +558,7 @@ static void background_writeout(unsigned

Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]

2008-01-15 Thread Valdis . Kletnieks
On Tue, 15 Jan 2008 10:09:16 EST, Ric Wheeler said:
> I actually think that the value of this kind of reduction is huge. We 
> have seen fsck run for days (not just hours) which makes the "restore 
> from backup" versus "fsck" decision favor the tapes...

Funny thing is that for many of these sorts of cases, "restore from backup"
is *also* a "days" issue unless you do a *lot* of very clever planning
ahead to be able to get multiple tape drives moving at the same time while
not causing issues at the receiving end either





pgpA1bRM0Y3yi.pgp
Description: PGP signature


Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-15 Thread David Miller
From: "Brandeburg, Jesse" <[EMAIL PROTECTED]>
Date: Tue, 15 Jan 2008 13:53:43 -0800

> The tx code has an "early exit" that tries to limit the amount of tx
> packets handled in a single poll loop and requires napi or interrupt
> rescheduling based on the return value from e1000_clean_tx_irq.

That explains everything, thanks Jesse.

Ok, here is the patch I'll propose to fix this.  The goal is to make
it as simple as possible without regressing the thing we were trying
to fix.

Something more sophisticated can be done later.

Three of the 5 Intel drivers had the TX breakout logic.  e1000,
e1000e, and ixgbe.  e100 and ixgb did not, so they don't have any
problems we need to fix here.

What the fix does is behave as if the budget was fully consumed if
*_clean_tx_irq() returns true.

The only valid way to return from ->poll() without copleting the NAPI
poll is by returning work_done == budget.  That signals to the caller
that the NAPI instance has not been descheduled and therefore the
caller fully owns the NAPI context.

This does mean that for these drivers any time TX work is done, we'll
loop at least one extra time in the ->poll() loop of net_rx_work() but
that is historically what these drivers have caused to happen for
years.

For 2.6.25 or similar I would suggest investigating courses of action
to bring closure and consistency to this:

1) Determine whether the loop breakout is actually necessary.
   Jesse explained to me that they had seen a case where a
   thread on one cpu feeding the TX ring could keep a thread
   on another cpu constantly running the *_clean_tx_irq() code
   in a loop.

   I find this hard to believe since even the slowest CPU should be
   able to free up TX entries faster than they can be transmitted on
   gigabit links :-)

2) If the investigation in #1 deems the breakout logic is necessary,
   then consistently amongst all the 5 drivers a policy should be
   implemented which is integrated with the NAPI budgetting logic.
   For example, the simplest thing to do is to pass the budget and the
   "work_done" thing down into *_clean_tx_irq() and break out if it is
   exceeded.

   As a further refinement we can say that TX work is about 1/4 the
   expense of RX work and adjust the budget checking logic to match
   that.

[NET]: Fix TX timeout regression in Intel drivers.

This fixes a regression added by changeset
53e52c729cc169db82a6105fac7a166e10c2ec36 ("[NET]: Make ->poll()
breakout consistent in Intel ethernet drivers.")

As pointed out by Jesse Brandeburg, for three of the drivers edited
above there is breakout logic in the *_clean_tx_irq() code to prevent
running TX reclaim forever.  If this occurs, we have to elide NAPI
poll completion or else those TX events will never be serviced.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 13d57b0..0c9a6f7 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
 {
struct e1000_adapter *adapter = container_of(napi, struct 
e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
-   int work_done = 0;
+   int tx_cleaned = 0, work_done = 0;
 
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -3929,14 +3929,17 @@ e1000_clean(struct napi_struct *napi, int budget)
 * simultaneously.  A failure obtaining the lock means
 * tx_ring[0] is currently being cleaned anyway. */
if (spin_trylock(>tx_queue_lock)) {
-   e1000_clean_tx_irq(adapter,
-  >tx_ring[0]);
+   tx_cleaned = e1000_clean_tx_irq(adapter,
+   >tx_ring[0]);
spin_unlock(>tx_queue_lock);
}
 
adapter->clean_rx(adapter, >rx_ring[0],
  _done, budget);
 
+   if (tx_cleaned)
+   work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
if (likely(adapter->itr_setting & 3))
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4a6fc74..2ab3bfb 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1384,7 +1384,7 @@ static int e1000_clean(struct napi_struct *napi, int 
budget)
 {
struct e1000_adapter *adapter = container_of(napi, struct 
e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
-   int work_done = 0;
+   int tx_cleaned = 0, work_done = 0;
 
/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -1394,12 +1394,15 @@ static int e1000_clean(struct napi_struct *napi, int 
budget)
 * simultaneously.  A failure obtaining the lock means
 * tx_ring is currently being cleaned anyway. */
if (spin_trylock(>tx_queue_lock)) {

Re: [PATCH 001 of 6] md: Fix an occasional deadlock in raid5

2008-01-15 Thread dean gaudet
On Mon, 14 Jan 2008, NeilBrown wrote:

> 
> raid5's 'make_request' function calls generic_make_request on
> underlying devices and if we run out of stripe heads, it could end up
> waiting for one of those requests to complete.
> This is bad as recursive calls to generic_make_request go on a queue
> and are not even attempted until make_request completes.
> 
> So: don't make any generic_make_request calls in raid5 make_request
> until all waiting has been done.  We do this by simply setting
> STRIPE_HANDLE instead of calling handle_stripe().
> 
> If we need more stripe_heads, raid5d will get called to process the
> pending stripe_heads which will call generic_make_request from a
> different thread where no deadlock will happen.
> 
> 
> This change by itself causes a performance hit.  So add a change so
> that raid5_activate_delayed is only called at unplug time, never in
> raid5.  This seems to bring back the performance numbers.  Calling it
> in raid5d was sometimes too soon...
> 
> Cc: "Dan Williams" <[EMAIL PROTECTED]>
> Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

probably doesn't matter, but for the record:

Tested-by: dean gaudet <[EMAIL PROTECTED]>

this time i tested with internal and external bitmaps and it survived 8h 
and 14h resp. under the parallel tar workload i used to reproduce the 
hang.

btw this should probably be a candidate for 2.6.22 and .23 stable.

thanks
-dean
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kprobe: missing cast

2008-01-15 Thread Harvey Harrison
On Wed, 2008-01-16 at 10:22 +0530, Ananth N Mavinakayanahalli wrote:
> On Mon, Jan 14, 2008 at 07:21:55PM -0800, Stephen Hemminger wrote:
> > Fix warning from missing cast, maybe a result of the x86 merge?
> > 
> > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> 
> Acked-by: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>
> 
> Thanks Stephen!
> 

kprobes_32|64.c have already been merged to kprobes.c in the x86.git
tree.  A stack_addr() helper was added to deal with the differences
here between 32 and 64 bit.

I'm pretty sure the x86 kprobes unification is headed for 2.6.25, so
it will get fixed.

Harvey

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-15 Thread Fengguang Wu
On Tue, Jan 15, 2008 at 08:42:36PM -0800, Andrew Morton wrote:
> On Wed, 16 Jan 2008 12:25:53 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:
> 
> > list_heads are OK if we use them for one and only function.
> 
> Not really.  They're inappropriate when you wish to remember your
> position in the list while you dropped the lock (as we must do in
> writeback).
> 
> A data structure which permits us to interate across the search key rather
> than across the actual storage locations is more appropriate.

I totally agree with you. What I mean is to first do the split of
functions - into three: ordering, starvation prevention, and blockade
waiting. Then to do better ordering by adopting radix tree(or rbtree
if radix tree is not enough), and lastly get rid of the list_heads to
avoid locking. Does it sound like a good path?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] kprobe: missing cast

2008-01-15 Thread Ananth N Mavinakayanahalli
On Mon, Jan 14, 2008 at 07:21:55PM -0800, Stephen Hemminger wrote:
> Fix warning from missing cast, maybe a result of the x86 merge?
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Acked-by: Ananth N Mavinakayanahalli <[EMAIL PROTECTED]>

Thanks Stephen!

Ananth

> 
>   CC  arch/x86/kernel/kprobes_32.o
> arch/x86/kernel/kprobes_32.c: In function ‘longjmp_break_handler’:
> arch/x86/kernel/kprobes_32.c:729: warning: comparison of distinct pointer 
> types lacks a cast
> 
> --- a/arch/x86/kernel/kprobes_32.c2008-01-14 19:18:01.0 -0800
> +++ b/arch/x86/kernel/kprobes_32.c2008-01-14 19:18:08.0 -0800
> @@ -726,7 +726,7 @@ int __kprobes longjmp_break_handler(stru
>   struct jprobe *jp = container_of(p, struct jprobe, kp);
>  
>   if ((addr > (u8 *) jprobe_return) && (addr < (u8 *) jprobe_return_end)) 
> {
> - if (>esp != kcb->jprobe_saved_esp) {
> + if ((unsigned long *) >esp != kcb->jprobe_saved_esp) {
>   struct pt_regs *saved_regs = >jprobe_saved_regs;
>   printk("current esp %p does not match saved esp %p\n",
>  >esp, kcb->jprobe_saved_esp);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 2.6.24-rc8

2008-01-15 Thread Dave Young
On Jan 16, 2008 12:50 PM, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Wed, 16 Jan 2008, Dave Young wrote:
> >
> > The kernel.org downloading seems not available, could you update?
>
> It should be there, but it may take a while to mirror out. It's definitely
> there on the master site already (and gitweb shows it, so the git repo has
> already mirrored out at least to that site).

I see it,  thanks. Maybe I'm too anxious to get it :)

>
> Linus
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 2.6.24-rc8

2008-01-15 Thread Linus Torvalds


On Wed, 16 Jan 2008, Dave Young wrote:
> 
> The kernel.org downloading seems not available, could you update?

It should be there, but it may take a while to mirror out. It's definitely 
there on the master site already (and gitweb shows it, so the git repo has 
already mirrored out at least to that site).

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 2.6.24-rc8

2008-01-15 Thread Dave Young
Hi, linus

The kernel.org downloading seems not available, could you update?

Regards
dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-15 Thread Andrew Morton
On Wed, 16 Jan 2008 12:25:53 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:

> list_heads are OK if we use them for one and only function.

Not really.  They're inappropriate when you wish to remember your
position in the list while you dropped the lock (as we must do in
writeback).

A data structure which permits us to interate across the search key rather
than across the actual storage locations is more appropriate.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH for 2.6.24] fix workqueue creation API lockdep interaction

2008-01-15 Thread Dave Young
On Jan 15, 2008 9:04 PM, Johannes Berg <[EMAIL PROTECTED]> wrote:
> Dave Young reported warnings from lockdep that the workqueue API
> can sometimes try to register lockdep classes with the same key
> but different names. This is not permitted in lockdep.
>
> Unfortunately, I was unaware of that restriction when I wrote
> the code to debug workqueue problems with lockdep and used the
> workqueue name as the lockdep class name. This can obviously
> lead to the problem if the workqueue name is dynamic.
>
> This patch solves the problem by always using a constant name
> for the workqueue's lockdep class, namely either the constant
> name that was passed in or a string consisting of the variable
> name.
>
> Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>
> ---
> Please be careful with this patch, I haven't been able to test it so far
> because my powerbook doesn't have lockdep.

Hi,
Just for confirm, the warnings didn't trigger after applied your patch, thanks.

Regards
dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Linux 2.6.24-rc8

2008-01-15 Thread Linus Torvalds

I do hate doing -rc's for so long, but I hate releasing when not feeling 
it's simmered enough even more. And the changes since -rc7 are bigger than 
the changes between -rc6 and -rc7 were (partly probably because people 
were still on vacation between -rc6 and -rc7, so we had something of a 
small trickle come in afterwards).

That said, the changes here really aren't that big, and the shortlog is 
fairly boring. So I'm pretty sure this is the last -rc, and the final 
2.6.24 will probably be out next weekend or so. But in the meantime, let's 
give this a final shakedown, and see if we can fix any last regressions 
still.

(I also get the feeling that more people are already working on 2.6.25 
features, so it's not like delaying 2.6.24 will help past some point 
anyway, but let's give it a few more days).

Anyway: drivers, networking, some arch updates, and ACPI. A fair number of 
really small commits. I honestly can't really improve on the appended 
shortlog - there isn't any over-arching theme, except for "lots of small 
boring fixes". 

Which is as it should be, of course.

Linus

---
Adrian Bunk (3):
  [NET]: Fix netx-eth.c compilation.
  scsi/qla2xxx/qla_os.c section fix
  OSS msnd: fix array overflows

Akinobu Mita (1):
  xip: fix get_zeroed_page with __GFP_HIGHMEM

Al Viro (4):
  xircom_cb endianness fixes
  de4x5 fixes
  endianness noise in tulip_core
  libata fixes for sparse-found problems

Alan Cox (5):
  pl2303: Fix mode switching regression
  libata-sff: PCI IRQ handling fix
  pata_pdc202xx_old: Further fixups
  ACPI : Not register gsi for PCI IDE controller in legacy mode
  libata: correct handling of TSS DVD

Alexey Starikovskiy (2):
  ACPI: EC: Enable boot EC before bus_scan
  ACPI: Make sysfs interface in ACPI power optional.

Amos Waterland (1):
  [IPV4] ipconfig: Fix regression in ip command line processing

Andrew Lutomirski (1):
  mac80211: return an error when SIWRATE doesn't match any rate

Andrew Morton (2):
  [libata] pata_bf54x: checkpatch fixes
  [libata] core checkpatch fix

Andy Wingo (1):
  macintosh: fix fabrication of caplock key events

Anton Vorontsov (1):
  fs_enet: check for phydev existence in the ethtool handlers

Atsushi Nemoto (2):
  [MIPS] Move inclusing of kernel/time/Kconfig menu to appropriate place
  [MIPS] Replace 40c7869b693b18412491fdcff64682215b739f9e kludge

Auke Kok (1):
  [NET] Intel ethernet drivers: update MAINTAINERS

Aurelien Jarno (1):
  [MIPS] Kconfig fixes for BCM47XX platform

Benjamin Herrenschmidt (1):
  [POWERPC] Workaround for iommu page alignment

Bernhard Walle (1):
  x86: fix RTC_AIE with CONFIG_HPET_EMULATE_RTC

Björn Steinbrink (1):
  [FORCEDETH]: Fix reversing the MAC address on suspend.

Bob Moore (1):
  ACPICA: fix acpi_serialize hang regression

Brian Haley (1):
  [IPV6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()

Brice Goglin (1):
  [LRO] Fix lro_mgr->features checks

Carmelo Amoroso (1):
  sh: Fix argument page dcache flushing regression.

Chas Williams (1):
  [ATM]: [nicstar] delay irq setup until card is configured

Christoph Hellwig (1):
  [XFS] fix unaligned access in readdir

Christoph Lameter (1):
  quicklists: Only consider memory that can be used with GFP_KERNEL

Dan Williams (1):
  md: fix data corruption when a degraded raid5 array is reshaped

Dave Dillow (1):
  IB/srp: Release transport before removing host

Dave Young (1):
  [BLUETOOTH]: rfcomm tty BUG_ON() code fix

David Brownell (1):
  spi_bitbang: always grab lock with irqs blocked

David S. Miller (12):
  [NIU]: Missing ->last_rx update.
  [NIU]: Fix potentially stuck TCP socket send queues.
  [NIU]: Update driver version and release date.
  [NET]: Do not grab device reference when scheduling a NAPI poll.
  [NET]: Add NAPI_STATE_DISABLE.
  [NET]: Do not check netif_running() and carrier state in ->poll()
  [NETXEN]: Fix ->poll() done logic.
  [NET]: Fix drivers to handle napi_disable() disabling interrupts.
  [NET]: Stop polling when napi_disable() is pending.
  [NET]: Make ->poll() breakout consistent in Intel ethernet drivers.
  [SPARC]: Make gettimeofday() monotonic again.
  [SPARC64]: Fix build with SPARSEMEM_VMEMMAP disabled.

David Smith (1):
  TPM: fix suspend and resume failure

Dhananjay Phadke (1):
  netxen: fix byte-swapping in tx and rx

Dmitri Vorobiev (1):
  [MIPS] Malta: Fix software reset on big endian

Dmitry Baryshkov (1):
  Input: Handle EV_PWR type of input caps in input_set_capability.

Dotan Barak (1):
  IB/mlx4: Fix value of pkey_index in QP1 completions

Emil Medve (1):
  Fixed a small typo in the loopback driver

Eric Dumazet (6):
  [IPV4] ROUTE: ip_rt_dump() is unecessary slow
  [XFRM]: xfrm_algo_clone() allocates too much memory
  [SOCK]: Adds a 

Re: [CALL FOR TESTING] Make Ext3 fsck way faster [2.6.24-rc6 -mm patch]

2008-01-15 Thread Valdis . Kletnieks
On Tue, 15 Jan 2008 03:04:41 PST, Andrew Morton said:

> In any decent environment, people will fsck their ext3 filesystems during
> planned downtime, and the benefit of reducing that downtime from 6
> hours/machine to 2 hours/machine is probably fairly small, given that there
> is no service interruption.  (The same applies to desktops and laptops).

I've got multiple boxes across the hall that have 50T of disk on them, in one
case as one large filesystem, and the users want *more* *bigger* still (damned
researchers - you put a 15 teraflop supercomputer in the room, and then they
want someplace to *put* all the numbers that come spewing out of there.. ;)

There comes a point where that downtime gets too long to be politically
expedient.  6->2 may not be a biggie, because you can likely get a 6 hour
window.  24->8 suddenly looks a lot different.

(Having said that, I'll admit the one 52T filesystem is an SGI Itanium box
running Suse and using XFS rather than ext3).

Has anybody done a back-of-envelope of what this would do for fsck times for
a "max realistically achievable ext3 filesystem" (i.e. 100T-200T or ext3
design limit, whichever is smaller)?

(And one of the research crew had a not-totally-on-crack proposal to get a
petabyte of spinning oxide.  Figuring out how to back that up would probably
have landed firmly in my lap.  Ouch. ;)


pgp6aWsioIWDY.pgp
Description: PGP signature


Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-15 Thread Fengguang Wu
On Tue, Jan 15, 2008 at 07:44:15PM -0800, Andrew Morton wrote:
> On Wed, 16 Jan 2008 11:01:08 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, Jan 15, 2008 at 09:53:42AM -0800, Michael Rubin wrote:
> > > On Jan 15, 2008 12:46 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > > Just a quick question, how does this interact/depend-uppon etc.. with
> > > > Fengguangs patches I still have in my mailbox? (Those from Dec 28th)
> > > 
> > > They don't. They apply to a 2.6.24rc7 tree. This is a candidte for 2.6.25.
> > > 
> > > This work was done before Fengguang's patches. I am trying to test
> > > Fengguang's for comparison but am having problems with getting mm1 to
> > > boot on my systems.
> > 
> > Yeah, they are independent ones. The initial motivation is to fix the
> > bug "sluggish writeback on small+large files". Michael introduced
> > a new rbtree, and me introduced a new list(s_more_io_wait).
> > 
> > Basically I think rbtree is an overkill to do time based ordering.
> > Sorry, Michael. But s_dirty would be enough for that. Plus, s_more_io
> > provides fair queuing between small/large files, and s_more_io_wait
> > provides waiting mechanism for blocked inodes.
> > 
> > The time ordered rbtree may delay io for a blocked inode simply by
> > modifying its dirtied_when and reinsert it. But it would no longer be
> > that easy if it is to be ordered by location.
> 
> What does the term "ordered by location" mean?  Attemting to sort inodes by
> physical disk address?  By using their i_ino as a key?
> 
> That sounds optimistic.

Yes, exactly. Think about email servers with lots of dirty files.

> > If we are going to do location based ordering in the future, the lists
> > will continue to be useful. It would simply be a matter of switching
> > from the s_dirty(order by time) to some rbtree or radix tree(order by
> > location).
> > 
> > We can even provide both ordering at the same time to different
> > fs/inodes which is configurable by the user. Because the s_dirty
> > and/or rbtree would provide _only_ ordering(not faireness or waiting)
> > and hence is interchangeable.
> > 
> > This patchset could be a good reference. It does location based
> > ordering with radix tree:
> > 
> > [RFC][PATCH] clustered writeback 
> 
> list_heads are just the wrong data structure for this function.  Especially
> list_heads which are protected by a non-sleeping lock.

list_heads are OK if we use them for one and only function. We have
been trying to jam too much into s_dirty in the past.  Grabbing a
refcount could be better than locking - anyway if we split the
functions today, it would be easy to replace the list_heads one by
one in the future.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 1/3] drivers/misc :UCC based TDM driver for MPC83xx platforms.

2008-01-15 Thread Aggrwal Poonam
Thanks Morton for your comments,
I shall incorporate them and reesnd the patch.

With Regards
Poonam 
 
 

-Original Message-
From: Andrew Morton [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, January 15, 2008 2:45 AM
To: Aggrwal Poonam
Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org;
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; Barkowski Michael; Phillips Kim; Kalra Ashish;
Cutler Richard
Subject: Re: [PATCH 1/3] drivers/misc :UCC based TDM driver for MPC83xx
platforms.

On Mon, 10 Dec 2007 17:34:44 +0530 (IST)
Poonam_Aggrwal-b10812 <[EMAIL PROTECTED]> wrote:

> From: Poonam Aggrwal <[EMAIL PROTECTED]>
> 
> The UCC TDM driver basically multiplexes and demultiplexes data from 
> different channels. It can interface with for example SLIC kind of 
> devices to receive TDM data  demultiplex it and send to upper 
> applications. At the transmit end it receives data for different 
> channels multiplexes it and sends them on the TDM channel. It 
> internally uses TSA( Time Slot Assigner) which does multiplexing and 
> demultiplexing, UCC to perform SDMA between host buffers and the TSA,
CMX to connect TSA to UCC.
> 
> This driver will run on MPC8323E-RDB platforms.
> 
> ...
>
> +#define PREV_PHASE(x) ((x == 0) ? MAX_PHASE : (x - 1)) #define 
> +NEXT_PHASE(x) (((x + 1) > MAX_PHASE) ? 0 : (x + 1))

These macros can reference their arg more than once and are hence
dangerous.  What does PREV_PHASE(foo++) do to foo?

And, in general: do not implement in cpp that which could have been
implemented in C.

> +static struct ucc_tdm_info utdm_primary_info = {
> + .uf_info = {
> + .tsa = 1,
> + .cdp = 1,
> + .cds = 1,
> + .ctsp = 1,
> + .ctss = 1,
> + .revd = 1,
> + .urfs = 0x128,
> + .utfs = 0x128,
> + .utfet = 0,
> + .utftt = 0x128,
> + .ufpt = 256,
> + .ttx_trx =
UCC_FAST_GUMR_TRANSPARENT_TTX_TRX_TRANSPARENT,
> + .tenc = UCC_FAST_TX_ENCODING_NRZ,
> + .renc = UCC_FAST_RX_ENCODING_NRZ,
> + .tcrc = UCC_FAST_16_BIT_CRC,
> + .synl = UCC_FAST_SYNC_LEN_NOT_USED,
> + },
> + .ucc_busy = 0,
> +};
> +
> +static struct ucc_tdm_info utdm_info[8];
> +
> +static void dump_siram(struct tdm_ctrl *tdm_c) { #if defined(DEBUG)

Microscopic note: kernel code tends to do

#ifdef FOO

if only one identifier is being tested and

#if defined(FOO) && defined(BAR)

if more than one is being tested.

There is no rational reason for this ;)

> + int i;
> + u16 phy_num_ts;
> +
> + phy_num_ts = tdm_c->physical_num_ts;
> +
> + pr_debug("SI TxRAM dump\n");
> + /* each slot entry in SI RAM is of 2 bytes */
> + for (i = 0; i < phy_num_ts * 2; i++)
> + pr_debug("%x ", in_8(_immr->sir.tx[i]));
> + pr_debug("\nSI RxRAM dump\n");
> + for (i = 0; i < phy_num_ts * 2; i++)
> + pr_debug("%x ", in_8(_immr->sir.rx[i]));
> + pr_debug("\n");
> +#endif
> +}
> +
> +/*
> + * converts u-law compressed samples to linear PCM
> + * If the CONFIG_TDM_LINEAR_PCM flag is not set the
> + * TDM driver receives u-law compressed data from the
> + * SLIC device. This function converts the compressed
> + * data to linear PCM and sends it to upper layers.
> + */
> +static inline int ulaw2int(unsigned char log) {
> + u32 sign, segment, temp, quant;
> + int val;
> +
> + temp = log ^ 0xFF;
> + sign = (temp & 0x80) >> 7;
> + segment = (temp & 0x70) >> 4;
> + quant = temp & 0x0F;
> + quant <<= 1;
> + quant += 33;
> + quant <<= segment;
> + if (sign)
> + val = 33 - quant;
> + else
> + val = quant - 33;
> +
> + val *= 4;
> + return val;
> +}
> +
> +/*
> + * converts linear PCM samples to u-law compressed format.
> + * If the CONFIG_TDM_LINEAR_PCM flag is not set the
> + * TDM driver calls this function to convert the PCM samples
> + * to u-law compressed format before sending them to SLIC
> + * device.
> + */
> +static inline u8 int2ulaw(short linear) {
> + u8  quant, ret;
> + u16 output, absol, temp;
> + u32 i, sign;
> + char segment;
> +
> + ret = 0;
> + if (linear >= 0)
> + linear = (linear >> 2);
> + else
> + linear = (0xc000 | (linear >> 2));
> +
> + absol = abs(linear) + 33;
> + temp = absol;
> + sign = (linear >= 0) ? 1 : 0;
> + for (i = 0; i < 16; i++) {
> + output = temp & 0x8000;
> + if (output)
> + break;
> + temp <<= 1;
> + }
> + segment = 11 - i;
> + quant = (absol >> segment) & 0x0F;
> + segment--;
> + segment <<= 4;
> + output = segment + quant;
> + if (absol > 8191)
> + output = 0x7F;
> + if (sign)
> + ret ^= 0xFF;
> + else
> + ret ^= 0x7F;
> + return ret;
> +}

hrm, how many copies of ulaw/alaw conversion 

Re: nosmp/maxcpus=0 or 1 -> TSC unstable

2008-01-15 Thread Andi Kleen
Pete Wyckoff <[EMAIL PROTECTED]> writes:

> We've seen the same problem.  We use gettimeofday() for timing of
> network-ish operations on the order of 10-50 us.  But not having
> the TSC makes gettimeofday() itself very slow, on the order of 30 us.
>
> Here's what we've been using for quite a few kernel versions.  I've
> not tried to submit it for fear that it could break some other
> scenario, as you suggest.  Although in hotplug scenarios, this
> function unsynchronized_tsc() should get rerun and disable TSC if
> more processors arrive.
>
> At least count this as a "me too".

The patch is wrong of course because when this is checked not 
all CPUs are booted yet. So it will always use TSC even when
multiple CPUs are going to be booted.

The right fix for Dean's problem would be probably to add a new 
parameter that disables CPU hotplug and forces smp_possible_map
to max_cpus, which could then be set with maxcpus=1 (or similar) 

I would not recommend to use nosmp or maxcpus=0 either because it will
disable the APIC and that is typically a bad thing (especially if you
need network performance) 

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


2.6.24-rc7-rt3

2008-01-15 Thread Steven Rostedt
We are pleased to announce the 2.6.24-rc7-rt3 tree, which can be
downloaded from the location:

  http://rt.et.redhat.com/download/

Information on the RT patch can be found at:

  http://rt.wiki.kernel.org/index.php/Main_Page

Changes since 2.6.24-rc7-rt2

  - KVM updates (Thomas Gleixner)

  - more PPC32 compile fixes

  - Mips clean ups (Frank Rowand)

  - workqueue lock leak cleanup (Daniel Walker)

  - hacked fix for rt-migration (Mike Galbraith)


to build a 2.6.24-rc7-rt3 tree, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
  http://www.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.24-rc7.bz2
  http://rt.et.redhat.com/download/patch-2.6.24-rc7-rt3.bz2


And like always, my RT version of Matt Mackall's ketchup will get this
for you nicely:

  http://people.redhat.com/srostedt/rt/tools/ketchup-0.9.8-rt3


The broken out patches are also available.

-- Steve



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 3/5] add /dev/mem_notify device

2008-01-15 Thread Marcelo Tosatti
On Wed, Jan 16, 2008 at 10:57:16AM +0900, KOSAKI Motohiro wrote:
> Hi Pavel
> 
> > >   err = poll(, 1, -1); // wake up at low memory
> > > 
> > > ...
> > > 
> > 
> > Nice, this is really needed for openmoko, zaurus, etc
> > 
> > But this changelog needs to go into Documentation/...
> > 
> > ...and /dev/mem_notify is really a bad name. /dev/memory_low?
> > /dev/oom?
> 
> thank you for your kindful advise.
> 
> but..
> 
> to be honest, my english is very limited.
> I can't make judgments name is good or not.
> 
> Marcelo, What do you think his idea?

"mem_notify" sounds alright, but I don't really care.

Notify:

To give notice to; inform: notified the citizens of the curfew by
posting signs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: State of kgdb on x86-64

2008-01-15 Thread Jason Wessel
Jan Kiszka wrote:
> Jason Wessel wrote:
>   
>> Jan Kiszka wrote:
>> 
>>> Jason Wessel wrote:
>>>   
>>>   
 It was working at the point that I tested it with the 2.6.24-rc5 on
 x86_64.  However I suspect my kernel config may differ drastically from
 what you are using.

 Without any other context provided than the generic message, it is hard
 to know what might have happened. 
 
 
>>> Here is the promised .config. I could also dig out the backtrace of the
>>> panic as kgdb sees it if that helps, just let me know.
>>>
>>> Jan
>>>
>>>   
>>>   
>> The backtrace might be very telling as to what happened.  More
>> information is always better than less :-)
>>
>> 
>
> My primary test box is again out of reach, but meanwhile I was able to
> reproduce some kind of problem under QEMU - that one at least is
> triggered by SMP. With only one CPU -> all apparently fine. Once booting
> QEMU with "-smp 2" -> this happens:
>
> (gdb) tar remote /dev/pts/6
> Remote debugging using /dev/pts/6
> Not all CPUs have been synced for KGDB
> breakpoint () at kernel/kgdb.c:1895
> 1895wmb(); /* Sync point after breakpoint */
> (gdb) c
> Continuing.
> Not all CPUs have been synced for KGDB
> [New Thread 32769]
>
> Program received signal SIGFPE, Arithmetic exception.
> [Switching to Thread 32769]
> 0x8020adb7 in default_idle () at include/asm/irqflags_64.h:140
> 140 __asm__ __volatile__("sti; hlt" : : : "memory");
> (gdb) bt
> #0  0x8020adb7 in default_idle () at include/asm/irqflags_64.h:140
> #1  0x8020ae65 in cpu_idle () at arch/x86/kernel/process_64.c:225
> #2  0x8021ccb9 in start_secondary () at 
> arch/x86/kernel/smpboot_64.c:375
> #3  0x in ?? ()
> (gdb) 
> 
>
> The problem seems to be related to continuing SMP boxes. I'm able to
> boot my box up if I leave kgdb unattached. But when I then later attach
> and continue execution, I get the same crash. Any ideas what goes wrong,
> any suggestion where to start digging? Maybe at "Not all CPUs have been
> synched"?
>   

Generally speaking when you get an error that the CPUs have not been
synced, it means that the IPI which was sent to all the non-master
processors failed.  I took a quick look and it appears that the DIE_TRAP
is occuring after kgdb sends the IPI to the non master cores with the call:

send_IPI_allbutself(APIC_DM_NMI);

In prior kernels that ultimately resulted in an NMI trap.  I am not sure
of the cause of the DIE_TRAP as a result of the IPI.  For now, if you
add the statement "case DIE_TRAP:" right before "case
DIE_NMIWATCHDOG:" in arch/x86/kernel/kgdb_64.c it will sync te
processors, however the kernel should not be trapping for this error
code from the IPI event.  I suspect there has been some kind of change
to the way the IPI/NMI handling is being done in the latest kernels.

Jason.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc7-rt2

2008-01-15 Thread Steven Rostedt

On Tue, 15 Jan 2008 [EMAIL PROTECTED] wrote:

> On Tue, 15 Jan 2008 02:37:37 +0200, =?utf-8?q?S=2E=C3=87a=C4=9Flar?= Onur 
> said:
> > And because of mcount-add-basic-support-for-gcc-profiler-instrum.patch, 
> > closed
> > source nvidia-new module cannot be used with this release (mcount is 
> > exported
> > GPL only), i know this is not supported but i used it with that [2] patch up
> > until now without a single problem.
>
> Playing devil's advocate here - the claim is that EXPORT_SYMBOL_GPL is to
> indicate that code is getting too chummy with Linux internals.
>
> However, in *this* case, isn't it "code that is too chummy with *GCC* 
> internals",
> and thus it isn't our place to say what can and can't be done with code that
> is derivative of the GCC compiler? ;)

Actually, it got put in there by accident. I usually default all my
exports as GPL.  But this breaks pretty much everything, so I'll leave it
as EXPORT_SYMBOL.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc7-rt2

2008-01-15 Thread Steven Rostedt

On Tue, 15 Jan 2008, [utf-8] S.Ã^GaÄ^_lar Onur wrote:

>
> 2.6.24-rc7-rt2 (-rt2 patchset on top of Linus's current git commit
> 031f2dcd7075e218e74dd7f942ad015cf82dffab) starts to complain like following
> (full dmesg can be found @ [1]) when try to login from console (the other
> acpi related errors also existed in 2.6.24-rc5-rt1) and FYI, plain 2.6.24-rc7
> (again commit 031f2dcd7075e218e74dd7f942ad015cf82dffab) has no issues.

Do you get the same issues if you add to -rc7 and not git.

>
> [...]
> sysfs: duplicate filename 'vcs1' can not be created
> WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()
> Pid: 1298, comm: mingetty Not tainted 2.6.24-rc7-rt2-99 #1
> [...]
>
> And because of mcount-add-basic-support-for-gcc-profiler-instrum.patch, closed
> source nvidia-new module cannot be used with this release (mcount is exported
> GPL only), i know this is not supported but i used it with that [2] patch up
> until now without a single problem.

Ah, sorry about that. I'll try to fix that later on. You should still be
able to use NVidia by turning off function trace.

>
> Please don't misunderstand this, i really do not want to start a discussion
> for this, i just want to ask the possibility of converting this into
> EXPORT_SYMBOL cause i thought some of the possible -rt users may need this
> closed source module explicitly because of its 3D performance.
>
> If anything else needed for sysfs warnings please just say it...
>
> [1] http://cekirdek.pardus.org.tr/~caglar/dmesg.rt
> [2]
> http://svn.pardus.org.tr/pardus/devel/kernel/drivers/nvidia-new/files/rt.patch
>

Thanks for the report. I'll see what I can do for the next release. But
for now this will have to wait till after -rt3.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]PCIE ASPM support - takes 2

2008-01-15 Thread Valdis . Kletnieks
On Tue, 15 Jan 2008 13:02:26 +0800, Shaohua Li said:

> In my test, power difference between powersave mode and performance mode
> is about 1.3w in a system with 3 PCIE links.

Do you have any numbers on what the added latency is for powersave mode, and
a rough idea of how quickly chipsets will drop to low-power? It may affect
usability a lot if it's "adds 10ms latency after 100ms idle" or "adds 100ms
latency after 5 seconds idle" or some other pattern...

(The chipset in my laptop claims to be an 82801G with 4 PCI-Express ports on
it - I'm trying to get a rough idea what usage I'd get out of that feature..)


pgpREocQIaxh0.pgp
Description: PGP signature


Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

2008-01-15 Thread Al Viro
On Tue, Jan 15, 2008 at 07:41:58PM -0800, Linus Torvalds wrote:

> and wonder what happen sif old_parent == new_parent. Is that trying to 
> avoid an ABBA deadlock? Normally you'd do it by ordering the locks, or by 
> taking a third lock to guarantee serialization at a higher level (ie the 
> "s_vfs_rename_mutex" on the VFS layer)
> 
> I'd like to apply these two patches, but I really want to get more of an 
> ack for them from somebody like Al, or at least more of an explanation for 
> why it's all the right thing.

No ACK is coming until we get something resembling analysis of locking
scheme.  Which won't happen until we at least get the "what callers are
allowed to do" written down, damnit.  As it is, I'm more than inclined
to propose ripping kobject_move() out, especially since it has only two
users - something s390-specific and rfcomm, with its shitloads of problems
beyond just sysfs interaction.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.24-rc7-rt2

2008-01-15 Thread Valdis . Kletnieks
On Tue, 15 Jan 2008 02:37:37 +0200, =?utf-8?q?S=2E=C3=87a=C4=9Flar?= Onur said:
> And because of mcount-add-basic-support-for-gcc-profiler-instrum.patch, closed
> source nvidia-new module cannot be used with this release (mcount is exported
> GPL only), i know this is not supported but i used it with that [2] patch up
> until now without a single problem.

Playing devil's advocate here - the claim is that EXPORT_SYMBOL_GPL is to
indicate that code is getting too chummy with Linux internals.

However, in *this* case, isn't it "code that is too chummy with *GCC* 
internals",
and thus it isn't our place to say what can and can't be done with code that
is derivative of the GCC compiler? ;)


pgpCF0KWPf3cW.pgp
Description: PGP signature


Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-15 Thread Andrew Morton
On Wed, 16 Jan 2008 11:01:08 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:

> On Tue, Jan 15, 2008 at 09:53:42AM -0800, Michael Rubin wrote:
> > On Jan 15, 2008 12:46 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > Just a quick question, how does this interact/depend-uppon etc.. with
> > > Fengguangs patches I still have in my mailbox? (Those from Dec 28th)
> > 
> > They don't. They apply to a 2.6.24rc7 tree. This is a candidte for 2.6.25.
> > 
> > This work was done before Fengguang's patches. I am trying to test
> > Fengguang's for comparison but am having problems with getting mm1 to
> > boot on my systems.
> 
> Yeah, they are independent ones. The initial motivation is to fix the
> bug "sluggish writeback on small+large files". Michael introduced
> a new rbtree, and me introduced a new list(s_more_io_wait).
> 
> Basically I think rbtree is an overkill to do time based ordering.
> Sorry, Michael. But s_dirty would be enough for that. Plus, s_more_io
> provides fair queuing between small/large files, and s_more_io_wait
> provides waiting mechanism for blocked inodes.
> 
> The time ordered rbtree may delay io for a blocked inode simply by
> modifying its dirtied_when and reinsert it. But it would no longer be
> that easy if it is to be ordered by location.

What does the term "ordered by location" mean?  Attemting to sort inodes by
physical disk address?  By using their i_ino as a key?

That sounds optimistic.

> If we are going to do location based ordering in the future, the lists
> will continue to be useful. It would simply be a matter of switching
> from the s_dirty(order by time) to some rbtree or radix tree(order by
> location).
> 
> We can even provide both ordering at the same time to different
> fs/inodes which is configurable by the user. Because the s_dirty
> and/or rbtree would provide _only_ ordering(not faireness or waiting)
> and hence is interchangeable.
> 
> This patchset could be a good reference. It does location based
> ordering with radix tree:
> 
> [RFC][PATCH] clustered writeback 

list_heads are just the wrong data structure for this function.  Especially
list_heads which are protected by a non-sleeping lock.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

2008-01-15 Thread Linus Torvalds


On Wed, 16 Jan 2008, Tejun Heo wrote:
>
> * sysfs_move_dir() has an extra dput() on success path.

Are you sure? How did this ever work?

Also, looking at this, I think the "how did this ever work" question is 
answered by "it didn't", but I also think there are still serious problems 
there. Look at

again:
mutex_lock(_parent->d_inode->i_mutex);
if (!mutex_trylock(_parent->d_inode->i_mutex)) {
mutex_unlock(_parent->d_inode->i_mutex);
goto again;
}

and wonder what happen sif old_parent == new_parent. Is that trying to 
avoid an ABBA deadlock? Normally you'd do it by ordering the locks, or by 
taking a third lock to guarantee serialization at a higher level (ie the 
"s_vfs_rename_mutex" on the VFS layer)

I'd like to apply these two patches, but I really want to get more of an 
ack for them from somebody like Al, or at least more of an explanation for 
why it's all the right thing.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCHv2] x86: add is_f00f_bug helper to fault_32|64.c

2008-01-15 Thread Harvey Harrison
On Tue, 2008-01-15 at 22:22 -0500, H. Peter Anvin wrote:
> Kyle McMartin wrote:
> > On Tue, Jan 15, 2008 at 06:48:35PM -0800, Harvey Harrison wrote:
> >> +#ifdef CONFIG_X86_F00F_BUG
> >> +void do_invalid_op(struct pt_regs *, unsigned long);
> >> +#endif
> >> +
> >> +static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
> >> +{
> >> +#ifdef CONFIG_X86_F00F_BUG
> >> +  unsigned long nr;
> > 
> > You can just put the prototype inside the function, you know...
> 
> You can also make the prototype unconditional, even if the function 
> doesn't necessarily exist.

I'll go with the unconditional prototype, the function will always
exist, it's a bit hard to find as it's done with a macro in
traps_32|64.c.

Harvey


From: Harvey Harrison <[EMAIL PROTECTED]>

Further towards unifying these files, add another helper
in same spirit as is_errata93.

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 arch/x86/mm/fault_32.c |   39 ++-
 arch/x86/mm/fault_64.c |   24 
 2 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/arch/x86/mm/fault_32.c b/arch/x86/mm/fault_32.c
index 936bb0c..dae4f69 100644
--- a/arch/x86/mm/fault_32.c
+++ b/arch/x86/mm/fault_32.c
@@ -211,8 +211,6 @@ void dump_pagetable(unsigned long address)
printk("\n");
 }
 
-void do_invalid_op(struct pt_regs *, unsigned long);
-
 static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 {
unsigned index = pgd_index(address);
@@ -288,6 +286,26 @@ static int is_errata93(struct pt_regs *regs, unsigned long 
address)
return 0;
 }
 
+void do_invalid_op(struct pt_regs *, unsigned long);
+
+static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
+{
+#ifdef CONFIG_X86_F00F_BUG
+   unsigned long nr;
+   /*
+* Pentium F0 0F C7 C8 bug workaround.
+*/
+   if (boot_cpu_data.f00f_bug) {
+   nr = (address - idt_descr.address) >> 3;
+
+   if (nr == 6) {
+   do_invalid_op(regs, 0);
+   return 1;
+   }
+   }
+#endif
+   return 0;
+}
 
 /*
  * Handle a fault on the vmalloc or module mapping area
@@ -570,21 +588,8 @@ bad_area_nosemaphore:
return;
}
 
-#ifdef CONFIG_X86_F00F_BUG
-   /*
-* Pentium F0 0F C7 C8 bug workaround.
-*/
-   if (boot_cpu_data.f00f_bug) {
-   unsigned long nr;
-
-   nr = (address - idt_descr.address) >> 3;
-
-   if (nr == 6) {
-   do_invalid_op(regs, 0);
-   return;
-   }
-   }
-#endif
+   if (is_f00f_bug(regs, address))
+   return;
 
 no_context:
/* Are we prepared to handle this kernel fault?  */
diff --git a/arch/x86/mm/fault_64.c b/arch/x86/mm/fault_64.c
index cde110c..ce1a870 100644
--- a/arch/x86/mm/fault_64.c
+++ b/arch/x86/mm/fault_64.c
@@ -256,6 +256,27 @@ static int is_errata93(struct pt_regs *regs, unsigned long 
address)
return 0;
 }
 
+void do_invalid_op(struct pt_regs *, unsigned long);
+
+static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
+{
+#ifdef CONFIG_X86_F00F_BUG
+   unsigned long nr;
+   /*
+* Pentium F0 0F C7 C8 bug workaround.
+*/
+   if (boot_cpu_data.f00f_bug) {
+   nr = (address - idt_descr.address) >> 3;
+
+   if (nr == 6) {
+   do_invalid_op(regs, 0);
+   return 1;
+   }
+   }
+#endif
+   return 0;
+}
+
 static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
 unsigned long error_code)
 {
@@ -572,6 +593,9 @@ bad_area_nosemaphore:
return;
}
 
+   if (is_f00f_bug(regs, address))
+   return;
+
 no_context:
/* Are we prepared to handle this kernel fault?  */
if (fixup_exception(regs))
-- 
1.5.4.rc2.1164.g6451



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


hpet_late_init hang

2008-01-15 Thread Yinghai Lu
"
commit e5ed385fa0d6f35406e3e3ed75e5eb9adeb811df
Author: Balaji Rao <[EMAIL PROTECTED]>
Date:   Tue Jan 15 16:53:29 2008 +0100

Assign IRQs to HPET Timers
"
in x86.git

cause my servers hang
after
Calling initcall 0x80b9a465: hpet_late_init+0x0/0x100()

after reverting that I got:

initcall 0x80b947d1 ran for 19 msecs: pci_iommu_init+0x0/0x13()
Calling initcall 0x80b9a465: hpet_late_init+0x0/0x100()
hpet0: at MMIO 0xfed0, IRQs 2, 8, 31
hpet0: 3 32-bit timers, 2500 Hz
initcall 0x80b9a465: hpet_late_init+0x0/0x100() returned 0.
initcall 0x80b9a465 ran for 7 msecs: hpet_late_init+0x0/0x100()

   CPU0   CPU1   CPU2   CPU3   CPU4   CPU5
  CPU6   CPU7
  0: 86  0  0  0  0  0
 1  0   IO-APIC-edge  timer
  4:  0  0  0  0  0  0
 1838   IO-APIC-edge  serial
  7:  1  0  0  0  0  0
 0  0   IO-APIC-edge
  8:  0  0  0  0  0  0
 0  0   IO-APIC-edge  rtc0

for mcp55, it should already route hpet to ioapic pin2 or the irq0.

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: add is_f00f_bug helper to fault_32|64.c

2008-01-15 Thread H. Peter Anvin

Kyle McMartin wrote:

On Tue, Jan 15, 2008 at 06:48:35PM -0800, Harvey Harrison wrote:

+#ifdef CONFIG_X86_F00F_BUG
+void do_invalid_op(struct pt_regs *, unsigned long);
+#endif
+
+static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
+{
+#ifdef CONFIG_X86_F00F_BUG
+   unsigned long nr;


You can just put the prototype inside the function, you know...


You can also make the prototype unconditional, even if the function 
doesn't necessarily exist.


-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

2008-01-15 Thread Mathieu Desnoyers
* Steven Rostedt ([EMAIL PROTECTED]) wrote:
> 
> On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
> >
> > Ok, but what actually insures that the clock->cycle_* reads won't be
> > reordered across the clocksource_read() ?
> 
> 
> 
> Hmm, interesting.I didn't notice that clocksource_read() is a static
> inline.  I was thinking that since it was passing a pointer to a function,
> gcc could not assume that it could move that code across it. But now
> looking to see that clocksource_read is simply a static inline that does:
> 
>   cs->read();
> 
> But still, can gcc assume that it can push loads of unknown origin
> variables across function calls? So something like:
> 
> static int *glob;
> 
> void foo(void) {
>   int x;
> 
>   x = *glob;
> 
>   bar();
> 
>   if (x != *glob)
>   /* ... */
> }
> 
> I can't see how any compiler could honestly move the loading of the first
> x after the calling of bar(). With glob pointing to some unknown
> variable, that may be perfectly fine for bar to modify.
> 
> 
> > > >
> > > > > + cycle_raw = clock->cycle_raw;
> > > > > + cycle_last = clock->cycle_last;
> > > > > +
> > > > > + /* read clocksource: */
> > > > > + cycle_now = clocksource_read(clock);
> 
> So the question here is,can cycle_raw and cycle_last be loaded from
> the unknown source that clock points to after the call to
> clocksource_read()?
> 
>  I'm thinking not.
> 

I agree with you that I don't see how the compiler could reorder this.
So we forget about compiler barriers. Also, the clock source used is a
synchronized clock source (get_cycles_sync on x86_64), so it should make
sure the TSC is read at the right moment.

However, what happens if the clock source is, say, the jiffies ?

Is this case, we have :

static cycle_t jiffies_read(void)
{
return (cycle_t) jiffies;
}

Which is nothing more than a memory read of 

extern unsigned long volatile __jiffy_data jiffies;

I think it is wrong to assume that reads from clock->cycle_raw and from
jiffies will be ordered correctly in SMP. I am tempted to think that
ordering memory writes to clock->cycle_raw vs jiffies is also needed in this
case (where clock->cycle_raw is updated, or where jiffies is updated).

We can fall in the same kind of issue if we read the HPET, which is
memory I/O based. It does not seems correct to assume that MMIO vs
normal memory reads are ordered. (pointing back to this article :
http://lwn.net/Articles/198988/)

Mathieu


> > > > > +
> > > > > + /* calculate the delta since the last update_wall_time: 
> > > > > */
> > > > > + cycle_delta = (cycle_now - cycle_last) & clock->mask;
> > > > > +
> > > > > + } while (cycle_raw != clock->cycle_raw ||
> > > > > +  cycle_last != clock->cycle_last);
> > > > > +
> > > > > + return cycle_raw + cycle_delta;
> > > > > +}
> 
> 
> -- Steve
> 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: add is_f00f_bug helper to fault_32|64.c

2008-01-15 Thread Kyle McMartin
On Tue, Jan 15, 2008 at 06:48:35PM -0800, Harvey Harrison wrote:
> +#ifdef CONFIG_X86_F00F_BUG
> +void do_invalid_op(struct pt_regs *, unsigned long);
> +#endif
> +
> +static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
> +{
> +#ifdef CONFIG_X86_F00F_BUG
> + unsigned long nr;

You can just put the prototype inside the function, you know...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2.6.24-rc7 2/2] sysfs: fix bugs in sysfs_rename/move_dir()

2008-01-15 Thread Tejun Heo
sysfs_rename/move_dir() have the following bugs.

* On dentry lookup failure, kfree() is called on ERR_PTR() value.
* sysfs_move_dir() has an extra dput() on success path.

Fix them.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: work/fs/sysfs/dir.c
===
--- work.orig/fs/sysfs/dir.c
+++ work/fs/sysfs/dir.c
@@ -783,6 +783,7 @@ int sysfs_rename_dir(struct kobject * ko
old_dentry = sysfs_get_dentry(sd);
if (IS_ERR(old_dentry)) {
error = PTR_ERR(old_dentry);
+   old_dentry = NULL;
goto out;
}
 
@@ -850,6 +851,7 @@ int sysfs_move_dir(struct kobject *kobj,
old_dentry = sysfs_get_dentry(sd);
if (IS_ERR(old_dentry)) {
error = PTR_ERR(old_dentry);
+   old_dentry = NULL;
goto out;
}
old_parent = old_dentry->d_parent;
@@ -857,6 +859,7 @@ int sysfs_move_dir(struct kobject *kobj,
new_parent = sysfs_get_dentry(new_parent_sd);
if (IS_ERR(new_parent)) {
error = PTR_ERR(new_parent);
+   new_parent = NULL;
goto out;
}
 
@@ -880,7 +883,6 @@ again:
error = 0;
d_add(new_dentry, NULL);
d_move(old_dentry, new_dentry);
-   dput(new_dentry);
 
/* Remove from old parent's list and insert into new parent's list. */
sysfs_unlink_sibling(sd);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: implement merging printk

2008-01-15 Thread Tejun Heo
Randy Dunlap wrote:
> On Wed, 16 Jan 2008 10:00:09 +0900 Tejun Heo wrote:
> 
> 
>> ---
>>  include/linux/kernel.h |   71 
>>  kernel/printk.c|  215 
>> 
>>  2 files changed, 286 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
>> index ade3ac9..f92a4a1 100644
>> --- a/include/linux/kernel.h
>> +++ b/include/linux/kernel.h
>> @@ -175,6 +175,29 @@ extern struct pid *session_of_pgrp(struct pid *pgrp);
>>  extern void dump_thread(struct pt_regs *regs, struct user *dump);
>>  
>>  #ifdef CONFIG_PRINTK
>> +struct mprintk {
>> +char *  header;
>> +char *  body;
>> +char *  cur;
>> +char *  prv;
>> +char *  end;
> 
> We aren't very consistent about this, but I think that we would prefer
> 
>   char*header;
> 
> etc. there.

That's my preference too.  Dunno why I wrote like the above here.  Will
change.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 2.6.24-rc7 1/2] sysfs: make sysfs_lookup() return ERR_PTR(-ENOENT) on failed lookup

2008-01-15 Thread Tejun Heo
sysfs tries to keep dcache a strict subset of sysfs_dirent tree by
shooting down dentries when a node is removed, that is, no negative
dentry for sysfs.  However, the lookup function returned NULL and thus
created negative dentries when the target node didn't exist.

Make sysfs_lookup() return ERR_PTR(-ENOENT) on lookup failure.  This
fixes the NULL dereference bug in sysfs_get_dentry() discovered by
bluetooth rfcomm device moving around.

Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
---
 fs/sysfs/dir.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: work/fs/sysfs/dir.c
===
--- work.orig/fs/sysfs/dir.c
+++ work/fs/sysfs/dir.c
@@ -678,8 +678,10 @@ static struct dentry * sysfs_lookup(stru
sd = sysfs_find_dirent(parent_sd, dentry->d_name.name);
 
/* no such entry */
-   if (!sd)
+   if (!sd) {
+   ret = ERR_PTR(-ENOENT);
goto out_unlock;
+   }
 
/* attach dentry and inode */
inode = sysfs_get_inode(sd);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-15 Thread Rik van Riel
On Tue, 15 Jan 2008 20:44:38 -0500
"Daniel Phillips" <[EMAIL PROTECTED]> wrote:

> Along with this effort, could you let me know if the world actually
> cares about online fsck?  Now we know how to do it I think, but is it
> worth the effort.

With a filesystem that is compartmentalized and checksums metadata,
I believe that an online fsck is absolutely worth having.

Instead of the filesystem resorting to mounting the whole volume
read-only on certain errors, part of the filesystem can be offlined
while an fsck runs.  This could even be done automatically in many
situations.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] printk: implement merging printk

2008-01-15 Thread Randy Dunlap
On Wed, 16 Jan 2008 10:00:09 +0900 Tejun Heo wrote:


> ---
>  include/linux/kernel.h |   71 
>  kernel/printk.c|  215 
> 
>  2 files changed, 286 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index ade3ac9..f92a4a1 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -175,6 +175,29 @@ extern struct pid *session_of_pgrp(struct pid *pgrp);
>  extern void dump_thread(struct pt_regs *regs, struct user *dump);
>  
>  #ifdef CONFIG_PRINTK
> +struct mprintk {
> + char *  header;
> + char *  body;
> + char *  cur;
> + char *  prv;
> + char *  end;

We aren't very consistent about this, but I think that we would prefer

char*header;

etc. there.


> + int overflowed;
> +};
> +
> +#define MPRINTK_INITIALIZER(_buf, _size) \
> + {   \
> + .header = NULL, \
> + .body   = _buf, \
> + .cur= _buf, \
> + .prv= NULL, \
> + .end= _buf + _size, \
> + .overflowed = 0,\
> + }
> +
> +#define DEFINE_MPRINTK(name, size)   \
> + char __##name##_buf[size];  \
> + struct mprintk name = MPRINTK_INITIALIZER(__##name##_buf, size)

---
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] Converting writeback linked lists to a tree based data structure

2008-01-15 Thread Fengguang Wu
On Tue, Jan 15, 2008 at 09:53:42AM -0800, Michael Rubin wrote:
> On Jan 15, 2008 12:46 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > Just a quick question, how does this interact/depend-uppon etc.. with
> > Fengguangs patches I still have in my mailbox? (Those from Dec 28th)
> 
> They don't. They apply to a 2.6.24rc7 tree. This is a candidte for 2.6.25.
> 
> This work was done before Fengguang's patches. I am trying to test
> Fengguang's for comparison but am having problems with getting mm1 to
> boot on my systems.

Yeah, they are independent ones. The initial motivation is to fix the
bug "sluggish writeback on small+large files". Michael introduced
a new rbtree, and me introduced a new list(s_more_io_wait).

Basically I think rbtree is an overkill to do time based ordering.
Sorry, Michael. But s_dirty would be enough for that. Plus, s_more_io
provides fair queuing between small/large files, and s_more_io_wait
provides waiting mechanism for blocked inodes.

The time ordered rbtree may delay io for a blocked inode simply by
modifying its dirtied_when and reinsert it. But it would no longer be
that easy if it is to be ordered by location.

If we are going to do location based ordering in the future, the lists
will continue to be useful. It would simply be a matter of switching
from the s_dirty(order by time) to some rbtree or radix tree(order by
location).

We can even provide both ordering at the same time to different
fs/inodes which is configurable by the user. Because the s_dirty
and/or rbtree would provide _only_ ordering(not faireness or waiting)
and hence is interchangeable.

This patchset could be a good reference. It does location based
ordering with radix tree:

[RFC][PATCH] clustered writeback 

Thank you,
Fengguang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET] printk: implement printk_header() and merging printk

2008-01-15 Thread Tejun Heo
Hello,

Randy Dunlap wrote:
>>   mprintk_set_header(, KERN_INFO "ata%u.%2u: ", 1, 0);
>>   mprintk_push(, "ATA %d", 7);
>>   mprintk_push(, ", %u sectors\n", 1024);
>>   mprintk(, "everything seems dandy\n");
> 
> Looks pretty good to me except that I would change mprintk_push to
> mprintk_add or mprintk_append (I think that I prefer _add).

I think push and flush sound good when used together but then again the
flush function isn't visible in the interface and push has LIFO ring to
it.  I'm okay with add.  append seems a bit too long.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] libata: make libata use printk_header() and mprintk

2008-01-15 Thread Tejun Heo
Randy Dunlap wrote:
>> -ata_dev_printk(dev, KERN_WARNING,
>> -"Drive reports diagnostics failure. This may indicate a drive\n");
>> -ata_dev_printk(dev, KERN_WARNING,
>> -"fault or invalid emulation. Contact drive vendor for information.\n");
>> -}
> 
> Looks to me like several of these + lines have indent problems:
> following lines (i.e., not first line) of function call should be
> indented more than the first line:

Putting one more tab would put good part of those lines off the 80col
limit.  Hmmm... but I agree it looks ugly.  I'll add one or two spaces
there.

>> +ata_dev_printk(dev, KERN_WARNING,
>> +"Drive reports diagnostics failure. This may indicate a drive\n"
>> +"fault or invalid emulation. Contact drive vendor for 
>> information.\n");

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH for -mm] Fix ARM to play nicely with generic Instrumentation menu

2008-01-15 Thread Mathieu Desnoyers
The conflicting commit for 
move-kconfiginstrumentation-to-arch-kconfig-and-init-kconfig.patch
is the ARM fix from Linus :

commit 38ad9aebe70dc72df08851bbd1620d89329129ba

He just seemed to agree that my approach (just putting the missing ARM
config options in arch/arm/Kconfig) works too. The main advantage it has
is that it is smaller, does not need a cleanup in the future and does
not break the following patches unnecessarily.

It's just been discussed here

http://lkml.org/lkml/2008/1/15/267

However, Linus might prefer to stay with his own patch and I would
totally understand it that late in the release cycle. Therefore I submit
this for the next release cycle.

This patch cleans up the fix from Linus so it does not conflict with the
following patches in -mm.

It applies on top of the current 2.6.24-rc7-git8 + possibly some more
git commits (at commit 0938e7586440ac97cedc0f5528a8684ebfa4ce43).

After applying this patch,
move-kconfiginstrumentation-to-arch-kconfig-and-init-kconfig.patch
applies nicely in the -mm tree without any modification.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
CC: Linus Torvalds <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
CC: Adrian Bunk <[EMAIL PROTECTED]>
CC: Randy Dunlap <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
---
 arch/arm/Kconfig |2 -
 arch/arm/Kconfig.instrumentation |   52 ---
 2 files changed, 1 insertion(+), 53 deletions(-)

Index: linux-2.6-lttng/arch/arm/Kconfig
===
--- linux-2.6-lttng.orig/arch/arm/Kconfig   2008-01-15 21:37:06.0 
-0500
+++ linux-2.6-lttng/arch/arm/Kconfig2008-01-15 21:45:23.0 -0500
@@ -130,6 +130,23 @@ config FIQ
 config ARCH_MTD_XIP
bool
 
+if OPROFILE
+
+config OPROFILE_ARMV6
+   def_bool y
+   depends on CPU_V6 && !SMP
+   select OPROFILE_ARM11_CORE
+
+config OPROFILE_MPCORE
+   def_bool y
+   depends on CPU_V6 && SMP
+   select OPROFILE_ARM11_CORE
+
+config OPROFILE_ARM11_CORE
+   bool
+
+endif
+
 config VECTORS_BASE
hex
default 0x if MMU || CPU_HIGH_VECTOR
@@ -1076,7 +1093,7 @@ endmenu
 
 source "fs/Kconfig"
 
-source "arch/arm/Kconfig.instrumentation"
+source "kernel/Kconfig.instrumentation"
 
 source "arch/arm/Kconfig.debug"
 
Index: linux-2.6-lttng/arch/arm/Kconfig.instrumentation
===
--- linux-2.6-lttng.orig/arch/arm/Kconfig.instrumentation   2008-01-15 
21:37:06.0 -0500
+++ /dev/null   1970-01-01 00:00:00.0 +
@@ -1,52 +0,0 @@
-menuconfig INSTRUMENTATION
-   bool "Instrumentation Support"
-   default y
-   ---help---
- Say Y here to get to see options related to performance measurement,
- system-wide debugging, and testing. This option alone does not add any
- kernel code.
-
- If you say N, all options in this submenu will be skipped and
- disabled. If you're trying to debug the kernel itself, go see the
- Kernel Hacking menu.
-
-if INSTRUMENTATION
-
-config PROFILING
-   bool "Profiling support (EXPERIMENTAL)"
-   help
- Say Y here to enable the extended profiling support mechanisms used
- by profilers such as OProfile.
-
-config OPROFILE
-   tristate "OProfile system profiling (EXPERIMENTAL)"
-   depends on PROFILING && !UML
-   help
- OProfile is a profiling system capable of profiling the
- whole system, include the kernel, kernel modules, libraries,
- and applications.
-
- If unsure, say N.
-
-config OPROFILE_ARMV6
-   bool
-   depends on OPROFILE && CPU_V6 && !SMP
-   default y
-   select OPROFILE_ARM11_CORE
-
-config OPROFILE_MPCORE
-   bool
-   depends on OPROFILE && CPU_V6 && SMP
-   default y
-   select OPROFILE_ARM11_CORE
-
-config OPROFILE_ARM11_CORE
-   bool
-
-config MARKERS
-   bool "Activate markers"
-   help
- Place an empty function call at each marker site. Can be
- dynamically changed for a probe function.
-
-endif # INSTRUMENTATION

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] libata: make libata use printk_header() and mprintk

2008-01-15 Thread Randy Dunlap
On Wed, 16 Jan 2008 10:00:10 +0900 Tejun Heo wrote:

> Reimplement libata printk helpers using printk_header, implement
> helpers to initialize mprintk and use mprintk during device
> configuration and EH reporting.
> 
> This fixes various formatting related problems of libata messages such
> as misaligned multiline messages, decoded register lines with leading
> headers making them difficult to tell to which error they belong to,
> awkward manual indents and complex message printing logics.  More
> importantly, by making message assembly flexible, this patch makes
> future changes to device configuration and EH reporting easier.
> 
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> ---
>  drivers/ata/libata-core.c   |  202 
> +++
>  drivers/ata/libata-eh.c |  150 +++-
>  drivers/ata/libata-pmp.c|5 +-
>  drivers/ata/libata-scsi.c   |6 +-
>  drivers/ata/sata_inic162x.c |2 +-
>  drivers/ata/sata_nv.c   |4 +-
>  include/linux/libata.h  |   35 
>  7 files changed, 223 insertions(+), 181 deletions(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 4753a18..6fac482 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -125,6 +125,79 @@ MODULE_LICENSE("GPL");
>  MODULE_VERSION(DRV_VERSION);
>  
>  

> @@ -2295,23 +2342,18 @@ int ata_dev_configure(struct ata_device *dev)
>   dev->flags |= ATA_DFLAG_DIPM;
>   }
>  
> - if (dev->horkage & ATA_HORKAGE_DIAGNOSTIC) {
> + if ((dev->horkage & ATA_HORKAGE_DIAGNOSTIC) && print_info) {
>   /* Let the user know. We don't want to disallow opens for
>  rescue purposes, or in case the vendor is just a blithering
>  idiot */
> - if (print_info) {
> - ata_dev_printk(dev, KERN_WARNING,
> -"Drive reports diagnostics failure. This may indicate a drive\n");
> - ata_dev_printk(dev, KERN_WARNING,
> -"fault or invalid emulation. Contact drive vendor for information.\n");
> - }

Looks to me like several of these + lines have indent problems:
following lines (i.e., not first line) of function call should be
indented more than the first line:

> + ata_dev_printk(dev, KERN_WARNING,
> + "Drive reports diagnostics failure. This may indicate a drive\n"
> + "fault or invalid emulation. Contact drive vendor for 
> information.\n");
>   }
>  
>   /* limit bridge transfers to udma5, 200 sectors */
>   if (ata_dev_knobble(dev)) {
> - if (ata_msg_drv(ap) && print_info)
> - ata_dev_printk(dev, KERN_INFO,
> -"applying bridge limits\n");
> + mprintk_push(, ", applying bridge limits");
>   dev->udma_mask &= ATA_UDMA5;
>   dev->max_sectors = ATA_MAX_SECTORS;
>   }

> diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
> index caef2bb..80bfa50 100644
> --- a/drivers/ata/libata-pmp.c
> +++ b/drivers/ata/libata-pmp.c
> @@ -408,9 +408,8 @@ static int sata_pmp_configure(struct ata_device *dev, int 
> print_info)
>  
>   if (!(dev->flags & ATA_DFLAG_AN))
>   ata_dev_printk(dev, KERN_INFO,
> - "Asynchronous notification not supported, "
> - "hotplug won't\n work on fan-out "
> - "ports. Use warm-plug instead.\n");

More indent needed below.

> + "Asynchronous notification not supported, hotplug 
> won't\n"
> + "work on fan-out ports. Use warm-plug instead.\n");
>   }
>  
>   return 0;
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 264ae60..7c13663 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3207,9 +3207,9 @@ void ata_scsi_scan_host(struct ata_port *ap, int sync)
>   goto repeat;
>   }
>  
> - ata_port_printk(ap, KERN_ERR, "WARNING: synchronous SCSI scan "
> - "failed without making any progress,\n"
> - "  switching to async\n");
> + ata_port_printk(ap, KERN_ERR,

More indent needed below.

> + "WARNING: synchronous SCSI scan failed without making any \n"
> + " progress, switching to async\n");
>   }
>  
>   queue_delayed_work(ata_aux_wq, >hotplug_task,



---
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH -mm 0/3] i386 boot: replace boot_ioremap with enhancedbt_ioremap

2008-01-15 Thread Pallipadi, Venkatesh
 

>-Original Message-
>From: Huang, Ying 
>Sent: Tuesday, January 15, 2008 1:49 AM
>To: Ingo Molnar; Pallipadi, Venkatesh
>Cc: [EMAIL PROTECTED]; H. Peter Anvin; Thomas 
>Gleixner; Ingo Molnar; Andi Kleen; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH -mm 0/3] i386 boot: replace boot_ioremap 
>with enhancedbt_ioremap
>
>On Tue, 2008-01-15 at 09:44 +0100, Ingo Molnar wrote:
>> * Huang, Ying <[EMAIL PROTECTED]> wrote:
>> 
>> > This patchset replaces boot_ioremap with a enhanced version of 
>> > bt_ioremap and renames the bt_ioremap to early_ioremap. 
>This reduces 
>> > 12k from .init.data segment and increases the size of 
>memory that can 
>> > be re-mapped before paging_init to 64k.
>> 
>> in latest x86.git#mm there's an early_ioremap() introduced 
>as part of 
>> the PAT series - available on both 32-bit and 64-bit. Could 
>you take a 
>> look at it and use that if it's OK for your purposes?
>
>After checking the early_ioremap() implementation in
>arch/x86/kernel/setup_32.c, I found that it is a duplication of
>bt_ioremap() implementation in arch/x86/mm/ioremap_32.c. Both
>implementations use set_fixmap(), so they can be used only after
>paging_init().
>
>The early_ioremap implementation provided in this patchset works as
>follow:
>
>- Enhances bt_ioremap, make it usable before paging_init() via a
>dedicated PTE page.
>- Rename bt_ioremap to early_ioremap
>
>So I think maybe we should replace the early_ioremap() 
>implementation in
>PAT series with that of this series.
>

Agreed. PAT can use this for early mappings. Thanks for the patches :)

-Venki
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] x86: add is_f00f_bug helper to fault_32|64.c

2008-01-15 Thread Harvey Harrison
Further towards unifying these files, add another helper
in same spirit as is_errata93.  Add an #ifdef around the
forward declaration of do_invalid_op to make it clear it
is only needed in the one place.

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 arch/x86/mm/fault_32.c |   41 -
 arch/x86/mm/fault_64.c |   26 ++
 2 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/arch/x86/mm/fault_32.c b/arch/x86/mm/fault_32.c
index 936bb0c..f50df86 100644
--- a/arch/x86/mm/fault_32.c
+++ b/arch/x86/mm/fault_32.c
@@ -211,8 +211,6 @@ void dump_pagetable(unsigned long address)
printk("\n");
 }
 
-void do_invalid_op(struct pt_regs *, unsigned long);
-
 static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 {
unsigned index = pgd_index(address);
@@ -288,6 +286,28 @@ static int is_errata93(struct pt_regs *regs, unsigned long 
address)
return 0;
 }
 
+#ifdef CONFIG_X86_F00F_BUG
+void do_invalid_op(struct pt_regs *, unsigned long);
+#endif
+
+static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
+{
+#ifdef CONFIG_X86_F00F_BUG
+   unsigned long nr;
+   /*
+* Pentium F0 0F C7 C8 bug workaround.
+*/
+   if (boot_cpu_data.f00f_bug) {
+   nr = (address - idt_descr.address) >> 3;
+
+   if (nr == 6) {
+   do_invalid_op(regs, 0);
+   return 1;
+   }
+   }
+#endif
+   return 0;
+}
 
 /*
  * Handle a fault on the vmalloc or module mapping area
@@ -570,21 +590,8 @@ bad_area_nosemaphore:
return;
}
 
-#ifdef CONFIG_X86_F00F_BUG
-   /*
-* Pentium F0 0F C7 C8 bug workaround.
-*/
-   if (boot_cpu_data.f00f_bug) {
-   unsigned long nr;
-
-   nr = (address - idt_descr.address) >> 3;
-
-   if (nr == 6) {
-   do_invalid_op(regs, 0);
-   return;
-   }
-   }
-#endif
+   if (is_f00f_bug(regs, address))
+   return;
 
 no_context:
/* Are we prepared to handle this kernel fault?  */
diff --git a/arch/x86/mm/fault_64.c b/arch/x86/mm/fault_64.c
index cde110c..17cafe8 100644
--- a/arch/x86/mm/fault_64.c
+++ b/arch/x86/mm/fault_64.c
@@ -256,6 +256,29 @@ static int is_errata93(struct pt_regs *regs, unsigned long 
address)
return 0;
 }
 
+#ifdef CONFIG_X86_F00F_BUG
+void do_invalid_op(struct pt_regs *, unsigned long);
+#endif
+
+static int is_f00f_bug(struct pt_regs *regs, unsigned long address)
+{
+#ifdef CONFIG_X86_F00F_BUG
+   unsigned long nr;
+   /*
+* Pentium F0 0F C7 C8 bug workaround.
+*/
+   if (boot_cpu_data.f00f_bug) {
+   nr = (address - idt_descr.address) >> 3;
+
+   if (nr == 6) {
+   do_invalid_op(regs, 0);
+   return 1;
+   }
+   }
+#endif
+   return 0;
+}
+
 static noinline void pgtable_bad(unsigned long address, struct pt_regs *regs,
 unsigned long error_code)
 {
@@ -572,6 +595,9 @@ bad_area_nosemaphore:
return;
}
 
+   if (is_f00f_bug(regs, address))
+   return;
+
 no_context:
/* Are we prepared to handle this kernel fault?  */
if (fixup_exception(regs))
-- 
1.5.4.rc2.1164.g6451

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET] printk: implement printk_header() and merging printk

2008-01-15 Thread Randy Dunlap
On Wed, 16 Jan 2008 10:00:06 +0900 Tejun Heo wrote:

> Hello, all.
> 
> This patchset implements printk_header() and mprintk - merging printk
> - to make printing multiline messages and assembling message
> piece-by-piece easier.
> 
> In a nutshell, printk_header() lets you do the following atomically
> (against other messages).
> 
>  code:
+   printk_header(KERN_INFO "ata1.00: ", "line0\nline1\nline2\n");
> 
>  output:
>   <6>ata1.00: line0
>   <6> line1
>   <6> line2
> 
> And mprintk the following.
> 
>  code:
>   DEFINE_MPRINTK(mp, 2 * 80);
> 
>   mprintk_set_header(, KERN_INFO "ata%u.%2u: ", 1, 0);
>   mprintk_push(, "ATA %d", 7);
>   mprintk_push(, ", %u sectors\n", 1024);
>   mprintk(, "everything seems dandy\n");

Looks pretty good to me except that I would change mprintk_push to
mprintk_add or mprintk_append (I think that I prefer _add).

>  output:
>   <6>ata1.00: ATA 7, 1024 sectors
>   <6> everything seems dandy
> 
> Please read the commit messages and comments for more detail.  If this
> patchset is accepted, I'll write up Documentation/printk.txt which
> contains describtion of the API and guidelines - "don't pack unrelated
> messages into one" kind of stuff.
> 
> This patchset is against the current linux-2.6#master (031f2dcd) and
> contains the following patches.
> 
> 0001-printk-keep-log-level-on-multiline-messages.patch
> 0002-printk-implement-v-printk_header.patch
> 0003-printk-implement-merging-printk.patch
> 0004-libata-make-libata-use-printk_header-and-mprintk.patch
> 
>  drivers/ata/libata-core.c   |  202 +++--
>  drivers/ata/libata-eh.c |  150 --
>  drivers/ata/libata-pmp.c|5 
>  drivers/ata/libata-scsi.c   |6 
>  drivers/ata/sata_inic162x.c |2 
>  drivers/ata/sata_nv.c   |4 
>  include/linux/kernel.h  |   83 ++
>  include/linux/libata.h  |   35 ++--
>  kernel/printk.c |  354 
> 
>  9 files changed, 630 insertions(+), 211 deletions(-)
> 
> More than half of the code increase in kernel.h are from the dummy
> declarations for !CONFIG_PRINTK.  More than one third of printk.c
> increase are comments.  On my x86-64 configuration, printk.o grows
> from 30152 to 34128.
> 
> libata code grows slightly but the increase is from converting the
> printk wrapper from #define to proper functions.  The converted areas
> - device configuration and EH reporting - were reduced in comlexity
> and size.  With all-y, drivers/ata/built-in.o shrinks from 726509 to
> 717657 mostly due to the conversion away from macros.


---
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH] lost softirq, 2.6.24-rc7

2008-01-15 Thread Rowand, Frank

Steve,

You are totally correct.  I used the wrong words when I said
"ksoftirqd thread runs".  My apologies for very misleading wording.

I have updated the wording in-line below, in the original message to
indicate that it is softirq threads, in the ksoftirqd() function, not
the ksoftirqd thread.

-Frank

-Original Message-
From: Steven Rostedt [mailto:[EMAIL PROTECTED]
Sent: Tue 1/15/2008 4:39 PM
To: Rowand, Frank
Cc: linux-kernel@vger.kernel.org; [EMAIL PROTECTED]
Subject: Re: [PATCH] lost softirq, 2.6.24-rc7
 
On Tue, Jan 15, 2008 at 02:15:26PM -0800, Frank Rowand wrote:
> From: Frank Rowand <[EMAIL PROTECTED]>
> 
> (Ingo, there is a question for you after the description, just before the
> patch.)
> 
> When running an interrupt and network intensive stress test with PREEMPT_RT
> enabled, the target system stopped processing received network packets.
> skbs from received packets were being queued by net_rx_action(), but the
> NET_RX_SOFTIRQ softirq was never running to remove the skbs from the queue.
> Since the target system root file system is NFS mounted, the system is now
> effectively hung.
> 
> A pseudocode description of how this state was reached follows.
> Each level of indentation represents a function call from the previous line.
> 
> 
> ethernet driver irq handler receives packet
>netif_rx()
>   queues skb (qlen == 1), raises NET_RX_SOFTIRQ
> 
> on return from irq
>___do_softirq() [ 1 ]
>   Reset the pending bitmask

Frank,

This path should not be hit when running with PREEMPT_RT. The softirqs
are now all separate, and are not run in batch in ksoftirqd. In fact,
ksoftirqd should not be running at all with PREEMPT_RT.

-- Steve

>   net_rx_action()
>  dequeues skb (qlen == 0)
>  jiffies incremented, so
> break out of processing
> and raise NET_RX_SOFTIRQ
> (but don't deassert NAPI_STATE_SCHED)
> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> -
>  ksoftirqd thread runs

   ^^  should have been:

   the TIMER_SOFTIRQ and RCU_SOFTIRQ softirq threads, which are
   both executing ksoftirqd() run

> process TIMER_SOFTIRQ
> process RCU_SOFTIRQ

> << ksoftirqd sleeps >>

  ^^^  should have been:
  the softirq threads, executing in ksoftirqd(), sleep

> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
> -
> 
>  ___do_softirq() [ 2 ]
> Reset the pending bitmask
> finds NET_RX_SOFTIRQ raised but already running
> << ___do_softirq() [ 2 ] completes >>
> 
>   << ___do_softirq() [ 1 ] resumes >>
>   the pending bitmask is empty, so NET_RX_SOFTIRQ is lost
> 
> 
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 3/5] add /dev/mem_notify device

2008-01-15 Thread KOSAKI Motohiro
Hi Alan

> > > It also appears there is no way to wait for memory shortages (processes
> > > that can free memory easily) only for memory to start appearing.
> > 
> > poll() with never timeout don't fill your requirement?
> > to be honest, maybe I don't understand your afraid yet. sorry.
> 
> My misunderstanding. There is in fact no way to wait for memory to become
> available. The poll() method you provide works nicely waiting for
> shortages and responding to them by freeing memory.
> 
> It would be interesting to add FASYNC support to this. Some users have
> asked for a signal when memory shortage occurs (as IBM AIX provides
> this). FASYNC support would allow a SIGIO to be delivered from this
> device when memory shortages occurred. Poll as you have implemented is of
> course the easier way for a program to monitor memory and a better
> interface.

OK.
I will challenge implement at mem_notify v5.


- kosaki


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 1/4] x86: PAT followup - Do not fold two bits in _PAGE_PCD

2008-01-15 Thread venkatesh . pallipadi
Do not fold PCD and PWT bits in _PAGE_PCD. Instead, introduce a new
_PAGE_UC which defines uncached mappings and use it in place of _PAGE_PCD.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>

Index: linux-2.6.git/arch/x86/mm/ioremap_32.c
===
--- linux-2.6.git.orig/arch/x86/mm/ioremap_32.c 2008-01-15 03:29:38.0 
-0800
+++ linux-2.6.git/arch/x86/mm/ioremap_32.c  2008-01-15 04:42:59.0 
-0800
@@ -173,7 +173,7 @@
 
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
-   return __ioremap(phys_addr, size, _PAGE_PCD);
+   return __ioremap(phys_addr, size, _PAGE_UC);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
Index: linux-2.6.git/arch/x86/mm/ioremap_64.c
===
--- linux-2.6.git.orig/arch/x86/mm/ioremap_64.c 2008-01-15 03:29:38.0 
-0800
+++ linux-2.6.git/arch/x86/mm/ioremap_64.c  2008-01-15 04:43:07.0 
-0800
@@ -150,7 +150,7 @@
 
 void __iomem *ioremap_nocache (unsigned long phys_addr, unsigned long size)
 {
-   return __ioremap(phys_addr, size, _PAGE_PCD);
+   return __ioremap(phys_addr, size, _PAGE_UC);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
Index: linux-2.6.git/arch/x86/mm/pat.c
===
--- linux-2.6.git.orig/arch/x86/mm/pat.c2008-01-15 03:29:38.0 
-0800
+++ linux-2.6.git/arch/x86/mm/pat.c 2008-01-15 05:01:43.0 -0800
@@ -64,7 +64,7 @@
if (smp_processor_id() && !pat_wc_enabled)
return;
 
-   /* Set PWT+PCD to Write-Combining. All other bits stay the same */
+   /* Set PCD to Write-Combining. All other bits stay the same */
/* PTE encoding used in Linux:
  PAT
  |PCD
@@ -72,7 +72,7 @@
  |||
  000 WB default
  010 WC _PAGE_WC
- 011 UC _PAGE_PCD
+ 011 UC _PAGE_UC
PAT bit unused */
pat = PAT(0,WB) | PAT(1,WT) | PAT(2,WC) | PAT(3,UC) |
  PAT(4,WB) | PAT(5,WT) | PAT(6,WC) | PAT(7,UC);
@@ -97,7 +97,7 @@
 {
switch (flags & _PAGE_CACHE_MASK) {
case _PAGE_WC:  return "write combining";
-   case _PAGE_PCD: return "uncached";
+   case _PAGE_UC: return "uncached";
case 0: return "default";
default:return "broken";
}
@@ -144,7 +144,7 @@
if (!fattr)
return -EINVAL;
else
-   *fattr  = _PAGE_PCD;
+   *fattr  = _PAGE_UC;
}
 
return 0;
@@ -227,13 +227,13 @@
unsigned long flags;
unsigned long want_flags = 0;
if (file->f_flags & O_SYNC)
-   want_flags = _PAGE_PCD;
+   want_flags = _PAGE_UC;
 
 #ifdef CONFIG_X86_32
/*
 * On the PPro and successors, the MTRRs are used to set
 * memory types for physical addresses outside main memory,
-* so blindly setting PCD or PWT on those pages is wrong.
+* so blindly setting UC or PWT on those pages is wrong.
 * For Pentiums and earlier, the surround logic should disable
 * caching for the high addresses through the KEN pin, but
 * we maintain the tradition of paranoia in this code.
@@ -244,7 +244,7 @@
test_bit(X86_FEATURE_CYRIX_ARR, boot_cpu_data.x86_capability) ||
test_bit(X86_FEATURE_CENTAUR_MCR, 
boot_cpu_data.x86_capability)) &&
   offset >= __pa(high_memory))
-   want_flags = _PAGE_PCD;
+   want_flags = _PAGE_UC;
 #endif
 
/* ignore error because we can't handle it here */
Index: linux-2.6.git/arch/x86/pci/i386.c
===
--- linux-2.6.git.orig/arch/x86/pci/i386.c  2008-01-15 03:29:38.0 
-0800
+++ linux-2.6.git/arch/x86/pci/i386.c   2008-01-15 05:02:12.0 -0800
@@ -353,7 +353,7 @@
 */
prot = pgprot_val(vma->vm_page_prot);
if (boot_cpu_data.x86 > 3) {
-   prot |= _PAGE_PCD;
+   prot |= _PAGE_UC;
}
vma->vm_page_prot = __pgprot(prot);
}
Index: linux-2.6.git/include/asm-x86/pgtable.h
===
--- linux-2.6.git.orig/include/asm-x86/pgtable.h2008-01-15 
03:29:38.0 -0800
+++ linux-2.6.git/include/asm-x86/pgtable.h 2008-01-15 05:11:12.0 
-0800
@@ -28,14 +28,16 @@
 #define _PAGE_RW   (_AC(1, L)<<_PAGE_BIT_RW)
 #define _PAGE_USER (_AC(1, L)<<_PAGE_BIT_USER)
 #define _PAGE_PWT  (_AC(1, L)<<_PAGE_BIT_PWT)
-#define _PAGE_PCD  ((_AC(1, L)<<_PAGE_BIT_PCD) | 

[patch 3/4] x86: PAT followup - Remove reserved pages mapping to zero page and not map them

2008-01-15 Thread venkatesh . pallipadi
Remove reserved pages mapping to zeropage. Reserved and holes are now not
mapped at all.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>

Index: linux-2.6.git/arch/x86/mm/init_32.c
===
--- linux-2.6.git.orig/arch/x86/mm/init_32.c2008-01-15 11:02:23.0 
-0800
+++ linux-2.6.git/arch/x86/mm/init_32.c 2008-01-15 11:08:29.0 -0800
@@ -143,50 +143,6 @@
return 0;
 }
 
-static unsigned long __init get_res_page(void)
-{
-   static unsigned long res_phys_page;
-   if (!res_phys_page) {
-
-   res_phys_page = (unsigned long)
-   alloc_bootmem_low_pages(PAGE_SIZE);
-   if (!res_phys_page)
-   BUG();
-
-   memset((char *)res_phys_page, 0xe, PAGE_SIZE);
-   res_phys_page = __pa(res_phys_page);
-   }
-   return res_phys_page;
-}
-
-static unsigned long __init get_res_ptepage(void)
-{
-   static unsigned long res_phys_ptepage;
-   pte_t *pte;
-   int pte_ofs;
-   unsigned long pfn;
-
-   if (!res_phys_ptepage) {
-
-   res_phys_ptepage = (unsigned long)
-  alloc_bootmem_low_pages(PAGE_SIZE);
-   if (!res_phys_ptepage)
-   BUG();
-
-   paravirt_alloc_pt(_mm,
- __pa(res_phys_ptepage) >> PAGE_SHIFT);
-
-   /* Set all PTEs in the range to zero page */
-   pfn = get_res_page() >> PAGE_SHIFT;
-   pte = (pte_t *)res_phys_ptepage;
-   for (pte_ofs = 0; pte_ofs < PTRS_PER_PTE; pte++, pte_ofs++)
-   set_pte(pte, pfn_pte(pfn, PAGE_KERNEL));
-
-   res_phys_ptepage = __pa(res_phys_ptepage);
-   }
-   return res_phys_ptepage;
-}
-
 /*
  * This maps the physical memory to kernel virtual address space, a total 
  * of max_low_pfn pages, by creating page tables starting from address 
@@ -199,7 +155,6 @@
pmd_t *pmd;
pte_t *pte;
int pgd_idx, pmd_idx, pte_ofs;
-   unsigned long temp_pfn;
 
pgd_idx = pgd_index(PAGE_OFFSET);
pgd = pgd_base + pgd_idx;
@@ -238,9 +193,7 @@
}
if (cpu_has_pse &&
!is_memory_any_valid(paddr, paddr + PMD_SIZE)) {
-
-   temp_pfn = get_res_ptepage();
-   set_pmd(pmd, __pmd(temp_pfn | _PAGE_TABLE));
+   set_pmd(pmd, __pmd(0));
pfn += PTRS_PER_PTE;
continue;
}
@@ -259,10 +212,7 @@
 
if (!is_memory_any_valid(paddr,
 paddr + PAGE_SIZE)) {
-
-   temp_pfn = get_res_page() >> PAGE_SHIFT;
-   set_pte(pte,
-   pfn_pte(temp_pfn, PAGE_KERNEL));
+   set_pte(pte, __pte(0));
continue;
}
 
Index: linux-2.6.git/arch/x86/mm/init_64.c
===
--- linux-2.6.git.orig/arch/x86/mm/init_64.c2008-01-15 11:06:37.0 
-0800
+++ linux-2.6.git/arch/x86/mm/init_64.c 2008-01-15 11:09:18.0 -0800
@@ -494,41 +494,6 @@
 kcore_vsyscall;
 
 
-static unsigned long __init get_res_page(void)
-{
-   static unsigned long res_phys_page;
-   if (!res_phys_page) {
-   pte_t *pte;
-   pte = alloc_low_page(_phys_page);
-   unmap_low_page(pte);
-   }
-   return res_phys_page;
-}
-
-static unsigned long __init get_res_ptepage(void)
-{
-   static unsigned long res_phys_ptepage;
-   if (!res_phys_ptepage) {
-   pte_t *pte_page;
-   unsigned long page_phys;
-   unsigned long entry;
-   int i;
-
-   pte_page = alloc_low_page(_phys_ptepage);
-
-   page_phys = get_res_page();
-   entry = _PAGE_NX | _KERNPG_TABLE | _PAGE_GLOBAL | page_phys;
-   entry &= __supported_pte_mask;
-   for (i = 0; i < PTRS_PER_PTE; i++) {
-   pte_t *pte = pte_page + i;
-   set_pte(pte, __pte(entry));
-   }
-
-   unmap_low_page(pte_page);
-   }
-   return res_phys_ptepage;
-}
-
 static void __init phys_pte_prune(pte_t *pte_page, unsigned long address,
unsigned long end, unsigned long vaddr, unsigned int exec)
 {
@@ -545,15 +510,7 @@
if (!(address & (~PAGE_MASK)) &&
(address + PAGE_SIZE <= end) &&

[patch 2/4] x86: PAT followup - Remove KERNPG_TABLE from pte entry

2008-01-15 Thread venkatesh . pallipadi
KERNPG_TABLE was a bug in earlier patch. Remove it from pte.
pte_val() check is redundant as this routine is called immediately after a
ptepage is allocated afresh.

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>

Index: linux-2.6.git/arch/x86/mm/init_64.c
===
--- linux-2.6.git.orig/arch/x86/mm/init_64.c2008-01-15 11:02:23.0 
-0800
+++ linux-2.6.git/arch/x86/mm/init_64.c 2008-01-15 11:06:37.0 -0800
@@ -541,9 +541,6 @@
if (address >= end)
break;
 
-   if (pte_val(*pte))
-   continue;
-
/* Nothing to map. Map the null page */
if (!(address & (~PAGE_MASK)) &&
(address + PAGE_SIZE <= end) &&
@@ -561,9 +558,9 @@
}
 
if (exec)
-   entry = _PAGE_NX|_KERNPG_TABLE|_PAGE_GLOBAL|address;
+   entry = _PAGE_NX|_PAGE_GLOBAL|address;
else
-   entry = _KERNPG_TABLE|_PAGE_GLOBAL|address;
+   entry = _PAGE_GLOBAL|address;
entry &= __supported_pte_mask;
set_pte(pte, __pte(entry));
}

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 0/4] x86: PAT followup - Incremental changes and bug fixes

2008-01-15 Thread venkatesh . pallipadi
Some incremental changes and bug fixes for PAT patchset. The changes are from
the feedback we received earlier. There are few more pending changes that will
follow soon.

Thanks,
Venki
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 4/4] x86: PAT followup - use ioremap for devmem read of reserved regions

2008-01-15 Thread venkatesh . pallipadi
map and unmap reserved regions, before accessing through /dev/mem read
interface. This is for full compatibility with existing /dev/mem
usages.
For regions that are mapped in identity map, we use __va().

Signed-off-by: Venkatesh Pallipadi <[EMAIL PROTECTED]>
Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>

Index: linux-2.6.git/arch/x86/mm/ioremap.c
===
--- linux-2.6.git.orig/arch/x86/mm/ioremap.c2008-01-15 10:05:13.0 
-0800
+++ linux-2.6.git/arch/x86/mm/ioremap.c 2008-01-15 10:39:18.0 -0800
@@ -32,6 +32,39 @@
 }
 EXPORT_SYMBOL(ioremap_wc);
 
+/*
+ * Convert a physical pointer to a virtual kernel pointer for /dev/mem
+ * access
+ */
+void *xlate_dev_mem_ptr(unsigned long phys)
+{
+   void *addr;
+   unsigned long start = phys & PAGE_MASK;
+
+   /*
+* If any memory in PAGE_SIZE is valid, then we can use __va. Otherwise
+* ioremap and unmap the memory.
+*/
+   if (is_memory_any_valid(start, start + PAGE_SIZE))
+   return __va(phys);
+
+   addr = (void *)ioremap(start, PAGE_SIZE);
+   if (addr)
+   addr = (void *)((unsigned long)addr | (phys & ~PAGE_MASK));
+
+   return addr;
+}
+
+void unxlate_dev_mem_ptr(unsigned long phys, void *addr)
+{
+   unsigned long start = phys & PAGE_MASK;
+   if (is_memory_any_valid(start, start + PAGE_SIZE))
+   return;
+
+   iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
+   return;
+}
+
 int valid_phys_addr_range(unsigned long addr, size_t count)
 {
if (addr + count > __pa(high_memory))
Index: linux-2.6.git/drivers/char/mem.c
===
--- linux-2.6.git.orig/drivers/char/mem.c   2008-01-15 10:05:13.0 
-0800
+++ linux-2.6.git/drivers/char/mem.c2008-01-15 10:05:51.0 -0800
@@ -127,9 +127,14 @@
 * by the kernel or data corruption may occur
 */
ptr = xlate_dev_mem_ptr(p);
+   if (!ptr)
+   return -EFAULT;
 
if (copy_to_user(buf, ptr, sz))
return -EFAULT;
+
+   unxlate_dev_mem_ptr(p, ptr);
+
buf += sz;
p += sz;
count -= sz;
@@ -184,6 +189,11 @@
 * by the kernel or data corruption may occur
 */
ptr = xlate_dev_mem_ptr(p);
+   if (!ptr) {
+   if (written)
+   break;
+   return -EFAULT;
+   }
 
copied = copy_from_user(ptr, buf, sz);
if (copied) {
@@ -192,6 +202,9 @@
break;
return -EFAULT;
}
+
+   unxlate_dev_mem_ptr(p, ptr);
+
buf += sz;
p += sz;
count -= sz;
Index: linux-2.6.git/include/asm-generic/iomap.h
===
--- linux-2.6.git.orig/include/asm-generic/iomap.h  2008-01-15 
10:05:13.0 -0800
+++ linux-2.6.git/include/asm-generic/iomap.h   2008-01-15 10:23:24.0 
-0800
@@ -69,4 +69,8 @@
 #define ioremap_wc ioremap_nocache
 #endif
 
+#ifndef unxlate_dev_mem_ptr
+static inline void unxlate_dev_mem_ptr(unsigned long phys, void *addr) {}
+#endif
+
 #endif
Index: linux-2.6.git/include/asm-x86/io.h
===
--- linux-2.6.git.orig/include/asm-x86/io.h 2008-01-15 10:05:13.0 
-0800
+++ linux-2.6.git/include/asm-x86/io.h  2008-01-15 10:21:42.0 -0800
@@ -2,6 +2,7 @@
 #define _ASM_X86_IO_H
 
 #define ioremap_wc ioremap_wc
+#define unxlate_dev_mem_ptr unxlate_dev_mem_ptr
 
 #ifdef CONFIG_X86_32
 # include "io_32.h"
@@ -10,6 +11,8 @@
 #endif
 
 extern void __iomem * ioremap_wc(unsigned long offset, unsigned long size);
+extern void *xlate_dev_mem_ptr(unsigned long phys);
+extern void unxlate_dev_mem_ptr(unsigned long phys, void *addr);
 
 #define ARCH_HAS_VALID_PHYS_ADDR_RANGE
 
Index: linux-2.6.git/include/asm-x86/io_32.h
===
--- linux-2.6.git.orig/include/asm-x86/io_32.h  2008-01-15 10:05:13.0 
-0800
+++ linux-2.6.git/include/asm-x86/io_32.h   2008-01-15 10:05:51.0 
-0800
@@ -49,12 +49,6 @@
 #include 
 
 /*
- * Convert a physical pointer to a virtual kernel pointer for /dev/mem
- * access
- */
-#define xlate_dev_mem_ptr(p)   __va(p)
-
-/*
  * Convert a virtual cached pointer to an uncached pointer
  */
 #define xlate_dev_kmem_ptr(p)  p
Index: linux-2.6.git/include/asm-x86/io_64.h
===
--- linux-2.6.git.orig/include/asm-x86/io_64.h  2008-01-15 10:05:13.0 
-0800
+++ linux-2.6.git/include/asm-x86/io_64.h   2008-01-15 

RE: [PATCH] lost softirq, 2.6.24-rc7

2008-01-15 Thread Steven Rostedt

On Tue, 15 Jan 2008, Rowand, Frank wrote:

>
> Steve,
>
> You are totally correct.  I used the wrong words when I said
> "ksoftirqd thread runs".  My apologies for very misleading wording.
>
> I have updated the wording in-line below, in the original message to
> indicate that it is softirq threads, in the ksoftirqd() function, not
> the ksoftirqd thread.

Actually, it's the fact that the code you show runs in ___do_softirq().
In full PREEMPT_RT, that should never happen.

Well, there is one case that that code can run. It's when hardirqs and
softirqs have the same prio, and the hardirq is bound to a single CPU.
But we've had so much trouble with running softirqs from hardirq threads,
that I've disabled it for -rt3.

I'll be (hopefully) releasing -rt3 tonight. I'm not including this patch
because it should never hit those code paths. But feel free to complain if
you still see this issue, and it goes away with the patch. Actually, I've
been thinking of adding a

#ifdef CONFIG_PREEMPT_RT
WARN_ON(1);
#endif

at the start of ___do_softirq();

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/26] Permit filesystem local caching

2008-01-15 Thread Kyle Moffett

On Jan 15, 2008, at 18:46, David Howells wrote:

 (*) 01-keys-inc-payload.diff
 (*) 02-keys-search-keyring.diff
 (*) 03-keys-callout-blob.diff


One vaguely related question:  Is there presently any way to adjust  
the per-user max-key-data limit? I've been tinkering with using the  
new-ish MIT kerberos "KEYRING:" credentials-cache code to hold keys  
for persistent daemons.  Unfortunately "root" keeps hitting the limit  
even with only about 16 keys allocated across a few sessions.  After  
perusing the docs I can't find any documentation on adjusting the  
limits.


I'd really like some way to specifically allow root to allocate up to  
several megs worth of non-swappable key data, although I suppose just  
increasing the global limit slightly wouldn't be bad either.  If such  
functionality already exists then I'd appreciate a pointer to it (and  
possibly respond in kind with documentation patches).


Cheers,
Kyle Moffett

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fix Blackfin HARDWARE_PM support

2008-01-15 Thread Bryan Wu
On Jan 16, 2008 1:42 AM, Mathieu Desnoyers <[EMAIL PROTECTED]> wrote:
> This patch restores the blackfin Hardware Performance Monitor Profiling
> support that was killed by
> commit 09cadedbdc01f1a4bea1f427d4fb4642eaa19da9.
>
> Since there seems to be no good reason to behave differently from other
> architectures, it now automatically selects the hardware performance counters
> whenever the profiling is activated.
>
> mach-common/irqpanic.c: pm_overflow
> calls pm_overflow_handler which is in oprofile/op_model_bf533.c. I doubt that
> setting HARDWARE_PM as "m" will work at all, since the pm_overflow_handler
> should be in the core kernel image because it is called by irqpanic.c.
>
> Therefore, I change HARDWARE_PM from a tristate to a bool.
>
> The whole arch/$(ARCH)/oprofile/ is built depending on CONFIG_OPROFILE. Since
> part of the HARDWARE_PM support files sits in this directory, it makes sense 
> to
> also depend on OPROFILE, not only PROFILING. Since OPROFILE already depends on
> PROFILING, it is correct to only depend on OPROFILE only.
>
> Thanks to Adrian Bunk for finding this bug and providing an initial
> patch.
>
> Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
> CC: Linus Torvalds <[EMAIL PROTECTED]>
> CC: Adrian Bunk <[EMAIL PROTECTED]>
> CC: Randy Dunlap <[EMAIL PROTECTED]>
> CC: [EMAIL PROTECTED]
> CC: Robin Getz <[EMAIL PROTECTED]>
> ---

Thanks, finally we got this.

Acked-by: Bryan Wu <[EMAIL PROTECTED]>

>  arch/blackfin/Kconfig |4 
>  1 file changed, 4 insertions(+)
>
> Index: linux-2.6-lttng/arch/blackfin/Kconfig
> ===
> --- linux-2.6-lttng.orig/arch/blackfin/Kconfig  2007-12-29 11:00:05.0 
> -0500
> +++ linux-2.6-lttng/arch/blackfin/Kconfig   2007-12-29 11:25:39.0 
> -0500
> @@ -65,6 +65,10 @@ config GENERIC_CALIBRATE_DELAY
> bool
> default y
>
> +config HARDWARE_PM
> +   def_bool y
> +   depends on OPROFILE
> +
>  source "init/Kconfig"
>  source "kernel/Kconfig.preempt"
>
>
>
> --
> Mathieu Desnoyers
> Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/13] writeback bug fixes and simplifications take 2

2008-01-15 Thread Fengguang Wu
On Tue, Jan 15, 2008 at 10:33:01AM -0800, Michael Rubin wrote:
> On Jan 15, 2008 4:36 AM, Fengguang Wu <[EMAIL PROTECTED]> wrote:
> > Andrew,
> >
> > This patchset mainly polishes the writeback queuing policies.
> 
> Anyone know which tree is this patched based out of?

They are against the latest -mm tree, or 2.6.24-rc6-mm1.

> > The main goals are:
> >
> > (1) small files should not be starved by big dirty files
> > (2) sync as fast as possible for not-blocked inodes/pages
> > - don't leave them out; no congestion_wait() in between them
> > (3) avoid busy iowait for blocked inodes
> > - retry them in the next go of s_io(maybe at the next wakeup of pdflush)
> >
> 
> Fengguang do you have any specific tests for any of these cases? As I
> have posted earlier I am putting together a writeback test suite for
> test.kernel.org and if you have one (even if it's an ugly shell
> script) that would save me some time.

No, I just run tests with cp/dd etc.  I analyze the code and debug
traces a lot, and know that it works in the situations I can imagine.
But dedicated test suites are good in the long term.

> Also if you want any of mine let me know. :-)

OK, thank you.

Fengguang

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH 3/4] RT: remove finish_arch_switch

2008-01-15 Thread Rowand, Frank
Steve,

Thanks, I'll bring this up over on the linux-mips list to see how they
really want to handle it.

-Frank


-Original Message-
From: Steven Rostedt [mailto:[EMAIL PROTECTED]
Sent: Tue 1/15/2008 6:14 PM
To: Rowand, Frank
Cc: linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [PATCH 3/4] RT: remove finish_arch_switch
 
On Tue, Jan 15, 2008 at 02:20:46PM -0800, Frank Rowand wrote:
> From: Frank Rowand <[EMAIL PROTECTED]>
> 
> 
> Index: linux-2.6.24-rc7/include/asm-mips/mach-tx49xx/cpu-feature-overrides.h
> ===
> --- linux-2.6.24-rc7.orig/include/asm-mips/mach-tx49xx/cpu-feature-overrides.h
> +++ linux-2.6.24-rc7/include/asm-mips/mach-tx49xx/cpu-feature-overrides.h
> @@ -1,6 +1,13 @@
>  #ifndef __ASM_MACH_TX49XX_CPU_FEATURE_OVERRIDES_H
>  #define __ASM_MACH_TX49XX_CPU_FEATURE_OVERRIDES_H
>  
> +/* finish_arch_switch_empty is defined if we know finish_arch_switch() will
> + * be empty, based on the lack of features defined in this file.  This is
> + * needed because config preempt will barf in kernel/sched.c ifdef
> + * finish_arch_switch
> + */
> +#define finish_arch_switch_empty
> +
>  #define cpu_has_llsc 1
>  #define cpu_has_64bits   1
>  #define cpu_has_inclusive_pcaches0
> Index: linux-2.6.24-rc7/include/asm-mips/system.h
> ===
> --- linux-2.6.24-rc7.orig/include/asm-mips/system.h
> +++ linux-2.6.24-rc7/include/asm-mips/system.h
> @@ -70,6 +70,8 @@ do {
> \
>   (last) = resume(prev, next, task_thread_info(next));\
>  } while (0)
>  
> +/* preempt kernel barfs in kernel/sched.c ifdef finish_arch_switch */
> +#ifndef finish_arch_switch_empty

I'll take this patch for now, but currently it looks like a hack. I know
you said that, but I'm hoping someone will come up with a better
solution.

Thanks,

-- Steve

>  #define finish_arch_switch(prev) \
>  do { \
>   if (cpu_has_dsp)\
> @@ -77,6 +79,7 @@ do {
> \
>   if (cpu_has_userlocal)  \
>   write_c0_userlocal(current_thread_info()->tp_value);\
>  } while (0)
> +#endif
>  
>  static inline unsigned long __xchg_u32(volatile int * m, unsigned int val)
>  {
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/4] RT: change from raw_spinlock_t to __raw_spinlock_t

2008-01-15 Thread Steven Rostedt
On Tue, Jan 15, 2008 at 02:21:46PM -0800, Frank Rowand wrote:
> From: Frank Rowand <[EMAIL PROTECTED]>
> 
> Fix compile warning (which becomes compile error due to -Werror),
> by changing from raw_spinlock_t to __raw_spinlock_t.
> 

Applied.

Thanks,

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] RT: remove finish_arch_switch

2008-01-15 Thread Steven Rostedt
On Tue, Jan 15, 2008 at 02:20:46PM -0800, Frank Rowand wrote:
> From: Frank Rowand <[EMAIL PROTECTED]>
> 
> 
> Index: linux-2.6.24-rc7/include/asm-mips/mach-tx49xx/cpu-feature-overrides.h
> ===
> --- linux-2.6.24-rc7.orig/include/asm-mips/mach-tx49xx/cpu-feature-overrides.h
> +++ linux-2.6.24-rc7/include/asm-mips/mach-tx49xx/cpu-feature-overrides.h
> @@ -1,6 +1,13 @@
>  #ifndef __ASM_MACH_TX49XX_CPU_FEATURE_OVERRIDES_H
>  #define __ASM_MACH_TX49XX_CPU_FEATURE_OVERRIDES_H
>  
> +/* finish_arch_switch_empty is defined if we know finish_arch_switch() will
> + * be empty, based on the lack of features defined in this file.  This is
> + * needed because config preempt will barf in kernel/sched.c ifdef
> + * finish_arch_switch
> + */
> +#define finish_arch_switch_empty
> +
>  #define cpu_has_llsc 1
>  #define cpu_has_64bits   1
>  #define cpu_has_inclusive_pcaches0
> Index: linux-2.6.24-rc7/include/asm-mips/system.h
> ===
> --- linux-2.6.24-rc7.orig/include/asm-mips/system.h
> +++ linux-2.6.24-rc7/include/asm-mips/system.h
> @@ -70,6 +70,8 @@ do {
> \
>   (last) = resume(prev, next, task_thread_info(next));\
>  } while (0)
>  
> +/* preempt kernel barfs in kernel/sched.c ifdef finish_arch_switch */
> +#ifndef finish_arch_switch_empty

I'll take this patch for now, but currently it looks like a hack. I know
you said that, but I'm hoping someone will come up with a better
solution.

Thanks,

-- Steve

>  #define finish_arch_switch(prev) \
>  do { \
>   if (cpu_has_dsp)\
> @@ -77,6 +79,7 @@ do {
> \
>   if (cpu_has_userlocal)  \
>   write_c0_userlocal(current_thread_info()->tp_value);\
>  } while (0)
> +#endif
>  
>  static inline unsigned long __xchg_u32(volatile int * m, unsigned int val)
>  {
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] mmaped copy too slow?

2008-01-15 Thread KOSAKI Motohiro
Hi Paulo

> One thing you could also try is to pass MAP_POPULATE to mmap so that the 
> page tables are filled in at the time of the mmap, avoiding a lot of 
> page faults later.
> 
> Just my 2 cents,

OK, I will test your idea and report about tomorrow.
but I don't think page fault is major performance impact.

may be, below 2 things too big
  - stupid page reclaim
  - large cache pollution by memcpy.

Just my 2 cents :-p


- kosaki


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 3/5] add /dev/mem_notify device

2008-01-15 Thread KOSAKI Motohiro
Hi Pavel

> > err = poll(, 1, -1); // wake up at low memory
> > 
> > ...
> > 
> 
> Nice, this is really needed for openmoko, zaurus, etc
> 
> But this changelog needs to go into Documentation/...
> 
> ...and /dev/mem_notify is really a bad name. /dev/memory_low?
> /dev/oom?

thank you for your kindful advise.

but..

to be honest, my english is very limited.
I can't make judgments name is good or not.

Marcelo, What do you think his idea?




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] SH/Dreamcast - add support for GD-Rom CDROM drive on SEGA Dreamcast

2008-01-15 Thread Paul Mundt
On Sat, Jan 12, 2008 at 05:36:30AM -0800, Andrew Morton wrote:
> On Fri, 11 Jan 2008 21:56:49 + Adrian McMenamin <[EMAIL PROTECTED]> wrote:
> > +/* keep the function looking like the universal CD Rom specification - 
> > returning int*/
> > +static int gdrom_packetcommand(struct cdrom_device_info *cd_info, struct 
> > packet_command *command)
> > +{
> > +   gdrom_spicommand(>cmd, command->buflen);
> > +   return 0;
> > +}
> 
> Please pass the diff through scripts/checkpatch.pl.  Some things, like the
> above, you may choose to fix.  Some you definitely will.

On Tue, Jan 15, 2008 at 08:41:39PM +, Adrian McMenamin wrote:
> On 15/01/2008, Paul Mundt <[EMAIL PROTECTED]> wrote:
> > On Mon, Jan 14, 2008 at 11:17:15PM +, Adrian McMenamin wrote:
> 
> >
> > > +static bool gdrom_data_request(void)
> > > +{
> > > + return (ctrl_inb(GDROM_ALTSTATUS_REG) & 0x88) == 8;
> > > +}
> > > +
> > Andrew first pointed this out, and this is still broken.
> >
> 
> Eh, no, he didn't. What is wrong with it?
> 
Quoted above for your convenience.

> He compalined about excessively long busy waiting and then not
> checking if the busy wait failed. Both those have been fixed.

This was also covered in the checkpatch output that you conveniently
trimmed in your reply. If you can't see the problem, either your version
of checkpatch or your editor are broken. Huge amounts of your patch
continue to be whitespace damaged, and while you claim to have fixed
that, checkpatch continually whines about the same thing in each
iteration of your patch. Please just fix it up instead of arguing about
it, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/26] Permit filesystem local caching

2008-01-15 Thread James Morris
On Tue, 15 Jan 2008, David Howells wrote:

> 
>   (*) 04-keys-get-label.diff
> 
>   A patch to allow the security label of a key to be retrieved.
>   Included because of patches 05-08.
> 
>   (*) 05-security-current-fsugid.diff
>   (*) 06-security-separate-task-bits.diff
>   (*) 07-security-subjective.diff
>   (*) 08-security-secctx2secid.diff
>   (*) 09-security-additional-classes.diff
>   (*) 10-security-kernel_service-class.diff
>   (*) 11-security-kernel-service.diff

All of the security patches look ok to me.  From the SELinux pov, this 
will need to go in after Paul Moore's labeled networking update (hopefully 
very soon after the next merge window opens).


- James
-- 
James Morris
<[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] [MEMSTICK] Initial commit for Sony MemoryStick support

2008-01-15 Thread Alex Dubov

--- Mariusz Kozlowski <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> > Sony MemoryStick cards are used in many products manufactured by Sony. They
> > are available both as storage and as IO expansion cards. Currently, only
> > MemoryStick Pro storage cards are supported via TI FlashMedia MemoryStick
> > interface.
> 
> I tried it here and it doesn't work. My Vaio (PCG-FR285M) is from ~2003 (Is 
> it too old
> for this?). I have some memory stick cards around so If you want a tester 
> just drop me
> an email.
> 
> Regards,
> 
>   Mariusz
> 

The build year is nowhere as helpful as 'lspci -vv' output. Then, given that 
your vaio is equipped
with tifm controller, you'll have to build the driver with debugging enabled 
and send me the
relevant excerpt of your system log.

You should have the following modules loaded, by the way:
memstick
mspro_block
tifm_core
tifm_7xx1
tifm_ms

The autoloading is handled via udev (so the relevant rules are not there yet).



  

Never miss a thing.  Make Yahoo your home page. 
http://www.yahoo.com/r/hs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH 4/5] memory_pressure_notify() caller

2008-01-15 Thread KOSAKI Motohiro
Hi Daniel

> > > The notification fires after only ~100 MB allocated, i.e., when page
> > > reclaim is beginning to nag from page cache. Isn't this a bit early?
> > > Repeating the test with swap enabled results in a notification after
> > > ~600 MB allocated, which is more reasonable and just before the system
> > > starts to swap.
> >
> > Your issue may have more to do with the fact that the
> > highmem zone is 128MB in size and some balancing issues
> > between __alloc_pages and try_to_free_pages.
> 
> I don't think so. I ran the test again without highmem and noticed the
> same behaviour:

Thank you for good point out!
Could you please post your test program and reproduced method?

unfortunately,
my simple test is so good works in swapless system ;-)

thanks.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-15 Thread Daniel Phillips
Hi Pavel,

Along with this effort, could you let me know if the world actually
cares about online fsck?  Now we know how to do it I think, but is it
worth the effort.

Regards,

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


BUG related to serial console in 2.6.23.14 kernel.

2008-01-15 Thread Ben Greear

I see this crash when booting 2.6.23.14 on serial console.  This
kernel also has my patches applied, but nothing that should
affect the serial consle (tm).

2.6.20.12 plus my patches boots fine on this same system.

The hardware is some little embedded system with some pro/1000
NICs, Pentium-M processor, 1GB RAM, and generally pc-ish in nature.  It is 
booting
a minimal install of Fedora 8 off of a 2GB CF disk.

Fedora 8 is up-to-date as of today.

I'm happy to provide whatever other info is needed.

Thanks,
Ben


Starting udev: BUG: unable to handle kernel NULL pointer dereference at virtual 
address 00c printing eip:
*pde = 
Oops:  [#1]
PREEMPT
Modules linked in: 8139too mii e1000 i2c_i801 8250_pnp i2c_core button 
usb_storage sd_mod scsidCPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00210202   (2.6.23.14c3 #2)
EIP is at uart_write_room+0xd/0x20
eax: f75fa280   ebx: 0001   ecx: f6e78000   edx: 
esi: 0001   edi: f7fa3800   ebp: f6e79f1c   esp: f6e79ee4
ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
Process start_udev (pid: 367, ti=f6e78000 task=f7d0d0c0 task.ti=f6e78000)
Stack: c0240b4e f7fa395c f7f9a400 f7feea40 f7f9a400 00200246  f7d0d0c0
   c01153e0 f7fa396c f7fa396c 0001 f7fa3800 f7feea40 f6e79f50 c023e308
   0001 0001 b7d06000 c0240a50 f7fa380c c0424f40 0001 
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0xb1/0xe0
 [] show_registers+0x1ff/0x370
 [] die+0x104/0x250
 [] do_page_fault+0x35e/0x640
 [] error_code+0x6a/0x70
 [] tty_write+0x128/0x1c0
 [] redirected_tty_write+0x7c/0x80
 [] vfs_write+0x96/0x130
 [] sys_write+0x3d/0x70
 [] sysenter_past_esp+0x5f/0x89
 ===
Code: 08 89 ec 5d c3 8b 40 10 81 48 10 00 00 00 02 eb db 81 49 10 00 00 00 04 
eb c2 8d 74 26 0
EIP: [] uart_write_room+0xd/0x20 SS:ESP 0068:f6e79ee4


root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
kernel /boot/ct2.6.23.14c3.img ro root=LABEL=/ ide=nodma console=ttyS0,38400
   [Linux-bzImage, setup=0x2c00, size=0x1bf118]
initrd /boot/initrd-ct2.6.23.14c3.img
   [Linux-initrd @ 0x37d3, 0x2bfbc4 bytes]

Linux version 2.6.23.14c3 ([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 (Red 
Hat 4.1.1-51)8BIOS-provided physical RAM map:
 BIOS-e820:  - 0009a000 (usable)
 BIOS-e820: 0009a000 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 3fbf (usable)
 BIOS-e820: 3fbf - 3fbf3000 (ACPI NVS)
 BIOS-e820: 3fbf3000 - 3fc0 (ACPI data)
 BIOS-e820: fec0 - 0001 (reserved)
Warning only 896MB will be used.
Use a HIGHMEM enabled kernel.
896MB LOWMEM available.
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   229376
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0:0 ->   229376
DMI 2.3 present.
ACPI: RSDP 000F6460, 0014 (r0 IntelR)
ACPI: RSDT 3FBF3040, 0028 (r1 IntelR AWRDACPI 42302E31 AWRD0)
ACPI: FACP 3FBF30C0, 0074 (r1 IntelR AWRDACPI 42302E31 AWRD0)
ACPI: DSDT 3FBF3180, 3A5E (r1 INTELR AWRDACPI 1000 MSFT  10E)
ACPI: FACS 3FBF, 0040
ACPI: PM-Timer IO Port: 0x408
Allocating PCI resources starting at 4000 (gap: 3fc0:bf00)
Built 1 zonelists in Zone order.  Total pages: 227584
Kernel command line: ro root=LABEL=/ ide=nodma console=ttyS0,38400
ide_setup: ide=nodma : Prevented DMA
Found and enabled local APIC!
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 16384 bytes)
Detected 1599.858 MHz processor.
Console: colour VGA+ 80x25
console [ttyS0] enabled
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 902236k/917504k available (2326k kernel code, 14652k reserved, 1003k 
data, 212k init, )virtual kernel memory layout:
fixmap  : 0xfffb5000 - 0xf000   ( 296 kB)
vmalloc : 0xf880 - 0xfffb3000   ( 119 MB)
lowmem  : 0xc000 - 0xf800   ( 896 MB)
  .init : 0xc0444000 - 0xc0479000   ( 212 kB)
  .data : 0xc0345846 - 0xc04405e4   (1003 kB)
  .text : 0xc010 - 0xc0345846   (2326 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 3200.75 BogoMIPS (lpj=1600376)
Mount-cache hash table entries: 512
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 1024K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to e000.
CPU: Intel(R) Pentium(R) M processor 1600MHz stepping 05
Checking 'hlt' instruction... OK.
ACPI: Core revision 20070126
ACPI: setting ELCR to 0200 (from 0e20)
khelper used greatest stack depth: 7424 bytes left
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI 

Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles

2008-01-15 Thread Steven Rostedt

On Tue, 15 Jan 2008, Mathieu Desnoyers wrote:
>
> Ok, but what actually insures that the clock->cycle_* reads won't be
> reordered across the clocksource_read() ?



Hmm, interesting.I didn't notice that clocksource_read() is a static
inline.  I was thinking that since it was passing a pointer to a function,
gcc could not assume that it could move that code across it. But now
looking to see that clocksource_read is simply a static inline that does:

  cs->read();

But still, can gcc assume that it can push loads of unknown origin
variables across function calls? So something like:

static int *glob;

void foo(void) {
int x;

x = *glob;

bar();

if (x != *glob)
/* ... */
}

I can't see how any compiler could honestly move the loading of the first
x after the calling of bar(). With glob pointing to some unknown
variable, that may be perfectly fine for bar to modify.


> > >
> > > > +   cycle_raw = clock->cycle_raw;
> > > > +   cycle_last = clock->cycle_last;
> > > > +
> > > > +   /* read clocksource: */
> > > > +   cycle_now = clocksource_read(clock);

So the question here is,can cycle_raw and cycle_last be loaded from
the unknown source that clock points to after the call to
clocksource_read()?

 I'm thinking not.

> > > > +
> > > > +   /* calculate the delta since the last update_wall_time: 
> > > > */
> > > > +   cycle_delta = (cycle_now - cycle_last) & clock->mask;
> > > > +
> > > > +   } while (cycle_raw != clock->cycle_raw ||
> > > > +cycle_last != clock->cycle_last);
> > > > +
> > > > +   return cycle_raw + cycle_delta;
> > > > +}


-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-15 Thread Chris Mason
On Tue, 15 Jan 2008 20:24:27 -0500
"Daniel Phillips" <[EMAIL PROTECTED]> wrote:

> On Jan 15, 2008 7:15 PM, Alan Cox <[EMAIL PROTECTED]> wrote:
> > > Writeback cache on disk in iteself is not bad, it only gets bad
> > > if the disk is not engineered to save all its dirty cache on
> > > power loss, using the disk motor as a generator or alternatively
> > > a small battery. It would be awfully nice to know which brands
> > > fail here, if any, because writeback cache is a big performance
> > > booster.
> >
> > AFAIK no drive saves the cache. The worst case cache flush for
> > drives is several seconds with no retries and a couple of minutes
> > if something really bad happens.
> >
> > This is why the kernel has some knowledge of barriers and uses them
> > to issue flushes when needed.
> 
> Indeed, you are right, which is supported by actual measurements:
> 
> http://sr5tech.com/write_back_cache_experiments.htm
> 
> Sorry for implying that anybody has engineered a drive that can do
> such a nice thing with writeback cache.
> 
> The "disk motor as a generator" tale may not be purely folklore.  When
> an IDE drive is not in writeback mode, something special needs to done
> to ensure the last write to media is not a scribble.
> 
> A small UPS can make writeback mode actually reliable, provided the
> system is smart enough to take the drives out of writeback mode when
> the line power is off.

We've had mount -o barrier=1 for ext3 for a while now, it makes
writeback caching safe.  XFS has this on by default, as does reiserfs.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86] please setup git for http access

2008-01-15 Thread H. Peter Anvin

Joachim Deguara wrote:

Hello,
  I am trying to access the x86 git tree behind a proxy and therefore over the 
http address.  This works for Linus' tree but not for the x86 tree.  Please 
follow these guidelines to enable http access.


http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#exporting-via-http



Set up now.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-15 Thread Daniel Phillips
On Jan 15, 2008 7:15 PM, Alan Cox <[EMAIL PROTECTED]> wrote:
> > Writeback cache on disk in iteself is not bad, it only gets bad if the
> > disk is not engineered to save all its dirty cache on power loss,
> > using the disk motor as a generator or alternatively a small battery.
> > It would be awfully nice to know which brands fail here, if any,
> > because writeback cache is a big performance booster.
>
> AFAIK no drive saves the cache. The worst case cache flush for drives is
> several seconds with no retries and a couple of minutes if something
> really bad happens.
>
> This is why the kernel has some knowledge of barriers and uses them to
> issue flushes when needed.

Indeed, you are right, which is supported by actual measurements:

http://sr5tech.com/write_back_cache_experiments.htm

Sorry for implying that anybody has engineered a drive that can do
such a nice thing with writeback cache.

The "disk motor as a generator" tale may not be purely folklore.  When
an IDE drive is not in writeback mode, something special needs to done
to ensure the last write to media is not a scribble.

A small UPS can make writeback mode actually reliable, provided the
system is smart enough to take the drives out of writeback mode when
the line power is off.

Regards,

Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH]PCIE ASPM support - takes 2

2008-01-15 Thread Shaohua Li

On Tue, 2008-01-15 at 05:22 -0700, Matthew Wilcox wrote:
> On Tue, Jan 15, 2008 at 01:07:02PM +0800, Shaohua Li wrote:
> > > > +
> > > > +/* Called after ACPI is enabled */
> > > > +static int __init acpi_pcie_support_init(void)
> > > > +{
> > > > +   pcie_aspm_init();
> > > > +   return 0;
> > > > +}
> > > > +fs_initcall(acpi_pcie_support_init);
> > > 
> > > Is there any reason to put this in here instead of just making
> > > pcie_aspm_init an initcall?
> > yes, this will evaluate some ACPI methods, so must be called after ACPI
> > is initialized, which is a sub_system call
> 
> I wasn't saying that you should change it from being an fs_initcall.  I
> was saying that you might want to consider deleting this function and
> adding
> 
> fs_initcall(pcie_aspm_init);
> 
> in the file that defines pcie_aspm_init.
I thought we'd better put all ACPI support bits support in one routine
call, like OSC_EXT_PCI_CONFIG_SUPPORT, we didn't do it so far, so I
added a new routine. But I might be over thinking.

Thanks,
Shaohua

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHSET] printk: implement printk_header() and merging printk

2008-01-15 Thread Tejun Heo
Tejun Heo wrote:
> Hello, all.
> 
> This patchset implements printk_header() and mprintk - merging printk
> - to make printing multiline messages and assembling message
> piece-by-piece easier.
> 
> In a nutshell, printk_header() lets you do the following atomically
> (against other messages).
> 
>  code:
>   printk(KERN_INFO "ata1.00: ", "line0\nline1\nline2\n");

That should have been printk_header instead of printk.  Sorry.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mmu notifiers #v2

2008-01-15 Thread Andrea Arcangeli
On Wed, Jan 16, 2008 at 07:18:53AM +1100, Benjamin Herrenschmidt wrote:
> Do you have cases where it's -not- called with the PTE lock held ?

For invalidate_page no because currently it's only called next to the
ptep_get_and_clear that modifies the pte and requires the pte
lock. invalidate_range/release are called w/o pte lock held.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   4   5   6   7   8   9   10   >