date:20070907

Re: Why do so many machines need "noapic"?

2007-09-07 Thread Prakash Punnoor

On the day of Friday 07 September 2007 Chuck Ebbert hast written:
> On 09/06/2007 07:31 AM, Andi Kleen wrote:
> > Chuck Ebbert <[EMAIL PROTECTED]> writes:
> >> Some systems lock up without the noapic option.
> >
> > Please find patterns: cpu type, chipsets, mainboard vendors etc.
>
> This is the first one I've actually had in front of me:
>
>   HP TX1000 notebook
>   Nvidia C51/MCP51 mobile chipset

Do you have a hpet? If not, have you tried using acpi_use_timer_override with 
apic?

bye,
-- 
(°= =°)
//\ Prakash Punnoor /\\
V_/ \_V



Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-09-07 Thread Mike Snitzer

On 9/5/07, Daniel Phillips <[EMAIL PROTECTED]> wrote:
> On Wednesday 05 September 2007 03:42, Christoph Lameter wrote:
> > On Wed, 5 Sep 2007, Daniel Phillips wrote:
> > > If we remove our anti-deadlock measures, including the
> > > ddsnap.vm.fixes (a roll-up of Peter's patch set) and the request
> > > throttling code in dm-ddsnap.c, and apply your patch set instead,
> > > we hit deadlock on the socket write path after a few hours
> > > (traceback tomorrow).  So your patch set by itself is a stability
> > > regression.
> >
> > Na, that cannot be the case since it only activates when an OOM
> > condition would otherwise result.
>
> I did not express myself clearly then.  Compared to our current
> anti-deadlock patch set, your patch set is a regression.  Because
> without help from some of our other patches, it does deadlock.
> Obviously, we cannot have that.

Can you be specific about which changes to existing mainline code were
needed to make recursive reclaim "work" in your tests (albeit less
ideally than peterz's patchset in your view)?

Also, in a previous post you stated:

>   Just to recap, we have identified two essential ingredients in the
> recipe for writeout deadlock prevention:
>
>  1) Throttle block IO traffic to a bounded maximum memory use.
>
>  2) Guarantee availability of the required amount of memory.

Which changes allowed you to address 1?  I had a look at the various
patches you provided (via svn) and it wasn't clear which subset
fulfilled 1 for you.  Does it work for all Block IO and not just
specially tuned drivers like ddsnap et al?
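
As a rough illustration of ingredient 1 only, bounded accounting of
in-flight IO memory might be sketched as below.  This is not code from
ddsnap or from any of the patch sets discussed; every name in it is
hypothetical, and a real implementation would need locking and would
put the submitter to sleep rather than simply refusing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting state: memory pinned by in-flight block IO. */
struct io_throttle {
    size_t limit;     /* upper bound on bytes allowed in flight */
    size_t in_flight; /* bytes currently reserved; always <= limit */
};

/* Reserve memory for a request; refuse if the bound would be exceeded. */
static bool io_throttle_reserve(struct io_throttle *t, size_t bytes)
{
    if (bytes > t->limit - t->in_flight)
        return false; /* caller has to wait for a completion */
    t->in_flight += bytes;
    return true;
}

/* Called on request completion: give the reserved memory back. */
static void io_throttle_release(struct io_throttle *t, size_t bytes)
{
    t->in_flight -= bytes;
}
```

The point of the sketch is that the check happens before the request
allocates anything, so the memory needed by in-flight IO never exceeds
the limit that ingredient 2 has to guarantee.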

regards,
Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] crypto: blkcipher_get_spot() handling of buffer at end of page

2007-09-07 Thread Herbert Xu

On Fri, Sep 07, 2007 at 05:09:17PM -0700, Bob Gilligan wrote:
>
> Proposed patch is below.  We found the problem and tested this fix in
> 2.6.20, but it looks like the relevant code in blkcipher.c is the same
> in the latest tree.

Good catch!

I've fixed this slightly differently.  Also, the kmalloc size
in the case where it does straddle a page isn't enough either.

[CRYPTO] blkcipher: Fix handling of kmalloc page straddling

The function blkcipher_get_spot tries to return a buffer of
the specified length that does not straddle a page.  It has
an off-by-one bug so it may advance a page unnecessarily.

What's worse, one of its callers doesn't provide a buffer
that's sufficiently long for this operation.

This patch fixes both problems.  Thanks to Bob Gilligan for
diagnosing this problem and providing a fix.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index 7755834..469cb7f 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -59,11 +59,13 @@ static inline void blkcipher_unmap_dst(struct blkcipher_walk *walk)
    scatterwalk_unmap(walk->dst.virt.addr, 1);
 }
 
+/* Get a spot of the specified length that does not straddle a page.
+ * The caller needs to ensure that there is enough space for this operation.
+ */
 static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len)
 {
-   if (offset_in_page(start + len) < len)
-       return (u8 *)((unsigned long)(start + len) & PAGE_MASK);
-   return start;
+   u8 *end_page = (u8 *)(((unsigned long)(start + len - 1)) & PAGE_MASK);
+   return start > end_page ? start : end_page;
 }
 
 static inline unsigned int blkcipher_done_slow(struct crypto_blkcipher *tfm,
@@ -155,7 +157,8 @@ static inline int blkcipher_next_slow(struct blkcipher_desc *desc,
    if (walk->buffer)
        goto ok;
 
-   n = bsize * 2 + (alignmask & ~(crypto_tfm_ctx_alignment() - 1));
+   n = bsize * 3 - (alignmask + 1) +
+       (alignmask & ~(crypto_tfm_ctx_alignment() - 1));
    walk->buffer = kmalloc(n, GFP_ATOMIC);
    if (!walk->buffer)
        return blkcipher_walk_done(desc, walk, -ENOMEM);
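
The end_page approach can be checked in userspace with a small sketch.
It assumes 4 KiB pages and plain integer addresses, and it takes the
later of start and end_page, which is what the comment in the patch
describes.  All names below are stand-ins, not kernel symbols.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Start of the page that holds the last byte of [start, start + len). */
static unsigned long sketch_end_page(unsigned long start, unsigned int len)
{
    return (start + len - 1) & SKETCH_PAGE_MASK;
}

/* Later of start and end_page: start itself unless the buffer straddles. */
static unsigned long sketch_get_spot(unsigned long start, unsigned int len)
{
    unsigned long end_page = sketch_end_page(start, len);

    return start > end_page ? start : end_page;
}

/* True if [addr, addr + len) lies entirely within one page. */
static bool sketch_in_one_page(unsigned long addr, unsigned int len)
{
    return (addr & SKETCH_PAGE_MASK) ==
           ((addr + len - 1) & SKETCH_PAGE_MASK);
}
```

For any start, the returned spot is at most len - 1 bytes past start,
which is why the caller has to supply extra room beyond len.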

Re: Why do so many machines need "noapic"?

2007-09-07 Thread Al Boldi

Chuck Ebbert wrote:
> On 09/06/2007 07:31 AM, Andi Kleen wrote:
> > Chuck Ebbert <[EMAIL PROTECTED]> writes:
> >> Some systems lock up without the noapic option.
> >
> > Please find patterns: cpu type, chipsets, mainboard vendors etc.
>
> This is the first one I've actually had in front of me:
>
>   HP TX1000 notebook
>   Nvidia C51/MCP51 mobile chipset
>
> Booting with "noapic" gives some very strange results. This is two
> snapshots of /proc/interrupts taken one second apart. It almost looks
> like timer interrupts are occurring on IRQ 0 and IRQ7 on different
> CPUs:
>
>            CPU0       CPU1
>   0:     446096       6224  XT-PIC-XT  timer
>   1:        342          6  XT-PIC-XT  i8042
>   2:          0          0  XT-PIC-XT  cascade
>   5:   3099865  XT-PIC-XT  sata_nv
>   7:       8145     494718  XT-PIC-XT  ehci_hcd:usb2
>   8:          0          0  XT-PIC-XT  rtc0
>   9:        323          9  XT-PIC-XT  acpi
>  10:        136         36  XT-PIC-XT  HDA Intel
>  11:      43884       1091  XT-PIC-XT  ohci_hcd:usb1, eth0
>  12:        104         19  XT-PIC-XT  i8042
>  14:       1011         25  XT-PIC-XT  libata
>  15:          0          0  XT-PIC-XT  libata
> NMI:  0  0
> LOC:   6212 445951
> ERR: 403241
> MIS:  0

You may want to try to reconfigure your bios to reserve irq 5/7 for isa only.

Then post /proc/interrupts again.


Thanks!

--
Al


Re: sata & scsi suggestion for make menuconfig

2007-09-07 Thread Al Boldi

Krzysztof Halasa wrote:
>> Ok, but that's not the most common situation. What I'm suggesting is a
>> warning or a "please note" popup. Not necessarily an error or a
>> refuse-to-continue thing.
>
>What IMHO makes sense is changing all references to SCSI CDROM,
>SCSI DISK etc. to just CDROM, DISK, and changing SCSI (menu) to
>something like MASS STORAGE.

I once sent a patch to make libata a submenu of scsi.

[PATCH] libata Kconfig: Allow libata to be selected from within the SCSI submenu
From: Al Boldi <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
CC: Alan Cox <[EMAIL PROTECTED]>, linux-kernel@vger.kernel.org
Date: 08/01/07 07:22 am

Move libata Kconfig sourcing from the drivers Kconfig into the SCSI Kconfig.

This allows the user to quickly select additional disk/tape/cdrom support 
from within the same menu.

Signed-off-by: Al Boldi <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
---
--- a/drivers/Kconfig   2007-05-02 17:25:30.0 +0300
+++ b/drivers/Kconfig   2007-08-01 06:33:13.0 +0300
@@ -22,8 +22,6 @@ source "drivers/ide/Kconfig"
 
 source "drivers/scsi/Kconfig"
 
-source "drivers/ata/Kconfig"
-
 source "drivers/cdrom/Kconfig"
 
 source "drivers/md/Kconfig"
--- a/drivers/scsi/Kconfig  2007-07-09 06:38:37.0 +0300
+++ b/drivers/scsi/Kconfig  2007-08-01 06:46:42.0 +0300
@@ -7,6 +7,8 @@ config RAID_ATTRS
    ---help---
      Provides RAID
 
+source "drivers/ata/Kconfig"
+
 config SCSI
    tristate "SCSI device support"
    depends on BLOCK



Thanks!

--
Al


Re: Platform device id

2007-09-07 Thread Henrique de Moraes Holschuh

On Fri, 07 Sep 2007, David Brownell wrote:
> > The platform for a ThinkPad is either i386 or amd64.
> 
> Both i386 and x86_64 are clearly an "arch".  They even live in
> an "arch" directory:  linux/arch/{i386,x86_64}.

Well, I stand corrected on the "platform" term, then.

> When folk talk about a "PC Platform", they're talking about a
> thing that doesn't quite exist in today's Linux tree.  If we
> ever get to an arch/x86, that could have a plat-pc (or mach-pc)
> subdirectory.  ThinkPads should then be a variant of that.

You'd have so many, it wouldn't be funny.  It would also cause some
headaches for distros, unless one can have an "all platform" kernel or
somesuch.

> > I don't feel like drivers like hdaps, thinkpad-acpi, dock, bay,
> > and many others really belong in the platform bus.  But that's
> > what happens right now.
> 
> As a rule, there needs to be a Good Reason to create a new bus
> type.  A "feel" is a pretty weak reason...

The "feel" is there because:

1. Comments about how what we do is wrong for the platform bus (i.e.  adding
   the devices and the driver in the same module). Even the documentation
   for platform devices make it quite clear we are abusing it.  There was
   one of those comments in this very thread.

2. The fact that a module that has a number of different devices has to
   register itself a number of times as a driver, if it wants to name the
   devices something different...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [PATCH 5/20] Introduce struct upid

2007-09-07 Thread sukadev

Andrew Morton [EMAIL PROTECTED] wrote:
| On Fri, 10 Aug 2007 15:47:59 +0400
| [EMAIL PROTECTED] wrote:
| 
| >  struct pid
| >  {
| > atomic_t count;
| > @@ -50,6 +50,8 @@ struct pid
| > /* lists of tasks that use this pid */
| > struct hlist_head tasks[PIDTYPE_MAX];
| > struct rcu_head rcu;
| > +   int level;
| > +   struct upid numbers[1];
| 
| You can make this have size [0] now.  It's a gcc extension and
| is used elsewhere in the kernel.

Sorry, we did not respond to this yet :-)

Well, every process has at least one 'struct upid'. The only "cost"
I see with size [1] is having to subtract 1 in create_pid_cachep().

Besides, we create/initialize the 'struct pid' for the idle process
by hand (see INIT_STRUCT_PID in init_task.h).

If we set this size to [0] now, we would need to dynamically allocate
a 'struct upid' during early boot and attach this upid to init_struct_pid.

Or is there an easy way to attach a 'upid' to init_struct_pid statically?
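
To make the size [1] trade-off concrete, here is a minimal userspace
sketch.  The types and names are stand-ins for illustration, not the
kernel's struct pid, struct upid or create_pid_cachep().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the structures under discussion. */
struct upid_sketch {
    int nr;
};

struct pid_sketch {
    int level;                     /* deepest namespace level in use */
    struct upid_sketch numbers[1]; /* one entry always present */
};

/* Bytes needed for a pid seen in nr_levels namespaces (nr_levels >= 1).
 * One upid is already part of the struct, hence the "- 1". */
static size_t pid_sketch_size(unsigned int nr_levels)
{
    return sizeof(struct pid_sketch) +
           (nr_levels - 1) * sizeof(struct upid_sketch);
}

/* With size [1] the boot-time object can be initialized statically. */
static struct pid_sketch init_pid_sketch = {
    .level = 0,
    .numbers = { { .nr = 0 } },
};
```

The subtraction in pid_sketch_size() is the only extra cost; the static
initializer needs no allocation at all.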

Suka


PPC64 AT_IGNOREPPC question

2007-09-07 Thread Vitaly Mayatskikh


include/asm-powerpc/elf.h:289

Why do we need the second AT_IGNOREPPC entry here?

There is a mm_struct->saved_auxv overflow on PPC64 with AT_VECTOR_SIZE
== 44 (maybe on PPC32 too, not checked) when all entries are added to
it. I've removed the second AT_IGNOREPPC from ARCH_DLINFO to prevent the
overflow and checked it on an IBM OpenPower 720; the system (Fedora 7)
still runs fine. Have I missed something tricky?

Re: 2.6.23-rc4-mm1 myri10ge module link error on x86_64

2007-09-07 Thread Daniel Walker

On Fri, 2007-09-07 at 19:59 -0400, Jeff Garzik wrote:

> > 
> > commit 9fd380e892e078b582920325357292c07cc9
> > Author: David S. Miller <[EMAIL PROTECTED](none)>
> > Date:   Thu Sep 6 21:44:36 2007 +0100
> > 
> > [MYRI10GE]: Need to select INET_LRO.
> > 
> > Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
> > 
> > diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> > index b92b7dc..7d1a84e 100644
> > --- a/drivers/net/Kconfig
> > +++ b/drivers/net/Kconfig
> > @@ -2496,6 +2496,7 @@ config MYRI10GE
> >     depends on PCI
> >     select FW_LOADER
> >     select CRC32
> > +   select INET_LRO

Didn't catch this one .. Guess -mm a little out of date..

Daniel


[PATCH] crypto: blkcipher_get_spot() handling of buffer at end of page

2007-09-07 Thread Bob Gilligan

Hi -- There appears to be a bug in the function blkcipher_get_spot(),
which resides in crypto/blkcipher.c.  This small function reads:

static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len)
{
        if (offset_in_page(start + len) < len)
                return (u8 *)((unsigned long)(start + len) & PAGE_MASK);
        return start;
}

This function is apparently attempting to detect the case where the
buffer pointed to by "start", of length "len" bytes, straddles a page
boundary.  In that case, it will return the address of the start of the
last page that the buffer resides on.  It works correctly in all cases
except when the buffer resides entirely within one page but is located
at the end of that page (i.e. the last byte of the buffer is at
address 0x.fff).  In that one case, this function will return the
address of the start of the next page (i.e. one byte beyond the end of
the buffer), when it should return "start".

For example, say blkcipher_get_spot() is called with start=0xf7e71ff0,
and len=16.  The function returns 0xf7e72000.  The correct return
value should be 0xf7e71ff0.
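
The example can be reproduced outside the kernel with a small sketch
that models addresses as integers and assumes 4 KiB pages.  The helper
names are made up for the illustration; get_spot_old() mirrors the
function quoted above and get_spot_fixed() mirrors the proposed patch.

```c
#define SK_PAGE_SIZE 4096UL
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))

/* Offset of addr within its page. */
static unsigned long sk_offset_in_page(unsigned long addr)
{
    return addr & ~SK_PAGE_MASK;
}

/* Current logic: tests the address one byte past the end of the buffer. */
static unsigned long get_spot_old(unsigned long start, unsigned int len)
{
    if (sk_offset_in_page(start + len) < len)
        return (start + len) & SK_PAGE_MASK;
    return start;
}

/* Proposed logic: tests the address of the last byte of the buffer. */
static unsigned long get_spot_fixed(unsigned long start, unsigned int len)
{
    unsigned long end = start + len - 1;

    if (sk_offset_in_page(end) < len - 1)
        return end & SK_PAGE_MASK;
    return start;
}
```

With start=0xf7e71ff0 and len=16, get_spot_old() yields 0xf7e72000
while get_spot_fixed() yields 0xf7e71ff0.  For a buffer that really
straddles, such as start=0xf7e71ff8, both yield 0xf7e72000.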

This bug appears to be the cause of occasional crashes within the slab
allocator that we have seen testing ipsec with AES encryption.
Tracking the crashes down, we see occasions when the kmalloc() call in
blkcipher_next_slow() is being called with n=32, which is satisfied
out of the size-32 slab.  The kmalloc'ed buffer is passed to
blkcipher_get_spot() twice to generate two pointers into that buffer,
then scatterwalk_copychunks() copies into the second pointer.  In the
case when the buffer returned by kmalloc() occupies the last 32-byte
buffer on the page, the second call to blkcipher_get_spot() returns
the address of the start of the NEXT page, and scatterwalk_copychunks()
over-writes 16 bytes at that address.  Since the next page often holds
another slab, this stomps on the list_head in the slab struct, and the
system crashes when the slab allocator dereferences those pointers at a
later point in time.  We've seen several crashes in free_block().

Proposed patch is below.  We found the problem and tested this fix in
2.6.20, but it looks like the relevant code in blkcipher.c is the same
in the latest tree.

Bob.




Correctly handle the case where the buffer passed into
blkcipher_get_spot() resides at the end of a page.

Signed-off-by: Bob Gilligan <[EMAIL PROTECTED]>
---

diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index 6e93004..3b6734e 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -60,8 +60,10 @@ static inline void blkcipher_unmap_dst(struct blkcipher_walk *walk)

 static inline u8 *blkcipher_get_spot(u8 *start, unsigned int len)
 {
-   if (offset_in_page(start + len) < len)
-       return (u8 *)((unsigned long)(start + len) & PAGE_MASK);
+   u8 *end = (start + len - 1);
+
+   if (offset_in_page(end) < (len - 1))
+       return (u8 *)((unsigned long)(end) & PAGE_MASK);
    return start;
 }



[PATCH 2.6.23-rc5] ata_piix: replace spaces with tabs

2007-09-07 Thread Jason Gaston

This patch removes some incorrect formatting spaces and replaces them with tabs.

Signed-off-by: Jason Gaston <[EMAIL PROTECTED]>

--- linux-2.6.23-rc5/drivers/ata/ata_piix.c.orig 2007-09-07 17:11:55.0 -0700
+++ linux-2.6.23-rc5/drivers/ata/ata_piix.c 2007-09-07 17:13:24.0 -0700
@@ -445,15 +445,15 @@
 };
 
 static const struct piix_map_db tolapai_map_db = {
-.mask = 0x3,
-.port_enable = 0x3,
-.map = {
-/* PM   PS   SM   SS   MAP */
-{  P0,  NA,  P1,  NA }, /* 00b */
-{  RV,  RV,  RV,  RV }, /* 01b */
-{  RV,  RV,  RV,  RV }, /* 10b */
-{  RV,  RV,  RV,  RV },
-},
+   .mask = 0x3,
+   .port_enable = 0x3,
+   .map = {
+   /* PM   PS   SM   SS   MAP */
+   {  P0,  NA,  P1,  NA }, /* 00b */
+   {  RV,  RV,  RV,  RV }, /* 01b */
+   {  RV,  RV,  RV,  RV }, /* 10b */
+   {  RV,  RV,  RV,  RV },
+   },
 };
 
 static const struct piix_map_db *piix_map_db_table[] = {

Re: Platform device id

2007-09-07 Thread David Brownell

> > (Also, note that "platform", "host", and "board" are ambiguous.
> > In some contexts each is synonymous; in others, not.  I avoid
>
> In this specific case I am talking about, they're not.

That is, in *YOUR* usage context they're not.  I had to parse
what you wrote a few times before your comments about $SUBJECT
started to make sense.  I've *never* heard "host" used that way,
and rarely hear "platform" used that way either.


> The platform for a ThinkPad is either i386 or amd64.

Both i386 and x86_64 are clearly an "arch".  They even live in
an "arch" directory:  linux/arch/{i386,x86_64}.

When folk talk about a "PC Platform", they're talking about a
thing that doesn't quite exist in today's Linux tree.  If we
ever get to an arch/x86, that could have a plat-pc (or mach-pc)
subdirectory.  ThinkPads should then be a variant of that.


> I don't feel like drivers like hdaps, thinkpad-acpi, dock, bay,
> and many others really belong in the platform bus.  But that's
> what happens right now.

As a rule, there needs to be a Good Reason to create a new bus
type.  A "feel" is a pretty weak reason...

- Dave


Re: 2.6.23-rc4-mm1 myri10ge module link error on x86_64

2007-09-07 Thread Jeff Garzik


David Miller wrote:

From: David Miller <[EMAIL PROTECTED]>
Date: Thu, 06 Sep 2007 13:40:38 -0700 (PDT)


From: Mathieu Desnoyers <[EMAIL PROTECTED]>
Date: Thu, 6 Sep 2007 15:37:51 -0400


I got a link error on myri10ge when building 2.6.23-rc4-mm1 on x86_64 :

ERROR: "lro_flush_all" [drivers/net/myri10ge/myri10ge.ko] undefined!
ERROR: "lro_receive_frags" [drivers/net/myri10ge/myri10ge.ko] undefined!
make[2]: *** [__modpost] Error 1
make[1]: *** [modules] Error 2
make: *** [_all] Error 2

myri10ge needs some LRO ifdeffery.


Actually the fix is even simpler, missing select in Kconfig.

I've checked the following fix for this into the net-2.6.24
tree.

commit 9fd380e892e078b582920325357292c07cc9
Author: David S. Miller <[EMAIL PROTECTED](none)>
Date:   Thu Sep 6 21:44:36 2007 +0100

[MYRI10GE]: Need to select INET_LRO.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>


diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b92b7dc..7d1a84e 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2496,6 +2496,7 @@ config MYRI10GE
    depends on PCI
    select FW_LOADER
    select CRC32
+   select INET_LRO


Yes, that's the correct fix.  ACK.



Re: [PATCH 2.6.23-rc4][reRESEND] ahci: RAID mode SATA patch for Intel Tolapai

2007-09-07 Thread Jeff Garzik


Gaston, Jason D wrote:

At this time, I don't have any way to test those particular DeviceID's
and I know that the AHCI mode DeviceID works by using the class code
support.  So, I would like to just leave them as they are, if that is
ok.



Fine by me...  Overall I'll follow "vendor's best advice" here :)

Jeff



Re: [BUG] 2.6.23-rc5 kernel BUG at fs/nfs/nfs4xdr.c:945

2007-09-07 Thread Michal Piotrowski

Hi,

On 07/09/2007, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> Sep  7 11:42:49 p55lp2 kernel: kernel BUG at fs/nfs/nfs4xdr.c:945!
> Sep  7 11:42:49 p55lp2 kernel: Oops: Exception in kernel mode, sig: 5 [#1]
> Sep  7 11:42:49 p55lp2 kernel: SMP NR_CPUS=128 NUMA pSeries
> Sep  7 11:42:49 p55lp2 kernel: Modules linked in: nfs lockd nfs_acl
> sunrpc ipv6 loop dm_mod ibmveth sg ibmvscsic sd_mod scsi_mod
> Sep  7 11:42:49 p55lp2 kernel: NIP: d0378044 LR:
> d0378034 CTR: 801c5840
> Sep  7 11:42:49 p55lp2 kernel: REGS: c000d971b050 TRAP: 0700   Not
> tainted  (2.6.23-rc5-ppc64)
> Sep  7 11:42:49 p55lp2 kernel: MSR: 80029032   CR:
> 28000444  XER: 0014
> Sep  7 11:42:49 p55lp2 kernel: TASK = c2787740[11508] 'fsstress'
> THREAD: c000d9718000 CPU: 1
> Sep  7 11:42:49 p55lp2 kernel: GPR00: 0001 c000d971b2d0
> d03bd648 0037
> Sep  7 11:42:49 p55lp2 kernel: GPR04:  
>  
> Sep  7 11:42:49 p55lp2 kernel: GPR08: 0002 c0616538
> c000ef7afb58 c0616540
> Sep  7 11:42:49 p55lp2 kernel: GPR12: 4000 c05e4a80
>  200b2510
> Sep  7 11:42:49 p55lp2 kernel: GPR16: 20105550 200b2534
> 2008c15c 0001
> Sep  7 11:42:49 p55lp2 kernel: GPR20:  0001
> f000 c000d971ba30
> Sep  7 11:42:49 p55lp2 kernel: GPR24: d034f524 c000dc4f8054
> c000d971b7d0 c000d9d313f0
> Sep  7 11:42:49 p55lp2 kernel: GPR28: 0276 2200
> d03b8d78 
> Sep  7 11:42:49 p55lp2 kernel: NIP [d0378044]
> .encode_lookup+0x6c/0xbc [nfs]
> Sep  7 11:42:49 p55lp2 kernel: LR [d0378034]
> .encode_lookup+0x5c/0xbc [nfs]
> Sep  7 11:42:49 p55lp2 kernel: Call Trace:
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b2d0] [d0378034]
> .encode_lookup+0x5c/0xbc [nfs] (unreliable)
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b370] [d0379f8c]
> .nfs4_xdr_enc_lookup+0x78/0xbc [nfs]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b440] [d0314534]
> .rpcauth_wrap_req+0xe4/0x124 [sunrpc]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b4f0] [d030a790]
> .call_transmit+0x218/0x2b8 [sunrpc]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b590] [d03124d8]
> .__rpc_execute+0xd4/0x368 [sunrpc]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b630] [d030b114]
> .rpc_do_run_task+0xc8/0x104 [sunrpc]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b6e0] [d030b224]
> .rpc_call_sync+0x2c/0x64 [sunrpc]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b760] [d036ef04]
> ._nfs4_proc_lookupfh+0xd4/0x124 [nfs]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b850] [d03719a0]
> ._nfs4_proc_lookup+0x80/0x21c [nfs]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b910] [d0371ba4]
> .nfs4_proc_lookup+0x68/0xac [nfs]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971b9c0] [d0354bf4]
> .nfs_lookup+0x158/0x334 [nfs]
> Sep  7 11:42:49 p55lp2 kernel: [c000d971bbc0] [c00f3a28]
> .lookup_hash+0xfc/0x140
> Sep  7 11:42:49 p55lp2 kernel: [c000d971bc60] [c00f7b28]
> .sys_renameat+0x164/0x228
> Sep  7 11:42:49 p55lp2 kernel: [c000d971be30] [c0008534]
> syscall_exit+0x0/0x40
> Sep  7 11:42:49 p55lp2 kernel: Instruction dump:
> Sep  7 11:42:49 p55lp2 kernel: e8410028 7fa4eb78 7c7f1b79 7fb80026
> 40820014 e8be83a8 e87e8350 4800c5f9
> Sep  7 11:42:49 p55lp2 kernel: e8410028 7fb80120 7c180026 54001ffe
> <0b00> 380f 7b850020 387f0008

Is this a post 2.6.22 regression? Have you tried 2.6.23-rc5-git1?
(There are a few nfs fixes)

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: [PATCH 2.6.23-rc4][reRESEND] ata_piix: IDE mode SATA patch for Intel Tolapai

2007-09-07 Thread Jeff Garzik


Gaston, Jason D wrote:

-Original Message-
From: Jeff Garzik [mailto:[EMAIL PROTECTED]
Sent: Friday, August 31, 2007 12:51 AM
To: Gaston, Jason D
Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org;
[EMAIL PROTECTED]
Subject: Re: [PATCH 2.6.23-rc4][reRESEND] ata_piix: IDE mode SATA patch

for

Intel Tolapai

Jason Gaston wrote:

Resend trying to remove 8-bit characters in the email.

This patch adds the Intel Tolapai IDE mode SATA controller DID's.

Signed-off-by:  Jason Gaston <[EMAIL PROTECTED]>

applied


Jeff,

I just noticed that the following section came through as spaces instead
of tabs.  Do I need to resend a corrected version?

+static const struct piix_map_db tolapai_map_db = {
+.mask = 0x3,
+.port_enable = 0x3,
+.map = {
+/* PM   PS   SM   SS   MAP */
+{  P0,  NA,  P1,  NA }, /* 00b */
+{  RV,  RV,  RV,  RV }, /* 01b */
+{  RV,  RV,  RV,  RV }, /* 10b */
+{  RV,  RV,  RV,  RV },
+},
+};


that's always nice, yes :)

even better would be to run a script through #upstream, accomplishing 
the same thing but for many drivers ;-)




Re: BUG: unable to handle kernel NULL pointer dereference<1>

2007-09-07 Thread Michal Piotrowski

Hi Mark,

[Adding netdev to CC]

On 07/09/2007, Mark Nipper <[EMAIL PROTECTED]> wrote:
> I've received two oopses now from my kernel while running
> the 2.6.22 series.  The first was with 2.6.22.1 back in July and
> the second which happened just within the last day is 2.6.22.5.
> They both appear to be the same bug and I don't think it's
> hardware related.  I'm attaching the entries from logcheck which
> I received when they happened.
>
> I'm not subscribed to the mailing list, so please make
> sure to copy me directly on any replies.  And let me know if
> anyone needs any additional information to try to track this
> down.  Thanks for reading...

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: limiting to UDMA/33 instead of UDMA/100 - pata_pdc202xx_old (also XFS error)?

2007-09-07 Thread Jeff Garzik


res 51/84:00:3f:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
res 51/84:00:3f:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
res 51/84:00:21:9d:fc/00:00:00:00:00/e6 Emask 0x10 (ATA bus error)



ATA bus error == straight-from-hardware reported error

Cable or connector is dying maybe?  Bad power supplies also sometimes 
manifest this way.
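
For reference, the first two bytes of each "res" line are the ATA
status and error registers (0x51 and 0x84 here).  A small decoder
sketch, using bit meanings from the ATA specification and table and
function names invented for the illustration:

```c
#include <stddef.h>
#include <stdio.h>

struct bit_name {
    unsigned char bit;
    const char *name;
};

/* ATA status register bits. */
static const struct bit_name status_bits[] = {
    { 0x80, "BSY" }, { 0x40, "DRDY" }, { 0x20, "DF" },
    { 0x10, "DSC" }, { 0x08, "DRQ" },  { 0x01, "ERR" },
};

/* ATA error register bits (bit 7 is the interface CRC error for UDMA). */
static const struct bit_name error_bits[] = {
    { 0x80, "ICRC" }, { 0x40, "UNC" }, { 0x10, "IDNF" }, { 0x04, "ABRT" },
};

/* Write the names of the set bits, space separated, into out.
 * outlen must be at least 1. */
static void decode_bits(unsigned char value, const struct bit_name *tbl,
                        size_t n, char *out, size_t outlen)
{
    size_t used = 0;

    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w;

        if (!(value & tbl[i].bit))
            continue;
        w = snprintf(out + used, outlen - used, "%s%s",
                     used ? " " : "", tbl[i].name);
        if (w < 0 || (size_t)w >= outlen - used)
            break; /* output truncated, stop here */
        used += (size_t)w;
    }
}

static void decode_status(unsigned char v, char *out, size_t outlen)
{
    decode_bits(v, status_bits, sizeof(status_bits) / sizeof(status_bits[0]),
                out, outlen);
}

static void decode_error(unsigned char v, char *out, size_t outlen)
{
    decode_bits(v, error_bits, sizeof(error_bits) / sizeof(error_bits[0]),
                out, outlen);
}
```

Status 0x51 decodes to DRDY DSC ERR and error 0x84 to ICRC ABRT.  An
interface CRC error means the data was corrupted on the way between
drive and controller, which fits a cabling or power problem.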


Jeff



RE: [PATCH 2.6.23-rc4][reRESEND] ata_piix: IDE mode SATA patch for Intel Tolapai

2007-09-07 Thread Gaston, Jason D

>-Original Message-
>From: Jeff Garzik [mailto:[EMAIL PROTECTED]
>Sent: Friday, August 31, 2007 12:51 AM
>To: Gaston, Jason D
>Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org;
>[EMAIL PROTECTED]
>Subject: Re: [PATCH 2.6.23-rc4][reRESEND] ata_piix: IDE mode SATA patch
for
>Intel Tolapai
>
>Jason Gaston wrote:
>> Resend trying to remove 8-bit characters in the email.
>>
>> This patch adds the Intel Tolapai IDE mode SATA controller DID's.
>>
>> Signed-off-by:  Jason Gaston <[EMAIL PROTECTED]>
>
>applied

Jeff,

I just noticed that the following section came through as spaces instead
of tabs.  Do I need to resend a corrected version?

+static const struct piix_map_db tolapai_map_db = {
+.mask = 0x3,
+.port_enable = 0x3,
+.map = {
+/* PM   PS   SM   SS   MAP */
+{  P0,  NA,  P1,  NA }, /* 00b */
+{  RV,  RV,  RV,  RV }, /* 01b */
+{  RV,  RV,  RV,  RV }, /* 10b */
+{  RV,  RV,  RV,  RV },
+},
+};
+

Thanks,

Jason

Re: request for information about the "ath5k" licensing

2007-09-07 Thread Jeff Garzik


Reyk Floeter wrote:

I'm still waiting for an answer. Your process is taking too long.



Speaking as a person through which these changes flow upstream into the 
official kernel (ath5k maintainers -> linville -> me -> linus)...



The most important thing for today is that no ath5k stuff has been 
committed (nor has it ever been).



I would rather take it slow and make sure everybody is happy.  There is 
nothing upstream, and so, there is no need to rush and correct something.


Collectively, this is just growing pains.  Everyone is breaking new 
ground, trying to figure out how to best support atheros stuff on Linux. 
 There are new tools to deal with (svn? git? flavor of the day?:)), new 
licenses with new ramifications to consider, a new wireless stack to 
deal with.


What you are witnessing is but a small part of the chaos as everyone 
tackles these chores simultaneously.


Jeff



Re: easy alsa patches for the stable kernel?

2007-09-07 Thread Takashi Iwai

At Fri, 07 Sep 2007 22:59:07 +0200,
Romano Giannetti wrote:
> 
> On Fri, 2007-09-07 at 21:42 +0200, Thorsten Leemhuis wrote:
> > On 07.09.2007 14:58, Takashi Iwai wrote:
> > >>> Ah good.  I added it to ALSA HG tree now.
> 
> Thanks. BTW, is anywhere visible the current hg tree? It seems that
> http://hg-mirror.alsa-project.org/alsa-kernel/ lags a bit behind...

The patch is certainly in the primary HG repo (hg.alsa-project.org).
hg-mirror seems often out of sync, unfortunately.


> > It's just this line afaics...
> > +   SND_PCI_QUIRK(0x1179, 0xff50, "TOSHIBA A305", ALC268_TOSHIBA),
> > ...which afaics is doing nothing more than "if DMI-Data matches FOO then
> > apply known workaround BAR". Is that correct or am I missing something
> > here (another patch that this one depends on that isn't in 2.6.23 yet
> > maybe?)?
> >
> 
> Your second guess is right. That line is a patch with respect to current
> mercurial tree, which is quite ahead of the current kernel alsa code.
> Although I'd like to know from where Andrew pulled it, because I was not
> able to find that tree on git.kernel nor alsa-project.org... :-)

It's on git.kernel.org, perex/alsa.git tree mm branch.
You can find the information in the download wiki page of
alsa-project.org.


Takashi

Re: [2/2] 2.6.23-rc5: known regressions with patches

2007-09-07 Thread Michal Piotrowski

Hi David,

On 06/09/2007, David Woodhouse <[EMAIL PROTECTED]> wrote:
> On Mon, 2007-09-03 at 12:11 +0200, Michal Piotrowski wrote:
> > MTD
> >
> > Subject : error: implicit declaration of function 'cfi_interleave'
> > References  : http://lkml.org/lkml/2007/8/6/272
> > Last known good : ?
> > Submitter   : Ingo Molnar <[EMAIL PROTECTED]>
> > Caused-By   : ?
> > Handled-By  : David Woodhouse <[EMAIL PROTECTED]>
> > Patch   : http://lkml.org/lkml/2007/8/9/586
> > Status  : patch available
>
> This isn't really a regression -- it's been like this for years.

Ok, I removed it from the KR list.

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: easy alsa patches for the stable kernel?

2007-09-07 Thread Takashi Iwai

At Fri, 07 Sep 2007 21:42:36 +0200,
Thorsten Leemhuis wrote:
> 
> On 07.09.2007 14:58, Takashi Iwai wrote:
> > At Fri, 07 Sep 2007 14:04:01 +0200,
> > Thorsten Leemhuis wrote:
> >> On 07.09.2007 12:21, Takashi Iwai wrote:
> >>> At Fri, 07 Sep 2007 10:22:27 +0200,
> >>> Romano Giannetti wrote:
> >>>> Takashi: good news!
> >>>>
> >>>> diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
> >>>> index 3557865..496d119 100644
> >>>> --- a/sound/pci/hda/patch_realtek.c
> >>>> +++ b/sound/pci/hda/patch_realtek.c
> >>>> @@ -9044,6 +9044,7 @@ static const char *alc268_models[ALC268_MODEL_LAST] = {
> >>>>  static struct snd_pci_quirk alc268_cfg_tbl[] = {
> >>>>  	SND_PCI_QUIRK(0x1043, 0x1205, "ASUS W7J", ALC268_3ST),
> >>>>  	SND_PCI_QUIRK(0x1179, 0xff10, "TOSHIBA A205", ALC268_TOSHIBA),
> >>>> +	SND_PCI_QUIRK(0x1179, 0xff50, "TOSHIBA A305", ALC268_TOSHIBA),
> >>>>  	SND_PCI_QUIRK(0x103c, 0x30cc, "TOSHIBA", ALC268_TOSHIBA),
> >>>>  	SND_PCI_QUIRK(0x1025, 0x0126, "Acer", ALC268_ACER),
> >>>>  	SND_PCI_QUIRK(0x1025, 0x0130, "Acer Extensa 5210", ALC268_ACER),
> >>> Ah good.  I added it to ALSA HG tree now.
> >> Just wondering: should easy-and-obvious and less-risky patches like this
> >> one be sent to the stable-kernel-maintainers in parallel to adding them
> >> to the HG-Tree (or shortly afterwards)? It could save users lots of
> >> trouble if such improvements make it quickly into production-ready
> >> kernel-releases (and from there they might even find their way into some
> >> distribution kernels quickly). Hardware then would "just work".
> > Well, this patch is definitely not for 2.6.23 or stable kernel.
> > It's for 2.6.24.
> 
> Sorry, but why?
> 
> It's just this line afaics...
> +   SND_PCI_QUIRK(0x1179, 0xff50, "TOSHIBA A305", ALC268_TOSHIBA),
> ...which afaics is doing nothing more than "if DMI-Data matches FOO then
> apply known workaround BAR". Is that correct or am I missing something
> here (another patch that this one depends on that isn't in 2.6.23 yet
> maybe?)?

The patch is based on the workaround codes that have been added after
2.6.23.  Thus the patch cannot work for 2.6.23 or earlier.
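
To illustrate the table-driven matching being discussed, here is a minimal user-space sketch. It is not the ALSA code: the names `quirk`, `quirk_tbl`, `quirk_lookup` and the `MODEL_*` values are hypothetical stand-ins, and only the three ID pairs are taken from the quoted patch. As far as I know, entries of this kind are matched against the PCI subsystem vendor/device IDs rather than DMI data.

```c
#include <stddef.h>

/* Hypothetical model IDs; the real enum lives in patch_realtek.c. */
enum { MODEL_AUTO = -1, MODEL_3ST = 0, MODEL_TOSHIBA = 1 };

/* Hypothetical stand-in for one quirk table entry. */
struct quirk {
	unsigned short subvendor;	/* PCI subsystem vendor ID */
	unsigned short subdevice;	/* PCI subsystem device ID */
	const char *name;		/* human-readable label */
	int model;			/* model/workaround to select */
};

static const struct quirk quirk_tbl[] = {
	{ 0x1043, 0x1205, "ASUS W7J",     MODEL_3ST },
	{ 0x1179, 0xff10, "TOSHIBA A205", MODEL_TOSHIBA },
	{ 0x1179, 0xff50, "TOSHIBA A305", MODEL_TOSHIBA },
	{ 0, 0, NULL, 0 }		/* terminator */
};

/* Return the model of the first matching entry, or MODEL_AUTO if none. */
static int quirk_lookup(unsigned short subvendor, unsigned short subdevice)
{
	const struct quirk *q;

	for (q = quirk_tbl; q->name != NULL; q++)
		if (q->subvendor == subvendor && q->subdevice == subdevice)
			return q->model;
	return MODEL_AUTO;
}
```

A table entry only selects a model. If the code behind that model is missing from a given kernel, the entry has nothing to select, which is the dependency described above.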

> > The problem is often that I
> > want first the merge to Linus tree, and then I forget to submit to
> > stable tree when the merge takes a long time in the end.  (The merge of
> > alsa.git is too spotty, and that's another big problem for me.  In
> > short, I do NOT maintain alsa.git tree at all...)
> 
> Then I as one of all those long-time-lkml-lurkers without programming
> skills dare to say that maybe the alsa-project might need to improve its
> workflow? Maybe you guys should maintain two git-trees (or multiple
> branches in one tree; sorry, I'm not a git expert and not sure what the
> correct terms are)?

We do have different branches, too.  Most fix patches are usually in
the branch to be pushed (although they are rarely done).  But, the
point is that I am no official subsystem maintainer.

I have access rights to add patches to the ALSA HG tree, which is
converted to a git tree automatically.  So, eventually, 90% of patches
come from me.  But the maintenance of the git tree and the push requests
are out of my hands.  It's a frustrating situation for me, too.

> > Another problem I see is that we have little chance for testing the
> > target patches with stable kernels.
> 
> The stable maintainers release "rc" kernels before they release the
> final ones. And the patch of course should have been applied in
> linus-tree. Both things are not a perfect safety net, but I'd say it
> should be more than enough as long as we are talking about new PCI-IDs
> for existing drivers or "apply workarounds for special machines which we
> detect by their DMI data" (lots of those seem to be needed these days).

I'm skeptical that people ever test stable rc kernels well for certain
bugs.  Also, adding new PCI ID isn't as safe as it sounds (like in
this case).  It must be tested _before_ applying.

> >  Even if it looks OK and works for
> > the later kernels, it often doesn't work or breaks magically with the
> > older kernels.  Usually, I have no affected hardware, and bug
> > reporters test only with the recent version (partly because developers
> > ask first to try the latest version -- if it works, why to downgrade
> > again?) 
> 
> Because the bug-reporter is likely the only one that invested enough time to
> analyze the problem and fix it alone or together with you guys. But
> there is likely a bunch of other people that get hit by the same problem;

Well, the problem is how we can find out such unlucky guys...

> some will just say "linux sucks" and switch back to some other OS --
> especially if they never have heard of alsa or don't really know what a
> kernel really is or does.

Linux will really suck if one easily breaks a so-called stable thing
without actually testing.  For stable stuff, "it should be good" isn't
enough.  It must be: "it IS good."

Don't

RE: [PATCH 2.6.23-rc4][reRESEND] ahci: RAID mode SATA patch for Intel Tolapai

2007-09-07 Thread Gaston, Jason D

>-Original Message-
>From: Jeff Garzik [mailto:[EMAIL PROTECTED]
>Sent: Friday, September 07, 2007 3:39 PM
>To: Gaston, Jason D
>Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH 2.6.23-rc4][reRESEND] ahci: RAID mode SATA patch
for
>Intel Tolapai
>
>Gaston, Jason D wrote:
>>> -Original Message-
>>> From: Gaston, Jason D
>>> Sent: Friday, August 31, 2007 10:10 AM
>>> To: 'Jeff Garzik'
>>> Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
>>> Subject: RE: [PATCH 2.6.23-rc4][reRESEND] ahci: RAID mode SATA patch
>> for
>>> Intel Tolapai
>>>
>>> This device has both AHCI and RAID modes that use the ahci driver.
>> Only
>>> the RAID mode DID's are being added as the PCI class code support
will
>>> cover the AHCI mode.  Looking at the Generic, PCI class code support
>>> section, it uses "board_ahci".  I assumed that they should be the
same
>> as
>>> the generic class code support is working on this platform.
>>>
>>> Thanks,
>>>
>>> Jason
>>>
>>>
>>>> -Original Message-
>>>> From: Jeff Garzik [mailto:[EMAIL PROTECTED]
>>>> Sent: Friday, August 31, 2007 12:47 AM
>>>> To: Gaston, Jason D
>>>> Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
>>>> Subject: Re: [PATCH 2.6.23-rc4][reRESEND] ahci: RAID mode SATA patch for
>>>> Intel Tolapai
>>>>
>>>> Jason Gaston wrote:
>>>>> Resend trying to remove 8-bit characters in the email.
>>>>>
>>>>> This patch adds the Intel Tolapai RAID controller DID's for SATA support.
>>>>> Signed-off-by:  Jason Gaston <[EMAIL PROTECTED]>
>>>>>
>>>>> --- linux-2.6.23-rc4/drivers/ata/ahci.c.orig  2007-08-27 18:32:35.0 -0700
>>>>> +++ linux-2.6.23-rc4/drivers/ata/ahci.c   2007-08-28 16:58:11.0 -0700
>>>>> @@ -411,6 +411,8 @@
>>>>>   { PCI_VDEVICE(INTEL, 0x292f), board_ahci_pi }, /* ICH9M */
>>>>>   { PCI_VDEVICE(INTEL, 0x294d), board_ahci_pi }, /* ICH9 */
>>>>>   { PCI_VDEVICE(INTEL, 0x294e), board_ahci_pi }, /* ICH9M */
>>>>> + { PCI_VDEVICE(INTEL, 0x502a), board_ahci }, /* Tolapai */
>>>>> + { PCI_VDEVICE(INTEL, 0x502b), board_ahci }, /* Tolapai */
>>>> Why did you not use board_ahci_pi?  Is the AHCI ports-implemented
>>>> register unreliable on this platform?
>>
>> Jeff,
>>
>> Do I need to change this to board_ahci_pi or is it ok to leave it at
>> board_ahci, which will be used by the AHCI class code devices?
>
>You are the one who needs to answer this question ;-)
>
>Most new Intel AHCI have a sane and reliable Ports-Implemented register
>value even across reset, unlike earlier ones or some clones.  For
those,
>we use board_ahci_pi.
>
>If PI is not reliable across reset or if BIOS is absent (yes we care
>about that case, when we do our own PCI resume for example), then you
>should use board_ahci.
>
>   Jeff

At this time, I don't have any way to test those particular DeviceID's
and I know that the AHCI mode DeviceID works by using the class code
support.  So, I would like to just leave them as they are, if that is
ok.

Thanks,

Jason

Re: [PATCH] Add all thread stats for TASKSTATS_CMD_ATTR_TGID

2007-09-07 Thread Jonathan Lim

On Fri Aug 31 00:24:47 2007, [EMAIL PROTECTED] wrote:
> 
> Jonathan Lim wrote:
> > On Sat Aug 25 21:58:44 2007, [EMAIL PROTECTED] wrote:
> >>> Also, I don't understand why the code to update btime:
> >>>
> >>> /* calculate task elapsed time in timespec */
> >>> do_posix_clock_monotonic_gettime();
> >>> ts = timespec_sub(uptime, tsk->start_time);
> >>>   ...
> >>> stats->ac_btime = get_seconds() - ts.tv_sec;
> >>>
> >>> does not simply use tsk->start_time or tsk->real_start_time without
> >>> comparing it to the current time.
> >> From what I understand, task->start_time and task->real_start_time
> >> are taken from the realtime clock. The accounting in CSA seems
> >> to be very similar to the accounting done in do_acct_process()
> >> (kernel/acct.c).
> > 
> > In CSA 3.0 ...
> > 
> > csa_acct_eop(int exitcode, struct task_struct *p)
> > 
> > csa->ac_btime = boottime +
> > ((p->start_time.tv_nsec < NSEC_PER_SEC/2) ?
> >  p->start_time.tv_sec :
> >  p->start_time.tv_sec +1);
> > 
> > where 
> > 
> > do_posix_clock_monotonic_gettime();
> > boottime = xtime.tv_sec - uptime.tv_sec;
> > 
> > In an upcoming version of CSA ...
> > 
> > csa_acct_eop(struct taskstats *p)
> > 
> > csa->ac_btime = p->ac_btime;
> > 
> > where
> > 
> > do_posix_clock_monotonic_gettime();
> > ts = uptime - tsk->start_time;
> > p->ac_btime = get_seconds() - ts.tv_sec;
> > = xtime.tv_sec - (uptime - tsk->start_time);
> > = (xtime.tv_sec - uptime) + tsk->start_time;
> > 
> > So they're basically equivalent.
> 
> Excellent, so can Guillaume change ac_btime to be just tsk->start_time?

I don't think so.  Current time (xtime) is relative to the epoch; uptime and
tsk->start_time (jiffies) are both relative to some boot time.  So you need to
subtract uptime from xtime to get the boot time relative to the epoch, then add
tsk->start_time.  The result is what ac_btime should be set to.
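
The arithmetic can be written out as a small user-space sketch. The function and parameter names here are hypothetical and everything is in whole seconds; it only shows that "boot time plus start time" and "now minus elapsed time" give the same value.

```c
#include <time.h>

/*
 * Sketch of the arithmetic only. All inputs are in seconds:
 *   now_epoch  - current wall-clock time since the epoch (xtime)
 *   uptime     - time since boot
 *   start_time - task start time, also measured since boot
 */
static time_t boot_time_epoch(time_t now_epoch, time_t uptime)
{
	return now_epoch - uptime;
}

/* Boot time relative to the epoch, plus the task's start time. */
static time_t task_btime(time_t now_epoch, time_t uptime, time_t start_time)
{
	return boot_time_epoch(now_epoch, uptime) + start_time;
}

/* The other form: now minus the time the task has been running. */
static time_t task_btime_elapsed(time_t now_epoch, time_t uptime,
				 time_t start_time)
{
	return now_epoch - (uptime - start_time);
}
```

Neither form can be replaced by `start_time` alone, because `start_time` by itself carries no information about when the machine booted.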

I think his recent changes are as follows:

--- a/kernel/tsacct.c   Fri Aug 31 01:42:23 2007 -0700
+++ b/kernel/tsacct.c   Tue Aug 28 20:35:27 2007 +0200
...
-void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
+static void fill_wall_times(struct taskstats *stats, struct task_struct *task)
...
-   ts = timespec_sub(uptime, tsk->start_time);
+   ts = timespec_sub(uptime, task->start_time);
...
-   stats->ac_btime = get_seconds() - ts.tv_sec;
...
+   stats->ac_btime = get_seconds() - ts.tv_sec;

So really no different from before, which is correct.

Jonathan

Re: [CORRECTION][PATCH] Fix a potential NULL pointer dereference in uli526x_interrupt() in drivers/net/tulip/uli526x.c

2007-09-07 Thread Jeff Garzik


Micah Gruber wrote:
> This patch fixes a potential null dereference bug where we dereference dev
> before a null check. This patch simply moves the dereferencing after the null
> check.
>
> Signed-off-by: Micah Gruber <[EMAIL PROTECTED]>
> ---
>
> --- a/drivers/net/tulip/uli526x.c
> +++ b/drivers/net/tulip/uli526x.c
> @@ -663,7 +663,7 @@
>  {
>  	struct net_device *dev = dev_id;
>  	struct uli526x_board_info *db = netdev_priv(dev);
> -	unsigned long ioaddr = dev->base_addr;
> +	unsigned long ioaddr;
>  	unsigned long flags;
>
>  	if (!dev) {
>
> @@ -671,6 +671,8 @@
>  		return IRQ_NONE;
>  	}
>
> +	ioaddr = dev->base_addr;
> +

as satyam noted, just remove the !dev test
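
The two options can be shown side by side in a small user-space sketch. `struct fake_dev` and both function names are hypothetical stand-ins, not driver code. The first function is the check-then-use ordering the patch aims for; the second is what remains if the test is dropped, which is only valid under the assumption that the caller never passes NULL.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct net_device. */
struct fake_dev {
	unsigned long base_addr;
};

/*
 * Check-then-use: the pointer is tested before any field is read.
 * Returning 0 for a NULL dev is a convention of this sketch only.
 */
static unsigned long dev_ioaddr_checked(const struct fake_dev *dev)
{
	unsigned long ioaddr;

	if (!dev)
		return 0;
	ioaddr = dev->base_addr;	/* safe: dev is known non-NULL here */
	return ioaddr;
}

/*
 * If the caller guarantees a valid pointer, the test is dead code and
 * the plain dereference is all that is needed.
 */
static unsigned long dev_ioaddr_trusted(const struct fake_dev *dev)
{
	return dev->base_addr;
}
```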



Re: Intel Memory Ordering White Paper

2007-09-07 Thread Linus Torvalds

On Sat, 8 Sep 2007, Nick Piggin wrote:
> 
> So, can we finally noop smp_rmb and smp_wmb on x86?

Did AMD already release their version? If so, we should probably add a 
commit that does that in somewhat early 2.6.24 rc, and add the pointers to 
the whitepapers in the commit message.

Linus

Re: ICH Intel PATA short cable override...

2007-09-07 Thread Jeff Garzik


Mark Lord wrote:
> Ditto for selecting transfer modes.

Waiting on one thing AFAICS:

ability to drain/idle all ports +
issue a command on one port +
resume normal parallel port operation

SET FEATURES - XFER MODE is special in that it requires all sorts of 
additional controller handling and careful cross-port synchronization.


Jeff



Re: crash while playing bzflag

2007-09-07 Thread Michal Piotrowski

Hi Alex,

On 07/09/2007, Alex Riesen <[EMAIL PROTECTED]> wrote:
> Kernel: v2.6.23-rc5+ (b21010ed6498391c0f359f2a89c907533fe07fec)

Is this a post 2.6.22 regression?

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: crash while playing bzflag

2007-09-07 Thread Chuck Ebbert

On 09/07/2007 03:56 PM, Alex Riesen wrote:
> Kernel: v2.6.23-rc5+ (b21010ed6498391c0f359f2a89c907533fe07fec)
> Ubuntu Feisty, Radeon R200 (9200) dual head, MergedFB, BZFlag in
> OpenGL mode, frozen. That'll teach me playing games at home...
> 
> BUG: unable to handle kernel paging request at virtual address ffa85000
>  printing eip:
> c016eed1
> *pde = 5067
> *pte = 
> Oops:  [#1]
> PREEMPT SMP 
> Modules linked in: binfmt_misc nfs radeon drm nfsd exportfs lockd sunrpc fan 
> firmware_class it87 hwmon_vid hwmon p4_clockmod speedstep_lib ipv6 sg sr_mod 
> cdrom usb_storage snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_pcm 
> snd_mixer_oss snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq 
> snd_timer snd_seq_device generic floppy snd ide_core intel_agp e100 uhci_hcd 
> ehci_hcd soundcore snd_page_alloc agpgart evdev
> CPU:0
> EIP:0060:[__link_path_walk+2146/2867]Not tainted VLI
> EFLAGS: 00010287   (2.6.23-rc5-t #138)
> EIP is at __link_path_walk+0x862/0xb33
> eax: ffa85000   ebx: f0a4dd64   ecx: c0442130   edx: c1782d00
> esi: eee51f30   edi: ffa85000   ebp: f0e49e40   esp: eee51de4
> ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> Process command-not-fou (pid: 2895, ti=eee51000 task=f08f1020 
> task.ti=eee51000)
> Stack: f474c02c 0101 f1db2d64 c016d3eb c1782d00 ffa85000  
>  
> 96ba5598 000b f474c021 c18eff00 f0e49e40 f08e1540 
> eee51f30 
>c1937c78 c18eff00 c016f1e6 f474c000 c1937c78 c18eff00 c180b180 
> f11a7600 
> Call Trace:
>  [do_lookup+79/323] do_lookup+0x4f/0x143
>  [link_path_walk+68/179] link_path_walk+0x44/0xb3
>  [_spin_unlock+5/28] _spin_unlock+0x5/0x1c
>  [get_unused_fd_flags+198/208] get_unused_fd_flags+0xc6/0xd0
>  [do_path_lookup+362/463] do_path_lookup+0x16a/0x1cf
>  [__path_lookup_intent_open+69/117] __path_lookup_intent_open+0x45/0x75
>  [path_lookup_open+32/37] path_lookup_open+0x20/0x25
>  [open_namei+114/1364] open_namei+0x72/0x554
>  [unmap_vmas+791/1240] unmap_vmas+0x317/0x4d8
>  [do_filp_open+37/57] do_filp_open+0x25/0x39
>  [_spin_unlock+5/28] _spin_unlock+0x5/0x1c
>  [get_unused_fd_flags+198/208] get_unused_fd_flags+0xc6/0xd0
>  [do_sys_open+68/192] do_sys_open+0x44/0xc0
>  [sys_open+28/30] sys_open+0x1c/0x1e
>  [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
>  [xfrm_bundle_ok+53/522] xfrm_bundle_ok+0x35/0x20a
>  ===
> Code: f0 ff ff 0f 87 38 01 00 00 8b 46 1c 8b 44 86 20 89 44 24 14 31 ff 85 c0 
> 0f 84 09 01 00 00 89 c7 3d 00 f0 ff ff 0f 87 f5 00 00 00 <80> 38 2f 0f 85 9f 
> 00 00 00 89 f0 e8 38 e1 ff ff 64 a1 00 70 3f 
> EIP: [__link_path_walk+2146/2867] __link_path_walk+0x862/0xb33 SS:ESP 
> 0068:eee51de4
> SysRq : Emergency Sync
> Emergency Sync complete
> SysRq : Emergency Sync
> Emergency Sync complete
> 
> The config, lspci output, Xorg.0.log, and a more complete log of the
> crash attached (the crash happened around Sep  7 21:24:27 in the log,
> I panicked a bit and pressed Alt-SysRq-t and emergency sync).

Whee...

here, in __vfs_follow_link:

	if (*link == '/') {	<-- link points to unmapped memory
		path_release(nd);
		if (!walk_init_root(link, nd))
			/* weird __emul_prefix() stuff did it */
			goto out;
	}

inlined from __do_follow_link:

	if (!IS_ERR(cookie)) {
		char *s = nd_get_link(nd);
		error = 0;
		if (s)
			error = __vfs_follow_link(nd, s);
		if (dentry->d_inode->i_op->put_link)
			dentry->d_inode->i_op->put_link(dentry, nd, cookie);
	}

__do_follow_link is inlined from do_follow_link

presumably inlined here:

	if ((lookup_flags & LOOKUP_FOLLOW)
	    && inode && inode->i_op && inode->i_op->follow_link) {
		err = do_follow_link(, nd);
		if (err)
			goto return_err;
		inode = nd->dentry->d_inode;
	} else

What filesystem was this?
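
A note on why the bad pointer got this far. In the Code: line of the oops, the bytes `3d 00 f0 ff ff 0f 87` appear to be `cmp $0xfffff000,%eax` followed by `ja`, and the faulting `<80> 38 2f` is `cmpb $0x2f,(%eax)`, i.e. the `*link == '/'` test. The sketch below is a 32-bit user-space model of that error-pointer range check, assuming MAX_ERRNO is 4095 as in include/linux/err.h; `is_err32` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095u

/*
 * 32-bit model of the IS_ERR() range test: a pointer counts as an
 * encoded errno only if it lies in the top MAX_ERRNO values of the
 * address space, i.e. 0xfffff001..0xffffffff.
 */
static bool is_err32(uint32_t ptr)
{
	return ptr >= (uint32_t)(0u - MAX_ERRNO);
}
```

The value in eax, ffa85000, is neither NULL nor in the error range, so it passes both tests and is dereferenced as if it were a valid link string.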

Re: [RFC] Union Mount: Readdir approaches

2007-09-07 Thread Matt Keenan

[EMAIL PROTECTED] wrote:
> Hello Bharata,
> I am developing a linux stackable/unification filesystem too.
>
> Bharata B Rao:
>   
>> Questions
>> -
>> 
>   :::
>   
>> First of all, should we even expect a sane lseek(2) on union mounted
>> directories ? If not, will the Approach 2, which works uniformly for
>> all filesystem types be acceptable ?
>>
>> If lseek(2) needs to be supported, then how do we define the seek behaviour
>> when two different types of directories are union'ed ? For eg. how do we 
>> define
>> lseek(2) on a union of ext2 directory on top of a nfs directory ? Since both
>> of them use different encoding methods for filp->f_pos, how do we establish
>> a common lseek(2) behaviour here ?
>>
>> And finally, what is the use case for directory seek ? Would anybody walk
>> directory by directory by seeking into a directory file ?
>> 
>
> Although I don't remember exactly, NFS or smbfs seek for a
> directory. Additionally, any user process can call seekdir or
> something. So I believe any stackable/unification filesystem should
> support it.
>
> Here is my approach. While I don't think it is the best approach since
> it consumes much memory and cpu, I hope it help you. (or assist
> you? Sorry, I don't know correct English word)
>
> - the stackable fs has its own inode, file, dentry object, which has an
>   array for the underlying inode pointers. and the whiteout is a regular
>   file with a special name, instead of a flag in inode. this is the most
>   different architecture from your unification embedded in VFS.
> - the virtual dir inode object has a cache for its child entries. it is
>   called vdir. the cache has a version and a customizable lifetime too.
> - all the existing underlying (same-named) dirs are opened too. the file
>   objects are stored in the virtual file object as an array.
> - the virtual file object has a cache for its child entries too. it is a
>   copy of the one in the inode object.
>
> When the first readdir is issued:
> - call vfs_readdir for every underlying opened dir (file) object.
> - store every entry to either the hash table for the result or the
>   whiteout, when the same-named entry didn't exist in the tables.
> - to improve the performance, the allocated memory for the hash
>   tables is managed in a pointer array. and the elements are
>   concatenated logically by the pointer.
> - the pointer for the result-table, the version, and the current jiffies
>   are set to vdir, which is a cache in an inode.
> - all caches are copied to a member in a file object.
> - the index of the cache memory block and the offset in an array is
>   handled as the seek position.
>
> In the case of the application issued this sequence:
> - opendir()
> - readdir()
> - creat or unlink an entry under the dir
> - readdir()
>
> When an entry under the dir was removed or added, the inode version will
> be updated. Since readdir can compare it with the cached version or the
> lifetime (jiffies) in the file object, it can refresh the entries. But
> in this case, it doesn't, since the file position is not 0. If the
> application needs the latest entries, it has to call rewinddir.
> The cache in the file object will be updated only when it is obsolete AND
> the file position is 0.
>
> When a dir which already has its vdir is opened, the cache in the inode
> object will be used without calling vfs_readdir, after checking the
> version and the lifetime which are stored in the inode object. If it is
> obsolete, vfs_readdir will be called again in order to update the cache
> in the inode.
>
>   
This sounds like a good approach. How does aufs handle low memory
situations? Union mounts seem to be quite common on low memory embedded
systems. Is there a way for the VM to signal to aufs/the union
filesystem to trim its cache? Also on the memory consumption front I
guess you could get the union fs to refer to a singleton name entry
directly instead of creating a new virtual inode et al. This may lead to
some unusualness though for mounts over different filesystems that have
different length directory files (eg vfat and ext3). This does run
counter to the model described above in some ways.
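
For readers following the quoted approach, here is a minimal user-space sketch of the merge step and the refresh rule. It is not aufs code. The `.wh.` prefix, the fixed-size arrays, and all names (`vdir`, `vdir_add_layer`, `vdir_should_refresh`) are assumptions made for illustration, and a linear search stands in for the hash tables to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define WH_PREFIX	".wh."	/* assumed whiteout name prefix */
#define WH_PREFIX_LEN	4
#define MAX_ENTRIES	64

/* Merged view of a directory plus the whiteouts seen so far. */
struct vdir {
	const char *names[MAX_ENTRIES];		/* visible entries */
	size_t nr_names;
	const char *whiteouts[MAX_ENTRIES];	/* hidden names, prefix stripped */
	size_t nr_whiteouts;
};

static bool contains(const char *const *list, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(list[i], name) == 0)
			return true;
	return false;
}

/*
 * Add one layer's entries. Layers must be added top-most first, so that
 * names and whiteouts from upper layers hide same-named lower entries.
 */
static void vdir_add_layer(struct vdir *v, const char *const *entries, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const char *e = entries[i];
		bool is_wh = strncmp(e, WH_PREFIX, WH_PREFIX_LEN) == 0;
		const char *name = is_wh ? e + WH_PREFIX_LEN : e;

		if (contains(v->names, v->nr_names, name) ||
		    contains(v->whiteouts, v->nr_whiteouts, name))
			continue;	/* already decided by an upper layer */
		if (is_wh) {
			if (v->nr_whiteouts < MAX_ENTRIES)
				v->whiteouts[v->nr_whiteouts++] = name;
		} else if (v->nr_names < MAX_ENTRIES) {
			v->names[v->nr_names++] = name;
		}
	}
}

/* Refresh only when the cache is stale AND the reader is at position 0. */
static bool vdir_should_refresh(unsigned long cached_version,
				unsigned long dir_version,
				unsigned long age, unsigned long lifetime,
				long pos)
{
	bool stale = cached_version != dir_version || age > lifetime;

	return stale && pos == 0;
}
```

In this form the memory cost is easy to see: every open directory holds a full copy of the merged name list, which is the part a low-memory system would want to be able to trim.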

Matt


Re: sata & scsi suggestion for make menuconfig

2007-09-07 Thread Folkert van Heusden

> >> I know that it's difficult to get people to read docs & help text,
> >> and maybe it is needed in more places, but CONFIG_ATA (SATA/PATA)
> >> help text says:
> >>   NOTE: ATA enables basic SCSI support; *however*,
> >>   'SCSI disk support', 'SCSI tape support', or
> >>   'SCSI CDROM support' may also be needed,
> >>   depending on your hardware configuration.
> > 
> > Yes but that would mean that you have to open the help for each item
> > that you add.
> > 
> >> A popup makes some sense, but I don't know if menuconfig knows how to
> >> do popup warnings... and it needs to be done for all *configs,
> >> not just menuconfig.
> > 
> > Maybe add a new type?
> 
> How about
> comment "Note: 'SCSI disk support' is required for SATA/PATA HDDs!"
>   depends on ATA && !BLK_DEV_SD

Yes! Maybe create some status-line at the bottom of the screen in which
these hints scroll by. Like powertop does.


Folkert van Heusden

-- 
MultiTail is a flexible tool for following logfiles and the output of
commands, with filtering, coloring, merging, etc.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

Re: sata & scsi suggestion for make menuconfig

2007-09-07 Thread Jan Engelhardt


On Sep 7 2007 21:38, Krzysztof Halasa wrote:
>> Ok, but that's not the most common situation. What I'm suggesting is a
>> warning or a please note popup. Not necessarily an error or refusing to
>> continue thing.
>
>What IMHO makes sense is changing all references to SCSI CDROM,
>SCSI DISK etc. to just CDROM, DISK, and changing SCSI (menu) to
>something like MASS STORAGE.

There is still too much SCSI in it IMO :-)

Re: Intel Memory Ordering White Paper

2007-09-07 Thread Nick Piggin

On Saturday 08 September 2007 08:26, Jesse Barnes wrote:
> FYI, we just released a new white paper describing memory ordering for
> Intel processors:
> http://developer.intel.com/products/processor/manuals/index.htm
>
> Should help answer some questions about some of the ordering primitives
> we use on i386 and x86_64.

So, can we finally noop smp_rmb and smp_wmb on x86?
Index: linux-2.6/include/asm-i386/system.h
===
--- linux-2.6.orig/include/asm-i386/system.h
+++ linux-2.6/include/asm-i386/system.h
@@ -286,7 +286,7 @@ static inline unsigned long get_limit(un
 
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
-#define smp_rmb()	rmb()
+#define smp_rmb()	barrier()
 #define smp_wmb()	wmb()
 #define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void) xchg(, value); } while (0)
Index: linux-2.6/include/asm-x86_64/system.h
===
--- linux-2.6.orig/include/asm-x86_64/system.h
+++ linux-2.6/include/asm-x86_64/system.h
@@ -141,8 +141,8 @@ static inline void write_cr8(unsigned lo
 
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
+#define smp_rmb()	barrier()
+#define smp_wmb()	barrier()
 #define smp_read_barrier_depends()	do {} while(0)
 #else
 #define smp_mb()	barrier()
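
To make the effect of the change concrete, here is a user-space sketch of a flag-and-payload handoff that relies only on a compiler barrier. It assumes x86 and GCC/Clang inline-asm syntax, and it leans on the ordering rules as I understand them from the white paper (loads are not reordered with other loads, stores are not reordered with other stores). The names `publish` and `consume` are hypothetical, and real concurrent code would also need properly qualified or atomic accesses.

```c
/*
 * Compiler-only barrier: emits no instruction, but stops the compiler
 * from reordering or caching memory accesses across it.
 */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int payload;
static int ready;

/* Writer: store the data first, then the flag. */
static void publish(int value)
{
	payload = value;
	barrier();	/* stands in for smp_wmb() in this sketch */
	ready = 1;
}

/* Reader: load the flag first, then the data. Returns 1 if data was read. */
static int consume(int *out)
{
	if (!ready)
		return 0;
	barrier();	/* stands in for smp_rmb() in this sketch */
	*out = payload;
	return 1;
}
```

On a weakly ordered architecture the same code would not be safe, which is why the generic `smp_rmb()`/`smp_wmb()` names stay in callers and only the x86 definitions change.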

Re: [2/4] 2.6.23-rc5: known regressions

2007-09-07 Thread Michal Piotrowski

Hi,

On 03/09/2007, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > On Mon, 03 Sep 2007 11:48:00 +0100 "H. Peter Anvin" <[EMAIL PROTECTED]> 
> > wrote:
> > Michal Piotrowski wrote:
> > >
> > > Unclassified
> > >
> > > Subject : console is messed up after resume from s2ram or 
> > > switching to console from X
> > > References  : http://lkml.org/lkml/2007/8/4/6
> > > Last known good : ?
> > > Submitter   : Jeff Chua <[EMAIL PROTECTED]>
> > > Caused-By   : ?
> > > Handled-By  : H. Peter Anvin <[EMAIL PROTECTED]>
> > >   Antonino A. Daplas <[EMAIL PROTECTED]>
> > > Workaround  : "s2ram --force --acpi_sleep 1 --vbe_mode"
> > > Status  : problem is being debugged
> > >
> >
> > I'm inclined to write this one off as general STR weirdness.
>
> Both suspend-to-ram and suspend-to-disk are broken on this Vaio.  Running
> 2.6.23-rc4.
>
>
> suspend-to-RAM:
>
> a) sometimes hangs during suspend
>
> b) frequently hangs during resume
>
> c) occasionally acts weird after resume.  system requires repeated
>keypresses to make forward progress.

could be related to

Subject : cpu hotplug support broken in 2.6.23-rc3/highres
timers break cpu hotplug in 2.6.23-rc5
References  : http://lkml.org/lkml/2007/8/27/58
  http://lkml.org/lkml/2007/9/3/65
Last known good : ?
Submitter   : Pavel Machek <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : ?
Status  : problem is being debugged

>
> d) on those occasions where resume-from-RAM _does_ work, it takes much
>longer to resume than it used to.


Subject : resume from ram much slower
References  : http://lkml.org/lkml/2007/8/10/275
Last known good : 2.6.23-rc1 ?
Submitter   : Arkadiusz Miskiewicz <[EMAIL PROTECTED]>
Caused-By   : ?
Handled-By  : Rafael J. Wysocki <[EMAIL PROTECTED]>
Status  : problem is being debugged

?

>
> suspend-to-disk:
>
> a) always hangs when netconsole-over-e100 is enabled (might have been a
>2.6.21->2.6.22 regression).
>
> b) usually hangs during suspend
>
>
> Apart from suspend-to-disk's a), all of the above are post-2.6.21
> regressions.
>
>
>

Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/

Re: [patch 3/8] Immediate Values - Kconfig menu in EMBEDDED

2007-09-07 Thread Andi Kleen

On Fri, Sep 07, 2007 at 08:46:48AM -0400, Mathieu Desnoyers wrote:
> * Andi Kleen ([EMAIL PROTECTED]) wrote:
> > > +config IMMEDIATE
> > > + default y if !DISABLE_IMMEDIATE
> > 
> > It's still unclear to me why DISABLE_IMMEDIATE is needed. It would
> > be better to make it just the default.
> > 
> 
> It is actually the default on any non embedded configuration. Do you
> think we should make it default to on on embedded configs too ?

I would prefer to not have any config options at all and let
the non converted architectures always use an asm-generic fallback.

> The idea here is to give embedded system developers incentives to
> create an optimized immediate value header for their architecture. I

Sounds like a quite bogus way to do this.

> fear that if it is not trivial to disable when they need to use ROM to
> put the kernel code (as kprobes is, meaning, with a single config
> option), they will refuse to even think about including an optimized
> immediate value header for their architecture.

#ifdef CONFIG_ARCH_SPECIFIC_READONLY
#include  
#else
/* optimized implementation */
#endif

That's trivial.

> And yes, having a CONFIG_READ_ONLY_TEXT makes sense, but it implies
> menu dependencies with not only immediate values but also kprobes,
> paravirt, alternatives, (am I missing others ?)

paravirt and alternatives are x86 only.

I don't think CONFIG_READ_ONLY_TEXT on x86 makes sense.

On other architectures they have to deal with kprobes, but they
presumably do this already. Not really your problem I suspect.


> As long as we find a way for people to disable _all_ code patching in
> their kernel, I'm happy with that. But since every existing code
> patching mechanism can currently be disabled one by one, it makes sense
> to do the same for the immediate values. Having a global
> CONFIG_READ_ONLY_TEXT should IMHO come in a separate effort.

You're clearly deep into overdesign territory here.

-Andi

Re: [PATCH 2.6.23-rc4][reRESEND] ahci: RAID mode SATA patch for Intel Tolapai

2007-09-07 Thread Jeff Garzik

Gaston, Jason D wrote:

-Original Message-
From: Gaston, Jason D
Sent: Friday, August 31, 2007 10:10 AM
To: 'Jeff Garzik'
Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: RE: [PATCH 2.6.23-rc4][reRESEND] ahci: RAID mode SATA patch for Intel Tolapai

This device has both AHCI and RAID modes that use the ahci driver.  Only
the RAID mode DID's are being added as the PCI class code support will
cover the AHCI mode.  Looking at the Generic, PCI class code support
section, it uses "board_ahci".  I assumed that they should be the same as
the generic class code support is working on this platform.

Thanks,

Jason

-Original Message-
From: Jeff Garzik [mailto:[EMAIL PROTECTED]
Sent: Friday, August 31, 2007 12:47 AM
To: Gaston, Jason D
Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2.6.23-rc4][reRESEND] ahci: RAID mode SATA patch for Intel Tolapai

Jason Gaston wrote:

Resend trying to remove 8-bit characters in the email.

This patch adds the Intel Tolapai RAID controller DID's for SATA support.

Signed-off-by:  Jason Gaston <[EMAIL PROTECTED]>

--- linux-2.6.23-rc4/drivers/ata/ahci.c.orig    2007-08-27 18:32:35.0 -0700
+++ linux-2.6.23-rc4/drivers/ata/ahci.c 2007-08-28 16:58:11.0 -0700

@@ -411,6 +411,8 @@
{ PCI_VDEVICE(INTEL, 0x292f), board_ahci_pi }, /* ICH9M */
{ PCI_VDEVICE(INTEL, 0x294d), board_ahci_pi }, /* ICH9 */
{ PCI_VDEVICE(INTEL, 0x294e), board_ahci_pi }, /* ICH9M */
+   { PCI_VDEVICE(INTEL, 0x502a), board_ahci }, /* Tolapai */
+   { PCI_VDEVICE(INTEL, 0x502b), board_ahci }, /* Tolapai */

Why did you not use board_ahci_pi?  Is the AHCI ports-implemented
register unreliable on this platform?

Jeff,

Do I need to change this to board_ahci_pi or is it ok to leave it at
board_ahci, which will be used by the AHCI class code devices?

You are the one who needs to answer this question ;-)

Most new Intel AHCI have a sane and reliable Ports-Implemented register 
value even across reset, unlike earlier ones or some clones.  For those, 
we use board_ahci_pi.

If PI is not reliable across reset or if BIOS is absent (yes we care 
about that case, when we do our own PCI resume for example), then you 
should use board_ahci.
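
The choice described above can be pictured as a per-device table entry that records whether the Ports-Implemented (PI) register is trusted. What follows is a minimal userspace sketch, not the ahci driver's code: the enum, struct, and function names are hypothetical, and only the device IDs and the board_ahci / board_ahci_pi distinction come from this thread (0x8086 is Intel's PCI vendor ID).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's board types. */
enum board_kind {
    BOARD_UNKNOWN,
    BOARD_AHCI,     /* PI register not relied upon */
    BOARD_AHCI_PI   /* PI register trusted, even across reset */
};

struct pci_id_entry {
    uint16_t vendor;
    uint16_t device;
    enum board_kind board;
};

#define VENDOR_INTEL 0x8086

/* Entries mirror the hunk quoted in this thread. */
static const struct pci_id_entry id_table[] = {
    { VENDOR_INTEL, 0x292f, BOARD_AHCI_PI }, /* ICH9M */
    { VENDOR_INTEL, 0x294d, BOARD_AHCI_PI }, /* ICH9 */
    { VENDOR_INTEL, 0x294e, BOARD_AHCI_PI }, /* ICH9M */
    { VENDOR_INTEL, 0x502a, BOARD_AHCI },    /* Tolapai */
    { VENDOR_INTEL, 0x502b, BOARD_AHCI },    /* Tolapai */
};

/* Linear search of the table; BOARD_UNKNOWN if no entry matches. */
enum board_kind lookup_board(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(id_table) / sizeof(id_table[0]); i++)
        if (id_table[i].vendor == vendor && id_table[i].device == device)
            return id_table[i].board;
    return BOARD_UNKNOWN;
}

/* Only the _pi variant takes the PI register at face value. */
int trusts_ports_implemented(enum board_kind board)
{
    assert(board == BOARD_UNKNOWN || board == BOARD_AHCI ||
           board == BOARD_AHCI_PI);
    return board == BOARD_AHCI_PI;
}
```

With this table, switching Tolapai to the trusted variant would be a one-word change per entry, which is why the question is about the hardware rather than the code.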

Jeff


Re: [patch 05/10] Text Edit Lock - Alternative code for i386 and x86_64

2007-09-07 Thread Andi Kleen

On Fri, Sep 07, 2007 at 10:04:42AM -0400, Mathieu Desnoyers wrote:
> * Andi Kleen ([EMAIL PROTECTED]) wrote:
> > On Thu, Sep 06, 2007 at 04:01:29PM -0400, Mathieu Desnoyers wrote:
> > > + sync_core();
> > > + /* Not strictly needed, but can speed CPU recovery up. */
> > 
> > That turned out to break on some VIA CPUs. Should be removed.
> > 
> 
> Hrm, when does it break ? At boot time ? Is it the cpuid that breaks or

Yes.

> the clflush ? How do you work around the problem when sync_core or

The CLFLUSH

> clflush is called from elsewhere; does it cause a problem if I call it
> when I update immediate values ?

Unknown currently what are the exact circumstances.

For the other cases it is ignored right now, but when we get 
more information it might be needed to clear the CLFLUSH 
feature bit on those CPUs.

> Is it me or __inline_memcpy is simply a copy of i386's __memcpy ?
> Is there any reason for this name change ?

x86-64 __memcpy does something different. 

It might make more sense

At some point I hope to change the i386 setup to be more like x86-64
anyways -- the x86-64 version is imho much better.

>   A- ugly
>   B- breaking vim syntax highlighting. (actually, all the rest of the
>   file becomes weird after that. The problem is similar to declaration
>   of #define name ({ some code }). It does not really matter as long as
>   it is in a header, but at the middle of a C file it gets rather
>   annoying). (I never thought I would use vim as a coding style
>   reference) ;)

Then define a macro

#define BREAKPOINTS(x) \
((unsigned char [x]){ [0 ... (x) - 1] = BREAKPOINT_INSTRUCTIONS })

and use that
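
A minimal userspace sketch of that macro idea, assuming GCC or Clang (the `[first ... last]` range designator is a GNU extension) and a compile-time constant length, since a compound literal cannot have variable-length array type. The upper bound of a range designator is inclusive, so an array of n bytes is filled with `[0 ... n - 1]`. The names BREAKPOINT_BYTE, all_bytes_equal, and breakpoint_pad_ok are hypothetical; 0xcc is the x86 int3 opcode.

```c
#include <assert.h>
#include <stddef.h>

#define BREAKPOINT_BYTE 0xcc  /* x86 int3 */

/* Array of n breakpoint bytes; n must be an integer constant expression. */
#define BREAKPOINTS(n) \
    ((unsigned char [n]){ [0 ... (n) - 1] = BREAKPOINT_BYTE })

/* Return 1 if every one of the len bytes equals value, else 0. */
int all_bytes_equal(const unsigned char *buf, size_t len, unsigned char value)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != value)
            return 0;
    return 1;
}

/* Build a 5-byte pad in place and check its contents. */
int breakpoint_pad_ok(void)
{
    const unsigned char *pad = BREAKPOINTS(5);
    return all_bytes_equal(pad, 5, BREAKPOINT_BYTE);
}
```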

> And what is rather different between the 2 functions is when we want to
> fill multiple bytes with the same pattern (I fill the unused part of my
> immediate values bypass with 0x90 nops, but I agree that I could use
> add_nops if it was exported).
> 
> Declaration of a variable length array on text_set's stack would break
> older compilers, so I don't think it is a neat solution neither. kmalloc

All supported gccs support variable length arrays.

> The idea is to mimic the local_irq_save/restore semantic, where the
> flags argument is passed without &. This is why I use a macro instead of
> an inline function

Sounds like a bogus idea to me.

> The good effect of disabling interrupts is that it would make sure no
> interrupt handler will run with WP flag cleared on the CPU.  However, it

Yes that was my point. Not a very strong one admittedly.

-Andi

Intel Memory Ordering White Paper

2007-09-07 Thread Jesse Barnes

FYI, we just released a new white paper describing memory ordering for 
Intel processors:
http://developer.intel.com/products/processor/manuals/index.htm

Should help answer some questions about some of the ordering primitives 
we use on i386 and x86_64.

Jesse

(no subject)

2007-09-07 Thread Jim Cromie


auth 2efbb938 subscribe linux-kernel [EMAIL PROTECTED]
auth a339d34a subscribe linux-net [EMAIL PROTECTED]



Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?

2007-09-07 Thread Nick Piggin

On Wednesday 05 September 2007 17:07, Christoph Lameter wrote:
> On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > > slub_max_order=3 slub_min_objects=8
> >
> > I tried this approach. The testing result showed 2.6.23-rc4 is about
> > 2.5% better than 2.6.22. It really resolves the issue.
>
> Note also that the configuration you tried is the way SLUB is configured
> in Andrew's tree.

It still doesn't sound like it is competitive with SLAB at the same sizes.
What's the problem?


Re: Platform device id

2007-09-07 Thread Henrique de Moraes Holschuh

On Fri, 07 Sep 2007, David Brownell wrote:
> > > For that matter, a *driver* should never create its own device node(s)
> > > in the first place.  Device creation belongs elsewhere, like as part of
> > > platform setup or, for busses with integral enumeration support like
> > > PCI or USB, bus glue.  Linux is moving away from that legacy model.
> >
> > This assumes that we have a better bus than "platform" to dump drivers like
> > thinkpad-acpi, hdaps, and a host of other host-specific stuff.
> 
> I don't follow.  If it's host-specific, then it's easy enough to
> have a host-specific routine creating those platform devices.
> A different host wouldn't call that routine.

We do that in the module that also provides the device driver. E.g. hdaps or
thinkpad-acpi will provide both the platform device (and register it), and
the driver.

> (Also, note that "platform", "host", and "board" are ambiguous.
> In some contexts each is synonymous; in others, not.  I avoid

In this specific case I am talking about, they're not.  ThinkPads are the
host.  The platform for a ThinkPad is either i386 or amd64.  But there are
many more hosts that are i386 or amd64 than ThinkPads, and the devices in my
example are thinkpad-specific.

I don't feel like drivers like hdaps, thinkpad-acpi, dock, bay, and many
others really belong in the platform bus.  But that's what happens right
now.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: [PATCH] Add support for keyboard on SEGA Dreamcast

2007-09-07 Thread Adrian McMenamin

On 05/09/07, Dmitry Torokhov <[EMAIL PROTECTED]> wrote:

>
> Are we guaranteed that the dc_kbd_callback is not running in a separate
> thread?
>
> Please also consider implementing support for changing keymap. Since
> the keymap is pretty full I think the best way is to copy the vanilla
> keymap into a per-device memory and set up keymap, keycodesize and
> keycodemax in input device structure.
>
> Thank you.
>
> --
> Dmitry
>

Dmitry - have now worked out at least one alternative keymap - for
European keyboards. But could you explain the above point? I have to
admit I am not too familiar with the input layer - having really taken
an old, and bit rot infested 2.4 driver and brought it up to standard
for 2.6 - and so I don't follow this point too well!

Can I treat keymaps as firmware and load from userspace? Or is there a
way of making this completely user configurable at runtime?
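
For readers following the quoted suggestion, here is a minimal userspace sketch of the per-device keymap idea: copy the built-in table into memory owned by the device and describe that copy with a pointer, an entry size, and an entry count. The struct is a hypothetical stand-in modelled on the field names mentioned in the quote, not the kernel's input device structure, and the table contents are illustrative values only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEYMAP_SIZE 256

/* Hypothetical built-in ("vanilla") keymap; entries are illustrative. */
static const unsigned short default_keymap[KEYMAP_SIZE] = {
    [4] = 30,
    [5] = 48,
};

/* Hypothetical per-device state. */
struct kbd_device {
    unsigned short keymap[KEYMAP_SIZE]; /* private, writable copy */
    void *keycode;                      /* points at the copy above */
    size_t keycodesize;                 /* bytes per table entry */
    size_t keycodemax;                  /* number of table entries */
};

/* Give the device its own copy of the default table. */
void kbd_init_keymap(struct kbd_device *kbd)
{
    assert(kbd != NULL);
    memcpy(kbd->keymap, default_keymap, sizeof(kbd->keymap));
    kbd->keycode = kbd->keymap;
    kbd->keycodesize = sizeof(kbd->keymap[0]);
    kbd->keycodemax = KEYMAP_SIZE;
}

/* Remap one scancode for this device only; -1 if out of range. */
int kbd_set_keycode(struct kbd_device *kbd, size_t scancode,
                    unsigned short keycode)
{
    assert(kbd != NULL);
    if (scancode >= kbd->keycodemax)
        return -1;
    kbd->keymap[scancode] = keycode;
    return 0;
}
```

Because each device edits only its own copy, remapping a key on one keyboard leaves the default table and any other attached keyboard untouched.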

Any help from you, or any other reader, much appreciated.

Adrian

Re: Platform device id

2007-09-07 Thread David Brownell

> > For that matter, a *driver* should never create its own device node(s)
> > in the first place.  Device creation belongs elsewhere, like as part of
> > platform setup or, for busses with integral enumeration support like
> > PCI or USB, bus glue.  Linux is moving away from that legacy model.
>
> This assumes that we have a better bus than "platform" to dump drivers like
> thinkpad-acpi, hdaps, and a host of other host-specific stuff.

I don't follow.  If it's host-specific, then it's easy enough to
have a host-specific routine creating those platform devices.
A different host wouldn't call that routine.

Embedded Linux platforms do that *ALL* the time.  ARM keys on a
board ID provided early in boot (e.g. by U-Boot).  PowerPC uses
a device tree, which ISTR evolved from the OpenBoot as first used
on SPARC.  Worst comes to worst, the kernel command line can say
which board is involved, and thus which setup code to run.
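
As a rough userspace illustration of board-keyed setup, assuming a numeric board ID is already known early in boot: a table maps each ID to the routine that creates that board's devices, so a different board simply never runs it. Every name and ID here is hypothetical, and a counter stands in for real platform device registration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for "platform devices registered so far". */
static int devices_registered;

/* Hypothetical per-board setup routines. */
static int setup_board_alpha(void) { devices_registered += 2; return 0; }
static int setup_board_beta(void)  { devices_registered += 1; return 0; }

struct board_setup {
    unsigned int board_id;   /* ID handed over by the boot loader */
    int (*setup)(void);      /* creates this board's devices */
};

static const struct board_setup boards[] = {
    { 100, setup_board_alpha },
    { 200, setup_board_beta },
};

/* Run the setup routine matching board_id; -1 if the board is unknown. */
int run_board_setup(unsigned int board_id)
{
    for (size_t i = 0; i < sizeof(boards) / sizeof(boards[0]); i++) {
        assert(boards[i].setup != NULL);
        if (boards[i].board_id == board_id)
            return boards[i].setup();
    }
    return -1;
}

int registered_device_count(void)
{
    return devices_registered;
}
```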

(Also, note that "platform", "host", and "board" are ambiguous.
In some contexts each is synonymous; in others, not.  I avoid
using "host" except in the protocol sense.  Usually "board" is
pretty specific -- this cpu, those peripherals -- although it
gets messy when the system is really a board stack, or when the
CPU may be socketed or be in a customizable FPGA etc.)


> > I realize that may be more easily said than done in some cases,
> > like i8042 on non-PNP systems.
>
> Yes, and there is a LOT of non-PNP stuff involved, since platform became the
> dumping ground for host-specific devices (as opposed to platform-specific
> devices).

See above ... most embedded systems aren't x86, so lack of PNP is
less of an issue than plain old legacy system designs -- designed
in ways that complicate or prevent probe/discovery schemes, which
gets to be a mess (like the one preceding PNP with DOS/x86/ISA).

Less clear cases include orphaned drivers, especially ones for
hardware that's on its way out or already obsolete.  Most folk
don't want to touch those, for fear of getting stuck to them.  :)

- Dave


Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

2007-09-07 Thread Nick Piggin

On Saturday 08 September 2007 17:25, Nick Piggin wrote:
> On Saturday 08 September 2007 07:12, Goswin von Brederlow wrote:
> > Nick Piggin <[EMAIL PROTECTED]> writes:
> > > On Saturday 08 September 2007 06:01, Goswin von Brederlow wrote:
> > >>   b) a segment boundary
> > >
> > > This is done, as I said, because of the deadlock issue. While the issue
> > > is more completely fixed in -mm, a special case for kernel memory (eg.
> > > nfsd) is in the latest mainline kernels.
> >
> > Can you tell me where to get the fix from -mm? If it is completely
> > fixed there then that could make our patch obsolete.
>
> In the latest -mm series file, they start at
> mm-revert-kernel_ds-buffered-write-optimisation.patch
> ...
> and go to
> ocfs2-convert-to-new-aops.patch
>
> > >> What actually locks the page? Is it __grab_cache_page or
> > >> a_ops->prepare_write?
> > >
> > > prepare_write must be given a locked page.
> >
> > Then that means __grab_cache_page does return a locked page because
> > there is nothing between the two calls that would.
>
> That's right.
>
> > > No it would be included earlier. The "segment_eq" check should be
> > > allowing kernel writes (nfsd) to write multiple segments. If you have a
> > > patch which changes this significantly, then it would indicate the
> > > existing logic has a problem (or you've got a userspace application
> > > doing the writev, which should be fixed by the write_begin patches in
> > > -mm).
> >
> > I've got userspace application doing the writev. To be exact 14% of
> > the commits were saved by combining multiple segments into a single
> > prepare/write pair. Since the kernel segments don't fragment anymore
> > in 2.6.23-rc5 those savings must come from user space stuff.
> >
> > From the stats posted earlier you can see that there is a substantial
> > amount of calls with 6 segments all (a lot) smaller than a page. Lots
> > of calls our patch or the write_begin/end will save.
>
> OK. The write_begin/write_end patchset is intrusive, no question. I'm not
> sure what you're intending to do with it. They have been tested in -mm for
> quite a while now, but just going with a simple patch that tries to copy
> more segments might be OK for you if you're backporting. The deadlock is
> pretty uncommon.

Lustre should probably have to be ported over to write_begin/write_end in
order to use it too. With the patches in -mm, if a filesystem is still using
prepare_write/commit_write, the vm reverts to a safe path which avoids
the deadlock (and allows multi-seg io copies), but copies the data twice.

OTOH, this is very likely to go upstream, so your filesystem will need to be
ported over sooner or later anyway.

Re: [patch 1/6] Linux Kernel Markers - Architecture Independent Code

2007-09-07 Thread Roland Dreier

 > > Anybody got a proposed scheme for the case where somebody like myself
 > > who is *not* a member of the Maintainer Cabal has looked at a patch, and
 > > found a valid show-stopper that's bigger than just whitespace (breaks on
 > > 64-bit, locking issues, etc), or other commentary that *should* be 
 > > addressed
 > > before it gets merged?  I'd like *some* way to tag a patch with "I had an
 > > issue with V1, but the author addressed it to my satisfaction in V2"

 > I think that'd be Reviewed-By.  While you are not part of the smokey room
 > cabal you have shown technical expertise in various areas so it seems
 > perfectly fine to have reviewed-by from you.  The fix vs a previous version
 > should probably be just in the text with a paragraph ala:

 > Issue blah in a previous version as found by Valdis Kletnieks has been fixed
 > by doing foo.

At ksummit Andrew also mentioned including a link to the relevant
mailing list discussion too, and I think this would be a good example
of when that would be useful.

 - R.

Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

2007-09-07 Thread Nick Piggin

On Saturday 08 September 2007 07:12, Goswin von Brederlow wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> > On Saturday 08 September 2007 06:01, Goswin von Brederlow wrote:

> >>   b) a segment boundary
> >
> > This is done, as I said, because of the deadlock issue. While the issue
> > is more completely fixed in -mm, a special case for kernel memory (eg.
> > nfsd) is in the latest mainline kernels.
>
> Can you tell me where to get the fix from -mm? If it is completely
> fixed there then that could make our patch obsolete.

In the latest -mm series file, they start at
mm-revert-kernel_ds-buffered-write-optimisation.patch
...
and go to
ocfs2-convert-to-new-aops.patch


> >> What actually locks the page? Is it __grab_cache_page or
> >> a_ops->prepare_write?
> >
> > prepare_write must be given a locked page.
>
> Then that means __grab_cache_page does return a locked page because
> there is nothing between the two calls that would.

That's right.


> > No it would be included earlier. The "segment_eq" check should be
> > allowing kernel writes (nfsd) to write multiple segments. If you have a
> > patch which changes this significantly, then it would indicate the
> > existing logic has a problem (or you've got a userspace application doing
> > the writev, which should be fixed by the write_begin patches in -mm).
>
> I've got userspace application doing the writev. To be exact 14% of
> the commits were saved by combining multiple segments into a single
> prepare/write pair. Since the kernel segments don't fragment anymore
> in 2.6.23-rc5 those savings must come from user space stuff.
>
> From the stats posted earlier you can see that there is a substantial
> amount of calls with 6 segments all (a lot) smaller than a page. Lots
> of calls our patch or the write_begin/end will save.

OK. The write_begin/write_end patchset is intrusive, no question. I'm not sure
what you're intending to do with it. They have been tested in -mm for quite a
while now, but just going with a simple patch that tries to copy more segments
might be OK for you if you're backporting. The deadlock is pretty uncommon.

Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

2007-09-07 Thread Nick Piggin

On Saturday 08 September 2007 07:00, Goswin von Brederlow wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> > Anyway, there are fixes for this deadlock in Andrew's -mm tree, but
> > also a workaround for the NFSD problem in git commit 29dbb3fc. Did
> > you try a later kernel to see if it is fixed there?
>
> I had a chance to look up that commit (git clone took a while so sorry
> for writing 2 mails). It is present in 2.6.23-rc5 so I already noticed
> it when merging our patch in 2.6.23-rc5.
>
> Upon closer reading of the patch though I see that it will indeed
> prevent writes by the nfsd to be split smaller than PAGE_SIZE and it
> will cause filemap_copy_from_user[_iovec] to be called with a source
> spanning multiple pages.

OK, good.

> So the commit 29dbb3fc should have a similar, slightly better even,
> gain for the nfsd and other kernel space segments. But it will not
> improve writes from user space, where ~14% of the commits were saved
> during a day's work for me.
>
>
> Now I have a question about fault_in_pages_readable(). Can I call that
> for multiple pages and then call __grab_cache_page() without risking
> one of the pages from getting lost again and causing a deadlock?

No. The existing mainline code is buggy, and it is just hoping that
the userspace page does not get paged out between the fault_in_pages
and the subsequent copy_from_user.

If you do multiple fault_in_pages_readable(), you probably have less
chance of deadlocking, but on the 2nd call, you might still have to
wait for a long time to page in, during which time the 1st page may get
paged out.

But there is a proper solution in -mm with the write_begin aop.

Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

2007-09-07 Thread Goswin von Brederlow

Nick Piggin <[EMAIL PROTECTED]> writes:

> On Saturday 08 September 2007 06:01, Goswin von Brederlow wrote:
>> Nick Piggin <[EMAIL PROTECTED]> writes:
>> > So I believe the problem is that for a multi-segment iovec, we currently
>> > prepare_write/commit_write once for each segment, right? We do this
>>
>> It is more complex.
>>
>> Currently a __grab_cache_page, a_ops->prepare_write,
>> filemap_copy_from_user[_iovec] and a_ops->commit_write is done
>> whenever we hit
>>
>>   a) a page boundary
>
> This is required by the prepare_write/commit_write API. The write_begin
> / write_end API is also a page-based one, but in future, we are looking
> at having a more general API but we haven't completely decided on the
> form yet. "perform_write" is one proposal you can look for.
>
>>   b) a segment boundary
>
> This is done, as I said, because of the deadlock issue. While the issue is
> more completely fixed in -mm, a special case for kernel memory (eg. nfsd)
> is in the latest mainline kernels.

Can you tell me where to get the fix from -mm? If it is completely
fixed there then that could make our patch obsolete.

>> Those two cases don't have to, and from the stats basically never,
>> coincide. For NFSd this means we do this TWICE per segment and TWICE
>> per page.
>
> The page boundary doesn't matter so much (well it does for other reasons,
> but we've never been good at them...). The segment boundary means that
> we aren't able to do block sized writes very well and end up doing a lot of
> read-modify-write operations that could be avoided.

Those are extremely costly for lustre. We have tested exporting a
lustre filesystem to NFS. Without fixes we get 40MB/s and with the
fixes it rises to nearly 200MB/s. That is a factor of 5 in speed.

>> > because there is a nasty deadlock in the VM (copy_from_user being
>> > called with a page locked), and copying multiple segs dramatically
>> > increases the chances that one of these copies will cause a page fault
>> > and thus potentially deadlock.
>>
>> What actually locks the page? Is it __grab_cache_page or
>> a_ops->prepare_write?
>
> prepare_write must be given a locked page.

Then that means __grab_cache_page does return a locked page because
there is nothing between the two calls that would.

>> Note that the patch does not change the number of copy_from_user calls
>> being made nor does it change their arguments. If we need 2 (or more)
>> segments to fill a page we still do 2 separate calls to
>> filemap_copy_from_user_iovec, both only spanning (part of) one
>> segment.
>>
>> What the patch changes is the number of copy_from_user calls between
>> __grab_cache_page and a_ops->commit_write.
>
> So you're doing all copy_from_user calls within a prepare_write? Then
> you're increasing the chances of deadlock. If not, then you're breaking
> the API contract.

Actually due to a bug, as you noticed, we do the copy first and then
prepare/write. But fixing that would indeed do multiple copies between
prepare and commit.

>> Copying a full PAGE_SIZE bytes from multiple segments in one go would
>> be a further improvement if that is possible.
>>
>> > The fix you have I don't think can work because a filesystem must be
>> > notified of the modification _before_ it has happened. (If I understand
>> > correctly, you are skipping the prepare_write potentially until after
>> > some data is copied?).
>>
>> Yes. We changed the order of copy_from_user calls and
>> a_ops->prepare_write by mistake. We will rectify that and do the
>> prepare_write for the full page (when possible) before copying the
>> data into the page.
>
> OK, that is what used to be done, but the API is broken due to this
> deadlock. write_begin/write_end fixes it properly.

I'm very interested in that fix.

>> > Anyway, there are fixes for this deadlock in Andrew's -mm tree, but
>> > also a workaround for the NFSD problem in git commit 29dbb3fc. Did
>> > you try a later kernel to see if it is fixed there?
>>
>> Later than 2.6.23-rc5?
>
> No it would be included earlier. The "segment_eq" check should be
> allowing kernel writes (nfsd) to write multiple segments. If you have a
> patch which changes this significantly, then it would indicate the
> existing logic has a problem (or you've got a userspace application doing
> the writev, which should be fixed by the write_begin patches in -mm).

I've got userspace application doing the writev. To be exact 14% of
the commits were saved by combining multiple segments into a single
prepare/write pair. Since the kernel segments don't fragment anymore
in 2.6.23-rc5 those savings must come from user space stuff.

From the stats posted earlier you can see that there is a substantial
amount of calls with 6 segments all (a lot) smaller than a page. Lots
of calls our patch or the write_begin/end will save.
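
To make the arithmetic concrete, here is a small userspace sketch that counts prepare/commit pairs for one multi-segment write under two policies: splitting at page boundaries only, or at both page and segment boundaries as described earlier in this thread. It is a counting model, not kernel code; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count prepare/commit pairs needed to write n segments starting at file
 * offset pos. A new pair starts at every page boundary and, when
 * split_on_segments is nonzero, at every segment boundary as well.
 */
size_t count_write_chunks(const size_t *seg_len, size_t n, size_t pos,
                          size_t page_size, int split_on_segments)
{
    size_t chunks = 0;
    size_t room = 0;  /* bytes still free in the currently open chunk */

    assert(page_size > 0);
    for (size_t i = 0; i < n; i++) {
        size_t left = seg_len[i];

        if (split_on_segments)
            room = 0;  /* close the open chunk at each segment start */
        while (left > 0) {
            if (room == 0) {
                /* open a chunk that runs to the end of this page */
                room = page_size - (pos % page_size);
                chunks++;
            }
            size_t take = left < room ? left : room;
            left -= take;
            room -= take;
            pos += take;
        }
    }
    return chunks;
}
```

For six 100-byte segments written at offset 0 with 4096-byte pages, the model gives six pairs when splitting on segments and one pair when only page boundaries count.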

MfG
Goswin

Re: Platform device id

2007-09-07 Thread Henrique de Moraes Holschuh

On Fri, 07 Sep 2007, David Brownell wrote:
> For that matter, a *driver* should never create its own device node(s)
> in the first place.  Device creation belongs elsewhere, like as part of
> platform setup or, for busses with integral enumeration support like
> PCI or USB, bus glue.  Linux is moving away from that legacy model.

This assumes that we have a better bus than "platform" to dump drivers like
thinkpad-acpi, hdaps, and a host of other host-specific stuff.

> I realize that may be more easily said than done in some cases,
> like i8042 on non-PNP systems.

Yes, and there is a LOT of non-PNP stuff involved, since platform became the
dumping ground for host-specific devices (as opposed to platform-specific
devices).

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

2007-09-07 Thread Goswin von Brederlow

Nick Piggin <[EMAIL PROTECTED]> writes:

> Anyway, there are fixes for this deadlock in Andrew's -mm tree, but
> also a workaround for the NFSD problem in git commit 29dbb3fc. Did
> you try a later kernel to see if it is fixed there?

I had a chance to look up that commit (git clone took a while so sorry
for writing 2 mails). It is present in 2.6.23-rc5 so I already noticed
it when merging our patch in 2.6.23-rc5.

Upon closer reading of the patch though I see that it will indeed
prevent writes by the nfsd to be split smaller than PAGE_SIZE and it
will cause filemap_copy_from_user[_iovec] to be called with a source
spanning multiple pages.

So the commit 29dbb3fc should have a similar, slightly better even,
gain for the nfsd and other kernel space segments. But it will not
improve writes from user space, where ~14% of the commits were saved
during a day's work for me.

Now I have a question about fault_in_pages_readable(). Can I call that
for multiple pages and then call __grab_cache_page() without risking
one of the pages from getting lost again and causing a deadlock?

MfG
Goswin

Re: easy alsa patches for the stable kernel?

2007-09-07 Thread Romano Giannetti

On Fri, 2007-09-07 at 21:42 +0200, Thorsten Leemhuis wrote:
> On 07.09.2007 14:58, Takashi Iwai wrote:
> >>> Ah good.  I added it to ALSA HG tree now.

Thanks. BTW, is the current hg tree visible anywhere? It seems that

http://hg-mirror.alsa-project.org/alsa-kernel/ lags a bit behind...

> It's just this line afaics...
> +   SND_PCI_QUIRK(0x1179, 0xff50, "TOSHIBA A305", ALC268_TOSHIBA),
> ...which afaics is doing nothing more than "if DMI-Data matches FOO then
> apply known workaround BAR". Is that correct or am I missing something
> here (another patch that this one depends on that isn't in 2.6.23 yet
> maybe?)?
>

Your second guess is right. That line is a patch with respect to current
mercurial tree, which is quite ahead of the current kernel alsa code.
Although I'd like to know from where Andrew pulled it, because I was not
able to find that tree on git.kernel nor alsa-project.org... :-)

> Nevertheless let me use this moment and say: thx for all your
> work Takashi!

Seconded. I only hope I will be able to continue to find this patch for
the next releases of the 2.6.23 kernel...

Romano


Re: Platform device id

2007-09-07 Thread Henrique de Moraes Holschuh

On Fri, 07 Sep 2007, Dmitry Torokhov wrote:
> On 9/7/07, Jean Delvare <[EMAIL PROTECTED]> wrote:
> > To go one step further, I am questioning the real value of this naming
> > exception for these "unique" platform devices. On top of the bugs I
> > mentioned above, it has potential for compatibility breakage: adding a

Agreed.  But the breakage might happen anyway, if you need to move
attributes from foo.0 to foo.1.  After that first time, userspace will learn
to search all foo.* for what it wants, but still...

> If a device has a <name>.<id> scheme this implies possibility of
> having several instances of said device in a box. There are a few of

No, it doesn't. It allows for, but it does not imply anything.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

[patch] Fix BIOS-e820 end address

2007-09-07 Thread Keshavamurthy, Anil S

Subject: [patch] Fix BIOS-e820 end address

--snip of boot message--
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fe8cc00 (usable)
end snip---

As you can see above, the address 0000000000100000 is shown as both
reserved and usable, which is confusing.

This patch fixes the BIOS-e820 end address.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 arch/i386/kernel/e820.c   |2 +-
 arch/x86_64/kernel/e820.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: work/arch/i386/kernel/e820.c
===================================================================
--- work.orig/arch/i386/kernel/e820.c   2007-09-08 12:00:33.000000000 -0700
+++ work/arch/i386/kernel/e820.c        2007-09-08 13:39:12.000000000 -0700
@@ -753,7 +753,7 @@
         for (i = 0; i < e820.nr_map; i++) {
                 printk(" %s: %016Lx - %016Lx ", who,
                         e820.map[i].addr,
-                        e820.map[i].addr + e820.map[i].size);
+                        e820.map[i].addr + e820.map[i].size - 1);
                 switch (e820.map[i].type) {
                 case E820_RAM:  printk("(usable)\n");
                                 break;
Index: work/arch/x86_64/kernel/e820.c
===================================================================
--- work.orig/arch/x86_64/kernel/e820.c 2007-09-08 12:00:46.000000000 -0700
+++ work/arch/x86_64/kernel/e820.c      2007-09-08 13:38:57.000000000 -0700
@@ -368,7 +368,7 @@
         for (i = 0; i < e820.nr_map; i++) {
                 printk(KERN_INFO " %s: %016Lx - %016Lx ", who,
                         (unsigned long long) e820.map[i].addr,
-                        (unsigned long long) (e820.map[i].addr + e820.map[i].size));
+                        (unsigned long long) (e820.map[i].addr + e820.map[i].size - 1));
                 switch (e820.map[i].type) {
                 case E820_RAM:  printk("(usable)\n");
                                 break;

Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

2007-09-07 Thread Goswin von Brederlow

Nick Piggin <[EMAIL PROTECTED]> writes:

> On Thursday 06 September 2007 03:41, Bernd Schubert wrote:
> Minor nit: when resubmitting a patch, you should include everything
> (ie. the full changelog of problem statement and fix description) in a
> single mail. It's just a bit easier...

Will do next time.

> So I believe the problem is that for a multi-segment iovec, we currently
> prepare_write/commit_write once for each segment, right? We do this

It is more complex.

Currently a __grab_cache_page, a_ops->prepare_write,
filemap_copy_from_user[_iovec] and a_ops->commit_write are done
whenever we hit

  a) a page boundary
  b) a segment boundary

Those two cases don't have to, and from the stats basically never,
coincide. For NFSd this means we do this TWICE per segment and TWICE
per page.

> because there is a nasty deadlock in the VM (copy_from_user being
> called with a page locked), and copying multiple segs dramatically
> increases the chances that one of these copies will cause a page fault
> and thus potentially deadlock.

What actually locks the page? Is it __grab_cache_page or
a_ops->prepare_write?

Note that the patch does not change the number of copy_from_user calls
being made nor does it change their arguments. If we need 2 (or more)
segments to fill a page we still do 2 separate calls to
filemap_copy_from_user_iovec, both only spanning (part of) one
segment.

What the patch changes is the number of copy_from_user calls between
__grab_cache_page and a_ops->commit_write.

Copying a full PAGE_SIZE bytes from multiple segments in one go would
be a further improvement if that is possible.

> The fix you have I don't think can work because a filesystem must be
> notified of the modification _before_ it has happened. (If I understand
> correctly, you are skipping the prepare_write potentially until after
> some data is copied?).

Yes. We changed the order of copy_from_user calls and
a_ops->prepare_write by mistake. We will rectify that and do the
prepare_write for the full page (when possible) before copying the
data into the page.

> Anyway, there are fixes for this deadlock in Andrew's -mm tree, but
> also a workaround for the NFSD problem in git commit 29dbb3fc. Did
> you try a later kernel to see if it is fixed there?

Later than 2.6.23-rc5?

> Thanks,
> Nick

MfG
Goswin

Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

2007-09-07 Thread Nick Piggin

On Saturday 08 September 2007 06:01, Goswin von Brederlow wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> > On Thursday 06 September 2007 03:41, Bernd Schubert wrote:
> > Minor nit: when resubmitting a patch, you should include everything
> > (ie. the full changelog of problem statement and fix description) in a
> > single mail. It's just a bit easier...
>
> Will do next time.
>
> > So I believe the problem is that for a multi-segment iovec, we currently
> > prepare_write/commit_write once for each segment, right? We do this
>
> It is more complex.
>
> Currently a __grab_cache_page, a_ops->prepare_write,
> filemap_copy_from_user[_iovec] and a_ops->commit_write is done
> whenever we hit
>
>   a) a page boundary

This is required by the prepare_write/commit_write API. The write_begin
/ write_end API is also page-based. In future we would like a more
general API, but we haven't completely decided on its form yet.
"perform_write" is one proposal you can look for.

>   b) a segment boundary

This is done, as I said, because of the deadlock issue. While the issue is
more completely fixed in -mm, a special case for kernel memory (eg. nfsd)
is in the latest mainline kernels.

> Those two cases don't have to, and from the stats basically never,
> coincide. For NFSd this means we do this TWICE per segment and TWICE
> per page.

The page boundary doesn't matter so much (well it does for other reasons,
but we've never been good at them...). The segment boundary means that
we aren't able to do block sized writes very well and end up doing a lot of
read-modify-write operations that could be avoided.

> > because there is a nasty deadlock in the VM (copy_from_user being
> > called with a page locked), and copying multiple segs dramatically
> > increases the chances that one of these copies will cause a page fault
> > and thus potentially deadlock.
>
> What actually locks the page? Is it __grab_cache_page or
> a_ops->prepare_write?

prepare_write must be given a locked page.

> Note that the patch does not change the number of copy_from_user calls
> being made nor does it change their arguments. If we need 2 (or more)
> segments to fill a page we still do 2 separate calls to
> filemap_copy_from_user_iovec, both only spanning (part of) one
> segment.
>
> What the patch changes is the number of copy_from_user calls between
> __grab_cache_page and a_ops->commit_write.

So you're doing all copy_from_user calls within a prepare_write? Then
you're increasing the chances of deadlock. If not, then you're breaking
the API contract.

> Copying a full PAGE_SIZE bytes from multiple segments in one go would
> be a further improvement if that is possible.
>
> > The fix you have I don't think can work because a filesystem must be
> > notified of the modification _before_ it has happened. (If I understand
> > correctly, you are skipping the prepare_write potentially until after
> > some data is copied?).
>
> Yes. We changed the order of copy_from_user calls and
> a_ops->prepare_write by mistake. We will rectify that and do the
> prepare_write for the full page (when possible) before copying the
> data into the page.

OK, that is what used to be done, but the API is broken due to this
deadlock. write_begin/write_end fixes it properly.

> > Anyway, there are fixes for this deadlock in Andrew's -mm tree, but
> > also a workaround for the NFSD problem in git commit 29dbb3fc. Did
> > you try a later kernel to see if it is fixed there?
>
> Later than 2.6.23-rc5?

No, it would have been included earlier. The "segment_eq" check should be
allowing kernel writes (nfsd) to write multiple segments. If you have a
patch which changes this significantly, then it would indicate the
existing logic has a problem (or you've got a userspace application doing
the writev, which should be fixed by the write_begin patches in -mm).

[PATCH] net: myri10ge: force select inet_lro

2007-09-07 Thread Daniel Walker

This driver uses the inet_lro facilities, but it doesn't force them
to be enabled. Someone would have to know to enable inet_lro when
they select the driver.

Instead, just force INET_LRO if this driver is selected.

Signed-off-by: Daniel Walker <[EMAIL PROTECTED]>

---
 drivers/net/Kconfig |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.22/drivers/net/Kconfig
===
--- linux-2.6.22.orig/drivers/net/Kconfig
+++ linux-2.6.22/drivers/net/Kconfig
@@ -2103,6 +2103,7 @@ source "drivers/net/ixp2000/Kconfig"
 config MYRI_SBUS
         tristate "MyriCOM Gigabit Ethernet support"
         depends on SBUS
+        select INET_LRO
         help
           This driver supports MyriCOM Sbus gigabit Ethernet cards.
 

Re: USB Key light on/off state depending on mount

2007-09-07 Thread Casey Dahlin


Sorry to have left this dormant for so long.

Running eject in either of the ways suggested still leaves the light on 
my particular key turned on.


Stefan Richter wrote:
> Guennadi Liakhovetski wrote:
>> I might imagine how windows turns the LED off on
>> unmount. Try "eject /dev/sdX", where sdX is your USB storage, after you
>> unmount it. Be careful, especially if you have SATA (or SCSI) discs in
>> your system or if you use libata for PATA discs not to eject the wrong
>> one...
>
> If there is only one USB disk connected:
> # eject /dev/disk/by-path/*usb*:0
>
> Provided you let udev create links for you.  BTW, the /dev/disk/by-id/
> symlinks are nice for static mount points in /etc/fstab.
>
> After a disk was mounted, eject also accepts the mountpoint as parameter
> and will unmount the disk before it tries to eject it.


Re: [PATCH] list.h: add list_for_each_entry_continue_rcu

2007-09-07 Thread Johannes Berg

On Fri, 2007-09-07 at 12:57 -0700, Paul E. McKenney wrote:

> Actually, list_for_each_continue_rcu() needs to be removed in favor of
> your new list_for_each_entry_continue_rcu().  There are currently only
> two users as of 2.6.22.  One of them immediately does a list_entry(),
> and the other would convert easily as well.  So please let me know
> when this gets accepted!  ;-)

Heh, ok, I won't add the text to that macro if you want to remove it
anyway. I guess I'll add your paragraph and ... hmm. what do I do with
it? Who's responsible for list.h? Can I push this through John Linville
and Dave Miller so I can get the fix into his tree easily without
synchronisation?

johannes


Re: [PATCH] list.h: add list_for_each_entry_continue_rcu

2007-09-07 Thread Paul E. McKenney

On Fri, Sep 07, 2007 at 09:09:52PM +0200, Johannes Berg wrote:
> On Fri, 2007-09-07 at 08:34 -0700, Paul E. McKenney wrote:
> 
> > > + * Continue to iterate over rcu list of given type, continuing after
> > > + * the current position.
> > 
> > Please add something like the following to this comment:
> > 
> > Note that the caller is responsible for making sure that
> > the element remains in place between the earlier iterator
> > and this one.  One way to do this is to ensure that
> > both iterators are covered by the same rcu_read_lock(),
> > while others involve reference counts, flags, or mutexes.
> 
> Sure, will do. Should this comment also be added to
> list_for_each_continue_rcu()?

Actually, list_for_each_continue_rcu() needs to be removed in favor of
your new list_for_each_entry_continue_rcu().  There are currently only
two users as of 2.6.22.  One of them immediately does a list_entry(),
and the other would convert easily as well.  So please let me know
when this gets accepted!  ;-)

Thanx, Paul

crash while playing bzflag

2007-09-07 Thread Alex Riesen

Kernel: v2.6.23-rc5+ (b21010ed6498391c0f359f2a89c907533fe07fec)
Ubuntu Feisty, Radeon R200 (9200) dual head, MergedFB, BZFlag in
OpenGL mode, frozen. That'll teach me playing games at home...

BUG: unable to handle kernel paging request at virtual address ffa85000
 printing eip:
c016eed1
*pde = 5067
*pte = 00000000
Oops: 0000 [#1]
PREEMPT SMP 
Modules linked in: binfmt_misc nfs radeon drm nfsd exportfs lockd sunrpc fan 
firmware_class it87 hwmon_vid hwmon p4_clockmod speedstep_lib ipv6 sg sr_mod 
cdrom usb_storage snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_pcm 
snd_mixer_oss snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq 
snd_timer snd_seq_device generic floppy snd ide_core intel_agp e100 uhci_hcd 
ehci_hcd soundcore snd_page_alloc agpgart evdev
CPU:0
EIP:0060:[__link_path_walk+2146/2867]Not tainted VLI
EFLAGS: 00010287   (2.6.23-rc5-t #138)
EIP is at __link_path_walk+0x862/0xb33
eax: ffa85000   ebx: f0a4dd64   ecx: c0442130   edx: c1782d00
esi: eee51f30   edi: ffa85000   ebp: f0e49e40   esp: eee51de4
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process command-not-fou (pid: 2895, ti=eee51000 task=f08f1020 task.ti=eee51000)
Stack: f474c02c 0101 f1db2d64 c016d3eb c1782d00 ffa85000   
    96ba5598 000b f474c021 c18eff00 f0e49e40 f08e1540 eee51f30 
   c1937c78 c18eff00 c016f1e6 f474c000 c1937c78 c18eff00 c180b180 f11a7600 
Call Trace:
 [do_lookup+79/323] do_lookup+0x4f/0x143
 [link_path_walk+68/179] link_path_walk+0x44/0xb3
 [_spin_unlock+5/28] _spin_unlock+0x5/0x1c
 [get_unused_fd_flags+198/208] get_unused_fd_flags+0xc6/0xd0
 [do_path_lookup+362/463] do_path_lookup+0x16a/0x1cf
 [__path_lookup_intent_open+69/117] __path_lookup_intent_open+0x45/0x75
 [path_lookup_open+32/37] path_lookup_open+0x20/0x25
 [open_namei+114/1364] open_namei+0x72/0x554
 [unmap_vmas+791/1240] unmap_vmas+0x317/0x4d8
 [do_filp_open+37/57] do_filp_open+0x25/0x39
 [_spin_unlock+5/28] _spin_unlock+0x5/0x1c
 [get_unused_fd_flags+198/208] get_unused_fd_flags+0xc6/0xd0
 [do_sys_open+68/192] do_sys_open+0x44/0xc0
 [sys_open+28/30] sys_open+0x1c/0x1e
 [sysenter_past_esp+95/133] sysenter_past_esp+0x5f/0x85
 [xfrm_bundle_ok+53/522] xfrm_bundle_ok+0x35/0x20a
 ===
Code: f0 ff ff 0f 87 38 01 00 00 8b 46 1c 8b 44 86 20 89 44 24 14 31 ff 85 c0 
0f 84 09 01 00 00 89 c7 3d 00 f0 ff ff 0f 87 f5 00 00 00 <80> 38 2f 0f 85 9f 00 
00 00 89 f0 e8 38 e1 ff ff 64 a1 00 70 3f 
EIP: [__link_path_walk+2146/2867] __link_path_walk+0x862/0xb33 SS:ESP 
0068:eee51de4
SysRq : Emergency Sync
Emergency Sync complete
SysRq : Emergency Sync
Emergency Sync complete

The config, lspci output, Xorg.0.log, and a more complete log of the
crash are attached (the crash happened around Sep  7 21:24:27 in the log;
I panicked a bit and pressed Alt-SysRq-t and emergency sync).


[RFC][Intel-IOMMU] Fix for IOMMU early crash

2007-09-07 Thread Keshavamurthy, Anil S

Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash

Populating pci_bus->sysdata very early in the PCI discovery phase
sets pci_dev->sysdata to a non-NULL value, which breaks an assumption
in the Intel IOMMU driver and crashes the system.


In drivers/pci/probe.c, pci_dev->sysdata gets a copy of
its pci_bus->sysdata, which is not required because
the same value can be obtained from pci_dev->bus->sysdata. Moreover,
the value assigned to pci_dev->sysdata is never used,
so there is no point in setting
pci_dev->sysdata = pci_bus->sysdata;

This patch removes sysdata from struct pci_dev and creates a new
field called sys_data, which is used exclusively
by the IOMMU driver to keep its per-device context pointer.

Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]>

---
 drivers/pci/hotplug/fakephp.c |1 -
 drivers/pci/intel-iommu.c |   22 +++---
 drivers/pci/probe.c   |1 -
 include/linux/pci.h   |2 +-
 4 files changed, 12 insertions(+), 14 deletions(-)

Index: work/drivers/pci/hotplug/fakephp.c
===================================================================
--- work.orig/drivers/pci/hotplug/fakephp.c     2007-09-08 12:00:20.000000000 -0700
+++ work/drivers/pci/hotplug/fakephp.c  2007-09-08 12:07:19.000000000 -0700
@@ -243,7 +243,6 @@
                 return;
 
         dev->bus = (struct pci_bus*)bus;
-        dev->sysdata = bus->sysdata;
         for (devfn = 0; devfn < 0x100; devfn += 8) {
                 dev->devfn = devfn;
                 pci_rescan_slot(dev);
Index: work/drivers/pci/intel-iommu.c
===================================================================
--- work.orig/drivers/pci/intel-iommu.c 2007-09-08 12:00:47.000000000 -0700
+++ work/drivers/pci/intel-iommu.c      2007-09-08 12:08:20.000000000 -0700
@@ -1348,7 +1348,7 @@
                 list_del(&info->link);
                 list_del(&info->global);
                 if (info->dev)
-                        info->dev->sysdata = NULL;
+                        info->dev->sys_data = NULL;
                 spin_unlock_irqrestore(&device_domain_lock, flags);
 
                 detach_domain_for_dev(info->domain, info->bus, info->devfn);
@@ -1361,7 +1361,7 @@
 
 /*
  * find_domain
- * Note: we use struct pci_dev->sysdata stores the info
+ * Note: we use struct pci_dev->sys_data stores the info
  */
 struct dmar_domain *
 find_domain(struct pci_dev *pdev)
@@ -1369,7 +1369,7 @@
         struct device_domain_info *info;
 
         /* No lock here, assumes no domain exit in normal case */
-        info = pdev->sysdata;
+        info = pdev->sys_data;
         if (info)
                 return info->domain;
         return NULL;
@@ -1519,7 +1519,7 @@
         }
         list_add(&info->link, &domain->devices);
         list_add(&info->global, &device_domain_list);
-        pdev->sysdata = info;
+        pdev->sys_data = info;
         spin_unlock_irqrestore(&device_domain_lock, flags);
         return domain;
 error:
@@ -1579,7 +1579,7 @@
 static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
         struct pci_dev *pdev)
 {
-        if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+        if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
                 return 0;
         return iommu_prepare_identity_map(pdev, rmrr->base_address,
                 rmrr->end_address + 1);
@@ -1595,7 +1595,7 @@
         int ret;
 
         for_each_pci_dev(pdev) {
-                if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO ||
+                if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO ||
                                 !IS_GFX_DEVICE(pdev))
                         continue;
                 printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n",
@@ -1836,7 +1836,7 @@
         int prot = 0;
 
         BUG_ON(dir == DMA_NONE);
-        if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+        if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
                 return virt_to_bus(addr);
 
         domain = get_valid_domain_for_dev(pdev);
@@ -1900,7 +1900,7 @@
         unsigned long start_addr;
         struct iova *iova;
 
-        if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+        if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
                 return;
         domain = find_domain(pdev);
         BUG_ON(!domain);
@@ -1974,7 +1974,7 @@
         size_t size = 0;
         void *addr;
 
-        if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+        if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
                 return;
 
         domain = find_domain(pdev);
@@ -2032,7 +2032,7 @@
         unsigned long start_addr;
 
         BUG_ON(dir == DMA_NONE);
-        if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO)
+        if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO)
                 return intel_nontranslate_map_sg(hwdev, sg, nelems, dir);
 
         domain = get_valid_domain_for_dev(pdev);
@@ -2234,7 +2234,7 @@
                 for (i = 0; i < drhd->devices_cnt; i++) {
                         if (!drhd->devices[i])
                                 continue;
-                        drhd->devices[i]->sysdata =

Re: easy alsa patches for the stable kernel?

2007-09-07 Thread Thorsten Leemhuis

On 07.09.2007 14:58, Takashi Iwai wrote:
> At Fri, 07 Sep 2007 14:04:01 +0200,
> Thorsten Leemhuis wrote:
>> On 07.09.2007 12:21, Takashi Iwai wrote:
>>> At Fri, 07 Sep 2007 10:22:27 +0200,
>>> Romano Giannetti wrote:
>>>> Takashi: good news!
>>>>
>>>> diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
>>>> index 3557865..496d119 100644
>>>> --- a/sound/pci/hda/patch_realtek.c
>>>> +++ b/sound/pci/hda/patch_realtek.c
>>>> @@ -9044,6 +9044,7 @@ static const char *alc268_models[ALC268_MODEL_LAST] = {
>>>>  static struct snd_pci_quirk alc268_cfg_tbl[] = {
>>>>         SND_PCI_QUIRK(0x1043, 0x1205, "ASUS W7J", ALC268_3ST),
>>>>         SND_PCI_QUIRK(0x1179, 0xff10, "TOSHIBA A205", ALC268_TOSHIBA),
>>>> +       SND_PCI_QUIRK(0x1179, 0xff50, "TOSHIBA A305", ALC268_TOSHIBA),
>>>>         SND_PCI_QUIRK(0x103c, 0x30cc, "TOSHIBA", ALC268_TOSHIBA),
>>>>         SND_PCI_QUIRK(0x1025, 0x0126, "Acer", ALC268_ACER),
>>>>         SND_PCI_QUIRK(0x1025, 0x0130, "Acer Extensa 5210", ALC268_ACER),
>>> Ah good.  I added it to ALSA HG tree now.
>> Just wondering: should easy-and-obvious and less-risky patches like this
>> one be sent to the stable-kernel-maintainers in parallel to adding them
>> to the HG-Tree (or shortly afterwards)? It could save users lots of
>> trouble if such improvements make it quickly into production-ready
>> kernel-releases (and from there they might even find their way into some
>> distribution kernels quickly). Hardware then would "just work".
> Well, this patch is definitely not for 2.6.23 or stable kernel.
> It's for 2.6.24.

Sorry, but why?

It's just this line afaics...
+   SND_PCI_QUIRK(0x1179, 0xff50, "TOSHIBA A305", ALC268_TOSHIBA),
...which afaics is doing nothing more than "if DMI-Data matches FOO then
apply known workaround BAR". Is that correct or am I missing something
here (another patch that this one depends on that isn't in 2.6.23 yet
maybe?)?

If my above analysis is correct (which IMHO is at least correct for some
of all those alsa-patches that get applied) then I'd say: it's worth
applying them to linus-git tree even after the merge-window, as the risk
that something is wrong is small (¹) and the benefit for users is big
enough to be worth the risk, as users get the fix in their hands 60 - 80
days (round about the time a typical devel cycle takes these days
afaics) earlier that way.

60 - 80 days might sound like not that much to some people, but if we
want to make Linux compatible with today's hardware (and not only
yesterday's) we imho can't wait nearly 1/4 of a year (or longer, as it
takes some time until such a fix hits the distributions, but that's
another part of the problem), as the typical market-lifetime of a modern
notebook is often not much longer than a year in total afaics.

(¹) -- sure, typos or stupid side-effects can always happen -- but
that's not reason enough to stand still

>> Sure, before the stable-maintainer will take such patches they need to
>> be added to linus git-tree beforehand as well. And sure, patches like
>> the one above are not fixing a regression (at least in this case if I
>> read the thread correctly; the old subject thus is misleading afaics),
>> but it's similar to a new PCI-ID that gets added to an existing driver --
>> and that's done now in the stable-series afaics (¹).
>>
>> The alsa-maintainers seem to be in the best position to do this, but it
>> seems they rarely do it. I for example was hit by a regression (sound
>> worked in 2.6.20 and broke afterwards; was fixed in 2.6.23-git by the
>> following patch in case anybody is wondering:
>> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a4eed138add1018846d17e813560b0c7c0ae8e01
>> ), but the alsa-developers did not submit it for stable afaics. Sure, I
>> could do that myself, but as I said: the alsa-maintainers really have
>> the best overview over the alsa-patches and should know which patches
>> are safe to apply for older kernels.
> 
> I occasionally do but sometimes forget. 

Nevertheless let me use this moment and say: thx for all your
work Takashi!

> The problem is often that I
> want first the merge to Linus tree, and then I forget to submit to
> stable tree when the merge takes long time in the end.  (The merge of
> alsa.git is too spotty, and that's another big problem for me.  In
> short, I do NOT maintain alsa.git tree at all...)

Then I as one of all those long-time-lkml-lurkers without programming
skills dare to say that maybe the alsa-project might need to improve its
workflow? Maybe you guys should maintain two git-trees (or multiple
branches in one tree; sorry, I'm not a git expert and not sure what the
correct terms are)?

E.g. look at how Jeff handles it for libata; he pushes big stuff during
each merge window; after that lots of small updates (new PCI-IDs) and of
course fixes make it to tree quite often (weekly normally afaics,
sometimes more often, sometimes more seldom) until nearly right before
the

Re: 2.6.23-rc3-mm1 - vdso and gettimeofday issues with glibc

2007-09-07 Thread Chuck Ebbert

On 09/01/2007 06:07 AM, Andi Kleen wrote:
>> write_seqlock_irqsave(&vsyscall_gtod_data.lock, flags);
>> /* copy vsyscall data */
>> vsyscall_gtod_data.clock.vread = clock->vread;
>> vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
>> vsyscall_gtod_data.clock.mask = clock->mask;
>> vsyscall_gtod_data.clock.mult = clock->mult;
>> vsyscall_gtod_data.clock.shift = clock->shift;
>> vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
>> vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;  <===
>> vsyscall_gtod_data.sys_tz = sys_tz;
>> vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;  <===
> 
> Must have been a (harmless) merging mistake, but I bet gcc optimizes it out
> anyways.
> 

I did find this after some digging:


In the vdso code:

static inline long vgetns(void)
{
        cycles_t (*vread)(void);
        vread = gtod->clock.vread;
        return ((vread() - gtod->clock.cycle_last) * gtod->clock.mult) >>
                gtod->clock.shift;
}


Looks like an open-coded version of this in the kernel timekeeping code:

static inline s64 __get_nsec_offset(void)
{
        cycle_t cycle_now, cycle_delta;
        s64 ns_offset;

        /* read clocksource: */
        cycle_now = clocksource_read(clock);

        /* calculate the delta since the last update_wall_time: */
        cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

        /* convert to nanoseconds: */
        ns_offset = cyc2ns(clock, cycle_delta);

        return ns_offset;
}

But the vdso version isn't doing any masking. And the mask is different for
different clocksources, so it has to track the underlying kernel's clocksource
when it gets changed.


Re: [PATCH 1/2][RESEND] improve generic_file_buffered_write()

2007-09-07 Thread Christoph Hellwig

On Fri, Sep 07, 2007 at 08:52:38PM +0200, Bernd Schubert wrote:
> No further response to our patches yet, so we are sending them again, 
> re-diffed against 2.6.23-rc5
> 
> Hi,
> 
> recently we discovered writing to a nfs-exported lustre filesystem is rather 
> slow (20-40 MB/s writing, but over 200 MB/s reading).
> 
> As I already explained on the nfs mailing list, this happens since there is 
> an 
> offset on the very first page due to the nfs header.
> 
> http://sourceforge.net/mailarchive/forum.php?thread_name=200708312003.30446.bernd-schubert%40gmx.de&forum_name=nfs
> 
> While this especially affects lustre, Olaf Kirch also noticed it on another 
> filesystem before and wrote a nfs patch for it. This patch has two 
> disadvantages  - it requires to move all data within the pages, IMHO rather 
> cpu time consuming, furthermore, it presently causes data corruption when 
> more than one nfs thread is running.
> 
> After thinking it over and over again we (Goswin and I) believe it would be 
> best to improve generic_file_buffered_write().
> If there is sufficient data now, as it is usual for aio writes, 
> generic_file_buffered_write() will now fill each page as much as possible and 
> only then prepare/commit it. Before generic_file_buffered_write() committed 
> chunks of pages even though there were still more data.

While the idea is sound in general, the code you're touching is almost entirely
gone in -mm and hopefully in 2.6.24.  Can you take a look at Nick's changes
in -mm that introduce write_begin and write_end methods replacing prepare_write
and commit_write, and see if they improve your situation already?  If not, they
should at least provide a framework to deal with it in a slightly cleaner way.


Re: sata & scsi suggestion for make menuconfig

2007-09-07 Thread Krzysztof Halasa

Folkert van Heusden <[EMAIL PROTECTED]> writes:

> Ok, but that's not the most common situation. What I'm suggesting is a
> warning or a please note popup. Not necessarily an error or refusing to
> continue thing.

What IMHO makes sense is changing all references to SCSI CDROM,
SCSI DISK etc. to just CDROM, DISK, and changing SCSI (menu) to
something like MASS STORAGE.
-- 
Krzysztof Halasa

Re: Why do so many machines need "noapic"?

2007-09-07 Thread Chuck Ebbert

On 09/06/2007 07:31 AM, Andi Kleen wrote:
> Chuck Ebbert <[EMAIL PROTECTED]> writes:
> 
>> Some systems lock up without the noapic option.
> 
> Please find patterns: cpu type, chipsets, mainboard vendors etc.
> 

This is the first one I've actually had in front of me:

  HP TX1000 notebook
  Nvidia C51/MCP51 mobile chipset

Booting with "noapic" gives some very strange results. These are two
snapshots of /proc/interrupts taken one second apart. It almost looks
like timer interrupts are occurring on IRQ 0 and IRQ 7 on different
CPUs:

           CPU0       CPU1
  0:     446096       6224    XT-PIC-XT        timer
  1:        342          6    XT-PIC-XT        i8042
  2:          0          0    XT-PIC-XT        cascade
  5:       3099        865    XT-PIC-XT        sata_nv
  7:       8145     494718    XT-PIC-XT        ehci_hcd:usb2
  8:          0          0    XT-PIC-XT        rtc0
  9:        323          9    XT-PIC-XT        acpi
 10:        136         36    XT-PIC-XT        HDA Intel
 11:      43884       1091    XT-PIC-XT        ohci_hcd:usb1, eth0
 12:        104         19    XT-PIC-XT        i8042
 14:       1011         25    XT-PIC-XT        libata
 15:          0          0    XT-PIC-XT        libata
NMI:          0          0
LOC:       6212     445951
ERR:     403241
MIS:          0

           CPU0       CPU1
  0:     447098       6233    XT-PIC-XT        timer
  1:        343          6    XT-PIC-XT        i8042
  2:          0          0    XT-PIC-XT        cascade
  5:       3100        865    XT-PIC-XT        sata_nv
  7:       8158     495847    XT-PIC-XT        ehci_hcd:usb2
  8:          0          0    XT-PIC-XT        rtc0
  9:        323          9    XT-PIC-XT        acpi
 10:        136         36    XT-PIC-XT        HDA Intel
 11:      43988       1094    XT-PIC-XT        ohci_hcd:usb1, eth0
 12:        104         19    XT-PIC-XT        i8042
 14:       1032         26    XT-PIC-XT        libata
 15:          0          0    XT-PIC-XT        libata
NMI:          0          0
LOC:       6221     446953
ERR:     404383
MIS:          0


>> I found one
>> that will freeze while trying to set up the timer interrupt.
>> Passing 'nolapic' makes it freeze just after:
>>
>>Setting up timer through ExtINT... works
> 
> Always boot with apic=debug
> 

I can't capture the messages. Even when it boots it doesn't last
long enough to get them.


Re: [patch 1/6] Linux Kernel Markers - Architecture Independent Code

2007-09-07 Thread Christoph Hellwig

On Fri, Sep 07, 2007 at 01:10:54PM -0400, [EMAIL PROTECTED] wrote:
> Anybody got a proposed scheme for the case where somebody like myself
> who is *not* a member of the Maintainer Cabal has looked at a patch, and
> found a valid show-stopper that's bigger than just whitespace (breaks on
> 64-bit, locking issues, etc), or other commentary that *should* be addressed
> before it gets merged?  I'd like *some* way to tag a patch with "I had an
> issue with V1, but the author addressed it to my satisfaction in V2"
> 
> (Note that includes "the author convinced me the patch was right and I was
> wrong"...)

I think that'd be Reviewed-by.  While you are not part of the smoky-room
cabal, you have shown technical expertise in various areas, so it seems
perfectly fine to have a Reviewed-by from you.  The fix relative to a
previous version should probably just go in the text, with a paragraph à la:

Issue blah in a previous version as found by Valdis Kletnieks has been fixed
by doing foo.

Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-07 Thread Johannes Berg

On Fri, 2007-09-07 at 18:01 +0200, Michael Buesch wrote:

> What's the problem with trying to lock it?

I think I had a problem with it once when I inserted it into some code
that was atomic and it all blew up badly ;) Nothing important really but
it sort of made me not like it much.

johannes



Re: [PATCH 1/2][RESEND] improve generic_file_buffered_write()

2007-09-07 Thread Randy Dunlap

On Fri, 7 Sep 2007 20:52:38 +0200 Bernd Schubert wrote:

>  mm/filemap.c |  139 +
>  1 file changed, 95 insertions(+), 44 deletions(-)
> 
>  
> Index: linux-2.6.23-rc5/mm/filemap.c
> ===
> --- linux-2.6.23-rc5.orig/mm/filemap.c 2007-09-06 18:33:11.0 +0200
> +++ linux-2.6.23-rc5/mm/filemap.c 2007-09-06 18:33:15.0 +0200
> @@ -1834,6 +1834,21 @@
>  }
>  EXPORT_SYMBOL(generic_file_direct_write);
>  

The kernel-doc still needs fixes as indicated below:

> +/**
> + * generic_file_buffered_write - handle iov'ectors
> + * @iob: file operations

s/iob/iocb/

> + * @iov: vector of data to write
> + * @nr_segs: number of iov segments
> + * @pos: position in the file
> + * @ppos:position in the file after this function
> + * @count:   number of bytes to write
> + * written:  offset in iov->base (data to skip on write)

s/written/@written/

> + *
> + * This function will do 3 main tasks for each iov:
> + * - prepare a write
> + * - copy the data from iov into a new page
> + * - commit this page
> + */
>  ssize_t
>  generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
>   unsigned long nr_segs, loff_t pos, loff_t *ppos,

Thanks.
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: [PATCH] list.h: add list_for_each_entry_continue_rcu

2007-09-07 Thread Johannes Berg

On Fri, 2007-09-07 at 08:34 -0700, Paul E. McKenney wrote:

> > + * Continue to iterate over rcu list of given type, continuing after
> > + * the current position.
> 
> Please add something like the following to this comment:
> 
>   Note that the caller is responsible for making sure that
>   the element remains in place between the earlier iterator
>   and this one.  One way to do this is to ensure that
>   both iterators are covered by the same rcu_read_lock(),
>   while others involve reference counts, flags, or mutexes.

Sure, will do. Should this comment also be added to
list_for_each_continue_rcu()?

johannes




[PATCH 1/2][RESEND] improve generic_file_buffered_write()

2007-09-07 Thread Bernd Schubert

No further response to our patches yet, so we are sending them again, 
re-diffed against 2.6.23-rc5

Hi,

recently we discovered that writing to an NFS-exported Lustre filesystem is
rather slow (20-40 MB/s writing, but over 200 MB/s reading).

As I already explained on the nfs mailing list, this happens because there is
an offset on the very first page due to the NFS header.

http://sourceforge.net/mailarchive/forum.php?thread_name=200708312003.30446.bernd-schubert%40gmx.de&forum_name=nfs

While this especially affects Lustre, Olaf Kirch also noticed it on another
filesystem before and wrote an NFS patch for it. This patch has two
disadvantages: it requires moving all data within the pages, which IMHO is
rather CPU-time consuming, and it presently causes data corruption when
more than one nfs thread is running.

After thinking it over and over again we (Goswin and I) believe it would be
best to improve generic_file_buffered_write().
If there is sufficient data, as is usual for aio writes,
generic_file_buffered_write() will now fill each page as much as possible and
only then prepare/commit it. Previously, generic_file_buffered_write()
committed chunks of pages even though there was still more data.

Some statistics:

num_writes = 4669440, bytes_total = 20231249633, segs_total = 5738644, 
commit_loops = 7697604, commits_total = 6628750

commit_loops is the number of commits without the patch and commits_total the
number of commits we actually have now. This shows a saving of nearly 14% of
prepare, commit, cond_resched calls.
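
For reference, the percentage follows directly from the two counters quoted
above (a worked calculation only, using no numbers beyond those in the
statistics line):

```latex
\[
\frac{\text{commit\_loops} - \text{commits\_total}}{\text{commit\_loops}}
  = \frac{7697604 - 6628750}{7697604}
  = \frac{1068854}{7697604}
  \approx 0.139
\]
```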

<       1:  Write size =        0,  Num segs =        0
<       2:  Write size =    20244,  Num segs =  4455583
<       4:  Write size =     6722,  Num segs =       24
<       8:  Write size =    19653,  Num segs =   213842
<      16:  Write size =    31778,  Num segs =        0
<      32:  Write size =    73395,  Num segs =        0
<      64:  Write size =   148840,  Num segs =        0
<     128:  Write size =   310178,  Num segs =        0
<     256:  Write size =    89027,  Num segs =        0
<     512:  Write size =   111903,  Num segs =        0
<    1024:  Write size =   140509,  Num segs =        0
<    2048:  Write size =   244052,  Num segs =        0
<    4096:  Write size =   217164,  Num segs =        0
<    8192:  Write size =  2784875,  Num segs =        0
<   16384:  Write size =   433506,  Num segs =        0
<   32768:  Write size =    11742,  Num segs =        0
<   65536:  Write size =    15783,  Num segs =        0
<  131072:  Write size =     6851,  Num segs =        0
<  262144:  Write size =     1562,  Num segs =        0
<  524288:  Write size =      755,  Num segs =        0
< 1048576:  Write size =      531,  Num segs =        0
< 2097152:  Write size =      272,  Num segs =        0
< 4194304:  Write size =      107,  Num segs =        0
< 8388608:  Write size =        0,  Num segs =        0

Write size shows the number of writes with a total size smaller than denoted
in the first column. Num segs shows the number of writes with fewer segments
than denoted in the first column. Most writes (~95%) only have one segment.
However, no NFS activity took place during this measurement, and NFS is
actually the case we made the patches for.

size\num          1        2        3        4        5        6       7+
<       1:        0       24        0        0        0        0        0
<       2:    20244        0        0        0        0   641526        0
<       4:     6722        0        0        0        0        0        0
<       8:    19653        0        0        0        0   213842        0
<      16:    31778        0        0        0        0   213856        0
<      32:    73395        0        0        0        0      590        0
<      64:   147730        0        0        0        0    93626        0
<     128:   100888        0        0        0        0   119597        0
<     256:    85588        0        0        0        0       12        0
<     512:   111900        0        0        0        0        3        0
<    1024:   140509        0        0        0        0        0        0
<    2048:   244052        0        0        0        0        0        0
<    4096:   217160        4        0        0        0        0        0
<    8192:  2784855       20        0        0        0        0        0
<   16384:   433506        0        0        0        0        0        0
<   32768:    11742        0        0        0        0        0        0
<   65536:    15783        0        0        0        0        0        0
<  131072:     6851        0        0        0        0        0        0
<  262144:     1562        0        0        0        0        0        0
<  524288:      755        0        0        0        0        0        0
< 1048576:      531        0        0        0        0        0        0
< 2097152:      272        0        0        0        0        0        0
<

Re: patch: improve generic_file_buffered_write() (2nd try 1/2)

2007-09-07 Thread Nick Piggin

On Thursday 06 September 2007 03:41, Bernd Schubert wrote:

> > This comment block should be:
> >
> > /**
> >  * generic_file_buffered_write - handle an iov
> >  * @iocb:   file operations
> >  * @iov:vector of data to write
> >  * @nr_segs:number of iov segments
> >  * @pos:position in the file
> >  * @ppos:   position in the file after this function
> >  * @count:  number of bytes to write
> >  * @written:offset in iov->base (data to skip on write)
> >  *
> >  * This function will do 3 main tasks for each iov:
> >  * - prepare a write
> >  * - copy the data from iov into a new page
> >  * - commit this page
>
> Thanks, done.
>
> I also removed the FIXMEs and created a second patch.
>
> Signed-off-by: Bernd Schubert <[EMAIL PROTECTED]>
> Signed-off-by: Goswin von Brederlow <[EMAIL PROTECTED]>

Minor nit: when resubmitting a patch, you should include everything
(ie. the full changelog of problem statement and fix description) in a
single mail. It's just a bit easier...

So I believe the problem is that for a multi-segment iovec, we currently
prepare_write/commit_write once for each segment, right? We do this
because there is a nasty deadlock in the VM (copy_from_user being
called with a page locked), and copying multiple segs dramatically
increases the chances that one of these copies will cause a page fault
and thus potentially deadlock.

The fix you have I don't think can work because a filesystem must be
notified of the modification _before_ it has happened. (If I understand
correctly, you are skipping the prepare_write potentially until after
some data is copied?).

Anyway, there are fixes for this deadlock in Andrew's -mm tree, but
also a workaround for the NFSD problem in git commit 29dbb3fc. Did
you try a later kernel to see if it is fixed there?

Thanks,
Nick

>
>  mm/filemap.c |  142 +
>  1 file changed, 96 insertions(+), 46 deletions(-)
>
> Index: linux-2.6.20.3/mm/filemap.c
> ===
> --- linux-2.6.20.3.orig/mm/filemap.c  2007-09-05 14:04:18.0 +0200
> +++ linux-2.6.20.3/mm/filemap.c   2007-09-05 18:50:26.0 +0200
> @@ -2057,6 +2057,21 @@
>  }
>  EXPORT_SYMBOL(generic_file_direct_write);
>
> +/**
> + * generic_file_buffered_write - handle iov'ectors
> + * @iob: file operations
> + * @iov: vector of data to write
> + * @nr_segs: number of iov segments
> + * @pos: position in the file
> + * @ppos:position in the file after this function
> + * @count:   number of bytes to write
> + * written:  offset in iov->base (data to skip on write)
> + *
> + * This function will do 3 main tasks for each iov:
> + * - prepare a write
> + * - copy the data from iov into a new page
> + * - commit this page
> + */
>  ssize_t
>  generic_file_buffered_write(struct kiocb *iocb, const struct iovec *iov,
>   unsigned long nr_segs, loff_t pos, loff_t *ppos,
> @@ -2074,6 +2089,11 @@
>   const struct iovec *cur_iov = iov; /* current iovec */
>   size_t  iov_base = 0;  /* offset in the current iovec */
>   char __user *buf;
> + unsigned long   data_start = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
> + loff_t  wpos = pos; /* the position in the file we will return */
> +
> + /* position in file as index of pages */
> + unsigned long   index = pos >> PAGE_CACHE_SHIFT;
>
>   pagevec_init(&lru_pvec, 0);
>
> @@ -2087,9 +2107,15 @@
>   buf = cur_iov->iov_base + iov_base;
>   }
>
> + page = __grab_cache_page(mapping, index, &cached_page, &lru_pvec);
> + if (!page) {
> + status = -ENOMEM;
> + goto out;
> + }
> +
>   do {
> - unsigned long index;
>   unsigned long offset;
> + unsigned long data_end; /* end of data within the page */
>   size_t copied;
>
>   offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
> @@ -2106,6 +2132,8 @@
>*/
>   bytes = min(bytes, cur_iov->iov_len - iov_base);
>
> + data_end = offset + bytes;
> +
>   /*
>* Bring in the user page that we will copy from _first_.
>* Otherwise there's a nasty deadlock on copying from the
> @@ -2114,34 +2142,30 @@
>*/
>   fault_in_pages_readable(buf, bytes);
>
> - page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
> - if (!page) {
> - status = -ENOMEM;
> - break;
> - }
> -
>   if (unlikely(bytes == 0)) {
>   status = 0;
>   copied = 0;
>   goto zero_length_segment;
>   }
>
> - status = a_ops->prepare_write(file, page, offset, offset+bytes);
> - if (unlikely(status)) {
> - loff_t isize = i_size_read(inode);
> -
> -

Re: [RFC] Union Mount: Readdir approaches

2007-09-07 Thread Erez Zadok

In message <[EMAIL PROTECTED]>, "Josef 'Jeff' Sipek" writes:
> On Fri, Sep 07, 2007 at 01:28:55PM +0530, Bharata B Rao wrote:
> > On Fri, Sep 07, 2007 at 04:31:26PM +0900, [EMAIL PROTECTED] wrote:
> > > 
> > > When the first readdir is issued:
> > > - call vfs_readdir for every underlying opened dir (file) object.
> > > - store every entry to either the hash table for the result or the
> > >   whiteout, when the same-named entry didn't exist in the tables.
> > > - to improve performance, the allocated memory for the hash
> > >   tables is managed in a pointer array, and the elements are
> > >   concatenated logically by the pointer.
> > > - the pointer for the result-table, the version, and the current jiffies
> > >   are set to vdir, which is a cache in an inode.
> > > - all cache are copied to a member in a file object.
> > > - the index of the cache memory block and the offset in an array is
> > >   handled as the seek position.
> > 
> > Ok, interesting approach. So you define the seek behaviour on your
> > directory cache rather than allowing the underlying filesystems to
> > interpret the seek. I guess we can do something similar with Union
> > Mounts also.
> 
> Unless I misunderstood something, Unionfs uses the same approach. Even
> Unionfs's ODF branch does the same thing. The major difference is that we
> keep the cache in a file on a disk.

Yup.

Bharata, in the long run, storing a cache of the readdir state on disk, is
the best approach by far.  Since you already spend the CPU and memory
resources to create a merged view, storing it on disk as a contiguous file
isn't that much more effort.  That effort pays off later on esp. if the
directories don't change often:

- you get a compatible behavior with seekdir/telldir (no matter how
  braindead that interface is :-)

- for subsequent directory reading, your performance actually improves
  because you don't have to repeat the duplicate elimination and whiteout
  processing -- just read the cached file from disk as any other file.  You
  then benefit from traditional readahead, and from not having to cache the
  entire contents of the readdir state file, so it falls under normal
  paging/flushing policies.

Any policy which merges the readdir info and keeps it in memory indefinitely
is problematic -- you increase average memory pressure on the system over a
longer period of time; and when you purge your readdir state from memory,
you have to recreate it from scratch, re-consuming the same CPU/memory
resources.

Our ODF code implements the readdir state caching policy, as described in
the ODF design document here:

Finally, I don't think it'll be so easy to get rid of seekdir/telldir, b/c
some of it is the default behavior of non-linux NFS/smb clients (we've seen
it with Solaris NFS clients).

Erez.

Re: [PATCH 0/3] build system: section garbage collection for vmlinux

2007-09-07 Thread Daniel Walker

On Fri, 2007-09-07 at 18:30 +0100, Denys Vlasenko wrote:
> On Friday 07 September 2007 17:31, Daniel Walker wrote:
> > On Thu, 2007-09-06 at 18:07 +0100, Denys Vlasenko wrote:
> > > A bit extended version:
> > > 
> > > In the process of making it work I saw ~10% vmlinux size reductions
> > > (which basically matches what Marcelo says) when I wasn't retaining
> > > sections needed for EXPORT_SYMBOLs, but module loading didn't work.
> > > 
> > > Thus I fixed that by adding KEEP() directives so that EXPORT_SYMBOLs
> > > are never discarded. This was just one of many fixes until kernel
> > > started to actually boot and work.
> > > 
> > > I did that before I posted patches to lkml.
> > > IOW: posted patches are not broken versus module loading.
> > 
> > Ok, this is more like the explanation I was looking for..
> > 
> > During this thread you seemed to indicate the patches you release
> > reduced the kernel ~10%, but now you're saying that was pre-release,
> > right?
> 
> CONFIG_MODULES=n will save ~10%
> CONFIG_MODULES=y - ~1%
> 
> Exact figure depends on .config (whether you happen to include
> especially "fat" code or not).
> 
> I want to explain a bit where I am coming from. I am working on busybox,
> and last release made busybox smaller by "whopping" 2%. This is the result
> of a hundred or so of small code and data shrinks.
> 
> It basically means that I am close to the point of diminishing returns
> trying to make busybox smaller, and memory wastage on the running
> embedded system is now elsewhere - including kernel.

I think this type of pruning is a good thing, you could even say the
biggest bit of low hanging fruit in terms of size reduction. 

I think your patches are good, but need some work. There are still some
changes that could reduce the kernel further (i.e. when modules are
used) .. So I'm not trying to discourage you, but you set off some
alarms with me early in the thread.. Which caused this to drag out..

Daniel


Re: [RFC] Union Mount: Readdir approaches

2007-09-07 Thread Josef 'Jeff' Sipek

On Fri, Sep 07, 2007 at 01:28:55PM +0530, Bharata B Rao wrote:
> On Fri, Sep 07, 2007 at 04:31:26PM +0900, [EMAIL PROTECTED] wrote:
> > 
> > When the first readdir is issued:
> > - call vfs_readdir for every underlying opened dir (file) object.
> > - store every entry to either the hash table for the result or the
> >   whiteout, when the same-named entry didn't exist in the tables.
> > - to improve performance, the allocated memory for the hash
> >   tables is managed in a pointer array, and the elements are
> >   concatenated logically by the pointer.
> > - the pointer for the result-table, the version, and the current jiffies
> >   are set to vdir, which is a cache in an inode.
> > - all cache are copied to a member in a file object.
> > - the index of the cache memory block and the offset in an array is
> >   handled as the seek position.
> 
> Ok, interesting approach. So you define the seek behaviour on your
> directory cache rather than allowing the underlying filesystems to
> interpret the seek. I guess we can do something similar with Union
> Mounts also.

Unless I misunderstood something, Unionfs uses the same approach. Even
Unionfs's ODF branch does the same thing. The major difference is that we
keep the cache in a file on a disk.
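
The seek-position scheme quoted above (index of the cache block plus offset
within that block) could be sketched roughly as follows. This is an
illustration only: the 32/32-bit split and all function names are
assumptions, not taken from any of the implementations mentioned in this
thread.

```c
#include <stdint.h>

/*
 * Assumed layout: upper 32 bits hold the cache block index,
 * lower 32 bits hold the offset inside that block.
 */
static uint64_t vdir_pos_pack(uint32_t block, uint32_t offset)
{
	return ((uint64_t)block << 32) | offset;
}

/* Recover the cache block index from a packed seek position. */
static uint32_t vdir_pos_block(uint64_t pos)
{
	return (uint32_t)(pos >> 32);
}

/* Recover the offset within the block from a packed seek position. */
static uint32_t vdir_pos_offset(uint64_t pos)
{
	return (uint32_t)(pos & 0xffffffffu);
}
```

Because the position only refers to the merged cache, a seek does not need
to be interpreted by any of the underlying filesystems.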

Josef 'Jeff' Sipek. 

-- 
Evolution, n.:
  A hypothetical process whereby infinitely improbable events occur with
  alarming frequency, order arises from chaos, and no one is given credit.

Re: [PATCH 0/3] build system: section garbage collection for vmlinux

2007-09-07 Thread Daniel Walker

On Fri, 2007-09-07 at 19:24 +0200, Sam Ravnborg wrote:
> Hi Daniel.
> 
> > > I did that before I posted patches to lkml.
> > > IOW: posted patches are not broken versus module loading.
> > 
> > Ok, this is more like the explanation I was looking for..
> > 
> > During this thread you seemed to indicate the patches you release
> > reduced the kernel ~10%, but now you're saying that was pre-release,
> > right? 
> 
> What are you after here?
> 
> If you read the initial post you see the actual savings and you also
> can read that it works with modules. The percentage can be calculated
> from these numbers if you are interested.

Right, but he contradicted that during the course of this thread.. Which
is why I'm asking about it..

Daniel


Re: [PATCH 0/3] build system: section garbage collection for vmlinux

2007-09-07 Thread Denys Vlasenko

On Friday 07 September 2007 17:31, Daniel Walker wrote:
> On Thu, 2007-09-06 at 18:07 +0100, Denys Vlasenko wrote:
> > A bit extended version:
> > 
> > In the process of making it work I saw ~10% vmlinux size reductions
> > (which basically matches what Marcelo says) when I wasn't retaining
> > sections needed for EXPORT_SYMBOLs, but module loading didn't work.
> > 
> > Thus I fixed that by adding KEEP() directives so that EXPORT_SYMBOLs
> > are never discarded. This was just one of many fixes until kernel
> > started to actually boot and work.
> > 
> > I did that before I posted patches to lkml.
> > IOW: posted patches are not broken versus module loading.
> 
> Ok, this is more like the explanation I was looking for..
> 
> During this thread you seemed to indicate the patches you release
> reduced the kernel ~10%, but now you're saying that was pre-release,
> right?

CONFIG_MODULES=n will save ~10%
CONFIG_MODULES=y - ~1%

Exact figure depends on .config (whether you happen to include
especially "fat" code or not).

I want to explain a bit where I am coming from. I am working on busybox,
and last release made busybox smaller by "whopping" 2%. This is the result
of a hundred or so of small code and data shrinks.

It basically means that I am close to the point of diminishing returns
trying to make busybox smaller, and memory wastage on the running
embedded system is now elsewhere - including kernel.
--
vda

Re: [PATCH 0/3] build system: section garbage collection for vmlinux

2007-09-07 Thread Sam Ravnborg

Hi Daniel.

> > I did that before I posted patches to lkml.
> > IOW: posted patches are not broken versus module loading.
> 
> Ok, this is more like the explanation I was looking for..
> 
> During this thread you seemed to indicate the patches you release
> reduced the kernel ~10%, but now you're saying that was pre-release,
> right? 

What are you after here?

If you read the initial post you see the actual savings and you also
can read that it works with modules. The percentage can be calculated
from these numbers if you are interested.

As explained later this patch does NOT remove exported symbols
since they may be used by modules that are not included in the actual
build.
And the patch works for x86_64.

So I'm a bit puzzled what you are trying to bring forward here.
And please read carefully the initial posting again...

Sam

Re: [patch 1/6] Linux Kernel Markers - Architecture Independent Code

2007-09-07 Thread Valdis . Kletnieks

On Fri, 07 Sep 2007 12:04:45 EDT, Theodore Tso said:
> This was proposed by Andrew and discussed at the Kernel Summit; the
> basic idea is that it is a formal indication that the person has done
> a *full* review of the patch (a few random comments from the local
> whitespace police don't count),

Anybody got a proposed scheme for the case where somebody like myself
who is *not* a member of the Maintainer Cabal has looked at a patch, and
found a valid show-stopper that's bigger than just whitespace (breaks on
64-bit, locking issues, etc), or other commentary that *should* be addressed
before it gets merged?  I'd like *some* way to tag a patch with "I had an
issue with V1, but the author addressed it to my satisfaction in V2"

(Note that includes "the author convinced me the patch was right and I was
wrong"...)



[GIT PULL] FireWire fix

2007-09-07 Thread Stefan Richter

Linus, please pull from the for-linus branch at

git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6.git for-linus

to receive a fix of the laptop-refuses-to-suspend kind. Or simply apply
the patch from this mail.


There is still an old underlying oddness though which I ask the PPC
folks to investigate and possibly fix post 2.6.23:  On iBook G3 and
older PowerBooks, the onboard FireWire controller's pci_dev
current_state remains PCI_UNKNOWN long after initialization. Sounds like
a bug in platform code to me.


Stat, log, diff:

 drivers/firewire/fw-ohci.c |6 ++
 1 files changed, 2 insertions(+), 4 deletions(-)

commit 5511142870046a7bed947d51ec9b320856ee120a
Author: Stefan Richter <[EMAIL PROTECTED]>
Date:   Thu Sep 6 09:50:30 2007 +0200

firewire: fw-ohci: ignore failure of pci_set_power_state (fix suspend regression)

Fixes (papers over) "Sleep problems with kernels >= 2.6.21 on powerpc",
http://lkml.org/lkml/2007/8/25/155.  The issue is that the FireWire
controller's pci_dev.current_state of iBook G3 and presumably older
PowerBooks is still in PCI_UNKNOWN instead of PCI_D0 when the firewire
driver's .suspend method is called.

Like it was suggested earlier in http://lkml.org/lkml/2006/10/24/13, we
do not fail .suspend anymore if pci_set_power_state failed.

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>

diff --git a/drivers/firewire/fw-ohci.c b/drivers/firewire/fw-ohci.c
index 7e427b4..e14c1ca 100644
--- a/drivers/firewire/fw-ohci.c
+++ b/drivers/firewire/fw-ohci.c
@@ -1945,10 +1945,8 @@ static int pci_suspend(struct pci_dev *pdev, pm_message_t state)
 		return err;
 	}
 	err = pci_set_power_state(pdev, pci_choose_state(pdev, state));
-	if (err) {
-		fw_error("pci_set_power_state failed\n");
-		return err;
-	}
+	if (err)
+		fw_error("pci_set_power_state failed with %d\n", err);
 
 	return 0;
 }

-- 
Stefan Richter
-=-=-=== =--= --===
http://arcgraph.de/sr/


Re: origin of __tmp1930643048 network device name: kernel-space or user-space

2007-09-07 Thread davide rossetti

> > I'm assuming you're running some sort of Fedora/RHEL/
> > derivative; this is what you get when you have a device that starts
> > out named ethX, but which needed to be renamed so that an already
> > configured ethX could be changed to that name.
>
> yes, it's FC6.
>
> > For the new device, either add a HWADDR in a ifcfg-ethX file for
> > that interface, add something to /etc/mactab, or add a udev rule.
>
> seems like HWADDR is incompatible with bonding there is some
> message using HWADDR as well as MASTER=bond0 and SLAVE=yes.

For the record, I worked around it:
- the problem seems to be that the interface 'bond0', which is started first,
prevents another device named 'eth0' from being created (WHY?)
- so I shifted all of the interface names: 0->1, 1->2,...
Now it is OK. Still, I do not understand why bond0 seems to interfere
with eth0...

davide
-- 
[EMAIL PROTECTED] ICQ:290677265 SKYPE:d.rossetti

Re: [PATCH 0/3] build system: section garbage collection for vmlinux

2007-09-07 Thread Daniel Walker

On Thu, 2007-09-06 at 18:07 +0100, Denys Vlasenko wrote:

> A bit extended version:
> 
> In the process of making it work I saw ~10% vmlinux size reductions
> (which basically matches what Marcelo says) when I wasn't retaining
> sections needed for EXPORT_SYMBOLs, but module loading didn't work.
> 
> Thus I fixed that by adding KEEP() directives so that EXPORT_SYMBOLs
> are never discarded. This was just one of many fixes until kernel
> started to actually boot and work.
> 
> I did that before I posted patches to lkml.
> IOW: posted patches are not broken versus module loading.

Ok, this is more like the explanation I was looking for..

During this thread you seemed to indicate the patches you release
reduced the kernel ~10%, but now you're saying that was pre-release,
right? 

Daniel


Re: Platform device id

2007-09-07 Thread David Brownell

> If a device has a <name>.<id> scheme this implies possibility of
> having several instances of said device in a box. There are a few of
> platform devices that can only have one instance

Like USB peripheral controllers.  Only one external "B" type
connector is allowed.


>   - for example i8042
> keyboard controller (the -1 special handling came from me because
> i80420 name was very confusing - there wasn't a dot separator in the
> name back then).

There were other devices with similar issues.


>   Drivers that allow multiple devices should not
> attempt to use -1 for the very first instance - this should eliminate
> potential for error and special handling that you are talking about.

For that matter, a *driver* should never create its own device node(s)
in the first place.  Device creation belongs elsewhere, like as part of
platform setup or, for busses with integral enumeration support like
PCI or USB, bus glue.  Linux is moving away from that legacy model.

I realize that may be more easily said than done in some cases,
like i8042 on non-PNP systems.

- Dave


Re: Platform device id

2007-09-07 Thread Jean Delvare

Hi Dmitry,

Thanks for your answer.

On Fri, 7 Sep 2007 10:58:31 -0400, Dmitry Torokhov wrote:
> On 9/7/07, Jean Delvare <[EMAIL PROTECTED]> wrote:
> > While platform_device.id is a u32, platform_device_add() handles "-1" as
> > a special id value. This has potential for confusion and bugs. One such
> > bug was reported to me by David Brownell:
> >
> > http://lists.lm-sensors.org/pipermail/i2c/2007-September/001787.html
> >
> > And since then I've found two other drivers affected (uartlite and
> > i2c-pxa).
> >
> > Could we at least make platform_device.id an int so as to clear up the
> > confusion? I doubt that the id will ever be a large number anyway.
> >
> > To go one step further, I am questioning the real value of this naming
> > exception for these "unique" platform devices. On top of the bugs I
> > mentioned above, it has potential for compatibility breakage: adding a
> > second device of the same type will rename the first one from "foo" to
> > "foo.0". It also requires specific checks in many individual platform
> > drivers. All this, as I understand it, for a purely aesthetic reason. I
> > don't think this is worth it. Would there be any objection to simply
> > getting rid of this exception and having all platform devices named
> > "foo.%d"?
> 
> If a device has a <name>.<id> scheme this implies possibility of
> having several instances of said device in a box.

This "allows" more than "implies".

>   There are a few of
> platform devices that can only have one instance - for example i8042
> keyboard controller (the -1 special handling came from me because
> i80420 name was very confusing - there wasn't a dot separator in the
> name back then).

I agree that in general there is a single i8042 keyboard controller on
a system, but what prevents someone from building a system with several
of these (e.g. in a multi-console computer)?

I agree that "i80420" was a confusing name, but now that a dot was
inserted between the name and the id, it wouldn't be a problem to have
a device named "i8042.0", would it?

>  Drivers that allow multiple devices should not
> attempt to use -1 for the very first instance - this should eliminate
> potential for error and special handling that you are talking about.

This isn't that easy. For a given kind of device, some systems might
have only one, it might even be strictly impossible to ever have more
than one by design, but other systems might not have this limitation
and may actually have several instances of said device. As we try to
make our drivers as platform-independent as possible, the drivers
themselves can't assume that only one of the two schemes is used; they
have to support both.
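
To make the naming rule under discussion concrete, here is a minimal
userspace sketch. It is not the kernel's platform_device_add(); the
helpers sketch_bus_id(), sketch_stored_u32() and sketch_stored_int() are
hypothetical names made up for illustration. It shows the "foo" versus
"foo.%d" naming, and what value a driver sees when -1 is stored in an
unsigned id field.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): build a device name following
 * the rule described in this thread.  An id of -1 means "the only
 * instance" and yields the bare name; any other id yields "name.id".
 * Returns the length of the formatted name, as snprintf() does.
 */
static int sketch_bus_id(char *buf, size_t len, const char *name, int id)
{
	if (id == -1)
		return snprintf(buf, len, "%s", name);
	return snprintf(buf, len, "%s.%d", name, id);
}

/* What is read back when the id is stored in a u32: -1 wraps around. */
static long long sketch_stored_u32(int id)
{
	uint32_t stored = (uint32_t)id;

	return stored;
}

/* What is read back when the id is stored in a plain int. */
static long long sketch_stored_int(int id)
{
	return id;
}
```

With the unsigned field, a driver-side test such as "id >= 0" can never
reject the special value, which is presumably the class of bug the
thread refers to; with an int field the same test behaves as expected.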

-- 
Jean Delvare

[PATCH 1] mspec: handle shrinking virtual memory areas

2007-09-07 Thread Cliff Wickman


The shrinking of a virtual memory area that is mmap(2)'d to a memory
special file (device drivers/char/mspec.c) can cause a panic.

If the mapped size of the vma (vm_area_struct) is very large, mspec allocates
a large vma_data structure with vmalloc(). But such a vma can be shrunk by
an munmap(2).  The current driver uses the current size of each vma to
deduce whether its vma_data structure was allocated by kmalloc() or vmalloc().
So if the vma was shrunk it appears to have been allocated by kmalloc(),
and mspec attempts to free it with kfree().  This results in a panic.

This patch avoids the panic (by preserving the type of the allocation) and
also makes mspec work correctly as the vma is split into pieces by the
munmap(2)'s.
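
The idea of preserving the allocation type can be sketched in userspace
as follows. This is only an illustration under stated assumptions:
malloc()/free() stand in for both kernel allocators, the two counters
stand in for kfree() and vfree(), and every sketch_* name and the 4096
threshold are invented for the example.

```c
#include <stdlib.h>

#define SKETCH_BIG_ALLOC  0x1   /* plays the role of VMD_VMALLOCED */
#define SKETCH_THRESHOLD  4096  /* arbitrary cut-off for this sketch */

struct sketch_data {
	int flags;      /* how the object was allocated */
	size_t size;    /* current size, may shrink later */
};

/* Counters standing in for kfree() and vfree() so the choice is visible. */
static int sketch_small_frees;
static int sketch_big_frees;

static struct sketch_data *sketch_alloc(size_t size)
{
	struct sketch_data *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->size = size;
	/* Record which allocator was used at allocation time ... */
	d->flags = size > SKETCH_THRESHOLD ? SKETCH_BIG_ALLOC : 0;
	return d;
}

static void sketch_shrink(struct sketch_data *d, size_t new_size)
{
	d->size = new_size;     /* flags are deliberately left untouched */
}

static void sketch_free(struct sketch_data *d)
{
	/* ... and consult that record, not the current size, when freeing. */
	if (d->flags & SKETCH_BIG_ALLOC)
		sketch_big_frees++;
	else
		sketch_small_frees++;
	free(d);
}
```

An object allocated large and later shrunk is still released through
the "big" path, which is the property the old size-based deduction lost.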

All vma's derived from such a split vma share the same vma_data structure that
represents all the pages mapped into this set of vma's.  The mspec driver
must be made capable of using the right portion of the structure for each
member vma.  In other words, it must index into the array of page addresses
using the portion of the array that represents the current vma. This is
enabled by storing the vma group's vm_start in the vma_data structure.
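
As a rough illustration of that indexing, the following userspace
sketch computes which slice of the shared page array belongs to one
member vma. It assumes 4 KiB pages, and struct sketch_group and
sketch_index_range() are hypothetical names, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12    /* assumes 4 KiB pages for illustration */

/* Hypothetical stand-in for the shared vma_data bookkeeping. */
struct sketch_group {
	unsigned long vm_start; /* base of the original, unsplit mapping */
	unsigned long vm_end;   /* end of the original, unsplit mapping */
};

/*
 * Compute the half-open index range [*first, *last) of the shared page
 * array that belongs to the member vma [vm_start, vm_end).
 */
static void sketch_index_range(const struct sketch_group *g,
			       unsigned long vm_start, unsigned long vm_end,
			       size_t *first, size_t *last)
{
	/* A member vma must lie inside the original mapping. */
	assert(vm_start >= g->vm_start && vm_end <= g->vm_end);

	*first = (vm_start - g->vm_start) >> SKETCH_PAGE_SHIFT;
	*last = (vm_end - g->vm_start) >> SKETCH_PAGE_SHIFT;
}
```

For a 16-page group starting at 0x10000, a member vma covering
0x14000-0x18000 maps to array entries 4 through 7.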

The vma's are protected by mm->mmap_sem, so the reference count was changed
from an atomic_t to an int.

Diffed against 2.6.13-rc5

Signed-off-by: Cliff Wickman <[EMAIL PROTECTED]>
Acked-by: Jes Sorensen <[EMAIL PROTECTED]>
-
---
 drivers/char/mspec.c |   68 +--
 1 file changed, 45 insertions(+), 23 deletions(-)

Index: mspec_community/drivers/char/mspec.c
===
--- mspec_community.orig/drivers/char/mspec.c
+++ mspec_community/drivers/char/mspec.c
@@ -67,7 +67,7 @@
 /*
  * Page types allocated by the device.
  */
-enum {
+enum mspec_page_type {
MSPEC_FETCHOP = 1,
MSPEC_CACHED,
MSPEC_UNCACHED
@@ -83,15 +83,26 @@ static int is_sn2;
  * One of these structures is allocated when an mspec region is mmaped. The
  * structure is pointed to by the vma->vm_private_data field in the vma struct.
  * This structure is used to record the addresses of the mspec pages.
+ * This structure is shared by all vma's that are split off from the
+ * original vma when split_vma()'s are done.
+ *
+ * The refcnt is incremented non-atomically because all paths leading
+ * to mspec_open() and mspec_close() are single threaded by the exclusive
+ * locking of mm->mmap_sem.
  */
 struct vma_data {
-   atomic_t refcnt;/* Number of vmas sharing the data. */
+   int refcnt; /* Number of vmas sharing the data. */
spinlock_t lock;/* Serialize access to the vma. */
int count;  /* Number of pages allocated. */
-   int type;   /* Type of pages allocated. */
+   enum mspec_page_type type; /* Type of pages allocated. */
+   int flags;  /* See VMD_xxx below. */
+   unsigned long vm_start; /* Original (unsplit) base. */
+   unsigned long vm_end;   /* Original (unsplit) end. */
unsigned long maddr[0]; /* Array of MSPEC addresses. */
 };
 
+#define VMD_VMALLOCED 0x1  /* vmalloc'd rather than kmalloc'd */
+
 /* used on shub2 to clear FOP cache in the HUB */
 static unsigned long scratch_page[MAX_NUMNODES];
 #define SH2_AMO_CACHE_ENTRIES  4
@@ -129,8 +140,8 @@ mspec_zero_block(unsigned long addr, int
  * mspec_open
  *
  * Called when a device mapping is created by a means other than mmap
- * (via fork, etc.).  Increments the reference count on the underlying
- * mspec data so it is not freed prematurely.
+ * (via fork, munmap, etc.).  Increments the reference count on the
+ * underlying mspec data so it is not freed prematurely.
  */
 static void
 mspec_open(struct vm_area_struct *vma)
@@ -138,7 +149,7 @@ mspec_open(struct vm_area_struct *vma)
struct vma_data *vdata;
 
vdata = vma->vm_private_data;
-   atomic_inc(&vdata->refcnt);
+   vdata->refcnt++;
 }
 
 /*
@@ -151,34 +162,38 @@ static void
 mspec_close(struct vm_area_struct *vma)
 {
struct vma_data *vdata;
-   int i, pages, result, vdata_size;
+   int index, last_index, result;
 
vdata = vma->vm_private_data;
-   if (!atomic_dec_and_test(&vdata->refcnt))
-   return;
 
-   pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
-   vdata_size = sizeof(struct vma_data) + pages * sizeof(long);
-   for (i = 0; i < pages; i++) {
-   if (vdata->maddr[i] == 0)
+   BUG_ON(vma->vm_start < vdata->vm_start || vma->vm_end > vdata->vm_end);
+
+   index = (vma->vm_start - vdata->vm_start) >> PAGE_SHIFT;
+   last_index = (vma->vm_end - vdata->vm_start) >> PAGE_SHIFT;
+   for (; index < last_index; index++) {
+   if (vdata->maddr[index] == 0)
continue;
/*
 * Clear the page before sticking it back
 * into the pool.
 */

Re: sata & scsi suggestion for make menuconfig

2007-09-07 Thread Stefan Richter

Folkert van Heusden wrote:
>> I know that it's difficult to get people to read docs & help text,
>> and maybe it is needed in more places, but CONFIG_ATA (SATA/PATA)
>> help text says:
>>   NOTE: ATA enables basic SCSI support; *however*,
>>   'SCSI disk support', 'SCSI tape support', or
>>   'SCSI CDROM support' may also be needed,
>>   depending on your hardware configuration.
> 
> Yes but that would mean that you have to open the help for each item
> that you add.
> 
>> A popup makes some sense, but I don't know if menuconfig knows how to
>> do popup warnings... and it needs to be done for all *configs,
>> not just menuconfig.
> 
> Maybe add a new type?

How about

comment "Note: 'SCSI disk support' is required for SATA/PATA HDDs!"
depends on ATA && !BLK_DEV_SD

-- 
Stefan Richter
-=-=-=== =--= --===
http://arcgraph.de/sr/

Re: [patch 1/6] Linux Kernel Markers - Architecture Independent Code

2007-09-07 Thread Theodore Tso

On Thu, Sep 06, 2007 at 04:37:37PM -0700, Randy Dunlap wrote:
> Thanks.  I look forward to the explanation of Reviewed-by, what it
> means, and how it differs from Acked-by.

This was proposed by Andrew and discussed at the Kernel Summit; the
basic idea is that it is a formal indication that the person has done
a *full* review of the patch (a few random comments from the local
whitespace police don't count), and is willing to vouch that the patch
is correct, safe, extremely unlikely to cause regressions, etc.  If
the patch does need to be reverted or fixed because it was buggy, then
both the original submitter and the reviewer would bear responsibility
and subsystem maintainers might take that into account when assessing
the reputations of the submitter and reviewer in the future when
deciding whether or not to accept a patch.

Basically, some people seem to be using "Acked-by" to mean "seems
good to me", without necessarily doing a full review of the patch.
Instead of trying to change the meaning of "Acked-by", the idea is to
have a new sign-off which is a bit more explicit about what it means.
(Hmmm, thinking about it afterwards, maybe "Vouched-by:" would be even
better.)

There was some thought about negative attention (i.e., "public
mockery") given to people who sign off on a patch via Reviewed-by:
that subsequently turns out to be buggy or cause a regression, but the
concern with that is that we have enough trouble finding people to
review patches, and we wouldn't want to scare off reviewers.  But it
would be fair to say that the consequences of reviewing patches
successfully or unsuccessfully would naturally impact people's
reputations.

There was also some discussion about whether patches should be refused
outright when they lack a Reviewed-by, but that probably won't happen
initially.  The general consensus was to gently ease into it and see
how well it works first.

- Ted

PATCH to bug #8876

2007-09-07 Thread Nikolay Kopitonenko

Hi there!

Below is a fix for this:
http://bugzilla.kernel.org/show_bug.cgi?id=8876


Applies to any version from 2.6.22 up to the latest, 2.6.23-rc5-git1.

please apply :)


-CUT-
diff -urN a/net/ipv4/devinet.c b/net/ipv4/devinet.c
--- a/net/ipv4/devinet.c2007-07-09 02:32:17.0 +0300
+++ b/net/ipv4/devinet.c2007-08-10 20:33:22.0 +0300
@@ -1193,7 +1193,7 @@
for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
 ifa = ifa->ifa_next, ip_idx++) {
if (ip_idx < s_ip_idx)
-   goto cont;
+   continue;
if (inet_fill_ifaddr(skb, ifa, NETLINK_CB(cb->skb).pid,
 cb->nlh->nlmsg_seq,
 RTM_NEWADDR, NLM_F_MULTI) <= 0)
-/CUT-

Signed-off-by: [EMAIL PROTECTED]


Thanks

Nikolay Kopitonenko

Re: [PATCH] Smack: Simplified Mandatory Access Control Kernel

2007-09-07 Thread Casey Schaufler


--- Kyle Moffett <[EMAIL PROTECTED]> wrote:

> ...
> 
> As for the script, I'm partway through debugging it but my time is  
> all chewed up with other stuff now, so it may take me an extra couple  
> days.

Any progress on this?


Casey Schaufler
[EMAIL PROTECTED]

Re: BUG: scheduling while atomic: ifconfig/0x00000002/4170

2007-09-07 Thread Michael Buesch

On Friday 07 September 2007, Johannes Berg wrote:
> On Thu, 2007-09-06 at 08:46 -0700, Paul E. McKenney wrote:
> 
> > Looks good to me from an RCU viewpoint.  I cannot claim familiarity with
> > this code.  I therefore especially like the indications of where RTNL
> > is held and not!!!
> 
> :)
> 
> > Some questions below based on a quick scan.  And a global question:
> > should the comments about RTNL being held be replaced by ASSERT_RTNL()?
> 
> > I don't like ASSERT_RTNL() much because it actually tries to lock it.
> > I'd be much happier if it was WARN_ON(!mutex_locked(&rtnl_mutex)) or
> > something equivalent.

What's the problem with trying to lock it?
In the paths where you insert this assertion, the lock will already be held.
So the trylock will fail without blocking or any other side effect.
It's basically no more expensive than your mutex_locked test.
And the !mutex_locked test might not work on UP (not sure about
the current implementation).
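
The trylock argument can be illustrated with a toy lock built on a C11
atomic_flag. This is a userspace sketch only, not the kernel mutex or
RTNL code, and all sketch_* names are invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock for illustration only; this is not the kernel mutex API. */
struct sketch_lock {
	atomic_flag held;
};

/* Returns true if the lock was acquired, false if it was already held. */
static bool sketch_trylock(struct sketch_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void sketch_unlock(struct sketch_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Trylock-based "is it held?" check.  If the lock is held, the trylock
 * fails at once and nothing changes.  If it is not held, the check has
 * briefly taken the lock and must release it again.
 */
static bool sketch_check_held(struct sketch_lock *l)
{
	if (sketch_trylock(l)) {
		sketch_unlock(l);
		return false;
	}
	return true;
}
```

In the expected case (lock held) the check is a single failed
test-and-set; only in the error case does it take and drop the lock,
which is the side effect the quoted objection is about.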

Re: sata & scsi suggestion for make menuconfig

2007-09-07 Thread Folkert van Heusden

> > Maybe it is a nice enhancement for make menuconfig to more explicitly
> > give a pop-up or so when someone selects for example a sata controller
> > while no 'scsi-disk' support was selected?
> 
> I know that it's difficult to get people to read docs & help text,
> and maybe it is needed in more places, but CONFIG_ATA (SATA/PATA)
> help text says:
>   NOTE: ATA enables basic SCSI support; *however*,
>   'SCSI disk support', 'SCSI tape support', or
>   'SCSI CDROM support' may also be needed,
>   depending on your hardware configuration.

Yes but that would mean that you have to open the help for each item
that you add.

> A popup makes some sense, but I don't know if menuconfig knows how to
> do popup warnings... and it needs to be done for all *configs,
> not just menuconfig.

Maybe add a new type?


Folkert van Heusden

-- 
MultiTail na wan makriki wrokosani fu tan luku den logfile nanga san
den commando spiti puru. Piki puru spesrutu sani, wroko nanga difreti
kroru, tja kon makandra, nanga wan lo moro.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

RE: [PATCH 5/5] Add DMA engine driver for Freescale MPC85xx processors.

2007-09-07 Thread Nelson, Shannon

>From: Zhang Wei [mailto:[EMAIL PROTECTED] 
>Sent: Friday, September 07, 2007 3:54 AM
>To: [EMAIL PROTECTED]; Nelson, Shannon
>Cc: linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; 
>[EMAIL PROTECTED]; Zhang Wei; Ebony Zhu
>Subject: [PATCH 5/5] Add DMA engine driver for Freescale 
>MPC85xx processors.
>
>The driver implements DMA engine API for Freescale MPC85xx DMA
>controller, which could be used for MEM<-->MEM, IO_ADDR<-->MEM
>and IO_ADDR<-->IO_ADDR data transfer.
>The driver supports the Basic mode of Freescale MPC85xx DMA controller.
>The MPC85xx processors supported include MPC8540/60, MPC8555, MPC8548,
>MPC8641 and so on.
>The support for MPC83xx(MPC8349, MPC8360) is experimental.
>
>Signed-off-by: Zhang Wei <[EMAIL PROTECTED]>
>Signed-off-by: Ebony Zhu <[EMAIL PROTECTED]>
>---
> drivers/dma/Kconfig  |8 +
> drivers/dma/Makefile |1 +
> drivers/dma/fsldma.c |  995 
>++
> drivers/dma/fsldma.h |  188 ++
> 4 files changed, 1192 insertions(+), 0 deletions(-)
> create mode 100644 drivers/dma/fsldma.c
> create mode 100644 drivers/dma/fsldma.h
>
>diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
>index 8f670da..a99e925 100644
>--- a/drivers/dma/Kconfig
>+++ b/drivers/dma/Kconfig
>@@ -40,4 +40,12 @@ config INTEL_IOP_ADMA
> ---help---
>   Enable support for the Intel(R) IOP Series RAID engines.
> 
>+config FSL_DMA
>+  bool "Freescale MPC85xx/MPC83xx DMA support"
>+  depends on DMA_ENGINE && PPC
>+  ---help---
>+Enable support for the Freescale DMA engine. Now, it support
>+MPC8560/40, MPC8555, MPC8548 and MPC8641 processors.
>+The MPC8349, MPC8360 support is experimental.
>+
> endmenu

If this is experimental, perhaps you should mark the depends line as
such:
depends on DMA_ENGINE && PPC && EXPERIMENTAL

[...]

>+static int fsl_dma_self_test(struct fsl_dma_chan *fsl_chan)
>+{
>+  struct dma_chan *chan;
>+  int err = 0;
>+  dma_addr_t addr;
>+  dma_cookie_t cookie;
>+  u8 *src, *dest;
>+  int i;
>+  size_t test_size;
>+  struct dma_async_tx_descriptor *tx1, *tx2, *tx3;
>+  struct fsl_dma_device *fdev;
>+
>+  BUG_ON(!fsl_chan);
>+
>+  fdev = fsl_chan->device;
>+  test_size = 4096;
>+
>+  src = kmalloc(test_size * 2, GFP_KERNEL);
>+  if (!src) {
>+  dev_err(fdev->dev,
>+  "selftest: Can not alloc memory for test!\n");
>+  err = -ENOMEM;
>+  goto out;
>+  }
>+
>+  dest = src + test_size;
>+
>+  for (i = 0; i < test_size; i++)
>+  src[i] = (u8) i;
>+
>+  chan = &fsl_chan->common;
>+
>+  if (fsl_dma_alloc_chan_resources(chan) < 1) {
>+  dev_err(fdev->dev,
>+  "selftest: Can not alloc resources for DMA\n");
>+  err = -ENODEV;
>+  goto out;
>+  }
>+
>+  /* TX 1 */
>+  tx1 = fsl_dma_prep_memcpy(chan, test_size / 2, 0);
>+  async_tx_ack(tx1);
>+  addr = dma_map_single(chan->device->dev, src, test_size / 2,
>+  DMA_TO_DEVICE);
>+  fsl_dma_set_src(addr, tx1, 0);
>+  addr = dma_map_single(chan->device->dev, dest, test_size / 2,
>+  DMA_FROM_DEVICE);
>+  fsl_dma_set_dest(addr, tx1, 0);
>+
>+  cookie = fsl_dma_tx_submit(tx1);
>+  fsl_dma_memcpy_issue_pending(chan);
>+
>+  while (fsl_dma_is_complete(chan, cookie, NULL, NULL)
>+  != DMA_SUCCESS);

Is this guaranteed to finish?  If there's something wrong and the DMA
never completes, you've now hung this thread.  This is why the ioat_dma
engine does an msleep() and then checks once for completion.  You might
think about this...
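
One way to bound such a wait is sketched below in plain userspace C.
This is only an illustration, not the ioat_dma code or a proposed
change to this driver: sketch_wait_done(), the callback type and the two
stand-in "hardware" predicates are hypothetical, and the loop polls a
fixed number of times where a real driver would sleep between checks.

```c
#include <errno.h>
#include <stdbool.h>

typedef bool (*sketch_done_fn)(void *ctx);

/*
 * Poll is_done() at most max_polls times.  Returns 0 once the operation
 * reports completion, or -ETIMEDOUT if it never does.
 */
static int sketch_wait_done(sketch_done_fn is_done, void *ctx,
			    unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (is_done(ctx))
			return 0;
		/* A real driver would sleep here (e.g. msleep()) between polls. */
	}
	return -ETIMEDOUT;
}

/* Stand-in for hardware that completes after a few polls. */
static bool sketch_done_after(void *ctx)
{
	int *polls_left = ctx;

	if (*polls_left > 0)
		(*polls_left)--;
	return *polls_left == 0;
}

/* Stand-in for hardware that never completes. */
static bool sketch_never_done(void *ctx)
{
	(void)ctx;
	return false;
}
```

With a bound in place, a stuck engine turns into an error return that
the self-test can report instead of hanging the thread.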

>+
>+  /* Test free and re-alloc channel resources */
>+  fsl_dma_free_chan_resources(chan);
>+
>+  if (fsl_dma_alloc_chan_resources(chan) < 1) {
>+  dev_err(fdev->dev,
>+  "selftest: Can not alloc resources for DMA\n");
>+  err = -ENODEV;
>+  goto out;
>+  }
>+
>+  /* Continue to test
>+   * TX 2
>+   */
>+  tx2 = fsl_dma_prep_memcpy(chan, test_size / 4, 0);
>+  async_tx_ack(tx2);
>+  addr = dma_map_single(chan->device->dev, src + test_size / 2,
>+  test_size / 4, DMA_TO_DEVICE);
>+  fsl_dma_set_src(addr, tx2, 0);
>+  addr = dma_map_single(chan->device->dev, dest + test_size / 2,
>+  test_size / 4, DMA_FROM_DEVICE);
>+  fsl_dma_set_dest(addr, tx2, 0);
>+
>+  /* TX 3 */
>+  tx3 = fsl_dma_prep_memcpy(chan, test_size / 4, 0);
>+  async_tx_ack(tx3);
>+  addr = dma_map_single(chan->device->dev, src + test_size * 3 / 4,
>+  test_size / 4, DMA_TO_DEVICE);
>+  fsl_dma_set_src(addr, tx3, 0);
>+  addr =

Re: [PATCH 5/5] Add DMA engine driver for Freescale MPC85xx processors.

2007-09-07 Thread Randy Dunlap

On Fri,  7 Sep 2007 18:54:18 +0800 Zhang Wei wrote:

> Signed-off-by: Zhang Wei <[EMAIL PROTECTED]>
> Signed-off-by: Ebony Zhu <[EMAIL PROTECTED]>
> ---
>  drivers/dma/Kconfig  |8 +
>  drivers/dma/Makefile |1 +
>  drivers/dma/fsldma.c |  995 
> ++
>  drivers/dma/fsldma.h |  188 ++
>  4 files changed, 1192 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/dma/fsldma.c
>  create mode 100644 drivers/dma/fsldma.h
> 
> --- /dev/null
> +++ b/drivers/dma/fsldma.c
> @@ -0,0 +1,995 @@

Thanks for using kernel-doc notation.  However, ...

> +/**
> + * fsl_dma_alloc_descriptor - Allocate descriptor from channel's DMA pool.

Function parameters need to be listed & described here.
See Documentation/kernel-doc-nano-HOWTO.txt or other source files
for examples.

(Applies to all documented function interfaces here.)
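
For reference, a kernel-doc comment with a one-line short description
and its parameters listed might look like the sketch below. The
function and structure are hypothetical and unrelated to fsldma; only
the comment layout is the point.

```c
#include <stddef.h>

struct sketch_pool {
	size_t free_slots;	/* slots still available */
};

/**
 * sketch_pool_take - Take a number of slots from a pool.
 * @pool: pool to take the slots from
 * @count: number of slots requested
 *
 * Returns 0 on success, or -1 if the pool holds fewer than @count
 * free slots, in which case the pool is left unchanged.
 */
static int sketch_pool_take(struct sketch_pool *pool, size_t count)
{
	if (pool->free_slots < count)
		return -1;
	pool->free_slots -= count;
	return 0;
}
```

Each parameter gets its own "@name:" line directly under the short
description, and the longer description follows after a blank comment
line.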

> + *
> + * Return - The descriptor allocated. NULL for failed.
> + */
> +static struct fsl_desc_sw *fsl_dma_alloc_descriptor(
> + struct fsl_dma_chan *fsl_chan,
> + gfp_t flags)
> +{
...
> +}

> +/**
> + * fsl_chan_xfer_ld_queue -- Transfer the link descriptors in channel
> + *   ld_queue.

The function's "short description" (unfortunately) must be on only one
line.  E.g.:

 * fsl_chan_xfer_ld_queue - Transfer link descriptors in channel ld_queue.

> + */
> +static void fsl_chan_xfer_ld_queue(struct fsl_dma_chan *fsl_chan)
> +{
...
> +}

> diff --git a/drivers/dma/fsldma.h b/drivers/dma/fsldma.h
> new file mode 100644
> index 000..05be9ed
> --- /dev/null
> +++ b/drivers/dma/fsldma.h
> @@ -0,0 +1,188 @@
> +struct fsl_dma_chan_regs {
> + __mix32 mr; /* 0x00 - Mode Register */
> + __mix32 sr; /* 0x04 - Status Register */
> + __mix64 cdar;   /* 0x08 - Cureent descriptor address register */

  Current

> + __mix64 sar;/* 0x10 - Source Address Register */
> + __mix64 dar;/* 0x18 - Destination Address Register */
> + __mix32 bcr;/* 0x20 - Byte Count Register */
> + __mix64 ndar;   /* 0x24 - Next Descriptor Address Register */
> +};

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: Race in the inotify debug code

2007-09-07 Thread Nick Piggin

On Thursday 06 September 2007 02:05, Andrew Morton wrote:
> > On Thu, 23 Aug 2007 11:25:18 -0400 Chuck Ebbert <[EMAIL PROTECTED]>
> > wrote:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=248355
> >
> > Description of problem:
> > Warnings in the kernel log (dmesg):
> > BUG: warning at fs/inotify.c:172/set_dentry_child_flags() (Not tainted)
> >  [] set_dentry_child_flags+0x67/0x13d
> >  [] remove_watch_no_event+0x2f/0x3b
> >  [] inotify_remove_watch_locked+0x12/0x3e
> >  [] mutex_lock+0x1a/0x29
> >  [] inotify_rm_wd+0x6d/0x8a
> >  [] sys_inotify_rm_watch+0x38/0x4f
> >  [] syscall_call+0x7/0xb
> >
> > Appears randomly, about every second/third day.
> >
> > Still happening in kernel 2.6.22.
> >
> >
> > static void set_dentry_child_flags(struct inode *inode, int watched)
> > ...
> >         spin_lock(&dcache_lock);
> >         list_for_each_entry(alias, &inode->i_dentry, d_alias) {
> >                 struct dentry *child;
> >
> >                 list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
> >                         if (!child->d_inode) {
> >                                 WARN_ON(child->d_flags &
> >                                         DCACHE_INOTIFY_PARENT_WATCHED);
> >                                 continue;
> >                         }
> >
> > But in dcache.c, the locks are dropped before this flag is cleared,
> > leaving a race window:
> >
> > void d_delete(struct dentry * dentry)
> > ...
> >         spin_lock(&dcache_lock);
> >         spin_lock(&dentry->d_lock);
> >         isdir = S_ISDIR(dentry->d_inode->i_mode);
> >         if (atomic_read(&dentry->d_count) == 1) {
> >                 dentry_iput(dentry);  <-- drops dcache_lock and dentry->d_lock
> >                 fsnotify_nameremove(dentry, isdir);
> >
> >                 /* remove this and other inotify debug checks after 2.6.18 */
> >                 dentry->d_flags &= ~DCACHE_INOTIFY_PARENT_WATCHED;
> >                 return;
> >         }
> >
> > (The comment is nice, it says the debug code should have been removed
> > long ago.)
>
> We've been chasing this bug for a year or so.  Thanks for maybe-solving it.
> I forwarded your email to Nick a few days ago but he's presently tied up
> with kernel slummit.  Please let us not forget about this?


There is some race in the debug code, yes, but I think there could also be
a real race in there too. I've posted a trial patch for it in one of the other
inotify bug reports.

Anyway, yes I'm inclined to just fix that and rip out the debug code. OTOH,
I have been trying to get some tester to reproduce with patches and had
no takers as yet...

BTW, I will be away for the next few weeks so I'll be going slower than usual

Re: [PATCH 0/3] core: fix build error when referencing arch specific structures

2007-09-07 Thread Mike Travis

Andrew Morton wrote:
>> On Fri, 7 Sep 2007 08:28:05 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote:
>> On Friday 07 September 2007 05:09, [EMAIL PROTECTED] wrote:
>>> Since the core kernel routines need to reference cpu_sibling_map,
>>> whether it be a static array or a per_cpu data variable, an access
>>> function has been defined.
>>>
>>> In addition, changes have been made to the ia64 and ppc64 arch's to
>>> move the cpu_sibling_map from a static cpumask_t array [NR_CPUS] to
>>> be per_cpu cpumask_t arrays.
>>>
>>> Note that I do not have the ability to build or test patch 3/3, the
>>> ppc64 changes.
>>>
>>> Patches are referenced against 2.6.23-rc4-mm1 .
>> It would be better if you could redo the patches with the original patches
>> reverted, not incremental changes. In the end we'll need a full patch set
>> with full changelog anyways, not a series of incremental fixes.
> 
> yup
>  
>> Also I guess some powerpc testers would be needed. Perhaps cc the
>> maintainers?
>>
> 
> yup
> 
> All architectures except sparc64 are now done - please have a shot at doing
> sparc64 as well.

Ok, will do.  I didn't realize there was only one more that used the SCHED_SMT
code.

> 
> I'd suggest that we not implement that cpu_sibling_map() macro and just
> open-code the per_cpu() everywhere.  So henceforth any architecture which
> implements CONFIG_SCHED_SMT must implement the per-cpu sibling map.

Yes, with only one more to do it's not that daunting. ;-)

> That's nice and simple, and avoids the unpleasant
> pretend-function-used-as-an-lvalue trick.  (Well OK, per_cpu() does
> that, but let's avoid resinning).

Yes, the per_cpu macro is quite the specimen. ;-)
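
For readers unfamiliar with the trick being discussed, here is a small
userspace sketch of a function-like macro that expands to an lvalue.
It uses a plain array and an unsigned long bitmask in place of the
kernel's per_cpu() machinery and cpumask_t, so every sketch_* name is an
assumption made for illustration; the "cpu ^ 0x1" pairing mirrors the
two-threads-per-processor assumption in the ppc64 patch.

```c
#define SKETCH_NR_CPUS 4

/* Plain array standing in for the per-cpu sibling masks. */
static unsigned long sketch_sibling_map[SKETCH_NR_CPUS];

/* Looks like a function call, but expands to an assignable lvalue. */
#define sketch_cpu_sibling_map(cpu) (sketch_sibling_map[(cpu)])

static void sketch_setup_siblings(void)
{
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		/* Assigning to something that reads like a call: */
		sketch_cpu_sibling_map(cpu) = 0;
		sketch_cpu_sibling_map(cpu) |= 1UL << cpu;
		/* Assume two threads per core: the sibling is cpu ^ 1. */
		sketch_cpu_sibling_map(cpu) |= 1UL << (cpu ^ 0x1);
	}
}
```

The assignments work only because the macro expands to an array
element; a real function returning a value could not appear on the
left-hand side, which is why the construct is described as a pretend
function.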

Thanks!
Mike


Re: [PATCH 0/3] core: fix build error when referencing arch specific structures

2007-09-07 Thread Mike Travis

Andi Kleen wrote:
> On Friday 07 September 2007 05:09, [EMAIL PROTECTED] wrote:
>> Since the core kernel routines need to reference cpu_sibling_map,
>> whether it be a static array or a per_cpu data variable, an access
>> function has been defined.
>>
>> In addition, changes have been made to the ia64 and ppc64 arch's to
>> move the cpu_sibling_map from a static cpumask_t array [NR_CPUS] to
>> be per_cpu cpumask_t arrays.
>>
>> Note that I do not have the ability to build or test patch 3/3, the
>> ppc64 changes.
>>
>> Patches are referenced against 2.6.23-rc4-mm1 .
> 
> It would be better if you could redo the patches with the original patches
> reverted, not incremental changes. In the end we'll need a full patch set
> with full changelog anyways, not a series of incremental fixes.

Will do.  Thanks.

I take it I should run a diff against rc4 (w/o mm1) to regenerate a
complete patch, including the prior ones?

> 
> Also I guess some powerpc testers would be needed. Perhaps cc the
> maintainers?

I've been looking for where to Cc: those guys (as Andrew probably realizes
from his extra "spam" from me. ;-)

Thanks!
Mike

Re: [PATCH 3/3] ppc64: Convert cpu_sibling_map to a per_cpu data array

2007-09-07 Thread Mike Travis

Kamalesh Babulal wrote:
> Kamalesh Babulal wrote:
>> [EMAIL PROTECTED] wrote:
>>> Convert cpu_sibling_map to a per_cpu cpumask_t array for the ppc64
>>> architecture.
>>>
>>> Note: these changes have not been built nor tested.
>>>
>>> Note: I also don't know if these changes are particularly
>>> relevant for the ppc64 architecture.
>>>
>>> Signed-off-by: Mike Travis <[EMAIL PROTECTED]>
>>> ---
>>>  arch/powerpc/kernel/setup-common.c|4 ++--
>>>  arch/powerpc/kernel/smp.c |4 ++--
>>>  arch/powerpc/platforms/cell/cbe_cpufreq.c |2 +-
>>>  include/asm-powerpc/smp.h |3 ++-
>>>  include/asm-powerpc/topology.h|2 +-
>>>  5 files changed, 8 insertions(+), 7 deletions(-)
>>>
>>> --- a/arch/powerpc/kernel/setup-common.c
>>> +++ b/arch/powerpc/kernel/setup-common.c
>>> @@ -415,9 +415,9 @@
>>>   * Do the sibling map; assume only two threads per processor.
>>>   */
>>>  for_each_possible_cpu(cpu) {
>>> -cpu_set(cpu, cpu_sibling_map[cpu]);
>>> +cpu_set(cpu, cpu_sibling_map(cpu));
>>>  if (cpu_has_feature(CPU_FTR_SMT))
>>> -cpu_set(cpu ^ 0x1, cpu_sibling_map[cpu]);
>>> +cpu_set(cpu ^ 0x1, cpu_sibling_map(cpu));
>>>  }
>>>
>>>  vdso_data->processorCount = num_present_cpus();
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -61,11 +61,11 @@
>>>
>>>  cpumask_t cpu_possible_map = CPU_MASK_NONE;
>>>  cpumask_t cpu_online_map = CPU_MASK_NONE;
>>> -cpumask_t cpu_sibling_map[NR_CPUS] = { [0 ... NR_CPUS-1] =
>>> CPU_MASK_NONE };
>>> +DEFINE_PER_CPU(cpumask_t, cpu_sibling_map) = CPU_MASK_NONE;
>>>
>>>  EXPORT_SYMBOL(cpu_online_map);
>>>  EXPORT_SYMBOL(cpu_possible_map);
>>> -EXPORT_SYMBOL(cpu_sibling_map);
>>> +EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
>>>
>>>  /* SMP operations for this machine */
>>>  struct smp_ops_t *smp_ops;
>>> --- a/arch/powerpc/platforms/cell/cbe_cpufreq.c
>>> +++ b/arch/powerpc/platforms/cell/cbe_cpufreq.c
>>> @@ -117,7 +117,7 @@
>>>  policy->cur = cbe_freqs[cur_pmode].frequency;
>>>
>>>  #ifdef CONFIG_SMP
>>> -policy->cpus = cpu_sibling_map[policy->cpu];
>>> +policy->cpus = cpu_sibling_map(policy->cpu);
>>>  #endif
>>>
>>>  cpufreq_frequency_table_get_attr(cbe_freqs, policy->cpu);
>>> --- a/include/asm-powerpc/smp.h
>>> +++ b/include/asm-powerpc/smp.h
>>> @@ -58,7 +58,8 @@
>>>  (smp_hw_index[(cpu)] = (phys))
>>>  #endif
>>>
>>> -extern cpumask_t cpu_sibling_map[NR_CPUS];
>>> +DECLARE_PER_CPU(cpumask_t, cpu_sibling_map);
>>> +#define cpu_sibling_map(cpu) per_cpu(cpu_sibling_map, cpu)
>>>
>>>  /* Since OpenPIC has only 4 IPIs, we use slightly different message
>>> numbers.
>>>   *
>>> --- a/include/asm-powerpc/topology.h
>>> +++ b/include/asm-powerpc/topology.h
>>> @@ -108,7 +108,7 @@
>>>  #ifdef CONFIG_PPC64
>>>  #include 
>>>
>>> -#define topology_thread_siblings(cpu)(cpu_sibling_map[cpu])
>>> +#define topology_thread_siblings(cpu)(cpu_sibling_map(cpu))
>>>  #endif
>>>  #endif
>>>
>>>
>>>   
>> Hi Mike,
>>
>> After applying the patch, the build fails with following error
>>
>> CHK include/linux/version.h
>> CHK include/linux/utsrelease.h
>> CC arch/powerpc/kernel/asm-offsets.s
>> In file included from include/linux/smp.h:19,
>> from include/linux/topology.h:33,
>> from include/linux/mmzone.h:660,
>> from include/linux/gfp.h:4,
>> from include/linux/slab.h:14,
>> from include/linux/percpu.h:5,
>> from include/asm/time.h:18,
>> from include/asm/cputime.h:26,
>> from include/linux/sched.h:65,
>> from arch/powerpc/kernel/asm-offsets.c:17:
>> include/asm/smp.h:61: error: expected declaration specifiers or ‘...’
>> before ‘cpu_sibling_map’
>> include/asm/smp.h:61: warning: data definition has no type or storage
>> class
>> include/asm/smp.h:61: warning: type defaults to ‘int’ in declaration
>> of ‘DECLARE_PER_CPU’
>> make[1]: *** [arch/powerpc/kernel/asm-offsets.s] Error 1
>> make: *** [prepare0] Error 2
>>
>> Thanks & Regards,
>> Kamalesh Babulal.
> Hi Mike,
> 
> I tried to debug and probably the patch below could help the build error
> 
> Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]>
> 
> 
> --- a/include/asm-powerpc/smp.h 2007-09-07 18:15:43.0 +0530
> +++ b/include/asm-powerpc/smp.h 2007-09-07 18:16:02.0 +0530
> @@ -25,6 +25,7 @@
> 
> #ifdef CONFIG_PPC64
> #include 
> +#include 
> #endif
> 
> extern int boot_cpuid;
> 
> ---
> Thanks & Regards,
> Kamalesh Babulal.
> 
> 
> 

Thanks!   I'll merge the above in...

Mike

Re: 2.6.23-rc4-mm1 compile error for ppc 32

2007-09-07 Thread Nick Piggin

On Friday 07 September 2007 09:00, Andrew Morton wrote:
> > On Thu, 6 Sep 2007 14:40:11 -0400 Mathieu Desnoyers
> > <[EMAIL PROTECTED]> wrote:
> > Hi Andrew,
> >
> > Guess what, another one ;)
> >
> >  
> > /opt/crosstool/gcc-4.1.1-glibc-2.3.6/powerpc-405-linux-gnu/bin/powerpc-405-linux-gnu-gcc
> > -m32 -Wp,-MD,arch/ppc/kernel/.asm-offsets.s.d  -nostdinc
> > -isystem
> > /opt/crosstool/gcc-4.1.1-glibc-2.3.6/powerpc-405-linux-gnu/lib/gcc/powerpc-405-linux-gnu/4.1.1/include
> > -D__KERNEL__ -Iinclude -Iinclude2
> > -I/home/compudj/git/linux-2.6-lttng/include -include
> > include/linux/autoconf.h -Iarch/ppc -Iarch/ppc/include
> > -I/home/compudj/git/linux-2.6-lttng/. -I. -Wall -Wundef
> > -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
> > -Werror-implicit-function-declaration -Os
> > -I/home/compudj/git/linux-2.6-lttng/arch/ppc -Iarch/ppc -msoft-float
> > -pipe -ffixed-r2 -mmultiple -mno-altivec -mstring -Wa,-m405
> > -fomit-frame-pointer -g -fno-stack-protector
> > -Wdeclaration-after-statement -Wno-pointer-sign  -D"KBUILD_STR(s)=#s"
> > -D"KBUILD_BASENAME=KBUILD_STR(asm_offsets)"
> > -D"KBUILD_MODNAME=KBUILD_STR(asm_offsets)" -fverbose-asm -S -o
> > arch/ppc/kernel/asm-offsets.s
> > /home/compudj/git/linux-2.6-lttng/arch/ppc/kernel/asm-offsets.c
>
> > In file included from /home/compudj/git/linux-2.6-lttng/include/linux/bitops.h:17,
> >                  from /home/compudj/git/linux-2.6-lttng/include/linux/kernel.h:15,
> >                  from include2/asm/system.h:7,
> >                  from /home/compudj/git/linux-2.6-lttng/include/linux/list.h:9,
> >                  from /home/compudj/git/linux-2.6-lttng/include/linux/signal.h:8,
> >                  from /home/compudj/git/linux-2.6-lttng/arch/ppc/kernel/asm-offsets.c:11:
> > arch/ppc/include/asm/bitops.h: In function '__clear_bit_unlock':
> > arch/ppc/include/asm/bitops.h:229: error: expected string literal before ':' token
> > arch/ppc/include/asm/bitops.h:229: confused by earlier errors, bailing out
> > make[2]: *** [arch/ppc/kernel/asm-offsets.s] Error 1
> > make[1]: *** [prepare0] Error 2
> > make: *** [_all] Error 2
>
> What the heck is arch/ppc/include/asm/bitops.h?  I assume that it's
> include/asm-powerpc/bitops.h via some wormhole.
>
>
> If so, the finger points at this:
>
> static __inline__ void __clear_bit_unlock(int nr, volatile unsigned long
> *addr) {
>   __asm__ __volatile__(LWSYNC_ON_SMP ::: "memory");
>   __clear_bit(nr, addr);
> }
>
> which was added by Nick's powerpc-lock-bitops.patch.  I am suspecting that
> this isn't ppc32 code?

Hmm, when LWSYNC_ON_SMP is a no-op, it seems like it should probably
expand to an empty string ("") instead of to nothing? That should make
the behaviour more consistent, I think.
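
To illustrate the point: GCC's asm statement requires a string literal as its first operand, so a macro that expands to nothing leaves "__asm__ __volatile__( ::: "memory")", which is exactly the "expected string literal before ':' token" error in the log. A minimal sketch follows, using a hypothetical BARRIER_INSN_SKETCH macro in place of LWSYNC_ON_SMP and a simplified non-atomic bit clear. It is not the kernel's code, and it assumes a GCC-compatible compiler.

```c
/* Hypothetical stand-in for LWSYNC_ON_SMP on a build where no barrier
 * instruction is needed: an empty string, not an empty expansion. */
#define BARRIER_INSN_SKETCH ""

/* Simplified version of the function quoted above. With the empty
 * string the asm statement still parses and still acts as a compiler
 * barrier through the "memory" clobber. */
static inline void clear_bit_unlock_sketch(int nr, volatile unsigned long *addr)
{
	__asm__ __volatile__(BARRIER_INSN_SKETCH ::: "memory");
	*addr &= ~(1UL << nr);
}
```

Changing the define to expand to nothing should reproduce the parse error, since the compiler then sees no template string before the first colon.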

Ben?


> (what's with the newly-added old-style __inline__, btw?  That's just more
> stuff we need to clean up later, so there doesn't seem much point in adding
> it).

Consistency really. Otherwise people ask why I've done it differently :P I
don't suppose it makes a future cleanup any harder, while giving a better
result until then.

Re: sata & scsi suggestion for make menuconfig

2007-09-07 Thread Randy Dunlap

On Fri, 7 Sep 2007 14:48:00 +0200 Folkert van Heusden wrote:

> Hi,
> 
> Maybe it would be a nice enhancement for make menuconfig to show a
> pop-up (or similar) when someone selects, for example, a SATA controller
> while no 'scsi-disk' support is selected?

I know that it's difficult to get people to read docs & help text,
and maybe it is needed in more places, but CONFIG_ATA (SATA/PATA)
help text says:

  NOTE: ATA enables basic SCSI support; *however*,
  'SCSI disk support', 'SCSI tape support', or
  'SCSI CDROM support' may also be needed,
  depending on your hardware configuration.

A popup makes some sense, but I don't know if menuconfig knows how to
do popup warnings... and it needs to be done for all *configs,
not just menuconfig.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
