Re: High lock spin time for zone->lru_lock under extreme conditions

2007-01-12 Thread Nick Piggin

Ravikiran G Thirumalai wrote:
> On Sat, Jan 13, 2007 at 03:39:45PM +1100, Nick Piggin wrote:
>
>> What is the "CS time"?
>
> Critical Section :).  This is the maximal time interval I measured from
> t2 above to the time point we release the spin lock.  This is the hold
> time I guess.
>
>> It would be interesting to know how long the maximal lru_lock *hold* time
>> is, which could give us a better indication of whether it is a hardware
>> problem.
>>
>> For example, if the maximum hold time is 10ms, then it might indicate a
>> hardware fairness problem.
>
> The maximal hold time was about 3s.


Well then it doesn't seem very surprising that this could cause a 30s wait
time for one CPU in a 16 core system, regardless of fairness.
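
As a rough check of that claim, here is a minimal sketch. It assumes a perfectly fair (FIFO) lock where every other CPU gets one turn ahead of the waiter; the function name and the FIFO model are illustrative assumptions, not kernel code.

```c
#include <assert.h>

/*
 * Hypothetical helper: upper bound on how long one CPU can wait for a
 * perfectly fair (FIFO) spinlock, assuming each of the other CPUs takes
 * the lock once ahead of it and holds it for at most max_hold_s seconds.
 */
static double worst_case_fair_wait_s(unsigned int ncpus, double max_hold_s)
{
	assert(ncpus >= 1);
	return (double)(ncpus - 1) * max_hold_s;
}
```

With the numbers from this thread, worst_case_fair_wait_s(16, 3.0) comes to 45 seconds, so a spin of about 33 seconds does not by itself point at unfair hardware.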

I guess most of the contention, and the lock hold times are coming from
vmscan? Do you know exactly which critical sections are the culprits?

--
SUSE Labs, Novell Inc.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: High lock spin time for zone->lru_lock under extreme conditions

2007-01-12 Thread Ravikiran G Thirumalai
On Fri, Jan 12, 2007 at 05:11:16PM -0800, Andrew Morton wrote:
> On Fri, 12 Jan 2007 17:00:39 -0800
> Ravikiran G Thirumalai <[EMAIL PROTECTED]> wrote:
> 
> > But is
> > lru_lock an issue is another question.
> 
> I doubt it, although there might be changes we can make in there to
> work around it.
> 
> 

I tested with PAGEVEC_SIZE defined to 62 and 126 -- no difference.  I still
notice the atrociously high spin times.

Thanks,
Kiran


Re: High lock spin time for zone->lru_lock under extreme conditions

2007-01-12 Thread Ravikiran G Thirumalai
On Sat, Jan 13, 2007 at 03:39:45PM +1100, Nick Piggin wrote:
> Ravikiran G Thirumalai wrote:
> >Hi,
> >We noticed high interrupt hold off times while running some memory 
> >intensive
> >tests on a Sun x4600 8 socket 16 core x86_64 box.  We noticed softlockups,
> 
> [...]
> 
> >We did not use any lock debugging options and used plain old rdtsc to
> >measure cycles.  (We disable cpu freq scaling in the BIOS). All we did was
> >this:
> >
> >void __lockfunc _spin_lock_irq(spinlock_t *lock)
> >{
> >	local_irq_disable();
> >> rdtsc(t1);
> >	preempt_disable();
> >	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
> >	_raw_spin_lock(lock);
> >> rdtsc(t2);
> >	if (lock->spin_time < (t2 - t1))
> >		lock->spin_time = t2 - t1;
> >}
> >
> >On some runs, we found that the zone->lru_lock spun for 33 seconds or more
> >while the maximal CS time was 3 seconds or so.
> 
> What is the "CS time"?

Critical Section :).  This is the maximal time interval I measured  from 
t2 above to the time point we release the spin lock.  This is the hold 
time I guess.
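
To keep the two numbers apart, here is a minimal sketch of the bookkeeping, assuming three timestamps per acquisition: t1 before spinning, t2 once the lock is taken, t3 at release. The struct and function names are hypothetical and only mirror the instrumentation quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lock statistics; field names are illustrative. */
struct lock_stats {
	uint64_t max_spin;	/* longest wait from request (t1) to acquire (t2) */
	uint64_t max_hold;	/* longest time from acquire (t2) to release (t3) */
};

/* Record the spin (wait) time of one acquisition. */
static void note_acquire(struct lock_stats *s, uint64_t t1, uint64_t t2)
{
	assert(t2 >= t1);
	if (s->max_spin < t2 - t1)
		s->max_spin = t2 - t1;
}

/* Record the hold (critical section) time of one acquisition. */
static void note_release(struct lock_stats *s, uint64_t t2, uint64_t t3)
{
	assert(t3 >= t2);
	if (s->max_hold < t3 - t2)
		s->max_hold = t3 - t2;
}
```

The spin time is t2 - t1 and the hold (CS) time is t3 - t2; both are tracked as running maxima, in whatever unit the timestamps use (TSC cycles in the measurement above).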

> 
> It would be interesting to know how long the maximal lru_lock *hold* time 
> is,
> which could give us a better indication of whether it is a hardware problem.
> 
> For example, if the maximum hold time is 10ms, then it might indicate a
> hardware fairness problem.

The maximal hold time was about 3s.


2.6.20-rc4-mm1: status of sn9c102_pas202bca?

2007-01-12 Thread Adrian Bunk
On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-rc3-mm1:
>...
>  git-dvb.patch
>...
>  git trees
>...

drivers/media/video/sn9c102/sn9c102_pas202bca.c is no longer used or 
built but still shipped.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: tuning/tweaking VM settings for low memory (preventing OOM)

2007-01-12 Thread Willy Tarreau
On Fri, Jan 12, 2007 at 03:58:08PM -0600, Kumar Gala wrote:
> I'm working on an embedded PPC setup with 64M of memory and no swap.   
> I'm trying to figure out how best to tune the VM for an OOM situation  
> I'm running into.
> 
> I'm running a 2.6.16.35 kernel and have a bittorrent app that appears  
> to be initializing a large file for it to download into.  What I see  
> before running the app:
> 
> /bigfoot/usb_disk # cat /proc/meminfo
> MemTotal:        62520 kB
> MemFree:         49192 kB
> Buffers:          8240 kB
> Cached:            740 kB
> SwapCached:          0 kB
> Active:           8196 kB
> Inactive:         1236 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:        62520 kB
> LowFree:         49192 kB
> SwapTotal:           0 kB
> SwapFree:            0 kB
> Dirty:               0 kB
> Writeback:           0 kB
> Mapped:            916 kB
> Slab:             2224 kB
> CommitLimit:     31260 kB
> Committed_AS:     1704 kB
> PageTables:         88 kB
> VmallocTotal:   933872 kB
> VmallocUsed:      9416 kB
> VmallocChunk:   923628 kB
> 
> after the OOM:
> 
> /bigfoot/usb_disk # cat /proc/meminfo
> MemTotal:        62520 kB
> MemFree:          1608 kB
> Buffers:          8212 kB
> Cached:          42780 kB
> SwapCached:          0 kB
> Active:           6228 kB
> Inactive:        45176 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:        62520 kB
> LowFree:          1608 kB
> SwapTotal:           0 kB
> SwapFree:            0 kB
> Dirty:           35208 kB
> Writeback:        5616 kB
> Mapped:            892 kB
> Slab:             7788 kB
> CommitLimit:     31260 kB
> Committed_AS:     1704 kB
> PageTables:         88 kB
> VmallocTotal:   933872 kB
> VmallocUsed:      9416 kB
> VmallocChunk:   923628 kB
> 
> Which makes me think that we aren't writing back fast enough.  If I  
> mount the drive "sync" the issue clearly goes away.
> 
> It appears from an strace we are doing ftruncate64(5, 178257920) when  
> we OOM.
> 
> Any ideas on VM parameters to tweak so we throttle this from occurring?

Take a look at /proc/sys/vm/bdflush. There are several useful parameters
there (doc is in linux-xxx/Documentation). For instance, the first column
is the percentage of memory used by writes before starting to write on
disk. When using tcpdump intensively, I lower this one to about 1%.
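
To put a number on how far writeback had fallen behind, here is a minimal sketch that turns the Dirty, Writeback and MemTotal values quoted above into a percentage. The function name is made up for illustration. As far as I know, on 2.6 kernels the corresponding knobs are /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio rather than bdflush.

```c
#include <assert.h>

/*
 * Percentage of total memory that is dirty or under writeback, given the
 * Dirty, Writeback and MemTotal values (all in kB) from /proc/meminfo.
 */
static double dirty_percent(unsigned long dirty_kb, unsigned long writeback_kb,
			    unsigned long total_kb)
{
	assert(total_kb > 0);
	return 100.0 * (double)(dirty_kb + writeback_kb) / (double)total_kb;
}
```

For the post-OOM snapshot, dirty_percent(35208, 5616, 62520) is roughly 65%, which suggests that most of the 64M was tied up in pages still waiting to reach the disk.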

Willy



Re: SATA hotplug from the user side ?

2007-01-12 Thread Soeren Sonnenburg
On Sat, 2007-01-13 at 10:55 +0900, Tejun Heo wrote:
> Soeren Sonnenburg wrote:
> > It is true it detects a removal and newly plugged devices immediately...
> > However it still prints warnings and errors that it could not
> > synchronize SCSI cache for the disks. Then it prints regular 'rejects
> > I/O to dead device' warning messages and on replugging the disks puts
> > them to the next free sd device (e.g. sdc -> sdd).
> 
> You need to stop using the devices before unplugging.  If you have no
> pending IO to the device, there won't be 'rejects IO to dead device'
> messages.  You can ignore the SCSI cache sync failure if the device is
> properly closed before being unplugged.

Jeff & Tejun thanks *a lot* for clarifying this. I am quite happy to see
that this is working very reliably!

> > These messages sound evil - so now the question is should I care ?
> > ( On the other hand it did not crash the machine )
> 
> So, no, you don't really have to care.  Just make sure the device is
> unmounted prior to unplugging.

OK, but then this really should be in the SATA hotplug FAQ (or can one
fix this somehow?)... No user will ignore messages like this. What is
especially annoying is that udev on the first remove/insert cycle
created a new device node so the disk became /dev/sde (was /dev/sdd):
dmesg output of reinserting the disk 2 times follows:


ata4: exception Emask 0x10 SAct 0x0 SErr 0x1 action 0x2 frozen
ata4: hard resetting port
ata4: SATA link down (SStatus 0 SControl 310)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting port
ata4: SATA link down (SStatus 0 SControl 310)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting port
ata4: SATA link down (SStatus 0 SControl 310)
ata4.00: disabled
ata4: EH complete
ata4.00: detaching (SCSI 3:0:0:0)
Synchronizing SCSI cache for disk sdd: 
FAILED
  status = 0, message = 00, host = 4, driver = 00
  <3>ata4: exception Emask 0x10 SAct 0x0 SErr 0x5 action 0x2 frozen
ata4: hard resetting port
ata4: COMRESET failed (device not ready)
ata4: hardreset failed, retrying in 5 secs
ata4: hard resetting port
ata4: COMRESET failed (device not ready)
ata4: hardreset failed, retrying in 5 secs
ata4: hard resetting port


ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata4.00: ATA-7, max UDMA/133, 1465149168 sectors: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/100
ata4: EH complete
scsi 3:0:0:0: Direct-Access ATA  ST3750640AS  3.AA PQ: 0 ANSI: 5
SCSI device sde: 1465149168 512-byte hdwr sectors (750156 MB)
sde: Write Protect is off
sde: Mode Sense: 00 3a 00 00
SCSI device sde: drive cache: write back
SCSI device sde: 1465149168 512-byte hdwr sectors (750156 MB)
sde: Write Protect is off
sde: Mode Sense: 00 3a 00 00
SCSI device sde: drive cache: write back
 sde: unknown partition table
sd 3:0:0:0: Attached scsi disk sde
sd 3:0:0:0: Attached scsi generic sg3 type 0


ata4: exception Emask 0x10 SAct 0x0 SErr 0x1 action 0x2 frozen
ata4: hard resetting port
ata4: SATA link down (SStatus 0 SControl 310)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting port
ata4: SATA link down (SStatus 0 SControl 310)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting port
ata4: SATA link down (SStatus 0 SControl 310)
ata4.00: disabled
ata4: EH complete
ata4.00: detaching (SCSI 3:0:0:0)
Synchronizing SCSI cache for disk sde: 
FAILED
  status = 0, message = 00, host = 4, driver = 00
  <3>ata4: exception Emask 0x10 SAct 0x0 SErr 0x5 action 0x2 frozen
ata4: hard resetting port
ata4: COMRESET failed (device not ready)
ata4: hardreset failed, retrying in 5 secs
ata4: hard resetting port
ata4: COMRESET failed (device not ready)
ata4: hardreset failed, retrying in 5 secs
ata4: hard resetting port


ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata4.00: ATA-7, max UDMA/133, 1465149168 sectors: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/100
ata4: EH complete
scsi 3:0:0:0: Direct-Access ATA  ST3750640AS  3.AA PQ: 0 ANSI: 5
SCSI device sde: 1465149168 512-byte hdwr sectors (750156 MB)
sde: Write Protect is off
sde: Mode Sense: 00 3a 00 00
SCSI device sde: drive cache: write back
SCSI device sde: 1465149168 512-byte hdwr sectors (750156 MB)
sde: Write Protect is off
sde: Mode Sense: 00 3a 00 00
SCSI device sde: drive cache: write back
 sde: unknown partition table
sd 3:0:0:0: Attached scsi disk sde
sd 3:0:0:0: Attached scsi generic sg3 type 0

remains /dev/sde ... 

Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.


2.6.20-rc5: known regressions with patches

2007-01-12 Thread Adrian Bunk
This email lists some known regressions in 2.6.20-rc5 compared to 2.6.19
with patches available.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : WARNING: "profile_hits" [drivers/kvm/kvm-intel.ko] undefined!
References : http://lkml.org/lkml/2007/1/12/16
Submitter  : Miles Lane <[EMAIL PROTECTED]>
Caused-By  : Ingo Molnar <[EMAIL PROTECTED]>
             commit 07031e14c1127fc7e1a5b98dfcc59f434e025104
Handled-By : Andrew Morton <[EMAIL PROTECTED]>
Patch      : http://lkml.org/lkml/2007/1/12/18
Status     : patch available


Subject    : KVM: guest crash
References : http://lkml.org/lkml/2007/1/8/163
Submitter  : Roland Dreier <[EMAIL PROTECTED]>
Handled-By : Avi Kivity <[EMAIL PROTECTED]>
Patch      : http://lkml.org/lkml/2007/1/9/280
Status     : patch available


Subject    : compile error: USB_HID must depend on INPUT
References : http://lkml.org/lkml/2007/1/12/157
Submitter  : Russell King <[EMAIL PROTECTED]>
Handled-By : Russell King <[EMAIL PROTECTED]>
Patch      : http://lkml.org/lkml/2007/1/12/177
Status     : patch available




2.6.20-rc5: known unfixed regressions

2007-01-12 Thread Adrian Bunk
On Fri, Jan 12, 2007 at 02:27:48PM -0500, Linus Torvalds wrote:
>...
> A lot of developers (including me) will be gone next week for 
> Linux.Conf.Au, so you have a week of rest and quiet to test this, and 
> report any problems. 
> 
> Not that there will be any, right? You all behave now!
>...

This still leaves the old regressions we have not yet fixed...


This email lists some known regressions in 2.6.20-rc5 compared to 2.6.19.

If you find your name in the Cc header, you are either submitter of one
of the bugs, maintainer of an affected subsystem or driver, a patch
of yours caused a breakage or I'm considering you in any other way possibly
involved with one or more of these issues.

Due to the huge number of recipients, please trim the Cc when answering.


Subject    : pktcdvd fails with pata_amd
References : http://bugzilla.kernel.org/show_bug.cgi?id=7810
Submitter  : [EMAIL PROTECTED]
Status     : unknown


Subject    : problems with CD burning
References : http://www.spinics.net/lists/linux-ide/msg06545.html
Submitter  : Uwe Bugla <[EMAIL PROTECTED]>
Status     : unknown


Subject    : BUG: scheduling while atomic: hald-addon-stor/...
             cdrom_{open,release,ioctl} in trace
References : http://lkml.org/lkml/2006/12/26/105
             http://lkml.org/lkml/2006/12/29/22
             http://lkml.org/lkml/2006/12/31/133
Submitter  : Jon Smirl <[EMAIL PROTECTED]>
             Damien Wyart <[EMAIL PROTECTED]>
             Aaron Sethman <[EMAIL PROTECTED]>
Status     : unknown


Subject    : 'shutdown -h now' reboots the system  (CONFIG_USB_SUSPEND)
References : http://lkml.org/lkml/2006/12/25/40
Submitter  : Berthold Cogel <[EMAIL PROTECTED]>
Handled-By : Alexey Starikovskiy <[EMAIL PROTECTED]>
Status     : problem is being debugged


Subject    : USB keyboard unresponsive after some time
References : http://lkml.org/lkml/2006/12/25/35
             http://lkml.org/lkml/2006/12/26/106
Submitter  : Florin Iucha <[EMAIL PROTECTED]>
Handled-By : Jiri Kosina <[EMAIL PROTECTED]>
             Alan Stern <[EMAIL PROTECTED]>
Status     : problem is being debugged


Subject    : BUG: at fs/inotify.c:172 set_dentry_child_flags()
References : http://bugzilla.kernel.org/show_bug.cgi?id=7785
Submitter  : Cijoml Cijomlovic Cijomlov <[EMAIL PROTECTED]>
Handled-By : Nick Piggin <[EMAIL PROTECTED]>
Status     : problem is being debugged


Subject    : BUG: at mm/truncate.c:60 cancel_dirty_page()  (XFS)
References : http://lkml.org/lkml/2007/1/5/308
Submitter  : Sami Farin <[EMAIL PROTECTED]>
Handled-By : David Chinner <[EMAIL PROTECTED]>
Status     : problem is being discussed


Subject    : BUG: at mm/truncate.c:60 cancel_dirty_page()  (reiserfs)
References : http://lkml.org/lkml/2007/1/7/117
             http://lkml.org/lkml/2007/1/10/202
Submitter  : Malte Schröder <[EMAIL PROTECTED]>
Handled-By : Vladimir V. Saveliev <[EMAIL PROTECTED]>
             Nick Piggin <[EMAIL PROTECTED]>
Patch      : http://lkml.org/lkml/2007/1/10/202
Status     : problem is being discussed




Re: [patch] sched: avoid div in rebalance_tick

2007-01-12 Thread Nick Piggin
On Fri, Jan 12, 2007 at 09:59:40AM +, Alan wrote:
> On Fri, 12 Jan 2007 07:02:13 +0100
> Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > Just noticed this while looking at a bug.
> > Avoid an expensive integer divide 3 times per CPU per tick.
> 
> Integer divide is cheap on some modern processors, and multibit shift
> isn't on all embedded ones.
> 
> How about putting back scale = 1 and using
> 
> scale += scale;
> 
> instead of the shift and getting what ought to be even better results

OK, how about this? It only works out to be around 0.01% of my P3's CPU time
at 1000HZ, but it also did make the x86 code 16 bytes smaller.


--
Avoid expensive integer divide 3 times per CPU per tick.

A userspace test of this loop went from 26ns, down to 19ns on a G5; and
from 123ns down to 28ns on a P3.

(Also avoid a variable bit shift, as suggested by Alan. The effect
of this wasn't noticeable on the CPUs I tested with).

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/kernel/sched.c
===
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2887,14 +2887,16 @@ static void active_load_balance(struct r
 static void update_load(struct rq *this_rq)
 {
 	unsigned long this_load;
-	int i, scale;
+	unsigned int i, scale;
 
 	this_load = this_rq->raw_weighted_load;
 
 	/* Update our load: */
-	for (i = 0, scale = 1; i < 3; i++, scale <<= 1) {
+	for (i = 0, scale = 1; i < 3; i++, scale += scale) {
 		unsigned long old_load, new_load;
 
+		/* scale is effectively 1 << i now, and >> i divides by scale */
+
 		old_load = this_rq->cpu_load[i];
 		new_load = this_load;
 		/*
@@ -2904,7 +2906,7 @@ static void update_load(struct rq *this_
 		 */
 		if (new_load > old_load)
 			new_load += scale-1;
-		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) / scale;
+		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
 	}
 }
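
For anyone who wants to check the equivalence in userspace, here is a minimal sketch of both variants side by side. It works on a plain array instead of struct rq, and the function names and NR_LOAD_IDX are illustrative rather than taken from kernel/sched.c.

```c
#include <assert.h>

#define NR_LOAD_IDX 3	/* three decaying averages, as in the loop above */

/* Original form: divide by scale, where scale is 1, 2, 4. */
static void update_load_div(unsigned long cpu_load[NR_LOAD_IDX],
			    unsigned long this_load)
{
	unsigned int i, scale;

	for (i = 0, scale = 1; i < NR_LOAD_IDX; i++, scale <<= 1) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		/* Round up the averaging division if load is increasing. */
		if (new_load > old_load)
			new_load += scale - 1;
		cpu_load[i] = (old_load * (scale - 1) + new_load) / scale;
	}
}

/* Patched form: scale doubles by addition, ">> i" replaces "/ scale". */
static void update_load_shift(unsigned long cpu_load[NR_LOAD_IDX],
			      unsigned long this_load)
{
	unsigned int i, scale;

	for (i = 0, scale = 1; i < NR_LOAD_IDX; i++, scale += scale) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		if (new_load > old_load)
			new_load += scale - 1;
		/* scale == 1 << i here, so the shift divides by scale */
		cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
	}
}
```

Since the operands are unsigned and scale is always a power of two, both forms give the same result; starting from a load of 9 in every slot and feeding in 10, each slot moves to 10 thanks to the round-up.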
 


Re: [patch]cleanup and error reporting for sound/core/init.c

2007-01-12 Thread Oliver Neukum
On Friday, 12 January 2007 18:42, Takashi Iwai wrote:
> At Fri, 12 Jan 2007 14:49:57 +0100,
> Oliver Neukum wrote:
> > 
> > +   } else {
> > +if (idx < snd_ecards_limit) {
> > +   if (snd_cards_lock & (1 << idx))
> > +   err = -EBUSY;   /* invalid */
> > +   } else if (idx < SNDRV_CARDS)
> > +   snd_ecards_limit = idx + 1; /* increase the 
> > limit */
> > +   else
> > +   err = -ENODEV;
> 
> The indent looks strange in the above three lines.
> Also, for me it's not much better than before... :)
> (all if's are comparisons of idx with other values.)

Hi,

OK, how about this one? The original indentation makes the control
flow very hard to follow.

Regards
Oliver

Signed-off-by: Oliver Neukum <[EMAIL PROTECTED]>
--

--- sound/core/init.c.alt   2007-01-12 14:26:47.0 +0100
+++ sound/core/init.c   2007-01-13 07:34:29.0 +0100
@@ -114,22 +114,28 @@
 	if (idx < 0) {
 		int idx2;
 		for (idx2 = 0; idx2 < SNDRV_CARDS; idx2++)
+			/* idx == -1 == 0xffff means: take any free slot */
 			if (~snd_cards_lock & idx & 1<<idx2) {
 				idx = idx2;
 				if (idx >= snd_ecards_limit)
 					snd_ecards_limit = idx + 1;
 				break;
 			}
-	} else if (idx < snd_ecards_limit) {
-		if (snd_cards_lock & (1 << idx))
-			err = -ENODEV;	/* invalid */
-	} else if (idx < SNDRV_CARDS)
-		snd_ecards_limit = idx + 1; /* increase the limit */
-	else
-		err = -ENODEV;
+	} else {
+		if (idx < snd_ecards_limit) {
+			if (snd_cards_lock & (1 << idx))
+				err = -EBUSY;	/* invalid */
+		} else {
+			if (idx < SNDRV_CARDS)
+				snd_ecards_limit = idx + 1; /* increase the limit */
+			else
+				err = -ENODEV;
+		}
+	}
 	if (idx < 0 || err < 0) {
 		mutex_unlock(&snd_card_mutex);
-		snd_printk(KERN_ERR "cannot find the slot for index %d (range 0-%i)\n", idx, snd_ecards_limit - 1);
+		snd_printk(KERN_ERR "cannot find the slot for index %d (range 0-%i), error: %d\n",
+			idx, snd_ecards_limit - 1, err);
 		goto __error;
 	}
 	snd_cards_lock |= 1 << idx;		/* lock it */
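
To make the intended control flow easier to see outside the diff, here is a minimal standalone sketch of the slot selection. It is a simplification under stated assumptions: any negative idx is treated as "any free slot" (the real code also uses a negative idx as a bitmask of permitted slots), MAX_SLOTS stands in for SNDRV_CARDS, and pick_slot() is a made-up name.

```c
#include <assert.h>
#include <errno.h>

#define MAX_SLOTS 8	/* stands in for SNDRV_CARDS; the value is illustrative */

/*
 * Pick a slot. idx < 0 means "any free slot"; otherwise idx is the
 * requested slot. *lock_mask has one bit set per slot in use, *limit is
 * one more than the highest slot handed out so far. Returns the slot
 * number, or a negative errno value.
 */
static int pick_slot(unsigned int *lock_mask, int *limit, int idx)
{
	if (idx < 0) {
		int i;

		for (i = 0; i < MAX_SLOTS; i++)
			if (~*lock_mask & (1u << i)) {
				idx = i;
				break;
			}
		if (idx < 0)
			return -ENODEV;		/* every slot is taken */
	} else if (idx >= MAX_SLOTS) {
		return -ENODEV;			/* no such slot */
	} else if (*lock_mask & (1u << idx)) {
		return -EBUSY;			/* slot exists but is in use */
	}

	if (idx >= *limit)
		*limit = idx + 1;		/* increase the limit */
	*lock_mask |= 1u << idx;		/* lock it */
	return idx;
}
```

The point of returning distinct codes is the same as in the patch: a caller can tell "that slot is already taken" (-EBUSY) from "that slot does not exist" (-ENODEV).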



reiserfs BUGs

2007-01-12 Thread Randy Dunlap

Running fsx-linux (akpm ext3-tools version) on reiserfs,
2.6.20-rc5 on x86_64.

[ 4496.964604] [ cut here ]
[ 4496.964614] Kernel BUG at 880b4499 [verbose debug info unavailable]
[ 4496.964621] invalid opcode:  [1] SMP
[ 4496.964629] CPU 2
[ 4496.964635] Modules linked in: reiserfs xfs jfs loop
[ 4496.964650] Pid: 298, comm: pdflush Not tainted 2.6.20-rc5 #1
[ 4496.964655] RIP: 0010:[]  [] 
:reiserfs:flush_commit_list+0x532/0x60a
[ 4496.964684] RSP: 0018:81011fa47bf0  EFLAGS: 00010246
[ 4496.964690] RAX:  RBX: c2001090f240 RCX: 
[ 4496.964697] RDX:  RSI: 72b3 RDI: c2001090f240
[ 4496.964703] RBP: 81011fa47c60 R08: 81011e521000 R09: 
[ 4496.964710] R10: 810005044100 R11: fffa R12: 81011d497180
[ 4496.964716] R13: 81011e521000 R14: 0088 R15: 
[ 4496.964723] FS:  () GS:81011fc78cc0() 
knlGS:
[ 4496.964730] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[ 4496.964737] CR2: 2b370e2a3000 CR3: 00011e118000 CR4: 06e0
[ 4496.964744] Process pdflush (pid: 298, threadinfo 81011fa46000, task 
81011fb02140)
[ 4496.964749] Stack:  0003 00010282 0058 
0282
[ 4496.964767]  0058 c20010884000 1fa47c60 
8012ddcb
[ 4496.964785]  81011a3abf00 810117180980 45a858b5 

[ 4496.964798] Call Trace:
[ 4496.964815]  [] __wake_up+0x43/0x50
[ 4496.964834]  [] :reiserfs:do_journal_end+0xc95/0xced
[ 4496.964845]  [] find_busiest_group+0x24e/0x68f
[ 4496.964856]  [] keventd_create_kthread+0x0/0x79
[ 4496.964875]  [] :reiserfs:journal_end_sync+0x75/0x7e
[ 4496.964886]  [] pdflush+0x0/0x1d4
[ 4496.964904]  [] :reiserfs:reiserfs_sync_fs+0x41/0x67
[ 4496.964922]  [] :reiserfs:reiserfs_write_super+0xe/0x10
[ 4496.964932]  [] sync_supers+0x67/0xb6
[ 4496.964942]  [] wb_kupdate+0x4d/0x133
[ 4496.964951]  [] pdflush+0x0/0x1d4
[ 4496.964958]  [] pdflush+0x129/0x1d4
[ 4496.964967]  [] wb_kupdate+0x0/0x133
[ 4496.964975]  [] kthread+0xd8/0x10c
[ 4496.964984]  [] schedule_tail+0x45/0xad
[ 4496.964994]  [] child_rip+0xa/0x12
[ 4496.965002]  [] keventd_create_kthread+0x0/0x79
[ 4496.965011]  [] kthread+0x0/0x10c
[ 4496.965019]  [] child_rip+0x0/0x12
[ 4496.965030] 
[ 4496.965031] Code: 0f 0b eb fe 48 8b 03 f0 0f ba 30 10 48 8b 13 8b 02 a9 00 00
[ 4496.965073] RIP  [] :reiserfs:flush_commit_list+0x532/0x60a
[ 4496.965094]  RSP 
[ 4496.965395]  BUG: at kernel/exit.c:860 do_exit()
[ 4496.965407]
[ 4496.965409] Call Trace:
[ 4496.965420]  [] profile_task_exit+0x15/0x17
[ 4496.965430]  [] do_exit+0x55/0x81f
[ 4496.965439]  [] kernel_math_error+0x0/0x96
[ 4496.965450]  [] do_trap+0xdc/0xeb
[ 4496.965458]  [] notifier_call_chain+0x29/0x3e
[ 4496.965468]  [] do_invalid_op+0xa7/0xb3
[ 4496.965488]  [] :reiserfs:flush_commit_list+0x532/0x60a
[ 4496.965498]  [] __wait_on_bit+0x67/0x77
[ 4496.965508]  [] sync_buffer+0x0/0x42
[ 4496.965516]  [] sync_buffer+0x0/0x42
[ 4496.965525]  [] error_exit+0x0/0x84
[ 4496.965545]  [] :reiserfs:flush_commit_list+0x532/0x60a
[ 4496.965556]  [] __wake_up+0x43/0x50
[ 4496.965581]  [] :reiserfs:do_journal_end+0xc95/0xced
[ 4496.965591]  [] find_busiest_group+0x24e/0x68f
[ 4496.965601]  [] keventd_create_kthread+0x0/0x79
[ 4496.965620]  [] :reiserfs:journal_end_sync+0x75/0x7e
[ 4496.965630]  [] pdflush+0x0/0x1d4
[ 4496.965649]  [] :reiserfs:reiserfs_sync_fs+0x41/0x67
[ 4496.965668]  [] :reiserfs:reiserfs_write_super+0xe/0x10
[ 4496.965678]  [] sync_supers+0x67/0xb6
[ 4496.965687]  [] wb_kupdate+0x4d/0x133
[ 4496.965696]  [] pdflush+0x0/0x1d4
[ 4496.965705]  [] pdflush+0x129/0x1d4
[ 4496.965713]  [] wb_kupdate+0x0/0x133
[ 4496.965722]  [] kthread+0xd8/0x10c
[ 4496.965731]  [] schedule_tail+0x45/0xad
[ 4496.965740]  [] child_rip+0xa/0x12
[ 4496.965748]  [] keventd_create_kthread+0x0/0x79
[ 4496.965758]  [] kthread+0x0/0x10c
[ 4496.965772]  [] child_rip+0x0/0x12

msg log: http://oss.oracle.com/~rdunlap/kerneltest/logs/2620-rc5-reis-fsx.log
config:  
http://oss.oracle.com/~rdunlap/kerneltest/configs/config-2620-rc5-reis-fsx

---
~Randy


Re: Linux Software RAID 5 Performance Optimizations: 2.6.19.1: (211MB/s read & 195MB/s write)

2007-01-12 Thread Al Boldi
Justin Piszcz wrote:
> On Sat, 13 Jan 2007, Al Boldi wrote:
> > Justin Piszcz wrote:
> > > Btw, max sectors did improve my performance a little bit but
> > > stripe_cache+read_ahead were the main optimizations that made
> > > everything go faster by about ~1.5x.   I have individual bonnie++
> > > benchmarks of [only] the max_sector_kb tests as well, it improved the
> > > times from 8min/bonnie run -> 7min 11 seconds or so, see below and
> > > then after that is what you requested.
> >
> > Can you repeat with /dev/sda only?
>
> For sda-- (is a 74GB raptor only)-- but ok.

Do you get the same results for the 150GB-raptor on sd{e,g,i,k}?

> # uptime
>  16:25:38 up 1 min,  3 users,  load average: 0.23, 0.14, 0.05
> # cat /sys/block/sda/queue/max_sectors_kb
> 512
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/dev/sda of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 150.891 seconds, 71.2 MB/s
> # echo 192 > /sys/block/sda/queue/max_sectors_kb
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/dev/sda of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 150.192 seconds, 71.5 MB/s
> # echo 128 > /sys/block/sda/queue/max_sectors_kb
> # echo 3 > /proc/sys/vm/drop_caches
> # dd if=/dev/sda of=/dev/null bs=1M count=10240
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB) copied, 150.15 seconds, 71.5 MB/s
>
>
> Does this show anything useful?

Probably a latency issue.  md is highly latency sensitive.
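
For reference, the three dd runs above can be compared with a small sketch like this one. The helper names are made up; the only assumption is that dd reports MB as 10^6 bytes.

```c
#include <assert.h>

/* Throughput in MB/s (MB = 10^6 bytes), the unit dd prints. */
static double throughput_mb_s(unsigned long long bytes, double seconds)
{
	assert(seconds > 0.0);
	return (double)bytes / seconds / 1e6;
}

/* Relative change from one measurement to another, in percent. */
static double percent_change(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (after - before) / before;
}
```

Going from max_sectors_kb=512 (150.891 s) to 128 (150.15 s) for the same 10737418240 bytes works out to a change of about half a percent, so on the single disk the setting makes essentially no difference.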

What CPU type/speed do you have?  Bootlog/dmesg?


Thanks!

--
Al



Re: 2.6.17 - weird, boot CPU (#0) not listed by the BIOS.

2007-01-12 Thread Len Brown
On Friday 12 January 2007 10:50, Mark Hounschell wrote:
> Mark Hounschell wrote:
> > I have a Tyan S4881 Thunder K8QW 4 processor (8 cores). Kernel 2.6.16.37 
> > boots
> > and runs fine.
> > However kernel 2.6.17 and up doesn't. Here is my boot error msg.
> > 
> > 
> > kernel /vmlinuz-2.6.17-smp  root=/dev/sda5inux version 2.6.17-smp ([EMAIL 
> > PROTECTED])
> > (gcc version 4.1.0 (SUSE Linux)) #1 SMP PREEMPT Fri Jan 12 07:53:35 EST 2007
> > BIOS-provided physical RAM map:
> >  BIOS-e820:  - 00093800 (usable)
> >  BIOS-e820: 00093800 - 000a (reserved)
> >  BIOS-e820: 000c2000 - 0010 (reserved)
> >  BIOS-e820: 0010 - cfea (usable)
> >  BIOS-e820: cfea - cfea4000 (ACPI data)
> >  BIOS-e820: cfea4000 - cff0 (ACPI NVS)
> >  BIOS-e820: cff0 - d000 (reserved)
> >  BIOS-e820: e000 - f000 (reserved)
> >  BIOS-e820: fec0 - fec00400 (reserved)
> >  BIOS-e820: fee0 - fee01000 (reserved)
> >  BIOS-e820: fff8 - 0001 (reserved)
> >  BIOS-e820: 0001 - 00023000 (usable)
> > Warning only 4GB will be used.
> > Use a PAE enabled kernel.
> > 3200MB HIGHMEM available.
> > 896MB LOWMEM available.
> > found SMP MP-table at 000f71f0
> > DMI present.
> > ACPI: PM-Timer IO Port: 0x8008
> > ACPI: LAPIC (acpi_id[0x00] lapic_id[0x10] enabled)
> > Processor #16 15:1 APIC version 16

The APIC id for the 1st processor here is 16.
Usually it is 0.

Apparently this has confused some of the smpboot code
with all their new nifty bitmaps for processors online and offline...

Does the latest kernel work any better, say 2.6.19?
What if you throw CONFIG_NR_CPUS=32 at it?

-Len

> > ACPI: LAPIC (acpi_id[0x01] lapic_id[0x11] enabled)
> > Processor #17 15:1 APIC version 16
> > ACPI: LAPIC (acpi_id[0x02] lapic_id[0x12] enabled)
> > Processor #18 15:1 APIC version 16
> > ACPI: LAPIC (acpi_id[0x03] lapic_id[0x13] enabled)
> > Processor #19 15:1 APIC version 16
> > ACPI: LAPIC (acpi_id[0x04] lapic_id[0x14] enabled)
> > Processor #20 15:1 APIC version 16
> > ACPI: LAPIC (acpi_id[0x05] lapic_id[0x15] enabled)
> > Processor #21 15:1 APIC version 16
> > ACPI: LAPIC (acpi_id[0x06] lapic_id[0x16] enabled)
> > Processor #22 15:1 APIC version 16
> > ACPI: LAPIC (acpi_id[0x07] lapic_id[0x17] enabled)
> > Processor #23 15:1 APIC version 16
> > ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
> > ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
> > ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
> > ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
> > ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
> > ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
> > ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
> > ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
> > ACPI: IOAPIC (id[0x00] address[0xfec0] gsi_base[0])
> > IOAPIC[0]: apic_id 0, version 17, address 0xfec0, GSI 0-23
> > ACPI: IOAPIC (id[0x01] address[0xda20] gsi_base[24])
> > IOAPIC[1]: apic_id 1, version 17, address 0xda20, GSI 24-27
> > ACPI: IOAPIC (id[0x02] address[0xda201000] gsi_base[28])
> > IOAPIC[2]: apic_id 2, version 17, address 0xda201000, GSI 28-31
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> > Enabling APIC mode:  Flat.  Using 3 I/O APICs
> > Using ACPI (MADT) for SMP configuration information
> > Allocating PCI resources starting at d100 (gap: d000:1000)
> > Built 1 zonelists
> > Kernel command line: root=/dev/sda5 vga=normal resume=/dev/sda2  
> > splash=silent
> > "console=ttyS0,19200"
> > Enabling fast FPU save and restore... done.
> > Enabling unmasked SIMD FPU exception support... done.
> > Initializing CPU#0
> > PID hash table entries: 4096 (order: 12, 16384 bytes)
> > Detected 2411.454 MHz processor.
> > Using pmtmr for high-res timesource
> > Console: colour VGA+ 80x25
> > Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
> > Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> > Memory: 3366304k/4194304k available (1529k kernel code, 38968k reserved, 
> > 633k
> > data, 184k init, 2488960k highmem)
> > Checking if this processor honours the WP bit even in supervisor mode... Ok.
> > Calibrating delay using timer specific routine.. 4827.61 BogoMIPS 
> > (lpj=9655232)
> > Security Framework v1.0.0 initialized
> > Capability LSM initialized
> > Mount-cache hash table entries: 512
> > CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> > CPU: L2 Cache: 1024K (64 bytes/line)
> > CPU 0(2) -> Core 0
> > Intel machine check architecture supported.
> > Intel machine check reporting enabled on CPU#0.
> > Checking 'hlt' instruction... OK.
> > Freeing SMP alternatives: 12k freed
> > ACPI Warning (nsload-0106): Zero-length AML block in table [SSDT] [20060127]
> > CPU0: AMD Athlon(tm) or Opteron(tm) CPU-model unknown stepping 

Re: Linux v2.6.20-rc5

2007-01-12 Thread Jeff Chua



From: Jeff Chua <[EMAIL PROTECTED]>


  CC [M]  drivers/kvm/vmx.o
{standard input}: Assembler messages:
{standard input}:3257: Error: bad register name `%sil'
make[2]: *** [drivers/kvm/vmx.o] Error 1
make[1]: *** [drivers/kvm] Error 2
make: *** [drivers] Error 2



I'm not using the kernel profiler, so here's a patch to make it work without 
CONFIG_PROFILING.



Thanks,
Jeff


--- linux/drivers/kvm/vmx.c.org 2007-01-13 12:57:28 +0800
+++ linux/drivers/kvm/vmx.c 2007-01-13 14:01:17 +0800
@@ -21,7 +21,11 @@
 #include 
 #include 
 #include 
+
+#ifdef CONFIG_PROFILING
 #include 
+#endif
+
 #include 
 #include 

@@ -1861,11 +1865,13 @@
asm ("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
 #endif

+#ifdef CONFIG_PROFILING
/*
 * Profile KVM exit RIPs:
 */
if (unlikely(prof_on == KVM_PROFILING))
profile_hit(KVM_PROFILING, (void *)vmcs_readl(GUEST_RIP));
+#endif

kvm_run->exit_type = 0;
if (fail) {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: ahci_softreset prevents acpi_power_off

2007-01-12 Thread Faik Uygur
On Sat, 13 Jan 2007 at 03:12, Tejun Heo wrote:
> Hello,

Hello,

Thanks for the response.

> [...]
> Does everything else work okay?  
> Can you access devices attached to 
> ahci?  

Yes. While the machine is on, there seems to be no problem at all. Everything 
works great.

> What happens when you try to shutdown?  

It does not shut down; it freezes.

Hand-copied last messages seen on the console:

Synchronizing SCSI cache for disk sda:
ACPI: PCI Interrupt for device :06:08.0 disabled
Power down.
acpi_power_off called
  hwsleep-0285 [01] enter_sleep_state: Entering sleep state [S5]

> If possible, please post  
> dmesg of shutting down.  

Following is the netcat output. Please ask if you need anything else.

Regards,
- Faik

Linux version 2.6.20-rc4 ([EMAIL PROTECTED]) (gcc version 3.4.6) #58 SMP Sat Jan 13 07:38:22 EET 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009f800 end: 0009f800 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009f800 size: 0800 end: 000a type: 2
copy_e820_map() start: 000d8000 size: 00028000 end: 0010 type: 2
copy_e820_map() start: 0010 size: 1fd9 end: 1fe9 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 1fe9 size: d000 end: 1fe9d000 type: 3
copy_e820_map() start: 1fe9d000 size: 00063000 end: 1ff0 type: 4
copy_e820_map() start: 1ff0 size: 0010 end: 2000 type: 2
copy_e820_map() start: e000 size: 10006000 end: f0006000 type: 2
copy_e820_map() start: f0008000 size: 4000 end: f000c000 type: 2
copy_e820_map() start: fed2 size: 0007 end: fed9 type: 2
copy_e820_map() start: ff00 size: 0100 end: 0001 type: 2
 BIOS-e820:  - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000d8000 - 0010 (reserved)
 BIOS-e820: 0010 - 1fe9 (usable)
 BIOS-e820: 1fe9 - 1fe9d000 (ACPI data)
 BIOS-e820: 1fe9d000 - 1ff0 (ACPI NVS)
 BIOS-e820: 1ff0 - 2000 (reserved)
 BIOS-e820: e000 - f0006000 (reserved)
 BIOS-e820: f0008000 - f000c000 (reserved)
 BIOS-e820: fed2 - fed9 (reserved)
 BIOS-e820: ff00 - 0001 (reserved)
0MB HIGHMEM available.
510MB LOWMEM available.
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   130704
  HighMem130704 ->   130704
early_node_map[1] active PFN ranges
0:0 ->   130704
DMI 2.3 present.
ACPI: PM-Timer IO Port: 0x1008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:13 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 1, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 3000 (gap: 2000:c000)
Detected 1729.118 MHz processor.
Built 1 zonelists.  Total pages: 129045
Kernel command line: root=/dev/sda1 mudur=language:tr init=/bin/bash 
[EMAIL PROTECTED]/eth0,[EMAIL PROTECTED]/00:13:02:50:5C:2B
netconsole: local port 
netconsole: local IP 192.168.1.8
netconsole: interface eth0
netconsole: remote port 9353
netconsole: remote IP 192.168.1.3
netconsole: remote ethernet address 00:13:02:50:5c:2b
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Console: colour VGA+ 80x25
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:8
... MAX_LOCK_DEPTH:  30
... MAX_LOCKDEP_KEYS:2048
... CLASSHASH_SIZE:   1024
... MAX_LOCKDEP_ENTRIES: 8192
... MAX_LOCKDEP_CHAINS:  16384
... CHAINHASH_SIZE:  8192
 memory used by lock dependency info: 1064 kB
 per task-struct memory footprint: 1200 bytes

| Locking API testsuite:

 | spin |wlock |rlock |mutex | wsem | rsem |
  --
 A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
 A-B-B-A deadlock:  ok  |  ok  |  ok 

Re: [PATCH 0/4] Linux Kernel Markers

2007-01-12 Thread Mathieu Desnoyers
Hi Richard,

* Richard J Moore ([EMAIL PROTECTED]) wrote:
> 
> 
> Mathieu Desnoyers <[EMAIL PROTECTED]> wrote on 20/12/2006 23:52:16:
> 
> > Hi,
> >
> > You will find, in the following posts, the latest revision of the Linux
> > Kernel Markers. Due to the need some tracing projects (LTTng, SystemTAP)
> > have for this kind of mechanism, it could be nice to consider it for
> > mainstream inclusion.
> >
> > The following patches apply on 2.6.20-rc1-git7.
> >
> > Signed-off-by : Mathieu Desnoyers <[EMAIL PROTECTED]>
> 
> Mathieu, FWIW I like this idea. A few years ago I implemented something
> similar, but that had no explicit clients. Consequently I made my hooks
> code more generalized than is needed in practice. I do remember that Karim
> reworked the LTT instrumentation to use hooks and it worked fine.
> 

Yes, I think some features you implemented in GKHI, like chained calls to
multiple probes, should be implemented in a "probe management module" which
would be built on top of the marker infrastructure. One of my goals is to
concentrate on having the core right so that, afterward, building on top of it
will be easy.

> You've got the same optimizations for x86 by modifying an instruction's
> immediate operand and thus avoiding a d-cache hit. The only real caveat is
> the need to avoid the unsynchronised cross modification erratum. Which
> means that all processors will need to issue a serializing operation before
> executing a Marker whose state is changed. How is that handled?
> 

Good catch. I thought that modifying only 1 byte would spare us from this
erratum, but looking at it in detail tells me that it's not the case.

I see three different ways to address the problem:

1 - Adding some synchronization code in the marker and using
    synchronize_sched().
2 - Using an IPI to make other CPUs busy-loop while we change the code and
    then execute a serializing instruction (iret, cpuid...).
3 - First, write an int3 in place of the instruction's first byte. The
    handler would do the following:

      int3_handler:
        single-step the original instruction.
        iret

    Secondly, we send an IPI that runs smp_processor_id() on each CPU and
    wait for them to complete. This makes sure we execute a synchronizing
    instruction on every CPU even if we do not execute the trap handler.

    Then, we atomically write the new 2-byte instruction over the int3
    and immediate value.


I exclude (1) because of the performance impact and (2) because it does not
deal with NMIs. That leaves (3). Does it make sense?
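
The ordering of option (3) can be sketched in userspace C. This is only a
model, under loud assumptions: the marker site is a hypothetical 2-byte
"mov $imm8, %al" held in one 16-bit atomic word, sync_all_cpus() is a
stand-in for the IPI round (here just a fence and a counter), and none of
the names below come from the marker patches themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu  /* x86 breakpoint instruction */
#define MOV_AL_IMM8 0xB0u  /* x86 "mov $imm8, %al" opcode */

/*
 * Hypothetical model of one marker site: a 2-byte instruction held in a
 * 16-bit word, opcode in the low byte and immediate in the high byte
 * (the order the bytes have in memory on little-endian x86).
 */
typedef _Atomic uint16_t marker_site_t;

/* Number of times the stand-in for the cross-CPU IPI round has run. */
static unsigned int sync_calls;

/* Stand-in for "IPI every CPU and wait": only a full fence in this model. */
static void sync_all_cpus(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    sync_calls++;
}

/* Pack opcode and immediate into the 16-bit model word. */
static uint16_t encode(uint8_t opcode, uint8_t imm)
{
    return (uint16_t)(((unsigned int)imm << 8) | opcode);
}

/* Apply option (3): int3 first, synchronize, then the final 2-byte store. */
static void marker_set_imm(marker_site_t *site, uint8_t new_imm)
{
    assert(site);
    uint16_t old = atomic_load(site);

    /* Step 1: overwrite only the first (opcode) byte with int3. */
    atomic_store(site, (uint16_t)((old & 0xFF00u) | INT3_OPCODE));

    /* Step 2: make every CPU go through a serializing operation. */
    sync_all_cpus();

    /* Step 3: replace int3 and immediate with one atomic 16-bit store. */
    atomic_store(site, encode(MOV_AL_IMM8, new_imm));
}
```

In this model a site initialised with atomic_init(&site, encode(MOV_AL_IMM8, 0))
ends up holding opcode 0xB0 with the new immediate after marker_set_imm(), and
a concurrent reader only ever sees the old instruction, the int3 form, or the
new instruction. The model does not cover the trap handler's single-step path.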


> One additional thing we did, which might be useful at some future point,
> was adding a /proc interface. We reflected the current instrumentation
> through /proc and gave the status of each hook. We even talked about being
> able to enable or disable instrumentation by writing to /proc but I don't
> think we ever implemented this.
> 

A /proc output listing the active probes and their callbacks will be
trivial to add to the markers. I think the probe management module should
have its own /proc file too, to list the chains of connected handlers once
we get there.

> It's high time we settled the issue of instrumentation. It gets my vote,
> 
> Good luck!
> 
> Richard
> 

Thanks,

Mathieu

> - -
> Richard J Moore
> IBM Linux Technology Centre
> 

-- 
OpenPGP public key:  http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 


[PATCH] 2.6.15-rc5 - removes "video device notify" message (fwd)

2007-01-12 Thread Jeff Chua



Here's a one-line fix to silence the "video device notify" message ...

--- linux/drivers/acpi/video.c.org  2007-01-12 23:05:23 +0800
+++ linux/drivers/acpi/video.c  2007-01-12 23:05:29 +0800
@@ -1771,1 +1771,1 @@
-   printk("video device notify\n");
+   //printk("video device notify\n");



Thanks,
Jeff


Re: Linux v2.6.20-rc5

2007-01-12 Thread Adrian Bunk
On Fri, Jan 12, 2007 at 02:26:45PM -0800, Andrew Morton wrote:
> On Fri, 12 Jan 2007 14:27:48 -0500 (EST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Ok, there it is, in all its shining glory.
> 
> It still doesn't run Excel.
>...

It should work with CrossOver.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: tuning/tweaking VM settings for low memory (preventing OOM)

2007-01-12 Thread Nick Piggin

Kumar Gala wrote:
I'm working on an embedded PPC setup with 64M of memory and no swap.   
I'm trying to figure out how best to tune the VM for an OOM situation  
I'm running into.


I'm running a 2.6.16.35 kernel and have a bittorrent app that appears  
to be initializing a large file for it to download into.  What I see  
before running the app:


/bigfoot/usb_disk # cat /proc/meminfo
MemTotal:        62520 kB
MemFree:         49192 kB
Buffers:          8240 kB
Cached:            740 kB
SwapCached:          0 kB
Active:           8196 kB
Inactive:         1236 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        62520 kB
LowFree:         49192 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:            916 kB
Slab:             2224 kB
CommitLimit:     31260 kB
Committed_AS:     1704 kB
PageTables:         88 kB
VmallocTotal:   933872 kB
VmallocUsed:      9416 kB
VmallocChunk:   923628 kB

after the OOM:

/bigfoot/usb_disk # cat /proc/meminfo
MemTotal:        62520 kB
MemFree:          1608 kB
Buffers:          8212 kB
Cached:          42780 kB
SwapCached:          0 kB
Active:           6228 kB
Inactive:        45176 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        62520 kB
LowFree:          1608 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:           35208 kB
Writeback:        5616 kB
Mapped:            892 kB
Slab:             7788 kB
CommitLimit:     31260 kB
Committed_AS:     1704 kB
PageTables:         88 kB
VmallocTotal:   933872 kB
VmallocUsed:      9416 kB
VmallocChunk:   923628 kB

Which makes me think that we aren't writing back fast enough.  If I  
mount the drive "sync" the issue clearly goes away.


It appears from an strace we are doing ftruncate64(5, 178257920) when  
we OOM.


Any ideas on VM parameters to tweak so we throttle this from occurring?


You don't give us the actual OOM message. In newer kernels, there has been
quite a bit of work done to improve the OOM situation -- search changelogs
in mm/oom_kill.c mm/vmscan.c mm/page_alloc.c.
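
To put a number on the "not writing back fast enough" suspicion, the quoted
meminfo snapshots can be reduced to one figure: the share of total memory
that is dirty or under writeback. The sketch below is illustrative only; the
helper names meminfo_kb() and dirty_percent() are made up for this example,
and it parses /proc/meminfo-style text passed in as a string rather than
reading the file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the value in kB of "key:" from /proc/meminfo-style text, or -1. */
static long meminfo_kb(const char *text, const char *key)
{
    assert(text && key);
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* A field matches only at the start of a line, followed by ':'. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return strtol(line + klen + 1, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Percentage of total memory that is dirty or under writeback, or -1. */
static long dirty_percent(const char *text)
{
    long total = meminfo_kb(text, "MemTotal");
    long dirty = meminfo_kb(text, "Dirty");
    long wb = meminfo_kb(text, "Writeback");

    if (total <= 0 || dirty < 0 || wb < 0)
        return -1;
    return (dirty + wb) * 100 / total;
}
```

For the post-OOM snapshot this gives (35208 + 5616) * 100 / 62520, which is
65 with integer division. The sysctls usually looked at for throttling
writers are vm.dirty_ratio and vm.dirty_background_ratio; whether lowering
them helps on this particular setup is not established in this thread.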

--
SUSE Labs, Novell Inc.




Re: O_DIRECT question

2007-01-12 Thread Nick Piggin

Bill Davidsen wrote:

The point is that if you want to be able to allocate at all, sometimes 
you will have to write dirty pages, garbage collect, and move or swap 
programs. The hardware is just too limited to do something less painful, 
and the user can't see memory to do things better. Linus is right, 
'Claiming that there is a "proper solution" is usually a total red 
herring. Quite often there isn't, and the "paper over" is actually not 
papering over, it's quite possibly the best solution there is.' I think 
any solution is going to be ugly, unfortunately.


It seems quite robust and clean to me, actually. Any userspace memory
that absolutely must be a large contiguous region has to be allocated at
boot or from a pool reserved at boot. All other allocations can be broken
into smaller ones.

Writing dirty pages, garbage collecting, and moving or swapping programs
isn't going to be robust because there is lots of vital kernel memory that
cannot be moved and will cause fragmentation.

The reclaimable zone work that went on a while ago for hugepages is
exactly how you would also fix this problem and still have a reasonable
degree of flexibility at runtime. It isn't really ugly or hard, compared
with some of the non-working "solutions" that have been proposed.

The other good thing is that the core mm already has practically
everything required, so the functionality is unintrusive.

--
SUSE Labs, Novell Inc.




Re: High lock spin time for zone->lru_lock under extreme conditions

2007-01-12 Thread Nick Piggin

Ravikiran G Thirumalai wrote:

Hi,
We noticed high interrupt hold off times while running some memory intensive
tests on a Sun x4600 8 socket 16 core x86_64 box.  We noticed softlockups,


[...]


We did not use any lock debugging options and used plain old rdtsc to
measure cycles.  (We disable cpu freq scaling in the BIOS). All we did was
this:

void __lockfunc _spin_lock_irq(spinlock_t *lock)
{
        local_irq_disable();
>       rdtsc(t1);
        preempt_disable();
        spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
        _raw_spin_lock(lock);
>       rdtsc(t2);
        if (lock->spin_time < (t2 - t1))
                lock->spin_time = t2 - t1;
}

On some runs, we found that the zone->lru_lock spun for 33 seconds or more
while the maximal CS time was 3 seconds or so.


What is the "CS time"?

It would be interesting to know how long the maximal lru_lock *hold* time is,
which could give us a better indication of whether it is a hardware problem.

For example, if the maximum hold time is 10ms, then it might indicate a
hardware fairness problem.
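
To tell the two quantities apart, the wait to acquire the lock and the time
it is held can be recorded separately. What follows is a userspace sketch of
that bookkeeping and not the kernel patch from this thread: struct timed_lock
and its functions are hypothetical, it spins on a C11 atomic_flag, and it
uses clock_gettime(CLOCK_MONOTONIC) where the kernel instrumentation used
rdtsc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace analogue of a spinlock with timing fields. */
struct timed_lock {
    atomic_flag held;
    uint64_t acquired_ns;  /* timestamp taken right after acquisition */
    uint64_t max_spin_ns;  /* longest wait to acquire the lock */
    uint64_t max_hold_ns;  /* longest time the lock was held */
};

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void timed_lock_init(struct timed_lock *l)
{
    assert(l);
    atomic_flag_clear(&l->held);
    l->acquired_ns = 0;
    l->max_spin_ns = 0;
    l->max_hold_ns = 0;
}

static void timed_lock_acquire(struct timed_lock *l)
{
    uint64_t t1 = now_ns();

    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* spin until the current holder releases */

    uint64_t t2 = now_ns();

    /* The statistics are only touched while the lock is held. */
    if (l->max_spin_ns < t2 - t1)
        l->max_spin_ns = t2 - t1;
    l->acquired_ns = t2;
}

static void timed_lock_release(struct timed_lock *l)
{
    uint64_t held_for = now_ns() - l->acquired_ns;

    if (l->max_hold_ns < held_for)
        l->max_hold_ns = held_for;
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Comparing max_spin_ns with max_hold_ns times the number of waiters is the
kind of check suggested above: a spin time far beyond that product would
point at unfairness rather than at long critical sections.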

--
SUSE Labs, Novell Inc.




Re: 2.6.19.1 failing

2007-01-12 Thread Randy Dunlap
On Sat, 13 Jan 2007 03:58:19 +0100 Von Wolher wrote:

> Hi,
> 
> I just built a 2.6.19.1 vanilla kernel based on the previous config
> (make oldconfig) but for some reason it is not starting. Despite
> following the usual procedure with lilo like many times before it seems
> that lilo tries to boot it and jumps back to the menu screen.

Was your previous config 2.6.18* or 2.6.19?
If it was 2.6.18* and you are using SATA, the config symbol
names for SATA changed and you'll need to set them via make *config.

Otherwise we'll probably need more info.

> But selecting the old kernel boots just fine.
> 
> Can anyone advise on what could cause such behaviour, besides the obvious
> steps like checking that I ran lilo after the kernel compile, checking paths ...


---
~Randy


Re: [PATCH] Kdump documentation update for 2.6.20: ia64 portion

2007-01-12 Thread Horms
Hi,

this patch fills in the portions for ia64 kexec.

I'm actually not sure what options are required for the dump-capture
kernel, but "init 1 irqpoll maxcpus=1" has been working fine for me.
Or more to the point, I'm not sure if irqpoll is needed or not.

This patch requires the documentation patch update that Vivek Goyal has
been circulating, and I believe is currently in mm. Feel free to fold it
into that change if it makes things easier for anyone.

Take II

Nanhai,

I have noted that vmlinux.gz may also be used. And added a note about the
kernel being able to automatically place the crashkernel region.
Furthermore, I added a note that if manually specified, the region should
be 64Mb aligned to avoid wastage. I notice that the auto placement code
uses 64Mb. But is this strictly necessary for all page sizes?
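
As a worked example of the wastage mentioned above, the space lost below the
alignment point is the distance from the requested start address up to the
next 64 MiB boundary. The helper names here are invented for illustration
and are not taken from the ia64 crashkernel code.

```c
#include <assert.h>
#include <stdint.h>

#define CRASH_ALIGN (64ull << 20)  /* 64 MiB, the alignment discussed above */

/* Round addr up to the next multiple of align (align must be a power of 2). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Bytes lost below the alignment point when a region starts at start. */
static uint64_t crash_region_waste(uint64_t start)
{
    return align_up(start, CRASH_ALIGN) - start;
}
```

A start address of 0x5000000 (80 MiB) is rounded up to 0x8000000 (128 MiB),
so 0x3000000 bytes (48 MiB) of the reservation go unused, while a start
address that is already a multiple of 64 MiB wastes nothing.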

Take III

Fixed some typos, thanks to Andreas Schwab

Signed-off-by: Simon Horman <[EMAIL PROTECTED]>

Index: linux-2.6/Documentation/kdump/kdump.txt
===
--- linux-2.6.orig/Documentation/kdump/kdump.txt2007-01-12 
17:45:19.0 +0900
+++ linux-2.6/Documentation/kdump/kdump.txt 2007-01-12 17:59:42.0 
+0900
@@ -17,7 +17,7 @@
 memory image to a dump file on the local disk, or across the network to
 a remote system.
 
-Kdump and kexec are currently supported on the x86, x86_64, ppc64 and IA64
+Kdump and kexec are currently supported on the x86, x86_64, ppc64 and ia64
 architectures.
 
 When the system kernel boots, it reserves a small section of memory for
@@ -229,7 +229,23 @@
 
 Dump-capture kernel config options (Arch Dependent, ia64)
 --
-(To be filled)
+
+- No specific options are required to create a dump-capture kernel
+  for ia64, other than those specified in the arch independent section
+  above. This means that it is possible to use the system kernel
+  as a dump-capture kernel if desired.
+  
+  The crashkernel region can be automatically placed by the system
+  kernel at run time. This is done by specifying the base address as 0,
+  or omitting it all together.
+
+  [EMAIL PROTECTED]
+  or
+  crashkernel=256M
+
+  If the start address is specified, note that the start address of the
+  kernel will be aligned to 64Mb, so if the start address is not then
+  any space below the alignment point will be wasted.
 
 
 Boot into System Kernel
@@ -248,6 +264,10 @@
 
On ppc64, use "[EMAIL PROTECTED]".
 
+   On ia64, [EMAIL PROTECTED] is a generous value that typically works.
+   The region may be automatically placed on ia64, see the
+   dump-capture kernel config option notes above.
+
 Load the Dump-capture Kernel
 
 
@@ -266,7 +286,8 @@
 For ppc64:
- Use vmlinux
 For ia64:
-   (To be filled)
+   - Use vmlinux or vmlinuz.gz
+
 
 If you are using a uncompressed vmlinux image then use following command
 to load dump-capture kernel.
@@ -282,18 +303,19 @@
--initrd= \
--append="root= "
 
+Please note, that --args-linux does not need to be specified for ia64.
+It is planned to make this a no-op on that architecture, but for now
+it should be omitted
+
 Following are the arch specific command line options to be used while
 loading dump-capture kernel.
 
-For i386 and x86_64:
+For i386, x86_64 and ia64:
"init 1 irqpoll maxcpus=1"
 
 For ppc64:
"init 1 maxcpus=1 noirqdistrib"
 
-For IA64
-   (To be filled)
-
 
 Notes on loading the dump-capture kernel:
 


Re: [Fastboot] [PATCH] Kdump documentation update for 2.6.20: ia64 portion

2007-01-12 Thread Horms
On Fri, Jan 12, 2007 at 11:46:39AM -0800, Jay Lan wrote:
> Horms wrote:
> > Hi,
> > 
> > this patch fills in the portions for ia64 kexec.
> > 
> > I'm actually not sure what options are required for the dump-capture
> > kernel, but "init 1 irqpoll maxcpus=1" has been working fine for me.
> > Or more to the point, I'm not sure if irqpoll is needed or not.
> > 
> > This patch requires the documentation patch update that Vivek Goyal has
> > been circulating, and I believe is currently in mm. Feel free to fold it
> > into that change if it makes things easier for anyone.
> > 
> > Take II
> > 
> > Nanhai,
> > 
> > I have noted that vmlinux.gz may also be used. And added a note about the
> > kernel being able to automatically place the crashkernel region.
> > Furthermore, I added a note that if manually specified, the region should
> > be 64Mb aligned to avoid wastage. I notice that the auto placement code
> > uses 64Mb. But is this strictly necessary for all page sizes?
> > 
> > Signed-off-by: Simon Horman <[EMAIL PROTECTED]>
> > 
> > Index: linux-2.6/Documentation/kdump/kdump.txt
> > ===
> > --- linux-2.6.orig/Documentation/kdump/kdump.txt2007-01-12 
> > 17:45:19.0 +0900
> > +++ linux-2.6/Documentation/kdump/kdump.txt 2007-01-12 17:59:42.0 
> > +0900
> > @@ -17,7 +17,7 @@
> >  memory image to a dump file on the local disk, or across the network to
> >  a remote system.
> >  
> > -Kdump and kexec are currently supported on the x86, x86_64, ppc64 and IA64
> > +Kdump and kexec are currently supported on the x86, x86_64, ppc64 and ia64
> >  architectures.
> >  
> >  When the system kernel boots, it reserves a small section of memory for
> > @@ -229,7 +229,23 @@
> >  
> >  Dump-capture kernel config options (Arch Dependent, ia64)
> >  --
> > -(To be filled)
> > +
> > +- No specific options are required to create a dump-capture kernel
> > +  for ia64, other than those specified in the arch independent section
> > +  above. This means that it is possible to use the system kernel
> > +  as a dump-capture kernel if desired.
> > +  
> > +  The crashkernel region can be automatically placed by the system
> > +  kernel at run time. This is done by specifying the base address as 0,
> > +  or omitting it all together.
> 
> In my testing, i found the base address was ignored. Whatever value
> specified was fine. Not necessary to be 0. But i guess it is fine to
> give people a guideline telling them to specify 0.

I submitted a patch to honour non-zero base addresses,
I'm pretty sure it is in there now.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



Re: 2.6.20-rc3 regression: suspend to RAM broken on Mac mini Core Duo

2007-01-12 Thread Tino Keitel
On Sat, Jan 13, 2007 at 04:05:28 +0100, Tino Keitel wrote:

[...]

> I think I found the problem. In 2.6.18, I had a slightly different
> config. With 2.6.20-rc4, I had successful suspend/resume cycles without
> the USB DVB-T box attached. I tweaked the USB options a bit and
> activated some options (CONFIG_USB_SUSPEND,
> CONFIG_USB_MULTITHREAD_PROBE, CONFIG_USB_EHCI_SPLIT_ISO,
> CONFIG_USB_EHCI_ROOT_HUB_TT, CONFIG_USB_EHCI_TT_NEWSCHED) and now I can
> suspend/resume without hangs. At least I haven't seen one until now.

Just after I sent the mail, I had 2 failures again. :-(

Regards,
Tino


Re: Linux v2.6.20-rc5

2007-01-12 Thread Jeff Chua

On 1/13/07, Jeff Chua <[EMAIL PROTECTED]> wrote:

On 1/13/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Fri, 12 Jan 2007 14:27:48 -0500 (EST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:

  CC [M]  drivers/kvm/vmx.o
{standard input}: Assembler messages:
{standard input}:3257: Error: bad register name `%sil'
make[2]: *** [drivers/kvm/vmx.o] Error 1
make[1]: *** [drivers/kvm] Error 2
make: *** [drivers] Error 2

Am I missing something or is this a real problem?
Applied 2.6.20-rc5-mm-fixes and got this problem.
Using gcc version 3.4.5, binutils-2.17.50.0.8


Same problem with vanilla linux-2.6.20-rc5.

Thanks,
Jeff.


Re: Choosing a HyperThreading/SMP/MultiCore kernel ?

2007-01-12 Thread Valdis . Kletnieks
On Fri, 12 Jan 2007 10:03:49 EST, Lennart Sorensen said:
>
> I would expect any distribution should work on these (as long as the
> kernel they use isn't too old.).  Of course if it is a Mac, you need a
> distribution that supports their firmware (which is of course not a PC
> bios).  As long as you can boot it, any i386 or amd64 kernel with smp
> enabled should use all the processors present (well amd64 on the
> core2duo and on the p4 if it is em64t enabled).

amd64 will only work on a core2duo if it's a T7200 or higher - the
lower numbers are 32-bit-only chipsets.  I admit not knowing what
exact variant the Mac has.

> I believe the closest optimization for a Core2 is probably the Pentium M
> (certainly not the P4/netburst).  Not entirely sure though.

CONFIG_MCORE2=y

That's probably even closer :)  At least in 2.6.20-rc4-mm1.  




Re: Linux v2.6.20-rc5

2007-01-12 Thread Jeff Chua

On 1/13/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

On Fri, 12 Jan 2007 14:27:48 -0500 (EST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:



http://userweb.kernel.org/~akpm/2.6.20-rc5-mm-fixes
The KVM and direct-io changes are significant, so if people are testing
those things, please be sure to have that patch applied.


 CC [M]  drivers/kvm/vmx.o
{standard input}: Assembler messages:
{standard input}:3257: Error: bad register name `%sil'
make[2]: *** [drivers/kvm/vmx.o] Error 1
make[1]: *** [drivers/kvm] Error 2
make: *** [drivers] Error 2

Am I missing something or is this a real problem?

Applied 2.6.20-rc5-mm-fixes and got this problem.

Using gcc version 3.4.5, binutils-2.17.50.0.8


Thanks,
Jeff.


2.6.19.1 failing

2007-01-12 Thread Von Wolher
Hi,

I just built a 2.6.19.1 vanilla kernel based on the previous config
(make oldconfig) but for some reason it is not starting. Despite
following the usual procedure with lilo like many times before it seems
that lilo tries to boot it and jumps back to the menu screen.

But selecting the old kernel boots just fine.

Can anyone advise on what could cause such behaviour, besides the obvious
steps like checking that I ran lilo after the kernel compile, checking paths ...

Thanks

Mark


[patch 4/7] mm: merge populate and nopage into fault (fixes nonlinear)

2007-01-12 Thread Nick Piggin
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that
encodes the virtual address -> file offset differently from linear
mappings.

I can't see why the filesystem/pagecache code should need to know anything
about it, except for the fact that the ->nopage handler didn't quite pass
down enough information (ie. pgoff). But it is more logical to pass pgoff
rather than have the ->nopage function calculate it itself anyway. And
having the nopage handler install the pte itself is sort of nasty.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn. Most of the old mechanism is still in place,
so there is a lot of duplication that can be removed, and some nice cleanups
that become possible, if everyone switches over.

The rationale for doing this in the first place is that nonlinear mappings
are subject to the pagefault vs invalidate/truncate race too, and it seemed
stupid to duplicate the synchronisation logic rather than just consolidate
the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache. Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
no users have hit mainline yet.
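
To illustrate why passing pgoff down is enough, here is a userspace sketch
of how the offset follows from the faulting address for a linear mapping.
struct fault_data mirrors the fields added by this patch; struct vma_model,
linear_pgoff() and make_linear_fault() are stand-ins invented for the
example, and PAGE_SHIFT of 12 assumes 4 KiB pages.

```c
#include <assert.h>

#define PAGE_SHIFT 12  /* assumes 4 KiB pages */

typedef unsigned long pgoff_t;

/* Minimal stand-in for the vma fields used here (not the kernel struct). */
struct vma_model {
    unsigned long vm_start;  /* first virtual address of the mapping */
    pgoff_t vm_pgoff;        /* file offset of vm_start, in pages */
};

/* Same fields as the struct fault_data introduced by this patch. */
struct fault_data {
    unsigned long address;
    pgoff_t pgoff;
    unsigned int flags;
    int type;
};

/* Linear mapping: the file page offset follows directly from the address. */
static pgoff_t linear_pgoff(const struct vma_model *vma, unsigned long address)
{
    assert(vma && address >= vma->vm_start);
    return ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
}

/* Fill in fault_data the way a caller could for a linear fault. */
static struct fault_data make_linear_fault(const struct vma_model *vma,
                                           unsigned long address,
                                           unsigned int flags)
{
    struct fault_data fd = {
        .address = address,
        .pgoff = linear_pgoff(vma, address),
        .flags = flags,
        .type = 0,
    };
    return fd;
}
```

For a nonlinear mapping the caller would take pgoff from the file pte instead
of computing it, and a ->fault implementation that looks only at fdata->pgoff
would not need to know the difference.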

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/linux/mm.h
===
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -168,11 +168,12 @@ extern unsigned int kobjsize(const void 
 #define VM_NONLINEAR   0x0080  /* Is non-linear (remap_file_pages) */
 #define VM_MAPPED_COPY 0x0100  /* T if mapped copy of data (nommu 
mmap) */
 #define VM_INSERTPAGE  0x0200  /* The vma has had "vm_insert_page()" 
done on it */
-#define VM_CAN_INVALIDATE  0x0400  /* The mapping may be 
invalidated,
+#define VM_CAN_INVALIDATE 0x0400   /* The mapping may be invalidated,
 * eg. truncate or invalidate_inode_*.
 * In this case, do_no_page must
 * return with the page locked.
 */
+#define VM_CAN_NONLINEAR 0x0800/* Has ->fault & does nonlinear pages */
 
 #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
@@ -196,6 +197,26 @@ extern unsigned int kobjsize(const void 
  */
 extern pgprot_t protection_map[16];
 
+#define FAULT_FLAG_WRITE   0x01
+#define FAULT_FLAG_NONLINEAR   0x02
+
+/*
+ * fault_data is filled in by the pagefault handler and passed to the
+ * vma's ->fault function. That function is responsible for filling in
+ * 'type', which is the type of fault if a page is returned, or the type
+ * of error if NULL is returned.
+ *
+ * pgoff should be used in favour of address, if possible. If pgoff is
+ * used, one may set VM_CAN_NONLINEAR in the vma->vm_flags to get
+ * nonlinear mapping support.
+ */
+struct fault_data {
+   unsigned long address;
+   pgoff_t pgoff;
+   unsigned int flags;
+
+   int type;
+};
 
 /*
  * These are the virtual MM functions - opening of an area, closing and
@@ -205,6 +226,7 @@ extern pgprot_t protection_map[16];
 struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
+   struct page * (*fault)(struct vm_area_struct *vma, struct fault_data * 
fdata);
struct page * (*nopage)(struct vm_area_struct * area, unsigned long 
address, int *type);
unsigned long (*nopfn)(struct vm_area_struct * area, unsigned long 
address);
int (*populate)(struct vm_area_struct * area, unsigned long address, 
unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
@@ -635,7 +657,6 @@ static inline int page_mapped(struct pag
  */
 #define NOPAGE_SIGBUS  (NULL)
 #define NOPAGE_OOM ((struct page *) (-1))
-#define NOPAGE_REFAULT ((struct page *) (-2))  /* Return to userspace, rerun */
 
 /*
  * Error return values for the *_nopfn functions
@@ -669,14 +690,13 @@ extern void pagefault_out_of_memory(void
 extern void show_free_areas(void);
 
 #ifdef CONFIG_SHMEM
-struct page *shmem_nopage(struct vm_area_struct *vma,
-   unsigned long address, int *type);
+struct page *shmem_fault(struct vm_area_struct *vma, struct fault_data *fdata);
 int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *new);
 struct mempolicy *shmem_get_policy(struct vm_area_struct *vma,
unsigned long addr);
 int shmem_lock(struct file *file, int lock, struct user_struct *user);
 #else
-#define shmem_nopage filemap_nopage
+#define shmem_fault filemap_fault
 
 static inline int shmem_lock(struct file *file, int lock,
 struct user_struct *user)
@@ -1069,9 +1089,11 @@ extern void truncate_inode_pages_range(s
 

[patch 7/7] mm: remove legacy cruft

2007-01-12 Thread Nick Piggin
Remove legacy filemap_nopage and all of the .populate API cruft.

This patch is optional and can be left out (eg. for a cleaner merge with -mm),
and rebased after the previous patches go upstream.

 include/linux/mm.h |9 --
 mm/filemap.c   |  195 -
 mm/fremap.c|   71 ++-
 mm/memory.c|   37 ++
 4 files changed, 21 insertions(+), 291 deletions(-)

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/linux/mm.h
===
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -228,8 +228,6 @@ struct vm_operations_struct {
void (*close)(struct vm_area_struct * area);
struct page * (*fault)(struct vm_area_struct *vma, struct fault_data * fdata);
struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int *type);
-   int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
-
/* notification that a previously read-only page is about to become
 * writable, if an error is returned it will cause a SIGBUS */
int (*page_mkwrite)(struct vm_area_struct *vma, struct page *page);
@@ -771,8 +769,6 @@ static inline void unmap_shared_mapping_
 
 extern int vmtruncate(struct inode * inode, loff_t offset);
 extern int vmtruncate_range(struct inode * inode, loff_t offset, loff_t end);
-extern int install_page(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, struct page *page, pgprot_t prot);
-extern int install_file_pte(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long pgoff, pgprot_t prot);
 
 #ifdef CONFIG_MMU
 extern int __handle_mm_fault(struct mm_struct *mm,struct vm_area_struct *vma,
@@ -1083,10 +1079,6 @@ extern void truncate_inode_pages_range(s
 
 /* generic vm_area_ops exported for stackable file systems */
 extern struct page *filemap_fault(struct vm_area_struct *, struct fault_data *);
-extern struct page * __deprecated_for_modules filemap_nopage(
-   struct vm_area_struct *, unsigned long, int *);
-extern int __deprecated_for_modules filemap_populate(struct vm_area_struct *,
-   unsigned long, unsigned long, pgprot_t, unsigned long, int);
 
 /* mm/page-writeback.c */
 int write_one_page(struct page *page, int wait);
Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1496,201 +1496,6 @@ page_not_uptodate:
 }
 EXPORT_SYMBOL(filemap_fault);
 
-/*
- * filemap_nopage and filemap_populate are legacy exports that are not used
- * in tree. Scheduled for removal.
- */
-struct page *filemap_nopage(struct vm_area_struct *area,
-   unsigned long address, int *type)
-{
-   struct page *page;
-   struct fault_data fdata;
-   fdata.address = address;
-   fdata.pgoff = ((address - area->vm_start) >> PAGE_CACHE_SHIFT)
-   + area->vm_pgoff;
-   fdata.flags = 0;
-
-   page = filemap_fault(area, &fdata);
-   if (type)
-   *type = fdata.type;
-
-   return page;
-}
-EXPORT_SYMBOL(filemap_nopage);
-
-static struct page * filemap_getpage(struct file *file, unsigned long pgoff,
-   int nonblock)
-{
-   struct address_space *mapping = file->f_mapping;
-   struct page *page;
-   int error;
-
-   /*
-* Do we have something in the page cache already?
-*/
-retry_find:
-   page = find_get_page(mapping, pgoff);
-   if (!page) {
-   if (nonblock)
-   return NULL;
-   goto no_cached_page;
-   }
-
-   /*
-* Ok, found a page in the page cache, now we need to check
-* that it's up-to-date.
-*/
-   if (!PageUptodate(page)) {
-   if (nonblock) {
-   page_cache_release(page);
-   return NULL;
-   }
-   goto page_not_uptodate;
-   }
-
-success:
-   /*
-* Found the page and have a reference on it.
-*/
-   mark_page_accessed(page);
-   return page;
-
-no_cached_page:
-   error = page_cache_read(file, pgoff);
-
-   /*
-* The page we want has now been added to the page cache.
-* In the unlikely event that someone removed it in the
-* meantime, we'll just come back here and read it again.
-*/
-   if (error >= 0)
-   goto retry_find;
-
-   /*
-* An error return from page_cache_read can result if the
-* system is low on memory, or a problem occurs while trying
-* to schedule I/O.
-*/
-   return NULL;
-
-page_not_uptodate:
-   lock_page(page);
-
-   /* Did it get truncated 

[patch 6/7] mm: merge nopfn into fault

2007-01-12 Thread Nick Piggin
Remove ->nopfn and reimplement the only existing handler using ->fault

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/drivers/char/mspec.c
===
--- linux-2.6.orig/drivers/char/mspec.c
+++ linux-2.6/drivers/char/mspec.c
@@ -182,24 +182,25 @@ mspec_close(struct vm_area_struct *vma)
 
 
 /*
- * mspec_nopfn
+ * mspec_fault
  *
  * Creates a mspec page and maps it to user space.
  */
-static unsigned long
-mspec_nopfn(struct vm_area_struct *vma, unsigned long address)
+static struct page *
+mspec_fault(struct fault_data *fdata)
 {
unsigned long paddr, maddr;
unsigned long pfn;
-   int index;
-   struct vma_data *vdata = vma->vm_private_data;
+   int index = fdata->pgoff;
+   struct vma_data *vdata = fdata->vma->vm_private_data;
 
-   index = (address - vma->vm_start) >> PAGE_SHIFT;
maddr = (volatile unsigned long) vdata->maddr[index];
if (maddr == 0) {
maddr = uncached_alloc_page(numa_node_id());
-   if (maddr == 0)
-   return NOPFN_OOM;
+   if (maddr == 0) {
+   fdata->type = VM_FAULT_OOM;
+   return NULL;
+   }
 
spin_lock(&vdata->lock);
if (vdata->maddr[index] == 0) {
@@ -219,13 +220,21 @@ mspec_nopfn(struct vm_area_struct *vma, 
 
pfn = paddr >> PAGE_SHIFT;
 
-   return pfn;
+   fdata->type = VM_FAULT_MINOR;
+   /*
+* vm_insert_pfn can fail with -EBUSY, but in that case it will
+* be because another thread has installed the pte first, so it
+* is no problem.
+*/
+   vm_insert_pfn(fdata->vma, fdata->address, pfn);
+
+   return NULL;
 }
 
 static struct vm_operations_struct mspec_vm_ops = {
.open = mspec_open,
.close = mspec_close,
-   .nopfn = mspec_nopfn
+   .fault = mspec_fault,
 };
 
 /*
Index: linux-2.6/include/linux/mm.h
===
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -228,7 +228,6 @@ struct vm_operations_struct {
void (*close)(struct vm_area_struct * area);
struct page * (*fault)(struct vm_area_struct *vma, struct fault_data * fdata);
struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int *type);
-   unsigned long (*nopfn)(struct vm_area_struct * area, unsigned long address);
int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
 
/* notification that a previously read-only page is about to become
@@ -659,12 +658,6 @@ static inline int page_mapped(struct pag
 #define NOPAGE_OOM ((struct page *) (-1))
 
 /*
- * Error return values for the *_nopfn functions
- */
-#define NOPFN_SIGBUS   ((unsigned long) -1)
-#define NOPFN_OOM  ((unsigned long) -2)
-
-/*
  * Different kinds of faults, as returned by handle_mm_fault().
  * Used to decide whether a process gets delivered SIGBUS or
  * just gets major/minor fault counters bumped up.
Index: linux-2.6/mm/memory.c
===
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1288,6 +1288,11 @@ EXPORT_SYMBOL(vm_insert_page);
  *
  * This function should only be called from a vm_ops->fault handler, and
  * in that case the handler should return NULL.
+ *
+ * vma cannot be a COW mapping.
+ *
+ * As this is called only for pages that do not currently exist, we
+ * do not need to flush old virtual caches or the TLB.
  */
 int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)
 {
@@ -2346,54 +2351,6 @@ static int do_nonlinear_fault(struct mm_
 }
 
 /*
- * do_no_pfn() tries to create a new page mapping for a page without
- * a struct_page backing it
- *
- * As this is called only for pages that do not currently exist, we
- * do not need to flush old virtual caches or the TLB.
- *
- * We enter with non-exclusive mmap_sem (to exclude vma changes,
- * but allow concurrent faults), and pte mapped but not yet locked.
- * We return with mmap_sem still held, but pte unmapped and unlocked.
- *
- * It is expected that the ->nopfn handler always returns the same pfn
- * for a given virtual mapping.
- *
- * Mark this `noinline' to prevent it from bloating the main pagefault code.
- */
-static noinline int do_no_pfn(struct mm_struct *mm, struct vm_area_struct *vma,
-unsigned long address, pte_t *page_table, pmd_t *pmd,
-int write_access)
-{
-   spinlock_t *ptl;
-   pte_t entry;
-   unsigned long pfn;
-   int ret = VM_FAULT_MINOR;
-
-   pte_unmap(page_table);
-   BUG_ON(!(vma->vm_flags & VM_PFNMAP));
-   BUG_ON(is_cow_mapping(vma->vm_flags));
-
-   pfn = vma->vm_ops->nopfn(vma, address & PAGE_MASK);
-   if 

[patch 5/7] mm: add vm_insert_pfn

2007-01-12 Thread Nick Piggin
Add a vm_insert_pfn helper, so that ->fault handlers can have nopfn
functionality by installing their own pte and returning NULL.
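
To illustrate the intended calling convention, here is a rough userspace model, not kernel code. All of the model_* names, the fixed-size pte array and the -EFAULT bounds check are assumptions made up for this sketch. It only shows the return-value contract: the first insert wins, a second insert at the same address gets -EBUSY, and a fault handler can treat -EBUSY as success because some other thread has already installed the pte.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_NPTES      16

/* Hypothetical stand-ins for a pte and a vma; not the kernel's types. */
struct model_pte {
    bool present;
    unsigned long pfn;
};

struct model_vma {
    unsigned long vm_start;
    struct model_pte ptes[MODEL_NPTES];
};

/* Install pfn at addr unless a pte is already there (then -EBUSY). */
static int model_insert_pfn(struct model_vma *vma, unsigned long addr,
                            unsigned long pfn)
{
    unsigned long idx;

    assert(vma);
    /* Bounds check is specific to this model's fixed-size table. */
    if (addr < vma->vm_start)
        return -EFAULT;
    idx = (addr - vma->vm_start) >> MODEL_PAGE_SHIFT;
    if (idx >= MODEL_NPTES)
        return -EFAULT;

    if (vma->ptes[idx].present)
        return -EBUSY;

    vma->ptes[idx].present = true;
    vma->ptes[idx].pfn = pfn;
    return 0;
}

/*
 * Model of a ->fault handler with nopfn-style behaviour: it installs the
 * pte itself and has no struct page to hand back. Returns 0 when the
 * address ends up mapped, whoever mapped it.
 */
static int model_fault(struct model_vma *vma, unsigned long addr,
                       unsigned long pfn)
{
    int err = model_insert_pfn(vma, addr, pfn);

    if (err == -EBUSY)
        err = 0;    /* someone else installed the pte first */
    return err;
}
```

In the real helper the "is there a pte already" check happens under the page table lock, which this single-threaded model leaves out.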

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/linux/mm.h
===
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -1151,6 +1151,7 @@ unsigned long vmalloc_to_pfn(void *addr)
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
+int vm_insert_pfn(struct vm_area_struct *, unsigned long addr, unsigned long pfn);
 
 struct page *follow_page(struct vm_area_struct *, unsigned long address,
unsigned int foll_flags);
Index: linux-2.6/mm/memory.c
===
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1277,6 +1277,50 @@ int vm_insert_page(struct vm_area_struct
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+/**
+ * vm_insert_pfn - insert single pfn into user vma
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pfn: source kernel pfn
+ *
+ * Similar to vm_insert_page, this allows drivers to insert individual pages
+ * they've allocated into a user vma. Same comments apply.
+ *
+ * This function should only be called from a vm_ops->fault handler, and
+ * in that case the handler should return NULL.
+ */
+int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   int retval;
+   pte_t *pte, entry;
+   spinlock_t *ptl;
+
+   BUG_ON(!(vma->vm_flags & VM_PFNMAP));
+   BUG_ON(is_cow_mapping(vma->vm_flags));
+
+   retval = -ENOMEM;
+   pte = get_locked_pte(mm, addr, &ptl);
+   if (!pte)
+   goto out;
+   retval = -EBUSY;
+   if (!pte_none(*pte))
+   goto out_unlock;
+
+   /* Ok, finally just insert the thing.. */
+   entry = pfn_pte(pfn, vma->vm_page_prot);
+   set_pte_at(mm, addr, pte, entry);
+   update_mmu_cache(vma, addr, entry);
+
+   retval = 0;
+out_unlock:
+   pte_unmap_unlock(pte, ptl);
+
+out:
+   return retval;
+}
+EXPORT_SYMBOL(vm_insert_pfn);
+
 /*
  * maps a range of physical memory into the requested pages. the old
  * mappings are removed. any references to nonexistent pages results


[patch 3/7] mm: fix fault vs invalidate race for linear mappings

2007-01-12 Thread Nick Piggin
Fix the race between invalidate_inode_pages and do_no_page.

Andrea Arcangeli identified a subtle race between invalidation of
pages from pagecache with userspace mappings, and do_no_page.

The issue is that invalidation has to shoot down all mappings to the
page, before it can be discarded from the pagecache. Between shooting
down ptes to a particular page, and actually dropping the struct page
from the pagecache, do_no_page from any process might fault on that
page and establish a new mapping to the page just before it gets
discarded from the pagecache.

The most common case where such invalidation is used is in file
truncation. This case was catered for by doing a sort of open-coded
seqlock between the file's i_size, and its truncate_count.

Truncation will decrease i_size, then increment truncate_count before
unmapping userspace pages; do_no_page will read truncate_count, then
find the page if it is within i_size, and then check truncate_count
under the page table lock and back out and retry if it had
subsequently been changed (ptl will serialise against unmapping, and
ensure a potentially updated truncate_count is actually visible).
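
For readers who have not met this protocol before, the following is a rough single-threaded userspace model of it. It is only a sketch: the model_* names are invented here, and it says nothing about memory ordering or the page table lock, which the real code depends on. It shows the order of operations on each side and the "counter changed, so retry" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two fields involved; not kernel structures. */
struct model_file {
    size_t i_size;               /* file size, in pages for simplicity */
    unsigned int truncate_count; /* bumped by every truncate */
};

/* Truncate side: shrink i_size first, then bump the counter, then unmap. */
static void model_truncate(struct model_file *f, size_t new_size)
{
    assert(new_size <= f->i_size);
    f->i_size = new_size;
    f->truncate_count++;
    /* ...ptes beyond new_size would be unmapped here... */
}

/* Fault side, step 1: sample the counter before looking up the page. */
static unsigned int model_fault_begin(const struct model_file *f)
{
    return f->truncate_count;
}

/* Fault side, step 2: the page is only found if it is within i_size. */
static bool model_find_page(const struct model_file *f, size_t pgoff)
{
    return pgoff < f->i_size;
}

/*
 * Fault side, step 3 (done under the page table lock in the kernel):
 * if a truncate ran since step 1, back out and retry the whole fault.
 */
static bool model_fault_must_retry(const struct model_file *f,
                                   unsigned int seq)
{
    return f->truncate_count != seq;
}
```

Note that nothing in this scheme notices an invalidation that leaves i_size and truncate_count alone, which is the gap described next.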

Complexity and documentation issues aside, the locking protocol fails
in the case where we would like to invalidate pagecache inside i_size.
do_no_page can come in anytime and filemap_nopage is not aware of the
invalidation in progress (as it is when it is outside i_size). The
end result is that dangling (->mapping == NULL) pages that appear to
be from a particular file may be mapped into userspace with nonsense
data. Valid mappings to the same place will see a different page.

Andrea implemented two working fixes, one using a real seqlock,
another using a page->flags bit. He also proposed using the page lock
in do_no_page, but that was initially considered too heavyweight.
However, it is not a global or per-file lock, and the page cacheline
is modified in do_no_page to increment _count and _mapcount anyway, so
a further modification should not be a large performance hit.
Scalability is not an issue.

This patch implements this latter approach. ->nopage implementations
return with the page locked if it is possible for their underlying
file to be invalidated (in that case, they must set a special vm_flags
bit to indicate so). do_no_page only unlocks the page after setting
up the mapping completely. Invalidation is excluded because it holds
the page lock during invalidation of each page (and ensures that the
page is not mapped while holding the lock).

This also allows significant simplifications in do_no_page, because
we have the page locked in the right place in the pagecache from the
start.
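
A minimal userspace model of the exclusion described above, under stated assumptions: the model_* names are invented, the page lock is a plain flag with a trylock instead of a sleeping lock, and "in_pagecache" stands in for page->mapping != NULL. The point is only that both sides do their check-and-modify step while holding the same per-page lock, so a fault can never map a page that has just been dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a pagecache page; not struct page. */
struct model_page {
    bool locked;
    bool in_pagecache;  /* models page->mapping != NULL */
    int mapcount;       /* number of user mappings */
};

static bool model_trylock(struct model_page *p)
{
    if (p->locked)
        return false;   /* the kernel would sleep here instead */
    p->locked = true;
    return true;
}

static void model_unlock(struct model_page *p)
{
    assert(p->locked);
    p->locked = false;
}

/*
 * Fault side: the page comes back locked from the lookup, is mapped,
 * and only then unlocked. Returns true if a mapping was established.
 */
static bool model_fault(struct model_page *p)
{
    if (!model_trylock(p))
        return false;
    if (!p->in_pagecache) {     /* invalidated: caller must look up again */
        model_unlock(p);
        return false;
    }
    p->mapcount++;
    model_unlock(p);
    return true;
}

/*
 * Invalidate side: with the lock held, shoot down all mappings and then
 * drop the page from the pagecache. No fault can slip in between.
 */
static bool model_invalidate(struct model_page *p)
{
    if (!model_trylock(p))
        return false;
    p->mapcount = 0;
    p->in_pagecache = false;
    model_unlock(p);
    return true;
}
```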

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/include/linux/mm.h
===
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -168,6 +168,11 @@ extern unsigned int kobjsize(const void 
 #define VM_NONLINEAR   0x00800000  /* Is non-linear (remap_file_pages) */
 #define VM_MAPPED_COPY 0x01000000  /* T if mapped copy of data (nommu mmap) */
 #define VM_INSERTPAGE  0x02000000  /* The vma has had "vm_insert_page()" done on it */
+#define VM_CAN_INVALIDATE  0x04000000  /* The mapping may be invalidated,
+* eg. truncate or invalidate_inode_*.
+* In this case, do_no_page must
+* return with the page locked.
+*/
 
 #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1349,9 +1349,10 @@ struct page *filemap_nopage(struct vm_ar
unsigned long size, pgoff;
int did_readaround = 0, majmin = VM_FAULT_MINOR;
 
+   BUG_ON(!(area->vm_flags & VM_CAN_INVALIDATE));
+
pgoff = ((address-area->vm_start) >> PAGE_CACHE_SHIFT) + area->vm_pgoff;
 
-retry_all:
size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
if (pgoff >= size)
goto outside_data_content;
@@ -1373,7 +1374,7 @@ retry_all:
 * Do we have something in the page cache already?
 */
 retry_find:
-   page = find_get_page(mapping, pgoff);
+   page = find_lock_page(mapping, pgoff);
if (!page) {
unsigned long ra_pages;
 
@@ -1407,7 +1408,7 @@ retry_find:
start = pgoff - ra_pages / 2;
do_page_cache_readahead(mapping, file, start, ra_pages);
}
-   page = find_get_page(mapping, pgoff);
+   page = find_lock_page(mapping, pgoff);
if (!page)
goto no_cached_page;
}
@@ -1416,13 +1417,19 @@ 

[patch 1/7] mm: debug check for the fault vs invalidate race

2007-01-12 Thread Nick Piggin
Add a bugcheck for Andrea's pagefault vs invalidate race. This is triggerable
for both linear and nonlinear pages with a userspace test harness (using
direct IO and truncate, respectively).

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -120,6 +120,8 @@ void __remove_from_page_cache(struct pag
page->mapping = NULL;
mapping->nrpages--;
__dec_zone_page_state(page, NR_FILE_PAGES);
+
+   BUG_ON(page_mapped(page));
 }
 
 void remove_from_page_cache(struct page *page)


[patch 0/7] fault vs truncate/invalidate race fix

2007-01-12 Thread Nick Piggin
The following set of patches fixes the fault vs invalidate and fault
vs truncate_range races for filemap_nopage mappings, plus those races
and the fault vs truncate race for nonlinear mappings.

Hasn't changed since I last submitted it, when it was rejected because
it made one of the buffered write deadlocks easier to hit. I'll try
again.

Patches based on 2.6.20-rc4. Comments?

Thanks,
Nick

--
SuSE Labs



[patch 2/7] mm: simplify filemap_nopage

2007-01-12 Thread Nick Piggin
An identical block is duplicated: contrary to the comment, we have been
re-reading the page *twice* in filemap_nopage rather than once.

If any retry logic or anything is needed, it belongs in lower levels anyway.
Only retry once. Linus agrees.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1468,30 +1468,6 @@ page_not_uptodate:
majmin = VM_FAULT_MAJOR;
count_vm_event(PGMAJFAULT);
}
-   lock_page(page);
-
-   /* Did it get unhashed while we waited for it? */
-   if (!page->mapping) {
-   unlock_page(page);
-   page_cache_release(page);
-   goto retry_all;
-   }
-
-   /* Did somebody else get it up-to-date? */
-   if (PageUptodate(page)) {
-   unlock_page(page);
-   goto success;
-   }
-
-   error = mapping->a_ops->readpage(file, page);
-   if (!error) {
-   wait_on_page_locked(page);
-   if (PageUptodate(page))
-   goto success;
-   } else if (error == AOP_TRUNCATED_PAGE) {
-   page_cache_release(page);
-   goto retry_find;
-   }
 
/*
 * Umm, take care of errors if the page isn't up-to-date.


[patch 4/10] mm: generic_file_buffered_write cleanup

2007-01-12 Thread Nick Piggin
From: Andrew Morton <[EMAIL PROTECTED]>

Clean up buffered write code. Rename some variables and fix some types.

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1854,16 +1854,15 @@ generic_file_buffered_write(struct kiocb
size_t count, ssize_t written)
 {
struct file *file = iocb->ki_filp;
-   struct address_space * mapping = file->f_mapping;
+   struct address_space *mapping = file->f_mapping;
const struct address_space_operations *a_ops = mapping->a_ops;
struct inode*inode = mapping->host;
longstatus = 0;
struct page *page;
struct page *cached_page = NULL;
-   size_t  bytes;
struct pagevec  lru_pvec;
const struct iovec *cur_iov = iov; /* current iovec */
-   size_t  iov_base = 0;  /* offset in the current iovec */
+   size_t  iov_offset = 0;/* offset in the current iovec */
char __user *buf;
 
pagevec_init(&lru_pvec, 0);
@@ -1874,31 +1873,33 @@ generic_file_buffered_write(struct kiocb
if (likely(nr_segs == 1))
buf = iov->iov_base + written;
else {
-   filemap_set_next_iovec(&cur_iov, &iov_base, written);
-   buf = cur_iov->iov_base + iov_base;
+   filemap_set_next_iovec(&cur_iov, &iov_offset, written);
+   buf = cur_iov->iov_base + iov_offset;
}
 
do {
-   unsigned long index;
-   unsigned long offset;
-   unsigned long maxlen;
-   size_t copied;
+   pgoff_t index;  /* Pagecache index for current page */
+   unsigned long offset;   /* Offset into pagecache page */
+   unsigned long maxlen;   /* Bytes remaining in current iovec */
+   size_t bytes;   /* Bytes to write to page */
+   size_t copied;  /* Bytes copied from user */
 
-   offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
+   offset = (pos & (PAGE_CACHE_SIZE - 1));
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
if (bytes > count)
bytes = count;
 
+   maxlen = cur_iov->iov_len - iov_offset;
+   if (maxlen > bytes)
+   maxlen = bytes;
+
/*
 * Bring in the user page that we will copy from _first_.
 * Otherwise there's a nasty deadlock on copying from the
 * same page as we're writing to, without it being marked
 * up-to-date.
 */
-   maxlen = cur_iov->iov_len - iov_base;
-   if (maxlen > bytes)
-   maxlen = bytes;
fault_in_pages_readable(buf, maxlen);
 
page = __grab_cache_page(mapping,index,&cached_page,&lru_pvec);
@@ -1929,7 +1930,7 @@ generic_file_buffered_write(struct kiocb
buf, bytes);
else
copied = filemap_copy_from_user_iovec(page, offset,
-   cur_iov, iov_base, bytes);
+   cur_iov, iov_offset, bytes);
flush_dcache_page(page);
status = a_ops->commit_write(file, page, offset, offset+bytes);
if (status == AOP_TRUNCATED_PAGE) {
@@ -1947,12 +1948,12 @@ generic_file_buffered_write(struct kiocb
buf += status;
if (unlikely(nr_segs > 1)) {
filemap_set_next_iovec(&cur_iov,
-   &iov_base, status);
+   &iov_offset, status);
if (count)
buf = cur_iov->iov_base +
-   iov_base;
+   iov_offset;
} else {
-   iov_base += status;
+   iov_offset += status;
}
}
}


[patch 6/10] mm: be sure to trim blocks

2007-01-12 Thread Nick Piggin
If prepare_write fails with AOP_TRUNCATED_PAGE, or if commit_write fails, then
we may have failed the write operation despite prepare_write having
instantiated blocks past i_size. Fix this, and consolidate the trimming into
one place.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1911,22 +1911,9 @@ generic_file_buffered_write(struct kiocb
}
 
status = a_ops->prepare_write(file, page, offset, offset+bytes);
-   if (unlikely(status)) {
-   loff_t isize = i_size_read(inode);
+   if (unlikely(status))
+   goto fs_write_aop_error;
 
-   if (status != AOP_TRUNCATED_PAGE)
-   unlock_page(page);
-   page_cache_release(page);
-   if (status == AOP_TRUNCATED_PAGE)
-   continue;
-   /*
-* prepare_write() may have instantiated a few blocks
-* outside i_size.  Trim these off again.
-*/
-   if (pos + bytes > isize)
-   vmtruncate(inode, isize);
-   break;
-   }
if (likely(nr_segs == 1))
copied = filemap_copy_from_user(page, offset,
buf, bytes);
@@ -1935,10 +1922,9 @@ generic_file_buffered_write(struct kiocb
cur_iov, iov_offset, bytes);
flush_dcache_page(page);
status = a_ops->commit_write(file, page, offset, offset+bytes);
-   if (status == AOP_TRUNCATED_PAGE) {
-   page_cache_release(page);
-   continue;
-   }
+   if (unlikely(status))
+   goto fs_write_aop_error;
+
if (likely(copied > 0)) {
if (!status)
status = copied;
@@ -1969,6 +1955,25 @@ generic_file_buffered_write(struct kiocb
break;
balance_dirty_pages_ratelimited(mapping);
cond_resched();
+   continue;
+
+fs_write_aop_error:
+   if (status != AOP_TRUNCATED_PAGE)
+   unlock_page(page);
+   page_cache_release(page);
+
+   /*
+* prepare_write() may have instantiated a few blocks
+* outside i_size.  Trim these off again. Don't need
+* i_size_read because we hold i_mutex.
+*/
+   if (pos + bytes > inode->i_size)
+   vmtruncate(inode, inode->i_size);
+   if (status == AOP_TRUNCATED_PAGE)
+   continue;
+   else
+   break;
+
} while (count);
*ppos = pos;
 


[patch 10/10] mm: fix pagecache write deadlocks

2007-01-12 Thread Nick Piggin
Modify the core write() code so that it won't take a pagefault while holding a
lock on the pagecache page. There are a number of different deadlocks possible
if we try to do such a thing:

1.  generic_buffered_write
2.   lock_page
3.    prepare_write
4.     unlock_page+vmtruncate
5.     copy_from_user
6.      mmap_sem(r)
7.       handle_mm_fault
8.        lock_page (filemap_nopage)
9.    commit_write
1.   unlock_page

b. sys_munmap / sys_mlock / others
c.  mmap_sem(w)
d.   make_pages_present
e.    get_user_pages
f.     handle_mm_fault
g.      lock_page (filemap_nopage)

2,8 - recursive deadlock if page is same
2,8;2,8 - ABBA deadlock if page is different
2,6;c,g - ABBA deadlock if page is same

The solution is as follows:
1.  If we find the destination page is uptodate, continue as normal, but use
atomic usercopies which do not take pagefaults and do not zero the uncopied
tail of the destination. The destination is already uptodate, so we can
commit_write the full length even if there was a partial copy: it does not
matter that the tail was not modified, because if it is dirtied and written
back to disk it will not cause any problems (uptodate *means* that the
destination page is as new or newer than the copy on disk).

1a. The above requires that fault_in_pages_readable correctly returns access
information, because atomic usercopies cannot distinguish between
non-present pages in a readable mapping, from lack of a readable mapping.

2.  If we find the destination page is non uptodate, unlock it (this could be
made slightly more optimal), then find and pin the source page with
get_user_pages. Relock the destination page and continue with the copy.
However, instead of a usercopy (which might take a fault), copy the data
via the kernel address space.
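
One consequence of step 2 is that only a single source page is pinned, so the chunk has to be clamped to what is left in that source page as well as to the destination page and the remaining count. The arithmetic can be sketched in userspace as below. This is an illustration only: the model_* names, the write_chunk struct and the 4096-byte page size are assumptions for the example, not what the patch defines.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK  (~(MODEL_PAGE_SIZE - 1))

/* Hypothetical result type: where in the pagecache this chunk lands. */
struct write_chunk {
    unsigned long index;   /* pagecache index of the destination page */
    unsigned long offset;  /* offset into the destination page */
    unsigned long bytes;   /* bytes to copy in this iteration */
};

/*
 * Work out one iteration of the write loop. When the destination page
 * is not uptodate, the copy goes via one pinned source page, so bytes
 * is further limited to the end of the source page containing src_addr.
 */
static struct write_chunk model_chunk(unsigned long long pos,
                                      unsigned long count,
                                      unsigned long src_addr,
                                      bool dest_uptodate)
{
    struct write_chunk c;

    assert(count > 0);
    c.offset = (unsigned long)(pos & (MODEL_PAGE_SIZE - 1));
    c.index = (unsigned long)(pos >> MODEL_PAGE_SHIFT);
    c.bytes = MODEL_PAGE_SIZE - c.offset;
    if (c.bytes > count)
        c.bytes = count;

    if (!dest_uptodate) {
        unsigned long src_left =
            MODEL_PAGE_SIZE - (src_addr & ~MODEL_PAGE_MASK);

        if (c.bytes > src_left)
            c.bytes = src_left;
    }
    return c;
}
```

With an uptodate destination the clamp is skipped, because a short atomic copy is harmless there and the loop simply goes around again.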

(also, rename maxlen to seglen, because it was confusing)

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1843,11 +1843,12 @@ generic_file_buffered_write(struct kiocb
filemap_set_next_iovec(&cur_iov, nr_segs, &iov_offset, written);
 
do {
+   struct page *src_page;
struct page *page;
pgoff_t index;  /* Pagecache index for current page */
unsigned long offset;   /* Offset into pagecache page */
-   unsigned long maxlen;   /* Bytes remaining in current iovec */
-   size_t bytes;   /* Bytes to write to page */
+   unsigned long seglen;   /* Bytes remaining in current iovec */
+   unsigned long bytes;/* Bytes to write to page */
size_t copied;  /* Bytes copied from user */
 
buf = cur_iov->iov_base + iov_offset;
@@ -1857,20 +1858,30 @@ generic_file_buffered_write(struct kiocb
if (bytes > count)
bytes = count;
 
-   maxlen = cur_iov->iov_len - iov_offset;
-   if (maxlen > bytes)
-   maxlen = bytes;
+   /*
+* a non-NULL src_page indicates that we're doing the
+* copy via get_user_pages and kmap.
+*/
+   src_page = NULL;
+
+   seglen = cur_iov->iov_len - iov_offset;
+   if (seglen > bytes)
+   seglen = bytes;
 
-#ifndef CONFIG_DEBUG_VM
/*
 * Bring in the user page that we will copy from _first_.
 * Otherwise there's a nasty deadlock on copying from the
 * same page as we're writing to, without it being marked
 * up-to-date.
+*
+* Not only is this an optimisation, but it is also required
+* to check that the address is actually valid, when atomic
+* usercopies are used, below.
 */
-   fault_in_pages_readable(buf, maxlen);
-#endif
-
+   if (unlikely(fault_in_pages_readable(buf, seglen))) {
+   status = -EFAULT;
+   break;
+   }
 
page = __grab_cache_page(mapping, index);
if (!page) {
@@ -1878,31 +1889,88 @@ generic_file_buffered_write(struct kiocb
break;
}
 
+   /*
+* non-uptodate pages cannot cope with short copies, and we
+* cannot take a pagefault with the destination page locked.
+* So pin the source page to copy it.
+*/
+   if (!PageUptodate(page)) {
+   unlock_page(page);
+
+   bytes = min(bytes, PAGE_CACHE_SIZE -
+((unsigned long)buf & ~PAGE_CACHE_MASK));
+
+   /*
+* Cannot 

[patch 9/10] mm: generic_file_buffered_write iovec cleanup

2007-01-12 Thread Nick Piggin
Hide some of the open-coded nr_segs tests into the iovec helpers. This is
all to simplify generic_file_buffered_write, because that gets more complex
in the next patch.
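
For anyone unfamiliar with the iovec bookkeeping these helpers hide, here is a rough userspace analogue. It uses the POSIX struct iovec from <sys/uio.h> and plain memcpy in place of the usercopy routines, and the model_* names and the nr_segs bounds handling are assumptions of this sketch rather than the patch's code. One function copies a byte range that may span several segments, the other advances the (segment, offset) cursor by the number of bytes consumed.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to 'bytes' bytes into dst, starting 'base' bytes into iov[0]
 * and spilling into the following segments. Returns the bytes copied.
 */
static size_t model_copy_from_iovec(char *dst, const struct iovec *iov,
                                    unsigned long nr_segs, size_t base,
                                    size_t bytes)
{
    size_t copied = 0;

    while (bytes && nr_segs) {
        size_t avail, n;

        assert(base <= iov->iov_len);
        avail = iov->iov_len - base;
        n = bytes < avail ? bytes : avail;

        memcpy(dst + copied, (const char *)iov->iov_base + base, n);
        copied += n;
        bytes -= n;

        /* Move to the start of the next segment. */
        iov++;
        nr_segs--;
        base = 0;
    }
    return copied;
}

/* Advance the (segment, offset) cursor by 'bytes' consumed bytes. */
static void model_set_next_iovec(const struct iovec **iovp,
                                 unsigned long *nr_segsp, size_t *basep,
                                 size_t bytes)
{
    const struct iovec *iov = *iovp;
    unsigned long nr_segs = *nr_segsp;
    size_t base = *basep;

    while (bytes && nr_segs) {
        size_t avail = iov->iov_len - base;
        size_t n = bytes < avail ? bytes : avail;

        bytes -= n;
        base += n;
        if (base == iov->iov_len) {  /* segment used up */
            iov++;
            nr_segs--;
            base = 0;
        }
    }
    *iovp = iov;
    *nr_segsp = nr_segs;
    *basep = base;
}
```

Passing nr_segs down lets the single-segment case be special-cased inside the helper, so the caller no longer has to test it.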

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.h
===
--- linux-2.6.orig/mm/filemap.h
+++ linux-2.6/mm/filemap.h
@@ -22,82 +22,82 @@ __filemap_copy_from_user_iovec_inatomic(
 
 /*
  * Copy as much as we can into the page and return the number of bytes which
- * were sucessfully copied.  If a fault is encountered then clear the page
- * out to (offset+bytes) and return the number of bytes which were copied.
- *
- * NOTE: For this to work reliably we really want copy_from_user_inatomic_nocache
- * to *NOT* zero any tail of the buffer that it failed to copy.  If it does,
- * and if the following non-atomic copy succeeds, then there is a small window
- * where the target page contains neither the data before the write, nor the
- * data after the write (it contains zero).  A read at this time will see
- * data that is inconsistent with any ordering of the read and the write.
- * (This has been detected in practice).
+ * were sucessfully copied.  If a fault is encountered then return the number of
+ * bytes which were copied.
  */
 static inline size_t
-filemap_copy_from_user(struct page *page, unsigned long offset,
-   const char __user *buf, unsigned bytes)
+filemap_copy_from_user_atomic(struct page *page, unsigned long offset,
+   const struct iovec *iov, unsigned long nr_segs,
+   size_t base, size_t bytes)
 {
char *kaddr;
-   int left;
+   size_t copied;
 
kaddr = kmap_atomic(page, KM_USER0);
-   left = __copy_from_user_inatomic_nocache(kaddr + offset, buf, bytes);
+   if (likely(nr_segs == 1)) {
+   int left;
+   char __user *buf = iov->iov_base + base;
+   left = __copy_from_user_inatomic_nocache(kaddr + offset,
+   buf, bytes);
+   copied = bytes - left;
+   } else {
+   copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset,
+   iov, base, bytes);
+   }
kunmap_atomic(kaddr, KM_USER0);
 
-   if (left != 0) {
-   /* Do it the slow way */
-   kaddr = kmap(page);
-   left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
-   kunmap(page);
-   }
-   return bytes - left;
+   return copied;
 }
 
 /*
- * This has the same sideeffects and return value as filemap_copy_from_user().
- * The difference is that on a fault we need to memset the remainder of the
- * page (out to offset+bytes), to emulate filemap_copy_from_user()'s
- * single-segment behaviour.
+ * This has the same sideeffects and return value as
+ * filemap_copy_from_user_atomic().
+ * The difference is that it attempts to resolve faults.
  */
 static inline size_t
-filemap_copy_from_user_iovec(struct page *page, unsigned long offset,
-   const struct iovec *iov, size_t base, size_t bytes)
+filemap_copy_from_user(struct page *page, unsigned long offset,
+   const struct iovec *iov, unsigned long nr_segs,
+size_t base, size_t bytes)
 {
char *kaddr;
size_t copied;
 
-   kaddr = kmap_atomic(page, KM_USER0);
-   copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
-base, bytes);
-   kunmap_atomic(kaddr, KM_USER0);
-   if (copied != bytes) {
-   kaddr = kmap(page);
-   copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset, iov,
-base, bytes);
-   if (bytes - copied)
-   memset(kaddr + offset + copied, 0, bytes - copied);
-   kunmap(page);
+   kaddr = kmap(page);
+   if (likely(nr_segs == 1)) {
+   int left;
+   char __user *buf = iov->iov_base + base;
+   left = __copy_from_user_nocache(kaddr + offset, buf, bytes);
+   copied = bytes - left;
+   } else {
+   copied = __filemap_copy_from_user_iovec_inatomic(kaddr + offset,
+   iov, base, bytes);
}
+   kunmap(page);
return copied;
 }
 
 static inline void
-filemap_set_next_iovec(const struct iovec **iovp, size_t *basep, size_t bytes)
+filemap_set_next_iovec(const struct iovec **iovp, unsigned long nr_segs,
+size_t *basep, size_t bytes)
 {
-   const struct iovec *iov = *iovp;
-   size_t base = *basep;
-
-   while (bytes) {
-   int copy = min(bytes, iov->iov_len - base);
-
-   

[patch 8/10] mm: generic_file_buffered_write cleanup more

2007-01-12 Thread Nick Piggin
No need to do the confusing switch of variables from copied into status.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1898,28 +1898,22 @@ generic_file_buffered_write(struct kiocb
goto fs_write_aop_error;
 
if (likely(copied > 0)) {
-   if (!status)
-   status = copied;
-
-   if (status >= 0) {
-   written += status;
-   count -= status;
-   pos += status;
-   buf += status;
-   if (unlikely(nr_segs > 1)) {
-   filemap_set_next_iovec(&cur_iov,
-   &iov_offset, status);
-   if (count)
-   buf = cur_iov->iov_base +
-   iov_offset;
-   } else {
-   iov_offset += status;
-   }
+   written += copied;
+   count -= copied;
+   pos += copied;
+   buf += copied;
+   if (unlikely(nr_segs > 1)) {
+   filemap_set_next_iovec(&cur_iov,
+   &iov_offset, copied);
+   if (count)
+   buf = cur_iov->iov_base + iov_offset;
+   } else {
+   iov_offset += copied;
}
}
if (unlikely(copied != bytes))
-   if (status >= 0)
-   status = -EFAULT;
+   status = -EFAULT;
+
unlock_page(page);
mark_page_accessed(page);
page_cache_release(page);


[patch 5/10] mm: debug write deadlocks

2007-01-12 Thread Nick Piggin
Allow CONFIG_DEBUG_VM to switch off the prefaulting logic, to simulate the
difficult race where the page may be unmapped before calling copy_from_user.
Makes the race much easier to hit.

This is useful for demonstration and testing purposes, but is removed in a
subsequent patch.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1894,6 +1894,7 @@ generic_file_buffered_write(struct kiocb
if (maxlen > bytes)
maxlen = bytes;
 
+#ifndef CONFIG_DEBUG_VM
/*
 * Bring in the user page that we will copy from _first_.
 * Otherwise there's a nasty deadlock on copying from the
@@ -1901,6 +1902,7 @@ generic_file_buffered_write(struct kiocb
 * up-to-date.
 */
fault_in_pages_readable(buf, maxlen);
+#endif
 
page = __grab_cache_page(mapping,index,_page,_pvec);
if (!page) {


[patch 7/10] mm: cleanup pagecache insertion operations

2007-01-12 Thread Nick Piggin
Quite a bit of code is used in maintaining these "cached pages" that are
probably pretty unlikely to get used. It would require a narrow race where
the page is inserted concurrently while this process is allocating a page
in order to create the spare page. Then a multi-page write into an uncached
part of the file, to make use of it.

Next, the buffered write path (and others) uses its own LRU pagevec when it
should be just using the per-CPU LRU pagevec (which will cut down on both data
and code size cacheline footprint). Also, these private LRU pagevecs are
emptied after just a very short time, in contrast with the per-CPU pagevecs
that are persistent. Net result: 7.3 times fewer lru_lock acquisitions required
to add the pages to pagecache for a bulk write (in 4K chunks).
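
Editorial note on the batching argument above: a minimal userspace sketch, not
kernel code, of why a persistent batch needs fewer lock acquisitions than a
private batch that is drained at the end of every call. All names
(model_pagevec, model_add, model_drain) and the batch capacity of 14 are
assumptions made for illustration; the model only counts how often a
pretend lru_lock would be taken and does not try to reproduce the 7.3x figure.

```c
#include <stddef.h>

#define MODEL_PAGEVEC_SIZE 14	/* hypothetical batch capacity for this model */

struct model_pagevec {
	unsigned int nr;		/* pages currently batched */
	unsigned long lock_takes;	/* times the modelled lru_lock was taken */
};

/* Move all batched pages to the LRU: one lock acquisition per non-empty drain. */
static void model_drain(struct model_pagevec *pvec)
{
	if (pvec->nr == 0)
		return;
	pvec->lock_takes++;
	pvec->nr = 0;
}

/* Batch one page; drain only when the batch is full. */
static void model_add(struct model_pagevec *pvec)
{
	pvec->nr++;
	if (pvec->nr == MODEL_PAGEVEC_SIZE)
		model_drain(pvec);
}

/* Private pagevec: created for each call and drained before the call returns. */
unsigned long model_private_pvec(unsigned long calls, unsigned int pages_per_call)
{
	unsigned long total = 0;
	unsigned long c;
	unsigned int p;

	for (c = 0; c < calls; c++) {
		struct model_pagevec pvec = { 0, 0 };

		for (p = 0; p < pages_per_call; p++)
			model_add(&pvec);
		model_drain(&pvec);
		total += pvec.lock_takes;
	}
	return total;
}

/* Persistent pagevec: survives across calls and is drained only when full. */
unsigned long model_percpu_pvec(unsigned long calls, unsigned int pages_per_call)
{
	struct model_pagevec pvec = { 0, 0 };
	unsigned long c;
	unsigned int p;

	for (c = 0; c < calls; c++)
		for (p = 0; p < pages_per_call; p++)
			model_add(&pvec);
	/* Pages still sitting in the batch have not cost a lock acquisition yet. */
	return pvec.lock_takes;
}
```

In this model, 1400 single-page calls take the lock 1400 times with the
private batch and 100 times with the persistent one. The real ratio depends
on how many pages each write() touches and on the actual pagevec size.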

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -686,26 +686,22 @@ EXPORT_SYMBOL(find_lock_page);
 struct page *find_or_create_page(struct address_space *mapping,
unsigned long index, gfp_t gfp_mask)
 {
-   struct page *page, *cached_page = NULL;
+   struct page *page;
int err;
 repeat:
page = find_lock_page(mapping, index);
if (!page) {
-   if (!cached_page) {
-   cached_page = alloc_page(gfp_mask);
-   if (!cached_page)
-   return NULL;
-   }
-   err = add_to_page_cache_lru(cached_page, mapping,
-   index, gfp_mask);
-   if (!err) {
-   page = cached_page;
-   cached_page = NULL;
-   } else if (err == -EEXIST)
-   goto repeat;
+   page = alloc_page(gfp_mask);
+   if (!page)
+   return NULL;
+   err = add_to_page_cache_lru(page, mapping, index, gfp_mask);
+   if (unlikely(err)) {
+   page_cache_release(page);
+   page = NULL;
+   if (err == -EEXIST)
+   goto repeat;
+   }
}
-   if (cached_page)
-   page_cache_release(cached_page);
return page;
 }
 EXPORT_SYMBOL(find_or_create_page);
@@ -891,11 +887,9 @@ void do_generic_mapping_read(struct addr
unsigned long next_index;
unsigned long prev_index;
loff_t isize;
-   struct page *cached_page;
int error;
struct file_ra_state ra = *_ra;
 
-   cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
prev_index = ra.prev_page;
@@ -1059,23 +1053,20 @@ no_cached_page:
 * Ok, it wasn't cached, so we need to create a new
 * page..
 */
-   if (!cached_page) {
-   cached_page = page_cache_alloc_cold(mapping);
-   if (!cached_page) {
-   desc->error = -ENOMEM;
-   goto out;
-   }
+   page = page_cache_alloc_cold(mapping);
+   if (!page) {
+   desc->error = -ENOMEM;
+   goto out;
}
-   error = add_to_page_cache_lru(cached_page, mapping,
+   error = add_to_page_cache_lru(page, mapping,
index, GFP_KERNEL);
if (error) {
+   page_cache_release(page);
if (error == -EEXIST)
goto find_page;
desc->error = error;
goto out;
}
-   page = cached_page;
-   cached_page = NULL;
goto readpage;
}
 
@@ -1083,8 +1074,6 @@ out:
*_ra = ra;
 
*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
-   if (cached_page)
-   page_cache_release(cached_page);
if (filp)
file_accessed(filp);
 }
@@ -1542,35 +1531,28 @@ static inline struct page *__read_cache_
int (*filler)(void *,struct page*),
void *data)
 {
-   struct page *page, *cached_page = NULL;
+   struct page *page;
int err;
 repeat:
page = find_get_page(mapping, index);
if (!page) {
-   if (!cached_page) {
-   cached_page = page_cache_alloc_cold(mapping);
-   if (!cached_page)
-   return ERR_PTR(-ENOMEM);
-   }
-   err = add_to_page_cache_lru(cached_page, mapping,
-   index, GFP_KERNEL);
-   if (err == -EEXIST)
-   goto 

[patch 3/10] mm: revert "generic_file_buffered_write(): deadlock on vectored write"

2007-01-12 Thread Nick Piggin
From: Andrew Morton <[EMAIL PROTECTED]>

Revert 6527c2bdf1f833cc18e8f42bd97973d583e4aa83

This patch fixed the following bug:

  When prefaulting in the pages in generic_file_buffered_write(), we only
  faulted in the pages for the first segment of the iovec.  If the second or
  successive segment described a mmapping of the page into which we're
  write()ing, and that page is not up-to-date, the fault handler tries to lock
  the already-locked page (to bring it up to date) and deadlocks.

  An exploit for this bug is in writev-deadlock-demo.c, in
  http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

  (These demos assume blocksize < PAGE_CACHE_SIZE).

The problem with this fix is that it takes the kernel back to doing a single
prepare_write()/commit_write() per iovec segment.  So in the worst case we'll
run prepare_write+commit_write 1024 times where we previously would have run
it once. The other problem with the fix is that it doesn't fix all the locking
problems.




And apparently this change killed NFS overwrite performance, because, I
suppose, it talks to the server for each prepare_write+commit_write.

So just back that patch out - we'll be fixing the deadlock by other means.

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>

Nick says: also it only ever actually papered over the bug, because after
faulting in the pages, they might be unmapped or reclaimed.
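
Editorial note on the "1024 times where we previously would have run it once"
remark above: a small userspace sketch that counts loop iterations (one
prepare_write+commit_write pair each) under the two clamping rules. The
function names and the 4096-byte page size are assumptions for illustration;
this is a model of the arithmetic, not the kernel loop itself.

```c
#include <stddef.h>
#include <sys/uio.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for this model */

/* Each copy is clamped to the end of the page and to the bytes remaining. */
unsigned long model_iterations_per_page(unsigned long pos, size_t count)
{
	unsigned long n = 0;

	while (count) {
		size_t offset = pos & (MODEL_PAGE_SIZE - 1);
		size_t bytes = MODEL_PAGE_SIZE - offset;

		if (bytes > count)
			bytes = count;
		pos += bytes;
		count -= bytes;
		n++;
	}
	return n;
}

/* Each copy is additionally clamped to the current iovec segment. */
unsigned long model_iterations_per_segment(unsigned long pos,
					   const struct iovec *iov,
					   unsigned long nr_segs)
{
	unsigned long n = 0;
	unsigned long seg;

	for (seg = 0; seg < nr_segs; seg++) {
		size_t left = iov[seg].iov_len;

		/* Zero-length segments contribute no iterations here. */
		while (left) {
			size_t offset = pos & (MODEL_PAGE_SIZE - 1);
			size_t bytes = MODEL_PAGE_SIZE - offset;

			if (bytes > left)
				bytes = left;
			pos += bytes;
			left -= bytes;
			n++;
		}
	}
	return n;
}
```

For a page-aligned write of 1024 segments of 4 bytes each, the per-segment
rule iterates 1024 times while the per-page rule iterates once, which is the
worst case the changelog describes.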

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1881,21 +1881,14 @@ generic_file_buffered_write(struct kiocb
do {
unsigned long index;
unsigned long offset;
+   unsigned long maxlen;
size_t copied;
 
offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
-
-   /* Limit the size of the copy to the caller's write size */
-   bytes = min(bytes, count);
-
-   /*
-* Limit the size of the copy to that of the current segment,
-* because fault_in_pages_readable() doesn't know how to walk
-* segments.
-*/
-   bytes = min(bytes, cur_iov->iov_len - iov_base);
+   if (bytes > count)
+   bytes = count;
 
/*
 * Bring in the user page that we will copy from _first_.
@@ -1903,7 +1896,10 @@ generic_file_buffered_write(struct kiocb
 * same page as we're writing to, without it being marked
 * up-to-date.
 */
-   fault_in_pages_readable(buf, bytes);
+   maxlen = cur_iov->iov_len - iov_base;
+   if (maxlen > bytes)
+   maxlen = bytes;
+   fault_in_pages_readable(buf, maxlen);
 
page = __grab_cache_page(mapping,index,_page,_pvec);
if (!page) {


[patch 2/10] mm: revert "generic_file_buffered_write(): handle zero length iovec segments"

2007-01-12 Thread Nick Piggin
From: Andrew Morton <[EMAIL PROTECTED]>

Revert 81b0c8713385ce1b1b9058e916edcf9561ad76d6.

This was a bugfix against 6527c2bdf1f833cc18e8f42bd97973d583e4aa83, which we
also revert.
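
Editorial note: the filemap.h hunk in this patch turns the do/while loop in
filemap_set_next_iovec() back into a plain while loop. The following is a
userspace sketch of the while form so the cursor movement can be tried out.
The name model_set_next_iovec is hypothetical, and the loop body is filled in
from the visible diff context plus an assumption about the elided lines
(base += copy and the end-of-segment test).

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Advance the (segment, offset) cursor by 'bytes'.
 * The caller must not pass more bytes than remain in the iovec array,
 * otherwise the loop walks past the last segment.
 */
void model_set_next_iovec(const struct iovec **iovp, size_t *basep, size_t bytes)
{
	const struct iovec *iov = *iovp;
	size_t base = *basep;

	while (bytes) {
		size_t room = iov->iov_len - base;
		size_t copy = bytes < room ? bytes : room;

		bytes -= copy;
		base += copy;
		if (iov->iov_len == base) {
			/* Segment used up (or zero length): move to the next one. */
			iov++;
			base = 0;
		}
	}
	*iovp = iov;
	*basep = base;
}
```

With the while form a call with bytes == 0 leaves the cursor untouched. The
do/while form being reverted ran the body once even for bytes == 0, so it
could step over a zero-length segment, which is what the reverted bugfix
relied on.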

Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/mm/filemap.c
===
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1911,12 +1911,6 @@ generic_file_buffered_write(struct kiocb
break;
}
 
-   if (unlikely(bytes == 0)) {
-   status = 0;
-   copied = 0;
-   goto zero_length_segment;
-   }
-
status = a_ops->prepare_write(file, page, offset, offset+bytes);
if (unlikely(status)) {
loff_t isize = i_size_read(inode);
@@ -1946,8 +1940,7 @@ generic_file_buffered_write(struct kiocb
page_cache_release(page);
continue;
}
-zero_length_segment:
-   if (likely(copied >= 0)) {
+   if (likely(copied > 0)) {
if (!status)
status = copied;
 
Index: linux-2.6/mm/filemap.h
===
--- linux-2.6.orig/mm/filemap.h
+++ linux-2.6/mm/filemap.h
@@ -87,7 +87,7 @@ filemap_set_next_iovec(const struct iove
const struct iovec *iov = *iovp;
size_t base = *basep;
 
-   do {
+   while (bytes) {
int copy = min(bytes, iov->iov_len - base);
 
bytes -= copy;
@@ -96,7 +96,7 @@ filemap_set_next_iovec(const struct iove
iov++;
base = 0;
}
-   } while (bytes);
+   }
*iovp = iov;
*basep = base;
 }


[patch 1/10] fs: libfs buffered write leak fix

2007-01-12 Thread Nick Piggin
simple_prepare_write and nobh_prepare_write leak uninitialised kernel data.
Fix the former, make a note of the latter. Several other filesystems seem
to be iffy here, too.
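
Editorial note: a userspace sketch of one way the exposure described above
could arise, assuming the user copy comes up short after the page has already
been prepared. The 0xAA fill stands in for stale kernel data; the function
names and the page size are hypothetical and only mirror the shape of the old
and new simple_prepare_write() in the diff below.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096	/* assumed page size for this model */

/* Old behaviour: zero only the parts of the page outside [from, to). */
void model_prepare_old(unsigned char *page, unsigned from, unsigned to)
{
	memset(page, 0, from);
	memset(page + to, 0, MODEL_PAGE_SIZE - to);
}

/* New behaviour: zero the whole page, as clear_highpage() does in the patch. */
void model_prepare_new(unsigned char *page)
{
	memset(page, 0, MODEL_PAGE_SIZE);
}

/* Count bytes that still carry the stale marker value. */
size_t model_count_stale(const unsigned char *page, unsigned char marker)
{
	size_t i, n = 0;

	for (i = 0; i < MODEL_PAGE_SIZE; i++)
		if (page[i] == marker)
			n++;
	return n;
}
```

If a write to [100, 200) copies only 40 bytes, the old scheme leaves 60 stale
bytes in a page that is treated as up to date, while the new scheme leaves
none.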

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Index: linux-2.6/fs/libfs.c
===
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -327,32 +327,35 @@ int simple_readpage(struct file *file, s
 int simple_prepare_write(struct file *file, struct page *page,
unsigned from, unsigned to)
 {
-   if (!PageUptodate(page)) {
-   if (to - from != PAGE_CACHE_SIZE) {
-   void *kaddr = kmap_atomic(page, KM_USER0);
-   memset(kaddr, 0, from);
-   memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-   flush_dcache_page(page);
-   kunmap_atomic(kaddr, KM_USER0);
-   }
+   if (PageUptodate(page))
+   return 0;
+
+   if (to - from != PAGE_CACHE_SIZE) {
+   clear_highpage(page);
+   flush_dcache_page(page);
SetPageUptodate(page);
}
+
return 0;
 }
 
 int simple_commit_write(struct file *file, struct page *page,
-   unsigned offset, unsigned to)
+   unsigned from, unsigned to)
 {
-   struct inode *inode = page->mapping->host;
-   loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
-
-   /*
-* No need to use i_size_read() here, the i_size
-* cannot change under us because we hold the i_mutex.
-*/
-   if (pos > inode->i_size)
-   i_size_write(inode, pos);
-   set_page_dirty(page);
+   if (to > from) {
+   struct inode *inode = page->mapping->host;
+   loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+
+   if (to - from == PAGE_CACHE_SIZE)
+   SetPageUptodate(page);
+   /*
+* No need to use i_size_read() here, the i_size
+* cannot change under us because we hold the i_mutex.
+*/
+   if (pos > inode->i_size)
+   i_size_write(inode, pos);
+   set_page_dirty(page);
+   }
return 0;
 }
 
Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -2344,6 +2344,8 @@ int nobh_prepare_write(struct page *page
 
if (is_mapped_to_disk)
SetPageMappedToDisk(page);
+
+   /* XXX: information leak vs read(2) */
SetPageUptodate(page);
 
/*


[patch 0/10] buffered write deadlock fix

2007-01-12 Thread Nick Piggin
The following set of patches attempt to fix the buffered write
locking problems (and there are a couple of peripheral patches
and cleanups there too).

This does pass the write deadlock tests that otherwise fail.

Has survived a few hours of fsx-linux on ext2 and 3.

Patches against 2.6.20-rc4. I didn't have the heart to attempt
to rebase them on -mm, at least until I get some feedback ;)

Thanks,
Nick

--
SuSE Labs



Re: 2.6.20-rc3 regression: suspend to RAM broken on Mac mini Core Duo

2007-01-12 Thread Tino Keitel
On Fri, Jan 12, 2007 at 14:50:25 +, Pavel Machek wrote:
> Hi!
> 
> > > >> > It didn't. It looks like it is unusable, because it isn't reliable in
> > > >> > 2.6.20-rc3.
> > > >>
> > > >> Is this issue still present in -rc4?
> > > >
> > > >I used 2.6.20-rc4 in single user mode, and applied 2 patches from
> > > >netdev to get wake on LAN support. This way I was able to set up an
> > > >automatic suspend/resume loop. It looked good, but after e.g. 20
> > > > >minutes, the resume hung. So it is reproducible with 2.6.20-rc4.
> > > >Unfortunately, I can not test the same with 2.6.18, as the wake on LAN
> > > >patches need 2.6.20-rc.
> > > 
> > > Hmm, do you mean this is the first time of this kind of testing?
> > > Is this issue related to LAN driver?
> > > I guess you should be able to set up an automatic suspend/resume loop
> > > with /proc/acpi/alarm, and test similar with 2.6.18.
> > 
> > Thanks for the hint. I just used /proc/acpi/alarm to set up a
> > suspend/resume loop and did ca. 100 cycles in a row with 2.6.18.2 in
> > single user mode, without a failure.
> 
> Can you do similar test on 2.6.20 -- w/o network driver loaded (and
> generally minimum drivers?)

I think I found the problem. In 2.6.18, I had a slightly different
config. With 2.6.20-rc4, I had successful suspend/resume cycles without
the USB DVB-T box attached. I tweaked the USB options a bit and
activated some options (CONFIG_USB_SUSPEND,
CONFIG_USB_MULTITHREAD_PROBE, CONFIG_USB_EHCI_SPLIT_ISO,
CONFIG_USB_EHCI_ROOT_HUB_TT, CONFIG_USB_EHCI_TT_NEWSCHED) and now I can
suspend/resume without hangs. At least I haven't seen one until now.

Thanks for your patience and regards,
Tino



Re: Linux v2.6.20-rc5

2007-01-12 Thread Nigel Cunningham
Hi.

On Fri, 2007-01-12 at 14:26 -0800, Andrew Morton wrote:
> On Fri, 12 Jan 2007 14:27:48 -0500 (EST)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > 
> > Ok, there it is, in all its shining glory.
> > 
> 
> It still doesn't run Excel.

Heretic!

:)




Re: "svc: unknown version (3)" when CONFIG_NFSD_V4=y

2007-01-12 Thread Fengguang Wu
On Sat, Jan 13, 2007 at 06:43:07AM +1100, Neil Brown wrote:
> 
> Ok, thanks.  I must have missed something else wrong in the code..
> 
> Probably this 'break' in the wrong place...
> 
> Could you try this patch instead please - or just move the 'break' to
> where it should be.

Now it worked :)

Thanks,
Wu

> Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
> 
> ### Diffstat output
>  ./fs/nfsd/nfssvc.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff .prev/fs/nfsd/nfssvc.c ./fs/nfsd/nfssvc.c
> --- .prev/fs/nfsd/nfssvc.c2007-01-11 14:55:38.0 +1100
> +++ ./fs/nfsd/nfssvc.c2007-01-13 06:40:12.0 +1100
> @@ -72,7 +72,7 @@ static struct svc_program   nfsd_acl_progr
>   .pg_prog= NFS_ACL_PROGRAM,
>   .pg_nvers   = NFSD_ACL_NRVERS,
>   .pg_vers= nfsd_acl_versions,
> - .pg_name= "nfsd",
> + .pg_name= "nfsacl",
>   .pg_class   = "nfsd",
>   .pg_stats   = _acl_svcstats,
>   .pg_authenticate= _set_client,
> @@ -118,16 +118,16 @@ int nfsd_vers(int vers, enum vers_op cha
>   switch(change) {
>   case NFSD_SET:
>   nfsd_versions[vers] = nfsd_version[vers];
> - break;
>  #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
>   if (vers < NFSD_ACL_NRVERS)
> - nfsd_acl_version[vers] = nfsd_acl_version[vers];
> + nfsd_acl_versions[vers] = nfsd_acl_version[vers];
>  #endif
> + break;
>   case NFSD_CLEAR:
>   nfsd_versions[vers] = NULL;
>  #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
>   if (vers < NFSD_ACL_NRVERS)
> - nfsd_acl_version[vers] = NULL;
> + nfsd_acl_versions[vers] = NULL;
>  #endif
>   break;
>   case NFSD_TEST:


Re: Proposed changes for libata speed handling

2007-01-12 Thread Tejun Heo
Alan wrote:
> I'm currently hacking on the speed handling code a bit
> 
> I'd like to do the following unless anyone has any objections
> 
> - Remove post_set_mode and make drivers wrap the guts of the existing
> set_mode() function. This allows a driver to wrap and see success/failure
> while removing a callback, and also to add pre-mode code. (ie you'd do
> 
> foo_set_mode() {
> ata_default_set_mode()
> my_fiddling();
> }
> 
> - Fix the ->set_mode method FIXMEs in the current tree [DONE]
> 
> - Add set_specific_mode, with a default behaviour that works for most
> controllers. Those using a private ->set_mode might need a private
> ->set_specific_mode, in some cases like it8212 simply to error the request
> 
> - Hook set_specific_mode to the ata command parser so that instead of
> erroring set_features commands we snoop them and force the mode change
> desired on the controller (if valid)
> 
> - Send the command to set the speed before setting the controller speed,
> so that we send them at the right rate.
> 
> Any comments ?

Wouldn't it be better to have ->determine_xfer_mask() and
->set_specific_mode() than having two somewhat overlapping callbacks?
Or is there some problem that can't be handled that way?

Thanks.

-- 
tejun


Re: 2.6.20-rc3-mm1 - git-block.patch causes hard lockups

2007-01-12 Thread Valdis . Kletnieks
On , [EMAIL PROTECTED] said:
> On Thu, 04 Jan 2007 22:02:00 PST, Andrew Morton said:
> 
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc3-mm1/

Still seeing this in -rc4-mm1..

> With git-block.patch applied, my system locks up *hard* at system shutdown
> time - even alt-sysrq doesn't do anything.  Need to do the "power button for 5"
> stunt to get the system back.

And today's chef's special is wild crow, tastefully prepared in a stir-fry
with broccoli and mixed asiatic vegetables, served on a bed of steamed rice...

It doesn't look quite as locked up hard when you fix the ##$*&% script that
did an 'echo 0 > /proc/sys/kernel/sysrq' :)

Here's the hand-copied traceback:

__mutex_lock_slowpath+0x22/0xaa
mutex_lock+0xe/0x10
synchronize_rcu+0x23/0xc5
blk_sync_queue+0x1d/0x5a
blk_release_queue+0x19/0x65
kobject_cleanup+0x53/0x72
kobject_release+0x0/0xf
kobject_release+0xd/0xf
kref_put+0x5f/0x6b
kobject_put+0x19/0x1b
blk_put_queue+0x43/0x48
dm_put+0x11f/0x133
dev_remove+0xa3/0xb7
ctl_ioctl+0x24f/0x29f
dev_remove+0x0/0xb7
file_has_perm+0xa7/0xb6
do_ioctl+0x5e/0x77
vfs_ioctl+0x252/0x26f
sys_ioctl+0x5f/0x82
tracesys+0xdc/0xe1


> The system is Fedora Core 6/Rawhide, and the last command issued (from
> /etc/rc6.d/S01reboot) is "/sbin/cryptsetup remove swap".  It hits that,
> and *wham* we're dead.  Works fine if I revert git-block.patch.
> 
> The line from /etc/crypttab for the encrypted swap:
> 
> swap /dev/mapper/VolGroup00-swap /dev/urandom swap,cipher=aes-cbc-essiv:sha256




Re: /sys/$DEVPATH/uevent vs uevent attributes

2007-01-12 Thread Michael Tokarev
Greg KH wrote:
> On Fri, Jan 12, 2007 at 10:32:10PM +0300, Michael Tokarev wrote:
>> (No patch at this time, -- just asking about an.. idea ;)
> 
> Let's see what such a patch looks like to see if it would be workable or
> not.

Umm.. it's definitely workable, and even almost trivial.

Just splitting kobject_uevent() routine into two parts, one to format
the environment variables, and one to actually send things over netlink
and executing the hotplug_helper if defined, and using the first part
to format the content of `uevent' file will do the trick.
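
Editorial note: the split described above could look roughly like the
userspace sketch below, with one routine that only formats the environment
and a separate reader that reuses it. Every name here (model_kobj,
model_format_uevent_env, model_uevent_show) and the choice of "add" as the
reported action are assumptions for illustration, not the kobject API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the object whose event is being described. */
struct model_kobj {
	const char *devpath;
	const char *subsystem;
};

/*
 * Part one: format the environment variables into buf.
 * Returns the number of bytes written, or -1 if they do not fit.
 */
int model_format_uevent_env(const struct model_kobj *kobj, const char *action,
			    char *buf, size_t size)
{
	int n = snprintf(buf, size, "ACTION=%s\nDEVPATH=%s\nSUBSYSTEM=%s\n",
			 action, kobj->devpath, kobj->subsystem);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}

/* Reading the attribute file reuses the formatter instead of duplicating it. */
int model_uevent_show(const struct model_kobj *kobj, char *buf, size_t size)
{
	return model_format_uevent_env(kobj, "add", buf, size);
}
```

The sending half (netlink broadcast and the hotplug helper) would call the
same formatter and then hand the buffer to its transport, which is left out
of this sketch.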

I don't know how to do the last part.

> And no one forces you to use udev, I have machines with a static /dev
> that work just fine :)

It has less and less chances to work correctly.  For example, this dynamic
sdX thing, when I don't know anymore which sdX is which, without some help
from /dev/disk/by-XXX/.

And more and more software requires udev, at least as packaged by distros.
For example, today I've got rid of udev on one of our servers, which has
been installed (debian) due to xen-utils having Depends: udev.  Even when
it doesn't *really* *require* udev, -- i replaced the whole thing with a
5-line shell script.

/mjt


Re: SATA hotplug from the user side ?

2007-01-12 Thread Tejun Heo
Soeren Sonnenburg wrote:
> It is true it detects a removal and newly plugged devices immediately...
> However it still prints warnings and errors that it could not
> synchronize SCSI cache for the disks. Then it prints regular 'rejects
> I/O to dead device' warning messages and on replugging the disks puts
> them to the next free sd device (e.g. sdc -> sdd).

You need to stop using the devices before unplugging.  If you have no
pending IO to the device, there won't be 'rejects IO to dead device'
messages.  You can ignore the SCSI cache sync failure if the device is
properly closed before being unplugged.

> These messages sound evil - so now the question is should I care ?
> ( On the other hand it did not crash the machine )

So, no, you don't really have to care.  Just make sure the device is
unmounted prior to unplugging.

-- 
tejun


Re: [patch] Fix bttv and friends on 64bit machines with lots of memory.

2007-01-12 Thread hermann pitton
Am Freitag, den 12.01.2007, 22:42 -0200 schrieb Mauro Carvalho Chehab:
> Em Qui, 2007-01-11 às 00:41 +0100, hermann pitton escreveu:
> > Am Mittwoch, den 10.01.2007, 09:58 +0100 schrieb Gerd Hoffmann:
> > >   Hi,
> > > 
> > > We have a DMA32 zone now, lets use it to make sure the card
> > > can reach the memory we have allocated for the video frame
> > > buffers.
> > > 
> > > please apply,
> > > 
> > >   Gerd
> > 
> > Hi,
> > 
> > did anybody already pick up, comment, review Gerd's patch ?
> > 
> > Walks in into his own home like a stranger ...
> > 
> > Gerd, THANKS for all you did.
> > It was an incredible lot!
> 
> Hermann,
> 
> I just picked it today. I was out this week due to a physical damage at
> the hd on my notebook, where my mailboxes are retrieved. Only today I
> have it on a stable condition to return back to activities, successfully
> recovering my /home on it.

Mauro, Gerd,

sorry to be a pain with this one,
just thought it could be a missing each other.

Our maintainers don't need to excuse for anything!

Adrian and all, thanks for fixing the remaining bugs.

Cheers,
Hermann






Re: [PATCH 0/4] Linux Kernel Markers

2007-01-12 Thread Richard J Moore


Mathieu Desnoyers <[EMAIL PROTECTED]> wrote on 20/12/2006 23:52:16:

> Hi,
>
> You will find, in the following posts, the latest revision of the Linux Kernel
> Markers. Due to the need some tracing projects (LTTng, SystemTAP) has of this
> kind of mechanism, it could be nice to consider it for mainstream inclusion.
>
> The following patches apply on 2.6.20-rc1-git7.
>
> Signed-off-by : Mathieu Desnoyers <[EMAIL PROTECTED]>

Mathieu, FWIW I like this idea. A few years ago I implemented something
similar, but that had no explicit clients. Consequently I made my hooks
code more generalized than is needed in practice. I do remember that Karim
reworked the LTT instrumentation to use hooks and it worked fine.

You've got the same optimizations for x86 by modifying an instruction's
immediate operand and thus avoiding a d-cache hit. The only real caveat is
the need to avoid the unsynchronised cross modification erratum. Which
means that all processors will need to issue a serializing operation before
executing a Marker whose state is changed. How is that handled?

One additional thing we did, which might be useful at some future point,
was adding a /proc interface. We reflected the current instrumentation
through /proc and gave the status of each hook. We even talked about being
able to enable or disable instrumentation by writing to /proc but I don't
think we ever implemented this.

It's high time we settled the issue of instrumentation. It gets my vote,

Good luck!

Richard

- -
Richard J Moore
IBM Linux Technology Centre



Re: ahci_softreset prevents acpi_power_off

2007-01-12 Thread Tejun Heo
Hello,

Faik Uygur wrote:
> We have a Sony PCG-6H1M laptop. It started failing to poweroff with our switch
> from 2.6.16 stable series kernels to 2.6.18 stable series. Rebooting works.
> 
> While searching for the cause, I have found these reported bug reports in the
> kernel bugzilla which may be related to this bug:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=6982
> http://bugzilla.kernel.org/show_bug.cgi?id=7447

Seems mostly unrelated.

> According to git bisect, this is the first bad commit:
> 
> 4658f79bec0b51222e769e328c2923f39f3bda77 is first bad commit
> commit 4658f79bec0b51222e769e328c2923f39f3bda77
> Author: Tejun Heo <[EMAIL PROTECTED]>
> Date:   Wed Mar 22 21:07:03 2006 +0900
> 
> [PATCH] ahci: add softreset
> 
> Now that libata is smart enought to handle both soft and hard resets,
> add softreset method.
> 
> Signed-off-by: Tejun Heo <[EMAIL PROTECTED]>
> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
> 
> :04 04 ba0a16d0ef82b6577bb61cfb18e6d9df9ee0984e d0fc78d8f9bbe238f98ac8964562a33e64b30605 M  drivers
> 
> With v2.6.20-rc4 from git, it is still failing to poweroff. By not compiling 
> CONFIG_SCSI_SATA_AHCI, it successfully powers off.
> 
> Also with CONFIG_SCSI_SATA_AHCI, reverting this patch manually by setting 
> softreset to NULL in ata_do_eh calls in ahci.c makes the machine poweroff.

Wow, this is one of the most amazing error reports.  ahci softreset
preventing system halt?

> I have attached the dmesg output with defined ATA_DEBUG, ATA_VERBOSE_DEBUG
> if it helps. Also you may find lspci output attached. 
> 
> Please let me know if anything else is needed.

Does everything else work okay?  Can you access devices attached to
ahci?  What happens when you try to shutdown?  If possible, please post
dmesg of shutting down.  You can store it easily using netconsole
(Documentation/networking/netconsole.txt).

Thanks.

-- 
tejun


Re: High lock spin time for zone->lru_lock under extreme conditions

2007-01-12 Thread Andrew Morton
On Fri, 12 Jan 2007 17:00:39 -0800
Ravikiran G Thirumalai <[EMAIL PROTECTED]> wrote:

> But is
> lru_lock an issue is another question.

I doubt it, although there might be changes we can make in there to
work around it.




Re: High lock spin time for zone->lru_lock under extreme conditions

2007-01-12 Thread Ravikiran G Thirumalai
On Fri, Jan 12, 2007 at 01:45:43PM -0800, Christoph Lameter wrote:
> On Fri, 12 Jan 2007, Ravikiran G Thirumalai wrote:
> 
> Moreover most atomic operations are to remote memory which is also 
> increasing the problem by making the atomic ops take longer. Typically 
> mature NUMA systems have implemented hardware provisions that can deal with 
> such high degrees of contention. If this is simply a SMP system that was
> turned into a NUMA box then this is a new hardware scenario for the 
> engineers.

This is using HT as all AMD systems do, but this is one of the 8
socket systems.  

I ran the same test on a 2 node Tyan AMD box, and did not notice the
atrocious spin times. It would be interesting to see how a 4 socket HT box
would fare. Unfortunately, I do not have access to one. If someone has access
to such a box, I can provide the test case and instrumentation patches.

It could very well be a hardware limitation in this case, which is all
the more reason to enable interrupts while spinning on spin locks. But whether
lru_lock is an issue is another question.

Thanks,
Kiran
 


Re: PROBLEM: sata_sil24 lockups under heavy i/o

2007-01-12 Thread Tejun Heo
Mark Wagner wrote:
> The sil24-connected sata drives are external and connected to their own
> power supply.
> 
> I've replaced the sil24-based card with a Promise SATA300 TX4 controller
> card and everything seems to work now.

Hmmm... sil24 fares well with four ports occupied.  Weird.  Care to give
it another shot?  Maybe pci bus contact was bad or something.

-- 
tejun


Re: [BUG] NMI watchdog lockups caused by mwait_idle

2007-01-12 Thread Darrick J. Wong
Pallipadi, Venkatesh wrote:
> Darrick,
> 
> I tried 2.6.20-rc4 on a Dempsey system here in my lab and it worked
> fine. No watchdog lockups.
> Can you try idle routine with hlt instead of mwait. There is no boot
> option for this in x86_64, but you can change
> arch/x86_64/kernel/process.c:select_idle_routine() not to enable mwait.
> With that default kernel should use hlt based idle.
> 
> Also, worth seeing will be, what happens when nmi_watchdog=0,
> nmi_watchdog=1, and nmi_watchdog=2 boot options. That should tell us
> whether nmi_watchdog is raising some false alarm or the CPUs are indeed
> getting locked up here..
> 

Locks up with hlt-based idle too. :(

Here's what I get with nmi_watchdog=0:

[  206.088703] BUG: soft lockup detected on CPU#0!
[  206.093284] 
[  206.093286] Call Trace:
[  206.097324][] softlockup_tick+0xd4/0xe9
[  206.103618]  [] do_flush_tlb_all+0x0/0x68
[  206.109238]  [] run_local_timers+0x13/0x15
[  206.114949]  [] update_process_times+0x4c/0x78
[  206.121008]  [] smp_local_timer_interrupt+0x34/0x51
[  206.127498]  [] smp_apic_timer_interrupt+0x49/0x60
[  206.133901]  [] apic_timer_interrupt+0x66/0x70
[  206.139956][] __smp_call_function+0x66/0x87
[  206.146594]  [] __smp_call_function+0x62/0x87
[  206.152564]  [] do_flush_tlb_all+0x0/0x68
[  206.158188]  [] do_flush_tlb_all+0x0/0x68
[  206.163813]  [] smp_call_function+0x32/0x49
[  206.169611]  [] do_flush_tlb_all+0x0/0x68
[  206.175236]  [] on_each_cpu+0x30/0x67
[  206.180514]  [] flush_tlb_all+0x1c/0x1e
[  206.185965]  [] unmap_vm_area+0x1c3/0x265
[  206.191590]  [] init_level4_pgt+0xc20/0x1000
[  206.197474]  [] remove_vm_area+0x41/0x67
[  206.203010]  [] iounmap+0x8e/0xc8
[  206.207933]  [] acpi_os_unmap_memory+0x9/0xb
[  206.213810]  [] acpi_ev_system_memory_region_setup+0x52/0x105
[  206.221174]  [] acpi_ut_delete_internal_obj+0x2c4/0x3b2
[  206.228012]  [] acpi_ut_update_ref_count+0x180/0x1d2
[  206.234587]  [] acpi_ut_update_object_reference+0x160/0x207
[  206.241770]  [] acpi_ut_remove_reference+0xb5/0xd5
[  206.248173]  [] acpi_ns_detach_object+0xca/0xee
[  206.254318]  [] acpi_ns_delete_namespace_by_owner+0xcf/0x154
[  206.261597]  [] acpi_ds_terminate_control_method+0xb5/0x14f
[  206.268779]  [] acpi_ps_parse_aml+0x242/0x3a0
[  206.274750]  [] acpi_ps_execute_pass+0xd5/0x10b
[  206.280895]  [] acpi_ps_execute_method+0x1bf/0x2cb
[  206.287298]  [] acpi_ns_evaluate+0x1f8/0x315
[  206.293180]  [] acpi_evaluate_object+0x1d9/0x2fa
[  206.299411]  [] kmem_cache_alloc+0xce/0xda
[  206.305125]  [] :processor:acpi_processor_start+0x656/0x6fd
[  206.312307]  [] kmem_cache_zalloc+0xce/0xf4
[  206.318103]  [] acpi_start_single_object+0x2a/0x54
[  206.324509]  [] acpi_bus_register_driver+0xcd/0x14c
[  206.331001]  [] :processor:acpi_processor_init+0x61/0xb7
[  206.337923]  [] sys_init_module+0xac/0x16c
[  206.343630]  [] system_call+0x7e/0x83

nmi_watchdog={1,2} produce the same errors.

--D


Re: /sys/$DEVPATH/uevent vs uevent attributes

2007-01-12 Thread Greg KH
On Fri, Jan 12, 2007 at 10:32:10PM +0300, Michael Tokarev wrote:
> 
> (No patch at this time, -- just asking about an.. idea ;)

Let's see what such a patch looks like to see if it would be workable or
not.

And no one forces you to use udev, I have machines with a static /dev
that work just fine :)

thanks,

greg k-h


Re: Early ACPI lockup (was Re: 2.6.20-rc4-mm1)

2007-01-12 Thread Frederik Deweerdt
On Sat, Jan 13, 2007 at 01:08:46AM +0100, Michal Piotrowski wrote:
> Jiri Slaby wrote:
> > Frederik Deweerdt wrote:
> >> On Fri, Jan 12, 2007 at 05:53:08PM -0500, Len Brown wrote:
> >>> On Friday 12 January 2007 05:20, Frederik Deweerdt wrote:
> >>>> On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
> >>>>>
> >>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc4-mm1/
> >>>>>
> >>>> Hi,
> >>>>
> >>>> The git-acpi.patch replaces earlier "if(!handler) return -EINVAL" by
> >>>> "BUG_ON(!handler)". This locks my machine early at boot with a message
> >>>> along the lines of (It's hand copied):
> >>>> Int 6: cr2:  eip: c0570e05 flags: 00010046 cs: 60
> >>>> stack: c054ffac c011db2b c04936d0 c054ff68 c054ffc0 c054fff4 c057da2c
> >>>>
> >>>> Reverting the change as follows, allows booting:
> >>>> Any ideas to debug this further?
> >>>> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> >>>> index db0c5f6..fba018c 100644
> >>>> --- a/drivers/acpi/tables.c
> >>>> +++ b/drivers/acpi/tables.c
> >>>> @@ -414,7 +414,9 @@ int __init acpi_table_parse(enum acpi_ta
> >>>>  unsigned int index;
> >>>>  unsigned int count = 0;
> >>>>
> >>>> -BUG_ON(!handler);
> >>>> +if (!handler)
> >>>> +return -EINVAL;
> >>>> +/*BUG_ON(!handler);*/
> >>>>
> >>>>  for (i = 0; i < sdt_count; i++) {
> >>>>  if (sdt_entry[i].id != id)
> >>> What do you see if on failure you also print out the params, like below?
> > 
> > I get this:
> > 
> > ACPI: RSDP (v000 GBT   ) @ 0x000f6e80
> > ACPI: RSDT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff3000
> > ACPI: FADT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff3040
> > ACPI: MADT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff7100
> > ACPI: DSDT (v001 GBTAWRDACPI 0x1000 MSFT 0x010c) @ 0x
> > ACPI: PM-Timer IO Port: 0x1008
> > ACPI: Local APIC address 0xfee0
> > ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> > Processor #0 15:2 APIC version 20
> > ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> > Processor #1 15:2 APIC version 20
> > ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
> > ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
> > ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
> > IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> > ACPI: IRQ0 used by override.
> > ACPI: IRQ2 used by override.
> > ACPI: IRQ9 used by override.
> > Enabling APIC mode:  Flat.  Using 1 I/O APICs
> > ACPI: acpi_table_parse(17, ) HPET NULL handler!
> > Using ACPI (MADT) for SMP configuration information
> > 
> 
> ACPI: acpi_table_parse(17, ) HPET NULL handler!
So the BUG_ON is triggered by CONFIG_HPET_TIMER not being defined,
causing acpi_parse_hpet to be NULL.
Should the acpi_table_parse() call be ifdef'ed, or is the previous
behaviour (returning -EINVAL) just OK?
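
For reference, the difference between the two behaviours can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
table_handler_t, parse_table() and demo_handler() are made-up names, not the
kernel's acpi_table_parse() code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for the table handler argument. */
typedef int (*table_handler_t)(int table_id);

/*
 * Illustrative only: a missing handler is reported to the caller with
 * -EINVAL (the earlier behaviour) instead of stopping the machine the
 * way BUG_ON(!handler) does.
 */
static int parse_table(int table_id, table_handler_t handler)
{
	if (!handler)
		return -EINVAL;	/* e.g. the feature providing it is compiled out */

	return handler(table_id);
}

/* Hypothetical handler that simply reports success. */
static int demo_handler(int table_id)
{
	(void)table_id;
	return 0;
}
```

With this shape a caller built without the feature gets an error code back
and boot can continue, which matches what the revert above restores.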

Regards,
Frederik


Re: Early ACPI lockup (was Re: 2.6.20-rc4-mm1)

2007-01-12 Thread Jiri Slaby
Jiri Slaby wrote:
>> On Fri, Jan 12, 2007 at 05:53:08PM -0500, Len Brown wrote:
>>> What do you see if on failure you also print out the params, like below?
[...]
> ACPI: acpi_table_parse(17, ) HPET NULL handler!

After re-enabling HPET, it disappeared.

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E


Re: [patch] Fix bttv and friends on 64bit machines with lots of memory.

2007-01-12 Thread Mauro Carvalho Chehab
On Thu, 2007-01-11 at 00:41 +0100, hermann pitton wrote:
> On Wednesday, 10.01.2007, at 09:58 +0100, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > We have a DMA32 zone now, let's use it to make sure the card
> > can reach the memory we have allocated for the video frame
> > buffers.
> > 
> > please apply,
> > 
> >   Gerd
> 
> Hi,
> 
> did anybody already pick up, comment, review Gerd's patch ?
> 
> Walks in into his own home like a stranger ...
> 
> Gerd, THANKS for all you did.
> It was an incredible lot!

Hermann,

I just picked it up today. I was out this week due to physical damage to
the HD on my notebook, where my mailboxes are retrieved. Only today do I
have it in a stable condition to return to activities, having successfully
recovered my /home on it.




Re: list_del corruption with fedora 6 kernels (fc5 was ok)

2007-01-12 Thread Alexey Dobriyan
On Fri, Jan 12, 2007 at 07:27:30PM -0500, Lee Revell wrote:
> On Sat, 2007-01-13 at 00:34 +0100, Karl Kiniger wrote:
> > how to track this down?
>
> Reproduce it with an untainted kernel (no nvidia or vmware modules) and
> repost.

How about big fat advice in every tainted oops to bugger off?



Re: list_del corruption with fedora 6 kernels (fc5 was ok)

2007-01-12 Thread Lee Revell
On Sat, 2007-01-13 at 00:34 +0100, Karl Kiniger wrote:
> how to track this down?

Reproduce it with an untainted kernel (no nvidia or vmware modules) and
repost.

Lee



RE: [BUG] NMI watchdog lockups caused by mwait_idle

2007-01-12 Thread Pallipadi, Venkatesh

Darrick,

I tried 2.6.20-rc4 on a Dempsey system here in my lab and it worked
fine. No watchdog lockups.
Can you try the idle routine with hlt instead of mwait? There is no boot
option for this in x86_64, but you can change
arch/x86_64/kernel/process.c:select_idle_routine() not to enable mwait.
With that, the default kernel should use hlt-based idle.

Also worth seeing is what happens with the nmi_watchdog=0,
nmi_watchdog=1, and nmi_watchdog=2 boot options. That should tell us
whether nmi_watchdog is raising a false alarm or the CPUs are indeed
getting locked up here.

Thanks,
Venki


>-----Original Message-----
>From: Darrick J. Wong [mailto:[EMAIL PROTECTED] 
>Sent: Friday, January 12, 2007 1:01 PM
>To: Pallipadi, Venkatesh
>Cc: Linux Kernel Mailing List
>Subject: [BUG] NMI watchdog lockups caused by mwait_idle
>
>Hi Venkatesh,
>
>I have an IBM IntelliStation Z30 with two Dempsey CPUs.  When I try to
>boot 2.6.20-rc4 on it, the system prints messages about NMI watchdog
>lockups.  git-bisect determined that the patch "[PATCH] x86-64: Fix
>interrupt race in idle callback (3rd try)" was the source of these
>problems, and I can work around the problem either by passing
>"idle=poll" to avoid mwait_idle or by reverting the patch.
>
>Other non-Dempsey Xeon machines with mwait support do not exhibit these
>symptoms.  I will try to determine if this is a bug specific to Dempsey
>CPUs or this particular type of machine.  I suspect the latter, but I
>don't know enough about monitor/mwait to pursue this much further.
>
>What else can I do to diagnose this?
>
>--D
>


Re: Early ACPI lockup (was Re: 2.6.20-rc4-mm1)

2007-01-12 Thread Michal Piotrowski
Jiri Slaby wrote:
> Frederik Deweerdt wrote:
>> On Fri, Jan 12, 2007 at 05:53:08PM -0500, Len Brown wrote:
>>> On Friday 12 January 2007 05:20, Frederik Deweerdt wrote:
>>>> On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
>>>>>
>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc4-mm1/
>>>>>
>>>> Hi,
>>>>
>>>> The git-acpi.patch replaces earlier "if(!handler) return -EINVAL" by
>>>> "BUG_ON(!handler)". This locks my machine early at boot with a message
>>>> along the lines of (It's hand copied):
>>>> Int 6: cr2:  eip: c0570e05 flags: 00010046 cs: 60
>>>> stack: c054ffac c011db2b c04936d0 c054ff68 c054ffc0 c054fff4 c057da2c
>>>>
>>>> Reverting the change as follows, allows booting:
>>>> Any ideas to debug this further?
>>>> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
>>>> index db0c5f6..fba018c 100644
>>>> --- a/drivers/acpi/tables.c
>>>> +++ b/drivers/acpi/tables.c
>>>> @@ -414,7 +414,9 @@ int __init acpi_table_parse(enum acpi_ta
>>>>  unsigned int index;
>>>>  unsigned int count = 0;
>>>>
>>>> -  BUG_ON(!handler);
>>>> +  if (!handler)
>>>> +  return -EINVAL;
>>>> +  /*BUG_ON(!handler);*/
>>>>
>>>>  for (i = 0; i < sdt_count; i++) {
>>>>  if (sdt_entry[i].id != id)
>>> What do you see if on failure you also print out the params, like below?
> 
> I get this:
> 
> ACPI: RSDP (v000 GBT   ) @ 0x000f6e80
> ACPI: RSDT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff3000
> ACPI: FADT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff3040
> ACPI: MADT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff7100
> ACPI: DSDT (v001 GBTAWRDACPI 0x1000 MSFT 0x010c) @ 0x
> ACPI: PM-Timer IO Port: 0x1008
> ACPI: Local APIC address 0xfee0
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> Processor #0 15:2 APIC version 20
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> Processor #1 15:2 APIC version 20
> ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
> ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
> IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: IRQ0 used by override.
> ACPI: IRQ2 used by override.
> ACPI: IRQ9 used by override.
> Enabling APIC mode:  Flat.  Using 1 I/O APICs
> ACPI: acpi_table_parse(17, ) HPET NULL handler!
> Using ACPI (MADT) for SMP configuration information
> 

ACPI: RSDP (v000 ACPIAM) @ 0x000f9e30
ACPI: RSDT (v001 A M I  OEMRSDT  0x1414 MSFT 0x0097) @ 0x7ff3
ACPI: FADT (v002 A M I  OEMFACP  0x1414 MSFT 0x0097) @ 0x7ff30200
ACPI: MADT (v001 A M I  OEMAPIC  0x1414 MSFT 0x0097) @ 0x7ff30390
ACPI: OEMB (v001 A M I  OEMBIOS  0x1414 MSFT 0x0097) @ 0x7ff40040
ACPI: DSDT (v001  P4P81 P4P81104 0x0104 INTL 0x02002026) @ 0x
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
ACPI: acpi_table_parse(17, ) HPET NULL handler!
Using ACPI (MADT) for SMP configuration information

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)


list_del corruption with fedora 6 kernels (fc5 was ok)

2007-01-12 Thread Karl Kiniger
Hi,

These trigger about 1-2 times per week at random times. I don't see a
pattern: one time it happened after plugging in the USB headphone, another
time it happened while the machine was more or less idle.

The machine does not reboot automatically (/proc/sys/kernel/panic is set to 20).

Most of the time the panic does not make it into the syslog, but I have been
lucky three times.

how to track this down?

Greetings,
Karl


NB: the v4l/bt848 stuff is not being used at all.

/var/log/messages.3:Dec 19 11:58:38 wszip-kinigka kernel: list_del corruption. next->prev should be c6e1f2c0, but was 35b0
/var/log/messages.4:Dec 13 15:57:07 wszip-kinigka kernel: list_del corruption. next->prev should be c9a24be0, but was c1284fe0
(backtraces are essentially the same)
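
For readers unfamiliar with these messages: they come from a consistency
check made before an entry is unlinked from a doubly linked list. Below is a
minimal userspace sketch of that kind of check, not the kernel's
lib/list_debug.c itself; struct list_node, list_init(), list_add_after() and
list_del_checked() are names assumed for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, modelled loosely on list_head. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry directly after head. */
static void list_add_after(struct list_node *entry, struct list_node *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Returns false and leaves the list untouched when they do not, which is
 * the situation reported as "prev->next should be X, but was Y".
 */
static bool list_del_checked(struct list_node *entry)
{
	if (entry->prev->next != entry)
		return false;
	if (entry->next->prev != entry)
		return false;

	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = NULL;
	entry->prev = NULL;
	return true;
}
```

A failed check means something else overwrote or freed a neighbouring node,
so the code that trips the check is often not the code that caused the damage.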

from today:

Jan 12 10:57:23 wszip-kinigka kernel: list_del corruption. prev->next should be 
ea24aa20, but was ea240080
Jan 12 10:57:23 wszip-kinigka kernel: [ cut here ]
Jan 12 10:57:23 wszip-kinigka kernel: kernel BUG at lib/list_debug.c:65!
Jan 12 10:57:23 wszip-kinigka kernel: invalid opcode:  [#1]
Jan 12 10:57:23 wszip-kinigka kernel: SMP 
Jan 12 10:57:23 wszip-kinigka kernel: last sysfs file: /class/net/lo/ifindex
Jan 12 10:57:23 wszip-kinigka kernel: Modules linked in: snd_usb_audio vfat fat 
hfsplus nls_utf8 cifs sbp2 sg usb_storage tun snd_usb_lib autofs4 hidp rfcomm 
l2cap bluetooth vmnet(U) vmmon(U) sunrpc ib_iser rdma_cm ib_addr ib_cm ib_sa 
ib_mad ib_core iscsi_tcp libiscsi scsi_transport_iscsi ipv6 reiserfs loop 
dm_multipath parport_pc lp parport bt878 snd_bt87x snd_cmipci tuner tvaudio 
snd_seq_dummy gameport snd_seq_oss snd_opl3_lib bttv video_buf ir_common 
snd_hwdep snd_seq_midi_event nvidia(U) snd_mpu401_uart snd_seq compat_ioctl32 
snd_pcm_oss i2c_algo_bit snd_rawmidi btcx_risc snd_mixer_oss snd_seq_device 
snd_pcm tveeprom snd_timer videodev ide_cd ohci1394 3c59x v4l1_compat 
v4l2_common snd i2c_core ieee1394 snd_page_alloc cdrom floppy mii soundcore 
serio_raw pcspkr dm_snapshot dm_zero dm_mirror dm_mod aic7xxx 
scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Jan 12 10:57:23 wszip-kinigka kernel: CPU:0
Jan 12 10:57:23 wszip-kinigka kernel: EIP:0060:[]Tainted: P   
   VLI
Jan 12 10:57:23 wszip-kinigka kernel: EFLAGS: 00010096   (2.6.18-1.2869.fc6 #1) 
Jan 12 10:57:23 wszip-kinigka kernel: EIP is at list_del+0x23/0x6c
Jan 12 10:57:23 wszip-kinigka kernel: eax: 0048   ebx: ea24aa20   ecx: 
c067e1d0   edx: 0092
Jan 12 10:57:23 wszip-kinigka kernel: esi: f7ffd6c0   edi: cb841000   ebp: 
f7fffe80   esp: f7fefef8
Jan 12 10:57:23 wszip-kinigka kernel: ds: 007b   es: 007b   ss: 0068
Jan 12 10:57:23 wszip-kinigka kernel: Process events/0 (pid: 5, ti=f7fef000 
task=f7d80030 task.ti=f7fef000)
Jan 12 10:57:23 wszip-kinigka kernel: Stack: c0641c4f ea24aa20 ea240080 
ea24aa20 c046b553 f7f7a1c0 0005 0004 
Jan 12 10:57:23 wszip-kinigka kernel:f7ffdef0 f7ffdee0 0005 
f7ffdec0  c046b656   
Jan 12 10:57:23 wszip-kinigka kernel:f7fffe80 f7ffd6e4 f7ffd6c0 
f7fffe80 c18fd340 0282 c046ca7a  
Jan 12 10:57:23 wszip-kinigka kernel: Call Trace:
Jan 12 10:57:23 wszip-kinigka kernel:  [] free_block+0x63/0xdc
Jan 12 10:57:23 wszip-kinigka kernel:  [] drain_array+0x8a/0xb5
Jan 12 10:57:23 wszip-kinigka kernel:  [] cache_reap+0x53/0x117
Jan 12 10:57:23 wszip-kinigka kernel:  [] run_workqueue+0x83/0xc5
Jan 12 10:57:23 wszip-kinigka kernel:  [] worker_thread+0xd9/0x10d
Jan 12 10:57:23 wszip-kinigka kernel:  [] kthread+0xc0/0xed
Jan 12 10:57:23 wszip-kinigka kernel:  [] 
kernel_thread_helper+0x7/0x10
Jan 12 10:57:23 wszip-kinigka kernel: DWARF2 unwinder stuck at 
kernel_thread_helper+0x7/0x10
Jan 12 10:57:23 wszip-kinigka kernel: Leftover inexact backtrace:
Jan 12 10:57:23 wszip-kinigka kernel:  ===
Jan 12 10:57:23 wszip-kinigka kernel: Code: 00 00 89 c3 eb e8 90 90 53 89 c3 83 
ec 0c 8b 40 04 8b 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 4f 1c 64 c0 
e8 2b be f3 ff <0f> 0b 41 00 8c 1c 64 c0 8b 03 8b 40 04 39 d8 74 1c 89 5c 24 04 
Jan 12 10:57:23 wszip-kinigka kernel: EIP: [] list_del+0x23/0x6c 
SS:ESP 0068:f7fefef8
Jan 12 10:57:23 wszip-kinigka kernel:  <3>BUG: sleeping function called from 
invalid context at kernel/rwsem.c:20
Jan 12 10:57:23 wszip-kinigka kernel: in_atomic():0, irqs_disabled():1
Jan 12 10:57:23 wszip-kinigka kernel:  [] dump_trace+0x69/0x1af
Jan 12 10:57:23 wszip-kinigka kernel:  [] show_trace_log_lvl+0x18/0x2c
Jan 12 10:57:23 wszip-kinigka kernel:  [] show_trace+0xf/0x11
Jan 12 10:57:23 wszip-kinigka kernel:  [] dump_stack+0x15/0x17
Jan 12 10:57:23 wszip-kinigka kernel:  [] down_read+0x12/0x20
Jan 12 10:57:23 wszip-kinigka kernel:  [] 
blocking_notifier_call_chain+0xe/0x29
Jan 12 10:57:23 wszip-kinigka kernel:  [] do_exit+0x1b/0x776
Jan 12 10:57:23 wszip-kinigka kernel:  [] die+0x29d/0x2c2
Jan 12 10:57:23 wszip-kinigka kernel:  [] do_invalid_op+0xa2/0xab
Jan 12 10:57:23 

Re: [PATCH 2/5] fixing errors handling during pci_driver resume stage [ata]

2007-01-12 Thread Grant Grundler
On Tue, Jan 09, 2007 at 12:01:28PM +0300, Dmitriy Monakhov wrote:
> ata pci drivers have to return correct error code during resume stage in
> case of errors.
...
> @@ -6246,8 +6253,10 @@ int ata_pci_device_suspend(struct pci_de
>  int ata_pci_device_resume(struct pci_dev *pdev)
>  {
>   struct ata_host *host = dev_get_drvdata(&pdev->dev);
> + int err;
>  
> - ata_pci_device_do_resume(pdev);
> + if ((err = ata_pci_device_do_resume(pdev)))
> + return err;

nit: in every other case I looked at you did:
   err = foo()
   if (err) ...

Can you make that consistent here too?
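
A small sketch of the style being asked for, with the assignment on its own
line and the test after it. do_resume_step() and resume_device() are
hypothetical stand-ins, not libata functions.

```c
#include <errno.h>

/* Hypothetical stand-in for a resume step that can fail. */
static int do_resume_step(int should_fail)
{
	return should_fail ? -EIO : 0;
}

/* Assign first, then test, rather than assigning inside the if condition. */
static int resume_device(int should_fail)
{
	int err;

	err = do_resume_step(should_fail);
	if (err)
		return err;

	/* further resume work would go here */
	return 0;
}
```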

thanks,
grant


[PATCH 1/1] Char: mxser_new, fix sparc compile error

2007-01-12 Thread Jiri Slaby
mxser_new, fix sparc compile error

On sparc B400 is not defined. Use B200 for special baudrate, which
is defined on all platforms.

Signed-off-by: Jiri Slaby <[EMAIL PROTECTED]>

---
commit 2826e3a35f34046890c84a77bc2784a184f9bf6a
tree fcfd15b000e703d91361f2b2c3c1bafb0d18b05d
parent 1ed2feac68d7b7cd50ffcd28cb0830b435e7d120
author Jiri Slaby <[EMAIL PROTECTED]> Sat, 13 Jan 2007 00:27:05 +0059
committer Jiri Slaby <[EMAIL PROTECTED]> Sat, 13 Jan 2007 00:27:05 +0059

 drivers/char/mxser_new.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/char/mxser_new.c b/drivers/char/mxser_new.c
index 1997390..4c80549 100644
--- a/drivers/char/mxser_new.c
+++ b/drivers/char/mxser_new.c
@@ -189,6 +189,8 @@ static unsigned int mxvar_baud_table1[] = {
 };
 #define BAUD_TABLE_NO ARRAY_SIZE(mxvar_baud_table)
 
+#define B_SPEC B200
+
 static int ioaddr[MXSER_BOARDS] = { 0, 0, 0, 0 };
 static int ttymajor = MXSERMAJOR;
 static int calloutmajor = MXSERCUMAJOR;
@@ -544,7 +546,7 @@ static int mxser_change_speed(struct mxser_port *info,
return ret;
 
if (mxser_set_baud_method[info->tty->index] == 0) {
-   if ((cflag & (CBAUD | CBAUDEX)) == B400)
+   if ((cflag & CBAUD) == B_SPEC)
baud = info->speed;
else
baud = tty_get_baud_rate(info->tty);
@@ -1700,7 +1702,7 @@ static int mxser_ioctl(struct tty_struct *tty, struct file *file,
if (speed == mxvar_baud_table[i])
break;
if (i == BAUD_TABLE_NO) {
-   info->tty->termios->c_cflag |= B400;
+   info->tty->termios->c_cflag |= B_SPEC;
} else if (speed != 0)
info->tty->termios->c_cflag |= mxvar_baud_table1[i];
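
The marker idea in this patch can be sketched in isolation as follows. The
DEMO_* constants are illustrative values chosen for this example; real values
come from the platform's termbits header and differ between architectures.
uses_special_baud() and mark_special_baud() are hypothetical helpers, and
unlike the hunk above the sketch clears the old baud bits before setting the
marker.

```c
/* Illustrative values only; not taken from any particular architecture. */
#define DEMO_CBAUD   0010017u	/* mask covering the baud rate bits */
#define DEMO_B200    0000006u	/* a baud code assumed to exist everywhere */
#define DEMO_CS8     0000060u	/* an unrelated flag that must survive */
#define DEMO_B_SPEC  DEMO_B200	/* marker meaning "use the custom speed" */

/* True when the baud bits carry the special marker. */
static int uses_special_baud(unsigned int cflag)
{
	return (cflag & DEMO_CBAUD) == DEMO_B_SPEC;
}

/* Replace whatever baud code is set with the special marker. */
static unsigned int mark_special_baud(unsigned int cflag)
{
	cflag &= ~DEMO_CBAUD;
	return cflag | DEMO_B_SPEC;
}
```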
 


Re: Early ACPI lockup (was Re: 2.6.20-rc4-mm1)

2007-01-12 Thread Jiri Slaby
Frederik Deweerdt wrote:
> On Fri, Jan 12, 2007 at 05:53:08PM -0500, Len Brown wrote:
>> On Friday 12 January 2007 05:20, Frederik Deweerdt wrote:
>>> On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
   
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc4-mm1/

>>> Hi,
>>>
>>> The git-acpi.patch replaces earlier "if(!handler) return -EINVAL" by
>>> "BUG_ON(!handler)". This locks my machine early at boot with a message
>>> along the lines of (It's hand copied):
>>> Int 6: cr2:  eip: c0570e05 flags: 00010046 cs: 60
>>> stack: c054ffac c011db2b c04936d0 c054ff68 c054ffc0 c054fff4 c057da2c
>>>
>>> Reverting the change as follows, allows booting:
>>> Any ideas to debug this further?
>>
>>> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
>>> index db0c5f6..fba018c 100644
>>> --- a/drivers/acpi/tables.c
>>> +++ b/drivers/acpi/tables.c
>>> @@ -414,7 +414,9 @@ int __init acpi_table_parse(enum acpi_ta
>>> unsigned int index;
>>> unsigned int count = 0;
>>>  
>>> -   BUG_ON(!handler);
>>> +   if (!handler)
>>> +   return -EINVAL;
>>> +   /*BUG_ON(!handler);*/
>>>  
>>> for (i = 0; i < sdt_count; i++) {
>>> if (sdt_entry[i].id != id)
>> What do you see if on failure you also print out the params, like below?

I get this:

ACPI: RSDP (v000 GBT   ) @ 0x000f6e80
ACPI: RSDT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff3000
ACPI: FADT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff3040
ACPI: MADT (v001 GBTAWRDACPI 0x42302e31 AWRD 0x01010101) @ 0x3fff7100
ACPI: DSDT (v001 GBTAWRDACPI 0x1000 MSFT 0x010c) @ 0x
ACPI: PM-Timer IO Port: 0x1008
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 15:2 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
ACPI: acpi_table_parse(17, ) HPET NULL handler!
Using ACPI (MADT) for SMP configuration information

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E


Re: [kvm-devel] kvm & dyntick

2007-01-12 Thread Thomas Gleixner
On Fri, 2007-01-12 at 15:25 -0800, Dor Laor wrote:
> This is great news for PV guests.
> 
> Nevertheless, we still need to improve our fully virtualized guest
> support.

Full virtualized guests, which have their own dyntick support, are fine
as long as we provide local apic emulation for them.

If a guest does not have that, it will use the periodic mode. There is
no way to circumvent this. We do not know, whether the guest relies on
that periodic interrupt or not.

tglx




Re: Choosing a HyperThreading/SMP/MultiCore kernel ?

2007-01-12 Thread Paul Jackson
> Trying to understand, should I set CPUSETS=y

You don't need CPUSETS for this small a system.

But setting it is harmless - for example at least
one major commercial distribution enables CPUSETS
on almost all their products, most of which run
on PCs less powerful than yours.

CPUSETS provides a facility for managing the memory
and processor placement of jobs running on what are
typically big NUMA systems.  Job X runs on CPUs 0-3
with memory on Nodes 0-1, while Job Y runs on CPUs
4-7 and Nodes 2-3.  And bigger ... to hundreds and
thousands of CPUs and Nodes.
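
To make the Job X / Job Y example concrete, here is a toy model in userspace
C. It is not the kernel's cpuset interface; struct demo_cpuset, range_mask(),
cpu_allowed(), node_allowed() and cpusets_overlap() are names assumed for
this sketch, and it only handles up to 64 CPUs and 64 nodes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a cpuset: one bit per allowed CPU and per memory node. */
struct demo_cpuset {
	uint64_t cpus;
	uint64_t mems;
};

/* Mask with bits first..last set (inclusive); assumes first <= last < 64. */
static uint64_t range_mask(unsigned int first, unsigned int last)
{
	uint64_t upto_last = (last == 63) ? UINT64_MAX
					  : ((UINT64_C(1) << (last + 1)) - 1);
	uint64_t below_first = (UINT64_C(1) << first) - 1;

	return upto_last & ~below_first;
}

/* May a task in this set run on the given CPU? */
static bool cpu_allowed(const struct demo_cpuset *cs, unsigned int cpu)
{
	return cpu < 64 && ((cs->cpus >> cpu) & 1);
}

/* May a task in this set allocate memory on the given node? */
static bool node_allowed(const struct demo_cpuset *cs, unsigned int node)
{
	return node < 64 && ((cs->mems >> node) & 1);
}

/* Two jobs are isolated from each other when nothing overlaps. */
static bool cpusets_overlap(const struct demo_cpuset *a,
			    const struct demo_cpuset *b)
{
	return (a->cpus & b->cpus) != 0 || (a->mems & b->mems) != 0;
}
```

In this model Job X would be { range_mask(0, 3), range_mask(0, 1) } and
Job Y { range_mask(4, 7), range_mask(2, 3) }, and the two do not overlap.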

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


RE: [kvm-devel] kvm & dyntick

2007-01-12 Thread Dor Laor
>* Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
>> > dyntick-enabled guest:
>> > - reduce the load on the host when the guest is idling
>> >   (currently an idle guest consumes a few percent cpu)
>>
>> yeah. KVM under -rt already works with dynticks enabled on both the
>> host and the guest. (but it's more optimal to use a dedicated
>> hypercall to set the next guest-interrupt)
>
>using the dynticks code from the -rt kernel makes the overhead of an
>idle guest go down by a factor of 10-15:
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> 2556 mingo 15   0  598m 159m 157m R  1.5  8.0   0:26.20 qemu
>
>( for this to work on my system i have added a 'hyper' clocksource
>  hypercall API for KVM guests to use - this is needed instead of the
>  running-too-slowly TSC. )
>
>   Ingo

This is great news for PV guests.

Nevertheless, we still need to improve our fully virtualized guest
support.
First we need a mechanism (can we use the timeout_granularity?) to
dynamically change the host timer frequency so we can support guests
with 100hz that dynamically change their freq to 1000hz and back.

Afterwards we'll need to compensate for the lost alarm signals to the guests
by using one of:
 - hrtimers to inject the lost interrupts for specific guests. The
problem is that this will increase the overall load.
 - Injecting several virtual irqs to the guests one after another (using
interrupt window exit). The question is how the guest will be affected
by this unfair behavior.
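
Either option first needs to know how many ticks a guest is owed. A rough
sketch of that bookkeeping, assuming the host tracks elapsed time and the
number of ticks already delivered; expected_ticks() and lost_ticks() are
hypothetical helpers, not KVM code.

```c
#include <stdint.h>

/*
 * Ticks a guest programmed for guest_hz expects over elapsed_us
 * microseconds. The multiplication can overflow for very long intervals;
 * a real implementation would have to guard against that.
 */
static uint64_t expected_ticks(uint64_t elapsed_us, uint32_t guest_hz)
{
	return elapsed_us * guest_hz / 1000000u;
}

/* Ticks still owed to the guest, given how many were actually delivered. */
static uint64_t lost_ticks(uint64_t elapsed_us, uint32_t guest_hz,
			   uint64_t delivered)
{
	uint64_t expected = expected_ticks(elapsed_us, guest_hz);

	return expected > delivered ? expected - delivered : 0;
}
```

For example, a guest that switched to 1000hz while the host delivered only
100 ticks in one second would be owed 900 ticks by this calculation.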

Can dyntick help HVMs? Will the answer be the same for guest-dense
hosts? I understood that the main gain of dyn-tick is for idle time.


Re: Choosing a HyperThreading/SMP/MultiCore kernel ?

2007-01-12 Thread Sunil Naidu

On 1/12/07, Lennart Sorensen <[EMAIL PROTECTED]> wrote:


I would expect any distribution should work on these (as long as the
kernel they use isn't too old.).  Of course if it is a Mac, you need a
distribution that supports their firmware (which is of course not a PC
bios).  As long as you can boot it, any i386 or amd64 kernel with smp
enabled should use all the processors present (well amd64 on the
core2duo and on the p4 if it is em64t enabled).


It is not a Mac here; it is an IBM workstation. I can see the processor as
Pentium 4 CPU 3. GHz (family 15, model 4). How can I tell whether EM64T is
enabled? Is there a command?
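
A minimal sketch of one way to check, assuming an x86 Linux system where
/proc/cpuinfo carries a "flags" line: the "lm" (long mode) flag marks a
64-bit capable CPU, which is what EM64T provides. The function names are
hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated word in `line`. */
int line_has_flag(const char *line, const char *flag)
{
    size_t len;
    const char *p;

    assert(line != NULL && flag != NULL && flag[0] != '\0');
    len = strlen(flag);
    for (p = strstr(line, flag); p != NULL; p = strstr(p + 1, flag)) {
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' ||
                   p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* Return 1 if /proc/cpuinfo lists the "lm" flag, 0 if not, -1 if unreadable. */
int cpu_supports_long_mode(void)
{
    char line[8192];
    FILE *f = fopen("/proc/cpuinfo", "r");
    int found = 0;

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        /* Only the "flags" lines list CPU feature bits. */
        if (strncmp(line, "flags", 5) == 0)
            found = line_has_flag(line, "lm");
    }
    fclose(f);
    return found;
}
```

Calling cpu_supports_long_mode() from a small main() and printing the result
is enough; matching whole words avoids false hits on flags that merely
contain "lm".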

I am trying to understand whether I should set CPUSETS=y and SCHED_MC=y or ignore them.


I believe the closest optimization for a Core2 is probably the Pentium M
(certainly not the P4/netburst).  Not entirely sure though.



Yep, this is a MacBookPro. I have decided on the distro. I asked
this question because I am going for a custom kernel compilation from
source after installation.

What I have seen in Kconfig is that MPENTIUM4 is used for the Xeon processor
too. I will try this soon on my laptop (with SMP since it's a
Core2Duo). Anyway, I shall post the results here.


--
Len Sorensen


Thanks,

~Sunil


Re: Early ACPI lockup (was Re: 2.6.20-rc4-mm1)

2007-01-12 Thread Frederik Deweerdt
On Fri, Jan 12, 2007 at 05:53:08PM -0500, Len Brown wrote:
> On Friday 12 January 2007 05:20, Frederik Deweerdt wrote:
> > On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
> > > 
> > >   
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc4-mm1/
> > > 
> > Hi,
> > 
> > The git-acpi.patch replaces earlier "if(!handler) return -EINVAL" by
> > "BUG_ON(!handler)". This locks my machine early at boot with a message
> > along the lines of (It's hand copied):
> > Int 6: cr2:  eip: c0570e05 flags: 00010046 cs: 60
> > stack: c054ffac c011db2b c04936d0 c054ff68 c054ffc0 c054fff4 c057da2c
> > 
> > Reverting the change as follows, allows booting:
> > Any ideas to debug this further?
> 
> 
> > diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> > index db0c5f6..fba018c 100644
> > --- a/drivers/acpi/tables.c
> > +++ b/drivers/acpi/tables.c
> > @@ -414,7 +414,9 @@ int __init acpi_table_parse(enum acpi_ta
> > unsigned int index;
> > unsigned int count = 0;
> >  
> > -   BUG_ON(!handler);
> > +   if (!handler)
> > +   return -EINVAL;
> > +   /*BUG_ON(!handler);*/
> >  
> > for (i = 0; i < sdt_count; i++) {
> > if (sdt_entry[i].id != id)
> 
> What do you see if on failure you also print out the params, like below?
> 
I'm sorry, I might not be able to try it until monday. Michal reported
a similar problem though, adding him to CC list.

Regards,
Frederik

> thanks,
> -Len
> 
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index 3fce3db..e2d08a5 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -415,7 +415,12 @@ int __init acpi_table_parse(enum acpi_table_id id, 
> acpi_table_handler handler)
>   unsigned int index = 0;
>   unsigned int count = 0;
>  
> - BUG_ON(!handler);
> + if (!handler) {
> + printk(KERN_WARNING PREFIX
> + "acpi_table_parse(%d, %p) %s NULL handler!\n",
> + id, handler, acpi_table_signatures[id]);
> + return -EINVAL;
> + }
>  
>   for (i = 0; i < sdt_count; i++) {
>   if (sdt_entry[i].id != id)
> 
> 
> 


Re: 2.6.20-rc3-mm1: umount reiser4 FS stuck in D state

2007-01-12 Thread Laurent Riffard

On 06.01.2007 19:58, Vladimir V. Saveliev wrote:

Hello

On Saturday 06 January 2007 13:58, Laurent Riffard wrote:

Hello,

got this with 2.6.20-rc3-mm1:

===
SysRq : Show Blocked State

                         free                        sibling
  task             PC    stack   pid father child younger older
umount        D C013135E  6044  1168   1150                     (NOTLB)
   de591ae4 0086 de591abc c013135e dff979c8 c012a6fe 0046 0007 
   dfd94ac0 128d3000 0026  dfd94bcc dff979c8 de591ae4 dffda038 
   0002 dff979c0 dff979bc dff979c8 de591b10 c012d600 dff979f8  
Call Trace:

 [] synchronize_qrcu+0x70/0x8c
 [] __make_request+0x4c/0x29b
 [] generic_make_request+0x1b0/0x1de
 [] submit_bio+0xda/0xe2
 [] write_jnodes_to_disk_extent+0x920/0x974 [reiser4]
 [] update_journal_footer+0x29f/0x2b7 [reiser4]
 [] write_tx_back+0x149/0x185 [reiser4]
 [] reiser4_write_logs+0xea4/0xfd2 [reiser4]
 [] try_commit_txnh+0x7e6/0xa4f [reiser4]
 [] reiser4_txn_end+0x148/0x3cf [reiser4]
 [] reiser4_txn_restart+0xb/0x1a [reiser4]
 [] reiser4_txn_restart_current+0x73/0x75 [reiser4]
 [] force_commit_atom+0x258/0x261 [reiser4]
 [] txnmgr_force_commit_all+0x406/0x697 [reiser4]
 [] release_format40+0x10c/0x193 [reiser4]
 [] reiser4_put_super+0x134/0x16a [reiser4]
 [] generic_shutdown_super+0x55/0xd8
 [] kill_block_super+0x20/0x32
 [] deactivate_super+0x3f/0x51
 [] mntput_no_expire+0x42/0x5f
 [] path_release_on_umount+0x15/0x18
 [] sys_umount+0x1a3/0x1cb
 [] sys_oldumount+0x19/0x1b
 [] sysenter_past_esp+0x5f/0x99
 ===

Scenario:
- umount a reiser4 FS (no need to write something before)


Hmm, I cannot reproduce this with 2.6.20-rc3-mm1. I probably need to
configure the kernel to match your system more closely.


Earlier kernels were OK.


This still happens with 2.6.20-rc4-mm1...

Should I open a bug report at http://bugzilla.kernel.org?

--
laurent



Re: Early ACPI lockup (was Re: 2.6.20-rc4-mm1)

2007-01-12 Thread Len Brown
On Friday 12 January 2007 05:20, Frederik Deweerdt wrote:
> On Thu, Jan 11, 2007 at 10:26:27PM -0800, Andrew Morton wrote:
> > 
> >   
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc3/2.6.20-rc4-mm1/
> > 
> Hi,
> 
> The git-acpi.patch replaces earlier "if(!handler) return -EINVAL" by
> "BUG_ON(!handler)". This locks my machine early at boot with a message
> along the lines of (It's hand copied):
> Int 6: cr2:  eip: c0570e05 flags: 00010046 cs: 60
> stack: c054ffac c011db2b c04936d0 c054ff68 c054ffc0 c054fff4 c057da2c
> 
> Reverting the change as follows, allows booting:
> Any ideas to debug this further?


> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index db0c5f6..fba018c 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -414,7 +414,9 @@ int __init acpi_table_parse(enum acpi_ta
>   unsigned int index;
>   unsigned int count = 0;
>  
> - BUG_ON(!handler);
> + if (!handler)
> + return -EINVAL;
> + /*BUG_ON(!handler);*/
>  
>   for (i = 0; i < sdt_count; i++) {
>   if (sdt_entry[i].id != id)

What do you see if on failure you also print out the params, like below?

thanks,
-Len

diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 3fce3db..e2d08a5 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -415,7 +415,12 @@ int __init acpi_table_parse(enum acpi_table_id id, 
acpi_table_handler handler)
unsigned int index = 0;
unsigned int count = 0;
 
-   BUG_ON(!handler);
+   if (!handler) {
+   printk(KERN_WARNING PREFIX
+   "acpi_table_parse(%d, %p) %s NULL handler!\n",
+   id, handler, acpi_table_signatures[id]);
+   return -EINVAL;
+   }
 
for (i = 0; i < sdt_count; i++) {
if (sdt_entry[i].id != id)




Re: [patch] raw: don't allow the creation of a raw device with minor number 0

2007-01-12 Thread jmoyer
==> Regarding Re: [patch] raw: don't allow the creation of a raw device with 
minor number 0; Jan Engelhardt <[EMAIL PROTECTED]> adds:

jengelh> On Jan 12 2007 11:32, Jeff Moyer wrote:

>> Date: Fri, 12 Jan 2007 11:32:11 -0500
>> From: Jeff Moyer <[EMAIL PROTECTED]>
>> To: Linux Kernel Mailing List 
>> Cc: Steven Fernandez <[EMAIL PROTECTED]>, Andrew Morton <[EMAIL PROTECTED]>
>> Subject: [patch] raw: don't allow the creation of a raw device with minor
>> number 0
>> 
>> Hi,
>> 
>> Minor number 0 (under the raw major) is reserved for the rawctl device
>> file, which is used to query, set, and unset raw device bindings.
>> However, the ioctl interface does not protect the user from specifying
>> a raw device with minor number 0:

jengelh> No idea what to say about this... probably:

jengelh>   What:   RAW driver (CONFIG_RAW_DRIVER)
jengelh>   When:   December 2005
jengelh>   Why:    declared obsolete since kernel 2.6.3
jengelh>           O_DIRECT can be used instead
jengelh>   Who:    Adrian Bunk <[EMAIL PROTECTED]>

It's still present, still used, and so would benefit from being fixed, in
my opinion.

Cheers,

Jeff


Re: O_DIRECT question

2007-01-12 Thread Andrew Morton
On Fri, 12 Jan 2007 15:35:09 -0700
Erik Andersen <[EMAIL PROTECTED]> wrote:

> On Fri Jan 12, 2007 at 05:09:09PM -0500, Linus Torvalds wrote:
> > I suspect a lot of people actually have other reasons to avoid caches. 
> > 
> > For example, the reason to do O_DIRECT may well not be that you want to 
> > avoid caching per se, but simply because you want to limit page cache 
> > activity. In which case O_DIRECT "works", but it's really the wrong thing 
> > to do. We could export other ways to do what people ACTUALLY want, that 
> > doesn't have the downsides.
> 
> I was rather fond of the old O_STREAMING patch by Robert Love,

That was an akpm patch which I did for the Digeo kernel.  Robert picked it
up to dehackify it and get it into mainline, but we ended up deciding that
posix_fadvise() was the way to go because it's standards-based.

It's a bit more work in the app to use posix_fadvise() well.  But the
results will be better.  The app should also use sync_file_range()
intelligently to control its pagecache use.
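
A minimal sketch of the read side of that approach, assuming a plain
sequential reader: after each read() the application tells the kernel, via
posix_fadvise(POSIX_FADV_DONTNEED), that it will not revisit the bytes it has
already consumed. The helper name is hypothetical, and the advice calls are
only hints that the kernel is free to ignore.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read fd sequentially to EOF, advising the kernel to drop the pages behind
 * the current position. Returns the number of bytes read, or -1 on error. */
long long stream_read_dropping_cache(int fd)
{
    static char buf[1 << 16];
    off_t done = 0;
    ssize_t n;

    assert(fd >= 0);
    /* Hint: access will be sequential (advisory only). */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        done += n;
        /* Bytes [0, done) have been consumed and will not be read again. */
        (void)posix_fadvise(fd, 0, done, POSIX_FADV_DONTNEED);
    }
    return n < 0 ? -1 : (long long)done;
}
```

This is the "more work in the app" mentioned above: the application, not the
kernel, decides which ranges it no longer needs.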

The problem with all of these things is that the application needs to be
changed, and people often cannot do that.  If we want a general way of
stopping particular apps from swamping pagecache then it'd really need to
be an externally-imposed thing - probably via additional accounting and a
new rlimit.



Re: [patch] raw: don't allow the creation of a raw device with minor number 0

2007-01-12 Thread Jan Engelhardt

On Jan 12 2007 11:32, Jeff Moyer wrote:

>Date: Fri, 12 Jan 2007 11:32:11 -0500
>From: Jeff Moyer <[EMAIL PROTECTED]>
>To: Linux Kernel Mailing List 
>Cc: Steven Fernandez <[EMAIL PROTECTED]>, Andrew Morton <[EMAIL PROTECTED]>
>Subject: [patch] raw: don't allow the creation of a raw device with minor
>number 0
>
>Hi,
>
>Minor number 0 (under the raw major) is reserved for the rawctl device
>file, which is used to query, set, and unset raw device bindings.
>However, the ioctl interface does not protect the user from specifying
>a raw device with minor number 0:

No idea what to say about this... probably:

  What:   RAW driver (CONFIG_RAW_DRIVER)
  When:   December 2005
  Why:    declared obsolete since kernel 2.6.3
          O_DIRECT can be used instead
  Who:    Adrian Bunk <[EMAIL PROTECTED]>



-`J'
-- 


Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

2007-01-12 Thread Eric Sandeen
Eric Sandeen wrote:

> Alex Tomas wrote:
>   
>> yes, but it shouldn't allow to re-link such inode back, IMHO.
>> a filesystem may start some non-revertable activity in its
>> unlink method.
>>
>> thanks, Alex
>> 
>
> I tend to agree, chatting w/ Al I think he does too.  :)  I'll test
> a patch that kicks out ext3_link() with -ENOENT at the top, and resubmit
> that if things go well.
>   
Well this seems to fix things up for ext3 (and ext4 by extension):

---

Return -ENOENT from ext[34]_link if we've raced with unlink and
i_nlink is 0.  Doing otherwise has the potential to corrupt the
orphan inode list, because we'd wind up with an inode with a
non-zero link count on the list, and it will never get properly
cleaned up.

Signed-off-by: Eric Sandeen <[EMAIL PROTECTED]>

Index: linux-2.6.19/fs/ext3/namei.c
===
--- linux-2.6.19.orig/fs/ext3/namei.c
+++ linux-2.6.19/fs/ext3/namei.c
@@ -2191,6 +2191,8 @@ static int ext3_link (struct dentry * ol
 
if (inode->i_nlink >= EXT3_LINK_MAX)
return -EMLINK;
+   if (inode->i_nlink == 0)
+   return -ENOENT;
 
 retry:
handle = ext3_journal_start(dir, EXT3_DATA_TRANS_BLOCKS(dir->i_sb) +
Index: linux-2.6.19/fs/ext4/namei.c
===
--- linux-2.6.19.orig/fs/ext4/namei.c
+++ linux-2.6.19/fs/ext4/namei.c
@@ -2189,6 +2189,8 @@ static int ext4_link (struct dentry * ol
 
if (inode->i_nlink >= EXT4_LINK_MAX)
return -EMLINK;
+   if (inode->i_nlink == 0)
+   return -ENOENT;
 
 retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
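
The order of the two checks in the patch can be modelled in user space. This
is only a sketch with mock types: struct mock_inode, MOCK_LINK_MAX and
mock_link_precheck() are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_LINK_MAX 32000u /* stand-in for EXT3_LINK_MAX / EXT4_LINK_MAX */

/* Only the field the check looks at is modelled. */
struct mock_inode {
    unsigned int i_nlink;
};

/* Mirror the early exits at the top of the patched link function:
 * too many links -> -EMLINK, already unlinked (raced) -> -ENOENT. */
int mock_link_precheck(const struct mock_inode *inode)
{
    assert(inode != NULL);
    if (inode->i_nlink >= MOCK_LINK_MAX)
        return -EMLINK;
    if (inode->i_nlink == 0)
        return -ENOENT;
    return 0; /* safe to go on and start the journal transaction */
}
```

An inode whose link count already dropped to zero is rejected before any
journal work starts, so it is never re-linked while it sits on the orphan
list.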




Re: O_DIRECT question

2007-01-12 Thread Erik Andersen
On Fri Jan 12, 2007 at 05:09:09PM -0500, Linus Torvalds wrote:
> I suspect a lot of people actually have other reasons to avoid caches. 
> 
> For example, the reason to do O_DIRECT may well not be that you want to 
> avoid caching per se, but simply because you want to limit page cache 
> activity. In which case O_DIRECT "works", but it's really the wrong thing 
> to do. We could export other ways to do what people ACTUALLY want, that 
> doesn't have the downsides.

I was rather fond of the old O_STREAMING patch by Robert Love,
which added an open() flag telling the kernel to not keep data
from the current file in cache by dropping pages from the
pagecache before the current index.  O_STREAMING was very nice
for when you know you want to read a large file sequentially
without polluting the rest of the cache with GB of data that you
plan to read only once and then discard.  It worked nicely at doing
what many people want to use O_DIRECT for.

Using O_STREAMING you would get normal read/write semantics since
you still had the pagecache caching your data, but only the
not-yet-written write-behind data and the not-yet-read read-ahead
data.  With the additional hint the kernel should drop free-able
pages from the pagecache behind the current position, because we
know we will never want them again.  I thought that was a very
nice way of handling things.
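
The write-behind half of this behaviour can be approximated from user space
without O_STREAMING. A minimal sketch, assuming Linux with sync_file_range()
available: start writeback for each chunk as it is written, then wait for the
earlier range and ask the kernel to drop it. The function name and the chunk
size are hypothetical; the sync and advise calls are treated as hints and
their failures are ignored.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes of data to fd starting at offset 0, in pieces of at most
 * `chunk` bytes, keeping little written data in the page cache.
 * Returns the number of bytes written, or -1 on a write error. */
long long stream_write_dropping_cache(int fd, const char *data,
                                      size_t len, size_t chunk)
{
    off_t off = 0;

    assert(fd >= 0 && data != NULL && chunk > 0);
    while ((size_t)off < len) {
        size_t left = len - (size_t)off;
        size_t want = left < chunk ? left : chunk;
        ssize_t n = pwrite(fd, data + off, want, off);

        if (n <= 0)
            return -1;
        /* Kick off asynchronous writeback of the chunk just written. */
        (void)sync_file_range(fd, off, n, SYNC_FILE_RANGE_WRITE);
        if (off > 0) {
            /* Wait until everything before this chunk is on disk, then
             * drop it: only clean pages can be discarded. */
            (void)sync_file_range(fd, 0, off,
                                  SYNC_FILE_RANGE_WAIT_BEFORE |
                                  SYNC_FILE_RANGE_WRITE |
                                  SYNC_FILE_RANGE_WAIT_AFTER);
            (void)posix_fadvise(fd, 0, off, POSIX_FADV_DONTNEED);
        }
        off += n;
    }
    return (long long)off;
}
```

Normal write semantics are kept, since the data still passes through the page
cache; only the already-written part behind the current position is given up.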

 -Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--


Re: How can I create or read/write a file in linux device driver?

2007-01-12 Thread Jan Engelhardt

On Jan 12 2007 09:27, linux-os (Dick Johnson) wrote:
>
>First, since file-operations require process context, and the kernel
>is not a process, you need to create a kernel thread to handle your file
>I/O.

Not always. If you do file I/O as part of a device driver, you are fine.
quad_dsp is such an example, where writing to /dev/Qdsp_* will trigger writes
to /dev/dsp and /dev/adsp.

>Once you set up this "internal environment," you use the appropriate
>kernel function(s) such as sys_open()

What speaks against filp_open()? It avoids the unnecessary getname() work
that most syscalls do.


-`J'
-- 


Re: Linux v2.6.20-rc5

2007-01-12 Thread Andrew Morton
On Fri, 12 Jan 2007 14:27:48 -0500 (EST)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> Ok, there it is, in all its shining glory.
> 

It still doesn't run Excel.

> A lot of developers (including me) will be gone next week for 
> Linux.Conf.Au,

me too.

> so you have a week of rest and quiet to test this, and 
> report any problems. 

I have a few fixes pending:

kvm-add-vm-exit-profiling-fix.patch
revert-nmi_known_cpu-check-during-boot-option-parsing.patch
blockdev-direct_io-fix-signedness-bug.patch
submitchecklist-update.patch
paravirt-mark-the-paravirt_ops-export-internal.patch
kvm-make-sure-there-is-a-vcpu-context-loaded-when.patch
kvm-fix-race-between-mmio-reads-and-injected-interrupts.patch
kvm-x86-emulator-fix-bit-string-instructions.patch
kvm-fix-asm-constraints-with-config_frame_pointer=n.patch
kvm-fix-bogus-pagefault-on-writable-pages.patch
rtc-sh-act-on-rtc_wkalrmenabled-when-setting-an-alarm.patch
fix-blk_direct_io-bio-preparation.patch
tlclk-bug-fix-misc-fixes.patch
mbind-restrict-nodes-to-the-currently-allowed-cpuset.patch
reiserfs-avoid-tail-packing-if-an-inode-was-ever-mmapped.patch

all of which are present in
http://userweb.kernel.org/~akpm/2.6.20-rc5-mm-fixes

The KVM and direct-io changes are significant, so if people are testing
those things, please be sure to have that patch applied.


Re: O_DIRECT question

2007-01-12 Thread Michael Tokarev
Linus Torvalds wrote:
> 
> On Sat, 13 Jan 2007, Michael Tokarev wrote:
>>> At that point, O_DIRECT would be a way of saying "we're going to do 
>>> uncached accesses to this pre-allocated file". Which is a half-way 
>>> sensible thing to do.
>> Half-way?
> 
> I suspect a lot of people actually have other reasons to avoid caches. 
> 
> For example, the reason to do O_DIRECT may well not be that you want to 
> avoid caching per se, but simply because you want to limit page cache 
> activity. In which case O_DIRECT "works", but it's really the wrong thing 
> to do. We could export other ways to do what people ACTUALLY want, that 
> doesn't have the downsides.
> 
> For example, the page cache is absolutely required if you want to mmap. 
> There's no way you can do O_DIRECT and mmap at the same time and expect 
> any kind of sane behaviour. It may not be what a DB wants to use, but it's 
> an example of where O_DIRECT really falls down.

Only when the two cover the same part of a file.  If not, and if
the file is "divided" on a proper boundary (sector/page/whatever-aligned),
there are no issues, at least not if all the blocks of the file have been
allocated (no gaps, that is).

What I was referring to in my last email - and said it's a corner case - is:
mmap() the start of a file, say, the first megabyte of it, where some
index/bitmap is located, and use direct-io on the rest.  So the two don't
overlap.

Still problematic?

>>> But what O_DIRECT does right now is _not_ really sensible, and the 
>>> O_DIRECT propeller-heads seem to have some problem even admitting that 
>>> there _is_ a problem, because they don't care. 
>> Well.  In fact, there's NO problems to admit.
>>
>> Yes, yes, yes yes - when you think about it from a general point of
>> view, and think how non-O_DIRECT and O_DIRECT access fits together,
>> it's a complete mess, and you're 100% right it's a mess.
> 
> You can't admit that even O_DIRECT _without_ any non-O_DIRECT actually 
> fails in many ways right now.
> 
> I've already mentioned ftruncate and block allocation. You don't seem to 
> understand that those are ALSO a problem.

I do understand this.  And this is, too, solved right now in userspace.
For example, when oracle allocates a file for its data, or when it extends
the file, it writes something to every block of new space (using O_DIRECT
while at it, but that's a different story).  The thing is: while it is doing
that, no process tries to do anything with that (part of a) file (not counting
some external processes run by evil hackers ;)  So there are still no races
or fundamental brokenness *in usage*.

It uses ftruncate() to create or extend a file, *and* does O_DIRECT writes
to force block allocations.  That's probably not right, and that alone is
probably difficult to implement in kernel (I just don't know; what I know
for sure is that this way is very slow on ext3).  Maybe because there's no
way to tell kernel something like "set the file size to this and actually
*allocate* space for it" (if it doesn't write some structure to the file).
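
For reference, POSIX does specify an interface with roughly those semantics,
posix_fallocate(). A minimal sketch follows; the wrapper name is
hypothetical, and whether a given libc and filesystem allocate efficiently or
emulate the call by writing to every block is an open assumption here.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Make sure bytes [0, size) of fd are backed by allocated storage; the file
 * is extended if it is shorter than size.
 * Returns 0 on success or an errno value (errno itself is not set). */
int preallocate_file(int fd, off_t size)
{
    assert(fd >= 0 && size > 0);
    return posix_fallocate(fd, 0, size);
}
```

After a successful call, later writes into that range should not fail for
lack of disk space.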

What I dislike very much is half-solutions.  And current O_DIRECT indeed
looks like half a solution, because sometimes it works, and sometimes, in a
*wrong* usage scenario, it doesn't, or is racy, etc., and the kernel *allows*
such a wrong scenario.  Software should either work correctly, or disallow
a usage where it can't guarantee correctness.  Currently, the kernel allows
incorrect usage, and that, plus all the ugly things done in the code in an
attempt to fix that, sucks.

But the whole thing is not (fundamentally) broken.

/mjt


Re: Ext3 mounted as ext2 but journal still in effect.

2007-01-12 Thread Pavel Machek
Hi!

> You were right, even after making the changes, it seems to be 
> telling lies:
> 
> # mount
> /dev/hda2 on / type ext2 (rw,usrquota)
> [...]
> 
> However, I think I am still not mounting as ext2:
> 
> # dmesg | grep 'Kernel command'
> Kernel command line: ro root=/dev/hda2 rootfstype=ext2
...
> rootfs / rootfs rw 0 0
> /dev/root / ext3 rw 0 0


> Do I need to mess with the initrd? My grub lines look like
> this:

Yes, probably.

Pavel
-- 
Thanks for all the (sleeping) penguins.


Re: 2.6.20-rc3 regression: suspend to RAM broken on Mac mini Core Duo

2007-01-12 Thread Pavel Machek
Hi!

> > >> > It didn't. It looks like it is unusable, because it isn't reliable in
> > >> > 2.6.20-rc3.
> > >>
> > >> Is this issue still present in -rc4?
> > >
> > >I used 2.6.20-rc4 in single user mode, and applied 2 patches from
> > >netdev to get wake on LAN support. This way I was able to set up an
> > >automatic suspend/resume loop. It looked good, but after e.g. 20
> > >minutes, the resume hung. So it is reproducible with 2.6.20-rc4.
> > >Unfortunately, I can not test the same with 2.6.18, as the wake on LAN
> > >patches need 2.6.20-rc.
> > 
> > Hmm, do you mean this is the first time of this kind of testing?
> > Is this issue related to LAN driver?
> > I guess you should be able to set up an automatic suspend/resume loop
> > with /proc/acpi/alarm, and test similar with 2.6.18.
> 
> Thanks for the hint. I just used /proc/acpi/alarm to set up a
> suspend/resume loop and did ca. 100 cycles in a row with 2.6.18.2 in
> single user mode, without a failure.

Can you do a similar test on 2.6.20 -- w/o the network driver loaded (and
generally with a minimum of drivers)?
Pavel

-- 
Thanks for all the (sleeping) penguins.


Re: O_DIRECT question

2007-01-12 Thread Linus Torvalds


On Sat, 13 Jan 2007, Michael Tokarev wrote:
> > 
> > At that point, O_DIRECT would be a way of saying "we're going to do 
> > uncached accesses to this pre-allocated file". Which is a half-way 
> > sensible thing to do.
> 
> Half-way?

I suspect a lot of people actually have other reasons to avoid caches. 

For example, the reason to do O_DIRECT may well not be that you want to 
avoid caching per se, but simply because you want to limit page cache 
activity. In which case O_DIRECT "works", but it's really the wrong thing 
to do. We could export other ways to do what people ACTUALLY want, that 
doesn't have the downsides.

For example, the page cache is absolutely required if you want to mmap. 
There's no way you can do O_DIRECT and mmap at the same time and expect 
any kind of sane behaviour. It may not be what a DB wants to use, but it's 
an example of where O_DIRECT really falls down.

> > But what O_DIRECT does right now is _not_ really sensible, and the 
> > O_DIRECT propeller-heads seem to have some problem even admitting that 
> > there _is_ a problem, because they don't care. 
> 
> Well.  In fact, there's NO problems to admit.
> 
> Yes, yes, yes yes - when you think about it from a general point of
> view, and think how non-O_DIRECT and O_DIRECT access fits together,
> it's a complete mess, and you're 100% right it's a mess.

You can't admit that even O_DIRECT _without_ any non-O_DIRECT actually 
fails in many ways right now.

I've already mentioned ftruncate and block allocation. You don't seem to 
understand that those are ALSO a problem.

Linus


Re: 'struct task_struct' has no member named 'mems_allowed' (was: Re: 2.6.20-rc4-mm1)

2007-01-12 Thread Andrew Morton
On Fri, 12 Jan 2007 14:00:16 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Fri, 12 Jan 2007, Paul Jackson wrote:
> 
> > It might look clearer to someone who is focused on that particular
> > change, but it adds unnecessary noise for the other 90% of the readers
> > of that code who are not concerned with cpusets at that point in time.
> 
> This is in NUMA specific code. And they should be concerned about cpusets 
> since cpusets may affect the node masks they can set. If this is hidden in 
> a macro then it may be overlooked.

bah.  No ifdefs!


Re: Disk Cache, Was: O_DIRECT question

2007-01-12 Thread Michael Tokarev
Zan Lynx wrote:
> On Sat, 2007-01-13 at 00:03 +0300, Michael Tokarev wrote:
> [snip]
>> And sure thing, withOUT O_DIRECT, the whole system is almost dead under this
>> load - because everything is thrown away from the cache, even caches of /bin
>> /usr/bin etc... ;)  (For that, fadvise() seems to help a bit, but not a lot).
> 
> One thing that I've been using, and seems to work well, is a customized
> version of the readahead program several distros use during boot up.

[idea to lock some (commonly-used) cache pages in memory]

> Something like that could keep your system responsive no matter what the
> disk cache is doing otherwise.

Unfortunately it's not.  Sure, things like libc.so etc will be force-cached
and will start fast.  But not my data files and other stuff (what an
unfortunate thing: memory usually is smaller in size than disks ;)

I can do my usual work without noticing that something is working the disks
intensively, doing O_DIRECT I/O.  For example, I can run a large report on
a database, which requires a lot of disk I/O, and run a kernel compile at
the same time.  Sure, disk access is a lot slower, but the disk cache helps
a lot, too.  My kernel compile will not be much slower than usual.  But if I
turn O_DIRECT off, the compile will take ages to finish.  *And* the report
run, too!  Because the system tries hard to cache the WRONG pages!
(yes I remember fadvise  - which isn't used by the database(s) currently,
and quite a lot of words have been said about that, too;  I also noticed it's
slower as well, at least currently.)

/mjt


Re: 'struct task_struct' has no member named 'mems_allowed' (was: Re: 2.6.20-rc4-mm1)

2007-01-12 Thread Paul Jackson
Christoph wrote:
> If this is hidden in a macro then it may be overlooked.

Sooner or later, every line of code is important.

Shouting any one of them in #ifdef brackets creates
a noisier environment, increasing the chance of missing
another.

And besides ... the other umpteen cpuset hooks all use the
cpuset_*() style macros (except for fs/proc/base.c, which
has its own style ...).

Consistency in style is important in these matters.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

2007-01-12 Thread Alex Tomas
> Eric Sandeen (ES) writes:

 ES> Al says "no" and I'm not arguing.  :)

 ES> Apparently this may be OK with some filesystems, and Al says he doesn't
 ES> want to know about i_nlink in the vfs in any case.

well, generic_drop_inode() uses i_nlink ...

 ES> But I suppose there may be other filesystems which DO care, and should
 ES> be checking if they're not.

this is why I thought VFS could take care.

thanks, Alex


Re: SATA hotplug from the user side ?

2007-01-12 Thread Soeren Sonnenburg
On Fri, 2007-01-12 at 12:04 -0500, Jeff Garzik wrote:
> Soeren Sonnenburg wrote:
> > Dear all,
> > 
> > I'd like to try out SATA hotplugging using a SIL3114. Though I was
> > harvesting the web, I could not find any useful information how this is
> > done in practice.
> > 
> > Well I realized that I can still use scsiadd to print and remove
> > devices, e.g.:
> 
> For SIL3114, you shouldn't have to run any commands at all.  It should 
> notice when you yank the cable, or plug in a new device.


It is true it detects a removal and newly plugged devices immediately...
However it still prints warnings and errors that it could not
synchronize SCSI cache for the disks. Then it prints regular 'rejects
I/O to dead device' warning messages and on replugging the disks puts
them to the next free sd device (e.g. sdc -> sdd).

These messages sound evil - so now the question is: should I care?
( On the other hand it did not crash the machine )

What follows is a swap of two SATA drives attached to ports 4/5 of
the sil (ata5/ata6 here):

ata6: exception Emask 0x10 SAct 0x0 SErr 0x1 action 0x2 frozen
ata6: hard resetting port
ata6: SATA link down (SStatus 0 SControl 310)
ata6: failed to recover some devices, retrying in 5 secs
ata5: exception Emask 0x10 SAct 0x0 SErr 0x1 action 0x2 frozen
ata5: hard resetting port
ata5: SATA link down (SStatus 0 SControl 310)
ata5: failed to recover some devices, retrying in 5 secs
ata6: hard resetting port
ata6: SATA link down (SStatus 0 SControl 310)
ata6: failed to recover some devices, retrying in 5 secs
ata5: hard resetting port
ata5: SATA link down (SStatus 0 SControl 310)
ata5: failed to recover some devices, retrying in 5 secs
ata6: hard resetting port
ata6: SATA link down (SStatus 0 SControl 310)
ata6.00: disabled
ata6: EH complete
ata6.00: detaching (SCSI 5:0:0:0)
Synchronizing SCSI cache for disk sdd: 
FAILED
  status = 0, message = 00, host = 4, driver = 00
  <6>ata5: hard resetting port
ata5: SATA link down (SStatus 0 SControl 310)
ata5.00: disabled
ata5: EH complete
ata5.00: detaching (SCSI 4:0:0:0)
Synchronizing SCSI cache for disk sdc: 
FAILED
  status = 0, message = 00, host = 4, driver = 00
  <3>ata6: exception Emask 0x10 SAct 0x0 SErr 0x5 action 0x2 frozen
ata6: hard resetting port
ata6: port is slow to respond, please be patient (Status 0xff)
ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata6.00: ATA-7, max UDMA/133, 1465149168 sectors: LBA48 NCQ (depth 0/32)
ata6.00: configured for UDMA/100
ata6: EH complete
scsi 5:0:0:0: Direct-Access ATA  ST3750640AS  3.AA PQ: 0
ANSI: 5
SCSI device sdf: 1465149168 512-byte hdwr sectors (750156 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: drive cache: write back
SCSI device sdf: 1465149168 512-byte hdwr sectors (750156 MB)
sdf: Write Protect is off
sdf: Mode Sense: 00 3a 00 00
SCSI device sdf: drive cache: write back
 sdf: unknown partition table
sd 5:0:0:0: Attached scsi disk sdf
sd 5:0:0:0: Attached scsi generic sg2 type 0
scsi 4:0:0:0: rejecting I/O to dead device
scsi 4:0:0:0: rejecting I/O to dead device
scsi 5:0:0:0: rejecting I/O to dead device
scsi 5:0:0:0: rejecting I/O to dead device
ata5: exception Emask 0x10 SAct 0x0 SErr 0x5 action 0x2 frozen
ata5: hard resetting port
ata5: port is slow to respond, please be patient (Status 0xff)
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5.00: ATA-7, max UDMA/133, 1465149168 sectors: LBA48 NCQ (depth 0/32)
ata5.00: configured for UDMA/100
ata5: EH complete
scsi 4:0:0:0: Direct-Access ATA  ST3750640AS  3.AA PQ: 0
ANSI: 5
SCSI device sdg: 1465149168 512-byte hdwr sectors (750156 MB)
sdg: Write Protect is off
sdg: Mode Sense: 00 3a 00 00
SCSI device sdg: drive cache: write back
SCSI device sdg: 1465149168 512-byte hdwr sectors (750156 MB)
sdg: Write Protect is off
sdg: Mode Sense: 00 3a 00 00
SCSI device sdg: drive cache: write back
 sdg: unknown partition table
sd 4:0:0:0: Attached scsi disk sdg
sd 4:0:0:0: Attached scsi generic sg3 type 0
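
As a reading aid for the log above: the SStatus values are printed in hex and
pack three 4-bit fields, so "SStatus 0" means no device detected and
"SStatus 113" means device present, 1.5 Gbps, interface active. Below is a
minimal userspace sketch of that decoding, assuming the standard SATA SStatus
layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The struct and
function names are made up for illustration and are not libata's.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of the SATA SStatus register (illustrative names, not libata's). */
struct sstatus_fields {
	unsigned det;	/* bits 3:0, device detection */
	unsigned spd;	/* bits 7:4, negotiated speed (1 = 1.5 Gbps) */
	unsigned ipm;	/* bits 11:8, interface power management state */
};

struct sstatus_fields decode_sstatus(uint32_t sstatus)
{
	struct sstatus_fields f;

	f.det = sstatus & 0xf;
	f.spd = (sstatus >> 4) & 0xf;
	f.ipm = (sstatus >> 8) & 0xf;
	return f;
}

/* DET == 3: device detected and phy communication established. */
int sstatus_link_up(uint32_t sstatus)
{
	return decode_sstatus(sstatus).det == 3;
}

/* Print one decoded value in a form that is easy to compare with dmesg. */
void print_sstatus(uint32_t sstatus)
{
	struct sstatus_fields f = decode_sstatus(sstatus);

	printf("SStatus %X: DET=%u SPD=%u IPM=%u -> link %s\n",
	       (unsigned)sstatus, f.det, f.spd, f.ipm,
	       sstatus_link_up(sstatus) ? "up" : "down");
}
```

Calling print_sstatus(0x113) and print_sstatus(0x0) from a small main()
covers the two values that appear in the log.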

Best,
Soeren
-- 
For the one fact about the future of which we can be certain is that it
will be utterly fantastic. -- Arthur C. Clarke, 1962


[PATCH] Cell SPU task notification

2007-01-12 Thread Maynard Johnson


Subject: Enable SPU switch notification to detect currently active SPU tasks.

From: Maynard Johnson <[EMAIL PROTECTED]>

This patch adds to the capability of spu_switch_event_register so that the
caller is also notified of currently active SPU tasks.  It also exports
spu_switch_event_register and spu_switch_event_unregister.

Signed-off-by: Maynard Johnson <[EMAIL PROTECTED]>


Index: linux-2.6.19-rc6-arnd1+patches/arch/powerpc/platforms/cell/spufs/sched.c
===
--- linux-2.6.19-rc6-arnd1+patches.orig/arch/powerpc/platforms/cell/spufs/sched.c	2006-12-04 10:56:04.730698720 -0600
+++ linux-2.6.19-rc6-arnd1+patches/arch/powerpc/platforms/cell/spufs/sched.c	2007-01-11 09:45:37.918333128 -0600
@@ -46,6 +46,8 @@
 
 #define SPU_MIN_TIMESLICE 	(100 * HZ / 1000)
 
+int notify_active[MAX_NUMNODES];
+
 #define SPU_BITMAP_SIZE (((MAX_PRIO+BITS_PER_LONG)/BITS_PER_LONG)+1)
 struct spu_prio_array {
 	unsigned long bitmap[SPU_BITMAP_SIZE];
@@ -81,18 +83,45 @@
 static void spu_switch_notify(struct spu *spu, struct spu_context *ctx)
 {
 	blocking_notifier_call_chain(&spu_switch_notifier,
-			ctx ? ctx->object_id : 0, spu);
+ ctx ? ctx->object_id : 0, spu);
+}
+
+static void notify_spus_active(void)
+{
+	int node;
+	/* Wake up the active spu_contexts. When the awakened processes
+	 * see their notify_active flag is set, they will call
+	 * spu_notify_already_active().
+	 */
+	for (node = 0; node < MAX_NUMNODES; node++) {
+		struct spu *spu;
+		mutex_lock(&spu_prio->active_mutex[node]);
+		list_for_each_entry(spu, &spu_prio->active_list[node], list) {
+			struct spu_context *ctx = spu->ctx;
+			wake_up_all(&ctx->stop_wq);
+			notify_active[ctx->spu->number] = 1;
+			smp_mb();
+		}
+		mutex_unlock(&spu_prio->active_mutex[node]);
+	}
+	yield();
 }
 
 int spu_switch_event_register(struct notifier_block * n)
 {
-	return blocking_notifier_chain_register(&spu_switch_notifier, n);
+	int ret;
+	ret = blocking_notifier_chain_register(&spu_switch_notifier, n);
+	if (!ret)
+		notify_spus_active();
+	return ret;
 }
+EXPORT_SYMBOL_GPL(spu_switch_event_register);
 
 int spu_switch_event_unregister(struct notifier_block * n)
 {
 	return blocking_notifier_chain_unregister(&spu_switch_notifier, n);
 }
+EXPORT_SYMBOL_GPL(spu_switch_event_unregister);
 
 
 static inline void bind_context(struct spu *spu, struct spu_context *ctx)
@@ -250,6 +279,14 @@
 	return spu_get_idle(ctx, flags);
 }
 
+void spu_notify_already_active(struct spu_context *ctx)
+{
+	struct spu *spu = ctx->spu;
+	if (!spu)
+		return;
+	spu_switch_notify(spu, ctx);
+}
+
 /* The three externally callable interfaces
  * for the scheduler begin here.
  *
Index: linux-2.6.19-rc6-arnd1+patches/arch/powerpc/platforms/cell/spufs/spufs.h
===
--- linux-2.6.19-rc6-arnd1+patches.orig/arch/powerpc/platforms/cell/spufs/spufs.h	2007-01-08 18:18:40.093354608 -0600
+++ linux-2.6.19-rc6-arnd1+patches/arch/powerpc/platforms/cell/spufs/spufs.h	2007-01-08 18:31:03.610345792 -0600
@@ -183,6 +183,7 @@
 void spu_yield(struct spu_context *ctx);
 int __init spu_sched_init(void);
 void __exit spu_sched_exit(void);
+void spu_notify_already_active(struct spu_context *ctx);
 
 extern char *isolated_loader;
 
Index: linux-2.6.19-rc6-arnd1+patches/arch/powerpc/platforms/cell/spufs/run.c
===
--- linux-2.6.19-rc6-arnd1+patches.orig/arch/powerpc/platforms/cell/spufs/run.c	2007-01-08 18:33:51.979311680 -0600
+++ linux-2.6.19-rc6-arnd1+patches/arch/powerpc/platforms/cell/spufs/run.c	2007-01-11 10:17:20.777344984 -0600
@@ -10,6 +10,8 @@
 
 #include "spufs.h"
 
+extern int notify_active[MAX_NUMNODES];
+
 /* interrupt-level stop callback function. */
 void spufs_stop_callback(struct spu *spu)
 {
@@ -45,7 +47,9 @@
 	u64 pte_fault;
 
 	*stat = ctx->ops->status_read(ctx);
-	if (ctx->state != SPU_STATE_RUNNABLE)
+	smp_mb();
+
+	if (ctx->state != SPU_STATE_RUNNABLE || notify_active[ctx->spu->number])
 		return 1;
 	spu = ctx->spu;
 	pte_fault = spu->dsisr &
@@ -319,6 +323,11 @@
 		ret = spufs_wait(ctx->stop_wq, spu_stopped(ctx, &status));
 		if (unlikely(ret))
 			break;
+		if (unlikely(notify_active[ctx->spu->number])) {
+			notify_active[ctx->spu->number] = 0;
+			if (!(status & SPU_STATUS_STOPPED_BY_STOP))
+				spu_notify_already_active(ctx);
+		}
 		if ((status & SPU_STATUS_STOPPED_BY_STOP) &&
 		(status >> SPU_STOP_STATUS_SHIFT == 0x2104)) {
 			ret = spu_process_callback(ctx);
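
The idea in the patch description (a newly registered listener is also told
about tasks that are already running) can be modelled in plain userspace C.
This is only a sketch under simplifying assumptions: it has a single listener,
no locking, and it calls the listener directly at registration time, whereas
the patch sets a per-SPU flag and wakes the waiting contexts so that each one
issues the notification itself. All toy_* names are hypothetical.

```c
#include <stddef.h>

#define TOY_MAX_SLOTS 8

/* Hypothetical callback type: receives the id of the task on a slot. */
typedef void (*toy_switch_cb)(int task_id, void *data);

struct toy_sched {
	int active[TOY_MAX_SLOTS];	/* nonzero: slot is running a task */
	int task_id[TOY_MAX_SLOTS];
	toy_switch_cb cb;		/* single listener, for brevity */
	void *cb_data;
};

/* Record a context switch on a slot and tell the listener, if any. */
void toy_switch(struct toy_sched *s, int slot, int task_id)
{
	s->active[slot] = (task_id != 0);
	s->task_id[slot] = task_id;
	if (s->cb)
		s->cb(task_id, s->cb_data);
}

/* Register a listener, then replay every task that is already active. */
int toy_register(struct toy_sched *s, toy_switch_cb cb, void *data)
{
	int slot;

	if (s->cb)
		return -1;	/* this toy supports one listener only */
	s->cb = cb;
	s->cb_data = data;
	for (slot = 0; slot < TOY_MAX_SLOTS; slot++)
		if (s->active[slot])
			cb(s->task_id[slot], data);
	return 0;
}

/* Example listener: counts how many notifications it has received. */
void toy_count_cb(int task_id, void *data)
{
	(void)task_id;
	(*(int *)data)++;
}
```

Without the replay loop in toy_register(), a listener that registers late
would not hear about a task until its next switch, which is the gap the patch
description sets out to close.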


Re: Fwd: [PATCH] Fix some ARM builds due to HID brokenness

2007-01-12 Thread Randy Dunlap
On Fri, 12 Jan 2007 13:44:05 -0800 Andrew Morton wrote:

> On Fri, 12 Jan 2007 21:00:15 +
> Russell King <[EMAIL PROTECTED]> wrote:
> 
> > Could we please have this (or a proper fix) in before 2.6.20 to resolve
> > the regression please?
> > 
> >
> > ...
> >
> > --- a/drivers/hid/Kconfig
> > +++ b/drivers/hid/Kconfig
> > @@ -6,6 +6,7 @@ menu "HID Devices"
> >  
> >  config HID
> > tristate "Generic HID support"
> > +   depends on INPUT
> > default y
> > ---help---
> >   Say Y here if you want generic HID support to connect keyboards,
> > 
> 
> This was merged a week ago..

Right, we are past that to a new patch now.

---
~Randy


Re: [PATCH] [RFC] remove ext3 inode from orphan list when link and unlink race

2007-01-12 Thread Eric Sandeen
Alex Tomas wrote:
>> Eric Sandeen (ES) writes:
>  ES> I tend to agree, chatting w/ Al I think he does too.  :)  I'll test
>  ES> a patch that kicks out ext3_link() with -ENOENT at the top, and resubmit
>  ES> that if things go well.
> 
> shouldn't VFS do that?

Al says "no" and I'm not arguing.  :)

Apparently this may be OK with some filesystems, and Al says he doesn't
want to know about i_nlink in the vfs in any case.

But I suppose there may be other filesystems which DO care, and should
be checking if they're not.

-Eric
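
The check discussed above (refuse to add a link to an inode whose link count
has already dropped to zero) can be sketched as a userspace model. This is not
the ext3 code: struct toy_inode and the toy_* functions are hypothetical
stand-ins, with nlink playing the role of i_nlink and a flag standing in for
membership on the orphan list.

```c
#include <errno.h>

struct toy_inode {
	unsigned int nlink;	/* stands in for inode->i_nlink */
	int on_orphan_list;	/* set once the last link is gone */
};

/* Drop one link; the last unlink puts the inode on the orphan list. */
void toy_unlink(struct toy_inode *inode)
{
	if (inode->nlink > 0 && --inode->nlink == 0)
		inode->on_orphan_list = 1;
}

/*
 * Add one link. An inode with no links left is already on its way out,
 * so bail out at the top with -ENOENT instead of reviving it.
 */
int toy_link(struct toy_inode *inode)
{
	if (inode->nlink == 0)
		return -ENOENT;
	inode->nlink++;
	return 0;
}
```

In this model a link that loses the race against the final unlink fails
cleanly, so the inode never ends up with a nonzero link count while still on
the orphan list.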

