date:20080128

Re: [PATCH] x86: add PCI IDs to k8topology_64.c II

2008-01-28 Thread Andi Kleen

On Mon, Jan 28, 2008 at 11:39:30PM -0800, Yinghai Lu wrote:
> On Jan 29, 2008 12:09 AM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > > SRAT is essentially just a two dimensional table with node distances.
> >
> > Sorry, that was actually SLIT. SRAT is not two dimensional, but also
> > relatively simple. SLIT you don't really need to implement.
> >
> 
> We need to add a CONFIG option that parses only SRAT, MADT, etc., without
> pulling in the DSDT-related code...

I don't think it needs a CONFIG. The code should handle this by itself
in any case. I'm not entirely sure it does currently, but if it
doesn't, it will likely not be very difficult to fix.

Or are you worried about code size? ACPI is around 270k on x86-64,
which, while certainly not small, should not be a problem on x86 NUMA
systems.

-Andi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [BUG] 2.6.24-git usb reset problems

2008-01-28 Thread Jens Axboe

On Mon, Jan 28 2008, Greg KH wrote:
> On Mon, Jan 28, 2008 at 09:49:35PM +0100, Jens Axboe wrote:
> > Hi,
> > 
> > Running latest -git (head 91525300baf162e83e923b09ca286f9205e21522) and
> > connecting my cf usb storage device yields an endless stream of:
> > 
> > Initializing USB Mass Storage driver...
> > scsi6 : SCSI emulation for USB Mass Storage devices
> > usb-storage: device found at 2
> > usb-storage: waiting for device to settle before scanning
> > usbcore: registered new interface driver usb-storage
> > USB Mass Storage support registered.
> > scsi 6:0:0:0: Direct-Access Generic  STORAGE DEVICE   0125 PQ: 0 ANSI: 0
> > sd 6:0:0:0: [sdb] 4001760 512-byte hardware sectors (2049 MB)
> > sd 6:0:0:0: [sdb] Write Protect is off
> > sd 6:0:0:0: [sdb] Mode Sense: 02 00 00 00
> > sd 6:0:0:0: [sdb] Assuming drive cache: write through
> > sd 6:0:0:0: [sdb] 4001760 512-byte hardware sectors (2049 MB)
> > sd 6:0:0:0: [sdb] Write Protect is off
> > sd 6:0:0:0: [sdb] Mode Sense: 02 00 00 00
> > sd 6:0:0:0: [sdb] Assuming drive cache: write through
> >  sdb: sdb1
> > sd 6:0:0:0: [sdb] Attached SCSI removable disk
> > sd 6:0:0:0: Attached scsi generic sg1 type 0
> > scsi 6:0:0:1: Direct-Access Generic  STORAGE DEVICE   0125 PQ: 0 ANSI: 0
> > usb 5-1: reset high speed USB device using ehci_hcd and address 2
> > usb 5-1: reset high speed USB device using ehci_hcd and address 2
> > usb 5-1: reset high speed USB device using ehci_hcd and address 2
> > usb 5-1: reset high speed USB device using ehci_hcd and address 2
> > usb 5-1: reset high speed USB device using ehci_hcd and address 2
> > usb 5-1: reset high speed USB device using ehci_hcd and address 2
> > usb 5-1: reset high speed USB device using ehci_hcd and address 2
> > [...]
> > 
> > until I disconnect it. The device doesn't work. Worked fine in 2.6.24.
> > I'm attaching boot messages and my .config.
> 
> That's a bit weird, as we haven't added any USB patches to the -git tree
> yet after 2.6.24 :)
> 
> Could this be caused by some scsi changes perhaps?

Heh, I guess it could! I'll double check, I reproduced it with two
distinct boots before posting.

-- 
Jens Axboe


Re: [git pull] Fix recent Ocfs2 breakage

2008-01-28 Thread Joel Becker

On Mon, Jan 28, 2008 at 09:08:04PM -0800, Greg KH wrote:
> And please please please please document stuff like this, and all of the
> different files you have in this subdirectory in Documentation/ABI/ so

Huh, I didn't know Documentation/ABI existed.  That would
certainly help in the future.

> those of us who are trying to figure out the code (and there's still
> parts of the kobject usage I'm pretty sure is not correct) can have a

ocfs2 kobject usage, or other folks'?  If there's anything in
the ocfs2 usage that you are unsure of, feel free to ask!

Joel

-- 

"Also, all of life's big problems include the words 'indictment' or
'inoperable.' Everything else is small stuff."
- Alton Brown

Joel Becker
Principal Software Developer
Oracle
E-mail: [EMAIL PROTECTED]
Phone: (650) 506-8127

Re: [PATCH] x86: add PCI IDs to k8topology_64.c II

2008-01-28 Thread Yinghai Lu

On Jan 29, 2008 12:09 AM, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > SRAT is essentially just a two dimensional table with node distances.
>
> Sorry, that was actually SLIT. SRAT is not two dimensional, but also
> relatively simple. SLIT you don't really need to implement.
>

We need to add a CONFIG option that parses only SRAT, MADT, etc., without
pulling in the DSDT-related code...

YH

Re: [2.6 patch] let LOG_BUF_SHIFT default to 17

2008-01-28 Thread Adrian Bunk

On Tue, Jan 29, 2008 at 07:35:00AM +0100, Andi Kleen wrote:
> Adrian Bunk <[EMAIL PROTECTED]> writes:
> 
> > 16 kB is often no longer enough for a normal boot of an UP system.
> 
> Better would be to just disable by default/remove noisy messages
> to make the kernel boot output shorter.
> 
> I think we got a lot of IMHO useless messages in there.

Nearly 25% of the messages in my dmesg come from ACPI.

Plus many other messages that cover generic resource information (e.g.
the BIOS map or ioport/iomem reservations).

All of these tend to be of great help when debugging driver regressions, 
since diff'ing the dmesg's often is enough for determining whether it's 
a driver problem or a PCI/ACPI/whatever problem.
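
For scale: LOG_BUF_SHIFT is the base-2 logarithm of the log buffer size in bytes, so the 16 kB mentioned above corresponds to a shift of 14 and the proposed default of 17 gives 128 kB. A minimal user-space sketch of that arithmetic (the helper name `log_buf_bytes` is made up for illustration, it is not a kernel function):

```c
#include <stddef.h>

/* Size in bytes of a buffer configured as 2^shift. */
static size_t log_buf_bytes(unsigned int shift)
{
	return (size_t)1 << shift;
}
```

Calling `log_buf_bytes(17)` yields 131072 bytes, eight times the 16384 bytes of a shift of 14.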

> -Andi

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed


Re: [PATCH] x86: add PCI IDs to k8topology_64.c II

2008-01-28 Thread Andi Kleen

> SRAT is essentially just a two dimensional table with node distances.

Sorry, that was actually SLIT. SRAT is not two dimensional, but also
relatively simple. SLIT you don't really need to implement.
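
To make the SLIT shape concrete, here is a minimal sketch, assuming the ACPI layout in which SLIT carries an N x N matrix of one-byte relative distances stored row by row, with 10 meaning "local". The struct `slit_sketch` and the helper `slit_distance` are hypothetical names for illustration, not kernel code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of a SLIT-style table: an N x N matrix of
 * one-byte relative distances, stored row by row. */
struct slit_sketch {
	size_t nodes;            /* number of NUMA nodes (N) */
	const uint8_t *entries;  /* N * N distances, row-major */
};

/* Distance from node `from` to node `to`; 0 if either index is out of range. */
static uint8_t slit_distance(const struct slit_sketch *t, size_t from, size_t to)
{
	if (from >= t->nodes || to >= t->nodes)
		return 0;
	return t->entries[from * t->nodes + to];
}
```

For a two-node system the whole table is four bytes, for example {10, 20, 20, 10}.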

-Andi

Re: [RFC] Parallelize IO for e2fsck

2008-01-28 Thread david


On Mon, 28 Jan 2008, Theodore Tso wrote:

> On Mon, Jan 28, 2008 at 07:30:05PM +, Pavel Machek wrote:
>
> > As user pages are always in highmem, this should be easy to decide:
> > only send SIGDANGER when highmem is full. (Yes, there are
> > inodes/dentries/file descriptors in lowmem, but I doubt apps will
> > respond to SIGDANGER by closing files).
>
> Good point; for a system with at least (say) 2GB of memory, that
> definitely makes sense.  For a system with less than 768 megs of
> memory (how quaint, but it wasn't that long ago this was a lot of
> memory :-), there wouldn't *be* any memory in highmem at all

not to mention machines with 1G of ram (900M lowmem, 128M highmem)
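
The 1G figure can be checked with a short sketch, assuming the usual 32-bit x86 3G/1G split where roughly 896 MB is directly mapped as lowmem (the "900M" above, rounded). The constant and helper names are hypothetical:

```c
/* Assumed lowmem ceiling for 32-bit x86 with the default 3G/1G split. */
#define LOWMEM_LIMIT_MB 896u

/* Megabytes of RAM that end up in highmem for a given total. */
static unsigned int highmem_mb(unsigned int total_mb)
{
	return total_mb > LOWMEM_LIMIT_MB ? total_mb - LOWMEM_LIMIT_MB : 0;
}
```

Under that assumption a 1024 MB machine has only 128 MB of highmem, so "highmem is full" would trigger long before memory as a whole is short.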

David Lang

Re: 2.6.24 breaks BIOS updates on all Dell machines

2008-01-28 Thread Greg KH

On Tue, Jan 29, 2008 at 12:32:44AM -0600, Michael E Brown wrote:
> BIOS updates are broken on all Dell systems due to Commit
> 109f0e93b6b728f03c1eb4af02bc25d71b646c59, which is now in 2.6.24.
> 
>  static inline void fw_setup_device_id(struct device *f_dev, struct device *dev)
>  {
> -	/* XXX warning we should watch out for name collisions */
> -	strlcpy(f_dev->bus_id, dev->bus_id, BUS_ID_SIZE);
> +	snprintf(f_dev->bus_id, BUS_ID_SIZE, "firmware-%s", dev->bus_id);
>  }
> 
> Two programs are broken by this change: 
> 1) dellBiosUpdate, which is part of libsmbios
> 2) All of the Dell Update Packages (DUPs) that are part of Dell
> OpenManage: each BIOS release for each of 3-4 dozen platforms.
> 
> These programs are broken due to the pathname change from
> /sys/class/firmware/dell_rbu/   to
> /sys/class/firmware/firmware-dell_rbu/loading. 
> 
> Realistically, I can fix libsmbios in a couple of weeks, but there is no
> way that we can go back and fix a couple hundred DUP packages for this
> change. If this stays, we are looking at over 6 months before we have an
> officially-available Dell OpenManage that can deal with it, and that
> would be for new BIOS releases only, I suspect.
> 
> Some of the relevant threads from when this was submitted and accepted:
> http://lkml.org/lkml/2005/5/23/73
> http://lkml.org/lkml/2005/5/23/62
> 
> Due to the extremely large and disruptive nature of this bug, it would
> be nice to get a 2.6.24.1 with this patch reverted.
> 
> I have copied the relevant developers at Dell who maintain this driver.
> Please preserve the cc: list when replying.

Fair enough, I have no problem reverting this.

Anyone want to keep it in?
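
For anyone checking their own tools, the effect of the quoted change on the sysfs name can be reproduced in user space. This is only a sketch: the 20-byte buffer size is an assumption for illustration, and `firmware_class_name` is a made-up helper, not the kernel function.

```c
#include <stdio.h>

#define BUS_ID_SIZE_SKETCH 20  /* assumed buffer size, for illustration */

/* Mimics the new naming: prefix the device's bus_id with "firmware-". */
static void firmware_class_name(char *out, const char *bus_id)
{
	snprintf(out, BUS_ID_SIZE_SKETCH, "firmware-%s", bus_id);
}
```

With a bus_id of "dell_rbu" this produces "firmware-dell_rbu", which is why a tool that opens the old directory name no longer finds it.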

thanks,

greg k-h

Re: [Xen-devel] dm-band: The I/O bandwidth controller: Performance Report

2008-01-28 Thread INAKOSHI Hiroya

Hi,

Ryo Tsuruta wrote:
> The results of bandwidth control test on band-groups.
> =
> The configurations of the test #3:
>o Prepare three partitions sdb5 and sdb6.
>o Create two extra band-groups on sdb5, the first is of user1 and the
>  second is of user2.
>o Give weights of 40, 20, 10 and 10 to the user1 band-group, the user2
>  band-group, the default group of sdb5 and sdb6 respectively.
>o Run 128 processes issuing random read/write direct I/O with 4KB data
>  on each device at the same time.

Do you mean that you ran 128 processes for each user-device pair?  Namely,
I guess that

  user1: 128 processes on sdb5,
  user2: 128 processes on sdb5,
  another: 128 processes on sdb5,
  user2: 128 processes on sdb6.
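
If bandwidth were split strictly in proportion to the weights (an assumption made here for illustration, not a statement about how dm-band is implemented), the expected shares for weights 40, 20, 10 and 10 would be 50%, 25%, 12.5% and 12.5%. A small sketch of that calculation, with a hypothetical helper name:

```c
#include <stddef.h>

/* Share of total bandwidth, in tenths of a percent, that group `idx` would
 * get if bandwidth were split strictly in proportion to the weights. */
static unsigned int share_permille(const unsigned int *weights, size_t n, size_t idx)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += weights[i];
	if (idx >= n || total == 0)
		return 0;
	return weights[idx] * 1000u / total;
}
```

Comparing measured throughput per group against these ideal shares is one way to read the test #3 results.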

> Conclusions and future works
> 
> Dm-band works well with random I/Os. I have a plan on running some tests
> using various real applications such as databases or file servers.
> If you have any other good idea to test dm-band, please let me know.

A second set of preliminary studies might be:

- What if you use a different I/O size on each device (or device-user pair)?
- What if you use a different number of processes on each device (or
device-user pair)?


And my impression is that it is natural for dm-band to live in device-mapper,
separate from the I/O scheduler.  Because bandwidth control and I/O
scheduling are two different things, it may be simpler to implement them
in different layers.

Regards,

Hiroya.


> 
> Thank you,
> Ryo Tsuruta.
> 
> ___
> Xen-devel mailing list
> [EMAIL PROTECTED]
> http://lists.xensource.com/xen-devel
> 
> 


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Florian Attenberger

On Mon, 28 Jan 2008 14:13:21 -0500
Gene Heskett <[EMAIL PROTECTED]> wrote:


> >> I had to reboot early this morning due to a freezeup, and I had a
> >> bunch of these in the messages log:
> >> ==
> >> Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >> Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
> >> Jan 27 19:42:11 coyote kernel: [42461.915974]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> >> Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
> >> Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
> >> Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
> >> Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
> >> Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
> >> Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off
> >> Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >> ===


I had this error too, or maybe only a similar one, plus another one. I no
longer have the error output for either lying around, so I'm posting both
fixes that I found here on lkml:
1) disabling NCQ like this:
"echo 1 > /sys/block/sda/device/queue_depth"
2) this patch: libata_drain_fifo_on_stuck_drq_hsm.patch
( applies to 2.6.24 too )

Signed-off-by: Mark Lord <[EMAIL PROTECTED]>
---

--- old/drivers/ata/libata-sff.c	2007-09-28 09:29:22.0 -0400
+++ linux/drivers/ata/libata-sff.c	2007-09-28 09:39:44.0 -0400
@@ -420,6 +420,28 @@
 	ap->ops->irq_on(ap);
 }
 
+static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+	u8 stat = ata_chk_status(ap);
+	/*
+	 * Try to clear stuck DRQ if necessary,
+	 * by reading/discarding up to two sectors worth of data.
+	 */
+	if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+		unsigned int i;
+		unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
+
+		printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
+				limit);
+		for (i = 0; i < limit ; ++i) {
+			ioread16(ap->ioaddr.data_addr);
+			if (!(ata_chk_status(ap) & ATA_DRQ))
+				break;
+		}
+		printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
+	}
+}
+
 /**
  * ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
  * @ap: port to handle error for
@@ -476,7 +498,7 @@
 	}
 
 	ata_altstatus(ap);
-	ata_chk_status(ap);
+	ata_drain_fifo(ap, qc);
 	ap->ops->irq_clear(ap);
 
 	spin_unlock_irqrestore(ap->lock, flags);
-





-- 
Florian Attenberger <[EMAIL PROTECTED]>



Re: [2.6 patch] let LOG_BUF_SHIFT default to 17

2008-01-28 Thread Andi Kleen

Adrian Bunk <[EMAIL PROTECTED]> writes:

> 16 kB is often no longer enough for a normal boot of an UP system.

Better would be to just disable by default/remove noisy messages
to make the kernel boot output shorter.

I think we got a lot of IMHO useless messages in there.

-Andi

Re: [PATCH] Improve Documentation/stable_api_nonsense.txt

2008-01-28 Thread Andi Kleen

[EMAIL PROTECTED] (Heikki Orsila) writes:
>  
> +Some complain that kernel interfaces change too often for out-of-the-tree
> +modules, but this claim is false. Changing an interface can be delicate work,
> +and it can take significant amount of developer effort. Therefore, interfaces
> +are not changed without a good reason.

This is actually not correct. Sometimes interfaces are changed
without good reason or for reasons that turn out to be incorrect.

-Andi

Re: [PATCH] ipwireless: driver for 3G PC Card

2008-01-28 Thread Pekka Enberg

Hi,

On Jan 29, 2008 1:33 AM, Randy Dunlap <[EMAIL PROTECTED]> wrote:
> > > What part of kernel documentation uses doxygen?
> >
> > So then, what's the problem?
>
> Why is it there?  We have a kernel documentation language.
> Please use it or plain text.

Yes please.

Pekka

Re: [PATCH 3/5] Module: check to see if we have a built in module with the same name

2008-01-28 Thread Rusty Russell

On Monday 28 January 2008 10:38:40 Greg Kroah-Hartman wrote:
> When trying to load a module with the same name as a built-in one, a
> scary kobject backtrace comes up.  Prevent that by checking for this
> condition and warning the user as to what exactly is going on.
>
> Cc: Rusty Russell <[EMAIL PROTECTED]>
> Cc: Linus Torvalds <[EMAIL PROTECTED]>
> Cc: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> ---
>  kernel/module.c |   10 ++
>  1 files changed, 10 insertions(+), 0 deletions(-)

Oh, I pushed this as part of my module updates.

Unfortunately Andrew still doesn't seem to have picked up my patch queue, and 
keeps grabbing random (sometimes outdated) patches which are also in my 
tree :(

Cheers,
Rusty.

[PULL] module patches

2008-01-28 Thread Rusty Russell

The following changes since commit 8561b089afbaed2651591e5a4574fdca451d82f2:
  Linus Torvalds (1):
Merge git://git.kernel.org/.../wim/linux-2.6-watchdog

are available in the git repository at:

  ssh://master.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus 
master

Denis Cheng (1):
  module: fix the module name length in param_sysfs_builtin

Greg Kroah-Hartman (1):
  Module: check to see if we have a built in module with the same name

Jon Masters (1):
  module: add module taint on ndiswrapper

Rusty Russell (5):
  module: Don't report discarded init pages as kernel text.
  module: wait for dependent modules doing init.
  module: Fix gratuitous sprintf in module.c
  module: better OOPS and lockdep coverage for loading modules
  module: make module_address_lookup safe

 include/linux/module.h |   22 ++
 kernel/extable.c   |3 +-
 kernel/kallsyms.c  |   11 ++---
 kernel/module.c|  102 
 kernel/params.c|8 +--
 5 files changed, 90 insertions(+), 56 deletions(-)

2.6.24-ext4-1 patchset released

2008-01-28 Thread Theodore Ts'o


I've just released 2.6.24-ext4-1.  It's basically just a clean up of the
stable patch series, in response to LKML review comments, in preparation
for Linus to pull them into mainline.  

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git 2.6.24-ext4-1
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=shortlog;h=2.6.24-ext4-1

and

ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/ext4-patches/2.6.24-ext4-1

- Ted

Re: [PATCH] x86: add PCI IDs to k8topology_64.c

2008-01-28 Thread H. Peter Anvin


Andi Kleen wrote:
> > Also, some users are using LinuxBIOS or other firmware that doesn't
> > have or like ACPI support, but they still need NUMA.
> > For them ACPI doesn't help.
>
> We've had this discussion before. The right way, even if you don't
> want to do full ACPI, is to do just the minimal static boot tables
> and a SRAT. These are quite simple tables and should be easy to set up.
> SRAT is essentially just a two dimensional table with node distances.



Indeed.  Hacking everything into the kernel just because LinuxBIOS is 
braindead is NOT a good idea.


-hpa

Re: [PATCH 0/3][RFC] x86: Catch stray non-kprobe breakpoints

2008-01-28 Thread Ananth N Mavinakayanahalli

On Sun, Jan 27, 2008 at 02:38:56PM +0530, Abhishek Sagar wrote:
> Greetings,
> 
> Non kprobe breakpoints in the kernel might lie inside the .kprobes.text 
> section. Such breakpoints can easily be identified by in_kprobes_functions 
> and can be caught early. These are problematic and a warning should be 
> emitted to discourage them (in any rare case, if they actually occur).

Why? As Masami indicated in an earlier reply, the annotation is to
prevent *only* kprobes.
 
> For this, a check can route the trap handling of such breakpoints away from 
> kprobe_handler (which ends up calling even more functions marked as 
> __kprobes) from inside kprobe_exceptions_notify.

Well.. we pass on control of a !kprobe breakpoint to the kernel. This is
exactly what permits debuggers like xmon to work fine now. I don't see
any harm in such breakpoints being handled autonomously without any sort
of kprobe influence.
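
For readers following along, the check being proposed boils down to an address-range test against the bounds of the .kprobes.text section. A minimal user-space sketch of that kind of test; `text_section` and `in_section` are hypothetical stand-ins, not the kernel's in_kprobes_functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds of a protected text section, [start, end). */
struct text_section {
	uintptr_t start;
	uintptr_t end;
};

/* True if `addr` lies inside the section. */
static bool in_section(const struct text_section *s, uintptr_t addr)
{
	return addr >= s->start && addr < s->end;
}
```

The test itself is cheap; the disagreement in this thread is about whether a breakpoint that passes it should be treated differently at all.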

Ananth

Re: [git pull] Fix recent Ocfs2 breakage

2008-01-28 Thread Mark Fasheh

On Mon, Jan 28, 2008 at 09:08:04PM -0800, Greg KH wrote:
> > Joel Becker (1):
> >   ocfs2: Fix userspace ABI breakage in sysfs
> 
> This is fine with me, for now.

Great, thanks.


> > From: Joel Becker <[EMAIL PROTECTED]>
> > 
> > ocfs2: Fix userspace ABI breakage in sysfs
> > 
> > The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by
> > commit c60b71787982cefcf9fa09aa281fa8c4c685d557 "kset: convert ocfs2 to
> > use kset_create".  Specifically, the '/sys/o2cb' kset was moved to
> > '/sys/fs/o2cb'.  This breaks all ocfs2 tools and renders the
> > filesystem unmountable.
> > 
> > This fix moves '/sys/o2cb' back where it belongs.
> 
> "belongs" is pretty odd here.  This is a filesystem specific thing,
> right?  Why not put it in /sys/fs/ then?

We had it there before /sys/fs and as has been noted, it's ABI so we can't
change it right away. In theory, it's actually outside the fs, but in
reality it's pretty tied to Ocfs2, so I have no objection to the idea of it
being eventually moved there.


> And yes, I understand about legacy userspace tools, that's why I have no
> objection to it going back.  But you can put it in both places (with a
> symlink) and change your userspace code, and in a year or so, drop the
> symlink, right?

Yeah, that sounds entirely reasonable. It shouldn't be too hard for us to
fix up ocfs2-tools to look in both places. So long as there's enough lead
time for users to upgrade their toolchain (we can do releases for all
branches of ocfs2-tools, make announcements on lists, etc), I think the
impact shouldn't be too bad.
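
"Look in both places" could be as simple as probing a list of candidate directories in order. A minimal sketch under that assumption; `first_existing` and `o2cb_dirs` are hypothetical names, and this is not code from ocfs2-tools:

```c
#include <stddef.h>
#include <unistd.h>

/* Return the first candidate path that exists, or NULL if none does. */
static const char *first_existing(const char *const *candidates, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (access(candidates[i], F_OK) == 0)
			return candidates[i];
	return NULL;
}

/* Candidate locations, proposed new one first; both paths are from this thread. */
static const char *const o2cb_dirs[] = { "/sys/fs/o2cb", "/sys/o2cb" };
```

Tools built this way keep working whether the kernel exposes only the legacy location, only the new one, or both via a symlink.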


> And please please please please document stuff like this, and all of the
> different files you have in this subdirectory in Documentation/ABI/ so
> those of us who are trying to figure out the code (and there's still
> parts of the kobject usage I'm pretty sure is not correct) can have a
> chance to understand exactly how this stuff is being used and expected
> to work.

No problem. I'll get us some patches to symlink things, and add docs in
Documentation/ABI/ explaining how Ocfs2 and userspace communicate. In the
future, as we add ABI it'll be documented there.
--Mark

--
Mark Fasheh
Principal Software Developer, Oracle
[EMAIL PROTECTED]

Re: 2.6.24 panics initializing ne2k in mips.

2008-01-28 Thread Rob Landley

On Monday 28 January 2008 02:17:37 Rob Landley wrote:
> The 2.6.23 kernel built for mips with the attached .config works fine for
> me under qemu (both big endian and little endian), but a 2.6.24 mips kernel
> segfaults initializing the ne2k driver (again when run under qemu).
>
> I've traced it to this commit:
>
>   http://kernel.org/hg/linux-2.6/rev/74258
>
> Version 74257 works, 74258 does not.

For the git users in the audience, that's commit 
30e748a507919a41f9eb4d10b4954f453325a37d

I just confirmed that:

A) today's -git still exhibits this bug.

B) Backing out the above irq-twiddling patch makes it stop exhibiting the bug 
for me.  (Boots to a shell prompt, it can use the network, etc.)

So that patch is definitely the trigger for the panic, and reverting the patch 
is one way to fix it.  (And probably the best way to make it work for 
2.6.24.1.)

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.

[GIT PULL] ext4 update

2008-01-28 Thread Theodore Ts'o


Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git for_linus

This is the major set of updates meant for 2.6.24 from the ext4 team;
these patches have been baking in -mm for a while.  The two major
features included here are the multi-block allocator that has been in use
by Clusterfs for their Lustre filesystem, and the journal
checksumming feature.  There were also a huge number of clean ups and
various bug fixes.

Regards,

- Ted

Adrian Bunk (1):
  ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*)

Alex Tomas (2):
  ext4: Add new functions for searching extent tree
  ext4: Add multi block allocator for ext4

Aneesh Kumar K.V (23):
  ext4: Introduce ext4_lblk_t
  ext4: Introduce ext4_update_*_feature
  ext4:  Fix sparse warnings.
  ext4: Rename i_file_acl to i_file_acl_lo
  ext4: Rename i_dir_acl to i_size_high
  ext4: Add support for 48 bit inode i_blocks.
  ext4: Support large files
  ext2: Fix the max file size for ext2 file system.
  ext3: Fix the max file size for ext3 file system.
  ext4: Return after ext4_error in case of failures
  ext4: Change the default behaviour on error
  Add buffer head related helper functions
  ext4: add block bitmap validation
  ext4: Check for the correct error return from
  ext4: Make ext4_get_blocks_wrap take the truncate_mutex early.
  ext4: Convert truncate_mutex to read write semaphore.
  ext4: Take read lock during overwrite case.
  ext4: Add EXT4_IOC_MIGRATE ioctl
  ext4: Fix ext4_show_options to show the correct mount options.
  ext4: Add ext4_find_next_bit()
  ext4: Enable the multiblock allocator by default
  ext4: Check for return value from sb_set_blocksize
  ext4: Use the ext4_ext_actual_len() helper function

Avantika Mathur (2):
  ext4: add ext4_group_t, and change all group variables to this type.
  ext4: fixes block group number being set to a negative value

Chris Snook (1):
  jbd2: Remove printk from J_ASSERT to preserve registers during BUG

Coly Li (1):
  ext4: sync up block group descriptor with e2fsprogs.

Dmitry Monakhov (1):
  ext4: fix uniniatilized extent splitting error

Eric Sandeen (6):
  ext4 extents: remove unneeded casts
  ext4: different maxbytes functions for bitmap & extentfiles
  ext4: export iov_shorten from kernel for ext4's use
  ext4: store maxbytes for bitmapped  files and return EFBIG as appropriate
  ext4: fix oops on corrupted ext4 mount
  ext4: fix up EXT4FS_DEBUG builds

Girish Shilamkar (1):
  ext4: Add the journal checksum feature

Jan Kara (2):
  ext4: Avoid rec_len overflow with 64KB block size
  jbd2: Fix assertion failure in fs/jbd2/checkpoint.c

Jean Noel Cordenner (2):
  vfs: Add 64 bit i_version support
  ext4: Add inode version support in ext4

Johann Lombardi (1):
  jbd2: jbd2 stats through procfs

Mariusz Kozlowski (1):
  ext4: remove unused code from ext4_find_entry()

Miklos Szeredi (1):
  ext4: Add stripe= option to /proc/mounts

Mingming Cao (4):
  jbd2: add lockdep support
  jbd2: Mark jbd2 slabs as SLAB_TEMPORARY
  jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup
  jbd2: sparse pointer use of zero as null

Takashi Sato (1):
  ext4:  Support large blocksize up to PAGESIZE

 Documentation/filesystems/ext4.txt |   10 
 b/Documentation/filesystems/ext4.txt   |   10 
 b/Documentation/filesystems/proc.txt   |   39 
 b/fs/Kconfig   |1 
 b/fs/afs/dir.c |9 
 b/fs/afs/inode.c   |3 
 b/fs/buffer.c  |   44 
 b/fs/ext2/super.c  |   32 
 b/fs/ext3/super.c  |   32 
 b/fs/ext4/Makefile |2 
 b/fs/ext4/balloc.c |   67 
 b/fs/ext4/dir.c|2 
 b/fs/ext4/extents.c|   24 
 b/fs/ext4/file.c   |4 
 b/fs/ext4/group.h  |8 
 b/fs/ext4/ialloc.c |2 
 b/fs/ext4/inode.c  |   15 
 b/fs/ext4/ioctl.c  |3 
 b/fs/ext4/mballoc.c| 4552 +
 b/fs/ext4/migrate.c|  570 +++
 b/fs/ext4/namei.c  |4 
 b/fs/ext4/resize.c |   16 
 b/fs/ext4/super.c  |   15 
 b/fs/ext4/xattr.c  |4 
 b/fs/inode.c   |   17 
 b/fs/jbd2/checkpoint.c |   10 
 b/fs/jbd2/commit.c

[PATCH] dlm/user.c: static initialization improvements

2008-01-28 Thread Denis Cheng

also change name_prefix from char pointer to char array.

Signed-off-by: Denis Cheng <[EMAIL PROTECTED]>
---
 fs/dlm/user.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/dlm/user.c b/fs/dlm/user.c
index 4f74154..2cc5415 100644
--- a/fs/dlm/user.c
+++ b/fs/dlm/user.c
@@ -24,8 +24,7 @@
 #include "lvb_table.h"
 #include "user.h"
 
-static const char *name_prefix="dlm";
-static struct miscdevice ctl_device;
+static const char name_prefix[] = "dlm";
 static const struct file_operations device_fops;
 
 #ifdef CONFIG_COMPAT
@@ -896,14 +895,16 @@ static const struct file_operations ctl_device_fops = {
 	.owner   = THIS_MODULE,
 };
 
+static struct miscdevice ctl_device = {
+	.name  = "dlm-control",
+	.fops  = &ctl_device_fops,
+	.minor = MISC_DYNAMIC_MINOR,
+};
+
 int dlm_user_init(void)
 {
 	int error;
 
-	ctl_device.name = "dlm-control";
-	ctl_device.fops = &ctl_device_fops;
-	ctl_device.minor = MISC_DYNAMIC_MINOR;
-
 	error = misc_register(&ctl_device);
 	if (error)
 		log_print("misc_register failed for control device");
-- 
1.5.3.8
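
The two changes in this patch can be illustrated outside the kernel. A minimal sketch, where `misc_sketch` is a reduced stand-in for struct miscdevice and `SKETCH_DYNAMIC_MINOR` is a placeholder value, both invented for illustration:

```c
#include <stddef.h>

#define SKETCH_DYNAMIC_MINOR 255  /* placeholder value for illustration */

/* Array form: the bytes live in the object itself and sizeof is known at
 * compile time (3 characters plus the terminating NUL). */
static const char name_prefix[] = "dlm";

/* Stand-in for struct miscdevice, reduced to two fields. */
struct misc_sketch {
	int minor;
	const char *name;
};

/* Designated initializer: filled in at build time, no init code needed. */
static struct misc_sketch ctl_sketch = {
	.minor = SKETCH_DYNAMIC_MINOR,
	.name  = "dlm-control",
};
```

The array form avoids storing a separate pointer next to the string, and the static initializer removes the three assignments that dlm_user_init() used to perform at run time.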


Re: [GIT PATCH] driver core patches against 2.6.24

2008-01-28 Thread Rusty Russell

On Sunday 27 January 2008 17:42:28 Linus Torvalds wrote:
> My problem is that the *driver* already exists (because it's compiled in),
> and has already initialized itself, and has already registered.
>
> Then, initrd tries to load an old module for that driver.

I hate to say it, but this is user error.  And it used to be that for some 
drivers you'd actually end up with two in-kernel if you did that.

But if even *you* don't get this right, it should finally prompt us to fix 
this...

Thanks,
Rusty.

Re: [Patch] Removal of FUTEX_FD

2008-01-28 Thread Rusty Russell

On Friday 25 January 2008 20:40:46 Eric Sesterhenn wrote:
> hi,
>
> since FUTEX_FD was scheduled for removal in June 2007 lets remove it.
> Google Code search found no users for it and NGPT was abandoned in 2003
> according to IBM. futex.h is left untouched to make sure the id does
> not get reassigned. Since queue_me() has no users left it is commented
> out to avoid a warning, i didnt remove it completely since it is part
> of the internal api (matching unqueue_me())

Thanks, I've queued this with a further cleanup (rename __queue_me to queue_me, 
add comments, and remove the old one entirely).

Thanks,
Rusty.

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Michal Jaegermann

On Mon, Jan 28, 2008 at 08:31:57PM -0500, Gene Heskett wrote:
> 
> In my script, its one line:
> mkinitrd -f initrd-$VER.img $VER && \
> 
> where $VER is the shell variable I edit to = the version number, located at 
> the top of the script.
> 
> Unforch, its failing:
> No module pata_amd found for kernel 2.6.24, aborting.

mkinitrd is just a shell script.  Even if its options, and there are
quite a number of them, do not let you influence the choice of
modules the way you want, it is pretty trivial to make yourself a
custom version of it and hardwire a fixed list of modules
to use, instead of relying on the general mechanisms which try
hard to guess what you may need.

That way your regular 'mkinitrd' will build something to boot with
libata and 'mkinitrd.ide' will use IDE modules for that purpose, using
the same "core" kernel.

If you are using distribution kernels, as opposed to your own
configuration, it is quite likely that you will need to install the
'kernel-devel' package, then recompile and add the required IDE modules
yourself, as those may not be provided.  This is done the same way
as for any other "external" module.

   Michal

Re: [PATCH] x86: Reduce ifdef sections in fault.c

2008-01-28 Thread Jeremy Fitzhardinge


Harvey Harrison wrote:
> Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>

x86.git has my patch which makes the pgd_list the same for 32-bit and 
64-bit, which means the code which traverses that list can be common now.

   J

---
 arch/x86/mm/fault.c |   31 +--
 1 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e28cc52..2737493 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -508,6 +508,10 @@ static int vmalloc_fault(unsigned long address)
pmd_t *pmd, *pmd_ref;
pte_t *pte, *pte_ref;
 
+	/* Make sure we are in vmalloc area */
+   if (!(address >= VMALLOC_START && address < VMALLOC_END))
+   return -1;
+
/* Copy kernel mappings over when needed. This can also
   happen within a race in page table update. In the later
   case just flush. */
@@ -603,6 +607,9 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 */
 #ifdef CONFIG_X86_32
if (unlikely(address >= TASK_SIZE)) {
+#else
+   if (unlikely(address >= TASK_SIZE64)) {
+#endif
if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
vmalloc_fault(address) >= 0)
return;
@@ -618,6 +625,8 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
goto bad_area_nosemaphore;
}
 
+
+#ifdef CONFIG_X86_32
/* It's safe to allow irq's after cr2 has been saved and the vmalloc
   fault has been handled. */
if (regs->flags & (X86_EFLAGS_IF|VM_MASK))
@@ -630,28 +639,6 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
if (in_atomic() || !mm)
goto bad_area_nosemaphore;
 #else /* CONFIG_X86_64 */
-   if (unlikely(address >= TASK_SIZE64)) {
-   /*
-* Don't check for the module range here: its PML4
-* is always initialized because it's shared with the main
-* kernel text. Only vmalloc may need PML4 syncups.
-*/
-   if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
- ((address >= VMALLOC_START && address < VMALLOC_END))) {
-   if (vmalloc_fault(address) >= 0)
-   return;
-   }
-
-   /* Can handle a stale RO->RW TLB */
-   if (spurious_fault(address, error_code))
-   return;
-
-   /*
-* Don't take the mm semaphore here. If we fixup a prefetch
-* fault we could otherwise deadlock.
-*/
-   goto bad_area_nosemaphore;
-   }
if (likely(regs->flags & X86_EFLAGS_IF))
local_irq_enable();
 
  



Re: [PATCH 06/19] dlm: align midcomms message buffer

2008-01-28 Thread Fabio M. Di Nitto


On Sat, 26 Jan 2008, Andrew Morton wrote:
> On Thu, 24 Jan 2008 10:50:29 -0600 David Teigland <[EMAIL PROTECTED]> wrote:
> > From: Fabio M. Di Nitto <[EMAIL PROTECTED]>
> >
> > gcc does not guarantee that a static buffer is 64bit aligned. This change
> > allows sparc64 to work.
>
> This buffer is not static: changelog needs fixing: s/static/auto/
>
> > ---
> >  fs/dlm/midcomms.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
> > index f8c69dd..da653b5 100644
> > --- a/fs/dlm/midcomms.c
> > +++ b/fs/dlm/midcomms.c
> > @@ -58,7 +58,7 @@ static void copy_from_cb(void *dst, const void *base, unsigned offset,
> >  int dlm_process_incoming_buffer(int nodeid, const void *base,
> > unsigned offset, unsigned len, unsigned limit)
> >  {
> > -   unsigned char __tmp[DLM_INBUF_LEN];
> > +   unsigned char __tmp[DLM_INBUF_LEN] __attribute__((aligned(64)));
> > struct dlm_header *msg = (struct dlm_header *) __tmp;
> > int ret = 0;
> > int err = 0;
>
> Why does DLM require that this thing be 64-bit aligned?
>
> It all looks rather ugly.  Can't this stuff be implemented within the C type
> system somehow?

how about this one:


From adfe3b0654583d34b0840d20a69e4306d5b98caf Mon Sep 17 00:00:00 2001

Message-Id: <[EMAIL PROTECTED]>
From: Fabio M. Di Nitto <[EMAIL PROTECTED]>
Date: Tue, 29 Jan 2008 06:35:20 +0100
Subject: [PATCH 1/1] dlm: align midcomms message buffer

gcc does not guarantee that an auto buffer is 64bit aligned.
This change allows sparc64 to work.

Signed-off-by: Fabio M. Di Nitto <[EMAIL PROTECTED]>
---
:100644 100644 f8c69dd... 53b7af2... M  fs/dlm/midcomms.c
 fs/dlm/midcomms.c |   13 -
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/dlm/midcomms.c b/fs/dlm/midcomms.c
index f8c69dd..53b7af2 100644
--- a/fs/dlm/midcomms.c
+++ b/fs/dlm/midcomms.c
@@ -58,8 +58,11 @@ static void copy_from_cb(void *dst, const void *base, unsigned offset,
 int dlm_process_incoming_buffer(int nodeid, const void *base,
unsigned offset, unsigned len, unsigned limit)
 {
-   unsigned char __tmp[DLM_INBUF_LEN];
-   struct dlm_header *msg = (struct dlm_header *) __tmp;
+   union {
+   unsigned char __buf[DLM_INBUF_LEN];
+   struct dlm_header dlm;
+   } __tmp;
+   struct dlm_header *msg = &__tmp.dlm;
int ret = 0;
int err = 0;
uint16_t msglen;
@@ -100,8 +103,8 @@ int dlm_process_incoming_buffer(int nodeid, const void *base,
   in the buffer on the stack (which should work for most
   ordinary messages). */

-   if (msglen > sizeof(__tmp) &&
-   msg == (struct dlm_header *) __tmp) {
+   if (msglen > DLM_INBUF_LEN &&
+   msg == &__tmp.dlm) {
msg = kmalloc(dlm_config.ci_buffer_size, GFP_KERNEL);
if (msg == NULL)
return ret;
@@ -119,7 +122,7 @@ int dlm_process_incoming_buffer(int nodeid, const void *base,
dlm_receive_buffer(msg, nodeid);
}

-   if (msg != (struct dlm_header *) __tmp)
+   if (msg != &__tmp.dlm)
kfree(msg);

return err ? err : ret;
--
1.5.3.8


--
I'm going to make him an offer he can't refuse.
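
To illustrate why the union version answers the "within the C type system" question, here is a small userspace sketch. It assumes a hypothetical struct demo_header with a 64-bit member in place of struct dlm_header, and an arbitrary DEMO_INBUF_LEN; neither is taken from the dlm sources. A bare unsigned char array only has byte alignment, while a union takes the alignment of its strictest member.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_INBUF_LEN 148	/* assumed size, not taken from the patch */

/* Hypothetical header with a 64-bit member, standing in for struct dlm_header. */
struct demo_header {
	uint32_t version;
	uint32_t lockspace;
	uint64_t sequence;
};

/* The union is aligned for its strictest member, so buf can be viewed as hdr. */
union demo_inbuf {
	unsigned char buf[DEMO_INBUF_LEN];
	struct demo_header hdr;
};

/* Alignment the compiler guarantees for a plain byte buffer. */
size_t plain_buffer_alignment(void)
{
	return alignof(unsigned char[DEMO_INBUF_LEN]);
}

/* Alignment the header type itself requires. */
size_t header_alignment(void)
{
	return alignof(struct demo_header);
}

/* Alignment the compiler guarantees for the union. */
size_t union_buffer_alignment(void)
{
	return alignof(union demo_inbuf);
}

/* Check an actual stack instance, as in dlm_process_incoming_buffer(). */
int union_buffer_is_aligned(void)
{
	union demo_inbuf tmp;

	return ((uintptr_t)&tmp % alignof(struct demo_header)) == 0;
}
```

With the union, no compiler-specific attribute is needed: the header member forces the alignment, and the byte array member still provides the full buffer size.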

Re: High wake up latencies with FAIR_USER_SCHED

2008-01-28 Thread Srivatsa Vaddagiri

On Mon, Jan 28, 2008 at 09:13:53PM +0100, Guillaume Chazarain wrote:
> Unfortunately it seems to not be completely fixed, with this script:

The maximum scheduling latency of a task with group scheduler is:

Lmax = latency to schedule group entity at level0 + 
   latency to schedule group entity at level1 +
   ...
   latency to schedule task entity at last level

The more hierarchical levels there are, the higher the latency can get. This is
particularly so because vruntime (and not wall-clock time) is used as the basis
for preempting entities.  The latency at each level also depends on the number
of entities at that level and on the sysctl_sched_latency/sched_nr_latency
settings.

In this case, we only have two levels - userid + task. So the max scheduling 
latency is:

  Lmax = latency to schedule uid1 group entity (L0) +
 latency to schedule the sleeper task within uid1 group (L1)

In the first script that you had, uid1 had only one sleeper task, while uid2 has
two cpu-hogs. This means L1 is always zero for the sleeper task. L0 is also 
substantially reduced with the patch I sent (giving sleep credit for group 
level entities). Thus we were able to get low scheduling latencies in the case 
of first script.

The second script you have sent is generating two tasks (sleeper + hog) under 
uid 1 and one cpuhog task under uid 2. Consequently the group-entity 
corresponding to uid 1 is always active and hence there is no question of giving
credit to it for sleeping.

As a result, we should expect worst-case latencies of up to [2 * 10 = 20ms] in 
this case. The results you have fall within this range.

In case of !FAIR_USER_SCHED, the sleeper task always gets sleep-credits
and hence its latency is drastically reduced.

IMHO these are the expected results, and if someone really needs to cut down
this latency, they can reduce sysctl_sched_latency (which will be bad
from perf standpoint, as we will cause more cache thrashing with that).
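
The arithmetic above can be written down as a short sketch. The function names are made up for illustration, and the 10 ms per-level figure is only inferred from the "2 * 10 = 20ms" estimate in this message; the real per-level latency depends on the sysctl settings and the number of entities at each level.

```c
#include <stddef.h>

/*
 * Worst-case wakeup latency as the sum of per-level latencies, following
 * the Lmax formula in the message. Values are in milliseconds.
 */
unsigned int max_sched_latency_ms(const unsigned int *level_latency_ms,
				  size_t nr_levels)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < nr_levels; i++)
		total += level_latency_ms[i];
	return total;
}

/* Second script: the uid level and the task level each cost up to 10 ms. */
unsigned int second_script_bound_ms(void)
{
	const unsigned int levels[] = { 10, 10 };

	return max_sched_latency_ms(levels, sizeof(levels) / sizeof(levels[0]));
}

/* First script: one sleeper alone in its group, so the task level costs 0. */
unsigned int first_script_bound_ms(unsigned int group_level_ms)
{
	const unsigned int levels[] = { group_level_ms, 0 };

	return max_sched_latency_ms(levels, 2);
}
```

Each extra level of hierarchy adds one more term to the sum, which is why the bound grows with the depth of the group tree.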

> #!/usr/bin/python
> 
> import os
> import time
> 
> SLEEP_TIME = 0.1
> SAMPLES = 5
> PRINT_DELAY = 0.5
> 
> def print_wakeup_latency():
> times = []
> last_print = 0
> while True:
> start = time.time()
> time.sleep(SLEEP_TIME)
> end = time.time()
> times.insert(0, end - start - SLEEP_TIME)
> del times[SAMPLES:]
> if end > last_print + PRINT_DELAY:
> copy = times[:]
> copy.sort()
> print '%f ms' % (copy[len(copy)/2] * 1000)
> last_print = end
> 
> if os.fork() == 0:
> if os.fork() == 0:
> os.setuid(1)
> while True:
> pass
> else:
> os.setuid(2)
> while True:
> pass
> else:
> os.setuid(1)
> print_wakeup_latency()
> 
> I get seemingly unpredictable latencies (with or without the patch applied):
> 
> # ./sched.py
> 14.810944 ms
> 19.829893 ms
> 1.968050 ms
> 8.021021 ms
> -0.017977 ms
> 4.926109 ms
> 11.958027 ms
> 5.995893 ms
> 1.992130 ms
> 0.007057 ms
> 0.217819 ms
> -0.004864 ms
> 5.907202 ms
> 6.547832 ms
> -0.012970 ms
> 0.209951 ms
> -0.002003 ms
> 4.989052 ms
> 
> Without FAIR_USER_SCHED, latencies are consistently in the noise.
> Also, I forgot to mention that I'm on a single CPU.
> 
> Thanks for the help.
> 
> -- 
> Guillaume

-- 
Regards,
vatsa

Re: [PATCH] Atari floppy: Rename disk_type to atari_disk_type

2008-01-28 Thread Greg KH

On Sun, Jan 27, 2008 at 02:28:35PM +0100, Geert Uytterhoeven wrote:
> Atari floppy: Rename disk_type to atari_disk_type
> 
> Commit edfaa7c36574f1bf09c65ad602412db9da5f96bf
> 
> Driver core: convert block from raw kobjects to core devices
> 
> This moves the block devices to /sys/class/block. It will create a
> flat list of all block devices, with the disks and partitions in one
> directory. For compatibility /sys/block is created and contains symlinks
> to the disks.
> 
> introduced a global disk_type variable in <linux/genhd.h>, causing the
> following compile error on Atari:
> 
> drivers/block/ataflop.c:93: error: conflicting types for 'disk_type'
> include/linux/genhd.h:21: error: previous declaration of 'disk_type' was 
> here
> 
> Rename the local disk_type variable in drivers/block/ataflop.c to
> atari_disk_type, to avoid the conflict.
> 
> Signed-off-by: Geert Uytterhoeven <[EMAIL PROTECTED]>
> Cc: Kay Sievers <[EMAIL PROTECTED]>
> Cc: Greg Kroah-Hartman <[EMAIL PROTECTED]>

Acked-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>


Re: [git pull] Fix recent Ocfs2 breakage

2008-01-28 Thread Greg KH

On Mon, Jan 28, 2008 at 07:33:07PM -0800, Mark Fasheh wrote:
> Greg's commit c60b71787982cefcf9fa09aa281fa8c4c685d557 inadvertently broke
> Ocfs2 userspace ABI, so I have a rather high priority single line patch from
> Joel to fix things up for you to pull. A copy of the patch is attached to
> the bottom of this e-mail. Embarrassingly enough, I missed this while acking
> the patch late last week :(
> 
> Please pull from 'upstream-linus' branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-linus
> 
> to receive the following updates:
> 
>  fs/ocfs2/cluster/sys.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> Joel Becker (1):
>   ocfs2: Fix userspace ABI breakage in sysfs

This is fine with me, for now.

> From: Joel Becker <[EMAIL PROTECTED]>
> 
> ocfs2: Fix userspace ABI breakage in sysfs
> 
> The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by
> commit c60b71787982cefcf9fa09aa281fa8c4c685d557 "kset: convert ocfs2 to
> use kset_create".  Specifically, the '/sys/o2cb' kset was moved to
> '/sys/fs/o2cb'.  This breaks all ocfs2 tools and renders the
> filesystem unmountable.
> 
> This fix moves '/sys/o2cb' back where it belongs.

"belongs" is pretty odd here.  This is a filesystem specific thing,
right?  Why not put it in /sys/fs/ then?

And yes, I understand about legacy userspace tools, that's why I have no
objection to it going back.  But you can put it in both places (with a
symlink) and change your userspace code, and in a year or so, drop the
symlink, right?

And please please please please document stuff like this, and all of the
different files you have in this subdirectory, in Documentation/ABI/ so
those of us who are trying to figure out the code (and there are still
parts of the kobject usage that I'm pretty sure are not correct) can have a
chance to understand exactly how this stuff is being used and expected
to work.

thanks,

greg k-h

[PATCH] [8/9] GBPAGES: Implement gbpages support in change_page_attr()

2008-01-28 Thread Andi Kleen


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/pageattr.c |   13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/pageattr.c
===
--- linux.orig/arch/x86/mm/pageattr.c
+++ linux/arch/x86/mm/pageattr.c
@@ -203,6 +203,7 @@ static int split_large_page(pte_t *kpte,
pte_t *pbase, *tmp;
struct page *base;
int i, level;
+   unsigned long ps;
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
gfp_flags = GFP_ATOMIC;
@@ -224,12 +225,22 @@ static int split_large_page(pte_t *kpte,
 
address = __pa(address);
addr = address & PMD_PAGE_MASK;
+
+   ps = PAGE_SIZE;
+#ifdef CONFIG_X86_64
+   if (level == PG_LEVEL_1G) {
+   ps = PMD_PAGE_SIZE;
+   pgprot_val(ref_prot) |= _PAGE_PSE;
+   addr &= PUD_PAGE_MASK;
+   }
+#endif
+
pbase = (pte_t *)page_address(base);
 #ifdef CONFIG_X86_32
paravirt_alloc_pt(&init_mm, page_to_pfn(base));
 #endif
 
-   for (i = 0; i < PTRS_PER_PTE; i++, addr += PAGE_SIZE)
+   for (i = 0; i < PTRS_PER_PTE; i++, addr += ps)
set_pte(&pbase[i], pfn_pte(addr >> PAGE_SHIFT, ref_prot));
 
/*
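
The address arithmetic in this patch can be sketched in userspace as follows. This is only an illustration under assumed x86-64 constants (4 KB, 2 MB and 1 GB page sizes, 512 entries per table); the DEMO_* names and both functions are hypothetical and not part of pageattr.c. Splitting a 2 MB page yields 512 entries that step by 4 KB, while splitting a 1 GB page yields 512 entries that step by 2 MB.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* 4 KB base pages */
#define DEMO_PMD_SHIFT  21	/* 2 MB pages */
#define DEMO_PUD_SHIFT  30	/* 1 GB pages */
#define DEMO_PTRS       512	/* entries per x86-64 page table */

enum demo_level { DEMO_LEVEL_2M, DEMO_LEVEL_1G };

/* Physical address covered by child entry idx after splitting a large page. */
uint64_t split_child_addr(uint64_t paddr, enum demo_level level,
			  unsigned int idx)
{
	unsigned int parent_shift =
		level == DEMO_LEVEL_1G ? DEMO_PUD_SHIFT : DEMO_PMD_SHIFT;
	unsigned int child_shift =
		level == DEMO_LEVEL_1G ? DEMO_PMD_SHIFT : DEMO_PAGE_SHIFT;
	/* Round down to the start of the large page being split. */
	uint64_t base = paddr & ~((UINT64_C(1) << parent_shift) - 1);

	return base + ((uint64_t)idx << child_shift);
}

/* The children must tile the parent exactly: 512 * child size == parent size. */
int split_covers_parent(enum demo_level level)
{
	uint64_t first = split_child_addr(0, level, 0);
	uint64_t past_last = split_child_addr(0, level, DEMO_PTRS);
	uint64_t parent = UINT64_C(1) <<
		(level == DEMO_LEVEL_1G ? DEMO_PUD_SHIFT : DEMO_PMD_SHIFT);

	return past_last - first == parent;
}
```

The same loop bound works for both levels because each x86-64 table level has 512 entries; only the stride and the mask change.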

[PATCH] [9/9] GBPAGES: Do kernel direct mapping at boot using GB pages

2008-01-28 Thread Andi Kleen


This should decrease TLB pressure because the kernel will take
fewer TLB misses for its own data accesses.

Only done for 64bit because i386 does not support GB page tables.

This only applies to the data portion of the direct mapping; the
kernel text mapping stays with 2MB pages because the AMD Fam10h
microarchitecture does not support GB ITLBs and AMD recommends 
against using GB mappings for code.

Can be disabled with direct_gbpages=off

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/init_64.c |   64 ++
 1 file changed, 55 insertions(+), 9 deletions(-)

Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -279,13 +279,20 @@ __meminit void early_iounmap(void *addr,
__flush_tlb_all();
 }
 
+static unsigned long direct_entry(unsigned long paddr)
+{
+   unsigned long entry;
+   entry = __PAGE_KERNEL_LARGE|paddr;
+   entry &= __supported_pte_mask;
+   return entry;
+}
+
 static void __meminit
 phys_pmd_init(pmd_t *pmd_page, unsigned long address, unsigned long end)
 {
int i = pmd_index(address);
 
for (; i < PTRS_PER_PMD; i++, address += PMD_SIZE) {
-   unsigned long entry;
pmd_t *pmd = pmd_page + pmd_index(address);
 
if (address >= end) {
@@ -299,9 +306,7 @@ phys_pmd_init(pmd_t *pmd_page, unsigned 
if (pmd_val(*pmd))
continue;
 
-   entry = __PAGE_KERNEL_LARGE|_PAGE_GLOBAL|address;
-   entry &= __supported_pte_mask;
-   set_pmd(pmd, __pmd(entry));
+   set_pmd(pmd, __pmd(direct_entry(address)));
}
 }
 
@@ -335,7 +340,13 @@ phys_pud_init(pud_t *pud_page, unsigned 
}
 
if (pud_val(*pud)) {
-   phys_pmd_update(pud, addr, end);
+   if (!pud_large(*pud))
+   phys_pmd_update(pud, addr, end);
+   continue;
+   }
+
+   if (direct_gbpages > 0) {
+   set_pud(pud, __pud(direct_entry(addr)));
continue;
}
 
@@ -356,9 +367,11 @@ static void __init find_early_table_spac
unsigned long puds, pmds, tables, start;
 
puds = (end + PUD_SIZE - 1) >> PUD_SHIFT;
-   pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
-   tables = round_up(puds * sizeof(pud_t), PAGE_SIZE) +
-round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+   tables = round_up(puds * sizeof(pud_t), PAGE_SIZE);
+   if (!direct_gbpages) {
+   pmds = (end + PMD_SIZE - 1) >> PMD_SHIFT;
+   tables += round_up(pmds * sizeof(pmd_t), PAGE_SIZE);
+   }
 
/*
 * RED-PEN putting page tables only on node 0 could
@@ -378,6 +391,20 @@ static void __init find_early_table_spac
(table_start << PAGE_SHIFT) + tables);
 }
 
+static void init_gbpages(void)
+{
+#ifdef CONFIG_DEBUG_PAGEALLOC
+   /* debug pagealloc causes too much recursion with gbpages */
+   if (direct_gbpages == 0)
+   return;
+#endif
+   if (direct_gbpages >= 0 && cpu_has_gbpages) {
+   printk(KERN_INFO "Using GB pages for direct mapping\n");
+   direct_gbpages = 1;
+   } else
+   direct_gbpages = 0;
+}
+
 /*
  * Setup the direct mapping of the physical memory at PAGE_OFFSET.
  * This runs before bootmem is initialized and gets pages directly from
@@ -396,8 +423,10 @@ void __init_refok init_memory_mapping(un
 * memory mapped. Unfortunately this is done currently before the
 * nodes are discovered.
 */
-   if (!after_bootmem)
+   if (!after_bootmem) {
+   init_gbpages();
find_early_table_space(end);
+   }
 
start = (unsigned long)__va(start);
end = (unsigned long)__va(end);
@@ -444,6 +473,21 @@ void __init paging_init(void)
 }
 #endif
 
+static void split_gb_page(pud_t *pud, unsigned long paddr)
+{
+   int i;
+   pmd_t *pmd;
+   struct page *p = alloc_page(GFP_KERNEL);
+   if (!p)
+   return;
+
+   paddr &= PUD_PAGE_MASK;
+   pmd = page_address(p);
+   for (i = 0; i < PTRS_PER_PTE; i++, paddr += PMD_PAGE_SIZE)
+   pmd[i] = __pmd(direct_entry(paddr));
+   pud_populate(NULL, pud, pmd);
+}
+
 /*
  * Unmap a kernel mapping if it exists. This is useful to avoid
  * prefetches from the CPU leading to inconsistent cache lines.
@@ -467,6 +511,8 @@ __clear_kernel_mapping(unsigned long add
continue;
 
pud = pud_offset(pgd, address);
+   if (pud_large(*pud))
+   split_gb_page(pud, __pa(address));
if (pud_none(*pud))
continue;
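
As a rough illustration of the find_early_table_space() change, the sketch below redoes the sizing arithmetic in userspace. It assumes x86-64 values (8-byte table entries, 4 KB pages, PMD_SHIFT 21, PUD_SHIFT 30); the DEMO_* names and early_table_bytes() are hypothetical and only mirror the calculation shown in the hunk.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE  UINT64_C(4096)
#define DEMO_PMD_SHIFT  21
#define DEMO_PUD_SHIFT  30
#define DEMO_ENTRY_SIZE UINT64_C(8)	/* assumed size of a pud_t or pmd_t */

static uint64_t demo_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* Bytes of early page tables needed to direct-map [0, end). */
uint64_t early_table_bytes(uint64_t end, int use_gbpages)
{
	uint64_t puds = (end + (UINT64_C(1) << DEMO_PUD_SHIFT) - 1)
			>> DEMO_PUD_SHIFT;
	uint64_t tables = demo_round_up(puds * DEMO_ENTRY_SIZE, DEMO_PAGE_SIZE);

	/* With GB pages the PUD entries map memory directly: no PMD tables. */
	if (!use_gbpages) {
		uint64_t pmds = (end + (UINT64_C(1) << DEMO_PMD_SHIFT) - 1)
				>> DEMO_PMD_SHIFT;

		tables += demo_round_up(pmds * DEMO_ENTRY_SIZE, DEMO_PAGE_SIZE);
	}
	return tables;
}
```

For a 4 GB mapping this gives 20480 bytes with 2 MB pages (one page of PUD entries plus four pages of PMD entries) and 4096 bytes with GB pages.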
 

[PATCH] [7/9] Add an option to disable direct mapping gbpages and a global variable

2008-01-28 Thread Andi Kleen


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 Documentation/x86_64/boot-options.txt |3 +++
 arch/x86/mm/init_64.c |   12 
 include/asm-x86/pgtable_64.h  |2 ++
 3 files changed, 17 insertions(+)

Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -53,6 +53,18 @@ static unsigned long dma_reserve __initd
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
+int direct_gbpages;
+
+static int __init parse_direct_gbpages(char *arg)
+{
+   if (!strcmp(arg, "off")) {
+   direct_gbpages = -1;
+   return 0;
+   }
+   return -1;
+}
+early_param("direct_gbpages", parse_direct_gbpages);
+
 /*
  * NOTE: pagetable_init alloc all the fixmap pagetables contiguous on the
  * physical space so we can cache the place of the first one and move
Index: linux/include/asm-x86/pgtable_64.h
===
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -239,6 +239,8 @@ static inline int pud_large(pud_t pte)
 
 #define update_mmu_cache(vma,address,pte) do { } while (0)
 
+extern int direct_gbpages;
+
 /* Encode and de-code a swap entry */
 #define __swp_type(x)  (((x).val >> 1) & 0x3f)
 #define __swp_offset(x)((x).val >> 8)
Index: linux/Documentation/x86_64/boot-options.txt
===
--- linux.orig/Documentation/x86_64/boot-options.txt
+++ linux/Documentation/x86_64/boot-options.txt
@@ -307,3 +307,6 @@ Debugging
stuck (default)
 
 Miscellaneous
+
+   direct_gbpages=off
+   Do not use GB pages for kernel direct mapping.
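
The direct_gbpages variable ends up being a three-state flag once this patch is combined with init_gbpages() from patch 9/9. The userspace sketch below models that behaviour; the demo_* names are hypothetical, and the cpu_has_gbpages check is reduced to a plain parameter.

```c
#include <string.h>

/* -1: disabled on the command line, 0: undecided or unused, 1: in use. */
static int demo_direct_gbpages;

/* Mirrors the parser in the patch: only "off" is accepted. */
int demo_parse_direct_gbpages(const char *arg)
{
	if (!strcmp(arg, "off")) {
		demo_direct_gbpages = -1;
		return 0;
	}
	return -1;
}

/* Resolve the final setting once CPU support is known. */
int demo_resolve_gbpages(int cpu_has_gbpages)
{
	if (demo_direct_gbpages >= 0 && cpu_has_gbpages)
		demo_direct_gbpages = 1;
	else
		demo_direct_gbpages = 0;
	return demo_direct_gbpages;
}

/* Return to the boot-time default so the model can be exercised repeatedly. */
void demo_reset(void)
{
	demo_direct_gbpages = 0;
}
```

Using -1 for "explicitly off" lets the later code tell a user request apart from the zero-initialized default, which a plain boolean could not do.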

[PATCH] [6/9] GBPAGES: Add gbpages support to lookup_address

2008-01-28 Thread Andi Kleen


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/pageattr.c|5 +
 include/asm-x86/pgtable.h |1 +
 2 files changed, 6 insertions(+)

Index: linux/arch/x86/mm/pageattr.c
===
--- linux.orig/arch/x86/mm/pageattr.c
+++ linux/arch/x86/mm/pageattr.c
@@ -152,9 +152,14 @@ pte_t *lookup_address(unsigned long addr
 
if (pgd_none(*pgd))
return NULL;
+
+   *level = PG_LEVEL_1G;
pud = pud_offset(pgd, address);
if (pud_none(*pud))
return NULL;
+   if (pud_large(*pud))
+   return (pte_t *)pud;
+
pmd = pmd_offset(pud, address);
if (pmd_none(*pmd))
return NULL;
Index: linux/include/asm-x86/pgtable.h
===
--- linux.orig/include/asm-x86/pgtable.h
+++ linux/include/asm-x86/pgtable.h
@@ -242,6 +242,7 @@ enum {
PG_LEVEL_NONE,
PG_LEVEL_4K,
PG_LEVEL_2M,
+   PG_LEVEL_1G,
 };
 
 /*

[PATCH] [4/9] Add pgtable accessor functions for GB pages

2008-01-28 Thread Andi Kleen


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-x86/pgtable_32.h |2 ++
 include/asm-x86/pgtable_64.h |6 ++
 2 files changed, 8 insertions(+)

Index: linux/include/asm-x86/pgtable_64.h
===
--- linux.orig/include/asm-x86/pgtable_64.h
+++ linux/include/asm-x86/pgtable_64.h
@@ -199,6 +199,12 @@ static inline unsigned long pmd_bad(pmd_
 #define pud_offset(pgd, address) ((pud_t *) pgd_page_vaddr(*(pgd)) + pud_index(address))
 #define pud_present(pud) (pud_val(pud) & _PAGE_PRESENT)
 
+static inline int pud_large(pud_t pte)
+{
+   return (pud_val(pte) & (_PAGE_PSE|_PAGE_PRESENT)) ==
+   (_PAGE_PSE|_PAGE_PRESENT);
+}
+
 /* PMD  - Level 2 access */
 #define pmd_page_vaddr(pmd) ((unsigned long) __va(pmd_val(pmd) & PTE_MASK))
 #define pmd_page(pmd)  (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
Index: linux/include/asm-x86/pgtable_32.h
===
--- linux.orig/include/asm-x86/pgtable_32.h
+++ linux/include/asm-x86/pgtable_32.h
@@ -148,6 +148,8 @@ static inline void clone_pgd_range(pgd_t
  */
 #define pgd_offset_k(address) pgd_offset(&init_mm, address)
 
+static inline int pud_large(pud_t pud) { return 0; }
+
 /*
  * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
  *
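
The 64-bit pud_large() test requires both flag bits at once, which the following userspace sketch demonstrates. The DEMO_* constants assume the usual x86 encoding (present is bit 0, PSE is bit 7), and demo_pud_large() is a hypothetical stand-in operating on a raw 64-bit entry value.

```c
#include <stdint.h>

#define DEMO_PAGE_PRESENT UINT64_C(0x001)	/* bit 0 on x86 */
#define DEMO_PAGE_PSE     UINT64_C(0x080)	/* bit 7 on x86 */

/* An entry maps a large page only if it is present AND has PSE set. */
int demo_pud_large(uint64_t pud_val)
{
	const uint64_t mask = DEMO_PAGE_PSE | DEMO_PAGE_PRESENT;

	return (pud_val & mask) == mask;
}
```

Comparing against the full mask, rather than testing for any set bit, keeps a non-present entry that happens to have bit 7 set from being treated as a GB page.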

[PATCH] [5/9] GBPAGES: Support gbpages in pagetable dump

2008-01-28 Thread Andi Kleen


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/fault.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/fault.c
===
--- linux.orig/arch/x86/mm/fault.c
+++ linux/arch/x86/mm/fault.c
@@ -240,7 +240,8 @@ void dump_pagetable(unsigned long addres
pud = pud_offset(pgd, address);
if (bad_address(pud)) goto bad;
printk("PUD %lx ", pud_val(*pud));
-   if (!pud_present(*pud)) goto ret;
+   if (!pud_present(*pud) || pud_large(*pud))
+   goto ret;
 
pmd = pmd_offset(pud, address);
if (bad_address(pmd)) goto bad;

Re: [PATCH] [SPARC/SPARC64] fix usage of .section .sched.text in assembler code

2008-01-28 Thread David Miller

From: Sam Ravnborg <[EMAIL PROTECTED]>
Date: Sat, 26 Jan 2008 23:54:39 +0100

> ld will generate a uniquely named section when the assembler does not
> use "ax" but gcc does. Add the missing annotation.
> 
> Signed-off-by: Sam Ravnborg <[EMAIL PROTECTED]>
> Cc: Ingo Molnar <[EMAIL PROTECTED]>

Applied, thanks Sam.

[PATCH] [3/9] GBPAGES: Split LARGE_PAGE_SIZE/MASK into PUD_PAGE_SIZE/PMD_PAGE_SIZE

2008-01-28 Thread Andi Kleen


Split the existing LARGE_PAGE_SIZE/MASK macro into two new macros
PUD_PAGE_SIZE/MASK and PMD_PAGE_SIZE/MASK. 

Fix up all callers to use the new names.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/boot/compressed/head_64.S |8 
 arch/x86/kernel/head_64.S  |4 ++--
 arch/x86/kernel/pci-gart_64.c  |2 +-
 arch/x86/mm/init_64.c  |6 +++---
 arch/x86/mm/pageattr.c |2 +-
 include/asm-x86/page.h |4 ++--
 include/asm-x86/page_32.h  |4 
 include/asm-x86/page_64.h  |3 +++
 8 files changed, 20 insertions(+), 13 deletions(-)

Index: linux/include/asm-x86/page_64.h
===
--- linux.orig/include/asm-x86/page_64.h
+++ linux/include/asm-x86/page_64.h
@@ -23,6 +23,9 @@
 #define MCE_STACK 5
 #define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
 
+#define PUD_PAGE_SIZE (_AC(1, UL) << PUD_SHIFT)
+#define PUD_PAGE_MASK (~(PUD_PAGE_SIZE-1))
+
 #define __PAGE_OFFSET   _AC(0xffff810000000000, UL)
 
 #define __PHYSICAL_START   CONFIG_PHYSICAL_START
Index: linux/arch/x86/boot/compressed/head_64.S
===
--- linux.orig/arch/x86/boot/compressed/head_64.S
+++ linux/arch/x86/boot/compressed/head_64.S
@@ -80,8 +80,8 @@ startup_32:
 
 #ifdef CONFIG_RELOCATABLE
movl%ebp, %ebx
-   addl$(LARGE_PAGE_SIZE -1), %ebx
-   andl$LARGE_PAGE_MASK, %ebx
+   addl$(PMD_PAGE_SIZE -1), %ebx
+   andl$PMD_PAGE_MASK, %ebx
 #else
movl$CONFIG_PHYSICAL_START, %ebx
 #endif
@@ -220,8 +220,8 @@ ENTRY(startup_64)
/* Start with the delta to where the kernel will run at. */
 #ifdef CONFIG_RELOCATABLE
leaqstartup_32(%rip) /* - $startup_32 */, %rbp
-   addq$(LARGE_PAGE_SIZE - 1), %rbp
-   andq$LARGE_PAGE_MASK, %rbp
+   addq$(PMD_PAGE_SIZE - 1), %rbp
+   andq$PMD_PAGE_MASK, %rbp
movq%rbp, %rbx
 #else
movq$CONFIG_PHYSICAL_START, %rbp
Index: linux/arch/x86/kernel/pci-gart_64.c
===
--- linux.orig/arch/x86/kernel/pci-gart_64.c
+++ linux/arch/x86/kernel/pci-gart_64.c
@@ -501,7 +501,7 @@ static __init unsigned long check_iommu_
}
 
a = aper + iommu_size;
-   iommu_size -= round_up(a, LARGE_PAGE_SIZE) - a;
+   iommu_size -= round_up(a, PMD_PAGE_SIZE) - a;
 
if (iommu_size < 64*1024*1024) {
printk(KERN_WARNING
Index: linux/arch/x86/kernel/head_64.S
===
--- linux.orig/arch/x86/kernel/head_64.S
+++ linux/arch/x86/kernel/head_64.S
@@ -63,7 +63,7 @@ startup_64:
 
/* Is the address not 2M aligned? */
movq%rbp, %rax
-   andl$~LARGE_PAGE_MASK, %eax
+   andl$~PMD_PAGE_MASK, %eax
testl   %eax, %eax
jnz bad_address
 
@@ -88,7 +88,7 @@ startup_64:
 
/* Add an Identity mapping if I am above 1G */
leaq_text(%rip), %rdi
-   andq$LARGE_PAGE_MASK, %rdi
+   andq$PMD_PAGE_MASK, %rdi
 
movq%rdi, %rax
shrq$PUD_SHIFT, %rax
Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -443,10 +443,10 @@ __clear_kernel_mapping(unsigned long add
 {
unsigned long end = address + size;
 
-   BUG_ON(address & ~LARGE_PAGE_MASK);
-   BUG_ON(size & ~LARGE_PAGE_MASK);
+   BUG_ON(address & ~PMD_PAGE_MASK);
+   BUG_ON(size & ~PMD_PAGE_MASK);
 
-   for (; address < end; address += LARGE_PAGE_SIZE) {
+   for (; address < end; address += PMD_PAGE_SIZE) {
pgd_t *pgd = pgd_offset_k(address);
pud_t *pud;
pmd_t *pmd;
Index: linux/include/asm-x86/page_32.h
===
--- linux.orig/include/asm-x86/page_32.h
+++ linux/include/asm-x86/page_32.h
@@ -13,6 +13,10 @@
  */
 #define __PAGE_OFFSET  _AC(CONFIG_PAGE_OFFSET, UL)
 
+/* Eventually 32bit should be moved over to the new names too */
+#define LARGE_PAGE_SIZE PMD_PAGE_SIZE
+#define LARGE_PAGE_MASK PMD_PAGE_MASK
+
 #ifdef CONFIG_X86_PAE
 #define __PHYSICAL_MASK_SHIFT  36
 #define __VIRTUAL_MASK_SHIFT   32
Index: linux/include/asm-x86/page.h
===
--- linux.orig/include/asm-x86/page.h
+++ linux/include/asm-x86/page.h
@@ -13,8 +13,8 @@
 #define PHYSICAL_PAGE_MASK (PAGE_MASK & __PHYSICAL_MASK)
 #define PTE_MASK   (_AT(long, PHYSICAL_PAGE_MASK))
 
-#define LARGE_PAGE_SIZE(_AC(1,UL) << PMD_SHIFT)
-#define LARGE_PAGE_MASK(~(LARGE_PAGE_SIZE-1))
+#define PMD_PAGE_SIZE  (_AC(1, UL) << PMD_SHIFT)
+#define PMD_PAGE_MASK
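
For reference, the SIZE/MASK pairs introduced here are used in three idioms across the hunks above: round up, alignment check, and round down. The sketch below shows them in userspace, assuming the x86-64 shifts (21 for PMD, 30 for PUD); the DEMO_* macros and demo_* helpers are hypothetical names, not kernel API.

```c
#include <stdint.h>

#define DEMO_PMD_SHIFT 21
#define DEMO_PUD_SHIFT 30
#define DEMO_PMD_PAGE_SIZE (UINT64_C(1) << DEMO_PMD_SHIFT)
#define DEMO_PMD_PAGE_MASK (~(DEMO_PMD_PAGE_SIZE - 1))
#define DEMO_PUD_PAGE_SIZE (UINT64_C(1) << DEMO_PUD_SHIFT)
#define DEMO_PUD_PAGE_MASK (~(DEMO_PUD_PAGE_SIZE - 1))

/* Round up to the next 2 MB boundary, as the add/and pair in head_64.S does. */
uint64_t demo_align_up_2m(uint64_t addr)
{
	return (addr + DEMO_PMD_PAGE_SIZE - 1) & DEMO_PMD_PAGE_MASK;
}

/* Same test as BUG_ON(address & ~PMD_PAGE_MASK), with the sense inverted. */
int demo_is_2m_aligned(uint64_t addr)
{
	return (addr & ~DEMO_PMD_PAGE_MASK) == 0;
}

/* Round down to the start of the containing 1 GB page. */
uint64_t demo_align_down_1g(uint64_t addr)
{
	return addr & DEMO_PUD_PAGE_MASK;
}
```

Having separate PMD and PUD names makes it explicit which granularity a given call site means, which the single LARGE_PAGE name could not express once 1 GB pages exist.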

[PATCH] [2/9] GBPAGES: Add feature macros for the gbpages cpuid bit

2008-01-28 Thread Andi Kleen


Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 include/asm-x86/cpufeature.h |2 ++
 1 file changed, 2 insertions(+)

Index: linux/include/asm-x86/cpufeature.h
===
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -49,6 +49,7 @@
 #define X86_FEATURE_MP (1*32+19) /* MP Capable. */
 #define X86_FEATURE_NX (1*32+20) /* Execute Disable */
 #define X86_FEATURE_MMXEXT (1*32+22) /* AMD MMX extensions */
+#define X86_FEATURE_GBPAGES(1*32+26) /* GB pages */
 #define X86_FEATURE_RDTSCP (1*32+27) /* RDTSCP */
 #define X86_FEATURE_LM (1*32+29) /* Long Mode (x86-64) */
 #define X86_FEATURE_3DNOWEXT   (1*32+30) /* AMD 3DNow! extensions */
@@ -175,6 +176,7 @@
 #define cpu_has_pebs   boot_cpu_has(X86_FEATURE_PEBS)
 #define cpu_has_clflush   boot_cpu_has(X86_FEATURE_CLFLSH)
 #define cpu_has_bts   boot_cpu_has(X86_FEATURE_BTS)
+#define cpu_has_gbpages   boot_cpu_has(X86_FEATURE_GBPAGES)
 
 #if defined(CONFIG_X86_INVLPG) || defined(CONFIG_X86_64)
 # define cpu_has_invlpg 1
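
The feature numbers in this header encode a capability word and a bit within that word, so (1*32+26) means bit 26 of word 1 (the AMD-defined bits from CPUID 0x80000001, as far as I can tell). A minimal user-space sketch of that decoding follows; the caps[] array and feature_set() are hypothetical stand-ins for illustration and are not the kernel's boot_cpu_has() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature numbers encode (word * 32 + bit), as in the patch. */
#define X86_FEATURE_GBPAGES (1*32+26) /* GB pages */

/*
 * caps[] stands in for the kernel's capability words. This array and
 * feature_set() are hypothetical, for illustration only.
 */
static bool feature_set(const uint32_t *caps, int feature)
{
	assert(feature >= 0);
	return (caps[feature / 32] >> (feature % 32)) & 1u;
}
```

With word 1 set to 1u << 26 the lookup for X86_FEATURE_GBPAGES returns true; with that bit cleared it returns false.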

[PATCH] [0/9] Latest GBPAGES patchkit for 2.6.25

2008-01-28 Thread Andi Kleen


This patchkit implements support for the 1GB pages of AMD Fam10h CPUs 
in the kernel direct mapping. 

Changes since previous versions:
- Ported to latest change_page_attr
- kexec now works again
- Ported to latest git-x86
- Minor cleanups.

I believe this patchkit is ready for the 2.6.25 merge.

-Andi

[PATCH] [1/9] Handle kernel near memory hole in clear_kernel_mapping

2008-01-28 Thread Andi Kleen


This was a long standing obscure problem in the relocatable kernel. The
AMD GART driver needs to unmap part of the GART in the kernel direct mapping to 
prevent cache corruption. With the relocatable kernel it is in theory possible 
that the separate kernel text mapping straddles that area too. 

Normally it should not happen because GART tends to be >= 2GB, and the kernel 
is normally not loaded that high, but it is possible in theory. 

Teach clear_kernel_mapping() about this case.

This will become more important once the kernel mapping uses 1GB pages.

Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

---
 arch/x86/mm/init_64.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Index: linux/arch/x86/mm/init_64.c
===
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -438,7 +438,8 @@ void __init paging_init(void)
  * address and size must be aligned to 2MB boundaries.
  * Does nothing when the mapping doesn't exist.
  */
-void __init clear_kernel_mapping(unsigned long address, unsigned long size)
+static void __init
+__clear_kernel_mapping(unsigned long address, unsigned long size)
 {
unsigned long end = address + size;
 
@@ -475,6 +476,28 @@ void __init clear_kernel_mapping(unsigne
__flush_tlb_all();
 }
 
+#define overlaps(as, ae, bs, be) ((ae) >= (bs) && (as) <= (be))
+
+void __init clear_kernel_mapping(unsigned long address, unsigned long size)
+{
+   int sh = PMD_SHIFT;
+   unsigned long kernel = __pa(__START_KERNEL_map);
+
+   /*
+* Note that we cannot unmap the kernel itself because the unmapped
+* holes here are always at least 2MB aligned.
+* This just applies to the trailing areas of the 40MB kernel mapping.
+*/
+   if (overlaps(kernel >> sh, (kernel + KERNEL_TEXT_SIZE) >> sh,
+   __pa(address) >> sh, __pa(address + size) >> sh)) {
+   printk(KERN_WARNING
+   "Kernel mapping at %lx within 2MB of memory hole\n",
+   kernel);
+   __clear_kernel_mapping(__START_KERNEL_map+__pa(address), size);
+   }
+   __clear_kernel_mapping(address, size);
+}
+
 /*
  * Memory hotplug specific functions
  */
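
The range test used by clear_kernel_mapping() above can be exercised in user space. This is a sketch under assumptions, not the kernel code: PMD_SHIFT is taken as 21, KERNEL_TEXT_SIZE as the 40MB mentioned in the comment, the helper name hole_hits_kernel_text() is hypothetical, and the caller passes physical addresses directly where the patch derives them with __pa().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions for illustration: 2MB PMD pages, 40MB kernel text mapping. */
#define PMD_SHIFT        21
#define KERNEL_TEXT_SIZE (40UL << 20)

/* Same closed-interval test as the overlaps() macro in the patch. */
#define overlaps(as, ae, bs, be) ((ae) >= (bs) && (as) <= (be))

/*
 * Returns true when the physical range [hole, hole + size) touches the
 * kernel text mapping that starts at physical address kernel, compared
 * at 2MB granularity. Hypothetical helper; all values are physical
 * addresses supplied by the caller.
 */
static bool hole_hits_kernel_text(unsigned long kernel, unsigned long hole,
				  unsigned long size)
{
	int sh = PMD_SHIFT;

	assert(size > 0);
	return overlaps(kernel >> sh, (kernel + KERNEL_TEXT_SIZE) >> sh,
			hole >> sh, (hole + size) >> sh);
}
```

Because both ends of each interval are inclusive, a hole that begins exactly where the 40MB text mapping ends still counts as overlapping, which fits the "within 2MB of memory hole" wording of the warning. A hole starting one 2MB slot later does not.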

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Kasper Sandberg

On Mon, 2008-01-28 at 23:49 -0500, Gene Heskett wrote:
> On Monday 28 January 2008, Kasper Sandberg wrote:
>  [...]

> >
> >I can invalidate this theory...
> >i helped a guy on irc debug this problem, and he had ati. I tried having
> >him stop using fglrx, and go to r300.. same problem, and same problem
> >even with vesa.. :)
> >
> No Kasper, you are validating it, that it is not nvidia related, which is
> what I was also saying.

Yeah, that's what I mean: I can invalidate the theory that all the
affected boxes run nvidia.

> 
> >also, i have this on my fileserver with .20, which doesent even run X,
> >or module support in kernel :)
> 
> That far back?  Although ISTR I saw it happen once only when I was running 
> 2.6.18-somethingorother.

Yes, I'm afraid so. I will now provide some complete details, as I feel
they are relevant.

The thing is, I run 6x300GB IDE disks in RAID5.

I have both an onboard VIA IDE controller, and then I bought a Promise
PDC 202 new thingie. I had problems, however.

After a bit of time I would get a DMA reset error, and it all kind of
went nuts. It was as if all data accesses were skewed, and as you
might imagine, this made everything fail badly.

I purchased an ITE-based controller for the drives on the Promise, but
exactly the same thing happened.

The errors I got were:
hdf: dma_intr: bad DMA status (dma_stat=75)
hdf: dma_intr: status=0x50 { DriveReady SeekComplete }
ide: failed opcode was: unknown
---

I then found new hope when I heard that libata provided much better
error handling, so I upgraded to .20.

This made my box usable.

The error happens once or twice a day: the disk LED turns on
constantly and all I/O freezes for about half a minute, after which it
returns PROPERLY (thank you libata!). As far as I can tell, the only side
effect is that I get messages like those described here, which Google is
flooded with.

To put some timeline perspective on this:
I believe it was in 2005 that I assembled the system, and when I realized
it was faulty with the old IDE driver, I stopped using it; that might have
been at the beginning of 2006. For almost a year I wasn't using it, hoping
to somehow fix it, but in January 2007 I think it was, at least in the
very beginning of 2007, I hit upon the idea of trying libata, and ever
since the system has been running 24/7, hitting these errors around 2
times a day.

I have reported my problems to lkml multiple times, but nothing has
happened. I also tried to approach jgarzik directly, but he was not
interested.

I really hope this can be solved now; it's a huge problem.

My fileserver has an ASUS K8V motherboard with a VIA chipset (K8T880 I
think it is, or something like it). It is currently using the Promise
controller again, in conjunction with the onboard VIA (strangely enough,
all the timeouts seem to happen on the Promise, and when the ITE was
installed, on the ITE, not on the onboard one).

> >> complaint.  Again, fix the nv driver so it will run my screen & I'll be


RE: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-28 Thread Kuan Luo

Robert wrote:
> Kuan Luo wrote:
> > Robert wrote:
> >> Kuan, does this patch (using the notifiers to see if the command is
> >> really done) still work if one port on the controller has ADMA disabled
> >> because it's in ATAPI mode? I seem to recall Allen Martin mentioning
> >> that notifiers wouldn't work in this case.
> >>
> >
> > I just tried the 2.6.24-rc7 sata_nv driver with one hd and one cdrom in
> > the same controller.
> > I mkfs hd and mounted the cdrom and no error happened.
> >
> > Allen, is there anything about notifier that we should pay attention
> > to?
> 
> Assuming not, then this patch should be applied..
> 
> 

The patch should be applied.
We use the notifier register, and ATAPI mode has no effect on our
notifier register.

Allen wrote:
I think that's one of the cases where memory notifiers don't work (one
of the drives is not in ADMA mode either because it's ATAPI or it's in
legacy mode).  There's no issue with the notifier registers though. 

Re: [PATCH 0/32] ide-tape redux v1

2008-01-28 Thread Borislav Petkov

Hi Bart,

[...]

> > the BKL in idetape_write_release() with finer-grained locking etc, probably
> > also some pipeline improvements, removal of OnStream support, etc. but
> > that'll come later.
> 
> On-Stream support has been long gone but it seems that deprecation
> warning etc. managed to survive.
> 
> w.r.t. to the pipeline-mode: it should be pipelined into /dev/null
> 
> rationale:
> - it is _very_ complex
> - causes errors to be deferred till the next user-space access
> - direct I/O using blk_rq_map_user() will offer superior performance
> 
> the only question is whether to remove it...

Well, on the one hand, since the driver is only being maintained, we should not
remove code that works. Also, I don't know how many users ide-tape really has,
but would it be worth the trouble at all? Because if nobody's using it, we
could just as well pipe the whole thing into /dev/null. On the other hand, the
pipelining part _is_ kinda big and, right, it is not that straightforward to
look at it and know what it actually does - it truly is a student project :)

> >  Documentation/ide/ChangeLog.ide-tape.1995-2002 |  405 +++
> >  drivers/ide/Kconfig|3 +-
> >  drivers/ide/ide-tape.c | 4146 +---
> >  3 files changed, 1991 insertions(+), 2563 deletions(-)

[...]

> BTW what happened to patch #23?

Well, it appeared in my lkml mailbox having gone over vger, which means at least
somebody got it :). But, yeah, that was a real nightmare yesterday sending all
those patches in one go. See, I got a stupid UMTS modem behind a not so
transparent proxy :) whose subnet is listed in almost every spam database on
the planet, and whenever I try to send more than one mail I hit all sorts of
mail server restrictions like yahoo's maximum messages per day crap. Gmail
seems a bit smarter ?! and scans the mail message and then says all kinds of
funny stuff :):

27 10:48:31 gollum postfix/smtp[4011]: F1710123BFD: to=<[EMAIL PROTECTED]>, relay=vger.kernel.org[209.132.176.167]:25,
delay=10, delays=0.19/0.29/2.7/7.2, dsn=2.7.1, status=sent (250 2.7.1 Looks like Linux source DIFF email.. BF:; S1753942AbYA0Js4)

what's next, probably something like:

...(250 3.x.x uh, ok, i'm gonna relay your mail but please have another coffee, please) ;

Anyway, resending #23 to you in a private mail.

-- 
Regards/Gruß,
Boris.

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett

On Monday 28 January 2008, Kasper Sandberg wrote:
 [...]
>> >We have no way of debugging that module, so please try 2.6.24 without it.
>>
>> Sorry, I can't do this and have a working machine.  The nv driver has
>> suffered bit rot or something since the FC2 days when it COULD run a 19"
>> crt at 1600x1200, and will not drive this 20" wide screen lcd 1680x1050
>> monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking
>> like a jpg compressed to 10%.  The system is not usable on a day to basis
>> without the nvidia driver.
>>
>> Fix the nv driver so it will run this screen at its native resolution and
>> I'll be glad to run it even if it won't run google earth, which I do use
>> from time to time.  Now, if in all the hits you can get from google on
>> this, currently 14,800 just for 'exception Emask', apparently caused by a
>> timeout, if 100% of the complainers are running nvidia drivers also, then
>> I see a legit
>
>I can invalidate this theory...
>i helped a guy on irc debug this problem, and he had ati. I tried having
>him stop using fglrx, and go to r300.. same problem, and same problem
>even with vesa.. :)
>
No Kasper, you are validating it, that it is not nvidia related, which is what 
I was also saying.

>also, i have this on my fileserver with .20, which doesent even run X,
>or module support in kernel :)

That far back?  Although ISTR I saw it happen once only when I was running 
2.6.18-somethingorother.

>> complaint.  Again, fix the nv driver so it will run my screen & I'll be
>> glad to switch.  I can see the reason, sure, but the machine must be
>> capable of doing its common day to day stuff, while using that driver,
>> like running kde for kmail, and browsers that work.
>>
>> >If the problems persist, please try to capture a complete log from the
>> >failing kernel -- the interesting bits are everything from initial boot
>> >up to and including the first few errors. You may need to increase the
>> >kernel's log buffer size if the log gets truncated
>> > (CONFIG_LOG_BUF_SHIFT).
>>
>> If by log you mean /var/log/messages, I have several megabytes of those.
>> If you mean a live dmesg capture taken right now, its attached. It
>> contains several of these at the bottom.  I long ago made the kernel log
>> buffer bigger, cuz it couldn't even show the start immediately after the
>> boot, and even the dump to syslog was truncated.
>>
>> >There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.
>>
>> That is what I was afraid of.  I've done some limited grepping in that
>> branch of the kernel tree, and cannot seem to locate where this EH handler
>> is being invoked from.
>>
>> There is 2 lines of interest in the dmesg:
>>
>> [0.00] Nvidia board detected. Ignoring ACPI timer override.
>> [0.00] If you got timer trouble try acpi_use_timer_override
>>
>> But I have NDI what it means, kernel argument/xconfig option?
>>
>> I've also done some googling, and it appears this problem is fairly
>> widespread since the switchover to libata was encouraged.  A stock fedora
>> F8 kernel suffers the same freezes and eventually locks up, but does it
>> without the error messages being logged, it just freezes, feeling
>> identical to this in the minutes before the total freeze.  I've tried 2 of
>> those too, but the newest one won't even run X.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
bureaucrat, n:
A politician who has tenure.

Re: [AUDIT]: Increase skb->truesize in audit_expand

2008-01-28 Thread David Miller

From: James Morris <[EMAIL PROTECTED]>
Date: Tue, 29 Jan 2008 01:13:03 +1100 (EST)

> On Mon, 28 Jan 2008, Herbert Xu wrote:
> 
> > Hi:
> > 
> > [AUDIT]: Increase skb->truesize in audit_expand
> > 
> > The recent UDP patch exposed this bug in the audit code.  It
> > was calling pskb_expand_head without increasing skb->truesize.
> > The caller of pskb_expand_head needs to do so because that function
> > is designed to be called in places where truesize is already fixed
> > and therefore it doesn't update its value.
> > 
> > Because the audit system is using it in a place where the truesize
> > has not yet been fixed, it needs to update its value manually.
> > 
> > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
> 
> Acked-by: James Morris <[EMAIL PROTECTED]>
> 
> 
> (Candidate for stable ?)

Applied, and yes I'll queue this up for -stable.
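
The accounting rule described in the quoted changelog (the expand helper leaves truesize alone, so a caller that has not yet fixed truesize must bump it) can be illustrated with a toy user-space model. Everything here is hypothetical: struct toy_buf, toy_expand() and toy_expand_accounted() are made-up stand-ins, not sk_buff or pskb_expand_head(), and the sketch makes no claim about how the actual patch is written.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a socket buffer: only the fields needed here. */
struct toy_buf {
	unsigned char *data;
	size_t len;      /* bytes in use */
	size_t cap;      /* bytes allocated */
	size_t truesize; /* memory charged for this buffer */
};

/* Grows the storage but deliberately leaves truesize untouched. */
static int toy_expand(struct toy_buf *b, size_t extra)
{
	unsigned char *p;

	assert(b != NULL);
	p = realloc(b->data, b->cap + extra);
	if (!p)
		return -1;
	b->data = p;
	b->cap += extra;
	return 0;
}

/* Caller-side accounting: charge the tailroom gained by the expand. */
static int toy_expand_accounted(struct toy_buf *b, size_t extra)
{
	size_t oldtail = b->cap - b->len;

	if (toy_expand(b, extra))
		return -1;
	b->truesize += (b->cap - b->len) - oldtail;
	return 0;
}
```

In this model a 16-byte buffer expanded by 32 bytes ends up with cap and truesize both at 48. Calling toy_expand() directly would leave truesize at 16, which is the kind of mismatch the changelog describes.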

Re: [PATCH] Improve Documentation/stable_api_nonsense.txt

2008-01-28 Thread Greg KH

On Tue, Jan 29, 2008 at 01:09:59AM +0200, Heikki Orsila wrote:
> * Make a remark about avoiding unnecessary changes in interfaces
> * Improve wording

Well, "improve" is a bit judgemental :)

> @@ -19,7 +19,7 @@ Executive Summary
>  You think you want a stable kernel interface, but you really do not, and
>  you don't even know it.  What you want is a stable running driver, and
>  you get that only if your driver is in the main kernel tree.  You also
> -get lots of other good benefits if your driver is in the main kernel
> +get lots of other benefits if your driver is in the main kernel
>  tree, all of which has made Linux into such a strong, stable, and mature
>  operating system which is the reason you are using it in the first
>  place.

That change is fine.

> @@ -68,11 +68,11 @@ consider the following facts about the Linux kernel:
>  There is no way that binary drivers from one architecture will run
>  on another architecture properly.
>  
> -Now a number of these issues can be addressed by simply compiling your
> -module for the exact specific kernel configuration, using the same exact
> +Now, a number of these issues can be addressed by simply compiling your
> +module for the same kernel configuration, using the same

No, I want to emphasize the word "exact" here.  It has to be the same.

>  C compiler that the kernel was built with.  This is sufficient if you
>  want to provide a module for a specific release version of a specific
> -Linux distribution.  But multiply that single build by the number of
> +Linux distribution. However, multiply that single build by the number of

You messed with the "two space" rule, and changed the word unnecessarily
in my opinion.

>  different Linux distributions and the number of different supported
>  releases of the Linux distribution and you quickly have a nightmare of
>  different build options on different releases.  Also realize that each
> @@ -93,7 +93,7 @@ keep a Linux kernel driver that is not in the main kernel tree up to
>  date over time.
>  
>  Linux kernel development is continuous and at a rapid pace, never
> -stopping to slow down.  As such, the kernel developers find bugs in
> +slowing down.  As such, the kernel developers find bugs in

No, they never stop, I say leave it as is.

> @@ -116,7 +116,7 @@ issues:
>  
>  This is in stark contrast to a number of closed source operating systems
>  which have had to maintain their older USB interfaces over time.  This
> -provides the ability for new developers to accidentally use the old
> +has the risk for new developers to accidentally use the old

It's not so much a "risk" as it is what always seems to happen.  So I
don't like this change either.

> @@ -145,6 +145,10 @@ as small as possible, and that all potential interfaces are tested as
>  well as they can be (unused interfaces are pretty much impossible to
>  test for validity.)
>  
> +Some complain that kernel interfaces change too often for out-of-the-tree
> +modules, but this claim is false. Changing an interface can be delicate work,
> +and it can take significant amount of developer effort. Therefore, interfaces
> +are not changed without a good reason.

No, their claim is a valid one, it's not "false".  However we are not
going to do anything about it, and as such, we don't need this kind of
wording to get people worried about it even more.

So, care to redo the patch?

Also note, there are other translations of this text already, so if you
want to change phrases like this, you might want to cc: those
maintainers as well to get their input.

thanks,

greg k-h

Re: DMA mapping on SCSI device?

2008-01-28 Thread Andi Kleen


> The ideal solution would be to do mapping against a different struct
> device for each port, so that we could maintain the proper DMA mask for
> each of them at all times. However I'm not sure if that's possible.

I cannot imagine why it should be that difficult. The PCI subsystem
could offer a pci_clone_device() or similar function. For all complicated
purposes (sysfs etc.) the original device could be used, so it would
hopefully not be that difficult.

The alternative would be to add a new family of PCI mapping
functions that take an explicit mask. The disadvantage would be changing
all architectures, but on the other hand the interface could be phased
in one by one (and nF4 primarily only works on x86 anyway).

I suspect the latter would be a little cleaner, although they don't
make much difference.

-Andi

Re: down_killable implementations for every architecture

2008-01-28 Thread Andi Kleen

On Tuesday 29 January 2008 00:19, Matthew Wilcox wrote:
> As part of the TASK_KILLABLE changes, we're going to need
> down_killable().  Unfortunately, semaphores are implemented for every
> architecture, which we should probably fix at some point.

It would be best to just change it now before doing further changes. Right now
we have the bizarre situation that semaphores are more optimized,
with fast path inline assembly code, than the far more critical spinlocks.
That clearly doesn't make much sense. So the best approach would
likely be to just pick some generic C implementation from some architecture
and use it everywhere.

-Andi

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Kasper Sandberg

On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote:
> On Monday 28 January 2008, Mikael Pettersson wrote:
> >Gene Heskett writes:
> > > On Monday 28 January 2008, Peter Zijlstra wrote:
> > > >On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
> > > >> 1. Wrong mailing list; use linux-ide (@vger) instead.
> > > >
> > > >What, and keep all us other interested people in the dark?
> > >
> > > As a test, I tried rebooting to the latest fedora kernel and found it
> > > kills X, so I'm back to the second to last fedora version ATM, and the
> > > third 'smartctl -t lng /dev/sda' in 24 hours is running now.  The first
> > > two completed with no errors.
> > >
> > > I've added the linux-ide list to refresh those people of the problem,
> > > the logs are being spammed by this message stanza:
> > >
> > > >  Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> > > >  Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
> > > >  Jan 28 04:46:25 coyote kernel: [26550.290029]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> > > >  Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
> > > >  Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
> > > >  Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
> > > >  Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
> > > >  Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
> > > >  Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
> > > >  Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> >
> >It's not obvious from this incomplete dmesg log what HW or driver
> >is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one,
> >
> >it should be pata_amd driving a WDC disk:
> > > [   30.702887] pata_amd :00:09.0: version 0.3.10
> > > [   30.703052] PCI: Setting latency timer of device :00:09.0 to 64
> > > [   30.703188] scsi0 : pata_amd
> > > [   30.709313] scsi1 : pata_amd
> > > > [   30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
> > > > [   30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
> > > > [   30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
> > > > [   30.864756] ata1.00: 390721968 sectors, multi 16: LBA48
> > > [   30.871629] ata1.00: configured for UDMA/100
> >
> >Unfortunately we also see:
> > > [   48.285456] nvidia: module license 'NVIDIA' taints kernel.
> > > > [   48.549725] ACPI: PCI Interrupt :02:00.0[A] -> Link [APC4] -> GSI 19 (level, high) -> IRQ 20
> > > > [   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007
> >
> >We have no way of debugging that module, so please try 2.6.24 without it.
> 
> Sorry, I can't do this and have a working machine.  The nv driver has suffered
> bit rot or something since the FC2 days when it COULD run a 19" crt at
> 1600x1200, and will not drive this 20" wide screen lcd 1680x1050 monitor at
> more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg
> compressed to 10%.  The system is not usable on a day to day basis without the
> nvidia driver.
>
> Fix the nv driver so it will run this screen at its native resolution and I'll
> be glad to run it even if it won't run google earth, which I do use from time
> to time.  Now, if in all the hits you can get from google on this, currently
> 14,800 just for 'exception Emask', apparently caused by a timeout, if 100% of
> the complainers are running nvidia drivers also, then I see a legit

I can invalidate this theory...
I helped a guy on IRC debug this problem, and he had ATI. I tried having
him stop using fglrx and go to r300.. same problem, and same problem
even with vesa.. :)

Also, I have this on my fileserver with .20, which doesn't even run X
or have module support in the kernel :)

> complaint.  Again, fix the nv driver so it will run my screen & I'll be glad 
> to switch.  I can see the reason, sure, but the machine must be capable of 
> doing its common day to day stuff, while using that driver, like running kde 
> for kmail, and browsers that work.
> 
> >If the problems persist, please try to capture a complete log from the
> >failing kernel -- the interesting bits are everything from initial boot
> >up to and including the first few errors. You may need to increase the
> >kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT).
> 
> If by log you mean /var/log/messages, I have several megabytes of those.
> If you mean a live dmesg capture taken right now, its attached. It contains 
> several of these at the bottom.  I long ago made the kernel log buffer 
> bigger, cuz it couldn't even show the

Re: [PATCH] x86: add PCI IDs to k8topology_64.c

2008-01-28 Thread Andi Kleen

> also there are some users are using LinuxBIOS or other firmware that doesn't 
> have  or like ACPI support. but they still need numa.
> for them ACPI doesn't help.

We've had this discussion before. The right way even if you don't 
want to do full ACPI is to do just the minimal static boot tables
and a SRAT. These are quite simple tables and should be easy to set up.
SRAT is essentially just a two dimensional table with node distances.

-Andi

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett

On Monday 28 January 2008, Mark Lord wrote:
>Gene Heskett wrote:
>..
>
>> That's ok, dd seemed to do the job also.
>
>..
>
>The two programs operate entirely differently from each other,
>so it may still be worth trying the make_bad_sector utility there.
>
>dd goes through the regular kernel I/O calls,
>whereas make_bad_sector sends raw ATA commands
>directly (more or less) to the drive.
>
Humm, if it (the sector error) continues.  I'm rather convinced that was a one 
time transient item caused by doing so many hardware resets.  It has not 
repeated in subsequent stanzas of this error.  Several times it went away 
while the drive's long self test was in progress, and the resets that go with 
the reboot, or one of these errors seems to stop the long test, which from my 
reading, should resume with no delay, but maybe that only applies to a 
powerdown restart, which I haven't been doing.  The last such error was about 
11 hours ago now. I just started another long test, which if ok, should clear 
the stuff its showing now because the test was interrupted.  It has passed 
that test twice before in the last 36 hours.

Thanks Mark.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
You are a fluke of the universe; you have no right to be here.

Re: [PATCH] x86: add PCI IDs to k8topology_64.c

2008-01-28 Thread Yinghai Lu

On Monday 28 January 2008 07:48:06 pm Andi Kleen wrote:
> "Joachim Deguara" <[EMAIL PROTECTED]> writes:
> 
> > Quick history, this is a harmless patch that got dropped by Andi as a
> > mixup to
> 
> It's not harmless.
> 
> > dropping another patch of mine that was made obsolete by Yinghai.
> > http://thread.gmane.org/gmane.linux.kernel/559581
> 
> No that's not the correct history. The correct history is that 
> I intentionally rejected this patch because the old k8topology
> hack should really not be used anymore on modern machines (especially
> not on Quad Cores). SRAT is the far better way to handle this problem
> because it has a proper abstraction.
> 
> The problem with k8topology.c is that it needs to know very low level
> information (like HT node numbers etc.) the kernel should not really
> need to know and which are difficult to handle generally without
> motherboard specific knowledge. 
> 
> k8topology.c mostly guesses, which was never a good way to handle this. 
> Also in the various "node has no memory" cases it needs quite
> hackish fallback heuristics which will be always fragile. Then there
> are some ugly interactions with quad cores. And some other issues
> 
> I still think the patch a bad idea because adapting this file all
> the time is a long term maintenance issue. I can say that as 
> the original author :-) It was just a quick hack long ago
> to get NUMA going early. But now it far outlived its usefulness
> and adapting it to modern machines is the wrong direction. 
> 
> Best is to phase k8topology out.

Then with acpi=off we cannot use NUMA any more.

Also, some users run LinuxBIOS or other firmware that doesn't
have or like ACPI support, but they still need NUMA.
For them ACPI doesn't help.

YH


Re: [PATCH] correct inconsistent ntp interval/tick_length usage

2008-01-28 Thread Roman Zippel

Hi,

On Mon, 28 Jan 2008, john stultz wrote:

> Regardless, current_tick_length() really is the base interval we're
> using in the error accumulation loop, so it seems the cleanest interface
> to use (just to avoid redundancy at least) when establishing the
> clocksource's interval length. Or do you not agree?

I see, what you need to use in timex.h for !CONFIG_NO_HZ is:

#define NTP_INTERVAL_LENGTH (((s64)LATCH * NSEC_PER_SEC) / (s64)CLOCK_TICK_RATE)

this calculates the base length of a clock tick in nsec.

current_tick_length() would only work during boot. If you switch clocks 
later, it would include random adjustments specific to the old clock.
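
As a rough illustration of what that macro evaluates to, here is a userspace sketch. The values HZ=250 and CLOCK_TICK_RATE=1193182 (the PC PIT input clock) are assumptions chosen for the example, and the two helper functions are hypothetical names, not kernel interfaces. LATCH is written the way kernel headers of that era define it, as far as I know.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only: PIT input clock and HZ=250. */
#define HZ              250
#define CLOCK_TICK_RATE 1193182
#define NSEC_PER_SEC    1000000000LL

/* Timer counts per tick, rounded to the nearest integer. */
#define LATCH           ((CLOCK_TICK_RATE + HZ / 2) / HZ)

#define NTP_INTERVAL_LENGTH \
	(((int64_t)LATCH * NSEC_PER_SEC) / (int64_t)CLOCK_TICK_RATE)

/* Base length of one clock tick in nanoseconds (hypothetical helper). */
int64_t ntp_interval_length(void)
{
	return NTP_INTERVAL_LENGTH;
}

/* How far the real tick is from the nominal NSEC_PER_SEC / HZ (hypothetical helper). */
int64_t tick_rounding_error_ns(void)
{
	return ntp_interval_length() - NSEC_PER_SEC / HZ;
}
```

Because LATCH is an integer, the real tick is slightly longer than the nominal 4 ms in this configuration. That constant offset depends only on the clock's own parameters, which is why it makes a better base interval than a value that already includes NTP adjustments.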

bye, Roman

[PATCH] x86_64: fix overlap between pagetable with bss section v2

2008-01-28 Thread Yinghai Lu

[PATCH] x86_64: fix overlap between pagetable with bss section v2

one early crash on one 8 node 256g machine

Command line: console=uart8250,io,0x3f8,115200n8 
initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug 
apic=debug acpi.debug_level=0x000f pci=routeirq ip=dhcp load_ramdisk=1 
ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009bc00 (usable)
 BIOS-e820: 0009bc00 - 000a (reserved)
 BIOS-e820: 000e6000 - 0010 (reserved)
 BIOS-e820: 0010 - dffe (usable)
 BIOS-e820: dffe - dffee000 (ACPI data)
 BIOS-e820: dffee000 - d050 (ACPI NVS)
 BIOS-e820: d050 - e000 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ff70 - 0001 (reserved)
 BIOS-e820: 0001 - 00402000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map = 67239936
Kernel panic - not syncing: Duplicated early reservation d4-e42000

Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3

Call Trace:
 [] lapic_get_maxlvt+0x0/0x10
 [] clear_local_APIC+0x5/0xcf
 [] disable_local_APIC+0x5/0x17
 [] smp_send_stop+0x46/0x4c
 [] panic+0x94/0x13e
 [] sctp_eps_proc_init+0x12/0x34
 [] reserve_early+0x30/0x6c
 [] init_memory_mapping+0x2cd/0x2dc
 [] setup_arch+0x21f/0x44e
 [] start_kernel+0x6f/0x2c7
 [] _sinittext+0x1cc/0x1d3

it turns out there is overlap between pgtable and bss...

in System.map we have
80d40420 b rsi_table
80d40620 B krb5_seq_lock
80d40628 b i.20437
80d40630 b xprt_rdma_inline_write_padding
80d40638 b sunrpc_table_header
80d40640 b zero
80d40644 b min_memreg
80d40648 b rpcrdma_tk_lock_g
80d40650 B sctp_assocs_id_lock
80d40658 B proc_net_sctp
80d40660 B sctp_assocs_id
80d40680 B sysctl_sctp_mem
80d40690 B sysctl_sctp_rmem
80d406a0 B sysctl_sctp_wmem
80d406b0 b sctp_ctl_socket
80d406b8 b sctp_pf_inet6_specific
80d406c0 b sctp_pf_inet_specific
80d406c8 b sctp_af_v4_specific
80d406d0 b sctp_af_v6_specific
80d406d8 b sctp_rand.33270
80d406dc b sctp_memory_pressure
80d406e0 b sctp_sockets_allocated
80d406e4 b sctp_memory_allocated
80d406e8 b sctp_sysctl_header
80d406f0 b zero
80d406f4 A __bss_stop
80d406f4 A _end

need to round up table_start to PAGE_SIZE

also make the panic more informative.
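
A small userspace sketch of the two pieces involved: the half-open range overlap test used by reserve_early(), and rounding an address up to a page boundary before shifting it into a page frame number. The 4 KiB page size, the helper names, and the address 0xd406f4 (taken loosely from the low bits of __bss_stop above) are assumptions for illustration, not values from the crashing machine.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                    /* assumed: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Round x up to the next multiple of align (align must be a power of two). */
unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Half-open ranges [s1, e1) and [s2, e2) overlap iff each starts before the other ends. */
bool ranges_overlap(unsigned long s1, unsigned long e1,
		    unsigned long s2, unsigned long e2)
{
	return e1 > s2 && s1 < e2;
}

/* First byte of the page frame that "addr >> PAGE_SHIFT" refers to. */
unsigned long pfn_base(unsigned long addr)
{
	return (addr >> PAGE_SHIFT) << PAGE_SHIFT;
}
```

Shifting an unaligned address right by PAGE_SHIFT silently rounds it down, so the first table page would start inside the page that still holds the tail of bss. Rounding up first moves it to the next free page.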

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

Index: linux-2.6/arch/x86/kernel/e820_64.c
===
--- linux-2.6.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6/arch/x86/kernel/e820_64.c
@@ -70,8 +70,8 @@ void __init reserve_early(unsigned long 
for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
r = &early_res[i];
if (end > r->start && start < r->end)
-   panic("Duplicated early reservation %lx-%lx\n",
- start, end);
+   panic("Overlap early reservation %lx-%lx to %lx-%lx\n",
+ start, end, r->start, r->end);
}
if (i >= MAX_EARLY_RES)
panic("Too many early reservations");
Index: linux-2.6/arch/x86/mm/init_64.c
===
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -358,6 +358,13 @@ static void __init find_early_table_spac
if (table_start == -1UL)
panic("Cannot find space for the kernel page tables");
 
+   /*
+* when you have a lot of ram like 256g, early_table will not fit
+* into 0x8000 range, find_e820_area will find area after kernel bss
+* but the table_start is not page align, so need to round it up to
+* avoid overlap with bss
+*/
+   table_start = round_up(table_start, PAGE_SIZE);
table_start >>= PAGE_SHIFT;
table_end = table_start;
 

Re: [PATCH] SELinux: Fix double free in selinux_netlbl_sock_setsid()

2008-01-28 Thread David Miller

From: Paul Moore <[EMAIL PROTECTED]>
Date: Mon, 28 Jan 2008 21:20:26 -0500

> As pointed out by Adrian Bunk, commit 45c950e0f839fded922ebc0bfd59b1081cc71b70
> caused a double-free when security_netlbl_sid_to_secattr() fails.  This patch
> fixes this by removing the netlbl_secattr_destroy() call from that function
> since we are already releasing the secattr memory in
> selinux_netlbl_sock_setsid().
> 
> Signed-off-by: Paul Moore <[EMAIL PROTECTED]>

Applied, and I'll queue this up for -stable too.

Please, when mentioning specific commits please also provide
the changelog headline along with the SHA1 hash.

The reason is that when this fix is moved over to another
tree where the SHA1 of the causing change is different people
studying your fix won't be able to find it without more stable
contextual information.

Thanks.

[PATCH] x86: Reduce ifdef sections in fault.c

2008-01-28 Thread Harvey Harrison

Signed-off-by: Harvey Harrison <[EMAIL PROTECTED]>
---
 arch/x86/mm/fault.c |   31 +--
 1 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e28cc52..2737493 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -508,6 +508,10 @@ static int vmalloc_fault(unsigned long address)
pmd_t *pmd, *pmd_ref;
pte_t *pte, *pte_ref;
 
+   /* Make sure we are in vmalloc area */
+   if (!(address >= VMALLOC_START && address < VMALLOC_END))
+   return -1;
+
/* Copy kernel mappings over when needed. This can also
   happen within a race in page table update. In the later
   case just flush. */
@@ -603,6 +607,9 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 */
 #ifdef CONFIG_X86_32
if (unlikely(address >= TASK_SIZE)) {
+#else
+   if (unlikely(address >= TASK_SIZE64)) {
+#endif
if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
vmalloc_fault(address) >= 0)
return;
@@ -618,6 +625,8 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
goto bad_area_nosemaphore;
}
 
+
+#ifdef CONFIG_X86_32
/* It's safe to allow irq's after cr2 has been saved and the vmalloc
   fault has been handled. */
if (regs->flags & (X86_EFLAGS_IF|VM_MASK))
@@ -630,28 +639,6 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
if (in_atomic() || !mm)
goto bad_area_nosemaphore;
 #else /* CONFIG_X86_64 */
-   if (unlikely(address >= TASK_SIZE64)) {
-   /*
-* Don't check for the module range here: its PML4
-* is always initialized because it's shared with the main
-* kernel text. Only vmalloc may need PML4 syncups.
-*/
-   if (!(error_code & (PF_RSVD|PF_USER|PF_PROT)) &&
- ((address >= VMALLOC_START && address < VMALLOC_END))) {
-   if (vmalloc_fault(address) >= 0)
-   return;
-   }
-
-   /* Can handle a stale RO->RW TLB */
-   if (spurious_fault(address, error_code))
-   return;
-
-   /*
-* Don't take the mm semaphore here. If we fixup a prefetch
-* fault we could otherwise deadlock.
-*/
-   goto bad_area_nosemaphore;
-   }
if (likely(regs->flags & X86_EFLAGS_IF))
local_irq_enable();
 
-- 
1.5.4.rc4.1142.gf5a97




Re: [PATCH] x86: add PCI IDs to k8topology_64.c

2008-01-28 Thread Andi Kleen

"Joachim Deguara" <[EMAIL PROTECTED]> writes:

> Quick history, this is a harmless patch that got dropped by Andi as a mixup 
> to 

It's not harmless.

> dropping another patch of mine that was made obsolete by Yinghai.
> http://thread.gmane.org/gmane.linux.kernel/559581

No that's not the correct history. The correct history is that 
I intentionally rejected this patch because the old k8topology
hack should really not be used anymore on modern machines (especially
not on Quad Cores). SRAT is the far better way to handle this problem
because it has a proper abstraction.

The problem with k8topology.c is that it needs to know very low level
information (like HT node numbers etc.) the kernel should not really
need to know and which are difficult to handle generally without
motherboard specific knowledge. 

k8topology.c mostly guesses, which was never a good way to handle this. 
Also in the various "node has no memory" cases it needs quite
hackish fallback heuristics which will always be fragile. Then there
are some ugly interactions with quad cores. And some other issues.

I still think the patch is a bad idea because adapting this file all
the time is a long term maintenance issue. I can say that as 
the original author :-) It was just a quick hack long ago
to get NUMA going early. But now it has far outlived its usefulness
and adapting it to modern machines is the wrong direction. 

Best is to phase k8topology out.

-Andi

>
> -Joachim
>
> --
>
> x86: add PCI IDs to k8topology_64.c
> 
> This just adds the PCI IDs of AMD's family 10h and 11h CPU's northbridges 
> to
> k8topology discovery.
> 
> Signed-off-by: Joachim Deguara <[EMAIL PROTECTED]>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
> Acked-by: Yinghai Lu <[EMAIL PROTECTED]>
>
> diff --git a/arch/x86/mm/k8topology_64.c b/arch/x86/mm/k8topology_64.c
> index a96006f..b123ea3 100644
> --- a/arch/x86/mm/k8topology_64.c
> +++ b/arch/x86/mm/k8topology_64.c
> @@ -28,11 +28,15 @@ static __init int find_northbridge(void)
>   u32 header;
>   
>   header = read_pci_config(0, num, 0, 0x00);  
> - if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)))
> + if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
> + header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
> + header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
>   continue;   
>  
>   header = read_pci_config(0, num, 1, 0x00); 
> - if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)))
> + if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
> + header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
> + header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
>   continue;   
>   return num; 
>   } 
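
For readers unfamiliar with the comparison in the quoted patch: the first dword of PCI config space carries the vendor ID in its low 16 bits and the device ID in its high 16 bits, so `PCI_VENDOR_ID_AMD | (0x1100<<16)` matches one specific device. The sketch below is a userspace illustration only; the helper names are hypothetical, and the ID list is simply the three function-0 IDs that appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD 0x1022

/* Pack vendor (low 16 bits) and device (high 16 bits) as config dword 0 lays them out. */
uint32_t pci_id_dword(uint16_t vendor, uint16_t device)
{
	return (uint32_t)vendor | ((uint32_t)device << 16);
}

/* True if the header matches one of the function-0 northbridge IDs from the patch. */
bool is_amd_nb_func0(uint32_t header)
{
	static const uint16_t ids[] = { 0x1100, 0x1200, 0x1300 };
	size_t i;

	for (i = 0; i < sizeof(ids) / sizeof(ids[0]); i++)
		if (header == pci_id_dword(PCI_VENDOR_ID_AMD, ids[i]))
			return true;
	return false;
}
```

Every new CPU family means another entry in a list like this, which is the maintenance burden objected to above.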

Re: struct cifs_dfs_referral_inode_operations is unused

2008-01-28 Thread Steven French

We have one DFS patch remaining to merge and then will need to do a 
cleanup patch


Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com



Adrian Bunk <[EMAIL PROTECTED]> 
01/28/2008 04:11 PM

To
Igor Mammedov <[EMAIL PROTECTED]>, Steven French/Austin/[EMAIL PROTECTED]
cc
[EMAIL PROTECTED], linux-kernel@vger.kernel.org
Subject
struct cifs_dfs_referral_inode_operations is unused






struct cifs_dfs_referral_inode_operations is unused, which does not seem 
to have been intended?

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed




Re: DMA mapping on SCSI device?

2008-01-28 Thread Matthew Wilcox

On Mon, Jan 28, 2008 at 06:08:44PM -0600, Robert Hancock wrote:
> The 
> thought of using the SCSI struct device for DMA mapping was brought up 
> at one point.. any thoughts on that?

I believe this will work on some architectures and not others.
Anything that uses include/asm-generic/dma-mapping.h will break, for
example.  It would be nice for those architectures to get fixed ...

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

[git pull] Fix recent Ocfs2 breakage

2008-01-28 Thread Mark Fasheh

Greg's commit c60b71787982cefcf9fa09aa281fa8c4c685d557 inadvertently broke
Ocfs2 userspace ABI, so I have a rather high priority single line patch from
Joel to fix things up for you to pull. A copy of the patch is attached to
the bottom of this e-mail. Embarrassingly enough, I missed this while acking
the patch late last week :(

Please pull from 'upstream-linus' branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2.git upstream-linus

to receive the following updates:

 fs/ocfs2/cluster/sys.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Joel Becker (1):
  ocfs2: Fix userspace ABI breakage in sysfs


From: Joel Becker <[EMAIL PROTECTED]>

ocfs2: Fix userspace ABI breakage in sysfs

The userspace ABI of ocfs2's internal cluster stack (o2cb) was broken by
commit c60b71787982cefcf9fa09aa281fa8c4c685d557 "kset: convert ocfs2 to
use kset_create".  Specifically, the '/sys/o2cb' kset was moved to
'/sys/fs/o2cb'.  This breaks all ocfs2 tools and renders the
filesystem unmountable.

This fix moves '/sys/o2cb' back where it belongs.

Signed-off-by: Joel Becker <[EMAIL PROTECTED]>
Signed-off-by: Mark Fasheh <[EMAIL PROTECTED]>
---
 fs/ocfs2/cluster/sys.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/cluster/sys.c b/fs/ocfs2/cluster/sys.c
index a4b0773..0c095ce 100644
--- a/fs/ocfs2/cluster/sys.c
+++ b/fs/ocfs2/cluster/sys.c
@@ -64,7 +64,7 @@ int o2cb_sys_init(void)
 {
int ret;
 
-   o2cb_kset = kset_create_and_add("o2cb", NULL, fs_kobj);
+   o2cb_kset = kset_create_and_add("o2cb", NULL, NULL);
if (!o2cb_kset)
return -ENOMEM;
 
-- 
1.5.3.6


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord


Gene Heskett wrote:
> On Monday 28 January 2008, Gene Heskett wrote:
>> On Monday 28 January 2008, Robert Hancock wrote:
>> [...]
>>> Check the /etc/modprobe.conf file, a lot of distributions use this to
>>> generate the initrd. If there's references to pata_amd it'll try and
>>> include it.
>>
>> Bingo!  Thanks Robert, I'll try it again with that line commented.  I wasn't
>> aware of that connection at all.  Yup, it worked, I feel a reboot coming
>> on. :)
>
> But it didn't work, apparently commenting that line out needs to be balanced 
> by adding another line telling it amd74xx is the 'hostadapter', not 
> necessarily scsi.
>
> Can this be made more universal so I don't have to edit /etc/modprobe.conf?
..


You could really do it like Linus (and me), and not bother with modules
for critical services like hard disks.

Just build them *into* the core kernel (select "y" or "checkmark" rather
than "m" or "dot" for modules).  This eliminates a ton of crap that can fail,
and may also make your kernel a micro-MIP faster (core memory is often mapped
without page table entries, whereas loaded modules use page tables.. slower, 
slightly).

Linus just edits the /boot/grub/menu.lst, and clones an existing boot entry
for the new kernel, editing the "kernel" line to match the name of the file
that got installed in /boot by "make install" (from the kernel directory).
He just leaves the ramdisk/initrd line as-was --> wrong version, but that's 
okay.

I totally get rid of them here, but that requires hardcoding the root=/dev/
part on the "kernel" line.  No big deal, it works just fine that way.

Cheers

Re: DMA mapping on SCSI device?

2008-01-28 Thread Grant Grundler

On Jan 29, 2008 11:08 AM, Robert Hancock <[EMAIL PROTECTED]> wrote:
...
> The last solution I tried was to set the DMA mask on both ports to
> 32-bit on slave_configure when an ATAPI device is connected. However,
> this runs into complications as well. This is run on initialization and
> when trying to set the other port into 32-bit DMA, it may not be
> initialized yet. Plus, it forces the port with a hard drive on it into
> 32-bit DMA needlessly.

Have you measured the impact of setting the PCI dma mask to 32-bit?

Last time Alex Williamson (HP) measured this on IA64, we deliberately
forced pci_map_sg() to use the IOMMU even for devices that were 64-bit
capable. We got 3-5% better throughput since the device had fewer
entries to retrieve and the devices (at the time) weren't that good at
processing SG lists.

>
> The ideal solution would be to do mapping against a different struct
> device for each port, so that we could maintain the proper DMA mask for
> each of them at all times. However I'm not sure if that's possible. The
> thought of using the SCSI struct device for DMA mapping was brought up
> at one point.. any thoughts on that?

I'm pretty sure that's not possible (using two PCI dev structs). I'm
skeptical it's worth converting DMA services to use SCSI devs since
that's an extremely invasive change for a marginal benefit.

hth,
grant


Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

2008-01-28 Thread Matthew Wilcox

On Mon, Jan 28, 2008 at 07:05:05PM -0800, Arjan van de Ven wrote:
> I think there's only one fundamental disagreement; and that is:
> do we think that things are now totally fixed and no new major issues
> will arrive after the "fix yet another mmconfig thing" patches are merged.
> 
> If the answer is no, then imho my patch is the right approach; it will limit 
> the damage and doesn't make
> the people suffer who don't need extended config space.
> If the answer is yes, then my patch is not needed.
> 
> This is a judgment call; I'm skeptical, others are more optimistic that after 
> 2 years of messing around
> they have finally found the last golden fix.

I'm more optimistic because we've so severely restricted the use of
mmconf after these patches that it's unlikely to cause problems.  I also
hear Vista is now using mmconf, so fewer implementations are going to
be buggy at this point.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord


Gene Heskett wrote:
..

> That's ok, dd seemed to do the job also.

..

The two programs operate entirely differently from each other,
so it may still be worth trying the make_bad_sector utility there.

dd goes through the regular kernel I/O calls,
whereas make_bad_sector sends raw ATA commands
directly (more or less) to the drive.

-ml

Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

2008-01-28 Thread Arjan van de Ven

On Mon, 28 Jan 2008 12:44:31 -0800
Greg KH <[EMAIL PROTECTED]> wrote:

> On Mon, Jan 28, 2008 at 01:32:06PM -0500, Tony Camuso wrote:
> > Greg,
> >
> > Have you given Grant's suggestion any further consideration?
> >
> > I'd like to know how the MMCONFIG issues discussed in this thread
> > are going to be handled upstream. I have a patch implemented in
> > RHEL 5.2, but I would rather have the upstream patch implemented,
> > whatever it is.
> 
> Well, everyone still doesn't seem to agree on the proper way forward
> here, so for me to just "pick one" isn't very appropriate.
> 
> So, can we try again?

I think there's only one fundamental disagreement; and that is:
do we think that things are now totally fixed and no new major issues
will arrive after the "fix yet another mmconfig thing" patches are merged.

If the answer is no, then imho my patch is the right approach; it will limit 
the damage and doesn't make
the people suffer who don't need extended config space.
If the answer is yes, then my patch is not needed.

This is a judgment call; I'm skeptical, others are more optimistic that after 2 
years of messing around
they have finally found the last golden fix.

-- 
If you want to reach me at my work email, use [EMAIL PROTECTED]
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

[PATCH] Change pci_raw_ops to pci_raw_read/write

2008-01-28 Thread Matthew Wilcox


We want to allow different implementations of pci_raw_ops for standard
and extended config space on x86.  Rather than clutter generic code with
knowledge of this, we make pci_raw_ops private to x86 and use it to
implement the new raw interface -- raw_pci_read() and raw_pci_write().
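
The shape of that change can be sketched in userspace C: the ops table stays private to the architecture code, and callers only see a plain function. This is an illustration under assumptions, not the patch itself. `fake_read` and `fake_ops` are hypothetical stand-ins for a real backend, and the write side is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arch-private ops table; generic code never needs to see this type. */
struct pci_raw_ops {
	int (*read)(unsigned int seg, unsigned int bus, unsigned int devfn,
		    int reg, int len, uint32_t *val);
};

/* Hypothetical backend for the sketch: every register reads back its own offset. */
static int fake_read(unsigned int seg, unsigned int bus, unsigned int devfn,
		     int reg, int len, uint32_t *val)
{
	(void)seg; (void)bus; (void)devfn; (void)len;
	*val = (uint32_t)reg;
	return 0;
}

static const struct pci_raw_ops fake_ops = { .read = fake_read };
static const struct pci_raw_ops *raw_pci_ops = &fake_ops;

/* The only entry point that generic code calls. */
int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
		 int reg, int len, uint32_t *val)
{
	if (!raw_pci_ops || !raw_pci_ops->read)
		return -1;	/* placeholder error; real code would return an errno */
	return raw_pci_ops->read(seg, bus, devfn, reg, len, val);
}
```

With the pointer hidden behind the function, an architecture is free to consult more than one backend inside raw_pci_read() without any caller changing.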

Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 arch/ia64/pci/pci.c   |   25 -
 arch/ia64/sn/pci/tioce_provider.c |   16 
 arch/x86/kernel/quirks.c  |2 +-
 arch/x86/pci/common.c |   25 +++--
 arch/x86/pci/direct.c |4 ++--
 arch/x86/pci/fixup.c  |6 --
 arch/x86/pci/legacy.c |2 +-
 arch/x86/pci/mmconfig-shared.c|6 +++---
 arch/x86/pci/mmconfig_32.c|   10 ++
 arch/x86/pci/mmconfig_64.c|8 +---
 arch/x86/pci/pci.h|   15 +++
 arch/x86/pci/visws.c  |3 ---
 drivers/acpi/osl.c|   25 ++---
 drivers/ata/Kconfig   |3 +++
 drivers/ata/Makefile  |3 +++
 include/linux/pci.h   |   16 
 16 files changed, 84 insertions(+), 85 deletions(-)

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 488e48a..8fd7e82 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -43,8 +43,7 @@
 #define PCI_SAL_EXT_ADDRESS(seg, bus, devfn, reg)  \
(((u64) seg << 28) | (bus << 20) | (devfn << 12) | (reg))
 
-static int
-pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_read(unsigned int seg, unsigned int bus, unsigned int devfn,
  int reg, int len, u32 *value)
 {
u64 addr, data = 0;
@@ -68,8 +67,7 @@ pci_sal_read (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
 }
 
-static int
-pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
+int raw_pci_write(unsigned int seg, unsigned int bus, unsigned int devfn,
   int reg, int len, u32 value)
 {
u64 addr;
@@ -91,24 +89,17 @@ pci_sal_write (unsigned int seg, unsigned int bus, unsigned int devfn,
return 0;
 }
 
-static struct pci_raw_ops pci_sal_ops = {
-   .read = pci_sal_read,
-   .write =pci_sal_write
-};
-
-struct pci_raw_ops *raw_pci_ops = &pci_sal_ops;
-
-static int
-pci_read (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
+static int pci_read(struct pci_bus *bus, unsigned int devfn, int where,
+   int size, u32 *value)
 {
-   return raw_pci_ops->read(pci_domain_nr(bus), bus->number,
+   return raw_pci_read(pci_domain_nr(bus), bus->number,
 devfn, where, size, value);
 }
 
-static int
-pci_write (struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
+static int pci_write(struct pci_bus *bus, unsigned int devfn, int where,
+   int size, u32 value)
 {
-   return raw_pci_ops->write(pci_domain_nr(bus), bus->number,
+   return raw_pci_write(pci_domain_nr(bus), bus->number,
  devfn, where, size, value);
 }
 
diff --git a/arch/ia64/sn/pci/tioce_provider.c b/arch/ia64/sn/pci/tioce_provider.c
index e1a3e19..999f14f 100644
--- a/arch/ia64/sn/pci/tioce_provider.c
+++ b/arch/ia64/sn/pci/tioce_provider.c
@@ -752,13 +752,13 @@ tioce_kern_init(struct tioce_common *tioce_common)
 * Determine the secondary bus number of the port2 logical PPB.
 * This is used to decide whether a given pci device resides on
 * port1 or port2.  Note:  We don't have enough plumbing set up
-* here to use pci_read_config_xxx() so use the raw_pci_ops vector.
+* here to use pci_read_config_xxx() so use raw_pci_read().
 */
 
seg = tioce_common->ce_pcibus.bs_persist_segment;
bus = tioce_common->ce_pcibus.bs_persist_busnum;
 
-   raw_pci_ops->read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
+   raw_pci_read(seg, bus, PCI_DEVFN(2, 0), PCI_SECONDARY_BUS, 1,&tmp);
tioce_kern->ce_port1_secondary = (u8) tmp;
 
/*
@@ -799,11 +799,11 @@ tioce_kern_init(struct tioce_common *tioce_common)
 
/* mem base/limit */
 
-   raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+   raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
  PCI_MEMORY_BASE, 2, &tmp);
base = (u64)tmp << 16;
 
-   raw_pci_ops->read(seg, bus, PCI_DEVFN(dev, 0),
+   raw_pci_read(seg, bus, PCI_DEVFN(dev, 0),
  PCI_MEMORY_LIMIT, 2, &tmp);
limit = (u64)tmp << 16;
limit |= 0xfUL;
@@ -817,21 +817,21 @@ tioce_kern_init(struct tioce_common *tioce_common)
 * attributes.
 */
 
-

PCI x86: always use conf1 to access config space below 256 bytes

2008-01-28 Thread Matthew Wilcox

PCI x86: always use conf1 to access config space below 256 bytes

Thanks to Loic Prylli <[EMAIL PROTECTED]>, who originally proposed
this idea.

Always using legacy configuration mechanism for the legacy config space
and extended mechanism (mmconf) for the extended config space is
a simple and very logical approach. It's supposed to resolve all
known mmconf problems. It still allows per-device quirks (tweaking
dev->cfg_size). It also allows to get rid of mmconf fallback code.
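
The routing rule described above can be sketched as a small userspace function: offsets below 256 go to the legacy (conf1) mechanism, offsets 256..4095 go to mmconf, and anything else is rejected. The enum and function names are hypothetical, and the sketch only reports which path would be taken rather than touching hardware. The bounds mirror the checks visible in the diff below, plus an extra guard against negative offsets.

```c
#include <assert.h>
#include <errno.h>

enum cfg_method { CFG_CONF1, CFG_MMCONF, CFG_INVALID };

/* Pick the access mechanism purely from the register offset. */
enum cfg_method pick_cfg_method(unsigned int bus, unsigned int devfn, int reg)
{
	if (bus > 255 || devfn > 255 || reg < 0 || reg > 4095)
		return CFG_INVALID;
	return reg < 256 ? CFG_CONF1 : CFG_MMCONF;
}

/* Mock read: returns 0 and reports which path was chosen, or -EINVAL. */
int cfg_read(unsigned int bus, unsigned int devfn, int reg,
	     enum cfg_method *used)
{
	enum cfg_method m = pick_cfg_method(bus, devfn, reg);

	if (m == CFG_INVALID)
		return -EINVAL;
	*used = m;
	return 0;
}
```

Since the choice no longer depends on per-device probing, the fallback bitmap and the unreachable_devices() scan have nothing left to do.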

Signed-off-by: Ivan Kokshaysky <[EMAIL PROTECTED]>
Signed-off-by: Matthew Wilcox <[EMAIL PROTECTED]>
---
 arch/x86/pci/mmconfig-shared.c |   35 ---
 arch/x86/pci/mmconfig_32.c |   22 +-
 arch/x86/pci/mmconfig_64.c |   22 ++
 arch/x86/pci/pci.h |7 ---
 4 files changed, 19 insertions(+), 67 deletions(-)

diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index 4df637e..6b521d3 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -22,42 +22,9 @@
 #define MMCONFIG_APER_MIN  (2 * 1024*1024)
 #define MMCONFIG_APER_MAX  (256 * 1024*1024)
 
-DECLARE_BITMAP(pci_mmcfg_fallback_slots, 32*PCI_MMCFG_MAX_CHECK_BUS);
-
 /* Indicate if the mmcfg resources have been placed into the resource table. */
 static int __initdata pci_mmcfg_resources_inserted;
 
-/* K8 systems have some devices (typically in the builtin northbridge)
-   that are only accessible using type1
-   Normally this can be expressed in the MCFG by not listing them
-   and assigning suitable _SEGs, but this isn't implemented in some BIOS.
-   Instead try to discover all devices on bus 0 that are unreachable using MM
-   and fallback for them. */
-static void __init unreachable_devices(void)
-{
-   int i, bus;
-   /* Use the max bus number from ACPI here? */
-   for (bus = 0; bus < PCI_MMCFG_MAX_CHECK_BUS; bus++) {
-   for (i = 0; i < 32; i++) {
-   unsigned int devfn = PCI_DEVFN(i, 0);
-   u32 val1, val2;
-
-   pci_conf1_read(0, bus, devfn, 0, 4, &val1);
-   if (val1 == 0x)
-   continue;
-
-   if (pci_mmcfg_arch_reachable(0, bus, devfn)) {
-   raw_pci_ops->read(0, bus, devfn, 0, 4, &val2);
-   if (val1 == val2)
-   continue;
-   }
-   set_bit(i + 32 * bus, pci_mmcfg_fallback_slots);
-   printk(KERN_NOTICE "PCI: No mmconfig possible on device"
-  " %02x:%02x\n", bus, i);
-   }
-   }
-}
-
 static const char __init *pci_mmcfg_e7520(void)
 {
u32 win;
@@ -270,8 +237,6 @@ void __init pci_mmcfg_init(int type)
return;
 
if (pci_mmcfg_arch_init()) {
-   if (type == 1)
-   unreachable_devices();
if (known_bridge)
pci_mmcfg_insert_resources(IORESOURCE_BUSY);
pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 1bf5816..7b75e65 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -30,10 +30,6 @@ static u32 get_base_addr(unsigned int seg, int bus, unsigned devfn)
struct acpi_mcfg_allocation *cfg;
int cfg_num;
 
-   if (seg == 0 && bus < PCI_MMCFG_MAX_CHECK_BUS &&
-   test_bit(PCI_SLOT(devfn) + 32*bus, pci_mmcfg_fallback_slots))
-   return 0;
-
for (cfg_num = 0; cfg_num < pci_mmcfg_config_num; cfg_num++) {
cfg = &pci_mmcfg_config[cfg_num];
if (cfg->pci_segment == seg &&
@@ -68,13 +64,16 @@ static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
u32 base;
 
if ((bus > 255) || (devfn > 255) || (reg > 4095)) {
-   *value = -1;
+err:   *value = -1;
return -EINVAL;
}
 
+   if (reg < 256)
+   return pci_conf1_read(seg,bus,devfn,reg,len,value);
+
base = get_base_addr(seg, bus, devfn);
if (!base)
-   return pci_conf1_read(seg,bus,devfn,reg,len,value);
+   goto err;
 
spin_lock_irqsave(&pci_config_lock, flags);
 
@@ -105,9 +104,12 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
if ((bus > 255) || (devfn > 255) || (reg > 4095))
return -EINVAL;
 
+   if (reg < 256)
+   return pci_conf1_write(seg,bus,devfn,reg,len,value);
+
base = get_base_addr(seg, bus, devfn);
if (!base)
-   return pci_conf1_write(seg,bus,devfn,reg,len,value);
+   return -EINVAL;
 
spin_lock_irqsave(&pci_config_lock, flags);
 
@@ -134,12 +136,6 @@ static struct pci_raw_ops pci_mmcfg = {
.write =

Re: [PATCH 0/2] Relax restrictions on setting CONFIG_NUMA on x86

2008-01-28 Thread KOSAKI Motohiro

> here's a QuickStart:
> 
>http://redhat.com/~mingo/x86.git/README

Thanks!




Re: Pull request: DMA pool updates

2008-01-28 Thread Andrew Morton

On Mon, 28 Jan 2008 19:45:25 -0700 Matthew Wilcox <[EMAIL PROTECTED]> wrote:

> > afaik these patches have been tested by nobody except thyself?
> 
> I've tested them myself, then I sent them to the perf team who ran the
> (4 hour long) benchmark, and they reported success.  As with many patches
> these days, they sank into a pit of indifference.

I like to think that's because everyone is all fired up about bugfixes and
the regression reports.  heh.

It's a simple matter for me to add another git tree, which gets things a
bit more exposure.

>  Perhaps I need to
> take a leaf from my former government's book and sex up my patch
> descriptions a bit.

Well these two pulls came with effectively no description at all.  Put
yourself in Linus's position...


Re: [Patch v2] Make PCI extended config space (MMCONFIG) a driver opt-in

2008-01-28 Thread Matthew Wilcox

On Mon, Jan 28, 2008 at 02:53:34PM -0800, Greg KH wrote:
> Please send me patches, in a form that can be merged, along with a
> proper changelog entry, in the order in which you wish them to be
> applied, so I know exactly what changes you are referring to.

I'll send each patch as a reply to this email.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

RE: fixed a bug of adma in rhel4u5 with HDS7250SASUN500G.

2008-01-28 Thread Kuan Luo

robert wrote:
> Kuan Luo wrote:
> > Robert wrote:
> >> Kuan, does this patch (using the notifiers to see if the command is
> >> really done) still work if one port on the controller has ADMA disabled
> >> because it's in ATAPI mode? I seem to recall Allen Martin mentioning
> >> that notifiers wouldn't work in this case.
> >>
> > 
> > I just tried the 2.6.24-rc7 sata_nv driver with one hd and one cdrom in
> > the same controller. 
> > I mkfs hd and mounted the cdrom and no error happened.
> > 
> > Allen,  is there anything about notifier that we should pay attention
> > to?
> 
> Assuming not, then this patch should be applied..
> 
> 

I am asking someone about the issue and should have a concrete response soon.

Re: "Default Linux Capabilities" default in 2.6.24

2008-01-28 Thread James Morris

On Mon, 28 Jan 2008, Matt LaPlante wrote:

> On Thu, 24 Jan 2008 19:12:01 -0600
> Matt LaPlante <[EMAIL PROTECTED]> wrote:
> 
> > 
> > I'm doing a make oldconfig with the new 2.6.24 kernel.  I came to the 
> > prompt for "Default Linux Capabilities" which defaults to No:
> > 
> > ---
> >  Default Linux Capabilities (SECURITY_CAPABILITIES) [N/y/?] (NEW) ?
> > ---
> > 
> > However the help text recommends saying Yes.
> > 
> > ---
> >  This enables the "default" Linux capabilities functionality.
> >  If you are unsure how to answer this question, answer Y.
> > ---
> > 
> > Does this seem incongruous?  Also, what's the "question"? :)
> > 
> > Thanks, 
> > Matt LaPlante
> 
> Anyone?

I think this should be default y.


-- 
James Morris
<[EMAIL PROTECTED]>

Re: Pull request: DMA pool updates

2008-01-28 Thread Matthew Wilcox

On Mon, Jan 28, 2008 at 05:07:34PM -0800, Andrew Morton wrote:
> The usual form is, I believe,
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc.git dmapool
> 
> Otherwise people get all confused and think it's an empty tree (like I just
> did).

Sorry!

> There were no replies to v2 of the patch series.  It all looks reasonable
> from a quick scan (assuming the patches are unchanged since then).

I haven't changed them, correct.

> afaik these patches have been tested by nobody except thyself?

I've tested them myself, then I sent them to the perf team who ran the
(4 hour long) benchmark, and they reported success.  As with many patches
these days, they sank into a pit of indifference.  Perhaps I need to
take a leaf from my former government's book and sex up my patch
descriptions a bit.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

[PATCH] x86_64: fix overlap between pagetable with bss section

2008-01-28 Thread Yinghai Lu

[PATCH] x86_64: fix overlap between pagetable with bss section

An early crash on an 8-node, 256G machine:

Command line: console=uart8250,io,0x3f8,115200n8 
initrd=kernel.org/mydisk11_x86_64.gz rw root=/dev/ram0 debug initcall_debug 
apic=debug acpi.debug_level=0x000f pci=routeirq ip=dhcp load_ramdisk=1 
ramdisk_size=131072 BOOT_IMAGE=kernel.org/bzImage_2.6.25_k8.1
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009bc00 (usable)
 BIOS-e820: 0009bc00 - 000a (reserved)
 BIOS-e820: 000e6000 - 0010 (reserved)
 BIOS-e820: 0010 - dffe (usable)
 BIOS-e820: dffe - dffee000 (ACPI data)
 BIOS-e820: dffee000 - d050 (ACPI NVS)
 BIOS-e820: d050 - e000 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ff70 - 0001 (reserved)
 BIOS-e820: 0001 - 00402000 (usable)
Early serial console at I/O port 0x3f8 (options '115200n8')
console [uart0] enabled
end_pfn_map = 67239936
Kernel panic - not syncing: Duplicated early reservation d4-e42000

Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #3

Call Trace:
 [] lapic_get_maxlvt+0x0/0x10
 [] clear_local_APIC+0x5/0xcf
 [] disable_local_APIC+0x5/0x17
 [] smp_send_stop+0x46/0x4c
 [] panic+0x94/0x13e
 [] sctp_eps_proc_init+0x12/0x34
 [] reserve_early+0x30/0x6c
 [] init_memory_mapping+0x2cd/0x2dc
 [] setup_arch+0x21f/0x44e
 [] start_kernel+0x6f/0x2c7
 [] _sinittext+0x1cc/0x1d3

A later oops on another machine:

Calling initcall 0x80bc33ac: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 005f
IP: [] proc_register+0xe7/0x10f
PGD 0
Oops:  [1] SMP
CPU 7
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #1
RIP: 0010:[]  [] proc_register+0xe7/0x10f
RSP: :811074c55e60  EFLAGS: 00010246
RAX: 8d8d RBX: 811074d78d80 RCX: 811074c55e08
RDX:  RSI: 0141 RDI: 80cc2460
RBP:  R08:  R09: 811074d78d80
R10:  R11: 80b78750 R12: 811074c55e6c
R13:  R14: 811074c55ee0 R15: 0006eb27426e
FS:  () GS:811074cc7f00() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 005f CR3: 00201000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 1, threadinfo 811074c54000, task 810874c54000)
Stack:  80a57340 0141 811074d78d80 
 ff97 802bfef0  
  80bc3b41 811074c55ee0 80bc349b
Call Trace:
 [] ? create_proc_entry+0x73/0x8a
 [] ? sctp_snmp_proc_init+0x1c/0x34
 [] ? sctp_init+0xef/0x711
 [] ? kernel_init+0x175/0x2e1
 [] ? child_rip+0xa/0x12
 [] ? kernel_init+0x0/0x2e1
 [] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 
c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 
58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [] proc_register+0xe7/0x10f
 RSP 
CR2: 005f
---[ end trace c97bfb5810c69e0c ]---
Kernel panic - not syncing: Attempted to kill init!

It turns out the page tables overlap the bss section.

table_start needs to be rounded up to a page boundary.

Also make the panic message more informative.

Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]>

diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c
index f8b7beb..6f07bab 100644
--- a/arch/x86/kernel/e820_64.c
+++ b/arch/x86/kernel/e820_64.c
@@ -70,8 +70,8 @@ void __init reserve_early(unsigned long start, unsigned long end)
for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
r = &early_res[i];
if (end > r->start && start < r->end)
-   panic("Duplicated early reservation %lx-%lx\n",
- start, end);
+   panic("Overlap early reservation %lx-%lx to %lx-%lx\n",
+ start, end, r->start, r->end);
}
if (i >= MAX_EARLY_RES)
panic("Too many early reservations");
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b09faf2..bf02f7e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -358,6 +358,8 @@ static void __init find_early_table_space(unsigned long end)
if (table_start == -1UL)
panic("Cannot find space for the kernel page tables");
 
+   /* need to round it up to avoid overlap less one page */
+   table_start = round_up(table_start, PAGE_SIZE);
table_start >>= PAGE_SHIFT;
table_end = table_start;
 
--
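
The two changes in this patch can be illustrated in user space. The sketch below shows the half-open range overlap test that reserve_early() uses, and why shifting an unaligned table_start right by PAGE_SHIFT selects a page that begins below table_start. The helper names and the addresses in the usage note are hypothetical, chosen only for illustration; PAGE_SHIFT is assumed to be 12 as on x86-64.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12                      /* 4 KiB pages, as on x86-64 */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Round x up to the next multiple of a power-of-two alignment. */
unsigned long round_up_pow2(unsigned long x, unsigned long align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Same half-open range test as reserve_early(): [s1,e1) against [s2,e2). */
bool ranges_overlap(unsigned long s1, unsigned long e1,
                    unsigned long s2, unsigned long e2)
{
    return e1 > s2 && s1 < e2;
}

/* First page frame number used for page tables found at table_start. */
unsigned long first_table_pfn(unsigned long table_start, bool round)
{
    if (round)
        table_start = round_up_pow2(table_start, PAGE_SIZE);
    return table_start >> PAGE_SHIFT;
}
```

With a made-up table_start of 0xe41800, the unrounded shift yields pfn 0xe41, whose page starts at 0xe41000 and so reaches back into whatever ends at 0xe41800. Rounding up first yields pfn 0xe42, which starts cleanly at 0xe42000.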

Re: [PATCH] correct inconsistent ntp interval/tick_length usage

2008-01-28 Thread john stultz


On Fri, 2008-01-25 at 15:07 +0100, Roman Zippel wrote:
> Hi,
> 
> On Wed, 23 Jan 2008, john stultz wrote:
> 
> > This difference in calculation was causing the clocksource correction
> > code to apply a correction factor to the clocksource so the two
> > intervals were the same; however, this makes the actual frequency of
> > the clocksource incorrect. I believe this difference would
> > affect all clocksources, although to differing degrees depending on the
> > clocksource resolution.
> 
> Let's look at why the correction is done in the first place. The update steps 
> don't add up precisely to 1sec (LATCH*HZ != CLOCK_TICK_RATE), so a small 
> adjustment is used to make up for it. The problem here is that if the 
> update frequency changes, the adjustment isn't correct anymore.
> The simple fix is to just omit the adjustment in these cases in ntp.c:
> 
> #if NTP_INTERVAL_FREQ == HZ
> ...
> #else
> #define CLOCK_TICK_ADJUST 0
> #endif

Hmmm, although this doesn't explain why the issue is seen when
NTP_INTERVAL_FREQ == HZ (as it is in my system's case). Or am I missing
something?

Regardless, current_tick_length() really is the base interval we're
using in the error accumulation loop, so it seems the cleanest interface
to use (just to avoid redundancy at least) when establishing the
clocksource's interval length. Or do you not agree?

thanks
-john
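
For readers following the LATCH*HZ != CLOCK_TICK_RATE remark quoted above, the size of the mismatch can be worked out directly. This is a small sketch assuming the x86 PIT rate of 1193182 Hz for CLOCK_TICK_RATE and the usual round-to-nearest definition of LATCH. The function names are made up for illustration.

```c
/* PIT input clock in Hz; assumed value of CLOCK_TICK_RATE on x86. */
#define PIT_TICK_RATE 1193182LL

/* Timer reload value, rounded to nearest as the LATCH macro does. */
long long latch_for(long long hz)
{
    return (PIT_TICK_RATE + hz / 2) / hz;
}

/* Clock ticks gained (+) or lost (-) per second because LATCH is an integer. */
long long ticks_error_per_sec(long long hz)
{
    return latch_for(hz) * hz - PIT_TICK_RATE;
}

/* The same error in parts per billion of the nominal rate. */
long long error_ppb(long long hz)
{
    return ticks_error_per_sec(hz) * 1000000000LL / PIT_TICK_RATE;
}
```

For HZ=1000 this gives LATCH=1193, so one second of ticks covers 1193000 clock cycles instead of 1193182, a shortfall of 182 cycles or roughly 152 ppm. That residual is what the tick adjustment is meant to absorb.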







[PATCH] SELinux: Fix double free in selinux_netlbl_sock_setsid()

2008-01-28 Thread Paul Moore

As pointed out by Adrian Bunk, commit 45c950e0f839fded922ebc0bfd59b1081cc71b70
caused a double-free when security_netlbl_sid_to_secattr() fails.  This patch
fixes this by removing the netlbl_secattr_destroy() call from that function
since we are already releasing the secattr memory in
selinux_netlbl_sock_setsid().

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---

 security/selinux/ss/services.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 4bf715d..3a16aba 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2629,7 +2629,6 @@ int security_netlbl_sid_to_secattr(u32 sid, struct netlbl_lsm_secattr *secattr)
 
 netlbl_sid_to_secattr_failure:
POLICY_RDUNLOCK;
-   netlbl_secattr_destroy(secattr);
return rc;
 }
 #endif /* CONFIG_NETLABEL */
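
The ownership rule behind this fix is that only one side should release the secattr. A user-space sketch of the pattern, assuming a mock structure and hypothetical function names (none of these are the real SELinux or NetLabel symbols), counts how often the destroy routine runs with and without cleanup in the failing callee:

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct netlbl_lsm_secattr. */
struct secattr {
    char *domain;
    int destroy_calls;   /* instrumentation for this sketch only */
};

void secattr_destroy(struct secattr *sa)
{
    sa->destroy_calls++;
    free(sa->domain);
    sa->domain = NULL;   /* keeps a second call harmless in this sketch */
}

/* Callee that fails after allocating; callee_cleans selects the old behaviour. */
int sid_to_secattr(struct secattr *sa, int callee_cleans)
{
    sa->domain = malloc(16);
    if (callee_cleans)
        secattr_destroy(sa);      /* pre-patch: cleanup on the failure path */
    return -1;                    /* simulate the failure */
}

/* Caller that always destroys the secattr it owns. Returns destroy count. */
int sock_setsid(int callee_cleans)
{
    struct secattr sa = { NULL, 0 };
    int rc = sid_to_secattr(&sa, callee_cleans);

    (void)rc;
    secattr_destroy(&sa);
    return sa.destroy_calls;
}
```

With callee cleanup enabled the destroy routine runs twice for one object. The sketch stays safe only because it resets the pointer, which is a property of the mock and not something the source says about the real code.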


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett

On Monday 28 January 2008, Gene Heskett wrote:
>On Monday 28 January 2008, Robert Hancock wrote:
>[...]
>
>>Check the /etc/modprobe.conf file, a lot of distributions use this to
>>generate the initrd. If there's references to pata_amd it'll try and
>>include it.
>
>Bingo!  Thanks Robert, I'll try it again with that line commented.  I wasn't
>aware of that connection at all.  Yup, it worked, I feel a reboot coming
>on. :)

But it didn't work. Apparently commenting that line out needs to be balanced 
by adding another line telling it that amd74xx is the 'hostadapter', not 
necessarily scsi.

Can this be made more universal so I don't have to edit /etc/modprobe.conf?

Thanks.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Because we don't think about future generations, they will never forget us.
-- Henrik Tikkanen

Re: "Default Linux Capabilities" default in 2.6.24

2008-01-28 Thread Matt LaPlante

On Thu, 24 Jan 2008 19:12:01 -0600
Matt LaPlante <[EMAIL PROTECTED]> wrote:

> 
> I'm doing a make oldconfig with the new 2.6.24 kernel.  I came to the prompt 
> for "Default Linux Capabilities" which defaults to No:
> 
> ---
>  Default Linux Capabilities (SECURITY_CAPABILITIES) [N/y/?] (NEW) ?
> ---
> 
> However the help text recommends saying Yes.
> 
> ---
>  This enables the "default" Linux capabilities functionality.
>  If you are unsure how to answer this question, answer Y.
> ---
> 
> Does this seem incongruous?  Also, what's the "question"? :)
> 
> Thanks, 
> Matt LaPlante

Anyone?

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow

On Mon, 28 Jan 2008, Gene Heskett wrote:

> On Monday 28 January 2008, Daniel Barkalow wrote:
> >On Mon, 28 Jan 2008, Gene Heskett wrote:
> >> On Monday 28 January 2008, Daniel Barkalow wrote:
> >> >Building this and installing it along with the appropriate initrd (which
> >> >might be handled by Fedora's install scripts)
> >>
> >> Or mine, which I've been using for years.
> >
> >You're ahead of a surprising number of people, including me, if you
> >understand making initrds.
> 
> In my script, its one line:
> mkinitrd -f initrd-$VER.img $VER && \
> 
> where $VER is the shell variable I edit to = the version number, located at 
> the top of the script.
> 
> Unforch, its failing:
> No module pata_amd found for kernel 2.6.24, aborting.
> 
> This is with pata_amd turned off and its counterpart under ATA/RLL/etc turned 
> on.  So something is still dependent on it. 

That looks like something in the guts of the initrd; it probably thinks 
you need pata_amd and it's unhappy that you don't have it.

Actually, another thing to try is making the ATA/etc one be "y" and 
pata_amd be "m". Most likely, this should lead to the ATA one claiming the 
drive before the module is loaded (but the module would be loaded later, 
to avoid upsetting the initrd); you should be able to tell from dmesg (or 
/dev, for that matter) which one got it, and I think built-in drivers will 
claim everything they can before an initrd gets loaded.

> I do have one sata drive, on an accessory card in the box, so I need the 
> rest of the sata_sil and friends stuff. 

Assuming it isn't picking up your hard drive, which it isn't, that 
shouldn't matter.

-Daniel
*This .sig left intentionally blank*

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett

On Monday 28 January 2008, Robert Hancock wrote:
[...]
>Check the /etc/modprobe.conf file, a lot of distributions use this to
>generate the initrd. If there's references to pata_amd it'll try and
>include it.

Bingo!  Thanks Robert, I'll try it again with that line commented.  I wasn't 
aware of that connection at all.  Yup, it worked, I feel a reboot coming 
on. :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
If everything seems to be going well, you have obviously overlooked something.

Re: [CPUISOL] CPU isolation extensions

2008-01-28 Thread Daniel Walker

On Mon, 2008-01-28 at 16:12 -0800, Max Krasnyanskiy wrote:

> Not accurate enough and way too much overhead for what I need. I know at this
> point it probably sounds like I'm talking BS :). I wish I had released the
> engine and examples by now. Anyway let me just say that SW MAC has crazy
> tight deadlines with lots of small tasks. Using nanosleep() & gettimeofday()
> is simply not practical. So it's all TSC based with clever time sync logic
> between HW and SW.

I don't know if it's BS or not, you clearly fixed your own problem which
is good .. Although when you say "RT patches cannot achieve what I
needed. Even RTAI/Xenomai can't do that." , and HRT is "Not accurate
enough and way too much overhead" .. Given the hardware you're using,
that's all difficult to believe.. You also said this code has been
running on production systems for two years, which means it's at least
two years old .. There's been some good sized leaps in real time linux
in the past two years ..

Daniel


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett

On Monday 28 January 2008, Daniel Barkalow wrote:
>On Mon, 28 Jan 2008, Gene Heskett wrote:
>> On Monday 28 January 2008, Daniel Barkalow wrote:
>> >Building this and installing it along with the appropriate initrd (which
>> >might be handled by Fedora's install scripts)
>>
>> Or mine, which I've been using for years.
>
>You're ahead of a surprising number of people, including me, if you
>understand making initrds.

In my script, its one line:
mkinitrd -f initrd-$VER.img $VER && \

where $VER is the shell variable I edit to = the version number, located at 
the top of the script.

Unforch, its failing:
No module pata_amd found for kernel 2.6.24, aborting.

This is with pata_amd turned off and its counterpart under ATA/RLL/etc turned 
on.  So something is still dependent on it.  I do have one sata drive, on an 
accessory card in the box, so I need the rest of the sata_sil and friends 
stuff.  Its my virtual tapes for amanda.  Also home built, the amanda 
security model cannot be successfully bent into the shape of an rpm.  They 
BTW are #2 on coverity's list of most secure software.

So I've rebuilt 2.6.24 as it originally was, and added the acpi timer line to 
the 2.6.24-rc8 stanza's kernel argument list.  It will boot one or the other 
when I next reboot.  Its been about 8 hours since the last error was logged, 
which is totally weirdsville to this old fart.  Phase of the moon maybe?  The 
visit to the sawbones to see about my heart?  They are going to fit me with a 
30 day recorder tomorrow, my skip a beat problem is getting worse.  The sort 
of stuff that goes with the 7th decade I guess.  Officially, I'm wearing out 
me, too much sugar, too many times nearly electrocuted=shingles yadda 
yadda. :-)  Oh, and don't forget Arther, he moved in uninvited about 25 years 
ago too.  Those people that talk about the golden years?  They're full of 
excrement...

>> >will either get you back to
>> >old IDE or will make your kernel panic on boot, depending on whether you
>> >got it right (so make sure you can still boot the kernel you're sure of
>> > or something from a boot disk). This will also cause your hard drives to
>> > show up as different device nodes, so if your boot process doesn't mount
>> > by disk uuid but by some other feature (and I don't know what Fedora
>> > does), you'll also need to change it to something either stable across
>> > access methods or which works for the one you're now using.
>>
>> It mounts by LABEL=.  All of it.
>
>That'll save a huge amount of hassle. So long as you manage to get the
>right drivers included and the wrong drivers not included, you should be
>pretty much set.
>
>> Fedora is not the only people having trouble,  name a distro, its probably
>> someplace in that 14,800 hit google returns.
>
>Yeah, but they each may need different instructions, particularly if
>they're not mounting by label in general, or not mounting the root
>partition by label. That was the big hassle going the opposite direction.
>And the procedure is 4 lines to describe to somebody who knows how to
>build and install a new kernel for the distro, which is much shorter than
>the explanation of how you generally build and install a kernel. A real
>howto would have to explain where to get the distro's kernel sources and
>default configuration, for example.
>
>   -Daniel
>*This .sig left intentionally blank*

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Never drink from your finger bowl -- it contains only water.

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Robert Hancock


Gene Heskett wrote:
> On Monday 28 January 2008, Robert Hancock wrote:
>> Gene Heskett wrote:
>>> And so far no one has tried to comment on those 2 dmesg lines I've quoted
>>> a couple of times now, here's another:
>>> [0.00] Nvidia board detected. Ignoring ACPI timer override.
>>> [0.00] If you got timer trouble try acpi_use_timer_override
>>> what the heck is that trying to tell me to do, in some sort of broken
>>> english?
>>
>> A lot of NVIDIA-chipset motherboards have BIOS problems where they
>> include an incorrect ACPI interrupt override for the timer interrupt,
>> which tends to cause the system to fail to boot due to the timer
>> interrupt not working. The kernel normally ignores ACPI interrupt
>> overrides on the timer interrupt for NVIDIA chipsets for this reason.
>> Unfortunately on some such boards the override is actually correct and
>> needed, and so this actually causes problems. Hence the
>> acpi_use_timer_override option.
>>
>> In any case this is unlikely to have anything to do with your problem,
>> since if that was messed up you likely would never have even booted.
>
> In this case, there seems to be a buglet.  I turned on the nvidia/amd drives
> under the ATA section of the menu, and turned off the pata_amd under the sata
> menu in xconfig.
>
> But I've tried twice now and it fails to build the initrd because the pata_amd
> module is on the missing list.  Of course its missing, I didn't have it
> built...
>
> Next?

Check the /etc/modprobe.conf file, a lot of distributions use this to 
generate the initrd. If there's references to pata_amd it'll try and 
include it.


Re: [patch 1/6] mmu_notifier: Core code

2008-01-28 Thread Christoph Lameter

On Mon, 28 Jan 2008, Robin Holt wrote:

> USE_AFTER_FREE!!!  I made this same comment as well as other relevant
> comments last week.

Must have slipped somehow. Patch needs to be applied after the rcu fix.

Please repeat the other relevant comments if they are still relevant; I 
thought I had worked through them.



mmu_notifier_release: remove mmu_notifier struct from list before calling 
->release

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/mmu_notifier.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/mm/mmu_notifier.c
===
--- linux-2.6.orig/mm/mmu_notifier.c 2008-01-28 17:17:05.0 -0800
+++ linux-2.6/mm/mmu_notifier.c 2008-01-28 17:17:10.0 -0800
@@ -21,9 +21,9 @@ void mmu_notifier_release(struct mm_stru
rcu_read_lock();
hlist_for_each_entry_safe_rcu(mn, n, t,
  &mm->mmu_notifier.head, hlist) {
+   hlist_del_rcu(&mn->hlist);
if (mn->ops->release)
mn->ops->release(mn, mm);
-   hlist_del_rcu(&mn->hlist);
}
rcu_read_unlock();
synchronize_rcu();
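
The ordering change matters because a ->release callback may free the structure that holds the list linkage. A simplified user-space sketch of the same idea follows, using a plain singly linked list instead of the RCU hlist. All names here are hypothetical, and the sketch makes no attempt to model RCU.

```c
#include <stddef.h>
#include <stdlib.h>

struct notifier {
    struct notifier *next;
    void (*release)(struct notifier *n);
};

int released_count;  /* instrumentation for this sketch only */

/* A release callback that frees its notifier, which a callee may well do. */
void release_and_free(struct notifier *n)
{
    released_count++;
    free(n);
}

/* Push a new heap-allocated notifier onto the list and return the new head. */
struct notifier *push(struct notifier *head)
{
    struct notifier *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->next = head;
    n->release = release_and_free;
    return n;
}

/* Tear down the list: unlink each node before calling ->release. */
struct notifier *release_all(struct notifier *head)
{
    while (head) {
        struct notifier *n = head;

        head = n->next;          /* unlink first: n may be freed below */
        n->next = NULL;
        if (n->release)
            n->release(n);
        /* n must not be dereferenced after this point */
    }
    return head;
}
```

Had the unlink step come after the callback, it would read n->next from memory the callback had already freed, which is the use-after-free pointed out in the quoted review.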

Re: [PATCH 05/18] MMC: OMAP: Introduce new multislot structure and change driver to use it

2008-01-28 Thread Roel Kluin

Carlos Aguiar wrote:
> From: Juha Yrjola <[EMAIL PROTECTED]>
> 
> Introduce new MMC multislot structure and change driver to use it.

> diff --git a/drivers/mmc/host/omap.c b/drivers/mmc/host/omap.c

It could be that I misunderstand, but...

> @@ -897,19 +1037,106 @@ static const struct mmc_host_ops mmc_omap_ops = {
>   .get_ro = mmc_omap_get_ro,
>  };
>  
> -static int __init mmc_omap_probe(struct platform_device *pdev)
> +static int __init mmc_omap_new_slot(struct mmc_omap_host *host, int id)
>  {
> - struct omap_mmc_conf *minfo = pdev->dev.platform_data;
> + struct mmc_omap_slot *slot = NULL;
>   struct mmc_host *mmc;
> + int r;
> +
> + mmc = mmc_alloc_host(sizeof(struct mmc_omap_slot), host->dev);

since you mmc_alloc_host here...

> + if (mmc == NULL)
> + return -ENOMEM;
> +
> + slot = mmc_priv(mmc);
> + slot->host = host;
> + slot->mmc = mmc;
> + slot->id = id;
> + slot->pdata = &host->pdata->slots[id];
> +
> + host->slots[id] = slot;
> +
> + mmc->caps = MMC_CAP_MULTIWRITE | MMC_CAP_MMC_HIGHSPEED |
> + MMC_CAP_SD_HIGHSPEED;
> + if (host->pdata->conf.wire4)
> + mmc->caps |= MMC_CAP_4_BIT_DATA;
> +
> + mmc->ops = &mmc_omap_ops;
> + mmc->f_min = 40;
> +
> + if (cpu_class_is_omap2())
> + mmc->f_max = 4800;
> + else
> + mmc->f_max = 2400;
> + if (host->pdata->max_freq)
> + mmc->f_max = min(host->pdata->max_freq, mmc->f_max);
> + mmc->ocr_avail = slot->pdata->ocr_mask;
> +
> + /* Use scatterlist DMA to reduce per-transfer costs.
> +  * NOTE max_seg_size assumption that small blocks aren't
> +  * normally used (except e.g. for reading SD registers).
> +  */
> + mmc->max_phys_segs = 32;
> + mmc->max_hw_segs = 32;
> + mmc->max_blk_size = 2048;   /* BLEN is 11 bits (+1) */
> + mmc->max_blk_count = 2048;  /* NBLK is 11 bits (+1) */
> + mmc->max_req_size = mmc->max_blk_size * mmc->max_blk_count;
> + mmc->max_seg_size = mmc->max_req_size;
> +
> + r = mmc_add_host(mmc);
> + if (r < 0)
> + return r;

shouldn't you mmc_free_host(mmc) here?

> +
> + if (slot->pdata->name != NULL) {
> + r = device_create_file(&mmc->class_dev,
> + &dev_attr_slot_name);
> + if (r < 0)
> + goto err_remove_host;
> + }
> +
> + if (slot->pdata->get_ro != NULL) {
> + r = device_create_file(&mmc->class_dev,
> + &dev_attr_ro);
> + }
> +
> + return 0;
> +
> +err_remove_slot_name:

This label has no goto, so the 2 lines below are dead code...
Ok, now I see it's in the next patch. (maybe better to put these lines
in that patch?)

> + if (slot->pdata->name != NULL)
> + device_remove_file(&mmc->class_dev, &dev_attr_ro);
> +err_remove_host:
> + mmc_remove_host(mmc);

and maybe mmc_free_host(mmc) here?

> + return r;
> +}

from mmc_omap_probe()

+   for (i = 0; i < pdata->nr_slots; i++) {
+   ret = mmc_omap_new_slot(host, i);
+   if (ret < 0) {
+   while (--i >= 0)
+   mmc_omap_remove_slot(host->slots[i]);

mmc_omap_remove_slot() does a mmc_free_host(), but note the decrement of i:
the current slot isn't freed.
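
The unwinding rule being asked for here can be shown in a small user-space sketch: a constructor that fails part-way frees what it allocated itself, and the caller's while (--i >= 0) loop then only has to undo the slots that were fully created. The structure and function names below are hypothetical stand-ins, not the OMAP driver's code, and a counter tracks leaks.

```c
#include <stdlib.h>

#define NR_SLOTS 4

struct slot { int id; };

int live_slots;   /* instrumentation: slots allocated and not yet freed */

/* Create slot `id`; fail_at simulates a registration failure for one id. */
int new_slot(struct slot **slots, int id, int fail_at)
{
    struct slot *s = malloc(sizeof(*s));

    if (!s)
        return -1;
    live_slots++;
    s->id = id;

    if (id == fail_at) {          /* stands in for mmc_add_host() failing */
        free(s);                  /* free what this call allocated */
        live_slots--;
        return -1;
    }
    slots[id] = s;
    return 0;
}

void remove_slot(struct slot **slots, int id)
{
    free(slots[id]);
    slots[id] = NULL;
    live_slots--;
}

/* Probe: on failure, unwind only the slots that were fully created. */
int probe(struct slot **slots, int nr, int fail_at)
{
    for (int i = 0; i < nr; i++) {
        if (new_slot(slots, i, fail_at) < 0) {
            while (--i >= 0)
                remove_slot(slots, i);
            return -1;
        }
    }
    return 0;
}
```

If new_slot() skipped its own free on the failure path, the pre-decrement in the caller's loop would step past the failing index and that allocation would leak.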

RE: [PATCH] Force enable HPET on (some?) ICH9 boards

2008-01-28 Thread Pallipadi, Venkatesh


Patch looks good.
If the BIOS does not report the HPET on more such systems, we may have to add
other chipsets in the ICH9 family (ICH9_8, ...) as well.

Acked-by: Venkatesh Pallipadi <[EMAIL PROTECTED]> 


>-Original Message-
>From: Alistair John Strachan [mailto:[EMAIL PROTECTED] 
>Sent: Sunday, January 27, 2008 6:33 AM
>To: Pallipadi, Venkatesh
>Cc: Ingo Molnar; Linux Kernel Mailing List; Alistair John Strachan
>Subject: [PATCH] Force enable HPET on (some?) ICH9 boards
>
>Some consumer ICH9 boards (such as the Abit IP35 Pro) do not provide a BIOS
>option for enabling the HPET. The same ICH workaround used for 6,7,8 can be
>applied to 9. Here I enable the only PCI id that was visible on my system.
>
>I have confirmed the HPETs work both from userspace and as a clocksource for
>the running kernel (2.6.24 here) after applying this patch.
>
>Force enabled HPET at base address 0xfed0
>hpet clockevent registered
>hpet0: at MMIO 0xfed0, IRQs 2, 8, 0, 0
>hpet0: 4 64-bit timers, 14318180 Hz
>
>Signed-off-by: Alistair John Strachan <[EMAIL PROTECTED]>
>
>---
> arch/x86/kernel/quirks.c |2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
>diff --git a/arch/x86/kernel/quirks.c b/arch/x86/kernel/quirks.c
>index fab30e1..150ba29 100644
>--- a/arch/x86/kernel/quirks.c
>+++ b/arch/x86/kernel/quirks.c
>@@ -162,6 +162,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_31,
>ich_force_enable_hpet);
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH8_1,
>ich_force_enable_hpet);
>+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH9_7,
>+   ich_force_enable_hpet);
> 
> 
> static struct pci_dev *cached_dev;
>
>-- 
>Cheers,
>Alistair.
>
>137/1 Warrender Park Road, Edinburgh, UK.
>

Re: Udev coldplugging loads 8139too driver instead of 8139cp

2008-01-28 Thread Stephen Hemminger

On Tue, 29 Jan 2008 03:46:08 +0300
Michael Tokarev <[EMAIL PROTECTED]> wrote:

> Frederik Himpe wrote:
> > Linux 2.6.24 kernel gives the following messages when udev coldplugging
> > loads the driver for my NIC:
> > 
> > 8139too :00:0b.0: This (id 10ec:8139 rev 20) is an enhanced 8139C+ chip
> > 8139too :00:0b.0: Use the "8139cp" driver for improved performance and 
> > stability.
> 
> There are 2 drivers for 8139-based NICs, for two really different kinds
> of hardware which both use the same PCI identifiers.  Both drivers
> "claim" to work with all NICs with those PCI ids, because "externally"
> (by means of udev for example) it's impossible to distinguish the two
> kinds of hardware; it becomes clear only when the driver (either of the
> two) loads and actually checks which hardware we have here.

Is there any chance of using subdevice or subversion to tell them apart?
That worked for other vendors like D-Link, who slapped the same ID on
different cards.

> Udev in fact loads both - 8139cp and 8139too.  The difference is the ORDER
> in which it loads them - if for "cp-handled" hardware it first loads "too",
> too will complain as above and will NOT claim the device.  The same is
> true for the opposite.
> 
> So - in short - things have always been this way (thanks to Realtek).
> I've seen similar (but opposite) effects on my systems, which should all
> be serviced by the 8139too driver but had 8139cp loaded first - up
> till I gave up and just disabled 8139cp...
> 
> I don't know what happened in 2.6.24, but my guess is that since 8139too-based
> hw is now a lot more common, the two drivers are listed in the opposite
> order.
> 
> In short: NotABug, or ComplainToRealtek (but that's way too late and
> will not help anyway) ;)
> 
> /mjt
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Stephen Hemminger <[EMAIL PROTECTED]>

Re: Pull request: DMA pool updates

2008-01-28 Thread Andrew Morton

On Mon, 28 Jan 2008 17:11:47 -0700
Matthew Wilcox <[EMAIL PROTECTED]> wrote:

> 
> G'day Linus, mate
> 
> Could you pull the dmapool branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc.git please?

The usual form is, I believe,

git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc.git dmapool

Otherwise people get all confused and think it's an empty tree (like I just
did).

> All the patches have been posted to linux-kernel before, and various
> comments (and acks) have been taken into account.
> 
> It's a fairly nice performance improvement, so would be good to get in.
> It's survived a few hours of *mumble* high-stress database benchmark,
> so I have high confidence in its stability.

Could we please at least have a shortlog so we can find out what the patch
titles are so we can google them so we can find out what the heck you're
proposing we add to the kernel?

It shouldn't be this hard!

There were no replies to v2 of the patch series.  It all looks reasonable
from a quick scan (assuming the patches are unchanged since then).

afaik these patches have been tested by nobody except thyself?

Re: 2.6.24-rt1: timing problems (was [git pull] x86/hrtimer/acpi fixes)

2008-01-28 Thread Fernando Lopez-Lezcano

On Mon, 2008-01-28 at 10:26 -0800, Fernando Lopez-Lezcano wrote:
> On Sun, 2008-01-27 at 05:46 +0100, Mike Galbraith wrote:
> > On Sat, 2008-01-26 at 17:59 -0800, Fernando Lopez-Lezcano wrote:
> > 
> > > Hi Ingo... back to testing. 
> > > History:
> > > 
> > > 2.6.23.x + rt has not been very usable for audio applications. 
> > > 2.6.24-rt1: same so far. 
> > > 
> > > Why: Jack keeps printing "delayed..." messages and has xruns which means
> > > that somehow the timing is delayed more than what jack would think
> > > reasonable. As in the case with an old timing bug, the problem
> > > disappears when booting the kernel with idle=poll. Other users of Planet
> > > CCRMA are able to replicate the behavior, which goes away with idle=poll
> > > or booting the machine with only one core. As a workaround I have been
> > > packaging 2.6.22.x but now I'm not able to use that as the old rt14
> > > patch, suitably tweaked results in a non working kernel. 
> > > 
> > > So it looks like, again, timing is getting skewed when the jack process
> > > jumps between cpus and thus jack sees timing jumps that are just not
> > > happening. 
> > > 
> > > This is with a build based on 2.6.24 using as a base the latest Fedora
> > > rawhide source package plus 2.6.24-rt1. 
> > 
> > Do you have a simple testcase?  (one which doesn't entail installing
> > ccrma and becoming an audiophile)
> 
> No, I don't at this point. 
> I'll see if I can cook something simple today... (naively thinking that
> some short C code could test for the clock being actually monotonic
> across cpus). 

Sorry, no luck so far in writing something simple that will fail. I
tried testing for the results from repeated calls to clock_gettime (what
jack uses for timing by default) to actually be monotonic, while a
script uses taskset to force a cpu switch and of course got no errors. 
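
For reference, the check amounts to something like the sketch below. This is
only an illustration, not the exact program: the function names are made up
here, CLOCK_MONOTONIC is an assumption (the text above only says
clock_gettime), and the cpu switching is left to an external taskset script
as described.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Returns 1 if timestamp b is strictly earlier than timestamp a. */
static int ts_went_backwards(const struct timespec *a,
                             const struct timespec *b)
{
    if (b->tv_sec != a->tv_sec)
        return b->tv_sec < a->tv_sec;
    return b->tv_nsec < a->tv_nsec;
}

/* Count how many consecutive pairs in samples[] step backwards in time. */
static size_t count_backward_steps(const struct timespec *samples, size_t n)
{
    size_t i, bad = 0;

    assert(samples != NULL || n == 0);
    for (i = 1; i < n; i++)
        if (ts_went_backwards(&samples[i - 1], &samples[i]))
            bad++;
    return bad;
}

/*
 * Read the clock n times back to back into buf and report how many
 * backward steps were seen, or (size_t)-1 if clock_gettime() failed.
 * CLOCK_MONOTONIC is an assumption of this sketch.
 */
static size_t sample_monotonic(struct timespec *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (clock_gettime(CLOCK_MONOTONIC, &buf[i]) != 0)
            return (size_t)-1;
    return count_backward_steps(buf, n);
}
```

Calling sample_monotonic() in a loop while the taskset script moves the
process between cpus would flag any backward step with a non-zero return.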

2.6.24-rt1 with idle=poll works fine, without it I get multiple problems
with the jack internal timing, or at least that is what it seems to me from
the symptoms. 

-- Fernando



Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett

On Monday 28 January 2008, Robert Hancock wrote:
>Gene Heskett wrote:
>> And so far no one has tried to comment on those 2 dmesg lines I've quoted
>> a couple of times now, here's another:
>> [0.00] Nvidia board detected. Ignoring ACPI timer override.
>> [0.00] If you got timer trouble try acpi_use_timer_override
>> what the heck is that trying to tell me to do, in some sort of broken
>> english?
>
>A lot of NVIDIA-chipset motherboards have BIOS problems where they
>include an incorrect ACPI interrupt override for the timer interrupt,
>which tends to cause the system to fail to boot due to the timer
>interrupt not working. The kernel normally ignores ACPI interrupt
>overrides on the timer interrupt for NVIDIA chipsets for this reason.
>Unfortunately on some such boards the override is actually correct and
>needed, and so this actually causes problems. Hence the
>acpi_use_timer_override option.
>
>In any case this is unlikely to have anything to do with your problem,
>since if that was messed up you likely would never have even booted.

In this case, there seems to be a buglet.  I turned on the nvidia/amd drives 
under the ATA section of the menu, and turned off the pata_amd under the sata 
menu in xconfig.

But I've tried twice now and it fails to build the initrd because the pata_amd 
module is on the missing list.  Of course it's missing, I didn't have it 
built...

Next?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Of course it's possible to love a human being if you don't know them too well.
-- Charles Bukowski

Re: Udev coldplugging loads 8139too driver instead of 8139cp

2008-01-28 Thread Michael Tokarev

Frederik Himpe wrote:
> Linux 2.6.24 kernel gives the following messages when udev coldplugging
> loads the driver for my NIC:
> 
> 8139too :00:0b.0: This (id 10ec:8139 rev 20) is an enhanced 8139C+ chip
> 8139too :00:0b.0: Use the "8139cp" driver for improved performance and 
> stability.

There are two drivers for 8139-based NICs, for two really different kinds
of hardware which both use the same PCI identifiers.  Both drivers
"claim" to work with all NICs with those PCI ids, because "externally"
(by means of udev, for example) it's impossible to distinguish the two
kinds of hardware; it becomes clear only when the driver (either of the
two) loads and actually checks which hardware we have here.

Udev in fact loads both - 8139cp and 8139too.  The difference is the ORDER
in which it loads them - if for "cp-handled" hardware it first loads "too",
too will complain as above and will NOT claim the device.  The same is
true for the opposite.
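
To make the above concrete, here is a rough userspace sketch of the decision
each driver's probe ends up making. It is an illustration under assumptions,
not code from either driver: the struct and function names are invented, and
the 0x20 revision threshold is inferred from the "rev 20" log line quoted
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTL_VENDOR_ID 0x10ec /* from the log line above */
#define RTL_DEVICE_ID 0x8139 /* from the log line above */
#define RTL_CPLUS_REV 0x20   /* assumption: rev >= 0x20 means 8139C+ */

enum rtl_driver { DRV_NONE, DRV_8139TOO, DRV_8139CP };

/* Hypothetical, trimmed-down view of what a probe routine can see. */
struct pci_ident {
    uint16_t vendor;
    uint16_t device;
    uint8_t  revision;
};

/* Which driver should end up owning the device, whatever the load order. */
static enum rtl_driver rtl_pick_driver(const struct pci_ident *id)
{
    assert(id != NULL);
    if (id->vendor != RTL_VENDOR_ID || id->device != RTL_DEVICE_ID)
        return DRV_NONE;
    return id->revision >= RTL_CPLUS_REV ? DRV_8139CP : DRV_8139TOO;
}

/* What one driver's probe does: claim the device (1) or decline it (0). */
static int rtl_probe_claims(enum rtl_driver self, const struct pci_ident *id)
{
    return rtl_pick_driver(id) == self;
}
```

Both drivers get probed for the same vendor/device pair; whichever is loaded
first simply declines when the revision is not its own, and the device stays
free for the other one.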

So - in short - things have always been this way (thanks to Realtek).
I've seen similar (but opposite) effects on my systems, which should all
be serviced by the 8139too driver but had 8139cp loaded first - up
till I gave up and just disabled 8139cp...

I don't know what happened in 2.6.24, but my guess is that since 8139too-based
hw is now a lot more common, the two drivers are listed in the opposite
order.

In short: NotABug, or ComplainToRealtek (but that's way too late and
will not help anyway) ;)

/mjt


RE: [PATCH 2.6.24] pci_ids: patch for Intel ICH10 DeviceID's

2008-01-28 Thread Gaston, Jason D

>-----Original Message-----
>From: Grant Grundler [mailto:[EMAIL PROTECTED]
>Sent: Monday, January 28, 2008 4:22 PM
>To: Gaston, Jason D
>Cc: [EMAIL PROTECTED]; linux-kernel@vger.kernel.org; linux-
>[EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Subject: Re: [PATCH 2.6.24] pci_ids: patch for Intel ICH10 DeviceID's
>
>On Mon, Jan 28, 2008 at 05:20:36PM -0800, Jason Gaston wrote:
>> This patch adds the Intel ICH10 LPC and SMBus Controller DeviceID's.
>
>Jason,
>two questions:
>Have you submitted these to pciids.sf.net?
>Where are these used in the kernel?
>
>thanks,
>grant
>
>>
>> Signed-off-by:  Jason Gaston <[EMAIL PROTECTED]>
>>
>> --- linux-2.6.24/include/linux/pci_ids.h.orig2008-01-24
>14:58:37.0 -0800
>> +++ linux-2.6.24/include/linux/pci_ids.h 2008-01-28
>15:05:41.0 -0800
>> @@ -2339,6 +2339,12 @@
>>  #define PCI_DEVICE_ID_INTEL_MCH_PC1 0x359a
>>  #define PCI_DEVICE_ID_INTEL_E7525_MCH   0x359e
>>  #define PCI_DEVICE_ID_INTEL_IOAT_CNB0x360b
>> +#define PCI_DEVICE_ID_INTEL_ICH10_0 0x3a14
>> +#define PCI_DEVICE_ID_INTEL_ICH10_1 0x3a16
>> +#define PCI_DEVICE_ID_INTEL_ICH10_2 0x3a18
>> +#define PCI_DEVICE_ID_INTEL_ICH10_3 0x3a1a
>> +#define PCI_DEVICE_ID_INTEL_ICH10_4 0x3a30
>> +#define PCI_DEVICE_ID_INTEL_ICH10_5 0x3a60
>>  #define PCI_DEVICE_ID_INTEL_IOAT_SNB0x402f
>>  #define PCI_DEVICE_ID_INTEL_IOAT_SCNB   0x65ff
>>  #define PCI_DEVICE_ID_INTEL_TOLAPAI_0   0x5031

Yes, these have been submitted to pciids.sf.net.  I am also in the
process of updating the pci.ids strings by adding "ICH10" to them.  

These pci_ids.h entries are used in the drivers/i2c/busses/i2c-i801.c
driver and the /arch/x86/pci/irq.c file.

Thank you,

Jason

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow

On Mon, 28 Jan 2008, Gene Heskett wrote:

> On Monday 28 January 2008, Daniel Barkalow wrote:
> >Building this and installing it along with the appropriate initrd (which
> >might be handled by Fedora's install scripts)
> 
> Or mine, which I've been using for years.

You're ahead of a surprising number of people, including me, if you 
understand making initrds.

> >will either get you back to 
> >old IDE or will make your kernel panic on boot, depending on whether you
> >got it right (so make sure you can still boot the kernel you're sure of or
> >something from a boot disk). This will also cause your hard drives to show
> >up as different device nodes, so if your boot process doesn't mount by
> >disk uuid but by some other feature (and I don't know what Fedora does),
> >you'll also need to change it to something either stable across access
> >methods or which works for the one you're now using.
> 
> It mounts by LABEL=.  All of it.

That'll save a huge amount of hassle. So long as you manage to get the 
right drivers included and the wrong drivers not included, you should be 
pretty much set.

> Fedora is not the only one having trouble; name a distro, it's probably 
> someplace in those 14,800 hits Google returns.

Yeah, but they each may need different instructions, particularly if 
they're not mounting by label in general, or not mounting the root 
partition by label. That was the big hassle going the opposite direction. 
And the procedure is 4 lines to describe to somebody who knows how to 
build and install a new kernel for the distro, which is much shorter than 
the explanation of how you generally build and install a kernel. A real 
howto would have to explain where to get the distro's kernel sources and 
default configuration, for example.

-Daniel
*This .sig left intentionally blank*

[PATCH 2.6.24] ata_piix: IDE mode SATA patch for Intel ICH10 DeviceID's

2008-01-28 Thread Jason Gaston

This patch adds the Intel ICH10 IDE mode SATA Controller DeviceID's.

Signed-off-by:  Jason Gaston <[EMAIL PROTECTED]>

--- linux-2.6.24/drivers/ata/ata_piix.c.orig 2008-01-24 14:58:37.0 -0800
+++ linux-2.6.24/drivers/ata/ata_piix.c 2008-01-28 14:58:22.0 -0800
@@ -263,6 +263,14 @@
{ 0x8086, 0x292e, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_ahci },
/* SATA Controller IDE (Tolapai) */
{ 0x8086, 0x5028, PCI_ANY_ID, PCI_ANY_ID, 0, 0, tolapai_sata_ahci },
+   /* SATA Controller IDE (ICH10) */
+   { 0x8086, 0x3a00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_ahci },
+   /* SATA Controller IDE (ICH10) */
+   { 0x8086, 0x3a06, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+   /* SATA Controller IDE (ICH10) */
+   { 0x8086, 0x3a20, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata_ahci },
+   /* SATA Controller IDE (ICH10) */
+   { 0x8086, 0x3a26, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
 
{ } /* terminate list */
 };
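
As an aside for readers less familiar with these ID tables, a simplified
userspace model of how such a table is walked is sketched below. The struct
layout, the ANY_ID constant and lookup_board() are hypothetical stand-ins
rather than the kernel's struct pci_device_id and PCI core matching; only the
four device IDs and their board types are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu /* hypothetical stand-in for PCI_ANY_ID */

enum board { BOARD_NONE, BOARD_ICH8_SATA_AHCI, BOARD_ICH8_2PORT_SATA };

/* Simplified table entry: IDs to match plus the board type to use. */
struct id_entry {
    uint32_t vendor, device;
    uint32_t subvendor, subdevice;
    enum board board;
};

/* The four ICH10 entries added by the patch, in the same order. */
static const struct id_entry ich10_ids[] = {
    { 0x8086, 0x3a00, ANY_ID, ANY_ID, BOARD_ICH8_SATA_AHCI },
    { 0x8086, 0x3a06, ANY_ID, ANY_ID, BOARD_ICH8_2PORT_SATA },
    { 0x8086, 0x3a20, ANY_ID, ANY_ID, BOARD_ICH8_SATA_AHCI },
    { 0x8086, 0x3a26, ANY_ID, ANY_ID, BOARD_ICH8_2PORT_SATA },
    { 0 } /* terminate list */
};

/* A table field matches if it is a wildcard or equals the device's value. */
static int field_matches(uint32_t want, uint32_t have)
{
    return want == ANY_ID || want == have;
}

/* Walk the table up to the all-zero terminator; first match wins. */
static enum board lookup_board(const struct id_entry *tbl,
                               uint32_t vendor, uint32_t device,
                               uint32_t subvendor, uint32_t subdevice)
{
    assert(tbl != NULL);
    for (; tbl->vendor != 0; tbl++) {
        if (field_matches(tbl->vendor, vendor) &&
            field_matches(tbl->device, device) &&
            field_matches(tbl->subvendor, subvendor) &&
            field_matches(tbl->subdevice, subdevice))
            return tbl->board;
    }
    return BOARD_NONE;
}
```

Because the subvendor and subdevice fields are wildcards, any board built
around one of these four controllers matches regardless of who made it.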

[PATCH 2 of 4] x86: revert "defer cr3 reload when doing pud_clear()"

2008-01-28 Thread Jeremy Fitzhardinge

Revert "defer cr3 reload when doing pud_clear()" since I'm going to
replace it.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgalloc_32.h |7 ---
 include/asm-x86/pgtable-3level.h |   21 ++---
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -74,13 +74,6 @@ static inline void pmd_free(pmd_t *pmd)
 
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
-   /* This is called just after the pmd has been detached from
-  the pgd, which requires a full tlb flush to be recognized
-  by the CPU.  Rather than incurring multiple tlb flushes
-  while the address space is being pulled down, make the tlb
-  gathering machinery do a full flush when we're done. */
-   tlb->fullmm = 1;
-
paravirt_release_pd(__pa(pmd) >> PAGE_SHIFT);
tlb_remove_page(tlb, virt_to_page(pmd));
 }
diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -96,23 +96,14 @@ static inline void pud_clear(pud_t *pudp
set_pud(pudp, __pud(0));
 
/*
-* In principle we need to do a cr3 reload here to make sure
-* the processor recognizes the changed pgd.  In practice, all
-* the places where pud_clear() gets called are followed by
-* full tlb flushes anyway, so we can defer the cost here.
+* Pentium-II erratum A13: in PAE mode we explicitly have to flush
+* the TLB via cr3 if the top-level pgd is changed...
 *
-* Specifically:
-*
-* mm/memory.c:free_pmd_range() - immediately after the
-* pud_clear() it does a pmd_free_tlb().  We change the
-* mmu_gather structure to do a full tlb flush (which has the
-* effect of reloading cr3) when the pagetable free is
-* complete.
-*
-* arch/x86/mm/hugetlbpage.c:huge_pmd_unshare() - the call to
-* this is followed by a flush_tlb_range, which on x86 does a
-* full tlb flush.
+* XXX I don't think we need to worry about this here, since
+* when clearing the pud, the calling code needs to flush the
+* tlb anyway.  But do it now for safety's sake. - jsgf
 */
+   write_cr3(read_cr3());
 }
 
 #define pud_page(pud) \



[PATCH 1 of 4] x86: unify PAE/non-PAE pgd_ctor

2008-01-28 Thread Jeremy Fitzhardinge

The constructors for PAE and non-PAE pgd_ctors are more or less
identical, and can be made into the same function.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
Cc: William Irwin <[EMAIL PROTECTED]>

---
 arch/x86/mm/pgtable_32.c |   58 +-
 1 file changed, 22 insertions(+), 36 deletions(-)

diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c
--- a/arch/x86/mm/pgtable_32.c
+++ b/arch/x86/mm/pgtable_32.c
@@ -219,50 +219,39 @@ static inline void pgd_list_del(pgd_t *p
list_del(&page->lru);
 }
 
+#define UNSHARED_PTRS_PER_PGD  \
+   (SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
 
-
-#if (PTRS_PER_PMD == 1)
-/* Non-PAE pgd constructor */
-static void pgd_ctor(void *pgd)
+static void pgd_ctor(void *p)
 {
+   pgd_t *pgd = p;
unsigned long flags;
 
-   /* !PAE, no pagetable sharing */
+   /* Clear usermode parts of PGD */
memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
 
spin_lock_irqsave(&pgd_lock, flags);
 
-   /* must happen under lock */
-   clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
-   swapper_pg_dir + USER_PTRS_PER_PGD,
-   KERNEL_PGD_PTRS);
-   paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
-   __pa(swapper_pg_dir) >> PAGE_SHIFT,
-   USER_PTRS_PER_PGD,
+   /* If the pgd points to a shared pagetable level (either the
+  ptes in non-PAE, or shared PMD in PAE), then just copy the
+  references from swapper_pg_dir. */
+   if (PAGETABLE_LEVELS == 2 ||
+   (PAGETABLE_LEVELS == 3 && SHARED_KERNEL_PMD)) {
+   clone_pgd_range(pgd + USER_PTRS_PER_PGD,
+   swapper_pg_dir + USER_PTRS_PER_PGD,
KERNEL_PGD_PTRS);
-   pgd_list_add(pgd);
+   paravirt_alloc_pd_clone(__pa(pgd) >> PAGE_SHIFT,
+   __pa(swapper_pg_dir) >> PAGE_SHIFT,
+   USER_PTRS_PER_PGD,
+   KERNEL_PGD_PTRS);
+   }
+
+   /* list required to sync kernel mapping updates */
+   if (!SHARED_KERNEL_PMD)
+   pgd_list_add(pgd);
+
spin_unlock_irqrestore(&pgd_lock, flags);
 }
-#else  /* PTRS_PER_PMD > 1 */
-/* PAE pgd constructor */
-static void pgd_ctor(void *pgd)
-{
-   /* PAE, kernel PMD may be shared */
-
-   if (SHARED_KERNEL_PMD) {
-   clone_pgd_range((pgd_t *)pgd + USER_PTRS_PER_PGD,
-   swapper_pg_dir + USER_PTRS_PER_PGD,
-   KERNEL_PGD_PTRS);
-   } else {
-   unsigned long flags;
-
-   memset(pgd, 0, USER_PTRS_PER_PGD*sizeof(pgd_t));
-   spin_lock_irqsave(&pgd_lock, flags);
-   pgd_list_add(pgd);
-   spin_unlock_irqrestore(&pgd_lock, flags);
-   }
-}
-#endif /* PTRS_PER_PMD */
 
 static void pgd_dtor(void *pgd)
 {
@@ -275,9 +264,6 @@ static void pgd_dtor(void *pgd)
pgd_list_del(pgd);
spin_unlock_irqrestore(&pgd_lock, flags);
 }
-
-#define UNSHARED_PTRS_PER_PGD  \
-   (SHARED_KERNEL_PMD ? USER_PTRS_PER_PGD : PTRS_PER_PGD)
 
 #ifdef CONFIG_X86_PAE
 /*
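
Not part of the patch, but the decisions the unified constructor makes can be
summarised in a small standalone sketch. The struct and function names below
are made up for illustration, and the parameters stand in for the kernel's
PAGETABLE_LEVELS and SHARED_KERNEL_PMD; it assumes SHARED_KERNEL_PMD is 0 on
non-PAE builds.

```c
#include <assert.h>

/* What the unified pgd_ctor() would do for a given configuration. */
struct pgd_ctor_plan {
    int clone_kernel_entries; /* copy references from swapper_pg_dir */
    int add_to_pgd_list;      /* track pgd to sync kernel mapping updates */
};

/*
 * pagetable_levels: 2 for non-PAE, 3 for PAE.
 * shared_kernel_pmd: non-zero if the kernel pmd is shared between pgds.
 */
static struct pgd_ctor_plan plan_pgd_ctor(int pagetable_levels,
                                          int shared_kernel_pmd)
{
    struct pgd_ctor_plan plan;

    assert(pagetable_levels == 2 || pagetable_levels == 3);

    /* Shared lower level (ptes in non-PAE, shared pmd in PAE): just copy. */
    plan.clone_kernel_entries =
        pagetable_levels == 2 ||
        (pagetable_levels == 3 && shared_kernel_pmd);

    /* Unshared kernel mappings must be kept in sync through the list. */
    plan.add_to_pgd_list = !shared_kernel_pmd;

    return plan;
}
```

Under those assumptions, non-PAE both clones and lists the pgd, PAE with a
shared kernel pmd only clones, and PAE with an unshared kernel pmd only
lists.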



[PATCH 0 of 4] x86: cleanups from pmd lifetime series

2008-01-28 Thread Jeremy Fitzhardinge

Hi Ingo,

Here's a followup set from that last batch of patches:
 1. fix up the pgd_ctor merge, so that non-PAE will end up getting
kernel mappings
 2. revert "optimise-pud_clear-cr3-reload"
 3. only do a cr3 reload if pud_clear is being used on the active pagetable
 4. update documentation about PAE tlb flushing.

The third of these makes pud_clear more robust, since it doesn't rely
on it being followed by the right kind of TLB flush.  In practice it
shouldn't make any performance difference, since the only performance
critical paths pud_clear is used on are exit and execve, and they both
operate on some other pagetable at the time the old pagetable is being
pulled down.

It will generate TLB flushes in the case of a usermode process
munmapping a 1+G chunk of its address space, or something to do with
unsharing a hugetlbfs mapping.  I don't think either of these are
performance critical.

Thanks,
J



[PATCH 4 of 4] x86: update reference for PAE tlb flushing

2008-01-28 Thread Jeremy Fitzhardinge

Remove bogus reference to "Pentium-II erratum A13" and point to the
actual canonical source of information about what requirements x86
processors have for PAE pagetable updates.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgalloc_32.h |6 --
 include/asm-x86/pgtable-3level.h |6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgalloc_32.h b/include/asm-x86/pgalloc_32.h
--- a/include/asm-x86/pgalloc_32.h
+++ b/include/asm-x86/pgalloc_32.h
@@ -87,8 +87,10 @@ static inline void pud_populate(struct m
set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));
 
/*
-* Pentium-II erratum A13: in PAE mode we explicitly have to flush
-* the TLB via cr3 if the top-level pgd is changed...
+* According to Intel App note "TLBs, Paging-Structure Caches,
+* and Their Invalidation", April 2007, document 317080-001,
+* section 8.1: in PAE mode we explicitly have to flush the
+* TLB via cr3 if the top-level pgd is changed...
 */
if (mm == current->active_mm)
write_cr3(read_cr3());
diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -98,8 +98,10 @@ static inline void pud_clear(pud_t *pudp
set_pud(pudp, __pud(0));
 
/*
-* Pentium-II erratum A13: in PAE mode we explicitly have to flush
-* the TLB via cr3 if the top-level pgd is changed...
+* According to Intel App note "TLBs, Paging-Structure Caches,
+* and Their Invalidation", April 2007, document 317080-001,
+* section 8.1: in PAE mode we explicitly have to flush the
+* TLB via cr3 if the top-level pgd is changed...
 *
 * Make sure the pud entry we're updating is within the
 * current pgd to avoid unnecessary TLB flushes.



[PATCH 3 of 4] x86: pud_clear: only reload cr3 if necessary

2008-01-28 Thread Jeremy Fitzhardinge

Rather than unconditionally reloading cr3, only do so if the pud we're
updating is within the active pgd.

This eliminates TLB flushes most of the time.  The
performance-critical uses of pud_clear are during execve and exit, but
in those cases cr3 is referring to some other pagetable.  The only
other use of pud_clear is during a large (1Gbyte+) munmap, and those
are sufficiently rare that a couple of cr3 reloads won't hurt.

Signed-off-by: Jeremy Fitzhardinge <[EMAIL PROTECTED]>
---
 include/asm-x86/pgtable-3level.h |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/asm-x86/pgtable-3level.h b/include/asm-x86/pgtable-3level.h
--- a/include/asm-x86/pgtable-3level.h
+++ b/include/asm-x86/pgtable-3level.h
@@ -93,17 +93,20 @@ static inline void native_pmd_clear(pmd_
 
 static inline void pud_clear(pud_t *pudp)
 {
+   unsigned long pgd;
+
set_pud(pudp, __pud(0));
 
/*
 * Pentium-II erratum A13: in PAE mode we explicitly have to flush
 * the TLB via cr3 if the top-level pgd is changed...
 *
-* XXX I don't think we need to worry about this here, since
-* when clearing the pud, the calling code needs to flush the
-* tlb anyway.  But do it now for safety's sake. - jsgf
+* Make sure the pud entry we're updating is within the
+* current pgd to avoid unnecessary TLB flushes.
 */
-   write_cr3(read_cr3());
+   pgd = read_cr3();
+   if (__pa(pudp) >= pgd && __pa(pudp) < (pgd + sizeof(pgd_t)*PTRS_PER_PGD))
+   write_cr3(pgd);
 }
 
 #define pud_page(pud) \
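
Not part of the patch: the range test in pud_clear() can be tried out in
userspace with a sketch like the one below. The function name and the two
constants are assumptions for illustration (4 top-level entries of 8 bytes
each, as in PAE mode); plain integers stand in for __pa(pudp) and the value
read from cr3.

```c
#include <assert.h>
#include <stdint.h>

#define PAE_PTRS_PER_PGD   4u /* assumption: 4 top-level entries in PAE */
#define PAE_PGD_ENTRY_SIZE 8u /* assumption: 64-bit entries */

/*
 * Returns 1 if the physical address of the entry being cleared lies
 * inside the top-level table that cr3 currently points at, i.e. the
 * case where a cr3 reload is wanted.
 */
static int pud_in_active_pgd(uint64_t pud_phys, uint64_t cr3_phys)
{
    uint64_t end = cr3_phys +
                   (uint64_t)PAE_PGD_ENTRY_SIZE * PAE_PTRS_PER_PGD;

    assert(end > cr3_phys); /* no wrap-around in this sketch */
    return pud_phys >= cr3_phys && pud_phys < end;
}
```

With the table at 0x1000, entries at 0x1000 through 0x1018 count as active
and anything from 0x1020 upwards does not, so clearing an entry in some other
pagetable skips the reload.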


